Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning

Authors: Armin Behnamnia, Gholamali Aminian, Alireza Aghaei, Chengchun Shi, Vincent Tan, Hamid R. Rabiee

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Theoretical analysis is complemented by comprehensive empirical evaluations in both off-policy learning and evaluation scenarios, confirming the practical advantages of our approach."
Researcher Affiliation | Academia | "1 Department of Computer Engineering, Sharif University of Technology; 2 The Alan Turing Institute, London, UK; 3 Department of Computer Science, Stony Brook University; 4 Department of Statistics, London School of Economics; 5 Department of Electrical and Computer Engineering, National University of Singapore."
Pseudocode | No | The paper describes the Log-Sum-Exponential estimator, its theoretical foundations, and its experiments, but does not present any formal pseudocode or algorithm blocks.
Open Source Code | Yes | "The code for our estimator is available at the following link: https://github.com/armin-behnamnia/lse-offpolicy-learning."
Open Datasets | Yes | "Datasets: In off-policy learning scenario, we apply the standard supervised-to-bandit transformation (Beygelzimer & Langford, 2009) on a classification dataset: Extended MNIST (EMNIST) (Xiao et al., 2017) to generate the LBF dataset. We also run on FMNIST in App. G.2." "We applied our method to KuaiRec, a public real-world recommendation-system dataset (Gao et al., 2022)." "We evaluate our method's performance in OPE by conducting experiments on 5 UCI classification datasets, as explained in Table 32."
Dataset Splits | Yes | "For LSE and ES, we use 0.2 of the dataset as a validation set to find the hyperparameter with the lowest MSE by grid search and evaluate the method on the remaining 0.8 of the dataset."
Hardware Specification | Yes | "Computational resources: We ran all our experiments on 3 servers: one with an NVIDIA GTX 1080 Ti, one with two NVIDIA RTX 4090s, and one with three NVIDIA RTX 2070 Super GPUs."
Software Dependencies | No | "The code for this study is written in Python. We use PyTorch for the training of our model. The supplementary material includes a zip file named rl_without_reward.zip with the following files: ... requirements.txt contains the Python libraries required to reproduce our results."
Experiment Setup | Yes | "We use mini-batch SGD as an optimizer for all experiments. The learning rate used for EMNIST and FMNIST datasets is 0.001. Furthermore, we use early stopping in our training phase and the maximum number of epochs is 300." and "Table 8: Hyperparameter of Different Estimators"
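Since the paper itself gives no pseudocode, here is a minimal sketch of what a log-sum-exponential off-policy value estimate looks like, assuming the standard form V_lambda = (1/lambda) log((1/n) sum_i exp(lambda w_i r_i)) with importance weights w_i = pi(a_i|x_i) / pi_0(a_i|x_i); the paper's exact parameterization may differ, so treat the function name and signature as illustrative:

```python
import math

def lse_estimator(rewards, target_probs, behavior_probs, lam=-1.0):
    """Sketch of a log-sum-exp off-policy value estimate (illustrative).

    Computes (1/lam) * log(mean_i exp(lam * w_i * r_i)), where
    w_i = target_probs[i] / behavior_probs[i] is the importance weight.
    As lam -> 0 this recovers the plain inverse-propensity-scoring mean;
    negative lam damps the influence of heavy-tailed weighted rewards.
    """
    n = len(rewards)
    terms = [
        math.exp(lam * (p / q) * r)
        for r, p, q in zip(rewards, target_probs, behavior_probs)
    ]
    return math.log(sum(terms) / n) / lam
```

With equal target and behavior propensities the weights are 1, so a near-zero `lam` reproduces the sample-mean reward, while a strongly negative `lam` gives a more conservative (smaller) estimate.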
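The supervised-to-bandit transformation cited under Open Datasets (Beygelzimer & Langford, 2009) converts a classification dataset into logged bandit feedback. A sketch of the standard recipe follows; the epsilon-uniform logging policy here is an illustrative choice, not necessarily the one used to build the paper's LBF dataset:

```python
import random

def supervised_to_bandit(features, labels, num_classes, epsilon=0.3, seed=0):
    """Standard supervised-to-bandit conversion (illustrative logging policy).

    A logging policy that is uniform with probability epsilon and picks the
    true label otherwise chooses one action per example; only that action's
    binary reward (1 if it matches the true label) is recorded, together
    with its propensity under the logging policy.
    """
    rng = random.Random(seed)
    logged = []
    for x, y in zip(features, labels):
        if rng.random() < epsilon:
            a = rng.randrange(num_classes)  # explore: uniform action
        else:
            a = y                           # exploit: the true label
        # propensity of the chosen action under this logging policy
        prop = epsilon / num_classes + (1.0 - epsilon) * (a == y)
        r = 1.0 if a == y else 0.0          # bandit feedback only
        logged.append((x, a, r, prop))
    return logged
```

The resulting tuples (context, action, reward, propensity) are exactly what importance-weighted estimators such as LSE consume.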
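The training setup under Experiment Setup (early stopping with a hard cap of 300 epochs) can be captured by a small helper; the patience value below is an assumption, since the excerpt does not report one:

```python
class EarlyStopping:
    """Early stopping as described in the report: halt when validation loss
    stops improving for `patience` epochs, or at `max_epochs` at the latest.
    The patience value is an illustrative assumption."""

    def __init__(self, patience=10, max_epochs=300):
        self.patience = patience
        self.max_epochs = max_epochs
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, epoch, val_loss):
        if val_loss < self.best:
            self.best = val_loss       # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1       # no improvement this epoch
        return self.bad_epochs >= self.patience or epoch + 1 >= self.max_epochs
```

A training loop would call `should_stop(epoch, val_loss)` once per epoch after the validation pass and break when it returns True.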