Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning
Authors: Armin Behnamnia, Gholamali Aminian, Alireza Aghaei, Chengchun Shi, Vincent Tan, Hamid R. Rabiee
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretical analysis is complemented by comprehensive empirical evaluations in both off-policy learning and evaluation scenarios, confirming the practical advantages of our approach. |
| Researcher Affiliation | Academia | (1) Department of Computer Engineering, Sharif University of Technology; (2) The Alan Turing Institute, London, UK; (3) Department of Computer Science, Stony Brook University; (4) Department of Statistics, London School of Economics; (5) Department of Electrical and Computer Engineering, National University of Singapore. |
| Pseudocode | No | The paper describes the Log-Sum-Exponential Estimator and its theoretical foundations and experiments, but does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for our estimator is available at the following link: https://github.com/armin-behnamnia/lse-offpolicy-learning |
| Open Datasets | Yes | "Datasets: In off-policy learning scenario, we apply the standard supervised to bandit transformation (Beygelzimer & Langford, 2009) on a classification dataset: Extended MNIST (EMNIST) (Xiao et al., 2017) to generate the LBF dataset. We also run on FMNIST in App. G.2." "We applied our method to the KuaiRec, a public real-world recommendation system dataset (Gao et al., 2022)." "We evaluate our method's performance in OPE by conducting experiments on 5 UCI classification datasets, as explained in Table 32." |
| Dataset Splits | Yes | For LSE and ES, we use 0.2 of the dataset as a validation set to find the hyperparameter with the lowest MSE by grid search and evaluate the method on the remaining 0.8 of the dataset. |
| Hardware Specification | Yes | Computational resources: We ran all our experiments on 3 servers: one with an NVIDIA GTX 1080 Ti, one with two NVIDIA RTX 4090s, and one with three NVIDIA RTX 2070 Super GPUs. |
| Software Dependencies | No | The code for this study is written in Python. We use PyTorch for the training of our model. The supplementary material includes a zip file named rl_without_reward.zip with the following files: ... requirements.txt contains the Python libraries required to reproduce our results. |
| Experiment Setup | Yes | "We use mini-batch SGD as an optimizer for all experiments. The learning rate used for EMNIST and FMNIST datasets is 0.001. Furthermore, we use early stopping in our training phase and the maximum number of epochs is 300." and "Table 8: Hyperparameter of Different Estimators" |
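For orientation, the estimator named in the title can be sketched from its name alone: a log-sum-exponential applied to the importance-weighted rewards. The exact form and the sign convention for the hyperparameter λ are assumptions here (the paper's Table 8 lists λ among the tuned hyperparameters; a negative λ damps heavy-tailed importance weights), so treat this as an illustrative sketch, not the authors' reference implementation.

```python
import numpy as np

def lse_estimator(weights, rewards, lam=-0.5):
    """Hedged sketch of a log-sum-exponential off-policy value estimate.

    Computes (1/lam) * log(mean(exp(lam * w_i * r_i))) in a numerically
    stable way (max-shift trick). The functional form and lam < 0
    convention are assumptions inferred from the estimator's name.
    """
    z = lam * np.asarray(weights) * np.asarray(rewards)
    m = z.max()  # shift by the max so exp() cannot overflow
    return (np.log(np.exp(z - m).mean()) + m) / lam
```

As λ → 0 the estimate recovers the ordinary inverse-propensity-score (IPS) mean of w_i · r_i, while for λ < 0 Jensen's inequality makes it a downward-biased but lower-variance surrogate.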
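The supervised-to-bandit transformation cited in the datasets row (Beygelzimer & Langford, 2009) converts a classification set into logged bandit feedback: for each example, sample one action from a logging policy and record reward 1 only if it matches the true label. A minimal sketch, with function and variable names chosen here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bandit_feedback(labels, n_classes, logging_probs):
    """Supervised-to-bandit transformation: for each labeled example,
    draw one action from the logging policy; reward is 1 iff the action
    equals the true label. `logging_probs` has shape (n, n_classes)."""
    n = len(labels)
    actions = np.array([rng.choice(n_classes, p=p) for p in logging_probs])
    rewards = (actions == labels).astype(float)
    # propensity of the action actually taken, needed for IPS-style weights
    propensities = logging_probs[np.arange(n), actions]
    return actions, rewards, propensities
```

The returned propensities supply the denominators of the importance weights used by IPS-family estimators on the resulting logged-bandit-feedback (LBF) dataset.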