Doubly Optimal Policy Evaluation for Reinforcement Learning

Authors: Shuze Liu, Claire Chen, Shangtong Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response

- Research Type | Experimental | "Empirically, compared with previous works, we show our method reduces variance substantially and achieves superior empirical performance."
- Researcher Affiliation | Academia | Shuze Daniel Liu, Department of Computer Science, University of Virginia (EMAIL); Claire Chen, School of Arts and Science, University of Virginia (EMAIL); Shangtong Zhang, Department of Computer Science, University of Virginia (EMAIL)
- Pseudocode | Yes | Algorithm 1: Doubly Optimal (DOpt) Policy Evaluation
- Open Source Code | No | The paper points to the source code of a third-party tool the authors used ("We use the default PPO implementation in Huang et al. (2022)") but does not provide the authors' own implementation of the methodology described in the paper.
- Open Datasets | Yes | MuJoCo: "We also conduct experiments in MuJoCo robot simulation tasks (Todorov et al., 2012)."
- Dataset Splits | No | "To learn functions q_{π,t} and u_{π,t}, we split the offline data into a training set and a test set." No specific percentages or sample counts for the splits are provided.
- Hardware Specification | No | The paper does not mention the specific hardware (e.g., CPU or GPU models) used to run the experiments.
- Software Dependencies | No | The paper mentions several algorithms and tools (Fitted Q-Evaluation, PPO, the Adam optimizer, Gymnasium) and refers to an implementation from another paper (Huang et al., 2022), but it does not give version numbers for any libraries, programming languages, or other ancillary software used in the experiments.
- Experiment Setup | Yes | "We choose a one-hidden-layer neural network and test the neural network size with [64, 128, 256] and choose 64 as the final size. We test the learning rate for Adam optimizer with [1e-5, 1e-4, 1e-3, 1e-2] and choose to use the default learning rate 1e-3 as learning rate for Adam optimizer (Kingma and Ba, 2015)."
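The reported setup (a one-hidden-layer network of width 64, chosen from [64, 128, 256], trained with Adam at the default learning rate 1e-3, chosen from [1e-5, 1e-4, 1e-3, 1e-2]) can be sketched in NumPy. This is a minimal illustrative sketch, not the authors' implementation: only the hidden-size and learning-rate grids come from the quoted setup, while the toy regression task, initialization scale, and training length are assumptions made for the example.

```python
import numpy as np

# Hyperparameters taken from the paper's reported setup; everything else
# (the toy regression task, init scale, loop length) is illustrative.
HIDDEN_SIZES = [64, 128, 256]               # candidate widths; 64 was chosen
LEARNING_RATES = [1e-5, 1e-4, 1e-3, 1e-2]   # candidate Adam LRs; 1e-3 chosen

def init_net(in_dim, hidden, out_dim, rng):
    return {"W1": rng.normal(0, 0.1, (in_dim, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(0, 0.1, (hidden, out_dim)), "b2": np.zeros(out_dim)}

def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])      # single hidden layer
    return h @ p["W2"] + p["b2"], h

def mse_and_grads(p, x, t):
    y, h = forward(p, x)
    dy = 2.0 * (y - t) / len(x)             # d(MSE)/dy
    dz = (dy @ p["W2"].T) * (1.0 - h ** 2)  # backprop through tanh
    grads = {"W1": x.T @ dz, "b1": dz.sum(0),
             "W2": h.T @ dy, "b2": dy.sum(0)}
    return float(np.mean((y - t) ** 2)), grads

def train(hidden=64, lr=1e-3, steps=300, seed=0, b1=0.9, b2=0.999, eps=1e-8):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(256, 4))
    t = np.sin(x.sum(axis=1, keepdims=True))        # toy regression target
    p = init_net(4, hidden, 1, rng)
    m = {k: np.zeros_like(v) for k, v in p.items()}
    v = {k: np.zeros_like(w) for k, w in p.items()}
    losses = []
    for step in range(1, steps + 1):
        loss, g = mse_and_grads(p, x, t)
        losses.append(loss)
        for k in p:                                 # Adam update (bias-corrected)
            m[k] = b1 * m[k] + (1 - b1) * g[k]
            v[k] = b2 * v[k] + (1 - b2) * g[k] ** 2
            m_hat = m[k] / (1 - b1 ** step)
            v_hat = v[k] / (1 - b2 ** step)
            p[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return losses

losses = train()
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In practice one would loop `train` over `HIDDEN_SIZES` and `LEARNING_RATES` and pick the configuration with the best held-out error, which is the selection procedure the quoted setup implies.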