Model-Free Offline Reinforcement Learning with Enhanced Robustness
Authors: Chi Zhang, Zain Ulabedeen Farhat, George Atia, Yue Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments further demonstrate that our approach significantly improves robustness in a more scalable manner than existing methods. We conduct extensive numerical experiments to demonstrate the improvements in robustness achieved by our algorithms in both simulated environments (Archibald et al., 1995) and real physics-based Classic Control problems (Brockman et al., 2016). In each case, our algorithm consistently outperforms existing methods in handling model uncertainty, showcasing its enhanced ability to maintain stable performance across a wide range of environmental perturbations. |
| Researcher Affiliation | Academia | Chi Zhang1, Zain Ulabedeen Farhat1, George K. Atia1,2, Yue Wang1,2 1 Department of Electrical and Computer Engineering 2 Department of Computer Science University of Central Florida Orlando, FL 32816, USA EMAIL |
| Pseudocode | Yes | Algorithm 1 Double-Pessimism Q-Learning for finite-horizon RMDPs. ... Algorithm 2 Double-Pessimism Q-Learning for infinite-horizon RMDPs with χ2-divergence uncertainty set. ... Algorithm 3 Double-Pessimism Q-Learning for infinite-horizon RMDPs. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We first evaluate the performance of our algorithm on the Garnet problem (Archibald et al., 1995), a randomly generated MDP G(a, b, c) with a states, b actions, and c branches (see Appendix A for a more detailed description). ... To further demonstrate the improvements in both scalability and robustness offered by our approach, we consider more complex Classic Control tasks from Open AI Gym (Brockman et al., 2016), specifically Mountain Car and Cart Pole (results are shown in Figure 4 in Appendix). |
| Dataset Splits | No | The paper describes how datasets are generated (e.g., '10 datasets are generated at each dataset size from T = 1000 to T = 20000') and how policies are evaluated in perturbed environments, but it does not specify explicit training/test/validation splits for dataset evaluation in the traditional sense. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Open AI Gym (Brockman et al., 2016)', 'Conservative Q-learning (CQL, (Kumar et al., 2020)) and Implicit Q-learning (IQL, (Kostrikov et al., 2021))', but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We set γ = 0.95, Cb = 1 × 10⁻⁴ and δ = 0.02. ... The uncertainty set is constructed using the lα-norm, with the radius Rs,a ∈ [0.1, 0.5]. ... The randomness (i.e., optimality) of the behavior policy is controlled via temperature parameter tb = 1. State-action pairs with probabilities Ps,a ≥ 0.03 (for G(20, 30, 20)), Ps,a ≥ 0.02 (for G(30, 50, 30)) and Ps,a ≥ 0.01 (for G(50, 100, 50)) are then excluded to achieve partial coverage. ... After a policy is learned, we test its performance under a perturbed environment with the parameter randomly generated from [-τ, τ] for 800 times. |
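The Garnet benchmark quoted above, G(a, b, c), is a randomly generated MDP with a states, b actions, and c successor "branches" per state-action pair. The paper does not release code, so the following is only a minimal sketch of one standard Garnet construction; the function name, seeding, and use of a Dirichlet distribution over branches are assumptions, not the authors' implementation.

```python
import numpy as np

def garnet(a, b, c, seed=0):
    """Sketch of a Garnet problem G(a, b, c): a states, b actions,
    and c branches (nonzero next-state transitions per state-action
    pair). Illustrative only; not the paper's code."""
    rng = np.random.default_rng(seed)
    P = np.zeros((a, b, a))           # transition kernel P[s, u, s']
    R = rng.uniform(size=(a, b))      # random rewards in [0, 1)
    for s in range(a):
        for u in range(b):
            # pick c distinct successor states and a random
            # probability vector over them
            branches = rng.choice(a, size=c, replace=False)
            P[s, u, branches] = rng.dirichlet(np.ones(c))
    return P, R
```

For example, the experiments reference G(20, 30, 20), G(30, 50, 30), and G(50, 100, 50); each call above returns a valid stochastic kernel whose rows sum to one with at most c nonzero entries.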
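The setup states that the randomness (optimality) of the behavior policy is controlled via a temperature parameter tb = 1. The paper does not give the exact functional form; a softmax over an optimal Q-table is one common choice, sketched below purely as an assumption (both the helper name and the softmax form are hypothetical).

```python
import numpy as np

def behavior_policy(Q_star, tb=1.0):
    """Hypothetical temperature-controlled behavior policy:
    softmax over a Q-table, sharper as tb -> 0. The paper does
    not specify this form; it is an illustrative assumption."""
    logits = Q_star / tb
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)          # pi_b[s, a]
```

Under this construction, state-action pairs visited with probability above the quoted thresholds (e.g. Ps,a ≥ 0.03 for G(20, 30, 20)) would then be excluded from the offline dataset to induce partial coverage.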
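The evaluation protocol in the setup row is simple to restate as code: after a policy is learned, it is rolled out in a perturbed environment whose perturbation parameter is drawn uniformly from [-τ, τ], repeated 800 times. The sketch below assumes a `policy_return` callable standing in for a full environment rollout; that abstraction and the function name are illustrative, not from the paper.

```python
import numpy as np

def evaluate_robustness(policy_return, tau, n_trials=800, seed=0):
    """Sketch of the quoted evaluation protocol: draw a perturbation
    parameter uniformly from [-tau, tau] for each of 800 trials and
    record the learned policy's return. `policy_return(p)` is a
    stand-in for rolling out the policy under perturbation p."""
    rng = np.random.default_rng(seed)
    perturbations = rng.uniform(-tau, tau, size=n_trials)
    returns = np.array([policy_return(p) for p in perturbations])
    return returns.mean(), returns.std()
```

A robust policy would show a high mean and low spread across this perturbation range, which is the comparison the paper's robustness figures report.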