Leveraging Fully-Observable Solutions for Improved Partially-Observable Offline Reinforcement Learning

Authors: Chulabhaya Wijesundara, Andrea Baisero, Gregory David Castanon, Alan S Carlin, Robert Platt, Christopher Amato

TMLR 2025

Reproducibility checklist (each item lists the variable, its result, and the LLM response):
Research Type: Experimental
Our empirical evaluation on a wide variety of partially-observable challenges demonstrates that CO-CQL is able to exploit the guidance of fully-observable experts to outperform other state-of-the-art offline algorithms. We perform two types of evaluations: an analysis of learning performance comparing CO-CQL to other baselines, and an analysis of how robust CO-CQL is to the quality of the training dataset.
Researcher Affiliation: Collaboration
Chulabhaya Wijesundara (Khoury College of Computer Sciences, Northeastern University); Andrea Baisero (Khoury College of Computer Sciences, Northeastern University); Gregory Castañón (STR); Alan Carlin (STR); Robert Platt (Khoury College of Computer Sciences, Northeastern University); Christopher Amato (Khoury College of Computer Sciences, Northeastern University)
Pseudocode: No
The paper describes the proposed algorithm CO-CQL using mathematical formulations in Section 4.2 and equations (7) through (12), but it does not provide any structured pseudocode or algorithm blocks.
Open Source Code: No
The paper states in Appendix A.6 that 'All datasets will be made publicly available upon publication,' but there is no explicit statement or link indicating that the source code for the described methodology is openly available.
Open Datasets: Yes
We employ modified versions of Half Cheetah and Lunar Lander (Brockman et al., 2016). In addition to the above, we perform an analysis of dataset robustness for CO-CQL based on modified versions of Cart Pole (Brockman et al., 2016) and simplified Heaven Hell (Blai & Geffner, 1998). Grid Verse (Baisero et al., 2021) is a framework for customizable 2D grid-world control problems that allows for both full state observability and partial observability. All datasets will be made publicly available upon publication.
Dataset Splits: No
The composition and size of each dataset varies per environment, as shown in Table 2. For each control problem, datasets are made up of partially-observable expert and random demonstrations in relative ratios of 100%, 50%, and 0%, respectively. The paper describes the composition of the training datasets but does not explicitly provide details about training, validation, or test splits.
Hardware Specification: No
The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud computing resources) used to run the experiments.
Software Dependencies: No
The paper mentions several algorithms, including DQN, SAC (Haarnoja et al., 2018), TD3+BC (Fujimoto & Gu, 2021), CQL (Kumar et al., 2020), IQL (Kostrikov et al., 2021), and PPO (Schulman et al., 2017), as well as environment platforms such as OpenAI Gym (Brockman et al., 2016) and Grid Verse (Baisero et al., 2021), but it does not specify version numbers for these or any other software dependencies.
Experiment Setup: Yes
Appendix A.7 and Tables 3-7 provide detailed hyperparameters for CO-CQL and the other baselines, including discount γ, batch size, history length, actor learning rate, critic learning rate, actor update frequency, target network update frequency, target entropy scaling, IQL τ, IQL β, TD3+BC α, exploration noise, policy noise, and noise clip.