Goal-Conditioned Data Augmentation for Offline Reinforcement Learning
Authors: Xingshuai Huang, Di Wu, Benoit Boulet
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the D4RL benchmark and real-world challenges, specifically traffic signal control (TSC) tasks, to demonstrate GODA's effectiveness in enhancing data quality and superior performance compared to state-of-the-art data augmentation methods across various offline RL algorithms. |
| Researcher Affiliation | Academia | Xingshuai Huang EMAIL Department of Electrical and Computer Engineering McGill University Di Wu EMAIL Department of Electrical and Computer Engineering McGill University Benoit Boulet EMAIL Department of Electrical and Computer Engineering McGill University |
| Pseudocode | Yes | Algorithm 1 outlines the overall process of our GODA method. |
| Open Source Code | No | The paper provides GitHub links and citations for baseline methods (TATU, SynthER, DiffStitch, the CORL codebase for IQL/TD3+BC, and the DataLight implementation) but does not include an explicit statement or link for the source code of GODA itself. |
| Open Datasets | Yes | We adopt three popular MuJoCo locomotion tasks from Gym, i.e., HalfCheetah, Hopper, and Walker2D, and a navigation task, i.e., Maze2D (Fu et al., 2020), as well as more complex tasks, specifically the Pen and Door tasks from the Adroit benchmark (Rajeswaran et al., 2017; Fu et al., 2020). For locomotion tasks, we adopt four data quality levels: Random, Medium-Replay, Medium, and Medium-Expert. For Maze2D, three datasets collected from different maze layouts are adopted, i.e., Umaze, Medium, and Large. For the Adroit benchmark, we use two different datasets: Human and Cloned. |
| Dataset Splits | No | The paper describes generating specific quantities of samples for augmentation (e.g., "We generate a total of 24K samples for each dataset... Additionally, we augment 20K samples for each task"), and mentions evaluating policies on D4RL tasks. However, it does not provide explicit training/validation/test splits, percentages, or methodology for splitting the datasets (either original or augmented) for reproduction of the policy training and evaluation phases. |
| Hardware Specification | Yes | All training and sampling are conducted on an AMD Ryzen 7 7700X 8-Core Processor and a single NVIDIA GeForce RTX 4080 GPU. |
| Software Dependencies | No | The paper cites several algorithms and techniques (e.g., SiLU, Random Fourier Feature embedding, Sinusoidal positional embedding, EDM, Heun's 2nd-order ODE solver) and uses existing implementations for baselines (the CORL codebase, DataLight), but it does not specify exact version numbers for the programming languages, libraries, or frameworks used to implement GODA itself (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | The denoising neural network utilizes the adaptive gated conditioning architecture, as shown in Figure \ref{fig:gac}. Table 7 details the associated hyperparameters. We use Random Fourier Feature embedding (Rahimi & Recht, 2007) and Sinusoidal positional embedding (Vaswani et al., 2017) to process the noise level and timestep of each RTG, respectively, with an embedding dimension of 128. The width of the linear layers in the MLP block is set to 512, with SiLU (Elfwing et al., 2018) as the activation function. The total number of trainable parameters for the denoiser neural network is approximately 3.3M. We train our GODA model with 100K steps of gradient updates, with a batch size of 256 and a learning rate of 0.0003. |
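The two conditioning embeddings quoted in the experiment-setup row follow standard formulations from the cited works. As a rough illustration only (not the authors' code), the sketch below implements a Sinusoidal positional embedding (Vaswani et al., 2017) and a Random Fourier Feature embedding (Rahimi & Recht, 2007) at the paper's embedding dimension of 128; the frequency scale of the RFF weights and the random seed are assumptions, not values from the paper.

```python
import numpy as np

EMB_DIM = 128  # embedding dimension stated in the paper

def sinusoidal_embedding(t, dim=EMB_DIM):
    """Sinusoidal positional embedding for a scalar timestep (Vaswani et al., 2017)."""
    half = dim // 2
    # Geometric frequency schedule from 1 down to 1/10000.
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def random_fourier_features(x, weights, phases):
    """Random Fourier Feature embedding of a scalar input (Rahimi & Recht, 2007)."""
    return np.sqrt(2.0 / weights.shape[0]) * np.cos(weights * x + phases)

rng = np.random.default_rng(0)
rff_weights = rng.normal(scale=16.0, size=EMB_DIM)      # frequency scale is a guess
rff_phases = rng.uniform(0.0, 2 * np.pi, size=EMB_DIM)

timestep_emb = sinusoidal_embedding(5.0)                     # shape (128,)
noise_level_emb = random_fourier_features(0.7, rff_weights,  # shape (128,)
                                          rff_phases)
```

Per the paper, these 128-dimensional embeddings condition an MLP block with 512-wide linear layers and SiLU activations; the exact wiring (the adaptive gated conditioning architecture) is shown in the paper's figure and Table 7.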