Goal-Conditioned Data Augmentation for Offline Reinforcement Learning
Authors: Xingshuai Huang, Di Wu, Benoit Boulet
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the D4RL benchmark and real-world challenges, specifically traffic signal control (TSC) tasks, to demonstrate GODA's effectiveness in enhancing data quality and superior performance compared to state-of-the-art data augmentation methods across various offline RL algorithms. |
| Researcher Affiliation | Academia | Xingshuai Huang EMAIL Department of Electrical and Computer Engineering McGill University Di Wu EMAIL Department of Electrical and Computer Engineering McGill University Benoit Boulet EMAIL Department of Electrical and Computer Engineering McGill University |
| Pseudocode | Yes | Algorithm 1 outlines the overall process of our GODA method. |
| Open Source Code | No | The paper provides GitHub links and citations for baseline methods (TATU, SynthER, DiffStitch, the CORL codebase for IQL/TD3+BC, and the DataLight implementation) but does not include an explicit statement or link for the source code of GODA itself. |
| Open Datasets | Yes | We adopt three popular MuJoCo locomotion tasks from Gym, i.e., HalfCheetah, Hopper, and Walker2D, and a navigation task, i.e., Maze2D (Fu et al., 2020), as well as more complex tasks, specifically the Pen and Door tasks from the Adroit benchmark (Rajeswaran et al., 2017; Fu et al., 2020). For locomotion tasks, we adopt four data quality levels: Random, Medium-Replay, Medium, and Medium-Expert. For Maze2D, three datasets collected from different maze layouts are adopted, i.e., Umaze, Medium, and Large. For the Adroit benchmark, we use two different datasets: Human and Cloned. |
| Dataset Splits | No | The paper describes generating specific quantities of samples for augmentation (e.g., "We generate a total of 24K samples for each dataset... Additionally, we augment 20K samples for each task"), and mentions evaluating policies on D4RL tasks. However, it does not provide explicit training/validation/test splits, percentages, or methodology for splitting the datasets (either original or augmented) for reproduction of the policy training and evaluation phases. |
| Hardware Specification | Yes | All training and sampling are conducted on an AMD Ryzen 7 7700X 8-Core Processor and a single NVIDIA GeForce RTX 4080 GPU. |
| Software Dependencies | No | The paper cites several algorithms and techniques (e.g., SiLU, Random Fourier Feature embedding, Sinusoidal positional embedding, EDM, Heun's 2nd-order ODE solver) and uses existing implementations for baselines (the CORL codebase, DataLight), but it does not specify exact version numbers for the programming languages, libraries, or frameworks used to implement GODA itself (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | The denoising neural network utilizes the adaptive gated conditioning architecture, as shown in Figure \ref{fig:gac}. Table 7 details the associated hyperparameters. We use Random Fourier Feature embedding (Rahimi & Recht, 2007) and Sinusoidal positional embedding (Vaswani et al., 2017) to process the noise level and timestep of each RTG, respectively, with an embedding dimension of 128. The width of the linear layers in the MLP block is set to 512, with SiLU (Elfwing et al., 2018) as the activation function. The total number of trainable parameters for the denoiser neural network is approximately 3.3M. We train our GODA model with 100K steps of gradient updates, with a batch size of 256 and a learning rate of 0.0003. |
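The two conditioning embeddings quoted in the experiment-setup row follow standard formulations from the cited works. As a rough illustration only (not the authors' code), the sketch below implements a Sinusoidal positional embedding (Vaswani et al., 2017) and a Random Fourier Feature embedding (Rahimi & Recht, 2007) at the paper's embedding dimension of 128; the frequency scale of the RFF weights and the random seed are assumptions, not values from the paper.

```python
import numpy as np

EMB_DIM = 128  # embedding dimension stated in the paper

def sinusoidal_embedding(t, dim=EMB_DIM):
    """Sinusoidal positional embedding for a scalar timestep (Vaswani et al., 2017)."""
    half = dim // 2
    # Geometric frequency schedule from 1 down to 1/10000.
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def random_fourier_features(x, weights, phases):
    """Random Fourier Feature embedding of a scalar input (Rahimi & Recht, 2007)."""
    return np.sqrt(2.0 / weights.shape[0]) * np.cos(weights * x + phases)

rng = np.random.default_rng(0)
rff_weights = rng.normal(scale=16.0, size=EMB_DIM)      # frequency scale is a guess
rff_phases = rng.uniform(0.0, 2 * np.pi, size=EMB_DIM)

timestep_emb = sinusoidal_embedding(5.0)                     # shape (128,)
noise_level_emb = random_fourier_features(0.7, rff_weights,  # shape (128,)
                                          rff_phases)
```

Per the paper, these 128-dimensional embeddings condition an MLP block with 512-wide linear layers and SiLU activations; the exact wiring (the adaptive gated conditioning architecture) is shown in the paper's figure and Table 7.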