MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
Authors: Claas Voelcker, Marcel Hussing, Eric Eaton, Amir-massoud Farahmand, Igor Gilitschenski
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments further highlight the importance of employing a good model to generate data, MAD-TD's ability to combat value overestimation, and its practical stability gains for continued learning. We conduct all of our experiments on the Deep Mind Control suite (Tunyasuvunakool et al., 2020b). |
| Researcher Affiliation | Academia | Claas A. Voelcker (University of Toronto; Vector Institute), Marcel Hussing (University of Pennsylvania), Eric Eaton (University of Pennsylvania), Amir-massoud Farahmand (Polytechnique Montréal; Mila – Quebec AI Institute; University of Toronto), Igor Gilitschenski (University of Toronto; Vector Institute) |
| Pseudocode | No | The paper describes the method and design choices in sections 4 and 4.1 but does not include a structured pseudocode or algorithm block. |
| Open Source Code | Yes | For reference, our code is available at https://github.com/adaptive-agents-lab/mad-td. |
| Open Datasets | Yes | We conduct all of our experiments on the Deep Mind Control suite (Tunyasuvunakool et al., 2020b). In Subsection E.8 we furthermore show results for the metaworld benchmark (Yu et al., 2019). |
| Dataset Splits | Yes | While training a TD3 agent (Fujimoto et al., 2018), we save transitions in a validation buffer with a 5% probability. At regular intervals we compute the critic loss on this validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or other detailed computer specifications used for running its experiments. |
| Software Dependencies | No | Our experiments are implemented in the jax library to allow for easy parallelization of multiple experiments across seeds. We use mish activation functions (Misra, 2020) and the Adam optimizer to train our models (Kingma & Ba, 2015). |
| Experiment Setup | Yes | Full hyperparameters are presented in Table 2 and the architecture can be found in Table 1. |
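The 5% validation split quoted in the Dataset Splits row can be sketched as below. Only the routing probability (5%) comes from the paper; the buffer interface, function name, and use of plain Python lists are hypothetical simplifications, not the authors' implementation.

```python
import random

def store_transition(train_buffer, val_buffer, transition, p_val=0.05, rng=random):
    """Route a transition to the validation buffer with probability p_val,
    otherwise to the training replay buffer. Hypothetical helper illustrating
    the 5% validation split described in the paper."""
    if rng.random() < p_val:
        val_buffer.append(transition)
    else:
        train_buffer.append(transition)

# Example: split 10,000 dummy transitions.
rng = random.Random(0)
train, val = [], []
for t in range(10_000):
    store_transition(train, val, t, p_val=0.05, rng=rng)

# Roughly 5% of transitions land in the validation buffer; the critic loss
# would then be evaluated on `val` at regular intervals.
```

In practice the critic loss on the held-out buffer serves as a proxy for value-function generalization, since those transitions are never used for TD updates.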