MAD-TD: Model-Augmented Data Stabilizes High Update Ratio RL

Authors: Claas Voelcker, Marcel Hussing, Eric Eaton, Amir-massoud Farahmand, Igor Gilitschenski

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments further highlight the importance of employing a good model to generate data, MAD-TD's ability to combat value overestimation, and its practical stability gains for continued learning. We conduct all of our experiments on the DeepMind Control suite (Tunyasuvunakool et al., 2020b)."
Researcher Affiliation | Academia | Claas A. Voelcker (University of Toronto; Vector Institute), Marcel Hussing (University of Pennsylvania), Eric Eaton (University of Pennsylvania), Amir-massoud Farahmand (Polytechnique Montréal; Mila Quebec AI Institute; University of Toronto), Igor Gilitschenski (University of Toronto; Vector Institute)
Pseudocode | No | The paper describes the method and design choices in Sections 4 and 4.1 but does not include structured pseudocode or an algorithm block.
Open Source Code | Yes | "For reference, our code is available at https://github.com/adaptive-agents-lab/mad-td."
Open Datasets | Yes | "We conduct all of our experiments on the DeepMind Control suite (Tunyasuvunakool et al., 2020b). In Subsection E.8 we furthermore show results for the metaworld benchmark (Yu et al., 2019)."
Dataset Splits | Yes | "While training a TD3 agent (Fujimoto et al., 2018), we save transitions in a validation buffer with a 5% probability. At regular intervals we compute the critic loss on this validation set."
Hardware Specification | No | The paper does not specify the hardware used for its experiments, such as GPU or CPU models or other machine details.
Software Dependencies | No | "Our experiments are implemented in the jax library to allow for easy parallelization of multiple experiments across seeds. We use mish activation functions (Misra, 2020) and the Adam optimizer to train our models (Kingma & Ba, 2015)." The paper names these libraries and components but does not list versioned software dependencies.
Experiment Setup | Yes | "Full hyperparameters are presented in Table 2 and the architecture can be found in Table 1."
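The Dataset Splits row describes a simple stochastic hold-out scheme: each incoming transition is stored for training, and with 5% probability a copy also lands in a validation buffer used to monitor the critic loss. A minimal Python sketch of that scheme (the class and all names are illustrative, not taken from the paper's code, which also stores transitions for TD3 training rather than plain lists):

```python
import random


class ValidationSplitBuffer:
    """Replay buffer with a stochastic validation split.

    Every transition goes into the training buffer; with probability
    `validation_prob` (5% in the paper) a copy is also kept in a
    held-out validation buffer for periodic critic-loss checks.
    """

    def __init__(self, validation_prob=0.05, seed=0):
        self.train = []
        self.validation = []
        self.validation_prob = validation_prob
        self.rng = random.Random(seed)

    def add(self, transition):
        self.train.append(transition)
        if self.rng.random() < self.validation_prob:
            self.validation.append(transition)


buffer = ValidationSplitBuffer(seed=42)
for step in range(10_000):
    # Placeholder transition; a real agent would store
    # (state, action, reward, next_state, done) tuples.
    buffer.add((step, step + 1))

frac = len(buffer.validation) / len(buffer.train)
print(f"validation fraction: {frac:.3f}")  # expect roughly 0.05
```

Because transitions are assigned to the validation buffer independently at collection time, the held-out set stays representative of the data distribution without requiring a fixed split up front.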