MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
Authors: Claas Voelcker, Marcel Hussing, Eric Eaton, Amir-massoud Farahmand, Igor Gilitschenski
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments further highlight the importance of employing a good model to generate data, MAD-TD's ability to combat value overestimation, and its practical stability gains for continued learning. We conduct all of our experiments on the Deep Mind Control suite (Tunyasuvunakool et al., 2020b). |
| Researcher Affiliation | Academia | Claas A. Voelcker (University of Toronto; Vector Institute), Marcel Hussing (University of Pennsylvania), Eric Eaton (University of Pennsylvania), Amir-massoud Farahmand (Polytechnique Montréal; Mila – Quebec AI Institute; University of Toronto), Igor Gilitschenski (University of Toronto; Vector Institute) |
| Pseudocode | No | The paper describes the method and design choices in sections 4 and 4.1 but does not include a structured pseudocode or algorithm block. |
| Open Source Code | Yes | For reference, our code is available at https://github.com/adaptive-agents-lab/mad-td. |
| Open Datasets | Yes | We conduct all of our experiments on the Deep Mind Control suite (Tunyasuvunakool et al., 2020b). In Subsection E.8 we furthermore show results for the metaworld benchmark (Yu et al., 2019). |
| Dataset Splits | Yes | While training a TD3 agent (Fujimoto et al., 2018), we save transitions in a validation buffer with a 5% probability. At regular intervals we compute the critic loss on this validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or other detailed computer specifications used for running its experiments. |
| Software Dependencies | No | Our experiments are implemented in the jax library to allow for easy parallelization of multiple experiments across seeds. We use mish activation functions (Misra, 2020) and the Adam optimizer to train our models (Kingma & Ba, 2015). |
| Experiment Setup | Yes | Full hyperparameters are presented in Table 2 and the architecture can be found in Table 1. |
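The 5% validation split quoted in the Dataset Splits row can be sketched as below. Only the routing probability (5%) comes from the paper; the buffer interface, function name, and use of plain Python lists are hypothetical simplifications, not the authors' implementation.

```python
import random

def store_transition(train_buffer, val_buffer, transition, p_val=0.05, rng=random):
    """Route a transition to the validation buffer with probability p_val,
    otherwise to the training replay buffer. Hypothetical helper illustrating
    the 5% validation split described in the paper."""
    if rng.random() < p_val:
        val_buffer.append(transition)
    else:
        train_buffer.append(transition)

# Example: split 10,000 dummy transitions.
rng = random.Random(0)
train, val = [], []
for t in range(10_000):
    store_transition(train, val, t, p_val=0.05, rng=rng)

# Roughly 5% of transitions land in the validation buffer; the critic loss
# would then be evaluated on `val` at regular intervals.
```

In practice the critic loss on the held-out buffer serves as a proxy for value-function generalization, since those transitions are never used for TD updates.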