Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning
Authors: Théo Vincent, Fabian Wahren, Jan Peters, Boris Belousov, Carlo D'Eramo
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that AdaQN is theoretically sound and empirically validate it in MuJoCo control problems and Atari 2600 games, showing benefits in sample efficiency, overall performance, robustness to stochasticity and training stability. |
| Researcher Affiliation | Collaboration | 1 DFKI GmbH, SAIROL; 2 Department of Computer Science, TU Darmstadt; 3 Hessian.ai, TU Darmstadt; 4 Center for AI and Data Science, University of Würzburg |
| Pseudocode | Yes | Algorithm 1 Adaptive Deep Q-Network (AdaDQN). Modifications to DQN are marked in purple. [...] Algorithm 2 Adaptive Soft Actor-Critic (AdaSAC). Modifications to SAC are marked in purple. |
| Open Source Code | Yes | Our code is available at https://github.com/theovincent/AdaDQN. [...] The code is available in the supplementary material and will be made open source upon acceptance. |
| Open Datasets | Yes | We demonstrate that AdaQN is theoretically sound and empirically validate it in MuJoCo control problems and Atari 2600 games, showing benefits in sample efficiency, overall performance, robustness to stochasticity and training stability. [...] We use 20 seeds for Lunar Lander (Brockman et al., 2016), 9 seeds for MuJoCo (Todorov et al., 2012), and 5 seeds for Atari (Bellemare et al., 2013). All environments are generated from the Gymnasium library (Brockman et al., 2016). |
| Dataset Splits | No | The paper does not provide traditional training/validation/test dataset splits. It operates in the Reinforcement Learning setting (e.g., MuJoCo, Atari games), where data is generated through environment interaction and performance is evaluated in those environments rather than on pre-split static datasets. It mentions multiple seeds per experiment and a frame budget for training, but these are not dataset splits. |
| Hardware Specification | No | The paper mentions 'computations of our experiments' and discusses an 'additional 300 MB from the VRAM' and '200 MB more from the VRAM' in the context of memory requirements, but it does not specify any particular GPU models, CPU types, or detailed computer specifications used for running the experiments. |
| Software Dependencies | No | The code is based on the Stable Baselines implementation (Raffin et al., 2021) and Dopamine RL (Castro et al., 2018). While these are software frameworks, no specific version numbers for these or other core libraries (e.g., Python, PyTorch, TensorFlow, CUDA) are provided. |
| Experiment Setup | Yes | Appendix B describes all the hyperparameters used for the experiments, along with the environment setups and evaluation protocols. [...] Table 1: Summary of all fixed hyperparameters used for the Lunar Lander experiments (see Section 5.1). [...] Table 2: Summary of all fixed hyperparameters for the MuJoCo experiments (see Section 5.2). [...] Table 3: Summary of all fixed hyperparameters used for the Atari experiments with a finite hyperparameter space (see Section 5.3). [...] Table 4: Summary of all hyperparameters used for the Atari experiments with an infinite hyperparameter space (see Section 5.4). |
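The seed counts quoted above (20 for Lunar Lander, 9 for MuJoCo, 5 for Atari) can be sketched as a simple evaluation schedule. This is a minimal illustration only: the concrete seed values and the `run_experiment` stub are assumptions, not the authors' code.

```python
# Seed schedule per benchmark suite, as reported in the paper:
# 20 seeds for Lunar Lander, 9 for MuJoCo, 5 for Atari.
# Using seeds 0..n-1 is an assumption for illustration.
SEED_COUNTS = {"LunarLander": 20, "MuJoCo": 9, "Atari": 5}


def seed_schedule(counts):
    """Map each suite name to its list of per-run seeds."""
    return {suite: list(range(n)) for suite, n in counts.items()}


def run_experiment(suite, seed):
    """Hypothetical stand-in for one training run. A real setup would
    build the environment with gymnasium.make(...) and seed it via
    env.reset(seed=seed) before training."""
    return f"{suite}-seed{seed}"


if __name__ == "__main__":
    schedule = seed_schedule(SEED_COUNTS)
    for suite, seeds in schedule.items():
        print(suite, len(seeds))
```

Reporting results aggregated over independent seeds like this is what the paper relies on in place of static dataset splits.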