ADDQ: Adaptive distributional double Q-learning

Authors: Leif Döring, Benedikt Wille, Maximilian Birr, Mihail Bîrsan, Martin Slowik

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments are provided for tabular, Atari, and MuJoCo environments.
Researcher Affiliation | Academia | 1 Institute of Mathematics, University of Mannheim, Germany; 2 Department of Mathematics and Computer Science, Freie Universität Berlin, Germany.
Pseudocode | Yes | Algorithm 1: Distributional Q-learning update step; Algorithm 2: ADDQ update step.
Open Source Code | Yes | The code used in our experiments can be found on GitHub: https://github.com/BommeHD/ADDQ.git.
Open Datasets | Yes | Experiments are provided for tabular, Atari, and MuJoCo environments. ... We run experiments on Atari environments from the Arcade Learning Environment (Bellemare et al., 2013) using the Gymnasium API (Towers et al., 2023). ... MuJoCo (Todorov et al., 2012) environments.
Dataset Splits | No | The paper mentions "10 evaluation episodes on 10 evaluation environments without exploration", but does not specify how the environments (e.g., Atari, MuJoCo) were split into training, validation, or test sets.
Hardware Specification | Yes | The experiments were executed on an HPC cluster with NVIDIA Tesla V100 and NVIDIA A100 GPUs.
Software Dependencies | No | The paper references software frameworks such as the Gymnasium API (Towers et al., 2023), RL Baselines3 Zoo (Raffin, 2020), and Stable-Baselines3 (Raffin et al., 2021), but does not provide version numbers for these components, which are crucial for reproducibility.
Experiment Setup | Yes | The C51 algorithm obtained its name from using a categorical representation of return distributions with m = 51 atoms. ... target network which is kept constant and is overwritten from η every e.g. 10000 steps with the parameters from the online network. ... Accordingly, we use twice the batch size for these methods ... step-size schedule α_t(s, a) = 1/T_{s,a}(t), with T_{s,a}(t) the number of visits of (s, a) up to time t, i.e. a 1/n state-action-wise count; exploration: ε-greedy with ε linearly decreasing from 1 to 0.1 in 10000 steps, then constant.
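As an illustrative sketch only (not the authors' code), the tabular schedules quoted above — the 1/n state-action step size α_t(s, a) = 1/T_{s,a}(t) and the ε-greedy exploration with ε decaying linearly from 1 to 0.1 over 10000 steps — could be written as follows; the function and variable names here are assumptions chosen for clarity:

```python
import random

def step_size(visit_count: int) -> float:
    """1/n step size: alpha_t(s, a) = 1 / T_{s,a}(t),
    where visit_count is the number of visits of (s, a) so far."""
    return 1.0 / visit_count

def epsilon(t: int, eps_start: float = 1.0, eps_end: float = 0.1,
            decay_steps: int = 10_000) -> float:
    """Epsilon linearly decreasing from eps_start to eps_end
    over decay_steps environment steps, then held constant."""
    if t >= decay_steps:
        return eps_end
    return eps_start + (eps_end - eps_start) * t / decay_steps

def epsilon_greedy_action(q_values: dict, actions: list, eps: float,
                          rng=random) -> object:
    """With probability eps pick a uniformly random action,
    otherwise the greedy action under the current Q estimates."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values[a])
```

For example, `epsilon(0)` returns 1.0, `epsilon(5_000)` returns 0.55, and `epsilon(20_000)` stays at 0.1; `step_size(4)` returns 0.25 for the fourth visit to a state-action pair.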