ADDQ: Adaptive distributional double Q-learning
Authors: Leif Döring, Benedikt Wille, Maximilian Birr, Mihail Bîrsan, Martin Slowik
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are provided for tabular, Atari, and MuJoCo environments. |
| Researcher Affiliation | Academia | 1: Institute of Mathematics, University of Mannheim, Germany; 2: Department of Mathematics and Computer Science, Freie Universität Berlin, Germany. |
| Pseudocode | Yes | Algorithm 1: Distributional Q-learning update step; Algorithm 2: ADDQ update step. |
| Open Source Code | Yes | The code used in our experiments can be found on GitHub: https://github.com/BommeHD/ADDQ.git. |
| Open Datasets | Yes | Experiments are provided for tabular, Atari, and MuJoCo environments. ... We run experiments on Atari environments from the Arcade Learning Environment (Bellemare et al., 2013) using the Gymnasium API (Towers et al., 2023). ... MuJoCo (Todorov et al., 2012) environments |
| Dataset Splits | No | The paper mentions evaluating on environments and providing '10 evaluation episodes on 10 evaluation environments without exploration', but does not specify how the datasets within these environments (e.g., Atari, MuJoCo) were split into training, validation, or test sets. |
| Hardware Specification | Yes | The experiments were executed on a HPC cluster with NVIDIA Tesla V100 and NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper references software frameworks like 'Gymnasium API (Towers et al., 2023)', 'RL Baselines3 Zoo (Raffin, 2020)', and 'Stable-Baselines3 (Raffin et al., 2021)' but does not provide specific version numbers for these software components or libraries, which are crucial for reproducibility. |
| Experiment Setup | Yes | The C51 algorithm obtained its name from using a categorical representation of return distributions with m = 51 atoms. ... target network, which is kept constant and is overwritten every e.g. 10000 steps with the parameters from the online network η. ... Accordingly, we use twice the batch size for these methods ... step-size schedule α_t(s, a) = 1/T_{s,a}(t), with T_{s,a}(t) the number of visits to (s, a) up to time t, i.e. a 1/n step size counted state-action-wise; exploration: ε-greedy with ε linearly decreasing from 1 to 0.1 over 10000 steps, then constant |
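The Experiment Setup row quotes three concrete ingredients: the m = 51 categorical support used by C51, the state-action-wise 1/n step-size schedule, and the linearly decaying ε-greedy exploration. A minimal sketch of those schedules is below; the function names are illustrative, and the support bounds `v_min`/`v_max` are the common Atari defaults, which are an assumption here, not values stated in the table.

```python
import numpy as np

# C51 categorical support: m = 51 atoms on a fixed interval [v_min, v_max].
# The bounds below are assumed defaults, not taken from the paper's table.
m, v_min, v_max = 51, -10.0, 10.0
atoms = np.linspace(v_min, v_max, m)      # support points z_1, ..., z_51
delta_z = (v_max - v_min) / (m - 1)       # spacing between adjacent atoms


def step_size(visit_count: int) -> float:
    """Tabular step size alpha_t(s, a) = 1 / T_{s,a}(t), where
    visit_count is the number of visits to (s, a) so far."""
    return 1.0 / max(visit_count, 1)


def epsilon(step: int, start: float = 1.0, end: float = 0.1,
            decay_steps: int = 10_000) -> float:
    """Epsilon-greedy schedule: linear decay from 1.0 to 0.1 over
    10000 steps, then constant, as quoted in the setup row."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```

For example, `step_size(4)` gives 0.25, and `epsilon(5_000)` sits halfway through the decay at 0.55.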