Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning
Authors: Théo Vincent, Fabian Wahren, Jan Peters, Boris Belousov, Carlo D'Eramo
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that AdaQN is theoretically sound and empirically validate it in MuJoCo control problems and Atari 2600 games, showing benefits in sample efficiency, overall performance, robustness to stochasticity and training stability. |
| Researcher Affiliation | Collaboration | 1 DFKI GmbH, SAIROL; 2 Department of Computer Science, TU Darmstadt; 3 Hessian.ai, TU Darmstadt; 4 Center for AI and Data Science, University of Würzburg |
| Pseudocode | Yes | Algorithm 1 Adaptive Deep Q-Network (AdaDQN). Modifications to DQN are marked in purple. [...] Algorithm 2 Adaptive Soft Actor-Critic (AdaSAC). Modifications to SAC are marked in purple. |
| Open Source Code | Yes | Our code is available at https://github.com/theovincent/AdaDQN. [...] The code is available in the supplementary material and will be made open source upon acceptance. |
| Open Datasets | Yes | We demonstrate that AdaQN is theoretically sound and empirically validate it in MuJoCo control problems and Atari 2600 games, showing benefits in sample efficiency, overall performance, robustness to stochasticity and training stability. [...] We use 20 seeds for Lunar Lander (Brockman et al., 2016), 9 seeds for MuJoCo (Todorov et al., 2012), and 5 seeds for Atari (Bellemare et al., 2013). All environments are generated from the Gymnasium library (Brockman et al., 2016). |
| Dataset Splits | No | The paper does not provide traditional training/validation/test dataset splits. It operates in the Reinforcement Learning setting (e.g., MuJoCo, Atari games), where data is generated through environment interaction and performance is evaluated in those environments rather than on pre-split static datasets. It mentions multiple seeds per experiment and a frame budget for training, but these are not dataset splits. |
| Hardware Specification | No | The paper mentions 'computations of our experiments' and discusses an 'additional 300 MB from the VRAM' and '200 MB more from the VRAM' in the context of memory requirements, but it does not specify any particular GPU models, CPU types, or detailed computer specifications used for running the experiments. |
| Software Dependencies | No | The code is based on the Stable Baselines implementation (Raffin et al., 2021) and Dopamine RL (Castro et al., 2018). While these are software frameworks, no specific version numbers for these or other core libraries (e.g., Python, PyTorch, TensorFlow, CUDA) are provided. |
| Experiment Setup | Yes | Appendix B describes all the hyperparameters used for the experiments, along with the environment setups and evaluation protocols. [...] Table 1: Summary of all fixed hyperparameters used for the Lunar Lander experiments (see Section 5.1). [...] Table 2: Summary of all fixed hyperparameters for the MuJoCo experiments (see Section 5.2). [...] Table 3: Summary of all fixed hyperparameters used for the Atari experiments with a finite hyperparameter space (see Section 5.3). [...] Table 4: Summary of all hyperparameters used for the Atari experiments with an infinite hyperparameter space (see Section 5.4). |
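The seed counts quoted above (20 for Lunar Lander, 9 for MuJoCo, 5 for Atari) can be sketched as a simple evaluation schedule. This is a minimal illustration only: the concrete seed values and the `run_experiment` stub are assumptions, not the authors' code.

```python
# Seed schedule per benchmark suite, as reported in the paper:
# 20 seeds for Lunar Lander, 9 for MuJoCo, 5 for Atari.
# Using seeds 0..n-1 is an assumption for illustration.
SEED_COUNTS = {"LunarLander": 20, "MuJoCo": 9, "Atari": 5}


def seed_schedule(counts):
    """Map each suite name to its list of per-run seeds."""
    return {suite: list(range(n)) for suite, n in counts.items()}


def run_experiment(suite, seed):
    """Hypothetical stand-in for one training run. A real setup would
    build the environment with gymnasium.make(...) and seed it via
    env.reset(seed=seed) before training."""
    return f"{suite}-seed{seed}"


if __name__ == "__main__":
    schedule = seed_schedule(SEED_COUNTS)
    for suite, seeds in schedule.items():
        print(suite, len(seeds))
```

Reporting results aggregated over independent seeds like this is what the paper relies on in place of static dataset splits.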