Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning

Authors: Théo Vincent, Fabian Wahren, Jan Peters, Boris Belousov, Carlo D'Eramo

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that AdaQN is theoretically sound and empirically validate it in MuJoCo control problems and Atari 2600 games, showing benefits in sample-efficiency, overall performance, robustness to stochasticity and training stability. |
| Researcher Affiliation | Collaboration | 1 DFKI GmbH, SAIROL; 2 Department of Computer Science, TU Darmstadt; 3 Hessian.ai, TU Darmstadt; 4 Center for AI and Data Science, University of Würzburg |
| Pseudocode | Yes | Algorithm 1 Adaptive Deep Q-Network (AdaDQN). Modifications to DQN are marked in purple. [...] Algorithm 2 Adaptive Soft Actor-Critic (AdaSAC). Modifications to SAC are marked in purple. |
| Open Source Code | Yes | Our code is available at https://github.com/theovincent/AdaDQN. [...] The code is available in the supplementary material and will be made open source upon acceptance. |
| Open Datasets | Yes | We demonstrate that AdaQN is theoretically sound and empirically validate it in MuJoCo control problems and Atari 2600 games, showing benefits in sample-efficiency, overall performance, robustness to stochasticity and training stability. [...] We use 20 seeds for Lunar Lander (Brockman et al., 2016), 9 seeds for MuJoCo (Todorov et al., 2012), and 5 seeds for Atari (Bellemare et al., 2013). All environments are generated from the Gymnasium library (Brockman et al., 2016). |
| Dataset Splits | No | The paper does not provide traditional training/validation/test dataset splits. It operates in reinforcement learning environments (e.g., MuJoCo, Atari games), where data is generated through interaction and performance is evaluated in the environments themselves rather than on pre-split static datasets. It mentions using multiple seeds per experiment and a frame budget for training, but these are not dataset splits. |
| Hardware Specification | No | The paper mentions the 'computations of our experiments' and discusses an 'additional 300Mb from the VRAM' and '200Mb more from the VRAM' in the context of memory requirements, but it does not specify any GPU models, CPU types, or other details of the machines used to run the experiments. |
| Software Dependencies | No | The code is based on the Stable Baselines implementation (Raffin et al., 2021) and Dopamine RL (Castro et al., 2018). While these are software frameworks, no version numbers are provided for them or for other core libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | Appendix B describes all the hyperparameters used for the experiments, along with the environment setups and evaluation protocols. [...] Table 1: Summary of all fixed hyperparameters used for the Lunar Lander experiments (see Section 5.1). [...] Table 2: Summary of all fixed hyperparameters for the MuJoCo experiments (see Section 5.2). [...] Table 3: Summary of all fixed hyperparameters used for the Atari experiments with a finite hyperparameter space (see Section 5.3). [...] Table 4: Summary of all hyperparameters used for the Atari experiments with an infinite hyperparameter space (see Section 5.4). |
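The pseudocode row refers to Algorithm 1 (AdaDQN), which modifies DQN by selecting the next target network on the fly from several online networks. The following minimal sketch illustrates that selection loop under loudly stated assumptions: the networks are simulated by per-network running TD losses, and the selection rule used here (pick the online network with the lowest accumulated loss as the next target) is one plausible reading for illustration, not necessarily the paper's exact criterion, which is defined in Algorithm 1.

```python
import random

def select_target(running_losses):
    """Index of the online network chosen to become the next target.

    Assumed criterion (illustrative only): lowest accumulated TD loss.
    """
    return min(range(len(running_losses)), key=running_losses.__getitem__)

def ada_dqn_sketch(num_networks=3, num_target_updates=5, steps=100, seed=0):
    """Simulate the selection loop; returns the chosen index per target update.

    All quantities here are simulated stand-ins, not real Q-network training.
    """
    rng = random.Random(seed)
    chosen_history = []
    for _ in range(num_target_updates):
        # Stand-in for training each online network between target updates:
        # each network accumulates a simulated TD loss over `steps` steps.
        losses = [sum(rng.random() for _ in range(steps))
                  for _ in range(num_networks)]
        chosen_history.append(select_target(losses))
    return chosen_history
```

The point of the sketch is only the control flow: several candidate networks train in parallel, and the shared target is re-selected among them at every target update.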
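The open-datasets row quotes a multi-seed evaluation protocol (20 seeds for Lunar Lander, 9 for MuJoCo, 5 for Atari). A hypothetical sketch of such aggregation, with `run_experiment` as a placeholder for a full training run and a fabricated score distribution standing in for real results:

```python
import random
import statistics

def run_experiment(seed):
    """Placeholder for one seeded training run; returns a final score.

    The score distribution is made up for illustration.
    """
    rng = random.Random(seed)
    return rng.gauss(200.0, 30.0)

def aggregate_over_seeds(num_seeds):
    """Mean and standard deviation of final scores across seeds."""
    scores = [run_experiment(seed) for seed in range(num_seeds)]
    return statistics.mean(scores), statistics.stdev(scores)
```

Seeding each run explicitly, as above, is what makes per-seed results reproducible and the reported aggregates comparable across methods.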