Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning
Authors: Théo Vincent, Daniel Palenicek, Boris Belousov, Jan Peters, Carlo D'Eramo
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the advantages of i-QN in Atari 2600 games and MuJoCo continuous control problems. Our code is publicly available at https://github.com/theovincent/i-DQN and the trained models are uploaded at https://huggingface.co/TheoVincent/Atari_i-QN . |
| Researcher Affiliation | Academia | Théo Vincent (DFKI, SAIROL Team & TU Darmstadt); Daniel Palenicek (TU Darmstadt & Hessian.ai); Boris Belousov (DFKI, SAIROL Team); Jan Peters (DFKI, SAIROL Team & TU Darmstadt & Hessian.ai); Carlo D'Eramo (University of Würzburg & TU Darmstadt & Hessian.ai) |
| Pseudocode | Yes | Algorithm 1 Iterated Deep Q-Network (i-DQN). Modifications to DQN are marked in purple. [...] Algorithm 2 Iterated Soft Actor-Critic (i-SAC). Modifications to SAC are marked in purple. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/theovincent/i-DQN and the trained models are uploaded at https://huggingface.co/TheoVincent/Atari_i-QN . |
| Open Datasets | Yes | We empirically demonstrate the advantages of i-QN in Atari 2600 games and MuJoCo continuous control problems. |
| Dataset Splits | No | The paper does not provide explicit training/validation/test splits. It describes data collection for the car-on-hill environment (50,000 samples and the batch sizes used), but not how those samples are partitioned in a traditional supervised-learning sense. For Atari and MuJoCo, learning proceeds by continuous interaction with the environment via a replay buffer, so fixed dataset splits do not apply. |
| Hardware Specification | Yes | Computations are made on an NVIDIA GeForce RTX 4090 Ti. |
| Software Dependencies | No | The paper mentions software like JAX and the Adam optimizer, but does not provide specific version numbers for these or other software libraries. |
| Experiment Setup | Yes | Table 3: Summary of all hyperparameters used for the Atari experiments. We note Conv_d a×b C a 2D convolutional layer with C filters of size a×b and stride d, and FC E a fully connected layer with E neurons. [...] Table 4: Summary of all hyperparameters used for the MuJoCo experiments. We note FC E a fully connected layer with E neurons. |
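The core idea referenced in the Pseudocode row (Algorithm 1, i-DQN) is learning several consecutive Bellman iterations at once: the regression target for the k-th Q-estimate is the one-step Bellman backup of the frozen (target) copy of the (k-1)-th estimate. A minimal tabular sketch of that target computation is below; the function name, argument layout, and tabular representation are illustrative assumptions, not the authors' JAX implementation.

```python
import numpy as np

def iterated_dqn_targets(q_tgts, rewards, next_states, dones, gamma=0.99):
    """Hedged sketch of i-DQN-style target computation (tabular case).

    q_tgts: list of K frozen target Q-tables, each of shape (S, A);
    the target for online estimate k bootstraps from target table k-1,
    so K consecutive Bellman iterations are regressed in parallel.
    Returns a list of K per-transition target vectors.
    """
    targets = []
    for k in range(len(q_tgts)):
        q_prev = q_tgts[k]                          # frozen copy of Q_{k-1}
        max_next = q_prev[next_states].max(axis=1)  # greedy bootstrap value
        targets.append(rewards + gamma * (1.0 - dones) * max_next)
    return targets
```

In the hypothetical loss, each online table/network k would then regress its Q(s, a) toward `targets[k]`, and the frozen copies are periodically refreshed from the online estimates, analogous to the target-network update in plain DQN.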