Universal Approximation Theorem of Deep Q-Networks
Authors: Qian Qi
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To complement our theoretical analysis and investigate the practical behavior of DQNs with residual blocks in a continuous-time setting (approximated via discretization), we conduct numerical experiments on a simplified control task. |
| Researcher Affiliation | Academia | 1School of Computer Science, Peking University, Beijing, China. Correspondence to: Qian Qi <EMAIL>. |
| Pseudocode | No | The paper describes the Q-learning algorithm mathematically and discusses its update rules (Equation 22), but it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions 'The DQN agent described in the implementation code.' and 'assuming figure generated by the code' in the Numerical Experiments section, implying code was used. However, it does not provide any explicit statement about releasing the code, a repository link, or mention of code in supplementary materials. |
| Open Datasets | No | The paper describes a custom-defined 1D continuous control environment governed by a stochastic differential equation (Equation 44) for its numerical experiments. It does not use or provide access to any external, publicly available datasets. |
| Dataset Splits | No | The paper uses a simulated environment for its experiments, generating data through interaction. The concept of training/test/validation splits for a fixed dataset is not applicable here, and no such splits are mentioned for the simulated data. |
| Hardware Specification | No | The paper mentions 'The numerical experiments demonstrate that the DQN architecture incorporating residual blocks can be effectively trained...' but does not specify any hardware details like GPU or CPU models used for these experiments. |
| Software Dependencies | No | The paper mentions 'The DQN agent described in the implementation code.' in the Numerical Experiments section. However, it does not specify any particular software libraries, frameworks, or their version numbers used for implementation. |
| Experiment Setup | Yes | Key hyperparameters for the baseline configuration (Baseline) are: learning rate LR = 5×10⁻⁴, discount factor γ = 0.99 (GAMMA), replay buffer size = 10,000 (BUFFER_SIZE), batch size = 64 (BATCH_SIZE), and target network update frequency = 100 steps (TARGET_UPDATE). Epsilon-greedy exploration is used with ϵ decaying exponentially from 1.0 (EPS_START) to 0.01 (EPS_END) with a decay factor of 0.99 per episode (EPS_DECAY_FACTOR). Training runs for 300 episodes (N_EPISODES). For reproducibility and fair comparison, a fixed random seed (SEED=42) is used across all runs. |
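The baseline hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. The constant names follow those reported from the paper; the dict layout and the decay-schedule helper below are illustrative assumptions, not the authors' actual code:

```python
# Baseline hyperparameters as quoted in the paper's numerical
# experiments; grouping them into a dict is illustrative.
CONFIG = {
    "LR": 5e-4,                # learning rate
    "GAMMA": 0.99,             # discount factor
    "BUFFER_SIZE": 10_000,     # replay buffer capacity
    "BATCH_SIZE": 64,
    "TARGET_UPDATE": 100,      # target-network sync interval (steps)
    "EPS_START": 1.0,
    "EPS_END": 0.01,
    "EPS_DECAY_FACTOR": 0.99,  # multiplicative decay per episode
    "N_EPISODES": 300,
    "SEED": 42,                # fixed across all runs
}

def epsilon(episode: int, cfg: dict = CONFIG) -> float:
    """Exponential per-episode epsilon decay, floored at EPS_END."""
    return max(cfg["EPS_END"],
               cfg["EPS_START"] * cfg["EPS_DECAY_FACTOR"] ** episode)
```

With this schedule, ϵ starts at 1.0 and is multiplied by 0.99 each episode until it reaches the 0.01 floor, matching the exponential decay described in the setup.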
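The Pseudocode row notes that the paper states its Q-learning update only mathematically (Equation 22). Assuming the paper follows the standard DQN formulation, the temporal-difference target that such an update rule computes can be sketched as:

```python
import numpy as np

def dqn_td_target(reward: float, next_q_values: np.ndarray,
                  gamma: float = 0.99, done: bool = False) -> float:
    """Standard DQN temporal-difference target:
        y = r + gamma * max_a' Q_target(s', a'),
    with bootstrapping suppressed on terminal transitions.
    This is the textbook form, offered as an illustration of the
    update the paper describes in equations rather than pseudocode.
    """
    return reward + (0.0 if done else gamma * float(np.max(next_q_values)))

# Toy example: two actions available in the next state.
y = dqn_td_target(reward=1.0, next_q_values=np.array([0.5, 2.0]), gamma=0.99)
# y = 1.0 + 0.99 * 2.0 = 2.98
```

In a full training loop this target would be regressed against Q(s, a) on minibatches drawn from the replay buffer, with the target network refreshed every TARGET_UPDATE steps.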