Universal Approximation Theorem of Deep Q-Networks
Authors: Qian Qi
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To complement our theoretical analysis and investigate the practical behavior of DQNs with residual blocks in a continuous-time setting (approximated via discretization), we conduct numerical experiments on a simplified control task. |
| Researcher Affiliation | Academia | 1School of Computer Science, Peking University, Beijing, China. Correspondence to: Qian Qi <EMAIL>. |
| Pseudocode | No | The paper describes the Q-learning algorithm mathematically and discusses its update rules (Equation 22), but it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions 'The DQN agent described in the implementation code.' and 'assuming figure generated by the code' in the Numerical Experiments section, implying code was used. However, it does not provide any explicit statement about releasing the code, a repository link, or mention of code in supplementary materials. |
| Open Datasets | No | The paper describes a custom-defined 1D continuous control environment governed by a stochastic differential equation (Equation 44) for its numerical experiments. It does not use or provide access to any external, publicly available datasets. |
| Dataset Splits | No | The paper uses a simulated environment for its experiments, generating data through interaction. The concept of training/test/validation splits for a fixed dataset is not applicable here, and no such splits are mentioned for the simulated data. |
| Hardware Specification | No | The paper mentions 'The numerical experiments demonstrate that the DQN architecture incorporating residual blocks can be effectively trained...' but does not specify any hardware details like GPU or CPU models used for these experiments. |
| Software Dependencies | No | The paper mentions 'The DQN agent described in the implementation code.' in the Numerical Experiments section. However, it does not specify any particular software libraries, frameworks, or their version numbers used for implementation. |
| Experiment Setup | Yes | Key hyperparameters for the baseline configuration (Baseline) are: learning rate LR = 5×10⁻⁴, discount factor γ = 0.99 (GAMMA), replay buffer size = 10,000 (BUFFER_SIZE), batch size = 64 (BATCH_SIZE), and target network update frequency = 100 steps (TARGET_UPDATE). Epsilon-greedy exploration is used with ϵ decaying exponentially from 1.0 (EPS_START) to 0.01 (EPS_END) with a decay factor of 0.99 per episode (EPS_DECAY_FACTOR). Training runs for 300 episodes (N_EPISODES). For reproducibility and fair comparison, a fixed random seed (SEED=42) is used across all runs. |
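The baseline hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. The constant names follow those reported from the paper; the dict layout and the decay-schedule helper below are illustrative assumptions, not the authors' actual code:

```python
# Baseline hyperparameters as quoted in the paper's numerical
# experiments; grouping them into a dict is illustrative.
CONFIG = {
    "LR": 5e-4,                # learning rate
    "GAMMA": 0.99,             # discount factor
    "BUFFER_SIZE": 10_000,     # replay buffer capacity
    "BATCH_SIZE": 64,
    "TARGET_UPDATE": 100,      # target-network sync interval (steps)
    "EPS_START": 1.0,
    "EPS_END": 0.01,
    "EPS_DECAY_FACTOR": 0.99,  # multiplicative decay per episode
    "N_EPISODES": 300,
    "SEED": 42,                # fixed across all runs
}

def epsilon(episode: int, cfg: dict = CONFIG) -> float:
    """Exponential per-episode epsilon decay, floored at EPS_END."""
    return max(cfg["EPS_END"],
               cfg["EPS_START"] * cfg["EPS_DECAY_FACTOR"] ** episode)
```

With this schedule, ϵ starts at 1.0 and is multiplied by 0.99 each episode until it reaches the 0.01 floor, matching the exponential decay described in the setup.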
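The Pseudocode row notes that the paper states its Q-learning update only mathematically (Equation 22). Assuming the paper follows the standard DQN formulation, the temporal-difference target that such an update rule computes can be sketched as:

```python
import numpy as np

def dqn_td_target(reward: float, next_q_values: np.ndarray,
                  gamma: float = 0.99, done: bool = False) -> float:
    """Standard DQN temporal-difference target:
        y = r + gamma * max_a' Q_target(s', a'),
    with bootstrapping suppressed on terminal transitions.
    This is the textbook form, offered as an illustration of the
    update the paper describes in equations rather than pseudocode.
    """
    return reward + (0.0 if done else gamma * float(np.max(next_q_values)))

# Toy example: two actions available in the next state.
y = dqn_td_target(reward=1.0, next_q_values=np.array([0.5, 2.0]), gamma=0.99)
# y = 1.0 + 0.99 * 2.0 = 2.98
```

In a full training loop this target would be regressed against Q(s, a) on minibatches drawn from the replay buffer, with the target network refreshed every TARGET_UPDATE steps.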