Bridging the Gap Between Target Networks and Functional Regularization

Authors: Alexandre Piché, Valentin Thomas, Joseph Marino, Rafael Pardinas, Gian Maria Marconi, Christopher Pal, Mohammad Emtiyaz Khan

TMLR 2023

Reproducibility Assessment (Variable, Result, LLM Response)
Research Type: Experimental. In our experimental study, we explored a variety of environments, including the two-state MDP (Tsitsiklis & Van Roy, 1996), the Four Rooms environment (Sutton et al., 1999), and the Atari suite (Bellemare et al., 2013), to assess the regularization introduced by TN and FR in terms of performance, accuracy, and divergence. Our findings show that Functional Regularization, even without tuning the regularization weight, can be used as a drop-in replacement for Target Networks with no loss of performance, and can even improve performance. Additionally, tuning both the regularization weight and the network update period in FR can outperform tuning the network update period alone for TN. (Section 4: Experiments)
Researcher Affiliation: Collaboration.
Alexandre Piché (EMAIL): ServiceNow Research; Mila, Université de Montréal
Valentin Thomas (EMAIL): Mila, Université de Montréal
Rafael Pardinas (EMAIL): ServiceNow Research
Joseph Marino (EMAIL): DeepMind, London
Gian Maria Marconi (EMAIL): RIKEN Center for Advanced Intelligence Project
Christopher Pal (EMAIL): Mila, Polytechnique Montréal; Canada CIFAR AI Chair
Mohammad Emtiyaz Khan (EMAIL): RIKEN Center for Advanced Intelligence Project
Pseudocode: Yes. Algorithm 1: Deep Q-Network (DQN) Algorithm with TN or FR.
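The TN and FR loss variants referenced in Algorithm 1 can be illustrated with a minimal scalar sketch. This is an assumption-laden illustration, not the paper's implementation: the function names are invented, the values are single scalars rather than network outputs, and the placement of the FR penalty on the next-state value follows one common reading of functional regularization.

```python
def tn_loss(q_sa, q_next_target, r, gamma):
    """Squared TD error with a Target Network (TN): the bootstrap
    target uses the frozen lagging network's next-state value, so the
    target is a constant with respect to the online parameters."""
    y = r + gamma * q_next_target
    return (y - q_sa) ** 2

def fr_loss(q_sa, q_next_online, q_next_lag, r, gamma, kappa):
    """Squared TD error with Functional Regularization (FR): the
    bootstrap uses the online network, and a penalty of weight kappa
    pulls its next-state value toward the lagging network's output."""
    td = (r + gamma * q_next_online - q_sa) ** 2
    reg = kappa * (q_next_online - q_next_lag) ** 2
    return td + reg
```

When the online and lagging networks agree on the next-state value, the two losses coincide; the weight kappa then controls how far the online function may drift from the lagging copy between its periodic updates, which is the sense in which FR can act as a drop-in replacement for TN.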
Open Source Code: Yes. The code is available at https://github.com/AlexPiche/fr-tmlr/.
Open Datasets: Yes. In our experimental study, we explored a variety of environments, including the two-state MDP (Tsitsiklis & Van Roy, 1996), the Four Rooms environment (Sutton et al., 1999), and the Atari suite (Bellemare et al., 2013).
Dataset Splits: No. The paper describes generating data through interaction with environments (e.g., 'collect 10000 environment transitions', 'run each algorithm for 10M steps') but does not specify explicit training/validation/test splits for a static dataset. RL typically generates data on the fly rather than using pre-split datasets.
Hardware Specification: No. The paper mentions 'approximately 60,000 GPU hours' and 'a total of 30,000 GPU hours' but does not specify any particular GPU models, CPU models, or other hardware used for the experiments.
Software Dependencies: No. The paper mentions using the CleanRL library (Huang et al., 2022), the rliable library, and the Adam optimizer (Kingma & Ba, 2014), but does not provide version numbers for these software components or for the programming language used.
Experiment Setup: Yes.
Table 1: Four Rooms Hyper-parameters
learning rate: 1e-4
optimizer: Adam (Kingma & Ba, 2014)
discount factor γ: 0.99
DNN layers: [128, 128, 4]
grid dimension: 11 × 11
Section 4.4.1 Experimental Set-Up: 'For each environment, we decay the probability of a random action from 1 to ϵ and the discount factor γ used to train the Q-value. We report the results for different γ and ϵ since they can both increase instability and result in divergence. Unless specified otherwise, we use the default hyper-parameters from the CleanRL library (Huang et al., 2022).'
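The Four Rooms hyper-parameters listed in Table 1 can be collected into a small configuration sketch. The dictionary keys below are illustrative names chosen for readability; they do not correspond to CleanRL's actual flag or config names.

```python
# Four Rooms hyper-parameters from Table 1 of the paper.
# Key names are illustrative, not CleanRL's CLI flags.
four_rooms_config = {
    "learning_rate": 1e-4,
    "optimizer": "adam",          # Kingma & Ba, 2014
    "gamma": 0.99,                # discount factor
    "dnn_layers": [128, 128, 4],  # two hidden layers, 4 action values out
    "grid_dimension": (11, 11),
}
```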