One-Step Distributional Reinforcement Learning
Authors: Mastane Achab, Reda Alami, Yasser Abdelaziz Dahou Djilali, Kirill Fedyanin, Eric Moulines
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, titled 'Numerical experiments', the paper describes experiments in both tabular and Atari environments. Figures 2, 3, 4, and 5 present performance comparisons (e.g., 'Episode score', 'Learning dynamics') against baselines such as C51 and DQN across several environments (Beamrider, Breakout, Pong), indicating empirical validation of the proposed methods. |
| Researcher Affiliation | Collaboration | Mastane Achab, Reda Alami, Yasser Abdelaziz Dahou Djilali, Kirill Fedyanin are affiliated with 'Technology Innovation Institute, 9639 Masdar City, Abu Dhabi, United Arab Emirates', which appears to be a government or corporate research institution (indicated by '.ae' email domain for Mastane Achab). Eric Moulines is affiliated with 'Ecole polytechnique, Palaiseau, France', which is an academic institution. The presence of both types of affiliations indicates a collaborative effort. |
| Pseudocode | Yes | The paper includes 'Algorithm 1 Tabular one-step categorical Distr RL' and 'Algorithm 2 OS-C51 (single update)', which are clearly labeled algorithm blocks detailing the proposed methods. |
| Open Source Code | Yes | The paper explicitly links to its source code in footnote 4: https://github.com/mastane/cleanrl. |
| Open Datasets | Yes | The paper references well-known public environments for its experiments: the 'Frozen Lake environment from OpenAI Gym (Brockman et al., 2016)' and 'Atari games (Bellemare et al., 2013)'. |
| Dataset Splits | No | The paper describes experiments on the Frozen Lake environment and Atari games, which are reinforcement learning environments where agents interact continuously rather than using pre-defined train/test/validation dataset splits. For Frozen Lake, it mentions 'observing empirical transitions' and for Atari games, 'transitions are collected in a replay memory buffer of size 1000000'. It does not specify fixed dataset splits in the conventional sense for supervised learning. |
| Hardware Specification | No | The paper mentions training a 'deep neural network with 3 convolutional layers followed by 2 fully connected layers with ReLU activation functions' for Atari games but does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for these experiments. |
| Software Dependencies | No | The paper states, 'we implement the OS-C51 agent on top of the popular CleanRL codebase (Huang et al., 2022)'. While CleanRL is mentioned as the codebase, no version number for CleanRL itself or for any other key software library (e.g., PyTorch, TensorFlow) is provided. |
| Experiment Setup | Yes | For the tabular setting (Frozen Lake), the paper specifies a 'constant stepsize α = 0.6 and ε-greedy exploration with ε exponentially decaying from 1 to 0.25'. For Atari games, it mentions using the 'Adam optimizer (Kingma & Ba, 2014) with learning rate equal to 0.0001 and batch size of 32' for DQN, and 'Adam optimizer with learning rate set to 0.00025 and batch size equal to 32' for C51 and OS-C51. It also states 'transitions are collected in a replay memory buffer of size 1000000' and 'results are averaged over 5 seeds'. |
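The reported hyperparameters can be collected into a single configuration sketch. Everything below mirrors values quoted from the paper; the exact form of the exponential ε-decay schedule is an assumption, since the paper states only the endpoints (from 1 to 0.25).

```python
# Sketch of the experiment setup as reported in the paper.
# All numeric values come from the paper's Section 5; the decay
# formula in epsilon_at() is an assumed schedule matching only
# the stated endpoints, not the paper's exact implementation.

TABULAR_CONFIG = {
    "stepsize_alpha": 0.6,   # constant stepsize for Frozen Lake
    "epsilon_start": 1.0,    # initial epsilon-greedy exploration rate
    "epsilon_end": 0.25,     # final exploration rate after decay
}

ATARI_CONFIG = {
    # Per-agent optimizer settings quoted from the paper.
    "dqn":    {"optimizer": "Adam", "learning_rate": 1e-4,   "batch_size": 32},
    "c51":    {"optimizer": "Adam", "learning_rate": 2.5e-4, "batch_size": 32},
    "os_c51": {"optimizer": "Adam", "learning_rate": 2.5e-4, "batch_size": 32},
    "replay_buffer_size": 1_000_000,  # replay memory buffer size
    "num_seeds": 5,                   # results averaged over 5 seeds
}

def epsilon_at(step: int, total_steps: int) -> float:
    """Exponentially decay epsilon from epsilon_start to epsilon_end.

    Assumed schedule: eps(t) = start * (end/start) ** (t / total_steps),
    which reproduces the stated endpoints (1 at t=0, 0.25 at t=total).
    """
    start = TABULAR_CONFIG["epsilon_start"]
    end = TABULAR_CONFIG["epsilon_end"]
    return start * (end / start) ** (step / total_steps)
```

A reproduction attempt would still need to pin library versions and hardware, which (per the rows above) the paper does not specify.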