One-Step Distributional Reinforcement Learning
Authors: Mastane Achab, Reda Alami, Yasser Abdelaziz Dahou Djilali, Kirill Fedyanin, Eric Moulines
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, titled 'Numerical experiments', the paper describes experiments in both tabular and Atari environments. Figures 2, 3, 4, and 5 present performance comparisons (e.g., 'Episode score', 'Learning dynamics') against baselines such as C51 and DQN across several environments (Beamrider, Breakout, Pong), indicating empirical validation of the proposed methods. |
| Researcher Affiliation | Collaboration | Mastane Achab, Reda Alami, Yasser Abdelaziz Dahou Djilali, Kirill Fedyanin are affiliated with 'Technology Innovation Institute, 9639 Masdar City, Abu Dhabi, United Arab Emirates', which appears to be a government or corporate research institution (indicated by '.ae' email domain for Mastane Achab). Eric Moulines is affiliated with 'Ecole polytechnique, Palaiseau, France', which is an academic institution. The presence of both types of affiliations indicates a collaborative effort. |
| Pseudocode | Yes | The paper includes 'Algorithm 1 Tabular one-step categorical Distr RL' and 'Algorithm 2 OS-C51 (single update)', which are clearly labeled algorithm blocks detailing the proposed methods. |
| Open Source Code | Yes | The paper explicitly links to its source code in footnote 4: https://github.com/mastane/cleanrl. |
| Open Datasets | Yes | The paper references well-known public environments for its experiments: the 'Frozen Lake environment from OpenAI Gym (Brockman et al., 2016)' and 'Atari games (Bellemare et al., 2013)'. |
| Dataset Splits | No | The paper describes experiments on the Frozen Lake environment and Atari games, which are reinforcement learning environments where agents interact continuously rather than using pre-defined train/test/validation dataset splits. For Frozen Lake, it mentions 'observing empirical transitions' and for Atari games, 'transitions are collected in a replay memory buffer of size 1000000'. It does not specify fixed dataset splits in the conventional sense for supervised learning. |
| Hardware Specification | No | The paper mentions training a 'deep neural network with 3 convolutional layers followed by 2 fully connected layers with ReLU activation functions' for Atari games but does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for these experiments. |
| Software Dependencies | No | The paper states, 'we implement the OS-C51 agent on top of the popular CleanRL codebase (Huang et al., 2022)'. While CleanRL is mentioned as the codebase, no version number for CleanRL itself or for any other key software library (e.g., PyTorch, TensorFlow) is provided. |
| Experiment Setup | Yes | For the tabular setting (Frozen Lake), the paper specifies a 'constant stepsize α = 0.6 and ε-greedy exploration with ε exponentially decaying from 1 to 0.25'. For Atari games, it mentions using the 'Adam optimizer (Kingma & Ba, 2014) with learning rate equal to 0.0001 and batch size of 32' for DQN, and 'Adam optimizer with learning rate set to 0.00025 and batch size equal to 32' for C51 and OS-C51. It also states 'transitions are collected in a replay memory buffer of size 1000000' and 'results are averaged over 5 seeds'. |
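The reported hyperparameters can be collected into a single configuration sketch. Everything below mirrors values quoted from the paper; the exact form of the exponential ε-decay schedule is an assumption, since the paper states only the endpoints (from 1 to 0.25).

```python
# Sketch of the experiment setup as reported in the paper.
# All numeric values come from the paper's Section 5; the decay
# formula in epsilon_at() is an assumed schedule matching only
# the stated endpoints, not the paper's exact implementation.

TABULAR_CONFIG = {
    "stepsize_alpha": 0.6,   # constant stepsize for Frozen Lake
    "epsilon_start": 1.0,    # initial epsilon-greedy exploration rate
    "epsilon_end": 0.25,     # final exploration rate after decay
}

ATARI_CONFIG = {
    # Per-agent optimizer settings quoted from the paper.
    "dqn":    {"optimizer": "Adam", "learning_rate": 1e-4,   "batch_size": 32},
    "c51":    {"optimizer": "Adam", "learning_rate": 2.5e-4, "batch_size": 32},
    "os_c51": {"optimizer": "Adam", "learning_rate": 2.5e-4, "batch_size": 32},
    "replay_buffer_size": 1_000_000,  # replay memory buffer size
    "num_seeds": 5,                   # results averaged over 5 seeds
}

def epsilon_at(step: int, total_steps: int) -> float:
    """Exponentially decay epsilon from epsilon_start to epsilon_end.

    Assumed schedule: eps(t) = start * (end/start) ** (t / total_steps),
    which reproduces the stated endpoints (1 at t=0, 0.25 at t=total).
    """
    start = TABULAR_CONFIG["epsilon_start"]
    end = TABULAR_CONFIG["epsilon_end"]
    return start * (end / start) ** (step / total_steps)
```

A reproduction attempt would still need to pin library versions and hardware, which (per the rows above) the paper does not specify.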