ChainerRL: A Deep Reinforcement Learning Library
Authors: Yasuhiro Fujita, Prabhat Nagarajan, Toshiki Kataoka, Takahiro Ishikawa
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To foster reproducible research, and for instructional purposes, ChainerRL provides scripts that closely replicate the original papers' experimental settings and reproduce published benchmark results for several algorithms. Lastly, ChainerRL offers a visualization tool that enables qualitative inspection of trained agents. The entire Section 3, 'Reproducibility,' details experiments and comparisons against published results on Atari and MuJoCo benchmarks, providing extensive tables (Tables 2 and 4) of performance metrics. |
| Researcher Affiliation | Collaboration | Yasuhiro Fujita, Prabhat Nagarajan, and Toshiki Kataoka (Preferred Networks, Tokyo, Japan) and Takahiro Ishikawa (The University of Tokyo, Tokyo, Japan). The affiliations include a company (Preferred Networks) and a university (The University of Tokyo), indicating a collaboration. |
| Pseudocode | Yes | Appendix D. Pseudocode: The following pseudocode depicts the simplicity of creating and training a Rainbow agent with ChainerRL. `import chainerrl as crl; import gym; q_func = crl.q_functions.DistributionalDuelingDQN(...)  # dueling; crl.links.to_factorized_noisy(q_func)  # noisy networks; per = crl.replay_buffers.PrioritizedReplayBuffer(num_step_return=3, ...)  # prioritized replay buffer with a 3-step reward; rainbow = crl.agents.CategoricalDoubleDQN(per, q_func, ...)  # create a Rainbow agent; num_envs = 5  # train in five environments; env = crl.envs.MultiprocessVectorEnv([gym.make('Breakout') for _ in range(num_envs)]); crl.experiments.train_agent_batch_with_evaluation(rainbow, env, steps=...)  # train the agent and collect evaluation statistics` |
| Open Source Code | Yes | The ChainerRL source code can be found on GitHub: https://github.com/chainer/chainerrl. |
| Open Datasets | Yes | For the Atari benchmark (Bellemare et al., 2013), we have successfully reproduced DQN, IQN, Rainbow, and A3C. For the OpenAI Gym MuJoCo benchmark tasks, we have successfully reproduced DDPG, TRPO, PPO, TD3, and SAC. These are well-known and publicly available benchmark environments. |
| Dataset Splits | Yes | Table 3: Evaluation protocols used for the Atari reproductions. Eval Frequency (timesteps) 250K Eval Phase (timesteps) 125K Eval Episode Length (time) 5 min / 30 min Eval Episode Policy ϵ = 0.05 / ϵ = 0.001 / ϵ = 0.0 / N/A Reporting Protocol re-eval / best-eval. These details specify how the environment interactions are structured for evaluation, analogous to dataset splits in RL. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, processor types, memory amounts) are mentioned for the experiments. The paper only refers to 'CPU and GPU training' generally. |
| Software Dependencies | No | The paper mentions 'Python and the Chainer deep learning framework' but does not provide specific version numbers for Python, Chainer, or other software dependencies like OpenAI Gym. |
| Experiment Setup | Yes | Table 3 provides detailed evaluation protocols for Atari reproductions, including 'Eval Frequency (timesteps) 250K', 'Eval Phase (timesteps) 125K', 'Eval Episode Length (time) 5 min', and 'Reporting Protocol re-eval'. The caption for Table 4 states: 'For DDPG and TD3, each ChainerRL score represents the maximum evaluation score during 1M-step training, averaged over 10 trials with different random seeds, where each evaluation phase of ten episodes is run after every 5000 steps. For PPO and TRPO, each ChainerRL score represents the final evaluation of 100 episodes after 2M-step training, averaged over 10 trials with different random seeds. For SAC, each ChainerRL score reports the final evaluation of 10 episodes after training for 1M (Hopper-v2), 3M (HalfCheetah-v2, Walker2d-v2, and Ant-v2), or 10M (Humanoid-v2) steps, averaged over 10 trials with different random seeds.' |
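The evaluation protocols quoted above imply a fixed evaluation budget per trial. As a minimal sketch (the helper function below is our own illustration, not part of ChainerRL's API), the DDPG/TD3 protocol from Table 4 works out as follows:

```python
def num_eval_phases(total_steps: int, eval_interval: int) -> int:
    """Number of evaluation phases when evaluating every `eval_interval` steps."""
    return total_steps // eval_interval

# DDPG/TD3 protocol from Table 4: 1M training steps, an evaluation phase
# of ten episodes after every 5000 steps.
phases = num_eval_phases(1_000_000, 5_000)
episodes_per_phase = 10
total_eval_episodes = phases * episodes_per_phase

print(phases)               # 200 evaluation phases per trial
print(total_eval_episodes)  # 2000 evaluation episodes per trial
```

Across the 10 trials with different random seeds, this protocol therefore reports the maximum over 200 evaluation scores per trial, each itself an average over ten episodes.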