ChainerRL: A Deep Reinforcement Learning Library

Authors: Yasuhiro Fujita, Prabhat Nagarajan, Toshiki Kataoka, Takahiro Ishikawa

JMLR 2021

Reproducibility Variable Result LLM Response
Research Type Experimental To foster reproducible research, and for instructional purposes, ChainerRL provides scripts that closely replicate the original papers' experimental settings and reproduce published benchmark results for several algorithms. Lastly, ChainerRL offers a visualization tool that enables the qualitative inspection of trained agents. The entire Section 3, 'Reproducibility,' details experiments and comparisons against published results on Atari and MuJoCo benchmarks, providing extensive tables (Tables 2 and 4) of performance metrics.
Researcher Affiliation Collaboration Yasuhiro Fujita (EMAIL), Prabhat Nagarajan (EMAIL), Toshiki Kataoka (EMAIL), Preferred Networks, Tokyo, Japan; and Takahiro Ishikawa (EMAIL), The University of Tokyo, Tokyo, Japan. The affiliations include a company (Preferred Networks) and a university (The University of Tokyo), indicating a collaboration.
Pseudocode Yes Appendix D. Pseudocode: The following pseudocode depicts the simplicity of creating and training a Rainbow agent with ChainerRL.

    import chainerrl as crl
    import gym

    q_func = crl.q_functions.DistributionalDuelingDQN(...)  # dueling
    crl.links.to_factorized_noisy(q_func)  # noisy networks
    # Prioritized Experience Replay Buffer with a 3-step reward
    per = crl.replay_buffers.PrioritizedReplayBuffer(num_step_return=3, ...)
    # Create a rainbow agent
    rainbow = crl.agents.CategoricalDoubleDQN(per, q_func, ...)
    num_envs = 5  # Train in five environments
    env = crl.envs.MultiprocessVectorEnv(
        [gym.make('Breakout') for _ in range(num_envs)])

    # Train the agent and collect evaluation statistics
    crl.experiments.train_agent_batch_with_evaluation(rainbow, env, steps=...)
Open Source Code Yes The ChainerRL source code can be found on GitHub: https://github.com/chainer/chainerrl.
Open Datasets Yes For the Atari benchmark (Bellemare et al., 2013), we have successfully reproduced DQN, IQN, Rainbow, and A3C. For the OpenAI Gym MuJoCo benchmark tasks, we have successfully reproduced DDPG, TRPO, PPO, TD3, and SAC. These are well-known and publicly available benchmark environments.
Dataset Splits Yes Table 3: Evaluation protocols used for the Atari reproductions. Eval Frequency (timesteps): 250K; Eval Phase (timesteps): 125K; Eval Episode Length (time): 5 min / 30 min; Eval Episode Policy: ϵ = 0.05 / ϵ = 0.001 / ϵ = 0.0 / N/A; Reporting Protocol: re-eval / best-eval. These details specify how environment interactions are structured for evaluation, analogous to dataset splits in RL.
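The Table 3 protocol amounts to interleaving fixed-length training intervals with fixed-length evaluation phases. A minimal sketch in plain Python, assuming hypothetical `train_steps` and `evaluate` callables (this is illustrative scheduling logic, not the ChainerRL API):

```python
# Sketch of the Atari evaluation schedule from Table 3: every
# EVAL_FREQUENCY training timesteps, run an evaluation phase of
# EVAL_PHASE timesteps and record its score.
EVAL_FREQUENCY = 250_000  # timesteps of training between evaluations
EVAL_PHASE = 125_000      # timesteps per evaluation phase
TOTAL_STEPS = 1_000_000   # total training budget (illustrative value)

def run_protocol(train_steps, evaluate):
    """Interleave training and evaluation; return per-phase scores.

    train_steps(n): advance training by n timesteps (hypothetical).
    evaluate(n): run an evaluation phase of n timesteps, e.g. with an
    epsilon = 0.05 policy, and return its score (hypothetical).
    """
    scores = []
    for _ in range(0, TOTAL_STEPS, EVAL_FREQUENCY):
        train_steps(EVAL_FREQUENCY)
        scores.append(evaluate(EVAL_PHASE))
    return scores
```

Under the "re-eval" reporting protocol, the agent snapshot with the best score in this list would then be re-evaluated to produce the reported number.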
Hardware Specification No No specific hardware details (GPU/CPU models, processor types, memory amounts) are mentioned for the experiments. The paper only refers to 'CPU and GPU training' generally.
Software Dependencies No The paper mentions 'Python and the Chainer deep learning framework' but does not provide specific version numbers for Python, Chainer, or other software dependencies like OpenAI Gym.
Experiment Setup Yes Table 3 provides detailed evaluation protocols for Atari reproductions, including 'Eval Frequency (timesteps) 250K', 'Eval Phase (timesteps) 125K', 'Eval Episode Length (time) 5 min', and 'Reporting Protocol re-eval'. The caption for Table 4 states: 'For DDPG and TD3, each ChainerRL score represents the maximum evaluation score during 1M-step training, averaged over 10 trials with different random seeds, where each evaluation phase of ten episodes is run after every 5000 steps. For PPO and TRPO, each ChainerRL score represents the final evaluation of 100 episodes after 2M-step training, averaged over 10 trials with different random seeds. For SAC, each ChainerRL score reports the final evaluation of 10 episodes after training for 1M (Hopper-v2), 3M (HalfCheetah-v2, Walker2d-v2, and Ant-v2), or 10M (Humanoid-v2) steps, averaged over 10 trials with different random seeds.'
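The DDPG/TD3 reporting rule quoted above (per-trial maximum evaluation score, averaged over random seeds) can be sketched as a small aggregation. This is an illustrative reconstruction of the scoring rule, not code from the paper; the function name and input layout are assumptions:

```python
# Sketch of the Table 4 reporting protocol for DDPG/TD3: each trial's
# score is the maximum evaluation result seen during its training run,
# and the reported score averages these maxima over the trials.
def reported_score(eval_curves):
    """eval_curves: list of per-trial lists of evaluation returns,
    one entry per evaluation phase (e.g. every 5000 steps)."""
    per_trial_max = [max(curve) for curve in eval_curves]
    return sum(per_trial_max) / len(per_trial_max)
```

For PPO, TRPO, and SAC the rule differs: the final evaluation after training is averaged over trials, rather than the per-trial maximum.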