OptionZero: Planning with Learned Options

Authors: Po-Wei Huang, Pei-Chiun Peng, Hung Guei, Ti-Rong Wu

ICLR 2025

Reproducibility Assessment (each entry: Variable: Result, followed by the LLM response)
Research Type: Experimental
  LLM response: Empirical experiments conducted on 26 Atari games demonstrate that OptionZero outperforms MuZero, achieving a 131.58% improvement in mean human-normalized score.
Researcher Affiliation: Academia
  LLM response: 1. Institute of Information Science, Academia Sinica, Taiwan; 2. Department of Computer Science, National Yang Ming Chiao Tung University, Taiwan
Pseudocode: No
  LLM response: The paper describes the modifications to MCTS in Section 4.2 using prose and mathematical equations but does not include a distinct, labeled pseudocode or algorithm block.
Open Source Code: Yes
  LLM response: The source code, scripts for processing behavior analysis, and trained models are available at https://rlg.iis.sinica.edu.tw/papers/optionzero.
Open Datasets: Yes
  LLM response: We conduct experiments on Atari games, which are visually complex environments with relatively small frame differences between states, making them suitable for learning options.
Dataset Splits: No
  LLM response: The paper mentions training on 'Atari games' and using a 'self-play process [that] collects game trajectories' for training. It does not explicitly define traditional training, validation, and test splits for these games or for the GridWorld environment.
Hardware Specification: Yes
  LLM response: The experiments are conducted on machines with 24 CPU cores and four NVIDIA GTX 1080 Ti GPUs.
Software Dependencies: No
  LLM response: The paper states, 'Our OptionZero implementation, which is built upon a publicly available MuZero framework (Wu et al., 2025).' However, it does not specify version numbers for any software libraries or dependencies (e.g., Python, TensorFlow, or PyTorch versions).
Experiment Setup: Yes
  LLM response: Detailed experiment setups are provided in Appendix B. In this section, we describe the details for training OptionZero models used in the experiments. The experiments are conducted on machines with 24 CPU cores and four NVIDIA GTX 1080 Ti GPUs. For the training configurations, we generally follow those in MuZero; the hyperparameters are listed in Table 4.

  Table 4: Hyperparameters for training (where two values appear, separated by "/", the table reports one per training configuration):
    Optimizer: SGD
    Learning rate: 0.1
    Momentum: 0.9
    Weight decay: 0.0001
    Discount factor: 0.997
    Priority exponent (α): 1
    Priority correction (β): 0.4
    Bootstrap step (n-step return): 5
    MCTS simulations: 50
    Softmax temperature: 1
    Frame skip: 4
    Frames stacked: 4
    Iterations: 300 / 400
    Training steps: 60k / 80k
    Batch size: 512 / 1024
    # Blocks: 2 / 1
    Replay buffer size: 1M frames / 8k games
    Max frames per episode: 108k
    Dirichlet noise ratio: 0.25 / 0.3
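For scripting, the Table 4 settings can be collected into a plain configuration dictionary. This is only an illustrative sketch: the key names are chosen for this example, and parameters with two reported values are kept as tuples without assigning either value to a specific setting.

```python
# Illustrative sketch of the Table 4 hyperparameters as a config dict.
# Key names are this example's own; tuples hold the two values Table 4
# reports for its two training configurations (pairing not asserted here).
TRAIN_CONFIG = {
    "optimizer": "SGD",
    "learning_rate": 0.1,
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "discount_factor": 0.997,
    "priority_exponent_alpha": 1,
    "priority_correction_beta": 0.4,
    "n_step_return": 5,
    "mcts_simulations": 50,
    "softmax_temperature": 1,
    "frame_skip": 4,
    "frames_stacked": 4,
    "iterations": (300, 400),
    "training_steps": (60_000, 80_000),
    "batch_size": (512, 1024),
    "num_blocks": (2, 1),
    "max_frames_per_episode": 108_000,
    "dirichlet_noise_ratio": (0.25, 0.3),
}
```

Keeping the two-valued entries as tuples makes it explicit that the paper trains under two configurations rather than silently picking one.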
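The mean human-normalized score reported above follows the standard Atari convention: an agent's raw score is rescaled so that 0 corresponds to a random policy and 1 to human-level play, then averaged over games. A minimal sketch of that computation, using hypothetical per-game scores (not taken from the paper):

```python
def human_normalized_score(agent, random_baseline, human):
    """Standard Atari normalization: 0.0 = random policy, 1.0 = human-level."""
    return (agent - random_baseline) / (human - random_baseline)

# Hypothetical raw scores for two games (illustrative values only).
games = {
    "GameA": {"agent": 400.0, "random": 1.7, "human": 30.5},
    "GameB": {"agent": 20.0, "random": -20.7, "human": 14.6},
}

# Mean human-normalized score across the evaluated games.
mean_hns = sum(
    human_normalized_score(g["agent"], g["random"], g["human"])
    for g in games.values()
) / len(games)
```

A relative improvement such as the 131.58% figure compares two agents' mean human-normalized scores, not their raw game scores.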