Highway Graph to Accelerate Reinforcement Learning

Authors: Zidu Yin, Zhen Zhang, Dong Gong, Stefano V. Albrecht, Javen Qinfeng Shi

TMLR 2025

Reproducibility assessment (Variable: Result — LLM Response):
Research Type: Experimental — "Experiments across four categories of environments demonstrate that our method learns significantly faster than established and state-of-the-art model-free and model-based RL algorithms (often by a factor of 10 to 150) while maintaining equal or superior expected returns."
Researcher Affiliation: Academia — Zidu Yin (EMAIL), School of Information Science and Technology, Yunnan Normal University; Zhen Zhang (EMAIL), School of Computer and Mathematical Sciences, Adelaide University; Dong Gong (EMAIL), School of Computer Science and Engineering, The University of New South Wales; Stefano V. Albrecht (EMAIL), School of Informatics, University of Edinburgh; Javen Q. Shi (EMAIL), School of Computer and Mathematical Sciences, Adelaide University.
Pseudocode: Yes — Algorithm 1: Highway graph incremental construction; Algorithm 2: Value updating on highway graph.
Open Source Code: Yes — "The implementation of our highway graph RL method is publicly available at https://github.com/coodest/highwayRL."
Open Datasets: Yes — four environment suites are used:
- Simple Maze: a simple maze environment with customizable sizes.
- Toy Text (Towers et al., 2023): a tiny and simple game set with small discrete state and action spaces, including Frozen Lake, Taxi, Cliff Walking, and Blackjack.
- Google Research Football (GRF) (Kurach et al., 2020): a physics-based football simulator.
- Atari Learning Environment (Bellemare et al., 2013): a simulator for Atari 2600 console games.
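To make the "customizable sizes" idea concrete, here is a minimal grid-maze sketch in plain Python. This is purely illustrative and is not the paper's Simple Maze implementation; the class name, action encoding, and reward scheme (-1 per step, 0 at the goal) are all assumptions.

```python
# Hypothetical minimal maze environment (NOT the paper's Simple Maze code).
# The agent starts at (0, 0) and must reach (size-1, size-1).
class SimpleMaze:
    # action id -> (row delta, col delta): up, down, left, right
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size=5):
        self.size = size
        self.pos = (0, 0)

    def reset(self):
        """Return the agent to the start cell and return the initial state."""
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        """Apply an action, clipping moves at the walls; return (state, reward, done)."""
        dr, dc = self.ACTIONS[action]
        row = min(max(self.pos[0] + dr, 0), self.size - 1)
        col = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (row, col)
        done = self.pos == (self.size - 1, self.size - 1)
        reward = 0.0 if done else -1.0  # assumed step cost / goal reward
        return self.pos, reward, done
```

Changing the `size` argument is what "customizable sizes" would correspond to in such a sketch.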
Dataset Splits: No — "To better show the training-efficiency advantage of our highway graph RL method, we use only one million frames of interaction from each type of environment. Whether the information from one million frames is enough to solve the tasks in these environments will also be shown."
Hardware Specification: Yes — "All experiments were run in a Docker container with identical system resources: 8 CPU cores, 128 GB RAM, and an NVIDIA RTX 3090 Ti GPU with 24 GB VRAM."
Software Dependencies: No — "We use RLlib (Liang et al., 2018) implementations for DQN, PPO, A3C, R2D2, and IMPALA. Other baselines, including NEC, MFEC, EMDQN, GEM, and Gumbel MuZero, are obtained from their official repositories."
Experiment Setup: Yes — For Simple Maze, Toy Text, and Google Research Football we adopt a discount factor of 0.99; for Atari games we adopt a discount factor of 1 - 1e-6, since the very long trajectories in Atari games are not suitable for recurrent random projectors. Baseline hyperparameters:
- DQN: double DQN with dueling enabled; the n-step of Q-learning is 1; a Huber loss is computed for the TD error.
- A3C: the coefficients for the value-function term and the entropy-regularizer term in the loss function are 0.5 and 0.01, respectively; the gradient clip is set to 40.
- PPO: the initial coefficient and target value for the KL divergence are 0.5 and 0.01, respectively; the coefficient of the value-function loss is 1.0.
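The hyperparameters listed above can be gathered into a small configuration sketch. The dictionary and key names below are illustrative only (they are not RLlib's actual configuration keys); the values are taken directly from the setup described in this row.

```python
# Hyperparameters from the experiment setup, collected into plain dicts.
# Key names are illustrative assumptions, not RLlib configuration keys.
DISCOUNT = {
    "simple_maze": 0.99,
    "toy_text": 0.99,
    "grf": 0.99,
    "atari": 1 - 1e-6,  # near-1 discount for very long Atari trajectories
}

BASELINE_CONFIG = {
    "DQN": {"double_q": True, "dueling": True, "n_step": 1, "td_loss": "huber"},
    "A3C": {"vf_coeff": 0.5, "entropy_coeff": 0.01, "grad_clip": 40},
    "PPO": {"kl_coeff": 0.5, "kl_target": 0.01, "vf_loss_coeff": 1.0},
}
```

Note the Atari discount is written as `1 - 1e-6` rather than a hard-coded decimal, which keeps the intent (a discount just below 1) explicit.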