Highly Parallelized Reinforcement Learning Training with Relaxed Assignment Dependencies

Authors: Zhouyu He, Peng Qiao, Rongchun Li, Yong Dou, Yusong Tan

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted extensive experiments. TianJi achieves a convergence-time acceleration ratio of up to 4.37 compared with related systems. When scaled to eight computing nodes, TianJi shows a convergence-time speedup of 1.6 and a throughput speedup of 7.13 relative to XingTian, demonstrating both training acceleration and scalability. In data-transmission efficiency experiments, TianJi significantly outperforms the other systems, approaching hardware limits. TianJi is also effective for on-policy algorithms, achieving convergence-time acceleration ratios of 4.36 and 2.95 over RLlib and XingTian, respectively.
Researcher Affiliation | Academia | 1. College of Computer Science and Technology, National University of Defense Technology; 2. National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology.
Pseudocode | Yes | Algorithm 1: Pseudo-code for Function Approximation-based Temporal Difference(0).
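The algorithm named in this row, TD(0) with function approximation, is a standard policy-evaluation method. The sketch below is illustrative only, not the paper's Algorithm 1: it runs TD(0) with one-hot linear features (which reduces to tabular TD) on a small random-walk chain. All names and hyperparameters here are assumptions for the example.

```python
import random

random.seed(0)

# Tiny 5-state random-walk chain: states 0..4, with states 0 and 4 terminal.
# Reward is +1 on reaching state 4, else 0 (a standard TD(0) test problem).
N = 5

def td0(episodes=2000, alpha=0.1, gamma=1.0):
    """TD(0) with linear function approximation; one-hot features make
    v(s) = w[s], so the gradient of v w.r.t. w is the one-hot vector."""
    w = [0.0] * N                      # weight vector, v(s) = w . x(s)
    for _ in range(episodes):
        s = 2                          # start in the middle of the chain
        while 0 < s < N - 1:
            s2 = s + random.choice((-1, 1))
            r = 1.0 if s2 == N - 1 else 0.0
            v_s2 = 0.0 if s2 in (0, N - 1) else w[s2]  # terminals have value 0
            # TD(0) update: w <- w + alpha * (r + gamma*v(s') - v(s)) * grad v(s)
            w[s] += alpha * (r + gamma * v_s2 - w[s])
            s = s2
    return w

# True values for states 1..3 are 0.25, 0.5, 0.75.
print([round(v, 2) for v in td0()])
```

With one-hot features the update is exactly tabular TD(0); richer feature vectors would replace `w[s]` with a dot product and scale the update by the feature vector.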
Open Source Code | No | The paper does not provide a link to, or an explicit statement about releasing, the source code for its methodology. It mentions 'Ape-X's open-source implementation', but that refers to a baseline, not the authors' own work.
Open Datasets | Yes | Algorithms & Environment: experiments were conducted in Atari and OpenAI Gym. The algorithms evaluated are DQN (off-policy) and PPO (on-policy).
Dataset Splits | No | The paper mentions using 'RLlib's default network architecture and parameters as benchmarks for both Gym and Atari tasks' and states 'All comparisons used identical network architectures and hyperparameters across the same games to ensure fairness.' However, it does not detail how data from the Atari and OpenAI Gym environments were split into training, validation, or test sets (e.g., percentages, exact counts, or split methodology).
Hardware Specification | Yes | Testbed: two hardware platforms were configured. The first is a CPU-only Slurm cluster with 8 computing nodes; each node has 2 Intel Xeon Gold 6248 processors (40 physical cores per node) and 384 GB of memory, and the nodes are interconnected via ConnectX-6 high-speed links. The second is a heterogeneous machine with one A100 GPU, 40 physical cores, and 376 GB of memory.
Software Dependencies | No | The paper mentions software such as RLlib, Ray, and the gRPC library but gives no version numbers for any of these components, which are necessary for a reproducible description of ancillary software.
Experiment Setup | Yes | The optimal computational-resource mapping identified by the distributed strategy includes 2 learners and 4 actors using 16 cores, denoted L2A8-C16. A random computational-resource mapping, labeled L1A14-C16, was also evaluated. After introducing asynchrony, simulations with serial sample distributions were conducted at two sample ratios: 1:1 (denoted New) and 1:8 (denoted Staleness). Four sets of control experiments were conducted.
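The 1:1 (New) and 1:8 (Staleness) sample ratios above describe how fresh and stale samples are interleaved when sample distribution is serialized. A minimal way to realize such a ratio is sketched below; this is not the paper's implementation, and `mix_samples` and its arguments are hypothetical names for illustration.

```python
import itertools

def mix_samples(fresh, stale, ratio=(1, 8)):
    """Interleave samples so each `ratio[0]` fresh samples are followed by
    `ratio[1]` stale ones. ratio=(1, 1) corresponds to the 'New' setting,
    ratio=(1, 8) to the 'Staleness' setting described in the experiments."""
    n_new, n_old = ratio
    fresh_it, stale_it = iter(fresh), iter(stale)
    mixed = []
    while True:
        chunk = list(itertools.islice(fresh_it, n_new))
        if not chunk:                  # fresh stream exhausted: stop mixing
            break
        mixed.extend(chunk)
        mixed.extend(itertools.islice(stale_it, n_old))
    return mixed

# Toy 1:2 ratio: each fresh sample is followed by two stale ones.
print(mix_samples([0, 1, 2], list("abcdef"), ratio=(1, 2)))
# -> [0, 'a', 'b', 1, 'c', 'd', 2, 'e', 'f']
```

In a real asynchronous trainer the stale stream would come from a replay buffer rather than a list, but the ratio logic is the same.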