Policy Gradient with Tree Expansion

Authors: Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify our results, we implemented a practical version of SoftTreeMax that exhaustively searches the entire tree and applies a neural network on its leaves. We test our algorithm on a parallelized Atari GPU simulator (Dalton et al., 2020). Using this implementation in Atari, we show that SoftTreeMax reduces the gradient variance by three orders of magnitude. This leads to better sample complexity and improved performance compared to distributed PPO.
Researcher Affiliation | Collaboration | 1 NVIDIA Research, 2 Indian Institute of Science, 3 Technion University, 4 Bar-Ilan University.
Pseudocode | Yes | This section provides the pseudocode for our SoftTreeMax implementation. Algorithm 1 details the C-SoftTreeMax policy computation, which efficiently utilizes GPU parallelization to perform tree expansion. Algorithm 2 shows how SoftTreeMax integrates with the PPO algorithm, distinguishing the usage of our new policy in red.
Open Source Code | Yes | The code for our implementation is available at https://github.com/NVlabs/SoftTreeMax. We provide a Dockerfile for setting up the environment and a README file with instructions on how to run both training and evaluation.
Open Datasets | Yes | We conduct our experiments on multiple games from the Atari simulation suite (Bellemare et al., 2013).
Dataset Splits | No | The paper mentions using the Atari simulation suite but does not specify how game data or trajectories are split into training, test, or validation sets for model evaluation or training. It describes training agents within the environment rather than using a pre-split dataset.
Hardware Specification | Yes | We use an Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz equipped with one NVIDIA Tesla V100 32GB.
Software Dependencies | Yes | The environment engine is the highly efficient Atari-CuLE (Dalton et al., 2020), a CUDA-based version of Atari that runs on GPU. ... We extend Stable-Baselines3 (Raffin et al., 2019)...
Experiment Setup | Yes | We train SoftTreeMax for depths d = 1, ..., 8, with a single worker. We use five seeds for each experiment. ... For depths d >= 3, we limited the tree to a maximum width of 1024 nodes and pruned non-promising trajectories with low estimated weights. ... we ran all experiments for one week on the same machine. ... In Algorithm 2, we use Generalized Advantage Estimation (GAE) with lambda = 0.95 for calculating advantage estimates...
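
The C-SoftTreeMax policy summarized above (exhaustive depth-d expansion, a learned score applied at the leaves, aggregation per first action, then a softmax) can be sketched roughly as follows. This is a toy single-process illustration, not the paper's GPU-parallelized implementation: the `env_model` and `leaf_score` callables, the deterministic one-successor transition model, and the exact discounting convention are all assumptions made for the sake of the example.

```python
import numpy as np

def c_soft_tree_max(env_model, leaf_score, state, actions, depth, beta=1.0, gamma=0.99):
    """Toy sketch of a C-SoftTreeMax-style policy.

    For every first action a0, exhaustively enumerate all depth-`depth`
    action sequences, score each resulting trajectory by its cumulative
    discounted reward plus a discounted leaf score, aggregate the
    trajectory scores per first action with logsumexp, and return the
    softmax over the aggregated logits.

    env_model(s, a) -> (next_state, reward)  # hypothetical deterministic model
    leaf_score(s)   -> float                 # hypothetical learned leaf score
    """
    logits = []
    for a0 in actions:
        # Apply the first action, then expand all action sequences below it.
        s0, r0 = env_model(state, a0)
        frontier = [(s0, r0)]  # (state, cumulative discounted reward)
        for t in range(1, depth):
            nxt = []
            for s, r in frontier:
                for a in actions:
                    s2, rew = env_model(s, a)
                    nxt.append((s2, r + gamma**t * rew))
            frontier = nxt
        traj_scores = [beta * (r + gamma**depth * leaf_score(s)) for s, r in frontier]
        # logsumexp aggregation over all trajectories sharing first action a0
        m = max(traj_scores)
        logits.append(m + np.log(sum(np.exp(x - m) for x in traj_scores)))
    logits = np.array(logits)
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return probs / probs.sum()
```

In the real implementation the expansion is batched on the GPU and, per the setup above, the tree is capped at 1024 nodes for d >= 3; this sketch keeps the full exhaustive tree for clarity.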
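
The GAE advantage estimates mentioned in the setup (lambda = 0.95) follow the standard backward recursion of Schulman et al.: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and A_t = delta_t + gamma * lambda * A_{t+1}. A minimal sketch, assuming a single rollout with no episode terminations (done-masking omitted):

```python
import numpy as np

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_{T-1})
    last_value: bootstrap value V(s_T) for the state after the rollout
    """
    T = len(rewards)
    values = np.append(np.asarray(values, dtype=float), last_value)
    adv = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        adv[t] = gae
    return adv
```

Stable-Baselines3's PPO computes the same quantity internally (with terminal masking); the sketch only makes the recursion explicit.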