Policy Gradient with Tree Expansion

Authors: Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify our results, we implemented a practical version of SoftTreeMax that exhaustively searches the entire tree and applies a neural network on its leaves. We test our algorithm on a parallelized Atari GPU simulator (Dalton et al., 2020). Using this implementation in Atari, we show that SoftTreeMax reduces the gradient variance by three orders of magnitude. This leads to better sample complexity and improved performance compared to distributed PPO.
Researcher Affiliation | Collaboration | 1 NVIDIA Research, 2 Indian Institute of Science, 3 Technion University, 4 Bar-Ilan University.
Pseudocode | Yes | This section provides the pseudocode for our SoftTreeMax implementation. Algorithm 1 details the C-SoftTreeMax policy computation, which efficiently utilizes GPU parallelization to perform tree expansion. Algorithm 2 shows how SoftTreeMax integrates with the PPO algorithm, distinguishing the usage of our new policy in red.
Open Source Code | Yes | The code for our implementation is available at https://github.com/NVlabs/SoftTreeMax. We provide a Dockerfile for setting up the environment and a README file with instructions on how to run both training and evaluation.
Open Datasets | Yes | We conduct our experiments on multiple games from the Atari simulation suite (Bellemare et al., 2013).
Dataset Splits | No | The paper mentions using the Atari simulation suite but does not specify how game data or trajectories are split into training, test, or validation sets for model evaluation or training. It describes training agents within the environment rather than using a pre-split dataset.
Hardware Specification | Yes | We use an Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz equipped with one NVIDIA Tesla V100 32GB.
Software Dependencies | Yes | The environment engine is the highly efficient Atari-CuLE (Dalton et al., 2020), a CUDA-based version of Atari that runs on GPU. ... We extend Stable-Baselines3 (Raffin et al., 2019)...
Experiment Setup | Yes | We train SoftTreeMax for depths d = 1, ..., 8, with a single worker. We use five seeds for each experiment. ... For depths d >= 3, we limited the tree to a maximum width of 1024 nodes and pruned non-promising trajectories with low estimated weights. ... we ran all experiments for one week on the same machine. ... In Algorithm 2, we use Generalized Advantage Estimation (GAE) with lambda = 0.95 for calculating advantage estimates...
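
The C-SoftTreeMax policy summarized above (exhaustive depth-d expansion, a learned score applied at the leaves, aggregation per first action, then a softmax) can be sketched roughly as follows. This is a toy single-process illustration, not the paper's GPU-parallelized implementation: the `env_model` and `leaf_score` callables, the deterministic one-successor transition model, and the exact discounting convention are all assumptions made for the sake of the example.

```python
import numpy as np

def c_soft_tree_max(env_model, leaf_score, state, actions, depth, beta=1.0, gamma=0.99):
    """Toy sketch of a C-SoftTreeMax-style policy.

    For every first action a0, exhaustively enumerate all depth-`depth`
    action sequences, score each resulting trajectory by its cumulative
    discounted reward plus a discounted leaf score, aggregate the
    trajectory scores per first action with logsumexp, and return the
    softmax over the aggregated logits.

    env_model(s, a) -> (next_state, reward)  # hypothetical deterministic model
    leaf_score(s)   -> float                 # hypothetical learned leaf score
    """
    logits = []
    for a0 in actions:
        # Apply the first action, then expand all action sequences below it.
        s0, r0 = env_model(state, a0)
        frontier = [(s0, r0)]  # (state, cumulative discounted reward)
        for t in range(1, depth):
            nxt = []
            for s, r in frontier:
                for a in actions:
                    s2, rew = env_model(s, a)
                    nxt.append((s2, r + gamma**t * rew))
            frontier = nxt
        traj_scores = [beta * (r + gamma**depth * leaf_score(s)) for s, r in frontier]
        # logsumexp aggregation over all trajectories sharing first action a0
        m = max(traj_scores)
        logits.append(m + np.log(sum(np.exp(x - m) for x in traj_scores)))
    logits = np.array(logits)
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return probs / probs.sum()
```

In the real implementation the expansion is batched on the GPU and, per the setup above, the tree is capped at 1024 nodes for d >= 3; this sketch keeps the full exhaustive tree for clarity.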
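
The GAE advantage estimates mentioned in the setup (lambda = 0.95) follow the standard backward recursion of Schulman et al.: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and A_t = delta_t + gamma * lambda * A_{t+1}. A minimal sketch, assuming a single rollout with no episode terminations (done-masking omitted):

```python
import numpy as np

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_{T-1})
    last_value: bootstrap value V(s_T) for the state after the rollout
    """
    T = len(rewards)
    values = np.append(np.asarray(values, dtype=float), last_value)
    adv = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        adv[t] = gae
    return adv
```

Stable-Baselines3's PPO computes the same quantity internally (with terminal masking); the sketch only makes the recursion explicit.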