Gradient Boosting Reinforcement Learning

Authors: Benjamin Fuhrer, Chen Tessler, Gal Dalal

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive experiments, we demonstrate that GBRL outperforms NNs in domains with structured observations and categorical features while maintaining competitive performance on standard continuous control benchmarks. Like its supervised learning counterpart, GBRL demonstrates superior robustness to out-of-distribution samples and better handles irregular state-action relationships. Our experiments aim to answer two core questions: 1. GBT as an RL function approximator: Can GBT-based AC algorithms effectively solve complex high-dimensional RL tasks? And how do they compare with NNs? 2. Advantages of GBT: Building on GBT's success in irregular and tabular data, does its inductive bias offer similar robustness benefits in RL, specifically for out-of-distribution states, noisy inputs, and spurious correlations?
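The first core question above rests on GBT's training principle: each weak learner is fit to the negative gradient of the loss, then added to the ensemble. The following is a minimal pure-Python toy of that functional-gradient loop (one-split stumps on squared loss); it is an illustrative sketch, not the paper's GBRL implementation.

```python
# Toy gradient boosting: each weak learner (a one-split stump) is fit to the
# negative gradient of the squared loss, then added with a learning rate.
# Hypothetical sketch -- not the GBRL library's actual code.

def fit_stump(xs, residuals):
    """Find the split on xs that best fits residuals with two constant leaves."""
    best = None
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    for k in range(1, len(xs)):
        thr = xs[order[k]]
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda x: lv if x < thr else rv

def boost(xs, ys, n_rounds=50, lr=0.3):
    """Build an ensemble predictor by functional gradient descent."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(n_rounds):
        # Negative gradient of 0.5*(y - f(x))^2 w.r.t. f(x) is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # step function: a natural fit for trees
f = boost(xs, ys)
print(round(f(0.5), 2), round(f(4.5), 2))  # -> 0.0 1.0
```

In GBRL the same additive-ensemble principle is driven by policy-gradient signals rather than supervised residuals, which is what lets the trees serve as an Actor-Critic function approximator.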
Researcher Affiliation Industry NVIDIA, Tel-Aviv, Israel; NVIDIA Research, Tel Aviv, Israel. Correspondence to: Benjamin Fuhrer <EMAIL>, Chen Tessler <EMAIL>, Gal Dalal <EMAIL>.
Pseudocode No The paper describes the GBT framework and its application in RL using mathematical formulations and descriptive text (e.g., equations 1-5 and the text in Section 4 and 4.1), but it does not present any explicit pseudocode or algorithm blocks.
Open Source Code Yes The GBRL core library is available at https://github.com/NVlabs/gbrl. Actor-Critic implementations, integrated within Stable-Baselines3, are available at https://github.com/NVlabs/gbrl_sb3.
Open Datasets Yes First, we tested classic RL tasks using Classic-Control and Box2D environments from Gymnasium (Towers et al., 2024). ...the Football domain (Kurach et al., 2020)... and the Atari RAM domain (Bellemare et al., 2013). Finally, we assessed performance on categorical environments, specifically targeting the Mini Grid domain (Chevalier-Boisvert et al., 2023).
Dataset Splits No The paper mentions using standard RL environments (Gymnasium, Football, Atari RAM, Mini Grid) and a custom Variable Isolation Environment. For these environments, training and evaluation are conducted dynamically, but the paper does not specify fixed training/validation/test dataset splits with percentages or sample counts in the manner of supervised learning datasets. Evaluation is typically done by averaging performance over a number of episodes (e.g., 'averaged across the last 100 episodes').
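The evaluation convention noted above — averaging over the last 100 episodes instead of using a held-out test split — can be sketched in a few lines. The episode returns below are synthetic placeholders, not results from the paper.

```python
# Report the mean return over the last k completed episodes, the common
# RL evaluation convention cited above. Synthetic data for illustration.
from collections import deque

def last_k_mean(returns, k=100):
    """Mean of the final k episode returns (or all of them, if fewer than k)."""
    window = deque(returns, maxlen=k)  # keeps only the most recent k entries
    return sum(window) / len(window)

episode_returns = [float(i % 10) for i in range(250)]  # made-up returns
print(last_k_mean(episode_returns))  # mean over episodes 150..249 -> 4.5
```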
Hardware Specification Yes Our training setup consists of a single NVIDIA V100 GPU. All experiments were done on the NVIDIA NGC platform on a single NVIDIA V100-32GB GPU per experiment.
Software Dependencies Yes We provide a CUDA-accelerated (NVIDIA, 2025) implementation... By leveraging optimization frameworks for gradient computation, such as PyTorch (Paszke et al., 2019), GBRL can be integrated with most Actor-Critic algorithms and implemented within existing RL libraries... The GBRL core library is available at https://github.com/NVlabs/gbrl. Actor-Critic implementations integrated within Stable-Baselines3 (Raffin et al., 2021) ... NVIDIA, 2025. CUDA Toolkit Documentation, 2025. URL https://developer.nvidia.com/cudatoolkit. Version 12.1.
Experiment Setup Yes For our experiments, we implemented a GBT-based version of PPO within Stable Baselines3 (Raffin et al., 2021). Where available, we use standard hyperparameters, environment-specific, and normalization wrappers according to RL Baselines3 Zoo (Raffin, 2020); otherwise, we optimize the hyperparameters for specific environments. For each experiment, we report the aggregated performance across five random seeds. We refer the reader to the supplementary material for additional technical details, such as hyperparameters, implementation, and environment details (Appendix C)... Table 2 lists GBRL hyperparameters for all experiments.
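The setup reports performance aggregated across five random seeds. A minimal sketch of such aggregation (mean and sample standard deviation) is below; the per-seed scores are invented placeholders, not numbers from the paper.

```python
# Aggregate one metric across five seeds, as the experiment setup describes.
# Seed scores here are synthetic placeholders for illustration only.
import statistics

seed_scores = {0: 480.0, 1: 495.0, 2: 470.0, 3: 500.0, 4: 455.0}

mean = statistics.mean(seed_scores.values())
std = statistics.stdev(seed_scores.values())  # sample (n-1) standard deviation
print(f"{mean:.1f} +/- {std:.1f}")  # -> 480.0 +/- 18.4
```

Reporting mean with a dispersion measure over independent seeds is the standard way to separate an algorithm's effect from run-to-run variance in RL benchmarks.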