Value-Based Deep RL Scales Predictably
Authors: Oleh Rybkin, Michal Nauman, Preston Fu, Charlie Victor Snell, Pieter Abbeel, Sergey Levine, Aviral Kumar
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach using three algorithms: SAC, BRO, and PQL on DeepMind Control, OpenAI Gym, and Isaac Gym, when extrapolating to higher levels of data, compute, budget, or performance. [...] We run several experiments and estimate scaling trends from the results. [...] Experimental Details |
| Researcher Affiliation | Academia | 1 UC Berkeley, 2 University of Warsaw, 3 CMU. Correspondence to: Oleh Rybkin <EMAIL>, Aviral Kumar <EMAIL>. [...] Pieter Abbeel holds concurrent appointments as a Professor at UC Berkeley and as an Amazon Scholar. This work was done at UC Berkeley and CMU, and is not associated with Amazon. |
| Pseudocode | No | The paper describes methods and equations but does not contain any clearly labeled pseudocode or algorithm blocks formatted as such. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include a direct link to a code repository. |
| Open Datasets | Yes | Our findings apply to algorithms such as SAC, BRO, and PQL, and domains such as the DeepMind Control Suite (DMC), OpenAI Gym, and Isaac Gym. [...] On OpenAI Gym (Brockman et al., 2016), we use Soft Actor Critic, a commonly used TD-learning algorithm (Haarnoja et al., 2018). We use DMC (Tassa et al., 2018), where we utilize the state-of-the-art Bigger, Regularized, Optimistic (BRO) algorithm (Nauman et al., 2024b). [...] Finally, we test our approach with more data on Isaac Gym (Makoviychuk et al., 2021), where we use the Parallel Q-Learning (PQL) algorithm (Li et al., 2023b). |
| Dataset Splits | No | The paper focuses on reinforcement learning where agents collect data by interacting with environments (DeepMind Control, OpenAI Gym, Isaac Gym) rather than using predefined, static dataset splits for training, validation, and testing as in supervised learning. Therefore, explicit dataset splits are not applicable or provided in the conventional sense. |
| Hardware Specification | No | The paper mentions "compute support from the Berkeley Research Compute, Polish high-performance computing infrastructure, PLGrid (HPC Center: ACK Cyfronet AGH)", but it does not specify any exact GPU/CPU models, processor types, or detailed computer specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions using specific algorithms and frameworks like "Soft Actor Critic", "BRO", "PQL", and the "SciPy package" for analysis. However, it does not provide specific version numbers for these software components or other ancillary software dependencies, which would be necessary for reproduction. |
| Experiment Setup | Yes | To understand relationships between batch size B, learning rate η, and the UTD ratio σ, we ran an extensive grid search. [...] We first run a sweep on 5 values of η, then a grid of runs with 4 values of σ and 3 values of B, and then use hyperparameter fits to run 2 more values of σ with 8 seeds per task. [...] We first run 5 values of B, 4 values of η, and 4 σ; and then use hyperparameter fits to run 2 more values of σ, with 10 seeds per task. [...] We first run 4 values of σ, 3 values of η, as well as 5 values of B, with 5 seeds per task, after which we run a second round of grid search with 7 values of σ. Further details are in Appendices B and D and Table 3. Table 3: Tested configurations (lists specific values for Updates-to-data σ, Batch size B, Learning rate η for different domains). |
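The paper estimates scaling trends from grid-search results using the SciPy package. A minimal, hypothetical sketch of that kind of analysis, fitting a power law to synthetic (compute, performance) points rather than the paper's actual data, might look like:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical power-law model J(C) = a * C^(-b), as commonly used
# when extrapolating scaling trends. Names and data are illustrative,
# not taken from the paper.
def power_law(c, a, b):
    return a * np.power(c, -b)

# Synthetic data: generated from a known power law with small noise.
rng = np.random.default_rng(0)
compute = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
true_a, true_b = 50.0, 0.3
values = power_law(compute, true_a, true_b) * rng.normal(1.0, 0.01, compute.shape)

# Fit the model parameters with SciPy's curve_fit.
params, _ = curve_fit(power_law, compute, values, p0=(1.0, 0.1))
a_hat, b_hat = params
print(f"a ≈ {a_hat:.1f}, b ≈ {b_hat:.3f}")
```

Because the data here is synthetic, the fitted parameters recover the known generating values; on real runs, the same fit would be applied to measured returns across the swept (B, η, σ) configurations.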