Towards General-Purpose Model-Free Reinforcement Learning

Authors: Scott Fujimoto, Pierluca D'Oro, Amy Zhang, Yuandong Tian, Michael Rabbat

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show a competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.
Researcher Affiliation | Industry | Scott Fujimoto, Pierluca D'Oro, Amy Zhang, Yuandong Tian, Michael Rabbat. Meta FAIR. Correspondence: EMAIL.
Pseudocode | Yes | 4.2 ALGORITHM: We now present the details of MR.Q (Model-based Representations for Q-learning). ... Given the transition (s, a, r, d, s') from the replay buffer. Output: MR.Q. Trained end-to-end: State Encoder zs = fω(s); State-Action Encoder zsa = gω(zs, a); MDP predictor (zs', r, d) = m(zsa). Decoupled RL: Value Qi = Qθ(zsa); Policy aπ = πϕ(zs). Update MR.Q: if t % Ttarget = 0 then update target networks θ', ϕ', ω' ← θ, ϕ, ω and rescale rewards r ← r / mean_D |r|; for Ttarget time steps do: encoder update (Equation 14), value update (Equation 19), policy update (Equation 20).
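The update schedule described in the excerpt (hard-copy the target networks every Ttarget steps, then run Ttarget update steps) can be sketched as follows. This is a hypothetical, simplified skeleton, not the authors' implementation: parameters are plain dicts standing in for the encoder (ω), value (θ), and policy (ϕ) networks, and the placeholder increments stand in for the real gradient updates of Equations 14, 19, and 20.

```python
import copy

class MRQSketch:
    """Minimal sketch of the MR.Q update loop structure (hypothetical)."""

    def __init__(self, target_period=250):
        self.target_period = target_period             # Ttarget
        # Parameters ω (encoder), θ (value), ϕ (policy) as plain dicts.
        self.params = {"omega": [0.0], "theta": [0.0], "phi": [0.0]}
        # Target networks ω', θ', ϕ' start as copies.
        self.target = copy.deepcopy(self.params)

    def train_step(self, t, batch):
        if t % self.target_period == 0:
            # Target networks: θ', ϕ', ω' ← θ, ϕ, ω (hard copy).
            # Reward rescaling would also happen here in the paper.
            self.target = copy.deepcopy(self.params)
        # Placeholder steps standing in for Eqs. 14, 19, 20:
        for key in self.params:
            self.params[key][0] += 0.01
```

The key design point the excerpt highlights is that targets are frozen for a whole block of Ttarget updates rather than updated with a soft (Polyak) average each step.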
Open Source Code Yes Code: https://github.com/facebookresearch/MRQ.
Open Datasets | Yes | We evaluate MR.Q on four widely used RL benchmarks and 118 environments... Gym Locomotion. This subset of the Gym benchmark (Brockman et al., 2016; Towers et al., 2024)... DMC Proprioceptive. The DeepMind Control suite (DMC) (Tassa et al., 2018)... Atari. The Atari benchmark is built on the Arcade Learning Environment (Bellemare et al., 2013).
Dataset Splits | Yes | Evaluations are based on the average performance over 10 episodes, measured every 5k time steps for Gym and DM Control and every 100k time steps for Atari. Gym Locomotion... Agents are trained for 1M time steps... DMC Proprioceptive... Agents are trained for 500k time steps, equivalent to 1M frames... Atari... Agents are trained for 2.5M time steps (equivalent to 10M frames)...
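The evaluation protocol in this row (mean return over 10 episodes, run at a fixed interval of training steps) can be sketched generically. The `env_reset`/`env_step`/`act` callables below are placeholder assumptions, not the Gymnasium API or the authors' code:

```python
def evaluate(env_reset, env_step, act, episodes=10):
    """Mean episodic return over `episodes` rollouts.

    env_reset() -> initial state
    env_step(action) -> (next_state, reward, done)
    act(state) -> action
    (All three are hypothetical stand-ins for a real environment/agent.)
    """
    returns = []
    for _ in range(episodes):
        state, done, total = env_reset(), False, 0.0
        while not done:
            state, reward, done = env_step(act(state))
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)
```

In the paper's setup this routine would be invoked every 5k training steps for Gym/DMC and every 100k steps for Atari, with `episodes=10`.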
Hardware Specification No No specific hardware details (like GPU/CPU models or processor types) are mentioned in the paper.
Software Dependencies | Yes | B.5 SOFTWARE VERSIONS: Gymnasium 0.29.1 (Towers et al., 2024), MuJoCo 3.2.2 (Todorov et al., 2012), NumPy 2.1.1 (Harris et al., 2020), Python 3.11.8 (Van Rossum & Drake Jr, 1995), PyTorch 2.4.1 (Paszke et al., 2019).
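For reproduction, the pinned versions above map onto a requirements file along these lines (the PyPI package names are an assumption; the paper lists versions only, and Python 3.11.8 is fixed separately via the interpreter, not pip):

```
# Pinned versions from Appendix B.5 (PyPI names assumed)
gymnasium==0.29.1
mujoco==3.2.2
numpy==2.1.1
torch==2.4.1
```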
Experiment Setup | Yes | Table 1: Hyperparameter differences between Rainbow (Hessel et al., 2018) and TD3 (Fujimoto et al., 2018). ... Table 3: MR.Q Hyperparameters. Hyperparameter values are kept fixed across all benchmarks.