Hyperspherical Normalization for Scalable Deep Reinforcement Learning
Authors: Hojoon Lee, Youngdo Lee, Takuma Seno, Donghu Kim, Peter Stone, Jaegul Choo
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using the soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains. The code is available at dojeon-ai.github.io/SimbaV2. 5. Experiments We now present a series of experiments designed to evaluate SimbaV2. Our investigation centers on four main setups: Optimization Analysis (Section 5.2): investigate whether SimbaV2 stabilizes the optimization process. Scaling Analysis (Section 5.3): investigate whether SimbaV2 allows scaling model capacity and computation. Comparisons (Section 5.4): compare SimbaV2 against state-of-the-art RL algorithms. Design Study (Section 5.5): conduct ablation studies on individual architectural components of SimbaV2. |
| Researcher Affiliation | Collaboration | 1KAIST 2Sony AI 3UT Austin. Correspondence to: Hojoon Lee <EMAIL>. |
| Pseudocode | Yes | Listings 1, 2 and 3 provide the Google JAX implementation of scaling vector (Section 4.4), input embedding (Section 4.1), and MLP block (Section 4.2), respectively. Listing 1. A JAX implementation of Scaler (Section 4.4) Listing 2. A JAX implementation of Input Embedding (Section 4.1). Listing 3. A JAX implementation of MLP block (Section 4.2). |
| Open Source Code | Yes | The code is available at dojeon-ai.github.io/SimbaV2. |
| Open Datasets | Yes | We evaluated SimbaV2 on four standard online RL benchmarks: MuJoCo (Todorov et al., 2012), DMC Suite (Tassa et al., 2018), MyoSuite (Caggiano et al., 2022), and HumanoidBench (Sferrazza et al., 2024); as well as the D4RL MuJoCo benchmark (Fu et al., 2020) for offline RL. |
| Dataset Splits | Yes | Results are averaged over 57 continuous control tasks from MuJoCo, DMC, MyoSuite, and HumanoidBench, each trained on 1 million samples. For offline RL, we simply add a behavioral cloning loss during training, using configurations identical to the online RL setup. Despite minimal changes, SimbaV2 performs competitively with existing baselines (Appendix D). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It discusses 'compute' generally but lacks specific models or configurations. |
| Software Dependencies | No | Appendix B provides JAX implementations of components, but does not specify a version for JAX. Appendix C mentions 'Adam' as an optimizer but does not specify a version for Adam or any other software dependencies. |
| Experiment Setup | Yes | For all experiments, we use consistent hyperparameters across benchmarks. The default settings are listed in Table 3. Table 3. Hyperparameters Table. The hyperparameters listed below are used consistently across all tasks using SimbaV2, unless stated otherwise. For the discount factor γ, we set it automatically using heuristics used by TD-MPC2 (Hansen et al., 2023). Input: shift constant c_shift = 3.0; Output: number of return bins n_atoms = 101; ... (numerous other hyperparameters listed in Appendix C) |
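The paper's Listing 1 (referenced in the Pseudocode row above) gives a JAX implementation of a learnable scaling vector ("Scaler", Section 4.4). As a rough illustration of the idea, the sketch below shows a common parameterization for such scalers in NumPy: the parameter is stored at one magnitude (`scale`) while its effective value starts at another (`init`), decoupling the initial scale from the parameter magnitude the optimizer sees. The class and argument names here are assumptions for illustration, not the paper's actual code.

```python
import numpy as np

class Scaler:
    """Hypothetical NumPy sketch of a learnable per-dimension scaling vector.

    The paper's Listing 1 is in JAX and may parameterize this differently;
    `init` and `scale` are assumed names. The trick sketched here: store
    the learnable parameter at magnitude `scale`, then rescale by
    `init / scale` on the forward pass so the effective scale starts at
    `init` regardless of the stored parameter's magnitude.
    """

    def __init__(self, dim: int, init: float = 1.0, scale: float = 1.0):
        self.forward_scaler = init / scale    # maps stored param to effective value
        self.param = np.full(dim, scale)      # learnable parameter (trained by the optimizer)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Elementwise scaling; at initialization this multiplies x by `init`.
        return x * (self.param * self.forward_scaler)

# Usage: with init=0.5 and scale=2.0, the effective scale at init is 0.5.
scaler = Scaler(dim=4, init=0.5, scale=2.0)
out = scaler(np.ones(4))
```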