Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Authors: Eliot Xing, Vernon Luk, Jean Oh
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We re-implement challenging manipulation and locomotion tasks in Rewarped, and show that SAPO outperforms baselines over a range of tasks that involve interaction between rigid bodies, articulations, and deformables. [...] We evaluate our proposed maximum entropy FO-MBRL algorithm, Soft Analytic Policy Optimization (SAPO, Section 4), against baselines on a range of locomotion and manipulation tasks involving rigid and soft bodies. [...] In Figure 2, we visualize training curves to compare algorithms. SAPO shows better training stability across different random seeds, against existing FO-MBRL algorithms APG and SHAC. In Table 2, we report evaluation performance for final policies after training. |
| Researcher Affiliation | Academia | Eliot Xing & Vernon Luk & Jean Oh, Carnegie Mellon University, EMAIL |
| Pseudocode | Yes | Pseudocode for SAPO is shown in Appendix B.2, and the computational graph of SAPO is illustrated in Appendix Figure 4. [...] Algorithm 1: Soft Analytic Policy Optimization (SAPO) |
| Open Source Code | No | Additional details at rewarped.github.io. (This link is to a project website and does not explicitly state it contains source code, nor is it a direct link to a code repository.) |
| Open Datasets | No | The paper describes reimplemented tasks (e.g., "Ant Run Ant locomotion task from DFlex", "Rolling Flat Rolling pin manipulation task from Plasticine Lab"), which are simulation environments or benchmarks, not publicly available datasets with specific access information (links, DOIs, or citations with authors/year) in the main text. |
| Dataset Splits | No | The paper conducts experiments in simulated environments for reinforcement learning, which typically do not involve static training/testing/validation dataset splits in the same way as supervised learning. The text mentions "Mean and 95% CIs over 10 random seeds with 2N episodes per seed for N = 32 or 64 parallel envs," which refers to experimental repetitions and parallel execution, not dataset splitting. |
| Hardware Specification | Yes | We run all algorithms on consumer workstations with NVIDIA RTX 4090 GPUs. Each run uses a single GPU, on which we run both the GPU-accelerated parallel simulation and optimization loop. [...] We report all timings on a consumer workstation with an AMD Threadripper 5955WX CPU, NVIDIA RTX 4090 GPU, and 128GB DDR4 3200MHz RAM. |
| Software Dependencies | No | We build Rewarped on NVIDIA Warp (Macklin, 2022) [...] We use a custom PyTorch autograd function to interface simulation data and model parameters between Warp and PyTorch [...]. (The paper mentions NVIDIA Warp and PyTorch but does not provide specific version numbers for these software components, which is required for a reproducible description.) |
| Experiment Setup | Yes | Implementation details (network architecture, common hyperparameters, etc.) are standardized between methods for fair comparison, see Appendix C. [...] Appendix C: HYPERPARAMETERS. Table 4: Shared hyperparameters. Algorithms use hyperparameter settings in the shared column unless otherwise specified in an individual column. (This table lists specific values for Num envs, Batch size, Horizon, Mini-epochs, Discount, TD/GAE lambda, learning rates, optimizer types, beta values, gradient clip, norm type, activation type, actor sigma, num critics, critic tau, replay buffer size, target entropy, and initial temperature.) |
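The Software Dependencies row quotes the paper's "custom PyTorch autograd function to interface simulation data and model parameters between Warp and PyTorch." A minimal sketch of that bridging pattern, with a toy one-step dynamics function standing in for a Warp simulation kernel (the class name, dynamics, and gradients below are illustrative, not the authors' implementation):

```python
import torch


class SimStep(torch.autograd.Function):
    """Illustrative Warp<->PyTorch bridge. A real bridge would launch a
    Warp kernel on shared device memory in forward() and replay Warp's
    tape in backward(); here, plain tensor math plays the simulator."""

    @staticmethod
    def forward(ctx, state, action):
        # Toy dynamics: x' = x + dt * a, with dt = 0.1.
        ctx.save_for_backward(state, action)
        return state + 0.1 * action

    @staticmethod
    def backward(ctx, grad_out):
        # Adjoints of the toy dynamics w.r.t. (state, action); a real
        # bridge would obtain these from the external simulator.
        state, action = ctx.saved_tensors
        return grad_out.clone(), 0.1 * grad_out


state = torch.zeros(3, requires_grad=True)
action = torch.ones(3, requires_grad=True)
next_state = SimStep.apply(state, action)
next_state.sum().backward()
print(action.grad)  # each entry equals dt = 0.1
```

This is the standard `torch.autograd.Function` extension point; first-order model-based RL methods like the quoted SAPO rely on exactly this kind of gradient pass-through to backpropagate policy losses through simulation steps.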
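The Dataset Splits row quotes the paper's reporting convention, "Mean and 95% CIs over 10 random seeds." A small sketch of that aggregation, using a normal-approximation interval on hypothetical per-seed returns (the paper does not state its exact CI method, so a t-interval or bootstrap could differ slightly):

```python
import math
import statistics


def mean_ci95(per_seed_returns):
    """Mean and 95% CI half-width (normal approximation) over seeds."""
    m = statistics.mean(per_seed_returns)
    sem = statistics.stdev(per_seed_returns) / math.sqrt(len(per_seed_returns))
    return m, 1.96 * sem


# Hypothetical final returns from 10 seeds (not the paper's numbers).
returns = [9.1, 8.7, 9.4, 9.0, 8.9, 9.2, 8.8, 9.3, 9.0, 9.1]
mean, half_width = mean_ci95(returns)
print(f"{mean:.2f} +/- {half_width:.2f}")
```

This matches how RL results are commonly summarized when seed-to-seed variance, rather than a held-out dataset split, is the relevant source of uncertainty.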