Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
Authors: Patrick Yin, Tyler Westenbroek, Ching-An Cheng, Andrey Kolobov, Abhishek Gupta
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this approach across five real-world dexterous manipulation tasks where zero-shot sim-to-real transfer fails. We further demonstrate our framework substantially outperforms baseline fine-tuning methods, requiring up to an order of magnitude fewer real-world samples and succeeding at difficult tasks where prior approaches fail entirely. |
| Researcher Affiliation | Collaboration | 1. University of Washington 2. Microsoft Research |
| Pseudocode | Yes | Algorithm 1 Simulation-Guided Fine-tuning (SGFT) ... Algorithm 2 Dyna-SGFT ... Algorithm 3 MPC-SGFT |
| Open Source Code | Yes | Project webpage: weirdlabuw.github.io/sgft |
| Open Datasets | No | The paper describes custom real-world manipulation tasks (hammering, insertion, pushing) and refers to the 'FurnitureBench' task, but does not provide specific access information (links, DOIs, repositories, or formal citations) for the datasets *used in their experiments* or state that *their collected data* is publicly available. |
| Dataset Splits | No | The paper mentions pre-collecting "20 real-world trajectories" for fine-tuning but does not specify formal training, validation, or test splits for the experimental evaluation. While it references standard sim-to-sim setups, it doesn't provide specific split information for its primary real-world experiments. |
| Hardware Specification | No | The paper describes the robotic setup: "We use a 7-DoF Franka FR3 robot with a 1-DoF parallel-jaw gripper. Two calibrated Intel RealSense D455 cameras are mounted across from the robot... Commands are sent to the controller at 5Hz." However, it does not specify the computing hardware (e.g., GPU models, CPU types, memory) used for training the models. |
| Software Dependencies | No | The paper mentions using specific algorithms and optimizers like "SAC", "TDMPC-2", "IQL", "RLPD", and "Adam optimizer". However, it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | We train SAC with autotuned temperature set initially to 1 and a UTD of 1. We use the Adam optimizer with a learning rate of 3×10⁻⁴, batch size of 256, and discount factor γ = 0.99. ... For hammering and puck pushing, we collect 25,000,000 transitions of random actions... We normalize our observations by the pre-computed mean and standard deviation... We additionally add Gaussian noise centered at 0 with standard deviation 0.004 to our observations with 30% probability during training. ... We pre-collect 20 real-world trajectories with the policy learned in simulation to fill the empty replay buffer. We then reset the critic with random weights and continue training SAC with a fixed temperature of α = 0.01 and with a UTD of 2 with the pretrained actor and dynamics model. |
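The observation preprocessing quoted in the Experiment Setup row (normalization by pre-computed statistics, plus zero-mean Gaussian noise with standard deviation 0.004 applied with 30% probability during training) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and the epsilon guard are our own assumptions.

```python
import numpy as np

def preprocess_obs(obs, mean, std, noise_std=0.004, noise_prob=0.3,
                   training=True, rng=None):
    """Normalize an observation by pre-computed mean/std, then (during
    training) add N(0, noise_std) noise with probability noise_prob.
    The 1e-8 term guarding against zero std is an assumption."""
    rng = rng or np.random.default_rng()
    normed = (obs - mean) / (std + 1e-8)
    if training and rng.random() < noise_prob:
        normed = normed + rng.normal(0.0, noise_std, size=normed.shape)
    return normed

# Example: an observation equal to the dataset mean normalizes to zero.
mean = np.array([1.0, 2.0, 3.0])
std = np.array([0.5, 1.0, 2.0])
out = preprocess_obs(np.array([1.0, 2.0, 3.0]), mean, std, training=False)
```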
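The fine-tuning stage described above keeps the pretrained actor and dynamics model but re-draws the critic's weights before continuing SAC on real-world data. A toy sketch of that critic-reset step, using plain arrays in place of the actual networks (all names here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the networks trained in simulation.
actor_params = {"w": rng.normal(size=(4, 2))}
critic_params = {"w": rng.normal(size=(4, 1))}

def reset_critic(params, rng):
    """Re-initialize every critic parameter with fresh random weights,
    preserving shapes, while the pretrained actor is left untouched."""
    return {k: rng.normal(size=v.shape) for k, v in params.items()}

actor_before = {k: v.copy() for k, v in actor_params.items()}
critic_params = reset_critic(critic_params, rng)
# Fine-tuning would then resume SAC (fixed α = 0.01, UTD of 2) from here.
```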