Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
Authors: Patrick Yin, Tyler Westenbroek, Ching-An Cheng, Andrey Kolobov, Abhishek Gupta
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this approach across five real-world dexterous manipulation tasks where zero-shot sim-to-real transfer fails. We further demonstrate our framework substantially outperforms baseline fine-tuning methods, requiring up to an order of magnitude fewer real-world samples and succeeding at difficult tasks where prior approaches fail entirely. |
| Researcher Affiliation | Collaboration | 1. University of Washington 2. Microsoft Research |
| Pseudocode | Yes | Algorithm 1 Simulation-Guided Fine-tuning (SGFT) ... Algorithm 2 Dyna-SGFT ... Algorithm 3 MPC-SGFT |
| Open Source Code | Yes | Project webpage: weirdlabuw.github.io/sgft |
| Open Datasets | No | The paper describes custom real-world manipulation tasks (hammering, insertion, pushing) and refers to the 'FurnitureBench' task, but does not provide specific access information (links, DOIs, repositories, or formal citations) for the datasets *used in their experiments* or state that *their collected data* is publicly available. |
| Dataset Splits | No | The paper mentions pre-collecting "20 real-world trajectories" for fine-tuning but does not specify formal training, validation, or test splits for the experimental evaluation. While it references standard sim-to-sim setups, it doesn't provide specific split information for its primary real-world experiments. |
| Hardware Specification | No | The paper describes the robotic setup: "We use a 7-DoF Franka FR3 robot with a 1-DoF parallel-jaw gripper. Two calibrated Intel RealSense D455 cameras are mounted across from the robot... Commands are sent to the controller at 5Hz." However, it does not specify the computing hardware (e.g., GPU models, CPU types, memory) used for training the models. |
| Software Dependencies | No | The paper mentions using specific algorithms and optimizers like "SAC", "TDMPC-2", "IQL", "RLPD", and "Adam optimizer". However, it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | We train SAC with autotuned temperature set initially to 1 and a UTD of 1. We use the Adam optimizer with a learning rate of 3×10⁻⁴, batch size of 256, and discount factor γ = 0.99. ... For hammering and puck pushing, we collect 25,000,000 transitions of random actions... We normalize our observations by the pre-computed mean and standard deviation... We additionally add Gaussian noise centered at 0 with standard deviation 0.004 to our observations with 30% probability during training. ... We pre-collect 20 real-world trajectories with the policy learned in simulation to fill the empty replay buffer. We then reset the critic with random weights and continue training SAC with a fixed temperature of α = 0.01 and with a UTD of 2 with the pretrained actor and dynamics model. |
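The observation preprocessing quoted in the Experiment Setup row (normalization by pre-computed statistics, plus zero-mean Gaussian noise with standard deviation 0.004 applied with 30% probability during training) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and the epsilon guard are our own assumptions.

```python
import numpy as np

def preprocess_obs(obs, mean, std, noise_std=0.004, noise_prob=0.3,
                   training=True, rng=None):
    """Normalize an observation by pre-computed mean/std, then (during
    training) add N(0, noise_std) noise with probability noise_prob.
    The 1e-8 term guarding against zero std is an assumption."""
    rng = rng or np.random.default_rng()
    normed = (obs - mean) / (std + 1e-8)
    if training and rng.random() < noise_prob:
        normed = normed + rng.normal(0.0, noise_std, size=normed.shape)
    return normed

# Example: an observation equal to the dataset mean normalizes to zero.
mean = np.array([1.0, 2.0, 3.0])
std = np.array([0.5, 1.0, 2.0])
out = preprocess_obs(np.array([1.0, 2.0, 3.0]), mean, std, training=False)
```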
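The fine-tuning stage described above keeps the pretrained actor and dynamics model but re-draws the critic's weights before continuing SAC on real-world data. A toy sketch of that critic-reset step, using plain arrays in place of the actual networks (all names here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the networks trained in simulation.
actor_params = {"w": rng.normal(size=(4, 2))}
critic_params = {"w": rng.normal(size=(4, 1))}

def reset_critic(params, rng):
    """Re-initialize every critic parameter with fresh random weights,
    preserving shapes, while the pretrained actor is left untouched."""
    return {k: rng.normal(size=v.shape) for k, v in params.items()}

actor_before = {k: v.copy() for k, v in actor_params.items()}
critic_params = reset_critic(critic_params, rng)
# Fine-tuning would then resume SAC (fixed α = 0.01, UTD of 2) from here.
```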