SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks

Authors: Yijie Guo, Bingjie Tang, Iretiayo Akinola, Dieter Fox, Abhishek Gupta, Yashraj Narang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that SRSA significantly outperforms the leading baseline. When retrieving and fine-tuning skills on unseen tasks, SRSA achieves a 19% relative improvement in success rate, exhibits 2.6x lower standard deviation across random seeds, and requires 2.4x fewer transition samples to reach a satisfactory success rate, compared to the baseline. In a continual learning setup, SRSA efficiently learns policies for new tasks and incorporates them into the skill library, enhancing future policy learning. Furthermore, policies trained with SRSA in simulation achieve a 90% mean success rate when deployed in the real world. Please visit our project webpage https://srsa2024.github.io/.
Researcher Affiliation | Collaboration | Yijie Guo [1], Bingjie Tang [2], Iretiayo Akinola [1], Dieter Fox [1,3], Abhishek Gupta [1,3] & Yashraj Narang [1]. [1] NVIDIA Corporation, [2] University of Southern California, [3] University of Washington
Pseudocode | Yes | Algorithm 1: Policy Fine-tuning with Self-imitation Learning; Algorithm 2: Continual Learning with Skill Library Expansion
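The self-imitation learning (SIL) component named in Algorithm 1 follows Oh et al. (2018): the agent imitates only those past transitions whose observed return exceeded the current value estimate. A minimal NumPy sketch of the SIL loss, assuming per-transition log-probabilities, value estimates, and Monte Carlo returns are already available (the function name and inputs are illustrative, not the authors' code):

```python
import numpy as np

def sil_loss(log_probs, values, returns, beta=0.01):
    """Self-imitation learning loss (Oh et al., 2018).

    Only transitions with positive advantage (R - V)_+ contribute,
    i.e. the policy imitates its own past successes. `beta` matches
    the paper's SIL value loss weight of 0.01.
    """
    advantage = np.maximum(returns - values, 0.0)        # (R - V)_+
    policy_loss = -(log_probs * advantage).mean()        # weighted imitation
    value_loss = 0.5 * (advantage ** 2).mean()           # pull V up toward R
    return policy_loss + beta * value_loss

# Illustrative call with two transitions (values are made up):
loss = sil_loss(
    log_probs=np.array([-0.5, -1.0]),
    values=np.array([1.0, 2.0]),
    returns=np.array([2.0, 1.5]),
)
```

Note that the second transition has return below the value estimate, so its advantage clips to zero and it contributes nothing to the policy term.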
Open Source Code | No | The paper provides a project webpage link (https://srsa2024.github.io/) but does not explicitly state that the source code for the methodology is available there, nor does it provide a direct link to a code repository. The text states "Please visit our project webpage", which is a high-level overview page rather than a specific code repository.
Open Datasets | Yes | We investigate these questions on the AutoMate benchmark (Tang et al., 2024), which consists of 100 two-part assembly tasks with diverse parts, enabling us to study challenging contact-rich assembly tasks in simulation and the real world.
Dataset Splits | Yes | Given the 100 tasks in the AutoMate benchmark, we split the task set into 90 prior tasks (to build the skill library) and 10 test tasks (as the new tasks to solve). For both SRSA and baseline methods, we train the retrieval model with three random seeds and report the average and standard deviation of transfer success across these seeds. Fig. 4 shows the result on the set of test tasks. In Appendix A.5, we show additional comparisons for other splits of prior and test task sets.
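The 90/10 prior/test protocol can be sketched in a few lines. This is an illustrative seeded split, not the authors' published partition (the paper reports specific splits in Appendix A.5):

```python
import random

def split_tasks(task_ids, n_test=10, seed=0):
    """Split AutoMate's 100 tasks into prior tasks (used to build the
    skill library) and held-out test tasks, per the 90/10 protocol."""
    rng = random.Random(seed)          # seeded for reproducibility
    ids = list(task_ids)
    rng.shuffle(ids)
    return ids[n_test:], ids[:n_test]  # 90 prior tasks, 10 test tasks

prior_tasks, test_tasks = split_tasks(range(100))
```

Re-running with different seeds yields the alternative prior/test splits the paper uses for its additional comparisons.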
Hardware Specification | No | The paper mentions using a Franka robot for real-world experiments and simulation environments, but it does not specify the computing hardware (e.g., GPU models, CPU types, memory) used for training the models.
Software Dependencies | No | The paper mentions several algorithms and tools, such as PPO (Schulman et al., 2017), the Adam optimizer, self-imitation learning (Oh et al., 2018), and PointNet (Qi et al., 2017), but it does not specify version numbers for any software libraries or dependencies. Table 1 lists hyperparameters but not software versions.
Experiment Setup | Yes | Table 1 hyperparameters:

Hyperparameter | Value
Policy network architecture | [256, 128, 64]
Value function architecture | [256, 128, 64]
LSTM network size | 256
Horizon length (T) | 32
Adam learning rate | 1e-4
Discount factor (γ) | 0.99
GAE parameter (λ) | 0.95
Entropy coefficient | 0.0
Critic coefficient | 2
Minibatch size | 8192
Minibatch epochs | 8
Clipping parameter (ε) | 0.2
SIL updates per iteration | 1
SIL batch size | 8192
SIL loss weight | 1
SIL value loss weight (β) | 0.01
Replay buffer size | 10^5
Exponent for prioritization | 0.6
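For reference, the reported hyperparameters can be collected into a single configuration fragment. This is a hedged sketch only: the key names are paraphrased descriptions of Table 1 entries, not the authors' actual configuration file or training framework:

```python
# PPO training hyperparameters as reported in the paper's Table 1
# (key names are illustrative; values are from the paper).
PPO_CONFIG = {
    "policy_net_arch": [256, 128, 64],
    "value_net_arch": [256, 128, 64],
    "lstm_size": 256,
    "horizon_length": 32,          # T
    "adam_lr": 1e-4,
    "gamma": 0.99,                 # discount factor
    "gae_lambda": 0.95,
    "entropy_coef": 0.0,
    "critic_coef": 2,
    "minibatch_size": 8192,
    "minibatch_epochs": 8,
    "clip_epsilon": 0.2,
}

# Self-imitation learning hyperparameters from the same table.
SIL_CONFIG = {
    "updates_per_iteration": 1,
    "batch_size": 8192,
    "loss_weight": 1,
    "value_loss_weight_beta": 0.01,
    "replay_buffer_size": 10**5,
    "prioritization_exponent": 0.6,
}
```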