Self-Correcting Robot Manipulation via Gaussian-Splatted Foresight

Authors: Shaohui Pan, Yong Xu, Ruotao Xu, Zihan Zhou, Si Wu, Zhuliang Yu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluations on ten RLBench tasks with 166 variations demonstrate the superior performance of the proposed method, which outperforms state-of-the-art methods by 12.0% success rate on average.
Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China; (2) Institute for Super Robotics (Huangpu), Guangzhou, 510000, China; (3) Guangdong Provincial Key Laboratory of Multimodal Big Data Intelligent Analysis, Guangzhou, 510000, China; (4) Peng Cheng Laboratory of Shenzhen, Shenzhen, 518120, China; (5) Key Laboratory of Large-Model Embodied-Intelligent Humanoid Robot, 510000, China; (6) College of Mathematics and Informatics, South China Agricultural University, Guangzhou, 510640, China; (7) Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, 510006, China; (8) School of Automation Science and Engineering, South China University of Technology, Guangzhou, 510006, China
Pseudocode | Yes | Algorithm 1: Self-correction Algorithm
Open Source Code | No | The paper does not explicitly state that the authors are releasing their own code, nor does it provide a direct link to a code repository for the described methodology.
Open Datasets | Yes | Following GNFactor (Ze et al. 2023), we collected 20 episodes of demonstrations for each of 10 challenging language-conditioned manipulation tasks in the dataset collected by PerAct (Shridhar, Manuelli, and Fox 2023b), including 166 variations in object properties and scene arrangements. To further validate the performance of the proposed method, we collected 20 episodes of demonstrations for each of 6 tasks from the HiveFormer (Guhur et al. 2023b) dataset, all of which are commonly encountered in real-life scenarios.
Dataset Splits | Yes | Following GNFactor (Ze et al. 2023), we collected 20 episodes of demonstrations for each of 10 challenging language-conditioned manipulation tasks in the dataset collected by PerAct (Shridhar, Manuelli, and Fox 2023b), including 166 variations in object properties and scene arrangements. To further validate the performance of the proposed method, we collected 20 episodes of demonstrations for each of 6 tasks from the HiveFormer (Guhur et al. 2023b) dataset... We evaluate 25 episodes per task at the final checkpoint utilizing 3 random seeds across 10 challenging tasks.
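The evaluation protocol quoted above (25 episodes per task, 3 random seeds, 10 tasks) can be sketched as a simple averaging loop. This is an illustrative sketch, not the authors' code; the `rollout` callable is a hypothetical placeholder for a policy rollout that returns whether the episode succeeded.

```python
def mean_success_rate(rollout, tasks, seeds=(0, 1, 2), episodes=25):
    """Average success over tasks x seeds x episodes.

    `rollout(task, seed, episode)` is a hypothetical placeholder that
    runs one evaluation episode and returns True on task success.
    """
    results = [
        rollout(task, seed, ep)
        for task in tasks
        for seed in seeds
        for ep in range(episodes)
    ]
    return sum(results) / len(results)
```

With 10 tasks this corresponds to 10 × 3 × 25 = 750 evaluation episodes per reported number.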
Hardware Specification | Yes | Our experiments were conducted on the PyTorch deep learning platform utilizing two A800 GPUs.
Software Dependencies | No | The paper mentions the "PyTorch deep learning platform" and the "LAMB optimizer" but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The hyper-parameters for the experiment are configured as follows: the physical workspace spans 1 m³ with a voxel resolution of 100, the number of points in the scanned point cloud is set to 16,384, the number of Gaussian primitives N is set to 16,384, the Gaussian noise ϵ is sampled from a standard normal distribution scaled by 0.001, and the predefined threshold τ is set to 0.40. We employ SE(3) augmentation (Shridhar, Manuelli, and Fox 2023b; Ze et al. 2023; Lu et al. 2025) for expert demonstrations in the training set. All comparative methods were trained on the PerAct dataset for 300,000 iterations and on the HiveFormer dataset for 100,000 iterations, both with a batch size of 2. The LAMB optimizer (You et al. 2019) was used with an initial learning rate of 5×10⁻⁴ with a cosine scheduler. During training, the model is trained for 3,000 iterations in the first phase, i.e., with only the action loss ℓ_action, and then for the remaining iterations in the second phase.