Self-Correcting Robot Manipulation via Gaussian-Splatted Foresight

Authors: Shaohui Pan, Yong Xu, Ruotao Xu, Zihan Zhou, Si Wu, Zhuliang Yu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluations on ten RLBench tasks with 166 variations demonstrate the superior performance of the proposed method, which outperforms state-of-the-art methods by 12.0% success rate on average.
Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China; (2) Institute for Super Robotics (Huangpu), Guangzhou, 510000, China; (3) Guangdong Provincial Key Laboratory of Multimodal Big Data Intelligent Analysis, Guangzhou, 510000, China; (4) Peng Cheng Laboratory of Shenzhen, Shenzhen, 518120, China; (5) Key Laboratory of Large-Model Embodied-Intelligent Humanoid Robot, 510000, China; (6) College of Mathematics and Informatics, South China Agricultural University, Guangzhou, 510640, China; (7) Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, 510006, China; (8) School of Automation Science and Engineering, South China University of Technology, Guangzhou, 510006, China
Pseudocode | Yes | Algorithm 1: Self-correction Algorithm
Open Source Code | No | The paper does not explicitly state that the authors are releasing their own code, nor does it provide a direct link to a code repository for the described methodology.
Open Datasets | Yes | Following GNFactor (Ze et al. 2023), we collected 20 episodes of demonstrations for each of 10 challenging language-conditioned manipulation tasks in the dataset collected by PerAct (Shridhar, Manuelli, and Fox 2023b), including 166 variations in object properties and scene arrangements. To further validate the performance of the proposed method, we collected 20 episodes of demonstrations for each of 6 tasks from the HiveFormer (Guhur et al. 2023b) dataset, all of which are commonly encountered in real-life scenarios.
Dataset Splits | Yes | Following GNFactor (Ze et al. 2023), we collected 20 episodes of demonstrations for each of 10 challenging language-conditioned manipulation tasks in the dataset collected by PerAct (Shridhar, Manuelli, and Fox 2023b), including 166 variations in object properties and scene arrangements. To further validate the performance of the proposed method, we collected 20 episodes of demonstrations for each of 6 tasks from the HiveFormer (Guhur et al. 2023b) dataset... We evaluate 25 episodes per task at the final checkpoint utilizing 3 random seeds across 10 challenging tasks.
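The evaluation protocol quoted above (25 episodes per task, 3 random seeds, 10 tasks) can be sketched as a simple averaging loop. This is an illustrative sketch, not the authors' code; the `rollout` callable is a hypothetical placeholder for a policy rollout that returns whether the episode succeeded.

```python
def mean_success_rate(rollout, tasks, seeds=(0, 1, 2), episodes=25):
    """Average success over tasks x seeds x episodes.

    `rollout(task, seed, episode)` is a hypothetical placeholder that
    runs one evaluation episode and returns True on task success.
    """
    results = [
        rollout(task, seed, ep)
        for task in tasks
        for seed in seeds
        for ep in range(episodes)
    ]
    return sum(results) / len(results)
```

With 10 tasks this corresponds to 10 × 3 × 25 = 750 evaluation episodes per reported number.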
Hardware Specification | Yes | Our experiments were conducted on the PyTorch deep learning platform utilizing two A800 GPUs.
Software Dependencies | No | The paper mentions the "PyTorch deep learning platform" and the "LAMB optimizer" but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The hyper-parameters for the experiment are configured as follows: the physical workspace spans 1 m³ with a voxel resolution of 100, the number of points in the scanned point cloud is set to 16,384, the number of Gaussian primitives N is set to 16,384, the Gaussian noise ϵ is sampled from a standard normal distribution scaled by 0.001, and the predefined threshold τ is set to 0.40. We employ SE(3) augmentation (Shridhar, Manuelli, and Fox 2023b; Ze et al. 2023; Lu et al. 2025) for expert demonstrations in the training set. All comparative methods were trained on the PerAct dataset for 300,000 iterations and on the HiveFormer dataset for 100,000 iterations, both with a batch size of 2. The LAMB optimizer (You et al. 2019) was used with an initial learning rate of 5×10⁻⁴ with a cosine scheduler. During training, the model is trained for 3,000 iterations in the first phase, i.e., with only the action loss ℓ_action, and then for the remaining iterations in the second phase.