Real2Code: Reconstruct Articulated Objects via Code Generation
Authors: Mandi Zhao, Yijia Weng, Dominik Bauer, Shuran Song
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that Real2Code significantly outperforms the previous state-of-the-art in terms of reconstruction accuracy, and is the first approach to extrapolate beyond objects' structural complexity in the training set, as we show for objects with up to 10 articulated parts. |
| Researcher Affiliation | Academia | Zhao Mandi¹, Yijia Weng¹, Dominik Bauer², Shuran Song¹ (¹Stanford University, ²Columbia University) |
| Pseudocode | No | The paper describes the methodology in prose and through diagrams (e.g., Figure 2 for pipeline overview, Figure 4 for articulation prediction as code) but does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project Website: https://real2code.github.io/ |
| Open Datasets | Yes | For a more systematic evaluation, we validate the performance of Real2Code on the well-established PartNet-Mobility dataset (Mo et al., 2019), using an extensive test set of unseen objects that contain various numbers of articulated parts. |
| Dataset Splits | Yes | The same split of 467 train and 35 test objects are used to construct our image segmentation, shape completion, and code datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions various software components and models like Blender (Community, 2018), MuJoCo (Todorov et al., 2012), Kaolin (Fuji Tsang et al., 2022), SAM (Kirillov et al., 2023), Code Llama (Rozière et al., 2023), and DUSt3R (Wang et al., 2023b). However, it does not provide specific version numbers for underlying programming languages or libraries (e.g., Python, PyTorch, CUDA) or for the tools themselves in a clear software version format (e.g., Blender 2.x, MuJoCo 2.x). |
| Experiment Setup | Yes | The fine-tuning data consists of 28,020 RGB images... Each fine-tuning batch contains 24 RGB images; for every RGB image in the batch, we sample 16 prompt points uniformly across each image's ground-truth masks... we update the model with a weighted average of Focal Loss (Lin et al., 2018), Dice Loss (Sudre et al., 2017) and MSE IoU prediction loss. ... For a training batch of size B, we sample B point clouds of size 2048, and sample B × 12,000 query points on the label occupancy grids... sampling 25% occupied works the best for balancing between occupied areas and empty space... We use LoRA (Hu et al., 2021), a low-rank weight fine-tuning technique, to fine-tune the model with the next-token prediction loss. |
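The segmentation fine-tuning objective quoted above (a weighted average of Focal Loss, Dice Loss, and an MSE loss on the predicted IoU) can be illustrated with a minimal NumPy sketch. The weights, function names, and 0.5 binarization threshold below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def focal_loss(pred, target, gamma=2.0, eps=1e-6):
    # Focal loss (Lin et al., 2018): down-weights easy pixels by (1 - p_t)^gamma.
    # pred holds per-pixel foreground probabilities, target holds 0/1 labels.
    p_t = np.where(target == 1, pred, 1.0 - pred)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t + eps)))

def dice_loss(pred, target, eps=1e-6):
    # Dice loss (Sudre et al., 2017): one minus the soft overlap score.
    inter = np.sum(pred * target)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps))

def segmentation_loss(pred, target, iou_pred,
                      w_focal=1.0, w_dice=1.0, w_iou=1.0):
    # Weighted average of focal, dice, and MSE IoU-prediction losses,
    # as described in the Experiment Setup row. Weights are placeholders.
    hard = (pred > 0.5).astype(float)          # binarize for the IoU target
    inter = np.sum(hard * target)
    union = np.sum(hard) + np.sum(target) - inter
    iou_true = inter / max(union, 1.0)
    mse_iou = (iou_pred - iou_true) ** 2       # supervise the IoU head
    return (w_focal * focal_loss(pred, target)
            + w_dice * dice_loss(pred, target)
            + w_iou * mse_iou)
```

A perfect prediction with a correct IoU estimate drives all three terms toward zero, while a flipped mask inflates both the focal and dice terms.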