Real2Code: Reconstruct Articulated Objects via Code Generation

Authors: Mandi Zhao, Yijia Weng, Dominik Bauer, Shuran Song

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results demonstrate that Real2Code significantly outperforms the previous state-of-the-art in terms of reconstruction accuracy, and is the first approach to extrapolate beyond objects' structural complexity in the training set, as we show for objects with up to 10 articulated parts."
Researcher Affiliation | Academia | Zhao Mandi (1), Yijia Weng (1), Dominik Bauer (2), Shuran Song (1). Affiliations: (1) Stanford University, (2) Columbia University.
Pseudocode | No | The paper describes the methodology in prose and through diagrams (e.g., Figure 2 for the pipeline overview, Figure 4 for articulation prediction as code) but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Project Website: https://real2code.github.io/"
Open Datasets | Yes | "For a more systematic evaluation, we validate the performance of Real2Code on the well-established PartNet-Mobility dataset (Mo et al., 2019), using an extensive test set of unseen objects that contain various numbers of articulated parts."
Dataset Splits | Yes | "The same split of 467 train and 35 test objects are used to construct our image segmentation, shape completion, and code datasets."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions various software components and models, including Blender (Community, 2018), MuJoCo (Todorov et al., 2012), Kaolin (Fuji Tsang et al., 2022), SAM (Kirillov et al., 2023), Code Llama (Rozière et al., 2023), and DUSt3R (Wang et al., 2023b). However, it does not provide specific version numbers for underlying programming languages or libraries (e.g., Python, PyTorch, CUDA), nor for the tools themselves in a clear software-version format (e.g., Blender 2.x, MuJoCo 2.x).
Experiment Setup | Yes | "The fine-tuning data consists of 28,020 RGB images... Each fine-tuning batch contains 24 RGB images; for every RGB image in the batch, we sample 16 prompt points uniformly across each image's ground-truth masks... we update the model with a weighted average of Focal Loss (Lin et al., 2018), Dice Loss (Sudre et al., 2017) and MSE IoU prediction loss. ... For a training batch of size B, we sample B point clouds of size 2048, and sample B x 12,000 query points on the label occupancy grids... sampling 25% occupied works the best for balancing between occupied areas and empty space... We use LoRA (Hu et al., 2021), a low-rank weight fine-tuning technique, to fine-tune the model with the next-token prediction loss."
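The quoted setup says the segmentation model is updated with a weighted average of Focal Loss, Dice Loss, and an MSE loss on the predicted IoU, but the paper excerpt does not give the weights. The sketch below illustrates how such a combined objective could be assembled; the 20:1 focal-to-dice weighting is an assumption borrowed from the original SAM training recipe, and all function names (`sam_finetune_loss`, `focal_loss`, `dice_loss`) are hypothetical, not from the Real2Code codebase.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Focal loss (Lin et al., 2018) on binary mask logits, mean over pixels."""
    p = _sigmoid(logits)
    ce = -(targets * np.log(p + 1e-8) + (1 - targets) * np.log(1 - p + 1e-8))
    p_t = p * targets + (1 - p) * (1 - targets)        # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return float(np.mean(alpha_t * (1 - p_t) ** gamma * ce))

def dice_loss(logits, targets, eps=1.0):
    """Dice loss (Sudre et al., 2017): 1 - soft Dice coefficient per mask."""
    p = _sigmoid(logits).reshape(logits.shape[0], -1)
    t = targets.reshape(targets.shape[0], -1)
    inter = (p * t).sum(-1)
    dice = (2 * inter + eps) / (p.sum(-1) + t.sum(-1) + eps)
    return float(np.mean(1.0 - dice))

def sam_finetune_loss(mask_logits, gt_masks, iou_pred,
                      w_focal=20.0, w_dice=1.0, w_iou=1.0):
    """Weighted combination of focal, dice, and MSE IoU-prediction losses.

    The 20:1:1 weighting is an assumption (SAM's recipe); Real2Code does not
    report its weights. `iou_pred` is the model's per-mask IoU estimate,
    regressed against the actual IoU of the thresholded prediction.
    """
    pred = (mask_logits > 0).astype(float).reshape(mask_logits.shape[0], -1)
    gt = gt_masks.reshape(gt_masks.shape[0], -1)
    inter = (pred * gt).sum(-1)
    union = pred.sum(-1) + gt.sum(-1) - inter
    actual_iou = inter / np.maximum(union, 1.0)
    mse_iou = float(np.mean((iou_pred - actual_iou) ** 2))
    return (w_focal * focal_loss(mask_logits, gt_masks)
            + w_dice * dice_loss(mask_logits, gt_masks)
            + w_iou * mse_iou)
```

A confident, correct prediction (large logits of the right sign, accurate IoU estimate) drives all three terms toward zero, while a flipped mask is penalized by every term at once.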