Real2Code: Reconstruct Articulated Objects via Code Generation

Authors: Mandi Zhao, Yijia Weng, Dominik Bauer, Shuran Song

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results demonstrate that Real2Code significantly outperforms the previous state-of-the-art in terms of reconstruction accuracy, and is the first approach to extrapolate beyond objects' structural complexity in the training set, as we show for objects with up to 10 articulated parts."
Researcher Affiliation | Academia | Zhao Mandi (1), Yijia Weng (1), Dominik Bauer (2), Shuran Song (1). Affiliations: (1) Stanford University, (2) Columbia University.
Pseudocode | No | The paper describes the methodology in prose and through diagrams (e.g., Figure 2 for the pipeline overview, Figure 4 for articulation prediction as code) but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Project Website: https://real2code.github.io/"
Open Datasets | Yes | "For a more systematic evaluation, we validate the performance of Real2Code on the well-established PartNet-Mobility dataset (Mo et al., 2019), using an extensive test set of unseen objects that contain various numbers of articulated parts."
Dataset Splits | Yes | "The same split of 467 train and 35 test objects are used to construct our image segmentation, shape completion, and code datasets."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions various software components and models, including Blender (Community, 2018), MuJoCo (Todorov et al., 2012), Kaolin (Fuji Tsang et al., 2022), SAM (Kirillov et al., 2023), Code Llama (Rozière et al., 2023), and DUSt3R (Wang et al., 2023b). However, it does not provide specific version numbers for underlying programming languages or libraries (e.g., Python, PyTorch, CUDA), nor for the tools themselves in a clear software-version format (e.g., Blender 2.x, MuJoCo 2.x).
Experiment Setup | Yes | "The fine-tuning data consists of 28,020 RGB images... Each fine-tuning batch contains 24 RGB images; for every RGB image in the batch, we sample 16 prompt points uniformly across each image's ground-truth masks... we update the model with a weighted average of Focal Loss (Lin et al., 2018), Dice Loss (Sudre et al., 2017) and MSE IoU prediction loss. ... For a training batch of size B, we sample B point clouds of size 2048, and sample B x 12,000 query points on the label occupancy grids... sampling 25% occupied works the best for balancing between occupied areas and empty space... We use LoRA (Hu et al., 2021), a low-rank weight fine-tuning technique, to fine-tune the model with the next-token prediction loss."
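The quoted setup says the segmentation model is updated with a weighted average of Focal Loss, Dice Loss, and an MSE loss on the predicted IoU, but the paper excerpt does not give the weights. The sketch below illustrates how such a combined objective could be assembled; the 20:1 focal-to-dice weighting is an assumption borrowed from the original SAM training recipe, and all function names (`sam_finetune_loss`, `focal_loss`, `dice_loss`) are hypothetical, not from the Real2Code codebase.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Focal loss (Lin et al., 2018) on binary mask logits, mean over pixels."""
    p = _sigmoid(logits)
    ce = -(targets * np.log(p + 1e-8) + (1 - targets) * np.log(1 - p + 1e-8))
    p_t = p * targets + (1 - p) * (1 - targets)        # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return float(np.mean(alpha_t * (1 - p_t) ** gamma * ce))

def dice_loss(logits, targets, eps=1.0):
    """Dice loss (Sudre et al., 2017): 1 - soft Dice coefficient per mask."""
    p = _sigmoid(logits).reshape(logits.shape[0], -1)
    t = targets.reshape(targets.shape[0], -1)
    inter = (p * t).sum(-1)
    dice = (2 * inter + eps) / (p.sum(-1) + t.sum(-1) + eps)
    return float(np.mean(1.0 - dice))

def sam_finetune_loss(mask_logits, gt_masks, iou_pred,
                      w_focal=20.0, w_dice=1.0, w_iou=1.0):
    """Weighted combination of focal, dice, and MSE IoU-prediction losses.

    The 20:1:1 weighting is an assumption (SAM's recipe); Real2Code does not
    report its weights. `iou_pred` is the model's per-mask IoU estimate,
    regressed against the actual IoU of the thresholded prediction.
    """
    pred = (mask_logits > 0).astype(float).reshape(mask_logits.shape[0], -1)
    gt = gt_masks.reshape(gt_masks.shape[0], -1)
    inter = (pred * gt).sum(-1)
    union = pred.sum(-1) + gt.sum(-1) - inter
    actual_iou = inter / np.maximum(union, 1.0)
    mse_iou = float(np.mean((iou_pred - actual_iou) ** 2))
    return (w_focal * focal_loss(mask_logits, gt_masks)
            + w_dice * dice_loss(mask_logits, gt_masks)
            + w_iou * mse_iou)
```

A confident, correct prediction (large logits of the right sign, accurate IoU estimate) drives all three terms toward zero, while a flipped mask is penalized by every term at once.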