BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly

Authors: Yan Shen, Ruihai Wu, Yubin Ke, Xinyuan Song, Zeyi Li, Xiaoqi Li, Hongwei Fan, Haoran Lu, Hao Dong

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate the superiority of our approach over both previous affordance-based and imitation-based methods. Project page: https://sites.google.com/view/biassembly/." ... "Extensive experiments on diverse categories demonstrate the superiority of our method both quantitatively and qualitatively."
Researcher Affiliation | Academia | "1CFCS, School of Computer Science, Peking University; 2PKU-Agibot Lab. Correspondence to: Hao Dong <EMAIL>."
Pseudocode | No | The paper describes its methodology in prose and mathematical formulations in Sections 3 and 4, with further details in Appendix C. It does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper lists a project page ("Project page: https://sites.google.com/view/biassembly/"), but it does not explicitly state that source code for the described methodology is available there, nor does it link directly to a code repository.
Open Datasets | Yes | "We utilize the Breaking Bad Dataset (Sellán et al., 2022), which models the natural destruction of objects into fragments." ... "To address this, we further introduce a real-world benchmark featuring globally available objects with reproducible broken parts, along with their corresponding 3D meshes, which can be integrated into the simulation environment."
Dataset Splits | Yes | "We randomly select 10 of the 15 categories for training, reserving the remaining 5 categories exclusively for testing. Within the 10 training categories, 60% of the shapes are randomly chosen for the training set, while the remaining 40% serve as a test set to assess the model's performance on novel instances within the training categories (shape-level). For the reserved 5 categories, all shapes are included in the test set to evaluate the method's generalization to unseen categories (category-level). In summary, the training set consists of 10 categories, totaling 237 shapes and 6,403 pairs of fragments. The shape-level test set includes 10 categories, comprising 131 shapes and 3,638 pairs of fragments. The category-level test set encompasses 5 categories, containing 77 shapes and 1,779 pairs of fragments. Detailed statistics for each category can be found in Table 2."
Hardware Specification | Yes | "Using a single NVIDIA V100 GPU, the total training time for our model is approximately 48 hours: the combination of the Disassembly Predictor and Transformation Predictor converges in about 20 hours, while the Bi-Affordance Predictor converges in about 48 hours."
Software Dependencies | No | The paper mentions various tools and libraries such as SAPIEN (Xiang et al., 2020), COLMAP (Schönberger & Frahm, 2016), Grounded SAM 2 (Ren et al., 2024a), Depth Anything V2 (Yang et al., 2024), SDFStudio (Yu et al., 2022), Blender (Community, 2018), Robot Operating System (ROS) (Quigley et al., 2009), the frankapy library (Zhang et al., 2020), and the pyk4a library (pyk4a, 2019). However, it does not provide specific version numbers for any of these software components as used in the experiments.
Experiment Setup | No | The paper does not report training hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer settings. It states only that "For each method, we provide 7,000 positive and 7,000 negative samples", which concerns dataset usage rather than training configuration.
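The two-level split quoted under "Dataset Splits" (10 of 15 categories for training; a 60/40 shape split within the training categories; the 5 held-out categories reserved for a category-level test set) can be sketched as follows. This is a hypothetical illustration, not the authors' code: the function name `split_dataset` and the `shapes_by_category` input (a mapping from category name to a list of shape IDs) are assumptions for the example.

```python
import random

def split_dataset(shapes_by_category, n_train_categories=10,
                  train_ratio=0.6, seed=0):
    """Reproduce the described category- and shape-level split.

    Returns (train_set, shape_level_test, category_level_test):
    - train_set: 60% of shapes from the randomly chosen training categories
    - shape_level_test: the remaining 40% of shapes from those categories
    - category_level_test: all shapes from the held-out categories
    """
    rng = random.Random(seed)
    categories = sorted(shapes_by_category)
    train_categories = set(rng.sample(categories, n_train_categories))

    train_set, shape_level_test, category_level_test = [], [], []
    for cat in categories:
        shapes = list(shapes_by_category[cat])
        if cat in train_categories:
            # Randomly assign 60% of this category's shapes to training.
            rng.shuffle(shapes)
            cut = int(len(shapes) * train_ratio)
            train_set.extend(shapes[:cut])
            shape_level_test.extend(shapes[cut:])
        else:
            # Held-out category: every shape goes to the category-level test.
            category_level_test.extend(shapes)
    return train_set, shape_level_test, category_level_test
```

With 15 categories of 10 shapes each, this yields 60 training shapes, 40 shape-level test shapes, and 50 category-level test shapes, mirroring the proportions (though not the exact counts) reported in the paper.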