SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects
Authors: Jiayi Liu, Denys Iliash, Angel Chang, Manolis Savva, Ali Mahdavi-Amiri
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our method outperforms the state-of-the-art in articulated object creation by a large margin in terms of the generated object realism, resemblance to the input image, and reconstruction quality. [...] 4 EXPERIMENTS |
| Researcher Affiliation | Academia | Jiayi Liu1, Denys Iliash1, Angel X. Chang1,2, Manolis Savva1, Ali Mahdavi-Amiri1 1Simon Fraser University, 2Canada-CIFAR AI Chair, Amii |
| Pseudocode | No | The paper describes the pipeline and model architecture in text and diagrams (Figures 1, 2, 3, 6) but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | 3dlg-hcvc.github.io/singapo |
| Open Datasets | Yes | We collect data from PartNet-Mobility dataset (Xiang et al., 2020) to train our model across 7 categories [...] We also use 135 objects from the ACD dataset (Iliash et al., 2024) for additional evaluation in the zero-shot manner to test the generalization capability. |
| Dataset Splits | Yes | With several augmentation strategies applied, we end up with 3,063 objects paired with 20 images rendered at resting state for training, and additional 77 objects paired with 2 random views for testing. We also use 135 objects from the ACD dataset (Iliash et al., 2024) for additional evaluation in the zero-shot manner to test the generalization capability. In total, we have 55K training samples and 424 test samples in the experiments. |
| Hardware Specification | Yes | Our model is trained on 4 NVIDIA A40 GPUs for 23 hours. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer but does not specify version numbers for any key software components like Python, PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | We train our model for 200 epochs after initializing from CAGE pretrained weights, with a batch size of 40 and 16 timesteps sampled from the diffusion process for each iteration. We use 1,000 diffusion steps in total. We use the AdamW optimizer (Loshchilov, 2017) with learning rate 5e-4 for the ICA module and 5e-5 for the base model parameters, and the beta values are set to (0.9, 0.99). We schedule the learning rate with 3 epochs of warm-up from 1e-6 to the initial learning rate and then cosine annealing to 1e-5. The network has 6 layers of attention blocks with 4 heads and 128 hidden units. |
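The learning-rate schedule quoted above (3 warm-up epochs from 1e-6, then cosine annealing to 1e-5 over 200 epochs) can be sketched as a standalone function. This is a minimal illustration, not the authors' code: the function name `lr_at_epoch` is hypothetical, and linear warm-up is assumed since the paper does not state the warm-up interpolation.

```python
import math

def lr_at_epoch(epoch, base_lr, warmup_epochs=3, total_epochs=200,
                warmup_start=1e-6, final_lr=1e-5):
    """Sketch of the reported schedule: linear warm-up from 1e-6 to the
    initial learning rate over 3 epochs, then cosine annealing to 1e-5.
    `base_lr` is 5e-4 for the ICA module or 5e-5 for base parameters."""
    if epoch < warmup_epochs:
        # Linear warm-up (interpolation shape assumed; paper only gives endpoints).
        t = epoch / warmup_epochs
        return warmup_start + t * (base_lr - warmup_start)
    # Cosine annealing from base_lr down to final_lr over the remaining epochs.
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * t))
```

For example, `lr_at_epoch(0, 5e-4)` returns the warm-up start (1e-6), epoch 3 reaches the initial rate (5e-4), and epoch 200 decays to the floor (1e-5).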