IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
Authors: Zhibing Li, Tong Wu, Jing Tan, Mengchen Zhang, Jiaqi Wang, Dahua Lin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that IDArb outperforms state-of-the-art methods both qualitatively and quantitatively. Moreover, our approach facilitates a range of downstream tasks, including single-image relighting, photometric stereo, and 3D reconstruction, highlighting its broad applications in realistic 3D content creation. |
| Researcher Affiliation | Collaboration | 1 The Chinese University of Hong Kong; 2 Zhejiang University; 3 Shanghai AI Laboratory; 4 CPII under InnoHK |
| Pseudocode | No | The paper describes the methodology in prose and through diagrams (Figure 2, an overview of IDArb and its attention block) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Project website: https://lizb6626.github.io/IDArb/. This is a project website that may link to code, but it is not a direct link to a source-code repository, nor does the paper contain an explicit statement that code for the described methodology has been released. |
| Open Datasets | Yes | To address these shortcomings, we develop a custom dataset, Arb-Objaverse. We select 68k 3D models from Objaverse (Deitke et al., 2022), and filter out low-quality and texture-less cases. For training, we further enhance the variability by combining this dataset with G-Objaverse and ABO. For synthetic data, we sample 441 objects from Arb-Objaverse and G-Objaverse, selecting four viewpoints for each object. We conduct experiments on the real-world Open Illumination dataset (Liu et al., 2024a) and the synthetic NeRFactor dataset (Zhang et al., 2021b). Additionally, we conduct experiments on standard benchmarks, MIT-Intrinsic (Grosse et al., 2009) and Stanford-ORB (Kuang et al., 2023). |
| Dataset Splits | No | For synthetic data, we sample 441 objects from Arb-Objaverse and G-Objaverse, selecting four viewpoints for each object. During training, the number of input images N is randomly set to 3 or 1 per object. The paper describes aspects of dataset usage and evaluation sets but does not provide specific training/validation/test splits (e.g., percentages or exact counts) for its primary or combined datasets. |
| Hardware Specification | Yes | The entire training procedure takes approximately 4 days on a cluster of 16 Nvidia Tesla A100 GPUs. |
| Software Dependencies | No | We finetune the UNet from the pretrained Stable Diffusion with the zero terminal SNR schedule (Lin et al., 2024). We utilize the v-prediction as training objective and the AdamW optimizer with a learning rate of 1e-4. The paper mentions software components like Stable Diffusion and the AdamW optimizer but does not provide specific version numbers for any key software libraries or frameworks. |
| Experiment Setup | Yes | We finetune the UNet from the pretrained Stable Diffusion with the zero terminal SNR schedule (Lin et al., 2024). We utilize the v-prediction as training objective and the AdamW optimizer with a learning rate of 1e-4. The model is trained on downsampled 256x256 resolution over 80,000 steps. During training, the number of input images N is randomly set to 3 or 1 per object. |
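
For context on the training objective named in the setup above: v-prediction (Salimans & Ho, 2022) trains the network to predict v = alpha_t * eps - sigma_t * x0 rather than the noise eps directly. The sketch below illustrates that target; the cosine schedule is a common illustrative choice, not necessarily the schedule used by IDArb.

```python
import math

def v_prediction_target(x0, noise, t, num_steps=1000):
    """Compute the v-prediction training target v = alpha_t * eps - sigma_t * x0.

    The cosine alpha/sigma schedule here is illustrative only; the paper's
    exact noise schedule (zero terminal SNR, Lin et al., 2024) differs.
    """
    alpha = math.cos(0.5 * math.pi * t / num_steps)  # signal coefficient
    sigma = math.sin(0.5 * math.pi * t / num_steps)  # noise coefficient
    return alpha * noise - sigma * x0

# At t = 0 (no noise), the target reduces to the noise itself;
# at the final step, it reduces to -x0.
```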