IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations

Authors: Zhibing Li, Tong Wu, Jing Tan, Mengchen Zhang, Jiaqi Wang, Dahua Lin

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that IDArb outperforms state-of-the-art methods both qualitatively and quantitatively. Moreover, our approach facilitates a range of downstream tasks, including single-image relighting, photometric stereo, and 3D reconstruction, highlighting its broad applications in realistic 3D content creation."
Researcher Affiliation | Collaboration | 1 The Chinese University of Hong Kong; 2 Zhejiang University; 3 Shanghai AI Laboratory; 4 CPII under InnoHK
Pseudocode | No | The paper describes the methodology in prose and through diagrams (Figure 2, an overview of IDArb and its attention block) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper links to a project website (https://lizb6626.github.io/IDArb/), which may in turn link to code, but it is not a direct link to a source-code repository, nor does the paper contain an explicit statement that code for the described methodology has been released.
Open Datasets | Yes | "To address these shortcomings, we develop a custom dataset, Arb-Objaverse. We select 68k 3D models from Objaverse (Deitke et al., 2022), and filter out low-quality and texture-less cases. For training, we further enhance the variability by combining this dataset with G-Objaverse and ABO. For synthetic data, we sample 441 objects from Arb-Objaverse and G-Objaverse, selecting four viewpoints for each object. We conduct experiments on the real-world Open Illumination dataset (Liu et al., 2024a) and the synthetic NeRFactor dataset (Zhang et al., 2021b). Additionally, we conduct experiments on standard benchmarks, MIT-Intrinsic (Grosse et al., 2009) and Stanford-ORB (Kuang et al., 2023)."
Dataset Splits | No | "For synthetic data, we sample 441 objects from Arb-Objaverse and G-Objaverse, selecting four viewpoints for each object. During training, the number of input images N is randomly set to 3 or 1 per object." The paper describes dataset usage and evaluation sets but does not provide specific training/validation/test splits (e.g., percentages or exact counts) for its primary or combined datasets.
Hardware Specification | Yes | "The entire training procedure takes approximately 4 days on a cluster of 16 Nvidia Tesla A100 GPUs."
Software Dependencies | No | "We finetune the UNet from the pretrained Stable Diffusion with the zero terminal SNR schedule (Lin et al., 2024). We utilize the v-prediction as training objective and the AdamW optimizer with a learning rate of 1e-4." The paper mentions software components such as Stable Diffusion and the AdamW optimizer but does not provide version numbers for any key software libraries or frameworks.
Experiment Setup | Yes | "We finetune the UNet from the pretrained Stable Diffusion with the zero terminal SNR schedule (Lin et al., 2024). We utilize the v-prediction as training objective and the AdamW optimizer with a learning rate of 1e-4. The model is trained on downsampled 256x256 resolution over 80,000 steps. During training, the number of input images N is randomly set to 3 or 1 per object."
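The "Experiment Setup" row quotes two diffusion-specific choices: the zero terminal SNR noise schedule (Lin et al., 2024) and the v-prediction training target. As a hedged illustration of what those two ingredients compute, the sketch below implements the published schedule-rescaling recipe and the v-target formula in plain Python. It is not IDArb's actual training code, and the function names are our own.

```python
import math

def rescale_zero_terminal_snr(betas):
    """Rescale a beta schedule so the terminal SNR is exactly zero
    (recipe from Lin et al., 2024; a sketch, not IDArb's code)."""
    # Cumulative sqrt(alpha_bar_t) for the input schedule.
    alphas_bar_sqrt = []
    prod = 1.0
    for b in betas:
        prod *= (1.0 - b)
        alphas_bar_sqrt.append(math.sqrt(prod))
    a0, aT = alphas_bar_sqrt[0], alphas_bar_sqrt[-1]
    # Shift so the last step reaches zero; rescale so the first is unchanged.
    rescaled = [(a - aT) * a0 / (a0 - aT) for a in alphas_bar_sqrt]
    alphas_bar = [a * a for a in rescaled]
    # Convert the rescaled alpha_bar back into per-step betas.
    betas_out = []
    prev = 1.0
    for ab in alphas_bar:
        betas_out.append(1.0 - ab / prev)
        prev = ab
    return betas_out

def v_target(x0, eps, alpha_bar):
    """v-prediction target: v = sqrt(a_bar) * eps - sqrt(1 - a_bar) * x0."""
    return math.sqrt(alpha_bar) * eps - math.sqrt(1.0 - alpha_bar) * x0
```

With zero terminal SNR, the final beta becomes exactly 1, so the last timestep carries no signal; the v-target then degenerates to -x0 there, which is why v-prediction (rather than epsilon-prediction) is the natural objective under this schedule.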