RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

Authors: Tianyuan Zhang, Zhengfei Kuang, Haian Jin, Zexiang Xu, Sai Bi, Hao Tan, He Zhang, Yiwei Hu, Miloš Hašan, William Freeman, Kai Zhang, Fujun Luan

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on synthetic and real-world datasets to evaluate RelitLRM. The results demonstrate that our method matches state-of-the-art inverse rendering approaches while using significantly fewer input images and requiring much less processing time (seconds vs. hours).
Researcher Affiliation | Collaboration | 1. Massachusetts Institute of Technology, 2. Stanford University, 3. Cornell University, 4. Adobe Research
Pseudocode | Yes | A.1 PSEUDO CODE, Algorithm 1: RelitLRM pseudo code.
Open Source Code | No | Our project page is available at: https://relit-lrm.github.io/. This URL points to a project demonstration page, not explicitly a source code repository, and the paper does not contain an explicit code release statement.
Open Datasets | Yes | Our training dataset is constructed from a combination of 800K objects sourced from Objaverse (Deitke et al., 2023) and 210K synthetic objects from Zeroverse (Xie et al., 2024). [...] For lighting diversity, we gathered over 8,000 HDR environment maps from multiple sources, including Polyhaven, Laval Indoor (Gardner et al., 2017), Laval Outdoor (Hold-Geoffroy et al., 2019), internal datasets, and a selection of randomly generated Gaussian blobs.
Dataset Splits | Yes | The initial training phase employs four input views, four target denoising views (under target lighting, used for computing the diffusion loss), and two additional supervision views (under target lighting), all at a resolution of 256×256, with the environment map set to 128×256. The model is trained with a batch size of 512 for 80K iterations [...] Following this pretraining at the 256 resolution, we fine-tune the model for a larger context by increasing to six input views and six denoising target views at a higher resolution of 512×512. [...] We evaluate our method against these approaches on three publicly available datasets: STANFORD-ORB (Kuang et al., 2024), OBJECTS-WITH-LIGHTING (Ummenhofer et al., 2024), and TENSOIR-SYNTHETIC (Jin et al., 2023). The STANFORD-ORB dataset comprises 14 objects captured under three lighting conditions, with around 60 training views and 10 test views per lighting setup per object. The OBJECTS-WITH-LIGHTING dataset contains 7 objects with dense views captured under one training lighting condition and 3 views for two additional lighting conditions for testing. The TENSOIR-SYNTHETIC dataset consists of 4 objects with 100 training views under one lighting condition and 200 test views for each of five lighting conditions.
Hardware Specification | Yes | RelitLRM decodes 3D Gaussian (Kerbl et al., 2023) primitive parameters within approximately one second on a single A100 GPU. [...] Our transformer model [...] requires four days on 32 NVIDIA A100 GPUs (40GB VRAM each).
Software Dependencies | No | The paper mentions using the 'AdamW optimizer', 'GELU activations', 'DDIM sampler', and the 'classifier-free guidance technique' but does not specify software names with version numbers for the libraries or frameworks used (e.g., PyTorch version, TensorFlow version).
Experiment Setup | Yes | The model is trained with a batch size of 512 for 80K iterations, introducing the perceptual loss after the first 5K iterations to enhance training stability. Following this pretraining at the 256 resolution, we fine-tune the model for a larger context by increasing to six input views and six denoising target views at a higher resolution of 512×512. This fine-tuning expands the context window to up to 31K tokens. For diffusion training, we discretize the noise into 1,000 timesteps, adhering to the method described in Ho et al. (2020), with a variance schedule that linearly increases from 0.00085 to 0.0120. To enable classifier-free guidance, environment map tokens are randomly masked to zero with a probability of 0.1 during training. [...] We use the AdamW optimizer with a peak learning rate of 4e-4 and a weight decay of 0.05. The β1, β2 are set to 0.9 and 0.95 respectively. We use 2000 iterations of warmup and start to introduce perceptual loss after 5000 iterations for training stability. We then finetune the model [...] with a reduced peak learning rate of 4e-5 and 1000 warmup steps. Throughout training, we apply gradient clipping at 1.0 and skip steps where the gradient norm exceeds 20.0.
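The schedule details quoted above (linear variance schedule, CFG token masking, learning-rate warmup, and the gradient-norm skip rule) can be sketched in plain Python. This is a minimal illustration of the stated hyperparameters, not RelitLRM's implementation (which is not released); all function names are ours.

```python
import random

def linear_beta_schedule(n_steps=1000, beta_start=0.00085, beta_end=0.0120):
    """Diffusion variance schedule increasing linearly over 1,000 timesteps,
    from 0.00085 to 0.0120 as stated in the paper."""
    step = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + i * step for i in range(n_steps)]

def mask_env_tokens(env_tokens, p_drop=0.1, rng=random):
    """Classifier-free guidance training: with probability 0.1, replace the
    environment-map tokens with zeros (illustrative token representation)."""
    if rng.random() < p_drop:
        return [0.0 for _ in env_tokens]
    return env_tokens

def warmup_lr(step, peak_lr=4e-4, warmup_steps=2000):
    """Linear warmup to the peak learning rate (4e-4 over 2,000 iterations
    for pretraining; 4e-5 over 1,000 for fine-tuning)."""
    return peak_lr * min(1.0, step / warmup_steps)

def should_skip_step(grad_norm, max_norm=20.0):
    """Skip the optimizer update entirely when the gradient norm exceeds 20.0;
    otherwise gradients are clipped at 1.0 before the update."""
    return grad_norm > max_norm
```

For example, `warmup_lr(1000)` is halfway through warmup and returns 2e-4, and `should_skip_step(25.0)` returns True, matching the skip rule in the quoted setup.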