Latent Radiance Fields with 3D-aware 2D Representations
Authors: Chaoyi Zhou, Xi Liu, Feng Luo, Siyu Huang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method outperforms the state-of-the-art latent 3D reconstruction approaches in terms of synthesis performance and cross-dataset generalizability across diverse indoor and outdoor scenes. To our knowledge, this is the first work showing the radiance field representations constructed from 2D latent representations can yield photorealistic 3D reconstruction performance. |
| Researcher Affiliation | Academia | Chaoyi Zhou, Xi Liu, Feng Luo, Siyu Huang. Visual Computing Division, School of Computing, Clemson University. EMAIL |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the paper. The methodology is described in prose and mathematical equations. |
| Open Source Code | Yes | The project page is latent-radiance-field.github.io. |
| Open Datasets | Yes | We first evaluate LRF on four real-world datasets, including MVImgNet (Yu et al., 2023), NeRF-LLFF (Mildenhall et al., 2019), MipNeRF360 (Barron et al., 2022), and DL3DV-10K (Ling et al., 2024), to demonstrate the effectiveness of our approach for latent 3D reconstruction. |
| Dataset Splits | Yes | We follow the standard train and test split in 3DGS and Mip-Splatting (Kerbl et al., 2023; Yu et al., 2024). |
| Hardware Specification | Yes | For Stage-I, we employ the pre-trained VAE model (f = 8, KL), from LDM model zoo as the backbone VAE model. We fine-tune the VAE on 2 NVIDIA A100-80GB GPUs for around one day... For Stage-III, we fine-tune the decoder on the image-latent dataset with 2 NVIDIA A100-80GB GPUs for around one day. |
| Software Dependencies | No | The paper mentions using a 'pre-trained VAE model (f = 8, KL), from LDM model zoo' and that a method 'is implemented based on the APE computation approach in the evo library (Grupp, 2017)'. However, explicit version numbers for these or other software dependencies like programming languages or deep learning frameworks are not provided. |
| Experiment Setup | Yes | the base learning rate of 4.5e-06, and the default optimizer. For Stage-III, we fine-tune the decoder on the image-latent dataset with 2 NVIDIA A100-80GB GPUs for around one day. ... λ_train and λ_novel are the weighting coefficients that balance the contributions of the training and novel views. Both of the weights are set to 0.5 to ensure that the decoder learns not only to decode effectively from the training views but also to generalize and perform well on the novel views. |
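The Stage-III loss balance quoted in the Experiment Setup row can be sketched as below. This is a minimal illustration, not the authors' implementation: the function and argument names are assumptions, and only the equal 0.5 weights on the training-view and novel-view terms come from the paper.

```python
def combined_decoder_loss(loss_train, loss_novel,
                          lambda_train=0.5, lambda_novel=0.5):
    """Weighted sum of reconstruction losses from training and novel views.

    With both weights at 0.5 (the setting reported in the paper), the decoder
    is pushed to reconstruct training views and to generalize to novel views
    with equal emphasis.
    """
    return lambda_train * loss_train + lambda_novel * loss_novel


# Example: equal weighting simply averages the two per-view losses.
total = combined_decoder_loss(0.2, 0.4)
print(total)
```

With λ_train = λ_novel = 0.5 the combined loss is the mean of the two terms; any other split would bias the decoder toward one set of views.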