Bringing NeRFs to the Latent Space: Inverse Graphics Autoencoder
Authors: Antoine Schnepf, Karim Kassab, Jean-Yves Franceschi, Laurent Caraffa, Flavian Vasile, Jeremie Mary, Andrew Comport, Valerie Gouet-Brunet
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally confirm that Latent NeRFs trained with IG-AE present an improved quality compared to a standard autoencoder, all while exhibiting training and rendering accelerations with respect to NeRFs trained in the image space. Our project page can be found at https://ig-ae.github.io . |
| Researcher Affiliation | Collaboration | Antoine Schnepf* 1,2, Karim Kassab* 1,3, Jean-Yves Franceschi 1, Laurent Caraffa 3, Flavian Vasile 1, Jeremie Mary 1, Andrew Comport 2, Valérie Gouet-Brunet 3. * Equal contribution. 1 Criteo AI Lab, Paris, France; 2 Université Côte d'Azur, CNRS, I3S, France; 3 LASTIG, Université Gustave Eiffel, IGN-ENSG, F-94160 Saint-Mandé |
| Pseudocode | No | The paper describes the methodology and training process using mathematical equations and prose, but does not include any clearly labeled pseudocode or algorithm blocks. For example, Section 3 describes "LATENT NERF" and its training process. |
| Open Source Code | Yes | We utilize the trained IG-AE to bring NeRFs to the latent space with a latent NeRF training pipeline, which we implement in an open-source extension of the Nerfstudio framework, thereby unlocking latent scene learning for its supported methods. [...] Our code is open-source and available on the following GitHub repository: https://github.com/AntoineSchnepf/latent-nerfstudio . The training code for IG-AE is open-source and available on the following GitHub repository: https://github.com/k-kassab/igae . |
| Open Datasets | Yes | For 3D-regularization, we adopt Objaverse (Deitke et al., 2023), a synthetic dataset which is standard when large-scale and diverse 3D data is needed (Liu et al., 2023; Shi et al., 2024). [...] For AE preservation, we adopt ImageNet (Deng et al., 2009), a large dataset of diverse real images. [...] For NeRF evaluations, we utilize synthetic, object-level data as it aligns with the training domain. As such, we train NeRFs on held-out scenes from Objaverse, and on scenes from three out-of-distribution datasets: ShapeNet Hats, Bags, and Vases (Chang et al., 2015). |
| Dataset Splits | No | For 3D-regularization, we adopt Objaverse (Deitke et al., 2023), a synthetic dataset which is standard when large-scale and diverse 3D data is needed (Liu et al., 2023; Shi et al., 2024). We utilize N = 500 objects from Objaverse. Each object is rendered from V = 300 views at a 128×128 resolution. [...] For NeRF evaluations, we utilize synthetic, object-level data as it aligns with the training domain. As such, we train NeRFs on held-out scenes from Objaverse, and on scenes from three out-of-distribution datasets: ShapeNet Hats, Bags, and Vases (Chang et al., 2015). [...] Table 1: Main Results on ShapeNet datasets. All results are obtained by training NeRFs with our Latent NeRF Training Pipeline, and are averaged over 4 scenes from each dataset. |
| Hardware Specification | Yes | Training IG-AE takes 60 hours on 4 NVIDIA L4 GPUs. [...] Training and rendering time is measured using a single NVIDIA L4 GPU. |
| Software Dependencies | No | Nerfstudio (Tancik et al., 2023) emerged as a unified PyTorch (Paszke et al., 2019) framework in which NeRF models are implemented using standardized implementations, making it straightforward for researchers and practitioners to integrate various NeRF models into their projects. [...] We adopt the pre-trained Ostris KL-f8-d16 VAE (Burkett, 2024) from Hugging Face, which has a downscale factor l = 8, and c = 16 feature channels in the latent space. |
| Experiment Setup | Yes | To train a latent NeRF in Nerfstudio, we first train the chosen model for 10 000 iterations to minimize L_LS using the method-specific optimization process. Subsequently, we continue the training with 15 000 iterations of RGB alignment by minimizing L_align. To account for the change of image representations, we modulate the learning rate of each method by a factor of ξ_LS in latent supervision, and a factor ξ_align for RGB alignment. Appendix F.2 details the hyper-parameters we used in Nerfstudio, including the values of these factors for each method. [...] Appendix F.1: IG-AE TRAINING SETTINGS: Table 15 details the hyperparameters taken to train our IG-AE. |
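The quoted setup describes a two-stage schedule: 10 000 iterations of latent supervision with the learning rate scaled by ξ_LS, followed by 15 000 iterations of RGB alignment scaled by ξ_align. A minimal sketch of that schedule is below; the function name and the default ξ values are illustrative assumptions (the paper's actual per-method values are in its Appendix F.2):

```python
def latent_nerf_lr(base_lr: float, iteration: int,
                   xi_ls: float = 1.0, xi_align: float = 0.1,
                   ls_iters: int = 10_000, align_iters: int = 15_000) -> float:
    """Hypothetical two-stage learning-rate modulation for latent NeRF training.

    Stage 1 (latent supervision, minimizing L_LS): base_lr scaled by xi_ls.
    Stage 2 (RGB alignment, minimizing L_align):   base_lr scaled by xi_align.
    """
    if iteration < ls_iters:
        return base_lr * xi_ls        # latent-supervision stage
    if iteration < ls_iters + align_iters:
        return base_lr * xi_align     # RGB-alignment stage
    raise ValueError(f"iteration {iteration} exceeds the 25 000-step schedule")
```

In a Nerfstudio-style trainer this would typically be applied by updating each optimizer's parameter-group learning rate at every step, e.g. `group["lr"] = latent_nerf_lr(base_lr, step)`.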