Unsupervised Discovery and Composition of Object Light Fields

Authors: Cameron Omid Smith, Hong-Xing Yu, Sergey Zakharov, Frédo Durand, Joshua B. Tenenbaum, Jiajun Wu, Vincent Sitzmann

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (4 experiments) | "We demonstrate that compositional neural light fields, unconstrained by the sampling requirements of volumetric rendering, outperform prior work on unsupervised learning of object-centric 3D representations while dramatically reducing time and memory complexity. We further demonstrate that object-centric light fields admit scene editing in the form of translation and composition, and allow rendering of scenes with tens of objects at interactive frame rates. Please find further qualitative results, including video, in the supplemental material. ... Table 1: Quantitative Comparison. Our method outperforms state-of-the-art baselines (Yu et al., 2021) across all reconstruction quality metrics, while being orders of magnitude faster and requiring less memory."
Researcher Affiliation | Collaboration | 1 MIT CSAIL, 2 Stanford University, 3 Toyota Research Institute, 4 MIT BCS, 5 CBMM
Pseudocode | No | "The pseudo-code describing the background-aware slot encoding is the same as in uORF, but exists in the supplemental material for reference."
Open Source Code | Yes | cameronosmith.github.io/colf
Open Datasets | Yes | "CLEVR-567: The first room-scene dataset proposed by (Yu et al., 2021) is a 3D extension to the CLEVR (Johnson et al., 2017) dataset. ... ShapeNet chairs (Chang et al., 2015)"
Dataset Splits | Yes | "CLEVR-567: There are 1,000 scenes for training and 500 for testing. ... Room-Chair: There are 1,000 scenes for training and 500 for testing. ... Room-Diverse: There are 5,000 scenes for training and 500 for testing. ... City-Block: There are 500 scenes for training, and we render out one scene for qualitative demonstration."
Hardware Specification | No | The paper does not provide specific details about the CPU or GPU models, memory, or cloud resources used for the experiments, only general statements like "capacity of even large GPUs".
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used for implementation (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | "On CLEVR-567, we set the background latent vector to 0, since the model needs no information about the unchanging background. This leads to faster model convergence. ... We supervise the model with the L2 reconstruction loss ||I − I_query||^2, where I_query are the ground truth views. We use a deep-feature based perceptual loss (Zhang et al., 2018) on both chair datasets to avoid inherent ambiguities in estimating lighting and geometry at occluded views. Lastly, we impose a small penalty ||z||^2 on each regressed object code z to enforce a Gaussian prior. ... Lastly, we initially render and supervise images at 64×64 resolution to efficiently learn the coarse structure and decomposition of scenes, and subsequently supervise at 128×128 to learn finer object structure."
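The training objective quoted above combines three terms: an L2 reconstruction loss against the ground-truth query views, an optional deep-feature perceptual loss, and a small Gaussian-prior penalty on each object latent. A minimal PyTorch-style sketch follows; the function name, the weight `beta`, and the `perceptual_fn` hook are illustrative assumptions, not the authors' exact implementation or hyperparameters.

```python
import torch

def colf_style_loss(pred, target, z_objects, beta=1e-3, perceptual_fn=None):
    """Sketch of the described objective (assumed names/weights):
    L2 reconstruction + optional perceptual term + latent prior penalty."""
    # L2 reconstruction loss ||I - I_query||^2 over the rendered views
    loss = ((pred - target) ** 2).mean()
    # Optional deep-feature perceptual loss, e.g. an LPIPS-style metric
    # (Zhang et al., 2018), used on the chair datasets in the paper
    if perceptual_fn is not None:
        loss = loss + perceptual_fn(pred, target)
    # Small penalty ||z||^2 on each regressed object code (Gaussian prior)
    loss = loss + beta * sum((z ** 2).mean() for z in z_objects)
    return loss
```

In a coarse-to-fine schedule like the one described, the same loss would first be applied to 64×64 renders and later to 128×128 renders, with only the render resolution changing between stages.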