Unsupervised Discovery and Composition of Object Light Fields
Authors: Cameron Omid Smith, Hong-Xing Yu, Sergey Zakharov, Fredo Durand, Joshua B. Tenenbaum, Jiajun Wu, Vincent Sitzmann
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): "We demonstrate that compositional neural light fields, unconstrained by the sampling requirements of volumetric rendering, outperform prior work on unsupervised learning of object-centric 3D representations while dramatically reducing time and memory complexity. We further demonstrate that object-centric light fields admit scene editing in the form of translation and composition, and allow rendering of scenes with tens of objects at interactive frame rates. Please find further qualitative results, including video, in the supplemental material." ... Table 1: "Quantitative Comparison. Our method outperforms state-of-the-art baselines (Yu et al., 2021) across all reconstruction quality metrics, while being orders of magnitude faster and requiring less memory." |
| Researcher Affiliation | Collaboration | 1 MIT CSAIL, 2 Stanford University, 3 Toyota Research Institute, 4 MIT BCS, 5 CBMM |
| Pseudocode | No | The pseudo-code describing the background-aware slot encoding is the same as in uORF, but is included in the supplemental material for reference. |
| Open Source Code | Yes | cameronosmith.github.io/colf |
| Open Datasets | Yes | CLEVR-567: The first room-scene dataset proposed by Yu et al. (2021) is a 3D extension of the CLEVR (Johnson et al., 2017) dataset. ... ShapeNet chairs (Chang et al., 2015) |
| Dataset Splits | Yes | CLEVR-567: There are 1,000 scenes for training and 500 for testing. ... Room-Chair: There are 1,000 scenes for training and 500 for testing. ... Room-Diverse: There are 5,000 scenes for training and 500 for testing. ... City-Block: There are 500 scenes for training, and we render out one scene for qualitative demonstration. |
| Hardware Specification | No | The paper does not provide specific details about the CPU or GPU models, memory, or cloud resources used for the experiments, only general statements like "capacity of even large GPUs". |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used for implementation (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | On CLEVR-567, we set the background latent vector to 0, since the model needs no information about the unchanging background. This leads to faster model convergence. ... We supervise the model with the L2 reconstruction loss $\lVert \hat{I}_{\text{query}} - I_{\text{query}} \rVert^2$, where $I_{\text{query}}$ are the ground-truth views. We use a deep-feature-based perceptual loss (Zhang et al., 2018) on both chair datasets to avoid inherent ambiguities in estimating lighting and geometry at occluded views. Lastly, we impose a small penalty $\lVert z \rVert^2$ on each regressed object code $z$ to enforce a Gaussian prior. ... Lastly, we initially render and supervise images at 64×64 resolution to efficiently learn the coarse structure and decomposition of scenes, and subsequently supervise at 128×128 to learn finer object structure. |
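The training objective quoted in the Experiment Setup row combines an L2 reconstruction loss, an optional deep-feature perceptual loss, and a small $\lVert z \rVert^2$ penalty on each object code. A minimal NumPy sketch of how these terms could be combined is shown below; the function names, the `penalty_weight` value, and the `perceptual_fn` hook are illustrative assumptions, not the authors' implementation (in practice the perceptual term would be a learned-feature loss such as LPIPS, Zhang et al. 2018).

```python
import numpy as np

def reconstruction_loss(pred, target):
    # L2 reconstruction loss ||I_hat - I||^2 between rendered and ground-truth views.
    return float(np.sum((pred - target) ** 2))

def latent_penalty(z, weight=1e-3):
    # Small ||z||^2 penalty on a regressed object code z, enforcing a Gaussian prior.
    # The weight value here is a placeholder; the paper does not specify it.
    return weight * float(np.sum(z ** 2))

def total_loss(pred, target, object_codes, perceptual_fn=None, penalty_weight=1e-3):
    # Combine the three terms described in the paper's setup.
    # perceptual_fn is a stand-in for a deep-feature perceptual loss;
    # pass None to skip it (as on datasets where it is not used).
    loss = reconstruction_loss(pred, target)
    if perceptual_fn is not None:
        loss += perceptual_fn(pred, target)
    loss += sum(latent_penalty(z, penalty_weight) for z in object_codes)
    return loss
```

The coarse-to-fine schedule in the same row would simply call `total_loss` on 64×64 renders early in training and switch `pred`/`target` to 128×128 renders later.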