Unsupervised Discovery of Object-Centric Neural Fields

Authors: Rundong Luo, Hong-Xing Yu, Jiajun Wu

TMLR 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
"Extensive experiments show that our approach significantly improves generalization and sample efficiency, and enables unsupervised 3D object discovery in real scenes. We evaluate our method on three tasks: unsupervised object segmentation in 3D, novel view synthesis, and scene manipulation in 3D."

Researcher Affiliation | Academia
Rundong Luo (Cornell University), Hong-Xing Yu (Stanford University), Jiajun Wu (Stanford University).

Pseudocode | No
The paper describes its methods through textual descriptions, mathematical equations (Eq. 1-7), and architectural diagrams (Figure 2, Figure 13(a)), but does not include any clearly labeled pseudocode or algorithm blocks.

Open Source Code | Yes
"The project page is available at https://red-fairy.github.io/uOCF/. All code and data will be made public. Sample code and data are included in the supplementary material, and we plan to release the full code and datasets for public use."

Open Datasets | No
"Lastly, we collect four challenging datasets, Room-Texture, Room-Furniture, Kitchen-Matte, and Kitchen-Shiny, and show that uOCF significantly outperforms existing methods on these datasets, unlocking zero-shot, single-image object discovery. All code and data will be made public. Sample code and data are included in the supplementary material, and we plan to release the full code and datasets for public use."

Dataset Splits | Yes
Room-Texture: 5,000 scenes for training, 100 for evaluation.
Room-Furniture: 5,000 scenes for training, 100 for evaluation.
Kitchen-Matte: 735 scenes for training, 102 for evaluation.
Kitchen-Shiny: 324 scenes for training, 56 for evaluation.

Hardware Specification | Yes
"All experiments are run on a single RTX-A6000 GPU." "This optimization takes about 3 minutes on a single A6000 GPU."

Software Dependencies | No
The paper employs Mip-NeRF (Barron et al., 2021) as the NeRF backbone and estimates depth maps with MiDaS (Ranftl et al., 2022); an Adam optimizer with default hyperparameters and an exponential-decay scheduler is used across all experiments. Software components are named, but no version numbers are provided for them.

Experiment Setup | Yes
"The initial learning rate is 0.0003 for the first stage and 0.00015 for the second stage. Loss weights are set to λperc = 0.006, λdepth = 1.5, and λocc = 0.1. The position update momentum m is set to 0.5, and the latent inference module lasts T = 6 iterations."
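The reported hyperparameters can be collected into a single configuration for reproduction attempts. The sketch below is a plain-Python illustration, not the authors' code: the `CONFIG` keys, and the decay rate and step count in `exp_decay_lr`, are assumptions (the paper states an exponential-decay scheduler but the quoted text gives no decay constants).

```python
# Hedged sketch: hyperparameters quoted in the Experiment Setup row,
# plus an assumed exponential learning-rate decay. Key names, decay_rate,
# and decay_steps are illustrative, not taken from the paper.

CONFIG = {
    "lr_stage1": 3e-4,      # initial learning rate, first training stage
    "lr_stage2": 1.5e-4,    # initial learning rate, second training stage
    "lambda_perc": 0.006,   # perceptual loss weight
    "lambda_depth": 1.5,    # depth loss weight
    "lambda_occ": 0.1,      # occlusion loss weight
    "momentum_pos": 0.5,    # position update momentum m
    "latent_iters": 6,      # iterations T of the latent inference module
}

def exp_decay_lr(lr0, step, decay_rate=0.5, decay_steps=100_000):
    """Exponentially decayed learning rate (assumed schedule shape)."""
    return lr0 * decay_rate ** (step / decay_steps)

# At step 0 the schedule returns the initial rate unchanged.
lr_start = exp_decay_lr(CONFIG["lr_stage1"], step=0)
```

With `decay_rate=0.5`, the rate halves every `decay_steps` optimizer steps; a reproduction would tune these two constants against the training curves.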