Generative Neural Articulated Radiance Fields

Authors: Alexander Bergman, Petr Kellnhofer, Wang Yifan, Eric Chan, David Lindell, Gordon Wetzstein

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a solution to these challenges by developing a 3D GAN framework that learns to generate radiance fields of human bodies or faces in a canonical pose and warp them using an explicit deformation field into a desired body pose or facial expression. Using our framework, we demonstrate the first high-quality radiance field generation results for human bodies. Moreover, we show that our deformation-aware training procedure significantly improves the quality of generated bodies or faces when editing their poses or facial expressions compared to a 3D GAN that is not trained with explicit deformations. We first evaluate the proposed deformation field by overfitting a single representation on a single dynamic full body scene. Then we apply this deformation method in a GAN training pipeline for both bodies (AIST++ [23] and SURREAL [120]) and faces (FFHQ [104]).
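The generate-in-canonical-pose-then-warp idea can be made concrete with a short sketch. This is a minimal PyTorch toy, not the paper's implementation: the real system uses EG3D's tri-plane generator and an explicit SMPL-driven deformation, whereas `CanonicalGenerator` and the MLP deformation field below are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

class CanonicalGenerator(nn.Module):
    """Maps a latent code to a radiance field in the canonical pose.
    Here the 'radiance field' is a tiny latent-conditioned MLP, purely
    for illustration; the paper generates tri-plane features instead."""
    def __init__(self, z_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + density per sample point
        )

    def forward(self, x_canonical, z):
        z = z.expand(x_canonical.shape[0], -1)
        return self.net(torch.cat([x_canonical, z], dim=-1))

def warp_to_canonical(x_posed, deformation_field):
    """Explicit backward warp: map sample points from the posed space
    back into the canonical space before querying the generator."""
    return x_posed + deformation_field(x_posed)

# Usage: query the posed radiance field at ray sample points.
gen = CanonicalGenerator()
deform = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))
x = torch.rand(1024, 3)   # sample points along camera rays, posed space
z = torch.randn(1, 64)    # latent code for one identity
rgb_sigma = gen(warp_to_canonical(x, deform), z)
```

The design choice illustrated is backward warping: ray samples taken in the posed space are mapped into the canonical space, so a single canonical radiance field can be rendered under arbitrary poses or expressions.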
Researcher Affiliation | Academia | Alexander W. Bergman (Stanford University), Petr Kellnhofer (TU Delft), Wang Yifan (Stanford University), Eric R. Chan (Stanford University), David B. Lindell (University of Toronto and Vector Institute), Gordon Wetzstein (Stanford University)
Pseudocode | No | The paper describes methods in text and uses diagrams, but does not include any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor are there structured step-by-step procedures formatted like code.
Open Source Code | No | In the ethics checklist (3.a), the authors state: 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] We plan to release the code completely, but have not yet with submission.'
Open Datasets | Yes | AIST++ is a large dataset consisting of 10.1M images capturing 30 performers in dance motion. Each frame is annotated with a ground truth camera and fitted SMPL body model. SURREAL contains 6M images of synthetic humans created using SMPL body models in various poses rendered in indoor scenes. FFHQ is a large dataset of high-resolution images of human faces collected from Flickr. All images have licenses that allow free use, redistribution, and adaptation for non-commercial use.
Dataset Splits | Yes | We select a multi-view video sequence from the AIST++ dataset [23] and optimize tri-plane features in the canonical pose using a subset of the views and frames for supervision. We then evaluate the quality of the estimated radiance field warped not only into these training views and poses but also into held-out test views and poses. AIST++. AIST++ is a challenging dataset as the body poses are extremely diverse. We collect 30 frames per video as our training data after filtering out frames whose camera distance is above a threshold or whose human bounding box is partially outside the image. Then we extract the human body by cropping a 600 × 600 patch centered at the pelvis joint, and resize these frames to 256 × 256.
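A minimal sketch of the described frame filtering and cropping, assuming hypothetical per-frame metadata names (`cam_dist`, `bbox`, `pelvis_xy`); the distance threshold value is also an assumption, since the excerpt does not state it.

```python
import numpy as np
from PIL import Image

MAX_CAM_DIST = 5.0   # hypothetical threshold; the paper does not give a value
CROP = 600           # crop size stated in the paper
OUT = 256            # output resolution stated in the paper

def keep_frame(cam_dist, bbox, img_w, img_h):
    """Filter out frames whose camera distance exceeds the threshold or
    whose human bounding box (x0, y0, x1, y1) lies partially outside
    the image."""
    x0, y0, x1, y1 = bbox
    inside = x0 >= 0 and y0 >= 0 and x1 <= img_w and y1 <= img_h
    return cam_dist <= MAX_CAM_DIST and inside

def crop_and_resize(img, pelvis_xy):
    """Crop a CROP x CROP patch centered at the projected pelvis joint,
    then resize to OUT x OUT."""
    cx, cy = pelvis_xy
    box = (int(cx - CROP / 2), int(cy - CROP / 2),
           int(cx + CROP / 2), int(cy + CROP / 2))
    return img.crop(box).resize((OUT, OUT), Image.LANCZOS)
```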
Hardware Specification | Yes | The timings are measured to deform a single feature volume on an RTX 3090 graphics processing unit.
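For reference, per-call GPU timings like this are usually measured with CUDA events rather than wall-clock time, because kernel launches are asynchronous. A sketch, assuming a placeholder `deform_volume` callable with CUDA-resident inputs; this is not the authors' benchmark code.

```python
import torch

def time_deformation(deform_volume, features, iters=100):
    """Average milliseconds per deformation call, measured with CUDA
    events. Requires a CUDA device; `features` must live on the GPU."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(10):           # warm-up so kernels are compiled/cached
        deform_volume(features)
    torch.cuda.synchronize()      # drain pending async work before timing
    start.record()
    for _ in range(iters):
        deform_volume(features)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call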
Software Dependencies | No | The paper mentions software tools like 'Open3D library [122]', 'MMPose Project [125]', and 'DECA [128]' but does not specify their version numbers, which would be required for reproducible software dependencies.
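When reproducing the work, the exact versions installed in the training environment can be recorded with the standard library. The package identifiers below are the usual PyPI names and are assumptions here (DECA, for instance, is typically used from source rather than installed as a package).

```python
from importlib.metadata import version, PackageNotFoundError

# Record exact dependency versions for reproducibility. The package
# names are assumed PyPI identifiers, not taken from the paper.
for pkg in ("open3d", "mmpose", "torch"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```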
Experiment Setup | Yes | Training details and hyper-parameters are discussed in the supplement. Rather than initializing our network weights randomly, we begin training from a pre-trained EG3D [1] model. Fine-tuning allows for quicker convergence and saves computational resources during training. Similarly to AIST++, we use transfer learning from a pre-trained EG3D model at the appropriate resolution.
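A hedged sketch of that transfer-learning setup: initialize the generator and discriminator from a pre-trained EG3D model instead of random weights, then continue adversarial training. The checkpoint path and state-dict keys are assumptions (released EG3D snapshots are actually distributed as pickled network files).

```python
import torch

def init_from_eg3d(generator, discriminator, ckpt_path="eg3d_ffhq.pt"):
    """Load pre-trained EG3D weights into new models before fine-tuning.
    Path and key names ("G", "D") are hypothetical placeholders."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # strict=False tolerates new modules (e.g., the deformation field)
    # that have no counterpart in the pre-trained model.
    generator.load_state_dict(ckpt["G"], strict=False)
    discriminator.load_state_dict(ckpt["D"], strict=False)
    return generator, discriminator
```

Starting from pre-trained weights, as the excerpt notes, mainly buys faster convergence: the adversarial training resumes from a generator that already produces plausible 3D-consistent images at the target resolution.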