Generalizable Human Gaussians from Single-View Image

Authors: Jinnan Chen, Chen Li, Jianfeng Zhang, Lingting Zhu, Buzhen Huang, Hanlin Chen, Gim H Lee

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on the publicly available 3D human datasets THuman2.0 (Yu et al., 2021), Custom Humans (Ho et al., 2023), and HuMMan (Cai et al., 2022). Our method is compared with state-of-the-art (SOTA) methods in both novel view synthesis and 3D mesh reconstruction. We conduct ablation studies to evaluate the effectiveness of our SMPL-X dual-branch Gaussian prediction model, coarse-to-fine refinement strategy, back-view refinement ControlNet, and our SMPL-X refinement. We show the quantitative ablation results in Table 4.
Researcher Affiliation | Academia | ¹National University of Singapore, ²The University of Hong Kong. EMAIL, EMAIL
Pseudocode | No | The paper describes the method using textual descriptions and diagrams (Figures 2 and 3) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We open-source our code at: https://github.com/jinnan-chen/HGM.
Open Datasets | Yes | We conduct experiments on the publicly available 3D human datasets THuman2.0 (Yu et al., 2021), Custom Humans (Ho et al., 2023), and HuMMan (Cai et al., 2022).
Dataset Splits | Yes | We train our HGM on 500 human scans from the THuman2.0 dataset following Zhang et al. (2024). We render images at a resolution of 512×512 using a weak-perspective camera at 12 fixed viewpoints evenly distributed in azimuth from 0 to 360 degrees. During evaluation, all methods are tested without the ground-truth SMPL-X. We follow the train and test lists from SIFU (Zhang et al., 2024) and SHERF (Hu et al., 2023) to evaluate our method on the THuman2.0 and HuMMan datasets. For the Custom Humans dataset we use 45 scans for cross-dataset evaluation, containing loose clothing and challenging poses.
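The 12-camera rendering setup quoted above places viewpoints at evenly spaced azimuths over a full circle. A minimal sketch of that spacing (the function name `camera_azimuths` is illustrative, not from the released code):

```python
import numpy as np

def camera_azimuths(num_views: int = 12) -> np.ndarray:
    """Evenly spaced azimuth angles in degrees over [0, 360),
    matching the 12 fixed cameras described in the paper."""
    return np.arange(num_views) * (360.0 / num_views)

azimuths = camera_azimuths(12)
# 12 views at 0, 30, 60, ..., 330 degrees
```

With 12 views the angular step is 30 degrees, so the last camera sits at 330 rather than 360 (which would coincide with the first).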
Hardware Specification | Yes | Our model is trained on 4 NVIDIA RTX A6000 GPUs with a batch size of 4 for 20 hours. We conduct all the experiments on NVIDIA RTX A6000 GPUs.
Software Dependencies | Yes | The experimental environment is PyTorch 2.2.1 and CUDA 12.2.
Experiment Setup | Yes | Our model is trained on 4 NVIDIA RTX A6000 GPUs with a batch size of 4 for 20 hours. Our input image size is 512×512 and the number of Gaussians for each view is 256×256, i.e. 65,536 Gaussians per view. For SMPL-X estimation, we use PIXIE (Feng et al., 2021). The objective function for HGM training includes an L2 color loss L_rgb, a VGG-based LPIPS perceptual loss L_lpips (Zhang et al., 2018), and an L2 background mask loss L_bg computed against ground-truth masks. Each of these losses has a corresponding weight treated as a hyperparameter: L_HGM = λ_rgb·L_rgb + λ_lpips·L_lpips + λ_bg·L_bg, where λ_rgb = λ_lpips = λ_bg = 1.0. During optimization, we render SMPL-X side views and compute the side-view mask loss and normal loss for a total of 45 iterations. SMPL-X parameters are updated at each iteration... The loss weights are set as follows: λ_front = 10, λ_side = 1, and λ_n = 0.5.
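The two weighted objectives quoted above (the HGM training loss and the SMPL-X refinement loss) can be sketched as plain functions. This is an illustrative outline under stated assumptions, not the released training code: `hgm_loss` takes the LPIPS term as a precomputed scalar (in practice it comes from a pretrained VGG-based perceptual network), and both function names are hypothetical.

```python
import numpy as np

# Loss weights reported in the paper: λ_rgb = λ_lpips = λ_bg = 1.0
LAMBDA_RGB = LAMBDA_LPIPS = LAMBDA_BG = 1.0

# SMPL-X refinement weights: λ_front = 10, λ_side = 1, λ_n = 0.5
LAMBDA_FRONT, LAMBDA_SIDE, LAMBDA_N = 10.0, 1.0, 0.5

def l2_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error, used for both the color and background-mask terms."""
    return float(np.mean((pred - target) ** 2))

def hgm_loss(pred_rgb, gt_rgb, pred_mask, gt_mask, lpips_value):
    """L_HGM = λ_rgb·L_rgb + λ_lpips·L_lpips + λ_bg·L_bg.
    `lpips_value` stands in for the VGG-based LPIPS perceptual term."""
    l_rgb = l2_loss(pred_rgb, gt_rgb)
    l_bg = l2_loss(pred_mask, gt_mask)
    return LAMBDA_RGB * l_rgb + LAMBDA_LPIPS * lpips_value + LAMBDA_BG * l_bg

def smplx_refine_loss(l_front, l_side, l_normal):
    """Weighted sum used during the 45-iteration SMPL-X refinement:
    λ_front·L_front + λ_side·L_side + λ_n·L_n."""
    return LAMBDA_FRONT * l_front + LAMBDA_SIDE * l_side + LAMBDA_N * l_normal
```

Since the three HGM weights are all 1.0, L_HGM reduces to a plain sum of the color, perceptual, and mask terms; the refinement loss, by contrast, weights the front-view term 10× more heavily than the side-view term.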