Locality Sensitive Avatars From Video
Authors: Chunjin Song, Zhijie Wu, Shih-Yang Su, Bastian Wandt, Leonid Sigal, Helge Rhodin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate on ZJU-MoCap, SynWild, ActorsHQ, MVHumanNet and various outdoor videos. The experiments reveal that with the locality sensitive deformation to canonical feature space, we are the first to achieve state-of-the-art results across novel view synthesis, novel pose animation and 3D shape reconstruction simultaneously. We also conduct ablation studies to measure the importance of locality sensitive offsets, skeletal deformation and spatial window function. |
| Researcher Affiliation | Collaboration | 1 Department of Computer Science, University of British Columbia; 2 Vector Institute for AI; 3 Department of Electrical Engineering, Linköping University; 4 Bielefeld University |
| Pseudocode | No | The paper includes equations, architectural diagrams (Figure 1), and step-by-step descriptions of the method in text, but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/ChunjinSong/lsavatar. |
| Open Datasets | Yes | We evaluate on ZJU-MoCap, SynWild, ActorsHQ, MVHumanNet and various outdoor videos. We follow the evaluation protocol established by MonoHuman and GoMAvatar to choose eight sequences from the ZJU-MoCap dataset (Peng et al., 2021b) with the same dataset splits. We further select four characters from ActorsHQ (Işık et al., 2023). The SynWild examples (Guo et al., 2023) are additionally applied to measure the capability of geometry reconstruction. We also adopt two sequences from MonoPerfCap (Xu et al., 2018) and downloaded YouTube videos (Weng et al., 2022; Yu et al., 2023) respectively as an in-the-wild dataset. We collect poses of large-scale motions from the AIST++ dataset (Li et al., 2021a). We additionally evaluate on the recently released MVHumanNet dataset (Xiong et al., 2024). |
| Dataset Splits | Yes | We follow the evaluation protocol established by MonoHuman and GoMAvatar to choose eight sequences from the ZJU-MoCap dataset (Peng et al., 2021b) with the same dataset splits. Given an image sequence, we use the provided first camera parameter for model training, and then use the remaining 22 cameras for evaluation (Weng et al., 2022; Yu et al., 2023; Wen et al., 2024). For each sequence [ActorsHQ], we use the images captured by camera 127 and camera 128 as training and evaluation data respectively as they can cover the whole human body. Specifically, we pick one image every four frames until we have 375 images in total for training and use 125 images and 175 images for the evaluation of novel pose synthesis and novel view rendering respectively. |
| Hardware Specification | Yes | We train our network on two NVIDIA Tesla V100 GPUs for 15 hours. |
| Software Dependencies | No | Specifically, our method implementation is based on the PyTorch framework (Paszke et al., 2019). Similar to HumanNeRF and PM-Avatar, we utilize the Adam optimizer (Kingma & Ba, 2014). All learnable weights are activated by ReLU (Agarap, 2018) for network stability. Additionally, we choose the VGG (Simonyan & Zisserman, 2014) network as the backbone of our LPIPS objective. |
| Experiment Setup | Yes | We maintain the same hyper-parameter settings across all experiments, which include the weights of the loss function (λeik, λLPIPS, λxu), the number of training iterations, and the network capacity and learning rate. We set the initial learning rate of the learnable parameter β to 1 × 10⁻⁴ for stable training and the learning rates of the remaining parameters to 5 × 10⁻⁴. We set λs = 0.001 and NB = 24 to accurately capture the topology variations and avoid introducing unnecessary training changes. Here we sample 4 patches with H = 24. We disable the non-rigid motions at the beginning of network training, and then bring them back after 5000 iterations. We set Lnr = 5 and Lc = 5 across all experiments. We set α = 2 and β = 6 across all experiments. |
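The hyper-parameters quoted above can be collected into a single configuration sketch. This is a minimal illustration, not the authors' released code: every key name below (e.g. `lr_beta`, `nonrigid_warmup_iters`) is a hypothetical label for the values reported in the paper.

```python
# Hypothetical training configuration assembled from the reported
# hyper-parameters; names are illustrative, not from the official repo.
config = {
    "optimizer": "adam",           # Adam (Kingma & Ba, 2014)
    "lr_beta": 1e-4,               # initial LR for the learnable parameter beta
    "lr_default": 5e-4,            # LR for all remaining parameters
    "lambda_s": 0.001,             # loss weight, as reported
    "num_bones": 24,               # N_B = 24
    "patches_per_iter": 4,         # 4 sampled patches per iteration
    "patch_size": 24,              # H = 24
    "nonrigid_warmup_iters": 5000, # non-rigid motions disabled before this
    "L_nr": 5,                     # as reported
    "L_c": 5,                      # as reported
    "alpha": 2,
    "beta": 6,
}

def nonrigid_enabled(iteration: int, cfg: dict) -> bool:
    """Non-rigid motion is brought back only after the warm-up phase."""
    return iteration >= cfg["nonrigid_warmup_iters"]
```

A schedule like `nonrigid_enabled` reflects the paper's stated practice of disabling non-rigid motions for the first 5000 iterations of training.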