Locality Sensitive Avatars From Video
Authors: Chunjin Song, Zhijie Wu, Shih-Yang Su, Bastian Wandt, Leonid Sigal, Helge Rhodin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate on ZJU-MoCap, SynWild, ActorsHQ, MVHumanNet and various outdoor videos. The experiments reveal that with the locality sensitive deformation to canonical feature space, we are the first to achieve state-of-the-art results across novel view synthesis, novel pose animation and 3D shape reconstruction simultaneously. We also conduct ablation studies to measure the importance of locality sensitive offsets, skeletal deformation and spatial window function. |
| Researcher Affiliation | Collaboration | 1 Department of Computer Science, University of British Columbia; 2 Vector Institute for AI; 3 Department of Electrical Engineering, Linköping University; 4 Bielefeld University |
| Pseudocode | No | The paper includes equations, architectural diagrams (Figure 1), and step-by-step descriptions of the method in text, but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/ChunjinSong/lsavatar. |
| Open Datasets | Yes | We evaluate on ZJU-MoCap, SynWild, ActorsHQ, MVHumanNet and various outdoor videos. We follow the evaluation protocol established by MonoHuman and GoMAvatar to choose eight sequences from the ZJU-MoCap dataset (Peng et al., 2021b) with the same dataset splits. We further select four characters from ActorsHQ (Işık et al., 2023). The SynWild examples (Guo et al., 2023) are additionally applied to measure the capability of geometry reconstruction. We also adopt two sequences from MonoPerfCap (Xu et al., 2018) and downloaded YouTube videos (Weng et al., 2022; Yu et al., 2023) respectively as an in-the-wild dataset. We collect poses of large-scale motions from the AIST++ dataset (Li et al., 2021a). We additionally evaluate on the recently released MVHumanNet dataset (Xiong et al., 2024). |
| Dataset Splits | Yes | We follow the evaluation protocol established by MonoHuman and GoMAvatar to choose eight sequences from the ZJU-MoCap dataset (Peng et al., 2021b) with the same dataset splits. Given an image sequence, we use the provided first camera parameter for model training, and then use the remaining 22 cameras for evaluation (Weng et al., 2022; Yu et al., 2023; Wen et al., 2024). For each sequence [ActorsHQ], we use the images captured by camera 127 and camera 128 as training and evaluation data respectively as they can cover the whole human body. Specifically, we pick one image every four frames until we have 375 images in total for training and use 125 images and 175 images for the evaluation of novel pose synthesis and novel view rendering respectively. |
| Hardware Specification | Yes | We train our network on two NVIDIA Tesla V100 GPUs for 15 hours. |
| Software Dependencies | No | Specifically, our method implementation is based on the PyTorch framework (Paszke et al., 2019). Similar to HumanNeRF and PM-Avatar, we utilize the Adam optimizer (Kingma & Ba, 2014). All learnable weights are activated by ReLU (Agarap, 2018) for network stability. Additionally, we choose the VGG (Simonyan & Zisserman, 2014) network as the backbone of our LPIPS objective. |
| Experiment Setup | Yes | We maintain the same hyper-parameter settings across all experiments, which include the weights of the loss function (λeik, λLPIPS, λxu), the number of training iterations, and the network capacity and learning rate. We set the initial learning rate of the learnable parameter β to 1 × 10⁻⁴ for stable training and the learning rates of the remaining parameters to 5 × 10⁻⁴. We set λs = 0.001 and NB = 24 to accurately capture the topology variations and avoid introducing unnecessary training changes. Here we sample 4 patches with H = 24. We disable the non-rigid motions at the beginning of network training, and then bring them back after 5000 iterations. We set Lnr = 5 and Lc = 5 across all experiments. We set α = 2 and β = 6 across all experiments. |
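The hyper-parameters quoted above can be collected into a single configuration sketch. This is a minimal illustration, not the authors' released code: every key name below (e.g. `lr_beta`, `nonrigid_warmup_iters`) is a hypothetical label for the values reported in the paper.

```python
# Hypothetical training configuration assembled from the reported
# hyper-parameters; names are illustrative, not from the official repo.
config = {
    "optimizer": "adam",           # Adam (Kingma & Ba, 2014)
    "lr_beta": 1e-4,               # initial LR for the learnable parameter beta
    "lr_default": 5e-4,            # LR for all remaining parameters
    "lambda_s": 0.001,             # loss weight, as reported
    "num_bones": 24,               # N_B = 24
    "patches_per_iter": 4,         # 4 sampled patches per iteration
    "patch_size": 24,              # H = 24
    "nonrigid_warmup_iters": 5000, # non-rigid motions disabled before this
    "L_nr": 5,                     # as reported
    "L_c": 5,                      # as reported
    "alpha": 2,
    "beta": 6,
}

def nonrigid_enabled(iteration: int, cfg: dict) -> bool:
    """Non-rigid motion is brought back only after the warm-up phase."""
    return iteration >= cfg["nonrigid_warmup_iters"]
```

A schedule like `nonrigid_enabled` reflects the paper's stated practice of disabling non-rigid motions for the first 5000 iterations of training.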