Hierarchically Controlled Deformable 3D Gaussians for Talking Head Synthesis

Authors: Zhenhua Wu, Linxuan Jiang, Xiang Li, Chaowei Fang, Yipeng Qin, Guanbin Li

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on the HDTF dataset and additional test sets demonstrate that our method outperforms existing approaches in visual quality, facial landmark accuracy, and audio-visual synchronization while being more computationally efficient in both training and inference."
Researcher Affiliation | Collaboration | 1 Sun Yat-sen University, 2 Shanghai Innovation Institute, 3 Guangdong University of Technology, 4 Gezhi Intelligent Technology, 5 Xidian University, 6 Cardiff University, 7 Peng Cheng Laboratory, 8 Guangdong Key Laboratory of Big Data Analysis and Processing. The authors combine university (academic) and technology institute/company (industry/research) affiliations, indicating a collaborative effort.
Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper; the methodology is described through text and mathematical equations.
Open Source Code | No | The paper contains no explicit statement about releasing source code for the described methodology and provides no link to a code repository.
Open Datasets | Yes | "Extensive experimental results on HDTF (Zhang et al. 2021) and two testing sets (Li et al. 2023; Ye et al. 2023b) demonstrate that our method achieves state-of-the-art performance... We conduct experiments on three datasets: HDTF (Zhang et al. 2021), Testset 1 (Li et al. 2023), and Testset 2 (Ye et al. 2023b)."
Dataset Splits | No | The paper states that 8 videos are selected from the HDTF dataset, but it does not specify a quantitative training/validation/test split for the datasets used in the overall evaluation of the model.
Hardware Specification | No | The paper mentions "high-end GPUs" in the context of NeRF-based methods but does not report the specific hardware (GPU models, CPU models, or memory) used for its own experiments.
Software Dependencies | No | The paper references several models and frameworks, including HuBERT (Hsu et al. 2021), FaceFormer (Fan et al. 2022), MediaPipe (Lugaresi et al. 2019), Depth Anything (Yang et al. 2024), and a VGG network (Sengupta et al. 2019), but gives no version numbers for these or any other software dependencies.
Experiment Setup | Yes | The paper specifies the training schedule for the Expression-Controlled 3DGS Head Synthesis, detailing three stages with explicit iteration counts: "Initialization Stage (3,000 iterations)", "Deformation Stage (2,000 iterations)", and "Refinement Stage (15,000 iterations)". It also describes the loss functions used for the different modules.
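The three-stage schedule quoted above can be written down as a small configuration sketch. The stage names and iteration counts come from the paper; the dictionary layout and helper function are purely illustrative and are not the authors' code.

```python
# Three-stage training schedule reported for the paper's
# Expression-Controlled 3DGS Head Synthesis module.
# Stage names and iteration counts are quoted from the paper;
# this structure is an illustrative sketch, not the authors' code.
TRAINING_SCHEDULE = {
    "initialization": 3_000,  # Initialization Stage
    "deformation": 2_000,     # Deformation Stage
    "refinement": 15_000,     # Refinement Stage
}

def total_iterations(schedule: dict[str, int]) -> int:
    """Sum the per-stage iteration budgets."""
    return sum(schedule.values())

print(total_iterations(TRAINING_SCHEDULE))  # 20000
```

Summing the three budgets gives an overall training length of 20,000 iterations, which is the kind of concrete figure a replication attempt would need.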