Hierarchically Controlled Deformable 3D Gaussians for Talking Head Synthesis
Authors: Zhenhua Wu, Linxuan Jiang, Xiang Li, Chaowei Fang, Yipeng Qin, Guanbin Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the HDTF dataset and additional test sets demonstrate that our method outperforms existing approaches in visual quality, facial landmark accuracy, and audio-visual synchronization while being more computationally efficient in both training and inference. |
| Researcher Affiliation | Collaboration | 1Sun Yat-sen University, 2Shanghai Innovation Institute, 3Guangdong University of Technology, 4Gezhi Intelligent Technology, 5Xidian University, 6Cardiff University, 7Peng Cheng Laboratory, 8Guangdong Key Laboratory of Big Data Analysis and Processing. The authors are affiliated with both universities (academic) and technology institutes/companies (industry/research), indicating a collaborative effort. |
| Pseudocode | No | No specific pseudocode or algorithm blocks are explicitly provided in the paper. The methodology is described through text and mathematical equations. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Extensive experimental results on HDTF (Zhang et al. 2021) and two testing sets (Li et al. 2023; Ye et al. 2023b) demonstrate that our method achieves state-of-the-art performance... We conduct experiments on three datasets: HDTF (Zhang et al. 2021), Testset 1 (Li et al. 2023), and Testset 2 (Ye et al. 2023b). |
| Dataset Splits | No | The paper mentions that for the HDTF dataset, 8 videos are selected, but it does not specify a quantitative training, validation, and test split for the datasets used in the overall evaluation of the model. |
| Hardware Specification | No | The paper mentions 'high-end GPUs' in the context of NeRF-based methods but does not provide specific hardware details (like GPU models, CPU models, or memory) used for running its own experiments. |
| Software Dependencies | No | The paper references several models and frameworks, such as HuBERT (Hsu et al. 2021), Faceformer (Fan et al. 2022), MediaPipe (Lugaresi et al. 2019), Depth-Anything (Yang et al. 2024), and a VGG network (Sengupta et al. 2019), but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The paper specifies the training schedule for the Expression-Controlled 3DGS Head Synthesis, detailing three stages with specific iteration counts: 'Initialization Stage (3,000 iterations)', 'Deformation Stage (2,000 iterations)', and 'Refinement Stage (15,000 iterations)'. It also describes the loss functions used for different modules. |
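The staged training schedule quoted in the last row could be sketched as follows. This is a minimal illustration of running three stages for fixed iteration budgets, not the authors' code; the stage names and iteration counts come from the paper, while `run_schedule` and the `train_step` callable are hypothetical placeholders.

```python
# Hypothetical sketch of the three-stage schedule reported in the paper.
# Stage names and iteration counts are from the table above; everything
# else (function names, the train_step stub) is illustrative only.

STAGES = [
    ("initialization", 3_000),   # Initialization Stage
    ("deformation", 2_000),      # Deformation Stage
    ("refinement", 15_000),      # Refinement Stage
]

def run_schedule(train_step):
    """Run each stage for its iteration budget.

    train_step(stage, iteration) is a placeholder for one optimizer
    step of the corresponding stage's loss.
    """
    completed = []
    for stage, iters in STAGES:
        for it in range(iters):
            train_step(stage, it)
        completed.append((stage, iters))
    return completed
```

The total budget under this sketch is 20,000 iterations, matching the sum of the three stage counts the paper reports.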