GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer

Authors: Yihong Lin, Zhaoxin Fan, Xianjia Wu, Lingyu Xiong, Xiandong Li, Wenxiong Kang, Liang Peng, Songju Lei, Huang Xu

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations on standard benchmarks demonstrate that GLDiTalker outperforms existing methods, achieving superior results in both lip-sync accuracy and motion diversity. ... 4 Experiments; 4.1 Datasets and Implementations; 4.2 Quantitative Evaluation; 4.3 Qualitative Evaluation; 4.4 User Study; 4.5 Ablation Study
Researcher Affiliation | Collaboration | 1. South China University of Technology; 2. Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University; 3. Hangzhou International Innovation Institute, Beihang University; 4. Huawei Cloud; 5. Nanjing University
Pseudocode | No | The paper describes the methodology using architectural diagrams and textual explanations, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | We conduct abundant experiments on two public 3D facial datasets, BIWI [Fanelli et al., 2010] and VOCASET [Cudeiro et al., 2019], both of which have 4D face scans along with audio recordings.
Dataset Splits | Yes | We follow the data splits of the previous work [Fan et al., 2022] and only use the emotional data for fair comparisons. Specifically, the training set (BIWI-Train) contains 192 sentences, the validation set (BIWI-Val) contains 24 sentences, and the testing set is divided into two subsets, in which BIWI-Test-A contains 24 sentences spoken by 6 subjects seen during training and BIWI-Test-B contains 32 sentences spoken by 8 subjects unseen during training. ... Similar to [Fan et al., 2022], we adopt the same training (VOCA-Train), validation (VOCA-Val) and testing (VOCA-Test) splits for qualitative testing.
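The BIWI sentence counts quoted above can be captured in a small configuration sketch for anyone re-implementing the evaluation protocol. The dictionary layout and function name below are illustrative conventions, not the authors' code; the counts are taken directly from the quoted passage.

```python
# BIWI sentence counts per split, as reported in the quoted passage.
# The structure is a hypothetical convention, not the authors' code.
BIWI_SPLITS = {
    "BIWI-Train": 192,
    "BIWI-Val": 24,
    "BIWI-Test-A": 24,  # 6 subjects seen during training
    "BIWI-Test-B": 32,  # 8 subjects unseen during training
}

def total_sentences(splits):
    """Sum sentence counts across all splits."""
    return sum(splits.values())
```

Under this tally, the splits cover 272 sentences in total, with Test-B held out entirely for subjects unseen during training.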
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | Yes | Audio Encoder Ea uses the released hubert-base-ls960 version of the HuBERT architecture pre-trained on 960 hours of 16 kHz sampled speech audio.
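For reproduction, the named checkpoint is commonly obtained through Hugging Face Transformers. This is a configuration sketch under the assumption that the paper's checkpoint corresponds to the standard `facebook/hubert-base-ls960` model id; the paper does not specify a loading mechanism, and the partial freezing mirrors the experiment-setup quote below.

```python
# Configuration sketch (not the authors' code): load the pretrained
# HuBERT checkpoint named in the paper via Hugging Face Transformers.
# Assumes the hub id "facebook/hubert-base-ls960"; HuBERT expects
# 16 kHz mono audio input.
from transformers import HubertModel

audio_encoder = HubertModel.from_pretrained("facebook/hubert-base-ls960")

# Per the experiment-setup row: freeze the feature extractor, the
# feature projection layer, and the first two encoder layers.
frozen_modules = [
    audio_encoder.feature_extractor,
    audio_encoder.feature_projection,
    *audio_encoder.encoder.layers[:2],
]
for module in frozen_modules:
    for param in module.parameters():
        param.requires_grad = False
```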
Experiment Setup | Yes | L_stage1 = λ_rec1 · L_rec1 + λ_quant · L_quant (Eq. 6), where λ_rec1 = λ_quant = 1. ... where β denotes a weighting hyperparameter, which is 0.25 in all our experiments. ... L_stage2 = λ_rec2 · L_rec2 + λ_vel · L_vel (Eq. 13), where λ_rec2 = λ_vel = 1. ... The feature extractor, feature projection layer, and the initial two layers of the encoder are frozen, while the remaining parameters are set to be trainable.
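The two training objectives quoted above are plain weighted sums, so they reduce to a few lines of code. A minimal sketch follows; the function names are illustrative, the individual loss terms are assumed to be precomputed scalars, and the VQ commitment loss that β weights is not reproduced here.

```python
def stage1_loss(l_rec1, l_quant, lam_rec1=1.0, lam_quant=1.0):
    """Stage-1 objective: L_stage1 = lam_rec1 * L_rec1 + lam_quant * L_quant (Eq. 6)."""
    return lam_rec1 * l_rec1 + lam_quant * l_quant

def stage2_loss(l_rec2, l_vel, lam_rec2=1.0, lam_vel=1.0):
    """Stage-2 objective: L_stage2 = lam_rec2 * L_rec2 + lam_vel * L_vel (Eq. 13)."""
    return lam_rec2 * l_rec2 + lam_vel * l_vel

# Commitment weight inside the quantization loss, per the paper.
BETA = 0.25
```

With the paper's settings (all λ weights equal to 1), each stage's objective is simply the unweighted sum of its two terms.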