VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization

Authors: Tao Liu, Ziyang Ma, Qi Chen, Feilong Chen, Shuai Fan, Xie Chen, Kai Yu

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that VQTalker achieves state-of-the-art performance in both video-driven and speech-driven scenarios, particularly in multilingual settings.
Researcher Affiliation | Collaboration | Tao Liu1*, Ziyang Ma1, Qi Chen1, Feilong Chen2, Shuai Fan2, Xie Chen1, Kai Yu1; 1 X-LANCE Lab, MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University; 2 AISpeech Ltd
Pseudocode | Yes | Algorithm 1: Group-Residual FSQ (a hedged sketch of this quantizer appears after the table).
Open Source Code | No | The paper provides a link for viewing synthetic results (https://x-lance.github.io/VQTalker), but does not explicitly state that source code for the methodology is available at this link or in supplementary materials.
Open Datasets | Yes | We utilized three publicly available datasets: VoxCeleb (Nagrani, Chung, and Zisserman 2017), HDTF (Zhang et al. 2021), and VFHQ (Xie et al. 2022).
Dataset Splits | Yes | To evaluate performance in Indo-European languages and video reconstruction tasks, we use HDTF (Zhang et al. 2021) as our test set, which follows DINet (Zhang et al. 2023b).
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor speeds, memory amounts, or other system specifications) used for the experiments are reported in the paper.
Software Dependencies | No | The paper mentions using a 12-layer BERT network and a pre-trained speech tokenizer from CosyVoice, but does not provide version numbers for these or for other software dependencies such as programming languages or libraries.
Experiment Setup | Yes | In the second stage, we employed a 12-layer BERT (Devlin et al. 2019) network to iteratively generate a four-layer residual codebook for the face tokenizer. The maximum length is 4096. ... Our approach employs 12 group layers, 4 residual layers, and 625 codebook entries per group, with a sampling rate of 25 fps...
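
The Pseudocode and Experiment Setup rows reference the paper's Group-Residual FSQ tokenizer and its reported configuration (12 group layers, 4 residual layers, 625 codebook entries per group, 25 fps). The following is a minimal PyTorch sketch of how such a quantizer could be structured, not a reproduction of the paper's Algorithm 1: the per-group FSQ levels [5, 5, 5, 5] are an assumption (5^4 = 625 matches the reported per-group codebook size), and the class name, residual rescaling scheme, and tensor shapes are illustrative only.

```python
# Minimal sketch of a Group-Residual FSQ quantizer (assumes PyTorch).
# Assumption: each group uses FSQ levels [5, 5, 5, 5], since 5^4 = 625 matches
# the reported 625 codebook entries per group; the latent is therefore taken to
# be 12 groups x 4 dims = 48 channels. The paper's Algorithm 1 may differ.
import torch
import torch.nn as nn


class GroupResidualFSQ(nn.Module):
    def __init__(self, groups=12, residual_layers=4, levels=(5, 5, 5, 5)):
        super().__init__()
        self.groups = groups
        self.residual_layers = residual_layers
        # Per-dimension level counts, shared by every group (assumption).
        self.register_buffer("levels", torch.tensor(levels, dtype=torch.float32))

    def _fsq(self, z):
        # Finite scalar quantization: bound each dimension to
        # [-(L-1)/2, (L-1)/2], round to the nearest integer, and use a
        # straight-through estimator so gradients flow through the rounding.
        half = (self.levels - 1) / 2
        z = torch.tanh(z) * half
        z_q = torch.round(z)
        return z + (z_q - z).detach()

    def forward(self, z):
        # z: (batch, frames, groups * len(levels)) continuous motion latents.
        b, t, _ = z.shape
        z = z.view(b, t, self.groups, -1)
        quantized = torch.zeros_like(z)
        residual = z
        for i in range(self.residual_layers):
            # Each residual stage quantizes what the previous stages left
            # unexplained, on a finer grid (assumed rescaling scheme).
            scale = 0.5 ** i
            q = self._fsq(residual / scale) * scale
            quantized = quantized + q
            residual = residual - q
        return quantized.view(b, t, -1)


if __name__ == "__main__":
    quantizer = GroupResidualFSQ()
    motion = torch.randn(2, 25, 48)  # one second of 25 fps latents (assumed dim)
    tokens = quantizer(motion)
    print(tokens.shape)  # torch.Size([2, 25, 48])
```

The general design intent of group plus residual quantization is to keep each per-group codebook small (625 entries here) while letting successive residual stages refine the quantization error; how VQTalker combines the group and residual indices into facial motion tokens is specified by the paper's Algorithm 1, not by this sketch.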