Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model

Authors: Fei Shen, Cong Wang, Junyao Gao, Qin Guo, Jisheng Dang, Jinhui Tang, Tat-Seng Chua

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate the effectiveness of MCDM in maintaining identity and motion continuity for long-term TalkingFace generation.
Researcher Affiliation | Academia | ¹Nanjing University of Science and Technology, ²Nanjing University, ³Tongji University, ⁴Peking University, ⁵Sun Yat-sen University, ⁶National University of Singapore. Correspondence to: Jinhui Tang <EMAIL>.
Pseudocode | No | The paper describes the architecture and methodology in Sections 3.1–3.4 using descriptive text and a figure (Figure 1), but provides no explicit pseudocode or algorithm block.
Open Source Code | No | The paper contains no explicit statement announcing a source-code release for the described methodology, nor a direct link to a code repository for the authors' implementation. Footnote 4 links to a company website (https://www.guiji.ai/), and a third-party tool's GitHub repository is mentioned (https://github.com/MooreThreads/Moore-AnimateAnyone), but neither is the authors' own code release.
Open Datasets | Yes | Additionally, we present the TalkingFace-Wild dataset, a high-quality, multilingual video dataset with over 200 hours of footage in 10 languages, offering a valuable resource for further research in TalkingFace generation.
Dataset Splits | Yes | Following prior work (Chen et al., 2024; Tian et al., 2024; Xu et al., 2024), we split HDTF into training and testing sets with a 9:1 ratio.
Hardware Specification | Yes | The experiments are conducted on a computing platform equipped with 8 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions using Stable Diffusion v1.5 and models such as Wav2Vec and CLIP, but does not give version numbers for ancillary software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, or CUDA versions).
Experiment Setup | Yes | Training is performed in three stages, with each stage consisting of 30,000 iterations and a batch size of 4. Video data is processed at a resolution of 512 × 512. The learning rate is fixed at 1 × 10⁻⁵ across all stages, and the AdamW optimizer is employed to stabilize training. Each training clip comprises 16 video frames. In the archived-clip motion-prior module, we set α = 16, m = 256, and n = 16. In the present-clip motion-prior diffusion model, the number of layers L is set to 8, and the weighting factor α in Eq. 5 is configured to 0.1 to balance the influence of prior motion information.
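The 9:1 HDTF train/test split reported above could be reproduced deterministically with a sketch like the following; the paper does not specify the split procedure, so the shuffling, seed, and clip identifiers here are illustrative assumptions, not the authors' protocol:

```python
import random

def split_hdtf(video_ids, ratio=0.9, seed=0):
    """Split a list of video identifiers into train/test sets at the given ratio.

    A fixed seed makes the (assumed) random split reproducible.
    """
    ids = sorted(video_ids)          # canonical order before shuffling
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * ratio)
    return ids[:cut], ids[cut:]

# Hypothetical clip names; HDTF's actual file naming is not quoted in the paper.
train, test = split_hdtf([f"clip_{i:03d}" for i in range(100)])
print(len(train), len(test))  # 90 10
```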
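The quoted experiment setup can be summarized as a single configuration object. The numeric values below are the ones stated in the paper; all field names are illustrative and not taken from the authors' code (which is not released):

```python
from dataclasses import dataclass

@dataclass
class MCDMTrainingConfig:
    # Training schedule (values quoted from the paper)
    stages: int = 3
    iterations_per_stage: int = 30_000
    batch_size: int = 4
    resolution: int = 512            # frames processed at 512 x 512
    learning_rate: float = 1e-5      # fixed across all stages
    optimizer: str = "AdamW"
    clip_frames: int = 16            # video frames per training clip
    # Archived-clip motion-prior module
    archived_alpha: int = 16         # alpha = 16
    archived_m: int = 256
    archived_n: int = 16
    # Present-clip motion-prior diffusion model
    num_layers: int = 8              # L in the paper
    prior_weight: float = 0.1        # weighting factor alpha in Eq. 5

cfg = MCDMTrainingConfig()
print(cfg.stages * cfg.iterations_per_stage)  # 90000 iterations in total
```

Note that the paper reuses the symbol α for two different quantities (the archived-clip module setting of 16 and the Eq. 5 weighting factor of 0.1), so the sketch keeps them as separate fields.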