MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation

Authors: Seyeon Kim, Siyoon Jin, Jihye Park, Kihong Kim, Jiyoung Kim, Jisu Nam, Seungryong Kim

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments on standard benchmarks demonstrate that our model outperforms existing GAN-based and diffusion-based models. We also provide comprehensive ablation studies and user study results. In experiments, our framework achieves state-of-the-art performance on HDTF dataset (Zhang et al. 2021), surpassing GAN-based (Prajwal et al. 2020; Zhou et al. 2021) and diffusion-based (Ma et al. 2023; Wei, Yang, and Wang 2024) approaches.
Researcher Affiliation Collaboration Seyeon Kim 1, 2*, Siyoon Jin 1*, Jihye Park 1, 2*, Kihong Kim 3, Jiyoung Kim 1, Jisu Nam 4, Seungryong Kim 4 1Korea University 2Samsung Electronics 3VIVE STUDIOS 4KAIST
Pseudocode No The paper describes its methodology in prose and mathematical formulations but does not include any distinct pseudocode or algorithm blocks.
Open Source Code Yes Code https://github.com/cvlab-kaist/MoDiTalker
Open Datasets Yes We used the LRS3-TED (Afouras, Chung, and Zisserman 2018) and HDTF (Zhang et al. 2021) datasets to train our AToM and MToV models, respectively.
Dataset Splits Yes For MToV, we randomly selected 312 videos from the HDTF dataset for training, using the remaining 98 videos for testing.
Hardware Specification Yes For all experiments, we used a single NVIDIA RTX 3090 GPU.
Software Dependencies No The paper mentions software components like HuBERT and 3DMM but does not provide specific version numbers for these or other key software dependencies required for replication.
Experiment Setup Yes For AToM, we train the model for 300k iterations with a learning rate of 1e-4. For MToV, we train the model for 600k iterations with a learning rate of 1e-4. To alleviate jittering, we employed a blending technique using Gaussian blur, as described in (Chen et al. 2020). Additional implementation details are provided in Appendix 1.
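The reported training hyperparameters can be collected into a small configuration sketch. The dictionary layout and helper function below are illustrative only; the authors' actual configuration format is not published in the excerpt above, and only the iteration counts and learning rates come from the paper:

```python
# Hyperparameters quoted in the Experiment Setup row; the dict structure
# and helper are hypothetical, not taken from the authors' code.
TRAIN_CONFIG = {
    "AToM": {"iterations": 300_000, "learning_rate": 1e-4},  # audio-to-motion stage
    "MToV": {"iterations": 600_000, "learning_rate": 1e-4},  # motion-to-video stage
}

def stage_schedule(stage: str) -> tuple[int, float]:
    """Return (iterations, learning_rate) reported for a training stage."""
    cfg = TRAIN_CONFIG[stage]
    return cfg["iterations"], cfg["learning_rate"]
```

Note that the Gaussian-blur blending step used to reduce jittering is a post-processing detail and is not reflected in this configuration sketch.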