EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions
Authors: Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen, Yuming Li, Chenguang Ma
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | EchoMimic has been comprehensively compared with alternative algorithms across various public datasets and the authors' collected dataset, showing superior performance in both quantitative and qualitative evaluations. |
| Researcher Affiliation | Industry | Terminal Technology Department, Alipay, Ant Group, Hangzhou, China |
| Pseudocode | No | The paper describes methods and a model architecture, but it does not contain any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | The code and models are available on the project page. Project Page https://antgroup.github.io/ai/echomimic |
| Open Datasets | Yes | EchoMimic is extensively compared with alternative algorithms across diverse public datasets and the authors' collected dataset, demonstrating superior performance in both quantitative and qualitative evaluations. Datasets: the authors collected approximately 540 hours (about 130,000 15-second video clips) of talking-head videos, augmented with the HDTF and CelebV-HQ datasets. |
| Dataset Splits | Yes | Identity data was split 90:10, with 90% used for training and the remainder held out. |
| Hardware Specification | Yes | Implementation Details. Experiments involved training and inference phases on a high-performance computing setup with 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions specific models and architectures like 'Whisper-Tiny model (Radford et al. 2023)', 'Stable Diffusion (SD)', 'SDv1.5 architecture', 'CLIP (Radford et al. 2021) ViTL/14 text encoder', and 'pre-trained Animatediff weights'. However, it does not provide specific version numbers for ancillary software components (e.g., programming languages, libraries, or frameworks like Python, PyTorch, or CUDA versions). |
| Experiment Setup | Yes | Training comprised two segments of 30,000 steps each, using a batch size of 4 with 512×512-pixel video data. ... A constant learning rate of 1e-5 was used. |
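
The identity-level 90:10 split reported above can be sketched as follows. This is a minimal illustration, assuming the split is done over unique identity IDs so that no identity appears in both sets; the function name, the `seed`, and the splitting procedure are hypothetical, since the paper only states the ratio.

```python
import random

def split_by_identity(identity_ids, train_frac=0.9, seed=0):
    """Illustrative 90:10 identity-level split (procedure assumed, not
    taken from the paper): deduplicate IDs, shuffle, then cut at 90%."""
    ids = sorted(set(identity_ids))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(ids)
    cut = int(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

# Example with 100 synthetic identities:
train_ids, test_ids = split_by_identity([f"id_{i}" for i in range(100)])
print(len(train_ids), len(test_ids))  # 90 10
```

Splitting on identities rather than individual clips avoids leaking the same speaker's appearance between training and evaluation.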
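
The training hyperparameters quoted in the table can be collected into a single configuration sketch. The dictionary below is illustrative only: the field names are invented for this summary, and only the values (two 30,000-step segments, batch size 4, 512×512 resolution, constant learning rate 1e-5, 8 A100 GPUs) come from the paper.

```python
# Hypothetical config assembled from the reported figures; field names
# are illustrative and do not come from the released code.
train_config = {
    "stages": 2,                 # two training segments
    "steps_per_stage": 30_000,   # 30,000 steps each
    "batch_size": 4,
    "resolution": (512, 512),    # pixels
    "learning_rate": 1e-5,       # constant, no schedule reported
    "num_gpus": 8,               # NVIDIA A100
}

total_steps = train_config["stages"] * train_config["steps_per_stage"]
print(total_steps)  # 60000
```

Such a consolidated view makes it easy to check whether a reproduction run matches the reported setup.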