Outstanding Orthodontist: No More Artifactual Teeth in Talking Face

Authors: Zibo Su, Ziqi Zhang, Kun Wei, Xu Yang, Cheng Deng

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. The paper states that 'extensive experiments demonstrate that our method makes the teeth in generated videos appear more natural during speech, significantly enhancing the temporal consistency and structural stability of audio-driven video generation.' Section 4 is titled 'Experiments' and covers evaluation metrics, comparisons against state-of-the-art (SOTA) methods, ablation studies, quantitative analysis, and qualitative analysis, with performance tables such as Table 1 and Table 2.
Researcher Affiliation: Academia. The author line reads 'Zibo Su, Ziqi Zhang, Kun Wei, Xu Yang and Cheng Deng, Xidian University.' All authors are affiliated with Xidian University, an academic institution.
Pseudocode: Yes. The paper provides Algorithm 1, 'Memory Bank Update in OrthoNet'.
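The paper's Algorithm 1 itself is not reproduced in this summary. As a minimal, purely illustrative sketch of what a two-buffer memory update can look like (assuming the 30-frame long-term buffer and 4-frame short-term window reported in the experiment setup; the class and method names are hypothetical, not the authors' code):

```python
from collections import deque


class MemoryBank:
    """Hypothetical two-buffer frame memory: a bounded long-term buffer
    plus a short sliding window over the most recent frames."""

    def __init__(self, long_len=30, short_len=4):
        # deque(maxlen=...) silently evicts the oldest entry when full
        self.long_term = deque(maxlen=long_len)
        self.short_term = deque(maxlen=short_len)

    def update(self, frame_feature):
        # Every incoming frame feature enters both buffers;
        # eviction keeps the buffers at their fixed capacities.
        self.long_term.append(frame_feature)
        self.short_term.append(frame_feature)


# Feed 50 dummy frame indices through the bank.
bank = MemoryBank()
for t in range(50):
    bank.update(t)

# Long-term buffer holds the last 30 frames, short-term the last 4.
print(len(bank.long_term), len(bank.short_term))  # 30 4
```

The bounded `deque` is only one simple way to realize fixed-size buffers; the actual update rule in the paper may differ.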
Open Source Code: No. The paper states 'We will publicly release our dataset to facilitate future research' regarding its self-built dataset, but it contains no explicit statement about releasing source code and no link to a code repository for the described method.
Open Datasets: Yes. The authors write: 'We train the proposed method on High-Definition Talking Face (HDTF) [Zhang et al., 2021] dataset and our self-built high-resolution news anchor dataset.' The HDTF dataset is cited, indicating its public availability.
Dataset Splits: No. The paper mentions the HDTF dataset and a self-built dataset, stating that 'the model is trained on the combined HDTF dataset and our self-built dataset,' and it specifies training parameters such as '30,000 epochs with batch size 4.' However, it gives no details on how the data are split into training, validation, or test sets (e.g., percentages or sample counts).
Hardware Specification: Yes. 'We implement our framework using PyTorch and train it on four A6000 GPUs.'
Software Dependencies: No. The paper states 'We implement our framework using PyTorch' but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup: Yes. During training, the authors 'employ the Adam optimizer with a learning rate of 1×10⁻⁵ and train for 30,000 epochs with batch size 4.' The memory modules maintain a 30-frame long-term buffer and a 4-frame short-term window, chosen via ablation studies. For the network architecture, scale = 2 is set in DRM-Conv, and for the teeth perception loss, β is set to 0.7.
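Collected in one place, the hyperparameters reported above can be summarized as a plain configuration dictionary (a sketch for reference only; the key names are illustrative and do not come from the authors' code):

```python
# Hypothetical summary of the training configuration reported in the paper.
train_config = {
    "optimizer": "Adam",
    "learning_rate": 1e-5,          # reported as 1×10⁻⁵
    "epochs": 30_000,
    "batch_size": 4,
    "long_term_buffer_frames": 30,  # memory module, per ablation studies
    "short_term_window_frames": 4,  # memory module, per ablation studies
    "drm_conv_scale": 2,            # scale parameter in DRM-Conv
    "teeth_loss_beta": 0.7,         # β in the teeth perception loss
    "gpus": "4x A6000",
}

print(train_config["learning_rate"])  # 1e-05
```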