Outstanding Orthodontist: No More Artifactual Teeth in Talking Face
Authors: Zibo Su, Ziqi Zhang, Kun Wei, Xu Yang, Cheng Deng
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method makes the teeth in generated videos appear more natural during speech, significantly enhancing the temporal consistency and structural stability of audio-driven video generation. Section 4 is titled 'Experiments' and includes 'evaluation metrics', 'compare against state-of-the-art (SOTA) methods', 'ablation studies', 'Quantitative Analysis', 'Qualitative Analysis', and performance tables like Table 1 and Table 2. |
| Researcher Affiliation | Academia | Zibo Su, Ziqi Zhang, Kun Wei, Xu Yang and Cheng Deng, Xidian University, EMAIL. All authors are affiliated with Xidian University, which is an academic institution. |
| Pseudocode | Yes | Algorithm 1: Memory Bank Update in OrthoNet |
| Open Source Code | No | The paper states 'We will publicly release our dataset to facilitate future research.' regarding their self-built dataset, but it does not contain an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We train the proposed method on High-Definition Talking Face (HDTF) [Zhang et al., 2021] dataset and our self-built high-resolution news anchor dataset. The HDTF dataset is cited, indicating its public availability. |
| Dataset Splits | No | The paper mentions using the HDTF dataset and a self-built dataset, and that 'The model is trained on the combined HDTF dataset and our self-built dataset.' It also specifies training parameters like 'train for 30,000 epochs with batch size 4.' However, it does not provide specific details on how these datasets are split into training, validation, or test sets (e.g., percentages or sample counts). |
| Hardware Specification | Yes | We implement our framework using PyTorch and train it on four A6000 GPUs. |
| Software Dependencies | No | The paper states 'We implement our framework using PyTorch' but does not specify the version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | During training, we employ the Adam optimizer with a learning rate of 1×10⁻⁵ and train for 30,000 epochs with batch size 4. The memory modules maintain a 30-frame long-term buffer and 4-frame short-term window based on ablation studies. For network architecture, we set scale = 2 in DRM-Conv. Also, for the teeth perception loss, 'β set to 0.7' is specified. |
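The reported memory configuration (a 30-frame long-term buffer and a 4-frame short-term window) suggests a sliding-window update like the paper's Algorithm 1. A minimal sketch in plain Python, assuming a simple "graduate oldest short-term frame into the long-term buffer" rule — the class and method names here are hypothetical, not taken from the paper:

```python
from collections import deque


class MemoryBank:
    """Hypothetical two-tier frame memory: a 30-frame long-term buffer
    and a 4-frame short-term window. The buffer sizes come from the
    paper's ablation studies; the update rule itself is an assumption."""

    def __init__(self, long_term_size=30, short_term_size=4):
        # deque(maxlen=...) automatically evicts the oldest entry
        # once the buffer is full.
        self.long_term = deque(maxlen=long_term_size)
        self.short_term = deque(maxlen=short_term_size)

    def update(self, frame_feature):
        # When the short-term window is full, its oldest frame
        # graduates into the long-term buffer before being evicted.
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])
        self.short_term.append(frame_feature)

    def context(self):
        # Concatenated view of both tiers, e.g. for downstream attention.
        return list(self.long_term) + list(self.short_term)
```

For example, after feeding frames 0–9 the short-term window holds the four newest frames and the long-term buffer holds the six that graduated out of it.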