DiffFERV: Diffusion-based Facial Editing of Real Videos
Authors: Xiangyi Chen, Han Xue, Li Song
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that DiffFERV achieves state-of-the-art performance in both reconstruction and editing tasks. ... Extensive evaluations demonstrate that DiffFERV excels in preserving facial identity and ensuring temporal consistency, especially when handling challenging real-world data. DiffFERV sets a new benchmark for robust, generalizable, and high-quality face video editing. ... Qualitative Results: In Fig. 4, we present reconstruction results. ... Quantitative Results: Table 1 shows that DiffFERV achieves the highest scores across all reconstruction metrics. ... Ablation Studies |
| Researcher Affiliation | Academia | Xiangyi Chen1, Han Xue2, Li Song1. 1Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University; 2School of Computer Science and Technology, Donghua University. EMAIL, EMAIL, song EMAIL |
| Pseudocode | No | The paper describes its methodology in text and uses mathematical equations for clarification (e.g., equations 1-8). However, it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/MunchkinChen/DiffFERV. |
| Open Datasets | Yes | For specialization, we initialize with the pretrained weights of Stable Diffusion 1.5. We utilize the FFHQ dataset [Karras et al., 2019] as our training dataset. ... We evaluate DiffFERV on CelebV-HQ [Zhu et al., 2022]. |
| Dataset Splits | No | The paper states: "Within our dataset, we integrate 10% of image-text pairs sampled from the LAION-2B-en [Rombach et al., 2022b] dataset." However, it does not provide specific train/test/validation splits for the primary datasets used (FFHQ for training, CelebV-HQ for evaluation), nor does it specify how the CelebV-HQ dataset was partitioned for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU specifications, or memory amounts used for running the experiments. It mentions software like "Stable Diffusion 1.5" and "GMFlow" but no hardware. |
| Software Dependencies | Yes | For specialization, we initialize with the pretrained weights of Stable Diffusion 1.5. ... We employ Pixtral 2 for automatic captioning. ... We leverage GMFlow [Xu et al., 2022a] for optical flow prediction in TTA. |
| Experiment Setup | Yes | We opt for the Adam [Kingma, 2014] optimizer with a batch size of 8 and a learning rate of 2.5e-6. For temporal modeling, we configure the window length to w = 3 for SWCFA. During editing, we use DDIM [Song et al., 2021] sampling and inversion with T = 50 timesteps. A negative prompt [Ban et al., 2025] scheme is adopted, where the original prompt serves as the negative prompt to enhance editing effectiveness, with the guidance scale set to 5. We use τapp = 0.9 for texture-level edits and τapp = 0.7 for shape-altering edits. |
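The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a minimal, hypothetical illustration, not the paper's released code: the names (`injection_steps`, `tau_app`) are assumptions, and the interpretation of τapp as the fraction of DDIM steps during which appearance features are applied is inferred from the quoted description, not confirmed by the source.

```python
# Hypothetical configuration mirroring the paper's reported settings.
DDIM_STEPS = 50          # T = 50 DDIM timesteps for sampling and inversion
GUIDANCE_SCALE = 5.0     # classifier-free guidance scale
LEARNING_RATE = 2.5e-6   # Adam optimizer (training)
BATCH_SIZE = 8
WINDOW_LENGTH = 3        # w = 3 for SWCFA temporal modeling


def injection_steps(tau_app: float, total_steps: int = DDIM_STEPS) -> int:
    """Assuming tau_app denotes the fraction of DDIM steps affected by
    the appearance threshold, return that step count (illustrative only)."""
    return round(tau_app * total_steps)


# Texture-level edits use a higher threshold than shape-altering edits,
# so they would cover more of the 50-step trajectory under this reading.
texture_steps = injection_steps(0.9)  # tau_app = 0.9
shape_steps = injection_steps(0.7)    # tau_app = 0.7
```

Under this reading, τapp = 0.9 covers 45 of the 50 steps and τapp = 0.7 covers 35, which matches the intuition that texture-level edits preserve appearance guidance for longer than shape-altering ones.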