DiffFERV: Diffusion-based Facial Editing of Real Videos

Authors: Xiangyi Chen, Han Xue, Li Song

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that DiffFERV achieves state-of-the-art performance in both reconstruction and editing tasks. ... Extensive evaluations demonstrate that DiffFERV excels in preserving facial identity and ensuring temporal consistency, especially when handling challenging real-world data. DiffFERV sets a new benchmark for robust, generalizable, and high-quality face video editing. ... Qualitative Results: In Fig. 4, we present reconstruction results. ... Quantitative Results: Table 1 shows that DiffFERV achieves the highest scores across all reconstruction metrics. ... Ablation Studies
Researcher Affiliation | Academia | Xiangyi Chen¹, Han Xue², Li Song¹ — ¹Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University; ²School of Computer Science and Technology, Donghua University. EMAIL, EMAIL, song EMAIL
Pseudocode | No | The paper describes its methodology in prose and uses mathematical equations for clarification (e.g., Equations 1–8), but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/MunchkinChen/DiffFERV.
Open Datasets | Yes | For specialization, we initialize with the pretrained weights of Stable Diffusion 1.5. We utilize the FFHQ dataset [Karras et al., 2019] as our training dataset. ... We evaluate DiffFERV on CelebV-HQ [Zhu et al., 2022].
Dataset Splits | No | The paper states: "Within our dataset, we integrate 10% of image-text pairs sampled from the LAION-2B-en [Rombach et al., 2022b] dataset." However, it does not provide train/test/validation splits for the primary datasets (FFHQ for training, CelebV-HQ for evaluation), nor does it specify how CelebV-HQ was partitioned for testing.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU specifications, or memory amounts used for the experiments. It mentions software such as Stable Diffusion 1.5 and GMFlow, but no hardware.
Software Dependencies | Yes | For specialization, we initialize with the pretrained weights of Stable Diffusion 1.5. ... We employ Pixtral 2 for automatic captioning. ... We leverage GMFlow [Xu et al., 2022a] for optical flow prediction in TTA.
Experiment Setup | Yes | We opt for the Adam [Kingma, 2014] optimizer with a batch size of 8 and a learning rate of 2.5e-6. For temporal modeling, we configure the window length to w = 3 for SWCFA. During editing, we use DDIM [Song et al., 2021] sampling and inversion with T = 50 timesteps. A negative-prompt [Ban et al., 2025] scheme is adopted, where the original prompt serves as the negative prompt to enhance editing effectiveness, with the guidance scale set to 5. We use τ_app = 0.9 for texture-level edits and τ_app = 0.7 for shape-altering edits.
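The two sampling mechanisms named in the setup row — deterministic DDIM stepping with exact inversion, and negative-prompt classifier-free guidance — can be sketched in isolation. This is a minimal toy sketch in NumPy, not the authors' implementation; the function names, the scalar alpha schedule values, and the toy tensors are all illustrative assumptions.

```python
import numpy as np

def cfg(eps_edit, eps_neg, scale=5.0):
    # Classifier-free guidance with a negative prompt: the noise
    # prediction conditioned on the original caption (eps_neg) takes the
    # place of the usual empty-prompt unconditional branch.
    return eps_neg + scale * (eps_edit - eps_neg)

def ddim_step(x_t, eps, a_t, a_prev):
    # Deterministic (eta = 0) DDIM update from timestep t to the
    # previous timestep, given cumulative alphas a_t and a_prev.
    x0 = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)   # predicted clean sample
    return np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps

def ddim_invert(x_prev, eps, a_t, a_prev):
    # Exact algebraic inverse of ddim_step for the same noise prediction;
    # iterating this over T steps yields the DDIM inversion used for editing.
    x0 = (x_prev - np.sqrt(1.0 - a_prev) * eps) / np.sqrt(a_prev)
    return np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps

# Toy round trip: inverting a forward step recovers the original latent.
x = np.array([0.3, -1.2])
e = np.array([0.1, 0.4])
x_prev = ddim_step(x, e, a_t=0.5, a_prev=0.8)
print(np.allclose(ddim_invert(x_prev, e, a_t=0.5, a_prev=0.8), x))  # True
```

In a full pipeline, `eps_edit` and `eps_neg` would come from two U-Net passes per timestep, and the chain of 50 inversion steps would map the source frame's latent to noise before guided resampling with the editing prompt.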