VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing

Authors: Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate our method achieves state-of-the-art performance in real-world scenarios. Our code, data, and demos are available on the project page." (...) "4 EXPERIMENTS"
Researcher Affiliation | Academia | "1 ReLER Lab, AAII, University of Technology Sydney; 2 ReLER Lab, CCAI, Zhejiang University"
Pseudocode | No | The paper describes the methodology using textual descriptions and diagrams (Figure 4) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code, data, and demos are available on the project page." Project page: https://knightyxp.github.io/VideoGrain_project_page
Open Datasets | Yes | "We evaluate our VideoGrain using a dataset of 76 video-text pairs, including videos from DAVIS (Perazzi et al., 2016), TGVE [1], and the Internet [2], with 16-32 frames per video." [1] https://sites.google.com/view/loveucvpr23/track4 [2] https://www.istockphoto.com/ and https://www.pexels.com/
Dataset Splits | No | "We evaluate our VideoGrain using a dataset of 76 video-text pairs, including videos from DAVIS (Perazzi et al., 2016), TGVE, and the Internet, with 16-32 frames per video." The paper does not provide specific details on how this dataset is split into training, validation, or test sets.
Hardware Specification | Yes | "All the experiments are conducted on an NVIDIA A40 GPU."
Software Dependencies | Yes | "In the experiment, we adopt the pretrained Stable Diffusion v1.5 as the base model, using 50 steps of DDIM inversion and denoising. Our VideoGrain operates in a zero-shot manner, requiring no additional parameter tuning."
Experiment Setup | Yes | "In the experiment, we adopt the pretrained Stable Diffusion v1.5 as the base model, using 50 steps of DDIM inversion and denoising. Our VideoGrain operates in a zero-shot manner, requiring no additional parameter tuning. To enhance memory efficiency, we re-engineer slice attention within our ST-Layout Attn. ST-Layout Attn is applied during the first 15 denoising steps. We set ξ(t) = 0.3·t^5 for self-attention and ξ(t) = t^5 for cross-attention, where the timestep t ∈ [0, 1] is normalized. All the experiments are conducted on an NVIDIA A40 GPU."
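The inference schedule quoted above (50 DDIM steps, ST-Layout Attn active for the first 15, timestep-dependent modulation thresholds) can be sketched as a few lines of Python. This is an illustrative reconstruction, not the authors' code: the function and variable names are hypothetical, and the exponent 5 in ξ(t) is an assumption based on the (superscript-stripped) quote "0.3 t5".

```python
# Illustrative sketch of the VideoGrain inference schedule as quoted above.
# All names are hypothetical; the actual implementation may differ.

NUM_STEPS = 50          # DDIM inversion / denoising steps (from the paper)
ST_LAYOUT_STEPS = 15    # ST-Layout Attn applied during the first 15 denoising steps

def xi_self(t: float) -> float:
    """Self-attention modulation threshold, assuming xi(t) = 0.3 * t**5."""
    return 0.3 * t ** 5

def xi_cross(t: float) -> float:
    """Cross-attention modulation threshold, assuming xi(t) = t**5."""
    return t ** 5

def schedule():
    """Yield (step, normalized t, ST-Layout Attn active?, xi_self, xi_cross).

    t runs from 1 (high noise) down to 0 over the denoising trajectory,
    matching the paper's statement that t in [0, 1] is normalized.
    """
    for step in range(NUM_STEPS):
        t = 1.0 - step / (NUM_STEPS - 1)
        active = step < ST_LAYOUT_STEPS
        yield step, t, active, xi_self(t), xi_cross(t)

if __name__ == "__main__":
    for step, t, active, s, c in schedule():
        if active:
            print(f"step {step:2d}: t={t:.2f}  xi_self={s:.4f}  xi_cross={c:.4f}")
```

Because both thresholds decay with t, the attention modulation is strongest early in denoising (where layout is decided) and vanishes by the time ST-Layout Attn is switched off at step 15.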