SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
Authors: Koichi Namekata, Sherwin Bahmani, Ziyi Wu, Yash Kant, Igor Gilitschenski, David Lindell
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to show superior performance over zero-shot baselines while significantly narrowing down the performance gap with supervised models in terms of visual quality and motion fidelity. |
| Researcher Affiliation | Academia | 1University of Toronto, 2Vector Institute |
| Pseudocode | No | The paper describes the methodology in text and through diagrams (e.g., Figure 3 provides an overview), but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Additional details and video results are available on our project page: https://kmcode1.github.io/Projects/SG-I2V. |
| Open Datasets | Yes | Following prior works (Wu et al., 2024c; Zhou et al., 2024), we evaluate our method on the validation set of the VIPSeg dataset (Miao et al., 2022). |
| Dataset Splits | Yes | Following prior works (Wu et al., 2024c; Zhou et al., 2024), we evaluate our method on the validation set of the VIPSeg dataset (Miao et al., 2022). We test on the same control regions and target trajectories as Drag Anything, where the size of our bounding boxes is the same as the diameter of the circles in their work. [...] Additionally, we exclude ground truth trajectory points that fall outside the image space due to objects moving out of frame. We also omit short videos with fewer than 14 frames from the evaluation. |
| Hardware Specification | Yes | The runtime depends on the number of trajectory conditions, with an average runtime of 305 seconds on the VIPSeg dataset with A6000 48GB. |
| Software Dependencies | No | The paper mentions the use of a "discrete Euler scheduler" (Karras et al., 2022), "AdamW optimizer" (Loshchilov & Hutter, 2019), "Co-Tracker" (Karaev et al., 2024), and a "Butterworth filter", but does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | In all experiments, we leverage the image-to-video variant of Stable Video Diffusion (Blattmann et al., 2023a) to generate videos with 14 frames at 576×1024 resolution. The default discrete Euler scheduler (Karras et al., 2022) is applied with T = 50 sampling steps. We extract feature maps from the last two self-attention layers of the middle stage in the denoising U-Net. We optimize Eq. (1) at the early denoising timesteps t ∈ [45, 44, ..., 30] for 5 iterations per timestep. We use the AdamW optimizer (Loshchilov & Hutter, 2019) with a learning rate of 0.21. [...] During loss calculation, the Gaussian heatmap G_b is constructed following (Wu et al., 2024c), where a heatmap for a bounding box of size (h_b, w_b) is created by a Gaussian distribution with standard deviation σ = (0.2h_b, 0.2w_b). For the low-pass filter H_γ, we set the cut-off frequency γ to 0.5. |
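The two components quoted in the setup row (the Gaussian heatmap over a bounding box, and the frequency-domain low-pass filter H_γ) can be sketched in NumPy. This is a hedged reconstruction from the quoted description only: the function names, the explicit box-center parameters, and the Butterworth order are assumptions, not the authors' released code.

```python
import numpy as np

def gaussian_heatmap(h, w, hb, wb, cy, cx):
    """Heatmap for a bounding box of size (hb, wb) centered at (cy, cx),
    using the paper's stated sigma = (0.2*hb, 0.2*wb). Peak value is 1 at the center."""
    ys = np.arange(h, dtype=float)[:, None]
    xs = np.arange(w, dtype=float)[None, :]
    sy, sx = 0.2 * hb, 0.2 * wb
    return np.exp(-((ys - cy) ** 2 / (2 * sy**2) + (xs - cx) ** 2 / (2 * sx**2)))

def butterworth_lowpass(feat, gamma=0.5, order=4):
    """Apply a Butterworth-style low-pass H_gamma to a 2D map in the Fourier domain.
    gamma = 0.5 matches the quoted cut-off; the filter order is an assumption."""
    h, w = feat.shape
    fy = np.fft.fftfreq(h)[:, None]          # normalized frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]
    r = np.sqrt(fy**2 + fx**2) / gamma       # radial frequency relative to cut-off
    H = 1.0 / (1.0 + r ** (2 * order))       # Butterworth magnitude response
    return np.real(np.fft.ifft2(np.fft.fft2(feat) * H))
```

For example, `gaussian_heatmap(64, 64, 20, 20, 32, 32)` produces a 64×64 map peaking at 1.0 at pixel (32, 32), and `butterworth_lowpass` leaves the DC component untouched while attenuating frequencies beyond γ.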