Generative Video Diffusion for Unseen Novel Semantic Video Moment Retrieval

Authors: Dezhao Luo, Shaogang Gong, Jiabo Huang, Hailin Jin, Yang Liu

AAAI 2025

Reproducibility assessment — each item below gives the variable, the result, and the LLM response:
Research Type: Experimental
"To evaluate the effectiveness of our Fine-grained Video Editing framework (FVE), we validate on both video moment retrieval and video action editing tasks. ... Experiments on three datasets demonstrate the effectiveness of FVE to unseen novel semantic video moment retrieval tasks."
Researcher Affiliation: Collaboration
(1) Queen Mary University of London; (2) Sony AI; (3) Adobe Research; (4) WICT, Peking University; (5) State Key Laboratory of General Artificial Intelligence, Peking University. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode: No
The paper describes its mathematical formulations and processes using equations and structured text, but it does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code: No
The paper neither states that its own source code is released nor links to a repository for the described methodology. The sentence "Symbol indicates our implementation with the author-released code." refers to external baseline methods, not to the authors' own FVE code.
Open Datasets: Yes
"To assess FVE for novel semantic VMR, we employed the novel-word split (Li et al. 2022) on Charades-STA (Gao et al. 2017). For QVHighlights (Lei, Berg, and Bansal 2021) and TACoS (Regneri et al. 2013), we sample sentences from the standard training split and exclude them from the training set."
Dataset Splits: Yes
"To assess FVE for novel semantic VMR, we employed the novel-word split (Li et al. 2022) on Charades-STA (Gao et al. 2017). For QVHighlights (Lei, Berg, and Bansal 2021) and TACoS (Regneri et al. 2013), we sample sentences from the standard training split and exclude them from the training set. In our implementation, we selected 50/300/300 sentences separately from each dataset for data generation."
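Since the authors' code is not released, the held-out sampling the quote describes can only be sketched. The following minimal sketch draws a fixed number of sentences from a training split for data generation and removes them from the remaining training set; the function name, seed, and use of `random.sample` are assumptions, not the paper's implementation.

```python
import random

def split_for_generation(train_sentences, n_select, seed=0):
    """Hold out n_select sentences for data generation and drop them
    from the training set. Illustrative only; the paper does not
    specify its sampling procedure beyond the counts (50/300/300)."""
    rng = random.Random(seed)
    selected = rng.sample(train_sentences, n_select)
    held_out = set(selected)
    remaining = [s for s in train_sentences if s not in held_out]
    return selected, remaining

# Example with dummy sentences; 50 corresponds to the Charades-STA count,
# 300 would be used for QVHighlights and TACoS.
sents = [f"sentence {i}" for i in range(1000)]
gen, train = split_for_generation(sents, 50)
print(len(gen), len(train))
```

Because the selected sentences are excluded from training, any model evaluated on them is tested on semantics it never saw, which is the "unseen novel semantic" condition the paper targets.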
Hardware Specification: No
The acknowledgements mention support from "Queen Mary University of London's Apocrita HPC facility from QMUL RESEARCHIT", but the paper does not specify any GPU models, CPU models, or other hardware details used for the experiments.
Software Dependencies: No
The paper mentions tools such as CLIP (Radford et al. 2021), DINO (Caron et al. 2021), and the DreamBooth strategy (Ruiz et al. 2023), but it provides no version numbers for any software dependency, library, or programming language used in the implementation.
Experiment Setup: Yes
"For hybrid selection, we used CLIP (Radford et al. 2021) to compute the cross-modal relevance score and DINO (Caron et al. 2021) for the uni-modal structure score. We set k to 500, 1500 and 1500 respectively for the three datasets. For the model performance disparity metric, we set l to be 100, 500 and 500 respectively for each dataset. ... We observe the best combination is k=500 and l=100."
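The quoted setup combines a CLIP cross-modal relevance score with a DINO uni-modal structure score and keeps the top-k candidates. A minimal sketch of such a hybrid top-k selection is below; it stands in for CLIP/DINO with precomputed embedding arrays, and the cosine-similarity scoring and equal weighting of the two scores are assumptions, since the paper's exact combination rule is not reproduced here.

```python
import numpy as np

def hybrid_select(text_emb, video_embs, dino_embs, ref_emb, k=500):
    """Rank candidates by a cross-modal relevance score (text vs. video
    embeddings, CLIP-style) plus a uni-modal structure score (candidate
    vs. reference embeddings, DINO-style); return the top-k indices.
    Equal weighting of the two scores is an assumption."""
    eps = 1e-8
    # Cross-modal relevance: cosine similarity between text and each video.
    rel = (video_embs @ text_emb) / (
        np.linalg.norm(video_embs, axis=1) * np.linalg.norm(text_emb) + eps)
    # Uni-modal structure: cosine similarity to a reference clip embedding.
    struct = (dino_embs @ ref_emb) / (
        np.linalg.norm(dino_embs, axis=1) * np.linalg.norm(ref_emb) + eps)
    score = rel + struct
    return np.argsort(-score)[:k]

# Dummy embeddings standing in for CLIP (dim 64) and DINO (dim 32) features.
rng = np.random.default_rng(0)
idx = hybrid_select(rng.normal(size=64),
                    rng.normal(size=(1000, 64)),
                    rng.normal(size=(1000, 32)),
                    rng.normal(size=32),
                    k=5)
print(idx)
```

In the paper's setting, k would be 500, 1500, or 1500 depending on the dataset; l, the size used for the model performance disparity metric, would be a further sub-selection from these top-k candidates.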