Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models

Authors: Yifang Xu, Yunzhuo Sun, Benxiang Zhai, Ming Li, Wenxin Liang, Yang Li, Sidan Du

AAAI 2025

Reproducibility

Variable — Result: LLM Response
Research Type — Experimental: "Extensive experimental results demonstrate that our method outperforms the SOTA MLLM-based and zero-shot models on several public datasets, including QVHighlights, ActivityNet-Captions, and Charades-STA."
Researcher Affiliation — Academia: Yifang Xu (1), Yunzhuo Sun (2), Benxiang Zhai (1), Ming Li (1), Wenxin Liang (2), Yang Li (1), Sidan Du (1); (1) Nanjing University, (2) Dalian University of Technology. EMAIL, EMAIL, EMAIL
Pseudocode — No: The paper describes query debiasing, span generation, and span selection in paragraph text only; for example, Section 3.3 describes the span generator as: "Firstly, we compute the inverse cumulative histogram of S^f_i, with η bins. We then traverse these bins in reverse order to find the first bin containing at least κ moments, using its left endpoint value as the adaptive threshold γ. Next, we iterate through S^f_i in temporal order. If S^f_{i,j} exceeds γ, the corresponding moment is marked as the starting moment. When the similarities of τ consecutive moments all fall below γ, we mark the final moment with a similarity exceeding γ as the ending moment. Finally, we repeat the above process to generate a set of candidate spans T^p from S^f."
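Since the paper gives this procedure only in prose, the quoted steps can be sketched in code. The sketch below is a reconstruction from that paragraph, not the authors' implementation; the function name, the use of `np.histogram`, and the interpretation of "inverse cumulative histogram" as accumulating bin counts from the highest bin downward are all assumptions.

```python
import numpy as np

def generate_spans(sims, n_bins=10, kappa=7, tau=5):
    """Sketch of the span generator described in Section 3.3.

    sims: 1-D array of moment-query similarities S^f_i.
    Returns a list of (start, end) moment indices (inclusive).
    """
    sims = np.asarray(sims, dtype=float)
    # Histogram the similarities, then traverse bins from high to low,
    # accumulating counts; the left edge of the first bin where the
    # accumulated count reaches kappa is the adaptive threshold gamma.
    counts, edges = np.histogram(sims, bins=n_bins)
    cum, gamma = 0, edges[0]
    for b in range(n_bins - 1, -1, -1):
        cum += counts[b]
        if cum >= kappa:
            gamma = edges[b]
            break
    # Scan moments in temporal order: open a span when the similarity
    # exceeds gamma; close it (at the last above-threshold moment) once
    # tau consecutive moments fall below gamma.
    spans, start, below, last_above = [], None, 0, None
    for j, s in enumerate(sims):
        if s > gamma:
            if start is None:
                start = j
            last_above, below = j, 0
        elif start is not None:
            below += 1
            if below >= tau:
                spans.append((start, last_above))
                start, below = None, 0
    if start is not None:  # span still open at the end of the video
        spans.append((start, last_above))
    return spans
```

For instance, a similarity sequence with two high-scoring runs separated by more than τ low-scoring moments yields two candidate spans.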
Open Source Code — No: The paper neither states that source code for Moment-GPT will be released nor links to a code repository for its method. It mentions GitHub only in the context of MiniGPT-v2, not the present work.
Open Datasets — Yes: "To evaluate our proposed method, we conduct experiments on three datasets with different topics: QVHighlights (Lei et al. 2021), Charades-STA (Gao et al. 2017), ActivityNet-Captions (Krishna et al. 2017)."
Dataset Splits — Yes: "We conduct experiments on three datasets with different topics: QVHighlights (Lei et al. 2021), Charades-STA (Gao et al. 2017), ActivityNet-Captions (Krishna et al. 2017)." Table 1 reports QVHighlights metrics on both the 'test' and 'val' sets, indicating the use of predefined splits for this benchmark.
Hardware Specification — Yes: "All experiments are conducted on 1 NVIDIA A100 GPU."
Software Dependencies — No: The paper names the MLLMs used (LLaMA-3-8B, MiniGPT-v2-7B, and Video-ChatGPT based on Vicuna-7B-v1.1) but gives no ancillary software details such as programming-language, library, or solver versions.
Experiment Setup — Yes: "Following previous works (Huang et al. 2023a; Lei et al. 2021), we set the frame rates of videos from Charades-STA, ActivityNet-Captions, and QVHighlights to 1, 1, and 0.5, respectively. The employed MLLM models include LLaMA-3-8B, MiniGPT-v2-7B, and Video-ChatGPT based on Vicuna-7B-v1.1 (Zheng et al. 2024). To reduce the randomness of results, we configure the temperatures of LLaMA-3, MiniGPT-v2, and Video-ChatGPT to 0.3, 0.2, and 0.2, respectively. The number of histogram bins η is empirically fixed to 10. The hidden dimension d of LLaMA-3 is 4096. We set the number of debiased queries N_d to 3, the counting threshold κ to 7, the number of consecutive moments τ to 5, the distance coefficient λ to 0.2, and the IoU threshold σ in NMS to 0.9."
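The hyperparameters quoted above can be collected into a single configuration for reproduction attempts. The values below are taken verbatim from the paper's setup; the dictionary structure and key names are illustrative, not from the paper.

```python
# Hyperparameters reported in the paper's experiment setup.
# The dict layout and key names are assumptions for illustration.
MOMENT_GPT_CONFIG = {
    "frame_rate": {
        "Charades-STA": 1,
        "ActivityNet-Captions": 1,
        "QVHighlights": 0.5,
    },
    "temperature": {
        "LLaMA-3": 0.3,
        "MiniGPT-v2": 0.2,
        "Video-ChatGPT": 0.2,
    },
    "n_bins": 10,               # histogram bins (eta)
    "hidden_dim": 4096,         # LLaMA-3 hidden dimension (d)
    "n_debiased_queries": 3,    # N_d
    "count_threshold": 7,       # kappa
    "consecutive_moments": 5,   # tau
    "distance_coeff": 0.2,      # lambda
    "nms_iou_threshold": 0.9,   # sigma in NMS
}
```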