Denoise-then-Retrieve: Text-Conditioned Video Denoising for Video Moment Retrieval

Authors: Weijia Liu, Jiuxin Cao, Bo Miao, Zhiheng Fu, Xuelin Zhu, Jiawei Ge, Bo Liu, Mehwish Nasim, Ajmal Mian

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on Charades-STA and QVHighlights demonstrate that our approach surpasses state-of-the-art methods on all metrics. Furthermore, our denoise-then-retrieve paradigm is adaptable and can be seamlessly integrated into advanced VMR models to boost performance. ... Experiments on the Charades-STA and QVHighlights benchmarks show that our approach significantly outperforms existing state-of-the-art methods on all metrics. On Charades-STA, we surpass the nearest competitor MESM [Liu et al., 2024b] by 4.36 percentage points on the mAP@0.7 metric.
Researcher Affiliation | Academia | 1 Southeast University, 2 The University of Adelaide, 3 The Hong Kong Polytechnic University, 4 The University of Western Australia
Pseudocode | No | The paper describes the method using textual descriptions and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing its source code, nor does it provide a direct link to a code repository for the methodology described.
Open Datasets | Yes | Datasets. We validate the effectiveness of our method through extensive experiments on two popular datasets: QVHighlights and Charades-STA. QVHighlights [Lei et al., 2021] is designed for moment retrieval and highlight detection... Charades-STA [Sigurdsson et al., 2016] is focused on temporal sentence grounding, derived from the Charades dataset.
Dataset Splits | Yes | QVHighlights... We follow the original data splits, using the training set for model training and the test set for evaluation... Charades-STA [Sigurdsson et al., 2016]... It contains 12,408 training and 3,720 testing moment-sentence pairs...
Hardware Specification | Yes | All experiments are conducted on a single RTX 3090 GPU.
Software Dependencies | No | The paper mentions using "pre-extracted SlowFast and CLIP video features, and CLIP text features" and refers to models like "Mamba" and "Transformer", but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions) that are critical for reproducibility.
Experiment Setup | Yes | Implementation Details. For a fair comparison, we use pre-extracted SlowFast and CLIP video features, and CLIP text features, for both datasets, provided by [Lin et al., 2023]. In our DRNet, all encoders constructed using CIO consist of three CIO layers, each with a hidden size of D = 1024. Loss weights are set as: λt = 2, λg_L1 = 5, λg_iou = 1, λb_L1 = 10, λb_iou = 1, and λc = 10 for both datasets. For QVHighlights, λintra and λinter are set to 2 each, while for Charades-STA, they are set to 1 and 0.5, respectively.
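As a rough illustration, the per-dataset loss weights quoted above can be collected in a single configuration. This is a hypothetical sketch: the key names (`lambda_t`, `lambda_g_l1`, etc.) and the `total_loss` helper are invented for readability; only the numeric values come from the paper's implementation details.

```python
# Hypothetical sketch of the DRNet loss-weight configuration reported in the paper.
# Key names are invented; only the numeric values are taken from the text.
LOSS_WEIGHTS = {
    "qvhighlights": {
        "lambda_t": 2, "lambda_g_l1": 5, "lambda_g_iou": 1,
        "lambda_b_l1": 10, "lambda_b_iou": 1, "lambda_c": 10,
        "lambda_intra": 2, "lambda_inter": 2,
    },
    "charades_sta": {
        "lambda_t": 2, "lambda_g_l1": 5, "lambda_g_iou": 1,
        "lambda_b_l1": 10, "lambda_b_iou": 1, "lambda_c": 10,
        "lambda_intra": 1, "lambda_inter": 0.5,
    },
}

def total_loss(losses: dict, dataset: str) -> float:
    """Weighted sum of the individual loss terms for the given dataset."""
    weights = LOSS_WEIGHTS[dataset]
    return sum(weights[name] * losses[name] for name in weights)
```

Only the `lambda_intra` / `lambda_inter` pair differs between the two datasets; the remaining six weights are shared.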