Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generalized Video Moment Retrieval

Authors: Qin You, Qilong Wu, Yicong Li, Wei Ji, Li Li, Pengcheng Cai, Lina Wei, Roger Zimmermann

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENT 5.1 EVALUATION METRICS 5.2 IMPLEMENTATION DETAILS 5.3 COMPARISON WITH STATE-OF-THE-ART METHODS 5.4 ABLATION STUDY |
| Researcher Affiliation | Academia | 1Nanjing University, 2National University of Singapore, 3Shanghai AI Laboratory, 4University of Southern California, 5Zhejiang University |
| Pseudocode | No | The paper describes methods and equations but does not present any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/42xingxing/NExT-VMR |
| Open Datasets | Yes | We build a specialized dataset, NExT-VMR, which is derived from the YFCC100M dataset (Thomee et al., 2016) after meticulous construction and analysis. This dataset is tailored specifically for GVMR, featuring a diverse array of query types, including one-to-multi and no-target queries. |
| Dataset Splits | No | Figure 4 shows the distribution of these queries in the train, validation and test sets. This distribution reflects the diversity and balance of the dataset. |
| Hardware Specification | Yes | All speed tests were conducted on a single NVIDIA RTX A40 GPU. |
| Software Dependencies | No | The paper mentions several pre-trained models and optimizers (e.g., SlowFast, CLIP, AdamW) but does not provide specific version numbers for software libraries like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | During training, AdamW (Ilya Loshchilov, 2019) optimizer with weight decay 1e-4 is adopted; the batch size is set at 32 for training and 128 for testing; the hidden dimension C = 256. We configured our transformer encoder and decoder with two layers each, denoted as T = 2. The hyperparameter settings were determined as follows: L = 10, λc = 4, λl1 = 10, λiou = 1, λbcl = λproxy = 0.1, for optimal performance. For the no-target threshold we set it as δ = 0.7, which is experimentally balanced for target and no-target generalization. |
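The reported experiment setup can be collected into a single configuration object, which also makes the λ-weighted loss combination explicit. This is a minimal sketch, not the authors' code: the class name, field names, and the `total_loss` helper are all illustrative assumptions; only the numeric values come from the paper's quoted setup.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted from the paper's experiment setup.
    Field names are illustrative; the authors' code may use different ones."""
    optimizer: str = "AdamW"
    weight_decay: float = 1e-4
    train_batch_size: int = 32
    test_batch_size: int = 128
    hidden_dim: int = 256             # C
    num_layers: int = 2               # T (encoder and decoder layers each)
    L_param: int = 10                 # L (role not specified in the excerpt)
    lambda_c: float = 4.0             # λc
    lambda_l1: float = 10.0           # λl1
    lambda_iou: float = 1.0           # λiou
    lambda_bcl: float = 0.1           # λbcl
    lambda_proxy: float = 0.1         # λproxy
    no_target_threshold: float = 0.7  # δ

def total_loss(c, l1, iou, bcl, proxy, cfg=TrainConfig()):
    """Weighted sum of the individual loss terms using the reported λ values."""
    return (cfg.lambda_c * c
            + cfg.lambda_l1 * l1
            + cfg.lambda_iou * iou
            + cfg.lambda_bcl * bcl
            + cfg.lambda_proxy * proxy)
```

With all loss terms set to 1, the weighted sum is 4 + 10 + 1 + 0.1 + 0.1 = 15.2, which is a quick sanity check that the weights were transcribed correctly.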