Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Generalized Video Moment Retrieval
Authors: Qin You, Qilong Wu, Yicong Li, Wei Ji, Li Li, Pengcheng Cai, Lina Wei, Roger Zimmermann
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENT; 5.1 EVALUATION METRICS; 5.2 IMPLEMENTATION DETAILS; 5.3 COMPARISON WITH STATE-OF-THE-ART METHODS; 5.4 ABLATION STUDY |
| Researcher Affiliation | Academia | 1Nanjing University, 2National University of Singapore, 3Shanghai AI Laboratory, 4University of Southern California, 5Zhejiang University |
| Pseudocode | No | The paper describes methods and equations but does not present any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/42xingxing/NExT-VMR |
| Open Datasets | Yes | We build a specialized dataset, NExT-VMR, which is derived from the YFCC100M dataset (Thomee et al., 2016) after meticulous construction and analysis. This dataset is tailored specifically for GVMR, featuring a diverse array of query types, including one-to-multi and no-target queries. |
| Dataset Splits | No | Figure 4 shows the distribution of these queries in the train, validation and test sets. This distribution reflects the diversity and balance of the dataset. |
| Hardware Specification | Yes | All speed tests were conducted on a single NVIDIA RTX A40 GPU. |
| Software Dependencies | No | The paper mentions several pre-trained models and optimizers (e.g., SlowFast, CLIP, AdamW) but does not provide specific version numbers for software libraries like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | During training, the AdamW (Loshchilov & Hutter, 2019) optimizer with weight decay 1e-4 is adopted; the batch size is set at 32 for training and 128 for testing; the hidden dimension C = 256. We configured our transformer encoder and decoder with two layers each, denoted as T = 2. The hyperparameter settings were determined as follows: L = 10, λc = 4, λl1 = 10, λiou = 1, λbcl = λproxy = 0.1, for optimal performance. The no-target threshold is set as δ = 0.7, which experimentally balances target and no-target generalization. |
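The experiment-setup row above can be summarized as a configuration sketch. This is a minimal illustration of the quoted hyperparameters only; the names (`CONFIG`, `combined_loss`) are hypothetical and do not come from the paper's released code.

```python
# Hedged sketch of the hyperparameters quoted in the Experiment Setup row.
# All identifiers here are illustrative, not taken from the NExT-VMR repository.

CONFIG = {
    "optimizer": "AdamW",
    "weight_decay": 1e-4,
    "batch_size_train": 32,
    "batch_size_test": 128,
    "hidden_dim": 256,            # C
    "encoder_decoder_layers": 2,  # T
    "num_queries": 10,            # L
    "no_target_threshold": 0.7,   # delta
}

# Loss weights lambda_c, lambda_l1, lambda_iou, lambda_bcl, lambda_proxy
LOSS_WEIGHTS = {"cls": 4.0, "l1": 10.0, "iou": 1.0, "bcl": 0.1, "proxy": 0.1}

def combined_loss(terms: dict) -> float:
    """Weighted sum of the per-term losses, following the quoted lambda settings."""
    return sum(LOSS_WEIGHTS[name] * value for name, value in terms.items())

def is_no_target(score: float) -> bool:
    """Flag a query as no-target when its score falls below the delta threshold."""
    return score < CONFIG["no_target_threshold"]
```

For example, with unit per-term losses the combined loss is 4 + 10 + 1 + 0.1 + 0.1 = 15.2.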