Multi-Scale Contrastive Learning for Video Temporal Grounding
Authors: Thong Thanh Nguyen, Yi Bin, Xiaobao Wu, Zhiyuan Hu, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our framework for not only long-form but also short-form video grounding. ... Experiments: To validate the effectiveness, we conduct extensive experiments against recent methods for temporal grounding. We also perform an ablation study to investigate each component. ... Ablation Study: We conduct extensive experiments on TACoS to study the influence of the design choices. |
| Researcher Affiliation | Academia | 1 Institute of Data Science (IDS), National University of Singapore, Singapore; 2 Tongji University, China; 3 Nanyang Technological University (NTU), Singapore |
| Pseudocode | No | The paper describes the methodology using mathematical equations and textual descriptions, for example under the 'Cross-scale Contrastive Learning' section. However, there are no clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Following previous works, we work on five challenging datasets of temporal grounding, which belong to two main categories, i.e. 1) Long videos, many queries (Ego4D-NLQ (Grauman et al. 2022), MAD (Soldan et al. 2022), and TACoS (Regneri et al. 2013)) and 2) Short videos, few queries (ActivityNet-Captions (Krishna et al. 2017) and Charades-STA (Sigurdsson et al. 2016)). |
| Dataset Splits | No | The paper mentions the specific datasets used and refers to evaluation metrics like 'R@K, tIoU', but it does not provide explicit details about how these datasets were split into training, validation, and test sets. It mentions a 'video-centric sampling approach (Mu, Mo, and Li 2024)' for Ego4D-NLQ, but this is not a general dataset split description. |
| Hardware Specification | No | The paper discusses various pre-trained video and textual features (e.g., SlowFast, BERT, EgoVLP, CLIP), but it does not specify any hardware details like GPU models, CPU types, or memory used to run the experiments. |
| Software Dependencies | No | The paper mentions using several pre-trained models and features (e.g., BERT, SlowFast, CLIP, C3D, GloVe), but it does not specify any software dependencies with version numbers (e.g., Python version, PyTorch version, specific library versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | For Ego4D-NLQ, we use pre-trained 1) SlowFast video features (Feichtenhofer et al. 2019) with BERT textual features (Devlin et al. 2018), and 2) EgoVLP video and textual features (Lin et al. 2022). For testing, we report R@{1, 5}, tIoU = {0.3, 0.5}. ... For both within-scale and cross-scale contrastive learning implementation, we keep the size of the negative sample set N(l) in every level l to be equal to the size of the positive video clips P(l) that correspond to the target video moments. Based upon validation and fair comparison with previous methods, we use ρref = ρwithin = ρcross = 1.0. |
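The R@{1, 5}, tIoU = {0.3, 0.5} metric quoted in the setup row is the standard temporal-grounding recall: a query counts as hit if any of the top-K predicted moments overlaps the ground-truth moment with temporal IoU at or above the threshold. A minimal sketch (function names are ours; moments are assumed to be `(start, end)` tuples, predictions ranked by score):

```python
def t_iou(pred, gt):
    """Temporal IoU between two (start, end) intervals, in [0, 1]."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_k(predictions, ground_truths, k=1, threshold=0.5):
    """R@K, tIoU=threshold: fraction of queries whose ground-truth moment
    is matched (tIoU >= threshold) by at least one of the top-K predictions."""
    hits = 0
    for preds, gt in zip(predictions, ground_truths):
        if any(t_iou(p, gt) >= threshold for p in preds[:k]):
            hits += 1
    return hits / len(ground_truths)
```

For example, with one query whose best prediction (0, 10) covers a ground truth of (0, 9), the tIoU is 9/10 = 0.9, so the query counts toward R@1 at both 0.3 and 0.5 thresholds.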