Ambiguity-Restrained Text-Video Representation Learning for Partially Relevant Video Retrieval

Authors: Cheol-Ho Cho, WonJun Moon, WooJin Jun, MinSeok Jung, Jae-Pil Heo

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments. Datasets and Metrics. We evaluate our method on two large-scale video datasets, i.e. TVR (Lei et al. 2020) and ActivityNet Captions (Krishna et al. 2017). ... We report the results comparing our approach with state-of-the-art methods... In Tab. 1 and Tab. 2, we show the results on the TVR and ActivityNet Captions datasets. As observed, our proposed method outperforms previous works in all recall metrics... Ablation and Further Studies. Component Analysis. To understand the effectiveness of each component, we conduct component ablation in Tab. 5.
Researcher Affiliation | Academia | Cheol-Ho Cho, WonJun Moon, WooJin Jun, MinSeok Jung, and Jae-Pil Heo* Sungkyunkwan University EMAIL
Pseudocode | No | The paper describes the methodology and learning objectives using mathematical equations and descriptive text, but it does not contain a clearly labeled pseudocode or algorithm block, nor structured code-like steps.
Open Source Code | No | The paper does not contain any explicit statements about making the source code available, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluate our method on two large-scale video datasets, i.e. TVR (Lei et al. 2020) and ActivityNet Captions (Krishna et al. 2017).
Dataset Splits | Yes | We adopt the data split provided by previous works (Zhang et al. 2020, 2021).
Hardware Specification | Yes | We analyzed three aspects: FLOPs, the number of parameters, and runtime required to process a single text query on Nvidia RTX 3090 GPU.
Software Dependencies | No | The paper mentions using specific models and features like ResNet, I3D, RoBERTa, and CLIP-L/14, but does not provide specific version numbers for underlying software libraries, programming languages, or operating systems.
Experiment Setup | Yes | While the typical parameters are set the same as in (Wang et al. 2024), the thresholds τs and τu were defined at each epoch using the distribution values of similarity and uncertainty from the training dataset. Particularly, τs is set to the mean value of the similarity distribution of the positive pairs, and τu is set to the value corresponding to the mean of uncertainty distribution of train dataset. More details are provided in the appendix.
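
The threshold rule quoted under Experiment Setup can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `epoch_thresholds` and its inputs are hypothetical, and it assumes that "the value corresponding to the mean of uncertainty distribution" means the plain arithmetic mean. How similarity and uncertainty are computed, and how the thresholds are then applied, is not stated in this summary.

```python
from statistics import fmean
from typing import Sequence, Tuple


def epoch_thresholds(
    positive_similarities: Sequence[float],
    uncertainties: Sequence[float],
) -> Tuple[float, float]:
    """Return (tau_s, tau_u) for one training epoch.

    Hypothetical helper. Assumes:
      tau_s = mean similarity over the positive text-video pairs
      tau_u = mean uncertainty over the training set
    """
    if len(positive_similarities) == 0 or len(uncertainties) == 0:
        raise ValueError("both inputs must be non-empty")
    tau_s = fmean(positive_similarities)
    tau_u = fmean(uncertainties)
    return tau_s, tau_u


# Toy values standing in for statistics collected during one epoch.
tau_s, tau_u = epoch_thresholds([0.5, 0.75, 1.0], [0.25, 0.75])
print(tau_s, tau_u)
```

Because the thresholds are recomputed at each epoch, such a helper would be called once per epoch on freshly collected similarity and uncertainty values rather than being fixed as hyperparameters.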