Ambiguity-Restrained Text-Video Representation Learning for Partially Relevant Video Retrieval
Authors: Cheol-Ho Cho, WonJun Moon, WooJin Jun, MinSeok Jung, Jae-Pil Heo
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments / Datasets and Metrics: We evaluate our method on two large-scale video datasets, i.e. TVR (Lei et al. 2020) and ActivityNet Captions (Krishna et al. 2017). ... We report the results comparing our approach with state-of-the-art methods... In Tab. 1 and Tab. 2, we show the results on the TVR and ActivityNet Captions datasets. As observed, our proposed method outperforms previous works in all recall metrics... Ablation and Further Studies / Component Analysis: To understand the effectiveness of each component, we conduct component ablation in Tab. 5. |
| Researcher Affiliation | Academia | Cheol-Ho Cho, Won Jun Moon, Woojin Jun, Min Seok Jung, and Jae-Pil Heo* Sungkyunkwan University |
| Pseudocode | No | The paper describes the methodology and learning objectives using mathematical equations and descriptive text, but it does not contain a clearly labeled pseudocode or algorithm block, nor structured code-like steps. |
| Open Source Code | No | The paper does not contain any explicit statements about making the source code available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate our method on two large-scale video datasets, i.e. TVR (Lei et al. 2020) and ActivityNet Captions (Krishna et al. 2017). |
| Dataset Splits | Yes | We adopt the data split provided by previous works (Zhang et al. 2020, 2021). |
| Hardware Specification | Yes | We analyzed three aspects: FLOPs, the number of parameters, and runtime required to process a single text query on Nvidia RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using specific models and features like ResNet, I3D, RoBERTa, and CLIP-L/14, but does not provide specific version numbers for underlying software libraries, programming languages, or operating systems. |
| Experiment Setup | Yes | While the typical parameters are set the same as in (Wang et al. 2024), the thresholds τ_s and τ_u were defined at each epoch using the distribution values of similarity and uncertainty from the training dataset. Particularly, τ_s is set to the mean value of the similarity distribution of the positive pairs, and τ_u is set to the value corresponding to the mean of uncertainty distribution of train dataset. More details are provided in the appendix. |
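The per-epoch threshold rule in the Experiment Setup row can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code (the paper does not release any). It assumes the similarity scores of positive text-video pairs and the uncertainty scores over the training set have already been collected into plain lists at the end of an epoch. It also reads "the value corresponding to the mean" as the arithmetic mean. The function name `epoch_thresholds` is hypothetical. How the two thresholds are then used to handle ambiguous pairs is not reproduced here.

```python
from statistics import fmean
from typing import Sequence, Tuple


def epoch_thresholds(
    pos_similarities: Sequence[float],
    uncertainties: Sequence[float],
) -> Tuple[float, float]:
    """Hypothetical helper: recompute (tau_s, tau_u) once per epoch.

    Assumption: tau_s is the arithmetic mean of the positive-pair similarity
    distribution, and tau_u is the arithmetic mean of the uncertainty
    distribution over the training set.
    """
    if not pos_similarities or not uncertainties:
        raise ValueError("both score lists must be non-empty")
    tau_s = fmean(pos_similarities)  # mean similarity of positive text-video pairs
    tau_u = fmean(uncertainties)     # mean uncertainty over the training data
    return tau_s, tau_u


if __name__ == "__main__":
    # Illustrative scores standing in for one epoch's statistics (not from the paper).
    sims = [0.62, 0.48, 0.71, 0.55]
    uncs = [0.10, 0.35, 0.20, 0.15]
    tau_s, tau_u = epoch_thresholds(sims, uncs)
    print(f"tau_s={tau_s:.3f}, tau_u={tau_u:.3f}")
```

Both values are recomputed from the current epoch's statistics. As a result, the thresholds follow the model as its similarity and uncertainty estimates shift during training, rather than staying fixed hyperparameters.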
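The Research Type row reports comparisons on recall metrics. The function below is a hedged sketch of how such numbers are commonly computed for text-to-video retrieval. For each text query, it ranks all candidate videos and reports R@K as the percentage of queries whose relevant video lands in the top K, together with their sum (SumR). It assumes a precomputed query-by-video similarity matrix and exactly one relevant video per query. The function name, the default K values (a convention seen in PRVR papers, assumed here), and the optimistic tie handling are illustrative choices, not details taken from this paper.

```python
from typing import Dict, Sequence


def recall_at_k(
    sim: Sequence[Sequence[float]],
    gt: Sequence[int],
    ks: Sequence[int] = (1, 5, 10, 100),
) -> Dict[str, float]:
    """Text-to-video R@K (in %) and SumR.

    sim[q][v]: similarity between text query q and video v.
    gt[q]: index of the single relevant video for query q (assumption).
    """
    if not sim or len(sim) != len(gt):
        raise ValueError("sim and gt must be non-empty and of equal length")
    ranks = []
    for q, row in enumerate(sim):
        target = row[gt[q]]
        # 1-based rank of the relevant video; ties are resolved in its favour.
        ranks.append(1 + sum(1 for s in row if s > target))
    results = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks}
    results["SumR"] = sum(results[f"R@{k}"] for k in ks)
    return results


if __name__ == "__main__":
    # 3 queries x 3 videos; illustrative values only.
    sim = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.7], [0.5, 0.6, 0.4]]
    print(recall_at_k(sim, gt=[0, 2, 2], ks=(1, 2, 3)))
```

In this toy example, the relevant videos rank 1st, 2nd, and 3rd. This gives R@1, R@2, and R@3 of about 33.3, 66.7, and 100, and a SumR of about 200.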