TEST-V: TEst-time Support-set Tuning for Zero-shot Video Classification
Authors: Rui Yan, Jin Wang, Hongyu Qu, Xiaoyu Du, Dong Zhang, Jinhui Tang, Tieniu Tan
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | TEST-V achieves state-of-the-art results across four benchmarks and shows good interpretability. Extensive experimental results show that TEST-V improves the ... by 2.98%, 2.15%, and 1.83% absolute average accuracy respectively across four benchmarks. We evaluate the effectiveness of the proposed method on four popular video benchmarks, i.e., HMDB-51 [Kuehne et al., 2011], UCF-101 [Soomro et al., 2012], Kinetics-600 [Carreira et al., 2018], ActivityNet [Fabian Caba Heilbron and Niebles, 2015]. |
| Researcher Affiliation | Academia | Rui Yan1,2, Jin Wang1, Hongyu Qu1, Xiaoyu Du1, Dong Zhang3, Jinhui Tang1 and Tieniu Tan2. 1Nanjing University of Science and Technology; 2Nanjing University; 3Hong Kong University of Science and Technology |
| Pseudocode | No | The paper describes the methodology using prose and diagrams (Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is being released or provide a link to a repository. There is an arXiv link to an extended version of the paper, but no direct code link. |
| Open Datasets | Yes | We evaluate the effectiveness of the proposed method on four popular video benchmarks, i.e., HMDB-51 [Kuehne et al., 2011], UCF-101 [Soomro et al., 2012], Kinetics-600 [Carreira et al., 2018], ActivityNet [Fabian Caba Heilbron and Niebles, 2015]. |
| Dataset Splits | Yes | We evaluate the effectiveness of the proposed method on four popular video benchmarks, i.e., HMDB-51 [Kuehne et al., 2011], UCF-101 [Soomro et al., 2012], Kinetics-600 [Carreira et al., 2018], ActivityNet [Fabian Caba Heilbron and Niebles, 2015]. These are well-known benchmark datasets with standard, predefined splits for evaluation. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions pre-trained Vision-Language Models (VLMs) like CLIP [Radford et al., 2021], BIKE [Wu et al., 2023], ViFi-CLIP [Rasheed et al., 2023], LLMs (ChatGPT [OpenAI, 2023], Gemini [Team et al., 2024], Llama-3 [AI@Meta, 2024], Claude-3 [Anthropic, 2024]), and Text-to-Video models (LaVie [Wang et al., 2023b], Show-1 [Zhang et al., 2023a], HiGen [Qing et al., 2024], TF-T2V [Wang et al., 2024], ModelScope T2V [Wang et al., 2023a]), but does not provide specific version numbers for these software components or for other ancillary software dependencies required for reproduction. |
| Experiment Setup | No | The paper describes the methodology and its components (MSD, TSE), and ablates parameters such as 'n' (the number of repeatedly generated videos) for support-set construction and different sampling strategies for multi-scale temporal tuning. However, it does not report the concrete training configuration, e.g., learning rates, batch sizes, optimizers, or other hyperparameters used for the fine-tuning and optimization processes. |
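To illustrate the general setup the table refers to (zero-shot video classification refined by a generated support set), here is a minimal, hedged sketch. It is not the paper's TEST-V algorithm: the functions `zero_shot_classify` and `refine_prototypes`, the blending weight `alpha`, and the simple mean-pooling of support features are illustrative assumptions, standing in for the paper's MSD/TSE components, which are not fully specified in the text.

```python
import numpy as np

def zero_shot_classify(video_feats, class_protos):
    """Cosine-similarity zero-shot classification.

    video_feats: (num_videos, dim) video embeddings.
    class_protos: (num_classes, dim) class prototype embeddings.
    Returns the index of the best-matching class per video.
    """
    v = video_feats / np.linalg.norm(video_feats, axis=1, keepdims=True)
    c = class_protos / np.linalg.norm(class_protos, axis=1, keepdims=True)
    return (v @ c.T).argmax(axis=1)

def refine_prototypes(text_protos, support_feats, alpha=0.5):
    """Blend text-derived prototypes with mean support-set features.

    text_protos:   (num_classes, dim) prototypes from class-name text.
    support_feats: (num_classes, n_support, dim) features of generated
                   support videos for each class (n == n_support here).
    alpha is a hypothetical mixing weight, not taken from the paper.
    """
    support_mean = support_feats.mean(axis=1)
    return alpha * text_protos + (1.0 - alpha) * support_mean
```

In practice the embeddings would come from a frozen VLM such as CLIP, and the support videos from a text-to-video generator; the sketch only shows how a support set can sharpen class prototypes before nearest-prototype classification.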