Kronecker Mask and Interpretive Prompts are Language-Action Video Learners

Authors: Jingyi Yang, Zitong Yu, Xiuming Ni, Jia He, Hui Li

ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on various benchmarks and learning scenarios demonstrate the superiority and generality of our approach. Code is available at https://github.com/yjyddq/CLAVER. 1 INTRODUCTION ... Extensive qualitative and quantitative experiments demonstrate the effectiveness of CLAVER. Our method achieves superior or competitive performance on Kinetics-400 and Kinetics-600 under the fully-supervised scenario, and on HMDB-51 and UCF-101 under zero-shot and few-shot scenarios. ... 4 EXPERIMENTS 4.1 IMPLEMENTATION DETAILS 4.2 COMPARISON RESULTS 4.3 ABLATION STUDY
Researcher Affiliation Collaboration Jingyi Yang1, Zitong Yu2,3, Xiuming Ni4, Jia He4, Hui Li1. 1University of Science and Technology of China, 2Great Bay University, 3Dongguan Key Laboratory for Intelligence and Information Technology, 4Anhui Tsinglink Information Technology Co., Ltd. EMAIL, EMAIL, EMAIL, mythlee@ustc.edu.cn
Pseudocode No The paper describes methods and mathematical formulations in text and equations (e.g., in Section 3.2 Kronecker Mask Attention), but it does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code Yes Code is available at https://github.com/yjyddq/CLAVER.
Open Datasets Yes We evaluate the performance of our method on four benchmarks: Kinetics-400 Kay et al. (2017), Kinetics-600 Carreira et al. (2018), UCF-101 Soomro et al. (2012), HMDB-51 Kuehne et al. (2011).
Dataset Splits Yes Few-shot experiment setting. We randomly sample 2, 4, 8, and 16 videos from each class on UCF-101 and HMDB-51 to construct the training set. For evaluation, we use the first split of the test set on UCF-101 and HMDB-51. Zero-shot experiments. ... 1) Evaluation for HMDB-51 and UCF-101. Following prior work, prediction is conducted on the three splits of the test data, and we report the average top-1 accuracy and standard deviation. 2) Evaluation for Kinetics-600. Following prior work, the 220 new categories outside Kinetics-400 in Kinetics-600 are used for evaluation. The evaluation is conducted three times; for each iteration, we randomly sample 160 categories for evaluation from the 220 categories in Kinetics-600.
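The per-class sampling for the few-shot split can be sketched as follows. This is a minimal illustration, not the authors' code; the `video_labels` mapping and function name are hypothetical, and the paper does not specify the random seed or sampling library used.

```python
import random
from collections import defaultdict

def sample_few_shot(video_labels, k, seed=0):
    """Build a k-shot training set: randomly pick k videos per class.

    video_labels: dict mapping video id -> class label (hypothetical format).
    Returns a list of sampled video ids.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for vid, label in video_labels.items():
        by_class[label].append(vid)
    train = []
    for label, vids in sorted(by_class.items()):
        # Guard against classes with fewer than k videos.
        train.extend(rng.sample(vids, min(k, len(vids))))
    return train
```

Running this with k = 2, 4, 8, or 16 mirrors the four few-shot settings reported for UCF-101 and HMDB-51.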
Hardware Specification Yes The experiments are conducted on 8 NVIDIA 80G A100 GPUs.
Software Dependencies No The paper mentions several components such as CLIP-B/32, ViT, the KMT/KMCT transformer, and the AdamW optimizer, but it does not specify version numbers for any general software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup Yes Architectures and hyperparameters. ... The detailed hyperparameter settings are provided in Appendix D. ... Table 14: Training hyperparameter settings of the experiments.

Config                  | Fully-sup                 | Few-shot       | Zero-shot
Optimizer               | AdamW (all settings)
Base learning rate      | 12e-6                     | 2e-6           | 12e-6
Minimal learning rate   | 12e-8                     | 2e-8           | 12e-8
Weight decay            | 0.001 (all settings)
Optimizer betas         | β1 = 0.9, β2 = 0.98 (all settings)
Batch size              | 128 (ViT-B), 32 (ViT-L) (all settings)
Learning rate schedule  | Cosine decay (all settings)
Warmup epochs           | 5 (all settings)
Training epochs         | 60 (ViT-B), 40 (ViT-L)    | 80 (20 on K400) | 0 (20 on K400)
Augmentation            | Random Flip, Multi Scale Crop, Color Jitter, Gray Scale, Label smoothing, Mixup, Cutmix (all settings)
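The reported schedule (cosine decay from the base to the minimal learning rate, with 5 warmup epochs) can be sketched as below. This is an assumption-laden illustration using the fully-supervised ViT-B numbers from Table 14; the paper does not state whether warmup is linear or how the decay is discretized, so both choices here are guesses.

```python
import math

def lr_at_epoch(epoch, total_epochs=60, warmup_epochs=5,
                base_lr=12e-6, min_lr=12e-8):
    """Cosine-decay learning rate with linear warmup (assumed form).

    Uses the fully-supervised ViT-B settings from Table 14:
    base LR 12e-6, minimal LR 12e-8, 5 warmup epochs, 60 epochs total.
    """
    if epoch < warmup_epochs:
        # Linear ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Swapping in the few-shot values (base 2e-6, min 2e-8, 80 epochs) reproduces the other column of the table.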