Kronecker Mask and Interpretive Prompts are Language-Action Video Learners
Authors: Jingyi Yang, Zitong Yu, Xiuming Ni, Jia He, Hui Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various benchmarks and learning scenarios demonstrate the superiority and generality of our approach. Code is available at https://github.com/yjyddq/CLAVER. 1 INTRODUCTION ... Extensive qualitative and quantitative experiments demonstrate the effectiveness of CLAVER. Our method achieves superior or competitive performance on Kinetics-400 and Kinetics-600 under the fully-supervised scenario, and on HMDB-51 and UCF-101 under zero-shot and few-shot scenarios. ... 4 EXPERIMENTS 4.1 IMPLEMENTATION DETAILS 4.2 COMPARISON RESULTS 4.3 ABLATION STUDY |
| Researcher Affiliation | Collaboration | Jingyi Yang^1, Zitong Yu^2,3, Xiuming Ni^4, Jia He^4, Hui Li^1. ^1University of Science and Technology of China, ^2Great Bay University, ^3Dongguan Key Laboratory for Intelligence and Information Technology, ^4Anhui Tsinglink Information Technology Co., Ltd. EMAIL, EMAIL, EMAIL, mythlee@.ustc.edu.cn |
| Pseudocode | No | The paper describes methods and mathematical formulations in text and equations (e.g., in Section 3.2 Kronecker Mask Attention), but it does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Code is available at https://github.com/yjyddq/CLAVER. |
| Open Datasets | Yes | We evaluate the performance of our method on four benchmarks: Kinetics-400 Kay et al. (2017), Kinetics-600 Carreira et al. (2018), UCF-101 Soomro et al. (2012), HMDB-51 Kuehne et al. (2011). |
| Dataset Splits | Yes | Few-shot experiment setting. We randomly sample 2, 4, 8, and 16 videos from each class on UCF-101 and HMDB-51 to construct the training set. For evaluation, we use the first split of the test set on UCF-101 and HMDB-51. Zero-shot experiments. ... 1) Evaluation for HMDB-51 and UCF-101. Following, the prediction is conducted on the three splits of the test data, and we report the average top-1 accuracy and standard deviation. 2) Evaluation for Kinetics-600. Following, the 220 new categories outside Kinetics-400 in Kinetics-600 are used for evaluation. The evaluation is conducted three times. For each iteration, we randomly sample 160 categories for evaluation from the 220 categories in Kinetics-600. |
| Hardware Specification | Yes | The experiments are conducted on 8 NVIDIA 80G A100 GPUs. |
| Software Dependencies | No | The paper mentions several components, such as CLIP-B/32, ViT, the KMT/KMCT transformer, and the AdamW optimizer, but it does not specify version numbers for any general software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Architectures and hyperparameters. ... The detailed hyperparameter settings are provided in Appendix D. ... Table 14 (training hyperparameters; Fully-sup / Few-shot / Zero-shot): Optimizer: AdamW; Base learning rate: 12e-6 / 2e-6 / 12e-6; Minimal learning rate: 12e-8 / 2e-8 / 12e-8; Weight decay: 0.001; Optimizer betas: β1, β2 = 0.9, 0.98; Batch size: 128 (ViT-B), 32 (ViT-L); Learning rate schedule: cosine decay; Warmup epochs: 5; Training epochs: 60 (ViT-B) / 40 (ViT-L), 80 (20 on K400), 0 (20 on K400); Augmentation: Random Flip, Multi-Scale Crop, Color Jitter, Gray Scale, label smoothing, Mixup, CutMix |
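The schedule reported in Table 14 (cosine decay from the base to the minimal learning rate, with 5 warmup epochs) can be sketched as a small standalone function. The per-epoch granularity and linear warmup shape are assumptions; only the numeric values (12e-6, 12e-8, 5 warmup epochs, 60 epochs for ViT-B) come from the table.

```python
import math

def lr_at_epoch(epoch, base_lr=12e-6, min_lr=12e-8,
                warmup_epochs=5, total_epochs=60):
    """Cosine-decay learning-rate schedule with linear warmup.

    Defaults mirror the fully-supervised ViT-B settings in Table 14;
    the exact warmup/decay granularity is an assumption, not stated
    in the paper.
    """
    if epoch < warmup_epochs:
        # Linear warmup up to base_lr over the first warmup_epochs epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In a PyTorch training loop this would typically be delegated to a built-in scheduler such as `CosineAnnealingLR` combined with a warmup phase; the function above just makes the reported numbers concrete.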
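The few-shot protocol (randomly sampling 2, 4, 8, or 16 videos per class on UCF-101 and HMDB-51 to build the training set) can be sketched as follows. The function name, the `video_labels` mapping, and the fixed seed are hypothetical conveniences, not part of the paper.

```python
import random
from collections import defaultdict

def sample_few_shot(video_labels, k, seed=0):
    """Draw k videos per class to form a k-shot training set.

    `video_labels` maps video id -> class label. Sorting before sampling
    makes the draw reproducible for a given seed; the paper does not
    specify its sampling procedure beyond "randomly sample".
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for vid, label in video_labels.items():
        by_class[label].append(vid)
    support = []
    for label in sorted(by_class):
        # Sample without replacement within each class.
        support.extend(rng.sample(sorted(by_class[label]), k))
    return support
```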
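The Kinetics-600 zero-shot protocol (three evaluation runs, each over 160 categories sampled from the 220 categories outside Kinetics-400, with mean and standard deviation reported) can be sketched as below. Function names and the seeding scheme are assumptions for illustration.

```python
import random
import statistics

def zero_shot_splits(categories, n_sampled=160, n_runs=3, seed=0):
    """Sample n_sampled categories from the pool for each evaluation run,
    mirroring the three-run Kinetics-600 zero-shot setting."""
    rng = random.Random(seed)
    return [sorted(rng.sample(categories, n_sampled)) for _ in range(n_runs)]

def summarize(accuracies):
    """Aggregate per-run top-1 accuracies into mean and standard deviation."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```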