Active Multimodal Distillation for Few-shot Action Recognition

Authors: Weijia Feng, Yichen Zhu, Ruojia Zhang, Chenyang Wang, Fei Ma, Xiaobao Wang, Xiaobai Li

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments across multiple benchmarks demonstrate that our method significantly outperforms existing approaches." Relevant section headings: 4 Experiments; 4.1 Validation Protocol (Datasets. We assess our proposed method on four prominent and challenging benchmarks for few-shot action recognition: Kinetics-400 [Kay et al., 2017], Something-Something V2 [Goyal et al., 2017], HMDB51 [Wang et al., 2015], and UCF101 [Peng et al., 2018]); 4.3 Comparative Experiments; 4.4 Ablation Study.
Researcher Affiliation | Academia | (1) College of Computer and Information Engineering, Tianjin Normal University, Tianjin, China; (2) College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; (3) Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China; (4) College of Intelligence and Computing, Tianjin University, Tianjin, China; (5) The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China; (6) Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou, China
Pseudocode | No | The paper describes methods through narrative text and mathematical equations, but does not include a distinct section or figure labeled 'Pseudocode' or 'Algorithm', nor does it present any structured algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about making the source code available, nor does it provide a link to a code repository for the described methodology.
Open Datasets | Yes | Datasets. We assess our proposed method on four prominent and challenging benchmarks for few-shot action recognition: Kinetics-400 [Kay et al., 2017], Something-Something V2 [Goyal et al., 2017], HMDB51 [Wang et al., 2015], and UCF101 [Peng et al., 2018].
Dataset Splits | No | In the meta-training phase, we utilize a multimodal video dataset D_train that encompasses base action classes C_train. ... In the meta-test phase, we employ a multimodal dataset D_test, which includes novel action classes C_test that are disjoint from the training classes (C_test ∩ C_train = ∅). Similar to the meta-training phase, the support and query sets for each test task are constructed in the same manner. ... Within the N-way K-shot meta-learning setting, the query set Q = {(x_i^r, x_i^f, y_i)}_{i=1}^{M} includes M multimodal query samples. ... The support set S = {(x_i^r, x_i^f, y_i)}_{i=M+1}^{M+NK} contains K multimodal samples for each of the N classes.
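The N-way K-shot episodic protocol quoted above can be made concrete with a small sampling sketch. Everything below is illustrative and not from the paper: the function name `sample_episode`, the class-indexed dictionary layout, and the use of strings to stand in for (RGB, optical-flow) sample pairs are all assumptions.

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, m_query=5, seed=None):
    """Build one N-way K-shot episode from a class -> samples mapping.

    `dataset` maps each class label to a list of multimodal samples
    (each sample standing in for an (RGB, flow) pair). Returns a
    support set with K samples per class and a query set with
    `m_query` samples per class, drawn without overlap.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)  # N novel/base classes
    support, query = [], []
    for label in classes:
        # sample K + m_query distinct clips, then split support/query
        picks = rng.sample(dataset[label], k_shot + m_query)
        support += [(clip, label) for clip in picks[:k_shot]]
        query += [(clip, label) for clip in picks[k_shot:]]
    return support, query

# toy "dataset": 10 classes, 8 multimodal clips each (strings as stand-ins)
toy = {c: [f"clip_{c}_{i}" for i in range(8)] for c in range(10)}
S, Q = sample_episode(toy, n_way=5, k_shot=1, m_query=3, seed=0)
```

Note that the excerpt defines Q as M query samples in total, whereas this sketch draws a fixed number per class; the two views coincide when M = N * m_query.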
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as exact GPU or CPU models.
Software Dependencies | No | The paper mentions using the SGD optimizer and specific pre-trained models (ResNet-50, I3D) but does not provide specific version numbers for any software dependencies, such as deep learning frameworks or programming languages.
Experiment Setup | Yes | Parameters. In the meta-training phase, the balance weight (λ, specified in Eq. 9) is uniformly set to 1.0 across all benchmarks. Training is conducted using the SGD optimizer. For both RGB and optical flow modalities, the respective networks are iteratively updated by minimizing a combined weighted loss function, which includes both cross-entropy and distillation losses, until convergence is achieved. The learning rate γ is set to 10^-3.
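The quoted setup describes a combined objective of the form L = L_CE + λ * L_distill with λ = 1.0. The excerpt does not give the exact distillation term, so the sketch below assumes a common KL-divergence form between teacher and student class distributions; all function names are illustrative, not the paper's.

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # standard cross-entropy for a single sample
    return -math.log(softmax(logits)[label])

def kl_distill(student_logits, teacher_logits):
    # KL(teacher || student): an assumed form of the distillation loss
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def combined_loss(student_logits, label, teacher_logits, lam=1.0):
    # L = L_CE + lambda * L_distill, with lambda = 1.0 as in the excerpt
    return cross_entropy(student_logits, label) + lam * kl_distill(
        student_logits, teacher_logits)
```

In the paper's setting this scalar loss would be minimized per modality (RGB and optical flow) with SGD at learning rate 10^-3; the sketch only shows how the two loss terms are weighted and combined.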