Active Multimodal Distillation for Few-shot Action Recognition
Authors: Weijia Feng, Yichen Zhu, Ruojia Zhang, Chenyang Wang, Fei Ma, Xiaobao Wang, Xiaobai Li
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments across multiple benchmarks demonstrate that our method significantly outperforms existing approaches." Section 4 (Experiments) includes 4.1 Validation Protocol ("Datasets. We assess our proposed method on four prominent and challenging benchmarks for few-shot action recognition: Kinetics-400 [Kay et al., 2017], Something-Something V2 [Goyal et al., 2017], HMDB51 [Wang et al., 2015], and UCF101 [Peng et al., 2018]."), 4.3 Comparative Experiments, and 4.4 Ablation Study. |
| Researcher Affiliation | Academia | 1College of Computer and Information Engineering, Tianjin Normal University, Tianjin, China 2College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China 3Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China 4College of Intelligence and Computing, Tianjin University, Tianjin, China 5The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China 6Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou, China EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods through narrative text and mathematical equations, but does not include a distinct section or figure labeled 'Pseudocode' or 'Algorithm', nor does it present any structured algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code available, nor does it provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | Datasets. We assess our proposed method on four prominent and challenging benchmarks for few-shot action recognition: Kinetics-400 [Kay et al., 2017], Something-Something V2 [Goyal et al., 2017], HMDB51 [Wang et al., 2015], and UCF101 [Peng et al., 2018]. |
| Dataset Splits | No | In the meta-training phase, we utilize a multimodal video dataset D_train that encompasses base action classes C_train. ... In the meta-test phase, we employ a multimodal dataset D_test, which includes novel action classes C_test that are disjoint from the training classes (C_test ∩ C_train = ∅). Similar to the meta-training phase, the support and query sets for each test task are constructed in the same manner. ... Within the N-way K-shot meta-learning setting, the query set Q = {(x^r_i, x^f_i, y_i)}_{i=1}^{M} includes M multimodal query samples. ... The support set S = {(x^r_i, x^f_i, y_i)}_{i=M+1}^{M+NK} contains K multimodal samples for each of the N classes. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as exact GPU or CPU models. |
| Software Dependencies | No | The paper mentions using the SGD optimizer and specific pre-trained models (ResNet-50, I3D) but does not provide specific version numbers for any software dependencies, such as deep learning frameworks or programming languages. |
| Experiment Setup | Yes | Parameters. In the meta-training phase, the balance weight (λ, specified in Eq. 9) is uniformly set to 1.0 across all benchmarks. Training is conducted using the SGD optimizer. For both RGB and optical flow modalities, the respective networks are iteratively updated by minimizing a combined weighted loss function, which includes both cross-entropy and distillation losses, until convergence is achieved. The learning rate γ is set to 10^-3. |
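The episodic setup quoted under "Dataset Splits" (an N-way K-shot task with a support set S of K samples per class and a query set Q of M samples, each sample being an RGB/optical-flow pair) can be sketched as below. This is a minimal illustration, not the authors' code: the `sample_episode` helper, the `(rgb, flow)` placeholder tuples, and the per-class query count `q_per_class` are all assumptions made for clarity.

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, q_per_class=1, seed=None):
    """Sample one N-way K-shot episode from a class -> samples mapping.

    dataset: dict mapping class label -> list of (rgb_clip, flow_clip)
    pairs (placeholders here for actual video tensors).
    Returns (support, query): lists of (rgb, flow, label) tuples,
    mirroring the paper's S and Q set definitions.
    """
    rng = random.Random(seed)
    # Pick N novel classes for this episode.
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label in classes:
        # Draw K support + q_per_class query samples without replacement,
        # so the support and query sets are disjoint within each class.
        picks = rng.sample(dataset[label], k_shot + q_per_class)
        for rgb, flow in picks[:k_shot]:
            support.append((rgb, flow, label))
        for rgb, flow in picks[k_shot:]:
            query.append((rgb, flow, label))
    return support, query

# Toy dataset: 10 classes, 6 multimodal samples each.
dataset = {
    f"class{c}": [(f"rgb{c}_{i}", f"flow{c}_{i}") for i in range(6)]
    for c in range(10)
}
support, query = sample_episode(dataset, n_way=5, k_shot=1, seed=0)
```

With `n_way=5, k_shot=1, q_per_class=1` this yields |S| = NK = 5 and |Q| = M = 5, matching the 5-way 1-shot protocol common to the four quoted benchmarks.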