SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization

Authors: Yongle Huang, Haodong Chen, Zhenbang Xu, Zihan Jia, Haozhou Sun, Dian Shao

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that SeFAR achieves state-of-the-art performance on two FAR datasets, FineGym and FineDiving, across various data scopes, as well as two classical coarse-grained datasets, UCF101 and HMDB51. Further analysis and ablation studies validate the effectiveness of our designs.
Researcher Affiliation | Academia | 1 Unmanned System Research Institute, Northwestern Polytechnical University, Xi'an, China; 2 School of Automation, Northwestern Polytechnical University, Xi'an, China; 3 School of Computer Science, Northwestern Polytechnical University, Xi'an, China; 4 School of Software, Northwestern Polytechnical University, Xi'an, China; EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology in narrative text and mathematical formulas without presenting a distinct pseudocode block or algorithm.
Open Source Code | Yes | Code: https://github.com/KyleHuang9/SeFAR
Open Datasets | Yes | We perform evaluations on fine-grained datasets Gym99, Gym288 (Shao et al. 2020), and FineDiving (Xu et al. 2022a), as well as coarse-grained datasets UCF-101 (Soomro 2012) and HMDB-51 (Kuehne et al. 2011), using Top-1 accuracy as metrics. Additionally, we use the Something-Something V2 (Sth.Sth.) dataset (Goyal et al. 2017) in ablation studies.
Dataset Splits | Yes | The labeling rates of the data are indicated by 5%, 10%, and 20% in the datasets. (Table 1 caption)
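As context for the labeling rates above, here is a minimal sketch of how a labeled/unlabeled split at a given rate could be carved out for semi-supervised training. The function name, the purely random (non-class-balanced) selection, and the fixed seed are assumptions for illustration; the paper's exact split protocol may differ.

```python
import random

def split_by_label_rate(samples, label_rate, seed=0):
    """Split samples into labeled/unlabeled subsets; `label_rate` is
    e.g. 0.05, 0.10, or 0.20, matching the 5%/10%/20% settings."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_labeled = int(len(shuffled) * label_rate)
    return shuffled[:n_labeled], shuffled[n_labeled:]

# e.g. a 10% labeling rate over 100 samples
labeled, unlabeled = split_by_label_rate(range(100), 0.10)
print(len(labeled), len(unlabeled))  # 10 90
```

In practice, semi-supervised FAR benchmarks typically fix such splits once and reuse them across methods so that accuracy numbers are comparable.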
Hardware Specification | No | The paper does not provide specific hardware details (GPU models, CPU types, etc.) used for running its experiments.
Software Dependencies | No | The paper mentions various models and frameworks like ViT, TimeSformer, FixMatch, Vicuna-7B, CLIP-ViT, and EVA-CLIP, but does not provide specific version numbers for any underlying software libraries or programming languages.
Experiment Setup | Yes | We employ the ViT (Dosovitskiy 2020) extended model TimeSformer (Bertasius, Wang, and Torresani 2021) as the backbone. We instantiate the SeFAR-S model based on ViT-S... We configure the sampling combination by default as {2, 2, 4} for SeFAR, matching the commonly used 8-frame input. Tables 1 and 2 also specify the number of input frames (#F = 8) and training epochs (Epoch = 30) for SeFAR.
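To illustrate the {2, 2, 4} sampling combination, here is a sketch of one plausible reading: the clip is divided into three consecutive equal-length segments, and 2, 2, and 4 frames are drawn evenly spaced from them, giving the 8-frame input (2 + 2 + 4 = 8). The segment layout and the even-spacing rule are assumptions for illustration, not the paper's exact sampler.

```python
import numpy as np

def sample_frame_indices(num_frames, counts=(2, 2, 4)):
    """Draw sum(counts) frame indices: counts[i] evenly spaced frames
    from the i-th of len(counts) equal-length consecutive segments."""
    seg_len = num_frames / len(counts)
    indices = []
    for i, n in enumerate(counts):
        start, end = i * seg_len, (i + 1) * seg_len - 1
        # evenly spaced positions inside this segment, rounded to ints
        indices.extend(np.linspace(start, end, n).round().astype(int).tolist())
    return indices

# a 60-frame clip sampled with the default {2, 2, 4} combination
print(sample_frame_indices(60))  # [0, 19, 20, 39, 40, 46, 53, 59]
```

Note how the denser final group (4 frames) covers its segment at a finer temporal granularity than the two sparser groups, which is the intuition behind mixing sampling densities.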