SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
Authors: Yongle Huang, Haodong Chen, Zhenbang Xu, Zihan Jia, Haozhou Sun, Dian Shao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that SeFAR achieves state-of-the-art performance on two FAR datasets, FineGym and FineDiving, across various data scopes, as well as two classical coarse-grained datasets, UCF101 and HMDB51. Further analysis and ablation studies validate the effectiveness of our designs. |
| Researcher Affiliation | Academia | 1Unmanned System Research Institute, Northwestern Polytechnical University, Xi'an, China 2School of Automation, Northwestern Polytechnical University, Xi'an, China 3School of Computer Science, Northwestern Polytechnical University, Xi'an, China 4School of Software, Northwestern Polytechnical University, Xi'an, China EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology in narrative text and mathematical formulas without presenting a distinct pseudocode block or algorithm. |
| Open Source Code | Yes | Code: https://github.com/KyleHuang9/SeFAR |
| Open Datasets | Yes | We perform evaluations on fine-grained datasets Gym99, Gym288 (Shao et al. 2020), and FineDiving (Xu et al. 2022a), as well as coarse-grained datasets UCF-101 (Soomro 2012) and HMDB-51 (Kuehne et al. 2011), using Top-1 accuracy as metrics. Additionally, we use the Something-Something V2 (Sth.Sth.) dataset (Goyal et al. 2017) in ablation studies. |
| Dataset Splits | Yes | The labeling rates of the data are indicated by 5%, 10%, and 20% in the datasets. (Table 1 caption) |
| Hardware Specification | No | The paper does not provide specific hardware details (GPU models, CPU types, etc.) used for running its experiments. |
| Software Dependencies | No | The paper mentions various models and frameworks like ViT, TimeSformer, FixMatch, Vicuna-7B, CLIP-ViT, and EVA-CLIP, but does not provide specific version numbers for any underlying software libraries or programming languages. |
| Experiment Setup | Yes | We employ the ViT (Dosovitskiy 2020) extended model TimeSformer (Bertasius, Wang, and Torresani 2021) as the backbone. We instantiate the SeFAR-S model based on ViT-S... We configure the sampling combination by default as {2, 2, 4} for SeFAR, matching the commonly used 8-frame input. Tables 1 and 2 also specify the number of input frames (#F = 8) and epochs (30) for SeFAR. |
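The {2, 2, 4} sampling combination in the experiment-setup row can be read as drawing frames at three temporal granularities that together yield the 8-frame input (2 + 2 + 4 = 8). A minimal sketch of one such segment-wise sampler follows; the function name and index arithmetic are illustrative assumptions, not taken from the paper:

```python
# Hedged sketch: a segment-wise reading of a {2, 2, 4} sampling
# combination. The clip is split into as many segments as there are
# groups, and groups[i] frames are drawn uniformly from segment i,
# giving 2 + 2 + 4 = 8 frame indices in total. This is an assumed
# interpretation for illustration, not SeFAR's exact sampler.

def sample_frame_indices(num_frames: int, groups=(2, 2, 4)) -> list[int]:
    """Split [0, num_frames) into len(groups) equal segments and draw
    groups[i] evenly spaced frame indices from segment i."""
    indices = []
    seg_len = num_frames / len(groups)
    for i, k in enumerate(groups):
        start = i * seg_len
        step = seg_len / k
        # Deterministic variant: pick the midpoint of each sub-interval.
        indices.extend(int(start + step * (j + 0.5)) for j in range(k))
    return indices
```

For a 64-frame clip this returns 8 indices, with the last segment sampled twice as densely as the first two, which matches the intuition of mixing coarser and finer temporal views in one input.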