VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention
Authors: Jiangning Wei, Lixiong Qin, Bo Yu, Tianjian Zou, Chuhan Yan, Dandan Xiao, Yang Yu, Lan Yang, Ke Li, Jun Liu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments confirm that VA-AR achieves state-of-the-art performance on the same five datasets, demonstrating VA-AR's effectiveness across a broad spectrum of action recognition scenarios. The authors conducted a comprehensive comparison with 11 skeleton-based action recognition methods across five large datasets, using published results where available and retraining to obtain unpublished ones. Rigorous validations were conducted on both joint data and multi-modality data. |
| Researcher Affiliation | Academia | (1) Beijing University of Posts and Telecommunications, (2) Macau University of Science and Technology, (3) China Institute of Sport Science, (4) Beijing Sport University |
| Pseudocode | No | The paper provides architectural diagrams and mathematical formulas but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: github.com/TrinityNeo99/VA-AR (official) |
| Open Datasets | Yes | We meticulously selected five representative action recognition datasets to comprehensively assess the performance of the proposed method. These datasets comprise NTU RGB+D (NTU-60) (Shahroudy et al. 2016), NTU RGB+D 120 (NTU-120) (Liu et al. 2019), P2A (Bian et al. 2022), Olympic Badminton (Ghosh, Singh, and Jawahar 2018), and FineGym (Shao et al. 2020). |
| Dataset Splits | No | The paper refers to 'X-Sub', 'X-View', and 'X-Set' as the splits used for reporting results in Table 1 for specific datasets; these are standard benchmark protocols. However, the main text does not explicitly provide split percentages, sample counts, or the split methodology. The paper also states 'We mixed the test sets of the five datasets and carefully distinguished them based on action speed', but this refers to an analysis, not the original experimental splits. |
| Hardware Specification | Yes | During the experimental process, we employed two NVIDIA 3090 GPUs for training, which encompassed a total of 60 training epochs. |
| Software Dependencies | No | The paper mentions using a Graph Convolutional Network (GCN) and SGD optimizer but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | We use the Graph Convolutional Network (GCN) as the Spatial Module and configure three STBlocks, employing three different window sizes of 4, 8, and 16. During the experimental process, we employed two NVIDIA 3090 GPUs for training, which encompassed a total of 60 training epochs. For the optimizer, we chose SGD with a momentum of 0.9 and a weight decay of 0.0001, with the batch size set to 32. The maximum temporal lengths for the NTU-60, NTU-120, and FineGym datasets were set to 256, whereas for the P2A and Olympic Badminton datasets they were set to 128. |
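
The reported setup combines three temporal window sizes (4, 8, 16) with maximum sequence lengths of 256 (NTU-60, NTU-120, FineGym) or 128 (P2A, Olympic Badminton). A minimal sketch of how these numbers interact is below; it assumes non-overlapping window partitioning, which the paper does not specify, and the function name `partition_windows` is hypothetical.

```python
def partition_windows(seq_len, window_size):
    """Split a temporal sequence of seq_len frames into
    non-overlapping windows of at most window_size frames.
    Returns a list of (start, end) index pairs."""
    return [(s, min(s + window_size, seq_len))
            for s in range(0, seq_len, window_size)]

# The paper's three window sizes applied to the NTU maximum length (256):
for w in (4, 8, 16):
    windows = partition_windows(256, w)
    print(f"window size {w}: {len(windows)} windows")  # 64, 32, 16 windows
```

Under this assumption, the same frame participates in attention at three temporal granularities, which is consistent with the paper's "Mixture of Window Attention" framing, though the actual windowing scheme may differ.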