Skeleton-based Action Recognition with Non-linear Dependency Modeling and Hilbert-Schmidt Independence Criterion

Authors: Haipeng Chen, Yuheng Yang, Yingda Lyu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, our approach sets the state-of-the-art performance on NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets. ... In this section, we conduct extensive experiments to empirically evaluate the performance of our method on three benchmark action recognition datasets."
Researcher Affiliation | Academia | "Haipeng Chen (1,2), Yuheng Yang (1,2)*, Yingda Lyu (1,3)*. 1: College of Computer Science and Technology, Jilin University; 2: Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University; 3: Public Computer Education and Research Center, Jilin University. EMAIL, EMAIL, EMAIL"
Pseudocode | No | The paper describes its methodology using mathematical equations and textual explanations, but it does not contain any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | "The implementations have been released, hoping to facilitate future research."
Open Datasets | Yes | "We adopt three widely used action recognition benchmark datasets, namely the NTU-RGB+D 60 dataset, the NTU-RGB+D 120 dataset, and the Northwestern UCLA dataset, to evaluate the proposed method. NTU-RGB+D (Shahroudy et al. 2016) is designed for the skeleton-based action recognition task. ... NTU-RGB+D 120 (Liu et al. 2019a) is currently the largest 3D skeleton-based action recognition dataset. ... Northwestern-UCLA (Wang et al. 2014) consists of 1,494 video samples divided into 10 classes."
Dataset Splits | Yes | "NTU-RGB+D: (1) Cross-Subject (X-Sub): data from 20 subjects is used as the training set, while the rest is used as test data. (2) Cross-View (X-View): the training and test sets are divided according to different camera views. NTU-RGB+D 120: (1) Cross-Subject (X-Sub120) divides the 106 volunteers into two groups, with 53 subjects assigned to the training set and the remaining to the test set. (2) Cross-Setup (X-Set120): setups are divided into the training and test sets based on their IDs; samples with even setup IDs are included in the training set, while samples with odd setup IDs go into the test set. Northwestern-UCLA: Following the evaluation protocol (Wang et al. 2014), training samples are obtained from the first two cameras, and the remaining camera is used to capture test samples."
Hardware Specification | Yes | "We conducted our experiments using two NVIDIA GeForce GTX 3090 GPUs."
Software Dependencies | Yes | "Our model was implemented using PyTorch 1.11."
Experiment Setup | Yes | "To train our framework, we employed stochastic gradient descent (SGD) with 0.9 Nesterov momentum. For all datasets, we set the total number of training epochs to 120, with the first 5 epochs dedicated to a warm-up strategy to stabilize the training process. The small Gaussian kernel was set to 1 and the large Gaussian kernel was set to 9. For the NTU-RGB+D and NTU-RGB+D 120 datasets, we set the initial learning rate to 0.1 and applied a decay of 0.1 every 50 epochs. The batch size was set to 128, and the distillation temperature was set to 1.0. For the Northwestern-UCLA dataset, we set the initial learning rate to 0.01, with a decay of 0.1 every 50 epochs. The batch size was set to 32, and the distillation temperature was set to 1.0."
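The X-Set120 protocol reported above partitions samples by the parity of their camera-setup ID. A minimal sketch of that rule, assuming hypothetical sample records with a `setup_id` field (the paper does not specify a data format):

```python
# Hedged sketch of the X-Set120 split described in the report:
# even setup IDs go to the training set, odd setup IDs to the test set.
# The record structure and the function name are illustrative assumptions.

def xset120_split(samples):
    """Partition samples by setup-ID parity (even -> train, odd -> test)."""
    train = [s for s in samples if s["setup_id"] % 2 == 0]
    test = [s for s in samples if s["setup_id"] % 2 != 0]
    return train, test

# NTU RGB+D 120 uses 32 camera setups, so a toy run over IDs 1..32
# yields 16 training setups and 16 test setups.
train_set, test_set = xset120_split([{"setup_id": i} for i in range(1, 33)])
```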
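The quoted setup combines a 5-epoch warm-up with step decay of 0.1 every 50 epochs over 120 epochs. A sketch of that learning-rate schedule for the NTU-RGB+D configuration (base LR 0.1), assuming a linear warm-up ramp, which the paper does not specify:

```python
# Hedged sketch of the reported LR schedule: base LR 0.1, 5 warm-up epochs,
# then multiply by 0.1 every 50 epochs, for 120 epochs total.
# The linear warm-up shape is an assumption; the exact ramp is not given.

def learning_rate(epoch, base_lr=0.1, warmup_epochs=5,
                  decay_step=50, decay_rate=0.1):
    """Return the learning rate for a given epoch under warm-up + step decay."""
    if epoch < warmup_epochs:
        # Linear ramp from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Step decay: scale by decay_rate once per completed decay_step epochs.
    return base_lr * (decay_rate ** (epoch // decay_step))

schedule = [learning_rate(e) for e in range(120)]
```

In PyTorch this would typically be realized with `torch.optim.SGD(..., momentum=0.9, nesterov=True)` plus a scheduler such as `LambdaLR`; the pure-Python function above only illustrates the resulting LR values.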