Optimizing Human Pose Estimation Through Focused Human and Joint Regions
Authors: Yingying Jiao, Zhigang Wang, Zhenguang Liu, Shaojing Fan, Sifan Wu, Zheqi Wu, Zhuoyue Xu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our method achieves state-of-the-art performance on three large-scale benchmark datasets. A remarkable highlight is that our method achieves an 84.8 mean Average Precision (mAP) on the challenging wrist joint, which significantly outperforms the 81.5 mAP achieved by the current state-of-the-art method on the PoseTrack2017 dataset. To evaluate the efficacy of our method, we conduct extensive experiments on three public benchmarks, achieving state-of-the-art performance. |
| Researcher Affiliation | Academia | Yingying Jiao (1, 2), Zhigang Wang (3)\*, Zhenguang Liu (4, 5)\*, Shaojing Fan (6), Sifan Wu (1, 2)\*, Zheqi Wu (3), Zhuoyue Xu (3). Affiliations: (1) College of Computer Science and Technology, Jilin University; (2) Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University; (3) College of Computer Science and Technology, Zhejiang Gongshang University; (4) The State Key Laboratory of Blockchain and Data Security, Zhejiang University; (5) Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; (6) School of Computing, National University of Singapore. |
| Pseudocode | No | The paper includes mathematical formulations (equations 1-4) and pipeline diagrams (Figure 1, Figure 2) but no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Datasets. PoseTrack has become a crucial dataset in video-based human pose estimation benchmarks. PoseTrack2017 (Iqbal, Milan, and Gall 2017) introduces 250 training videos and 50 validation videos, with 80,144 pose annotations across 15 keypoints. PoseTrack2018 (Andriluka et al. 2018) expands to 593 training and 170 validation videos, totaling 153,615 annotations. PoseTrack2021 (Doering et al. 2022) further enriches the dataset, particularly improving the representation of smaller figures and crowded scenes, reaching 177,164 pose annotations, with recalibrated joint visibility flags to better address occlusions. ... pre-trained on the COCO dataset (Lin et al. 2014) |
| Dataset Splits | Yes | PoseTrack2017 (Iqbal, Milan, and Gall 2017) introduces 250 training videos and 50 validation videos, with 80,144 pose annotations across 15 keypoints. PoseTrack2018 (Andriluka et al. 2018) expands to 593 training and 170 validation videos, totaling 153,615 annotations. ... The number of input frames is set to 3, consisting of one key frame accompanied by two auxiliary frames sourced from preceding and succeeding neighbors, respectively. |
| Hardware Specification | Yes | Our model is trained on a single RTX 4090 GPU for 20 epochs with the backbone frozen. |
| Software Dependencies | No | Our VREMD framework is realized utilizing PyTorch. The paper mentions PyTorch but does not specify a version number. |
| Experiment Setup | Yes | The input image size is fixed at 256 × 192. We integrate a series of data augmentation techniques, consistent with methodologies employed in previous works (Bertasius et al. 2019; Liu et al. 2021), comprising random rotation [-45°, 45°], random scale [0.65, 1.35], truncation (half body), and flipping during training. The number of input frames is set to 3, consisting of one key frame accompanied by two auxiliary frames sourced from preceding and succeeding neighbors, respectively. This configuration mirrors that of DCPose (Liu et al. 2021), rather than employing the five-frame input as seen in TDMI (Feng et al. 2023) and FAMI-Pose (Liu et al. 2022a). Our model is trained on a single RTX 4090 GPU for 20 epochs with the backbone frozen. We utilize the AdamW optimizer with an initial learning rate of 2e-3, which is then reduced by a factor of ten at the 16th epoch. |
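
Since no source code is reported as released, the quoted experiment setup can be restated as a small configuration sketch. The following is an illustration only, not the authors' implementation: the names `TrainConfig`, `learning_rate`, `frame_triplet`, and `sample_augmentation` are hypothetical, and several details are assumptions the summary does not state (1-indexed epochs, auxiliary frames taken at distance 1 from the key frame, clamping at video boundaries, and the flip and half-body probabilities).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    input_height: int = 256
    input_width: int = 192
    num_frames: int = 3            # one key frame plus two auxiliary frames
    epochs: int = 20
    base_lr: float = 2e-3
    lr_drop_epoch: int = 16        # "reduced by a factor of ten at the 16th epoch"
    lr_drop_factor: float = 0.1
    max_rotation_deg: float = 45.0
    scale_range: tuple = (0.65, 1.35)


def learning_rate(epoch: int, cfg: TrainConfig) -> float:
    """Step schedule. Assumes 1-indexed epochs and that the drop holds from epoch 16 on."""
    if epoch >= cfg.lr_drop_epoch:
        return cfg.base_lr * cfg.lr_drop_factor
    return cfg.base_lr


def frame_triplet(key_idx: int, video_len: int, step: int = 1) -> tuple:
    """Return (previous, key, next) frame indices.

    The neighbor distance `step` and the clamping at video boundaries are
    assumptions; the summary only says "preceding and succeeding neighbors".
    """
    prev_idx = max(key_idx - step, 0)
    next_idx = min(key_idx + step, video_len - 1)
    return (prev_idx, key_idx, next_idx)


def sample_augmentation(cfg: TrainConfig, rng: random.Random,
                        flip_prob: float = 0.5,
                        half_body_prob: float = 0.3) -> dict:
    """Draw one set of augmentation parameters.

    Rotation and scale ranges come from the quoted setup; `flip_prob` and
    `half_body_prob` are placeholder values, not taken from the paper.
    """
    return {
        "rotation_deg": rng.uniform(-cfg.max_rotation_deg, cfg.max_rotation_deg),
        "scale": rng.uniform(*cfg.scale_range),
        "flip": rng.random() < flip_prob,
        "half_body": rng.random() < half_body_prob,
    }


if __name__ == "__main__":
    cfg = TrainConfig()
    for epoch in (1, 15, 16, 20):
        print(epoch, learning_rate(epoch, cfg))
    print(frame_triplet(key_idx=0, video_len=30))
    print(sample_augmentation(cfg, random.Random(0)))
```

In an actual PyTorch training loop, the same schedule would more likely be expressed with the library's AdamW optimizer and a step-based learning-rate scheduler; the plain function here only makes the quoted numbers explicit and easy to check.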