reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Optimizing Human Pose Estimation Through Focused Human and Joint Regions

Authors: Yingying Jiao, Zhigang Wang, Zhenguang Liu, Shaojing Fan, Sifan Wu, Zheqi Wu, Zhuoyue Xu

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, our method achieves state-of-the-art performance on three large-scale benchmark datasets. A remarkable highlight is that our method achieves an 84.8 mean Average Precision (mAP) on the challenging wrist joint, which significantly outperforms the 81.5 mAP achieved by the current state-of-the-art method on the PoseTrack2017 dataset. To evaluate the efficacy of our method, we conduct extensive experiments on three public benchmarks, achieving state-of-the-art performance.
Researcher Affiliation	Academia	Yingying Jiao1,2, Zhigang Wang3, Zhenguang Liu4,5, Shaojing Fan6, Sifan Wu1,2*, Zheqi Wu3, Zhuoyue Xu3. 1College of Computer Science and Technology, Jilin University; 2Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University; 3College of Computer Science and Technology, Zhejiang Gongshang University; 4The State Key Laboratory of Blockchain and Data Security, Zhejiang University; 5Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; 6School of Computing, National University of Singapore
Pseudocode	No	The paper includes mathematical formulations (equations 1-4) and pipeline diagrams (Figure 1, Figure 2) but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain an explicit statement about releasing source code or a link to a code repository.
Open Datasets	Yes	Datasets. PoseTrack has become a crucial dataset in video-based human pose estimation benchmarks. PoseTrack2017 (Iqbal, Milan, and Gall 2017) introduces 250 training videos and 50 validation videos, with 80,144 pose annotations across 15 key points. PoseTrack2018 (Andriluka et al. 2018) expands to 593 training and 170 validation videos, totaling 153,615 annotations. PoseTrack2021 (Doering et al. 2022) further enriches the dataset, particularly improving the representation of smaller figures and crowded scenes, reaching 177,164 pose annotations, with recalibrated joint visibility flags to better address occlusions. ... pre-trained on the COCO dataset (Lin et al. 2014)
Dataset Splits	Yes	PoseTrack2017 (Iqbal, Milan, and Gall 2017) introduces 250 training videos and 50 validation videos, with 80,144 pose annotations across 15 key points. PoseTrack2018 (Andriluka et al. 2018) expands to 593 training and 170 validation videos, totaling 153,615 annotations. ... The number of input frames is set to 3, consisting of one key frame accompanied by two auxiliary frames sourced from preceding and succeeding neighbors, respectively.
Hardware Specification	Yes	Our model is trained on a single RTX 4090 GPU for 20 epochs with the backbone frozen.
Software Dependencies	No	Our VREMD framework is realized utilizing PyTorch. The paper mentions PyTorch but does not specify a version number.
Experiment Setup	Yes	The input image size is fixed at 256 × 192. We integrate a series of data augmentation techniques, consistent with methodologies employed in previous works (Bertasius et al. 2019; Liu et al. 2021), comprising random rotation [-45°, 45°], random scale [0.65, 1.35], truncation (half body), and flipping during training. The number of input frames is set to 3, consisting of one key frame accompanied by two auxiliary frames sourced from preceding and succeeding neighbors, respectively. This configuration mirrors that of DCPose (Liu et al. 2021), rather than employing the five-frame input as seen in TDMI (Feng et al. 2023) and FAMI-Pose (Liu et al. 2022a). Our model is trained on a single RTX 4090 GPU for 20 epochs with the backbone frozen. We utilize the AdamW optimizer with an initial learning rate of 2e-3, which is then reduced by a factor of ten at the 16th epoch.
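
The Experiment Setup row can be restated as a small configuration sketch. The Python below is not the authors' code (no code release is reported); it only encodes the quoted numbers. Several details are assumptions made for illustration: epochs are counted from 1 and the reduced learning rate applies from epoch 16 onward, the input size is read as (height, width), auxiliary frames are clamped at video boundaries, and the flip probability of 0.5 is a placeholder. The names `learning_rate`, `frame_triplet`, and `sample_augmentation` are hypothetical, and half-body truncation is omitted.

```python
import random
from dataclasses import dataclass

# Values quoted in the Experiment Setup row.
INPUT_SIZE = (256, 192)          # read as (height, width); ordering is an assumption
ROTATION_RANGE = (-45.0, 45.0)   # degrees
SCALE_RANGE = (0.65, 1.35)
NUM_EPOCHS = 20
BASE_LR = 2e-3
LR_DROP_EPOCH = 16               # "reduced by a factor of ten at the 16th epoch"
LR_DROP_FACTOR = 0.1


def learning_rate(epoch: int) -> float:
    """Step schedule. Assumes 1-indexed epochs and that the drop
    takes effect starting with epoch 16 (not stated in the source)."""
    if not 1 <= epoch <= NUM_EPOCHS:
        raise ValueError(f"epoch must be in [1, {NUM_EPOCHS}]")
    if epoch >= LR_DROP_EPOCH:
        return BASE_LR * LR_DROP_FACTOR
    return BASE_LR


def frame_triplet(key_index: int, num_frames: int) -> tuple[int, int, int]:
    """Return (preceding, key, succeeding) frame indices.
    Clamping at the first and last frame is an assumption."""
    if not 0 <= key_index < num_frames:
        raise ValueError("key_index out of range")
    preceding = max(key_index - 1, 0)
    succeeding = min(key_index + 1, num_frames - 1)
    return preceding, key_index, succeeding


@dataclass
class AugmentationParams:
    rotation_deg: float
    scale: float
    flip: bool


def sample_augmentation(rng: random.Random, flip_prob: float = 0.5) -> AugmentationParams:
    """Draw one set of augmentation parameters.
    flip_prob is a placeholder; the source gives no probability."""
    return AugmentationParams(
        rotation_deg=rng.uniform(*ROTATION_RANGE),
        scale=rng.uniform(*SCALE_RANGE),
        flip=rng.random() < flip_prob,
    )


if __name__ == "__main__":
    rng = random.Random(0)
    for epoch in (1, 15, 16, 20):
        print(epoch, learning_rate(epoch))
    print(frame_triplet(0, 10), frame_triplet(5, 10))
    print(sample_augmentation(rng))
```

In a PyTorch training loop the same schedule would typically be expressed with an AdamW optimizer and a step scheduler, but the exact milestone depends on how the paper counts epochs, which the quoted text does not settle.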