Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation

Authors: Haipeng Chen, Sifan Wu, Zhigang Wang, Yifang Yin, Yingying Jiao, Yingda Lyu, Zhenguang Liu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our method outperforms state-of-the-art methods on three large-scale benchmark datasets.
Researcher Affiliation | Academia | 1 College of Computer Science and Technology, Jilin University; 2 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University; 3 College of Computer Science and Technology, Zhejiang Gongshang University; 4 Institute for Infocomm Research (I2R), A*STAR; 5 Public Computer Education and Research Center, Jilin University; 6 The State Key Laboratory of Blockchain and Data Security, Zhejiang University; 7 Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
Pseudocode | No | The paper describes methods and procedures in paragraph text and figures, but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper does not contain any explicit statement about releasing code, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluate the proposed CM-Pose for video-based human pose estimation on three widely used datasets: PoseTrack2017 (Iqbal, Milan, and Gall 2017), PoseTrack2018 (Andriluka et al. 2018), and PoseTrack2021 (Doering et al. 2022).
Dataset Splits | Yes | PoseTrack2017 includes 80,144 pose annotations and has two subsets, training (train) and validation (val), with 250 videos and 50 videos respectively (split according to the official protocol). PoseTrack2018 greatly increases the number of video clips and pose annotations, with 593 videos for training, 170 videos for validation, and 153,615 pose annotations in total. PoseTrack2018 also introduces an additional flag characterizing joint visibility. PoseTrack2021 further increases the number of pose annotations for small or crowded persons, for a total of 177,164 labels. All three datasets identify 15 keypoints. The training set is densely labeled in the center 30 frames, while the validation set contains additional pose annotations every 4 frames.
Hardware Specification | Yes | We implement our method CM-Pose for human pose estimation with PyTorch, which is trained on 2 NVIDIA GeForce RTX 4090 GPUs and terminated after 20 epochs.
Software Dependencies | No | We implement our method CM-Pose for human pose estimation with PyTorch... The paper mentions PyTorch but does not specify a version number or any other software dependencies with their versions.
Experiment Setup | Yes | We set the image size as 256 × 192. The time span ω is set to 1. The number of keypoint tokens K is 15. We use the AdamW optimizer to train the model with an initial learning rate of 2e-4 (decays to 2e-5, 2e-6, 2e-7 at the 5-th, 12-th, 18-th epochs, respectively).
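
The learning-rate schedule quoted under Experiment Setup can be sketched in plain Python as a step decay. This is only an illustration, not the authors' code (none was released). The constant and function names are hypothetical. The sketch assumes that epochs are counted from 0 and that each drop takes effect at the start of the listed epoch, neither of which the quoted text states. Only the numeric values (2e-4, the epochs 5, 12 and 18, the tenfold drops, and the 20-epoch run) come from the summary.

```python
# Minimal sketch of the reported step-decay schedule.
# All names are hypothetical; only the numbers come from the summary.

BASE_LR = 2e-4             # initial learning rate (reported)
MILESTONES = (5, 12, 18)   # epochs at which the rate drops (reported)
GAMMA = 0.1                # each drop divides the rate by 10 (2e-4 -> 2e-5 -> ...)
TOTAL_EPOCHS = 20          # training is terminated after 20 epochs (reported)


def lr_at_epoch(epoch: int) -> float:
    """Return the learning rate in effect during `epoch`.

    Assumption: epochs are 0-indexed and a milestone applies from the
    start of that epoch onward.
    """
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in [0, {TOTAL_EPOCHS}), got {epoch}")
    # Count how many milestones have already been reached.
    drops = sum(1 for milestone in MILESTONES if epoch >= milestone)
    return BASE_LR * GAMMA ** drops


if __name__ == "__main__":
    for epoch in range(TOTAL_EPOCHS):
        print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch):.1e}")
```

In PyTorch, a comparable schedule could plausibly be built by pairing `torch.optim.AdamW` with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[5, 12, 18], gamma=0.1)`, although the summary does not say which scheduler the authors used.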