SpatioTemporal Learning for Human Pose Estimation in Sparsely-Labeled Videos
Authors: Yingying Jiao, Zhigang Wang, Sifan Wu, Shaojing Fan, Zhenguang Liu, Zhuoyue Xu, Zheqi Wu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We carried out thorough evaluations for video pose propagation and video pose estimation tasks on three popular benchmarks: PoseTrack2017 (Iqbal, Milan, and Gall 2017), PoseTrack2018 (Andriluka et al. 2018), and PoseTrack21 (Doering et al. 2022). The videos in these datasets feature diverse challenges, such as crowded scenes and rapid movements. We evaluate our model using the standard pose estimation metric, average precision (AP), by initially calculating the AP for each joint and subsequently deriving the model's overall performance through the mean average precision (mAP) across all joints. The results of video pose propagation on PoseTrack2017 (Iqbal, Milan, and Gall 2017), PoseTrack2018 (Andriluka et al. 2018), and PoseTrack21 (Doering et al. 2022) datasets. We conduct a comprehensive evaluation of each component in our proposed STDPose framework, presenting the quantitative results in Table 4. |
| Researcher Affiliation | Academia | 1College of Computer Science and Technology, Jilin University 2Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University 3College of Computer Science and Technology, Zhejiang Gongshang University 4School of Computing, National University of Singapore 5The State Key Laboratory of Blockchain and Data Security, Zhejiang University 6Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the proposed method in Section 3 and its subsections, outlining the components and their interactions in paragraph form. No explicitly labeled 'Pseudocode', 'Algorithm', or structured code blocks are present in the main text. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository in the main text or supplementary information. There are no phrases like "We release our code..." or links to GitHub/GitLab. |
| Open Datasets | Yes | We carried out thorough evaluations for video pose propagation and video pose estimation tasks on three popular benchmarks: PoseTrack2017 (Iqbal, Milan, and Gall 2017), PoseTrack2018 (Andriluka et al. 2018), and PoseTrack21 (Doering et al. 2022). The videos in these datasets feature diverse challenges, such as crowded scenes and rapid movements. We utilize a standard Vision Transformer (Dosovitskiy et al. 2020) pretrained on the COCO dataset (Lin et al. 2014) as the backbone network of our STDPose framework. |
| Dataset Splits | Yes | We carried out thorough evaluations for video pose propagation and video pose estimation tasks on three popular benchmarks: PoseTrack2017 (Iqbal, Milan, and Gall 2017), PoseTrack2018 (Andriluka et al. 2018), and PoseTrack21 (Doering et al. 2022). By varying parameter T, we control the proportion of manually-labeled frames, with T=2 indicating a 50/50 split. We then evaluate the pose estimation performance on the PoseTrack2017 validation set. As shown in Table 3, pseudo-labels generated from pose propagation significantly improve pose estimation when dealing with sparsely-labeled videos. Our model achieves 84.3 mAP at T=4, close to FAMI-Pose (Liu et al. 2022a). Notably, at T=2, our model excels over FAMI-Pose (Liu et al. 2022a), achieving 85.2 mAP with only 50% of the manually-labeled frames, demonstrating superior performance with only half the labeled data. We conduct a comprehensive evaluation of each component in our proposed STDPose framework, presenting the quantitative results in Table 4. All the ablation studies are conducted on the PoseTrack2017 validation set. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware used to run its experiments, such as GPU models, CPU models, or cloud computing specifications. It only mentions general experimental settings. |
| Software Dependencies | No | The paper mentions using a "Vision Transformer... pretrained on the COCO dataset as the backbone network" but does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed for replication. |
| Experiment Setup | Yes | The input image size is 256×192. We utilize a standard Vision Transformer (Dosovitskiy et al. 2020) pretrained on the COCO dataset (Lin et al. 2014) as the backbone network of our STDPose framework. We set the parameters α to 0.1 and β to 0.01 in Eq. 2, and have not densely tuned them. |
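The evaluation protocol quoted in the Research Type row (per-joint AP averaged into mAP) can be sketched as follows. This is a minimal illustration of the aggregation step only, not the paper's evaluation code; the joint names and AP values are hypothetical placeholders, and the underlying per-joint AP computation (e.g. PCKh-style matching used by the PoseTrack benchmarks) is not reproduced here.

```python
# Hypothetical per-joint AP scores; the paper reports real values in
# its result tables, which are not reproduced here.
PER_JOINT_AP = {
    "head": 0.90, "shoulder": 0.88, "elbow": 0.84,
    "wrist": 0.80, "hip": 0.86, "knee": 0.83, "ankle": 0.79,
}

def mean_average_precision(per_joint_ap: dict) -> float:
    """Aggregate per-joint AP into a single mAP by averaging over joints,
    as described in the paper's evaluation protocol."""
    return sum(per_joint_ap.values()) / len(per_joint_ap)

print(round(mean_average_precision(PER_JOINT_AP), 4))
```

With the placeholder values above, the average of the seven joint APs is printed as the overall mAP.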