Semantics-aware Test-time Adaptation for 3D Human Pose Estimation
Authors: Qiuxia Lin, Rongyu Chen, Kerui Gu, Angela Yao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments Datasets. We follow the adaptation tasks from previous work (Zhang et al., 2020; Nam et al., 2023), using Human3.6M (Ionescu et al., 2013) as the labeled training dataset and 3DPW (Von Marcard et al., 2018) and 3DHP (Mehta et al., 2017) as the unlabeled test datasets. [...] Evaluation metrics. We report three evaluation metrics: Mean Per Joint Position Error (MPJPE) [...] 5.3. Quantitative Results [...] 5.4. Analysis Experiments Ablations on the method components. Semantics-incorporated strategies analysis. Improvement distribution. Runtime analysis. 2D fill-in analysis. |
| Researcher Affiliation | Academia | 1Department of Computer Science, National University of Singapore, Singapore. Correspondence to: Qiuxia Lin <EMAIL>. |
| Pseudocode | No | The paper describes the methodology through prose and a diagram (Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Datasets. We follow the adaptation tasks from previous work (Zhang et al., 2020; Nam et al., 2023), using Human3.6M (Ionescu et al., 2013) as the labeled training dataset and 3DPW (Von Marcard et al., 2018) and 3DHP (Mehta et al., 2017) as the unlabeled test datasets. Human3.6M is a widely used indoor dataset comprising 3.6 million images annotated with 2D and 3D labels. [...] Furthermore, we validate our method on an egocentric dataset, EgoBody (Zhang et al., 2022). |
| Dataset Splits | Yes | Datasets. We follow the adaptation tasks from previous work (Zhang et al., 2020; Nam et al., 2023), using Human3.6M (Ionescu et al., 2013) as the labeled training dataset and 3DPW (Von Marcard et al., 2018) and 3DHP (Mehta et al., 2017) as the unlabeled test datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions several software components like Openpose (Cao et al., 2017), Adam optimizer (Kingma & Ba, 2014), Motion CLIP (Tevet et al., 2022), GPT-4o (Achiam et al., 2023), and CLIP (Radford et al., 2021), but it does not specify version numbers for these components. |
| Experiment Setup | Yes | At the start of test-time adaptation for each test video, the model parameters are initialized with the pre-trained values, following (Nam et al., 2023). We employed the Adam optimizer (Kingma & Ba, 2014) with parameters set to beta1 = 0.5, beta2 = 0.9, and a learning rate of 5.0e-5. A cosine scheduler is used with a minimum learning rate of 1.0e-6. The input images are resized to 224 × 224, and the frame number of each video segment is 60. We use a batch size of 4 and the total training epoch is 6. The hyperparameters are λ1 = 0.1, λ2 = 0.2, σ = 0.75, α = 0.9. We use Openpose (Cao et al., 2017) to provide 2D poses with a confidence threshold of 0.3 (Gu et al., 2024). |
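The optimization hyperparameters quoted in the Experiment Setup row (Adam with beta1 = 0.5, beta2 = 0.9, learning rate 5.0e-5, cosine decay to a 1.0e-6 floor over 6 epochs) imply a standard cosine-annealing learning-rate schedule. Below is a minimal sketch of that schedule in plain Python; the formula is the conventional cosine-annealing rule, and the function name is illustrative rather than taken from the paper.

```python
import math

# Hyperparameters as reported in the paper's experiment setup:
# lr 5.0e-5 decayed by a cosine scheduler to a 1.0e-6 minimum over 6 epochs.
LR_MAX, LR_MIN, EPOCHS = 5.0e-5, 1.0e-6, 6

def cosine_lr(epoch: int) -> float:
    """Cosine-annealed learning rate at a given epoch in [0, EPOCHS]."""
    t = epoch / EPOCHS  # fraction of the schedule completed
    return LR_MIN + 0.5 * (LR_MAX - LR_MIN) * (1.0 + math.cos(math.pi * t))

for e in range(EPOCHS + 1):
    print(f"epoch {e}: lr = {cosine_lr(e):.2e}")
```

Under this schedule the rate starts at 5.0e-5, falls smoothly, and reaches exactly 1.0e-6 at the final epoch, matching the reported minimum learning rate.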