CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators

Authors: Harry Zhang, Luca Carlone

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our results indicate that using a simple mean aggregation on the conformal prediction-filtered hypotheses set yields competitive results. When integrated with more sophisticated aggregation techniques, our method achieves state-of-the-art performance across various metrics and datasets while inheriting the probabilistic guarantees of conformal prediction." The paper presents quantitative and qualitative results demonstrating state-of-the-art performance on a variety of real-world datasets; Section 5 ("EXPERIMENTS") reports quantitative and qualitative results on standard human pose estimation datasets, along with ablation studies.
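The filter-then-average step described above can be sketched as follows. This is a minimal illustration, not the paper's released code: the function name, array shapes, and the fallback rule when no hypothesis passes the threshold are all assumptions.

```python
import numpy as np

def aggregate_hypotheses(hypotheses, scores, qhat):
    """Mean-aggregate the pose hypotheses that survive conformal filtering.

    hypotheses: (K, J, 3) array of K candidate 3D poses over J joints.
    scores:     (K,) nonconformity score per hypothesis.
    qhat:       conformal quantile threshold computed on the calibration set.
    """
    keep = scores <= qhat
    if not keep.any():
        # Illustrative fallback (not from the paper): if the conformal set is
        # empty, fall back to the best-scoring hypothesis.
        keep = scores == scores.min()
    return hypotheses[keep].mean(axis=0)
```

The conformal guarantee comes from how `qhat` is chosen on held-out calibration data; the aggregation itself is just an average over the surviving set.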
Researcher Affiliation | Academia | "Harry Zhang, Luca Carlone. Massachusetts Institute of Technology, Cambridge, MA 02139, USA." Both authors are affiliated with the Massachusetts Institute of Technology, an academic institution, and their email addresses use the .edu domain.
Pseudocode | No | The paper describes its methods through textual descriptions, mathematical equations, and architectural diagrams (Figures 2 and 9). There are no explicitly labeled "Pseudocode" or "Algorithm" sections, nor any structured code-like blocks.
Open Source Code | No | "Interactive 3D visualization, code, and data will be available at this website." The phrasing "will be available" indicates future rather than current availability, and no specific, immediately accessible repository link is provided in the paper text.
Open Datasets | Yes | "To evaluate our method, we train and test on standard human pose estimation datasets... Human3.6M (Ionescu et al., 2013)... MPI-INF-3DHP (Mehta et al., 2017)... The 3DPW dataset (Von Marcard et al., 2018) is a more challenging dataset." These are all well-known, cited, publicly available datasets.
Dataset Splits | Yes | "Human3.6M... we train on 5 actors (S1, S5, S6, S7, S8) and evaluate on 2 actors (S9, S11). For both of the datasets, in our setting, we hold out 2% of the training dataset for conformal calibration during test time. We first split the testing data the same way as (Zhang et al., 2022b; Shan et al., 2023; Gong et al., 2023) to ensure the fairness of results. We then further split the training set into the actual training dataset and a calibration dataset before inference. Specifically, we split the training dataset by uniformly sampling a 2% subset as the calibration dataset, and CHAMP is only trained on the remaining 98%."
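The uniform 2%/98% calibration split quoted above can be reproduced with a few lines of NumPy. This is a sketch under the stated assumptions only; the function name, seed, and index-based interface are illustrative, not from the paper.

```python
import numpy as np

def split_calibration(n_samples, calib_frac=0.02, seed=0):
    """Uniformly sample a calibration subset; train on the remainder.

    Returns (train_indices, calibration_indices) as disjoint index arrays.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # uniform random ordering
    n_calib = int(round(calib_frac * n_samples))
    return idx[n_calib:], idx[:n_calib]
```

With `n_samples=1000` and the default 2% fraction this yields 980 training and 20 calibration indices.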
Hardware Specification | Yes | "We train the CHAMP model using an NVIDIA V100 GPU."
Software Dependencies | No | "CHAMP's denoiser model uses Adam optimizer with a weight decay parameter of 0.1 and momentum parameters β1 = β2 = 0.99." While an optimizer is mentioned, no specific software library names or version numbers (e.g., Python, PyTorch, TensorFlow, CUDA) are provided for reproducibility.
Experiment Setup | Yes | "For the training objective in eq. (13), we use λ = 0.6. We train the CHAMP model... for 300 epochs with a batch size of 8 and a learning rate of 5e-5 and reduce it on plateau with a factor of 0.5. During training, the number of hypotheses is 20, and #DDIM iterations is set to 1. During inference, they are set to 80 and 10." The maximum number of diffusion steps is T = 999, and, following previous work, the input pose sequence is 243 frames (N = 243) in the Human3.6M universal skeleton format.
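The reduce-on-plateau learning-rate schedule reported above can be sketched in plain Python. Only the initial rate (5e-5) and the factor (0.5) come from the paper; the `patience` knob and the class interface are assumptions for illustration.

```python
class PlateauLR:
    """Minimal reduce-on-plateau scheduler: halve the learning rate once the
    validation loss stops improving for more than `patience` steps."""

    def __init__(self, lr=5e-5, factor=0.5, patience=5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.bad = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad = val_loss, 0   # new best: reset the counter
        else:
            self.bad += 1
            if self.bad > self.patience:        # plateau exceeded: decay lr
                self.lr *= self.factor
                self.bad = 0
        return self.lr
```

Framework schedulers (e.g., PyTorch's `ReduceLROnPlateau`) implement the same idea with extra options such as relative thresholds and cooldown periods.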