CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
Authors: Matan Rusanovsky, Or Hirschorn, Shai Avidan
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our novel approach using the MP-100 benchmark, a comprehensive dataset covering over 100 categories and 18,000 images. Under a 1-shot setting, our solution achieves a notable performance boost of 1.26%, establishing a new state-of-the-art for CAPE. Additionally, we enhance the dataset by providing text description annotations for both training and testing. We also include alternative text annotations specifically for testing the model's ability to generalize across different textual descriptions, further increasing its value for future research. Our code and dataset are publicly available at https://github.com/matanr/capex. |
| Researcher Affiliation | Academia | Matan Rusanovsky, Or Hirschorn, and Shai Avidan, Tel Aviv University (EMAIL and EMAIL) |
| Pseudocode | No | The paper describes the architecture, loss functions, and experimental details with equations and diagrams but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and dataset are publicly available at https://github.com/matanr/capex. |
| Open Datasets | Yes | We validate our novel approach using the MP-100 benchmark, a comprehensive dataset covering over 100 categories and 18,000 images. Our code and dataset are publicly available at https://github.com/matanr/capex. We provide an enhanced version of the MP-100 dataset with textual annotations for the keypoints in all categories, enriching the benchmarking capabilities for category-agnostic pose estimation. |
| Dataset Splits | Yes | The dataset is divided into five separate splits for training and evaluation. Importantly, each split ensures that the categories used for training, validation, and testing are mutually exclusive, ensuring that the categories used for evaluation are unseen during the training phase. |
| Hardware Specification | Yes | Our model requires 6.5 GB of GPU memory and takes roughly 13 hours to train for each split, on a machine equipped with an NVIDIA RTX A5000 GPU. |
| Software Dependencies | No | The paper mentions "MMPose framework Contributors (2020)" but does not provide a specific version number for the framework itself or for any other key software libraries like PyTorch or TensorFlow, which are essential for reproducibility. |
| Experiment Setup | Yes | The architecture is implemented within the MMPose framework Contributors (2020), trained using the Adam optimizer for 200 epochs with a batch size of 16. The initial learning rate is 10^-5, reduced by a factor of 10 at the 160th and 180th epochs. C_i is 768 in Swin V2-T, C_t is 768 in gte-base-v1.5. C and K are set to 256 and 100, respectively. |