SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training
Authors: Nie Lin, Takehiko Ohkawa, Yifei Huang, Mingfang Zhang, Minjie Cai, Ming Li, Ryosuke Furuta, Yoichi Sato
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that our method outperforms conventional contrastive learning approaches that produce positive pairs solely from a single image with data augmentation. We achieve significant improvements over the state-of-the-art method (PeCLR) in various datasets, with gains of 15% on FreiHand, 10% on DexYCB, and 4% on AssemblyHands. Our experiments demonstrate that our approach surpasses prior pre-training methods and achieves robust performances across different hand pose datasets. |
| Researcher Affiliation | Academia | ¹The University of Tokyo, ²Hunan University. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes methodological steps and includes mathematical formulas and diagrams (e.g., Fig. 3 for overview, Eq. 4 for loss function), but it does not contain a clearly labeled pseudocode or algorithm block with structured steps formatted like code. |
| Open Source Code | Yes | Our code is available at https://github.com/ut-vision/SiMHand. |
| Open Datasets | Yes | Specifically, we collected 2.0M hand images from recent human-centric videos, such as 100DOH and Ego4D. ... We processed two datasets, Ego4D (Grauman et al., 2022) and 100DOH (Shan et al., 2020)... We conduct fine-tuning experiments on three datasets with 3D hand pose ground truth in various data size and viewpoints: exocentric datasets from FreiHand (Zimmermann et al., 2019) and DexYCB (Chao et al., 2021), and an egocentric dataset AssemblyHands (Ohkawa et al., 2023b). |
| Dataset Splits | Yes | FreiHand consists of 130.2K training frames and 3.9K test frames... DexYCB contains 325.3K training images and 98.2K test images... AssemblyHands, the largest of the three, includes 704.0K training samples and 109.8K test samples... Following (Spurr et al., 2021), we prepare 10% of the labeled FreiHand dataset, which is denoted as FreiHand*, especially used for ablation studies. This allows us to assess the performance in a limited supervision setting. ... Fig. 4 illustrates the experiment under different proportions of labeled fine-tuning data, namely 10%, 20%, 40%, and 80% in FreiHand. |
| Hardware Specification | Yes | We use 8 NVIDIA V100 GPUs with a batch size of 8192 for pre-training. ... We use a single NVIDIA V100 GPU with a batch size of 128. |
| Software Dependencies | No | The paper mentions using ResNet-50 as the encoder, LARS and ADAM optimizers, and MediaPipe for keypoint extraction, but does not provide specific version numbers for these or other software libraries. |
| Experiment Setup | Yes | For similar hands mining, we choose the PCA embedding size as D = 14. For the pre-training framework, we use ResNet-50 (He et al., 2016) as the encoder. Throughout the pre-training phase, all models are trained using LARS (You et al., 2017) with ADAM (Kingma & Ba, 2014) optimizer, with the learning rate of 3.2e-3. Following (Spurr et al., 2021), SimCLR employs scale and color jitter as image augmentation, while PeCLR and SiMHand utilize scale, rotation, translation, and color jitter. We use resized images with 128x128 as the input. We set the temperature parameter τ of contrastive learning as 0.5. We use 8 NVIDIA V100 GPUs with a batch size of 8192 for pre-training. ... We use a single NVIDIA V100 GPU with a batch size of 128. |
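The setup row above describes two reproducible pieces: mining similar hands in a D = 14 PCA embedding space, and contrastive pre-training with temperature τ = 0.5. The following is a minimal numpy sketch of those two steps, not the authors' released implementation; the function names, the Euclidean nearest-neighbour mining rule, and the SimCLR-style NT-Xent loss form are all assumptions made for illustration.

```python
import numpy as np

def pca_embed(keypoints, dim=14):
    """Project flattened hand keypoints onto their top `dim` principal
    components (the paper reports an embedding size of D = 14)."""
    X = keypoints.reshape(len(keypoints), -1)
    X = X - X.mean(axis=0)
    # SVD of the centered data matrix yields the principal directions.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:dim].T

def mine_similar_pairs(embeddings):
    """For each sample, return the index of its nearest neighbour
    (excluding itself) in the PCA space, as a candidate positive pair."""
    d = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample must not pair with itself
    return d.argmin(axis=1)

def nt_xent_loss(z_a, z_b, tau=0.5):
    """Temperature-scaled contrastive (NT-Xent) loss over paired feature
    batches, with tau = 0.5 as in the reported setup."""
    z = np.concatenate([z_a, z_b])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # drop self-similarity
    n = len(z_a)
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), targets].mean()
```

In this sketch, the mined nearest neighbour supplies the positive pair that the paper obtains from similar hands rather than from augmentations of a single image; the encoder features of each mined pair would be fed to `nt_xent_loss` during pre-training.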