A³-Net: Calibration-Free Multi-View 3D Hand Reconstruction for Enhanced Musical Instrument Learning

Authors: Geng Chen, Xufeng Jian, Yuchen Chen, Pengfei Ren, Jingyu Wang, Haifeng Sun, Qi Qi, Jing Wang, Jianxin Liao

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Evidence: "Our method achieves state-of-the-art results in the task of multi-view 3D hand reconstruction, demonstrating the effectiveness of the proposed framework." Supporting sections: 4 Experiments; 4.1 Datasets and Experimental Settings; 4.2 Training Details; 4.3 Comparisons with State-of-the-Arts; 4.4 Ablation Study.
Researcher Affiliation: Academia. Evidence: Geng Chen, Xufeng Jian, Yuchen Chen, Pengfei Ren, Jingyu Wang, Haifeng Sun, Qi Qi, Jing Wang and Jianxin Liao, State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications. EMAIL
Pseudocode: No. Evidence: The paper describes the methodology in prose and through network architecture diagrams (Figures 2, 3, 4) and mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. Evidence: The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets: Yes. Evidence: "DexYCB [Chao et al., 2021] is a large-scale dataset that records the pose of the hand grasping objects, containing 582K images with 20 objects selected from the YCB-Video dataset. HanCo [Zimmermann et al., 2022] consists of 1517 videos with multiple views and camera calibration."
Dataset Splits: Yes. Evidence: "For DexYCB [Chao et al., 2021]... we evaluate our method using the default S0 split. For HanCo [Zimmermann et al., 2022]... we divide it in an 8:2 ratio. Following recent methods [Zheng et al., 2023], the first 1200 sequences are used for training, while the remaining 317 sequences are used for testing."
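For illustration, the sequence-level HanCo split quoted above (first 1200 of 1517 sequences for training, the remaining 317 for testing) can be sketched as follows; the integer sequence IDs are placeholders, not actual HanCo identifiers:

```python
# Hypothetical sketch of the HanCo sequence-level split described in the
# paper: 1517 sequences total, first 1200 for training, remaining 317 for
# testing (roughly the stated 8:2 ratio). Integer IDs stand in for real
# HanCo sequence identifiers.
sequences = list(range(1517))

train_seqs = sequences[:1200]  # first 1200 sequences -> training
test_seqs = sequences[1200:]   # remaining 317 sequences -> testing

print(len(train_seqs), len(test_seqs))  # -> 1200 317
```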
Hardware Specification: Yes. Evidence: "All experiments are conducted on an NVIDIA RTX 4090 GPU."
Software Dependencies: No. Evidence: "We implement our method by PyTorch framework." The paper mentions PyTorch but does not provide a specific version number, nor does it list any other software dependencies with version numbers.
Experiment Setup: Yes. Evidence: "We train our network using the AdamW optimizer with an initial learning rate of 1e-4, divided by 10 every 10 epochs. The whole training process takes 30 epochs with a batch size of 32. We adopt random translation, random rotation, and random scaling for training augmentation. We crop the hands from the input images by 2D keypoints and resize them to a resolution of 256 x 256 during training and testing."
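The reported schedule (AdamW, initial learning rate 1e-4 divided by 10 every 10 epochs, 30 epochs, batch size 32) maps directly onto standard PyTorch components. A minimal sketch, using a placeholder model since A³-Net's code is not released:

```python
import torch

# Placeholder network: the paper's A3-Net is not publicly available,
# so a tiny linear layer stands in for it here.
model = torch.nn.Linear(10, 10)

# AdamW with initial lr 1e-4; StepLR divides the lr by 10 every 10 epochs,
# matching the schedule reported in the paper.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):  # 30 epochs total, batch size 32 per the paper
    # ... iterate over 256x256 hand crops with random translation,
    # rotation, and scaling augmentation here ...
    scheduler.step()

# After 30 epochs the lr has been divided by 10 three times.
print(optimizer.param_groups[0]["lr"])
```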