Open-Vocabulary Fine-Grained Hand Action Detection

Authors: Ting Zhe, Mengya Han, Xiaoshuai Hao, Yong Luo, Zheng He, Xiantao Cai, Jing Zhang

IJCAI 2025

Reproducibility assessment (each entry gives the variable, the result, and the LLM response):
Research Type: Experimental. "Extensive experiments demonstrate that Open-FGHA outperforms existing OVD methods, showing its strong potential for open-vocabulary hand action detection. We evaluate the effectiveness of Open-FGHA for fine-grained hand action detection in two task settings: OVD and closed-set Action Detection (AD)." (Sections 4 Experiments, 4.2 Comparisons with SOTA Methods, 4.3 Ablation Studies)
Researcher Affiliation: Academia. "1School of Computer Science, National Engineering Research Center for Multimedia Software and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University; 2Beijing Academy of Artificial Intelligence." EMAIL, EMAIL, EMAIL
Pseudocode: No. The paper describes the method's components (Hi H-Lo RA, BSF, CQG) in textual paragraphs and illustrates them with a diagram (Figure 2), but it does not contain a structured pseudocode block or algorithm section.
Open Source Code: Yes. "The source code is available at OV-FGHAD."
Open Datasets: Yes. "To facilitate fair comparisons with existing open-vocabulary detection methods, we propose the FHA-Kitchens OVD benchmark. Following the convention of the COCO OVD benchmark [Lin et al., 2014], we have restructured the publicly available FHA-Kitchens benchmark [Zhe et al., 2024], focusing on multi-granularity hand actions."
Dataset Splits: Yes. "We have re-split the original train and validation sets of the FHA-Kitchens benchmark to create new train and validation sets suitable for the OV-FGHAD task. The model is trained on the 46 base categories, containing 35,351 instances, and evaluated on a validation set containing 9,361 instances, which includes both the 46 base and 15 novel categories."
Hardware Specification: Yes. "The experiments were conducted using 4 NVIDIA GeForce RTX 4090 GPUs with the total batch size set to 16 for Open-FGHA-T and Open-FGHA-B, and 4 for Open-FGHA-L."
Software Dependencies: No. "We trained the Open-FGHA model on the FHA-Kitchens OVD benchmark using the MMDetection codebase [Chen et al., 2019]." The paper mentions using MMDetection but does not specify its version number, nor does it list any other software dependencies with versions.
Experiment Setup: Yes. "Fine-tuning was performed with the Adam optimizer [Kingma and Ba, 2015], using an initial learning rate of 5×10⁻⁵ for the tiny variant, 1×10⁻⁴ for the base and large variants, and weight decay set to 10⁻⁴. The experiments were conducted using 4 NVIDIA GeForce RTX 4090 GPUs with the total batch size set to 16 for Open-FGHA-T and Open-FGHA-B, and 4 for Open-FGHA-L. The model was trained for 12 epochs by default."
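Since training was done in the MMDetection codebase, the reported hyperparameters could be expressed as an MMDetection-style config fragment. The sketch below is hypothetical and built only from the numbers quoted above; the field layout follows MMDetection 3.x conventions and is not taken from the authors' released OV-FGHAD code.

```python
# Hypothetical MMDetection-style config fragment reflecting the reported
# training setup; illustrative only, not the authors' released configuration.
optim_wrapper = dict(
    optimizer=dict(
        type='Adam',        # Adam optimizer [Kingma and Ba, 2015]
        lr=5e-5,            # tiny variant; 1e-4 for the base and large variants
        weight_decay=1e-4,
    )
)
train_dataloader = dict(
    batch_size=4,           # per-GPU; 4 GPUs -> total batch size 16 (T/B variants)
)
train_cfg = dict(by_epoch=True, max_epochs=12)  # trained for 12 epochs by default
```

Note that for Open-FGHA-L the paper reports a total batch size of 4, i.e. a per-GPU batch size of 1 under the same 4-GPU setup.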