reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

ActiveHAI: Active Collection Based Human-AI Diagnosis with Limited Expert Predictions

Authors: Xuehan Zhao, Jiaqi Liu, Xin Zhang, Zhiwen Yu, Bin Guo

IJCAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on three real-world datasets show that ActiveHAI surpasses doctor and other human-AI methods by 16.3% and 3.6% in accuracy, respectively. Furthermore, ActiveHAI reaches 97.2% relative accuracy, even with just eight expert predictions per class. ... Experiment Study: Experiments on three real-world datasets show that the proposed method outperforms individual human and other human-AI collaboration methods by 16.3% and 3.6% in diagnosis accuracy, respectively. For reproducibility, we release the code and data in https://github.com/mercyzi/ActiveHAI.git.
Researcher Affiliation	Academia	Xuehan Zhao¹, Jiaqi Liu¹, Xin Zhang¹, Zhiwen Yu²,¹ and Bin Guo¹; ¹Northwestern Polytechnical University, ²Harbin Engineering University; EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1: Median-Window Active Collection
Open Source Code	Yes	For reproducibility, we release the code and data in https://github.com/mercyzi/ActiveHAI.git.
Open Datasets	Yes	We extensively evaluate the proposed method on three datasets: MZ-10 [Chen et al., 2023], DR-5 [Ju et al., 2022], and Chaoyang-3 [Zhu et al., 2021].
Dataset Splits	Yes	For DR-5 and Chaoyang-3, we perform five-fold cross-validation, repeating each fold ten times.
Hardware Specification	Yes	We implement ActiveHAI using PyTorch on a single NVIDIA 3090 GPU.
Software Dependencies	No	The paper mentions "PyTorch" but does not specify a version number. Other components, such as Transformer and EfficientNet-B1, are model architectures rather than software dependencies with version numbers.
Experiment Setup	Yes	The evaluator module is trained for 100 epochs using the Adam optimizer with a learning rate of 3 4. ... The embedding layer dimension is set to 512. ... The random sampling size N is set to 100, and the median-window length W_l is set to 5. For D1, D2, and D3 in MZ-10, the window starting points W_s are set to 65, 50, and 50, respectively. For DR-5, W_s is set to 55, and for Chaoyang-3, W_s is set to 50.
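
The Dataset Splits and Experiment Setup rows can be read together as a run plan. The sketch below is one possible way to record the quoted values and enumerate the runs; it is not the authors' code. The class name `SetupSketch`, its field names, the dataset keys, and the helper `repeated_kfold` are all hypothetical. It assumes that "repeating each fold ten times" means ten runs on the same train/test partition, which the quoted text does not confirm. The learning rate is left unset because its value is not legible in the quoted text.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class SetupSketch:
    """Values quoted in the Experiment Setup row (field names are hypothetical)."""
    epochs: int = 100
    optimizer: str = "Adam"
    learning_rate: Optional[float] = None  # not legible in the quoted text, so left unset
    embedding_dim: int = 512
    random_sampling_size: int = 100        # N
    window_length: int = 5                 # W_l
    # W_s per dataset; the key strings are illustrative labels, not official names.
    window_start: Dict[str, int] = field(default_factory=lambda: {
        "MZ-10/D1": 65,
        "MZ-10/D2": 50,
        "MZ-10/D3": 50,
        "DR-5": 55,
        "Chaoyang-3": 50,
    })


def repeated_kfold(
    n_samples: int, n_folds: int = 5, n_repeats: int = 10, seed: int = 0
) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (fold_id, repeat_id, train_indices, test_indices).

    Assumption: each fold's partition is fixed and simply re-run n_repeats times.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Strided slicing gives n_folds disjoint, near-equal-size folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for fold_id, test_idx in enumerate(folds):
        train_idx = [i for j, f in enumerate(folds) if j != fold_id for i in f]
        for repeat_id in range(n_repeats):
            yield fold_id, repeat_id, train_idx, test_idx


if __name__ == "__main__":
    cfg = SetupSketch()
    runs = list(repeated_kfold(n_samples=23))
    print(f"{len(runs)} runs planned; W_s for DR-5 = {cfg.window_start['DR-5']}")
```

Under these assumptions, DR-5 and Chaoyang-3 would each need 5 × 10 = 50 training runs. Since the PyTorch version is not reported, results from such a plan may not match the paper's numbers exactly.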