ML-GOOD: Towards Multi-Label Graph Out-Of-Distribution Detection

Authors: Tingyi Cai, Yunliang Jiang, Ming Li, Changqin Huang, Yi Wang, Qionghao Huang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experimentation is conducted on seven diverse sets of real-world multi-label graph datasets, encompassing cross-domain scenarios. The results show that the AUROC of ML-GOOD is improved by 5.26% in intra-domain and 6.54% in cross-domain settings compared to previous methods.
Researcher Affiliation Academia Tingyi Cai (1,2), Yunliang Jiang (2,1,3)*, Ming Li (4,2), Changqin Huang (2), Yi Wang (1,2), Qionghao Huang (2). Affiliations: (1) School of Computer Science and Technology, Zhejiang Normal University, China; (2) Zhejiang Key Laboratory of Intelligent Education Technology and Application, Zhejiang Normal University, China; (3) School of Information Engineering, Huzhou University, China; (4) Zhejiang Institute of Optoelectronics, China. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode Yes We summarize the complete algorithm for ML-GOOD in the Appendix; the appendix is accessible via the provided GitHub link.
Open Source Code Yes Code https://github.com/ca1man-2022/ML-GOOD
Open Datasets Yes In this experiment, we employ 6 real-world multi-label datasets, including five molecular datasets, OGB-Proteins (Hu et al. 2020), PPI (Zitnik and Leskovec 2017), HumLoc and EukLoc (Zhao et al. 2023), and one citation network, PCG.
Dataset Splits No To mitigate this, we conducted all experiments by randomly sampling 20% of this dataset, ensuring fairness and representativeness in comparisons. For the datasets DBLP, PCG, HumLoc, and EukLoc, which are single-graph datasets lacking obvious domain information, we utilize feature interpolation, a method introduced by Wu et al. (2023) for generating OOD data.
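The feature-interpolation trick mentioned above can be sketched as follows. This is a minimal illustration in the style of Wu et al. (2023), not the paper's exact procedure: the function name, the per-node random partner, and the mixing-coefficient range `alpha_high` are all assumptions for the sketch.

```python
import numpy as np

def interpolate_ood_features(x: np.ndarray, alpha_high: float = 0.5,
                             seed: int = 0) -> np.ndarray:
    """Generate synthetic OOD node features by convexly mixing each node's
    feature vector with that of a randomly chosen partner node."""
    rng = np.random.default_rng(seed)
    partner = rng.permutation(x.shape[0])                        # random partner per node
    alpha = rng.uniform(0.0, alpha_high, size=(x.shape[0], 1))   # per-node mixing weight
    return (1.0 - alpha) * x + alpha * x[partner]                # convex combination
```

Because each output row is a convex combination of two original rows, the synthetic features stay inside the per-feature range of the in-distribution data while drifting away from any single node's feature vector.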
Hardware Specification Yes All experimental procedures are conducted on an NVIDIA RTX A6000 GPU device with 48 GB memory.
Software Dependencies No We uniformly use a 2-layer GCN (Kipf and Welling 2017) model as backbone encoder. We use the Adam optimizer (Kingma and Ba 2015) for optimization.
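The 2-layer GCN backbone (Kipf and Welling 2017) cited above can be sketched as a plain forward pass. This numpy sketch is illustrative only; the hidden width, dense-matrix form, and function names are assumptions, and the actual code uses PyTorch with the Adam optimizer.

```python
import numpy as np

def gcn_forward(adj: np.ndarray, x: np.ndarray,
                w0: np.ndarray, w1: np.ndarray) -> np.ndarray:
    """2-layer GCN forward pass: Z = A_norm @ ReLU(A_norm @ X @ W0) @ W1,
    where A_norm is the symmetrically normalized adjacency with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # D^{-1/2}
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    h = np.maximum(a_norm @ x @ w0, 0.0)             # layer 1 + ReLU
    return a_norm @ h @ w1                           # layer 2: one logit per label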
Experiment Setup Yes The weight decay is 0.01 and learning rate is 0.01. The results highlight the significant role margin hyperparameters mout and min in the effectiveness of ML-GOOD... impact of the mout { 60, 55, 45, 35, 25} and min { 85, 75, 65} margin hyperparameters on PPI; Right: impact of the weight hyperparameter λ on DBLP.