Extending Graph Condensation to Multi-Label Datasets: A Benchmark Study
Authors: Liangliang Zhang, Haoran Bao, Yao Ma
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on eight real-world multi-label graph datasets, we demonstrate the effectiveness of our method. In the experiments, the GCond framework, combined with K-Center initialization and binary cross-entropy loss (BCELoss), generally achieves the best performance. This benchmark for multi-label graph condensation not only enhances the scalability and efficiency of GNNs for multi-label graph data but also offers substantial benefits for diverse real-world applications. |
| Researcher Affiliation | Academia | Liangliang Zhang (Rensselaer Polytechnic Institute); Haoran Bao (Rensselaer Polytechnic Institute); Yao Ma (Rensselaer Polytechnic Institute) |
| Pseudocode | Yes | A.1 Adapted GCond Algorithm Algorithm 1: Multi-Label GCond Adaptation |
| Open Source Code | Yes | Code is available at https://github.com/liangliang6v6/Multi-GC. |
| Open Datasets | Yes | Datasets. Specifically, we employ eight real-world datasets: PPI (Zeng et al., 2019), PPI-large (Zeng et al., 2019), Yelp (Zeng et al., 2019), DBLP (Akujuobi et al., 2019), OGBN-Proteins (Hu et al., 2020), PCG (Zhao et al., 2023), Human Go (Chou & Shen, 2007; Liberzon et al., 2015), and Eukaryote Go (Chou & Shen, 2007; Liberzon et al., 2015). |
| Dataset Splits | Yes | We follow the predefined data splits from (Zeng et al., 2019; Zhao & Bilen, 2023; Hu et al., 2020; Chou & Shen, 2007). Table 1 presents an overview of the dataset characteristics. Table 1: Multi-Label Graph Dataset Statistics...Train/Val/Test PPI 0.66/0.12/0.22 |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or specific compute infrastructure) are mentioned in the paper. |
| Software Dependencies | No | The paper does not explicitly state any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | The condensation ratio (C-rate) is a critical measure in graph condensation, representing the fraction of the original graph retained in the condensed graph. For instance, with a C-rate of 1%, the synthetic graph S is only 1% of the size of the original graph G. In general, the synthetic graphs contain around 150 nodes. We evaluate different initialization strategies, including subgraph sampling methods (Random, Herding, K-Center) and probabilistic label sampling for the synthetic labels Y. After initialization, the synthetic graph S is optimized using the gradient matching strategy in GCond. Next, we compare optimization loss functions, including Soft Margin Loss (Cao et al., 2019) and BCELoss (Durand et al., 2019), across various datasets with predefined condensation rates. Equation 3: where ... η is the learning rate for the gradient descent. |
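The two ingredients the review highlights as the best-performing combination, K-Center initialization and BCELoss, can both be sketched compactly. The following is a minimal illustration, not the paper's implementation: a greedy farthest-point K-Center selector over node features, and a multi-label binary cross-entropy that treats each label channel independently (the multi-label analogue of softmax cross-entropy). Function names and the pure-numpy setting are our own assumptions.

```python
import numpy as np

def k_center_init(features, k, seed=0):
    """Greedy K-Center sketch (hypothetical helper): start from a random
    node, then repeatedly add the node farthest from all chosen centers.
    Returns k distinct node indices to seed the condensed graph."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(features.shape[0]))]
    # Distance of every node to its nearest chosen center so far.
    dist = np.linalg.norm(features - features[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))          # farthest node becomes a center
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return np.array(centers)

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent label channels,
    as used for multi-label targets (0/1 per label, not one-hot)."""
    probs = 1.0 / (1.0 + np.exp(-logits))   # per-label sigmoid
    eps = 1e-12                              # guard against log(0)
    return -np.mean(targets * np.log(probs + eps)
                    + (1.0 - targets) * np.log(1.0 - probs + eps))
```

With a C-rate of 1% on an n-node graph, one would call `k_center_init(features, k=int(0.01 * n))` to pick the initial synthetic nodes, then optimize them under `multilabel_bce` via gradient matching; in practice frameworks like PyTorch provide this loss as `BCEWithLogitsLoss`.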