Extending Graph Condensation to Multi-Label Datasets: A Benchmark Study

Authors: Liangliang Zhang, Haoran Bao, Yao Ma

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments on eight real-world multi-label graph datasets, we demonstrate the effectiveness of our method. In these experiments, the GCond framework, combined with K-Center initialization and binary cross-entropy loss (BCELoss), generally achieves the best performance. This benchmark for multi-label graph condensation not only enhances the scalability and efficiency of GNNs for multi-label graph data but also offers substantial benefits for diverse real-world applications.
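The BCELoss mentioned above treats each label as an independent binary decision per node, which is what makes it suitable for multi-label (rather than multi-class) targets. The following is a minimal numpy sketch of that objective; the function name `multilabel_bce` and the toy numbers are ours for illustration, not taken from the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_bce(logits, labels, eps=1e-12):
    """Mean binary cross-entropy over all (node, label) pairs.

    Each label column is scored independently through a sigmoid,
    so a node may carry several positive labels at once.
    """
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    return float(np.mean(-(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))))

# Two nodes, three labels each; rows need not sum to 1 (multi-label).
logits = np.array([[2.0, -1.0, 0.5],
                   [-2.0, 3.0, 0.0]])
labels = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
loss = multilabel_bce(logits, labels)
```

A confident correct logit (e.g. +10 for a positive label) yields a near-zero loss, while a confident wrong one is heavily penalized, which is the behavior the condensation objective relies on.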
Researcher Affiliation | Academia | Liangliang Zhang (EMAIL), Rensselaer Polytechnic Institute; Haoran Bao (EMAIL), Rensselaer Polytechnic Institute; Yao Ma (EMAIL), Rensselaer Polytechnic Institute
Pseudocode | Yes | A.1 Adapted GCond Algorithm; Algorithm 1: Multi-Label GCond Adaptation
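The adapted GCond algorithm optimizes the synthetic graph by matching the gradients a shared model produces on real versus synthetic data. The sketch below illustrates one gradient-matching step under simplifying assumptions of ours: a linear surrogate model with sigmoid-BCE loss stands in for the GNN, graph structure is omitted, and finite differences replace the autograd a real implementation would use. None of these names come from the paper's released code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" data: 100 nodes, 5 features, 3 binary labels (multi-label).
X_real = rng.normal(size=(100, 5))
Y_real = (rng.random(size=(100, 3)) < 0.3).astype(float)

# Synthetic graph: 10 condensed nodes (a 10% condensation rate).
X_syn = rng.normal(size=(10, 5))
Y_syn = Y_real[rng.choice(100, 10, replace=False)]  # sampled-label stand-in

W = rng.normal(scale=0.1, size=(5, 3))  # shared surrogate model weights

def model_grad(X, Y, W):
    """Gradient of the sigmoid-BCE loss w.r.t. W for a linear model."""
    P = 1.0 / (1.0 + np.exp(-X @ W))
    return X.T @ (P - Y) / len(X)

def match_loss(X_s):
    """Squared distance between real-data and synthetic-data gradients."""
    return float(np.sum((model_grad(X_real, Y_real, W)
                         - model_grad(X_s, Y_syn, W)) ** 2))

# One gradient-matching step: nudge synthetic features so the synthetic
# gradient moves toward the real one (finite differences keep it short).
eta, h = 0.5, 1e-5
before = match_loss(X_syn)
G = np.zeros_like(X_syn)
for i in range(X_syn.shape[0]):
    for j in range(X_syn.shape[1]):
        X_pert = X_syn.copy()
        X_pert[i, j] += h
        G[i, j] = (match_loss(X_pert) - before) / h
X_syn -= eta * G
after = match_loss(X_syn)
```

In the full algorithm this inner step alternates with updates to the model weights W, so the synthetic graph matches gradients across an entire training trajectory rather than at a single point.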
Open Source Code | Yes | Code is available at https://github.com/liangliang6v6/Multi-GC.
Open Datasets | Yes | Datasets. Specifically, we employ eight real-world datasets: PPI (Zeng et al., 2019), PPI-large (Zeng et al., 2019), Yelp (Zeng et al., 2019), DBLP (Akujuobi et al., 2019), OGBN-Proteins (Hu et al., 2020), PCG (Zhao et al., 2023), Human Go (Chou & Shen, 2007; Liberzon et al., 2015), and Eukaryote Go (Chou & Shen, 2007; Liberzon et al., 2015).
Dataset Splits | Yes | We follow the predefined data splits from (Zeng et al., 2019; Zhao & Bilen, 2023; Hu et al., 2020; Chou & Shen, 2007). Table 1 presents an overview of the dataset characteristics. Table 1: Multi-Label Graph Dataset Statistics...Train/Val/Test: PPI 0.66/0.12/0.22
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or compute infrastructure) are mentioned in the paper.
Software Dependencies | No | The paper does not explicitly state any specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or other libraries).
Experiment Setup | Yes | The condensation ratio (C-rate) is a critical measure in graph condensation, representing the fraction of the original graph retained in the condensed graph. For instance, with a C-rate of 1%, the synthetic graph S is only 1% of the original graph G. In general, synthetic graphs contain around 150 nodes. We evaluate different initialization strategies, including subgraph sampling methods (Random, Herding, K-Center) and probabilistic label sampling for the synthetic Y. After initialization, the synthetic graph S is optimized using the gradient matching strategy in GCond. Next, we compare optimization loss functions, including Soft Margin Loss (Cao et al., 2019) and BCELoss (Durand et al., 2019), across various datasets with predefined condensation rates. Equation 3: where ... η is the learning rate for gradient descent.
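Of the initialization strategies listed above, K-Center is the one the benchmark finds strongest. It is typically implemented as greedy farthest-point selection: repeatedly pick the node whose feature vector is farthest from the already-chosen centers, so the initial synthetic nodes cover the feature space. A short numpy sketch under that common interpretation (the function name and parameters are ours, not the paper's):

```python
import numpy as np

def k_center_init(X, k, seed=0):
    """Greedy K-Center (farthest-point) selection over node features.

    Returns indices of k rows of X that greedily maximize coverage:
    each new center is the node farthest from all chosen centers.
    """
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]          # random first center
    d = np.linalg.norm(X - X[centers[0]], axis=1)  # dist to nearest center
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                    # farthest node so far
        centers.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return centers

# Example: pick 5 of 200 feature vectors to seed the synthetic graph
# (a 2.5% C-rate in the paper's terms).
X = np.random.default_rng(1).normal(size=(200, 16))
idx = k_center_init(X, k=5)
```

Compared with random sampling, this biases the initial synthetic nodes toward diverse regions of the feature space, which plausibly explains its edge as a starting point for the gradient-matching optimization.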