Extending Graph Condensation to Multi-Label Datasets: A Benchmark Study
Authors: Liangliang Zhang, Haoran Bao, Yao Ma
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on eight real-world multi-label graph datasets, we demonstrate the effectiveness of our method. In the experiments, the GCond framework, combined with K-Center initialization and binary cross-entropy loss (BCELoss), generally achieves the best performance. This benchmark for multi-label graph condensation not only enhances the scalability and efficiency of GNNs for multi-label graph data but also offers substantial benefits for diverse real-world applications. |
| Researcher Affiliation | Academia | Liangliang Zhang (Rensselaer Polytechnic Institute); Haoran Bao (Rensselaer Polytechnic Institute); Yao Ma (Rensselaer Polytechnic Institute) |
| Pseudocode | Yes | A.1 Adapted GCond Algorithm Algorithm 1: Multi-Label GCond Adaptation |
| Open Source Code | Yes | Code is available at https://github.com/liangliang6v6/Multi-GC. |
| Open Datasets | Yes | Datasets. Specifically, we employ eight real-world datasets: PPI (Zeng et al., 2019), PPI-large (Zeng et al., 2019), Yelp (Zeng et al., 2019), DBLP (Akujuobi et al., 2019), OGBN-Proteins (Hu et al., 2020), PCG (Zhao et al., 2023), Human Go (Chou & Shen, 2007; Liberzon et al., 2015), and Eukaryote Go (Chou & Shen, 2007; Liberzon et al., 2015). |
| Dataset Splits | Yes | We follow the predefined data splits from (Zeng et al., 2019; Zhao & Bilen, 2023; Hu et al., 2020; Chou & Shen, 2007). Table 1 presents an overview of the dataset characteristics. Table 1: Multi-Label Graph Dataset Statistics...Train/Val/Test PPI 0.66/0.12/0.22 |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or specific compute infrastructure) are mentioned in the paper. |
| Software Dependencies | No | The paper does not explicitly state any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | The condensation ratio (C-rate) is a critical measure in graph condensation, representing the fraction of the original graph retained in the condensed graph. For instance, with a C-rate of 1%, the synthetic graph S is only 1% of the size of the original graph G. In general, the synthetic graphs contain around 150 nodes. We evaluate different initialization strategies, including subgraph sampling methods (Random, Herding, K-Center) and probabilistic label sampling for the synthetic labels Y. After initialization, the synthetic graph S is optimized using the gradient matching strategy in GCond. Next, we compare optimization loss functions, including Soft Margin Loss (Cao et al., 2019) and BCELoss (Durand et al., 2019), across various datasets with predefined condensation rates. Equation 3: where ... η is the learning rate for the gradient descent. |
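The two ingredients the review highlights as the best-performing combination, K-Center initialization and BCELoss, can both be sketched compactly. The following is a minimal illustration, not the paper's implementation: a greedy farthest-point K-Center selector over node features, and a multi-label binary cross-entropy that treats each label channel independently (the multi-label analogue of softmax cross-entropy). Function names and the pure-numpy setting are our own assumptions.

```python
import numpy as np

def k_center_init(features, k, seed=0):
    """Greedy K-Center sketch (hypothetical helper): start from a random
    node, then repeatedly add the node farthest from all chosen centers.
    Returns k distinct node indices to seed the condensed graph."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(features.shape[0]))]
    # Distance of every node to its nearest chosen center so far.
    dist = np.linalg.norm(features - features[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))          # farthest node becomes a center
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return np.array(centers)

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent label channels,
    as used for multi-label targets (0/1 per label, not one-hot)."""
    probs = 1.0 / (1.0 + np.exp(-logits))   # per-label sigmoid
    eps = 1e-12                              # guard against log(0)
    return -np.mean(targets * np.log(probs + eps)
                    + (1.0 - targets) * np.log(1.0 - probs + eps))
```

With a C-rate of 1% on an n-node graph, one would call `k_center_init(features, k=int(0.01 * n))` to pick the initial synthetic nodes, then optimize them under `multilabel_bce` via gradient matching; in practice frameworks like PyTorch provide this loss as `BCEWithLogitsLoss`.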