ST-GCond: Self-supervised and Transferable Graph Dataset Condensation
Authors: Beining Yang, Qingyun Sun, Cheng Ji, Xingcheng Fu, Jianxin Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both node-level and graph-level datasets show that ST-GCond outperforms existing methods by 2.5%-18.7% in all cross-task and cross-dataset scenarios, and also achieves state-of-the-art performance on 5 out of 6 datasets in the single-dataset, single-task scenario. Extensive experiments on 10 real-world datasets demonstrate that ST-GCond enjoys state-of-the-art performance on both single-task and cross-dataset/cross-task scenarios. |
| Researcher Affiliation | Academia | Beining Yang1, Qingyun Sun2, Cheng Ji2, Xingcheng Fu3, Jianxin Li2, 1University of Edinburgh 2Beihang University 3Guangxi Normal University EMAIL, EMAIL |
| Pseudocode | Yes | The overall algorithm is presented in Appendix F.1. Algorithm 1 ST-GCond: Self-supervised and Transferable Graph Dataset Condensation |
| Open Source Code | Yes | Our code is available at https://github.com/RingBDStack/ST-GCond. |
| Open Datasets | Yes | We evaluate our method on 6 node-level datasets (Cora (Kipf & Welling, 2017), Citeseer (Kipf & Welling, 2017), Ogbn-arxiv (Hu et al., 2020a), Reddit (Hamilton et al., 2017) and Flickr (Zeng et al., 2020)) and 5 graph-level datasets (GEOM (Axelrod & Gomez-Bombarelli, 2020), BACE (Wu et al., 2018), ClinTox (Gayvert et al., 2016), and SIDER (Kuhn et al., 2016)). |
| Dataset Splits | Yes | For the supervised node classification task, we follow the settings from GCond (Jin et al., 2022c). For the other types of task, we follow the public split of the dataset. We randomly split the classes of the task into 3 parts, with each part containing h classes (h > 1) as a sub-task. |
| Hardware Specification | Yes | GPU: NVIDIA Tesla A100 SXM4 with 80GB of memory. CPU: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz with 1TB DDR4 of memory. |
| Software Dependencies | Yes | Software: CUDA 10.1, Python 3.8.12, PyTorch (Paszke et al., 2019) 1.7.0. |
| Experiment Setup | Yes | In the condensing stage, for the supervised task, we randomly split the classes of the task into 3 parts, with each part containing h classes (h > 1) as a sub-task. For the auxiliary self-supervised tasks, we select 5 classic tasks for node-level condensation (...) and 7 tasks for graph-level condensation (...). Table A3 (fixed key parameters): GNN backbone: GCN, GIN; Number of layers: 2; Hidden units: 256; Activation: LeakyReLU; Dropout rate: 0.5; k: 5; Split of meta tasks: 3; δ: 0.5. Table A4 (search space of key parameters): lr ∈ {0.1, 0.01, 0.001}; α ∈ (0.0, 1.0); β ∈ (0.0, 1.0). |
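The setup above describes randomly splitting a task's classes into 3 sub-tasks of h classes each (h > 1). A minimal sketch of that split, assuming only that leftover classes (when the class count is not divisible by 3) are distributed back across the parts; the function name and leftover handling are illustrative, not taken from the paper's code:

```python
import random

def split_classes_into_subtasks(num_classes, num_parts=3, seed=0):
    """Randomly partition class indices into `num_parts` sub-tasks,
    each holding h = num_classes // num_parts classes (h > 1 assumed)."""
    rng = random.Random(seed)
    classes = list(range(num_classes))
    rng.shuffle(classes)
    h = num_classes // num_parts
    parts = [classes[i * h:(i + 1) * h] for i in range(num_parts)]
    # Illustrative choice: spread any leftover classes round-robin
    # across the parts so every class lands in exactly one sub-task.
    for j, c in enumerate(classes[num_parts * h:]):
        parts[j % num_parts].append(c)
    return parts
```

For example, a 7-class dataset yields three sub-tasks of sizes 3, 2, and 2, together covering all 7 classes exactly once.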