ST-GCond: Self-supervised and Transferable Graph Dataset Condensation
Authors: Beining Yang, Qingyun Sun, Cheng Ji, Xingcheng Fu, Jianxin Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both node-level and graph-level datasets show that ST-GCond outperforms existing methods by 2.5%-18.7% in all cross-task and cross-dataset scenarios, and also achieves state-of-the-art performance on 5 out of 6 datasets in the single-dataset, single-task scenario. Extensive experiments on 10 real-world datasets demonstrate that ST-GCond enjoys state-of-the-art performance on both single-task and cross-dataset/cross-task scenarios. |
| Researcher Affiliation | Academia | Beining Yang1, Qingyun Sun2, Cheng Ji2, Xingcheng Fu3, Jianxin Li2, 1University of Edinburgh 2Beihang University 3Guangxi Normal University EMAIL, EMAIL |
| Pseudocode | Yes | The overall algorithm is presented in Appendix F.1. Algorithm 1 ST-GCond: Self-supervised and Transferable Graph Dataset Condensation |
| Open Source Code | Yes | Our code is available at https://github.com/RingBDStack/ST-GCond. |
| Open Datasets | Yes | We evaluate our method on 6 node-level datasets (Cora (Kipf & Welling, 2017), Citeseer (Kipf & Welling, 2017), Ogbn-arxiv (Hu et al., 2020a), Reddit (Hamilton et al., 2017) and Flickr (Zeng et al., 2020)) and 5 graph-level datasets (GEOM (Axelrod & Gomez-Bombarelli, 2020), BACE (Wu et al., 2018), ClinTox (Gayvert et al., 2016), and SIDER (Kuhn et al., 2016)). |
| Dataset Splits | Yes | For the supervised node classification task, we follow the settings from GCond (Jin et al., 2022c). For the other types of task, we follow the public split of the dataset. We randomly split the classes of the task into 3 parts, with each part containing h classes (h > 1) as a sub-task. |
| Hardware Specification | Yes | GPU: NVIDIA Tesla A100 SXM4 with 80GB of memory. CPU: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz with 1TB DDR4 of memory. |
| Software Dependencies | Yes | Software: CUDA 10.1, Python 3.8.12, PyTorch (Paszke et al., 2019) 1.7.0. |
| Experiment Setup | Yes | In the condensing stage, for the supervised task, we randomly split the classes of the task into 3 parts, with each part containing h classes (h > 1) as a sub-task. For the auxiliary self-supervised tasks, we select 5 classic tasks for node-level condensation (...) and 7 tasks for graph-level condensation (...). Table A3 (fixed key parameters): GNN backbone: GCN, GIN; Number of layers: 2; Hidden units: 256; Activation: LeakyReLU; Dropout rate: 0.5; k: 5; Split of meta tasks: 3; δ: 0.5. Table A4 (search space of key parameters): lr ∈ {0.1, 0.01, 0.001}; α ∈ (0.0, 1.0); β ∈ (0.0, 1.0). |
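The setup above describes randomly splitting a task's classes into 3 sub-tasks of h classes each (h > 1). A minimal sketch of that split, assuming only that leftover classes (when the class count is not divisible by 3) are distributed back across the parts; the function name and leftover handling are illustrative, not taken from the paper's code:

```python
import random

def split_classes_into_subtasks(num_classes, num_parts=3, seed=0):
    """Randomly partition class indices into `num_parts` sub-tasks,
    each holding h = num_classes // num_parts classes (h > 1 assumed)."""
    rng = random.Random(seed)
    classes = list(range(num_classes))
    rng.shuffle(classes)
    h = num_classes // num_parts
    parts = [classes[i * h:(i + 1) * h] for i in range(num_parts)]
    # Illustrative choice: spread any leftover classes round-robin
    # across the parts so every class lands in exactly one sub-task.
    for j, c in enumerate(classes[num_parts * h:]):
        parts[j % num_parts].append(c)
    return parts
```

For example, a 7-class dataset yields three sub-tasks of sizes 3, 2, and 2, together covering all 7 classes exactly once.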