Bonsai: Gradient-free Graph Condensation for Node Classification

Authors: Mridul Gupta, Samyak Jain, Vansh Ramani, Hariprasad Kodamana, Sayan Ranu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we benchmark Bonsai and establish: Superior accuracy: Bonsai consistently outperforms existing baselines in terms of accuracy across various compression factors, datasets, and GNN architectures. Enhanced computation and energy efficiency: on average, Bonsai is at least 7 times faster and 17 times more energy efficient than the state-of-the-art baselines. Increased robustness: unlike existing methods that require tuning condensation-specific hyperparameters for each combination of GNN architecture, dataset, and compression ratio, Bonsai achieves superior performance using a single set of parameters across all scenarios. Our implementation is available at https://github.com/idea-iitd/Bonsai.
Researcher Affiliation | Academia | 1: Yardi School of Artificial Intelligence, 2: Department of Computer Science, 3: Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, 110016, India; 4: Indian Institute of Technology Delhi, Abu Dhabi, Zayed City, Abu Dhabi, UAE. {mridul.gupta@scai, cs5200667@, cs5230804@, kodamana@, sayanranu@cse}.iitd.ac.in
Pseudocode | Yes |
Algorithm 1: The greedy approach
Require: graph G, budget b, Rev-k-NN sets of computation trees T_v^L
Ensure: solution set A with |A| = b
1: A ← ∅
2: while |A| < b (within budget) do
3:     T_v^L ← arg max_{T_v^L ∈ T \ A} { Π(A ∪ {T_v^L}) − Π(A) }
4:     A ← A ∪ {T_v^L}
5: return A
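The greedy loop in Algorithm 1 can be sketched in plain Python. Here `utility` is a hypothetical stand-in for the paper's set utility Π (in Bonsai, coverage over Rev-k-NN sets of computation trees); the toy coverage data and all names below are illustrative assumptions, not the authors' code.

```python
def greedy_select(candidates, budget, utility):
    """Repeatedly add the candidate with the largest marginal gain
    utility(A ∪ {c}) - utility(A) until the budget b is reached."""
    selected = set()
    while len(selected) < budget:
        best, best_gain = None, float("-inf")
        for c in candidates - selected:
            gain = utility(selected | {c}) - utility(selected)
            if gain > best_gain:
                best, best_gain = c, gain
        selected.add(best)
    return selected

# Toy utility: each candidate "covers" a set of nodes; Π(A) = size of the union.
coverage = {1: {0, 1}, 2: {1, 2, 3}, 3: {3, 4}}

def util(A):
    return len(set().union(*(coverage[c] for c in A)))

picked = greedy_select(set(coverage), budget=2, utility=util)
```

For a monotone submodular Π such as set coverage, this greedy scheme carries the classic (1 − 1/e) approximation guarantee, which is the usual motivation for budgeted greedy selection.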
Open Source Code | Yes | Our implementation is available at https://github.com/idea-iitd/Bonsai.
Open Datasets | Yes | Datasets: Table 3 lists the benchmark datasets used.
Dataset                          # Nodes    # Edges      # Classes  # Features
Cora (Kipf & Welling, 2017)      2,708      10,556       7          1,433
Citeseer (Kipf & Welling, 2017)  3,327      9,104        6          3,703
Pubmed (Kipf & Welling, 2017)    19,717     88,648       3          500
Flickr (Zeng et al., 2020)       89,250     899,756      7          500
Ogbn-arxiv (Hu et al., 2021)     169,343    2,315,598    40         128
Reddit (Hamilton et al., 2017)   232,965    23,213,838   41         602
MAG240M (Hu et al., 2021)        1,398,159  26,434,726   153        768
Dataset Splits | Yes | Across all datasets except MAG240M, we maintain a train-validation-test split ratio of 60:20:20. In MAG240M, we use a ratio of 80:10:10.
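A minimal sketch of how such a node-level split could be produced; the function name, the fixed seed, and the integer-truncation choice are assumptions for illustration, not the authors' code.

```python
import random

def split_indices(num_nodes, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle node indices and cut them into train/val/test by the given ratios."""
    idx = list(range(num_nodes))
    random.Random(seed).shuffle(idx)
    n_train = int(ratios[0] * num_nodes)
    n_val = int(ratios[1] * num_nodes)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# e.g., Cora's 2,708 nodes with the 60:20:20 ratio used in the paper
train, val, test = split_indices(2708)
```

Assigning any leftover nodes from truncation to the test partition keeps the three parts disjoint and exhaustive.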
Hardware Specification | Yes | All experiments were conducted on a high-performance computing system with the following specifications: CPU: 96 logical cores; RAM: 512 GB; GPU: NVIDIA A100-PCIE-40GB.
Software Dependencies | Yes | Operating System: Linux (Ubuntu 20.04.4 LTS, GNU/Linux 5.4.0-124-generic x86_64); PyTorch version: 1.13.1+cu117; CUDA version: 11.7; PyTorch Geometric version: 2.3.1.
Experiment Setup | Yes | The specifics of our experimental setup, including hardware and software environment, and hyperparameters are detailed in App. B. For the baseline algorithms, we use the code shared by their respective authors. We conduct each experiment 5 times and report the means and standard deviations. ... Number of layers in evaluation models: 2 (with ReLU in between) for GCN, GAT, and GIN. The MLP used in GIN is a simple linear transform with bias, defined by WX + b, where X is the input design matrix. Value of k in Rev-k-NN: 5. Hyperparameters for baselines: we use the config files shared by the authors. We note that the benchmark datasets are common between our experiments and those used in the baselines.
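As a rough illustration of the 2-layer evaluation model described above (GCN-style propagation with ReLU in between, and a WX + b linear map per layer), here is a NumPy sketch. The symmetric normalization with self-loops is the standard GCN formulation and is assumed here; it is not taken from the authors' implementation.

```python
import numpy as np

def gcn_layer(A_hat, X, W, b):
    """One graph-convolution step: aggregate neighbor features, then apply WX + b."""
    return A_hat @ X @ W + b

def two_layer_gcn(A, X, W1, b1, W2, b2):
    # Standard GCN normalization: add self-loops, then scale by degrees symmetrically.
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    A_hat = A_tilde / np.sqrt(np.outer(d, d))
    H = np.maximum(gcn_layer(A_hat, X, W1, b1), 0.0)  # ReLU between the two layers
    return gcn_layer(A_hat, H, W2, b2)                # class logits per node

# Tiny 2-node example with identity features and all-ones weights.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
out = two_layer_gcn(A, np.eye(2), np.ones((2, 3)), 0.0, np.ones((3, 2)), 0.0)
```

The output has one row of class logits per node; in the paper's setup these would feed a softmax over the dataset's classes.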