CEGA: A Cost-Effective Approach for Graph-Based Model Extraction and Acquisition
Authors: Zebin Wang, Menghan Lin, Bolin Shen, Ken Anderson, Molei Liu, Tianxi Cai, Yushun Dong
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark graph datasets demonstrate our superiority over comparable baselines on accuracy, fidelity, and F1 score under strict query-size constraints. These results highlight both the susceptibility of deployed GNNs to extraction attacks and the promise of ethical, efficient GNN acquisition methods to support low-resource research environments. Our implementation is publicly available at https://github.com/LabRAI/CEGA. |
| Researcher Affiliation | Academia | 1Department of Biostatistics, T. H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA 2Department of Statistics, Florida State University, Tallahassee, Florida, USA 3Department of Computer Science, Florida State University, Tallahassee, Florida, USA 4Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, New York, USA. Correspondence to: Tianxi Cai <EMAIL>, Yushun Dong <EMAIL>. |
| Pseudocode | Yes | We summarize the algorithmic routine of CEGA in Algorithm 1. Algorithm 1 The Proposed Framework of CEGA |
| Open Source Code | Yes | Our implementation is publicly available at https://github.com/LabRAI/CEGA. |
| Open Datasets | Yes | Our experiments are conducted on 6 widely used benchmark datasets: (1) Coauthorship networks where nodes are authors and edges represent collaboration, including Coauthor-CS and Coauthor-Physics; (2) Co-purchase graphs with nodes as products and edges as items frequently purchased together, including Amazon-Computer and Amazon-Photo; and (3) Academic citation and collaboration networks, including Cora-Full and DBLP. These datasets vary in size, complexity, and formality of node attributes, providing a comprehensive basis for evaluating CEGA's performance. The dataset statistics are provided in Appendix B.1. |
| Dataset Splits | Yes | If training and test sets are not provided, we randomly select 60% of the nodes for training and use the remaining 40% for testing. |
| Hardware Specification | Yes | All experiments are conducted on two NVIDIA RTX 6000 Ada GPUs. |
| Software Dependencies | No | The paper mentions training GCN models and using active learning techniques (AGE, GRAIN), but does not specify version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages used for implementation. |
| Experiment Setup | Yes | Initially, we train a target model, f_T, for 1000 epochs with a learning rate of 1e-3... In the initialization step, we randomly select 2 nodes from each class across all the tested datasets, resulting in a total of 2C nodes... The total budget is capped at 20C. ...For our proposed method, in cycle γ, CEGA queries κ = 1 node and trains a 2-layer GCN model with {Vγ, Ga} for E = 1 epoch. In the analysis for node diversity, we set the weight ρ = 0.8... We set the initial weight coefficients as α1 = α2 = α3 = 0.2, the measurement of the initial weight gap between Rγ1 and Rγ2 as 0.6, and the measurement of the curvature for the weight changes as λ = 0.3. After the node selection process, we train a 2-layer GCN with a hidden dimension of 16. The model is optimized with a learning rate of 1e-3 and trained for 1000 epochs. For AGE, we apply a warm-up period of 400 epochs. |
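The Dataset Splits row states that, when a dataset ships without predefined splits, 60% of nodes are drawn at random for training and the remaining 40% are held out for testing. A minimal sketch of such a node-level split is below, assuming NumPy; the function name `random_node_split` and the seeding scheme are illustrative, not taken from the paper's code.

```python
import numpy as np

def random_node_split(num_nodes, train_frac=0.6, seed=0):
    """Randomly assign graph nodes to disjoint train/test masks.

    Mirrors the 60/40 random split the report describes for datasets
    that do not provide their own splits. The seed argument is an
    assumption added here for reproducibility of the sketch itself.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)          # shuffle node indices
    n_train = int(train_frac * num_nodes)      # 60% of nodes by default
    train_mask = np.zeros(num_nodes, dtype=bool)
    train_mask[perm[:n_train]] = True
    test_mask = ~train_mask                    # remaining 40%
    return train_mask, test_mask

train_mask, test_mask = random_node_split(1000)
print(train_mask.sum(), test_mask.sum())  # 600 400
```

Boolean masks are the conventional way node splits are stored in graph-learning libraries, so the same masks could be handed directly to a GCN training loop.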