Scalable and Certifiable Graph Unlearning: Overcoming the Approximation Error Barrier
Authors: Lu Yi, Zhewei Wei
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on real-world datasets demonstrate the efficiency and unlearning efficacy of ScaleGUN. Remarkably, ScaleGUN accomplishes (ϵ, δ) = (1, 10⁻⁴) certified unlearning on the billion-edge graph ogbn-papers100M in 20 seconds for a 5,000-random-edge removal request, of which only 5 seconds are required for updating the node embeddings, compared to 1.91 hours for retraining and 1.89 hours for re-propagation. |
| Researcher Affiliation | Academia | Lu Yi & Zhewei Wei, Renmin University of China |
| Pseudocode | Yes | Algorithm 1: ScaleGUN. Input: graph data D = (X, Y, A), sequence of removal requests R = {R₁, …, R_k}, loss function l, parameters L, {wℓ}, r_max, α, γ₂, ϵ, δ. Algorithm 2: BasicProp. Input: graph G, level L, threshold r_max, and initialized {q^(ℓ), r^(ℓ)}, 0 ≤ ℓ ≤ L. Output: new estimated vectors {q^(ℓ), r^(ℓ)}. Algorithm 3: UpdateEmbeddingMatrix (Edge Removal). Input: graph G, level L, edge e = (u, v) to be removed, threshold r_max, {wℓ}, and {q^(ℓ), r^(ℓ)}, 0 ≤ ℓ ≤ L. Output: Ẑ ∈ ℝⁿ. Algorithm 4: BatchUpdate (Edge Removal). Input: graph G, level L, edges to be unlearned S = {e₁, …, e_k}, weight coefficients {wℓ}, threshold r_max, {q^(ℓ), r^(ℓ)}, 0 ≤ ℓ ≤ L. Output: Ẑ ∈ ℝⁿ. |
| Open Source Code | Yes | Our code is available at https://github.com/luyi256/ScaleGUN. |
| Open Datasets | Yes | In this section, we evaluate the performance of ScaleGUN on real-world datasets, including three small graph datasets: Cora (Sen et al., 2008), Citeseer (Yang et al., 2016), and Photo (McAuley et al., 2015); as well as three large graph datasets: ogbn-arxiv, ogbn-products, and ogbn-papers100M (Hu et al., 2020). Licenses of the datasets: ogbn-arxiv and ogbn-papers100M are both under the ODC-BY license. Cora, Citeseer, Photo, and ogbn-products are under the CC0 1.0 license, Creative Commons license, Open Data Commons Attribution License, and Amazon license, respectively. |
| Dataset Splits | Yes | The public splits are used for all datasets. Table 6: Statistics of datasets (columns: n, m, \|C\|, F, train/val/test). Cora: n=2,708, m=5,278, \|C\|=7, F=1,433, split 1,208/500/1,000. Citeseer: n=3,327, m=4,552, \|C\|=6, F=3,703, split 1,827/500/1,000. Photo: n=7,650, m=119,081, \|C\|=8, F=745, split 6,150/500/1,000. ogbn-arxiv: n=169,343, m=1,166,243, \|C\|=40, F=128, split 90,941/29,799/48,603. ogbn-products: n=2,449,029, m=61,859,140, \|C\|=47, F=128, split 196,615/39,323/2,213,091. ogbn-papers100M: n=111,059,956, m=1,615,685,872, \|C\|=172, F=128, split 1,207,179/125,265/125,265. |
| Hardware Specification | Yes | We conduct all experiments on a machine with an Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz, 1007GB memory, and a Quadro RTX 8000 GPU, running Linux. |
| Software Dependencies | No | The paper mentions LBFGS and Adam as optimizers, but does not provide specific version numbers for any software dependencies, libraries, or programming languages. |
| Experiment Setup | Yes | Unless otherwise stated, we set L = 2, ϵ = 1 for all experiments, averaging results across 5 trials with random seeds. By default, we set δ = 1/#edges for edge unlearning and δ = 1/#nodes for node/feature unlearning in the linear model utility experiments to meet the typical privacy requirements (Chien et al., 2024; Sajadmanesh et al., 2023). Following (Chien et al., 2022a), wℓ is set to 1 for ℓ = 2 and 0 for other values. The propagation matrix is set to P = D^(−1/2)AD^(−1/2) across all methods for fair comparison. Table 7: Parameters used in the linear model experiments (columns: Dataset, r_max, λ, α). Table 8: Parameters used in the deep model experiments (columns: Dataset, λ, α, learning rate, hidden size, batch size). |
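The propagation scheme quoted above (embeddings Z = Σℓ wℓ Pℓ X with P = D^(−1/2)AD^(−1/2), L = 2, and w₂ = 1) can be sketched in dense NumPy. This is an illustrative baseline only: the function names are mine, and ScaleGUN itself uses approximate lazy local propagation with per-level (q, r) vectors rather than dense matrix products.

```python
import numpy as np

def sym_norm(A):
    """P = D^{-1/2} A D^{-1/2}: symmetric normalization of the adjacency matrix."""
    deg = A.sum(axis=1)
    d = np.zeros_like(deg)
    nz = deg > 0
    d[nz] = deg[nz] ** -0.5  # isolated nodes keep a zero row/column
    return d[:, None] * A * d[None, :]

def propagate(A, X, L=2, weights=None):
    """Z = sum_l w_l P^l X; defaults match the paper's setup (w_2 = 1, others 0)."""
    if weights is None:
        weights = [1.0 if l == L else 0.0 for l in range(L + 1)]
    P = sym_norm(A)
    Z = weights[0] * X
    PX = X
    for l in range(1, L + 1):
        PX = P @ PX
        Z = Z + weights[l] * PX
    return Z

# Tiny triangle graph with 2-dimensional features.
A = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
X = np.eye(3)[:, :2]
Z = propagate(A, X)  # equals P^2 X under the default weights

# Naive edge unlearning: delete edge (0, 1) and re-propagate from scratch.
# This full recomputation is the "re-propagation" baseline the paper reports
# (1.89 hours on ogbn-papers100M); ScaleGUN instead updates the cached
# embeddings incrementally, which is why its 5-second update is feasible.
A_removed = A.copy()
A_removed[0, 1] = A_removed[1, 0] = 0.0
Z_removed = propagate(A_removed, X)
```

Deleting the edge and re-running `propagate` makes the cost of naive unlearning concrete: every removal request pays the full O(L · m · F) propagation again, which is exactly the barrier the incremental embedding update in Algorithms 3 and 4 is designed to avoid.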