Scalable and Certifiable Graph Unlearning: Overcoming the Approximation Error Barrier

Authors: Lu Yi, Zhewei Wei

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on real-world datasets demonstrate the efficiency and unlearning efficacy of ScaleGUN. Remarkably, ScaleGUN accomplishes (ϵ, δ) = (1, 10⁻⁴) certified unlearning on the billion-edge graph ogbn-papers100M in 20 seconds for a 5,000-random-edge removal request, of which only 5 seconds are required for updating the node embeddings, compared to 1.91 hours for retraining and 1.89 hours for re-propagation.
Researcher Affiliation Academia Lu Yi & Zhewei Wei, Renmin University of China
Pseudocode Yes Algorithm 1: ScaleGUN. Input: graph data D = (X, Y, A), sequence of removal requests R = {R₁, …, R_k}, loss function l, parameters L, {wℓ}, r_max, α, γ₂, ϵ, δ. Algorithm 2: BasicProp. Input: graph G, level L, threshold r_max, and initialized {q(ℓ), r(ℓ)}, 0 ≤ ℓ ≤ L. Output: new estimated vectors {q(ℓ), r(ℓ)}. Algorithm 3: UpdateEmbeddingMatrix (Edge Removal). Input: graph G, level L, edge e = (u, v) to be removed, threshold r_max, {wℓ}, and {q(ℓ), r(ℓ)}, 0 ≤ ℓ ≤ L. Output: Ẑ ∈ ℝⁿ. Algorithm 4: BatchUpdate (Edge Removal). Input: graph G, level L, edges to be unlearned S = {e₁, …, e_k}, weight coefficients wℓ, threshold r_max, {q(ℓ), r(ℓ)}, 0 ≤ ℓ ≤ L. Output: Ẑ ∈ ℝⁿ.
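The algorithm summaries above suggest a lazy, forward-push style propagation: each level ℓ keeps a reserve vector q(ℓ) and a residue vector r(ℓ), and residue mass above the threshold r_max is pushed one hop through the normalized adjacency. The sketch below illustrates that general scheme only; the function name, data layout, and push rule are assumptions, not the paper's exact Algorithm 2.

```python
import numpy as np

def basic_prop(adj, L, r_max, w, x):
    """Illustrative forward-push propagation (hypothetical sketch).

    adj: adjacency lists; L: number of levels; r_max: push threshold;
    w: per-level weights {w_l}; x: initial signal (one feature column).
    Maintains reserve q[l] and residue r[l] per level, as in the
    {q(l), r(l)} vectors named in the pseudocode summary.
    """
    n = len(adj)
    deg = np.array([max(len(nb), 1) for nb in adj], dtype=float)
    q = [np.zeros(n) for _ in range(L + 1)]
    r = [np.zeros(n) for _ in range(L + 1)]
    r[0] = np.asarray(x, dtype=float).copy()
    for level in range(L + 1):
        for u in range(n):
            # Push only entries whose residue exceeds the degree-scaled threshold.
            if abs(r[level][u]) > r_max * deg[u]:
                q[level][u] += r[level][u]
                if level < L:  # propagate one more hop, symmetric normalization
                    for v in adj[u]:
                        r[level + 1][v] += r[level][u] / np.sqrt(deg[u] * deg[v])
                r[level][u] = 0.0
    # Estimated embedding: weighted sum of reserves plus leftover residues.
    return sum(w[l] * (q[l] + r[l]) for l in range(L + 1))
```

With r_max = 0 every entry is pushed, and the result reduces to the exact weighted propagation Σℓ wℓ Pℓ x with P = D^(−1/2) A D^(−1/2); a positive r_max trades accuracy for fewer pushes.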
Open Source Code Yes Our code is available at https://github.com/luyi256/ScaleGUN.
Open Datasets Yes In this section, we evaluate the performance of ScaleGUN on real-world datasets, including three small graph datasets: Cora (Sen et al., 2008), Citeseer (Yang et al., 2016), and Photo (McAuley et al., 2015); as well as three large graph datasets: ogbn-arxiv, ogbn-products, and ogbn-papers100M (Hu et al., 2020). Licenses of the datasets: ogbn-arxiv and ogbn-papers100M are both under the ODC-BY license. Cora, Citeseer, Photo, and ogbn-products are under the CC0 1.0 license, Creative Commons license, Open Data Commons Attribution License, and Amazon license, respectively.
Dataset Splits Yes The public splits are used for all datasets. Table 6: Statistics of datasets.
Dataset | n | m | |C| | F | train/val/test
Cora | 2,708 | 5,278 | 7 | 1,433 | 1,208/500/1,000
Citeseer | 3,327 | 4,552 | 6 | 3,703 | 1,827/500/1,000
Photo | 7,650 | 119,081 | 8 | 745 | 6,150/500/1,000
ogbn-arxiv | 169,343 | 1,166,243 | 40 | 128 | 90,941/29,799/48,603
ogbn-products | 2,449,029 | 61,859,140 | 47 | 128 | 196,615/39,323/2,213,091
ogbn-papers100M | 111,059,956 | 1,615,685,872 | 172 | 128 | 1,207,179/125,265/125,265
Hardware Specification Yes We conduct all experiments on a machine with an Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz, 1007GB memory, and a Quadro RTX 8000 GPU, running Linux.
Software Dependencies No The paper mentions LBFGS and Adam as optimizers, but does not provide specific version numbers for any software dependencies, libraries, or programming languages.
Experiment Setup Yes Unless otherwise stated, we set L = 2 and ϵ = 1 for all experiments, averaging results across 5 trials with random seeds. By default, we set δ to 1/#edges for edge unlearning and 1/#nodes for node/feature unlearning in the linear model utility experiments to meet the typical privacy requirements (Chien et al., 2024; Sajadmanesh et al., 2023). Following (Chien et al., 2022a), wℓ is set to 1 for ℓ = 2 and 0 for other values. The propagation matrix is set to P = D^(−1/2) A D^(−1/2) across all the methods for fair comparison. Table 7: Parameters used in the linear model experiments. Dataset, rmax, λ, α. Table 8: Parameters used in the deep model experiments. Dataset, λ, α, learning rate, hidden size, batch size.
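The symmetrically normalized propagation matrix used across all methods can be formed as below. This is a minimal sketch assuming a dense adjacency matrix; the function name and handling of isolated nodes are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def sym_norm_adj(A):
    """Symmetric normalization P = D^(-1/2) A D^(-1/2) of a dense
    adjacency matrix A; isolated nodes get a zero row/column."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5  # D^(-1/2) on non-isolated nodes
    # Broadcasting applies the scaling on both sides of A.
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
```

Setting wℓ = 1 for ℓ = 2 and 0 otherwise then makes the linear-model features simply P²X, i.e., a two-hop smoothed feature matrix.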