SEMU: Singular Value Decomposition for Efficient Machine Unlearning
Authors: Marcin Sendera, Łukasz Struski, Kamil Książek, Kryspin Musiol, Jacek Tabor, Dawid Damian Rymarczyk
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that SEMU achieves competitive performance while significantly improving efficiency in terms of both data usage and the number of modified parameters. ... 5. Experimental setup. ... Specifically, we focus on two scenarios of random data forgetting: 10% and 50% of the training data. These experiments are performed on widely used datasets, namely CIFAR-10 and CIFAR-100, and employ popular deep learning architectures such as ResNet-18 and VGG-16. |
| Researcher Affiliation | Collaboration | 1Faculty of Mathematics and Computer Science, Jagiellonian University 2Institute of Theoretical and Applied Informatics, Polish Academy of Sciences 3Ardigen SA. |
| Pseudocode | Yes | D. Algorithms for SEMU unlearning. Here, we present the pseudocode for each algorithm used in SEMU. The section is structured as follows: first, we present the general procedure for selecting weights in SEMU, Alg. 1. Then, we introduce using SEMU in a classification setting (2) and in the image generation one (3). Algorithm 1: Pseudocode of the SEMU weight-selection procedure. Algorithm 2: Pseudocode of SEMU in classification tasks. Algorithm 3: Pseudocode of SEMU in generation tasks. |
| Open Source Code | Yes | To ensure reproducibility of our work, we make the code publicly available at (link). ... Our code for SEMU and the benchmarks studied is made public at https://github.com/gmum/semu as a base for future work on machine unlearning. |
| Open Datasets | Yes | Specifically, we focus on two scenarios of random data forgetting: 10% and 50% of the training data. These experiments are performed on widely used datasets, namely CIFAR-10 and CIFAR-100... We evaluate SEMU on image generation tasks in both class and concept unlearning settings. Our experiments cover two diffusion model architectures: DDPM and Stable Diffusion, applied to CIFAR-10 and Imagenette datasets. |
| Dataset Splits | Yes | Specifically, we focus on two scenarios of random data forgetting: 10% and 50% of the training data. ... The setting depends on how the forgetting dataset Df is constructed. In the first scenario, the goal is to remove the influence of randomly selected data points from the training set... We define the forgetting dataset as Df ⊂ D, with the remaining dataset being its complement, Dr = D \ Df. ... For DDPM, we attempt to unlearn the "airplane" class from CIFAR-10. |
| Hardware Specification | No | We gratefully acknowledge Polish high-performance computing infrastructure PLGrid (HPC Center: ACK Cyfronet AGH) for providing computer facilities and support within computational grant no. PLG/2023/016302. Some experiments were performed on servers purchased with funds from the Priority Research Area (Artificial Intelligence Computing Center Core Facility) under the Strategic Programme Excellence Initiative at Jagiellonian University. |
| Software Dependencies | No | The paper mentions popular deep learning architectures such as ResNet-18 and VGG-16, and diffusion models like DDPMs and Stable Diffusion, but does not specify the versions of any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | Within experiments we run a grid search to find the best parameter γ ∈ [60%, 95%] and report the best performing model. ... DDPM: γ ∈ [0.9, 0.95] for all layers; Stable Diffusion: γ = 1.0 for cross-attention layers, and γ ∈ [0.9, 0.95] for all other layers. ... To finetune the set of parameters {R_i}_{i=1}^{L} for unlearning, we follow the random-labeling unlearning losses proposed by SalUn. Specifically, we use the classification loss Lc... For the generation task, we apply the generation loss Lg... In the case where the remaining dataset is unavailable, the situation is equivalent to setting α = 0 and β = 0. ... Hyper-parameters: learning rate η, explanation parameter γ, forgetting loss function ℓ, and number of epochs E. |
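The explanation parameter γ above controls how much of each layer's singular spectrum SEMU retains when choosing which weights to modify. A minimal sketch of an SVD-based weight-selection step in this spirit (the function name, the cumulative-energy criterion, and the NumPy implementation are illustrative assumptions, not the authors' released code):

```python
import numpy as np

def select_update_rank(weight, gamma=0.9):
    """Hypothetical sketch of SEMU-style weight selection.

    Keeps the smallest number r of singular directions whose
    cumulative squared singular values cover a gamma fraction of
    the layer's spectral energy (gamma plays the role of the
    paper's 'explanation parameter').
    """
    # Thin SVD of the layer's weight matrix: weight = U @ diag(S) @ Vt.
    U, S, Vt = np.linalg.svd(weight, full_matrices=False)
    # Fraction of spectral energy captured by the top-k directions.
    energy = np.cumsum(S**2) / np.sum(S**2)
    # Smallest rank r whose leading directions cover gamma of the energy.
    r = int(np.searchsorted(energy, gamma)) + 1
    # During unlearning, only a small r-by-r core would be fine-tuned,
    # while U[:, :r] and Vt[:r] stay frozen -- so the number of
    # modified parameters scales with r*r rather than weight.size.
    return U[:, :r], S[:r], Vt[:r, :]
```

On an exactly low-rank layer the truncated factors reconstruct the weight matrix perfectly; on real layers, larger γ keeps more directions and therefore trains more parameters, matching the table's note that DDPM used γ ∈ [0.9, 0.95] while Stable Diffusion's cross-attention layers used γ = 1.0 (full rank).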