SEMU: Singular Value Decomposition for Efficient Machine Unlearning
Authors: Marcin Sendera, Łukasz Struski, Kamil Książek, Kryspin Musiol, Jacek Tabor, Dawid Damian Rymarczyk
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that SEMU achieves competitive performance while significantly improving efficiency in terms of both data usage and the number of modified parameters. ... 5. Experimental setup. ... Specifically, we focus on two scenarios of random data forgetting: 10% and 50% of the training data. These experiments are performed on widely used datasets, namely CIFAR-10 and CIFAR-100, and employ popular deep learning architectures such as ResNet-18 and VGG-16. |
| Researcher Affiliation | Collaboration | 1Faculty of Mathematics and Computer Science, Jagiellonian University 2Institute of Theoretical and Applied Informatics, Polish Academy of Sciences 3Ardigen SA. |
| Pseudocode | Yes | D. Algorithms for SEMU unlearning. Here, we present the pseudocode for each algorithm used in SEMU. The section is structured as follows: first, we present the general procedure for selecting weights in SEMU, Alg. 1. Then, we introduce using SEMU in a classification setting (2) and in the image generation one (3). Algorithm 1: Pseudocode of the SEMU weight-selection procedure. Algorithm 2: Pseudocode of SEMU in classification tasks. Algorithm 3: Pseudocode of SEMU in generation tasks. |
| Open Source Code | Yes | To ensure reproducibility of our work, we make the code publicly available at (link). ... Our code for SEMU and the benchmarks studied is made public at https://github.com/gmum/semu as a base for future work on machine unlearning. |
| Open Datasets | Yes | Specifically, we focus on two scenarios of random data forgetting: 10% and 50% of the training data. These experiments are performed on widely used datasets, namely CIFAR-10 and CIFAR-100... We evaluate SEMU on image generation tasks in both class and concept unlearning settings. Our experiments cover two diffusion model architectures: DDPM and Stable Diffusion, applied to CIFAR-10 and Imagenette datasets. |
| Dataset Splits | Yes | Specifically, we focus on two scenarios of random data forgetting: 10% and 50% of the training data. ... The setting depends on how the forgetting dataset Df is constructed. In the first scenario, the goal is to remove the influence of randomly selected data points from the training set... We define the forgetting dataset as Df ⊂ D, with the remaining dataset being its complement, Dr = D \ Df. ... For DDPM, we attempt to unlearn the "airplane" class from CIFAR-10. |
| Hardware Specification | No | We gratefully acknowledge Polish high-performance computing infrastructure PLGrid (HPC Center: ACK Cyfronet AGH) for providing computer facilities and support within computational grant no. PLG/2023/016302. Some experiments were performed on servers purchased with funds from the Priority Research Area (Artificial Intelligence Computing Center Core Facility) under the Strategic Programme Excellence Initiative at Jagiellonian University. |
| Software Dependencies | No | The paper mentions popular deep learning architectures such as ResNet-18 and VGG-16, and diffusion models like DDPMs and Stable Diffusion, but does not specify the versions of any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | Within experiments we run a grid search to find the best parameter γ ∈ [60%, 95%] and report the best performing model. ... DDPM: γ ∈ [0.9, 0.95] for all layers; Stable Diffusion: γ = 1.0 for cross-attention layers, and γ ∈ [0.9, 0.95] for all other layers. ... To finetune the set of parameters {R_i}_{i=1}^{L} for unlearning, we follow the random-labeling unlearning losses proposed by SalUn. Specifically, we use the classification loss Lc... For the generation task, we apply the generation loss Lg... In the case where the remaining dataset is unavailable, the situation is equivalent to setting α = 0 and β = 0. ... Hyper-parameters: learning rate η, explanation parameter γ, forgetting loss function ℓ, and number of epochs E. |
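The explanation parameter γ above controls how much of each layer's singular spectrum SEMU retains when choosing which weights to modify. A minimal sketch of an SVD-based weight-selection step in this spirit (the function name, the cumulative-energy criterion, and the NumPy implementation are illustrative assumptions, not the authors' released code):

```python
import numpy as np

def select_update_rank(weight, gamma=0.9):
    """Hypothetical sketch of SEMU-style weight selection.

    Keeps the smallest number r of singular directions whose
    cumulative squared singular values cover a gamma fraction of
    the layer's spectral energy (gamma plays the role of the
    paper's 'explanation parameter').
    """
    # Thin SVD of the layer's weight matrix: weight = U @ diag(S) @ Vt.
    U, S, Vt = np.linalg.svd(weight, full_matrices=False)
    # Fraction of spectral energy captured by the top-k directions.
    energy = np.cumsum(S**2) / np.sum(S**2)
    # Smallest rank r whose leading directions cover gamma of the energy.
    r = int(np.searchsorted(energy, gamma)) + 1
    # During unlearning, only a small r-by-r core would be fine-tuned,
    # while U[:, :r] and Vt[:r] stay frozen -- so the number of
    # modified parameters scales with r*r rather than weight.size.
    return U[:, :r], S[:r], Vt[:r, :]
```

On an exactly low-rank layer the truncated factors reconstruct the weight matrix perfectly; on real layers, larger γ keeps more directions and therefore trains more parameters, matching the table's note that DDPM used γ ∈ [0.9, 0.95] while Stable Diffusion's cross-attention layers used γ = 1.0 (full rank).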