UnSTAR: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs
Authors: Yash Sinha, Murari Mandal, Mohan Kankanhalli
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3 Experiments and Results … 3.1 Experiments … Experimental Setup. We use the identical experimental settings as in the case of RWHP (Liu et al. (2024a)) using the Wikipedia Person Unlearn (WPU) dataset. The LLM must unlearn multiple individuals simultaneously, capturing the nuances of both forgetting and retaining relevant knowledge. … Datasets. … Metrics. We utilize multiple metrics to assess the performance of the model across various dimensions. … Baselines. We evaluate our method against eight baselines. … Models and Implementation. … 3.2 Results. |
| Researcher Affiliation | Academia | Yash Sinha, School of Computing, National University of Singapore; Murari Mandal, RespAI Lab, School of Computer Engineering, KIIT Bhubaneswar, India; Mohan Kankanhalli, School of Computing, National University of Singapore |
| Pseudocode | Yes | Algorithm 1: UnSTAR: This algorithm outlines how to generate anti-samples from the forget set and fine-tune the model while preserving knowledge from the retain set. |
| Open Source Code | Yes | Source code: https://github.com/MachineUnlearn/UnStar |
| Open Datasets | Yes | We use the identical experimental settings as in the case of RWHP (Liu et al. (2024a)) using the Wikipedia Person Unlearn (WPU) dataset. Similar to WPU, the Peter Parker forgetting dataset is constructed using GPT-4-turbo and GPT-3.5-turbo, as presented in Opt-Out (Choi et al. (2025)). The TOFU dataset (Maini et al. (2024)) contains QA pairs about fictitious authors. |
| Dataset Splits | Yes | The WPU dataset includes a diverse set of individuals designated as unlearning targets, along with their associated documents and test data in a free-response question-answering (QA) format. This setup assesses three distinct knowledge types. ❶ Forget QA (FQA): These questions target the unlearning subjects with answers sourced from the unlearning documents. … ❷ Hard-retain QA (HRQA): … ❸ General-retain QA (GRQA): … The dataset includes 100 examples for the forgetting set Df and 300 examples for the retaining set Dr, generated using a diverse set of prompts. The TOFU dataset … is also divided into retain and forget sets. The detailed statistics are presented in Table 3. |
| Hardware Specification | Yes | All experiments were conducted on an Apple M3 Pro chip with 18 GB of unified memory. |
| Software Dependencies | No | We evaluate our approach using the Mistral 7B Instruct v0.3 model, a compact yet powerful language model fine-tuned for instruction-based tasks. We fine-tune the Mistral 7B model using LoRA (Low-Rank Adaptation) via the mlx-lm library. |
| Experiment Setup | Yes | For UnSTAR, we run over multiple iterations. In each iteration, 20 paraphrased questions and incorrect answers are generated. Semantically divergent questions and near-correct incorrect answers are filtered out. Misleading justifications are generated for the retained questions, and the model is fine-tuned for 10 epochs. Iterations continue until the target is unlearned. For WPU and Peter Parker, the training hyperparameters are shown in Table 4. |
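The "Experiment Setup" row above describes UnSTAR's per-iteration filtering of candidate anti-samples: paraphrased questions that drift semantically from the original are dropped, as are incorrect answers that sit too close to the true answer. A minimal, runnable sketch of that filtering stage follows; it uses token-level Jaccard similarity as a crude stand-in for the paper's semantic checks, and all helper names, thresholds, and toy data are hypothetical, not taken from the UnSTAR implementation:

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity: a simple stand-in for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def filter_anti_samples(question, correct_answer, candidates,
                        min_q_sim=0.5, max_a_sim=0.5):
    """Keep (paraphrase, wrong_answer) pairs where the paraphrase stays on-topic
    (not semantically divergent) and the wrong answer is not near-correct."""
    kept = []
    for para_q, wrong_a in candidates:
        if jaccard(para_q, question) < min_q_sim:
            continue  # drop semantically divergent paraphrase
        if jaccard(wrong_a, correct_answer) > max_a_sim:
            continue  # drop near-correct incorrect answer
        kept.append((para_q, wrong_a))
    return kept

# Toy demonstration (in the paper, 20 candidates per iteration are generated by an LLM)
question = "where was the target person born"
correct = "born in paris france"
candidates = [
    ("where was the target person born exactly", "born in toronto canada"),  # kept
    ("what is the capital of mongolia", "born in toronto canada"),           # divergent question
    ("where was the target person born", "born in paris france in 1970"),    # near-correct answer
]
kept = filter_anti_samples(question, correct, candidates)
print(len(kept))  # 1
```

The surviving pairs would then be augmented with misleading justifications and used to fine-tune the model (10 epochs per iteration in the reported setup), repeating until the unlearning target is forgotten.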