Towards Robust Influence Functions with Flat Validation Minima
Authors: Xichen Ye, Yifan Wu, Weizhong Zhang, Cheng Jin, Yifan Chen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results across various tasks validate the superiority of our approach. |
| Researcher Affiliation | Academia | 1Fudan University 2Shanghai Key Laboratory of Intelligent Information Processing 3Innovation Center of Calligraphy and Painting Creation Technology, MCT, China 4Hong Kong Baptist University. Correspondence to: Weizhong Zhang <EMAIL>, Yifan Chen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Influence Function in Flat Validation Minima |
| Open Source Code | Yes | We release the code at: https://github.com/Virusdol/IF-FVM. |
| Open Datasets | Yes | Our evaluation is conducted on the CIFAR-10N and CIFAR-100N datasets (Wei et al., 2022), which are real-world noisy label variants of the CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009). We apply LoRA (Hu et al., 2022) to Llama-2-13B-chat (Touvron et al., 2023). We use Google's DreamBooth dataset (Ruiz et al., 2023) |
| Dataset Splits | Yes | Each dataset contains 10 distinct classes, with 100 total data points in each class. We partitioned the 100 examples into 90 training data points (used for Lo RA) and 10 validation data points for influence estimation. For style generation, we combine three publicly available image-text pair datasets... We use 200 training image-text pairs and 50 validation image-text pairs, resulting in a total of 600 training data points and 150 validation data points. For each subject, 3 data points are used for the training dataset and 1 to 3 data points are used for the validation dataset. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware models (e.g., GPU/CPU models, processors, or memory details) used for running the experiments. |
| Software Dependencies | No | We apply LoRA (Hu et al., 2022) to every query and value matrix of the attention layer in the Llama-2-13B-chat model (Touvron et al., 2023). The training was performed using the Hugging Face PEFT library (Mangrulkar et al., 2022). While these libraries are mentioned, specific version numbers are not provided. |
| Experiment Setup | Yes | The basic hyperparameter settings are listed as follows: minibatch size (128), optimizer (SGD), initial learning rate (0.1), momentum (0.9), weight decay (0.0005), number of epochs (100), and learning rate decay (0.1 at 50 epochs). For LoRA hyperparameters, we set r = 8 and α = 32. For our proposed VM and FVM, to obtain θ, we tune the trained model on the validation set with the following hyperparameter settings: minibatch size (128), optimizer (SGD), initial learning rate (0.01), momentum (0.9), weight decay (0.0005), number of steps (1000), and learning rate decay (cosine). For FVM, SAM (Foret et al., 2021) is used as the flat minima solver, with the hyperparameter γ set to 0.05 for CIFAR-10N and 0.1 for CIFAR-100N, in accordance with the original paper. |
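The reported LoRA hyperparameters (r = 8, α = 32, applied to the query and value matrices) can be illustrated with a minimal NumPy sketch of the low-rank reparameterization from Hu et al. (2022). The dimension `d` and the toy forward pass are assumptions for illustration only; the paper's actual implementation uses the Hugging Face PEFT library on Llama-2-13B-chat.

```python
import numpy as np

# Minimal sketch of the LoRA update y = W x + (alpha / r) * B A x with the
# hyperparameters reported in the paper (r = 8, alpha = 32). W stands in for
# a frozen attention projection (e.g. a query matrix); d is an assumed toy size.
rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 32

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized

def lora_forward(x):
    # Frozen path plus the scaled low-rank correction.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d)
y = lora_forward(x)
```

Because `B` is zero-initialized, the adapted layer reproduces the frozen layer exactly before any fine-tuning, which is the standard LoRA initialization.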
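The role of SAM (Foret et al., 2021) as the flat minima solver can be sketched in a few lines: each update first ascends by γ along the normalized gradient direction, then descends using the gradient at the perturbed weights. The toy quadratic loss, learning rate, and starting point below are assumptions for illustration; only γ = 0.05 comes from the paper's CIFAR-10N setting.

```python
import numpy as np

# One SAM update on a toy loss L(theta) = 0.5 * ||theta||^2, whose gradient
# is simply theta. gamma plays the role of the neighborhood radius
# (0.05 for CIFAR-10N, 0.1 for CIFAR-100N in the paper).

def loss_grad(theta):
    # Gradient of the illustrative quadratic loss.
    return theta

def sam_step(theta, gamma=0.05, lr=0.1):
    g = loss_grad(theta)
    # Ascent step: move to the (approximately) worst point within radius gamma.
    eps = gamma * g / (np.linalg.norm(g) + 1e-12)
    # Descent step: apply the gradient evaluated at the perturbed weights.
    g_sam = loss_grad(theta + eps)
    return theta - lr * g_sam

theta = np.array([1.0, -2.0])
theta = sam_step(theta)
```

Minimizing the perturbed-weight gradient in this way biases the solver toward minima whose loss stays low in a γ-neighborhood, i.e. the flat validation minima the method's name refers to.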