Towards Robust Influence Functions with Flat Validation Minima

Authors: Xichen Ye, Yifan Wu, Weizhong Zhang, Cheng Jin, Yifan Chen

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results across various tasks validate the superiority of our approach."
Researcher Affiliation | Academia | "1Fudan University 2Shanghai Key Laboratory of Intelligent Information Processing 3Innovation Center of Calligraphy and Painting Creation Technology, MCT, China 4Hong Kong Baptist University. Correspondence to: Weizhong Zhang <EMAIL>, Yifan Chen <EMAIL>."
Pseudocode | Yes | Algorithm 1 "Influence Function in Flat Validation Minima"
Open Source Code | Yes | "We release the code at: https://github.com/Virusdol/IF-FVM."
Open Datasets | Yes | "Our evaluation is conducted on the CIFAR-10N and CIFAR-100N datasets (Wei et al., 2022), which are real-world noisy label variants of the CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009)." "We apply LoRA (Hu et al., 2022) to Llama-2-13B-chat (Touvron et al., 2023)." "We use Google's DreamBooth dataset (Ruiz et al., 2023)"
Dataset Splits | Yes | "Each dataset contains 10 distinct classes, with 100 total data points in each class. We partitioned the 100 examples into 90 training data points (used for LoRA) and 10 validation data points for influence estimation." "For style generation, we combine three publicly available image-text pair datasets... We use 200 training image-text pairs and 50 validation image-text pairs, resulting in a total of 600 training data points and 150 validation data points." "For each subject, 3 data points are used for the training dataset and 1 to 3 data points are used for the validation dataset."
Hardware Specification | No | The paper does not explicitly mention the specific hardware (e.g., GPU/CPU models, processors, or memory) used for running the experiments.
Software Dependencies | No | "We apply LoRA (Hu et al., 2022) to Llama-2-13B-chat (Touvron et al., 2023). We apply LoRA to every query and value matrix of the attention layer in the Llama-2-13B-chat model. The training was performed using the Hugging Face PEFT library (Mangrulkar et al., 2022)." While these libraries are mentioned, specific version numbers are not provided.
Experiment Setup | Yes | "The basic hyperparameter settings are listed as follows: minibatch size (128), optimizer (SGD), initial learning rate (0.1), momentum (0.9), weight decay (0.0005), number of epochs (100), and learning rate decay (0.1 at 50 epochs). For LoRA hyperparameters, we set r = 8 and α = 32. For our proposed VM and FVM, to obtain θ, we tune the trained model on the validation set with the following hyperparameter settings: minibatch size (128), optimizer (SGD), initial learning rate (0.01), momentum (0.9), weight decay (0.0005), number of steps (1000), and learning rate decay (cosine). For FVM, SAM (Foret et al., 2021) is used as the flat minima solver, with the hyperparameter γ set to 0.05 for CIFAR-10N and 0.1 for CIFAR-100N, in accordance with the original paper."
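The table above cites the paper's Algorithm 1 for influence estimation; the released repository is the authoritative reference for that algorithm. As background only, the classical influence-function quantity such methods build on (in the style of Koh & Liang, 2017) scores a training point by I(z, z_val) = -∇L(z_val)ᵀ H⁻¹ ∇L(z). Below is a minimal pure-Python sketch on a one-parameter least-squares model, where the Hessian reduces to a scalar; all names and data here are illustrative and not taken from the paper's code.

```python
# Toy influence-function sketch on a 1-parameter model f(x) = theta * x
# with squared loss 0.5 * (theta*x - y)^2, so the Hessian is a scalar.

def grad(theta, x, y):
    """Gradient of the squared loss w.r.t. theta for one (x, y) pair."""
    return (theta * x - y) * x

def hessian(theta, train):
    """Hessian of the mean training loss; for this model it is mean(x^2)."""
    return sum(x * x for x, _ in train) / len(train)

def influence(theta, z_train, z_val, train):
    """I(z, z_val) = -grad_val * H^{-1} * grad_train (all scalars here).
    A large positive score suggests up-weighting z_train raises val loss."""
    h = hessian(theta, train)
    return -grad(theta, *z_val) * grad(theta, *z_train) / h

# Two clean points on y = 2x and one off-trend point (illustrative data).
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 5.0)]
theta = 1.9                      # near the least-squares optimum
z_val = (2.0, 4.0)               # a clean validation point
scores = [influence(theta, z, z_val, train) for z in train]
```

In this toy run the off-trend training point (3.0, 5.0) receives the only positive score, i.e. it is flagged as harmful to the validation loss, which is the intuition behind using influence scores for noisy-label detection.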
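The setup row names SAM (Foret et al., 2021) as the flat-minima solver for FVM, with perturbation radius γ. As a hedged illustration of the two-step SAM update (ascend to a worst-case perturbation of norm γ, then descend using the gradient taken at that perturbed point), here is a toy scalar example; the quadratic objective, learning rate, and step count are illustrative choices, not the paper's configuration.

```python
# Minimal SAM-style update on a toy scalar objective with minimum at 3.0.
# gamma plays the role of the paper's γ (set to 0.05 / 0.1 for CIFAR-10N/100N).

def grad(theta):
    """Gradient of the toy objective (theta - 3)^2."""
    return 2.0 * (theta - 3.0)

def sam_step(theta, lr=0.01, gamma=0.05):
    """One SAM step: perturb along the gradient to the local worst case
    (norm gamma), then take a descent step using the gradient found there."""
    g = grad(theta)
    eps = gamma * g / (abs(g) + 1e-12)  # ascent direction, length gamma
    g_adv = grad(theta + eps)           # gradient at the perturbed point
    return theta - lr * g_adv

theta = 0.0
for _ in range(1000):
    theta = sam_step(theta)
# theta ends up hovering near the minimum at 3.0, within roughly gamma.
```

Note that near the minimum the perturbed gradient keeps flipping the update direction, so SAM settles into a small band around the optimum rather than converging exactly; this insensitivity to sharp curvature is the flatness property FVM exploits.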