Towards Robust Influence Functions with Flat Validation Minima
Authors: Xichen Ye, Yifan Wu, Weizhong Zhang, Cheng Jin, Yifan Chen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results across various tasks validate the superiority of our approach. |
| Researcher Affiliation | Academia | 1Fudan University 2Shanghai Key Laboratory of Intelligent Information Processing 3Innovation Center of Calligraphy and Painting Creation Technology, MCT, China 4Hong Kong Baptist University. Correspondence to: Weizhong Zhang <EMAIL>, Yifan Chen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Influence Function in Flat Validation Minima |
| Open Source Code | Yes | We release the code at: https://github.com/Virusdol/IF-FVM. |
| Open Datasets | Yes | Our evaluation is conducted on the CIFAR-10N and CIFAR-100N datasets (Wei et al., 2022), which are real-world noisy label variants of the CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009). We apply LoRA (Hu et al., 2022) to Llama-2-13B-chat (Touvron et al., 2023). We use Google's DreamBooth dataset (Ruiz et al., 2023) |
| Dataset Splits | Yes | Each dataset contains 10 distinct classes, with 100 total data points in each class. We partitioned the 100 examples into 90 training data points (used for Lo RA) and 10 validation data points for influence estimation. For style generation, we combine three publicly available image-text pair datasets... We use 200 training image-text pairs and 50 validation image-text pairs, resulting in a total of 600 training data points and 150 validation data points. For each subject, 3 data points are used for the training dataset and 1 to 3 data points are used for the validation dataset. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware models (e.g., GPU/CPU models, processors, or memory details) used for running the experiments. |
| Software Dependencies | No | We apply LoRA (Hu et al., 2022) to every query and value matrix of the attention layer in the Llama-2-13B-chat model (Touvron et al., 2023). The training was performed using the Hugging Face PEFT library (Mangrulkar et al., 2022). While these libraries are mentioned, specific version numbers are not provided. |
| Experiment Setup | Yes | The basic hyperparameter settings are listed as follows: minibatch size (128), optimizer (SGD), initial learning rate (0.1), momentum (0.9), weight decay (0.0005), number of epochs (100), and learning rate decay (0.1 at 50 epochs). For LoRA hyperparameters, we set r = 8 and α = 32. For our proposed VM and FVM, to obtain θ, we tune the trained model on the validation set with the following hyperparameter settings: minibatch size (128), optimizer (SGD), initial learning rate (0.01), momentum (0.9), weight decay (0.0005), number of steps (1000), and learning rate decay (cosine). For FVM, SAM (Foret et al., 2021) is used as the flat minima solver, with the hyperparameter γ set to 0.05 for CIFAR-10N and 0.1 for CIFAR-100N, in accordance with the original paper. |
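The reported LoRA hyperparameters (r = 8, α = 32, applied to the query and value matrices) can be illustrated with a minimal NumPy sketch of the low-rank reparameterization from Hu et al. (2022). The dimension `d` and the toy forward pass are assumptions for illustration only; the paper's actual implementation uses the Hugging Face PEFT library on Llama-2-13B-chat.

```python
import numpy as np

# Minimal sketch of the LoRA update y = W x + (alpha / r) * B A x with the
# hyperparameters reported in the paper (r = 8, alpha = 32). W stands in for
# a frozen attention projection (e.g. a query matrix); d is an assumed toy size.
rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 32

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized

def lora_forward(x):
    # Frozen path plus the scaled low-rank correction.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d)
y = lora_forward(x)
```

Because `B` is zero-initialized, the adapted layer reproduces the frozen layer exactly before any fine-tuning, which is the standard LoRA initialization.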
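The role of SAM (Foret et al., 2021) as the flat minima solver can be sketched in a few lines: each update first ascends by γ along the normalized gradient direction, then descends using the gradient at the perturbed weights. The toy quadratic loss, learning rate, and starting point below are assumptions for illustration; only γ = 0.05 comes from the paper's CIFAR-10N setting.

```python
import numpy as np

# One SAM update on a toy loss L(theta) = 0.5 * ||theta||^2, whose gradient
# is simply theta. gamma plays the role of the neighborhood radius
# (0.05 for CIFAR-10N, 0.1 for CIFAR-100N in the paper).

def loss_grad(theta):
    # Gradient of the illustrative quadratic loss.
    return theta

def sam_step(theta, gamma=0.05, lr=0.1):
    g = loss_grad(theta)
    # Ascent step: move to the (approximately) worst point within radius gamma.
    eps = gamma * g / (np.linalg.norm(g) + 1e-12)
    # Descent step: apply the gradient evaluated at the perturbed weights.
    g_sam = loss_grad(theta + eps)
    return theta - lr * g_sam

theta = np.array([1.0, -2.0])
theta = sam_step(theta)
```

Minimizing the perturbed-weight gradient in this way biases the solver toward minima whose loss stays low in a γ-neighborhood, i.e. the flat validation minima the method's name refers to.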