Benign Overfitting in Token Selection of Attention Mechanism

Authors: Keitaro Sakamoto, Issei Sato

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "Finally, we provide experiments to support our theoretical analysis using both synthetic and real-world datasets."
Researcher Affiliation | Academia | "Department of Computer Science, The University of Tokyo, Tokyo, Japan. Correspondence to: Keitaro Sakamoto <EMAIL>, Issei Sato <EMAIL>."
Pseudocode | No | The paper describes methods textually and mathematically but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | "The code is available on GitHub: https://github.com/keitaroskmt/benign-attention"
Open Datasets | Yes | "We further conducted real-world experiments on image and natural language datasets for classification. For each task, we used the pre-trained ViT (Dosovitskiy et al., 2021) and BERT (Devlin et al., 2018) models. ... 10-class image classification with MNIST (LeCun et al., 2010) and CIFAR-10 (Krizhevsky et al., 2009), anomaly detection in medical images with PneumoniaMNIST and BreastMNIST (Yang et al., 2023), topic classification of text with AG News (Zhang et al., 2015), and question type classification with TREC (Li & Roth, 2002)."
Dataset Splits | Yes | "Table 2 presents the training loss and test accuracy when varying the training size n. ... Training Size n: 20, 200, 1000 ... we used the pre-trained ViT (Dosovitskiy et al., 2021) and BERT (Devlin et al., 2018) models. ... We used datasets from various types: 10-class image classification with MNIST (LeCun et al., 2010) and CIFAR-10 (Krizhevsky et al., 2009), anomaly detection in medical images with PneumoniaMNIST and BreastMNIST (Yang et al., 2023), topic classification of text with AG News (Zhang et al., 2015), and question type classification with TREC (Li & Roth, 2002). For detailed descriptions of these datasets, please refer to Appendix F.2."
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU or CPU models) used for running the experiments.
Software Dependencies | No | "We prepared the pre-trained ViT (Dosovitskiy et al., 2021) and BERT (Devlin et al., 2018) models using the Hugging Face Transformers library (Wolf et al., 2020). ... During the experiments, the AdamW optimizer (Loshchilov & Hutter, 2019) without weight decay was used..." While software components like the Hugging Face Transformers library and the AdamW optimizer are mentioned, specific version numbers for these libraries or other key software dependencies are not provided, only citations to the papers introducing them.
Experiment Setup | Yes | "Specifically, we consider the setting with n = 20, T = 8, η = 0.2, ρ = 0.1 and α = 5e-3, changing the value of the dimension d and the signal size µ². ... During the experiments, the AdamW optimizer (Loshchilov & Hutter, 2019) without weight decay was used with a learning rate of 5e-5, along with linear warmup and learning rate decay."
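For reference, the learning-rate schedule quoted above (linear warmup followed by linear decay, peaking at 5e-5) can be sketched as a plain function. This is a minimal illustration, not the paper's code: the total step count and warmup length are hypothetical, since the paper does not state them.

```python
def linear_warmup_decay(step, total_steps, warmup_steps, base_lr=5e-5):
    """Learning rate at a given step: linear warmup from 0 to base_lr
    over warmup_steps, then linear decay back to 0 at total_steps.

    base_lr matches the 5e-5 peak reported in the paper; total_steps
    and warmup_steps are assumed values for illustration only.
    """
    if step < warmup_steps:
        # Warmup phase: ramp up proportionally to the current step.
        return base_lr * step / warmup_steps
    # Decay phase: ramp down linearly, clamped at zero past total_steps.
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

In practice the same shape is typically obtained by pairing an AdamW optimizer (with weight decay disabled, as the paper specifies) with a lambda-based scheduler in a deep learning framework; the function above only makes the schedule's arithmetic explicit.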