The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Authors: Shuai Li, Zhao Song, Yu Xia, Tong Yu, Tianyi Zhou
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present our numerical experiments to validate our theoretical results that, when training self-attention-only Transformers for softmax regression tasks, the models learned by gradient descent and by the Transformers show great similarity. |
| Researcher Affiliation | Collaboration | Shuai Li, Shanghai Jiao Tong University (EMAIL); Zhao Song, Simons Institute for the Theory of Computing, UC Berkeley (EMAIL); Yu Xia, University of California, San Diego (EMAIL); Tong Yu, Adobe Research (EMAIL); Tianyi Zhou, University of Southern California (EMAIL) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The data and code are planned to be released upon acceptance and approval. |
| Open Datasets | No | According to Definition 1.3, we construct the synthetic softmax regression tasks consisting of randomly sampled length-n documents A ∈ ℝ^{n×d}, where each word has a d-dimensional embedding, and targets b ∈ ℝ^n. Each document is generated from a unique random seed. The paper does not provide concrete access information (link, DOI, formal citation) for a publicly available or open dataset. (A hypothetical generation sketch appears after this table.) |
| Dataset Splits | No | To compare the trained single self-attention layer with a softmax unit against the softmax regression model trained with one-step gradient descent, we sample 10^3 tasks and record the losses of the two models. While a 'training set' of tasks is mentioned for selecting the learning rate, explicit train/validation/test splits of a dataset are not described in the usual sense required for reproducibility. |
| Hardware Specification | Yes | All experiments run on a single NVIDIA RTX2080Ti GPU with 10 independent repetitions. |
| Software Dependencies | No | The paper does not specify any software versions or library dependencies required for replication. |
| Experiment Setup | Yes | For the single self-attention layer with a softmax unit, we choose the learning rate ηSA = 0.005. For the softmax regression model, we determine the optimal learning rate ηGD by minimizing the ℓ2 regression loss over a training set of 10^3 tasks through line search. (A hypothetical line-search sketch appears after this table.) |
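
The Open Datasets row states only that tasks follow Definition 1.3: random length-n documents A ∈ ℝ^{n×d} and targets b ∈ ℝ^n, with one random seed per document. The paper does not spell out the sampling distributions here, so the following is a minimal sketch under assumed choices: Gaussian entries for A, a hidden coefficient vector `x_star`, and softmax-normalized targets. The function name `make_task` and the dimensions used are illustrative, not taken from the paper.

```python
import numpy as np

def make_task(n: int, d: int, seed: int):
    """Generate one synthetic softmax regression task (A, b).

    Assumptions (not specified in the report): entries of A are i.i.d.
    standard normal, and b is the softmax-normalized response
    exp(A x*) / <exp(A x*), 1_n> for a hidden coefficient vector x*.
    """
    rng = np.random.default_rng(seed)       # unique seed per document
    A = rng.standard_normal((n, d))         # length-n document, d-dim embeddings
    x_star = rng.standard_normal(d)         # hypothetical ground-truth weights
    u = np.exp(A @ x_star)
    b = u / u.sum()                         # targets b in R^n (sums to 1)
    return A, b
```

The Experiment Setup row says ηGD is chosen by line search, minimizing the ℓ2 regression loss over a training set of 10^3 tasks after one gradient step. The sketch below illustrates one plausible reading of that procedure; the zero initialization, the candidate grid of step sizes, and the loss 0.5·||softmax(Ax) − b||² are assumptions, and `make_task` is the hypothetical generator from the previous sketch.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                         # numerically stable softmax
    u = np.exp(z)
    return u / u.sum()

def loss_and_grad(A, b, x):
    """Squared-error softmax regression loss and its gradient in x."""
    f = softmax(A @ x)
    r = f - b
    loss = 0.5 * float(r @ r)
    # Softmax Jacobian is diag(f) - f f^T, so grad = A^T (diag(f) - f f^T) r.
    grad = A.T @ ((np.diag(f) - np.outer(f, f)) @ r)
    return loss, grad

def line_search_eta(tasks, etas):
    """Pick eta_GD minimizing the mean loss after one gradient step."""
    best_eta, best_loss = None, np.inf
    for eta in etas:
        total = 0.0
        for A, b in tasks:
            x0 = np.zeros(A.shape[1])       # assumed zero initialization
            _, g = loss_and_grad(A, b, x0)
            x1 = x0 - eta * g               # one-step gradient descent
            l1, _ = loss_and_grad(A, b, x1)
            total += l1
        mean_loss = total / len(tasks)
        if mean_loss < best_loss:
            best_loss, best_eta = mean_loss, eta
    return best_eta

# 10^3 training tasks, one random seed per document (dimensions are illustrative).
tasks = [make_task(n=20, d=16, seed=s) for s in range(1000)]
eta_gd = line_search_eta(tasks, etas=np.logspace(-3, 1, 20))
```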
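With ηGD fixed this way, the reported comparison samples 10^3 fresh tasks and records, per task, the loss of the trained self-attention layer (trained with ηSA = 0.005) and the loss of the one-step gradient-descent softmax regression model; the paper compares these two loss profiles to argue the models behave similarly.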