Transformer Learns Optimal Variable Selection in Group-Sparse Classification

Authors: Chenyang Zhang, Xuran Meng, Yuan Cao

ICLR 2025

Reproducibility assessment (variable, result, and LLM response):
Research Type: Experimental. "We conduct numerical experiments, empirically show that training loss will converge, and verify our conclusions regarding the optimization trajectories of trainable parameters. Specifically, the sparsity of the attention score matrix empirically demonstrates that one-layer transformers can effectively learn the optimal variable selection. Additionally, we transfer the pre-trained one-layer transformers to downstream tasks, and empirically show that it can achieve a good generalization performance with a small sample size. All these empirical observations back up our theoretical findings."
Researcher Affiliation: Academia. Chenyang Zhang, Xuran Meng, Yuan Cao; The University of Hong Kong; University of Michigan, Ann Arbor; EMAIL, EMAIL, EMAIL.
Pseudocode: No. The paper describes mathematical formulations and theoretical proofs but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a direct link to a code repository.
Open Datasets: Yes. "In this section, we conduct experiments using the CIFAR-10 dataset, where each image has a shape of 3x32x32, representing three color channels (RGB)."
Dataset Splits: Yes. "For this experiment, we select two labels, Frog and Airplane, and use 500 images from each label."
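The two-label subset described in this row can be sketched as a balanced index selection. This is a minimal sketch, not the authors' code: the labels array below is a synthetic stand-in for the real CIFAR-10 labels, and `balanced_subset` is a hypothetical helper. (In the standard torchvision label ordering, airplane is class 0 and frog is class 6.)

```python
import numpy as np

def balanced_subset(labels, class_a, class_b, per_class, seed=None):
    """Return indices of a balanced two-class subset, `per_class` samples each."""
    rng = np.random.default_rng(seed)
    idx_a = rng.choice(np.flatnonzero(labels == class_a), per_class, replace=False)
    idx_b = rng.choice(np.flatnonzero(labels == class_b), per_class, replace=False)
    return np.concatenate([idx_a, idx_b])

# Synthetic stand-in for CIFAR-10 test labels: 10 classes x 1000 samples.
labels = np.repeat(np.arange(10), 1000)
# Frog (6) vs. Airplane (0), 500 images per label as reported in the paper.
subset = balanced_subset(labels, class_a=6, class_b=0, per_class=500, seed=0)
```

The resulting `subset` holds 1000 indices, 500 per class, which can then be used to slice the image tensor for the binary classification experiment.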
Hardware Specification: No. The paper does not provide specific details about the hardware used (e.g., GPU models, CPU types) for running the experiments.
Software Dependencies: No. The paper does not specify any software dependencies with version numbers used for the experiments.
Experiment Setup: Yes. "We set the learning rate η = 0.5 and train the models for 400 iterations. ... Both experiments use a sample size of 400, and the learning rate is set to 10^-3. ... The transformer model is initialized to 0, and we train it using a batch size of 64 and a learning rate of 10^-3."
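The first reported setting (zero initialization, full-batch gradient descent with η = 0.5 for 400 iterations, sample size 400) can be sketched as a minimal training loop. This is an illustrative assumption, not the paper's code: a logistic model on synthetic data stands in for the one-layer transformer and its task.

```python
import numpy as np

# Hyperparameters as reported: zero initialization, lr = 0.5, 400 iterations.
eta, iters = 0.5, 400
rng = np.random.default_rng(0)

# Synthetic stand-in data: 400 samples (the reported sample size), 20 features.
X = rng.standard_normal((400, 20))
y = np.sign(X @ rng.standard_normal(20))  # binary labels in {-1, +1}

w = np.zeros(20)  # zero initialization, as in the paper's setup

losses = []
for _ in range(iters):
    margins = -y * (X @ w)
    sig = 0.5 * (1.0 + np.tanh(0.5 * margins))  # numerically stable sigmoid(margins)
    w -= eta * (X.T @ (-y * sig)) / len(y)      # full-batch gradient step
    # Logistic loss, mean of log(1 + exp(-y * <x, w>)), via stable logaddexp.
    losses.append(np.mean(np.logaddexp(0.0, -y * (X @ w))))
```

On this toy problem the loss curve decreases toward zero, mirroring the convergence behavior the paper reports for its one-layer transformer.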