MLPs Learn In-Context on Regression and Classification Tasks

Authors: William Tong, Cengiz Pehlevan

ICLR 2025

Reproducibility assessment — each entry lists the variable, the result, and the supporting LLM response:
Research Type: Experimental
"We begin by exploring MLPs' behavior in a controlled ICL format, where their specific capacities and weaknesses can be precisely characterized. Specifically, we examine two tasks: in-context regression and in-context classification. Figure 1c plots the MSE achieved by different architectures as a function of total compute."
Researcher Affiliation: Academia
"William L. Tong & Cengiz Pehlevan, School of Engineering and Applied Sciences, Center for Brain Sciences, Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA 02138. {wtong@g,cpehlevan@seas}.harvard.edu"
Pseudocode: No
The paper describes the model architectures (MLP, MLP-Mixer, Transformer, RB MLP) using mathematical equations and descriptive text in Appendix C, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. For example, for MLPs it defines h_1(x) = ϕ(W_1 x + b_1), h_2(x) = ϕ(W_2 h_1(x) + b_2), ..., h_ℓ(x) = ϕ(W_ℓ h_{ℓ-1}(x) + b_ℓ), and f_MLP(x) = W_out h_ℓ(x) + b_out.
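The layer recursion quoted above maps directly onto code. A minimal NumPy sketch of the forward pass (the dimensions, random initialization, and function names here are illustrative placeholders, not the paper's actual Flax implementation):

```python
import numpy as np

def relu(x):
    # Pointwise ReLU, phi(x) = max(x, 0)
    return np.maximum(x, 0.0)

def mlp_forward(x, weights, biases, w_out, b_out):
    """Compute h_l(x) = phi(W_l h_{l-1}(x) + b_l) layer by layer,
    then the readout f_MLP(x) = W_out h_l(x) + b_out."""
    h = x
    for W, b in zip(weights, biases):
        h = relu(W @ h + b)
    return w_out @ h + b_out

# Illustrative shapes: input dim 8, two hidden layers of shared width H = 16.
rng = np.random.default_rng(0)
H, d = 16, 8
weights = [rng.normal(size=(H, d)), rng.normal(size=(H, H))]
biases = [np.zeros(H), np.zeros(H)]
w_out, b_out = rng.normal(size=(1, H)), np.zeros(1)

y = mlp_forward(rng.normal(size=d), weights, biases, w_out, b_out)
print(y.shape)  # (1,)
```

Note that all hidden layers share the same width H, matching the setup described in the Experiment Setup entry below.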
Open Source Code: Yes
"For the most precise information on our setup, please refer to our GitHub code repository: https://github.com/wtong98/mlp-icl"
Open Datasets: No
"We focus on controlled tasks commonly studied in the ICL literature... These tasks are necessarily synthetic approximations of natural language ICL prompting, but allow us to disambiguate a model's capacity for in-context learning from its ability to attain natural language fluency. Inputs are sampled as x ∼ N(0, I) and weights are sampled as β ∼ N(0, I/n)."
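The quoted sampling scheme for in-context regression is easy to mirror in code. A hedged sketch (the input dimension n, context length k, and function name are placeholders; the paper's generator lives in its repository):

```python
import numpy as np

def sample_regression_prompt(n=8, k=16, rng=None):
    """Sample one synthetic in-context regression prompt:
    inputs x_i ~ N(0, I_n), weights beta ~ N(0, I/n), targets y_i = beta . x_i."""
    rng = rng if rng is not None else np.random.default_rng()
    beta = rng.normal(scale=np.sqrt(1.0 / n), size=n)  # beta ~ N(0, I/n)
    xs = rng.normal(size=(k, n))                       # each x ~ N(0, I)
    ys = xs @ beta                                     # noiseless linear targets
    return xs, ys, beta

xs, ys, beta = sample_regression_prompt(rng=np.random.default_rng(0))
print(xs.shape, ys.shape)  # (16, 8) (16,)
```

The I/n scaling on β keeps the variance of each target y_i of order one regardless of the input dimension.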
Dataset Splits: No
"All training examples are presented online with batch size 128. During testing, we probe the model's performance both on the training distribution, where the weights are restricted to a finite pool β ∼ U{β_i}_{i=1}^{k}, and an unrestricted distribution, where the weights are drawn freely, β ∼ N(0, I/n)." The paper describes an online training regime with synthetic data generation rather than predefined train/test/validation splits over fixed datasets.
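The two evaluation regimes quoted here (a finite weight pool versus freely drawn weights) can be sketched as a single sampler; the pool size, dimension, and function name below are illustrative assumptions:

```python
import numpy as np

def draw_beta(pool=None, n=8, rng=None):
    """Draw task weights beta.
    With a finite pool {beta_i}: sample uniformly from it (restricted,
    training-like distribution). Without one: draw a fresh beta ~ N(0, I/n)
    (unrestricted test distribution)."""
    rng = rng if rng is not None else np.random.default_rng()
    if pool is not None:
        return pool[rng.integers(len(pool))]   # beta ~ U{beta_i}
    return rng.normal(scale=np.sqrt(1.0 / n), size=n)  # beta ~ N(0, I/n)

rng = np.random.default_rng(0)
n, k = 8, 4                                            # illustrative sizes
pool = rng.normal(scale=np.sqrt(1.0 / n), size=(k, n)) # fixed finite pool

beta_restricted = draw_beta(pool=pool, rng=rng)
beta_unrestricted = draw_beta(n=n, rng=rng)
```

Comparing a model on both distributions separates memorization of the finite pool from genuine in-context generalization to unseen weight vectors.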
Hardware Specification: Yes
"The per-experiment GPU time on an A100 to generate the above figures are estimated at..." (quote truncated in the extracted text)
Software Dependencies: No
"All models are implemented and trained using the Jax (Bradbury et al., 2018) family of libraries, particularly Flax (Heek et al., 2023). Plots are created using Seaborn (Waskom, 2021) and Pandas (pandas development team, 2020)." The paper lists software libraries but does not provide specific version numbers for them.
Experiment Setup: Yes
"For all tasks, we use ReLU activation functions applied pointwise: ϕ(x) = max(x, 0). Widths of all hidden layers are fixed to the same value H. As with all models, all training examples are presented online with batch size 128. Training uses AdamW (Loshchilov and Hutter, 2017) with learning rate α = 1 × 10^-4 and weight decay λ = 1 × 10^-4. The hyperparameters used to train MLPs on each task are presented in Table 1."
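For reference, the decoupled weight-decay update that AdamW applies can be written out in a few lines. This is a generic textbook sketch using the quoted hyperparameters (α = 1e-4, λ = 1e-4), not the paper's actual Jax/Optax training loop:

```python
import numpy as np

def adamw_step(theta, grad, m, v, t, lr=1e-4, wd=1e-4,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamW update: Adam moment estimates plus decoupled weight decay.
    Unlike L2-regularized Adam, the decay term wd * theta is applied
    directly to the parameters, outside the adaptive rescaling."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2       # second-moment estimate
    m_hat = m / (1 - beta1**t)                  # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * theta)
    return theta, m, v

theta = np.ones(3)
m = v = np.zeros(3)
grad = np.array([0.1, -0.2, 0.3])
theta, m, v = adamw_step(theta, grad, m, v, t=1)
```

A positive gradient component nudges its parameter down and a negative one nudges it up, with the weight-decay term pulling all parameters slightly toward zero.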