Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Teaching Arithmetic to Small Transformers

Authors: Nayoung Lee, Kartik Sreenivasan, Jason D. Lee, Kangwook Lee, Dimitris Papailiopoulos

ICLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This study investigates how even small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective. We first demonstrate that conventional training data is not the most effective for arithmetic learning, and simple formatting changes can significantly improve accuracy.
Researcher Affiliation | Academia | Nayoung Lee (University of Wisconsin-Madison, EMAIL); Kartik Sreenivasan (University of Wisconsin-Madison, EMAIL); Jason D. Lee (Princeton University, EMAIL); Kangwook Lee (University of Wisconsin-Madison, EMAIL); Dimitris Papailiopoulos (University of Wisconsin-Madison, EMAIL)
Pseudocode | Yes | We present the full pseudo-code in Algorithm 1.
Open Source Code | Yes | Our code is available at https://github.com/lee-ny/teaching_arithmetic
Open Datasets | Yes | For arithmetic tasks like addition, subtraction, and multiplication, we define the training dataset for a binary operator f(·) as D_train = {((a_i, b_i), y_i)}_{i=1}^{N}, where y_i = f(a_i, b_i). ... We use the Shakespeare dataset (Karpathy, 2015) that includes 1,115,394 tokens of text...
Dataset Splits | Yes | The learning rate is chosen from {1e-3, 5e-4, 1e-4, 5e-5} based on validation loss.
Hardware Specification | Yes | All of our experiments on NanoGPT and GPT-2 models are run using PyTorch 2.1 and CUDA 11.7 on NVIDIA 2080 Tis and NVIDIA 3090s.
Software Dependencies | Yes | All of our experiments on NanoGPT and GPT-2 models are run using PyTorch 2.1 and CUDA 11.7 on NVIDIA 2080 Tis and NVIDIA 3090s.
Experiment Setup | Yes | In this section, we provide a detailed description of our experimental setup, including the model architecture and an overview of the various data formatting and sampling techniques used. ... Table 16: Hyperparameters used for NanoGPT experiments on arithmetic tasks ... Table 17: Hyperparameters used for GPT-2 experiments on arithmetic tasks
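The "Open Datasets" row defines the training set for a binary operator f as pairs ((a_i, b_i), y_i) with y_i = f(a_i, b_i), rendered as text for next-token prediction, and the abstract notes that simple formatting changes (such as reversing the output digits) improve accuracy. The sketch below illustrates that construction for addition. The function name, the "a+b=c" string format, and the uniform sampling scheme are illustrative assumptions, not the paper's exact data pipeline.

```python
import random

def make_addition_dataset(n_samples, n_digits=3, reverse_output=False, seed=0):
    """Build addition samples as plain-text strings for next-token prediction.

    reverse_output=True writes the answer least-significant digit first,
    one example of the simple formatting changes the paper studies.
    (Exact delimiters and sampling are assumptions for illustration.)
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        a = rng.randrange(10 ** n_digits)  # operand a_i
        b = rng.randrange(10 ** n_digits)  # operand b_i
        y = str(a + b)                     # label y_i = f(a_i, b_i)
        if reverse_output:
            y = y[::-1]
        samples.append(f"{a}+{b}={y}")
    return samples
```

A model trained on such strings only ever sees characters; the hypothesis tested in the paper is that the order in which answer digits appear changes how easily the next-token objective can fit the carry logic.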