Linear algebra with transformers
Authors: François Charton
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, I investigate the capability of transformers to learn to perform numerical computations with high accuracy. I focus on nine problems of linear algebra, from basic operations on dense matrices to inversion, eigen and singular value decomposition. I show that small transformers can be trained, from examples only, to compute approximate solutions (up to a few percents of the L1 norm) with more than 90% accuracy (over 99% in most cases). I propose and discuss four encodings to represent real numbers, and train small sequence to sequence transformers (up to 6 layers, 10 to 50 million trainable parameters) from generated datasets of random matrices. I investigate different architectures, in particular asymmetric configurations where the encoder or decoder has only one layer. Finally, I show that the models are robust to noisy data, and that they can generalize out of their training distribution if special attention is paid to training data generation. |
| Researcher Affiliation | Industry | François Charton Meta AI EMAIL |
| Pseudocode | No | The paper describes the problems and methods used in prose and through experimental results tables, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code for the model and experiments is available at github.com/facebookresearch/LAWT. |
| Open Datasets | No | For each problem, the training data is generated by sampling random input matrices I (see section 2.2), and computing the output O with a linear algebra package (NumPy linalg). All coefficients in I and O are set in base ten floating-point representation, and rounded to three significant digits in the mantissa. |
| Dataset Splits | Yes | At the end of every epoch (300,000 examples), a random test set (10,000 examples) is generated and model accuracy is evaluated. A predicted sequence is a correct solution to the problem (I, O) (I and O the input and output matrices) if it can be decoded as a valid matrix P and approximates the correct solution to a given tolerance τ. |
| Hardware Specification | Yes | All models are trained on an internal cluster, using NVIDIA Volta GPU with 32GB memory. |
| Software Dependencies | No | The paper mentions using a "linear algebra package (NumPy linalg)" and that the models "run in Python" for comparison, but it does not specify version numbers for NumPy or Python or any other key software dependencies. |
| Experiment Setup | Yes | All models use the transformer architecture from Vaswani et al. (2017): an encoder and a decoder connected by cross-attention. Models have 512 dimensions, 8 attention heads and up to 6 layers (experiments with larger models can be found in Appendix D.3). Training is supervised, minimizes the cross-entropy between model predictions and correct solutions, and uses the Adam optimiser (Kingma & Ba, 2014) with a learning rate of 10⁻⁴, a linear warm-up phase of 10,000 steps and cosine scheduling (Loshchilov & Hutter, 2016). Training data is generated on the fly in batches of 64. |
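The quotes above state that matrix coefficients are written in base-ten floating point and rounded to three significant digits in the mantissa. A minimal sketch of such a sign/mantissa/exponent tokenization is shown below; the function name `encode_float` and the exact token strings are illustrative assumptions, not the paper's actual encoding tables (the paper proposes four encodings):

```python
from math import floor, log10

def encode_float(x, digits=3):
    """Tokenize a real number as [sign, mantissa, exponent] tokens in base
    ten, rounded to `digits` significant digits (hypothetical sketch)."""
    if x == 0:
        return ["+", "0", "E0"]
    sign = "+" if x > 0 else "-"
    e = floor(log10(abs(x)))                  # decimal exponent of x
    m = round(abs(x) / 10 ** e, digits - 1)   # mantissa, normally in [1, 10)
    mantissa = round(m * 10 ** (digits - 1))  # integer with `digits` digits
    if mantissa >= 10 ** digits:              # rounding pushed m up to 10.0
        mantissa //= 10
        e += 1
    return [sign, str(mantissa), f"E{e - (digits - 1)}"]
```

For example, `encode_float(3.14159)` yields `["+", "314", "E-2"]`, i.e. 314 × 10⁻², matching the three-significant-digit rounding described in the paper.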
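The evaluation criterion quoted above counts a prediction as correct if it decodes to a valid matrix that approximates the solution to a given tolerance τ. A sketch of such a check is below, treating matrices as flat coefficient lists; the relative-L1-error formula is an assumption based on the quoted description, not a verbatim reproduction of the paper's metric:

```python
def is_correct(pred, target, tau=0.05):
    """Return True if pred approximates target within relative L1 error tau.

    pred/target: flat lists of matrix coefficients. A prediction that does
    not decode to a matrix of the right shape is counted as incorrect.
    """
    if len(pred) != len(target):
        return False  # not a valid matrix of the expected shape
    err = sum(abs(p - t) for p, t in zip(pred, target))
    norm = sum(abs(t) for t in target)
    return err <= tau * norm
```

With τ = 5%, a prediction whose summed coefficient error is within 5% of the target's L1 norm would be accepted.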
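The training setup quoted above combines a base learning rate of 10⁻⁴, a 10,000-step linear warm-up, and cosine scheduling (Loshchilov & Hutter, 2016). A self-contained sketch of that schedule follows; `total_steps` is an assumption for illustration, since the paper does not fix a single training length here:

```python
import math

def learning_rate(step, base_lr=1e-4, warmup_steps=10_000, total_steps=300_000):
    """Linear warm-up to base_lr over warmup_steps, then cosine decay to 0.

    base_lr and warmup_steps come from the setup quoted above; total_steps
    is an assumed value for this sketch.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps          # linear ramp-up
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay
```

The rate rises linearly to 10⁻⁴ at step 10,000, then follows a half-cosine down to zero at `total_steps`.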