A Network That Learns Strassen Multiplication
Authors: Veit Elser
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train these networks with the conservative learning rule, which makes minimal changes to the weights so as to give the correct output for each input at the time the input-output pair is received. Conservative learning needs a few thousand examples to find the rank 7 decomposition of M2, and 10^5 for the rank 23 decomposition of M3 (the lowest known). High precision is critical, especially for M3, to discriminate between true decompositions and border approximations. We also present numerical experiments that are unique in the degree of precision required to establish results. |
| Researcher Affiliation | Academia | Veit Elser EMAIL Department of Physics Cornell Ithaca, NY 14853-2501, USA |
| Pseudocode | Yes | Appendix A. Conservative Learning Rules for the Strassen Network 1. Forward propagate (a, b) using (9) and (10) to get (ã, b̃, c̃ = ã ∘ b̃) and c. 2. From the output discrepancy c − c̃ compute γ using (18). Computing the scalar multiple of c − c̃ in this expression requires a single backward-forward propagation by the matrix G in (17). 3. The update ΔWc of Wc is the rank 1 matrix (12) constructed from γ and c̃. 4. Backward propagate γ by (14) to obtain α and β. 5. The updates (ΔWa, ΔWb) of (Wa, Wb) are given by the rank 1 matrices (7) constructed from (α, β) and (ã, b̃). |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, a link to a code repository, or information about code in supplementary materials. |
| Open Datasets | No | Our input matrices were simply constructed from uniform samples of [−1, 1] in each element of A and B. After rescaling to conform with our normalization convention Tr AᵀA = Tr BᵀB = 1, these together with C = AB were handed to the network for training. |
| Dataset Splits | No | Figure 2 shows the decomposition error ϵ converging linearly, after an initial training set of only a few thousand. We did not explore this dimension in our experiments with the Strassen network. The paper describes using "streams of random instances of correctly multiplied matrices" for training, but does not specify train/test/validation splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper does not provide any specific software dependencies or version numbers used in the experiments. |
| Experiment Setup | Yes | The weights were initialized with independent uniform samples from a symmetric interval about the origin. For simplicity we therefore chose to always initialize with samples drawn from [−1, 1]. Figure 5: Distribution of the final decomposition error of M3 in 10^3 runs, each with 10^8 training items and terminated when the error dropped below 10^−14. |
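The network architecture the pseudocode trains is a bilinear one: linear maps Wa and Wb into a hidden layer, an elementwise product, and a linear map Wc out. The following pure-Python sketch shows a forward pass through such a network with the weights set to Strassen's classic rank-7 decomposition — one exact solution (up to symmetries) that the paper's conservative learning rule converges to; the weights are not taken from the paper's learned runs, and the helper names are our own.

```python
# Forward pass of the Strassen network: c = Wc ((Wa a) ∘ (Wb b)), where a, b
# are the flattened 2x2 input matrices and ∘ is the elementwise product.
# Weights below are Strassen's standard rank-7 decomposition (an assumption
# for illustration, not the paper's learned weights).

# Rows index the 7 hidden units; columns index (a11, a12, a21, a22).
Wa = [[1, 0, 0, 1],
      [0, 0, 1, 1],
      [1, 0, 0, 0],
      [0, 0, 0, 1],
      [1, 1, 0, 0],
      [-1, 0, 1, 0],
      [0, 1, 0, -1]]
# Rows index the 7 hidden units; columns index (b11, b12, b21, b22).
Wb = [[1, 0, 0, 1],
      [1, 0, 0, 0],
      [0, 1, 0, -1],
      [-1, 0, 1, 0],
      [0, 0, 0, 1],
      [1, 1, 0, 0],
      [0, 0, 1, 1]]
# Rows index (c11, c12, c21, c22); columns index the 7 hidden units.
Wc = [[1, 0, 0, 1, -1, 0, 1],
      [0, 0, 1, 0, 1, 0, 0],
      [0, 1, 0, 1, 0, 0, 0],
      [1, -1, 1, 0, 0, 1, 0]]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def strassen_network(a, b):
    """Forward pass: 7 scalar multiplications instead of the naive 8."""
    a_tilde = matvec(Wa, a)                       # hidden activations from A
    b_tilde = matvec(Wb, b)                       # hidden activations from B
    m = [x * y for x, y in zip(a_tilde, b_tilde)] # the 7 Strassen products
    return matvec(Wc, m)                          # flattened product C = AB

a = [1, 2, 3, 4]   # A = [[1, 2], [3, 4]]
b = [5, 6, 7, 8]   # B = [[5, 6], [7, 8]]
print(strassen_network(a, b))  # [19, 22, 43, 50] == flattened A @ B
```

Only the count of multiplications in the hidden layer matters here — applied recursively to block matrices, the rank-7 decomposition gives the sub-cubic Strassen algorithm.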
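The training data described in the Open Datasets row is straightforward to regenerate: sample A and B uniformly from [−1, 1] per element, rescale so Tr AᵀA = Tr BᵀB = 1, and pair them with the exact product C = AB. A minimal sketch, assuming only the quoted setup (function names are our own):

```python
import math
import random

def training_item(n, rng=random.Random(0)):
    """One training triple (A, B, C = AB) per the paper's quoted setup:
    elements uniform on [-1, 1], then rescaled so Tr(A^T A) = Tr(B^T B) = 1."""
    def sample():
        M = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
        # Tr(M^T M) is the sum of squared elements, so divide by its sqrt.
        norm = math.sqrt(sum(x * x for row in M for x in row))
        return [[x / norm for x in row] for row in M]
    A, B = sample(), sample()
    C = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return A, B, C

A, B, C = training_item(3)   # n = 3 for the M3 (rank-23) experiments
print(sum(x * x for row in A for x in row))  # ~1.0, i.e. Tr(A^T A) = 1
```

The paper streams such items one at a time to the network, which is why no train/test/validation split arises.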