A Network That Learns Strassen Multiplication
Authors: Veit Elser
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train these networks with the conservative learning rule, which makes minimal changes to the weights so as to give the correct output for each input at the time the input-output pair is received. Conservative learning needs a few thousand examples to find the rank 7 decomposition of M2, and 10^5 for the rank 23 decomposition of M3 (the lowest known). High precision is critical, especially for M3, to discriminate between true decompositions and border approximations. We also present numerical experiments that are unique in the degree of precision required to establish results. |
| Researcher Affiliation | Academia | Veit Elser EMAIL Department of Physics Cornell Ithaca, NY 14853-2501, USA |
| Pseudocode | Yes | Appendix A. Conservative Learning Rules for the Strassen Network 1. Forward propagate (a, b) using (9) and (10) to get (ã, b̃, c̃ = ã ∘ b̃) and c. 2. From the output discrepancy c − c̃ compute γ using (18). Computing the scalar multiple of c − c̃ in this expression requires a single backward-forward propagation by the matrix G in (17). 3. The update ΔWc of Wc is the rank 1 matrix (12) constructed from γ and c̃. 4. Backward propagate γ by (14) to obtain α and β. 5. The updates (ΔWa, ΔWb) of (Wa, Wb) are given by the rank 1 matrices (7) constructed from (α, β) and (ã, b̃). |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, a link to a code repository, or information about code in supplementary materials. |
| Open Datasets | No | Our input matrices were simply constructed from uniform samples of [−1, 1] in each element of A and B. After rescaling to conform with our normalization convention Tr AᵀA = Tr BᵀB = 1, these together with C = AB were handed to the network for training. |
| Dataset Splits | No | Figure 2 shows the decomposition error ϵ converging linearly, after an initial training set of only a few thousand. We did not explore this dimension in our experiments with the Strassen network. The paper describes using "streams of random instances of correctly multiplied matrices" for training, but does not specify train/test/validation splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper does not provide any specific software dependencies or version numbers used in the experiments. |
| Experiment Setup | Yes | The weights were initialized with independent uniform samples from a symmetric interval about the origin. For simplicity we therefore chose to always initialize with samples drawn from [−1, 1]. Figure 5: Distribution of the final decomposition error of M3 in 10^3 runs, each with 10^8 training items and terminated when the error dropped below 10^−14. |
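The network architecture the pseudocode trains is a bilinear one: linear maps Wa and Wb into a hidden layer, an elementwise product, and a linear map Wc out. The following pure-Python sketch shows a forward pass through such a network with the weights set to Strassen's classic rank-7 decomposition — one exact solution (up to symmetries) that the paper's conservative learning rule converges to; the weights are not taken from the paper's learned runs, and the helper names are our own.

```python
# Forward pass of the Strassen network: c = Wc ((Wa a) ∘ (Wb b)), where a, b
# are the flattened 2x2 input matrices and ∘ is the elementwise product.
# Weights below are Strassen's standard rank-7 decomposition (an assumption
# for illustration, not the paper's learned weights).

# Rows index the 7 hidden units; columns index (a11, a12, a21, a22).
Wa = [[1, 0, 0, 1],
      [0, 0, 1, 1],
      [1, 0, 0, 0],
      [0, 0, 0, 1],
      [1, 1, 0, 0],
      [-1, 0, 1, 0],
      [0, 1, 0, -1]]
# Rows index the 7 hidden units; columns index (b11, b12, b21, b22).
Wb = [[1, 0, 0, 1],
      [1, 0, 0, 0],
      [0, 1, 0, -1],
      [-1, 0, 1, 0],
      [0, 0, 0, 1],
      [1, 1, 0, 0],
      [0, 0, 1, 1]]
# Rows index (c11, c12, c21, c22); columns index the 7 hidden units.
Wc = [[1, 0, 0, 1, -1, 0, 1],
      [0, 0, 1, 0, 1, 0, 0],
      [0, 1, 0, 1, 0, 0, 0],
      [1, -1, 1, 0, 0, 1, 0]]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def strassen_network(a, b):
    """Forward pass: 7 scalar multiplications instead of the naive 8."""
    a_tilde = matvec(Wa, a)                       # hidden activations from A
    b_tilde = matvec(Wb, b)                       # hidden activations from B
    m = [x * y for x, y in zip(a_tilde, b_tilde)] # the 7 Strassen products
    return matvec(Wc, m)                          # flattened product C = AB

a = [1, 2, 3, 4]   # A = [[1, 2], [3, 4]]
b = [5, 6, 7, 8]   # B = [[5, 6], [7, 8]]
print(strassen_network(a, b))  # [19, 22, 43, 50] == flattened A @ B
```

Only the count of multiplications in the hidden layer matters here — applied recursively to block matrices, the rank-7 decomposition gives the sub-cubic Strassen algorithm.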
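The training data described in the Open Datasets row is straightforward to regenerate: sample A and B uniformly from [−1, 1] per element, rescale so Tr AᵀA = Tr BᵀB = 1, and pair them with the exact product C = AB. A minimal sketch, assuming only the quoted setup (function names are our own):

```python
import math
import random

def training_item(n, rng=random.Random(0)):
    """One training triple (A, B, C = AB) per the paper's quoted setup:
    elements uniform on [-1, 1], then rescaled so Tr(A^T A) = Tr(B^T B) = 1."""
    def sample():
        M = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
        # Tr(M^T M) is the sum of squared elements, so divide by its sqrt.
        norm = math.sqrt(sum(x * x for row in M for x in row))
        return [[x / norm for x in row] for row in M]
    A, B = sample(), sample()
    C = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return A, B, C

A, B, C = training_item(3)   # n = 3 for the M3 (rank-23) experiments
print(sum(x * x for row in A for x in row))  # ~1.0, i.e. Tr(A^T A) = 1
```

The paper streams such items one at a time to the network, which is why no train/test/validation split arises.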