Monotone Learning with Rectified Wire Networks

Authors: Veit Elser, Dan Schmidt, Jonathan Yedidia

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate a training algorithm using this update, called sequential deactivation (SDA), on MNIST and some synthetic datasets. Upon adopting a natural choice for the nodal weights, SDA has no hyperparameters other than those describing the network structure. Our experiments explore behavior with respect to network size and depth in a family of sparse expander networks."
Researcher Affiliation | Collaboration | Veit Elser, Department of Physics, Cornell University, Ithaca, NY 14853-2501, USA; Dan Schmidt and Jonathan Yedidia, Analog Devices, Inc., Boston, MA, USA
Pseudocode | Yes | "Algorithm 1: Elementary network procedures"
Open Source Code | Yes | "This construction is implemented by the publicly available [1] C program expander and was used for all the experiments reported in the next section. All our experiments were carried out with a publicly available [1] C implementation of the SDA algorithm called rainman." (Footnote 1: github.com/veitelser/rectified-wires)
Open Datasets | Yes | "Conservative learning on rectified wire networks with the SDA algorithm is demonstrated for MNIST and synthetic datasets."
Dataset Splits | Yes | "Seen as images, the MNIST handwritten digits (Le Cun et al., 1998) are analog data. We compute the cumulative probability function from the training data and use the same function when processing the test data (with test samples below the minimum or above the maximum training samples mapped to 0 and 1 respectively)."
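The quoted preprocessing amounts to an empirical-CDF (quantile) transform fit on the training split and reused unchanged on the test split. A minimal sketch in Python/NumPy, assuming per-feature processing (the function names are illustrative, not taken from the paper's code):

```python
import numpy as np

def fit_empirical_cdf(train_values):
    """The sorted training values define the cumulative probability function."""
    return np.sort(np.asarray(train_values))

def apply_cdf(sorted_train, values):
    # Rank each value among the training samples and map the rank to [0, 1].
    # Values below the training minimum map to 0 and values above the
    # training maximum map to 1, as described in the quoted passage.
    ranks = np.searchsorted(sorted_train, np.asarray(values), side="right")
    return ranks / len(sorted_train)

cdf = fit_empirical_cdf([0.2, 0.5, 0.9])
out = apply_cdf(cdf, [0.1, 0.5, 1.0])  # below-min -> 0.0, above-max -> 1.0
```

The same fitted `cdf` array would be reused for the test split, which is what keeps the train/test processing consistent.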
Hardware Specification | Yes | "On a single Intel Xeon 2.00GHz core rainman runs at a rate of 50ns per iteration per network edge."
Software Dependencies | No | The paper mentions a "C program expander", a "C implementation of the SDA algorithm called rainman", and a "C++ SGD optimizer", but does not specify version numbers for these software components.
Experiment Setup | Yes | "The two-parameter sparse expander networks offer a convenient way to study behavior both with respect to network size and depth. The mini-batch size for SGD was fixed at 100 and we employed standard stochastic gradient descent without momentum. With the learning rate set at 0.002, training accuracy reaches a maximum of 91.5% in 6 minutes; this is also the test accuracy for this mode of operation. [...] switching after 10 epochs improved training and test accuracies to 94.4% and 94.1%, respectively. This trend in improvement continues and reaches 96.8% (training) and 95.6% (test) when the switch is made after 20 epochs (121 minutes)."
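For context on the quoted optimizer settings, plain mini-batch SGD with batch size 100, learning rate 0.002, and no momentum term can be sketched as below. This is a toy least-squares stand-in, not the paper's setup: the authors' C++ SGD optimizer trains a rectified wire network on MNIST, and only the update rule is illustrated here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))   # toy inputs standing in for MNIST features
w_true = rng.normal(size=20)
y = X @ w_true                    # noiseless linear targets

w = np.zeros(20)
lr, batch = 0.002, 100            # settings quoted in the experiment setup
for epoch in range(2000):
    perm = rng.permutation(len(X))           # reshuffle each epoch
    for i in range(0, len(X), batch):
        idx = perm[i:i + batch]
        # gradient of the mean squared error over the mini-batch
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w -= lr * grad                       # vanilla update: no momentum
```

"Standard stochastic gradient descent without momentum" means exactly the last line: the parameter step is the current mini-batch gradient scaled by the learning rate, with no accumulated velocity.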