Controlling Neural Network Smoothness for Neural Algorithmic Reasoning

Authors: David A. Klindt

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Our results show that two-layer neural networks fail to learn the structure of the task, despite containing multiple solutions of the true function within their hypothesis class. Growing the network's width leads to highly complex error regions in the input space. Moreover, we find that the network fails to generalise with increasing severity i) in the training domain, ii) outside of the training domain but within its convex hull, and iii) outside the training domain's convex hull. This behaviour can be emulated with Gaussian process regressors that use radial basis function kernels of decreasing length scale. Classical results establish an equivalence between Gaussian processes and infinitely wide neural networks. We demonstrate a tight linkage between the scaling of a network's weight standard deviation and its effective length scale on a sinusoidal regression problem, suggesting simple modifications to control the length scale of the function learned by a neural network and, thus, its smoothness. This has important applications for the different generalisation scenarios suggested above, but it also suggests a partial remedy to the brittleness of neural network predictions as exposed by adversarial examples. We demonstrate the gains in adversarial robustness that our modification achieves on simple image classification problems.
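The linkage claimed above, between the standard deviation of a network's weights and its effective length scale, can be illustrated with a minimal sketch. This is not the paper's code: the two-layer tanh architecture, the widths, and the total-variation roughness proxy below are illustrative assumptions. The point it demonstrates is that an untrained network drawn with a larger weight standard deviation realises a visibly rougher input-output mapping.

```python
import numpy as np

def random_net(x, weight_std, hidden=256, seed=0):
    # Untrained two-layer tanh network; only the input-weight scale is varied.
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, weight_std, size=(1, hidden))
    b1 = rng.normal(0.0, weight_std, size=hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
    return np.tanh(x @ W1 + b1) @ W2

def roughness(y):
    # Total variation normalised by range: a crude proxy for inverse length scale.
    y = y.ravel()
    return np.abs(np.diff(y)).sum() / (y.max() - y.min())

x = np.linspace(-1.0, 1.0, 500).reshape(-1, 1)
smooth = roughness(random_net(x, weight_std=0.5))
wiggly = roughness(random_net(x, weight_std=5.0))
# Larger weight std -> shorter effective length scale -> larger roughness.
```

The same seed is used for both draws, so the only difference between the two functions is the weight scale itself.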
Researcher Affiliation Academia The paper only lists the author 'David A. Klindt EMAIL' without any institutional affiliation. The email address is a personal Gmail account, providing no direct information about academic or industry affiliation. However, given its publication in 'Transactions on Machine Learning Research', an academic journal, it is broadly categorized as academic, though specific institutional details are absent.
Pseudocode No The paper describes methods and processes using mathematical notation and prose, but it does not include any clearly labeled pseudocode blocks or algorithm listings.
Open Source Code No The paper does not provide concrete access to source code for the methodology described. It mentions using 'standard GP implementation in sklearn' and 'standard ResNet9 ... based on this public repository (Kaggle)' for third-party tools, but there is no statement or link indicating the release of the authors' own implementation code.
Open Datasets Yes As further evidence in support of the claim that this approach effectively controls the smoothness of the learned NN input-output mapping, we turn towards the more complicated problem of performing image recognition on MNIST (LeCun et al., 1989) and CIFAR10 (Krizhevsky et al., 2009) under different distribution shifts.
Dataset Splits Yes Reported is the mean squared error (MSE) on the training and test sets, as well as the out-of-distribution set X_o.o.d. := [−1.5, 1.5]^2 \ D.
Hardware Specification No The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies No The paper mentions 'We use the standard GP implementation in sklearn (Pedregosa et al., 2011)', but it does not specify a version number for sklearn or any other software dependency, which is required for reproducibility.
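The GP baseline referenced here uses sklearn's GaussianProcessRegressor; the numpy-only sketch below reproduces the effect the paper relies on without that dependency. The sinusoidal data, kernel, and jitter level are illustrative assumptions, not the paper's setup: shrinking the RBF length scale makes the posterior mean interpolate the training points but revert to the prior mean between them, i.e. it becomes rougher.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    # Squared-exponential (RBF) kernel on 1-D inputs.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior_mean(x_train, y_train, x_test, length_scale, noise=1e-6):
    # Standard GP regression posterior mean with a small jitter for stability.
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train, length_scale)
    return K_star @ np.linalg.solve(K, y_train)

x_train = np.linspace(-1.0, 1.0, 8)
y_train = np.sin(4 * x_train)          # illustrative sinusoidal target
x_test = np.linspace(-1.0, 1.0, 200)

long_fit = gp_posterior_mean(x_train, y_train, x_test, length_scale=1.0)
short_fit = gp_posterior_mean(x_train, y_train, x_test, length_scale=0.05)

def total_variation(y):
    return np.abs(np.diff(y)).sum()
```

With length scale 0.05 the kernel matrix is nearly the identity, so the fit collapses to isolated spikes at the training points, mirroring the complex error regions the paper attributes to short effective length scales.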
Experiment Setup Yes We train for 50,000 steps with the Adam optimizer (Kingma & Ba, 2014), an initial learning rate of 0.001 and a learning rate decay of 0.9. We verified in initial experiments that these settings led to best heldout performance and convergence of gradient descent for the tested models. For the adversarial robustness experiments in section 3.5, we use the fast gradient sign attack (Goodfellow et al., 2014).
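The fast gradient sign attack used in the robustness experiments is simple enough to state in a few lines. The toy logistic-regression classifier below is an illustrative assumption (the paper attacks image classifiers), but the update rule is the standard FGSM of Goodfellow et al. (2014): perturb the input by epsilon in the sign of the loss gradient with respect to the input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, grad_x, eps):
    # Fast gradient sign method: one step in the sign of the input gradient.
    return x + eps * np.sign(grad_x)

# Hypothetical linear classifier p = sigmoid(w.x + b), cross-entropy loss, true label y = 1.
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.2, 0.1])
y = 1

p = sigmoid(w @ x + b)
grad_x = (p - y) * w            # d(loss)/dx for logistic regression
x_adv = fgsm(x, grad_x, eps=0.1)
p_adv = sigmoid(w @ x_adv + b)
# The attack lowers the model's confidence in the correct class: p_adv < p.
```

The same one-step recipe applies to a deep network; only the gradient computation (via backpropagation to the input) changes.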