The Asymptotic Performance of Linear Echo State Neural Networks

Authors: Romain Couillet, Gilles Wainrib, Harry Sevi, Hafiz Tiomoko Ali

JMLR 2016

Reproducibility assessment, listing each variable, its result, and the supporting LLM response:
Research Type: Experimental. "In the present article, we consider linear ESNs with a general connectivity matrix and internal network noise. By leveraging tools from the field of random matrix theory, we shall attempt to provide a first theoretical study of the performance of ESNs. Beyond the obvious advantage of exploiting theoretical formulas to select the optimal hyper-parameters, this mathematical study reveals key quantities that intimately relate the internal network memory to the input-target relationship, therefore contributing to a better understanding of short-term memory properties of RNNs." Section 3 (Applications): "In this section, we shall further estimate the results of Corollary 5 and Corollary 11 in specific settings for the network connectivity matrix W and the input weights m. By leveraging specific properties of certain stochastic models for W (such as invariance by orthogonal matrix product or by normality), the results of Section 2 will be greatly simplified, thereby providing further insights on the network performance." Figure 3 caption: "Training and testing (normalized) MSE for the Mackey-Glass one-step-ahead task, W fixed and defined as in Figure 1, n = 200, T = T̂ = 400 (left) and n = 400, T = T̂ = 800 (right). Comparison between Monte Carlo simulations (Monte Carlo), the deterministic approximation assuming W fixed (Th. (fixed W)) as per Corollaries 5 and 11, and assuming W random in the large-n limit (Th. (limit)) as per Corollary 13."
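As a concrete illustration of the model the excerpt describes, here is a minimal sketch (ours, not code from the paper) of a linear ESN with internal node noise and a least-squares readout on a one-step-ahead task. The sizes, the Gaussian stand-in input, and names such as `esn_states` are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def esn_states(W, m, u, eta, rng):
    """Run the linear recursion x_t = W x_{t-1} + m u_t + eta * eps_t and stack the states."""
    n, T = W.shape[0], len(u)
    X = np.zeros((n, T))
    x = np.zeros(n)
    for t in range(T):
        x = W @ x + m * u[t] + eta * rng.standard_normal(n)
        X[:, t] = x
    return X

# Illustrative sizes echoing Figure 3's left panel (n = 200, T = 400)
n, T = 200, 400
W = 0.9 * np.linalg.qr(rng.standard_normal((n, n)))[0]  # scaled orthogonal connectivity
m = rng.standard_normal(n) / np.sqrt(n)                  # input weights
eta = 0.1                                                # internal noise standard deviation

u = rng.standard_normal(T + 1)  # stand-in input; the paper uses Mackey-Glass data
r = u[1:]                       # one-step-ahead target
X = esn_states(W, m, u[:-1], eta, rng)

# Least-squares readout and normalized training MSE
omega, *_ = np.linalg.lstsq(X.T, r, rcond=None)
mse_train = np.mean((r - omega @ X) ** 2) / np.mean(r ** 2)
print(f"normalized training MSE: {mse_train:.3f}")
```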
Researcher Affiliation: Academia. Romain Couillet (EMAIL), CentraleSupélec, LSS, Université Paris-Sud (Gif-sur-Yvette, France); Gilles Wainrib (EMAIL), Département Informatique, team DATA, École Normale Supérieure (Paris, France); Harry Sevi (EMAIL), Laboratoire de Physique, École Normale Supérieure de Lyon (Lyon, France); Hafiz Tiomoko Ali (EMAIL), CentraleSupélec, LSS, Université Paris-Sud (Gif-sur-Yvette, France).
Pseudocode: No. The paper describes its mathematical derivations and methodology textually and with equations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper does not contain any statements about releasing code, nor does it provide links to source code repositories for the methodology described.
Open Datasets: Yes. "As a practical example, we provide in Figure 3 Monte Carlo simulations versus theory curves of the training and testing performances of networks of n = 200 and n = 400 nodes, for training and testing times T = T̂ = 2n, on the Mackey-Glass one-step-ahead anticipation task (Glass and Mackey, 1979)."
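The Mackey-Glass series itself is easy to regenerate from the delay differential equation dx/dt = β x(t−τ) / (1 + x(t−τ)^p) − γ x(t). Below is a sketch using an Euler discretization with the commonly used parameters (τ = 17, β = 0.2, γ = 0.1, p = 10); the paper's excerpt does not state its exact generation settings, so these values are assumptions.

```python
import numpy as np

def mackey_glass(T, tau=17, beta=0.2, gamma=0.1, p=10, dt=1.0, x0=1.2):
    """Euler discretization of dx/dt = beta*x(t-tau)/(1 + x(t-tau)**p) - gamma*x(t)."""
    hist = int(tau / dt)              # number of delay steps
    x = np.full(T + hist, x0)         # constant initial history
    for t in range(hist, T + hist - 1):
        xd = x[t - hist]              # delayed state x(t - tau)
        x[t + 1] = x[t] + dt * (beta * xd / (1 + xd ** p) - gamma * x[t])
    return x[hist:]

series = mackey_glass(1000)
u, r = series[:-1], series[1:]  # input / one-step-ahead target pairs
```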
Dataset Splits: Yes. "As a practical example, we provide in Figure 3 Monte Carlo simulations versus theory curves of the training and testing performances of networks of n = 200 and n = 400 nodes, for training and testing times T = T̂ = 2n, on the Mackey-Glass one-step-ahead anticipation task (Glass and Mackey, 1979)." Figure 4 caption: "Testing (normalized) MSE for the Mackey-Glass one-step-ahead task, W fixed and defined as in Figure 1, n = 400, T = T̂ = 200."
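Continuing the sketch above, the T = T̂ = 2n split quoted here amounts to taking consecutive, equal-length training and testing windows; the exact offsets are our assumption, as the excerpt does not pin them down.

```python
# Consecutive train/test windows of equal length, T = T_hat = 2 * n
n = 200
T = T_hat = 2 * n
u_train, r_train = u[:T], r[:T]
u_test, r_test = u[T:T + T_hat], r[T:T + T_hat]
```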
Hardware Specification: No. The paper does not explicitly mention any specific hardware (e.g., GPU models, CPU types, or cloud resources with specifications) used for running the experiments.
Software Dependencies: No. The paper does not provide specific names of software or libraries along with their version numbers that would be necessary to replicate the experiments.
Experiment Setup: Yes. "In the present article, we consider linear ESNs with a general connectivity matrix and internal network noise. More specifically, we shall consider an n-node ESN trained with an input of size T and shall show that, assuming the internal noise variance η² remains large compared to 1/√n, the training and testing performances of the network can be well approximated by a deterministic quantity which is a function of the training and test data as well as the connectivity matrix." Remark 16 (Selecting W based on delayed correlations): "... For instance, if b̂ᵢ = α^(i−1) for some α ∈ (−1, 1), it is easily shown that an optimal choice for W = σZ with Z Haar is to take σ² = |α|." Figure 3 caption: "Training and testing (normalized) MSE for the Mackey-Glass one-step-ahead task, W fixed and defined as in Figure 1, n = 200, T = T̂ = 400 (left) and n = 400, T = T̂ = 800 (right)." "In a second experiment, we shall illustrate the noise resurgence effect discussed earlier in Remark 6. In Figure 10, we specifically draw the curves of the testing MSE variances for various experiments conducted earlier in the article." Figure 11 caption: "Testing (normalized) MSE for the Mackey-Glass one-step-ahead task, W (multi-memory) versus W₁⁺ = .99 Z₁⁺, W₂⁺ = .9 Z₂⁺, W₃⁺ = .5 Z₃⁺ (with Zᵢ⁺ Haar distributed), all defined as in Figure 1, n = 400, T = T̂ = 800."
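The W choices quoted above (W = σZ with Z Haar distributed, and the multi-memory variant of Figure 11) can be sampled as follows. This is our sketch, with the Haar sampler implemented via a sign-corrected QR decomposition and the block sizes of the multi-memory matrix chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_orthogonal(n, rng):
    """Haar-distributed orthogonal matrix via QR of a Gaussian matrix,
    with column signs fixed so the law is exactly Haar."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))

# Remark 16: with delayed correlations b_hat_i = alpha**(i - 1), take sigma**2 = |alpha|
alpha = 0.9
sigma = np.sqrt(abs(alpha))
W = sigma * haar_orthogonal(400, rng)

def block_diag(*blocks):
    """Assemble a block-diagonal matrix from square blocks."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

# Multi-memory W in the spirit of Figure 11: blocks with spectral radii .99, .9, .5
# (the individual block sizes below are our illustrative choice, summing to n = 400)
W_multi = block_diag(*(s * haar_orthogonal(k, rng)
                       for s, k in [(0.99, 134), (0.9, 133), (0.5, 133)]))
```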