Associative memory and dead neurons

Authors: Vladimir Fanaskov, Ivan Oseledets

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study this energy function and identify that it is vulnerable to the problem of dead neurons. Each point in the state space where a neuron dies is contained in a non-compact region with constant energy. In these flat regions, the energy function alone does not determine all degrees of freedom and, as a consequence, cannot be used to analyze stability or to find steady states or basins of attraction. We perform a direct analysis of the dynamical system and show how to resolve the problems caused by flat directions corresponding to dead neurons: (i) all information about the state vector at a fixed point can be extracted from the energy and the Hessian matrix (of the Lagrange function), (ii) it is enough to analyze stability in the range of the Hessian matrix, and (iii) if a steady state touching a flat region is stable, the whole flat region is in its basin of attraction. The analysis of the Hessian matrix can be complicated for realistic architectures, so we show that for a slightly altered dynamical system (with the same structure of steady states), one can derive a diverse family of Lyapunov functions that do not have flat regions corresponding to dead neurons. In addition, these energy functions allow one to use Lagrange functions with Hessian matrices that are not necessarily positive definite, and even to consider architectures with non-symmetric feedforward and feedback connections.
Researcher Affiliation | Academia | Vladimir Fanaskov (AIRI, Skoltech); Ivan Oseledets (AIRI, Skoltech)
Pseudocode | No | The paper describes its mathematical models and dynamical systems using equations and prose, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no explicit statement about releasing source code, no link to a code repository, and no mention of code provided in supplementary materials.
Open Datasets | No | The paper is theoretical and focuses on the mathematical analysis of dynamical systems and energy functions. It does not describe any experiments that would use specific datasets, nor does it provide access information for any datasets.
Dataset Splits | No | The paper does not use any datasets for empirical evaluation, so there is no information about dataset splits.
Hardware Specification | No | The paper is purely theoretical and focuses on mathematical models and their properties. It does not describe any computational experiments or the hardware used to run them.
Software Dependencies | No | The paper is theoretical and focuses on mathematical derivations. It does not list any software or libraries with version numbers that would be needed to replicate experimental results or implementations.
Experiment Setup | No | The paper is a theoretical work on associative memory models and Lyapunov functions. It does not describe any empirical experiments, so no experimental setup details such as hyperparameters, training configurations, or system-level settings are provided.
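The dead-neuron problem described in the Research Type row can be illustrated with a small numerical sketch. It assumes a single-layer Hopfield-style model with dynamics dx/dt = W g(x) - x + b, energy E = x·g - L(x) - ½ gᵀWg - b·g, and the hypothetical Lagrangian L(x) = ½ Σ relu(x_i)², so that g = relu(x) and the Hessian of L is diag(1[x_i > 0]). The weights W, bias b, and all helper names are illustrative choices, not taken from the paper. The sketch checks two things: E stays constant as a dead coordinate slides along the negative half-line, and dE/dt = -ẋᵀHẋ can vanish while the dead coordinate is still moving.

```python
# Illustrative sketch (assumption): a single-layer Hopfield-style model with
#   dynamics   dx/dt = W g(x) - x + b,   g = grad L,
#   energy     E(x)  = x.g - L(x) - 0.5 g^T W g - b.g,
# and the hypothetical Lagrangian L(x) = 0.5 * sum(relu(x_i)^2), so g = relu(x)
# and the Hessian of L is diag(1[x_i > 0]).
W = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.2],
     [-0.3, 0.2, 0.0]]   # hypothetical symmetric weights
b = [0.3, 0.2, -0.4]     # hypothetical bias
n = 3

def relu(v):
    return [max(0.0, vi) for vi in v]

def energy(x):
    g = relu(x)
    lag = 0.5 * sum(gi * gi for gi in g)
    xg = sum(x[i] * g[i] for i in range(n))
    gWg = sum(g[i] * W[i][j] * g[j] for i in range(n) for j in range(n))
    bg = sum(b[i] * g[i] for i in range(n))
    return xg - lag - 0.5 * gWg - bg

def velocity(x):
    # Right-hand side of the dynamics (time constant tau = 1).
    g = relu(x)
    return [sum(W[i][j] * g[j] for j in range(n)) - x[i] + b[i] for i in range(n)]

def energy_rate(x):
    # dE/dt = -v^T H v for symmetric W, with H = diag(1[x_i > 0]);
    # velocity components of dead coordinates drop out of the sum.
    v = velocity(x)
    return -sum(vi * vi for xi, vi in zip(x, v) if xi > 0)

# Neuron 2 is dead (x_2 < 0). Sliding x_2 along the negative half-line
# leaves E unchanged: a non-compact flat region.
flat_energies = [energy([8 / 15, 7 / 15, t]) for t in (-0.5, -3.0, -300.0)]

# At this point the active coordinates already satisfy x_a = W_aa x_a + b_a,
# so dE/dt is zero up to rounding, yet the dead coordinate is still moving.
x_probe = [8 / 15, 7 / 15, -3.0]
v_probe = velocity(x_probe)
rate_probe = energy_rate(x_probe)
```

In this toy setting the probe point is not a steady state (dx_2/dt = 38/15), but the energy cannot detect this, which is why stability and steady states have to be analyzed through the dynamics rather than through E alone.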
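Claim (ii), that it is enough to analyze stability in the range of the Hessian, can be seen in the same toy model. This is a sketch under the same assumptions (hypothetical W, b, ReLU-squared Lagrangian, tau = 1), not the paper's general argument. At a fixed point the Jacobian of the right-hand side is J = WH - I with H = diag(1[x_i > 0]). Each dead direction e_k satisfies J e_k = -e_k, so it decays on its own, and because J is block lower triangular in (active, dead) order, the remaining eigenvalues are those of the active block W_aa - I.

```python
import math

# Illustrative sketch (assumption): same toy model, tau = 1,
#   dx/dt = W relu(x) - x + b,   Hessian of L = diag(1[x_i > 0]).
W = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.2],
     [-0.3, 0.2, 0.0]]   # hypothetical symmetric weights
b = [0.3, 0.2, -0.4]     # hypothetical bias
n = 3

def relu(v):
    return [max(0.0, vi) for vi in v]

def rhs(x):
    g = relu(x)
    return [sum(W[i][j] * g[j] for j in range(n)) - x[i] + b[i] for i in range(n)]

# Forward Euler to a steady state. It converges here because the max-row-sum
# norm of W is 0.8 < 1, which makes each Euler step a contraction.
x = [1.0, 1.0, 1.0]
dt = 0.1
for _ in range(2000):
    f = rhs(x)
    x = [x[i] + dt * f[i] for i in range(n)]

h = [1.0 if xi > 0 else 0.0 for xi in x]          # diagonal of the Hessian H
active = [i for i in range(n) if h[i] == 1.0]
dead = [i for i in range(n) if h[i] == 0.0]

# Jacobian of the right-hand side at x: J = W H - I.
J = [[W[i][j] * h[j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

# Column k of J for a dead neuron is -e_k, i.e. J e_k = -e_k (eigenvalue -1).
dead_columns = [[J[i][k] for i in range(n)] for k in dead]

# The remaining eigenvalues are those of the active block W_aa - I.
# This unpacking assumes exactly two active neurons, as in this example.
p, q = active
tr = J[p][p] + J[q][q]
det = J[p][p] * J[q][q] - J[p][q] * J[q][p]
root = math.sqrt(tr * tr - 4.0 * det)   # real, since the active block is symmetric
lam_hi = (tr + root) / 2.0
lam_lo = (tr - root) / 2.0
stable = lam_hi < 0.0 and lam_lo < 0.0
```

With these values the iteration settles at x* = (8/15, 7/15, -7/15): neuron 2 is dead and contributes the eigenvalue -1, while the active block has eigenvalues -0.5 and -1.5. Checking the two active coordinates (the range of H) is therefore enough to conclude stability in this example.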