Topology of Deep Neural Networks
Authors: Gregory Naitzat, Andrey Zhitnikov, Lek-Heng Lim
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We performed extensive experiments on the persistent homology of a wide range of point cloud data sets, both real and simulated. The results consistently demonstrate the following: neural networks operate by changing topology, transforming a topologically complicated data set into a topologically simple one as it passes through the layers. No matter how complicated the topology of M we begin with, when passed through a well-trained neural network f : ℝ^d → ℝ^p, there is a vast reduction in the Betti numbers of both components M_a and M_b; in fact they nearly always reduce to their lowest possible values: β_k(f(M_i)) = 0 for k ≥ 1 and β_0(f(M_i)) = 1, i = a, b. The reduction in Betti numbers is significantly faster for ReLU activation than for hyperbolic tangent activation, as the former defines nonhomeomorphic maps that change topology, whereas the latter defines homeomorphic maps that preserve topology. Shallow and deep networks transform data sets differently: a shallow network operates mainly through changing geometry and changes topology only in its final layers, while a deep one spreads topological changes more evenly across all layers. Keywords: neural networks, topology change, Betti numbers, topological complexity, persistent homology |
| Researcher Affiliation | Academia | Gregory Naitzat EMAIL Department of Statistics University of Chicago Chicago, IL, 60637, USA Andrey Zhitnikov EMAIL Faculty of Electrical Engineering Technion Israel Institute of Technology Haifa 32000, Israel Lek-Heng Lim EMAIL Computational and Applied Mathematics Initiative University of Chicago Chicago, IL, 60637, USA |
| Pseudocode | No | The paper describes mathematical concepts, methodology, and experimental results but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | This work is primarily an empirical study; we have performed more than 10,000 experiments on real and simulated data sets of varying topological complexities and have made our code available for the reader's further experimentation: https://github.com/topnn/topnn_framework |
| Open Datasets | Yes | We will validate our results on four real-world data sets from (i) MNIST Handwritten Digits (LeCun et al., 1998), (ii) HTRU2 High Time-Resolution Universe Survey (Lyon et al., 2016), (iii) UCI Banknotes Authentication (Lohweg et al., 2013), (iv) UCI Sensorless Drive Diagnostic (Bayer et al., 2013). |
| Dataset Splits | Yes | We used 60,000 samples to train and the remaining 10,000 samples to test our neural networks; we use one third of the test set for our persistent homology calculations. We use 80% of this balanced data X for training the neural networks and the remaining 20% for testing. |
| Hardware Specification | Yes | These computations are run in parallel over 12 cores of an Intel i7-8750H-2.20GHz processor with 9,216KB cache and 32GB DDR4-2666MHz RAM. The jobs are fed in a queue, with each core limited to 9GB of memory. |
| Software Dependencies | Yes | To train these neural networks to our requirements (recall that this means zero training error and a near-zero, ≤ 0.01%, generalization error), we relied on TensorFlow (version 1.12.0 on Ubuntu 16.04.1). Our homology and persistent homology computations are all performed using the Eirene package in Julia 0.6.4 (Henselman and Ghrist, 2016). |
| Experiment Setup | Yes | Training is done on cross-entropy categorical loss with the standard Adam optimizer (Kingma and Ba, 2015) for up to 18,000 training epochs. The learning rate is set to 0.02–0.04 with an exponential decay, i.e., η^(t/d), where t is the training epoch normalized by d = 2500. For the bottleneck architectures, where the widths narrow down in the middle layers (see Table 1), the decay is set to d = 4000 and η = 0.5. We use the softmax function as the score function in all of our networks, i.e., s : ℝ^p → ℝ^p whose ith coordinate is s_i(x) = e^{x_i}/(e^{x_1} + ⋯ + e^{x_p}), i = 1, . . . , p, where p is the number of categories. |
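To make the experiment-setup row concrete, the softmax score function and the exponential learning-rate decay described above can be sketched as follows. This is a minimal sketch, not the authors' code: the function names are invented here, and the decay schedule is one plausible reading of "η^(t/d)", namely a base rate multiplied by η^(t/d) as in TensorFlow's `tf.train.exponential_decay`.

```python
import math

def softmax(x):
    """Softmax score function: s_i(x) = e^{x_i} / (e^{x_1} + ... + e^{x_p}).
    Subtracting max(x) before exponentiating is a standard stability
    trick and does not change the result."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def decayed_lr(base_lr, epoch, decay_rate=0.5, d=2500):
    """Exponentially decayed learning rate: base_lr * decay_rate^(epoch/d),
    with epoch normalized by d (2500 in the main setup, 4000 with
    decay_rate = 0.5 for the bottleneck architectures)."""
    return base_lr * decay_rate ** (epoch / d)
```

For example, `softmax([0.0, 0.0])` returns `[0.5, 0.5]`, and with `base_lr = 0.02` the rate halves every `d` epochs under the reading above.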