Topology of Deep Neural Networks
Authors: Gregory Naitzat, Andrey Zhitnikov, Lek-Heng Lim
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We performed extensive experiments on the persistent homology of a wide range of point cloud data sets, both real and simulated. The results consistently demonstrate the following: neural networks operate by changing topology, transforming a topologically complicated data set into a topologically simple one as it passes through the layers. No matter how complicated the topology of M we begin with, when passed through a well-trained neural network f : ℝ^d → ℝ^p, there is a vast reduction in the Betti numbers of both components M_a and M_b; in fact they nearly always reduce to their lowest possible values: β_k(f(M_i)) = 0 for k ≥ 1 and β_0(f(M_i)) = 1, i = a, b. The reduction in Betti numbers is significantly faster for ReLU activation than for hyperbolic tangent activation, as the former defines nonhomeomorphic maps that change topology, whereas the latter defines homeomorphic maps that preserve topology. Shallow and deep networks transform data sets differently: a shallow network operates mainly through changing geometry and changes topology only in its final layers, while a deep one spreads topological changes more evenly across all layers. Keywords: neural networks, topology change, Betti numbers, topological complexity, persistent homology |
| Researcher Affiliation | Academia | Gregory Naitzat EMAIL Department of Statistics University of Chicago Chicago, IL, 60637, USA Andrey Zhitnikov EMAIL Faculty of Electrical Engineering Technion Israel Institute of Technology Haifa 32000, Israel Lek-Heng Lim EMAIL Computational and Applied Mathematics Initiative University of Chicago Chicago, IL, 60637, USA |
| Pseudocode | No | The paper describes mathematical concepts, methodology, and experimental results but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | This work is primarily an empirical study; we have performed more than 10,000 experiments on real and simulated data sets of varying topological complexities and have made our code available for the reader's further experimentation: https://github.com/topnn/topnn_framework |
| Open Datasets | Yes | We will validate our results on four real-world data sets from (i) MNIST Handwritten Digits (LeCun et al., 1998), (ii) HTRU2 High Time-Resolution Universe Survey (Lyon et al., 2016), (iii) UCI Banknotes Authentication (Lohweg et al., 2013), (iv) UCI Sensorless Drive Diagnostic (Bayer et al., 2013). |
| Dataset Splits | Yes | We used 60,000 samples to train and the remaining 10,000 samples to test our neural networks; we use one third of the test set for our persistent homology calculations. We use 80% of this balanced data X for training the neural networks and the remaining 20% for testing. |
| Hardware Specification | Yes | These computations are run in parallel over 12 cores of an Intel i7-8750H-2.20GHz processor with 9,216KB cache and 32GB DDR4-2666MHz RAM. The jobs are fed in a queue, with each core limited to 9GB of memory. |
| Software Dependencies | Yes | To train these neural networks to our requirements (recall that this means zero training error and a near-zero, ≤ 0.01%, generalization error), we relied on TensorFlow (version 1.12.0 on Ubuntu 16.04.1). Our homology and persistent homology computations are all performed using the Eirene package in Julia 0.6.4 (Henselman and Ghrist, 2016). |
| Experiment Setup | Yes | Training is done on cross-entropy categorical loss with the standard Adam optimizer (Kingma and Ba, 2015) for up to 18,000 training epochs. The learning rate is set to 0.02–0.04 with an exponential decay, i.e., η^(t/d), where t is the training epoch normalized by d = 2500. For the bottleneck architectures, where the widths narrow down in the middle layers (see Table 1), the decay is set to d = 4000 and η = 0.5. We use the softmax function as the score function in all of our networks, i.e., s : ℝ^p → ℝ^p whose ith coordinate is s_i(x) = e^{x_i}/(e^{x_1} + ⋯ + e^{x_p}), i = 1, . . . , p, where p is the number of categories. |
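To make the experiment-setup row concrete, the softmax score function and the exponential learning-rate decay described above can be sketched as follows. This is a minimal sketch, not the authors' code: the function names are invented here, and the decay schedule is one plausible reading of "η^(t/d)", namely a base rate multiplied by η^(t/d) as in TensorFlow's `tf.train.exponential_decay`.

```python
import math

def softmax(x):
    """Softmax score function: s_i(x) = e^{x_i} / (e^{x_1} + ... + e^{x_p}).
    Subtracting max(x) before exponentiating is a standard stability
    trick and does not change the result."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def decayed_lr(base_lr, epoch, decay_rate=0.5, d=2500):
    """Exponentially decayed learning rate: base_lr * decay_rate^(epoch/d),
    with epoch normalized by d (2500 in the main setup, 4000 with
    decay_rate = 0.5 for the bottleneck architectures)."""
    return base_lr * decay_rate ** (epoch / d)
```

For example, `softmax([0.0, 0.0])` returns `[0.5, 0.5]`, and with `base_lr = 0.02` the rate halves every `d` epochs under the reading above.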