Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks
Authors: Akshay Kumar, Jarvis Haupt
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For illustration, we provide a brief toy example showing the phenomenon of directional convergence near small initialization. We train a single-layer squared ReLU neural network using gradient descent and small initialization, and provide in Figure 1 a visual depiction of (a) the overall loss and the ℓ2 norm of the network weights, and (b) the angle the weight vectors make with the positive horizontal axis, all as a function of the number of training iterations. (See the figure caption for more specific experimental details.) |
| Researcher Affiliation | Academia | Akshay Kumar EMAIL Department of Electrical and Computer Engineering University of Minnesota, Minneapolis, MN Jarvis Haupt EMAIL Department of Electrical and Computer Engineering University of Minnesota, Minneapolis, MN |
| Pseudocode | No | The paper describes methods and proofs using mathematical equations and lemmas, but it does not include any sections explicitly labeled 'Pseudocode' or 'Algorithm,' nor does it present any structured code-like procedures. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository or supplementary materials containing code. |
| Open Datasets | No | For training, we use 50 unit-norm inputs, and the corresponding labels are generated using the function H(x₁, x₂) = 5 max(0, x₁)² + 4 max(0, x₂)². We use square loss and optimize using gradient descent for 50000 iterations with step-size 5e-5. At initialization, the weights of each hidden neuron are drawn from a Gaussian distribution with standard deviation 10e-5. |
| Dataset Splits | No | The paper describes a generated dataset of '50 unit norm inputs' for illustrative toy examples, but it does not specify any explicit training, validation, or test splits for this data. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or other computational resources used for the experiments. |
| Software Dependencies | No | The paper does not specify any software versions for libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | We use square loss and optimize using gradient descent for 50000 iterations with step-size 5e-5. At initialization, the weights of each hidden neuron are drawn from a Gaussian distribution with standard deviation 10e-5. |
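The experiment-setup details above can be sketched as a short script. This is a hypothetical reconstruction, not the authors' code: the network width, the input sampling distribution, the use of mean squared error, and the exact form of the target's second term are assumptions beyond what the excerpt states. It trains a one-hidden-layer squared-ReLU network f(x) = Σⱼ max(0, ⟨wⱼ, x⟩)², which is two-homogeneous in each wⱼ, from a small initialization with the reported step size and iteration count.

```python
import numpy as np

# Hypothetical sketch of the paper's toy experiment. Assumptions not stated in
# the excerpt: hidden width h = 10, inputs drawn uniformly on the unit circle,
# mean squared error as the "square loss", and x2 in the target's second term.

rng = np.random.default_rng(0)
n, d, h = 50, 2, 10                       # 50 unit-norm inputs in R^2; width assumed
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Labels from H(x1, x2) = 5*max(0, x1)^2 + 4*max(0, x2)^2 (second term assumed)
y = 5 * np.maximum(0, X[:, 0]) ** 2 + 4 * np.maximum(0, X[:, 1]) ** 2

W = 1e-4 * rng.standard_normal((h, d))    # small init, sigma = 10e-5 as reported
init_norm = np.linalg.norm(W)
lr, iters = 5e-5, 50_000                  # step size and iterations as reported

for _ in range(iters):
    pre = X @ W.T                                        # (n, h) pre-activations
    resid = (np.maximum(0, pre) ** 2).sum(axis=1) - y    # prediction residuals
    # Gradient of mean squared loss; note d/du max(0, u)^2 = 2*max(0, u)
    grad = (2 / n) * (resid[:, None] * 2 * np.maximum(0, pre)).T @ X
    W -= lr * grad

final_loss = np.mean(((np.maximum(0, X @ W.T) ** 2).sum(axis=1) - y) ** 2)
# Direction of each hidden neuron: the angle tracked in the paper's Figure 1(b)
angles = np.degrees(np.arctan2(W[:, 1], W[:, 0]))
print(f"final loss {final_loss:.4f}, weight norm {np.linalg.norm(W):.3f}")
print("neuron angles (deg):", np.round(angles, 1))
```

Printing the loss, the weight norm, and the per-neuron angles over training (rather than only at the end, as here) reproduces the quantities plotted in the paper's Figure 1.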