Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks

Authors: Akshay Kumar, Jarvis Haupt

TMLR 2024

Reproducibility Variable — Result — LLM Response
Research Type — Experimental: "For illustration, we provide a brief toy example showing the phenomenon of directional convergence near small initialization. We train a single-layer squared ReLU neural network using gradient descent and small initialization, and provide in Figure 1 a visual depiction of (a) the overall loss and the ℓ2 norm of the network weights, and (b) the angle the weight vectors make with the positive horizontal axis, all as a function of the number of training iterations. (See the figure caption for more specific experimental details.)"
Researcher Affiliation — Academia: Akshay Kumar (EMAIL), Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN; Jarvis Haupt (EMAIL), Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN.
Pseudocode — No: The paper describes methods and proofs using mathematical equations and lemmas, but it does not include any sections explicitly labeled 'Pseudocode' or 'Algorithm,' nor does it present any structured code-like procedures.
Open Source Code — No: The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository or supplementary materials containing code.
Open Datasets — No: "For training, we use 50 unit norm inputs and corresponding labels are generated using the function H(x1, x2) = 5 max(0, x1)^2 + 4 max(0, x2)^2. We use square loss and optimize using gradient descent for 50000 iterations with step-size 5e-5. At initialization, the weights of each hidden neuron are drawn from a Gaussian distribution with standard deviation 10^-5."
Dataset Splits — No: The paper describes a generated dataset of "50 unit norm inputs" for illustrative toy examples, but it does not specify any explicit training, validation, or test splits for this data.
Hardware Specification — No: The paper does not specify any hardware details such as GPU models, CPU types, or other computational resources used for the experiments.
Software Dependencies — No: The paper does not specify any software versions for libraries, frameworks, or programming languages used in the experiments.
Experiment Setup — Yes: "We use square loss and optimize using gradient descent for 50000 iterations with step-size 5e-5. At initialization, the weights of each hidden neuron are drawn from a Gaussian distribution with standard deviation 10^-5."
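The quoted setup is nearly complete enough to re-run. The following is a minimal sketch of the toy experiment, with several hedged assumptions not stated in the excerpt: the inputs are 2-dimensional (matching H(x1, x2)), the network output is f(x) = Σ_j max(0, w_j·x)^2 with a hypothetical hidden width of 20, and the duplicated x1 in the printed target function is read as a typo for x2.

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 unit-norm inputs; assumption: inputs live in R^2, matching H(x1, x2)
n = 50
theta = rng.uniform(0, 2 * np.pi, n)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # shape (n, 2)

# Labels from the target; second term taken as max(0, x2)^2
# (the excerpt's duplicated x1 appears to be a typo)
y = 5 * np.maximum(0, X[:, 0]) ** 2 + 4 * np.maximum(0, X[:, 1]) ** 2

# Hypothetical parameterization: f(x) = sum_j max(0, w_j . x)^2,
# a single hidden layer of squared-ReLU units whose outputs are summed
k = 20                                   # hidden width (assumption)
W = rng.normal(0, 1e-5, size=(k, 2))     # small Gaussian initialization

lr, iters = 5e-5, 50_000                 # step-size and iterations from the paper
for _ in range(iters):
    act = np.maximum(0, X @ W.T)         # (n, k) ReLU activations
    pred = (act ** 2).sum(axis=1)        # squared-ReLU outputs, summed
    resid = pred - y
    # gradient of 0.5/n * sum resid^2; d pred / d w_j = 2 * max(0, w_j.x) * x
    grad = (2.0 / n) * (resid[:, None] * act).T @ X   # (k, 2)
    W -= lr * grad

# Quantities plotted in the paper's Figure 1: loss, weight norms, and the
# angle each weight vector makes with the positive horizontal axis
pred = (np.maximum(0, X @ W.T) ** 2).sum(axis=1)
loss = 0.5 * np.mean((pred - y) ** 2)
norms = np.linalg.norm(W, axis=1)
angles = np.degrees(np.arctan2(W[:, 1], W[:, 0]))
```

With the tiny initialization, the weight norms stay small for many iterations while the weight directions rotate, which is the directional-convergence phenomenon the paper illustrates; plotting `angles` against the iteration count would reproduce the qualitative behavior of Figure 1(b).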