Weight-balancing fixes and flows for deep learning

Authors: Lawrence K. Saul

TMLR 2023

Reproducibility assessment (variable, result, and LLM response):
Research Type: Experimental. Fig. 2 plots the convergence of the multiplicative updates in Algorithm 1 for different values of p and q and for three randomly initialized networks with differing numbers of hidden layers but the same overall numbers of input (200), hidden (3750), and output (10) units. From shallowest to deepest, the networks had 200-2500-1250-10 units, 200-2000-1000-500-250-10 units, and 200-1000-750-750-500-500-250-10 units. The networks were initialized with zero-valued biases and zero-mean Gaussian random weights whose variances were inversely proportional to the fan-in at each unit (He et al., 2015). The panels in the figure plot the ratio ‖W‖_{p,q} / ‖W₀‖_{p,q} as a function of the number of multiplicative updates, where ‖W₀‖_{p,q} and ‖W‖_{p,q} are respectively the ℓ_{p,q}-norms, defined in eq. (1), of the initial and updated weight matrices. Results are shown for several values of p and q.
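The quantity tracked in Fig. 2 is the ratio of the current to the initial ℓ_{p,q}-norm of the weights. A minimal sketch of computing that norm follows; the exact grouping (p-norms over columns, combined by a q-norm) is an assumption, since the paper's eq. (1) is not reproduced here, and the helper name `lpq_norm` is hypothetical.

```python
import numpy as np

def lpq_norm(W, p, q):
    """Entrywise l_{p,q} norm: a p-norm over each column of W,
    with the column norms then combined by a q-norm.
    (One common convention; the paper's eq. (1) fixes the exact grouping.)"""
    col_norms = np.sum(np.abs(W) ** p, axis=0) ** (1.0 / p)
    return np.sum(col_norms ** q) ** (1.0 / q)

# The ratio plotted in Fig. 2 starts at 1.0 and is re-evaluated
# after each multiplicative update (updates not shown here).
rng = np.random.default_rng(0)
W0 = rng.normal(size=(200, 2500))
ratio = lpq_norm(W0, 2, 2) / lpq_norm(W0, 2, 2)  # equals 1.0 before any update
```

With p = q = 2 this reduces to the Frobenius norm, which gives a quick sanity check on the implementation.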
Researcher Affiliation: Industry. Lawrence K. Saul (EMAIL), Flatiron Institute, Center for Computational Mathematics, 162 Fifth Avenue, New York, NY 10010.
Pseudocode: Yes. Algorithm 1: Given a network with weights W0 and biases b0, this procedure returns a functionally equivalent network whose rescaled weights W and biases b minimize the norm ‖W‖_{p,q} in eq. (1) up to some tolerance δ > 0. The set H contains the indices of the network's hidden units.
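Algorithm 1 relies on the fact that rescaling a hidden unit's incoming and outgoing weights in opposite directions leaves the network function unchanged. The sketch below illustrates that rescaling symmetry for ReLU hidden units; it is an illustrative example, not the paper's actual multiplicative update rule, and the function names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, b1, W2, b2):
    """One-hidden-layer ReLU network."""
    return relu(x @ W1 + b1) @ W2 + b2

def rescale_hidden(W1, b1, W2, c):
    """For positively homogeneous activations such as ReLU, multiplying
    hidden unit i's incoming weights and bias by c[i] > 0 and dividing its
    outgoing weights by c[i] preserves the network function exactly.
    This is the symmetry a norm-minimizing rescaling procedure exploits.
    (Sketch only; not the update rule in Algorithm 1.)"""
    return W1 * c, b1 * c, W2 / c[:, None]
```

Because the function is unchanged for any positive scale vector c, such a procedure is free to choose the c that minimizes ‖W‖_{p,q}, which is the role of the multiplicative updates.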
Open Source Code: No. The paper does not provide concrete access to source code for the methodology described. It mentions related work and future directions but includes no explicit link to, or statement about releasing, code for this paper's contributions.
Open Datasets: No. The experiments are performed on three randomly initialized synthetic networks (zero-valued biases and zero-mean Gaussian random weights, as quoted under Research Type), not on a publicly available dataset.
Dataset Splits: No. The paper uses randomly initialized synthetic networks for its demonstrations rather than a specific dataset, so there are no dataset splits to specify.
Hardware Specification: No. The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies: No. The paper does not provide specific ancillary software details with version numbers.
Experiment Setup: Yes. The networks were initialized with zero-valued biases and zero-mean Gaussian random weights whose variances were inversely proportional to the fan-in at each unit (He et al., 2015).
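The described initialization (zero biases; Gaussian weights with variance inversely proportional to fan-in) can be sketched as below for the deepest architecture used in Fig. 2. The proportionality constant 2 follows the He et al. (2015) convention for ReLU networks; the paper only states inverse proportionality, so that constant is an assumption, and `init_network` is a hypothetical helper name.

```python
import numpy as np

def init_network(layer_sizes, rng):
    """Zero-valued biases and zero-mean Gaussian weights whose variance
    scales as 1/fan-in (He et al., 2015). The factor of 2 is the usual
    ReLU convention and is an assumption here."""
    weights = [rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]
    return weights, biases

# Deepest network from Fig. 2: 200-1000-750-750-500-500-250-10 units.
Ws, bs = init_network([200, 1000, 750, 750, 500, 500, 250, 10],
                      np.random.default_rng(0))
```

Each weight matrix then has shape (fan-in, fan-out) for its layer, and every bias vector starts at zero, matching the setup quoted above.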