Weight-balancing fixes and flows for deep learning
Authors: Lawrence K. Saul
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Fig. 2 plots the convergence of the multiplicative updates in Algorithm 1 for different values of p and q and for three randomly initialized networks with differing numbers of hidden layers but the same overall numbers of input (200), hidden (3750), and output (10) units. From shallowest to deepest, the networks had 200-2500-1250-10 units, 200-2000-1000-500-250-10 units, and 200-1000-750-750-500-500-250-10 units. The networks were initialized with zero-valued biases and zero-mean Gaussian random weights whose variances were inversely proportional to the fan-in at each unit (He et al., 2015). The panels in the figure plot the ratio ‖W‖p,q / ‖W0‖p,q as a function of the number of multiplicative updates, where ‖W0‖p,q and ‖W‖p,q are respectively the ℓp,q-norms, defined in eq. (1), of the initial and updated weight matrices. Results are shown for several values of p and q. |
| Researcher Affiliation | Industry | Lawrence K. Saul EMAIL Flatiron Institute, Center for Computational Mathematics 162 Fifth Avenue, New York, NY 10010 |
| Pseudocode | Yes | Algorithm 1: Given a network with weights W0 and biases b0, this procedure returns a functionally equivalent network whose rescaled weights W and biases b minimize the norm ‖W‖p,q in eq. (1) up to some tolerance δ > 0. The set H contains the indices of the network's hidden units. |
| Open Source Code | No | The paper does not provide a link to source code for the described methodology. It mentions related work and future directions but makes no explicit statement about releasing code for this paper's contributions. |
| Open Datasets | No | Fig. 2 plots the convergence of the multiplicative updates in Algorithm 1 for different values of p and q and for three randomly initialized networks with differing numbers of hidden layers but the same overall numbers of input (200), hidden (3750), and output (10) units. ... The networks were initialized with zero-valued biases and zero-mean Gaussian random weights... The experiments are performed on these randomly initialized synthetic networks, not a publicly available dataset. |
| Dataset Splits | No | The paper uses randomly initialized synthetic networks for its demonstration, rather than a specific dataset. Therefore, there are no dataset splits to specify. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | The networks were initialized with zero-valued biases and zero-mean Gaussian random weights whose variances were inversely proportional to the fan-in at each unit (He et al., 2015). |
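The core idea summarized in the table (Algorithm 1 rescales weights without changing the network function) rests on the positive homogeneity of ReLU-style units: scaling a hidden unit's incoming weights and bias by c > 0 and its outgoing weights by 1/c leaves the network function unchanged. The sketch below is not the paper's Algorithm 1 (which iterates multiplicative updates for general ℓp,q-norms); it is a minimal one-pass NumPy illustration for the squared Frobenius norm (p = q = 2), where choosing c to equate each hidden unit's incoming and outgoing ℓ2-norms reduces the total norm by the AM-GM inequality. All variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer ReLU network: x -> relu(W1 x + b1) -> W2 h + b2.
W1 = rng.standard_normal((5, 3))
b1 = np.zeros(5)
W2 = rng.standard_normal((2, 5))
b2 = np.zeros(2)

def forward(x, W1, b1, W2, b2):
    h = np.maximum(0.0, W1 @ x + b1)
    return W2 @ h + b2

x = rng.standard_normal(3)
y0 = forward(x, W1, b1, W2, b2)

# Rescale each hidden unit i: multiply its incoming row (and bias) by c
# and divide its outgoing column by c.  ReLU is positively homogeneous,
# so the network computes the same function.  Setting c so that the
# incoming and outgoing l2-norms match minimizes a^2 c^2 + g^2 / c^2.
W1b, b1b, W2b = W1.copy(), b1.copy(), W2.copy()
for i in range(W1b.shape[0]):
    a = np.linalg.norm(W1b[i, :])   # incoming l2-norm of unit i
    g = np.linalg.norm(W2b[:, i])   # outgoing l2-norm of unit i
    c = np.sqrt(g / a)
    W1b[i, :] *= c
    b1b[i] *= c
    W2b[:, i] /= c

y1 = forward(x, W1b, b1b, W2b, b2)
assert np.allclose(y0, y1)          # function preserved exactly

norm0 = np.sum(W1**2) + np.sum(W2**2)
norm1 = np.sum(W1b**2) + np.sum(W2b**2)
assert norm1 <= norm0 + 1e-12       # squared Frobenius norm not increased
```

With deeper networks, such per-unit rescalings interact across layers, which is why the paper's Algorithm 1 iterates its multiplicative updates to a tolerance δ rather than balancing each unit once.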