Optimization and Generalization Guarantees for Weight Normalization
Authors: Pedro Cisneros-Velarde, Zhijie Chen, Sanmi Koyejo, Arindam Banerjee
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present experimental results which illustrate how the normalization terms and other quantities of theoretical interest relate to the training of Weight Norm networks. |
| Researcher Affiliation | Collaboration | Pedro Cisneros-Velarde (VMware Research); Zhijie Chen (University of Illinois Urbana-Champaign); Sanmi Koyejo (Stanford University); Arindam Banerjee (University of Illinois Urbana-Champaign) |
| Pseudocode | No | The paper provides mathematical derivations and theoretical analyses but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions PyTorch 2.0 as a library with built-in implementations of Weight Norm (Section 1) and provides a link to its documentation. However, it does not state that the authors are releasing their own code for the methodology described in the paper. |
| Open Datasets | Yes | We do empirical evaluations on CIFAR-10 (Krizhevsky, 2009) and MNIST (Deng, 2012). |
| Dataset Splits | No | The paper mentions evaluating on CIFAR-10 and MNIST datasets and using mini-batch SGD, but it does not specify the exact training, validation, or test splits used for these datasets. |
| Hardware Specification | Yes | Our experiments were conducted on a computing cluster with AMD EPYC 7713 64-Core Processor and NVIDIA A100 Tensor Core GPU. |
| Software Dependencies | Yes | PyTorch 2.0. PyTorch 2.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.utils.weight_norm.html. Accessed: 05-09-2023. |
| Experiment Setup | Yes | We apply mini-batch stochastic gradient descent (SGD) with batch size 512 to optimize the Weight Norm networks under mean squared loss. ... with learning rate 0.001, and weights initialized independently from a uniform distribution on [−0.5/√m, 0.5/√m]. ... for two different widths m ∈ {512, 1024} on the MNIST dataset, ... the weights are initialized with a uniform distribution on [−5/√m, 5/√m]. |
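The setup quoted above trains Weight Norm networks, where each weight vector is reparameterized as w = g · v/‖v‖. A minimal NumPy sketch of this reparameterization and the quoted initialization is below; the input dimension `d`, the helper name `weight_norm`, and the choice g = ‖v‖ at initialization are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: the paper uses widths m in {512, 1024};
# the input dimension d = 8 is an assumption for this sketch.
d, m = 8, 512

# Initialization as quoted: entries uniform on [-0.5/sqrt(m), 0.5/sqrt(m)].
v = rng.uniform(-0.5 / np.sqrt(m), 0.5 / np.sqrt(m), size=(m, d))

# Assumed convention: set g = ||v_j|| per row, so that w = v at initialization.
g = np.linalg.norm(v, axis=1)

def weight_norm(v, g):
    """Weight Norm reparameterization: row j of the output is g_j * v_j / ||v_j||."""
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return g[:, None] * v / norms

w = weight_norm(v, g)

# By construction, each row of w has Euclidean norm exactly g_j,
# regardless of how v is later updated by SGD.
assert np.allclose(np.linalg.norm(w, axis=1), g)
```

During training, SGD updates `v` and `g` separately (rather than `w` directly), which decouples the direction of each weight vector from its scale.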