Remove Symmetries to Control Model Expressivity and Improve Optimization

Authors: Liu Ziyin, Yizhou Xu, Isaac Chuang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first prove two concrete mechanisms through which symmetries lead to reduced capacities and ignored features during training and inference. We then propose a simple and theoretically justified algorithm, syre, to remove almost all symmetry-induced low-capacity states in neural networks. When this type of entrapment is especially a concern, removing symmetries with the proposed method is shown to correlate well with improved optimization or performance. ... We apply the method to solve a broad range of practical problems where symmetry-impaired training can be a major concern (Section 6).
Researcher Affiliation | Collaboration | 1 Research Laboratory of Electronics, Massachusetts Institute of Technology; 2 Physics & Informatics Laboratories, NTT Research; 3 Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne; 4 Department of Physics, Massachusetts Institute of Technology
Pseudocode | No | The paper describes methods and algorithms using mathematical equations and prose (e.g., Equation 6 describes the syre algorithm), but it does not contain a dedicated pseudocode block or algorithm listing with structured, step-by-step instructions formatted like code.
Open Source Code | Yes | An implementation of syre can be found at https://github.com/xu-yz19/syre/.
Open Datasets | Yes | We initialize a two-layer ReLU neural network on a low-capacity state where a fraction of the hidden neurons are identical (corresponding to the symmetric states of the permutation symmetry) and train with and without removing the symmetries. ... ResNet ... on the CIFAR-10 dataset ... In Figure 9, we train a β-VAE (Kingma & Welling, 2013; Higgins et al., 2016) on the Fashion-MNIST dataset. ... We train a ResNet18 together with a two-layer projection head over the CIFAR-100 dataset ... In Figure 11, we train a CNN ... over the MNIST dataset.
Dataset Splits | Yes | For the data, we randomly permute the pixels of the training and test sets 9 times, forming 10 different tasks (including the original MNIST).
Hardware Specification | Yes | In all experiments, we train the models on the CIFAR-10 dataset with a single A6000 GPU.
Software Dependencies | No | The paper mentions algorithms and frameworks like PPO (Schulman et al., 2017) and PyBullet's Ant problem (Coumans & Bai, 2016), but it does not specify concrete software dependencies with version numbers (e.g., 'PyTorch 1.x' or 'TensorFlow 2.x').
Experiment Setup | Yes | We train a two-layer ReLU net in a teacher-student scenario ... We choose the Adam optimizer, learning rate 0.01, and weight decay 0.01. ... For syre and weight decay, we choose weight decay from 0.1 to 10. ... For dropout, we choose a dropout rate from 0.01 to 0.6. ... We use a four-layer FCN with 300 neurons in each layer trained on the MNIST dataset with batch size 64.
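The entrapment mechanism quoted above (identical hidden neurons as symmetric states of the permutation symmetry) can be checked with a few lines of NumPy. This is a hypothetical minimal reproduction, not the paper's syre implementation: in a two-layer ReLU net, two neurons initialized with identical weights receive identical gradients, so plain gradient descent can never separate them and the network is stuck at effective width h − 1.

```python
import numpy as np

# Two-layer ReLU net: f(x) = a @ relu(W @ x).
# Duplicate one hidden neuron and train with plain gradient descent;
# the duplicated pair receives identical gradients and never separates.
rng = np.random.default_rng(0)
d, h = 5, 4
W = rng.normal(size=(h, d))
a = rng.normal(size=h)
W[1] = W[0].copy()   # neuron 1 duplicates neuron 0 (permutation-symmetric state)
a[1] = a[0]

X = rng.normal(size=(32, d))
y = rng.normal(size=32)

lr = 0.05
for _ in range(200):
    z = X @ W.T                  # (32, h) pre-activations
    r = np.maximum(z, 0.0)       # ReLU
    err = r @ a - y              # squared-error residual
    grad_a = r.T @ err / len(y)
    grad_W = ((err[:, None] * a) * (z > 0)).T @ X / len(y)
    a -= lr * grad_a
    W -= lr * grad_W

# The pair stays identical for the whole run: a symmetry-induced
# low-capacity state that symmetry removal (e.g., syre) is meant to break.
print("neurons 0 and 1 still identical:", np.allclose(W[0], W[1]))
```

Breaking the tie, for instance by adding a small fixed random perturbation to the initialization, lets the two neurons drift apart, which is the kind of intervention the paper's symmetry-removal method formalizes.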