Remove Symmetries to Control Model Expressivity and Improve Optimization

Authors: Liu Ziyin, Yizhou Xu, Isaac Chuang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first prove two concrete mechanisms through which symmetries lead to reduced capacities and ignored features during training and inference. We then propose a simple and theoretically justified algorithm, syre, to remove almost all symmetry-induced low-capacity states in neural networks. When this type of entrapment is especially a concern, removing symmetries with the proposed method is shown to correlate well with improved optimization or performance. ... We apply the method to solve a broad range of practical problems where symmetry-impaired training can be a major concern (Section 6).
Researcher Affiliation | Collaboration | 1 Research Laboratory of Electronics, Massachusetts Institute of Technology; 2 Physics & Informatics Laboratories, NTT Research; 3 Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne; 4 Department of Physics, Massachusetts Institute of Technology
Pseudocode | No | The paper describes methods and algorithms using mathematical equations and prose (e.g., Equation 6 describes the syre algorithm), but it does not contain a dedicated pseudocode block or algorithm listing with structured, step-by-step instructions formatted like code.
Open Source Code | Yes | An implementation of syre can be found at https://github.com/xu-yz19/syre/.
Open Datasets | Yes | We initialize a two-layer ReLU neural network on a low-capacity state where a fraction of the hidden neurons are identical (corresponding to the symmetric states of the permutation symmetry) and train with and without removing the symmetries. ... ResNet ... on the CIFAR-10 dataset ... In Figure 9, we train a β-VAE (Kingma & Welling, 2013; Higgins et al., 2016) on the Fashion-MNIST dataset. ... We train a ResNet18 together with a two-layer projection head over the CIFAR-100 dataset ... In Figure 11, we train a CNN ... over the MNIST dataset.
Dataset Splits | Yes | For the data, we randomly permute the pixels of the training and test sets 9 times, forming 10 different tasks (including the original MNIST).
Hardware Specification | Yes | In all experiments, we train the models on the CIFAR-10 dataset with a single A6000 GPU.
Software Dependencies | No | The paper mentions algorithms and frameworks like PPO (Schulman et al., 2017) and PyBullet's Ant problem (Coumans & Bai, 2016), but it does not specify concrete software dependencies with version numbers (e.g., 'PyTorch 1.x' or 'TensorFlow 2.x').
Experiment Setup | Yes | We train a two-layer ReLU net in a teacher-student scenario ... We choose the Adam optimizer, learning rate 0.01, and weight decay 0.01. ... For syre and weight decay, we choose weight decay from 0.1 to 10. ... For dropout, we choose a dropout rate from 0.01 to 0.6. ... We use a four-layer FCN with 300 neurons in each layer trained on the MNIST dataset with batch size 64.
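The entrapment mechanism quoted above (identical hidden neurons as symmetric states of the permutation symmetry) can be checked with a few lines of NumPy. This is a hypothetical minimal reproduction, not the paper's syre implementation: in a two-layer ReLU net, two neurons initialized with identical weights receive identical gradients, so plain gradient descent can never separate them and the network is stuck at effective width h − 1.

```python
import numpy as np

# Two-layer ReLU net: f(x) = a @ relu(W @ x).
# Duplicate one hidden neuron and train with plain gradient descent;
# the duplicated pair receives identical gradients and never separates.
rng = np.random.default_rng(0)
d, h = 5, 4
W = rng.normal(size=(h, d))
a = rng.normal(size=h)
W[1] = W[0].copy()   # neuron 1 duplicates neuron 0 (permutation-symmetric state)
a[1] = a[0]

X = rng.normal(size=(32, d))
y = rng.normal(size=32)

lr = 0.05
for _ in range(200):
    z = X @ W.T                  # (32, h) pre-activations
    r = np.maximum(z, 0.0)       # ReLU
    err = r @ a - y              # squared-error residual
    grad_a = r.T @ err / len(y)
    grad_W = ((err[:, None] * a) * (z > 0)).T @ X / len(y)
    a -= lr * grad_a
    W -= lr * grad_W

# The pair stays identical for the whole run: a symmetry-induced
# low-capacity state that symmetry removal (e.g., syre) is meant to break.
print("neurons 0 and 1 still identical:", np.allclose(W[0], W[1]))
```

Breaking the tie, for instance by adding a small fixed random perturbation to the initialization, lets the two neurons drift apart, which is the kind of intervention the paper's symmetry-removal method formalizes.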