Remove Symmetries to Control Model Expressivity and Improve Optimization
Authors: Liu Ziyin, Yizhou Xu, Isaac Chuang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first prove two concrete mechanisms through which symmetries lead to reduced capacities and ignored features during training and inference. We then propose a simple and theoretically justified algorithm, syre, to remove almost all symmetry-induced low-capacity states in neural networks. When this type of entrapment is especially a concern, removing symmetries with the proposed method is shown to correlate well with improved optimization or performance. ... We apply the method to solve a broad range of practical problems where symmetry-impaired training can be a major concern (Section 6). |
| Researcher Affiliation | Collaboration | 1Research Laboratory of Electronics, Massachusetts Institute of Technology 2Physics & Informatics Laboratories, NTT Research 3Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne 4Department of Physics, Massachusetts Institute of Technology |
| Pseudocode | No | The paper describes methods and algorithms using mathematical equations and prose (e.g., Equation 6 describes the syre algorithm), but it does not contain a dedicated pseudocode block or algorithm listing with structured, step-by-step instructions formatted like code. |
| Open Source Code | Yes | An implementation of syre can be found at https://github.com/xu-yz19/syre/. |
| Open Datasets | Yes | We initialize a two-layer ReLU neural network on a low-capacity state where a fraction of the hidden neurons are identical (corresponding to the symmetric states of the permutation symmetry) and train with and without removing the symmetries. ... ResNet. ... on the CIFAR-10 dataset... In Figure 9, we train a β-VAE (Kingma & Welling, 2013; Higgins et al., 2016) on the Fashion-MNIST dataset. ... We train a ResNet18 together with a two-layer projection head over the CIFAR-100 dataset... In Figure 11, we train a CNN... over the MNIST dataset. |
| Dataset Splits | Yes | For the data, we randomly permute the pixels of the training and test sets for 9 times, forming 10 different tasks (including the original MNIST). |
| Hardware Specification | Yes | In all experiments, we train the models on the CIFAR10 dataset with a single A6000 GPU. |
| Software Dependencies | No | The paper mentions algorithms and frameworks like PPO (Schulman et al., 2017) and Pybullet's Ant problem (Coumans & Bai, 2016), but it does not specify concrete software dependencies with version numbers (e.g., 'PyTorch 1.x' or 'TensorFlow 2.x'). |
| Experiment Setup | Yes | We train a two-layer ReLU net in a teacher-student scenario... We choose the Adam optimizer, learning rate 0.01, and weight decay 0.01. ... For syre and weight decay, we choose weight decay from 0.1 to 10. ... For dropout, we choose a dropout rate from 0.01 to 0.6. ... We use a four-layer FCN with 300 neurons in each layer trained on the MNIST dataset with batch size 64. |
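The "Open Datasets" row quotes an experiment where a two-layer ReLU network is initialized on a low-capacity state with identical hidden neurons. The mechanism behind this entrapment is simple to verify: neurons related by permutation symmetry receive identical gradients, so plain gradient descent can never separate them, and the network's effective capacity stays reduced. The sketch below is a minimal NumPy illustration of that mechanism (it is not the paper's syre algorithm; network sizes, data, and learning rate are arbitrary choices for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer ReLU network: f(x) = v @ relu(W @ x)
d, h = 5, 4
W = rng.normal(size=(h, d))
v = rng.normal(size=h)

# Symmetric (low-capacity) initialization: hidden neurons 0 and 1 identical.
W[1] = W[0].copy()
v[1] = v[0]

X = rng.normal(size=(32, d))
y = rng.normal(size=32)

lr = 0.01
for _ in range(100):
    pre = X @ W.T                 # (32, h) pre-activations
    act = np.maximum(pre, 0.0)    # ReLU
    err = act @ v - y             # residual; gradient of 0.5 * MSE
    grad_v = act.T @ err / len(y)
    grad_W = ((err[:, None] * v) * (pre > 0)).T @ X / len(y)
    v -= lr * grad_v
    W -= lr * grad_W

# Identical neurons get identical gradients, so they never separate:
# the network is trapped in a symmetric, effectively smaller model.
print(np.allclose(W[0], W[1]) and np.isclose(v[0], v[1]))
```

Breaking the symmetry, e.g. by perturbing the tied neurons at initialization (or, as the paper proposes, by the syre regularizer), is what allows the two neurons to decouple during training.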