Improving Equivariant Networks with Probabilistic Symmetry Breaking
Authors: Hannah Lawrence, Vasco Portilheiro, Yan Zhang, Sékou-Oumar Kaba
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that SymPE significantly improves the performance of group-equivariant and graph neural networks across diffusion models for graphs, graph autoencoders, and lattice spin-system modeling. "8 EXPERIMENTS: We evaluate SymPE empirically on three tasks: autoencoding graphs with EGNN (Section 8.1), graph generation with the DiGress diffusion process (Section 8.2), and predicting ground states of Ising models with G-CNNs (Section 8.3). In all cases, we find that SymPE outperforms baselines, both without symmetry breaking and with other symmetry-breaking methods." |
| Researcher Affiliation | Collaboration | Hannah Lawrence1, Vasco Portilheiro2, Yan Zhang3,4, Sékou-Oumar Kaba4,5; 1Department of Electrical Engineering and Computer Science, MIT; 2Gatsby Computational Neuroscience Unit, UCL; 3Samsung SAIT AI Lab, Montreal; 4Mila Quebec Artificial Intelligence Institute; 5McGill University |
| Pseudocode | Yes | Algorithm 1 SymPE: Symmetry-Breaking Positional Encodings. 1: Inputs: input x ∈ X. 2: Learnable parameters: learned vector v ∈ V, equivariant neural network parameters θ, canonicalization parameters ϕ. 3: Sample g ∼ hϕ(x) ▷ Sample group element for canonicalization. 4: v ← gv ▷ Apply group element to learned vector. 5: Return fθ(x ⊕ v) ▷ Forward pass with positional encoding |
| Open Source Code | Yes | Code available at: https://github.com/hannahlawrence/symm-breaking |
| Open Datasets | Yes | We evaluate combining our method with Di Gress on the QM9 (Wu et al., 2017) and MOSES (Polykovskiy et al., 2020) datasets. |
| Dataset Splits | Yes | We define a training set of Hamiltonian parameters by setting Jx = 1 and sampling Jy ∼ Unif(−3, 3) and h ∼ Unif(0, 2), with parameters constant over the lattice. The training set is of size 1024; the validation set also contains 1024 samples. We consider two test sets: an in-distribution (ID) test set of 10,000 regularly sampled Hamiltonian parameter values in the same range, and an out-of-distribution (OOD) test set, which is the ID test set augmented with rotations of the Hamiltonian parameters (J, h). |
| Hardware Specification | Yes | We see that the computational overhead of SymPE is small compared to the vanilla G-CNN. Table 3: Test set energies of predicted configurations, as Method (Energy ID / Energy OOD / Parameters / Forward time in s): Random configurations (0.0 / 0.0 / – / –); MLP (−1.22 / −0.93 / 3.2M / 8×10⁻³); MLP + aug. (−1.24 / −1.24 / 3.2M / 8×10⁻³); G-CNN (−0.69 / −0.69 / 397K / 0.47); G-CNN + noise (−1.28 / −1.28 / 399K / 0.49); Relaxed group convolution (−1.49 / −1.42 / 463K / 0.47); G-CNN + SymPE (Ours) (−1.46 / −1.47 / 468K / 0.52); Ground truth (−1.60 / −1.60 / – / –). We also compare the parameter count and forward time (on Nvidia Quadro RTX 8000 GPUs) of the different models. |
| Software Dependencies | No | No specific software versions are mentioned. The paper mentions using 'networkx' and the EquiAdapt library (Mondal et al., 2023), but without specific version numbers. |
| Experiment Setup | Yes | In our experiments, we followed the training hyperparameters of Satorras et al. (2021), but trained for fewer epochs (50) using their erdosrenyinodes 0.25 none dataset and the AE architecture. Moreover, we follow the setup of Satorras et al. (2021) and break symmetries at the input, but later also compare to breaking symmetries of the embedding (post-encoding). Unsupervised training is performed by having the neural network ϕ : J → [0, 1]^Λ output the probability that each spin is up, via a softmax on the last layer; symmetry-breaking elements are sampled through a canonicalization given by a G-CNN, as in Kaba et al. (2023). When there is a tie in the canonicalization, an element of the argmax set is chosen at random. We use the EquiAdapt library (Mondal et al., 2023) to implement the canonicalization. During training, we compute the expectation value of the spin at each site, treat this as the configuration, and use the Hamiltonian as the loss function. At evaluation, we sample a spin value for each site using the probabilities output by the neural network. We define a training set of Hamiltonian parameters by setting Jx = 1 and sampling Jy ∼ Unif(−3, 3) and h ∼ Unif(0, 2), with parameters constant over the lattice. The training set is of size 1024; the validation set also contains 1024 samples. We used the QM9 dataset of small molecules, incorporating explicit hydrogen atoms, to evaluate the validity and uniqueness of generated molecular graphs along with the negative log-likelihood of the test set. For the MOSES dataset, we report the negative log-likelihood on the withheld test set, not on the scaffold test set. The graphs were preprocessed similarly to the standard DiGress setup, with node features representing atom types and edge features representing bond types. We incorporated spectral features into the network to improve expressivity, following the methodology outlined in the original paper. The positional encodings introduced by SymPE were concatenated to node and edge features during the diffusion process. |
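The Algorithm 1 cell above can be sketched as runnable code. This is a minimal illustration, not the authors' implementation: the C4 rotation group acting on a small feature grid, the uniform canonicalization logits, and the toy `f_theta` are all assumptions made for the sketch; in the paper, `h_phi` is a learned canonicalization network.

```python
import numpy as np

rng = np.random.default_rng(0)

def act(g, v):
    """Apply group element g (number of 90-degree rotations, C4 group) to v."""
    return np.rot90(v, k=g, axes=(-2, -1))

def canonicalize(x, phi_logits):
    """Step 3: sample g ~ h_phi(x), a distribution over group elements.
    Here phi_logits stands in for the output of a learned network."""
    probs = np.exp(phi_logits) / np.exp(phi_logits).sum()
    return rng.choice(len(phi_logits), p=probs)

def sympe_forward(f_theta, x, v, phi_logits):
    """Algorithm 1: sample g, transform the learned encoding v by g,
    and concatenate it to the input before the equivariant forward pass."""
    g = canonicalize(x, phi_logits)            # step 3
    v_g = act(g, v)                            # step 4: v <- g.v
    return f_theta(np.concatenate([x, v_g]))   # step 5: f_theta(x (+) v)

# Toy usage: x and v are 1x4x4 feature grids; f_theta is a placeholder.
x = rng.standard_normal((1, 4, 4))
v = rng.standard_normal((1, 4, 4))             # learnable positional encoding
out = sympe_forward(lambda z: z.sum(), x, v, np.zeros(4))
```

Because the sampled group element is applied to the encoding before concatenation, the overall map stays equivariant in distribution even though each individual sample breaks the symmetry.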
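The unsupervised Ising objective described in the Experiment Setup row (expectation value of the spin at each site, with the Hamiltonian as the loss) can be illustrated with a short sketch. The periodic square lattice, the per-site mean energy, and the fixed probability grid standing in for the network output are assumptions for illustration.

```python
import numpy as np

def ising_energy(s, Jx, Jy, h):
    """Mean energy per site of configuration s (values in [-1, 1]) on a
    periodic 2D lattice: H = -Jx*sum s(i,j)s(i+1,j) - Jy*sum s(i,j)s(i,j+1) - h*sum s."""
    coupling = Jx * s * np.roll(s, 1, axis=0) + Jy * s * np.roll(s, 1, axis=1)
    return -(coupling + h * s).mean()

def loss(p, Jx, Jy, h):
    """Training loss: energy of the expected spin configuration,
    using <s_i> = 2 p_i - 1 where p_i is the spin-up probability."""
    return ising_energy(2.0 * p - 1.0, Jx, Jy, h)

def sample_config(p, rng):
    """At evaluation, sample discrete spins from the output probabilities."""
    return np.where(rng.random(p.shape) < p, 1.0, -1.0)

rng = np.random.default_rng(0)
p = np.full((8, 8), 0.9)                 # stand-in for the network's output
l = loss(p, Jx=1.0, Jy=0.5, h=0.3)       # differentiable surrogate during training
s = sample_config(p, rng)                # discrete configuration at evaluation
```

Minimizing the energy of the expected configuration keeps the objective differentiable during training, while sampling recovers valid ±1 spin configurations at evaluation time, matching the train/eval split described in the setup.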