Optimization Dynamics of Equivariant and Augmented Neural Networks

Authors: Oskar Nordenfors, Fredrik Ohlsson, Axel Flinth

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform some simple numerical experiments to illustrate our findings." (Section 1, Introduction). Section 4 is titled "Experiments", where the authors describe training networks and analyzing results.
Researcher Affiliation | Academia | Oskar Nordenfors (EMAIL), Department of Mathematics and Mathematical Statistics, Umeå University; Fredrik Ohlsson (EMAIL), Department of Mathematics and Mathematical Statistics, Umeå University; Axel Flinth (EMAIL), Department of Mathematics and Mathematical Statistics, Umeå University.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; methods are described using mathematical formulations and natural language.
Open Source Code | Yes | "The code for the experiment was written in Python, using the PyTorch library." The code can be accessed at github.com/usinedepain/eq_aug_dyn/.
Open Datasets | Yes | "These were trained in Equi mode for 250 epochs, manifestly invariant to the rotation action of Z4, to classify MNIST (LeCun et al., 1998)."
Dataset Splits | Yes | "First, we train our models only on the 10000 test examples (instead of the 60000 training samples)."
Hardware Specification | Yes | "The 30 × 3 × 4 trials of the experiment were run in parallel on a supercomputer using NVIDIA Tesla T4 GPUs with 16GB RAM." "We run each experiment 30 times on Tesla A40 GPUs situated on a cluster, resulting in about 80 hours of GPU time."
Software Dependencies | No | "The code for the experiment was written in Python, using the PyTorch library." While software is mentioned, specific version numbers for Python or PyTorch are not provided.
Experiment Setup | Yes | "We used SGD as the optimizer, with an MSE loss with the labels as one-hot vectors; the learning rate was set to 5 × 10^-4 and the batch size was set to 100. ... for 50 more epochs of training with gradient descent, with a learning rate of 2.5 × 10^-4. The images were normalized with respect to mean and standard deviation before being sent to the first layer."