Optimization Dynamics of Equivariant and Augmented Neural Networks
Authors: Oskar Nordenfors, Fredrik Ohlsson, Axel Flinth
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We perform some simple numerical experiments to illustrate our findings." (Section 1, Introduction); Section 4, titled "Experiments", describes training networks and analyzing the results. |
| Researcher Affiliation | Academia | Oskar Nordenfors EMAIL Department of Mathematics and Mathematical Statistics Umeå University; Fredrik Ohlsson EMAIL Department of Mathematics and Mathematical Statistics Umeå University; Axel Flinth EMAIL Department of Mathematics and Mathematical Statistics Umeå University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Methods are described using mathematical formulations and natural language. |
| Open Source Code | Yes | The code for the experiment was written in Python, using the PyTorch library. The code can be accessed at github.com/usinedepain/eq_aug_dyn/. |
| Open Datasets | Yes | These were trained in Equi mode for 250 epochs, manifestly invariant to the rotation action of Z4, to classify MNIST (LeCun et al., 1998). |
| Dataset Splits | Yes | First, we train our models only on the 10000 test examples (instead of the 60000 training samples). |
| Hardware Specification | Yes | The 30 × 3 × 4 trials of the experiment were run in parallel on a supercomputer using NVIDIA Tesla T4 GPUs with 16GB RAM. We run each experiment 30 times on Tesla A40 GPUs situated on a cluster, resulting in about 80 hours of GPU time. |
| Software Dependencies | No | The code for the experiment was written in Python, using the PyTorch library. While software is mentioned, specific version numbers for Python or PyTorch are not provided. |
| Experiment Setup | Yes | We used SGD as the optimizer, with an MSE loss with the labels as one-hot vectors; the learning rate was set to 5×10⁻⁴ and the batch size was set to 100. ... for 50 more epochs of training with gradient descent, with a learning rate of 2.5×10⁻⁴. The images were normalized with respect to mean and standard deviation before being sent to the first layer. |
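The quoted setup (SGD, MSE loss on one-hot labels, learning rate 5×10⁻⁴, batch size 100, mean/std normalization) can be sketched in PyTorch. This is a minimal illustration, not the authors' code: the linear model and the random stand-in batch are placeholders for their architecture and for MNIST.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholder model; the paper's actual architecture differs.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=5e-4)  # quoted learning rate
loss_fn = nn.MSELoss()                              # MSE on one-hot labels

# Stand-in batch of 100 MNIST-shaped images, normalized by mean and std
# as described in the quoted setup.
x = torch.randn(100, 1, 28, 28)
x = (x - x.mean()) / x.std()
labels = torch.randint(0, 10, (100,))
y = nn.functional.one_hot(labels, num_classes=10).float()

# One SGD step of the described training loop.
loss = loss_fn(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```

The later fine-tuning phase mentioned in the quote would reuse the same loop with the optimizer's learning rate lowered to 2.5×10⁻⁴.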