Enforcing Idempotency in Neural Networks

Authors: Nikolaj Banke Jensen, Jamie Vicary

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide experimental results for MLP- and CNN-based architectures, with significant improvement in idempotent error over the canonical gradient-based approach. Finally, we demonstrate practical applications of the method as we train generative networks on MNIST and CelebA successfully using only a simple reconstruction loss paired with our method. ... In Section 3, we present experimental data for a variety of fully-connected network architectures, showing that our method outperforms ordinary backpropagation under varied conditions."
Researcher Affiliation | Academia | "1Department of Computer Science, University of Oxford, Oxford, UK. 2Department of Computer Science and Technology, University of Cambridge, Cambridge, UK. Correspondence to: Nikolaj Banke Jensen <EMAIL>, Jamie Vicary <EMAIL>."
Pseudocode | Yes | "In practice, the definition (15) can be implemented in common machine learning frameworks, such as JAX and PyTorch, as a user-defined automatic differentiation rule (see Appendix C). ... Algorithm 1 Modified Backpropagation PyTorch rule."
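A user-defined automatic differentiation rule of the kind the quote describes is typically written in PyTorch as a `torch.autograd.Function` subclass. The sketch below is only a structural illustration of that mechanism, assuming a placeholder rule; the class name is hypothetical and the backward pass does not reproduce the paper's Algorithm 1, it merely marks where a modified gradient rule would intervene.

```python
import torch

class ModifiedBackpropRule(torch.autograd.Function):
    """Hypothetical skeleton of a user-defined autodiff rule.

    Forward is the identity; backward passes the incoming gradient
    through unchanged. The paper's actual rule (Algorithm 1) would
    replace the body of backward() with its own gradient computation.
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Placeholder: a real rule transforms grad_output here.
        return grad_output

# Usage: apply the custom rule inside an ordinary autograd graph.
x = torch.randn(4, requires_grad=True)
y = ModifiedBackpropRule.apply(x)
y.sum().backward()
print(x.grad)  # gradient of sum(identity(x)) w.r.t. x: a vector of ones
```

Defining the rule this way lets the rest of the network use standard layers unchanged, since autograd composes the custom backward with every other operation's gradient automatically.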
Open Source Code | No | The paper describes how the method can be implemented using PyTorch custom autograd functions (Algorithm 1 in Appendix C) but does not provide an explicit statement of code release or a link to a repository for the specific implementation used in the paper's experiments.
Open Datasets | Yes | "We train generative networks on MNIST and CelebA successfully using only a simple reconstruction loss paired with our method. ... The dataset used for training in this section is drawn from a normal distribution with mean 0 and standard deviation 1. ... Figure 8 shows qualitative examples of noise drawn from D being mapped to images resembling samples from the MNIST and CelebA datasets."
Dataset Splits | No | The paper mentions that the synthetic dataset is "sampled i.i.d. at each epoch during training" and that generative models are "trained ... on MNIST and CelebA datasets", but does not specify explicit train/validation/test splits, percentages, or sample counts for any dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions "JAX and PyTorch" as machine learning frameworks where the method can be implemented but does not specify exact version numbers for these or any other software dependencies.
Experiment Setup | Yes | "The optimizer used is SGD. ... a batch size of 1000 is used ... trained for 2,500 epochs. ... Table 3. Training parameters: Optimizer: Adam(lr = 1.0 × 10⁻⁴, β₁ = 0.5, β₂ = 0.999); Dropout probability: 0.05; Batch size: 512; Epochs: 100; Weight initialization: default Kaiming-uniform initialization, U(−√(1/k), √(1/k)), with k determined by the number of input features n."
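The Table 3 parameters can be sketched as a PyTorch training configuration. This is a hedged reconstruction: only the optimizer hyperparameters, dropout probability, batch size, and epoch count come from the quoted table; the model itself is a stand-in single linear layer whose sizes are assumptions, included only because `nn.Linear` applies the default Kaiming-uniform initialization the table references.

```python
import torch

# Stand-in model: layer sizes are illustrative assumptions, not the
# paper's architecture. nn.Linear uses PyTorch's default Kaiming-uniform
# initialization, U(-sqrt(1/k), sqrt(1/k)), matching "Default" in Table 3.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 784),
    torch.nn.Dropout(p=0.05),   # dropout probability 0.05 (Table 3)
)

# Optimizer settings as quoted: Adam(lr = 1.0e-4, beta1 = 0.5, beta2 = 0.999).
optimizer = torch.optim.Adam(model.parameters(), lr=1.0e-4, betas=(0.5, 0.999))

BATCH_SIZE = 512  # Table 3
EPOCHS = 100      # Table 3
```

Note the low β₁ = 0.5 relative to Adam's usual 0.9 default; the quote gives no rationale, but it is the value the paper reports.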