Plastic Learning with Deep Fourier Features

Authors: Alex Lewandowski, Dale Schuurmans, Marlos C. Machado

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results show that continual learning performance can be improved by replacing ReLU activations with deep Fourier features combined with regularization. These results hold for different continual learning scenarios (e.g., label noise, class-incremental learning, pixel permutations) on all major supervised learning datasets used for continual learning research, such as CIFAR10, CIFAR100, and tiny-ImageNet.
Researcher Affiliation | Collaboration | Alex Lewandowski1,2, Dale Schuurmans1,2,3,4, Marlos C. Machado1,2,4; 1Department of Computing Science, University of Alberta; 2Amii; 3Google DeepMind; 4Canada CIFAR AI Chair
Pseudocode | Yes | Algorithm 1: Deep Fourier Feature Layer
Open Source Code | No | The paper does not explicitly provide information about open-source code availability for the methodology described.
Open Datasets | Yes | Our experiments use the common image classification datasets for continual learning, namely tiny-ImageNet (Le and Yang, 2015), CIFAR10, and CIFAR100 (Krizhevsky, 2009). ... For MNIST, Fashion MNIST and EMNIST: we use a random sample of 25600 of the observations and a batch size of 256 (unless otherwise indicated, such as the linearly separable experiment).
Dataset Splits | Yes | For CIFAR10 and CIFAR100: full 50000 images for training, 1000 test images for validation, rest for testing. The batch size used was 250. ... tiny-ImageNet: all 100000 images for training, 10000 for validation, 10000 for testing as per the predetermined split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | The optimiser used for all experiments was Adam, and after a sweep on each of the datasets over [0.005, 0.001, 0.0005], we found that α = 0.0005 was most performant. ... The batch size used was 250. Label-noise non-stationarity: 60 epochs, 10 tasks. ... tiny-ImageNet: all 100000 images for training... 80 epochs per task, 10 tasks total. Class incremental learning: 10000 iterations per task, 80 tasks.
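The Pseudocode row refers to Algorithm 1, the deep Fourier feature layer. A minimal NumPy sketch of such a layer, under the assumption that it applies a linear map and then concatenates sine and cosine activations (function and variable names here are illustrative, not from the paper, and details such as frequency scaling are omitted):

```python
import numpy as np

def deep_fourier_feature_layer(x, W, b):
    """Sketch of a deep Fourier feature layer: a linear map followed by
    concatenated sin/cos activations in place of ReLU. Assumed structure
    for Algorithm 1; not a verbatim reproduction of the paper's code."""
    z = x @ W + b  # pre-activation, shape (batch, d_out)
    # Concatenating sin and cos keeps the output bounded in [-1, 1]
    # and doubles the feature width to 2 * d_out.
    return np.concatenate([np.sin(z), np.cos(z)], axis=-1)

# Usage: one layer applied to a small random batch.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # batch of 4, input dim 8
W, b = rng.normal(size=(8, 16)), np.zeros(16)
h = deep_fourier_feature_layer(x, W, b)      # shape (4, 32)
```

Deeper networks would stack such layers, with the next layer's weight matrix sized to the doubled output width (here, 32 inputs).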