Kolmogorov-Arnold Networks Still Catastrophically Forget but Differently from MLP

Authors: Anton Lee, Heitor Murilo Gomes, Yaqian Zhang, W. Bastiaan Kleijn

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our work investigates the claim that KAN avoid catastrophic forgetting, finding that they fail to do so on more complex datasets containing features that overlap between tasks. We give a simple explanation of why and how KAN catastrophically forget. Motivated by evidence suggesting KAN are superior for symbolic regression, we augment KAN in the same ways as multilayer perceptrons (MLP) to perform continual learning tasks, making special accommodations to support KAN. Our experiments found that unmodified KAN often forget more than MLP, but KAN can be better than MLP when combined with continual learning strategies.
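The forgetting the abstract describes is conventionally quantified from a task-by-task performance matrix: evaluate every seen task after each training stage, then measure how far each task's score drops from its best. A minimal illustrative sketch (not the authors' implementation; the matrix values are hypothetical):

```python
import numpy as np

def forgetting(perf):
    """Per-task forgetting from a performance matrix.

    perf[i, j] = R^2 on task j after training through task i
    (entries with j > i are unused and set to -inf here).
    Forgetting for task j is the drop from its best score to its
    score after the final task; the last task is excluded because
    nothing is trained after it.
    """
    T = perf.shape[0]
    final = perf[-1, : T - 1]              # final scores on earlier tasks
    best = perf[:-1, : T - 1].max(axis=0)  # best score each task reached
    return best - final

# Hypothetical 3-task run: task 0 degrades sharply, task 1 mildly.
perf = np.array([
    [0.95, -np.inf, -np.inf],
    [0.40,  0.93,   -np.inf],
    [0.10,  0.85,    0.92],
])
print(forgetting(perf))
```

Under this convention, an unmodified network that "catastrophically forgets" shows large positive entries for early tasks, which is the behaviour the paper reports for plain KAN on overlapping-feature datasets.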
Researcher Affiliation | Academia | (1) School of Engineering and Computer Science, Victoria University of Wellington, Wellington 6140, New Zealand; (2) University of Waikato, Hamilton 3240, New Zealand. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | Yes | The technical appendix contains algorithm pseudocode, while this section summarises how Wise KAN functions.
Open Source Code | Yes | Code: https://github.com/tachyonicClock/AAAI25-clkan
Open Datasets | Yes | TI Feynman re-purposes the Feynman Symbolic Regression Database (Udrescu and Tegmark 2020), used to benchmark KAN in the original paper (Liu et al. 2024), for continual learning by turning each equation into a task. The TI Europe Wind Farm dataset (Gensler 2016; He and Sick 2021) contains hourly wind power generation and day-ahead wind speed forecasts for 45 European wind farms. The TI River Radar dataset's (Nick Lim 2023) objective is to predict rain gauges and the height of a river at two locations.
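Turning a single regression dataset into an ordered task sequence, as described for TI Feynman (one task per equation) and for the time-based splits, can be sketched generically as follows. This is an illustration of the idea, not the paper's preprocessing; the grouping variable and the quarter arithmetic are assumptions:

```python
import numpy as np

def make_tasks(X, y, task_ids):
    """Split one regression dataset into an ordered list of
    (X_t, y_t) tasks, one per distinct task id (e.g. one per
    Feynman equation, or one per quarter of a year)."""
    tasks = []
    for t in np.unique(task_ids):  # np.unique returns sorted ids
        mask = task_ids == t
        tasks.append((X[mask], y[mask]))
    return tasks

# Hypothetical example: 8 quarterly tasks from two years of hourly data.
hours = np.arange(2 * 365 * 24)
X = np.random.default_rng(0).normal(size=(hours.size, 3))
y = X.sum(axis=1)
quarter = (hours // (24 * 91)).clip(max=7)  # ~91 days per quarter
tasks = make_tasks(X, y, quarter)
print(len(tasks))  # 8
```

A continual learner then sees the tasks strictly in order, never revisiting earlier data, which is what makes forgetting measurable.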
Dataset Splits | No | The paper describes how datasets are split into 'tasks' (e.g., 'splitting each equation into a task', 'split the dataset into eight tasks, each equating to a quarter of a year'). It also mentions 'validation performance', implying a validation set is used. However, it does not provide the specific percentages or methodology for standard training/validation/test splits needed to reproduce the experiments.
Hardware Specification | No | The paper states: 'The technical appendix includes other details of the hardware and software used.' This indicates that specific hardware details are not provided in the main text.
Software Dependencies | No | The paper mentions 'Our KAN used a modified version of Efficient KAN (Blealtan and Dash 2024)' and 'We implemented the hyper-parameter search with Optuna (Akiba et al. 2019) and Hydra (Yadan 2019).' While it names these tools, it does not provide version numbers for any software, which are required for reproducibility. It also states: 'The technical appendix includes other details of the hardware and software used.'
Experiment Setup | No | The paper states: 'We use random search to explore hyper-parameters optimising for R² (coefficient of determination) and parameter count. Our R² value is an average over the course of tasks; it is consistent with the Díaz-Rodríguez et al. (2018) average accuracy metric but adapted for R². Figure 6 shows the Pareto front of these results. Details on the search space are in the technical appendix.' It refers to the appendix for specifics, but no hyperparameters or system-level training settings are provided in the main text.
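The "average over the course of tasks" metric quoted above can be made concrete: average the score of every task each time it is evaluated after training, i.e. the mean over the lower triangle of the performance matrix. A sketch based on our reading of the description (Díaz-Rodríguez et al.'s average accuracy with R² substituted for accuracy), not the paper's exact code:

```python
import numpy as np

def average_r2(perf):
    """Average R^2 over the course of tasks: the mean of perf[i, j]
    for every task j evaluated after training through task i >= j.
    Mirrors the Diaz-Rodriguez et al. (2018) average-accuracy metric
    with R^2 in place of accuracy (an interpretation, not the
    authors' implementation)."""
    T = perf.shape[0]
    scores = [perf[i, j] for i in range(T) for j in range(i + 1)]
    return float(np.mean(scores))

# Hypothetical 3-task matrix; only the lower triangle is used.
perf = np.array([
    [0.90, 0.00, 0.00],
    [0.80, 0.70, 0.00],
    [0.60, 0.50, 0.40],
])
print(round(average_r2(perf), 4))  # 0.65
```

Optimising this quantity jointly with parameter count, as the quoted passage describes, yields the Pareto front the paper shows in Figure 6.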