Improving Memory Efficiency for Training KANs via Meta Learning

Authors: Zhangchi Zhao, Jun Shu, Deyu Meng, Zongben Xu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on diverse benchmark tasks, including symbolic regression, partial differential equation solving, and image classification, demonstrate the effectiveness of Meta KANs in improving parameter efficiency and memory usage.
Researcher Affiliation | Academia | ¹School of Mathematics and Statistics, Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University; ²Pengcheng Laboratory. Correspondence to: Jun Shu <EMAIL>.
Pseudocode | Yes | Algorithm 1: The Meta KANs Algorithm for Shallow KANs; Algorithm 2: Clusters Determination Algorithm for Deep KANs; Algorithm 3: The Meta KANs Algorithm for Deep KANs.
Open Source Code | Yes | Our code is available at https://github.com/Murphyzc/MetaKAN.
Open Datasets | Yes | We conducted a function-fitting task on the Feynman dataset (Udrescu & Tegmark, 2020)... The convolutional architecture comparisons (shown in Table 3) reveal significant parameter-efficiency gains while maintaining competitive accuracy across datasets. For 4-layer models, Meta KANConv achieves 45.86% accuracy on CIFAR-10... Table 5: classification accuracies and parameter counts for KAN and Meta KAN models across different settings (datasets: SVHN, FMNIST, KMNIST, MNIST, CIFAR-10, CIFAR-100).
Dataset Splits | Yes | During training, the same number of initial points, boundary points, and interior points are used. Following (Zeng et al., 2022), Ni is set to 2000, 4000, 8000, or 12000 for the different dimensions d ∈ {20, 50, 100}, while Nb is set to 100 points per boundary, resulting in a total of 100d boundary points.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or memory) are provided in the paper. The paper mentions "peak GPU memory consumption" but does not specify the GPU hardware.
Software Dependencies | No | No software dependencies with specific version numbers are mentioned in the paper.
Experiment Setup | Yes | In the experiments, we employed the L-BFGS optimizer with an initial learning rate of 1, consistent with the original paper. Additionally, to better balance fitting performance and model complexity, the hidden-layer widths of the meta-learner of Meta KANs were set to 32 and 64. The number of grid points is set to G ∈ {5, 20}. The optimization strategy employs three AdamW optimizers: learnable prompts (η = 10^-4), meta-learner (η = 10^-3), and main network (η = 10^-4). Training details include random horizontal flipping and cropping for the CIFAR datasets, exponential learning-rate decay, and Dropout (p = 0.2) regularization.
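For intuition about the method summarized in the rows above, the core idea is that a small meta-learner generates the coefficients of each KAN activation function from a learnable prompt, so only the prompts and the meta-learner need to be stored. The following is a highly simplified numpy sketch: the embedding size, layer widths, and the use of a Gaussian radial basis in place of the KAN's B-spline basis are all illustrative assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB, HID, G = 8, 32, 5   # prompt size, meta-learner width, basis size (all assumed)

# One learnable prompt per edge (activation) of the KAN; a 2->3 layer has 6 edges.
prompts = rng.normal(size=(6, EMB))

# Meta-learner: a tiny MLP mapping a prompt to G basis coefficients.
W1 = rng.normal(size=(EMB, HID)) * 0.1
W2 = rng.normal(size=(HID, G)) * 0.1

def meta_learner(e):
    # Returns the coefficient vector(s) for the given prompt(s).
    return np.tanh(e @ W1) @ W2

centers = np.linspace(-1.0, 1.0, G)      # fixed grid for the basis functions

def activation(x, coef):
    # Gaussian RBF stand-in for the B-spline basis of a single KAN edge.
    basis = np.exp(-((x[..., None] - centers) ** 2) / 0.5)
    return basis @ coef

# Generate all edge coefficients at once instead of storing them as parameters.
coefs = meta_learner(prompts)
print(coefs.shape)                       # (6, 5): 6 activations, 5 coefficients each
```

The memory saving in this picture comes from training only `prompts`, `W1`, and `W2`, rather than one coefficient vector per activation.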
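The collocation-point scheme quoted in the Dataset Splits row can be illustrated with a short sketch. The unit-hypercube domain and the face-sampling strategy below are assumptions for illustration; the actual PDE domain in (Zeng et al., 2022) may differ.

```python
import numpy as np

def sample_points(d, n_interior, n_per_boundary=100, seed=0):
    """Sample interior and boundary collocation points on [0, 1]^d
    (illustrative domain assumption)."""
    rng = np.random.default_rng(seed)
    interior = rng.uniform(0.0, 1.0, size=(n_interior, d))
    # 100 points per boundary, 100*d in total as in the quoted setup:
    # each batch pins one coordinate to a randomly chosen face (0 or 1).
    batches = []
    for dim in range(d):
        pts = rng.uniform(0.0, 1.0, size=(n_per_boundary, d))
        pts[:, dim] = rng.integers(0, 2, size=n_per_boundary)
        batches.append(pts)
    boundary = np.concatenate(batches)
    return interior, boundary

interior, boundary = sample_points(d=20, n_interior=2000)
print(interior.shape, boundary.shape)   # (2000, 20) (2000, 20)
```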
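The three-optimizer setup described in the Experiment Setup row can be sketched in PyTorch as follows. The parameter-group shapes and the exponential decay factor are illustrative assumptions; only the optimizer choice (AdamW) and the three learning rates come from the quoted text.

```python
import torch

# Toy stand-ins for the three parameter groups (sizes are illustrative).
prompts = torch.nn.Parameter(torch.randn(6, 8))
meta_learner = torch.nn.Sequential(
    torch.nn.Linear(8, 32), torch.nn.Tanh(), torch.nn.Linear(32, 5)
)
main_network = torch.nn.Linear(10, 10)

# Three AdamW optimizers with the learning rates quoted above.
opt_prompts = torch.optim.AdamW([prompts], lr=1e-4)
opt_meta = torch.optim.AdamW(meta_learner.parameters(), lr=1e-3)
opt_main = torch.optim.AdamW(main_network.parameters(), lr=1e-4)

# Exponential learning-rate decay, as mentioned for the classification runs
# (the decay factor 0.95 is an assumption).
sched = torch.optim.lr_scheduler.ExponentialLR(opt_meta, gamma=0.95)
```

In a training step, each optimizer would call `zero_grad()` and `step()` on its own parameter group after a shared backward pass.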