KAN: Kolmogorov–Arnold Networks

Authors: Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Hou, Max Tegmark

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. Moreover, KANs are shown to be more accurate and to have faster scaling laws than MLPs in function fitting and PDE solving, both theoretically and empirically. |
| Researcher Affiliation | Academia | Massachusetts Institute of Technology; California Institute of Technology; Northeastern University; The NSF Institute for Artificial Intelligence and Fundamental Interactions |
| Pseudocode | No | The paper describes methods textually and with equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that its code is built on PyTorch but does not explicitly state that the code for this work is open-sourced, nor does it provide a repository link. |
| Open Datasets | Yes | The paper uses specific public datasets and benchmarks: "PDEBench Takamoto et al. (2022)", "MNIST", and the "Feynman datasets Udrescu & Tegmark (2020); Udrescu et al. (2020)". |
| Dataset Splits | Yes | The paper provides specific split information, e.g. "randomly generated 1000 training and test samples from U[-1, 1]^2" for toy function fitting, and for MNIST: "The whole training dataset (60000) and test dataset (10000) are used to evaluate train/test loss/acc." |
| Hardware Specification | Yes | "All models are trained with the Adam Optimizer for 15000 steps with learning rate decay (5000 steps for learning rate 10^-3, 10^-4 and 10^-5), with batch size 1024, on a V100 GPU." |
| Software Dependencies | No | The paper notes "Codes are built based on pytorch Paszke et al. (2019)" and that "Sympy is used to compute the symbolic formula", but it provides no version numbers for these or other key software dependencies. |
| Experiment Setup | Yes | "All models are trained with the Adam Optimizer for 15000 steps with learning rate decay (5000 steps for learning rate 10^-3, 10^-4 and 10^-5), with batch size 1024, on a V100 GPU." For PDE solving, specific parameters are given: "Adam optimizers with a learning rate 10^-3 for 1000 steps except for 10000 steps for MLP (10x training)" and "α = 0.01" for loss balancing. |
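The stepped learning-rate schedule quoted in the experiment setup (15000 Adam steps: 5000 each at 10^-3, 10^-4, and 10^-5) can be sketched in plain Python. This is a minimal illustration of the schedule only; the function and argument names are ours, not from the paper.

```python
def lr_at_step(step, base_lr=1e-3, decay=0.1, interval=5000, total=15000):
    """Stepped decay: the learning rate drops by 10x every `interval` steps.

    Reproduces the schedule quoted above: 5000 steps each at
    1e-3, 1e-4, and 1e-5 over a 15000-step run.
    """
    if not 0 <= step < total:
        raise ValueError("step out of range")
    return base_lr * decay ** (step // interval)
```

In PyTorch (which the paper says the code is built on), the same schedule would presumably correspond to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5000, gamma=0.1)`.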
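The toy-function split quoted under Dataset Splits ("1000 training and test samples from U[-1, 1]^2") amounts to i.i.d. uniform sampling. A minimal sketch, with a helper name and seeds of our own choosing:

```python
import random

def sample_uniform_square(n, dim=2, lo=-1.0, hi=1.0, seed=0):
    """Draw n points i.i.d. from U[lo, hi]^dim.

    Mirrors the toy-function-fitting data described in the paper;
    the seed and helper name are illustrative, not from the paper.
    """
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]

# Independent train/test draws of 1000 samples each from U[-1, 1]^2.
train = sample_uniform_square(1000, seed=0)
test = sample_uniform_square(1000, seed=1)
```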