Nonparametric Modern Hopfield Models

Authors: Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu, Feng Ruan, Han Liu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "Empirically, we validate our framework in both synthetic and realistic settings for memory retrieval and learning tasks. Code is available at GitHub; future updates are on arXiv. ... We conduct numerical experiments to support our framework in Appendix G. ... G Experimental Studies"
Researcher Affiliation: Academia. "1Department of Computer Science, Northwestern University, Evanston, IL, USA 2Department of Physics and Computer Science, National Taiwan University, Taipei, Taiwan 3Department of Statistics and Data Science, Northwestern University, Evanston, IL, USA. Correspondence to: Jerry Yao-Chieh Hu <EMAIL>, Bo-Yu Chen <EMAIL>, Dennis Wu <EMAIL>, Feng Ruan <EMAIL>, Han Liu <EMAIL>."
Pseudocode: No. The paper describes its methods using mathematical equations and prose. It contains no clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured steps formatted like code.
Open Source Code: Yes. "Empirically, we validate our framework in both synthetic and realistic settings for memory retrieval and learning tasks. Code is available at GitHub; future updates are on arXiv."
Open Datasets: Yes. "In the memory retrieval task, we examine two datasets: MNIST (sparse) and CIFAR10 (dense). ... We evaluate Dense Hopfield, Sparse Hopfield, and Top-K Hopfield models on a Multiple Instance Learning (MIL) task using MNIST bags. ... We utilize four datasets: Elephant, Fox, and Tiger for image annotation (Ilse et al., 2018), and UCSB breast cancer classification (Kandemir et al., 2014). ... We conduct the experiments on five real-world multivariate time series datasets: ETTh1 (Electricity Transformer Temperature-hourly), ETTm1 (Electricity Transformer Temperature-minutely), WTH (Weather), ECL (Electricity Consuming Load), and Traffic."
Dataset Splits: Yes. "In each fold, we utilize a stratified sampling process to partition the data into a training set and a validation set, with a split rate of 0.1."
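The stratified split described above can be sketched as follows. This is a minimal illustration assuming scikit-learn's `train_test_split`; the paper does not name the tool it uses, and the features and labels here are synthetic placeholders.

```python
# Sketch of a stratified train/validation split with a 0.1 split rate,
# assuming scikit-learn (the paper does not specify the implementation).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))     # placeholder features
y = rng.integers(0, 2, size=1000)  # placeholder binary labels

# test_size=0.1 sends 10% of the fold to validation;
# stratify=y preserves the class proportions in both partitions.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0
)
```

With 1,000 examples this yields 900 training and 100 validation samples, with the label distribution of `y_val` matching that of `y` up to rounding.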
Hardware Specification: Yes. "All experiments are conducted on a platform with an NVIDIA GeForce RTX 2080 Ti and an Intel Xeon Silver 4214 @ 2.20 GHz."
Software Dependencies: Yes. "We use PyTorch 1.8.0 for all experiments, and use Ray Tune for hyperparameter search."
Experiment Setup: Yes. "All models are trained using the AdamW optimizer for 150 epochs, with a cosine annealing learning rate decay applied to all models." Table 2 lists the hyperparameters used in the MIL MNIST experiment:
- batch size: 256
- learning rate: 1e-3
- embedding dimension: 256
- number of heads: 4
- head dimension: 64
- test set size: 500
- train set size: 2000
- scaling: 0.1
- number of patterns: 2
- epochs: 150
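The stated optimization setup (AdamW with cosine annealing over 150 epochs) maps directly onto standard PyTorch components. The sketch below uses a placeholder linear model rather than the paper's Hopfield networks, and omits the actual data loading and loss computation.

```python
# Minimal sketch of the reported optimization setup: AdamW + cosine
# annealing LR decay over 150 epochs. The model is a stand-in; real
# training would compute a loss and backpropagate before each step.
import torch

model = torch.nn.Linear(256, 2)  # placeholder, not the Hopfield model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150)

for epoch in range(150):
    # ... iterate over batches of size 256, compute loss, loss.backward() ...
    optimizer.step()   # placeholder step; gradients would be applied here
    scheduler.step()   # follow the cosine decay schedule once per epoch
```

With `T_max=150`, the learning rate follows half a cosine period from 1e-3 down to (near) zero by the final epoch.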