Self-Attention-Based Contextual Modulation Improves Neural System Identification

Authors: Isaac Lin, Tianye Wang, Shang Gao, Shiming Tang, Tai Sing Lee

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this paper, we demonstrate that adding a simple self-attention layer to a CNN can improve neural response prediction of macaque V1 neurons on two performance metrics: overall tuning correlation and prediction of the tuning peaks. To understand the mechanism driving the improvement, we assessed three contextual modulation mechanisms: convolutions, self-attention, and a fully connected readout layer. We obtained a dataset of neuronal responses measured using two-photon imaging with GCaMP5 from two awake, behaving macaque monkeys... We compared the performance of the ff+sa-CNN model to the parameter-matched baseline ff-CNN model and found that incorporating self-attention significantly improved correlation and both peak tuning metrics (see first two rows of Table 1)." |
| Researcher Affiliation | Academia | Isaac Lin (1), Tianye Wang (2), Shang Gao (1,3), Shiming Tang (2), Tai Sing Lee (1); (1) Carnegie Mellon University, (2) Peking University, (3) Massachusetts Institute of Technology |
| Pseudocode | No | The paper describes methods in text and uses diagrams for model architectures (e.g., Figure 2), but contains no explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | "A.15 CODE FOR EXPERIMENTS: The code is hosted at the GitHub repository: https://github.com/lucanren/sacnn" |
| Open Datasets | Yes | "We obtained a dataset of neuronal responses measured using two-photon imaging with GCaMP5 from two awake, behaving macaque monkeys... in response to 34k and 49k natural images extracted from the ImageNet dataset." |
| Dataset Splits | Yes | "The 30k-50k images in the training set were presented once, and the 1000 images in the validation set were tested with 10 repeats." |
| Hardware Specification | Yes | "Training and computations were performed on an in-house computing cluster with GPU (NVIDIA V100 or similar) nodes." |
| Software Dependencies | No | The paper mentions "optimizer = Adam" and "loss = MSE" but does not specify programming-language or library versions (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | "We list key training hyperparameters here: (1) batch size = 50, (2) learning rate = 0.001, (3) optimizer = Adam, (4) loss = MSE, (5) epochs = 50." |
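The contextual-modulation mechanism under review, a self-attention layer applied to CNN feature maps, can be sketched in a few lines. The following is an illustrative single-head implementation in NumPy, not the authors' released code; the function and weight names (`self_attention`, `w_q`, `w_k`, `w_v`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feat, w_q, w_k, w_v):
    """Single-head self-attention over the spatial positions of a CNN
    feature map: every position attends to all others, providing the
    global contextual modulation the paper studies.

    feat          : (H*W, C) flattened feature map
    w_q, w_k, w_v : (C, d) projection matrices (hypothetical names)
    """
    q, k, v = feat @ w_q, feat @ w_k, feat @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (H*W, H*W) similarities
    return softmax(scores, axis=-1) @ v       # context-weighted features

# Toy usage: a 7x7 feature map with 16 channels, projected to 8 dims.
rng = np.random.default_rng(0)
feat = rng.standard_normal((49, 16))
w_q, w_k, w_v = (rng.standard_normal((16, 8)) for _ in range(3))
out = self_attention(feat, w_q, w_k, w_v)
print(out.shape)  # (49, 8)
```

In the ff+sa-CNN configuration described in the review, such a layer would sit between the convolutional feature extractor and the fully connected readout; the baseline ff-CNN omits it.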
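The training setup in the last row is compact enough to restate as a configuration fragment; the values are those quoted from the paper's appendix, while the dictionary name is illustrative and not taken from the released code.

```python
# Hyperparameters as quoted in the paper; TRAIN_CONFIG is an
# illustrative name, not an identifier from the authors' repository.
TRAIN_CONFIG = {
    "batch_size": 50,
    "learning_rate": 1e-3,   # Adam step size
    "optimizer": "Adam",
    "loss": "MSE",
    "epochs": 50,
}
```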