Improving Semantic Understanding in Speech Language Models via Brain-tuning

Authors: Omer Moussa, Dietrich Klakow, Mariya Toneva

ICLR 2025

Reproducibility assessment. For each variable, the result is followed by the supporting LLM response.
Research Type: Experimental
LLM Response: "After testing it on 3 different pretrained model families, we show that brain-tuning not only improves overall alignment with new brain recordings in semantic language regions, but also reduces the reliance on low-level speech features for this alignment. Excitingly, we further show that brain-tuning leads to 1) consistent improvements in performance on semantic downstream tasks and 2) a representational space with increased semantic preference."
Researcher Affiliation: Academia
LLM Response: "1 Max Planck Institute for Software Systems, 2 Saarland University" (author email addresses redacted)
Pseudocode: No
LLM Response: The paper describes the proposed brain-tuning approach and evaluation strategy using figures (Fig. 1a, Fig. 1b, Fig. 1c) and descriptive text, but it does not contain any clearly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code: Yes
LLM Response: "We make the code available at https://github.com/bridge-ai-neuro/brain-tuning."
Open Datasets: Yes
LLM Response: "We use the largest public dataset of fMRI recordings (LeBel et al., 2024) for brain-tuning. We use standard datasets for these tasks: TIMIT (Garofolo, 1993), CREMA-D (Cao et al., 2014), Speech Commands (Warden, 2018), and SLURP (Bastianelli et al., 2020)."
Dataset Splits: Yes
LLM Response: "The 27 fMRI stories are split into a training set (24 stories), a validation set (2 stories), and a held-out test set (1 story). The training is stopped when the validation loss saturates or begins to diverge."
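The 24/2/1 story-level split above can be sketched in plain Python. The paper does not say how the stories are assigned to splits, so the seeded shuffle and the function name `split_stories` are assumptions for illustration only:

```python
import random

def split_stories(stories, n_train=24, n_val=2, n_test=1, seed=0):
    """Partition fMRI stories into disjoint train/val/test sets (24/2/1 here).

    Assumption: a seeded random shuffle; the paper does not specify the
    assignment procedure.
    """
    assert len(stories) == n_train + n_val + n_test
    rng = random.Random(seed)
    shuffled = stories[:]
    rng.shuffle(shuffled)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Hypothetical story identifiers standing in for the 27 fMRI stories.
train, val, test = split_stories([f"story_{i:02d}" for i in range(27)])
```

Splitting at the story level (rather than by time window within a story) keeps temporally correlated fMRI samples from leaking between splits.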
Hardware Specification: No
LLM Response: The paper describes the methods, models, and experimental results, but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments or training.
Software Dependencies: Yes
LLM Response: "To make the normalized brain alignment comparison focused on language and primary auditory regions, we use FreeSurfer v7 to project the participants' data."
Experiment Setup: Yes
LLM Response: "We used base learning rates of 5e-5 and 1e-4 for the transformer layers and the linear projection head, respectively. Both had a linear decay scheduler for the learning rate with a warmup period for 10% of the epochs. The training is stopped when the validation loss saturates or begins to diverge."