Improving Semantic Understanding in Speech Language Models via Brain-tuning
Authors: Omer Moussa, Dietrich Klakow, Mariya Toneva
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | After testing it on 3 different pretrained model families, we show that brain-tuning not only improves overall alignment with new brain recordings in semantic language regions, but also reduces the reliance on low-level speech features for this alignment. Excitingly, we further show that brain-tuning leads to 1) consistent improvements in performance on semantic downstream tasks and 2) a representational space with increased semantic preference. |
| Researcher Affiliation | Academia | ¹Max Planck Institute for Software Systems, ²Saarland University |
| Pseudocode | No | The paper describes the proposed brain-tuning approach and evaluation strategy using figures (Fig.1a, Fig.1b, Fig.1c) and descriptive text, but it does not contain any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | We make the code available at https://github.com/bridge-ai-neuro/brain-tuning. |
| Open Datasets | Yes | We use the largest public dataset of fMRI recordings (LeBel et al., 2024) for brain-tuning. We use standard datasets for these tasks: TIMIT (Garofolo, 1993), Crema-D (Cao et al., 2014), Speech Commands (Warden, 2018), and SLURP (Bastianelli et al., 2020). |
| Dataset Splits | Yes | The 27 fMRI stories are split into a training set (24 stories), a validation set (2 stories), and a held-out test set (1 story). The training is stopped when the validation loss saturates or begins to diverge. |
| Hardware Specification | No | The paper describes the methods, models, and experimental results, but does not provide specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for the experiments or training. |
| Software Dependencies | Yes | To make the normalized brain alignment comparison focused on language and primary auditory regions, we use FreeSurfer v7 to project the participants' data |
| Experiment Setup | Yes | We used a base learning rate of 5×10⁻⁵ and 10⁻⁴ respectively for the transformer layers and the linear projection head. Both had a linear decay scheduler for the learning rate with a warmup period for 10% of the epochs. The training is stopped when the validation loss saturates or begins to diverge. |
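The experiment setup row (two base learning rates, linear warmup for 10% of steps followed by linear decay) can be sketched as a small schedule function. This is a minimal illustration, not the authors' code: the total step count and parameter-group names (`transformer_layers`, `projection_head`) are hypothetical; only the base learning rates and the warmup/decay shape come from the paper.

```python
# Hypothetical sketch of the paper's LR schedule: linear warmup over the
# first 10% of steps, then linear decay to zero, applied to two parameter
# groups with different base learning rates (values from the paper).
BASE_LRS = {"transformer_layers": 5e-5, "projection_head": 1e-4}


def lr_multiplier(step: int, total_steps: int, warmup_frac: float = 0.1) -> float:
    """Return the schedule multiplier in [0, 1] at a given optimizer step."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Linear warmup: 0 -> 1 over the warmup period.
        return step / max(1, warmup_steps)
    # Linear decay: 1 -> 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def lrs_at(step: int, total_steps: int = 1000) -> dict:
    """Effective learning rate per parameter group at a given step."""
    m = lr_multiplier(step, total_steps)
    return {name: base * m for name, base in BASE_LRS.items()}
```

For example, `lrs_at(0)` gives zero for both groups (start of warmup), and at the end of warmup (step 100 of 1000) each group sits at its base learning rate before decaying linearly to zero.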