Improving Semantic Understanding in Speech Language Models via Brain-tuning
Authors: Omer Moussa, Dietrich Klakow, Mariya Toneva
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | After testing it on 3 different pretrained model families, we show that brain-tuning not only improves overall alignment with new brain recordings in semantic language regions, but also reduces the reliance on low-level speech features for this alignment. Excitingly, we further show that brain-tuning leads to 1) consistent improvements in performance on semantic downstream tasks and 2) a representational space with increased semantic preference. |
| Researcher Affiliation | Academia | ¹Max Planck Institute for Software Systems, ²Saarland University |
| Pseudocode | No | The paper describes the proposed brain-tuning approach and evaluation strategy using figures (Fig.1a, Fig.1b, Fig.1c) and descriptive text, but it does not contain any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | We make the code available at https://github.com/bridge-ai-neuro/brain-tuning. |
| Open Datasets | Yes | We use the largest public dataset of fMRI recordings (LeBel et al., 2024) for brain-tuning. We use standard datasets for these tasks: TIMIT (Garofolo, 1993), Crema-D (Cao et al., 2014), Speech Commands (Warden, 2018), and SLURP (Bastianelli et al., 2020). |
| Dataset Splits | Yes | The 27 fMRI stories are split into a training set (24 stories), a validation set (2 stories), and a held-out test set (1 story). The training is stopped when the validation loss saturates or begins to diverge. |
| Hardware Specification | No | The paper describes the methods, models, and experimental results, but does not provide specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for the experiments or training. |
| Software Dependencies | Yes | To make the normalized brain alignment comparison focused on language and primary auditory regions, we use FreeSurfer v7 to project the participants' data |
| Experiment Setup | Yes | We used a base learning rate of 5×10⁻⁵ and 10⁻⁴ respectively for the transformer layers and the linear projection head. Both had a linear decay scheduler for the learning rate with a warmup period for 10% of the epochs. The training is stopped when the validation loss saturates or begins to diverge. |
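The experiment setup row (two base learning rates, linear warmup for 10% of steps followed by linear decay) can be sketched as a small schedule function. This is a minimal illustration, not the authors' code: the total step count and parameter-group names (`transformer_layers`, `projection_head`) are hypothetical; only the base learning rates and the warmup/decay shape come from the paper.

```python
# Hypothetical sketch of the paper's LR schedule: linear warmup over the
# first 10% of steps, then linear decay to zero, applied to two parameter
# groups with different base learning rates (values from the paper).
BASE_LRS = {"transformer_layers": 5e-5, "projection_head": 1e-4}


def lr_multiplier(step: int, total_steps: int, warmup_frac: float = 0.1) -> float:
    """Return the schedule multiplier in [0, 1] at a given optimizer step."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Linear warmup: 0 -> 1 over the warmup period.
        return step / max(1, warmup_steps)
    # Linear decay: 1 -> 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def lrs_at(step: int, total_steps: int = 1000) -> dict:
    """Effective learning rate per parameter group at a given step."""
    m = lr_multiplier(step, total_steps)
    return {name: base * m for name, base in BASE_LRS.items()}
```

For example, `lrs_at(0)` gives zero for both groups (start of warmup), and at the end of warmup (step 100 of 1000) each group sits at its base learning rate before decaying linearly to zero.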