The Brain’s Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning

Authors: Dulhan Jayalath, Gilad Landau, Brendan Shillingford, Mark Woolrich, Oiwi Parker Jones

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our self-supervised representations by measuring how they scale with unlabelled data and generalise across datasets, subjects, and tasks. ... In all tables and figures, we quote the receiver operating characteristic area under the curve (ROC AUC) where chance is always 0.5 regardless of the class distribution. ... Table 2 shows that our approach achieves two key feats: outperforming comparable state-of-the-art self-supervised methods by 15-27% (part C), and matching the performance of prior self-supervised methods with surgical data (11) while using only non-invasive data.
Researcher Affiliation | Collaboration | 1,3OHBA, University of Oxford; 2Google DeepMind. Correspondence to: <EMAIL>.
Pseudocode | No | The paper describes the network architecture and pretext tasks in detail, but does not present them in a formalized pseudocode or algorithm block.
Open Source Code | No | The paper mentions the OSL library for preprocessing, which is under the BSD-3-Clause licence, but does not explicitly state that the code for the methodology described in this paper is open-source or provide a direct link to its implementation. The URL https://pnpl.robots.ox.ac.uk/bbl is provided, but it is a project page, not a code repository.
Open Datasets | Yes | This work uses publicly available datasets from human studies (Armeni et al., 2022; Gwilliams et al., 2023; Shafto et al., 2014; Taylor et al., 2017; Schoffelen et al., 2019), each with their own ethical approvals and documentation available in their respective publications.
Dataset Splits | Yes | When training with Armeni et al. (2022), we hold out session 009 for validation and 010 for testing. Similarly, when fine-tuning with Gwilliams et al. (2023), we hold out task 1 from subjects 23, 24, 25, 26, and 27, using these sessions for evaluation only. ... For our novel subject experiments, we hold out subjects 1, 2, and 3 entirely and use the data for these subjects during evaluation. In Table 4, the hyperparameters include 'Train ratio 0.8', 'Validation ratio 0.1', 'Test ratio 0.1'.
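The session-level holdout quoted above can be sketched in a few lines. This is a hypothetical illustration of the described split, not the paper's code; the record format and the `session` field name are assumptions.

```python
# Hypothetical sketch of the Armeni et al. (2022) split described in the
# quote: session 009 is held out for validation, 010 for testing, and all
# remaining sessions are used for training.
def split_armeni(records):
    """Partition records into (train, val, test) by session ID."""
    train = [r for r in records if r["session"] not in ("009", "010")]
    val = [r for r in records if r["session"] == "009"]
    test = [r for r in records if r["session"] == "010"]
    return train, val, test

# Toy example: four sessions, two of which are the held-out ones.
records = [{"session": s} for s in ("001", "002", "009", "010")]
train, val, test = split_armeni(records)
```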
Hardware Specification | Yes | All experiments were run on individual NVIDIA V100 and A100 GPUs with up to 40 GiB of GPU memory on a system with up to 1 TiB of RAM.
Software Dependencies | No | The paper mentions using the OSL library for preprocessing and adapting the SEANet architecture, and that the optimizer used is AdamW (Loshchilov & Hutter, 2019), but it does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Table 4 'Experimental hyperparameters' explicitly lists values for Window length (0.5s), ρ (phase 0.5, amplitude 0.2), weights {w1, w2, w3} {1.0, 1.0, 1.0}, dshared (512), dbackbone (512), SEANet convolution channels (512, 512, 512, 512), SEANet downsampling ratios (5, 5, 1), FiLM conditioning dimension (16), Subject embedding dimension (16), Pre-training epochs (200), Optimizer (AdamW), Learning rate (0.000066), and data ratios (Train 0.8, Validation 0.1, Test 0.1).
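For reference, the Table 4 values quoted above can be restated as a plain configuration mapping. The values come from the quote; the key names are illustrative choices, not identifiers from the paper's code.

```python
# Illustrative restatement of the Table 4 hyperparameters as a config
# dict. Only the values are taken from the paper; key names are assumed.
config = {
    "window_length_s": 0.5,
    "rho_phase": 0.5,
    "rho_amplitude": 0.2,
    "pretext_weights": {"w1": 1.0, "w2": 1.0, "w3": 1.0},
    "d_shared": 512,
    "d_backbone": 512,
    "seanet_channels": (512, 512, 512, 512),
    "seanet_downsampling_ratios": (5, 5, 1),
    "film_conditioning_dim": 16,
    "subject_embedding_dim": 16,
    "pretrain_epochs": 200,
    "optimizer": "AdamW",
    "learning_rate": 6.6e-5,  # 0.000066 in the table
    "split": {"train": 0.8, "val": 0.1, "test": 0.1},
}

# Sanity check: the train/val/test ratios should sum to 1.
assert abs(sum(config["split"].values()) - 1.0) < 1e-9
```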