The Brain’s Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning

Authors: Dulhan Jayalath, Gilad Landau, Brendan Shillingford, Mark Woolrich, Oiwi Parker Jones

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our self-supervised representations by measuring how they scale with unlabelled data and generalise across datasets, subjects, and tasks. ... In all tables and figures, we quote the receiver operating characteristic area under the curve (ROC AUC) where chance is always 0.5 regardless of the class distribution. ... Table 2 shows that our approach achieves two key feats: outperforming comparable state-of-the-art self-supervised methods by 15-27% (part C), and matching the performance of prior self-supervised methods with surgical data (11) while using only non-invasive data.
Researcher Affiliation | Collaboration | 1,3OHBA, University of Oxford; 2Google DeepMind. Correspondence to: <EMAIL>.
Pseudocode | No | The paper describes the network architecture and pretext tasks in detail, but does not present them in a formalized pseudocode or algorithm block.
Open Source Code | No | The paper mentions the OSL library for preprocessing, which is under the BSD-3-Clause licence, but does not explicitly state that the code for the methodology described in this paper is open-source or provide a direct link to its implementation. The URL https://pnpl.robots.ox.ac.uk/bbl is provided, but it is a project page, not a code repository.
Open Datasets | Yes | This work uses publicly available datasets from human studies (Armeni et al., 2022; Gwilliams et al., 2023; Shafto et al., 2014; Taylor et al., 2017; Schoffelen et al., 2019), each with their own ethical approvals and documentation available in their respective publications.
Dataset Splits | Yes | When training with Armeni et al. (2022), we hold out session 009 for validation and 010 for testing. Similarly, when fine-tuning with Gwilliams et al. (2023), we hold out task 1 from subjects 23, 24, 25, 26, and 27, using these sessions for evaluation only. ... For our novel subject experiments, we hold out subjects 1, 2, and 3 entirely and use the data for these subjects during evaluation. In Table 4, the hyperparameters include 'Train ratio 0.8', 'Validation ratio 0.1', 'Test ratio 0.1'.
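The session-level holdout quoted above can be sketched in a few lines. This is a hypothetical illustration of the described split, not the paper's code; the record format and the `session` field name are assumptions.

```python
# Hypothetical sketch of the Armeni et al. (2022) split described in the
# quote: session 009 is held out for validation, 010 for testing, and all
# remaining sessions are used for training.
def split_armeni(records):
    """Partition records into (train, val, test) by session ID."""
    train = [r for r in records if r["session"] not in ("009", "010")]
    val = [r for r in records if r["session"] == "009"]
    test = [r for r in records if r["session"] == "010"]
    return train, val, test

# Toy example: four sessions, two of which are the held-out ones.
records = [{"session": s} for s in ("001", "002", "009", "010")]
train, val, test = split_armeni(records)
```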
Hardware Specification | Yes | All experiments were run on individual NVIDIA V100 and A100 GPUs with up to 40 GiB of GPU memory on a system with up to 1 TiB of RAM.
Software Dependencies | No | The paper mentions using the OSL library for preprocessing and adapting the SEANet architecture, and that the optimizer used is AdamW (Loshchilov & Hutter, 2019), but it does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Table 4 'Experimental hyperparameters' explicitly lists values for Window length (0.5s), ρ (phase 0.5, amplitude 0.2), weights {w1, w2, w3} {1.0, 1.0, 1.0}, dshared (512), dbackbone (512), SEANet convolution channels (512, 512, 512, 512), SEANet downsampling ratios (5, 5, 1), FiLM conditioning dimension (16), Subject embedding dimension (16), Pre-training epochs (200), Optimizer (AdamW), Learning rate (0.000066), and data ratios (Train 0.8, Validation 0.1, Test 0.1).
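For reference, the Table 4 values quoted above can be restated as a plain configuration mapping. The values come from the quote; the key names are illustrative choices, not identifiers from the paper's code.

```python
# Illustrative restatement of the Table 4 hyperparameters as a config
# dict. Only the values are taken from the paper; key names are assumed.
config = {
    "window_length_s": 0.5,
    "rho_phase": 0.5,
    "rho_amplitude": 0.2,
    "pretext_weights": {"w1": 1.0, "w2": 1.0, "w3": 1.0},
    "d_shared": 512,
    "d_backbone": 512,
    "seanet_channels": (512, 512, 512, 512),
    "seanet_downsampling_ratios": (5, 5, 1),
    "film_conditioning_dim": 16,
    "subject_embedding_dim": 16,
    "pretrain_epochs": 200,
    "optimizer": "AdamW",
    "learning_rate": 6.6e-5,  # 0.000066 in the table
    "split": {"train": 0.8, "val": 0.1, "test": 0.1},
}

# Sanity check: the train/val/test ratios should sum to 1.
assert abs(sum(config["split"].values()) - 1.0) < 1e-9
```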