Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-Tuning
Authors: Na Lee, Konstantinos Barmpas, Yannis Panagakis, Dimitrios Adamos, Nikolaos Laskaris, Stefanos Zafeiriou
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we comprehensively evaluate current Large Brainwave Foundation Models (LBMs) through systematic fine-tuning experiments across multiple Brain-Computer Interface (BCI) benchmark tasks, including memory tasks and sleep stage classification. Our extensive analysis shows that state-of-the-art LBMs achieve only marginal improvements (0.9%-1.2%) over traditional deep architectures while requiring significantly more parameters (millions vs thousands), raising important questions about their efficiency and applicability in BCI contexts. Moreover, through detailed ablation studies and Low-Rank Adaptation (LoRA), we significantly reduce trainable parameters without performance degradation, while demonstrating that architectural and training inefficiencies limit LBMs' current capabilities. Our experiments span both full model fine-tuning and parameter-efficient adaptation techniques, providing insights into optimal training strategies for BCI applications. |
| Researcher Affiliation | Academia | ¹Imperial College London, ²Cogitat, ³Archimedes / Athena Research Unit, ⁴National and Kapodistrian University of Athens, ⁵Aristotle University of Thessaloniki. Correspondence to: Na Lee <EMAIL>. All listed institutions are universities or research units typically associated with academia, and the provided email address uses an academic domain (.ac.uk). |
| Pseudocode | No | The paper describes methods and processes like Low-Rank Adaptation (LoRA) mathematically and textually, but it does not include any clearly labeled pseudocode blocks or algorithm sections. |
| Open Source Code | No | The paper does not contain an unambiguous statement or a direct link indicating that the authors have released the source code for the methodology described in this paper. |
| Open Datasets | Yes | All models were evaluated in downstream classification tasks for the following five benchmark EEG datasets (Lee et al., 2025): Motor paradigm in High Gamma (Schirrmeister et al., 2017), the ERP (Event-Related Potential) paradigm from Korean University (Hong-Kyung et al., 2019), a Working Memory dataset (Pavlov et al., 2022), Physionet's sleep staging dataset, Sleep-EDF (Kemp et al., 2000), and Eyes Open vs Closed classification on the Physionet Motor dataset (Schalk et al., 2004). |
| Dataset Splits | Yes | Each configuration was trained for 20 epochs (to avoid overfitting) and evaluated using 10-fold subject-independent cross-validation, where samples were split on a subject level such that no participant would be present in both the training and validation sets. |
| Hardware Specification | No | The paper describes experiments and model evaluations but does not provide specific details about the hardware (e.g., GPU models, CPU specifications, memory) used to run these experiments. |
| Software Dependencies | No | The paper refers to various models and techniques such as EEGNet, NeuroGPT, LaBraM, and LoRA, but it does not specify any version numbers for programming languages, libraries, or other software dependencies. |
| Experiment Setup | Yes | Each configuration was trained for 20 epochs (to avoid overfitting) and evaluated using 10-fold subject-independent cross-validation, where samples were split on a subject level such that no participant would be present in both the training and validation sets. ... In all experiments, the scaling factor α (as described in (Hu et al., 2021)) is set to 8. ... we select a relatively high dropout of 0.5. |
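The subject-independent split quoted under "Dataset Splits" can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the subject IDs and fold assignment below are hypothetical, and only the constraint itself (whole subjects assigned to folds, so no participant appears in both training and validation) comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 EEG samples from 10 hypothetical subjects (not the paper's data)
subjects = np.repeat(np.arange(10), 10)
n_folds = 10

# Subject-independent 10-fold CV: shuffle subjects, then assign each whole
# subject to exactly one fold, so splits never share a participant
fold_of_subject = {s: i % n_folds
                   for i, s in enumerate(rng.permutation(np.unique(subjects)))}

splits = []
for fold in range(n_folds):
    val_mask = np.array([fold_of_subject[s] == fold for s in subjects])
    train_idx = np.where(~val_mask)[0]
    val_idx = np.where(val_mask)[0]
    # Sanity check: no subject overlap between training and validation
    assert not set(subjects[train_idx]) & set(subjects[val_idx])
    splits.append((train_idx, val_idx))
```

In practice the same constraint is what `sklearn.model_selection.GroupKFold` enforces when `groups` is set to the subject ID.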
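The LoRA setup quoted above (scaling factor α = 8, following Hu et al., 2021) can be illustrated with a minimal NumPy sketch. Only α = 8 is taken from the paper; the rank `r`, layer dimensions, and initialisation scale below are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; alpha = 8 matches the setup quoted in the paper
d_in, d_out, r, alpha = 64, 64, 4, 8

# Frozen pretrained weight (stands in for a layer of the foundation model)
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors; B starts at zero so the adapted layer is
# initially identical to the pretrained one (Hu et al., 2021)
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))

def lora_forward(x):
    # y = W x + (alpha / r) * B A x; only A and B would be updated in training
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)
```

This is why LoRA cuts trainable parameters so sharply: the adapter adds `r * (d_in + d_out)` weights per layer instead of `d_in * d_out`, while α/r fixes the scale of the low-rank update relative to the frozen weight.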