EEG-Language Pretraining for Highly Label-Efficient Clinical Phenotyping

Authors: Sam Gijsen, Kerstin Ritter

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our multimodal models significantly improve over EEG-only models across four clinical evaluations and for the first time enable zero-shot classification as well as retrieval of both neural signals and reports."
Researcher Affiliation | Academia | "1Charité Universitätsmedizin Berlin, Department of Psychiatry and Psychotherapy, Berlin, Germany; 2Hertie Institute for AI in Brain Health, University of Tübingen, Germany. Correspondence to: Sam Gijsen <EMAIL>."
Pseudocode | No | The paper describes its methodology in prose and mathematical equations (Equations 1-13) but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "We provide code and pretrained models at https://github.com/SamGijsen/ELM."
Open Datasets | Yes | "TUEG. The Temple University Hospital (TUH) EEG Corpus is the largest available corpus of hospital EEG data with varying montages, channel counts, and sampling frequencies (n=26846; Obeid & Picone, 2016). ... The data used in this study was provided by the Neural Engineering Data Consortium at Temple University. For further details about this data, please access the following URL: https://isip.piconepress.com/projects/tuh_eeg/html/."
Dataset Splits | Yes | "TUAB. ... Following the literature, we use the provided evaluation set as the hold-out test set. NMT. ... We use the provided train/test split. TUSZ. ... We perform binary classification using 5-fold cross-validation on the provided train and dev sets (n=6491), while testing on the eval set. TUEV. ... We only use the provided train set (5-fold CV). ... For linear evaluation, we train logistic regression models using 10-fold cross-validation for each pretrained model..."
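The split protocol quoted above (e.g., 5-fold CV on the n=6491 TUSZ train/dev pool) can be sketched with a minimal, library-free k-fold splitter. The paper itself uses standard tooling (sklearn); `kfold_indices` is a hypothetical helper name used only for illustration:

```python
def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for contiguous k-fold CV.

    Simplified sketch: no shuffling or stratification, which a real
    evaluation (e.g. sklearn's StratifiedKFold) would typically add.
    """
    # Distribute the remainder so fold sizes differ by at most one sample.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    idx = list(range(n))
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size
```

Every sample appears in exactly one test fold, so the k held-out folds partition the dataset.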
Hardware Specification | Yes | "Models were trained on either an Nvidia GeForce RTX 3090 or Tesla V100 GPU and require less than 24GB of memory."
Software Dependencies | Yes | "We used CUDA v11.3 and PyTorch v1.12.1. For linear evaluation, we train logistic regression models using sklearn (Pedregosa et al., 2011). EEG data received minimal preprocessing (using MNE; Gramfort et al., 2013)."
Experiment Setup | Yes | "All models are pretrained using the LARS optimizer (You et al., 2017) with a cosine-decay learning-rate schedule over 50 epochs, with a warm-up of 4 epochs. The base learning rate is set to 0.3 for EEG-only, 0.01 for ELMs, and 0.06 for ELM-MIL, scaled with the batch size (Base LR × Batch Size / 256; Grill et al., 2020). ... We use a weight-decay parameter of 1 × 10⁻⁴. ... We set the temperature parameter τ to 0.3 for all further analyses. ... For the supervised learning baseline, we use the identical EEG encoder backbone as used for all other analyses and use 60-second crops. ... The Adam learning rate is set to 0.001 and we use the validation set to select weight decay out of [0.1, 0.01, 0.0001]. We use a batch size of 256 and train using the cross-entropy loss."
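The schedule described above (base LR linearly scaled by batch size / 256, a 4-epoch linear warm-up, then cosine decay over the remaining epochs) can be sketched in plain Python. `scaled_base_lr` and `lr_at_epoch` are our own illustrative names, not functions from the released code:

```python
import math

def scaled_base_lr(base_lr, batch_size):
    # Linear scaling rule quoted in the paper: Base LR * Batch Size / 256.
    return base_lr * batch_size / 256

def lr_at_epoch(epoch, base_lr, batch_size, total_epochs=50, warmup_epochs=4):
    """Per-epoch learning rate: linear warm-up followed by cosine decay.

    Sketch only; the paper applies this with the LARS optimizer, and
    real implementations usually step the schedule per iteration.
    """
    peak = scaled_base_lr(base_lr, batch_size)
    if epoch < warmup_epochs:
        # Ramp linearly from peak/warmup_epochs up to the peak LR.
        return peak * (epoch + 1) / warmup_epochs
    # Cosine decay from the peak toward zero over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * peak * (1 + math.cos(math.pi * progress))
```

For example, with `base_lr=0.3` and `batch_size=256` the schedule peaks at 0.3 at the end of warm-up and decays smoothly toward zero by epoch 50; doubling the batch size to 512 doubles the peak to 0.6.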