reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Geodesic Optimization for Predictive Shift Adaptation on EEG data

Authors: Apolline Mellot, Antoine Collas, Sylvain Chevallier, Alex Gramfort, Denis A. Engemann

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We performed empirical benchmarks on the cross-site generalization of age-prediction models with resting-state EEG data from a large multi-national dataset (HarMNqEEG), which included 14 recording sites and more than 1500 human participants. Compared to state-of-the-art methods, our results showed that GOPSA achieved significantly higher performance on three regression metrics (R², MAE, and Spearman's ρ) for several source-target site combinations, highlighting its effectiveness in tackling multi-source DA with predictive shifts in EEG data analysis.
Researcher Affiliation	Collaboration	Apolline Mellot, Antoine Collas Inria, CEA, Université Paris-Saclay Palaiseau, France EMAIL EMAIL Sylvain Chevallier TAU Inria, LISN-CNRS, University Paris-Saclay, France. sylvain.chevallier@universite-paris-saclay.fr Alexandre Gramfort Inria, CEA, Université Paris-Saclay Palaiseau, France EMAIL Denis A. Engemann Roche Pharma Research and Early Development, Neuroscience and Rare Diseases, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland. EMAIL
Pseudocode	Yes	Algorithm 1: Train-Time GOPSA; Algorithm 2: Test-Time GOPSA
Open Source Code	Yes	The dataset HarMNqEEG [33] is in open access. We provide the code to reproduce the experiments from the raw data.
Open Datasets	Yes	The HarMNqEEG dataset [33] was used for our numerical experiments. This dataset includes EEG recordings collected from 1564 participants across 14 different study sites, distributed across 9 countries. In our analysis, we consider each study site as a distinct domain.
Dataset Splits	Yes	For each source-target combination we performed a stratified shuffle split approach with 100 repetitions on the target data. Stratification was based on the recording sites to ensure that each split contained a balanced proportion of participants from each site. The regularization parameter λ in Ridge regression was selected with a nested cross-validation (grid search) over a logarithmic grid of values from 10⁻¹ to 10⁵. To evaluate the benefit of GOPSA, we compared it against four baselines.
Hardware Specification	Yes	Experiments with 100 repetitions and all site combinations have been run on a standard Slurm cluster for 12 hours with 250 CPU cores.
Software Dependencies	Yes	Numerical computation was enabled by the scientific Python ecosystem: Matplotlib [27], Scikit-learn [42], Numpy [21], Scipy [54], PyTorch [41], PyRiemann [3], MNE [19] and SKADA [18]. Specifically, PyRiemann [3] is cited as "v0.3, July 2022" and SKADA [18] as "7 2024".
Experiment Setup	Yes	The regularization parameter λ in Ridge regression was selected with a nested cross-validation (grid search) over a logarithmic grid of values from 10⁻¹ to 10⁵. In practice, we use L-BFGS and obtain the gradient using automatic differentiation through the Ridge solution that is plugged into the loss in (8).
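
The evaluation protocol quoted under Dataset Splits and Experiment Setup (shuffle splits stratified on recording site, with the Ridge penalty λ chosen by an inner grid search) can be illustrated with a minimal scikit-learn sketch. This is not the authors' pipeline: the synthetic data, the 50% test fraction, the 5 inner folds, the 7-point grid, and the reduced number of repetitions (10 instead of 100) are all assumptions made for illustration, and the sketch omits the source/target distinction and GOPSA itself.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import StratifiedShuffleSplit

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 3 sites, 40 participants each, 5 features.
n_sites, n_per_site, n_features = 3, 40, 5
sites = np.repeat(np.arange(n_sites), n_per_site)
X = rng.normal(size=(sites.size, n_features))
y = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=sites.size)

# Logarithmic grid for the Ridge penalty, from 1e-1 to 1e5 (grid density is an assumption).
alphas = np.logspace(-1, 5, num=7)

# Outer loop: shuffle splits stratified on the site labels, so every split
# keeps the same proportion of participants from each site.
splitter = StratifiedShuffleSplit(n_splits=10, test_size=0.5, random_state=0)

scores = []
for train_idx, test_idx in splitter.split(X, sites):
    # Inner loop: RidgeCV picks the penalty by cross-validation on the
    # training portion only, so the held-out portion never influences it.
    model = RidgeCV(alphas=alphas, cv=5)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))  # R² on held-out data

mean_r2 = float(np.mean(scores))
```

Passing the site labels as the second argument of `split` is what makes the stratification site-based; the regression target `y` is only used when fitting and scoring.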