SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments

Authors: Simon Dahan, Gabriel Bénédict, Logan Williams, Yourong Guo, Daniel Rueckert, Robert Leech, Emma Robinson

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We validate our approach on 7T task-fMRI data from 174 healthy participants engaged in the movie-watching experiment from the Human Connectome Project (HCP). Results show that it is possible to detect which movie clips an individual is watching purely from their brain activity, even for individuals and movies not seen during training. Further analysis of attention maps reveals that our model captures individual patterns of brain activity that reflect semantic and visual systems. This opens the door to future personalised simulations of brain function. Code & pre-trained models will be made available at https://github.com/metrics-lab/sim.
Researcher Affiliation Collaboration Simon Dahan1 Gabriel Bénédict2 Logan Z. J. Williams1 Yourong Guo1 Daniel Rueckert3,4,5 Robert Leech6 Emma C. Robinson1 1Research Department of Biomedical Computing, BMEIS, King's College London 2Amazon, Madrid 3Institute for AI in Medicine, Technical University of Munich 4Department of Computing, Imperial College London 5Munich Center for Machine Learning (MCML) 6Institute of Psychiatry, Psychology & Neuroscience, King's College London
Pseudocode No The paper describes the methodology using textual explanations and figures, such as Figure 1 for the overall SIM framework, but does not include any explicit pseudocode blocks or algorithms.
Open Source Code Yes Implementation provided at https://github.com/metrics-lab/sim
Open Datasets Yes In this paper, stimuli and accompanying brain recordings were taken from 174 participants, aged 29.4 ± 3.3 years (68 male and 106 female), who were scanned as part of the HCP 7T movie-watching experiment (Van Essen et al., 2013; Finn & Bandettini, 2021). Functional MRI data were downloaded from the Movie Task fMRI 1.6mm/59k FIX-Denoised package available at https://db.humanconnectome.org/.
Dataset Splits Yes Subjects were partitioned into train/validation/test splits of size 124/25/25, while stratifying sex and age distribution across splits, with fMRI from left and right hemispheres treated as independent samples but placed in the same split. This corresponds to 992 training, 200 validation and 200 testing samples.
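The split described above (124/25/25 subjects, stratified by sex and age, with both hemispheres of a subject kept in the same split) can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: the subject-record layout, the helper name stratified_subject_split, and proportional per-stratum allocation are assumptions.

```python
# Hypothetical sketch of the stratified train/val/test subject split.
# Subject-record fields and helper names are assumptions, not the authors' code.
import random
from collections import defaultdict

def stratified_subject_split(subjects, train_n=124, val_n=25, test_n=25, seed=0):
    """Partition subjects into train/val/test, balancing each sex stratum.

    `subjects` is a list of dicts like {"id": ..., "sex": "M"/"F", "age": ...}.
    """
    rng = random.Random(seed)
    by_sex = defaultdict(list)
    for s in subjects:
        by_sex[s["sex"]].append(s)

    splits = {"train": [], "val": [], "test": []}
    total = train_n + val_n + test_n
    for sex, group in by_sex.items():
        rng.shuffle(group)
        # Allocate each stratum proportionally to the target split sizes.
        n_train = round(len(group) * train_n / total)
        n_val = round(len(group) * val_n / total)
        splits["train"] += group[:n_train]
        splits["val"] += group[n_train:n_train + n_val]
        splits["test"] += group[n_train + n_val:]
    return splits

# Hemispheres are treated as independent samples but stay in the same split:
# train_samples = [(s["id"], h) for s in splits["train"] for h in ("L", "R")]
```

With 68 male and 106 female subjects, proportional allocation recovers exactly the 124/25/25 partition quoted above; age balance would additionally require sorting or binning within each stratum.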
Hardware Specification Yes All experiments were run on 4 NVIDIA V100 GPUs (32 GB of memory).
Software Dependencies No The paper mentions several software components and models used (e.g., the torchaudio library, OpenCV, wav2vec2.0, the VDVAE model, the Versatile Diffusion model, the AdamW optimizer, DeiT-small, ViT), but it does not provide specific version numbers for these software dependencies, which is required for reproducibility.
Experiment Setup Yes For all training phases (vsMAE pre-training and tri-modal CLIP alignment), the AdamW (Loshchilov & Hutter, 2019) optimiser was used with LR = 3e-4 and cosine decay. Distributed training was implemented in all experiments with a batch size of 64 (per GPU) for the vsMAE pre-training task. For the tri-modal CLIP training, batch size was maximised across all GPU instances to 256 by implementing aggregation across instances. ... Following results in Appendix C.7, we use a masking ratio of ρ = 50% in all vsMAE pre-training. ... applying 50 DDIM steps with a strength of 0.75.
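The optimisation setup quoted above (AdamW, LR 3e-4, cosine decay, per-GPU batch size of 64) can be sketched in PyTorch as follows. This is a minimal illustrative loop, not the authors' implementation: the model, loss, and step count are placeholders.

```python
# Hypothetical sketch of the reported optimisation setup: AdamW with
# LR = 3e-4 and cosine decay. Model, loss, and step count are placeholders.
import torch

model = torch.nn.Linear(32, 8)  # stand-in for the vsMAE / SIM model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for step in range(100):
    x = torch.randn(64, 32)          # batch size of 64 per GPU (pre-training)
    loss = model(x).pow(2).mean()    # placeholder reconstruction-style loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                 # cosine decay of the learning rate
```

In the distributed tri-modal CLIP phase, the paper reports aggregating across GPU instances to reach an effective batch size of 256; in PyTorch this is typically done with DistributedDataParallel plus gathering of embeddings before the contrastive loss.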