Multi-modal brain encoding models for multi-modal stimuli

Authors: Subba Reddy Oota, Khushbu Pahwa, Mounika Marreddy, Maneesh Singh, Manish Gupta, Raju Surampudi Bapi

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We investigate this question by using multiple unimodal and two types of multi-modal models (cross-modal and jointly pretrained) to determine which type of model is more relevant to fMRI brain activity when participants are engaged in watching movies (videos with audio). We observe that both types of multi-modal models show improved alignment in several language and visual regions. This study also helps in identifying which brain regions process unimodal versus multi-modal information. We further investigate the contribution of each modality to multi-modal alignment by carefully removing unimodal features one by one from multi-modal representations, and find that there is additional information beyond the unimodal embeddings that is processed in the visual and language regions.
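The "removal" analysis quoted above can be sketched as follows, under the assumption that unimodal features are removed by linearly regressing them out of the multi-modal embeddings and keeping the residuals. The function `remove_unimodal` and the toy arrays are illustrative, not the authors' code.

```python
import numpy as np

def remove_unimodal(multi, uni):
    """Project the unimodal subspace out of multi-modal embeddings.

    Fits least-squares weights predicting `multi` from `uni` and returns
    the residuals, i.e. the multi-modal information not linearly
    explained by the unimodal features.
    """
    # least-squares fit: uni @ W ≈ multi
    W, *_ = np.linalg.lstsq(uni, multi, rcond=None)
    return multi - uni @ W

# Toy example: 100 time points (TRs), 16-d unimodal, 32-d multi-modal.
rng = np.random.default_rng(0)
uni = rng.standard_normal((100, 16))
multi = uni @ rng.standard_normal((16, 32)) + 0.1 * rng.standard_normal((100, 32))
residual = remove_unimodal(multi, uni)
```

By construction the residuals are orthogonal to the unimodal features, so any brain alignment they still achieve reflects information beyond the unimodal embeddings.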
Researcher Affiliation Collaboration Subba Reddy Oota (Technische Universität Berlin, Germany), Khushbu Pahwa (Rice University, USA), Mounika Marreddy (University of Bonn, Germany), Maneesh Singh (Spector Inc, USA), Manish Gupta (Microsoft, India), Bapi S. Raju (IIIT Hyderabad, India)
Pseudocode No The paper describes methods like 'bootstrap ridge regression' and 'residual approach' but presents them descriptively or with diagrams (Fig. 1B) rather than in structured pseudocode or algorithm blocks.
Open Source Code Yes We make the code publicly available1. 1https://github.com/subbareddy248/multi-modal-brain-stimuli
Open Datasets Yes We experiment with a multi-modal naturalistic fMRI dataset, Movie10 (St-Laurent et al., 2023), obtained from the Courtois NeuroMod databank. ... We used the Movie10 dataset, which is publicly available without any restrictions. The Movie10 dataset can be downloaded from https://github.com/courtois-neuromod/movie10/tree/33a97c01503315e5e09b3ac07c6ccadb8b887dcf.
Dataset Splits Yes Independent encoding models are trained for each subject using data concatenated from two movies (The Bourne Supremacy: 4024 TRs and The Wolf of Wall Street: 6898 TRs). The test set consisted only of data from the Life movie (2028 TRs).
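A minimal sketch of this split, assuming per-movie stimulus-feature and fMRI-response arrays with the TR counts stated above; the array names, feature dimension, and voxel count are illustrative placeholders.

```python
import numpy as np

n_feat, n_vox = 768, 100  # hypothetical feature and voxel dimensions
rng = np.random.default_rng(0)

# TR counts per movie, as reported in the paper.
movies = {"bourne": 4024, "wolf": 6898, "life": 2028}

# Stand-in (features X, responses Y) pairs for each movie.
data = {name: (rng.standard_normal((trs, n_feat)),
               rng.standard_normal((trs, n_vox)))
        for name, trs in movies.items()}

# Train set: TRs concatenated across the two training movies.
X_train = np.concatenate([data["bourne"][0], data["wolf"][0]])
Y_train = np.concatenate([data["bourne"][1], data["wolf"][1]])

# Test set: the held-out Life movie only.
X_test, Y_test = data["life"]
```

Holding out an entire movie (rather than random TRs) avoids temporal leakage between train and test responses.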
Hardware Specification Yes All experiments were conducted on a machine with 1 NVIDIA GeForce-GTX GPU with 16GB GPU RAM.
Software Dependencies No The paper mentions using specific pretrained Transformer models from Huggingface but does not provide specific version numbers for the overall software environment, libraries, or dependencies used for the experiments.
Experiment Setup Yes We used bootstrap ridge regression (Appendix I) with an MSE loss function; the L2 decay (λ) was varied from 10^1 to 10^3. The best λ was chosen by tuning on validation data comprising a randomly chosen 10% subset of the train set, used only for hyper-parameter tuning.
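The tuning procedure described above can be sketched as follows, assuming a single random 10% validation split rather than the repeated resampling of a full bootstrap; `fit_encoding_model`, its defaults, and the closed-form solver are illustrative, not the authors' implementation.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Closed-form ridge regression: W = (XᵀX + λI)⁻¹ XᵀY."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def fit_encoding_model(X, Y, lambdas=(1e1, 1e2, 1e3), val_frac=0.1, seed=0):
    """Select λ by MSE on a random held-out subset of the training data.

    A simplified stand-in for bootstrap ridge regression, which would
    repeat this over many resampled splits and aggregate the choice.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = max(1, int(val_frac * len(X)))
    val, tr = idx[:n_val], idx[n_val:]

    def val_mse(lam):
        W = ridge_fit(X[tr], Y[tr], lam)
        return np.mean((X[val] @ W - Y[val]) ** 2)

    best_lam = min(lambdas, key=val_mse)
    # Refit on the full training set with the selected penalty.
    return ridge_fit(X, Y, best_lam), best_lam
```

One such model would be fit per subject, with voxel-wise predictions on the held-out movie scored against the measured responses.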