Multi-modal brain encoding models for multi-modal stimuli
Authors: Subba Reddy Oota, Khushbu Pahwa, Mounika Marreddy, Maneesh Singh, Manish Gupta, Raju Surampudi Bapi
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate this question by using multiple unimodal models and two types of multi-modal models (cross-modal and jointly pretrained) to determine which type of model is more relevant to fMRI brain activity when participants are engaged in watching movies (videos with audio). We observe that both types of multi-modal models show improved alignment in several language and visual regions. This study also helps in identifying which brain regions process unimodal versus multi-modal information. We further investigate the contribution of each modality to multi-modal alignment by carefully removing unimodal features one by one from multi-modal representations, and find that there is additional information beyond the unimodal embeddings that is processed in the visual and language regions. |
| Researcher Affiliation | Collaboration | Subba Reddy Oota (Technische Universität Berlin, Germany), Khushbu Pahwa (Rice University, USA), Mounika Marreddy (University of Bonn, Germany), Maneesh Singh (Spector Inc, USA), Manish Gupta (Microsoft, India), Bapi S. Raju (IIIT Hyderabad, India) |
| Pseudocode | No | The paper describes methods like 'bootstrap ridge regression' and 'residual approach' but presents them descriptively or with diagrams (Fig. 1B) rather than in structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make the code publicly available1. 1https://github.com/subbareddy248/multi-modal-brain-stimuli |
| Open Datasets | Yes | We experiment with a multi-modal naturalistic fMRI dataset, Movie10 (St Laurent et al., 2023), obtained from the Courtois NeuroMod databank. ... We used the Movie10 dataset which is publicly available without any restrictions. Movie10 dataset can be downloaded from https://github.com/courtois-neuromod/movie10/tree/33a97c01503315e5e09b3ac07c6ccadb8b887dcf. |
| Dataset Splits | Yes | Independent encoding models are trained for each subject using data concatenated from two movies (The Bourne Supremacy: 4024 TRs and The Wolf of Wall Street: 6898 TRs). The test set consisted only of data from the Life movie (2028 TRs). |
| Hardware Specification | Yes | All experiments were conducted on a machine with 1 NVIDIA GeForce-GTX GPU with 16GB GPU RAM. |
| Software Dependencies | No | The paper mentions using specific pretrained Transformer models from Huggingface but does not provide specific version numbers for the overall software environment, libraries, or dependencies used for the experiments. |
| Experiment Setup | Yes | We used bootstrap ridge-regression (Appendix I) with MSE loss function; L2-decay (λ) varied from 10^1 to 10^3. Best λ was chosen by tuning on validation data that comprised a randomly chosen 10% subset from the train set, used only for hyper-parameter tuning. |
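The Experiment Setup row describes the encoding-model fit only in prose: ridge regression with an MSE objective, λ swept from 10^1 to 10^3, and a random 10% validation subset of the training data used solely for tuning λ. A minimal numpy-only sketch of that procedure (not the authors' released code; all shapes and data here are synthetic stand-ins, and the closed-form solver replaces whatever bootstrap ridge implementation the paper actually uses):

```python
import numpy as np

def ridge_fit(X, Y, lmbda):
    """Closed-form ridge regression: W = (X'X + lambda*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lmbda * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
# Hypothetical shapes: stimulus features X (TRs x features), fMRI responses Y (TRs x voxels)
X = rng.standard_normal((500, 64))
Y = X @ rng.standard_normal((64, 100)) + 0.1 * rng.standard_normal((500, 100))

# Random 10% validation subset of the training data, used only for tuning lambda
n = X.shape[0]
idx = rng.permutation(n)
val_idx, tr_idx = idx[: n // 10], idx[n // 10:]

best_lmbda, best_mse = None, np.inf
for lmbda in [1e1, 1e2, 1e3]:            # L2-decay swept from 10^1 to 10^3
    W = ridge_fit(X[tr_idx], Y[tr_idx], lmbda)
    mse = np.mean((Y[val_idx] - X[val_idx] @ W) ** 2)
    if mse < best_mse:
        best_lmbda, best_mse = lmbda, mse

W_final = ridge_fit(X, Y, best_lmbda)    # refit on the full training set
```

In the paper's setting, X would hold features extracted from the pretrained unimodal or multi-modal models over the two training movies (10922 TRs total), and the held-out Life movie would serve as the test set.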
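Similarly, the "residual approach" cited in the Pseudocode row (removing unimodal features one by one from multi-modal representations) is only described in text and diagrams. One common way to implement such removal is to regress the multi-modal embeddings on a unimodal embedding and keep the residuals; a hedged sketch under that assumption (function name and shapes are hypothetical, not from the paper):

```python
import numpy as np

def remove_unimodal(multi, uni):
    """Regress multi-modal embeddings on unimodal ones via least squares and
    return the residuals: the part of `multi` not linearly predictable from `uni`."""
    W, *_ = np.linalg.lstsq(uni, multi, rcond=None)
    return multi - uni @ W

rng = np.random.default_rng(1)
uni = rng.standard_normal((200, 32))     # e.g. vision-only features per TR
multi = uni @ rng.standard_normal((32, 48)) + rng.standard_normal((200, 48))

residual = remove_unimodal(multi, uni)
# The residuals are numerically orthogonal to the unimodal feature columns,
# so any brain alignment they retain cannot come from that unimodal signal.
```

The residual embeddings would then be fed to the same ridge-regression pipeline to test whether multi-modal representations explain brain activity beyond the removed unimodal component.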