Toward Generalizing Visual Brain Decoding to Unseen Subjects

Authors: Xiangtao Kong, Kexin Huang, Ping Li, Lei Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "A series of experiments are conducted and we have the following findings. First, the network exhibits clear generalization capabilities with the increase of training subjects. Second, the generalization capability is common to popular network architectures (MLP, CNN and Transformer). Third, the generalization performance is affected by the similarity between subjects. Our findings reveal the inherent similarities in brain activities across individuals."
Researcher Affiliation: Academia. "The Hong Kong Polytechnic University. EMAIL, EMAIL, EMAIL"
Pseudocode: No. The paper describes a rank-based method to calculate subject similarity in Section 3.3, but it is presented in descriptive text and mathematical formulas rather than a structured pseudocode or algorithm block.
Open Source Code: Yes. "Codes and models can be found at https://github.com/Xiangtaokong/TGBD."
Open Datasets: Yes. "We build this dataset using the data from the Human Connectome Project (HCP) (Van Essen et al., 2013), which contains human brain neuroimages for various tasks. ... Compared to the commonly used datasets like NSD (8 subjects) (Allen et al., 2022) and BOLD5000 (4 subjects) (Chang et al., 2019), this dataset enables us to explore brain decoding performance with a much larger number of subjects."
Dataset Splits: Yes. "The HCP dataset we consolidated includes 177 subjects, each subject having 3,127 image-fMRI pairs. We randomly choose 100 images and the corresponding fMRI voxels as the test pairs, and use the rest as the training pairs. Note that the test pairs of all subjects are from the same 100 images. Subjs 1-10 are designated as unseen subjects, with the remaining 167 subjects as seen subjects. In our experiments, several models will be trained on different numbers of subjects. For convenience of expression, we define one training epoch based on the number of image-fMRI pairs of a single subject; that is, one epoch contains 3,027 image-fMRI pairs [the 3,127 pairs per subject minus the 100 test pairs]. ... For the experiment on the NSD dataset, which includes 8 subjects, we follow the standard train/test split with 1,000 test images (Allen et al., 2022), and select Subj 2 and Subj 5 as unseen subjects."
Hardware Specification: No. The paper does not explicitly mention the specific hardware (e.g., GPU models, CPU types) used for running its experiments.
Software Dependencies: No. "We implement all models using PyTorch (Paszke et al., 2017). Except specifically indicated, we employ MLP and 3D CNN as the backbone for feature extraction when using whole-brain data. The detailed network structure can be found in Appendix. During training, we employ the CLIP loss (Radford et al., 2021) and the AdamW optimizer (Loshchilov & Hutter, 2017) to optimize the models (β1 = 0.9, β2 = 0.999)." While PyTorch, CLIP, and AdamW are mentioned, specific version numbers for these software components are not provided.
Experiment Setup: Yes. "During training, we employ the CLIP loss (Radford et al., 2021) and the AdamW optimizer (Loshchilov & Hutter, 2017) to optimize the models (β1 = 0.9, β2 = 0.999). We set the batch size to 300, and apply the OneCycleLR strategy with a warm-up phase to adjust the learning rate, with a maximum learning rate of 1 × 10^-4. The numbers of epochs to train models on 1, 2, 20, 50, 100, and 167 seen subjects are 200, 200, 400, 600, 800, and 1,000, respectively."
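The rank-based subject-similarity measure noted under Pseudocode is described in the paper's Section 3.3 but not reproduced in this report. As a loose illustration of the general idea only (not the paper's actual formula), a Spearman-style rank correlation between two subjects' per-image scores could look like this; all names are hypothetical:

```python
def rankdata(xs):
    """Rank transform: position of each value in sorted order.
    (No tie handling; adequate for illustration.)"""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def rank_similarity(scores_a, scores_b):
    """Spearman-style rank correlation between two subjects'
    per-image scores on the shared 100 test images.
    Returns a value in [-1, 1]; 1 means identical rankings."""
    ra, rb = rankdata(scores_a), rankdata(scores_b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    num = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    den = (sum((x - ma) ** 2 for x in ra)
           * sum((y - mb) ** 2 for y in rb)) ** 0.5
    return num / den
```

This is one plausible way to compare subjects by rank rather than raw magnitude; the paper's actual definition should be taken from Section 3.3.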
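The HCP split quoted under Dataset Splits (100 randomly chosen test images shared across all subjects, the remaining 3,027 pairs for training) can be sketched as follows; the function name and seed are illustrative, not from the released code:

```python
import random

def split_pairs(num_pairs=3127, num_test=100, seed=0):
    """Pick num_test shared test image indices once, reuse them for
    every subject, and train on the remaining indices."""
    rng = random.Random(seed)
    test_idx = set(rng.sample(range(num_pairs), num_test))
    train_idx = [i for i in range(num_pairs) if i not in test_idx]
    return sorted(test_idx), train_idx
```

Because the same test indices are reused for all 177 subjects, every subject is evaluated on the same 100 images, as the paper specifies.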
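The CLIP loss cited under Software Dependencies and Experiment Setup is a symmetric contrastive loss over paired embeddings: matched image/fMRI pairs sit on the diagonal of a similarity matrix and are pushed above all mismatched pairs in both directions. A minimal pure-Python sketch (a real implementation would use PyTorch tensors and `cross_entropy`; the temperature value is a common default, not taken from the paper):

```python
import math

def clip_loss(image_emb, fmri_emb, temperature=0.07):
    """Symmetric contrastive (CLIP-style) loss over a batch of paired,
    L2-normalized embeddings given as lists of equal-length vectors."""
    n = len(image_emb)
    # cosine-similarity logits scaled by temperature
    logits = [[sum(u * v for u, v in zip(image_emb[i], fmri_emb[j]))
               / temperature for j in range(n)] for i in range(n)]

    def xent_diag(rows):
        # mean cross-entropy where row i's target class is column i
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total += lse - row[i]
        return total / n

    cols = [list(c) for c in zip(*logits)]  # fMRI-to-image direction
    return 0.5 * (xent_diag(logits) + xent_diag(cols))
```

The loss approaches zero when each image embedding is far more similar to its own subject's fMRI embedding than to any other pair in the batch.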
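The OneCycleLR strategy with warm-up, quoted under Experiment Setup, can be sketched as below. Only the maximum learning rate (1e-4) comes from the paper; the warm-up fraction, start and final divisors, and cosine annealing shape are assumptions modeled on PyTorch's `OneCycleLR` defaults:

```python
import math

def one_cycle_lr(step, total_steps, max_lr=1e-4, warmup_frac=0.3,
                 start_div=25.0, final_div=1e4):
    """One-cycle schedule: linear warm-up from max_lr/start_div to
    max_lr, then cosine annealing down to max_lr/final_div."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        start_lr = max_lr / start_div
        return start_lr + (max_lr - start_lr) * step / warmup_steps
    # annealing phase: t goes from 0 (peak) to ~1 (end of training)
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    end_lr = max_lr / final_div
    return end_lr + 0.5 * (max_lr - end_lr) * (1 + math.cos(math.pi * t))
```

In practice one would call PyTorch's `torch.optim.lr_scheduler.OneCycleLR` with `max_lr=1e-4` rather than hand-rolling the schedule; this sketch only shows the shape the paper describes.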