Human-Aligned Image Models Improve Visual Decoding from the Brain

Authors: Nona Rajabi, Antonio H. Ribeiro, Miguel Vasco, Farzaneh Taleb, Mårten Björkman, Danica Kragic

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our empirical results support this hypothesis, demonstrating that this simple modification improves image retrieval accuracy by up to 21% compared to state-of-the-art methods. Comprehensive experiments confirm consistent performance improvements across diverse EEG architectures, image encoders, alignment methods, participants, and brain imaging modalities.
Researcher Affiliation Academia Division of Robotics, Perception, and Learning, KTH Royal Institute of Technology, Stockholm, Sweden; Department of Information Technology, Uppsala University, Uppsala, Sweden.
Pseudocode No The paper describes its methodology using text and mathematical equations in Section 2, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes All codes are available at https://github.com/NonaRjb/AlignVis.git
Open Datasets Yes We used the THINGS-EEG2 dataset (Gifford et al., 2022) to train and evaluate our framework. For the results in Section 5.4, we used preprocessed MEG data from Hebart et al. (2023)... we extended our experiments to the NSD dataset (Allen et al., 2022).
Dataset Splits Yes The training set includes 1,654 unique concepts, each with 10 images shown in random order and repeated 4 times, totaling 1,654 × 10 × 4 samples per participant. The test set contains 200 distinct concepts, each with 1 image shown 80 times, yielding 200 × 1 × 80 samples per participant. [...] Models were trained with a 90%/10% split and evaluated on the test set. [...] The training set contains 1,854 × 12 × 1 samples, while the test set includes 200 × 1 × 12 samples per participant. [...] This resulted in a dataset comprising 24,980 training samples and 2,770 test samples. For the test set, we averaged the brain responses across the three repetitions of each image, reducing the test set to 982 unique samples, while the training set remained unaveraged.
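The per-participant sample counts implied by the quoted split descriptions can be tallied directly; this is a minimal sketch (not from the paper's code), and all numbers come from the text above:

```python
# Tally the per-participant split sizes described in the quoted text.
def n_samples(concepts, images_per_concept, repetitions):
    """Total trials per participant for one split."""
    return concepts * images_per_concept * repetitions

eeg_train = n_samples(1654, 10, 4)   # THINGS-EEG2 training split
eeg_test = n_samples(200, 1, 80)     # THINGS-EEG2 test split
meg_train = n_samples(1854, 12, 1)   # MEG training split
meg_test = n_samples(200, 1, 12)     # MEG test split

print(eeg_train, eeg_test, meg_train, meg_test)  # 66160 16000 22248 2400
```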
Hardware Specification No The computations and data handling were enabled mainly by the Berzelius resource provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre and partly by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725. This describes computing resources but does not provide specific hardware details such as GPU/CPU models or memory.
Software Dependencies No We obtained human-aligned image encoders directly from the DreamSim, gLocal, and Harmonization repositories provided by the authors, ensuring they function exactly as reported in their respective papers without any retraining. For the original unaligned encoders, we used publicly available pretrained models using the Hugging Face transformers (Table 4) or timm (Table 5) libraries. The paper lists software libraries and frameworks but does not specify their version numbers for reproducibility.
Experiment Setup Yes For per-participant EEG experiments, NICE encoders were trained for up to 50 epochs with a batch size of 128, a learning rate of 0.0002, and a temperature of 0.04. The same hyperparameters were used for ATM-S, except for the number of epochs, which was set to 80. EEGNet and EEGConformer were both trained for 200 epochs. EEGConformer used a learning rate of 0.0002, a batch size of 128, and a temperature of 0.07, while EEGNet used 0.01, 512, and 0.1, respectively. For cross-participant training, NICE was trained for up to 150 epochs with a batch size of 512 and a learning rate of 0.0001. For MEG experiments, we used a learning rate of 0.00005, a batch size of 256, and a temperature of 0.1, training for up to 50 epochs. Training was halted in all models if validation loss did not improve for 25 consecutive epochs. We trained the MLP fMRI encoder with residual connections proposed by Scotti et al. (2023) for 50 epochs with a learning rate of 0.0001 and a batch size of 128.
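The schedule above (fixed hyperparameters plus a 25-epoch early-stopping patience on validation loss) can be sketched as follows. The model, data loaders, and loss are placeholders (assumptions); the temperature would live inside the contrastive `loss_fn`, which is left abstract here. The defaults mirror the per-participant NICE setting quoted above:

```python
import torch

def train_encoder(model, train_loader, val_loader, loss_fn,
                  max_epochs=50, lr=2e-4, patience=25):
    """Hypothetical training loop: hyperparameter defaults follow the quoted
    per-participant NICE setting; everything else is a placeholder."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for eeg, img_emb in train_loader:
            opt.zero_grad()
            loss_fn(model(eeg), img_emb).backward()
            opt.step()
        # Validation pass to drive early stopping.
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(e), i).item()
                      for e, i in val_loader) / len(val_loader)
        if val < best_val:
            best_val, epochs_without_improvement = val, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # halt: no validation improvement for `patience` epochs
    return model
```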