Animate Your Thoughts: Reconstruction of Dynamic Natural Vision from Human Brain Activity
Authors: Yizhuo Lu, Changde Du, Chong Wang, Xuanliu Zhu, Liuyun Jiang, Xujin Li, Huiguang He
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple video-fMRI datasets demonstrate that our model achieves state-of-the-art performance. Comprehensive visualization analyses further elucidate the interpretability of our model from a neurobiological perspective. Project page: https://mind-animator-design.github.io/. |
| Researcher Affiliation | Academia | (1) Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences; (2) School of Future Technology, University of Chinese Academy of Sciences; (3) School of Artificial Intelligence, University of Chinese Academy of Sciences; (4) School of Computer and Artificial Intelligence, Zhengzhou University; (5) Beijing University of Posts and Telecommunications |
| Pseudocode | Yes | Algorithm 1: PyTorch code for the video captioning process |
| Open Source Code | Yes | Project page: https://mind-animator-design.github.io/. We will release all data and code to facilitate future research. |
| Open Datasets | Yes | In this study, we utilize three publicly available video-fMRI datasets, which encompass paired stimulus videos and their corresponding fMRI responses. ... B.5 DATA ACQUISITION The open-source datasets used in this paper can be accessed via the following links: (1) CC2017: https://purr.purdue.edu/publications/2809/1 (2) HCP: https://www.humanconnectome.org/ (3) Algonauts2021: http://algonauts.csail.mit.edu/2021/index.html |
| Dataset Splits | Yes | CC2017 (Wen et al., 2018): 3 subjects, 2 s segments, 4320 train / 1200 test; HCP (Marcus et al., 2011): 3 subjects, 1 s segments, 2736 train / 304 test; Algonauts2021 (Cichy et al., 2021): 10 subjects, 1.75 s segments, 900 train / 100 test. ... we randomly shuffle all video segments and allocate 90% for the training set, with the remaining 10% reserved for the test set. ... we utilize the first 900 sets of data for training and the 900-1000 sets for testing. |
| Hardware Specification | Yes | All experiments are conducted on an A100 80G GPU, with the training phase taking 8 hours and the inference phase taking 12 hours for each dataset. |
| Software Dependencies | No | The paper mentions software like 'torch', 'clip', and 'lavis' in Algorithm 1, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | For all three datasets employed in the experiments, during the training of the Semantic Decoder, we set α to 0.5, λ1 to 0.01, and λ2 to 0.5. The batch size is set to 64, and the learning rate is set to 2e-4, with training conducted for 100 epochs. ... During the training of the Structural Decoder, we set the batch size to 64 and the learning rate to 1e-6. ... When training the Consistency Motion Generator, we set the patch size to 64 and the mask ratio of the Sparse Causal mask to 0.6 during the training phase, with a batch size of 64 and a learning rate of 4e-5. |
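The two split strategies quoted in the Dataset Splits row (a random 90/10 shuffle of video segments for CC2017 and HCP, and a fixed ordinal split for Algonauts2021) could be sketched as follows. This is a minimal illustration only: the function names and the random seed are assumptions, since the paper does not specify the shuffling implementation.

```python
import random

def shuffled_split(segment_ids, train_frac=0.9, seed=0):
    """Randomly shuffle video segments, then allocate 90% to the
    training set and the remaining 10% to the test set.

    The seed is illustrative; the paper does not report one."""
    ids = list(segment_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_frac)
    return ids[:n_train], ids[n_train:]

def ordinal_split(segment_ids, n_train=900):
    """Algonauts2021-style split: the first 900 sets are used for
    training and sets 900-1000 for testing."""
    ids = list(segment_ids)
    return ids[:n_train], ids[n_train:]
```

For example, `shuffled_split(range(100))` returns 90 training and 10 test segment IDs, while `ordinal_split(range(1000))` keeps the original ordering and reserves the last 100 sets for testing.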