SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics

Authors: Suyuan Zhao, Yizhen Luo, Ganbo Yang, Yan Zhong, Hao Zhou, Zaiqing Nie

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | SToFM achieves outstanding performance on a variety of downstream tasks, such as tissue region semantic segmentation and cell type annotation, demonstrating its comprehensive understanding of ST data through capturing and integrating multi-scale information. ... The results in Table 1 demonstrate that SToFM outperforms existing methods across different tissue region semantic segmentation tasks. ... In order to demonstrate the performance improvement resulting from modeling information at multiple scales, we conduct a set of ablation experiments as shown in Table 5.
Researcher Affiliation | Collaboration | 1 Institute for AI Industry Research (AIR), Tsinghua University; 2 Department of Computer Science and Technology, Tsinghua University; 3 PharMolix Inc.; 4 School of Mathematical Sciences, Peking University.
Pseudocode | Yes | Algorithm 1: Micro- and Macro-scale Integration
Open Source Code | Yes | Code is available at: https://github.com/PharMolix/SToFM.
Open Datasets | Yes | To train SToFM, we construct SToCorpus-88M, the largest high-resolution ST pretraining corpus to date. This corpus includes approximately 2,000 high-resolution ST slices obtained by 6 different ST technologies, totaling 88 million cells. It surpasses the current largest ST corpus (Schaar et al., 2024) by 1.5-fold in scale and 2-fold in ST technology diversity. SToCorpus-88M will be publicly released. ... We organize publicly available ST data from multiple sources (Xu et al., 2024; Yuan et al., 2023; Biology et al., 2023; 10x Genomics, 2025; Vizgen, 2025; NanoString, 2025; SeekGene, 2025).
Dataset Splits | Yes | For tasks involving splitting the same dataset into training and testing sets, an 8:2 random split is applied. Additionally, 10% of the training set is randomly selected as a validation set, and an early stopping strategy is applied based on performance on the validation set. ... For the cross-slice task Embryo Cross in Sec. 4.2, we train on CS12-13E2S3, CS12-13E2S5, CS12-13E2S6, and test on CS12-13E2S4.
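The split protocol quoted above (8:2 train/test, then 10% of the training set held out for validation) can be sketched with the standard library alone. This is a minimal illustration of the stated ratios, not the paper's code; the function name and seed are hypothetical.

```python
import random

def split_indices(n, seed=0):
    """8:2 random train/test split, then 10% of the training
    portion held out as a validation set (seed is illustrative)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(0.2 * n)          # 20% of all samples -> test
    test, train = idx[:n_test], idx[n_test:]
    n_val = round(0.1 * len(train))  # 10% of train -> validation
    val, train = train[:n_val], train[n_val:]
    return train, val, test
```

For 1,000 samples this yields 720 training, 80 validation, and 200 test indices, matching the 8:2 and 10% ratios described.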
Hardware Specification | Yes | The pretraining is performed with 4 NVIDIA Tesla A100 GPUs and takes approximately 20 days. ... Pretraining is carried out on four NVIDIA Tesla A100 GPUs, with both Domain Adaptation and pretraining taking approximately 10 days each to complete.
Software Dependencies | No | The training process is conducted using the PyTorch framework. We utilize the AdamW optimizer, with a learning rate strategy that involves a warm-up phase followed by linear decay. Additional experimental configurations are detailed in Table A.8. (Note: Table A.8 lists 'Optimizer: AdamW' and 'Scheduler: Linear', but no specific version numbers for PyTorch or other libraries are provided.)
Experiment Setup | Yes | Pretraining is performed on SToCorpus-88M after removing the data used for downstream tasks. We perform domain adaptation for one epoch. Then, we perform multi-scale ST representation learning for three epochs, where the cell encoder is frozen in the first two epochs, following the strategy in Sec. 3.3. The pretraining is performed with 4 NVIDIA Tesla A100 GPUs and takes approximately 20 days. Detailed pretraining hyperparameters are provided in Appendix C.1. (Table A.8 lists: 'Leiden resolution: 1.0', 'α in Algorithm 1: 0.8', 'Split scale: 1000', 'Max learning rate: 1e-4', 'Warm up steps: 500', 'Batch size: 16', 'Gradient accumulation: 4', among others.)
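The warm-up-then-linear-decay schedule quoted above can be sketched as a plain function of the step count, using the Table A.8 values (max learning rate 1e-4, 500 warm-up steps). The total step count is an illustrative assumption, not a value reported in the paper.

```python
def lr_at_step(step, max_lr=1e-4, warmup_steps=500, total_steps=10_000):
    """Warm-up followed by linear decay, per the reported schedule.

    total_steps is a hypothetical placeholder; the actual number of
    pretraining steps is not stated in the quoted material.
    """
    if step < warmup_steps:
        # Linear ramp from ~0 up to max_lr over the warm-up phase.
        return max_lr * (step + 1) / warmup_steps
    # Linear decay from max_lr down to 0 over the remaining steps.
    remaining = total_steps - warmup_steps
    return max_lr * max(0.0, (total_steps - step) / remaining)
```

With batch size 16 and gradient accumulation 4 (also from Table A.8), the effective batch size per optimizer step would be 64 per GPU; a schedule like this is typically counted in optimizer steps, not forward passes.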