Scalable Generation of Spatial Transcriptomics from Histology Images via Whole-Slide Flow Matching

Authors: Tinglin Huang, Tianyu Liu, Mehrtash Babadi, Wengong Jin, Zhitao Ying

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate STFlow on HEST-1k (Jaume et al., 2024) and STImage-1K4M (Chen et al., 2024a), two large-scale STWSI collections comprising a total of 17 benchmark datasets. Compared to five spot-based and three slide-based methods, STFlow consistently outperforms all baselines and achieves an 18% average relative improvement over the pathology foundation models. It also excels in the prediction of 4 biomarker genes, highlighting its clinical potential. Moreover, our proposed architecture offers orders-of-magnitude faster runtime and lower memory cost than existing slide-based approaches. [...] Ablation Studies We conduct an ablation study to assess the effectiveness of the flow matching framework and frame averaging-based modules.
Researcher Affiliation Academia Yale University; Broad Institute of MIT and Harvard; Northeastern University. Correspondence to: Tinglin Huang <EMAIL>.
Pseudocode Yes Algorithm 1 STFlow: Training [...] Algorithm 2 STFlow: Inference
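The excerpt names the two algorithms but does not reproduce them. As a point of reference, flow-matching inference is typically Euler integration of a learned velocity field from t=0 to t=1; the sketch below is that textbook recipe with the 5 sampling steps reported in the experiment setup, not the paper's actual Algorithm 2 (`velocity_fn` is a hypothetical stand-in for the trained model).

```python
def flow_matching_sample(velocity_fn, x0, n_steps=5):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps.

    velocity_fn: hypothetical learned velocity field, (state, time) -> velocity.
    x0: initial state (here a plain list of floats, standing in for gene counts).
    """
    x, dt = list(x0), 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        v = velocity_fn(x, t)
        # Euler update: move along the predicted velocity for one time step.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With a constant velocity field the Euler integration is exact, so the state moves by exactly that velocity over the unit interval.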
Open Source Code Yes Implementation can be found at https://github.com/Graph-and-Geometric-Learning/STFlow
Open Datasets Yes We evaluate STFlow on HEST-1k (Jaume et al., 2024) and STImage-1K4M (Chen et al., 2024a), two large-scale STWSI collections comprising a total of 17 benchmark datasets.
Dataset Splits Yes HEST-1k (Jaume et al., 2024) and STImage-1K4M (Chen et al., 2024a). Specifically, HEST-1k includes ten benchmarks and applies a patient-stratified split to prevent data leakage, resulting in a k-fold cross-validation setup. For STImage-1K4M, we select the cancer samples for each organ and randomly split the dataset into train/val/test sets (8:1:1).
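The 8:1:1 random split for STImage-1K4M can be sketched as a seeded index shuffle; this is a minimal, library-free illustration of the split described above, not the authors' actual preprocessing code.

```python
import random

def split_811(n_samples, seed=0):
    """Randomly partition sample indices into train/val/test at 8:1:1."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = int(0.8 * n_samples)
    n_val = int(0.1 * n_samples)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test
```

Note that HEST-1k's patient-stratified k-fold split is different: there, all spots from one patient must land in the same fold, which a plain index shuffle like this does not guarantee.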
Hardware Specification Yes The experiments are conducted on a single Linux server with an AMD EPYC 7763 64-core processor, 1024 GB RAM, and 8× RTX A6000 (48 GB) GPUs.
Software Dependencies Yes Our method is implemented on PyTorch 2.3.0 and Python 3.10.14.
Experiment Setup Yes For all the models, we fix the optimizer as Adam (Kingma & Ba, 2014) and use MSE as the loss function. The gradient norm is clipped to 1.0 at each training step to ensure learning stability. The learning rate is tuned within {1e-3, 5e-4, 1e-4} and is set to 5e-4 by default, as it generally yields the best performance. [...] For each model, we search the hyperparameters in the following ranges: the dropout rate in {0, 0.2, 0.5}, the number of nearest neighbors for the slide-based methods in {4, 8, 25}, and the number of attention heads in {1, 2, 4, 8}. All models are trained for 100 epochs, with early stopping applied if no performance improvement is observed for 20 epochs. [...] STFlow: The number of layers, attention heads, and neighbors are 4, 4, and 8, respectively. The dropout rate and hidden size are set to 0.2 and 128. The number of sampling steps for flow matching is set to 5. For the ZINB distribution, the zero-inflation probability is fixed at 0.5, the mean is searched in {0.1, 0.2, 0.4}, and the number of failures is searched in {1, 2, 4}.
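The training schedule above (100 epochs, early stopping after 20 epochs without improvement) can be sketched as a small driver loop; `train_one_epoch` and `validate` are hypothetical stand-ins for the paper's per-epoch routines, and the actual implementation likely differs in detail.

```python
def run_training(train_one_epoch, validate, max_epochs=100, patience=20):
    """Train up to max_epochs; stop early after `patience` epochs
    with no validation improvement. Returns the best epoch and score."""
    best_score, best_epoch = float("-inf"), -1
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        score = validate(epoch)  # higher is assumed better here
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_score
```

In the PyTorch setup described above, `train_one_epoch` would also apply the reported gradient-norm clipping (e.g. via `torch.nn.utils.clip_grad_norm_` with max norm 1.0) before each Adam step.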