reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

SpotDiff: Spatial Gene Expression Imputation Diffusion with Single-Cell RNA Sequencing Data Integration

Authors: Tianyi Chen, Yunfei Zhang, Lianxin Xie, Wenjun Shen, Si Wu, Hau-San Wong

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments have been performed to demonstrate that SpotDiff outperforms existing imputation methods across multiple benchmarks in terms of yielding more accurate and biologically relevant gene expression profiles, particularly in highly sparse scenarios.
Researcher Affiliation	Academia	Tianyi Chen¹, Yunfei Zhang², Lianxin Xie², Wenjun Shen³, Si Wu², Hau-San Wong¹. ¹City University of Hong Kong; ²South China University of Technology; ³Shantou University Medical College
Pseudocode	No	The paper describes the methodology using mathematical equations and descriptive text, illustrated by architectural diagrams (e.g., Figure 2), but does not contain a dedicated pseudocode or algorithm block.
Open Source Code	No	The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets	Yes	In this study, we utilized six distinct datasets to evaluate the performance of imputation methods. They are osmFISH (Codeluppi et al. 2018), FISH (Li et al. 2022), STARmap (Wang et al. 2018), MERFISH (Li et al. 2022), 10x BA (Long et al. 2023) and 10x HBC (Long et al. 2023). These datasets encompass various tissues and organ types, each with corresponding ST and scRNA-seq data. The details of datasets are summarized in Table 2.
Dataset Splits	No	The paper describes a random masking strategy (ϕ = 0.3) applied to ST data to simulate missing parts for imputation, but it does not explicitly specify traditional training, validation, and test splits for the datasets used in the experiments.
Hardware Specification	No	The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for conducting the experiments.
Software Dependencies	No	The paper mentions frameworks and components like the 'DiT framework' and 'T5 text encoder,' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	No	The paper states a masking threshold (ϕ = 0.3) and mentions loss weight parameters (λ1, λ2) without giving their values. It also does not provide other detailed experimental setup parameters such as learning rates, batch sizes, or number of epochs.
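
The Dataset Splits row refers to a random masking strategy (ϕ = 0.3) applied to ST data to simulate missing values. The snippet below is a minimal sketch of what such an evaluation could look like, not the paper's procedure. It assumes that ϕ is a per-entry masking probability, that hidden entries are zero-filled, and that error is measured only on hidden entries. The function names `mask_expression` and `masked_rmse` are hypothetical and do not come from the paper.

```python
import math
import random


def mask_expression(matrix, phi=0.3, seed=0):
    """Hide each entry of a spots-by-genes matrix with probability `phi`.

    Assumption: phi is read as a per-entry masking probability; the paper's
    exact procedure may differ. Hidden entries are set to 0.0.
    Returns (masked_matrix, mask), where mask[i][j] is True if hidden.
    """
    rng = random.Random(seed)  # fixed seed so the same entries are hidden on every run
    mask = [[rng.random() < phi for _ in row] for row in matrix]
    masked = [
        [0.0 if hidden else value for value, hidden in zip(row, mask_row)]
        for row, mask_row in zip(matrix, mask)
    ]
    return masked, mask


def masked_rmse(truth, imputed, mask):
    """Root-mean-square error computed only over the hidden entries."""
    errors = [
        (t - p) ** 2
        for t_row, p_row, m_row in zip(truth, imputed, mask)
        for t, p, hidden in zip(t_row, p_row, m_row)
        if hidden
    ]
    if not errors:
        raise ValueError("mask hides no entries")
    return math.sqrt(sum(errors) / len(errors))
```

Calling `mask_expression(expr, phi=0.3, seed=0)` on a list-of-lists expression matrix returns the zero-filled copy and the boolean mask; passing an imputation model's output to `masked_rmse` together with the original matrix and that mask scores only the entries that were hidden. Recording the seed alongside ϕ is what would make such a masking step repeatable.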