reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DUSTED: Dual-Attention Enhanced Spatial Transcriptomics Denoiser

Authors: Jun Zhu, Yifu Li, Zhenchao Tang, Cheng Chang

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Benchmark tests using simulated datasets demonstrate that DUSTED outperforms existing methods. Furthermore, in real-world applications with the HOCWTA and DLPFC datasets, DUSTED excels in enhancing the correlation between gene and protein expression, recovering spatial gene expression patterns, and improving clustering results. These improvements underscore its potential impact on advancing our understanding of tumor microenvironments, neural tissue organization, and other biologically significant areas.
Researcher Affiliation	Academia	1 School of Life Sciences, Tsinghua University, Beijing, 100084, China; 2 National Center for Protein Sciences (Beijing), Beijing, 102206, China; 3 Beijing Institute of Lifeomics, Beijing, 102206, China; 4 National Superior College for Engineers, Beihang University, Beijing, 100191, China; 5 School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China; 6 School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
Pseudocode	No	The paper describes the methodology using mathematical formulas and textual explanations for the Gene Channel Attention Module, Unsupervised Learning Setup (Encoder, Decoder, Graph Attention Layer), and Loss Function, but does not present a structured pseudocode or algorithm block.
Open Source Code	Yes	Code https://github.com/Lifeomics/DUSTED
Open Datasets	Yes	The Human Ovarian Cancer: Whole Transcriptome Analysis (HOCWTA) dataset (10x Genomics 2020) includes 10x Visium SRT data for ovarian endometrioid adenocarcinoma tissue... https://www.10xgenomics.com/datasets/human-ovarian-cancer-whole-transcriptomeanalysis-stains-dapi-anti-pan-ck-anti-cd-45-1-standard-12-0. The Human Dorsolateral Prefrontal Cortex (DLPFC) dataset (Maynard et al. 2021) consists of SRT data for 12 dorsolateral prefrontal cortex tissue sections...
Dataset Splits	No	The paper describes generating simulated datasets and using real datasets (HOCWTA and DLPFC) for evaluation. DUSTED is presented as a self-supervised/unsupervised denoising model. While the paper outlines how data was generated and what it was used for, it does not specify explicit training/validation/test splits for DUSTED's application or evaluation in the conventional supervised learning sense. For evaluation tasks, the denoised entire datasets are used.
Hardware Specification	Yes	The model is built and trained using the PyTorch deep learning framework, and all experiments are conducted on a single NVIDIA Quadro GV100 GPU with 32GB of memory.
Software Dependencies	No	The model is built and trained using the PyTorch deep learning framework... Using Squidpy (Palla et al. 2022)... then applied the mclust (Scrucca et al. 2016) clustering method... The paper mentions software frameworks and libraries used (PyTorch, Squidpy, mclust) but does not provide specific version numbers for any of them.
Experiment Setup	Yes	The encoder and decoder of DUSTED are both set to two layers, with feature dimensions from input to output being [2000, 512, 30, 512, 2000]. The parameter updates are optimized using the Adam optimizer, and the model is trained for 500 epochs. For the simulated dataset, the learning rate is set to 10^-4, α (the weights of the residual connection) is set to 0.3, and the loss function is selected as L_NB. For the HOCWTA dataset, the learning rate is 2 × 10^-4, α is set to 1, and the loss function is selected as L_NB. For the DLPFC dataset, the learning rate is 10^-4, α is set to 1.5, and the loss function is selected as L_ZINB. For constructing the neighborhood graph, the radius r is adjusted for different datasets to ensure each node has 5 to 6 neighboring nodes.
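
The settings in the Experiment Setup row can be collected into a small configuration, and the radius rule can be illustrated on toy data. The following is a minimal sketch, not the authors' code: it reads the learning rates as 1e-4 and 2e-4, and every name in it (`RunConfig`, `CONFIGS`, `hex_grid`, `mean_neighbors`, `pick_radius`) is hypothetical. The hexagonal toy grid and the candidate radii are assumptions chosen for illustration, and the rule "each node has 5 to 6 neighboring nodes" is approximated here by the mean neighbor count.

```python
from dataclasses import dataclass
from math import dist, sqrt

# Layer widths and epoch count as reported in the Experiment Setup row.
LAYER_DIMS = [2000, 512, 30, 512, 2000]
EPOCHS = 500  # the row states that the optimizer is Adam


@dataclass(frozen=True)
class RunConfig:
    """Per-dataset settings; the field names are hypothetical."""
    learning_rate: float
    alpha: float  # weight of the residual connection
    loss: str     # "NB" or "ZINB"


# Values transcribed from the row; the dictionary keys are our own labels.
CONFIGS = {
    "simulated": RunConfig(learning_rate=1e-4, alpha=0.3, loss="NB"),
    "HOCWTA": RunConfig(learning_rate=2e-4, alpha=1.0, loss="NB"),
    "DLPFC": RunConfig(learning_rate=1e-4, alpha=1.5, loss="ZINB"),
}


def hex_grid(rows, cols, spacing=1.0):
    """Toy hexagonal spot layout (assumption: offset rows, unit spacing)."""
    return [((c + 0.5 * (r % 2)) * spacing, r * spacing * sqrt(3) / 2)
            for r in range(rows) for c in range(cols)]


def mean_neighbors(coords, radius):
    """Average number of other spots within `radius` of each spot."""
    total = 0
    for i, p in enumerate(coords):
        total += sum(1 for j, q in enumerate(coords)
                     if i != j and dist(p, q) <= radius)
    return total / len(coords)


def pick_radius(coords, candidates, low=5.0, high=6.0):
    """Smallest candidate whose mean neighbor count lies in [low, high], else None."""
    for r in sorted(candidates):
        if low <= mean_neighbors(coords, r) <= high:
            return r
    return None


grid = hex_grid(20, 20)
radius = pick_radius(grid, candidates=[0.5, 1.1, 1.8])
print(radius, mean_neighbors(grid, radius))
```

On this toy grid, interior spots have six neighbors at unit distance and border spots have fewer, so a radius just above the spot spacing gives a mean between 5 and 6. Real coordinates would need candidate radii on the scale of the dataset's own spot spacing.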