Learning Cross-Domain Representations for Transferable Drug Perturbations on Single-Cell Transcriptional Responses
Authors: Hui Liu, Shikai Jin
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive evaluations of our model on multiple datasets, including single-cell transcriptional responses to drugs and single and combinatorial genetic perturbations. The experimental results show that XTransferCDR achieved better performance than current state-of-the-art methods, showcasing its potential to advance phenotypic drug discovery. |
| Researcher Affiliation | Academia | College of Computer and Information Engineering, Nanjing Tech University, Nanjing, 211816, China |
| Pseudocode | No | The paper describes the framework, equations, and components but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code: https://github.com/hliulab/XTransferCDR |
| Open Datasets | Yes | We initially evaluated the model's performance on the single-cell chemical response dataset from the sci-Plex project (Srivatsan et al. 2020). The sci-Plex3 dataset contains the single-cell transcriptional responses of three human cancer cell lines (MCF7, K562, and A549) exposed to 188 different drugs. Moreover, we have noticed that the sci-Plex project has released a new dataset, sci-Plex4, thereby we extended our evaluation to this dataset. We further evaluated our method on two single-cell datasets established by genetic perturbation assays (Replogle et al. 2021). To further validate the effectiveness of learned transferable perturbations, we carried out systematic evaluation on another dataset that was generated through CRISPR-based knockout (deactivation) of multiple genes, aimed at observing the consequent alterations in single-cell phenotypes (Norman et al. 2019). |
| Dataset Splits | Yes | For model evaluation, the expression profiles induced by these nine drugs were held out as the test set (n=3,071), while the remaining data were used to create the paired samples for training (n=101,190) and validation set (n=8,499) with a 4:1 ratio. Following the drug-level data partitioning strategy, the sci-Plex4 dataset was randomly divided into a training set (n=7,104), validation set (n=718), and test set (n=733). The K562 dataset was divided into a training set (n=64,249), a validation set (n=2,234), and a test set (n=2,233). The RPE-1 dataset was divided into a training set (n=72,200), a validation set (n=2,045), and a test set (n=2,044). |
| Hardware Specification | Yes | All experiments were conducted on a CentOS Linux 8.2.2004 (Core) system, equipped with a GeForce RTX 4090 GPU and 128GB memory. |
| Software Dependencies | No | The paper mentions "CentOS Linux 8.2.2004 (Core) system" as the operating environment but does not specify any programming languages, libraries, or frameworks with version numbers that are critical software dependencies for reproducing the experiments. |
| Experiment Setup | Yes | They consist of four feed-forward layers with sizes of 1024, 512, 256, and 128, respectively. Each feed-forward layer is followed by a batch normalization layer, and a dropout layer with the dropout probability set to 0.2. The learning rate is set to 2e-4, and the bottleneck dimension between the encoder and decoder is set to 128. The model was trained for 60 epochs. |
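The architecture quoted in the Experiment Setup row is specific enough to sketch. Below is a minimal, hypothetical PyTorch reconstruction of one encoder: four feed-forward layers (1024, 512, 256, 128), each followed by batch normalization and dropout (p=0.2), with the learning rate 2e-4 from the paper. The input dimension (5000 genes), the ReLU activation, and the class name are illustrative assumptions not stated in the quoted text; this is not the authors' released implementation.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Sketch of the paper's encoder: Linear -> BatchNorm -> ReLU -> Dropout, x4."""

    def __init__(self, input_dim: int = 5000,
                 hidden_dims=(1024, 512, 256, 128),
                 dropout: float = 0.2):
        super().__init__()
        layers = []
        prev = input_dim
        for dim in hidden_dims:
            layers += [
                nn.Linear(prev, dim),
                nn.BatchNorm1d(dim),   # batch normalization after each layer
                nn.ReLU(),             # activation is an assumption
                nn.Dropout(dropout),   # dropout probability 0.2
            ]
            prev = dim
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


encoder = Encoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=2e-4)  # learning rate from the paper

# A batch of 8 (hypothetical) expression profiles -> 128-d bottleneck embeddings.
z = encoder(torch.randn(8, 5000))
print(z.shape)
```

The final layer size of 128 matches the stated bottleneck dimension between encoder and decoder; a mirrored decoder (128, 256, 512, 1024) would map embeddings back to expression space.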