reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Interpretable Causal Representation Learning for Biological Data in the Pathway Space

Authors: Jesus de la Fuente Cedeño, Robert Lehmann, Carlos Ruiz-Arenas, Jan Voges, Irene Marín-Goñi, Xabier Martinez de Morentin, David Gomez-Cabrero, Idoia Ochoa, Jesper Tegnér, Vincenzo Lagani, Mikel Hernaez

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show that SENA-discrepancy-VAE achieves predictive performances on unseen combinations of interventions that are comparable with its original, non-interpretable counterpart, while inferring causal latent factors that are biologically meaningful. [...] We employ two large-scale Perturb-seq datasets, one collected on leukemia lymphoblast cells (K562 cell line) (Norman et al., 2019), termed the Norman2019 dataset, and a second one collected on acute myeloid leukemia cells (THP1 cell line) (Wessels et al., 2022), termed the Wessels2023 dataset. [...] Section 5 ABLATION STUDY. [...] Section 6 LEARNING INTERPRETABLE LATENT CAUSAL FACTORS. [...] Table 1: Benchmarking SENA-discrepancy-VAE and discrepancy-VAE on double perturbations prediction.
Researcher Affiliation	Academia	1 CIMA University of Navarra, CCUN, IdiSNA, Pamplona, Spain. 2 TECNUN, University of Navarra, San Sebastián, Spain. 3 Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia. 4 Dept. of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, MN, USA. 5 Institute of Chemical Biology, Ilia State University, Tbilisi 0162, Georgia. 6 Center for Data Science (DATAI), University of Navarra, 31008, Pamplona, Spain.
Pseudocode	No	The paper describes the SENA-discrepancy-VAE model and its components using mathematical equations and descriptive text, but does not include a dedicated pseudocode or algorithm block.
Open Source Code	Yes	Python package, including data and code for reproducibility: github.com/ML4BM-Lab/SENA
Open Datasets	Yes	We employ two large-scale Perturb-seq datasets, one collected on leukemia lymphoblast cells (K562 cell line) (Norman et al., 2019), termed the Norman2019 dataset, and a second one collected on acute myeloid leukemia cells (THP1 cell line) (Wessels et al., 2022), termed the Wessels2023 dataset.
Dataset Splits	Yes	The Norman2019 dataset underwent standard preprocessing steps for single cell data (filtering, normalization, and log-transformation (Wolf et al., 2018)), leading to a total of 8,907 unperturbed cells (controls), 57,831 cells under the 105 single-gene perturbations, and 41,759 cells under the 131 double-gene perturbations. [...] For both datasets, we trained both models on the unperturbed and single-gene perturbations samples from Norman et al. (2019)... Double-gene perturbations were set aside for evaluation purposes.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies	No	The paper mentions several software tools and packages such as 'Python package', 'statsannotation package (Charlier et al., 2022)', 'Seurat', and 'Scanpy', but does not provide specific version numbers for these software dependencies.
Experiment Setup	Yes	Given the good results (in interpretability and reconstruction performance) obtained in the ablation study (Section 5), we varied the number of latent factors within {5, 10, 35, 70, 105}, and the λ for the SENA-discrepancy-VAE in {0, 0.1} (Appendix VII Fig. 12 shows gradients and mask (M) distribution across several λ values). [...] The parameter N is set to 100 in our analyses. [...] We evaluated the aforementioned architectures for several values of λ: {0, 0.1, 0.01, 10^-3}. [...] enforcing that every BP contains at least 5 genes.
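
The split and sweep quoted in the Dataset Splits and Experiment Setup rows can be written out as a minimal Python sketch. The cell counts, latent-factor values, and λ values are taken from the rows above; the names `NORMAN2019_CELLS`, `split_counts`, and `build_grid` are hypothetical and do not come from the paper or the SENA repository.

```python
from itertools import product

# Norman2019 cell counts quoted in the Dataset Splits row.
# Keys are illustrative labels, not identifiers from the SENA code base.
NORMAN2019_CELLS = {"control": 8_907, "single": 57_831, "double": 41_759}

# Sweep values quoted in the Experiment Setup row.
LATENT_FACTORS = [5, 10, 35, 70, 105]
LAMBDAS = [0, 0.1]


def split_counts(cells):
    """Controls and single-gene perturbations train; doubles are held out."""
    train = cells["control"] + cells["single"]
    evaluation = cells["double"]
    return {"train": train, "eval": evaluation}


def build_grid(latent_factors, lambdas):
    """Return one config dict per (number of latent factors, lambda) pair."""
    return [
        {"n_latent": k, "lambda": lam}
        for k, lam in product(latent_factors, lambdas)
    ]


counts = split_counts(NORMAN2019_CELLS)
grid = build_grid(LATENT_FACTORS, LAMBDAS)
print(counts)
print(len(grid), "configurations")
```

Under these assumptions the sweep is a plain Cartesian product of the two quoted value sets, and the evaluation set contains only double-gene perturbations, matching the description in the Dataset Splits row.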