scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data
Authors: Olga Ovcharenko, Florian Barkmann, Philip Toma, Imant Daunhawer, Julia E Vogt, Sebastian Schelter, Valentina Boeva
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main contribution is an open-source benchmark, scSSL-Bench, which compares the performance of several self-supervised learning methods for single-cell data. (1) To address RQ1, we evaluate nineteen generic and specialized single-cell SSL methods across seven uni-modal and two multi-modal single-cell datasets, assessing their performance on three common downstream tasks: batch correction, cell type annotation, and missing modality prediction (Subsection 4.1). Our results reveal that the specialized frameworks scVI and CLAIRE, together with the foundation model scGPT, are the best for uni-modal batch correction, while generic SSL techniques such as VICReg and SimCLR outperform domain-specific methods for multi-modal batch correction and the other two tasks on single-modal data. (2) For RQ2, we evaluate various model architectures and hyperparameters, including representation and projection dimensionality, augmentation strategies, and multi-modal integration methods (Subsection 4.2). Overall, we find that a moderate to large embedding dimensionality consistently leads to improved results, and we identify masking as the most beneficial augmentation technique, surpassing biology-specific augmentations. |
| Researcher Affiliation | Academia | 1BIFOLD & TU Berlin, Berlin, Germany 2Department of Computer Science, ETH Zurich, Zurich, Switzerland 3Swiss Institute of Bioinformatics, Lausanne, Switzerland 4Paris Cite University, Cochin Institute, INSERM U1016, Paris, France. Correspondence to: Olga Ovcharenko <EMAIL>, Sebastian Schelter <EMAIL>, Valentina Boeva <EMAIL>. |
| Pseudocode | No | The paper describes methods in textual form and uses diagrams (e.g., Figure G1) to illustrate architectures, but it does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format within the main text or appendices. |
| Open Source Code | Yes | We provide our benchmark code under an open license at https://github.com/BoevaLab/scSSL-Bench for reproducibility and for fostering further research on SSL for single-cell data. |
| Open Datasets | Yes | All datasets used in our benchmark are publicly available. Human Immune Cells (HIC): [...] Availability: https://doi.org/10.6084/m9.figshare.12420968.v8 Mouse Cell Atlas (MCA): [...] Availability: https://ndownloader.figshare.com/files/10351110 and https://ndownloader.figshare.com/files/10760158 Peripheral Blood Mononuclear Cells (PBMC): Collected by Ding et al. (2020), this dataset contains 30,449 cells from two patients and includes 33,694 genes. [...] Availability: https://singlecell.broadinstitute.org/single_cell/study/SCP424/single-cell-comparison-pbmc-data |
| Dataset Splits | Yes | For cell type annotation, each dataset is divided into train (reference) and test (query) data, where the test data consists of up to three held-out (experimental) batches with unseen cells (details in Appendix C). In the PBMC-M dataset, for cell-type annotation mapping and missing modality inference, we hold out batches P3, P5, and P8. In the BMMC dataset, for cell-type annotation mapping and missing modality inference, we hold out batches s4d1, s4d8, and s4d9. |
| Hardware Specification | No | Computational data analysis was performed at Leonhard Med secure trusted research environment at ETH Zurich and at the BIFOLD Hydra cluster. While specific computing environments are mentioned, no specific hardware details like GPU/CPU models, processors, or memory amounts are provided. |
| Software Dependencies | No | All datasets are preprocessed using SCANPY's (Wolf et al., 2018) `normalize_total` function... All models in this benchmark, except Concerto, were trained with the Adam optimizer (Kingma & Ba, 2017)... based on our implementation of models with Lightly SSL (Susmelj et al., 2023). The paper mentions software tools like SCANPY, Adam, and Lightly SSL, but it does not specify their exact version numbers. Citations like (Wolf et al., 2018) or (Susmelj et al., 2023) refer to publications, not software versions. |
| Experiment Setup | Yes | All models in this benchmark, except Concerto, were trained with the Adam optimizer (Kingma & Ba, 2017). We use a stepwise learning rate schedule with base learning rate 1e-4 and fix the batch size at 256. When applicable, the memory bank size was set to 2048. We train all models with embedding dimensions {8, 16, 32, 64, 128, 256, 512, 1024}. [...] We evaluate t ∈ {0.1, 0.5, 1, 5, 10} using SCIB-METRICS scores. [...] Variance-invariance-covariance regularization hyperparameters are used as in the original work. We evaluate a grid of parameters where the invariance term λ and the variance term α take values in {5, 10, 25, 50}, while the covariance term β is fixed to 1. We find that λ and α fixed to 5 perform well across both ablation datasets. |
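The held-out-batch protocol described under "Dataset Splits" (e.g., batches P3, P5, P8 for PBMC-M) amounts to partitioning cells by batch label into reference and query sets. A minimal sketch in plain Python, where each cell is represented as a `(cell_id, batch)` pair purely for illustration:

```python
def split_by_batch(cells, holdout_batches):
    """Split cells into train (reference) and test (query) sets by batch label,
    mirroring the held-out-batch protocol described in the paper.
    `cells` is a list of (cell_id, batch) pairs -- a simplification of the
    real AnnData objects used in the benchmark."""
    holdout = set(holdout_batches)
    train = [c for c in cells if c[1] not in holdout]
    test = [c for c in cells if c[1] in holdout]
    return train, test

# Toy example with PBMC-M-style batch names.
cells = [("c1", "P1"), ("c2", "P3"), ("c3", "P5"), ("c4", "P2")]
reference, query = split_by_batch(cells, ["P3", "P5", "P8"])
```

All cells from the held-out batches land in the query set, so the model never sees those experimental batches during training.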
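The SCANPY-based preprocessing quoted under "Software Dependencies" (total-count normalization via `normalize_total`) can be sketched in plain Python. The `target_sum` and the subsequent `log1p` transform are common scRNA-seq defaults assumed here for illustration, not values confirmed by the excerpt:

```python
from math import log1p

def normalize_total(counts, target_sum=1e4):
    """Scale each cell's gene counts so they sum to `target_sum`, then apply
    log1p -- a plain-Python sketch of the standard scRNA-seq preprocessing
    that SCANPY's normalize_total (plus log1p) performs.
    `counts` is a list of per-cell count lists."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([log1p(c * scale) for c in cell])
    return normalized

# Two toy "cells" with raw gene counts.
result = normalize_total([[1, 0, 3], [10, 5, 5]], target_sum=100)
```

Rescaling each cell to a common total removes library-size differences before the log transform stabilizes variance; in practice this is done on sparse matrices via `scanpy.pp.normalize_total`.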
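The stepwise learning-rate schedule with base rate 1e-4 mentioned in the experiment setup can be illustrated with a minimal sketch. The `step_size` and decay factor `gamma` below are illustrative assumptions, since the excerpt does not specify them:

```python
def stepwise_lr(base_lr=1e-4, step_size=10, gamma=0.5):
    """Return an epoch -> learning-rate function implementing a stepwise
    schedule: the base rate is multiplied by `gamma` once every `step_size`
    epochs. `step_size` and `gamma` are illustrative guesses, not values
    reported in the paper; only base_lr=1e-4 comes from the text."""
    def lr_at(epoch):
        return base_lr * (gamma ** (epoch // step_size))
    return lr_at

schedule = stepwise_lr()
```

This is the piecewise-constant decay that, e.g., PyTorch's `torch.optim.lr_scheduler.StepLR` applies on top of Adam.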