Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation

Authors: Alessandro Palma, Sergei Rybakov, Leon Hetzel, Stephan Günnemann, Fabian J Theis

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on synthetic data support the theoretical soundness of our approach, while applications to time-resolved single-cell RNA sequencing data demonstrate improved trajectory reconstruction and manifold interpolation."
Researcher Affiliation | Collaboration | "1 Institute of Computational Biology, Helmholtz Munich, Munich, Germany; 2 School of Computation, Information and Technology, Technical University of Munich, Germany; 3 Lamin Labs; 4 Munich Data Science Institute, Technical University of Munich, Germany; 5 TUM School of Life Sciences Weihenstephan, Technical University of Munich, Germany."
Pseudocode | Yes | "We summarise the procedure used to train FlatVI in Algorithm 1." (Algorithm 1: Train FlatVI; Algorithm 2: Train latent OT-CFM with FlatVI)
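The core idea behind the paper's flattening objective (enforcing latent Euclidean geometry) can be illustrated with a small numeric sketch. Everything below is an illustrative assumption rather than the paper's exact loss: the helper names (`pullback_metric`, `flatness_penalty`), the use of a plain decoder Jacobian `J`, and the Frobenius-norm form of the penalty are all hypothetical. The sketch computes the pullback metric M = JᵀJ that a decoder induces on latent space and penalises its deviation from a scaled identity, which is one common way to encourage locally Euclidean latent geometry.

```python
import numpy as np

def pullback_metric(J):
    # Pullback metric M = J^T J induced on latent space by a decoder
    # with Jacobian J at some latent point. (Hypothetical helper name.)
    return J.T @ J

def flatness_penalty(J, scale=1.0):
    # Squared Frobenius deviation of the pullback metric from a scaled
    # identity: zero iff the decoder is locally a (scaled) isometry.
    M = pullback_metric(J)
    d = M.shape[0]
    return np.sum((M - scale * np.eye(d)) ** 2)

# Toy decoder Jacobian at one latent point: 5 output genes, 2 latent dims.
rng = np.random.default_rng(0)
J = rng.normal(size=(5, 2))

# A Jacobian with orthonormal columns (QR factor) is exactly "flat".
Q, _ = np.linalg.qr(J)
print(flatness_penalty(Q))       # ~0 for orthonormal columns
print(flatness_penalty(J) > 0)   # a generic Jacobian is penalised
```

In a VAE this penalty would be added to the usual ELBO terms during training; here it is shown in isolation on a fixed Jacobian only to make the geometric quantity concrete.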
Open Source Code | Yes | "We have made our code publicly available at https://github.com/theislab/FlatVI."
Open Datasets | Yes | "All datasets used in this study are open source, and their associated publications are cited in the manuscript." The datasets are: (i) the Embryoid Body (EB) dataset (Moon et al., 2019), comprising 18,203 differentiating human embryoid cells over five time points and spanning four lineages; (ii) the MEF reprogramming dataset (Schiebinger et al., 2019), containing 165,892 cells across 39 time points, tracing the reprogramming of mouse embryonic fibroblasts into induced pluripotent stem cells; and (iii) the pancreatic endocrinogenesis dataset (denoted Pancreas) by Bastidas-Ponce et al. (2019).
Dataset Splits | Yes | "The data is split into 80% training and 20% test sets."
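An 80/20 split like the one quoted above can be reproduced with a seeded random permutation of cell indices. The function name, seed, and use of NumPy here are illustrative assumptions, not the authors' released procedure; the cell count is the EB dataset's 18,203 cells from the row above.

```python
import numpy as np

def train_test_split_indices(n_cells, train_frac=0.8, seed=0):
    # Shuffle cell indices reproducibly, then cut at the train fraction.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_cells)
    n_train = int(train_frac * n_cells)
    return perm[:n_train], perm[n_train:]

# e.g. the EB dataset has 18,203 cells
train_idx, test_idx = train_test_split_indices(18203)
print(len(train_idx), len(test_idx))  # 14562 3641
```

Seeding the generator is what makes the split reproducible across runs, which is the property this reproducibility variable is checking for.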
Hardware Specification | Yes | "Our experiments ran on different GPU servers with varying specifications:" 16x Tesla V100 GPUs (32 GB RAM per card); 2x Tesla V100 GPUs (16 GB RAM per card); 8x A100-SXM4 GPUs (40 GB RAM per card).
Software Dependencies | Yes | "Our model is implemented in Python 3.10, and for deep learning models, we used PyTorch 2.0. For the implementation of neural-ODE-based simulations, we use the torchdyn package."
Experiment Setup | Yes | "The Geodesic AE, NB-VAE, and FlatVI models are trained using shallow 2-layer neural networks with hidden dimensions [256, 10]. ... All models are trained for 1000 epochs with early stopping based on the VAE loss and a patience value of 20 epochs. The default learning rate is set to 1e-3. For VAE-based models, we linearly anneal the KL divergence weight from 0 to 1 over the course of training. NB-VAE and FlatVI models use a batch size of 32, while Geodesic AE is trained with a batch size of 256. ... A summary of hyperparameter sweeps for FlatVI, along with selected values based on validation loss, is shown in Table 3. ... To parameterise the velocity field in OT-CFM, we use a 3-layer MLP with 64 hidden units per layer and SELU activations. The learning rate is fixed at 1e-3. ... The variance hyperparameter σ is set to 0.1 by default."
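Two of the training details quoted above are easy to make concrete: linear annealing of the KL weight from 0 to 1 over the 1000 training epochs, and early stopping with a patience of 20 epochs. The sketch below is a minimal plain-Python rendering under those stated hyperparameters; the function and class names are illustrative, not taken from the released code.

```python
def kl_weight(epoch, n_epochs=1000):
    # Linearly anneal the KL divergence weight from 0 (first epoch)
    # to 1 (last epoch), as described for the VAE-based models.
    return min(1.0, epoch / (n_epochs - 1))

class EarlyStopping:
    # Stop when the monitored loss has not improved for `patience`
    # consecutive epochs (patience = 20 in the quoted setup).
    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True => stop training

print(kl_weight(0))    # 0.0
print(kl_weight(999))  # 1.0
```

Annealing the KL term this way lets the model first focus on reconstruction before the latent prior is enforced at full strength, a common stabilisation trick for VAE training.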