CellFlux: Simulating Cellular Morphology Changes via Flow Matching

Authors: Yuhui Zhang, Yuchang Su, Chenyu Wang, Tianhong Li, Zoe Wefers, Jeffrey J Nirschl, James Burgess, Daisy Ding, Alejandro Lozano, Emma Lundberg, Serena Yeung-Levy

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Evaluated on chemical (BBBC021), genetic (RxRx1), and combined perturbation (JUMP) datasets, CellFlux generates biologically meaningful cell images that faithfully capture perturbation-specific morphological changes, achieving a 35% improvement in FID scores and a 12% increase in mode-of-action prediction accuracy over existing methods. Additionally, CellFlux enables continuous interpolation between cellular states, providing a potential tool for studying perturbation dynamics. These capabilities mark a significant step toward realizing virtual cell modeling for biomedical research. Project page: https://yuhui-zh15.github.io/CellFlux/. ... In this section, we present detailed results demonstrating CellFlux's state-of-the-art performance in cellular morphology prediction under perturbations, outperforming existing methods across multiple datasets and evaluation metrics. ... Our experiments were conducted using three cell imaging perturbation datasets: BBBC021 (chemical perturbation) (Caie et al., 2010), RxRx1 (genetic perturbation) (Sypetkowski et al., 2023), and the JUMP dataset (combined perturbation) (Chandrasekaran et al., 2023).
Researcher Affiliation: Academia. 1 Stanford University, 2 Tsinghua University, 3 MIT. Correspondence to: Yuhui Zhang <EMAIL>, Serena Yeung-Levy <EMAIL>.
Pseudocode: Yes.

Algorithm 1: CellFlux Algorithm

Training Process:
Input: initial distribution p0, target distribution p1, perturbation c, neural network vθ(xt, t, c), noise injection probability pn, condition drop probability pc, learning rate η, number of iterations N
Output: trained neural network vθ
for each iteration i = 1, ..., N do
  Sample x0 ~ p0 and x1 ~ p1
  Sample t ~ Uniform[0, 1]
  Inject noise: x0 ← x0 + ε, ε ~ N(0, I), with probability pn
  Drop condition: c ← ∅ with probability pc
  Interpolate: xt ← t·x1 + (1 − t)·x0
  Compute true velocity: v(xt, t, c) ← x1 − x0
  Predict velocity using the neural network: vθ(xt, t, c)
  Compute loss: L ← ‖vθ(xt, t, c) − v(xt, t, c)‖₂²
  Update θ by gradient descent: θ ← θ − η∇θL
end for

Inference Process:
Input: initial sample x0 ~ p0, perturbation c, step size Δt, classifier-free guidance strength α
Output: generated sample x1 ~ p1
Initialize xt ← x0
for t = 0 to 1 with step size Δt do
  Compute velocity with classifier-free guidance: vθ_CFG(xt, t, c) ← α·vθ(xt, t, c) + (1 − α)·vθ(xt, t, ∅)
  Update: xt ← xt + Δt·vθ_CFG(xt, t, c)
end for
Output final state: x1 ← xt
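The training and inference loops in Algorithm 1 can be sketched end-to-end in a few dozen lines. The toy below is a minimal sketch, not the authors' implementation: 2-D Gaussians stand in for the image distributions p0 and p1, and a hand-rolled linear model stands in for the neural network vθ; all variable names and the linear-model choice are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D stand-ins for image tensors: p0 = control cells, p1 = perturbed cells.
D = 2
mu0, mu1 = np.zeros(D), np.full(D, 3.0)

# Linear velocity model v_theta(x, t, c) = W @ [x, t, c] + b; c is a scalar
# condition flag here (c = 0.0 plays the role of the dropped condition ∅).
W = rng.normal(scale=0.1, size=(D, D + 2))
b = np.zeros(D)

def v_theta(x, t, c):
    z = np.concatenate([x, [t, c]])
    return W @ z + b

eta, p_noise, p_drop = 0.01, 0.5, 0.2
for _ in range(5000):
    x0 = rng.normal(mu0)                         # x0 ~ p0
    x1 = rng.normal(mu1)                         # x1 ~ p1
    t = rng.uniform()                            # t ~ Uniform[0, 1]
    if rng.uniform() < p_noise:                  # noise injection with prob pn
        x0 = x0 + rng.normal(size=D)
    c = 0.0 if rng.uniform() < p_drop else 1.0   # condition drop with prob pc
    xt = t * x1 + (1 - t) * x0                   # linear interpolation
    v_true = x1 - x0                             # true velocity
    err = v_theta(xt, t, c) - v_true
    z = np.concatenate([xt, [t, c]])
    W -= eta * np.outer(err, z)                  # gradient step on ||err||^2
    b -= eta * err                               # (factor 2 folded into eta)

# Inference: Euler integration with classifier-free guidance.
alpha, dt = 1.2, 0.05
x = rng.normal(mu0)
for step in range(int(1 / dt)):
    t = step * dt
    v_cfg = alpha * v_theta(x, t, 1.0) + (1 - alpha) * v_theta(x, t, 0.0)
    x = x + dt * v_cfg

print(x)  # the generated sample should have drifted from mu0 toward mu1
```

The same skeleton scales to the paper's setting by swapping the linear model for a conditional U-Net-style network and the 2-D points for image tensors; the interpolation, velocity target, and Euler/CFG steps are unchanged.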
Open Source Code: No. Project page: https://yuhui-zh15.github.io/CellFlux/.
Open Datasets: Yes. Our experiments were conducted using three cell imaging perturbation datasets: BBBC021 (chemical perturbation) (Caie et al., 2010), RxRx1 (genetic perturbation) (Sypetkowski et al., 2023), and the JUMP dataset (combined perturbation) (Chandrasekaran et al., 2023). ... BBBC021 dataset. We utilized the BBBC021v1 image set (Caie et al., 2010), available from the Broad Bioimage Benchmark Collection (Ljosa et al., 2012). ... RxRx1 dataset. The RxRx1 dataset (Sypetkowski et al., 2023), available under a CC-BY-NC-SA-4.0 license from Recursion Pharmaceuticals at rxrx.ai, focuses on genetic perturbations using CRISPR-mediated gene knockouts. ... JUMP dataset (CPJUMP1). The JUMP dataset (Chandrasekaran et al., 2023), available under a CC0 1.0 license, integrates both genetic and chemical perturbations, offering the most comprehensive image-based profiling resource to date. ... Public access to the dataset and associated analysis pipelines is available via Broad's JUMP repository.
Dataset Splits: No. The resulting datasets include 98K, 171K, and 424K images with 3, 6, and 5 channels, respectively, from 26, 1,042, and 747 perturbation types. ... Evaluation metrics. We evaluate methods using two types of metrics: (1) FID and KID (lower is better), which measure image distribution similarity via Fréchet and kernel-based distances, computed on 5K generated images for BBBC021 and 100 randomly selected perturbation classes for RxRx1 and JUMP; we report both overall scores across all samples and conditional scores per perturbation class.
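KID, mentioned in the evaluation metrics above, is the unbiased squared MMD between real and generated feature sets under a cubic polynomial kernel. A minimal numpy sketch of that estimator (random vectors stand in for the Inception-style features actually used; the function names are ours, not from the paper's code):

```python
import numpy as np

def poly_kernel(X, Y):
    # Cubic polynomial kernel used by KID: k(x, y) = (x·y / d + 1)^3
    d = X.shape[1]
    return (X @ Y.T / d + 1.0) ** 3

def kid(X, Y):
    """Unbiased MMD^2 estimate between feature sets X (real) and Y (generated)."""
    m, n = len(X), len(Y)
    Kxx = poly_kernel(X, X)
    Kyy = poly_kernel(Y, Y)
    Kxy = poly_kernel(X, Y)
    # Drop diagonal terms for the unbiased within-set estimates.
    sum_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    sum_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return sum_xx + sum_yy - 2 * Kxy.mean()

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 64))            # stand-in for real-image features
good = rng.normal(size=(500, 64))            # "generated", same distribution
bad = rng.normal(loc=0.5, size=(500, 64))    # "generated", shifted distribution

print(kid(real, good), kid(real, bad))  # the shifted set scores higher
```

The per-class "conditional" variant the paper reports would simply apply this estimator within each perturbation class and average the results.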
Hardware Specification: Yes. Training details. Models are trained for 100 epochs on 4 A100 GPUs using the Adam optimizer with a learning rate of 1e-4 and a batch size of 128, requiring 8, 16, and 36 hours for BBBC021, RxRx1, and JUMP, respectively.
Software Dependencies: No. Perturbation encoding. We encode perturbations following IMPA's approach (Palma et al., 2025). For chemical embeddings, we use 1024-dimensional Morgan fingerprints generated with RDKit. For gene embeddings, CRISPR and ORF embeddings combine Gene2Vec with HyenaDNA-derived sequence representations, resulting in final dimensions of 328 and 456, respectively.
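The reported gene-embedding dimensions are consistent with concatenating 200-dimensional Gene2Vec vectors with HyenaDNA-derived representations of 128 (CRISPR) and 256 (ORF) dimensions, since 200 + 128 = 328 and 200 + 256 = 456. That split is an inference from the reported totals, not something the excerpt confirms; the sketch below uses random vectors as stand-ins for the actual embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the embedding sources. Gene2Vec vectors are
# 200-dimensional; the HyenaDNA-derived components are assumed to be 128-dim
# (CRISPR) and 256-dim (ORF) to match the reported totals of 328 and 456.
gene2vec = rng.normal(size=200)
hyena_crispr = rng.normal(size=128)
hyena_orf = rng.normal(size=256)

crispr_embedding = np.concatenate([gene2vec, hyena_crispr])
orf_embedding = np.concatenate([gene2vec, hyena_orf])

print(crispr_embedding.shape, orf_embedding.shape)  # (328,) (456,)
```

The 1024-dimensional chemical embeddings, by contrast, come directly from RDKit's Morgan fingerprint generation (nBits=1024) rather than from concatenation.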
Experiment Setup: Yes. Training details. Models are trained for 100 epochs on 4 A100 GPUs using the Adam optimizer with a learning rate of 1e-4 and a batch size of 128, requiring 8, 16, and 36 hours for BBBC021, RxRx1, and JUMP, respectively. The noise injection probability, condition drop probability, and classifier-free guidance strength are set to 0.5, 0.2, and 1.2, respectively. Models are selected based on the lowest FID scores on the validation set.
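For reference, the hyperparameters scattered across the training details above can be collected into one place. The key names below are illustrative (not from the authors' code); the values are as reported.

```python
# Hyperparameters reported in the paper's training details.
config = {
    "epochs": 100,
    "hardware": "4x A100",
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "batch_size": 128,
    "noise_injection_prob": 0.5,   # pn in Algorithm 1
    "condition_drop_prob": 0.2,    # pc in Algorithm 1
    "cfg_strength": 1.2,           # alpha at inference
    "model_selection": "lowest validation FID",
}

# Reported wall-clock training time per dataset (hours on 4 A100s).
train_hours = {"BBBC021": 8, "RxRx1": 16, "JUMP": 36}
print(sum(train_hours.values()) * 4, "total GPU-hours across the three datasets")
```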