A Variational Perspective on Generative Protein Fitness Optimization

Authors: Lea Bogensperger, Dominik Narnhofer, Ahmed Allam, Konrad Schindler, Michael Krauthammer

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate VLGPO on two public benchmarks for protein fitness optimization in limited data regimes, namely Adeno-Associated Virus (AAV) (Bryant et al., 2021) and Green Fluorescent Protein (GFP) (Sarkisyan et al., 2016), as suggested by Kirjner et al. (2023). ... We perform fitness optimization in a continuous latent representation... We demonstrate state-of-the-art performance on established benchmarks for protein fitness optimization, namely AAV and GFP... We conduct an ablation study on the influence of manifold constrained gradients in sampling (Line 7, Algorithm 1).
Researcher Affiliation | Academia | ¹University of Zurich, ²ETH Zurich. Correspondence to: Lea Bogensperger <EMAIL>.
Pseudocode | Yes | Algorithm 1: VLGPO sampling
Open Source Code | Yes | Source code available at https://github.com/uzh-dqbm-cmi/VLGPO.
Open Datasets | Yes | We validate VLGPO on two public benchmarks for protein fitness optimization in limited data regimes, namely Adeno-Associated Virus (AAV) (Bryant et al., 2021) and Green Fluorescent Protein (GFP) (Sarkisyan et al., 2016), as suggested by Kirjner et al. (2023).
Dataset Splits | No | The paper defines tasks based on fitness percentile ranges and mutation gaps (Table 1) and lists the number of data samples N for each task (Table 2), which are used for training the VAE and flow matching models in a limited data setting. However, it does not explicitly provide specific train/validation/test splits (e.g., percentages or exact counts) for the models being developed in the paper. The oracle gψ is trained on the complete DMS data, but this is for evaluation, not for the VLGPO model's training and evaluation splits.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running its experiments.
Software Dependencies | No | The paper mentions using a '1D CNN commonly used for denoising diffusion probabilistic models (DDPMs)' and links to a GitHub repository, but it does not specify software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | A learning rate of 0.001 with a convolutional architecture and β ∈ {0.01, 0.001} for AAV and GFP, respectively, is used for training the encoder E and decoder D in Equation (4)... A learning rate of 5e-5 and a batch size of 1024 were used to train vθ,t for 1000 epochs. ... K = 32 ODE steps... The parameters αt ∈ {0.97, 1.2, 0.56} and J ∈ {39, 19, 37} are used for AAV (medium), AAV (hard), and GFP (medium).
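The "K = 32 ODE steps" detail quoted above refers to integrating a learned flow-matching velocity field from t = 0 to t = 1. The following is a minimal sketch of such an Euler integration under stated assumptions: the function names (`euler_sample`, `v_lin`) are illustrative, and the simple linear velocity field stands in for the paper's trained 1D-CNN vθ,t, which is not reproduced here.

```python
import numpy as np

def euler_sample(v_theta, z0, K=32):
    """Integrate dz/dt = v_theta(z, t) from t=0 to t=1 with K Euler steps.

    K=32 matches the ODE step count quoted from the paper; v_theta is
    any callable (z, t) -> velocity of the same shape as z.
    """
    z = z0.copy()
    dt = 1.0 / K
    for k in range(K):
        t = k * dt
        z = z + dt * v_theta(z, t)  # explicit Euler update
    return z

# Hypothetical stand-in velocity field: a straight-line flow toward a
# fixed target point in latent space (NOT the paper's learned vθ,t).
target = np.ones(8)
v_lin = lambda z, t: target - z

z0 = np.zeros(8)          # latent starting point (e.g., a prior sample)
z1 = euler_sample(v_lin, z0, K=32)
```

With this linear field each step contracts the distance to `target` by a factor (1 − 1/K), so `z1` lands close to, but not exactly at, `target`; a trained velocity field would instead transport prior samples toward the learned latent distribution.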