Diffusion on Language Model Encodings for Protein Sequence Generation

Authors: Viacheslav Meshchaninov, Pavel Strashnov, Andrey Shevtsov, Fedor Nikolaev, Nikita Ivanisenko, Olga Kardymon, Dmitry Vetrov

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate existing methods alongside DiMA using multiple metrics across two protein modalities, covering quality, diversity, novelty, and distribution matching of generated proteins. DiMA consistently produces novel, high-quality, and diverse protein sequences and achieves strong results compared to baselines such as autoregressive, discrete-diffusion, and flow-matching language models. Section 3 is titled "Experiments" and contains subsections such as "Evaluation Metrics", "Denoiser Component Analysis", and "Comparison Across Generative Paradigms".
Researcher Affiliation | Collaboration | (1) Constructor University, Bremen, Germany; (2) AIRI, Moscow, Russia. Correspondence to: Viacheslav Meshchaninov <EMAIL>, Pavel Strashnov <EMAIL>, Andrey Shevtsov <EMAIL>.
Pseudocode | No | The paper describes methods and architectures in prose and via diagrams (e.g., Figure 1, Figure 9), but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is released on GitHub.
Open Datasets | Yes | SwissProt is a dataset that contains a high-quality, manually annotated subset of the UniProt (Consortium, 2020) database. Another dataset we use is AFDBv4-90 from Durairaj et al. (2023), a subset of the UniRef50 database.
Dataset Splits | No | During inference, we first sample the target sequence length from the training data distribution to ensure realistic protein lengths. We finetune DiMA on the CATH S40 non-redundant dataset (~27k proteins) and evaluate performance on a hold-out set of 100 structures. For each structure, we generate 10 proteins and assess their similarity to the target fold using the TM-score.
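The length-sampling step described in this row — drawing each target sequence length from the empirical distribution of training-set lengths — can be sketched as follows. This is an illustrative sketch, not the paper's released code; the function name and toy data are ours.

```python
import random
from collections import Counter

def sample_lengths(training_lengths, n, rng=None):
    """Draw n target sequence lengths from the empirical length
    distribution of the training set (frequencies act as weights)."""
    rng = rng or random.Random()
    counts = Counter(training_lengths)
    lengths = list(counts.keys())
    weights = list(counts.values())
    return rng.choices(lengths, weights=weights, k=n)

# Toy example: sampled lengths come only from values seen in training,
# with more frequent lengths drawn more often.
train = [120, 120, 250, 250, 250, 400]
sampled = sample_lengths(train, 1000, rng=random.Random(0))
assert set(sampled) <= {120, 250, 400}
```

Sampling lengths this way keeps generated proteins within the realistic length range of the training corpus, rather than fixing one length for all samples.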
Hardware Specification | Yes | The experiments were conducted using 4 A100 80GB GPUs.
Software Dependencies | No | The paper mentions several software tools and models, such as "ESM-2", "CHEAP", "SaProt", "RFDiffusion", "ProteinMPNN", and "InterProScan", but does not specify their version numbers. For example, it does not state "Python 3.8, PyTorch 1.9, and CUDA 11.1" or similar specific versioning for the ancillary software used in the experiments.
Experiment Setup | Yes | All models were trained with a batch size of 512 and a learning rate of 1e-4 to convergence. We clip the gradient norm to 2 and use a linear warmup schedule for the first 5000 iterations. We also use an EMA with decay 0.9999. Our diffusion model employs a transformer architecture with 12 layers, 16 attention heads, and a hidden size of 320.
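The optimization details in this row (linear warmup to 1e-4 over 5000 steps, gradient-norm clipping at 2, EMA with decay 0.9999) can be illustrated with a minimal, framework-agnostic sketch. The function names are ours, not from the released code; a real implementation would use the training framework's built-in scheduler, clipping, and EMA utilities.

```python
BASE_LR = 1e-4
WARMUP_STEPS = 5000
EMA_DECAY = 0.9999

def warmup_lr(step):
    """Linear warmup from 0 to BASE_LR over the first WARMUP_STEPS
    iterations, then constant BASE_LR."""
    return BASE_LR * min(1.0, step / WARMUP_STEPS)

def clip_grad_norm(grads, max_norm=2.0):
    """Rescale a flat list of gradient values so their L2 norm
    does not exceed max_norm."""
    norm = sum(g * g for g in grads) ** 0.5
    if norm > max_norm:
        scale = max_norm / norm
        grads = [g * scale for g in grads]
    return grads

def ema_update(ema_value, new_value, decay=EMA_DECAY):
    """One exponential-moving-average step for a single parameter."""
    return decay * ema_value + (1.0 - decay) * new_value

# Warmup: halfway through warmup, LR is half of BASE_LR.
assert warmup_lr(2500) == 5e-5
assert warmup_lr(5000) == BASE_LR
assert warmup_lr(10000) == BASE_LR

# Clipping: a gradient of norm 5 is rescaled to norm 2.
clipped = clip_grad_norm([3.0, 4.0])
assert abs(sum(g * g for g in clipped) ** 0.5 - 2.0) < 1e-9
```

With a decay of 0.9999, the EMA weights track roughly the last ~10,000 steps of training, which smooths out per-batch noise in the final evaluated model.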