Distilling Structural Representations into Protein Sequence Models

Authors: Jeffrey Ouyang-Zhang, Chengyue Gong, Yue Zhao, Philipp Krähenbühl, Adam Klivans, Daniel Diaz

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | ISM outperforms state-of-the-art sequence models on several well-studied benchmarks, including mutation stability assessment and structure prediction. For example, on the CAMEO protein structure prediction benchmark, ISM outperforms its ESM2 counterpart with a GDT-TS score of 0.67 versus 0.64 (see Table 1). For S669 ΔΔG prediction, ISM surpasses ESM2 in AUC (0.76 vs. 0.72) and even matches specialized models. We ablate key design decisions by reporting long-range precision at L (P@L) for contact prediction, accuracy for secondary structure prediction, F1 for binding residue prediction, and Spearman correlation for ΔΔG prediction in Table 4.
Researcher Affiliation | Academia | Jeffrey Ouyang-Zhang, Chengyue Gong, Yue Zhao, Philipp Krähenbühl, Adam R. Klivans, Daniel J. Diaz; University of Texas at Austin; EMAIL
Pseudocode | No | The paper describes methods and processes in text and figures but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/jozhang97/ISM.
Open Datasets | Yes | Our autoencoder training dataset contains 35K proteins from the Protein Data Bank (PDB). We extract per-residue microenvironment features for 5.8M proteins from Uniclust30 with AlphaFold structures (Mirdita et al., 2017), along with 35K PDB proteins. We evaluate how effectively ISM predicts the impact of single mutations on a protein's thermodynamic stability (ΔΔG) on the S669 dataset (Pancotti et al., 2022) in Table 2. We fine-tune on the cDNA117K dataset from Diaz et al. (2024), a subset of the cDNA display proteolysis dataset (Tsuboyama et al., 2023). We evaluate ISM on the PEER (Xu et al., 2022) and FLIP (Dallago et al., 2021) benchmarks.
Dataset Splits | Yes | For contact, secondary structure, and binding residue prediction, the proteins in the training and test sets have at most 30% sequence similarity. Contact, secondary structure, and binding residue prediction are evaluated using sequence similarity splits of 30%, 25%, and 20%, respectively.
Hardware Specification | Yes | Training takes 26 wall-clock hours on 32 GH200 GPUs.
Software Dependencies | No | The paper mentions specific optimizers (AdamW) but does not provide version numbers for any software libraries or dependencies used in the implementation.
Experiment Setup | Yes | We structure-tune the 650M-parameter ESM2 for 20 epochs using a cosine learning rate schedule with 4 warmup epochs. We use a total batch size of 1536 proteins cropped to a maximum sequence length of 512 amino acids. We use the AdamW optimizer with a learning rate of 1×10⁻⁴ and weight decay of 5×10⁻³.
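The reported training recipe (20 epochs, 4 warmup epochs, cosine schedule, base learning rate 1×10⁻⁴ for AdamW with weight decay 5×10⁻³) can be sketched as a learning-rate schedule in plain Python. This is a hedged reconstruction, not the authors' code: the exact warmup shape (linear is assumed here) and whether the schedule steps per epoch or per iteration are not specified in the excerpt.

```python
import math

# Assumed hyperparameters, taken from the reported setup.
BASE_LR = 1e-4        # AdamW learning rate
WARMUP_EPOCHS = 4
TOTAL_EPOCHS = 20
# (AdamW weight decay of 5e-3 would be passed to the optimizer separately.)

def learning_rate(epoch: int) -> float:
    """LR for a 0-indexed epoch: linear warmup, then cosine decay to ~0."""
    if epoch < WARMUP_EPOCHS:
        # Ramp linearly from BASE_LR/WARMUP_EPOCHS up to BASE_LR.
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # Cosine decay over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [learning_rate(e) for e in range(TOTAL_EPOCHS)]
```

In a framework like PyTorch, the same shape could be handed to `torch.optim.lr_scheduler.LambdaLR` wrapped around `torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=5e-3)`.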