Data Distillation for extrapolative protein design through exact preference optimization

Authors: Mostafa Karimi, Sharmi Banerjee, Tommi Jaakkola, Bella Dubrov, Shang Shang, Ron Benson

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated our model's performance in designing AAV and GFP proteins and demonstrated that the proposed framework significantly improves effectiveness in extrapolation tasks. Our benchmark shows that our approach can drastically improve performance over prior methods (Section 5). Through ablation studies, we show the importance of training on hard triplewise rankings in comparison to other methods for preference dataset creation (Section 6.1).
Researcher Affiliation | Collaboration | Mostafa Karimi, Sharmi Banerjee (Amazon, EMAIL); Tommi Jaakkola (Massachusetts Institute of Technology, EMAIL); Bella Dubrov, Shang Shang, Ron Benson (Amazon, EMAIL)
Pseudocode | No | The paper describes the methodology using prose and mathematical equations, and includes a 'Schematic overview' in Figure 1, but does not contain explicitly labeled pseudocode or algorithm blocks for its own method. References to 'algorithms' in Section 6.3 refer to external, state-of-the-art preference learning algorithms being benchmarked against.
Open Source Code | No | The paper does not contain an unambiguous statement of code release, a direct link to a code repository, or mention of code in supplementary materials for the methodology described.
Open Datasets | Yes | We evaluate our method on the well-studied Green Fluorescent Proteins (GFP) by Sarkisyan et al. (2016) and Adeno-Associated Virus (AAV) by Bryant et al. (2021). We utilize the carefully created medium and hard difficulty splits provided by Kirjner et al. (2024).
Dataset Splits | Yes | We used the medium and hard difficulty splits of the datasets, where the mutational gaps are 6 and 7 mutations respectively. In total, we created 500K (50K) training (validation) samples, half of them based on "Don't go backward" and the other half based on "Don't get stuck at the same fitness". We chose the top 100K (10K) hardest triplets as training (validation) samples for offline preference learning.
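The "top hardest triplets" selection step can be sketched as a simple ranking over candidate triplets. As an illustrative assumption only (the paper's exact hardness criterion is not reproduced here), hardness is taken to be the fitness margin between the preferred and rejected sequence: the smaller the margin, the harder the triplet.

```python
def select_hardest(triplets, n):
    """Keep the n hardest triplets from a candidate pool.

    Each triplet is (anchor_seq, preferred_seq, rejected_seq,
    preferred_fitness, rejected_fitness). 'Hardness' is illustratively
    defined as the fitness margin preferred - rejected; smaller margins
    are harder to rank. This is a sketch of the selection step under
    that assumption, not the authors' implementation.
    """
    ranked = sorted(triplets, key=lambda t: t[3] - t[4])
    return ranked[:n]


# Example pool: the second triplet has the smaller margin, so it is harder.
pool = [
    ("MSK...", "MSA...", "MSG...", 1.0, 0.1),   # margin 0.9 (easy)
    ("MSK...", "MST...", "MSV...", 0.5, 0.4),   # margin 0.1 (hard)
]
hardest = select_hardest(pool, 1)
```

With the reported numbers, one would call `select_hardest(pool, 100_000)` on the training pool and `select_hardest(pool, 10_000)` on the validation pool.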
Hardware Specification | Yes | Table 7: Comparison of computational costs of generating triplets with a P3 (V100) GPU machine.
Software Dependencies | No | The paper mentions models like Prot-T5-XL and optimizers like AdamW, but does not provide specific version numbers for any software libraries or frameworks used in their implementation (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We trained the local editor model on Dpairs for 10 epochs with the AdamW optimizer (Loshchilov & Hutter, 2017), a learning rate of 1e-4, and a batch size of 384. We further fine-tuned the local editor model with triplet-based preference learning through the EXO loss function defined in Equation 3 for 1 epoch, with a batch size of 32, a learning rate of 5e-7, β = 0.1, and the AdamW optimizer (Loshchilov & Hutter, 2017). Inspired by Padmakumar et al. (2023), for each initial seed sequence we sample N (10 for AAV, 2 for GFP) sequences using a combination of top-k and top-p sampling with k = 10, p = 0.95, and a temperature of 0.7 (1.0) without (with) the scorer at inference.
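The combined top-k/top-p sampling described above can be sketched as follows. The hyperparameters mirror those reported (k = 10, p = 0.95, temperature 0.7 without the scorer, 1.0 with it); the function itself is an illustrative PyTorch sketch of standard top-k plus nucleus filtering, not the authors' code.

```python
import torch

def sample_top_k_top_p(logits, k=10, p=0.95, temperature=0.7):
    """Draw one token id using combined top-k and top-p sampling.

    logits: 1-D tensor of unnormalised scores over the vocabulary.
    Sketch only; hyperparameter defaults follow the reported setup.
    """
    logits = logits / temperature

    # Top-k: keep only the k highest-scoring tokens, mask the rest.
    topk_vals, topk_idx = torch.topk(logits, k)
    filtered = torch.full_like(logits, float("-inf"))
    filtered[topk_idx] = topk_vals

    # Top-p (nucleus): among the survivors, keep the smallest prefix of
    # the sorted distribution whose cumulative probability reaches p.
    probs = torch.softmax(filtered, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    outside_nucleus = cumulative - sorted_probs >= p
    sorted_probs[outside_nucleus] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()

    # Sample one token from the renormalised nucleus distribution.
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice].item()
```

For the reported procedure, this sampler would be invoked N times per seed sequence (N = 10 for AAV, N = 2 for GFP), with temperature 1.0 when the scorer is used at inference.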