Multi-objective antibody design with constrained preference optimization

Authors: Milong Ren, Zaikai He, Haicang Zhang

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Evaluated on independent test sets, AbNovo outperforms existing methods in binding-affinity metrics such as Rosetta binding energy and evolutionary plausibility, as well as in metrics for other biophysical properties such as stability and specificity.
Researcher Affiliation Academia Milong Ren (3,4), Zaikai He (1,3,4), Haicang Zhang (1,2); 1 Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine; 2 Central China Research Institute of Artificial Intelligence; 3 Institute of Computing Technology, Chinese Academy of Sciences; 4 University of Chinese Academy of Sciences. Correspondence should be addressed to H. Zhang (EMAIL)
Pseudocode Yes Algorithm 1 Constrained Preference Optimization for Antibody Design
Open Source Code Yes CODE AVAILABILITY Code for AbNovo can be found at https://github.com/CarbonMatrixLab/AbNovo.
Open Datasets Yes We trained AbNovo using antibody-antigen complex structures derived from the SAbDab database (Dunbar et al., 2014) and evaluated its performance on the RAbD test set, which is widely used for in silico antibody design.
Dataset Splits No The paper mentions training on the SAbDab database and evaluating on the RAbD test set, and describes a 40% sequence identity threshold on CDR-H3 to eliminate overlap between training and test sets. However, it does not specify explicit percentages or sample counts for training, validation, or test splits, nor does it provide citations to predefined splits with such details.
Hardware Specification Yes We use 8 Nvidia A100 (80G) for training, and the batch size is 128 for all training stages.
Software Dependencies No The paper mentions using several software tools, such as the Rosetta software, IgLM (Shuai et al., 2021), MMseqs, AlphaFold2, and the Adam optimizer. However, it does not provide specific version numbers for any of these components.
Experiment Setup Yes Table 14: Hyper-parameters of AbNovo.

Stage | Training objective | Training steps | Learning rate | Dataset
Pre-training | L_MLM + L_distogram + L_contact | 200k | 5e-5 | AFDB (2M) + PDB (filtered)
Base model | 1.0 L(x) + 0.5 L(r) + 0.2 L(a) + 0.1 L_violation + 1.0 L_aux | 20k | 1e-4 | Antigen-antibody complexes
Fine-tuning | L_update policy (Equation 10) | 20k | 2e-5 | Preference dataset

We show information about the training process, objectives, and learning rates of AbNovo in Table 14. In particular, in the fine-tuning stage, for each single update of λ we update the policy for 100 steps. Details of the losses used when updating the policy and λ are given in Sections 3.3.1 and 3.3.2. We use 8 Nvidia A100 (80G) GPUs for training, and the batch size is 128 for all training stages. For all training procedures, we use the Adam optimizer with default parameters. In practice, we choose [α(x), α(r), α(a)] = [1.0, 0.5, 0.2], α(sup) = 0.5, and K = 8. We also add a regularisation term α(R) for λ_t in Equation 9 to ensure the stability of training; here we choose α(R)_t = 10.0. In our experiments, considering the different magnitudes of the various rewards and constraints, we normalized all rewards and constraints during training. We set the initial λ to [1.0, 1.0, 1.0] and the reward weights ω1 and ω2 to 1.0 and 1.0.
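The fine-tuning schedule quoted above (one λ update per 100 policy-update steps, λ initialised to [1.0, 1.0, 1.0], all rewards and constraints normalized) can be sketched as a small training-loop skeleton. This is a hypothetical illustration, not code from the AbNovo repository: `policy_step`, `lambda_step`, `fine_tune`, and `normalize` are names invented here, and the actual policy and dual (λ) losses are the ones defined in the paper's Equations 9-10.

```python
import numpy as np

def normalize(values, eps=1e-8):
    """Scale a batch of rewards/constraint values to comparable magnitude,
    mirroring the paper's statement that all rewards and constraints are
    normalized during training (exact scheme is an assumption here)."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / (v.std() + eps)

def fine_tune(policy_step, lambda_step,
              total_lambda_updates=200,
              policy_steps_per_lambda=100,     # 100 policy steps per lambda step
              init_lam=(1.0, 1.0, 1.0)):       # initial lambda = [1.0, 1.0, 1.0]
    """Alternating primal/dual schedule: repeatedly take many policy updates
    under fixed Lagrange multipliers lam, then update lam once."""
    lam = np.array(init_lam, dtype=float)
    for _ in range(total_lambda_updates):
        for _ in range(policy_steps_per_lambda):
            policy_step(lam)       # minimize the policy loss (Equation 10) given lam
        lam = lambda_step(lam)     # one dual update of the constraint multipliers
    return lam
```

The nested-loop structure is the point: the constrained preference optimization alternates between optimizing the policy against a fixed weighted combination of reward and constraint terms, and adjusting the multipliers λ that weight the constraints.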