reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Retrieval Augmented Diffusion Model for Structure-informed Antibody Design and Optimization

Authors: Zichen Wang, Yaokun Ji, Jianing Tian, Shuangjia Zheng

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical experiments demonstrate that our method achieves state-of-the-art performance in multiple antibody inverse folding and optimization tasks, offering a new perspective on biomolecular generative models. To evaluate the performance of our model's generation, we utilize two tasks: antibody CDR sequence inverse folding (Section 5.1) and antibody optimization based on sequence design (Section 5.2), to compare with the baselines. Additionally, we conducted ablation experiments and further analysis to demonstrate the effectiveness of the retrieval-augmented method (Section 5.3).
Researcher Affiliation	Academia	1 Global Institute of Future Technology, Shanghai Jiao Tong University; 2 School of Software & Microelectronics, Peking University
Pseudocode	Yes	Algorithm 1: Structural Retrieval Algorithm Overview; Algorithm 2: Training Procedure of RADAb; Algorithm 3: Sampling Procedure of RADAb
Open Source Code	Yes	Reproducibility Statement: The code is available at https://github.com/GENTEL-lab/RADAb
Open Datasets	Yes	To fully exploit the protein structure space, we first compiled a database of CDR-like fragments from the non-redundant Protein Data Bank (PDB) (Berman et al., 2000). The dataset for training the model is obtained from SAbDab and our established CDR-like fragments dataset. Following the previous work (Luo et al., 2022), we first eliminated structures with a resolution lower than 4 Å and removed antibodies that target non-protein antigens. Chothia (Chothia & Lesk, 1987) in ANARCI (Dunbar & Deane, 2016) is used for renumbering antibody residues.
Dataset Splits	Yes	We clustered the SAbDab dataset based on 50% sequence similarity in the CDR-H3 region, and chose 50 PDB files comprising 63 antibody-antigen complex structures as the test set. To ensure distinct training and test sets, we removed structures from the training set that were part of the same clusters as those in the test set.
Hardware Specification	Yes	All experiments are run on a single RTX4090 GPU, with a memory storage of 24GB.
Software Dependencies	No	Our model was developed and executed within the PyTorch framework.
Experiment Setup	Yes	For training, we chose the Adam optimizer with a learning rate of 0.0001, weight decay of 0.0, and momentum parameters beta1 and beta2 set to 0.9 and 0.999, respectively. To dynamically adjust the learning rate, we employed a plateau learning rate scheduler. When the validation loss plateaued, the learning rate was reduced by a factor of 0.8, with a minimum learning rate set to 5e-6. The scheduler's patience was set to 10 epochs. The batch size is 8 during training. We design 8 samples for each CDR in the test set. Because the CDRH3 region is highly variable and specific, and is considered the most critical part in determining antigen-antibody binding, we conducted separate training for the sequence design of this region, adding and removing noise only for the CDRH3 region in each training iteration, with a total of 100,000 iterations. The other five regions, being more conserved, were trained together for a total of 250,000 iterations (approximately equivalent to 50,000 iterations per region). The reverse generation process time step t is set to 100.
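
The Dataset Splits row describes removing from the training set every structure that shares a CDR-H3 cluster with a test structure. The following is a minimal sketch of that filtering step only, not the authors' code. The function name, the `cluster_of` mapping, and the toy ids are hypothetical; the clustering itself (50% sequence similarity) is assumed to have been done beforehand.

```python
def cluster_aware_split(cluster_of, test_ids):
    """Return (train_ids, test_ids) such that no cluster spans both sets.

    cluster_of: dict mapping a structure id to its cluster label
                (hypothetical input; clustering is assumed to be precomputed).
    test_ids:   iterable of structure ids chosen as the test set.
    """
    test_ids = set(test_ids)
    # Every cluster that contains at least one test structure.
    test_clusters = {cluster_of[i] for i in test_ids}
    # Keep a structure for training only if it is not a test structure
    # and its cluster does not appear in the test set.
    train_ids = sorted(
        i for i, c in cluster_of.items()
        if i not in test_ids and c not in test_clusters
    )
    return train_ids, sorted(test_ids)


# Toy example with made-up ids: "a" and "b" share cluster 0.
toy_clusters = {"a": 0, "b": 0, "c": 1, "d": 2}
train, test = cluster_aware_split(toy_clusters, ["a"])
print(train, test)
```

In the toy example, "b" is dropped from training because it shares a cluster with the test structure "a".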
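
The learning rate schedule in the Experiment Setup row can be illustrated with a small pure-Python sketch. It uses the quoted values (initial rate 0.0001, factor 0.8, patience 10, minimum 5e-6), but the class and function names are hypothetical and this is not the authors' implementation. Two assumptions: any strict decrease in validation loss counts as an improvement, and a reduction fires once the number of non-improving epochs exceeds the patience, which mirrors the convention of PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau`.

```python
from dataclasses import dataclass


@dataclass
class PlateauSchedule:
    """Reduce-on-plateau rule with the values quoted in the Experiment Setup row."""
    lr: float = 1e-4       # initial learning rate
    factor: float = 0.8    # multiplicative reduction on plateau
    patience: int = 10     # non-improving epochs tolerated before reducing
    min_lr: float = 5e-6   # floor on the learning rate
    best: float = float("inf")
    bad_epochs: int = 0

    def step(self, val_loss: float) -> float:
        """Record one epoch's validation loss and return the current learning rate."""
        if val_loss < self.best:   # assumption: strict improvement, no threshold
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.bad_epochs = 0
        return self.lr


def simulate(losses):
    """Run a fresh schedule over a sequence of validation losses."""
    sched = PlateauSchedule()
    return [sched.step(loss) for loss in losses]


# One improving epoch followed by 11 flat epochs triggers a single reduction.
print(simulate([1.0] * 12)[-1])
```

Under these assumptions, a validation loss that never improves drives the rate down in steps of 0.8 until it settles at the 5e-6 floor.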