Decomposed Direct Preference Optimization for Structure-Based Drug Design

Authors: Xiwei Cheng, Xiangxin Zhou, Yuwei Yang, Yu Bao, Quanquan Gu

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | Extensive experiments on the CrossDocked2020 benchmark show that DecompDPO significantly improves model performance, achieving up to 98.5% Med. High Affinity and a 43.9% success rate for molecule generation, and 100% Med. High Affinity and a 52.1% success rate for targeted molecule optimization. Code is available at https://github.com/laviaf/DecompDPO.
Researcher Affiliation | Collaboration | Xiwei Cheng (Khoury College of Computer Sciences, Northeastern University); Xiangxin Zhou (ByteDance Seed; School of Artificial Intelligence, University of Chinese Academy of Sciences; New Laboratory of Pattern Recognition (NLPR), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA)); Yu Bao (ByteDance Seed); Yuwei Yang (ByteDance); Quanquan Gu (ByteDance Seed)
Pseudocode | No | The paper describes the methodology using equations and textual descriptions, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code is available at https://github.com/laviaf/DecompDPO.
Open Datasets | Yes | We followed prior work (Luo et al., 2021; Peng et al., 2022; Guan et al., 2023a;b), using the CrossDocked2020 dataset (Francoeur et al., 2020) to pre-train our reference model and evaluate the performance of DecompDPO.
Dataset Splits | Yes | According to the protocol established by Luo et al. (2021), we filtered complexes to retain only those with high-quality docking poses (RMSD < 1 Å) and diverse protein sequences (sequence identity < 30%), resulting in a refined dataset comprising 100,000 high-quality training complexes and 100 novel proteins for evaluation.
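As a concrete illustration, the two filtering criteria quoted above could be applied as follows. This is a minimal sketch: the field names, the `identity_fn` callback, and the greedy de-duplication strategy are illustrative placeholders, not the actual protocol of Luo et al. (2021).

```python
def filter_complexes(complexes, max_rmsd=1.0, max_identity=0.30, identity_fn=None):
    """Keep complexes with docking RMSD < 1 Angstrom and pairwise
    protein sequence identity < 30% (both thresholds from the report)."""
    kept, seen_seqs = [], []
    for c in complexes:
        if c["rmsd"] >= max_rmsd:
            continue  # discard low-quality docking poses
        if identity_fn and any(
            identity_fn(c["seq"], s) >= max_identity for s in seen_seqs
        ):
            continue  # discard near-duplicate protein sequences
        seen_seqs.append(c["seq"])
        kept.append(c)
    return kept
```

A toy run with a naive per-position identity measure: three complexes where one fails the RMSD cut and one duplicates an already-kept sequence leave a single surviving complex.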
Hardware Specification | Yes | The model is pre-trained on a single NVIDIA A6000 GPU, and it converges within 21 hours and 170k steps. For fine-tuning the model for molecule generation, we set βT = 0.001 and trained for 30,000 steps on one NVIDIA A40 GPU. For molecular optimization, we set βT = 0.02 and trained for 20,000 steps on one NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions using Adam (Kingma & Ba, 2014) as an optimizer and the RDKit and AlphaSpace2 (Katigbak et al., 2020) toolkits for molecular fragmentation, but it does not provide specific version numbers for any software libraries or tools used in the experimental setup.
Experiment Setup | Yes | Pre-training: We use Adam (Kingma & Ba, 2014) for pre-training, with init_learning_rate=0.0004 and betas=(0.95, 0.999). The learning rate decays exponentially by a factor of 0.6, with minimize_learning_rate=1e-6; it is decayed whenever the validation loss shows no improvement for 10 consecutive evaluations. We set batch_size=8 and clip_gradient_norm=8. During training, Gaussian noise with a standard deviation of 0.1 is added to protein atom positions as data augmentation. To balance the magnitudes of the different losses, the reconstruction losses for atom and bond types are weighted by γv = 100 and γb = 100, respectively. We perform evaluations every 2000 training steps. ... Fine-tuning and Optimizing: For both fine-tuning and optimizing the model with DecompDPO, we use the Adam optimizer with init_learning_rate=1e-6 and betas=(0.95, 0.999), held constant throughout both processes. We set batch_size=4 and clip_gradient_norm=8. ... For fine-tuning the model for molecule generation, we set βT = 0.001 and trained for 30,000 steps on one NVIDIA A40 GPU. For molecular optimization, we set βT = 0.02 and trained for 20,000 steps on one NVIDIA V100 GPU. ... The λ used to penalize rewards with the energy terms proposed in Section 3.3 is set to 0.1.
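The plateau-based learning-rate decay quoted in the pre-training setup (factor 0.6, floor 1e-6, patience of 10 evaluations spaced 2000 steps apart) can be sketched in plain Python. The class and method names below are illustrative, not taken from the DecompDPO codebase, which may implement this via a framework scheduler instead.

```python
class PlateauDecay:
    """Decay the learning rate by `factor` whenever validation loss fails
    to improve for `patience` consecutive evaluations, floored at `min_lr`.
    Hyperparameter defaults follow the pre-training setup in the report."""

    def __init__(self, init_lr=4e-4, factor=0.6, min_lr=1e-6, patience=10):
        self.lr = init_lr
        self.factor = factor
        self.min_lr = min_lr
        self.patience = patience
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss):
        """Call once per evaluation (i.e., every 2000 training steps)."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            if self.bad_evals >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_evals = 0  # restart the patience window
        return self.lr
```

In a PyTorch training loop the same behavior would typically come from `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.6, patience=10, min_lr=1e-6)`, paired with `torch.nn.utils.clip_grad_norm_(model.parameters(), 8)` for the gradient clipping mentioned above.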