DPLM-2: A Multimodal Diffusion Protein Language Model
Authors: Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | § 4 Experiments: "In this section, we evaluate DPLM-2 on various generative and understanding scenarios, including unconditional protein generation (structure, sequence, and structure-sequence co-generation, § 4.1), and a variety of conditional tasks, such as folding (§ 4.2), inverse folding (§ 4.3) and motif-scaffolding (§ 4.4), and a series of protein predictive tasks (§ 4.5)." |
| Researcher Affiliation | Collaboration | School of Computer Science, Nanjing University; ByteDance Research |
| Pseudocode | Yes | Algorithm 1 Temperature-annealed stochastic sampling |
| Open Source Code | Yes | We are also committed to open-source our models, training and inference code to democratize multimodal generative protein LMs to benefit the community. |
| Open Datasets | Yes | To train DPLM-2, we leverage a high-quality dataset comprising 20K clustered experimental structures from the Protein Data Bank (PDB) (Berman et al., 2000) and 200K predicted structures from the AFDB Swiss Prot split (Varadi et al., 2022), with length < 512. |
| Dataset Splits | No | The training set of DPLM-2 is composed of experimental data, i.e., PDB (Berman et al., 2000), and high-quality synthetic data, i.e., Swiss Prot (Varadi et al., 2022)... We assess DPLM-2 on CAMEO 2022 and a PDB data split used by Multiflow (Campbell et al., 2024). |
| Hardware Specification | Yes | We train the 150M DPLM-2 with 8 A100 GPUs for 3 days, the 650M with 16 A100 GPUs for 3 days, and the 3B with 16 A100 GPUs for a week. |
| Software Dependencies | No | We train all models using the AdamW optimizer (Kingma & Ba, 2015) with β1 = 0.9 and β2 = 0.95. We use a weight decay of 0.01 and gradient clipping of 0.5. We employ 2K warmup steps until reaching the maximum learning rate, and utilize a linear decay scheduler to decay LR to 10% of the maximum learning rate by the end of training. The maximum learning rate is 1e-4, and the overall training step is 100,000. We utilize the pretrained DPLM as the parameter initialization, and the diffusion timestep is set to 500. |
| Experiment Setup | Yes | We train all models using the AdamW optimizer (Kingma & Ba, 2015) with β1 = 0.9 and β2 = 0.95. We use a weight decay of 0.01 and gradient clipping of 0.5. We employ 2K warmup steps until reaching the maximum learning rate, and utilize a linear decay scheduler to decay LR to 10% of the maximum learning rate by the end of training. The maximum learning rate is 1e-4, and the overall training step is 100,000. We utilize the pretrained DPLM as the parameter initialization, and the diffusion timestep is set to 500. |
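
The pseudocode row above mentions "Algorithm 1 Temperature-annealed stochastic sampling" but the table does not reproduce it. As a rough illustration of the general technique that name describes, the sketch below samples tokens from a temperature-scaled softmax while the temperature anneals over sampling steps. The linear schedule, the temperature endpoints, and all function names here are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def annealed_temperature(step, total_steps, t_max=2.0, t_min=0.1):
    # Linearly anneal temperature from t_max down to t_min
    # (illustrative schedule; the paper's Algorithm 1 may differ).
    frac = step / max(total_steps - 1, 1)
    return t_max + frac * (t_min - t_max)

def sample_tokens(logits, temperature, rng):
    # Sample one token per position from the temperature-scaled softmax.
    scaled = logits / temperature
    scaled -= scaled.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.array([rng.choice(p.shape[-1], p=p) for p in probs])

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 33))  # toy: 8 positions, 33-token vocabulary
for step in range(5):
    t = annealed_temperature(step, total_steps=5)
    tokens = sample_tokens(logits, t, rng)
```

High early temperatures keep sampling diverse; low late temperatures make the final denoising steps close to greedy.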
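
The experiment-setup excerpt fully specifies the learning-rate schedule in prose (2K linear warmup steps to a maximum LR of 1e-4, then linear decay to 10% of the maximum by step 100,000). A minimal sketch of that schedule, reconstructed from the reported hyperparameters (the paper's actual scheduler code is not shown):

```python
def learning_rate(step, max_lr=1e-4, warmup=2000, total=100_000, final_frac=0.1):
    # Linear warmup to max_lr over `warmup` steps, then linear decay
    # to final_frac * max_lr by step `total` (values from the paper's text).
    if step < warmup:
        return max_lr * step / warmup
    frac = (step - warmup) / (total - warmup)
    return max_lr * (1.0 - (1.0 - final_frac) * frac)
```

For example, `learning_rate(0)` is 0, `learning_rate(2000)` hits the 1e-4 maximum, and `learning_rate(100_000)` has decayed to 1e-5.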