DPLM-2: A Multimodal Diffusion Protein Language Model

Authors: Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu

ICLR 2025

Reproducibility assessment: each variable below is listed with its result and the supporting excerpt (LLM response) from the paper.
Research Type: Experimental
"In this section, we evaluate DPLM-2 on various generative and understanding scenarios, including unconditional protein generation (structure, sequence, and structure-sequence co-generation, § 4.1), and a variety of conditional tasks, such as folding (§ 4.2), inverse folding (§ 4.3), motif-scaffolding (§ 4.4), and a series of protein predictive tasks (§ 4.5)." (§ 4, Experiments)
Researcher Affiliation: Collaboration
"School of Computer Science, Nanjing University; ByteDance Research; EMAIL, EMAIL"
Pseudocode: Yes
"Algorithm 1: Temperature-annealed stochastic sampling"
Open Source Code: Yes
"We are also committed to open-source our models, training and inference code to democratize multimodal generative protein LMs to benefit the community."
Open Datasets: Yes
"To train DPLM-2, we leverage a high-quality dataset comprising 20K clustered experimental structures from the Protein Data Bank (PDB) (Berman et al., 2000) and 200K predicted structures from the AFDB Swiss-Prot split (Varadi et al., 2022), with length < 512."
Dataset Splits: No
"The training set of DPLM-2 is composed of experimental data, i.e., PDB (Berman et al., 2000), and high-quality synthetic data, i.e., Swiss-Prot (Varadi et al., 2022)... We assess DPLM-2 on CAMEO 2022 and a PDB data split used by Multiflow (Campbell et al., 2024)."
Hardware Specification: Yes
"We train 150M DPLM-2 with 8 A100 GPUs for 3 days, while 650M with 16 A100 GPUs for 3 days and 3B with 16 A100 GPUs for a week."
Software Dependencies: No
"We train all models using AdamW optimizer (Kingma & Ba, 2015) with β1 = 0.9 and β2 = 0.95..." The paper reports optimizer hyperparameters (quoted in full under Experiment Setup below) but names no software libraries or versions.
Experiment Setup: Yes
"We train all models using AdamW optimizer (Kingma & Ba, 2015) with β1 = 0.9 and β2 = 0.95. We use a weight decay of 0.01 and gradient clipping of 0.5. We employ 2K warmup steps until reaching the maximum learning rate, and utilize a linear decay scheduler to decay the LR to 10% of the maximum learning rate by the end of training. The maximum learning rate is 1e-4, and the overall training step is 100,000. We utilize the pretrained DPLM as the parameter initialization, and the diffusion timestep is set to 500."
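The paper's Algorithm 1 ("Temperature-annealed stochastic sampling") is named above but not reproduced in this report. A minimal, hypothetical sketch of what temperature-annealed stochastic sampling over discrete tokens typically looks like; the linear anneal schedule, endpoint temperatures, and softmax sampling are assumptions, not the paper's exact procedure:

```python
import math
import random

def annealed_temperature(step, num_steps, t_start=1.0, t_end=0.1):
    # Linearly anneal the sampling temperature from t_start down to t_end
    # across the denoising steps (assumed schedule; the paper may differ).
    frac = step / max(num_steps - 1, 1)
    return t_start + (t_end - t_start) * frac

def sample_categorical(logits, temperature, rng=random):
    # Draw one token index from softmax(logits / temperature).
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    r = rng.random() * sum(exps)
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if r <= acc:
            return i
    return len(exps) - 1
```

Lower temperatures sharpen the softmax toward the argmax token, so annealing trades early-step diversity for late-step determinism.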
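The learning-rate schedule quoted above (2K warmup steps, then linear decay to 10% of the maximum LR by step 100K) can be sketched as a step-to-multiplier function; the warmup-from-zero start is an assumption, since the paper reports only the warmup length:

```python
def lr_multiplier(step, warmup_steps=2000, total_steps=100_000, final_ratio=0.1):
    # Scale factor applied to the maximum LR (1e-4 in the paper):
    # linear warmup over warmup_steps, then linear decay so the
    # multiplier reaches final_ratio at total_steps.
    if step < warmup_steps:
        return step / warmup_steps  # assumed warmup from 0
    frac = (step - warmup_steps) / (total_steps - warmup_steps)
    return 1.0 - (1.0 - final_ratio) * min(frac, 1.0)
```

In a PyTorch training loop this function could be passed directly to `torch.optim.lr_scheduler.LambdaLR` as the `lr_lambda` argument.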