3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling

Authors: Qizhi Pei, Rui Yan, Kaiyuan Gao, Jinhua Zhu, Lijun Wu

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental To verify our 3D-MolT5 framework, we conduct instruction tuning after pre-training on various molecule-text tasks, including molecular property prediction (both 3D-dependent and 3D-independent), molecule captioning (3D-dependent), and text-based molecule generation (3D-independent). The results show that both the Specialist (single-task tuned) and Generalist (multi-task tuned) versions of 3D-MolT5 achieve superior performance across these tasks. For example, on the 3D-dependent molecular property prediction task with the PubChemQC (Maho, 2015) dataset, 3D-MolT5 achieves an improvement of nearly 70% compared to 3D-MoLM (Li et al., 2023c). These results underscore the versatility and efficacy of our 3D-MolT5 in both 3D-dependent and 3D-independent molecule-text tasks. Our code is available at https://github.com/QizhiPei/3D-MolT5.
Researcher Affiliation Collaboration (1) Gaoling School of Artificial Intelligence, Renmin University of China; (2) Engineering Research Center of Next-Generation Intelligent Search and Recommendation, Ministry of Education; (3) School of Computer Science, Wuhan University; (4) Huazhong University of Science and Technology; (5) University of Science and Technology of China; (6) Shanghai AI Laboratory
Pseudocode Yes Algorithm 1: E3FP Algorithm (the paper's notation defines a symbol for the concatenation operation).
Open Source Code Yes Our code is available at https://github.com/QizhiPei/3D-MolT5.
Open Datasets Yes We use the PCQM4Mv2 dataset from the OGB Large Scale Challenge (Hu et al., 2021) for this task, which includes 3.37M DFT-calculated (Geerlings et al., 2003) 3D molecular structures. ... We use three datasets to evaluate the performance of 3D-MolT5 on the computed property prediction task: QM9 (Ruddigkeit et al., 2012; Fang et al., 2023), PubChemQC (Maho, 2015; Xu et al., 2021), and PubChem (Kim et al., 2019). ... We use the ChEBI-20 (Edwards et al., 2022) dataset, which is widely used for this task (Edwards et al., 2022; Luo et al., 2023; Liu et al., 2024; 2023b; Pei et al., 2023).
Dataset Splits Yes Table 11: Dataset statistics for downstream fine-tuning. All the datasets are in instruction format. Small differences exist between our processed datasets and the original version, as we discard the data that cannot be processed by E3FP (Axen et al., 2017).
DATASET | MOLECULE | TASK | SIZE (TRAIN/VALIDATION/TEST)
PubChemQC | 3D | Computed Property Prediction | 2,463,404/308,024/308,248
QM9 | 3D | Computed Property Prediction | 347,774/1,928/1,928
PubChem | 3D | Computed Property Prediction | 46,532/3,885/7,746
PubChem | 3D | Descriptive Property Prediction | 59,775/4,980/9,940
PubChem | 3D | 3D Molecule Captioning | 11,955/996/1,988
ChEBI-20 | 1D | Text-based Molecule Generation | 26,407/3,301/3,300
Hardware Specification Yes The pre-training is done on eight NVIDIA 80GB A100 GPUs. The total number of steps for pretraining is 400,000, with warm-up steps set to 10,000.
Software Dependencies No We use nanoT5 (Nawrot, 2023) as our codebase. ... For all molecular data, we first get its canonical SMILES from the provided SMILES or 3D structure using RDKit (Landrum et al., 2023), and then convert it to SELFIES using the selfies toolkit (Krenn et al., 2020).
Experiment Setup Yes The pre-training is done on eight NVIDIA 80GB A100 GPUs. The total number of steps for pre-training is 400,000, with warm-up steps set to 10,000. AdamW (Loshchilov & Hutter, 2019) with Root Mean Square (RMS) scaling optimizer is used. The peak learning rate is 2e-3 with cosine decay, and the minimum learning rate is 1e-5. The maximum length for input and output is 512. The batch size is set to 768.
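The SMILES-to-SELFIES preprocessing quoted in the Software Dependencies row (canonicalize with RDKit, then encode with the selfies toolkit) can be sketched as below. This is a minimal illustration of the described pipeline, not the authors' exact preprocessing code; the function name `smiles_to_selfies` is our own.

```python
from rdkit import Chem  # RDKit (Landrum et al., 2023)
import selfies as sf    # selfies toolkit (Krenn et al., 2020)

def smiles_to_selfies(smiles: str) -> str:
    """Canonicalize a SMILES string with RDKit, then encode it as SELFIES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # Mirrors the paper's note that unparsable entries are discarded.
        raise ValueError(f"RDKit could not parse SMILES: {smiles}")
    canonical = Chem.MolToSmiles(mol)  # canonical SMILES
    return sf.encoder(canonical)       # SELFIES string for the T5 vocabulary

# Example: aspirin
selfies_str = smiles_to_selfies("CC(=O)Oc1ccccc1C(=O)O")
```

A useful property of SELFIES is that every string decodes back to a valid molecule, so the round trip through `sf.decoder` preserves the canonical structure.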
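The learning-rate schedule stated in the Experiment Setup row (linear warm-up over 10,000 steps to a peak of 2e-3, then cosine decay to a floor of 1e-5 over 400,000 total steps) can be written out as a small function. This is an illustrative sketch of the stated schedule, not the authors' implementation:

```python
import math

PEAK_LR, MIN_LR = 2e-3, 1e-5          # values quoted in the setup
WARMUP_STEPS, TOTAL_STEPS = 10_000, 400_000

def learning_rate(step: int) -> float:
    """Linear warm-up followed by cosine decay to the minimum learning rate."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```

The schedule reaches the peak exactly at the end of warm-up and decays monotonically to the floor at the final step.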