Equivariant Masked Position Prediction for Efficient Molecular Representation

Authors: Junyi An, Chao Qu, Yun-Fei Shi, XinHao Liu, Qianwei Tang, Fenglei Cao, Yuan Qi

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present experiments to evaluate the effectiveness of EMPP across several 3D molecular benchmarks. Since EMPP can be applied in both unlabeled and labeled scenarios, we evaluate it in two settings: (i) self-supervised tasks for learning transferable molecular knowledge, and (ii) auxiliary tasks for enhancing the prediction of supervised molecular properties.
Researcher Affiliation | Collaboration | (1) Shanghai Academy of Artificial Intelligence for Science; (2) INFLY TECH (Shanghai) Co., Ltd.; (3) School of Computer Science, Fudan University; (4) State Key Laboratory for Novel Software Technology, Nanjing University; (5) Artificial Intelligence Innovation and Incubation (AI3) Institute, Fudan University
Pseudocode | No | The paper describes the methodology in prose and mathematical equations in sections 2 and 3, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is released at https://github.com/ajy112/EMPP
Open Datasets | Yes | We evaluate quantum property prediction using the QM9 (Ramakrishnan et al., 2014) and MD17 (Chmiela et al., 2017) datasets. ... Additionally, we utilize the PCQM4Mv2 (Nakata & Shimazaki, 2017) dataset to pre-train GNN backbones... We further investigate the performance of EMPP without pre-training on the GEOM-Drug dataset (Axelrod & Gomez-Bombarelli, 2022)
Dataset Splits | Yes | For data preparation, we randomly sample 200,000 molecules from GEOM-Drug as the training set and 10,000 molecules as the validation set. To ensure the reliability of the validation results, SMILES strings appearing in the validation set are excluded from the training set, recognizing that a single SMILES can represent multiple conformational data points.
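The SMILES-deduplicated split described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' released code: the conformer records, field names, and `split_by_smiles` helper are assumptions.

```python
import random

def split_by_smiles(conformers, n_train=200_000, n_val=10_000, seed=0):
    """Sketch of a GEOM-Drug-style split: sample a validation set, then
    drop any training conformer whose SMILES also appears in validation,
    since one SMILES can correspond to many conformations."""
    rng = random.Random(seed)
    shuffled = conformers[:]
    rng.shuffle(shuffled)
    val = shuffled[:n_val]
    val_smiles = {c["smiles"] for c in val}  # SMILES reserved for validation
    # Keep only training conformers whose SMILES is absent from validation.
    train = [c for c in shuffled[n_val:] if c["smiles"] not in val_smiles]
    return train[:n_train], val
```

Filtering by SMILES rather than by conformer index is the key design choice: it prevents leakage where two conformations of the same molecule land on opposite sides of the split.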
Hardware Specification | No | The computations in this research were performed using the CFFF platform of Fudan University. This statement mentions a platform but lacks specific hardware details such as GPU/CPU models, memory, or other specifications.
Software Dependencies | No | The paper mentions using the 'e3nn library' and 'Torch MD-Net' but does not specify their version numbers or other software dependencies with specific versions.
Experiment Setup | Yes | First, we introduce the hyperparameter configuration of EMPP when used as an auxiliary task, with the basic training configurations based on the Equiformer setup. The configurations for QM9 and MD17 are recorded in Table 5 and Table 6, respectively. Note that there are two sets of configurations for QM9; the one with a longer training schedule or smaller batch size is for the four tasks G, H, U, and U0. In the experiments on PCQM4Mv2, we only use EMPP for pre-training. The experimental configurations are shown in Table 7.