Equivariant Masked Position Prediction for Efficient Molecular Representation
Authors: Junyi An, Chao Qu, Yun-Fei Shi, XinHao Liu, Qianwei Tang, Fenglei Cao, Yuan Qi
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experiments to evaluate the effectiveness of EMPP across several 3D molecular benchmarks. Since EMPP can be applied in both unlabeled and labeled scenarios, we evaluate it in two settings: (i) self-supervised tasks for learning transferable molecular knowledge, and (ii) auxiliary tasks for enhancing the prediction of supervised molecular properties. |
| Researcher Affiliation | Collaboration | 1Shanghai Academy of Artificial Intelligence for Science 2INFLY TECH (Shanghai) Co., Ltd. 3School of Computer Science, Fudan University 4State Key Laboratory for Novel Software Technology, Nanjing University 5Artificial Intelligence Innovation and Incubation (AI3) Institute, Fudan University |
| Pseudocode | No | The paper describes the methodology in prose and mathematical equations in sections 2 and 3, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is released in https://github.com/ajy112/EMPP |
| Open Datasets | Yes | We evaluate quantum property prediction using the QM9 (Ramakrishnan et al., 2014) and MD17 (Chmiela et al., 2017) datasets. ... Additionally, we utilize the PCQM4Mv2 (Nakata & Shimazaki, 2017) dataset to pre-train GNN backbones... We further investigate the performance of EMPP without pre-training on the GEOM-Drug dataset (Axelrod & Gomez-Bombarelli, 2022) |
| Dataset Splits | Yes | For data preparation, we randomly sample 200,000 molecules from GEOM-Drug as the training set and 10,000 molecules as the validation set. To ensure the reliability of the validation results, SMILES strings appearing in the validation set are excluded from the training set, recognizing that a single SMILES can represent multiple conformational data points. |
| Hardware Specification | No | The computations in this research were performed using the CFFF platform of Fudan University. This statement mentions a platform but lacks specific hardware details like GPU/CPU models, memory, or other specifications. |
| Software Dependencies | No | The paper mentions using the 'e3nn' library and 'TorchMD-Net' but does not specify their version numbers or any other versioned software dependencies. |
| Experiment Setup | Yes | First, we introduce the hyperparameter configuration of EMPP when used as an auxiliary task, with the basic training configurations based on the Equiformer setup. The configurations for QM9 and MD17 are recorded in Table 5 and Table 6, respectively. Note that there are two sets of configurations for QM9; the one with more epochs or a smaller batch size is for the four tasks G, H, U, and U0. In the experiments on PCQM4Mv2, we only use EMPP for pre-training. The experimental configurations are shown in Table 7. |
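
The Dataset Splits row describes a validation set whose SMILES strings are excluded from the training set, since one SMILES can map to many conformers. The sketch below shows one way such a SMILES-disjoint split could be built. It is not the authors' code: the function name `smiles_disjoint_split`, the `(smiles, conformer_id)` record layout, and the choice to draw the validation set first and then filter the training candidates are all assumptions made for illustration.

```python
import random


def smiles_disjoint_split(records, n_train, n_val, seed=0):
    """Split conformer records so that no SMILES appears in both sets.

    records: list of (smiles, conformer_id) tuples (hypothetical layout).
    Returns (train, val) as lists of records.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)

    # Assumption: draw the validation conformers first.
    val = shuffled[:n_val]
    val_smiles = {smiles for smiles, _ in val}

    # Exclude every remaining conformer whose SMILES is already in validation,
    # then take up to n_train of what is left.
    candidates = [r for r in shuffled[n_val:] if r[0] not in val_smiles]
    train = candidates[:n_train]
    return train, val


if __name__ == "__main__":
    # Toy data: 50 molecules with 4 conformers each.
    toy = [(f"mol{i}", c) for i in range(50) for c in range(4)]
    train, val = smiles_disjoint_split(toy, n_train=100, n_val=20)
    shared = {s for s, _ in train} & {s for s, _ in val}
    print(len(train), len(val), len(shared))
```

With the sizes reported in the table, the call would use `n_train=200_000` and `n_val=10_000`. Filtering by SMILES rather than by conformer keeps different conformers of one molecule from landing on both sides of the split.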