Equivariant Masked Position Prediction for Efficient Molecular Representation

Authors: Junyi An, Chao Qu, Yun-Fei Shi, XinHao Liu, Qianwei Tang, Fenglei Cao, Yuan Qi

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present experiments to evaluate the effectiveness of EMPP across several 3D molecular benchmarks. Since EMPP can be applied in both unlabeled and labeled scenarios, we evaluate it in two settings: (i) self-supervised tasks for learning transferable molecular knowledge, and (ii) auxiliary tasks for enhancing the prediction of supervised molecular properties.
Researcher Affiliation | Collaboration | (1) Shanghai Academy of Artificial Intelligence for Science; (2) INFLY TECH (Shanghai) Co., Ltd.; (3) School of Computer Science, Fudan University; (4) State Key Laboratory for Novel Software Technology, Nanjing University; (5) Artificial Intelligence Innovation and Incubation (AI3) Institute, Fudan University
Pseudocode | No | The paper describes the methodology in prose and mathematical equations in sections 2 and 3, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is released at https://github.com/ajy112/EMPP
Open Datasets | Yes | We evaluate quantum property prediction using the QM9 (Ramakrishnan et al., 2014) and MD17 (Chmiela et al., 2017) datasets. ... Additionally, we utilize the PCQM4Mv2 (Nakata & Shimazaki, 2017) dataset to pre-train GNN backbones... We further investigate the performance of EMPP without pre-training on the GEOM-Drug dataset (Axelrod & Gomez-Bombarelli, 2022)
Dataset Splits | Yes | For data preparation, we randomly sample 200,000 molecules from GEOM-Drug as the training set and 10,000 molecules as the validation set. To ensure the reliability of the validation results, SMILES strings appearing in the validation set are excluded from the training set, recognizing that a single SMILES can represent multiple conformational data points.
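The SMILES-deduplicated split described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' released code: the conformer records, field names, and `split_by_smiles` helper are assumptions.

```python
import random

def split_by_smiles(conformers, n_train=200_000, n_val=10_000, seed=0):
    """Sketch of a GEOM-Drug-style split: sample a validation set, then
    drop any training conformer whose SMILES also appears in validation,
    since one SMILES can correspond to many conformations."""
    rng = random.Random(seed)
    shuffled = conformers[:]
    rng.shuffle(shuffled)
    val = shuffled[:n_val]
    val_smiles = {c["smiles"] for c in val}  # SMILES reserved for validation
    # Keep only training conformers whose SMILES is absent from validation.
    train = [c for c in shuffled[n_val:] if c["smiles"] not in val_smiles]
    return train[:n_train], val
```

Filtering by SMILES rather than by conformer index is the key design choice: it prevents leakage where two conformations of the same molecule land on opposite sides of the split.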
Hardware Specification | No | The computations in this research were performed using the CFFF platform of Fudan University. This statement mentions a platform but lacks specific hardware details such as GPU/CPU models, memory, or other specifications.
Software Dependencies | No | The paper mentions using the 'e3nn library' and 'Torch MD-Net' but does not specify their version numbers or other software dependencies with specific versions.
Experiment Setup | Yes | First, we introduce the hyperparameter configuration of EMPP when used as an auxiliary task, with the basic training configurations based on the Equiformer setup. The configurations for QM9 and MD17 are recorded in Table 5 and Table 6, respectively. Note that there are two sets of configurations for QM9; the one with a longer training schedule or smaller batch size is for the four tasks G, H, U, and U0. In the experiments on PCQM4Mv2, we only use EMPP for pre-training. The experimental configurations are shown in Table 7.