Electron Density-enhanced Molecular Geometry Learning

Authors: Hongxin Xiang, Jun Xia, Xin Jin, Wenjie Du, Li Zeng, Xiangxiang Zeng

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on QM9 and rMD17 demonstrate that EDG can be directly integrated into existing geometry-based models and significantly improves the capabilities of these models (e.g., SchNet, EGNN, SphereNet, ViSNet) for geometry representation learning in MLFF, with a maximum average performance increase of 33.7%.
Researcher Affiliation | Academia | 1 College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; 2 School of Engineering, Westlake University, Hangzhou, China; 3 Eastern Institute of Technology, Ningbo, China; 4 University of Science and Technology of China, Hefei, China. Corresponding author (EMAIL)
Pseudocode | No | The paper describes the methodology using prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and appendix are available at https://github.com/HongxinXiang/EDG
Open Datasets | Yes | In the evaluation stage, we select 12 widely used tasks related to quantum mechanical properties from QM9 [Ramakrishnan et al., 2014] and 10 common tasks related to energy/force from revised MD17 (rMD17) [Christensen and Von Lilienfeld, 2020]. To pre-train ImageED, the ED-aware teacher, and the ED predictor, we select the first 2 million unlabeled molecular conformations and their DFT-computed ED data from the EDBench database [Xiang et al., 2025].
Dataset Splits | Yes | The dataset split follows Geom3D [Liu et al., 2024], i.e., using 110K for training, 10K for validation, and 11K for testing in QM9, and 950 for training, 50 for validation, and 1,000 for testing in rMD17.
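The reported splits can be illustrated as a simple index partition. The sketch below is a minimal, hypothetical helper (the function name, seed, and the ~131K QM9 total are assumptions; the actual Geom3D protocol may use a fixed pre-computed split rather than a fresh shuffle):

```python
import random

def split_indices(n_total, n_train, n_valid, n_test, seed=42):
    """Partition dataset indices into disjoint train/valid/test index lists."""
    assert n_train + n_valid + n_test <= n_total
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for reproducibility
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test

# QM9 split reported above: 110K train / 10K valid / 11K test
train, valid, test = split_indices(131_000, 110_000, 10_000, 11_000)
```

The rMD17 split (950/50/1,000) would use the same helper with different sizes.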
Hardware Specification | Yes | In pre-training of ImageED on 2 million ED molecules, we use a learning rate of 1.5e-4, a batch size of 64, a mask ratio of 0.25, and the MP and RP loss-weight hyper-parameters set to 1 for 20 epochs on 8 GeForce RTX 4090 GPUs (see Appendix G for more details).
Software Dependencies | No | The paper mentions software like PyMOL, Psi4, and PyTorch, but does not specify their version numbers in the main text.
Experiment Setup | Yes | For example, SchNet, EGNN, and SphereNet are trained for 1,000 epochs with a learning rate of 5e-4; Equiformer is trained for 300 epochs with a learning rate of 5e-4; and ViSNet is trained for 3,000 epochs with a learning rate of 0.0002. The batch size of SchNet, EGNN, SphereNet, and Equiformer is set to 128 in QM9 and 1 in rMD17; the batch size of ViSNet is set to 4 in rMD17. In pre-training of ImageED on 2 million ED molecules, we use a learning rate of 1.5e-4, a batch size of 64, a mask ratio of 0.25, and the MP and RP loss-weight hyper-parameters set to 1 for 20 epochs... In pre-training of the ED-aware teacher on 2 million molecules... We use a learning rate of 5e-3 and a batch size of 128 to train the ED-aware teacher and ED predictor for about 280k steps... In the distillation stage of EDG, we select the hyper-parameter from 1e-4 and 5e-4 up to 1.0, increasing in 10x steps.
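The distillation-stage sweep described above (values starting at 1e-4 and 5e-4, growing by 10x per pair up to 1.0) can be enumerated as a small log-spaced grid. This is a sketch of the candidate set only, not the authors' actual search code:

```python
# Candidate values: 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 0.1, 0.5, 1.0
# (mantissas 1 and 5 at each decade, as described in the setup above).
candidates = [m * 10.0 ** e for e in range(-4, 0) for m in (1, 5)] + [1.0]

# A grid search would then train and validate once per candidate, e.g.
# (train_with_weight and validate are hypothetical stand-ins):
# best = min(candidates, key=lambda w: validate(train_with_weight(w)))
```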