Learning Equivariant Non-Local Electron Density Functionals
Authors: Nicholas Gao, Eike Eberhard, Stephan Günnemann
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our empirical evaluation, we find EG-XC to accurately reconstruct gold-standard CCSD(T) energies on MD17. On out-of-distribution conformations of 3BPA, EG-XC reduces the relative MAE by 35% to 50%. Remarkably, EG-XC excels in data efficiency and molecular size extrapolation on QM9, matching force fields trained on 5 times more and larger molecules. On identical training sets, EG-XC yields on average 51% lower MAEs. |
| Researcher Affiliation | Academia | Nicholas Gao, Eike Eberhard, Stephan Günnemann; Department of Computer Science & Munich Data Science Institute, Technical University of Munich |
| Pseudocode | No | The paper describes methods like the self-consistent field (SCF) method and equivariant message passing through textual descriptions and mathematical equations. It does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code. |
| Open Source Code | Yes | We provide the source code on https://github.com/eseberhard/eg-ex |
| Open Datasets | Yes | We compare these methods on the revised MD17 dataset, which contains precise gold-standard CCSD(T) (CCSD for aspirin) reference energies for conformations of five molecules along the trajectory of a molecular dynamics (MD) simulation (Chmiela et al., 2018). ... To investigate the extrapolation to unseen structures, we use the 3BPA dataset (Kovács et al., 2021). ... Here, we simulate this setting by splitting the QM9 dataset (Ramakrishnan et al., 2014) into subsets of increasing size based on the number of heavy atoms |
| Dataset Splits | Yes | Each molecule has a training set of 1000 structures, which we split into 950 training and 50 validation structures. Each test set contains an additional 500 structures (1000 for ethanol). ... The training set consists of 500 structures sampled from an MD simulation at room temperature (300K). The test sets consist of MD trajectories at 300K, 600K, and 1200K ... For each training set, we split the structures 90%/10% into training and validation sets. |
| Hardware Specification | Yes | All calculations were performed on a single NVIDIA A100 GPU with our JAX implementation. |
| Software Dependencies | No | The paper mentions using 'JAX (Bradbury et al., 2018)' for the SCF method and 'PySCF (Sun et al., 2018)' for precomputing integrals and obtaining grid points. It also mentions using the SiLU activation function (Hendrycks & Gimpel, 2023). However, specific version numbers for JAX, PySCF, or any other software libraries are not provided, only citations to their respective papers. |
| Experiment Setup | Yes | Table 4: Hyperparameters for EG-XC. d (features per irrep): 32; l_max (number of irreps): 2; T (number of layers): 3; radial filters: 32; ε_m (base semilocal functional): GGA, Dick & Fernandez-Serra (2021); batch size: 1; I_loss (number of steps to compute loss): 3; parameter EMA: 0.995; optimizer: Adam (β₁ = 0.9, β₂ = 0.999); basis set: 6-31G(d); density fitting basis set: Weigend; SCF iterations: 15; precycle XC functional: LDA; precycle iterations: 15; learning rate: 0.01/(1 + t/1000) for MD17 and 3BPA, 0.001/(1 + t/1000) for QM9. |
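The learning rates in the hyperparameter row follow an inverse-time decay. A minimal sketch of such a schedule is below; the function name `lr_schedule`, the step variable `t`, and the exact functional form base_lr/(1 + t/1000) are our reading of the flattened table entry, not code from the paper's repository.

```python
def lr_schedule(t: int, base_lr: float = 0.01, decay_steps: int = 1000) -> float:
    """Inverse-time decay: base_lr / (1 + t / decay_steps).

    base_lr is 0.01 for MD17 and 3BPA and 0.001 for QM9 per the
    hyperparameter table; the schedule form is an assumption.
    """
    return base_lr / (1.0 + t / decay_steps)

print(lr_schedule(0))     # 0.01
print(lr_schedule(1000))  # 0.005 (halved after decay_steps steps)
```

With this form the rate halves after 1000 steps and drops to a tenth of its initial value after 9000 steps, a common way to stabilize late-stage training with batch size 1.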