Learning the Electronic Hamiltonian of Large Atomic Structures

Authors: Chen Hao Xia, Manasa Kaniselvan, Alexandros Nikolaos Ziogas, Marko Mladenović, Rayen Mahjoub, Alexander Maeder, Mathieu Luisier

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate its capabilities by predicting the electronic Hamiltonian of various systems with up to 3,000 nodes (atoms), 500,000+ edges, 28 million orbital interactions (nonzero entries of H), and 0.53% error in the eigenvalue spectra.
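The quoted result reports a 0.53% error in the eigenvalue spectra of the predicted Hamiltonians. The paper's exact metric definition is not given here; a minimal sketch of one plausible version (mean relative deviation between the sorted eigenvalues of the predicted and reference matrices; the function name and definition are assumptions) is:

```python
import numpy as np

def eigenvalue_spectrum_error(h_pred, h_ref):
    """Mean relative deviation between the eigenvalue spectra of two
    symmetric matrices (a plausible, hypothetical definition of the metric)."""
    eps_pred = np.linalg.eigvalsh(h_pred)  # eigenvalues, sorted ascending
    eps_ref = np.linalg.eigvalsh(h_ref)
    return float(np.mean(np.abs(eps_pred - eps_ref) / np.abs(eps_ref)))

# Small deterministic example: shifting H by 0.01*I shifts every
# eigenvalue by exactly 0.01.
h_ref = np.diag([1.0, 2.0, 4.0])
h_pred = h_ref + 0.01 * np.eye(3)
err = eigenvalue_spectrum_error(h_pred, h_ref)  # (0.01/1 + 0.01/2 + 0.01/4)/3
```

For the 3,000-atom systems in the paper, `h_pred` and `h_ref` would be the 18,000 × 18,000 orbital-interaction matrices.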
Researcher Affiliation Academia Integrated Systems Laboratory, Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich, Switzerland. Correspondence to: Chen Hao Xia <EMAIL>, Manasa Kaniselvan <EMAIL>.
Pseudocode Yes In Algorithm 1, we detail the procedure to partition the full graph G, described by the set of vertices V and edges E, into a set of slices {G1 . . . GN} which are augmented by virtual nodes and edges.
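The quoted Algorithm 1 partitions the full graph into slices augmented with virtual nodes and edges. The actual algorithm is not reproduced here; a hedged sketch of the general idea (contiguous vertex slices, with each cut edge kept and its external endpoint marked as a virtual node; all names are illustrative) could look like:

```python
def partition_with_virtual_nodes(vertices, edges, n_slices):
    """Hypothetical sketch of slicing a graph G = (V, E) into n_slices,
    augmenting each slice with virtual nodes/edges for cut edges."""
    vertices = list(vertices)
    size = -(-len(vertices) // n_slices)  # ceiling division
    slices = []
    for i in range(n_slices):
        own = set(vertices[i * size:(i + 1) * size])
        virtual_nodes, slice_edges = set(), []
        for u, v in edges:
            if u in own and v in own:
                slice_edges.append((u, v))
            elif u in own or v in own:
                # Cut edge: keep it as a virtual edge and record the
                # external endpoint as a virtual node of this slice.
                slice_edges.append((u, v))
                virtual_nodes.add(v if u in own else u)
        slices.append({"nodes": own, "virtual": virtual_nodes,
                       "edges": slice_edges})
    return slices

# On the path graph 0-1-2-3 split into two slices, slice 0 owns {0, 1}
# and receives node 2 as a virtual node via the cut edge (1, 2).
slices = partition_with_virtual_nodes(range(4), [(0, 1), (1, 2), (2, 3)], 2)
```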
Open Source Code No The paper does not provide an explicit statement or link to the source code for the methodology described.
Open Datasets Yes The datasets are publicly available at https://huggingface.co/datasets/chexia8/Amorphous-Hamiltonians.
Dataset Splits Yes Table 1. Attributes of the generated dataset for three materials, each with its own training, validation, and test set; the [x, y, z] triplet defines the periodic unit cell size (excerpt):

Material | Structure | Purpose | rcut [Å] | # atoms | # orbitals | # edges | x [Å] | y [Å] | z [Å]
a-HfO2 | 1 | validate | 8 | 3,000 | 18,000 | 527,348 | 52.876 | 26.308 | 26.242
a-HfO2 | 2 | train | 8 | 3,000 | 18,000 | 533,364 | 52.346 | 26.237 | 26.293
a-HfO2 | 3 | test | 8 | 3,000 | 18,000 | 530,920 | 52.722 | 26.267 | 26.191
...

Dataset | Model | Train | Validate | Test | Batch size | ϵtot [µEh]
water | QHNet | 500 | 500 | 3900 | 10 | 10.79
water | This work | 500 | 500 | 3900 | 10 | 5.60
Hardware Specification Yes During the training of the full graph model, the peak memory consumption observed was 61.68 GiB on a single NVIDIA A100 GPU. Experiments were run on NVIDIA A100 GPUs with # ranks set to Nt. The computation time per H2O molecule was 7 s when run on 12 nodes with 12-core Intel Xeon E5-2680 CPUs and an NVIDIA P100 GPU, resulting in a total of 0.04 node hours.
Software Dependencies No The paper mentions several software tools and libraries by name (e.g., PyTorch, CP2K, LAMMPS, ASE), but does not provide specific version numbers for these software components.
Experiment Setup Yes Table 7. Hyper-parameters used for the a-HfO2, a-PtGe, and a-GST data:

Hyper-parameter | a-HfO2/PtGe/GST dataset
Optimizer | Adam
Precision | single (f32)
Scheduler | ReduceLROnPlateau
Initial learning rate | 1 × 10⁻⁴
Minimum learning rate | 1 × 10⁻⁵
Decay patience tpatience | 500
Decay factor | 0.5
Threshold | 1 × 10⁻³
Maximum degree Lmax | 4
Maximum order Mmax | 4
Embedding size | 16
Number of attention heads Nh | 2
Feedforward network dimension | 64
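The optimizer and scheduler rows of Table 7 map directly onto standard PyTorch components. A minimal sketch, assuming a placeholder model in place of the paper's actual network:

```python
import torch

model = torch.nn.Linear(16, 16)  # placeholder for the paper's actual model

# Adam with the initial learning rate from Table 7 (1e-4).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# ReduceLROnPlateau with Table 7's decay settings: factor 0.5,
# patience 500, threshold 1e-3, floored at the minimum rate 1e-5.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=500,
    threshold=1e-3, min_lr=1e-5)

# In the training loop, the scheduler would be stepped on the
# validation loss: scheduler.step(val_loss)
```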