WyckoffDiff – A Generative Diffusion Model for Crystal Symmetry

Authors: Filip Ekström Kelvinius, Oskar B. Andersson, Abhijith S Parackal, Dong Qian, Rickard Armiento, Fredrik Lindsten

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We additionally present a new metric, Fréchet Wrenformer Distance, which captures the symmetry aspects of the generated materials, and we benchmark WyckoffDiff against recently proposed generative models for crystal generation. As a proof-of-concept study, we use WyckoffDiff to find new materials below the convex hull of thermodynamic stability. From Section 4 (Numerical Evaluations): The quantitative evaluation of our models uses the WBM dataset (Wang et al., 2021), which was created by substituting chemical elements in the crystal structures available from the Materials Project (Jain et al., 2013) and contains a total of 257k materials. We set aside 10k+10k materials as validation and test sets.
Researcher Affiliation | Academia | (1) Department of Computer and Information Science (IDA), Linköping University, Sweden; (2) Department of Physics, Chemistry and Biology (IFM), Linköping University, Sweden. Correspondence to: Filip Ekström Kelvinius or Oskar B. Andersson <EMAIL, EMAIL>.
Pseudocode | Yes | Algorithm 1: WyckoffDiff. Algorithm 2: Full GNN forward pass. Algorithm 3: GNN layer forward pass. Algorithm 4: GNN message, M_l(w_i, w_j) in Equation (9).
Open Source Code | Yes | Data and code are available online at https://github.com/httk/wyckoffdiff.
Open Datasets | Yes | The quantitative evaluation of our models uses the WBM dataset (Wang et al., 2021), which was created by substituting chemical elements in the crystal structures available from the Materials Project (MP) (Jain et al., 2013) and contains a total of 257k materials. We set aside 10k+10k materials as validation and test sets. As an additional experiment, we used Carbon24 (Pickard, 2020).
Dataset Splits | Yes | The quantitative evaluation of our models uses the WBM dataset (Wang et al., 2021)... to generate a total of 257k materials. We set aside 10k+10k materials as validation and test sets.
Hardware Specification | Yes | Training a model took approximately 38 hours on a single NVIDIA A100. Parts of the computations were enabled by the Berzelius resource provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre (NSC).
Software Dependencies | No | The paper mentions the "Pyxtal library (Fredericks et al., 2021)", "MACE (Batatia et al., 2023)", "VASP electronic-structure software (Kresse & Hafner, 1994)" and "Pymatgen (Ong et al., 2013)" but gives no version numbers for any of them.
Experiment Setup | Yes | Hyperparameters for WyckoffDiff and its training can be found in Table 3. Table 3 lists: max. timestep T = 1000; max. atom number N_a = 100; max. number of atoms of an element P = 54; number of GNN layers N = 3; dimension of h_i^l = 256; dimension of h_i^pos = 16; activation function SiLU (see Equation (11)); MLPs in general: 2 hidden layers, SiLU activation; MLPs in M_l: hidden dimension 2(dim(h_i^l) + dim(h_i^pos)) = 544; probability-decoding MLPs: hidden dimension 2 dim(h_i^l) = 512; optimizer AdamW (Loshchilov & Hutter, 2019); learning rate 2 × 10^-4; batch size 256; number of epochs 1000.
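The Research Type row introduces the Fréchet Wrenformer Distance. The sketch below assumes it follows the same construction as the Fréchet Inception Distance: fit a Gaussian to each set of feature embeddings (presumably produced by a Wrenformer model) and compute d² = ||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}). The function names are hypothetical and the Wrenformer embedding step is not shown. Any (n_samples, dim) arrays can stand in for the embeddings.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative round-off eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T  # V diag(sqrt(vals)) V^T


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) embedding sets.

    Hypothetical helper: this assumes FWD is FID-style, i.e. the embeddings come
    from a Wrenformer model (not shown here).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)

    # Tr((C_a C_b)^{1/2}) equals Tr((C_a^{1/2} C_b C_a^{1/2})^{1/2}).
    # The inner matrix is symmetric PSD, so eigvalsh is safe to use.
    sqrt_a = _psd_sqrt(cov_a)
    middle = sqrt_a @ cov_b @ sqrt_a
    middle = (middle + middle.T) / 2.0  # remove asymmetry from round-off
    trace_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()

    mean_term = float(np.sum((mu_a - mu_b) ** 2))
    cov_term = float(np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
    return mean_term + cov_term
```

Lower values mean the generated and reference embedding distributions are closer. Identical sets give a distance of (numerically) zero.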
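The Pseudocode row lists Algorithm 1, WyckoffDiff's diffusion procedure over discrete data, and Table 3 fixes the number of timesteps at T = 1000. This summary does not say which forward (noising) process the paper uses. The sketch below is therefore only a generic illustration of one common choice for categorical diffusion: uniform transitions, under which q(x_t | x_0) = ᾱ_t·onehot(x_0) + (1 − ᾱ_t)/K for K categories. The linear schedule ᾱ_t = 1 − t/T and all function names are hypothetical and not taken from the paper.

```python
import numpy as np


def alpha_bar(t: int, T: int) -> float:
    """Hypothetical linear schedule: 1 at t=0 (clean data), 0 at t=T (fully uniform)."""
    return 1.0 - t / T


def q_probs(x0: np.ndarray, t: int, T: int, num_classes: int) -> np.ndarray:
    """Marginal q(x_t | x_0) under uniform transitions, with shape (*x0.shape, K)."""
    a = alpha_bar(t, T)
    onehot = np.eye(num_classes)[x0]
    return a * onehot + (1.0 - a) / num_classes


def q_sample(x0: np.ndarray, t: int, T: int, num_classes: int,
             rng: np.random.Generator) -> np.ndarray:
    """Draw x_t by keeping each entry with probability alpha_bar.

    Otherwise the entry is resampled uniformly over all K classes, so x_t
    follows q_probs exactly.
    """
    a = alpha_bar(t, T)
    keep = rng.random(x0.shape) < a
    noise = rng.integers(0, num_classes, size=x0.shape)
    return np.where(keep, x0, noise)
```

A denoising network, presumably the GNN of Algorithms 2 to 4 in WyckoffDiff's case, would then be trained to reverse this corruption one step at a time.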
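The Dataset Splits row states that 10k+10k of the roughly 257k WBM materials are held out for validation and testing. The summary does not say how the held-out materials are chosen. A seeded random permutation is one plausible, hypothetical way to reproduce such a partition. The function name, the default seed and the exact total of 257,000 (the source only says "257k") are assumptions.

```python
import random


def split_indices(n_total: int, n_val: int = 10_000, n_test: int = 10_000,
                  seed: int = 0) -> tuple[list[int], list[int], list[int]]:
    """Hypothetical seeded random split into disjoint train/val/test index lists."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)  # deterministic for a fixed seed
    val = idx[:n_val]
    test = idx[n_val:n_val + n_test]
    train = idx[n_val + n_test:]
    return train, val, test


train, val, test = split_indices(257_000)
print(len(train), len(val), len(test))  # 237000 10000 10000
```

Fixing the seed makes the split reproducible, which matters when comparing models evaluated on the same held-out materials.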
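The Table 3 values in the Experiment Setup row fit in a small configuration object, which also makes the two derived hidden widths explicit: 2(256 + 16) = 544 for the MLPs in M_l and 2 × 256 = 512 for the probability-decoding MLPs. The class and field names are hypothetical and do not reflect how the wyckoffdiff repository organizes its configuration. The learning rate is read as 2 × 10^-4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WyckoffDiffConfig:
    """Hypothetical container for the Table 3 hyperparameters."""
    max_timestep: int = 1000         # T
    max_atom_number: int = 100       # N_a
    max_atoms_per_element: int = 54  # P
    num_gnn_layers: int = 3          # N
    node_dim: int = 256              # dim(h_i^l)
    pos_dim: int = 16                # dim(h_i^pos)
    mlp_hidden_layers: int = 2       # general MLPs
    activation: str = "silu"
    optimizer: str = "adamw"
    learning_rate: float = 2e-4
    batch_size: int = 256
    num_epochs: int = 1000

    @property
    def message_mlp_hidden(self) -> int:
        """Hidden width of the MLPs in M_l: 2 * (dim(h_i^l) + dim(h_i^pos))."""
        return 2 * (self.node_dim + self.pos_dim)

    @property
    def decoder_mlp_hidden(self) -> int:
        """Hidden width of the probability-decoding MLPs: 2 * dim(h_i^l)."""
        return 2 * self.node_dim


cfg = WyckoffDiffConfig()
print(cfg.message_mlp_hidden, cfg.decoder_mlp_hidden)  # 544 512
```

Deriving the two widths from node_dim and pos_dim keeps them consistent if either embedding dimension is changed.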