Wyckoff Transformer: Generation of Symmetric Crystals
Authors: Nikita Kazeev, Wei Nong, Ignat Romanov, Ruiming Zhu, Andrey E Ustyuzhanin, Shuya Yamazaki, Kedar Hippalgaonkar
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimentation demonstrates WyFormer's compelling combination of attributes: it achieves best-in-class symmetry-conditioned generation, incorporates a physics-motivated inductive bias, produces structures with competitive stability, predicts material properties with competitive accuracy even without atomic coordinates, and exhibits unparalleled inference speed. ... 3. Experimental Evaluation 3.1. De novo generation 3.1.1. DATASETS 3.1.2. METRICS 3.1.3. METHODOLOGY 3.1.4. DE NOVO STRUCTURE GENERATION RESULTS 3.2. Material property prediction |
| Researcher Affiliation | Collaboration | 1Institute for Functional Intelligent Materials University of Singapore, Block S9, Level 9, 4 Science Drive 2, Singapore 117544 2School of Materials Science and Engineering, Nanyang Technological University, Singapore 639798 3HSE University, Myasnitskaya Ulitsa, 20, Moscow, Russia, 101000 4Constructor University, Bremen, Campus Ring 1, 28759, Germany 5Constructor Knowledge Labs, Bremen, Campus Ring 1, 28759, Germany 6Institute of Materials Research and Engineering, Agency for Science Technology and Research, 2 Fusionopolis Way, Singapore, 138634. Correspondence to: Nikita Kazeev <EMAIL>, Kedar Hippalgaonkar <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Generation of Crystal Structure using Wyckoff Transformer Model Algorithm 2 Wyckoff Transformer Training Algorithm Algorithm 3 Model Forward Pass |
| Open Source Code | Yes | https://github.com/SymmetryAdvantage/WyckoffTransformer |
| Open Datasets | Yes | We use MP-20 (Xie et al., 2021)... Additionally, we train and evaluate WyFormer on MPTS-52 (Baird et al., 2024)... Materials Project (Jain et al., 2013)... We also utilize the AFLOW database (Curtarolo et al., 2012)... on the 3DSC dataset (Sommer et al., 2023). |
| Dataset Splits | Yes | MP-20 (Xie et al., 2021), which contains almost all experimentally stable materials in Materials Project (Jain et al., 2013) with a maximum of 20 atoms per unit cell, within 0.08 eV/atom of the convex hull, and formation energy smaller than 2 eV/atom, 45,229 structures in total, split 60/20/20 into train, validation and test parts. ... We also utilize the AFLOW database (Curtarolo et al., 2012), which contains 4905 compounds spanning a diverse range of chemistries and crystal structures. We predict four properties: thermal conductivity, Debye temperature, bulk modulus, and shear modulus. The data are divided into training, validation, and test sets using a 60/20/20 split. |
| Hardware Specification | Yes | Our tests were done on a single NVIDIA RTX 6000 Ada, 24 CPU cores and the MP-20 dataset. ... We conducted experiments on a machine with an NVIDIA RTX 6000 Ada and 24 physical CPU cores. ... CHGNet: 112 GPU·s / structure for MP-20 on NVIDIA A40 ... CrystalFormer (Cao et al., 2024): It takes 520 seconds to generate a batch of 13,000 crystal samples on a single A100 GPU |
| Software Dependencies | No | While VASP version 5.4.4 is mentioned, other key software components, such as the machine learning framework (e.g., PyTorch, TensorFlow), Python version, or CUDA version, are not specified with version numbers, which is necessary for a reproducible description of the ancillary software. |
| Experiment Setup | Yes | L. Hyperparameters. L.1. Next token prediction, MP-20: Element embedding size: 16; Site symmetry embedding size: 16; Site enumeration embedding size: 8; Number of fully-connected layers: 3; Number of attention heads: 4; Dimension of feed-forward layers inside Encoder: 128; Dropout inside Encoder: 0.2; Number of Encoder layers: 3; Loss function: cross entropy, multi-class for element, single-class for other token parts, no averaging; Batch size: 27,136 (full MP-20 train); Optimizer: SGD; Initial learning rate: 0.2; Scheduler: ReduceLROnPlateau; Scheduler patience: 2; 104 epochs; Early stopping patience: 105 epochs of no improvement in validation loss; Gradient clipping: max norm = 2. L.2. Energy prediction MP-20 ... L.3. Band gap prediction MP-20 |
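The pseudocode listed under "Pseudocode" (Algorithm 1: generation with the Wyckoff Transformer) describes autoregressive sampling of Wyckoff tokens conditioned on a space group. The toy sketch below illustrates only the control flow of such a loop; the vocabularies, the `dummy_next_token` stand-in for the trained transformer, and the uniform sampling policy are all placeholders, not the authors' implementation:

```python
import random

# Hypothetical token vocabularies, for illustration only. The real model
# uses element, site-symmetry, and site-enumeration tokens (embedding
# sizes 16/16/8 per the hyperparameter table).
ELEMENTS = ["Si", "O", "Fe", "STOP"]
SITE_SYMMETRIES = ["1", "m", "2/m", "-3m"]

def dummy_next_token(prefix, rng):
    """Stand-in for the transformer's next-token distribution: samples a
    random (element, site symmetry) pair, or None when STOP is drawn."""
    element = rng.choice(ELEMENTS)
    if element == "STOP":
        return None
    return (element, rng.choice(SITE_SYMMETRIES))

def generate_wyckoff_sequence(rng, max_sites=20):
    """Autoregressive generation loop: pick a space group, then append
    Wyckoff-site tokens until STOP or the site limit (MP-20 caps unit
    cells at 20 atoms, so 20 is used as the cap here)."""
    space_group = rng.randint(1, 230)  # 230 crystallographic space groups
    sequence = []
    for _ in range(max_sites):
        token = dummy_next_token(sequence, rng)
        if token is None:
            break
        sequence.append(token)
    return space_group, sequence

rng = random.Random(0)
sg, seq = generate_wyckoff_sequence(rng)
print(sg, seq)
```

In the actual pipeline the sampled Wyckoff representation is only a symmetry skeleton; atomic coordinates are recovered afterwards by a separate structure-solving step.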