Wyckoff Transformer: Generation of Symmetric Crystals
Authors: Nikita Kazeev, Wei Nong, Ignat Romanov, Ruiming Zhu, Andrey E Ustyuzhanin, Shuya Yamazaki, Kedar Hippalgaonkar
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimentation demonstrates WyFormer's compelling combination of attributes: it achieves best-in-class symmetry-conditioned generation, incorporates a physics-motivated inductive bias, produces structures with competitive stability, predicts material properties with competitive accuracy even without atomic coordinates, and exhibits unparalleled inference speed. ... 3. Experimental Evaluation 3.1. De novo generation 3.1.1. DATASETS 3.1.2. METRICS 3.1.3. METHODOLOGY 3.1.4. DE NOVO STRUCTURE GENERATION RESULTS 3.2. Material property prediction |
| Researcher Affiliation | Collaboration | 1Institute for Functional Intelligent Materials University of Singapore, Block S9, Level 9, 4 Science Drive 2, Singapore 117544 2School of Materials Science and Engineering, Nanyang Technological University, Singapore 639798 3HSE University, Myasnitskaya Ulitsa, 20, Moscow, Russia, 101000 4Constructor University, Bremen, Campus Ring 1, 28759, Germany 5Constructor Knowledge Labs, Bremen, Campus Ring 1, 28759, Germany 6Institute of Materials Research and Engineering, Agency for Science Technology and Research, 2 Fusionopolis Way, Singapore, 138634. Correspondence to: Nikita Kazeev <EMAIL>, Kedar Hippalgaonkar <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Generation of Crystal Structure using Wyckoff Transformer Model Algorithm 2 Wyckoff Transformer Training Algorithm Algorithm 3 Model Forward Pass |
| Open Source Code | Yes | https://github.com/SymmetryAdvantage/WyckoffTransformer |
| Open Datasets | Yes | We use MP-20 (Xie et al., 2021)... Additionally, we train and evaluate WyFormer on MPTS-52 (Baird et al., 2024)... Materials Project (Jain et al., 2013)... We also utilize the AFLOW database (Curtarolo et al., 2012)... on the 3DSC dataset (Sommer et al., 2023). |
| Dataset Splits | Yes | MP-20 (Xie et al., 2021), which contains almost all experimentally stable materials in Materials Project (Jain et al., 2013) with a maximum of 20 atoms per unit cell, within 0.08 eV/atom of the convex hull, and formation energy smaller than 2 eV/atom, 45,229 structures in total, split 60/20/20 into train, validation and test parts. ... We also utilize the AFLOW database (Curtarolo et al., 2012), which contains 4905 compounds spanning a diverse range of chemistries and crystal structures. We predict four properties: thermal conductivity, Debye temperature, bulk modulus, and shear modulus. The data are divided into training, validation, and test sets using a 60/20/20 split. |
| Hardware Specification | Yes | Our tests were done on a single NVIDIA RTX 6000 Ada, 24 CPU cores and the MP-20 dataset. ... We conducted experiments on a machine with an NVIDIA RTX 6000 Ada and 24 physical CPU cores. ... CHGNet: 112 GPU·s / structure for MP-20 on NVIDIA A40 ... CrystalFormer (Cao et al., 2024): It takes 520 seconds to generate a batch of 13,000 crystal samples on a single A100 GPU |
| Software Dependencies | No | While VASP version 5.4.4 is mentioned, other key software components, such as the machine learning framework (e.g., PyTorch, TensorFlow), Python version, or CUDA version, are not specified with version numbers, which is necessary for a reproducible description of the ancillary software. |
| Experiment Setup | Yes | L. Hyperparameters. L.1. Next token prediction, MP-20: Element embedding size: 16; Site symmetry embedding size: 16; Site enumeration embedding size: 8; Number of fully-connected layers: 3; Number of attention heads: 4; Dimension of feed-forward layers inside Encoder: 128; Dropout inside Encoder: 0.2; Number of Encoder layers: 3; Loss function: cross entropy, multi-class for element, single-class for other token parts, no averaging; Batch size: 27,136 (full MP-20 train); Optimizer: SGD; Initial learning rate: 0.2; Scheduler: ReduceLROnPlateau; Scheduler patience: 2; 104 epochs; Early stopping patience: 105 epochs of no improvement in validation loss; Gradient clipping: max norm = 2. L.2. Energy prediction MP-20 ... L.3. Band gap prediction MP-20 |
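The pseudocode listed under "Pseudocode" (Algorithm 1: generation with the Wyckoff Transformer) describes autoregressive sampling of Wyckoff tokens conditioned on a space group. The toy sketch below illustrates only the control flow of such a loop; the vocabularies, the `dummy_next_token` stand-in for the trained transformer, and the uniform sampling policy are all placeholders, not the authors' implementation:

```python
import random

# Hypothetical token vocabularies, for illustration only. The real model
# uses element, site-symmetry, and site-enumeration tokens (embedding
# sizes 16/16/8 per the hyperparameter table).
ELEMENTS = ["Si", "O", "Fe", "STOP"]
SITE_SYMMETRIES = ["1", "m", "2/m", "-3m"]

def dummy_next_token(prefix, rng):
    """Stand-in for the transformer's next-token distribution: samples a
    random (element, site symmetry) pair, or None when STOP is drawn."""
    element = rng.choice(ELEMENTS)
    if element == "STOP":
        return None
    return (element, rng.choice(SITE_SYMMETRIES))

def generate_wyckoff_sequence(rng, max_sites=20):
    """Autoregressive generation loop: pick a space group, then append
    Wyckoff-site tokens until STOP or the site limit (MP-20 caps unit
    cells at 20 atoms, so 20 is used as the cap here)."""
    space_group = rng.randint(1, 230)  # 230 crystallographic space groups
    sequence = []
    for _ in range(max_sites):
        token = dummy_next_token(sequence, rng)
        if token is None:
            break
        sequence.append(token)
    return space_group, sequence

rng = random.Random(0)
sg, seq = generate_wyckoff_sequence(rng)
print(sg, seq)
```

In the actual pipeline the sampled Wyckoff representation is only a symmetry skeleton; atomic coordinates are recovered afterwards by a separate structure-solving step.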