Open Materials Generation with Stochastic Interpolants

Authors: Philipp Höllmer, Thomas Egg, Maya Martirossyan, Eric Fuemmeler, Zeren Shui, Amit Gupta, Pawan Prakash, Adrian Roitberg, Mingjie Liu, George Karypis, Mark Transtrum, Richard Hennig, Ellad B. Tadmor, Stefano Martiniani

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We benchmark OMatG's performance on two tasks: Crystal Structure Prediction (CSP) for specified compositions, and de novo generation (DNG) aimed at discovering stable, novel, and unique structures. In our ground-up implementation of OMatG, we refine and extend both CSP and DNG metrics compared to previous works. OMatG establishes a new state of the art in generative modeling for materials discovery, outperforming purely flow-based and diffusion-based implementations.
Researcher Affiliation | Academia | New York University, University of Minnesota, University of Florida, Brigham Young University. Correspondence to: Stefano Martiniani <EMAIL>.
Pseudocode | No | The paper describes its methodologies in text and mathematical equations (e.g., Section 3, Section B), but it does not contain any explicitly labeled pseudocode blocks or algorithms formatted as structured steps.
Open Source Code | Yes | The OMatG code is available at https://github.com/FERMat-ML/OMatG.
Open Datasets | Yes | We use the following datasets to benchmark the models: perov-5 (Castelli et al., 2012), a dataset of perovskites with 18 928 samples, each with five atoms per unit cell, in which only the lattice lengths and atomic types change; MP-20 (Jain et al., 2013; Xie et al., 2022) from the Materials Project, which contains 45 231 structures with a maximum of N = 20 atoms per unit cell; and MPTS-52 (Baird et al., 2024), a chronological data split of the Materials Project with 40 476 structures with up to N = 52 atoms per unit cell, which is typically the most difficult to learn. We use the same 60-20-20 splits as Xie et al. (2022); Jiao et al. (2023); Miller et al. (2024). Additionally, we consider the Alex-MP-20 dataset (Zeni et al., 2025), for which we used an 80-10-10 split constructed from MatterGen's 90-10 split by removing 10% of the training data to create a test dataset. This dataset contains 675 204 structures with 20 or fewer atoms per unit cell from the Alexandria (Schmidt et al., 2022a;b) and MP-20 datasets.
Dataset Splits | Yes | We use the same 60-20-20 splits as Xie et al. (2022); Jiao et al. (2023); Miller et al. (2024). Additionally, we consider the Alex-MP-20 dataset (Zeni et al., 2025), for which we used an 80-10-10 split constructed from MatterGen's 90-10 split by removing 10% of the training data to create a test dataset. This dataset contains 675 204 structures with 20 or fewer atoms per unit cell from the Alexandria (Schmidt et al., 2022a;b) and MP-20 datasets.
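The Alex-MP-20 split construction quoted above can be checked with a short calculation. A minimal Python sketch, assuming the removed "10%" is measured against the full dataset (the only reading consistent with turning a 90-10 split into the stated 80-10-10 split):

```python
# Alex-MP-20 split arithmetic: MatterGen's 90-10 train/val split becomes an
# 80-10-10 train/val/test split once a slice equal to 10% of the FULL dataset
# is moved from the training set into a new test set. Interpreting "10% of
# the training data" as 10% of the full dataset is an assumption; the exact
# counts below are derived from the paper's reported total of 675 204.
total = 675_204                       # structures in Alex-MP-20

mattergen_train = round(0.9 * total)  # original 90% training split
mattergen_val = total - mattergen_train

test = round(0.1 * total)             # carved out of the training split
train = mattergen_train - test

for name, n in [("train", train), ("val", mattergen_val), ("test", test)]:
    print(f"{name}: {n} structures ({100 * n / total:.1f}%)")
```

Note that removing 10% of the full dataset corresponds to removing roughly 11.1% of the original training set, which is why the result lands exactly on 80-10-10.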
Hardware Specification | Yes | For these experiments, we use an Nvidia RTX8000 GPU with a batch size of 512 and 1000 integration steps.
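The "1000 integration steps" refer to numerically integrating the learned transport from noise to data at sampling time. A generic Euler loop with a toy velocity field standing in for the trained model (an illustrative sketch only, not OMatG's actual sampler or network):

```python
import random

def toy_velocity(x, t):
    # Stand-in velocity field that drives samples toward 1.0 as t -> 1.
    # A trained model would instead predict the velocity from the current
    # structure; this closed form is purely illustrative.
    return (1.0 - x) / max(1.0 - t, 1e-12)

def euler_sample(x0, n_steps=1000):
    # Forward Euler integration of dx/dt = v(x, t) from t = 0 to t = 1,
    # mirroring the role of the "integration steps" hyperparameter.
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        t = k / n_steps
        x = x + dt * toy_velocity(x, t)
    return x

print(euler_sample(random.uniform(-1.0, 1.0)))  # close to 1.0
```

Fewer steps trade sampling cost against integration error, which is why the step count is treated as an inference-time hyperparameter.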
Software Dependencies | Yes | The machine-learned interatomic potential MatterSim (Yang et al., 2024) is utilized for initial structural relaxation, which is subsequently followed by a more computationally expensive DFT relaxation. Full workflow details are given in Appendix D.4. ... These structures were then filtered to remove any that contained elements not supported by the MatterSim potential (version MatterSim-v1.0.0-1M) (Yang et al., 2024) or the reference convex hull. ... All DFT relaxations utilized MPGGADoubleRelaxStatic flows from the Atomate2 (Ganose et al., 2025) package to produce MP20-compatible data.
Experiment Setup | Yes | For every choice of the positional interpolant, sampling scheme, and latent variable γ, an independent hyperparameter optimization was performed using the Ray Tune package (Liaw et al., 2018) in conjunction with the HyperOpt Python library (Bergstra et al., 2013) for Bayesian optimization. The tuned hyperparameters include both those relevant during training (the relative loss weights λ, the choice of stochastic interpolant for the lattice vectors, the parameters of the chosen γ(t) (if necessary), the sampling scheme, the usage of data-dependent coupling, the batch size, and the learning rate) and those relevant during inference (the number of integration steps, the choice of the noises ε(t) and η, and the magnitude of the velocity annealing parameter s for both the lattice vectors and the atomic coordinates).
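To make the scope of that search concrete, here is a stdlib sketch of such a search space. The paper drives the search with Ray Tune and HyperOpt (Bayesian optimization); the random-search stand-in, the parameter names, and all value ranges below are illustrative assumptions, not the paper's actual configuration:

```python
import random

# Hypothetical search space covering the kinds of knobs listed above.
# Every range and choice set here is an assumption for illustration.
SEARCH_SPACE = {
    # Training-time hyperparameters:
    "loss_weight_lattice": lambda: 10 ** random.uniform(-2, 1),
    "lattice_interpolant": lambda: random.choice(["linear", "trigonometric"]),
    "data_dependent_coupling": lambda: random.choice([True, False]),
    "batch_size": lambda: random.choice([64, 128, 256, 512]),
    "learning_rate": lambda: 10 ** random.uniform(-5, -3),
    # Inference-time hyperparameters:
    "integration_steps": lambda: random.choice([100, 250, 500, 1000]),
    "velocity_annealing_s": lambda: random.uniform(0.0, 10.0),
}

def sample_config():
    # Draw one configuration from the search space.
    return {name: draw() for name, draw in SEARCH_SPACE.items()}

def random_search(objective, n_trials=20):
    # Plain random search standing in for Bayesian optimization: evaluate
    # n_trials independent configurations and keep the best (lowest) score.
    configs = [sample_config() for _ in range(n_trials)]
    best = min(configs, key=objective)
    return objective(best), best

# Toy objective standing in for a validation metric.
best_score, best_cfg = random_search(lambda cfg: cfg["learning_rate"])
print(best_score, best_cfg)
```

A Bayesian optimizer such as HyperOpt replaces the independent draws with a model of past trial scores, proposing configurations expected to improve the objective; the shape of the search space is the same.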