All-atom Diffusion Transformers: Unified generative modelling of molecules and materials

Authors: Chaitanya K. Joshi, Xiang Fu, Yi-Lun Liao, Vahe Gharakhanyan, Benjamin Kurt Miller, Anuroop Sriram, Zachary Ward Ulissi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the MP20, QM9, and GEOM-DRUGS datasets demonstrate that the jointly trained ADiT generates realistic and valid molecules as well as materials, obtaining state-of-the-art results on par with molecule- and crystal-specific models.
Researcher Affiliation | Industry | ¹Fundamental AI Research (FAIR) at Meta, ²University of Cambridge, ³MIT.
Pseudocode | Yes | Algorithm 1: Pseudocode for VAE encoder E.
Open Source Code | Yes | Open source code: https://github.com/facebookresearch/allatom-diffusion-transformer
Open Datasets | Yes | For our main experiments, we train models on periodic crystals from MP20 and non-periodic molecules from QM9. MP20 (Xie et al., 2022) contains 45,231 metastable crystal structures from the Materials Project (Jain et al., 2013)... QM9 (Wu et al., 2018) consists of 130,000 stable small organic molecules...
Dataset Splits | Yes | We split the data following prior work (Xie et al., 2022; Hoogeboom et al., 2022) to ensure fair comparisons.
Hardware Specification | Yes | Both models are trained to convergence for at most 5000 epochs, up to 3 days on 8 V100 GPUs.
Software Dependencies | No | The paper mentions software such as RDKit, pymatgen, CHGNet, and MOFChecker, but does not provide version numbers for any of them. For example, it mentions 'constructing the molecule via RDKit' and 'build the crystal structure using pymatgen' without specifying versions.
Experiment Setup | Yes | We sequentially train the first-stage VAE and then the second-stage DiT using the AdamW optimizer with a constant learning rate of 1e-4, no weight decay, and a batch size of 256. We use an exponential moving average (EMA) of DiT weights over training with a decay of 0.9999.
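The EMA of DiT weights described in the experiment setup can be sketched framework-agnostically. The minimal `EMA` class below is an illustration of the standard update rule with the paper's decay of 0.9999; it is not the authors' implementation, and the hyperparameter constants are simply restated from the quoted setup.

```python
# Hyperparameters restated from the paper's experiment setup.
LEARNING_RATE = 1e-4   # constant learning rate, no weight decay
BATCH_SIZE = 256
EMA_DECAY = 0.9999


class EMA:
    """Exponential moving average of model parameters (illustrative sketch).

    Parameters are represented as a flat name -> value mapping so the
    update rule is visible without depending on any deep-learning framework.
    """

    def __init__(self, params: dict[str, float], decay: float = EMA_DECAY):
        self.decay = decay
        self.shadow = dict(params)  # copy of the initial weights

    def update(self, params: dict[str, float]) -> None:
        # shadow <- decay * shadow + (1 - decay) * current
        d = self.decay
        for name, value in params.items():
            self.shadow[name] = d * self.shadow[name] + (1.0 - d) * value
```

With a decay of 0.9999, the averaged weights effectively smooth over roughly the last 10,000 optimizer steps, which is why EMA weights are commonly preferred for sampling from diffusion models.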