Periodic Materials Generation using Text-Guided Joint Diffusion Model

Authors: Kishalay Das, Subhojyoti Khastagir, Pawan Goyal, Seung-Cheol Lee, Satadeep Bhattacharjee, Niloy Ganguly

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments using popular datasets on benchmark tasks reveal that TGDMat outperforms existing baseline methods by a good margin. Notably, for the structure prediction task, with just one generated sample, TGDMat outperforms all baseline models, highlighting the importance of text-guided diffusion. Further, in the generation task, TGDMat surpasses all baselines and their text-fusion variants, showcasing the effectiveness of the joint diffusion paradigm.
Researcher Affiliation | Academia | (1) Indian Institute of Technology Kharagpur, India; (2) Indo-Korea Science and Technology Center, Bangalore, India
Pseudocode | Yes | Algorithm 1 (Training Algorithm) and Algorithm 2 (Sampling Algorithm)
Open Source Code | Yes | Code is available at https://github.com/kdmsit/TGDMat
Open Datasets | Yes | Perov-5 (Castelli et al., 2012a;b), Carbon-24 (Pickard, 2020), and MP-20 (Jain et al., 2013b)
Dataset Splits | Yes | While training TGDMat, the datasets are split into train, test, and validation sets following the 60:20:20 convention (Xie et al., 2021). A minimal split sketch appears after the table.
Hardware Specification | Yes | All experiments are performed on a Tesla P100-PCIE-16GB GPU server.
Software Dependencies | No | The paper mentions tools and models such as MatSciBERT and Pymatgen but does not provide specific version numbers for the software dependencies or libraries used in the implementation.
Experiment Setup | Yes | TGDMat adopts a 4-layer CSPNet as the message-passing backbone with hidden dimension 512, and uses a pre-trained MatSciBERT (Gupta et al., 2022) followed by a two-layer projection head (projection dimension 64) as the text encoder. The time embedding at each diffusion timestep has dimension 64. The model is trained for 500 epochs with the same optimizer and learning-rate scheduler as DiffCSP, using a batch size of 512. A hedged configuration sketch follows the table.
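
The Dataset Splits row reports a 60:20:20 train/test/validation split following Xie et al. (2021). Below is a minimal sketch of such a split; the seed, data container, and function name are assumptions for illustration, not the authors' preprocessing code.

```python
# Minimal sketch of a 60:20:20 train/validation/test split using scikit-learn.
# The paper follows the split convention of Xie et al. (2021); seeds, file
# formats, and function names here are illustrative assumptions.
from sklearn.model_selection import train_test_split

def split_60_20_20(records, seed: int = 42):
    # First carve off 40% of the data, then halve it into val and test.
    train, rest = train_test_split(records, test_size=0.4, random_state=seed)
    val, test = train_test_split(rest, test_size=0.5, random_state=seed)
    return train, val, test

train_set, val_set, test_set = split_60_20_20(list(range(1000)))
print(len(train_set), len(val_set), len(test_set))  # 600 200 200
```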
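
The Experiment Setup row describes the text-conditioning pathway: a pre-trained MatSciBERT encoder followed by a two-layer projection head with output dimension 64. The sketch below illustrates that pathway under stated hyperparameters; the Hugging Face checkpoint name, class name, and pooling choice are assumptions and do not reproduce the authors' exact module from the repository.

```python
# Hedged sketch of the text-encoder module described in the setup: a
# pre-trained MatSciBERT followed by a two-layer projection head that maps
# the pooled text embedding to a 64-dimensional conditioning vector.
# Checkpoint name and pooling strategy are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MATSCIBERT = "m3rg-iitd/matscibert"  # assumed Hugging Face checkpoint name

class TextConditioner(nn.Module):
    def __init__(self, proj_dim: int = 64):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(MATSCIBERT)
        self.encoder = AutoModel.from_pretrained(MATSCIBERT)
        hidden = self.encoder.config.hidden_size  # 768 for BERT-base
        # Two-layer projection head; output dimension 64 as stated in the setup.
        self.proj = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, proj_dim),
        )

    def forward(self, texts):
        batch = self.tokenizer(texts, padding=True, truncation=True,
                               return_tensors="pt")
        out = self.encoder(**batch)
        cls = out.last_hidden_state[:, 0]  # [CLS] token embedding
        return self.proj(cls)              # (batch, 64) conditioning vector

# Usage: embed a textual description of a crystal; in the paper this vector
# conditions the joint diffusion denoiser alongside the 64-dim time embedding.
cond = TextConditioner()(["A stable cubic perovskite oxide."])
```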