Periodic Materials Generation using Text-Guided Joint Diffusion Model

Authors: Kishalay Das, Subhojyoti Khastagir, Pawan Goyal, Seung-Cheol Lee, Satadeep Bhattacharjee, Niloy Ganguly

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments using popular datasets on benchmark tasks reveal that TGDMat outperforms existing baseline methods by a good margin. Notably, for the structure prediction task, with just one generated sample, TGDMat outperforms all baseline models, highlighting the importance of text-guided diffusion. Further, in the generation task, TGDMat surpasses all baselines and their text-fusion variants, showcasing the effectiveness of the joint diffusion paradigm.
Researcher Affiliation | Academia | (1) Indian Institute of Technology Kharagpur, India; (2) Indo-Korea Science and Technology Center, Bangalore, India
Pseudocode | Yes | Algorithm 1 (Training Algorithm) and Algorithm 2 (Sampling Algorithm)
Open Source Code | Yes | Code is available at https://github.com/kdmsit/TGDMat
Open Datasets | Yes | Perov-5 (Castelli et al., 2012a;b), Carbon-24 (Pickard, 2020), and MP-20 (Jain et al., 2013b)
Dataset Splits | Yes | While training TGDMat, the datasets are split into train, test, and validation sets following the 60:20:20 convention (Xie et al., 2021). A minimal split sketch appears after the table.
Hardware Specification | Yes | All experiments are performed on a Tesla P100-PCIE-16GB GPU server.
Software Dependencies | No | The paper mentions tools and models such as MatSciBERT and Pymatgen but does not provide specific version numbers for the software dependencies or libraries used in the implementation.
Experiment Setup | Yes | TGDMat adopts a 4-layer CSPNet as the message-passing backbone with hidden dimension 512, and uses a pre-trained MatSciBERT (Gupta et al., 2022) followed by a two-layer projection head (projection dimension 64) as the text encoder. The time embedding at each diffusion timestep has dimension 64. The model is trained for 500 epochs with the same optimizer and learning-rate scheduler as DiffCSP, using a batch size of 512. A hedged configuration sketch follows the table.
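
The Dataset Splits row reports a 60:20:20 train/test/validation split following Xie et al. (2021). Below is a minimal sketch of such a split; the seed, data container, and function name are assumptions for illustration, not the authors' preprocessing code.

```python
# Minimal sketch of a 60:20:20 train/validation/test split using scikit-learn.
# The paper follows the split convention of Xie et al. (2021); seeds, file
# formats, and function names here are illustrative assumptions.
from sklearn.model_selection import train_test_split

def split_60_20_20(records, seed: int = 42):
    # First carve off 40% of the data, then halve it into val and test.
    train, rest = train_test_split(records, test_size=0.4, random_state=seed)
    val, test = train_test_split(rest, test_size=0.5, random_state=seed)
    return train, val, test

train_set, val_set, test_set = split_60_20_20(list(range(1000)))
print(len(train_set), len(val_set), len(test_set))  # 600 200 200
```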
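
The Experiment Setup row describes the text-conditioning pathway: a pre-trained MatSciBERT encoder followed by a two-layer projection head with output dimension 64. The sketch below illustrates that pathway under stated hyperparameters; the Hugging Face checkpoint name, class name, and pooling choice are assumptions and do not reproduce the authors' exact module from the repository.

```python
# Hedged sketch of the text-encoder module described in the setup: a
# pre-trained MatSciBERT followed by a two-layer projection head that maps
# the pooled text embedding to a 64-dimensional conditioning vector.
# Checkpoint name and pooling strategy are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MATSCIBERT = "m3rg-iitd/matscibert"  # assumed Hugging Face checkpoint name

class TextConditioner(nn.Module):
    def __init__(self, proj_dim: int = 64):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(MATSCIBERT)
        self.encoder = AutoModel.from_pretrained(MATSCIBERT)
        hidden = self.encoder.config.hidden_size  # 768 for BERT-base
        # Two-layer projection head; output dimension 64 as stated in the setup.
        self.proj = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, proj_dim),
        )

    def forward(self, texts):
        batch = self.tokenizer(texts, padding=True, truncation=True,
                               return_tensors="pt")
        out = self.encoder(**batch)
        cls = out.last_hidden_state[:, 0]  # [CLS] token embedding
        return self.proj(cls)              # (batch, 64) conditioning vector

# Usage: embed a textual description of a crystal; in the paper this vector
# conditions the joint diffusion denoiser alongside the 64-dim time embedding.
cond = TextConditioner()(["A stable cubic perovskite oxide."])
```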