Physics-Informed Weakly Supervised Learning For Interatomic Potentials

Authors: Makoto Takamoto, Viktor Zaverkin, Mathias Niepert

ICML 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. Section 5 (Experiments): We evaluate our method through extensive experiments designed to address the following objectives: (1) compare PIWSL with established baselines, (2) analyze the effect of PIWSL using the aspirin molecule, including molecular dynamics (MD) simulations, and (3) assess PIWSL's ability to enhance foundation-model fine-tuning on sparse datasets, particularly for energy and force prediction tasks where force labels are unavailable.
Researcher Affiliation: Collaboration. (1) NEC Laboratories Europe, Heidelberg, Germany; (2) University of Stuttgart, Stuttgart, Germany. Correspondence to: Makoto Takamoto <EMAIL>.
Pseudocode: No. The paper describes methods using narrative text and mathematical equations, but does not include any structured pseudocode or algorithm blocks.
Open Source Code: Yes. Code and scripts to reproduce the experiments are available at https://github.com/nec-research/PICPS-ML4Sci.
Open Datasets: Yes. To evaluate the effect and dependency of the physics-informed weakly supervised approach in detail, we performed the training on various datasets: ANI-1x as a heterogeneous molecular dataset (Smith et al., 2020), TiO2 as a dataset for inorganic materials (Artrith & Urban, 2016), the revised MD17 (rMD17) dataset containing small molecules with sampled configurational spaces for each (Chmiela et al., 2017; 2018; Christensen & von Lilienfeld, 2020), the MD22 dataset containing larger molecules (Chmiela et al., 2023), and LMNTO as another material dataset (Cooper et al., 2020); the benchmark results for rMD17, MD22, and LMNTO are provided in Section D.1. The detailed description of each dataset is provided in Section B.3.
Dataset Splits: Yes. We split the original datasets into training, validation, and test sets for our experiments. We shuffled the original datasets using a random seed and selected the training datasets of predefined sizes. For validation, we selected the same number of configurations as in the training dataset if it exceeded 100 configurations; otherwise, we used 100 configurations to ensure a sufficient validation size. For the rMD17 dataset, following (Fu et al., 2023), we used 9,000 configurations as a validation dataset and another 10,000 for testing. We used the same test dataset across different sizes of the training datasets for a fair performance comparison. We used 10,000 test configurations for ANI-1x and 1,000 for TiO2 and LMNTO.
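The split protocol quoted above (shuffle with a fixed seed, take a predefined training size, use a validation set of the same size but at least 100 configurations, test on the remainder) can be sketched as follows. This is a minimal illustration with our own function and variable names, not the authors' released code:

```python
import random

def split_dataset(indices, n_train, seed=0):
    """Sketch of the split protocol described in the paper.

    Shuffle configurations with a fixed random seed, take n_train for
    training, then a validation set matching the training size (but at
    least 100 configurations), and leave the rest for testing.
    """
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    n_val = max(n_train, 100)  # "at least 100 configurations" rule
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# e.g. 20,000 configurations, 1,000 for training
train, val, test = split_dataset(range(20000), 1000)
```

For the rMD17 benchmark the paper instead follows fixed validation/test sizes (9,000 and 10,000 configurations), so this helper would not apply there unchanged.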
Hardware Specification: Yes. All experiments are performed on a single NVIDIA A100 GPU with 81.92 GB of memory.
Software Dependencies: No. The code used to run our experiments builds upon the recent work of (Fu et al., 2023) and extends it to integrate the latest Open Catalyst Project code (Chanussot et al., 2021). ... These hyper-parameters are tuned using Optuna (Akiba et al., 2019) for PaiNN and EquiformerV2. No specific version numbers are provided for any software dependency.
Experiment Setup: Yes. For potential energy and force prediction, we utilize mean-absolute error (MAE) and L2-norm (L2MAE) losses with coefficients of 1 and 100, respectively. More details on the model hyperparameters are provided in our repository. For the PITC and PISC loss functions, we use the mean square error (MSE) loss based on an experiment in Section D.5. ... Training Details: For training MLIPs, we followed the setup in the Open Catalyst Project. We kept the mini-batch size consistent across all models, as shown in Table A1. ... To avoid overfitting, we stopped training when the validation loss stopped improving; the specific number of training iterations is provided in Table A2. The remaining hyper-parameters are the coefficients for the PITC and PISC losses (C_PITC, C_PISC) and the maximum magnitude ϵ_max of the perturbation vector δr; see Table A3.
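The quoted setup combines an energy MAE loss (coefficient 1) with a force L2-norm loss (coefficient 100). A hedged NumPy sketch of such a combined loss is below; the tensor shapes, function name, and the interpretation of L2MAE as the mean per-atom Euclidean norm of the force error are our assumptions, not the authors' code:

```python
import numpy as np

C_ENERGY, C_FORCE = 1.0, 100.0  # coefficients quoted in the setup

def energy_force_loss(e_pred, e_true, f_pred, f_true):
    """Combined energy/force training loss (illustrative sketch).

    e_pred, e_true: (batch,) potential energies
    f_pred, f_true: (batch, n_atoms, 3) per-atom forces
    """
    mae_energy = np.mean(np.abs(e_pred - e_true))
    # L2MAE: mean over atoms of the Euclidean norm of the force error
    l2mae_force = np.mean(np.linalg.norm(f_pred - f_true, axis=-1))
    return C_ENERGY * mae_energy + C_FORCE * l2mae_force
```

The PITC and PISC terms would then be added on top with their tuned coefficients C_PITC and C_PISC, using an MSE loss as the quoted passage states.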