Temperature-Annealed Boltzmann Generators

Authors: Henrik Schopmans, Pascal Friederich

ICML 2025

Reproducibility variable, result, and supporting LLM response:
Research Type: Experimental
"We apply this methodology to three molecular systems of increasing complexity and, compared to the baseline, achieve better results in almost all metrics while requiring up to three times fewer target energy evaluations. For the largest system, our approach is the only method that accurately resolves the metastable states of the system."
Researcher Affiliation: Academia
"1 Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany; 2 Institute of Nanotechnology, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany. Correspondence to: Pascal Friederich <EMAIL>."
Pseudocode: No
"The paper describes the methodology in prose and numbered steps within the text (e.g., 'In one annealing iteration, we perform the following steps: 1. Sample a dataset...', 'Our approach to learn the Boltzmann distribution... can be separated into two phases'), but it does not include explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures."
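The quoted annealing step ("Sample a dataset...") together with the buffer-resampling hyperparameters reported below suggests an importance-resampling move from a higher to a lower temperature. The following is a minimal, hypothetical sketch of such a step, not the authors' implementation; the function name, the kB = 1 convention, and the reweighting formula w(x) ∝ exp(-E(x)/(kB·T_low) + E(x)/(kB·T_high)) are assumptions:

```python
import math
import random

def resample_to_lower_temperature(samples, energies, t_high, t_low,
                                  n_out, kB=1.0, seed=0):
    """Importance-resample a buffer drawn at t_high so that it approximately
    targets the Boltzmann distribution at t_low (t_low < t_high).

    Sketch only: weights follow w(x) proportional to
    exp(-E(x)/(kB*t_low) + E(x)/(kB*t_high)).
    """
    rng = random.Random(seed)
    log_w = [-e / (kB * t_low) + e / (kB * t_high) for e in energies]
    m = max(log_w)  # subtract the max log-weight for numerical stability
    weights = [math.exp(lw - m) for lw in log_w]
    # Multinomial resampling with replacement down to the buffer size n_out
    return rng.choices(samples, weights=weights, k=n_out)
```

Lower-energy configurations receive larger weights when moving to a lower temperature, so they dominate the resampled buffer.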
Open Source Code: Yes
"The source code to reproduce our experiments can be found at https://github.com/aimat-lab/TA-BG."
Open Datasets: Yes
"Furthermore, the ground truth datasets from MD simulations are provided at https://doi.org/10.5281/zenodo.15526429."
Dataset Splits: Yes
"The ground truth test datasets at 300 K contain 1 × 10^7 samples, the datasets used for the forward KLD experiments contain 1 × 10^6 samples. The additional validation datasets at 300 K and 1200 K contain 1 × 10^6 samples."
Hardware Specification: Yes
"To determine the wall times, we ran 4 experiments in parallel on a compute node with 2 AMD EPYC Rome 7402 CPUs (24 cores each) and 4 NVIDIA A100 GPUs."
Software Dependencies: Yes
"All ground truth simulations have been performed with OpenMM 8.0.0 (Eastman et al., 2024)... To implement the normalizing flow models and the internal coordinate representations, we used the bgflow (Noé, 2024) and nflows (Conor Durkan et al., 2020) libraries with PyTorch (Paszke et al., 2019)."
Experiment Setup: Yes
"Table 5. Hyperparameters of the TA-BG experiments, annealing from 1200 K to 300 K. The cosine annealing learning rate scheduler is applied within each annealing iteration, so the learning rate resets at the beginning of each new annealing iteration. 'Annealing iteration' here also refers to the fine-tuning iterations with Ti+1 = Ti."

                                                DIPEPTIDE   TETRAPEPTIDE   HEXAPEPTIDE
Gradient descent steps per annealing iteration  30 000      20 000         20 000
Learning rate                                   5 × 10^-6   1 × 10^-5      5 × 10^-6
Batch size                                      2048        4096           2048
LR scheduler                                    cosine annealing (all systems)
Buffer samples drawn per annealing iteration    5 × 10^6    5 × 10^6       1 × 10^7
Buffer resampled to                             2 × 10^6    2 × 10^6       2 × 10^6