Efficient and Unbiased Sampling from Boltzmann Distributions via Variance-Tuned Diffusion Models

Authors: Fengzhe Zhang, Laurence Illing Midgley, José Miguel Hernández-Lobato

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On the DW-4, LJ-13, and alanine-dipeptide benchmarks, VT-DIS achieves effective sample sizes of approximately 80%, 35%, and 3.5%, respectively, while using only a fraction of the computational budget required by vanilla diffusion + IS or PF-ODE-based IS. Our code is available at https://github.com/fz920/cov_tuned_diffusion."
Researcher Affiliation | Collaboration | Fengzhe Zhang (EMAIL), University of Cambridge; Laurence I. Midgley (EMAIL), Ångström AI and University of Cambridge; José Miguel Hernández-Lobato (EMAIL), Ångström AI and University of Cambridge
Pseudocode | Yes | Algorithm 1: Post-training Covariance Tuning (VT-DIS). Require: score network s_θ, initial covariances {Σ_ϕ}, time grid {t_n}_{n=0}^{N}, batch size M. Ensure: tuned covariances {Σ_ϕ}
Open Source Code | Yes | "Our code is available at https://github.com/fz920/cov_tuned_diffusion."
Open Datasets | Yes | "DW-4 and LJ-13: we use the 10^7 configurations released by Klein et al. (2023); a subset of 10^5 configurations trains the score model and VT-DIS, and the remainder forms the test set. Alanine dipeptide: we adopt the molecular dataset of Midgley et al. (2023), which provides 10^5 training samples and 10^6 test samples."
Dataset Splits | Yes | "DW-4 and LJ-13: we use the 10^7 configurations released by Klein et al. (2023); a subset of 10^5 configurations trains the score model and VT-DIS, and the remainder forms the test set. Alanine dipeptide: we adopt the molecular dataset of Midgley et al. (2023), which provides 10^5 training samples and 10^6 test samples."
Hardware Specification | Yes | "All experiments were run on a single NVIDIA A100 GPU (80 GB); hence every GPU hour refers to wall-clock hours on that device."
Software Dependencies | No | The paper mentions software components such as the Adam optimizer and a cosine annealing schedule but does not provide version numbers for these or for key software libraries (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | "GMM-2: ... training runs for 100,000 iterations with a batch size of 1,024 for both d = 50 and d = 100. ... All models are trained for 100,000 iterations with a batch size of 512 using the Adam optimizer, a learning rate of 0.001, and a cosine annealing schedule with a minimum learning rate of 1e-6. ... VT-DIS post-training: ... each run comprises 5,000 optimization steps with a batch size of 512 (256 for alanine dipeptide). All runs use the Adam optimizer with a learning rate of 0.01 and a cosine annealing schedule with a minimum learning rate of 1e-6."
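
The effective-sample-size percentages quoted under Research Type are the standard ESS of self-normalized importance weights. A minimal sketch of that estimator (not the paper's code; the function name `ess_fraction` is ours), computed in log space for numerical stability:

```python
import math

def ess_fraction(log_weights):
    """ESS / n from unnormalized log importance weights.

    ESS = (sum w)^2 / sum w^2; the max log-weight is subtracted
    before exponentiating so large magnitudes do not overflow.
    """
    m = max(log_weights)
    s1 = sum(math.exp(lw - m) for lw in log_weights)        # sum of weights
    s2 = sum(math.exp(2.0 * (lw - m)) for lw in log_weights)  # sum of squared weights
    return (s1 * s1) / (len(log_weights) * s2)
```

Uniform weights give an ESS fraction of 1.0; a single dominant weight among n samples drives it toward 1/n, which is why low-variance weights (the VT-DIS objective) translate directly into the high ESS figures reported.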
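
Algorithm 1 tunes the per-timestep covariances after score training so that the resulting sampler's importance weights have low variance. A toy, self-contained 1-D analogue of that idea (not the paper's algorithm: VT-DIS tunes full per-timestep covariances with Adam, whereas here a single scalar proposal std is grid-searched against a known Gaussian target; `tune_proposal_std` and all its parameters are illustrative):

```python
import math
import random

def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def tune_proposal_std(target_std=1.0, candidates=(0.5, 0.8, 1.0, 1.3, 2.0),
                      n=4000, seed=0):
    """Pick the proposal std whose log importance weights
    log p(x) - log q(x), x ~ q, have the smallest empirical variance.

    Mirrors the VT-DIS objective (low-variance IS weights) in the
    simplest possible setting: both target and proposal are 1-D Gaussians.
    """
    rng = random.Random(seed)
    best_std, best_var = None, float("inf")
    for s in candidates:
        xs = [rng.gauss(0.0, s) for _ in range(n)]
        lw = [log_normal_pdf(x, 0.0, target_std) - log_normal_pdf(x, 0.0, s)
              for x in xs]
        mean = sum(lw) / n
        var = sum((v - mean) ** 2 for v in lw) / n
        if var < best_var:
            best_std, best_var = s, var
    return best_std
```

With the target std equal to 1.0, the weight variance vanishes exactly at the matching proposal, so the search recovers it; in the paper the same variance criterion is minimized by gradient descent over the reverse-kernel covariances at every time step t_n.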
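
The cosine-annealed learning-rate schedule quoted in the setup has a simple closed form; a minimal sketch (the helper name and signature are ours — in practice frameworks such as PyTorch ship an equivalent `CosineAnnealingLR` scheduler):

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max=1e-3, lr_min=1e-6):
    """Learning rate at `step` under cosine annealing from lr_max to lr_min.

    Defaults match the score-model training quoted above
    (lr 0.001 decayed to 1e-6 over 100,000 iterations);
    the VT-DIS post-training runs would use lr_max=0.01 over 5,000 steps.
    """
    t = min(step, total_steps) / total_steps  # progress in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```

The schedule starts at lr_max, reaches the midpoint of the two rates halfway through, and ends exactly at lr_min.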