Underdamped Diffusion Bridges with Applications to Sampling
Authors: Denis Blessing, Julius Berner, Lorenz Richter, Gerhard Neumann
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive numerical experiments, we can demonstrate significantly improved performance of our method. … Numerical integrators and ablation studies: We provide careful ablation studies of our improvements, including the benefits of the novel integrators for controlled diffusion bridges as well as end-to-end training of hyperparameters (Section 4). … 4 NUMERICAL EXPERIMENTS: In this section, we present a comparative analysis of underdamped approaches against their overdamped counterparts. … Table 1: Results for benchmark problems of various dimensions d, averaged across four runs. Evaluation criteria include importance-weighted errors for estimating the log-normalizing constant log Z, effective sample size (ESS), Sinkhorn distance W_2^γ, and a lower bound (LB) on log Z; see Appendix A.10.4 for details on the metrics. |
| Researcher Affiliation | Collaboration | 1Karlsruhe Institute of Technology, 2NVIDIA, 3Zuse Institute Berlin, 4dida Datenschmiede GmbH, 5FZI Research Center for Information Technology |
| Pseudocode | Yes | We refer to Algorithm 1 in the appendix for an overview of our method and to Appendix A.10 for further details. In Algorithm 1 we display the training process of our underdamped diffusion bridge sampler. |
| Open Source Code | Yes | The code is publicly available (footnote 8): https://github.com/DenisBless/UnderdampedDiffusionBridges |
| Open Datasets | Yes | Real-world benchmark problems. We consider seven real-world benchmark problems: Four Bayesian inference tasks, namely Credit (d = 25), Cancer (d = 31), Ionosphere (d = 35), and Sonar (d = 61). Additionally, we choose Seeds (d = 26) and Brownian (d = 32), where the goal is to perform inference over the parameters of a random effect regression model, and the time discretization of a Brownian motion, respectively. Lastly, we consider LGCP (d = 1600), a high-dimensional Log Gaussian Cox process (Møller et al., 1998). Synthetic benchmark problems. We consider two synthetic benchmark problems in this work: The challenging Funnel distribution (d = 10) introduced by Neal (2003), whose shape resembles a funnel, where one part is tight and highly concentrated, while the other is spread out over a wide region. Moreover, we choose the Many Well (d = 50) target, a highly multi-modal distribution with 2⁵ = 32 modes. … (see also Wu et al. (2020)). |
| Dataset Splits | No | The paper refers to 'benchmark problems' and 'sampling tasks' and mentions an 'evaluation protocol' from prior work (Blessing et al., 2024), but it does not explicitly describe any training/test/validation dataset splits with percentages, sample counts, or specific methodologies for partitioning data. The context implies evaluating sampling methods on target distributions rather than traditional supervised learning splits. |
| Hardware Specification | No | The paper mentions support from 'bwHPC' and the 'HoreKa supercomputer' in the acknowledgements, but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. The text refers to general high-performance computing resources without specifications. |
| Software Dependencies | No | The paper states: 'All experiments are conducted using the Jax library (Bradbury et al., 2021).' and 'We use the Adam optimizer (Kingma & Ba, 2015)'. It also mentions 'the Jax ott library (Cuturi et al., 2022)'. While specific libraries are named, their corresponding version numbers are not explicitly provided, which is required for a reproducible description of software dependencies. |
| Experiment Setup | Yes | Our default experimental setup, unless specified otherwise, is as follows: We use a batch size of 2000 (halved if memory-constrained) and train for 140k gradient steps to ensure approximate convergence. We use the Adam optimizer (Kingma & Ba, 2015), gradient clipping with a value of 1, and a learning rate scheduler that starts at 5·10⁻³ and uses a cosine decay starting at 60k gradient steps. We utilized 128 discretization steps and the EM and OBABO schemes to integrate the overdamped and underdamped Langevin equations, respectively. The control functions uθ and vγ with parameters θ and γ, respectively, were parameterized as two-layer neural networks with 128 neurons. Unlike Zhang & Chen (2022), we did not include the score of the target density as part of the parameterized control functions uθ and vγ. Inspired by Nichol & Dhariwal (2021), we applied a cosine-square scheduler for the discretization step size, i.e., δ_n = a·cos²(π/2 · n/N) at step n, where a : [0, ∞) → (0, ∞) is learned. Furthermore, we use preconditioning as explained in Appendix A.10.2. The diffusion matrix σ and the mass matrix M were parameterized as diagonal matrices, and we learned the parameters µ and Σ for the prior distribution pprior = N(µ, Σ), with Σ also set as a diagonal matrix. We enforced non-negativity of a and made σ, M, and Σ positive semidefinite via an element-wise softplus transformation. |
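The quoted setup combines two details that are easy to misread: the cosine-square schedule δ_n = a·cos²(π/2 · n/N) for the discretization step size, and the softplus transformation that keeps the learned scale a positive. The minimal sketch below illustrates how these two pieces fit together; it is not the authors' released code, and the names `raw_a` and `step_size` are hypothetical:

```python
import math


def softplus(x: float) -> float:
    # Element-wise softplus enforces non-negativity of learned scales,
    # as the paper does for a (and for the diagonals of sigma, M, Sigma).
    return math.log1p(math.exp(x))


def step_size(n: int, N: int, raw_a: float) -> float:
    """Cosine-square discretization step-size schedule:
        delta_n = a * cos^2(pi/2 * n/N),  a = softplus(raw_a) > 0.

    `raw_a` is an unconstrained (learnable) parameter; with N = 128
    discretization steps, delta_n shrinks smoothly from a at n = 0
    toward 0 at n = N.
    """
    a = softplus(raw_a)
    return a * math.cos(math.pi / 2 * n / N) ** 2
```

In a JAX implementation (as used in the paper's experiments), `raw_a` would simply be part of the trainable parameter pytree, so gradients flow through the softplus into the schedule during end-to-end training of hyperparameters.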