Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry
Authors: Jannis Chemseddine, Christian Wald, Richard Duong, Gabriele Steidl
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate by numerical examples that our model provides a well-behaved flow field which successfully solves the above sampling task." From Section 4 (Experiments): "In this section we apply the different approaches to common sampling problems. We compare the performance of using the linear, learned or gradient flow interpolation." Table 1: comparison of effective sample size (ESS), negative log likelihood (NLL), and energy distance for different interpolations. Table 2: the same metrics evaluated for the 8-dimensional and 16-dimensional experiments with m = 4. |
| Researcher Affiliation | Academia | Jannis Chemseddine, Christian Wald, Richard Duong & Gabriele Steidl Institute of Mathematics TU Berlin Straße des 17. Juni 136 Berlin, Germany EMAIL |
| Pseudocode | Yes | C ALGORITHMS. Algorithm 1: Learning v_t^θ, C_t^θ in (17) for x_i sampled from trajectories. Algorithm 2: Learning f_t^θ1, v_t^θ2, C_t^θ3 in (18) for x_i sampled from trajectories. Algorithm 3: Learning ψ_t^θ1, C_t^θ2 in (19) as done in Section 4. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a direct link to a code repository. It mentions using PyTorch and torchdiffeq, but these are third-party tools, not the authors' own implementation code for the described methodology. |
| Open Datasets | No | The paper describes generating synthetic datasets for its experiments, such as a "mixture of 40 evenly weighted Gaussians in 2 dimensions" and an "8 and 16-dimensional many well distribution." No public dataset names, direct access links, DOIs, or formal citations for public datasets are provided. |
| Dataset Splits | No | The paper deals with sampling tasks and generating samples from distributions, rather than using pre-existing datasets with traditional training/validation/test splits. It describes how samples are drawn or generated for the learning process (e.g., "sampling from uniform domains", "sampling along the trajectory", "sample 4096 particles at random uniform time points"), but these are not dataset splits in the conventional sense for fixed datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory configurations. It mentions adjusting "the number of iterations such that all methods ran approximately the same time on the same hardware," but no hardware specifications are given. |
| Software Dependencies | No | The paper mentions several software components, including "Pytorch Paszke et al. (2019)", "torchdiffeq Chen (2018) package", and "Geom Loss library Feydy et al. (2019)". However, it does not specify version numbers for any of these libraries (e.g., PyTorch 1.9, torchdiffeq 0.2). |
| Experiment Setup | Yes | For the linear and learned interpolation we use 50 time steps along which the loss is computed and the gradients are accumulated with a batch size of 256. For the gradient flow interpolation we sample 4096 particles at random uniform time points and therefore do not accumulate gradients. We use a linear time schedule β(t) := β_min + t (β_max − β_min) and the associated SDE ... As done in Song et al. (2021), we choose β_min = 0.1 and β_max = 20. The target distribution consists of a mixture of 40 evenly weighted Gaussians in 2 dimensions; the means are distributed uniformly over [-40, 40]^2. We evaluate the methods by generating 5 × 10^4 samples with their log weights and computing the effective sample size, negative log likelihood, and energy distance. We report mean and standard deviation over 10 evaluation runs. |
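The setup row quotes two concrete computations: the linear noise schedule β(t) = β_min + t(β_max − β_min) with the values from Song et al. (2021), and an effective sample size evaluated from log weights. A minimal NumPy sketch of both (illustrative only, not the authors' implementation; the function names are our own):

```python
import numpy as np

# Schedule endpoints quoted in the paper, following Song et al. (2021).
BETA_MIN, BETA_MAX = 0.1, 20.0

def beta(t):
    """Linear time schedule beta(t) = beta_min + t * (beta_max - beta_min), t in [0, 1]."""
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def effective_sample_size(log_w):
    """ESS = (sum w)^2 / sum w^2, computed stably from log weights."""
    log_w = np.asarray(log_w) - np.max(log_w)  # shift so the largest weight is 1
    w = np.exp(log_w)
    return float(w.sum() ** 2 / (w ** 2).sum())

# With equal log weights the ESS equals the sample count
# (5 * 10^4 samples per evaluation run, as in the paper):
print(effective_sample_size(np.zeros(50_000)))  # -> 50000.0
```

The max-shift inside `effective_sample_size` is the standard trick to avoid overflow when exponentiating large log weights; it cancels in the ratio.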