Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions

Authors: Dongze Wu, Yao Xie

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we present numerical experiments comparing Annealing Flow (AF) with the following methods... We demonstrate the superior performance of AF compared to state-of-the-art methods through experiments on various challenging distributions and real-world datasets, particularly in high-dimensional and multi-modal settings. ... The number of time steps for Annealing Flow (AF) is set per distribution as follows: (1) GMMs: 12 steps... (2) Truncated Normal: 8 steps... (3) Funnel: 8 steps... (4) Exp-Weighted Gaussian: 20 steps... (5) Bayesian Logistic Regression: 6 steps... Figures 3 and 4... Tables 1, 4, and 5... ablation studies are thoroughly presented in Section 6 and Appendix D.
Researcher Affiliation Academia H. Milton Stewart School of Industrial and Systems Engineering (ISyE), Georgia Institute of Technology, USA. Correspondence to: Yao Xie <EMAIL>.
Pseudocode Yes Algorithm 1: Block-wise Training of Annealing Flow Net; Algorithm 2: Metropolis-Hastings Algorithm; Algorithm 3: Parallel Tempering Algorithm.
Open Source Code Yes Our code is publicly available at https://github.com/StatFusion/Annealing-Flow-For-Sampling.
Open Datasets Yes Bayesian logistic regression: We use the same Bayesian logistic regression setting as in Liu & Wang (2016), where a hierarchical structure is assigned to the model parameters. ... The datasets used are binary, where x_i has a varying number of features, and y_i ∈ {+1, −1} across different datasets. ... across a range of datasets provided by LIBSVM.
Dataset Splits No During testing, we use all algorithms to sample 1,000 particles of β and α jointly, and use {β^(i)}_{i=1}^{1000} to construct 1,000 classifiers. The mean accuracy and standard deviation are then reported in Table 3. Additionally, the average log posterior in Table 3 is reported as (1/|D_test|) Σ_{(x,y)∈D_test} log (1/C) Σ_θ p(y|x, θ). While a 'D_test' is mentioned, no specific details about the splitting methodology (percentages, counts, or reference to standard splits) are provided for the datasets used in the Bayesian logistic regression.
Hardware Specification Yes Table 9 presents the training and sampling times for AF, CRAFT, LFIS, and PGPS in experiments on a 50D Exp-Weighted Gaussian distribution, conducted on a V100 GPU.
Software Dependencies No The Adam optimizer is used with a learning rate of 0.0001. The Wasserstein distance is then computed using the optimal transport plan via the linear sum assignment method (from scipy.optimize package). Specific version numbers for Adam, scipy, or any other software packages are not provided.
Experiment Setup Yes The neural network structure in our experiments is consistently set with hidden layers of size 32-32. ... We sample 100,000 data points from N(0, I_d) for training, with a batch size of 1,000. The Adam optimizer is used with a learning rate of 0.0001, and the maximum number of iterations for each block v_k is set to 1,000. ... The number of time steps for Annealing Flow (AF) is set per distribution as follows: (1) GMMs: 12 steps... (2) Truncated Normal: 8 steps... (3) Funnel: 8 steps... (4) Exp-Weighted Gaussian: 20 steps... (5) Bayesian Logistic Regression: 6 steps. The choice of α: In the experiments on Gaussian Mixture Models (GMMs), funnel distributions, truncated normal, and Bayesian Logistic Regression, α is uniformly set to [8/3, ...]. In the experiments on Exp-Weighted Gaussian, α is set to [20/3, 1, 1, 1, 1, 1, 1, 1, 1].
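The evaluation protocol quoted under Dataset Splits (sampling C classifiers {β^(i)}, then reporting mean accuracy and an average log posterior over a held-out set) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name is ours, the plain logistic likelihood p(y | x, β) = σ(y · xᵀβ) is an assumption, and the hierarchical α parameter of the paper's model is omitted for brevity.

```python
import numpy as np

def mean_accuracy_and_log_posterior(betas, X_test, y_test):
    """Evaluate C sampled classifiers {beta^(i)} on held-out data.

    betas:  (C, d) array of sampled weight vectors
    X_test: (n, d) features; y_test: (n,) labels in {+1, -1}

    Returns (mean accuracy, accuracy std, average log posterior),
    where the average log posterior is
        (1/n) * sum_{(x,y)} log[ (1/C) * sum_i p(y | x, beta^(i)) ].
    """
    logits = X_test @ betas.T                         # (n, C) scores
    preds = np.sign(logits)                           # each classifier's label
    accs = (preds == y_test[:, None]).mean(axis=0)    # per-classifier accuracy
    # p(y | x, beta) for labels in {+1, -1} under a logistic likelihood
    probs = 1.0 / (1.0 + np.exp(-y_test[:, None] * logits))
    avg_log_post = np.mean(np.log(probs.mean(axis=1)))
    return accs.mean(), accs.std(), avg_log_post
```

With 1,000 sampled β's this matches the reported protocol of constructing 1,000 classifiers and summarizing them by mean and standard deviation.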
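The Wasserstein-distance evaluation noted under Software Dependencies (an optimal transport plan via the linear sum assignment method from scipy.optimize) can be sketched as below. This is a hedged reconstruction, assuming equal-size particle sets and a Euclidean cost raised to the power p; the helper name is ours, not the paper's.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def wasserstein_via_assignment(samples_a, samples_b, p=2):
    """p-Wasserstein-style distance between two equal-size particle sets.

    The optimal one-to-one matching (a permutation transport plan) is found
    with the Hungarian algorithm via scipy's linear_sum_assignment, then the
    matched costs are averaged and the p-th root taken.
    """
    cost = cdist(samples_a, samples_b) ** p      # pairwise |x - y|^p costs
    rows, cols = linear_sum_assignment(cost)     # optimal matching
    return cost[rows, cols].mean() ** (1.0 / p)  # (avg matched cost)^(1/p)
```

For the 1,000-particle sets used in the experiments, the 1000×1000 assignment solves quickly on CPU; for much larger sets an entropic-OT solver would be the usual substitute for exact assignment.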