Normalizing Flows are Capable Generative Models
Authors: Shuangfei Zhai, Ruixiang Zhang, Preetum Nakkiran, David Berthelot, Jiatao Gu, Huangjie Zheng, Tianrong Chen, Miguel Ángel Bautista, Navdeep Jaitly, Joshua M. Susskind
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 3 (Experiments): We perform our experiments on unconditional ImageNet 64x64 (van den Oord et al., 2016b), as well as class conditional ImageNet 64x64, ImageNet 128x128 (Deng et al., 2009) and AFHQ 256x256 (Choi et al., 2020). |
| Researcher Affiliation | Industry | Apple. Correspondence to: Shuangfei Zhai <EMAIL>. |
| Pseudocode | No | The paper describes mathematical formulations and steps in prose, for example, Equation 3, Equation 4 and Equation 8, but does not contain a clearly labeled "Pseudocode" or "Algorithm" block, nor a structured procedure formatted like code. |
| Open Source Code | Yes | We make our code available at https://github.com/apple/mltarflow. |
| Open Datasets | Yes | We perform our experiments on unconditional ImageNet 64x64 (van den Oord et al., 2016b), as well as class conditional ImageNet 64x64, ImageNet 128x128 (Deng et al., 2009) and AFHQ 256x256 (Choi et al., 2020). |
| Dataset Splits | Yes | We perform our experiments on unconditional ImageNet 64x64 (van den Oord et al., 2016b), as well as class conditional ImageNet 64x64, ImageNet 128x128 (Deng et al., 2009) and AFHQ 256x256 (Choi et al., 2020). For each setting, we randomly generate 50K samples, and compare it with the statistics from the entire training set. |
| Hardware Specification | Yes | Our models are implemented with PyTorch, and our experiments are conducted on A100 GPUs. |
| Software Dependencies | No | Our models are implemented with PyTorch, and our experiments are conducted on A100 GPUs. We by default cast the model to bfloat16, which provides significant memory savings, with the exception of the likelihood task where we found that float32 is necessary to avoid numerical issues. The paper mentions only "PyTorch" without a specific version number. |
| Experiment Setup | Yes | All parameters are trained end-to-end with the AdamW optimizer with momentum (0.9, 0.95). We use a cosine learning rate schedule, where the learning rate is warmed up from 10^-6 to 10^-4 for one epoch, then decayed to 10^-6. We use a small weight decay of 10^-4 to stabilize training. We adopt a simple data preprocessing protocol, where we center crop images and linearly rescale the pixels to [-1, 1]. Table 7. Hyperparameters for the best performing model on each task. |
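The learning rate schedule quoted in the Experiment Setup row (linear warmup from 10^-6 to 10^-4 over one epoch, then cosine decay back to 10^-6) can be sketched as a plain step-indexed function. This is a minimal illustration of the schedule shape, not the authors' code; the function name, the step-based granularity, and the `total_steps` parameter are assumptions.

```python
import math

def lr_at_step(step, warmup_steps, total_steps,
               lr_min=1e-6, lr_peak=1e-4):
    """Cosine schedule with linear warmup, matching the paper's
    description: warm up from lr_min to lr_peak, then cosine-decay
    back to lr_min by the end of training."""
    if step < warmup_steps:
        # linear warmup from lr_min to lr_peak
        return lr_min + (lr_peak - lr_min) * step / warmup_steps
    # cosine decay from lr_peak down to lr_min
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_peak - lr_min) * (1.0 + math.cos(math.pi * progress))
```

At step 0 this returns 10^-6, at the end of warmup it reaches 10^-4, and at the final step it has decayed back to 10^-6. In PyTorch the same shape is typically obtained by composing a warmup scheduler with `torch.optim.lr_scheduler.CosineAnnealingLR`.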