Diffusion Bridge AutoEncoders for Unsupervised Representation Learning

Authors: Yeongmin Kim, Kwanghyeon Lee, Minsang Park, Byeonghu Na, Il-chul Moon

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evidence demonstrates the effectiveness of the intended design in DBAE, which notably enhances downstream inference quality, reconstruction, and disentanglement. Additionally, DBAE generates high-fidelity samples in unconditional generation. Section 5 (Experiment): "This section empirically validates the effectiveness of the intended design of the proposed model, DBAE."
Researcher Affiliation | Collaboration | Yeongmin Kim1, Kwanghyeon Lee1, Minsang Park1, Byeonghu Na1, Il-Chul Moon1,2; 1Korea Advanced Institute of Science and Technology (KAIST), 2summary.ai
Pseudocode | Yes | Algorithm 1: DBAE Training Algorithm for Reconstruction; Algorithm 2: Reconstruction; Algorithm 3: Latent DPM Training Algorithm; Algorithm 4: Unconditional Generation Algorithm
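The four algorithms above pair an encoder with a diffusion bridge between the data and a latent-dependent endpoint. As a rough orientation only, here is a schematic, numpy-only sketch of one reconstruction-training step; every network, shape, and the bridge interpolation below are toy stand-ins (the paper uses U-Net backbones with DDBM preconditioning and a VP bridge), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z = 16, 4  # toy data and latent dimensions (hypothetical)

# Linear stand-ins for Enc_phi and the bridge denoiser.
W_enc = rng.normal(scale=0.1, size=(d_x, d_z))
W_dec = rng.normal(scale=0.1, size=(d_x + d_z + 1, d_x))

def encode(x0):
    return np.tanh(x0 @ W_enc)  # z = Enc_phi(x0)

def denoise(x_t, t, z):
    # pred-x parameterization: the network predicts x0 directly.
    return np.concatenate([x_t, z, [t]]) @ W_dec

x0 = rng.normal(size=d_x)
z = encode(x0)
x_T = z @ W_enc.T  # z-dependent bridge endpoint (illustrative decoder)
t = rng.uniform(0.05, 0.95)
# Brownian-bridge-style interpolation between x0 and x_T (schematic, not the VP bridge).
x_t = (1.0 - t) * x0 + t * x_T + np.sqrt(t * (1.0 - t)) * rng.normal(size=d_x)
# Reconstruction-style training loss for this single sample.
loss = float(np.mean((denoise(x_t, t, z) - x0) ** 2))
```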
Open Source Code | Yes | Our code is available at https://github.com/aailab-kaist/DBAE.
Open Datasets | Yes | We evaluate Encϕ(x0) trained on CelebA (Liu et al., 2015) and FFHQ (Karras et al., 2019). We train a linear classifier on 1) CelebA with 40 binary labels, measuring accuracy as AP, and 2) LFW (Kumar et al., 2009) for attribute regression... We trained DBAE on FFHQ and evaluated it on CelebA-HQ (Karras et al., 2018). Figure 6 shows the interpolation results on the LSUN Horse, Bedroom (Yu et al., 2015), and FFHQ datasets.
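The linear-probe protocol above (a linear classifier on frozen encoder features) can be sketched as follows. This is a toy illustration with synthetic features standing in for Encϕ(x0) and a reduced latent size; the fitting method (ridge-regularized least squares on {0, 1} targets) is an assumption, not the paper's exact probe:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # toy latent size (the paper's z is up to 512-dimensional)

Z = rng.normal(size=(2000, d))             # stand-in for encoder features Enc_phi(x0)
w_star = rng.normal(size=d)
y = (Z @ w_star > 0).astype(float)         # stand-in binary attribute labels

# Linear probe with parameters (w, b): append a bias column, solve ridge LS.
Zb = np.hstack([Z, np.ones((len(Z), 1))])
wb = np.linalg.solve(Zb.T @ Zb + 1e-2 * np.eye(d + 1), Zb.T @ y)

# Threshold the linear score at 0.5 to get binary predictions.
acc = float(((Zb @ wb > 0.5) == (y > 0.5)).mean())
```

A real evaluation would fit one such probe per attribute and report average precision rather than accuracy.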
Dataset Splits | Yes | We train a linear classifier with parameters (w, b) using data-attribute pairs (x0, y). We examine the CelebA test dataset. Table 2 reports the averaged reconstruction error over the test dataset, E_{p_test(x0)}[d(x0, x̂0)]. We randomly selected 1000 samples from the CelebA training, validation, and test sets to perform the measurement, following Yeats et al. (2022). For Table 4, we measure FID between 50k random samples from the FFHQ dataset and 50k randomly generated samples.
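FID, as used for Table 4, is the Fréchet distance between Gaussians fitted to feature statistics of the real and generated sets. A minimal numpy sketch (real pipelines extract Inception features first; the 8-dimensional random features here are placeholders):

```python
import numpy as np

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet distance: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))."""
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) equals the sum of square roots of eigvals(S1 @ S2);
    # clip tiny negative/imaginary parts caused by floating-point noise.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1 + sigma2) - 2.0 * tr_sqrt)

def stats(x):
    return x.mean(axis=0), np.cov(x, rowvar=False)

rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 8))            # stand-in feature vectors
fake = rng.normal(loc=0.5, size=(1000, 8))   # shifted distribution -> nonzero FID
score = fid(*stats(real), *stats(fake))
```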
Hardware Specification | Yes | Table 7: Computational cost comparison for FFHQ128. Training time is measured in milliseconds per image per NVIDIA A100 (ms/img/A100), and testing time is reported in milliseconds per one sampling step per NVIDIA A100 (ms/one sampling step/A100). We conducted evaluations across various infrastructures to assess experimental reproducibility; the performance of the trained model (DBAE-d) was evaluated on both NVIDIA A100 and Intel Gaudi v2 chips.
Table 15: Regenerated results of Table 2 across multiple hardware platforms.
Hardware | SSIM (↑) | LPIPS (↓) | MSE (↓)
NVIDIA A100 | 0.953 | 0.072 | 2.49e-3
Intel Gaudi v2 | 0.956 | 0.073 | 2.47e-3
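The MSE column above is the averaged per-image reconstruction error E_{p_test}[d(x0, x̂0)]. A minimal sketch of that computation, with random arrays standing in for test images and their reconstructions (SSIM and LPIPS would come from libraries such as scikit-image and the lpips package, omitted here to stay dependency-free):

```python
import numpy as np

def mse(a, b):
    """Per-image mean squared error d(x0, x_hat0)."""
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
x0 = rng.uniform(size=(8, 3, 64, 64))  # stand-in test images in [0, 1]
# Stand-in reconstructions: the originals plus small Gaussian perturbations.
x_hat = np.clip(x0 + rng.normal(scale=0.05, size=x0.shape), 0.0, 1.0)

# Average over the test set, matching E_{p_test}[d(x0, x_hat0)].
avg_mse = float(np.mean([mse(a, b) for a, b in zip(x0, x_hat)]))
```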
Software Dependencies | No | "Optimizer: RAdam", "Optimizer: AdamW (weight decay = 0.01)". The paper mentions specific optimizers but does not provide version numbers for any key software components or programming languages used (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | C.1 TRAINING CONFIGURATION. Optimization: We follow the optimization argument from DDBM (Zhou et al., 2024) with Variance Preserving (VP) SDE. We utilize the preconditioning and time-weighting proposed in DDBM, with the pred-x parameterization (Karras et al., 2022). Table 5 shows the remaining optimization hyperparameters.
Table 5: Network architecture and training configuration of DBAE.
Parameter | CelebA 64 | FFHQ 128 | Horse 128 | Bedroom 128
Base channels | 64 | 128 | 128 | 128
Channel multipliers | [1,2,4,8] | [1,1,2,3,4] | [1,1,2,3,4] | [1,1,2,3,4]
Attention resolution | [16] | [16] | [16] | [16]
Encoder base ch | 64 | 128 | 128 | 128
Enc. attn. resolution | [16] | [16] | [16] | [16]
Encoder ch. mult. | [1,2,4,8,8] | [1,1,2,3,4,4] | [1,1,2,3,4,4] | [1,1,2,3,4,4]
Latent variable z dimension | 32, 256, 512 | 512 | 512 | 512
Vanilla forward SDE | VP | VP | VP | VP
Images trained | 72M, 130M | 130M | 130M | 130M
Batch size | 128 | 128 | 128 | 128
Learning rate | 1e-4 | 1e-4 | 1e-4 | 1e-4
Optimizer | RAdam | RAdam | RAdam | RAdam
Weight decay | 0.0 | 0.0 | 0.0 | 0.0
EMA rate | 0.9999 | 0.9999 | 0.9999 | 0.9999
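For convenience, the FFHQ 128 column of Table 5 can be transcribed as a config dictionary. The values are from the paper; the field names themselves are hypothetical and do not necessarily match the released code:

```python
# FFHQ 128 hyperparameters from Table 5 (field names are illustrative).
ffhq128_config = {
    "base_channels": 128,
    "channel_multipliers": [1, 1, 2, 3, 4],
    "attention_resolutions": [16],
    "encoder_base_channels": 128,
    "encoder_attn_resolutions": [16],
    "encoder_channel_multipliers": [1, 1, 2, 3, 4, 4],
    "z_dim": 512,
    "forward_sde": "VP",
    "images_trained": 130_000_000,
    "batch_size": 128,
    "learning_rate": 1e-4,
    "optimizer": "RAdam",
    "weight_decay": 0.0,
    "ema_rate": 0.9999,
}
```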