Diffusion Bridge AutoEncoders for Unsupervised Representation Learning
Authors: Yeongmin Kim, Kwanghyeon Lee, Minsang Park, Byeonghu Na, Il-chul Moon
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence demonstrates the effectiveness of the intended design in DBAE, which notably enhances downstream inference quality, reconstruction, and disentanglement. Additionally, DBAE generates high-fidelity samples in unconditional generation. Section 5 (EXPERIMENT): This section empirically validates the effectiveness of the intended design of the proposed model, DBAE. |
| Researcher Affiliation | Collaboration | Yeongmin Kim (1), Kwanghyeon Lee (1), Minsang Park (1), Byeonghu Na (1), Il-Chul Moon (1, 2); (1) Korea Advanced Institute of Science and Technology (KAIST), (2) summary.ai |
| Pseudocode | Yes | Algorithm 1: DBAE Training Algorithm for Reconstruction; Algorithm 2: Reconstruction; Algorithm 3: Latent DPM Training Algorithm; Algorithm 4: Unconditional Generation Algorithm |
| Open Source Code | Yes | Our code is available at https://github.com/aailab-kaist/DBAE. |
| Open Datasets | Yes | We evaluate Encϕ(x0) trained on CelebA (Liu et al., 2015) and FFHQ (Karras et al., 2019). We train a linear classifier on 1) CelebA with 40 binary labels, measuring accuracy as AP, and 2) LFW (Kumar et al., 2009) for attribute regression... We trained DBAE on FFHQ and evaluated it on CelebA-HQ (Karras et al., 2018). Figure 6 shows the interpolation results on the LSUN Horse, Bedroom (Yu et al., 2015) and FFHQ datasets. |
| Dataset Splits | Yes | We train a linear classifier with parameters (w, b) using data-attribute pairs (x0, y). We examine the CelebA test dataset. Table 2 reports the averaged reconstruction error over the test dataset, E_{p_test(x0)}[d(x0, x̂0)]. We randomly selected 1000 samples from the CelebA training, validation, and test sets to perform the measurement, following (Yeats et al., 2022). For Table 4, we measure FID between 50k random samples from the FFHQ dataset and 50k randomly generated samples. |
| Hardware Specification | Yes | Table 7: Computational cost comparison for FFHQ128. Training time is measured in milliseconds per image per NVIDIA A100 (ms/img/A100), and testing time is reported in milliseconds per one sampling step per NVIDIA A100 (ms/one sampling step/A100). Table 15: Regenerated results of Table 2 across multiple hardwares (SSIM ↑, LPIPS ↓, MSE ↓): NVIDIA A100: 0.953, 0.072, 2.49e-3; Intel Gaudi v2: 0.956, 0.073, 2.47e-3. We conducted evaluations across various infrastructures to assess experimental reproducibility. The performance of the trained model (DBAE-d) was evaluated on both the NVIDIA A100 and Intel Gaudi v2 chips. |
| Software Dependencies | No | Optimizer: RAdam; Optimizer: AdamW (weight decay = 0.01). The paper mentions specific optimizers but does not provide version numbers for any key software components or programming languages used (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | C.1 TRAINING CONFIGURATION. Optimization: We follow the optimization argument from DDBM (Zhou et al., 2024) with the Variance Preserving (VP) SDE. We utilize the preconditioning and time-weighting proposed in DDBM, with the pred-x parameterization (Karras et al., 2022). Table 5 shows the remaining optimization hyperparameters. Table 5 (Network architecture and training configuration of DBAE; values listed as CelebA 64 / FFHQ 128 / Horse 128 / Bedroom 128): Base channels 64 / 128 / 128 / 128; Channel multipliers [1,2,4,8] / [1,1,2,3,4] / [1,1,2,3,4] / [1,1,2,3,4]; Attention resolution [16] for all; Encoder base channels 64 / 128 / 128 / 128; Encoder attention resolution [16] for all; Encoder channel multipliers [1,2,4,8,8] / [1,1,2,3,4,4] / [1,1,2,3,4,4] / [1,1,2,3,4,4]; Latent variable z dimension 32, 256, 512 / 512 / 512 / 512; Vanilla forward SDE: VP for all; Images trained 72M, 130M / 130M / 130M / 130M; Batch size 128; Learning rate 1e-4; Optimizer RAdam; Weight decay 0.0; EMA rate 0.9999 (all datasets). |
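The linear-probe protocol quoted above (a linear classifier with parameters (w, b) on Encϕ(x0) latents, scored by average precision) can be sketched as follows. This is a minimal illustration: the synthetic latents, the least-squares fit, and all variable names are stand-ins, not the paper's actual pipeline.

```python
import numpy as np

def linear_probe(z, y):
    """Fit a least-squares linear classifier (bias via an appended ones
    column) on latent codes z and return real-valued scores for ranking."""
    Z = np.hstack([z, np.ones((len(z), 1))])
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Z @ w

def average_precision(scores, labels):
    """AP: mean precision at the rank of each positive, scores descending."""
    order = np.argsort(-scores)
    labels = labels[order]
    precision_at_k = np.cumsum(labels) / (np.arange(len(labels)) + 1)
    return float(precision_at_k[labels == 1].mean())

# Synthetic stand-in for encoder latents and one binary attribute label.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 8))       # "latent codes"
y = (z[:, 0] > 0).astype(float)     # "attribute label"
ap = average_precision(linear_probe(z, y), y)
```

On the paper's CelebA setup this probe would be repeated over all 40 binary attributes and the APs averaged.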
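The cross-hardware check in Table 15 compares SSIM, LPIPS, and MSE. MSE and a simplified single-window SSIM can be sketched in a few lines; note the reported numbers come from the standard sliding-window SSIM, and LPIPS needs a pretrained network, so both are only approximated or omitted here.

```python
import numpy as np

def mse(x, x_hat):
    """Mean squared error between an image and its reconstruction."""
    return float(np.mean((x - x_hat) ** 2))

def global_ssim(x, x_hat, data_range=1.0):
    """SSIM with a single global window, a simplification of the usual
    sliding-window SSIM used for the paper's reported numbers."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), x_hat.mean()
    cov = ((x - mu_x) * (x_hat - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (x.var() + x_hat.var() + c2)
    return float(num / den)

# A clean image and a noisy "reconstruction" as illustrative inputs.
rng = np.random.default_rng(1)
x = rng.random((64, 64))
noisy = np.clip(x + 0.05 * rng.standard_normal(x.shape), 0.0, 1.0)
```

A perfect reconstruction gives MSE 0 and SSIM 1; any degradation moves both metrics away from those ideals, which is the sanity check behind comparing the A100 and Gaudi v2 numbers.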
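The EMA rate of 0.9999 in Table 5 refers to an exponential moving average of the network weights maintained alongside training. A minimal sketch, using a dict of scalars as a stand-in for real parameter tensors:

```python
def ema_update(ema_params, params, rate=0.9999):
    """One EMA step: ema <- rate * ema + (1 - rate) * current weights."""
    for name, value in params.items():
        ema_params[name] = rate * ema_params[name] + (1.0 - rate) * value
    return ema_params

# rate=0.5 is chosen only to make the decay visible (DBAE uses 0.9999);
# with a fixed "trained" weight of 1.0 the EMA converges toward 1.0.
params = {"w": 1.0}
ema = {"w": 0.0}
for _ in range(3):
    ema = ema_update(ema, params, rate=0.5)
```

With the paper's rate of 0.9999, the averaged weights track training very slowly, which is the standard way diffusion models stabilize the checkpoint used for sampling and evaluation.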