Discrete Distribution Networks

Authors: Lei Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficacy of DDN and its intriguing properties through experiments on CIFAR-10 and FFHQ.
Researcher Affiliation | Industry | Lei Yang (StepFun; Megvii Technology)
Pseudocode | Yes | Algorithm 1: Split-and-Prune of one layer
Open Source Code | Yes | The code is available at https://discrete-distribution-networks.github.io/
Open Datasets | Yes | We demonstrate the efficacy of DDN and its intriguing properties through experiments on CIFAR-10 and FFHQ. Figure 5: Random samples from DDN. Figures (d) and (e) showcase images that are conditionally generated by conditional DDN, with each row of images representing a distinct category. Figures 5a and 5b depict random generation results of DDN on CelebA-HQ-64x64 Karras et al. (2017) and FFHQ-64x64 Karras et al. (2019)
Dataset Splits | No | Table 3: Fine-tuning DDN latent as decision tree on MNIST. Constructing a decision tree based on the latent variables from the DDN and fine-tuning it on the MNIST training set. We report the validation set accuracy of the decision tree after majority voting for class prediction with varying numbers of training samples: 128, 1,024, 10,000, and 50,000 (the full training set). Explanation: While the paper mentions using training, validation, and test sets for MNIST and other datasets (e.g., FFHQ-64x64 and CelebA-HQ-64x64 as test/generalization), it does not provide specific details on how these splits were created (e.g., percentages, sample counts for each split, or specific methodology for creating custom splits) to ensure reproducibility of the exact splits used.
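The majority-voting step mentioned in this row can be illustrated with a loose sketch. This is my own illustration, not the paper's code: it assumes the DDN latent is a sequence of integer node indices per layer, buckets training samples by latent prefix, and lets each bucket predict its majority class.

```python
from collections import Counter, defaultdict

def majority_vote_tree(latents, labels, depth):
    """Map each latent prefix of length `depth` to its majority class label.

    latents: iterable of integer sequences (one DDN node index per layer).
    labels:  class labels aligned with `latents`.
    """
    buckets = defaultdict(list)
    for z, y in zip(latents, labels):
        buckets[tuple(z[:depth])].append(y)
    # Each leaf (latent prefix) predicts the most common label among its samples.
    return {k: Counter(v).most_common(1)[0][0] for k, v in buckets.items()}

def predict(tree, z, depth, default=0):
    """Classify a sample by looking up its latent prefix; fall back on `default`."""
    return tree.get(tuple(z[:depth]), default)
```

For example, with latents `[[0, 1], [0, 1], [0, 2]]` and labels `[1, 1, 0]` at depth 2, the prefix `(0, 1)` predicts class 1 by majority vote.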
Hardware Specification | Yes | We trained our models on a server equipped with 8 RTX2080Ti GPUs
Software Dependencies | No | DDN is implemented on the foundation of the EDM Karras et al. (2022) codebase, with training parameters nearly identical to EDM. Explanation: The paper mentions using the EDM codebase but does not specify its version or the versions of other critical software components like Python, PyTorch, or CUDA.
Experiment Setup | Yes | We trained our models on a server equipped with 8 RTX2080Ti GPUs, setting the Chain Dropout probability to 0.05 by default. For the 64x64 resolution experiments, we utilized a DDN with 93M parameters, setting K = 512 and L = 128. In the CIFAR experiments, we employed a DDN with 74M parameters, setting K = 64 and L = 64. The MNIST experiments were conducted using a Recurrence Iteration Paradigm UNet model with 407K parameters, where K = 64 and L = 10.
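The three reported configurations can be collected into a small config sketch. The dataclass and field names here are my own; the paper specifies only the parameter counts, K (outputs per layer), L (number of layers), and the Chain Dropout probability of 0.05.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DDNConfig:
    # Values quoted from the paper; class and field names are illustrative.
    params: str           # approximate model parameter count
    K: int                # number of discrete output nodes per layer
    L: int                # number of layers
    chain_dropout: float  # Chain Dropout probability (0.05 by default)

CONFIGS = {
    "64x64": DDNConfig(params="93M", K=512, L=128, chain_dropout=0.05),
    "cifar10": DDNConfig(params="74M", K=64, L=64, chain_dropout=0.05),
    "mnist": DDNConfig(params="407K", K=64, L=10, chain_dropout=0.05),
}
```

Keeping the settings in one structure makes it easy to see that only K and L vary across datasets while the Chain Dropout probability stays fixed.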