Discrete Distribution Networks
Authors: Lei Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of DDN and its intriguing properties through experiments on CIFAR-10 and FFHQ. |
| Researcher Affiliation | Industry | Lei Yang, StepFun / Megvii Technology |
| Pseudocode | Yes | Algorithm 1 Split-and-Prune of one layer |
| Open Source Code | Yes | The code is available at https://discrete-distribution-networks.github.io/ |
| Open Datasets | Yes | We demonstrate the efficacy of DDN and its intriguing properties through experiments on CIFAR-10 and FFHQ. Figure 5: Random samples from DDN. Figures (d) and (e) showcase images that are conditionally generated by conditional DDN, with each row of images representing a distinct category. Figures 5a and 5b depict random generation results of DDN on CelebA-HQ-64x64 Karras et al. (2017) and FFHQ-64x64 Karras et al. (2019) |
| Dataset Splits | No | Table 3: Fine-tuning DDN latent as decision tree on MNIST. Constructing a decision tree based on the latent variables from the DDN and fine-tuning it on the MNIST training set. We report the validation set accuracy of the decision tree after majority voting for class prediction with varying numbers of training samples: 128, 1,024, 10,000, and 50,000 (the full training set). Explanation: While the paper mentions using training, validation, and test sets for MNIST and other datasets (e.g., FFHQ-64x64 and CelebA-HQ-64x64 as test/generalization), it does not provide specific details on how these splits were created (e.g., percentages, sample counts for each split, or specific methodology for creating custom splits) to ensure reproducibility of the exact splits used. |
| Hardware Specification | Yes | We trained our models on a server equipped with 8 RTX2080Ti GPUs |
| Software Dependencies | No | DDN is implemented on the foundation of the EDM Karras et al. (2022) codebase, with training parameters nearly identical to EDM. Explanation: The paper mentions using the EDM codebase but does not specify its version or the versions of other critical software components like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We trained our models on a server equipped with 8 RTX2080Ti GPUs, setting the Chain Dropout probability to 0.05 by default. For the 64x64 resolution experiments, we utilized a DDN with 93M parameters, setting K = 512 and L = 128. In the CIFAR experiments, we employed a DDN with 74M parameters, setting K = 64 and L = 64. The MNIST experiments were conducted using a Recurrence Iteration Paradigm UNet model with 407K parameters, where K = 64 and L = 10. |
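The hyperparameters quoted in the Experiment Setup row can be collected into a small config table. This is a convenience sketch, not the authors' code: the dictionary keys and variable names are my own, and only the numbers (parameter counts, K, L, Chain Dropout probability) come from the paper.

```python
# Reported DDN training setups, transcribed from the paper's experiment
# section. Names are illustrative; values are as reported.
DDN_SETUPS = {
    "ffhq_celebahq_64x64": {"params": "93M", "K": 512, "L": 128},
    "cifar10": {"params": "74M", "K": 64, "L": 64},
    "mnist_recurrence_unet": {"params": "407K", "K": 64, "L": 10},
}

# Default Chain Dropout probability used during training.
CHAIN_DROPOUT_P = 0.05
```

Here K is the number of discrete output candidates per layer and L the number of layers, per the paper's notation.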
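The Pseudocode row points at "Algorithm 1: Split-and-Prune of one layer". The paper's stated goal for Split-and-Prune is to keep the K discrete output nodes of a layer roughly uniformly utilized. The sketch below only illustrates that counting idea: the bookkeeping, thresholds (`split_factor`, `prune_factor`), and return convention are my assumptions, not the authors' exact algorithm.

```python
def split_and_prune(counts, split_factor=2.0, prune_factor=0.5):
    """Illustrative Split-and-Prune criterion (assumed, not the paper's).

    counts: per-node tallies of how often each of the K output nodes
            was selected during recent training steps.
    Returns (split_idx, prune_idx): over-used nodes to duplicate and
    under-used nodes to remove, judged against the mean usage.
    """
    k = len(counts)
    mean = sum(counts) / k
    # Nodes selected far more often than average are candidates to split.
    split_idx = [i for i, c in enumerate(counts) if c > split_factor * mean]
    # Nodes selected far less often than average are candidates to prune.
    prune_idx = [i for i, c in enumerate(counts) if c < prune_factor * mean]
    return split_idx, prune_idx
```

For example, with `counts = [12, 1, 4, 3]` (mean 5), node 0 exceeds twice the mean and node 1 falls below half of it, so the function flags node 0 for splitting and node 1 for pruning.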