Discrete Copula Diffusion

Authors: Anji Liu, Oliver Broadrick, Mathias Niepert, Guy Van den Broeck

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate the proposed method, Discrete Copula Diffusion (DCD), on language modeling tasks (Sec. 6.1 and 6.2) and antibody sequence infilling tasks (Sec. 6.3). For all tasks, we evaluate whether DCD can effectively reduce the number of diffusion steps while maintaining strong performance. Specifically, since DCD combines two pretrained models, a discrete diffusion model and an autoregressive copula model, we examine whether DCD outperforms each individual model. Figure 3: Generative perplexity (↓) with different numbers of denoising steps. Table 1: Evaluation of text infilling performance using the MAUVE score (↑) with 5 prompt masks.
Researcher Affiliation | Academia | Anji Liu¹,², Oliver Broadrick¹, Mathias Niepert², Guy Van den Broeck¹. ¹Department of Computer Science, University of California, Los Angeles, USA; ²Institute for Artificial Intelligence, University of Stuttgart, Germany.
Pseudocode | Yes | Algorithm 1: Draw samples from a discrete diffusion model with the help of a copula model. Algorithm 2: DCD with autoregressive copula models, using autoregressive sampling.
Open Source Code | Yes | Code is available at https://github.com/liuanji/Copula-Diffusion.
Open Datasets | Yes | We first compare the quality of unconditional samples generated by models trained on either WebText (Radford et al., 2019) or OpenWebText (Gokaslan & Cohen, 2019), which contain web content extracted from URLs shared on Reddit with a minimum number of upvotes. ... We use the same set of 2,000 text sequences from the validation set of WikiText-103 (Merity et al., 2022). ... We adopt NOS-D (Gruver et al., 2023), which is a discrete diffusion model trained on 104K antibody sequences from the Observed Antibody Space dataset (Ruffolo et al., 2023).
Dataset Splits | Yes | For all methods, we use the same set of 2,000 text sequences from the validation set of WikiText-103 (Merity et al., 2022). After applying the prompt mask, we generate 5 samples for each prompt, yielding 10,000 samples in total.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using specific models such as SEDD, GPT-2-small, and NOS-D but does not provide version numbers for the software libraries, programming languages, or other dependencies that would be required for replication.
Experiment Setup | Yes | We adopt the log-linear noise schedule suggested by the SEDD paper. See Appendix G.1 for more details. ... The model is trained for 50 epochs using the default settings (e.g., learning rate and its schedule). ... The GPT model has 6 layers, an embedding size of 512, and 16 attention heads. The model is trained for 10 epochs with the default settings in the nanoGPT repository. ... We set β = 0.1 for this task.
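The setup above references SEDD's log-linear noise schedule and a mixing weight β = 0.1 balancing the diffusion model against the autoregressive copula model. The sketch below is a hedged illustration only: it implements one common log-linear parameterization of the cumulative noise and a simple convex combination of the two models' per-token logits. The function names, the exact schedule form, and the logit-averaging rule are assumptions for illustration, not the paper's exact procedure.

```python
import math

def log_linear_total_noise(t: float, eps: float = 1e-3) -> float:
    """Cumulative noise sigma_bar(t), chosen so the probability that a
    token remains unmasked, exp(-sigma_bar(t)), decays linearly from
    1 at t=0 to eps at t=1 (a common SEDD-style parameterization)."""
    assert 0.0 <= t <= 1.0
    return -math.log(1.0 - (1.0 - eps) * t)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def combine_logits(diffusion_logits, copula_logits, beta=0.1):
    """Hypothetical mixing rule: convex combination in logit space,
    with weight beta on the autoregressive (copula) model."""
    return [(1.0 - beta) * d + beta * c
            for d, c in zip(diffusion_logits, copula_logits)]

# Toy example: a 4-token vocabulary at one masked position.
diff_logits = [2.0, 0.5, -1.0, 0.0]   # from the diffusion model
cop_logits = [1.0, 1.5, 0.0, -2.0]    # from the copula (AR) model
probs = softmax(combine_logits(diff_logits, cop_logits, beta=0.1))

# Cumulative noise over an 11-point grid of diffusion times.
schedule = [log_linear_total_noise(k / 10) for k in range(11)]
```

With β = 0.1 most of the per-token evidence comes from the diffusion model's marginals, while the copula model contributes inter-token dependencies; at t = 1 the survival probability exp(-sigma_bar(1)) equals eps, i.e., almost every token is masked.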