Scaling Probabilistic Circuits via Monarch Matrices

Authors: Honghua Zhang, Meihua Dang, Benjie Wang, Stefano Ermon, Nanyun Peng, Guy Van den Broeck

ICML 2025

Reproducibility Assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "In our empirical evaluation, we demonstrate that by replacing dense matrices in PCs with structured Monarch matrices, we are able to scale PCs to orders-of-magnitude larger hidden sizes and, among a variety of tractable generative models, we are able to achieve state-of-the-art density estimation performance on various benchmarks, including ImageNet32/64 (Deng et al., 2009) for image modeling and Text8 (Mahoney, 2011) and LM1B (Chelba et al., 2013) for language modeling."
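The structural idea behind the paper's claim can be illustrated concretely: a Monarch matrix factors an n x n linear map into two block-diagonal matrices interleaved with fixed permutations, so a matrix-vector product costs O(n^1.5) instead of O(n^2). The sketch below is a minimal illustrative reading of that factorization (the square b x b block layout, function names, and dense-equivalence check are our assumptions, not the authors' implementation):

```python
import numpy as np

def block_matvec(blocks, x_grid):
    # Apply the i-th b x b block to the i-th row of the (b, b) grid.
    return np.einsum('nij,nj->ni', blocks, x_grid)

def monarch_matvec(L, R, x, b):
    # Monarch-style product for n = b * b: block-diagonal R, transpose
    # permutation, block-diagonal L, inverse permutation. Cost is O(n^1.5).
    y = block_matvec(R, x.reshape(b, b))   # block-diagonal R
    z = block_matvec(L, y.T.copy())        # permute, then block-diagonal L
    return z.T.reshape(-1)                 # undo the permutation

# Illustrative check against the equivalent dense matrix.
rng = np.random.default_rng(0)
b = 4                                      # n = 16 hidden units
L, R = rng.normal(size=(2, b, b, b))       # b blocks of size b x b each
x = rng.normal(size=b * b)

# Dense equivalent: P^T @ blockdiag(L) @ P @ blockdiag(R),
# where P is the permutation that transposes the (b, b) index grid.
perm = np.arange(b * b).reshape(b, b).T.reshape(-1)
P = np.eye(b * b)[perm]

def blockdiag(blocks):
    n = b * b
    M = np.zeros((n, n))
    for i in range(b):
        M[i * b:(i + 1) * b, i * b:(i + 1) * b] = blocks[i]
    return M

dense = P.T @ blockdiag(L) @ P @ blockdiag(R)
assert np.allclose(monarch_matvec(L, R, x, b), dense @ x)
```

The dense 16 x 16 matrix needs 256 multiplies per matvec, while the Monarch form needs 2 x 4 x 16 = 128; the gap widens as n grows, which is what lets hidden sizes scale by orders of magnitude.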
Researcher Affiliation: Academia. "(1) Department of Computer Science, University of California, Los Angeles; (2) Department of Computer Science, Stanford University. Correspondence to: Honghua Zhang <EMAIL>."
Pseudocode: No. The paper describes its methods textually and with mathematical equations but contains no clearly labeled pseudocode or algorithm block, nor structured steps formatted like code.
Open Source Code: Yes. Code is available at https://github.com/wangben88/MonarchCircuits.
Open Datasets: Yes. "Text8 (Mahoney, 2011) is a character-level language modeling dataset... LM1B (Chelba et al., 2013)... ImageNet32 and ImageNet64 datasets... (Deng et al., 2009)"
Dataset Splits: Yes. "Text8 (Mahoney, 2011) is a character-level language modeling dataset... We follow the standard practice of training and evaluating text8 in chunks of length 256 without preprocessing (Hoogeboom et al., 2021). We report log-likelihood results in bits-per-character (BPC) in Table 1. The results show that Monarch-HMM outperforms all PC models and significantly narrows the gap between PC models and other less tractable generative models. We evaluate our method using generative modeling benchmarks for both text and image data. We use log-likelihoods as a measurement of a model's performance and the number of floating point operations (FLOPs) per dimension as a measurement of a model's efficiency."
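For reference, the bits-per-character metric quoted above is a direct rescaling of the model's negative log-likelihood: average the NLL over characters and convert from nats to bits by dividing by ln 2. A short sketch of that standard conversion (the function name is ours):

```python
import math

def nats_to_bpc(total_nll_nats: float, num_chars: int) -> float:
    # Bits-per-character: average negative log-likelihood per character,
    # converted from nats (natural log) to bits by dividing by ln 2.
    return total_nll_nats / (num_chars * math.log(2))

# A 256-character chunk with total NLL of 256 * ln 2 nats scores exactly 1 BPC.
print(nats_to_bpc(256 * math.log(2), 256))  # -> 1.0
```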
Hardware Specification: No. The paper mentions "GPU Mem" in Table 3 for memory-consumption analysis and acknowledges computing resources such as TAMU ACES (a computing cluster), but it does not specify exact GPU or CPU models, processor types, or other machine details used in the experiments.
Software Dependencies: No. The paper refers to the "Juice" library and the "PyJuice" PC library but does not provide version numbers for these or any other software components.
Experiment Setup: Yes. "Appendix B.1. Additional experiment setup for text8: We train Monarch-HMM using two-layer Monarch matrices and 2^19 hidden states for 20 epochs. Following prior works (Zhang et al., 2024), optimization is performed using the stochastic EM algorithm with a mini-batch size of 4096 and a linearly decaying learning rate from 1 to 0. Appendix B.3. Additional experiment setup for image modeling: We train all models using stochastic EM. In particular, we use a cosine learning-rate decay schedule over the course of the entire training. The mini-batch size M = 20000 and number of epochs E = 20 used in all experiments were chosen based on an initial hyperparameter search over M ∈ {1000, 5000, 20000, 60000} and E ∈ {5, 10, 20}."
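The two learning-rate schedules mentioned in the setup (linear decay from 1 to 0 for text8, cosine decay for image modeling) can be written as per-step functions. This is an illustrative reading of the described schedules, not the authors' training code:

```python
import math

def linear_lr(step: int, total_steps: int, lr0: float = 1.0) -> float:
    # Linear decay: lr0 at step 0, reaching 0 at the final step.
    return lr0 * (1.0 - step / total_steps)

def cosine_lr(step: int, total_steps: int, lr0: float = 1.0) -> float:
    # Cosine decay: lr0 at step 0, decaying to ~0 at the final step.
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * step / total_steps))

# Both schedules start at lr0 = 1 and anneal to 0 over training.
print(linear_lr(0, 1000), linear_lr(1000, 1000))  # -> 1.0 0.0
```

With stochastic EM, this scalar typically interpolates between the old parameters and the minibatch EM update, so decaying it to 0 freezes the model by the end of training.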