Scaling Probabilistic Circuits via Monarch Matrices

Authors: Honghua Zhang, Meihua Dang, Benjie Wang, Stefano Ermon, Nanyun Peng, Guy Van den Broeck

ICML 2025

Reproducibility Assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "In our empirical evaluation, we demonstrate that by replacing dense matrices in PCs with structured Monarch matrices, we are able to scale PCs to orders-of-magnitude larger hidden sizes and, among a variety of tractable generative models, we are able to achieve state-of-the-art density estimation performance on various benchmarks, including ImageNet32/64 (Deng et al., 2009) for image modeling and Text8 (Mahoney, 2011) and LM1B (Chelba et al., 2013) for language modeling."
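The structural idea behind the paper's claim can be illustrated concretely: a Monarch matrix factors an n x n linear map into two block-diagonal matrices interleaved with fixed permutations, so a matrix-vector product costs O(n^1.5) instead of O(n^2). The sketch below is a minimal illustrative reading of that factorization (the square b x b block layout, function names, and dense-equivalence check are our assumptions, not the authors' implementation):

```python
import numpy as np

def block_matvec(blocks, x_grid):
    # Apply the i-th b x b block to the i-th row of the (b, b) grid.
    return np.einsum('nij,nj->ni', blocks, x_grid)

def monarch_matvec(L, R, x, b):
    # Monarch-style product for n = b * b: block-diagonal R, transpose
    # permutation, block-diagonal L, inverse permutation. Cost is O(n^1.5).
    y = block_matvec(R, x.reshape(b, b))   # block-diagonal R
    z = block_matvec(L, y.T.copy())        # permute, then block-diagonal L
    return z.T.reshape(-1)                 # undo the permutation

# Illustrative check against the equivalent dense matrix.
rng = np.random.default_rng(0)
b = 4                                      # n = 16 hidden units
L, R = rng.normal(size=(2, b, b, b))       # b blocks of size b x b each
x = rng.normal(size=b * b)

# Dense equivalent: P^T @ blockdiag(L) @ P @ blockdiag(R),
# where P is the permutation that transposes the (b, b) index grid.
perm = np.arange(b * b).reshape(b, b).T.reshape(-1)
P = np.eye(b * b)[perm]

def blockdiag(blocks):
    n = b * b
    M = np.zeros((n, n))
    for i in range(b):
        M[i * b:(i + 1) * b, i * b:(i + 1) * b] = blocks[i]
    return M

dense = P.T @ blockdiag(L) @ P @ blockdiag(R)
assert np.allclose(monarch_matvec(L, R, x, b), dense @ x)
```

The dense 16 x 16 matrix needs 256 multiplies per matvec, while the Monarch form needs 2 x 4 x 16 = 128; the gap widens as n grows, which is what lets hidden sizes scale by orders of magnitude.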
Researcher Affiliation: Academia. "(1) Department of Computer Science, University of California, Los Angeles; (2) Department of Computer Science, Stanford University. Correspondence to: Honghua Zhang <EMAIL>."
Pseudocode: No. The paper describes its methods textually and with mathematical equations but contains no clearly labeled pseudocode or algorithm block, nor structured steps formatted like code.
Open Source Code: Yes. Code is available at https://github.com/wangben88/MonarchCircuits.
Open Datasets: Yes. "Text8 (Mahoney, 2011) is a character-level language modeling dataset... LM1B (Chelba et al., 2013)... ImageNet32 and ImageNet64 datasets... (Deng et al., 2009)"
Dataset Splits: Yes. "Text8 (Mahoney, 2011) is a character-level language modeling dataset... We follow the standard practice of training and evaluating text8 in chunks of length 256 without preprocessing (Hoogeboom et al., 2021). We report log-likelihood results in bits-per-character (BPC) in Table 1. The results show that Monarch-HMM outperforms all PC models and significantly narrows the gap between PC models and other less tractable generative models. We evaluate our method using generative modeling benchmarks for both text and image data. We use log-likelihoods as a measurement of a model's performance and the number of floating point operations (FLOPs) per dimension as a measurement of a model's efficiency."
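For reference, the bits-per-character metric quoted above is a direct rescaling of the model's negative log-likelihood: average the NLL over characters and convert from nats to bits by dividing by ln 2. A short sketch of that standard conversion (the function name is ours):

```python
import math

def nats_to_bpc(total_nll_nats: float, num_chars: int) -> float:
    # Bits-per-character: average negative log-likelihood per character,
    # converted from nats (natural log) to bits by dividing by ln 2.
    return total_nll_nats / (num_chars * math.log(2))

# A 256-character chunk with total NLL of 256 * ln 2 nats scores exactly 1 BPC.
print(nats_to_bpc(256 * math.log(2), 256))  # -> 1.0
```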
Hardware Specification: No. The paper mentions "GPU Mem" in Table 3 for memory-consumption analysis and acknowledges computing resources such as TAMU ACES (a computing cluster), but it does not specify exact GPU or CPU models, processor types, or other machine details used in the experiments.
Software Dependencies: No. The paper refers to the "Juice" library and the "PyJuice" PC library but does not provide version numbers for these or any other software components.
Experiment Setup: Yes. "Appendix B.1. Additional experiment setup for text8: We train Monarch-HMM using two-layer Monarch matrices and 2^19 hidden states for 20 epochs. Following prior works (Zhang et al., 2024), optimization is performed using the stochastic EM algorithm with a mini-batch size of 4096 and a linearly decaying learning rate from 1 to 0. Appendix B.3. Additional experiment setup for image modeling: We train all models using stochastic EM. In particular, we use a cosine learning-rate decay schedule over the course of the entire training. The mini-batch size M = 20000 and number of epochs E = 20 used in all experiments were chosen based on an initial hyperparameter search over M ∈ {1000, 5000, 20000, 60000} and E ∈ {5, 10, 20}."
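The two learning-rate schedules mentioned in the setup (linear decay from 1 to 0 for text8, cosine decay for image modeling) can be written as per-step functions. This is an illustrative reading of the described schedules, not the authors' training code:

```python
import math

def linear_lr(step: int, total_steps: int, lr0: float = 1.0) -> float:
    # Linear decay: lr0 at step 0, reaching 0 at the final step.
    return lr0 * (1.0 - step / total_steps)

def cosine_lr(step: int, total_steps: int, lr0: float = 1.0) -> float:
    # Cosine decay: lr0 at step 0, decaying to ~0 at the final step.
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * step / total_steps))

# Both schedules start at lr0 = 1 and anneal to 0 over training.
print(linear_lr(0, 1000), linear_lr(1000, 1000))  # -> 1.0 0.0
```

With stochastic EM, this scalar typically interpolates between the old parameters and the minibatch EM update, so decaying it to 0 freezes the model by the end of training.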