Tractable Representation Learning with Probabilistic Circuits

Authors: Steven Braun, Sahil Sidheekh, Antonio Vergari, Martin Mundt, Sriraam Natarajan, Kristian Kersting

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluation demonstrates that APCs outperform existing PC-based autoencoding methods in reconstruction quality, generate embeddings competitive with, and exhibit superior robustness in handling missing data compared to, neural autoencoders.
Researcher Affiliation | Academia | Steven Braun (EMAIL), Technische Universität Darmstadt, Germany; Sahil Sidheekh (EMAIL), University of Texas at Dallas, United States; Antonio Vergari (EMAIL), University of Edinburgh, United Kingdom; Martin Mundt (EMAIL), Universität Bremen, Germany; Sriraam Natarajan (EMAIL), University of Texas at Dallas, United States; Kristian Kersting (EMAIL), Technische Universität Darmstadt, Germany
Pseudocode | Yes | Appendix Algorithm 1 outlines the process of data-free knowledge distillation from a VAE to an APC of similar capacity and the same decoder architecture. The procedure iteratively refines the APC to learn from the VAE without the need for original training data. Algorithm 1: APC Encoding Procedure. Algorithm 2: Data-Free Knowledge Distillation from VAE to APC. Algorithm 3: Sampling sum units with SIMPLE.
Open Source Code | Yes | Our implementation is available as open-source software at https://github.com/ml-research/autoencoding-probabilistic-circuits.
Open Datasets | Yes | We evaluate our models on various image and tabular datasets. For image data, we include MNIST (LeCun et al., 1998), Fashion-MNIST (F-MNIST) (Xiao et al., 2017), CIFAR-10 (Krizhevsky, 2009), CelebA (Liu et al., 2015), SVHN (Netzer et al., 2011), Flowers (Gurnani et al., 2017), LSUN (Church) (Yu et al., 2015), and Tiny-ImageNet (Deng et al., 2009). Note that Tiny-ImageNet is occasionally abbreviated as ImageNet in our tables for brevity. For tabular data, we utilize the 20 datasets from the binary density estimation benchmark DEBD (Lowd & Davis, 2010; Haaren & Davis, 2012; Bekker et al., 2015; Larochelle & Murray, 2011).
Dataset Splits | Yes | We evaluate our models on various image and tabular datasets. For image data, we include MNIST (LeCun et al., 1998), Fashion-MNIST (F-MNIST) (Xiao et al., 2017), CIFAR-10 (Krizhevsky, 2009), CelebA (Liu et al., 2015), SVHN (Netzer et al., 2011), Flowers (Gurnani et al., 2017), LSUN (Church) (Yu et al., 2015), and Tiny-ImageNet (Deng et al., 2009). Note that Tiny-ImageNet is occasionally abbreviated as ImageNet in our tables for brevity. For tabular data, we utilize the 20 datasets from the binary density estimation benchmark DEBD (Lowd & Davis, 2010; Haaren & Davis, 2012; Bekker et al., 2015; Larochelle & Murray, 2011).
Hardware Specification | Yes | The experiments were conducted on a single NVIDIA A100 GPU.
Software Dependencies | No | All experiments and models are implemented in PyTorch (Ansel et al., 2024) and PyTorch Lightning (Falcon & The PyTorch Lightning team, 2019). (The paper mentions software frameworks with citations but does not provide explicit version numbers for these frameworks.)
Experiment Setup | Yes | Each model is trained for 10,000 iterations using the AdamW optimizer (Kingma & Ba, 2015; Loshchilov & Hutter, 2017), and convergence was confirmed for all models by the end of this training period. We use the MSE as LREC for all models. Training is carried out with a batch size of 512, except for CelebA, where a batch size of 256 is used due to larger model sizes and VRAM constraints. The initial learning rate is set to 0.1 for APCs and 0.005 for AEs and VAEs. The learning rate is reduced by a factor of 10 at 66% and 90% of the training progress. Additionally, to enhance stability and avoid numerical issues or exploding gradients during the training phase, we utilize an exponential learning rate warmup over the first 2% of training iterations.
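The learning-rate schedule described in the setup row (exponential warmup over the first 2% of iterations, then decay by a factor of 10 at 66% and 90% of training) can be sketched as a small helper. This is an illustrative sketch only: the paper does not specify the exact warmup curve or its starting value, so the `warmup_start_ratio` parameter and the exponential-interpolation form are assumptions.

```python
def lr_at(step, total_steps=10_000, base_lr=0.1,
          warmup_frac=0.02, warmup_start_ratio=1e-4):
    """Sketch of the reported schedule: exponential warmup over the
    first 2% of iterations, then x0.1 decays at 66% and 90% of training.
    warmup_start_ratio and the warmup shape are assumptions, not taken
    from the paper."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Exponential interpolation from warmup_start_ratio * base_lr
        # up to base_lr as step approaches warmup_steps.
        return base_lr * warmup_start_ratio ** (1.0 - step / warmup_steps)
    lr = base_lr
    if step >= 0.66 * total_steps:   # first decay at 66% of training
        lr /= 10.0
    if step >= 0.90 * total_steps:   # second decay at 90% of training
        lr /= 10.0
    return lr
```

In practice this multiplier would be attached to the optimizer via a per-step scheduler (e.g. PyTorch's `LambdaLR`); the standalone function above just makes the shape of the schedule explicit.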