Tractable Representation Learning with Probabilistic Circuits
Authors: Steven Braun, Sahil Sidheekh, Antonio Vergari, Martin Mundt, Sriraam Natarajan, Kristian Kersting
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation demonstrates that APCs outperform existing PC-based autoencoding methods in reconstruction quality, generate embeddings competitive with, and exhibit superior robustness in handling missing data compared to, neural autoencoders. |
| Researcher Affiliation | Academia | Steven Braun EMAIL Technische Universität Darmstadt, Germany Sahil Sidheekh EMAIL University of Texas at Dallas, United States Antonio Vergari EMAIL University of Edinburgh, United Kingdom Martin Mundt EMAIL Universität Bremen, Germany Sriraam Natarajan EMAIL University of Texas at Dallas, United States Kristian Kersting EMAIL Technische Universität Darmstadt, Germany |
| Pseudocode | Yes | Appendix Algorithm 2 outlines the process of data-free knowledge distillation from a VAE to an APC of similar capacity and the same decoder architecture. The procedure iteratively refines the APC to learn from the VAE without the need for original training data. Algorithm 1 APC Encoding Procedure. Algorithm 2 Data-Free Knowledge Distillation from VAE to APC. Algorithm 3 Sampling sum units with SIMPLE. |
| Open Source Code | Yes | Our implementation is available as open-source software at https://github.com/ml-research/autoencoding-probabilistic-circuits. |
| Open Datasets | Yes | We evaluate our models on various image and tabular datasets. For image data, we include MNIST (LeCun et al., 1998), Fashion MNIST (F-MNIST) (Xiao et al., 2017), CIFAR-10 (Krizhevsky, 2009), CelebA (Liu et al., 2015), SVHN (Netzer et al., 2011), Flowers (Gurnani et al., 2017), LSUN (Church) (Yu et al., 2015), and Tiny-ImageNet (Deng et al., 2009). Note that Tiny-ImageNet is occasionally abbreviated as ImageNet in our tables for brevity. For tabular data, we utilize the 20 datasets from the binary density estimation benchmark DEBD (Lowd & Davis, 2010; Haaren & Davis, 2012; Bekker et al., 2015; Larochelle & Murray, 2011). |
| Dataset Splits | Yes | We evaluate our models on various image and tabular datasets. For image data, we include MNIST (LeCun et al., 1998), Fashion MNIST (F-MNIST) (Xiao et al., 2017), CIFAR-10 (Krizhevsky, 2009), CelebA (Liu et al., 2015), SVHN (Netzer et al., 2011), Flowers (Gurnani et al., 2017), LSUN (Church) (Yu et al., 2015), and Tiny-ImageNet (Deng et al., 2009). Note that Tiny-ImageNet is occasionally abbreviated as ImageNet in our tables for brevity. For tabular data, we utilize the 20 datasets from the binary density estimation benchmark DEBD (Lowd & Davis, 2010; Haaren & Davis, 2012; Bekker et al., 2015; Larochelle & Murray, 2011). |
| Hardware Specification | Yes | The experiments were conducted on a single NVIDIA A100 GPU. |
| Software Dependencies | No | All experiments and models are implemented in PyTorch (Ansel et al., 2024) and PyTorch Lightning (Falcon & The PyTorch Lightning team, 2019). (The paper mentions software frameworks with citations but does not provide explicit version numbers for these frameworks.) |
| Experiment Setup | Yes | Each model is trained for 10,000 iterations using the AdamW optimizer (Kingma & Ba, 2015; Loshchilov & Hutter, 2017), and convergence was confirmed for all models by the end of this training period. We use the MSE as LREC for all models. Training is carried out with a batch size of 512, except for CelebA where a batch size of 256 is used due to larger model sizes and VRAM constraints. The initial learning rate is set to 0.1 for APCs and 0.005 for AEs and VAEs. The learning rate is reduced by a factor of 10 at 66% and 90% of the training progress. Additionally, to enhance stability and avoid numerical issues or exploding gradients during the training phase, we utilize an exponential learning rate warmup over the first 2% of training iterations. |
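The learning-rate schedule quoted in the Experiment Setup row (exponential warmup over the first 2% of iterations, then a 10x decay at 66% and again at 90% of progress) can be written as a step-indexed multiplier. This is a minimal sketch; the exact shape of the exponential warmup is not specified in the report, so an exponential ramp from 1e-3 up to 1.0 is assumed here for illustration:

```python
def lr_multiplier(step: int, total_steps: int) -> float:
    """Multiplier applied to the base learning rate (0.1 for APCs,
    0.005 for AEs/VAEs per the report) at a given training step.

    - First 2% of steps: exponential warmup (assumed ramp 1e-3 -> 1.0).
    - At 66% of progress: divide by 10.
    - At 90% of progress: divide by 10 again.
    """
    progress = step / total_steps
    warmup_frac = 0.02
    if progress < warmup_frac:
        start = 1e-3  # assumed warmup floor, not stated in the report
        return start ** (1.0 - progress / warmup_frac)
    decay = 1.0
    if progress >= 0.66:
        decay /= 10.0
    if progress >= 0.90:
        decay /= 10.0
    return decay


# Example: effective APC learning rate over the reported 10,000 iterations
base_lr_apc = 0.1
for step in (0, 5000, 7000, 9500):
    print(step, base_lr_apc * lr_multiplier(step, 10_000))
```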
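The Pseudocode row only summarizes Algorithm 2 (data-free distillation from a VAE to an APC: the APC is refined iteratively without access to the original training data). One plausible reading of such a loop is to synthesize each batch by decoding samples from the VAE prior and fitting the student on them; the sketch below illustrates only that control flow. All function names and the batching scheme here are illustrative stand-ins, not the paper's actual procedure:

```python
import random


def sample_prior(dim: int) -> list[float]:
    # Draw a latent from the standard normal prior commonly assumed for VAEs.
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


def distill(vae_decode, apc_fit_step, latent_dim: int,
            n_iters: int, batch_size: int) -> None:
    """Illustrative data-free distillation loop: at each iteration,
    synthesize a batch by decoding prior samples with the teacher VAE's
    decoder, then take one fitting step on the student APC."""
    for _ in range(n_iters):
        batch = [vae_decode(sample_prior(latent_dim))
                 for _ in range(batch_size)]
        apc_fit_step(batch)


# Toy stand-ins to show the calling convention (hypothetical):
if __name__ == "__main__":
    decoded_batches = []
    distill(vae_decode=lambda z: [abs(v) for v in z],
            apc_fit_step=decoded_batches.append,
            latent_dim=4, n_iters=3, batch_size=8)
    print(len(decoded_batches))  # one fitting step per iteration
```

The key property of the data-free setting is visible in the loop: no dataset is ever read, only the teacher's decoder is queried.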