Sum of Squares Circuits

Authors: Lorenzo Loconte, Stefan Mengel, Antonio Vergari

AAAI 2025

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental
  "Finally, we empirically show the effectiveness of sum of squares circuits in performing distribution estimation." ... "Finally, we empirically validate the increased expressiveness of sum of squares circuits for distribution estimation, showing they can scale to real-world data when tensorized (Section 7)." ... "We evaluate structured monotonic PCs (+_sd), squared PCs (±²_R, ±²_C), their sums, and SOS PCs obtained as the product of a monotonic and a squared PC (+_sd Σ²_cmp, see Definition 5) on distribution estimation tasks using both continuous and discrete real-world data."
Researcher Affiliation: Academia
  Lorenzo Loconte¹, Stefan Mengel², Antonio Vergari¹ — ¹School of Informatics, University of Edinburgh, UK; ²University of Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), France
Pseudocode: No
  The paper describes algorithms such as the Multiply algorithm only conceptually (e.g., "Multiplying two compatible circuits c1, c2 can be done via the Multiply algorithm in time O(|c1||c2|) as described in Vergari et al. (2021) and which we report in Appendix A.1") but does not provide structured pseudocode or algorithm blocks in the main text.
Open Source Code: Yes
  Code: https://github.com/april-tools/sos-npcs
Open Datasets: Yes
  "We estimate the distribution of four continuous UCI data sets: Power, Gas, Hepmass, MiniBooNE, using the same preprocessing by Papamakarios, Pavlakou, and Murray (2017) (Table C.1)." ... "We estimate the probability distribution of MNIST, Fashion-MNIST and CelebA images (Table C.2)."
Dataset Splits: Yes
  "For all UCI data sets, we preprocess them as in Papamakarios, Pavlakou, and Murray (2017), which includes standard z-normalization and random splits for training, validation, and test sets. Specifically, we use an 80/10/10 split, respectively. We use the official splits for MNIST, Fashion-MNIST, and CelebA."
Hardware Specification: No
  The paper states that models were trained and experiments performed, but it does not specify any hardware, such as GPU or CPU models or memory size.
Software Dependencies: No
  The paper does not name specific software with version numbers, such as Python, PyTorch, or CUDA versions.
Experiment Setup: Yes
  "Given a training set D = {x^(i)}_{i=1}^N on variables X, we are interested in estimating p(X) from D by minimizing the negative log-likelihood of the parameters on a batch B ⊆ D, i.e., L := |B| log Z − Σ_{x∈B} log c(x), via gradient descent." ... "For all UCI data sets, we train our models for 500 epochs using the Adam optimizer with a learning rate of 1e-3 and a batch size of 128."
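
The 80/10/10 random split quoted under "Dataset Splits" can be sketched as follows. This is a hypothetical illustration of such a split, not code taken from the paper's released repository; the function name, seed, and row count are assumptions.

```python
import numpy as np

def split_80_10_10(n_rows, seed=0):
    # Shuffle row indices once, then carve out 80% / 10% / 10%
    # for training, validation, and test, respectively.
    idx = np.random.default_rng(seed).permutation(n_rows)
    n_train = int(0.8 * n_rows)
    n_valid = int(0.1 * n_rows)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]
    return train, valid, test

# Illustrative row count; the actual UCI data set sizes differ.
train, valid, test = split_80_10_10(10_000)
```

A fixed seed makes the random split reproducible across runs, which matters when the paper's splits are random rather than official.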
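
The objective quoted under "Experiment Setup", L := |B| log Z − Σ_{x∈B} log c(x), can be illustrated with a minimal numpy sketch. Here a toy unnormalized Gaussian with one parameter mu stands in for the circuit output c(x) — an assumption for illustration only, not the paper's circuit model — while the learning rate, batch size, and epoch count match the quoted setup (plain gradient descent is used in place of Adam).

```python
import numpy as np

def log_c(x, mu):
    # log of a toy unnormalized "circuit" output c(x) = exp(-(x - mu)^2 / 2)
    return -0.5 * (x - mu) ** 2

log_Z = 0.5 * np.log(2 * np.pi)  # partition function of this toy c(x)

def nll(batch, mu):
    # L := |B| log Z - sum_{x in B} log c(x)
    return batch.size * log_Z - log_c(batch, mu).sum()

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=1024)

mu, lr = 0.0, 1e-3              # learning rate as in the quoted setup
for epoch in range(500):        # 500 epochs as in the quoted setup
    for start in range(0, data.size, 128):  # batch size 128
        batch = data[start:start + 128]
        grad = (mu - batch).sum()           # dL/dmu for this toy model
        mu -= lr * grad

# mu converges close to the empirical mean of the data
```

For this toy model the minimizer of L is the sample mean, so the loop recovers it; the paper instead optimizes circuit parameters, where Z depends on the parameters and must be recomputed.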