Integrating Bayesian Network Structure into Residual Flows and Variational Autoencoders

Authors: Jacobie Mouton, Rodney Stephen Kroon

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed GRF and SIReN-VAE models on a range of synthetic and real-world datasets that each have an associated true or hypothesized BN graph. The synthetic datasets are generated from fully specified BNs. All models were trained using the Adam optimizer with an initial learning rate of either 0.01 or 0.001 and a batch size of 100. The learning rate was decreased by a factor of 10 each time no improvement in the loss was observed for a set number of consecutive epochs, until a minimum learning rate of 10⁻⁶ was reached, at which point training was terminated. The initial learning rate and duration before learning rate reduction were chosen based on the lowest validation loss obtained over the grid {0.01, 0.001} × {10, 20, 30}. Table 1 provides the negative log-likelihood (NLL) and the negative ELBO achieved by each model on the test set of the various datasets for the density estimation and variational inference tasks, respectively.
Researcher Affiliation | Academia | Jacobie Mouton (EMAIL), Computer Science Division, Stellenbosch University, South Africa; Steve Kroon (EMAIL), Computer Science Division, Stellenbosch University and National Institute for Theoretical and Computational Sciences, South Africa
Pseudocode | No | The paper describes the methods and transformations using mathematical equations and textual descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | An implementation of the GRF and SIReN-VAE, as well as our experimental code, can be found at https://gitlab.com/pleased/grf-and-siren-vae.
Open Datasets | Yes | Our experiments use three synthetic datasets: the Arithmetic Circuit dataset (Weilbach et al., 2020; Wehenkel & Louppe, 2021), an adaptation of the Tree dataset (Wehenkel & Louppe, 2021), as well as a linear Gaussian BN, EColi, adapted from the repository of Scutari (2022). We also consider two real-world datasets, namely Protein (Sachs et al., 2005) and MEHRA (Vitolo et al., 2018).
Dataset Splits | Yes | We compared training the models on the full training sets of the respective datasets against using much smaller training sets consisting of only 2|G| instances, where |G| = D + K. We noted each model's average negative log-evidence on the test set over five independent runs.
Hardware Specification | No | The paper describes training settings like optimizer, learning rate, and batch size, but does not provide any specific details about the hardware (e.g., CPU, GPU models) used for these experiments.
Software Dependencies | No | The paper mentions the use of the Adam optimizer, but it does not specify versions for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | All models were trained using the Adam optimizer with an initial learning rate of either 0.01 or 0.001 and a batch size of 100. The learning rate was decreased by a factor of 10 each time no improvement in the loss was observed for a set number of consecutive epochs, until a minimum learning rate of 10⁻⁶ was reached, at which point training was terminated. The initial learning rate and duration before learning rate reduction were chosen based on the lowest validation loss obtained over the grid {0.01, 0.001} × {10, 20, 30}.
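The learning-rate schedule described above (reduce by a factor of 10 after a fixed patience of non-improving epochs; terminate at 10⁻⁶) can be sketched as a small standalone scheduler. This is an illustrative reconstruction, not code from the paper's repository; the class and method names (`PlateauSchedule`, `step`) are our own, and the patience value would be one of {10, 20, 30} chosen by grid search.

```python
class PlateauSchedule:
    """Sketch of the plateau-based LR schedule described in the paper:
    divide the learning rate by 10 after `patience` consecutive epochs
    without loss improvement; stop training once the rate reaches 1e-6."""

    def __init__(self, initial_lr, patience, min_lr=1e-6):
        self.lr = initial_lr          # current learning rate
        self.patience = patience      # epochs without improvement before a cut
        self.min_lr = min_lr          # terminate once this rate is reached
        self.best = float("inf")      # best loss seen so far
        self.bad_epochs = 0           # consecutive non-improving epochs

    def step(self, loss):
        """Update after one epoch; return False when training should stop."""
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr /= 10.0
                self.bad_epochs = 0
        # small slack absorbs floating-point error in repeated division
        return self.lr > self.min_lr * 1.0001


# Example: a loss that never improves forces LR cuts until termination.
sched = PlateauSchedule(initial_lr=0.01, patience=2)
epochs = 0
while sched.step(1.0):
    epochs += 1
```

In practice the same behavior is available via PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` with `factor=0.1` and `min_lr=1e-6`, combined with a stopping check on the optimizer's current learning rate.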