Distributionally Robust Skeleton Learning of Discrete Bayesian Networks

Authors: Yeshu Li, Brian Ziebart

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical study on synthetic and real datasets validates the effectiveness of our method. We conduct experiments on benchmark datasets [Scutari, 2010] and real-world datasets [Malone et al., 2015] perturbed by the following contamination models. Noise-free model: the baseline model without any noise. Huber's contamination model: each sample has a fixed probability ζ of being replaced by a sample drawn from an arbitrary distribution. Independent failure model: each entry of a sample is independently corrupted with probability ζ. We conduct all experiments on a laptop with an Intel Core i7 2.7 GHz processor.
Researcher Affiliation | Collaboration | Yeshu Li, Alibaba Group, EMAIL; Brian D. Ziebart, Department of Computer Science, University of Illinois at Chicago, EMAIL
Pseudocode | Yes | The pseudo-code of the greedy algorithm for solving Equation (6) in Wasserstein DRO is illustrated in Algorithm 1 (Greedy Algorithm for the Wasserstein Worst-case Risk).
Open Source Code | Yes | Our code is publicly available at https://github.com/DanielLeee/drslbn.
Open Datasets | Yes | We conduct experiments on benchmark datasets [Scutari, 2010] and real-world datasets [Malone et al., 2015] perturbed by the following contamination models.
Dataset Splits | No | When dealing with real-world datasets, we randomly split the data into two halves for training and testing. The paper mentions training and testing splits but does not explicitly describe a validation split or provide specific percentages for any splits.
Hardware Specification | Yes | We conduct all experiments on a laptop with an Intel Core i7 2.7 GHz processor.
Software Dependencies | No | For the Wasserstein-based method, we leverage Adam [Kingma and Ba, 2014] to optimize the overall objective... For the KL-based and standard regularization methods, we use the L-BFGS-B [Byrd et al., 1995] optimization method. The paper mentions optimization tools such as Adam and L-BFGS-B but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | For the Wasserstein-based method, we leverage Adam [Kingma and Ba, 2014] to optimize the overall objective with β1 = 0.9, β2 = 0.990, a learning rate of 1.0, a batch size of 500, a maximum of 200 iterations for optimization and 10 iterations for approximating the worst-case distribution. For the KL-based and standard regularization methods, we use the L-BFGS-B [Byrd et al., 1995] optimization method with default parameters. We set the cardinality of the maximum conditional set to 3 in MMPC. The Bayesian information criterion (BIC) [Neath and Cavanaugh, 2012] score is adopted in the HC algorithm. A random mixture of 20 random Bayesian networks serves as the adversarial distribution for both contamination models. All hyper-parameters are chosen based on the best performance on random Bayesian networks with the same size as the input one.
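The two contamination models quoted under Research Type are standard and easy to reproduce. The sketch below is a minimal NumPy illustration of both (not the authors' code): Huber's model replaces whole samples with probability ζ, while the independent failure model corrupts individual entries with probability ζ. The uniform resampling used for corrupted entries here is an assumption for illustration; the paper draws adversarial samples from a mixture of random Bayesian networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def huber_contaminate(X, zeta, sample_adversarial):
    """Huber's contamination: each sample is replaced, with probability
    zeta, by a sample drawn from an arbitrary (adversarial) distribution."""
    X = X.copy()
    replace = rng.random(len(X)) < zeta          # per-sample corruption mask
    X[replace] = sample_adversarial(replace.sum())
    return X

def independent_failure(X, zeta, n_states):
    """Independent failure: each entry of each sample is independently
    corrupted with probability zeta (here: resampled uniformly over the
    discrete states 0..n_states-1, an illustrative choice)."""
    X = X.copy()
    mask = rng.random(X.shape) < zeta            # per-entry corruption mask
    X[mask] = rng.integers(0, n_states, size=mask.sum())
    return X
```

With ζ = 0 both functions return the data unchanged, which matches the noise-free baseline.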
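The Dataset Splits row states that real-world data are randomly split into two halves for training and testing, with no validation split described. A minimal sketch of such a split (the seed and function name are illustrative, not from the paper):

```python
import numpy as np

def split_half(X, seed=0):
    """Randomly partition the samples into two equal halves,
    one for training and one for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))    # random reordering of sample indices
    half = len(X) // 2
    return X[idx[:half]], X[idx[half:]]
```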
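The optimizer settings in the Experiment Setup row can be sketched as follows. This is a minimal NumPy/SciPy illustration on a hypothetical quadratic objective, not the paper's DRO loss: it runs plain Adam with the reported β1 = 0.9, β2 = 0.990, learning rate 1.0, and 200 iterations (the batch size of 500 and the inner worst-case-distribution loop are omitted), and L-BFGS-B with default parameters as used for the KL-based and standard regularization methods.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical smooth objective standing in for the paper's losses.
def f(w):
    return float(np.sum((w - 1.0) ** 2))

def grad(w):
    return 2.0 * (w - 1.0)

def adam(w, lr=1.0, beta1=0.9, beta2=0.990, eps=1e-8, iters=200):
    """Plain Adam with the reported hyper-parameters."""
    m = np.zeros_like(w)                          # first-moment estimate
    v = np.zeros_like(w)                          # second-moment estimate
    for t in range(1, iters + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)              # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w_adam = adam(np.zeros(3))
# L-BFGS-B with default parameters, as used for the KL-based method.
res = minimize(f, np.zeros(3), jac=grad, method="L-BFGS-B")
```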