Causal Discovery via Bayesian Optimization

Authors: Bao Duong, Sunil Gupta, Thin Nguyen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This is demonstrated through an extensive set of empirical evaluations on many challenging settings with both synthetic and real data. Our implementation is available at https://github.com/baosws/DrBO. ... In this section, we verify our claim in the introduction: DrBO is both more accurate and sample-efficient than existing approaches in score-based observational DAG learning. We show this by comparing our DrBO method with a number of the most recent advances in causal discovery that are based on sequential optimization, including the gradient-based methods DAGMA (Bello et al., 2022), COSMO (Massidda et al., 2024), GOLEM (Ng et al., 2020), and NOTEARS (Zheng et al., 2018) with the TMPI constraint (Zhang et al., 2022), as well as the RL-based approaches CORL (Wang et al., 2021) and ALIAS (Duong et al., 2025).
Researcher Affiliation | Academia | Bao Duong, Sunil Gupta, and Thin Nguyen; Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong, Australia. EMAIL
Pseudocode | Yes | Algorithm 1: The DrBO method for causal discovery.
Require: Dataset D = {x^(j) ∈ R^d}_{j=1}^{n} of d nodes and n observations, score function S(D, ·), DAG rank k, batch size B, no. of preliminary candidates C, and total no. of evaluations T.
Ensure: A DAG Ĝ that maximizes S(D, G).
1: Initialize empty experience H := ∅ and node-wise dropout neural nets {DropoutNN_i}_{i=1}^{d}.
2: while |H| < T do
3:   Generate random DAGs {G^(j) := τ(z^(j))}_{j=1}^{C}, where z ∈ [−1, 1]^{d(1+k)}. (Secs. 4.1 & 4.2)
4:   Sample local scores {l_i^(j) ~ DropoutNN_i(pa_{G^(j)}(i))}_{i=1}^{d} for j = 1, ..., C. (Sec. 4.3)
5:   Combine local scores: {AF^(j) := Combine(l_1^(j), ..., l_d^(j))}_{j=1}^{C}. (Sec. 4.4)
6:   Select the top B candidates with highest AF values: j_1, ..., j_B := argtop-B_{j=1,...,C} AF^(j). (Sec. 4.2)
7:   Evaluate these candidates and update experience: H := H ∪ {(G^(j), S(D, G^(j)))}_{j=j_1,...,j_B}.
8:   Update the neural nets on the new H. (Sec. 4.5)
9: end while
10: Get the highest-scoring DAG so far: Ĝ := argmax_{G ∈ H} S(D, G).
11: Prune Ĝ if needed. (Sec. 4.6)
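Algorithm 1's loop can be sketched in plain NumPy. This is a toy illustration, not the authors' implementation: the `tau` decoder and the acquisition values below are placeholder assumptions standing in for the paper's low-rank DAG construction (Secs. 4.1 and 4.2) and its dropout-NN surrogate (Secs. 4.3 to 4.5); only the candidate-generate / select-top-B / evaluate / update loop structure follows the pseudocode.

```python
import numpy as np

rng = np.random.default_rng(0)

def tau(z, d, k):
    """Hypothetical decoder from z in [-1, 1]^(d(1+k)) to a DAG adjacency
    matrix: the first d entries give a topological order, the remaining
    d*k entries form a low-rank factor whose sign decides edge presence.
    Edges are only added forward in the order, so the result is acyclic."""
    order = np.argsort(z[:d])           # node priorities -> topological order
    U = z[d:].reshape(d, k)             # low-rank factor
    score = U @ U.T                     # pairwise edge scores
    A = np.zeros((d, d), dtype=int)
    for a in range(d):
        for b in range(a + 1, d):
            i, j = order[a], order[b]
            A[i, j] = int(score[i, j] > 0)
    return A

def drbo(score_fn, d, k=2, B=4, C=64, T=32):
    """Sketch of Algorithm 1: generate C candidates, pick the top B by an
    acquisition value (random here, in place of the dropout-NN surrogate),
    evaluate only those, and keep the best-scoring DAG seen so far."""
    history = []
    while len(history) < T:
        zs = rng.uniform(-1.0, 1.0, size=(C, d * (1 + k)))
        cands = [tau(z, d, k) for z in zs]
        af = rng.random(C)                    # placeholder acquisition values
        for j in np.argsort(af)[-B:]:         # top-B candidates
            history.append((cands[j], score_fn(cands[j])))
        # (the surrogate update on the new history would go here)
    best_G, _ = max(history, key=lambda t: t[1])
    return best_G
```

Even with a random acquisition, the sketch shows the key budget property: only B of the C preliminary candidates per round ever reach the (expensive) score function.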
Open Source Code | Yes | This is demonstrated through an extensive set of empirical evaluations on many challenging settings with both synthetic and real data. Our implementation is available at https://github.com/baosws/DrBO.
Open Datasets | Yes | Benchmark data. We verify the performance of our method on real data using the popular benchmark flow cytometry dataset (Sachs et al., 2005), concerning a protein signaling network based on expression levels of proteins and phospholipids. ... Real-world structures. To further illustrate the capabilities of our approach in real-world scenarios, we conduct experiments on real structures provided by the bnlearn repository (Scutari, 2010). ... The bnlearn repository (Scutari, 2010) contains a set of Bayesian networks of varying sizes and complexities from different real-world domains. Data is publicly downloadable at https://www.bnlearn.com/bnrepository
Dataset Splits | No | Then, we generate a dataset of n = 1,000 i.i.d. samples according to a linear-Gaussian SCM x_i := Σ_{j ∈ pa_i} w_{ji} x_j + ε_i, where ε_i ~ N(0, 1). ... Each dataset contains 1,000 observational samples and a ground-truth causal network belonging to real-world applications of varying size. ... Benchmark data. ... containing 853 observations and a known causal network with 11 nodes and 17 edges. ... Varying sample sizes: we show in Figure 4 that our method can achieve low SHDs even with limited data.
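The quoted sampling step x_i := Σ_{j ∈ pa_i} w_{ji} x_j + ε_i with ε_i ~ N(0, 1) can be sketched as follows. Function names are hypothetical, and the convention that W[j, i] holds the weight of edge j → i is an assumption; the paper's generator may differ in details.

```python
import numpy as np

def topological_order(W):
    """Kahn's algorithm on the nonzero pattern of W (W[j, i] != 0 means j -> i)."""
    A = (W != 0)
    indeg = A.sum(axis=0)
    ready = [i for i in range(len(W)) if indeg[i] == 0]
    order = []
    while ready:
        j = ready.pop()
        order.append(j)
        for i in np.nonzero(A[j])[0]:
            indeg[i] -= 1
            if indeg[i] == 0:
                ready.append(int(i))
    return order

def sample_linear_gaussian(W, n, rng):
    """Draw n i.i.d. samples from the linear-Gaussian SCM
    x_i := sum_{j in pa_i} w_ji * x_j + eps_i, with eps_i ~ N(0, 1)."""
    d = W.shape[0]
    X = rng.standard_normal((n, d))   # start every column from its noise term
    for i in topological_order(W):    # parents are always finalized first
        X[:, i] += X @ W[:, i]
    return X
```

For example, on the chain 0 → 1 → 2 with unit weights, the marginal variances are 1, 2, and 3, which the sample variances should approximate for n = 1,000.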
Hardware Specification | Yes | D.4 IMPLEMENTATIONS AND PLATFORM. Implementations. ... Platform. The majority of our experiments are conducted on a machine with an Intel Core i9-13900KF processor and an NVIDIA RTX 4070 Ti GPU. The only exception is CORL on large graphs (Figure 1(b)), which requires more than 16 GB of CUDA memory; these experiments are therefore conducted on an NVIDIA A100 GPU with 40 GB of CUDA memory instead.
Software Dependencies | No | E HYPERPARAMETERS. We provide the specific set of hyperparameters for our DrBO method in Table 2. More details can be found in our published source code. ... Scoring function: S_BIC-EV with linear regression / S_BIC-NV with GP regression ... Optimizer: Adam ...
Experiment Setup | Yes | E HYPERPARAMETERS. We provide the specific set of hyperparameters for our DrBO method in Table 2. Unless specifically indicated, the default hyperparameters here are used for all experiments. More details can be found in our published source code. For nonlinear data with GPs, we use GP regression with regularization α = 1 (because the additive noises have near-unit variances) and a radial basis function (RBF) kernel, whose length scale is optimized over [1, 10^5]. For the Sachs dataset, we also use GP regression with the same kernel as above. In addition, as the noise variances are unknown, we set a very small value α = 10^-8 just to ensure positive definiteness of the covariance matrix, and, following Wang et al. (2021) and Duong et al. (2025), we also employ the median bandwidth heuristic for the kernel, dividing the predictors by their median pairwise Euclidean distance before GP regression is applied.

Table 2: Hyperparameters for DrBO.
Hyperparameter | Linear data | Nonlinear data with GPs | Sachs data
Normalize data | No | No | Yes
Scoring function | S_BIC-EV with linear regression | S_BIC-NV with GP regression | S_BIC-NV with GP regression
Pruning method | Linear pruning | No pruning | CIT pruning
Batch size B | 64 (all settings)
DAG rank k | 8 (all settings)
No. training steps n_grads | 10 (all settings)
No. preliminary candidates C | 100,000 (all settings)
Optimizer | Adam (all settings)
Learning rate | 0.1 (all settings)
Replay buffer size n_replay | 1,024 (all settings)
No. hidden units | 64 (all settings)
Dropout rate | 0.1 (all settings)
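The GP configuration described for the Sachs data (median-bandwidth scaling, RBF kernel, tiny diagonal jitter) can be sketched in plain NumPy. Function names here are hypothetical, and a fixed length scale stands in for the paper's optimization over [1, 10^5]; this is a minimal illustration of the preprocessing, not the authors' scoring code.

```python
import numpy as np

def rbf_kernel(A, B, length_scale):
    """Squared-exponential kernel k(a, b) = exp(-||a - b||^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def median_pairwise_distance(X):
    """Median Euclidean distance over all distinct pairs of rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.median(np.sqrt(d2[np.triu_indices(len(X), k=1)]))

def gp_fit_predict(Xtr, ytr, Xte, alpha=1e-8, length_scale=1.0):
    """GP posterior mean after the median-bandwidth heuristic: divide the
    predictors by their median pairwise distance, then regress with an RBF
    kernel. alpha = 1e-8 is added to the kernel diagonal only to ensure
    positive definiteness, as in the Sachs configuration above."""
    med = median_pairwise_distance(Xtr)
    Xtr, Xte = Xtr / med, Xte / med
    K = rbf_kernel(Xtr, Xtr, length_scale) + alpha * np.eye(len(Xtr))
    Ks = rbf_kernel(Xte, Xtr, length_scale)
    return Ks @ np.linalg.solve(K, ytr)
```

With such a small jitter the posterior mean nearly interpolates the training targets, which is the intended behavior when the residual noise variance is unknown and absorbed elsewhere in the score.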