Expressiveness of Parametrized Distributions over DAGs for Causal Discovery
Authors: Simon Rittel, Sebastian Tschiatschek
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we focus on the expressiveness of parametrized distributions over DAGs in the context of causal structure learning and show several limitations of candidate models in a theoretical analysis and validate them empirically in relevant supervised settings. |
| Researcher Affiliation | Academia | Simon Rittel (EMAIL): Department of Statistics, LMU Munich, Germany; Munich Center for Machine Learning, Germany; UniVie Doctoral School Computer Science, Austria. Sebastian Tschiatschek (EMAIL): Faculty of Computer Science, University of Vienna, Austria; Research Network Data Science, University of Vienna, Austria. |
| Pseudocode | No | The paper describes generative models and outlines steps using mathematical notation and figures, but does not include a distinct "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | ARCO-DAG: The autoregressive neural network of ARCO-DAG consists of a simple two-layer perceptron with HN = 30 hidden neurons and ReLU activations and follows the official implementation on https://github.com/chritoth/bci-arco-gp/. GFlowNet-DAG: The transformer architecture for the GFlowNet-DAG model follows the official implementation provided on https://github.com/tristandeleu/jax-dag-gflownet |
| Open Datasets | No | The target distribution is either derived from the MEC in Example 1, by the coupling of edges from Example 2, or a synthetically generated distribution that arises from concentrating the probability mass around a target graph based on the structural Hamming distance (SHD). In the absence of an analytic posterior that motivates such similarity, we generate a synthetic target distribution around the assumed maximum-a-posteriori (MAP) graph G depicted in Figure 5 that has positive support for all 543 possible DAGs with 4 nodes. |
| Dataset Splits | No | In the supervised setting, we minimize the forward KL divergence between the target distribution and the model distribution using the Adam Optimizer with decoupled weight decay (Loshchilov & Hutter, 2019) over 1000 optimization steps. For training of the parameters ϕ with gradient descent, we take the forward Kullback-Leibler (KL) divergence between the target distribution p_G and the candidate distribution q_G as the loss function and approximate it using samples from the target distribution. |
| Hardware Specification | Yes | The computations were conducted on a 11th Gen. Intel(R) Core i7-1165G7 processor with 2.80 GHz, 4 cores and 8 logical processing units paired with 32 GB of DDR SDRAM. |
| Software Dependencies | No | The paper mentions the "Adam Optimizer" and specific model architectures (e.g., "multilayer perceptron", "transformer architecture") but does not provide specific version numbers for any software libraries or frameworks. |
| Experiment Setup | Yes | In the supervised setting, we minimize the forward KL divergence between the target distribution and the model distribution using the Adam Optimizer with decoupled weight decay (Loshchilov & Hutter, 2019) over 1000 optimization steps. Further details and the used hyperparameters for each model are provided in section D.2. Table 5: Hyperparameters, (a) for the experiments in sections 5.1 and 5.2: RPM-DAG (learning rate 0.5, 25 forward-KL samples, 1 IS sample for training, 100 for evaluation); ARCO-DAG (learning rate 0.5, 25, 1, 100); GFlowNet-DAG (learning rate 0.001, 25, 10, 100). |
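The supervised setup quoted above (minimizing the forward KL between a target distribution over DAGs and a parametrized candidate distribution, approximated from samples, over 1000 steps) can be sketched in a minimal form. This is an illustration only: the five-element graph set, the target probabilities, and the plain-SGD update are hypothetical stand-ins (the paper enumerates all 543 DAGs on 4 nodes and uses Adam with decoupled weight decay); the sample size of 25 per step, the 1000 steps, and the 0.5 learning rate echo the quoted hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an enumerated set of DAGs (the paper uses all
# 543 DAGs on 4 nodes); p plays the role of the target p_G.
n_graphs = 5
p = np.array([0.5, 0.2, 0.15, 0.1, 0.05])  # hypothetical target

# Softmax-parametrized candidate distribution q_G with logits phi.
phi = np.zeros(n_graphs)

def q(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

lr = 0.5          # learning rate from Table 5 (RPM-DAG / ARCO-DAG)
for step in range(1000):  # 1000 optimization steps, as quoted
    # Approximate forward KL(p || q) with 25 samples from the target;
    # minimizing it equals maximizing E_p[log q_phi].
    samples = rng.choice(n_graphs, size=25, p=p)
    p_hat = np.bincount(samples, minlength=n_graphs) / 25
    # Gradient of -E_p_hat[log q_phi] w.r.t. softmax logits.
    grad = q(phi) - p_hat
    phi -= lr * grad

kl = np.sum(p * np.log(p / q(phi)))  # remaining forward KL
```

After training, `q(phi)` concentrates near the target `p` and the residual forward KL is small (nonzero only because of the 25-sample gradient noise); the paper's models replace the free softmax logits with structured DAG parametrizations, which is exactly where the expressiveness limitations it studies arise.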