Identifiability of Additive Noise Models Using Conditional Variances
Authors: Gunwoong Park
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | It is demonstrated through extensive simulated and real multivariate data that the proposed algorithm successfully recovers directed acyclic graphs. The paper provides numerical experiments to support its theoretical results: ANMs with non-equal error variances can be identifiable, and Algorithm 3 consistently recovers Gaussian linear SEMs. |
| Researcher Affiliation | Academia | Gunwoong Park (EMAIL), Department of Statistics, University of Seoul, Seoul, 02504, South Korea |
| Pseudocode | Yes | Algorithm 1: Ordering estimation using forward stepwise selection. Input: n i.i.d. samples from an ANM, X1:n. Output: estimated ordering π̂ = (π̂₁, ..., π̂_p). Set π̂₀ = ∅. For m = 1, 2, ..., p: set S = {π̂₀, ..., π̂_{m−1}}; for each j ∈ {1, 2, ..., p} \ S, estimate the conditional variance of Xj given XS, σ̂²_{j\|S}; the m-th element of the ordering is π̂_m = arg min_j σ̂²_{j\|S}. |
| Open Source Code | No | No explicit statement about the release of source code or a link to a repository for the methodology described in this paper is provided. The license mentioned is for the paper itself, not the code. |
| Open Datasets | Yes | We applied our algorithms and the comparison GES, GDS, LISTEN, and LINGAM algorithms to real multivariate Gaussian data involving students' mathematics scores. More precisely, the variables are the examination marks for 88 students from five different subjects: mechanics, vectors, algebra, analysis, and statistics. This dataset is provided in the bnlearn R package (Scutari, 2009). |
| Dataset Splits | Yes | For the LISTEN algorithm, we set the regularization parameter to 0.001 and the hard threshold parameter to 0.25, because these seemed to recover the model better. However, when the regularization and hard-thresholding parameters are chosen by 10-fold cross-validation, the performance in recovering the graph is poor, because cross-validation does not generally have consistency properties for model selection (see details in Shao, 1993). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the 'bnlearn R package (Scutari, 2009)' and refers to 'ordinary linear regression' and 'Fisher's independence test', but does not specify version numbers for these or any other key software components or libraries. |
| Experiment Setup | Yes | The set of non-zero parameters βjk ∈ ℝ in Equation (5) was generated uniformly at random in the range βjk ∈ (−0.5/d, 0.5/d). Lastly, all noise variances were set to σ²_j = 0.75. The regularization parameter was set to 0.001, and the hard threshold parameter to half the minimum absolute true edge weight, min\|βjk\|/2, using the true model information, because this seems much better than the parameters from cross-validation when recovering graphs. The significance level was always set depending on the sample size, α = 1 − Φ(n^(1/4)/2), as in Theorem 9. The maximum degree was set to m = 2. For homogeneous error variances, all noise variances were set to σ²_j = 0.5, and for heterogeneous error variances, noise variances were chosen uniformly at random from σ²_j ∈ [0.475, 0.525]. The error distributions were, sequentially, uniform U(−1, 1), Gaussian N(0, 1/3), and a half-t distribution with 10 degrees of freedom. |
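The extracted pseudocode (Algorithm 1 in the table above) can be sketched in Python. This is an illustrative reimplementation, not the author's code: it assumes a linear SEM and estimates each conditional variance σ̂²_{j|S} as the variance of ordinary-least-squares residuals of Xj regressed on XS, then greedily appends the variable with the smallest conditional variance.

```python
import numpy as np

def estimate_ordering(X):
    """Forward stepwise ordering estimation (sketch of Algorithm 1).

    At each step, among variables not yet ordered, pick the one whose
    conditional variance given the already-ordered set S is smallest.
    Conditional variances are estimated via OLS residual variance.
    """
    n, p = X.shape
    order = []                       # the estimated ordering pi-hat
    for _ in range(p):
        S = order                    # already-ordered variables
        best_j, best_var = None, np.inf
        for j in range(p):
            if j in S:
                continue
            if S:
                # Regress X_j on X_S (with intercept) and take residuals.
                Z = np.column_stack([np.ones(n), X[:, S]])
                beta, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
                resid = X[:, j] - Z @ beta
            else:
                # Empty conditioning set: use the marginal variance.
                resid = X[:, j] - X[:, j].mean()
            v = resid.var()
            if v < best_var:
                best_j, best_var = j, v
        order.append(best_j)
    return order

# Usage on a simulated chain X1 -> X2 -> X3 with equal error
# variances 0.75, loosely mirroring the paper's simulation setup.
rng = np.random.default_rng(0)
n = 5000
e = rng.normal(0.0, np.sqrt(0.75), size=(n, 3))
x1 = e[:, 0]
x2 = 0.8 * x1 + e[:, 1]
x3 = 0.8 * x2 + e[:, 2]
X = np.column_stack([x1, x2, x3])
print(estimate_ordering(X))          # expect the causal order [0, 1, 2]
```

With equal error variances, downstream variables accumulate variance, so the marginal and conditional variances increase along the true causal order, which is why the greedy minimum-variance selection recovers it here.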