Identifiability of Additive Noise Models Using Conditional Variances

Authors: Gunwoong Park

JMLR 2020

Reproducibility Variable Result LLM Response
Research Type Experimental It is demonstrated through extensive simulated and real multivariate data that the proposed algorithm successfully recovers directed acyclic graphs. This section provides numerical experiments to support our theoretical results: ANMs with non-equal error variances can be identifiable; and Algorithm 3 consistently recovers Gaussian linear SEMs.
Researcher Affiliation Academia Gunwoong Park, Department of Statistics, University of Seoul, Seoul 02504, South Korea
Pseudocode Yes Algorithm 1: Ordering estimation using the forward stepwise selection
Input: n i.i.d. samples from an ANM, X1:n
Output: Estimated ordering, π̂ = (π̂1, ..., π̂p)
Set π̂0 = ∅
for m ∈ {1, 2, ..., p} do
    Set S = {π̂0, ..., π̂(m−1)}
    for j ∈ {1, 2, ..., p} \ S do
        Estimate the conditional variance of Xj given XS, σ̂²(j|S)
    end
    The m-th element of the ordering: π̂m = arg min_j σ̂²(j|S)
end
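The algorithm above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the function name `estimate_ordering` is hypothetical, and the conditional variance is estimated via the OLS residual variance, one natural choice for linear SEMs.

```python
import numpy as np

def estimate_ordering(X):
    """Sketch of Algorithm 1: forward stepwise ordering estimation.

    X is an (n, p) sample matrix. At each step the variable with the
    smallest estimated conditional variance given the already-selected
    set S is appended to the ordering. Conditional variances are
    estimated here by the OLS residual variance (an illustrative choice;
    other regression estimators could be substituted).
    """
    Xc = X - X.mean(axis=0)          # center so no intercept term is needed
    n, p = Xc.shape
    order, remaining = [], list(range(p))
    for _ in range(p):
        best_j, best_var = None, np.inf
        for j in remaining:
            if order:
                Z = Xc[:, order]     # conditioning variables X_S
                coef, *_ = np.linalg.lstsq(Z, Xc[:, j], rcond=None)
                var = np.var(Xc[:, j] - Z @ coef)   # residual variance of X_j | X_S
            else:
                var = np.var(Xc[:, j])              # S is empty: marginal variance
            if var < best_var:
                best_j, best_var = j, var
        order.append(best_j)         # argmin over the remaining variables
        remaining.remove(best_j)
    return order
```

For a Gaussian linear SEM generated along its causal order, the returned list should recover that order with high probability as n grows, matching the consistency result the report describes.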
Open Source Code No No explicit statement about the release of source code or a link to a repository for the methodology described in this paper is provided. The license mentioned is for the paper itself, not the code.
Open Datasets Yes We applied our algorithms and the comparison GES, GDS, LISTEN, and LINGAM algorithms to real multivariate Gaussian data involving students' mathematics scores. More precisely, the variables are the examination marks for 88 students from five different subjects: mechanics, vectors, algebra, analysis, and statistics. This dataset is provided in the bnlearn R package (Scutari, 2009).
Dataset Splits Yes For the LISTEN algorithm, we set the regularization parameter to 0.001 and the hard threshold parameter to 0.25, because these seem to recover the model better. However, when the regularization and hard thresholding parameters are chosen by 10-fold cross validation, the performance in recovering the graph is poor, because cross-validation does not generally have consistency properties for model selection (see details in Shao, 1993).
Hardware Specification No The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies No The paper mentions using the 'bnlearn R package (Scutari, 2009)' and refers to 'ordinary linear regression' and 'Fisher's independence test', but does not specify version numbers for these or any other key software components or libraries.
Experiment Setup Yes The set of non-zero parameters βjk ∈ R in Equation (5) were generated uniformly at random in the range βjk ∈ (−0.5/d, 0.5/d). Lastly, all noise variances were set to σ²j = 0.75. The regularization parameters were set to 0.001, and the hard threshold parameter to half of the minimum value of the true edge weights, min(|βjk|)/2, by using the true model information, because this seems to perform much better than parameters chosen by cross validation when recovering graphs. We always set the significance level depending on the sample size, α = 1 − Φ(n^(1/4)/2), as in Theorem 9. We set the maximum degree m = 2. For homogeneous error variances, all noise variances were set to σ²j = 0.5, and for heterogeneous error variances, all noise variances were randomly chosen from σ²j ∈ [0.475, 0.525]. Error distributions were, sequentially, uniform U(−1, 1), Gaussian N(0, 1/3), and a t-distribution with 10 degrees of freedom scaled by one half.
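As a concrete illustration of this setup, the following Python sketch generates data from a linear Gaussian SEM and computes the sample-size-dependent significance level. This is a hedged reconstruction, not the paper's code: the function names, the DAG-sampling scheme (exactly min(d, j) parents per node), and the edge-weight support (−0.5/d, 0.5/d) are assumptions, since the printed range in the excerpt is partly garbled.

```python
import math
import numpy as np

def make_sem_data(n, p, d, rng):
    """Hedged sketch of the described simulation: a linear Gaussian SEM over
    a random DAG that respects the natural ordering 1..p, with at most d
    parents per node and noise variances sigma_j^2 = 0.75 as stated.
    The edge-weight support (-0.5/d, 0.5/d) is an assumption."""
    B = np.zeros((p, p))             # B[j, k] = weight of edge k -> j (k < j)
    for j in range(1, p):
        parents = rng.choice(j, size=min(d, j), replace=False)
        B[j, parents] = rng.uniform(-0.5 / d, 0.5 / d, size=len(parents))
    X = np.zeros((n, p))
    for j in range(p):
        noise = rng.normal(scale=math.sqrt(0.75), size=n)  # sigma_j^2 = 0.75
        X[:, j] = X @ B[j] + noise   # parents of j are already generated
    return X, B

def alpha_level(n):
    """Sample-size-dependent significance level alpha = 1 - Phi(n^(1/4)/2),
    with the standard normal CDF expressed via math.erf."""
    return 1.0 - 0.5 * (1.0 + math.erf((n ** 0.25 / 2.0) / math.sqrt(2.0)))
```

Because the weight matrix B is strictly lower triangular, the sampled graph is guaranteed to be a DAG, and alpha_level shrinks toward zero as n grows, which is what makes the threshold consistent in the large-sample limit.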