Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The AIM and EM Algorithms for Learning from Coarse Data
Authors: Manfred Jaeger
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the practical usability of the AIM algorithm by prototype implementations for parameter learning from continuous Gaussian data, and from discrete Bayesian network data. Extensive experiments show that the theoretical differences between AIM and EM can be observed in practice, and that a combination of the two methods leads to robust performance for both ignorable and non-ignorable mechanisms. |
| Researcher Affiliation | Academia | Manfred Jaeger, Department of Computer Science, Aalborg University, Selma Lagerlöfs Vej 300, 9220 Aalborg, Denmark |
| Pseudocode | Yes | Algorithm 1: The AIM procedure; Algorithm 2: Pseudo code for AIM using discretization for coarse continuous data; Algorithm 3: Pseudo code for local optimization routine; Algorithm 4: AIM for Bayesian Networks; Algorithm 5: Incremental AI step; Algorithm 6: Local improvement step |
| Open Source Code | Yes | The source code for the experiments of Section 7.1 is available at https://github.com/manfred-jaeger-aalborg/aim_for_gauss. |
| Open Datasets | No | The paper generates synthetic data from Gaussian distributions for its 1d-b and 2d-m scenarios. It also mentions the "traditional Asia network" but provides no access information (link, DOI, or formal citation) for this or any other publicly available dataset, and it does not use pre-existing datasets for the Gaussian experiments. |
| Dataset Splits | No | The paper describes generating data with specified sample sizes (N), and initial parameter selection using a subsample of size l=20 for ACA. It does not provide explicit train/test/validation splits, proportions, or specific files for data partitioning. |
| Hardware Specification | No | The paper discusses computational bottlenecks and the use of the scipy.stats library, but it does not specify any particular hardware (e.g., GPU models, CPU types, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "the norm and multivariate normal modules of the scipy.stats library" and the "HUGIN system" for Bayesian networks, but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The parameters ai, bi are always set to µi − 3σi and µi + 3σi, respectively, where µi, σi are the empirical mean and standard deviation obtained from the observed values in dimension i. We run AIM for a number of candidate granularities (g1, …, gK) = (3, 5, 10, 20, 50, 100) [for 1d-b] and (g1, …, gK) = (3, 5, 8, 12, 20) [for 2d-m]. For each granularity, the AIM procedure is restarted several times with different initial parameter settings; the number of restarts is denoted #RS. ... obtain initial µ0, Σ0 by ACA on a subsample of size l = 20. ... For a given experimental setting we run 50 experiments along the same lines as described in Section 7.1.4. In each experiment the number of restarts is 10. |
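The discretization described in the Experiment Setup row (per-dimension bounds at µi ± 3σi, then a grid at each candidate granularity) can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: the function names, the use of NumPy, and the synthetic data are assumptions.

```python
import numpy as np

def discretization_bounds(observed, n_sd=3.0):
    """Per-dimension interval [a_i, b_i] = [mu_i - 3*sigma_i, mu_i + 3*sigma_i],
    computed from the empirical mean and standard deviation of the observed values."""
    mu = observed.mean(axis=0)
    sigma = observed.std(axis=0)
    return mu - n_sd * sigma, mu + n_sd * sigma

def grid_edges(a, b, granularity):
    """Cell edges for one dimension when splitting [a, b] into `granularity` bins."""
    return np.linspace(a, b, granularity + 1)

# Hypothetical 1-d Gaussian sample standing in for the paper's generated data.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=(500, 1))

a, b = discretization_bounds(data)
for g in (3, 5, 10, 20, 50, 100):  # candidate granularities quoted for the 1d-b scenario
    edges = grid_edges(a[0], b[0], g)  # AIM would be run on each discretization
```

In the paper's procedure, AIM is then run on each candidate discretization with multiple restarts (#RS) from different initial parameters; the sketch stops at constructing the grids.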