Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The AIM and EM Algorithms for Learning from Coarse Data
Authors: Manfred Jaeger
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the practical usability of the AIM algorithm by prototype implementations for parameter learning from continuous Gaussian data, and from discrete Bayesian network data. Extensive experiments show that the theoretical differences between AIM and EM can be observed in practice, and that a combination of the two methods leads to robust performance for both ignorable and non-ignorable mechanisms. |
| Researcher Affiliation | Academia | Manfred Jaeger, Department of Computer Science, Aalborg University, Selma Lagerlöfs Vej 300, 9220 Aalborg, Denmark |
| Pseudocode | Yes | Algorithm 1: The AIM procedure; Algorithm 2: Pseudo code for AIM using discretization for coarse continuous data; Algorithm 3: Pseudo code for local optimization routine; Algorithm 4: AIM for Bayesian Networks; Algorithm 5: Incremental AI step; Algorithm 6: Local improvement step |
| Open Source Code | Yes | The source code for the experiments of Section 7.1 is available at https://github.com/manfred-jaeger-aalborg/aim_for_gauss. |
| Open Datasets | No | The paper generates synthetic data from Gaussian distributions for its 1d-b and 2d-m scenarios. It also mentions the "traditional Asia network" but provides no access information (link, DOI, or formal citation) for this or any other publicly available dataset, and it does not use pre-existing datasets for the Gaussian experiments. |
| Dataset Splits | No | The paper describes generating data with specified sample sizes (N), and initial parameter selection using a subsample of size l=20 for ACA. It does not provide explicit train/test/validation splits, proportions, or specific files for data partitioning. |
| Hardware Specification | No | The paper discusses computational bottlenecks and the use of the scipy.stats library, but it does not specify any particular hardware (e.g., GPU models, CPU types, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "the norm and multivariate normal modules of the scipy.stats library" and the "HUGIN system" for Bayesian networks, but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The parameters ai, bi are always set to µi − 3σi and µi + 3σi, respectively, where µi, σi are the empirical mean and standard deviation obtained from the observed values in dimension i. We run AIM for a number of candidate granularities (g1, …, gK) = (3, 5, 10, 20, 50, 100) [for 1d-b] and (g1, …, gK) = (3, 5, 8, 12, 20) [for 2d-m]. For each granularity, the AIM procedure is restarted several times with different initial parameter settings; the number of restarts is denoted #RS. ... obtain initial µ0, Σ0 by ACA on a subsample of size l = 20. ... For a given experimental setting we run 50 experiments along the same lines as described in Section 7.1.4. In each experiment the number of restarts is 10. |
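The discretization described in the Experiment Setup row (per-dimension bounds at µi ± 3σi, then a grid at each candidate granularity) can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: the function names, the use of NumPy, and the synthetic data are assumptions.

```python
import numpy as np

def discretization_bounds(observed, n_sd=3.0):
    """Per-dimension interval [a_i, b_i] = [mu_i - 3*sigma_i, mu_i + 3*sigma_i],
    computed from the empirical mean and standard deviation of the observed values."""
    mu = observed.mean(axis=0)
    sigma = observed.std(axis=0)
    return mu - n_sd * sigma, mu + n_sd * sigma

def grid_edges(a, b, granularity):
    """Cell edges for one dimension when splitting [a, b] into `granularity` bins."""
    return np.linspace(a, b, granularity + 1)

# Hypothetical 1-d Gaussian sample standing in for the paper's generated data.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=(500, 1))

a, b = discretization_bounds(data)
for g in (3, 5, 10, 20, 50, 100):  # candidate granularities quoted for the 1d-b scenario
    edges = grid_edges(a[0], b[0], g)  # AIM would be run on each discretization
```

In the paper's procedure, AIM is then run on each candidate discretization with multiple restarts (#RS) from different initial parameters; the sketch stops at constructing the grids.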