Information Lattice Learning

Authors: Haizi Yu, James A. Evans, Lav R. Varshney

JAIR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present applications in knowledge discovery, using ILL to distill music theory from scores and chemical laws from molecules and further revealing connections between them. We show ILL's efficacy and interpretability on benchmarks and assessments, as well as a demonstration of ILL-enhanced classifiers achieving human-level digit recognition using only one or a few MNIST training examples (1–10 per class)."
Researcher Affiliation | Academia | Haizi Yu (EMAIL) and James A. Evans (EMAIL), Knowledge Lab, University of Chicago, 1155 E 60th Street, Chicago, IL 60637 USA; Lav R. Varshney (EMAIL), Coordinated Science Lab, University of Illinois at Urbana-Champaign, 1308 W Main Street, Urbana, IL 61801 USA
Pseudocode | Yes | Algorithm 1: Add_partition(Pτ, P): adds a tagged partition Pτ to a partition poset (P, ⪯)
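The paper's Algorithm 1 is not reproduced in this report. As a rough illustration of the operation it names, here is a minimal sketch of inserting a tagged partition into a poset of partitions ordered by refinement; the set-of-frozensets representation, function names, and returned edge list are all assumptions, not the paper's code:

```python
# Hypothetical sketch of an Add_partition-style operation: insert a tagged
# partition into a partition poset ordered by refinement (P refines Q iff
# every block of P is contained in some block of Q).

def refines(p, q):
    """True if partition p refines partition q."""
    return all(any(block <= other for other in q) for block in p)

def add_partition(tagged, poset):
    """Add (tag, partition) to poset (a dict: tag -> partition).

    Returns the list of (finer_tag, coarser_tag) refinement relations
    induced by the new element.
    """
    tag, part = tagged
    edges = []
    for other_tag, other in poset.items():
        if refines(part, other):
            edges.append((tag, other_tag))
        if refines(other, part):
            edges.append((other_tag, tag))
    poset[tag] = part
    return edges
```

For example, adding a coarse partition {{0,1},{2,3}} to a poset already containing the finer {{0},{1},{2,3}} records a single edge from the finer to the coarser partition.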
Open Source Code | No | The paper does not provide an explicit statement of, or a link to, its own source code implementation. While it references external tools (e.g., Harmonia by Illiac Software, Inc.), it does not offer its own code for the methodology described.
Open Datasets | Yes | "We present applications in knowledge discovery, using ILL to distill music theory from scores and chemical laws from molecules and further revealing connections between them. We show ILL's efficacy and interpretability on benchmarks and assessments, as well as a demonstration of ILL-enhanced classifiers achieving human-level digit recognition using only one or a few MNIST training examples (1–10 per class). ... Signals are probability distributions of chords encoded as vectors of MIDI keys. Figure 8a shows such a signal: the frequency distribution of two-note chords extracted from the soprano and bass parts of Bach's C-score chorales (Illiac Software, Inc., 2020) ... Signals are Boolean-valued functions indicating the presence of compound formulae encoded as vectors of atomic numbers in a molecule database. Figure 8b shows a signal attained by collecting two-element compounds from the Materials Project database (Jain et al., 2013)."
Dataset Splits | Yes | "We test the performance of our ILL-enhanced Nearest-Neighbor and TextCaps, as well as the vanilla Nearest-Neighbor using pixel-wise Euclidean distance as baseline, in the regime of only a few training examples per class. With the training size growing from 1 image per class, we run the three models on the same training set and collect their prediction accuracies on the entire MNIST test set. ... In Figure 15, training examples are selected as the first k (k = 1, 2, ..., 20) images in the training set per class, which may be viewed as a random sample. One may carefully select (by hand or by algorithm) a training subset comprising distinct prototype digits to mimic what humans might naturally do: observe more but select less to memorize. Using one such selected training subset consisting of only 51 images in total (i.e., 5 per class), ILL-enhanced Nearest-Neighbor can still achieve 90% test accuracy"
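The vanilla baseline quoted above, a nearest-neighbor classifier under pixel-wise Euclidean distance with a few training images per class, can be sketched as follows. MNIST loading is omitted and the function name and toy arrays are illustrative assumptions:

```python
# Minimal sketch of the vanilla Nearest-Neighbor baseline: label each test
# image by the class of its closest training image under pixel-wise
# Euclidean (L2) distance on raw pixel vectors.
import numpy as np

def nearest_neighbor_predict(train_images, train_labels, test_images):
    """train_images: (n, d) floats; train_labels: (n,); test_images: (m, d)."""
    preds = []
    for x in test_images:
        dists = np.linalg.norm(train_images - x, axis=1)  # distance to each exemplar
        preds.append(train_labels[np.argmin(dists)])      # nearest neighbor's label
    return np.array(preds)
```

With k images per class, `train_images` simply holds those 10k flattened digits; the few-shot regime in the quote corresponds to k between 1 and 20.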
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. It only mentions general computing contexts like "running on the same training and test sets" without specifying the underlying hardware.
Software Dependencies | No | The paper mentions "python numpy.nan" in Appendix F as part of the instructions for an assignment, but it does not specify version numbers for Python, NumPy, or any other significant software libraries or frameworks used for the main experimental work. This is not sufficient to meet the requirement for specific versioned software dependencies.
Experiment Setup | Yes | "For these two illustrations, we fix the same priors F, S in (8)–(9), thus the same lattice. We fix the same parameters: the ϵ-path is 0.2 < 3.2 < 6.2 < ··· (tip: a small initial offset, e.g., 0.2, is used to achieve nearly-deterministic rules) and γ is 20% of the initial signal gap. This fixed setting is used to show generality and for comparison."