reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Recursive Teaching Dimension, VC-Dimension and Sample Compression

Authors: Thorsten Doliwa, Gaojian Fan, Hans Ulrich Simon, Sandra Zilles

JMLR 2014 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	This paper is concerned with various combinatorial parameters of classes that can be learned from a small set of examples. We show that the recursive teaching dimension, recently introduced by Zilles et al. (2008), is strongly connected to known complexity notions in machine learning, e.g., the self-directed learning complexity and the VC-dimension. To the best of our knowledge these are the ﬁrst results unveiling such relations between teaching and query learning as well as between teaching and the VC-dimension. It will turn out that for many natural classes the RTD is upper-bounded by the VCD, e.g., classes of VCdimension 1, intersection-closed classes and ﬁnite maximum classes. However, we will also show that there are certain (but rare) classes for which the recursive teaching dimension exceeds the VC-dimension. Moreover, for maximum classes, the combinatorial structure induced by the RTD, called teaching plan, is highly similar to the structure of sample compression schemes. Indeed one can transform any repetition-free teaching plan for a maximum class C into an unlabeled sample compression scheme for C and vice versa, while the latter is produced by (i) the corner-peeling algorithm of Rubinstein and Rubinstein (2012) and (ii) the tail matching algorithm of Kuzmin and Warmuth (2007).
Researcher Affiliation	Academia	Thorsten Doliwa EMAIL Faculty of Mathematics Ruhr-Universität Bochum, D-44780 Bochum, Germany Gaojian Fan EMAIL Department of Computing Science University of Alberta, Edmonton, AB, T6G 2E8, Canada Hans Ulrich Simon EMAIL Horst-Görtz Institute for IT Security and Faculty of Mathematics Ruhr-Universität Bochum, D-44780 Bochum, Germany Sandra Zilles EMAIL Department of Computer Science University of Regina, Regina, SK, S4S 0A2, Canada
Pseudocode	No	The paper describes mathematical concepts, definitions (e.g., Deﬁnition 1, Deﬁnition 2, Deﬁnition 5, Deﬁnition 6), lemmas (e.g., Lemma 3, Lemma 7, Lemma 10, Lemma 12, Lemma 13, Lemma 14, Lemma 16, Lemma 22, Lemma 23, Lemma 24, Lemma 27, Lemma 30, Lemma 33) and theorems (e.g., Theorem 11, Theorem 15, Theorem 17, Theorem 20, Theorem 26, Theorem 29, Theorem 31, Theorem 32) along with their proofs. There are no structured pseudocode blocks or algorithms explicitly labeled or formatted as such.
Open Source Code	No	The paper makes no explicit mention of open-source code availability, links to code repositories, or code provided in supplementary materials.
Open Datasets	No	The paper primarily deals with theoretical concept classes and mathematical constructs (e.g., 'domain X', 'd-dimensional boxes over domain [n]d', 'Warmuth’s class CW'). It does not describe or utilize any empirical datasets for experiments, nor does it provide any links, DOIs, repository names, or formal citations for publicly available datasets.
Dataset Splits	No	The paper is theoretical and does not involve experimental evaluation on datasets, therefore, there is no mention of training, validation, or test dataset splits.
Hardware Specification	No	As the paper presents theoretical research without empirical experiments, there is no mention of hardware specifications used for computation.
Software Dependencies	No	The paper is theoretical and does not describe any implemented systems or experiments, thus it does not list any specific software dependencies or version numbers.
Experiment Setup	No	The paper is theoretical and focuses on mathematical proofs and definitions, rather than empirical experiments. Therefore, it does not include details on experimental setup, hyperparameters, or system-level training settings.