Class-wise Generalization Error: An Information-Theoretic Analysis
Authors: Firas Laakom, Moncef Gabbouj, Jürgen Schmidhuber, Yuheng Bu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our proposed bounds in various neural networks and show that they accurately capture the complex class-generalization behavior. Moreover, we demonstrate that the theoretical tools developed in this work can be applied in several other applications. |
| Researcher Affiliation | Academia | Firas Laakom (Center of Excellence for Generative AI, KAUST, Saudi Arabia); Moncef Gabbouj (Faculty of Information Technology and Communication Sciences, Tampere University, Finland); Jürgen Schmidhuber (Center of Excellence for Generative AI, KAUST, Saudi Arabia; The Swiss AI Lab, IDSIA, Switzerland); Yuheng Bu (Department of Computer Science, University of California, Santa Barbara, CA, USA) |
| Pseudocode | No | The paper includes definitions, theorems, lemmas, and proofs, but does not contain any clearly labeled pseudocode or algorithm blocks. Procedures are described in prose. |
| Open Source Code | No | The paper states: "We use the same setup as in Harutyunyan et al. (2021), where the code is publicly available3." Footnote 3 points to a GitHub repository for Harutyunyan et al.'s work. This indicates the authors used code from prior work, not that they are releasing their own code for the methodology described in *this* paper. |
| Open Datasets | Yes | We empirically validate our proposed bounds in various neural networks using CIFAR10 and its noisy variant in Section 4. We use the same experimental settings as in Harutyunyan et al. (2021), i.e., we fine-tune a ResNet-50 (He et al., 2016) on the CIFAR10 dataset (Krizhevsky et al., 2009) (pretrained (Schmidhuber, 1992) on ImageNet (Deng et al., 2009)). The empirical evaluation of our bounds for generalization with sensitive attributes using the COMPAS (Larson et al., 2016) dataset is shown in Figure 5. Additionally, in Figure 13 of Appendix C.7, we provide extra results using the Adult (Kohavi & Becker, 1996) dataset. The main results for both approaches are presented in Figures 11 and 12, respectively. As can be seen, the results for both approaches are consistent with the neural network experiments, further confirming the ability of our bounds to capture the complex behavior of class-generalization. (Referring to MNIST). |
| Dataset Splits | No | The paper states: "For every number of training data n, we run m1 number of Monte-Carlo trials, i.e., we select m1 different 2n samples from the original dataset. Then, for each z[2n], we draw m2 different train/test splits, i.e., m2 random realizations of U." This describes a method of generating splits for their supersample setting but does not provide specific percentages or absolute counts for standard train/validation/test sets for the datasets used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using ResNet-50 and SGD but does not name the software frameworks or libraries used, nor any version numbers. |
| Experiment Setup | Yes | The training is conducted for 40 epochs using SGD with a learning rate of 0.01 and a batch size of 256. |
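The supersample split protocol quoted under "Dataset Splits" above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' code: the function name `supersample_splits` and all parameter names except `n`, `m1`, and `m2` are hypothetical, and the pairing of the 2n samples follows the standard supersample construction (each of the n pairs contributes one sample to train and one to test, according to a random bit U).

```python
import random

def supersample_splits(dataset, n, m1, m2, seed=0):
    """Sketch of the supersample protocol described in the paper
    (names here are hypothetical, not from the authors' code):
    for each of m1 Monte-Carlo trials, draw 2n samples z^[2n];
    then draw m2 random realizations of U, each assigning one
    sample of every pair to train and the other to test."""
    rng = random.Random(seed)
    for _ in range(m1):                      # m1 Monte-Carlo trials
        z_2n = rng.sample(dataset, 2 * n)    # select 2n samples
        pairs = [(z_2n[2 * i], z_2n[2 * i + 1]) for i in range(n)]
        for _ in range(m2):                  # m2 realizations of U
            u = [rng.randint(0, 1) for _ in range(n)]
            train = [p[ui] for p, ui in zip(pairs, u)]
            test = [p[1 - ui] for p, ui in zip(pairs, u)]
            yield train, test
```

Each yielded pair is one train/test realization; iterating the generator exhausts all m1 × m2 splits, which is why the paper reports Monte-Carlo averages rather than fixed split percentages.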