High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws

Authors: Muhammed Ildiz, Halil Gozeten, Ege Taga, Marco Mondelli, Samet Oymak

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our results on numerical experiments both on ridgeless regression and on neural network architectures. In Figure 2a, we examine the surrogate-to-target model in the context of image classification. Specifically, we fine-tune a pretrained ResNet-50 model (He et al., 2015) using both ground-truth labels and predictions from a surrogate (weak) model on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009).
Researcher Affiliation | Academia | M. Emrullah Ildiz, Halil Alperen Gozeten, Ege Onur Taga (University of Michigan, Ann Arbor); Marco Mondelli (Institute of Science and Technology Austria); Samet Oymak (University of Michigan, Ann Arbor)
Pseudocode | No | The paper primarily focuses on theoretical derivations and analysis. It does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured steps formatted like code.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide any links to a code repository.
Open Datasets | Yes | Specifically, we fine-tune a pretrained ResNet-50 model (He et al., 2015) using both ground-truth labels and predictions from a surrogate (weak) model on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009).
Dataset Splits | Yes | In the CIFAR-10 experiment, we initially trained the surrogate models on the training portion of the CIFAR-10 dataset. ... During testing, all models were evaluated using the test portion of the CIFAR-10 dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types) used for running the experiments.
Software Dependencies | No | We initialize the optimizer for our model using stochastic gradient descent (SGD) provided by the optim module of PyTorch. (No specific version numbers are given for PyTorch or other libraries.)
Experiment Setup | Yes | The optimizer is configured with the following parameters: learning rate set to 0.01, momentum to 0.9, and weight decay to 5 × 10⁻⁴. Additionally, we define a learning rate scheduler, specifically a cosine annealing scheduler, which adjusts the learning rate using a cosine function over 200 iterations, denoted T_max. We use a batch size of 32 and train all models over 60 epochs.
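The cosine annealing schedule quoted above follows the closed form η_t = η_min + (η_max − η_min)(1 + cos(πt/T_max))/2 used by PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR`. A minimal pure-Python sketch of that schedule, assuming the default eta_min = 0 and the paper's stated base learning rate and T_max:

```python
import math

def cosine_annealing_lr(step: int, base_lr: float = 0.01,
                        t_max: int = 200, eta_min: float = 0.0) -> float:
    """Learning rate at `step` under cosine annealing (closed form,
    matching torch.optim.lr_scheduler.CosineAnnealingLR with eta_min=0)."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2

# The rate starts at base_lr, halves at t_max/2, and decays to eta_min at t_max.
print(cosine_annealing_lr(0))    # 0.01
print(cosine_annealing_lr(100))  # ~0.005
```

In the paper's setup this schedule would be paired with `torch.optim.SGD(params, lr=0.01, momentum=0.9, weight_decay=5e-4)` and `CosineAnnealingLR(optimizer, T_max=200)`; the sketch above only illustrates the learning-rate curve itself.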