Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
What do larger image classifiers memorise?
Authors: Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a comprehensive empirical analysis of this question on image classification benchmarks. We find that training examples exhibit an unexpectedly diverse set of memorisation trajectories across model sizes: most samples experience decreased memorisation under larger models, while the rest exhibit cap-shaped or increasing memorisation. We show that various proxies for the Feldman (2019) memorisation score fail to capture these fundamental trends. Lastly, we find that knowledge distillation, an effective and popular model compression technique, tends to inhibit memorisation, while also improving generalisation. Specifically, memorisation is mostly inhibited on examples with increasing memorisation trajectories, thus pointing at how distillation improves generalisation. |
| Researcher Affiliation | Industry | Michal Lukasik EMAIL Google Research Vaishnavh Nagarajan EMAIL Google Research Ankit Singh Rawat EMAIL Google Research Aditya Krishna Menon EMAIL Google Research Sanjiv Kumar EMAIL Google Research |
| Pseudocode | No | The paper describes methodologies using mathematical equations and textual descriptions but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about the release of source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Specifically, for a range of standard image classification datasets (CIFAR-10, CIFAR-100, and Tiny-ImageNet) we empirically quantify the memorisation score as we vary the capacity of standard neural models, based on the ResNet (He et al., 2016b;c) and MobileNet-v3 (Howard et al., 2019b) family |
| Dataset Splits | Yes | In Table 5 we report train and test accuracies across architectures on CIFAR-100 from the one-hot training, while in Table 6 we report train and test accuracies from the distillation training across teachers of varying depths. We find that increasing depth results in models that interpolate the training set, while also generalising better on the test set. We also show how distillation worsens train accuracy while improving the test accuracy. |
| Hardware Specification | No | The paper mentions using ResNet and MobileNet architectures on various datasets, but it does not provide any specific details about the hardware (e.g., GPU, CPU models) used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly list any specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be required for replication. |
| Experiment Setup | Yes | We train all models to minimise the softmax cross-entropy loss via minibatch SGD, with hyperparameter settings per Table 4. Table 4 (summary of training hyperparameter settings; values given as CIFAR-10* / Tiny-ImageNet): weight decay 10e-4 / 5e-4; batch size 1024 / 256; epochs 450 / 90; peak learning rate 1.0 / 0.1; learning rate warmup epochs 15 / 5; learning rate decay factor 0.1 / cosine schedule; learning rate decay epochs 200, 300, 400 / N/A; Nesterov momentum 0.9 / 0.9; distillation weight 1.0 / 1.0; distillation temperature 3.0 / 1.0. |
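The Experiment Setup row lists a warmup period, a peak learning rate, and step-decay milestones for the CIFAR runs. A minimal sketch of the implied schedule, assuming linear warmup from zero to the peak rate (the paper's quoted table gives only the values, not the warmup shape, so the warmup form and the function name here are illustrative assumptions):

```python
# Sketch of the learning-rate schedule implied by the quoted Table 4
# (CIFAR-10*/CIFAR-100 column): linear warmup to the peak rate, then
# step decay by the listed factor at the listed epochs. The warmup
# shape is an assumption; the report only quotes the hyperparameters.

def lr_at_epoch(epoch, peak_lr=1.0, warmup_epochs=15,
                decay_factor=0.1, decay_epochs=(200, 300, 400)):
    """Learning rate at a given (0-indexed) training epoch."""
    if epoch < warmup_epochs:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * (epoch + 1) / warmup_epochs
    # Apply the decay factor once per milestone already passed.
    n_decays = sum(epoch >= e for e in decay_epochs)
    return peak_lr * decay_factor ** n_decays

# A few points of the 450-epoch CIFAR run.
print([round(lr_at_epoch(e), 4) for e in (0, 14, 100, 250, 449)])
# → [0.0667, 1.0, 1.0, 0.1, 0.001]
```

The Tiny-ImageNet column would instead use a cosine decay after warmup, per the quoted table; only the milestone-based CIFAR variant is sketched here.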