Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
What do larger image classifiers memorise?
Authors: Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a comprehensive empirical analysis of this question on image classification benchmarks. We find that training examples exhibit an unexpectedly diverse set of memorisation trajectories across model sizes: most samples experience decreased memorisation under larger models, while the rest exhibit cap-shaped or increasing memorisation. We show that various proxies for the Feldman (2019) memorisation score fail to capture these fundamental trends. Lastly, we find that knowledge distillation, an effective and popular model compression technique, tends to inhibit memorisation, while also improving generalisation. Specifically, memorisation is mostly inhibited on examples with increasing memorisation trajectories, thus pointing at how distillation improves generalisation. |
| Researcher Affiliation | Industry | Michal Lukasik EMAIL Google Research Vaishnavh Nagarajan EMAIL Google Research Ankit Singh Rawat EMAIL Google Research Aditya Krishna Menon EMAIL Google Research Sanjiv Kumar EMAIL Google Research |
| Pseudocode | No | The paper describes methodologies using mathematical equations and textual descriptions but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about the release of source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Specifically, for a range of standard image classification datasets (CIFAR-10, CIFAR-100, and Tiny-ImageNet) we empirically quantify the memorisation score as we vary the capacity of standard neural models, based on the ResNet (He et al., 2016b;c) and MobileNet-v3 (Howard et al., 2019b) family |
| Dataset Splits | Yes | In Table 5 we report train and test accuracies across architectures on CIFAR-100 from the one-hot training, while in Table 6 we report train and test accuracies from the distillation training across teachers of varying depths. We find that increasing depth results in models that interpolate the training set, while also generalising better on the test set. We also show how distillation worsens train accuracy while improving the test accuracy. |
| Hardware Specification | No | The paper mentions using ResNet and MobileNet architectures on various datasets, but it does not provide any specific details about the hardware (e.g., GPU, CPU models) used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly list any specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be required for replication. |
| Experiment Setup | Yes | We train all models to minimise the softmax cross-entropy loss via minibatch SGD, with hyperparameter settings per Table 4. Table 4 (summary of training hyperparameter settings; values given as CIFAR-10* / Tiny-ImageNet): weight decay 10e-4 / 5e-4; batch size 1024 / 256; epochs 450 / 90; peak learning rate 1.0 / 0.1; learning rate warmup epochs 15 / 5; learning rate decay factor 0.1 / cosine schedule; learning rate decay epochs 200, 300, 400 / N/A; Nesterov momentum 0.9 / 0.9; distillation weight 1.0 / 1.0; distillation temperature 3.0 / 1.0. |
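The Experiment Setup row lists a warmup period, a peak learning rate, and step-decay milestones for the CIFAR runs. A minimal sketch of the implied schedule, assuming linear warmup from zero to the peak rate (the paper's quoted table gives only the values, not the warmup shape, so the warmup form and the function name here are illustrative assumptions):

```python
# Sketch of the learning-rate schedule implied by the quoted Table 4
# (CIFAR-10*/CIFAR-100 column): linear warmup to the peak rate, then
# step decay by the listed factor at the listed epochs. The warmup
# shape is an assumption; the report only quotes the hyperparameters.

def lr_at_epoch(epoch, peak_lr=1.0, warmup_epochs=15,
                decay_factor=0.1, decay_epochs=(200, 300, 400)):
    """Learning rate at a given (0-indexed) training epoch."""
    if epoch < warmup_epochs:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * (epoch + 1) / warmup_epochs
    # Apply the decay factor once per milestone already passed.
    n_decays = sum(epoch >= e for e in decay_epochs)
    return peak_lr * decay_factor ** n_decays

# A few points of the 450-epoch CIFAR run.
print([round(lr_at_epoch(e), 4) for e in (0, 14, 100, 250, 449)])
# → [0.0667, 1.0, 1.0, 0.1, 0.001]
```

The Tiny-ImageNet column would instead use a cosine decay after warmup, per the quoted table; only the milestone-based CIFAR variant is sketched here.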