On Local Overfitting and Forgetting in Deep Neural Networks

Authors: Uri Stern, Tomer Yaacoby, Daphna Weinshall

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical evaluations demonstrate the efficacy of our method across multiple datasets, contemporary neural network architectures, and training protocols. In Section 6 we describe the empirical validation of our method in a series of experiments over image classification datasets with and without label noise, using various network architectures, including in particular modern networks over Imagenet.
Researcher Affiliation | Academia | Uri Stern, Tomer Yaacoby and Daphna Weinshall, School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 91904, Israel, EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Knowledge Fusion (KF). Input: checkpoints of trained model {n_0, ..., n_E}, window w, test point x. Output: prediction for x. Steps: {A_1, ..., A_k}, {ε_1, ..., ε_k} ← calc_early_forget({n_0, ..., n_E}); prob ← get_class_probs[E]; for i = 1 to k do: prob_A ← mean(get_class_probs[A_i − w : A_i + w]); prob ← ε_i · prob_A + (1 − ε_i) · prob; end for; prediction ← argmax(prob); return prediction.
Open Source Code | No | The paper does not provide an explicit statement about releasing code or a link to a code repository. It refers to "Full implementation details are provided in App. E." and to the "complete archived version of this paper (Stern, Yaacoby, and Weinshall 2024)" for appendices, but this does not confirm the release of source code.
Open Datasets | Yes | We use various image classification datasets, neural network architectures, and training schemes. The main results are presented in Tables 1-3, followed by a brief review of our extensive ablation study and additional comparisons in Section 6.2. All references to appendices below are to be found in the complete archived version of this paper (Stern, Yaacoby, and Weinshall 2024). Specifically, in Table 1 we report results while using multiple architectures trained on CIFAR-100, Tiny Imagenet and Imagenet, with different learning rate schedulers and optimizers. For comparison, we report the results of both the original predictor and some baselines. Additional results for scenarios connected to overfitting are shown in Table 2 and App. F, where we test our method on these datasets with injected symmetric and asymmetric label noise (see App. E), as well as on a real label noise dataset (Animal10N).
Dataset Splits | Yes | In each experiment we use half of the test data for validation, to compute our method's hyper-parameters (the list of alternative epochs and {ε_i}), and then test the result on the remaining test data. The accuracy reported here is only on the remaining test data, averaged over three random splits of validation and test data, using different random seeds. In App. G.1 we report results on the original train/test split, where a subset of the training data is set aside for hyper-parameter tuning.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. It mentions various neural network architectures (ResNet, ConvNeXt, ViT, MaxViT) and optimizers (SGD, AdamW), but no hardware details.
Software Dependencies | No | The paper does not specify any software names with version numbers. While it implicitly uses deep learning frameworks, no versions for libraries like Python, PyTorch, or CUDA are provided.
Experiment Setup | Yes | In Fig. 3a we report the results, showing that all networks forget some portion of the data during training as in the label noise scenario, even if the test accuracy never decreases. In Section 6.1, the paper mentions using "various image classification datasets, neural network architectures, and training schemes" with "different learning rate schedulers and optimizers (SGD, AdamW)". It also states: "In each experiment we use half of the test data for validation, to compute our method's hyper-parameters (the list of alternative epochs and {ε_i})" and "In our experiments, we use a fixed window w = 1". Furthermore, "Full implementation details are provided in App. E."
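The Knowledge Fusion pseudocode quoted in the Pseudocode row above can be turned into a short sketch. Since the paper does not release source code, the function below is an illustrative reconstruction under stated assumptions: the function name, the per-checkpoint probability array, and the inclusive window slicing are assumptions, and the early-forget analysis that produces the alternative epochs {A_i} and weights {ε_i} is not re-implemented here.

```python
import numpy as np

def knowledge_fusion(probs_per_epoch, alt_epochs, eps, w=1):
    """Sketch of the KF prediction rule (Algorithm 1) for one test point x.

    probs_per_epoch : array-like of shape (E+1, num_classes); the class
        probabilities that checkpoints n_0, ..., n_E assign to x.
    alt_epochs : alternative epochs {A_1, ..., A_k}, assumed to come from
        the paper's early-forget analysis (calc_early_forget), not shown.
    eps : mixing weights {eps_1, ..., eps_k}, one per alternative epoch.
    w : window half-width around each alternative epoch (the paper fixes w = 1).
    """
    probs_per_epoch = np.asarray(probs_per_epoch, dtype=float)
    # start from the prediction of the final checkpoint n_E
    prob = probs_per_epoch[-1]
    for a, e in zip(alt_epochs, eps):
        # average class probabilities over the window [A_i - w, A_i + w],
        # assumed inclusive on both ends
        prob_a = probs_per_epoch[a - w : a + w + 1].mean(axis=0)
        # fuse the window average with the running prediction
        prob = e * prob_a + (1 - e) * prob
    return int(np.argmax(prob))
```

With ε_i = 0 for all i the rule reduces to the final checkpoint's prediction, which matches the pseudocode's initialization of prob from get_class_probs[E].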