On the Use of Anchoring for Training Vision Models
Authors: Vivek Sivaraman Narayanaswamy, Kowshik Thopalli, Rushil Anirudh, Yamen Mubarka, Wesam Sakla, Jay Thiagarajan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our proposed approach across datasets and architectures of varying scales and complexities, demonstrating substantial performance gains in generalization and safety metrics compared to the standard training protocol. |
| Researcher Affiliation | Collaboration | Vivek Narayanaswamy (Lawrence Livermore National Laboratory); Kowshik Thopalli (Lawrence Livermore National Laboratory); Rushil Anirudh (Amazon); Yamen Mubarka (Lawrence Livermore National Laboratory); Wesam Sakla (Lawrence Livermore National Laboratory); Jayaraman J. Thiagarajan (Lawrence Livermore National Laboratory) |
| Pseudocode | Yes | Figure 3: PyTorch-style pseudocode for our proposed approach. |
| Open Source Code | Yes | The open-source code is available at https://software.llnl.gov/anchoring |
| Open Datasets | Yes | (i) CIFAR-10 and (ii) CIFAR-100 [13] datasets each contain 50,000 training samples and 10,000 test samples of size 32 × 32, belonging to 10 and 100 classes, respectively; (iii) ImageNet-1K [14] is a large-scale vision benchmark comprising 1.3 million training images and 50,000 validation images across 1000 diverse categories. |
| Dataset Splits | Yes | ImageNet-1K [14] is a large-scale vision benchmark comprising 1.3 million training images and 50,000 validation images across 1000 diverse categories. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. It mentions using 'high-capacity architectures' but no specific hardware. |
| Software Dependencies | No | The paper mentions 'PyTorch style pseudo code' and references 'https://pytorch.org/vision', implying the use of PyTorch, but it does not specify any software components with version numbers (e.g., PyTorch 1.x, Python 3.x). |
| Experiment Setup | Yes | Choice of α. Through extensive empirical studies with multiple architectures, we found that using the masking schedule hyper-parameter α = 0.2 (corresponding to every 5th batch in an epoch) leads to stable convergence (closely matching the top-1 validation accuracy of standard training) on ImageNet, and α = 0.25 for CIFAR-10/100. Note that our approach performs reference masking for an entire batch, as determined by α. We have included our analysis of the impact of the choice of α in Section A.1. Table 5 outlines the recipes (augmentations, epochs, optimizers) leveraged for model training. |
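The anchoring recipe quoted above can be sketched in a framework-agnostic way. This is a minimal illustration, not the authors' released implementation (which is linked above): it assumes the standard anchoring formulation, where an input x is re-expressed as a reference r plus the residual x − r concatenated along the channel axis, and where the paper's reference-masking step zeroes the reference for an entire batch at a rate set by α. The function names and the use of NumPy are illustrative assumptions.

```python
import numpy as np


def anchored_batch(x, rng, mask_reference=False):
    """Build anchored inputs by concatenating a reference r and the
    residual (x - r) along the channel axis.

    x: array of shape (batch, channels, height, width).
    mask_reference: if True, zero the reference for the whole batch,
    mimicking the paper's batch-level reference masking.
    """
    # Draw references by shuffling the batch (a common anchoring choice;
    # assumed here, not prescribed by the quoted text).
    r = x[rng.permutation(len(x))]
    if mask_reference:
        r = np.zeros_like(r)
    # Doubled channel count: the network's first layer must accept 2C channels.
    return np.concatenate([r, x - r], axis=1)


def should_mask(batch_idx, alpha=0.2):
    """Masking schedule: mask every (1/alpha)-th batch, matching the
    paper's note that alpha = 0.2 corresponds to every 5th batch."""
    period = round(1.0 / alpha)
    return batch_idx % period == 0
```

With α = 0.2, batches 0, 5, 10, … would be trained with a zeroed reference, so the model also learns to handle the raw input (residual x − 0 = x) without an informative anchor.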