Distributionally Robust Classification on a Data Budget
Authors: Benjamin Feuer, Ameya Joshi, Minh Pham, Chinmay Hegde
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To rigorously address this question, we introduce JANuS (Joint Annotations and Names Set), a collection of four new training datasets with images, labels, and corresponding captions, and perform a series of carefully controlled investigations of factors contributing to robustness in image classification, then compare those results to findings derived from a large-scale meta-analysis. Using this approach, we show that a standard ResNet-50 trained with the cross-entropy loss on 2.4 million image samples can attain comparable robustness to a CLIP ResNet-50 trained on 400 million samples. |
| Researcher Affiliation | Academia | Benjamin Feuer EMAIL Department of Computer Science and Engineering New York University 370 Jay St., 11 Fl., Brooklyn, NY, 11201 Ameya Joshi EMAIL Department of Electrical and Computer Engineering New York University Minh Pham EMAIL Department of Computer Science and Engineering New York University Chinmay Hegde EMAIL Department of Computer Science and Engineering New York University |
| Pseudocode | No | The paper includes diagrams (e.g., Figure 14 'Subset matching; an overview.') that illustrate processes, but it does not contain structured pseudocode blocks or sections explicitly labeled 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | In order to enable future research and reproducibility, we release our code, our dataset, and a complete enumeration of our results for all models in the study (see supplemental attachments). |
| Open Datasets | Yes | To rigorously address this question, we introduce JANuS (Joint Annotations and Names Set), a collection of four new training datasets with images, labels, and corresponding captions... In order to enable future research and reproducibility, we release our code, our dataset, and a complete enumeration of our results for all models in the study (see supplemental attachments). |
| Dataset Splits | Yes | Shifts. Following Radford et al. (2021), we focus on the following four distribution shifts: ImageNet-Sketch, ImageNet-R, ImageNet-A, and ImageNet-V2, for our evaluation. For ImageNet-R and ImageNet-A, which are subsets of ImageNet, we evaluate only the 35 shared classes. We reference the validation sets for these shifts on IN100 as IN100-V2, IN100-S, IN100-R, and IN100-A. |
| Hardware Specification | Yes | Models are typically distributed across a single node with 4 NVIDIA A100 GPUs; our largest models were trained on 16 NVIDIA GPUs. |
| Software Dependencies | No | The paper mentions using the 'AMP library' and refers to 'pytorch-image-models (timm) repository (Wightman, 2019)' and 'open-clip repository (Ilharco et al., 2021)'. However, it does not provide specific version numbers for these software components, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | In all our model training experiments, we train with mixed precision, at a batch size of 256, and do not use gradient clipping. We use the AMP library to implement the training process. Model hyperparameters are chosen via grid search. All JANuS models were trained for 256 epochs. Following Santurkar et al. (2022), we use SimCLR augmentations (resize, crop, flip, jitter, blur, grayscale) rather than CLIP augmentations (resize and crop) for model training. |
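The shared-class evaluation quoted above (scoring ImageNet-R and ImageNet-A only on the 35 classes they share with IN100) can be sketched as follows. This is a minimal illustration, not the authors' released code; the function name and toy class indices are hypothetical.

```python
def eval_on_shared_classes(predictions, labels, shared_classes):
    """Accuracy computed only over examples whose label lies in the shared set.

    predictions, labels: parallel sequences of integer class indices.
    shared_classes: class indices present in both the shift and the base dataset.
    """
    shared = set(shared_classes)
    # Keep only examples belonging to the shared classes.
    pairs = [(p, y) for p, y in zip(predictions, labels) if y in shared]
    if not pairs:
        return 0.0
    correct = sum(1 for p, y in pairs if p == y)
    return correct / len(pairs)

# Toy example: four predictions, but only labels {0, 1} are shared,
# so the example with label 2 is excluded from scoring.
acc = eval_on_shared_classes([0, 1, 2, 1], [0, 2, 2, 1], shared_classes={0, 1})
```

Note that restricting by the *label* (rather than the prediction) matches the usual convention for subset evaluation: an example is in scope if its ground-truth class exists in both datasets.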
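The setup row states that hyperparameters were chosen via grid search but does not enumerate the grid. The sketch below shows the generic procedure with a placeholder search space and a stand-in scoring function; the ranges and the `validation_accuracy` stub are assumptions for illustration only, not the authors' actual grid.

```python
import itertools

# Hypothetical search space; the paper does not specify its grid.
grid = {
    "lr": [0.1, 0.01, 0.001],
    "weight_decay": [1e-4, 5e-5],
}

def validation_accuracy(lr, weight_decay):
    # Stand-in for training a model with these settings and
    # scoring it on held-out data.
    return -abs(lr - 0.01) - weight_decay

# Exhaustively evaluate every combination and keep the best.
best_cfg, best_score = None, float("-inf")
for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    score = validation_accuracy(**cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score
```

With the toy scorer above, the search selects `lr=0.01` with the smaller weight decay; in practice each grid point would correspond to a full training run evaluated on a validation split.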