On Good Practices for Task-Specific Distillation of Large Pretrained Visual Models
Authors: Juliette Marrie, Michael Arbel, Julien Mairal, Diane Larlus
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments are conducted using DINOv2 teachers (Oquab et al., 2024), recognized for providing strong baselines (see Section 4.2) and extended to EVA-02 MIM- and CLIP-pretrained models (Fang et al., 2023; Sun et al., 2023) (see Appendix B.3). Our findings, summarized below, are validated across different architectures and various tasks: classification on specific image modalities, fine-grained classification, and semantic segmentation. |
| Researcher Affiliation | Collaboration | Juliette Marrie (NAVER LABS Europe; Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble); Michael Arbel (Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble); Julien Mairal (Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble); Diane Larlus (NAVER LABS Europe) |
| Pseudocode | No | The paper describes methods and processes in narrative text and figures, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | Project page: https://europe.naverlabs.com/tskd is provided, but it is a general project page and not a direct link to a source code repository or an explicit statement of code release for the methodology described in the paper. |
| Open Datasets | Yes | For classification, we consider the painting, sketch and clipart datasets from DomainNet (Peng et al., 2019)... Fine-grained classification is conducted on the CUB (Wah et al., 2011), FGVC Aircraft (Maji et al., 2013) and DTD (Cimpoi et al., 2014) datasets... Finally, we use three benchmarks for segmentation: ADE20K (Zhou et al., 2017), Cityscapes (Cordts et al., 2016), and the augmented Pascal VOC (Everingham et al., 2010). |
| Dataset Splits | Yes | For classification, we consider the painting, sketch and clipart datasets from DomainNet (Peng et al., 2019), each composed of the same 345 classes, for which we isolate 20% of the training set for testing... In instances where no predefined validation set exists, we allocate 10% of the training set for this purpose. |
| Hardware Specification | Yes | All our experiments were performed on a single GPU (either V100 or A100)... For example, finetuning the ViT-S for ADE20K takes 16 hours on an A100 GPU, while distillation with data augmentation based on Stable Diffusion takes 55 hours, and probing the ViT-g takes 14 hours... our image mixing procedure (Pinkney, 2022) roughly takes 2 hours for 1000 images (on a V100 GPU). |
| Software Dependencies | No | The paper mentions optimizers (AdamW, SGD) and a software tool (mmsegmentation) but does not provide specific version numbers for any key software components or libraries. |
| Experiment Setup | Yes | Probing runs for 20 epochs for ViT-L/g and 30 epochs for ViT-S, while finetuning lasts for 50 epochs for ViT-L and 80 epochs for ViT-S. We use the AdamW optimizer for training ViTs and SGD with momentum for ResNet-50, and a cosine scheduler in both cases. We use a fixed distillation temperature of T = 2 and a constant weighting between Ltask and Ldistill set to α = 0.5 for all experiments. The selection of weight decay and learning rate is determined through a grid search on the validation set, with specific details available in the appendix. |
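The quoted setup (temperature T = 2, constant weighting α = 0.5 between the task and distillation losses) can be sketched as the standard convex combination of a cross-entropy task loss and a temperature-scaled KL distillation loss (Hinton-style). This is an illustrative sketch, not the authors' released code: the exact objective is not quoted in the report, and the function and variable names are assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Numerically stable temperature-scaled softmax.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def combined_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Sketch of alpha * L_task + (1 - alpha) * L_distill with a fixed
    temperature T, matching the hyperparameters quoted above."""
    # Distillation term: KL(teacher || student) at temperature T,
    # scaled by T^2 to keep gradient magnitudes comparable (Hinton et al., 2015).
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    l_distill = (T ** 2) * np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()
    # Task term: ordinary cross-entropy on ground-truth labels (T = 1).
    log_p = np.log(softmax(student_logits))
    l_task = -log_p[np.arange(len(labels)), labels].mean()
    return alpha * l_task + (1 - alpha) * l_distill
```

With α = 0.5 the two terms are weighted equally; when student and teacher logits coincide, the distillation term vanishes and only the task loss remains.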