Why Fine-grained Labels in Pretraining Benefit Generalization?
Authors: Guan Zhe Hong, Yin Cui, Ariel Fuxman, Stanley H. Chan, Enming Luo
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To convince readers who are less familiar with this particular training strategy, we conduct an experiment on ImageNet with details described in Appendix A.2 (we also include experiments on iNaturalist 2021 in Appendix A). |
| Researcher Affiliation | Collaboration | Guan Zhe Hong (Purdue University), Yin Cui (NVIDIA), Ariel Fuxman (Google Research), Stanley H. Chan (Purdue University), Enming Luo (Google Research) |
| Pseudocode | No | The paper describes mathematical derivations and the SGD update rule, but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | No | The paper states: "All of our experiments were performed using tools in the Scenic library Dehghani et al. (2022)." This refers to a third-party tool used by the authors, not their own implementation code being released. There is no explicit statement about releasing their source code, nor is a link provided. |
| Open Datasets | Yes | To convince readers who are less familiar with this particular training strategy, we conduct an experiment on ImageNet with details described in Appendix A.2 (we also include experiments on iNaturalist 2021 in Appendix A). Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009. Grant Van Horn and Oisin Mac Aodha. iNat Challenge 2021 - FGVC8, 2021. URL https://kaggle.com/competitions/inaturalist-2021. |
| Dataset Splits | Yes | Figure 2 shows an experiment of pre-training on ImageNet21k and fine-tuning the pre-trained network on ImageNet1k. More specifically, we set X^src_train and X^tgt_train both equal to the training split of the input samples in iNaturalist 2021, and set X^tgt_test to the testing split of the input samples in iNaturalist 2021. The ImageNet21k dataset we experiment on contains a total of 12,743,321 training samples and 102,400 validation samples, with 21,843 leaf labels. |
| Hardware Specification | Yes | Each training instance (90 epochs) is run on 64 TPU v4 chips, taking approximately 1.5 to 2 days. |
| Software Dependencies | No | The paper mentions the "Scenic library" and the "ViT-B/16 model Dosovitskiy et al. (2021)" as tools and models used, but does not provide specific version numbers for these software components. For example, it does not state "Scenic library vX.Y.Z" or provide a version for Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Optimization: SGD with 0.9 momentum coefficient, 0.00005 weight decay, batch size 4096, and 90 epochs of total training. We perform 7 epochs of linear warmup at the beginning of training until the learning rate reaches 0.1 * 4096/256 = 1.6, and then apply the cosine annealing schedule. For fine-tuning, we keep everything in the pipeline the same except setting the batch size to 4096/4 = 1024 and the base learning rate to 1.6/4 = 0.4. |
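The schedule quoted above (linear warmup to a base rate set by the linear scaling rule, then cosine annealing) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `lr_at_epoch` and the per-epoch (rather than per-step) granularity are assumptions; the numeric values come from the quoted setup.

```python
import math

def lr_at_epoch(epoch, base_lr=1.6, warmup_epochs=7, total_epochs=90):
    """Learning rate under linear warmup followed by cosine annealing.

    Values (base LR 1.6, 7 warmup epochs, 90 total epochs) are taken from
    the experiment setup quoted above; per-epoch granularity is assumed.
    """
    if epoch < warmup_epochs:
        # Linear warmup from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine annealing from base_lr down toward 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# Linear scaling rule for the base LR: 0.1 * batch_size / 256.
pretrain_lr = 0.1 * 4096 / 256   # = 1.6 for pretraining (batch size 4096)
finetune_lr = pretrain_lr / 4    # = 0.4 for fine-tuning (batch size 1024)
```

Note how the fine-tuning configuration keeps the LR-to-batch-size ratio fixed: dividing both the batch size and the base learning rate by 4 preserves the 0.1/256 scaling constant.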