Understanding Self-Supervised Pretraining with Part-Aware Representation Learning
Authors: Jie Zhu, Jiyang Qi, Mingyu Ding, Xiaokang Chen, Ping Luo, Xinggang Wang, Wenyu Liu, Leye Wang, Jingdong Wang
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare the off-the-shelf encoders pretrained with several representative methods on object-level recognition and part-level recognition. The results show that the fully-supervised model outperforms self-supervised models for object-level recognition, and most self-supervised contrastive learning and masked image modeling methods outperform the fully-supervised method for part-level recognition. |
| Researcher Affiliation | Collaboration | 1Key Lab of High Confidence Software Technologies (Peking University), Ministry of Education, China 2School of Computer Science, Peking University, 3School of AI, Peking University, 4UC Berkeley 5School of EIC, Huazhong University of Science & Technology, 6University of Hong Kong, 7Baidu |
| Pseudocode | No | The paper describes methods using text and diagrams like Figure 3 and Figure 5, which illustrate pipelines, but does not contain explicit pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper states, "We take the training epochs specified in each work to ensure that all compared models are properly trained and use publicly available checkpoints," which refers to existing models, but it does not explicitly provide open-source code or a repository link for the methodology developed in this paper. |
| Open Datasets | Yes | We evaluate the object-level retrieval performance on two datasets, i.e., CIFAR10 (Krizhevsky et al., 2009) and COCO (Lin et al., 2014). We perform linear evaluation on ADE20K (Zhou et al., 2019)... We conduct part retrieval experiments on two datasets, CUB-200-2011 (Wah et al., 2011) and COCO (Lin et al., 2014), which provide both the positions and corresponding categories of the keypoints. We perform part-level linear semantic segmentation... on three widely used datasets: ADE20K-Part (Zhou et al., 2019)... Pascal-Part (Chen et al., 2014)... and LIP (Gong et al., 2017)... |
| Dataset Splits | Yes | CIFAR10 (Krizhevsky et al., 2009) is a widely-used dataset that has 50,000 training images and 10,000 test images. (Appendix A.2) There are 20,210 images in the training set and 2,000 images in the validation set. (Appendix A.2, referring to ADE20K) LIP (Gong et al., 2017)... There are 30,462 images in the training set and 10,000 images in the validation set. The remaining 10,000 images serve as the test set, with labels withheld for competition evaluation. (Appendix A.2) |
| Hardware Specification | No | The paper discusses models, training epochs, batch sizes, and learning rates but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions various models (e.g., DeiT, MoCo v3, DINO, MAE, CAE, iBOT) and optimizers (AdamW), but does not provide specific version numbers for software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | We take the training epochs specified in each work to ensure that all compared models are properly trained and use publicly available checkpoints: 300 for DeiT, 300 (600) for MoCo v3, 400 (1600) for DINO and iBOT, 800 for BEiT, and 1600 for MAE and CAE. The learning rate (4e-4), training iterations (160k), and batch size (16) are kept the same across all experiments for fair comparison. We use the SGD optimizer with a learning rate of 0.4 and 0.04 for linear probing and attentive probing, respectively. For both linear probing and attentive probing, the models are trained for 90 epochs, with SGD momentum set to 0.9, weight decay set to 0, and batch size set to 1024. We use batch size 16 and 160k iterations following previous methods... For the optimizer, we adopt AdamW... For the learning rate, we conduct experiments on LIP part segmentation with different learning rates including 1e-5, 2e-4, 3e-4, 4e-4, 5e-4, and 1e-3... |
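The probing setup quoted above boils down to a plain SGD-with-momentum update (momentum 0.9, weight decay 0, learning rate 0.4 for linear probing). A minimal sketch of that update rule, in pure Python; the toy weight and gradient values are illustrative assumptions, not the authors' code:

```python
def sgd_momentum_step(w, grad, velocity, lr=0.4, momentum=0.9, weight_decay=0.0):
    """One SGD update with momentum, matching the quoted probing setup:
    v <- momentum * v + (grad + weight_decay * w);  w <- w - lr * v.
    Defaults (lr=0.4, momentum=0.9, weight_decay=0) follow the linear-probing
    configuration quoted from the paper."""
    new_v = [momentum * v_i + (g_i + weight_decay * w_i)
             for v_i, g_i, w_i in zip(velocity, grad, w)]
    new_w = [w_i - lr * v_i for w_i, v_i in zip(w, new_v)]
    return new_w, new_v

# Toy usage: two probe weights, one gradient, zero-initialized velocity.
# With zero velocity, the first step reduces to plain SGD: w - lr * grad.
w = [0.5, -0.2]
v = [0.0, 0.0]
w, v = sgd_momentum_step(w, grad=[0.1, -0.3], velocity=v, lr=0.4)
print(w)
```

Attentive probing uses the same update with `lr=0.04`; for the segmentation experiments the paper instead adopts AdamW.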