Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
Authors: Shufan Shen, Zhaobo Qi, Junshu Sun, Qingming Huang, Qi Tian, Shuhui Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on different datasets (Krizhevsky et al., 2009; Wah et al., 2011; Russakovsky et al., 2015) and pre-trained representations (He et al., 2016; Dosovitskiy et al., 2021; Liu et al., 2021; 2022) reveal a positive correlation between interpretability and classifiability, i.e., representations with higher classifiability provide more interpretable semantics that can be captured in the interpretations. |
| Researcher Affiliation | Collaboration | Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS; University of Chinese Academy of Sciences; Harbin Institute of Technology, Weihai; Peng Cheng Laboratory; Huawei Inc. |
| Pseudocode | No | The paper provides mathematical formulations and descriptions of procedures in text (e.g., Equation 3, Equation 4, Equation 5, Equation 7) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The paper states "Codes are available at here" (a hyperlink in the original) and notes that source code has been submitted as supplementary materials. |
| Open Datasets | Yes | We compute the IIS of representations on ImageNet-1K (Russakovsky et al., 2015), CUB-200 (Wah et al., 2011), CIFAR-10 (Krizhevsky et al., 2009) and CIFAR-100 (Krizhevsky et al., 2009). These datasets cover diverse tasks. We also conduct experiments on video representations (Carreira & Zisserman, 2017; Feichtenhofer et al., 2019; Tong et al., 2022; Wang et al., 2023; Li et al., 2023b;a) in Appendix A.4, using the Kinetics-400 dataset (Carreira & Zisserman, 2017), as well as UCF-101 (Soomro, 2012), DTD (Cimpoi et al., 2014), and HAM10000 (Tschandl et al., 2018). |
| Dataset Splits | No | The paper mentions training-set sizes for some datasets (e.g., CUB-200 with 5,900 training samples, the CIFAR datasets with 50,000 each, and ImageNet-1K with 1-2 million training images), but it does not specify explicit training/validation/test splits (e.g., percentages or exact counts per split) or state how these splits were handled in the experiments (e.g., standard splits with a citation). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or other computing resource specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions "Torchvision (Paszke et al., 2019)" as providing pre-trained models, but it does not specify any version numbers for Torchvision or other key software components used in the implementation. |
| Experiment Setup | Yes | For the fine-tuning process with IIS maximization as the objective, we utilize the sparsity ratio s = 0.1 and vector number M = 200 to compute the simplified IIS. The pre-trained models undergo fine-tuning for 200 epochs (with the first 20 epochs as warmup), using a batch size of 128. The fine-tuning process incorporates the AdamW optimizer with betas set to (0.9, 0.999), a momentum of 0.9, a cosine decay learning rate scheduler, an initial learning rate of 3e-4, and a weight decay of 0.3. Additional techniques such as label smoothing (0.1) and cutmix (0.2) are also employed. Table A8 lists the hyperparameters for CBM training. |
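The learning-rate schedule quoted in the Experiment Setup row (200 epochs, 20-epoch warmup, cosine decay from an initial rate of 3e-4) can be sketched as a standalone function. This is a minimal sketch: the linear warmup shape and the zero floor learning rate are assumptions, as the paper does not specify them.

```python
import math

# Hyperparameters quoted from the paper's fine-tuning setup.
TOTAL_EPOCHS = 200
WARMUP_EPOCHS = 20
BASE_LR = 3e-4

def lr_at_epoch(epoch: int) -> float:
    """Return the learning rate for a given epoch (0-indexed).

    Assumes linear warmup to BASE_LR over the first WARMUP_EPOCHS,
    then cosine decay from BASE_LR down to 0 over the remaining epochs.
    """
    if epoch < WARMUP_EPOCHS:
        # Linear warmup: ramp from BASE_LR / WARMUP_EPOCHS up to BASE_LR.
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # Cosine decay over the remaining 180 epochs.
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In practice the same schedule would typically be driven through a framework scheduler (e.g., a cosine-annealing scheduler attached to an AdamW optimizer), but the closed form above makes the shape of the curve explicit: the rate peaks at 3e-4 at epoch 20 and reaches half that value midway through the decay phase.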