Exploring Weak-to-Strong Generalization for CLIP-based Classification
Authors: Jinhao Li, Sarah Monazam Erfani, Lei Feng, James Bailey, Feng Liu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to evaluate the performance of the proposed method, CPL, using the DomainNet dataset (Peng et al., 2019), which includes six diverse visual domains... Extensive experiments demonstrate that our approach is effective under these settings, achieving a 3.67% improvement over strong baseline methods. |
| Researcher Affiliation | Academia | Jinhao Li, EMAIL, School of Computing and Information Systems, University of Melbourne, Australia; Sarah M. Erfani, EMAIL, School of Computing and Information Systems, University of Melbourne, Australia; Lei Feng, EMAIL, School of Computer Science and Engineering, Southeast University, China; James Bailey, EMAIL, School of Computing and Information Systems, University of Melbourne, Australia; Feng Liu, EMAIL, School of Computing and Information Systems, University of Melbourne, Australia. |
| Pseudocode | Yes | Algorithm 1: Weak-to-strong Generalization for VLMs |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing their code or a link to a repository. |
| Open Datasets | Yes | In our exploration of weak-to-strong scenarios, we turn to the challenging and relatively large dataset: DomainNet (Peng et al., 2019). Comprising six diverse domains, each housing 345 categories of common objects, DomainNet offers a rich landscape for analysis... Ablation on Office-Home. As shown in Table 6, our method achieves the best performance across all four domains of the Office-Home dataset, outperforming existing baselines by a clear margin. |
| Dataset Splits | Yes | (1) Dataset splitting: Referring to Table 2, each domain is divided into a training set D_train and a test set D_test. The test set D_test is further partitioned into D_hold and D̂_test, comprising 80% and 20% of D_test respectively. |
| Hardware Specification | Yes | All our experiments are conducted using a single A100 GPU with 40GB of memory, supported by 8 CPU workers and 64GB of RAM. |
| Software Dependencies | No | The code is mainly based on PyTorch and the Hugging Face library. This statement lacks specific version numbers for the mentioned libraries. |
| Experiment Setup | Yes | During training, we used a test batch size of 2048 for evaluation. The weak model was trained for 3 epochs with a batch size of 512 and a learning rate of 1e-3, whereas the strong model underwent 10 epochs with the same batch size and a learning rate of 1e-2. The learning rate was adjusted dynamically, and a warm-up ratio of 0.1 was utilized. |
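The 80/20 partition of the test set described in the "Dataset Splits" row can be sketched as below. This is a minimal illustration, not the authors' code: the function name `split_test_set`, the fixed seed, and the index-based interface are all assumptions.

```python
import random


def split_test_set(test_indices, hold_ratio=0.8, seed=0):
    """Partition test-set indices into D_hold (80%) and a final
    evaluation split (20%), mirroring the paper's described setup.

    The seed and index-list interface are illustrative assumptions;
    the paper does not specify how the partition was sampled.
    """
    rng = random.Random(seed)
    shuffled = list(test_indices)
    rng.shuffle(shuffled)  # randomize before cutting
    cut = int(len(shuffled) * hold_ratio)
    return shuffled[:cut], shuffled[cut:]


# Example: 1000 test samples -> 800 held-out, 200 for final evaluation.
d_hold, d_eval = split_test_set(range(1000))
```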
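The "Experiment Setup" row gives per-model hyperparameters and a warm-up ratio of 0.1, but says only that the learning rate "was adjusted dynamically" without naming the schedule. The sketch below assumes a common linear warm-up followed by linear decay; the schedule shape and the config-dict layout are assumptions, while the epoch counts, batch sizes, and base learning rates are taken from the row above.

```python
# Hyperparameters as reported in the paper; dict layout is illustrative.
CONFIGS = {
    "weak":   {"epochs": 3,  "batch_size": 512, "lr": 1e-3},
    "strong": {"epochs": 10, "batch_size": 512, "lr": 1e-2},
}


def lr_at_step(step, total_steps, base_lr, warmup_ratio=0.1):
    """Linear warm-up to base_lr over the first warmup_ratio of training,
    then linear decay to zero. The decay shape is an assumption; the
    paper only states a dynamic schedule with warm-up ratio 0.1."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # ramp up from base_lr / warmup_steps to base_lr
        return base_lr * (step + 1) / warmup_steps
    remaining = total_steps - warmup_steps
    return base_lr * max(0.0, (total_steps - step) / remaining)
```

For instance, with 100 total steps and the strong model's base rate of 1e-2, the rate reaches 1e-2 at the end of the 10-step warm-up and decays linearly to zero afterwards.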