Exploring Weak-to-Strong Generalization for CLIP-based Classification

Authors: Jinhao Li, Sarah Monazam Erfani, Lei Feng, James Bailey, Feng Liu

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments to evaluate the performance of the proposed method, CPL, using the DomainNet dataset (Peng et al., 2019), which includes six diverse visual domains... Extensive experiments demonstrate that our approach is effective under these settings, achieving a 3.67% improvement over strong baseline methods."
Researcher Affiliation | Academia | Jinhao Li (School of Computing and Information Systems, University of Melbourne, Australia); Sarah M. Erfani (School of Computing and Information Systems, University of Melbourne, Australia); Lei Feng (School of Computer Science and Engineering, Southeast University, China); James Bailey (School of Computing and Information Systems, University of Melbourne, Australia); Feng Liu (School of Computing and Information Systems, University of Melbourne, Australia).
Pseudocode | Yes | Algorithm 1: Weak-to-Strong Generalization for VLMs
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a repository.
Open Datasets | Yes | "In our exploration of weak-to-strong scenarios, we turn to the challenging and relatively large DomainNet dataset (Peng et al., 2019). Comprising six diverse domains, each housing 345 categories of common objects, DomainNet offers a rich landscape for analysis... Ablation on Office-Home. As shown in Table 6, our method achieves the best performance across all four domains of the Office-Home dataset, outperforming existing baselines by a clear margin."
Dataset Splits | Yes | "(1) Dataset splitting: Referring to Table 2, each domain is divided into a training set D_train and a test set D_test. D_test is further partitioned into a hold split D_hold and a final test split, comprising 80% and 20% of D_test respectively."
Hardware Specification | Yes | "All our experiments are conducted using a single A100 GPU with 40GB of memory, supported by 8 CPU workers and 64GB of RAM."
Software Dependencies | No | "The code is mainly based on PyTorch and the Hugging Face library." This statement lacks specific version numbers for the mentioned libraries.
Experiment Setup | Yes | "During training, we used a test batch size of 2048 for evaluation. The weak model was trained for 3 epochs with a batch size of 512 and a learning rate of 1e-3, whereas the strong model underwent 10 epochs with the same batch size and a learning rate of 1e-2. The learning rate was adjusted dynamically, and a warm-up ratio of 0.1 was utilized."
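Algorithm 1 itself is not reproduced in this report. As a rough illustration of the weak-to-strong pipeline the paper describes (a weak model pseudo-labels a hold set, and a strong model is then trained on those pseudo-labels), here is a minimal sketch; the toy logistic-regression "models" and synthetic Gaussian features stand in for CLIP-based VLMs and are illustrative assumptions, not the paper's CPL method.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Toy stand-in for CLIP embeddings: two Gaussian blobs, one per class."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=(2 * y - 1)[:, None] * 1.0, scale=1.0, size=(n, 4))
    return x, y

def fit_linear(x, y, epochs=200, lr=0.1):
    """Minimal logistic-regression learner used for both weak and strong models."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # sigmoid probabilities
        g = p - y                                # gradient of log loss
        w -= lr * x.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def predict(params, x):
    w, b = params
    return (x @ w + b > 0).astype(int)

# 1) Train the weak supervisor on a small labeled set.
x_weak, y_weak = make_data(50)
weak = fit_linear(x_weak, y_weak, epochs=20)

# 2) The weak model pseudo-labels a larger unlabeled hold set.
x_hold, _ = make_data(1000)
pseudo = predict(weak, x_hold)

# 3) The strong model trains only on the weak pseudo-labels.
strong = fit_linear(x_hold, pseudo)

# 4) Evaluate both models on a held-out test set.
x_test, y_test = make_data(500)
acc_weak = (predict(weak, x_test) == y_test).mean()
acc_strong = (predict(strong, x_test) == y_test).mean()
print(f"weak acc: {acc_weak:.3f}, strong acc: {acc_strong:.3f}")
```

On this easy synthetic task both models perform similarly; the paper's contribution (CPL) concerns making the strong model exceed its weak supervisor, which this sketch does not attempt to demonstrate.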
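The dataset split and training schedule reported above can be sketched as follows. The 80/20 partition of the test set matches the report; the exact "dynamic" learning-rate schedule is not specified, so a linear warm-up followed by linear decay is assumed, and `total_steps` is illustrative.

```python
import random

def split_test_set(test_items, hold_ratio=0.8, seed=0):
    """Partition D_test into an 80% hold split and a 20% final test split,
    as described in the paper's dataset-splitting step."""
    items = list(test_items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * hold_ratio)
    return items[:cut], items[cut:]

def lr_at_step(step, total_steps, base_lr, warmup_ratio=0.1):
    """Linear warm-up over the first 10% of steps, then linear decay.
    (Assumption: the report only states a dynamically adjusted learning
    rate with a warm-up ratio of 0.1, not the schedule's shape.)"""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

# Strong model settings from the report: base lr 1e-2, warm-up ratio 0.1.
hold, final_test = split_test_set(range(1000))
schedule = [lr_at_step(s, total_steps=100, base_lr=1e-2) for s in range(100)]
print(len(hold), len(final_test), max(schedule))
```

The schedule peaks at the base learning rate at the end of warm-up and decays linearly toward zero afterward.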