A Similarity Paradigm Through Textual Regularization Without Forgetting

Authors: Fangming Cui, Jan Fong, Rongfei Zeng, Xinmei Tian, Jun Yu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Four representative tasks (i.e., non-generalization few-shot learning, base-to-novel generalization, cross-dataset generalization, and domain generalization) across 11 datasets demonstrate that SPTR outperforms existing prompt learning methods, achieving state-of-the-art performance.
Researcher Affiliation | Academia | Fangming Cui (Shanghai Jiao Tong University), Jan Fong (Hong Kong Baptist University), Rongfei Zeng (Northeastern University), Xinmei Tian (University of Science and Technology of China), Jun Yu* (Harbin Institute of Technology, Shenzhen). EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes its methods using mathematical formulas and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no explicit statement about releasing source code and provides no link to a code repository.
Open Datasets | Yes | The datasets cover multiple scenes, including ImageNet (Deng et al. 2009), Caltech101 (Fei-Fei, Fergus, and Perona 2004), OxfordPets (Parkhi et al. 2012), StanfordCars (Krause et al. 2013), Flowers102 (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Van Gool 2014), FGVCAircraft (Maji et al. 2013), SUN397 (Xiao et al. 2010), UCF101 (Soomro, Zamir, and Shah 2012), DTD (Cimpoi et al. 2014), and EuroSAT (Helber et al. 2019). We use ImageNet-A (Hendrycks et al. 2021b), ImageNet-R (Hendrycks et al. 2021a), ImageNet-Sketch (Wang et al. 2019), and ImageNet-V2 (Recht et al. 2019) for domain generalization.
Dataset Splits | Yes | The datasets are split into base and novel classes; the model is trained only on the base classes in the 16-shot setting and evaluated on both base and novel classes. Few-shot evaluation is performed at K-shot levels of 1, 2, 4, 8, and 16 per class.
Hardware Specification | No | The paper mentions training on "a single GPU" and using ViT-B/16-based CLIP and other ViT instances, but does not provide specific hardware details such as the GPU model (e.g., NVIDIA A100, RTX 3090) or CPU specifications.
Software Dependencies | No | The paper mentions using pre-trained CLIP and ViT-B/16 models, but does not specify any software libraries or dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | We set the embeddings length to 4. We train 50 epochs for few-shot learning tasks and 20 epochs for other tasks. The learning rate is 0.0025 via the SGD optimizer on a single GPU. We use the ViT-B/16 model-based CLIP and set α to 0.3. For domain generalization and cross-dataset evaluation, we train the ImageNet source model on all classes in the first 3 layers of encoders. For the base-to-novel settings and few-shot learning, we set the learning depth to 9. We set N to 60, which is consistent with pre-trained CLIP. Attacks are generated with a perturbation boundary ϵ = 1/255.
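For reference, the hyperparameters quoted in the experiment-setup row can be collected into a minimal configuration sketch. This is a hypothetical structure for readers attempting reproduction; the field names and the `epochs_for` helper are illustrative and do not come from the paper.

```python
from dataclasses import dataclass

# Hypothetical config collecting the hyperparameters reported in the
# paper's experiment setup. Field names are illustrative, not the
# authors'; values are taken verbatim from the quoted text.
@dataclass(frozen=True)
class SPTRConfig:
    backbone: str = "ViT-B/16"      # CLIP image encoder
    prompt_length: int = 4          # "embeddings length"
    epochs_few_shot: int = 50       # few-shot learning tasks
    epochs_other: int = 20          # all other tasks
    learning_rate: float = 0.0025   # SGD optimizer, single GPU
    alpha: float = 0.3              # weighting coefficient α
    depth_base_to_novel: int = 9    # learning depth, base-to-novel / few-shot
    depth_cross_domain: int = 3     # first 3 encoder layers, cross-dataset / DG
    num_templates: int = 60         # N, consistent with pre-trained CLIP
    eps: float = 1 / 255            # adversarial perturbation boundary ϵ

    def epochs_for(self, task: str) -> int:
        # Few-shot learning trains for 50 epochs; the other three
        # task types train for 20.
        return self.epochs_few_shot if task == "few_shot" else self.epochs_other

cfg = SPTRConfig()
print(cfg.epochs_for("few_shot"))       # 50
print(cfg.epochs_for("base_to_novel"))  # 20
```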