Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation

Authors: Zhixiang Chi, Li Gu, Huan Liu, Ziqiang Wang, Yanan Wu, Yang Wang, Konstantinos Plataniotis

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show our method's superiority on 5 large-scale benchmarks (WILDS and DomainNet), notably improving over smaller networks like ViT-B/16 with gains of +5.1 in F1 for iWildCam and +3.1% in WC Acc for FMoW. Our code: L2C. We conduct ablation studies on DomainNet-Info, iWildCam, and FMoW using CLIP ViT-B/16 on various components, including CPNet, Revert Attention (RT), text refinement (Text ref.), greedy ensemble (Greedy), uniformity loss (L_uni), the DAF module, and training schemes in Table 3.
Researcher Affiliation | Academia | Zhixiang Chi (1), Li Gu (2), Huan Liu (3), Ziqiang Wang (2), Yanan Wu (2), Yang Wang (2), Konstantinos N. Plataniotis (1); 1: University of Toronto, 2: Concordia University, 3: McMaster University
Pseudocode | Yes |
Algorithm 1: Domain-centric learning to adapt
Require: I/T: CLIP image/text encoders; {P_p}_{p=1}^P: P text prompt templates; C: C classes with names; D_s: source domains; α: learning rate; CP: CPNet; K/V: K-V domain cache; DAF: domain-aware fusion module; M_c/M_d: text refinement
1: // Greedy text feature ensemble
2: {T(P_p, C)}_{p=1}^P ← compute and sort text features for all text prompt templates
3: Obtain T_gre via greedy ensemble using Eq. 3, then discard the text encoder.
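The greedy text-feature ensemble in steps 2-3 can be sketched as below. This is a minimal illustration, not the authors' code: the function name, the scoring callback, and the NumPy setting are assumptions, and the exact selection criterion of Eq. 3 lives in the paper.

```python
import numpy as np


def greedy_text_ensemble(template_feats, score_fn):
    """Hypothetical sketch of a greedy prompt-template ensemble.

    template_feats: list of (C, d) arrays, one per prompt template,
                    pre-sorted by individual score (best first).
    score_fn: callable mapping a (C, d) text-feature matrix to a scalar
              (stands in for the criterion of Eq. 3 in the paper).
    Returns the ensembled, row-normalized (C, d) text features T_gre.
    """
    selected = [template_feats[0]]
    best = score_fn(template_feats[0])
    for feats in template_feats[1:]:
        # Tentatively average the candidate template into the ensemble.
        candidate = np.mean(selected + [feats], axis=0)
        # Re-normalize rows, since CLIP text features are unit-norm.
        candidate /= np.linalg.norm(candidate, axis=1, keepdims=True)
        score = score_fn(candidate)
        if score > best:  # keep the template only if it improves the score
            selected.append(feats)
            best = score
    out = np.mean(selected, axis=0)
    return out / np.linalg.norm(out, axis=1, keepdims=True)
```

Once T_gre is computed, the text encoder can be discarded (step 3), which is what makes the scheme attractive for deployment.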
Open Source Code | No | Pre-trained models and the full code will be released upon publication of the paper.
Open Datasets | Yes | We follow VDPG to evaluate on DomainNet (Peng et al., 2019), which comprises 569K images across 345 classes in 6 domains. We also evaluate on 4 WILDS (Koh et al., 2021) benchmarks, known for their real-world challenges and notably low CLIP zero-shot accuracy (Chi et al., 2024).
Dataset Splits | Yes | We follow the official leave-one-domain-out protocol to train 6 models and report accuracy. Each iteration is treated as an adaptation task on a randomly sampled source domain D_s^n. Two disjoint sets are sampled: a support set (x^S) and a query set (x^Q, y^Q).
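The per-iteration task sampling described above can be sketched as follows. The function name and data layout are illustrative assumptions; the 12-support/52-query split matches the batch composition reported in the experiment setup.

```python
import random


def sample_adaptation_task(domain_data, n_support=12, n_query=52, seed=None):
    """Hypothetical sketch of sampling one adaptation task.

    domain_data: dict mapping domain name -> list of (image, label) pairs.
    Draws one source domain at random, then disjoint support and query
    sets from it (support is used unlabeled; query carries labels).
    """
    rng = random.Random(seed)
    domain = rng.choice(sorted(domain_data))      # random source domain D_s^n
    pool = domain_data[domain][:]
    rng.shuffle(pool)
    support = pool[:n_support]                    # x^S
    query = pool[n_support:n_support + n_query]   # (x^Q, y^Q), disjoint by slicing
    return domain, support, query
```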
Hardware Specification | Yes | All the experiments can be conducted with a single NVIDIA V100 GPU.
Software Dependencies | No | Other components, such as CPNet, DAF, text refinement, and the K-V cache, use standard PyTorch functions.
Experiment Setup | Yes | The model is trained for 20 epochs with SGD using cosine decay, with initial learning rates of 2.5e-3 and 1e-3 for WILDS and DomainNet, respectively. λ is set to 0.1 to balance the losses. We use 16 images for adaptation at inference. Appendices G and H list additional hyperparameters and the text prompts. We set the batch size to 64 (12 images for the support set and 52 images for the query set).
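The cosine-decay schedule named above is a standard recipe; a minimal sketch (not the authors' training code) of the per-step learning rate, using the reported base rates:

```python
import math


def cosine_lr(step, total_steps, base_lr):
    """Standard cosine learning-rate decay from base_lr down to 0.

    base_lr is 2.5e-3 for WILDS and 1e-3 for DomainNet, per the setup
    above; the decay horizon (total_steps) is an assumption here.
    """
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))
```

For example, with `base_lr=2.5e-3` the rate starts at 2.5e-3, passes through half that value at the schedule midpoint, and reaches 0 at the final step.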