Semi-Supervised CLIP Adaptation by Enforcing Semantic and Trapezoidal Consistency
Authors: Kai Gan, Bo Ye, Min-Ling Zhang, Tong Wei
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our approach significantly improves the adaptability of CLIP in target tasks with limited labeled data, achieving gains ranging from 1.72% to 6.58% for zero-shot classification accuracy and 2.32% to 3.23% for image-text retrieval performance on standard benchmarks. The source code is available at https://github.com/Gank0078/SemiCLIP. |
| Researcher Affiliation | Academia | ¹School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; ²Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China |
| Pseudocode | No | The paper describes the methodology using textual descriptions and mathematical equations (e.g., Equation 1, 2, 3, 4, 5) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available at https://github.com/Gank0078/SemiCLIP. |
| Open Datasets | Yes | We conduct extensive experiments on four publicly available datasets to evaluate the performance of SEMICLIP. Following the previous method S-CLIP (Mo et al., 2023), the datasets include remote sensing datasets (Yang & Newsam, 2010; Zhang et al., 2014; Lu et al., 2017), fashion datasets (Han et al., 2017; Rostamzadeh et al., 2018; Vasileva et al., 2018), the SciCap dataset (Hsu et al., 2021), and the Simpsons dataset (Attia, 2018; Adler, 2023). ...we also incorporate the RESISC45 dataset (Cheng et al., 2017) as unlabeled data (L = U). ...we conducted comparative experiments on the COCO (Lin et al., 2014) dataset. |
| Dataset Splits | Yes | Under the default setting, we randomly subsample 10% of the image-text pairs of the training dataset as labeled data, leaving the rest as unlabeled data. The models are evaluated on zero-shot classification and image-text retrieval tasks, with performance measured by Top-1 classification accuracy (%) and recall at k (R@k). ...We utilize the validation sets from the classification variants of the RSICD and UCM datasets, referred to as RSICD-CLS and UCM-CLS, respectively. ...We conduct comparative experiments on the COCO (Lin et al., 2014) dataset. The results in Tab. 16 indicate that SEMICLIP achieves significant performance improvements on general benchmarks over CLIP (fine-tuned) and S-CLIP. It is worth noting that S-CLIP's performance shows an average decrease of 4.5% compared to CLIP (fine-tuned), aligning with the claim (Mo et al., 2023) that S-CLIP experiences performance drops when trained on a small number of image-text pairs from common datasets like COCO. However, the superior performance of our proposed SEMICLIP is unaffected by the type of dataset, achieving significant improvements on both commonly used datasets and task-specific datasets. |
| Hardware Specification | Yes | All experiments are conducted on four NVIDIA A6000 GPUs with a batch size of 64 per GPU. |
| Software Dependencies | No | The paper mentions the use of NLTK (Bird et al., 2009) for concept extraction, AdamW (Loshchilov, 2017) as an optimizer, and the CLIP model (Ilharco et al., 2021) as the backbone, but it does not specify version numbers for general software dependencies like programming languages or libraries (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | We utilize ViT-B-16 as the default vision encoder in our experiments, and experiments with other vision encoders are shown in Appendix A.1. We train the model for 25 epochs during supervised pre-training and 15 epochs for semi-supervised fine-tuning. We employ AdamW (Loshchilov, 2017) with a weight decay of 5×10⁻⁴ and apply the default cosine learning rate schedule with warmup for the first 10 steps. The learning rate is set to 5×10⁻⁵ for the remote sensing and fashion datasets and 1×10⁻⁶ for the SciCap and Simpsons datasets. We establish default values of 30 for P and 4 for k. |
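The dataset split described above (randomly subsampling 10% of image-text pairs as labeled data, leaving the rest unlabeled) can be sketched as follows. This is a minimal illustration, not code from the SEMICLIP repository; the function and parameter names are hypothetical.

```python
import random

def split_labeled(pairs, labeled_frac=0.1, seed=0):
    """Randomly subsample a fraction of image-text pairs as labeled data,
    leaving the rest as unlabeled data (10% labeled by default, matching
    the paper's default setting)."""
    rng = random.Random(seed)
    idx = list(range(len(pairs)))
    rng.shuffle(idx)
    n_labeled = int(len(pairs) * labeled_frac)
    labeled = [pairs[i] for i in idx[:n_labeled]]
    unlabeled = [pairs[i] for i in idx[n_labeled:]]
    return labeled, unlabeled
```

With 100 training pairs and the default 10% setting, this yields 10 labeled and 90 unlabeled pairs.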
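The learning-rate schedule in the setup (cosine decay with warmup for the first 10 steps) can be sketched as a plain function. This is an illustrative reconstruction under common conventions (linear warmup, cosine decay to zero), not the authors' exact implementation; the names are hypothetical.

```python
import math

def lr_at_step(step, total_steps, base_lr=5e-5, warmup_steps=10):
    """Cosine learning-rate schedule with linear warmup.

    base_lr=5e-5 matches the rate reported for the remote sensing and
    fashion datasets; warmup_steps=10 matches the reported warmup length.
    """
    if step < warmup_steps:
        # Linear warmup from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to zero over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

In practice one would use an equivalent built-in scheduler from the training framework; this standalone version just makes the shape of the schedule explicit.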