DiffCLIP: Few-shot Language-driven Multimodal Classifier

Authors: Jiaqing Zhang, Mingxiang Cao, Xue Yang, Kai Jiang, Yunsong Li

AAAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We evaluate DiffCLIP on widely used high-dimensional multimodal datasets, demonstrating its effectiveness in addressing few-shot annotated classification tasks. DiffCLIP achieves an overall accuracy improvement of 10.65% across three remote sensing datasets compared with CLIP, while utilizing only 2-shot image-text pairs.
Researcher Affiliation | Academia | (1) The State Key Laboratory of Integrated Services Networks, Xidian University; (2) Shanghai AI Laboratory
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulations in sections like "Unsupervised Mask Diffusion" and "Few-shot Language-Driven Classification" but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code: https://github.com/icey-zhang/DiffCLIP
Open Datasets | Yes | The experiments are conducted on four widely recognized benchmarks to assess the performance of our proposed method: Houston (Debes et al. 2014), Trento (Rasti, Ghamisi, and Gloaguen 2017), MUUFL (Gader et al. 2013) and MRNet dataset (Bien et al. 2018).
Dataset Splits | Yes | For fair comparison, we randomly sample 40 samples per class for training with labels, and the remaining samples for evaluation. ... using 10 samples of MRNet data to train and the rest to test.
Hardware Specification | Yes | The experiments are conducted on a system with an NVIDIA GeForce RTX A100 GPU.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages or libraries. It only mentions general tools like the Adam optimizer and ViT.
Experiment Setup | Yes | For optimization in both unsupervised and few-shot learning, the Adam optimizer is used with an initial learning rate of 1e-4 and weight decay of 1e-5. Two schedulers are employed: a cosine scheduler for unsupervised learning and a step scheduler for few-shot learning. The training consists of 100 epochs for unsupervised learning and 150 epochs for few-shot learning. To ensure optimal performance in comparative experiments, the batch size is set to 256 for unsupervised learning and 64 for few-shot learning, with consistent parameter settings across all datasets.
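The dataset-split protocol quoted above (40 labeled samples per class for training, the remainder for evaluation) can be sketched as a small helper. This is an illustrative reconstruction, not the authors' code; the function name and seed handling are our own assumptions.

```python
import random
from collections import defaultdict

def per_class_split(labels, n_train=40, seed=0):
    """Randomly sample n_train indices per class for training;
    all remaining indices go to the evaluation set.

    Mirrors the protocol described in the paper's experiment setup;
    the seed parameter is an assumption added for reproducibility.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return train, test

# Example: 50 samples of class 0 and 45 of class 1 yield
# 80 training indices and 15 evaluation indices.
labels = [0] * 50 + [1] * 45
train_idx, test_idx = per_class_split(labels, n_train=40)
```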
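The optimization settings reported in the Experiment Setup row map directly onto standard PyTorch components. The sketch below wires up the stated Adam hyperparameters and the two schedulers; the placeholder model, the StepLR step size, and its decay factor are assumptions, since the paper does not report them.

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR, StepLR

# Placeholder model standing in for DiffCLIP; the real architecture
# is available in the authors' repository.
model = nn.Linear(16, 4)

# Reported settings: Adam with initial lr 1e-4 and weight decay 1e-5.
optimizer = Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)

# Cosine scheduler over the 100 unsupervised-learning epochs.
unsup_scheduler = CosineAnnealingLR(optimizer, T_max=100)

# Step scheduler for the 150 few-shot epochs; step_size and gamma
# are assumptions -- the paper does not specify them.
fewshot_scheduler = StepLR(optimizer, step_size=50, gamma=0.1)
```

Batch sizes of 256 (unsupervised) and 64 (few-shot) would then be set on the corresponding `DataLoader`s.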