MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning

Authors: Yue Wang, Shuai Xu, Xuelin Zhu, Yicong Li

IJCAI 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three widely used datasets fully validate the effectiveness and superiority of the proposed model. Data and code are available at https://github.com/ltpwy/MSCI. |
| Researcher Affiliation | Academia | 1. Nanjing University of Aeronautics and Astronautics, Nanjing, China; 2. Key Laboratory of Social Computing and Cognitive Intelligence (Dalian University of Technology), Ministry of Education, China; 3. The Hong Kong Polytechnic University, Hong Kong, China |
| Pseudocode | No | The paper describes its methods using equations and prose, but provides no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Data and code are available at https://github.com/ltpwy/MSCI. |
| Open Datasets | Yes | We evaluate the performance of the proposed MSCI on three widely-used compositional zero-shot learning datasets: MIT-States [Isola et al., 2015], UT-Zappos [Yu and Grauman, 2014], and C-GQA [Naeem et al., 2021]. |
| Dataset Splits | Yes | Consistent with previous research, we adopt the dataset partitioning method proposed by Purushwalkam et al. [Purushwalkam et al., 2019], with specific details presented in Table 1. |
| Hardware Specification | Yes | All experiments are conducted on an Nvidia H20 GPU. |
| Software Dependencies | No | The paper mentions PyTorch and CLIP's ViT-L/14 backbone, but gives no version numbers for these software components. |
| Experiment Setup | Yes | During training, we use the Adam optimizer, combined with learning rate decay and weight decay strategies. To simplify the model complexity, we use only one cross-attention layer for both local feature interaction and global feature fusion across the three datasets, with 12 attention heads and a dropout rate set to 0.1. The parameter β, used to control the inference weights of each branch, is set to 0.1, 1.0 and 0.1 for MIT-States, UT-Zappos and C-GQA in the closed-world setting, and set to 0.3, 1.0 and 0.3 in the open-world setting. |
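The reported setup (a single cross-attention layer with 12 heads, plus a β-weighted combination of branch scores at inference) can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names `cross_attention` and `fuse_branches` are hypothetical, learned projection weights are omitted, and dropout (0.1 in the paper, applied only during training) is left out since the snippet models inference.

```python
import numpy as np

def cross_attention(query, context, num_heads=12):
    """Multi-head scaled dot-product cross-attention: query attends to context.

    Illustrative only: real implementations apply learned Q/K/V/output
    projections, which are omitted here.
    """
    d_model = query.shape[-1]
    assert d_model % num_heads == 0
    d_head = d_model // num_heads

    def split(x):  # (n, d_model) -> (num_heads, n, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(query), split(context), split(context)
    # Attention scores, scaled by sqrt of per-head dimension
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    # Numerically stable softmax over the context positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v  # (num_heads, n_query, d_head)
    # Merge heads back into a single feature dimension
    return out.transpose(1, 0, 2).reshape(query.shape[0], d_model)

def fuse_branches(global_logits, local_logits, beta=0.1):
    """Weight a secondary branch's scores by beta at inference (beta as in the paper)."""
    return global_logits + beta * local_logits

# Toy example: 5 query tokens attend to 10 context tokens, 768-dim features
rng = np.random.default_rng(0)
query = rng.normal(size=(5, 768))
context = rng.normal(size=(10, 768))
fused = cross_attention(query, context)
print(fused.shape)  # (5, 768)
```

The β values reported per dataset (e.g. 0.1 for MIT-States closed-world) would be passed as `beta` here; the fused features would then feed the model's classification head.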