MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning
Authors: Yue Wang, Shuai Xu, Xuelin Zhu, Yicong Li
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three widely used datasets fully validate the effectiveness and superiority of the proposed model. Data and code are available at https://github.com/ltpwy/MSCI. |
| Researcher Affiliation | Academia | ¹Nanjing University of Aeronautics and Astronautics, Nanjing, China; ²Key Laboratory of Social Computing and Cognitive Intelligence (Dalian University of Technology), Ministry of Education, China; ³The Hong Kong Polytechnic University, Hong Kong, China |
| Pseudocode | No | The paper describes methods using equations and prose, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Data and code are available at https://github.com/ltpwy/MSCI. |
| Open Datasets | Yes | We evaluate the performance of the proposed MSCI on three widely-used compositional zero-shot learning datasets: MITStates [Isola et al., 2015], UT-Zappos [Yu and Grauman, 2014], and C-GQA [Naeem et al., 2021]. |
| Dataset Splits | Yes | Consistent with previous research, we adopt the dataset partitioning method proposed by Purushwalkam et al. [Purushwalkam et al., 2019], with specific details presented in Table 1. |
| Hardware Specification | Yes | All experiments are conducted on an Nvidia H20 GPU. |
| Software Dependencies | No | The paper mentions "PyTorch" and "CLIP's backbone with the ViT-L/14 architecture", but specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | During training, we use the Adam optimizer, combined with learning rate decay and weight decay strategies. To simplify the model complexity, we use only one cross-attention layer for both local feature interaction and global feature fusion across the three datasets, with 12 attention heads and a dropout rate set to 0.1. The parameter β, used to control the inference weights of each branch, is set to 0.1, 1.0 and 0.1 for MITStates, UT-Zappos and C-GQA in the closed-world setting, and set to 0.3, 1.0 and 0.3 in the open-world setting. |
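The setup row above pins down a few concrete hyperparameters (one cross-attention layer, 12 heads, dropout 0.1, Adam with weight and learning-rate decay, a branch-mixing weight β). A minimal PyTorch sketch of such a configuration is below; the embedding dimension, learning rate, decay schedule, and the exact rule for combining branch logits with β are assumptions not stated in the quoted text, so treat them as placeholders rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """One cross-attention layer: queries attend to a separate key/value source,
    matching the reported config (12 heads, dropout 0.1, a single layer)."""
    def __init__(self, dim: int = 768, num_heads: int = 12, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(query, context, context)  # cross-attention
        return self.norm(query + out)                # residual + layer norm

block = CrossAttentionBlock()

# Adam with weight decay plus a step learning-rate decay; the specific
# lr / decay values here are illustrative, not taken from the paper.
optimizer = torch.optim.Adam(block.parameters(), lr=5e-5, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

# Toy inputs: 8 text tokens attending to 16 visual patch features.
text_feats = torch.randn(2, 8, 768)
visual_feats = torch.randn(2, 16, 768)
fused = block(text_feats, visual_feats)  # shape (2, 8, 768)

# β-weighted branch mixing at inference (hypothetical combination rule;
# the paper only states that β controls each branch's inference weight).
logits_local = torch.randn(2, 100)
logits_global = torch.randn(2, 100)
beta = 0.1  # MIT-States, closed-world value from the table
logits = beta * logits_local + (1 - beta) * logits_global
```

Note that the cross-attended output keeps the query sequence's shape, which is what allows the same block to serve both local interaction and global fusion with different query/context pairings.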