Seeing the Unseen: Composing Outliers for Compositional Zero-Shot Learning
Authors: Chenchen Jing, Mingyu Liu, Hao Chen, Yuling Xi, Xingyuan Bu, Dong Gong, Chunhua Shen
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three datasets show the effectiveness of our method in both the closed-world setting and the open-world setting. |
| Researcher Affiliation | Collaboration | Chenchen Jing (1,2), Mingyu Liu (3), Hao Chen (3), Yuling Xi (3), Xingyuan Bu (4), Dong Gong (5), Chunhua Shen (1,2) — 1: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China; 2: Zhejiang Key Laboratory of Visual Information Intelligent Processing, Hangzhou, China; 3: Zhejiang University, China; 4: Alibaba Group; 5: The University of New South Wales |
| Pseudocode | No | The paper describes the method and architecture using natural language and figures (e.g., Figure 2: Overview of our method), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository. It mentions using CLIP as a backbone but not providing their own implementation code. |
| Open Datasets | Yes | We conduct experiments on three widely used datasets, UT-Zappos [Yu and Grauman, 2014], MIT-States [Isola et al., 2015], and C-GQA [Naeem et al., 2021]. |
| Dataset Splits | Yes | UT-Zappos is a fine-grained dataset consisting of 116 shoe classes composed of 16 attributes (e.g., rubber) and 12 objects (e.g., sandal). The dataset is split into 83 seen and 15/18 unseen compositions for training and validation/testing. MIT-States consists of 53,753 crawled web images labeled with 1,962 attribute-object pairs. The dataset contains 1,262 seen and 300/400 unseen compositions for training and validation/testing, respectively. C-GQA contains over 9,000 common compositions and is split into 5,592 seen and 1,040/923 unseen compositions for training and validation/testing, respectively. |
| Hardware Specification | No | The paper mentions using ResNet and CLIP (ViT-L/14) as backbones but does not specify any particular hardware like GPU models, CPU types, or memory used for experiments. |
| Software Dependencies | No | The paper mentions using backbones like ResNet [He et al., 2016] and CLIP [Radford et al., 2021] but does not provide specific version numbers for any software, libraries, or programming languages used. |
| Experiment Setup | Yes | For the CLIP backbone, the training epochs for each dataset are set as 5/15 for the two stages, respectively. In the first stage, the hyper-parameters α1, α2, and α3 are set as (0.1, 0.1, 5.0) for UT-Zappos, (0.01, 0.01, 1.0) for MIT-States, and (0.1, 0.5, 1.0) for C-GQA, respectively. For the ResNet backbone, the training epochs for each dataset are set as 50/100 for the two stages, respectively. The hyper-parameters α1, α2, and α3 are set as (5.0, 0.1, 5.0) for UT-Zappos, (5.0, 0.1, 1.0) for MIT-States, and (0.1, 1.0, 1.0) for C-GQA, respectively. |
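The per-backbone, per-dataset hyper-parameters reported in the Experiment Setup row can be collected into a single lookup table, which makes the reported configuration easier to scan and reuse. This is a minimal sketch; the dictionary structure and the `get_setup` helper are our own illustrative names, not part of the authors' released code (none is available), and only the numeric values are taken from the paper.

```python
# Hyper-parameters transcribed from the paper's experiment setup.
# Structure and names are illustrative; values come from the quoted text.
EXPERIMENT_SETUP = {
    "CLIP": {
        "epochs": (5, 15),  # stage 1 / stage 2 training epochs
        "alphas": {  # (alpha1, alpha2, alpha3) per dataset
            "UT-Zappos": (0.1, 0.1, 5.0),
            "MIT-States": (0.01, 0.01, 1.0),
            "C-GQA": (0.1, 0.5, 1.0),
        },
    },
    "ResNet": {
        "epochs": (50, 100),
        "alphas": {
            "UT-Zappos": (5.0, 0.1, 5.0),
            "MIT-States": (5.0, 0.1, 1.0),
            "C-GQA": (0.1, 1.0, 1.0),
        },
    },
}

def get_setup(backbone: str, dataset: str):
    """Return ((stage1, stage2) epochs, (alpha1, alpha2, alpha3)) for a pair."""
    cfg = EXPERIMENT_SETUP[backbone]
    return cfg["epochs"], cfg["alphas"][dataset]

epochs, alphas = get_setup("CLIP", "UT-Zappos")
```

A reproduction attempt would still need the pieces the paper omits (optimizer, learning rate, batch size, hardware), which is consistent with the "No" entries above.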