Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning

Authors: Ziming Liu, Jingcai Guo, Song Guo, Xiaocheng Lu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on large-scale MLZSL benchmark datasets NUS-Wide and Open-Images-V4 demonstrate that the proposed Epsilon outperforms other state-of-the-art methods with large margins." "Extensive experiments on NUS-Wide and Open-Images-V4 datasets demonstrate the effectiveness of our method against other state-of-the-art MLZSL models."
Researcher Affiliation | Academia | "1 Department of Computing, The Hong Kong Polytechnic University; 2 Department of Computer Science and Engineering, The Hong Kong University of Science and Technology"
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations (Eqs. 1-13) and provides a pipeline diagram (Figure 2), but does not contain a dedicated pseudocode block or algorithm.
Open Source Code | No | The paper does not contain any explicit statement about making the code available, nor does it provide a link to a code repository.
Open Datasets | Yes | "The NUS-Wide dataset (Chua et al. 2009) contains approximately 270,000 images and a total of 1,006 labels. Another dataset is called the Open-Images-V4 dataset, which is much larger than the NUS-Wide dataset."
Dataset Splits | Yes | NUS-Wide: "81 labels manually annotated by humans will serve as labels for unseen classes. At the same time, these labels will also serve as ground-truth labels in the multi-label classification task. The remaining 925 labels were automatically extracted from Flickr users' manual annotations of these images, where they will be used as labels for seen classes." This setting is similar to (Huynh and Elhamifar 2020; Ben-Cohen et al. 2021). Open-Images-V4: "approximately 9 million are used as the training set... 125,456 test images and 400 unseen classes' labels. These labels are derived from the other 400 most frequent labels that did not appear in the training set, which appeared at least 75 times."
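The split figures quoted above can be collected into a small sanity-check script. The dictionary layout and variable names are illustrative (not from the paper); the numbers are those reported in the row above:

```python
# Dataset splits for the two MLZSL benchmarks, as quoted from the paper.
# The structure below is an illustrative summary, not code from the paper.
splits = {
    "NUS-Wide": {
        "total_labels": 1006,
        "unseen_labels": 81,   # manually annotated; also ground truth for the multi-label task
        "seen_labels": 925,    # extracted from Flickr users' manual annotations
    },
    "Open-Images-V4": {
        "train_images": 9_000_000,  # approximate
        "test_images": 125_456,
        "unseen_labels": 400,       # most frequent labels absent from the training set
    },
}

# Sanity check: seen + unseen labels cover the full NUS-Wide label set.
nus = splits["NUS-Wide"]
assert nus["seen_labels"] + nus["unseen_labels"] == nus["total_labels"]
```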
Hardware Specification | No | The paper does not explicitly mention specific hardware components such as GPU models (e.g., NVIDIA A100, RTX series), CPU models, or memory specifications used for running experiments.
Software Dependencies | No | "As for the selection of backbone network, we choose the pre-trained ViT-B/16 (Dosovitskiy et al. 2020) as our backbone network. ... We choose the Adam optimizer (Kingma and Ba 2014) as the model's optimizer..." The paper mentions a pre-trained model and an optimizer but does not provide specific version numbers for software libraries or frameworks (e.g., PyTorch, TensorFlow, scikit-learn) used for implementation.
Experiment Setup | Yes | "The weight decay of the Adam optimizer is set to 4e-3. For the experiments of all the models in the NUS-Wide dataset, the entire training process requires a total of 7 epochs with a batch size of 96, and the initial learning rate is set to 1e-5, and then decreases by 1/2 at the 4-th epoch. In the experiments of the Open-Images-V4 dataset, the number of epochs in the training process is set to 7. Our optimizer's decay rate, model's learning rate, and batch size remain the same. ... our model has two hyper-parameters, M and λ... The model can obtain relatively optimal results when λ = 0.3."
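The learning-rate schedule quoted above (7 epochs, initial rate 1e-5, halved at the 4th epoch) can be sketched as a small helper. This is an illustrative reconstruction under the assumption that epochs are 1-indexed and the halving applies from epoch 4 onward; the function name and signature are not from the paper:

```python
# Illustrative sketch of the reported NUS-Wide schedule:
# 7 epochs, batch size 96, initial learning rate 1e-5, halved at the 4th epoch.
# Assumes 1-indexed epochs; this helper is not code from the paper.
def lr_at_epoch(epoch: int, base_lr: float = 1e-5, decay_epoch: int = 4) -> float:
    """Return the learning rate in effect during the given (1-indexed) epoch."""
    return base_lr if epoch < decay_epoch else base_lr / 2

# Full 7-epoch schedule: epochs 1-3 at 1e-5, epochs 4-7 at 5e-6.
schedule = [lr_at_epoch(e) for e in range(1, 8)]
```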