ZeroDiff: Solidified Visual-semantic Correlation in Zero-Shot Learning

Authors: Zihan Ye, Shreyank Gowda, Shiming Chen, Xiaowei Huang, Haotian Xu, Fahad Khan, Yaochu Jin, Kaizhu Huang, Xiaobo Jin

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three popular ZSL benchmarks demonstrate that ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data. Our codes are available at https://github.com/FouriYe/ZeroDiff_ICLR25.
Researcher Affiliation | Academia | Zihan Ye1,2,6, Shreyank N Gowda3, Shiming Chen4, Xiaowei Huang2, Haotian Xu1,2, Fahad Shahbaz Khan4,5, Yaochu Jin6, Kaizhu Huang7, Xiaobo Jin1. 1Xi'an Jiaotong-Liverpool University, 2University of Liverpool, 3University of Nottingham, 4Mohamed bin Zayed University of Artificial Intelligence, 5Linköping University, 6Westlake University, 7Duke Kunshan University
Pseudocode | Yes | The entire training and testing algorithms can be found in Appendix A.1. We include pseudo-code for the training algorithms (Alg. 1 and Alg. 2) and the testing algorithm (Alg. 3), along with details of all loss functions.
Open Source Code | Yes | Our codes are available at https://github.com/FouriYe/ZeroDiff_ICLR25.
Open Datasets | Yes | We conduct experiments on three popular ZSL benchmarks: AWA2 (Xian et al., 2018a), CUB (Welinder et al., 2010), and SUN (Patterson & Hays, 2012).
Dataset Splits | Yes | We follow the commonly used setting (Xian et al., 2018a) to divide the seen and unseen classes.
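The split of Xian et al. (2018a) partitions classes so that unseen classes never appear during training. A minimal sketch of that partition logic, assuming the index-array convention of the public benchmark release (1-based MATLAB-style sample locations); the arrays below are synthetic stand-ins, not the real benchmark data:

```python
import numpy as np

# Hedged sketch (not the authors' code) of the seen/unseen class partition
# used by ZSL benchmarks following Xian et al. (2018a). Location arrays are
# assumed to be 1-based sample indices, as in the public MATLAB release.
def partition_classes(labels, trainval_loc, test_unseen_loc):
    """Seen classes occur in the trainval samples; unseen classes only at test time."""
    seen = set(labels[trainval_loc - 1])            # shift 1-based indices to 0-based
    unseen = set(labels[test_unseen_loc - 1]) - seen
    return sorted(seen), sorted(unseen)

# Synthetic stand-in data: 8 samples over 4 classes.
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
trainval_loc = np.array([1, 2, 3, 4])               # samples of classes 0 and 1
test_unseen_loc = np.array([5, 6, 7, 8])            # samples of classes 2 and 3
seen, unseen = partition_classes(labels, trainval_loc, test_unseen_loc)
```

With these stand-in arrays, classes 0 and 1 come out as seen and classes 2 and 3 as unseen; the real benchmark files supply the actual location arrays.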
Hardware Specification | Yes | All experiments are conducted on a Quadro RTX 8000 GPU.
Software Dependencies | No | The paper mentions ResNet-101, the Adam optimizer, and specific diffusion models (DDGAN, the Variance Preserving SDE) but does not provide version numbers for any software libraries or programming languages.
Experiment Setup | Yes | We use Adam to optimize all networks with an initial learning rate of 0.0005. For all datasets, λ_gp^adv, λ_gp^diff, and λ_gp^rep are fixed at 10. Following DDGAN (Xiao et al., 2022), the number of diffusion steps T is set to 4, and we use the discretization of its continuous-time extension, the Variance Preserving (VP) SDE (Song et al., 2020b), to compute βt in Eq. 5.
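The VP-SDE discretization of βt described above can be sketched as follows. This is a minimal sketch, not the authors' code: it derives T = 4 discrete βt values from the continuous-time VP SDE with a linear β(u) schedule, where the beta_min/beta_max defaults are assumed from Song et al. (2020b) and are not stated in this excerpt.

```python
import math

# Hedged sketch: discrete beta_t values from the continuous-time Variance
# Preserving SDE, as DDGAN does with T = 4 diffusion steps.
# beta_min/beta_max are assumed VP-SDE defaults, not given in this excerpt.
def vp_sde_betas(T=4, beta_min=0.1, beta_max=20.0):
    betas = []
    for i in range(1, T + 1):
        t, s = i / T, (i - 1) / T
        # beta_t = 1 - alpha_bar(t) / alpha_bar(s), where
        # alpha_bar(t) = exp(-integral_0^t beta(u) du) and beta(u) is linear in u,
        # so the integral over [s, t] has a closed form.
        integral = beta_min * (t - s) + 0.5 * (beta_max - beta_min) * (t * t - s * s)
        betas.append(1.0 - math.exp(-integral))
    return betas

betas = vp_sde_betas()  # four values in (0, 1), increasing with t
```

The Adam setup quoted above would then correspond to something like `torch.optim.Adam(net.parameters(), lr=0.0005)` applied to each network.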