Few-shot Novel Category Discovery

Authors: Chunming Li, Shidong Wang, Haofeng Zhang

IJCAI 2025

Each entry below lists a reproducibility variable, its result, and the LLM response supporting that result.
Research Type: Experimental. "Extensive experiments and detailed analysis on five commonly used datasets demonstrate that our methods can achieve leading performance levels across different task settings and scenarios."
Researcher Affiliation: Academia. Chunming Li (1), Shidong Wang (2), Haofeng Zhang (1). (1) School of Computer Science and Engineering, Nanjing University of Science and Technology, China; (2) School of Engineering, Newcastle University, Newcastle upon Tyne, United Kingdom.
Pseudocode: No. The paper describes the methods Semi-supervised Hierarchical Clustering (SHC) and Uncertainty-aware K-means Clustering (UKC) in detail using descriptive text and mathematical equations (e.g., Equations 2-6), but does not present them in a structured pseudocode or algorithm block.
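Since the paper provides no algorithm block, the following is purely an illustrative sketch of the general family of procedures involved: a seeded k-means in which labeled support examples fix the known-class centroids and extra centroids are added for novel classes. This is not the paper's SHC or UKC (in particular, the uncertainty weighting is omitted), and all names are placeholders.

```python
import numpy as np

def seeded_kmeans(X, seed_feats, seed_labels, n_new, n_iter=20, rng=None):
    """Cluster query features X. Centroids for known classes are seeded
    from labeled support features; n_new extra centroids, initialized at
    random query points, absorb the novel classes. A rough sketch only,
    not the paper's exact UKC."""
    rng = np.random.default_rng(rng)
    # One centroid per known class: the mean of its support features.
    known = np.array([seed_feats[seed_labels == c].mean(axis=0)
                      for c in np.unique(seed_labels)])
    # Novel-class centroids start at randomly chosen query points.
    novel = X[rng.choice(len(X), size=n_new, replace=False)]
    centroids = np.vstack([known, novel])
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=-1)
        assign = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster went empty.
        for k in range(len(centroids)):
            if (assign == k).any():
                centroids[k] = X[assign == k].mean(axis=0)
    return assign, centroids
```

In a real semi-supervised variant the known-class centroids would typically stay anchored to (or regularized toward) the labeled support means rather than drifting freely.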
Open Source Code: Yes. "Code is available at: https://github.com/Ashengl/FSNCD."
Open Datasets: Yes. "We evaluate our methods on two well-known large-scale datasets: CIFAR-100 [Krizhevsky et al., 2009] and ImageNet-100 [Krizhevsky et al., 2012], as well as two fine-grained datasets, including CUB-200 [Reed et al., 2016] and Stanford Cars [Krause et al., 2013]." "Statistical comparison of data partitions (i.e., training and test) across C-100 (CIFAR-100), I-100 (ImageNet-100), CUB (CUB-200), Cars (Stanford Cars) and Aircraft (FGVC-Aircraft)."
Dataset Splits: Yes. "Each dataset is split into two sets: labeled data is used to train the model, and unlabeled data is used for testing. Statistics on the partitioning of adopted datasets are presented in Table 2. We mainly report results obtained with two different settings: 5-way 5-shot (5w5s) and 5-way 1-shot (5w1s). For better comparison, we set the number of new classes to 5 for both configurations (5n), denoted as n_new, with each class containing 15 images as the query set. Moreover, we introduce a real-time inference evaluation scenario, in line with the objective of enabling the agent to freely switch between categorizing known and clustering novel classes. It is worth noting that for the real-time inference scenario, only one image from an individual class is allowed in the query set per episode."
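The episodic protocol quoted above (5-way labeled support, 5 additional novel classes, 15 query images per class) can be sketched as a simple sampler. This is a hypothetical reconstruction for illustration, not the authors' code; the `data` layout and function name are assumptions.

```python
import random

def sample_episode(data, known_classes, novel_classes,
                   n_way=5, n_shot=5, n_new=5, n_query=15, seed=None):
    """Build one few-shot NCD episode: a labeled support set drawn from
    known classes, and a query set mixing held-out known-class images
    with images from novel classes. `data` maps class name -> image ids."""
    rng = random.Random(seed)
    known = rng.sample(known_classes, n_way)
    novel = rng.sample(novel_classes, n_new)
    support, query = [], []
    for c in known:
        imgs = rng.sample(data[c], n_shot + n_query)
        support += [(x, c) for x in imgs[:n_shot]]   # labeled support
        query += [(x, c) for x in imgs[n_shot:]]     # known-class queries
    for c in novel:
        # Novel classes contribute only unlabeled query images.
        query += [(x, c) for x in rng.sample(data[c], n_query)]
    return support, query
```

Under the defaults this yields 25 support images (5 classes x 5 shots) and 150 query images (10 classes x 15 images); the real-time inference variant described above would instead cap each class at one query image per episode.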
Hardware Specification: No. The paper mentions using a "vision transformer (ViT-B-16) pre-trained on ImageNet with DINO" for feature extraction, but it does not specify any hardware details such as GPU/CPU models or memory used for training or inference.
Software Dependencies: No. The paper mentions using ViT-B-16 and DINO as core components and specifies a "supervised contrastive learning loss [Khosla et al., 2020]", but it does not provide version numbers for any software libraries, frameworks (such as PyTorch or TensorFlow), or programming languages used.
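The supervised contrastive loss of Khosla et al. (2020) referenced here has a compact closed form; below is a NumPy sketch for illustration only. The authors presumably rely on a framework implementation, and the temperature default is an assumption.

```python
import numpy as np

def supcon_loss(feats, labels, temperature=0.07):
    """Supervised contrastive loss over a batch: each anchor is pulled
    toward same-label samples and pushed from all others. NumPy sketch
    for illustration; not the authors' implementation."""
    # L2-normalize features so dot products are cosine similarities.
    z = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(z)
    eye = np.eye(n, dtype=bool)
    # Log-softmax over all other samples (the anchor itself is excluded).
    sim_masked = np.where(eye, -np.inf, sim)
    log_prob = sim_masked - np.log(np.exp(sim_masked).sum(axis=1, keepdims=True))
    # Positives: same label, excluding the anchor itself.
    pos = (labels[:, None] == labels[None, :]) & ~eye
    # Mean negative log-probability of positives, averaged over anchors.
    per_anchor = -(np.where(pos, log_prob, 0.0).sum(axis=1)
                   / np.maximum(pos.sum(axis=1), 1))
    return per_anchor.mean()
```

When same-label features are tightly aligned the loss approaches zero, and it grows as positives drift apart, which is the behavior the contrastive pre-training stage relies on.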
Experiment Setup: Yes. "The initial learning rate is set to 0.01." The episodic settings repeat those quoted under Dataset Splits: 5-way 5-shot (5w5s) and 5-way 1-shot (5w1s), with 5 novel classes (5n, denoted n_new) and 15 query images per class, plus a real-time inference scenario in which only one image per class is allowed in the query set per episode.