Few-shot Novel Category Discovery

Authors: Chunming Li, Shidong Wang, Haofeng Zhang

IJCAI 2025

Each entry below lists a reproducibility variable, its result, and the LLM response supporting that result.
Research Type: Experimental. "Extensive experiments and detailed analysis on five commonly used datasets demonstrate that our methods can achieve leading performance levels across different task settings and scenarios."
Researcher Affiliation: Academia. Chunming Li (1), Shidong Wang (2), Haofeng Zhang (1). (1) School of Computer Science and Engineering, Nanjing University of Science and Technology, China; (2) School of Engineering, Newcastle University, Newcastle upon Tyne, United Kingdom.
Pseudocode: No. The paper describes the methods Semi-supervised Hierarchical Clustering (SHC) and Uncertainty-aware K-means Clustering (UKC) in detail using descriptive text and mathematical equations (e.g., Equations 2-6), but does not present them in a structured pseudocode or algorithm block.
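Since the paper provides no algorithm block, the following is purely an illustrative sketch of the general family of procedures involved: a seeded k-means in which labeled support examples fix the known-class centroids and extra centroids are added for novel classes. This is not the paper's SHC or UKC (in particular, the uncertainty weighting is omitted), and all names are placeholders.

```python
import numpy as np

def seeded_kmeans(X, seed_feats, seed_labels, n_new, n_iter=20, rng=None):
    """Cluster query features X. Centroids for known classes are seeded
    from labeled support features; n_new extra centroids, initialized at
    random query points, absorb the novel classes. A rough sketch only,
    not the paper's exact UKC."""
    rng = np.random.default_rng(rng)
    # One centroid per known class: the mean of its support features.
    known = np.array([seed_feats[seed_labels == c].mean(axis=0)
                      for c in np.unique(seed_labels)])
    # Novel-class centroids start at randomly chosen query points.
    novel = X[rng.choice(len(X), size=n_new, replace=False)]
    centroids = np.vstack([known, novel])
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=-1)
        assign = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster went empty.
        for k in range(len(centroids)):
            if (assign == k).any():
                centroids[k] = X[assign == k].mean(axis=0)
    return assign, centroids
```

In a real semi-supervised variant the known-class centroids would typically stay anchored to (or regularized toward) the labeled support means rather than drifting freely.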
Open Source Code: Yes. "Code is available at: https://github.com/Ashengl/FSNCD."
Open Datasets: Yes. "We evaluate our methods on two well-known large-scale datasets: CIFAR-100 [Krizhevsky et al., 2009] and ImageNet-100 [Krizhevsky et al., 2012], as well as two fine-grained datasets, including CUB-200 [Reed et al., 2016] and Stanford Cars [Krause et al., 2013]." "Statistical comparison of data partitions (i.e., training and test) across C-100 (CIFAR-100), I-100 (ImageNet-100), CUB (CUB-200), Cars (Stanford Cars) and Aircraft (FGVC-Aircraft)."
Dataset Splits: Yes. "Each dataset is split into two sets: labeled data is used to train the model, and unlabeled data is used for testing. Statistics on the partitioning of adopted datasets are presented in Table 2. We mainly report results obtained with two different settings: 5-way 5-shot (5w5s) and 5-way 1-shot (5w1s). For better comparison, we set the number of new classes to 5 for both configurations (5n), denoted as n_new, with each class containing 15 images as the query set. Moreover, we introduce a real-time inference evaluation scenario, in line with the objective of enabling the agent to freely switch between categorizing known and clustering novel classes. It is worth noting that for the real-time inference scenario, only one image from an individual class is allowed in the query set per episode."
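The episodic protocol quoted above (5-way labeled support, 5 additional novel classes, 15 query images per class) can be sketched as a simple sampler. This is a hypothetical reconstruction for illustration, not the authors' code; the `data` layout and function name are assumptions.

```python
import random

def sample_episode(data, known_classes, novel_classes,
                   n_way=5, n_shot=5, n_new=5, n_query=15, seed=None):
    """Build one few-shot NCD episode: a labeled support set drawn from
    known classes, and a query set mixing held-out known-class images
    with images from novel classes. `data` maps class name -> image ids."""
    rng = random.Random(seed)
    known = rng.sample(known_classes, n_way)
    novel = rng.sample(novel_classes, n_new)
    support, query = [], []
    for c in known:
        imgs = rng.sample(data[c], n_shot + n_query)
        support += [(x, c) for x in imgs[:n_shot]]   # labeled support
        query += [(x, c) for x in imgs[n_shot:]]     # known-class queries
    for c in novel:
        # Novel classes contribute only unlabeled query images.
        query += [(x, c) for x in rng.sample(data[c], n_query)]
    return support, query
```

Under the defaults this yields 25 support images (5 classes x 5 shots) and 150 query images (10 classes x 15 images); the real-time inference variant described above would instead cap each class at one query image per episode.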
Hardware Specification: No. The paper mentions using a "vision transformer (ViT-B-16) pre-trained on ImageNet with DINO" for feature extraction, but it does not specify any hardware details such as GPU/CPU models or memory used for training or inference.
Software Dependencies: No. The paper mentions using ViT-B-16 and DINO as core components and specifies a "supervised contrastive learning loss [Khosla et al., 2020]", but it does not provide version numbers for any software libraries, frameworks (such as PyTorch or TensorFlow), or programming languages used.
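The supervised contrastive loss of Khosla et al. (2020) referenced here has a compact closed form; below is a NumPy sketch for illustration only. The authors presumably rely on a framework implementation, and the temperature default is an assumption.

```python
import numpy as np

def supcon_loss(feats, labels, temperature=0.07):
    """Supervised contrastive loss over a batch: each anchor is pulled
    toward same-label samples and pushed from all others. NumPy sketch
    for illustration; not the authors' implementation."""
    # L2-normalize features so dot products are cosine similarities.
    z = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(z)
    eye = np.eye(n, dtype=bool)
    # Log-softmax over all other samples (the anchor itself is excluded).
    sim_masked = np.where(eye, -np.inf, sim)
    log_prob = sim_masked - np.log(np.exp(sim_masked).sum(axis=1, keepdims=True))
    # Positives: same label, excluding the anchor itself.
    pos = (labels[:, None] == labels[None, :]) & ~eye
    # Mean negative log-probability of positives, averaged over anchors.
    per_anchor = -(np.where(pos, log_prob, 0.0).sum(axis=1)
                   / np.maximum(pos.sum(axis=1), 1))
    return per_anchor.mean()
```

When same-label features are tightly aligned the loss approaches zero, and it grows as positives drift apart, which is the behavior the contrastive pre-training stage relies on.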
Experiment Setup: Yes. "The initial learning rate is set to 0.01." The episodic settings repeat those quoted under Dataset Splits: 5-way 5-shot (5w5s) and 5-way 1-shot (5w1s), with 5 novel classes (5n, denoted n_new) and 15 query images per class, plus a real-time inference scenario in which only one image per class is allowed in the query set per episode.