Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning

Authors: Mushui Liu, Fangtai Wu, Bozheng Li, Ziqian Lu, Yunlong Yu, Xi Li

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on four few-shot classification benchmarks and the BSCD-FSL cross-domain benchmark showcase remarkable advancements over the current state-of-the-art methods. Notably, for the challenging one-shot setting, our approach, utilizing the ResNet-12 backbone, achieves an impressive average improvement of 1.95% over the second-best competitor.
Researcher Affiliation Academia (1) College of Information Science & Electronic Engineering, Zhejiang University (2) School of Aeronautics and Astronautics, Zhejiang University (3) College of Computer Science and Technology, Zhejiang University
Pseudocode No The paper describes its methodology and formulations using natural language and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository.
Open Datasets Yes We evaluate the proposed method across two primary tasks: the traditional FSL and the cross-domain FSL (CD-FSL). The traditional FSL is evaluated on four datasets, namely miniImageNet (Vinyals et al. 2016), tieredImageNet (Ren et al. 2018), CIFAR-FS (Lee et al. 2019), and FC100 (Oreshkin, Rodríguez López, and Lacoste 2018). Following (Guo et al. 2020), we evaluate the CD-FSL on the BSCD-FSL benchmark, which involves training on miniImageNet and testing on four unrelated datasets: ChestX (Wang et al. 2017), ISIC (Tschandl, Rosendahl, and Kittler 2018), EuroSAT (Helber et al. 2019), and CropDisease (Mohanty, Hughes, and Salathé 2016).
Dataset Splits Yes For the evaluation, we uniformly sampled 600 classification tasks from a novel set that comprises classes that are disjoint from those in the base set. In each task, there are 15 query samples for each class. The mean and 95% confidence interval of the accuracy are reported.
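The reported evaluation protocol (mean accuracy with a 95% confidence interval over 600 sampled tasks) can be sketched as follows; the simulated per-task accuracies are placeholders, and the interval uses the standard normal approximation, which is the convention in most FSL papers though the paper does not state its exact formula:

```python
import math
import random
import statistics

def mean_and_ci95(accuracies):
    """Mean accuracy and 95% confidence half-width over per-task
    accuracies, using the normal approximation 1.96 * SEM."""
    n = len(accuracies)
    mean = statistics.mean(accuracies)
    sem = statistics.stdev(accuracies) / math.sqrt(n)
    return mean, 1.96 * sem

# Simulate 600 evaluation tasks (the paper samples 600 tasks from the
# novel set, each with 15 query samples per class); values are dummies.
random.seed(0)
task_accs = [random.uniform(0.55, 0.75) for _ in range(600)]
mean, ci = mean_and_ci95(task_accs)
print(f"accuracy: {mean * 100:.2f} ± {ci * 100:.2f}")
```

With 600 tasks the half-width shrinks roughly as 1/sqrt(600), which is why FSL papers can report sub-percent confidence intervals.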
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running its experiments.
Software Dependencies No The paper mentions using 'ViT-B/32 CLIP (Radford et al. 2021)' and 'GPT-4o (OpenAI 2023)' as models/APIs, but it does not specify any general software dependencies or library versions needed for reproducibility (e.g., Python version, PyTorch version).
Experiment Setup Yes The pre-training stage is set to 200 epochs for all datasets, while the meta-training stage is set to 50 epochs. The α and β in Eq. (9) are consistently assigned values of 0.2 and 0.8, respectively, across all datasets. During the pre-training phase, we set the batch size to 128, leveraging the Adam optimizer (Kingma and Ba 2014) with a learning rate of 1e-4 for optimization of the model parameters.
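The stated hyperparameters can be collected into a minimal configuration sketch. Since the review does not reproduce Eq. (9), the assumption that α and β linearly weight two loss terms is hypothetical, as are the names `loss_a` and `loss_b`:

```python
# Hypothetical configuration mirroring the reported setup; field names
# are illustrative, not taken from the paper's code (none is released).
CONFIG = {
    "pretrain_epochs": 200,   # pre-training stage, all datasets
    "meta_train_epochs": 50,  # meta-training stage
    "batch_size": 128,        # pre-training batch size
    "optimizer": "Adam",      # Kingma and Ba 2014
    "learning_rate": 1e-4,
    "alpha": 0.2,             # α in Eq. (9)
    "beta": 0.8,              # β in Eq. (9)
}

def combined_loss(loss_a, loss_b,
                  alpha=CONFIG["alpha"], beta=CONFIG["beta"]):
    """Assumes Eq. (9) has the common form L = α·L_a + β·L_b;
    the exact form is not given in this review."""
    return alpha * loss_a + beta * loss_b
```

Note that α + β = 1.0 here, so the combination is a convex mixture of the two loss terms.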