Provably Improving Generalization of Few-shot models with Synthetic Data
Authors: Lan-Cuong Nguyen, Quan Nguyen-Tri, Bang Tran Khanh, Dung D. Le, Long Tran-Thanh, Khoat Than
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show that the approach outperforms state-of-the-art methods across multiple datasets. Section 5, titled 'Experiments', details the experimental settings, baselines, datasets, implementation details, main results, and ablation studies. Tables 1, 2, 3, 4, 5 and 6 present quantitative performance metrics on various datasets. |
| Researcher Affiliation | Collaboration | The authors are affiliated with: FPT Software AI Center, Hanoi University of Science and Technology, VinUniversity, and University of Warwick. FPT Software AI Center is an industry affiliation, while Hanoi University of Science and Technology, VinUniversity, and University of Warwick are academic institutions. This mix indicates a collaborative affiliation. |
| Pseudocode | Yes | The paper includes 'Algorithm 1 Fine-tuning few-shot models with synthetic data' and 'Algorithm 2 Lightweight version', which are clearly labeled algorithm blocks detailing the proposed methods. |
| Open Source Code | No | The paper does not contain an explicit statement or a link to a repository for the open-source code of the methodology described. It mentions using the FAISS library and Stable Diffusion, which are third-party tools, but does not provide its own implementation code. |
| Open Datasets | Yes | The paper evaluates its method on common few-shot image classification datasets, including: FGVC Aircraft (Russakovsky et al., 2015), Caltech101 (Li et al., 2022), Food101 (Bossard et al., 2014), EuroSAT (Helber et al., 2019), Oxford Pets (Parkhi et al., 2012), DTD (Cimpoi et al., 2014), SUN397 (Xiao et al., 2010), Stanford Cars (Krause et al., 2013), and Flowers102 (Nilsback & Zisserman, 2008). These are all well-known, publicly available datasets, and the paper provides citations for them. |
| Dataset Splits | Yes | The paper specifies dataset usage for experiments: 'All experiments are conducted with 16 real shots and 500 synthetic images per-class, except our lightweight version, where only 64 synthetic images per class were utilized.' (Table 1 caption). It also mentions 'training/evaluation data split' in Section 5.1 when discussing DTD, implying specific data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running its experiments. It only mentions fine-tuning the 'CLIP ViT-B/16 image encoder' and using 'Stable Diffusion (SD) version 2.1', but not the hardware these were run on. |
| Software Dependencies | Yes | The paper specifies 'Stable Diffusion (SD) (Rombach et al., 2022) version 2.1' as the generator used, which includes a specific version number for a key software component. |
| Experiment Setup | Yes | The paper provides specific experimental setup details, including: 'the guidance scale of SD is set to be 2.0', 'the number of clusters... typically around twice the number of classes', 'The hyperparameters to be tuned are: λ1, λ2... The values of λ1, λ2 vary between the datasets, but consistently maintain the ratio of 1/10... The hyperparameter λ was chosen at 4 for all datasets except Stanford Cars, where we set it at 1.' (Section 5.1). Additionally, Appendix B states: 'We train our models using AdamW... searching the learning rate in {2e-4, 1e-4, 1e-5, 1e-6} and the weight decay in {1e-3, 5e-4, 1e-4}', and 'We run the K-means clustering step for 300 iterations... For the classifier tuning phase, we train for 50 epochs for the full approach and 150 epochs for the lightweight approach'. |
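As a rough illustration of the reported search space, the following sketch enumerates the learning-rate and weight-decay grid from Appendix B and derives the λ1/λ2 pair from the 1/10 ratio stated in Section 5.1. This is not the authors' code; the `lambdas_from_ratio` helper is a hypothetical convenience, and the paper does not specify whether the search was a full grid.

```python
from itertools import product

# Hyperparameter search space reported in Appendix B of the paper.
learning_rates = [2e-4, 1e-4, 1e-5, 1e-6]
weight_decays = [1e-3, 5e-4, 1e-4]

# A plain grid search over these values enumerates 4 x 3 = 12 (lr, wd) pairs.
grid = list(product(learning_rates, weight_decays))

# Section 5.1 says lambda_1 and lambda_2 vary per dataset but keep a 1/10
# ratio, so fixing lambda_2 determines lambda_1 (hypothetical helper):
def lambdas_from_ratio(lam2: float) -> tuple[float, float]:
    return lam2 / 10, lam2
```

Under this reading, tuning reduces to 12 optimizer configurations per dataset plus a single λ2 choice, since λ1 follows from the fixed ratio.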