Instruct Where the Model Fails: Generative Data Augmentation via Guided Self-contrastive Fine-tuning

Authors: Weijian Ma, Ruoxin Chen, Keyue Zhang, Shuang Wu, Shouhong Ding

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our experimental results on few-shot class incremental learning (FSCIL) demonstrate that our instruction-guided finetuning approach consistently enhances the downstream model's classification accuracy throughout the continual learning process, surpassing generative data augmentation methods such as Stable Diffusion and GPT-4o, as well as state-of-the-art non-generative FSCIL strategies. Experiment Settings: Dataset and Evaluation Metrics. We conduct our experiments under the settings of (Tao et al. 2020) and (Park, Song, and Park 2024) for fair comparison. The method is evaluated against state-of-the-art methods on the following datasets: miniImageNet (Ravi and Larochelle 2017), CUB200 (Wah et al. 2011), and CIFAR-100 (Krizhevsky 2009). Ablation Study: We perform two ablation studies on the miniImageNet and CUB200 datasets to justify our design of finetuning at both the semantic level and in detail.
Researcher Affiliation Collaboration Weijian Ma1*, Ruoxin Chen2, Keyue Zhang2, Shuang Wu2, Shouhong Ding2 1School of Computer Science, Fudan University 2Youtu Lab, Tencent
Pseudocode No The paper describes the method conceptually with diagrams (Figure 1 and 2) and prose, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No The paper does not contain any explicit statements about releasing the source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets Yes Dataset and Evaluation Metrics. We conduct our experiments under the settings of (Tao et al. 2020) and (Park, Song, and Park 2024) for fair comparison. The method is evaluated against state-of-the-art methods on the following datasets: miniImageNet (Ravi and Larochelle 2017), CUB200 (Wah et al. 2011), and CIFAR-100 (Krizhevsky 2009).
Dataset Splits Yes The split configurations for all datasets are shown in Table 2, which follow the prevailing settings of (Tao et al. 2020) and (Park, Song, and Park 2024). ... Table 2: Configuration settings for FSCIL benchmarks on CUB200, CIFAR-100, and miniImageNet. CUB200: base 100, incremental 10-way 5-shot, 1+10 sessions; CIFAR-100: base 60, incremental 5-way 5-shot, 1+8 sessions; miniImageNet: base 60, incremental 5-way 5-shot, 1+8 sessions.
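The split configuration quoted from Table 2 can be sanity-checked with a short sketch: in FSCIL, the base session plus the incremental sessions must cover every class of the dataset exactly once. The dictionary below transcribes the quoted settings; the variable names and structure are illustrative, not from the authors' code.

```python
# FSCIL benchmark splits as quoted from Table 2 of the paper.
# Keys and structure are illustrative, not taken from the authors' code.
fscil_splits = {
    "CUB200":       {"total_classes": 200, "base_classes": 100, "way": 10, "shot": 5, "inc_sessions": 10},
    "CIFAR-100":    {"total_classes": 100, "base_classes": 60,  "way": 5,  "shot": 5, "inc_sessions": 8},
    "miniImageNet": {"total_classes": 100, "base_classes": 60,  "way": 5,  "shot": 5, "inc_sessions": 8},
}

for name, cfg in fscil_splits.items():
    # Each incremental session introduces `way` new classes; together with
    # the base session they must account for every class in the dataset.
    covered = cfg["base_classes"] + cfg["way"] * cfg["inc_sessions"]
    assert covered == cfg["total_classes"], name
    print(f"{name}: 1 base + {cfg['inc_sessions']} incremental sessions, "
          f"{cfg['way']}-way {cfg['shot']}-shot")
```

The check also explains the "1+10" and "1+8" session counts in the table: one base session followed by 10 (CUB200) or 8 (CIFAR-100, miniImageNet) incremental sessions.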
Hardware Specification Yes The method is trained on 8 H100-80G GPUs.
Software Dependencies No The paper mentions specific models like 'Stable Diffusion v1.5', 'GPT4o', and 'VIT-B/16', but does not provide specific version numbers for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA) that would be needed for replication.
Experiment Setup Yes Implementation Details. We use Stable Diffusion v1.5 as our diffusion augmentor with a CFG guidance scale of 2, following the configuration of (Sarıyıldız et al. 2023). The total number of diffusion steps is set to 20. The VLM we use is GPT-4o in the main experiment. The initial prompt for SD 1.5 remains fixed as "A picture of a [category]". For downstream models, we use a ViT-B/16 (Dosovitskiy et al. 2021) pretrained on ImageNet-21K (Deng et al. 2009) for ours and the comparative methods. The learning rate of the downstream model is set to 2e-4, using the Adam optimizer with a cosine annealing learning rate scheduler. In each epoch, the training set is augmented with generated images matching the original set in size. For both the base session and the incremental sessions, we train the network until convergence.
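The quoted hyperparameters can be sketched as a minimal setup for reproduction attempts. This is an illustration under stated assumptions, not the authors' implementation: the linear layer stands in for the ViT-B/16 backbone, and the diffusion-side settings are collected in a config dict whose keys are ours.

```python
import torch
from torch import nn

# Downstream-model optimization as described in the paper: Adam with
# lr 2e-4 and a cosine-annealing schedule. The linear layer below is a
# placeholder for the ViT-B/16 backbone (this is a sketch only).
model = nn.Linear(768, 100)  # stand-in for a ViT-B/16 classification head
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

# Diffusion-augmentor settings quoted from the paper, gathered in an
# illustrative config dict (key names are ours, not the authors').
sd_config = {
    "model": "Stable Diffusion v1.5",
    "guidance_scale": 2,          # CFG scale, following Sariyildiz et al. 2023
    "num_inference_steps": 20,    # total diffusion steps
    "prompt_template": "A picture of a [category]",  # fixed initial prompt
}

assert optimizer.param_groups[0]["lr"] == 2e-4
for _ in range(100):
    optimizer.step()   # dummy step; the schedule drives the lr toward 0
    scheduler.step()
assert optimizer.param_groups[0]["lr"] < 1e-6  # fully annealed after T_max steps
```

The `T_max=100` horizon is an assumption for illustration; the paper only states that training runs until convergence, so the schedule length would need to be chosen per session.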