Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Robust Incremental Learning Under Ambiguous Supervision

Authors: Rui Wang, Mingxuan Xia, Haobo Wang, Lei Feng, Junbo Zhao, Gang Chen, Chang Yao

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that PGDR achieves superior performance over the baselines in the IPLL task. (Section 5.1, Experimental Settings) Datasets. We perform experiments on CIFAR100 [Krizhevsky et al., 2009] and Tiny-ImageNet [Le and Yang, 2015]. Additionally, we further conduct experiments on CUB200 [Welinder et al., 2010].
Researcher Affiliation | Academia | Rui Wang1,2, Mingxuan Xia1,2, Haobo Wang1,2, Lei Feng4, Junbo Zhao3, Gang Chen3, Chang Yao1,2. 1 School of Software Technology, Zhejiang University; 2 Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; 3 College of Computer Science and Technology, Zhejiang University; 4 School of Computer Science and Engineering, Southeast University, China. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the Prototype-Guided Disambiguation and Replay algorithm (PGDR) in detail in Section 4, "Proposed Method", and outlines its components; it also includes Figure 3, "Overall framework of PGDR". However, it does not present structured pseudocode or an explicitly labeled algorithm block.
Open Source Code | No | The paper contains no explicit statement about the release of source code, nor a link to a code repository in the main text or acknowledgements.
Open Datasets | Yes | Datasets. We perform experiments on CIFAR100 [Krizhevsky et al., 2009] and Tiny-ImageNet [Le and Yang, 2015]. Additionally, we further conduct experiments on CUB200 [Welinder et al., 2010].
Dataset Splits | No | The paper describes how classes are partitioned into tasks ("We randomly partition all classes into 10 tasks, i.e., T = 10.") and defines how new- and old-class samples emerge within these tasks. However, it does not explicitly provide training, validation, and test splits with percentages or sample counts for the experiments on CIFAR100, Tiny-ImageNet, or CUB200.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to conduct the experiments.
Software Dependencies | No | The paper mentions using ResNet-18 for feature extraction and training with SGD with a momentum of 0.9, but it does not specify software dependencies such as programming languages, libraries, or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | Implementation Details. We employ ResNet-18 for feature extraction. The network is trained using SGD with a momentum of 0.9, for 200 epochs on CIFAR100 and 300 epochs on Tiny-ImageNet. The learning rate starts at 0.1 for PiCO and 0.01 for PaPi. Batch sizes are set to 256 and 128 for CIFAR100 and Tiny-ImageNet, with a maximum sample storage limit m of 2000. For prototypes, a moving-average coefficient γ of 0.5 is used. In the sample-selection storage phase, the number of nearest neighbors K is set to 10, with a maximum limit Nd for diverse sample storage of 0.67m/|Yt|. In the label disambiguation stage, the threshold α is set to 0.8. We linearly ramp β down from 0.8 to 0.6 to ensure full utilization of differential labels in the early stages of training.
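The Experiment Setup row pins down several scalar hyperparameters (γ = 0.5, m = 2000, Nd = 0.67m/|Yt|, β ramped from 0.8 to 0.6). A minimal sketch of the schedule and update rules these values imply, assuming a linear per-epoch ramp and a standard moving average; function names are hypothetical and not from the paper's (unreleased) code:

```python
def beta_schedule(epoch, total_epochs, beta_start=0.8, beta_end=0.6):
    """Linearly ramp beta down from 0.8 to 0.6 over training (assumed per-epoch)."""
    frac = epoch / max(total_epochs - 1, 1)
    return beta_start + (beta_end - beta_start) * frac

def update_prototype(prototype, feature, gamma=0.5):
    """Moving-average prototype update with the reported coefficient gamma = 0.5."""
    return [gamma * p + (1 - gamma) * f for p, f in zip(prototype, feature)]

def diverse_storage_cap(m=2000, num_classes=10):
    """Per-class cap Nd = 0.67 * m / |Yt| on diverse samples kept in storage."""
    return int(0.67 * m / num_classes)
```

For example, with m = 2000 and 10 classes per task, the diverse-storage cap works out to 134 samples per class, and β sits exactly halfway (0.7) at the midpoint of training.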