Towards Knowledgeable Supervised Lifelong Learning Systems
Authors: Diana Benavides-Prado, Yun Sing Koh, Patricia Riddle
JAIR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Proficiente on both synthetic and real-world datasets, and demonstrate scenarios where knowledgeable supervised learning systems can be achieved by means of transfer. Section 7. Experiments and Results of Selective Knowledge Transfer |
| Researcher Affiliation | Academia | Diana Benavides-Prado, Yun Sing Koh, Patricia Riddle, School of Computer Science, The University of Auckland, New Zealand |
| Pseudocode | Yes | Algorithm 1: Pseudo-code for transfer with AccGenSVM. Algorithm 2: Pseudo-code for the HRSVM algorithm for hypothesis refinement. |
| Open Source Code | Yes | Software and data used for the experiments are available online (Benavides-Prado, Koh, & Riddle, 2020). |
| Open Datasets | Yes | We also experiment with 20newsgroups (Mitchell, 1997), CIFAR-100 (Krizhevsky & Hinton, 2009), and a randomly selected ImageNet subset of 500 classes (Deng, Dong, Socher, Li, Li, & Fei-Fei, 2009a). |
| Dataset Splits | Yes | We repeatedly extract training (10%) and test samples (30%) without replacement for each problem, 30 times. We generate 1,000 random examples for each RBF concept. We repeatedly extract training (10%) and test samples (30%) without replacement for each class, 30 times, and compose balanced binary classification problems of each RBF concept vs. rest. We sample 10% training and 30% test sets, and compose binary classification tasks of each class vs. rest, 30 times. |
| Hardware Specification | No | The paper does not provide specific hardware details (like CPU/GPU models or memory) used for running the experiments. It only discusses software implementations and parameters. |
| Software Dependencies | No | The paper mentions several software tools and libraries such as LIBSVM, scikit-learn, ELLA, CL, and DEN implementations. However, it does not provide specific version numbers for these software dependencies as used in their experiments (e.g., "LIBSVM version X.Y"). |
| Experiment Setup | Yes | For transfer forward using AccGenSVM we set the KL-divergence threshold (KL) to 0.4 and the number of nearest neighbours (nn) to 2. For refining f_1^(S) using HRSVM, the parameter ν is set to the maximal feasible value (Chen et al., 2005), and Γ = 0.01. For each dataset we train half of the hypotheses as initial sources using a C-SVM (Bottou & Lin, 2007), with C = 1, before starting to transfer forward with AccGenSVM. Synthetic hyperplane tasks are trained with a linear kernel. Synthetic RBF tasks are trained with an RBF kernel and γ = 0.1 to make them subject to refinement. 20newsgroups, CIFAR and ImageNet tasks are trained with RBF kernels and γ = 1/d, with d the number of features. The KL-divergence threshold (KL) of AccGenSVM is selected by grid-search on a 5% validation set, with values in {0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5}. For synthetic hyperplane KL = 0.3, for synthetic RBF KL = 0.45, for 20newsgroups KL = 0.5, for CIFAR-100 and ImageNet KL = 0.3. The number of nearest neighbours is set to 2, similar to Benavides-Prado et al. (2017). HRSVM performs hypothesis refinement using the same SVM parameters as the initial sources; ν is set to the maximal feasible value (Chen et al., 2005) and Γ = 0.01 in all cases. The sequential learner with no transfer, BL, is trained using C-SVM with C = 1. For ELLA, we tune the number of latent components using grid-search on a 5% validation set, with values in {0.05, 0.10, 0.15, 0.20, 0.25}, as a percentage of the total number of features. For the sparsity level, we select the optimal value from {0.05, 0.1, 0.2, 0.5, 0.8, 1}. Best values for the percentage of latent components are: synthetic hyperplane (0.10), synthetic RBF (0.25), 20newsgroups (0.10), CIFAR-100 (0.25), ImageNet (0.20). For all datasets, the sparsity level is set to 1. For CL we tune the similarity threshold using grid-search on a 5% validation set, with values in {0.10, 0.15, 0.20, 0.25, 0.30}. Best values for each dataset are: synthetic hyperplane (0.15), synthetic RBF (0.15), 20newsgroups (0.20), CIFAR-100 (0.20), ImageNet (0.20). The number of hidden layers is set to 2 in all cases, with the following numbers of neurons: 250 and 200 for the synthetic datasets, 500 and 250 for ImageNet, 500 and 250 for 20newsgroups, 1,500 and 500 for CIFAR-100. All networks are FNNs that learn binary classification tasks, using the DEN implementation (Yoon, Yang, et al., 2017b). After experimentation, we set the parameter values to: 5,000 maximum iterations, batch size of 500, learning rate of 0.001, L1 sparsity of 0.0001, L2 lambda of 0.0001, group Lasso lambda of 0.001, regularization lambda of 0.5, threshold for dynamic expansion of 0.1, threshold for split and duplication of 0.5. For the number of units of expansion, we experiment with the default value of 10 and with the number of tasks. |
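The dataset-splits protocol quoted above (10% training and 30% test samples drawn without replacement, 30 repetitions, binary one-vs-rest task composition) can be sketched as follows. This is a hypothetical reconstruction of the stated protocol, not the authors' released code; function names are assumptions, and the class-balancing step the paper mentions is omitted for brevity.

```python
import numpy as np


def sample_task(X, y, target_class, seed):
    """Compose one binary (target-vs-rest) task with disjoint
    10% training / 30% test samples drawn without replacement."""
    rng = np.random.RandomState(seed)
    y_bin = (y == target_class).astype(int)  # one-vs-rest labels
    idx = rng.permutation(len(X))            # sampling without replacement
    n_train = int(0.10 * len(X))             # 10% training sample
    n_test = int(0.30 * len(X))              # 30% test sample, disjoint
    train_idx = idx[:n_train]
    test_idx = idx[n_train:n_train + n_test]
    return (X[train_idx], y_bin[train_idx]), (X[test_idx], y_bin[test_idx])


def repeated_splits(X, y, target_class, repetitions=30):
    """Repeat the extraction 30 times, as in the paper's protocol."""
    return [sample_task(X, y, target_class, seed=r)
            for r in range(repetitions)]
```

Each repetition reuses the full pool, so the 30 draws are independent resamplings rather than a single partition.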
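The no-transfer baseline (BL) in the setup above is a C-SVM with C = 1, and the text/image tasks use RBF kernels with γ = 1/d, where d is the number of features. A minimal sketch of that configuration using scikit-learn's `SVC` (an assumption; the authors' implementation is LIBSVM-based) might look like:

```python
from sklearn.svm import SVC


def train_baseline(X_train, y_train):
    """Hedged sketch of the BL baseline: C-SVM with C = 1 and an
    RBF kernel with gamma = 1/d, per the stated parameters."""
    d = X_train.shape[1]
    clf = SVC(C=1.0, kernel="rbf", gamma=1.0 / d)  # gamma = 1/d
    clf.fit(X_train, y_train)
    return clf
```

Note that `gamma=1.0 / d` matches scikit-learn's `gamma="auto"` setting; the synthetic hyperplane tasks would instead use `kernel="linear"`.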
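The grid-search described above selects AccGenSVM's KL-divergence threshold on a 5% validation split from a fixed candidate grid. The harness below is a hypothetical sketch of that selection loop only; `score_with_threshold` is a placeholder for the paper's transfer-and-evaluate step, which is not reproduced here.

```python
import numpy as np

# Candidate grid for the KL-divergence threshold, as stated in the paper
KL_GRID = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]


def select_kl_threshold(X, y, score_with_threshold, seed=0):
    """Pick the KL threshold maximizing a validation score on a
    held-out 5% split (hypothetical harness around a placeholder)."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    n_val = max(1, int(0.05 * len(X)))        # 5% validation set
    val_idx, rest_idx = idx[:n_val], idx[n_val:]
    scores = {kl: score_with_threshold(X[rest_idx], y[rest_idx],
                                       X[val_idx], y[val_idx], kl)
              for kl in KL_GRID}
    return max(scores, key=scores.get)        # best-scoring threshold
```

The same pattern applies to the other tuned values in the table (ELLA's latent-component percentage and sparsity level, CL's similarity threshold), each with its own candidate grid.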