Towards Knowledgeable Supervised Lifelong Learning Systems
Authors: Diana Benavides-Prado, Yun Sing Koh, Patricia Riddle
JAIR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Proficiente on both synthetic and real-world datasets, and demonstrate scenarios where knowledgeable supervised learning systems can be achieved by means of transfer. Section 7. Experiments and Results of Selective Knowledge Transfer |
| Researcher Affiliation | Academia | Diana Benavides-Prado, Yun Sing Koh, Patricia Riddle, School of Computer Science, The University of Auckland, New Zealand |
| Pseudocode | Yes | Algorithm 1: Pseudo-code for transfer with AccGenSVM. Algorithm 2: Pseudo-code for the HRSVM algorithm for hypothesis refinement. |
| Open Source Code | Yes | Software and data used for the experiments are available online (Benavides-Prado, Koh, & Riddle, 2020). |
| Open Datasets | Yes | We also experiment with 20newsgroups (Mitchell, 1997), CIFAR-100 (Krizhevsky & Hinton, 2009), and a randomly selected ImageNet subset of 500 classes (Deng, Dong, Socher, Li, Li, & Fei-Fei, 2009a). |
| Dataset Splits | Yes | We repeatedly extract training (10%) and test samples (30%) without replacement for each problem, 30 times. We generate 1,000 random examples for each RBF concept. We repeatedly extract training (10%) and test samples (30%) without replacement for each class, 30 times, and compose balanced binary classification problems of each RBF concept vs. rest. We sample 10% training and 30% test sets, and compose binary classification tasks of each class vs. rest, 30 times. |
| Hardware Specification | No | The paper does not provide specific hardware details (like CPU/GPU models or memory) used for running the experiments. It only discusses software implementations and parameters. |
| Software Dependencies | No | The paper mentions several software tools and libraries such as LIBSVM, scikit-learn, ELLA, CL, and DEN implementations. However, it does not provide specific version numbers for these software dependencies as used in their experiments (e.g., "LIBSVM version X.Y"). |
| Experiment Setup | Yes | For transfer forward using AccGenSVM we set the KL-divergence threshold (KL) to 0.4 and the number of nearest neighbours (nn) to 2. For refining f_1^(S) using HRSVM, the parameter ν is set to the maximal feasible value (Chen et al., 2005), and Γ = 0.01. For each dataset we train half of the hypotheses as initial sources using a C-SVM (Bottou & Lin, 2007), with C = 1, before starting to transfer forward with AccGenSVM. Synthetic hyperplane tasks are trained with a linear kernel. Synthetic RBF tasks are trained with an RBF kernel and γ = 0.1 to make them subject to refinement. 20newsgroups, CIFAR and ImageNet tasks are trained with RBF kernels and γ = 1/d, with d the number of features. The KL-divergence threshold (KL) of AccGenSVM is selected by grid-search on a 5% validation set, with values in {0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5}. For synthetic hyperplane KL = 0.3, for synthetic RBF KL = 0.45, for 20newsgroups KL = 0.5, for CIFAR-100 and ImageNet KL = 0.3. The number of nearest neighbours is set to 2, similar to Benavides-Prado et al. (2017). HRSVM performs hypothesis refinement using the same SVM parameters as the initial sources; ν is set to the maximal feasible value (Chen et al., 2005) and Γ = 0.01 in all cases. The sequential learner with no transfer, BL, is trained using C-SVM with C = 1. For ELLA, we tune the number of latent components using grid-search on a 5% validation set, with values in {0.05, 0.10, 0.15, 0.20, 0.25}, as a percentage of the total number of features. For the sparsity level, we select the optimal value from {0.05, 0.1, 0.2, 0.5, 0.8, 1}. Best values for the percentage of latent components are: synthetic hyperplane (0.10), synthetic RBF (0.25), 20newsgroups (0.10), CIFAR-100 (0.25), ImageNet (0.20). For all datasets, the sparsity level is set to 1. For CL we tune the similarity threshold using grid-search on a 5% validation set, with values in {0.10, 0.15, 0.20, 0.25, 0.30}. Best values for each dataset are: synthetic hyperplane (0.15), synthetic RBF (0.15), 20newsgroups (0.20), CIFAR-100 (0.20), ImageNet (0.20). The number of hidden layers is set to 2 in all cases, with the following numbers of neurons: 250 and 200 for the synthetic datasets, 500 and 250 for ImageNet, 500 and 250 for 20newsgroups, 1,500 and 500 for CIFAR-100. All networks are FNNs that learn binary classification tasks, using the DEN implementation (Yoon, Yang, et al., 2017b). After experimentation, we set the parameter values to: 5,000 maximum iterations, batch size of 500, learning rate of 0.001, L1 sparsity of 0.0001, L2 lambda of 0.0001, group Lasso lambda of 0.001, regularization lambda of 0.5, threshold for dynamic expansion of 0.1, threshold for split and duplication of 0.5. For the number of units of expansion, we experiment with the default value of 10 and with the number of tasks. |
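The dataset-splits protocol quoted above (10% training and 30% test samples drawn without replacement, 30 repetitions, binary one-vs-rest task composition) can be sketched as follows. This is a hypothetical reconstruction of the stated protocol, not the authors' released code; function names are assumptions, and the class-balancing step the paper mentions is omitted for brevity.

```python
import numpy as np


def sample_task(X, y, target_class, seed):
    """Compose one binary (target-vs-rest) task with disjoint
    10% training / 30% test samples drawn without replacement."""
    rng = np.random.RandomState(seed)
    y_bin = (y == target_class).astype(int)  # one-vs-rest labels
    idx = rng.permutation(len(X))            # sampling without replacement
    n_train = int(0.10 * len(X))             # 10% training sample
    n_test = int(0.30 * len(X))              # 30% test sample, disjoint
    train_idx = idx[:n_train]
    test_idx = idx[n_train:n_train + n_test]
    return (X[train_idx], y_bin[train_idx]), (X[test_idx], y_bin[test_idx])


def repeated_splits(X, y, target_class, repetitions=30):
    """Repeat the extraction 30 times, as in the paper's protocol."""
    return [sample_task(X, y, target_class, seed=r)
            for r in range(repetitions)]
```

Each repetition reuses the full pool, so the 30 draws are independent resamplings rather than a single partition.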
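The no-transfer baseline (BL) in the setup above is a C-SVM with C = 1, and the text/image tasks use RBF kernels with γ = 1/d, where d is the number of features. A minimal sketch of that configuration using scikit-learn's `SVC` (an assumption; the authors' implementation is LIBSVM-based) might look like:

```python
from sklearn.svm import SVC


def train_baseline(X_train, y_train):
    """Hedged sketch of the BL baseline: C-SVM with C = 1 and an
    RBF kernel with gamma = 1/d, per the stated parameters."""
    d = X_train.shape[1]
    clf = SVC(C=1.0, kernel="rbf", gamma=1.0 / d)  # gamma = 1/d
    clf.fit(X_train, y_train)
    return clf
```

Note that `gamma=1.0 / d` matches scikit-learn's `gamma="auto"` setting; the synthetic hyperplane tasks would instead use `kernel="linear"`.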
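The grid-search described above selects AccGenSVM's KL-divergence threshold on a 5% validation split from a fixed candidate grid. The harness below is a hypothetical sketch of that selection loop only; `score_with_threshold` is a placeholder for the paper's transfer-and-evaluate step, which is not reproduced here.

```python
import numpy as np

# Candidate grid for the KL-divergence threshold, as stated in the paper
KL_GRID = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]


def select_kl_threshold(X, y, score_with_threshold, seed=0):
    """Pick the KL threshold maximizing a validation score on a
    held-out 5% split (hypothetical harness around a placeholder)."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    n_val = max(1, int(0.05 * len(X)))        # 5% validation set
    val_idx, rest_idx = idx[:n_val], idx[n_val:]
    scores = {kl: score_with_threshold(X[rest_idx], y[rest_idx],
                                       X[val_idx], y[val_idx], kl)
              for kl in KL_GRID}
    return max(scores, key=scores.get)        # best-scoring threshold
```

The same pattern applies to the other tuned values in the table (ELLA's latent-component percentage and sparsity level, CL's similarity threshold), each with its own candidate grid.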