RLScore: Regularized Least-Squares Learners
Authors: Tapio Pahikkala, Antti Airola
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we demonstrate the advantages of RLScore solvers on five benchmark tasks. Each of the considered tasks can be expressed either as a single or a sequence of RLS problems with closed form solutions. The baseline method solves each resulting system (K + λI)A = Y or (XᵀX + λI)W = XᵀY with NumPy's numpy.linalg.solve, which calls the LAPACK gesv routine. Further we compare to two existing RLS solvers, implemented in Python scikit-learn (version 0.18) (Pedregosa et al., 2011) and the MATLAB GURLS package (Tacchetti et al., 2013). The RLScore algorithms produce exactly the same results as the compared methods, but make use of a number of computational shortcuts resulting in substantial increases in efficiency. |
| Researcher Affiliation | Academia | Tapio Pahikkala (EMAIL), Antti Airola (EMAIL), Department of Information Technology, 20014 University of Turku, Finland |
| Pseudocode | Yes | Listing 1: feature selection with the greedy RLS algorithm.<br>`import numpy as np`<br>`from rlscore.learner import GreedyRLS`<br>`from scipy.stats import kendalltau`<br>`# regression problem with 3 important features`<br>`X = np.random.randn(100, 20)`<br>`y = X[:, 0] + X[:, 2] - X[:, 5] + 0.1 * np.random.randn(100)`<br>`# select 3 features with greedy RLS`<br>`rls = GreedyRLS(X[:50], y[:50], regparam=1, subsetsize=3)`<br>`# Did we select the right features?`<br>`print(rls.selected)`<br>`# Compute test set predictions`<br>`p = rls.predict(X[50:])`<br>`print(kendalltau(y[50:], p))` |
| Open Source Code | Yes | RLScore is a Python open source module for kernel based machine learning. The library provides implementations of several regularized least-squares (RLS) type of learners. ... Benchmark codes for comparing RLScore and scikit-learn RLS implementations are included in the RLScore code repository. |
| Open Datasets | No | The paper describes the characteristics of the data used in its benchmarks, such as "Data matrix X contains 10000 instances and 1000 features, and the number of outputs in Y is 10." but does not provide specific links, DOIs, repository names, or formal citations for publicly available datasets. It refers to general RLS problems or synthetic data descriptions for its examples. |
| Dataset Splits | Yes | A fast leave-group-out (LGO) CV (Pahikkala et al., 2012b), where folds containing multiple instances are left out, is provided, complementing the classical fast RLS LOO algorithm (also included) (Rifkin and Lippert, 2007). The approach allows implementing fast K-fold CV, and more importantly, implementing CV for non i.i.d. data with natural group structure. ... (b) Leave-group-out CV, 10 instances per fold, Gaussian kernel, 500 features. ... (e) Learning sparse models. We consider greedy forward selection, where on each iteration one selects the feature whose addition provides the lowest RLS LOO error. Greedy RLS implements this procedure in linear time, with scikit-learn we use the fast LOO algorithm, baseline is a pure wrapper implementation. Data matrix X contains 10000 instances and 1000 features, and the number of outputs in Y is 10. |
| Hardware Specification | No | The paper references "CPU seconds" in its benchmark figures but does not provide specific details about the CPU models, GPU models, memory, or any other hardware specifications used for the experiments. |
| Software Dependencies | Yes | RLScore is implemented as a Python module that depends on NumPy (van der Walt et al., 2011) for basic data structures and linear algebra, SciPy (Jones et al., 2001) for sparse matrices and optimization methods, and Cython (Behnel et al., 2011) for implementing low-level routines in C-language. ... Further we compare to two existing RLS solvers, implemented in Python scikit-learn (version 0.18) (Pedregosa et al., 2011) and the MATLAB GURLS package (Tacchetti et al., 2013). |
| Experiment Setup | Yes | Listing 1: ... `rls = GreedyRLS(X[:50], y[:50], regparam=1, subsetsize=3)` ... (a) LOO + fast regularization (parameter grid {2^-15, ..., 2^15}, linear kernel, equal number of instances and features). ... (b) Leave-group-out CV, 10 instances per fold, Gaussian kernel, 500 features. (c) Leave-pair-out CV, Gaussian kernel, 500 features. (d) The Kronecker product kernel K ⊗ G is a popular choice in pair-input learning. ... We generate two kernel matrices of size n × n; the label vector Y contains n² entries, one label for each pair. ... (e) Data matrix X contains 10000 instances and 1000 features, and the number of outputs in Y is 10. |
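
The baseline quoted in the Research Type row is a direct dense solve of (K + λI)A = Y. A minimal sketch of that dual-form baseline, assuming a linear kernel on toy data (the function name `rls_dual_solve` is illustrative, not RLScore's API):

```python
import numpy as np

def rls_dual_solve(K, Y, regparam):
    """Baseline: solve (K + regparam*I) A = Y directly.

    numpy.linalg.solve dispatches to the LAPACK gesv routine,
    matching the baseline described in the quoted benchmark setup.
    """
    n = K.shape[0]
    return np.linalg.solve(K + regparam * np.eye(n), Y)

# Toy data with a linear kernel K = X X^T
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
K = X @ X.T
Y = rng.standard_normal(50)
A = rls_dual_solve(K, Y, regparam=1.0)
# A satisfies the system: K A + regparam * A == Y
```

Predictions for a new instance x then follow from the dual coefficients as k(x)ᵀA, where k(x) holds the kernel evaluations of x against the training instances.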
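
Benchmark (a) in the Experiment Setup row sweeps the regularization grid {2^-15, ..., 2^15}. The classic shortcut that makes such grids cheap is to eigendecompose K once and reuse the factorization for every λ, so each extra grid point costs only O(n²) instead of a fresh O(n³) solve. A sketch of this standard trick under the same toy-data assumptions (illustrative, not RLScore's actual implementation):

```python
import numpy as np

def rls_reg_path(K, Y, regparams):
    """Solve (K + lam*I) A = Y for every lam from one eigendecomposition."""
    evals, V = np.linalg.eigh(K)   # K = V diag(evals) V^T, done once: O(n^3)
    VtY = V.T @ Y
    # each grid point only rescales the eigen-coordinates: O(n^2)
    return {lam: V @ (VtY / (evals + lam)) for lam in regparams}

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 8))
K = X @ X.T
Y = rng.standard_normal(40)
grid = [2.0 ** k for k in range(-15, 16)]
path = rls_reg_path(K, Y, grid)   # one solution per grid point
```

Each entry of `path` agrees with the corresponding direct solve of (K + λI)A = Y, which is the sense in which the paper's shortcuts "produce exactly the same results" as the baseline.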