RLScore: Regularized Least-Squares Learners
Authors: Tapio Pahikkala, Antti Airola
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we demonstrate the advantages of RLScore solvers on five benchmark tasks. Each of the considered tasks can be expressed either as a single or a sequence of RLS problems with closed form solutions. The baseline method solves each resulting system (K + λI)A = Y or (XᵀX + λI)W = XᵀY with NumPy's numpy.linalg.solve, which calls the LAPACK gesv routine. Further we compare to two existing RLS solvers, implemented in Python scikit-learn (version 0.18) (Pedregosa et al., 2011) and the MATLAB GURLS package (Tacchetti et al., 2013). The RLScore algorithms produce exactly the same results as the compared methods, but make use of a number of computational shortcuts resulting in substantial increases in efficiency. |
| Researcher Affiliation | Academia | Tapio Pahikkala (EMAIL), Antti Airola (EMAIL), Department of Information Technology, 20014 University of Turku, Finland |
| Pseudocode | Yes | Listing 1: feature selection with the greedy RLS algorithm.<br>`import numpy as np`<br>`from rlscore.learner import GreedyRLS`<br>`from scipy.stats import kendalltau`<br>`# regression problem with 3 important features`<br>`X = np.random.randn(100, 20)`<br>`y = X[:, 0] + X[:, 2] - X[:, 5] + 0.1 * np.random.randn(100)`<br>`# select 3 features with greedy RLS`<br>`rls = GreedyRLS(X[:50], y[:50], regparam=1, subsetsize=3)`<br>`# Did we select the right features?`<br>`print(rls.selected)`<br>`# Compute test set predictions`<br>`p = rls.predict(X[50:])`<br>`print(kendalltau(y[50:], p))` |
| Open Source Code | Yes | RLScore is a Python open source module for kernel based machine learning. The library provides implementations of several regularized least-squares (RLS) type of learners. ... Benchmark codes for comparing RLScore and scikit-learn RLS implementations are included in the RLScore code repository. |
| Open Datasets | No | The paper describes the characteristics of the data used in its benchmarks, such as "Data matrix X contains 10000 instances and 1000 features, and the number of outputs in Y is 10." but does not provide specific links, DOIs, repository names, or formal citations for publicly available datasets. It refers to general RLS problems or synthetic data descriptions for its examples. |
| Dataset Splits | Yes | A fast leave-group-out (LGO) CV (Pahikkala et al., 2012b), where folds containing multiple instances are left out, is provided, complementing the classical fast RLS LOO algorithm (also included) (Rifkin and Lippert, 2007). The approach allows implementing fast K-fold CV, and more importantly, implementing CV for non i.i.d. data with natural group structure. ... (b) Leave-group-out CV, 10 instances per fold, Gaussian kernel, 500 features. ... (e) Learning sparse models. We consider greedy forward selection, where on each iteration one selects the feature whose addition provides the lowest RLS LOO error. Greedy RLS implements this procedure in linear time, with scikit-learn we use the fast LOO algorithm, baseline is a pure wrapper implementation. Data matrix X contains 10000 instances and 1000 features, and the number of outputs in Y is 10. |
| Hardware Specification | No | The paper references "CPU seconds" in its benchmark figures but does not provide specific details about the CPU models, GPU models, memory, or any other hardware specifications used for the experiments. |
| Software Dependencies | Yes | RLScore is implemented as a Python module that depends on NumPy (van der Walt et al., 2011) for basic data structures and linear algebra, SciPy (Jones et al., 2001) for sparse matrices and optimization methods, and Cython (Behnel et al., 2011) for implementing low-level routines in C-language. ... Further we compare to two existing RLS solvers, implemented in Python scikit-learn (version 0.18) (Pedregosa et al., 2011) and the MATLAB GURLS package (Tacchetti et al., 2013). |
| Experiment Setup | Yes | Listing 1: ... `rls = GreedyRLS(X[:50], y[:50], regparam=1, subsetsize=3)` ... (a) LOO + fast regularization (parameter grid {2^-15, ..., 2^15}, linear kernel, equal number of instances and features). ... (b) Leave-group-out CV, 10 instances per fold, Gaussian kernel, 500 features. (c) Leave-pair-out CV, Gaussian kernel, 500 features. (d) The Kronecker product kernel K ⊗ G is a popular choice in pair-input learning. ... We generate two kernel matrices of size n × n; the label vector Y contains n² entries, one label for each pair. ... (e) Data matrix X contains 10000 instances and 1000 features, and the number of outputs in Y is 10. |
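
The baseline quoted in the Research Type row is a direct dense solve of (K + λI)A = Y. A minimal sketch of that dual-form baseline, assuming a linear kernel on toy data (the function name `rls_dual_solve` is illustrative, not RLScore's API):

```python
import numpy as np

def rls_dual_solve(K, Y, regparam):
    """Baseline: solve (K + regparam*I) A = Y directly.

    numpy.linalg.solve dispatches to the LAPACK gesv routine,
    matching the baseline described in the quoted benchmark setup.
    """
    n = K.shape[0]
    return np.linalg.solve(K + regparam * np.eye(n), Y)

# Toy data with a linear kernel K = X X^T
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
K = X @ X.T
Y = rng.standard_normal(50)
A = rls_dual_solve(K, Y, regparam=1.0)
# A satisfies the system: K A + regparam * A == Y
```

Predictions for a new instance x then follow from the dual coefficients as k(x)ᵀA, where k(x) holds the kernel evaluations of x against the training instances.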
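
Benchmark (a) in the Experiment Setup row sweeps the regularization grid {2^-15, ..., 2^15}. The classic shortcut that makes such grids cheap is to eigendecompose K once and reuse the factorization for every λ, so each extra grid point costs only O(n²) instead of a fresh O(n³) solve. A sketch of this standard trick under the same toy-data assumptions (illustrative, not RLScore's actual implementation):

```python
import numpy as np

def rls_reg_path(K, Y, regparams):
    """Solve (K + lam*I) A = Y for every lam from one eigendecomposition."""
    evals, V = np.linalg.eigh(K)   # K = V diag(evals) V^T, done once: O(n^3)
    VtY = V.T @ Y
    # each grid point only rescales the eigen-coordinates: O(n^2)
    return {lam: V @ (VtY / (evals + lam)) for lam in regparams}

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 8))
K = X @ X.T
Y = rng.standard_normal(40)
grid = [2.0 ** k for k in range(-15, 16)]
path = rls_reg_path(K, Y, grid)   # one solution per grid point
```

Each entry of `path` agrees with the corresponding direct solve of (K + λI)A = Y, which is the sense in which the paper's shortcuts "produce exactly the same results" as the baseline.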