Multiplicative Multitask Feature Learning

Authors: Xin Wang, Jinbo Bi, Shipeng Yu, Jiangwen Sun, Minghu Song

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulation studies have identified the statistical properties of data that would be in favor of the new formulations. Extensive empirical studies on various classification and regression benchmark data sets have revealed the relative advantages of the two new formulations by comparing with the state of the art, which provides instructive insights into the feature learning problem with multiple tasks. [...] 6. Experiments: We empirically evaluated the performance of the multiplicative MTFL algorithms on both synthetic data sets and a variety of real-world data sets, where we solved either classification (using the logistic regression loss) or regression (using the least squares loss) problems.
Researcher Affiliation | Collaboration | Xin Wang, Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06279, USA; Jinbo Bi, Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06279, USA; Shipeng Yu, Health Services Innovation Center, Siemens Healthcare, Malvern, PA 19355, USA; Jiangwen Sun, Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06279, USA; Minghu Song, Worldwide Research and Development, Pfizer Inc., Groton, CT 06340, USA
Pseudocode | Yes | Algorithm 1: The blockwise coordinate descent algorithm for multiplicative MTFL.
Input: X_t, y_t, t = 1, ..., T, as well as γ1, γ2, p and k
Initialize: c_j = 1, j = 1, ..., d, and s = 1
repeat
    Compute X̃_t = X_t diag(c^s), t = 1, ..., T
    for t = 1, ..., T do
        Solve the following problem for β_t^s:
            min_{β_t} L(β_t, X̃_t, y_t) + γ1 ||β_t||_p^p    (30)
    end for
    Compute α_t^s = diag(c^s) β_t^s
    Set s = s + 1
    Compute c^{s+1} using α_t^s according to Eq. (10)
until max_{t,j} |(α_j^t)^s − (α_j^t)^{s−1}| < ϵ (or other proper termination rules)
Output: α_t, c and β_t, t = 1, ..., T
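The structure of Algorithm 1 can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes a least-squares loss and p = 2, so the per-task subproblem (30) reduces to ridge regression with a closed-form solution, and the c-update of Eq. (10) — which the excerpt does not reproduce — is replaced by a hypothetical shrinkage rule for illustration only.

```python
import numpy as np

def multiplicative_mtfl(X, y, gamma1=0.1, gamma2=0.1, n_iter=50, tol=1e-6):
    """Blockwise coordinate descent in the spirit of Algorithm 1,
    assuming a least-squares loss and p = 2 so that subproblem (30)
    is ridge regression with a closed-form solution."""
    T = len(X)                          # number of tasks
    d = X[0].shape[1]                   # number of features
    c = np.ones(d)                      # multiplicative feature weights
    alpha = [np.zeros(d) for _ in range(T)]
    for _ in range(n_iter):
        alpha_old = [a.copy() for a in alpha]
        beta = []
        for t in range(T):
            Xt = X[t] * c               # X_t diag(c), column-wise scaling
            A = Xt.T @ Xt + gamma1 * np.eye(d)
            beta.append(np.linalg.solve(A, Xt.T @ y[t]))
        alpha = [c * b for b in beta]   # alpha_t = diag(c) beta_t
        # Placeholder for Eq. (10), which the excerpt does not give:
        # shrink c less for features with large coefficients across tasks.
        norms = np.sqrt(sum(a ** 2 for a in alpha))
        c = norms / (norms + gamma2)
        if max(np.max(np.abs(a - ao)) for a, ao in zip(alpha, alpha_old)) < tol:
            break
    return alpha, c, beta
```

The key design point the sketch preserves is the multiplicative decomposition α_t = diag(c) β_t: the shared vector c gates features across all tasks, while each β_t fits its own task on the rescaled design matrix.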
Open Source Code | No | The paper does not provide explicit statements about releasing source code for the described methodology, nor does it include links to a code repository. It mentions implementation of the algorithms but not their public availability.
Open Datasets | Yes |
Sarcos (Argyriou et al., 2007): Sarcos data were collected for a robotics problem... Readers can consult with http://www.gaussianprocess.org/gpml/data/ for more details.
College Drinking (Bi et al., 2013): The college drinking data were collected...
QSAR (Ma et al., 2015): The quantitative structure-activity relationship (QSAR) methods are commonly used...
C.M.S.C. (Lucas et al., 2013): The Climate Model Simulation Crashes (C.M.S.C.) data set contained records...
Landmine (Xue et al., 2007): The original Landmine data contained 29 data sets...
Alphadigits (Maurer et al., 2013): This data set was composed of binary 20 × 16 images...
Underwatermine (Liu et al., 2009b): This data set was originally used...
Animal recognition (Kang et al., 2011): This data set consisted of images...
HWMA base and HWMA peak (Qazi et al., 2007; Bi and Wang, 2015): The heart wall motion abnormality (HWMA) detection data set was used...
Dataset Splits | Yes | In all experiments, unless otherwise noted, the original data set was partitioned to have 25%, 33% or 50% of the data in a training set and the rest used for testing. For each specified partition ratio (corresponding to a trial), we randomly partitioned the data 15 times and reported the average performance. [...] For each task, we randomly selected 2000 cases for training and the remaining 5291 cases for test. [...] Because there were only 30 records for each person, we used 66%, 75% and 80% of the records to form the training set, and the rest for test.
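The repeated-holdout protocol described above is straightforward to reproduce. The sketch below is an assumption about how the splits were generated (the paper specifies only the ratios and the 15 repetitions, not the RNG or implementation):

```python
import numpy as np

def repeated_holdout(n, train_frac, n_trials=15, seed=0):
    """Generate n_trials random train/test index splits of n samples,
    placing round(train_frac * n) samples in the training set each time,
    mirroring the paper's 15-repetition evaluation protocol."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_trials):
        idx = rng.permutation(n)
        n_train = int(round(train_frac * n))
        splits.append((idx[:n_train], idx[n_train:]))
    return splits
```

Performance would then be averaged over the 15 (train, test) pairs for each partition ratio (25%, 33%, 50%).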
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU/GPU models, memory, or specific computing environments.
Software Dependencies | No | The paper mentions that "we implemented and compared Algorithm 1" and discusses using the "logistic regression loss" or "least squares loss", but it does not specify any software dependencies (e.g., programming languages, libraries, or frameworks) with version numbers that would be necessary to replicate the experiments.
Experiment Setup | Yes | The same tuning process was used to tune the hyperparameters (e.g., γ1 and γ2) of every method in the comparison. In every trial, an internal three-fold cross validation (CV) was performed within the training data of the first partition to select a proper hyperparameter value for each of the methods from the choices of 2^k with k = −10, −9, ..., 7.
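The tuning protocol above can be sketched as a grid search over 2^k, k = −10, ..., 7, scored by internal 3-fold CV. The `fit` and `score` callables below are hypothetical stand-ins for any of the compared methods; the paper does not describe the implementation at this level of detail.

```python
import numpy as np
from itertools import product

def tune_hyperparameters(X, y, fit, score, n_folds=3, seed=0):
    """Select (gamma1, gamma2) from the grid 2**k, k = -10, ..., 7,
    by internal n_folds-fold cross validation on the training data.
    `fit(X, y, g1, g2)` returns a model; `score(model, X, y)` returns
    a value to maximize. Both are hypothetical placeholders."""
    grid = [2.0 ** k for k in range(-10, 8)]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best, best_score = None, -np.inf
    for g1, g2 in product(grid, grid):
        scores = []
        for i in range(n_folds):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            model = fit(X[trn], y[trn], g1, g2)
            scores.append(score(model, X[val], y[val]))
        if np.mean(scores) > best_score:
            best, best_score = (g1, g2), np.mean(scores)
    return best
```

Per the paper, this internal CV is run only on the training data of the first partition of each trial, and the selected values are then reused across the remaining partitions.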