Multiplicative Multitask Feature Learning

Authors: Xin Wang, Jinbo Bi, Shipeng Yu, Jiangwen Sun, Minghu Song

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulation studies have identified the statistical properties of data that would be in favor of the new formulations. Extensive empirical studies on various classification and regression benchmark data sets have revealed the relative advantages of the two new formulations by comparing with the state of the art, which provides instructive insights into the feature learning problem with multiple tasks. [...] 6. Experiments: We empirically evaluated the performance of the multiplicative MTFL algorithms on both synthetic data sets and a variety of real-world data sets, where we solved either classification (using the logistic regression loss) or regression (using the least squares loss) problems.
Researcher Affiliation | Collaboration | Xin Wang, Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06279, USA; Jinbo Bi, Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06279, USA; Shipeng Yu, Health Services Innovation Center, Siemens Healthcare, Malvern, PA 19355, USA; Jiangwen Sun, Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06279, USA; Minghu Song, Worldwide Research and Development, Pfizer Inc., Groton, CT 06340, USA
Pseudocode | Yes | Algorithm 1: The blockwise coordinate descent algorithm for multiplicative MTFL.
Input: X_t, y_t, t = 1, ..., T, as well as γ1, γ2, p and k
Initialize: c_j = 1, j = 1, ..., d, and s = 1
repeat
    Compute X̃_t = X_t diag(c^s), t = 1, ..., T
    for t = 1, ..., T do
        Solve the following problem for β_t^s:
            min_{β_t} L(β_t, X̃_t, y_t) + γ1 ||β_t||_p^p    (30)
    end for
    Compute α_t^s = diag(c^s) β_t^s
    Set s = s + 1
    Compute c^{s+1} using α_t^s according to Eq. (10)
until max_{t,j} |(α_j^t)^s − (α_j^t)^{s−1}| < ϵ (or other proper termination rules)
Output: α_t, c and β_t, t = 1, ..., T
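The structure of Algorithm 1 can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes a least-squares loss and p = 2, so the per-task subproblem (30) reduces to ridge regression with a closed-form solution, and the c-update of Eq. (10) — which the excerpt does not reproduce — is replaced by a hypothetical shrinkage rule for illustration only.

```python
import numpy as np

def multiplicative_mtfl(X, y, gamma1=0.1, gamma2=0.1, n_iter=50, tol=1e-6):
    """Blockwise coordinate descent in the spirit of Algorithm 1,
    assuming a least-squares loss and p = 2 so that subproblem (30)
    is ridge regression with a closed-form solution."""
    T = len(X)                          # number of tasks
    d = X[0].shape[1]                   # number of features
    c = np.ones(d)                      # multiplicative feature weights
    alpha = [np.zeros(d) for _ in range(T)]
    for _ in range(n_iter):
        alpha_old = [a.copy() for a in alpha]
        beta = []
        for t in range(T):
            Xt = X[t] * c               # X_t diag(c), column-wise scaling
            A = Xt.T @ Xt + gamma1 * np.eye(d)
            beta.append(np.linalg.solve(A, Xt.T @ y[t]))
        alpha = [c * b for b in beta]   # alpha_t = diag(c) beta_t
        # Placeholder for Eq. (10), which the excerpt does not give:
        # shrink c less for features with large coefficients across tasks.
        norms = np.sqrt(sum(a ** 2 for a in alpha))
        c = norms / (norms + gamma2)
        if max(np.max(np.abs(a - ao)) for a, ao in zip(alpha, alpha_old)) < tol:
            break
    return alpha, c, beta
```

The key design point the sketch preserves is the multiplicative decomposition α_t = diag(c) β_t: the shared vector c gates features across all tasks, while each β_t fits its own task on the rescaled design matrix.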
Open Source Code | No | The paper does not provide explicit statements about releasing source code for the described methodology, nor does it include links to a code repository. It mentions implementation of the algorithms but not their public availability.
Open Datasets | Yes |
Sarcos (Argyriou et al., 2007): Sarcos data were collected for a robotics problem... Readers can consult with http://www.gaussianprocess.org/gpml/data/ for more details.
College Drinking (Bi et al., 2013): The college drinking data were collected...
QSAR (Ma et al., 2015): The quantitative structure-activity relationship (QSAR) methods are commonly used...
C.M.S.C. (Lucas et al., 2013): The Climate Model Simulation Crashes (C.M.S.C.) data set contained records...
Landmine (Xue et al., 2007): The original Landmine data contained 29 data sets...
Alphadigits (Maurer et al., 2013): This data set was composed of binary 20 × 16 images...
Underwatermine (Liu et al., 2009b): This data set was originally used...
Animal recognition (Kang et al., 2011): This data set consisted of images...
HWMA base and HWMA peak (Qazi et al., 2007; Bi and Wang, 2015): The heart wall motion abnormality (HWMA) detection data set was used...
Dataset Splits | Yes | In all experiments, unless otherwise noted, the original data set was partitioned to have 25%, 33% or 50% of the data in a training set and the rest used for testing. For each specified partition ratio (corresponding to a trial), we randomly partitioned the data 15 times and reported the average performance. [...] For each task, we randomly selected 2000 cases for training and the remaining 5291 cases for test. [...] Because there were only 30 records for each person, we used 66%, 75% and 80% of the records to form the training set, and the rest for test.
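The repeated-holdout protocol described above is straightforward to reproduce. The sketch below is an assumption about how the splits were generated (the paper specifies only the ratios and the 15 repetitions, not the RNG or implementation):

```python
import numpy as np

def repeated_holdout(n, train_frac, n_trials=15, seed=0):
    """Generate n_trials random train/test index splits of n samples,
    placing round(train_frac * n) samples in the training set each time,
    mirroring the paper's 15-repetition evaluation protocol."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_trials):
        idx = rng.permutation(n)
        n_train = int(round(train_frac * n))
        splits.append((idx[:n_train], idx[n_train:]))
    return splits
```

Performance would then be averaged over the 15 (train, test) pairs for each partition ratio (25%, 33%, 50%).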
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU/GPU models, memory, or specific computing environments.
Software Dependencies | No | The paper mentions that "we implemented and compared Algorithm 1" and discusses using the "logistic regression loss" or "least squares loss", but it does not specify any software dependencies (e.g., programming languages, libraries, or frameworks) with version numbers that would be necessary to replicate the experiments.
Experiment Setup | Yes | The same tuning process was used to tune the hyperparameters (e.g., γ1 and γ2) of every method in the comparison. In every trial, an internal three-fold cross validation (CV) was performed within the training data of the first partition to select a proper hyperparameter value for each of the methods from the choices of 2^k with k = −10, −9, ..., 7.
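The tuning protocol above can be sketched as a grid search over 2^k, k = −10, ..., 7, scored by internal 3-fold CV. The `fit` and `score` callables below are hypothetical stand-ins for any of the compared methods; the paper does not describe the implementation at this level of detail.

```python
import numpy as np
from itertools import product

def tune_hyperparameters(X, y, fit, score, n_folds=3, seed=0):
    """Select (gamma1, gamma2) from the grid 2**k, k = -10, ..., 7,
    by internal n_folds-fold cross validation on the training data.
    `fit(X, y, g1, g2)` returns a model; `score(model, X, y)` returns
    a value to maximize. Both are hypothetical placeholders."""
    grid = [2.0 ** k for k in range(-10, 8)]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best, best_score = None, -np.inf
    for g1, g2 in product(grid, grid):
        scores = []
        for i in range(n_folds):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            model = fit(X[trn], y[trn], g1, g2)
            scores.append(score(model, X[val], y[val]))
        if np.mean(scores) > best_score:
            best, best_score = (g1, g2), np.mean(scores)
    return best
```

Per the paper, this internal CV is run only on the training data of the first partition of each trial, and the selected values are then reused across the remaining partitions.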