The Benefit of Multitask Representation Learning

Authors: Andreas Maurer, Massimiliano Pontil, Bernardino Romera-Paredes

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The purpose of the experiments is to compare MTL and LTL to independent task learning (ITL) in the simple setting of linear feature learning (or subspace learning). We wish to study the regime in which MTL/LTL learning is beneficial over ITL as a function of the number of tasks T and the sample size per task n. We consider noiseless linear binary classification tasks, namely halfspace learning. We generated the data in the following way. The ground-truth weight vectors u_1, ..., u_T are obtained by the equation u_t = Dc_t, where c_t ∈ R^K is sampled from the uniform distribution on the unit sphere in R^K, and the dictionary D ∈ R^{d×K} is created by first sampling a d-dimensional orthonormal matrix from the Haar measure and then selecting its first K columns (atoms). We create all input marginals by sampling from the uniform distribution on the radius-√d sphere in R^d. For each task we sample n instances to build the training set and 1000 instances for the test set. We train the methods with the hinge loss function h(z) := max{0, 1 − z/c}, where c is the margin. We choose c = 2/ϵ, so that the true error relative to the best hypothesis is of order ϵ. We fixed the value of ϵ to be (K/n)^{1/2}. For ITL we optimize that loss function constraining the ℓ2-norm of the weights; for MTL and LTL we constrain D to have a Frobenius norm less than or equal to 1, and each c_t is constrained to have an ℓ2-norm less than or equal to 1. During testing we use the 0-1 loss. For example, the task-average error is evaluated as (1/T) Σ_{t=1}^{T} (1/m) Σ_{i=1}^{m} 1{sign(⟨u_t, x_i⟩) ≠ sign(⟨û_t, x_i⟩)} (11), where û_t are the weight vectors learned by the assessed method and m = 1000 is the number of test instances per task.
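The generation process quoted above can be sketched in NumPy. This is a minimal illustration, not the authors' code: the function name `generate_tasks` and the use of QR decomposition of a Gaussian matrix to obtain a Haar-distributed orthonormal matrix are assumptions, as is interpreting the input distribution as uniform on the sphere of radius √d.

```python
import numpy as np

def generate_tasks(d=50, K=2, T=10, n=20, n_test=1000, rng=None):
    """Sketch of the synthetic halfspace-learning tasks described above.

    Assumption: QR of a Gaussian matrix yields a Haar-distributed
    orthonormal matrix; sphere sampling is done via normalized Gaussians.
    """
    rng = np.random.default_rng(rng)
    # Haar orthonormal matrix; keep the first K columns as the dictionary.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    D = Q[:, :K]
    # Task coefficients c_t uniform on the unit sphere in R^K.
    C = rng.standard_normal((T, K))
    C /= np.linalg.norm(C, axis=1, keepdims=True)
    # Ground-truth weights u_t = D c_t (unit norm, since D has orthonormal columns).
    U = C @ D.T

    def sample_inputs(m):
        # Uniform on the radius-sqrt(d) sphere in R^d (interpretation of the text).
        X = rng.standard_normal((T, m, d))
        X *= np.sqrt(d) / np.linalg.norm(X, axis=2, keepdims=True)
        return X

    X_train, X_test = sample_inputs(n), sample_inputs(n_test)
    # Noiseless labels: sign of the inner product with the task's weight vector.
    y_train = np.sign(np.einsum('tmd,td->tm', X_train, U))
    y_test = np.sign(np.einsum('tmd,td->tm', X_test, U))
    return D, U, X_train, y_train, X_test, y_test
```

The per-task structure (arrays indexed by task t first) mirrors the paper's setup of T tasks sharing one dictionary D.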
Researcher Affiliation | Academia | Andreas Maurer (EMAIL), Adalbertstrasse 55, D-80799 München, Germany; Massimiliano Pontil (EMAIL), Istituto Italiano di Tecnologia, 16163 Genoa, Italy, and Department of Computer Science, University College London, WC1E 6BT, UK; Bernardino Romera-Paredes (EMAIL), Department of Engineering Science, University of Oxford, OX1 3PJ, UK
Pseudocode | No | The paper describes methods and algorithms conceptually (e.g., "Multitask representation learning (MTRL) solves the optimization problem"), but it does not provide any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code used for the experiments presented in this section is available at http://romera-paredes.com/multitask-representation.
Open Datasets | No | The paper describes a process for generating synthetic data for its experiments: "We generated the data in the following way. The ground-truth weight vectors u_1, ..., u_T are obtained by the equation u_t = Dc_t, where c_t ∈ R^K is sampled from the uniform distribution on the unit sphere in R^K, and the dictionary D ∈ R^{d×K} is created by first sampling a d-dimensional orthonormal matrix from the Haar measure and then selecting its first K columns (atoms). We create all input marginals by sampling from the uniform distribution on the radius-√d sphere in R^d." It does not mention using any pre-existing publicly available dataset, nor does it provide a link to the generated data.
Dataset Splits | Yes | For each task we sample n instances to build the training set and 1000 instances for the test set. We let d = 50 and vary T ∈ {5, 10, ..., 150} and n ∈ {5, 10, ..., 150}, considering the cases K = 2 and K = 5.
Hardware Specification | No | The paper does not specify any particular hardware used for running the numerical experiments. It discusses experimental settings and results but omits details about the computing resources.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It mentions general machine learning approaches like neural networks, kernel methods, and convex optimization, but no specific libraries or tools with versions.
Experiment Setup | Yes | We train the methods with the hinge loss function h(z) := max{0, 1 − z/c}, where c is the margin. We choose c = 2/ϵ, so that the true error relative to the best hypothesis is of order ϵ. We fixed the value of ϵ to be (K/n)^{1/2}. For ITL we optimize that loss function constraining the ℓ2-norm of the weights; for MTL and LTL we constrain D to have a Frobenius norm less than or equal to 1, and each c_t is constrained to have an ℓ2-norm less than or equal to 1. During testing we use the 0-1 loss.
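The ITL baseline in the quoted setup (hinge loss with an ℓ2-norm constraint on the weights) can be sketched with projected subgradient descent. This is an assumed optimizer, not the one used in the paper, and the function name `itl_train` and its step-size/epoch parameters are illustrative choices.

```python
import numpy as np

def itl_train(X, y, radius=1.0, lr=0.1, epochs=300, margin=1.0):
    """Minimize the average hinge loss h(z) = max{0, 1 - z/margin}
    over one task's data, subject to ||w||_2 <= radius.

    Projected subgradient descent sketch; the paper's exact solver
    is not specified.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        z = y * (X @ w)
        active = z < margin  # examples with nonzero hinge loss
        # Subgradient of the mean hinge loss with respect to w.
        g = -(y[active, None] * X[active]).sum(axis=0) / (margin * n)
        w -= lr * g
        # Project back onto the l2 ball of the given radius.
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w
```

At test time the 0-1 error of a learned `w` on held-out pairs `(X_test, y_test)` is simply `np.mean(np.sign(X_test @ w) != y_test)`, matching the evaluation described in the quote.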