Multitask Online Mirror Descent

Authors: Nicolò Cesa-Bianchi, Pierre Laforgue, Andrea Paudice, Massimiliano Pontil

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce and analyze MT-OMD, a multitask generalization of Online Mirror Descent (OMD) which operates by sharing updates between tasks. We prove that the regret of MT-OMD is of order √(1 + σ²(N − 1)) √T, where σ² is the task variance according to the geometry induced by the regularizer, N is the number of tasks, and T is the time horizon. [...] Finally, we present experiments which support our theoretical findings. [...] In this section, we empirically compare the performance of Hedge-MT-OGD/EG against two natural alternatives: an independent-task approach (IT-OGD/EG) where the agents do not communicate, and a single-task approach (ST-OGD/EG) where a single model is learned and shared by all agents. [...] Online Gradient Descent. For this experiment, we use the Lenk dataset (Lenk et al., 1996; Argyriou et al., 2007). [...] Exponentiated Gradient. For our second experiment, we consider EMNIST, a classification dataset...
Researcher Affiliation | Academia | Nicolò Cesa-Bianchi (EMAIL), Department of Computer Science, University of Milan, Italy; Pierre Laforgue (EMAIL), Department of Computer Science, University of Milan, Italy; Andrea Paudice (EMAIL), Department of Computer Science, University of Milan, Italy, and Istituto Italiano di Tecnologia, Genova, Italy; Massimiliano Pontil (EMAIL), Istituto Italiano di Tecnologia, Genova, Italy, and University College London, United Kingdom
Pseudocode | Yes | Algorithm 1 (MT-OGD) input: a positive definite matrix A ∈ ℝ^{N×N} (interaction matrix), σ² > 0 (task variance), D > 0 (comparator-set diameter), η_t > 0 (learning-rate schedule); initialization: A ← A ⊗ I_d ∈ ℝ^{Nd×Nd}, x₁ = 0, y₁ = A^{1/2} x₁. Algorithm 2 (MT-EG) input: a positive definite matrix A ∈ ℝ^{N×N} (interaction matrix), η > 0 (learning rate); initialization: A ← A ⊗ I_d ∈ ℝ^{Nd×Nd}, x₁ = (1/d) 1 ∈ ℝ^{Nd}, y₁ = A^{1/2} x₁.
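To make the pseudocode above concrete, here is a minimal sketch (our own illustration, not the authors' code). It assumes the Euclidean case, where running mirror descent in the transformed space y = A^{1/2} x reduces to a preconditioned gradient step x ← x − η A⁻¹ g, so each task's update mixes the gradients of all tasks; the function names and the omission of the projection step are our simplifications.

```python
import numpy as np

def clique_matrix(N, b):
    """Parameterized clique interaction matrix A(b) = (1 + b) I_N - b 11^T / N."""
    return (1.0 + b) * np.eye(N) - b * np.ones((N, N)) / N

def mt_ogd_step(X, G, A, eta):
    """One (unprojected) multitask OGD step, sketched in the Euclidean case.

    X: (N, d) array, row n is task n's current model.
    G: (N, d) array, row n is task n's gradient.
    The step is preconditioned by A^{-1}, so updates are shared across tasks.
    """
    return X - eta * np.linalg.solve(A, G)
```

For b = 0, A(0) = I_N and the step decouples into N independent gradient steps (the IT-OGD baseline); larger b couples the tasks more strongly.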
Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets | Yes | Online Gradient Descent. For this experiment, we use the Lenk dataset (Lenk et al., 1996; Argyriou et al., 2007). It consists of 2880 computer ratings in the range {1, 2, ..., 10}, made by 180 individuals (the tasks) on the basis of 14 binary features. [...] Exponentiated Gradient. For our second experiment, we consider EMNIST, a classification dataset consisting of 62 classes (images of digits, small and capital letters).
Dataset Splits | No | For the Lenk dataset, the paper describes its composition ('2880 computer ratings', '180 individuals') but does not specify how it was split into training, validation, or test sets. For the EMNIST dataset, it states: 'To each task, we assigned 10 examples (5 positive, 5 negative) randomly chosen from the set of examples for that task.' This describes the task-specific data selection but not a standard train/test/validation split for the overall experiment.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, or memory specifications.
Software Dependencies | No | The paper does not explicitly list any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup | Yes | Online Gradient Descent. For this experiment, we use the Lenk dataset [...]. We run Hedge-MT-OGD using the clique interaction matrix A = (1 + N)I_N − 11ᵀ and the square loss. For all algorithms, the value of η is set according to the optimal theoretical value, see Proposition 2. In Hedge-MT-OGD, the variance σ² is learned in a set of 5 experts uniformly spaced over [0, 1]. For simplicity, we use D = 1 and compute the resulting Lipschitz constant accordingly. [...] Exponentiated Gradient. For our second experiment, we consider EMNIST, a classification dataset [...]. We considered the linear logistic regression and ran Hedge-MT-EG with the parameterized clique interaction matrix A(b) = (1 + b)I_N − b 11ᵀ/N. The value of b is set according to the theoretical value (that depends on σ², see Proposition 8), while σ² is learned in a set of 5 experts uniformly spaced over [0, 1]. For all algorithms, the value of η is set according to the optimal theoretical values.
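The tuning of the unknown task variance described above (5 experts uniformly spaced over [0, 1], aggregated by a Hedge meta-algorithm) can be sketched as follows. The exponential-weights update is standard, but the learning rate and variable names here are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Candidate task variances: 5 experts uniformly spaced over [0, 1],
# each running the base algorithm (MT-OGD or MT-EG) with its own sigma^2.
sigma2_grid = np.linspace(0.0, 1.0, 5)

def hedge_update(weights, losses, eta):
    """Exponential-weights (Hedge) update over the sigma^2 experts.

    weights: current probability vector over experts.
    losses:  per-expert losses observed this round.
    eta:     Hedge learning rate (illustrative choice, not the paper's).
    """
    w = weights * np.exp(-eta * losses)
    return w / w.sum()
```

At each round the forecaster combines the experts' predictions according to the current weights, then reweights; experts whose candidate σ² better matches the true task variance accumulate less loss and come to dominate the mixture.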