Vector-Valued Least-Squares Regression under Output Regularity Assumptions

Authors: Luc Brogat-Motte, Alessandro Rudi, Céline Brouard, Juho Rousu, Florence d'Alché-Buc

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate our theoretical insights on synthetic least-squares problems. Then, we propose a surrogate structured prediction method derived from this reduced-rank method. We assess its benefits on three different problems: image reconstruction, multi-label classification, and metabolite identification. (Section 5, Numerical Experiments:) We now carry out experiments with the methods proposed in this work. In Section 5.1, we illustrate our theoretical insights on synthetic least-squares problems. In Section 5.2, we test the proposed structured prediction method on three different problems: image reconstruction, multi-label classification, and metabolite identification.
Researcher Affiliation | Academia | Luc Brogat-Motte, LTCI, Télécom Paris, IP Paris, France; Alessandro Rudi, INRIA, Paris, France, École Normale Supérieure, Paris, France, PSL Research, France; Céline Brouard, INRAE, Toulouse, France, MIAT, Toulouse, France, Université de Toulouse, France; Juho Rousu, Department of Computer Science, Aalto University, Espoo, Finland; Florence d'Alché-Buc, LTCI, Télécom Paris, IP Paris, France
Pseudocode | Yes | Algorithm 1: Reduced-rank IOKR-ridge, training phase; Algorithm 2: Reduced-rank IOKR-ridge, decoding phase
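The report only names Algorithms 1 and 2 (training and decoding phases). As a rough illustration of the flavor of such a scheme — not the authors' exact algorithms — here is a minimal IOKR-style ridge with an optional rank-p projection of the predicted output embedding onto the top output-kernel eigendirections, followed by candidate-set decoding. All function names, and the choice of projection, are assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma2):
    # pairwise Gaussian kernel between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma2))

def train(X, lam, sigma2_in):
    # ridge system in the input RKHS: coefficients alpha(x) = (Kx + n*lam*I)^-1 kx
    n = len(X)
    Kx = gaussian_kernel(X, X, sigma2_in)
    return np.linalg.solve(Kx + n * lam * np.eye(n), np.eye(n))

def decode(x, X, Y, A, candidates, sigma2_in, sigma2_out, p=None):
    # score candidates against the predicted output embedding h(x) = Psi A kx;
    # with p set, first project h(x) onto the span of the top-p eigenvectors
    # of the output kernel matrix (the "reduced-rank" step of this sketch)
    kx = gaussian_kernel(X, x[None, :], sigma2_in)[:, 0]
    coef = A @ kx
    Ky = gaussian_kernel(Y, Y, sigma2_out)
    if p is not None:
        evals, U = np.linalg.eigh(Ky)
        top = np.argsort(evals)[::-1][:p]
        Up, lp = U[:, top], evals[top]
        coef = Up @ ((Up.T @ (Ky @ coef)) / lp)
    scores = gaussian_kernel(candidates, Y, sigma2_out) @ coef
    return candidates[int(np.argmax(scores))]
```

The decoding step mirrors the paper's pre-image search over a fixed candidate set: the argmax compares each candidate's output-kernel embedding with the (optionally projected) prediction.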
Open Source Code | No | No explicit statement or link to open-source code for the methodology described in this paper was found.
Open Datasets | Yes | Problem and data set. The goal of the image reconstruction problem provided by Weston et al. (2003) is to predict the bottom half of a USPS handwritten postal digit (16×16 pixels), given its top half. The data set contains 7291 training labeled images and 2007 test images. Problem and data set. Bibtex and Bookmarks (Katakis et al., 2008) are tag recommendation problems... Links to downloadable data sets: https://web.stanford.edu/~hastie/StatLearnSparsity_files/DATA/zipcode.html and http://mulan.sourceforge.net/datasets-mlc.html
Dataset Splits | Yes | For d = 300, X = H_X = Y = R^d, we choose µ_p(C) = 1/p and µ_p(E) = 0.2 p^(−1/10). We draw randomly the eigenvector associated to each eigenvalue. We draw H0 ∈ R^(d×d) with coefficients drawn independently from the standard normal distribution. We consider two different optimums, H* = H0 (β = 1) and H* = (H0 C H0^T) H0 (β = 1/3). Then, we generate n ∈ [10^2, ..., 5·10^3], n_val = 1000, n_test = 1000 couples (x, y) such that x ~ N(0, C), ε ~ N(0, E), and y = H*x + ε. The data set contains 7291 training labeled images and 2007 test images. The hyper-parameters for all tested methods (including σ²_input, λ, p, and the SPEN layer sizes) have been selected on logarithmic grids via 5 repeated random sub-sampling validations (80%/20%). With Bookmarks (n/n_test = 60000/27856) we used a Nyström approximation with 15000 anchors when computing ĥ to reduce the training complexity, and we learned P̂ only with a subset of 12000 training data. For this setting, we consider only the first 2000 couples (x_i, y_i) of each multi-label data set as training set. We adopt a numerical experimental protocol (5-CV outer/4-CV inner loops) similar to that of Brouard et al. (2016a).
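The synthetic protocol quoted above (eigenvalue decays µ_p(C) = 1/p and µ_p(E) = 0.2 p^(−1/10) with randomly drawn eigenvectors, a standard-Gaussian H0, and y = H*x + ε) can be sketched as follows. This is an illustrative reconstruction only: d is reduced from 300 for brevity, only the β = 1 optimum H* = H0 is covered, and every name in it is an assumption.

```python
import numpy as np

def make_spd(eigvals, rng):
    # symmetric PSD matrix with a prescribed eigenvalue decay and
    # randomly drawn orthonormal eigenvectors (via QR of a Gaussian matrix)
    Q, _ = np.linalg.qr(rng.standard_normal((len(eigvals), len(eigvals))))
    return Q @ np.diag(eigvals) @ Q.T

def sample_problem(n, d=50, seed=0):
    rng = np.random.default_rng(seed)
    p = np.arange(1, d + 1)
    C = make_spd(1.0 / p, rng)            # input covariance, mu_p(C) = 1/p
    E = make_spd(0.2 * p ** -0.1, rng)    # noise covariance, mu_p(E) = 0.2 p^(-1/10)
    H = rng.standard_normal((d, d))       # the beta = 1 optimum H* = H0
    X = rng.multivariate_normal(np.zeros(d), C, size=n)
    eps = rng.multivariate_normal(np.zeros(d), E, size=n)
    Y = X @ H.T + eps                     # y = H*x + eps, row-wise
    return X, Y, H
```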
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or cloud instances) used for running the experiments were mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers for the authors' implementation were provided in the main text.
Experiment Setup | Yes | We select the hyper-parameters of the three estimators ĥ, Pĥ, and P̂ĥ on logarithmic grids, with the best validation MSE. As in Weston et al. (2003) we used as target loss an RBF loss ||ψ(y) − ψ(y′)||²_(H_Y) induced by a Gaussian kernel k, and visually chose the kernel's width σ²_output = 10 by looking at images reconstructed by the method using the ridge estimator (i.e. without reduced-rank estimation). We used a Gaussian input kernel of width σ²_input. For the pre-image step, we used the same candidate set for all methods, constituted of all the 7291 training bottom half digits. We considered λ := λ1 = λ2 for the proposed method. The hyper-parameters for all tested methods (including σ²_input, λ, p, and the SPEN layer sizes) have been selected on logarithmic grids via 5 repeated random sub-sampling validations (80%/20%).
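The selection procedure quoted in both rows above (a logarithmic grid scored by validation MSE over 5 repeated random 80%/20% sub-sampling splits) can be sketched generically. The helper names and the plain linear-ridge example are assumptions, not the authors' code.

```python
import numpy as np

def repeated_holdout_select(X, Y, fit, mse, grid, repeats=5, val_frac=0.2, seed=0):
    # return the grid value with the best mean validation MSE over
    # `repeats` random sub-sampling (train/validation = 80%/20%) splits
    rng = np.random.default_rng(seed)
    n, cut = len(X), int(len(X) * (1 - val_frac))
    scores = {g: [] for g in grid}
    for _ in range(repeats):
        perm = rng.permutation(n)
        tr, va = perm[:cut], perm[cut:]
        for g in grid:
            scores[g].append(mse(fit(X[tr], Y[tr], g), X[va], Y[va]))
    return min(grid, key=lambda g: np.mean(scores[g]))

# usage with a plain linear ridge fit and a logarithmic grid for lambda
fit = lambda X, Y, lam: np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
mse = lambda W, X, Y: float(((X @ W - Y) ** 2).mean())
grid = list(np.logspace(-6, 2, 9))
```

Averaging the validation score over several random splits, rather than a single hold-out, is what "5 repeated random sub-sampling validation" refers to; the paper's 5-CV outer/4-CV inner protocol for the metabolite task would replace the random splits with cross-validation folds.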