Multiple Output Regression with Latent Noise
Authors: Jussi Gillberg, Pekka Marttinen, Matti Pirinen, Antti J. Kangas, Pasi Soininen, Mehreen Ali, Aki S. Havulinna, Marjo-Riitta Järvelin, Mika Ala-Korpela, Samuel Kaski
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulations and prediction experiments with metabolite, gene expression, fMRI measurement, and macroeconomic time series data show that our model equals or exceeds the state-of-the-art performance and, in particular, outperforms the standard approach of assuming independent noise and signal models. Keywords: Bayesian reduced-rank regression, latent variable models, latent signal-to-noise ratio, multiple-output regression, nonparametric Bayes, shrinkage priors, structured noise, weak effects |
| Researcher Affiliation | Collaboration | Jussi Gillberg EMAIL Pekka Marttinen EMAIL Helsinki Institute for Information Technology HIIT, Department of Computer Science, PO Box 15600, Aalto University, 00076 Aalto, Finland; Matti Pirinen EMAIL Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland; Antti J. Kangas EMAIL Pasi Soininen EMAIL Computational Medicine, Faculty of Medicine, University of Oulu & Biocenter Oulu, Oulu, Finland; Marjo-Riitta Järvelin EMAIL Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment & Health, School of Public Health, Imperial College London, UK; Disclosure: AJK, PS and MAK are shareholders of Brainshake Ltd., a company offering NMR-based metabolite profiling. |
| Pseudocode | No | The paper describes inference methods using Gibbs sampling and discusses computational complexity but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures displaying structured steps. |
| Open Source Code | Yes | Code in R for the new method is available for download at http://research.cs.aalto.fi/pml/software/latentNoise/. |
| Open Datasets | Yes | NFBC1966 [N = 4702, P = 101, K = 96, metabolomics prediction from SNPs] The NFBC1966 data set comprises genome-wide SNP data along with metabolomics measurements for a cohort of 4,702 individuals (Rantakallio, 1969; Soininen et al., 2009).; DILGOM [N = 509, P = 65, K = 18 ... 137, metabolomics and gene expression prediction from SNPs] The DILGOM data set (Inouye et al., 2010) consists of genome-wide SNP data along with metabolomics and gene expression measurements.; fMRI [N = 1307, P = 776, K = 250, fMRI response prediction from text stimuli] The cognitive neuroscience data set (Wehbe et al., 2014) consists of a time series of fMRI measurements from 8 subjects reading a chapter from Harry Potter and the Sorcerer's Stone using Rapid Serial Visual Presentation: words of the text are presented one by one in the center of a screen.; econ [N = 120, P = 52, K = 52, macroeconomic time series prediction] The macroeconomic time series data set (Stock and Watson, 2006) consists of monthly values of 52 macroeconomic indicators. |
| Dataset Splits | Yes | For this data set, the comparison method GFlasso required excessive training time and we used 5-fold cross-validation to evaluate test set performances. Where cross-validation was needed for selecting model parameter values, the validation data performance was measured as an average over 3 validation sets, each comprising 1/10 of the training samples.; On these data sets, 10-fold cross-validation was used to evaluate test set performances. To select values of the parameters that required evaluation on validation data, the training data was then further divided into 9 folds, on which cross-validation was performed to select parameters according to averaged validation set performance. |
| Hardware Specification | No | No specific hardware details (like CPU/GPU models or memory) are mentioned in the paper. The acknowledgments section refers to 'computational resources provided by the Aalto Science-IT project' which is too general to count as a specific hardware specification. |
| Software Dependencies | No | The paper mentions implementing code in R and using packages like 'glmnet' and 'PEER software' but does not specify any version numbers for R or the used libraries. This lack of versioning makes the software dependencies unreproducible. |
| Experiment Setup | Yes | Hyperparameters a1 and a2 of all the BRRR models were fixed to 10 and 4, respectively. In total 1,000 MCMC samples were generated and 500 were discarded as burn-in. The remaining samples, thinned by a factor of 10, were used for prediction.; With the NFBC1966 data, the latent signal-to-noise ratio β was selected using cross-validation from a range of values from 100 to 1/100, β = {100, 10, 2, 1, 1/60, 1/100}, in order to thoroughly evaluate the sensitivity of the model to this parameter.; The mixture parameter α controlling the balance between L1 and L2 regularization was evaluated on the grid [0, 0.1, ..., 0.9, 1.0] and selected using 10-fold cross-validation.; Kernel ridge regression was regularized according to the standard approach of adding parameter λ to the diagonal elements of the kernel. The value of λ was selected using cross-validation from a set of 10 values ranging from 0.1 to 100, [10^-1, 10^-0.66, ..., 10^1.67, 10^2]. |
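
The grids and sample counts quoted in the Experiment Setup row can be made concrete with a short sketch. This is an illustration in Python, not the authors' R code. The helper names `log_grid` and `retained_draws` are hypothetical. It assumes that the λ grid is evenly spaced in log10 and that thinning keeps every 10th draw starting from the first post-burn-in draw; the quoted text gives only the grid endpoints and the thinning factor.

```python
# Illustrative reconstruction of the quoted grids; helper names are hypothetical.

def log_grid(lo_exp, hi_exp, n):
    """Return n values 10**e with e evenly spaced from lo_exp to hi_exp (assumed spacing)."""
    step = (hi_exp - lo_exp) / (n - 1)
    return [10 ** (lo_exp + i * step) for i in range(n)]

def retained_draws(total, burn_in, thin):
    """Indices of MCMC draws kept after discarding burn-in and thinning.

    Assumption: the first post-burn-in draw is kept; the source does not say
    which offset was used, only the thinning factor.
    """
    return list(range(burn_in, total, thin))

# Kernel ridge regression: 10 values of lambda from 10^-1 to 10^2.
lambda_grid = log_grid(-1.0, 2.0, 10)

# Elastic-net mixture parameter alpha: 0, 0.1, ..., 1.0.
alpha_grid = [i / 10 for i in range(11)]

# BRRR sampler: 1,000 draws, 500 burn-in, thinning factor 10.
kept = retained_draws(total=1000, burn_in=500, thin=10)

print(len(lambda_grid), len(alpha_grid), len(kept))
```

Under these assumptions the exponents of the λ grid step by 1/3, which is consistent with the quoted 10^-0.66 and 10^1.67, and 50 posterior samples remain for prediction.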