On Semi-Supervised Linear Regression in Covariate Shift Problems

Authors: Kenneth Joseph Ryan, Mark Vere Culp

JMLR 2015

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | Performance is validated on simulated and real data. Keywords: joint optimization, semi-supervised regression, usefulness of unlabeled data ... The geometry helps articulate realistic assumptions for the theoretical risk results in Section 5, and the theoretical risk results help define informative simulations and real data tests in Section 6. In addition, the simulations and real data applications validate the theoretical risk results.

Researcher Affiliation | Academia | Kenneth Joseph Ryan EMAIL Mark Vere Culp EMAIL Department of Statistics, West Virginia University, Morgantown, WV 26506, USA

Pseudocode | No | The paper includes mathematical formulations and theoretical derivations, but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks.

Open Source Code | No | The Elastic Net Optimization Problem (7) is convex and can be solved quickly by the glmnet package in R (Friedman et al., 2010; R Core Team, 2015), so this helps make our semi-supervised adjustment computationally viable.

Open Datasets | Yes | The 10 tests listed in Table 4 were constructed using 8 publicly available data sets and a simulated toy extrapolation data set. Each is expected to have a covariate-shifted empirical feature data distribution either because the characteristic used to define the labeled set is associated with other variables in the model matrix, because of the curse of dimensionality, or because the simulated toy data were generated from a model with covariate shift. ... Table 4: These ten covariate shift tests are used to establish benchmarks in Table 5. Data Set (n, p), Source: Toy Cov. Shift (1200, 1), Sugiyama et al. (2007); Auto-MPG (398, 8), Lichman (2013); Eye (120, 200), Rats 1-30 Express, Scheetz et al. (2006); Ethanol (589, 1037), Sols. 1-294 Ethanol, Shen et al. (2013).

Dataset Splits | Yes | For K-fold cross-validation in the semi-supervised setting, the L cases were partitioned into K folds, {L_k}_{k=1}^K. ... The JT-ENET estimate β̂_{γ̂,λ̂} minimized σ̂²_3 over the grid for λ1/(λ1 + 2λ2), γ1, and γ2. ... This particular implementation is optimized for estimating λ1 + 2λ2 with 10-fold cross-validation given λ1/(λ1 + 2λ2).

Hardware Specification | Yes | Cross-validation took an average of 3.5 minutes per data set on a 2.6 GHz Intel Core i7 Power Mac. ... The JT-ENET fit fairly quickly on a 2.6 GHz Intel Core i7 Power Mac.

Software Dependencies | Yes | The Elastic Net Optimization Problem (7) is convex and can be solved quickly by the glmnet package in R (Friedman et al., 2010; R Core Team, 2015), so this helps make our semi-supervised adjustment computationally viable. ... The caret package in R (Kuhn, 2008) was also used to fit the SVM with a polynomial kernel on the real data examples.

Experiment Setup | Yes | This particular implementation is optimized for estimating λ1 + 2λ2 with 10-fold cross-validation given λ1/(λ1 + 2λ2). First, the supervised elastic net was implemented by varying λ1/(λ1 + 2λ2) ∈ [0, 1] over an equally spaced grid of length 57 to optimize parameters λ. Second, the semi-supervised JT-ENET was implemented by estimating its parameters (λ, γ) simultaneously. ... Parameter λ1/(λ1 + 2λ2) was optimized over the grid {0, 0.25, 0.5, 0.75, 1, â}, where â was the optimal supervised setting for this parameter. Fixed grids γ1 ∈ ν⁻¹ and γ2 ∈ ν were used for the other parameters, where ν = {0.1, 0.5, 1, 10, 100, 1000, 10000, ∞} and ν⁻¹ = {1/r : r ∈ ν}.
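The tuning protocol quoted in the Dataset Splits and Experiment Setup rows (partition the labeled cases into K folds, then minimize a CV criterion over the λ1/(λ1 + 2λ2) ratio grid and the fixed γ grids) can be sketched as follows. This is an illustrative Python outline only, not the authors' R/glmnet implementation: the names kfold_indices, cv_error, grid_search, and the fit_and_score callback (a stand-in for fitting JT-ENET and evaluating the σ̂²_3 criterion on a held-out fold) are all hypothetical.

```python
import itertools
import math
import random
from statistics import mean

# Grids quoted in the Experiment Setup row; math.inf plays the role of the
# limiting gamma setting in nu.
nu = [0.1, 0.5, 1, 10, 100, 1000, 10000, math.inf]
nu_inv = [1 / r for r in nu]                 # nu^{-1} = {1/r : r in nu}
ratio_grid = [0.0, 0.25, 0.5, 0.75, 1.0]     # lambda1 / (lambda1 + 2*lambda2)


def kfold_indices(n_labeled, k=10, seed=0):
    """Partition the labeled cases L into K folds {L_k}."""
    idx = list(range(n_labeled))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(ratio, g1, g2, folds, fit_and_score):
    """Average held-out error over the K folds for one grid point."""
    return mean(fit_and_score(ratio, g1, g2, fold) for fold in folds)


def grid_search(folds, fit_and_score):
    """Pick (ratio, gamma1, gamma2) minimizing the CV criterion."""
    best = None
    for ratio, g1, g2 in itertools.product(ratio_grid, nu_inv, nu):
        err = cv_error(ratio, g1, g2, folds, fit_and_score)
        if best is None or err < best[0]:
            best = (err, ratio, g1, g2)
    return best  # (cv_error, ratio, gamma1, gamma2)
```

In the paper's actual procedure the inner fit at each grid point is the convex elastic net problem solved by glmnet, and the ratio grid is augmented with the optimal supervised setting â; the sketch above only shows the outer loop structure.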