Local Kernel Ridge Regression for Scalable, Interpolating, Continuous Regression
Authors: Mingxuan Han, Chenglong Ye, Jeff Phillips
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate interpolation ability, generalization accuracy, and efficiency of this new local krr model by comparing with global models, discontinuous local models, and non-learned interpolating local models. For the examples with d = 2 we have ground truth values on a fine grid $(X_G, y_G)$, in addition to input data $(X, y)$. Prediction Errors: For tasks in $\mathbb{R}^2$ it is useful to plot the error $e_i = \mu(x_i) - y_i$ over $(x_i, y_i) \in (X_G, y_G)$ on a fine grid, with blue as a positive error, red as a negative error, and white near 0 error, in Figure 2. RMSE: The root mean square error, $\mathrm{rmse} = \sqrt{\tfrac{1}{\lvert X_G \rvert} \sum_{x_i \in X_G} (\mu(x_i) - y_i)^2}$. This is measured over a fine grid $(X_G, y_G)$ or test data. Worst Case Error: The worst case error, denoted $\ell_\infty$, shows how far the model is from interpolating the true data $(X_G, y_G)$: $\ell_\infty = \max_{x_i \in X_G} \lvert \mu(x_i) - y_i \rvert$. Relative Error: Relative error $\varepsilon_i = \frac{y_i - \mu(x_i)}{y_i}$ is relevant for our d = 3 physics simulation example. We show the maximum over a held-out set. Average Curvature: This captures the smoothness of the model for d = 1. The discrete curvature at a point $(x_i, y_i)$ can be defined $C_i = \frac{x_i' y_i'' - x_i'' y_i'}{((x_i')^2 + (y_i')^2)^{3/2}}$, where $x_i', y_i'$ are discrete derivatives, and $x_i'', y_i''$ are the discrete second derivatives. The Average Curvature avg C is the average over all grid points. Tuning Hyper-parameters: For clarity we define several variables for local krr, but some (like $\lambda$ and $k$) do not noticeably affect the result, other than in runtime, once their constraints are met. There are two modeling parameters that one could tune: the points per local region $\ell$, to control how large the local models should be, and the ridge parameter $\eta$, to control how close the model comes to interpolating the data. Except in illustrative 1d examples, we choose these via grid search on a training set using RMSE. We consider $\ell \in \{10, 20, 30, 40, 50\}$ and $\eta \in \{10^{-7}, 10^{-5}, 10^{-3}, 1, 10\}$. For global krr (building one single global regression model using krr), the singular kernel method, and NWKR (with Gaussian kernel) we select the bandwidth parameter $b$ from $b \in \{0.01, 0.05, 0.1, 1, 5\}$, and for global krr we also choose $\eta$ from $\{10^{-7}, 10^{-5}, 10^{-3}, 1, 10\}$. In the real-data experiment on the methane data, we also compare knn-svm (Hable, 2013) and local-svm (Meister & Steinwart, 2016) with our methods. For knn-svm, we tune $k \in \{10, 20, 30, 40, 50\}$ and $\eta$, $b$ in the same ranges as local krr. For local-svm we tune, on each local model, the radius $r \in \{0.05, 0.1, 0.2, 0.3, 0.4\}$ and $\eta$ and $b$ as elsewhere. For the RBF-Interpolator compared in the 2D simulation example, we tune the number of neighbors in $\{10, 20, 30, 40, 50\}$. We observe in all cases that the selected values are typically near the median, and values in the middle of these ranges do not largely affect the evaluation. Experiment sections: 5.1 1D Simulation; 5.2 2D Simulation; 5.3 Real Data: Slovakian Precipitation; 5.4 3D Combustion Simulation. |
| Researcher Affiliation | Academia | Mingxuan Han (EMAIL), School of Computing, University of Utah; Chenglong Ye (EMAIL), Dr. Bing Zhang Department of Statistics, University of Kentucky; Jeff M. Phillips (EMAIL), School of Computing, University of Utah |
| Pseudocode | Yes | Algorithm 1 Local krr Training: 1: Determine $M$. 2: for $m_i \in M$ do 3: Get $\ell\text{-nn}_X(m_i)$; set $b_i = \lVert m_i - \ell\text{-nn}_X(m_i) \rVert$. 4: Retrieve $X_i = \text{range}_X(m_i, \lambda b_i)$. 5: Learn krr model $\mu_i(\cdot)$ on $(X_i, y^{(i)})$ with $b_i$. 6: end for. Algorithm 2 Local krr Evaluation at $q$: 1: Get $M_q = \{m_1, m_2, \ldots, m_k\} = k\text{-NN}_M(q)$. 2: Let $r_j = \lVert q - m_j \rVert$ for $j \in [1 \ldots k]$. 3: Set weight $w_j = \frac{r_k - r_j}{r_k - r_1}$ for $j \in [1 \ldots k]$. 4: Return the weighted average evaluation at $q$: $\mu(q) = \frac{\sum_{j=1}^{k} w_j \mu_j(q)}{\sum_{j=1}^{k} w_j}$. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own code or a direct link to a code repository for the methodology described. It mentions using 'FAISS (Johnson et al., 2021)' and references 'ann benchmarks' but these are third-party tools or benchmarks, not the authors' own code release. |
| Open Datasets | Yes | We next compare local krr with global krr on a benchmark GIS dataset of Slovakia precipitation (Neteler & Mitasova, 2008). The data was gathered from a high-fidelity simulation of methane using the gri12 mechanism (Frenklach et al., 2021) in Spitfire (Hansen et al., 2020). |
| Dataset Splits | Yes | We choose up to 16,000 training points randomly; beyond that size, training takes too long for global krr and is already too expensive (more than 12 hours) for the Simplex Method. We retain 180,000 points for testing. The methane data set has n = 44,000 observations. We evaluate under a 67/33 train-test split with five different random seeds, then report the average and the standard deviation of the maximum relative errors (max rel. err) from the five trials. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU/CPU models or specific compute infrastructure. |
| Software Dependencies | No | The paper mentions software like 'FAISS', 'python’s RBF-Interpolator', 'gri12 mechanism', and 'Spitfire', but it does not specify any version numbers for these software components. For example, 'FAISS (Johnson et al., 2021)' cites the paper for FAISS, but doesn't state the version of FAISS used. |
| Experiment Setup | Yes | Tuning Hyper-parameters: For clarity we define several variables for local krr, but some (like $\lambda$ and $k$) do not noticeably affect the result, other than in runtime, once their constraints are met. There are two modeling parameters that one could tune: the points per local region $\ell$, to control how large the local models should be, and the ridge parameter $\eta$, to control how close the model comes to interpolating the data. Except in illustrative 1d examples, we choose these via grid search on a training set using RMSE. We consider $\ell \in \{10, 20, 30, 40, 50\}$ and $\eta \in \{10^{-7}, 10^{-5}, 10^{-3}, 1, 10\}$. For global krr (building one single global regression model using krr), the singular kernel method, and NWKR (with Gaussian kernel) we select the bandwidth parameter $b$ from $b \in \{0.01, 0.05, 0.1, 1, 5\}$, and for global krr we also choose $\eta$ from $\{10^{-7}, 10^{-5}, 10^{-3}, 1, 10\}$. In the real-data experiment on the methane data, we also compare knn-svm (Hable, 2013) and local-svm (Meister & Steinwart, 2016) with our methods. For knn-svm, we tune $k \in \{10, 20, 30, 40, 50\}$ and $\eta$, $b$ in the same ranges as local krr. For local-svm we tune, on each local model, the radius $r \in \{0.05, 0.1, 0.2, 0.3, 0.4\}$ and $\eta$ and $b$ as elsewhere. For the RBF-Interpolator compared in the 2D simulation example, we tune the number of neighbors in $\{10, 20, 30, 40, 50\}$. We observe in all cases that the selected values are typically near the median, and values in the middle of these ranges do not largely affect the evaluation. To tune hyper-parameters, we randomly select 10 percent of the training data as a validation set, then perform grid search over the hyper-parameter space. We use the best-performing hyper-parameters on that validation set to evaluate the whole 201-by-201 grid. Details of a more thorough sensitivity analysis for local krr are in Appendix B. We used the following parameters: $k = 3$, $\eta = 10^{-5}$, $b = 0.1$ for knn-svm and, for local-svm, radius $r = 0.5$, $\eta = 10^{-5}$, $b = 0.1$. |
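The two algorithms quoted in the Pseudocode row (per-region kernel ridge training, then a continuous k-nearest-model blend at query time) can be sketched in a few lines of Python. This is a minimal illustration only, assuming a Gaussian kernel and a k-d tree for neighbor search; the function names `fit_local_krr` and `predict_local_krr` are illustrative and not from the authors' code.

```python
import numpy as np
from scipy.spatial import cKDTree

def gaussian_kernel(A, B, b):
    # K[i, j] = exp(-||A_i - B_j||^2 / b^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / b ** 2)

def fit_local_krr(X, y, M, ell=20, lam=3.0, eta=1e-5):
    # Algorithm 1: fit one kernel ridge model per region center m_i in M.
    tree = cKDTree(X)
    models = []
    for m in M:
        # bandwidth b_i = distance from m_i to its ell-th nearest training point
        dists, _ = tree.query(m, k=ell)
        b = dists[-1]
        # local training set: all points within radius lam * b_i of m_i
        idx = tree.query_ball_point(m, lam * b)
        Xi, yi = X[idx], y[idx]
        K = gaussian_kernel(Xi, Xi, b)
        alpha = np.linalg.solve(K + eta * np.eye(len(Xi)), yi)
        models.append((Xi, alpha, b))
    return models

def predict_local_krr(q, M, models, k=3):
    # Algorithm 2: weighted average of the k nearest local models.
    r = np.linalg.norm(M - q, axis=1)
    nn = np.argsort(r)[:k]
    rj = r[nn]
    # w_j = (r_k - r_j) / (r_k - r_1): the weight decays to 0 at the k-th
    # center, which is what makes the blended prediction continuous.
    w = (rj[-1] - rj) / (rj[-1] - rj[0] + 1e-12)
    preds = np.array([gaussian_kernel(q[None, :], Xi, b)[0] @ alpha
                      for Xi, alpha, b in (models[j] for j in nn)])
    return float(w @ preds / w.sum())
```

With a small ridge parameter `eta`, each local model nearly interpolates its region, and the distance-based blend keeps the global predictor continuous across region boundaries.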
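The evaluation metrics listed in the Research Type row (RMSE, worst-case error, maximum relative error, average discrete curvature) are straightforward to compute; a hedged NumPy sketch follows, with illustrative function names not taken from the paper.

```python
import numpy as np

def rmse(pred, truth):
    # Root mean square error over a fine grid or test set.
    return np.sqrt(np.mean((pred - truth) ** 2))

def worst_case_error(pred, truth):
    # ell_infinity: how far the model is from interpolating the true data.
    return np.max(np.abs(pred - truth))

def max_relative_error(pred, truth):
    # Maximum of |y_i - mu(x_i)| / |y_i| over a held-out set.
    return np.max(np.abs(truth - pred) / np.abs(truth))

def average_curvature(x, y):
    # Average |C_i| with C_i = (x' y'' - x'' y') / ((x'^2 + y'^2)^(3/2)),
    # using np.gradient for the discrete first and second derivatives.
    xp, yp = np.gradient(x), np.gradient(y)
    xpp, ypp = np.gradient(xp), np.gradient(yp)
    C = (xp * ypp - xpp * yp) / (xp ** 2 + yp ** 2) ** 1.5
    return np.mean(np.abs(C))
```

As a sanity check, a perfect predictor gives zero RMSE and zero worst-case error, and a straight line has zero curvature everywhere.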