Fast Computation of Leave-One-Out Cross-Validation for $k$-NN Regression

Author: Motonobu Kanagawa

TMLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments confirm the validity of the fast computation method: "We empirically check the validity of the formula (7) for efficient LOOCV computation. We consider a real-valued regression problem where X = R^d and Y = R, using two real datasets from scikit-learn: Diabetes and Wine." |
| Researcher Affiliation | Academia | Motonobu Kanagawa, Data Science Department, EURECOM |
| Pseudocode | No | The paper describes the method using mathematical formulas (Lemma 1, Corollary 1) and natural language, without structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for reproducing the experiments is available at https://github.com/motonobuk/LOOCV-kNN |
| Open Datasets | Yes | "We consider a real-valued regression problem where X = R^d and Y = R, using two real datasets from scikit-learn: Diabetes and Wine." |
| Dataset Splits | Yes | "LOOCV for k-NN regression is defined as follows. ... For each ℓ = 1, ..., n, consider the training dataset (1) with the ℓ-th pair (xℓ, yℓ) removed: Dn \ {(xℓ, yℓ)} = {(x1, y1), ..., (xℓ−1, yℓ−1), (xℓ+1, yℓ+1), ..., (xn, yn)}." |
| Hardware Specification | Yes | CPU: 1.1 GHz Quad-Core Intel Core i5; memory: 8 GB 3733 MHz LPDDR4X. |
| Software Dependencies | No | The paper mentions using scikit-learn to implement k-NN regression, but does not specify a version number for scikit-learn or any other software dependency; footnote 2 points to the general stable documentation URL rather than a specific version. |
| Experiment Setup | Yes | "We standardized each input feature to have mean zero and unit variance. ... We show the LOOCV scores computed by the two methods for different values of k ... for fixed k = 5." |
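The experimental setup quoted above (standardized features, the Diabetes dataset, k = 5) can be sketched in code. The snippet below contrasts naive LOOCV, which refits a k-NN regressor n times, with a fast version that issues a single (k+1)-nearest-neighbor query over the full data and drops each point's self-neighbor. This fast variant rests on the standard k-NN leave-one-out identity (valid when there are no distance ties) and is offered only as an illustrative sketch, not as the paper's exact formula (7).

```python
# Sketch: naive vs. fast LOOCV for k-NN regression on the Diabetes dataset.
# Assumption (not from the paper): the fast path uses the standard identity
# that the leave-one-out k-NN prediction at a training point averages the
# targets of its neighbors 2..(k+1) in the full dataset, since neighbor 1
# is the point itself (assuming no distance ties).
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.neighbors import KNeighborsRegressor, NearestNeighbors
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)  # mean zero, unit variance per feature
n, k = len(y), 5

# Naive LOOCV: one model refit per held-out point (n fits in total).
naive_preds = np.empty(n)
for i in range(n):
    mask = np.ones(n, dtype=bool)
    mask[i] = False
    model = KNeighborsRegressor(n_neighbors=k).fit(X[mask], y[mask])
    naive_preds[i] = model.predict(X[i:i + 1])[0]

# Fast LOOCV: a single (k+1)-NN query; discard the self-neighbor in
# column 0 and average the remaining k neighbors' targets.
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
_, idx = nn.kneighbors(X)  # idx[:, 0] is each point itself (no ties assumed)
fast_preds = y[idx[:, 1:]].mean(axis=1)

naive_mse = np.mean((y - naive_preds) ** 2)
fast_mse = np.mean((y - fast_preds) ** 2)
```

Both paths produce the same LOOCV predictions on tie-free data, but the fast path replaces n model fits with one neighbor query, which is what makes the speedup in the paper's experiments plausible.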