Online model selection by learning how compositional kernels evolve

Authors: Eura Shin, Predrag Klasnja, Susan Murphy, Finale Doshi-Velez

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using pilot data, we learn a set of kernel evolutions that can be used to quickly select kernels for new test users. KEM reliably selects high-performing kernels for a range of synthetic and real data sets, including two health data sets. Sections: 6 Experimental Setup; 7.1 Demonstrative Results on Synthetic Data.
Researcher Affiliation | Academia | Eura Shin, EMAIL, Department of Computer Science, Harvard University; Predrag Klasnja, EMAIL, School of Information, University of Michigan; Susan A. Murphy, EMAIL, Department of Computer Science, Harvard University; Finale Doshi-Velez, finale@seas.harvard.edu, Department of Computer Science, Harvard University
Pseudocode | Yes | In algorithm 1, we define a selection model that leverages these learned evolutions to select a kernel for a new test user u at time t. Algorithm 1: Selection method for KEM.
Open Source Code | No | The paper does not provide a direct link to a source-code repository or an explicit statement about the release of their code for the methodology described.
Open Datasets | Yes | The datasets used in our experiments, reflected in table 1, have different properties. ... UCI: Energy (Tsanas & Xifara, 2012), Concrete (Yeh, 1998), Boston Housing (Harrison Jr & Rubinfeld, 1978), and Fires (Abid & Izeboudjen, 2019). ... Medical Information Mart for Intensive Care (MIMIC-III) data set (Johnson et al., 2016) ... HeartSteps V1 (Klasnja et al., 2019)
Dataset Splits | Yes | 10.2.2 Train/Test Splits. Synthetic Experiments: Training Users (10 training users), Testing Users (50 test users) ... The test set is composed of 200 uniformly spaced points along the x-axis from [0, 20]. Real Data Experiments: For the real data experiments, users were randomly assigned to either the training or testing set in a 50:50 split.
Hardware Specification | Yes | The reported runtimes are on a 4 core Intel Cascade Lake CPU.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming language versions or library versions, needed to replicate the experiment.
Experiment Setup | Yes | 10.2.5 KEM Priors details specific prior distributions and parameters for kernel hyperparameters (lengthscale, period, amplitude, observation noise) for both synthetic and real data. For example, 'Lengthscale: log p(θ_lengthscale) = N(0, 2) for synthetic data, log p(θ_lengthscale) = N(0.2, 0.5) for real data'. Also, 10.2.4 KEM Inference Parameters specifies: 'Gibbs sampling during pilot training: ... up to 200 iterations 10 iterations of global updates ... 100 samples from MH sampling algorithm...'
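
The split and prior settings quoted above can be illustrated with a short Python sketch. It is not the authors' code, which the summary notes is not released. The function names, the seed, and the pool of 20 users are hypothetical. It also assumes that the quoted lengthscale prior means the log of the lengthscale is normally distributed and that the second parameter of N(., .) is a standard deviation; the summary does not state either point.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make this sketch repeatable

# Synthetic test inputs: 200 uniformly spaced points along the x-axis on [0, 20].
x_test = np.linspace(0.0, 20.0, 200)


def split_users(user_ids, rng):
    """Randomly assign users to training and testing sets in a 50:50 split."""
    shuffled = rng.permutation(user_ids)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def sample_lengthscale(mean, scale, size, rng):
    """Draw lengthscales whose logarithm is Normal(mean, scale).

    Assumption: `scale` is a standard deviation, not a variance.
    """
    return np.exp(rng.normal(mean, scale, size=size))


# Hypothetical pool of 20 users for the real-data style split.
train_users, test_users = split_users(np.arange(20), rng)

# Prior draws using the quoted parameters for synthetic and real data.
synthetic_lengthscales = sample_lengthscale(0.0, 2.0, 5, rng)
real_lengthscales = sample_lengthscale(0.2, 0.5, 5, rng)
```

Sampling in log space keeps every drawn lengthscale strictly positive, which is the usual reason for placing a normal prior on the logarithm of a kernel hyperparameter.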