Online model selection by learning how compositional kernels evolve
Authors: Eura Shin, Predrag Klasnja, Susan Murphy, Finale Doshi-Velez
TMLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using pilot data, we learn a set of kernel evolutions that can be used to quickly select kernels for new test users. KEM reliably selects high-performing kernels for a range of synthetic and real data sets, including two health data sets. Relevant sections: 6 Experimental Setup; 7.1 Demonstrative Results on Synthetic Data |
| Researcher Affiliation | Academia | Eura Shin, Department of Computer Science, Harvard University; Predrag Klasnja, School of Information, University of Michigan; Susan A. Murphy, Department of Computer Science, Harvard University; Finale Doshi-Velez (finale@seas.harvard.edu), Department of Computer Science, Harvard University |
| Pseudocode | Yes | In algorithm 1, we define a selection model that leverages these learned evolutions to select a kernel for a new test user u at time t. Algorithm 1: Selection method for KEM |
| Open Source Code | No | The paper provides neither a direct link to a source-code repository nor an explicit statement that the code for the described methodology will be released. |
| Open Datasets | Yes | The datasets used in our experiments, reflected in table 1, have different properties. ... UCI: Energy (Tsanas & Xifara, 2012), Concrete (Yeh, 1998), Boston Housing (Harrison Jr & Rubinfeld, 1978), and Fires (Abid & Izeboudjen, 2019). ... Medical Information Mart for Intensive Care (MIMIC-III) data set (Johnson et al., 2016) ... HeartSteps V1 (Klasnja et al., 2019) |
| Dataset Splits | Yes | Section 10.2.2 (Train/Test Splits). Synthetic experiments: Training Users (10 training users); Testing Users (50 test users) ... The test set is composed of 200 uniformly spaced points along the x-axis from [0, 20]. Real data experiments: users were randomly assigned to either the training or testing set in a 50:50 split. |
| Hardware Specification | Yes | The reported runtimes are on a 4-core Intel Cascade Lake CPU. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming language versions or library versions, needed to replicate the experiment. |
| Experiment Setup | Yes | Section 10.2.5 (KEM Priors) details the prior distributions and their parameters for the kernel hyperparameters (lengthscale, period, amplitude, observation noise) for both synthetic and real data. For example, 'Lengthscale: log p(θ_lengthscale) = N(0, 2) for synthetic data, log p(θ_lengthscale) = N(0.2, 0.5) for real data'. Section 10.2.4 (KEM Inference Parameters) specifies: 'Gibbs sampling during pilot training: ... up to 200 iterations; 10 iterations of global updates ... 100 samples from MH sampling algorithm...' |
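
The split described in the Dataset Splits row can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code (none is released): the helper names `split_users` and `uniform_grid`, the fixed seed, the 40-user cohort, and the choice to include both endpoints of [0, 20] are all hypothetical.

```python
import random


def split_users(user_ids, train_fraction=0.5, seed=0):
    """Randomly assign users to train/test sets (hypothetical helper).

    The 50:50 ratio follows the real-data description; the seed and the
    shuffling procedure are assumptions made for reproducibility here.
    """
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


def uniform_grid(lo=0.0, hi=20.0, n=200):
    """Return n uniformly spaced points on [lo, hi], endpoints included.

    Including both endpoints is an assumption; the source only says
    '200 uniformly spaced points along the x-axis from [0, 20]'.
    """
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


# Illustrative cohort of 40 users (the size is made up for this sketch).
train_users, test_users = split_users(range(40))
x_test = uniform_grid()
```

With these defaults, every user lands in exactly one of the two sets, and `x_test` holds the 200 evaluation locations for the synthetic setting.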