Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Dynamic Online Ensembles of Basis Expansions
Authors: Daniel Waxman, Petar M. Djurić
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide three different experiments in the main text, with additional experiments in the appendices. In the first experiment (Section 5.1), we compare ensembles of several different basis expansions, showing that the best-performing model varies widely. In the second experiment (Section 5.2), we show how model collapse between static and dynamic models can occur, and how the model introduced in Section 4.2 alleviates it. Finally, we show that E-DOEBE can effectively ensemble methods that are static and dynamic, and of different basis expansions (Section 5.3). The metrics we use are the normalized mean square error (nMSE) and the predictive log-likelihood (PLL). |
| Researcher Affiliation | Academia | Daniel Waxman EMAIL Department of Electrical and Computer Engineering Stony Brook University Petar M. Djurić EMAIL Department of Electrical and Computer Engineering Stony Brook University |
| Pseudocode | No | The paper describes methods and equations, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | 4. We provide Jax/Objax (Bradbury et al., 2018; Objax Developers, 2020) code at https://www.github.com/danwaxman/DynamicOnlineBasisExpansions that only requires the user to specify the design matrix, with several choices already implemented. |
| Open Datasets | Yes | Across all experiments, we use several publicly available datasets, ranging in both size and the number of features. A table of dataset statistics is available in Table 1. Friedman #1 and Friedman #2 are synthetic datasets designed to be highly nonlinear, and notably, are i.i.d. The Elevators dataset is related to controlling the elevators on an aircraft. The SARCOS dataset uses simulations of a robotic arm, and Kuka #1 is a similar real dataset from physical experiments. Ca Data is comprised of California housing data, and the task of CPU Small is to predict a type of CPU usage from system properties. Table 1: Dataset statistics, including the number of samples, the number of features, and the original source. |
| Dataset Splits | Yes | All hyperparameter optimization was done on the first 1,000 samples of each dataset; since we already assume access to these samples, each dataset was additionally standardized in both x and y using the statistics of the first 1,000 samples. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions 'Jax/Objax' as frameworks used, citing their foundational papers (Bradbury et al., 2018; Objax Developers, 2020), but it does not specify explicit version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For RFF GPs, 50 Fourier features were used (so F = 2 · 50), and for HSGPs, 100/D features were used for each dimension (so F ≈ 100). In either case, an SE-ARD kernel was used. For RBF networks, 100 locations were initialized via K-means and then optimized with empirical Bayes, as well as ARD length scales. For dynamic models, σ²_rw was set to 10⁻³ following Lu et al. (2022). The initial values of σ²_θ and σ²_ε were 1.0 and 0.25, respectively. Optimization was performed with Adam (Kingma & Ba, 2014). |
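The excerpt above reports results in terms of normalized mean square error (nMSE) and predictive log-likelihood (PLL). As a point of reference, a minimal sketch of both metrics is below; the function names are illustrative and the exact normalization convention is an assumption (MSE divided by the target variance, and PLL computed under per-point Gaussian predictive distributions), since the excerpt does not spell these out.

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean square error: MSE divided by the variance of the
    targets (one common convention; not confirmed by the excerpt)."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

def gaussian_pll(y_true, mean, var):
    """Average Gaussian predictive log-likelihood of the targets under
    per-point predictive means and variances."""
    return np.mean(-0.5 * (np.log(2 * np.pi * var)
                           + (y_true - mean) ** 2 / var))
```

Lower nMSE and higher PLL indicate better predictive performance; nMSE = 1 corresponds to predicting the target mean.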
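The dataset-splits row states that each dataset was standardized in both x and y using statistics from the first 1,000 samples only. A minimal sketch of that protocol, assuming plain z-scoring (the function name and signature are hypothetical, not from the paper's repository):

```python
import numpy as np

def standardize_online(X, y, n_init=1000):
    """Standardize features and targets using only the statistics of the
    first n_init samples, as described in the excerpt."""
    mu_x, sd_x = X[:n_init].mean(axis=0), X[:n_init].std(axis=0)
    mu_y, sd_y = y[:n_init].mean(), y[:n_init].std()
    return (X - mu_x) / sd_x, (y - mu_y) / sd_y
```

Using only the initial window avoids leaking statistics from future samples into the online evaluation stream.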
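The experiment-setup row mentions RFF GPs with 50 Fourier features (F = 2 · 50 basis functions, from paired cosine and sine components) under an SE-ARD kernel. A minimal NumPy sketch of such a random-Fourier-feature design matrix, assuming the standard construction for an SE-ARD kernel (the function name and scaling convention are assumptions, not taken from the paper's code):

```python
import numpy as np

def rff_design_matrix(X, lengthscales, n_features=50, seed=0):
    """Random Fourier features for an SE-ARD kernel: draw n_features spectral
    frequencies from N(0, diag(1 / lengthscales**2)) and stack cosine and
    sine components, giving F = 2 * n_features basis functions."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    W = rng.standard_normal((D, n_features)) / np.asarray(lengthscales)[:, None]
    Z = X @ W
    # Scale so the features approximate a unit-variance SE kernel.
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(n_features)
```

The per-dimension lengthscales give the ARD behavior: large lengthscales shrink the corresponding frequencies, making the feature map less sensitive to that input dimension.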