Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Dynamic Online Ensembles of Basis Expansions

Authors: Daniel Waxman, Petar Djurić

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Experiments: We provide three different experiments in the main text, with additional experiments in the appendices. In the first experiment (Section 5.1), we compare ensembles of several different basis expansions, showing that the best-performing model varies widely. In the second experiment (Section 5.2), we show how model collapse between static and dynamic models can occur, and how the model introduced in Section 4.2 alleviates it. Finally, we show that E-DOEBE can effectively ensemble methods that are static and dynamic, and of different basis expansions (Section 5.3). The metrics we use are the normalized mean square error (nMSE) and the predictive log-likelihood (PLL).
Researcher Affiliation | Academia | Daniel Waxman (EMAIL), Department of Electrical and Computer Engineering, Stony Brook University; Petar M. Djurić (EMAIL), Department of Electrical and Computer Engineering, Stony Brook University
Pseudocode | No | The paper describes methods and equations, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We provide Jax/Objax (Bradbury et al., 2018; Objax Developers, 2020) code at https://www.github.com/danwaxman/DynamicOnlineBasisExpansions that only requires the user to specify the design matrix, with several choices already implemented.
Open Datasets | Yes | Across all experiments, we use several publicly available datasets, ranging in both size and the number of features. A table of dataset statistics is available in Table 1. Friedman #1 and Friedman #2 are synthetic datasets designed to be highly nonlinear and, notably, are i.i.d. The Elevators dataset is related to controlling the elevators on an aircraft. The SARCOS dataset uses simulations of a robotic arm, and Kuka #1 is a similar real dataset from physical experiments. CaData comprises California housing data, and the task of CPU Small is to predict a type of CPU usage from system properties. Table 1: Dataset statistics, including the number of samples, the number of features, and the original source.
Dataset Splits | Yes | All hyperparameter optimization was done on the first 1,000 samples of each dataset; since we already assume access to these samples, each dataset was additionally standardized in both x and y using the statistics of the first 1,000 samples.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions 'Jax/Objax' as frameworks used, citing their foundational papers (Bradbury et al., 2018; Objax Developers, 2020), but it does not specify explicit version numbers for these or any other software dependencies.
Experiment Setup | Yes | For RFF GPs, 50 Fourier features were used (so F = 2·50 = 100), and for HSGPs, 100/D features were used for each dimension (so F ≈ 100). In either case, an SE-ARD kernel was used. For RBF networks, 100 locations were initialized via K-means and then optimized with empirical Bayes, as well as ARD length scales. For dynamic models, σ²_rw was set to 10⁻³ following Lu et al. (2022). The initial values of σ²_θ and σ²_ε were 1.0 and 0.25, respectively. Optimization was performed with Adam (Kingma & Ba, 2014).
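The nMSE and PLL metrics cited in the Research Type row can be sketched with their standard definitions: MSE normalized by the target variance, and the mean Gaussian predictive log-likelihood. This is a minimal NumPy sketch; the paper's exact formulas may differ, and the function names are illustrative.

```python
import numpy as np

def nmse(y_true, y_pred):
    # Normalized mean square error: MSE divided by the variance of the targets.
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

def gaussian_pll(y_true, mean, var):
    # Mean Gaussian predictive log-likelihood of the targets given
    # per-point predictive means and variances.
    return np.mean(-0.5 * (np.log(2 * np.pi * var)
                           + (y_true - mean) ** 2 / var))
```

A perfect prediction gives nMSE = 0, and nMSE = 1 corresponds to always predicting the target mean, which makes scores comparable across datasets of different scales.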
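The prefix-based standardization described in the Dataset Splits row (statistics computed from the first 1,000 samples only, then applied to the whole stream) can be sketched as follows; `standardize_stream` is a hypothetical helper, not the paper's released code.

```python
import numpy as np

def standardize_stream(X, y, n_init=1000):
    # Compute mean/std from the first n_init samples only, then apply
    # those statistics to the entire stream (no look-ahead past the prefix).
    mu_x, sd_x = X[:n_init].mean(axis=0), X[:n_init].std(axis=0)
    mu_y, sd_y = y[:n_init].mean(), y[:n_init].std()
    return (X - mu_x) / sd_x, (y - mu_y) / sd_y
```

Using only the prefix keeps the online protocol honest: samples arriving after the first 1,000 never influence the normalization applied to earlier predictions.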
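The RFF construction in the Experiment Setup row (50 sampled frequencies under an SE-ARD kernel, each contributing a cosine and a sine column, hence F = 2·50 = 100) can be sketched in NumPy. `rff_design_matrix` is an illustrative function, not the released Objax code; it assumes a unit-variance SE-ARD kernel with per-dimension length scales.

```python
import numpy as np

def rff_design_matrix(X, lengthscales, n_features=50, seed=0):
    # Random Fourier features for an SE-ARD kernel: draw frequencies
    # omega_j ~ N(0, diag(1 / lengthscale^2)); each frequency yields a
    # cos and a sin column, so the design matrix has F = 2 * n_features
    # columns, and Phi @ Phi.T approximates the kernel matrix.
    rng = np.random.default_rng(seed)
    ls = np.asarray(lengthscales, dtype=float)
    Omega = rng.standard_normal((X.shape[1], n_features)) / ls[:, None]
    Z = X @ Omega
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(n_features)
```

The 1/sqrt(n_features) scaling makes each row have unit norm, matching the unit-variance kernel diagonal; the approximation error shrinks as O(1/sqrt(n_features)).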