Nearest Neighbor Dirichlet Mixtures
Authors: Shounak Chattopadhyay, Antik Chakraborty, David B. Dunson
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Desirable asymptotic properties are shown, and the methods are evaluated in simulation studies and applied to a motivating data set in the context of classification. Section 4 contains simulation experiments comparing NN-DM with a rich variety of competitors in univariate and multivariate examples, including an assessment of UQ performance. Section 5 contains a real data application, and Section 6 a discussion. |
| Researcher Affiliation | Academia | Shounak Chattopadhyay (Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA); Antik Chakraborty (Department of Statistics, Purdue University, West Lafayette, IN 47907, USA); David B. Dunson (Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA) |
| Pseudocode | Yes | Algorithm 1: Nearest neighbor-Dirichlet mixture algorithm to obtain Monte Carlo samples from the pseudo-posterior of $f(x)$ with Gaussian kernel and normal-inverse Wishart prior. Algorithm 2: Leave-one-out cross-validation for choosing the hyperparameter $\delta_0^2$ in the nearest neighbor-Dirichlet mixture method. Algorithm 3: Nearest neighbor-Dirichlet mixture algorithm to obtain Monte Carlo samples from the pseudo-posterior of $f(x)$ with Gaussian kernel and normal-inverse gamma prior. |
| Open Source Code | Yes | R package NNDM available at https://github.com/shounakchattopadhyay/NN-DM was used for the numerical experiments. |
| Open Datasets | Yes | We consider 10 choices of $f_0$ from the R package benchden (Mildenberger and Weinert, 2012); ... The high time resolution universe survey data (Keith et al., 2010) contain information on sampled pulsar stars. ... The data are publicly available from the University of California at Irvine machine learning repository. |
| Dataset Splits | Yes | In our experiments, we set $n_t = 500$ and $R = 20$. We create a test data set of 200 stars, among which 23 are pulsar stars. The training size is then varied from 300 to 1800 in increments of 300, each time adding 300 training points by randomly sampling from the entire data leaving out the initial test set. |
| Hardware Specification | Yes | All simulations were carried out on an M1 MacBook Pro with 16 GB of RAM. |
| Software Dependencies | Yes | All simulations were carried out using the R programming language (R Core Team, 2018). For Dirichlet process mixture models, we collect 2,000 Markov chain Monte Carlo (MCMC) samples after discarding a burn-in of 3,000 samples using the dirichletprocess package (J. Ross and Markwick, 2019)... R package version 0.3.1. |
| Experiment Setup | Yes | In our experiments, we set $n_t = 500$ and $R = 20$. We set $n = 200, 500$ with $k_n = n^{1/3} + 1$. ... The prior hyperparameter choices for the proposed method are $\mu_0 = 0$, $\nu_0 = 0.001$, $\gamma_0 = 1$; $\delta_0^2$ is chosen via the cross-validation method of Section 2.3. For the multivariate cases, we consider $n = 200$ and $n = 1000$. The number of neighbors is set to $k = 10$ and the dimension $p$ is chosen from $\{2, 3, 4, 6\}$. The hyperparameters for the nearest neighbor-Dirichlet mixture are chosen as $\mu_0 = 0_p$, $\nu_0 = 0.001$, $\gamma_0 = p$, and $\Psi_0 = \{(\gamma_0 - p + 1)\delta_0^2\} I_p = \delta_0^2 I_p$, where the optimal $\delta_0^2$ is chosen via cross-validation as described in Section 2.3. We implement the DP-MC with base measure $\mathrm{NIW}_p(0_p, 0.01, p, I_p)$ and a $\mathrm{Gamma}(2, 4)$ prior on the concentration parameter as in West (1992). For the NN-DM, we take $k = 8$ in the univariate case and $k = 5$ in the bivariate case, $\alpha = 0.001$, and other hyperparameters chosen as before. |
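To make the Pseudocode and Experiment Setup rows more concrete, here is a minimal Python sketch of a univariate sampler in the spirit of Algorithm 3 (Gaussian kernel, normal-inverse gamma prior). It is our reading, not a transcription of the paper, and rests on three assumptions. First, each data point's $k$ nearest neighbors (itself included) give a conjugate local posterior for one Gaussian component. Second, the mixture weights are drawn from a symmetric Dirichlet$(\alpha + 1, \ldots, \alpha + 1)$. Third, the prior is parameterized as the $p = 1$ case of $\mathrm{NIW}_p(\mu_0, \nu_0, \gamma_0, \Psi_0)$. The defaults $\mu_0 = 0$, $\nu_0 = 0.001$, $\gamma_0 = 1$ and $\alpha = 0.001$ come from the table. $\delta_0^2$ is fixed at 1 here rather than chosen by the cross-validation of Algorithm 2. The function names `nig_posterior` and `nndm_density_draws` are hypothetical and are not the API of the NNDM R package.

```python
import numpy as np


def nig_posterior(y, mu0, nu0, gamma0, delta0_sq):
    """Conjugate normal-inverse gamma update for one neighborhood y.

    Assumed prior (our parameterization: the p = 1 case of
    NIW_p(mu0, nu0, gamma0, Psi0) with Psi0 = gamma0 * delta0_sq):
        sigma^2 ~ InvGamma(gamma0 / 2, gamma0 * delta0_sq / 2)
        mu | sigma^2 ~ Normal(mu0, sigma^2 / nu0)
    Returns (mu_n, nu_n, shape_n, scale_n) of the local posterior.
    """
    k = y.size
    ybar = y.mean()
    ss = np.sum((y - ybar) ** 2)
    nu_n = nu0 + k
    mu_n = (nu0 * mu0 + k * ybar) / nu_n
    shape_n = (gamma0 + k) / 2.0
    scale_n = 0.5 * (gamma0 * delta0_sq + ss + nu0 * k / nu_n * (ybar - mu0) ** 2)
    return mu_n, nu_n, shape_n, scale_n


def nndm_density_draws(data, x_grid, k, n_draws, alpha=0.001, mu0=0.0,
                       nu0=0.001, gamma0=1.0, delta0_sq=1.0, seed=0):
    """Monte Carlo draws of f(x) on x_grid from an NN-DM-style pseudo-posterior.

    ASSUMPTION: weights ~ Dirichlet(alpha + 1, ..., alpha + 1); check
    Algorithm 3 in the paper for the exact weight distribution.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    x_grid = np.asarray(x_grid, dtype=float)
    n = data.size
    # Indices of the k nearest neighbors of each point (a point is its own neighbor).
    dists = np.abs(data[:, None] - data[None, :])
    nbrs = np.argsort(dists, axis=1)[:, :k]
    params = np.array([nig_posterior(data[idx], mu0, nu0, gamma0, delta0_sq)
                       for idx in nbrs])
    mu_n, nu_n, shape_n, scale_n = params.T

    draws = np.empty((n_draws, x_grid.size))
    for t in range(n_draws):
        # sigma_i^2 ~ InvGamma(shape, scale), drawn as scale / Gamma(shape, 1).
        sigma_sq = scale_n / rng.gamma(shape_n)
        mu = rng.normal(mu_n, np.sqrt(sigma_sq / nu_n))
        weights = rng.dirichlet(np.full(n, alpha + 1.0))
        # Gaussian kernel of every component evaluated at every grid point.
        z = (x_grid[:, None] - mu[None, :]) / np.sqrt(sigma_sq)[None, :]
        kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi * sigma_sq)[None, :]
        draws[t] = kernel @ weights
    return draws


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.normal(size=200)
    k = int(200 ** (1 / 3)) + 1  # k_n = n^{1/3} + 1, truncated to an integer here
    grid = np.linspace(-4.0, 4.0, 81)
    draws = nndm_density_draws(sample, grid, k=k, n_draws=200)
    f_hat = draws.mean(axis=0)  # pointwise pseudo-posterior mean
    lower, upper = np.quantile(draws, [0.025, 0.975], axis=0)  # pointwise 95% bands
    print(f_hat[40], lower[40], upper[40])
```

The draws are independent across iterations, so no burn-in or chain diagnostics are needed. This is in contrast with the 2,000 post-burn-in MCMC samples collected for the Dirichlet process mixture baseline. Pointwise quantiles of `draws` give the kind of uncertainty bands the simulation study assesses.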
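The Dataset Splits row describes a fixed held-out test set of 200 stars and nested training sets of 300, 600, ..., 1800 points, each stage adding 300 randomly sampled points from outside the test set. The short Python sketch below reproduces that indexing scheme. The excerpt does not say how the test set itself was drawn, so drawing it uniformly at random is an assumption. The function name and its arguments are hypothetical.

```python
import numpy as np


def nested_training_splits(n_total, n_test=200, step=300, max_train=1800, seed=0):
    """Hold out a fixed test set, then grow the training set in nested blocks.

    ASSUMPTION: the test set is a uniform random sample of the data.
    Returns (test_idx, train_sets), where train_sets[j] has (j + 1) * step indices.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)
    test_idx = perm[:n_test]
    pool = perm[n_test:]  # everything outside the test set, in random order
    if max_train > pool.size:
        raise ValueError("not enough points outside the test set")
    # Successive prefixes of a random permutation are equivalent to adding
    # `step` new randomly sampled points (without replacement) at each stage.
    train_sets = [pool[:m] for m in range(step, max_train + 1, step)]
    return test_idx, train_sets


test_idx, train_sets = nested_training_splits(n_total=5000)
print([len(t) for t in train_sets])
```

Taking prefixes of one permutation guarantees that every smaller training set is contained in the larger ones. This matches the description of "adding 300 training points" at each stage rather than redrawing the training set from scratch.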