Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Inconsistency of Pitman-Yor Process Mixtures for the Number of Components
Authors: Jeffrey W. Miller, Matthew T. Harrison
JMLR 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this manuscript, we prove that under fairly general conditions, when using a Dirichlet process mixture, the posterior on the number of clusters will not concentrate at any finite value, and therefore will not be consistent for the number of components in a finite mixture. In fact, our results apply to a large class of nonparametric mixtures including DPMs, and Pitman-Yor process mixtures (PYMs) more generally, over a wide variety of families of component distributions. |
| Researcher Affiliation | Academia | Jeffrey W. Miller and Matthew T. Harrison, Division of Applied Mathematics, Brown University, Providence, RI 02912, USA |
| Pseudocode | No | The paper describes theoretical proofs and mathematical formulations but does not contain any clearly labeled pseudocode or algorithm blocks. The methods are explained in textual mathematical notation. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository. |
| Open Datasets | Yes | We begin with a motivating example. In population genetics, determining the population structure is an important step in the analysis of sampled data. To illustrate, consider the impala, a species of antelope in southern Africa. ... Lorenzen et al. (2006) collected samples from 216 impalas, and analyzed the genetic variation between/within the two subspecies. |
| Dataset Splits | No | The paper mentions data from Lorenzen et al. (2006) and simulated data, but it does not provide specific details about how these datasets are split into training, validation, or test sets for experimental reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the described simulations or analyses. |
| Software Dependencies | No | The paper mentions using 'Gibbs sampling (MacEachern, 1994; Neal, 2000)' for estimates, and 'Structure (Pritchard et al., 2000)' and 'Structurama (Huelsenbeck and Andolfatto, 2007)' as tools used in related works, but it does not specify version numbers for any software dependencies directly used in their own work or simulations. |
| Experiment Setup | Yes | For both (a) and (b), estimates were made via Gibbs sampling (MacEachern, 1994; Neal, 2000), with 10^5 burn-in sweeps and 2 × 10^5 sample sweeps. |
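
Since no source code is released, the following is a minimal sketch of how the posterior on the number of clusters under a Dirichlet process mixture could be estimated by collapsed Gibbs sampling with burn-in and sample sweeps, as the Experiment Setup row describes. It is not the authors' implementation. The univariate normal component family with known variance, the conjugate normal prior on the component mean, all hyperparameter defaults, and the function names `predictive` and `gibbs_num_clusters` are assumptions chosen for illustration. The default sweep counts are far smaller than the 10^5 and 2 × 10^5 reported in the paper.

```python
import math
import random
from collections import Counter


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive(x, n, s, mu0, tau2, sigma2):
    """Posterior predictive density of x for a cluster holding n points with sum s.

    Assumed model: x ~ N(theta, sigma2) with prior theta ~ N(mu0, tau2).
    With n == 0 this reduces to the prior predictive N(mu0, tau2 + sigma2).
    """
    precision = 1.0 / tau2 + n / sigma2
    post_mean = (mu0 / tau2 + s / sigma2) / precision
    return normal_pdf(x, post_mean, 1.0 / precision + sigma2)


def gibbs_num_clusters(data, alpha=1.0, mu0=0.0, tau2=1.0, sigma2=1.0,
                       burn_in=100, samples=200, seed=0):
    """Estimate the posterior on the number of clusters (hypothetical helper)."""
    rng = random.Random(seed)
    z = [0] * len(data)            # every point starts in a single cluster
    counts = {0: len(data)}        # cluster label -> number of points
    sums = {0: sum(data)}          # cluster label -> sum of its points
    next_label = 1
    tally = Counter()
    for sweep in range(burn_in + samples):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            c = z[i]
            counts[c] -= 1
            sums[c] -= x
            if counts[c] == 0:
                del counts[c], sums[c]
            # Existing cluster: weight is size times predictive density.
            labels = list(counts)
            weights = [counts[k] * predictive(x, counts[k], sums[k], mu0, tau2, sigma2)
                       for k in labels]
            # New cluster: weight is alpha times prior predictive density.
            labels.append(next_label)
            weights.append(alpha * predictive(x, 0, 0.0, mu0, tau2, sigma2))
            new = rng.choices(labels, weights=weights)[0]
            if new == next_label:
                counts[new] = 0
                sums[new] = 0.0
                next_label += 1
            counts[new] += 1
            sums[new] += x
            z[i] = new
        if sweep >= burn_in:
            tally[len(counts)] += 1   # record the cluster count for this sweep
    return {k: v / samples for k, v in sorted(tally.items())}


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(0.0, 1.0) for _ in range(50)]  # one true component
    print(gibbs_num_clusters(data))
```

The returned dictionary maps each observed number of clusters to its estimated posterior probability. Running it on data drawn from a single component, for increasing sample sizes, is one way to examine empirically whether the posterior concentrates at the true number of components.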