Nonparametric Bayesian Aggregation for Massive Data
Authors: Zuofeng Shang, Botao Hao, Guang Cheng
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Section 5 provides a simulation study to justify our methods. Section 6 applies the proposed procedures to a real dataset of large size." |
| Researcher Affiliation | Academia | Zuofeng Shang EMAIL Department of Mathematical Sciences New Jersey Institute of Technology Newark, NJ 07102, USA; Botao Hao EMAIL Department of Statistics Purdue University West Lafayette, IN 47906, USA; Guang Cheng EMAIL Department of Statistics Purdue University West Lafayette, IN 47906, USA |
| Pseudocode | No | The paper describes the aggregation procedures (e.g., in Section 2.2 and 4.2) using mathematical formulas and descriptive text, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | "Other results and additional plots are given in a supplementary document Shang and Cheng." This statement does not indicate that the source code for the methodology described in the paper is openly available. No specific repository link or explicit code-release statement is provided. |
| Open Datasets | No | "The MSD is a perfect example of large dataset, a freely-available collection of audio features and metadata for a million contemporary popular music tracks." ... "The data consists of flight arrival and departure information for all commercial flights within the United States, from October 1987 to April 2008." While these datasets are described as available, the paper does not provide specific links, DOIs, repository names, or formal citations with author names and years for accessing them. |
| Dataset Splits | No | "For ACR and FCR, we chose the number of divisions s = 1,2,3,4,5,6,8,10,12,15,20,24,30,40,60." ... "We randomly split the observations to s = 5,10,20 subsets." ... "We randomly split the observations to s = 10,100,500 subsamples." This text describes how the full dataset was divided into subsets for the distributed aggregation procedure, not how it was partitioned into training, validation, or test sets for model evaluation. No percentages or sample counts for conventional dataset splits are provided. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to conduct the experiments or simulations. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | "We choose m = β = 2 in our GP prior (4)." "Results are based on N = 1200 observations generated from (1) and a GP prior (4) with m = β = 2 and λ = N^(-2/3)." ... "A set of credibility levels were examined, i.e., 1-α = 0.1, 0.3, 0.5, 0.7, 0.9, 0.95." |
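The setup the table points to (N = 1200 observations, λ = N^(-2/3), a random split into s subsets followed by aggregation of the subset estimates) can be sketched as follows. This is a minimal divide-and-conquer illustration, not the paper's procedure: kernel ridge regression with an RBF kernel stands in for the GP posterior mean, and the true function `f`, the noise level, and the kernel bandwidth `gamma` are all assumptions made only for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_predict(x_tr, y_tr, x_te, lam, gamma=10.0):
    """Kernel ridge regression on one subset (RBF kernel).

    A frequentist stand-in for a GP-posterior-mean estimate;
    the RBF kernel and gamma are assumptions of this sketch.
    """
    n = len(x_tr)
    K = np.exp(-gamma * (x_tr[:, None] - x_tr[None, :]) ** 2)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y_tr)
    K_te = np.exp(-gamma * (x_te[:, None] - x_tr[None, :]) ** 2)
    return K_te @ alpha

# Simulated data loosely following the reported setup.
N, s = 1200, 5                                # N observations, s random subsets
f = lambda x: np.sin(2 * np.pi * x)           # hypothetical true regression function
x = rng.uniform(0.0, 1.0, N)
y = f(x) + 0.3 * rng.standard_normal(N)
lam = N ** (-2.0 / 3.0)                       # lambda = N^(-2/3), as quoted above

# Randomly split the observation indices into s subsets.
subsets = np.array_split(rng.permutation(N), s)

# Divide-and-conquer aggregation: estimate on each subset, then average.
x_te = np.linspace(0.0, 1.0, 50)
preds = [fit_predict(x[idx], y[idx], x_te, lam) for idx in subsets]
f_bar = np.mean(preds, axis=0)

rmse = np.sqrt(np.mean((f_bar - f(x_te)) ** 2))
```

The averaging step is the essence of the aggregation procedures the table refers to: each machine sees only its own subset, and the aggregated estimate `f_bar` is the simple mean of the s subset estimates.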