Bayesian Probabilistic Numerical Integration with Tree-Based Models

Authors: Harrison Zhu, Xing Liu, Ruya Kang, Zhichao Shen, Seth Flaxman, François-Xavier Briol

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The advantages and disadvantages of this new methodology are highlighted on a set of benchmark tests including the Genz functions, and on a Bayesian survey design problem."
Researcher Affiliation | Academia | Harrison Zhu, Xing Liu (Imperial College London, EMAIL); Ruya Kang (Brown University, EMAIL); Zhichao Shen (University of Oxford, EMAIL); Seth Flaxman (Imperial College London, EMAIL); François-Xavier Briol (University College London, EMAIL)
Pseudocode | Yes | Algorithm 1: "Sequential Design for BART-Int"
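The paper's Algorithm 1 itself is not reproduced on this page. As a rough, generic illustration of the kind of sequential-design loop it names (fit a surrogate, pick the candidate point the model is most uncertain about, add it to the design, refit), consider the sketch below. Everything here is a stand-in: the `uncertainty` score uses distance to the nearest design point as a hypothetical proxy for BART's posterior variance, and the function names are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def uncertainty(x, design):
    # Hypothetical stand-in for the surrogate's posterior variance at x:
    # the distance from x to the nearest existing design point.
    return np.min(np.linalg.norm(design - x, axis=1))

def sequential_design(f, d, n_ini, n_seq, n_cand):
    # Initial design: n_ini points drawn uniformly on [0, 1]^d.
    design = rng.uniform(size=(n_ini, d))
    for _ in range(n_seq):
        # Fresh candidate pool each round; score every candidate.
        candidates = rng.uniform(size=(n_cand, d))
        scores = [uncertainty(x, design) for x in candidates]
        best = candidates[int(np.argmax(scores))]
        # Augment the design with the most "uncertain" point; a real
        # implementation would refit the surrogate (e.g. BART) here.
        design = np.vstack([design, best])
    return design, f(design)

# Toy integrand on [0, 1]^2, with n_ini = 20d initial points as in the paper.
X, y = sequential_design(lambda X: np.sin(X).sum(axis=1), d=2,
                         n_ini=20 * 2, n_seq=5, n_cand=100)
```

The loop deliberately omits the model-fitting step so the sketch stays self-contained; only the select-augment-refit structure is meant to mirror the described design.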
Open Source Code | No | The paper states that an external tool `dbarts` was used ("For BART-Int, we used the default prior settings in dbarts [20]"), but it does not provide a link or explicit statement about the release of its own source code for the methodology described.
Open Datasets | Yes | "We use individual-level anonymised census data from the United States [79]" ... [79] U.S. Census Bureau. American Community Survey, 2012-2016 ACS 5-Year PUMS Files. Technical report, U.S. Department of Commerce, January 2018.
Dataset Splits | No | The paper describes how data points were selected for sequential design and numerical integration (e.g., "n_ini = 20d design points", "n_seq = 20d additional points"), and how ground truth was computed for evaluation, but it does not specify traditional train/validation/test dataset splits with percentages or counts for model training or hyperparameter tuning.
Hardware Specification | No | The paper discusses computational complexity and run-times (Figure 2) but does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using "dbarts [20]" for BART-Int but does not specify a version number for this or any other software dependency.
Experiment Setup | Yes | For BART-Int, we used the default prior settings in dbarts [20], whereas for GP-BQ we used a Matérn kernel whose lengthscale was chosen through maximum likelihood. ... The MAPE is given by (1/r) Σ_{t=1}^{r} |Π[f] − Π̂_t[f]| / |Π[f]|, where Π̂_t[f], for t = 1, ..., r, are estimates of Π[f] for r different initial i.i.d. uniform point sets. ... BART-Int (m = 1500, T = 200; m = 1000, T = 50, with a burn-in of 1000 and keeping every 5 samples afterwards) ... The number of post-burn-in samples is chosen to be 10^4. We set γ = 2, d_i = 0.5i and c_i = 0.2i. ... We randomly select our initial set (of size n_ini = 20) and candidate set (of size S = 10,000).
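The quoted MAPE definition is straightforward to compute: average the absolute error of the r repeated estimates and normalise by the magnitude of the true integral. A minimal sketch (the function name `mape` and the example numbers are ours, not from the paper):

```python
import numpy as np

def mape(true_value, estimates):
    """Mean absolute percentage error of r repeated integral estimates.

    true_value: scalar ground truth Pi[f]
    estimates:  array of r estimates Pi_hat_t[f], one per initial point set
    """
    estimates = np.asarray(estimates, dtype=float)
    # (1/r) * sum_t |Pi[f] - Pi_hat_t[f]| / |Pi[f]|
    return np.mean(np.abs(true_value - estimates)) / abs(true_value)

# Toy example: true integral 2.0, three repeated estimates.
error = mape(2.0, [1.9, 2.1, 2.05])
```

Because |Π[f]| is constant across the r repetitions, dividing the mean absolute error once is equivalent to averaging the per-run relative errors.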