Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Approximate Bayesian Computation via Classification
Authors: Yuexi Wang, Tetsuya Kaji, Veronika Ročková
JMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the usefulness of our approach on simulated examples as well as real data in the context of stock volatility estimation. Keywords: Approximate Bayesian Computation, Classification, Likelihood-free Inference, Kullback-Leibler Divergence, Posterior Concentration. Section 4 shows performance on simulated datasets and Section 5 further highlights the practical value of our approach on real data. |
| Researcher Affiliation | Academia | Booth School of Business University of Chicago Chicago, IL 60637, USA. Email domains are @chicagobooth.edu for all authors. |
| Pseudocode | Yes | Algorithm 1: KL-ABC with Accept-Reject. Algorithm 2: KL-ABC with Exponential Weighting. |
| Open Source Code | No | The paper does not provide explicit links to source code or statements about its availability. It mentions using the 'R package randomForest' and the 'R package glmnet', which are third-party tools, but not the authors' own implementation code. |
| Open Datasets | Yes | Following the example in Rogers and Zhou (2008), we examine a small data set of stock prices focusing on two stocks: Boeing (BA) and Procter & Gamble (PG). The prices were obtained from NYSE (Yahoo Finance), starting from 3rd January 2011 and consisting of 1,000 trading days. |
| Dataset Splits | No | The paper describes generating synthetic data and using real stock price data for inference. However, for the real stock data (Boeing and Procter & Gamble prices from NYSE), there is no mention of splitting this data into distinct training, validation, or test sets; it appears to be used as a single dataset for Bayesian inference, which typically does not involve such splits. |
| Hardware Specification | No | The paper discusses computational complexity and provides computation times in Appendix G, but it does not specify any particular hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | For the former, we use the default setting in the R package randomForest. For the latter, we implement the discriminator with R package glmnet... The paper mentions specific R packages ('randomForest' and 'glmnet') but does not provide version numbers for these packages or for the R environment itself. No other key software dependencies are listed with versions. |
| Experiment Setup | Yes | The prior on (θ1, θ2/θ1, θ3) is uniform on [0, 10]² × [0, 0.5]. In each experiment, unless otherwise noted, we set the tolerance threshold ϵ adaptively such that 1,000 of 100,000 (i.e. the top 1%) proposed ABC samples are accepted. For the DNN approach, we deploy a 3-layer DNN with 100 neurons and hyperbolic tangent (tanh) activation on each hidden layer. The model is trained on 10⁶ samples and validated on 10⁵ samples, with early stopping once the validation error starts to increase. |
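The setup summarized above (a classifier scoring simulated data against observed data, with the tolerance set adaptively so that the top 1% of proposals are accepted) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian toy model, the plain-NumPy logistic discriminator, and all parameter values (`n`, `n_prop`, learning rate) are hypothetical choices; the paper's Algorithm 1 pairs its own simulators with discriminators such as randomForest, glmnet, or a DNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n, rng):
    # Toy simulator: Gaussian with unknown mean (stand-in for the paper's models).
    return rng.normal(theta, 1.0, size=n)

def kl_via_logistic(x_obs, x_sim, steps=500, lr=0.1):
    """Plug-in KL estimate from a logistic discriminator.

    Trains D(x) = sigmoid(a*x + b) to separate observed (label 1) from
    simulated (label 0); the mean log-odds over the observed sample is a
    classification-based estimate of KL(p_obs || p_theta).
    """
    x = np.concatenate([x_obs, x_sim])
    y = np.concatenate([np.ones_like(x_obs), np.zeros_like(x_sim)])
    a, b = 0.0, 0.0
    for _ in range(steps):
        z = np.clip(a * x + b, -30, 30)      # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        a -= lr * np.mean((p - y) * x)       # gradient of logistic loss w.r.t. a
        b -= lr * np.mean(p - y)             # gradient w.r.t. b
    return np.mean(a * x_obs + b)            # mean log-odds on observed data

# KL-ABC with accept-reject: propose from the prior, rank by estimated KL,
# and keep the top 1% (adaptive tolerance, as described in the setup).
n, n_prop, top = 200, 500, 0.01
x_obs = simulate(2.0, n, rng)                        # "observed" data, true theta = 2
props = rng.uniform(0.0, 10.0, size=n_prop)          # uniform prior on [0, 10]
discs = np.array([kl_via_logistic(x_obs, simulate(t, n, rng)) for t in props])
eps = np.quantile(discs, top)                        # tolerance = 1% quantile of discrepancies
accepted = props[discs <= eps]
print(round(accepted.mean(), 2))
```

Because the log density ratio of two unit-variance Gaussians is linear in x, the logistic discriminator here is well-specified; accepted proposals concentrate near the true mean. In the paper the same accept-reject scheme is driven by more flexible classifiers.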