Bayesian Quantification with Black-Box Estimators

Authors: Albert Ziegler, Paweł Czyż

TMLR 2024

Reproducibility assessment (variable, result, and supporting excerpt):
Research Type: Experimental — "We compare the introduced model against the established point estimators in a variety of scenarios, and show it is competitive with, and in some cases superior to, the non-Bayesian alternatives." (Section 4, Experimental results)
Researcher Affiliation: Collaboration — Albert Ziegler (EMAIL; XBOW, Head of AI, Uppsala, Sweden) and Paweł Czyż (EMAIL; ETH AI Center and Department of Biosystems Science and Engineering, ETH Zürich, Zürich, Switzerland)
Pseudocode: No — The paper describes algorithms such as Expectation-Maximization and the Gibbs sampler in prose, and refers to the NUTS algorithm, but does not present them in a structured pseudocode or algorithm block.
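To make the prose-only Expectation-Maximization procedure concrete, here is a minimal sketch (not taken from the paper; the function name and defaults are illustrative) of EM for estimating mixture proportions π from discrete black-box outputs, assuming the matrix P(C | Y) is known:

```python
import numpy as np

def em_prevalence(phi, counts, n_iter=1000, tol=1e-10):
    """EM for the prevalence vector pi, given phi[y, c] = P(C = c | Y = y)
    and observed counts of each black-box output C on the unlabeled set."""
    L, K = phi.shape
    pi = np.full(L, 1.0 / L)  # start from the uniform prevalence
    total = counts.sum()
    for _ in range(n_iter):
        # E-step: responsibilities P(Y = y | C = c) under the current pi
        joint = pi[:, None] * phi          # shape (L, K)
        resp = joint / joint.sum(axis=0)   # normalize over y
        # M-step: re-estimate pi from the expected label counts
        new_pi = (resp * counts).sum(axis=1) / total
        if np.max(np.abs(new_pi - pi)) < tol:
            return new_pi
        pi = new_pi
    return pi
```

With an invertible phi and exact expected counts, the estimate recovers the true prevalence; on finite samples it returns the maximum-likelihood proportions.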
Open Source Code: Yes — "The code and workflows used to run the experiments and generate the figures are available in the https://github.com/pawel-czyz/labelshift repository."
Open Datasets: Yes — "Darmanis et al. (2017) collected biopsy specimens from four glioblastoma multiforme tumors corresponding to four different populations of cells. [...] We downloaded the TPM-normalized (Zhao et al., 2021) data sequenced by Darmanis et al. (2017) from the Curated Cancer Cell Atlas."
Dataset Splits: Yes — "We fix the data set sizes N = 10³ and N′ = 500 and use L = K = 5 as a default setting. [...] We consider a semi-realistic scenario in which one wants to estimate cell prevalence in an automated fashion employing a given black-box cell type classifier. We treat the first two samples as an auxiliary cell atlas on which a generic black-box cell type classifier was trained (we use a random forest), the third sample as an available labeled data set, {(X_i, Y_i)}, and the fourth sample as an unlabeled data set, {X_j}, for the quantification problem."
Hardware Specification: Yes — "Experiments described in Appendices E.1, E.4, and E.5 were run on a laptop with 32 GiB RAM and 16 CPU cores clocked at 4680 MHz and finished under six hours. Experiments described in Appendices E.2 and E.3 [...] We ran them sequentially on a cluster equipped with 384 GiB RAM and 128 CPU cores clocked at 2.25–3.7 GHz."
Software Dependencies: Yes — "As a random forest we used the scikit-learn implementation (Pedregosa et al., 2011, v. 1.4.1) with default hyperparameters and 20 trees."
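The black-box classifier setup can be sketched as follows; this is an illustrative stand-in using synthetic data (the paper trains on cell-atlas data), with only the reported choices — scikit-learn's random forest, 20 trees, otherwise default hyperparameters — taken from the excerpt above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the auxiliary training data (cell atlas in the paper)
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] > 0).astype(int)

# Black-box classifier: 20 trees, default hyperparameters, as reported
clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(X_train, y_train)

# Discrete outputs C on the unlabeled set, which the quantifier consumes
X_unlabeled = rng.normal(size=(50, 4))
hard_labels = clf.predict(X_unlabeled)
```

The quantification step then only sees the counts of each predicted class, which is what makes the estimator "black-box": no probabilities or internals of the forest are required.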
Experiment Setup: Yes — "We fix the data set sizes N = 10³ and N′ = 500 and use L = K = 5 as a default setting. The ground-truth prevalence vectors are parametrized as π = (1/L, ..., 1/L) and π^(r) = (r, (1−r)/(L−1), ..., (1−r)/(L−1)). By default, we use r = 0.7. The ground-truth matrix P(C | Y) is parameterized as φ_yy = q and φ_yk = (1−q)/(K−1) for k ≠ y and K = L, with the default value q = 0.85. [...] For each simulated data set, we ran four Markov chains with 500 warm-up steps and 1000 samples each using the NUTS algorithm of Hoffman & Gelman (2014)."
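The ground-truth parametrization above can be written out directly; this is a minimal NumPy sketch (the function name is illustrative, and the NUTS sampling step is omitted), with defaults r = 0.7 and q = 0.85 as in the excerpt:

```python
import numpy as np

def ground_truth_params(L=5, K=5, r=0.7, q=0.85):
    """Ground-truth prevalence vector pi^(r) and matrix P(C | Y)
    as parametrized in the experiment setup."""
    # pi^(r): mass r on the first class, the remainder shared equally
    pi = np.full(L, (1 - r) / (L - 1))
    pi[0] = r
    # phi[y, c] = P(C = c | Y = y): q on the diagonal, rest shared equally
    phi = np.full((L, K), (1 - q) / (K - 1))
    np.fill_diagonal(phi, q)
    return pi, phi
```

Setting r = 1/L recovers the uniform vector π = (1/L, ..., 1/L); larger q makes the black-box outputs more informative about the true labels.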