Fisher Consistency for Prior Probability Shift
Author: Dirk Tasche
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The usefulness of this unbiasedness concept is demonstrated with three examples of classifiers used for quantification: Adjusted Count, EM-algorithm and CDE-Iterate. We find that Adjusted Count and EM-algorithm are Fisher consistent. A counter-example shows that CDE-Iterate is not Fisher consistent and, therefore, cannot be trusted to deliver reliable estimates of class probabilities. ... We present the counter-examples as a simulation and estimation experiment that is executed for each of the three following example models: ... Table 1 shows the class 0 prevalence estimates made in the double binormal setting of this section. |
| Researcher Affiliation | Industry | Dirk Tasche EMAIL Swiss Financial Market Supervisory Authority FINMA Laupenstrasse 27 3003 Bern Switzerland |
| Pseudocode | Yes | CDE-Iterate algorithm 1) Set initial parameters: k = 0, c^(0)_0 = 1, c^(0)_1 = 1. 2) Find Bayes classifier under training distribution P(X, Y): ... 6) If convergence is reached or k = kmax then stop, and accept q^(k)_1 as the CDE-Iterate estimate of Q[Y = 0]. Else continue with step 2. |
| Open Source Code | No | The R-scripts used for creating the tables and figures of this paper can be received upon request from the author. |
| Open Datasets | No | The paper uses synthetic data generated through Monte-Carlo simulations based on models (e.g., "classical binormal model"). While it mentions the "artificial data set in Karpov et al. (2016)", it does not provide concrete access information or links to its own generated datasets for public access. |
| Dataset Splits | Yes | For both data sets we have used stratified sampling such that the proportion of (x_{i,tr}, y_{i,tr}) with y_{i,tr} = 0 in the training set is exactly P[Y = 0], and the proportion of (x_{i,te}, y_{i,te}) with y_{i,te} = 0 in the test set is exactly Q[Y = 0]. The sample sizes for both the training and the test set samples have been chosen to be 10,000, i.e. m = n = 10,000. |
| Hardware Specification | No | The paper does not mention any specific hardware (e.g., GPU/CPU models, memory, or cloud resources) used for running its experiments. |
| Software Dependencies | No | The paper mentions "Logistic regression as coded by R Core Team (2014)". While it names R as the software, it does not provide specific version numbers for R or any other libraries/packages used. |
| Experiment Setup | Yes | For this section's numerical experiment, the following parameter values have been chosen: µ = 0, ν = 2, σ = 1. ... For each model, we consider a training set with class probabilities 50%, combined with test sets with class 0 probabilities 1%, 5%, 10%, 30%, 50%, 70%, 90%, 95% and 99%. |
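The paper's R scripts are only available on request, but the reported setup (classical binormal model with µ = 0, ν = 2, σ = 1; stratified samples of size m = n = 10,000; 50% training prevalence) is simple enough to re-sketch. Below is a minimal, hedged Python sketch of the Adjusted Count quantifier in that setting — not the author's code. The decision threshold (µ + ν)/2, the random seed, and the choice of a 10% test prevalence are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Binormal model parameters as quoted from the paper: mu = 0, nu = 2, sigma = 1.
MU, NU, SIGMA = 0.0, 2.0, 1.0
M = N = 10_000  # training and test sample sizes, as in the paper
THRESH = (MU + NU) / 2  # Bayes boundary for equal priors (our assumption)

def sample(prev0, size, rng):
    """Draw (x, y) with exact class-0 proportion prev0 (stratified sampling)."""
    n0 = int(round(prev0 * size))
    x = np.concatenate([rng.normal(MU, SIGMA, n0),        # class-0 features
                        rng.normal(NU, SIGMA, size - n0)]) # class-1 features
    y = np.concatenate([np.zeros(n0, int), np.ones(size - n0, int)])
    return x, y

# Training set with 50% class-0 prevalence; classify as class 0 iff x < THRESH.
x_tr, y_tr = sample(0.5, M, rng)
p00 = np.mean(x_tr[y_tr == 0] < THRESH)  # rate of classifying true 0s as 0
p01 = np.mean(x_tr[y_tr == 1] < THRESH)  # rate of classifying true 1s as 0

# Test set under prior probability shift, here Q[Y = 0] = 10% (our choice).
x_te, _ = sample(0.10, N, rng)
raw = np.mean(x_te < THRESH)             # naive Classify & Count estimate
adjusted = (raw - p01) / (p00 - p01)     # Adjusted Count correction

print(round(raw, 3), round(adjusted, 3))
```

Consistent with the paper's finding that Adjusted Count is Fisher consistent, the corrected estimate lands near the true 10% test prevalence, while the raw Classify & Count estimate stays biased toward the 50% training prevalence.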
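The second quantifier the paper finds Fisher consistent is the EM-algorithm (the prior-adjustment iteration of Saerens et al.). A minimal Python sketch under the same binormal model is below; it is an illustration under our assumptions (exact Gaussian posteriors instead of a fitted logistic regression, a 30% test prevalence, a fixed iteration count), not the author's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
MU, NU, SIGMA = 0.0, 2.0, 1.0   # binormal parameters from the paper
P0 = 0.5                        # training prior P[Y = 0], as in the paper
Q0_TRUE = 0.30                  # shifted test prior to be recovered (our choice)

def gauss_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Stratified test sample under the shifted prior.
n = 10_000
n0 = int(Q0_TRUE * n)
x = np.concatenate([rng.normal(MU, SIGMA, n0), rng.normal(NU, SIGMA, n - n0)])

# Exact training posterior P[Y = 0 | x] under the binormal model
# (a stand-in for the paper's fitted logistic regression).
f0, f1 = gauss_pdf(x, MU, SIGMA), gauss_pdf(x, NU, SIGMA)
post0_tr = P0 * f0 / (P0 * f0 + (1 - P0) * f1)

# EM / prior-adjustment iteration: reweight the training posterior by the
# current prior estimate, then update the prior as the mean posterior.
q0 = P0  # start from the training prior
for _ in range(200):
    w0 = q0 / P0 * post0_tr
    w1 = (1 - q0) / (1 - P0) * (1 - post0_tr)
    q0 = np.mean(w0 / (w0 + w1))

print(round(q0, 3))
```

Up to sampling noise, the iteration converges to the true shifted prevalence of 30%, in line with the paper's Fisher consistency result for the EM-algorithm.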