A Forward Approach for Sufficient Dimension Reduction in Binary Classification
Authors: Jongkyeong Kang, Seung Jun Shin
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a simulation study to evaluate the finite-sample performance of the wOPG method. We set π_h = h/10, h = 1, ..., 9, and employ the Gaussian kernel K(x, x′) = exp{−‖x − x′‖²/(2σ²)} for the RKHS, with σ being the median of the pairwise distances between the predictors in the positive and negative classes (Jaakkola et al., 1999). ... Table 1 contains the averaged d(B̂, B) over 100 independent repetitions under models (I)–(III) with different combinations of (n, p) ∈ {500, 1000} × {10, 20}. In Table 1, one can observe that SAVE, pHd, and DR, developed under the regression context, exhibit worse performance than the rest of the methods carefully designed for the binary response. ... We applied both wOPG-LR and wOPG-SVM to the Breast Cancer Coimbra (BCC) data available at the UCI machine learning repository (https://archive.ics.uci.edu/ml/index.php). ... Finally, we conducted a validation study in order to evaluate the effect of SDR in terms of classification performance. Toward this, we randomly split the data into training and test sets ... These steps were repeated independently a hundred times, and Figure 3 compares the boxplots of test error rates for different SDR methods. |
| Researcher Affiliation | Academia | Jongkyeong Kang, Department of Information Statistics, Kangwon National University, Gangwon-do, 24341, Korea, and Department of Statistics, Korea University, Seoul, 02841, Korea; Seung Jun Shin, Department of Statistics, Korea University, Seoul, 02841, Korea |
| Pseudocode | Yes | Appendix B. Computing Algorithms. In this section, we suppress π for the sake of simplicity. Let α = (α_0, ..., α_n)^⊤, and ω_ij = w_s(x_i − x_j). B.1 wOPG-LR: For the wOPG-LR, we have the following objective function for (18). ... (followed by equations and descriptions of iterative updates) B.2 wOPG-SVM: For the wOPG-SVM, (18) with the hinge loss can be equivalently written as ... (followed by equations and descriptions of iterative updates) |
| Open Source Code | No | The paper does not provide an explicit statement or link to its source code. While it mentions the Journal of Machine Learning Research and a CC-BY 4.0 license for the paper itself, this does not pertain to the implementation code of the methodology. A dataset repository link (UCI) is provided, but no code repository. |
| Open Datasets | Yes | We applied both wOPG-LR and wOPG-SVM to the Breast Cancer Coimbra (BCC) data available at the UCI machine learning repository (https://archive.ics.uci.edu/ml/index.php). The BCC data contains breast cancer diagnosis results for 116 patients with nine continuous predictors including age, body mass index, and seven measurements from the blood test, i.e., glucose, insulin, homeostatic model assessment (HOMA), leptin, adiponectin, resistin, and monocyte chemoattractant protein-1 (MCP-1). See Hosni et al. (2019) for more details about the data. |
| Dataset Splits | Yes | Finally, we conducted a validation study in order to evaluate the effect of SDR in terms of classification performance. Toward this, we randomly split the data into training and test sets denoted by D^tr = {(y_1^tr, x_1^tr), ..., (y_58^tr, x_58^tr)} and D^ts = {(y_1^ts, x_1^ts), ..., (y_58^ts, x_58^ts)}, respectively. We then applied various SDR methods to D^tr and obtained the estimated basis of S_{Y|X} denoted by B̂^tr. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU or CPU models, or specific cloud instances. |
| Software Dependencies | No | The paper does not provide specific details about ancillary software, such as library names with version numbers, that are needed to replicate the experiment. |
| Experiment Setup | Yes | We set π_h = h/10, h = 1, ..., 9, and employ the Gaussian kernel K(x, x′) = exp{−‖x − x′‖²/(2σ²)} for the RKHS, with σ being the median of the pairwise distances between the predictors in the positive and negative classes (Jaakkola et al., 1999). Tuning parameters λ and θ_ℓ, ℓ = 0, 2, ..., p, and the bandwidth parameter s are chosen as described in Section 4.2. ... In this article, we set π_h = h/(H + 1), h = 1, ..., H with H = 9, i.e., π_h = h/10, h = 1, ..., 9, and thus δ = 0.1. |
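The kernel setup quoted above (a Gaussian kernel with σ set to the median pairwise distance between predictors, per Jaakkola et al., 1999) is standard and easy to sketch. Below is a minimal, hypothetical Python illustration of that heuristic; the function names `median_heuristic_bandwidth` and `gaussian_kernel` are our own and do not come from the paper's (unreleased) code.

```python
import numpy as np

def median_heuristic_bandwidth(X):
    """Median of pairwise Euclidean distances between the rows of X.

    Illustrates the Jaakkola et al. (1999) heuristic for choosing
    sigma in a Gaussian RKHS kernel; a sketch, not the authors' code.
    """
    n = X.shape[0]
    dists = [np.linalg.norm(X[i] - X[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.median(dists))

def gaussian_kernel(x, x_prime, sigma):
    """K(x, x') = exp{-||x - x'||^2 / (2 sigma^2)}."""
    return float(np.exp(-np.linalg.norm(x - x_prime) ** 2 / (2 * sigma ** 2)))
```

In the paper's data application, the heuristic would be applied to the pairwise distances between positive- and negative-class predictors rather than all pairs, but the computation is the same in form.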