Sufficient reductions in regression with mixed predictors
Authors: Efstathia Bura, Liliana Forzani, Rodrigo Garcia Arancibia, Pamela Llop, Diego Tomassi
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the performance of the proposed method and compare it with other approaches through simulations and real data examples. Section 5 contains an extensive simulation study that demonstrates the competitive performance of our approach. Furthermore, we show the superior performance of our methods as compared with generalized linear models and a version of principal component regression that allows for mixed predictors in the analysis of three data sets in Section 6. |
| Researcher Affiliation | Academia | Efstathia Bura EMAIL Institute of Statistics and Mathematical Methods in Economics, Faculty of Mathematics and Geoinformation, TU Wien, Vienna, 1040, Austria; Liliana Forzani EMAIL Facultad de Ingeniería Química, Universidad Nacional del Litoral, Researcher of CONICET, Santa Fe, Argentina; Rodrigo García Arancibia EMAIL Instituto de Economía Aplicada Litoral-FCE-UNL, Universidad Nacional del Litoral, Researcher of CONICET, Santa Fe, Argentina; Pamela Llop EMAIL Facultad de Ingeniería Química, Universidad Nacional del Litoral, Researcher of CONICET, Santa Fe, Argentina; Diego Tomassi EMAIL Facultad de Ingeniería Química, Universidad Nacional del Litoral, Researcher of CONICET, Santa Fe, Argentina |
| Pseudocode | No | The paper describes its methodology using mathematical formulations and prose. It does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | The R code we used in both simulations and real data analyses in Section 6 can be found at https://github.com/lforzani/SDR_mixed_predictions. |
| Open Datasets | Yes | Krzanowski (1975) studied the problem of discriminating between two groups... We analyze four of the five data sets in Krzanowski's paper... Governance Indicators and per capita GDP data can be downloaded from Worldwide Governance Indicators and The World Bank Data, respectively. |
| Dataset Splits | Yes | The prediction error is computed as $\|P_{\alpha^T(X_N,H_N)} - P_{\widehat{\alpha}^T(X_N,H_N)}\|^2$, where $(X_N, H_N)$ is a new sample of size N = 2000 that is independent of the training sample. In Table 4 we report the leave-one-out misclassification rate... The average of the leave-one-out mean square prediction errors of the linear and kernel regression models in Table 5 provides an unbiased measure of predictive performance. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. It focuses on the methodology and results without specifying details such as GPU models, CPU types, or other computational resources. |
| Software Dependencies | No | The R code we used in both simulations and real data analyses in Section 6 can be found at https://github.com/lforzani/SDR_mixed_predictions. Using the np R package, the value of the nonparametric version of R2 is 0.32 for the PFC-based CG index, which is much lower than 0.54, the value for the PCA-based index. |
| Experiment Setup | Yes | In all our simulations, the response is generated from the uniform distribution on the integers {1, . . . , r + 1}, with r = 5, and we set $f_{yj} = I(y = j) - n_j/n$, where I is the indicator function, n denotes the total sample size and $n_j$ the number of observations in category j, for j = 1, . . . , r. All reported results are based on sample sizes n = 100, 200, 300, 500, 750, and 100 repetitions. Selection of the hyperparameters (λ, γ) in (39) is carried out via 10-fold cross validation and minimizing the prediction error as optimization criterion. The procedure starts by estimating an upper bound λm so that the whole estimate vanishes for any λ > λm. We then set a grid of nλ candidate values for λ, uniformly spaced on a logarithmic scale between 0 and λm. Here, we set nλ = 100. For γ, we consider 11 values uniformly spaced in [0, 1]. |
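The hyperparameter search quoted above (a log-spaced λ grid below an upper bound λm, an 11-point γ grid on [0, 1], and 10-fold cross validation minimizing prediction error) can be sketched as follows. This is a minimal illustration, not the authors' R code: the `fit` and `predict_fn` callables stand in for the paper's penalized estimator, and the small positive floor on the λ grid is an assumption, since a log scale cannot reach 0 exactly.

```python
import numpy as np

def cv_select(X, y, fit, predict_fn, lam_max,
              n_lam=100, n_gamma=11, n_folds=10, seed=0):
    """Pick (lambda, gamma) by k-fold CV, minimizing mean squared prediction error.

    fit(X, y, lam, gamma) -> model and predict_fn(model, X) -> predictions are
    placeholders for the estimator; lam_max is the bound above which the whole
    estimate vanishes, as in the paper's procedure.
    """
    # Log-spaced grid up to lam_max; a strictly positive floor (assumed here)
    # stands in for the paper's "between 0 and lambda_m".
    lams = np.logspace(np.log10(lam_max * 1e-4), np.log10(lam_max), n_lam)
    gammas = np.linspace(0.0, 1.0, n_gamma)  # 11 values uniform on [0, 1]

    # Assign each observation to one of n_folds folds at random.
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds

    best, best_err = None, np.inf
    for lam in lams:
        for gamma in gammas:
            errs = []
            for k in range(n_folds):
                tr, te = folds != k, folds == k
                model = fit(X[tr], y[tr], lam, gamma)
                errs.append(np.mean((predict_fn(model, X[te]) - y[te]) ** 2))
            err = np.mean(errs)
            if err < best_err:  # keep the pair with smallest CV error
                best, best_err = (lam, gamma), err
    return best
```

With n_lam = 100 and n_gamma = 11 as in the paper, this evaluates 1,100 candidate pairs, each requiring 10 model fits.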