More Efficient Estimation for Logistic Regression with Optimal Subsamples
Authors: HaiYing Wang
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of the more efficient estimators in terms of both estimation efficiency and computational efficiency in this section. For simulation, to compare with the original OSMAC estimator, we use exactly the same setup used in Section 5.1 of Wang et al. (2018). Specifically, the full data sample size N = 10,000 and the true value of β, β_t, is a 7 × 1 vector of 0.5. The following 6 distributions of x are considered: multivariate normal distribution with mean zero (mzNormal), multivariate normal distribution with nonzero mean (nzNormal), multivariate normal distribution with mean zero and unequal variances (ueNormal), mixture of two multivariate normal distributions with different means (mixNormal), multivariate t distribution with 3 degrees of freedom (T3), and exponential distribution (EXP). [...] Figure 1 presents the relative efficiency of β_uw and β_p based on two different choices of π_i^OS: π_i^Aopt and π_i^Lopt. It is seen that in general β_uw and β_p are more efficient than β_w. [...] We also calculate the empirical unconditional MSE by generating the full data in each repetition of the simulation. The results are similar and thus are omitted. To evaluate the performance of the proposed method with different choices of the subsampling probabilities for subsampling with replacement and Poisson subsampling, Figure 2 plots empirical MSEs of using π^Aopt, π^Lopt, π^lcc (local case-control), and the uniform subsampling probability. |
| Researcher Affiliation | Academia | HaiYing Wang, Department of Statistics, University of Connecticut, Storrs, CT 06269, USA |
| Pseudocode | Yes | Algorithm 1 More efficient estimation based on subsampling with replacement [...] Algorithm 2 More efficient estimation based on Poisson subsampling |
| Open Source Code | No | The paper does not contain any statement about releasing the code for the methodology described, nor any specific links to a code repository. |
| Open Datasets | Yes | We also apply the more efficient estimation methods to a supersymmetric (SUSY) benchmark data set (Baldi et al., 2014) available from the Machine Learning Repository (Dua and Karra Taniskidou, 2017). |
| Dataset Splits | Yes | We fix the first step sample size n0 = 200 and choose n to be 100, 200, 400, 600, 800, and 1000. This is the same setup used in Wang et al. (2018). [...] We use the more efficient estimation methods with subsample size n to estimate parameters in logistic regression. Figure 4 gives the relative efficiency of β_uw and β_p to β_w for both π_i^Lopt and π_i^Aopt. |
| Hardware Specification | Yes | All methods are implemented in the R programming language (R Core Team, 2017), and computations are carried out on a desktop running Ubuntu Linux 16.04 with an Intel I7 processor and 16GB RAM. Only one logical CPU is used for the calculation. [...] We also use a smaller computer with 8GB RAM to implement the method. |
| Software Dependencies | Yes | All methods are implemented in the R programming language (R Core Team, 2017) |
| Experiment Setup | Yes | The full data sample size N = 10,000 and the true value of β, β_t, is a 7 × 1 vector of 0.5. [...] We fix the first step sample size n0 = 200 and choose n to be 100, 200, 400, 600, 800, and 1000. This is the same setup used in Wang et al. (2018). [...] We set the value of d to d = 50, the values of N to be N = 10^4, 10^5, 10^6, and 10^7, and the subsample sizes to be n0 = 200 and n = 1000. |
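To make the two-step setup above concrete, here is a minimal sketch in Python (the paper's own implementation is in R) of the simulation with N = 10,000, β_t a 7 × 1 vector of 0.5, a uniform pilot of size n0 = 200, L-optimal subsampling probabilities, and Poisson subsampling with expected size n = 1000. The identity covariance for x and the simple Newton solver are assumptions for illustration; the paper's mzNormal design uses a correlated covariance matrix, and the proposed β_uw and β_p estimators additionally involve a bias correction not reproduced here — only the baseline weighted estimator β_w is sketched.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, w=None, iters=25, ridge=1e-6):
    """(Weighted) logistic-regression MLE via Newton's method."""
    n, d = X.shape
    if w is None:
        w = np.ones(n)
    beta = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1 - p))[:, None]).T @ X + ridge * np.eye(d)
        beta += np.linalg.solve(hess, grad)
    return beta

# Full-data generation: N = 10,000, beta_t = 7-vector of 0.5,
# x ~ N(0, I) (identity covariance is an assumed simplification).
N, d = 10_000, 7
beta_t = np.full(d, 0.5)
X = rng.standard_normal((N, d))
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ beta_t))).astype(float)

# Step 1: uniform pilot subsample of size n0 = 200.
n0, n = 200, 1000
pilot = rng.choice(N, size=n0, replace=False)
beta_pilot = fit_logistic(X[pilot], y[pilot])

# Step 2: L-optimal probabilities pi_i proportional to |y_i - p_i| * ||x_i||,
# capped at 1, then Poisson subsampling with expected size n.
p_full = 1.0 / (1.0 + np.exp(-X @ beta_pilot))
score = np.abs(y - p_full) * np.linalg.norm(X, axis=1)
pi = np.minimum(n * score / score.sum(), 1.0)
keep = rng.random(N) < pi

# Baseline inverse-probability-weighted estimator beta_w on the subsample.
beta_w = fit_logistic(X[keep], y[keep], w=1.0 / pi[keep])
print(beta_w)
```

On a run of this sketch the subsample has roughly n points, and β_w recovers a 7-dimensional estimate near the true value 0.5 per coordinate; the paper's point is that reusing the same subsample with an unweighted likelihood plus bias correction (β_uw, β_p) is more efficient than this weighted baseline.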