Stable Regression: On the Power of Optimization over Randomization
Authors: Dimitris Bertsimas, Ivan Paskov
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4, we describe our testing methodology, and then present computational results for both unregularized and regularized regression across four metrics: prediction error (MSE), standard deviation of prediction error (i.e., variability of prediction error with respect to the choice of test split), coefficient standard deviation (i.e., how spread out the coefficients are around their mean, where distance is measured via the typical Euclidean formula for vectors, with respect to the choice of test split), and hyperparameter standard deviation (variability of the hyperparameter with respect to the choice of test split). |
| Researcher Affiliation | Academia | Dimitris Bertsimas (EMAIL), Sloan School of Management and Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Ivan Paskov (EMAIL), Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA |
| Pseudocode | No | The paper uses mathematical formulations of optimization problems (e.g., Problem (1), (2), (3)) and discusses an 'Efficient Algorithm' in Section 3, but it does not present these as structured pseudocode or algorithm blocks with step-by-step instructions. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the methodology described, nor does it provide a link to a code repository. The license information provided pertains to the paper's text, not its implementation code. |
| Open Datasets | Yes | We collected 10 data sets from the UCI Machine Learning Repository (Dua and Taniskidou (2017)): Abalone, Auto MPG, Computer Hardware, Concrete, Ecoli, Forest Fires, Glass, Housing, Space Shuttle, Breast Cancer Wisconsin (Diagnostic). |
| Dataset Splits | Yes | For a given data set, we took a random 10% subset of the data and put it to the side as the testing set. We then divided the remaining 90% of the data into training and validation sets. For the randomized procedure, we did this in one of two ways: for the first, by randomly selecting 10%, 20%, 30%, 40%, or 50% of the data and designating it as validation data, and then, respectively, leaving the remaining 90%, 80%, 70%, 60%, or 50% as training data; for the second, applying the previously described k-fold cross validation procedure for k = 5 and k = 10 (only for the regularized case). |
| Hardware Specification | No | The paper includes a section on 'Running Times' (Section 7.1) with average computation times, but it does not provide specific hardware details (e.g., CPU type, GPU models, memory) used for these experiments. |
| Software Dependencies | No | The paper states that Problem (3) 'can be solved by commercial optimization software in very high dimensions,' but it does not specify the name or version of this software or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We then learned the optimal coefficients β from these training/validation splits over a sequence of λ values, and then that pair of λ and β that yielded the smallest error on the validation set was selected. Note for the case of unregularized regression, λ was simply taken to be equal to zero. (...) we append to each of our original problems an L0 constraint, which enforces that exactly k coefficients are nonzero. The specific way we implement this in the case of using optimization to train is via: (...) where in both cases above, M is chosen to be some very large number... |
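The big-M implementation of the L0 constraint quoted above corresponds to the standard mixed-integer formulation; the following is a reconstruction from the quoted description (binary indicators z_i are an assumption about notation, not copied from the paper):

```latex
% z_i = 1 allows coefficient beta_i to be nonzero; z_i = 0 forces it to zero.
% With M "some very large number", exactly k coefficients are active:
\begin{aligned}
  -M z_i \le \beta_i \le M z_i, &\quad i = 1, \dots, p,\\
  \sum_{i=1}^{p} z_i = k, &\quad z_i \in \{0, 1\}.
\end{aligned}
```

This is the constraint set appended to the regression problems; it is what the paper says "can be solved by commercial optimization software in very high dimensions."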
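The splitting procedure quoted in the "Dataset Splits" row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `random_split` and the use of index permutations are assumptions, and only the first (random-fraction) variant of the paper's two procedures is shown.

```python
import numpy as np

def random_split(n_samples, val_fraction, rng):
    """Set aside a random 10% of the indices as the test set, then
    split the remaining 90% into validation (val_fraction of the
    remainder) and training (the rest), as described in the paper."""
    idx = rng.permutation(n_samples)
    n_test = int(round(0.10 * n_samples))
    test_idx, rest = idx[:n_test], idx[n_test:]
    n_val = int(round(val_fraction * len(rest)))
    val_idx, train_idx = rest[:n_val], rest[n_val:]
    return train_idx, val_idx, test_idx

rng = np.random.default_rng(0)
# 30% of the non-test data as validation, 70% as training.
train, val, test = random_split(1000, 0.30, rng)
```

Repeating this for `val_fraction` in {0.1, 0.2, 0.3, 0.4, 0.5} reproduces the five randomized splits described above; the k-fold variant would instead partition the 90% remainder into k folds.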
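The hyperparameter selection quoted in the "Experiment Setup" row (learning β over a sequence of λ values and keeping the pair with the smallest validation error) can be sketched with a small ridge-regression grid search. The closed-form ridge solve and the squared-error validation criterion are assumptions for illustration; the λ grid values are placeholders, not the paper's.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def select_lambda(X_tr, y_tr, X_val, y_val, lambdas):
    """Fit beta for each lambda on the training split and keep the
    (lambda, beta) pair with the smallest validation MSE.
    For unregularized regression, pass lambdas = [0.0]."""
    best = None
    for lam in lambdas:
        beta = ridge_fit(X_tr, y_tr, lam)
        mse = np.mean((X_val @ beta - y_val) ** 2)
        if best is None or mse < best[0]:
            best = (mse, lam, beta)
    return best[1], best[2]
```

The paper's prediction-error and coefficient-standard-deviation metrics would then be computed from the selected β across many repetitions of the test-split procedure.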