Variance-based Regularization with Convex Objectives
Authors: John Duchi, Hongseok Namkoong
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems. Keywords: variance regularization, robust optimization, empirical likelihood |
| Researcher Affiliation | Academia | John Duchi EMAIL Department of Statistics and Electrical Engineering Stanford University Stanford, CA 94305, USA Hongseok Namkoong EMAIL Department of Management Science and Engineering Stanford University Stanford, CA 94305, USA |
| Pseudocode | Yes | We summarize this discussion with pseudo-code in Figures 6 and 7, which provide a main routine and sub-routine for finding the optimal vector p. Figure 6. Procedure Find P to find the vector p minimizing Σᵢ₌₁ⁿ pᵢzᵢ subject to the constraint (1/2n)‖np − 1‖₂² ≤ ρ. Method takes log(1/ε) iterations of the loop. Figure 7. Procedure Find Shift to find index i and parameter η such that, for the definition vᵢ = 1 − λzᵢ, we have vⱼ − η ≥ 0 for j ≤ i, vⱼ − η ≤ 0 for j > i, and Σⱼ₌₁ⁿ (vⱼ − η)₊ = 1. Method requires time O(log n). |
| Open Source Code | Yes | Code is available at https://github.com/hsnamkoong/robustopt. |
| Open Datasets | Yes | For our second experiment, we compare our robust regularization procedure to other regularizers using the HIV-1 protease cleavage dataset from the UCI ML-repository (Lichman, 2013). For our final experiment, we consider a multi-label classification problem with a reasonably large dataset. The Reuters RCV1 Corpus (Lewis et al., 2004) has 804,414 examples with d = 47,236 features, where feature j is an indicator variable for whether word j appears in a given document. |
| Dataset Splits | Yes | For validation, we perform 50 experiments, where in each experiment we randomly select 9/10 of the data to train the model, evaluating its performance on the held out fraction (test). We partition the data into ten equally-sized sub-samples and perform ten validation experiments, where in each experiment we use one of the ten subsets for fitting the logistic models and the remaining nine partitions as a test set to evaluate performance. |
| Hardware Specification | No | The paper describes various experiments in Sections 5.2, 5.3, and 5.4, including simulation experiments, protease cleavage experiments, and document classification. However, it does not specify any details about the hardware (e.g., CPU, GPU models, memory) used to conduct these experiments. |
| Software Dependencies | No | The paper mentions using a "gradient descent-based procedure" and "backtracking (Armijo) line search (Boyd and Vandenberghe, 2004, Chapter 9.2)" but does not provide specific version numbers for any software libraries, programming languages, or tools used in the implementation of their experiments. For example, it does not state "Python 3.x" or "PyTorch 1.x". |
| Experiment Setup | No | The paper mentions using "gradient descent on the robust risk Rₙ(θ, Pₙ), with stepsizes chosen by a backtracking (Armijo) line search" in Section 5.1. While this describes a general optimization strategy, it does not provide specific hyperparameter values such as the learning rate, batch size, number of epochs, or other model-specific configurations used in their experiments. It also mentions different constraint sets for regularization (e.g., L1-constraints with r ∈ {50, 100, 500, 1000, 5000}) but does not detail other hyperparameters for training. |
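The Find P / Find Shift routines quoted in the Pseudocode row can be sketched as follows. This is an illustrative reconstruction from the description above, not the authors' released code (see their repository for the real implementation): `project_simplex` plays the role of Find Shift (locating the threshold η with Σⱼ (vⱼ − η)₊ = 1), and a bisection over an assumed Lagrange multiplier `lam` of the chi-square constraint plays the role of Find P.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex.

    Plays the role of Find Shift: sort v, locate the threshold eta
    such that sum_j (v_j - eta)_+ = 1, then clip at zero.
    """
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    k = idx[u - (css - 1.0) / idx > 0][-1]  # largest index kept positive
    eta = (css[k - 1] - 1.0) / k
    return np.maximum(v - eta, 0.0)

def find_p(z, rho, iters=100):
    """Minimize sum_i p_i z_i over the simplex subject to
    (1/2n) * ||n p - 1||_2^2 <= rho (illustrative sketch).

    For a fixed multiplier lam > 0 on the chi-square constraint, the
    minimizer is a simplex projection of 1/n - z/(lam*n); lam is then
    found by bisection until the constraint holds with near-equality,
    matching the log(1/eps) iteration count quoted above.
    """
    n = z.size
    p_of = lambda lam: project_simplex(1.0 / n - z / (lam * n))
    div = lambda p: np.sum((n * p - 1.0) ** 2) / (2.0 * n)
    lo, hi = 1e-12, 1e12
    if div(p_of(lo)) <= rho:      # constraint inactive: done
        return p_of(lo)
    for _ in range(iters):        # geometric bisection on lam
        lam = np.sqrt(lo * hi)
        if div(p_of(lam)) > rho:
            lo = lam
        else:
            hi = lam
    return p_of(hi)               # feasible endpoint
```

The resulting p upweights examples with small zᵢ (large losses, in the robust-risk application) while the chi-square ball of radius ρ keeps it close to the uniform distribution.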
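The outer loop described in the Experiment Setup row is gradient descent with a backtracking (Armijo) line search. A minimal sketch of that standard routine follows; the parameter names `alpha` and `beta` follow Boyd and Vandenberghe's convention and are not values reported by the paper.

```python
import numpy as np

def backtracking_step(f, g, x, d, alpha=0.3, beta=0.5):
    """Backtracking (Armijo) line search (Boyd & Vandenberghe, Ch. 9.2).

    Starting from t = 1, shrink t by beta until the sufficient-decrease
    condition f(x + t*d) <= f(x) + alpha * t * <grad f(x), d> holds.
    """
    t, fx, slope = 1.0, f(x), g(x) @ d
    while f(x + t * d) > fx + alpha * t * slope:
        t *= beta
    return t

def gradient_descent(f, g, x0, steps=100):
    """Gradient descent with Armijo-chosen stepsizes (illustrative)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        d = -g(x)
        if np.linalg.norm(d) < 1e-12:  # stationary point: stop
            break
        x = x + backtracking_step(f, g, x, d) * d
    return x
```

In the paper's setting, `f` would be the robust risk and `g` its gradient; here any smooth objective and gradient pair works.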