Variance-based Regularization with Convex Objectives
Authors: John Duchi, Hongseok Namkoong
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems. Keywords: variance regularization, robust optimization, empirical likelihood |
| Researcher Affiliation | Academia | John Duchi EMAIL Department of Statistics and Electrical Engineering Stanford University Stanford, CA 94305, USA Hongseok Namkoong EMAIL Department of Management Science and Engineering Stanford University Stanford, CA 94305, USA |
| Pseudocode | Yes | We summarize this discussion with pseudo-code in Figures 6 and 7, which provide a main routine and sub-routine for finding the optimal vector p. Figure 6. Procedure Find P to find the vector p minimizing Σᵢ₌₁ⁿ pᵢzᵢ subject to the constraint (1/2n)‖np − 1‖₂² ≤ ρ. Method takes log(1/ε) iterations of the loop. Figure 7. Procedure Find Shift to find index i and parameter η such that, for the definition vᵢ = 1 − λzᵢ, we have vⱼ − η ≥ 0 for j ≤ i, vⱼ − η ≤ 0 for j > i, and Σⱼ₌₁ⁿ (vⱼ − η)₊ = 1. Method requires time O(log n). |
| Open Source Code | Yes | Code is available at https://github.com/hsnamkoong/robustopt. |
| Open Datasets | Yes | For our second experiment, we compare our robust regularization procedure to other regularizers using the HIV-1 protease cleavage dataset from the UCI ML-repository (Lichman, 2013). For our final experiment, we consider a multi-label classification problem with a reasonably large dataset. The Reuters RCV1 Corpus (Lewis et al., 2004) has 804,414 examples with d = 47,236 features, where feature j is an indicator variable for whether word j appears in a given document. |
| Dataset Splits | Yes | For validation, we perform 50 experiments, where in each experiment we randomly select 9/10 of the data to train the model, evaluating its performance on the held out fraction (test). We partition the data into ten equally-sized sub-samples and perform ten validation experiments, where in each experiment we use one of the ten subsets for fitting the logistic models and the remaining nine partitions as a test set to evaluate performance. |
| Hardware Specification | No | The paper describes various experiments in Sections 5.2, 5.3, and 5.4, including simulation experiments, protease cleavage experiments, and document classification. However, it does not specify any details about the hardware (e.g., CPU, GPU models, memory) used to conduct these experiments. |
| Software Dependencies | No | The paper mentions using a "gradient descent-based procedure" and "backtracking (Armijo) line search (Boyd and Vandenberghe, 2004, Chapter 9.2)" but does not provide specific version numbers for any software libraries, programming languages, or tools used in the implementation of their experiments. For example, it does not state "Python 3.x" or "PyTorch 1.x". |
| Experiment Setup | No | The paper mentions using "gradient descent on the robust risk Rₙ(θ, Pₙ), with stepsizes chosen by a backtracking (Armijo) line search" in Section 5.1. While this describes a general optimization strategy, it does not provide specific hyperparameter values such as the learning rate, batch size, number of epochs, or other model-specific configurations used in their experiments. It also mentions different constraint sets for regularization (e.g., L1-constraints with r ∈ {50, 100, 500, 1000, 5000}) but does not detail other hyperparameters for training. |
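The Find P / Find Shift routines quoted in the Pseudocode row can be sketched as follows. This is an illustrative reconstruction from the description above, not the authors' released code (see their repository for the real implementation): `project_simplex` plays the role of Find Shift (locating the threshold η with Σⱼ (vⱼ − η)₊ = 1), and a bisection over an assumed Lagrange multiplier `lam` of the chi-square constraint plays the role of Find P.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex.

    Plays the role of Find Shift: sort v, locate the threshold eta
    such that sum_j (v_j - eta)_+ = 1, then clip at zero.
    """
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    k = idx[u - (css - 1.0) / idx > 0][-1]  # largest index kept positive
    eta = (css[k - 1] - 1.0) / k
    return np.maximum(v - eta, 0.0)

def find_p(z, rho, iters=100):
    """Minimize sum_i p_i z_i over the simplex subject to
    (1/2n) * ||n p - 1||_2^2 <= rho (illustrative sketch).

    For a fixed multiplier lam > 0 on the chi-square constraint, the
    minimizer is a simplex projection of 1/n - z/(lam*n); lam is then
    found by bisection until the constraint holds with near-equality,
    matching the log(1/eps) iteration count quoted above.
    """
    n = z.size
    p_of = lambda lam: project_simplex(1.0 / n - z / (lam * n))
    div = lambda p: np.sum((n * p - 1.0) ** 2) / (2.0 * n)
    lo, hi = 1e-12, 1e12
    if div(p_of(lo)) <= rho:      # constraint inactive: done
        return p_of(lo)
    for _ in range(iters):        # geometric bisection on lam
        lam = np.sqrt(lo * hi)
        if div(p_of(lam)) > rho:
            lo = lam
        else:
            hi = lam
    return p_of(hi)               # feasible endpoint
```

The resulting p upweights examples with small zᵢ (large losses, in the robust-risk application) while the chi-square ball of radius ρ keeps it close to the uniform distribution.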
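The outer loop described in the Experiment Setup row is gradient descent with a backtracking (Armijo) line search. A minimal sketch of that standard routine follows; the parameter names `alpha` and `beta` follow Boyd and Vandenberghe's convention and are not values reported by the paper.

```python
import numpy as np

def backtracking_step(f, g, x, d, alpha=0.3, beta=0.5):
    """Backtracking (Armijo) line search (Boyd & Vandenberghe, Ch. 9.2).

    Starting from t = 1, shrink t by beta until the sufficient-decrease
    condition f(x + t*d) <= f(x) + alpha * t * <grad f(x), d> holds.
    """
    t, fx, slope = 1.0, f(x), g(x) @ d
    while f(x + t * d) > fx + alpha * t * slope:
        t *= beta
    return t

def gradient_descent(f, g, x0, steps=100):
    """Gradient descent with Armijo-chosen stepsizes (illustrative)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        d = -g(x)
        if np.linalg.norm(d) < 1e-12:  # stationary point: stop
            break
        x = x + backtracking_step(f, g, x, d) * d
    return x
```

In the paper's setting, `f` would be the robust risk and `g` its gradient; here any smooth objective and gradient pair works.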