Dropout Training is Distributionally Robust Optimal

Authors: José Blanchet, Yang Kang, José Luis Montiel Olea, Viet Anh Nguyen, Xuhui Zhang

JMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental 7. Numerical Experiments We conduct numerical experiments in this section to compare our preferred implementation of dropout training to stochastic gradient descent, as well as our recommended selection of δ to cross-validation. The benefits of our suggested unbiased multi-level Monte Carlo algorithm are analyzed using high-dimensional regression, whereas our selection of δ is analyzed using a low-dimensional regression model. [...] Figure 1 shows the l2 divergence from the true β_n of the two algorithms for varying L, while Figure 2 and Figure 3 show the l∞ and l1 divergence, respectively. We provide supporting evidence in Appendix A.7 to argue for our choice of the learning rate, initialization, and wall-clock time, where our proposed algorithm is robust to any reasonable choice. [...] Table 1: Frequency of in-sample loss covering the true population loss.
Researcher Affiliation Academia José Blanchet EMAIL Department of Management Science and Engineering, Stanford University, Stanford, CA 94305, USA; Yang Kang EMAIL Department of Statistics, Columbia University, New York, NY 10027, USA; José Luis Montiel Olea EMAIL Department of Economics, Cornell University, Ithaca, NY 14850, USA; Viet Anh Nguyen EMAIL Department of Systems Engineering and Engineering Management, Chinese University of Hong Kong, Hong Kong SAR; Xuhui Zhang EMAIL Department of Management Science and Engineering, Stanford University, Stanford, CA 94305, USA
Pseudocode Yes 6.5 Algorithm for the Unbiased Multilevel Monte Carlo We present a parallelized version using L processors, which works even when L = 1. Parallel computing reduces the variance of the estimator, so we suggest using as many processors as are available per run. Fix an integer m0 ∈ N such that 2^(m0+1) ≥ 2d. For each processor l = 1, . . . , L we consider the following steps. i) Take a random (integer) draw, m_l, from a geometric distribution with parameter [...]. ii) Given m_l, take 2^(K_l+1) i.i.d. draws of the d-dimensional vector ξ_i ~ Q_1 ⊗ · · · ⊗ Q_d, where K_l = m0 + m_l.
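Steps i)–ii) of the quoted pseudocode can be sketched as follows. This is an illustration only, not the authors' code: the geometric parameter and burn-in level (r = 0.6, m0 = 5) are taken from the experiment settings quoted below in this report, and Bernoulli(0.5) dropout is an assumed stand-in for the product distribution Q_1 ⊗ · · · ⊗ Q_d.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 100   # covariate dimension from the simulation setting
m0 = 5    # burn-in level (chosen so that 2^(m0+1) >= 2d is plausible)
r = 0.6   # geometric rate reported in the experiments

# i) random level for processor l: m_l ~ Geometric(r), shifted to {0, 1, ...}
m_l = rng.geometric(r) - 1

# ii) given m_l, take 2^(K_l+1) i.i.d. draws of the d-dimensional dropout
#     vector xi; Bernoulli(0.5) here is an assumed example of Q_1 x ... x Q_d
K_l = m0 + m_l
xi = rng.binomial(1, 0.5, size=(2 ** (K_l + 1), d))
```

The randomized level K_l is what makes the estimator unbiased across levels; each processor repeats this draw independently.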
Open Source Code No The paper discusses various algorithms and their implementations but does not provide any explicit statement about releasing its source code or a link to a code repository.
Open Datasets No Our simulation setting considers a linear regression model with a covariate vector having dimensionality d = 100 and sample size n = 50. We pick a known regression coefficient β0 ∈ R^d being a vector with all entries equal to 1. With the coefficients fixed, we assume the covariate vector follows an independent Gaussian distribution, and likewise for the regression noise. More specifically, we obtain our 50 observations (xi, yi) by sampling xi ~ N(0, I_d), i = 1, . . . , n, and sampling yi ∈ R conditional on xi, where yi is given by the linear assumption and the εi are i.i.d. random noise following N(0, 10^2), for i = 1, . . . , n.
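The simulated dataset described in this excerpt can be generated in a few lines. A minimal sketch (not the authors' code), using the stated values d = 100, n = 50, β0 = (1, …, 1), and noise standard deviation 10:

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 100, 50
beta0 = np.ones(d)                      # true coefficient: all entries equal to 1

X = rng.standard_normal((n, d))         # x_i ~ N(0, I_d)
eps = 10.0 * rng.standard_normal(n)     # eps_i ~ N(0, 10^2)
y = X @ beta0 + eps                     # linear model: y_i = x_i' beta0 + eps_i
```

Since the data are programmatically generated, there is no pre-existing dataset to release, consistent with the "No" result for this variable.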
Dataset Splits No The paper describes a simulation setting where data is programmatically generated for numerical experiments (Section 7.1). It does not involve pre-existing datasets that require explicit training/test/validation splits.
Hardware Specification Yes We run our simulation on a cluster with two Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40 GHz processors (with 10 cores each), and a total memory of 128 GB.
Software Dependencies No The paper describes algorithms and their conceptual implementation (e.g., stochastic gradient descent, multi-level Monte Carlo) but does not specify any particular programming languages, libraries, or frameworks with version numbers used for its experimental setup.
Experiment Setup Yes Standard SGD algorithm with a learning rate of 0.0001 and initialization at the origin. [...] Multi-level Monte Carlo algorithm with a geometric rate r = 0.6 and a burn-in period m0 = 5. Note that in each parallel run, we use gradient descent (GD) with a 0.01 learning rate and initialization at the origin for steps iii) and iv) in Section 6.4.
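The GD configuration quoted above (learning rate 0.01, initialization at the origin) can be illustrated on the simulated regression problem. A minimal sketch, not the authors' implementation: plain gradient descent on the least-squares loss, with the inner-step details of Section 6.4 omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data matching the paper's setting (d = 100, n = 50, noise std 10)
n, d = 50, 100
X = rng.standard_normal((n, d))
y = X @ np.ones(d) + 10.0 * rng.standard_normal(n)

beta = np.zeros(d)   # initialization at the origin
lr = 0.01            # learning rate from the quoted setup
for _ in range(1000):
    grad = X.T @ (X @ beta - y) / n   # gradient of (1/2n) * ||X beta - y||^2
    beta -= lr * grad
```

The reported SGD baseline uses the smaller rate 0.0001; the paper's Appendix A.7 is cited as evidence that the results are robust to such choices.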