Regularization via Mass Transportation
Authors: Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, Peyman Mohajerin Esfahani
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theoretical out-of-sample guarantees through simulated and empirical experiments. Numerical experiments are reported in Section 6. |
| Researcher Affiliation | Academia | Soroosh Shafieezadeh-Abadeh (EMAIL) and Daniel Kuhn (EMAIL), Risk Analytics and Optimization Chair, EPFL, Switzerland; Peyman Mohajerin Esfahani (EMAIL), Delft Center for Systems and Control, TU Delft, The Netherlands |
| Pseudocode | No | The paper describes algorithms such as stochastic proximal gradient descent (W_m^{k+1} = prox_{η_k ρ}(W_m^k − η_k ∇_{W_m} ℓ(h(x̂_{i_k}; W_{[M]}^k), ŷ_{i_k})), m ∈ [M]) and describes experimental procedures (e.g., a training phase split into epochs with periodic step-size reduction), but does not present these in structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All experiments are run on an Intel XEON CPU (3.40GHz), and the corresponding codes are made publicly available at https://github.com/sorooshafiee/Regularization-via-Transportation. |
| Open Datasets | Yes | We showcase the power of regularization via mass transportation in various applications based on standard datasets from the literature. ... from the MNIST database (LeCun et al., 1998) ... from the UCI repository (Bache and Lichman, 2013). ... PASCAL VOC 2007 dataset (Everingham et al., 2010) ... ImageNet dataset (Krizhevsky et al., 2012) ... synthetic threenorm classification problem (Breiman, 1996). |
| Dataset Splits | Yes | In each trial, we randomly select 500 images to train the DRSVM model (20) and use the remaining images for testing. ... In each trial, we randomly select 75% of the data for training and the remaining 25% for testing. ... PASCAL VOC 2007 dataset ... pre-partitioned into 25% for training, 25% for validation and 50% for testing. ... In each trial we generate N training samples for some N ∈ {10, . . . , 90} ∪ {100, . . . , 1,000} as well as 10^5 test samples. |
| Hardware Specification | Yes | All experiments are run on an Intel XEON CPU (3.40GHz), and the corresponding codes are made publicly available at https://github.com/sorooshafiee/Regularization-via-Transportation. |
| Software Dependencies | Yes | All optimization problems are implemented in Python and solved with Gurobi 7.5.1 |
| Experiment Setup | Yes | In the first experiment we optimize over linear hypotheses and use the separable transportation metric (16) involving the ∞-norm on the input space. All results are averaged over 100 independent trials. In each trial, we randomly select 500 images to train the DRSVM model (20) and use the remaining images for testing. The correct classification rate (CCR) on the test data, averaged across all 100 trials, is visualized in Figure 1 as a function of the Wasserstein radius ρ for each κ ∈ {0.1, 0.25, 0.5, 0.75, ∞}. The best out-of-sample CCR is obtained for κ = 0.25 uniformly across all Wasserstein radii, and performance deteriorates significantly when κ is reduced or increased. ... All free parameters of the resulting DRSVM model are restricted to finite search grids in order to ease the computational burden of cross validation. Specifically, we select the Wasserstein radius ρ from within {b · 10^−e : b ∈ {1, 5}, e ∈ {1, 2, 3, 4}} and the label flipping cost κ from within {0.1, 0.25, 0.5, 0.75, ∞}. Moreover, we select the degree d of the polynomial kernel from within {1, 2, 3, 4, 5} and the peakedness parameter γ of the Laplacian and Gaussian kernels from within { 1 25}. ... The Wasserstein radius ρ and the label flipping cost κ in the DRSVM as well as the regularization weight ρ in the RSVM are estimated via stratified 5-fold cross validation. ... At the beginning we preprocess the entire dataset by resizing each image to 256 × 256 pixels and extracting the central patch of 244 × 244 pixels. ... We tune the Wasserstein radius ρ ∈ {b · 10^−e : b ∈ {1, . . . , 9}, e ∈ {2, 3, 4}} and the label flipping cost κ ∈ {0.1, 0.2, . . . , 1, ∞} via the holdout method using the validation data. ... we replace the original M-th layer of the network with a new fully connected layer characterized by a parameter matrix W_M ∈ R^{20×1000}, and we set σ_M to the Sigmoid activation function. ... We use the stochastic proximal gradient descent algorithm of Section 3.4 to tune W_M, including an additional momentum term with weight 0.9. As in (Krizhevsky et al., 2012), we split the training phase into 100 epochs, each corresponding to a complete pass through the training dataset in a random order. As AlexNet requires input images of size 244 × 244, in each iteration we extract a random patch of 244 × 244 pixels from the current image and flip it horizontally at random. This procedure effectively augments the training dataset. The initial step size is set to 10^−3 and then reduced by a factor of 10 after every 7 epochs. The algorithm terminates after 100 epochs. |
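The training recipe quoted in the Experiment Setup row (stochastic proximal gradient descent with a 0.9 momentum term, initial step size 10^−3 cut tenfold every 7 epochs) can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes a Frobenius-norm regularizer for the proximal step, whereas the paper's actual proximal operator depends on the chosen transportation metric, and the function names are hypothetical.

```python
import numpy as np

def prox_frobenius(W, t):
    """Proximal operator of t * ||W||_F (block soft-thresholding).
    Assumed regularizer, for illustration only."""
    norm = np.linalg.norm(W)  # Frobenius norm
    if norm <= t:
        return np.zeros_like(W)
    return (1.0 - t / norm) * W

def step_size(epoch, eta0=1e-3, drop_every=7, factor=0.1):
    """Step-size schedule quoted in the paper: start at 1e-3,
    reduce by a factor of 10 after every 7 epochs."""
    return eta0 * factor ** (epoch // drop_every)

def prox_sgd_step(W, grad, velocity, eta, rho, momentum=0.9):
    """One stochastic proximal gradient update with heavy-ball momentum:
    take a momentum-corrected gradient step, then apply the prox."""
    velocity = momentum * velocity - eta * grad
    W = prox_frobenius(W + velocity, eta * rho)
    return W, velocity
```

With ρ = 0 the proximal step is the identity and the update reduces to plain SGD with momentum, which makes the role of the Wasserstein radius as a regularization weight easy to see.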