Incorporating Unlabeled Data into Distributionally Robust Learning

Authors: Charlie Frogner, Sebastian Claici, Edward Chien, Justin Solomon

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We examine the performance of this new formulation on 14 real data sets and find that it often yields effective classifiers with nontrivial performance guarantees in situations where conventional DRL produces neither."
Researcher Affiliation | Academia | Charlie Frogner (EMAIL), Sebastian Claici (EMAIL), Edward Chien (EMAIL), Justin Solomon (EMAIL); Computer Science & Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Pseudocode | Yes | Algorithm 1: "SGD for distributionally robust learning with unlabeled data"; Algorithm 2: "SGD for distributionally robust active learning"
Open Source Code | No | The paper does not provide an explicit statement about releasing code for the methodology described, nor does it provide a direct link to a code repository.
Open Datasets | Yes | "In the experiments described in Sections 5.2, 5.1, and 6.3, we use 14 real data sets, taken from the UCI repository (Dua and Graff, 2019)."
Dataset Splits | No | The paper describes sampling strategies such as "we sample a small number N_l of labeled examples" (Appendix D.1) and "sample an initial set Ẑ_l of 20 labeled examples, with the remaining samples forming the unlabeled set X̂_u" (Appendix D.5). However, it does not provide specific train/test/validation splits with exact percentages, sample counts, or citations to predefined splits that would allow precise reproduction of a fixed data partitioning.
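The sampling scheme quoted above (a small initial labeled set, with the remainder forming the unlabeled pool) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name and use of a seeded NumPy generator are assumptions.

```python
import numpy as np

def split_labeled_unlabeled(X, y, n_labeled=20, seed=0):
    """Sample a small labeled set; remaining points form the unlabeled pool.

    Mirrors the paper's description: an initial set of n_labeled labeled
    examples, with the remaining samples forming the unlabeled set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    lab, unlab = idx[:n_labeled], idx[n_labeled:]
    # Labeled pairs (features, labels) and unlabeled features only.
    return (X[lab], y[lab]), X[unlab]

# Example: 100 points in R^2 with binary labels in {+1, -1}.
X = np.random.randn(100, 2)
y = np.sign(np.random.randn(100) + 1e-9)
(Z_X, Z_y), X_u = split_labeled_unlabeled(X, y, n_labeled=20)
```

Note that without the exact seed and sampling procedure from the paper, this split is not reproducible to their partitioning, which is the assessor's point.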
Hardware Specification | Yes | "Average wall clock time per iteration is 0.022 seconds on a Xeon E5-2690."
Software Dependencies | No | The paper mentions using the "Adam optimizer (Kingma and Ba, 2014)" but does not specify software versions for the libraries, programming languages, or other tools used in the implementation.
Experiment Setup | Yes | "We solve (37) via its dual (Section 3.3), using the Adam optimizer (Kingma and Ba, 2014) with β_1 = 0.9, β_2 = 0.999, ε = 10^-8, and a batch size of 100, decreasing the learning rate by a factor of 8 every 10000 steps. In all experiments we use the transport cost c((x, y), (x', y')) = ||x - x'||^2 + κ|y - y'| with κ = 1 and y ∈ {+1, -1} ⊂ ℝ."
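The transport cost in the quoted setup, c((x, y), (x', y')) = ||x - x'||^2 + κ|y - y'|, can be written down directly. The sketch below is illustrative only (the function name is assumed); it shows that with κ = 1 and labels in {+1, -1}, flipping the label adds a constant 2κ on top of the squared Euclidean feature distance.

```python
import numpy as np

def transport_cost(x, y, x_prime, y_prime, kappa=1.0):
    """Transport cost c((x, y), (x', y')) = ||x - x'||^2 + kappa * |y - y'|,
    with labels y, y' in {+1, -1} as in the quoted experiment setup."""
    feature_term = np.sum((np.asarray(x) - np.asarray(x_prime)) ** 2)
    label_term = kappa * abs(y - y_prime)
    return float(feature_term + label_term)

# Same label: cost reduces to squared Euclidean distance between features.
same = transport_cost([0.0, 0.0], +1, [3.0, 4.0], +1)   # 25.0
# Flipped label: adds kappa * |(+1) - (-1)| = 2 * kappa.
flip = transport_cost([0.0, 0.0], +1, [3.0, 4.0], -1)   # 27.0
```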