Incorporating Unlabeled Data into Distributionally Robust Learning

Authors: Charlie Frogner, Sebastian Claici, Edward Chien, Justin Solomon

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We examine the performance of this new formulation on 14 real data sets and find that it often yields effective classifiers with nontrivial performance guarantees in situations where conventional DRL produces neither."
Researcher Affiliation | Academia | Charlie Frogner (EMAIL), Sebastian Claici (EMAIL), Edward Chien (EMAIL), Justin Solomon (EMAIL); Computer Science & Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Pseudocode | Yes | Algorithm 1: "SGD for distributionally robust learning with unlabeled data"; Algorithm 2: "SGD for distributionally robust active learning"
Open Source Code | No | The paper does not provide an explicit statement about releasing code for the methodology described, nor does it provide a direct link to a code repository.
Open Datasets | Yes | "In the experiments described in Sections 5.2, 5.1, and 6.3, we use 14 real data sets, taken from the UCI repository (Dua and Graff, 2019)."
Dataset Splits | No | The paper describes sampling strategies such as "we sample a small number N_l of labeled examples" (Appendix D.1) and "sample an initial set Ẑ_l of 20 labeled examples, with the remaining samples forming the unlabeled set X̂_u" (Appendix D.5). However, it does not provide specific train/test/validation splits with exact percentages, sample counts, or citations to predefined splits that would allow precise reproduction of a fixed data partitioning.
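The sampling scheme quoted above (a small initial labeled set, with the remainder forming the unlabeled pool) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name and use of a seeded NumPy generator are assumptions.

```python
import numpy as np

def split_labeled_unlabeled(X, y, n_labeled=20, seed=0):
    """Sample a small labeled set; remaining points form the unlabeled pool.

    Mirrors the paper's description: an initial set of n_labeled labeled
    examples, with the remaining samples forming the unlabeled set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    lab, unlab = idx[:n_labeled], idx[n_labeled:]
    # Labeled pairs (features, labels) and unlabeled features only.
    return (X[lab], y[lab]), X[unlab]

# Example: 100 points in R^2 with binary labels in {+1, -1}.
X = np.random.randn(100, 2)
y = np.sign(np.random.randn(100) + 1e-9)
(Z_X, Z_y), X_u = split_labeled_unlabeled(X, y, n_labeled=20)
```

Note that without the exact seed and sampling procedure from the paper, this split is not reproducible to their partitioning, which is the assessor's point.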
Hardware Specification | Yes | "Average wall clock time per iteration is 0.022 seconds on a Xeon E5-2690."
Software Dependencies | No | The paper mentions using the "Adam optimizer (Kingma and Ba, 2014)" but does not specify software versions for the libraries, programming languages, or other tools used in the implementation.
Experiment Setup | Yes | "We solve (37) via its dual (Section 3.3), using the Adam optimizer (Kingma and Ba, 2014) with β_1 = 0.9, β_2 = 0.999, ε = 10^-8, and a batch size of 100, decreasing the learning rate by a factor of 8 every 10000 steps. In all experiments we use the transport cost c((x, y), (x', y')) = ||x - x'||^2 + κ|y - y'| with κ = 1 and y ∈ {+1, -1} ⊂ ℝ."
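The transport cost in the quoted setup, c((x, y), (x', y')) = ||x - x'||^2 + κ|y - y'|, can be written down directly. The sketch below is illustrative only (the function name is assumed); it shows that with κ = 1 and labels in {+1, -1}, flipping the label adds a constant 2κ on top of the squared Euclidean feature distance.

```python
import numpy as np

def transport_cost(x, y, x_prime, y_prime, kappa=1.0):
    """Transport cost c((x, y), (x', y')) = ||x - x'||^2 + kappa * |y - y'|,
    with labels y, y' in {+1, -1} as in the quoted experiment setup."""
    feature_term = np.sum((np.asarray(x) - np.asarray(x_prime)) ** 2)
    label_term = kappa * abs(y - y_prime)
    return float(feature_term + label_term)

# Same label: cost reduces to squared Euclidean distance between features.
same = transport_cost([0.0, 0.0], +1, [3.0, 4.0], +1)   # 25.0
# Flipped label: adds kappa * |(+1) - (-1)| = 2 * kappa.
flip = transport_cost([0.0, 0.0], +1, [3.0, 4.0], -1)   # 27.0
```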