CoCoA: A General Framework for Communication-Efficient Distributed Optimization
Authors: Virginia Smith, Simone Forte, Chenxin Ma, Martin Takáč, Michael I. Jordan, Martin Jaggi
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we demonstrate the empirical performance of CoCoA in the distributed setting. We first compare CoCoA to competing methods for two common machine learning applications: lasso regression (Section 6.1) and support vector machine (SVM) classification (Section 6.2). We then explore the performance of CoCoA in the primal versus the dual directly by solving an elastic net regression model with both variants (Section 6.3). Finally, we illustrate general properties of the CoCoA method empirically in Section 6.4. |
| Researcher Affiliation | Academia | Virginia Smith EMAIL Department of Computer Science Stanford University Stanford, CA 94305, USA; Simone Forte EMAIL Department of Computer Science ETH Zürich 8006 Zürich, Switzerland; Chenxin Ma EMAIL Industrial and Systems Engineering Department Lehigh University Bethlehem, PA 18015, USA; Martin Takáč EMAIL Industrial and Systems Engineering Department Lehigh University Bethlehem, PA 18015, USA; Michael I. Jordan EMAIL Division of Computer Science and Department of Statistics University of California Berkeley, CA 94720, USA; Martin Jaggi EMAIL School of Computer and Communication Sciences EPFL 1015 Lausanne, Switzerland |
| Pseudocode | Yes | Algorithm 1 Generalized CoCoA Distributed Framework; Algorithm 2 CoCoA-Primal (Mapping Problem (I) to (A)); Algorithm 3 CoCoA-Dual (Mapping Problem (I) to (B)) |
| Open Source Code | Yes | All algorithms for comparison are implemented in Apache Spark and run on Amazon EC2 clusters. Our code is available at: gingsmith.github.io/cocoa/. |
| Open Datasets | Yes | Table 5: Datasets for Empirical Study includes: url, epsilon, kddb, webspam. We demonstrate these gains with an extensive experimental comparison on real-world distributed datasets. |
| Dataset Splits | No | The paper provides 'Training Size' and 'Feature Size' in Table 5, but does not specify explicit training/test/validation splits (e.g., percentages or counts) or reference standard splits. |
| Hardware Specification | Yes | All experiments are run on Amazon EC2 clusters of m3.xlarge machines with one core per machine. |
| Software Dependencies | Yes | All experiments are run on Amazon EC2 clusters of m3.xlarge machines, with one core per machine. The code for each method is written in Apache Spark, v1.5.0. Our code is open source and publicly available at gingsmith.github.io/cocoa/. Mb-SGD: Mini-batch stochastic gradient. For our experiments with lasso, we compare against Mb-SGD with an L1-prox. The first three methods are optimized and implemented in Apache Spark's MLlib (v1.5.0) (Meng et al., 2016). |
| Experiment Setup | Yes | We carefully tune each competing method in our experiments for best performance. ADMM requires the most tuning, both in selecting the penalty parameter ρ and in solving the subproblems... For Mb-SGD, we tune the step size and mini-batch size parameters. For Mb-CD and Mb-SDCA, we scale the updates at each round by β/b for mini-batch size b and β ∈ [1, b], and tune both parameters b and β. The only parameter influencing the overall performance of CoCoA is the level of approximation quality, which we parameterize in the experiments through H, the number of local iterations of the iterative method run locally. |
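To make the role of the single tuning parameter H concrete, the sketch below simulates a CoCoA-style scheme for lasso on one machine: feature columns are partitioned across K simulated workers, each worker runs H local coordinate-descent steps against a frozen copy of the shared residual, and the resulting updates are combined by conservative averaging (aggregation weight 1/K). This is an illustrative sketch only, not the authors' released code; the function names, the choice of coordinate descent as the local solver, and the 1/K averaging are assumptions made for the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_objective(A, b, x, lam):
    """0.5 * ||Ax - b||^2 + lam * ||x||_1."""
    r = A @ x - b
    return 0.5 * (r @ r) + lam * np.sum(np.abs(x))

def cocoa_style_lasso(A, b, lam, K=4, H=10, rounds=20, seed=0):
    """Single-machine simulation of a CoCoA-style lasso solver (illustrative).

    Columns of A are partitioned across K simulated workers. Each round,
    every worker runs H coordinate-descent steps on its own coordinates,
    using a frozen copy of the shared residual; updates are then averaged
    with weight 1/K, mimicking one communication round.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    col_sq = np.sum(A * A, axis=0)            # per-coordinate curvature
    parts = np.array_split(rng.permutation(d), K)
    for _ in range(rounds):
        shared_res = A @ x - b                # residual at round start
        delta = np.zeros(d)
        for coords in parts:                  # each "worker" in turn
            local_res = shared_res.copy()
            local_x = x.copy()
            for _ in range(H):                # H = local approximation quality
                j = rng.choice(coords)
                g = A[:, j] @ local_res       # partial gradient for coord j
                new_xj = soft_threshold(local_x[j] - g / col_sq[j],
                                        lam / col_sq[j])
                local_res += (new_xj - local_x[j]) * A[:, j]
                local_x[j] = new_xj
            delta[coords] = local_x[coords] - x[coords]
        x = x + delta / K                     # conservative averaging
    return x
```

Raising H makes each round more expensive locally but reduces the number of communication rounds needed, which is the trade-off the paper tunes through this one parameter.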