Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms
Authors: Yu-Xiang Wang, Veeranjaneyulu Sadhanala, Wei Dai, Willie Neiswanger, Suvrit Sra, Eric Xing
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments on structural SVM and Group Fused Lasso, and observe significant speedups over competing state-of-the-art (and synchronous) methods. |
| Researcher Affiliation | Academia | Yu-Xiang Wang, Veeranjaneyulu Sadhanala, Wei Dai, Willie Neiswanger, Suvrit Sra, Eric P. Xing. Carnegie Mellon University, 5000 Forbes Ave, PA 15213, USA; Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA |
| Pseudocode | Yes | Pseudocode of our scheme is given in Algorithm 1. Algorithm 1 AP-BCFW: Asynchronous Parallel Block Coordinate Frank-Wolfe (distributed) |
| Open Source Code | No | The paper does not provide any statement about making the source code available or a link to a code repository. |
| Open Datasets | Yes | In our simulation, we re-use the structural SVM setup from Lacoste-Julien et al. (2013) for a sequence labeling task on a subset of the OCR dataset (Taskar et al., 2004) (n = 6251, d = 4082). |
| Dataset Splits | No | The paper mentions using a 'subset of the OCR dataset' and a 'synthetic dataset' but does not provide specific details on how the data was split into training, validation, or test sets (e.g., percentages or sample counts). |
| Hardware Specification | Yes | All shared-memory experiments were implemented in C++ and conducted on a 16-core machine with Intel(R) Xeon(R) CPU E5-2450 2.10GHz processors and 128G RAM. |
| Software Dependencies | No | The paper mentions implementation in C++ but does not provide version numbers for any specific software libraries, frameworks, or dependencies. |
| Experiment Setup | Yes | We use λ = 1 with weighted averaging and line-search throughout (no delay is allowed). [...] We use λ = 0.01 and a primal suboptimality threshold as our convergence criterion. [...] We first fix the number of workers at T = 8 and vary the mini-batch size τ. |
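
The experiment setup above varies the mini-batch size τ and uses line search. As a rough illustration only, the following Python sketch runs a serial mini-batch block-coordinate Frank-Wolfe loop with exact line search on a toy least-squares problem over a product of probability simplices. It is not the authors' asynchronous distributed AP-BCFW (Algorithm 1) and does not model workers or delays; the objective, the helper names (`make_toy_problem`, `minibatch_bcfw`, `simplex_lmo`), and all parameter values are assumptions made for this example.

```python
import random


def simplex_lmo(grad_block):
    """Linear minimization oracle over the probability simplex.

    Returns the vertex e_j with j = argmin_j grad_block[j].
    """
    j = min(range(len(grad_block)), key=grad_block.__getitem__)
    return [1.0 if k == j else 0.0 for k in range(len(grad_block))]


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def objective(A, b, x):
    """Toy objective f(x) = 0.5 * ||A x - b||^2 (an assumption of this sketch)."""
    return 0.5 * sum((r - t) ** 2 for r, t in zip(matvec(A, x), b))


def make_toy_problem(n_blocks, block_size, n_rows, seed=0):
    """Hypothetical helper: random Gaussian A and b for the toy objective."""
    rng = random.Random(seed)
    dim = n_blocks * block_size
    A = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_rows)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
    return A, b


def minibatch_bcfw(A, b, n_blocks, block_size, tau, iters, seed=0):
    """Serial mini-batch block-coordinate Frank-Wolfe with exact line search.

    Each iteration samples tau blocks, solves the simplex LMO on each one,
    and moves along the combined direction with a step clipped to [0, 1].
    """
    rng = random.Random(seed)
    # Start at the uniform distribution inside every simplex block.
    x = [1.0 / block_size] * (n_blocks * block_size)
    history = [objective(A, b, x)]
    for _ in range(iters):
        residual = [r - t for r, t in zip(matvec(A, x), b)]
        # Full gradient A^T (A x - b); computed in full here only for simplicity.
        grad = [sum(A[p][j] * residual[p] for p in range(len(A)))
                for j in range(len(x))]
        d = [0.0] * len(x)
        for i in rng.sample(range(n_blocks), tau):
            lo = i * block_size
            s = simplex_lmo(grad[lo:lo + block_size])
            for k in range(block_size):
                d[lo + k] = s[k] - x[lo + k]
        # Exact line search for a quadratic: gamma* = -<grad, d> / ||A d||^2.
        Ad = matvec(A, d)
        denom = sum(v * v for v in Ad)
        slope = sum(g * v for g, v in zip(grad, d))
        gamma = 0.0 if denom == 0.0 else min(1.0, max(0.0, -slope / denom))
        x = [xi + gamma * di for xi, di in zip(x, d)]
        history.append(objective(A, b, x))
    return x, history


A, b = make_toy_problem(n_blocks=6, block_size=3, n_rows=5, seed=1)
x, history = minibatch_bcfw(A, b, n_blocks=6, block_size=3, tau=2, iters=200)
print(f"objective: {history[0]:.4f} -> {history[-1]:.4f}")
```

Because the step comes from an exact line search clipped to [0, 1], the objective in this sketch never increases and every block stays inside its simplex. Changing `tau` here only changes how many blocks are updated per iteration; it says nothing about the wall-clock speedups reported in the paper.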