An efficient distributed learning algorithm based on effective local functional approximations
Authors: Dhruv Mahajan, Nikunj Agrawal, S. Sathiya Keerthi, Sundararajan Sellamanickam, Léon Bottou
JMLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the efficiency of our method by comparing it against several existing distributed training methods on five large data sets. We first discuss our experimental setup. We then briefly list each method considered and then do experiments to decide the best overall setting for each method. This applies to our method too, for which the setting is mainly decided by the choice made for the function approximation, ˆfp; see Subsection 3.2 for details of these choices. Finally, we compare, in detail, all the methods under their best settings. This study clearly demonstrates scenarios under which our method performs better than other methods. |
| Researcher Affiliation | Collaboration | Dhruv Mahajan (Facebook AI, Menlo Park, CA 94025, USA); Nikunj Agrawal (Indian Institute of Technology Kanpur, Dept. of Computer Science & Engineering, India); S. Sathiya Keerthi (Office Data Science Group, Microsoft, Mountain View, CA 94043, USA); Sundararajan Sellamanickam (Microsoft Research, Bangalore, India); Léon Bottou (Facebook AI Research, New York, NY, USA) |
| Pseudocode | Yes | Algorithm 1: Descent method for f. Algorithm 2: FADL (Function Approximation based Distributed Learning); com: communication; cmp: computation; agg: aggregation. M is the optimizer used for minimizing ˆfp. |
| Open Source Code | No | The text does not contain any explicit statement about releasing source code for the methodology described in this paper, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We consider the following publicly available datasets having a large number of examples:9 kdd2010, url, webspam, mnist8m and rcv. ... 9 These datasets are available at: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. |
| Dataset Splits | No | The paper mentions using "a small validation set" for tuning the regularizer and evaluating performance using AUPRC, but it does not provide specific details on how the datasets were split into training, validation, or test sets (e.g., percentages, sample counts, or explicit methodology for creating splits). |
| Hardware Specification | Yes | We ran all our experiments on a Hadoop cluster with 379 nodes and 10 Gbit interconnect speed. Each node has Intel (R) Xeon (R) E5-2450L (2 processors) running at 1.8 GHz. The communication bandwidth is 1 Gbps (gigabits per sec). |
| Software Dependencies | No | The paper mentions using specific optimization methods like "Trust Region Newton method (TRON)" and "conjugate-gradient method," and a framework "Hadoop." However, it does not provide specific version numbers for any of these software components, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | We use the squared-hinge loss function for all the experiments. Unless stated differently, for all numerical optimizations we use the Trust Region Newton method (TRON) proposed in Lin et al. (2008). ... We use α = 10^-4, β = 0.9 in (4) and (5). ... For optimizing the quadratic approximation in (10) with (14)-(15), we used the conjugate-gradient method (Shewchuk, 1994). For all other (non-quadratic) approximations of FADL as well as all nonlinear solvers needed by other methods, we used TRON. ... Each method was terminated when it reached within 0.1% of the steady state AUPRC value achieved by full, perfect training of (8). |
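The experiment-setup row reports the squared-hinge loss and line-search constants α = 10⁻⁴, β = 0.9, which are consistent with a standard Armijo backtracking line search. The sketch below illustrates that setup under explicit assumptions: the paper's exact regularized objective (eq. (8)) and line-search conditions (eqs. (4)–(5)) are not reproduced in this table, so a plain L2-regularized squared-hinge objective and a textbook Armijo condition are used; all function names and the regularization constant `lam` are illustrative, not the authors'.

```python
import numpy as np

def sq_hinge_objective(w, X, y, lam):
    """L2-regularized squared-hinge loss (illustrative stand-in for the
    paper's objective (8), which is not quoted in this table)."""
    margins = 1.0 - y * (X @ w)          # per-example margins
    active = np.maximum(margins, 0.0)    # only violated margins contribute
    return 0.5 * lam * (w @ w) + np.sum(active ** 2)

def sq_hinge_gradient(w, X, y, lam):
    """Gradient of the objective above."""
    margins = 1.0 - y * (X @ w)
    active = np.maximum(margins, 0.0)
    return lam * w - 2.0 * X.T @ (y * active)

def armijo_line_search(f, w, d, g, alpha=1e-4, beta=0.9):
    """Backtracking line search using the constants the paper reports
    (alpha = 1e-4, beta = 0.9); the paper's exact conditions (4)-(5)
    may differ from this plain Armijo sufficient-decrease test."""
    t = 1.0
    f0 = f(w)
    while f(w + t * d) > f0 + alpha * t * (g @ d):
        t *= beta
    return t

if __name__ == "__main__":
    # Tiny synthetic problem: one descent step along the negative gradient.
    X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0])
    lam, w = 0.1, np.zeros(2)
    g = sq_hinge_gradient(w, X, y, lam)
    d = -g
    f = lambda v: sq_hinge_objective(v, X, y, lam)
    t = armijo_line_search(f, w, d, g)
    print(f(w), f(w + t * d))  # the step should decrease the objective
```

In FADL the quadratic approximation (10) is minimized with conjugate gradient and the non-quadratic approximations with TRON; the loss/gradient pair above is the kind of oracle either inner solver would call.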