Automatic Differentiation in Machine Learning: a Survey
Authors: Atılım Güneş Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, Jeffrey Mark Siskind
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. |
| Researcher Affiliation | Academia | Atılım Güneş Baydin, Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, United Kingdom; Barak A. Pearlmutter, Department of Computer Science, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland; Alexey Andreyevich Radul, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, United States; Jeffrey Mark Siskind, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, United States |
| Pseudocode | Yes | Table 2: Forward mode AD example, with y = f(x1, x2) = ln(x1) + x1x2 − sin(x2) evaluated at (x1, x2) = (2, 5) and setting ẋ1 = 1 to compute ∂y/∂x1. The original forward evaluation of the primals on the left is augmented by the tangent operations on the right, where each line complements the original directly to its left. [...] Table 3: Reverse mode AD example, with y = f(x1, x2) = ln(x1) + x1x2 − sin(x2) evaluated at (x1, x2) = (2, 5). After the forward evaluation of the primals on the left, the adjoint operations on the right are evaluated in reverse (cf. Figure 1). Note that both ∂y/∂x1 and ∂y/∂x2 are computed in the same reverse pass, starting from the adjoint v̄5 = ∂y/∂y = 1. |
| Open Source Code | No | The paper is a survey and does not present new methodology for which it would release source code. It references numerous external open-source projects and tools (e.g., autograd, Chainer, PyTorch, DiffSharp) that implement AD techniques, but it does not provide its own implementation code for its described methodology. |
| Open Datasets | No | The paper is a survey and does not conduct its own experiments using datasets. It discusses a benchmark based on the Helmholtz free energy function (a mathematical function) to illustrate AD performance, but this is not a dataset in the typical machine learning sense, nor is it collected or used by the authors for their own empirical validation. |
| Dataset Splits | No | The paper is a survey and does not conduct its own experiments or use datasets; therefore, no dataset splits are discussed or provided. |
| Hardware Specification | Yes | Times are measured by averaging a thousand runs on a machine with Intel Core i7-4785T 2.20 GHz CPU and 16 GB RAM, using DiffSharp 0.5.7. |
| Software Dependencies | Yes | Times are measured by averaging a thousand runs on a machine with Intel Core i7-4785T 2.20 GHz CPU and 16 GB RAM, using DiffSharp 0.5.7. |
| Experiment Setup | No | This paper is a survey of Automatic Differentiation in Machine Learning and does not present new experimental results or conduct its own experiments. Therefore, it does not provide specific experimental setup details such as hyperparameters or training configurations for its own work. |
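The Table 2 and Table 3 examples quoted in the Pseudocode row can be reproduced with a minimal sketch. This is not the paper's own code (the paper's benchmark uses DiffSharp, an F# library); it is an illustrative Python version under the usual textbook formulation: forward mode propagates tangents alongside primals via dual numbers, while reverse mode records the evaluation trace and accumulates adjoints in a backward pass.

```python
import math
from dataclasses import dataclass


# --- Forward mode: dual numbers carry (primal v, tangent v̇) pairs ---
@dataclass
class Dual:
    val: float  # primal v
    dot: float  # tangent v̇

    def __add__(self, o): return Dual(self.val + o.val, self.dot + o.dot)
    def __sub__(self, o): return Dual(self.val - o.val, self.dot - o.dot)
    def __mul__(self, o): return Dual(self.val * o.val,
                                      self.dot * o.val + self.val * o.dot)


def dlog(x): return Dual(math.log(x.val), x.dot / x.val)
def dsin(x): return Dual(math.sin(x.val), x.dot * math.cos(x.val))


# y = ln(x1) + x1*x2 - sin(x2) at (2, 5), seeding ẋ1 = 1, ẋ2 = 0
a1, a2 = Dual(2.0, 1.0), Dual(5.0, 0.0)
y_fwd = dlog(a1) + a1 * a2 - dsin(a2)
# y_fwd.val ≈ 11.652 and y_fwd.dot = 1/2 + 5 = 5.5 = ∂y/∂x1,
# matching the v5 and v̇5 entries of Table 2.


# --- Reverse mode: each node records (parent, local partial) pairs ---
class Var:
    def __init__(self, val, parents=()):
        self.val = val
        self.adj = 0.0          # adjoint v̄
        self.parents = parents  # tuple of (parent Var, ∂self/∂parent)

    def __add__(self, o): return Var(self.val + o.val, ((self, 1.0), (o, 1.0)))
    def __sub__(self, o): return Var(self.val - o.val, ((self, 1.0), (o, -1.0)))
    def __mul__(self, o): return Var(self.val * o.val, ((self, o.val), (o, self.val)))


def rlog(x): return Var(math.log(x.val), ((x, 1.0 / x.val),))
def rsin(x): return Var(math.sin(x.val), ((x, math.cos(x.val)),))


def backward(out):
    """Propagate adjoints from out to the inputs in reverse topological order."""
    order, seen = [], set()

    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for p, _ in v.parents:
                visit(p)
            order.append(v)

    visit(out)
    out.adj = 1.0  # seed v̄5 = ∂y/∂y = 1, as in Table 3
    for v in reversed(order):
        for p, g in v.parents:
            p.adj += v.adj * g


x1, x2 = Var(2.0), Var(5.0)
y_rev = rlog(x1) + x1 * x2 - rsin(x2)
backward(y_rev)
# Both gradients fall out of one reverse pass:
# x1.adj = 1/2 + 5 = 5.5 and x2.adj = 2 - cos(5) ≈ 1.716.
```

As the review notes, reverse mode yields all input adjoints (∂y/∂x1 and ∂y/∂x2) in a single backward sweep, whereas forward mode requires one pass per seeded input tangent.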