PyTorch: An Imperative Style, High-Performance Deep Learning Library

Authors: Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "6 Evaluation: In this section we compare the performance of PyTorch with several other commonly used deep learning libraries, and find that it achieves competitive performance across a range of tasks. All experiments were performed on a workstation with two Intel Xeon E5-2698 v4 CPUs and one NVIDIA Quadro GP100 GPU. 6.1 Asynchronous dataflow: We start by quantifying the ability of PyTorch to asynchronously execute dataflow on GPU. We use the built-in profiler [44] to instrument various benchmarks and record a timeline of the execution of a single training step. 6.2 Memory management: We used the NVIDIA profiler to trace the execution of the CUDA runtime as well as the execution of the CUDA kernels launched during one training iteration of the ResNet-50 model. 6.3 Benchmarks: Finally, we can get an overall sense of single-machine eager-mode performance of PyTorch by comparing it to three popular graph-based deep learning frameworks (CNTK, MXNet and TensorFlow), a define-by-run framework (Chainer), and a production-oriented platform (PaddlePaddle). Our results are summarized in Table 1."
Researcher Affiliation | Collaboration | Adam Paszke, University of Warsaw, EMAIL; Sam Gross, Facebook AI Research, EMAIL; Francisco Massa, Facebook AI Research, EMAIL; Adam Lerer, Facebook AI Research, EMAIL; James Bradbury, Google, EMAIL; Gregory Chanan, Facebook AI Research, EMAIL; Trevor Killeen, Self Employed, EMAIL; Zeming Lin, Facebook AI Research, EMAIL; Natalia Gimelshein, NVIDIA, EMAIL; Luca Antiga, Orobix, EMAIL; Alban Desmaison, Oxford University, EMAIL; Andreas Köpf, Xamla, EMAIL; Edward Yang, Facebook AI Research, EMAIL; Zachary DeVito, Facebook AI Research, EMAIL; Martin Raison, Nabla, EMAIL; Alykhan Tejani, Twitter, EMAIL; Sasank Chilamkurthy, Qure.ai, EMAIL; Benoit Steiner, Facebook AI Research, EMAIL; Lu Fang, Facebook, EMAIL; Junjie Bai, Facebook, EMAIL; Soumith Chintala, Facebook AI Research, EMAIL
Pseudocode | Yes | Listing 1: A custom layer used as a building block for a simple but complete neural network. (...) Listing 2: Simplified training of a generative adversarial network.
Open Source Code | Yes | "This paper introduces PyTorch, a Python library that performs immediate execution of dynamic tensor computations with automatic differentiation and GPU acceleration, and does so while maintaining performance comparable to the fastest current libraries for deep learning."
Open Datasets | Yes | "Table 1: Training speed for 6 models using 32-bit floats. Throughput is measured in images per second for the AlexNet, VGG-19, ResNet-50, and MobileNet models, in tokens per second for the GNMTv2 model, and in samples per second for the NCF model. (...) The Appendix details all the steps needed to reproduce our setup."
Dataset Splits | Yes | "The Appendix details all the steps needed to reproduce our setup."
Hardware Specification | Yes | "All experiments were performed on a workstation with two Intel Xeon E5-2698 v4 CPUs and one NVIDIA Quadro GP100 GPU."
Software Dependencies | Yes | "The PyTorch team. PyTorch Autograd Profiler. https://pytorch.org/docs/1.0.1/autograd.html#profiler."
Experiment Setup | Yes | "The Appendix details all the steps needed to reproduce our setup."
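The evaluation excerpt quoted in the Research Type row describes instrumenting a single training step with PyTorch's built-in autograd profiler. A minimal sketch of that workflow follows; the model and input here are small placeholders, not the paper's actual benchmarks:

```python
import torch
import torch.nn as nn

# Small stand-in model; the paper profiles full training steps of real benchmarks.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(32, 64)

# Record the operations executed during one forward/backward step.
with torch.autograd.profiler.profile() as prof:
    loss = model(x).sum()
    loss.backward()

# Summarize recorded ops; prof.export_chrome_trace(path) would dump a
# timeline viewable in a trace viewer, matching the "timeline" use in §6.1.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

The same profiler is what the Software Dependencies row's documentation link refers to.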
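The Pseudocode row cites the paper's Listing 1, a custom layer used as a building block for a complete network. A hedged illustration of that define-by-run pattern, with illustrative names rather than the paper's exact listing:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyLayer(nn.Module):
    """A custom layer in the style of the paper's Listing 1 (illustrative)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        # Imperative forward pass; autograd records it as it executes.
        return F.relu(self.linear(x))

# Custom layers compose with built-in modules like any other nn.Module.
net = nn.Sequential(MyLayer(32, 64), nn.Linear(64, 10))
out = net(torch.randn(8, 32))
```

Subclassing `nn.Module` and writing an ordinary Python `forward` is what makes the eager, immediate-execution style described in the Open Source Code row possible.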
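The Open Datasets row quotes Table 1's throughput metric (images, tokens, or samples per second). That quantity is just items processed divided by wall-clock time; a minimal, framework-agnostic sketch (the paper's actual harness is described in its Appendix):

```python
import time

def throughput(step_fn, items_per_step, num_steps=10):
    """Return items/second averaged over num_steps calls of step_fn."""
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return items_per_step * num_steps / elapsed

# Example: a dummy "training step" over a batch of 32 images.
rate = throughput(lambda: sum(i * i for i in range(10_000)), items_per_step=32)
```

In a real benchmark, `step_fn` would be one full forward/backward/optimizer step, and GPU work would need to be synchronized before stopping the clock.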