A Primer on Neural Network Models for Natural Language Processing
Authors: Yoav Goldberg
JAIR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques. The tutorial covers input encoding for natural language tasks, feed-forward networks, convolutional networks, recurrent networks and recursive networks, as well as the computation graph abstraction for automatic gradient computation. This primer is not intended as a comprehensive resource for those that will go on and develop the next advances in neural-network machinery (though it may serve as a good entry point). Rather, it is aimed at those readers who are interested in taking the existing, useful technology and applying it in useful and creative ways to their favourite NLP problems. |
| Researcher Affiliation | Academia | Yoav Goldberg EMAIL Computer Science Department Bar-Ilan University, Israel |
| Pseudocode | Yes | Algorithm 1 Online Stochastic Gradient Descent Training, Algorithm 2 Minibatch Stochastic Gradient Descent Training, Algorithm 3 Computation Graph Forward Pass, Algorithm 4 Computation Graph Backward Pass (Backpropagation), Algorithm 5 Neural Network Training with Computation Graph Abstraction (using minibatches of size 1) |
| Open Source Code | No | Several software packages implement the computation-graph model, including Theano, Chainer, penne and CNN/pyCNN. All these packages support all the essential components (node types) for defining a wide range of neural network architectures, covering the structures described in this tutorial and more. The paper is a survey and describes general methods and tools; it does not present a specific new methodology that would require its own code release. The code snippets provided are illustrative examples for using existing libraries, not a release of the authors' own experimental code. |
| Open Datasets | No | The common case is that we do not have an auxiliary task with large enough amounts of annotated data (or maybe we want to help bootstrap the auxiliary task training with better vectors). In such cases, we resort to unsupervised methods, which can be trained on huge amounts of unannotated text. The paper is a survey and does not present its own experimental results; therefore, it does not specify any particular dataset used for its own work or provide concrete access information for such a dataset. |
| Dataset Splits | No | The paper is a survey and does not present its own experimental results; therefore, it does not contain specific dataset split information needed to reproduce data partitioning. |
| Hardware Specification | No | For modest sizes of m, some computing architectures (i.e. GPUs) allow an efficient parallel implementation of the computation in lines 6-9. On the one hand, once compiled, large graphs can be run efficiently on either the CPU or a GPU, making it ideal for large graphs with a fixed structure, where only the inputs change between instances. The paper does not describe the hardware used for any specific experiments as it is a survey paper and does not conduct experiments. |
| Software Dependencies | No | Several software packages implement the computation-graph model, including Theano, Chainer, penne and CNN/pyCNN. While these tools are mentioned, the paper does not specify software dependencies with version numbers for any experiments conducted by the authors, as it is a survey paper and does not conduct experiments. |
| Experiment Setup | No | The paper is a survey and does not present its own experimental results; therefore, it does not contain specific experimental setup details such as hyperparameters or training configurations. |
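The Pseudocode row above refers to the primer's Algorithms 1 and 3-5: stochastic gradient descent training expressed over a computation graph with a forward pass and a backward (backpropagation) pass. The following is a minimal sketch of that abstraction, not the paper's code: scalar graph nodes with reverse-mode gradient accumulation, driven by an online SGD loop. All names (`Node`, `backward`, the toy objective) are illustrative assumptions.

```python
class Node:
    """A scalar node in a computation graph (illustrative sketch)."""
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value              # forward-pass value
        self.parents = parents          # nodes this one depends on
        self.local_grads = local_grads  # d(self)/d(parent) for each parent
        self.grad = 0.0                 # accumulated d(loss)/d(self)

def add(a, b):
    return Node(a.value + b.value, (a, b), (1.0, 1.0))

def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))

def backward(loss):
    """Backward pass in the spirit of Algorithm 4: propagate gradients
    from the loss node to every ancestor in reverse topological order."""
    loss.grad = 1.0
    order, seen = [], set()
    def visit(n):                       # parents-first topological sort
        if id(n) not in seen:
            seen.add(id(n))
            for p in n.parents:
                visit(p)
            order.append(n)
    visit(loss)
    for node in reversed(order):        # consumers before producers
        for parent, g in zip(node.parents, node.local_grads):
            parent.grad += node.grad * g

# Online SGD in the spirit of Algorithm 1: fit y = w*x to the point (2, 6).
w, lr = 3.5, 0.1
for _ in range(50):
    w_node = Node(w)
    x, y = Node(2.0), 6.0
    pred = mul(w_node, x)
    diff = add(pred, Node(-y))
    loss = mul(diff, diff)              # squared error (2w - 6)^2
    backward(loss)                      # gradient: d(loss)/dw = 8w - 24
    w -= lr * w_node.grad               # SGD parameter update
```

Under this toy objective the update contracts toward w = 3, illustrating why the primer treats the graph abstraction as the common substrate for the feed-forward, convolutional and recurrent architectures it surveys: only the node types change, not the training loop.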