Encoding word order in complex embeddings
Authors: Benyou Wang, Donghao Zhao, Christina Lioma, Qiuchi Li, Peng Zhang, Jakob Grue Simonsen
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on text classification, machine translation and language modeling show gains over both classical word embeddings and position-enriched word embeddings. |
| Researcher Affiliation | Academia | Benyou Wang (University of Padua), Donghao Zhao (Tianjin University), Christina Lioma (University of Copenhagen), Qiuchi Li (University of Padua), Peng Zhang (Tianjin University), Jakob Grue Simonsen (University of Copenhagen) |
| Pseudocode | Yes | The paper lists basic code to construct the general embedding: `import torch; import math; class ComplexNN(torch.nn.Module): def __init__(self, opt): super(ComplexNN, self).__init__(); self.word_emb = torch.nn.Embedding(opt.n_token, opt.d_model); self.frequency_emb = torch.nn.Embedding(opt.n_token, opt.d_model); self.initial_phase_emb = torch.nn.Embedding(opt.n_token, opt.d_model)` (a runnable sketch follows the table). |
| Open Source Code | Yes | The code is on https://github.com/iclr-complex-order/complex-order |
| Open Datasets | Yes | We use six popular text classification datasets: CR, MPQA, SUBJ, MR, SST, and TREC (see Tab. 1)... We use the standard WMT 2016 English-German dataset (Sennrich et al., 2016)... We use the text8 (Mahoney, 2011) dataset |
| Dataset Splits | Yes | CV means 10-fold cross validation. The last 2 datasets come with train/dev/test splits. |
| Hardware Specification | Yes | Figure 2: Computation time (seconds) per epoch in Tensorflow on TITAN X GPU. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and 'torch' (PyTorch) but does not specify version numbers for any software or libraries. |
| Experiment Setup | Yes | We search the hyperparameters from a parameter pool, with batch size in {32, 64, 128}, learning rate in {0.001, 0.0001, 0.00001}, L2-regularization rate in {0, 0.001, 0.0001}, and number of hidden-layer units in {120, 128}. (An illustrative grid over this pool follows the table.) |
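
Below is a minimal runnable sketch of the module quoted in the Pseudocode row. Only the constructor with its three embedding tables (`word_emb`, `frequency_emb`, `initial_phase_emb`) appears in the paper's listing; the `forward` pass is an assumed reconstruction of the paper's per-dimension complex embedding, amplitude · e^{i(frequency · position + initial phase)}, returning real and imaginary parts separately, and `opt` is a hypothetical configuration object carrying `n_token` and `d_model`.

```python
import torch
from types import SimpleNamespace


class ComplexNN(torch.nn.Module):
    def __init__(self, opt):
        super().__init__()
        # Amplitude, frequency, and initial phase per token and per
        # dimension, exactly the three tables in the paper's listing.
        self.word_emb = torch.nn.Embedding(opt.n_token, opt.d_model)
        self.frequency_emb = torch.nn.Embedding(opt.n_token, opt.d_model)
        self.initial_phase_emb = torch.nn.Embedding(opt.n_token, opt.d_model)

    def forward(self, tokens):
        # tokens: LongTensor of shape (batch, seq_len).
        # NOTE: this forward pass is an assumption; the paper's listing
        # shows only the constructor above.
        seq_len = tokens.size(1)
        # Positions 1..seq_len, shaped for broadcasting over batch and dims.
        pos = torch.arange(1, seq_len + 1, dtype=torch.float32,
                           device=tokens.device).view(1, seq_len, 1)
        amplitude = self.word_emb(tokens)  # (batch, seq_len, d_model)
        # Element-wise phase: frequency * position + initial phase.
        phase = self.frequency_emb(tokens) * pos + self.initial_phase_emb(tokens)
        # Real and imaginary parts of amplitude * exp(i * phase).
        return amplitude * torch.cos(phase), amplitude * torch.sin(phase)


# Usage with a hypothetical configuration object:
opt = SimpleNamespace(n_token=10_000, d_model=512)
emb = ComplexNN(opt)
real, imag = emb(torch.randint(0, opt.n_token, (2, 7)))  # two sequences of 7 tokens
```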
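
And a sketch of the hyperparameter search described in the Experiment Setup row. The paper gives only the value pool; the exhaustive grid enumeration and the commented-out `train_and_evaluate` hook below are assumptions for illustration.

```python
import itertools

# Value pool quoted from the paper; the search procedure itself is not
# specified there, so a plain grid enumeration is assumed here.
param_pool = {
    "batch_size": [32, 64, 128],
    "learning_rate": [1e-3, 1e-4, 1e-5],
    "l2_rate": [0.0, 1e-3, 1e-4],
    "hidden_units": [120, 128],
}

for values in itertools.product(*param_pool.values()):
    config = dict(zip(param_pool, values))
    # train_and_evaluate(config)  # hypothetical user-supplied training hook
    print(config)
```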