Learning semantic similarity in a continuous space

Authors: Michel Deudon

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our approach is evaluated on Quora duplicate questions dataset and performs strongly. ... Table 1 compares different models for this task on the Quora dataset." |
| Researcher Affiliation | Academia | "Michel Deudon, Ecole Polytechnique, Palaiseau, France, EMAIL" |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Our code is made publicly available on github." [https://github.com/MichelDeudon/variational-siamese-network] |
| Open Datasets | Yes | "We evaluated our proposed framework on Quora question pairs dataset which consists of 404k sentence pairs annotated with a binary value that indicates whether a pair is duplicate (same intent) or not." [https://www.kaggle.com/quora/question-pairs-dataset] (loading sketch below) |
| Dataset Splits | No | The paper mentions a 'Dev set' in Table 1 and Figure 5, implying a validation split, but gives no size, percentage, or methodology for that split, stating only that 'The split considered is that of BiMPM [47]'. |
| Hardware Specification | Yes | "For a given query, our model runs 3500+ comparisons per second on two Tesla K80. ... All queries were retrieved in less than a second on two Tesla K80 GPU." |
| Software Dependencies | Yes | "We implemented our model using python 3.5.4, tensorflow 1.3.0 [42], gensim 3.0.1 [43] and nltk 3.2.4 [44]." (version-pinning sketch below) |
| Experiment Setup | Yes | "Our variational space (µ, σ) is of dimension h = 1000. Our bi-LSTM encoder network consists of a single layer of 2h neurons and our LSTM [31] decoder has a single layer with 1000 neurons. Our MLP's inner layer has 1000 neurons. All weights were randomly initialized with 'Xavier' initializer [45]. ... We employ stochastic gradient descent with ADAM optimizer [46] (lr = 0.001, β1 = 0.9, β2 = 0.999) and batches of size 256 and 128. Our learning rate is initialized for both tasks to 0.001, decayed every 5000 steps by a factor 0.96 with an exponential scheme. We clip the L2 norm of our gradients to 1.0 to avoid exploding gradients in deep neural networks." (training-setup sketch below) |
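For reference, the Kaggle release of the Quora question pairs cited above can be loaded in a few lines. This is a minimal sketch; the file name and column names are assumptions based on the Kaggle dataset, not details given in the paper.

```python
# Minimal sketch: loading the ~404k Quora question pairs.
# The file name "questions.csv" and the column names are assumed from the
# Kaggle release and are not specified in the paper.
import pandas as pd

pairs = pd.read_csv("questions.csv")
q1 = pairs["question1"].astype(str)
q2 = pairs["question2"].astype(str)
labels = pairs["is_duplicate"].astype(int)   # 1 if the two questions share the same intent

print("{} pairs, {} labeled as duplicates".format(len(pairs), int(labels.sum())))
```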
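The reported dependency versions can be pinned as below. This is only a sketch: it assumes a Python 3.5 environment in which these old releases (tensorflow 1.3.0, gensim 3.0.1, nltk 3.2.4) can still be resolved by pip.

```python
# Minimal sketch: pinning the dependency versions reported in the paper.
# Assumes pip is available and can still install these old releases.
import subprocess
import sys

subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "tensorflow==1.3.0", "gensim==3.0.1", "nltk==3.2.4",
])
```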
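The reported training configuration maps directly onto the TensorFlow 1.x API the authors cite: Xavier initialization, a single bi-LSTM layer of 2h units, Adam with an exponentially decayed learning rate, and global-norm gradient clipping. The sketch below reconstructs that configuration under stated assumptions; the embedding size, placeholder shapes, variable names, and toy loss are illustrative and are not the paper's actual code.

```python
# Minimal sketch of the reported setup in TensorFlow 1.x: a single bi-LSTM
# encoder layer totalling 2h units (h = 1000), Xavier initialization, Adam with
# lr = 0.001 (beta1 = 0.9, beta2 = 0.999) decayed by 0.96 every 5000 steps,
# and gradients clipped to global norm 1.0. Embedding size, placeholder shapes
# and the toy loss are illustrative assumptions, not the paper's code.
import tensorflow as tf

h = 1000              # dimension of the variational space (mu, sigma)
embedding_dim = 300   # assumed word-embedding size; not stated in this row

# batch x time x embedding_dim word vectors for one question
tokens = tf.placeholder(tf.float32, [None, None, embedding_dim], name="tokens")

with tf.variable_scope("encoder",
                       initializer=tf.contrib.layers.xavier_initializer()):
    cell_fw = tf.nn.rnn_cell.LSTMCell(h)   # forward direction, h units
    cell_bw = tf.nn.rnn_cell.LSTMCell(h)   # backward direction, h units
    _, states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, tokens, dtype=tf.float32)
    # Concatenating both final hidden states gives the 2h sentence encoding.
    sentence_repr = tf.concat([states[0].h, states[1].h], axis=-1)

# Toy loss standing in for the variational and classification objectives.
loss = tf.reduce_mean(tf.square(sentence_repr))

global_step = tf.Variable(0, trainable=False, name="global_step")
learning_rate = tf.train.exponential_decay(
    0.001, global_step, decay_steps=5000, decay_rate=0.96)
optimizer = tf.train.AdamOptimizer(learning_rate, beta1=0.9, beta2=0.999)
grads, variables = zip(*optimizer.compute_gradients(loss))
clipped, _ = tf.clip_by_global_norm(grads, 1.0)   # clip gradient L2 norm to 1.0
train_op = optimizer.apply_gradients(zip(clipped, variables),
                                     global_step=global_step)
```

Batches of 256 or 128 sentence pairs, as reported, would then be fed through the `tokens` placeholder when running `train_op` in a session.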