Knowledge Graph Completion via Complex Tensor Factorization
Authors: Théo Trouillon, Christopher R. Dance, Éric Gaussier, Johannes Welbl, Sebastian Riedel, Guillaume Bouchard
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the method proposed in this paper on both synthetic and real data sets. ... We compared ComplEx to state-of-the-art models, namely TransE (Bordes et al., 2013b), DistMult (Yang et al., 2015), RESCAL (Nickel et al., 2011) and also to the canonical polyadic decomposition (CP) (Hitchcock, 1927), to emphasize empirically the importance of learning unique embeddings for entities. |
| Researcher Affiliation | Collaboration | Théo Trouillon EMAIL Univ. Grenoble Alpes, 700 avenue Centrale, 38401 Saint Martin d'Hères, France; Christopher R. Dance EMAIL NAVER LABS Europe, 6 chemin de Maupertuis, 38240 Meylan, France; Éric Gaussier EMAIL Univ. Grenoble Alpes, 700 avenue Centrale, 38401 Saint Martin d'Hères, France; Johannes Welbl EMAIL, Sebastian Riedel EMAIL University College London, Gower St, London WC1E 6BT, United Kingdom; Guillaume Bouchard EMAIL Bloomsbury AI, 115 Hampstead Road, London NW1 3EE, United Kingdom, and University College London, Gower St, London WC1E 6BT, United Kingdom |
| Pseudocode | Yes | Algorithm 1 describes stochastic gradient descent (SGD) to learn the proposed multi-relational model with the AdaGrad learning-rate updates (Duchi et al., 2011). We refer to the proposed model as ComplEx, for Complex Embeddings. |
| Open Source Code | Yes | Code is available at: https://github.com/ttrouill/complex |
| Open Datasets | Yes | The Kinships data set (Denham, 1973) describes the 26 different kinship relations of the Alyawarra tribe in Australia, among 104 individuals. The unified medical language system (UMLS) data set (McCray, 2003) represents 135 medical concepts and diseases, linked by 49 relations describing their interactions. ... Finally, we evaluated ComplEx on the FB15K and WN18 data sets, as they are well established benchmarks for the link prediction task. |
| Dataset Splits | Yes | We conducted a 5-fold cross-validation on the lower-triangular matrices, using the upper-triangular parts plus 3 folds for training, one fold for validation and one fold for testing. Each training set contains 1392 observed triples, whereas validation and test sets contain 174 triples each. ... We performed a 10-fold cross-validation, keeping 8 for training, one for validation and one for testing. ... We used the same training, validation and test set splits as in Bordes et al. (2013b). Table 3 summarizes the metadata of the two data sets. |
| Hardware Specification | Yes | training the ComplEx model on a single GPU (NVIDIA Tesla P40) takes 45 minutes on WN18 (K = 150, η = 1), and three hours on FB15K (K = 200, η = 10). |
| Software Dependencies | No | For experimental fairness, we reimplemented these models within the same framework as the ComplEx model, using a Theano-based SGD implementation (Bergstra et al., 2010). |
| Experiment Setup | Yes | In all the following experiments we used a maximum number of iterations m = 1000, a batch size b = \|Ω\| / 100, and validated the models for early stopping every s = 50 iterations. ... Reported results are given for the best set of hyper-parameters evaluated on the validation set for each model, after a distributed grid-search on the following values: K ∈ {10, 20, 50, 100, 150, 200}, λ ∈ {0.1, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.0}, α ∈ {1.0, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01}, η ∈ {1, 2, 5, 10} with λ the L2 regularization parameter, α the initial learning rate, and η the number of negatives generated per positive training triple. |
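
The grid and training schedule quoted in the Experiment Setup row can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the authors' repository. The names `GRID`, `configurations`, `batch_size` and `validation_iterations` are hypothetical. Rounding the batch size down to an integer, and placing the first validation check at iteration 50, are assumptions the quoted text does not settle.

```python
from itertools import product

# Grid values copied from the Experiment Setup row above.
GRID = {
    "K": [10, 20, 50, 100, 150, 200],                      # embedding rank
    "lam": [0.1, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.0],   # L2 regularization
    "alpha": [1.0, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01],       # initial learning rate
    "eta": [1, 2, 5, 10],                                  # negatives per positive
}

MAX_ITERS = 1000       # m in the quoted text
VALIDATE_EVERY = 50    # s in the quoted text


def configurations(grid):
    """Yield one dict per point of the full Cartesian grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def batch_size(num_train_triples):
    """b = |Omega| / 100; flooring to an integer is an assumption."""
    return max(1, num_train_triples // 100)


def validation_iterations(max_iters=MAX_ITERS, every=VALIDATE_EVERY):
    """Iterations at which early stopping would be checked (assumed to start at `every`)."""
    return list(range(every, max_iters + 1, every))


configs = list(configurations(GRID))
print(len(configs), "configurations in the full grid")
print("batch size for 1392 training triples:", batch_size(1392))
print(len(validation_iterations()), "validation checks per run at most")
```

Under these assumptions the full grid has 6 × 7 × 7 × 4 = 1176 points per model, which is consistent with the paper's mention of a distributed search.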