Is Complex Query Answering Really Complex?
Authors: Cosimo Gregucci, Bo Xiong, Daniel Hernández, Lorenzo Loconte, Pasquale Minervini, Steffen Staab, Antonio Vergari
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a systematic empirical investigation, the new benchmarks show that current CQA methods leave much to be desired. We re-evaluate previous SoTA approaches (Sec. 5), revealing that neural link predictors rely on memorized information from the training set. |
| Researcher Affiliation | Collaboration | 1Institute for Artificial Intelligence, University of Stuttgart, Germany; 2Stanford University; 3School of Informatics, University of Edinburgh, UK; 4Miniml.AI; 5University of Southampton, UK. |
| Pseudocode | No | The paper describes methods and procedures in narrative form, without explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Old and new benchmarks, the generation scripts, and the implementation of CQD-hybrid are included in our official repo: https://github.com/april-tools/is-cqa-complex |
| Open Datasets | Yes | Performance measured on de-facto standard benchmarks such as FB15k237 (Toutanova & Chen, 2015) and NELL995 (Xiong et al., 2017) suggests impressive progress achieved in recent years on CQA on queries having different structures... To these, we build ICEWS18+H from the temporal KG ICEWS18 (Boschee et al., 2015)... |
| Dataset Splits | Yes | To evaluate them, standard benchmarks such as FB15k237 and NELL995 artificially divide G into Gtrain and Gtest, treating the triples in the latter as missing links. To this end, we leverage the temporal information in ICEWS18 by (1) ordering the links based on their timestamp; (2) removing the temporal information, thus obtaining regular triples; and (3) selecting the train set to be the first temporally-ordered 80% of triples, the validation set the next 10%, and the remainder to be the test split. |
| Hardware Specification | No | The paper mentions receiving compute time on "HoreKa HPC (NHR@KIT)" but does not provide specific details on the GPU/CPU models, processor types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions various models and frameworks used (e.g., GNN-QE, ULTRAQ, CQD, ComplEx, ConE, CLMPT, QTO) but does not provide specific version numbers for these software components or their underlying libraries (e.g., PyTorch, TensorFlow, Python version, CUDA version). |
| Experiment Setup | Yes | CQD-specific hyperparameters, namely the CQD beam size k, ranging in [2, 512], and the t-norm type being prod or min. In Table F.1 we provide the hyperparameter selection for the old benchmarks FB15k237 and NELL995. GNN-QE: we tuned the following hyperparameters: (1) batch size, with values 8 or 48, and concat hidden being True or False... CQD: we train a ComplEx (Trouillon et al., 2017) link predictor with hyperparameters reg. weight 0.1 or 0.01, and batch size 1000 or 2000... CLMPT: we tuned the following hyperparameters: (1) learning rate, with values in [1e-5, 5e-2, 5e-3, 5e-4, 5e-5, 5e-6], (2) temp, with values in [0.1, 0.2]... |
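The temporal split quoted in the Dataset Splits row can be sketched in a few lines. This is a minimal illustration on toy data, not the authors' actual generation script from the linked repo; the quadruple layout `(head, relation, tail, timestamp)` and the helper name `temporal_split` are assumptions for the sketch.

```python
def temporal_split(quads, train_frac=0.8, valid_frac=0.1):
    """Split timestamped quadruples (h, r, t, ts) into train/valid/test
    triples: (1) order by timestamp, (2) drop the temporal information,
    (3) take the first 80% as train, the next 10% as valid, rest as test.
    Hypothetical helper illustrating the procedure described in the paper."""
    ordered = sorted(quads, key=lambda q: q[3])      # (1) order by timestamp
    triples = [(h, r, t) for h, r, t, _ in ordered]  # (2) strip timestamps
    n_train = int(len(triples) * train_frac)
    n_valid = int(len(triples) * valid_frac)
    return (triples[:n_train],                       # (3) 80/10/10 split
            triples[n_train:n_train + n_valid],
            triples[n_train + n_valid:])

# Toy example: 10 quadruples with shuffled timestamps.
quads = [(f"h{i}", "r", f"t{i}", ts)
         for i, ts in enumerate([7, 2, 9, 4, 1, 8, 3, 10, 5, 6])]
train, valid, test = temporal_split(quads)
# train holds the 8 temporally-earliest triples; valid and test hold 1 each.
```

Splitting on time rather than uniformly at random avoids test links that predate training links, which matters for the paper's argument about memorized information.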