reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Query and Predicate Emptiness in Ontology-Based Data Access

Authors: Franz Baader, Meghyn Bienvenu, Carsten Lutz, Frank Wolter

JAIR 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To demonstrate that predicate emptiness is a useful reasoning service for static analysis, we perform experiments using the well-known and large-scale medical ontology SNOMED CT coupled with both a real-world data vocabulary (corresponding to terms obtained by analyzing clinical notes from a hospital) and with randomly generated vocabularies. For the real world vocabulary, which contains 8,858 of the 370,000 concept names and 16 of the 62 role names in SNOMED CT, 16,212 predicates turned out to be non-empty for IQs and 17,339 to be non-empty for CQs. Thus, SNOMED CT provides a very substantial number of additional predicates for query formulation while a large number of other predicates cannot meaningfully be used in queries over Σ-databases; thus, identifying the relevant predicates via predicate emptiness is potentially very helpful. We also consider the use of query and predicate emptiness for the extraction of modules from an ontology. Thus, instead of using emptiness directly to support query formulation, we show how it can be used to simplify an ontology. ... To analyze the practical interest of CQΣ-cores, we carry out a case study where we compute CQΣ-cores for the ontology SNOMED CT coupled with various signatures, showing that they tend to be drastically smaller than the original ontology and also smaller than ⊥-modules, a popular way of extracting modules from ontologies (Grau, Horrocks, Kazakov, & Sattler, 2008).
Researcher Affiliation	Academia	Franz Baader EMAIL TU Dresden, Germany Meghyn Bienvenu EMAIL CNRS, Université de Montpellier & INRIA, France Carsten Lutz EMAIL University of Bremen, Germany Frank Wolter EMAIL Department of Computer Science University of Liverpool, UK
Pseudocode	No	The paper describes methods and proofs using mathematical notation and textual descriptions of procedures. It does not contain any clearly labeled pseudocode or algorithm blocks. For example, in Appendix C, a construction is described with bullet points, but it is not formatted as pseudocode.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code, nor does it provide links to any code repositories for the methodologies described.
Open Datasets	Yes	To demonstrate that predicate emptiness is a useful reasoning service for static analysis, we perform experiments using the well-known and large-scale medical ontology SNOMED CT coupled with both a real-world data vocabulary (corresponding to terms obtained by analyzing clinical notes from a hospital) and with randomly generated vocabularies. ... Ontologies of this kind typically have a very broad coverage and their vocabulary often contain tens or even hundreds of thousands of predicates that embrace various subject areas such as anatomy, diseases, medication, and even social context and geographic location. In particular, there are now many such ontologies in the bio-medical domain such as SNOMED CT (IHTSDO, 2016), NCI (Golbeck, Fragoso, Hartel, Hendler, Oberthaler, & Parsia, 2003), and GO (Gene Ontology Consortium, 2016), which are all formulated in a DL and allow a comparably inexpensive adoption of OBDA in bio-medical applications such as querying electronic medical records (Patel, Cimino, Dolby, Fokoue, Kalyanpur, Kershenbaum, Ma, Schonberg, & Srinivas, 2007).
Dataset Splits	No	The paper discusses using a 'real-world data vocabulary' and 'randomly generated vocabularies' for its case study, and describes the parameters for generating these vocabularies (e.g., 'randomly generated signatures that contain 500, 1,000, 5,000, and 10,000 concept names and 16 or 31 role names'), but it does not describe dataset splits for training, testing, or validation in the context of machine learning experiments.
Hardware Specification	No	The paper discusses the computational complexity of various problems and presents a case study with experimental results. However, it does not specify any particular hardware (e.g., CPU, GPU models, memory) used to conduct these experiments or run the algorithms.
Software Dependencies	No	The paper mentions various description logics (e.g., EL, DL-Lite, ALC) and ontologies (SNOMED CT, NCI, GO), and refers to OWL2 profiles. However, it does not list any specific software libraries, frameworks, or tools with their version numbers that were used for implementation or experimentation.
Experiment Setup	Yes	We have analyzed randomly generated signatures that contain 500, 1,000, 5,000, and 10,000 concept names and 16 or 31 role names (1/4 and 1/2 of the role names in the ontology). Every signature contains the special role name role-group, which is used in SNOMED CT to implement a certain modeling pattern and should be present also in ABoxes to allow the same pattern there. For each number of concept and role names, we generated 10 signatures.
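
The signature-generation setup quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' code, which the paper does not release. The function name `generate_signatures`, the seed, and the placeholder predicate names are hypothetical, and the sketch assumes that `role-group` counts toward the 16 or 31 role names, which the quoted text does not state.

```python
import random
from itertools import product

ROLE_GROUP = "role-group"  # special role name that every signature must contain


def generate_signatures(concept_names, role_names,
                        concept_sizes=(500, 1000, 5000, 10000),
                        role_sizes=(16, 31),
                        per_config=10,
                        seed=0):
    """Draw `per_config` random signatures for each (concepts, roles) size pair.

    Assumption: role-group is counted as one of the n_roles role names.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    other_roles = [r for r in role_names if r != ROLE_GROUP]
    signatures = {}
    for n_concepts, n_roles in product(concept_sizes, role_sizes):
        batch = []
        for _ in range(per_config):
            concepts = set(rng.sample(concept_names, n_concepts))
            # Sample one role fewer, then add role-group explicitly.
            roles = set(rng.sample(other_roles, n_roles - 1)) | {ROLE_GROUP}
            batch.append((concepts, roles))
        signatures[(n_concepts, n_roles)] = batch
    return signatures


if __name__ == "__main__":
    # Placeholder vocabulary; a real run would load names from SNOMED CT.
    concepts = [f"C{i}" for i in range(20000)]
    roles = [ROLE_GROUP] + [f"r{i}" for i in range(61)]  # 62 role names in total
    sigs = generate_signatures(concepts, roles)
    print(len(sigs), "configurations,", len(sigs[(500, 16)]), "signatures each")
```

With four concept-name sizes and two role-name sizes, this yields 8 configurations of 10 signatures each, matching the counts in the quoted setup.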