Towards an Ontology-Driven Approach to Document Bias

Authors: Mayra Russo, Maria-Esther Vidal

JAIR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In addition, we demonstrate the potential of Doc-BiasO with an experiment on an existing benchmark and as part of a neuro-symbolic system. Overall, our main objective is to contribute towards clarifying existing terminology on bias research as it rapidly expands to all areas of AI and to improve the interpretation of bias in data and downstream impact through its documentation."
Researcher Affiliation | Academia | Mayra Russo, L3S Research Center & Leibniz University Hannover, Germany; Maria-Esther Vidal, TIB Leibniz Information Center for Science and Technology, L3S Research Center & Leibniz University Hannover, Germany
Pseudocode | No | The paper describes methodologies and processes but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks, nor does it present structured, code-like procedural steps for an algorithm.
Open Source Code | No | The paper states that the Doc-BiasO ontology is publicly available on GitHub (footnote 18: https://github.com/SDM-TIB/Doc-BIASO) and as a VoCol repository. However, it does not provide access to source code for the experimental methodology or the implementations described in the use cases.
Open Datasets | Yes | "The datasets we use here were elaborated from data of the United States (US) Census, and were introduced as an updated version of the UCI Adult dataset [13]. The American Community Survey (ACS) Public Use Microdata Sample (PUMS) ACS PUMS Dataset [22], comprises tabular data on individuals across the United States spanning multiple years... The datasets are accessible through the Folktables Python package." (footnote 24: https://github.com/socialfoundations/folktables)
Dataset Splits | No | The paper describes performing a bias analysis by extracting counts of individuals per age group and conducting a distribution analysis over the datasets. However, it does not provide specific train/test/validation splits (e.g., percentages, sample counts, or predefined splits) for machine learning model training or evaluation.
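The analysis the paper does report, per-age-group counts and the distribution over them, can be sketched in a few lines. This is a minimal illustration on synthetic ages with hypothetical bin edges; the paper does not specify its actual bins or data slices.

```python
# Sketch of a per-age-group count and distribution analysis.
# The bin edges and the synthetic ages below are illustrative
# assumptions, not the paper's actual configuration.
from collections import Counter

def age_group(age: int) -> str:
    """Map an age to a coarse bin (hypothetical bin edges)."""
    if age < 25:
        return "<25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"

def age_distribution(ages):
    """Return {group: (count, relative frequency)} for a list of ages."""
    counts = Counter(age_group(a) for a in ages)
    total = sum(counts.values())
    return {g: (n, n / total) for g, n in counts.items()}

# Synthetic stand-in for an age column of a tabular census sample.
sample_ages = [19, 23, 31, 38, 44, 52, 58, 61, 67, 72]
dist = age_distribution(sample_ages)
```

Comparing such per-group frequencies across datasets (or against a reference population) is one straightforward way to surface the representation skews that the documentation artifacts discussed here are meant to record.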
Hardware Specification | No | The paper does not provide specific hardware details, such as exact GPU/CPU models, processor types, or memory amounts, used for running its experiments.
Software Dependencies | No | The paper mentions software such as the Folktables Python package and Probabilistic Soft Logic (PSL), but does not provide version numbers for these or other key components of the experimental setup, which would be necessary for reproduction. It does mention Pellet (v.2.2.0) and Protégé (v.5.6.1), but these were used for ontology validation, not the experimental methodology.
Experiment Setup | No | The paper describes two use cases demonstrating the ontology: a bias analysis over a benchmark dataset and integration into a neuro-symbolic system. However, it does not provide specific experimental setup details, such as hyperparameter values, model initialization, or training configurations, for any machine learning models involved in these demonstrations.