Towards an Ontology-Driven Approach to Document Bias
Authors: Mayra Russo, Maria-Esther Vidal
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition, we demonstrate the potential of Doc-BiasO with an experiment on an existing benchmark and as part of a neuro-symbolic system. Overall, our main objective is to contribute towards clarifying existing terminology on bias research as it rapidly expands to all areas of AI and to improve the interpretation of bias in data and downstream impact through its documentation. |
| Researcher Affiliation | Academia | MAYRA RUSSO, L3S Research Center & Leibniz University Hannover, Germany; MARIA-ESTHER VIDAL, TIB Leibniz Information Center for Science and Technology, L3S Research Center & Leibniz University of Hannover, Germany |
| Pseudocode | No | The paper describes methodologies and processes but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present any structured, code-like procedural steps for an algorithm. |
| Open Source Code | No | The paper mentions that the Doc-BiasO ontology is publicly available on GitHub (footnote 18: https://github.com/SDM-TIB/Doc-BIASO) and as a VoCol repository. However, it does not provide concrete access to source code for the experimental methodology or implementations described in the use cases. |
| Open Datasets | Yes | The datasets we use here were elaborated from data of the United States (US) Census, and were introduced as an updated version of the UCI Adult dataset [13]. The American Community Survey (ACS) Public Use Microdata Sample (PUMS) ACS PUMS Dataset [22], comprises tabular data on individuals across the United States spanning multiple years... The datasets are accessible through the Folktables Python package. (footnote 24: https://github.com/socialfoundations/folktables) |
| Dataset Splits | No | The paper describes performing a bias analysis by extracting counts of individuals for age groups and conducting distribution analysis over the datasets. However, it does not provide specific train/test/validation splits (e.g., percentages, sample counts, or predefined splits) for machine learning model training or evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions software like the 'Folktables Python package' and 'Probabilistic Soft Logic (PSL)' but does not provide specific version numbers for these or other key software components used in the experimental setup, which would be necessary for reproduction. It does mention 'Pellet (v.2.2.0)' and 'Protégé (v.5.6.1)' but these were for ontology validation, not the experimental methodology. |
| Experiment Setup | No | The paper describes two use cases to demonstrate the ontology, including a bias analysis over a benchmark dataset and integration into a neuro-symbolic system. However, it does not provide specific experimental setup details such as hyperparameter values, model initialization, or training configurations for any machine learning models involved in these demonstrations. |
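
The Dataset Splits row describes the bias analysis as extracting counts of individuals per age group and examining their distribution. As a rough illustration only, that kind of count could be sketched as below. This is not the authors' code: the age bins, the `age_group` and `age_group_distribution` helpers, and the synthetic rows are all assumptions made for this sketch, and the `AGEP` key is assumed to be the age field of the ACS PUMS data. Real data would come from the Folktables package rather than the hand-written sample.

```python
from collections import Counter

# Hypothetical age bins for illustration; the grouping used in the paper
# is not stated in the summary above.
AGE_BINS = [(0, 17), (18, 24), (25, 44), (45, 64), (65, 120)]


def age_group(age):
    """Return the label of the bin that contains `age`, or 'unknown'."""
    for low, high in AGE_BINS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "unknown"


def age_group_distribution(records, age_key="AGEP"):
    """Map each age-group label to a (count, share of all records) pair."""
    counts = Counter(age_group(record[age_key]) for record in records)
    total = sum(counts.values())
    return {group: (n, n / total) for group, n in counts.items()}


# Synthetic stand-in rows (assumption); not taken from any real dataset.
sample = [{"AGEP": age} for age in (16, 22, 23, 37, 41, 52, 70)]
distribution = age_group_distribution(sample)

for group, (count, share) in sorted(distribution.items()):
    print(group, count, round(share, 3))
```

Comparing the resulting shares across groups, or across survey years, is one simple way to surface the kind of representation imbalance that a bias documentation effort would then record.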