Aequitas Flow: Streamlining Fair ML Experimentation
Authors: Sérgio Jesus, Pedro Saleiro, Inês Oliveira e Silva, Beatriz M. Jorge, Rita P. Ribeiro, João Gama, Pedro Bizarro, Rayid Ghani
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Aequitas Flow is an open-source framework and toolkit for end-to-end Fair Machine Learning (ML) experimentation and benchmarking in Python. The package fills integration gaps that exist in other fair ML packages. In addition to the existing audit capabilities in Aequitas, the Aequitas Flow module provides a pipeline for fairness-aware model training, hyperparameter optimization, and evaluation, enabling easy-to-use and rapid experiments and analysis of results. The goal is to help 1) researchers compare and benchmark new methods they develop against existing methods in a systematic and reproducible manner, and 2) practitioners easily evaluate existing bias mitigation methods and deploy the ones that best match their goals. Table 1: Comparison of packages for training and evaluation of fair ML methods. Figure 2: Plots introduced in Aequitas Flow; plot (a) is designed for model selection, and plot (b) compares the different tested methods. |
| Researcher Affiliation | Collaboration | Sérgio Jesus (1, 2), Pedro Saleiro (1), Inês Oliveira e Silva (1), Beatriz M. Jorge (1), Rita P. Ribeiro (2), João Gama (2), Pedro Bizarro (1), Rayid Ghani (3); 1: Feedzai, 2: University of Porto, 3: Carnegie Mellon University |
| Pseudocode | No | The paper includes Python code snippets demonstrating the usage of the Aequitas Flow framework, such as `exp = Experiment(config_file="configs/experiment.yaml"); exp.run()` and `model = methods.inprocessing.FairGBM(); model.fit(train.X, train.y, train.s)`. However, these are examples of implementation code rather than structured pseudocode or algorithm blocks describing a method's steps. |
| Open Source Code | Yes | Aequitas Flow is an open-source framework and toolkit for end-to-end Fair Machine Learning (ML) experimentation and benchmarking in Python. This paper introduces Aequitas Flow, an open-source framework for reproducible and extensible end-to-end fair ML experimentation that extends Aequitas, the authors' original bias audit toolkit. Repository: https://github.com/dssg/aequitas |
| Open Datasets | Yes | The framework initially encompasses eleven tabular datasets, including the Bank Account Fraud suite (Jesus et al., 2022) and Folktables (Ding et al., 2021). The component also permits user-supplied datasets in CSV or parquet formats, with splits based on a column or generated randomly. Example from the paper: `dataset = datasets.FolkTables(variant=ACSIncome)` |
| Dataset Splits | No | The paper mentions that the Datasets component has two primary functions: "loading the data and generating splits." It also states that "The component also permits user-supplied datasets in CSV or parquet formats with splits based on a column, or randomly." While it describes the capability of the framework to create splits, it does not provide specific details (e.g., percentages, methodology, random seeds) for the experimental splits used in the paper itself. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It focuses on the software framework and its capabilities. |
| Software Dependencies | No | The paper mentions the use of 'Python', 'Optuna (Akiba et al., 2019)' for hyperparameter selection, 'Aequitas (Saleiro et al., 2018)' for bias auditing, and 'pandas dataframe format (pandas development team, 2020)'. However, it does not specify exact version numbers for Python or any of these libraries. |
| Experiment Setup | No | The paper describes the functionalities of the 'Optimizer' component, stating that 'Several attributes of the hyperparameter optimization can be determined by configurations, such as the number of trials and jobs, the selection algorithm (e.g., random search, grid search), and the random seed.' However, it does not provide the specific hyperparameter values or training configurations used for any experiments conducted by the authors in the paper, instead describing how a user of the framework *could* set them. |
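The Pseudocode row quotes a `fit(train.X, train.y, train.s)` call that takes a sensitive attribute alongside features and labels. As a minimal illustration of that interface pattern, here is a toy pure-Python sketch; the class, its per-group thresholding logic, and all names are hypothetical, not the FairGBM algorithm or the Aequitas Flow API:

```python
class GroupThresholdClassifier:
    """Toy fairness-aware model following the fit(X, y, s) pattern:
    scores by a single feature and learns one decision threshold per
    sensitive group (a simple per-group thresholding idea)."""

    def fit(self, X, y, s):
        # X: list of feature vectors, y: 0/1 labels, s: group id per row.
        self.thresholds = {}
        for group in set(s):
            # Threshold at the mean first-feature value of the group's positives.
            pos = [x[0] for x, label, g in zip(X, y, s) if g == group and label == 1]
            self.thresholds[group] = sum(pos) / len(pos) if pos else 0.0
        return self

    def predict(self, X, s):
        # Apply each row's group-specific threshold.
        return [1 if x[0] >= self.thresholds[g] else 0 for x, g in zip(X, s)]
```

The point of the sketch is only the signature: bias mitigation methods consume the sensitive attribute `s` at training time, unlike the standard two-argument `fit(X, y)` convention.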
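The Dataset Splits row notes that the Datasets component generates splits either randomly or based on a column. A minimal pure-Python sketch of those two behaviors, with hypothetical function names (not the Aequitas Flow API), assuming rows as dicts and a fixed seed for reproducibility:

```python
import random

def random_split(rows, fractions=(0.7, 0.15, 0.15), seed=42):
    """Shuffle rows with a fixed seed and cut into train/val/test."""
    assert abs(sum(fractions) - 1.0) < 1e-9
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def column_split(rows, column, train_values):
    """Assign rows to train/test by the value of a given column
    (e.g., temporal splits on a month column)."""
    train = [r for r in rows if r[column] in train_values]
    test = [r for r in rows if r[column] not in train_values]
    return train, test
```

Column-based splitting is the relevant mode for datasets such as Bank Account Fraud, where evaluation is typically done on later time periods.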
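The Experiment Setup row describes the Optimizer's configurable attributes: number of trials, selection algorithm (e.g., random search), and random seed. A minimal random-search sketch in pure Python illustrating those three knobs; the function and parameter names are hypothetical, not the Optuna or Aequitas Flow API:

```python
import random

def random_search(objective, space, n_trials=20, seed=42):
    """Sample hyperparameters uniformly from `space` and keep the best trial.

    `space` maps a parameter name to a (low, high) range; `objective`
    receives a dict of sampled values and returns a score to maximize.
    The seed makes the trial sequence reproducible.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In the actual framework these attributes are set through configuration files rather than function arguments, and Optuna supplies the selection algorithms.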