Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces

Authors: Eric Eaton, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we experimentally validate our theoretical results and demonstrate applications on a preferential attachment graph MDP. We evaluate Fair Fict RL (Algorithm 4) on Barabási–Albert graphs (Barabási & Pósfai, 2016) with groups assigned based on the degree distribution of nodes. We show that our algorithm converges efficiently to a solution with low average constraint violations for all groups, while still optimizing the global objective. See Section 4.
Researcher Affiliation | Academia | 1) Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA; 2) Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA.
Pseudocode | Yes | Algorithm 1: MORL-BRNR (Multi-Objective RL Approximate Min-Max RL Algorithm); Algorithm 2: FTPL in Tabular MDPs; Algorithm 3: Contextual FTPL in Large State Space MDPs; Algorithm 4: Fair Fict RL; Algorithm 5: Contextual FTPL Error Cancellation; Algorithm 6: Fair Fict RL Error Cancellation
Open Source Code | No | The paper does not provide any specific links to source code repositories, nor does it explicitly state that the code for the described methodology is open-source or available in supplementary materials.
Open Datasets | No | The paper describes constructing an MDP based on Barabási–Albert graphs: "We construct a multi-objective RL task based on such graphs using the Barabási–Albert model (Barabási & Pósfai, 2016)." It references a graph model but does not provide a specific, publicly available dataset with access information.
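The paper's graph MDP is built on Barabási–Albert preferential attachment with groups assigned by node degree. A minimal self-contained sketch of that construction is below; the exact group split is not specified in the paper, so the degree threshold and the "hub"/"leaf" labels here are hypothetical illustration only.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a Barabási–Albert graph: each new node attaches m edges,
    preferring high-degree targets (preferential attachment)."""
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))   # initial m nodes
    repeated = []              # node list weighted by current degree
    for source in range(m, n):
        edges.extend((source, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([source] * m)
        # sample m distinct targets with probability proportional to degree
        targets = []
        while len(targets) < m:
            t = rng.choice(repeated)
            if t not in targets:
                targets.append(t)
    return edges

def degree_groups(edges, n, threshold):
    """Assign each node to a group by degree (hypothetical split:
    'hub' above the threshold, else 'leaf')."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return ["hub" if d > threshold else "leaf" for d in deg]

edges = barabasi_albert(n=100, m=2)
groups = degree_groups(edges, n=100, threshold=4)
```

With n nodes and m edges per arrival, the graph has (n − m)·m edges, and the heavy-tailed degree distribution yields a few high-degree "hub" nodes alongside many low-degree ones, which is what makes degree-based group assignment interesting for fairness constraints.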
Dataset Splits | No | The paper evaluates an RL algorithm on a constructed MDP. It describes the MDP construction and reward structure, but does not refer to static datasets or provide details on training/test/validation splits, as is typical for RL environments where data is generated through interaction.
Hardware Specification | No | The paper does not contain any specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper mentions using "standard value iteration as an oracle for the learner" but does not specify any software libraries, frameworks, or their version numbers used for implementing the algorithms or experiments.
Experiment Setup | Yes | First, we run Fair Fict RL with a target minimum average group reward of αH = 0.04. Given the reward structure in the graph, not all groups will obtain this reward under the optimal policy. We use standard value iteration as an oracle for the learner; for higher-dimensional problems, this could be substituted with a deep RL algorithm. The regulator runs the policy returned by the learner for 500 episodes to obtain an average reward estimate and best responds according to Definition 2.3 with Cek = 25.
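The setup above uses value iteration as the learner's oracle and a 500-episode Monte Carlo rollout for the regulator's reward estimate. A minimal sketch of those two pieces follows; the 3-state MDP, rewards, discount, and horizon here are hypothetical toy values, not the paper's graph MDP.

```python
import random

# Hypothetical tabular MDP: P[s][a] -> list of (probability, next_state),
# R[s][a] -> immediate reward. Toy values for illustration only.
P = {0: {0: [(1.0, 1)], 1: [(0.5, 0), (0.5, 2)]},
     1: {0: [(1.0, 2)], 1: [(1.0, 0)]},
     2: {0: [(1.0, 2)], 1: [(1.0, 1)]}}
R = {0: {0: 0.0, 1: 0.1}, 1: {0: 1.0, 1: 0.0}, 2: {0: 0.2, 1: 0.0}}
GAMMA = 0.9

def value_iteration(P, R, gamma, iters=500):
    """Standard value iteration oracle; returns a greedy deterministic policy."""
    V = {s: 0.0 for s in P}
    for _ in range(iters):
        V = {s: max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                    for a in P[s])
             for s in P}
    return {s: max(P[s], key=lambda a: R[s][a] + gamma *
                   sum(p * V[s2] for p, s2 in P[s][a]))
            for s in P}

def average_reward(policy, episodes=500, horizon=20, seed=0):
    """Monte Carlo estimate of mean episodic reward, mirroring the
    regulator's 500-episode rollout of the learner's policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            a = policy[s]
            total += R[s][a]
            probs, states = zip(*P[s][a])
            s = rng.choices(states, weights=probs)[0]
    return total / episodes

pi = value_iteration(P, R, GAMMA)
est = average_reward(pi)
```

In the paper's protocol, the regulator would compare such per-group estimates against the αH = 0.04 threshold and best respond per Definition 2.3; this sketch only shows the oracle-plus-rollout loop that estimate rests on.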