Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces
Authors: Eric Eaton, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we experimentally validate our theoretical results and demonstrate applications on a preferential attachment graph MDP. We evaluate Fair Fict RL (Algorithm 4) on Barabási-Albert graphs (Barabási & Pósfai, 2016) with groups assigned based on the degree distribution of nodes. We show that our algorithm converges efficiently to a solution with low average constraint violations for all groups, while still optimizing the global objective. See Section 4. |
| Researcher Affiliation | Academia | 1Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA 2Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA. |
| Pseudocode | Yes | Algorithm 1: MORL-BRNR (Multi-Objective RL Approximate Min-Max RL Algorithm); Algorithm 2: FTPL in Tabular MDPs; Algorithm 3: Contextual FTPL in Large State Space MDPs; Algorithm 4: Fair Fict RL; Algorithm 5: Contextual FTPL Error Cancellation; Algorithm 6: Fair Fict RL Error Cancellation |
| Open Source Code | No | The paper does not provide any specific links to source code repositories, nor does it explicitly state that the code for the described methodology is open-source or available in supplementary materials. |
| Open Datasets | No | The paper describes constructing an MDP based on Barabási-Albert graphs: "We construct a multi-objective RL task based on such graphs using the Barabási-Albert model (Barabási & Pósfai, 2016)." It references a graph model but does not provide a specific, publicly available dataset with access information. |
| Dataset Splits | No | The paper evaluates an RL algorithm on a constructed MDP. It describes the MDP construction and reward structure, but does not refer to static datasets or provide details on training/test/validation splits, as is typical for RL environments where data is generated through interaction. |
| Hardware Specification | No | The paper does not contain any specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions using "standard value iteration as an oracle for the learner" but does not specify any software libraries, frameworks, or their version numbers used for implementing the algorithms or experiments. |
| Experiment Setup | Yes | First, we run Fair Fict RL with a minimum value of average group reward that we want to achieve of αH = 0.04. Given the reward structure in the graph, it is not the case that all groups will obtain this reward under the optimal policy. We use standard value iteration as an oracle for the learner; for higher-dimensional problems, this could be substituted with a deep RL algorithm. The regulator runs the policy returned by the learner for 500 episodes to obtain an average reward estimate and best responds according to Definition 2.3 with Cek = 25. |
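To make the evaluation environment concrete, the following is a minimal sketch of a preferential-attachment (Barabási-Albert) graph with nodes partitioned into groups by degree, as the paper describes. The grouping rule (degree quantiles), the parameters, and the function names are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def ba_graph(n, m, rng):
    """Barabási-Albert graph: each new node attaches to m existing nodes,
    chosen with probability proportional to their current degree."""
    edges = []
    repeated = []                        # node v appears deg(v) times here
    targets = list(range(m))             # seed: the initial m nodes
    for source in range(m, n):
        edges += [(source, t) for t in targets]
        repeated += targets + [source] * m
        targets = []
        while len(targets) < m:          # m distinct degree-weighted picks
            t = repeated[rng.integers(len(repeated))]
            if t not in targets:
                targets.append(t)
    return edges

def degree_groups(edges, n, n_groups):
    """Assign each node a group id by quantile of the degree distribution."""
    deg = np.zeros(n, dtype=int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    cuts = np.quantile(deg, np.linspace(0, 1, n_groups + 1)[1:-1])
    return np.searchsorted(cuts, deg)    # group id in 0..n_groups-1

rng = np.random.default_rng(0)
edges = ba_graph(50, 2, rng)
groups = degree_groups(edges, 50, 3)
```

Because attachment is degree-proportional, a few early hub nodes accumulate high degree, so degree-quantile groups naturally separate hubs from peripheral nodes.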
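The experiment-setup row describes a learner/regulator loop: the learner best-responds with value iteration to the regulator's current weighting of group rewards, and the regulator upweights the group whose average reward falls below the floor αH. The sketch below mimics that loop on a hand-coded toy MDP; the ring MDP, the reward definition, the regulator's update, and all constants are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic MDP: 6 states on a ring, actions = {stay, step}.
S, A, K = 6, 2, 2                        # states, actions, groups
P = np.stack([np.arange(S),              # action 0: stay
              (np.arange(S) + 1) % S],   # action 1: step clockwise
             axis=1)                     # P[s, a] = next state
groups = np.array([0, 0, 0, 0, 1, 1])    # group label of each state

def value_iteration(P, r, gamma=0.9, iters=300):
    """Exact planner standing in for the learner's oracle; r is per-state."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = r[P] + gamma * V[P]          # Q-values, shape (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)              # greedy deterministic policy

def avg_group_rewards(pi, episodes=500, horizon=20):
    """Monte-Carlo estimate of each group's average per-step reward."""
    totals = np.zeros(K)
    for _ in range(episodes):
        s = rng.integers(S)
        for _ in range(horizon):
            s = P[s, pi[s]]
            totals[groups[s]] += 1.0     # reward 1 for visiting the group
    return totals / (episodes * horizon)

alpha, C = 0.04, 25.0                    # reward floor and regulator scale
weights = np.ones(K) / K                 # regulator's initial weighting
policies = []
for _ in range(30):
    r = weights[groups].astype(float)    # learner plans vs. weighted reward
    pi = value_iteration(P, r)
    policies.append(pi)
    avg = avg_group_rewards(pi)          # regulator rolls out the policy
    gap = alpha - avg                    # per-group constraint violation
    if gap.max() > 0:                    # upweight the worst-off group
        weights = np.zeros(K)
        weights[np.argmax(gap)] = C
```

In the paper's framing the output is (roughly) the mixture over the policies collected across rounds, which is what drives the low average constraint violations; here `policies` plays that role.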