Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces

Authors: Eric Eaton, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we experimentally validate our theoretical results and demonstrate applications on a preferential attachment graph MDP. We evaluate Fair Fict RL (Algorithm 4) on Barabási–Albert graphs (Barabási & Pósfai, 2016) with groups assigned based on the degree distribution of nodes. We show that our algorithm converges efficiently to a solution with low average constraint violations for all groups, while still optimizing the global objective. See Section 4.
Researcher Affiliation | Academia | 1) Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA; 2) Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA.
Pseudocode | Yes | Algorithm 1: MORL-BRNR (Multi-Objective RL Approximate Min-Max RL Algorithm); Algorithm 2: FTPL in Tabular MDPs; Algorithm 3: Contextual FTPL in Large State Space MDPs; Algorithm 4: Fair Fict RL; Algorithm 5: Contextual FTPL Error Cancellation; Algorithm 6: Fair Fict RL Error Cancellation
Open Source Code | No | The paper does not provide any specific links to source code repositories, nor does it explicitly state that the code for the described methodology is open-source or available in supplementary materials.
Open Datasets | No | The paper describes constructing an MDP based on Barabási–Albert graphs: "We construct a multi-objective RL task based on such graphs using the Barabási–Albert model (Barabási & Pósfai, 2016)." It references a graph model but does not provide a specific, publicly available dataset with access information.
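The paper's graph MDP is built on Barabási–Albert preferential attachment with groups assigned by node degree. A minimal self-contained sketch of that construction is below; the exact group split is not specified in the paper, so the degree threshold and the "hub"/"leaf" labels here are hypothetical illustration only.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a Barabási–Albert graph: each new node attaches m edges,
    preferring high-degree targets (preferential attachment)."""
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))   # initial m nodes
    repeated = []              # node list weighted by current degree
    for source in range(m, n):
        edges.extend((source, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([source] * m)
        # sample m distinct targets with probability proportional to degree
        targets = []
        while len(targets) < m:
            t = rng.choice(repeated)
            if t not in targets:
                targets.append(t)
    return edges

def degree_groups(edges, n, threshold):
    """Assign each node to a group by degree (hypothetical split:
    'hub' above the threshold, else 'leaf')."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return ["hub" if d > threshold else "leaf" for d in deg]

edges = barabasi_albert(n=100, m=2)
groups = degree_groups(edges, n=100, threshold=4)
```

With n nodes and m edges per arrival, the graph has (n − m)·m edges, and the heavy-tailed degree distribution yields a few high-degree "hub" nodes alongside many low-degree ones, which is what makes degree-based group assignment interesting for fairness constraints.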
Dataset Splits | No | The paper evaluates an RL algorithm on a constructed MDP. It describes the MDP construction and reward structure, but does not refer to static datasets or provide details on training/test/validation splits, as is typical for RL environments where data is generated through interaction.
Hardware Specification | No | The paper does not contain any specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper mentions using "standard value iteration as an oracle for the learner" but does not specify any software libraries, frameworks, or their version numbers used for implementing the algorithms or experiments.
Experiment Setup | Yes | First, we run Fair Fict RL with a target minimum average group reward of αH = 0.04. Given the reward structure in the graph, not all groups will obtain this reward under the optimal policy. We use standard value iteration as an oracle for the learner; for higher-dimensional problems, this could be substituted with a deep RL algorithm. The regulator runs the policy returned by the learner for 500 episodes to obtain an average reward estimate and best responds according to Definition 2.3 with Cek = 25.
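The setup above uses value iteration as the learner's oracle and a 500-episode Monte Carlo rollout for the regulator's reward estimate. A minimal sketch of those two pieces follows; the 3-state MDP, rewards, discount, and horizon here are hypothetical toy values, not the paper's graph MDP.

```python
import random

# Hypothetical tabular MDP: P[s][a] -> list of (probability, next_state),
# R[s][a] -> immediate reward. Toy values for illustration only.
P = {0: {0: [(1.0, 1)], 1: [(0.5, 0), (0.5, 2)]},
     1: {0: [(1.0, 2)], 1: [(1.0, 0)]},
     2: {0: [(1.0, 2)], 1: [(1.0, 1)]}}
R = {0: {0: 0.0, 1: 0.1}, 1: {0: 1.0, 1: 0.0}, 2: {0: 0.2, 1: 0.0}}
GAMMA = 0.9

def value_iteration(P, R, gamma, iters=500):
    """Standard value iteration oracle; returns a greedy deterministic policy."""
    V = {s: 0.0 for s in P}
    for _ in range(iters):
        V = {s: max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                    for a in P[s])
             for s in P}
    return {s: max(P[s], key=lambda a: R[s][a] + gamma *
                   sum(p * V[s2] for p, s2 in P[s][a]))
            for s in P}

def average_reward(policy, episodes=500, horizon=20, seed=0):
    """Monte Carlo estimate of mean episodic reward, mirroring the
    regulator's 500-episode rollout of the learner's policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            a = policy[s]
            total += R[s][a]
            probs, states = zip(*P[s][a])
            s = rng.choices(states, weights=probs)[0]
    return total / episodes

pi = value_iteration(P, R, GAMMA)
est = average_reward(pi)
```

In the paper's protocol, the regulator would compare such per-group estimates against the αH = 0.04 threshold and best respond per Definition 2.3; this sketch only shows the oracle-plus-rollout loop that estimate rests on.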