Group Fairness in Reinforcement Learning
Authors: Harsh Satija, Alessandro Lazaric, Matteo Pirotta, Joelle Pineau
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report encouraging empirical results on various traditional RL-inspired benchmarks showing that our algorithms display the desired behavior of learning the optimal policy while performing a fair learning process. ... In Section 4.3, we provide empirical evidence that our approach is indeed able to achieve good performance while achieving the fairness requirement on simulated robotic locomotion and navigation tasks. |
| Researcher Affiliation | Collaboration | Harsh Satija EMAIL McGill University, Mila Alessandro Lazaric EMAIL Meta AI (FAIR) Matteo Pirotta EMAIL Meta AI (FAIR) Joelle Pineau EMAIL McGill University, Mila, Meta AI (FAIR) |
| Pseudocode | Yes | Algorithm 1 LP based algorithm for Section 3 ... Algorithm 2 Experiment procedure for River Swim ... Algorithm 3 General algorithm methodology for the Deep-RL case ... Algorithm 4 FOC-PPO for |Z| = 2 |
| Open Source Code | Yes | The implementations for both environments are provided in the supplemental material. |
| Open Datasets | Yes | For the first set of experiments, we modify the Half-Cheetah-v3 environment from the OpenAI Gym (Brockman et al., 2016)... We take the River Swim environment (|S| = 7, H = 10, |A| = 2) (Strehl and Littman, 2008) |
| Dataset Splits | No | In Appendix G.4, we train the algorithms with a fixed random seed in a train environment, and then evaluate the performance of the algorithm on ten test environments, each with a different random seed. This describes environments for training and testing, not specific dataset splits like percentages or absolute counts for a fixed dataset. |
| Hardware Specification | Yes | In terms of compute, on an Nvidia Quadro RTX 8000 GPU with AMD EPYC 7502 32-Core Processor, the navigation experiments take about 3 hours to run with 16 CPU cores and Half-Cheetah experiments take about 7 hours to run with a single CPU core. |
| Software Dependencies | No | We use PyTorch (Paszke et al., 2019) for implementing the Deep-RL algorithms. ... We used cvxpy (Diamond and Boyd, 2016) with the default parameters for solving all the different LP problems. Specific version numbers for PyTorch and cvxpy are not provided. |
| Experiment Setup | Yes | We set the νmax hyper-parameter to a very large value (1000.0) and did not fine-tune it. We found that the learning rate for the ν1, ν2 parameters typically works best in the range [0.01, 0.1] for our tasks, and we used α = 0.01 for our experiments. For the λ hyper-parameter, we searched over {1.0, 1.5, 3.0, 10.0} and used λ = 1.0 for the maze navigation tasks and λ = 1.5 for the Half-Cheetah tasks. The initial values of the ν1, ν2 parameters are set to 0. |
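The reported ν settings (initialized to 0, learning rate α = 0.01, capped at νmax = 1000.0) are consistent with a standard clipped Lagrange-multiplier update. The sketch below illustrates that update shape only; the function name `dual_update` and the exact form of `constraint_gap` are assumptions for illustration, not the paper's actual implementation.

```python
NU_MAX = 1000.0  # reported nu_max, deliberately large and not fine-tuned
ALPHA = 0.01     # reported learning rate for the nu parameters

def dual_update(nu: float, constraint_gap: float) -> float:
    """One projected gradient-ascent step on a multiplier.

    A positive constraint_gap (constraint violated) increases nu;
    the result is clipped to [0, NU_MAX].
    """
    return min(max(nu + ALPHA * constraint_gap, 0.0), NU_MAX)

# Multipliers start at 0, as stated in the reported setup.
nu1 = nu2 = 0.0
nu1 = dual_update(nu1, 0.5)  # -> 0.005: violated constraint raises nu1
nu2 = dual_update(nu2, -0.5)  # -> 0.0: satisfied constraint keeps nu2 at the floor
```

Clipping at a very large νmax keeps the dual variables bounded without effectively constraining them in practice, which matches the report's note that νmax was not tuned.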