Group Fairness in Reinforcement Learning
Authors: Harsh Satija, Alessandro Lazaric, Matteo Pirotta, Joelle Pineau
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report encouraging empirical results on various traditional RL-inspired benchmarks showing that our algorithms display the desired behavior of learning the optimal policy while performing a fair learning process. ... In Section 4.3, we provide empirical evidence that our approach is indeed able to achieve good performance while achieving the fairness requirement on simulated robotic locomotion and navigation tasks. |
| Researcher Affiliation | Collaboration | Harsh Satija EMAIL McGill University, Mila Alessandro Lazaric EMAIL Meta AI (FAIR) Matteo Pirotta EMAIL Meta AI (FAIR) Joelle Pineau EMAIL McGill University, Mila, Meta AI (FAIR) |
| Pseudocode | Yes | Algorithm 1 LP based algorithm for Section 3 ... Algorithm 2 Experiment procedure for River Swim ... Algorithm 3 General algorithm methodology for the Deep-RL case ... Algorithm 4 FOC-PPO for |Z| = 2 |
| Open Source Code | Yes | The implementations for both environments are provided in the supplemental material. |
| Open Datasets | Yes | For the first set of experiments, we modify the Half-Cheetah-v3 environment from the OpenAI Gym (Brockman et al., 2016)... We take the River Swim environment (|S| = 7, H = 10, |A| = 2) (Strehl and Littman, 2008) |
| Dataset Splits | No | In Appendix G.4, we train the algorithms with a fixed random seed in a train environment, and then evaluate the performance of the algorithm on ten test environments, each with a different random seed. This describes environments for training and testing, not specific dataset splits like percentages or absolute counts for a fixed dataset. |
| Hardware Specification | Yes | In terms of compute, on an Nvidia Quadro RTX 8000 GPU with AMD EPYC 7502 32-Core Processor, the navigation experiments take about 3 hours to run with 16 CPU cores and Half-Cheetah experiments take about 7 hours to run with a single CPU core. |
| Software Dependencies | No | We use PyTorch (Paszke et al., 2019) for implementing the Deep-RL algorithms. ... We used cvxpy (Diamond and Boyd, 2016) with the default parameters for solving all the different LP problems. Specific version numbers for PyTorch and cvxpy are not provided. |
| Experiment Setup | Yes | We set the νmax hyper-parameter to a very large value (1000.0) and did not fine-tune it. We found that the learning rate for the ν1, ν2 parameters typically works best in the range [0.01, 0.1] for our tasks, and we used α = 0.01 for our experiments. For the λ hyper-parameter, we searched over {1.0, 1.5, 3.0, 10.0} and used λ = 1.0 for the maze navigation tasks and λ = 1.5 for the Half-Cheetah tasks. The initial values of the ν1, ν2 parameters are set to 0. |
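The reported ν settings (initialized to 0, learning rate α = 0.01, capped at νmax = 1000.0) are consistent with a standard clipped Lagrange-multiplier update. The sketch below illustrates that update shape only; the function name `dual_update` and the exact form of `constraint_gap` are assumptions for illustration, not the paper's actual implementation.

```python
NU_MAX = 1000.0  # reported nu_max, deliberately large and not fine-tuned
ALPHA = 0.01     # reported learning rate for the nu parameters

def dual_update(nu: float, constraint_gap: float) -> float:
    """One projected gradient-ascent step on a multiplier.

    A positive constraint_gap (constraint violated) increases nu;
    the result is clipped to [0, NU_MAX].
    """
    return min(max(nu + ALPHA * constraint_gap, 0.0), NU_MAX)

# Multipliers start at 0, as stated in the reported setup.
nu1 = nu2 = 0.0
nu1 = dual_update(nu1, 0.5)  # -> 0.005: violated constraint raises nu1
nu2 = dual_update(nu2, -0.5)  # -> 0.0: satisfied constraint keeps nu2 at the floor
```

Clipping at a very large νmax keeps the dual variables bounded without effectively constraining them in practice, which matches the report's note that νmax was not tuned.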