A Causal Lens for Learning Long-term Fair Policies

Authors: Jacob Lear, Lu Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to evaluate the policy optimization algorithms we have proposed and compare them with baselines regarding the achievement of long-term fairness.
Researcher Affiliation | Academia | Jacob Lear & Lu Zhang, Department of Electrical Engineering and Computer Science, University of Arkansas, EMAIL
Pseudocode | No | The paper describes its methodology and policy optimizations (PPO, PPO-C, PPO-Cb) in paragraph form and mathematical equations (e.g., Section 3.3, Policy Optimization, and Section 3.4, Causal Decomposition of C^π(θ)), but it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | All the code is available at https://github.com/j-proj/Causal-Lens-Fair-RL.
Open Datasets | Yes | We leverage the simulation environment developed in D'Amour et al. (2020) that is commonly used in related work (e.g., Yu et al. (2022); Hu et al. (2023)). ... Setting 1 uses probabilities generated using the Home Credit Default Risk dataset (Montoya et al., 2018), and the probabilities for Setting 2 are from a dataset previously released by Lending Club (Wagh, 2017).
Dataset Splits | No | The paper describes using repayment probabilities generated by fitting a logistic regression model to credit-score datasets (the Home Credit Default Risk dataset (Montoya et al., 2018) and Lending Club (Wagh, 2017)). However, it does not provide specific details on the training/validation/test splits used for these datasets or for the overall experimental setup.
Hardware Specification | No | The paper describes a simulation environment for its experiments but does not provide any specific details about the hardware used to run the simulations or train the models.
Software Dependencies | No | The paper refers to policy optimization algorithms such as PPO but does not specify any software libraries, programming languages, or version numbers used for its implementation or experiments.
Experiment Setup | Yes | Our choice for policy optimization is mostly typical: a variant of Proximal Policy Optimization (PPO) that incorporates the KL divergence as a penalty (Schulman et al., 2017). ... The final objective function is obtained by incorporating Λ into Eq. (2): J(θ) = L_UTIL − β_KL L_KL − β_C (Ĉ^π)^2 − β_Λ Λ. ... In Figure 7, a larger β_KL can help reduce the model variance. ... In Figure 8, we show the influence of the strength of enforcing benefit fairness on long-term fairness. Notably, we observe that the loan rate for β_Λ = 2 is more balanced than that for β_Λ = 0.
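The objective quoted above combines a utility term with three subtracted penalties: a KL term, a squared long-term fairness constraint Ĉ^π, and a benefit-fairness term Λ. A minimal sketch of how these terms compose is shown below; the function name and the default β weights are illustrative assumptions, not values taken from the paper, and the four loss terms would in practice come from the PPO rollout.

```python
def fairness_objective(l_util, l_kl, c_hat, lam,
                       beta_kl=0.5, beta_c=1.0, beta_lam=2.0):
    """Sketch of J(theta) = L_UTIL - beta_KL * L_KL
                          - beta_C * (C_hat)^2 - beta_Lam * Lambda.

    l_util: utility (reward) term to maximize
    l_kl:   KL-divergence penalty between new and old policies
    c_hat:  estimated long-term fairness constraint value C_hat^pi
    lam:    benefit-fairness penalty Lambda
    The beta_* weights are illustrative defaults, not the paper's settings.
    """
    return l_util - beta_kl * l_kl - beta_c * c_hat ** 2 - beta_lam * lam
```

Note how the squared constraint (Ĉ^π)^2 penalizes deviations in either direction, while setting β_Λ = 0 recovers an objective without benefit-fairness enforcement, matching the ablation the paper reports in Figure 8.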