A Causal Lens for Learning Long-term Fair Policies
Authors: Jacob Lear, Lu Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to evaluate the policy optimization algorithms we have proposed and compare them with baselines regarding the achievement of long-term fairness. |
| Researcher Affiliation | Academia | Jacob Lear & Lu Zhang, Department of Electrical Engineering and Computer Science, University of Arkansas |
| Pseudocode | No | The paper describes methodologies and policy optimizations (PPO, PPO-C, PPO-Cb) in paragraph form and mathematical equations (e.g., Section 3.3 Policy Optimization and Section 3.4 Causal Decomposition of Cπ(θ)), but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All the code is available at https://github.com/j-proj/Causal-Lens-Fair-RL. |
| Open Datasets | Yes | We leverage the simulation environment developed in D'Amour et al. (2020) that is commonly used in related work (e.g., Yu et al. (2022); Hu et al. (2023)). ... Setting 1 uses probabilities generated using the Home Credit Default Risk dataset Montoya et al. (2018), and the probabilities for Setting 2 are from a dataset previously released by Lending Club Wagh (2017). |
| Dataset Splits | No | The paper describes using repayment probabilities generated by fitting a logistic regression model to credit score datasets (Home Credit Default Risk dataset Montoya et al. (2018) and Lending Club Wagh (2017)). However, it does not provide specific details on training/test/validation splits used for these datasets or for the overall experimental setup. |
| Hardware Specification | No | The paper describes a simulation environment for experiments but does not provide any specific details about the hardware used to run these simulations or train the models. |
| Software Dependencies | No | The paper refers to policy optimization algorithms like PPO but does not specify any particular software libraries, programming languages, or version numbers used for its implementation or experiments. |
| Experiment Setup | Yes | Our choice for policy optimization is mostly typical as a variant of Proximal Policy Optimization (PPO) that incorporates the KL divergence as a penalty Schulman et al. (2017). ... The final objective function is obtained by incorporating Λ into Eq. (2): J(θ) = L_UTIL − β_KL·L_KL − β_C·(Ĉ_π)² − β_Λ·Λ. ... In Figure 7, a larger β_KL can help reduce the model variance. ... In Figure 8, we show the influence of the strength of enforcing benefit fairness on long-term fairness. Notably, we observe that the loan rate for β_Λ = 2 is more balanced than that for β_Λ = 0. |
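The penalized objective quoted in the Experiment Setup row can be sketched as a single scalar computation. This is a minimal illustration, not the authors' implementation: the function name, default coefficients, and plain-float inputs are assumptions for clarity (the paper only reports that β_Λ = 2 was compared against β_Λ = 0).

```python
def fairness_penalized_objective(l_util, l_kl, c_hat, lam,
                                 beta_kl=0.2, beta_c=1.0, beta_lam=2.0):
    """Sketch of J(θ) = L_UTIL − β_KL·L_KL − β_C·(Ĉπ)² − β_Λ·Λ.

    l_util : PPO surrogate utility term L_UTIL (maximized)
    l_kl   : KL divergence between the new and old policies
    c_hat  : estimated long-term fairness measure Ĉπ (penalized quadratically)
    lam    : benefit-fairness penalty Λ
    The β coefficients here are illustrative defaults, not values from the paper.
    """
    return l_util - beta_kl * l_kl - beta_c * c_hat ** 2 - beta_lam * lam


# Example: a utility of 1.0 is discounted by the three penalty terms.
j = fairness_penalized_objective(l_util=1.0, l_kl=0.5, c_hat=0.2, lam=0.1)
```

In a PPO-style training loop, the negation of this quantity would serve as the loss minimized by gradient descent, with larger β values trading raw utility for stability (β_KL) and fairness (β_C, β_Λ).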