Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Remembering to Be Fair Again: Reproducing Non-Markovian Fairness in Sequential Decision Making
Authors: Domonkos Nagy, Lohithsai Yadala Chanchu, Krystof Bobek, Xin Zhou, Jacobus Smit
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We reproduce and extend their findings by validating their claims and introducing novel enhancements. We confirm that Fair QCM outperforms standard baselines in fairness enforcement and sample efficiency across different environments. |
| Researcher Affiliation | Academia | Domonkos Nagy EMAIL Informatics Institute University of Amsterdam Lohithsai Yadala Chanchu EMAIL Informatics Institute University of Amsterdam Kryštof Bobek EMAIL Informatics Institute University of Amsterdam Xin Zhou EMAIL Informatics Institute University of Amsterdam Martin Smit EMAIL Informatics Institute University of Amsterdam |
| Pseudocode | No | The paper describes methodologies and algorithms (Fair QCM, Fair SCM) but does not provide specific pseudocode blocks or algorithms formatted as figures or distinct sections. |
| Open Source Code | Yes | The original code, modified to be 70% more efficient, and our extensions are available on GitHub: https://github.com/bozo22/remembering-to-be-fair-again. |
| Open Datasets | No | The paper uses the 'Resource Allocation (Donut)' and 'Simulated Lending' environments, citing previous work (Katoh and Ibaraki, 1998; Liu et al., 2018) for their definitions. It also 'created a COVID vaccine allocation gym environment'. These refer to problem formulations or simulation environments rather than publicly accessible datasets in the form of raw data files. |
| Dataset Splits | No | The paper describes reinforcement learning environments and experimental parameters (e.g., 'each consisting of 500 episodes of 100 time steps' for resource allocation, 'Episode Length 24' for COVID-19 simulation), but it does not specify traditional training/test/validation dataset splits, which are not typically applicable to dynamically generated data in RL. |
| Hardware Specification | Yes | We ran experiments using an AMD Ryzen 2600 CPU and a Nvidia RTX 3060 GPU. |
| Software Dependencies | No | The paper mentions using 'Stable-Baselines3 (Raffin et al. (2021))' for the SAC agent's implementation but does not specify a version number for this or any other software component used in the experiments. |
| Experiment Setup | Yes | Table 2: COVID-19 Simulation Hyperparameters; Table 3: Resource Allocation Hyperparameters. These tables list specific hyperparameters such as Episode Length, Learning Rate, Discount Factor (γ), Replay Buffer Size, Batch Size, Soft Update Coefficient (τ), Entropy Regularization Coefficient, and Min Exploration Rate (ϵ). |
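For context on one of the hyperparameters named above: a soft update coefficient τ conventionally governs Polyak averaging of target-network parameters in off-policy RL (as in SAC). The sketch below is illustrative only; the parameter values are hypothetical and not taken from the paper's tables.

```python
# Minimal sketch of a Polyak (soft) target-network update, the mechanism
# typically controlled by the soft update coefficient tau.
# All numeric values here are hypothetical examples, not the paper's settings.

def soft_update(target_params, online_params, tau):
    """Blend online parameters into the target: theta_target <- tau*theta + (1-tau)*theta_target."""
    return [tau * p + (1.0 - tau) * tp for p, tp in zip(online_params, target_params)]

target = [0.0, 1.0]   # current target-network parameters (toy values)
online = [1.0, 3.0]   # current online-network parameters (toy values)
updated = soft_update(target, online, tau=0.5)
print(updated)  # [0.5, 2.0]
```

With a small τ (e.g. 0.005, a common default in Stable-Baselines3's SAC), the target network tracks the online network slowly, which stabilizes bootstrapped value targets.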