Risk-averse Total-reward MDPs with ERM and EVaR
Authors: Xihong Su, Marek Petrik, Julien Grand-Clément
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the effect of risk-aversion on the structure of the optimal policy, we use the gambler's ruin problem (Hau, Petrik, and Ghavamzadeh 2023; Bäuerle and Ott 2011). In this problem, a gambler starts with a given amount of capital and seeks to increase it up to a cap K. ... The algorithm was implemented in Julia 1.10, and is available at https://github.com/suxh2019/ERMLP. Please see Su, Grand-Clément, and Petrik (2024, appendix F) for more details. Figure 3 shows optimal policies for four different EVaR risk levels α computed by Algorithm 1. ... To understand the impact of risk-aversion on the distribution of returns, we simulate the resulting policies over 7,000 episodes and show the distribution of capitals in Figure 4. |
| Researcher Affiliation | Academia | 1 University of New Hampshire, 33 Academic Way, Durham, NH, 03824 USA 2 HEC Paris, 1 Rue de la Libération, Jouy-en-Josas, 78350 France EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Simple EVaR algorithm |
| Open Source Code | Yes | The algorithm was implemented in Julia 1.10, and is available at https://github.com/suxh2019/ERMLP. |
| Open Datasets | No | The paper uses the 'gambler's ruin problem' as a simulation environment with specified parameters (q=0.68, K=7) and simulates policies over 7,000 episodes. This is a self-generated simulated dataset, not a publicly available external dataset with access information. |
| Dataset Splits | No | The paper describes a simulation study for the gambler's ruin problem where policies are simulated over 7,000 episodes. It does not mention any explicit training, validation, or test splits for a dataset, as the data is generated through simulation rather than being a pre-existing dataset that needs splitting. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions the implementation language. |
| Software Dependencies | Yes | The algorithm was implemented in Julia 1.10 |
| Experiment Setup | Yes | In the formulation, we use q = 0.68, and the cap is K = 7. ... Figure 3 shows optimal policies for four different EVaR risk levels α computed by Algorithm 1. ... To understand the impact of risk-aversion on the distribution of returns, we simulate the resulting policies over 7,000 episodes |
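The experiment-setup excerpts above pin down the environment (gambler's ruin with win probability q = 0.68 and cap K = 7) and the evaluation protocol (simulating a policy over 7,000 episodes and inspecting the distribution of final capitals). A minimal sketch of that protocol is given below; the bet-one-unit policy, the starting capital of 3, and the ERM helper are illustrative assumptions, not the paper's Algorithm 1, which computes optimal EVaR policies via linear programming:

```python
import math
import random

def simulate_episode(start, q=0.68, K=7, rng=random):
    """One gambler's-ruin episode with a fixed bet-one-unit policy
    (assumed here for illustration): gain 1 with probability q,
    lose 1 otherwise, until ruin (0) or the cap K is reached."""
    capital = start
    while 0 < capital < K:
        capital += 1 if rng.random() < q else -1
    return capital

def erm(returns, beta):
    """Entropic risk measure ERM_beta(X) = -(1/beta) * log E[exp(-beta * X)],
    estimated from a sample of returns; smaller values mean more risk."""
    n = len(returns)
    return -(1.0 / beta) * math.log(
        sum(math.exp(-beta * x) for x in returns) / n
    )

# Mirror the paper's protocol: 7,000 simulated episodes (start = 3 is assumed).
random.seed(0)
finals = [simulate_episode(start=3) for _ in range(7_000)]
mean_return = sum(finals) / len(finals)
```

Under this bet-one-unit policy every episode ends at 0 or K, so the empirical distribution of final capitals is two-point; ERM then penalizes the ruin outcomes more heavily as the risk parameter beta grows, which is the qualitative effect the paper's Figures 3-4 examine.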