Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures
Authors: Umit Köse, Andrzej Ruszczyński
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also perform an empirical study on a complex transportation problem. |
| Researcher Affiliation | Academia | Umit Köse EMAIL Department of Management Science and Information Systems Rutgers University Piscataway, NJ 08854, USA Andrzej Ruszczyński EMAIL Department of Management Science and Information Systems Rutgers University Piscataway, NJ 08854, USA |
| Pseudocode | No | The paper describes algorithms using mathematical equations and text, but does not include any clearly labeled pseudocode blocks, algorithm figures, or structured code-like procedures. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository. The license mentioned (CC-BY 4.0) pertains to the paper itself, not its associated code. |
| Open Datasets | No | The paper states: "At each time period t, a stochastic demand Dijt for transportation from location i to location j occurs... The demand arrays Dt in different time periods are independent and drawn from a truncated normal distribution: Dijt = max(0, N(0, sij))." This indicates the authors generated data through simulation based on a distribution, rather than using or providing access to a pre-existing public dataset. |
| Dataset Splits | No | The paper describes simulating data on the fly based on a truncated normal distribution. It mentions: "We tested the risk-averse and the risk-neutral TD(λ) methods under the same long simulated sequence of demand vectors." and "...we used 207 distinct trajectories, each with 200 decision stages, to compare the performance..." While it explains how the simulated data was used in experiments, it does not refer to fixed dataset splits (e.g., predefined train/test/validation sets) because the data is generated dynamically. |
| Hardware Specification | No | The acknowledgments mention: "...the Office of Advanced Research Computing (http://oarc.rutgers.edu) at Rutgers, The State University of New Jersey, for providing access to the Amarel cluster and associated research computing resources that have contributed to the results reported here." Also, "The choice of N = 4 was due to the use of a four-core computer, on which the N transitions could be simulated and analyzed in parallel." While a cluster name and a generic 'four-core computer' are mentioned, specific hardware details like CPU models, GPU types, or memory specifications are not provided for the experimental setup. |
| Software Dependencies | No | The paper does not specify any software libraries, frameworks, or tools with version numbers that were used in the implementation or experimentation. |
| Experiment Setup | Yes | We used β = 1 and N = 4. The stepsize was constant and equal to γ = 0.0001. In the expected value model (β = 0), we also used N = 4 observations per stage, and we averaged them, to make the comparison fair. The choice of N = 4 was due to the use of a four-core computer, on which the N transitions could be simulated and analyzed in parallel. We compared the performance of the risk-averse and risk-neutral TD(λ) algorithms for λ = 0, 0.5, and 0.9, and α = 0.95, 0.8, and 0.6, in terms of average profit per stage, on a trajectory with 20,000 decision stages. ... As a reference policy we chose the myopic policy, corresponding to π = 0 in (53). |
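
Because the demand data are simulated rather than drawn from a published dataset, the quoted rule Dijt = max(0, N(0, sij)) can be illustrated with a short sketch. This is not the authors' code: the function names, the `SETUP` dictionary, and the example `s` matrix are hypothetical, and treating sij as a standard deviation (rather than a variance) is an assumption the table does not confirm.

```python
import random

# Values quoted in the Experiment Setup row; the key names are illustrative.
SETUP = {
    "beta": 1.0,                 # beta = 0 is the expected value model
    "N": 4,                      # observations simulated per stage
    "stepsize": 1e-4,            # constant stepsize gamma
    "lambdas": (0.0, 0.5, 0.9),  # TD(lambda) settings compared
    "alphas": (0.95, 0.8, 0.6),
    "stages": 20_000,            # length of the evaluation trajectory
}


def sample_demand(s, rng):
    """Draw one demand array with D_ij = max(0, N(0, s_ij)).

    Assumption: s[i][j] is used as the standard deviation of the normal draw.
    """
    return [[max(0.0, rng.gauss(0.0, s_ij)) for s_ij in row] for row in s]


def sample_stage(s, rng, n=SETUP["N"]):
    """Draw n independent demand arrays for a single decision stage."""
    return [sample_demand(s, rng) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed so runs are repeatable
    s = [[1.0, 2.0], [0.5, 0.0]]     # hypothetical 2-location example
    for demand in sample_stage(s, rng):
        print(demand)
```

Fixing the random seed is one way to reproduce "the same long simulated sequence of demand vectors" across the risk-averse and risk-neutral runs, though the report does not say how the authors achieved this.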