reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures

Authors: Umit Köse, Andrzej Ruszczyński

JMLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also perform an empirical study on a complex transportation problem.
Researcher Affiliation	Academia	Umit Köse (EMAIL), Department of Management Science and Information Systems, Rutgers University, Piscataway, NJ 08854, USA; Andrzej Ruszczyński (EMAIL), Department of Management Science and Information Systems, Rutgers University, Piscataway, NJ 08854, USA
Pseudocode	No	The paper describes algorithms using mathematical equations and text, but does not include any clearly labeled pseudocode blocks, algorithm figures, or structured code-like procedures.
Open Source Code	No	The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository. The license mentioned (CC-BY 4.0) pertains to the paper itself, not its associated code.
Open Datasets	No	The paper states: "At each time period t, a stochastic demand D_{ijt} for transportation from location i to location j occurs... The demand arrays D_t in different time periods are independent and drawn from a truncated normal distribution: D_{ijt} = max(0, N(0, s_{ij}))." This indicates that the authors generated data by simulation from a distribution, rather than using or providing access to a pre-existing public dataset.
Dataset Splits	No	The paper describes simulating data on the fly based on a truncated normal distribution. It mentions: "We tested the risk-averse and the risk-neutral TD(λ) methods under the same long simulated sequence of demand vectors." and "...we used 207 distinct trajectories, each with 200 decision stages, to compare the performance..." While it explains how the simulated data was used in experiments, it does not refer to fixed dataset splits (e.g., predefined train/test/validation sets) because the data is generated dynamically.
Hardware Specification	No	The acknowledgments mention: "...the Office of Advanced Research Computing (http://oarc.rutgers.edu) at Rutgers, The State University of New Jersey, for providing access to the Amarel cluster and associated research computing resources that have contributed to the results reported here." The paper also states: "The choice of N = 4 was due to the use of a four-core computer, on which the N transitions could be simulated and analyzed in parallel." A cluster name and a generic "four-core computer" are mentioned, but the experimental setup omits specific hardware details such as CPU models, GPU types, or memory.
Software Dependencies	No	The paper does not specify any software libraries, frameworks, or tools with version numbers that were used in the implementation or experimentation.
Experiment Setup	Yes	We used β = 1 and N = 4. The stepsize was constant and equal to γ = 0.0001. In the expected value model (β = 0), we also used N = 4 observations per stage, and we averaged them, to make the comparison fair. The choice of N = 4 was due to the use of a four-core computer, on which the N transitions could be simulated and analyzed in parallel. We compared the performance of the risk-averse and risk-neutral TD(λ) algorithms for λ = 0, 0.5, and 0.9, and α = 0.95, 0.8, and 0.6, in terms of average profit per stage, on a trajectory with 20,000 decision stages. ... As a reference policy we chose the myopic policy, corresponding to π = 0 in (53).
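The demand model quoted in the Open Datasets row is simple enough to re-simulate. The sketch below illustrates that formula in Python and is not the authors' code. It assumes s_{ij} is the standard deviation of the underlying normal, since the quote does not say whether it is a standard deviation or a variance. The names sample_demand and sample_stage and the 2-location scale matrix are hypothetical. Read literally, max(0, ·) clips negative draws to zero instead of redrawing them, so every entry with s_{ij} > 0 equals zero with probability 1/2.

```python
import random

def sample_demand(s, rng):
    """Draw one demand array D_t, with D_t[i][j] = max(0, N(0, s[i][j])).

    Assumption: s[i][j] is the standard deviation of the normal draw.
    Negative draws are clipped to 0.0, as in the quoted formula.
    """
    return [[max(0.0, rng.gauss(0.0, s_ij)) for s_ij in row] for row in s]

def sample_stage(s, n_obs, rng):
    """Draw n_obs independent demand arrays for a single decision stage."""
    return [sample_demand(s, rng) for _ in range(n_obs)]

# Hypothetical 2-location example; a zero diagonal means no i -> i demand.
rng = random.Random(0)
s = [[0.0, 5.0],
     [3.0, 0.0]]
stage = sample_stage(s, n_obs=4, rng=rng)
```

Drawing n_obs = 4 arrays per stage mirrors the N = 4 parallel transitions described in the Experiment Setup row. Each call draws fresh, independent arrays, consistent with the quoted independence across time periods.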
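The parameter grid in the Experiment Setup row can be written out as an explicit configuration list. This is a hypothetical sketch: the dictionary keys and the build_runs helper are illustrative names. α is carried through unchanged because the table does not say what it denotes. The sketch also assumes, as the quoted sentence reads, that both the risk-averse (β = 1) and the risk-neutral (β = 0) methods were run at every (λ, α) pair, which gives 2 × 3 × 3 = 18 configurations.

```python
from itertools import product

# Values quoted in the "Experiment Setup" row.
BETAS = (1.0, 0.0)          # 1 = risk-averse model, 0 = expected-value model
LAMBDAS = (0.0, 0.5, 0.9)   # TD(lambda) eligibility-trace parameter
ALPHAS = (0.95, 0.8, 0.6)   # the paper's alpha; its meaning is not given in this table
N_OBS = 4                   # observations per stage (averaged when beta = 0)
STEPSIZE = 1e-4             # constant stepsize gamma
N_STAGES = 20_000           # decision stages in the comparison trajectory

def build_runs():
    """Enumerate one hypothetical run configuration per (beta, lambda, alpha)."""
    return [
        {"beta": beta, "lambda": lam, "alpha": alpha,
         "n_obs": N_OBS, "stepsize": STEPSIZE, "n_stages": N_STAGES}
        for beta, lam, alpha in product(BETAS, LAMBDAS, ALPHAS)
    ]

runs = build_runs()
print(len(runs))  # 18
```

The Dataset Splits row says the methods were tested under the same simulated demand sequence. A re-implementation would therefore typically drive every configuration from one fixed random seed, so that differences in average profit per stage reflect the method rather than the sampled demand.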