q-Learning in Continuous Time

Authors: Yanwei Jia, Xun Yu Zhou

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms."
Researcher Affiliation | Academia | Yanwei Jia, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, NT, Hong Kong; Xun Yu Zhou, Department of Industrial Engineering and Operations Research & The Data Science Institute, Columbia University, New York, NY 10027, USA
Pseudocode | Yes | Algorithm 1: Offline Episodic q-Learning ML Algorithm; Algorithm 2: Offline Episodic q-Learning Algorithm; Algorithm 3: Online-Incremental q-Learning Algorithm; Algorithm 4: q-Learning Algorithm for Ergodic Tasks; Algorithm 5: Offline Episodic q-Learning Mean-Variance Algorithm
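To illustrate the general shape of the episodic algorithms listed above, the following is a minimal toy sketch, not the paper's exact pseudocode: a 1-D controlled diffusion with linear-in-features value and q-function approximators, updated at the end of each episode using a continuous-time TD-style increment dJ + (r - q) dt. All dynamics, rewards, features, and step sizes here are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting (NOT the paper's exact Algorithms 1-5):
#   dX_t = a_t dt + sigma dW_t,  running reward r(x, a) = -x^2 - a^2.
T, dt, sigma = 1.0, 0.01, 0.2
alpha_theta, alpha_psi = 1e-3, 1e-3

def feat_J(t, x):      # features for the parametric value function J(t, x; theta)
    return np.array([1.0, t, x, x * x])

def feat_q(t, x, a):   # features for the parametric q-function q(t, x, a; psi)
    return np.array([1.0, x * x, a, a * a])

theta = np.zeros(4)
psi = np.zeros(4)

for episode in range(200):
    x = 1.0
    d_theta = np.zeros_like(theta)
    d_psi = np.zeros_like(psi)
    for t in np.arange(0.0, T, dt):
        a = rng.normal(0.0, 0.5)                       # exploratory (stochastic) action
        x_next = x + a * dt + sigma * np.sqrt(dt) * rng.normal()
        r = -(x * x) - a * a
        # Continuous-time TD increment: dJ + (r - q) dt.
        delta = (feat_J(t + dt, x_next) @ theta - feat_J(t, x) @ theta
                 + (r - feat_q(t, x, a) @ psi) * dt)
        d_theta += delta * feat_J(t, x)                # accumulate gradient-style terms
        d_psi += delta * feat_q(t, x, a) * dt
        x = x_next
    theta += alpha_theta * d_theta                     # offline: update once per episode
    psi += alpha_psi * d_psi
```

The "offline episodic" structure is what distinguishes Algorithms 1, 2, and 5 from the online-incremental Algorithm 3, which would instead apply the parameter updates at every time step inside the trajectory loop.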
Open Source Code | Yes | "The code to reproduce our simulation studies is publicly available at https://www.dropbox.com/sh/34cgnupnuaix15l/AAAj2yQYfNCOtPUc1_7VhbkIa?dl=0."
Open Datasets | No | The paper conducts simulation experiments and generates its own data from specified configurations rather than using publicly available datasets. For example, it states: "To have more realistic scenarios, we generate 20 years of training data and compare the three algorithms with the same dataset for N = 20,000 episodes with a batch size 32."
Dataset Splits | No | The paper mentions generating "20 years of training data" for its simulations, but it does not specify any explicit splits of this data into training, validation, or test sets for reproduction purposes.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) are provided for running the experiments.
Software Dependencies | No | No software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are provided in the paper.
Experiment Setup | Yes | "We conduct simulations with the following configurations: µ ∈ {0, 0.1, 0.3, 0.5}, σ ∈ {0.1, 0.2, 0.3, 0.4}, T = 1, x0 = 1, z = 1.4. Other tuning parameters in all the algorithms are chosen as γ = 0.1, m = 10, αθ = αψ = 0.001, αw = 0.005, and l(j) = 1/j^0.51. To have more realistic scenarios, we generate 20 years of training data and compare the three algorithms with the same dataset for N = 20,000 episodes with a batch size 32."
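The reported setup can be transcribed into a configuration sketch for anyone attempting a reproduction. The dictionary keys below are our own naming; the values are the ones quoted above, and the schedule l(j) is our reading of the flattened expression "1 j0.51" as 1/j^0.51.

```python
# Hypothetical transcription of the reported experiment configuration.
config = {
    "mu_grid": [0.0, 0.1, 0.3, 0.5],       # µ values swept in the simulations
    "sigma_grid": [0.1, 0.2, 0.3, 0.4],    # σ values swept in the simulations
    "T": 1.0,                              # horizon
    "x0": 1.0,                             # initial state
    "z": 1.4,                              # mean-variance target
    "gamma": 0.1,                          # temperature parameter γ
    "m": 10,                               # tuning parameter m
    "alpha_theta": 1e-3,                   # learning rate αθ
    "alpha_psi": 1e-3,                     # learning rate αψ
    "alpha_w": 5e-3,                       # learning rate αw
    "episodes": 20_000,                    # N
    "batch_size": 32,
}

def l(j: int) -> float:
    """Decaying step-size schedule, read as l(j) = 1 / j**0.51."""
    return 1.0 / j ** 0.51
```

Note that the exponent 0.51 keeps the schedule just inside the classical stochastic-approximation regime (sum of l(j) diverges while sum of l(j)^2 converges), which is a common reason for choosing a decay rate slightly above 1/2.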