q-Learning in Continuous Time

Authors: Yanwei Jia, Xun Yu Zhou

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms."
Researcher Affiliation | Academia | Yanwei Jia, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, NT, Hong Kong; Xun Yu Zhou, Department of Industrial Engineering and Operations Research & The Data Science Institute, Columbia University, New York, NY 10027, USA
Pseudocode | Yes | Algorithm 1: Offline Episodic q-Learning ML Algorithm; Algorithm 2: Offline Episodic q-Learning Algorithm; Algorithm 3: Online-Incremental q-Learning Algorithm; Algorithm 4: q-Learning Algorithm for Ergodic Tasks; Algorithm 5: Offline Episodic q-Learning Mean-Variance Algorithm
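To illustrate the general shape of the episodic algorithms listed above, the following is a minimal toy sketch, not the paper's exact pseudocode: a 1-D controlled diffusion with linear-in-features value and q-function approximators, updated at the end of each episode using a continuous-time TD-style increment dJ + (r - q) dt. All dynamics, rewards, features, and step sizes here are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting (NOT the paper's exact Algorithms 1-5):
#   dX_t = a_t dt + sigma dW_t,  running reward r(x, a) = -x^2 - a^2.
T, dt, sigma = 1.0, 0.01, 0.2
alpha_theta, alpha_psi = 1e-3, 1e-3

def feat_J(t, x):      # features for the parametric value function J(t, x; theta)
    return np.array([1.0, t, x, x * x])

def feat_q(t, x, a):   # features for the parametric q-function q(t, x, a; psi)
    return np.array([1.0, x * x, a, a * a])

theta = np.zeros(4)
psi = np.zeros(4)

for episode in range(200):
    x = 1.0
    d_theta = np.zeros_like(theta)
    d_psi = np.zeros_like(psi)
    for t in np.arange(0.0, T, dt):
        a = rng.normal(0.0, 0.5)                       # exploratory (stochastic) action
        x_next = x + a * dt + sigma * np.sqrt(dt) * rng.normal()
        r = -(x * x) - a * a
        # Continuous-time TD increment: dJ + (r - q) dt.
        delta = (feat_J(t + dt, x_next) @ theta - feat_J(t, x) @ theta
                 + (r - feat_q(t, x, a) @ psi) * dt)
        d_theta += delta * feat_J(t, x)                # accumulate gradient-style terms
        d_psi += delta * feat_q(t, x, a) * dt
        x = x_next
    theta += alpha_theta * d_theta                     # offline: update once per episode
    psi += alpha_psi * d_psi
```

The "offline episodic" structure is what distinguishes Algorithms 1, 2, and 5 from the online-incremental Algorithm 3, which would instead apply the parameter updates at every time step inside the trajectory loop.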
Open Source Code | Yes | "The code to reproduce our simulation studies is publicly available at https://www.dropbox.com/sh/34cgnupnuaix15l/AAAj2yQYfNCOtPUc1_7VhbkIa?dl=0."
Open Datasets | No | The paper conducts simulation experiments and generates its own data from specified configurations rather than using publicly available datasets. For example, it states: "To have more realistic scenarios, we generate 20 years of training data and compare the three algorithms with the same dataset for N = 20,000 episodes with a batch size 32."
Dataset Splits | No | The paper mentions generating "20 years of training data" for its simulations, but it does not specify any explicit splits of this data into training, validation, or test sets for reproduction purposes.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) are provided for running the experiments.
Software Dependencies | No | No software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are provided in the paper.
Experiment Setup | Yes | "We conduct simulations with the following configurations: µ ∈ {0, 0.1, 0.3, 0.5}, σ ∈ {0.1, 0.2, 0.3, 0.4}, T = 1, x0 = 1, z = 1.4. Other tuning parameters in all the algorithms are chosen as γ = 0.1, m = 10, αθ = αψ = 0.001, αw = 0.005, and l(j) = 1/j^0.51. To have more realistic scenarios, we generate 20 years of training data and compare the three algorithms with the same dataset for N = 20,000 episodes with a batch size 32."
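The reported setup can be transcribed into a configuration sketch for anyone attempting a reproduction. The dictionary keys below are our own naming; the values are the ones quoted above, and the schedule l(j) is our reading of the flattened expression "1 j0.51" as 1/j^0.51.

```python
# Hypothetical transcription of the reported experiment configuration.
config = {
    "mu_grid": [0.0, 0.1, 0.3, 0.5],       # µ values swept in the simulations
    "sigma_grid": [0.1, 0.2, 0.3, 0.4],    # σ values swept in the simulations
    "T": 1.0,                              # horizon
    "x0": 1.0,                             # initial state
    "z": 1.4,                              # mean-variance target
    "gamma": 0.1,                          # temperature parameter γ
    "m": 10,                               # tuning parameter m
    "alpha_theta": 1e-3,                   # learning rate αθ
    "alpha_psi": 1e-3,                     # learning rate αψ
    "alpha_w": 5e-3,                       # learning rate αw
    "episodes": 20_000,                    # N
    "batch_size": 32,
}

def l(j: int) -> float:
    """Decaying step-size schedule, read as l(j) = 1 / j**0.51."""
    return 1.0 / j ** 0.51
```

Note that the exponent 0.51 keeps the schedule just inside the classical stochastic-approximation regime (sum of l(j) diverges while sum of l(j)^2 converges), which is a common reason for choosing a decay rate slightly above 1/2.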