q-Learning in Continuous Time
Authors: Yanwei Jia, Xun Yu Zhou
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms. |
| Researcher Affiliation | Academia | Yanwei Jia EMAIL Department of System Engineering and Engineering Management The Chinese University of Hong Kong Shatin, NT, Hong Kong; Xun Yu Zhou EMAIL Department of Industrial Engineering and Operations Research & The Data Science Institute Columbia University New York, NY 10027, USA |
| Pseudocode | Yes | Algorithm 1 Offline Episodic q-Learning ML Algorithm; Algorithm 2 Offline Episodic q-Learning Algorithm; Algorithm 3 Online-Incremental q-Learning Algorithm; Algorithm 4 q-Learning Algorithm for Ergodic Tasks; Algorithm 5 Offline Episodic q-Learning Mean Variance Algorithm |
| Open Source Code | Yes | The code to reproduce our simulation studies is publicly available at https://www.dropbox.com/sh/34cgnupnuaix15l/AAAj2yQYfNCOtPUc1_7VhbkIa?dl=0. |
| Open Datasets | No | The paper conducts simulation experiments and generates its own data based on specified configurations, rather than using publicly available datasets. For example, it states: "To have more realistic scenarios, we generate 20 years of training data and compare the three algorithms with the same dataset for N = 20, 000 episodes with a batch size 32." |
| Dataset Splits | No | The paper mentions generating "20 years of training data" for its simulations, but it does not specify any explicit splits of this data into training, validation, or test sets for reproduction purposes. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instance types) are provided for running the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names with versions like Python 3.8, PyTorch 1.9) are provided in the paper. |
| Experiment Setup | Yes | We conduct simulations with the following configurations: µ ∈ {0, 0.1, 0.3, 0.5}, σ ∈ {0.1, 0.2, 0.3, 0.4}, T = 1, x0 = 1, z = 1.4. Other tuning parameters in all the algorithms are chosen as γ = 0.1, m = 10, αθ = αψ = 0.001, αw = 0.005, and l(j) = 1/j^0.51. To have more realistic scenarios, we generate 20 years of training data and compare the three algorithms with the same dataset for N = 20,000 episodes with a batch size 32. |
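For anyone attempting a reproduction, the reported setup can be collected into a small Python sketch. This is an illustrative summary only: the names (`CONFIG`, `learning_rate`) and the dictionary structure are ours, not from the authors' released code, and the schedule `l(j) = 1/j**0.51` is read off the quoted setup.

```python
# Hypothetical collection of the simulation settings quoted above.
# Variable names are illustrative and do not come from the authors' code.
CONFIG = {
    "mu_grid": [0.0, 0.1, 0.3, 0.5],     # drift values µ swept in the simulations
    "sigma_grid": [0.1, 0.2, 0.3, 0.4],  # volatility values σ
    "T": 1.0,                            # horizon
    "x0": 1.0,                           # initial state
    "z": 1.4,                            # target level
    "gamma": 0.1,                        # temperature parameter γ
    "m": 10,                             # tuning parameter m
    "alpha_theta": 0.001,                # actor step size αθ
    "alpha_psi": 0.001,                  # step size αψ
    "alpha_w": 0.005,                    # critic step size αw
    "N_episodes": 20_000,                # number of training episodes
    "batch_size": 32,
}

def learning_rate(j: int) -> float:
    """Decaying schedule l(j) = 1 / j**0.51, as reported in the setup."""
    return 1.0 / j ** 0.51
```

The exponent 0.51 keeps the schedule just inside the classical Robbins-Monro conditions (square-summable but not summable), which is a common choice for stochastic-approximation step sizes.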