Predictive Control and Regret Analysis of Non-Stationary MDP with Look-ahead Information

Authors: Ziyi Zhang, Yorie Nakahira, Guannan Qu

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical analysis demonstrates that, under certain assumptions, the regret decreases exponentially as the look-ahead window expands. When the system prediction is subject to error, the regret does not explode even if the prediction error grows sub-exponentially as a function of the prediction horizon. We validate our approach through simulations and confirm its efficacy in non-stationary environments. ... 6 Simulation
Researcher Affiliation | Academia | Ziyi Zhang (EMAIL), Department of Electrical and Computer Engineering, Carnegie Mellon University; Yorie Nakahira (EMAIL), Department of Electrical and Computer Engineering, Carnegie Mellon University; Guannan Qu (EMAIL), Department of Electrical and Computer Engineering, Carnegie Mellon University
Pseudocode | Yes | Algorithm 1 Model predictive dynamical programming (MPDP). 1: Select $v^{(0)} \in \mathbb{R}^n$, specify $\epsilon > 0$, and set $S = 0$. 2: for $t = 0, 1, 2, \dots, T$ do 3: Forecast $\hat{P}_t, \dots, \hat{P}_{t+k}$, $\hat{r}_t, \dots, \hat{r}_{t+k}$. 4: Select $a_t$ according to equation 10. 5: $s_{t+1} \sim P_t(\cdot \mid s_t, a_t)$.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets | No | In the first simulation, we simulate a queueing system based on the setup provided in Example 1. ... In this section, we consider a scenario of EV charging station under the setup of Example 2 with time horizon T = 50.
Dataset Splits | No | For each $k \in \{1, \dots, 15\}$, we run 20 trials and record the average regret for each k value. The paper describes simulations and trials, but does not mention specific train/test/validation splits for any dataset.
Hardware Specification | No | The paper mentions running 'simulations' but does not specify any particular hardware (e.g., GPU, CPU models, memory) used for these simulations.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | Specifically, we consider a representative example of 3 servers whose service rates $\{\mu_i\}_{i=1,2,3}$ are 100, 10, 1, respectively, with time horizon T = 100 and varying load $\lambda_t$ fluctuating from 10 to 100. ... The agent has access to the predicted arrival rate of jobs with some Gaussian additive prediction error $\hat{\lambda}_t := \lambda_t + \mathcal{N}(0, \sigma)$ with $\sigma \in \{0, 1, 2\}$. ... In this section, we consider a scenario of EV charging station under the setup of Example 2 with time horizon T = 50. The charging station has three charging stands, and the energy price fluctuates between 2 and 18.
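
The loop in Algorithm 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: equation 10 is not reproduced on this page, so the sketch assumes the action is chosen greedily after backward induction over the forecast window, with the initial vector v(0) used as the terminal value. The names `lookahead_action`, `run_mpdp`, and `forecast` are hypothetical, and the ϵ and S from step 1 are omitted because their role is not described here.

```python
import numpy as np

def lookahead_action(s, P_hat, r_hat, v_terminal):
    """Greedy action at state s after backward induction over the forecast window.

    ASSUMPTION: stands in for "equation 10", which is not shown on this page.
    P_hat: list of arrays of shape (A, S, S), forecast transition kernels.
    r_hat: list of arrays of shape (S, A), forecast rewards.
    """
    v = np.asarray(v_terminal, dtype=float).copy()
    for P, r in zip(reversed(P_hat), reversed(r_hat)):
        # q[s, a] = r[s, a] + sum over s' of P[a, s, s'] * v[s']
        q = r + np.einsum("asn,n->sa", P, v)
        v = q.max(axis=1)
    # After the loop, q holds the values for the first step of the window.
    return int(q[s].argmax())

def run_mpdp(P_true, r_true, forecast, k, s0, v_terminal, rng):
    """Run the predictive control loop and return the total collected reward.

    forecast(t, end) is a hypothetical callback returning the forecast
    kernels and rewards for steps t, ..., end - 1.
    """
    T = len(P_true)
    s, total = s0, 0.0
    for t in range(T):
        end = min(t + k + 1, T)            # truncate the window at the horizon
        P_hat, r_hat = forecast(t, end)    # step 3: forecast the window
        a = lookahead_action(s, P_hat, r_hat, v_terminal)  # step 4
        total += r_true[t][s, a]
        # step 5: sample the next state from the true kernel
        s = rng.choice(len(v_terminal), p=P_true[t][a, s])
    return total
```

A forecast callback with prediction error could, for example, perturb the true quantities before returning them; with an exact forecast and a window covering the whole horizon, the loop reduces to ordinary finite-horizon dynamic programming.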