Predictive Control and Regret Analysis of Non-Stationary MDP with Look-ahead Information
Authors: Ziyi Zhang, Yorie Nakahira, Guannan Qu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical analysis demonstrates that, under certain assumptions, the regret decreases exponentially as the look-ahead window expands. When the system prediction is subject to error, the regret does not explode even if the prediction error grows sub-exponentially as a function of the prediction horizon. We validate our approach through simulations and confirm its efficacy in non-stationary environments. ... 6 Simulation |
| Researcher Affiliation | Academia | Ziyi Zhang EMAIL Department of Electrical and Computer Engineering Carnegie Mellon University Yorie Nakahira EMAIL Department of Electrical and Computer Engineering Carnegie Mellon University Guannan Qu EMAIL Department of Electrical and Computer Engineering Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 Model predictive dynamical programming (MPDP) 1: Select $v(0) \in \mathbb{R}^n$, specify $\epsilon > 0$, and set $S = 0$. 2: for $t = 0, 1, 2, \dots, T$ do 3: Forecast $\hat{P}_t, \dots, \hat{P}_{t+k}, \hat{r}_t, \dots, \hat{r}_{t+k}$ 4: Select $a_t$ according to equation 10. 5: $s_{t+1} \sim P_t(\cdot \mid s_t, a_t)$. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | No | In the first simulation, we simulate a queueing system based on the setup provided in Example 1. ... In this section, we consider a scenario of EV charging station under the setup of Example 2 with time horizon T = 50. |
| Dataset Splits | No | For each $k \in \{1, \dots, 15\}$, we run 20 trials and record the average regret for each k value. The paper describes simulations and trials, but does not mention specific train/test/validation splits for any dataset. |
| Hardware Specification | No | The paper mentions running 'simulations' but does not specify any particular hardware (e.g., GPU, CPU models, memory) used for these simulations. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | Specifically, we consider a representative example of 3 servers whose service rates $\{\mu_i\}_{i=1,2,3}$ are 100, 10, 1, respectively, with time horizon T = 100 and varying load $\lambda_t$ fluctuating from 10 to 100. ... The agent has access to the predicted arrival rate of jobs with some Gaussian additive prediction error $\hat{\lambda}_t := \lambda_t + N(0, \sigma)$ with $\sigma \in \{0, 1, 2\}$. ... In this section, we consider a scenario of EV charging station under the setup of Example 2 with time horizon T = 50. The charging station has three charging stands, and the energy price fluctuates between 2 and 18. |
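
Since the paper releases no code, the following is only a rough Python sketch of the two ingredients quoted in the table: a noisy look-ahead forecast of the arrival rate, and a planning step over the forecast window. Several parts are assumptions, not the authors' method: the sinusoidal load profile, the reading of $\sigma$ as a standard deviation, the zero terminal value, and the use of plain finite-horizon backward induction as a stand-in for the paper's equation 10, which is not reproduced here. All function names (`true_load`, `noisy_forecast`, `plan_first_action`) are hypothetical.

```python
import math
import random


def true_load(T=100, low=10.0, high=100.0):
    """Hypothetical load profile: one sinusoidal cycle between low and high.
    The source only says the load fluctuates from 10 to 100."""
    mid, amp = (low + high) / 2, (high - low) / 2
    return [mid + amp * math.sin(2 * math.pi * t / T) for t in range(T)]


def noisy_forecast(lam, t, k, sigma, rng):
    """Forecast lam[t..t+k] with additive Gaussian error.
    Assumption: sigma is a standard deviation."""
    window = lam[t:t + k + 1]
    return [x + rng.gauss(0.0, sigma) for x in window]


def plan_first_action(s, P_hat, r_hat):
    """Backward induction over the forecast window (stand-in for equation 10).

    P_hat[j][x][a] is a list of next-state probabilities and r_hat[j][x][a]
    is a reward, both for step j of the window. Returns the action that
    maximizes the window's total forecast reward from state s.
    """
    n_states = len(r_hat[0])
    v = [0.0] * n_states  # terminal value, assumed zero
    best = 0
    for j in reversed(range(len(r_hat))):
        q = [
            [
                r_hat[j][x][a] + sum(p * v[y] for y, p in enumerate(P_hat[j][x][a]))
                for a in range(len(r_hat[j][x]))
            ]
            for x in range(n_states)
        ]
        if j == 0:
            best = max(range(len(q[s])), key=lambda a: q[s][a])
        v = [max(row) for row in q]
    return best
```

In a receding-horizon loop, one would call `noisy_forecast` at each step t, build the forecast model for the window from it, apply the action returned by `plan_first_action`, and then repeat from the next state.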