Rank-One Modified Value Iteration
Authors: Arman Sharifi Kolarijani, Tolga Ok, Peyman Mohajerin Esfahani, Mohamad Amin Sharifi Kolarijani
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through our extensive numerical simulations, however, we show that the proposed algorithm consistently outperforms first-order algorithms and their accelerated versions for both planning and learning problems. In Section 5, we provide the results of our extensive numerical simulations and compare the proposed algorithms with a range of existing algorithms for solving the optimal control problem of MDPs. |
| Researcher Affiliation | Academia | 1Delft University of Technology, The Netherlands 2University of Toronto, Canada. Correspondence to: Arman S. Kolarijani <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Rank-One Value Iteration (R1-VI) Algorithm 2 Rank-One Q-Learning (R1-QL) |
| Open Source Code | No | The paper describes its proposed algorithms (R1-VI and R1-QL) and mentions adapting update rules for existing algorithms, but it does not provide any explicit statement about releasing its own implementation code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The experiments are conducted on Garnet (Archibald et al., 1995) and Graph MDPs (Devraj & Meyn, 2017), focusing on the Bellman errors $T(v_k) - v_k$ and $T(q_k) - q_k$ and the value errors $v_k - v^\star$ and $q_k - q^\star$. |
| Dataset Splits | No | The paper describes the generation of MDP instances for Garnet (e.g., "25 randomly generated instances of Garnet MDPs") and refers to configurations from cited works for Graph MDPs. However, it does not specify explicit training/testing/validation splits for a fixed dataset, such as percentages, sample counts, or references to standard benchmark splits for data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the numerical simulations, such as GPU models, CPU types, or memory configurations. It only generally mentions "numerical simulations" and "experiments". |
| Software Dependencies | No | The paper mentions that update rules were adapted from a specific source for implementation ("We adapt the update rules provided by (Kolarijani & Mohajerin Esfahani, 2023) in our implementations for the planning algorithms."), but it does not specify any software libraries, frameworks, or tools with their version numbers (e.g., Python version, PyTorch/TensorFlow version, or specific solvers). |
| Experiment Setup | Yes | All the learning algorithms use the same samples generated through the training. Besides the proposed R1-QL Algorithm 2, we report the performance of Speedy QL (Ghavamzadeh et al., 2011), Zap QL (Devraj & Meyn, 2017), and the standard QL (5) in Garnet and Graph MDPs for several discount factors; see Appendix B.1 for the update rule of Speedy QL and Zap QL. We run each algorithm using the same step-size schedule $(\lambda_k)_{k=0}^{\infty}$, namely, linearly decaying $\lambda_k = 1/(1+k)$, to ensure a fair comparison. |
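To make the experimental setup concrete, the sketch below implements the standard tabular Q-learning baseline (the "standard QL (5)" in the comparison) on a small Garnet-style MDP, using the shared step-size schedule $\lambda_k = 1/(1+k)$ quoted above. The Garnet construction follows the usual description attributed to Archibald et al. (1995) but is an assumption here; the rank-one modification that defines R1-QL is *not* reproduced, since its update rule is not given in this report.

```python
import numpy as np

def garnet_mdp(n_states, n_actions, branching, rng):
    """Random Garnet-style MDP (assumed construction, after Archibald et al.,
    1995): each (s, a) pair reaches `branching` uniformly chosen next states
    with random transition probabilities; rewards are uniform in [0, 1]."""
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            nxt = rng.choice(n_states, size=branching, replace=False)
            P[s, a, nxt] = rng.dirichlet(np.ones(branching))
    R = rng.uniform(size=(n_states, n_actions))
    return P, R

def q_learning(P, R, gamma, num_steps, seed=0):
    """Standard tabular Q-learning with the linearly decaying step size
    lambda_k = 1/(1 + k) used as the common schedule in the experiments.
    This is only the QL baseline; R1-QL's rank-one update is not shown."""
    rng = np.random.default_rng(seed)
    n_s, n_a = R.shape
    q = np.zeros((n_s, n_a))
    s = 0
    for k in range(num_steps):
        a = int(rng.integers(n_a))                # uniform behaviour policy
        s_next = int(rng.choice(n_s, p=P[s, a]))  # sample one transition
        lam = 1.0 / (1.0 + k)                     # shared step-size schedule
        td = R[s, a] + gamma * q[s_next].max() - q[s, a]
        q[s, a] += lam * td                       # temporal-difference update
        s = s_next
    return q

rng = np.random.default_rng(42)
P, R = garnet_mdp(n_states=10, n_actions=3, branching=2, rng=rng)
q = q_learning(P, R, gamma=0.9, num_steps=5000)
```

With rewards in $[0, 1]$ and discount $\gamma = 0.9$, the learned values stay bounded by $R_{\max}/(1-\gamma) = 10$; a fair comparison in the paper's sense would run each competing algorithm on exactly this sample stream and schedule.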