Rank-One Modified Value Iteration

Authors: Arman Sharifi Kolarijani, Tolga Ok, Peyman Mohajerin Esfahani, Mohamad Amin Sharifi Kolarijani

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through our extensive numerical simulations, however, we show that the proposed algorithm consistently outperforms first-order algorithms and their accelerated versions for both planning and learning problems. In Section 5, we provide the results of our extensive numerical simulations and compare the proposed algorithms with a range of existing algorithms for solving the optimal control problem of MDPs.
Researcher Affiliation | Academia | 1 Delft University of Technology, The Netherlands; 2 University of Toronto, Canada. Correspondence to: Arman S. Kolarijani <EMAIL>.
Pseudocode | Yes | Algorithm 1: Rank-One Value Iteration (R1-VI); Algorithm 2: Rank-One Q-Learning (R1-QL).
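The details of the rank-one modification are not included in this excerpt, but R1-VI builds on the standard value iteration baseline it is compared against. A minimal sketch of that baseline (function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def value_iteration(P, r, gamma, tol=1e-8, max_iter=10_000):
    """Standard value iteration baseline; R1-VI modifies this update
    with a rank-one correction (details not shown in this excerpt).

    P: transition tensor of shape (A, S, S); r: rewards of shape (A, S).
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Bellman optimality operator T(v): max over actions of r + gamma * P v
        q = r + gamma * P @ v          # shape (A, S)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v
```

On a fixed point, the returned v satisfies T(v) ≈ v up to the tolerance, which is exactly the Bellman error the experiments track.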
Open Source Code | No | The paper describes its proposed algorithms (R1-VI and R1-QL) and mentions adapting update rules for existing algorithms, but it does not provide any explicit statement about releasing its own implementation code, nor does it provide a link to a code repository.
Open Datasets | Yes | The experiments are conducted on Garnet (Archibald et al., 1995) and Graph MDPs (Devraj & Meyn, 2017), focusing on the Bellman errors ||T(v_k) − v_k|| and ||T(q_k) − q_k|| and the value errors ||v_k − v*|| and ||q_k − q*||.
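Garnet MDPs are randomly generated rather than downloaded, which is why the benchmark is reproducible from its parameters alone. A hedged sketch of one instance generator and the Bellman error tracked above (the parameterization follows the usual Garnet construction from Archibald et al., 1995; the function names are hypothetical):

```python
import numpy as np

def make_garnet(n_states, n_actions, branching, rng):
    """Generate one Garnet MDP instance: each (state, action) pair
    transitions to `branching` distinct uniformly chosen next states
    with Dirichlet-random probabilities; rewards are random."""
    P = np.zeros((n_actions, n_states, n_states))
    for a in range(n_actions):
        for s in range(n_states):
            nxt = rng.choice(n_states, size=branching, replace=False)
            P[a, s, nxt] = rng.dirichlet(np.ones(branching))
    r = rng.uniform(0.0, 1.0, size=(n_actions, n_states))
    return P, r

def bellman_error(P, r, gamma, v):
    """Sup-norm Bellman error ||T(v) - v|| reported in the experiments."""
    Tv = (r + gamma * P @ v).max(axis=0)
    return np.max(np.abs(Tv - v))
```

Averaging such errors over many seeded instances (e.g. the "25 randomly generated instances" mentioned below) gives the reported curves.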
Dataset Splits | No | The paper describes the generation of MDP instances for Garnet (e.g., "25 randomly generated instances of Garnet MDPs") and refers to configurations from cited works for Graph MDPs. However, it does not specify explicit training/testing/validation splits for a fixed dataset, such as percentages, sample counts, or references to standard benchmark splits for data partitioning.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the numerical simulations, such as GPU models, CPU types, or memory configurations. It only generally mentions "numerical simulations" and "experiments".
Software Dependencies | No | The paper mentions that update rules were adapted from a specific source for implementation ("We adapt the update rules provided by (Kolarijani & Mohajerin Esfahani, 2023) in our implementations for the planning algorithms."), but it does not specify any software libraries, frameworks, or tools with their version numbers (e.g., Python version, PyTorch/TensorFlow version, or specific solvers).
Experiment Setup | Yes | All the learning algorithms use the same samples generated through the training. Besides the proposed R1-QL Algorithm 2, we report the performance of Speedy QL (Ghavamzadeh et al., 2011), Zap QL (Devraj & Meyn, 2017), and the standard QL (5) in Garnet and Graph MDPs for several discount factors; see Appendix B.1 for the update rule of Speedy QL and Zap QL. We run each algorithm using the same step-size schedule (λ_k)_{k≥0}, namely the decaying step size λ_k = 1/(1 + k), to ensure a fair comparison.
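The fairness protocol above (shared samples, shared step-size schedule) can be sketched with the standard tabular QL baseline; R1-QL, Speedy QL, and Zap QL would consume the same transition stream with the same λ_k. A minimal sketch, assuming a uniform behavior policy (not specified in this excerpt):

```python
import numpy as np

def q_learning(P, r, gamma, n_steps, rng):
    """Standard tabular Q-learning with the shared step-size
    schedule lambda_k = 1/(1 + k) used for all compared algorithms.

    P: transition tensor of shape (A, S, S); r: rewards of shape (A, S).
    """
    A, S, _ = P.shape
    q = np.zeros((A, S))
    s = 0
    for k in range(n_steps):
        lam = 1.0 / (1.0 + k)               # decaying step size
        a = rng.integers(A)                 # uniform exploration (assumption)
        s_next = rng.choice(S, p=P[a, s])   # sample shared across algorithms
        target = r[a, s] + gamma * q[:, s_next].max()
        q[a, s] += lam * (target - q[a, s])
        s = s_next
    return q
```

Fixing the RNG seed before each algorithm's run is one simple way to realize "the same samples generated through the training".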