Deflated Dynamics Value Iteration

Authors: Jongmin Lee, Amin Rakhsha, Ernest K. Ryu, Amir-massoud Farahmand

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show the effectiveness of the proposed algorithms. Finally, in Section 6, we empirically evaluate the proposed methods and show their practical feasibility. For our experiments, we use the following environments: Maze with 5×5 states and 4 actions, Cliffwalk with 3×7 states and 4 actions, Chain Walk with 50 states and 2 actions, and random Garnet MDPs (Bhatnagar et al., 2009) with 200 states. The discount factor is set to γ = 0.99 for the comparison of DDVI with different ranks and for the DDTD experiments, and γ = 0.995 in other experiments. Appendix D provides full definitions of the environments and policies used for PE. All experiments were carried out on local CPUs. We report the normalized error of V_k, defined as ‖V_k − V^π‖₁ / ‖V^π‖₁.
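As a minimal sketch of the reported metric (the function and variable names are ours, not from the paper), the L1-normalized error between an iterate V_k and the true value V^π can be computed as:

```python
import numpy as np

def normalized_error(V_k, V_pi):
    """L1-normalized distance of the iterate V_k from the true value V^pi."""
    return np.linalg.norm(V_k - V_pi, 1) / np.linalg.norm(V_pi, 1)

V_pi = np.array([1.0, 2.0, 3.0])   # hypothetical true values
V_k = np.array([1.1, 1.9, 3.2])    # hypothetical iterate
err = normalized_error(V_k, V_pi)  # (0.1 + 0.1 + 0.2) / 6 ≈ 0.0667
```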
Researcher Affiliation | Academia | Jongmin Lee (Seoul National University); Amin Rakhsha (Department of Computer Science, University of Toronto; Vector Institute); Ernest K. Ryu (University of California, Los Angeles); Amir-massoud Farahmand (Polytechnique Montréal; Mila – Quebec AI Institute; University of Toronto)
Pseudocode | Yes | Algorithm 1: DDVI with Auto PI; Algorithm 2: Rank-s DDVI with the QR Iteration; Algorithm 3: Rank-s DDTD with the QR Iteration
Open Source Code | Yes | The source code for the experiments can be found at https://github.com/adaptive-agents-lab/ddvi.
Open Datasets | Yes | For our experiments, we use the following environments: Maze with 5×5 states and 4 actions, Cliffwalk with 3×7 states and 4 actions, Chain Walk with 50 states and 2 actions, and random Garnet MDPs (Bhatnagar et al., 2009) with 200 states. Garnet: We use the Garnet environment as described by Farahmand & Ghavamzadeh (2021); Rakhsha et al. (2022), which is based on Bhatnagar et al. (2009).
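The paper's Garnet construction follows Bhatnagar et al. (2009). As an illustrative sketch only (the function name, branching factor, and Dirichlet weighting here are assumptions, not the paper's exact recipe), a generic Garnet-style random transition model can be generated like this:

```python
import numpy as np

def garnet_transitions(n_states=200, n_actions=4, branching=3, rng=None):
    """Random Garnet-style transition tensor P[a, s, s'].

    For each state-action pair (s, a), probability mass is spread over
    `branching` uniformly chosen next states, with random weights.
    """
    rng = np.random.default_rng(rng)
    P = np.zeros((n_actions, n_states, n_states))
    for a in range(n_actions):
        for s in range(n_states):
            nxt = rng.choice(n_states, size=branching, replace=False)
            P[a, s, nxt] = rng.dirichlet(np.ones(branching))
    return P

P = garnet_transitions(n_states=10, n_actions=2, branching=3, rng=0)
# each row P[a, s, :] is a probability distribution over next states
```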
Dataset Splits | No | The paper uses Markov Decision Process (MDP) environments (Maze, Cliffwalk, Chain Walk, Garnet), which are simulated environments rather than static datasets with predefined splits. Data for these experiments is generated through agent interaction within the environment, and the paper does not specify traditional training/test/validation splits of a pre-collected dataset.
Hardware Specification | No | All experiments were carried out on local CPUs. (Section 6) This statement is too general, as it does not specify any particular CPU model, generation, or number of cores. It lacks the specific details required for reproducibility.
Software Dependencies | No | We use the Implicitly Restarted Arnoldi Method (Lehoucq et al., 1998) from the SciPy package to calculate the eigenvalues and eigenvectors for DDVI. This mentions the SciPy package but does not provide a specific version number, which is crucial for reproducibility.
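In SciPy, the Implicitly Restarted Arnoldi Method is exposed through the ARPACK wrapper `scipy.sparse.linalg.eigs`. As a minimal sketch (the matrix `P_pi` below is a randomly generated row-stochastic stand-in, not one of the paper's environments), the top-s eigenpairs can be computed like this:

```python
import numpy as np
from scipy.sparse.linalg import eigs

rng = np.random.default_rng(0)
n, s = 50, 3

# Hypothetical row-stochastic transition matrix for a fixed policy
M = rng.random((n, n))
P_pi = M / M.sum(axis=1, keepdims=True)

# Top-s eigenvalues/eigenvectors by magnitude, via ARPACK's
# implicitly restarted Arnoldi method
eigvals, eigvecs = eigs(P_pi, k=s, which="LM")
# A row-stochastic matrix has spectral radius 1, so max |lambda| ≈ 1
```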
Experiment Setup | Yes | For our experiments, we use the following environments: Maze with 5×5 states and 4 actions, Cliffwalk with 3×7 states and 4 actions, Chain Walk with 50 states and 2 actions, and random Garnet MDPs (Bhatnagar et al., 2009) with 200 states. The discount factor is set to γ = 0.99 for the comparison of DDVI with different ranks and for the DDTD experiments, and γ = 0.995 in other experiments. In all experiments, we set DDVI's α = 0.99. We perform an extensive comparison of DDVI against the prior accelerated VI methods... For PID VI, we set η = 0.05 and ϵ = 10⁻¹⁰. In Anderson VI, we have m = 5. The hyperparameters of TD Learning and DDTD are given in Tables 1, 2, and 3. (Appendix G)