Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Steady-State Planning in Expected Reward Multichain MDPs
Authors: George K. Atia, Andre Beckus, Ismail Alkhouri, Alvaro Velasquez
JAIR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify the theoretical findings of our work using a comprehensive set of numerical experiments performed in various environments. The results demonstrate the correctness of the proposed LPs in yielding policies with provably correct behavior and the scalability of the proposed solutions to large problem sizes. |
| Researcher Affiliation | Collaboration | George K. Atia EMAIL Department of Electrical and Computer Engineering Department of Computer Science University of Central Florida, FL 32816, USA Andre Beckus EMAIL Air Force Research Laboratory, NY 13441, USA Ismail Alkhouri EMAIL Department of Electrical and Computer Engineering University of Central Florida, FL 32816, USA Alvaro Velasquez EMAIL Air Force Research Laboratory, NY 13441, USA |
| Pseudocode | Yes | Algorithm 1: Generation of a policy π ∈ Π_CPU. Input: LMDP M with specifications Φ_L. Output: Stationary policy π ∈ Π_CPU which satisfies Φ_L. Determine the TSCCs R_k(M), k ∈ [m], of M; isSConnected = False, C = {}, A = {}; while isSConnected = False do: solve LP (22) with constraint (23) to get optimal values x*_sa, y*_sa, (s, a) ∈ S × A(s); compute the support E+_k(x*) of each TSCC corresponding to x* (see Theorem 5); if the digraph (V+_k(x*), E+_k(x*)) forms an SCC for every k ∈ [m], then compute π using (14) and set isSConnected = True; else find a cut and update C and A; end if; end while. |
| Open Source Code | No | No explicit statement or link for open-source code for the methodology described in this paper was found. |
| Open Datasets | Yes | This Frozen Island scenario is motivated by that found in OpenAI Gym's Frozen Lake environment (Brockman et al., 2016). We also experiment with random MDPs constructed from n-node directed Gaussian partition graphs generated using the NetworkX toolbox (Hagberg et al., 2008). |
| Dataset Splits | No | The paper uses Markov Decision Process (MDP) environments (Frozen Island, Toll Collector, Gaussian partition graphs) for numerical experiments, which are simulation-based and do not typically involve explicit training/test/validation dataset splits in the conventional machine learning sense. Instead, they involve simulating policies within these environments and measuring performance metrics. |
| Hardware Specification | No | The first set of experiments are performed on a standard desktop with 16GB of RAM using the Matlab CVX package for convex optimization (Grant & Boyd, 2014, 2008). We also perform a second set of experiments on a standard desktop with 128GB of RAM using the commercial CPLEX Optimizer, which provides a higher-precision mathematical solver for large-scale linear programming. |
| Software Dependencies | Yes | The first set of experiments are performed on a standard desktop with 16GB of RAM using the Matlab CVX package for convex optimization (Grant & Boyd, 2014, 2008). We also perform a second set of experiments on a standard desktop with 128GB of RAM using the commercial CPLEX Optimizer, which provides a higher-precision mathematical solver for large-scale linear programming. We run simulations of the LP in (22), LP1 (with the positivity constraints) and LP2 (with the flow constraints) for random instances of the Frozen Islands problem. The runtime results are reported in Table 5. For a 64 × 64 and a 128 × 128 grid, LP2 (the most complex) is solved in about 20 seconds and 15 minutes, respectively, demonstrating the effectiveness of the developed formulations even for MDPs with over ten thousand states. ... using the CPLEX 12.8 solver. |
| Experiment Setup | Yes | For the first small island we have steady-state specifications (L_log1, [0.25, 1]), (L_log2, [0.25, 1]), (L_canoe1, [0.05, 1.0]) and reward R(·, ·, L_fish1) = R(·, ·, L_fish2) = 1. Likewise, the second small island has steady-state specifications (L_canoe2, [0.05, 1.0]), (L_fish1, [0.1, 1.0]), (L_fish2, [0.1, 1.0]) and reward R(·, ·, S \ (L_fish1 ∪ L_fish2)) = 0. Because the islands are covered in ice, the agent has a chance of slipping in three possible directions whenever it moves. Specifically, if the agent attempts to go right (left), it has a 90% chance of transitioning to the right (left), and there is a 5% chance of transitioning instead to either of the states above or below it. Similarly, if the agent tries to go up (down), it moves to the states above (below) it with 90% chance, and to the states to the right and left of it with chance 5% each. The constant N_tr is used to control the time of entry into the recurrent sets. Therefore, in our experiments we have set ϵ = 10^-4. |
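The Pseudocode row quotes Algorithm 1, an iterative cut-generation loop: repeatedly solve an LP, check whether the supports of the candidate solution form strongly connected components, and either extract a policy or add a cut. A minimal, structure-only sketch in Python follows; every callable (`solve_lp`, `supports_form_sccs`, `policy_from`, `find_cut`) is a hypothetical stand-in for, respectively, LP (22) with constraint (23), the Theorem 5 support check, policy construction via (14), and the cut-finding step. This is not the authors' implementation.

```python
def synthesize_policy(solve_lp, supports_form_sccs, policy_from, find_cut):
    """Sketch of Algorithm 1's outer loop (all callables are hypothetical).

    solve_lp(cuts)          -> candidate LP solution given accumulated cuts
    supports_form_sccs(x)   -> True if every TSCC support digraph is an SCC
    policy_from(x)          -> stationary policy computed from the solution
    find_cut(x)             -> a new cut to add when connectivity fails
    """
    cuts = []  # plays the role of the sets C and A accumulated in the paper
    while True:
        x = solve_lp(cuts)            # solve the LP under current cuts
        if supports_form_sccs(x):     # connectivity check on the supports
            return policy_from(x)     # done: extract the policy
        cuts.append(find_cut(x))      # otherwise tighten and re-solve
```

The loop mirrors the paper's `isSConnected` flag implicitly: the function returns as soon as the connectivity check passes.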
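The slip dynamics quoted in the Experiment Setup row (90% chance of moving in the intended direction, 5% for each of the two perpendicular directions) can be sketched as a small transition model. The grid representation, boundary clipping, and all function names below are assumptions for illustration, not the authors' code.

```python
import numpy as np

# Grid moves as (row, col) offsets; each action's two perpendicular slips.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
PERP = {"up": ("left", "right"), "down": ("left", "right"),
        "left": ("up", "down"), "right": ("up", "down")}

def slip_distribution(action):
    """Return {effective_move: probability} under the 90/5/5 slip model."""
    a, b = PERP[action]
    return {action: 0.90, a: 0.05, b: 0.05}

def step(state, action, grid_shape, rng):
    """Sample the next state on a grid, clipping moves at the boundary."""
    dist = slip_distribution(action)
    moves = list(dist)
    idx = rng.choice(len(moves), p=[dist[m] for m in moves])
    dr, dc = ACTIONS[moves[idx]]
    r = min(max(state[0] + dr, 0), grid_shape[0] - 1)
    c = min(max(state[1] + dc, 0), grid_shape[1] - 1)
    return (r, c)
```

For example, attempting "down" from the corner (0, 0) of a 4 × 4 grid can only reach (1, 0) (intended), (0, 1) (slip right), or stay at (0, 0) (slip left, clipped at the boundary).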