Policy Iteration Based on Stochastic Factorization

Authors: A. M. S. Barreto, J. Pineau, D. Precup

JAIR 2014 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental The paper also discusses practical ways of factoring an MDP and illustrates the usefulness of the proposed algorithm with an application involving a large-scale decision problem of real economical interest. ...Section 5 investigates the computational issues surrounding the use of PISF in practice and presents possible solutions to efficiently compute the factorization of an MDP. In Section 5 some of the proposed solutions are put to the test in a large-scale decision problem involving the maintenance of an asset with components that deteriorate over time.
Researcher Affiliation Academia André M. S. Barreto EMAIL Laboratório Nacional de Computação Científica Petrópolis, Brazil Joelle Pineau EMAIL Doina Precup EMAIL School of Computer Science Mc Gill University Montreal, Canada
Pseudocode Yes Algorithm 1 Policy iteration Algorithm 2 Policy iteration based on stochastic factorization (PISF) Algorithm 3 Stochastic-factorization computation
Open Source Code No The paper does not contain any explicit statements about releasing source code, nor does it provide links to any code repositories.
Open Datasets No The paper describes a synthetic problem setup, the 'Multicomponent-Replacement Decision Task,' where '100 instances of the problem were randomly generated' for each value of nc. It does not use or provide access to a pre-existing, publicly available dataset.
Dataset Splits No The paper does not provide traditional training/test/validation splits for a fixed dataset. For evaluation when nc > 5, it mentions '10,000 test states si sampled uniformly at random from S,' but this is a sampling method for evaluation, not a dataset split.
Hardware Specification No The experiments were run using computational resources made available by Compute Canada and Calcul Québec. This is a general statement about resource providers and does not specify any particular hardware models (e.g., GPU, CPU models, or memory configurations).
Software Dependencies No The paper does not specify any particular software, libraries, or solvers with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4) that were used for implementation or experimentation.
Experiment Setup Yes The experiments described in this section were carried out assuming a general version of the maintenance task in which components interact with each other. The failure probability of component cj was modeled as a linear function of the asset s general condition. Suppose the asset is in state si and the decision maker decides to perform action ah. Then, denoting the next state by sk, the probability of component cj failing is: P(sk j = 0 | sij > 1,ahj = 0) = f ( f fmin)(sij 1) lj 1 + ˆf u = j(lu siu) u = j lu , (21) where f, fmin and ˆf are parameters of the model and it is assumed that lj > 1 for all j. ... In the experiments, equation (21) was considered with f = 0.1, fmin = 0.01, and ˆf = 0.1. ... The variables lj and rj were drawn from normal distributions with means µl = 10 and µr = 10 and common standard deviation σlr = 3 (the values sampled for lj were rounded to the closest natural number and sampled again in case the result was smaller than 2). ... The constant term of the setup cost was fixed at Rs = 10 and the failure cost was set as R f = 5nc. ... The multicomponent-replacement task was modeled as a discounted problem with γ = 0.999... The number of neighbors η used in the approximation was set to nc, the function ω was defined as the constant 1/η, and the neighborhood radius σ was varied in {200,400,600}. ... The iterative value function computation was interrupted according to the stop criterion described in Puterman s (1994) Proposition 6.6.5, with ε = 10 6.