The Linear Programming Approach to Reach-Avoid Problems for Markov Decision Processes

Authors: Nikolaos Kariotoglou, Maryam Kamgarpour, Tyler H. Summers, John Lygeros

JAIR 2017 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We validate the proposed method and analyze its potential with numerical case studies. The final contribution of our paper is the focus on numerical validation of the LP approach to stochastic reach-avoid problems. We develop and solve a series of benchmark problems and evaluate our approximate solutions in two ways. First, we compute the closed-loop empirical reach-avoid policy by applying the approximated control input obtained from (15). Second, we use scalable alternative approaches to approximate the benchmark reach-avoid problems. We also computed empirical reach-avoid probabilities in simulation by sampling 100 noise trajectories from each initial state and implementing the ADP control policy using the associated value function approximation. The controller was tested on the ORCA setup by running 10 experiments from each initial condition.
Researcher Affiliation Academia Nikolaos Kariotoglou EMAIL Automatic Control Laboratory Information Technology and Electrical Engineering ETH Z urich, CH-8092, Switzerland, Maryam Kamgarpour EMAIL Automatic Control Laboratory Information Technology and Electrical Engineering ETH Z urich, CH-8092, Switzerland, Tyler H. Summers EMAIL Control, Optimization, and Networks Lab Mechanical Engineering The University of Texas at Dallas, Texas, TX 75080, USA, John Lygeros EMAIL Automatic Control Laboratory Information Technology and Electrical Engineering ETH Z urich, CH-8092, Switzerland
Pseudocode Yes We summarize the method to approximate the reach-avoid value function in Algorithm 1. Algorithm 1 linear programming based reach-avoid value function approximation
Open Source Code No The paper does not provide concrete access to source code for the methodology described. It states: "The controller was tested on the ORCA setup by running 10 experiments from each initial condition1. We pre-computed the control inputs at the predicted mean trajectory of the states over the horizon for each experiment. Implementing the feedback policy online would require solving problem (15) within the sampling time of 0.08 seconds. In theory, this computation is possible since the control space is only two dimensional but it requires developing an embedded nonlinear programming solver compatible with the ORCA setup. Here, we have implemented the open loop controller."
Open Datasets No The paper uses synthetic data for Example 1 and 2, and for Example 3, it uses a model derived from empirical data from the ORCA project, citing "Liniger, Domahidi, and Morari (2014)" for the model derivation. However, it does not provide concrete access information (link, DOI, repository, or formal citation for a public dataset) for any dataset used in its experiments, nor does it explicitly state any dataset is open or publicly available.
Dataset Splits No The paper describes how initial conditions and noise trajectories are sampled for simulations rather than providing specific training/test/validation dataset splits. For example: "To evaluate the performance of the approximation, we sampled 100 initial conditions x0, uniformly from X. For each initial condition we generated 100 noise trajectories {ωk}T 1 k=0." and "We simulated the performance of the ADP controller starting from 100 different initial conditions... For every initial condition we sampled 100 different noise trajectory realizations..."
Hardware Specification Yes All computations were carried out on an Intel Core i7 Q820 CPU clocked at 1.73 GHz with 16GB of RAM memory, using IBM s CPLEX optimization toolkit in its default settings.
Software Dependencies No All computations were carried out on an Intel Core i7 Q820 CPU clocked at 1.73 GHz with 16GB of RAM memory, using IBM s CPLEX optimization toolkit in its default settings. The paper mentions "IBM's CPLEX optimization toolkit" but does not provide a specific version number for this software component.
Experiment Setup Yes The design choices include the number of basis functions, their centers and variances, the sample violation and confidence bounds in Lemma 2 and the state-relevance weights. We chose 100, 500 and 1000 GRBF elements for the reach-avoid problems of dim(X U) = 4, 6, 8, respectively (Table 1). We used uniform measures supported on X and [0.02, 0.095]n to sample the GRBFs centers and variances, respectively. The violation and confidence levels for every k {0, . . . , 4} were set to εk = 0.05, 1 βk = 0.99 and the measure P X U required to generate samples from X U was chosen to be uniform. We used 2000 GRBFs for each approximation step with centers and variances sampled according to uniform measures supported on X and on the hyper-rectangle defined by the product of intervals in the rows of Table 5, respectively. We used a uniform state-relevance measure and a uniform sampling measure to construct each one of the finite linear programs in Algorithm 1. All violation and confidence levels were chosen to be εk = 0.2 and 1 βk = 0.99 respectively for k = {0, . . . , 5}.