Multi-Objective Verification of MDPs. Tim Quatmann, Joost-Pieter Katoen, Lehrstuhl für Informatik 2

Motivation: Planning Under Uncertainty. Scenario: travel to the airport by taxi. The travel time is uncertain. Goal: arrive before the flight departs!

Motivation: Traveling with Computer Scientists. Model the trip to the airport as a Markov decision process (MDP), combining controllable nondeterminism and probabilistic branching. Maximise the probability to arrive before the flight departs: Pr(arrive within 3 h). Other types of cost play a role as well, e.g., maximise Pr(invest less than 50 €); also fuel, pollution, stress, waiting time, ... [Figure: MDP model of the trip with branching probabilities such as 0.7, 0.3, 0.5.]

Multi-objective Model Checking: Analyse Tradeoffs Between Objectives. Example: arrive within 3 hours vs. invest less than 50 €. [Figure: Pareto curve for the taxi example, plotting Pr(arrive within 3 h) against Pr(invest less than 50 €), both axes ranging from 0 to 1.0.]

Overview: MDPs with Multiple Objectives, the Linear Programming Approach, and the Weighted Sum Approach.

Markov Decision Processes (MDPs). A Markov decision process is a tuple M = (S, Act, P, s_init), combining nondeterminism (action choice) and probabilistic branching. A randomised policy σ resolves the nondeterminism: σ(π)(α) = "probability to choose action α when observing finite path π". This induces a probability measure Pr^σ with Pr^σ(◊G) = "probability to reach G under σ". [Figure: example MDP with initial state s_init, actions α, β, and probabilistic branches.]
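One way to make this definition concrete is a plain dictionary-based encoding. The sketch below is hypothetical Python (not from the slides); the tiny instance is loosely inspired by the airport example, and all state names, action names, and probabilities are illustrative assumptions.

```python
# Minimal sketch of an MDP M = (S, Act, P, s_init); names and numbers are illustrative.
from typing import Dict, List, Tuple

class MDP:
    def __init__(self, states, actions, P, s_init):
        self.states: List = states
        # actions[s] = list of actions enabled in s, i.e. Act(s)
        self.actions: Dict = actions
        # P[(s, a)] = list of (successor, probability) pairs summing to 1
        self.P: Dict[Tuple, List[Tuple]] = P
        self.s_init = s_init

# Toy instance: from 'home' take a taxi or the bus; probabilities are made up.
toy = MDP(
    states=["home", "airport_on_time", "airport_late"],
    actions={"home": ["taxi", "bus"],
             "airport_on_time": ["stay"], "airport_late": ["stay"]},
    P={("home", "taxi"): [("airport_on_time", 0.7), ("airport_late", 0.3)],
       ("home", "bus"):  [("airport_on_time", 0.5), ("airport_late", 0.5)],
       ("airport_on_time", "stay"): [("airport_on_time", 1.0)],
       ("airport_late", "stay"):  [("airport_late", 1.0)]},
    s_init="home",
)
```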

MDPs with Multiple Objectives. Single objective: the maximal probability Pr_max(◊G) ≔ max_σ Pr^σ(◊G). Multiple objectives: a tradeoff Pr_max(◊G_1) vs. Pr_max(◊G_2) vs. ... In general there is no single policy that maximises all probabilities simultaneously. [Figure: example MDP as before.]

Geometry. p = (p_1, ..., p_m) ∈ ℝ^m is a point in m-dimensional Euclidean space (m ∈ ℕ); p[i] refers to the component p_i ∈ ℝ. Let p, q ∈ ℝ^m and λ ∈ ℝ. p ≤ q holds iff p[i] ≤ q[i] for all i ∈ {1, ..., m}. p < q holds iff p ≤ q and p ≠ q. λ·p ≔ (λ·p[1], ..., λ·p[m]) ∈ ℝ^m (scalar multiplication). p·q ≔ Σ_{1≤i≤m} p[i]·q[i] ∈ ℝ (dot product). Definition: A set B ⊆ ℝ^m is convex iff p, q ∈ B implies λ·p + (1−λ)·q ∈ B for all 0 ≤ λ ≤ 1.
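These notions translate directly into a few helper functions; this is a hypothetical sketch (the function names are not from the slides) that will be convenient when reasoning about achievable points and the Pareto curve below.

```python
# Sketch of the geometric notions used on this slide (hypothetical helpers).
from typing import Sequence

def leq(p: Sequence[float], q: Sequence[float]) -> bool:
    """Componentwise order p <= q."""
    return all(pi <= qi for pi, qi in zip(p, q))

def strictly_below(p: Sequence[float], q: Sequence[float]) -> bool:
    """p < q, i.e. p <= q componentwise and p != q."""
    return leq(p, q) and list(p) != list(q)

def dot(p: Sequence[float], q: Sequence[float]) -> float:
    """Dot product p·q."""
    return sum(pi * qi for pi, qi in zip(p, q))

def mix(lam: float, p: Sequence[float], q: Sequence[float]):
    """Convex combination lam*p + (1-lam)*q for 0 <= lam <= 1."""
    return [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
```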

Achievable Points. Definition: For an MDP M = (S, Act, P, s_init) let Π_1, Π_2, ..., Π_m ⊆ Paths(M) be m measurable sets of paths and ~_1, ~_2, ..., ~_m ∈ {<, ≤, ≥, >}. A point p ∈ [0,1]^m ⊆ ℝ^m is called achievable iff there is a (randomised) policy σ such that Pr^σ(Π_i) ~_i p[i] for all i ∈ {1, ..., m}. We also say that the point p is achieved by policy σ. A(⟨Π_i, ~_i⟩_{1≤i≤m}) denotes the set of achievable points. [Figure: example MDP with probabilities 0.8, 0.5, 0.5 and the achievable set A(◊G_1, ≥, ◊G_2, ≥) shown as a region in the Pr(◊G_1)/Pr(◊G_2) plane.]

Achievable Points. Lemma: The set of achievable points A(⟨Π_i, ~_i⟩_{1≤i≤m}) is convex. Proof (sketch): Let p, q ∈ A(⟨Π_i, ~_i⟩_{1≤i≤m}), i.e., p and q are achieved by some policies σ_p and σ_q. For any λ ∈ [0,1], the point λ·p + (1−λ)·q is achieved by the randomised policy σ which initially flips a coin: with probability λ it mimics policy σ_p, and with probability 1−λ it mimics policy σ_q.

Multi-Objective Reachability. For simplicity, we only consider maximising reachability probabilities, i.e., Π_i = ◊G_i for goal states G_i ⊆ S and ~_i = ≥. We simply write A(⟨◊G_i⟩_{1≤i≤m}) instead of A(⟨◊G_i, ≥⟩_{1≤i≤m}). A(⟨◊G_i⟩_{1≤i≤m}) is downward closed, i.e., p ∈ A(⟨◊G_i⟩_{1≤i≤m}) and p ≥ q implies q ∈ A(⟨◊G_i⟩_{1≤i≤m}).

Pareto Curve. Definition: Let A(⟨◊G_i⟩_{1≤i≤m}) be a set of achievable points. q ∈ ℝ^m dominates p ∈ ℝ^m iff q > p. p ∈ A(⟨◊G_i⟩_{1≤i≤m}) is Pareto optimal iff no q ∈ A(⟨◊G_i⟩_{1≤i≤m}) dominates p, i.e., q ∈ A(⟨◊G_i⟩_{1≤i≤m}) implies q ≯ p. P(⟨◊G_i⟩_{1≤i≤m}) ≔ { p | p is Pareto optimal } is the Pareto curve.

Multi-Objective Verification Queries. Achievability query: given an MDP M, goal state sets G_1, ..., G_m, and p ∈ ℝ^m, output true iff p ∈ A(⟨◊G_i⟩_{1≤i≤m}). Quantitative query: given an MDP M, goal state sets G_1, ..., G_m, and p_2, ..., p_m ∈ ℝ, output max { p_1 ∈ ℝ | (p_1, p_2, ..., p_m) ∈ A(⟨◊G_i⟩_{1≤i≤m}) }. Pareto query: given an MDP M and goal state sets G_1, ..., G_m, output the Pareto curve P(⟨◊G_i⟩_{1≤i≤m}).

Policy Requirements. In general, we need policies with randomisation and finite memory. Example: Is (0.5, 0.5) ∈ A(◊{t}, ◊{u})? Only with a randomised policy, e.g., σ(s)(α) = σ(s)(β) = 0.5. Is (1, 1) ∈ A(◊{t}, ◊{u})? Only with a finite-memory policy, e.g., σ(π)(α) = 1 if π has not visited {t} yet, and 0 otherwise. [Figures: two small example MDPs with actions α, β, γ and target states t, u.]

Goal-unfolding. A policy might need to memorise which sets G_i have been reached already. Idea: encode this information into the state space. Then positional (randomised) policies suffice. [Figure, shown over several animation steps: an example MDP with actions α, β, γ and targets t, u is unfolded step by step into copies that record which targets have been visited.]

Goal-unfolding. Definition: The goal-unfolding of an MDP M = (S, Act, P, s_init) and G_1, ..., G_m ⊆ S \ {s_init} is the MDP M_U = (S × {0,1}^m, Act, P_U, ⟨s_init, (0, ..., 0)⟩), where P_U(⟨s,b⟩, α, ⟨t,c⟩) = P(s, α, t) if c = succ(b,t), and 0 otherwise, and where for i ∈ {1, ..., m}: succ(b,t)[i] = 1 if t ∈ G_i, and b[i] otherwise. The size of M_U is polynomial in the size of M and exponential in m.
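A direct reading of this definition as code might look as follows. This is a hypothetical sketch that builds the unfolded state space S × {0,1}^m explicitly and reuses the MDP class sketched earlier, so it inherits the exponential blow-up in m; goal_sets is assumed to be a list of m sets of states.

```python
# Hypothetical sketch of the goal-unfolding M_U; reuses the MDP class from above.
from itertools import product

def succ(b, t, goal_sets):
    """Update the bit vector b after entering state t."""
    return tuple(1 if t in goal_sets[i] else b[i] for i in range(len(goal_sets)))

def goal_unfolding(mdp, goal_sets):
    m = len(goal_sets)
    bits = list(product([0, 1], repeat=m))
    states = [(s, b) for s in mdp.states for b in bits]
    actions = {(s, b): mdp.actions[s] for (s, b) in states}
    P = {}
    for (s, b) in states:
        for a in mdp.actions[s]:
            P[((s, b), a)] = [((t, succ(b, t, goal_sets)), prob)
                              for (t, prob) in mdp.P[(s, a)]]
    init = (mdp.s_init, (0,) * m)
    return MDP(states, actions, P, init)
```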

Goal-unfolding. Lemma: p is achievable in M iff p is achievable in M_U. Hence, to answer a multi-objective query for M, we can analyse M_U instead. Lemma: p is achievable in M_U iff p is achieved by a positional policy. Hence, for the analysis of M_U we only need to consider positional policies.

Overview (continued): next, the Linear Programming Approach.

Linear Programming Approach. Idea: use variables y_{s,α} to encode the expected number of times we leave state s via action α ∈ Act(s). Expected number of times s is entered: f_in(s) ≔ Σ_{t ∈ S} Σ_{α ∈ Act(t)} y_{t,α} · P_U(t, α, s), plus 1 if s is the initial state and plus 0 otherwise. Expected number of times s is left: f_out(s) ≔ Σ_{α ∈ Act(s)} y_{s,α}. We assert f_in(s) = f_out(s).

Solving Quantitative Queries. Quantitative query: given an MDP M, goal state sets G_1, ..., G_m, and p_2, ..., p_m ∈ ℝ, output max { p_1 ∈ ℝ | (p_1, p_2, ..., p_m) ∈ A(⟨◊G_i⟩_{1≤i≤m}) }. Consider the goal-unfolding M_U. For i ∈ {1, ..., m} let S_i^− = { ⟨s,b⟩ | b[i] = 0 } and S_i^+ = { ⟨s,b⟩ | b[i] = 1 }. Let S_fix = { ⟨s,b⟩ | ⟨t,c⟩ ∈ Post*(⟨s,b⟩) implies c = b }, i.e., the states from which the bit vector can no longer change. Return the optimum of the following LP:
maximise Σ_{s ∈ S_1^−} Σ_{α ∈ Act(s)} Σ_{t ∈ S_1^+} y_{s,α} · P_U(s, α, t)  (expected number of times G_1 is entered)
such that: f_in(s) = f_out(s) for all s ∉ S_fix;
y_{s,α} ≥ 0 for all states s and α ∈ Act(s);
Σ_{s ∈ S_i^−} Σ_{α ∈ Act(s)} Σ_{t ∈ S_i^+} y_{s,α} · P_U(s, α, t) ≥ p_i for all i ∈ {2, ..., m}  (expected number of times G_i is entered).
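This LP can be handed to an off-the-shelf solver. The sketch below uses scipy.optimize.linprog and is only meant to illustrate the shape of the objective and constraint matrices, not a production encoding: it assumes S_fix, the sets S_minus[i] ≙ S_{i+1}^− and S_plus[i] ≙ S_{i+1}^+, and the bounds p_bounds = [p_2, ..., p_m] are precomputed, and it does not treat end components beyond what the S_fix restriction provides.

```python
# Hypothetical LP sketch for the quantitative query on the goal-unfolding mdp_u.
import numpy as np
from scipy.optimize import linprog

def quantitative_query(mdp_u, S_fix, S_minus, S_plus, p_bounds):
    # One variable y[(s, a)] per state-action pair of the unfolding.
    pairs = [(s, a) for s in mdp_u.states for a in mdp_u.actions[s]]
    idx = {sa: k for k, sa in enumerate(pairs)}
    n = len(pairs)

    def entered(i):
        """Coefficients of 'expected number of times G_{i+1} is entered'."""
        row = np.zeros(n)
        for (s, a) in pairs:
            if s in S_minus[i]:
                for (t, prob) in mdp_u.P[(s, a)]:
                    if t in S_plus[i]:
                        row[idx[(s, a)]] += prob
        return row

    # Flow constraints f_in(s) = f_out(s) for all states outside S_fix.
    A_eq, b_eq = [], []
    for s in mdp_u.states:
        if s in S_fix:
            continue
        row = np.zeros(n)
        for a in mdp_u.actions[s]:
            row[idx[(s, a)]] -= 1.0                      # -f_out(s)
        for (t, a) in pairs:
            for (u, prob) in mdp_u.P[(t, a)]:
                if u == s:
                    row[idx[(t, a)]] += prob             # +f_in(s)
        A_eq.append(row)
        b_eq.append(-1.0 if s == mdp_u.s_init else 0.0)  # the '+1' for s_init

    # Threshold constraints for objectives 2..m, written as A_ub x <= b_ub.
    A_ub = [-entered(i) for i in range(1, len(S_minus))]
    b_ub = [-p for p in p_bounds]

    # linprog minimises, so negate the objective for G_1; y >= 0 via bounds.
    res = linprog(c=-entered(0),
                  A_ub=np.array(A_ub) if A_ub else None,
                  b_ub=np.array(b_ub) if b_ub else None,
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    return -res.fun if res.success else None
```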

Solving Achievability Queries. Achievability query: given an MDP M, goal state sets G_1, ..., G_m, and p ∈ ℝ^m, output true iff p ∈ A(⟨◊G_i⟩_{1≤i≤m}). Consider the goal-unfolding M_U. For i ∈ {1, ..., m} let S_i^− = { ⟨s,b⟩ | b[i] = 0 } and S_i^+ = { ⟨s,b⟩ | b[i] = 1 }, and let S_fix = { ⟨s,b⟩ | ⟨t,c⟩ ∈ Post*(⟨s,b⟩) implies c = b } as before. Return true iff the following LP has a feasible solution:
maximise 0
such that: f_in(s) = f_out(s) for all s ∉ S_fix;
y_{s,α} ≥ 0 for all states s and α ∈ Act(s);
Σ_{s ∈ S_i^−} Σ_{α ∈ Act(s)} Σ_{t ∈ S_i^+} y_{s,α} · P_U(s, α, t) ≥ p_i for all i ∈ {1, ..., m}.

Overview (continued): finally, the Weighted Sum Approach.

Weighted Sum Approach. Theorem: For w ∈ [0,1]^m let σ_w ∈ arg max_σ ( Σ_{1≤i≤m} w[i] · Pr^σ(◊G_i) ) and p_w = ( Pr^{σ_w}(◊G_1), ..., Pr^{σ_w}(◊G_m) ). Then p_w ∈ A(⟨◊G_i⟩_{1≤i≤m}), and for all q ∈ ℝ^m: w·q > w·p_w implies q ∉ A(⟨◊G_i⟩_{1≤i≤m}). In particular, p_w lies on the Pareto curve. Approach: compute p_w for different weight vectors w; stop when the Pareto curve has been explored accurately enough. Recall that A(⟨◊G_i⟩_{1≤i≤m}) is convex, so each p_w together with its weight vector w yields a separating hyperplane: every point beyond { q | w·q = w·p_w } is not achievable. [Figure, shown over several animation steps: the achievable region, a point p_w, and the corresponding separating hyperplane with normal vector w.]

Weighted Sum Approach. Proof (sketch): p_w ∈ A(⟨◊G_i⟩_{1≤i≤m}) follows by definition. Assume there is q ∈ A(⟨◊G_i⟩_{1≤i≤m}) with w·q > w·p_w, and let q be achieved by policy σ, i.e., Pr^σ(◊G_i) ≥ q[i] for all i ∈ {1, ..., m}. Then Σ_{1≤i≤m} w[i] · Pr^{σ_w}(◊G_i) = w·p_w < w·q ≤ Σ_{1≤i≤m} w[i] · Pr^σ(◊G_i), contradicting the definition of σ_w.
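The exploration strategy from the theorem can be phrased as a small loop over weight vectors. The sketch below is a hypothetical two-objective version that simply sweeps a fixed grid of weights and collects the resulting points, leaving aside the more refined stopping criterion; compute_pw(w) is assumed to return the point p_w achieved by a policy maximising the w-weighted objective (e.g., via the weighted value iteration on the next slide).

```python
# Hypothetical sketch of weighted-sum exploration for m = 2 objectives.
def explore_pareto_2d(compute_pw, num_weights=11):
    points = []
    for k in range(num_weights):
        w = (k / (num_weights - 1), 1 - k / (num_weights - 1))
        p_w = compute_pw(w)
        # p_w is achievable; every q with w·q > w·p_w is not achievable,
        # so each pair (w, p_w) bounds the Pareto curve from both sides.
        points.append((w, p_w))
    return points
```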

Computation of the Point p_w for a Given w. Weighted value iteration: given an MDP M, goal state sets G_1, ..., G_m, a weight vector w ∈ [0,1]^m, and a precision ε, output the point p_w. Consider the goal-unfolding M_U = (S_U, Act, P_U, ⟨s_init, (0, ..., 0)⟩) with S_U = S × {0,1}^m. Let g(b,c) ∈ {0,1}^m with g(b,c)[i] = 1 iff b[i] = 0 and c[i] = 1, for i ∈ {1, ..., m}.
For ⟨s,b⟩ ∈ S_U and i ∈ {1, ..., m}: x^0(⟨s,b⟩) ← 0, y^0_i(⟨s,b⟩) ← 0, and σ_w(⟨s,b⟩) ← α for some arbitrary α ∈ Act(⟨s,b⟩).
For j ∈ {1, 2, ...}:
x^j(⟨s,b⟩) ← max_{α ∈ Act(⟨s,b⟩)} Σ_{⟨t,c⟩ ∈ S_U} P_U(⟨s,b⟩, α, ⟨t,c⟩) · ( w·g(b,c) + x^{j−1}(⟨t,c⟩) )
A_opt ← arg max_{α ∈ Act(⟨s,b⟩)} Σ_{⟨t,c⟩ ∈ S_U} P_U(⟨s,b⟩, α, ⟨t,c⟩) · ( w·g(b,c) + x^{j−1}(⟨t,c⟩) )
if σ_w(⟨s,b⟩) ∉ A_opt then σ_w(⟨s,b⟩) ← α for some α ∈ A_opt
for i ∈ {1, ..., m}: y^j_i(⟨s,b⟩) ← Σ_{⟨t,c⟩ ∈ S_U} P_U(⟨s,b⟩, σ_w(⟨s,b⟩), ⟨t,c⟩) · ( g(b,c)[i] + y^{j−1}_i(⟨t,c⟩) )
Stop when max_{⟨s,b⟩ ∈ S_U} ( x^j(⟨s,b⟩) − x^{j−1}(⟨s,b⟩) ) ≤ ε.
Return the point p_w with p_w[i] = y^j_i(⟨s_init, (0, ..., 0)⟩).
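A direct transcription of this procedure into Python might look as follows. This is a hypothetical sketch that follows the update rules above on a goal-unfolding mdp_u as built by the earlier goal_unfolding sketch; the simple stopping test and the naive data structures are illustrative choices, not the implementation used in practice.

```python
# Hypothetical sketch of weighted value iteration on the goal-unfolding M_U.
def weighted_value_iteration(mdp_u, m, w, eps=1e-6):
    def g(b, c):
        # g(b,c)[i] = 1 iff goal set G_{i+1} is newly reached by this transition.
        return [1 if b[i] == 0 and c[i] == 1 else 0 for i in range(m)]

    x = {s: 0.0 for s in mdp_u.states}
    y = {s: [0.0] * m for s in mdp_u.states}
    sigma = {s: mdp_u.actions[s][0] for s in mdp_u.states}  # arbitrary initial policy

    while True:
        x_new, y_new = {}, {}
        for (s, b) in mdp_u.states:
            def val(a):
                # Bellman backup for the w-weighted objective.
                return sum(prob * (sum(w[i] * gi for i, gi in enumerate(g(b, c)))
                                   + x[(t, c)])
                           for ((t, c), prob) in mdp_u.P[((s, b), a)])
            best = max(mdp_u.actions[(s, b)], key=val)
            if val(sigma[(s, b)]) < val(best):   # keep the old action if still optimal
                sigma[(s, b)] = best
            x_new[(s, b)] = val(best)
            a = sigma[(s, b)]
            # Track the individual objectives under the current policy sigma.
            y_new[(s, b)] = [sum(prob * (g(b, c)[i] + y[(t, c)][i])
                                 for ((t, c), prob) in mdp_u.P[((s, b), a)])
                             for i in range(m)]
        delta = max(x_new[s] - x[s] for s in mdp_u.states)
        x, y = x_new, y_new
        if delta <= eps:
            return y[mdp_u.s_init]   # the point p_w at the initial state
```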

Conclusion. Multi-objective MDPs: satisfy multiple properties at once; this requires randomised, finite-memory policies. Two approaches: the linear programming approach and the weighted sum approach. An active field of research: ask us for Bachelor / Master thesis topics. Thank you for your attention!