Multi-Objective Verification on MDPs. Tim Quatmann, Joost-Pieter Katoen Lehrstuhl für Informatik 2

Multi-Objective Verification on MDP Lehrtuhl für Informatik 2

Motivation Planning Under Uncertainty Scenario: Travel to the airport T a x i Travel time i uncertain... Goal: Arrive before the flight depart!!2 Multi-objective Verification of MDP

Motivation Traveling with Computer Scientit Model trip to airport a a Markov deciion proce (MDP) Controlable nondeterminim 0.7 Probabilitic branching Maximie probability to arrive before the flight depart Pr( 3h) Other type of cot play a role a well: Maximize Pr( 50 ) fuel, pollution, tre, waiting time,... 0.3 0.5 0.5............!3 Multi-objective Verification of MDP

Multi-objective Model Checking Analye Tradeoff Between Objective.0 Arrive within 3 hour v. Invet le than 50 Pr( 3h) 0.8 0.6 0.4 T a x i Pareto Curve 0.2 0 0 0.2 0.4 0.6 0.8.0 Pr( 50 )!4 Multi-objective Verification of MDP

Overview α MDP with Multiple Objective Linear Programming Approach fin() = fout() for all S / S y,α,t 0 for all,t S, α Act() Σ S-i Σα Act() Σt S+i y,α,t pi for all i {,, m} Weighted Sum Approach!5 Multi-objective Verification of MDP

Markov Deciion Procee (MDP) Markov deciion proce M = (S, Act, P, init) Nondeterminim Probabilitic branching 0.7... Randomied Policy S reolve nondeterminim: S(π)(α) = "Probability to chooe action α when oberving finite path π" init α β 0.3 0.5...... Probability meaure Pr S : Pr S ( G) = "Probability to reach G under S" 0.5...!6 Multi-objective Verification of MDP

MDP with Multiple Objective Single-objective: maximal probability Prmax( G) max S Pr S ( G) Multi-objective: tradeoff Prmax( G) v. Prmax( G2) v.... There i not a ingle policy that maximie all probabilitie init α β 0.7 0.3 0.5......... 0.5...!7 Multi-objective Verification of MDP

Geometry p = (p,, pm) R m i a point in m N dimenional Euclidean pace p[i] refer to pi R Let p,q R m and λ R p q hold iff p[i] q[i] for all i {,, m} p < q hold iff p q and p q λ p (λ p[],, λ p[m]) R m (calar multiplication) p q Σ i m p[i] q[i] R (dot product) Definition: A et B R m i convex iff p,q B implie λ p + (-λ) q B for all 0 λ.!8 Multi-objective Verification of MDP

Achievable Point Definition: For MDP M = (S, Act, P, init) let Π, Π2,, Πm Path(M) be m meaurable et of path, and ~, ~2,, ~m {<,,, >}. A point p [0,] m R m i called achievable iff there i a (randomied) policy S.t. Pr S (Πi) ~i p[i] for all i {,, m}. We alo ay that point p i achieved by policy S A( Πi,~i m) denote the et of achievable point Example: α 0.8 α 0.5 0.5 Pr( G2).0 0.8 0.6 0.4 0.2 A( G,, G2, ) p 0.2 0 0 0.2 0.4 0.6 0.8.0 Pr( G)!9 Multi-objective Verification of MDP

Achievable Point Lemma: The et of achievable point A( Πi, ~i m) i convex. Proof (ketch): Let p, q in A( Πi, ~i m), i.e., p and q are achieved by ome policie Sp and Sq. For any λ [0,], the point λ p + (-λ) q i achieved by the randomied policy S which initially flip a coin: With probability λ, it mimic policy Sp With probability -λ, it mimic policy Sq!0 Multi-objective Verification of MDP

Multi-Objective Reachability For implicity, we only conider maximiing reachability probabilitie, i.e., Πi = Gi for goal-tate Gi S ~i = We imply write A( Gi m) intead of A( Gi, m) A( Gi m) i downward cloed, i.e., p A( Gi m) and p q implie q A( Gi m)! Multi-objective Verification of MDP

Pareto Curve Definition: Let A( Gi m) be a et of achievable point. q R m dominate p R m, iff q > p. p A( Gi m) i Pareto optimal, iff no q A( Gi m) dominate p, i.e., q A( Gi m) implie q p. P( Gi m) { p p i Pareto optimal } i the Pareto curve.!2 Multi-objective Verification of MDP

Multi-Objective Verification Querie Achievability Query: Given: MDP M, goal tate et G,, Gm, p R m Output: True iff p A( Gi m) Quantitative Query: Given: MDP M, goal tate et G,, Gm, p2,, pm R Output: max { p R (p, p2,, pm) A( Gi m) } Pareto Query: Given: MDP M, goal tate et G,, Gm Output: Pareto Curve P( Gi m)!3 Multi-objective Verification of MDP

Policy Requirement In general, we need policie with randomiation and finite memory, e.g.: α β t u (0.5, 0.5) A( {t}, {u})? Only with randomied policy S()(α) = S()(β) = 0.5 t α β u (, ) A( {t}, {u})? γ Only with finite-memory policy S(π)(α) =, if π ha not viited {t}, yet {0, otherwie!4 Multi-objective Verification of MDP

Goal-unfolding A policy might need to memorie which et Gi ha been reached already Idea: Encode thi information into the tate-pace. Then, poitional (randomied) policie uffice t α β u t α β u γ γ t α β u t α β u γ γ!5 Multi-objective Verification of MDP

Goal-unfolding A policy might need to memorie which et Gi ha been reached already Idea: Encode thi information into the tate-pace. Then, poitional (randomied) policie uffice t β α u t α β u γ γ α t β u t α β u γ γ!6 Multi-objective Verification of MDP

Goal-unfolding A policy might need to memorie which et Gi ha been reached already Idea: Encode thi information into the tate-pace. Then, poitional (randomied) policie uffice t γ β α u t α γ β u α t β u t α β u γ γ!7 Multi-objective Verification of MDP

Goal-unfolding A policy might need to memorie which et Gi ha been reached already Idea: Encode thi information into the tate-pace. Then, poitional (randomied) policie uffice β α t α γ β u u!8 Multi-objective Verification of MDP

Goal-unfolding Definition: The goal-unfolding of MDP M = (S, Act, P, init) and G,, Gm S \ {init} i the MDP MU = (S {0,} m, Act, PU, init, (0,, 0) ), where and for i {,, m}: PU(,b, α, t,c ) = ucc(b,t)[i] = P(, α, t), if c = ucc(b,t) { 0, otherwie {, if t G i b[i], otherwie The ize of MU i polynomial in the ize of M and exponential in m!9 Multi-objective Verification of MDP

Goal-unfolding Lemma: p i achievable in M iff p i achievable in MU. To anwer a multi-objective query for M, we can analye MU intead Lemma: p i achievable in MU iff p i achieved by a poitional policy. For the analyi of MU we only need to conider poitional policie!20 Multi-objective Verification of MDP

{ Linear Programming Approach Idea: Ue variable y,α to encode the expected number of time we leave tate via action α Act() Exp. time i entered fin() Σt S Σα Act(t) yt,α PU(t, α, ) + {, if i the initial tate 0, otherwie fout() Σα Act() y,α Exp. time i left fin() { fout() We aert fin() = fout()!22 Multi-objective Verification of MDP

Solving Quantitative Querie Quantitative Query: Given: MDP M, goal tate et G,, Gm, p2,, pm R Output: max { p R (p, p2,, pm) A( Gi m) } Conider the goal-unfolding MU For i {,, m} let S-i = {,b b[i] = 0 } and S+i = {,b b[i] = } Let S = {,b t,c Pot*(,b ) implie c = b } Return the optimum of the following LP: max Σ S- Σα Act() Σt S+ y,α PU(, α, t) Exp. time G i entered uch that: fin() = fout() for all S / S y,α 0 for all S, α Act() Σ S-i Σα Act() Σt S+i y,α PU(, α, t) pi for all i {2,, m} Exp. time Gi i entered!23 Multi-objective Verification of MDP

Solving Achievability Querie Achievability Query: Given: MDP M, goal tate et G,, Gm, p R m Output: True iff p A( Gi m) Conider the goal-unfolding MU For i {,, m} let S-i = {,b b[i] = 0 } and S+i = {,b b[i] = } Let S = {,b t,c Pot*(,b ) implie c = b } Return True iff the following LP ha a feaible olution max 0 uch that: fin() = fout() for all S / S y,α 0 for all S, α Act() Σ S-i Σα Act() Σt S+i y,α PU(, α, t) pi for all i {,, m}!24 Multi-objective Verification of MDP

Weighted Sum Approach Theorem: For w [0,] m let Sw arg max S ( Σ i m w[i] Pr S ( Gi) ) and pw = ( Pr Sw ( G),, Pr Sw ( Gm) ). Then, pw A( Gi m) and for all q R m : w q > w pw implie q A( Gi m) In particular, pw lie on the Pareto curve Approach: Compute pw for different w Stop when the Pareto curve i explored q Not achievable achievable pw w!26 Multi-objective Verification of MDP

Weighted Sum Approach Theorem: For w [0,] m let Sw arg max S ( Σ i m w[i] Pr S ( Gi) ) and pw = ( Pr Sw ( G),, Pr Sw ( Gm) ). Then, pw A( Gi m) and for all q R m : w q > w pw implie q A( Gi m) In particular, pw lie on the Pareto curve Approach: Compute pw for different w Stop when the Pareto curve i explored Not achievable Recall: A( Gi m) i convex achievable pw w!27 Multi-objective Verification of MDP

Weighted Sum Approach Theorem: For w [0,] m let Sw arg max S ( Σ i m w[i] Pr S ( Gi) ) and pw = ( Pr Sw ( G),, Pr Sw ( Gm) ). Then, pw A( Gi m) and for all q R m : w q > w pw implie q A( Gi m) In particular, pw lie on the Pareto curve Approach: Compute pw for different w Stop when the Pareto curve i explored Recall: A( Gi m) i convex achievable Not achievable w pw!28 Multi-objective Verification of MDP

Weighted Sum Approach Theorem: For w [0,] m let Sw arg max S ( Σ i m w[i] Pr S ( Gi) ) and pw = ( Pr Sw ( G),, Pr Sw ( Gm) ). Then, pw A( Gi m) and for all q R m : w q > w pw implie q A( Gi m) Proof (ketch): pw A( Gi m) follow by definition Aume there i q A( Gi m) with w q > w pw Let q be achieved by policy S, i.e., Pr S ( Gi) q[i] for all i {,, m} Σ i m w[i] Pr Sw ( Gi) = w pw < w q Σ i m w[i] Pr S ( Gi) Contradiction to definition of Sw!29 Multi-objective Verification of MDP

Computation of Point pw for given w Weighted Value Iteration: Given: MDP M, goal tate et G,, Gm, w [0,] m, preciion ε Output: Point pw Conider the goal-unfolding MU = (SU, Act, PU, init, (0,, 0) ), where SU = S {0,} m Let g(b,c) {0,} m with g(b,c)[i] = iff b[i] = 0 and c[i] = for i {,, m} For,b SU and i {,, m}: x 0 (,b ) 0, y 0,i (,b ) 0, and Sw(,b ) α for ome arbitrary α Act(,b ) For j {, 2, }: x j (,b ) maxα Act(,b ) ( Σ t,c SU w g(b,c) + PU(,b, α, t,c ) x j- ( t,c ) ) Aopt arg maxα Act(,b ) ( Σ t,c SU w g(b,c) + PU(,b, α, t,c ) x j- ( t,c ) ) if Sw(,b ) Aopt then Sw(,b ) α for ome α Aopt For i {,, m}: y j,i (,b ) Σ t,c SU g(b,c)[i] + PU(,b, Sw(,b ), t,c ) y j-,i ( t,c ) Stop when max,b SU ( x j (,b ) - x j- (,b ) ε Return point pw with pw[i] = y j,i ( init, (0,, 0) )!30 Multi-objective Verification of MDP

Concluion Multi-objective MDP Satify multiple propertie at once Need randomied, finite-memory policie Two Approache Linear programming approach Weighted um approach Active Field of Reearch Ak u for Bachelor / Mater thei topic Thank you for your attention!3 Multi-objective Verification of MDP