Lecture 9 Synthesis of Reactive Control Protocols Nok Wongpiromsarn Singapore-MIT Alliance for Research and Technology Richard M. Murray and Ufuk Topcu California Institute of Technology EECI, 16 May 2012 Outline Open System Synthesis: definition of open systems and open system synthesis problem Reactive System Synthesis: problem statement, realizability, games, solving games, complexity General Reactivity(1) Games
Hierarchical Structure Preview Alice s navigation stack Mission Planner Different views long-horizon specification W L Multi-scale models W 0... W L 1 W L W 0 W L 1 Hierarchical control architecture Traffic Planner short-horizon specification Path Planner continuous dynamics& constraints min T t 0 L(x, u)dt s.t. ẋ = f(x, u) g(x, u) 0 Vehicle Actuation This lecture covers this TuLiP: Temporal logic planning toolbox (Open source at http://tulip-control.sf.net) [Coming up in the next lecture]
Alice s Logic Planner DR,NP,S coll.-free path for DR,PR,S found DR,A coll.-free path for DR,A found DR,B coll.-free path found passing finished or obs disappeared STO,NP,S DR,P,S passing finished or obs disappeared 3 & #DR,PR,S >= M & #lanes = 1 coll.-free path found coll.-free path found & #lanes=1 OFF-ROAD STO,A STO,B stopped & obs detected coll.-free path found backup finished & #BACKUP < N backup finished & #BACKUP >= N & #lanes>1 FAILED STO,P,S & #DR,PR,S < M DR,PR,S coll.-free path found & #DR,PR,S >= M & #lanes > 1 BACKUP & #lanes > 1 & #lanes = 1 PAUSED ROAD REGION STO,PR,S Given a specification Φ, whether the planner is correct with respect to Φ depends on the environment s actions (e.g., how obstacles move) a correct planner needs to ensure that Φ is satisfied for all the possible valid behaviors of the environment How to design such a correct planner?
y P E C Open System Synthesis y x An open system is a system whose behaviors can be affected by external influence x E CP y Open (synchronous) synthesis: Given a system which describes all the possible actions - plant actions y are controllable - environment actions x are uncontrollable a specification Φ(x, y) find a strategy f(x) for the controllable actions which will maintain the specification against all possible adversary moves, i.e., x Φ(x, f(x)) time E x 0 x 1 x 2 x 3 CP y 0 = f(x 0 ) y 1 = f(x 0 x 1 ) y 2 = f(x 0 x 1 x 2 ) y 3 = f(x 0 x 1 x 2 x 3 ) 4
Reactive System Synthesis Reactive systems are open systems that maintain an ongoing interaction with their environment rather than producing an output on termination. Consider the synthesis of a reactive system with input x and output y, specified by the linear temporal formula Φ(x, y). The system contains 2 components S1 (i.e., environment ) and S2 (i.e., reactive module ) - Only S1 can modify x - Only S2 can modify y Want to show that S2 has a winning strategy for y against all possible x scenarios the environment may present to it. - Two-person game: treat environment as adversary S2 does its best, by manipulating y, to maintain Φ(x, y) S1 does its best, by manipulating x, to falsify Φ(x, y) If a winning strategy for S2 exists, we say that Φ(x, y) is realizable x S1 S2 y 5
Satisfiability Realizability Realizability should guarantee the specification against all possible (including adversarial) environment (Rosner 98) To solve the problem one must find a satisfying tree where the branching represents all possible inputs,y0 x {x 0,x 1 },y1,y2,y3,y4,y5,y6 Satisfiability of Φ(x, y) only ensures that there exists at least one behavior, listing the running values of x and y that satisfies Φ(x, y) There is a way for the plant and the environment to cooperate to achieve Φ(x, y) Existence of a winning strategy for C2 can be expressed by the AE-formula x y Φ(x, y) 6
The Runner Blocker System R B Goal Runner R tries to reach Goal. Blocker B tries to intercept and stop R. 7
The Runner Blocker System lose lose win 8
Solving Reactive System Synthesis Solution is typically given as the winning set The winning set is the set of states starting from which there exists a strategy for C2 to satisfy the specification for all the possible behaviors of C1 A winning strategy can then be constructed by saving intermediate values in the winning set computation Worst case complexity is double exponential Construct a nondeterministic Buchi automaton from Φ(x, y) first exponent Determinize Buchi automaton into a deterministic Rabin automaton second exponent Follow a similar procedure as in closed system synthesis and construct the product of the system and the deterministic Rabin automaton Find the set of states starting from which all the possible runs in the product automaton are accepting This set can be obtained by computing the recurrent and the attractor sets Special Cases of Lower Complexity For a specification of the form p, p, p or p, the controller can be synthesized in O(N 2 ) time where N is the size of the state space Avoid translation of the formula to an automaton and determinization of the automaton 9
Special Case: Reachability Transition system TS =(S, Act,,I,AP,L) Specification Φ = p Define the set WIN {s S : s = p} Define the predecessor operator Pre :2 S 2 S by Pre (R) ={s S : r R s.t. s r} The set of all the states starting from which WIN is reachable (if the plant and the environment to cooperate) can be computed efficiently by the iteration sequence R 0 = WIN R i = R i 1 Pre (R i 1 ), i >0 From Tarski-Knaster Theorem: - There exists a natural number n such that Rn = Rn-1 - Such an Rn is the minimal solution of the fix-point equation R = WIN Pre (R) - The minimal solution of the above fix-point equation is denoted by µr.(win Pre (R)) 10
The Runner Blocker System R 4 R 3 R 2 lose R 1 lose R 0 win
Reachability in Adversarial Setting Transition system TS =(S, Act,,I,AP,L) Specification Φ = p Define the set WIN {s S : s = p} Define the operator Pre :2 S 2 S and Pre :2 S 2 S by Pre (R) = {s S : r S if s r, then r R} = the set of states whose all successors are in R Pre (R) = Pre (Pre (R)) = the set of states whose all successors have at least one successor in R The set of all the states starting from which the controller can force the system into WIN can be computed efficiently by the iteration sequence R 0 = WIN R i = R i 1 Pre (R i 1 ), i >0 - There exists a natural number n such that Rn = Rn-1 - Such Rn is the minimal solution of the fix-point equation - The minimal solution of the above fix-point equation is denoted by µr.(win Pre (R)) 12 R = WIN Pre (R)
The Runner Blocker System R 3 R 1 lose lose R 0 win
More Complicated Case DR,NP,S coll.-free path found STO,NP,S ROAD REGION stopped & obs detected coll.-free path for DR,PR,S found DR,A coll.-free path for DR,A found DR,B passing finished or obs disappeared DR,P,S passing finished or obs disappeared & #DR,PR,S >= M & #lanes = 1 coll.-free path found coll.-free path found & #lanes=1 STO,A STO,B coll.-free path found backup finished & #BACKUP < N backup finished & #BACKUP >= N & #lanes>1 STO,P,S & #DR,PR,S < M DR,PR,S coll.-free path found & #DR,PR,S >= M & #lanes > 1 BACKUP & #lanes > 1 & #lanes = 1 STO,PR,S OFF-ROAD FAILED PAUSED Game Automata Approach Consider the specification as the winning condition in an infinite two-person game between input player (C1) and output player (C2). Decide whether player C2 has a winning strategy, and if this is the case construct a finite state winning strategy. 14
Game Structures A game structure is a tuple G =(V, X, Y, θ e, θ s, ρ e, ρ s, AP,L,ϕ) V = {v 1,...,v n } is a finite set of state variables. Σ V is the set of all the possible assignments to variables in V X V is a set of input variables Y = V\X is a set of output variables θ e (X ) is a proposition characterizing the initial states of the environment θ s (V) is a proposition characterizing the initial states of the system primed copy of X represents the set of next input variables ρ e (V, X ) is a proposition characterizing the transition relation of the environment ρ s (V, X, Y ) is a proposition characterizing the transition relation of the system AP is a set of atomic propositions L : Σ V 2 AP is a labeling function ϕ is an LTL formula characterizing the winning condition 15,y0,y1,y2,y0,y1,y2 V = {x, y}, X = {x}, Σ X = {x 0,x 1 }, Y = {y}, Σ Y = {y 0,y 1,y 2 }, x 0 = θ e,x 1 = θ e, (x 0,y 0 ) = θ s, (x i,y j ) = θ s, i, j = 0, ((x 0,y i ),x j ) = ρ e, i, j, ((x 1,y 0 ),x 0 ) = ρ e, ((x 1,y 0 ),x 1 ) = ρ e, ((x 1,y i ),x 0 ) = ρ e, i {1, 2}, ((x 1,y i ),x 1 ) = ρ e, i {1, 2}, ((x 0,y 0 ),x 0,y i ) = ρ s, i, ((x 0,y 0 ),x 1,y 0 ) = ρ s, ((x 0,y 0 ),x 1,y i ) = ρ s, i = 0,...
Autonomous Car Example Game Structure G =(V, X, Y, θ e, θ s, ρ e, ρ s, AP,L,ϕ) X (environment): obstacles, other cars, pedestrians Y (plant): vehicle state (drive VS stop, passing?, reversing?, etc) θ e describes the valid initial states of the environment, e.g., where obstacles can be θ s describes the valid initial states of the vehicle, e.g., the stop state ρ e describes how obstacles may move ρ s describes the valid transitions of the vehicle state ϕ describes the winning condition, e.g., vehicle does not get stuck 16
Plays Game structure G =(V, X, Y, θ e, θ s, ρ e, ρ s, AP,L,ϕ) infinite or the last state in the sequence has no valid successor A play of G is a maximal sequence of states σ = s 0 s 1... satisfying s 0 = θ e θ s and (s j,s j+1 ) = ρ e ρ s, j 0. Initially, the environment chooses an assignment s X Σ X such that s X = θ e and the system chooses an assignment s Y Σ Y such that (s X,s Y ) = θ e θ s. From a state s j, the environment chooses an input s X Σ X such that (s j,s X ) = ρ e and the system chooses an output s Y Σ Y such that (s, s X,s Y ) = ρ s. A play σ is winning for the system if either σ = s 0 s 1...s n is finite and (s n,s X ) = ρ e, s X Σ X, or σ is infinite and σ = ϕ. Otherwise σ is winning for the environment.,y1,y0,y2,y0 ϕ = (x = x 0 y = y 2 ) σ = (x 0,y 0 ), (x 0,y 2 ), (x 0,y 1 ) ω is winning for the system σ = (x 0,y 0 ) ω is winning for the environment,y1,y2 17
Strategies Game structure G =(V, X, Y, θ e, θ s, ρ e, ρ s, AP,L,ϕ) memory domain A strategy for the system is a function f : M Σ V Σ X M Σ Y such that for all s Σ V,s X Σ X,m M, iff(m, s, s X )=(m,s Y ) and (s, s X ) = ρ e,then (s, s X,s Y ) = ρ s. A play σ = s 0 s 1...is compliant with strategy f if f(m i,s i,s i+1 X )=(m i+1,s i+1 Y ), i. A strategy f is winning for the system from state s Σ V if all plays that start from s and are compliant with f are winning for the system. If such a winning strategy exists, we call s a winning state for the system.,y1,y0,y2,y0 G,y1,y0,y2,y0 f Is f winning for the system?,y1,y2,y1,y2 f(m, (x 0,y 0 ),x 0 ) = (m, y 2 ) f(m, (x 0,y 0 ),x 1 ) = (m, y 0 ) f(m, (x 0,y 1 ),x 0 ) = (m, y 0 ) f(m, (x 0,y 1 ),x 1 ) = (m, y 1 ) f(m, (x 0,y 2 ),x 0 ) = (m, y 1 ) f(m, (x 0,y 2 ),x 1 ) = (m, y 0 ) f(m, (x 1,y 0 ),x 0 ) = (m, y 2 ) f(m, (x 1,y 0 ),x 1 ) = (m, y 2 ) 18 f(m, (x 1,y 1 ),x 0 ) = (m, y 2 ) f(m, (x 1,y 1 ),x 1 ) = (m, y 2 ) f(m, (x 1,y 2 ),x 0 ) = (m, y 2 ) f(m, (x 1,y 2 ),x 1 ) = (m, y 0 )
Winning Games A game structure G =(V, X, Y, θ e, θ s, ρ e, ρ s, AP,L,ϕ) iswinning for the system if for each s X Σ X such that s X = θ e,thereexistss Y Σ Y such that (s X,s Y ) = θ s and (s X,s Y )isa winning state for the system G f,y0,y0,y1,y2,y0,y1,y2,y0,y1,y2,y1,y2 (x 0,y 0 ) is a winning state for the system x 0 = θ e but x 1 = θ e (x 0,y 0 ) = θ s G is winning for the system 19
Runner Blocker Example s3 Game Structure G =(V, X, Y, θ e, θ s, ρ e, ρ s, AP,L,ϕ) X := {x}, Σ X = {s 0,s 1,s 2,s 3,s 4 } Y := {y}, Σ Y = {s 0,s 1,s 3,s 4 } θ e := (x = s 2 ) s0 R B s2 s4 θ s := (y = s 0 ) ρ e := (x = s 2 ) = (x = s 2 ) (x = s 2 ) = (x = s 2 ) ρ s := (y = s 0 y = s 4 ) = (y = s 1 y = s 3 ) (y = s 1 y = s 3 ) = (y = s 0 y = s 4 ) (y = x ) s1 ϕ describes the winning condition, e.g., (y = s 4 ) 20
q3 Runner Blocker Example Play: An infinite sequence σ = s 0 s 1... of system (blocker + runner) states such that s0 is a valid initial state and (sj, sj+1) satisfies the transition relation of the blocker and the runner q0 R B q2 q4 Strategy: A function that gives the next runner state, given a finite number of previous system states of the current play, the current system state and the next blocker state Winning state: A state starting from which there exists a strategy for the runner to satisfy the winning condition for all the possible behaviors of the blocker q1 Winning game: For any valid initial blocker state sx, there exists a valid initial runner state sy such that (sx, sy) is a winning state Solving game: Identify the set of winning states 21
Solving Reachability Games Game structure G = (V, X, Y, θ e, θ s, ρ e, ρ s, AP, L, ϕ) For a proposition p, let [[p]] = {s Σ V s p} For a set R, let [[ R]] = s Σ V s X Σ X, (s, s X ) ρ e s Y Σ Y s.t. (s, s X,s Y) ρ s and (s X,s Y) R similar to the Pre operator we saw earlier Reachability game: ϕ = p The set of winning states can be computed efficiently by the iteration sequence R 0 = R i+1 = [[p]] [[ R i ]], i 0 R i+1 is the set of states starting from which the system can force the play to reach a state satisfying p within i steps There exists a natural number n such that R n = R n 1 Such R n is the minimal solution of the fix-point equation R = [[p]] [[ R]] In µ-calculus, the minimal solution of the above fix-point equation is denoted by µr(p R) least fixpoint 22
Runner Blocker Example: R1 R i+1 = [[p]] [[ R i ]], i 0 [[p]] 23
Runner Blocker Example: R2 R i+1 = [[p]] [[ R i ]], i 0 [[p]] 24
Runner Blocker Example: R3 = R4 =... R i+1 = [[p]] [[ R i ]], i 0 [[p]] 25
Solving Safety Games Game structure G = (V, X, Y, θ e, θ s, ρ e, ρ s, AP, L, ϕ) For a proposition p, let [[p]] = {s Σ V s p} For a set R, let [[ R]] = s Σ V s X Σ X, (s, s X ) ρ e s Y Σ Y s.t. (s, s X,s Y) ρ s and (s X,s Y) R Safety game: ϕ = p The set of winning states can be computed efficiently by the iteration sequence R 0 = Σ V R i+1 = [[p]] [[ R i ]], i 0 R i+1 is the set of states starting from which the system can force the play to stay in states satisfying p for i steps There exists a natural number n such that R n = R n 1 Such R n is the maximal solution of the fix-point equation R = [[p]] [[ R]] In µ-calculus, the minimal solution of the above fix-point equation is denoted by νr(p R) greatest fixpoint 26
Runner Blocker Example: R1 = R2 =... R i+1 = [[p]] [[ R i ]], i 0 27
Solving Games Game structure G =(V, X, Y, θ e, θ s, ρ e, ρ s, AP,L,ϕ) ϕ The set of winning states for the system p µx(p X) µr.(win Pre (R)) R 0 = WIN R i = R i 1 Pre (R i 1 ), i >0 p νx(p X) p µyνx(p X) Y p p U q νxµy (p X) Y µx(p X) q The disjunction and µy operators ensure that the system is in a state where it can force the play to reach a state satisfying p The conjunction and the νx operators ensure that the above statement is true at all time 28
Games and Realizability Game structure G =(V, X, Y, θ e, θ s, ρ e, ρ s, AP,L,ϕ) Consider a specification ψ = θ e ρ e ϕ e = θs ρ s ϕ s Fulfillment of the system safety depends on the liveness of the environment - The system may violate its safety if it ensures that the environment cannot fulfill its liveness If the system wins in G, then ψ is realizable (but not vice versa) - A winning strategy for G is also a winning strategy for ψ (but not vice versa) By adding extra output variables that represent the memory of whether the system or the environment violate their initial requirements or their safety requirements, we can construct a game G such that G is won by the system iff ψ is realizable,y0,y1,y0 G,y1 X = {x}, Σ X = {x 0,x 1 }, Y = {y}, Σ Y = {y 0,y 1 }, θ e, θ s true ρ e (x = x 1 ) ρ s ((x = x 1 ) = (y = y 1 )) ϕ e ((x = x 1 ) (y = y 1 )) ϕ s (y = y 0 ) ψ is realizable The system always picks y = y 0 ψ is not realizable The system does not win in G 29
General Reactivity(1) Games GR(1) game is a game G =(V, X, Y, θ e, θ s, ρ e, ρ s, AP,L,ϕ) with the winning condition ϕ =( p 1... p m ) = ( q 1... q n ) ϕ e The winning states in a GR(1) game can be computed using the fixpoint expression Z 1 µy m νx(q 1 Z 2 ) Y ( p i X) i=1 ν Z 2 µy m νx(q 2 Z 3 ) Y ( p i X) i=1 Z n µy m νx(q n Z 1 ) Y ( p i X) i=1 µy νx Y ( p i X) characterizes the set of states from which the system can force the play to stay indefinitely in p i states The two outer fixpoints make sure that the system wins from the set q j Z j 1 Y - The disjunction and µy operators ensure that the system is in a state where it can force the play to reach a q j Z j 1 state in a finite number of steps - The conjunction and νz j operators ensure that after visiting q j, we can loop and visit q j 1 30 ϕ s
Extracting GR(1) Strategies The intermediate values in the computation of the fixpoint can be used to compute a strategy, represented by a finite transition system, for a GR(1) game. This strategy does one of the followings Iterates over strategies f1,..., fn where fj ensures that the play reaches a qj state Eventually uses a fixed strategy ensuring that the play does not satisfy one of the liveness assumptions pj Complexity: A game structure G with a GR(1) winning condition can be solved by a symbolic algorithm in time proportional to nmσ V 3 31
Extensions The algorithm for solving GR(1) game can be applied to any game with the winning condition of the form ϕ = ϕ e = ϕ s ϕ e where and can be represented by a deteministic Buchi automaton. Add to the game additional variables and a transition relation which encodes the deterministic Buchi automaton Examples: ϕ s (p q) - Introduce a Boolean variable x - Initial condition: x = 1 - Transition relation for the environment: - Winning condition: x ρ e x = (q x p) 32
(p q) Converting LTL to GR(1) LTL spec Initially: x = 1; Specification: x; Transition relation: x = q (x p) LTL2BA x (q = q0) q0 p q A ϕs Initially: q = q0; Specification: (q = q0); Transition relation: (q = q0) ( p q) (q = q0) q q1 true true (q = q0) (p q) (q = q1) (q = q1) q (q = q0) (q = q1) q (q = q1) Determinize NBA (e.g., use LTL2DSTAR) Equivalent TS A ϕs A ϕs q p q q0 q1 p q q p q q0q1 p q 33 q q0 q1 p q p q q