Game Theory Extra Lecture 1 (BoB)
1 Game Theory 2014, Extra Lecture 1 (BoB)

- Differential games
- Tools from optimal control
- Dynamic programming
- The Hamilton-Jacobi-Bellman-Isaacs equation
- Zero-sum linear quadratic games and H∞ control

Reading: Başar/Olsder, pp. , , ,
2 To Understand

- Definition of difference/differential games
- Optimal control
- Dynamic programming and the HJBI equation
- Open-loop and feedback Nash equilibria

Copies from Başar/Olsder material.
3 Difference Games

State equation (u^i determined by player i):

    x_{k+1} = f(x_k, u^1, u^2, ..., u^N),  x_1 given

Observations:

    y^i_k = h^i_k(x_k),  i = 1, ..., N

Information structure: for each player, a subset η^i_k of

    {y^1_1, ..., y^1_k; ...; y^N_1, ..., y^N_k; u^1_1, ..., u^1_{k-1}; ...; u^N_1, ..., u^N_{k-1}}

Goal: player i wants to minimize some functional L^i of x and u, given his available information.
Solution concepts: as before, Nash, Stackelberg, etc.
4 Differential Games

    (d/dt) x(t) = f(x(t), u_1(t), u_2(t), ..., u_N(t)),  x(t_0) given

Goal: to minimize

    L_i = ∫_0^T g_i(x, u_1, ..., u_N) dt

with T either fixed, or given by an implicit equation T = min_t { t : l(x(t)) ≤ 0 }.
5 Example: Robust Control (H∞ Control)

Assume u_1, u_2 and y are related via

    ẋ = A x + B_1 u_1 + B_2 u_2
    z = C_1 x
    y = C_2 x + D u_2

Typical H∞ question: is ∫ (z^2(t) + u_1^2(t)) dt ≤ γ^2 ∫ u_2^2(t) dt?

A differential game between
- the controller: u_1(t) a function of y([0, t])
- the disturbance: u_2(t) a function of x([0, t]) and u_1([0, t])

Performance criterion:

    min_{u_1} max_{u_2} ∫_0^∞ ( z^2 + u_1^2 − γ^2 u_2^2 ) dt
6 Robust/H Control The interpretation is that u 1 is the control signal and u 2 is a worstcase disturbance signal. Introduce the L 2 (or energy) norm of the signal w w 2 = ( 0 ) 1/2 w(t) 2 dt The H norm of a linear system G(s) is defined by G = sup w 0 Gw w 6
7 Robustness

The H∞ norm measures the largest amplification of energy by the system. Minimizing this norm is clearly of interest when w is a disturbance signal. Another motivation is given by the so-called small-gain theorem.

Theorem. The closed loop of G and a perturbation K is stable for all perturbations with ‖K‖_∞ < γ if and only if ‖G‖_∞ ≤ 1/γ.
8 H∞ Control

    ‖G_K‖_∞ < γ  ⟺  ‖z‖_2^2 − γ^2 ‖w‖_2^2 < 0 for all w ≠ 0

so finding a controller achieving the norm bound amounts to

    min_u max_w ( ‖z‖_2^2 − γ^2 ‖w‖_2^2 ) < 0

This is exactly (the upper value of) an affine quadratic game. The relation between H∞ control and game theory was noted rather late (end of the 1980s). For details see Section 6.6.
9 Pursuit-Evasion Games
10 Tools from One-Person Optimization

- Dynamic programming
- The (maximum) minimum principle

Başar/Olsder, Ch.
11 Dynamic Programming, Discrete Time

    x_{k+1} = f_k(x_k, u_k),  u_k ∈ U_k

    L(u) = Σ_{k=1}^K g_k(x_{k+1}, u_k, x_k)

with f_k, g_k, U_k, K, x_1 given. We want u_k = γ_k(x_k) that minimizes L.

Idea: generalize the problem; calculate the value function

    V(k, x) = min_{γ_k, ..., γ_K} Σ_{i=k}^K g_i(x_{i+1}, u_i, x_i)

for all initial conditions x_k = x.
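The value function can be computed by a backward sweep, V(k, x) = min_{u_k} [g + V(k+1, ·)] with V(K+1, ·) = 0. A toy instance; the grid, dynamics and cost are our own illustration (with a stage cost depending only on (x_k, u_k) for simplicity), not from the book:

```python
# Backward dynamic programming for
#   V(k, x) = min_u [ g(x, u) + V(k+1, f(x, u)) ],   V(K+1, .) = 0.
# Toy instance (ours, not from the slides): integer states -3..3,
# controls {-1, 0, 1}, dynamics clamped to the grid, quadratic stage cost.

STATES = range(-3, 4)
CONTROLS = (-1, 0, 1)

def f(x, u):
    return max(-3, min(3, x + u))   # x_{k+1} = f(x_k, u_k), kept on the grid

def g(x, u):
    return x * x + u * u            # stage cost

def solve_dp(K):
    V = {x: 0.0 for x in STATES}    # terminal condition V(K+1, .) = 0
    policy = {}
    for k in range(K, 0, -1):       # sweep backward in time
        Vk, gamma = {}, {}
        for x in STATES:
            cost, u = min((g(x, u) + V[f(x, u)], u) for u in CONTROLS)
            Vk[x], gamma[x] = cost, u
        V, policy[k] = Vk, gamma
    return V, policy                # V = V(1, .), policy[k] = gamma_k

V1, policy = solve_dp(5)
```

The returned `policy[k]` is exactly the feedback law γ_k(x_k) the slide asks for: a table of the minimizing u for each state at each stage.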
12 Principle of Optimality

An optimal control sequence u*_1, ..., u*_K must be sequentially optimal. The only coupling between the optimization problems on the time horizons [1, k−1] and [k, K] is via the state x_k. This leads to

    V(k, x) = min_{u_k ∈ U_k} [ g_k(f_k(x, u_k), u_k, x) + V(k+1, f_k(x, u_k)) ]

with the final condition V(K, x) = min_{u_K} g_K(f_K(x, u_K), u_K, x).
13 Example

The affine quadratic problem

    x_{k+1} = A_k x_k + B_k u_k + c_k

    L(u) = (1/2) Σ_{k=1}^K ( x'_{k+1} Q x_{k+1} + u'_k R u_k )

has the solution

    V(k, x) = (1/2) x' S_k x + x' s_k + q_k
    u_k = −P_k S_{k+1} A_k x_k − P_k ( s_{k+1} + S_{k+1} c_k )

Formulas for P_k, S_k, s_k are given on p.
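In the purely linear-quadratic special case (c_k = 0, so the affine terms s_k and q_k vanish) the recursion for S_k is the familiar Riccati difference equation. A scalar sketch with numbers of our own choosing (A = B = Q = R = 1); the stationary value then solves S = 1 + S − S²/(1+S), i.e. S² − S − 1 = 0:

```python
# Scalar Riccati difference equation for the LQ special case of slide 13
# (c_k = 0):
#   S_k = Q + A S A - (A S B)^2 / (R + B S B)        (scalar version)
# Numbers are ours for illustration: A = B = Q = R = 1.

A = B = Q = R = 1.0

def riccati_step(S):
    return Q + A * S * A - (A * S * B) ** 2 / (R + B * S * B)

S = 0.0                    # terminal condition S_{K+1} = 0 (no terminal cost)
for _ in range(50):        # sweep backward over a long horizon
    S = riccati_step(S)

gain = B * S * A / (R + B * S * B)   # stationary feedback u_k = -gain * x_k
```

For these numbers S_k converges to the golden ratio (1 + √5)/2 and the stationary feedback gain to its reciprocal (√5 − 1)/2.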
14 Dynamic Programming, Continuous Time

The same reasoning leads to a PDE called the Hamilton-Jacobi-Bellman (HJB) equation.

    ẋ(t) = f(x(t), u(t)),  u(t) = γ(x(t))

    L(u) = ∫_0^T g(x(t), u(t)) dt + q(T, x(T))

The final time T can be either fixed and known, or given implicitly by

    T = min { t ≥ t_0 : l(x(t)) = 0 }
15 HJB Equation

    V(t, x) = min_{u(s), s ∈ [t, T]} [ ∫_t^T g(x(s), u(s)) ds + q(T, x(T)) ]

    V(T, x) = q(T, x) along l(x) = 0

The principle of optimality shows that if V is C^1, then

    −∂V(t, x)/∂t = min_{u ∈ U} [ (∂V(t, x)/∂x) f(x, u) + g(x, u) ]

(+ the final condition when t = T).
16 Theorem of Sufficiency

Theorem 5.3, p. 237. If a C^1 function V can be found that satisfies the HJB equation and the boundary conditions above, then it generates the optimal strategy u through the pointwise optimization problem defined by the right-hand side.

Proof: see p. 237.

Example: affine-quadratic problems have a quadratic value function

    V(t, x) = (1/2) x' S(t) x + k'(t) x + m(t),

see p.
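The sufficiency theorem is easy to check on a scalar example of our own (not from the book): ẋ = u, g = (x² + u²)/2, q = 0. The quadratic ansatz V(t, x) = S(t)x²/2 turns the HJB equation into a scalar Riccati ODE with the closed-form solution tanh, so a backward numerical integration can be verified exactly:

```python
import math

# Scalar instance of the sufficiency theorem (example ours): xdot = u,
# g(x, u) = (x^2 + u^2)/2, terminal cost q = 0. With V = S(t) x^2 / 2,
#   -dV/dt = min_u [ (dV/dx) u + (x^2 + u^2)/2 ]
# gives u* = -S x and the scalar Riccati ODE -dS/dt = 1 - S^2, S(T) = 0.
# In backward time s = T - t this is dS/ds = 1 - S^2, solved by S = tanh(s).

h, T = 1e-4, 2.0
S = 0.0                           # S at s = 0, i.e. at the final time t = T
for _ in range(int(T / h)):       # explicit Euler, backward in time
    S += h * (1.0 - S * S)

S_exact = math.tanh(T)            # closed-form solution for comparison
# Pointwise minimization in the HJB right-hand side: u*(0, x) = -S x.
```
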
17 The Minimum Principle

Introduce the costate vector p*(t)' = ∂V(t, x*(t))/∂x, where x* denotes the optimal trajectory corresponding to u*, i.e. ẋ*(t) = f(x*(t), u*(t)). Also define the Hamiltonian

    H(t, p, x, u) = g(x, u) + p'(t) f(x, u)

Theorem 5.4.

    ṗ*(t)' = −(∂/∂x) H(t, p*, x*, u*)
    p*(T)' = (∂/∂x) q(T, x*(T)),  along l(t, x) = 0 when T is implicit
    u* = arg min_{u ∈ U} H(t, p*, x*, u)

See also the discrete-time counterpart.
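A scalar example of our own (not from the book): for ẋ = u, g = (x² + u²)/2, q = 0, the minimum principle gives u* = −p*, so the state and costate obey the Hamiltonian system ẋ = −p, ṗ = −x with x(0) = x₀ given and p(T) = 0. This two-point boundary value problem can be solved by a simple shooting method:

```python
import math

# Minimum principle for xdot = u, g = (x^2 + u^2)/2, q = 0 (example ours):
# H = (x^2 + u^2)/2 + p u gives u* = -p, xdot = -p, pdot = -dH/dx = -x,
# with x(0) = x0 and the transversality condition p(T) = dq/dx = 0.
# Shooting: pick p(0), integrate forward, bisect on the terminal miss p(T).

def shoot(p0, x0=1.0, T=2.0, h=1e-3):
    x, p = x0, p0
    for _ in range(int(T / h)):      # explicit Euler on the Hamiltonian system
        x, p = x - h * p, p - h * x
    return p                         # terminal costate, should be driven to 0

lo, hi = 0.0, 1.0                    # shoot(0) < 0 < shoot(1) for this instance
for _ in range(60):                  # bisection on the initial costate p(0)
    mid = 0.5 * (lo + hi)
    if shoot(mid) > 0.0:
        hi = mid
    else:
        lo = mid
p0 = 0.5 * (lo + hi)

# The exact initial costate is p(0) = tanh(T) x0, which agrees with the
# value-function slope dV/dx = S(0) x0 from the HJB approach.
p0_exact = math.tanh(2.0) * 1.0
```
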
18 N-Person Difference Games

How does this translate to difference games with N > 1 players?

    x_{k+1} = f_k(x_k, u^1_k, ..., u^N_k)

    J^i(u^1, ..., u^N) = Σ_{k=1}^K g^i_k(x_{k+1}, u^1_k, ..., u^N_k, x_k)

The control laws u^i = γ^i constitute a Nash equilibrium if

    J^i(γ*) ≤ J^i({γ^i, γ^{−i*}})  for each i and every admissible γ^i

Open-loop information: u^i_k is an open-loop function of k.
Closed-loop information: u^i_k(x_k) is allowed to be a function of x_k. Hence changing u^i will result in changes in u^j.
19 Open-Loop Dynamic Games

Idea: player i solves a one-player problem. His choice of u^i will not influence the other u^j, so these can be treated as given functions of time. The results for N = 1 then apply directly.
20 Open-Loop Theorems

Theorem 6.1. If u* = γ* provides an open-loop Nash equilibrium, then there exists a sequence of costate vectors p^i such that

    x*_{k+1} = f_k(x*_k, u*_k)

    γ^{i*}_k = arg min_{u^i_k} H^i_k(p^i_{k+1}, {u^i_k, γ^{−i*}_k}, x*_k)

    p^i_k = (∂f_k/∂x_k)' [ p^i_{k+1} + (∂g^i_k/∂x_{k+1})' ] + (∂g^i_k/∂x_k)'

where H^i = g^i + p^i' f. For details see p.

The affine case results in coupled Riccati equations. The special zero-sum case for N = 2 is given in Theorems 6.3 and 6.4; only one Riccati equation must be solved. The condition for existence of a Nash equilibrium has the form

    I − B^2_k' S_{k+1} B^2_k > 0.
21 Feedback Solutions

Theorem 6.6. The set of strategies γ* provides a feedback Nash equilibrium if and only if there exist functions V^i(k, x) such that

    V^i(k, x) = min_{u^i_k} [ g^i_k(f̃^i_k(x, u^i_k), u^i_k, x) + V^i(k+1, f̃^i_k(x, u^i_k)) ]
              = g^i_k(f̃^i_k(x, γ^{i*}_k), γ^{i*}_k, x) + V^i(k+1, f̃^i_k(x, γ^{i*}_k))

where f̃^i_k(x, u^i_k) = f_k(x, {γ^{−i*}(x), u^i_k}).
22 Feedback Solutions, Zero-Sum Case N = 2

See Corollary 6.2, p. 282. The pair of strategies (γ^1*, γ^2*) provides a feedback saddle-point solution if and only if there exist functions V(k, x) such that, for all k,

    V(k, x) = min_{u^1_k} max_{u^2_k} [ g_k(f_k(x, u^1_k, u^2_k), u^1_k, u^2_k, x) + V(k+1, f_k(x, u^1_k, u^2_k)) ]
            = max_{u^2_k} min_{u^1_k} [ g_k(f_k(x, u^1_k, u^2_k), u^1_k, u^2_k, x) + V(k+1, f_k(x, u^1_k, u^2_k)) ]
            = g_k(f_k(x, γ^1*_k, γ^2*_k), γ^1*_k(x), γ^2*_k(x), x) + V(k+1, f_k(x, γ^1*_k(x), γ^2*_k(x)))
23 Differential Games: Open-Loop Nash Equilibria

We now want to present the counterparts for continuous time. Existence of a smooth value function V is more problematic.

    ẋ(t) = f(x(t), u^1(t), ..., u^N(t))

    L^i(u^1, ..., u^N) = ∫_0^T g^i(x(t), u^1(t), ..., u^N(t)) dt + q^i(x(T))
24 Open-Loop Nash Equilibria

Theorem 6.11 (under some smoothness assumptions). If u^{i*} = γ^{i*}(t, x_0) provides an open-loop Nash equilibrium (with corresponding trajectory x*), then there exist N costate functions p^i(t) such that

    ẋ*(t) = f(x*(t), u^1*(t), ..., u^N*(t))

    γ^{i*} = arg min_{u^i ∈ U^i} H^i(p^i(t), x*(t), {u^i, u^{−i*}})

    ṗ^i(t)' = −(∂/∂x) H^i(p^i(t), x*, u*(t))

    p^i(T)' = (∂/∂x) q^i(x*(T))

where H^i(p^i, x, u) = g^i(x, u) + p^i' f(x, u).
25 The Affine-Quadratic Case

    f(x, u) = A(t) x + Σ_i B^i(t) u^i + c(t)

    g^i(x, u) = (1/2) ( x' Q^i(t) x + Σ_j u^j' R^{ij} u^j )

    q^i(x(T)) = (1/2) x'(T) Q^i_f x(T)

with Q^i ≥ 0, Q^i_f ≥ 0, R^{ii} > 0.
26 The Affine-Quadratic Case

Theorem 6.12 + p. 316. There is an open-loop Nash equilibrium solution to the affine-quadratic game for any T ∈ [0, T_f] if and only if the following coupled matrix Riccati differential equations have solutions for any T ∈ [0, T_f]:

    Ṁ^i + M^i A + A' M^i + Q^i − M^i Σ_j B^j R^{jj}(t)^{−1} B^j' M^j = 0,  M^i(T) = Q^i_f

The NE is given by

    u^{i*} = −R^{ii}(t)^{−1} B^i(t)' [ M^i x*(t) + m^i ]

where the feedforward signal m^i is given by (6.51).
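The coupled equations are ordinary differential equations and can be integrated backward from M^i(T) = Q^i_f. A scalar two-player sketch with numbers of our own (A = 0, B^i = R^{ii} = Q^i = 1, Q^i_f = 0); for identical players symmetry makes the two M^i coincide, and over a long horizon they approach the stationary value 1/√2:

```python
# Explicit Euler integration of the coupled Riccati ODEs of Theorem 6.12,
# scalar two-player case (numbers ours for illustration). In backward time
# s = T - t:
#   dM_i/ds = Q_i - M_i * (B1^2/R11 * M1 + B2^2/R22 * M2)
# with A = 0, B_i = 1, R_ii = 1, terminal condition M_i(T) = 0.

h, T = 1e-3, 10.0
Q1, Q2 = 1.0, 1.0                 # identical players -> symmetric equilibrium
M1, M2 = 0.0, 0.0                 # M_i at s = 0, i.e. at the final time t = T
for _ in range(int(T / h)):
    coupling = M1 + M2            # sum_j B_j R_jj^-1 B_j M_j
    M1, M2 = (M1 + h * (Q1 - M1 * coupling),
              M2 + h * (Q2 - M2 * coupling))
```

In the symmetric case the stationary equation collapses to 1 − 2M² = 0, so both gains tend to 1/√2 ≈ 0.707.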
27 N = 2, Zero-Sum Case

P1 minimizes, P2 maximizes:

    L(u^1, u^2) = (1/2) ∫_0^T ( x' Q x + u^1' u^1 − u^2' u^2 ) dt + (1/2) x'(T) Q_f x(T)

with Q ≥ 0, Q_f ≥ 0. Assume there exists a unique bounded symmetric solution S(·) to the matrix equation

    Ṡ + A' S + S A + Q + S B^2 B^2' S = 0,  S(T) = Q_f

on the interval [0, T].
28 Then there also exists a solution to

    Ṁ + A' M + M A + Q − M ( B^1 B^1' − B^2 B^2' ) M = 0,  M(T) = Q_f

and the game admits a unique open-loop saddle point given by

    γ^1*(t, x_0) = −B^1(t)' [ M(t) x*(t) + m(t) ]
    γ^2*(t, x_0) = +B^2(t)' [ M(t) x*(t) + m(t) ]
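The boundedness assumption on S is not innocent: the S-equation has a positive quadratic term, so its solution can escape to infinity in finite backward time (a conjugate point), and the game then fails to have a bounded upper value on longer horizons. A scalar sketch with numbers of our own (A = 0, B² = 1, Q = 1, Q_f = 0), where the backward equation is dS/ds = 1 + S² with exact solution tan(s), blowing up at s = π/2:

```python
import math

# Scalar illustration (numbers ours) of how the assumption on slide 27 can
# fail: A = 0, B2 = 1, Q = 1 turn
#   Sdot + A'S + SA + Q + S B2 B2' S = 0,  S(T) = Qf = 0
# into dS/ds = 1 + S^2 in backward time s = T - t, i.e. S(s) = tan(s):
# a bounded solution exists only on horizons shorter than pi/2.

h = 1e-5
S, s = 0.0, 0.0
while s < 1.0:                    # integrate up to backward time s = 1
    S += h * (1.0 + S * S)        # explicit Euler
    s += h
S_at_1 = S                        # should be close to tan(1)

while S < 100.0:                  # keep going until the blow-up is obvious
    S += h * (1.0 + S * S)
    s += h
escape_estimate = s               # sits just below pi/2
```
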
29 Closed-Loop Feedback Nash Equilibria

We give the result only for the zero-sum situation with N = 2.

Corollary 6.6, p. 326. A pair of strategies {γ^{i*}} provides a feedback saddle-point solution if there exists a function V satisfying the PDE

    −∂V(t, x)/∂t = min_{u^1} max_{u^2} [ (∂V(t, x)/∂x) f(x, u^1, u^2) + g(x, u^1, u^2) ]
                 = max_{u^2} min_{u^1} [ (∂V(t, x)/∂x) f(x, u^1, u^2) + g(x, u^1, u^2) ]
                 = (∂V(t, x)/∂x) f(x, γ^1*, γ^2*) + g(x, γ^1*, γ^2*)

The value of the game is V(0, x_0). This is the famous Hamilton-Jacobi-Bellman-Isaacs equation, obtained by Isaacs in the 1950s.
30 N = 2, Zero-Sum Affine Case

    ẋ = A x + B^1 u^1 + B^2 u^2 + c

    L = (1/2) ∫_0^T ( x' Q x + u^1' u^1 − u^2' u^2 ) dt

Theorem 6.17. If there exists a unique symmetric bounded solution to the Riccati equation

    Ż + A' Z + Z A + Q − Z ( B^1 B^1' − B^2 B^2' ) Z = 0,  Z(T) = Q_f

then the two-player zero-sum game admits a unique feedback saddle point given by (for details see p. 327)

    γ^{i*}(t, x) = (−1)^i B^i(t)' [ Z(t) x(t) + ζ(t) ]
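The feedback Riccati equation of Theorem 6.17 can likewise be integrated backward. A scalar sketch with numbers of our own; note that the disturbance contributes the term +Z B²B²' Z, which makes the game value coefficient larger than the corresponding one-player LQR coefficient:

```python
# Scalar illustration (numbers ours) of Theorem 6.17: integrate
#   Zdot + A'Z + ZA + Q - Z (B1 B1' - B2 B2') Z = 0,  Z(T) = Qf,
# backward with A = 0, B1 = 1, B2 = 0.5, Q = 1, Qf = 0.
# In backward time s = T - t: dZ/ds = Q - (B1^2 - B2^2) Z^2,
# with stationary value sqrt(Q / (B1^2 - B2^2)) = sqrt(4/3).

B1, B2, Q = 1.0, 0.5, 1.0
h, T = 1e-3, 20.0
Z = 0.0                                  # Z(T) = Qf = 0
for _ in range(int(T / h)):
    Z += h * (Q - (B1 ** 2 - B2 ** 2) * Z * Z)

# Saddle-point feedback gains, gamma_i = (-1)^i B_i' [Z x + zeta]:
k1 = -B1 * Z        # minimizer: u1 = -B1' Z x
k2 = B2 * Z         # maximizer: u2 = +B2' Z x

# For comparison, the one-player LQR coefficient (B2 = 0) solves
# dS/ds = Q - B1^2 S^2 and tends to 1; the game value Z exceeds it
# because the worst-case disturbance raises the cost-to-go.
```
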
More informationComputing Minmax; Dominance
Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination
More informationThe first order quasi-linear PDEs
Chapter 2 The first order quasi-linear PDEs The first order quasi-linear PDEs have the following general form: F (x, u, Du) = 0, (2.1) where x = (x 1, x 2,, x 3 ) R n, u = u(x), Du is the gradient of u.
More informationAnswers to exercises in Linear Quadratic Dynamic Optimization and Differential Games.
Answers to exercises in Linear Quadratic Dynamic Optimization and Differential Games. Jacob Engwerda Tilburg University Dept. of Econometrics and O.R. P.O. Box:95, 5 LE Tilburg, The Netherlands e-mail:engwerda@uvt.nl
More informationOptimal Control and Estimation MAE 546, Princeton University Robert Stengel, Preliminaries!
Optimal Control and Estimation MAE 546, Princeton University Robert Stengel, 2018 Copyright 2018 by Robert Stengel. All rights reserved. For educational use only. http://www.princeton.edu/~stengel/mae546.html
More informationRegion of attraction approximations for polynomial dynamical systems
Region of attraction approximations for polynomial dynamical systems Milan Korda EPFL Lausanne Didier Henrion LAAS-CNRS Toulouse & CTU Prague Colin N. Jones EPFL Lausanne Region of Attraction (ROA) ẋ(t)
More informationValue Function and Optimal Trajectories for some State Constrained Control Problems
Value Function and Optimal Trajectories for some State Constrained Control Problems Hasnaa Zidani ENSTA ParisTech, Univ. of Paris-saclay "Control of State Constrained Dynamical Systems" Università di Padova,
More informationMEAN FIELD GAMES WITH MAJOR AND MINOR PLAYERS
MEAN FIELD GAMES WITH MAJOR AND MINOR PLAYERS René Carmona Department of Operations Research & Financial Engineering PACM Princeton University CEMRACS - Luminy, July 17, 217 MFG WITH MAJOR AND MINOR PLAYERS
More informationGame of Defending a Target: A Linear Quadratic Differential Game Approach
Game of Defending a Target: A Linear Quadratic Differential Game Approach Dongxu Li Jose B. Cruz, Jr. Department of Electrical and Computer Engineering, The Ohio State University, 5 Dreese Lab, 5 Neil
More informationComputational Issues in Nonlinear Control and Estimation. Arthur J Krener Naval Postgraduate School Monterey, CA 93943
Computational Issues in Nonlinear Control and Estimation Arthur J Krener Naval Postgraduate School Monterey, CA 93943 Modern Control Theory Modern Control Theory dates back to first International Federation
More informationIterative methods to compute center and center-stable manifolds with application to the optimal output regulation problem
Iterative methods to compute center and center-stable manifolds with application to the optimal output regulation problem Noboru Sakamoto, Branislav Rehak N.S.: Nagoya University, Department of Aerospace
More informationA Note on Open Loop Nash Equilibrium in Linear-State Differential Games
Applied Mathematical Sciences, vol. 8, 2014, no. 145, 7239-7248 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.49746 A Note on Open Loop Nash Equilibrium in Linear-State Differential
More informationStabilization and Passivity-Based Control
DISC Systems and Control Theory of Nonlinear Systems, 2010 1 Stabilization and Passivity-Based Control Lecture 8 Nonlinear Dynamical Control Systems, Chapter 10, plus handout from R. Sepulchre, Constructive
More informationRobotics. Control Theory. Marc Toussaint U Stuttgart
Robotics Control Theory Topics in control theory, optimal control, HJB equation, infinite horizon case, Linear-Quadratic optimal control, Riccati equations (differential, algebraic, discrete-time), controllability,
More informationOn the Bellman equation for control problems with exit times and unbounded cost functionals 1
On the Bellman equation for control problems with exit times and unbounded cost functionals 1 Michael Malisoff Department of Mathematics, Hill Center-Busch Campus Rutgers University, 11 Frelinghuysen Road
More information