Leader-Follower strategies for a Multi-Plant differential game

Size: px

Start display at page:

Download "Leader-Follower strategies for a Multi-Plant differential game"

Pierce Kelley
5 years ago
Views:

1 008 American Control Conference Westin Seattle Hotel, Seattle, Washington, USA June -3, 008 FrA066 Leader-Follower strategies for a Multi-Plant differential game Manuel Jiménez-Lizárraga, Alex Poznyak and MA Alcorta Abstract In this paper the formulation of a concept for a type of robust Leader-Follower equilibrium for a Multi- Plant or multiple scenarios differential game is developed The game dynamic is given by a family of N different possible differential equations Multi-Model representation with no information about the trajectory which is realized The robust Leader-Follower strategy for each player must confront with all possible scenarios simultaneously The problem of each player is the designing of min-max strategies for each player which guarantee an equilibrium for the worst case scenario Based on the Robust Maximum Principle, the conditions for a game to be in Robust Leader-Follower Equilibrium are presented As in the Nash equilibrium case the initial min-max differential game may be converted into a standard static game given in a multidimensional simplex A numerical procedure for resolving the case of linear quadratic differential game is presented I INTRODUCTION Beginning from the seminal works of [], [], [3] the leader-follower or Stackelberg solution for an open loop information structure in a two person differential games has been well established see also [4], [5] This concept of a solution for a game, introduced in 934 by the economist H von Stackelberg [6], is suitable when one of the players is forced to subordinate to an authority that must announces his decision first, before play his own strategy, or when one of the players chooses to play the game passively, that is, composes just his optimal reaction solve an oneplayer optimization problem given that the other player has announced his strategy As it is shown in the above publications this kind of problems are still tractable using results of optimal control theory All of them tackle this problem when the model of the considered dynamics is exactly known In the case of one player optimization problems the robust version of the traditional Maximum Principle referred to as Robust Maximum Principle RMP see [7], [8] and [9] allows to design an "optimal policy" of a min-max type for a multi-plant or multi-scenario problem, where each possible scenario is seen as possible parametric realization of the dynamic equation So, it appears that we have N possible linear state dynamic equations, each of them describing a model and there is no a priory information which will be the active one The RMP is based on the concept of min-max control problem where the operation of the maximization is taken over a set of uncertainty a parameter from a finite M Jiménez-Lizárraga and MA Alcorta are with the Faculty of Physical and Mathematical Sciences Autonomous University of Nuevo León, San Nicolas de los Garza N L, México majimenez,aalcorta@fcfmuanlmx A Poznyak is with the Automatic Control Department CINVESTAV-IPN México DF AP apoznyak@ctrlcinvestavmx set and the operation of the minimization is taken over set of admissible control strategies For a game with multiparticipants, choosing Nash strategies, some similar concept of Multi-scenarios has been used to exemplify the discrepancies of the players in information sets, models, cost functions or even different amount of information that the players could hold of a large scale system, [0], [], [], and recently, applying the Robust Maximum Principle there was derived a type of Robust Nash equilibrium [3] for a multi-scenario game parametrized by a parameter belonging to a given finite parametric set and the problem was formulated as a min-max problem of the game The purpose of this paper is to develop the RMP for a family of two person Multi-Plant game when one player is considered as the leader and the second one as the follower We formulate the problem for both leader and follower as min-max problem, that to the best of our knowledge has never been considered before, but presents a great interest because of its high spread applications The focus is, as in [3], the designing of strategies that should provide a "Robust leader-follower equilibrium" being applied to all scenarios or models of the game simultaneously It is shown that the resulting robust strategies for each player again appears as a mixture with the weights fulfilling the leader-follower condition of the controls which are the leader-follower for each fixed parameter value This technique permits to transform the formulation of the game with a leader from a Banach space where it was initially formulated into a finite dimensional space where the strategies to be found are the preference weights of each scenario in the weighted sum of the individual optimal controls The problem of finding such equilibrium weights is solve by the implementation of a special numerical procedure which extends the method given in [3] A numerical example illustrate the effectiveness of this approach II STANDARD LEADER-FOLLOWER EQUILIBRIUM We begin with a brief ly review of the concept for leaderfollower equilibrium The basic idea of a leader-follower strategy for a static two-person game seems to be very simple Consider two players Player and Player The cost function associated with the players are J u,u for Player J u,u for Player Both players want to minimize their criteria that naturally provokes a conf lict situation Defining the setsu andu for the admissible strategies for the player and player /08/$ AACC 3839

2 The resolution of this problem is given by the following equilibrium concept Choosing the player as the leader and player as the follower The set of strategies are said to be in a Leader-Follower equilibria with Player as the leader and Player as the follower if: and J u,u u J u,u o u J u,u o u = min u U J u,u andu o =u o u is the optimal policy of the player for a given strategy of the leader, and as is usual u u = u o u This means that being player as the leader, he must advance his strategy to play first and because the player want to minimize his functionalj thenu o is the "optimal reaction" OR of player for the minimization of J givenu If Player chooses any other strategyu, then Player will choose a strategy u o that minimizesj, but the resulting cost for Player may be greater than or equal to that when the Leader-Follower strategy with Player as the leader is used III MULTI-MODEL TWO PLAYERS GAME Consider the following two players multi-model differential game: ẋ =f x,u,u,t 3 wherex R n is the state vector of the game at timet [t 0,T], u j R m j j =, are the control strategies of each player at time t and is the entire index from a finite set A :={,,,M} describing each possible -model of the dynamics game 3,M is the number of possible model Let the individual aim performance h i, of each player i=, for each-model scenario be given by h i, :=h i 0x T+ T g i x,u,u,t dt i=, 4 The worst-case with respect to a possible scenario cost functionalf i for each player under fixed admissible strategiesu U andu U is defined by F i u,u :=max hi, u,u A Robust leader-follower Equilibrium The set of strategies are said to be in a Robust leaderfollower equilibria with Player as the leader and Player as the follower if: for any admissible strategyu,u U U the next inequalities hold F u,u F u,u o u, u :=u o u where: 5 F u,u o u = min u U F u,u 6 Note here that the main difference with the Standard leader-follower equilibrium is the max operation taken over all possible scenarios B Open-Loop Robust leader-follower Strategies for Multi- Model Differential Games Proceeding first outlining the robust optimal reaction of the follower, let us represent the robust optimal control problem for the follower following the standard procedure in Optimal Control Theory For each possible scenario introduce the extended variables for the follower x,f = x,,x n,x,f n+ defined inr n+ and the last component given by x,f n+ T x,f n+ = or, in differential form, ẋ,f n+ =g g x,u,u,τ dτ x,u,u,t, x,f n+ t 0=0 7 Now the initial individual aim performance for the follower 4 can be represented in the Mayer form without an integral term: h, =h, 0 xt+x,f n+ T 8 Notice thath, 0 x does not depend on the last coordenate x,f n+, that is, x,f n+ extended conjugate vector-variable by satisfying ψ j =- n k= h, 0 x = 0 Define also the new ψ :=ψ,,ψ n R n ψ := ψ,,ψ n,ψ n+ R n+ f k x,u,u,t ψ k - g x,u,u,t x j with some terminal condition j=,,n+ x n+ ψ n+ ψ j T=b j 0 t T, j=,,n+ 0 where the vectorb will be defined below For the "superextended" vectors for the follower defined by x,f := x,,x n+;;x M,,x M n+ ψ := ψ,,ψ n+;;ψ M,,ψ M n+ f,f := f,,f n,g ;;f M,,fn M,g M f,f =f,,f n,g Rn+ these vectors represent the complete family of trajectories including then+ state variables for the follower Using the previous vectors, following [8], define the next "generalized" Hamiltonian function for the follower: H ψ, x,f,u,u,t x := ψ, f,f,f,u,u,t = ψ, x,f,u,u,t = [ H ψ j f j x,u,u,t +ψ n+g x,f,u,u,t ]

3 the direct 3, 7 and conjugate 9 ODE equations may be represented shortly in the standard Hamiltonian form as d H ψ, x,f,u,u,t dt x,f = ψ ψ, x,f,u,u,t H d dt ψ = x,f Again, as it follows from the definition 6, and as it is shown in [], [], the optimal reaction u o of the follower to and fixed leader strategyu of the leader, is solved as an standard optimal control problem, for the presented Multi- Model problem the solution is given as in [9], the follower should solve the following robust optimal control problem: max h, u,u u min u Uadm Uadm i :={ u i u,u } 3 o u that provides the robust Stackelberg strategy for the follower Following the result of [9] the necessary condition for a robust optimality of a strategy for the follower must fulfill the next condition: the maximality condition The control strategies for the followeru o t Uadm t [0,T] satisfies ψ, x,f,u,u,t ψ o, x,f,u,u,t H or, equivalently, u o Arg max H u Uadm H ψ, x,f,u,u o,t 4 the complementary slackness condition For every next conditions hold µ h, F =0 5 the transversality condition Follower For every and everyi=,,n b +µ gradh, x T=0 ψ T=b, ψ n+t+µ =0 6 To derive the robust optimal condition for the leader, using the same Mayer representation as before, the functional for the leader is represented as: where h, =h, 0 xt+x,l n+ T T x,l n+ = g x,u,u,τ dτ Now consider the next hamiltonian representation for the leader: H ψ, λ, λ,λ 3, x,l,u,u,t := λ, f,l x,l,u,u,t - λ H x +λ H,f 3 u = H ψ, λ, λ,λ 3, x,l,u,u,t = [ λ,j fj x,u,u,t + λ,n+g x,f,u,u,t - λ H +λ x,f 3 H u where the extended vector are defined as: x,l := x,,x,l n+ ;;xm,,x M,l n+ λ := λ,,,λ,n+ ;;λm,,,λm,n+ λ := λ,,,λ,n+ ;;λm,,,λm,n+ f,l := f,,f n,g ;;f M,,fn M,g M f,l =f,,f n,g R n+ with the adjoint variables satisfying conditions that must be satisfy by the leader d H dt λ ψ, λ, λ,λ 3, x,l,u,u,t = x,l H ψ, λ, λ,λ 3, x,l,u,u,t d dt λ = ψ λ t 0=0 The application of the result in [9], yields to the next conditions that must be satisfy by the leader the maximality condition The control strategies for the leaderu t Uadm t [t 0,T], satisfies: H ψ, λ, λ,λ 3, x,l,u,u,t H ψ, λ, λ,λ 3, x,l,u,u,t equivalently, u Arg max H u Uadm ψ, λ, λ,λ 3, x,l,u,u,t 7 the complementary slackness condition For every next conditions hold µ h, F =0 8 the transversality condition Follower For every and everyi=,,n c +µ [ gradh, x] T x h, x Tλ T =0; λ T =c λ n+t+µ =0 9 additionally the leader must satisfy the next relation: H ψ, λ, λ,λ 3, x,l,u,u,t u =0 0 C Open-Loop Robust Leader-Follower Strategies in LQ Differential Games Consider the next two-player linear quadratic multimodel game: ẋ t=a tx t+ B,j tu j t j= x t R n, x t 0 =x 0, u j t R mj, j=,; t [t 0,T], T < ; ={,,,M} 384

4 Suppose that the individual aim performance h i, of each i-player i=, for each-model scenario is given by h i, u [,u = x TQ i f x T+ T x Q i x + u j R ij u ]dt j j= As we mentioned before the robust optimal control problem for each player is formulated as 3 In what follows we illustrate how the construction given above is applied for the case of two players leader follower LQ multi-model differential games The Hamiltonians for the leader and follower LQ are: [ x Q x + H = ψ n+ j= u j R j u j + ψ A x +B, u +B, u H = [ λ,n+ x Q x + u j R j u j + j= λ A x +B, u +B, u -λ Q x +A, ψ + R u +ψ B, λ 3 3 The conditions 4-0 for the LQ multimodel plant are as follows: ψ =- x H =-A ψ -ψ n+ Q x ; ψ n+ t =0 [ ψ T =-µ grad x TQ f x T +x,f n+ ]= T µ Q x T; ψ n+ T =-µ λ = - x H = -A λ λ,n+q x +Q λ λ,n+t=0 [ ] λ T=µ Q f x T Q f λ T λ,n+t=-µ λ = - ψ H = -A λ +B, λ 3; λ t 0 =0 4 the robust Stackelberg strategies for the leader satisfies: R u t= µ B, λ 5 and for the follower R u t= µ B, ψ 6 and H u = [ - µ R u t+λ B, +R λ ] 3 = [ -µ µ R R B, ψ +λ B, +R λ ] 3 =0 7 finally λ 3 = R [ µ µ R R ] B, ψ B, λ 8 Since at least one index A is active we have: µ i >0Introducing the normalized adjoint variables with as { ψ ψ n t µ nt= if µ >0 0 if µ =0 { λ k,n t= λ n t µ if µ >0 0 if µ =0 k=,,3 we get ψ t =-A ψ - ψ ψ n+t=0; n+ Q x with the corresponding transversality conditions given by ψ T= Q x T, ψ n+t =- λ t=-a λ λ,n+q x +Q λ λ T=Q f x T Q f λ T; λ,n+t=- λ = -A λ +B, λ 3 ; λ t 0=0 the robust Stackelberg strategies becomes: u t= µ R µ B, λ = R ν, B, λ and for the follower 9 u t= µ R µ B, ψ = R ν, B, ψ 30 where the vectors ν i := ν,,,ν,m i=, belongs to the simplex: { S i,m := ν i R M= A :ν,i =µ i µ i 0, ν,i = Finally 3 λ 3 = R [ R R ] B, ψ B, λ 3 As one can see from 9 and 30, the Robust Optimal Control for both leader an follower is a mixture of the control actions optimal for each independent index 384

5 D Extended form for the game Now consider the next representation for the game A 0 Q i 0 A:= ;Q i := 0 A M 0 Q i Q i f 0 ν,i I 0 Q i f :=,Γ i := 0 Q i f 0 ν M,i I B i := [ B i, Bi,M ], I R n n i=, 33 In the extended form we obtain the general dynamics given by bold stand for extended vectors and matrices: ẋ=ax+b u +B u, x t 0 = x 0,,x M 0 ψ=-a ψ+q x; ψt =-Q f xt; λ =-A λ Q x+q λ ; λ T =Q f xt -Q f λ T λ = -Aλ +B λ 3 ; λ t 0 =0 λ 3 = R [ R R ] B ψ B λ u = R B Γ λ u = R B Γ ψ where x := x,,,x, n ;;x M,,,xn M, ψ ψ, :=,, ψ, n ;; ψ M,,, ψ M, n λ, M, M, k := λ, k,,, λ k,n ;; λ k,,, λ,n k=,,3 R nm R nm R nm Theorem: If for the two person linear quadratic differential game 34 with the following restrictions to the matrices: R ii > 0, R 0, Q i 0 and Q i f 0 i =, there exists a solution set of the following parametrized coupled differential equation [4]: Ṗ ν Λ,ν A B R B Λ,ν B R B P ν + P ν +A P ν +Γ Q =0; P ν T=Γ Q f ; P ν =P ν R nm nm A B R B Λ,ν B R B P ν + Λ,ν +A Λ,ν +Γ Q Γ Q Λ,ν =0; Λ,ν T=Γ Q f Q f Λ,ν T A B R B Λ,ν B R B P ν Λ,ν +Λ,ν +AΛ,ν +Γ B R R R B P ν Γ B R B Λ,ν =0; Γ Λ,ν t 0 =0 36 then the open-loop robust leader-follower equilibrium strategies with Player acting as the leader are: u = R B Λ,ν x u = R B P ν x 37 define the open-loop robust leader-follower equilibrium solution if the matrixγ i i=, in 33 contains the vectors ν i which satisfy the leader-follower equilibrium condition: J ν,ν ν J ν,ν o ν ν o ν :=arg min ν S N J ν,ν for anyν i S i,m 38 where J i ν,ν :=max hi, u,u i=, 39 with given by 37 parametrized byν andν ν i S i,m through 36 Proof: Let us try to represent Γ ψ as Γ ψ= P ν x, Γ λ = Λ,ν x, and Γ λ = Λ,ν x By 36 and by the commutation of the operators Γ i A =A Γ i andγ i Q i =Q i Γ i the result follows IV NUMERICAL PROCEDURE For the problem of finding the leader-follower equilibrium weights ν,ν ν we propose the use of the next minimizing numerical procedure To make this analytically is not a simple task Assuming that J i ν,ν ν >0for all ν i S i,m i =, if not, one can add to the cost functions any necessary positive constant that, evidently does not change anything, define the series of the vectors iterations { ν i,k} for any fixedn as { ν i,k+ =π S i,m ν i,k γ i,k + F J i ν,k,ν,k} i ν,k,ν,k ν i,0 S i,m, k=,, F i ν,k,ν,k [ h,i = ν,k,ν,k, h M,i ν,k,ν,k] J i ν,k,ν,k = max h,i ν,k,ν,k,n 40 where π S i,m{ } is the projector of an argument to the simplexs i,m and the new functional h,i is defined as: h,i ν,k,ν,k := δ ν i +h,i ν,k,ν,k 4 whether this algorithm converges to a unique point or not we let this discussion for a large version of this work Algorithm The algorithm for finding a solution for the robust leader-follower strategies is summarized as follows: Step Select an initial condition for the leader s weights Γ Step Applying the control action equal to the weighted combination of standard stackelberg strategies, calculate all possible for each scenario dynamics and the corresponding cost functionals Step 3 Using the corresponding cost functionals perform iteratively the minimizing procedure 40 to find the optimal response of the follower Γ for a given fixed strategy of the leader Step 4 Given the optimal response of the follower Γ keep fixed this 3843

6 values and use again 40 to minimize the leader s weights Γ Step 5 Continue form the Step making the initial conditions with values found in 3 until convergence A Solving Coupled equations For the solution of the set of coupled Riccati equations 36, we follow the work of [], which is based on the solution of an auxiliary system For the lack of space we omit the details of this method V NUMERICAL EXAMPLE Consider the next two-scenarios two-players LQ differential game given by ẋ=ax+b [ ] u +B u, A 0 3 A= 0 A ;x 0 = A = ;A = ; B, =B, = ;B 0, =B, = Q, =Q, = I ;Q, =Q, =5 I Q, f =3I ;Q, f =I ;Q, f =3I ;Q, f =I R ii =R ij =I i,j=, The tables below show the convergence to dependence of the cost functionals on the weights ν = ν, ν, ν = ν, ν, ν 0,ν 0 k ν ν h, h, Table Cost Function Leader k ν ν h, h, Table Cost Function Follower As one can see in the tables and the numerical procedure works efficiently finding the robust strategies, in this case the algorithm practically finish after one cycle of performing 5 iterations for the follower and 5 iterations for the leader At the end of the process the cost functionals arrives to the practically same values as we expected from the complementary slackness condition 5 and 8 VI CONCLUSIONS In this paper the formulation of a concept for a type of robust Leader-Follower equilibrium for a Multi-Plant differential game was presented The dynamic of the game for this kind of problems is given by a set of N different possible differential equations Multi-Plant problem with no information about the trajectory which is realized The problem is solved designing of min-max strategies for each player which guarantee an equilibrium for the worst case scenario The suggested approach was based on the Robust Maximum Principle, the conditions for a game to be in Robust Leader- Follower Equilibrium was given As in the Nash equilibrium case the initial min-max differential game is converted into a standard static game given in a multidimensional simplex The realization of the numerical procedure confirms the effectiveness of the suggested approach REFERENCES [] C Chen and J Cruz, Stackelberg solution for a two person games with biased information patterns, IEEE Transactions of Automatic Control, vol 7, no 6, pp , 97 [] M Simman and J Cruz, On the stackerberg strategy in nonzero games, Journal of Optimization Theory and Applications, vol, no 5, 973 [3], Additional aspects of the stackelberg strategy in nonzero-sum games, Journal of Optimization Theory and Applications, vol, no 6, 973 [4] T Basar and G Olsder, Dynamic Noncooperative Game Theory Academic Press, New York, 98 [5] B Tolwinski, A stackelberg solution of dynamic games, IEEE Transactions of Automatic Control, vol 8, pp 85 93, 983 [6] H Von Stackelberg, The theory of the market economy Oxford University Press, 95 [7] V Boltyanski and A Poznyak, Robust maximum principle in minimax control, International Journal of Control, vol 7, no 4, pp , 999 [8], Linear multi-model time optimization Optimal Control Applications and Methods, vol 3, pp 4 6, 00 [9] A Poznyak, T Duncan, B Pasik-Duncan, and V Boltyanski, Robust maximum principle for multi-model LQ-problem, International Journal of Control, vol 75, no 5, pp , 00 [0] H Khalil, Multimodel design of a nash strategy, Journal of Optimization Theory and Applications, vol 3, no 4, pp , 980 [] H Khalil and P Kokotovic, Control strategies for decision makers using different models of the same system, IEEE Transactions on Automatic Control, vol AC-3, no, pp 89 98, 978 [] V Saksena, J J Cruz, W R Perkins, and T Basar, Information induced multimodel solutions in multiple decisionmaker problems, IEEE Transactions on Automatic Control, vol AC-8, no 6, pp 76 78, 983 [3] M Jimenez-Lizarraga and A Poznyak, Robust nash equilibrium in multi-model lq differential games: Analysis and extraproximal numerical procedure, Optimal Control Applications and Methods, vol 8, no, pp 7 4, 007 [4] H Abou-Kandil, G Freiling, V Ionescu, and G Jank, Matrix Riccati Equations in Control and System Theory Birkhauser,

Min-Max Output Integral Sliding Mode Control for Multiplant Linear Uncertain Systems

Proceedings of the 27 American Control Conference Marriott Marquis Hotel at Times Square New York City, USA, July -3, 27 FrC.4 Min-Max Output Integral Sliding Mode Control for Multiplant Linear Uncertain