Multi-Pursuer Single-Evader Differential Games with Limited Observations*

Size: px

Start display at page:

Download "Multi-Pursuer Single-Evader Differential Games with Limited Observations*"

Job Jackson
6 years ago
Views:

1 American Control Conference (ACC) Washington DC USA June 7-9 Multi-Pursuer Single- Differential Games with Limited Observations* Wei Lin Student Member IEEE Zhihua Qu Fellow IEEE and Marwan A. Simaan Life Fellow IEEE Abstract In this paper closed-loop Nash equilibrium strategies for an N-pursuer single-evader differential game over a finite time horizon with limited observations is considered. The game setting is such that each pursuer has limited sensing range and can observe the state vector of another player only if that player is within the pursuer s sensing range. The evader on the other hand has unlimited sensing range which allows it to observe the state of all pursuers at all times and implement a standard closed-loop Nash strategy. To derive strategies for the pursuers a new concept of best achievable performance indices is proposed. These indices are derived in a way to be the closest to the original performance indices and such that the resulting pursuers collective strategy satisfies a Nash equilibrium against the evader s strategy. The strategies obtained by such an approach are independent of the initial state vector. An illustrative example is solved and simulation results corresponding to different sensing ranges and performance indices of the game are presented. I. INTRODUCTION The pursuit-evasion game has been considered for decades as a model for illustrating a variety of applications of differential game theory [ [. Basically it models the process one or more pursuers try to chase one or more evaders while the evaders try to escape. Solving a pursuitevasion game essentially involves developing strategies for the pursuers and evaders such that their prescribed performance indices are optimized. Historically pursuit-evasion games were first investigated by Isaacs [ in the 9s. Saddle point solutions for a type of zero-sum single-pursuer single-evader games were considered in [. Nonzero-sum pursuit-evasion games were introduced and investigated as an example of the Nash equilibrium strategies in [ and as an example of leader-follower Stackelberg strategies in [. Two pursuers and one evader games were studied in [. Recently there has been considerable interest in problems that involve many pursuers chasing one evader [6 [7 [8 [9 [ [. These types of games present some interesting challenges in designing coordinated strategies among all the pursuers to accomplish the goal of capturing the evader. From a practical point of view however it is important to consider games with state information constraints global information is not available to all players. In this paper we consider an N-pursuer single-evader game over a finite time horizon only the evader has global information with the ability *This work is supported in part by a grant (CCF-96) from National Science Foundation. W. Lin Z. Qu and M. A. Simaan are with the Department of EECS University of Central Florida Orlando Florida 86 US- A. s: weilin99@gmail.com qu@eecs.ucf.edu simaan@eecs.ucf.edu. to observe all the pursuers. The pursuers on the other hand have limited sensing capabilities. Each pursuer is only able to observe players that fall within its sensing range during any interval of time in the game. This might necessitate that each pursuer needs to implement a strategy based on the information available to it. Most of the results on pursuitevasion games so far are built upon the assumption that all players must have global or full information about all the players in the game in order to implement their closed-loop Nash strategies. Some of the recent papers on this subject have considered limited information structures either by the pursuers or by the evader [6 [9 [. In this paper we consider a game between a single evader with a high sensing capability outnumbered by several pursuers but with poor sensing capabilities. A practical example of such a situation occurs when a well-equipped drone with a very wide range of sensing capability must evade several weakly-equipped pursuing air vehicles. These types of problems need to be treated using an approach different from what currently exists in the literature. This paper builds on previous results obtained in [ and further explores how the inverse optimality can be used to secure a Nash equilibrium between the pursuers and evader. The remainder of the paper is organized as follows. Problem formulation is presented in Section II. The evader s Nash equilibrium strategy based on a global information structure is derived in Section III. The Nash strategies for the pursuers with limited observations are analyzed and derived in Section IV. An illustrative example and simulation results are shown in Section V. The paper is concluded in Section VI. II. PROBLEM FORMULATION The differential game considered in this paper is between N pursuers and a single evader in the unbounded Euclidean space. We assume that the state of each player is governed by the differential equation ẋ i = Fx i +G i u i () for i = e N vector x i R n is the state variable u i R mi is the control input and matrices F and G i are of proper dimensions. Subscript e stands for the evader and subscripts to N stand for pursuers to N. We assume that all players have the same dynamics (i.e. same F matrix) which allows for an explicit derivation of the dynamics of a vector z = [z T zt N T z i = x i x e whose entries represent the differences (or displacement) between the evader and each of the pursuers state vectors /$. AACC 7

2 Hence the overall dynamics becomes ż = Az +B e u e + N B i u i = Az +B e u e +B p u p () i= u p = [u T u T N T A = I N F I N is N N identity matrix is the Kronecker product B e = N G e N is a N vector with all the elements equal to B j = d j G j d j is an N vector with the jth element equal to and all other elements equal to B p = blkdiag{g G N } and blkdiag stands for block diagonal matrix. We assume that the collective objective of the pursuers is to minimize a quadratic function of the vector z while at the same time minimizing their energy costs. Hence denoting x i M = xt i Mx i we assume that all the pursuers try to minimize a common performance index: J p = z(t) M p + tf ( z Q p + u p R p )dt () M p = blkdiag{m p M pn } Q p = blkdiag{q p Q pn } R p = blkdiag{r p R pn }. Matrices M pj R n n and Q pj R n n are positivesemidefinite and matrix R j R mj mj is positive definite for j = N. Similarly we assume that the objective of the evader is to maximize a quadratic function of the vector z while at the same time keeping its energy costs at a minimum. Hence the evader will try to minimize the performance index: J e = z(t) M e + ( z Q e + u e R e )dt () M e = blkdiag{m e M en } Q e = blkdiag{q e Q en }. Matrices M ej R n n and Q ej R n n are negativesemidefinite for j = N and matrix R e R me me is positive definite. Given system dynamics () and performance indices ()-() a differential nonzero-sum game problem is formed. The Nash equilibrium for this game is defined as follows. Definition : For the N-pursuer single-evader differential game described in () with pursuers minimizing performance index () and evader minimizing performance index () the strategy set {u pu e} is a Nash equilibrium if the inequalities J i (u p u e ) J i(u p u e ) () J e (u pu e) J e (u pu e ) (6) hold for any u p U p and u e U e U p and U e are the admissible strategy sets for the pursuers and evader respectively. III. NASH STRATEGIES FOR THE EVADER From the perspective of the evader since it has a wide enough sensing range to observe all the pursuers it will naturally adopt its closed-loop Nash strategy derived from () () and () without knowing that the pursuers only have limited observations. Therefore according to well known Nash equilibrium result from [ on linear quadratic differential games the evader s strategy is u e = Re Be T P e z (7) P e is obtained by solving the coupled differential Riccati equations P e +Q e +P e A+A T P e P e B p Rp BT p P p P p B p Rp Bp T P e P e B e Re Be T P e =. (8a) P p +Q p +P p A+A T P p P p B p Rp BT p P p P p B e Re Be T P e P e B e Re Be T P p = (8b) with boundary condition P p (T) = M p and P e (T) = M e. IV. NASH STRATEGIES FOR THE PURSUERS Since the evader will implement its Nash strategy in (7) clearly the best strategy for the pursuers is to implement the corresponding Nash strategy u p = Rp BT P p z (9) P p is obtained from the solution of (8). However the pursuers are generally not able to implement (9) because the ith element of u p in (9) meaning the Nash strategy for pursuer i is generally a full state feedback of z and hence requires pursuerito have global information. Therefore with the constraint of limited observations a different strategy for the pursuers must be implemented. This strategy must form a new Nash equilibrium together with the strategy (7) of the evader. To accurately model the sensing capabilities of the pursuers we define a sensing radius of pursuer i as r i. This radius will determine whether the full state vector of another player in the game is observable to that pursuer or not. Consequently we define a time dependent Laplacian matrix [ for the N pursuers as follows: L (t) L N (t) L(t) =..... () L N (t) L NN (t) if Λz i (t) Λz j (t) r i for j i if Λz i (t) Λz j (t) > r i for j i L ij (t) = N L il if j = i l=l i Also define a vector h(t) = [h T (t) ht N (t)t whose ith entryh i (t) represents the ability of the ith pursuer to observe the evader at time t. That is { if Λzi (t) r h i (t) = i () if Λz i (t) > r i 7

3 In () and ()Λz i (t) represents the coordinates of pursuer i relative to the position of the evader in vector z i (t) at time t. Further we assume that if N N i= Λz i ǫ ǫ is a capture radius then the game terminates and the evader is assumed to be captured. We assume that the strategy for pursuer i is chosen in the following form N u i =K i h i z i +K i L ij (z i z j ). () j= The first term in the above expression represents a control component to chase the evader directly if pursuer i observes the evader. The second term in the expression represents a component as a feedback control of the difference between pursuer i and its neighboring pursuers state variables. In general pursueri s control could be chosen in a more general structure than the one in () however in this paper we choose u i in () because of its simplicity and applicability in realistic situations. Control u i can be rewritten as u i =K i [(h i d T i ) I n z +K i [(d T i L) I n z hi d =[K i K i ([ T i d T i L I n )z K i C i z () K i = [K i K i R mi n and [ hi d C i = T i d T i L I n R n nn. () Therefore denoting u p = K s z () K s = [(K C ) T (K N C N ) T T (6) the problem reduces to finding matrices{k K N } such that u p can still form a Nash equilibrium with the evader s strategy u e in (7). However according to Definition given performance indices () and () if the evader sticks to the Nash strategyu e in (7) and the pursuers fail to implement the corresponding Nash strategies in (9) the Nash equilibrium will no longer hold. Since it is generally not possible for the two expression of u p in (9) and () to be the same an intuitive approach is to modify the performance indices of both the evader and the pursuers keeping them as close as possible to the original indices given in () and () but in such a way that the evader strategy of (7) and pursuers strategy of () form a Nash equilibrium with respect to the modified performance indices. To achieve this goal we first find a family of performance indices with respect to which the pursuers strategy u p in () and evader s strategy u e in (7) form a Nash equilibrium. Then we use a new concept of best achievable performance indices to find a pair of performance indices within this family such that they are closest to the original performance indices J p and J e in () and (). Lemma : For the pursuit-evasion game described in ()- () given a set of matrices {K K N } then the strategies u e in (7) and u p in () form a Nash equilibrium with respect to performance indices J ps = z(t) M p + ( z Q ps u T p Sz zt S T u p + u p R p )dt (7) J es = z(t) M e + ( z Q es + u e R e )dt (8) S = B T p P p +R p K s (9) Q ps = Q p p Q es = Q e e p = P p B p Rp p P p Ks T R pk s () e = P e B p Rp S +S T Rp Bp T P e. () K s is as in (6) and P p and P e are the solutions of (8). Proof: Consider Lyapunov functions V p = zt P p z and V e = zt P e z () P p and P e are the solutions to the coupled Riccati equations (8b) (8a). Differentiating V p in () with respect to t and integrating it from to t f yield V p (t f ) V p () = [ z u Qps R p +u T psz +z T S T u p +z T P p B e (u e +Re BT e P ez)+ u p K s z R p dt. Hence J ps =V p ()+ u p K s z R p +z T P p B e (u e +Re BT e P ez)dt () J ps is defined in (7). Similarly we have J es =V e ()+ u e +Re BT e P ez R e +z T P e B i (u p K s z)dt () J es is defined in (8). Since R p and R e are positive definite it is obvious from () and () that u p = K s z and u e = Re Be T P e form a Nash e- quilibrium according to Definition. Clearly if K s = [(K C ) T (K N C N ) T T = Rp Bp T P p then J ps and J es in (7) and (8) become identical to () and (). We propose the concept of the best achievable performance indices as follows. Definition : Given system () and performance indices () and () performance indices (7) and (8) are called the best achievable performance indices if there exists a set of matrices {K K N } such that S p and e are minimized for all t S p and e are defined in (9)-() and is any user defined matrix norm. 7

4 Therefore by finding matrices {K K N } that correspond to the best achievable performance indices we actually find a pair of performance indices for the pursues and evader such that they are the closest to the original ones () and () while admitting strategies (7) and () as their Nash equilibrium. In order to determine {K K N } we need to solve a multi-objective optimization problem of minimizing S p and e simultaneously. One way to accomplish this is to minimize at every t a function that consists of a convex combination of these terms. That is H(t) =α S F +α p F +α e F =α Tr(S T S)+α Tr( p)+α Tr( e) () α i > and i= α i =. In () F denotes the Frobenius norm defined as M F = Tr(MT M). Note that the minimization of H(t) in () with respect to {K K N } is generally quite difficult to solve analytically. Since K s = N j= D jk j C j D i is an (m + +m N ) m i block matrix with the ith block being an m i m i identity matrix and other blocks zero matrices the gradient of H(t) with respect to K i can be obtained as Ki H =Di T R p(α S α K s p +α Rp Bp T P e e )Ci T i = N. (6) One possible approach to perform this minimization is to implement a gradient based algorithm such as steepest descent or conjugate gradient to search for the optimal matrices {K K N }. Note that these matrices will be independent of the initial state vector. By varying the coefficients α α a noninferior set can be generated and an appropriate choice of coefficients can be then made. In the game process we assume for simplicity that the pursuers sensing abilities are updated at discrete instants of time t t t β at which the pursuers perform sensing t = and t β = T. We assume that (t i t i ) is sufficiently small for all i = β such that the observations among the players can be regarded to be constant within such a small time interval (t i t i ). Hence once the information topology changes among the players at any of the time instants t t β the above algorithm will be performed to update the matrices K K N for the remainder of the game. V. EXAMPLE AND SIMULATION RESULTS Let us now consider a three-pursuer single-evader game taking place in a planar environment. Suppose that all players have the same dynamics given by () x i represents two position coordinates on the plane F = G i = I and u i represents the velocity control for i = e N. Then the system dynamics () can be written as A = B p = I I B e = I. (7) The performance indices are given by () and () with T = and M p = Q p = q(i I )R p = I I (8) M e = Q e = (I I )R e = I. q > is a scalar. The initial positions of pursuers and the evader are x = ( ) x = () x = () x e = (). We derive and simulate the results under various scenarios. In all of these scenarios we use α = / and α = α = / in (). We also assume a capture radius ǫ =.7. Scenario : If in addition to the evader all the pursuers also have wide enough sensing ranges (i.e. Λ = I and r i in () and ()) the game is under global information. Solving the coupled differential Riccati equations (8a) and (8b) with q = yields solutions P p and P e in the following form: P p = P p P p P p P p P p P p P p I P e = P e P e P e P e P e P e I P p P p P e P e P e Plots of P p P p P e and P e in this case are shown in Figure. And the pursuers and evader s strategies can be... P p P p P e P e time Fig.. Plots of P p P p P e and P e obtained from (9) and (7). In this case the motion trajectories of the players and the distances between the pursuers and evader over time are shown in Figure. With a capture radius ǫ =.7 (indicated as a black line in Figure. ) the evader is captured around t =.. Scenario : Assume that the pursuers sensing ranges are now reduced to r i = for i =. As shown in Figure at t = pursuer can only observe the evader pursuer can observe the evader and pursers and pursuer can only observe pursuer. For q = the motion trajectories and the distances between the pursuers and evader over time in Figure. Clearly in this scenario the pursuers are not able to capture the evader (with the same capture radiusǫ =.7) when the final time T = is reached. The change in the topology of the game is reflected in the changes of Laplacian matrixl(t) and vectorh(t) in () and (). In this scenario L(t) has changed three times as follows: [ [ [ t.7.7 < t.6.6 < t 7

5 ..8.6 Pursuer Pursuer Vertical axis. Pursuer Pursuer vertical axis Horizontal axis horizontal axis. Pursuer Pursuer Pursuer Pursuer Fig.. Motion trajectories and distances between the pursuers and evader under global information. distance to the evader is plotted in red dashed curve. Fig.. (-) () Vertical Axis () () Pursuer Pursuer Horizontal Axis Initial positions of three pursuers and single evader and the vectorhremained unchanged during the entire game i.e. h (t) = h (t) = and h (t) =. This essentially means while pursuers was never able to observe pursuer for the entire game it is able to observe pursuer for t [.6 and loses that observation after t =.7 Scenario : Using the same setting as scenario expect with q = (instead of q = ) motion trajectories and the distances between the pursuers and evader are shown in Figure. Clearly in this scenario (with ǫ =.7) the evader is captured around t =.. During the entire game the Laplacian matrix has also changed three times as follows [ [ [ t.. < t < t Fig.. Motion trajectories and distances between the pursuers and evader for scenario. and vector h has changed as follows: h (t) = h (t) = for t h (t) = for t. and h (t) = for. < t. In other words after t =. all the pursuers are able to observe the evader and after t = all the pursuers are also able to observe each other. Note that although scenarios and have the same performance indices the pursuers are not be able to catch the evader in scenario because of their limited observations. However as q increased from to between scenarios and the pursuers are able to catch the evader in scenario because of the higher penalty on the distances in the pursuers performance index. Remark: It would be interesting to determine a critical value q c of q which separates the escape and capture regions of the evader. That is if q < q c the evader escapes and if q q c the evader is captured at a time t [T. For this game the critical value of q has been determined to be q c =.6. Figure 6 shows plots of the distances between the pursuers and evader when q = q c =.6. Clearly the capture time in this case occurs at t =. Figure 7 shows a plot of the capture time versus q. Clearly as q increases from the critical value the capture time will decrease. The region < q < q c =.6 is the evader escapes i.e. the capture time will be. 7

6 vertical axis Pursuer Pursuer horizontal axis..... Pursuer Pursuer Fig.. Motion trajectories and distances between the pursuers and evader for scenario Pursuer Pursuer Fig. 6. s between the pursuers and evader when q = q c =.6 Capture 8 6 escapes q c =.6 is captured q VI. CONCLUSION In this paper closed-loop Nash strategies for a linear N-pursuer single-evader differential game with quadratic performance indices over a finite time horizon is considered. The evader is assumed to have unlimited observations while each pursuer has limited observations based on its own sensing radius. Since the evader is able to observe the states of all pursuers at all times it implements a standard closed-loop Nash strategy using the well-known couple Riccati differential equations approach. Under the limited observations setting however each pursuer is able to observe the states of only those players who are within its sensing radius. Hence the pursuers will seek to implement a collective strategy that can form a Nash equilibrium with the evader s strategy. To accomplish this the pursuers derive best achievable performance indices in such a way to be as close as possible to the original performance indices and such that the resulting pursuers Nash strategy based on the available observations and the original evader s strategy form a Nash equilibrium. An illustrative example involving three pursuers and one evader is solved and simulation results corresponding to different scenarios are presented. REFERENCES [ R. Isaacs Differential Games. John Wiley and Sons 96. [ Y. Ho A. B. Jr. and S. Baron Differential games and optimal pursuitevasion strategies IEEE Trans. Automatic Control vol. AC- no. pp [ A. Starr and Y. Ho Nonzero-sum differential games Journal of Optimization Theory and Applications vol. pp [ M. Simaan and J. Cruz On the Stackelberg solution in the nonzerosum games Journal of Optimization Theory and Applications vol. 8 pp. 97. [ M. Foley and W. Schmitendorf A class of differential games with two pursuers versus one evader IEEE Trans. Automatic Control vol. 9 pp [6 S. Bopardikar F.Bullo and J. Hespanha On discrete-time pursuitevasion games with sensing limitations IEEE Trans. Robotics vol. no. 6 pp [7 S. Bhattacharya and T. Baar Differential game-theoretic approach to a spatial jamming problem in Advances in Dynamic Games ser. Annals of the International Society of Dynamic Games P. Cardaliaguet and R. Cressman Eds. Birkhuser Boston vol. pp. 68. [8 D. Li and J. Cruz Defending an asset: A linear quadratic game approach IEEE Trans. Aerospace and Electronic Systems vol. 7 no. pp. 6 april. [9 H. Huang W. Zhang J. Ding D. Stipanovic and C. Tomlin Guaranteed decentralized pursuit-evasion in the plane with multiple pursuers in th IEEE CDC pp [ J. Guo G. Yan and Z. Lin Local control strategy for moving-targetenclosing under dynamically changing network topology Systems & Control Letters vol. 9 pp [ M. Wei G. Chen J. B. Cruz L. S. Haynes K. Pham and E. Blasch Multi-pursuer multi-evader pursuit-evasion games with jamming confrontation AIAA J. of Aerospace Computing Information and Comm vol. no. pp [ W. Lin Z. Qu and M. Simaan A design of distributed nonzero-sum nash strategies in 9th IEEE CDC Atlanta USA pp [ R. Olfati-Saber and R. Murray Consensus problems in networks of agents with switching topology and time-delays IEEE Trans. Automatic Control vol. 9 no. 9 pp.. Fig. 7. Capture versus q 76

Distributed Game Strategy Design with Application to Multi-Agent Formation Control

5rd IEEE Conference on Decision and Control December 5-7, 4. Los Angeles, California, USA Distributed Game Strategy Design with Application to Multi-Agent Formation Control Wei Lin, Member, IEEE, Zhihua