
MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Patrolling Games on Graphs

DIPLOMA THESIS

Michal Abaffy

Brno, 2013

Declaration

Hereby I declare that this thesis is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during the elaboration of this work are properly cited and listed in complete reference to the due source.

Michal Abaffy

Advisor: RNDr. Tomáš Brázdil, Ph.D.

Acknowledgement

First, I would like to thank my parents for supporting me in my studies. I would then like to thank Jan Krčál and Vojtěch Řehák for introducing me to the area of patrolling games and for consulting on possible research topics in this area. I would also like to thank my advisor Tomáš Brázdil for consulting on the research goals and possible methods of achieving them and for commenting on the progress of my thesis, and Branislav Bošanský (Czech Technical University in Prague) for giving me an overview of the basic literature in the area of patrolling games.

Abstract

Patrolling games are games played by two players, a defender and an attacker. The aim of the defender is to protect a given set of targets by making randomized routes among them; usually, this movement is done on a directed graph. The attacker tries to attack some of the targets. To the best of our knowledge, only patrolling games with finite-memory strategies of the players have been solved. We deepen the knowledge about patrolling games by considering infinite-memory strategies. We provide an algorithm which computes an ɛ-optimal defender's strategy in patrolling games where an optimal defender's strategy exists. This strategy is ɛ-optimal among all strategies and it uses only finite memory. We conjecture that this algorithm is correct for all patrolling games.

Keywords

game theory, patrolling games, security games, probability, algorithms, stochastic games, simple stochastic games

Contents

1 Introduction to the thesis
2 Introduction to the patrolling games
  2.1 Game theory, security games and their applications
  2.2 Modeling patrolling games
  2.3 Major patrolling games literature
    2.3.1 Perimeter
    2.3.2 Arbitrary graph
    2.3.3 Euclidean environment
    2.3.4 Moving targets
    2.3.5 Alarms
3 Main result
  3.1 Basic definitions and example
  3.2 Main result
4 Discretization of transition probabilities
  4.1 Restricting attacker
  4.2 Values of an optimal defender's strategy in histories
  4.3 Discretization of probabilities of transitions
  4.4 Alternative ending of the main proof
5 Other results
  5.1 Symmetries in optimal defender's strategies
  5.2 Abstract discussion on approaches to solve patrolling games
  5.3 Main result in more general models
  5.4 Optimal strategies for cliques
  5.5 Conjectures
6 Conclusions and future work

1 Introduction to the thesis

Security games are a special type of games concerned with security. They are played between two players: the defender tries to protect a set of targets, while the attacker tries to attack some target. Security games have their foundation in game theory. They have been a topic of increasing importance in recent years, with many new publications. There are many real-world applications of this type of games; they are mentioned at the end of Section 2.1.

Patrolling games are a special type of security games in which the defender tries to protect the set of targets by moving among them. The first works on patrolling that are often cited are [1] and [5]. There are many different models of patrolling games; they are discussed in Sections 2.2 and 2.3.

In the main part of the thesis we consider a model of patrolling games called Patrolling Game with Infinite Memory (PGIM). The game is played on a directed graph. Some nodes of the graph are target nodes, some are not. The game evolves in the following manner. The defender starts his movement in one of the graph's nodes. He moves along edges of the graph and visits nodes; moving along any edge takes 1 time step. The defender may randomize his movements. The attacker waits, and at any time he may choose to start an attack on any target node n_A. An attack takes d time steps. The attacker wins if the defender does not visit n_A during the attack; the defender wins otherwise. The goal of the defender is to maximize the probability that he wins the game.

The main difference between PGIM and already published models is that PGIM works with infinite-memory strategies, while published models consider only finite-memory defender's strategies. In PGIM, the defender can make his next move based on all the previous moves he has made, while in published models he could make his next move based only on a fixed finite number of possible situations.
This makes PGIM much harder to work with, as we cannot take advantage of a finite number of pure strategies as all other models do. We are not aware of any known result which would make finding an optimal (or even ɛ-optimal) solution in PGIM possible in finite time. A simple example of a patrolling game in PGIM is the following situation: a policeman moves in a city trying to protect banks, while a group of criminals wants to rob one of the banks.

The main result of this thesis is an algorithm which computes an ɛ-optimal defender's strategy for any game of PGIM that has an optimal strategy. When obtaining this result, we have found some new concepts in the area of patrolling games that might be useful for other researchers. Our other results include some interesting statements obtained in the process of proving the main result, a discussion of symmetries in optimal defender's strategies, a discussion of optimal strategies on complete graphs, and a discussion of the problem of extending the main result to more general models.

The outline of the thesis is the following. In Chapter 2, we give a quick introduction to game theory, security games and patrolling games. PGIM is informally described there, its justification is discussed, and a comparison with other models is given. An overview of the most important works on patrolling is also done in that chapter. The main part of this thesis is in Chapter 3: the formal definition of PGIM is given, the main result is formulated, a sketch of its proof is presented and an easy part of the proof is carried out. The hard part of the proof is done in Chapter 4. Other results are mentioned in Chapter 5; at the end of Chapter 5, some conjectures that we believe to be true are also stated. The last chapter summarizes the results obtained in this thesis and outlines possible areas for future work which might extend them.

2 Introduction to the patrolling games

The aim of this chapter is to introduce patrolling games. We start by introducing the basics of game theory, then we introduce the general class of security games. We show some real-world applications of security games. We continue by introducing models of patrolling games, which are a special type of security games. First, we informally describe PGIM, which will be used in the later parts of the thesis, and then continue by discussing other models. At the end of the chapter, we go through the papers which we consider the most important in the area of patrolling games. This chapter is mostly based on [9].

2.1 Game theory, security games and their applications

Many concepts from game theory will be used in this thesis, so let us start with a simple introduction to game theory, based on [19] and [30]. Game theory models real-world situations as games, which are well-defined mathematical objects. These games are played between players P_1, ..., P_n.¹ Each player can influence the outcome of the game by playing one strategy from a set of his strategies. Call these sets of strategies S_1, ..., S_n. A situation is a convex combination of elements from S_1 × ... × S_n. Each player P_i also has a utility function U_i : S_1 × ... × S_n → R, which assigns every situation its utility. The higher the utility of a situation under U_i, the better the situation is for player P_i. A game whose strategies are given in the above form is said to be given in its normal form.

However, sometimes it is hard to specify the sets of strategies. The extensive form of games is used to formalize games with a time sequencing of moves. Games in the extensive form can be seen as played on a (possibly infinite) game tree where each node of the tree represents a point of choice of some player. See [19] for an example. If the strategy sets are all finite, it is easy to find the unique normal form from an extensive form. See [25] for an easy explanation of how to do that.
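As a small illustration of normal-form games and expected utility, consider the following sketch. The payoff numbers are invented for illustration; they are not taken from any game discussed in the thesis.

```python
import numpy as np

# Hypothetical 2x2 two-player game in normal form: rows are player 1's
# pure strategies, columns are player 2's pure strategies.
U1 = np.array([[3.0, 0.0],
               [1.0, 2.0]])  # player 1's utility for each pure-strategy pair
U2 = np.array([[1.0, 2.0],
               [3.0, 0.0]])  # player 2's utility

# Mixed strategies are probability distributions over pure strategies.
p = np.array([0.5, 0.5])    # player 1 mixes uniformly
q = np.array([0.25, 0.75])  # player 2's mixed strategy

# Expected utility of the mixed-strategy profile (p, q) for each player:
# a convex combination of the pure-profile utilities.
eu1 = p @ U1 @ q
eu2 = p @ U2 @ q
print(eu1, eu2)  # -> 1.25 1.25
```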
Patrolling games are usually described by their extensive form: the tree represents histories, the defender's points of choice represent his next movement, and the attacker's points of choice represent whether to attack a target node or not to attack. The concept of pure and mixed strategies is important. Pure strategies

1. There may also be games with an infinite number of players, but we do not consider them in this thesis.

in extensive-form games are strategies which choose the next node of the game tree with probability 1. In normal-form games, a pure strategy for player P_i is an element of S_i. Mixed strategies in both forms of games are convex combinations of pure strategies.

The solution of a game is typically defined to be a fixed-point situation where no player can get higher expected utility by changing only his own strategy. The best known solution concept in game theory is the Nash Equilibrium (NE) [27], which is a situation where no player can get higher expected utility than he currently gets by changing just his own strategy. If the sets of pure strategies of all players are finite, then there exists at least one NE in such a game [27]. Such an NE can also be computed; [17] shows that this problem is PPAD-complete. However, an optimal NE (for any reasonable definition of optimal) is NP-hard to find (or even to approximate), even in two-player games ([24] and references therein, [20], [14]).

There are other solution concepts, and the one of special interest in the area of security games is the Strong Stackelberg Equilibrium (SSE, also termed leader-follower equilibrium) for two-player games. One player is a leader and he commits himself to a strategy. The second player is a follower and he chooses his strategy knowing the leader's strategy. An SSE is an ordered pair of a leader's strategy σ which ensures the leader the maximum utility against a worst-case opponent and a follower's strategy π which gives the follower the maximum expected utility when the leader plays σ. A leader-follower strategy is a leader's strategy which leads to an SSE, and such a strategy can be found for normal-form games in polynomial time [15]. For more results on finding SSE in security games, see [24].

An important type of games are zero-sum games, where the following condition holds. Let s = (s_1, ..., s_n) and s' = (s'_1, ..., s'_n) be any two situations. Then

    ∑_{i=1}^{n} U_i(s) = ∑_{i=1}^{n} U_i(s').

Games of PGIM are zero-sum games with 2 players. The zero-sum property in two-player games may be interpreted as: the better for one player, the worse for the other.

Security games are game-theoretic models of real-world scenarios concerned with security. Patrolling games are a specific type of security games. Security games have 2 players: the defender allocates his limited resources to protect the targets, and the attacker wants to attack some of the targets.
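In the two-player zero-sum case, a guaranteed value can be found by the classic maximin computation. The following stdlib-only sketch handles a game with just two row strategies by grid search over the row player's mix; a linear program would solve the general case, so this is an illustration of the concept, not a general solver.

```python
# Maximin mixed strategy for the row player of a 2xN zero-sum matrix game
# A, where A[i][j] is the row player's payoff. Grid search over
# p = P(play row 0) stands in for the general linear-programming solution.
def maximin_2xn(A, steps=10001):
    best_p, best_value = 0.0, float("-inf")
    for k in range(steps):
        p = k / (steps - 1)
        # Against a known mix, the column player best-responds with a pure
        # strategy, so the guaranteed value is the minimum over columns.
        value = min(p * A[0][j] + (1 - p) * A[1][j] for j in range(len(A[0])))
        if value > best_value:
            best_p, best_value = p, value
    return best_p, best_value

# Matching pennies: the unique maximin mix is (0.5, 0.5) with value 0.
p, v = maximin_2xn([[1, -1], [-1, 1]])
print(p, v)  # -> 0.5 0.0
```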

Allocating resources is quite an abstract formulation, so let us give an example. In patrolling games on graphs, allocating resources is time dependent: the defender moves his patrolling unit among the nodes of the graph. Another example is the following trivial security game, where there are k policemen and n banks, n > k. Policemen do not move and each policeman can be assigned to any bank. The attacker does not know which banks have a policeman assigned, and an attack is successful if the attacker chooses to rob a bank which does not have any policeman assigned.

Some models of security games are given by their normal form, for example Resource Allocations Games [23] or Security Games on Graphs [22]. Models of patrolling games are given by their extensive form. Given utility functions for both players and assuming each player has a finite number of pure strategies, good strategies of both players can be computed from an NE, see the above paragraphs. However, it is assumed that the attacker knows the defender's strategy and chooses his strategy as the best response. Therefore, there is a possibility to use SSE as a solution concept for security games. In [34], it is shown that under a natural restriction on security games, any SSE is also an NE and that committing to SSE strategies is a good idea for the defender.

The first deployment of security games into real life was the ARMOR system [28]. It was used at Los Angeles International Airport, where optimal checkpoint placement and canine schedules were computed. Another system used in the security domain is IRIS [33]. IRIS was developed to reasonably assign Federal Air Marshals to international flights, but could be used to schedule patrols in other transportation networks as well. GUARDS [29] is a more general system compared to ARMOR and IRIS. It reasons about different security activities and about diverse potential threats. It is used to protect over 400 airports in the USA by intelligently deploying limited security resources.
PROTECT [32] is a system that helps the United States Coast Guard schedule their patrols in the port of Boston. PROTECT, in contrast to the systems previously mentioned, does not assume a perfect adversary; it relies on the quantal response model [26] of adversary behavior. TRUSTS [35] is a security application that does not deal with counterterrorism, but brings security games applications to a much broader setting. The situation is the following. Passengers are legally required to buy tickets before boarding, but there are no gates or turnstiles. The Los Angeles Sheriff's Department deploys uniformed patrols on board trains and at stations for fare-checking. Some mechanism for choosing times and locations for inspections is needed, and TRUSTS is a possible solution.

2.2 Modeling patrolling games

The aim of this section is to informally introduce PGIM, to discuss other possible models, and to give the reader motivation for studying the area of patrolling games and some basic insight into it.

Let us recall the police-bank situation to give an example and motivation for studying patrolling games. A police department has a limited number k of policemen with mobile vehicles and n banks, n > k, that should be protected by the police in case of a robbery. We assume the banks cannot contact the outside world. The goal of the police is to schedule the policemen's movements in such a way that the probability of a successful attack is minimized. In case there is only one policeman (k = 1), this situation could easily be modeled by PGIM. There exist models of patrolling games where k > 1 is also possible. We will use this police-bank situation as an example when explaining patrolling games.

A PGIM (or patrolling game) is a game of 2 players. These players are called the defender (or patroller) and the attacker (or intruder). The defender's goal is to maximize the probability of protecting a set Q of target nodes (e.g., banks) by moving his patrolling unit (e.g., a policeman) on a directed graph G = (N, E), where Q ⊆ N. The defender can move through the edges of the graph; each edge takes 1 time step. He does this movement in the form of randomized routes and moves his unit all the time. The attacker's goal is to maximize the probability of a successful attack on some of the target nodes. By a successful attack we mean that the attacked target node is not visited by the defender's unit while the attacker attacks the node (e.g., at the time of a robbery). A successful attack takes d time steps. The attacker's strategy is to wait for the right moment to attack, then choose one target node and attack it.
Once the attacker starts the attack, he is either caught by the defender (the defender's unit visits the attacked node before the attack is successfully finished) or he successfully finishes the attack. The attacker cannot escape when he is in the middle of an attack and sees that he is going to be caught. The attacker can start his attack only once. The formal definition of PGIM is in Section 3.1.

In the next paragraphs, we deepen the understanding of PGIM and patrolling games in general, explain why we chose this model, and also discuss other possible models of patrolling games. In PGIM, it is considered that the attacker knows the defender's strategy. This is a reasonable assumption due to the following reasons.

- The defender moves all the time, while the attacker can wait. Therefore, the attacker can learn the defender's strategy by observing his moves long enough.²
- The attacker can learn the defender's strategy from a spy (e.g., a policeman who sends information to the criminals).
- The defender's goal may be to guarantee as low a risk as possible, in which case he assumes the worst possible attacker's response.

Since the attacker knows the defender's strategy, he can choose his strategy as the best response to it. Because of this, the attacker cannot benefit from randomization, and only pure attacker's strategies are considered.

In the following paragraphs, let us explain why we have chosen PGIM to work with. Basically, we tried to choose the simplest reasonable model with infinite-memory strategies. We are aware that many patrolling problems can be formulated as protecting targets in an environment with no graph structure given, for example when patrolling on seas or in the air. However, graphs serve as a good discretization of the continuous space, and since we cannot solve patrolling problems with continuous space, we start by solving patrolling on graphs. Choosing a good graph structure for given targets was discussed in [10], see Section 2.3.3 for more on this issue.

We chose one patrolling unit in PGIM to keep the model simple. Some models of patrolling games consider several patrolling units [1], and we know that some real-world situations that might be modeled as patrolling games require several patrolling units in the model. Some real-world scenarios that seem like patrolling games might consider more than one attack. However, one attack is a reasonable assumption for many real-world situations that might be modeled by patrolling games, and most literature on patrolling works with the assumption that the attacker can start only one attack.
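Under the assumptions just listed (single unit, single attack, attacker best-responding to a known strategy), one play of a PGIM-style game can be sketched as follows. The graph, the strategy and all numbers here are invented for illustration; for concreteness the attacker is also allowed to see the defender's current position, as in the "strong attacker" of some published models.

```python
import random

# Toy instance: an undirected 4-cycle given as a directed graph, every
# node a target, attack length d = 2, defender playing the uniform
# stationary strategy (next node uniform among successors).
edges = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
targets = [0, 1, 2, 3]
d = 2

def walk(start, steps):
    """Sample the defender's next `steps` nodes under the uniform
    stationary strategy."""
    node, route = start, []
    for _ in range(steps):
        node = random.choice(edges[node])
        route.append(node)
    return route

def capture_prob(start, target, trials=20000):
    """Estimate the probability that the defender visits `target` within
    the d steps of an attack launched while the defender is at `start`."""
    hits = sum(target in walk(start, d) for _ in range(trials))
    return hits / trials

random.seed(0)
# Knowing the strategy (and here the defender's position), the attacker
# best-responds by attacking the target with the lowest capture probability.
worst = min(capture_prob(0, t) for t in targets)
print(worst)  # every target is reached with probability 0.5 on this graph
```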
Published models work mostly with discrete time; one exception known to us is [18], where the attacker's strategies may work with continuous time. We want to extend existing models by infinite memory of strategies, so we assume discrete time as other researchers do.

We assume all edges to be of the same length to make the model simple. We then discuss the generalization, see Section 5.3, where edges may

2. This is not completely true in PGIM, but it is in stationary models, where the defender decides where to go next based only on the node he is currently in. In other publications on patrolling, stationary models are almost always assumed.

have different lengths. The main result of our thesis holds also for this more complicated model.

We assume that the defender can make his move based on all the moves he has already made. This is in contrast with all the literature on patrolling we are aware of, where the defender can make his move based only on the node he is currently in, or on the last h ∈ N nodes he was in (maybe even on some internal state), see Section 2.3.2 for more. We call strategies where the defender chooses his next move based only on the node he is currently in stationary strategies. Full memory makes PGIM much more difficult to work with than the known models, since the set of pure strategies is not finite as it is in the known models. We do not know any method for finding a leader-follower strategy in extensive-form infinite-horizon games, which PGIM is; the authors of [5] also wrote that they did not know of any such method. Our approach to (partially) solving this makes much use of properties that are special to patrolling games.

We assume that the targets are fixed in time (they do not move). However, there are some published models where this is not the case, see Section 2.3.4. We assume that all target nodes take the attacker d time steps to attack successfully. However, there are real-world scenarios where different targets take different times to attack successfully. Some known models, for example [5], consider different lengths of attack for different target nodes. We can also imagine that some real-world scenario might be modeled with the attack length being a random variable rather than a constant. Another situation might need to model limited players' knowledge of the attack length. However, we are not aware of published models that would consider these abstractions. Another possible extension is to consider that the defender may want to protect one target node more than another; the same may hold for the attacker.
An elegant way to describe how good a strategy is, is via the concept of utility from game theory mentioned earlier in this thesis. In the next paragraph, we say something about utility functions in general; in the paragraph after that, we apply it to the area of patrolling games.

The utility function is considered to be closed under convex combinations: the utility of an event which is a probabilistic mixture of elementary events is the convex combination of the utilities of the elementary events. Formally, for an example with 2 elementary events A and B and the defender's utility function U_d:

    U_d(pA, (1 − p)B) = p · U_d(A) + (1 − p) · U_d(B)
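The convex-combination property above can be sketched in a few lines of code. The event names and the 0/1 defender utilities below follow PGIM's two elementary events (a successful attack and the complementary outcome) as used later in the thesis.

```python
def lottery_utility(outcomes, utility):
    """Utility of a probabilistic mixture of elementary events: the
    convex combination of the utilities of the elementary events."""
    return sum(p * utility[event] for event, p in outcomes.items())

# PGIM's two elementary events: A = successful attack (worth 0 to the
# defender), B = no attack or caught attacker (worth 1 to the defender).
U_d = {"A": 0.0, "B": 1.0}

# If an attack succeeds with probability 0.3, the defender's expected
# utility is 0.3 * U_d(A) + 0.7 * U_d(B) = 0.7.
print(lottery_utility({"A": 0.3, "B": 0.7}, U_d))  # -> 0.7
```

With 0/1 utilities, the defender's expected utility is exactly the probability that the targets are protected, which is why maximizing utility and maximizing the winning probability coincide in PGIM.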

The elementary events in patrolling games may be, for example, the following ones: a successful attack on target node n_i at time t; catching the attacker by the defender at time t; not attacking. In PGIM, we consider only 2 elementary events: A = successful attack; B = either no attack, or caught attacker. For more about possible elementary events in patrolling games, see Section 2.3.2. The property of being closed under convex combinations has the following nice consequence: it is enough to define the utility of elementary events, and the utility of all possible events is defined.

PGIM assumes that the attacker can be caught by the defender only when they are both on the same node. In [5], it is assumed that the defender may catch the attacker who is attacking target node n_i from node n_j, where n_i ≠ n_j. In the simple Example we illustrate how a concrete patrolling game may look, and we also show on this example that stationary strategies may not be sufficient.

2.3 Major patrolling games literature

This section gives the reader an overview of how the area of patrolling games has been developing over time and deepens the reader's understanding of patrolling games.

2.3.1 Perimeter

One of the first works on patrolling was [1]. The authors introduced the topic of patrolling and explained why randomized strategies of the defender are needed. Patrolling is done on a circle graph structure, where the distances between any two neighboring nodes are the same. All nodes of the graph are considered to be target nodes, and the utilities for both players of a successful attack or a successful protection are the same for all nodes. The attacker is considered to know the defender's strategy and the current positions of the patrolling units. The authors of [1] work with a perimeter (circle) and homogeneous patrolling units.
They prove that there exists an optimal (among stationary defender's strategies) defender's strategy where the patrolling units move in a synchronized and identical way, which reduces the game to a much simpler game with a single patrolling unit and a smaller graph structure. Three types of defender's movements are considered in [1].

- Bidirectional Movement Patrol (BMP): patrolling units (called robots

in the article) have no movement directionality. Patrolling units go with probability p to the right and with probability 1 − p to the left.
- Directional Zero-Cost Patrol (DZCP): patrolling units go with probability p in the same direction as in the previous move and in the other direction with probability 1 − p.
- Directional Costly-Turn Patrol (DCP): patrolling units go with probability p in the same direction as in the previous move and stay in the same node with probability 1 − p.

The authors found an algorithm which computes an optimal strategy among strategies with BMP. The algorithm can also be used to find an optimal strategy among strategies with DZCP or DCP. Other works of these authors on patrolling include [2], [3], [4].

2.3.2 Arbitrary graph

Patrolling on arbitrary graphs was first considered in [5]. PGIM is very similar to their model of patrolling games; the features listed in this paragraph are common to their model and PGIM. Patrolling is done on a directed graph with edges of length 1, where only some nodes of the graph are target nodes. Let N be the set of all nodes and Q the set of target nodes. When modeling, it often holds that Q ⊊ N. The authors considered a strong attacker who knows the defender's strategy and his current position. They considered a situation with 1 patrolling unit. Every target node n_i takes d_i time steps to attack successfully. Both players know these constants d_i for all target nodes.³ The authors also considered that the defender may catch the attacker attacking node n_j when the defender is in n_i, i ≠ j.⁴

The games in [5] are modeled as general-sum games with the following possible outcomes of the game:

- Intruder-capture: the attacker tries to attack some node and is caught by the defender.
- Penetration-i: the attacker successfully attacks node i.
- No-attack: the attacker does not start an attack.

3. PGIM is simpler in this: there is a single attack length d such that d_i = d for all i.
4. PGIM is simpler in this: we consider that the attacker can be caught by the defender only when they are both in the same node.
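The three outcomes of [5] can be classified mechanically from a defender route and an attack. The function below is our own sketch, not code from [5], and it fixes one possible timing convention as an assumption: an attack started at time t occupies times t through t + d_i − 1.

```python
def outcome(route, attack, d_i):
    """Classify a play into one of the three outcomes of [5].
    `route` is the defender's node sequence (route[t] = node at time t);
    `attack` is None, or a pair (t, n_j): at time t the attacker starts
    attacking node n_j, which takes d_i[n_j] time steps to complete.
    Timing convention (an assumption): the attack runs over times
    t .. t + d_i[n_j] - 1."""
    if attack is None:
        return "No-attack"
    t, n_j = attack
    # The defender catches the attacker if he visits the attacked node
    # while the attack is still in progress.
    if n_j in route[t:t + d_i[n_j]]:
        return "Intruder-capture"
    return f"Penetration-{n_j}"

# A toy route on nodes 0..2 with invented per-target attack lengths.
d_i = {0: 2, 1: 3, 2: 2}
route = [0, 1, 2, 0, 1, 2]
print(outcome(route, (2, 0), d_i))  # defender returns to node 0 in time
print(outcome(route, (4, 0), d_i))  # defender is too far away this time
```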

When defining a game, the utilities of both players for these outcomes must be specified and are known to both players.⁵ The defender can make his next move based only on the previous H moves, where H ∈ N_0 is known to both players before the game starts.⁶ The authors formulate a bilinear mathematical program such that its solution is an optimal defender's strategy among defender's strategies working with H = 1. They suggest using optimization software to find the solution. Situations where H > 1 could be handled by extending the case with H = 1.

The authors discussed the selection of H. Let H_1 > H_2. They argue that 1) it takes longer to compute an optimal strategy among the strategies that use H_1 than among strategies that use H_2; 2) there may be a strategy that uses H_1 which gives higher expected utility than any strategy that uses only H_2; 3) one should select H such that it gives good expected utility and computing an optimal strategy in this set of strategies is not too computationally expensive. They show that for complete graphs, no memory is needed. The authors show an example where using memory H = 1 is not enough to compute an optimal strategy among all strategies; we have found a simpler example of this, see Example. They formulated the problem of finding X such that there is a strategy σ that uses only the last X moves and gives such expected utility that no strategy σ′ which uses the last Y ∈ N moves gives higher expected utility than σ. They found a lower bound on X. See our Conjecture for this problem.

The authors extended their work [5] by finding abstractions in the graph structure [6], which makes the computation faster and enables solving larger game instances in reasonable time. Other works of the authors are [7] and [8], where they extended the framework to multiple heterogeneous resources of the defender and discuss the question of possible limitations in the attacker's knowledge.
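The memoryless sufficiency on complete graphs mentioned above can be explored numerically. The sketch below simulates the uniform memoryless strategy on a clique (our own choice for illustration, not necessarily the optimal strategy of [5]) and compares the estimate with the closed-form capture probability 1 − (1 − 1/(m − 1))^d, which holds because each step independently visits the attacked node with probability 1/(m − 1).

```python
import random

def clique_capture_prob(m, d, trials=50000):
    """Estimate the capture probability of the uniform memoryless strategy
    on a complete graph with m nodes: each step the defender jumps to a
    uniformly random other node; by symmetry the attacked node is fixed."""
    hits = 0
    for _ in range(trials):
        pos, attacked = 0, 1
        for _ in range(d):
            pos = random.choice([v for v in range(m) if v != pos])
            if pos == attacked:
                hits += 1
                break
    return hits / trials

random.seed(2)
m, d = 4, 3
# Each step hits the attacked node with probability 1/(m-1), hence the
# exact capture probability 1 - (1 - 1/(m-1))**d for this strategy.
print(clique_capture_prob(m, d), 1 - (1 - 1 / (m - 1)) ** d)
```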
The authors of [10] introduced a general concept of internal states of the patroller. They suggest working with a set of states S in which the patroller can be during the game. States can represent an observable characteristic (as

5. PGIM is simpler in this: only two outcomes are considered, a successful attack and the other outcome. The utilities in PGIM of a successful attack are 0 for both players; the utility of the other outcome is 1 for the defender and −1 for the attacker. This makes PGIM a zero-sum game.
6. PGIM is more complicated in this: we allow the defender to make his next move based on all of his previous moves.

it was in [1] with the directionality of the movement) or internal beliefs of the patroller. The authors allow defender's transitions on ordered pairs (n, s), where n is a node of the graph and s is a state. They suggested a model where the defender's movements are functions of the current node and the current state. In such a model of patrolling games, they find optimal defender's strategies as solutions of a mixed-integer non-linear program. In their experimental evaluation, the solution of that mixed-integer non-linear program was only locally optimal.

Consider the situation on a circle with DZCP. Let A be a node to the left of B and let the patrolling unit be in B. The internal states are L, R, representing that the patrolling unit is headed to the left or to the right. Let X ∈ {L, R}. Then the movements (B, X) → (A, L) are possible, but the movements (B, X) → (A, R) are not.

2.3.3 Euclidean environment

The nature of patrolling problems is usually that there are targets that need to be protected. There may be no graph structure given by the nature of the problem; consider, for example, patrolling on seas or in the air. Patrolling games in a Euclidean environment (often in some compact subset of R² with the Euclidean metric) differ from patrolling games on graphs only in the structure on which the patroller can move. However, a discretization of space and time can be done in patrolling games in a Euclidean environment, and this discretization leads to patrolling games on graphs. The authors of [10] did some work on this problem. One possible discretization of the continuous problem in a Euclidean environment is to choose N = Q and the graph to be a circle; there is still freedom in choosing the neighbors on the circle. Another possibility is to set N = Q and the graph to be a complete graph (with weighted edges). Or some extra nodes may even be added so that Q ⊊ N, and defender's strategies may take advantage of the possibility of randomization in those extra nodes from N \ Q.
On a particular environment with 4 targets, they compared 5 different models. One was a perimeter with DZCP (see Section 2.3.1); this model gave the worst value. The other 4 models were based on the model from [5]. On this particular instance of the problem, adding more states (from 1 to 2) added more utility than adding 3 extra nodes. However, this result may not hold for other environments with targets. The problem of finding a good model (e.g., choosing a graph and a set of defender's strategies) for given targets in a Euclidean environment is still open.
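Perimeter patrols like those compared above are easy to simulate. The following Monte Carlo sketch estimates, for BMP from Section 2.3.1, the probability that the patrol reaches an attacked node in time; all numbers are invented for illustration.

```python
import random

def bmp_capture_prob(n, p, d, attacked, trials=20000):
    """Monte Carlo estimate of the probability that a Bidirectional
    Movement Patrol on an n-node perimeter (step right w.p. p, left
    w.p. 1-p) visits node `attacked` within the d steps of an attack
    launched while the patroller is at node 0."""
    hits = 0
    for _ in range(trials):
        pos = 0
        for _ in range(d):
            pos = (pos + 1) % n if random.random() < p else (pos - 1) % n
            if pos == attacked:
                hits += 1
                break
    return hits / trials

random.seed(1)
# Attacking the node two steps to the right with attack length 2: the
# patrol catches the attacker only by stepping right twice, i.e. w.p. p*p.
print(bmp_capture_prob(n=8, p=0.5, d=2, attacked=2))  # ≈ 0.25
```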

2.3.4 Moving targets

The authors of [11] give the following examples as reasons to study patrolling games with moving targets (targets that can change position in time): protection of vessels transiting waters with high pirate activity in the maritime domain, and unmanned-aerial-vehicle-based surveillance protecting moving ground targets. Their model of patrolling games is played on graphs and is defined similarly as in [5] (see Section 2.3.2). They extended the model from [5] by considering that the positions of targets on the graph may change in time. They considered a finite set of turns T; the game is repeatedly played over the turns in T. In each turn t ∈ T, every target may arbitrarily change its position in the graph. Formally, they work with an arbitrary function f : Q × T → N. The movement schedule (the function f) is a fixed property of the environment and is known to both players. The defender chooses his next move based on the node he is currently in and on t ∈ T. An optimal defender's strategy is found as a solution of a non-linear mathematical program.

The authors compared these time-dependent strategies with stationary strategies, where the defender makes the next move based only on the node he is currently in. They did this comparison on concrete graphs (a grid, a grid with holes) with targets changing their positions. The results were that the optimal defender's strategy found among time-dependent strategies gave much higher utility than the strategy found among stationary strategies, and finding an optimal strategy among time-dependent strategies took approximately times longer than finding an optimal defender's strategy among stationary strategies. They concluded that this makes time-dependent strategies reasonable to consider.

The authors of [18] also worked with moving targets. They defined a game called MRMT (multiple Mobile Resources protecting Moving Targets).
Their major motivation was the problem of protecting ferries (targets) carrying passengers in waterside cities, which could be attacked by small boats with explosives; patrol boats (patrolling units) could provide protection to such ferries. They provided an abstract problem definition. L ∈ N targets are moving on a line between two points A, B. The game is repeated every day, and the schedule of each target is known to the defender as well as to the attacker. The defender has W patrolling boats, W < L, which can move faster than the targets. The attacker chooses a certain time and target to attack. The utilities are defined so that the game is zero-sum. The set of attacker strategies is considered to be continuous.

The solution is based on an efficient linear program called CASS (Solver for Continuous Attacker Strategies). At the time the paper was published, CASS was being considered for deployment by the US Coast Guard.

2.3.5 Alarms

Models of patrolling games described so far in this thesis have assumed that the attacker's presence can be detected only by the patrolling units. However, realistic security settings often provide alarms such as motion detectors. Authors of [13] build a model of patrolling games with alarms and call games of this model AP-ALARMS. Their model is based on the model of [5], but with alarms added. Formally, the set of alarms A is a subset of the target nodes. Alarms are not perfect: they may be activated when no attack has started and may fail to activate when an attack has started. Each alarm is characterized by false negative and false positive rates {δ_a^{fn}, δ_a^{fp}}_{a∈A}. An alarm can only be deactivated when the patroller enters the corresponding target. The authors model AP-ALARMS as partially observable stochastic games and formulate a non-linear mathematical program to compute an optimal defender's strategy.

3 Main result

3.1 Basic definitions and example

In this section we give basic definitions so that the main result of our thesis can be formulated. We also give a simple example of a game formulated as a PGIM and show on this example that stationary strategies may not be good enough.

A directed graph G is an ordered pair G = (N, E), where N is a finite set of nodes and E ⊆ N × N is a set of edges.

Definition. A PGIM (also called a patrolling game) is a tuple G = (N, E, Q, n_S, d) where (N, E) is a directed graph, Q ⊆ N is the set of target nodes, n_S ∈ Q is the initial node and d ∈ N is the attack length.

Note that in the previous definition, the initial node is a target node, not just any node. See the example below for why not all nodes from N might be initial nodes in a PGIM.

Definition. Let k ∈ N_0. Then a history is a sequence of nodes h = n_0 n_1 … n_k where ∀i, 0 ≤ i < k : (n_i, n_{i+1}) ∈ E, or h = ε is the empty history. The set of all histories is denoted H_ε, and H := H_ε \ {ε}. A run is an infinite sequence of nodes r = n_0 n_1 … where ∀i ∈ N_0 : (n_i, n_{i+1}) ∈ E. The set of all runs is denoted by R. Note that neither histories nor runs have to begin with n_S.

Definition. Let h ∈ H, h = n_0 n_1 … n_k. Then |h| := k is the length of the history h.

Definition. A strategy of the defender (or defender's strategy) is a function σ that assigns to each history h ∈ H, where h = h'n_k, h' ∈ H_ε, a probability distribution over the set of nodes {n ∈ N : (n_k, n) ∈ E}. The probability of the transition from h to hn is denoted by P_σ(hn | h). The set of all defender's strategies is denoted by Σ.

Remark. We do not need to know how the defender plays in histories h which have zero probability of happening (for the probability of a history h happening, see the definition of P_σ below and the following Notation). It may therefore happen that we talk about defender's strategies without saying how the defender plays in histories that cannot happen.
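As a concrete illustration of these definitions, the three-node game used in the example later in this section can be written down directly. The representation below (a `PGIM` class and strategies as functions from histories to successor distributions) is our own illustrative encoding, not part of the formal model:

```python
# A PGIM as a plain record: nodes, edges, target nodes, initial node, attack
# length d. This encoding is our own illustration of the definition above.
class PGIM:
    def __init__(self, nodes, edges, targets, initial, d):
        assert targets <= nodes and initial in targets  # Q subset of N, n_S in Q
        self.nodes, self.edges = nodes, set(edges)
        self.targets, self.initial, self.d = targets, initial, d

    def successors(self, node):
        return [w for (v, w) in self.edges if v == node]

# The three-node game from the example: a line A - B - C, all nodes targets.
G = PGIM(nodes={"A", "B", "C"},
         edges=[("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")],
         targets={"A", "B", "C"}, initial="B", d=4)

# A defender's strategy maps a history (tuple of nodes) to a probability
# distribution over the successors of its last node. A stationary strategy
# only ever looks at the last node; here the only freedom is the
# probability p of moving from B to A.
def stationary_strategy(p):
    def sigma(history):
        last = history[-1]
        if last == "B":
            return {"A": p, "C": 1 - p}
        return {G.successors(last)[0]: 1.0}  # A and C have unique successor B
    return sigma
```

Any candidate strategy can be sanity-checked by verifying that each returned distribution sums to 1 over legal successors only.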

Definition. Let h ∈ H. We say that h' ∈ H is a prefix of h if there exists h'' ∈ H such that h = h'h''. Let r ∈ R. We say that h ∈ H is a prefix of r if there exists r' ∈ R such that r = hr'. Note that h is not a prefix of h.

Let X be a set. F ⊆ 2^X is a σ-field if X ∈ F and F is closed under complement and countable union. A measurable space is a pair (X, F), where X is a set called the sample space and F is a σ-field over X. A probability measure over a measurable space (X, F) is a function P : F → R_{≥0} such that for each countable collection {X_i}_{i∈I} of pairwise disjoint elements of F, P(⋃_{i∈I} X_i) = ∑_{i∈I} P(X_i), and moreover P(X) = 1. A probability space is a triple (X, F, P) where (X, F) is a measurable space and P is a probability measure over (X, F).

We will use the fact that a σ-field F over a set X is closed also under finite union.

Proof. Since X ∈ F, also ∅ ∈ F, because F is closed under complement. Then instead of the finite union A_1 ∪ … ∪ A_n that we want to form, we form the countable union A_1 ∪ … ∪ A_n ∪ ∅ ∪ ∅ ∪ ….

Definition. Let σ ∈ Σ, h ∈ H. Then R_h is defined as follows: R_h = {r ∈ R | r = hr', r' ∈ R}. Let R_H be the σ-field over R generated by {R_h}_{h∈H}. Let P_σ be the unique probability measure over (R, R_H) which satisfies:

∀n ∈ N \ {n_S} : P_σ(R_n) = 0,
P_σ(R_{n_S}) = 1,
∀h ∈ H, n ∈ N : P_σ(R_{hn}) = P_σ(R_h) P_σ(hn | h).

Note that of the first two conditions, only one is necessary, as they imply each other. In the rest of this chapter, we work mainly with the probability spaces (R, R_H, P_σ).

Notation. ∀σ ∈ Σ, h ∈ H : P_σ(h) := P_σ(R_h).

Definition. Let σ ∈ Σ. Then we denote the set of histories that can happen when the defender plays σ by H(σ). Formally, H(σ) := {h ∈ H | P_σ(h) > 0}.
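The third condition above pins P_σ down on cylinder sets: the probability of a history is the product of its one-step transition probabilities, starting from n_S. A minimal sketch of this computation (the strategy encoding, a function from history tuples to successor distributions, is our own illustration):

```python
# P_sigma(h) computed as the product of one-step probabilities
# P_sigma(h_{0..k} | h_{0..k-1}), following the three defining conditions:
# histories not starting in n_S have probability 0, and each extra step
# multiplies in the strategy's transition probability.
def history_probability(sigma, history, initial):
    if history[0] != initial:
        return 0.0                            # first two conditions
    p = 1.0
    for k in range(1, len(history)):
        step = sigma(history[:k])             # distribution after the prefix
        p *= step.get(history[k], 0.0)        # P_sigma(h_{0..k} | h_{0..k-1})
    return p

# Defender on the line A - B - C who moves from B to A or C with
# probability 1/2 each (A and C only connect back to B).
def sigma(history):
    return {"A": 0.5, "C": 0.5} if history[-1] == "B" else {"B": 1.0}

assert history_probability(sigma, ("B", "A", "B", "C"), "B") == 0.25
assert history_probability(sigma, ("A", "B"), "B") == 0.0
```

Histories with probability 0 under σ are exactly those outside H(σ), which is why the Remark above can ignore the defender's behaviour there.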

Definition. A strategy of the attacker (or attacker's strategy) is a function π : H → Q ∪ {⊥} that assigns to each history h = h'n_k, h' ∈ H_ε, the target q at which the attacker starts an attack in h, or ⊥ if the attacker does not start an attack in h. Moreover, ∀h ∈ H with π(h) ∈ Q : π(h') = ⊥ for all h' that are prefixes of h. Let Π denote the set of all attacker's strategies.

Note that in a PGIM, the attacker chooses his strategy knowing not only the defender's moves (the history) but also the defender's strategy; therefore we can intuitively see attacker's strategies also as functions π : H × Σ → Q ∪ {⊥}.

Definition. Let σ ∈ Σ, h ∈ H, h' ∈ H(σ). Then we denote the conditional probability that h happens given that h' happens by P_σ(h | h'). Formally, P_σ(h | h') := P_σ(R_h | R_{h'}).

Definition. Let π ∈ Π. We define the sets of runs that are good for the defender (bad for the attacker) with respect to π. R^π_{GOOD1} is the set of runs on which the attacker does not start an attack, and R^π_{GOOD2} is the set of runs on which the attacker starts an attack but is caught by the defender. They are defined in the following way:

R^π_{GOOD1} := {r ∈ R | ∀h that are prefixes of r : π(h) = ⊥},
R^π_{GOOD2} := {r ∈ R | ∃k ∈ N_0 : r = n_0…n_k n_{k+1}…n_{k+d} r', π(n_0…n_k) ∈ Q, ∃i ∈ {1,…,d} : n_{k+i} = π(n_0…n_k)},
R^π_{GOOD} := R^π_{GOOD1} ∪ R^π_{GOOD2}.

For later purposes we need to know that the probability of this set exists, which is guaranteed by the following lemma.

Lemma. Let π ∈ Π. Then R^π_{GOOD} ∈ R_H.

Proof. First, note that H is a countably infinite set. Also note that it is enough to show that R^π_{GOOD1} ∈ R_H and R^π_{GOOD2} ∈ R_H, because a σ-field is closed under countable (therefore also finite) union. Let us define R^π_{BAD1} := R \ R^π_{GOOD1}. It is clear that R^π_{BAD1} = {r ∈ R | ∃h that is a prefix of r : π(h) ≠ ⊥}. Let H_1 := {h ∈ H | π(h) ≠ ⊥}. Since H is countably infinite, H_1 is also at most countably infinite. It is easy to see that R^π_{BAD1} = ⋃_{h∈H_1} R_h.

This proves that R^π_{BAD1} ∈ R_H, because a σ-field is closed under countable union. But then R^π_{GOOD1} ∈ R_H, because a σ-field is closed under complement. For R^π_{GOOD2}, let us just rewrite it in an equivalent way. The original definition is useful for understanding what we are doing; the following one is useful for showing that R^π_{GOOD2} ∈ R_H:

R^π_{GOOD2} = {r ∈ R | ∃h = h''n that is a prefix of r : ∃h' that is a prefix of h, |h| − |h'| ≤ d, π(h') = n}.

It should be clear that the two definitions of R^π_{GOOD2} are the same. Now let us define H_2 := {h ∈ H | h = h''n, ∃h' that is a prefix of h, |h| − |h'| ≤ d, π(h') = n}. To complete the proof, just note that R^π_{GOOD2} = ⋃_{h∈H_2} R_h.

Definition. Let σ ∈ Σ. The value of σ is val(σ) := inf_{π∈Π} P_σ(R^π_{GOOD}).

Definition. The value of the game is val(G) := sup_{σ∈Σ} inf_{π∈Π} P_σ(R^π_{GOOD}).

Definition. Let σ ∈ Σ. σ is called an optimal defender's strategy if val(σ) = val(G). Let σ ∈ Σ, ε > 0. σ is called an ε-optimal defender's strategy if val(σ) ≥ val(G) − ε.

Example. A patrolling game is played on a graph (N, E), where N = {A, B, C} and E = {(A, B), (B, A), (B, C), (C, B)}, Q = N, n_S = B, d = 4. It is simple to show that when the defender plays any stationary strategy, he cannot guarantee successful protection of all 3 nodes with probability 1. If the defender can choose his next move based on all his previous moves (in fact, the last 2 visited nodes are enough), he can guarantee successful protection of all 3 nodes with probability 1. If the above claim is clear, we suggest skipping the rest of this example. First, note that if the defender is playing a stationary strategy, the only freedom he has in choosing his strategy is the probability p with which he goes from B

to A. If p = 0, then the attacker's strategy may be the following one: at the beginning of the game (in history h = B), attack target node A. Clearly, R^π_{GOOD1} = ∅. Also R^π_{GOOD2} = ∅, because the defender goes from B to C and back and never visits A. The defender's expected utility is 0 when the attacker plays this strategy. If p = 1, the expected utility is again 0 by the symmetry of the nodes A, C in the game. Now assume p ∈ (0, 1). The attacker's strategy may again be the same: attack A at the beginning of the game. What is the probability that the defender visits A within 4 time steps? It is P_σ(R^π_{GOOD2}) = 1 − (1 − p)(1 − p) < 1. If the defender may play based on any history of his moves, he may play the following strategy: repeat the sequence BABC forever, producing the run (BABC)^ω. Every 4 time steps he visits all nodes, so ∀π ∈ Π : R^π_{GOOD} = R, and the attacker has no chance to attack any node successfully.

3.2 Main result

Theorem. For patrolling games with an optimal defender's strategy, we have found a deterministic algorithm which, for every ε > 0, computes an ε-optimal defender's strategy in finite time.

Proof. Idea of the proof: We divide the proof into PART A and PART B. In PART A, we restrict the set of defender's strategies to Σ_t ⊆ Σ and show that, under the assumption that an optimal defender's strategy exists in Σ, there exists an ε-optimal defender's strategy in Σ_t. For PART B, we call the situation where we have a patrolling game G and consider only the defender's strategies from Σ_t a discretized patrolling game (DPG) G_t. In PART B, we find an optimal defender's strategy for G_t, which is, by PART A, guaranteed to be ε-optimal in G.

PART A formally:

Definition. Let t ∈ N. Then Σ_t ⊆ Σ is the set of defender's strategies where the probabilities of transitions between two nodes can only be of the form i/t, where i ∈ N_0, i ≤ t.

Theorem. Let σ ∈ Σ be an optimal defender's strategy, ε > 0. Then there exists t ∈ N and a strategy σ' ∈ Σ_t such that val(σ') ≥ val(σ) − ε. Moreover, given ε > 0, we can find such t by a deterministic algorithm in finite time.
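The restriction to Σ_t can be pictured as forcing every one-step distribution onto the grid {0, 1/t, …, 1}. The largest-remainder rounding below is our own illustration of one way to move an arbitrary distribution onto this grid while keeping it a distribution; it is not the construction used in the thesis, which only needs that grid strategies approximate σ well for large t:

```python
from fractions import Fraction

# Round a probability distribution to multiples of 1/t so that it stays a
# distribution: floor every mass onto the grid, then hand the remaining
# grid units to the entries with the largest remainders.
def round_to_grid(dist, t):
    floors = {n: int(p * t) for n, p in dist.items()}   # largest i with i/t <= p
    deficit = t - sum(floors.values())                  # grid units still to place
    by_remainder = sorted(dist, key=lambda n: dist[n] * t - floors[n], reverse=True)
    for n in by_remainder[:deficit]:
        floors[n] += 1
    return {n: Fraction(i, t) for n, i in floors.items()}

d = round_to_grid({"A": 0.35, "B": 0.33, "C": 0.32}, t=10)
assert sum(d.values()) == 1
assert all(x.denominator <= 10 for x in d.values())
```

Each rounded mass differs from the original by less than 1/t, which is the kind of per-step error a discretization argument then has to propagate through the whole game.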
PART B formally:

Definition. Let t ∈ N. A discretized patrolling game (DPG) G_t is a patrolling game G with the set of defender's strategies restricted from Σ to Σ_t.

Note that the previous theorem could be equivalently stated as:

Theorem. Let G be a patrolling game with an optimal defender's strategy, ε > 0. Then there exists t ∈ N such that val(G_t) ≥ val(G) − ε. Moreover, given ε > 0, we can find such t by a deterministic algorithm in finite time.

Theorem. Let G_t be a DPG. Then we can compute an optimal defender's strategy in G_t in finite time.

The hard part of the previous proof is the discretization theorem of PART A; the whole of Chapter 4 is devoted to its proof. We consider the PART B theorem to be quite simple, and we prove it in the rest of this section.

Sketch of the proof of the PART B theorem. Given a DPG G_t, we model it as a simple stochastic game (SSG) H¹. We do it in such a way that each defender's strategy in G_t has an equivalent 2-tuple (strategy of player Min, starting vertex²) in H. A defender's strategy and its equivalent 2-tuple have equal values. Then we use a result from [21] (see the theorem below) to find optimal strategies in H in finite time. To construct an optimal defender's strategy, we find an appropriate starting vertex in H. We construct an optimal defender's strategy from the found optimal strategy in H and the found appropriate starting vertex in H.

First, we need the definition of an SSG, of its value, and of optimal strategies. The following paragraphs are all based on [21], up to the point where we start modeling DPGs as SSGs.

Definition. An SSG is a tuple (V, V_Max, V_Min, V_R, E, t, p), where (V, E) is a directed graph, (V_Max, V_Min, V_R) is a partition of V, t ∈ V is the target vertex, and ∀v ∈ V_R, w ∈ V : p(w | v) is the transition probability from v to w, with the property ∑_{w∈V} p(w | v) = 1.

Definition. A run is an infinite sequence v_0 v_1 … ∈ V^ω of vertices such that if v_n ∈ (V_Max ∪ V_Min) then (v_n, v_{n+1}) ∈ E, and if v_n ∈ V_R then p(v_{n+1} | v_n) > 0. A run is won by Max if it visits the target vertex; otherwise the run is won by Min. A history is a finite prefix of a run.

Definition. A strategy for player Max is a function π : V*V_Max → V such that for each history h = v_0 … v_n with v_n ∈ V_Max, we have (v_n, π(h)) ∈ E.
¹ In the rest of this section, we do not work with histories, so we borrow the symbol H.
² To be consistent with [21] and to distinguish between nodes/vertices of a DPG/SSG, we use the term vertex for a node of the graph in an SSG.

A run v_0 v_1 … is consistent with π if for every n ∈ N, if v_n ∈ V_Max then v_{n+1} = π(v_0 … v_n). A strategy for player Min is defined similarly, and we denote it by σ.³

Once the initial vertex v ∈ V and strategies π, σ for players Max and Min are fixed, we can measure the probability that a given set of runs occurs. This probability measure is denoted by P^{π,σ}_v. For n ∈ N, we denote by V_n the random variable defined by V_n(v_0 v_1 …) = v_n. The set of runs is equipped with the σ-field generated by the random variables (V_n)_{n∈N}. There exists a probability measure P^{π,σ}_v with the following properties:

P^{π,σ}_v(V_0 = v) = 1,
P^{π,σ}_v(V_{n+1} = π(v_0 … V_n) | V_n ∈ V_Max) = 1,
P^{π,σ}_v(V_{n+1} = σ(v_0 … V_n) | V_n ∈ V_Min) = 1,
P^{π,σ}_v(V_{n+1} = w | V_n ∈ V_R) = p(w | V_n).

The goal of player Max is to reach the target vertex t with the highest probability possible, whereas player Min has the opposite goal.

Notation. Reach(t) is the event from the σ-field defined as Reach(t) := {∃n ∈ N : V_n = t}.

Now let us give the definition of the value of the game and of optimal strategies for both players of an SSG.

Definition. Let v ∈ V be the starting vertex. Player Max can ensure a win from v with probability arbitrarily close to

val_∗(v) := sup_π inf_σ P^{π,σ}_v(Reach(t)).

Symmetrically, player Min can ensure that player Max cannot win with a probability much higher than

val^∗(v) := inf_σ sup_π P^{π,σ}_v(Reach(t)).

If val_∗(v) = val^∗(v), then we define the value of the game in v as val(v) := val_∗(v).

³ We swap the notation π and σ from [21]. We do so because we will later model the attacker as player Max and the defender as player Min.

The following definition is our own.

Definition. Let v ∈ V be the starting vertex and σ a strategy of player Min. Then the value of strategy σ in starting vertex v is defined as val(σ, v) := sup_π P^{π,σ}_v(Reach(t)).

Definition. Let v ∈ V. π is called an optimal strategy for player Max if inf_σ P^{π,σ}_v(Reach(t)) = sup_π inf_σ P^{π,σ}_v(Reach(t)). σ is called an optimal strategy for player Min if sup_π P^{π,σ}_v(Reach(t)) = inf_σ sup_π P^{π,σ}_v(Reach(t)).

The definition of a stationary strategy for an SSG is the same as for a patrolling game, but let us recall it.

Definition. A strategy of a player in an SSG is said to be stationary if the next move depends only on the current vertex.

The following 2 results mentioned in [21] are important for us. The first of them is known from [31, 16]; we include it here because it says something useful about optimal strategies for SSGs and therefore also for DPGs. The second one is from [21] and is important for our proof of the PART B theorem.

Theorem [31, 16]. For any SSG, val(v) exists for every v ∈ V, and optimal stationary strategies for both players exist.

Theorem [21]. Values and optimal strategies of an SSG G = (V, V_Max, V_Min, V_R, E, t, p) are computable in time O(|V_R|! (|V||E| + |p|)), where |p| is the maximal bit-length of a transition probability in p.

A complexity bound is mentioned in the above theorem; however, for us it is enough that the time is finite, as |V_R| in our SSGs would be too big to consider our approach for use in practice.

Now we need to find an SSG H for a DPG G_t and a bijection f from defender's strategies to 2-tuples (strategy of player Min, starting vertex), so that ∀σ ∈ Σ_t : val(σ) = val(f(σ)).
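For intuition about what the values val(v) of an SSG look like, here is a plain value-iteration sketch for reachability values. This is our own illustration under the standard fixed-point characterization of SSG values; it only converges to the values from below and is not the finite-time exact algorithm of [21] that the proof relies on:

```python
# Value iteration for simple stochastic (reachability) games: the target has
# value 1, Max vertices take the max over successors, Min vertices the min,
# and random vertices the p-weighted average. Repeated sweeps converge to
# the values from below.
def ssg_values(V, V_max, V_min, V_rand, E, target, p, iterations=1000):
    succ = {x: [w for (u, w) in E if u == x] for x in V}
    v = {x: (1.0 if x == target else 0.0) for x in V}
    for _ in range(iterations):
        for x in V:
            if x == target or not succ[x]:
                continue                                  # target / dead ends are fixed
            if x in V_max:
                v[x] = max(v[w] for w in succ[x])         # Max picks the best successor
            elif x in V_min:
                v[x] = min(v[w] for w in succ[x])         # Min picks the worst
            else:
                v[x] = sum(q * v[w] for w, q in p[x].items())  # vertex in V_R
    return v

# Tiny instance: Max vertex m chooses between the target t and a random
# vertex r that reaches t with probability 1/2 (or gets stuck in s).
V = {"m", "r", "t", "s"}
E = [("m", "t"), ("m", "r"), ("r", "t"), ("r", "s")]
vals = ssg_values(V, {"m"}, set(), {"r"}, E, "t", {"r": {"t": 0.5, "s": 0.5}})
assert vals["m"] == 1.0 and vals["r"] == 0.5
```

Since Max maximizes at m, it prefers the direct edge to t over the coin vertex r, so val(m) = 1 even though val(r) = 1/2; the exact algorithm of [21] computes such values (and optimal stationary strategies) in finite time rather than in the limit.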


More information

Winkler s Hat Guessing Game: Better Results for Imbalanced Hat Distributions

Winkler s Hat Guessing Game: Better Results for Imbalanced Hat Distributions arxiv:1303.705v1 [math.co] 8 Mar 013 Winkler s Hat Guessing Game: Better Results for Imbalanced Hat Distributions Benjamin Doerr Max-Planck-Institute for Informatics 6613 Saarbrücken Germany April 5, 018

More information

Lecture 1: Overview of percolation and foundational results from probability theory 30th July, 2nd August and 6th August 2007

Lecture 1: Overview of percolation and foundational results from probability theory 30th July, 2nd August and 6th August 2007 CSL866: Percolation and Random Graphs IIT Delhi Arzad Kherani Scribe: Amitabha Bagchi Lecture 1: Overview of percolation and foundational results from probability theory 30th July, 2nd August and 6th August

More information

Algorithms, Games, and Networks January 17, Lecture 2

Algorithms, Games, and Networks January 17, Lecture 2 Algorithms, Games, and Networks January 17, 2013 Lecturer: Avrim Blum Lecture 2 Scribe: Aleksandr Kazachkov 1 Readings for today s lecture Today s topic is online learning, regret minimization, and minimax

More information

Game Theory and its Applications to Networks - Part I: Strict Competition

Game Theory and its Applications to Networks - Part I: Strict Competition Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis

More information

COMBINATORIAL GAMES AND SURREAL NUMBERS

COMBINATORIAL GAMES AND SURREAL NUMBERS COMBINATORIAL GAMES AND SURREAL NUMBERS MICHAEL CRONIN Abstract. We begin by introducing the fundamental concepts behind combinatorial game theory, followed by developing operations and properties of games.

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm CS61: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm Tim Roughgarden February 9, 016 1 Online Algorithms This lecture begins the third module of the

More information

Notes 6 : First and second moment methods

Notes 6 : First and second moment methods Notes 6 : First and second moment methods Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Roc, Sections 2.1-2.3]. Recall: THM 6.1 (Markov s inequality) Let X be a non-negative

More information

The structure of bull-free graphs I three-edge-paths with centers and anticenters

The structure of bull-free graphs I three-edge-paths with centers and anticenters The structure of bull-free graphs I three-edge-paths with centers and anticenters Maria Chudnovsky Columbia University, New York, NY 10027 USA May 6, 2006; revised March 29, 2011 Abstract The bull is the

More information

Advanced topic: Space complexity

Advanced topic: Space complexity Advanced topic: Space complexity CSCI 3130 Formal Languages and Automata Theory Siu On CHAN Chinese University of Hong Kong Fall 2016 1/28 Review: time complexity We have looked at how long it takes to

More information

1 Primals and Duals: Zero Sum Games

1 Primals and Duals: Zero Sum Games CS 124 Section #11 Zero Sum Games; NP Completeness 4/15/17 1 Primals and Duals: Zero Sum Games We can represent various situations of conflict in life in terms of matrix games. For example, the game shown

More information

1 Equilibrium Comparisons

1 Equilibrium Comparisons CS/SS 241a Assignment 3 Guru: Jason Marden Assigned: 1/31/08 Due: 2/14/08 2:30pm We encourage you to discuss these problems with others, but you need to write up the actual homework alone. At the top of

More information

CS 6783 (Applied Algorithms) Lecture 3

CS 6783 (Applied Algorithms) Lecture 3 CS 6783 (Applied Algorithms) Lecture 3 Antonina Kolokolova January 14, 2013 1 Representative problems: brief overview of the course In this lecture we will look at several problems which, although look

More information

Efficient Solutions for Joint Activity Based Security Games: Fast Algorithms, Results and a Field Experiment on a Transit System 1

Efficient Solutions for Joint Activity Based Security Games: Fast Algorithms, Results and a Field Experiment on a Transit System 1 Noname manuscript No. (will be inserted by the editor) Efficient Solutions for Joint Activity Based Security Games: Fast Algorithms, Results and a Field Experiment on a Transit System 1 Francesco M. Delle

More information

Lazy Defenders Are Almost Optimal Against Diligent Attackers

Lazy Defenders Are Almost Optimal Against Diligent Attackers Lazy Defenders Are Almost Optimal Against Diligent Attacers Avrim Blum Computer Science Department Carnegie Mellon University avrim@cscmuedu Nia Haghtalab Computer Science Department Carnegie Mellon University

More information

Symmetric Rendezvous in Graphs: Deterministic Approaches

Symmetric Rendezvous in Graphs: Deterministic Approaches Symmetric Rendezvous in Graphs: Deterministic Approaches Shantanu Das Technion, Haifa, Israel http://www.bitvalve.org/~sdas/pres/rendezvous_lorentz.pdf Coauthors: Jérémie Chalopin, Adrian Kosowski, Peter

More information

The efficiency of identifying timed automata and the power of clocks

The efficiency of identifying timed automata and the power of clocks The efficiency of identifying timed automata and the power of clocks Sicco Verwer a,b,1,, Mathijs de Weerdt b, Cees Witteveen b a Eindhoven University of Technology, Department of Mathematics and Computer

More information

Lecture 14 - P v.s. NP 1

Lecture 14 - P v.s. NP 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 27, 2018 Lecture 14 - P v.s. NP 1 In this lecture we start Unit 3 on NP-hardness and approximation

More information

Optimization Prof. A. Goswami Department of Mathematics Indian Institute of Technology, Kharagpur. Lecture - 20 Travelling Salesman Problem

Optimization Prof. A. Goswami Department of Mathematics Indian Institute of Technology, Kharagpur. Lecture - 20 Travelling Salesman Problem Optimization Prof. A. Goswami Department of Mathematics Indian Institute of Technology, Kharagpur Lecture - 20 Travelling Salesman Problem Today we are going to discuss the travelling salesman problem.

More information

Lecture 5: Probabilistic tools and Applications II

Lecture 5: Probabilistic tools and Applications II T-79.7003: Graphs and Networks Fall 2013 Lecture 5: Probabilistic tools and Applications II Lecturer: Charalampos E. Tsourakakis Oct. 11, 2013 5.1 Overview In the first part of today s lecture we will

More information

Sequence convergence, the weak T-axioms, and first countability

Sequence convergence, the weak T-axioms, and first countability Sequence convergence, the weak T-axioms, and first countability 1 Motivation Up to now we have been mentioning the notion of sequence convergence without actually defining it. So in this section we will

More information

The domination game played on unions of graphs

The domination game played on unions of graphs The domination game played on unions of graphs Paul Dorbec 1,2 Gašper Košmrlj 3 Gabriel Renault 1,2 1 Univ. Bordeaux, LaBRI, UMR5800, F-33405 Talence 2 CNRS, LaBRI, UMR5800, F-33405 Talence Email: dorbec@labri.fr,

More information

CONSTRUCTION OF THE REAL NUMBERS.

CONSTRUCTION OF THE REAL NUMBERS. CONSTRUCTION OF THE REAL NUMBERS. IAN KIMING 1. Motivation. It will not come as a big surprise to anyone when I say that we need the real numbers in mathematics. More to the point, we need to be able to

More information

M. Smith. 6 September 2016 / GSAC

M. Smith. 6 September 2016 / GSAC , Complexity, and Department of Mathematics University of Utah 6 September 2016 / GSAC Outline 1 2 3 4 Outline 1 2 3 4 Motivation The clock puzzle is an infamous part of the video game XIII-2 (2011). Most

More information

Time Synchronization

Time Synchronization Massachusetts Institute of Technology Lecture 7 6.895: Advanced Distributed Algorithms March 6, 2006 Professor Nancy Lynch Time Synchronization Readings: Fan, Lynch. Gradient clock synchronization Attiya,

More information

Paths and cycles in extended and decomposable digraphs

Paths and cycles in extended and decomposable digraphs Paths and cycles in extended and decomposable digraphs Jørgen Bang-Jensen Gregory Gutin Department of Mathematics and Computer Science Odense University, Denmark Abstract We consider digraphs called extended

More information

NP Completeness and Approximation Algorithms

NP Completeness and Approximation Algorithms Chapter 10 NP Completeness and Approximation Algorithms Let C() be a class of problems defined by some property. We are interested in characterizing the hardest problems in the class, so that if we can

More information

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler Complexity Theory Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 15 May, 2018 Reinhard

More information

Synthesis weakness of standard approach. Rational Synthesis

Synthesis weakness of standard approach. Rational Synthesis 1 Synthesis weakness of standard approach Rational Synthesis 3 Overview Introduction to formal verification Reactive systems Verification Synthesis Introduction to Formal Verification of Reactive Systems

More information

Game Theory: Spring 2017

Game Theory: Spring 2017 Game Theory: Spring 207 Ulle Endriss Institute for Logic, Language and Computation University of Amsterdam Ulle Endriss Plan for Today We have seen that every normal-form game has a Nash equilibrium, although

More information

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181.

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181. Complexity Theory Complexity Theory Outline Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität

More information

Problem Set 3 Due: Wednesday, October 22nd, 2014

Problem Set 3 Due: Wednesday, October 22nd, 2014 6.89: Algorithmic Lower Bounds Fall 24 Prof. Erik Demaine TAs: Sarah Eisenstat, Jayson Lynch Problem Set 3 Due: Wednesday, October 22nd, 24 Problem. A Tour of Hamiltonicity Variants For each of the following

More information

Lecture 9: Dynamics in Load Balancing

Lecture 9: Dynamics in Load Balancing Computational Game Theory Spring Semester, 2003/4 Lecture 9: Dynamics in Load Balancing Lecturer: Yishay Mansour Scribe: Anat Axelrod, Eran Werner 9.1 Lecture Overview In this lecture we consider dynamics

More information

Industrial Organization Lecture 3: Game Theory

Industrial Organization Lecture 3: Game Theory Industrial Organization Lecture 3: Game Theory Nicolas Schutz Nicolas Schutz Game Theory 1 / 43 Introduction Why game theory? In the introductory lecture, we defined Industrial Organization as the economics

More information

RANDOM WALKS IN ONE DIMENSION

RANDOM WALKS IN ONE DIMENSION RANDOM WALKS IN ONE DIMENSION STEVEN P. LALLEY 1. THE GAMBLER S RUIN PROBLEM 1.1. Statement of the problem. I have A dollars; my colleague Xinyi has B dollars. A cup of coffee at the Sacred Grounds in

More information

Recap of Basic Probability Theory

Recap of Basic Probability Theory 02407 Stochastic Processes Recap of Basic Probability Theory Uffe Høgsbro Thygesen Informatics and Mathematical Modelling Technical University of Denmark 2800 Kgs. Lyngby Denmark Email: uht@imm.dtu.dk

More information

10.3 Random variables and their expectation 301

10.3 Random variables and their expectation 301 10.3 Random variables and their expectation 301 9. For simplicity, assume that the probabilities of the birth of a boy and of a girl are the same (which is not quite so in reality). For a certain family,

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

Introduction to Computational Complexity

Introduction to Computational Complexity Introduction to Computational Complexity Tandy Warnow October 30, 2018 CS 173, Introduction to Computational Complexity Tandy Warnow Overview Topics: Solving problems using oracles Proving the answer to

More information

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden 1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized

More information

CS Introduction to Complexity Theory. Lecture #11: Dec 8th, 2015

CS Introduction to Complexity Theory. Lecture #11: Dec 8th, 2015 CS 2401 - Introduction to Complexity Theory Lecture #11: Dec 8th, 2015 Lecturer: Toniann Pitassi Scribe Notes by: Xu Zhao 1 Communication Complexity Applications Communication Complexity (CC) has many

More information

Math 320-2: Final Exam Practice Solutions Northwestern University, Winter 2015

Math 320-2: Final Exam Practice Solutions Northwestern University, Winter 2015 Math 30-: Final Exam Practice Solutions Northwestern University, Winter 015 1. Give an example of each of the following. No justification is needed. (a) A closed and bounded subset of C[0, 1] which is

More information

6.262: Discrete Stochastic Processes 2/2/11. Lecture 1: Introduction and Probability review

6.262: Discrete Stochastic Processes 2/2/11. Lecture 1: Introduction and Probability review 6.262: Discrete Stochastic Processes 2/2/11 Lecture 1: Introduction and Probability review Outline: Probability in the real world Probability as a branch of mathematics Discrete stochastic processes Processes

More information

Lecture Notes 1 Basic Probability. Elements of Probability. Conditional probability. Sequential Calculation of Probability

Lecture Notes 1 Basic Probability. Elements of Probability. Conditional probability. Sequential Calculation of Probability Lecture Notes 1 Basic Probability Set Theory Elements of Probability Conditional probability Sequential Calculation of Probability Total Probability and Bayes Rule Independence Counting EE 178/278A: Basic

More information

Lecture Notes Each circuit agrees with M on inputs of length equal to its index, i.e. n, x {0, 1} n, C n (x) = M(x).

Lecture Notes Each circuit agrees with M on inputs of length equal to its index, i.e. n, x {0, 1} n, C n (x) = M(x). CS 221: Computational Complexity Prof. Salil Vadhan Lecture Notes 4 February 3, 2010 Scribe: Jonathan Pines 1 Agenda P-/NP- Completeness NP-intermediate problems NP vs. co-np L, NL 2 Recap Last time, we

More information

Chapter 3 Deterministic planning

Chapter 3 Deterministic planning Chapter 3 Deterministic planning In this chapter we describe a number of algorithms for solving the historically most important and most basic type of planning problem. Two rather strong simplifying assumptions

More information

Nash-solvable bidirected cyclic two-person game forms

Nash-solvable bidirected cyclic two-person game forms DIMACS Technical Report 2008-13 November 2008 Nash-solvable bidirected cyclic two-person game forms by Endre Boros 1 RUTCOR, Rutgers University 640 Bartholomew Road, Piscataway NJ 08854-8003 boros@rutcor.rutgers.edu

More information

On the Complexity of the Minimum Independent Set Partition Problem

On the Complexity of the Minimum Independent Set Partition Problem On the Complexity of the Minimum Independent Set Partition Problem T-H. Hubert Chan 1, Charalampos Papamanthou 2, and Zhichao Zhao 1 1 Department of Computer Science the University of Hong Kong {hubert,zczhao}@cs.hku.hk

More information

Show that the following problems are NP-complete

Show that the following problems are NP-complete Show that the following problems are NP-complete April 7, 2018 Below is a list of 30 exercises in which you are asked to prove that some problem is NP-complete. The goal is to better understand the theory

More information

Recap of Basic Probability Theory

Recap of Basic Probability Theory 02407 Stochastic Processes? Recap of Basic Probability Theory Uffe Høgsbro Thygesen Informatics and Mathematical Modelling Technical University of Denmark 2800 Kgs. Lyngby Denmark Email: uht@imm.dtu.dk

More information

Lazy Defenders Are Almost Optimal Against Diligent Attackers

Lazy Defenders Are Almost Optimal Against Diligent Attackers Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Lazy Defenders Are Almost Optimal Against Diligent Attacers Avrim Blum Computer Science Department Carnegie Mellon University

More information