Randomized Sampling for Large Zero-Sum Games,

Size: px

Start display at page:

Download "Randomized Sampling for Large Zero-Sum Games,"

Marshall Walker
6 years ago
Views:

1 Randomized Sampling for Large Zero-Sum Games, Shaunak D. Bopardikar a,, Alessandro Borri b, João P. Hespanha c, Maria Prandini d, Maria D. Di Benedetto e a United Technologies Research Center Inc., Berkele, CA, USA b Istituto di Analisi dei Sistemi ed Informatica A. Ruberti, Consiglio Nazionale delle Ricerche IASI-CNR), Rome, Ital. c Center for Control, Dnamical Sstems and Computation, Universit of California at Santa Barbara, CA 93106, USA d Dipartimento di Elettronica e Informazione, Politecnico di Milano, Ital e Department of Electrical and Information Engineering, Universit of L Aquila, Ital Abstract This paper addresses the solution of large zero-sum matrix games using randomized methods. We formalize a procedure, termed as the sampled securit polic SSP) algorithm, b which a plaer can compute policies that, with a high confidence, are securit policies against an adversar using randomized methods to explore the possible outcomes of the game. The SSP algorithm essentiall consists of solving a stochasticall sampled subgame that is much smaller than the original game. We also propose a randomized algorithm, termed as the sampled securit value SSV) algorithm, which computes a high-confidence securit-level i.e., worst-case outcome) for a given polic, which ma or ma not have been obtained using the SSP algorithm. For both the SSP and the SSV algorithms we provide results to determine how man samples are needed to guarantee a desired level of confidence. We start b providing results when the two plaers sample policies with the same distribution and subsequentl extend these results to the case of mismatched distributions. We demonstrate the usefulness of these results in a hide-and-seek game that exhibits exponential complexit. Kewords: Game theor, Randomized algorithms, Zero-Sum Games, Optimization 1. Introduction This paper addresses zero-sum games in which one or both plaers are faced with a large number of choices, possibl infinitel man. For such games, the computation of securit levels i.e., worst-case outcomes) and the corresponding securit policies requires the exploration of ver large decision trees. Games where plaers are faced with deciding among a ver large number of options arise in combinatorial problems, where the number of possible options grows exponentiall with the size of the problem. This situation is common to man domains: In path planning, the number of possible decisions tpicall increases exponentiall with the number of points to visit [cf., e.g., Bellman 1962)]. This material is based upon work supported in part b ARO MURI Grant number W911NF , in part b the Center of Excellence for Research DEWS, Universit of L Aquila, Ital, and in part b the European Commission under the MoVeS project, FP7- ICT A preliminar version of this work entitled Randomized Sampling for Large Zero-Sum Games was presented at the 2010 IEEE Conference on Decision and Control, Atlanta, GA, USA. Corresponding author. This work was performed while this author was at the Universit of California Santa Barbara, CA, USA. addresses: bshaunak@gmail.com Shaunak D. Bopardikar), alessandro.borri@iasi.cnr.it Alessandro Borri), hespanha@ece.ucsb.edu João P. Hespanha), prandini@elet.polimi.it Maria Prandini), mariadomenica.dibenedetto@univaq.it Maria D. Di Benedetto) In network securit, sstem administrators need to consider multi-stage, multi-host attacks that ma consist of long sequences of actions b an attacker in their attempt to circumvent the sstem defenses [cf., e.g., Le and Wing 2005)]. In practice, this leads to polic spaces that grow exponentiall with the number of stages involved in an attack. More generall, in partial-information feedback games plaers must choose feedback policies that assign an action to each possible observation and therefore the number of feedback policies grows exponentiall with the size of the plaers observation spaces [cf., e.g., Hespanha and Prandini 2001); Bopardikar and Hespanha 2011)]. The exploration of large polic spaces is generall a hard task that can become computationall intractable when addressing partial information games [NP-complete in the size of the game tree, see Frank and Basin 1999)]. A method to face this issue is Monte Carlo sampling. The ke idea is to confine the search to a decision tree of reduced size b guessing or sampling the other plaer s moves, and then use conventional minimax search to determine the strateg to pla with. Techniques based on this idea has been successfull applied to several partial information games such as, e.g., Scrabble [cf. Frank 1989)], Bridge [cf. Ginsberg 1996)], and Kriegspiel chess [cf. Parker et al. 2005)]. The surve paper Browne et al. 2012) shows how Monte Carlo sampling has becoming increasing popular and has been extensivel adopted, not onl in the game context, but also in other domains such as e.g. path planning. Indeed, when the underling sstem is stochastic Preprint submitted to Elsevier November 16, 2012

2 but it is difficult to derive an analtic description of the probabilistic distribution characterizing its evolution, solutions based on simulation are tpicall adopted, which entails the use of Monte Carlo sampling. The recent successes in using randomized methods to explore large decision trees [e.g., in LaValle and Kuffner 2001); Hinton et al. 2006); Browne et al. 2012)] motivates the question that is behind the results in this paper: Suppose that m opponent is using a randomized algorithm to explore the game decision tree, can I produce a securit value and an associated securit polic that are correct with high probabilit? The answer to this question is affirmative and we show that such securit values/policies can be constructed using randomized algorithms. What is somewhat surprising about the results reported is that one can obtain high-confidence securit policies b restricting ones attention to a subset of policies that can be much smaller than the total set of policies available to the plaers. Moreover, this restricted set of policies ma be quite different from the set of policies that the opponent considered in her randomized exploration of the game decision tree. We call sampled securit polic SSP) the randomized algorithm proposed to obtain high-confidence securit values/policies. The SSP algorithm can be described as follows: Suppose that plaer P 2 selected a polic based on a random exploration of the policies available to both plaers. The precise algorithm used b P 2 to select her polic based on this extraction is tpicall unknown. Plaer P 1 should then proceed as follows: randoml select a subset of the total set of available policies to both plaers; construct the zero-sum matrix game corresponding to the selected subset of policies, ignoring all other policies available to the two plaers; and compute the securit value/policies associated with the matrix. Plaer P 1 can select either a mixed or a pure value/polic. In both cases, since a large number of policies have been ignored, the securit policies obtained b this process will generall not be securit policies for the whole game and therefore plaer P 1 ma obtain an outcome that is strictl worse than the value computed based on her submatrix. However, we show that this happens with low probabilit as long as the size of the submatrix is sufficientl large. Moreover, this result holds regardless of the algorithm used b P 2 to compute her polic based on the random tree exploration. In fact, P 2 could also be using the SSP algorithm to compute her own polic. Related Work Two-plaer zero-sum matrix games have been studied extensivel over the past decades [cf. the textbook b Basar and Olsder 1999)]. The classical Mini-Max theorem [cf. Von Neumann 1928)] guarantees the existence of an optimal pair of strategies for the two plaers, each of which is a securit polic for the corresponding plaer. However, when the matrix is of large size, the computation of the optimal strategies involves solving optimization problems with a large number of variables and constraints. 2 A probabilistic approach has proven to be computationall efficient in evaluating games with large sizes. Using probabilistic analsis, the existence of simple, nearoptimal strategies over a subset with logarithmicall smaller size of the original matrix game was established in Lipton and Young 1994). A popular method to solve win-lose tpe of multi-stage or dnamic games is to evaluate the root of a game tree, in which ever node is alternatel an AND and an OR operation, while the leaves have a value of either 0 or 1. Motwani and Raghavan 1995) present randomized algorithms to evaluate such game trees more efficientl than using deterministic algorithms. Randomized methods have been successful in providing efficient solutions to complex control design problems with probabilistic guarantees. Khargonekar and Tikku 1996) adopt a probabilistic approach to show the existence of randomized algorithms with polnomial complexit to solve complex robust stabilit analsis problems. Tempo et al. 1997) propose a randomized method for a probabilistic analsis of the worst-case controller performance, and determine sample size bounds. More recentl, Tempo et al. 2004) discuss the application of randomized methods to several control design problems in the presence of uncertaint. Randomized methods have also been used to provide a probabilistic approximation to the minimax value of a cost function in robust control design problems [cf. Fujisaki and Kozawa 2003)]. Their sample complexit requirement is, in general, much higher than for the notion of securit that we propose in this paper, since the are concerned with the sampled minimax value being close to the global minimax value with high confidence. A randomized approach is used in the linear programming reformulation of approximate dnamic programming in de Farias and Ro 2004). Vidasagar 1998); Vidasagar and Blondel 2001) demonstrate the use of randomized algorithms to solve control design problems and a number of well known complex problems in matrix theor through a statistical learning approach. Statistical learning theor [cf. Vapnik 1998)] provides a framework for probabilistic robust control snthesis. Using these tools, Alamo et al. 2009) consider semi-infinite optimization problems under uncertaint with a possibl non-convex objective function. In Calafiore and Campi 2006); Campi and Garatti 2008); Campi and Calafiore 2009), the authors introduce the so-called scenario approach to solve convex optimization problems with an infinite number of constraints. Possible applications of this approach to sstems and control are discussed in Calafiore and Campi 2006) and in Campi et al. 2009). Calafiore 2009) and Calafiore 2010) stud the sample complexit of randomized approaches to sstem analsis and design, and provide, in particular, an explicit expression of the sample-size for the scenario approach to convex optimization based on an approximation of the implicit expression given in Campi and Garatti 2008). These bounds were further refined in Alamo et al. 2010). The results in these papers are instrumental to

3 establish several of the results in the present paper. Contributions Throughout the paper, we explain the results from the perspective of the plaer P 1 the minimizer, who finds herself plaing against an opponent P 2 the maximizer, who computes her polic based on a random exploration of the game decision tree. The contributions of this paper are four-fold. First, we show that the SSP algorithm provides a securit polic for P 1 with probabilit 1 δ, provided that the size of the subgame solved b P 1 is sufficientl large. We provide two bounds on the size of the subgame, one that is valid when P 1 uses general mixed policies and the other when P 1 is restricted to consider onl pure policies. The latter ma require much smaller submatrix sizes when the entries of the matrix A take values in a finite set. The bounds are game independent and can be computable a-priori for an desired confidence level 1 δ, δ > 0. While the size of the subgame grows with the desired confidence level 1 δ, it is completel independent of the size of the original matrix game, which could, in fact, be even infinite and not even have a value. Moreover, this bound is also independent of the precise algorithm that P 2 uses to construct her polic based on the portion of the tree that she explored. The results outlined above assume that, while P 1 does not know the precise subtree that P 2 explored to compute her polic, P 1 does know the distribution that P 2 used to construct her random subtree. When this is not the case, there will be a mismatch between the distribution that P 1 uses in the SSP algorithm and the distribution that P 2 uses for her random exploration. The second contribution of the paper addresses this issue via two approaches. The first approach adopts sample complexit bounds obtained in Erdoǧan and Iengar 2006), which deals with the socalled ambiguous chance constrained problems. More precisel, we determine bounds on the sizes of the submatrices in the SSP algorithm when the mismatch between the distributions used b the two plaers remains below a specified distance ρ < 1, measured in the Prohorov metric. This approach requires no knowledge of the matrix game, but the bounds hold onl when the confidence parameter satisfies the condition ρ < δ, for a desired confidence level of 1 δ. The second approach is based on a novel characterization of the distance between the sampling distributions, which we call the mismatch factor, and is applicable to an confidence level 1 δ and an mismatch factor between the distributions. However, as one would expect, for a given confidence lever, a large mismatch factor requires a large number of samples. These results take advantage of the game structure and, in fact, when the mismatch is associated with polic domination, we show that the bounds with mismatch are exactl the same as the ones without mismatch. Essentiall, this result states that if P 1 knows that a particular subset S worse-for-p2 of P 2 s policies is dominated b another subset of P 2 s policies S better-for-p2 in 3 the sense that S worse-for-p2 is worse than S better-for-p2 from P 2 s perspective), then P 1 need not sample policies from S worse-for-p2. The usefulness of the second approach goes beond investigating the confidence of the SSP algorithm, as it extends the bounds of the scenario approach derived in Campi and Calafiore 2009) and Alamo et al. 2010) to mismatched distributions. Third, we propose a randomized algorithm, which we call sampled securit-value SSV) that P 1 can use to obtain a high-probabilit securit level for a given polic. The bound on the size of the subgame that P 1 extracts to determine her high-probabilit securit level holds for an polic available to P 1, regardless of whether or not this polic was obtained from the SSP algorithm. As for the SSP algorithm, the computation required b the SSV algorithm is independent of the size of the original matrix game and also of the precise algorithm that P 2 uses to construct her polic. When applied to a polic obtained using the SSP algorithm for a confidence level δ SSP, the SSV algorithm can be used to stud the securit of the polic for different perhaps tighter) confidence levels δ. Fourth and finall, we appl the SSP and SSV algorithms to solve a hide-and-seek game, in which one plaer hides a treasure in one of N points and the other plaer searches for the treasure b visiting each of the points. This is formalized as a zero-sum game in which the plaer that hides the treasure wants to maximize the distance that the other plaer needs to travel until the treasure is found. To determine the optimal strateg for this game, one would need to solve a matrix game whose size is N N!. Thus, exact solutions to this problem require computation that scales exponentiall with the number of points N. Our approach is independent of the size of the game and therefore the size of the matrix plas no role in the amount of computation required. As compared to the preliminar conference version [cf. Bopardikar et al. 2010)], this paper presents new results that include the version of the SSP algorithm for pure policies and its analsis, the mismatch in the distributions used b the plaers to construct the subgames, and improves upon the explicit sample size bounds using the results in Alamo et al. 2010). The more recent paper [cf. Bopardikar and Hespanha 2011)] formalizes the sampling procedure to dnamic or multi-stage, partial information games. Organization This paper is organized as follows. The problem formulation and the SSP algorithm are presented in Section 2. Bounds on the size of the subgame to provide highconfidence SSP solutions are established in Section 3 for the case when the two plaers use identical distributions to sample the matrix. These bounds are extended in Section 4 to allow mismatch between the distributions. Section 5 presents the SSV algorithm and the related bounds on the size of the subgame. Finall, we demonstrate the

4 procedure applied to the hide-and-seek problem in Section Problem Formulation Consider a zero-sum matrix game defined b an arbitrar M N real-valued matrix A, in which plaer P 1 is the minimizer and selects rows and plaer P 2 is the maximizer and selects columns. We are interested in problems where the number N of pure) policies available to P 2 is ver large, tpicall due to combinatorial explosion, forcing P 2 to explore a random subset of her own pure) polic space with onl n 2 N policies, and perhaps also onl a random subset of the possibles responses b P 1. Based on this, P 2 selects a polic z that she will use to pla against P 1. Denoting b B k l the set of k l left-stochastic 0, 1)- valued matrices i.e., matrices whose entries belong to the set 0, 1} with exactl one 1 per column), we can express the process b which P 2 samples her own polic space b selecting a random matrix Π 2 from the set B N n2. The matrix Π 2 B N n2 has one row for each of the possible policies of P 2 in the original game defined b A and one column for each polic that was actuall explored b P 2. A one in row i, column j of Π 2 signifies that the jth polic explored b P 2 corresponds to the column i of A. P 2 s random exploration results in a mixed polic z that can be written as z = Π 2 z 2 S N, z 2 S n2, 1) where, for a given integer k, S k denotes the probabilit simplex of size k. P 1 ma know the distribution used to extract Π 2, but will not know the matrix Π 2 that was actuall extracted nor which algorithm was used to determine z 2 and therefore will not know the polic z obtained b P 2. For P 1 to compute a high-confidence response against P 2 s polic z in 1), we introduce the sampled securit polic SSP) Algorithm 1. For the SSP algorithm to be useful, it needs to provide appropriate guarantees of correctness, which are formalized b the following definitions. We sa that the SSP algorithm is ɛ-secure for plaer P 1 with confidence 1 δ if P Γ1,Π 1,Π 2 Az V A 1 ) + ɛ ) 1 δ. 3) Algorithm 1 [SSP Algorithm] 1: P 1 randoml selects m 1 rows and n 1 columns of A, which she uses to construct an m 1 n 1 submatrix A 1 of A. This can expressed b the selection of two random matrices Γ 1 B M m1 and Π 1 B N n1 and then computing the product A 1 = Γ 1AΠ 1. 2: P 1 computes the mixed securit value V A 1 ) and the corresponding securit polic 1 for A 1 : V A 1 ) = max 1 A 1 z = min z S n1 S m1 max A 1 z 2) z S n1 We call V A 1 ) P 1 s sampled securit value. When multiple securit policies 1 exist, P 1 selects for 1 the one with the minimum Euclidean norm since the set of securit policies is convex, it contains a unique element with minimum norm). 3: P 1 computes her mixed polic for the original game: := Γ 1 1, resulting in the outcome Az = 1 Γ 1AΠ 2 z 2. We call P 1 s sampled securit polic. P 1 s polic space 1. While our results do not depend on it, P 2 could have obtained z also using the SSP algorithm. The previous definition guarantees that P 1 will be surprised with low) probabilit δ when plaing with policies obtained from a one-shot solution to the SSP algorithm. However, no specific guarantee is given regarding the inherent safet of the specific polic obtained using the SSP algorithm. So, e.g., suppose that plaer P 1 computes once using the SSP algorithm and then plas this polic multiple times against a sequence of policies z for P 2, each obtained b a distinct random explorations of her polic space. Then P 1 could conceivabl be surprised man more times than one would expect for a low value of δ. This would happen if she was unluck and got a low probabilit) that is particularl bad or a value V A 1 ) that is particularl optimistic. To avoid this scenario, we introduce an additional notion of securit that refers to the securit of a specific polic/value: we sa that a polic with value V A 1 ) is ɛ-secure for plaer P 1 with confidence 1 δ if P Π2 Az V A 1 ) + ɛ, V A1 ) ) 1 δ. 4) Here and in the sequel, we use a subscript in the probabilit measure P to remind the reader which random variables define the event that is being measured. In essence, condition 3) states that the probabilit that the outcome of the game will violate P 1 s sampled securit value V A 1 ) b more than ɛ is smaller than δ. As stated, this definition requires the bound to hold regardless of the algorithm used b P 2 to select her polic z. In fact, we even allow z to be obtained using an algorithm that randoml explores 4 Note that the subscript in the probabilit measure now onl includes the matrix corresponding to randomized exploration of the polic space b P 2 since the probabilit guarantees are given for a specific securit polic and value of P 1. So far, we have not specified the joint distribution of the row/column extraction matrices Γ 1 and Π 1 for P 1 in 1 In this case, the probabilit measure in 3) depends on additional random variables that we do not explicitl include in the subscript.

5 the SSP Algorithm 1, but these distributions, jointl with that of matrix Π 2 for P 2, clearl affect the outcome of the algorithm. In the context of noncooperative games, one should presume the extractions of the two plaers to be independent of each other. For simplicit, we further assume that plaers extract rows and columns independentl, as stated in the following assumption: Assumption 2.1 Independence) The random matrices Γ 1 and Π 1 in the SSP Algorithm 1 and the matrix Π 2 corresponding to plaer P 2 s randomized exploration are statisticall independent and each of them has independent and identicall distributed columns. Under Assumption 2.1, we shall determine in Section 3 bounds on the size of the random matrices extracted b P 1 in the SSP Algorithm 1 that guarantee high-confidence ɛ-securit results with ɛ = 0. These results are valid when P 1 knows precisel the distribution used b P 2 to explore her game decision tree, i.e., to extract columns of the game matrix A. If we allow for mismatched distributions, we can then prove ɛ-securit results with a value for ɛ > 0 that depends on the distance between the distributions used b P 1 and P 2 to extract columns of A see Section 4). Remark 2.1 General games) The results in this paper do not depend on the fact that the original game is a finite matrix game. The extend triviall to an costfunction Ju, d), u U, d D where U and D denote the sets of policies for the minimizer and maximizer, respectivel. In fact, it is not even necessar that the original game has saddle-point policies since all that the results use is the fact that, when we take finite samples of the sets of policies, we obtain finite matrix games. In fact, these results also appl to dnamic games, as is discussed in Bopardikar and Hespanha 2011). 3. Bounds for Probabilistic Guarantees In this section, we present theoretical bounds on the number of policies that plaer P 1 needs to consider for the SSP Algorithm to guarantee desired confidence levels. The results in this section refer to the case where the plaers sample policies for P 2 i.e., columns of A) using identical distributions. This assumption is subsequentl relaxed in Section Mixed Sampled Securit Policies The main result of this section provides a bound on the size of the submatrix A 1 in the SSP Algorithm that guarantees ɛ-securit with ɛ = 0 for the mixed polic. We recall that P 2 is assumed to use a polic z of the form 1), where Π 2 is a column-selection matrix and z2 some vector in S n2 that is obtained using a deterministic or stochastic algorithm. The case of P 2 using a sample of P 1 s policies to determine her polic z also gets included as none of the results in this paper require P 2 to use the same distribution as that used b P 1 for extracting rows of A. Theorem 3.1 SSP Algorithm) Suppose that Assumption 2.1 holds and that Π 1 B N n1 and Π 2 B N n2 have identicall distributed columns. The SSP Algorithm is ɛ = 0)-secure for P 1 with confidence 1 δ, δ 0, 1) as long as 2 m1 + 1 n 1 = δ 1 n 2, 5) with n 2 n 2. Additionall, suppose that we increase n 1 to satisf 1 n 1 = m 1 + 2m 1 ln 1 δ β + ln 1 ) n 2, 6) β for some β 0, 1). Then, with probabilit larger than 1 β, the SSP Algorithm generates a sampled securit polic with value V A 1 ) that is ɛ = 0)-secure for P 1 with confidence 1 δ. In words, this result states that it is alwas possible to guarantee ɛ = 0)-securit for P 1, if she constructs her submatrix A 1 utilizing a sufficientl large number of columns n 1. In particular, she alwas needs to choose a number of columns n 1 larger than the number of columns n 2 that P 2 is considering for her mixed policies [cf. 5) and 6)]. The additional number of columns that P 1 needs to consider is a function of the number m 1 of rows that P 1 wants to consider for her mixed polic and the desired confidence level. The probabilit 1 β associated with s securit probabilistic guarantee accounts for the possibilit that the confidence bound 4) fails altogether due to an unfortunate sample used b P 1 to compute. However, note that onl the logarithm of the confidence level β appears 6) and therefore a relativel small value for the number of columns n 1 suffices to make β extremel small and, hence, 1 β 1). Remark 3.2 P 1 s knowledge of n 2 ) According to Theorem 3.1, for plaer P 1 to enjo guaranteed ɛ = 0)-securit with confidence 1 δ, she must know an upper bound n 2 on the number of columns that P 2 used to construct the polic z in 1). However, if P 1 does not know n 2 precisel and, e.g., underestimates n 2 b a certain percentage, then 5) and 6) are still useful in the sense that the predict that the performance degradation in the actual confidence level δ grows proportionatel with n 2. This is because the bounds in 5) and 6) essentiall scale with n2 δ. 2 Given a scalar x R, we denote b x the smallest integer that is larger than or equal to x. 5

6 Proof of Theorem 3.1: B definition of the securit value V A 1 ), we have that V A 1 ) = min S m1 = min S m1 = min θ Θ max z S n1 Γ 1AΠ 1 z max j 1,...,n 1} Γ 1AΠ 1 e j n 1 ) v : Γ 1AΠ 1 e j n 1 ) v, j 1,..., n 1 } }, 7) where θ := 1, v), Θ := S m1 R, and we use e j n) to denote the jth element of the canonical basis of R n. Since n 1 is an integer multiple of n 2, i.e., n 1 = K n 2 with K = m1+1 δ 1, we can use the K n 2 columns of N K n2 Π 1 B to construct K independent and identicall distributed i.i.d.) matrices 1, 2,..., K B N n2. For an arbitrar realization of the matrix Γ 1 B M m1, which is independent of the i b Assumption 2.1, let us define the function f Γ1 : Θ B N n2 R b f Γ1 θ, ) = max j 1,..., n 1Γ 1A e j n 2 ) v. 8) 2} We can then rewrite 7) as V A 1 ) = min v : fγ1 θ, i ) 0, i 1,..., K} } 9) θ Θ and conclude from Campi and Calafiore, 2009, Proposition 3) that the conditional) probabilit that another matrix sampled independentl from the same distribution as the i satisfies the constraint f Γ1 θ, ) 0 can be lower-bounded as follows: P Π1, fγ1 θ ) K m 1, ) 0 Γ 1 1 δ, 10) K + 1 where θ denotes the value in Θ that achieves the minimum in 9) and the second inequalit is a consequence of 5). Since the minimum in 9) is achieved for the sampled securit polic/value θ = 1, V A 1 )), we can use the definition 8) of f Γ1 to re-write 10) as P Π1, 1 Γ 1A e j n 2 ) V A 1 ), Since n 2 n 2, we further conclude that P Π1, 1 Γ 1A e j n 2 ) V A 1 ), j 1,..., n 2 } Γ 1 ) 1 δ. j 1,..., n 2 } Γ 1 ) 1 δ. Under Assumption 2.1, when the columns of Π 1 and Π 2 are identicall distributed, the matrix consisting of the first n 2 columns of can be viewed as the matrix Π 2 and the inequalit above implies that P Π1,Π 2 1 Γ 1AΠ 2 e j n 2 ) V A 1 ), j 1,..., n 2 } Γ 1 ) 1 δ. Since 1 Γ 1AΠ 2 e j n 2 ) V A 1 ), j 1,..., n 2 } we conclude that 1 Γ 1AΠ 2 z V A 1 ), z S n2, P Π1,Π 2 1 Γ 1AΠ 2 z 2 V A 1 ) Γ 1 ) 1 δ. We have shown that this bound holds for an arbitrar realization of Γ 1, therefore it also holds for the unconditional probabilit, which shows that the SSP Algorithm is ɛ = 0)-secure for P 1 with confidence 1 δ according to 3). If instead of using Campi and Calafiore, 2009, Proposition 3) and 5) to obtain 10), we use Alamo et al., 2010, Theorem 4) and 6), we obtain instead that fγ1 θ, ) 0 Γ 1, θ ) 1 δ, 11) with probabilit larger than 1 β, where the confidence level 1 β refers to the extraction of Π 1 = [ 1,..., K ] that defines θ given Γ 1 ). The proof can now proceed exactl as before, but with 10) replaced b 11), which now involves a probabilit conditioned to θ =, V A 1 )). This shows that if n 1 satisfies 6), then with probabilit larger than 1 β, the polic with value V A 1 ) is ɛ = 0)- secure for P 1 with confidence 1 δ Pure Sampled Securit Polic Suppose that P 1 restricts herself to use pure policies in Step 2 of the SSP Algorithm 1. If we let e i m 1 ) denote the ith element of the canonical basis of R m1, then Step 2 becomes: 2: P 1 computes the pure securit value V pure A 1 ): V pure A 1 ) = max z S n1 e i m 1)A 1 z = min i 1,...,m 1} max z S n1 e im 1 )A 1 z, and the corresponding pure securit polic 1 for A 1 : 1 = e i m 1 ). We call Vpure A 1 ) P 1 s pure sampled securit value. When multiple pure securit policies 1 exist, P 1 can pick an of them. A bound similar to 6) can be established for the resulting pure SSP Algorithm, but the number of columns n 1 that P 1 needs to sample can often be much smaller. Note that the bound still holds for an polic z for plaer P 2 of the form 1), pure or mixed. Theorem 3.3 Pure SSP Algorithm) Suppose that Assumption 2.1 holds and that Π 1 B N n1 and Π 2 B N n2 6

7 have identicall distributed columns. Suppose that we select 1 n 1 = ln m 1 #Γ δ 1A) ) + ln 1 ) n 2, 12) β for β, δ 0, 1) and n 2 n 2, where #Γ 1A) denotes the total number of distinct values that the entries of Γ 1A can take. Then, with probabilit 1 β, the pure SSP Algorithm generates a pure sampled securit polic with value V pure A 1 ) that is ɛ = 0)-secure for P 1 with confidence 1 δ. In several matrix games the number of distinct values that entries of A can take is small and therefore #Γ 1A) is small. This occurs, e.g., in win-lose-tie games. For such games, 12) provides significant computational gains with respect to 6) because the bound in 12) grows with the logarithm of m 1, whereas the one in 6) grows linearl with m 1. Even if the matrix A can have man distinct values, significant computational savings are possible since #Γ 1A) m 1 N and hence, at worse, 12) still onl grows with the logarithm of m 2 1N. The price to pa for these computational savings is that the pure securit level V pure A 1 ) could be higher than the value V A 1 ) obtained with mixed policies. The proof of Theorem 3.3 is conceptuall similar to the second part of the proof of Theorem 3.1, with the main difference being that the polic selection involves optimizing over a finite set of cardinalit #Γ 1A), and, hence, we can use the bounds in Alamo et al., 2010, Theorem 3) instead of those in Alamo et al., 2010, Theorem 4). Proof of Theorem 3.3: Let e k n) denote the kth element of the canonical basis of R n. B the definition of the pure securit value V pure A 1 ), we have that V pure A 1 ) = min i 1,...,m 1} max e z S n1 im 1 )Γ 1AΠ 1 z = min max i 1,...,m 1} j 1,...,n e im 1 )Γ 1AΠ 1 e j n 1 ) 1} = min v : e i m 1 )Γ 1AΠ 1 e j n 1 ) v, j 1,..., n 1 } }, θ Θ 13) where θ := e i m 1 ), v), Θ := e 1 m 1 ), e 2 m 1 ),..., e m1 m 1 )} Γ 1A}, and Γ 1A} denotes the set of all distinct values that the entries of Γ 1A can take. The set Θ thus have m 1 #Γ 1A) distinct elements. Since n 1 is an integer multiple of n 2, i.e., n 1 = K n 2 1 with K = δ ln m1 #Γ 1 A) β, we can use the K n 2 columns of Π 1 B N K n2 to construct K i.i.d. matrices 1, 2,..., K B N n2. For an arbitrar realization of the matrix Γ 1 B M m1, which is independent of the i b Assumption 2.1, let us define the function g Γ1 : Θ B N n2 R b 0, max g Γ1 θ, ) = j 1,..., n e im 1 )Γ 1A e j n 2 ) v 0, 2} 1, otherwise, We can then rewrite 13) as V pure A 1 ) = min θ Θ K v : g Γ1 θ, i ) 0 }, 14) k=1 and conclude from Alamo et al., 2010, Theorem 3) that the conditional) probabilit that another matrix sampled independentl from the same distribution as the i satisfies the constraint g Γ1 θ, ) = 0, where θ denotes the value in Θ that achieves the minimum in 14), can be lower-bounded b gγ1 θ, ) = 0 Γ 1, θ ) 1 δ, 15) with probabilit larger than 1 β, where the confidence level 1 β refers to the extraction of Π 1 = [ 1,..., K ] that defines θ given Γ 1 ). Since the minimum in 14) is achieved for the sampled securit polic/value θ = 1, V pure A 1 )), we can use the definition of g Γ1 to re-write 15) as 1 Γ 1A e j n 2 ) V pure A 1 ), j 1,..., n 2 } Since n 2 n 2, we further conclude that Γ 1, 1, V pure A 1 ) ) 1 δ. 1 Γ 1A e j n 2 ) V pure A 1 ), j 1,..., n 2 } Γ 1, 1, V pure A 1 ) ) 1 δ. Under Assumption 2.1, when the columns of Π 1 and Π 2 are identicall distributed, the matrix consisting of the first n 2 columns of can be viewed as the matrix Π 2 and the inequalit above implies that Since P Π2 1 Γ 1AΠ 2 e j n 2 ) V pure A 1 ), j 1,..., n 2 } Γ 1, 1, V pure A 1 ) ) 1 δ. 1 Γ 1AΠ 2 e j n 2 ) V pure A 1 ), j 1,..., n 2 } we conclude that 1 Γ 1AΠ 2 z V pure A 1 ), z S n2, P Π2 1 Γ 1AΠ 2 z 2 V pure A 1 ) Γ 1, 1, V pure A 1 )) 1 δ. This shows that, with probabilit larger than 1 β, the sampled pure polic := Γ 1 1 with with value V pure A 1 ) is ɛ = 0)-secure for P 1 with confidence 1 δ. 7

8 Remark 3.4 The bound for the pure SSP Algorithm corresponding to 5) in Theorem 3.1 would be m1 #Γ n 1 = 1A) 1 n 2, δ which can be obtained b first deriving the single level of probabilit version of the bounds in Alamo et al., 2010, Theorem 3), following similar steps as in Campi and Calafiore, 2009, Proposition 3). Given that this bound is worse than 5), there is no computational advantage for plaer P 1 in considering pure rather than mixed policies. Consequentl, this result is not included in Theorem Mismatch in the Sampling Distributions We now investigate the effect of a mismatch between the distribution that P 1 uses to select the columns of the polic-selection matrix Π 1 used in the SSP Algorithm of Section 3 and the distribution that P 2 uses to select the columns of the polic-selection matrix Π 2 that she uses to determine the polic z in 1). We pursue two approaches: The first one is based on a characterization of the mismatch between distributions using the Prohorov metric and provides bounds that are independent of the game matrix A. The second approach provides a novel characterization of the mismatch between the two distributions that can take the matrix A into consideration. The proofs for the results in this section are deferred to the Appendix to facilitate the presentation Prohorov Metric-Based Approach For a given integer k, the distributions used to select the columns of Π 1 and Π 2 can be used to construct two measures m k and m k, respectivel, for the columnselection matrices taking values in B N k with i.i.d. columns. Using the discrete metric 1, x 1 x 2, dx 1, x 2 ) = x 1, x 2 B N k, 16) 0, x 1 = x 2, we can regard B N k as a metric space, for which the Prohorov metric between m k and m k is simpl given b the total variation metric πm k, m k ) = sup m k B) m k B), 17) B F where F denotes the Borel sigma-algebra on B N k. The following theorem is based on the results for the ambiguous chance constrained problems in Erdoǧan and Iengar 2006) and should be viewed as a generalization of the bound in 6) Theorem 3.1 for the case of mismatched distributions. Theorem 4.1 Matrix-independent mismatch) Suppose that Assumption 2.1 holds, that π m n2, m n2) ρ < 1 8 for some n 2 n 2, and that we select n 1 = 2m 1 + 1) + 2 δ ρ ln 1 β + 2m 1 + 1) 2 ln n 2 δ ρ δ ρ 18) for β 0, 1), δ ρ, 1). Then, with probabilit larger than 1 β, the SSP Algorithm generates a sampled securit polic with value V A 1 ) that is ɛ = 0)-secure for P 1 with confidence 1 δ. In the spirit of Theorem 3.1, the bound 18) is completel independent of the matrix game A. However, this result has the limitation that it is applicable onl for confidence levels δ > ρ and therefore does not permit confidence levels larger than 1 ρ Mismatch Factor-Based Approach Our second approach to characterize the impact of a mismatch between the sampling distributions relies on generalizations of the bounds of the scenario approach to convex optimization in Campi and Calafiore 2009) and Alamo et al. 2010) that were instrumental to the proof of Theorem 3.1. We start b presenting these generalizations, which are useful beond the context of the problem considered here Scenario Optimization Consider a sequence of K i.i.d. random variables 1, 2,..., K taking values in a set D. These random variables are used to specif a set of K constraints in the following convex optimization problem: 3 θ = arg min θ Θ c θ : fθ, i ) 0, i 1,..., K} }, 19) where c R n θ and the constraint-defining function f : Θ D R is convex with respect to the first argument, for each fixed value of D, and Θ is a convex subset of R n θ. The results that follow provide bounds on the probabilit that an additional independent random variable, also taking values in D but with a different distribution, satisfies the following somewhat relaxed) version of the constraint that appears in 19) for the optimal θ : fθ, ) ɛ, for some ɛ 0. Campi and Calafiore, 2009, Proposition 3) and Alamo et al., 2010, Theorem 4) provide such bounds when i and have the same distribution and ɛ = 0. Denoting b m and m the measures associated with the distributions of the i and, respectivel, we define the mismatch factor between m and m b µ f ɛ) := inf µ R µ : m fθ, ) > ɛ ) µ m fθ, ) > 0 ), } θ Θ 20) 3 In case of several possible multiple minima, the one with the smallest Euclidean norm should be selected.

9 When m = m, we have µ f ɛ) 1, with equalit when ɛ = 0. However, when the distributions do not match, µ f ɛ) can be arbitraril large. As the name indicates, the mismatch factor µ f ɛ) can be viewed as a measure of how much the distributions of and differ. Aside from not being a metric, it differs more fundamentall from the Prohorov metric 17) in that i) 20) onl regards the discrepanc between values of the measures for violation events of the tpe fθ, ) > ɛ; ii) it considers a kind of multiplicative uncertaint in the probabilities instead of differences); and iii) it allows for the relaxation parameter ɛ > 0 that can bring µ f ɛ) down if we are willing to allow fθ, ) to grow as large as ɛ > 0. As we shall see shortl, smaller values of µ f ɛ) lead to smaller probabilities of violation. The following two results generalize Campi and Calafiore, 2009, Proposition 3) and Alamo et al., 2010, Theorem 4), respectivel, for mismatched distributions and ɛ > 0. Lemma 4.2 For ever ɛ 0,, 1,..., K f θ, ) ) ɛ 1 µ f ɛ) n θ K + 1. Lemma 4.3 Given ɛ > 0, δ 0, 1), β 0, 1), and µf ɛ) K n θ 1) + 2n θ 1) ln 1 δ β + ln 1 ), β we have that 21) fθ, ) ɛ θ ) 1 δ, 22) with probabilit 4 larger than 1 β Probabilistic Guarantees Lemmas 4.2 and 4.3 allow us to generalize Theorem 3.1 for the case of mismatched distributions. This generalization involves a famil of functions f Γ : Θ B N n2 R, with Θ := S m1, R) and n 2 an integer larger than n 2, parameterized b the matrix Γ B M m1 and defined b f Γ θ, ) = max j 1,..., n 1Γ A e j n 2 ) v. 23) 2} We shall use these functions to compute the mismatch factor between the measures m k and m k for column-selection matrices taking values in B N k with i.i.d. columns, constructed using the distributions used to select the columns of Π 1 and Π 2, respectivel. Theorem 4.4 Matrix-dependent mismatch) Suppose that Assumption 2.1 holds, and that m n2 fθ, ) > ɛ ) µ m n2 fθ, ) > 0 ), θ Θ, Γ B M m1, 24) 4 The confidence level 1 β refers to the extraction of 1,... K that defines θ. 9 for some µ 0, ), n 2 n 2 and ɛ 0. The SSP Algorithm is ɛ-secure for P 1 with confidence 1 δ, δ 0, 1) as long as µ n 1 = δ m 1 + 1) 1 n 2. 25) Additionall, suppose that we increase n 1 to satisf µ n 1 = m 1 + 2m 1 ln 1 δ β + ln 1 ) n 2 26) β for some β 0, 1). Then, with probabilit larger than 1 β, the SSP Algorithm generates a sampled securit polic with value V A 1 ) that is ɛ-secure for P 1 with confidence 1 δ. Theorem 4.4 shows that, even when there is a mismatch in the distributions, it is still possible to achieve high-confidence securit policies. However, the number of samples required b the SSP algorithm essentiall needs to be multiplied b µ. Alternativel, if one uses the number of samples dictated b Theorem 3.1 and there is mismatch in the distributions, then one obtains securit with confidence 1 δ/µ instead of 1 δ) since one can go from the formulas in Theorem 3.1 to the ones in Theorem 4.4 b simpl replacing 1/δ b µ/δ. Remark 4.5 µ < 1) For values of ɛ > 0 and matched or closel matched distributions), the mismatch factors ma actuall be smaller than 1. Theorem 4.4 is still applicable and essentiall states that if one is willing to accept some ɛ > 0, one ma get 1 δ confidence with a smaller number of samples than what was required b Theorem 3.1. Remark 4.6 Matrix-independent results) If we choose µ to satisf m n2 B) µ m n2 B), B F, where F denotes the Borel sigma-algebra on B N n2, then 24) holds with ɛ = 0 for ever matrix game and we obtain a game independent result. The price is, of course, that such µ does not explore the structure of the particular game and ma therefore be much larger than what is needed. In fact, we shall see in the next section that the structure of the matrix A ma dictate that some mismatch should not lead to a degradation in the confidence levels. This is the case when A exhibits some form of polic domination Matrix games with dominated policies Consider a situation when P 1 knows of some particularl good policies that P 2 ma appl to pla the game. For example, suppose that the entries in some column c better-for-p2 of A are all element-wise larger than those in some other column c worse-for-p2. In this case, it turns out that P 1 can increase the probabilit of sampling the column c better-for-p2 at the expense of decreasing the probabilit of selecting c worse-for-p2 and this mismatch does not

10 require a larger bound on the number of columns to sample. This observation is formalized in the remaining of this section. We begin with the following notion of dominance. Definition 1 ɛ-dominance) Given an M N matrix A, the vector d B N 1 is said to be ɛ-dominated b the vector d B N 1 for some ɛ 0 if e i M) Ad e i M) Ad + ɛ, i 1,..., M}, where e i M) denotes the ith element of the canonical basis of R M. With ɛ = 0, the above definition becomes identical to that of domination between pure policies in matrix games [cf. Basar and Olsder 1999)]. Next, we introduce the notion of two sampling distributions being perturbed. Definition 2 Perturbed sampling) Given two distinct vectors d, d B N 1 and two probabilit measures m, m on B N 1, we sa that m is a perturbation of m with respect to the pair d, d ) if 1. m differs from m onl over d, d } B N 1, i.e., me j N)) = me j N)), for all j such that e j N) d, d}, where e j N) denotes the jth element of the canonical basis of R N ; 2. the probabilit of extracting d according to m is smaller than according to m, i.e., md ) md ). We now present the main result of this subsection. Theorem 4.7 Domination) Given the game matrix A, suppose that for some ɛ 0, there exist vectors d, d B N 1 such that d is ɛ-dominated b d. Suppose that Assumption 2.1 holds and that the columns of the matrices Π 1 and Π 2 are sampled according to distributions m and m, respectivel. If m is a perturbation of m with respect to d, d ), then Theorem 4.4 holds with µ = 1. This result shows that even when P 1 extracts with low probabilit possibl equal to zero) the column d, the bounds of Section 3 hold. 5. A-posteriori assessment of a given polic Suppose now that P 1 obtained a polic using either a randomized or a deterministic algorithm. In this section, we are interested in computing a high-confidence securit level value V ) for this polic, when is plaed against P 2 s polic z in 1). The sampled securit-value SSV) Algorithm 2 addresses this question. 10 Algorithm 2 [SSV Algorithm] 1: P 1 randoml selects k 1 columns of A, which corresponds to the selection of a random matrix Π 1 B N k1. 2: P 1 computes V ) = max j 1,...,k A Π 1 e j k 1 ), 27) 1} where e j k 1 ) denotes the jth element of the canonical basis of R k1. We call V ) P 1 s a-posteriori sampled securit value. We use the qualifier a-posteriori for the sampled securit value V ) to emphasize that this value is computed after a particular securit polic has been obtained. We sa that the SSV algorithm is ɛ-secure for plaer P 1 s polic with confidence 1 δ if P Π1,Π 2 Az V ) + ɛ ) 1 δ. 28) This condition states that the probabilit that the outcome of the game will violate P 1 s a-posteriori sampled securit value V ) b more than ɛ is smaller than δ. As stated, this definition requires the bound to hold regardless of the algorithms used to generate and z. In particular, we leave open the possibilit that both policies could have been computed using the SSP algorithm, perhaps with confidence levels different than δ. In fact, one could imagine P 1 computing using the SSP algorithm for a confidence level δ SSP and then studing the securit of such polic for tighter confidence levels δ using the SSV algorithm. Thus, the SSV algorithm combined with the SSP algorithm can be viewed as a heuristics for designing high-confidence securit policies. Also for the SSV, we can define a stronger notion of securit that guarantees the inherent securit of the a- posteriori sampled securit value V ), when P 1 plas repeatedl against a sequence of policies z for P 2, each obtained b a distinct random exploration of her polic space. We sa that the a-posteriori sampled securit value V ) is ɛ-secure for plaer P 1 s polic with confidence 1 δ if P Π2 Az V ) + ɛ, V ) ). 29) As for the SSP Algorithm 1, we assume that the distributions of the column extraction matrices Π 1 used b plaer P 1 in the SSV Algorithm 2 and Π 2 used b plaer P 2 in her randomized polic exploration are independent. Assumption 5.1 Independence) The random matrix Π 1 in the SSV Algorithm 2 and matrix Π 2 involved in plaer P 2 randomized exploration are statisticall independent and each of them has independent and identicall distributed columns. We now provide a bound on the number of samples k 1 used in the SSV Algorithm to guarantee ɛ-securit. Akin

11 to Section 3, we restrict our attention to the case when plaer P 1 uses the same distribution as plaer P 2 to extract columns of A. Results along the same lines as those shown in Section 4 for the SSP Algorithm could be obtain for the SSV Algorithm in the case of mismatched distributions, but we omit them because the are fundamentall similar. Theorem 5.1 SSV Algorithm) Suppose that Assumption 5.1 holds and that Π 1 B N k1 and Π 2 B N n2 have identicall distributed columns. The SSV algorithm is ɛ = 0)-secure for plaer P 1 s polic with confidence 1 δ, δ 0, 1) as long as 1 k 1 = δ 1 n 2, 30) with n 2 n 2. Additionall, suppose that we increase k 1 to satisf 1 k 1 = δ ln 1 n 2, 31) β for some β 0, 1). Then, with probabilit 1 β, the SSV Algorithm generates an a-posteriori sampled securit value V ) that is ɛ = 0)-secure for plaer P 1 s polic with confidence 1 δ. 1 Proof of Theorem 5.1: Defining K := δ 1 and the function f : S M B N n2 R b f, ) = we can re-write 27) as max j 1,..., n A e j n 2 ), 32) 2} Under Assumption 5.1, when the columns of Π 1 and Π 2 are identicall distributed, the matrix consisting of the first n 2 columns of can be viewed as the matrix Π 2 and the inequalit above implies that Since P Π1,Π 2 AΠ 2 e j n 2 ) V ), j 1,..., n 2 } AΠ 2 e j n 2 ) V ), j 1,..., n 2 } we conclude that ) 1 δ. AΠ 2 z V ), z S n2, P Π2, Π 1 AΠ 2 z 2 V ) ) 1 δ, which shows that SSV Algorithm is ɛ = 0)-secure with confidence 1 δ. If, instead of using Campi and Calafiore, 2009, Proposition 4) and 30) to obtain 34), we use Campi and Garatti, 2008, Theorem 1) and 31), we obtain f, ) V ), V ) ) 1 δ, 35) with probabilit larger than 1 β, where the confidence level 1 β refers to the extraction of Π 1 = [ 1,..., K ] that defines V ). The proof can now proceed exactl as before, but with 34) replaced b 35), which now involves a probabilit conditioned to, and V ). This shows that if k 1 satisfies 31), then with probabilit larger than 1 β, the securit value V ) is ɛ = 0)-secure with confidence 1 δ. V ) = max f, i ), 33) i 1,...,K} where the matrices 1, 2,..., K B N n2 are obtained b partitioning the K n 2 columns of Π N K n2 1 B into K i.i.d. matrices. For an given which is independent of the i ), we conclude from Campi and Calafiore, 2009, Proposition 4) that the conditional) probabilit that another matrix, sampled independentl from the same distribution as the i, satisfies the constraint f, ) V ) := can be lower-bounded as follows: max f, i ) i 1,...,K} P Π1, f, ) V ) ) K 1 δ, 34) K + 1 where the second inequalit is a consequence of 30). From the definition of f, we conclude from 34) that P Π1, A e j n 2 ) V ), j 1,..., n 2 } ) 1 δ, and, since n 2 n 2, we also have that P Π1, A e j n 2 ) V ), j 1,..., n 2 } ) 1 δ Example: Hide-and-seek matrix game In this section, we appl the SSP and SSV Algorithms to a classic search problem: Consider a zero-sum game where P 1 hides a non-moving object treasure) in one of N points p 1,..., p N } R 2 on the plane and P 2 wants to find the treasure with minimum cost, b traveling from point to point until she finds it. The game is plaed over the set of mixed policies: P 1 chooses a probabilit distribution z S N for the treasure over the N points, and P 2 chooses a probabilit distribution S M over the set R := r j : j = 1,..., M} of M := N! routes that start at P 1 s initial position p 0 R 2 and go through all possible permutations of the points. When P 1 chooses to hide the treasure at point p i and P 2 selects route r j, the outcome of the game is equal to the length of route r j from P 1 s initial position p 0 to the point p i where the treasure lies. Namel, k ij A ij = r j k) r j k 1), 36) k=1

CONTINUOUS SPATIAL DATA ANALYSIS

CONTINUOUS SPATIAL DATA ANALYSIS CONTINUOUS SPATIAL DATA ANALSIS 1. Overview of Spatial Stochastic Processes The ke difference between continuous spatial data and point patterns is that there is now assumed to be a meaningful value, s