
APPENDIX

The Appendix is structured as follows: Section A contains the missing proofs, Section B shows that our techniques apply to Stackelberg games, Section C contains results on the sample complexity of standard SUQR, Section D contains the weaker sample complexity bound for the generalized SUQR model derived using the approach of Haussler, and Section E contains additional experiments.

A. PROOFS

Proof of Theorem 1

PROOF. First, Haussler uses the following pseudo-metric $\rho$ on $A$, defined using the loss function $l$:

$$\rho(a, b) = \max_{y \in Y} |l(y, a) - l(y, b)|.$$

To start with, relying on Haussler's result, we show

$$Pr\left(\forall h \in H.\ |\hat{r}_h(\vec{z}) - r_h(p)| < \frac{\alpha}{3}\right) \ge 1 - 4\,C\!\left(\frac{\alpha}{48}, H, \rho\right) e^{-\frac{\alpha^2 m}{576 M^2}}.$$

Choose $\alpha = \alpha'/4M$ and $\nu = 2M$ in Theorem 9 of [4] (here $\alpha' = \alpha/3$ is the accuracy required above). Using property (3) of $d_\nu$ from [4] we obtain $|r - s| \le \alpha'$ whenever $d_\nu(r, s) \le \alpha$. Using this directly in Theorem 9 of Haussler [4] we obtain the desired result above.

Note the dependence of the above probability on $m$ (the number of samples), and compare it to the first pre-condition in the PAC learning result. By equating $\delta/2$ to $4\,C(\alpha/48, H, \rho)\, e^{-\alpha^2 m / 576 M^2}$, we derive the sample complexity as

$$m \ge \frac{576 M^2}{\alpha^2} \log \frac{8\,C(\alpha/48, H, \rho)}{\delta}.$$

We wish to compute a bound on $C(\epsilon, H, \rho)$ in order to use the above result to obtain the sample complexity. First, we prove that $\rho \le 2T\, d_{\bar{l}_1}$ for the loss function we use. This result is used to bound $C(\epsilon, H, \rho)$, since it is readily verified from the definition that $C(\epsilon, H, \rho) \le C(\epsilon/2T, H, d_{\bar{l}_1})$. Such a bounding directly gives

$$m \ge \frac{576 M^2}{\alpha^2} \log \frac{8\,C(\alpha/96T, H, d_{\bar{l}_1})}{\delta}.$$

Below we prove that $\rho \le 2T\, d_{\bar{l}_1}$.

LEMMA 8. Given the loss function defined above, we have

$$\rho(a, b) \le 2 \max_i |a_i - b_i| \le 2 \sum_i |a_i - b_i| \le 2T\, d_{\bar{l}_1}(a, b).$$

PROOF. By definition,

$$\rho(a, b) = \max_i \left| -a_i + b_i + \log \frac{1 + \sum_i e^{a_i}}{1 + \sum_i e^{b_i}} \right| \le \max_i |a_i - b_i| + \left| \log \frac{1 + \sum_i e^{a_i}}{1 + \sum_i e^{b_i}} \right|.$$

There is $j$ and $k$ such that $\max_r = e^{a_j}/e^{b_j} \ge e^{a_i}/e^{b_i}$ for all $i$ and $\min_r = e^{a_k}/e^{b_k} \le e^{a_i}/e^{b_i}$ for all $i$. Thus,

$$\frac{1 + \min_r t}{1 + t} \le \frac{1 + \sum_i e^{a_i}}{1 + \sum_i e^{b_i}} \le \frac{1 + \max_r t}{1 + t}, \quad \text{where } t = \sum_i e^{b_i}.$$

After taking logarithms, the greatest positive value of the RHS is $\log \max_r = a_j - b_j$ and the least negative value possible for the LHS is $\log \min_r = a_k - b_k$. Thus,

$$\left| \log \frac{1 + \sum_i e^{a_i}}{1 + \sum_i e^{b_i}} \right| \le \max_i |a_i - b_i|.$$

Hence, we obtain $\rho(a, b) = \max_i |l(y_i, a) - l(y_i, b)| \le 2 \max_i |a_i - b_i|$, and the last inequality is trivial.

Thus, using the above result we get

$$m \ge \frac{576 M^2}{\alpha^2} \log \frac{8\,C(\alpha/96T, H, d_{\bar{l}_1})}{\delta}.$$

Proof of Lemma

PROOF. First, note that $x_{iT} = x_i - x_T$ lies in $[-1, 1]$ due to the constraints on $x_i, x_T$. Then, for any two functions $g, g' \in G$ we have the following result:

$$d_{L^1(P, d_{\bar{l}_1})}(g, g') = \int_X d_{\bar{l}_1}\big(w(x_i - x_T), w'(x_i - x_T)\big)\, dP(x) = \int_X |(w - w')(x_i - x_T)|\, dP(x) \le \int_X |w - w'|\, dP(x) = |w - w'|.$$

Also, note that since the range of any $g = w(x_i - x_T)$ is $[-M/4, M/4]$ and $x_i - x_T$ lies in $[-1, 1]$, we can claim that $w$ lies in $[-M/4, M/4]$. Thus, given that the distance between functions is bounded by the difference in weights, it is enough to divide the $M/2$ range of the weights into intervals of size $2\epsilon$ and consider functions at the boundaries. Hence the $\epsilon$-cover has at most $M/4\epsilon$ functions.

The proof for the constant valued functions $F_i$ is similar, since it is straightforward to see that the distance between two functions in this space is the difference in the constant output. Also, the constants lie in $[-M/4, M/4]$. Then, the argument is the same as in the $G$ case.

Proof of Lemma 3

PROOF. First, the space of functions $\hat{H} = \{h/\hat{K} \mid h \in H_i\}$ is Lipschitz with Lipschitz constant $1$ and $|h_i(x)/\hat{K}| \le M/2\hat{K}$. Clearly $N(\epsilon, H_i, d_{\bar{l}_1}) \le N(\epsilon/\hat{K}, \hat{H}, d_{\bar{l}_1})$. We use the following result from [3]: for any Lipschitz real valued function space $H$ with constant $1$, any positive integer $s$ and any distance $d$,

$$N(\epsilon, H, d_{\bar{l}_1}) \le \left( \frac{M(s + 1)}{\hat{K}\epsilon} + 1 \right) (s + 1)^{N\left(\frac{s\epsilon}{s + 1},\, X,\, d\right)}.$$

Then, we get the bound on $N(\epsilon/\hat{K}, \hat{H}, d_{\bar{l}_1})$ by choosing a suitable $s$ and $d = d_{l_1}$, and hence obtain the desired bound on $N(\epsilon, H_i, d_{\bar{l}_1})$.

Proof of Lemma 4

PROOF. For ease of notation, we do the proof with $k$ standing for the shifted value of $K$ that appears in the lemma. Let $Y_i = U_i - 0.5$; then $|Y_i| \le 1/2$ and $S_T - 0.5T = \sum_i Y_i$. Using Bernstein's inequality with the fact that $E[Y_i^2] = 1/12$,

$$P\Big(\sum_i Y_i = S_T - 0.5T \le -t\Big) \le e^{-\frac{0.5 t^2}{T/12 + t/6}}.$$

Thus, $P(S_T \le 0.5T - t) \le e^{-\frac{0.5 t^2}{T/12 + t/6}}$. Take $k = 0.5T - t$, and hence $t = 0.5T - k = T(0.5 - k/T)$. Hence,

$$P(S_T \le k) \le e^{-\frac{3T(0.5 - k/T)^2}{1 - k/T}}.$$

Proof of Theorem 3

PROOF. Given the results of Lemma 3, we get that the sample complexity is of order ( α δ + ( N ( α, X, d l

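Now, using the result of Lemma 4, we get the required order in the Theorem. We wish to note that if $K/T$ is a constant then the $O(e^{-T})$ in Lemma 4 gets swamped by the remaining term. However, in practice, for fixed $T$, this term does provide a lower actual complexity bound than what is indicated by the order.

Proof of Lemma 5

PROOF. Observe that due to the definition of $K$ any solution to MinLip will have Lipschitz constant $\ge K$. Thus, it suffices to show that the Lipschitz constant of $h_i$ is $K$ to prove that $h_i$ is a solution of MinLip. Take any two $x, x'$. If the min in the expression for $h_i$ occurs for the same $j$ for both $x, x'$, then $|h_i(x) - h_i(x')|$ is given by $K\,\big|\,\|x - x_j\| - \|x' - x_j\|\,\big|$. By application of the triangle inequality,

$$-\|x - x'\| \le \|x - x_j\| - \|x' - x_j\| \le \|x - x'\|.$$

Thus, $|h_i(x) - h_i(x')| \le K\|x - x'\|$. For the other case, when the min for $x$ occurs at some $j$ and the min for $x'$ at some $j'$, we have the following: $h_i(x) = h_{ij} + K\|x - x_j\|$ and $h_i(x') = h_{ij'} + K\|x' - x_{j'}\|$. Also, due to the min,

$$h_i(x) \le h_{ij'} + K\|x - x_{j'}\| = h_i(x') + K\|x - x_{j'}\| - K\|x' - x_{j'}\|.$$

Thus, we get

$$h_i(x) - h_i(x') \le K\big(\|x - x_{j'}\| - \|x' - x_{j'}\|\big) \le K\|x - x'\|.$$

Using the symmetric inequality for $x'$ we get

$$h_i(x') - h_i(x) \le K\big(\|x' - x_j\| - \|x - x_j\|\big) \le K\|x - x'\|.$$

Combining both, we can claim that $|h_i(x) - h_i(x')| \le K\|x - x'\|$. Thus, we have proved that $h_i$ is $K$-Lipschitz, and hence a solution of MinLip.

Proof of Lemma 6

PROOF. Let $p_X$ be the marginal of $p(x, y)$ for the space $X$. Define the expected entropy

$$E[H(x)] = \int p_X(x) \sum_i I_{y = t_i}\, q_i^p(x) \log q_i^p(x)\, dx.$$

Given the loss function, we know that $r_h(p) = -\int p(x, y) \sum_i I_{y = t_i} \log q_i^h(x)\, dx\, dy$. This is the same as $-\int p_X(x) \sum_i I_{y = t_i}\, q_i^p(x) \sum_i I_{y = t_i} \log q_i^h(x)\, dx\, dy$, which reduces to $-\int p_X(x) \sum_i I_{y = t_i}\, q_i^p(x) \log q_i^h(x)\, dx\, dy$. Thus, we have

$$E[H(x)] + r_h(p) = \int p_X(x) \sum_i I_{y = t_i}\, q_i^p(x) \log \frac{q_i^p(x)}{q_i^h(x)}\, dx\, dy.$$

Hence, we obtain

$$E[H(x)] + r_h(p) = E[KL(q^p(x) \,\|\, q^h(x))].$$

Hence, $r_h(p) - r_{h^*}(p)$ is equal to

$$E[KL(q^p(x) \,\|\, q^h(x))] - E[KL(q^p(x) \,\|\, q^{h^*}(x))].$$

Thus, from the assumptions, we get $E[KL(q^p(x) \,\|\, q^h(x))] \le \alpha + \epsilon^*$ with probability $\ge 1 - \delta$.

Next, using Markov's inequality, with probability $\ge 1 - \delta$,

$$Pr\big(KL(q^p(x) \,\|\, q^h(x)) \ge (\alpha + \epsilon^*)^{2/3}\big) \le (\alpha + \epsilon^*)^{1/3},$$

that is, using the notation $\Delta = (\alpha + \epsilon^*)^{1/3}$, with probability $\ge 1 - \delta$,

$$Pr\big(KL(q^p(x) \,\|\, q^h(x)) \ge \Delta^2\big) \le \Delta.$$

Using Pinsker's inequality we get $(1/2)\,\|q^p(x) - q^h(x)\|_1^2 \le KL(q^p(x) \,\|\, q^h(x))$. That is, the event $KL(q^p(x) \,\|\, q^h(x)) \le \Delta^2$ implies the event $\|q^p(x) - q^h(x)\|_1 \le \sqrt{2}\Delta$. Thus, $Pr(\|q^p(x) - q^h(x)\|_1 \le \sqrt{2}\Delta) \ge Pr(KL(q^p(x) \,\|\, q^h(x)) \le \Delta^2)$. Thus, we obtain: with probability $\ge 1 - \delta$, $Pr(\|q^p(x) - q^h(x)\|_1 \le \sqrt{2}\Delta) \ge 1 - \Delta$.

Proof of Lemma 7

PROOF. We know that $q_i^h(x) = \frac{e^{h_i(x)}}{\sum_j e^{h_j(x)}}$ (assume $h_T(x) = 0$). Thus,

$$|q_i^h(x) - q_i^h(x')| = q_i^h(x') \left| e^{h_i(x) - h_i(x')}\, \frac{\sum_j e^{h_j(x')}}{\sum_j e^{h_j(x)}} - 1 \right|.$$

Let $r$ denote $\frac{\sum_j e^{h_j(x')}}{\sum_j e^{h_j(x)}}$. There is $l$ and $k$ such that $\max_r = \frac{e^{h_l(x')}}{e^{h_l(x)}} \ge \frac{e^{h_j(x')}}{e^{h_j(x)}}$ for all $j$ and $\min_r = \frac{e^{h_k(x')}}{e^{h_k(x)}} \le \frac{e^{h_j(x')}}{e^{h_j(x)}}$ for all $j$. Then, $\min_r \le r \le \max_r$.

First, note that due to our assumption that for each $i$, $|h_i(x) - h_i(x')| \le \hat{K}\|x - x'\|$, we have

$$e^{-\hat{K}\|x - x'\|} \le \min\nolimits_r \le r \le \max\nolimits_r \le e^{\hat{K}\|x - x'\|}.$$

Using the Lipschitzness we can also claim that $e^{-\hat{K}\|x - x'\|} \le e^{h_i(x) - h_i(x')} \le e^{\hat{K}\|x - x'\|}$. Thus,

$$e^{-2\hat{K}\|x - x'\|} \le e^{h_i(x) - h_i(x')}\, r \le e^{2\hat{K}\|x - x'\|}.$$

Since $e^{-2\hat{K}\|x - x'\|} < 1$ and $e^{2\hat{K}\|x - x'\|} > 1$, we have

$$\left| e^{h_i(x) - h_i(x')}\, r - 1 \right| \le \max\left( \left| e^{-2\hat{K}\|x - x'\|} - 1 \right|, \left| e^{2\hat{K}\|x - x'\|} - 1 \right| \right).$$

Also, it is a fact that $|e^y - 1| \le 1.5|y|$ for $|y| \le 3/4$. Thus, we obtain

$$\left| e^{h_i(x) - h_i(x')}\, r - 1 \right| \le 3\hat{K}\|x - x'\| \quad \text{for } 2\hat{K}\|x - x'\| \le 3/4.$$

Thus,

$$\|q^h(x) - q^h(x')\|_1 = \sum_i |q_i^h(x) - q_i^h(x')| = \sum_i q_i^h(x') \left| e^{h_i(x) - h_i(x')}\, r - 1 \right| \le \sum_i q_i^h(x')\, 3\hat{K}\|x - x'\| \quad \text{for } \hat{K}\|x - x'\| \le 3/8.$$

Since $\sum_i q_i^h(x') = 1$, we have $\|q^h(x) - q^h(x')\|_1 \le 3\hat{K}\|x - x'\|$ for $\|x - x'\| \le 3/8\hat{K}$. In other words, $q^h$ is locally $3\hat{K}$-Lipschitz for every norm ball of size $3/8\hat{K}$. The following lemma allows us to prove global Lipschitzness.

LEMMA 9. Any function $f$ that is locally $L$-Lipschitz for every $l_p$ ball of size $\delta_0$ on a compact convex set $X \subset R^n$ is Lipschitz on the set $X$. The Lipschitz constant is also $L$.

PROOF. Take any two points $x, y \in X$; the straight line joining $x, y$ lies in $X$ (as $X$ is convex). Also, a finite number of balls of size $\delta_0$ cover $X$ (due to compactness). Thus, there are finitely many points $x = z_1, \ldots, z_\mu = y$ on the line from $x$ to $y$ such that $d_{l_p}(z_i, z_{i+1}) \le \delta_0$. Further, since these points lie on a straight line we have

$$d_{l_p}(x, y) = \sum_{i=1}^{\mu - 1} d_{l_p}(z_i, z_{i+1}).$$

Then, let any metric $d$ be used to measure distance in the range space of $f$; thus, we get

$$d(f(x), f(y)) \le \sum_{i=1}^{\mu - 1} d(f(z_i), f(z_{i+1})) \le \sum_{i=1}^{\mu - 1} L\, d_{l_p}(z_i, z_{i+1}) = L\, d_{l_p}(x, y).$$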
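The Lipschitz claim for $q^h$ in the proof of Lemma 7 can be probed numerically. The sketch below is only a sanity check under stated assumptions, not part of the proof: the score functions are hypothetical linear maps (`make_linear_scores` is an illustrative helper, not from the paper), the norm on $x$ is taken to be $l_1$, and the last score is fixed to 0 to mirror the convention $h_T(x) = 0$.

```python
import math
import random


def softmax(scores):
    """Return exp(s_i) / sum_j exp(s_j), shifted by max(scores) for stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l1_distance(u, v):
    """l1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def make_linear_scores(k_hat, n_targets, rng):
    """Hypothetical scores h_i(x) = k_hat * <a_i, x> with |a_ij| <= 1.

    Each h_i is k_hat-Lipschitz w.r.t. the l1 norm on x (an assumption of
    this sketch); the last score is fixed to 0.
    """
    rows = [[rng.uniform(-1.0, 1.0) for _ in range(n_targets)]
            for _ in range(n_targets - 1)]

    def h(x):
        return [k_hat * sum(a * xi for a, xi in zip(row, x)) for row in rows] + [0.0]

    return h


def worst_observed_ratio(k_hat, n_targets=5, trials=2000, seed=0):
    """Largest observed ||q(x) - q(x')||_1 / ||x - x'||_1 over random pairs."""
    rng = random.Random(seed)
    h = make_linear_scores(k_hat, n_targets, rng)
    worst = 0.0
    for _ in range(trials):
        x = [rng.random() for _ in range(n_targets)]
        y = [rng.random() for _ in range(n_targets)]
        gap = l1_distance(x, y)
        if gap > 0.0:
            worst = max(worst, l1_distance(softmax(h(x)), softmax(h(y))) / gap)
    return worst
```

Calling `worst_observed_ratio(k_hat)` and comparing the result against `3 * k_hat` gives a quick empirical check of the constant; a random search like this can only fail to find a violation, it cannot establish the bound.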

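Since in our case the defender mixed strategy space is compact and convex and $q^h(x)$ satisfies the above lemma with $L = 3\hat{K}$ and $\delta_0 = 3/8\hat{K}$, $q^h(x)$ is $3\hat{K}$-Lipschitz.

Proof of Theorem 4

PROOF. Coupled with the guarantee that with prob. $\ge 1 - \delta$, $Pr(\|q^p(x) - q^h(x)\|_1 \le \sqrt{2}\Delta) \ge 1 - \Delta$, the assumptions guarantee that with prob. $\ge 1 - \delta$ for the learned hypothesis $h$ there must exist an $x' \in B(x^*, \epsilon)$ such that $\|q^p(x') - q^h(x')\|_1 \le \sqrt{2}\Delta$ and there must exist an $x'' \in B(\tilde{x}, \epsilon)$ such that $\|q^p(x'') - q^h(x'')\|_1 \le \sqrt{2}\Delta$. First, for notational ease let $\gamma$ denote $\sqrt{2}\Delta$. The following are immediate using the triangle inequality, with the results $\|q^p(x') - q^h(x')\|_1 \le \gamma$ and $\|q^p(x'') - q^h(x'')\|_1 \le \gamma$ and the Lipschitzness assumptions:

$$\|q^p(x^*) - q^h(x')\|_1 \le K\epsilon + \gamma \qquad (\text{opt}x^*)$$

$$\|q^p(\tilde{x}) - q^h(x'')\|_1 \le 3\hat{K}\epsilon + \gamma \qquad (\text{opt}\tilde{x})$$

We call $\tilde{x}^T U q^h(\tilde{x}) \ge x'^T U q^h(x')$ equation opth. Thus, we bound the utility loss as follows:

$$
\begin{aligned}
x^{*T} U q^p(x^*) - \tilde{x}^T U q^p(\tilde{x})
&= x^{*T} U q^p(x^*) - \tilde{x}^T U q^h(\tilde{x}) + \tilde{x}^T U q^h(\tilde{x}) - \tilde{x}^T U q^p(\tilde{x}) \\
&\le x^{*T} U q^p(x^*) - x'^T U q^h(x') + \tilde{x}^T U q^h(\tilde{x}) - \tilde{x}^T U q^p(\tilde{x}) && \text{using opth} \\
&= (x^* - x')^T U q^p(x^*) + x'^T U (q^p(x^*) - q^h(x')) + \tilde{x}^T U q^h(\tilde{x}) - \tilde{x}^T U q^p(\tilde{x}) \\
&\le \epsilon + (K\epsilon + \gamma) + \tilde{x}^T U q^h(\tilde{x}) - \tilde{x}^T U q^p(\tilde{x}) && \text{using } x' \in B(x^*, \epsilon),\ \text{opt}x^* \\
&= ((K + 1)\epsilon + \gamma) + \tilde{x}^T U (q^h(\tilde{x}) - q^h(x'')) + \tilde{x}^T U (q^h(x'') - q^p(\tilde{x})) \\
&\le (K + 1)\epsilon + \gamma + 6\hat{K}\epsilon + \gamma && \text{using } x'' \in B(\tilde{x}, \epsilon) \text{ with Lipschitz } q^h,\ \text{opt}\tilde{x}
\end{aligned}
$$

B. EXTENSION TO STACKELBERG GAMES

Our technique extends to Stackelberg games by noting that the single resource case $K = 1$ with $T$ targets gives a constraint on $\sum_i x_i$ that directly maps to a probability distribution over actions: the $x_i$'s, with $x_T$ fixed by the remaining probability mass $1 - \sum_i x_i$, are the probabilities of playing each action. With this set-up the security game is a standard Stackelberg game, but one where the leader has $T$ actions and the follower has $T$ actions. Thus, in order to capture the general Stackelberg game, we assume $N$ actions for the adversary (instead of $T$ above). Then, similar to security games, $q_1, \ldots, q_N$ denotes the adversary's probability of playing an action. Thus, the function $h$ now outputs vectors of size $N - 1$ (instead of $T - 1$), i.e., $A$ is a subset of $N - 1$ dimensional Euclidean space.

The model of the security game in the PAC framework extends as is to this Stackelberg setup, just with $h(x)$ and $A$ being $N - 1$ dimensional. The rest of the analysis proceeds exactly as for security games for both the parametric and non-parametric cases, by replacing the $T$ corresponding to the adversary's action space by $N$. Since the proof technique is exactly the same, we just state the final results. Thus, for a Stackelberg game with $T$ leader actions and $N$ follower actions, the bound for Theorem 1 becomes

$$\frac{576 M^2}{\alpha^2} \log \frac{8\,C(\alpha/96N, H, d_{\bar{l}_1})}{\delta}.$$

It can be seen from the proof for the parametric part that the sample complexity does not depend on the dimensionality of $X$, but only on the dimensionality of $A$. Hence, the sample complexity result for the generalized SUQR parametric case is

$$O\left( \frac{1}{\alpha^2} \left( \log \frac{1}{\delta} + N \log \frac{N}{\alpha} \right) \right)$$

and for the non-parametric case, which depends on the dimensionality of both $X$ and $A$, the sample complexity is O ( α ( δ + N + α

C. ANALYSIS OF STANDARD SUQR FORM

For SUQR the rewards and penalties are given and fixed. Let the rewards be $r = \langle r_1, \ldots, r_T \rangle$ (each $r_i \in [0, r_{max}]$, $r_{max} > 0$), and the penalty values be $p = \langle p_1, \ldots, p_T \rangle$ (each $p_i \in [p_{min}, 0]$, $p_{min} < 0$). Thus, the output of $h$ is

$$h(x) = \langle w_1 x_{1T} + w_2 r_{1T} + w_3 p_{1T}, \ldots, w_1 x_{T-1,T} + w_2 r_{T-1,T} + w_3 p_{T-1,T} \rangle$$

where $r_{iT} = r_i - r_T$ and the same for $p_{iT}$. Note that in the above formulation all the component functions $h_i(x)$ have the same weights. We can consider the function space $H$ as the following direct-sum semi-free product:

$$G \oplus F \oplus E = \{ \langle g_1 + f_1 + e_1, \ldots, g_{T-1} + f_{T-1} + e_{T-1} \rangle \mid \langle g_1, \ldots, g_{T-1} \rangle \in G,\ \langle f_1, \ldots, f_{T-1} \rangle \in F,\ \langle e_1, \ldots, e_{T-1} \rangle \in E \},$$

where each of $G$, $F$, $E$ is defined below.

$G = \{ \langle g_1, \ldots, g_{T-1} \rangle \mid \langle g_1, \ldots, g_{T-1} \rangle \in \times_i G_i, \text{ all } g_i \text{ have the same weight} \}$, where $G_i$ has functions of the form $w x_{iT}$.

$F = \{ \langle f_1, \ldots, f_{T-1} \rangle \mid \langle f_1, \ldots, f_{T-1} \rangle \in \times_i F_i, \text{ all } f_i \text{ have the same weight} \}$, where $F_i$ has constant valued functions of the form $w r_{iT}$.

$E = \{ \langle e_1, \ldots, e_{T-1} \rangle \mid \langle e_1, \ldots, e_{T-1} \rangle \in \times_i E_i, \text{ all } e_i \text{ have the same weight} \}$, where $E_i$ has constant valued functions of the form $w p_{iT}$.

Consider an $\epsilon/3$-cover $U_e$ for $E$, an $\epsilon/3$-cover $U_f$ for $F$ and an $\epsilon/3$-cover $U_g$ for $G$. We claim that $U_e \times U_f \times U_g$ is an $\epsilon$-cover for $E \oplus F \oplus G$. Thus, the size of the $\epsilon$-cover for $E \oplus F \oplus G$ is bounded by $|U_e|\,|U_f|\,|U_g|$. Thus,

$$N(\epsilon, H, d_{\bar{l}_1}) \le N(\epsilon/3, G, d_{\bar{l}_1})\, N(\epsilon/3, F, d_{\bar{l}_1})\, N(\epsilon/3, E, d_{\bar{l}_1}).$$

Taking the sup over $P$ we get

$$C(\epsilon, H, d_{\bar{l}_1}) \le C(\epsilon/3, G, d_{\bar{l}_1})\, C(\epsilon/3, F, d_{\bar{l}_1})\, C(\epsilon/3, E, d_{\bar{l}_1}).$$

Now, we show that $U_e \times U_f \times U_g$ is an $\epsilon$-cover for $H = E \oplus F \oplus G$. Fix any $h \in H = E \oplus F \oplus G$. Then, $h = e + f + g$ for some $e \in E$, $f \in F$, $g \in G$. Let $e' \in U_e$ be $\epsilon/3$ close to $e$, $f' \in U_f$ be $\epsilon/3$ close to $f$ and $g' \in U_g$ be $\epsilon/3$ close to $g$, and let $h' = e' + f' + g'$. Then,

$$
\begin{aligned}
d_{L^1(P, d_{\bar{l}_1})}(h, h') &= \frac{1}{k} \sum_{i=1}^{k} \int_X d_{l_1}(h_i(x), h'_i(x))\, dP(x) \\
&\le \frac{1}{k} \sum_{i=1}^{k} \int_X \big( d_{l_1}(g_i(x), g'_i(x)) + d_{l_1}(f_i(x), f'_i(x)) + d_{l_1}(e_i(x), e'_i(x)) \big)\, dP(x) \\
&= d_{L^1(P, d_{\bar{l}_1})}(g, g') + d_{L^1(P, d_{\bar{l}_1})}(f, f') + d_{L^1(P, d_{\bar{l}_1})}(e, e') \le \epsilon.
\end{aligned}
$$

Similar to Lemma , it is possible to show that for any probability distribution $P$, for any functions $g, g'$, $d_{L^1(P, d_{\bar{l}_1})}(g, g') \le |w - w'|$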

and for $f, f'$, $d_{L^1(P, d_{\bar{l}_1})}(f, f') \le |w - w'|\, r_{max}$, and for $e, e'$, $d_{L^1(P, d_{\bar{l}_1})}(e, e') \le |w - w'|\, |p_{min}|$. Assume each of the functions has a range $[-M/6, M/6]$ (this does not affect the order in terms of $M$). Given these ranges, $w$ for $g$ can take values in $[-M/6, M/6]$, $w$ for $f$ can take values in $[-M/6r_{max}, M/6r_{max}]$ and $w$ for $e$ can take values in $[-M/6|p_{min}|, M/6|p_{min}|]$. To get a capacity of $\epsilon/3$ it is enough to divide the respective $w$ range into intervals of $\epsilon/3$ and consider the boundaries. This yields an $\epsilon/3$-capacity of $M/\epsilon$, $M/\epsilon r_{max}$ and $M/\epsilon |p_{min}|$ for $G$, $F$ and $E$ respectively. Thus,

$$C(\epsilon, H, d_{\bar{l}_1}) \le \frac{(M/\epsilon)^3}{r_{max}\, |p_{min}|}.$$

Plugging this into the sample complexity from Theorem 1 we get that the sample complexity is

$$O\left( \frac{1}{\alpha^2} \left( \log \frac{1}{\delta} + \log \frac{T}{\alpha} \right) \right).$$

D. ALTERNATE PROOF FOR GENERALIZED SUQR SAMPLE COMPLEXITY

As discussed in the main paper, we use the function space $H$ with each component function space $H_i$ given by $w_i x_i + c_i$. Then, we can directly use Equation . We still need to bound $C(\epsilon, H_i, d_{\bar{l}_1})$. For this, we note that the set of functions $w_i x_i + c_i$ has two free parameters $w_i$ and $c_i$; thus, this function space is a subset of a vector space of functions of dimension two (two values are needed to represent each function). Using the pseudo-dimension technique [4] we know that for pseudo-dimension $d$ of the function space $H_i$ we get

$$C(\epsilon, H_i, d_{\bar{l}_1}) \le 2 \left( \frac{2eM}{\epsilon} \log \frac{2eM}{\epsilon} \right)^d.$$

Also, we know [4] that the pseudo-dimension is equal to the vector space dimension if the function class is a subset of a vector space. Therefore, for our case $d = 2$. Therefore, using Equation we get C(ɛ, H, d l ( em ɛ em ɛ

Plugging this result into Theorem 1 we get the sample complexity of ( ( ( O ( α δ + ( α α

E. EXPERIMENTAL RESULTS

Here we provide additional experimental results on the Uganda, AMT and simulated datasets. The AMT dataset consisted of 3 unique mixed strategies, 6 of which were deployed for one payoff structure and the remaining 6 for another. In the main paper, we provided results on AMT data for one payoff structure. Here, in Figs. (a) and (b), we show results on the AMT data for both the parametric (SUQR) and NPL learning settings on a payoff structure.

For running experiments on simulated data, we used the same mixed strategies and features as for the AMT data, but simulated attacks, first using the actual SUQR model and then using a modified form of the SUQR model. Figs. (c) and (d) show results on simulated data on the two payoff structures for the parametric cases, when the data is generated by an adversary with an SUQR model with the true weight vector reported in Nguyen et al. [6] ((w, w, w 3 = ( 9.85, 0.37, 0.5, with $c_i = w_2 R_i + w_3 P_i$). Similar results for the NPL model are shown in Figs. (e) and (f) respectively. We can see that the NPL approach performs poorly with only one or five samples, as expected, but improves significantly as more samples are added.

To further show its potential, we modified the true adversary model used to generate attacks from SUQR to the following: $q_i \propto e^{w_1 x_i^2 + c_i}$, i.e., instead of $x_i$, the adversary reasons based on $x_i^2$. We used the same true weight vector to simulate attacks. Then, we observe in Figs. (g) (for one payoff structure) and (h) (for the other payoff structure) that $\alpha$ approaches a value closer to zero for 500 or more samples. Also, the NPL model performs better than the parametric model with 500 or more samples. This shows that the NPL approach is more accurate when the true adversary does not satisfy the simple parametric logistic form, indicating that when we do not know the true function of the adversary's decision making process, adopting a non-parametric method to learn the adversary's behavior is more effective.

Figure: Results on Uganda, AMT and simulated datasets for the parametric and non-parametric cases respectively. Panels: (a) AMT parametric results; (b) AMT nonparametric results; (c), (d) simulated data, parametric results, one panel per payoff structure; (e), (f) simulated data, nonparametric results, one panel per payoff structure; (g), (h) parametric vs non-parametric results on simulated data (for various sample sizes), one panel per payoff structure, when the true adversary model is different from the parametric learned function.
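The simulated-data setup described in Section E can be sketched as follows: attacks are drawn from an SUQR-style distribution, or from the modified model in which the adversary reasons on the squared coverage. This is a minimal illustration under assumptions, not the experimental code: the function names are hypothetical, and any weights, rewards and penalties passed in are placeholders chosen by the caller rather than the values used in the paper.

```python
import math
import random


def attack_distribution(coverage, rewards, penalties, weights, squared=False):
    """Return q_i proportional to exp(w1 * feature_i + w2 * R_i + w3 * P_i).

    feature_i is the coverage x_i, or x_i ** 2 when `squared` is True
    (the modified adversary model). All inputs are illustrative.
    """
    w1, w2, w3 = weights
    scores = []
    for x, r, p in zip(coverage, rewards, penalties):
        feature = x * x if squared else x
        scores.append(w1 * feature + w2 * r + w3 * p)
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_attacks(probabilities, n_samples, seed=0):
    """Draw attacked target indices i.i.d. from the given distribution."""
    rng = random.Random(seed)
    return rng.choices(range(len(probabilities)), weights=probabilities, k=n_samples)
```

A learner would then be trained on pairs of mixed strategy and sampled target; switching `squared` on generates data that a model restricted to the plain SUQR form cannot represent exactly, which is the situation the last experiment examines.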

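To see how the bound from the proof of Theorem 1 combines with the capacity bound of Section C, the arithmetic can be sketched as below. This is a rough calculator under assumptions: it uses the form $m \ge \frac{576M^2}{\alpha^2} \log \frac{8\,C(\alpha/96T, H, d)}{\delta}$ with a natural logarithm and the capacity bound $(M/\epsilon)^3 / (r_{max}|p_{min}|)$ as read from the text above; the function and parameter names are hypothetical.

```python
import math


def suqr_capacity(eps, m_bound, r_max, p_min):
    """Capacity bound (M / eps)^3 / (r_max * |p_min|) from Section C."""
    return (m_bound / eps) ** 3 / (r_max * abs(p_min))


def samples_needed(alpha, delta, m_bound, n_targets, r_max, p_min):
    """Smallest integer m with m >= (576 M^2 / alpha^2) * ln(8 C / delta)."""
    cap = suqr_capacity(alpha / (96.0 * n_targets), m_bound, r_max, p_min)
    return math.ceil(576.0 * m_bound ** 2 / alpha ** 2 * math.log(8.0 * cap / delta))


def failure_probability(m, alpha, m_bound, n_targets, r_max, p_min):
    """Value of 4 C exp(-alpha^2 m / (576 M^2)) for the same capacity bound."""
    cap = suqr_capacity(alpha / (96.0 * n_targets), m_bound, r_max, p_min)
    return 4.0 * cap * math.exp(-alpha ** 2 * m / (576.0 * m_bound ** 2))
```

Feeding the output of `samples_needed` back into `failure_probability` should give a value no larger than `delta / 2`, which is exactly the equation that was solved for $m$. The $1/\alpha^2$ factor dominates, so halving `alpha` roughly quadruples the requirement.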
arxiv: v3 [cs.ai] 20 Nov 2015

Learning Adversary Behavior in Security Games: A PAC Model Perspective
Arunesh Sinha, Debarun Kar, Milind Tambe
University of Southern California
{aruneshs, dkar, tambe}@usc.edu
arxiv:5.00043v3 [cs.ai]