Supplemental Material for Monte Carlo Sampling for Regret Minimization in Extensive Games

Marc Lanctot, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada T6G 2E8
Martin Zinkevich, Yahoo! Research, Santa Clara, CA, USA
Kevin Waugh, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Michael Bowling, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada T6G 2E8

1 Introduction

The supplementary material presented here first presents a detailed description of the MCCFR algorithm. We then give proofs of Theorems 3, 4, and 5 from the submission Monte Carlo Sampling for Regret Minimization in Extensive Games. We begin with some preliminaries, then prove a general result about all members of the MCCFR family of algorithms (Theorem 18 in Section 6). We then use that result to prove bounds for the MCCFR variants (Theorems 19 and 20 in Section 7). We finally prove the tightened bound for vanilla CFR (Theorem 21 in Section 8).

2 MCCFR Algorithm

The MCCFR algorithm is presented in detail in Algorithm 1. In Algorithm 1, the average strategy is updated optimistically: the update to the average strategy is weighted equally for every iteration that has passed since the last time the information set was visited. Note that this can be corrected by maintaining weights at each parent information set which get updated whenever they are visited, and pushing the values of the weights down as needed (lazy updating). The average strategy can also be updated stochastically by weighting each update by the inverse of the probability of reaching the information set. The average strategy $\bar{\sigma}$ is obtained by normalizing the values of the cumulative strategy tables $s_I$ for each action at each information set $I$. Although optimistic averaging is not technically a correct average, it performs well empirically.
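As a concrete illustration of the last step, here is a minimal sketch (not the authors' code; the dictionary-based tables are an assumption for illustration) of recovering the average strategy $\bar{\sigma}$ by normalizing the cumulative strategy tables $s_I$:

```python
def average_strategy(cumulative):
    """Normalize cumulative strategy tables into an average strategy.

    cumulative: dict mapping information set -> dict mapping action -> weight,
    i.e. the tables s_I[a] accumulated by Algorithm 1.
    """
    avg = {}
    for infoset, table in cumulative.items():
        total = sum(table.values())
        if total > 0:
            avg[infoset] = {a: w / total for a, w in table.items()}
        else:
            # Information set never updated: fall back to uniform.
            n = len(table)
            avg[infoset] = {a: 1.0 / n for a in table}
    return avg
```

For example, an information set whose cumulative table is `{"a": 3.0, "b": 1.0}` yields the average strategy `{"a": 0.75, "b": 0.25}`.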
We've discussed two novel sampling schemes in this work: outcome-sampling and external sampling.

2.1 Outcome Sampling

When using outcome-sampling, we can do the updates for each player simultaneously on a single pass over the one sampled terminal history.

Algorithm 1 Monte Carlo CFR with optimistic averaging

Require: a sampling scheme $\mathcal{S}$
  Initialize information set markers: $\forall I$, $c_I \leftarrow 0$
  Initialize regret tables: $\forall I$, $r_I[a] \leftarrow 0$
  Initialize cumulative strategy tables: $\forall I$, $s_I[a] \leftarrow 0$
  Initialize initial profile: $\sigma(I, a) \leftarrow 1/|A(I)|$
  for $t \in \{1, 2, 3, \ldots\}$ do
    for $i \in N$ do
      Sample a block of terminal histories $Q \in \mathcal{Q}$ using $\mathcal{S}$
      for each prefix history $z[I]$ of a terminal history $z \in Q$ with $P(z[I]) = i$ do
        for $a \in A(I)$ do
          Let $\tilde{r} = \tilde{r}(I, a)$, the sampled counterfactual regret
          $r_I[a] \leftarrow r_I[a] + \tilde{r}$
          $s_I[a] \leftarrow s_I[a] + (t - c_I)\,\pi^{\sigma}_i(z[I])\,\sigma_i(I, a)$
        end for
        $c_I \leftarrow t$
        $\sigma_i(I) \leftarrow \mathrm{RegretMatching}(r_I)$
      end for
    end for
  end for

When $z[I]a$ is a prefix of $z$ (action $a$ was taken at $I$ in our sampled history) then

  $\tilde{r}(I, a) = \tilde{v}_i(\sigma^t_{(I \to a)}, I) - \tilde{v}_i(\sigma^t, I)$   (1)
  $= \frac{u_i(z)\,\pi^{\sigma}_{-i}(z[I])\,\pi^{\sigma}(z[I]a, z)}{q(z)} - \frac{u_i(z)\,\pi^{\sigma}_{-i}(z[I])\,\pi^{\sigma}(z[I], z)}{q(z)}$   (2)
  $= \frac{u_i(z)\,\pi^{\sigma}_{-i}(z[I])}{q(z)}\left(\pi^{\sigma}(z[I]a, z) - \pi^{\sigma}(z[I], z)\right)$   (3)
  $= W\left(\pi^{\sigma}(z[I]a, z) - \pi^{\sigma}(z[I], z)\right)$   (4)

where

  $W = \frac{u_i(z)\,\pi^{\sigma}_{-i}(z[I])}{q(z)}$   (5)

When $z[I]a$ is not a prefix of $z$, then $\tilde{v}_i(\sigma^t_{(I \to a)}, I) = 0$, so

  $\tilde{r}(I, a) = 0 - \tilde{v}_i(\sigma^t, I)$   (6)
  $= -W\,\pi^{\sigma}(z[I], z)$   (7)

2.2 External Sampling

When using external sampling, we do the updates for each player separately (one pass over the tree for each player). When updating $I$ belonging to player $i$ and $z[I]a$ is a prefix of $z$, then $\pi^{\sigma}_{-i}(z[I], z) = \pi^{\sigma}_{-i}(z[I]a, z)$ since $a$ is taken by $i$, not the opponent. Also note that $q(z) = \pi^{\sigma}_{-i}(z)$. We have the regret:

  $\tilde{r}(I, a) = \tilde{v}_i(\sigma^t_{(I \to a)}, I) - \tilde{v}_i(\sigma^t, I)$   (8)
  $= \sum_{z \in Q \cap Z_I} \frac{u_i(z)\,\pi^{\sigma}_{-i}(z[I])}{q(z)}\left(\pi^{\sigma}(z[I]a, z) - \pi^{\sigma}(z[I], z)\right)$   (9)
  $= \sum_{z \in Q \cap Z_I} \frac{u_i(z)\,\pi^{\sigma}_{-i}(z[I])\,\pi^{\sigma}_{-i}(z[I], z)}{q(z)}\left(\frac{\pi^{\sigma}(z[I]a, z)}{\pi^{\sigma}_{-i}(z[I], z)} - \frac{\pi^{\sigma}(z[I], z)}{\pi^{\sigma}_{-i}(z[I], z)}\right)$   (10)
  $= \sum_{z \in Q \cap Z_I} \frac{u_i(z)\,\pi^{\sigma}_{-i}(z)}{q(z)}\left(\pi^{\sigma}_i(z[I]a, z) - \pi^{\sigma}_i(z[I], z)\right)$   (11)
  $= \sum_{z \in Q \cap Z_I} u_i(z)\left(\pi^{\sigma}_i(z[I]a, z) - \pi^{\sigma}_i(z[I], z)\right)$   (12)

In practice, $\tilde{r}(I, a)$ can be computed for all $a \in A(I)$ directly from Equation (12) by splitting the sum into two terms and recursively traversing the sub-tree under each action in $I$.
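Equations (4)-(7) reduce the outcome-sampled regret to one importance weight $W$ plus two tail probabilities. A minimal sketch, assuming these quantities have already been computed during the traversal (the function and argument names are illustrative, not from the paper):

```python
def outcome_sampling_regret(u_z, reach_opp, q_z, tail_after_a, tail_after_I, on_path):
    """Sampled counterfactual regret r~(I, a) for one sampled terminal history z.

    u_z          -- u_i(z), payoff of the sampled terminal history
    reach_opp    -- pi^sigma_{-i}(z[I]), opponents'/chance probability of reaching z[I]
    q_z          -- q(z), probability that the sampling scheme produced z
    tail_after_a -- pi^sigma(z[I]a, z), probability of playing from z[I]a to z
    tail_after_I -- pi^sigma(z[I], z), probability of playing from z[I] to z
    on_path      -- True iff z[I]a is a prefix of z (a was the sampled action)
    """
    W = u_z * reach_opp / q_z                      # Equation (5)
    if on_path:
        return W * (tail_after_a - tail_after_I)   # Equation (4)
    return -W * tail_after_I                       # Equation (7)
```

With, say, $u_i(z) = 1$, $\pi^{\sigma}_{-i}(z[I]) = 0.5$, $q(z) = 0.25$, this gives $W = 2$, so the sampled regret is $2(\pi^{\sigma}(z[I]a,z) - \pi^{\sigma}(z[I],z))$ on the sampled path.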

3 Preliminaries

There are several basic properties of random variables and real numbers that are necessary to prove the main results.

Lemma 1 For any random variable $X$:

  $\Pr\left[|X| \geq k\sqrt{E[X^2]}\right] \leq \frac{1}{k^2}$.   (13)

Proof: Markov's Inequality states that, if $Y$ is always non-negative:

  $\Pr[Y \geq j\,E[Y]] \leq \frac{1}{j}$.   (14)

By setting $Y = X^2$:

  $\Pr[X^2 \geq j\,E[X^2]] \leq \frac{1}{j}$   (15)
  $\Pr\left[|X| \geq \sqrt{j}\sqrt{E[X^2]}\right] \leq \frac{1}{j}$.   (16)

Replacing $k = \sqrt{j}$:

  $\Pr\left[|X| \geq k\sqrt{E[X^2]}\right] \leq \frac{1}{k^2}$.   (17)

Lemma 2 If $a_1, \ldots, a_n$ are non-negative real numbers in the interval $[0, 1]$ where $\sum_{i=1}^n a_i = S$, then $\sum_{i=1}^n (a_i)^2 \leq S$.

Proof: Assume without loss of generality that $n \geq S$. Suppose that there are two elements $a_i, a_j$ where $a_i < 1$ and $a_j < 1$. If $a_i + a_j \leq 1$, then:

  $(a_i)^2 + (a_j)^2 \leq (a_i)^2 + 2 a_i a_j + (a_j)^2$   (18)
  $= (a_i + a_j)^2$.   (19)

Thus, for maximizing the sum of squares it is better to replace the pair with $(a_i + a_j, 0)$. If $a_i + a_j > 1$, then define $A = a_i + a_j$, and define $f(x) = (A - x)^2 + x^2$. Setting the derivative to zero:

  $0 = f'(x)$   (20)
  $f'(x) = -2(A - x) + 2x$   (21)
  $2A = 4x$   (22)
  $\frac{A}{2} = x$   (23)

Upon further observation, $f''(x) = 4$, implying that $A/2$ is a minimum point. Therefore, since the critical points of $f(x)$ are $A/2$ and the limits of the feasible region, namely $A - 1$ and $1$, the limits of the feasible region must be the maximum points. Therefore, for any two $a_i$ and $a_j$, either:

1. One or the other is zero, or:
2. One or the other is equal to one.

Therefore, there can be no more than one $i$ such that $a_i \in (0, 1)$; all others must be equal to zero or one. Define $i^* = \lfloor S \rfloor$. Without loss of generality, assume for all $i \in \{1, \ldots, i^*\}$, $a_i = 1$, that $a_{i^*+1} = S - \lfloor S \rfloor$, and for all $i \in \{i^* + 2, \ldots, n\}$, $a_i = 0$. The result follows directly.

Lemma 3 If $a_1, \ldots, a_n$ are non-negative real numbers where $\sum_{i=1}^n a_i = S$, then $\sum_{i=1}^n \sqrt{a_i} \leq \sqrt{Sn}$.

Proof: We prove this by induction on $n$. If $n = 1$, then the result is trivial. Otherwise, define $x = \sum_{i=1}^{n-1} a_i$, so that $a_n + x = S$, and therefore by induction $\sum_{i=1}^n \sqrt{a_i} \leq \sqrt{x(n-1)} + \sqrt{S - x}$. Define $f(x) = \sqrt{x(n-1)} + \sqrt{S - x}$. To maximize $f(x)$, we observe that $0$ and $S$ are critical points, and we take the derivative and set it to zero:

  $f'(x) = 0$   (24)
  $f'(x) = 0.5\sqrt{\frac{n-1}{x}} - 0.5\sqrt{\frac{1}{S-x}}$   (25)
  $0.5\sqrt{\frac{n-1}{x}} = 0.5\sqrt{\frac{1}{S-x}}$   (26)
  $\frac{n-1}{x} = \frac{1}{S-x}$   (27)
  $(n-1)(S-x) = x$   (28)
  $x\,\frac{n}{n-1} = S$   (29)
  $x = \frac{S(n-1)}{n}$   (30)

Therefore, substituting the three critical points yields:

  $f(0) = \sqrt{S}$   (31)
  $f(S) = \sqrt{S(n-1)}$   (32)
  $f\!\left(\frac{S(n-1)}{n}\right) = \sqrt{\frac{S(n-1)}{n}(n-1)} + \sqrt{S - \frac{S(n-1)}{n}}$   (33)
  $= (n-1)\sqrt{\frac{S}{n}} + \sqrt{\frac{S}{n}}$   (34)
  $= \sqrt{Sn}$   (35)

The maximum of these is $\sqrt{Sn}$, establishing the inductive step.

Lemma 4 If $b_1, \ldots, b_n$ are non-negative real numbers where $\sum_{i=1}^n (b_i)^2 = S$, then $\sum_{i=1}^n b_i \leq \sqrt{Sn}$.

Proof: Let $a_i = (b_i)^2$ and apply Lemma 3.

Lemma 5 Given non-negative reals $a_{i,j}$ in $[0, 1]$, where $\sum_{i=1}^m \sum_{j=1}^n a_{i,j} = S$, then:

  $\sum_{i=1}^m \sqrt{\sum_{j=1}^n a_{i,j}} \leq \sqrt{mS}$.   (36)
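Lemma 3 is easy to sanity-check numerically (it also follows from the Cauchy-Schwarz inequality). A quick illustrative script, not part of the proof:

```python
import math
import random

random.seed(0)

# Random non-negative a_1..a_n with sum S; Lemma 3 asserts
# sum_i sqrt(a_i) <= sqrt(S * n).
a = [random.random() for _ in range(10)]
S = sum(a)
lhs = sum(math.sqrt(x) for x in a)
rhs = math.sqrt(S * len(a))
assert lhs <= rhs
```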

4 Blackwell's Approachability Theorem

Consider the following more sophisticated bound for the regret matching procedure using Blackwell's approachability.

Lemma 6 For all real $a$, define $a^+ = \max(a, 0)$. For all $a, b$, it is the case that

  $((a + b)^+)^2 \leq (a^+)^2 + 2(a^+)b + b^2$   (37)

Proof: We prove this by enumerating the possibilities:

1. $a \leq 0$. Then $a^+ = 0$, so we have:

  $((a + b)^+)^2 \leq (b^+)^2$   (38)
  $\leq b^2$,   (39)

and:

  $(a^+)^2 + 2(a^+)b + b^2 = b^2$.   (40)

2. $a \geq 0$, $b \geq -a$. Then $a = a^+$ and $(a + b)^+ = (a + b)$. So:

  $((a + b)^+)^2 = (a + b)^2$.   (41)

Also:

  $(a^+)^2 + 2(a^+)b + b^2 = a^2 + 2ab + b^2$   (42)
  $= (a + b)^2$   (43)

3. $a \geq 0$, $b \leq -a$. Then $a = a^+$, and $(a + b)^+ = 0$. So:

  $((a + b)^+)^2 = 0$.   (44)

Also:

  $(a^+)^2 + 2(a^+)b + b^2 = a^2 + 2ab + b^2$   (45)
  $= (a + b)^2$   (46)
  $\geq 0$   (47)

Define $R^+_{P,T} = \sum_{a \in A} R^+_T(a)$. Regret matching is the strategy $\sigma^{T+1}$ where:

  $\sigma^{T+1}(a) = \begin{cases} \frac{R^+_T(a)}{R^+_{P,T}} & \text{if } R^+_{P,T} > 0, \\ \frac{1}{|A|} & \text{otherwise} \end{cases}$   (48)

Lemma 7 If regret matching is used, then:

  $\sum_{a \in A} R^+_T(a)\, r_{T+1}(a) = 0$   (49)
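Lemma 6 is a pointwise inequality in $a$ and $b$, so it can be spot-checked on random inputs. An illustrative check, not part of the proof:

```python
import random

random.seed(1)

def plus(x):
    """a+ = max(a, 0), as defined in Lemma 6."""
    return max(x, 0.0)

# Check ((a+b)+)^2 <= (a+)^2 + 2(a+)b + b^2 on random reals;
# the small epsilon guards against floating-point rounding.
for _ in range(10000):
    a = random.uniform(-5.0, 5.0)
    b = random.uniform(-5.0, 5.0)
    assert plus(a + b) ** 2 <= plus(a) ** 2 + 2.0 * plus(a) * b + b * b + 1e-9
```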

Proof: If $R^+_{P,T} = 0$, then for all $a \in A$, $R^+_T(a) = 0$, and the result is trivial. Otherwise:

  $\sum_{a \in A} R^+_T(a)\, r_{T+1}(a) = \sum_{a \in A} R^+_T(a)\left(u_{T+1}(a) - u_{T+1}(\sigma^{T+1})\right)$   (50)
  $= \left(\sum_{a \in A} R^+_T(a)\, u_{T+1}(a)\right) - u_{T+1}(\sigma^{T+1})\left(\sum_{a \in A} R^+_T(a)\right)$   (51)
  $= \left(\sum_{a \in A} R^+_T(a)\, u_{T+1}(a)\right) - \left(\sum_{a' \in A} \sigma^{T+1}(a')\, u_{T+1}(a')\right) R^+_{P,T}$   (52)
  $= \left(\sum_{a \in A} R^+_T(a)\, u_{T+1}(a)\right) - \sum_{a' \in A} \frac{R^+_T(a')}{R^+_{P,T}}\, u_{T+1}(a')\, R^+_{P,T}$   (53)
  $= \left(\sum_{a \in A} R^+_T(a)\, u_{T+1}(a)\right) - \sum_{a' \in A} R^+_T(a')\, u_{T+1}(a')$   (54)
  $= 0$   (55)

Theorem 8 Define $\Delta_t$ to be $\max_{a, a' \in A}(u_t(a) - u_t(a'))$. Then regret matching yields:

  $\sum_{a \in A} (R^+_T(a))^2 \leq \frac{1}{T^2}\, |A| \sum_{t=1}^T (\Delta_t)^2$.   (56)

Proof: We prove this by recursion on $T$. The base case (for $T = 1$) is obvious. Assuming the bound holds for $T - 1$, we prove it holds for $T$. Since $R_T(a) = \frac{T-1}{T} R_{T-1}(a) + \frac{1}{T} r_T(a)$, by Lemma 6:

  $(R^+_T(a))^2 \leq \left(\frac{(T-1)\,R^+_{T-1}(a)}{T}\right)^2 + \frac{2(T-1)}{T^2}\, R^+_{T-1}(a)\, r_T(a) + \frac{(r_T(a))^2}{T^2}$   (57)

Summing yields:

  $\sum_{a \in A} (R^+_T(a))^2 \leq \left(\frac{T-1}{T}\right)^2 \sum_{a \in A} (R^+_{T-1}(a))^2 + \frac{2(T-1)}{T^2} \sum_{a \in A} R^+_{T-1}(a)\, r_T(a) + \frac{1}{T^2} \sum_{a \in A} (r_T(a))^2$   (58)

By Lemma 7, $\sum_{a \in A} R^+_{T-1}(a)\, r_T(a) = 0$, so:

  $\sum_{a \in A} (R^+_T(a))^2 \leq \left(\frac{T-1}{T}\right)^2 \sum_{a \in A} (R^+_{T-1}(a))^2 + \frac{1}{T^2} \sum_{a \in A} (r_T(a))^2$   (59)

By induction:

  $\sum_{a \in A} (R^+_{T-1}(a))^2 \leq \frac{1}{(T-1)^2}\, |A| \sum_{t=1}^{T-1} (\Delta_t)^2$.   (60)

Note that $|r_T(a)| \leq \Delta_T$. So:

  $\sum_{a \in A} (R^+_T(a))^2 \leq \frac{1}{T^2}\, |A| \sum_{t=1}^{T-1} (\Delta_t)^2 + \frac{1}{T^2}\, |A|\, (\Delta_T)^2$.   (61)

5 Deterministic Strategies

Before delving into the general proof, we need a few gory details involving deterministic strategies.
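The regret matching rule of Equation (48) and the guarantee of Theorem 8 can be simulated directly. The sketch below (illustrative only, not the paper's code) uses cumulative rather than averaged regrets, i.e. the bound of (56) multiplied through by $T^2$, and spot-checks it against arbitrary bounded payoff sequences with $\Delta_t \leq 1$:

```python
import random

random.seed(2)

def regret_matching(regret):
    """Equation (48): play proportionally to positive regret, else uniform."""
    pos = [max(r, 0.0) for r in regret]
    s = sum(pos)
    n = len(regret)
    return [p / s for p in pos] if s > 0 else [1.0 / n] * n

A, T, Delta = 3, 500, 1.0   # |A| actions, T rounds, payoffs in [0, 1]
R = [0.0] * A               # cumulative regrets R_t(a)
for t in range(T):
    sigma = regret_matching(R)
    u = [random.random() for _ in range(A)]           # arbitrary bounded payoffs
    ev = sum(p * x for p, x in zip(sigma, u))          # u_t(sigma^t)
    for a in range(A):
        R[a] += u[a] - ev                              # r_t(a) = u_t(a) - u_t(sigma^t)

# Cumulative form of Theorem 8: sum_a (R_T^+(a))^2 <= Delta^2 * |A| * T.
assert sum(max(r, 0.0) ** 2 for r in R) <= Delta ** 2 * A * T
```

The bound holds for any payoff sequence, not just the random one used here, which is what makes regret matching a no-regret procedure.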

A deterministic strategy $\sigma_i : \mathcal{I}_i \to A(\mathcal{I}_i)$ maps each information set $I_i \in \mathcal{I}_i$ to an action $a \in A(I_i)$. Define $\hat{\Sigma}_i$ to be the set of deterministic strategies for $i$, $\hat{\Sigma} = \times_{i \in N} \hat{\Sigma}_i$, and $\hat{\Sigma}_{-i} = \times_{j \in N \setminus \{i\}} \hat{\Sigma}_j$. Define $I(h)$ to be the information set $I_i \in \mathcal{I}_{P(h)}$ containing $h$. Given a deterministic strategy profile $\sigma$, we can make it into a function from a history to the next action, defined as $\sigma(h) = \sigma_{P(h)}(I(h))$. The terminal history $h(\sigma)$ is the unique $h \in Z$ such that, for all $t \in \{0, \ldots, |h| - 1\}$, $\sigma(h(t)) = h_{t+1}$, where $h(t)$ denotes the prefix of $h$ of length $t$. An information set $I$ is reached with $\sigma$ if for some $h \sqsubseteq h(\sigma)$, $h \in I$. In a game with perfect recall, define $h(\sigma, I)$ to be the unique $h \in I$ where $h \sqsubseteq h(\sigma)$. If no deterministic strategy $\sigma_{-i}$ of the other players allows $I$ to be reached with $(\sigma_i, \sigma_{-i})$, then $I$ is unreachable with $\sigma_i$. In a game with perfect recall, for each information set $I_i \in \mathcal{I}_i$, given two deterministic profiles $\sigma_{-i}$ and $\sigma'_{-i}$, if $(\sigma_i, \sigma_{-i})$ and $(\sigma_i, \sigma'_{-i})$ both reach $I$, then $h((\sigma_i, \sigma_{-i}), I) = h((\sigma_i, \sigma'_{-i}), I)$. Therefore, if $I$ is reachable with $\sigma_i$ we define $h(\sigma_i, I) = h((\sigma_i, \sigma_{-i}), I)$ for some $\sigma_{-i}$ such that $I$ is reached with $(\sigma_i, \sigma_{-i})$. In general, for any set $S \subseteq N$, $I$ is reachable with $\sigma_S = \{\hat{\sigma}_i\}_{i \in S}$ if there exists a set $\sigma_{N \setminus S} = \{\hat{\sigma}_i\}_{i \in N \setminus S}$ such that $I$ is reachable with $(\sigma_S, \sigma_{N \setminus S})$.

Given a history $h' \in H$, one can consider what would happen if $\sigma$ were used to play $h'$ to termination. In particular, define $h(\sigma, h') \in Z$ to be the unique history $h \in Z$ such that $h' \sqsubseteq h$ and for all $t \in \{|h'|, \ldots, |h| - 1\}$, $\sigma(h(t)) = h_{t+1}$. Thus, for all $h' \in H$, we can define $u_i(h', \sigma) = u_i(h(\sigma, h'))$.

Given an action sequence $\vec{a}$, $\sigma_i$ obliviously plays $\vec{a}$ if it plays the actions in $\vec{a}$ deterministically in sequence. In particular, for any information set $I_i \in \mathcal{I}_i$, define $c(I_i) = |X_i(h)|$, the length of the sequence of information sets and actions reached by this player before this information set, for any $h \in I_i$ (in a game with perfect recall, this is well-defined). Then $\sigma_i(I_i) = a_{c(I_i)+1}$, or is arbitrary if $c(I_i) + 1$ is greater than the number of elements of $\vec{a}$.
Lemma 9 For any deterministic profile $\sigma_{-i}$, for any $\vec{a}$, if $I_i \in \mathcal{I}_i(\vec{a})$ is reachable with $\sigma_{-i}$, then it is reachable with $(\sigma_i, \sigma_{-i})$, where $\sigma_i$ obliviously plays $\vec{a}$.

Proof: Since $I_i$ is reachable with $\sigma_{-i}$, there exists some $\sigma'_i$ such that $I_i$ is reachable with $(\sigma'_i, \sigma_{-i})$. By definition, the history $h(\sigma'_i, \sigma_{-i})$ has a prefix $h^* \in I_i$. Define $\vec{a}(t)$ to be the first $t$ elements of $\vec{a}$, and define $\sigma^t_i$ to be the strategy that obliviously plays $\vec{a}(t)$ and whose arbitrary decisions are equal to those of $\sigma'_i$. We will prove by recursion on $t$ that for all $t \leq |\vec{a}|$, $h(\sigma^t_i, \sigma_{-i}) = h(\sigma'_i, \sigma_{-i})$. First of all, observe that $\sigma^0_i = \sigma'_i$, so the basis of the recursion holds. For the inductive step, we assume that $h(\sigma^{t-1}_i, \sigma_{-i}) = h(\sigma'_i, \sigma_{-i})$, and prove that $h(\sigma^t_i, \sigma_{-i}) = h(\sigma'_i, \sigma_{-i})$. Since $h^* \in I_i$, we have $X(h^*) = ((I_1, a_1), \ldots, (I_k, a_k))$, and since $I_i \in \mathcal{I}_i(\vec{a})$, we have $\vec{a} = (a_1, \ldots, a_k)$. Therefore, define $h'$ to be the prefix of $h^*$ in $I_t$. Note that $\sigma^{t-1}_i$ is in control for all of $I_1, \ldots, I_{t-1}$, and then $\sigma'_i$ selects $a_t$ in information set $I_t$. However, since $\sigma^t_i$ would also have selected $a_t$ by definition, $h(\sigma^t_i, \sigma_{-i}) = h(\sigma^{t-1}_i, \sigma_{-i}) = h(\sigma'_i, \sigma_{-i})$. Note that changing later actions of $\sigma_i$ does not affect whether or not $h^*$ is played, so any arbitrary deterministic strategy which is $\vec{a}$-oblivious will work.

Lemma 10 For any deterministic profile $\sigma_{-i}$, for any $\vec{a}$, there is no more than one reachable $I_i \in \mathcal{I}_i(\vec{a})$.

Proof: Consider two information sets $I_i, I'_i \in \mathcal{I}_i(\vec{a})$ where $I_i \neq I'_i$, and for the sake of contradiction, assume both are reachable with $\sigma_{-i}$. Given a $\sigma_i$ which is $\vec{a}$-oblivious, $I_i$ and $I'_i$ are both reachable with $(\sigma_i, \sigma_{-i})$. But there is only one history generated, $h(\sigma_i, \sigma_{-i})$, and therefore there must exist $h \in I_i$ and $h' \in I'_i$, both prefixes of $h(\sigma_i, \sigma_{-i})$. But that implies that $h \sqsubseteq h'$ or vice-versa, meaning that in a perfect recall game, the sequence of prior information sets and actions of either $I_i$ or $I'_i$ must include the other, an obvious contradiction to both having action sequences of equal size.
Lemma 11 For any strategy $\sigma_j \in \Sigma_j$, there exists a distribution $\rho \in \Delta(\hat{\Sigma}_j)$ such that for any $h \in H$,

  $\Pr_{\hat{\sigma}_j \sim \rho}\left[\forall (I, a) \in X_j(h),\ \hat{\sigma}_j(I) = a\right] = \pi^{\sigma_j}_j(h)$.   (62)

Proof: First, we define $\rho(\hat{\sigma}_j)$ to be:

  $\rho(\hat{\sigma}_j) = \prod_{I \in \mathcal{I}_j} \sigma_j(I)(\hat{\sigma}_j(I))$.   (63)

In other words, the probability of playing $\hat{\sigma}_j$ is the probability of playing like $\hat{\sigma}_j$ everywhere. Summing over all actions outside of $X_j(h)$ gives the lemma.

Lemma 12 Given a single player strategy $\sigma_i \in \Sigma_i$, and $\rho$ generated as in Lemma 11, then:

  $\Pr_{\hat{\sigma}_i \sim \rho}\left[\mathrm{Reach}^{\hat{\sigma}_i}_i(h)\right] = \pi^{\sigma_i}_i(h)$.   (64)

Lemma 13 Given a history $h \in H$, $\mathrm{Reach}^{\hat{\sigma}_i}_i(h)$ if and only if for all $(I, a) \in X_i(h)$, $\hat{\sigma}_i(I) = a$.

Proof: First, if for all $(I, a) \in X_i(h)$, $\hat{\sigma}_i(I) = a$, then for every $j \neq i$ we can define $\hat{\sigma}_j$ such that for all $(I, a) \in X_j(h)$, $\hat{\sigma}_j(I) = a$, and for all $I \notin X_j(h)$, $\hat{\sigma}_j(I)$ is arbitrary. Note that for all $h' \sqsubset h$, there exists an $(I, a) \in X_{P(h')}(h)$ where $h' \in I$ and $h_{|h'|+1} = a$. Therefore, for all $h' \sqsubset h$, $\hat{\sigma}_{P(h')}(I(h')) = h_{|h'|+1}$, implying that $\hat{\sigma}$ reaches $h$. Conversely, if $\mathrm{Reach}^{\hat{\sigma}_i}_i(h)$, then there exists a $\hat{\sigma}_{-i}$ such that $h \sqsubseteq h(\hat{\sigma}_i, \hat{\sigma}_{-i})$. Therefore, for all $h' \sqsubset h$, $\hat{\sigma}(h') = h_{|h'|+1}$, and $\hat{\sigma}_{P(h')}(I(h')) = h_{|h'|+1}$. For all $(I, a) \in X_i(h)$, $I = I(h')$ and $a = h_{|h'|+1}$ for some $h' \sqsubset h$. Moreover, $P(h') = i$, implying that $\hat{\sigma}_i(I(h')) = h_{|h'|+1}$.

Corollary 14 Given any $h \in H$, given $i \in N$, there exists a $\hat{\sigma}_i \in \hat{\Sigma}_i$ that reaches $h$.

Proof: For any history $h$, it is easy to construct a strategy which satisfies Lemma 13.

Lemma 15 Given a set of strategies $\hat{\sigma}_S = \{\hat{\sigma}_i\}_{i \in S}$, if for all $i \in S$, $\mathrm{Reach}^{\hat{\sigma}_i}_i(h)$, then $\mathrm{Reach}^{\hat{\sigma}_S}_S(h)$.

Proof: By Corollary 14, for every $i \in N \setminus S$, there exists a strategy $\hat{\sigma}_i$ that reaches $h$. By Lemma 13, for all $i \in N$, for all $(I, a) \in X_i(h)$, $\hat{\sigma}_i(I) = a$. Moreover, this implies that these strategies reconstruct $h$.

Lemma 16 For any strategy profile $\sigma_{-i} \in \Sigma_{-i}$, there exists a distribution $\rho \in \Delta(\hat{\Sigma}_{-i})$ such that for all $I \in \mathcal{I}_i$, $\pi^{\sigma_{-i}}_{-i}(I) = \sum_{\hat{\sigma}_{-i} \in \hat{\Sigma}_{-i} : \mathrm{Reach}(\hat{\sigma}_{-i}, I)} \rho(\hat{\sigma}_{-i})$.

Proof: For all $j \in N \setminus \{i\}$, using Lemma 11, we generate a distribution $\rho_j \in \Delta(\hat{\Sigma}_j)$. Define $\rho$ to be the distribution over $\hat{\Sigma}_{-i}$ obtained by independently sampling each $\hat{\sigma}_j$ from $\rho_j$; formally,

  $\rho(\hat{\sigma}_{-i}) = \prod_{j \in N \setminus \{i\}} \rho_j(\hat{\sigma}_j)$.   (65)

Consider a history $h \in H$. By Lemma 12, $\pi^{\sigma_j}_j(h) = \Pr_{\hat{\sigma}_j \sim \rho_j}[\mathrm{Reach}^{\hat{\sigma}_j}_j(h)]$. Since the strategies are selected independently:

  $\Pr_{\hat{\sigma}_{-i} \sim \rho}\left[\forall j \in N \setminus \{i\},\ \mathrm{Reach}^{\hat{\sigma}_j}_j(h)\right] = \prod_{j \in N \setminus \{i\}} \Pr_{\hat{\sigma}_j \sim \rho_j}\left[\mathrm{Reach}^{\hat{\sigma}_j}_j(h)\right]$   (66)
  $= \prod_{j \in N \setminus \{i\}} \pi^{\sigma_j}_j(h)$   (67)
  $= \pi^{\sigma_{-i}}_{-i}(h)$   (68)

If we sum over all $h \in I$, we get the result.

Lemma 17 For any strategy profile $\sigma_{-i}$, for any $\vec{a}$:

  $\sum_{I \in \mathcal{I}_i(\vec{a})} \pi^{\sigma_{-i}}_{-i}(I) \leq 1$   (69)

Proof: This follows directly from Lemma 16 and Lemma 10.

6 General MCCFR Bound

We begin by proving a very general bound applicable to all algorithms in the MCCFR family. First, define $\mathcal{B}_i = \{\mathcal{I}_i(\vec{a}) : \vec{a} \in \vec{A}_i\}$, so that $M_i = \sum_{B \in \mathcal{B}_i} \sqrt{|B|}$.

Theorem 18 For any $p \in (0, 1]$, when using any algorithm in the MCCFR family such that for all $Q \in \mathcal{Q}$ and $B \in \mathcal{B}_i$,

  $\sum_{I \in B} \left(\sum_{z \in Q \cap Z_I} \frac{\pi^{\sigma}(z[I], z)\,\pi^{\sigma}_{-i}(z[I])}{q(z)}\right)^2 \leq \frac{1}{\delta^2}$   (70)

where $\delta \leq 1$, then with probability at least $1 - p$, average overall regret is bounded by,

  $R^T_i \leq \left(1 + \frac{\sqrt{2}}{\sqrt{p}}\right) \frac{1}{\delta}\, \Delta_{u,i}\, M_i\, \frac{\sqrt{|A_i|}}{\sqrt{T}}$.   (71)

Proof: Define $r^t_i(I, a)$ to be the unsampled immediate counterfactual regret and $\tilde{r}^t_i(I, a)$ to be the sampled immediate counterfactual regret. Formally,

  $r^t_i(I, a) = v_i(\sigma^t_{(I \to a)}, I) - v_i(\sigma^t, I)$   (72)
  $\tilde{r}^t_i(I, a) = \tilde{v}_i(\sigma^t_{(I \to a)}, I) - \tilde{v}_i(\sigma^t, I)$   (73)
  $R^T_i(I) = \frac{1}{T} \max_{a \in A(I)} \sum_{t=1}^T r^t_i(I, a)$   (74)
  $\tilde{R}^T_i(I) = \frac{1}{T} \max_{a \in A(I)} \sum_{t=1}^T \tilde{r}^t_i(I, a)$   (75)

Let $Q_t \in \mathcal{Q}$ be the block sampled at time $t$. Note that we can bound the difference between two sampled counterfactual values for information set $I$ at time $t$ by,

  $\left|\tilde{v}_i(\sigma^t_{(I \to a)}, I) - \tilde{v}_i(\sigma^t, I)\right| \leq \tilde{\Delta}^t_{u,i}(I) = \Delta_{u,i} \sum_{z \in Q_t \cap Z_I} \frac{\pi^{\sigma}(z[I], z)\,\pi^{\sigma}_{-i}(z[I])}{q(z)}$   (76)

so by our assumption,

  $\sum_{I \in B} \left(\tilde{\Delta}^t_{u,i}(I)\right)^2 \leq \frac{\Delta^2_{u,i}}{\delta^2}$.   (77)

So we can apply Theorem 8, to get,

  $\tilde{R}^T_i(I) \leq \frac{1}{T}\sqrt{|A(I)| \sum_{t=1}^T \left(\tilde{\Delta}^t_{u,i}(I)\right)^2}$   (78)

Using Lemma 4,

  $\sum_{I \in B} \tilde{R}^T_i(I) \leq \frac{1}{T} \sum_{I \in B} \sqrt{|A(B)| \sum_{t=1}^T \left(\tilde{\Delta}^t_{u,i}(I)\right)^2}$   (79)
  $\leq \frac{1}{T} \sqrt{|B|\,|A(B)| \sum_{t=1}^T \sum_{I \in B} \left(\tilde{\Delta}^t_{u,i}(I)\right)^2}$   (80)
  $\leq \frac{1}{T} \sqrt{|B|\,|A(B)|\, T\, \Delta^2_{u,i}/\delta^2}$   (81)
  $= \frac{\Delta_{u,i}}{\delta} \sqrt{\frac{|B|\,|A(B)|}{T}}$   (82)

The average overall sampled regret can then be bounded by,

  $\tilde{R}^T_i \leq \sum_{B \in \mathcal{B}_i} \sum_{I \in B} \tilde{R}^T_i(I)$   (83)
  $\leq \sum_{B \in \mathcal{B}_i} \frac{\Delta_{u,i}}{\delta} \sqrt{\frac{|B|\,|A(B)|}{T}}$   (84)
  $\leq \frac{\Delta_{u,i}\sqrt{|A_i|}}{\delta\sqrt{T}} \sum_{B \in \mathcal{B}_i} \sqrt{|B|}$   (85)
  $= \frac{\Delta_{u,i}\, M_i \sqrt{|A_i|}}{\delta\sqrt{T}}$   (86)

We now need to prove that $R$ and $\tilde{R}$ are similar. This last portion is tricky. Since the algorithm is randomized, we cannot guarantee that every information set is reached, let alone that it has converged. Therefore, instead of proving a bound on the absolute difference of $R$ and $\tilde{R}$, we focus on proving a probabilistic connection. In particular, we will bound the expected squared difference between $\sum_{I \in \mathcal{I}_i} R^T_i(I)$ and $\sum_{I \in \mathcal{I}_i} \tilde{R}^T_i(I)$ in order to prove that they are close, and then use Lemma 1 to bound the absolute value. We begin by focusing on the similarity of the counterfactual regrets $R^T_i(I)$ and $\tilde{R}^T_i(I)$ at every node, by focusing on the similarity of the counterfactual regrets of a particular action at a particular time, $r^t_i(I, a)$ and $\tilde{r}^t_i(I, a)$. By the Lemma from the main paper, we know that $E[r^t_i(I, a) - \tilde{r}^t_i(I, a)] = 0$. From Lemma 4 we have,

  $E\left[\left(\sum_{I \in \mathcal{I}_i} (R^T_i(I) - \tilde{R}^T_i(I))\right)^2\right] \leq |\mathcal{I}_i| \sum_{I \in \mathcal{I}_i} E\left[(R^T_i(I) - \tilde{R}^T_i(I))^2\right]$   (87)

So,

  $(R^T_i(I) - \tilde{R}^T_i(I))^2 = \frac{1}{T^2}\left(\max_{a \in A(I)} \sum_{t=1}^T r^t_i(I, a) - \max_{a \in A(I)} \sum_{t=1}^T \tilde{r}^t_i(I, a)\right)^2$   (88)
  $\leq \frac{1}{T^2} \max_{a \in A(I)} \left(\sum_{t=1}^T r^t_i(I, a) - \sum_{t=1}^T \tilde{r}^t_i(I, a)\right)^2$   (89)
  $= \frac{1}{T^2} \max_{a \in A(I)} \left(\sum_{t=1}^T \left(r^t_i(I, a) - \tilde{r}^t_i(I, a)\right)\right)^2$   (90)

Note that if $f(x)$ is monotonically increasing on the non-negative numbers, then $f(\max_a x_a) = \max_a f(x_a)$. Bounding the maximum by the sum over actions and taking expectations,

  $(R^T_i(I) - \tilde{R}^T_i(I))^2 \leq \frac{1}{T^2} \sum_{a \in A(I)} \left(\sum_{t=1}^T \left(r^t_i(I, a) - \tilde{r}^t_i(I, a)\right)\right)^2$   (91, 92)
  $E\left[(R^T_i(I) - \tilde{R}^T_i(I))^2\right] \leq \frac{1}{T^2} \sum_{a \in A(I)} \sum_{t=1}^T E\left[\left(r^t_i(I, a) - \tilde{r}^t_i(I, a)\right)^2\right]$   (93)

The final step is because if $t \neq t'$, then $E[(r^t_i(I, a) - \tilde{r}^t_i(I, a))(r^{t'}_i(I, a) - \tilde{r}^{t'}_i(I, a))] = 0$ (because if $t' > t$, then after time $t$, $\tilde{r}^{t'}_i(I, a)$ is an unbiased estimator of $r^{t'}_i(I, a)$, and vice-versa).

Substituting back into Equation (87):

  $E\left[\left(\sum_{I \in \mathcal{I}_i} (R^T_i(I) - \tilde{R}^T_i(I))\right)^2\right] \leq \frac{|\mathcal{I}_i|}{T^2} \sum_{I \in \mathcal{I}_i} \sum_{a \in A(I)} \sum_{t=1}^T E\left[\left(r^t_i(I, a) - \tilde{r}^t_i(I, a)\right)^2\right]$   (94)
  $= \frac{|\mathcal{I}_i|}{T^2} \sum_{t=1}^T \sum_{B \in \mathcal{B}_i} \sum_{I \in B} \sum_{a \in A(I)} E\left[\left(r^t_i(I, a) - \tilde{r}^t_i(I, a)\right)^2\right]$   (95)

By Equation (72), $r^t_i(I, a) \leq \Delta_{u,i}\,\pi^{\sigma^t}_{-i}(I)$. From Equation (76), $\tilde{r}^t_i(I, a) \leq \tilde{\Delta}^t_{u,i}(I)$. Thus,

  $E\left[\left(r^t_i(I, a) - \tilde{r}^t_i(I, a)\right)^2\right] \leq E\left[\left(r^t_i(I, a)\right)^2 + \left(\tilde{r}^t_i(I, a)\right)^2\right]$   (96)

Note that for all $B \in \mathcal{B}_i$, by Lemma 17:

  $\sum_{I \in B} \left(\Delta_{u,i}\,\pi^{\sigma^t}_{-i}(I)\right)^2 \leq \Delta^2_{u,i} \sum_{I \in B} \pi^{\sigma^t}_{-i}(I)$   (97)
  $\leq \Delta^2_{u,i}$   (98)

Along with Equation (77), and the fact that $\delta \leq 1$, this means,

  $\sum_{I \in B} E\left[\left(r^t_i(I, a) - \tilde{r}^t_i(I, a)\right)^2\right] \leq \Delta^2_{u,i} + \frac{\Delta^2_{u,i}}{\delta^2}$   (99)
  $\leq \frac{2\,\Delta^2_{u,i}}{\delta^2}$   (100)

Returning to Equation (95),

  $E\left[\left(\sum_{I \in \mathcal{I}_i} (R^T_i(I) - \tilde{R}^T_i(I))\right)^2\right] \leq \frac{|\mathcal{I}_i|}{T^2} \sum_{t=1}^T \sum_{B \in \mathcal{B}_i} |A(B)|\,\frac{2\,\Delta^2_{u,i}}{\delta^2}$   (101)
  $\leq \frac{2\,|\mathcal{I}_i|\,|\mathcal{B}_i|\,|A_i|\,\Delta^2_{u,i}}{\delta^2\, T}$   (102)

Thus by Lemma 1, with probability at least $1 - p$,

  $R^T_i \leq \tilde{R}^T_i + \left|R^T_i - \tilde{R}^T_i\right| \leq \frac{\Delta_{u,i}\, M_i \sqrt{|A_i|}}{\delta\sqrt{T}} + \frac{1}{\sqrt{p}}\sqrt{\frac{2\,|\mathcal{I}_i|\,|\mathcal{B}_i|\,|A_i|\,\Delta^2_{u,i}}{\delta^2\, T}}$   (103)

Since $M_i \geq \sqrt{|\mathcal{I}_i|\,|\mathcal{B}_i|}$,

  $R^T_i \leq \left(1 + \frac{\sqrt{2}}{\sqrt{p}}\right) \frac{1}{\delta}\, \Delta_{u,i}\, M_i\, \frac{\sqrt{|A_i|}}{\sqrt{T}}$   (104)

7 Specific MCCFR Variants

We can now apply Theorem 18 to prove a regret bound for outcome-sampling and external-sampling.

7.1 Outcome-Sampling

Theorem 19 For any $p \in (0, 1]$, when using outcome-sampling MCCFR where for all $z \in Z$ either $\pi^{\sigma}_{-i}(z) = 0$ or $q(z) \geq \delta > 0$ at every timestep, with probability at least $1 - p$, average overall regret is bounded by

  $R^T_i \leq \left(1 + \frac{\sqrt{2}}{\sqrt{p}}\right) \frac{1}{\delta}\, \Delta_{u,i}\, M_i\, \frac{\sqrt{|A_i|}}{\sqrt{T}}$   (105)

Proof: We simply need to show that,

  $\sum_{I \in B} \left(\sum_{z \in Q \cap Z_I} \frac{\pi^{\sigma}(z[I], z)\,\pi^{\sigma}_{-i}(z[I])}{q(z)}\right)^2 \leq \frac{1}{\delta^2}$.   (106)

Note that for all $Q \in \mathcal{Q}$, $|Q| = 1$. Also note that for any $B \in \mathcal{B}_i$ there is at most one $I \in B$ such that $Q \cap Z_I \neq \emptyset$. This is because the information sets with $Q \cap Z_I \neq \emptyset$ all have player $i$'s action sequences of different lengths, while all information sets in $B$ have player $i$'s action sequences of the same length. Therefore, only a single term of the inner sum is ever non-zero. Now by our assumption, for all $I$ and $z \in Z_I$ where $\pi^{\sigma}_{-i}(z) > 0$,

  $\frac{\pi^{\sigma}(z[I], z)\,\pi^{\sigma}_{-i}(z[I])}{q(z)} \leq \frac{1}{\delta}$   (107)

as all the terms of the numerator are at most 1. So the one non-zero term is bounded by $1/\delta$ and so the overall sum of squares must be bounded by $1/\delta^2$.

7.2 External-Sampling

Theorem 20 For any $p \in (0, 1]$, when using external-sampling MCCFR, with probability at least $1 - p$, average overall regret is bounded by

  $R^T_i \leq \left(1 + \frac{\sqrt{2}}{\sqrt{p}}\right) \Delta_{u,i}\, M_i\, \frac{\sqrt{|A_i|}}{\sqrt{T}}$   (108)

Proof: We will simply show that,

  $\sum_{I \in B} \left(\sum_{z \in Q \cap Z_I} \frac{\pi^{\sigma}(z[I], z)\,\pi^{\sigma}_{-i}(z[I])}{q(z)}\right)^2 \leq 1$   (109)

that is, Equation (70) holds with $\delta = 1$. Since $q(z) = \pi^{\sigma}_{-i}(z)$, we need to show,

  $\sum_{I \in B} \left(\sum_{z \in Q \cap Z_I} \pi^{\sigma}_i(z[I], z)\right)^2 \leq 1$   (110)

Let $\hat{\sigma}^t$ be a deterministic strategy profile sampled from $\sigma^t$, where $Q$ is the set of histories consistent with $\hat{\sigma}^t_{-i}$. So $Q \cap Z_I \neq \emptyset$ if and only if $I$ is reachable with $\hat{\sigma}^t_{-i}$. By Lemma 10, for all $B \in \mathcal{B}_i$ there is only one $I \in B$ that is reachable; name it $I^*$. Moreover, there is a unique history in $I^*$ that is a prefix of all $z \in Q \cap Z_{I^*}$; name it $h^*$. So for all $z \in Q \cap Z_{I^*}$, $z[I^*] = h^*$. This is because $\hat{\sigma}^t_{-i}$ uniquely specifies the actions for all but player $i$, and $B$ uniquely specifies the actions for player $i$ prior to reaching $I^*$. Define $\rho$ to be a strategy profile for all players (including chance) where $\rho_j = \hat{\sigma}_j$ for $j \neq i$ but $\rho_i = \sigma_i$. Consider a $z \in Q \cap Z_{I^*}$. $z$ must be reachable by $\hat{\sigma}_{-i}$, so $\pi^{\rho}_{-i}(z) = 1$. So

  $\pi^{\sigma}_i(z[I^*], z) = \pi^{\rho}(h^*, z)$   (111)

So,

  $\sum_{I \in B} \left(\sum_{z \in Q \cap Z_I} \pi^{\sigma}_i(z[I], z)\right)^2 = \left(\sum_{z \in Q \cap Z_{I^*}} \pi^{\rho}(h^*, z)\right)^2$   (112)
  $\leq \left(\sum_{z \in Z_{I^*}} \pi^{\rho}(h^*, z)\right)^2$   (113)
  $\leq 1$   (114)

8 Vanilla CFR: A Tighter Bound

In the final proof we use some of the same ideas as in the previous proofs to tighten the original bound of vanilla CFR, so that the bound depends on $M_i$ rather than $|\mathcal{I}_i|$, as with the MCCFR variants.

Theorem 21 When using vanilla CFR for player $i$, $R^T_i \leq \Delta_{u,i}\, M_i\, \sqrt{|A_i|}/\sqrt{T}$.

Proof: Define $\Delta^t_{u,i}(I) = \pi^{\sigma^t}_{-i}(I)\,\Delta_{u,i}$. Using Theorem 8,

  $R^{T,+}_i(I) \leq \frac{1}{T}\sqrt{|A(I)| \sum_{t=1}^T \left(\Delta^t_{u,i}(I)\right)^2}$   (115)
  $= \frac{\Delta_{u,i}}{T}\sqrt{|A(I)| \sum_{t=1}^T \left(\pi^{\sigma^t}_{-i}(I)\right)^2}$.   (116)

By summing over all information sets of $\mathcal{I}_i$, we get:

  $R^{T,+}_i \leq \frac{\Delta_{u,i}}{T} \sum_{I \in \mathcal{I}_i} \sqrt{|A(I)| \sum_{t=1}^T \left(\pi^{\sigma^t}_{-i}(I)\right)^2}$   (117)
  $\leq \frac{\Delta_{u,i}\sqrt{|A_i|}}{T} \sum_{I \in \mathcal{I}_i} \sqrt{\sum_{t=1}^T \left(\pi^{\sigma^t}_{-i}(I)\right)^2}$   (118)
  $\leq \frac{\Delta_{u,i}\sqrt{|A_i|}}{T} \sum_{I \in \mathcal{I}_i} \sqrt{\sum_{t=1}^T \pi^{\sigma^t}_{-i}(I)}$.   (119)

For each action sequence $B \in \mathcal{B}_i$, by Lemma 17:

  $\sum_{I \in B} \sum_{t=1}^T \pi^{\sigma^t}_{-i}(I) \leq \sum_{t=1}^T 1$   (120)
  $= T$   (121)

Therefore, by Lemma 5:

  $\sum_{I \in B} \sqrt{\sum_{t=1}^T \pi^{\sigma^t}_{-i}(I)} \leq \sqrt{|B|\,T}$   (122)

Summing over all $B \in \mathcal{B}_i$ yields:

  $R^{T,+}_i \leq \frac{\Delta_{u,i}\sqrt{|A_i|}}{T} \sum_{B \in \mathcal{B}_i} \sqrt{|B|\,T} = \Delta_{u,i}\, M_i\, \frac{\sqrt{|A_i|}}{\sqrt{T}}$.   (123)

In practice, this makes the bound on vanilla counterfactual regret as tight as the sampling bounds. The distinctive difference is the amount of computation required per iteration.


More information

Discrete Mathematics. On Nash equilibria and improvement cycles in pure positional strategies for Chess-like and Backgammon-like n-person games

Discrete Mathematics. On Nash equilibria and improvement cycles in pure positional strategies for Chess-like and Backgammon-like n-person games Discrete Mathematics 312 (2012) 772 788 Contents lists available at SciVerse ScienceDirect Discrete Mathematics journal homepage: www.elsevier.com/locate/disc On Nash equilibria and improvement cycles

More information

Numerical Sequences and Series

Numerical Sequences and Series Numerical Sequences and Series Written by Men-Gen Tsai email: b89902089@ntu.edu.tw. Prove that the convergence of {s n } implies convergence of { s n }. Is the converse true? Solution: Since {s n } is

More information

Notes on induction proofs and recursive definitions

Notes on induction proofs and recursive definitions Notes on induction proofs and recursive definitions James Aspnes December 13, 2010 1 Simple induction Most of the proof techniques we ve talked about so far are only really useful for proving a property

More information

Convergence and No-Regret in Multiagent Learning

Convergence and No-Regret in Multiagent Learning Convergence and No-Regret in Multiagent Learning Michael Bowling Department of Computing Science University of Alberta Edmonton, Alberta Canada T6G 2E8 bowling@cs.ualberta.ca Abstract Learning in a multiagent

More information

New Bounds for the Joint Replenishment Problem: Tighter, but not always better

New Bounds for the Joint Replenishment Problem: Tighter, but not always better New Bounds for the Joint Replenishment Problem: ighter, but not always better Eric Porras, Rommert Dekker Econometric Institute, inbergen Institute, Erasmus University Rotterdam, P.O. Box 738, 3000 DR

More information

The Complexity of Ergodic Mean-payoff Games,

The Complexity of Ergodic Mean-payoff Games, The Complexity of Ergodic Mean-payoff Games, Krishnendu Chatterjee Rasmus Ibsen-Jensen Abstract We study two-player (zero-sum) concurrent mean-payoff games played on a finite-state graph. We focus on the

More information

J. Flesch, J. Kuipers, G. Schoenmakers, K. Vrieze. Subgame-Perfection in Free Transition Games RM/11/047

J. Flesch, J. Kuipers, G. Schoenmakers, K. Vrieze. Subgame-Perfection in Free Transition Games RM/11/047 J. Flesch, J. Kuipers, G. Schoenmakers, K. Vrieze Subgame-Perfection in Free Transition Games RM/11/047 Subgame-Perfection in Free Transition Games J. Flesch, J. Kuipers, G. Schoenmakers, K. Vrieze October

More information

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan Monte-Carlo MMD-MA, Université Paris-Dauphine Xiaolu Tan tan@ceremade.dauphine.fr Septembre 2015 Contents 1 Introduction 1 1.1 The principle.................................. 1 1.2 The error analysis

More information

Lecture 5: Two-point Sampling

Lecture 5: Two-point Sampling Randomized Algorithms Lecture 5: Two-point Sampling Sotiris Nikoletseas Professor CEID - ETY Course 2017-2018 Sotiris Nikoletseas, Professor Randomized Algorithms - Lecture 5 1 / 26 Overview A. Pairwise

More information

Using Regret Estimation to Solve Games Compactly. Dustin Morrill

Using Regret Estimation to Solve Games Compactly. Dustin Morrill Using Regret Estimation to Solve Games Compactly by Dustin Morrill A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science Department of Computing Science University

More information

14.1 Finding frequent elements in stream

14.1 Finding frequent elements in stream Chapter 14 Streaming Data Model 14.1 Finding frequent elements in stream A very useful statistics for many applications is to keep track of elements that occur more frequently. It can come in many flavours

More information

arxiv: v2 [cs.ds] 27 Nov 2014

arxiv: v2 [cs.ds] 27 Nov 2014 Single machine scheduling problems with uncertain parameters and the OWA criterion arxiv:1405.5371v2 [cs.ds] 27 Nov 2014 Adam Kasperski Institute of Industrial Engineering and Management, Wroc law University

More information

HW11. Due: December 6, 2018

HW11. Due: December 6, 2018 CSCI 1010 Theory of Computation HW11 Due: December 6, 2018 Attach a fully filled-in cover sheet to the front of your printed homework. Your name should not appear anywhere; the cover sheet and each individual

More information

Shortest paths with negative lengths

Shortest paths with negative lengths Chapter 8 Shortest paths with negative lengths In this chapter we give a linear-space, nearly linear-time algorithm that, given a directed planar graph G with real positive and negative lengths, but no

More information

CS6999 Probabilistic Methods in Integer Programming Randomized Rounding Andrew D. Smith April 2003

CS6999 Probabilistic Methods in Integer Programming Randomized Rounding Andrew D. Smith April 2003 CS6999 Probabilistic Methods in Integer Programming Randomized Rounding April 2003 Overview 2 Background Randomized Rounding Handling Feasibility Derandomization Advanced Techniques Integer Programming

More information

Online Learning Class 12, 20 March 2006 Andrea Caponnetto, Sanmay Das

Online Learning Class 12, 20 March 2006 Andrea Caponnetto, Sanmay Das Online Learning 9.520 Class 12, 20 March 2006 Andrea Caponnetto, Sanmay Das About this class Goal To introduce the general setting of online learning. To describe an online version of the RLS algorithm

More information

11.1 Set Cover ILP formulation of set cover Deterministic rounding

11.1 Set Cover ILP formulation of set cover Deterministic rounding CS787: Advanced Algorithms Lecture 11: Randomized Rounding, Concentration Bounds In this lecture we will see some more examples of approximation algorithms based on LP relaxations. This time we will use

More information

On Acyclicity of Games with Cycles

On Acyclicity of Games with Cycles On Acyclicity of Games with Cycles Daniel Andersson, Vladimir Gurvich, and Thomas Dueholm Hansen Dept. of Computer Science, Aarhus University, {koda,tdh}@cs.au.dk RUTCOR, Rutgers University, gurvich@rutcor.rutgers.edu

More information

25.1 Markov Chain Monte Carlo (MCMC)

25.1 Markov Chain Monte Carlo (MCMC) CS880: Approximations Algorithms Scribe: Dave Andrzejewski Lecturer: Shuchi Chawla Topic: Approx counting/sampling, MCMC methods Date: 4/4/07 The previous lecture showed that, for self-reducible problems,

More information

Partitioning Metric Spaces

Partitioning Metric Spaces Partitioning Metric Spaces Computational and Metric Geometry Instructor: Yury Makarychev 1 Multiway Cut Problem 1.1 Preliminaries Definition 1.1. We are given a graph G = (V, E) and a set of terminals

More information

Lecture 17: D-Stable Polynomials and Lee-Yang Theorems

Lecture 17: D-Stable Polynomials and Lee-Yang Theorems Counting and Sampling Fall 2017 Lecture 17: D-Stable Polynomials and Lee-Yang Theorems Lecturer: Shayan Oveis Gharan November 29th Disclaimer: These notes have not been subjected to the usual scrutiny

More information

Classification and Regression Trees

Classification and Regression Trees Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity

More information

Nash-solvable bidirected cyclic two-person game forms

Nash-solvable bidirected cyclic two-person game forms DIMACS Technical Report 2008-13 November 2008 Nash-solvable bidirected cyclic two-person game forms by Endre Boros 1 RUTCOR, Rutgers University 640 Bartholomew Road, Piscataway NJ 08854-8003 boros@rutcor.rutgers.edu

More information

The Las-Vegas Processor Identity Problem (How and When to Be Unique)

The Las-Vegas Processor Identity Problem (How and When to Be Unique) The Las-Vegas Processor Identity Problem (How and When to Be Unique) Shay Kutten Department of Industrial Engineering The Technion kutten@ie.technion.ac.il Rafail Ostrovsky Bellcore rafail@bellcore.com

More information

CSE 525 Randomized Algorithms & Probabilistic Analysis Spring Lecture 3: April 9

CSE 525 Randomized Algorithms & Probabilistic Analysis Spring Lecture 3: April 9 CSE 55 Randomized Algorithms & Probabilistic Analysis Spring 01 Lecture : April 9 Lecturer: Anna Karlin Scribe: Tyler Rigsby & John MacKinnon.1 Kinds of randomization in algorithms So far in our discussion

More information

Solution Set for Homework #1

Solution Set for Homework #1 CS 683 Spring 07 Learning, Games, and Electronic Markets Solution Set for Homework #1 1. Suppose x and y are real numbers and x > y. Prove that e x > ex e y x y > e y. Solution: Let f(s = e s. By the mean

More information

Lecture 14: Approachability and regret minimization Ramesh Johari May 23, 2007

Lecture 14: Approachability and regret minimization Ramesh Johari May 23, 2007 MS&E 336 Lecture 4: Approachability and regret minimization Ramesh Johari May 23, 2007 In this lecture we use Blackwell s approachability theorem to formulate both external and internal regret minimizing

More information

Online Learning, Mistake Bounds, Perceptron Algorithm

Online Learning, Mistake Bounds, Perceptron Algorithm Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which

More information

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden 1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized

More information

The Lovász Local Lemma : A constructive proof

The Lovász Local Lemma : A constructive proof The Lovász Local Lemma : A constructive proof Andrew Li 19 May 2016 Abstract The Lovász Local Lemma is a tool used to non-constructively prove existence of combinatorial objects meeting a certain conditions.

More information

1 Stochastic Dynamic Programming

1 Stochastic Dynamic Programming 1 Stochastic Dynamic Programming Formally, a stochastic dynamic program has the same components as a deterministic one; the only modification is to the state transition equation. When events in the future

More information

Endre Boros b Khaled Elbassioni d

Endre Boros b Khaled Elbassioni d R u t c o r Research R e p o r t On Nash equilibria and improvement cycles in pure positional strategies for Chess-like and Backgammon-like n-person games a Endre Boros b Khaled Elbassioni d Vladimir Gurvich

More information

CMPUT 675: Approximation Algorithms Fall 2014

CMPUT 675: Approximation Algorithms Fall 2014 CMPUT 675: Approximation Algorithms Fall 204 Lecture 25 (Nov 3 & 5): Group Steiner Tree Lecturer: Zachary Friggstad Scribe: Zachary Friggstad 25. Group Steiner Tree In this problem, we are given a graph

More information

Efficient Approximation for Restricted Biclique Cover Problems

Efficient Approximation for Restricted Biclique Cover Problems algorithms Article Efficient Approximation for Restricted Biclique Cover Problems Alessandro Epasto 1, *, and Eli Upfal 2 ID 1 Google Research, New York, NY 10011, USA 2 Department of Computer Science,

More information

MTAT Complexity Theory October 20th-21st, Lecture 7

MTAT Complexity Theory October 20th-21st, Lecture 7 MTAT.07.004 Complexity Theory October 20th-21st, 2011 Lecturer: Peeter Laud Lecture 7 Scribe(s): Riivo Talviste Polynomial hierarchy 1 Turing reducibility From the algorithmics course, we know the notion

More information

Lecture #5. Dependencies along the genome

Lecture #5. Dependencies along the genome Markov Chains Lecture #5 Background Readings: Durbin et. al. Section 3., Polanski&Kimmel Section 2.8. Prepared by Shlomo Moran, based on Danny Geiger s and Nir Friedman s. Dependencies along the genome

More information

Notes 6 : First and second moment methods

Notes 6 : First and second moment methods Notes 6 : First and second moment methods Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Roc, Sections 2.1-2.3]. Recall: THM 6.1 (Markov s inequality) Let X be a non-negative

More information

A Parameterized Family of Equilibrium Profiles for Three-Player Kuhn Poker

A Parameterized Family of Equilibrium Profiles for Three-Player Kuhn Poker A Parameterized Family of Equilibrium Profiles for Three-Player Kuhn Poker Duane Szafron University of Alberta Edmonton, Alberta dszafron@ualberta.ca Richard Gibson University of Alberta Edmonton, Alberta

More information

LOWER BOUNDS FOR THE UNCAPACITATED FACILITY LOCATION PROBLEM WITH USER PREFERENCES. 1 Introduction

LOWER BOUNDS FOR THE UNCAPACITATED FACILITY LOCATION PROBLEM WITH USER PREFERENCES. 1 Introduction LOWER BOUNDS FOR THE UNCAPACITATED FACILITY LOCATION PROBLEM WITH USER PREFERENCES PIERRE HANSEN, YURI KOCHETOV 2, NENAD MLADENOVIĆ,3 GERAD and Department of Quantitative Methods in Management, HEC Montréal,

More information

Supplementary lecture notes on linear programming. We will present an algorithm to solve linear programs of the form. maximize.

Supplementary lecture notes on linear programming. We will present an algorithm to solve linear programs of the form. maximize. Cornell University, Fall 2016 Supplementary lecture notes on linear programming CS 6820: Algorithms 26 Sep 28 Sep 1 The Simplex Method We will present an algorithm to solve linear programs of the form

More information

Decision making, Markov decision processes

Decision making, Markov decision processes Decision making, Markov decision processes Solved tasks Collected by: Jiří Kléma, klema@fel.cvut.cz Spring 2017 The main goal: The text presents solved tasks to support labs in the A4B33ZUI course. 1 Simple

More information

Distributed Optimization. Song Chong EE, KAIST

Distributed Optimization. Song Chong EE, KAIST Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links

More information

Stochastic Shortest Path Problems

Stochastic Shortest Path Problems Chapter 8 Stochastic Shortest Path Problems 1 In this chapter, we study a stochastic version of the shortest path problem of chapter 2, where only probabilities of transitions along different arcs can

More information

Experts in a Markov Decision Process

Experts in a Markov Decision Process University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2004 Experts in a Markov Decision Process Eyal Even-Dar Sham Kakade University of Pennsylvania Yishay Mansour Follow

More information

Computational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 4 Fall 2010

Computational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 4 Fall 2010 Computational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 4 Fall 010 Peter Bro Miltersen January 16, 011 Version 1. 4 Finite Perfect Information Games Definition

More information

Streaming Algorithms for Optimal Generation of Random Bits

Streaming Algorithms for Optimal Generation of Random Bits Streaming Algorithms for Optimal Generation of Random Bits ongchao Zhou, and Jehoshua Bruck, Fellow, IEEE arxiv:09.0730v [cs.i] 4 Sep 0 Abstract Generating random bits from a source of biased coins (the

More information

Linear Programming Methods

Linear Programming Methods Chapter 11 Linear Programming Methods 1 In this chapter we consider the linear programming approach to dynamic programming. First, Bellman s equation can be reformulated as a linear program whose solution

More information

Theoretical and Practical Advances on Smoothing for Extensive-Form Games

Theoretical and Practical Advances on Smoothing for Extensive-Form Games Theoretical and Practical Advances on Smoothing for Extensive-Form Games CHRISTIAN KROER, Carnegie Mellon University KEVIN WAUGH, University of Alberta FATMA KILINÇ-KARZAN, Carnegie Mellon University TUOMAS

More information

0.1 Uniform integrability

0.1 Uniform integrability Copyright c 2009 by Karl Sigman 0.1 Uniform integrability Given a sequence of rvs {X n } for which it is known apriori that X n X, n, wp1. for some r.v. X, it is of great importance in many applications

More information

CS261: Problem Set #3

CS261: Problem Set #3 CS261: Problem Set #3 Due by 11:59 PM on Tuesday, February 23, 2016 Instructions: (1) Form a group of 1-3 students. You should turn in only one write-up for your entire group. (2) Submission instructions:

More information

A RANDOM VARIANT OF THE GAME OF PLATES AND OLIVES

A RANDOM VARIANT OF THE GAME OF PLATES AND OLIVES A RANDOM VARIANT OF THE GAME OF PLATES AND OLIVES ANDRZEJ DUDEK, SEAN ENGLISH, AND ALAN FRIEZE Abstract The game of plates and olives was originally formulated by Nicolaescu and encodes the evolution of

More information

A note on monotone real circuits

A note on monotone real circuits A note on monotone real circuits Pavel Hrubeš and Pavel Pudlák March 14, 2017 Abstract We show that if a Boolean function f : {0, 1} n {0, 1} can be computed by a monotone real circuit of size s using

More information

Vertex colorings of graphs without short odd cycles

Vertex colorings of graphs without short odd cycles Vertex colorings of graphs without short odd cycles Andrzej Dudek and Reshma Ramadurai Department of Mathematical Sciences Carnegie Mellon University Pittsburgh, PA 1513, USA {adudek,rramadur}@andrew.cmu.edu

More information

R u t c o r Research R e p o r t. On Acyclicity of Games with Cycles. Daniel Andersson a Vladimir Gurvich b Thomas Dueholm Hansen c

R u t c o r Research R e p o r t. On Acyclicity of Games with Cycles. Daniel Andersson a Vladimir Gurvich b Thomas Dueholm Hansen c R u t c o r Research R e p o r t On Acyclicity of Games with Cycles Daniel Andersson a Vladimir Gurvich b Thomas Dueholm Hansen c RRR 8-8, November 8 RUTCOR Rutgers Center for Operations Research Rutgers

More information

The Algorithmic Foundations of Adaptive Data Analysis November, Lecture The Multiplicative Weights Algorithm

The Algorithmic Foundations of Adaptive Data Analysis November, Lecture The Multiplicative Weights Algorithm he Algorithmic Foundations of Adaptive Data Analysis November, 207 Lecture 5-6 Lecturer: Aaron Roth Scribe: Aaron Roth he Multiplicative Weights Algorithm In this lecture, we define and analyze a classic,

More information

ACO Comprehensive Exam October 14 and 15, 2013

ACO Comprehensive Exam October 14 and 15, 2013 1. Computability, Complexity and Algorithms (a) Let G be the complete graph on n vertices, and let c : V (G) V (G) [0, ) be a symmetric cost function. Consider the following closest point heuristic for

More information

The Game of Normal Numbers

The Game of Normal Numbers The Game of Normal Numbers Ehud Lehrer September 4, 2003 Abstract We introduce a two-player game where at each period one player, say, Player 2, chooses a distribution and the other player, Player 1, a

More information

Space and Nondeterminism

Space and Nondeterminism CS 221 Computational Complexity, Lecture 5 Feb 6, 2018 Space and Nondeterminism Instructor: Madhu Sudan 1 Scribe: Yong Wook Kwon Topic Overview Today we ll talk about space and non-determinism. For some

More information

6 The Principle of Optimality

6 The Principle of Optimality 6 The Principle of Optimality De nition A T shot deviation from a strategy s i is a strategy bs i such that there exists T such that bs i (h t ) = s i (h t ) for all h t 2 H with t T De nition 2 A one-shot

More information

For any y 2C, any sub-gradient, v, of h at prox(x), i.e., v and by optimality of prox(x) in (3), we have

For any y 2C, any sub-gradient, v, of h at prox(x), i.e., v and by optimality of prox(x) in (3), we have A Proofs We now give the details for the proof of our main results, i.e., heorems and 2. Below, we outline the steps for the proof of FAG s heorem. he proof of heorem 2 for FARE follows the same line of

More information

Upper Bounds on the Time and Space Complexity of Optimizing Additively Separable Functions

Upper Bounds on the Time and Space Complexity of Optimizing Additively Separable Functions Upper Bounds on the Time and Space Complexity of Optimizing Additively Separable Functions Matthew J. Streeter Computer Science Department and Center for the Neural Basis of Cognition Carnegie Mellon University

More information

Planning in Markov Decision Processes

Planning in Markov Decision Processes Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Planning in Markov Decision Processes Lecture 3, CMU 10703 Katerina Fragkiadaki Markov Decision Process (MDP) A Markov

More information

Bipartite Perfect Matching

Bipartite Perfect Matching Bipartite Perfect Matching We are given a bipartite graph G = (U, V, E). U = {u 1, u 2,..., u n }. V = {v 1, v 2,..., v n }. E U V. We are asked if there is a perfect matching. A permutation π of {1, 2,...,

More information

Self-stabilizing uncoupled dynamics

Self-stabilizing uncoupled dynamics Self-stabilizing uncoupled dynamics Aaron D. Jaggard 1, Neil Lutz 2, Michael Schapira 3, and Rebecca N. Wright 4 1 U.S. Naval Research Laboratory, Washington, DC 20375, USA. aaron.jaggard@nrl.navy.mil

More information

Proving simple set properties...

Proving simple set properties... Proving simple set properties... Part 1: Some examples of proofs over sets Fall 2013 Proving simple set properties... Fall 2013 1 / 17 Introduction Overview: Learning outcomes In this session we will...

More information