Prediction and Playing Games

Size: px

Start display at page:

Download "Prediction and Playing Games"

Beryl Short
5 years ago
Views:

1 Prediction and Playing Games Vineel Pratap February 20, 204 Chapter 7 : Prediction, Learning and Games - Cesa Binachi & Lugosi

2 K-Person Normal Form Games Each player k (k =,..., K) has N k possible actions - i k,..., N K K-tuple of all the players actions i = (i,..., Ii K ) K k= {,..., N k} Loss suffered by player k is l (k), where l (k) : i [0, ]

3 Mixed strategy Mixed strategy for player k is a probability distribution p k = (p (k),..., p(k) N k ) Action played by player k, I (k) is a random variable taking values in the set {,..., N k } according to the distribution p (k) K-tuple of actions played by all players, I = (I (),..., I (K) ) Mixed strategy profile, π(i) = P[I = i] = p () i for all i = (i,...i k ) K k= {,..., N k} x... x p (K) i K

4 Nash Equilibrium The expected loss of player is πl (k) E l (k) (I) N =... i = N K i K = p () i x... x p (K) i K l (k) (i,..., i K ) A mixed strategy profile π = p () x... x p (K) is called a Nash Equilibrium if π l k π k lk for all k =,..., K and mixed strategies q (k), π k = p() x... x q (k) x... x p (K) Nash Theorem : Every finite game has a mixed strategy Nash equilibrium

5 Two-Person Zero-Sum Games For each pair of actions i = (i, i 2 ), where i {,..., N } and i 2 {,..., N 2 }, the losses of the two players satisfy l () (i) = l (2) (i) Simplifying notation - replace l (), N, N 2 by l, N, M Mixed strategy profile π = p x q, where p = (p,..., p N ) and q = (q,..., q M ), is a Nash equilibrium if and only if for all p = (p,..., p N ) and q = (,..., q M ), N M p i q j l(i, j) N i= j= M p i q j l(i, j) i= j= N i= j= M p i q j l(i, j)

6 Two-Person Zero-Sum Games Introduce notation l(p, q) = N max p max max M i= j= p i q j l(i, j) l(p, ) = l(p, q) = l(p, q) p l(p, ) max l(p, ) p l(p, ) max Also, for all p and, l(p, ) p l(p, ) max p p p l(p, ) () l(p, ) max l(p, ) for all p max l(p, ) max p l(p, ) (2)

7 von Neumann s imax theorem From () & (2), the existence of Nash equilibrium p x q implies that max l(p, ) = max l(p, ) p p The common value is called the value of the game, V Nash equilibrium, p x q l(p, q) = V As far as I can see, there could be no theory of games... without that theorem... I thought there was nothing worth publishing until the Minimax Theorem was proved - John von Neumann, 928

8 Correlated Equilibrium A probability distribution P over the set K k= {,..., N k} of all possible K-tuples of actions is called a correlated equilibrium if for all k =,..., K, E l (k) (I) E l (k) (I, Ĩ (k) ), where the r.v. I = (I (),..., I (K) ) is distributed according to P and (I, Ĩ (k) ) = (I (),..., I (k ), Ĩ (k), I (k+),..., I (K) ), where Ĩ (k) is an arbitrary {,..., N k }-valued r.v. that is a function of I (k)

9 Correlated Equilibrium Lemma: A probability distribution P over the set of all K-tuples i = (i,..., i K ) of actions is a correlated equilibrium if and only if, for every player k {,..., K} and actions j, j {,..., N k }, we have P(i) (l (k) (i) l (k) (i, j )) 0 i:i k =j where (i, j ) = (i,..., i k, j, i k+,..., i K ).

10 Repeated Two-Player Zero-Sum Games At each time instant t =, 2,... player k(k =,..., K) selects a mixed strategy p (k) t = (p (k),t,..., p(k) N k,t ) over the set,..., N k of his actions and draws an action I (k) t according to the distribution. Mixed strategy p (k) t may depend on the sequence of random variables I,..., I t

11 Repeated Two-Player Zero-Sum Games If all players play to keep their internal regret small, then the joint empirical frequencies of play converge to the set of correlated equilibria If every player uses a well-calibrated forecasting strategy to predict the K-tuple of actions I t and chooses an action that is the best reply to the forecasted distibution, the same convergence is also achieved.

12 Regret based strategies At each round t, based on the past plays of both players, the row player chooses an action I t {,..., N} according to the mixed strategy p t and the column player chooses an action J t {,..., M} according to the mixed strategy q t. If the row player knew the column player s actions J,..., J n in advance, he would choose I t = arg i=,...,n l(i, J t ) invoking a total loss n i=,...,nl(i, J t ). A meaningful objective is to imize the difference between the row player s cumulative loss and the cumulative loss of the best constant strategy, imize n l(i t, J t ) l(i, J t ) i=,...,n

13 Hannan consistent strategy A strategy is Hannan consistent if the regret is o(), regardless of how the column player behaves. Assug row player chooses his actions I t, regardless of what the column player does, ( lim sup n n l(i t, J t ) i=,...,n n ) l(i, J t ) 0 almost surely. For example, this may be achieved by the exponentially weighted average mixed strategy ( p i,t = exp η ) t s= l(i,js) N k= ( η exp ) i = {,.., N}, η > 0 t s= l(k,js)

14 Hannan consistent strategy Notation : l(p, j) = N M p i l(i, j) and l(i, q) = q j l(i, j) i= Theorem: Assume that in a two-person zero-sum game the row player plays according to a Hannan-consistent strategy. Then lim sup n n l(i t, J t ) V i= almost surely. Assug Hannan consistency, it suffices to show that i=,...,n n i=,...,n n l(i, J t ) V l(i, J t ) = p n l(p, J t )

15 Hannan consistent strategy Then, letting ˆq j,n = n n I J t=j be the emperical probability of the row player s action being j, p n l(p, J t ) = p M ˆq j,n l(p, j) j= = p l(p, ˆq n ) max q p l(p, q) = V. Corollary : Assume that in a two-person zero-sum game, both players play according to some Hannan consistent strategy. Then lim n n l(i t, J t ) = V almost surely.

Learning, Games, and Networks

Learning, Games, and Networks Abhishek Sinha Laboratory for Information and Decision Systems MIT ML Talk Series @CNRG December 12, 2016 1 / 44 Outline 1 Prediction With Experts Advice 2 Application to