Lecture 2, October 19: ε-approximation of 2-player zero-sum games


Optimization II, Winter 2009/10
Lecturer: Khaled Elbassioni
Lecture 2, October 19

1 ε-approximation of 2-player zero-sum games

In this lecture we give a randomized fictitious play algorithm for obtaining an approximate solution for 2-player zero-sum games.

1.1 Matrix games

A 2-player zero-sum game, or a matrix game, is defined by a matrix $A \in \mathbb{R}^{m \times n}$, called the payoff matrix. There are two players with opposing interests: the row player (minimizer) and the column player (maximizer). At each step of the play, the row player selects a row $i$ and the column player selects a column $j$; then the row player pays the column player the value $a_{ij}$ of the $(i, j)$th entry of the matrix. Suppose that the play continues forever. How should one play such a game?

Example 1. Consider the payoff matrix A = [matrix entries missing]. Let us see what happens if the row player chooses row 1. Then the column player (being a maximizer) will choose the first column. But then the row player (being a minimizer) will switch to row 3. Then the column player will find it more profitable to switch to column 3, after which the row player will switch to row 1, resulting in a cycle!
This situation describes what is not an equilibrium. Examining the above matrix, the row maxima are 1, 0, and 1, respectively. So if the row player chooses row 1, the column player can guarantee 1; if the row player chooses row 2, the column player can guarantee 0; and so on. In general, if the row player chooses row $i$, then the column player can guarantee $\max_j a_{ij}$, and thus the row player should choose the row that minimizes this maximum, which gives $\min_i \max_j a_{ij} = 0$. Similarly, since the column minima are $-1$, $0$, $-1$, respectively, the column player can guarantee $\max_j \min_i a_{ij} = 0$. Since it happens in this example that these two values are equal, there is an equilibrium if the row player sticks to playing the 2nd row and the column player sticks to playing the 2nd column.

An equilibrium, or a saddle point, is a pair of strategies for the two players such that no player has an incentive to switch, assuming that the other player does not switch. But is there always a saddle point in pure strategies, as in Example 1? The answer is NO, as the following well-known example shows.

Example 2. Consider the payoff matrix A = [matrix entries missing]. Then $\min_i \max_j a_{ij} = 1 \neq -1 = \max_j \min_i a_{ij}$.
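The comparison of row maxima and column minima used in the two examples can be carried out mechanically. The following is a minimal Python sketch. The function names are illustrative, and both matrices are assumptions chosen for demonstration only (the matrices of Examples 1 and 2 are not reproduced in this copy of the notes): `PENNIES` is a matching-pennies-style matrix with no pure saddle point, and `WITH_SADDLE` is a made-up matrix that has one.

```python
def pure_security_values(A):
    """Return (min_i max_j a_ij, max_j min_i a_ij) for a matrix given as a list of rows."""
    row_maxima = [max(row) for row in A]
    col_minima = [min(row[j] for row in A) for j in range(len(A[0]))]
    return min(row_maxima), max(col_minima)


def pure_saddle_points(A):
    """Return all (i, j), 0-indexed, where a_ij is both a row maximum and a column minimum."""
    points = []
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            if a == max(row) and a == min(r[j] for r in A):
                points.append((i, j))
    return points


# Assumed example matrices, not taken from the lecture notes.
PENNIES = [[1, -1], [-1, 1]]
WITH_SADDLE = [[1, 0, -1], [0, 0, 0], [-1, 0, 1]]

print(pure_security_values(PENNIES), pure_saddle_points(PENNIES))
print(pure_security_values(WITH_SADDLE), pure_saddle_points(WITH_SADDLE))
```

A pure saddle point exists exactly when the two security values coincide, which is why the first matrix yields an empty list.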

So what to do? Play with mixed strategies. That is, the row player chooses a probability vector $x \in S_m = \{x \in \mathbb{R}^m : e^T x = 1,\ x \ge 0\}$, where $e$ denotes the vector of all ones of the appropriate dimension, and plays row $i$ with probability $x_i$. Similarly, the column player plays according to a probability vector $y \in S_n = \{y \in \mathbb{R}^n : e^T y = 1,\ y \ge 0\}$.

Let us denote by $A_1, \ldots, A_m$ the rows of $A$, by $A^1, \ldots, A^n$ the columns of $A$, and by $e_i$ the $i$th unit vector of the appropriate dimension. Then the expected value that the row player pays if she decides to play row $i$ is $\sum_j a_{ij} y_j = A_i y = e_i^T A y$, and hence her expected payment is $\sum_i x_i A_i y = x^T A y$. Similarly, the expected payoff of the column player is $x^T A y$.

For instance, in Example 2 above, if both players choose $(\tfrac{1}{2}, \tfrac{1}{2})$ as their strategy, then the expected payoff for both is 0. On the other hand, if the row player chooses $x = (1, 0)$ while the column player chooses $y = (0, 1)$, then the payoff is 1. Is either of these two pairs of strategies an equilibrium? And does such an equilibrium exist in general?
The answer is YES, as given by the following theorem.

Theorem 1 (Von Neumann (1928)). For any matrix $A \in \mathbb{R}^{m \times n}$,

$$\min_{x \in S_m} \max_{y \in S_n} x^T A y = \max_{y \in S_n} \min_{x \in S_m} x^T A y. \qquad (1)$$

Definition 2 (Saddle point). A saddle point in a matrix game with payoff matrix $A \in \mathbb{R}^{m \times n}$ is a pair of strategies $x^* \in S_m$ and $y^* \in S_n$ such that

$$\min_{x \in S_m} x^T A y^* = \max_{y \in S_n} (x^*)^T A y. \qquad (2)$$

Such a pair will also be called an optimal pair.

Exercise 1. (i) Show that $\min_{x \in S_m} \max_{y \in S_n} x^T A y \ge \max_{y \in S_n} \min_{x \in S_m} x^T A y$. (ii) Show that a pair of strategies $x^* \in S_m$ and $y^* \in S_n$ is optimal if and only if, for all $i, j$: $(x^*)^T A^j \le A_i y^*$.

It is worth noting that a matrix game is equivalent to a pair of packing-covering linear programs (LPs).

Exercise 2. Let $v^*$ be the common value in the identity (1). Show that

$$v^* = \min\{v \mid x^T A \le v e^T,\ x \in S_m\} = \max\{v \mid A y \ge v e,\ y \in S_n\}.$$

Let $\varepsilon \in [0, 1]$ be a given constant. We are interested in ε-optimal strategies, defined as follows.

Definition 3 (ε-optimal strategies). A pair of strategies $x^* \in S_m$ and $y^* \in S_n$ is an ε-optimal pair for a matrix game with payoff matrix $A \in \mathbb{R}^{m \times n}$ if

$$\max_{y \in S_n} (x^*)^T A y \le \min_{x \in S_m} x^T A y^* + \varepsilon. \qquad (3)$$

In this lecture, we consider the problem of finding approximate saddle points of matrix games.

ε-approximation of zero-sum games
Input: A matrix $A \in \mathbb{R}^{m \times n}$ and a desired accuracy $\varepsilon$
Output: A pair of strategies $x \in S_m$ and $y \in S_n$
Objective: $x, y$ form an ε-optimal pair (an ε-equilibrium)
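Condition (3) is easy to test numerically, because a linear function over a simplex attains its maximum and minimum at a vertex: it suffices to compare $\max_j x^T A^j$ with $\min_i A_i y$. The sketch below is only an illustration under that observation. The helper names (`expected_payoff`, `saddle_gap`, `is_eps_optimal`) and the matching-pennies-style matrix are assumptions, not part of the lecture notes.

```python
def expected_payoff(A, x, y):
    """x^T A y for a matrix A (list of rows) and mixed strategies x, y."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(A)) for j in range(len(A[0])))


def saddle_gap(A, x, y):
    """max_j x^T A^j - min_i A_i y, the left side of (3) minus the min on the right."""
    m, n = len(A), len(A[0])
    best_column = max(sum(x[i] * A[i][j] for i in range(m)) for j in range(n))
    best_row = min(sum(A[i][j] * y[j] for j in range(n)) for i in range(m))
    return best_column - best_row


def is_eps_optimal(A, x, y, eps):
    """True if (x, y) satisfies Definition 3 with accuracy eps."""
    return saddle_gap(A, x, y) <= eps


# Assumed example matrix (matching-pennies style), for illustration only.
A = [[1, -1], [-1, 1]]
print(expected_payoff(A, [0.5, 0.5], [0.5, 0.5]), saddle_gap(A, [0.5, 0.5], [0.5, 0.5]))
print(saddle_gap(A, [1, 0], [0, 1]))
```

The gap is always non-negative, since $\max_j x^T A^j \ge x^T A y \ge \min_i A_i y$, and it is zero exactly for optimal pairs.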

1.2 Fictitious play

Fictitious play is a method suggested by Brown in 1951 [Bro51] to obtain a saddle point for a given matrix game. Iteratively, the minimizer and the maximizer maintain in $X(t) \in \mathbb{Z}^m_+$ and $Y(t) \in \mathbb{Z}^n_+$ the frequencies with which rows and columns have been used, respectively, up to time $t$ of the play. Then each player updates his/her strategy by applying the best response, given the current opponent's strategy. The procedure is given below.

Algorithm 1 fictitious play(A)
1 $X(0) := 0$ and $Y(0) := 0$
2 for $t = 1, 2, \ldots$ do
3   $i := \operatorname{argmin}\{A_1 Y(t-1), \ldots, A_m Y(t-1)\}$; $X(t) := X(t-1) + e_i$
4   $j := \operatorname{argmax}\{X(t-1)^T A^1, \ldots, X(t-1)^T A^n\}$; $Y(t) := Y(t-1) + e_j$

Note that at each $t$, the vectors $X(t)/t$ and $Y(t)/t$ are feasible strategies. The convergence of such a pair of strategies, $x^* = \lim_{t \to \infty} X(t)/t$, $y^* = \lim_{t \to \infty} Y(t)/t$, was established by Robinson [Rob51]. A bound of $\left(\frac{2^{m+n} \rho}{\varepsilon}\right)^{m+n-2}$, where $\rho = \max_{i,j} |a_{ij}|$, on the time needed for convergence to an ε-optimal pair was obtained by Shapiro in 1958 [Sha58]. The tendency in the literature is to believe that this time is bounded by $O\!\left(\frac{\mathrm{poly}(n, m)}{\varepsilon^2}\right)$. A smoothed version of fictitious play, introduced in the next section, achieves such a bound.

1.3 Randomized fictitious play

Grigoriadis and Khachiyan (1995) [GK95] introduced a randomized version of fictitious play, in which the argmin and argmax operators in steps 3 and 4 above are replaced by a smoothed selection, which picks a row (respectively, a column) with a probability that decreases (respectively, increases) quickly as the current response of the opponent to this row (respectively, column) increases. More precisely, given the current vectors of frequencies $X(t) \in \mathbb{Z}^m_+$ and $Y(t) \in \mathbb{Z}^n_+$, the row and column players choose, respectively, a row $i$ and a column $j$ according to the (so-called Gibbs) distributions:

$$\frac{p_i(t)}{|p(t)|}, \quad \text{where } p_i(t) = e^{-\frac{\varepsilon}{2} A_i Y(t-1)} \text{ and } |p(t)| = \sum_{i=1}^m p_i(t), \qquad (4)$$

$$\frac{q_j(t)}{|q(t)|}, \quad \text{where } q_j(t) = e^{\frac{\varepsilon}{2} X(t-1)^T A^j} \text{ and } |q(t)| = \sum_{j=1}^n q_j(t). \qquad (5)$$

This will be the theme of most of the algorithms described in the lectures on packing and covering LPs.
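As a concrete illustration of (4) and (5), the two Gibbs distributions can be computed from the current frequency vectors as in the following Python sketch. The function name `gibbs_distributions` is an assumption. Subtracting the smallest row response and the largest column response before exponentiating is an implementation choice made here to avoid overflow; it rescales all weights of a player by the same factor, so the normalized distributions are unchanged.

```python
import math


def gibbs_distributions(A, X, Y, eps):
    """Return (p/|p|, q/|q|) from (4) and (5), given X = X(t-1) and Y = Y(t-1)."""
    m, n = len(A), len(A[0])
    row_response = [sum(A[i][j] * Y[j] for j in range(n)) for i in range(m)]  # A_i Y
    col_response = [sum(X[i] * A[i][j] for i in range(m)) for j in range(n)]  # X^T A^j
    lo, hi = min(row_response), max(col_response)
    # Shifted exponents are all <= 0, so every weight lies in (0, 1].
    p = [math.exp(-0.5 * eps * (r - lo)) for r in row_response]
    q = [math.exp(0.5 * eps * (c - hi)) for c in col_response]
    return [v / sum(p) for v in p], [v / sum(q) for v in q]


# Assumed 2x2 example: after the column player has used column 0 once,
# the row player (minimizer) shifts probability towards row 1.
A = [[1, -1], [-1, 1]]
print(gibbs_distributions(A, [0, 0], [1, 0], 1.0))
```

With all frequencies equal to zero, both distributions are uniform, which is how the first iteration of the randomized algorithm starts.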

Algorithm 2 randomized fictitious play(A)
1 $X(0) := 0$ and $Y(0) := 0$
2 for $t = 1, 2, \ldots, T \stackrel{\text{def}}{=} \frac{6 \ln(2nm)}{\varepsilon^2}$ do
3   Pick $i \in [m]$ and $j \in [n]$ independently, with probabilities $\frac{p_i(t)}{|p(t)|}$ and $\frac{q_j(t)}{|q(t)|}$, respectively
4   $X(t) := X(t-1) + e_i$; $Y(t) := Y(t-1) + e_j$
5 return $\left(\frac{X(T)}{T}, \frac{Y(T)}{T}\right)$

It is the smoothing step in line 3 that makes it possible to prove better bounds on the number of iterations than those currently known for deterministic fictitious play. The analysis, here and in all the algorithms considered in subsequent lectures, follows more or less the same framework: we define a potential function

$$\Phi(t) = |p(t+1)| \, |q(t+1)|, \qquad (6)$$

and show that it does not increase by much from one iteration to the next. By iterating, this implies that the expected potential after $t$ time steps is bounded by the initial potential multiplied by some factor, which might depend exponentially on $t$. Since the potential is a sum of non-negative exponentials, each term in the sum is bounded by the final potential. Taking logs allows us to bound the error in approximation at time $t$ as a function of $t$, and our choice of the terminating time $T$, when plugged into this function, guarantees that the error is at most $\varepsilon$, as desired. The proof we give here uses ideas from Grigoriadis and Khachiyan (1995) [GK95] and Koufogiannakis and Young [KY07].

For the purpose of obtaining an approximation with an absolute error, we will assume that all the entries of the matrix $A$ are in some fixed range, say $[-1, 1]$. Scaling the matrix $A$ by $\frac{1}{\rho}$, where the width parameter $\rho$ is defined as $\rho = \max_{i,j} |a_{ij}|$, and replacing $\varepsilon$ by $\frac{\varepsilon}{\rho}$ in what follows, we get an algorithm that works without this assumption, but whose running time is proportional to $\rho^2$. We note that such dependence on the width is unavoidable in all known algorithms that obtain ε-approximate solutions and whose running time is proportional to $\mathrm{poly}(\frac{1}{\varepsilon})$. An exception is when $A$ is non-negative, in which case this dependence can be removed, as we shall see in a later lecture.

Exercise 3. Show that any matrix game (1) can be converted into an equivalent one in which each entry of the matrix $A$ is in $[a, b]$, where $a, b \in \mathbb{R}$. Does the same reduction work if we are aiming at an ε-approximate saddle point?

Theorem 4. Assuming $A \in [-1, 1]^{m \times n}$, algorithm randomized fictitious play outputs ε-optimal strategies. The total running time is $O\!\left(\frac{(n+m) \log(n+m)}{\varepsilon^2}\right)$.

Proof: The bound on the running time is obvious. So it remains to show that the pair output by the algorithm is ε-optimal. As mentioned above, we analyze the change in the potential function (6). Note that, due to the random choices of the algorithm, the potential function is a random variable. We will prove the following bound.

Lemma 5. For $t = 1, 2, \ldots$,

$$E[\Phi(t)] \le E[\Phi(t-1)] \left(1 + \frac{\varepsilon^2}{6}\right)^2.$$

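Then by iteration we get that $E[\Phi(t)] \le E[\Phi(0)] \left(1 + \frac{\varepsilon^2}{6}\right)^{2t} \le \Phi(0) e^{\frac{\varepsilon^2}{3} t}$, where the last inequality follows by using the inequality $1 + x \le e^x$, valid for all real $x$. This implies by Markov's inequality that, with probability at least $\frac{1}{2}$, after $t$ iterations,

$$\Phi(t) \le 2 e^{\frac{\varepsilon^2}{3} t} \Phi(0). \qquad (7)$$

Note that $\Phi(t) = \sum_{i,j} e^{\frac{\varepsilon}{2} X(t)^T A^j - \frac{\varepsilon}{2} A_i Y(t)}$. Since each term in this sum is non-negative and the sum is bounded by $2 e^{\frac{\varepsilon^2}{3} t} \Phi(0)$, we conclude that each term is also bounded by the same bound. Taking logs and using $\Phi(0) = nm$, we get that

$$\frac{\varepsilon}{2} X(t)^T A^j - \frac{\varepsilon}{2} A_i Y(t) \le \ln(2nm) + \frac{\varepsilon^2}{3} t,$$

or

$$\frac{X(t)^T A^j}{t} \le \frac{A_i Y(t)}{t} + \frac{2 \ln(2nm)}{\varepsilon t} + \frac{2 \varepsilon}{3}.$$

Using the value $t = T = \frac{6 \ln(2nm)}{\varepsilon^2}$ at the end of the last iteration, the middle term equals $\frac{\varepsilon}{3}$, so we get that $\frac{X(t)^T A^j}{t} \le \frac{A_i Y(t)}{t} + \varepsilon$ for all $i, j$, implying (see Exercise 1(ii)) that the pair of strategies output by the algorithm is ε-optimal. It remains to prove Lemma 5.

Proof of Lemma 5. Fix an iteration $t$. Denote by $\Delta X = \Delta X(t)$ and $\Delta Y = \Delta Y(t)$ the changes in the vectors $X$ and $Y$ in iteration $t$; that is, in step 4 we use the updates $X(t) = X(t-1) + \Delta X$ and $Y(t) = Y(t-1) + \Delta Y$. In the following, we condition on the values of $X(t-1)$ and $Y(t-1)$ (so for the moment, the only random events are those in step 3, and hence $p(t)$, $q(t)$, and $\Phi(t-1)$ are all constants). Let $p(t) = (p_1(t), \ldots, p_m(t))$ and $q(t) = (q_1(t), \ldots, q_n(t))$. Then $E[\Delta X] = \frac{p(t)}{|p(t)|}$ and $E[\Delta Y] = \frac{q(t)}{|q(t)|}$. To estimate the change in $\Phi(t-1)$, we estimate the changes in $|p(t)|$ and $|q(t)|$:

$$|p(t+1)| = \sum_{i=1}^m p_i(t+1) = \sum_{i=1}^m e^{-\frac{\varepsilon}{2} A_i Y(t)} = \sum_{i=1}^m e^{-\frac{\varepsilon}{2} A_i (Y(t-1) + \Delta Y)} = \sum_{i=1}^m p_i(t) e^{-\frac{\varepsilon}{2} A_i \Delta Y}. \qquad (8)$$

Exercise 4. Show that, for all $\delta \in [-\frac{1}{2}, \frac{1}{2}]$, $e^{\delta} \le 1 + \delta + \frac{2}{3} \delta^2$.

Note that $\frac{\varepsilon}{2} A_i \Delta Y \in [-\frac{1}{2}, \frac{1}{2}]$, since we have assumed that $|a_{ij}| \le 1$ and $\varepsilon \le 1$. Thus the fact in Exercise 4, together with (8), implies that

$$|p(t+1)| \le \sum_{i=1}^m p_i(t) \left[1 - \frac{\varepsilon}{2} A_i \Delta Y + \frac{\varepsilon^2}{6} (A_i \Delta Y)^2\right] \le \sum_{i=1}^m p_i(t) \left[1 + \frac{\varepsilon^2}{6} - \frac{\varepsilon}{2} A_i \Delta Y\right] = |p(t)| \left[1 + \frac{\varepsilon^2}{6} - \frac{\varepsilon}{2} \frac{p(t)^T A \Delta Y}{|p(t)|}\right],$$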

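where in the second inequality we used again the assumption that $|a_{ij}| \le 1$, and hence $(A_i \Delta Y)^2 \le 1$. In fact, this is the only place where this assumption plays a role in the analysis. Taking the expectation with respect to $\Delta Y$, we get by linearity of expectation

$$E[|p(t+1)|] \le |p(t)| \left[1 + \frac{\varepsilon^2}{6} - \frac{\varepsilon}{2} \frac{p(t)^T A q(t)}{|p(t)| \, |q(t)|}\right].$$

Similarly, we can derive

$$E[|q(t+1)|] \le |q(t)| \left[1 + \frac{\varepsilon^2}{6} + \frac{\varepsilon}{2} \frac{q(t)^T A^T p(t)}{|q(t)| \, |p(t)|}\right].$$

Thus, using independence of $\Delta X$ and $\Delta Y$, we have

$$E[\Phi(t)] = E[|p(t+1)|] \, E[|q(t+1)|] \le |p(t)| \, |q(t)| \left[\left(1 + \frac{\varepsilon^2}{6}\right)^2 + \frac{\varepsilon}{2} \left(1 + \frac{\varepsilon^2}{6}\right) \left(\frac{q(t)^T A^T p(t)}{|q(t)| \, |p(t)|} - \frac{p(t)^T A q(t)}{|p(t)| \, |q(t)|}\right) - \frac{\varepsilon^2}{4} \, \frac{q(t)^T A^T p(t)}{|q(t)| \, |p(t)|} \cdot \frac{p(t)^T A q(t)}{|p(t)| \, |q(t)|}\right].$$

Since $\frac{q(t)^T A^T p(t)}{|q(t)| \, |p(t)|} = \frac{p(t)^T A q(t)}{|p(t)| \, |q(t)|}$, the middle term vanishes and the last term is non-positive, so we get that

$$E[\Phi(t)] \le \Phi(t-1) \left(1 + \frac{\varepsilon^2}{6}\right)^2.$$

Recalling that this expectation was conditional on the values of $X(t-1)$ and $Y(t-1)$, the lemma follows by taking the expectation of both sides of this inequality with respect to $X(t-1)$ and $Y(t-1)$.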
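To make the analysed procedure concrete, here is a minimal Python sketch of randomized fictitious play for a matrix with entries in $[-1, 1]$, using weights $e^{\mp \frac{\varepsilon}{2}(\cdot)}$ and $T = \lceil 6 \ln(2nm) / \varepsilon^2 \rceil$ iterations. It is an illustration under stated assumptions, not a reference implementation: the function names, the rounding of $T$ up to an integer, the use of Python's `random.choices` for sampling, and the shift of the exponents (which only rescales the weights) are all choices made here. The vectors `AY` and `XA` are updated incrementally so that each iteration costs $O(n + m)$ arithmetic operations.

```python
import math
import random


def randomized_fictitious_play(A, eps, rng=None):
    """Run T = ceil(6 ln(2nm) / eps^2) smoothed best-response rounds; return (X/T, Y/T)."""
    rng = rng or random.Random()
    m, n = len(A), len(A[0])
    T = math.ceil(6 * math.log(2 * n * m) / eps ** 2)
    X, Y = [0] * m, [0] * n
    AY = [0.0] * m  # AY[i] = A_i Y(t-1)
    XA = [0.0] * n  # XA[j] = X(t-1)^T A^j
    for _ in range(T):
        lo, hi = min(AY), max(XA)
        # Unnormalized Gibbs weights; random.choices normalizes them.
        p = [math.exp(-0.5 * eps * (v - lo)) for v in AY]
        q = [math.exp(0.5 * eps * (v - hi)) for v in XA]
        i = rng.choices(range(m), weights=p)[0]
        j = rng.choices(range(n), weights=q)[0]
        X[i] += 1
        Y[j] += 1
        for k in range(n):
            XA[k] += A[i][k]  # row i was added to X
        for k in range(m):
            AY[k] += A[k][j]  # column j was added to Y
    return [v / T for v in X], [v / T for v in Y]


def saddle_gap(A, x, y):
    """max_j x^T A^j - min_i A_i y; the pair is eps-optimal iff this is <= eps."""
    m, n = len(A), len(A[0])
    return (max(sum(x[i] * A[i][j] for i in range(m)) for j in range(n))
            - min(sum(A[i][j] * y[j] for j in range(n)) for i in range(m)))


# Assumed example matrix; the printed gap varies from run to run.
A = [[1, -1], [-1, 1]]
x, y = randomized_fictitious_play(A, 0.5)
print(x, y, saddle_gap(A, x, y))
```

Because the guarantee holds only with probability at least $\frac{1}{2}$, a caller who needs a certified answer can check `saddle_gap(A, x, y) <= eps` and rerun the procedure if the check fails.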

Bibliography

[Bro51] George W. Brown. Iterative solution of games by fictitious play. In: T. C. Koopmans, editor, Activity Analysis of Production and Allocation, 1951.
[GK95] Michael D. Grigoriadis and Leonid G. Khachiyan. A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters, 18(2):53-58, 1995.
[KY07] Christos Koufogiannakis and Neal E. Young. Beating simplex for fractional packing and covering linear programs. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2007.
[Rob51] Julia Robinson. An iterative method of solving a game. The Annals of Mathematics, 54(2):296-301, 1951.
[Sha58] Harold N. Shapiro. Note on a computation method in the theory of games. Communications on Pure and Applied Mathematics, 11(4), 1958.
