
Optimization II, Winter 2009/10
Lecturer: Khaled Elbassioni
Lecture 2, October 19

1  ε-approximation of 2-player zero-sum games

In this lecture we give a randomized fictitious play algorithm for obtaining an approximate solution of 2-player zero-sum games.

1.1  Matrix games

A 2-player zero-sum game, or a matrix game, is defined by a matrix $A \in \mathbb{R}^{m \times n}$, called the payoff matrix. There are two players with opposing interests: the row player (minimizer) and the column player (maximizer). At each step of the play, the row player selects a row $i$ and the column player selects a column $j$; then the row player pays to the column player the value $a_{ij}$ of the $(i,j)$-th entry of the matrix. Suppose that the play continues forever; how should such a game be played?

Example 1. Consider the payoff matrix
$$A = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}.$$
Let us see what happens if the row player chooses row 1. Then the column player (being a maximizer) will choose the first column. But then the row player (being a minimizer) will switch to row 3. Then the column player will find it more profitable to switch to column 3, after which the row player will switch back to row 1, resulting in a cycle! This situation describes what is not an equilibrium. Examining the above matrix, the row maxima are 1, 0, and 1, respectively. So if the row player chooses row 1, the column player can guarantee 1; if the row player chooses row 2, then the column player can guarantee 0, and so on. In general, if the row player chooses row $i$, then the column player can guarantee $\max_j a_{ij}$, and thus the row player should choose the row that minimizes this maximum, guaranteeing $\min_i \max_j a_{ij} = 0$ here. Similarly, since the column minima are $-1$, $0$, and $-1$, respectively, the column player can guarantee $\max_j \min_i a_{ij} = 0$. Since it happens in this example that these two values are equal, there will be an equilibrium if the row player sticks to playing the 2nd row and the column player sticks to playing the 2nd column.

An equilibrium, or a saddle point, is a pair of strategies for the two players such that no player has an incentive to switch, assuming that the other player does not switch. But is there always a saddle point in pure strategies as in Example 1? The answer is NO, as the following well-known example shows.

Example 2. Consider the payoff matrix
$$A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$
Then $\min_i \max_j a_{ij} = 1 \neq -1 = \max_j \min_i a_{ij}$.
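The following small snippet (ours, not part of the original notes; the function name is illustrative) computes the two pure-strategy guarantees $\min_i \max_j a_{ij}$ and $\max_j \min_i a_{ij}$ for the matrices of Examples 1 and 2, confirming that the first game has a pure saddle point while the second does not.

```python
import numpy as np

def pure_values(A):
    """Return (min over rows of the row maxima, max over columns of the column minima)."""
    return A.max(axis=1).min(), A.min(axis=0).max()

A1 = np.array([[1, 0, -1],
               [0, 0, 0],
               [-1, 0, 1]])
A2 = np.array([[1, -1],
               [-1, 1]])

print(pure_values(A1))  # (0, 0): pure saddle point at row 2, column 2
print(pure_values(A2))  # (1, -1): no pure saddle point
```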

So what to do? Play with mixed strategies. That is, the row player chooses a probability vector $x \in S_m = \{x \in \mathbb{R}^m : e^T x = 1,\ x \ge 0\}$, where $e$ denotes the vector of all ones of the appropriate dimension, and plays row $i$ with probability $x_i$. Similarly, the column player plays according to a probability vector $y \in S_n = \{y \in \mathbb{R}^n : e^T y = 1,\ y \ge 0\}$. Let us denote by $A_1, \ldots, A_m$ the rows of $A$, by $A^1, \ldots, A^n$ the columns of $A$, and by $e_i$ the $i$-th unit vector of the appropriate dimension. Then the expected value that the row player would pay if she decided to play row $i$ is $\sum_j a_{ij} y_j = A_i y = e_i^T A y$, and hence her expected payoff would be $\sum_i x_i A_i y = x^T A y$. Similarly, the expected payoff of the column player is $x^T A y$.

For instance, in Example 2 above, if both players choose $(\frac{1}{2}, \frac{1}{2})$ as their strategy, then the expected payoff for both is 0. On the other hand, if the row player chooses $x = (\frac{1}{3}, \frac{2}{3})$ while the column player chooses $y = (\frac{2}{3}, \frac{1}{3})$, then the payoff is $-\frac{1}{9}$. Is either of these two pairs of strategies an equilibrium? And does such an equilibrium exist in general? The answer is YES, as given by the following theorem.

Theorem 1 (Von Neumann (1928)). For any matrix $A \in \mathbb{R}^{m \times n}$,
$$\min_{x \in S_m} \max_{y \in S_n} x^T A y = \max_{y \in S_n} \min_{x \in S_m} x^T A y. \qquad (1)$$

Definition 2 (Saddle point). A saddle point in a matrix game with payoff matrix $A \in \mathbb{R}^{m \times n}$ is a pair of strategies $x^* \in S_m$ and $y^* \in S_n$ such that
$$\min_{x \in S_m} x^T A y^* = \max_{y \in S_n} (x^*)^T A y. \qquad (2)$$
Such a pair will also be called an optimal pair.

Exercise 1. (i) Show that $\min_{x \in S_m} \max_{y \in S_n} x^T A y \ge \max_{y \in S_n} \min_{x \in S_m} x^T A y$. (ii) Show that a pair of strategies $x^* \in S_m$ and $y^* \in S_n$ is optimal if and only if, for all $i, j$: $(x^*)^T A^j \le A_i y^*$.

It is worth noting that a matrix game is equivalent to a pair of packing-covering linear programs (LPs).

Exercise 2. Let $v^*$ be the common value in the identity (1). Show that
$$v^* = \min\{v \mid x^T A \le v e^T,\ x \in S_m\} = \max\{v \mid A y \ge v e,\ y \in S_n\}.$$

Let $\varepsilon \in [0, 1]$ be a given constant. We are interested in ε-optimal strategies, defined as follows.

Definition 3 (ε-optimal strategies). A pair of strategies $x^* \in S_m$ and $y^* \in S_n$ is an ε-optimal pair for a matrix game with payoff matrix $A \in \mathbb{R}^{m \times n}$ if
$$\max_{y \in S_n} (x^*)^T A y \le \min_{x \in S_m} x^T A y^* + \varepsilon. \qquad (3)$$

In this lecture, we consider the problem of finding approximate saddle points of matrix games.

ε-approximation of zero-sum games
Input: A matrix $A \in \mathbb{R}^{m \times n}$ and a desired accuracy $\varepsilon$.
Output: A pair of strategies $x^* \in S_m$ and $y^* \in S_n$.
Objective: $x^*, y^*$ are ε-equilibria, i.e., an ε-optimal pair.
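As a quick numerical sanity check (ours, not from the notes; the function names are illustrative), the following snippet evaluates the two strategy pairs discussed above for Example 2, together with the quantity $\max_j x^T A^j - \min_i A_i y$, which by Exercise 1(ii) is zero exactly at an optimal pair and is at most $\varepsilon$ exactly at an ε-optimal pair.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])   # payoff matrix of Example 2

def expected_payment(x, y):
    """Expected amount the row player pays the column player, x^T A y."""
    return x @ A @ y

def optimality_gap(x, y):
    """max_j x^T A^j - min_i A_i y; zero iff (x, y) is an optimal pair."""
    return (x @ A).max() - (A @ y).min()

x1 = y1 = np.array([0.5, 0.5])
x2, y2 = np.array([1/3, 2/3]), np.array([2/3, 1/3])

print(expected_payment(x1, y1), optimality_gap(x1, y1))  # 0.0 and 0.0: an equilibrium
print(expected_payment(x2, y2), optimality_gap(x2, y2))  # -1/9 and 2/3: not an equilibrium
```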

1.2  Fictitious play

Fictitious play is a method suggested by Brown in 1951 [Bro51] to obtain a saddle point for a given matrix game. Iteratively, the minimizer and the maximizer maintain in $X(t) \in \mathbb{Z}_+^m$ and $Y(t) \in \mathbb{Z}_+^n$ the frequencies with which rows and columns have been used, respectively, up to time $t$ of the play. Then each player updates his/her strategy by applying the best response, given the current opponent's strategy. The procedure is given below.

Algorithm 1 fictitious play($A$)
1. $X(0) := 0$ and $Y(0) := 0$
2. for $t = 1, 2, \ldots$ do
3.   $i_t := \operatorname{argmin}\{A_1 Y(t-1), \ldots, A_m Y(t-1)\}$;   $X(t) := X(t-1) + e_{i_t}$
4.   $j_t := \operatorname{argmax}\{X(t-1)^T A^1, \ldots, X(t-1)^T A^n\}$;   $Y(t) := Y(t-1) + e_{j_t}$

Note that at each $t$, the vectors $X(t)/t$ and $Y(t)/t$ are feasible strategies. The convergence of such a pair of strategies, $x^* = \lim_{t \to \infty} X(t)/t$, $y^* = \lim_{t \to \infty} Y(t)/t$, was established by Robinson [Rob51]. A bound of $\left(\frac{(m+n)\rho}{\varepsilon}\right)^{m+n}$, where $\rho = \max_{i,j} |a_{ij}|$, on the time needed for convergence to an ε-pair was obtained by Shapiro in 1958 [Sha58]. The tendency in the literature is to believe that this time is bounded by $O(\mathrm{poly}(n,m)/\varepsilon^2)$. A smoothed version of this fictitious play, introduced in the next section, achieves such a bound.

1.3  Randomized fictitious play

Grigoriadis and Khachiyan (1995) [GK95] introduced a randomized version of fictitious play, in which the argmin and argmax operators in steps 3 and 4 above are replaced by a smoothed selection, which picks a row (respectively, a column) with probability that decreases (respectively, increases) quickly as the current response of the opponent to this row (respectively, column) increases.¹ More precisely, given the current vectors of frequencies $X(t) \in \mathbb{Z}_+^m$ and $Y(t) \in \mathbb{Z}_+^n$, the row and column players choose, respectively, a row $i_t$ and a column $j_t$ according to the (so-called Gibbs) distributions
$$\frac{p_{i_t}(t)}{p(t)}, \quad \text{where } p_i(t) = e^{-\frac{\varepsilon}{2} A_i Y(t-1)} \text{ and } p(t) = \sum_{i=1}^{m} p_i(t), \qquad (4)$$
$$\frac{q_{j_t}(t)}{q(t)}, \quad \text{where } q_j(t) = e^{\frac{\varepsilon}{2} X(t-1)^T A^j} \text{ and } q(t) = \sum_{j=1}^{n} q_j(t). \qquad (5)$$
Here is the algorithm.

¹This will be the theme of most of the algorithms described in the lectures on packing and covering LPs.

Algorithm 2 randomized fictitious play($A$)
1. $X(0) := 0$ and $Y(0) := 0$
2. for $t = 1, 2, \ldots, T \stackrel{\mathrm{def}}{=} \frac{6 \ln(2nm)}{\varepsilon^2}$ do
3.   Pick $i_t \in [m]$ and $j_t \in [n]$ independently, with probabilities $\frac{p_{i_t}(t)}{p(t)}$ and $\frac{q_{j_t}(t)}{q(t)}$, respectively
4.   $X(t) := X(t-1) + e_{i_t}$;   $Y(t) := Y(t-1) + e_{j_t}$
5. return $\left(\frac{X(T)}{T}, \frac{Y(T)}{T}\right)$

It is the smoothing step in line 3 that makes it possible to prove better bounds on the number of iterations than those currently known for deterministic fictitious play. The analysis, here and in all the algorithms considered in subsequent lectures, will follow more or less the same framework: we define a potential function
$$\Phi(t) = p(t+1)\, q(t+1), \qquad (6)$$
and show that it does not increase by much from one iteration to the next. Then this implies, by iterating, that the expected potential after $t$ time steps is bounded by the initial potential multiplied by some factor, which might depend exponentially on $t$. Since the initial potential is a sum of non-negative exponentials, each term in the sum is bounded by the final potential. Taking logs allows us to bound the error in approximation at time $t$ as a function of $t$, and our choice of the terminating time $T$, when plugged into this function, guarantees that the error is less than $\varepsilon$, as desired. The proof we give here uses ideas from Grigoriadis and Khachiyan (1995) [GK95] and Koufogiannakis and Young [KY07].

For the purpose of obtaining an approximation with an absolute error, we will assume that all the entries of the matrix $A$ are in some fixed range, say $[-1, 1]$. Scaling the matrix $A$ by $\frac{1}{\rho}$, where the width parameter $\rho$ is defined as $\rho = \max_{i,j} |a_{ij}|$, and replacing $\varepsilon$ by $\frac{\varepsilon}{\rho}$ in what follows, we get an algorithm that works without this assumption, but whose running time is proportional to $\rho^2$. We note that such dependence on the width is unavoidable in all known algorithms that obtain ε-approximate solutions and whose running time is proportional to $\mathrm{poly}(\frac{1}{\varepsilon})$. An exception is when $A$ is non-negative, in which case this dependence can be removed, as we shall see in a later lecture.

Exercise 3. Show that any matrix game (1) can be converted into an equivalent one in which each entry of the matrix $A$ is in $[a, b]$, where $a, b \in \mathbb{R}$. Does the same reduction work if we are aiming at an ε-approximate saddle point?

Theorem 4. Assuming $A \in [-1, 1]^{m \times n}$, algorithm randomized fictitious play outputs ε-optimal strategies. The total running time is $O\!\left(\frac{(n+m) \log(n+m)}{\varepsilon^2}\right)$.

Proof: The bound on the running time is obvious (each of the $T$ iterations can be implemented in $O(m+n)$ time by updating the values $A_i Y(t)$ and $X(t)^T A^j$ incrementally). So it remains to show that the pair output by the algorithm is ε-optimal. As mentioned above, we analyze the change in the potential function (6). Note that, due to the random choices of the algorithm, the potential function is a random variable. We will prove the following bound.

Lemma 5. For $t = 1, 2, \ldots$,
$$E[\Phi(t)] \le E[\Phi(t-1)] \left(1 + \frac{\varepsilon^2}{6}\right)^2.$$

Then by iteration we get that $E[\Phi(t)] \le \Phi(0) \left(1 + \frac{\varepsilon^2}{6}\right)^{2t} \le \Phi(0)\, e^{\varepsilon^2 t / 3}$, where the last inequality follows by using the inequality $1 + x \le e^x$, valid for all real $x$. This implies, by Markov's inequality, that with probability at least $\frac{1}{2}$, after $t$ iterations,
$$\Phi(t) \le 2\, e^{\varepsilon^2 t / 3}\, \Phi(0). \qquad (7)$$
Note that $\Phi(t) = \sum_{i,j} e^{\frac{\varepsilon}{2} X(t)^T A^j - \frac{\varepsilon}{2} A_i Y(t)}$. Since each term in this sum is non-negative and the sum is bounded by $2 e^{\varepsilon^2 t/3} \Phi(0)$, we conclude that each term is also bounded by the same bound. Taking logs and using $\Phi(0) = nm$, we get that
$$\frac{\varepsilon}{2} X(t)^T A^j - \frac{\varepsilon}{2} A_i Y(t) \le \ln(2nm) + \frac{\varepsilon^2 t}{3},$$
or
$$\frac{X(t)^T A^j}{t} - \frac{A_i Y(t)}{t} \le \frac{2 \ln(2nm)}{\varepsilon t} + \frac{2\varepsilon}{3}.$$
Using the value of $t = T = \frac{6 \ln(2nm)}{\varepsilon^2}$ at the end of the last iteration, we get that $\frac{X(T)^T A^j}{T} \le \frac{A_i Y(T)}{T} + \varepsilon$ for all $i, j$, implying (see Exercise 1(ii)) that the pair of strategies output by the algorithm is ε-optimal. It remains to prove Lemma 5.

Proof of Lemma 5: Fix an iteration $t$. Denote by $\Delta X = \Delta X(t)$ and $\Delta Y = \Delta Y(t)$ the changes in the vectors $X$ and $Y$ in iteration $t$, that is, in step 4 we use the updates $X(t) = X(t-1) + \Delta X$ and $Y(t) = Y(t-1) + \Delta Y$. In the following, we condition on the values of $X(t-1)$ and $Y(t-1)$ (so, for the moment, we will think that the only random events are those in step 3, and hence $p(t)$, $q(t)$, and $\Phi(t-1)$ are all constants). Let $p(t) = (p_1(t), \ldots, p_m(t))$ and $q(t) = (q_1(t), \ldots, q_n(t))$ denote also the corresponding vectors. Then $E[\Delta X] = \frac{p(t)}{p(t)}$ and $E[\Delta Y] = \frac{q(t)}{q(t)}$. To estimate the change in $\Phi(t-1)$, we estimate the changes in $p(t)$ and $q(t)$:
$$p(t+1) = \sum_i p_i(t+1) = \sum_i e^{-\frac{\varepsilon}{2} A_i Y(t)} = \sum_i e^{-\frac{\varepsilon}{2} A_i (Y(t-1) + \Delta Y)} = \sum_i p_i(t)\, e^{-\frac{\varepsilon}{2} A_i \Delta Y}. \qquad (8)$$

Exercise 4. Show that, for all $\delta \in [-\frac{1}{2}, \frac{1}{2}]$, $e^{\delta} \le 1 + \delta + \frac{2}{3} \delta^2$.

Note that $\frac{\varepsilon}{2} A_i \Delta Y \in [-\frac{1}{2}, \frac{1}{2}]$ since we have assumed that $|a_{ij}| \le 1$. Thus the fact in Exercise 4, together with (8), implies that
$$p(t+1) \le \sum_i p_i(t) \left[1 - \frac{\varepsilon}{2} A_i \Delta Y + \frac{\varepsilon^2}{6} (A_i \Delta Y)^2\right] \le \sum_i p_i(t) \left[1 + \frac{\varepsilon^2}{6} - \frac{\varepsilon}{2} A_i \Delta Y\right] = p(t) \left[1 + \frac{\varepsilon^2}{6}\right] - \frac{\varepsilon}{2}\, p(t)^T A\, \Delta Y,$$

where in the second inequality we used again the assumption that $|a_{ij}| \le 1$, and hence $(A_i \Delta Y)^2 \le 1$. In fact, this is the only place where this assumption plays a role in the analysis. Taking the expectation with respect to $\Delta Y$, we get by linearity of expectation
$$E[p(t+1)] \le p(t) \left[1 + \frac{\varepsilon^2}{6}\right] - \frac{\varepsilon}{2} \cdot \frac{p(t)^T A\, q(t)}{q(t)}.$$
Similarly, we can derive
$$E[q(t+1)] \le q(t) \left[1 + \frac{\varepsilon^2}{6}\right] + \frac{\varepsilon}{2} \cdot \frac{q(t)^T A^T p(t)}{p(t)}.$$
Thus, using independence of $\Delta X$ and $\Delta Y$, we have
$$E[\Phi(t)] = E[p(t+1)] \cdot E[q(t+1)] \le p(t)\, q(t) \left[\left(1 + \frac{\varepsilon^2}{6}\right)^2 + \frac{\varepsilon}{2}\left(1 + \frac{\varepsilon^2}{6}\right)\left(\frac{q(t)^T A^T p(t)}{q(t)\, p(t)} - \frac{p(t)^T A\, q(t)}{p(t)\, q(t)}\right) - \frac{\varepsilon^2}{4} \cdot \frac{q(t)^T A^T p(t)}{q(t)\, p(t)} \cdot \frac{p(t)^T A\, q(t)}{p(t)\, q(t)}\right].$$
Since $\frac{q(t)^T A^T p(t)}{q(t)\, p(t)} = \frac{p(t)^T A\, q(t)}{p(t)\, q(t)}$, the middle term vanishes and the last term is non-positive, and we get that
$$E[\Phi(t)] \le \Phi(t-1) \left(1 + \frac{\varepsilon^2}{6}\right)^2.$$
Recalling that this expectation was conditional on the values of $X(t-1)$ and $Y(t-1)$, the lemma follows by taking the expectation of both sides of this inequality with respect to $X(t-1)$ and $Y(t-1)$. $\Box$
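The elementary inequality in Exercise 4 is also easy to check numerically; the following sketch (ours, not part of the notes) verifies it on a fine grid of the interval $[-\frac{1}{2}, \frac{1}{2}]$ used in the proof above.

```python
import numpy as np

# Numerical check of Exercise 4: e^d <= 1 + d + (2/3) d^2 for d in [-1/2, 1/2].
d = np.linspace(-0.5, 0.5, 100001)
assert np.all(np.exp(d) <= 1 + d + (2.0 / 3.0) * d**2)
print("Exercise 4 inequality holds on the grid")
```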

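Finally, here is a minimal NumPy sketch (ours, not from the notes; names and the diagnostic at the end are illustrative) of Algorithm 2. It maintains the responses $A_i Y(t-1)$ and $X(t-1)^T A^j$ incrementally, samples rows and columns from the Gibbs distributions (4) and (5), and reports the gap $\max_j x^T A^j - \min_i A_i y$ from Exercise 1(ii), which is at most $\varepsilon$ exactly when the returned pair is ε-optimal. Replacing the sampling by an argmin/argmax over the same arrays recovers the deterministic fictitious play of Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_fictitious_play(A, eps=0.1):
    """Sketch of Algorithm 2; assumes all entries of A lie in [-1, 1]."""
    m, n = A.shape
    T = int(np.ceil(6.0 * np.log(2 * n * m) / eps**2))
    X, Y = np.zeros(m), np.zeros(n)
    AY = np.zeros(m)   # A_i Y(t-1) for every row i
    XA = np.zeros(n)   # X(t-1)^T A^j for every column j
    for _ in range(T):
        # Gibbs distributions (4) and (5); shifting the exponents by their
        # maximum leaves the distributions unchanged but avoids overflow.
        z = -0.5 * eps * AY
        w = 0.5 * eps * XA
        p = np.exp(z - z.max())
        q = np.exp(w - w.max())
        i = rng.choice(m, p=p / p.sum())
        j = rng.choice(n, p=q / q.sum())
        X[i] += 1.0
        Y[j] += 1.0
        AY += A[:, j]   # O(m + n) incremental update of the responses
        XA += A[i, :]
    return X / T, Y / T

# Matching pennies (Example 2); the value of the game is 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = randomized_fictitious_play(A, eps=0.1)
gap = (x @ A).max() - (A @ y).min()   # <= eps certifies eps-optimality
print(x, y, gap)
```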
Bibliography

[Bro51] George W. Brown. Iterative solution of games by fictitious play. In: T. C. Koopmans, editor, Activity Analysis of Production and Allocation, pages 374–376, 1951.

[GK95] Michael D. Grigoriadis and Leonid G. Khachiyan. A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters, 18(2):53–58, 1995.

[KY07] Christos Koufogiannakis and Neal E. Young. Beating simplex for fractional packing and covering linear programs. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 494–504, 2007.

[Rob51] Julia Robinson. An iterative method of solving a game. The Annals of Mathematics, 54(2):296–301, 1951.

[Sha58] Harold N. Shapiro. Note on a computation method in the theory of games. Communications on Pure and Applied Mathematics, 11(4):587–593, 1958.