CS Homework 3. Walid Krichene ( )

Size: px

Start display at page:

Download "CS Homework 3. Walid Krichene ( )"

Jesse McCoy
5 years ago
Views:

1 CS 70 - Homework 3 Walid Krichene (36517) (1) Consider the randomized multiplicative weights algorithm on the alternatingly correct pair of experts A, B, with update (1 ). Without loss of generality, assume that A is correct on even days, and B on odd days. At iteration t, the loss vector l(t) = [l A (t), l B (t)] is given by { [1, 0] if t is odd l(t) = [0, 1] if t is even therefore the wight updates are given by w(t + 1) = { [w A (t)(1 ), w B (t)] if t is odd [w A (t), w B (t)(1 )] if t is even by induction, stating from the uniform weights w(1) = [1, 1], we have w(k + 1) = [(1 ) k, (1 ) k ] w(k) = [(1 ) k, (1 ) k 1 ] Let L(t) be the (random) loss incurred by the algorithm at iteration t. hen the expected loss at iteration t is E[L(t)] = w A(t)l A (t) + w B (t)l B (t) (1 ) k = 1 (1 ) = k if t = k + 1 w A (t) + w B (t) (1 ) k 1 if t = k Finally, the expected loss over iterations is = 1 (1 ) k +(1 ) k 1 E[L(t)] = / 1 + / 1 since 1/ 1/( ), an upper bound on the expected loss is simply given by E[L(t)] For the case = 1, the expectation becomes E[L(t)] = / 1 + / 3 1

2 () he original matrix of the game is A = In order to have payoffs in [0, 1], an equivalent game is given by the scaled and translated matrix Ã, given by Ãi,j = Ai,j+1 Ã = 1/ / / 1/ 0 0 he experts algorithm applied to the -player zero sum game proceeds as follows: the row player applies the expert s algorithm, and maintains the probability distribution p(t). the outcomes at iteration t are given by l(t) = A.y (t) where y(t) is the best response to p(t) (the column player is assumed to play optimally), given by y(t) = arg max p(t)ay y the column player simply picks arg max j (p(t)a) j, and picks the lowest such j in case of ties. he results are x = 1 x(t) δ/ 10 (0.491, , , 0.491) ( , , , ) ln 4 ( , , , ) where δ/ is computed using the average distributions, and is the maximum increase in player utility if she unilaterally changes her strategy. In other words where δ/ = max( xãȳ R(ȳ), C( x) xãȳ ) x = 1 ȳ = 1 x(t) y(t) and R(y) = min xãy x C(x) = max y xãy

3 1 package e x p e r t s import u t i l. l i n A l g e b r a. 3 4 c l a s s InvalidArgumentException ( message : S t r i n g ) extends Exception ( message ) 5 6 t r a i t Nature { 7 d e f update ( s t r a t e g y : Array [ Double ] ) 8 } 9 10 a b s t r a c t c l a s s Expert [N<: Nature ] ( v a l nature : N) { 11 d e f nextloss ( ) : Double 1 } c l a s s ZeroSumGame ( v a l A: Array [ Array [ Double ] ] ) extends Nature { 15 // A i s the p a y o f f matrix, assumed to be normalized ( e n t r i e s i n [ 0, 1 ] ) 16 // row minimizes, column maximizes 17 v a l nrows = A. l e n g t h 18 v a l ncols = A( 0 ). l e n g t h 19 var rounds = 0 0 var bestcolresponse = 0 1 v a l c u m u l a t i v e C o l S t r a t e g y = new Array [ Double ] ( ncols ) 3 v a l cumulativerowstrategy = new Array [ Double ] ( nrows ) 4 5 p r i v a t e d e f getnormalized ( c u m u l a t i v e S t r a t e g y : Array [ Double ] ) : Array [ Double ] = { 6 v a l norm = c u m u l a t i v e S t r a t e g y. sum 7 c u m u l a t i v e S t r a t e g y. map( /norm ) 8 } 9 30 d e f getavgcolstrategy ( ) = getnormalized ( c u m u l a t i v e C o l S t r a t e g y ) 31 d e f getavgrowstrategy ( ) = getnormalized ( cumulativerowstrategy ) 3 33 d e f computeoutcome ( x : Array [ Double ], y : Array [ Double ] ) : Double = { 34 x. times (A). times ( y. transp ( ) ). doublevalue 35 } d e f computebestcolresponse ( rowstrategy : Array [ Double ] ) : ( Int, Double ) = { 38 // compute the b e s t column r e s p o n s e 39 v a l ( (, argmax ), max) = rowstrategy. times (A). argmax ( ) 40 ( argmax, max) 41 } 4 43 d e f computebestrowresponse ( c o l S t r a t e g y : Array [ Double ] ) : ( Int, Double ) = { 44 v a l ( (, argmin ), min ) = A. times ( c o l S t r a t e g y. transp ( ) ). argmin ( ) 45 ( argmin, min ) 46 } d e f update ( rowstrategy : Array [ Double ] ) { 49 bestcolresponse = computebestcolresponse ( rowstrategy ) rounds += 1 51 // update the average s t r a t e g i e s 5 f o r ( i < 0 to nrows 1) 53 cumulativerowstrategy ( i ) += rowstrategy ( i ) c u m u l a t i v e C o l S t r a t e g y ( bestcolresponse ) += 1 56 } 57 } c l a s s ZeroSumGameExpert ( game : ZeroSumGame, row : I n t ) extends Expert [ ZeroSumGame ] ( game ){ 60 d e f nextloss ( ) : Double = { 61 v a l colresponse = game. bestcolresponse 6 r e t u r n game.a( row ) ( colresponse ) 63 } 64 } 3

4 65 66 // Experts a l g o r i t h m 67 // ype parameter N must extend Nature, and e x p e r t s are o f type Expert [N] 68 // ( the e x p e r t s s h a r e an i n s t a n c e o f Nature ) 69 // nature keeps e x t r a i n f o r m a t i o n s p e c i f i c to the p a r t i c u l a r game being played. 70 // f o r example, i f nature i s a ZeroSumGame, i t p r o v i d e s methods to compute b e s t 71 // column response, and i t keeps t r a c k o f the average row and column s t r a t e g i e s. 7 // In the r o u t i n g game, i t would p r o v i d e a b e s t r e s p o n s e method that computes the 73 // s h o r t e s t path, e t c. 74 c l a s s ExpertAlgorithm [N<: Nature ] ( e p s i l o n : Double, e x p e r t s : L i s t [ Expert [N ] ] ) { 75 i f ( e p s i l o n >= 1 e p s i l o n <= 0) 76 throw new InvalidArgumentException ( e p s i l o n should be i n ( 0, 1) ) v a l nature = e x p e r t s ( 0 ). nature 79 v a l support = e x p e r t s. l e n g t h 80 var s t r a t e g y = Array. f i l l ( support ) ( 1. / support ) 81 8 d e f n o r m a l i z e ( ) { 83 v a l norm = s t r a t e g y. sum 84 f o r ( i < 0 to support 1) 85 s t r a t e g y ( i )/=norm 86 } d e f next ( ) { 89 nature. update ( s t r a t e g y ) 90 v a l weights = e x p e r t s. map(. nextloss ). map( math. pow(1 e p s i l o n, ) ). toarray 91 s t r a t e g y = s t r a t e g y. dotmult ( weights ). arrayvalue ( ) 9 normalize ( ) 93 } 94 } 1 package e x p e r t s 3 o b j e c t main { 4 d e f main ( a r g s : Array [ S t r i n g ] ) : Unit = { 5 v a l e p s i l o n =. 1 6 // p a y o f f matrix 7 v a l A = Array [ Array [ Double ] ] ( 8 Array [ Double ] (. 5, 1, 0 ), 9 Array [ Double ] ( 0,. 5, 1 ), 10 Array [ Double ] ( 1, 0,. 5 ), 11 Array [ Double ] (. 5, 0, 0) 1 ) 13 v a l game = new ZeroSumGame (A) 14 v a l e x p e r t s = (0 to 3 ). map( new ZeroSumGameExpert ( game, ) ). t o L i s t 15 v a l a l g = new ExpertAlgorithm [ ZeroSumGame ] ( e p s i l o n, e x p e r t s ) f o r ( t < 1 to math. c e i l (100 math. l o g ( 4 ) ). intvalue ) 18 a l g. next ( ) 19 0 v a l y = game. getavgcolstrategy ( ) 1 v a l x = game. getavgrowstrategy ( ) v a l d e l t a = math. max( 3 game. computebestcolresponse ( x ). game. computeoutcome ( x, y ), 4 game. computeoutcome ( x, y ) game. computebestrowresponse ( y ). ) 5 6 p r i n t l n ( game. rounds ) 7 p r i n t l n ( x. t o L i s t ) 8 p r i n t l n ( d e l t a ) 9 } 30 } he code is in the files experts.scala and main.scala, and some helper classes are in linalgebra.scala. 4

5 (3) Consider the fractional routing problem on the graph G = (V, E), with pairs (s i, t i ), i [k]. Let E = m and V = n. Find a 1 + approximate solution to the fractional path routing problem in O(km log m ) time (with high probability). Lemma 1 If one chooses uniformly at random from k possibilities, times, with i [k], if i is the number of times i is chosen, we have P ( i k ) (1 ) > 1 1 k 8k ln k, then for each Lemma If G = G t, where the choice of G t is made independently of others, and G t have mean and covariance at most k, then 8k ln k, and P(G E[G](1 + )) > 1 1 k answer Consider the following randomized version of the experts algorithm, where the toll player maintains a probability distribution p e (t), e E. Algorithm 1 Experts algorithm Initialize i := 0 i for each iteration t {1,..., } do the routing player chooses, uniformly at random, i [k], and routes (s i, t i ) along the shortest path in the metric given by p e (t) let f i (t) F i be the corresponding flow solution (here F i is the set of feasible flows that connect (s i, t i )). let f j (t) = 0 for all j i increment i the toll player updates the distribution using the experts update rule p e (t + 1) p e (t)(1 + ) ge(t) where the gain of expert e on day t is the congestion of edge e under flow f(t) g e (t) = c(e, f(t)) Note: since the routing player only routes one pair (s i, t i ) on each day, the gains are at most one: g e (t) 1 for all e and t end for the candidate approximate flow solution is k 1 f = i=1 i f i (t) ( f is a random variable) Let 5

6 c max ( f) = max e E c(e, f) be the maximal congestion corresponding to f (random variable) c be the value of the toll congestion game c = min f F [k] max c(p, f) = max min c(p, f) p E f F [k] where p is in the set of probability distributions over the set of edges E, and f is in the set of feasible flows that connect all k pairs (F [k] = F F k = {f f k, f i F i ). Note: for each fixed p, the congestion function f c(p, f) is linear. his is crucial for the proof. G be the expected total gain G = p E c(p(t), f(t)) (random variable since the f(t) s are random). c(p(t), f(t)) is the cost of the shortest path picked on day t. G be the gain of the best expert (random variable) G = max c(e, f(t)) e 8k log(m) and assume we play the game for = days. he complexity of each iteration is O(m log(m)) and corresponds to computing one shortest path for the randomly chosen pair. hus the total time complexity is O( m log(m)) = O(km log(m) ). he claim is that with high probability, f is a (1 + 3)-approximate solution, i.e. 1 cmax ( f) c (1 + 3) (here c max ( f) is greater than the value of the game since the toll player plays second) or equivalently Proof of claim: we use a sequence of inequalities: c (1 3) c max ( f) 1. by the experts algorithm analysis, we have that for every realization, G (1 )G log(m)/, thus using = 8k log m/, we have kg. with high probability (probability 1 1 k ), t (1 )kg k log(m) = (1 ) kg 8 kg (1 )cmax ( f) () Indeed: by definition, G = max e t c(e, f(t)). Fixing e, we can write c(e, f(t)) = c(e, f i (t)) t t i = 1 i c(e, f i (t)) i i t (1 ) 1 c(e, f i (t)) using i (1 ) (Lemma 1) k i k i t = (1 ) k c(e, f) by linearity of c(e,.) taking the maximum over e, we have k max e t c(e, f(t)) (1 ) max e c(e, f), k i.e. G (1 )c max ( f) 6 (1)

7 3. with high probability (probability 1 1 k ), c (1 ) k G (3) Indeed, we have G = c(p(t), f(t)), where f(t) = f i(t) for some i, and the choice of i is made independently and uniformly at random. We have E[c(p(t), f(t))] = 1 c(p(t), k E[f(t)]) k by linearity. Here we observe that E[f(t)] = k i=1 1 k f i(t) (each i is chosen with probability 1/k), therefore k E[f(t)] = k i=1 f i(t) is the solution to the k-shortest paths, routing all pairs (since each f i (t) is the shortest path routing pair i). In other words, c(p(t), k E[f(t)]) = min f F[k] c(p(t), f), and thus is less than c (c = max p min f ). Now c c(p(t), E[f(t)]) = k E[c(p(t), f(t))] (where each c(p(t), f(t)) has expectation and variance less than one). Summing over t, c k E[G] then using Lemma, we have with high probability, E[G] G(1 ), therefore c k E[G] G(1 ) Combining these inequalities, we have with high probability c (1 ) kg by (3) (1 )((1 ) kg ) by (1) 8 (1 ) kg 8 (1 ) 3 c max ( f) 8 by () (1 3)c max ( f) 8 (there is a /8 additive term here, but I assume it is okay since we had a similar term in the original algorithm) 7

8 (4) Consider investing the following scenario. here are n possible instruments one can invest in in a period of time intervals, where each instrument either pays one of 0 dollars or 1 dollar at each time. You have inside information that at least one of the instruments yields P dollars over the time intervals. One can only invest in one instrument. he problem is to decide which one to invest in and when to do so. (a) Show that any deterministic algorithm may yield zero dollars (unless n = 1) proof Let us denote the gain of instrument i on time t by g i (t). Consider a deterministic algorithm A, and consider the sequence g perfect of gains such that g perfect i (t) = 1 i and t. Let i 0, t 0 be, respectively, the instrument invested in and the time invested in, when A is run on g perfect. Since A is deterministic (and causal!) when we run A on any sequence g such that g i (t) = 1 i and t < t 0, the algorithm will invest in i 0 on t 0. In particular if run on g haha defined by { gi haha 0 if i = i 0 and t t 0 (t) = 1 otherwise the gain is 0. (note that the sequence is feasible since all instruments except i 0 have maximal gain) 8

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture 24 Scribe: Sachin Ravi May 2, 2013

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture 24 Scribe: Sachin Ravi May 2, 2013 COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture 24 Scribe: Sachin Ravi May 2, 203 Review of Zero-Sum Games At the end of last lecture, we discussed a model for two player games (call