DISCRETE PREDICTION PROBLEMS: RANDOMIZED PREDICTION
1 DISCRETE PREDICTION PROBLEMS: RANDOMIZED PREDICTION Csaba Szepesvári, University of Alberta, CMPUT, UofA, October 2006
2 OUTLINE 1 DISCRETE PREDICTION PROBLEMS 2 RANDOMIZED FORECASTERS 3 WEIGHTED AVERAGE FORECASTER 4 FOLLOW THE PERTURBED LEADER 5 BIBLIOGRAPHY
3 BINARY PREDICTION PROBLEMS Binary prediction problem: $D = Y = \{0, 1\}$, $\ell(p, y) = \mathbb{I}\{p \ne y\}$. Loss of forecaster: $\hat L_n = \sum_{t=1}^n \ell(\hat p_t, y_t)$. Loss of expert $i$: $L_{i,n} = \sum_{t=1}^n \ell(f_{i,t}, y_t)$. Loss of best expert: $L_n^* = \min_i L_{i,n}$. Goal: minimize the regret, i.e., $R_n = \hat L_n - L_n^*$.
4 BINARY PREDICTION PROBLEMS/2 Proposition: Consider binary prediction problems. For any deterministic forecaster there exists $y_{1:n}$ s.t. $\hat L_n(y_{1:n}) = n$, where $\hat L_n(y_{1:n})$ is the forecaster's loss on $y_{1:n}$. Proof: $\hat p_t$ is based on past information. Hence, for every $t$, $y_t$ can be selected so that $\ell(\hat p_t, y_t) = 1$. Q.e.d. Corollary: There is no deterministic forecaster whose regret is sublinear for any binary prediction problem and any set of experts. Proof: Let $N = 2$, $f_{1t} \equiv 0$, $f_{2t} \equiv 1$. Then for all $y_{1:n}$, $L_n^*(y_{1:n}) \le n/2$. Pick some $y_{1:n}$ that forces $\hat L_n(y_{1:n}) = n$. Hence $\hat L_n(y_{1:n}) - L_n^*(y_{1:n}) \ge n - n/2 = n/2$. Idea: Randomize the forecaster, as this falsifies the above proposition! (prevents the worst case)
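The proposition's adversary argument can be replayed in code. The sketch below is illustrative (the forecaster and function names are ours, not from the lecture): an adversary that observes each deterministic prediction simply plays the opposite outcome, forcing loss 1 in every round.

```python
# Any deterministic forecaster can be forced to err every round: the
# adversary recomputes the prediction from the shared history and flips it.
def adversary_forces_full_loss(forecaster, n):
    history, loss = [], 0
    for _ in range(n):
        p = forecaster(history)   # deterministic prediction in {0, 1}
        y = 1 - p                 # adversary picks the other outcome
        loss += int(p != y)       # l(p, y) = I{p != y} = 1 every round
        history.append(y)
    return loss

# an illustrative deterministic forecaster: majority vote of past outcomes
def majority_vote(history):
    return int(sum(history) * 2 > len(history))
```

Whatever deterministic rule is plugged in, the returned loss equals $n$.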
5 RANDOMIZED FORECASTERS $\mathcal{N} \stackrel{def}{=} \{1, 2, \dots, N\}$. Convention: $\ell : \mathcal{N} \times Y \to \mathbb{R}$, $\ell(i, y)$. Note: since $\ell$ and $Y$ are not further restricted, generality is not lost. Random choice: $I_t \in \mathcal{N}$ is a random variable. The forecaster computes $I_t$ based on past information (past decisions, past outcomes) and $U_t \sim U[0,1)$. Notation: $p_{it} \stackrel{def}{=} P(I_t = i \mid I_{1:t-1}, Y_{1:t-1})$. Outcomes can also be randomized, but outcomes do not depend on the past actions $I_{1:t-1}$! Oblivious or non-reactive opponent/environment (stock, weather, etc.)
6 WEIGHTED AVERAGE FORECASTER [LITTLESTONE AND WARMUTH, 1994] Previous result on EWA: THEOREM (LOSS BOUND FOR THE EWA FORECASTER) Assume that $D$ is a convex subset of some vector space. Let $\ell : D \times Y \to [0,1]$ be convex in its first argument. Then, for EWA ($\hat p_t = \sum_i w_{i,t-1} f_{it} / \sum_j w_{j,t-1}$, $w_{i,t-1} = e^{-\eta L_{i,t-1}}$) it holds: $\hat L_n - L_n^* \le \frac{\ln N}{\eta} + \frac{n\eta}{8}$. With $\eta = \sqrt{8 \ln N / n}$: $\hat L_n - L_n^* \le \sqrt{(n/2) \ln N}$. Let $f_{it} = e_i$ ($i$th unit vector), $\hat p_{it} = w_{i,t-1} / \sum_{j=1}^N w_{j,t-1}$, $\ell(p, y) \stackrel{def}{=} \sum_{i=1}^N p_i \ell(i, y)$; $\ell$ is convex in $p$, and $D \stackrel{def}{=} \Delta_N = \{p \in \mathbb{R}^N \mid p_i \ge 0,\ \sum_i p_i = 1\} \subset \mathbb{R}^N$ is convex.
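A minimal sketch of the EWA forecaster of the theorem (function and variable names are ours; the default loss is the convex absolute loss on $D = [0,1]$, which fits the theorem's assumptions):

```python
import math

def ewa(expert_preds, outcomes, eta, loss=lambda p, y: abs(p - y)):
    """Exponentially weighted average forecaster (sketch).

    expert_preds[t][i] is the advice f_{i,t}; outcomes[t] is y_t.
    Returns the forecaster's total loss and the experts' cumulative losses.
    """
    N = len(expert_preds[0])
    cum = [0.0] * N                                  # L_{i,t-1}
    total = 0.0
    for f_t, y in zip(expert_preds, outcomes):
        w = [math.exp(-eta * L) for L in cum]        # w_{i,t-1}
        p_hat = sum(wi * fi for wi, fi in zip(w, f_t)) / sum(w)
        total += loss(p_hat, y)
        cum = [L + loss(fi, y) for L, fi in zip(cum, f_t)]
    return total, cum
```

On the corollary's hard instance (one expert always predicting 0, one always 1) the bound $\hat L_n - L_n^* \le \ln N / \eta + n\eta/8$ can be checked numerically.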
7 BOUND ON THE PSEUDO-EXPECTED REGRET EWA: $\hat p_t = \sum_i w_{i,t-1} f_{it} / \sum_j w_{j,t-1}$, $w_{i,t-1} = e^{-\eta L_{i,t-1}}$. THEOREM (LOSS BOUND FOR THE EWA FORECASTER: RANDOMIZED PREDICTIONS) Let $\ell : \mathcal{N} \times Y \to [0,1]$. Then, for EWA it holds: $\bar L_n - L_n^* \le \frac{\ln N}{\eta} + \frac{n\eta}{8}$. With $\eta = \sqrt{8 \ln N / n}$: $\bar L_n - L_n^* \le \sqrt{(n/2) \ln N}$. Here $\bar L_n = \sum_{t=1}^n \ell(\hat p_t, Y_t) = \sum_{t=1}^n \sum_{i=1}^N \hat p_{it}\, \ell(i, Y_t)$. Note: $\ell(\hat p_t, Y_t) = E[\ell(I_t, Y_t) \mid Y_{1:t}, I_{1:t-1}]\ (= E_t[\ell(I_t, Y_t)])$.
8 BOUND ON THE ACTUAL REGRET? What about $\hat L_n - L_n^*$? Is $\hat L_n = \sum_t \ell(I_t, Y_t)$ close to $\sum_t \ell(\hat p_t, Y_t) = \bar L_n$? $\ell(\hat p_t, Y_t)$ is the (conditional) expected value of $\ell(I_t, Y_t)$. Sums of independent random variables are $O(\sqrt{n})$-close to their expectations! Hoeffding: if $X_1, \dots, X_n$ are independent with $a_t \le X_t \le b_t$, then for $S_n = \sum_t (X_t - E[X_t])$, $P(S_n > \epsilon) \le \exp\!\big(\!-\frac{2\epsilon^2}{\sum_t (b_t - a_t)^2}\big)$ and $P(S_n < -\epsilon) \le \exp\!\big(\!-\frac{2\epsilon^2}{\sum_t (b_t - a_t)^2}\big)$. When $b_t - a_t \le 1$, with prob. $1 - \delta$, $\sum_t (X_t - E[X_t]) \le \sqrt{(n/2) \ln(1/\delta)}$.
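A quick Monte Carlo sanity check of the displayed corollary (illustrative numbers, not from the lecture): for bounded i.i.d. variables, deviations above $\sqrt{(n/2)\ln(1/\delta)}$ should occur with frequency at most roughly $\delta$.

```python
import math, random

# Empirical check of the Hoeffding tail: X_t ~ Bernoulli(1/2), so
# b_t - a_t = 1 and E[X_t] = 1/2; count how often the centered sum
# exceeds the sqrt((n/2) ln(1/delta)) threshold.
def deviation_frequency(n, delta, trials, rng):
    eps = math.sqrt((n / 2) * math.log(1 / delta))
    exceed = 0
    for _ in range(trials):
        s = sum(rng.random() < 0.5 for _ in range(n)) - n / 2
        exceed += s > eps
    return exceed / trials

rng = random.Random(0)
freq = deviation_frequency(100, 0.05, 2000, rng)
```

The observed frequency sits well below $\delta = 0.05$, since Hoeffding is a conservative bound.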
9 WEIGHTED AVERAGE FORECASTER: ACTUAL REGRET $\hat L_n - \bar L_n = \sum_t \big(\ell(I_t, Y_t) - \ell(\hat p_t, Y_t)\big)$; bound? Indeed, $E[\ell(I_t, Y_t) \mid Y_{1:t}, I_{1:t-1}] = \sum_{i=1}^N E[\ell(I_t, Y_t) \mid Y_{1:t}, I_{1:t-1}, I_t = i]\, P(I_t = i \mid Y_{1:t}, I_{1:t-1}) = \sum_{i=1}^N \ell(i, Y_t)\, P(I_t = i \mid Y_{1:t-1}, I_{1:t-1})$ (by (1)) $= \sum_{i=1}^N \ell(i, Y_t)\, p_{it} = \ell(\hat p_t, Y_t)$. Also, since $0 \le \ell(\hat p_t, Y_t) \le 1$: $-\ell(\hat p_t, Y_t) \le \ell(I_t, Y_t) - \ell(\hat p_t, Y_t) \le 1 - \ell(\hat p_t, Y_t)$ (boundedness). (1): $I_t$ and $Y_t$ are independent given the past.
10 HOEFFDING-AZUMA INEQUALITY DEFINITION (MARTINGALE DIFFERENCE SERIES) The sequence of random variables $V_1, V_2, \dots$ is a martingale difference series w.r.t. $X_1, X_2, \dots$ if for all $t \in \mathbb{N}$, $V_t$ is a function of $X_1, \dots, X_t$ and $E[V_t \mid X_{1:t-1}] = 0$ w.p. 1. THEOREM (HOEFFDING-AZUMA) Assume that $V_1, V_2, \dots$ is a martingale difference series w.r.t. $X_1, X_2, \dots$ such that $V_t \in [A_t, A_t + c_t]$, where $c_t$ is a (non-random) positive constant and $A_t$ is a function of $X_{1:t-1}$. Then, for $S_n = \sum_t V_t$, $P(S_n > \epsilon) \le \exp\!\big(\!-\frac{2\epsilon^2}{\sum_t c_t^2}\big)$ and $P(S_n < -\epsilon) \le \exp\!\big(\!-\frac{2\epsilon^2}{\sum_t c_t^2}\big)$. COROLLARY If $c_t \le 1$ then w.p. $1 - \delta$, $S_n \le \sqrt{(n/2) \ln(1/\delta)}$.
11 BOUND ON THE RANDOM REGRET By applying the Hoeffding-Azuma (H-A) inequality to $V_t = \ell(I_t, Y_t) - \ell(\hat p_t, Y_t)$, $X_1 = (I_1, Y_1, Y_2)$, $X_t = (I_t, Y_{t+1})$ ($t > 1$), we get: THEOREM (LOSS BOUND FOR THE EWA FORECASTER: RANDOM REGRET) Let $\ell : \mathcal{N} \times Y \to [0,1]$. Then, for EWA, with probability $1 - \delta$: $\hat L_n - L_n^* \le \frac{\ln N}{\eta} + \frac{n\eta}{8} + \sqrt{(n/2) \ln(1/\delta)}$. With $\eta = \sqrt{8 \ln N / n}$: $\hat L_n - L_n^* \le \sqrt{(n/2) \ln N} + \sqrt{(n/2) \ln(1/\delta)}$.
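A sketch of the randomized EWA forecaster of the theorem, which samples $I_t$ from the exponential weights instead of averaging the advice (helper names are ours):

```python
import math, random

def randomized_ewa(losses, eta, rng):
    """Randomized EWA (sketch): play I_t ~ p_t with p_it proportional to
    exp(-eta * L_{i,t-1}). losses[t][i] = l(i, Y_t) in [0, 1]."""
    N = len(losses[0])
    cum = [0.0] * N                               # L_{i,t-1}
    total = 0.0
    for l_t in losses:
        w = [math.exp(-eta * L) for L in cum]
        i = rng.choices(range(N), weights=w)[0]   # I_t ~ p_t
        total += l_t[i]                           # actual (random) loss
        cum = [L + l for L, l in zip(cum, l_t)]
    return total, min(cum)                        # \hat L_n and L_n^*
```

When one expert is always right, the random regret stays bounded: the probability of sampling the bad expert decays geometrically.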
12 BERNSTEIN'S INEQUALITY THEOREM (BERNSTEIN'S INEQUALITY FOR MARTINGALE DIFFERENCES) Assume that $V_1, V_2, \dots$ is a martingale difference series w.r.t. $X_1, X_2, \dots$ such that $|V_t| \le K$. Let $\Sigma_n^2 = \sum_t E[V_t^2 \mid X_{1:t-1}]$ and $S_n = \sum_t V_t$. Then for all $\Sigma, \delta > 0$, with probability at least $1 - \delta$, either $S_n \le \sqrt{2\Sigma^2 \log(1/\delta)} + \frac{2}{3} K \log(1/\delta)$ or $\Sigma_n^2 > \Sigma^2$.
13 SMALL LOSSES Previous small-loss bound: $\sqrt{2 L_n^* \ln N} + \ln N$. Random fluctuations would add $\sqrt{(n/2) \ln(1/\delta)}$, which is too big! Bernstein's inequality uses the predictable variance to bound the fluctuations. Bound on the predictable variance: $E_t\big[(\ell(I_t, Y_t) - \ell(\hat p_t, Y_t))^2\big] = E_t\big[\ell(I_t, Y_t)^2\big] - \ell^2(\hat p_t, Y_t) \le E_t\big[\ell(I_t, Y_t)^2\big] \le E_t[\ell(I_t, Y_t)] = \ell(\hat p_t, Y_t)$. Hence the effect of random fluctuations is comparable with the bound on the expected regret: $\sum_t \big(\ell(I_t, Y_t) - \ell(\hat p_t, Y_t)\big) \lesssim \sqrt{2 L_n^* \ln(1/\delta)} + \ln(1/\delta)$.
14 FOLLOW THE LEADER Does it work? Take $N = 2$ with losses $\ell(1, y_t)$: $1/2, 0, 1, 0, 1, 0, \dots$ and $\ell(2, y_t)$: $1/2, 1, 0, 1, 0, 1, \dots$ Cumulative losses: $L_{1,1} = .5$, $L_{1,2} = .5$, $L_{1,3} = 1.5$, $L_{1,4} = 1.5$, $L_{1,5} = 2.5, \dots$; $L_{2,1} = .5$, $L_{2,2} = 1.5$, $L_{2,3} = 1.5$, $L_{2,4} = 2.5$, $L_{2,5} = 2.5, \dots$ With adversarial tie-breaking the leader suffers loss 1 in every round from $t = 2$ on, so $\hat L_n = n - 1/2$, whilst $L_{i,n} \approx n/2$, $i = 1, 2$, and therefore $\hat L_n - L_n^* \ge n/2 - 1.5$.
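The failure can be replayed in code. The simulation below uses the loss sequence from the slide and breaks ties adversarially (toward expert 2), which makes FTL pay loss 1 in every round after the first:

```python
def ftl_on_alternating_losses(n):
    """Follow the Leader on the slide's N=2 example.
    Returns (FTL total loss, best expert's total loss)."""
    cum = [0.0, 0.0]
    total = 0.0
    for r in range(1, n + 1):
        if r == 1:
            l_t = (0.5, 0.5)
        elif r % 2 == 0:
            l_t = (0.0, 1.0)     # (l(1, y_r), l(2, y_r))
        else:
            l_t = (1.0, 0.0)
        i = 0 if cum[0] < cum[1] else 1   # ties broken toward expert 2
        total += l_t[i]
        cum = [cum[0] + l_t[0], cum[1] + l_t[1]]
    return total, min(cum)
```

For even $n$ this gives regret exactly $n/2$, matching the slide's conclusion that FTL's regret is linear.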
15 FOLLOW THE PERTURBED LEADER [Hannan, 1957] Follow the perturbed leader (randomized fictitious play): $I_t = \mathrm{argmin}_{i=1,\dots,N}\ \big(L_{i,t-1} + Z_{it}\big)$, $Z_t \sim f(\cdot)$, i.i.d. Goal: develop a bound on $\bar L_n$! Relate to BEH: $\hat I_t = \mathrm{argmin}_{i \in \mathcal{N}}\ \big(L_{i,t} + Z_{i,t}\big)$.
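A minimal sketch of FPL with two-sided exponential perturbations (the density used later in the analysis), drawn fresh and i.i.d. each round; all names are illustrative:

```python
import random

def fpl(losses, eta, rng):
    """Follow the perturbed leader (sketch).
    losses[t][i] = l(i, y_t) in [0, 1]; Z_it has density (eta/2) e^{-eta|z|}."""
    N = len(losses[0])
    cum = [0.0] * N                     # L_{i,t-1}
    total = 0.0
    for l_t in losses:
        # two-sided exponential draw: exponential magnitude, random sign
        z = [rng.expovariate(eta) * rng.choice((-1.0, 1.0)) for _ in range(N)]
        i = min(range(N), key=lambda j: cum[j] + z[j])   # perturbed leader
        total += l_t[i]
        cum = [c + l for c, l in zip(cum, l_t)]
    return total, min(cum)
```

On the alternating-loss example above, the fresh perturbations prevent the deterministic switching that ruins plain FTL.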
16 FPL: ANALYSIS, PLAN 1. $\hat L_n$ and $\hat L_n^{BEH}$ are close in expectation: $E[\sum_t \ell(I_t, y_t)] \approx E[\sum_t \ell(\hat I_t, y_t)]$. 2. $\hat L_n^{BEH}$ and $L_n^*$ are close: $\sum_t \ell(\hat I_t, y_t) \le \sum_t \ell(\hat I_n, y_t) + \mathrm{Bound} \le L_n^* + \mathrm{Bound}$. 3. Estimate $E[\mathrm{Bound}]$.
17 STEP 1: $\hat L^{BEH} \approx \hat L$ BOUND Goal: $E[\sum_t \ell(I_t, y_t)] \approx E[\sum_t \ell(\hat I_t, y_t)]$. $E[\ell(I_t, y_t)] = E\big[\ell(\mathrm{argmin}_i (L_{i,t-1} + Z_{it}), y_t)\big] = E[F_t(Z_t)] = \int F_t(z) f(z)\,dz$, where $F_t(z) = \ell(\mathrm{argmin}_i (L_{i,t-1} + z_i), y_t)$. $E[\ell(\hat I_t, y_t)] = E\big[\ell(\mathrm{argmin}_i (L_{i,t} + Z_{it}), y_t)\big] = E\big[\ell(\mathrm{argmin}_i (L_{i,t-1} + \ell_{it} + Z_{it}), y_t)\big] = E[F_t(Z_t + \ell_t)] = \int F_t(z + \ell_t) f(z)\,dz$, where $\ell_{it} = \ell(i, y_t)$, $\ell_t = (\ell(1, y_t), \dots, \ell(N, y_t))$.
18 STEP 1: $\hat L^{BEH} \approx \hat L$ BOUND/2 $E[\ell(I_t, y_t)] = \int F_t(z) f(z)\,dz$ and $E[\ell(\hat I_t, y_t)] = \int F_t(z + \ell_t) f(z)\,dz = \int F_t(z) f(z - \ell_t)\,dz$. Hence $E[\ell(I_t, y_t)] = \int F_t(z) f(z)\,dz \le \sup_{z,t} \Big(\frac{f(z)}{f(z - \ell_t)}\Big) \int F_t(z) f(z - \ell_t)\,dz = \sup_{z,t} \Big(\frac{f(z)}{f(z - \ell_t)}\Big)\, E[\ell(\hat I_t, y_t)]$. Choose e.g. $f(z) = (\eta/2)^N e^{-\eta \|z\|_1}$; then $\frac{f(z)}{f(z - \ell_t)} = e^{-\eta(\|z\|_1 - \|z - \ell_t\|_1)} \le e^{\eta \|\ell_t\|_1} \le e^{\eta}$, provided that $\|\ell_t\|_1 \le 1$: TODO!
19 STEP 2: $\hat L^{BEH}$ VS. $L^*$ BOUND $\hat L_n^{BEH} = \sum_t \ell(\hat I_t, y_t)$, $\hat I_n = \mathrm{argmin}_i \big((\sum_{s=1}^n \ell(i, y_s)) + Z_{in}\big)$. Plan: 1. Bound $\hat L_n^{BEH} = \sum_t \ell(\hat I_t, y_t)$ by $\sum_t \ell(\hat I_n, y_t)$. 2. Bound $\sum_t \ell(\hat I_n, y_t)$ by $L_n^*$. In fact, for Step 2: $\sum_t \ell(\hat I_n, y_t) + Z_{\hat I_n, n} = \min_i \big(L_{i,n} + Z_{i,n}\big) \le \min_i \big(L_{i,n} + \max_j Z_{j,n}\big) = L_n^* + \max_j Z_{j,n}$. Here $L_n^*$ has $n$ terms, so it outgrows $\max_j Z_{j,n}$!
20 STEP 2.1: $\hat L^{BEH}$ VS. $L^*$ BOUND/2 We know: for $L_t(p) = \sum_{s=1}^t \ell_s(p)$ and $p_t^* = \mathrm{argmin}_p L_t(p)$, $\sum_t \ell_t(p_t^*) \le \sum_t \ell_t(p_n^*)$. Reuse? $\hat I_t = \mathrm{argmin}_i \{(\sum_{s=1}^t \ell(i, y_s)) + Z_{it}\}$. Rewrite as a minimizer of a sum of losses: $\hat I_t = \mathrm{argmin}_i \sum_{s=1}^t \big(\ell(i, y_s) + Z_{is} - Z_{i,s-1}\big) =: \mathrm{argmin}_i \sum_{s=1}^t \hat\ell_s(i)$, where $Z_{i0} = 0$ and $\hat\ell_s(i) \stackrel{def}{=} \ell(i, y_s) + Z_{is} - Z_{i,s-1}$. Reuse: $\sum_t \hat\ell_t(\hat I_t) \le \sum_t \hat\ell_t(\hat I_n)$.
21 STEP 2.1: $\hat L^{BEH}$ VS. $L^*$ BOUND/3 From $\sum_t \hat\ell_t(\hat I_t) \le \sum_t \hat\ell_t(\hat I_n)$: $\sum_t \hat\ell_t(\hat I_t) = \sum_t \big(\ell(\hat I_t, y_t) + Z_{\hat I_t, t} - Z_{\hat I_t, t-1}\big) = \hat L_n^{BEH} + \sum_t \big(Z_{\hat I_t, t} - Z_{\hat I_t, t-1}\big)$, while $\sum_t \hat\ell_t(\hat I_n) = \sum_t \ell(\hat I_n, y_t) + Z_{\hat I_n, n} \le L_n^* + \max_j Z_{j,n}$ (see above). Hence $\hat L_n^{BEH} \le L_n^* + \max_i Z_{i,n} - \sum_t \big(Z_{\hat I_t, t} - Z_{\hat I_t, t-1}\big) \le L_n^* + \max_i Z_{i,n} + \sum_t \max_i \big(Z_{i,t-1} - Z_{i,t}\big)$.
22 STEP 3: TAKE EXPECTATIONS $\hat L_n^{BEH} = \sum_t \ell(\hat I_t, y_t) \le L_n^* + \max_i Z_{i,n} + \sum_t \max_i (Z_{i,t-1} - Z_{i,t})$ (*) Plan: take expectations. Problem: hard to control the $E[\max_i (Z_{i,t-1} - Z_{i,t})]$ terms! Plan: get rid of them! Idea: If $Z'_t = Z'_{t-1}$ ($t \ge 2$) but $Z'_1 \sim f(\cdot)$, then for $\hat I'_t = \mathrm{argmin}_i (L_{it} + Z'_{it})$ we have $E[\ell(\hat I'_t, y_t)] = E[\ell(\hat I_t, y_t)]$, and (*) still applies to $Z'_t$ and $\hat I'_t$! With the constant perturbation the sum in (*) telescopes, so $E[\sum_t \ell(\hat I_t, y_t)] = E[\sum_t \ell(\hat I'_t, y_t)] \le L_n^* + E[\max_i Z_i] + E[\max_i (-Z_i)]$.
23 SUMMARY Assuming that $\|\ell_t\|_1 \le 1$ and $Z_t \sim f(z) = (\eta/2)^N e^{-\eta \|z\|_1}$: $E[\sum_t \ell(I_t, y_t)] \le \sup_{z,t} \Big(\frac{f(z)}{f(z - \ell_t)}\Big) E[\sum_t \ell(\hat I_t, y_t)] \le e^{\eta}\, E[\sum_t \ell(\hat I_t, y_t)]$, and $E[\sum_t \ell(\hat I_t, y_t)] \le L_n^* + E[\max_i Z_i] + E[\max_i (-Z_i)]$. Hence $E[\sum_t \ell(I_t, y_t)] \le e^{\eta} \big(L_n^* + E[\max_i Z_i] + E[\max_i (-Z_i)]\big)$. Outstanding issues: show that we may assume $\|\ell_t\|_1 \le 1$; estimate $E[\max_i Z_i]$ and $E[\max_i (-Z_i)]$. Note: $Z$ and $-Z$ are identically distributed, hence $E[\max_i Z_i] = E[\max_i (-Z_i)]$.
24 ESTIMATE OF $E[\max_i Z_{i1}]$ $E[\max_i Z_{i1}] \le \int_0^{\infty} P(\max_i Z_{i1} > u)\,du$ (integrate the tail). $P(\max_i Z_{i1} > u) \le N P(Z_{11} > u) \le N e^{-\eta u}$ (union bound), and $\int_v^{\infty} e^{-\eta u}\,du = e^{-\eta v}/\eta$. Hence $\int_0^{\infty} P(\max_i Z_{i1} > u)\,du \le \int_0^v 1\,du + \int_v^{\infty} N e^{-\eta u}\,du \le v + \frac{N}{\eta} e^{-\eta v}$. Choose $v = \ln(N)/\eta$ to get $E[\max_i Z_{i1}] \le (1 + \ln N)/\eta$, and so $\bar L_n \le e^{\eta} \big(L_n^* + \frac{2(1 + \ln N)}{\eta}\big)$.
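The estimate can be sanity-checked by Monte Carlo (a numerical sketch, not part of the proof): sample i.i.d. two-sided exponential perturbations and compare the empirical mean of the maximum with $(1 + \ln N)/\eta$.

```python
import math, random

# Empirical E[max_i Z_i] for Z_i with density (eta/2) e^{-eta|z|},
# to be compared against the bound (1 + ln N) / eta derived above.
def mean_max_perturbation(N, eta, trials, rng):
    s = 0.0
    for _ in range(trials):
        s += max(rng.expovariate(eta) * rng.choice((-1.0, 1.0))
                 for _ in range(N))
    return s / trials

rng = random.Random(1)
est = mean_max_perturbation(8, 1.0, 20000, rng)
bound = (1 + math.log(8)) / 1.0      # (1 + ln N) / eta
```

The estimate comes out comfortably below the bound, as expected from the union-bound slack.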
25 CAN WE ASSUME THAT $\|\ell_t\|_1 \le 1$? In general: NO. Idea: $\|\ell_t\|_1$ small corresponds to sparse losses (many zeroes). Sparsify the losses!
26 SPARSIFYING THE LOSSES Transform the $n$ rounds into $nN$ rounds: round $\#t$ with loss vector $(\ell_{1t}, \ell_{2t}, \dots, \ell_{Nt})$ becomes round $\#N(t-1)+1$: $(\ell_{1t}, 0, \dots, 0)$, round $\#N(t-1)+2$: $(0, \ell_{2t}, 0, \dots, 0)$, ..., round $\#Nt$: $(0, \dots, 0, \ell_{Nt})$. We have $\|\ell_s^{new}\|_1 \le 1$, since $0 \le \ell_{it} \le 1$. Let $p_{it}^{orig}$, $p_{is}^{new}$ denote the action probabilities and $T_t = N(t-1)+1$. Synchronicity of losses: $L_{i,t-1}^{orig} = L_{i,T_t-1}^{new}$, hence $p_{it}^{orig} = p_{i,T_t}^{new}$. Since $\ell_{1t} \ge 0$, the first action's probability decreases from $T_t$ to $T_t + 1$: $p_{1,T_t+1}^{new} \le p_{1,T_t}^{new}$, while the others increase; repeat for $T_t + 2, \dots$ Hence $\bar L_t^{orig} \le \bar L_{Nt}^{new}$, and so $\bar L_n^{orig} \le \bar L_{nN}^{new} \le e^{\eta} \big(L^{*,new}_{nN} + \frac{2(1 + \ln N)}{\eta}\big) = e^{\eta} \big(L^{*,orig}_n + \frac{2(1 + \ln N)}{\eta}\big)$.
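The round-splitting transformation is straightforward to write down (a sketch; the lecture gives only the table):

```python
def sparsify(losses):
    """Turn n rounds with loss vectors (l_1t, ..., l_Nt) into n*N rounds,
    the k-th sub-round charging only expert k. Each new vector then has
    L1 norm at most 1 (for losses in [0, 1]) while each expert's
    cumulative loss is preserved."""
    out = []
    for l_t in losses:
        N = len(l_t)
        for k in range(N):
            out.append([l_t[k] if i == k else 0.0 for i in range(N)])
    return out
```

For example, `sparsify([[0.5, 1.0]])` yields the two sub-rounds `[[0.5, 0.0], [0.0, 1.0]]`.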
27 FPL BOUND THEOREM (FPL BOUND [KALAI AND VEMPALA, 2003]) Let $\ell : \mathcal{N} \times Y \to [0,1]$ and consider FPL with $Z_t \sim (\eta/2)^N e^{-\eta \|z\|_1}$. Then $E[\hat L_n] \le e^{\eta} \big(E[L_n^*] + \frac{2(1 + \ln N)}{\eta}\big)$. Choose $\eta = \min\{1, \sqrt{2(1 + \ln N)/((e-1)L_n^*)}\}$. Then $E[\hat L_n] - E[L_n^*] \le 2\sqrt{2(e-1)L_n^*(1 + \ln N)} + 2(e+1)(1 + \ln N)$. PROOF. Just combine the facts of the previous slides!
28 REFERENCES Hannan, J. (1957). Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3. Kalai, A. and Vempala, S. (2003). Efficient algorithms for the online decision problem. In Proceedings of the 16th Annual Conference on Learning Theory. Springer. Littlestone, N. and Warmuth, M. (1994). The weighted majority algorithm. Information and Computation, 108.
More informationA New Solution Method for the Finite-Horizon Discrete-Time EOQ Problem
This is the Pre-Published Versio. A New Solutio Method for the Fiite-Horizo Discrete-Time EOQ Problem Chug-Lu Li Departmet of Logistics The Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog Phoe: +852-2766-7410
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationAdvanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology
Advaced Aalysis Mi Ya Departmet of Mathematics Hog Kog Uiversity of Sciece ad Techology September 3, 009 Cotets Limit ad Cotiuity 7 Limit of Sequece 8 Defiitio 8 Property 3 3 Ifiity ad Ifiitesimal 8 4
More informationOn Random Line Segments in the Unit Square
O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,
More informationUnbiased Estimation. February 7-12, 2008
Ubiased Estimatio February 7-2, 2008 We begi with a sample X = (X,..., X ) of radom variables chose accordig to oe of a family of probabilities P θ where θ is elemet from the parameter space Θ. For radom
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationApproximations and more PMFs and PDFs
Approximatios ad more PMFs ad PDFs Saad Meimeh 1 Approximatio of biomial with Poisso Cosider the biomial distributio ( b(k,,p = p k (1 p k, k λ: k Assume that is large, ad p is small, but p λ at the limit.
More informationIntroduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT
Itroductio to Extreme Value Theory Laures de Haa, ISM Japa, 202 Itroductio to Extreme Value Theory Laures de Haa Erasmus Uiversity Rotterdam, NL Uiversity of Lisbo, PT Itroductio to Extreme Value Theory
More informationSlide Set 13 Linear Model with Endogenous Regressors and the GMM estimator
Slide Set 13 Liear Model with Edogeous Regressors ad the GMM estimator Pietro Coretto pcoretto@uisa.it Ecoometrics Master i Ecoomics ad Fiace (MEF) Uiversità degli Studi di Napoli Federico II Versio: Friday
More informationInformation Theory Tutorial Communication over Channels with memory. Chi Zhang Department of Electrical Engineering University of Notre Dame
Iformatio Theory Tutorial Commuicatio over Chaels with memory Chi Zhag Departmet of Electrical Egieerig Uiversity of Notre Dame Abstract A geeral capacity formula C = sup I(; Y ), which is correct for
More informationCS 330 Discussion - Probability
CS 330 Discussio - Probability March 24 2017 1 Fudametals of Probability 11 Radom Variables ad Evets A radom variable X is oe whose value is o-determiistic For example, suppose we flip a coi ad set X =
More information1 Convergence in Probability and the Weak Law of Large Numbers
36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec
More information