Online Learning & Game Theory

1 Online Learning & Game Theory
A quick overview with recent results
Vianney Perchet, Laboratoire Probabilités et Modèles Aléatoires, Univ. Paris-Diderot
Journées MAS, August 2014

2 Starting Examples


4 Outline
1. First, in a stochastic environment (i.i.d. processes)
2. Then, in an adversarial environment (or individual sequences)
3. Finally, some links with game theory

5 First Part: Stochastic environment
Sections: Regret, Regret Minimization, Extensions

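6 Estimation of Means
K = 2 discrete-time processes: $X_n^{(1)}, X_n^{(2)} \in [0, 1]$, the payoff of ad 1 or 2 on query $n$.
Estimate the means $\mu^{(1)}, \mu^{(2)}$ by the empirical averages $\bar X_n^{(k)} = \frac{1}{n}\sum_{m=1}^n X_m^{(k)}$.
Hoeffding inequality (exponential decay): $|\bar X_n^{(k)} - \mu^{(k)}| > \varepsilon$ with probability at most $2\exp(-2n\varepsilon^2)$.
Finite number of mistakes: $\mathbb{E} \sum_{n \in \mathbb{N}} \mathbf{1}\{|\bar X_n^{(k)} - \mu^{(k)}| > \varepsilon\} \lesssim 1/\varepsilon^2$.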
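To make the exponential decay concrete, here is a minimal Python sketch that evaluates the Hoeffding bound for rewards in $[0, 1]$ and inverts it to get a sample size. The function names `hoeffding_bound` and `samples_needed` are illustrative and do not come from the slides.

```python
import math

def hoeffding_bound(n, eps):
    """Upper bound on P(|empirical mean - true mean| > eps) after n samples in [0, 1]."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)

def samples_needed(eps, delta):
    """Smallest n for which the Hoeffding bound is at most delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

# Example: accuracy 0.1 with failure probability at most 5%.
n = samples_needed(0.1, 0.05)
print(n, hoeffding_bound(n, 0.1))
```

Summing the bound over all $n$ gives a geometric series of order $1/\varepsilon^2$, which is where the finite number of mistakes comes from.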

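7 Regret Minimization
Choose one ad $k_n$ to display. Reward: $X_n^{(k_n)}$.
Maximize the cumulative reward $\sum_{m=1}^n X_m^{(k_m)}$, or $\sum_{m=1}^n \mu^{(k_m)}$.
Minimize the regret [Hannan 56]: $R_n = n\mu^* - \sum_{m=1}^n \mu^{(k_m)}$, with $\mu^* = \max\{\mu^{(1)}, \mu^{(2)}\}$.
Equivalent formulation with the gap $\Delta = \mu^* - \mu^{(k)}$ of the suboptimal arm $k \ne k^*$: $R_n = \Delta \sum_{m=1}^n \mathbf{1}\{k_m \ne k^*\}$.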
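The two formulations of the regret can be checked against each other numerically. The sketch below assumes the means are known (which is only true in a simulation) and uses 0-based arm indices; both function names are hypothetical.

```python
def pseudo_regret(means, choices):
    """n * mu_star minus the sum of the means of the arms actually played."""
    best = max(means)
    return len(choices) * best - sum(means[k] for k in choices)

def pseudo_regret_from_gaps(means, choices):
    """Same quantity, written as the sum of the gaps of the arms played."""
    best = max(means)
    return sum(best - means[k] for k in choices)

means = [0.5, 0.7]            # arm 1 (index 1) is optimal, gap 0.2
choices = [0, 1, 1, 0, 1]     # two suboptimal pulls
print(pseudo_regret(means, choices), pseudo_regret_from_gaps(means, choices))
```

With two arms the second form reduces to the gap times the number of suboptimal pulls.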

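8 Stochastic & Full Monitoring
Full monitoring: all values $X_n^{(1)}, X_n^{(2)}$ are observed.
Optimal algorithm: $k_{n+1} = \arg\max_k \bar X_n^{(k)}$, which gives $\mathbb{E} R_n \lesssim 1/\Delta$ and, for small $\Delta$, $\mathbb{E} R_N \le \Delta N$.
Bounded regret, uniformly in $n$!
Given $N$, the worst $\Delta$ is $1/\sqrt{N}$, and $\mathbb{E} R_N \lesssim \sqrt{N}$.
But in the examples, only $X_n^{(k_n)}$ is observed (bandit monitoring)!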

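9 Stochastic & Bandit Monitoring
$\bar X_n^{(k)} = \frac{1}{n}\sum_{m=1}^n X_m^{(k)}$ is not available, only $\hat X_n^{(k)} = \frac{\sum_{m \le n:\, k_m = k} X_m^{(k)}}{|\{m \le n : k_m = k\}|}$.
With $k_{n+1} = \arg\max_k \hat X_n^{(k)}$: $\mathbb{E} R_n = \Theta(n)$.
Balance exploitation (play the arg max) and exploration (play the arg min) to get information.
Upper Confidence Bound [Auer, Cesa-Bianchi, Fischer 02]: $k_{n+1} = \arg\max_k \hat X_n^{(k)} + \sqrt{\frac{2\log(n)}{|\{m \le n : k_m = k\}|}}$, with $\mathbb{E} R_n \lesssim \log(n)/\Delta$.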
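The UCB index rule can be sketched in a few lines of Python. This is a simulation under assumptions that are not in the slides: Bernoulli rewards, each arm played once at the start so the index is defined, 0-based arm indices, and the hypothetical function name `ucb`.

```python
import math
import random

def ucb(means, n, seed=0):
    """Run the UCB index policy for n rounds on Bernoulli arms; returns (choices, pull_counts)."""
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K      # number of pulls of each arm
    sums = [0.0] * K      # cumulative observed reward of each arm
    choices = []
    for t in range(1, n + 1):
        if t <= K:
            k = t - 1     # initialization: play each arm once
        else:
            # empirical mean plus confidence bonus sqrt(2 log(t) / pulls)
            k = max(range(K),
                    key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[k] else 0.0
        counts[k] += 1
        sums[k] += reward
        choices.append(k)
    return choices, counts

choices, counts = ucb([0.9, 0.1], 2000)
print(counts)
```

The bonus shrinks as an arm is pulled more often, so a rarely played arm is eventually tried again; this is the exploration part of the trade-off.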

10 New policy: Explore Then Commit [P, Rigollet 13]
Finite horizon $N \in \mathbb{N}$ given.
1) Play arms 1 and 2 alternately as long as $|\hat X_n^{(1)} - \hat X_n^{(2)}| \le 2\sqrt{\frac{2\log(4N/n)}{n}}$.
2) Then play the best arm forever.
$\mathbb{E} R_N \lesssim \log(N\Delta^2)/\Delta$; worst case $\sqrt{N}$.
Full Monit. & ETC: worst case $\sqrt{N}$; $\log(N\Delta^2)/\Delta$ vs $1/\Delta$ with Full Info. UCB: worst case $\sqrt{N\log(N)}$.
Bandit vs Full Monitoring: logarithmic vs bounded regret; same worst case.
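A minimal simulation of an explore-then-commit policy for two arms might look as follows. The stopping threshold $2\sqrt{2\log(4N/n)/n}$ is an assumption (the formula is damaged in the source transcript), as are the Bernoulli rewards, the 0-based arm indices and the function name `explore_then_commit`.

```python
import math
import random

def explore_then_commit(means, horizon, seed=0):
    """Two-armed ETC with Bernoulli rewards; returns (committed_arm, pull_counts)."""
    rng = random.Random(seed)
    counts, sums = [0, 0], [0.0, 0.0]
    committed = None
    for t in range(1, horizon + 1):
        if committed is None:
            k = (t - 1) % 2       # exploration: alternate arms 0, 1, 0, 1, ...
        else:
            k = committed         # commitment: keep playing the chosen arm
        reward = 1.0 if rng.random() < means[k] else 0.0
        counts[k] += 1
        sums[k] += reward
        # Test the gap only after complete pairs, so both arms have equal counts.
        if committed is None and t % 2 == 0:
            gap = abs(sums[0] / counts[0] - sums[1] / counts[1])
            # Assumed threshold, not confirmed by the source.
            threshold = 2.0 * math.sqrt(2.0 * math.log(4.0 * horizon / t) / t)
            if gap > threshold:
                committed = 0 if sums[0] > sums[1] else 1
    return committed, counts

print(explore_then_commit([0.8, 0.3], 1000))
```

If the gap never exceeds the threshold within the horizon, the function returns `None` as the committed arm, which corresponds to exploring until the end.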

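11 Bounded Regret? [Lai, Robbins 84], [Bubeck, P, Rigollet 13]
Without additional assumption, no: lower bound in $\log(n)/\Delta$.
With any given intermediate value $\mu \in (\mu^{(1)}, \mu^{(2)})$, yes:
  If $\hat X_n^{(1)}$ or $\hat X_n^{(2)}$ is above $\mu$, then $k_{n+1} = \arg\max_k \hat X_n^{(k)}$.
  Otherwise play both arms alternately.
  The empirical mean of the best arm is below $\mu$ on $\lesssim 1/(\mu^* - \mu)^2$ stages (same argument for the other arm).
If $\mu^*$ and $\Delta$ are known: $\mathbb{E} R_n \lesssim 1/\Delta$, as with Full Monit.
If only $\mu^*$ is known: $\mathbb{E} R_n \lesssim \log(1/\Delta^2)/\Delta$.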

12 More General Frameworks & Results
Results in worst case ("distribution independent bounds"):
Multi-armed bandit [Auer, Cesa-Bianchi, Freund, Schapire 02], [Audibert, Bubeck 09]: $K > 2$ arms, $\mathbb{E} R_n \lesssim \sqrt{Kn}$.
Continuous bandit [Kleinberg 08], [Bubeck, Munos, Stoltz, Szepesvari 11]: infinite set of arms, $x \in [0, 1]^d$ and $\mu(\cdot)$ Lipschitz, $\mathbb{E} R_n \lesssim n^{\frac{d+1}{d+2}}$.
Linear bandit [Dani, Hayes, Kakade 08], [Zinkevich 02], [Abernethy, Hazan, Rakhlin 08]: $x \in [0, 1]^d$ and $\mu(\cdot)$ linear, $\mathbb{E} R_n \lesssim \sqrt{n}$.
Bandit with covariates (cf. Google example) [P, Rigollet 13], [Bull 14]: covariates $\omega_n \in [0, 1]^d$, $\mathbb{E}[X^{(k)} \mid \omega] = \mu^{(k)}(\omega)$ 1-Lipschitz, $\mathbb{E} R_n \lesssim n^{\frac{d+1}{d+2}}$.
Higher order bounds / small losses / sparsity [Hazan, Kale 10], [Gerchinovitz 13], [Cappé, Garivier, Maillard, Munos, Stoltz 13], [Gaillard, Stoltz, van Erven 14]: $n$ vs $\sum_{m=1}^n (X_m^{(k)} - \mu^{(k)})^2$, $\sum_{m=1}^n \sum_{k=1}^K p_m^{(k)} (X_m^{(k)})^2$.

13 Second Part: Adversarial environment
Sections: Distribution Independent, An Algorithm, Internal Regret
What we have learned so far:
In worst case analysis
  Regret minimization in $\sqrt{n\log(K)}$ with full monit.
  Up to $\sqrt{K}$, learning is as fast with bandit monit. as with full monit.
In distribution dependent (not worst case)
  Bounded regret in $\sum_k 1/\Delta_k$
  Additional assumption required to learn as fast in bandit monit.

14 Adversarial World
In the examples, data are not i.i.d. Spam senders can even adapt to spam filters, that is:
The law of $X_{n+1}^{(k)}$ can depend on $X_1^{(1)}, \dots, X_n^{(1)}, \dots, X_1^{(K)}, \dots, X_n^{(K)}$, and even on the previous choices $k_1, \dots, k_n$.
The environment can adapt and choose rewards strategically.
Same definition of regret (except that the arg max changes with time): $R_n = \max_k \sum_{m=1}^n X_m^{(k)} - \sum_{m=1}^n X_m^{(k_m)}$.
Goal: a policy with sublinear regret $o(n)$ against ANY possible strategy of the environment (in particular any sequence $X_n^{(k)}$).

15 A Popular Algorithm with Full Monitoring
With $k_{n+1} = \arg\max_k \bar X_n^{(k)}$, $\mathbb{E} R_n = \Theta(n)$.
With any deterministic policy, $\mathbb{E} R_n = \Theta(n)$.
Exponential weights: $k_{n+1} = k$ with probability $\frac{\exp(\eta \sum_{m=1}^n X_m^{(k)})}{\sum_{j=1}^K \exp(\eta \sum_{m=1}^n X_m^{(j)})}$; temperature $\eta \approx \sqrt{\log(K)/n}$.
Regret of exponential weights [Auer, Cesa-Bianchi, Freund, Schapire 02]: $\mathbb{E} R_n \lesssim \sqrt{n\log(K)}$, $\forall n \in \mathbb{N}$.
Same dependency in $n$ as worst case i.i.d., optimal in $K$.
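The exponential weights rule is short enough to sketch directly. The code below is an illustration under assumptions: rewards lie in $[0, 1]$, the whole reward vector is revealed after each stage (full monitoring), and the regret is computed in expectation over the random choice, so no sampling is needed. The names `exp_weights` and `expected_regret` are hypothetical.

```python
import math

def exp_weights(cum_rewards, eta):
    """Probabilities proportional to exp(eta * cumulative reward), computed stably."""
    m = max(cum_rewards)                       # shift to avoid overflow
    w = [math.exp(eta * (s - m)) for s in cum_rewards]
    total = sum(w)
    return [x / total for x in w]

def expected_regret(rewards, eta):
    """Best arm in hindsight minus the expected cumulative reward of the policy."""
    K = len(rewards[0])
    cum = [0.0] * K
    gained = 0.0
    for x in rewards:
        p = exp_weights(cum, eta)              # distribution used at this stage
        gained += sum(pk * xk for pk, xk in zip(p, x))
        cum = [c + xk for c, xk in zip(cum, x)]
    return max(cum) - gained

# An alternating sequence that defeats the deterministic arg max rule.
rewards = [[1.0, 0.0], [0.0, 1.0]] * 50
eta = math.sqrt(math.log(2) / len(rewards))
print(expected_regret(rewards, eta))
```

For rewards in $[0, 1]$ the standard analysis bounds this quantity by $\log(K)/\eta + \eta n/8$, which is of order $\sqrt{n\log(K)}$ for the temperature used here.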

16 Optimality and Bandit Monitoring
Optimality: $\mathbb{E} R_n \gtrsim \sqrt{n\log(K)}$ if $X_n^{(k)} = \pm 1$ w.p. 1/2, since $\mathbb{E} \sum_{m=1}^n X_m^{(k_m)} = 0$ but $\mathbb{E} \max_k \sum_{m=1}^n X_m^{(k)} \approx \sqrt{n\log(K)}$.
Bandit Monit.: exponential weights w.r.t. $\tilde X_n^{(k)} = \frac{\mathbf{1}\{k_n = k\}}{\mathbb{P}\{k_n = k\}} X_n^{(k)}$, an unbiased estimate of $X_n^{(k)}$: $\mathbb{E} R_n \lesssim \sqrt{K n\log(K)}$.
Remark: optimal bounds are $\sqrt{Kn}$.
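The unbiasedness of the importance-weighted estimate can be verified by averaging over the played arm. The sketch below does this exactly rather than by sampling; the function names and the example numbers are illustrative.

```python
def importance_weighted_estimate(rewards, played, probs):
    """Bandit estimate of the full reward vector: only rewards[played] is used."""
    estimate = [0.0] * len(rewards)
    estimate[played] = rewards[played] / probs[played]
    return estimate

def expected_estimate(rewards, probs):
    """Average of the estimate over the random choice of the played arm."""
    K = len(rewards)
    mean = [0.0] * K
    for played in range(K):
        est = importance_weighted_estimate(rewards, played, probs)
        for k in range(K):
            mean[k] += probs[played] * est[k]
    return mean

rewards = [0.2, 0.9, 0.5]   # hidden reward vector at one stage
probs = [0.5, 0.3, 0.2]     # distribution used to draw the arm
print(expected_estimate(rewards, probs))
```

Dividing by a small probability inflates the variance of the estimate, which is one way to see where the extra factor in $K$ comes from.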

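17 Discrete/Continuous Time
$p_{n+1}^{(k)} = \frac{\exp(\eta \sum_{m=1}^n X_m^{(k)})}{\sum_{j=1}^K \exp(\eta \sum_{m=1}^n X_m^{(j)})} = \nabla_k \Phi(V_n)$, with $V_n^{(k)} = \sum_{m=1}^n \big(X_m^{(k)} - X_m^{(k_m)}\big)$ and $\Phi(V) := \frac{1}{\eta}\log\big(\sum_{k=1}^K \exp(\eta V^{(k)})\big)$.
Deterministic continuous approximation of a stochastic discrete process [Benaïm, Hofbauer, Sorin 06], [Benaïm, Faure 13]:
$\mathbb{E}[V_{n+1}] - V_n = \big(X_{n+1}^{(k)} - \langle \nabla\Phi(V_n), X_{n+1} \rangle\big)_{k=1,\dots,K}$.
Stochastic approximation of $\dot V \in F(V) := \{U - \langle \nabla\Phi(V), U \rangle \mathbf{1};\ U \in \mathbb{R}^K\}$.
Differential inclusion with Lyapunov function $\Phi(V)$: $\frac{d}{dt}\Phi(V) = \langle \dot V, \nabla\Phi(V) \rangle = \langle U - \langle U, \nabla\Phi(V) \rangle \mathbf{1}, \nabla\Phi(V) \rangle = 0$.
$\lim R_n \le \lim \Phi(V_n) = \Phi(V(+\infty)) = \Phi(V(0)) = \log(K)/\eta$.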
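Two elementary facts about the log-sum-exp potential are used implicitly in this argument: it sandwiches the maximum, and its gradient is a probability vector. They are spelled out here for a potential of the form $\Phi(V) = \frac{1}{\eta}\log\sum_k \exp(\eta V^{(k)})$, which is assumed to be the one intended on the slide.

```latex
\max_k V^{(k)} \;\le\; \Phi(V) = \frac{1}{\eta}\log\Big(\sum_{k=1}^K e^{\eta V^{(k)}}\Big) \;\le\; \max_k V^{(k)} + \frac{\log K}{\eta},
\qquad
\langle \mathbf{1}, \nabla\Phi(V) \rangle = \sum_{k=1}^K \frac{e^{\eta V^{(k)}}}{\sum_{j=1}^K e^{\eta V^{(j)}}} = 1 .
```

The first chain is why controlling the potential controls the regret, and the second identity is what makes the derivative of the potential vanish along the flow.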

18 Refined Regret: Internal-Swap
Regret: as well as the best constant strategy.
Internal: on the stages where $k_n = k$, $k$ was the best choice [Foster, Vohra 99]: $R_n^{int} = \max_k \max_j \sum_{m \le n:\, k_m = k} \big(X_m^{(j)} - X_m^{(k)}\big)$.
Swap: as well as $\varphi(k)$ instead of $k$, $\varphi : [K] \to [K]$ [Blum, Mansour 07]: $R_n^{swap} = \max_{\varphi : [K] \to [K]} \sum_{m=1}^n \big(X_m^{(\varphi(k_m))} - X_m^{(k_m)}\big)$.
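The three notions of regret can be computed from a recorded sequence of reward vectors and choices. The sketch below assumes full knowledge of the reward vectors after the fact and uses 0-based arm indices; all function names are illustrative.

```python
def external_regret(rewards, choices):
    """Best constant arm in hindsight minus the reward actually collected."""
    K = len(rewards[0])
    gained = sum(x[k] for x, k in zip(rewards, choices))
    return max(sum(x[j] for x in rewards) for j in range(K)) - gained

def swap_gains(rewards, choices):
    """gains[k][j] = total gain from playing j on the stages where k was played."""
    K = len(rewards[0])
    gains = [[0.0] * K for _ in range(K)]
    for x, k in zip(rewards, choices):
        for j in range(K):
            gains[k][j] += x[j] - x[k]
    return gains

def internal_regret(rewards, choices):
    """Largest gain from replacing one arm k by one arm j everywhere."""
    return max(max(row) for row in swap_gains(rewards, choices))

def swap_regret(rewards, choices):
    """The best swap function can be chosen separately for each played arm k."""
    return sum(max(row) for row in swap_gains(rewards, choices))

rewards = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
choices = [1, 0, 1, 0]      # always the wrong arm
print(external_regret(rewards, choices),
      internal_regret(rewards, choices),
      swap_regret(rewards, choices))
```

In this example no constant arm beats the other by more than 2, yet swapping the two arms recovers everything, so the swap regret is twice the external regret.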

19 General regret
Regret: as well as the best constant strategy.
General: as well as $\xi(k_1, \dots, k_n)$ instead of $k_n$, $\xi \in \Xi$ [Lehrer 02]: $R_n^{gen} = \max_{\xi \in \Xi} \sum_{m=1}^n \big(X_m^{(\xi(k_1, \dots, k_m))} - X_m^{(k_m)}\big)$.
Generalized version of exponential weights [P 14]: $\mathbb{E} R_n^{gen} \lesssim \sqrt{n\log(|\Xi|)}$.
Internal regret $\sqrt{n\log(K)}$, swap regret $\sqrt{nK\log(K)}$.

20 Third Part: Links with Game Theory
Sections: Nash Equilibria, Other equilibria
What we have learned in the previous section:
In worst case analysis
  Learning is as fast in an adversarial as in a stochastic environment
In the adversarial framework
  Refined notions of regret can be minimized

21 Against Opponents - Game Theory
$X_n^{(k)}$ is not arbitrary, but induced by the choices of another player.
TWO players, simultaneous actions in $\{1, \dots, K\}$ and $\{1, \dots, L\}$.
Payoffs are defined by two matrices $A \in \mathbb{R}^{K \times L}$ and $B \in \mathbb{R}^{K \times L}$.
Player 1 picks row $k \in \{1, \dots, K\}$ and Player 2 column $l \in \{1, \dots, L\}$.
Player 1 gets $A_{k,l}$ and Player 2 gets $B_{k,l}$.
Choices can be random: $p \in \Delta([K])$ and $q \in \Delta([L])$.
Player 1 gets $\sum_{k,l} p_k q_l A_{k,l} = p^T A q$; P2 gets $p^T B q$.
Online learning: $X_n^{(k)} = A_{k,l_n}$ and $Y_n^{(l)} = B_{k_n,l}$.
Assume both players minimize regret independently. Do they learn a solution concept from game theory?

22 Nash Equilibria
A Nash equilibrium is a situation in which no player has an interest in changing his action [Nash 50], [Nash 51].
A Nash equilibrium is a pair $(p^*, q^*) \in \Delta([K]) \times \Delta([L])$ such that
  Player 1 has no interest in changing given $q^*$: $(p^*)^T A q^* \ge p^T A q^*$, $\forall p \in \Delta([K])$
  Player 2 has no interest in changing given $p^*$: $(p^*)^T B q^* \ge (p^*)^T B q$, $\forall q \in \Delta([L])$
There always exist Nash equilibria; generically an odd number [Nash 50], [Nash 51], [Shapley 74].
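The two equilibrium conditions can be checked numerically for a candidate pair of mixed strategies. The sketch below relies on the fact that a profitable mixed deviation implies a profitable pure one, so only pure deviations are tested; matrices are plain lists of rows and the function names are illustrative.

```python
def expected_payoff(M, p, q):
    """p^T M q for a payoff matrix M given as a list of rows."""
    return sum(p[k] * q[l] * M[k][l]
               for k in range(len(p)) for l in range(len(q)))

def is_nash(A, B, p, q, tol=1e-9):
    """True if neither player gains by deviating from (p, q)."""
    K, L = len(p), len(q)
    u1 = expected_payoff(A, p, q)
    u2 = expected_payoff(B, p, q)
    # Best pure deviation of Player 1 against q, and of Player 2 against p.
    best_row = max(sum(q[l] * A[k][l] for l in range(L)) for k in range(K))
    best_col = max(sum(p[k] * B[k][l] for k in range(K)) for l in range(L))
    return best_row <= u1 + tol and best_col <= u2 + tol

# Matching pennies: the unique equilibrium is uniform play by both players.
A = [[1.0, -1.0], [-1.0, 1.0]]
B = [[-a for a in row] for row in A]
print(is_nash(A, B, [0.5, 0.5], [0.5, 0.5]), is_nash(A, B, [1.0, 0.0], [1.0, 0.0]))
```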

23 Are Nash Equilibria Learnable?
Both players minimize their regret independently: $k_n \sim p_n \in \Delta([K])$, $l_n \sim q_n \in \Delta([L])$.
Learning Nash equilibria could mean:
  $(p_n, q_n) \in \Delta([K]) \times \Delta([L])$ converges to a NE, or to the set of NE.
  $\big(\frac{1}{n}\sum_{m=1}^n \delta_{k_m}, \frac{1}{n}\sum_{m=1}^n \delta_{l_m}\big) \in \Delta([K]) \times \Delta([L])$ converges to a NE, or to the set of NE.
  $\frac{1}{n}\sum_{m=1}^n \delta_{(k_m, l_m)} \in \Delta([K] \times [L])$ converges to a NE, or to the set of NE.
Nash equilibria are not learnable (independently) [Hart, Mas-Colell 04]: there always exists a game such that none of these convergences occurs.
What is learnable? Correlated eq., minmax value, potential eq. [Coucheney, Gaujal, Mertikopoulos]

24 Correlated Equilibria
Players use an external device to correlate (such as traffic lights); when they are told to take an action (such as stop or go), it is optimal to do so.
A correlated equilibrium is a distribution $\pi \in \Delta([K] \times [L])$.
$(k, l) \sim \pi$; P1 is told secretly to play $k$, P2 to play $l$.
If P1 plays $k \in [K]$, he gets $\sum_{l \in [L]} \pi_{k,l} A_{k,l}$. If he plays $j \in [K]$ instead, he would get $\sum_{l \in [L]} \pi_{k,l} A_{j,l}$.
$\sum_{l \in [L]} \pi_{k,l} A_{k,l} \ge \sum_{l \in [L]} \pi_{k,l} A_{j,l}$, for all $k, j \in [K]$.
Similar to no internal regret!
If both players minimize internal regret, the empirical distribution of actions converges to the set of correlated equilibria [Foster, Vohra 99].
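The correlated equilibrium inequalities are linear in $\pi$, so a candidate distribution is easy to test. The sketch below checks the stated condition for Player 1 and, as an assumption, the symmetric condition for Player 2 (the slide only writes out Player 1's). The game of chicken used as an example and the function name are illustrative and not taken from the slides.

```python
def is_correlated_equilibrium(A, B, pi, tol=1e-9):
    """True if no player gains by deviating from the recommended action."""
    K, L = len(A), len(A[0])
    for k in range(K):            # Player 1 is told k and considers j instead
        told = sum(pi[k][l] * A[k][l] for l in range(L))
        for j in range(K):
            if sum(pi[k][l] * A[j][l] for l in range(L)) > told + tol:
                return False
    for l in range(L):            # symmetric condition for Player 2 (assumed)
        told = sum(pi[k][l] * B[k][l] for k in range(K))
        for j in range(L):
            if sum(pi[k][l] * B[k][j] for k in range(K)) > told + tol:
                return False
    return True

# Game of chicken: action 0 = go, action 1 = stop.
A = [[0.0, 7.0], [2.0, 6.0]]
B = [[0.0, 2.0], [7.0, 6.0]]
third = 1.0 / 3.0
signal = [[0.0, third], [third, third]]      # never recommends (go, go)
uniform = [[0.25, 0.25], [0.25, 0.25]]
print(is_correlated_equilibrium(A, B, signal), is_correlated_equilibrium(A, B, uniform))
```

The first distribution plays the role of the traffic light: it never tells both players to go, and each player finds it optimal to obey.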

25 Minmax Theory
In zero-sum games, players have optimal strategies.
Zero-sum: $B = -A$; P1 maximizes and P2 minimizes $p^T A q$.
Value $= \max_{p \in \Delta([K])} \min_{q \in \Delta([L])} p^T A q = \min_{q \in \Delta([L])} \max_{p \in \Delta([K])} p^T A q$.
$p^*$ is optimal if $(p^*)^T A q \ge$ Value for all $q \in \Delta([L])$.
$R_n/n \to 0 \implies \frac{1}{n}\sum_{m=1}^n X_m^{(k_m)} \to$ Value.
$\big(\frac{1}{n}\sum_{m=1}^n \delta_{k_m}, \frac{1}{n}\sum_{m=1}^n \delta_{l_m}\big)$ converges to optimal strategies, i.e. to NE.
NE are fast learnable in zero-sum games, at $O(1/n)$ [Harris 98].
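The convergence of the average payoff to the value can be observed in a small self-play experiment. The sketch below lets both players run exponential weights against each other and tracks expected payoffs of the mixed strategies, so it is deterministic. The choice of exponential weights as the regret-minimizing rule, the temperature, the example matrix and the function names are all assumptions made for illustration.

```python
import math

def softmax(scores, eta):
    """Probabilities proportional to exp(eta * score), computed stably."""
    m = max(scores)
    w = [math.exp(eta * (s - m)) for s in scores]
    total = sum(w)
    return [x / total for x in w]

def self_play(A, n):
    """Average expected payoff of Player 1 when both players use exponential weights."""
    K, L = len(A), len(A[0])
    eta = math.sqrt(math.log(max(K, L)) / n)
    cum_row = [0.0] * K      # cumulative rewards of Player 1's rows
    cum_col = [0.0] * L      # cumulative rewards of Player 2's columns (negated payoffs)
    total = 0.0
    for _ in range(n):
        p = softmax(cum_row, eta)
        q = softmax(cum_col, eta)
        row_rewards = [sum(q[l] * A[k][l] for l in range(L)) for k in range(K)]
        col_rewards = [-sum(p[k] * A[k][l] for k in range(K)) for l in range(L)]
        total += sum(p[k] * row_rewards[k] for k in range(K))
        cum_row = [c + r for c, r in zip(cum_row, row_rewards)]
        cum_col = [c + r for c, r in zip(cum_col, col_rewards)]
    return total / n

# The optimal strategies of this game are (2/5, 3/5) for both players, with value 1/5.
A = [[2.0, -1.0], [-1.0, 1.0]]
print(self_play(A, 10000))
```

The gap between the average payoff and the value is at most the larger of the two players' average regrets, which is how the no-regret property translates into learning the value.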

26 Conclusion
In worst case analysis
  With full monitoring, learning is as fast in an adversarial as in a stochastic environment
  Up to $\sqrt{K}$, learning is as fast with bandit monit. as with full monit.
In distribution dependent (not worst case)
  Additional assumption required to learn as fast in bandit as in full monitoring
In game theoretic framework
  Nash equilibria are not learnable in general
  Correlated equilibria are learnable (by minimizing internal regret)
  In zero-sum and potential games, equilibria are learnable.
Fundamental textbook: [Cesa-Bianchi, Lugosi 06]
