Linear Bandits. Peter Bartlett

1 Linear Bandits. Peter Bartlett. Outline: Linear bandits. Exponential weights with unbiased loss estimates. Controlling loss estimates and their variance. Barycentric spanner. Uniform distribution. John's distribution. Lower bounds. Stochastic mirror descent. Full information. Bandit information.

2 Linear bandits. At round $t$, Strategy chooses $a_t \in A \subseteq \mathbb{R}^d$. Adversary chooses a linear loss $\ell_t : A \to [-1,1]$. Strategy sees the loss $\ell_t(a_t)$. The loss is linear in the action. Aim to minimize the regret
$$R_n = \mathbb{E}\sum_{t=1}^n \ell_t(a_t) - \inf_{a \in A} \mathbb{E}\sum_{t=1}^n \ell_t(a).$$

3 Example: Packet routing. Consider the problem of packet routing in a network $(V,E)$. At round $t$, Strategy chooses a path $a_t \in A \subseteq \{0,1\}^E$ from an origin node to a destination node. Adversary chooses delays $\ell_t \in L = [0,1]^E$. Strategy sees the loss $\ell_t^\top a_t$ (the total delay). The loss is linear in the action. Aim to minimize the regret
$$R_n = \mathbb{E}\sum_{t=1}^n \ell_t^\top a_t - \inf_{a \in A} \mathbb{E}\sum_{t=1}^n \ell_t^\top a.$$

4 Linear bandits vs k-armed bandits. This problem is closely related to the classical $k$-armed bandit problem. At round $t$: Strategy chooses $a_t \in A = \{1,\dots,k\}$. Adversary chooses $\ell_t \in L = [0,1]^A$. Strategy sees the loss $\ell_t(a_t)$. Aim to minimize the regret $R_n = \mathbb{E}\sum_{t=1}^n \ell_t(a_t) - \inf_{a\in A}\mathbb{E}\sum_{t=1}^n \ell_t(a)$.

5 Linear bandits vs k-armed bandits. This is unchanged (up to a constant factor) if we instead define $A = \{e_1,\dots,e_k\} \subseteq \mathbb{R}^k$ and $L = \{\ell : A \to [-1,1] \text{ linear}\}$. And allowing the strategy to choose $a$ in the convex hull of $A$ does not change the regret $R_n = \mathbb{E}\sum_{t=1}^n \ell_t(a_t) - \inf_{a\in A}\mathbb{E}\sum_{t=1}^n \ell_t(a)$. (But it might make the game easier for the strategy, since it changes the information that the strategy sees.)

6 Finite covers. For a compact $A \subseteq \mathbb{R}^d$, we can construct an $\epsilon$-cover of size $O(1/\epsilon^d)$, for example in the uniform metric $\rho(\hat a, a) = \|\hat a - a\|_\infty := \max_i |\hat a_i - a_i|$. Since we are aiming for $O(\sqrt n)$ regret, we can think of $A$ as having cardinality $|A| = O(n^{d/2})$, so $\log|A| = O(d\log n)$.

7 Exponential weights for linear bandits. Given $A$, a distribution $\mu$ on $A$, mixing coefficient $\gamma > 0$, and learning rate $\eta > 0$, set $q_1$ uniform on $A$. For $t = 1, 2, \dots, n$:
1. $p_t = (1-\gamma) q_t + \gamma\mu$.
2. Choose $a_t \sim p_t$.
3. Observe $\ell_t^\top a_t$.
4. Update $q_{t+1}(a) \propto q_t(a)\exp(-\eta\,\tilde\ell_t^\top a)$, where $\tilde\ell_t = \Sigma_t^{-1} a_t a_t^\top \ell_t$ and $\Sigma_t = \mathbb{E}_{a \sim p_t}\, a a^\top$.
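
A minimal numerical sketch of this strategy (an illustration, not from the lecture), assuming a finite action set stored as rows of a matrix; the function name, the interface, and the fixed loss sequence are all illustrative.

```python
import numpy as np

def exp_weights_linear_bandit(actions, losses, mu, gamma, eta, seed=0):
    """Exponential weights with unbiased linear-loss estimates.

    actions: (m, d) array, one action per row (assumed to span R^d).
    losses:  (n, d) array; losses[t] is the adversary's loss vector l_t.
    mu:      (m,) exploration distribution on the rows of `actions`.
    """
    rng = np.random.default_rng(seed)
    m, d = actions.shape
    q = np.full(m, 1.0 / m)                        # q_1 uniform on A
    total = 0.0
    for l_t in losses:
        p = (1 - gamma) * q + gamma * mu           # 1. mix in exploration
        a_t = actions[rng.choice(m, p=p)]          # 2. a_t ~ p_t
        obs = l_t @ a_t                            # 3. only l_t^T a_t is seen
        total += obs
        Sigma = actions.T @ (p[:, None] * actions) # Sigma_t = E_{a~p_t} a a^T
        l_hat = np.linalg.solve(Sigma, a_t) * obs  # tilde l_t = Sigma_t^{-1} a_t (a_t^T l_t)
        q *= np.exp(-eta * (actions @ l_hat))      # 4. exponential-weights update
        q /= q.sum()
    return total
```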

8 Unbiased loss estimates. Assume $\mathrm{span}(A) = \mathbb{R}^d$ (otherwise, we can project to a lower dimension) and that $\mu$ has support on a $d$-dimensional set, so $\mathbb{E}_{a\sim p_t}\, a a^\top$ has rank $d$. The strategy observes $a_t^\top \ell_t$ and $a_t$, so it can compute $\tilde\ell_t = \Sigma_t^{-1} a_t (a_t^\top \ell_t)$. This estimate is unbiased:
$$\mathbb{E}\big[\tilde\ell_t \mid F_{t-1}\big] = \big(\mathbb{E}_{a\sim p_t}\, a a^\top\big)^{-1}\big(\mathbb{E}_{a_t\sim p_t}\, a_t a_t^\top\big)\ell_t = \ell_t.$$
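
A quick Monte Carlo check of this unbiasedness claim (illustrative only), with an arbitrary made-up spanning action set and loss vector:

```python
import numpy as np

rng = np.random.default_rng(0)
actions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # spans R^2
p = np.array([0.3, 0.3, 0.4])                             # p_t
l = np.array([0.5, -0.25])                                # fixed loss vector
Sigma = actions.T @ (p[:, None] * actions)                # E_{a~p} a a^T

est = np.zeros(2)
T = 200_000
for i in rng.choice(len(actions), size=T, p=p):
    a = actions[i]
    est += np.linalg.solve(Sigma, a) * (a @ l)            # tilde l for this draw
print(est / T)  # approaches l = [0.5, -0.25]
```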

9 Regret bound. Theorem: For $\eta\sup_{a\in A}|\tilde\ell_t^\top a| \le 1$,
$$R_n \le \gamma n + \frac{\log|A|}{\eta} + (e-2)\,\eta\sum_{t=1}^n\mathbb{E}\,\mathbb{E}_{a\sim p_t}\big(\tilde\ell_t^\top a\big)^2.$$
So we need to control $\eta$ times the magnitude of the loss estimates, $\eta\sup_{a\in A}|\tilde\ell_t^\top a|$, and the variance term, $\mathbb{E}\,\mathbb{E}_{a\sim p_t}(\tilde\ell_t^\top a)^2$.

10 Proof. The regret is $\mathbb{E}\big[\sum_{t=1}^n\big(\ell_t^\top a_t - \ell_t^\top a\big)\big]$. We've seen that, given the history $F_{t-1}$,
$$\mathbb{E}\big[\tilde\ell_t \mid F_{t-1}\big] = \mathbb{E}\big[\Sigma_t^{-1} a_t a_t^\top \ell_t \mid F_{t-1}\big] = \mathbb{E}[\ell_t \mid F_{t-1}].$$
Lemma (some unbiased estimates involving $\tilde\ell_t$):
$$\mathbb{E}\big[\ell_t^\top a\big] = \mathbb{E}\big[\tilde\ell_t^\top a\big], \qquad \mathbb{E}\big[\ell_t^\top a_t\big] = \mathbb{E}\Big[\sum_{a\in A} p_t(a)\,\mathbb{E}\big[\tilde\ell_t \mid F_{t-1}\big]^\top a\Big] = \mathbb{E}\Big[\sum_{a\in A} p_t(a)\,\tilde\ell_t^\top a\Big].$$

11 Proof. So we can write the strategy's expected cumulative loss as
$$\mathbb{E}\sum_{t=1}^n \ell_t^\top a_t = \mathbb{E}\sum_{t=1}^n\sum_{a\in A} p_t(a)\,\tilde\ell_t^\top a.$$
We'll give up on the loss incurred in the exploration trials:
$$\sum_{a\in A} p_t(a)\,\tilde\ell_t^\top a = \sum_{a\in A}\big((1-\gamma)q_t(a) + \gamma\mu(a)\big)\tilde\ell_t^\top a = (1-\gamma)\sum_{a\in A} q_t(a)\,\tilde\ell_t^\top a + \underbrace{\gamma\sum_{a\in A}\mu(a)\,\tilde\ell_t^\top a}_{\text{exploration}}.$$

12 Proof. For $q_t$, we follow the standard analysis (see Adversarial Bandits), but instead of using non-negativity of the losses, we use a lower bound:
$$\log\mathbb{E}\exp\big(-\eta(X - \mathbb{E}X)\big) \le \mathbb{E}\big(\exp(-\eta X) - 1 + \eta X\big) \le (e-2)\,\eta^2\,\mathbb{E}X^2,$$
where the last inequality uses $\exp(-x) \le 1 - x + (e-2)x^2$ for $x \ge -1$. So if $\eta|\tilde\ell_t^\top a| \le 1$ for all $a \in A$, the previous analysis shows that, for any $a \in A$, the first term above satisfies
$$\sum_{t=1}^n\sum_{a'\in A} q_t(a')\,\tilde\ell_t^\top a' \le \sum_{t=1}^n \tilde\ell_t^\top a + \frac{\log|A|}{\eta} + (e-2)\,\eta\sum_{t=1}^n\sum_{a'\in A} q_t(a')\big(\tilde\ell_t^\top a'\big)^2.$$

13 Proof. Combining, and using the fact that $(1-\gamma)q_t(a) \le p_t(a)$,
$$\sum_{t=1}^n\sum_{a'\in A} p_t(a')\,\tilde\ell_t^\top a' \le \sum_{t=1}^n \tilde\ell_t^\top a + (\text{exploration}) + \frac{\log|A|}{\eta} + (e-2)\,\eta\sum_{t=1}^n\sum_{a'\in A} p_t(a')\big(\tilde\ell_t^\top a'\big)^2.$$
The unbiasedness lemma gives
$$R_n \le \gamma n + \frac{\log|A|}{\eta} + (e-2)\,\eta\sum_{t=1}^n\mathbb{E}\,\mathbb{E}_{a\sim p_t}\big(\tilde\ell_t^\top a\big)^2.$$

14 Controlling variance. Lemma: For $L \subseteq [-1,1]^A$, the variance term is bounded: $\mathbb{E}\,\mathbb{E}_{a\sim p_t}(\tilde\ell_t^\top a)^2 \le d$.
$$\mathbb{E}\big(\tilde\ell_t^\top a\big)^2 = a^\top\mathbb{E}\big(\tilde\ell_t\tilde\ell_t^\top\big)a = a^\top\mathbb{E}\big((\ell_t^\top a_t)^2\,\Sigma_t^{-1} a_t a_t^\top \Sigma_t^{-1}\big)a \le a^\top\Sigma_t^{-1}\mathbb{E}\big(a_t a_t^\top\big)\Sigma_t^{-1} a = a^\top\Sigma_t^{-1}a,$$
using $(\ell_t^\top a_t)^2 \le 1$. Hence
$$\mathbb{E}_{a\sim p_t}\mathbb{E}\big(\tilde\ell_t^\top a\big)^2 \le \mathbb{E}\,\mathrm{tr}\big(\Sigma_t^{-1} a a^\top\big) = \mathrm{tr}\big(\Sigma_t^{-1}\mathbb{E}\big(a a^\top\big)\big) = \mathrm{tr}(I) = d.$$

15 Controlling the magnitude of the estimator. Lemma: For $L \subseteq [-1,1]^A$, $|\tilde\ell_t^\top a| \le \sup_{a,b\in A}|a^\top\Sigma_t^{-1}b|$.
$$|\tilde\ell_t^\top a| = \big|a_t^\top\ell_t\,\big(\Sigma_t^{-1}a_t\big)^\top a\big| \le |a_t^\top\ell_t|\,|a_t^\top\Sigma_t^{-1}a| \le \sup_{a,b\in A}|a^\top\Sigma_t^{-1}b|.$$
We'll see that typically $\sup_{a,b\in A} a^\top\Sigma_t^{-1}b \le c_d/\gamma$.

16 Regret bound. Theorem: For $L \subseteq [-1,1]^A$, if $\sup_{a,b\in A} a^\top\Sigma_t^{-1}b \le c_d/\gamma$, setting
$$\eta = \sqrt{\frac{\log|A|}{n\big((e-2)d + c_d\big)}}, \qquad \gamma = c_d\,\eta$$
gives $R_n \le 2\sqrt{n(d + c_d)\log|A|}$.

17 Exploration distributions. (Dani, Hayes, Kakade, 2008): For $\mu$ uniform over a barycentric spanner, $R_n = O\big(d\sqrt{n\log|A|}\big) = \tilde O\big(d^{3/2}\sqrt n\big)$. (Cesa-Bianchi and Lugosi, 2009): For several combinatorial problems, $A \subseteq \{0,1\}^d$, $\mu$ uniform over $A$ gives $\sup_{a\in A}\|a\|_2^2\,/\,\lambda_{\min}\big(\mathbb{E}_{a\sim\mu}[a a^\top]\big) = O(d)$, so $R_n = O\big(\sqrt{dn\log|A|}\big) = \tilde O\big(d\sqrt n\big)$. (Bubeck, Cesa-Bianchi and Kakade, 2009): John's Theorem: $\tilde O(d\sqrt n)$.

18 Barycentric spanner. (Suppose that $A \subseteq \mathbb{R}^d$ spans $\mathbb{R}^d$.) A barycentric spanner of $A$ is a set $\{b_1,\dots,b_d\} \subseteq A$ that spans $\mathbb{R}^d$ and satisfies: for all $a \in A$ there is an $\alpha \in [-1,1]^d$ such that $a = B\alpha$, where $B = (b_1 \cdots b_d)$. Every compact $A$ has a barycentric spanner. If linear functions can be efficiently optimized over $A$, then there is an efficient algorithm for finding an approximate barycentric spanner (that is, $|\alpha_i| \le 1+\delta$; $O(d^2\log(d)/\delta)$ linear optimizations).

19 Barycentric spanner. Lemma: If $\{b_1,\dots,b_d\} \subseteq A$ maximizes $|\det(B)|$, then it is a barycentric spanner. Proof: For $a = B\alpha$,
$$\det\big(a\ b_2\ \cdots\ b_d\big) = \sum_i \alpha_i\det\big(b_i\ b_2\ \cdots\ b_d\big) = \alpha_1\det(B),$$
and by maximality $|\alpha_1\det(B)| \le |\det(B)|$, so $|\alpha_1| \le 1$; the same argument applies to each coordinate.
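
This determinant characterization suggests the swap procedure (due to Awerbuch and Kleinberg) behind the approximate spanner mentioned on the previous slide. A sketch for a finite $A$ given explicitly as rows of an array, assuming its first $d$ rows are linearly independent; with a general linear-optimization oracle the inner scan would be an oracle call instead.

```python
import numpy as np

def approx_barycentric_spanner(A, delta=0.1):
    """Swap a column for any point of A that grows |det B| by a (1 + delta) factor.

    On exit, every a in A satisfies a = B alpha with |alpha_i| <= 1 + delta.
    A: (m, d) array whose rows span R^d (first d rows assumed independent).
    """
    A = np.asarray(A, dtype=float)
    d = A.shape[1]
    B = A[:d].T.copy()                    # initial candidate basis, as columns
    improved = True
    while improved:
        improved = False
        for i in range(d):
            base = abs(np.linalg.det(B))
            for a in A:
                C = B.copy()
                C[:, i] = a               # try swapping a into column i
                if abs(np.linalg.det(C)) > (1 + delta) * base:
                    B, base = C, abs(np.linalg.det(C))
                    improved = True
    return B.T                            # rows b_1, ..., b_d
```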

20 Barycentric spanner. Theorem: For $A \subseteq [-1,1]^d$ and $\mu$ uniform on a barycentric spanner of $A$,
$$\sup_{a,b\in A} a^\top\Sigma_t^{-1}b \le \frac{d^2}{\gamma}$$
(that is, $c_d \le d^2$). Hence $R_n \le 2d\sqrt{2n\log|A|}$. Here
$$\Sigma_t = \frac{\gamma}{d}BB^\top + \underbrace{(1-\gamma)\sum_{a\in A} q_t(a)\,a a^\top}_{M}.$$

21 Barycentric spanner: Proof.
$$\sup_{a,b\in A} a^\top\Sigma_t^{-1}b \le \sup_{\alpha,\beta\in[-1,1]^d}\alpha^\top B^\top\Sigma_t^{-1}B\beta \le \sup_{\|\alpha\|=\|\beta\|=\sqrt d}\alpha^\top B^\top\Sigma_t^{-1}B\beta = d\,\lambda_{\max}\big(B^\top\Sigma_t^{-1}B\big) = d\,\lambda_{\max}\Big(\big(B^{-1}\Sigma_t B^{-\top}\big)^{-1}\Big)$$
$$= \frac{d}{\lambda_{\min}\big(B^{-1}\big(\frac{\gamma}{d}BB^\top + M\big)B^{-\top}\big)} \le \frac{d^2}{\gamma\,\lambda_{\min}\big(B^{-1}BB^\top B^{-\top}\big)} = \frac{d^2}{\gamma},$$
where $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ denote the largest and smallest eigenvalues.

22 Other exploration distributions. Lemma:
$$\sup_{a,b\in A} a^\top\Sigma_t^{-1}b \le \frac{\sup_{a\in A}\|a\|_2^2}{\gamma\,\lambda_{\min}\big(\mathbb{E}_{a\sim\mu}[a a^\top]\big)}.$$
Proof:
$$\sup_{a,b\in A} a^\top\Sigma_t^{-1}b \le \sup_{a\in A}\|a\|_2^2\,\lambda_{\max}\big(\Sigma_t^{-1}\big) = \frac{\sup_{a\in A}\|a\|_2^2}{\lambda_{\min}(\Sigma_t)},$$
and
$$\lambda_{\min}(\Sigma_t) = \min_{\|v\|=1}\sum_{a\in A} p_t(a)\,v^\top a a^\top v \ge \gamma\min_{\|v\|=1}\sum_{a\in A}\mu(a)\,v^\top a a^\top v = \gamma\,\lambda_{\min}\big(\mathbb{E}_{a\sim\mu}[a a^\top]\big).$$

23 John's distribution. Theorem [John's Theorem]: For any convex set $A \subseteq \mathbb{R}^d$, denote the ellipsoid of minimal volume containing it as $E = \{x \in \mathbb{R}^d : (x-c)^\top M(x-c) \le 1\}$. Then there is a set $\{u_1,\dots,u_m\} \subseteq E \cap A$ of $m \le d(d+1)/2 + 1$ contact points and a distribution $p$ on this set such that any $x \in \mathbb{R}^d$ can be written
$$x = c + d\sum_{i=1}^m p_i\,\langle x - c,\, u_i - c\rangle\,(u_i - c),$$
where $\langle\cdot,\cdot\rangle$ is the inner product for which the minimal ellipsoid is the unit ball about its center $c$: $\langle x, y\rangle = x^\top M y$.

24 John's distribution. This shows that
$$x - c = d\sum_i p_i\,(u_i - c)(u_i - c)^\top M(x - c), \qquad \tilde x = d\sum_i p_i\,\tilde u_i\tilde u_i^\top\tilde x, \qquad \frac{1}{d}I = \sum_i p_i\,\tilde u_i\tilde u_i^\top,$$
where $\tilde u_i = M^{1/2}(u_i - c)$, and similarly $\tilde x = M^{1/2}(x - c)$. Setting the exploration distribution $\mu$ to be the distribution $p$ over the set of transformed contact points $\tilde u_i$, we see that, for $a, b \in A$,
$$\tilde a^\top\,\mathbb{E}_{u\sim\mu}\big[u u^\top\big]\,\tilde b = \frac{1}{d}\,\tilde a^\top\tilde b.$$

25 John's distribution. So if we shift the origin of the set $A$ and of the $u_i$ (with the corresponding introduction of a constant component in the losses), we have
$$\sup_{a,b\in A} a^\top\Sigma_t^{-1}b \le \frac{d}{\gamma},$$
that is, $c_d \le d$. Hence $R_n \le 2\sqrt{2nd\log|A|}$.

26 Exploration distributions. (Dani, Hayes, Kakade, 2008): For $\mu$ uniform over a barycentric spanner, $R_n = O\big(d\sqrt{n\log|A|}\big) = \tilde O\big(d^{3/2}\sqrt n\big)$. (Cesa-Bianchi and Lugosi, 2009): For several combinatorial problems, $A \subseteq \{0,1\}^d$, $\mu$ uniform over $A$ gives $\sup_{a\in A}\|a\|_2^2\,/\,\lambda_{\min}\big(\mathbb{E}_{a\sim\mu}[a a^\top]\big) = O(d)$, so $R_n = O\big(\sqrt{dn\log|A|}\big) = \tilde O\big(d\sqrt n\big)$. (Bubeck, Cesa-Bianchi and Kakade, 2009): John's Theorem: $\tilde O(d\sqrt n)$.

27 Outline. Linear bandits. Exponential weights with unbiased loss estimates. Controlling loss estimates and their variance. Barycentric spanner. Uniform distribution. John's distribution. Lower bounds. Stochastic mirror descent. Full information. Bandit information.

28 Lower bounds. Lower bounds from the stochastic setting suffice. Theorem: Consider $A = \{\pm 1\}^d$, $L = \{\pm e_i : 1 \le i \le d\}$. There is a constant $c$ such that, for any strategy and any $n$, there is an i.i.d. adversary for which $R_n \ge c\,d\sqrt n$. (Here, $\sqrt{nd\log|A|} = O(d\sqrt n)$.)

29 Lower bounds: proof. Probabilistic method: Fix $\epsilon \in (0,1/2)$ and, for each $b \in \{\pm 1\}^d$, define $P_b$ on $L$ as
$$P_b(e_i) = \frac{1 - b_i\epsilon}{2d}, \qquad P_b(-e_i) = \frac{1 + b_i\epsilon}{2d}$$
(so that the optimal action is $a^* = b$). We'll choose $b$ uniformly at random, and show that the expected regret under this choice is large.
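
A tiny simulation of this adversary (illustrative only), checking that the mean loss vector is $-(\epsilon/d)\,b$, so that $a = b$ is indeed optimal:

```python
import numpy as np

def sample_loss(b, eps, rng):
    """Draw l ~ P_b: coordinate i uniform; +e_i w.p. (1 - b_i eps)/2, else -e_i."""
    d = len(b)
    i = rng.integers(d)
    sign = 1.0 if rng.random() < (1 - b[i] * eps) / 2 else -1.0
    return sign * np.eye(d)[i]

rng = np.random.default_rng(1)
b, eps = np.array([1, -1, 1, -1]), 0.2
mean = sum(sample_loss(b, eps, rng) for _ in range(100_000)) / 100_000
print(mean)  # approaches -(eps/d) * b, so E[l^T a] is minimized at a = b
```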

30 Lower bounds: proof.
$$R_n(P_b) = \sum_{t=1}^n\sum_{i=1}^d \mathbb{E}\big[\ell_{t,i}(a_{t,i} - b_i)\big] = \sum_{t=1}^n\sum_{i=1}^d\Big(\frac{1 - b_i\epsilon}{2d} - \frac{1 + b_i\epsilon}{2d}\Big)\,\mathbb{E}[a_{t,i} - b_i] = \sum_{t=1}^n\sum_{i=1}^d\frac{\epsilon}{d}\,\mathbb{E}\big[b_i(b_i - a_{t,i})\big] = \sum_{i=1}^d\underbrace{\frac{2\epsilon}{d}\sum_{t=1}^n\mathbb{E}\,1[a_{t,i} \ne b_i]}_{R_n^i(b_i)}.$$

31 Lower bounds: proof. The regret of sub-game $i$, $R_n^i(b_i)$, is at least the regret that would be incurred if the strategy knew that the adversary was using one of the $P_b$ distributions, and also knew $\{b_j : j \ne i\}$. In that case, it would know $\theta := \mathbb{E}\sum_{j\ne i}\ell_{t,j}a_{t,j}$, and so at each round it would see a $(\pm 1)$-valued Bernoulli random variable $\ell_t^\top a_t$ with mean $\theta - b_i a_{t,i}\epsilon/d$. Notice that the $1/d$ here is crucial: because information about the $i$th component only arrives once every $d$ rounds on average, the range of values of the unknown Bernoulli mean has shrunk. If the strategy saw the components of $\ell_t$ (even in the semi-bandit setting, with $A = \{0,1\}^d$ and feedback $(\ell_{t,1}a_{t,1},\dots,\ell_{t,d}a_{t,d})$), it would not suffer this disadvantage.

32 Lower bounds: proof. Using the same argument as we saw for the stochastic multi-armed bandit case (with a little extra work to show that $\theta$ is unlikely to be too close to $0$ or $1$, so that the variance of the Bernoulli is not too small), we see that
$$\mathbb{E}R_n^i(b_i) \ge \frac{2\epsilon n}{d}\Big(\frac{1}{2} - c\epsilon\frac{\sqrt n}{d}\Big).$$
Choosing $\epsilon = d/(4c\sqrt n)$ gives $\mathbb{E}R_n^i(b_i) = \Omega(\sqrt n)$, and so $\mathbb{E}R_n(P_b) = \Omega(d\sqrt n)$. [NB: $A = [-1,1]^d$, $L = \{\pm e_i\}$ has lower regret, because the strategy can use $a_t$ to identify which direction $\pm e_i$ was played.] [Open problem: when is $\Theta(d\sqrt n)$ possible with an efficient strategy?]

33 Outline. Linear bandits. Exponential weights with unbiased loss estimates. Controlling loss estimates and their variance. Barycentric spanner. Uniform distribution. John's distribution. Lower bounds. Stochastic mirror descent. Full information. Bandit information.

34 Full information online prediction games. Repeated game: Strategy plays $a_t \in A$; Adversary reveals $\ell_t \in L$. Aim to minimize the regret
$$R_n = \sum_{t=1}^n \ell_t(a_t) - \min_{a\in A}\sum_{t=1}^n \ell_t(a).$$

35 Online Convex Optimization. Choosing $a_t$ to minimize past losses can fail: the strategy must avoid overfitting. First approach: gradient steps. Stay close to previous decisions, but move in a direction of improvement.

36 Online Convex Optimization. 1. Gradient algorithm. 2. Regularized minimization: Bregman divergence; regularized minimization (minimizing the latest loss plus divergence from the previous decision); constrained minimization equivalent to unconstrained plus Bregman projection; linearization; mirror descent. 3. Regret bound.

37 Online Convex Optimization: Gradient Method. $a_1 \in A$,
$$a_{t+1} = \Pi_A\big(a_t - \eta\nabla\ell_t(a_t)\big),$$
where $\Pi_A$ is the Euclidean projection on $A$: $\Pi_A(x) = \arg\min_{a\in A}\|x - a\|$. Theorem: For $G = \max_t\|\nabla\ell_t(a_t)\|$ and $D = \mathrm{diam}(A)$, the gradient strategy with $\eta = D/(G\sqrt n)$ has regret satisfying $R_n \le GD\sqrt n$.
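
A minimal sketch of this strategy for the unit Euclidean ball, where the projection is just radial rescaling; the function name and the gradient-callback interface are illustrative, not from the lecture.

```python
import numpy as np

def online_gradient_descent(grad_fns, eta, d):
    """Projected online gradient descent on {a : ||a|| <= 1}.

    grad_fns: list of callables; grad_fns[t](a) returns the gradient of l_t at a.
    Returns the sequence of plays a_1, ..., a_n.
    """
    a = np.zeros(d)
    plays = []
    for g in grad_fns:
        plays.append(a.copy())
        a = a - eta * g(a)                # gradient step
        norm = np.linalg.norm(a)
        if norm > 1.0:                    # Euclidean projection onto the ball
            a /= norm
    return plays

# Example (2-ball, 2-ball): linear losses l_t(a) = v_t . a with ||v_t|| <= 1.
rng = np.random.default_rng(0)
vs = [v / np.linalg.norm(v) for v in rng.normal(size=(100, 3))]
plays = online_gradient_descent([lambda a, v=v: v for v in vs], eta=0.1, d=3)
```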

38 Online Convex Optimization: Gradient Method. Example (2-ball, 2-ball): $A = \{a \in \mathbb{R}^d : \|a\| \le 1\}$, $L = \{a \mapsto v^\top a : \|v\| \le 1\}$. $D = 2$, $G \le 1$. Regret is no more than $2\sqrt n$. (And $O(\sqrt n)$ is optimal.) Example (1-ball, $\infty$-ball): $A = \Delta(k)$, $L = \{a \mapsto v^\top a : \|v\|_\infty \le 1\}$. $D \le 2$, $G \le \sqrt k$. Regret is no more than $2\sqrt{kn}$. Since competing with the whole simplex is equivalent to competing with the vertices (experts) for linear losses, this is worse than exponential weights ($\sqrt k$ versus $\sqrt{\log k}$).

39 Gradient Method: Proof. Define $\tilde a_{t+1} = a_t - \eta\nabla\ell_t(a_t)$, $a_{t+1} = \Pi_A(\tilde a_{t+1})$. Fix $a \in A$ and consider the measure of progress $\|a_t - a\|$:
$$\|a_{t+1} - a\|^2 \le \|\tilde a_{t+1} - a\|^2 = \|a_t - a\|^2 + \eta^2\|\nabla\ell_t(a_t)\|^2 - 2\eta\nabla\ell_t(a_t)^\top(a_t - a).$$
By convexity, $\ell_t(a_t) - \ell_t(a) \le \nabla\ell_t(a_t)^\top(a_t - a)$; summing over $t$ and rearranging,
$$\sum_{t=1}^n\big(\ell_t(a_t) - \ell_t(a)\big) \le \frac{\|a_1 - a\|^2 - \|a_{n+1} - a\|^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^n\|\nabla\ell_t(a_t)\|^2.$$

40 Online Convex Optimization. 1. Gradient algorithm. 2. Regularized minimization: Bregman divergence; regularized minimization (minimizing the latest loss plus divergence from the previous decision); constrained minimization equivalent to unconstrained plus Bregman projection; linearization; mirror descent. 3. Regret bound.

41 Online Convex Optimization: A Regularization Viewpoint. Suppose $\ell_t$ is linear: $\ell_t(a) = g_t^\top a$. Suppose $A = \mathbb{R}^d$. Then minimizing the regularized criterion
$$a_{t+1} = \arg\min_{a\in A}\Big(\eta\sum_{s=1}^t \ell_s(a) + \frac{1}{2}\|a\|^2\Big)$$
corresponds to the gradient step $a_{t+1} = a_t - \eta\nabla\ell_t(a_t)$.

42 Online Convex Optimization: Regularization. Regularized minimization: consider the family of strategies of the form
$$a_{t+1} = \arg\min_{a\in A}\Big(\eta\sum_{s=1}^t \ell_s(a) + R(a)\Big).$$
The regularizer $R : \mathbb{R}^d \to \mathbb{R}$ is strictly convex and differentiable. $R$ keeps the sequence of $a_t$'s stable: it diminishes $\ell_t$'s influence. We can view the choice of $a_{t+1}$ as trading off two competing forces: making $\ell_t(a_{t+1})$ small, and keeping $a_{t+1}$ close to $a_t$. This is a perspective that motivated many algorithms in the literature.

43 Properties of Regularization Methods. In the unconstrained case ($A = \mathbb{R}^d$), regularized minimization is equivalent to minimizing the latest loss and the distance to the previous decision. The appropriate notion of distance is the Bregman divergence $D_{\Phi_{t-1}}$. Define $\Phi_0 = R$ and $\Phi_t = \Phi_{t-1} + \eta\ell_t$, so that
$$a_{t+1} = \arg\min_{a\in A}\Big(\eta\sum_{s=1}^t \ell_s(a) + R(a)\Big) = \arg\min_{a\in A}\Phi_t(a).$$

44 Bregman Divergence. Definition: For a strictly convex, differentiable $\Phi : \mathbb{R}^d \to \mathbb{R}$, the Bregman divergence w.r.t. $\Phi$ is defined, for $a, b \in \mathbb{R}^d$, as
$$D_\Phi(a,b) = \Phi(a) - \big(\Phi(b) + \nabla\Phi(b)^\top(a - b)\big).$$
$D_\Phi(a,b)$ is the difference between $\Phi(a)$ and the value at $a$ of the linear approximation of $\Phi$ about $b$.

45 Bregman Divergence. Example: For $a \in \mathbb{R}^d$, the squared Euclidean norm $\Phi(a) = \frac{1}{2}\|a\|^2$ has
$$D_\Phi(a,b) = \frac{1}{2}\|a\|^2 - \Big(\frac{1}{2}\|b\|^2 + b^\top(a - b)\Big) = \frac{1}{2}\|a - b\|^2,$$
the squared Euclidean distance.

46 Bregman Divergence. Example: For $a \in [0,\infty)^d$, the unnormalized negative entropy $\Phi(a) = \sum_{i=1}^d a_i(\ln a_i - 1)$ has
$$D_\Phi(a,b) = \sum_i\big(a_i(\ln a_i - 1) - b_i(\ln b_i - 1) - \ln b_i\,(a_i - b_i)\big) = \sum_i\Big(a_i\ln\frac{a_i}{b_i} + b_i - a_i\Big),$$
the unnormalized KL divergence. Thus, for $a \in \Delta$, $\Phi(a) = \sum_i a_i\ln a_i$ has $D_\Phi(a,b) = \sum_i a_i\ln\frac{a_i}{b_i}$.
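
Both examples can be checked numerically from the definition alone. A short sketch (illustrative, not from the lecture): a generic Bregman divergence built from a potential and its gradient, compared against the two closed forms.

```python
import numpy as np

def bregman(phi, grad_phi, a, b):
    """D_phi(a, b) = phi(a) - phi(b) - <grad phi(b), a - b>."""
    return phi(a) - phi(b) - grad_phi(b) @ (a - b)

a, b = np.array([0.2, 0.5, 0.3]), np.array([0.4, 0.4, 0.2])

# Squared Euclidean norm: D(a, b) = ||a - b||^2 / 2.
sq = bregman(lambda x: 0.5 * x @ x, lambda x: x, a, b)
print(np.isclose(sq, 0.5 * np.sum((a - b) ** 2)))

# Unnormalized negative entropy: D(a, b) = sum_i a_i ln(a_i/b_i) + b_i - a_i,
# which equals the KL divergence here because a and b both lie on the simplex.
kl = bregman(lambda x: np.sum(x * (np.log(x) - 1)), np.log, a, b)
print(np.isclose(kl, np.sum(a * np.log(a / b))))
```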

47 Bregman Divergence. When the domain of $\Phi$ is $S \subseteq \mathbb{R}^d$, in addition to differentiability and strict convexity, we make some more assumptions: $S$ is closed, and its interior is convex; for a sequence approaching the boundary of $S$, $\|\nabla\Phi(a_n)\| \to \infty$. We say that such a $\Phi$ is a Legendre function.

48 Bregman Divergence Properties.
1. $D_\Phi \ge 0$, $D_\Phi(a,a) = 0$.
2. $D_{A+B} = D_A + D_B$.
3. For $\ell$ linear, $D_{\Phi+\ell} = D_\Phi$.
4. The Bregman projection $\Pi_A^\Phi(b) = \arg\min_{a\in A} D_\Phi(a,b)$ is uniquely defined for closed, convex $A \subseteq S$ (that intersects the interior of $S$).
5. Generalized Pythagorus: for closed, convex $A$, $a^* = \Pi_A^\Phi(b)$, $a \in A$: $D_\Phi(a,b) \ge D_\Phi(a,a^*) + D_\Phi(a^*,b)$.
6. $\nabla_a D_\Phi(a,b) = \nabla\Phi(a) - \nabla\Phi(b)$.
7. For $\Phi^*$ the Legendre dual of $\Phi$: $\nabla\Phi^* = (\nabla\Phi)^{-1}$ and $D_\Phi(a,b) = D_{\Phi^*}\big(\nabla\Phi(b), \nabla\Phi(a)\big)$.

49 Legendre Dual. Here, for a Legendre function $\Phi : S \to \mathbb{R}$, we define the Legendre dual as
$$\Phi^*(u) = \sup_{v\in S}\big(u^\top v - \Phi(v)\big).$$

50 Legendre Dual. Properties: $\Phi^*$ is Legendre; $\mathrm{dom}(\Phi^*) = \nabla\Phi(\mathrm{int\ dom}\,\Phi)$; $\nabla\Phi^* = (\nabla\Phi)^{-1}$; $D_\Phi(a,b) = D_{\Phi^*}\big(\nabla\Phi(b), \nabla\Phi(a)\big)$; $\Phi^{**} = \Phi$. Examples: $\Phi = \frac{1}{2}\|\cdot\|_p^2$: $\Phi^* = \frac{1}{2}\|\cdot\|_q^2$, where $1/p + 1/q = 1$. $\Phi(a) = \sum_{i=1}^d e^{a_i}$: $\Phi^*(u) = \sum_i u_i(\ln u_i - 1)$.
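
The second example is easy to sanity-check numerically (again illustrative): with $\Phi(a) = \sum_i e^{a_i}$ we have $\nabla\Phi(a) = e^a$ and $\nabla\Phi^*(u) = \ln u$, which are inverses of each other.

```python
import numpy as np

a = np.array([0.3, -1.2, 0.7])
u = np.exp(a)                     # grad Phi(a)
print(np.allclose(np.log(u), a))  # grad Phi*(grad Phi(a)) = a
```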

51 Online Convex Optimization. 1. Problem formulation. 2. Empirical minimization fails. 3. Gradient algorithm. 4. Regularized minimization: Bregman divergence; regularized minimization (minimizing the latest loss plus divergence from the previous decision); constrained minimization equivalent to unconstrained plus Bregman projection; linearization; mirror descent. 5. Regret bounds.

52 Properties of Regularization Methods. In the unconstrained case ($A = \mathbb{R}^d$), regularized minimization is equivalent to minimizing the latest loss and the distance (Bregman divergence) to the previous decision. Theorem: Define $\tilde a_1$ via $\nabla R(\tilde a_1) = 0$, and set
$$\tilde a_{t+1} = \arg\min_{a\in\mathbb{R}^d}\big(\eta\ell_t(a) + D_{\Phi_{t-1}}(a,\tilde a_t)\big).$$
Then
$$\tilde a_{t+1} = \arg\min_{a\in\mathbb{R}^d}\Big(\eta\sum_{s=1}^t\ell_s(a) + R(a)\Big).$$

53 Properties of Regularization Methods. Proof. By the definition of $\Phi_t$,
$$\eta\ell_t(a) + D_{\Phi_{t-1}}(a,\tilde a_t) = \Phi_t(a) - \Phi_{t-1}(a) + D_{\Phi_{t-1}}(a,\tilde a_t).$$
The derivative w.r.t. $a$ is
$$\nabla\Phi_t(a) - \nabla\Phi_{t-1}(a) + \nabla_a D_{\Phi_{t-1}}(a,\tilde a_t) = \nabla\Phi_t(a) - \nabla\Phi_{t-1}(a) + \nabla\Phi_{t-1}(a) - \nabla\Phi_{t-1}(\tilde a_t).$$
Setting this to zero shows that
$$\nabla\Phi_t(\tilde a_{t+1}) = \nabla\Phi_{t-1}(\tilde a_t) = \cdots = \nabla\Phi_0(\tilde a_1) = \nabla R(\tilde a_1) = 0,$$
so $\tilde a_{t+1}$ minimizes $\Phi_t$.

54 Properties of Regularization Methods. Constrained minimization is equivalent to unconstrained minimization, followed by Bregman projection. Theorem: For
$$a_{t+1} = \arg\min_{a\in A}\Phi_t(a), \qquad \tilde a_{t+1} = \arg\min_{a\in\mathbb{R}^d}\Phi_t(a),$$
we have $a_{t+1} = \Pi_A^{\Phi_t}(\tilde a_{t+1})$.

55 Properties of Regularization Methods. Proof. Let $a'_{t+1}$ denote $\Pi_A^{\Phi_t}(\tilde a_{t+1})$. First, by definition of $a_{t+1}$, $\Phi_t(a_{t+1}) \le \Phi_t(a'_{t+1})$. Conversely, $D_{\Phi_t}(a'_{t+1},\tilde a_{t+1}) \le D_{\Phi_t}(a_{t+1},\tilde a_{t+1})$. But $\nabla\Phi_t(\tilde a_{t+1}) = 0$, so $D_{\Phi_t}(a,\tilde a_{t+1}) = \Phi_t(a) - \Phi_t(\tilde a_{t+1})$. Thus $\Phi_t(a'_{t+1}) \le \Phi_t(a_{t+1})$.

56 Properties of Regularization Methods. Example: For linear $\ell_t$, regularized minimization is equivalent to minimizing the last loss plus the Bregman divergence w.r.t. $R$ to the previous decision:
$$\arg\min_{a\in A}\Big(\eta\sum_{s=1}^t\ell_s(a) + R(a)\Big) = \Pi_A^R\Big(\arg\min_{a\in\mathbb{R}^d}\big(\eta\ell_t(a) + D_R(a,\tilde a_t)\big)\Big),$$
because adding a linear function to $\Phi$ does not change $D_\Phi$.

57 Linear Loss. We can replace $\ell_t$ by its linear approximation $a \mapsto \nabla\ell_t(a_t)^\top a$, and this leads to an upper bound on regret, since by convexity $\ell_t(a_t) - \ell_t(a) \le \nabla\ell_t(a_t)^\top(a_t - a)$. Thus, for convex losses, we can work with linear $\ell_t$.

58 Regularization Methods: Mirror Descent. Regularized minimization for linear losses can be viewed as mirror descent: taking a gradient step in a dual space. Theorem: The decisions
$$\tilde a_{t+1} = \arg\min_{a\in\mathbb{R}^d}\Big(\eta\sum_{s=1}^t g_s^\top a + R(a)\Big)$$
can be written
$$\tilde a_{t+1} = (\nabla R)^{-1}\big(\nabla R(\tilde a_t) - \eta g_t\big).$$
This corresponds to first mapping from $\tilde a_t$ through $\nabla R$, then taking a step in the direction $-g_t$, then mapping back through $(\nabla R)^{-1} = \nabla R^*$ to $\tilde a_{t+1}$.

59 Regularization Methods: Mirror Descent. Proof. For the unconstrained minimization, we have
$$\nabla R(\tilde a_{t+1}) = -\eta\sum_{s=1}^t g_s, \qquad \nabla R(\tilde a_t) = -\eta\sum_{s=1}^{t-1} g_s,$$
so $\nabla R(\tilde a_{t+1}) = \nabla R(\tilde a_t) - \eta g_t$, which can be written $\tilde a_{t+1} = (\nabla R)^{-1}\big(\nabla R(\tilde a_t) - \eta g_t\big)$.

60 Mirror Descent. Given: compact, convex $A \subseteq \mathbb{R}^d$, closed, convex $S \supseteq A$, $\eta > 0$, Legendre $R : S \to \mathbb{R}$. Set $a_1 \in \arg\min_{a\in A} R(a)$. For round $t$:
1. Play $a_t$; observe $\ell_t$.
2. $w_{t+1} = \nabla R^*\big(\nabla R(a_t) - \eta\nabla\ell_t(a_t)\big)$.
3. $a_{t+1} = \arg\min_{a\in A} D_R(a, w_{t+1})$. [Always convex optimization.]

61 Exponential weights as mirror descent. For $A = \Delta(k)$ and $R(a) = \sum_{i=1}^k(a_i\log a_i - a_i)$, this reduces to exponential weights:
$$\nabla R(a)_i = \log a_i, \qquad R^*(u) = \sum_i e^{u_i}, \qquad \nabla R^*(u)_i = \exp(u_i),$$
$$\nabla R(w_{t+1})_i = \log(w_{t+1,i}) = \log a_{t,i} - \eta\nabla\ell_t(a_t)_i, \qquad w_{t+1,i} = a_{t,i}\exp\big(-\eta\nabla\ell_t(a_t)_i\big),$$
$$D_R(a,b) = \sum_i\Big(a_i\log\frac{a_i}{b_i} + b_i - a_i\Big), \qquad a_{t+1,i} \propto w_{t+1,i}.$$
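
A compact sketch of this reduction (illustrative; the function name is made up): the generic mirror-descent loop with the negative-entropy regularizer becomes a multiplicative update followed by normalization, which is exactly the Bregman projection onto the simplex for unnormalized KL.

```python
import numpy as np

def mirror_descent_negentropy(grad_fns, eta, k):
    """Mirror descent on the simplex with R(a) = sum_i a_i log a_i - a_i.

    grad R(a) = log a and grad R*(u) = e^u, so the dual-space step
    w = exp(log a - eta g) = a * exp(-eta g) is a multiplicative update,
    and the Bregman projection onto the simplex is plain normalization.
    """
    a = np.full(k, 1.0 / k)
    plays = []
    for g in grad_fns:
        plays.append(a.copy())
        w = a * np.exp(-eta * g(a))   # w_{t+1} = grad R*(grad R(a_t) - eta grad l_t(a_t))
        a = w / w.sum()               # a_{t+1} = argmin_{a in simplex} D_R(a, w_{t+1})
    return plays
```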

62 Mirror descent regret. Theorem: Suppose that, for all $a \in A \cap \mathrm{int}(S)$ and $\ell \in L$, $\nabla R(a) - \eta\nabla\ell(a) \in \nabla R(\mathrm{int}(S))$. For any $a \in A$,
$$\sum_{t=1}^n\big(\ell_t(a_t) - \ell_t(a)\big) \le \frac{1}{\eta}\Big(R(a) - R(a_1) + \sum_{t=1}^n D_{R^*}\big(\nabla R(a_t) - \eta\nabla\ell_t(a_t),\, \nabla R(a_t)\big)\Big).$$
Proof: Fix $a \in A$. Since the $\ell_t$ are convex,
$$\sum_{t=1}^n\big(\ell_t(a_t) - \ell_t(a)\big) \le \sum_{t=1}^n\nabla\ell_t(a_t)^\top(a_t - a).$$

63 Mirror descent regret: proof. The choice of $w_{t+1}$ and the fact that $(\nabla R)^{-1} = \nabla R^*$ show that $\nabla R(w_{t+1}) = \nabla R(a_t) - \eta\nabla\ell_t(a_t)$. Hence,
$$\eta\nabla\ell_t(a_t)^\top(a_t - a) = (a - a_t)^\top\big(\nabla R(w_{t+1}) - \nabla R(a_t)\big) = D_R(a,a_t) + D_R(a_t,w_{t+1}) - D_R(a,w_{t+1}).$$
The generalized Pythagorus inequality shows that the projection $a_{t+1}$ satisfies
$$D_R(a,w_{t+1}) \ge D_R(a,a_{t+1}) + D_R(a_{t+1},w_{t+1}).$$

64 Mirror descent regret: proof.
$$\eta\sum_{t=1}^n\nabla\ell_t(a_t)^\top(a_t - a) \le \sum_{t=1}^n\big(D_R(a,a_t) + D_R(a_t,w_{t+1}) - D_R(a,w_{t+1})\big) \le \sum_{t=1}^n\big(D_R(a,a_t) - D_R(a,a_{t+1}) + D_R(a_t,w_{t+1}) - D_R(a_{t+1},w_{t+1})\big)$$
$$= D_R(a,a_1) - D_R(a,a_{n+1}) + \sum_{t=1}^n\big(D_R(a_t,w_{t+1}) - D_R(a_{t+1},w_{t+1})\big) \le D_R(a,a_1) + \sum_{t=1}^n D_R(a_t,w_{t+1}).$$

65 Mirror descent regret: proof.
$$= D_R(a,a_1) + \sum_{t=1}^n D_{R^*}\big(\nabla R(w_{t+1}),\, \nabla R(a_t)\big) = D_R(a,a_1) + \sum_{t=1}^n D_{R^*}\big(\nabla R(a_t) - \eta\nabla\ell_t(a_t),\, \nabla R(a_t)\big)$$
$$\le R(a) - R(a_1) + \sum_{t=1}^n D_{R^*}\big(\nabla R(a_t) - \eta\nabla\ell_t(a_t),\, \nabla R(a_t)\big).$$

66 Linear bandit setting. The strategy sees only $\ell_t(a_t)$; the gradient $\nabla\ell_t(a_t)$ is unseen. Instead of $a_t$, the strategy plays a noisy version $x_t$, and uses $\ell_t(x_t)$ to form an unbiased estimate of $\nabla\ell_t(a_t)$.

67 Stochastic mirror descent. Given: compact, convex $A \subseteq \mathbb{R}^d$, $\eta > 0$, $S \supseteq A$, Legendre $R : S \to \mathbb{R}$. Set $a_1 \in \arg\min_{a\in A} R(a)$. For round $t$:
1. Play a noisy version $x_t$ of $a_t$; observe $\ell_t(x_t)$.
2. Compute an estimate $g_t$ of $\nabla\ell_t(a_t)$.
3. $w_{t+1} = \nabla R^*\big(\nabla R(a_t) - \eta g_t\big)$.
4. $a_{t+1} = \arg\min_{a\in A} D_R(a, w_{t+1})$.

68 Regret of stochastic mirror descent. Theorem: Suppose that, for all $a \in A \cap \mathrm{int}(S)$ and linear $\ell \in L$, $\mathbb{E}[g_t \mid a_t] = \nabla\ell_t(a_t)$ and $\nabla R(a) - \eta g_t \in \nabla R(\mathrm{int}(S))$. For any $a \in A$,
$$\mathbb{E}\sum_{t=1}^n\big(\ell_t(x_t) - \ell_t(a)\big) \le \frac{1}{\eta}\Big(R(a) - R(a_1) + \sum_{t=1}^n\mathbb{E}\,D_{R^*}\big(\nabla R(a_t) - \eta g_t,\, \nabla R(a_t)\big)\Big) + \mathbb{E}\sum_{t=1}^n\big\|a_t - \mathbb{E}[x_t \mid a_t]\big\|\,\|g_t\|.$$

69 Regret: proof.
$$\mathbb{E}\sum_{t=1}^n\big(\ell_t(x_t) - \ell_t(a)\big) = \mathbb{E}\sum_{t=1}^n\big(\ell_t(x_t) - \ell_t(a_t) + \ell_t(a_t) - \ell_t(a)\big) = \mathbb{E}\sum_{t=1}^n\Big(\mathbb{E}\big[\ell_t^\top(x_t - a_t)\,\big|\,a_t\big] + \ell_t(a_t) - \ell_t(a)\Big)$$
$$\le \mathbb{E}\sum_{t=1}^n\big\|a_t - \mathbb{E}[x_t \mid a_t]\big\|\,\|g_t\| + \mathbb{E}\sum_{t=1}^n\nabla\ell_t(a_t)^\top(a_t - a) = \mathbb{E}\sum_{t=1}^n\big\|a_t - \mathbb{E}[x_t \mid a_t]\big\|\,\|g_t\| + \mathbb{E}\sum_{t=1}^n g_t^\top(a_t - a).$$

70 Regret: proof. Applying the regret bound for the (random) linear losses $a \mapsto g_t^\top a$ gives
$$\mathbb{E}\sum_{t=1}^n g_t^\top(a_t - a) \le \frac{1}{\eta}\Big(R(a) - R(a_1) + \sum_{t=1}^n\mathbb{E}\,D_{R^*}\big(\nabla R(a_t) - \eta g_t,\, \nabla R(a_t)\big)\Big).$$

71 Regret: Euclidean ball. Consider $B = \{a \in \mathbb{R}^d : \|a\| \le 1\}$ (with the Euclidean norm). Ingredients:
1. Distribution of $x_t$, given $a_t$:
$$x_t = \xi_t\frac{a_t}{\|a_t\|} + (1 - \xi_t)\,\epsilon_t e_{I_t},$$
where $\xi_t$ is Bernoulli($\|a_t\|$), $\epsilon_t$ is uniform $\pm 1$, and $I_t$ is uniform on $\{1,\dots,d\}$, so $\mathbb{E}[x_t \mid a_t] = a_t$.
2. Estimate $\tilde\ell_t$ of the loss $\ell_t$:
$$\tilde\ell_t = d\,\frac{1 - \xi_t}{1 - \|a_t\|}\,\big(x_t^\top\ell_t\big)\,x_t,$$
so $\mathbb{E}[\tilde\ell_t \mid a_t] = \ell_t$.
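
A Monte Carlo check of both ingredients (illustrative; `play_and_estimate` is a made-up name):

```python
import numpy as np

def play_and_estimate(a, l, rng):
    """Noisy play x_t and loss estimate for the Euclidean-ball construction."""
    d = len(a)
    r = np.linalg.norm(a)
    if rng.random() < r:                        # xi_t ~ Bernoulli(||a_t||)
        x, xi = a / r, 1.0
    else:                                       # uniform +/- coordinate vector
        sign = 1.0 if rng.random() < 0.5 else -1.0
        x, xi = sign * np.eye(d)[rng.integers(d)], 0.0
    g = d * (1 - xi) / (1 - r) * (x @ l) * x    # tilde l_t
    return x, g

rng = np.random.default_rng(2)
a, l = np.array([0.3, -0.2, 0.1]), np.array([0.5, 0.1, -0.4])
xs, gs = zip(*(play_and_estimate(a, l, rng) for _ in range(200_000)))
print(np.mean(xs, axis=0))  # approaches a   (E[x_t | a_t] = a_t)
print(np.mean(gs, axis=0))  # approaches l   (E[tilde l_t | a_t] = l_t)
```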

72 Regret: Euclidean ball. Theorem: Consider stochastic mirror descent on $A = (1-\gamma)B$ with these choices and $R(a) = -\log(1 - \|a\|) - \|a\|$. Then for $\eta\sqrt d \le 1/2$,
$$R_n \le \gamma n + \frac{\log(1/\gamma)}{\eta} + \eta\,\mathbb{E}\sum_{t=1}^n\Big[(1 - \|a_t\|)\,\|\tilde\ell_t\|^2\Big].$$
For $\gamma = 1/\sqrt n$ and $\eta = \sqrt{\log n/(2nd)}$, $R_n \le 3\sqrt{dn\log n}$. Proof: $\nabla R(a) = a/(1 - \|a\|)$.
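
Putting the pieces together: $\nabla R(a) = a/(1-\|a\|)$ inverts in closed form to $u \mapsto u/(1+\|u\|)$, so the whole bandit strategy fits in a few lines. A sketch under the same assumptions (illustrative, reusing `play_and_estimate` from the previous block); the radial projection onto $(1-\gamma)B$ is the Bregman projection here by symmetry.

```python
import numpy as np

def bandit_ball_mirror_descent(loss_vectors, gamma, eta, seed=0):
    """Stochastic mirror descent on A = (1 - gamma) * (unit Euclidean ball)."""
    rng = np.random.default_rng(seed)
    a = np.zeros(len(loss_vectors[0]))
    total = 0.0
    for l in loss_vectors:
        x, g = play_and_estimate(a, l, rng)        # only the scalar x^T l is observed
        total += x @ l
        u = a / (1 - np.linalg.norm(a)) - eta * g  # dual step: grad R(a_t) - eta g_t
        w = u / (1 + np.linalg.norm(u))            # w_{t+1} = (grad R)^{-1}(u)
        nw = np.linalg.norm(w)
        a = w if nw <= 1 - gamma else (1 - gamma) * w / nw  # project onto (1-gamma)B
    return total
```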

73 Linear bandits. Open question: What geometric properties of $A$ and $L$ determine the regret?

A | L | $R_n$
convex | $\ell : A \to [-1,1]$ | $\tilde O(d\sqrt n)$
$\{a : \|a\|_2 \le 1\}$ | $\{\ell : \|\ell\|_2 \le 1\}$ | $\tilde O(\sqrt{dn})$
$\{a : \|a\|_\infty \le 1\}$ | $\{\pm e_i : 1 \le i \le d\}$ | $\Omega(d\sqrt n)$
