Efficient Online Bandit Multiclass Learning with Õ(√T) Regret


Alina Beygelzimer¹  Francesco Orabona²  Chicheng Zhang³

¹Yahoo Research, New York, NY  ²Stony Brook University, Stony Brook, NY  ³University of California, San Diego, La Jolla, CA. Correspondence to: Alina Beygelzimer <beygel@yahooinc.com>, Francesco Orabona <francesco@orabona.com>, Chicheng Zhang <chichengzhang@ucsd.edu>.

Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).

Abstract

We present an efficient second-order algorithm with Õ((1/η)√T)¹ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by η, for a range of η restricted by the norm of the competitor. The family of loss functions ranges from hinge loss (η = 0) to squared hinge loss (η = 1). This provides a solution to the open problem of (Abernethy, J. and Rakhlin, A. An efficient bandit algorithm for √T-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally, showing that it also performs favorably against earlier algorithms.

¹ Õ(·) hides logarithmic factors.

1. Introduction

In the online multiclass classification problem, the learner must repeatedly classify examples into one of K classes. At each step t, the learner observes an example x_t ∈ R^d and predicts its label ỹ_t ∈ [K]. In the full-information case, the learner observes the true label y_t ∈ [K] and incurs loss 1[ỹ_t ≠ y_t]. In the bandit version of this problem, first considered in Kakade et al. (2008), the learner only observes its incurred loss 1[ỹ_t ≠ y_t], i.e., whether or not its prediction was correct. Bandit multiclass learning is a special case of the general contextual bandit learning (Langford & Zhang, 2008) where exactly one of the losses is 0 and all other losses are 1 in every round.

The goal of the learner is to minimize its regret with respect to the best predictor in some reference class of predictors, that is, the difference between the total number of mistakes the learner makes and the total number of mistakes of the best predictor in the class. Kakade et al. (2008) proposed a bandit modification of the Multiclass Perceptron algorithm (Duda & Hart, 1973), called the Banditron, that uses a reference class of linear predictors. Note that even in the full-information setting, it is difficult to provide a true regret bound. Instead, performance bounds are typically expressed in terms of the total multiclass hinge loss of the best linear predictor, a tight upper bound on the 0-1 loss.

The Banditron, while computationally efficient, achieves only O(T^{2/3}) expected regret with respect to this loss, where T is the number of rounds. This is suboptimal, as the Exp4 algorithm of Auer et al. (2003) can achieve Õ(√T) regret for the 0-1 loss, albeit very inefficiently. Abernethy & Rakhlin (2009) posed an open problem: Is there an efficient bandit multiclass learning algorithm that achieves expected regret of Õ(√T) with respect to any reasonable loss function?

The first attempt to solve this open problem was by Crammer & Gentile (2013). Using a stochastic assumption about the mechanism generating the labels, they were able to show an Õ(√T) regret with a second-order algorithm. Later, Hazan & Kale (2011), following a suggestion by Abernethy & Rakhlin (2009), proposed the use of the log-loss coupled with a softmax prediction. The softmax depends on a parameter that controls the smoothing factor.
The value of this parameter determines the exp-concavity of the loss, allowing Hazan & Kale (2011) to prove worst-case regret bounds that range between O(log T) and O(T^{2/3}), again with a second-order algorithm. However, the choice of the smoothing factor in the loss becomes critical in obtaining strong bounds.

The original Banditron algorithm has also been extended in many ways. Wang et al. (2010) proposed a variant based on the exponentiated gradient algorithm (Kivinen & Warmuth, 1997). Valizadegan et al. (2011) proposed different strategies to adapt the exploration rate to the data in the Banditron algorithm. However, these algorithms suffer from the same theoretical shortcomings as the Banditron.

There has been significant recent focus on developing efficient algorithms for the general contextual bandit problem (Dudík et al., 2011; Agarwal et al., 2014; Rakhlin & Sridharan, 2016; Syrgkanis et al., 2016a;b).

While these works solve a more general problem that does not make assumptions on the structure of the reward vector or the policy class, their results assume that contexts or context/reward pairs are generated i.i.d., or that the contexts to arrive are known beforehand, which we do not assume here.

In this paper, we follow a different route. Instead of designing an ad-hoc loss function that allows us to prove strong guarantees, we propose an algorithm that simultaneously satisfies a regret bound with respect to all the loss functions in a family of functions that are tight upper bounds to the 0-1 loss. The algorithm, named Second Order Banditron Algorithm (SOBA), is efficient and based on the second-order Perceptron algorithm (Cesa-Bianchi et al., 2005). The regret bound is of the order of Õ(√T), providing a solution to the open problem of Abernethy & Rakhlin (2009).

2. Definitions and Settings

We first introduce our notation. Denote the rows of a matrix V ∈ R^{K×d} by v_1, v_2, ..., v_K. The vectorization of V is defined as vec(V) = [v_1, v_2, ..., v_K], which is a vector in R^{Kd}. We define the reverse operation of reshaping a Kd × 1 vector into a K × d matrix by mat(V), using a row-major order. To simplify notation, we will use V and vec(V) interchangeably throughout the paper. For matrices A and B, denote by A ⊗ B their Kronecker product. For matrices X and Y of the same dimension, denote by ⟨X, Y⟩ = Σ_{i,j} X_{i,j} Y_{i,j} their inner product. We use ‖·‖ to denote the l₂ norm of a vector, and ‖·‖_F to denote the Frobenius norm of a matrix. For a positive definite matrix A, we use ‖x‖_A = √⟨x, Ax⟩ to denote the Mahalanobis norm of x with respect to A. We use 1 to denote the vector in R^K whose entries are all 1s. We use E_{t−1}[·] to denote the conditional expectation given the observations up to time t−1 and x_t, y_t, that is, x_1, y_1, ỹ_1, ..., x_{t−1}, y_{t−1}, ỹ_{t−1}, x_t, y_t. Let [K] denote {1, ..., K}, the set of possible labels. In our setting, learning proceeds in rounds:

For t = 1, 2, ..., T:
1. The adversary presents an example x_t ∈ R^d to the learner, and commits to a hidden label y_t ∈ [K].
2. The learner predicts a label ỹ_t ~ p_t, where p_t ∈ Δ^{K−1} is a probability distribution over [K].
3. The learner receives the bandit feedback 1[ỹ_t ≠ y_t].

The goal of the learner is to minimize the total number of mistakes, M = Σ_{t=1}^T 1[ỹ_t ≠ y_t]. We will use linear predictors specified by a matrix W ∈ R^{K×d}. The prediction is given by W(x) = argmax_{i∈[K]} (Wx)_i, where (Wx)_i is the i-th element of the vector Wx, corresponding to class i. A useful notion to measure the performance of a competitor U ∈ R^{K×d} is the multiclass hinge loss

ℓ(U, (x, y)) := max_{i≠y} [1 − (Ux)_y + (Ux)_i]_+ ,    (1)

where [·]_+ = max(·, 0).

3. A History of Loss Functions

As outlined in the introduction, a critical choice in obtaining strong theoretical guarantees is the choice of the loss function. In this section we introduce and motivate a family of multiclass loss functions. In the full information setting, strong binary and multiclass mistake bounds are obtained through the use of the Perceptron algorithm (Rosenblatt, 1958). A common misunderstanding of the Perceptron algorithm is that it corresponds to a gradient descent procedure with respect to the (binary or multiclass) hinge loss. However, it is well known that the Perceptron simultaneously satisfies mistake bounds that depend on the cumulative hinge loss and also on the cumulative squared hinge loss; see for example Mohri & Rostamizadeh (2013). Note also that the squared hinge loss is not dominated by the hinge loss, so, depending on the data, one loss can be better than the other.
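To make the definitions above concrete, here is a small sketch of the prediction rule and the multiclass hinge loss (1) in Python with NumPy (the function names are ours, for illustration only):

```python
import numpy as np

def predict(W, x):
    # Prediction rule: argmax over the K entries of the score vector W x.
    return int(np.argmax(W @ x))

def multiclass_hinge(U, x, y):
    # Multiclass hinge loss (1): max_{i != y} [1 - (Ux)_y + (Ux)_i]_+
    scores = U @ x
    margin = scores[y] - np.max(np.delete(scores, y))
    return max(1.0 - margin, 0.0)
```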
We show that the Perceptron algorithm satisfies an even stronger mistake bound, with respect to the cumulative loss of any power of the multiclass hinge loss between 1 and 2.

Theorem 1. On any sequence (x_1, y_1), ..., (x_T, y_T) with ‖x_t‖ ≤ X for all t ∈ [T], and any linear predictor U ∈ R^{K×d}, the total number of mistakes M of the multiclass Perceptron satisfies, for any q ∈ [1, 2],

M ≤ M^{1−1/q} (L_{MH,q}(U))^{1/q} + ‖U‖_F X √M,

where L_{MH,q}(U) = Σ_{t=1}^T (ℓ(U, (x_t, y_t)))^q. In particular, it simultaneously satisfies the following two bounds:

M ≤ L_{MH,1}(U) + X² ‖U‖_F² + X ‖U‖_F √(L_{MH,1}(U)),
M ≤ L_{MH,2}(U) + X² ‖U‖_F² + X ‖U‖_F √(L_{MH,2}(U)).

For the proof, see Appendix B. A similar observation was made by Orabona et al. (2012), who proved a logarithmic mistake bound with respect to all loss functions in a similar family of functions smoothly interpolating between the hinge loss and the squared hinge loss. In particular, Orabona et al. (2012) introduced the following family of binary loss functions:

ℓ_η(x) := { (1 − x)(1 − ηx),   x ≤ 1
          { 0,                 x > 1.    (2)
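The following sketch implements this binary family and the multiclass extension (3) defined below (a minimal sketch under our reading of the reconstructed definition (2)):

```python
import numpy as np

def ell_eta(x, eta):
    # Binary loss (2): (1 - x)(1 - eta x) for x <= 1, and 0 for x > 1.
    # eta = 0 gives the hinge loss, eta = 1 the squared hinge loss.
    return (1.0 - x) * (1.0 - eta * x) if x <= 1.0 else 0.0

def ell_eta_multiclass(U, x, y, eta):
    # Multiclass version (3): ell_eta at the margin (Ux)_y - max_{i != y} (Ux)_i.
    scores = U @ x
    return ell_eta(scores[y] - np.max(np.delete(scores, y)), eta)
```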

Figure 1. Plot of the loss functions ℓ_η for different values of η.

where 0 ≤ η ≤ 1. Note that η = 0 recovers the binary hinge loss, and η = 1 recovers the squared hinge loss. Meanwhile, for any 0 ≤ η ≤ 1, ℓ_η(x) ≤ max{ℓ_0(x), ℓ_1(x)}, and ℓ_η is an upper bound on the 0-1 loss: 1[x < 0] ≤ ℓ_η(x). See Figure 1 for a plot of the different functions in the family. Here, we define a multiclass version of the loss in (2) as

ℓ_η(U, (x, y)) := ℓ_η((Ux)_y − max_{i≠y} (Ux)_i).    (3)

Hence, ℓ_0(U, (x, y)) = ℓ(U, (x, y)) is the classical multiclass hinge loss, and ℓ_1(U, (x, y)) = ℓ²(U, (x, y)) is the squared multiclass hinge loss.

Our algorithm has an Õ((1/η)√T) regret bound that holds simultaneously for all loss functions in this family, with η in a range that ensures that (Ux)_i − (Ux)_j ≤ 1/η for all i, j ∈ [K]. We also show that there exists a setting of the parameters of the algorithm that gives a mistake upper bound of Õ((L* T)^{1/3} + √T), where L* is the cumulative hinge loss of the competitor, which is never worse than the best bounds in Kakade et al. (2008).

4. Second Order Banditron Algorithm

This section introduces our algorithm for bandit multiclass online learning, called the Second Order Banditron Algorithm (SOBA), described in Algorithm 1. SOBA makes a prediction using the γ-greedy strategy: at each iteration t, with probability 1 − γ it predicts ŷ_t = argmax_{i∈[K]} (W_t x_t)_i; with the remaining probability γ, it selects a random action in [K]. As discussed in Kakade et al. (2008), randomization is essential for designing bandit multiclass learning algorithms: if we deterministically output a label and make a mistake, then it is hard to make an update, since we do not know the identity of y_t. However, if randomization is used, we can estimate y_t and perform online stochastic mirror descent type updates (Bubeck & Cesa-Bianchi, 2012).

Algorithm 1 Second Order Banditron Algorithm (SOBA)
Input: Regularization parameter a > 0, exploration parameter γ ∈ [0, 1].
1: Initialization: W_1 = 0, A_0 = aI, θ_0 = 0
2: for t = 1, 2, ..., T do
3:   Receive instance x_t ∈ R^d
4:   ŷ_t = argmax_{i∈[K]} (W_t x_t)_i
5:   Define p_t = (1 − γ) e_{ŷ_t} + (γ/K) 1
6:   Randomly sample ỹ_t according to p_t
7:   Receive bandit feedback 1[ỹ_t ≠ y_t]
8:   Initialize update indicator n_t = 0
9:   if ỹ_t = y_t then
10:    ȳ_t = argmax_{i∈[K]\{y_t}} (W_t x_t)_i
11:    g_t = (1/p_{t,y_t}) (e_{ȳ_t} − e_{y_t}) ⊗ x_t
12:    z_t = √(p_{t,y_t}) g_t
13:    m_t = (⟨W_t, z_t⟩² + 2⟨W_t, g_t⟩) / (1 + z_tᵀ A_{t−1}^{−1} z_t)
14:    if m_t + Σ_{s=1}^{t−1} n_s m_s ≥ 0 then
15:      Turn on update indicator: n_t = 1
16:    end if
17:  end if
18:  Update A_t = A_{t−1} + n_t z_t z_tᵀ
19:  Update θ_t = θ_{t−1} − n_t g_t
20:  Set W_{t+1} = mat(A_t^{−1} θ_t)
21: end for
Remark: the matrix A_t is of dimension Kd × Kd, and the vector θ_t is of dimension Kd; in line 20, the matrix-vector multiplication results in a Kd-dimensional vector, which is reshaped to the matrix W_{t+1} of dimension K × d.

SOBA keeps track of two model parameters: the cumulative Perceptron-style updates θ_t = −Σ_{s=1}^t n_s g_s ∈ R^{Kd} and the corrected covariance matrix A_t = aI + Σ_{s=1}^t n_s z_s z_sᵀ ∈ R^{Kd×Kd}. The classifier W_t is computed by matricizing the matrix-vector product A_{t−1}^{−1} θ_{t−1} ∈ R^{Kd}. The weight vector θ_t is standard in designing online mirror descent type algorithms (Shalev-Shwartz, 2011; Bubeck & Cesa-Bianchi, 2012). The matrix A_t is standard in designing online learning algorithms with adaptive regularization (Cesa-Bianchi et al., 2005; Crammer et al., 2009; McMahan & Streeter, 2010; Duchi et al., 2011; Orabona et al., 2015).

The algorithm updates its model (n_t = 1) only when the following conditions hold simultaneously: (1) the predicted label is correct (ỹ_t = y_t), and (2) the cumulative regularized negative margin Σ_{s=1}^{t−1} n_s m_s + m_t stays nonnegative if this update is performed.
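For concreteness, the following is a Python transcription of Algorithm 1 (a sketch under our reconstruction of the pseudocode, maintaining the full (Kd)×(Kd) inverse of A_t for clarity rather than speed; a cheaper rank-one update of the inverse is noted in Subsection 4.1):

```python
import numpy as np

class SOBA:
    def __init__(self, K, d, a=1.0, gamma=0.01, seed=0):
        self.K, self.d, self.gamma = K, d, gamma
        self.A_inv = np.eye(K * d) / a      # inverse of A_t = a I + sum_s n_s z_s z_s^T
        self.theta = np.zeros(K * d)        # theta_t = -sum_s n_s g_s
        self.W = np.zeros((K, d))           # W_t = mat(A_{t-1}^{-1} theta_{t-1})
        self.msum = 0.0                     # running sum of n_s m_s (kept >= 0)
        self.rng = np.random.default_rng(seed)

    def predict(self, x):
        yhat = int(np.argmax(self.W @ x))                   # line 4
        p = np.full(self.K, self.gamma / self.K)
        p[yhat] += 1.0 - self.gamma                         # line 5
        ytilde = int(self.rng.choice(self.K, p=p))          # line 6
        return yhat, ytilde, p

    def update(self, x, ytilde, p, correct):
        if not correct:                      # bandit feedback: only 1[ytilde != y]
            return
        y = ytilde                           # a correct guess reveals the true label
        scores = self.W @ x
        scores[y] = -np.inf
        ybar = int(np.argmax(scores))        # line 10: runner-up class
        e = np.zeros(self.K); e[ybar], e[y] = 1.0, -1.0
        g = np.kron(e, x) / p[y]             # line 11: g_t
        z = np.sqrt(p[y]) * g                # line 12: z_t
        Az = self.A_inv @ z
        denom = 1.0 + z @ Az
        w = self.W.ravel()                   # row-major vec(W_t)
        m = ((w @ z) ** 2 + 2.0 * (w @ g)) / denom          # line 13
        if self.msum + m >= 0:               # line 14: update indicator n_t = 1
            self.msum += m
            self.A_inv -= np.outer(Az, Az) / denom          # rank-one update for line 18
            self.theta -= g                                 # line 19
            self.W = (self.A_inv @ self.theta).reshape(self.K, self.d)  # line 20
```

A round then looks like: yhat, ytilde, p = alg.predict(x); alg.update(x, ytilde, p, correct=(ytilde == y)).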
Note that when the predicted label is correct, we know the identity of the true label. As we shall see, the set of iterations where n_t = 1 includes all iterations where ỹ_t = y_t ≠ ŷ_t. This fact is crucial to the mistake bound analysis.

Furthermore, there are some iterations where ỹ_t = y_t = ŷ_t but we still make an update. This idea is related to online passive-aggressive algorithms (Crammer et al., 2006; 2009) in the full information setting, where the algorithm makes an update even when it predicts correctly but the margin is too small.

Let us now describe our algorithm in more detail. Throughout, suppose all the examples are l₂-bounded: ‖x_t‖ ≤ X. As outlined above, we associate a time-varying regularizer R_t(W) = (1/2)‖W‖²_{A_t}, where A_t = aI + Σ_{s=1}^t n_s z_s z_sᵀ is a Kd × Kd matrix and z_t = √(p_{t,y_t}) g_t = (1/√(p_{t,y_t})) (e_{ȳ_t} − e_{y_t}) ⊗ x_t. Note that this time-varying regularizer is constructed from scaled versions of the updates g_t. This is critical, because in expectation this becomes the correct regularizer. Indeed, it is easy to verify that, for any U ∈ R^{Kd},

E_{t−1}[1[y_t = ỹ_t] g_t] = (e_{ȳ_t} − e_{y_t}) ⊗ x_t,
E_{t−1}[1[y_t = ỹ_t] ⟨U, z_t⟩²] = ⟨U, (e_{ȳ_t} − e_{y_t}) ⊗ x_t⟩².

In words, this means that in expectation the regularizer contains the outer products of the updates, which in turn promote the correct class and demote the wrong one. We stress that it is impossible to get the same result with the estimator proposed in Kakade et al. (2008). Also, the analysis is substantially different from the Online Newton Step approach (Hazan et al., 2007) used in Hazan & Kale (2011).

In reality, we do not make an update in all iterations in which ỹ_t = y_t, since the algorithm needs to maintain the invariant Σ_{s=1}^t m_s n_s ≥ 0, which is crucial to the proof of Lemma 2. Instead, we prove a technical lemma that gives an explicit form for the expected update n_t g_t and the expected regularization n_t z_t z_tᵀ. Define

q_t := 1[Σ_{s=1}^{t−1} n_s m_s + m_t ≥ 0],
h_t := 1[ŷ_t ≠ y_t] + q_t 1[ŷ_t = y_t].

Lemma 1. For any U ∈ R^{Kd},

E_{t−1}[n_t ⟨U, g_t⟩] = −h_t ⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩,
E_{t−1}[n_t ⟨U, z_t⟩²] = h_t ⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩².

The proof of Lemma 1 is deferred to the end of Subsection 4.1. Our last contribution is to show how our second-order algorithm satisfies a mistake bound for an entire family of loss functions. Finally, we relate the performance of the algorithm that predicts ŷ_t to the γ-greedy algorithm. Putting it all together, we have our expected mistake bound for SOBA.²

Theorem 2. SOBA has the following expected upper bound on the number of mistakes M, for any U ∈ R^{K×d} and any 0 < η ≤ min{1, 1/(2 max_i ‖u_i‖ X + 1)}:

E[M] ≤ L_η(U) + η a ‖U‖_F² + (K/(ηγ)) E[Σ_{t=1}^T n_t z_tᵀ A_t^{−1} z_t] + γT,

where L_η(U) := Σ_{t=1}^T ℓ_η(U, (x_t, y_t)) is the cumulative η-loss of the linear predictor U, and {u_i}_{i=1}^K are the rows of U. In particular, setting γ = O(√(dK² ln T / T)) and a = X², we have

E[M] ≤ L_η(U) + O(η X² ‖U‖_F² + (K/η) √(dT ln T)).

Note that, differently from previous analyses (Kakade et al., 2008; Crammer & Gentile, 2013; Hazan & Kale, 2011), we do not need to assume a bound on the norm of the competitor, as in the full information Perceptron and Second Order Perceptron algorithms. In Appendix A, we also present an adaptive variant of SOBA that sets the exploration rate γ_t dynamically, and achieves a regret bound within a constant factor of that obtained with the optimal tuning of γ.

We prove Theorem 2 in the next subsection, while in Subsection 4.2 we prove a mistake bound with respect to the hinge loss, which is not fully covered by Theorem 2.

4.1. Proof of Theorem 2

Throughout the proofs, U, W_t, g_t, and z_t should be thought of as Kd × 1 vectors. We first show the following lemma. Note that this is a statement over any sequence, and no expectation is taken.

Lemma 2. For any U ∈ R^{Kd}, with the notation of Algorithm 1, we have

Σ_{t=1}^T n_t (−2⟨U, g_t⟩ − ⟨U, z_t⟩²) ≤ a‖U‖_F² + Σ_{t=1}^T n_t g_tᵀ A_t^{−1} g_t.

² Throughout the paper, expectations are taken with respect to the randomization of the algorithm.
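As an aside before the proof, the conditional-expectation identities stated above for g_t and z_t are easy to verify numerically. The following snippet (our own check, not part of the paper) fixes one round and averages over the randomization of ỹ_t:

```python
import numpy as np

rng = np.random.default_rng(1)
K, d, gamma = 4, 3, 0.2
x = rng.normal(size=d)
U = rng.normal(size=K * d)
y, ybar, yhat = 2, 0, 2                      # one fixed round, with ybar != y
p = np.full(K, gamma / K); p[yhat] += 1 - gamma

e = np.zeros(K); e[ybar], e[y] = 1.0, -1.0
target = np.kron(e, x)                       # (e_ybar - e_y) tensor x

g_avg, z2_avg, n = 0.0, 0.0, 200000
for _ in range(n):
    if rng.choice(K, p=p) == y:              # indicator 1[y_t = ytilde_t]
        g_avg += (U @ target) / p[y] / n     # <U, g_t> on update rounds
        z2_avg += (U @ target) ** 2 / p[y] / n   # <U, z_t>^2 = p_y <U, g_t>^2

print(g_avg, U @ target)                     # both approx <U, (e_ybar - e_y) tensor x>
print(z2_avg, (U @ target) ** 2)             # both approx the squared inner product
```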

Proof. First, from line 14 of Algorithm 1, it can be seen (by induction) that SOBA maintains the invariant

Σ_{s=1}^t n_s m_s ≥ 0.    (4)

We next reduce the proof to the regret analysis of an online least squares problem. For iterations where n_t = 1, define α_t = 1/√(p_{t,y_t}), so that g_t = α_t z_t. From the algorithm, A_t = aI + Σ_{s=1}^t n_s z_s z_sᵀ, and W_t is the ridge regression solution based on the data collected in times 1 to t−1, i.e. W_t = −A_{t−1}^{−1}(Σ_{s=1}^{t−1} n_s g_s) = −A_{t−1}^{−1}(Σ_{s=1}^{t−1} n_s α_s z_s). By the per-step analysis of online least squares (see, e.g., Orabona et al. (2012); see Lemma 6 for a proof), we have that if an update is made at iteration t, i.e. n_t = 1, then

(1/2)(⟨W_t, z_t⟩ + α_t)² (1 − z_tᵀ A_t^{−1} z_t) − (1/2)(⟨U, z_t⟩ + α_t)² ≤ (1/2)‖U − W_t‖²_{A_{t−1}} − (1/2)‖U − W_{t+1}‖²_{A_t}.

Otherwise n_t = 0, in which case we have W_{t+1} = W_t and A_t = A_{t−1}. Denoting β_t = 1 − z_tᵀ A_t^{−1} z_t, by the Sherman–Morrison formula, β_t = 1/(1 + z_tᵀ A_{t−1}^{−1} z_t). Summing over all rounds t such that n_t = 1,

Σ_{t=1}^T n_t [(⟨W_t, z_t⟩ + α_t)² β_t − (⟨U, z_t⟩ + α_t)²] ≤ ‖U‖²_{A_0} − ‖U − W_{T+1}‖²_{A_T} ≤ a‖U‖_F².

We also have, by the definition of m_t,

(⟨W_t, z_t⟩ + α_t)² β_t − (⟨U, z_t⟩ + α_t)² = m_t − 2⟨U, g_t⟩ − ⟨U, z_t⟩² − α_t² z_tᵀ A_t^{−1} z_t.

Putting it all together, noting that α_t² z_tᵀ A_t^{−1} z_t = g_tᵀ A_t^{−1} g_t, and using the fact that Σ_t n_t m_t ≥ 0, we have the stated bound.

We can now prove the following mistake bound for the prediction ŷ_t, whose number of mistakes is defined as M̂ := Σ_{t=1}^T 1[ŷ_t ≠ y_t].

Theorem 3. For any U ∈ R^{K×d} and any 0 < η ≤ min{1, 1/(2 max_i ‖u_i‖ X + 1)}, the expected number of mistakes committed by ŷ_t can be bounded as

E[M̂] ≤ L_η(U) + η a ‖U‖_F² + (K/(ηγ)) E[Σ_{t=1}^T n_t z_tᵀ A_t^{−1} z_t]
     ≤ L_η(U) + η a ‖U‖_F² + (dK²/(ηγ)) ln(1 + 2TX²/(adK)),

where L_η(U) := Σ_{t=1}^T ℓ_η(U, (x_t, y_t)) is the cumulative η-loss of the linear predictor U.

Proof. Using Lemma 2 with U scaled by c := 2η/(1+η), we get

Σ_{t=1}^T n_t (−2c⟨U, g_t⟩ − c²⟨U, z_t⟩²) ≤ a c² ‖U‖_F² + Σ_{t=1}^T n_t g_tᵀ A_t^{−1} g_t.

Taking expectations, using Lemma 1 and the facts that p_{t,y_t} ≥ γ/K and that A_t is positive definite (so that g_tᵀ A_t^{−1} g_t = α_t² z_tᵀ A_t^{−1} z_t ≤ (K/γ) z_tᵀ A_t^{−1} z_t), we have

E[Σ_t h_t (2c⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩ − c²⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩²)] ≤ a c² ‖U‖_F² + (K/γ) E[Σ_t n_t z_tᵀ A_t^{−1} z_t].

With this choice of c, 2cz − c²z² = (4η/(1+η)²)((1+η)z − ηz²). Add the terms (4η/(1+η)²) E[Σ_t h_t] to both sides and divide both sides by 4η/(1+η)², to have

E[Σ_t h_t] ≤ E[Σ_t h_t f_η(⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩)] + η a ‖U‖_F² + ((1+η)²/(4η)) (K/γ) E[Σ_t n_t z_tᵀ A_t^{−1} z_t],

where f_η(z) := 1 − (1+η)z + ηz² and (1+η)²/4 ≤ 1 for η ≤ 1. Taking a closer look at the function f_η, we observe that the two roots of this quadratic function are 1 and 1/η, respectively. For η ≤ 1, the function is negative in (1, 1/η) and positive in (−∞, 1). Additionally, if 0 < η ≤ 1/(2 max_i ‖u_i‖ X + 1), then for all i, j ∈ [K], ⟨U, (e_i − e_j) ⊗ x_t⟩ ≤ 2 max_i ‖u_i‖ X ≤ 1/η. Therefore, we have that

f_η(⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩) = f_η((Ux_t)_{y_t} − (Ux_t)_{ȳ_t})
  ≤ ℓ_η((Ux_t)_{y_t} − (Ux_t)_{ȳ_t})
  ≤ ℓ_η((Ux_t)_{y_t} − max_{r≠y_t} (Ux_t)_r)    (5)
  = ℓ_η(U, (x_t, y_t)),

where the first equality is from algebra, the first inequality is from f_η(·) ≤ ℓ_η(·) on (−∞, 1/η], and the second inequality is from ℓ_η(·) being monotonically decreasing. Putting together the two constraints on η, and noting that M̂ ≤ Σ_t h_t, we have the first bound. The second statement follows from Lemma 3 below.
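As an implementation note, the factor β_t = 1/(1 + z_tᵀ A_{t−1}^{−1} z_t) appearing above is exactly the Sherman–Morrison correction, which is also the standard way to maintain A_t^{−1} in O((dK)²) time per update instead of recomputing an inverse. A minimal sketch (our illustration, with a random test vector):

```python
import numpy as np

def sherman_morrison(A_inv, z):
    # Given A^{-1}, return (A + z z^T)^{-1} via a rank-one correction.
    Az = A_inv @ z
    return A_inv - np.outer(Az, Az) / (1.0 + z @ Az)

# quick check against a direct inverse
rng = np.random.default_rng(0)
n = 6
A = 2.0 * np.eye(n)
z = rng.normal(size=n)
assert np.allclose(sherman_morrison(np.linalg.inv(A), z),
                   np.linalg.inv(A + np.outer(z, z)))
```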

Lemma 3. If dK ≥ 2 and T ≥ 2, then

E[Σ_{t=1}^T n_t z_tᵀ A_t^{−1} z_t] ≤ dK ln(1 + 2TX²/(adK)).

Specifically, if a = X², the right-hand side is at most 2dK ln T.

Proof. Observe that

Σ_{t=1}^T n_t z_tᵀ A_t^{−1} z_t ≤ ln(|A_T| / |A_0|) ≤ dK ln(1 + (2X²/(adK)) Σ_{t=1}^T 1[ỹ_t = y_t]/p_{t,y_t}),

where the first inequality is a well-known fact from linear algebra (e.g. Hazan et al., 2007, Lemma 11). Given that A_T is dK × dK, the second inequality comes from the fact that |A_T| is maximized when all its eigenvalues are equal to tr(A_T)/(dK) ≤ a + Σ_t n_t ‖z_t‖²/(dK) ≤ a + (2X²/(dK)) Σ_t 1[ỹ_t = y_t]/p_{t,y_t}. Finally, using Jensen's inequality and E_{t−1}[1[ỹ_t = y_t]/p_{t,y_t}] = 1, we have

E[Σ_{t=1}^T n_t z_tᵀ A_t^{−1} z_t] ≤ dK ln(1 + 2TX²/(adK)).

If a = X², then the right-hand side is dK ln(1 + 2T/(dK)), which is at most 2dK ln T under the conditions on d, K, T.

Proof of Theorem 2. Observe that, by the triangle inequality, 1[ỹ_t ≠ y_t] ≤ 1[ỹ_t ≠ ŷ_t] + 1[y_t ≠ ŷ_t]. Summing over t and taking expectations on both sides, we conclude that

E[M] ≤ E[M̂] + γT.    (6)

The first statement follows from combining the above inequality with Theorem 3. For the second statement, first note that from Theorem 3 and Equation (6) we have

E[M] ≤ L_η(U) + η a ‖U‖_F² + (dK²/(ηγ)) ln(1 + 2TX²/(adK)) + γT
     ≤ L_η(U) + η X² ‖U‖_F² + (2dK²/(ηγ)) ln T + γT,

where the second inequality is from η ≤ 1 and Lemma 3 with a = X². The statement is concluded by the setting of γ = O(√(dK² ln T / T)).

Proof of Lemma 1. We show the lemma in two steps. Let G_t := q_t 1[y_t = ỹ_t = ŷ_t] and H_t := 1[y_t = ỹ_t ≠ ŷ_t]. First, we show that n_t = G_t + H_t. Recall that SOBA maintains the invariant (4), hence Σ_{s=1}^{t−1} n_s m_s ≥ 0. From line 14 of SOBA, we see that n_t = 1 only if ỹ_t = y_t. Now consider two cases:

• y_t = ỹ_t ≠ ŷ_t. In this case, ȳ_t = ŷ_t, therefore ⟨W_t, g_t⟩ ≥ 0, making m_t ≥ 0. This implies that Σ_{s=1}^{t−1} n_s m_s + m_t ≥ 0, guaranteeing n_t = 1.
• y_t = ỹ_t = ŷ_t. In this case, n_t is set to 1 if and only if q_t = 1, i.e. Σ_{s=1}^{t−1} n_s m_s + m_t ≥ 0.

This gives that n_t = G_t + H_t. Second, we have the following two equalities:

E_{t−1}[H_t ⟨U, g_t⟩] = E_{t−1}[(1[ỹ_t = y_t]/p_{t,y_t}) 1[ŷ_t ≠ y_t] ⟨U, (e_{ȳ_t} − e_{y_t}) ⊗ x_t⟩] = −1[ŷ_t ≠ y_t] ⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩,
E_{t−1}[G_t ⟨U, g_t⟩] = E_{t−1}[(1[ỹ_t = y_t]/p_{t,y_t}) 1[ŷ_t = y_t] q_t ⟨U, (e_{ȳ_t} − e_{y_t}) ⊗ x_t⟩] = −1[ŷ_t = y_t] q_t ⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩.

The first statement follows from adding up the two equalities above. The proof of the second statement is identical, except for replacing ⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩ with ⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩².

4.2. Fall-Back Analysis

The loss function ℓ_η is an interpolation between the hinge and the squared hinge losses. Yet, the bound becomes vacuous for η → 0. Hence, in this section we show that SOBA also guarantees an Õ((L_0(U) T)^{1/3} + √T) mistake bound with respect to L_0(U), the multiclass hinge loss of the competitor, assuming L_0(U) is known. Thus the algorithm achieves a mistake guarantee no worse than the sharpest bound implicit in Kakade et al. (2008).

Theorem 4. Set a = X² and denote by M the number of mistakes made by SOBA. Then SOBA has the following guarantees:³

1. If L_0(U) ≥ (‖U‖_F + 1)² √(dK²X²T ln T), then with the parameter setting γ = min{1, (dK²X² L_0(U) ln T / T²)^{1/3}}, one has the following expected mistake bound:

E[M] ≤ L_0(U) + O((‖U‖_F² dK²X² L_0(U) T ln T)^{1/3}).

³ Assuming the knowledge of ‖U‖_F, it would be possible to reduce the dependency on ‖U‖_F in both bounds. However, such an assumption is extremely unrealistic, and we prefer not to pursue it.

2. If L_0(U) < (‖U‖_F + 1)² √(dK²X²T ln T), then with the parameter setting γ = min{1, (dK²X² ln T / T)^{1/2}}, one has the following expected mistake bound:

E[M] ≤ L_0(U) + O((‖U‖_F + 1)² X √(dK²T ln T)),

where L_0(U) := Σ_{t=1}^T ℓ_0(U, (x_t, y_t)) is the hinge loss of the linear classifier U.

Proof. Recall that M̂ is the number of mistakes made by ŷ_t, that is, Σ_{t=1}^T 1[ŷ_t ≠ y_t]. Starting from the expectation inequality preceding (5), adding the quadratic term to both sides, dividing both sides by the leading constant, and plugging in a = X², we get that for all η > 0,

E[Σ_t h_t] ≤ E[Σ_t h_t (1 − ⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩) + η Σ_t h_t ⟨U, (e_{ȳ_t} − e_{y_t}) ⊗ x_t⟩²] + η X² ‖U‖_F² + (dK²/(ηγ)) ln T
  ≤ E[Σ_t ℓ_0(U, (x_t, y_t)) + 2η (Σ_t h_t + 1) ‖U‖_F² X²] + (dK²/(ηγ)) ln T,

where the second inequality uses Cauchy–Schwarz, ⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩ ≤ ‖U‖_F ‖(e_{y_t} − e_{ȳ_t}) ⊗ x_t‖ ≤ √2 ‖U‖_F X, and the fact that 1 − ⟨U, (e_{y_t} − e_{ȳ_t}) ⊗ x_t⟩ ≤ ℓ(U, (x_t, y_t)). Taking

η = √((dK²/γ) ln T) / (‖U‖_F X √(2(E[Σ_t h_t] + 1))),

we have

E[Σ_t h_t] ≤ L_0(U) + 2‖U‖_F X √(2 (E[Σ_t h_t] + 1) (dK²/γ) ln T)
  ≤ L_0(U) + 2‖U‖_F X (√(E[Σ_t h_t]) + 1) √(2 (dK²/γ) ln T),

where the last inequality is due to the elementary inequality √(c + d) ≤ √c + √d, and the setting of a = X². Solving the inequality in √(E[Σ_t h_t]) and using the fact that E[M] ≤ E[M̂] + γT ≤ E[Σ_t h_t] + γT, we have

E[M] ≤ L_0(U) + γT + O((dK²‖U‖_F²X²/γ) ln T + √(L_0(U) (dK²‖U‖_F²X²/γ) ln T)).

The theorem follows from Lemma 5 in Appendix B, taking U = ‖U‖_F, H = dK²X² ln T, L = L_0(U).

5. Empirical Results

We ran experiments with SOBA to empirically validate the theoretical findings. We used three different datasets from Kakade et al. (2008): SynSep, SynNonSep, and Reuters4. The first two are synthetic, with 10^6 samples in R^{400} and 9 classes. SynSep is constructed to be linearly separable, while SynNonSep is the same dataset with 5% random label noise. Reuters4 is generated from the RCV1 dataset (Lewis et al., 2004), extracting the 665,265 examples that have exactly one label from the set {CCAT, ECAT, GCAT, MCAT}. It contains 47,236 features. We also report the performance on Covtype from the LibSVM repository.⁴ We report averages over 10 different runs.

SOBA, like the Newtron algorithm, has a quadratic complexity in the dimension of the data, while the Banditron and the Perceptron algorithms are linear. Following the long tradition of similar algorithms (Crammer et al., 2009; Duchi et al., 2011; Hazan & Kale, 2011; Crammer & Gentile, 2013), to be able to run the algorithm on large datasets we have implemented an approximate diagonal version of SOBA, named SOBAdiag. It keeps in memory just the diagonal of the matrix A_t; a sketch of this variant is given below. Following Hazan & Kale (2011), we have tested only algorithms designed to work in the fully adversarial setting. Hence, we tested the Banditron and the PNewtron, the diagonal version of the Newtron algorithm in Hazan & Kale (2011). The multiclass Perceptron algorithm was used as a full-information baseline.

In the experiments, we only changed the exploration rate γ, leaving all the other parameters the algorithms might have fixed. In particular, for the PNewtron we set α = 10, β = 0.01, and D = 1, as in Hazan & Kale (2011). In SOBA, a is fixed to 1 in all the experiments. We explore the effect of the exploration rate γ in the first row of Figure 2. We see that the PNewtron algorithm,⁵ thanks to the exploration based on the softmax prediction, can achieve very good performance for a wide range of γ. It is important to note that SOBAdiag has good performance on all four datasets for a value of γ close to 1%. For bigger values, the performance degrades because the best possible error rate is lower bounded by (1 − 1/K)γ due to exploration.

⁴ https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
⁵ We were unable to make the PNewtron work on Reuters4: for any setting of γ, the error rate is never better than 57%. The reason might be that our version of the dataset RCV1 has 47,236 features, while the one reported in Kakade et al. (2008); Hazan & Kale (2011) has 346,810, hence the optimal setting of the 3 other parameters of PNewtron might be different. For this reason we prefer not to report the performance of PNewtron on Reuters4.
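The diagonal variant is a one-line change to the sketch of Algorithm 1 given earlier: the matrix A_t is replaced by its diagonal, so the inverse becomes an elementwise division (this is our reading of SOBAdiag; the paper only specifies that the diagonal of A_t is kept in memory):

```python
import numpy as np

def soba_diag_update(A_diag, theta, W, msum, x, p, y, K, d):
    # One SOBAdiag update on a round with ytilde = y (mirrors lines 10-20
    # of Algorithm 1, with A_t approximated by its diagonal).
    scores = W @ x
    scores[y] = -np.inf
    ybar = int(np.argmax(scores))
    e = np.zeros(K); e[ybar], e[y] = 1.0, -1.0
    g = np.kron(e, x) / p[y]
    z = np.sqrt(p[y]) * g
    zAz = np.sum(z * z / A_diag)            # z^T A^{-1} z for a diagonal A
    w = W.ravel()
    m = ((w @ z) ** 2 + 2.0 * (w @ g)) / (1.0 + zAz)
    if msum + m >= 0:                       # update indicator n_t = 1
        msum += m
        A_diag = A_diag + z * z             # keep only the diagonal of z z^T
        theta = theta - g
        W = (theta / A_diag).reshape(K, d)
    return A_diag, theta, W, msum
```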

Figure 2. Error rates vs. the value of the exploration rate γ (top row) and vs. the number of examples (bottom row), on SynSep, SynNonSep, Covtype, and Reuters4, for the Banditron, the PNewtron, SOBAdiag, and the full-information Perceptron; the bottom-row legends report the optimal γ used for each algorithm. The x-axis is logarithmic in all the plots, while the y-axis is logarithmic in the plots in the second row. Figure best viewed in colors.

For smaller values of exploration, the performance degrades because the algorithm does not update enough. In fact, SOBA updates only when ỹ_t = y_t, so when γ is too small the algorithm does not explore enough and remains stuck around the initial solution. Also, SOBA requires an initial number of updates to accumulate a large enough sum Σ_t n_t m_t in order to start updating also when ŷ_t is correct but the margin is too small.

The optimal setting of γ for each algorithm was then used to generate the plots in the second row of Figure 2, where we report the error rate over time. With the respective optimal settings of γ, we note that the performance of PNewtron does not seem better than that of the Multiclass Perceptron algorithm, and is on par with or worse than the Banditron's. On the other hand, SOBAdiag has the best performance among the bandit algorithms on 3 datasets out of 4.

The first dataset, SynSep, is separable, and with their optimal settings of γ all the algorithms converge at a rate of roughly O(1/t), as can be seen from the log-log plot; the bandit algorithms, however, will not converge to zero error rate, but to (1 − 1/K)γ. SOBA also has an initial phase in which the error rate is high, due to the effect mentioned above. On the second dataset, SynNonSep, SOBAdiag outperforms all the other algorithms (including the full-information Perceptron), achieving an error rate close to the noise level of 5%. This is due to SOBA being a second-order algorithm, while the Perceptron is a first-order algorithm. A similar situation is observed on Covtype. On the last dataset, Reuters4, SOBAdiag achieves performance better than the Banditron.

6. Discussion and Future Work

In this paper, we study the problem of online multiclass learning with bandit feedback. We propose SOBA, an algorithm that achieves a regret of Õ((1/η)√T) with respect to the η-loss of the competitor. This answers a COLT open problem posed by Abernethy & Rakhlin (2009). Its key ideas are to apply a novel adaptive regularizer in a second-order online learning algorithm, coupled with updates only when the predictions are correct. SOBA is shown to have competitive performance compared to its precedents on synthetic and real datasets, in some cases even better than the full-information Perceptron algorithm.

There are several open questions we wish to explore:

1. Is it possible to design efficient algorithms with mistake bounds that depend on the loss of the competitor, i.e. E[M] ≤ L_η(U) + Õ(√(dK L_η(U)) + dK)? This type of bound occurs naturally in the full information multiclass online learning setting (see e.g. Theorem 1), or in the multiarmed bandit setting, e.g., Neu (2015).

2. Are there efficient algorithms that have a finite mistake bound in the separable case? Kakade et al. (2008) provide an algorithm that performs enumeration and plurality vote to achieve a finite mistake bound in the finite dimensional setting, but unfortunately the algorithm is impractical.
Notice that it is easy to show that in SOBA, ŷ_t makes a logarithmic number of mistakes in the separable case with a constant rate of exploration; yet it is not clear how to decrease the exploration over time in order to get a logarithmic number of mistakes for ỹ_t as well.

Acknowledgments. We thank Claudio Gentile for suggesting the original plan of attack for this problem, and thank the anonymous reviewers for their thoughtful comments.

References

Abernethy, J. and Rakhlin, A. An efficient bandit algorithm for √T-regret in online multiclass prediction? In COLT, 2009.

Agarwal, A., Hsu, D., Kale, S., Langford, J., Li, L., and Schapire, R. E. Taming the monster: a fast and simple algorithm for contextual bandits. In ICML, 2014.

Auer, P., Cesa-Bianchi, N., and Gentile, C. Adaptive and self-confident on-line learning algorithms. Journal of Computer and System Sciences, 64(1):48–75, 2002.

Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. E. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, January 2003.

Bubeck, S. and Cesa-Bianchi, N. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.

Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, and Games. Cambridge University Press, 2006.

Cesa-Bianchi, N., Conconi, A., and Gentile, C. A second-order Perceptron algorithm. SIAM Journal on Computing, 34(3):640–668, 2005.

Crammer, K. and Gentile, C. Multiclass classification with bandit feedback using adaptive regularization. Machine Learning, 90(3):347–383, 2013.

Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., and Singer, Y. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7(Mar):551–585, 2006.

Crammer, K., Kulesza, A., and Dredze, M. Adaptive regularization of weight vectors. In NIPS, 2009.

Duchi, J., Hazan, E., and Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.

Duda, R. O. and Hart, P. E. Pattern Classification and Scene Analysis. John Wiley, 1973.

Dudík, M., Hsu, D., Kale, S., Karampatziakis, N., Langford, J., Reyzin, L., and Zhang, T. Efficient optimal learning for contextual bandits. In UAI, 2011.

Hazan, E. and Kale, S. Newtron: an efficient bandit algorithm for online multiclass prediction. In NIPS, 2011.

Hazan, E., Agarwal, A., and Kale, S. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192, 2007.

Kakade, S. M., Shalev-Shwartz, S., and Tewari, A. Efficient bandit algorithms for online multiclass prediction. In ICML, pp. 440–447. ACM, 2008.

Kivinen, J. and Warmuth, M. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–63, January 1997.

Langford, J. and Zhang, T. The epoch-greedy algorithm for multi-armed bandits with side information. In NIPS, 2008.

Lewis, D. D., Yang, Y., Rose, T. G., and Li, F. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5(Apr):361–397, 2004.

McMahan, H. B. and Streeter, M. Adaptive bound optimization for online convex optimization. In COLT, 2010.

Mohri, M. and Rostamizadeh, A. Perceptron mistake bounds. arXiv preprint, 2013.

Neu, G. First-order regret bounds for combinatorial semi-bandits. In COLT, 2015.

Orabona, F., Cesa-Bianchi, N., and Gentile, C. Beyond logarithmic bounds in online learning. In AISTATS, 2012.

Orabona, F., Crammer, K., and Cesa-Bianchi, N. A generalized online mirror descent with applications to classification and regression. Machine Learning, 99(3):411–435, 2015.

Rakhlin, A. and Sridharan, K. BISTRO: An efficient relaxation-based method for contextual bandits. In ICML, 2016.

Rosenblatt, F. The Perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958.

Shalev-Shwartz, S. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2011.

Syrgkanis, V., Krishnamurthy, A., and Schapire, R. E. Efficient algorithms for adversarial contextual learning. In ICML, 2016a.

Syrgkanis, V., Luo, H., Krishnamurthy, A., and Schapire, R. E. Improved regret bounds for oracle-based adversarial contextual bandits. In NIPS, 2016b.

Valizadegan, H., Jin, R., and Wang, S. Learning to trade off between exploration and exploitation in multiclass bandit prediction. In KDD. ACM, 2011.

Wang, S., Jin, R., and Valizadegan, H. A potential-based framework for online multi-class learning with partial feedback. In AISTATS, 2010.


More information

Minimax strategy for prediction with expert advice under stochastic assumptions

Minimax strategy for prediction with expert advice under stochastic assumptions Minimax strategy for prediction ith expert advice under stochastic assumptions Wojciech Kotłosi Poznań University of Technology, Poland otlosi@cs.put.poznan.pl Abstract We consider the setting of prediction

More information

Discover Relevant Sources : A Multi-Armed Bandit Approach

Discover Relevant Sources : A Multi-Armed Bandit Approach Discover Relevant Sources : A Multi-Armed Bandit Approach 1 Onur Atan, Mihaela van der Schaar Department of Electrical Engineering, University of California Los Angeles Email: oatan@ucla.edu, mihaela@ee.ucla.edu

More information

Minimax Fixed-Design Linear Regression

Minimax Fixed-Design Linear Regression JMLR: Workshop and Conference Proceedings vol 40:1 14, 2015 Mini Fixed-Design Linear Regression Peter L. Bartlett University of California at Berkeley and Queensland University of Technology Wouter M.

More information

Ad Placement Strategies

Ad Placement Strategies Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad

More information

Machine Learning in the Data Revolution Era

Machine Learning in the Data Revolution Era Machine Learning in the Data Revolution Era Shai Shalev-Shwartz School of Computer Science and Engineering The Hebrew University of Jerusalem Machine Learning Seminar Series, Google & University of Waterloo,

More information

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem Lecture 9: UCB Algorithm and Adversarial Bandit Problem EECS598: Prediction and Learning: It s Only a Game Fall 03 Lecture 9: UCB Algorithm and Adversarial Bandit Problem Prof. Jacob Abernethy Scribe:

More information

Lecture 10 : Contextual Bandits

Lecture 10 : Contextual Bandits CMSC 858G: Bandits, Experts and Games 11/07/16 Lecture 10 : Contextual Bandits Instructor: Alex Slivkins Scribed by: Guowei Sun and Cheng Jie 1 Problem statement and examples In this lecture, we will be

More information

Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization

Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization JMLR: Workshop and Conference Proceedings vol (2010) 1 16 24th Annual Conference on Learning heory Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization

More information

Online Passive-Aggressive Active Learning

Online Passive-Aggressive Active Learning Noname manuscript No. (will be inserted by the editor) Online Passive-Aggressive Active Learning Jing Lu Peilin Zhao Steven C.H. Hoi Received: date / Accepted: date Abstract We investigate online active

More information

Online Optimization in Dynamic Environments: Improved Regret Rates for Strongly Convex Problems

Online Optimization in Dynamic Environments: Improved Regret Rates for Strongly Convex Problems 216 IEEE 55th Conference on Decision and Control (CDC) ARIA Resort & Casino December 12-14, 216, Las Vegas, USA Online Optimization in Dynamic Environments: Improved Regret Rates for Strongly Convex Problems

More information

arxiv: v1 [cs.lg] 1 Jun 2016

arxiv: v1 [cs.lg] 1 Jun 2016 Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits arxiv:1606.00313v1 cs.g 1 Jun 2016 Vasilis Syrgkanis Microsoft Research Haipeng uo Princeton University Robert E. Schapire Microsoft

More information

The FTRL Algorithm with Strongly Convex Regularizers

The FTRL Algorithm with Strongly Convex Regularizers CSE599s, Spring 202, Online Learning Lecture 8-04/9/202 The FTRL Algorithm with Strongly Convex Regularizers Lecturer: Brandan McMahan Scribe: Tamara Bonaci Introduction In the last lecture, we talked

More information

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization Shai Shalev-Shwartz and Tong Zhang School of CS and Engineering, The Hebrew University of Jerusalem Optimization for Machine

More information

Bandit Online Learning on Graphs via Adaptive Optimization

Bandit Online Learning on Graphs via Adaptive Optimization Bandit Online Learning on Graphs via Adaptive Optimization Peng Yang, Peilin Zhao 2,3 and Xin Gao King Abdullah University of Science and Technology, Saudi Arabia 2 South China University of Technology,

More information

arxiv: v4 [math.oc] 5 Jan 2016

arxiv: v4 [math.oc] 5 Jan 2016 Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The

More information

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning.

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning. Tutorial: PART 1 Online Convex Optimization, A Game- Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research

More information

Regret bounded by gradual variation for online convex optimization

Regret bounded by gradual variation for online convex optimization Noname manuscript No. will be inserted by the editor Regret bounded by gradual variation for online convex optimization Tianbao Yang Mehrdad Mahdavi Rong Jin Shenghuo Zhu Received: date / Accepted: date

More information

Adaptive Sampling Under Low Noise Conditions 1

Adaptive Sampling Under Low Noise Conditions 1 Manuscrit auteur, publié dans "41èmes Journées de Statistique, SFdS, Bordeaux (2009)" Adaptive Sampling Under Low Noise Conditions 1 Nicolò Cesa-Bianchi Dipartimento di Scienze dell Informazione Università

More information

CS229 Supplemental Lecture notes

CS229 Supplemental Lecture notes CS229 Supplemental Lecture notes John Duchi 1 Boosting We have seen so far how to solve classification (and other) problems when we have a data representation already chosen. We now talk about a procedure,

More information

Online Learning and Sequential Decision Making

Online Learning and Sequential Decision Making Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning

More information

Online (and Distributed) Learning with Information Constraints. Ohad Shamir

Online (and Distributed) Learning with Information Constraints. Ohad Shamir Online (and Distributed) Learning with Information Constraints Ohad Shamir Weizmann Institute of Science Online Algorithms and Learning Workshop Leiden, November 2014 Ohad Shamir Learning with Information

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 08): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee7c@berkeley.edu October

More information