Efficient Minimax Strategies for Square Loss Games


Wouter M. Koolen (Queensland University of Technology and UC Berkeley), Alan Malek (University of California, Berkeley), Peter L. Bartlett (University of California, Berkeley, and Queensland University of Technology)

Abstract

We consider online prediction problems where the loss between the prediction and the outcome is measured by the squared Euclidean distance and its generalization, the squared Mahalanobis distance. We derive the minimax solutions for the case where the prediction and action spaces are the simplex (this setup is sometimes called the Brier game) and the ℓ₂ ball (this setup is related to Gaussian density estimation). We show that in both cases the value of each sub-game is a quadratic function of a simple statistic of the state, with coefficients that can be efficiently computed using an explicit recurrence relation. The resulting deterministic minimax strategy and randomized maximin strategy are linear functions of the statistic.

1 Introduction

We are interested in general strategies for sequential prediction and decision making (a.k.a. online learning) that improve their performance with experience. Since the early days of online learning, such learning tasks have been formalized as regret games. The learner interacts with an adversarial environment with the goal of performing almost as well as the best strategy from some fixed reference set. In many cases we have efficient algorithms with an upper bound on the regret that meets the game-theoretic lower bound up to a small constant factor. In a few special cases we have the exact minimax strategy, meaning that we understand the learning problem at every level of detail. In even fewer cases can we also efficiently execute the minimax strategy. These cases serve as exemplars to guide our thinking about learning algorithms. In this paper we add two interesting examples to the canon of efficiently computable minimax strategies.
Our setup, as described in Figure 1, is as follows. The Learner and the Adversary play vectors a ∈ A and x ∈ X, upon which the Learner is penalized by the squared Euclidean distance ‖a − x‖² or its generalization, the squared Mahalanobis distance ‖a − x‖²_W = (a − x)ᵀ W (a − x), parametrized by a symmetric matrix W ≻ 0. After a sequence of T such interactions, we compare the loss of the Learner to the loss of the best fixed prediction a ∈ A. In all our examples this best fixed action in hindsight is the mean outcome a* = (1/T) Σ_{t=1}^T x_t, regardless of W. We use regret, the difference between the loss of the learner and the loss of a*, to evaluate performance. The minimax regret for the T-round game, also known as the value of the game, is given by

V := min_{a_1} max_{x_1} · · · min_{a_T} max_{x_T} [ Σ_{t=1}^T ‖a_t − x_t‖²_W − min_{a ∈ A} Σ_{t=1}^T ‖a − x_t‖²_W ],  (1)

where the a_t range over actions A and the x_t range over outcomes X.

The minimax strategy chooses each a_t, given all past outcomes x_1, ..., x_{t−1}, to achieve this regret. Intuitively, the minimax regret is the regret if both players play optimally while assuming the other player is doing the same.

Our first example is the Brier game, where the action and outcome spaces are the probability simplex on K outcomes. The Brier game is traditionally popular in meteorology [Bri50]. Our second example is the ball game, where the action and outcome spaces are the Euclidean norm ball, i.e. A = X = {x ∈ R^K : ‖x‖ ≤ 1}. Even though we measure loss by the squared Mahalanobis distance, we play on the standard Euclidean norm ball. The ball game is related to Gaussian density estimation [TW00]. In each case we exhibit a strategy that can play a T-round game in O(TK²) time. The algorithm spends O(TK + K³) time pre-processing the game, and then plays in O(K²) time per round.

Figure 1 (Protocol): Given T, W, A, X. For t = 1, 2, ..., T: the Learner chooses a prediction a_t ∈ A; the Adversary chooses an outcome x_t ∈ X; the Learner incurs loss ‖a_t − x_t‖²_W.

Outline

We define our loss using the squared Mahalanobis distance ‖a − x‖²_W = (a − x)ᵀ W (a − x), parametrized by a symmetric matrix W ≻ 0. We recover the squared Euclidean distance by choosing W = I. Our games always last T rounds. For some observed data x_1, ..., x_n, the value-to-go for the remaining T − n rounds is

V(x_1, ..., x_n) := min_{a_{n+1}} max_{x_{n+1}} · · · min_{a_T} max_{x_T} [ Σ_{t=n+1}^T ‖a_t − x_t‖²_W − min_a Σ_{t=1}^T ‖a − x_t‖²_W ].

By definition the minimax regret is V = V(ε), where ε is the empty sequence, and the value-to-go satisfies the recurrence

V(x_1, ..., x_n) = − min_a Σ_{t=1}^T ‖a − x_t‖²_W  if n = T,
V(x_1, ..., x_n) = min_{a_{n+1}} max_{x_{n+1}} [ ‖a_{n+1} − x_{n+1}‖²_W + V(x_1, ..., x_{n+1}) ]  if n < T.  (2)

Our analysis of the two games proceeds in a similar manner. For a past history of plays x_1, ..., x_n of length n, we summarize the state by s = Σ_{t=1}^n x_t and σ = Σ_{t=1}^n x_tᵀ W x_t. As we will see, the value-to-go after n of T rounds can be written as V(s, σ, n); i.e. it depends on the past plays only through s and σ.
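The recurrence above can be checked numerically on a tiny instance. The sketch below (an illustration of ours, not the paper's algorithm) brute-forces the value of the K = 2, W = I simplex game with T = 2 by discretizing the Learner's action and letting the Adversary play corners of the simplex (the inner maximum of a convex function over the simplex is attained at a vertex):

```python
import numpy as np

K, T = 2, 2
corners = [np.eye(K)[k] for k in range(K)]                     # vertices of the simplex
grid = [np.array([t, 1 - t]) for t in np.linspace(0, 1, 101)]  # discretized learner actions

def loss(a, x):
    return float((a - x) @ (a - x))            # squared Euclidean loss (W = I)

def best_loss_in_hindsight(xs):
    a = np.mean(xs, axis=0)                    # best fixed action: the mean outcome
    return sum(loss(a, x) for x in xs)

def value_to_go(xs):
    """Backward induction: value of the sub-game after outcomes xs."""
    if len(xs) == T:
        return -best_loss_in_hindsight(xs)
    return min(max(loss(a, x) + value_to_go(xs + [x]) for x in corners)
               for a in grid)

V = value_to_go([])
print(V)  # ≈ 0.625 for this instance
```

The discretization happens to contain the exact optimizers here, so the printed value matches the analytic game value for this instance; in general a finer grid only gives an upper approximation of the Learner's minimum.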
More surprisingly, for each n the value-to-go V(s, σ, n) is a quadratic function of s and a linear function of σ, under certain conditions on W. While it is straightforward to see that the terminal value V(s, σ, T) is quadratic in the state (this is easily checked by computing the loss of the best expert and using the first case of Equation (2)), it is not at all obvious that propagating from V(s + x, σ + xᵀWx, n + 1) to V(s, σ, n), using the second case of (2), preserves this structure. This compact representation of the value function is an essential ingredient for a computationally feasible algorithm. Many minimax approaches, such as normalized maximum likelihood [Sht87], have computational complexities that scale exponentially with the time horizon. We derive a strategy that can play in constant amortized time.

Why is this interesting? We go beyond previous work in a few directions. First, we exhibit two new games that belong to the tiny class admitting computationally feasible minimax algorithms. Second, we consider the setting with squared Mahalanobis loss, which allows the user intricate control over the penalization of different prediction errors. Our results clearly show how the learner should exploit this prioritization.

1.1 Related work

Repeated games with minimax strategies are frequently studied [CBL06] and, in online learning, minimax analysis has been applied to a variety of losses and repeated games; however, computationally feasible algorithms are the exception, not the rule. For example, consider log loss, first discussed in [Sht87]. While the minimax algorithm, Normalized Maximum Likelihood, is well known [CBL06], it generally requires computation that is exponential in the time horizon, as one needs to aggregate over all data sequences. To our knowledge, there are two exceptions where efficient NML forecasters are possible: the multinomial case, where fast Fourier transforms may be exploited [KM05], and very particular exponential families that cause NML to be a Bayesian strategy [HB12], [BGH+13].

The minimax optimal strategy is also known for: (i) the ball game with W = I [TW00] (our generalization to Mahalanobis W ≠ I results in fundamentally different strategies), (ii) the ball game with W = I and a constraint on the player's deviation from the current empirical minimizer [ABRT08], for which the optimal strategy is Follow-the-Leader, (iii) Lipschitz-bounded convex loss functions [ABRT08], (iv) experts with an L∞ bound [AWY08], and (v) static experts with absolute loss [CBS11]. While not guaranteed to be an exhaustive list, the previous paragraph demonstrates the rarity of tractable minimax algorithms.

3 The Offline Problem

The regret is defined as the difference between the loss of the algorithm and the loss of the best action in hindsight. Here we calculate that action and its loss.

Lemma 3.1. Suppose A ⊇ conv(X) (this will always hold in the settings we study). For data x_1, ..., x_T ∈ X, the loss of the best action in hindsight equals

min_{a ∈ A} Σ_{t=1}^T ‖a − x_t‖²_W = Σ_{t=1}^T x_tᵀ W x_t − (1/T) (Σ_{t=1}^T x_t)ᵀ W (Σ_{t=1}^T x_t),  (3)

and the minimizer is the mean outcome a* = (1/T) Σ_{t=1}^T x_t.

Proof. The unconstrained minimizer and value are obtained by equating the derivative to zero and plugging in the solution. The assumption A ⊇ conv(X) ensures that the constraint a ∈ A is inactive.

The best action in hindsight is curiously independent of W, A and X.
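The closed form above is easy to sanity-check numerically. The snippet below (our illustration) compares the formula against the loss of the empirical mean and against nearby feasible perturbations:

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 4, 6
M = rng.normal(size=(K, K))
W = M @ M.T + K * np.eye(K)                  # symmetric positive definite W
X = rng.dirichlet(np.ones(K), size=T)        # outcomes in the simplex

def total_loss(a):
    return sum((a - x) @ W @ (a - x) for x in X)

s = X.sum(axis=0)
formula = sum(x @ W @ x for x in X) - s @ W @ s / T   # right-hand side of (3)
a_star = s / T                                        # the mean outcome

assert abs(total_loss(a_star) - formula) < 1e-9
# any perturbation staying on the affine hull (summing to zero) only increases the loss
for _ in range(100):
    d = rng.normal(size=K); d -= d.mean()
    assert total_loss(a_star + 1e-3 * d) >= total_loss(a_star) - 1e-12
```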
This also shows that the follow-the-leader strategy, which plays a_t = (1/(t−1)) Σ_{s=1}^{t−1} x_s, is independent of W and A as well. As we shall see, the minimax strategy does not have this property.

4 Simplex Brier Game

In this section we analyze the Brier game. The action and outcome spaces are the probability simplex on K outcomes; A = X = Δ := {x ∈ R^K_+ : 1ᵀx = 1}. The loss is given by the squared Mahalanobis distance ‖a − x‖²_W. We present a full minimax analysis of the T-round game: we calculate the game value, derive the maximin and minimax strategies, and discuss their efficient implementation.

The structure of this section is as follows. In Lemmas 4.1 and 4.2, the conclusions (value and optimizers) are obtained under the proviso that the given optimizer lies in the simplex. In our main result, Theorem 4.3, we apply these auxiliary results to our minimax analysis and argue that the maximizer indeed lies in the simplex. We immediately work with a general symmetric W ≻ 0, starting from the following lemma.

Lemma 4.1. Fix a symmetric matrix C ≻ 0 and a vector d. The optimization problem

max_{p ∈ Δ} − pᵀ C p + dᵀ p

has value

(1/4) [ dᵀC⁻¹d − (1ᵀC⁻¹d − 2)² / (1ᵀC⁻¹1) ],

attained at the optimizer

p* = (1/2) C⁻¹d + ((2 − 1ᵀC⁻¹d) / (2 · 1ᵀC⁻¹1)) C⁻¹1,

provided that p* lies in the simplex.
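The following snippet (an illustrative check of ours, not from the paper) verifies the Lemma 4.1 formulas on a random instance: the candidate p* sums to one, the stationarity condition −2Cp* + d ∝ 1 holds, and the objective value matches the closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 5
M = rng.normal(size=(K, K))
C = M @ M.T + K * np.eye(K)      # symmetric positive definite C
d = rng.normal(size=K)
one = np.ones(K)

Ci_d, Ci_1 = np.linalg.solve(C, d), np.linalg.solve(C, one)
g, u = one @ Ci_1, one @ Ci_d
p = 0.5 * Ci_d + (2 - u) / (2 * g) * Ci_1            # claimed optimizer
value = (d @ Ci_d - (u - 2) ** 2 / g) / 4            # claimed value

assert abs(p.sum() - 1) < 1e-9                       # equality constraint holds
grad = -2 * C @ p + d                                # gradient of -p'Cp + d'p
assert np.allclose(grad, grad[0])                    # proportional to 1: stationarity
assert abs((-p @ C @ p + d @ p) - value) < 1e-9      # value formula matches
```

Note that the snippet does not check p* ≥ 0; as the lemma says, the formulas are valid only under that proviso.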

Proof. We solve for the optimal p. Introducing a Lagrange multiplier λ for the constraint Σ_k p_k = 1, the stationarity condition −2Cp + d = λ1 gives p = (1/2) C⁻¹(d − λ1), and solving 1ᵀp = 1 yields λ = (1ᵀC⁻¹d − 2)/(1ᵀC⁻¹1). Substituting this maximizer back in produces the stated objective value after simplification.

This lemma allows us to compute the value and saddle point whenever the future payoff is quadratic.

Lemma 4.2. Fix a symmetric matrix W ≻ 0, a symmetric matrix A with W + A ⪰ 0, and a vector b. The optimization problem

min_a max_{x ∈ Δ} ‖a − x‖²_W + xᵀAx + bᵀx

achieves its value

(1/4) [ cᵀW⁻¹c − (1ᵀW⁻¹c − 2)² / (1ᵀW⁻¹1) ],  where c = diag(W + A) + b,

at the saddle point

a* = p* = (1/2) W⁻¹c + ((2 − 1ᵀW⁻¹c) / (2 · 1ᵀW⁻¹1)) W⁻¹1;

the maximin strategy randomizes, playing x = e_i with probability p*_i, provided p* ≥ 0.

Proof. The objective is convex in x for each a since W + A ⪰ 0, so the inner maximum is attained at a corner x = e_k. We apply a min-max swap (see e.g. [Sio58]) and properness of the loss, which implies a* = p, and expand:

min_a max_{x ∈ Δ} [ ‖a − x‖²_W + xᵀAx + bᵀx ]
  = max_p min_a E_{k∼p} [ ‖a − e_k‖²_W + e_kᵀAe_k + bᵀe_k ]
  = max_p E_{k∼p} [ ‖p − e_k‖²_W + e_kᵀAe_k + bᵀe_k ]
  = max_p [ − pᵀWp + (diag(W) + diag(A) + b)ᵀ p ]
  = max_p [ − pᵀWp + cᵀp ].

The proof is completed by applying Lemma 4.1 with C = W and d = c.

4.1 Minimax Analysis of the Brier Game

Next we turn to computing V(s, σ, n) as a recursion and specifying the minimax and maximin strategies. However, for the value-to-go function to retain its quadratic form, we need an alignment condition on W. We say that W is aligned with the simplex if

(1ᵀW⁻¹1) · W⁻¹ diag(W) ⪰ (1ᵀW⁻¹ diag(W)) · W⁻¹1 ⪰ 0,  (4)

where ⪰ denotes entry-wise inequality between vectors. Note that many matrices besides I satisfy this condition. We can now fully specify the value and strategies for the Brier game.

Theorem 4.3. Consider the T-round Brier game with Mahalanobis loss ‖a − x‖²_W, with W satisfying the alignment condition (4). After n outcomes x_1, ..., x_n with statistics s = Σ_{t=1}^n x_t and σ = Σ_{t=1}^n x_tᵀWx_t, the value-to-go is

V(s, σ, n) = α_n sᵀWs − σ + β_n diag(W)ᵀ s + γ_n,

the minimax strategy plays a(s, σ, n), and the maximin strategy plays corner e_i with probability p_i(s, σ, n), where

a(s, σ, n) = p(s, σ, n) = α_{n+1} s + (1/2)(α_{n+1} + β_{n+1}) W⁻¹ diag(W) + ((2 − 2nα_{n+1} − (α_{n+1} + β_{n+1}) 1ᵀW⁻¹diag(W)) / (2 · 1ᵀW⁻¹1)) W⁻¹1,

and the coefficients are defined recursively by α_T = 1/T, β_T = 0, γ_T = 0 and

α_n = α_{n+1} + α_{n+1}²,
β_n = β_{n+1} + α_{n+1} (α_{n+1} + β_{n+1}),
γ_n = γ_{n+1} + (1/4) [ (α_{n+1} + β_{n+1})² diag(W)ᵀW⁻¹diag(W) − ((α_{n+1} + β_{n+1}) 1ᵀW⁻¹diag(W) + 2nα_{n+1} − 2)² / (1ᵀW⁻¹1) ].

Proof. We prove this by induction, beginning at the end of the game and working backwards in time. The value at the end of the game is V(s, σ, T) = − min_a Σ_{t=1}^T ‖a − x_t‖²_W, given by Lemma 3.1; matching coefficients, V(s, σ, T) corresponds to α_T = 1/T, β_T = 0 and γ_T = 0. Now assume V has the assumed form after n + 1 rounds. Using s and σ to denote the state after n rounds, we can write

V(s, σ, n) = min_a max_x [ ‖a − x‖²_W + α_{n+1}(s + x)ᵀW(s + x) − (σ + xᵀWx) + β_{n+1} diag(W)ᵀ(s + x) + γ_{n+1} ].

Using Lemma 4.2 to evaluate the right-hand side produces a quadratic function of the state, and matching terms yields the recurrences for α_n, β_n and γ_n as well as the minimax and maximin strategies. The final step is checking the condition p ≥ 0 necessary to apply Lemma 4.2, which is where the alignment condition enters. See the appendix for the complete proof.

This full characterization of the game allows us to derive the following minimax regret bound.

Theorem 4.4. Let W satisfy the alignment condition (4). The minimax regret of the T-round simplex game satisfies

V ≤ ((1 + ln T)/4) · (1ᵀW⁻¹diag(W))² / (1ᵀW⁻¹1).

Proof. The regret equals the value of the game, V = V(0, 0, 0) = γ_0. Unwinding the recurrence for γ_n writes γ_0 as a sum of per-round increments, each bounded by a multiple of α_{n+1} with the W-dependent factor above. Each α_n can be bounded by 1/n, as observed in [TW00, proof of Lemma 1]: in the base case n = T this holds with equality, and for n < T we have

α_n = α_{n+1} + α_{n+1}² ≤ 1/(n+1) + 1/(n+1)² = (n+2)/(n+1)² ≤ 1/n,

since n(n + 2) = n² + 2n ≤ (n + 1)². It follows that γ_0 ≤ ((1ᵀW⁻¹diag(W))²/(4 · 1ᵀW⁻¹1)) Σ_{n=1}^T 1/n, and Σ_{n=1}^T 1/n ≤ 1 + ln T, as desired.
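The coefficient recurrence is cheap to precompute, and the 1/n bound used in the proof can be checked directly; a minimal sketch:

```python
import math

def alphas(T):
    """alpha_T = 1/T; alpha_n = alpha_{n+1} + alpha_{n+1}**2, run backwards."""
    a = [0.0] * (T + 1)
    a[T] = 1.0 / T
    for n in range(T - 1, 0, -1):
        a[n] = a[n + 1] + a[n + 1] ** 2
    return a

T = 1000
a = alphas(T)
assert all(a[n] <= 1.0 / n + 1e-12 for n in range(1, T + 1))   # alpha_n <= 1/n
assert sum(a[1:]) <= 1 + math.log(T)                           # harmonic-sum bound
```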

5 Norm Ball Game

This section parallels the previous one. Here we consider the online game with Mahalanobis loss and A = X = B := {x ∈ R^K : ‖x‖ ≤ 1}, the 2-norm Euclidean ball (not the Mahalanobis ball). We show that the value-to-go function is always quadratic in s and linear in σ, and derive the minimax and maximin strategies.

Lemma 5.1. Fix a symmetric matrix A and a vector b, and assume W + A ⪰ 0. Let λ_max be the largest eigenvalue of W + A and v_max the corresponding eigenvector. If (1/4) bᵀ(λ_max I − A)⁻²b ≤ 1, then the optimization problem

min_a max_{‖x‖ ≤ 1} ‖a − x‖²_W + xᵀAx + bᵀx

has value (1/4) bᵀ(λ_max I − A)⁻¹b + λ_max, minimax strategy a* = (1/2)(λ_max I − A)⁻¹b, and a randomized maximin strategy that plays two unit-length vectors, with

Pr[ x = a⊥ ± √(1 − ‖a⊥‖²) v_max ] = (1/2) (1 ± (aᵀv_max) / √(1 − ‖a⊥‖²)),

where a⊥ and a∥ are the components of a* perpendicular and parallel to v_max.

Proof. As the objective is convex in x, the inner optimum must be on the boundary and hence at a unit vector x. Introduce a Lagrange multiplier λ ≥ 0 for the constraint xᵀx ≤ 1 to get the Lagrangian

‖a − x‖²_W + xᵀAx + bᵀx + λ(1 − xᵀx).

This is concave in x if W + A − λI ⪯ 0, that is, λ ≥ λ_max. Differentiating in x yields the optimizer x* = (W + A − λI)⁻¹(Wa − b/2), which leaves us with an optimization in only a and λ:

inf_a inf_{λ ≥ λ_max} aᵀWa − (Wa − b/2)ᵀ(W + A − λI)⁻¹(Wa − b/2) + λ.

Since both infima are over closed sets, we can exchange their order. Unconstrained optimization over a results in a* = (1/2)(λI − A)⁻¹b. Evaluating the objective at a* results in

inf_{λ ≥ λ_max} (1/4) bᵀ(λI − A)⁻¹b + λ = inf_{λ ≥ λ_max} (1/4) Σ_i (u_iᵀb)² / (λ − λ_i) + λ,

using the spectral decomposition A = Σ_i λ_i u_i u_iᵀ. For λ ≥ λ_max we have λ ≥ λ_i. Taking derivatives, provided (1/4) bᵀ(λ_max I − A)⁻²b ≤ 1, this function is increasing in λ ≥ λ_max, and so attains its infimum at λ_max. Thus, when the assumed inequality is satisfied, this a* is minimax for the given inner maximization.

To obtain the maximin strategy, we take the usual convexification where the Adversary plays distributions P over the unit sphere. This allows us to swap the infimum and supremum (see e.g. Sion's minimax theorem [Sio58]) and obtain an equivalent optimization problem. The objective then depends only on the mean µ = E[x] and second moment D = E[xxᵀ] of the distribution P. The characterization in [KNW13] tells us that (µ, D) are the first two moments of a distribution on unit vectors iff tr(D) = 1 and D ⪰ µµᵀ. Our usual min-max swap then yields

V = sup_P inf_a E_{x∼P} [ aᵀWa − 2aᵀWx + xᵀWx + xᵀAx + bᵀx ]
  = sup_{µ,D} inf_a [ aᵀWa − 2aᵀWµ + tr((W + A)D) + bᵀµ ]
  = sup_{µ,D} [ −µᵀWµ + tr((W + A)D) + bᵀµ ]
  = −aᵀWa + bᵀa + sup_{tr(D) = 1, D ⪰ aaᵀ} tr((W + A)D),

where the second equality uses a = µ and the third uses the saddle-point condition µ = a*.
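The two-point maximin distribution is easy to instantiate. The sketch below (our own illustration) builds it from a given a with ‖a‖ ≤ 1 and a unit vector v_max, and checks the claimed mean and second moment:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 4
v = rng.normal(size=K); v /= np.linalg.norm(v)        # v_max (unit eigenvector)
a = rng.normal(size=K); a *= 0.7 / np.linalg.norm(a)  # any a with norm <= 1

a_par = (a @ v) * v                                   # component along v_max
a_perp = a - a_par
r = np.sqrt(1 - a_perp @ a_perp)
x_plus, x_minus = a_perp + r * v, a_perp - r * v      # the two unit-length supports
p_plus = 0.5 * (1 + (a @ v) / r)
p_minus = 1 - p_plus

assert abs(np.linalg.norm(x_plus) - 1) < 1e-9 and abs(np.linalg.norm(x_minus) - 1) < 1e-9
mean = p_plus * x_plus + p_minus * x_minus
assert np.allclose(mean, a)                           # E[x] = a
D = p_plus * np.outer(x_plus, x_plus) + p_minus * np.outer(x_minus, x_minus)
assert np.allclose(D, np.outer(a, a) + (1 - a @ a) * np.outer(v, v))   # second moment
```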

Figure 2: Illustration of the maximin distribution from Lemma 5.1. The mixture of two unit vectors with mean µ has second moment D = µµᵀ + (1 − µᵀµ) v_max v_maxᵀ.

The matrix D, with constraint tr(D) = 1, now seeks to align with the largest eigenvector of W + A, but it also has to respect the constraint D ⪰ aaᵀ. We re-parameterize by C = D − aaᵀ; we then need to find

argmax_{C ⪰ 0, tr(C) = 1 − aᵀa} tr((W + A)C).

By linearity of the objective the maximizer is of rank 1, and hence this is a scaled maximum-eigenvalue problem, with solution C = (1 − aᵀa) v_max v_maxᵀ, so that D = aaᵀ + (1 − aᵀa) v_max v_maxᵀ. This essentially reduces finding P to a 2-dimensional problem, which can be solved in closed form [KNW13]. It is easy to verify that the mixture in the lemma has the desired mean a and second moment D. See Figure 2 for the geometric intuition. Notice that both the minimax and maximin strategies depend on W + A only through λ_max and v_max.

5.1 Minimax Analysis of the Ball Game

With the above lemma we can compute the value and strategies for the ball game, analogously to Theorem 4.3. Again we find that the value function at the end of the game is quadratic in the state and, surprisingly, remains quadratic under the backwards induction.

Theorem 5.2. Consider the T-round ball game with loss ‖a − x‖²_W. After n rounds, the value-to-go from a state with statistics s = Σ_{t=1}^n x_t and σ = Σ_{t=1}^n x_tᵀWx_t is

V(s, σ, n) = sᵀA_n s − σ + γ_n.

The minimax strategy plays

a(s, σ, n) = (λ_max^{n+1} I − A_{n+1} + W)⁻¹ A_{n+1} s,

and the maximin strategy plays two unit-length vectors with

Pr[ x = a⊥ ± √(1 − ‖a⊥‖²) v_max ] = (1/2) (1 ± (aᵀv_max) / √(1 − ‖a⊥‖²)),

where λ_max^{n+1} and v_max correspond to the largest eigenvalue of A_{n+1}, and a⊥ and a∥ are the components of a perpendicular and parallel to v_max. The coefficients A_n and γ_n are determined recursively by the base case A_T = W/T, γ_T = 0 and the recursion

A_n = A_{n+1} (W + λ_max^{n+1} I − A_{n+1})⁻¹ A_{n+1} + A_{n+1},
γ_n = λ_max^{n+1} + γ_{n+1}.
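The A_n recursion can be run directly in the eigenbasis of W. The sketch below (our illustration, assuming the recursions stated above) checks that A_n keeps W's eigenvectors and that its top eigenvalue tracks α_n ν_max, as claimed in the eigenvalue analysis that follows:

```python
import numpy as np

rng = np.random.default_rng(3)
K, T = 4, 50
M = rng.normal(size=(K, K))
W = M @ M.T + np.eye(K)                       # symmetric positive definite W
I = np.eye(K)
nu_max = np.linalg.eigvalsh(W)[-1]            # largest eigenvalue of W

A = W / T                                     # base case A_T = W / T
alpha = 1.0 / T                               # alpha_T
for n in range(T - 1, 0, -1):
    lam = np.linalg.eigvalsh(A)[-1]           # largest eigenvalue of A_{n+1}
    A = A @ np.linalg.solve(W + lam * I - A, A) + A
    alpha = alpha + alpha ** 2                # the Brier-game recurrence
    # A_n commutes with W: they share an eigensystem
    assert np.allclose(A @ W, W @ A, atol=1e-6)
    # top eigenvalue equals alpha_n * nu_max
    assert abs(np.linalg.eigvalsh(A)[-1] - alpha * nu_max) < 1e-6
```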

Proof outline. The proof is by induction on the number n of rounds played. In the base case n = T we find (see (3)) A_T = W/T and γ_T = 0. For the induction step, we need to calculate

V(s, σ, n) = min_a max_x [ ‖a − x‖²_W + V(s + x, σ + xᵀWx, n + 1) ].

Using the induction hypothesis, we expand the right-hand side to

min_a max_x [ ‖a − x‖²_W + (s + x)ᵀA_{n+1}(s + x) − (σ + xᵀWx) + γ_{n+1} ],

which we can evaluate by applying Lemma 5.1 with A = A_{n+1} − W and b = 2A_{n+1}s. Collecting terms and matching with V(s, σ, n) = sᵀA_n s − σ + γ_n yields the recursion for A_n and γ_n as well as the given minimax and maximin strategies. As before, much of the algebra has been moved to the appendix.

Understanding the eigenvalues of A_n. As we can see from the A_n recursion, the eigensystem of A_n is always the same as that of W. Thus we can characterize the minimax strategy completely by its effect on the eigenvalues of W. Denote the eigenvalues of A_n and W by λ_n^i and ν_i respectively, with λ_n^1 corresponding to the largest. The eigenvalues follow

λ_n^i = λ_{n+1}^i + (λ_{n+1}^i)² / (ν_i + λ_{n+1}^1 − λ_{n+1}^i) = λ_{n+1}^i (ν_i + λ_{n+1}^1) / (ν_i + λ_{n+1}^1 − λ_{n+1}^i),

which leaves the order of the λ_n^i unchanged. The largest eigenvalue λ_n^1 satisfies the recurrence λ_T^1/ν_max = 1/T and λ_n^1/ν_max = λ_{n+1}^1/ν_max + (λ_{n+1}^1/ν_max)², which, remarkably, is the same recurrence as for the α_n parameter in the Brier game; i.e. λ_n^1 = α_n ν_max. This observation is the key to analyzing the minimax regret.

Theorem 5.3. The minimax regret of the T-round ball game satisfies γ_0 ≤ (1 + ln T) λ_max(W).

Proof. We have V = V(0, 0, 0) = γ_0 = Σ_{n=1}^T λ_n^1 = λ_max(W) Σ_{n=1}^T α_n. The proof of Theorem 4.4 gives the bound Σ_{n=1}^T α_n ≤ 1 + ln T.

Taking stock, we find that the minimax regrets of the Brier game (Theorem 4.4) and the ball game (Theorem 5.3) have identical dependence on the horizon T but differ in a complexity factor arising from the interaction of the action space and the loss matrix W.

6 Conclusion

In this paper we have presented two games that, unexpectedly, have computationally efficient minimax strategies. While the structure of the squared Mahalanobis distance is important, it is the interplay between the loss and the constraint set that allows efficient calculation of the backwards induction, value-to-go, and achieving strategies. For example, the squared Mahalanobis game with other ℓ_p-ball action spaces (p ≠ 2) does not admit a quadratic value-to-go unless W = I. We emphasize the low computational cost of this method despite the exponential blow-up in state-space size. In the Brier game, the coefficients need to be precomputed, which can be done in O(T) time. Similarly, computation of the eigenvalues of the A_n coefficients for the ball game can be done in O(TK + K³) time. Then, at each iteration of the algorithm, only matrix-vector multiplications between the current state and the precomputed parameters are required. Hence, playing either T-round game requires O(TK²) time. Unfortunately, as is the case with most minimax algorithms, the time horizon must be known in advance.

There are many future directions. We are currently pursuing a characterization of the action spaces that permit quadratic value functions under squared Mahalanobis loss, and investigating connections between losses and families of value functions closed under backwards induction. There is some notion of conjugacy between losses, value-to-go functions, and action spaces, but a generalization seems difficult: the Brier game and ball game worked out for seemingly very different reasons.
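Putting the pieces together, a complete Brier-game player for the aligned case W = I looks as follows (a sketch under our reading of the Theorem 4.3 recurrences, not verbatim from the paper): it precomputes the coefficient tables in O(T) and then plays each round in O(K):

```python
import numpy as np

def precompute(T):
    """alpha_n and beta_n tables from the backward recurrences."""
    alpha = [0.0] * (T + 2)
    beta = [0.0] * (T + 2)
    alpha[T] = 1.0 / T
    for n in range(T - 1, -1, -1):
        alpha[n] = alpha[n + 1] + alpha[n + 1] ** 2
        beta[n] = beta[n + 1] + alpha[n + 1] * (alpha[n + 1] + beta[n + 1])
    return alpha, beta

def minimax_play(s, n, K, alpha, beta):
    """Minimax prediction after n rounds with statistic s (W = I case)."""
    a, b = alpha[n + 1], beta[n + 1]
    # For W = I: W^{-1} diag(W) = 1 and 1' W^{-1} 1 = K.
    coef = (2 - 2 * n * a - (a + b) * K) / (2 * K)
    return a * s + (0.5 * (a + b) + coef) * np.ones(K)

K, T = 3, 8
alpha, beta = precompute(T)
rng = np.random.default_rng(4)
s, total, xs = np.zeros(K), 0.0, []
for t in range(T):
    p = minimax_play(s, t, K, alpha, beta)
    assert p.min() > -1e-12 and abs(p.sum() - 1) < 1e-9   # stays in the simplex
    x = np.eye(K)[rng.integers(K)]                        # adversary plays a corner
    total += (p - x) @ (p - x)
    s += x; xs.append(x)
best = sum((np.mean(xs, 0) - x) @ (np.mean(xs, 0) - x) for x in xs)
print(total - best)   # realized regret; at most the game value gamma_0
```

For W = I the play simplifies to p = α_{n+1} s + ((1 − nα_{n+1})/K) 1, which is a shrunken follow-the-leader prediction mixed with the uniform distribution; the W-dependence of the general formula is what distinguishes the minimax strategy from plain FTL.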

References

[ABRT08] Jacob Abernethy, Peter L. Bartlett, Alexander Rakhlin, and Ambuj Tewari. Optimal strategies and minimax lower bounds for online convex games. In Servedio and Zhang [SZ08].

[AWY08] Jacob Abernethy, Manfred K. Warmuth, and Joel Yellin. When random play is optimal against an adversary. In Servedio and Zhang [SZ08].

[BGH+13] Peter L. Bartlett, Peter Grünwald, Peter Harremoës, Fares Hedayati, and Wojciech Kotłowski. Horizon-independent optimal prediction with log-loss in exponential families. CoRR, 2013.

[Bri50] Glenn W. Brier. Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78:1–3, 1950.

[CBL06] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

[CBS11] Nicolò Cesa-Bianchi and Ohad Shamir. Efficient online learning via randomized rounding. In J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, 2011.

[HB12] Fares Hedayati and Peter L. Bartlett. Exchangeability characterizes optimality of sequential normalized maximum likelihood and Bayesian prediction with Jeffreys prior. In International Conference on Artificial Intelligence and Statistics, 2012.

[KM05] Petri Kontkanen and Petri Myllymäki. A fast normalized maximum likelihood algorithm for multinomial data. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), 2005.

[KNW13] Wouter M. Koolen, Jiazhong Nie, and Manfred K. Warmuth. Learning a set of directions. In Shai Shalev-Shwartz and Ingo Steinwart, editors, Proceedings of the 26th Annual Conference on Learning Theory (COLT), June 2013.

[Sht87] Yuri Mikhailovich Shtarkov. Universal sequential coding of single messages. Problemy Peredachi Informatsii, 23(3):3–17, 1987.

[Sio58] Maurice Sion. On general minimax theorems. Pacific Journal of Mathematics, 8:171–176, 1958.

[SZ08] Rocco A. Servedio and Tong Zhang, editors. 21st Annual Conference on Learning Theory (COLT 2008), Helsinki, Finland, July 9–12, 2008. Omnipress, 2008.

[TW00] Eiji Takimoto and Manfred K. Warmuth. The minimax strategy for Gaussian density estimation. In 13th COLT, pages 100–106, 2000.

A Proofs

Proof of Theorem 4.3. We prove the theorem by induction. Recall that the value at the end of the game is, by Lemma 3.1, V(s, σ, T) = (1/T) sᵀWs − σ, corresponding to α_T = 1/T, β_T = 0 and γ_T = 0. (Adding a constant to the objective in Lemma 4.2 only adds the same offset to the value, without changing the saddle point.)

Now assume the induction hypothesis after n + 1 rounds. Abbreviating d = diag(W), α = α_{n+1}, β = β_{n+1} and γ = γ_{n+1}, we need to evaluate

V(s, σ, n) = min_a max_x [ ‖a − x‖²_W + α (s + x)ᵀW(s + x) − (σ + xᵀWx) + β dᵀ(s + x) + γ ]
           = min_a max_x [ ‖a − x‖²_W + (α − 1) xᵀWx + (2αWs + βd)ᵀ x ] + α sᵀWs − σ + β dᵀs + γ.

Applying Lemma 4.2 with A = (α − 1)W and b = 2αWs + βd gives c = diag(W + A) + b = (α + β)d + 2αWs, and hence the value

(1/4) [ cᵀW⁻¹c − (1ᵀW⁻¹c − 2)² / (1ᵀW⁻¹1) ] + α sᵀWs − σ + β dᵀs + γ.

Since cᵀW⁻¹c = 4α² sᵀWs + 4α(α + β) dᵀs + (α + β)² dᵀW⁻¹d and, using 1ᵀs = n, 1ᵀW⁻¹c = (α + β) 1ᵀW⁻¹d + 2αn, this is a quadratic function of the state. Matching terms with V(s, σ, n) = α_n sᵀWs − σ + β_n dᵀs + γ_n establishes the induction step with

α_n = α_{n+1} + α_{n+1}²,
β_n = β_{n+1} + α_{n+1}(α_{n+1} + β_{n+1}),
γ_n = γ_{n+1} + (1/4) [ (α_{n+1} + β_{n+1})² dᵀW⁻¹d − ((α_{n+1} + β_{n+1}) 1ᵀW⁻¹d + 2nα_{n+1} − 2)² / (1ᵀW⁻¹1) ],

and the saddle point

a* = p* = (1/2) W⁻¹c + ((2 − 1ᵀW⁻¹c)/(2 · 1ᵀW⁻¹1)) W⁻¹1
        = α_{n+1} s + (1/2)(α_{n+1} + β_{n+1}) W⁻¹d + ((2 − 2nα_{n+1} − (α_{n+1} + β_{n+1}) 1ᵀW⁻¹d)/(2 · 1ᵀW⁻¹1)) W⁻¹1.

Finally, we need to verify that the strategies above respect the constraints of the game, i.e. that a* = p* lies in the simplex; otherwise the above calculations do not correspond to the minimax strategy. By construction 1ᵀp* = 1, so it remains to check that p* ≥ 0 for every reachable state, i.e. for all s ≥ 0 with 1ᵀs = n.

As this needs to hold for all s ≥ 0, and s enters p* only through the non-negative term α_{n+1}s, it suffices that the remaining terms are component-wise non-negative, that is,

(1/2)(α_{n+1} + β_{n+1}) W⁻¹d + ((2 − 2nα_{n+1} − (α_{n+1} + β_{n+1}) 1ᵀW⁻¹d)/(2 · 1ᵀW⁻¹1)) W⁻¹1 ≥ 0.

Multiplying through by 2 · 1ᵀW⁻¹1 > 0 and using 1 − nα_{n+1} > 0 (which follows from α_{n+1} ≤ 1/(n + 1)), this holds for every n whenever

(1ᵀW⁻¹1) W⁻¹d ⪰ (1ᵀW⁻¹d) W⁻¹1 ⪰ 0,

which is exactly the alignment condition (4).

Proof of Theorem 5.2. Recall that we need to calculate

V(s, σ, n) = min_a max_x [ ‖a − x‖²_W + (s + x)ᵀA_{n+1}(s + x) − (σ + xᵀWx) + γ_{n+1} ],

which may be reorganized as

sᵀA_{n+1}s − σ + γ_{n+1} + min_a max_x [ ‖a − x‖²_W + xᵀ(A_{n+1} − W)x + 2sᵀA_{n+1}x ].

We now apply Lemma 5.1 with A = A_{n+1} − W and b = 2A_{n+1}s. Let λ_max^{n+1} be the largest eigenvalue of (A_{n+1} − W) + W = A_{n+1}. If

sᵀA_{n+1} (W + λ_max^{n+1} I − A_{n+1})⁻² A_{n+1} s ≤ 1,  (5)

then the value equals

sᵀA_{n+1}s − σ + γ_{n+1} + λ_max^{n+1} + sᵀA_{n+1} (W + λ_max^{n+1} I − A_{n+1})⁻¹ A_{n+1} s.

So if we assert that V(s, σ, n) = sᵀA_n s − σ + γ_n, we must have the correspondence

A_n = A_{n+1} (W + λ_max^{n+1} I − A_{n+1})⁻¹ A_{n+1} + A_{n+1},
γ_n = λ_max^{n+1} + γ_{n+1}.

The final piece is to check Equation (5). Let the ν_i denote the eigenvalues of W. The largest eigenvalue of A_{n+1}(W + λ_max^{n+1} I − A_{n+1})⁻² A_{n+1} is (λ_max^{n+1}/ν_max)², because the mapping λ ↦ λ/(ν + λ_max^{n+1} − λ) is monotonic in λ ≤ λ_max^{n+1}. The proof now follows by combining the recurrence for the largest eigenvalue from Section 5.1 with the bound on α_n from the proof of Theorem 4.4: since ‖s‖ ≤ n,

sᵀA_{n+1} (W + λ_max^{n+1} I − A_{n+1})⁻² A_{n+1} s ≤ (α_{n+1} n)² ≤ n²/(n + 1)² < 1.


More information

Bandit Convex Optimization: T Regret in One Dimension

Bandit Convex Optimization: T Regret in One Dimension Bandit Convex Optimization: T Regret in One Dimension arxiv:1502.06398v1 [cs.lg 23 Feb 2015 Sébastien Bubeck Microsoft Research sebubeck@microsoft.com Tomer Koren Technion tomerk@technion.ac.il February

More information

Full-information Online Learning

Full-information Online Learning Introduction Expert Advice OCO LM A DA NANJING UNIVERSITY Full-information Lijun Zhang Nanjing University, China June 2, 2017 Outline Introduction Expert Advice OCO 1 Introduction Definitions Regret 2

More information

Achievability of Asymptotic Minimax Regret in Online and Batch Prediction

Achievability of Asymptotic Minimax Regret in Online and Batch Prediction JMLR: Workshop and Conference Proceedings 9:181 196, 013 ACML 013 Achievability of Asymptotic Minimax Regret in Online and Batch Prediction Kazuho Watanabe Graduate School of Information Science Nara Institute

More information

Move from Perturbed scheme to exponential weighting average

Move from Perturbed scheme to exponential weighting average Move from Perturbed scheme to exponential weighting average Chunyang Xiao Abstract In an online decision problem, one makes decisions often with a pool of decisions sequence called experts but without

More information

Online Passive-Aggressive Algorithms

Online Passive-Aggressive Algorithms Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il

More information

The convex optimization approach to regret minimization

The convex optimization approach to regret minimization The convex optimization approach to regret minimization Elad Hazan Technion - Israel Institute of Technology ehazan@ie.technion.ac.il Abstract A well studied and general setting for prediction and decision

More information

Lecture 4: Lower Bounds (ending); Thompson Sampling

Lecture 4: Lower Bounds (ending); Thompson Sampling CMSC 858G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling Instructor: Alex Slivkins Scribed by: Guowei Sun,Cheng Jie 1 Lower bounds on regret (ending) Recap from

More information

Online Learning for Time Series Prediction

Online Learning for Time Series Prediction Online Learning for Time Series Prediction Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values.

More information

Online Convex Optimization

Online Convex Optimization Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed

More information

Agnostic Online learnability

Agnostic Online learnability Technical Report TTIC-TR-2008-2 October 2008 Agnostic Online learnability Shai Shalev-Shwartz Toyota Technological Institute Chicago shai@tti-c.org ABSTRACT We study a fundamental question. What classes

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Learning and Games MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Normal form games Nash equilibrium von Neumann s minimax theorem Correlated equilibrium Internal

More information

EE 227A: Convex Optimization and Applications October 14, 2008

EE 227A: Convex Optimization and Applications October 14, 2008 EE 227A: Convex Optimization and Applications October 14, 2008 Lecture 13: SDP Duality Lecturer: Laurent El Ghaoui Reading assignment: Chapter 5 of BV. 13.1 Direct approach 13.1.1 Primal problem Consider

More information

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research OLSO Online Learning and Stochastic Optimization Yoram Singer August 10, 2016 Google Research References Introduction to Online Convex Optimization, Elad Hazan, Princeton University Online Learning and

More information

Sequential prediction with coded side information under logarithmic loss

Sequential prediction with coded side information under logarithmic loss under logarithmic loss Yanina Shkel Department of Electrical Engineering Princeton University Princeton, NJ 08544, USA Maxim Raginsky Department of Electrical and Computer Engineering Coordinated Science

More information

3. THE SIMPLEX ALGORITHM

3. THE SIMPLEX ALGORITHM Optimization. THE SIMPLEX ALGORITHM DPK Easter Term. Introduction We know that, if a linear programming problem has a finite optimal solution, it has an optimal solution at a basic feasible solution (b.f.s.).

More information

From Bandits to Experts: A Tale of Domination and Independence

From Bandits to Experts: A Tale of Domination and Independence From Bandits to Experts: A Tale of Domination and Independence Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Domination and Independence 1 / 1 From Bandits to Experts: A

More information

MDP Preliminaries. Nan Jiang. February 10, 2019

MDP Preliminaries. Nan Jiang. February 10, 2019 MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process

More information

Perceptron Mistake Bounds

Perceptron Mistake Bounds Perceptron Mistake Bounds Mehryar Mohri, and Afshin Rostamizadeh Google Research Courant Institute of Mathematical Sciences Abstract. We present a brief survey of existing mistake bounds and introduce

More information

Convex Optimization in Classification Problems

Convex Optimization in Classification Problems New Trends in Optimization and Computational Algorithms December 9 13, 2001 Convex Optimization in Classification Problems Laurent El Ghaoui Department of EECS, UC Berkeley elghaoui@eecs.berkeley.edu 1

More information

280 Eiji Takimoto and Manfred K. Warmuth

280 Eiji Takimoto and Manfred K. Warmuth The Last-Step Minimax Algorithm Eiji Takimoto 1? and Manfred K. Warmuth?? 1 Graduate School of Information Sciences, Tohoku University Sendai, 980-8579, Japan. t@ecei.tohoku.ac.jp Computer Science Department,

More information

CS281B/Stat241B. Statistical Learning Theory. Lecture 1.

CS281B/Stat241B. Statistical Learning Theory. Lecture 1. CS281B/Stat241B. Statistical Learning Theory. Lecture 1. Peter Bartlett 1. Organizational issues. 2. Overview. 3. Probabilistic formulation of prediction problems. 4. Game theoretic formulation of prediction

More information

Online Learning and Online Convex Optimization

Online Learning and Online Convex Optimization Online Learning and Online Convex Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Learning 1 / 49 Summary 1 My beautiful regret 2 A supposedly fun game

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Vicente L. Malave February 23, 2011 Outline Notation minimize a number of functions φ

More information

Efficiently Training Sum-Product Neural Networks using Forward Greedy Selection

Efficiently Training Sum-Product Neural Networks using Forward Greedy Selection Efficiently Training Sum-Product Neural Networks using Forward Greedy Selection Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Greedy Algorithms, Frank-Wolfe and Friends

More information

No-Regret Learning in Convex Games

No-Regret Learning in Convex Games Geoffrey J. Gordon Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213 Amy Greenwald Casey Marks Department of Computer Science, Brown University, Providence, RI 02912 Abstract

More information

Near-Optimal Algorithms for Online Matrix Prediction

Near-Optimal Algorithms for Online Matrix Prediction JMLR: Workshop and Conference Proceedings vol 23 (2012) 38.1 38.13 25th Annual Conference on Learning Theory Near-Optimal Algorithms for Online Matrix Prediction Elad Hazan Technion - Israel Inst. of Tech.

More information

Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization

Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization JMLR: Workshop and Conference Proceedings vol (2010) 1 16 24th Annual Conference on Learning heory Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization

More information

Online Kernel PCA with Entropic Matrix Updates

Online Kernel PCA with Entropic Matrix Updates Dima Kuzmin Manfred K. Warmuth Computer Science Department, University of California - Santa Cruz dima@cse.ucsc.edu manfred@cse.ucsc.edu Abstract A number of updates for density matrices have been developed

More information

Competing With Strategies

Competing With Strategies JMLR: Workshop and Conference Proceedings vol (203) 27 Competing With Strategies Wei Han Alexander Rakhlin Karthik Sridharan wehan@wharton.upenn.edu rakhlin@wharton.upenn.edu skarthik@wharton.upenn.edu

More information

Blackwell s Approachability Theorem: A Generalization in a Special Case. Amy Greenwald, Amir Jafari and Casey Marks

Blackwell s Approachability Theorem: A Generalization in a Special Case. Amy Greenwald, Amir Jafari and Casey Marks Blackwell s Approachability Theorem: A Generalization in a Special Case Amy Greenwald, Amir Jafari and Casey Marks Department of Computer Science Brown University Providence, Rhode Island 02912 CS-06-01

More information

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm CS61: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm Tim Roughgarden February 9, 016 1 Online Algorithms This lecture begins the third module of the

More information

Theory and Applications of A Repeated Game Playing Algorithm. Rob Schapire Princeton University [currently visiting Yahoo!

Theory and Applications of A Repeated Game Playing Algorithm. Rob Schapire Princeton University [currently visiting Yahoo! Theory and Applications of A Repeated Game Playing Algorithm Rob Schapire Princeton University [currently visiting Yahoo! Research] Learning Is (Often) Just a Game some learning problems: learn from training

More information

Lecture Note 5: Semidefinite Programming for Stability Analysis

Lecture Note 5: Semidefinite Programming for Stability Analysis ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State

More information

The FTRL Algorithm with Strongly Convex Regularizers

The FTRL Algorithm with Strongly Convex Regularizers CSE599s, Spring 202, Online Learning Lecture 8-04/9/202 The FTRL Algorithm with Strongly Convex Regularizers Lecturer: Brandan McMahan Scribe: Tamara Bonaci Introduction In the last lecture, we talked

More information

Lecture 16: Perceptron and Exponential Weights Algorithm

Lecture 16: Perceptron and Exponential Weights Algorithm EECS 598-005: Theoretical Foundations of Machine Learning Fall 2015 Lecture 16: Perceptron and Exponential Weights Algorithm Lecturer: Jacob Abernethy Scribes: Yue Wang, Editors: Weiqing Yu and Andrew

More information

Online Kernel PCA with Entropic Matrix Updates

Online Kernel PCA with Entropic Matrix Updates Online Kernel PCA with Entropic Matrix Updates Dima Kuzmin Manfred K. Warmuth University of California - Santa Cruz ICML 2007, Corvallis, Oregon April 23, 2008 D. Kuzmin, M. Warmuth (UCSC) Online Kernel

More information

The Online Approach to Machine Learning

The Online Approach to Machine Learning The Online Approach to Machine Learning Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53 Summary 1 My beautiful regret 2 A supposedly fun game I

More information

Tutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning

Tutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret

More information

Competing With Strategies

Competing With Strategies Competing With Strategies Wei Han Univ. of Pennsylvania Alexander Rakhlin Univ. of Pennsylvania February 3, 203 Karthik Sridharan Univ. of Pennsylvania Abstract We study the problem of online learning

More information

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document

More information

Bandits for Online Optimization

Bandits for Online Optimization Bandits for Online Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Bandits for Online Optimization 1 / 16 The multiarmed bandit problem... K slot machines Each

More information

L11: Pattern recognition principles

L11: Pattern recognition principles L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning Editors: Suvrit Sra suvrit@gmail.com Max Planck Insitute for Biological Cybernetics 72076 Tübingen, Germany Sebastian Nowozin Microsoft Research Cambridge, CB3 0FB, United

More information

Online Optimization : Competing with Dynamic Comparators

Online Optimization : Competing with Dynamic Comparators Ali Jadbabaie Alexander Rakhlin Shahin Shahrampour Karthik Sridharan University of Pennsylvania University of Pennsylvania University of Pennsylvania Cornell University Abstract Recent literature on online

More information

Logarithmic Regret Algorithms for Strongly Convex Repeated Games

Logarithmic Regret Algorithms for Strongly Convex Repeated Games Logarithmic Regret Algorithms for Strongly Convex Repeated Games Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci & Eng, The Hebrew University, Jerusalem 91904, Israel 2 Google Inc 1600

More information

We describe the generalization of Hazan s algorithm for symmetric programming

We describe the generalization of Hazan s algorithm for symmetric programming ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,

More information

Online Learning, Mistake Bounds, Perceptron Algorithm

Online Learning, Mistake Bounds, Perceptron Algorithm Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which

More information

Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics

Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics Sergiu Hart August 2016 Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics Sergiu Hart Center for the Study of Rationality

More information

On the Worst-case Analysis of Temporal-difference Learning Algorithms

On the Worst-case Analysis of Temporal-difference Learning Algorithms Machine Learning, 22(1/2/3:95-121, 1996. On the Worst-case Analysis of Temporal-difference Learning Algorithms schapire@research.att.com ATT Bell Laboratories, 600 Mountain Avenue, Room 2A-424, Murray

More information

Bayesian Properties of Normalized Maximum Likelihood and its Fast Computation

Bayesian Properties of Normalized Maximum Likelihood and its Fast Computation Bayesian Properties of Normalized Maximum Likelihood and its Fast Computation Andre Barron Department of Statistics Yale University Email: andre.barron@yale.edu Teemu Roos HIIT & Dept of Computer Science

More information

On the Generalization Ability of Online Strongly Convex Programming Algorithms

On the Generalization Ability of Online Strongly Convex Programming Algorithms On the Generalization Ability of Online Strongly Convex Programming Algorithms Sham M. Kakade I Chicago Chicago, IL 60637 sham@tti-c.org Ambuj ewari I Chicago Chicago, IL 60637 tewari@tti-c.org Abstract

More information

Yevgeny Seldin. University of Copenhagen

Yevgeny Seldin. University of Copenhagen Yevgeny Seldin University of Copenhagen Classical (Batch) Machine Learning Collect Data Data Assumption The samples are independent identically distributed (i.i.d.) Machine Learning Prediction rule New

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Learning with Large Expert Spaces MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Problem Learning guarantees: R T = O( p T log N). informative even for N very large.

More information

Online Passive-Aggressive Algorithms

Online Passive-Aggressive Algorithms Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il

More information

Lecture Support Vector Machine (SVM) Classifiers

Lecture Support Vector Machine (SVM) Classifiers Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in

More information

The Interplay Between Stability and Regret in Online Learning

The Interplay Between Stability and Regret in Online Learning The Interplay Between Stability and Regret in Online Learning Ankan Saha Department of Computer Science University of Chicago ankans@cs.uchicago.edu Prateek Jain Microsoft Research India prajain@microsoft.com

More information

Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Follow-he-Perturbed Leader MEHRYAR MOHRI MOHRI@ COURAN INSIUE & GOOGLE RESEARCH. General Ideas Linear loss: decomposition as a sum along substructures. sum of edge losses in a

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning.

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning. Tutorial: PART 1 Online Convex Optimization, A Game- Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

Online Learning with Feedback Graphs

Online Learning with Feedback Graphs Online Learning with Feedback Graphs Nicolò Cesa-Bianchi Università degli Studi di Milano Joint work with: Noga Alon (Tel-Aviv University) Ofer Dekel (Microsoft Research) Tomer Koren (Technion and Microsoft

More information

Time Series Prediction & Online Learning

Time Series Prediction & Online Learning Time Series Prediction & Online Learning Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values. earthquakes.

More information

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 11 Luca Trevisan February 29, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 11 Luca Trevisan February 29, 2016 U.C. Berkeley CS294: Spectral Methods and Expanders Handout Luca Trevisan February 29, 206 Lecture : ARV In which we introduce semi-definite programming and a semi-definite programming relaxation of sparsest

More information

1 Seidel s LP algorithm

1 Seidel s LP algorithm 15-451/651: Design & Analysis of Algorithms October 21, 2015 Lecture #14 last changed: November 7, 2015 In this lecture we describe a very nice algorithm due to Seidel for Linear Programming in lowdimensional

More information

Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations

Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations JMLR: Workshop and Conference Proceedings vol 35:1 20, 2014 Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations H. Brendan McMahan MCMAHAN@GOOGLE.COM Google,

More information

Dynamic Regret of Strongly Adaptive Methods

Dynamic Regret of Strongly Adaptive Methods Lijun Zhang 1 ianbao Yang 2 Rong Jin 3 Zhi-Hua Zhou 1 Abstract o cope with changing environments, recent developments in online learning have introduced the concepts of adaptive regret and dynamic regret

More information

Lecture: Adaptive Filtering

Lecture: Adaptive Filtering ECE 830 Spring 2013 Statistical Signal Processing instructors: K. Jamieson and R. Nowak Lecture: Adaptive Filtering Adaptive filters are commonly used for online filtering of signals. The goal is to estimate

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Minimizing Adaptive Regret with One Gradient per Iteration

Minimizing Adaptive Regret with One Gradient per Iteration Minimizing Adaptive Regret with One Gradient per Iteration Guanghui Wang, Dakuan Zhao, Lijun Zhang National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China wanggh@lamda.nju.edu.cn,

More information

The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance

The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance Marina Meilă University of Washington Department of Statistics Box 354322 Seattle, WA

More information

Linearly-solvable Markov decision problems

Linearly-solvable Markov decision problems Advances in Neural Information Processing Systems 2 Linearly-solvable Markov decision problems Emanuel Todorov Department of Cognitive Science University of California San Diego todorov@cogsci.ucsd.edu

More information

Online Learning: Random Averages, Combinatorial Parameters, and Learnability

Online Learning: Random Averages, Combinatorial Parameters, and Learnability Online Learning: Random Averages, Combinatorial Parameters, and Learnability Alexander Rakhlin Department of Statistics University of Pennsylvania Karthik Sridharan Toyota Technological Institute at Chicago

More information

CS261: Problem Set #3

CS261: Problem Set #3 CS261: Problem Set #3 Due by 11:59 PM on Tuesday, February 23, 2016 Instructions: (1) Form a group of 1-3 students. You should turn in only one write-up for your entire group. (2) Submission instructions:

More information