Efficient Minimax Strategies for Square Loss Games
Wouter M. Koolen (Queensland University of Technology and UC Berkeley), Alan Malek (University of California, Berkeley), Peter L. Bartlett (University of California, Berkeley and Queensland University of Technology)

Abstract

We consider online prediction problems where the loss between the prediction and the outcome is measured by the squared Euclidean distance and its generalization, the squared Mahalanobis distance. We derive the minimax solutions for the case where the prediction and action spaces are the simplex (this setup is sometimes called the Brier game) and the ℓ2 ball (this setup is related to Gaussian density estimation). We show that in both cases the value of each sub-game is a quadratic function of a simple statistic of the state, with coefficients that can be efficiently computed using an explicit recurrence relation. The resulting deterministic minimax strategy and randomized maximin strategy are linear functions of the statistic.

1 Introduction

We are interested in general strategies for sequential prediction and decision making (a.k.a. online learning) that improve their performance with experience. Since the early days of online learning, such learning tasks have been formalized as regret games. The learner interacts with an adversarial environment with the goal of performing almost as well as the best strategy from some fixed reference set. In many cases, we have efficient algorithms with an upper bound on the regret that meets the game-theoretic lower bound up to a small constant factor. In a few special cases, we have the exact minimax strategy, meaning that we understand the learning problem at all levels of detail. In even fewer cases can we also efficiently execute the minimax strategy. These cases serve as exemplars to guide our thinking about learning algorithms. In this paper we add two interesting examples to the canon of efficiently computable minimax strategies.
Our setup, as described in Figure 1, is as follows. The Learner and the Adversary play vectors a ∈ A and x ∈ X, upon which the Learner is penalized using the squared Euclidean distance ‖a − x‖² or its generalization, the squared Mahalanobis distance ‖a − x‖²_W = (a − x)^⊤ W (a − x), parametrized by a symmetric matrix W ≻ 0. After a sequence of T such interactions, we compare the loss of the Learner to the loss of the best fixed prediction a* ∈ A. In all our examples, this best fixed action in hindsight is the mean outcome a* = (1/T) Σ_{t=1}^T x_t, regardless of W. We use regret, the difference between the loss of the Learner and the loss of a*, to evaluate performance. The minimax regret for the T-round game, also known as the value of the game, is given by

V := min_{a_1} max_{x_1} ⋯ min_{a_T} max_{x_T} ( Σ_{t=1}^T ½ ‖a_t − x_t‖²_W − min_{a ∈ A} Σ_{t=1}^T ½ ‖a − x_t‖²_W ),
where the a_t range over actions A and the x_t range over outcomes X. The minimax strategy chooses each a_t, given all past outcomes x_1, …, x_{t−1}, to achieve this regret. Intuitively, the minimax regret is the regret if both players play optimally while assuming the other player is doing the same.

Our first example is the Brier game, where the action and outcome spaces are the probability simplex with K outcomes. The Brier game is traditionally popular in meteorology [Bri50]. Our second example is the ball game, where the action and outcome spaces are the Euclidean norm ball, i.e. A = X = {x ∈ R^K : ‖x‖ ≤ 1}. Even though we measure loss by the squared Mahalanobis distance, we play on the standard Euclidean norm ball. The ball game is related to Gaussian density estimation [TW00]. In each case we exhibit a strategy that can play a T-round game in O(TK) time. The algorithm spends O(TK + K³) time pre-processing the game, and then plays in O(K) time per round.

Given: T, W, A, X.
For t = 1, 2, …, T:
  Learner chooses prediction a_t ∈ A
  Adversary chooses outcome x_t ∈ X
  Learner incurs loss ½ ‖a_t − x_t‖²_W.

Figure 1: Protocol

2 Outline

We define our loss using the squared Mahalanobis distance, parametrized by a symmetric matrix W ≻ 0. We recover the squared Euclidean distance by choosing W = I. Our games always last T rounds. For some observed data x_1, …, x_n, the value-to-go for the remaining T − n rounds is given by

V(x_1, …, x_n) := min_{a_{n+1}} max_{x_{n+1}} ⋯ min_{a_T} max_{x_T} ( Σ_{t=n+1}^T ½ ‖a_t − x_t‖²_W − min_{a ∈ A} Σ_{t=1}^T ½ ‖a − x_t‖²_W ).

By definition, the minimax regret is V = V(ε), where ε is the empty sequence, and the value-to-go satisfies the recurrence

V(x_1, …, x_n) = − min_{a ∈ A} Σ_{t=1}^T ½ ‖a − x_t‖²_W   if n = T,
V(x_1, …, x_n) = min_{a_{n+1}} max_{x_{n+1}} ( ½ ‖a_{n+1} − x_{n+1}‖²_W + V(x_1, …, x_{n+1}) )   if n < T.

Our analysis for the two games proceeds in a similar manner. For some past history of plays x_1, …, x_n of length n, we summarize the state by s = Σ_{t=1}^n x_t and σ = Σ_{t=1}^n x_t^⊤ W x_t. As we will see, the value-to-go after n of T rounds can be written as V(s, σ, n); i.e. it only depends on the past plays through s and σ.
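As a small sanity check on these definitions, the minimax regret of a tiny instance can be brute-forced directly from the recurrence. The sketch below assumes the ½‖a − x‖²_W loss convention above, with W = I, K = 2, T = 2, and simplex action/outcome sets; restricting the adversary to simplex corners is valid here because the inner objective is convex in x (this is established in Section 4).

```python
import numpy as np

# Brute-force the minimax regret of the T = 2, K = 2 Brier game with W = I.
# The max over each x_t can be restricted to the simplex corners because the
# objective is convex in x_t; the min over each a_t is taken over a fine grid.
corners = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
grid = [np.array([t, 1 - t]) for t in np.linspace(0, 1, 2001)]

def best_loss(xs):
    m = sum(xs) / len(xs)                       # best action in hindsight: the mean
    return sum(0.5 * (m - x) @ (m - x) for x in xs)

def V1(x1):
    # value-to-go after one outcome: min over a2 of max over corner x2
    return min(max(0.5 * (a2 - x2) @ (a2 - x2) - best_loss([x1, x2])
                   for x2 in corners) for a2 in grid)

V1_at = [V1(x1) for x1 in corners]              # precompute for the two corners
V = min(max(0.5 * (a1 - corners[i]) @ (a1 - corners[i]) + V1_at[i]
            for i in range(2)) for a1 in grid)
print(V)  # 0.3125
```

The grids contain the exact continuous optimizers for this instance, so the brute-forced value comes out exactly as 5/16.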
More surprisingly, for each n, the value-to-go V(s, σ, n) is a quadratic function of s and a linear function of σ, under certain conditions on W. While it is straightforward to see that the terminal value V(s, σ, T) is quadratic in the state (this is easily checked by computing the loss of the best expert and using the first case of the recurrence), it is not at all obvious that propagating from V(s + x, σ + x^⊤ W x, n + 1) to V(s, σ, n), using the second case of the recurrence, preserves this structure. This compact representation of the value function is an essential ingredient for a computationally feasible algorithm. Many minimax approaches, such as normalized maximum likelihood [Sht87], have computational complexities that scale exponentially with the time horizon. We derive a strategy that can play in constant amortized time.

Why is this interesting? We go beyond previous work in a few directions. First, we exhibit two new games that belong to the tiny class admitting computationally feasible minimax algorithms. Second, we consider the setting with squared Mahalanobis loss, which allows the user intricate control over the penalization of different prediction errors. Our results clearly show how the learner should exploit this prioritization.

2.1 Related work

Repeated games with minimax strategies are frequently studied [CBL06] and, in online learning, minimax analysis has been applied to a variety of losses and repeated games; however, computationally feasible algorithms are the exception, not the rule. For example, consider log loss, first discussed in [Sht87]. While the minimax algorithm, Normalized Maximum Likelihood (NML), is well known [CBL06], it generally requires computation that is exponential in the time horizon, as one needs to aggregate over all data sequences. To our knowledge, there are two exceptions where efficient NML forecasters are possible: the multinomial case, where fast Fourier transforms may be exploited [KM05], and very particular exponential families that cause NML to be a Bayesian strategy [HB12, BGH+13].

The minimax optimal strategy is also known for: (i) the ball game with W = I [TW00] (our generalization to Mahalanobis W ≠ I results in fundamentally different strategies), (ii) the ball game with W = I and a constraint on the player's deviation from the current empirical minimizer [ABRT08], for which the optimal strategy is Follow-the-Leader, (iii) Lipschitz-bounded convex loss functions [ABRT08], (iv) experts with an L∞ bound [AWY08], and (v) static experts with absolute loss [CBS11]. While not guaranteed to be an exhaustive list, the previous paragraph demonstrates the rarity of tractable minimax algorithms.

3 The Offline Problem

The regret is defined as the difference between the loss of the algorithm and the loss of the best action in hindsight. Here we calculate that action and its loss.

Lemma 3.1. Suppose A ⊇ conv(X) (this will always hold in the settings we study). For data x_1, …, x_T ∈ X, the loss of the best action in hindsight equals

min_{a ∈ A} Σ_{t=1}^T ½ ‖a − x_t‖²_W = ½ Σ_{t=1}^T x_t^⊤ W x_t − (1/(2T)) (Σ_{t=1}^T x_t)^⊤ W (Σ_{t=1}^T x_t),   (3)

and the minimizer is the mean outcome a* = (1/T) Σ_{t=1}^T x_t.

Proof. The unconstrained minimizer and value are obtained by equating the derivative to zero and plugging in the solution. The assumption A ⊇ conv(X) ensures that the constraint a ∈ A is inactive.

The best action in hindsight is curiously independent of W, A and X.
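The closed form (3) and the optimality of the mean are easy to check numerically. A minimal sketch (the random instance below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 3, 5
M = rng.normal(size=(K, K)); W = M @ M.T + K * np.eye(K)   # symmetric, W > 0
X = rng.dirichlet(np.ones(K), size=T)                      # T outcomes in the simplex

a_star = X.mean(axis=0)                                    # best action in hindsight
loss = lambda a: sum(0.5 * (a - x) @ W @ (a - x) for x in X)

s = X.sum(axis=0)
closed_form = 0.5 * np.einsum('ti,ij,tj->', X, W, X) - s @ W @ s / (2 * T)
assert np.isclose(loss(a_star), closed_form)
# the mean is the unconstrained minimizer: nearby points are no better
for _ in range(100):
    d = rng.normal(size=K)
    assert loss(a_star) <= loss(a_star + 0.01 * d) + 1e-12
```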
This also shows that the follow-the-leader strategy, which plays a_t = (1/(t−1)) Σ_{s=1}^{t−1} x_s, is independent of W and A as well. As we shall see, the minimax strategy does not have this property.

4 Simplex Brier Game

In this section we analyze the Brier game. The action and outcome spaces are the probability simplex on K outcomes: A = X = △ := {x ∈ R^K_+ : Σ_k x_k = 1}. The loss is given by half the squared Mahalanobis distance, ½ ‖a − x‖²_W. We present a full minimax analysis of the T-round game: we calculate the game value, derive the maximin and minimax strategies, and discuss their efficient implementation.

The structure of this section is as follows. In Lemmas 4.1 and 4.2, the conclusions (value and optimizers) are obtained under the proviso that the given optimizer lies in the simplex. In our main result, Theorem 4.3, we apply these auxiliary results to our minimax analysis and argue that the maximizer indeed lies in the simplex. We immediately work from a general symmetric W ≻ 0 with the following lemma.

Lemma 4.1. Fix a symmetric matrix C ≻ 0 and a vector d. The optimization problem

max_{p ∈ △} −½ p^⊤ C p + d^⊤ p

has value

½ d^⊤ C⁻¹ d − (1^⊤ C⁻¹ d − 1)² / (2 · 1^⊤ C⁻¹ 1),

attained at the optimizer

p* = C⁻¹ d − ((1^⊤ C⁻¹ d − 1) / (1^⊤ C⁻¹ 1)) C⁻¹ 1,

provided that p* is in the simplex.
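The closed form in Lemma 4.1 can be verified against the stationarity conditions on a random instance. A minimal sketch (the instance is arbitrary; the gradient of the objective at the optimizer must be a constant multiple of the all-ones vector):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4
M = rng.normal(size=(K, K)); C = M @ M.T + K * np.eye(K)   # symmetric, C > 0
d = rng.normal(size=K)
one = np.ones(K)

Cinv_d = np.linalg.solve(C, d)
Cinv_1 = np.linalg.solve(C, one)
lam = (one @ Cinv_d - 1) / (one @ Cinv_1)
p = Cinv_d - lam * Cinv_1                                  # optimizer from Lemma 4.1
value = 0.5 * d @ Cinv_d - (one @ Cinv_d - 1) ** 2 / (2 * one @ Cinv_1)

assert np.isclose(one @ p, 1.0)                            # equality constraint holds
grad = d - C @ p                                           # stationarity: gradient = lam * 1
assert np.allclose(grad, lam * one)
assert np.isclose(-0.5 * p @ C @ p + d @ p, value)         # objective matches the value
```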
Proof. We solve for the optimal p. Introducing a Lagrange multiplier λ for the constraint Σ_k p_k = 1, we need p = C⁻¹(d − λ1), which results in λ = (1^⊤ C⁻¹ d − 1)/(1^⊤ C⁻¹ 1). Thus, the maximizer equals p* = C⁻¹(d − λ1), which produces objective value −½ p*^⊤ C p* + d^⊤ p*. The statement follows from simplification.

This lemma allows us to compute the value and saddle point whenever the future payoff is quadratic.

Lemma 4.2. Fix a symmetric matrix W ≻ 0, a symmetric matrix A such that W + A ⪰ 0, and a vector b. The optimization problem

min_a max_{x ∈ △} ½ ‖a − x‖²_W + ½ x^⊤ A x + b^⊤ x

achieves its value

½ c^⊤ W⁻¹ c − (1^⊤ W⁻¹ c − 1)² / (2 · 1^⊤ W⁻¹ 1),   where c = ½ diag(W + A) + b,

at the saddle point

a* = p* = W⁻¹ c + ((1 − 1^⊤ W⁻¹ c) / (1^⊤ W⁻¹ 1)) W⁻¹ 1;

the maximin strategy randomizes, playing x = e_i with probability p*_i, provided p* ⪰ 0.

Proof. The objective is convex in x for each a, as W + A ⪰ 0, so it is maximized at a corner x = e_k. We apply a min-max swap (see e.g. [Sio58]) and properness of the loss, which implies that a = p, and expand:

min_a max_{x ∈ △} ½ ‖a − x‖²_W + ½ x^⊤ A x + b^⊤ x
= min_a max_k ½ ‖a − e_k‖²_W + ½ e_k^⊤ A e_k + b^⊤ e_k
= max_p min_a E_{k∼p} [ ½ ‖a − e_k‖²_W + ½ e_k^⊤ A e_k + b^⊤ e_k ]
= max_p E_{k∼p} [ ½ ‖p − e_k‖²_W + ½ e_k^⊤ A e_k + b^⊤ e_k ]
= max_p −½ p^⊤ W p + (½ diag(W + A) + b)^⊤ p.

The proof is completed by applying Lemma 4.1 with C = W and d = c.

4.1 Minimax Analysis of the Brier Game

Next, we turn to computing V(s, σ, n) as a recursion and specifying the minimax and maximin strategies. However, for the value-to-go function to retain its quadratic form, we need an alignment condition on W. We say that W is aligned with the simplex if

W⁻¹ diag(W) ⪰ ((1^⊤ W⁻¹ diag(W) − 2) / (1^⊤ W⁻¹ 1)) W⁻¹ 1,   (4)

where ⪰ denotes an entry-wise inequality between vectors. Note that many matrices besides I satisfy this condition. We can now fully specify the value and strategies for the Brier game.

Theorem 4.3. Consider the T-round Brier game with Mahalanobis loss ½ ‖a − x‖²_W, with W satisfying the alignment condition (4). After n outcomes x_1, …, x_n with statistics s = Σ_{t=1}^n x_t and σ = Σ_{t=1}^n x_t^⊤ W x_t, the value-to-go is

V(s, σ, n) = (α_n/2) s^⊤ W s − σ/2 + ((1 − n α_n)/2) diag(W)^⊤ s + γ_n,
and the minimax and maximin strategies are given by

a*(s, σ, n) = p*(s, σ, n) = α_{n+1} s + ((1 − n α_{n+1})/2) ( W⁻¹ diag(W) − ((1^⊤ W⁻¹ diag(W) − 2) / (1^⊤ W⁻¹ 1)) W⁻¹ 1 )

(the maximin strategy plays corner e_i with probability p*_i), where the coefficients are defined recursively by

α_T = 1/T,   γ_T = 0,
α_n = α_{n+1} + α_{n+1}²,
γ_n = γ_{n+1} + ((1 − n α_{n+1})²/8) ( diag(W)^⊤ W⁻¹ diag(W) − (1^⊤ W⁻¹ diag(W) − 2)² / (1^⊤ W⁻¹ 1) ).

Proof. We prove this by induction, beginning at the end of the game and working backwards in time. Recall that the value at the end of the game is V(s, σ, T) = −min_a Σ_{t=1}^T ½ ‖a − x_t‖²_W and is given by Lemma 3.1. Matching coefficients, we find that V(s, σ, T) corresponds to α_T = 1/T and γ_T = 0. Now assume that V has the assumed form after n + 1 rounds. Using s and σ to denote the state after n rounds, we can write

V(s, σ, n) = min_a max_x ½ ‖a − x‖²_W + (α_{n+1}/2)(s + x)^⊤ W (s + x) − (σ + x^⊤ W x)/2 + ((1 − (n+1) α_{n+1})/2) diag(W)^⊤ (s + x) + γ_{n+1}.

Using Lemma 4.2 to evaluate the right-hand side produces a quadratic function of the state, and we can then match terms to find α_n and γ_n and the minimax and maximin strategies. The final step is checking the p* ⪰ 0 condition necessary to apply Lemma 4.2, which is equivalent to W being aligned with the simplex. See the appendix for a complete proof.

This full characterization of the game allows us to derive the following minimax regret bound.

Theorem 4.4. Let W satisfy the alignment condition (4). The minimax regret of the T-round simplex game satisfies

V ≤ ((1 + ln T)/8) ( diag(W)^⊤ W⁻¹ diag(W) − (1^⊤ W⁻¹ diag(W) − 2)² / (1^⊤ W⁻¹ 1) ).

Proof. The regret is equal to the value of the game, V = V(0, 0, 0) = γ_0. First observe that

(1 − n α_{n+1})² = 1 − 2n α_{n+1} + n² α_{n+1}² = 1 − 2n α_{n+1} + n² (α_n − α_{n+1}) = 1 + n² α_n − ((n + 1)² − 1) α_{n+1},

using α_{n+1}² = α_n − α_{n+1}. After summing over n, the last two terms telescope, and we find

Σ_{n=0}^{T−1} (1 − n α_{n+1})² = T − T² α_T + Σ_{n=1}^T α_n = Σ_{n=1}^T α_n.

Each α_n can be bounded by 1/n, as observed in [TW00]. In the base case n = T this holds with equality, and for n < T we have

α_n = α_{n+1} + α_{n+1}² ≤ 1/(n+1) + 1/(n+1)² = (n(n+2))/(n(n+1)²) ≤ 1/n.

It follows that Σ_{n=1}^T α_n ≤ Σ_{n=1}^T 1/n ≤ 1 + ln T, and hence γ_0 ≤ ((1 + ln T)/8) (diag(W)^⊤ W⁻¹ diag(W) − (1^⊤ W⁻¹ diag(W) − 2)² / (1^⊤ W⁻¹ 1)), as desired.
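The coefficient recurrence and the minimax play of Theorem 4.3 are cheap to compute. A minimal sketch, assuming the formulas stated in Theorem 4.3 (function names are illustrative):

```python
import numpy as np

def alphas(T):
    # backward recurrence from Theorem 4.3: alpha_T = 1/T, alpha_n = alpha_{n+1} + alpha_{n+1}^2
    a = [0.0] * (T + 1)
    a[T] = 1.0 / T
    for n in range(T - 1, 0, -1):
        a[n] = a[n + 1] + a[n + 1] ** 2
    return a

def brier_minimax_action(W, s, n, alpha_next):
    # minimax play after n outcomes, with state s = sum of past outcomes and
    # alpha_next = alpha_{n+1}; formula transcribed from Theorem 4.3
    one = np.ones(W.shape[0])
    Winv1 = np.linalg.solve(W, one)
    Winvd = np.linalg.solve(W, np.diag(W))
    u = one @ Winv1                            # 1' W^{-1} 1
    q = one @ Winvd                            # 1' W^{-1} diag(W)
    return alpha_next * s + 0.5 * (1 - n * alpha_next) * (Winvd - (q - 2) / u * Winv1)

a = alphas(1000)
assert all(a[n] <= 1.0 / n for n in range(1, 1001))    # alpha_n <= 1/n
assert sum(a[1:]) <= 7.91                              # about 1 + ln(1000)
```

The play always sums to one (so it lies on the simplex's affine hull), and for W = I it is nonnegative as well, matching the alignment discussion.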
5 Norm Ball Game

This section parallels the previous one. Here, we consider the online game with Mahalanobis loss and A = X = B := {x ∈ R^K : ‖x‖ ≤ 1}, the 2-norm Euclidean ball (not the Mahalanobis ball). We show that the value-to-go function is always quadratic in s and linear in σ, and derive the minimax and maximin strategies.

Lemma 5.1. Fix a symmetric matrix A and a vector b, and assume W + A ⪰ 0. Let λ_max be the largest eigenvalue of W + A and v_max the corresponding eigenvector. If ‖(λ_max I − A)⁻¹ b‖ ≤ 1, then the optimization problem

min_a max_{‖x‖ ≤ 1} ½ ‖a − x‖²_W + ½ x^⊤ A x + b^⊤ x

has value ½ b^⊤ (λ_max I − A)⁻¹ b + λ_max/2, minimax strategy a* = (λ_max I − A)⁻¹ b, and a randomized maximin strategy that plays two unit-length vectors, with

Pr( x = a⊥ ± √(1 − ‖a⊥‖²) v_max ) = ½ (1 ± (a*^⊤ v_max)/√(1 − ‖a⊥‖²)),

where a⊥ and a∥ are the components of a* perpendicular and parallel to v_max.

Proof. As the objective is convex in x, the inner maximum must be on the boundary and hence at a unit vector x. Introduce a Lagrange multiplier λ/2 for the constraint x^⊤ x ≤ 1 to get the Lagrangian

min_a inf_{λ ≥ 0} max_x ½ ‖a − x‖²_W + ½ x^⊤ A x + b^⊤ x + (λ/2)(1 − x^⊤ x).

This is concave in x if W + A − λI ⪯ 0, that is, λ ≥ λ_max. Differentiating yields the optimizer x = (W + A − λI)⁻¹ (W a − b), which leaves us with an optimization in only a and λ:

min_a inf_{λ ≥ λ_max} ½ a^⊤ W a − ½ (W a − b)^⊤ (W + A − λI)⁻¹ (W a − b) + λ/2.

Since the infima are over closed sets, we can exchange their order. Unconstrained optimization over a results in a* = (λI − A)⁻¹ b. Evaluating the objective at a* and using W a* − b = (W + A − λI)(λI − A)⁻¹ b results in

inf_{λ ≥ λ_max} ½ b^⊤ (λI − A)⁻¹ b + λ/2 = inf_{λ ≥ λ_max} ½ Σ_i (u_i^⊤ b)²/(λ − λ_i) + λ/2,

using the spectral decomposition A = Σ_i λ_i u_i u_i^⊤. For λ ≥ λ_max, we have λ ≥ λ_i. Taking derivatives, provided ‖(λ_max I − A)⁻¹ b‖ ≤ 1, this function is non-decreasing in λ on [λ_max, ∞), and so attains its infimum at λ_max. Thus, when the assumed inequality is satisfied, the a* above is minimax for the given x.

To obtain the maximin strategy, we can take the usual convexification where the Adversary plays distributions P over the unit sphere. This allows us to swap the infimum and supremum (see e.g.
Sion's minimax theorem [Sio58]) and obtain an equivalent optimization problem. We then see that the objective depends only on the mean µ = E[x] and second moment D = E[x x^⊤] of the distribution P. The characterization in [KNW13] tells us that (µ, D) are the first two moments of a distribution on unit vectors iff tr(D) = 1 and D ⪰ µµ^⊤. Then, our usual min-max swap yields

V = sup_P min_a E_{x∼P} [ ½ a^⊤ W a − a^⊤ W x + ½ x^⊤ W x + ½ x^⊤ A x + b^⊤ x ]
= sup_{µ,D} min_a ½ a^⊤ W a − a^⊤ W µ + ½ tr((W + A) D) + b^⊤ µ
= sup_{µ,D} −½ µ^⊤ W µ + ½ tr((W + A) D) + b^⊤ µ
= −½ a^⊤ W a + b^⊤ a + sup_{D : tr(D)=1, D ⪰ a a^⊤} ½ tr((W + A) D),
Figure 2: Illustration of the maximin distribution from Lemma 5.1. The mixture of red unit vectors with mean µ has second moment D = µµ^⊤ + (1 − µ^⊤µ) v_max v_max^⊤.

where the second equality uses a = µ and the third uses the saddle-point condition µ = a*. The matrix D, with constraint tr(D) = 1, now seeks to align with the largest eigenvector of W + A, but it also has to respect the constraint D ⪰ a a^⊤. We re-parameterize by C = D − a a^⊤. We then need to find

max_{C ⪰ 0, tr(C) = 1 − a^⊤ a} tr((W + A) C).

By linearity of the objective the maximizer is of rank 1, and hence this is a scaled maximum-eigenvalue problem, with solution given by C = (1 − a^⊤ a) v_max v_max^⊤, so that D = a a^⊤ + (1 − a^⊤ a) v_max v_max^⊤. This essentially reduces finding P to a 2-dimensional problem, which can be solved in closed form [KNW13]. It is easy to verify that the mixture in the lemma has the desired mean a and second moment D. See Figure 2 for the geometric intuition. Notice that both the minimax and maximin strategies depend on W only through λ_max and v_max.

5.1 Minimax Analysis of the Ball Game

With the above lemma, we can compute the value and strategies for the ball game in a manner analogous to Theorem 4.3. Again, we find that the value function at the end of the game is quadratic in the state and, surprisingly, remains quadratic under the backwards induction.

Theorem 5.2. Consider the T-round ball game with loss ½ ‖a − x‖²_W. After n rounds, the value-to-go for a state with statistics s = Σ_{t=1}^n x_t and σ = Σ_{t=1}^n x_t^⊤ W x_t is

V(s, σ, n) = ½ s^⊤ A_n s − σ/2 + γ_n.

The minimax strategy plays

a*(s, σ, n) = (λ_max I − A_{n+1} + W)⁻¹ A_{n+1} s,

and the maximin strategy plays two unit-length vectors with

Pr( x = a⊥ ± √(1 − ‖a⊥‖²) v_max ) = ½ (1 ± (a*^⊤ v_max)/√(1 − ‖a⊥‖²)),

where λ_max and v_max correspond to the largest eigenvalue of A_{n+1}, and a⊥ and a∥ are the components of a* perpendicular and parallel to v_max. The coefficients A_n and γ_n are determined recursively by the base case A_T = W/T and γ_T = 0 and the recursion

A_n = A_{n+1} (W + λ_max I − A_{n+1})⁻¹ A_{n+1} + A_{n+1},
γ_n = λ_max/2 + γ_{n+1}.
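The two-point maximin distribution of Lemma 5.1 (illustrated in Figure 2) is easy to check numerically. A minimal sketch; the construction is transcribed from the lemma, with an arbitrary unit vector standing in for v_max and an arbitrary mean a with ‖a‖ < 1:

```python
import numpy as np

rng = np.random.default_rng(3)
K = 4
v = rng.normal(size=K); v /= np.linalg.norm(v)           # stand-in for v_max
a = rng.normal(size=K); a *= 0.9 / np.linalg.norm(a)     # desired mean, ||a|| < 1

a_par = (a @ v) * v                                      # component parallel to v
a_perp = a - a_par                                       # component perpendicular to v
r = np.sqrt(1 - a_perp @ a_perp)
x_plus, x_minus = a_perp + r * v, a_perp - r * v         # the two support points
p_plus = 0.5 * (1 + (a @ v) / r)
p_minus = 1 - p_plus

# both support points are unit vectors
assert np.isclose(np.linalg.norm(x_plus), 1) and np.isclose(np.linalg.norm(x_minus), 1)
# first moment is a, second moment is a a' + (1 - a'a) v v'
mean = p_plus * x_plus + p_minus * x_minus
assert np.allclose(mean, a)
D = p_plus * np.outer(x_plus, x_plus) + p_minus * np.outer(x_minus, x_minus)
assert np.allclose(D, np.outer(a, a) + (1 - a @ a) * np.outer(v, v))
```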
Proof outline. The proof is by induction on the number n of rounds played. In the base case n = T we find (see (3)) A_T = W/T and γ_T = 0. For the induction step, we need to calculate

V(s, σ, n) = min_a max_x ½ ‖a − x‖²_W + V(s + x, σ + x^⊤ W x, n + 1).

Using the induction hypothesis, we expand the right-hand side to

min_a max_x ½ ‖a − x‖²_W + ½ (s + x)^⊤ A_{n+1} (s + x) − (σ + x^⊤ W x)/2 + γ_{n+1},

which we can evaluate by applying Lemma 5.1 with A = A_{n+1} − W and b = A_{n+1} s. Collecting terms and matching with V(s, σ, n) = ½ s^⊤ A_n s − σ/2 + γ_n yields the recursion for A_n and γ_n, as well as the given minimax and maximin strategies. As before, much of the algebra has been moved to the appendix.

Understanding the eigenvalues of A_n. As we have seen from the A_n recursion, the eigensystem of A_n is always the same as that of W. Thus, we can characterize the minimax strategy completely by its effect on the eigenvalues of W. Denote the eigenvalues of A_n and W by λ^i_n and ν_i, respectively, with λ^max_n the largest eigenvalue of A_n. The eigenvalues follow

λ^i_n = λ^i_{n+1} + (λ^i_{n+1})² / (ν_i + λ^max_{n+1} − λ^i_{n+1}) = λ^i_{n+1} (ν_i + λ^max_{n+1}) / (ν_i + λ^max_{n+1} − λ^i_{n+1}),

which leaves the order of the λ^i_n unchanged. The largest eigenvalue satisfies the recurrence λ^max_T/ν_max = 1/T and λ^max_n/ν_max = λ^max_{n+1}/ν_max + (λ^max_{n+1}/ν_max)², which, remarkably, is the same recurrence as for the α_n parameter in the Brier game, i.e. λ^max_n = α_n ν_max. This observation is the key to analyzing the minimax regret.

Theorem 5.3. The minimax regret of the T-round ball game satisfies γ_0 ≤ ((1 + ln T)/2) λ_max(W).

Proof. We have V = V(0, 0, 0) = γ_0 = ½ Σ_{n=1}^T λ^max_n = ½ λ_max(W) Σ_{n=1}^T α_n. The proof of Theorem 4.4 gives the bound Σ_{n=1}^T α_n ≤ 1 + ln T.

Taking stock, we find that the minimax regrets of the Brier game (Theorems 4.3 and 4.4) and the ball game (Theorems 5.2 and 5.3) have identical dependence on the horizon T but differ in a complexity factor arising from the interaction of the action space and the loss matrix W.

6 Conclusion

In this paper, we have presented two games that, unexpectedly, have computationally efficient minimax strategies.
While the structure of the squared Mahalanobis distance is important, it is the interplay between the loss and the constraint set that allows efficient calculation of the backwards induction, value-to-go, and achieving strategies. For example, the squared Mahalanobis game with ℓ∞-ball action spaces does not admit a quadratic value-to-go unless W = I. We emphasize the low computational cost of this method despite the exponential blow-up in the size of the state space. In the Brier game, the α_n coefficients need to be precomputed, which can be done in O(T) time. Similarly, computation of the eigenvalues of the A_n coefficients for the ball game can be done in O(TK + K³) time. Then, at each iteration of the algorithm, only matrix-vector multiplications between the current state and the precomputed parameters are required. Hence, playing either T-round game requires O(TK) time. Unfortunately, as is the case with most minimax algorithms, the time horizon must be known in advance.

There are many future directions. We are currently pursuing a characterization of action spaces that permit quadratic value functions under squared Mahalanobis loss, and investigating connections between losses and families of value functions closed under backwards induction. There is some notion of conjugacy between losses, value-to-go functions, and action spaces, but a generalization seems difficult: the Brier game and the ball game worked out for seemingly very different reasons.
References

[ABRT08] Jacob Abernethy, Peter L. Bartlett, Alexander Rakhlin, and Ambuj Tewari. Optimal strategies and minimax lower bounds for online convex games. In Servedio and Zhang [SZ08].

[AWY08] Jacob Abernethy, Manfred K. Warmuth, and Joel Yellin. When random play is optimal against an adversary. In Servedio and Zhang [SZ08].

[BGH+13] Peter L. Bartlett, Peter Grünwald, Peter Harremoës, Fares Hedayati, and Wojciech Kotłowski. Horizon-independent optimal prediction with log-loss in exponential families. CoRR, 2013.

[Bri50] Glenn W. Brier. Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1):1-3, 1950.

[CBL06] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

[CBS11] Nicolò Cesa-Bianchi and Ohad Shamir. Efficient online learning via randomized rounding. In J. Shawe-Taylor, R.S. Zemel, P. Bartlett, F.C.N. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, 2011.

[HB12] Fares Hedayati and Peter L. Bartlett. Exchangeability characterizes optimality of sequential normalized maximum likelihood and Bayesian prediction with Jeffreys prior. In International Conference on Artificial Intelligence and Statistics, 2012.

[KM05] Petri Kontkanen and Petri Myllymäki. A fast normalized maximum likelihood algorithm for multinomial data. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), 2005.

[KNW13] Wouter M. Koolen, Jiazhong Nie, and Manfred K. Warmuth. Learning a set of directions. In Shai Shalev-Shwartz and Ingo Steinwart, editors, Proceedings of the 26th Annual Conference on Learning Theory (COLT), June 2013.

[Sht87] Yurii Mikhailovich Shtar'kov. Universal sequential coding of single messages. Problemy Peredachi Informatsii, 23(3):3-17, 1987.

[Sio58] Maurice Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171-176, 1958.

[SZ08] Rocco A. Servedio and Tong Zhang, editors. 21st Annual Conference on Learning Theory (COLT 2008), Helsinki, Finland, July 9-12, 2008. Omnipress, 2008.

[TW00] Eiji Takimoto and Manfred K. Warmuth. The minimax strategy for Gaussian density estimation. In 13th Annual Conference on Learning Theory (COLT), pages 100-106, 2000.
A Proofs

Proof of Theorem 4.3. We prove the theorem by induction. Recall that V(s, σ, T), the value at the end of the game, already has the stated form: by Lemma 3.1, V(s, σ, T) = (1/(2T)) s^⊤ W s − σ/2, corresponding to α_T = 1/T and γ_T = 0. Adding a constant to the objective in Lemma 4.2 only adds the same offset to the value, without changing the saddle point. Now, assume the induction hypothesis for n + 1 ≤ T rounds. Using the abbreviation d = diag(W), we then need to evaluate

V(s, σ, n) = min_a max_x ½ ‖a − x‖²_W + (α_{n+1}/2)(s + x)^⊤ W (s + x) − (σ + x^⊤ W x)/2 + ((1 − (n+1) α_{n+1})/2) d^⊤ (s + x) + γ_{n+1}
= min_a max_x ½ ‖a − x‖²_W + ((α_{n+1} − 1)/2) x^⊤ W x + ( α_{n+1} W s + ((1 − (n+1) α_{n+1})/2) d )^⊤ x + (α_{n+1}/2) s^⊤ W s + ((1 − (n+1) α_{n+1})/2) d^⊤ s − σ/2 + γ_{n+1}.

Applying Lemma 4.2 with A = (α_{n+1} − 1) W and b = α_{n+1} W s + ((1 − (n+1) α_{n+1})/2) d (so that W + A = α_{n+1} W ⪰ 0) gives

c = ½ diag(α_{n+1} W) + b = α_{n+1} W s + ((1 − n α_{n+1})/2) d.

Expanding the value of Lemma 4.2 and collecting terms (using 1^⊤ s = n), the s^⊤ W s coefficient becomes (α_{n+1} + α_{n+1}²)/2, the d^⊤ s coefficient becomes ((1 − (n+1) α_{n+1})/2) + α_{n+1} (1 − n α_{n+1})/2 = (1 − n(α_{n+1} + α_{n+1}²))/2, and the constant term becomes

γ_{n+1} + ((1 − n α_{n+1})²/8) ( d^⊤ W⁻¹ d − (1^⊤ W⁻¹ d − 2)² / (1^⊤ W⁻¹ 1) ),

where we used 1^⊤ W⁻¹ c − 1 = n α_{n+1} − 1 + ((1 − n α_{n+1})/2) 1^⊤ W⁻¹ d = ((1 − n α_{n+1})/2)(1^⊤ W⁻¹ d − 2). This establishes the induction step upon setting

α_n = α_{n+1} + α_{n+1}²,
γ_n = γ_{n+1} + ((1 − n α_{n+1})²/8) ( d^⊤ W⁻¹ d − (1^⊤ W⁻¹ d − 2)² / (1^⊤ W⁻¹ 1) ).

The strategy from Lemma 4.2 simplifies to

p* = W⁻¹ c + ((1 − 1^⊤ W⁻¹ c)/(1^⊤ W⁻¹ 1)) W⁻¹ 1 = α_{n+1} s + ((1 − n α_{n+1})/2) ( W⁻¹ d − ((1^⊤ W⁻¹ d − 2)/(1^⊤ W⁻¹ 1)) W⁻¹ 1 ).

Finally, we need to verify that the strategies above respect the constraints of the game, i.e. that a* and p* are in the simplex; otherwise, the above calculations do not correspond to the minimax strategy. One checks directly that 1^⊤ p* = n α_{n+1} + (1 − n α_{n+1}) = 1, so it remains to show that p* ⪰ 0 for every reachable state, i.e. that for all s ⪰ 0 with 1^⊤ s = n,

α_{n+1} s + ((1 − n α_{n+1})/2) ( W⁻¹ d − ((1^⊤ W⁻¹ d − 2)/(1^⊤ W⁻¹ 1)) W⁻¹ 1 ) ⪰ 0.

As this needs to hold for all such s, including those with zero entries, and since n α_{n+1} ≤ n α_n ≤ 1, we need in fact that, component-wise,

W⁻¹ d ⪰ ((1^⊤ W⁻¹ d − 2)/(1^⊤ W⁻¹ 1)) W⁻¹ 1,

which is exactly the alignment condition (4).

Proof of Theorem 5.2. Recall that we needed to calculate

V(s, σ, n) = min_a max_x ½ ‖a − x‖²_W + ½ (s + x)^⊤ A_{n+1} (s + x) − (σ + x^⊤ W x)/2 + γ_{n+1},

which may be reorganized to

½ s^⊤ A_{n+1} s − σ/2 + γ_{n+1} + min_a max_x ( ½ ‖a − x‖²_W + ½ x^⊤ (A_{n+1} − W) x + (A_{n+1} s)^⊤ x ).

We now apply Lemma 5.1 with A = A_{n+1} − W and b = A_{n+1} s. Let λ^max_{n+1} be the largest eigenvalue of (A_{n+1} − W) + W = A_{n+1}. If we have

‖ (λ^max_{n+1} I − A_{n+1} + W)⁻¹ A_{n+1} s ‖ ≤ 1,   (5)

then the value equals

½ s^⊤ A_{n+1} s − σ/2 + γ_{n+1} + λ^max_{n+1}/2 + ½ s^⊤ A_{n+1} (W + λ^max_{n+1} I − A_{n+1})⁻¹ A_{n+1} s.

So if we assert that V(s, σ, n) = ½ s^⊤ A_n s − σ/2 + γ_n, we find that we must have the correspondence

A_n = A_{n+1} (W + λ^max_{n+1} I − A_{n+1})⁻¹ A_{n+1} + A_{n+1},
γ_n = λ^max_{n+1}/2 + γ_{n+1}.

The final piece is to check condition (5). Let ν_i denote the eigenvalues of W. Observe that the largest eigenvalue of (W + λ^max_{n+1} I − A_{n+1})⁻¹ A_{n+1} is λ^max_{n+1}/ν_max, because the mapping λ ↦ λ/(ν + λ^max_{n+1} − λ) is monotonic in λ ≤ λ^max_{n+1}. The proof now follows by combining the recurrence for the largest eigenvalue from Section 5.1 with the bound on α_n from the proof of Theorem 4.4:

‖ (W + λ^max_{n+1} I − A_{n+1})⁻¹ A_{n+1} s ‖ ≤ ‖s‖ λ^max_{n+1}/ν_max = ‖s‖ α_{n+1} ≤ n/(n + 1) < 1.
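The eigenvalue bookkeeping used in this proof can be checked numerically. A minimal sketch, assuming the recurrences as stated in Section 5.1 (the eigenvalues of W below are arbitrary):

```python
import numpy as np

# Run the per-eigenvalue recursion of A_n backward from A_T = W/T and check
# that (i) the largest eigenvalue tracks the Brier-game alpha_n recurrence,
# (ii) the eigenvalue order is preserved, and (iii) the norm bound behind
# condition (5) holds.
T = 50
nu = np.array([0.5, 1.0, 2.0])                 # eigenvalues of W (assumed instance)
lam = {T: nu / T}                              # eigenvalues of A_T = W / T
alpha = {T: 1.0 / T}
for n in range(T - 1, -1, -1):
    lmax = lam[n + 1].max()
    lam[n] = lam[n + 1] + lam[n + 1] ** 2 / (nu + lmax - lam[n + 1])
    alpha[n] = alpha[n + 1] + alpha[n + 1] ** 2

for n in range(T + 1):
    assert np.isclose(lam[n].max(), alpha[n] * nu.max())   # lambda_max_n = alpha_n * nu_max
    assert np.all(np.diff(lam[n]) > 0)                     # order preserved
for n in range(T):
    assert n * alpha[n + 1] < 1                            # ||s|| alpha_{n+1} <= n/(n+1) < 1
```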
More informationLecture 4: Lower Bounds (ending); Thompson Sampling
CMSC 858G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling Instructor: Alex Slivkins Scribed by: Guowei Sun,Cheng Jie 1 Lower bounds on regret (ending) Recap from
More informationOnline Learning for Time Series Prediction
Online Learning for Time Series Prediction Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values.
More informationOnline Convex Optimization
Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed
More informationAgnostic Online learnability
Technical Report TTIC-TR-2008-2 October 2008 Agnostic Online learnability Shai Shalev-Shwartz Toyota Technological Institute Chicago shai@tti-c.org ABSTRACT We study a fundamental question. What classes
More informationAdvanced Machine Learning
Advanced Machine Learning Learning and Games MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Normal form games Nash equilibrium von Neumann s minimax theorem Correlated equilibrium Internal
More informationEE 227A: Convex Optimization and Applications October 14, 2008
EE 227A: Convex Optimization and Applications October 14, 2008 Lecture 13: SDP Duality Lecturer: Laurent El Ghaoui Reading assignment: Chapter 5 of BV. 13.1 Direct approach 13.1.1 Primal problem Consider
More informationOLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research
OLSO Online Learning and Stochastic Optimization Yoram Singer August 10, 2016 Google Research References Introduction to Online Convex Optimization, Elad Hazan, Princeton University Online Learning and
More informationSequential prediction with coded side information under logarithmic loss
under logarithmic loss Yanina Shkel Department of Electrical Engineering Princeton University Princeton, NJ 08544, USA Maxim Raginsky Department of Electrical and Computer Engineering Coordinated Science
More information3. THE SIMPLEX ALGORITHM
Optimization. THE SIMPLEX ALGORITHM DPK Easter Term. Introduction We know that, if a linear programming problem has a finite optimal solution, it has an optimal solution at a basic feasible solution (b.f.s.).
More informationFrom Bandits to Experts: A Tale of Domination and Independence
From Bandits to Experts: A Tale of Domination and Independence Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Domination and Independence 1 / 1 From Bandits to Experts: A
More informationMDP Preliminaries. Nan Jiang. February 10, 2019
MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process
More informationPerceptron Mistake Bounds
Perceptron Mistake Bounds Mehryar Mohri, and Afshin Rostamizadeh Google Research Courant Institute of Mathematical Sciences Abstract. We present a brief survey of existing mistake bounds and introduce
More informationConvex Optimization in Classification Problems
New Trends in Optimization and Computational Algorithms December 9 13, 2001 Convex Optimization in Classification Problems Laurent El Ghaoui Department of EECS, UC Berkeley elghaoui@eecs.berkeley.edu 1
More information280 Eiji Takimoto and Manfred K. Warmuth
The Last-Step Minimax Algorithm Eiji Takimoto 1? and Manfred K. Warmuth?? 1 Graduate School of Information Sciences, Tohoku University Sendai, 980-8579, Japan. t@ecei.tohoku.ac.jp Computer Science Department,
More informationCS281B/Stat241B. Statistical Learning Theory. Lecture 1.
CS281B/Stat241B. Statistical Learning Theory. Lecture 1. Peter Bartlett 1. Organizational issues. 2. Overview. 3. Probabilistic formulation of prediction problems. 4. Game theoretic formulation of prediction
More informationOnline Learning and Online Convex Optimization
Online Learning and Online Convex Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Learning 1 / 49 Summary 1 My beautiful regret 2 A supposedly fun game
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More informationAdaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Vicente L. Malave February 23, 2011 Outline Notation minimize a number of functions φ
More informationEfficiently Training Sum-Product Neural Networks using Forward Greedy Selection
Efficiently Training Sum-Product Neural Networks using Forward Greedy Selection Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Greedy Algorithms, Frank-Wolfe and Friends
More informationNo-Regret Learning in Convex Games
Geoffrey J. Gordon Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213 Amy Greenwald Casey Marks Department of Computer Science, Brown University, Providence, RI 02912 Abstract
More informationNear-Optimal Algorithms for Online Matrix Prediction
JMLR: Workshop and Conference Proceedings vol 23 (2012) 38.1 38.13 25th Annual Conference on Learning Theory Near-Optimal Algorithms for Online Matrix Prediction Elad Hazan Technion - Israel Inst. of Tech.
More informationBeyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization
JMLR: Workshop and Conference Proceedings vol (2010) 1 16 24th Annual Conference on Learning heory Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization
More informationOnline Kernel PCA with Entropic Matrix Updates
Dima Kuzmin Manfred K. Warmuth Computer Science Department, University of California - Santa Cruz dima@cse.ucsc.edu manfred@cse.ucsc.edu Abstract A number of updates for density matrices have been developed
More informationCompeting With Strategies
JMLR: Workshop and Conference Proceedings vol (203) 27 Competing With Strategies Wei Han Alexander Rakhlin Karthik Sridharan wehan@wharton.upenn.edu rakhlin@wharton.upenn.edu skarthik@wharton.upenn.edu
More informationBlackwell s Approachability Theorem: A Generalization in a Special Case. Amy Greenwald, Amir Jafari and Casey Marks
Blackwell s Approachability Theorem: A Generalization in a Special Case Amy Greenwald, Amir Jafari and Casey Marks Department of Computer Science Brown University Providence, Rhode Island 02912 CS-06-01
More informationCS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm
CS61: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm Tim Roughgarden February 9, 016 1 Online Algorithms This lecture begins the third module of the
More informationTheory and Applications of A Repeated Game Playing Algorithm. Rob Schapire Princeton University [currently visiting Yahoo!
Theory and Applications of A Repeated Game Playing Algorithm Rob Schapire Princeton University [currently visiting Yahoo! Research] Learning Is (Often) Just a Game some learning problems: learn from training
More informationLecture Note 5: Semidefinite Programming for Stability Analysis
ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State
More informationThe FTRL Algorithm with Strongly Convex Regularizers
CSE599s, Spring 202, Online Learning Lecture 8-04/9/202 The FTRL Algorithm with Strongly Convex Regularizers Lecturer: Brandan McMahan Scribe: Tamara Bonaci Introduction In the last lecture, we talked
More informationLecture 16: Perceptron and Exponential Weights Algorithm
EECS 598-005: Theoretical Foundations of Machine Learning Fall 2015 Lecture 16: Perceptron and Exponential Weights Algorithm Lecturer: Jacob Abernethy Scribes: Yue Wang, Editors: Weiqing Yu and Andrew
More informationOnline Kernel PCA with Entropic Matrix Updates
Online Kernel PCA with Entropic Matrix Updates Dima Kuzmin Manfred K. Warmuth University of California - Santa Cruz ICML 2007, Corvallis, Oregon April 23, 2008 D. Kuzmin, M. Warmuth (UCSC) Online Kernel
More informationThe Online Approach to Machine Learning
The Online Approach to Machine Learning Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53 Summary 1 My beautiful regret 2 A supposedly fun game I
More informationTutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning
Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret
More informationCompeting With Strategies
Competing With Strategies Wei Han Univ. of Pennsylvania Alexander Rakhlin Univ. of Pennsylvania February 3, 203 Karthik Sridharan Univ. of Pennsylvania Abstract We study the problem of online learning
More informationDuke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014
Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document
More informationBandits for Online Optimization
Bandits for Online Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Bandits for Online Optimization 1 / 16 The multiarmed bandit problem... K slot machines Each
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationOptimization for Machine Learning
Optimization for Machine Learning Editors: Suvrit Sra suvrit@gmail.com Max Planck Insitute for Biological Cybernetics 72076 Tübingen, Germany Sebastian Nowozin Microsoft Research Cambridge, CB3 0FB, United
More informationOnline Optimization : Competing with Dynamic Comparators
Ali Jadbabaie Alexander Rakhlin Shahin Shahrampour Karthik Sridharan University of Pennsylvania University of Pennsylvania University of Pennsylvania Cornell University Abstract Recent literature on online
More informationLogarithmic Regret Algorithms for Strongly Convex Repeated Games
Logarithmic Regret Algorithms for Strongly Convex Repeated Games Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci & Eng, The Hebrew University, Jerusalem 91904, Israel 2 Google Inc 1600
More informationWe describe the generalization of Hazan s algorithm for symmetric programming
ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,
More informationOnline Learning, Mistake Bounds, Perceptron Algorithm
Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which
More informationSmooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics
Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics Sergiu Hart August 2016 Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics Sergiu Hart Center for the Study of Rationality
More informationOn the Worst-case Analysis of Temporal-difference Learning Algorithms
Machine Learning, 22(1/2/3:95-121, 1996. On the Worst-case Analysis of Temporal-difference Learning Algorithms schapire@research.att.com ATT Bell Laboratories, 600 Mountain Avenue, Room 2A-424, Murray
More informationBayesian Properties of Normalized Maximum Likelihood and its Fast Computation
Bayesian Properties of Normalized Maximum Likelihood and its Fast Computation Andre Barron Department of Statistics Yale University Email: andre.barron@yale.edu Teemu Roos HIIT & Dept of Computer Science
More informationOn the Generalization Ability of Online Strongly Convex Programming Algorithms
On the Generalization Ability of Online Strongly Convex Programming Algorithms Sham M. Kakade I Chicago Chicago, IL 60637 sham@tti-c.org Ambuj ewari I Chicago Chicago, IL 60637 tewari@tti-c.org Abstract
More informationYevgeny Seldin. University of Copenhagen
Yevgeny Seldin University of Copenhagen Classical (Batch) Machine Learning Collect Data Data Assumption The samples are independent identically distributed (i.i.d.) Machine Learning Prediction rule New
More informationAdvanced Machine Learning
Advanced Machine Learning Learning with Large Expert Spaces MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Problem Learning guarantees: R T = O( p T log N). informative even for N very large.
More informationOnline Passive-Aggressive Algorithms
Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il
More informationLecture Support Vector Machine (SVM) Classifiers
Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in
More informationThe Interplay Between Stability and Regret in Online Learning
The Interplay Between Stability and Regret in Online Learning Ankan Saha Department of Computer Science University of Chicago ankans@cs.uchicago.edu Prateek Jain Microsoft Research India prajain@microsoft.com
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to
More informationAdvanced Machine Learning
Advanced Machine Learning Follow-he-Perturbed Leader MEHRYAR MOHRI MOHRI@ COURAN INSIUE & GOOGLE RESEARCH. General Ideas Linear loss: decomposition as a sum along substructures. sum of edge losses in a
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationTutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning.
Tutorial: PART 1 Online Convex Optimization, A Game- Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research
More informationVariational Principal Components
Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings
More informationOnline Learning with Feedback Graphs
Online Learning with Feedback Graphs Nicolò Cesa-Bianchi Università degli Studi di Milano Joint work with: Noga Alon (Tel-Aviv University) Ofer Dekel (Microsoft Research) Tomer Koren (Technion and Microsoft
More informationTime Series Prediction & Online Learning
Time Series Prediction & Online Learning Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values. earthquakes.
More informationAN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES
AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim
More informationU.C. Berkeley CS294: Spectral Methods and Expanders Handout 11 Luca Trevisan February 29, 2016
U.C. Berkeley CS294: Spectral Methods and Expanders Handout Luca Trevisan February 29, 206 Lecture : ARV In which we introduce semi-definite programming and a semi-definite programming relaxation of sparsest
More information1 Seidel s LP algorithm
15-451/651: Design & Analysis of Algorithms October 21, 2015 Lecture #14 last changed: November 7, 2015 In this lecture we describe a very nice algorithm due to Seidel for Linear Programming in lowdimensional
More informationUnconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations
JMLR: Workshop and Conference Proceedings vol 35:1 20, 2014 Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations H. Brendan McMahan MCMAHAN@GOOGLE.COM Google,
More informationDynamic Regret of Strongly Adaptive Methods
Lijun Zhang 1 ianbao Yang 2 Rong Jin 3 Zhi-Hua Zhou 1 Abstract o cope with changing environments, recent developments in online learning have introduced the concepts of adaptive regret and dynamic regret
More informationLecture: Adaptive Filtering
ECE 830 Spring 2013 Statistical Signal Processing instructors: K. Jamieson and R. Nowak Lecture: Adaptive Filtering Adaptive filters are commonly used for online filtering of signals. The goal is to estimate
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationMinimizing Adaptive Regret with One Gradient per Iteration
Minimizing Adaptive Regret with One Gradient per Iteration Guanghui Wang, Dakuan Zhao, Lijun Zhang National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China wanggh@lamda.nju.edu.cn,
More informationThe local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance
The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance Marina Meilă University of Washington Department of Statistics Box 354322 Seattle, WA
More informationLinearly-solvable Markov decision problems
Advances in Neural Information Processing Systems 2 Linearly-solvable Markov decision problems Emanuel Todorov Department of Cognitive Science University of California San Diego todorov@cogsci.ucsd.edu
More informationOnline Learning: Random Averages, Combinatorial Parameters, and Learnability
Online Learning: Random Averages, Combinatorial Parameters, and Learnability Alexander Rakhlin Department of Statistics University of Pennsylvania Karthik Sridharan Toyota Technological Institute at Chicago
More informationCS261: Problem Set #3
CS261: Problem Set #3 Due by 11:59 PM on Tuesday, February 23, 2016 Instructions: (1) Form a group of 1-3 students. You should turn in only one write-up for your entire group. (2) Submission instructions:
More information