Prediction by random-walk perturbation


Luc Devroye, School of Computer Science, McGill University
Gábor Lugosi, ICREA and Department of Economics, Universitat Pompeu Fabra
Gergely Neu, MTA SZTAKI Institute for Computer Science and Control, and Department of Computer Science and Information Theory, Budapest University of Technology and Economics

Abstract

We propose a version of the follow-the-perturbed-leader online prediction algorithm in which the cumulative losses are perturbed by independent symmetric random walks. The forecaster is shown to achieve an expected regret of the optimal order O(√(n log N)), where n is the time horizon and N is the number of experts. More importantly, it is shown that the forecaster changes its prediction at most O(√(n log N)) times, in expectation. We also extend the analysis to online combinatorial optimization and show that even in this more general setting, the forecaster rarely switches between experts while having a regret of near-optimal order. This is the first forecaster with such a proven property.

Keywords: online learning, follow the perturbed leader, random walk

1. Preliminaries

In this paper we study the problem of online prediction with expert advice; see Cesa-Bianchi and Lugosi (2006). The problem may be described as a repeated game between a forecaster and an adversary, the environment. At each time instant t = 1, ..., n, the forecaster chooses one of the N available actions (often called experts) and suffers a loss ℓ_{i,t} ∈ [0,1] corresponding to the chosen action i. We consider the so-called oblivious-adversary model, in which the environment selects all losses before the prediction game starts and reveals the losses ℓ_{i,t} at time t, after the forecaster has made its prediction. The losses are deterministic, but the forecaster may randomize: at time t, the forecaster chooses a probability distribution p_t over the set of N actions and draws a random action I_t according to the distribution p_t. The prediction protocol is described in Figure 1.

The usual goal for the standard prediction problem is to devise an algorithm such that the cumulative loss L̂_n = Σ_{t=1}^n ℓ_{I_t,t} is as small as possible, in expectation and/or with high probability (where probability is with respect to the forecaster's randomization). Since we do not make any assumption on how the environment generates the losses ℓ_{i,t}, we cannot hope to minimize the cumulative loss itself. Instead, a meaningful goal is to minimize the performance gap between our algorithm and the strategy that selects the best action in hindsight. This performance gap is called the regret and is defined formally as

R_n = max_{i ∈ {1,2,...,N}} Σ_{t=1}^n (ℓ_{I_t,t} − ℓ_{i,t}) = L̂_n − L*_n,

where we have also introduced the notation L*_n = min_{i ∈ {1,2,...,N}} Σ_{t=1}^n ℓ_{i,t}. Minimizing the regret defined above is a well-studied problem. It is known that no matter what algorithm the forecaster uses,

lim inf_{n,N→∞} sup E[R_n] / √((n/2) ln N) ≥ 1,

where the supremum is taken with respect to all possible loss assignments with losses in [0,1] (see, e.g., Cesa-Bianchi and Lugosi, 2006). On the other hand, several prediction algorithms are known whose expected regret is of optimal order O(√(n log N)), and many of them achieve a regret of this order with high probability. Perhaps the most popular one is the exponentially weighted average forecaster (a variant of the weighted majority algorithm of Littlestone and Warmuth (1994) and of the aggregating strategies of Vovk (1990)), also known as Hedge (Freund and Schapire, 1997). The exponentially weighted average forecaster assigns probabilities to the actions that are inversely proportional to an exponential function of the loss accumulated by each action up to time t. Another popular forecaster is the follow-the-perturbed-leader (FPL) algorithm of Hannan (1957); Kalai and Vempala (2003) showed that Hannan's forecaster, when appropriately modified, indeed achieves an expected regret of optimal order.
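As a quick numerical illustration of these definitions (a toy sketch of our own, not taken from the paper; all variable names are ours), the regret of a naive forecaster that always plays action 0 can be computed directly from a fixed loss matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 1000, 4
# The oblivious adversary fixes all losses l_{i,t} in [0,1] in advance.
losses = rng.uniform(0.0, 1.0, size=(n, N))

played = np.zeros(n, dtype=int)               # naive forecaster: always action 0
hat_L_n = losses[np.arange(n), played].sum()  # cumulative loss of the forecaster
L_star = losses.sum(axis=0).min()             # loss of the best action in hindsight
regret = hat_L_n - L_star                     # R_n, nonnegative for this forecaster
```

Since the best fixed action is chosen in hindsight, the regret of any forecaster that commits to one action is nonnegative here.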
At time t, the FPL forecaster adds a random perturbation Z_{i,t} to the cumulative loss L_{i,t−1} = Σ_{s=1}^{t−1} ℓ_{i,s} of each action and chooses an action that minimizes the sum L_{i,t−1} + Z_{i,t}. If the vector of random variables Z_t = (Z_{1,t}, ..., Z_{N,t}) has joint density (η/2)^N e^{−η‖z‖_1} for η ~ √(log N / n), then the expected regret of the forecaster is of order O(√(n log N)) (Kalai and Vempala, 2003; see also Cesa-Bianchi and Lugosi, 2006; Hutter and Poland, 2004; Poland, 2005). This is true whether Z_1, ..., Z_n are independent or not. If they are independent, then one may show that the regret is concentrated around its expectation. Another interesting choice is Z_1 = Z_2 = ··· = Z_n, that is, when the same perturbation is used over time. Even though this forecaster has an expected regret of optimal order, the regret is much less concentrated, and the regret bound may fail with reasonably high probability.

Small regret is not the only desirable feature of an online forecasting algorithm. In many applications, one would like to define forecasters that do not change their prediction too often. Examples of such problems include the online buffering problem described by Geulen et al. (2010) and the online lossy source coding problem of György and Neu (2011). A more abstract problem where the number of abrupt switches in the behavior is costly is the problem of online learning in Markovian decision processes, as described by Even-Dar et al. (2009) and Neu et al. (2010).
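For concreteness, one decision of the i.i.d.-perturbation FPL variant described above can be sketched as follows (an illustrative sketch of ours; the function name and setup are not from the paper). The joint density (η/2)^N e^{−η‖z‖_1} is exactly that of N independent Laplace (two-sided exponential) variables of scale 1/η:

```python
import numpy as np

def fpl_choose(cum_losses, eta, rng):
    """One FPL decision: perturb the cumulative losses with i.i.d.
    Laplace(1/eta) noise and pick the minimizer (the 'perturbed leader')."""
    z = rng.laplace(loc=0.0, scale=1.0 / eta, size=len(cum_losses))
    return int(np.argmin(cum_losses + z))

# Usage: N = 5 actions, horizon n = 10_000, eta ~ sqrt(log N / n) as in the text.
rng = np.random.default_rng(0)
n, N = 10_000, 5
eta = np.sqrt(np.log(N) / n)
choice = fpl_choose(np.zeros(N), eta, rng)
```

Drawing fresh noise z at every round gives the independent-perturbation variant; reusing one fixed z across all rounds gives the identical-perturbation variant discussed above.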

Parameters: set of actions I = {1, 2, ..., N}, number of rounds n.
The environment chooses the losses ℓ_{i,t} ∈ [0,1] for all i ∈ {1, 2, ..., N} and t = 1, ..., n.
For all t = 1, 2, ..., n, repeat:
1. The forecaster chooses a probability distribution p_t over {1, 2, ..., N}.
2. The forecaster draws an action I_t randomly according to p_t.
3. The environment reveals ℓ_{i,t} for all i ∈ {1, 2, ..., N}.
4. The forecaster suffers loss ℓ_{I_t,t}.

Figure 1: Prediction with expert advice.

To be precise, define the number of action switches up to time n by

C_n = |{1 < t ≤ n : I_{t−1} ≠ I_t}|.

In particular, we are interested in defining randomized forecasters that achieve a regret R_n of the order O(√(n log N)) while keeping the number of action switches C_n as small as possible. However, the usual forecasters with small regret, such as the exponentially weighted average forecaster or the FPL forecaster with i.i.d. perturbations, may switch actions a large number of times, typically Θ(n). Therefore, the design of special forecasters with small regret and a small number of action switches is called for. The first paper to explicitly attack this problem is by Geulen et al. (2010), who propose a variant of the exponentially weighted average forecaster called the shrinking-dartboard algorithm and prove that it achieves an expected regret of O(√(n log N)) while guaranteeing that the number of switches is at most O(√(n log N)) with high probability. A less conscious attempt to solve the problem is due to Kalai and Vempala (2005b); they show that the simplified version of the FPL algorithm with identical perturbations (as described above) guarantees an O(√(n log N)) bound on both the expected regret and the expected number of switches. In this paper we propose a method based on FPL in which the perturbations are defined by independent symmetric random walks. We show that this intuitively appealing forecaster has regret and switch-number guarantees similar to those of the shrinking dartboard and of FPL with identical perturbations.
A further important advantage of the new forecaster is that it may be applied directly to the more general problem of online combinatorial or, more generally, linear optimization. We postpone the definitions and the statement of the results to Section 4 below.

2. The algorithm

To address the problem described in the previous section, we propose a variant of the follow-the-perturbed-leader (FPL) algorithm. The proposed forecaster perturbs the loss of each action at every time instant by a symmetric coin flip and chooses an action with minimal cumulative perturbed loss. More precisely, the algorithm draws independent random variables X_{i,t} that take values ±1/2 with equal probabilities, and X_{i,t} is added to each loss ℓ_{i,t−1} (where we define ℓ_{i,0} = 0).
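The coin-flip forecaster just described (stated formally as Algorithm 1 below) is easy to sketch in code. The following illustrative implementation (our own; names and interface are not from the paper) also counts the action switches C_n that are studied later:

```python
import numpy as np

def random_walk_fpl(losses, rng):
    """FPL with random-walk perturbations: each action's cumulative loss is
    perturbed by its own symmetric +-1/2 random walk Z_{i,t}.
    `losses` is an (n, N) array fixed in advance (oblivious adversary)."""
    n, N = losses.shape
    L = np.zeros(N)                            # cumulative losses L_{i,t-1}
    Z = np.zeros(N)                            # random-walk perturbations Z_{i,t}
    total_loss, switches, prev = 0.0, 0, None
    for t in range(n):
        Z += rng.choice([-0.5, 0.5], size=N)   # X_{i,t} = +-1/2 w.p. 1/2 each
        i = int(np.argmin(L + Z))              # I_t minimizes L_{i,t-1} + Z_{i,t}
        total_loss += losses[t, i]
        switches += int(prev is not None and i != prev)
        prev = i
        L += losses[t]                         # losses revealed, update L_{i,t}
    return total_loss, switches

rng = np.random.default_rng(0)
losses = rng.uniform(size=(2000, 8))
total, C_n = random_walk_fpl(losses, rng)
```

Because the perturbation of each action changes only by ±1/2 per round, the identity of the leader changes rarely, which is the mechanism behind the small switch counts proved below.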

At time t, an action is chosen that minimizes Σ_{s=1}^t (ℓ_{i,s−1} + X_{i,s}).

Algorithm 1 The algorithm.
Initialization: set L_{i,0} = 0 and Z_{i,0} = 0 for all i = 1, 2, ..., N.
For all t = 1, 2, ..., n, repeat:
1. Draw X_{i,t} for all i = 1, 2, ..., N such that
   X_{i,t} = +1/2 with probability 1/2, and −1/2 with probability 1/2.
2. Let Z_{i,t} = Z_{i,t−1} + X_{i,t} for all i = 1, 2, ..., N.
3. Choose action I_t = argmin_i (L_{i,t−1} + Z_{i,t}).
4. Observe the losses ℓ_{i,t} for all i = 1, 2, ..., N; suffer loss ℓ_{I_t,t}.
5. Set L_{i,t} = L_{i,t−1} + ℓ_{i,t} for all i = 1, 2, ..., N.

Equivalently, the forecaster may be thought of as an FPL algorithm in which the cumulative losses L_{i,t−1} are perturbed by Z_{i,t} = Σ_{s=1}^t X_{i,s}. Since, for each fixed i, the sequence Z_{i,1}, Z_{i,2}, ... is a symmetric random walk, the cumulative losses of the N actions are perturbed by N independent symmetric random walks. This is the way the algorithm is presented in Algorithm 1. A simple variation replaces the random coin flips by independent standard normal random variables. Both versions have similar performance guarantees, and we choose ±1/2-valued perturbations for mathematical convenience. In Section 4 we switch to normally distributed perturbations, again driven by mathematical simplicity. In practice, both versions are expected to behave similarly.

Conceptually, the difference between standard FPL and the proposed version lies in the way the perturbations are generated: while common versions of FPL use perturbations that are generated in an i.i.d. fashion, the perturbations of the algorithm proposed here are dependent over time. This is what enables us to control the number of action switches during the learning process. Note that the standard deviation of these perturbations is still of order √t, just like for the standard FPL forecaster with optimal parameter settings.

To gain some intuition about why this approach solves our problem, first consider a problem with N = 2 actions and an environment that generates equal losses for all actions, say ℓ_{i,t} = 0 for all i and t.
When using i.i.d. perturbations, FPL switches actions with probability 1/2 in each round, thus yielding C_t = t/2 + O(√t) with overwhelming probability. The same holds for the exponentially weighted average forecaster. On the other hand, when using the random-walk perturbations described above, we only switch between the actions when the leading random walk changes, that is, when the difference of the two random walks (which is also a symmetric random walk) hits zero. It is well known that the number of occurrences of this event up to time t is O_P(√t); see Feller (1968).

As we show below, this is the worst case for the number of switches.

3. Performance bounds

The next theorem summarizes our performance bounds for the proposed forecaster.

Theorem 1 The expected regret and the expected number of action switches of the forecaster of Algorithm 1 satisfy, for all possible loss sequences (under the oblivious-adversary model),

E R_n ≤ 2 E C_n ≤ 8 √(2 n log N) + 16 log n + 16.

Remark. Even though we only prove bounds for the expected regret and the expected number of switches, it is of great interest to understand the upper tail probabilities as well. However, this is a highly nontrivial problem. One may get an intuition by considering the case when N = 2 and all losses are equal to zero. In this case the algorithm switches actions whenever a symmetric random walk returns to zero. This distribution is well understood, and the probability that this occurs more than x√n times during the first n steps is roughly P{N > x} ≤ e^{−x²/2}, where N is a standard normal random variable (see Feller, 1968, Section III.4). Thus, in this case we see that both the number of switches and the regret are bounded by O(√(n log(1/δ))), with probability at least 1 − δ. However, proving analogous bounds for the general case remains a challenge.

To prove the theorem, we first show that the regret can be bounded in terms of the number of action switches. Then we turn to analyzing the expected number of action switches.

3.1 Regret and number of switches

The next simple lemma shows that the regret of the forecaster may be bounded in terms of the number of times the forecaster switches actions.

Lemma 2 Fix any i ∈ {1, 2, ..., N}. Then

L̂_n − L_{i,n} ≤ 2 C_n + Z_{i,n} − Σ_{t=1}^n X_{I_{t−1},t}.

Proof We apply Lemma 3.1 of Cesa-Bianchi and Lugosi (2006) (sometimes referred to as the be-the-leader lemma) to the sequence (ℓ_{·,t−1} + X_{·,t}), with ℓ_{j,0} = 0 for all j ∈ {1, 2, ..., N}, obtaining

Σ_{t=1}^n (ℓ_{I_t,t−1} + X_{I_t,t}) ≤ Σ_{t=1}^n (ℓ_{i,t−1} + X_{i,t}) ≤ L_{i,n} + Z_{i,n}.

Reordering terms, we get

Σ_{t=1}^n ℓ_{I_t,t} ≤ L_{i,n} + Σ_{t=1}^n (ℓ_{I_t,t} − ℓ_{I_{t+1},t}) + Z_{i,n} − Σ_{t=1}^n X_{I_t,t}.   (1)

The last term can be rewritten as

−Σ_{t=1}^n X_{I_t,t} = −Σ_{t=1}^n X_{I_{t−1},t} + Σ_{t=1}^n (X_{I_{t−1},t} − X_{I_t,t}),

and the middle sum of (1) can be reindexed analogously. Now notice that X_{I_{t−1},t} − X_{I_t,t} and ℓ_{I_{t−1},t−1} − ℓ_{I_t,t−1} are both zero when I_t = I_{t−1} and are upper bounded by 1 otherwise. That is, we get that

Σ_{t=1}^n ( (ℓ_{I_{t−1},t−1} − ℓ_{I_t,t−1}) + (X_{I_{t−1},t} − X_{I_t,t}) ) ≤ 2 Σ_{t=1}^n I{I_{t−1} ≠ I_t} = 2 C_n.

Putting everything together gives the statement of the lemma.

3.2 Bounding the number of switches

Next we analyze the number of switches C_n. In particular, we upper bound the marginal probability P[I_{t+1} ≠ I_t] for each t ≥ 1. We define the lead pack A_t as the set of actions that, at time t, have a positive probability of taking the lead at time t + 1:

A_t = { i ∈ {1, 2, ..., N} : L_{i,t−1} + Z_{i,t} ≤ min_j (L_{j,t−1} + Z_{j,t}) + 2 }.

We bound the probability of a lead change as

P[I_t ≠ I_{t+1}] ≤ (1/2) P[|A_t| > 1].

The key to the proof of the theorem is the following lemma, which gives an upper bound on the probability that the lead pack contains more than one action. It implies, in particular, that

E C_n ≤ 4 √(2 n log N) + 8 log n + 8,

which is what we need to prove the expected-value bounds of Theorem 1.

Lemma 3

P[|A_t| > 1] ≤ 4 √(2 log N / t) + 8/t.

Proof Define p_t(k) = P[2 Z_{i,t} = k] for k ∈ {−t, ..., t}, and let S_t denote the set of leaders at time t (so that the forecaster picks I_t ∈ S_t arbitrarily):

S_t = { j ∈ {1, 2, ..., N} : L_{j,t−1} + Z_{j,t} = min_i (L_{i,t−1} + Z_{i,t}) }.

Let us start by analyzing P[|A_t| = 1]. Since 2Z_{j,t} takes values in {−t, ..., t} and the walks are independent,

P[|A_t| = 1] = Σ_{k=−t}^{t} Σ_{j=1}^{N} p_t(k) P[ min_{i≠j} 2(L_{i,t−1} + Z_{i,t}) > 2L_{j,t−1} + k + 4 ],

and shifting the summation index (k → k − 4) gives

P[|A_t| = 1] = Σ_{k=−t+4}^{t+4} Σ_{j=1}^{N} p_t(k − 4) P[ min_{i≠j} 2(L_{i,t−1} + Z_{i,t}) > 2L_{j,t−1} + k ].

Before proceeding, we need to make two observations. First of all,

Σ_{j=1}^{N} p_t(k) P[ min_{i≠j} 2(L_{i,t−1} + Z_{i,t}) ≥ 2L_{j,t−1} + k ] ≥ P[ ∃ j ∈ S_t : 2Z_{j,t} = k ] ≥ P[ min_{j∈S_t} 2Z_{j,t} = k ],

where the first inequality follows from the union bound and the second from the fact that the latter event implies the former. Also notice that (2Z_{i,t} + t)/2 is binomially distributed with parameters t and 1/2, and therefore

p_t(k) = C(t, (t+k)/2) · 2^{−t}

(for those k for which t + k is even). Hence

p_t(k−4) / p_t(k) = ( (t+k)(t+k−2) ) / ( (t−k+2)(t−k+4) ),

and it can easily be verified that

1 − p_t(k−4)/p_t(k) = ( 4(t+1)(2−k) ) / ( (t−k+2)(t−k+4) ) ≤ ( 4(t+1)(2−k) ) / ( (t+2)(t+4) )

holds for all k ∈ {−t, ..., 0}. Putting these observations together, we get

P[|A_t| = 1] ≥ Σ_{k} P[ min_{j∈S_t} 2Z_{j,t} = k ] · ( p_t(k−4) / p_t(k) ).

This implies

P[|A_t| > 1] ≤ Σ_{k} P[ min_{j∈S_t} 2Z_{j,t} = k ] ( 1 − p_t(k−4)/p_t(k) )
≤ Σ_{k} P[ min_{j∈S_t} 2Z_{j,t} = k ] · ( 4(t+1)(2−k) ) / ( (t+2)(t+4) )
= ( 8(t+1) ) / ( (t+2)(t+4) ) + ( 4(t+1) ) / ( (t+2)(t+4) ) · E[ −min_{j∈S_t} 2Z_{j,t} ]
≤ 8/t + (8/t) E[ max_{j∈{1,...,N}} Z_{j,t} ].

Now using

E[ max_{j∈{1,...,N}} Z_{j,t} ] ≤ √(t log N / 2)

implies

P[|A_t| > 1] ≤ 4 √(2 log N / t) + 8/t,

as desired.

4. Online combinatorial optimization

In this section we study the case of online linear optimization (see, among others, Gentile and Warmuth (1998), Kivinen and Warmuth (2001), Grove et al. (2001), Takimoto and Warmuth (2003), Kalai and Vempala (2005a), Warmuth and Kuzmin (2008), Helmbold and Warmuth (2009), Hazan et al. (2010), Koolen et al. (2010), Audibert et al. (2011)). This is a prediction problem similar to the one described in the introduction, but here each action i is represented by a vector v_i ∈ R^d. The loss corresponding to action i at time t equals v_i^T ℓ_t, where ℓ_t ∈ [0,1]^d is the so-called loss vector. Thus, given a set of actions S = {v_i : i = 1, 2, ..., N} ⊂ R^d, at every time instant t the forecaster chooses, in a possibly randomized way, a vector V_t ∈ S and suffers loss V_t^T ℓ_t. We denote by L̂_n = Σ_{t=1}^n V_t^T ℓ_t the cumulative loss of the forecaster, and the regret becomes L̂_n − min_{v∈S} v^T L_n, where L_t = Σ_{s=1}^t ℓ_s is the cumulative loss vector. Of course, one may treat each v_i ∈ S as a separate action and the results of the previous section apply, but one may gain an important computational advantage by taking the structure of the action set into account. In particular, as Kalai and Vempala (2005a) emphasize, FPL-type forecasters may often be computed efficiently. In this section we propose such a forecaster, which adds a random-walk perturbation to each component of the loss vector.

To simplify the presentation, we restrict our attention to the case of online combinatorial optimization, in which S ⊂ {0,1}^d, that is, each action is represented by a binary vector. This special case arguably contains the most important applications, such as the online shortest path problem. In this example, a fixed directed acyclic graph with d edges is given, with two distinguished vertices u and w. The forecaster, at every time instant t, chooses a directed path from u to w. Such a path is represented by its binary incidence vector v ∈ {0,1}^d. The components of the loss vector ℓ_t ∈ [0,1]^d represent losses assigned to the d edges, and v^T ℓ_t is the total loss assigned to the path v. Another (non-essential) simplifying assumption is that every action v ∈ S has the same number of 1's: ‖v‖_1 = m for all v ∈ S. The value of m plays an important role in the bounds below.

The proposed prediction algorithm is defined as follows. Let X_1, ..., X_n be independent Gaussian random vectors taking values in R^d such that the components of each X_t are i.i.d. normal, X_{i,t} ~ N(0, η²), for some fixed η > 0 whose value will be specified later. Denote Z_t = Σ_{s=1}^t X_s. The forecaster, at time t, chooses the action

V_t = argmin_{v∈S} v^T (L_{t−1} + Z_t),

where L_t = Σ_{s=1}^t ℓ_s for t ≥ 1 and L_0 = (0, ..., 0). The next theorem bounds the performance of the proposed forecaster. Again, we are interested not only in the regret but also in the number of switches Σ_{t=1}^{n−1} I{V_{t+1} ≠ V_t}. The regret is of similar order (roughly m√(dn), up to a logarithmic factor) as that of the standard FPL forecaster. Moreover, the expected number of switches is O(m²(log d)^{5/2} √n). Remarkably, the dependence on d in the switch bound is only polylogarithmic, and it is the weight m of the actions that plays the more important role.
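As an illustration (our own sketch, not the authors' code), the combinatorial forecaster can be implemented by brute force over a small action set S; real applications would replace the argmin with an efficient combinatorial solver (e.g., a shortest-path computation):

```python
import numpy as np

def combinatorial_rw_fpl(S, loss_vectors, eta, rng):
    """Section-4 forecaster: perturb each component of the cumulative loss
    vector by an independent Gaussian random walk with step std eta, then
    play V_t = argmin_{v in S} v.(L_{t-1} + Z_t)."""
    S = np.asarray(S, dtype=float)       # (N, d) 0/1 incidence vectors
    n, d = loss_vectors.shape
    L = np.zeros(d)                      # cumulative loss vector L_{t-1}
    Z = np.zeros(d)                      # Gaussian random-walk perturbation Z_t
    total_loss, switches, prev = 0.0, 0, None
    for t in range(n):
        Z += rng.normal(0.0, eta, size=d)    # X_t ~ N(0, eta^2 I)
        i = int(np.argmin(S @ (L + Z)))      # brute-force argmin over S
        total_loss += float(S[i] @ loss_vectors[t])
        switches += int(prev is not None and i != prev)
        prev = i
        L += loss_vectors[t]
    return total_loss, switches

# Toy instance: three "paths" of weight m = 2 over d = 4 edges.
rng = np.random.default_rng(0)
S = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
ell = rng.uniform(size=(500, 4))
total, switches = combinatorial_rw_fpl(S, ell, eta=0.5, rng=rng)
```

Only the d perturbation components are maintained, never the N actions individually, which is the computational advantage over treating each v ∈ S as a separate expert.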
Theorem 4 The expected regret and the expected number of action switches satisfy (under the oblivious-adversary model)

E[L̂_n] − min_{v∈S} v^T L_n ≤ ( m√(2dn) ) / η + η m d (log n + 1) √(log d) + η m

and

E[ Σ_{t=1}^{n−1} I{V_{t+1} ≠ V_t} ] ≤ Σ_{t=1}^{n} ( 8m (1 + 2η√(2 log d) + η²(2 log d + 2√(log d) + 1)) ) / ( 3η²t )
+ Σ_{t=1}^{n} ( 4m (1 + η√(2 log d)) √(2 log d) ) / ( η√t ).

In particular, setting η = √( m / (d √(log d)) ) yields

E[L̂_n] − min_{v∈S} v^T L_n ≤ 4m √(dn) · (log d)^{1/4} + m (log n + 1) √(log d)

and

E[ Σ_{t=1}^{n−1} I{V_{t+1} ≠ V_t} ] = O( m² (log d)^{5/2} √n ).

The proof of the regret bound is quite standard, similar to that of Audibert et al. (2011), and is omitted. The more interesting part is the bound on the expected number of action switches,

E[ Σ_{t=1}^{n−1} I{V_{t+1} ≠ V_t} ] = Σ_{t=1}^{n−1} P[V_{t+1} ≠ V_t].

It follows from the lemma below and the well-known fact that the expected value of the maximum of the squares of d independent standard normal random variables is at most 2 log d + 2√(log d) + 1 (see, e.g., Boucheron et al., 2013). Thus, it suffices to prove the following:

Lemma 5 For each t = 1, 2, ..., n,

P[V_{t+1} ≠ V_t | X_{t+1}] ≤ ( 8m ‖h_t‖²_∞ ) / ( 3η²t ) + ( 4m ‖h_t‖_∞ E‖Z_t‖_∞ ) / ( η²t ) ≤ ( 8m ‖h_t‖²_∞ ) / ( 3η²t ) + ( 4m ‖h_t‖_∞ √(2 log d) ) / ( η√t ).

Proof We use the notation P_t[·] = P[· | X_{t+1}] and E_t[·] = E[· | X_{t+1}]. Also, let h_t = ℓ_t + X_{t+1}, H_t = L_{t−1} + Z_t, and H_{t+1} = H_t + h_t, so that V_t = argmin_{v∈S} v^T H_t and V_{t+1} = argmin_{v∈S} v^T H_{t+1}. Define the set A_t(c) as the lead pack of width c:

A_t(c) = { w ∈ S : (w − V_t)^T H_t ≤ c },

where c is a positive number that we choose later (it is allowed to depend on X_{t+1}). Observe that choosing c ≥ max_{w∈S} |(w − V_t)^T h_t| guarantees that no action outside A_t(c) can take the lead at time t + 1: if w ∉ A_t(c), then (w − V_t)^T H_t > c, so (w − V_t)^T H_{t+1} > 0 and w cannot be the new leader. For w ∈ A_t(c), we use the trivial bound P_t[V_{t+1} = w] ≤ 1; thus we have

P_t[V_{t+1} ≠ V_t] ≤ P_t[|A_t(c)| > 1],

which leaves us with the problem of bounding P_t[|A_t(c)| > 1]. Similarly to the proof of Lemma 3, we start by analyzing P_t[|A_t(c)| = 1]:

P_t[|A_t(c)| = 1] = Σ_{v∈S} P_t[ ∀ w ≠ v : (w − v)^T H_t > c ]
= Σ_{v∈S} ∫ f_v(y) P_t[ ∀ w ≠ v : w^T H_t > y + c | v^T H_t = y ] dy,   (2)

where f_v is the density of v^T H_t. Next we crucially use the fact that the conditional distributions of jointly Gaussian random variables are also Gaussian. In particular, defining k(w,v) = (w^T v)/m ∈ [0,1], the covariances are given by

cov( w^T H_t, v^T H_t ) = η² (w^T v) t = η² m k(w,v) t.

Let us organize the actions w ∈ S \ {v} into a vector W = (w_1, ..., w_{N−1}). Given v^T H_t = y, the conditional distribution of W^T H_t is an (N−1)-variate Gaussian distribution with covariance matrix Σ_v not depending on y, and mean vector

µ_v(y) = ( (w_1 − k(w_1,v) v)^T L_{t−1} + k(w_1,v) y, ..., (w_{N−1} − k(w_{N−1},v) v)^T L_{t−1} + k(w_{N−1},v) y );

in particular, µ_v(y + c) = µ_v(y) + cK, where K = (k(w_1,v), ..., k(w_{N−1},v)). Writing φ for the centered Gaussian density with covariance Σ_v, and noting that the components of K are nonnegative, we get

P_t[ ∀ w ≠ v : w^T H_t > y + c | v^T H_t = y ] = ∫_{ {z : z_i > y + c ∀i} } φ( z − µ_v(y) ) dz
≤ ∫_{ {z : z_i > y + c ∀i} } φ( z − µ_v(y + c) ) dz = P_t[ ∀ w ≠ v : w^T H_t > y + c | v^T H_t = y + c ],

where we used µ_v(y + c) = µ_v(y) + cK and k(w,v) ≥ 0. Substituting this bound into (2), changing variables y → y − c, and using that Σ_v ∫ f_v(y) P_t[∀ w ≠ v : w^T H_t > y | v^T H_t = y] dy = 1 (almost surely there is a unique leader), we obtain, after reorganizing,

P_t[|A_t(c)| > 1] ≤ Σ_{v∈S} ∫ ( f_v(y) − f_v(y − c) ) P_t[ ∀ w ≠ v : w^T H_t > y | v^T H_t = y ] dy.

The same argument applied at widths c and c/2 yields an upper bound for P_t[|A_t(c)| > 1] − P_t[|A_t(c/2)| > 1], which is the probability that there are actions in the outer ring of the lead pack. Introducing the notation

B_t(a, b) = { w ∈ S : (w − V_t)^T H_t ∈ (a, b] },

we may write

P_t[|A_t(c)| > 1] − P_t[|A_t(c/2)| > 1] ≤ P_t[|B_t(c/2, c)| > 0].

To treat the remaining term, we use that v^T H_t is Gaussian with mean v^T L_{t−1} and standard deviation η√(mt), and obtain

f_v(y) − f_v(y − c) = f_v(y) ( 1 − f_v(y − c)/f_v(y) ) ≤ f_v(y) ( c²/(2mη²t) + c(y − v^T L_{t−1})/(mη²t) ).

Thus,

Σ_{v∈S} ∫ ( f_v(y) − f_v(y − c) ) P_t[ ∀ w ≠ v : w^T H_t > y | v^T H_t = y ] dy
≤ Σ_{v∈S} ∫ f_v(y) ( c²/(2mη²t) + c(y − v^T L_{t−1})/(mη²t) ) P_t[ ∀ w ≠ v : w^T H_t > y | v^T H_t = y ] dy
≤ c²/(2mη²t) + ( c/(mη²t) ) E_t[ |V_t^T Z_t| ]
≤ c²/(2mη²t) + ( c E_t‖Z_t‖_∞ ) / ( η²t ),

where we used |V_t^T Z_t| ≤ m‖Z_t‖_∞ in the last inequality. The same bound applies to the outer ring, so eventually we obtain

P_t[|B_t(c/2, c)| > 0] ≤ c²/(2mη²t) + ( c E_t‖Z_t‖_∞ ) / ( η²t ).   (3)

Now observe that the lead pack can be decomposed into disjoint layers as

A_t(c) \ {V_t} = ∪_{s=0}^{∞} B_t( c/2^{s+1}, c/2^s ).

Using the union bound, we obtain

P_t[|A_t(c)| > 1] ≤ Σ_{s=0}^{∞} P_t[ |B_t( c/2^{s+1}, c/2^s )| > 0 ]
≤ Σ_{s=0}^{∞} ( (c/2^s)² / (2mη²t) + ( (c/2^s) E_t‖Z_t‖_∞ ) / ( η²t ) )
= ( 2c² ) / ( 3mη²t ) + ( 2c E_t‖Z_t‖_∞ ) / ( η²t ).   (4)

Using c = 2m‖h_t‖_∞ (which satisfies c ≥ max_{w∈S} |(w − V_t)^T h_t|) and E_t‖Z_t‖_∞ ≤ η√(2t log d) proves the lemma:

P_t[V_{t+1} ≠ V_t] ≤ ( 8m ‖h_t‖²_∞ ) / ( 3η²t ) + ( 4m ‖h_t‖_∞ E_t‖Z_t‖_∞ ) / ( η²t ) ≤ ( 8m ‖h_t‖²_∞ ) / ( 3η²t ) + ( 4m ‖h_t‖_∞ √(2 log d) ) / ( η√t ).

References

J.-Y. Audibert, S. Bubeck, and G. Lugosi. Minimax policies for combinatorial prediction games. In Proceedings of the 24th Annual Conference on Learning Theory (COLT), 2011.

S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA, 2006.

E. Even-Dar, S. M. Kakade, and Y. Mansour. Online Markov decision processes. Mathematics of Operations Research, 34(3):726–736, 2009.

W. Feller. An Introduction to Probability Theory and its Applications, Vol. 1. John Wiley, New York, 1968.

Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:119–139, 1997.

C. Gentile and M. Warmuth. Linear hinge loss and average margin. In Advances in Neural Information Processing Systems (NIPS), 1998.

S. Geulen, B. Vöcking, and M. Winkler. Regret minimization for online buffering problems using the weighted majority algorithm. In Proceedings of the 23rd Annual Conference on Computational Learning Theory (COLT), 2010.

A. Grove, N. Littlestone, and D. Schuurmans. General convergence results for linear discriminant updates. Machine Learning, 43:173–210, 2001.

A. György and G. Neu. Near-optimal rates for limited-delay universal lossy source coding. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), 2011.

J. Hannan. Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3:97–139, 1957.

E. Hazan, S. Kale, and M. Warmuth. Learning rotations with little regret. In Proceedings of the 23rd Annual Conference on Learning Theory (COLT), 2010.

D. P. Helmbold and M. Warmuth. Learning permutations with exponential weights. Journal of Machine Learning Research, 10:1705–1736, 2009.

M. Hutter and J. Poland. Prediction with expert advice by following the perturbed leader for general weights. In Algorithmic Learning Theory, pages 279–290. Springer, 2004.

A. Kalai and S. Vempala. Efficient algorithms for the online decision problem. In B. Schölkopf and M. Warmuth, editors, Proceedings of the 16th Annual Conference on Learning Theory and the 7th Kernel Workshop (COLT-Kernel 2003), pages 26–40, New York, USA, 2003. Springer.

A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71:291–307, 2005a.

A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71:291–307, October 2005b.

J. Kivinen and M. Warmuth. Relative loss bounds for multidimensional regression problems. Machine Learning, 45:301–329, 2001.

W. Koolen, M. Warmuth, and J. Kivinen. Hedging structured concepts. In Proceedings of the 23rd Annual Conference on Learning Theory (COLT), 2010.

N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108:212–261, 1994.

G. Neu, A. György, Cs. Szepesvári, and A. Antos. Online Markov decision processes under bandit feedback. In Advances in Neural Information Processing Systems 23, 2010.

J. Poland. FPL analysis for adaptive bandits. In 3rd Symposium on Stochastic Algorithms, Foundations and Applications (SAGA 2005), pages 58–69, 2005.

E. Takimoto and M. Warmuth. Path kernels and multiplicative updates. Journal of Machine Learning Research, 4:773–818, 2003.

V. Vovk. Aggregating strategies. In Proceedings of the Third Annual Workshop on Computational Learning Theory (COLT), 1990.

M. Warmuth and D. Kuzmin. Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension. Journal of Machine Learning Research, 9:2287–2320, 2008.


More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

Handout 7. and Pr [M(x) = χ L (x) M(x) =? ] = 1.

Handout 7. and Pr [M(x) = χ L (x) M(x) =? ] = 1. Notes on Coplexity Theory Last updated: October, 2005 Jonathan Katz Handout 7 1 More on Randoized Coplexity Classes Reinder: so far we have seen RP,coRP, and BPP. We introduce two ore tie-bounded randoized

More information

Sub-Gaussian estimators of the mean of a random vector

Sub-Gaussian estimators of the mean of a random vector Sub-Gaussian estiators of the ean of a rando vector arxiv:702.00482v [ath.st] Feb 207 GáborLugosi ShaharMendelson February 3, 207 Abstract WestudytheprobleofestiatingtheeanofarandovectorX givena saple

More information

A Smoothed Boosting Algorithm Using Probabilistic Output Codes

A Smoothed Boosting Algorithm Using Probabilistic Output Codes A Soothed Boosting Algorith Using Probabilistic Output Codes Rong Jin rongjin@cse.su.edu Dept. of Coputer Science and Engineering, Michigan State University, MI 48824, USA Jian Zhang jian.zhang@cs.cu.edu

More information

Solutions of some selected problems of Homework 4

Solutions of some selected problems of Homework 4 Solutions of soe selected probles of Hoework 4 Sangchul Lee May 7, 2018 Proble 1 Let there be light A professor has two light bulbs in his garage. When both are burned out, they are replaced, and the next

More information

time time δ jobs jobs

time time δ jobs jobs Approxiating Total Flow Tie on Parallel Machines Stefano Leonardi Danny Raz y Abstract We consider the proble of optiizing the total ow tie of a strea of jobs that are released over tie in a ultiprocessor

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

New Bounds for Learning Intervals with Implications for Semi-Supervised Learning

New Bounds for Learning Intervals with Implications for Semi-Supervised Learning JMLR: Workshop and Conference Proceedings vol (1) 1 15 New Bounds for Learning Intervals with Iplications for Sei-Supervised Learning David P. Helbold dph@soe.ucsc.edu Departent of Coputer Science, University

More information

E0 370 Statistical Learning Theory Lecture 5 (Aug 25, 2011)

E0 370 Statistical Learning Theory Lecture 5 (Aug 25, 2011) E0 370 Statistical Learning Theory Lecture 5 Aug 5, 0 Covering Nubers, Pseudo-Diension, and Fat-Shattering Diension Lecturer: Shivani Agarwal Scribe: Shivani Agarwal Introduction So far we have seen how

More information

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks Bounds on the Miniax Rate for Estiating a Prior over a VC Class fro Independent Learning Tasks Liu Yang Steve Hanneke Jaie Carbonell Deceber 01 CMU-ML-1-11 School of Coputer Science Carnegie Mellon University

More information

Learning with Large Number of Experts: Component Hedge Algorithm

Learning with Large Number of Experts: Component Hedge Algorithm Learning with Large Number of Experts: Component Hedge Algorithm Giulia DeSalvo and Vitaly Kuznetsov Courant Institute March 24th, 215 1 / 3 Learning with Large Number of Experts Regret of RWM is O( T

More information

A Unified Approach to Universal Prediction: Generalized Upper and Lower Bounds

A Unified Approach to Universal Prediction: Generalized Upper and Lower Bounds 646 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL 6, NO 3, MARCH 05 A Unified Approach to Universal Prediction: Generalized Upper and Lower Bounds Nuri Denizcan Vanli and Suleyan S Kozat,

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

Exponential Weights on the Hypercube in Polynomial Time

Exponential Weights on the Hypercube in Polynomial Time European Workshop on Reinforcement Learning 14 (2018) October 2018, Lille, France. Exponential Weights on the Hypercube in Polynomial Time College of Information and Computer Sciences University of Massachusetts

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

arxiv: v1 [cs.ds] 3 Feb 2014

arxiv: v1 [cs.ds] 3 Feb 2014 arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

Lecture 9 November 23, 2015

Lecture 9 November 23, 2015 CSC244: Discrepancy Theory in Coputer Science Fall 25 Aleksandar Nikolov Lecture 9 Noveber 23, 25 Scribe: Nick Spooner Properties of γ 2 Recall that γ 2 (A) is defined for A R n as follows: γ 2 (A) = in{r(u)

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

Support Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab

Support Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab Support Vector Machines Machine Learning Series Jerry Jeychandra Bloh Lab Outline Main goal: To understand how support vector achines (SVMs) perfor optial classification for labelled data sets, also a

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence Best Ar Identification: A Unified Approach to Fixed Budget and Fixed Confidence Victor Gabillon, Mohaad Ghavazadeh, Alessandro Lazaric To cite this version: Victor Gabillon, Mohaad Ghavazadeh, Alessandro

More information

arxiv: v1 [cs.ds] 29 Jan 2012

arxiv: v1 [cs.ds] 29 Jan 2012 A parallel approxiation algorith for ixed packing covering seidefinite progras arxiv:1201.6090v1 [cs.ds] 29 Jan 2012 Rahul Jain National U. Singapore January 28, 2012 Abstract Penghui Yao National U. Singapore

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Approximation in Stochastic Scheduling: The Power of LP-Based Priority Policies

Approximation in Stochastic Scheduling: The Power of LP-Based Priority Policies Approxiation in Stochastic Scheduling: The Power of -Based Priority Policies Rolf Möhring, Andreas Schulz, Marc Uetz Setting (A P p stoch, r E( w and (B P p stoch E( w We will assue that the processing

More information

Applications of on-line prediction. in telecommunication problems

Applications of on-line prediction. in telecommunication problems Applications of on-line prediction in telecommunication problems Gábor Lugosi Pompeu Fabra University, Barcelona based on joint work with András György and Tamás Linder 1 Outline On-line prediction; Some

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13 CSE55: Randoied Algoriths and obabilistic Analysis May 6, Lecture Lecturer: Anna Karlin Scribe: Noah Siegel, Jonathan Shi Rando walks and Markov chains This lecture discusses Markov chains, which capture

More information

Support Vector Machines. Maximizing the Margin

Support Vector Machines. Maximizing the Margin Support Vector Machines Support vector achines (SVMs) learn a hypothesis: h(x) = b + Σ i= y i α i k(x, x i ) (x, y ),..., (x, y ) are the training exs., y i {, } b is the bias weight. α,..., α are the

More information

Learnability and Stability in the General Learning Setting

Learnability and Stability in the General Learning Setting Learnability and Stability in the General Learning Setting Shai Shalev-Shwartz TTI-Chicago shai@tti-c.org Ohad Shair The Hebrew University ohadsh@cs.huji.ac.il Nathan Srebro TTI-Chicago nati@uchicago.edu

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it

More information

Determining OWA Operator Weights by Mean Absolute Deviation Minimization

Determining OWA Operator Weights by Mean Absolute Deviation Minimization Deterining OWA Operator Weights by Mean Absolute Deviation Miniization Micha l Majdan 1,2 and W lodziierz Ogryczak 1 1 Institute of Control and Coputation Engineering, Warsaw University of Technology,

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

Hybrid System Identification: An SDP Approach

Hybrid System Identification: An SDP Approach 49th IEEE Conference on Decision and Control Deceber 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA Hybrid Syste Identification: An SDP Approach C Feng, C M Lagoa, N Ozay and M Sznaier Abstract The

More information

Rademacher Complexity Margin Bounds for Learning with a Large Number of Classes

Rademacher Complexity Margin Bounds for Learning with a Large Number of Classes Radeacher Coplexity Margin Bounds for Learning with a Large Nuber of Classes Vitaly Kuznetsov Courant Institute of Matheatical Sciences, 25 Mercer street, New York, NY, 002 Mehryar Mohri Courant Institute

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a ournal published by Elsevier. The attached copy is furnished to the author for internal non-coercial research and education use, including for instruction at the authors institution

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

Asynchronous Gossip Algorithms for Stochastic Optimization

Asynchronous Gossip Algorithms for Stochastic Optimization Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu

More information

Support Vector Machines. Goals for the lecture

Support Vector Machines. Goals for the lecture Support Vector Machines Mark Craven and David Page Coputer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Soe of the slides in these lectures have been adapted/borrowed fro aterials developed

More information

1 Bounding the Margin

1 Bounding the Margin COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost

More information

Compression and Predictive Distributions for Large Alphabet i.i.d and Markov models

Compression and Predictive Distributions for Large Alphabet i.i.d and Markov models 2014 IEEE International Syposiu on Inforation Theory Copression and Predictive Distributions for Large Alphabet i.i.d and Markov odels Xiao Yang Departent of Statistics Yale University New Haven, CT, 06511

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic

More information

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies OPERATIONS RESEARCH Vol. 52, No. 5, Septeber October 2004, pp. 795 803 issn 0030-364X eissn 1526-5463 04 5205 0795 infors doi 10.1287/opre.1040.0130 2004 INFORMS TECHNICAL NOTE Lost-Sales Probles with

More information

On Conditions for Linearity of Optimal Estimation

On Conditions for Linearity of Optimal Estimation On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

PAC-Bayesian Learning of Linear Classifiers

PAC-Bayesian Learning of Linear Classifiers Pascal Gerain Pascal.Gerain.@ulaval.ca Alexandre Lacasse Alexandre.Lacasse@ift.ulaval.ca François Laviolette Francois.Laviolette@ift.ulaval.ca Mario Marchand Mario.Marchand@ift.ulaval.ca Départeent d inforatique

More information

Randomized Recovery for Boolean Compressed Sensing

Randomized Recovery for Boolean Compressed Sensing Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links

Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Tie-Varying Jaing Links Jun Kurihara KDDI R&D Laboratories, Inc 2 5 Ohara, Fujiino, Saitaa, 356 8502 Japan Eail: kurihara@kddilabsjp

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

Understanding Machine Learning Solution Manual

Understanding Machine Learning Solution Manual Understanding Machine Learning Solution Manual Written by Alon Gonen Edited by Dana Rubinstein Noveber 17, 2014 2 Gentle Start 1. Given S = ((x i, y i )), define the ultivariate polynoial p S (x) = i []:y

More information

Tight Complexity Bounds for Optimizing Composite Objectives

Tight Complexity Bounds for Optimizing Composite Objectives Tight Coplexity Bounds for Optiizing Coposite Objectives Blake Woodworth Toyota Technological Institute at Chicago Chicago, IL, 60637 blake@ttic.edu Nathan Srebro Toyota Technological Institute at Chicago

More information

MSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE

MSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE Proceeding of the ASME 9 International Manufacturing Science and Engineering Conference MSEC9 October 4-7, 9, West Lafayette, Indiana, USA MSEC9-8466 MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

EASINESS IN BANDITS. Gergely Neu. Pompeu Fabra University

EASINESS IN BANDITS. Gergely Neu. Pompeu Fabra University EASINESS IN BANDITS Gergely Neu Pompeu Fabra University EASINESS IN BANDITS Gergely Neu Pompeu Fabra University THE BANDIT PROBLEM Play for T rounds attempting to maximize rewards THE BANDIT PROBLEM Play

More information

The Methods of Solution for Constrained Nonlinear Programming

The Methods of Solution for Constrained Nonlinear Programming Research Inventy: International Journal Of Engineering And Science Vol.4, Issue 3(March 2014), PP 01-06 Issn (e): 2278-4721, Issn (p):2319-6483, www.researchinventy.co The Methods of Solution for Constrained

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago

More information

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique

More information

Convex Programming for Scheduling Unrelated Parallel Machines

Convex Programming for Scheduling Unrelated Parallel Machines Convex Prograing for Scheduling Unrelated Parallel Machines Yossi Azar Air Epstein Abstract We consider the classical proble of scheduling parallel unrelated achines. Each job is to be processed by exactly

More information

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs On the Inapproxiability of Vertex Cover on k-partite k-unifor Hypergraphs Venkatesan Guruswai and Rishi Saket Coputer Science Departent Carnegie Mellon University Pittsburgh, PA 1513. Abstract. Coputing

More information

Learning in Hide-and-Seek

Learning in Hide-and-Seek Learning in Hide-and-Seek Qingsi Wang and Mingyan Liu EECS Departent, University of Michigan, Ann Arbor qingsi, ingyan@uich.edu Abstract Existing work on pursuit-evasion probles typically either assues

More information

Explore no more: Improved high-probability regret bounds for non-stochastic bandits

Explore no more: Improved high-probability regret bounds for non-stochastic bandits Explore no more: Improved high-probability regret bounds for non-stochastic bandits Gergely Neu SequeL team INRIA Lille Nord Europe gergely.neu@gmail.com Abstract This work addresses the problem of regret

More information

Complex Quadratic Optimization and Semidefinite Programming

Complex Quadratic Optimization and Semidefinite Programming Coplex Quadratic Optiization and Seidefinite Prograing Shuzhong Zhang Yongwei Huang August 4 Abstract In this paper we study the approxiation algoriths for a class of discrete quadratic optiization probles

More information

Efficient learning by implicit exploration in bandit problems with side observations

Efficient learning by implicit exploration in bandit problems with side observations Efficient learning by implicit exploration in bandit problems with side observations omáš Kocák Gergely Neu Michal Valko Rémi Munos SequeL team, INRIA Lille Nord Europe, France {tomas.kocak,gergely.neu,michal.valko,remi.munos}@inria.fr

More information

arxiv: v1 [math.nt] 14 Sep 2014

arxiv: v1 [math.nt] 14 Sep 2014 ROTATION REMAINDERS P. JAMESON GRABER, WASHINGTON AND LEE UNIVERSITY 08 arxiv:1409.411v1 [ath.nt] 14 Sep 014 Abstract. We study properties of an array of nubers, called the triangle, in which each row

More information

Analyzing Simulation Results

Analyzing Simulation Results Analyzing Siulation Results Dr. John Mellor-Cruey Departent of Coputer Science Rice University johnc@cs.rice.edu COMP 528 Lecture 20 31 March 2005 Topics for Today Model verification Model validation Transient

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information