Minimax strategy for prediction with expert advice under stochastic assumptions
Wojciech Kotłowski
Poznań University of Technology, Poland

Abstract

We consider the setting of prediction with expert advice with an additional assumption that each expert generates its losses i.i.d. according to some distribution. We first identify a class of admissible strategies, which we call permutation invariant, and show that every strategy outside this class performs no better than some permutation invariant strategy. We then show that when the losses are binary, a simple Follow the Leader (FL) algorithm is the minimax strategy for this game, where minimaxity is achieved simultaneously for the expected regret, the expected redundancy, and the excess risk. Furthermore, FL also has the smallest regret, redundancy, and excess risk over all permutation invariant prediction strategies, simultaneously for all distributions over binary losses. When the losses are continuous in $[0,1]$, FL remains minimax only when an additional trick called loss binarization is applied.

1 Introduction

In the game of prediction with expert advice [1, 2], the algorithm sequentially decides on one of $K$ experts to follow, and suffers the loss associated with the chosen expert. The difference between the algorithm's cumulative loss and the cumulative loss of the best expert is called regret. The goal is to minimize the regret in the worst case over all possible loss sequences. An algorithm which achieves this goal (i.e., minimizes the worst-case regret) is called minimax. While there is no known solution to this problem in the general setting, it is possible to derive minimax algorithms for some special variants of this game [1, 2, 3, 4]. Interestingly, all these algorithms share a similar strategy of playing against a maximin adversary which assigns losses uniformly at random. They often have the equalization property: all data sequences lead to the same value of the regret.
While this property makes them robust against the worst-case sequence, it also makes them over-conservative, preventing them from exploiting the case when the actual data are easy. There are various algorithms which combine almost optimal worst-case performance with good performance on easy sequences [5, 6, 7, 8, 9]; these algorithms, however, are not motivated from the minimax principle. In this paper, we drop the analysis of worst-case performance entirely, and explore the minimax principle in a more constrained setting, in which the adversary is assumed to be stochastic. In particular, we associate with each expert $k$ a fixed distribution $P_k$ over loss values, and assume the observed losses of expert $k$ are generated independently from $P_k$. We believe this setting might be practically useful, and that it is interesting to determine the minimax algorithm under this assumption. We immediately face two difficulties here. First, due to the stochastic nature of the adversary, it is no longer possible to follow standard approaches of minimax analysis, such as backward induction [1, 2] or sequential minimax duality [10, 3], and we need to resort to a different technique. We define the notion of permutation invariance of prediction strategies. This lets us identify a class of admissible strategies (which we call permutation invariant), and show that every strategy outside this class performs no better than some permutation invariant strategy. Secondly, while the regret is the single, commonly used performance metric in the worst-case setting, the situation is different in
the stochastic case. We know at least three potentially useful metrics in the stochastic setting: the expected regret, the expected redundancy, and the excess risk [11], and it is not clear which of them should be used to define the minimax strategy. Fortunately, it turns out that there exists a single strategy which is minimax with respect to all three metrics simultaneously. In the case of binary losses, which take values in $\{0,1\}$, this strategy turns out to be the Follow the Leader (FL) algorithm, which chooses an expert with the smallest cumulative loss at a given trial (with ties broken randomly). Interestingly, FL is known to perform poorly in the worst case, as its worst-case regret grows linearly with $T$ [2]. On the contrary, in the stochastic setting with binary losses, FL also has the smallest regret, redundancy, and excess risk over all permutation invariant prediction strategies, simultaneously for all distributions over binary losses! In the more general case of continuous losses in the range $[0,1]$, FL is provably suboptimal. However, by applying a binarization trick to the losses [6], i.e., randomly setting them to $\{0,1\}$ such that the expectation matches the actual loss, and using FL on the binarized sequence, we obtain the minimax strategy in the continuous case.

2 Problem Setting

In the game of prediction with expert advice, at each trial $t = 1, \ldots, T$, the algorithm predicts with a distribution $w_t \in \Delta_K$ over $K$ experts (where $\Delta_K$ denotes the probability simplex). Then the loss vector $l_t \in X^K$ is revealed (where $X$ is either $\{0,1\}$ or $[0,1]$), and the algorithm suffers loss $w_t \cdot l_t$. The sequence of outcomes $l_1, \ldots, l_t$ is abbreviated as $l^t$. We let $L_{t,k}$ denote the cumulative loss of expert $k$ at iteration $t$: $L_{t,k} = \sum_{q \le t} l_{q,k}$. We assume there are $K$ distributions $P = (P_1, \ldots, P_K)$ over $X$, such that for each $k$, the losses $l_{t,k}$, $t = 1, \ldots, T$, are generated i.i.d. from $P_k$. Note that this implies that $l_{t,k}$ is independent from $l_{t',k'}$ whenever $t \ne t'$ or $k \ne k'$.
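To make the setting concrete, here is a minimal simulation sketch (our own illustration, not from the paper; `fl_weights` and `play_game` are hypothetical helper names): $K$ Bernoulli experts generate i.i.d. binary losses, and the Follow the Leader rule from the introduction plays uniformly over the current leaders.

```python
import random

def fl_weights(cum_losses):
    """Uniform distribution over the experts with the smallest
    cumulative loss (Follow the Leader, ties broken uniformly)."""
    m = min(cum_losses)
    n = sum(1 for L in cum_losses if L == m)
    return [1.0 / n if L == m else 0.0 for L in cum_losses]

def play_game(p, T, rng):
    """Run T trials with Bernoulli(p[k]) experts; return the algorithm's
    cumulative loss (sum over t of w_t . l_t) and the experts' final losses."""
    K = len(p)
    cum = [0] * K
    alg_loss = 0.0
    for _ in range(T):
        w = fl_weights(cum)                       # predict before seeing l_t
        l = [int(rng.random() < p[k]) for k in range(K)]
        alg_loss += sum(wk * lk for wk, lk in zip(w, l))
        cum = [c + lk for c, lk in zip(cum, l)]
    return alg_loss, cum

rng = random.Random(0)
loss, cum = play_game([0.2, 0.5], T=100, rng=rng)
regret = loss - min(cum)    # realized regret on this random sequence
```

Averaging `regret` over many independent runs estimates the expected regret of FL under this pair of distributions.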
We formally define a prediction strategy as a sequence of $T$ functions $\omega = (w_1, \ldots, w_T)$, such that $w_t : X^{(t-1) \times K} \to \Delta_K$. The performance of the strategy $\omega$ on the set of distributions $P$ can be measured by one of the three following metrics:

the expected regret:
$$R_{\mathrm{reg}}(\omega, P) = \mathbb{E}\Big[ \sum_{t=1}^T w_t(l^{t-1}) \cdot l_t - \min_k L_{T,k} \Big],$$

the expected redundancy:
$$R_{\mathrm{red}}(\omega, P) = \mathbb{E}\Big[ \sum_{t=1}^T w_t(l^{t-1}) \cdot l_t \Big] - \min_k \mathbb{E}\big[ L_{T,k} \big],$$

the excess risk:
$$R_{\mathrm{risk}}(\omega, P) = \mathbb{E}\big[ w_T(l^{T-1}) \cdot l_T \big] - \min_k \mathbb{E}\big[ l_{T,k} \big],$$

where in each case the expectation is over the sequence $l^T$ with respect to the distributions $P_1, \ldots, P_K$. Given a performance measure $R$, we say that a strategy $\omega$ is minimax with respect to $R$ if:
$$\sup_P R(\omega, P) = \inf_{\omega'} \sup_P R(\omega', P),$$
where the supremum is over all $K$-sets of distributions $(P_1, \ldots, P_K)$ on $X$, and the infimum is over all prediction strategies.

We say that a strategy $\omega$ is permutation invariant if for any $t = 1, \ldots, T$ and any permutation $\sigma \in S_K$, where $S_K$ denotes the group of permutations over $\{1, \ldots, K\}$,
$$w_t\big(\sigma(l^{t-1})\big) = \sigma\big(w_t(l^{t-1})\big),$$
where for any vector $v = (v_1, \ldots, v_K)$ we define $\sigma(v) = (v_{\sigma(1)}, \ldots, v_{\sigma(K)})$ and abbreviate $\sigma(l^{t-1}) = (\sigma(l_1), \ldots, \sigma(l_{t-1}))$. In words, if we $\sigma$-permute the indices of all past loss vectors, the resulting weight vector is the $\sigma$-permutation of the original weight vector. Permutation invariant strategies are natural, as they rely only on the observed outcomes, not on the expert indices.

Lemma 1. Let $\omega$ be permutation invariant. Then, for any permutation $\sigma \in S_K$,
$$\mathbb{E}_{\sigma(P)}\big[ w_t(l^{t-1}) \cdot l_t \big] = \mathbb{E}_P\big[ w_t(l^{t-1}) \cdot l_t \big],$$
and moreover $R(\omega, \sigma(P)) = R(\omega, P)$, where $R$ is the expected regret, expected redundancy, or excess risk, and $\sigma(P) = (P_{\sigma(1)}, \ldots, P_{\sigma(K)})$.

We now show that permutation invariant strategies are admissible in the following sense:

Theorem 2. For any strategy $\omega$, there exists a permutation invariant strategy $\tilde\omega$ such that $\sup_P R(\tilde\omega, P) \le \sup_P R(\omega, P)$, where $R$ is either the expected regret, the expected redundancy, or the excess risk.
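As a sanity check (our own sketch, not part of the paper; the helper names are ours), permutation invariance of FL can be verified exhaustively on a small example: permuting the expert indices of all past loss vectors permutes the resulting weight vector the same way.

```python
from itertools import permutations

def fl_weights(cum_losses):
    # Follow the Leader: uniform over the leader set
    m = min(cum_losses)
    n = sum(1 for L in cum_losses if L == m)
    return [1.0 / n if L == m else 0.0 for L in cum_losses]

def weights_from_history(history):
    # history is the list of past loss vectors l_1, ..., l_{t-1}
    K = len(history[0])
    cum = [sum(l[k] for l in history) for k in range(K)]
    return fl_weights(cum)

history = [[1, 0, 1], [0, 0, 1]]      # two past loss vectors, K = 3
w = weights_from_history(history)
for sigma in permutations(range(3)):
    permuted = [[l[sigma[k]] for k in range(3)] for l in history]   # sigma(l^{t-1})
    lhs = weights_from_history(permuted)      # w_t(sigma(l^{t-1}))
    rhs = [w[sigma[k]] for k in range(3)]     # sigma(w_t(l^{t-1}))
    assert lhs == rhs
```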
Theorem 2 states that it suffices to search for a minimax strategy only within the set of permutation invariant strategies.

Given a loss sequence $l^{t-1}$, let $N = |\mathrm{argmin}_{j=1,\ldots,K} L_{t-1,j}|$ be the size of the leader set at the beginning of trial $t$. We define the Follow the Leader (FL) strategy $w^*_t$ by $w^*_{t,k} = 1/N$ if $k \in \mathrm{argmin}_j L_{t-1,j}$, and $w^*_{t,k} = 0$ otherwise. In other words, FL predicts with the current leader, breaking ties uniformly at random. It is easy to show that the FL strategy is permutation invariant.

3 Binary losses

In this section, we set $X = \{0,1\}$, so that all losses are binary. In this case, each $P_k$ is a Bernoulli distribution. Take any permutation invariant strategy $\omega$. Using Lemma 1 on all $\sigma \in S_K$:
$$\mathbb{E}_P\big[ w_t(l^{t-1}) \cdot l_t \big] = \underbrace{\frac{1}{K!} \sum_{\sigma \in S_K} \mathbb{E}_{\sigma(P)}\big[ w_t(l^{t-1}) \cdot l_t \big]}_{=:\, \mathrm{loss}_t(w_t, P)}. \tag{1}$$

We now show the main result of this paper, a surprisingly strong property of the FL strategy, which states that FL minimizes $\mathrm{loss}_t(w_t, P)$ over all $K$-sets of distributions simultaneously. Hence, FL is not only optimal in the worst case, but is actually optimal for the permutation-averaged expected loss for any $P$, which implies by (1) that FL has the smallest expected loss among all permutation invariant strategies for any $P$.

Theorem 3. Let $\omega^* = (w^*_1, \ldots, w^*_T)$ be the FL strategy. Then, for any $K$-set of distributions $P = (P_1, \ldots, P_K)$ over binary losses, any strategy $\omega = (w_1, \ldots, w_T)$, and any $t = 1, \ldots, T$:
$$\mathrm{loss}_t(w^*_t, P) \le \mathrm{loss}_t(w_t, P).$$

The consequence of Theorem 3 is the following corollary, which states the minimaxity of the FL strategy for binary losses:

Corollary 4. Let $\omega^* = (w^*_1, \ldots, w^*_T)$ be the FL strategy. Then, for any $P$ over binary losses and any permutation invariant strategy $\omega$: $R(\omega^*, P) \le R(\omega, P)$, where $R$ is the expected regret, expected redundancy, or excess risk. This implies:
$$\sup_P R(\omega^*, P) = \inf_\omega \sup_P R(\omega, P),$$
where the supremum is over all distributions on binary losses, and the infimum over all (not necessarily permutation invariant) strategies.

Proof. The second statement immediately follows from the first statement and Theorem 2.
For the first statement, note that the "loss of the best expert" part of each measure depends only on $P$. Hence, we only need to show that for any $t = 1, \ldots, T$, $\mathbb{E}_P[w^*_t \cdot l_t] \le \mathbb{E}_P[w_t \cdot l_t]$. Since $w^*_t$ and $w_t$ are permutation invariant, Lemma 1 shows that $\mathbb{E}_P[w^*_t \cdot l_t] = \mathrm{loss}_t(w^*_t, P)$, and similarly, $\mathbb{E}_P[w_t \cdot l_t] = \mathrm{loss}_t(w_t, P)$. Application of Theorem 3 finishes the proof.

4 Continuous losses

In this section, we consider the general case $X = [0,1]$ of continuous loss vectors. We give a modification of FL and prove its minimaxity. In the appendix, we justify the modification by arguing that the vanilla FL strategy is not minimax for continuous losses.
The modification of FL is based on a procedure we call binarization. A similar trick has already been used in [6] to deal with non-integer losses in a different context. We define the binarization of any loss value $l_{t,k} \in [0,1]$ as a Bernoulli random variable $b_{t,k}$ which takes value 1 with probability $l_{t,k}$ and value 0 with probability $1 - l_{t,k}$. In other words, we replace each non-binary loss $l_{t,k}$ by a random binary outcome $b_{t,k}$ such that $\mathbb{E}[b_{t,k}] = l_{t,k}$. Note that if $l_{t,k} \in \{0,1\}$, then $b_{t,k} = l_{t,k}$, i.e., binarization has no effect on losses which are already binary. Let us also define $b_t = (b_{t,1}, \ldots, b_{t,K})$, where all $K$ Bernoulli random variables $b_{t,k}$ are independent. Similarly, $b^t$ will denote the binary loss sequence $b_1, \ldots, b_t$, where the binarization procedure is applied independently (with a new set of Bernoulli variables) for each trial $t$. Now, given the loss sequence $l^{t-1}$, we define the binarized FL strategy $\omega^b$ by:
$$w^b_t(l^{t-1}) = \mathbb{E}_{b^{t-1}}\big[ w^*_t(b^{t-1}) \big],$$
where $w^*_t(b^{t-1})$ is the standard FL strategy applied to the binarized losses $b^{t-1}$, and the expectation is over the internal randomization of the algorithm (the binarization variables). Note that if the set of distributions $P$ has support only on $\{0,1\}$, then $w^b_t \equiv w^*_t$. On the other hand, these two strategies may differ significantly for non-binary losses. However, we will show that for any $K$-set of distributions $P$ (with support in $[0,1]$), $w^b_t$ behaves in the same way as $w^*_t$ would behave on some particular $K$-set of distributions over binary losses. To this end, we introduce the binarization of a $K$-set of distributions $P$, defined as $P^{\mathrm{bin}} = (P^{\mathrm{bin}}_1, \ldots, P^{\mathrm{bin}}_K)$, where $P^{\mathrm{bin}}_k$ is a distribution with support $\{0,1\}$ such that:
$$\mathbb{E}_{P^{\mathrm{bin}}_k}\big[ l_{t,k} \big] = P^{\mathrm{bin}}_k(l_{t,k} = 1) = \mathbb{E}_{P_k}\big[ l_{t,k} \big].$$
In other words, $P^{\mathrm{bin}}_k$ is a Bernoulli distribution which has the same expectation as the original distribution $P_k$ (over continuous losses). We now show the following results:

Lemma 5. For any $K$-set of distributions $P = (P_1, \ldots, P_K)$ with support on $X = [0,1]$,
$$\mathbb{E}_{l^t \sim P^t}\big[ w^b_t(l^{t-1}) \cdot l_t \big] = \mathbb{E}_{l^t \sim (P^{\mathrm{bin}})^t}\big[ w^*_t(l^{t-1}) \cdot l_t \big].$$

Lemma 6.
For any $K$-set of distributions $P = (P_1, \ldots, P_K)$ with support on $X = [0,1]$, $R(\omega^b, P) \le R(\omega^*, P^{\mathrm{bin}})$, where $R$ is either the expected regret, the expected redundancy, or the excess risk.

Theorem 7. Let $\omega^b = (w^b_1, \ldots, w^b_T)$ be the binarized FL strategy. Then:
$$\sup_P R(\omega^b, P) = \inf_\omega \sup_P R(\omega, P),$$
where $R$ is the expected regret, expected redundancy, or excess risk, the supremum is over all $K$-sets of distributions on $[0,1]$, and the infimum is over all prediction strategies.

In the appendix, we argue that vanilla FL is not the minimax strategy for continuous losses, so that the binarization procedure is justified.

5 Open Problem

The setting considered in this paper is quite limited even in the stochastic case, as it does not consider distributions over loss vectors which are i.i.d. between trials but not necessarily independent between experts. It would be interesting to determine the minimax strategy in this more general setting. Preliminary computational experiments suggest that FL is not minimax even for binary losses when dependencies between experts are allowed.

Acknowledgments. The author was supported by the Polish National Science Centre under grant no. 2013/11/D/ST6/
References

[1] Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. Journal of the ACM, 44(3), 1997.
[2] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[3] Wouter M. Koolen. Combining Strategies Efficiently: High-quality Decisions from Conflicting Advice. PhD thesis, ILLC, University of Amsterdam, 2011.
[4] Jacob Abernethy, Manfred K. Warmuth, and Joel Yellin. When random play is optimal against an adversary. In COLT, July 2008.
[5] Steven de Rooij, Tim van Erven, Peter D. Grünwald, and Wouter M. Koolen. Follow the leader if you can, hedge if you must. Journal of Machine Learning Research, 15(1), 2014.
[6] Tim van Erven, Wojciech Kotłowski, and Manfred K. Warmuth. Follow the leader with dropout perturbations. In COLT, 2014.
[7] Amir Sani, Gergely Neu, and Alessandro Lazaric. Exploiting easy data in online optimization. In NIPS, 2014.
[8] Wouter M. Koolen and Tim van Erven. Second-order quantile methods for experts and combinatorial games. In COLT, 2015.
[9] Haipeng Luo and Robert E. Schapire. Achieving all with no parameters: AdaNormalHedge. In COLT, 2015.
[10] Jacob Abernethy, Alekh Agarwal, Peter L. Bartlett, and Alexander Rakhlin. A stochastic view of optimal regret through minimax duality. In COLT, 2009.
[11] Peter D. Grünwald. The Minimum Description Length Principle. MIT Press, Cambridge, MA, 2007.
Proof of Lemma 1

We first show that the expected loss of the algorithm at any iteration $t = 1, \ldots, T$ is the same for both $\sigma(P)$ and $P$:
$$\mathbb{E}_{\sigma(P)}\big[ w_t(l^{t-1}) \cdot l_t \big] = \mathbb{E}_P\big[ w_t(\sigma(l^{t-1})) \cdot \sigma(l_t) \big] = \mathbb{E}_P\big[ \sigma(w_t(l^{t-1})) \cdot \sigma(l_t) \big] = \mathbb{E}_P\big[ w_t(l^{t-1}) \cdot l_t \big],$$
where the first equality is due to the fact that permuting the distributions is equivalent to permuting the coordinates of the losses (which are random variables with respect to these distributions), the second equality exploits the permutation invariance of $\omega$, while the third equality uses the simple fact that the dot product is invariant under permuting both of its arguments. Therefore, the "loss of the algorithm" part of any of the three measures (regret, redundancy, risk) remains the same. To show that the "loss of the best expert" part of each measure is also the same, note that for any $t = 1, \ldots, T$ and $k = 1, \ldots, K$, $\mathbb{E}_{\sigma(P)}[l_{t,k}] = \mathbb{E}_P[l_{t,\sigma(k)}]$, which implies:
$$\min_k \mathbb{E}_{\sigma(P)}[l_{T,k}] = \min_k \mathbb{E}_P[l_{T,\sigma(k)}] = \min_k \mathbb{E}_P[l_{T,k}],$$
$$\min_k \mathbb{E}_{\sigma(P)}[L_{T,k}] = \min_k \mathbb{E}_P[L_{T,\sigma(k)}] = \min_k \mathbb{E}_P[L_{T,k}],$$
$$\mathbb{E}_{\sigma(P)}\big[ \min_k L_{T,k} \big] = \mathbb{E}_P\big[ \min_k L_{T,\sigma(k)} \big] = \mathbb{E}_P\big[ \min_k L_{T,k} \big],$$
so that the "loss of the best expert" parts of all measures are the same for both $\sigma(P)$ and $P$.

Proof of Theorem 2

Define $\tilde\omega = (\tilde w_1, \ldots, \tilde w_T)$ as:
$$\tilde w_t(l^{t-1}) = \frac{1}{K!} \sum_{\tau \in S_K} \tau^{-1}\big( w_t(\tau(l^{t-1})) \big).$$
Note that $\tilde\omega$ is a valid prediction strategy, since $\tilde w_t$ is a function of $l^{t-1}$, and $\tilde w_t \in \Delta_K$ ($\tilde w_t$ is a convex combination of distributions, so it is a distribution itself). Moreover, $\tilde\omega$ is permutation invariant:
$$\tilde w_t(\sigma(l^{t-1})) = \frac{1}{K!} \sum_{\tau \in S_K} \tau^{-1}\big( w_t(\tau\sigma(l^{t-1})) \big) = \frac{1}{K!} \sum_{\tau' \in S_K} \sigma\Big( (\tau')^{-1}\big( w_t(\tau'(l^{t-1})) \big) \Big) = \sigma\big( \tilde w_t(l^{t-1}) \big),$$
where the second equality is from replacing the summation index $\tau \to \tau' = \tau\sigma$. Now, note that the expected loss of $\tilde w_t$ is:
$$\mathbb{E}_P\big[ \tilde w_t(l^{t-1}) \cdot l_t \big] = \frac{1}{K!} \sum_{\tau \in S_K} \mathbb{E}_P\big[ \tau^{-1}\big( w_t(\tau(l^{t-1})) \big) \cdot l_t \big] = \frac{1}{K!} \sum_{\tau \in S_K} \mathbb{E}_P\big[ w_t(\tau(l^{t-1})) \cdot \tau(l_t) \big] = \frac{1}{K!} \sum_{\tau \in S_K} \mathbb{E}_{\tau(P)}\big[ w_t(l^{t-1}) \cdot l_t \big].$$
Since the "loss of the best expert" parts of all three measures are invariant under any permutation of $P$ (see the proof of Lemma 1), we have:
$$R(\tilde\omega, P) = \frac{1}{K!} \sum_{\sigma \in S_K} R(\omega, \sigma(P)) \le \max_{\sigma \in S_K} R(\omega, \sigma(P)).$$
Hence,
$$\sup_P R(\tilde\omega, P) \le \sup_P \max_{\sigma \in S_K} R(\omega, \sigma(P)) = \sup_P R(\omega, P).$$
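The symmetrization $\tilde\omega$ used in this proof can be illustrated numerically (our own sketch; the helper names are ours, and the base strategy is a deliberately non-invariant variant of FL that breaks ties toward the lowest index):

```python
from itertools import permutations

def apply_perm(sigma, v):
    # sigma(v) = (v_{sigma(1)}, ..., v_{sigma(K)})
    return [v[sigma[k]] for k in range(len(v))]

def inverse(sigma):
    inv = [0] * len(sigma)
    for k, s in enumerate(sigma):
        inv[s] = k
    return tuple(inv)

def base(history):
    # NOT permutation invariant: ties broken toward the lowest expert index
    K = 3
    cum = [sum(l[k] for l in history) for k in range(K)]
    j = cum.index(min(cum))
    return [1.0 if k == j else 0.0 for k in range(K)]

def symmetrize(strategy, K):
    """w~(l^{t-1}) = (1/K!) * sum over tau of tau^{-1}(strategy(tau(l^{t-1})))."""
    perms = list(permutations(range(K)))
    def tilde(history):
        acc = [0.0] * K
        for tau in perms:
            w = strategy([apply_perm(tau, l) for l in history])
            w = apply_perm(inverse(tau), w)
            acc = [a + x / len(perms) for a, x in zip(acc, w)]
        return acc
    return tilde

tilde = symmetrize(base, K=3)
history = [[1, 0, 1], [0, 1, 1]]      # experts 0 and 1 are tied leaders
# tilde splits the tie evenly, and is permutation invariant:
for sigma in permutations(range(3)):
    lhs = tilde([apply_perm(sigma, l) for l in history])
    rhs = apply_perm(sigma, tilde(history))
    assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
```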
Proof of Theorem 3

For any distribution $P_k$ over binary losses, let $p_k := P_k(l_{t,k} = 1) = \mathbb{E}_{P_k}[l_{t,k}]$. We have:
$$\mathrm{loss}_t(w_t, P) = \frac{1}{K!} \sum_{\sigma \in S_K} \mathbb{E}_{\sigma(P)}\big[ w_t(l^{t-1}) \cdot l_t \big] \tag{2}$$
$$= \frac{1}{K!} \sum_{\sigma \in S_K} \mathbb{E}_{\sigma(P)}\Big[ w_t(l^{t-1}) \cdot \mathbb{E}_{\sigma(P)}[l_t] \Big]$$
$$= \frac{1}{K!} \sum_{\sigma \in S_K} \sum_{l^{t-1}} \Bigg( \prod_{k=1}^K p_{\sigma(k)}^{\sum_{q=1}^{t-1} l_{q,k}} (1 - p_{\sigma(k)})^{t-1-\sum_{q=1}^{t-1} l_{q,k}} \Bigg) \sum_{k=1}^K w_{t,k}(l^{t-1})\, p_{\sigma(k)}$$
$$= \sum_{l^{t-1}} \underbrace{\frac{1}{K!} \sum_{k=1}^K w_{t,k}(l^{t-1}) \sum_{\sigma \in S_K} \Bigg( \prod_{j=1}^K p_{\sigma(j)}^{L_j} (1 - p_{\sigma(j)})^{t-1-L_j} \Bigg) p_{\sigma(k)}}_{=:\, \mathrm{loss}_t(w_t, P \mid l^{t-1})},$$
where in the second equality we used the fact that $w_t$ depends on $l^{t-1}$ and does not depend on $l_t$. Fix $l^{t-1}$ and consider the term $\mathrm{loss}_t(w_t, P \mid l^{t-1})$. This term is linear in $w_t$, hence it is minimized by $w_t = e_k$ for some $k = 1, \ldots, K$, where $e_k$ is the $k$-th standard basis vector, with 1 on the $k$-th coordinate and zeros on the remaining coordinates. We use the shorthand notation $L_j = \sum_{q=1}^{t-1} l_{q,j}$ for $j = 1, \ldots, K$, and $L = (L_1, \ldots, L_K)$. In this notation, we rewrite $\mathrm{loss}_t(w_t, P \mid l^{t-1})$ as:
$$\mathrm{loss}_t(w_t, P \mid l^{t-1}) = \frac{1}{K!} \sum_{k=1}^K w_{t,k}(l^{t-1}) \sum_{\sigma \in S_K} \prod_{j=1}^K p_{\sigma(j)}^{L_j} (1 - p_{\sigma(j)})^{t-1-L_j}\, p_{\sigma(k)}. \tag{3}$$
We will show that for any $P$ and any $l^{t-1}$ (and hence any $L$), $\mathrm{loss}_t(w_t, P \mid l^{t-1})$ is minimized by setting $w_t = e_{k^*}$ for any $k^* \in \mathrm{argmin}_j L_j$. In other words, we will show that for any $P$, $L$, any $k^* \in \mathrm{argmin}_j L_j$, and any $k = 1, \ldots, K$,
$$\mathrm{loss}_t(e_{k^*}, P \mid l^{t-1}) \le \mathrm{loss}_t(e_k, P \mid l^{t-1}),$$
or equivalently, using (3), that for any $P$, $L$, $k^* \in \mathrm{argmin}_j L_j$, and $k = 1, \ldots, K$,
$$\sum_{\sigma \in S_K} \prod_{j=1}^K p_{\sigma(j)}^{L_j} (1 - p_{\sigma(j)})^{t-1-L_j}\, p_{\sigma(k^*)} \le \sum_{\sigma \in S_K} \prod_{j=1}^K p_{\sigma(j)}^{L_j} (1 - p_{\sigma(j)})^{t-1-L_j}\, p_{\sigma(k)}. \tag{4}$$
We proceed by induction on $K$. Take $K = 2$ and note that when $k = k^*$, there is nothing to prove, as both sides of (4) are identical. Therefore, without loss of generality, assume $k^* = 1$ and $k = 2$, which implies $L_1 \le L_2$. Then, (4) reduces to:
$$p_1^{L_1} p_2^{L_2} (1-p_1)^{t-1-L_1} (1-p_2)^{t-1-L_2}\, p_1 + p_2^{L_1} p_1^{L_2} (1-p_2)^{t-1-L_1} (1-p_1)^{t-1-L_2}\, p_2$$
$$\le\; p_1^{L_1} p_2^{L_2} (1-p_1)^{t-1-L_1} (1-p_2)^{t-1-L_2}\, p_2 + p_2^{L_1} p_1^{L_2} (1-p_2)^{t-1-L_1} (1-p_1)^{t-1-L_2}\, p_1.$$
After rearranging the terms, it amounts to showing that:
$$(p_1 p_2)^{L_1} \big( (1-p_1)(1-p_2) \big)^{t-1-L_2} (p_1 - p_2) \Big( \big( p_2(1-p_1) \big)^{L_2-L_1} - \big( p_1(1-p_2) \big)^{L_2-L_1} \Big) \le 0.$$
But this will hold if:
$$(p_1 - p_2) \Big( \big( p_2(1-p_1) \big)^{L_2-L_1} - \big( p_1(1-p_2) \big)^{L_2-L_1} \Big) \le 0. \tag{5}$$
If $L_1 = L_2$, (5) clearly holds; therefore assume $L_1 < L_2$.
We prove the validity of (5) by noticing that:
$$p_2(1-p_1) > p_1(1-p_2) \iff p_2 > p_1,$$
which means that the two factors of the product on the left-hand side of (5) have opposite signs (when $p_1 \ne p_2$) or are zero at the same time (when $p_1 = p_2$). Hence, we proved (5), which implies
(4) when $k^* = 1$ and $k = 2$. The opposite case $k^* = 2$, $k = 1$ with $L_2 \le L_1$ can be shown with exactly the same line of arguments by simply exchanging the indices 1 and 2.

Now, we assume (4) holds for $K - 1 \ge 2$ experts and any $P = (P_1, \ldots, P_{K-1})$, any $L = (L_1, \ldots, L_{K-1})$, any $k^* \in \mathrm{argmin}_{j=1,\ldots,K-1} L_j$, and any $k = 1, \ldots, K-1$, and we show that it also holds for $K$ experts. Take any $k^* \in \mathrm{argmin}_{j=1,\ldots,K} L_j$ and any $k = 1, \ldots, K$. Without loss of generality, assume that $k^* \ne 1$ and $k \ne 1$ (it is always possible to find an expert different than $k^*$ and $k$, because there are $K \ge 3$ experts). We expand the sum over permutations on the left-hand side of (4) with respect to the value of $\sigma(1)$:
$$\sum_{s=1}^K p_s^{L_1} (1-p_s)^{t-1-L_1} \sum_{\sigma:\, \sigma(1) = s}\; \prod_{j=2}^K p_{\sigma(j)}^{L_j} (1-p_{\sigma(j)})^{t-1-L_j}\, p_{\sigma(k^*)},$$
and we also expand the sum on the right-hand side of (4) in the same way. To prove (4), it suffices to show that every term in the sum over $s$ on the left-hand side is not greater than the corresponding term in the sum on the right-hand side, i.e., to show that for any $s = 1, \ldots, K$:
$$\sum_{\sigma:\, \sigma(1) = s} \prod_{j=2}^K p_{\sigma(j)}^{L_j} (1-p_{\sigma(j)})^{t-1-L_j}\, p_{\sigma(k^*)} \le \sum_{\sigma:\, \sigma(1) = s} \prod_{j=2}^K p_{\sigma(j)}^{L_j} (1-p_{\sigma(j)})^{t-1-L_j}\, p_{\sigma(k)}. \tag{6}$$
We now argue that this inequality follows directly from the inductive assumption, by dropping $L_1$ and $P_s$ and applying (4) to the resulting $(K-1)$-expert case. More precisely, note that the sum on both sides of (6) goes over all permutations of the indices $(1, \ldots, s-1, s+1, \ldots, K)$, and since $k^*, k \ne 1$, we have $k^* \in \mathrm{argmin}_{j=2,\ldots,K} L_j$ and $k \ge 2$. Hence, applying (4) to the $(K-1)$-expert case with the $K-1$ distributions $(P_1, \ldots, P_{s-1}, P_{s+1}, \ldots, P_K)$ (or any permutation thereof) and the $K-1$ integers $(L_2, \ldots, L_K)$ immediately implies (6). Thus, we proved (4), which states that $\mathrm{loss}_t(w_t, P \mid l^{t-1})$ is minimized by any leader $k^* \in \mathrm{argmin}_j L_j$, where $L_j = \sum_{q=1}^{t-1} l_{q,j}$. This means $\mathrm{loss}_t(w_t, P \mid l^{t-1})$ is also minimized by the FL strategy $w^*_t$, which distributes its mass uniformly over all leaders. Since FL minimizes $\mathrm{loss}_t(w_t, P \mid l^{t-1})$ for any $l^{t-1}$, by (2) it also minimizes $\mathrm{loss}_t(w_t, P)$. Note that the proof did not require uniform tie-breaking over the leaders, as any distribution over leaders would work as well.
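Inequality (4) can be spot-checked by direct enumeration over permutations; the following sketch (our own verification code, not part of the proof; indices are 0-based) compares the leader's term with every other expert's for one arbitrary choice of parameters:

```python
from itertools import permutations

def side(p, L, k, t):
    """Sum over sigma of prod_j p_{sigma(j)}^{L_j} (1 - p_{sigma(j)})^{t-1-L_j} * p_{sigma(k)},
    i.e., one side of inequality (4)."""
    K = len(p)
    total = 0.0
    for sigma in permutations(range(K)):
        prob = 1.0
        for j in range(K):
            prob *= p[sigma[j]] ** L[j] * (1.0 - p[sigma[j]]) ** (t - 1 - L[j])
        total += prob * p[sigma[k]]
    return total

p = [0.1, 0.6, 0.35]     # arbitrary Bernoulli parameters p_k
L = [2, 0, 1]            # cumulative losses after t - 1 = 3 trials
t = 4
leader = min(range(len(L)), key=lambda j: L[j])   # expert with the smallest L_j
for k in range(len(p)):
    assert side(p, L, leader, t) <= side(p, L, k, t) + 1e-12
```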
Uniform distribution, however, makes the FL strategy permutation invariant.

Proof of Lemma 5

Let $p_k$ be the expectation of $l_{t,k}$ according to either $P_k$ or $P^{\mathrm{bin}}_k$: $p_k := \mathbb{E}_{P_k}[l_{t,k}] = \mathbb{E}_{P^{\mathrm{bin}}_k}[l_{t,k}]$. Since for any prediction strategy $\omega$, $w_t$ depends on $l^{t-1}$ and does not depend on $l_t$, we have
$$\mathbb{E}_P\big[ w^b_t \cdot l_t \big] = \mathbb{E}_P\big[ w^b_t \big] \cdot \mathbb{E}_P[l_t] = \mathbb{E}_P\big[ w^b_t \big] \cdot p,$$
where $p = (p_1, \ldots, p_K)$. Similarly, $\mathbb{E}_{P^{\mathrm{bin}}}[w^*_t \cdot l_t] = \mathbb{E}_{P^{\mathrm{bin}}}[w^*_t] \cdot p$. Hence, we only need to show that $\mathbb{E}_P[w^b_t] = \mathbb{E}_{P^{\mathrm{bin}}}[w^*_t]$. This holds because $w^b_t$ sees only binary outcomes resulting from the joint distribution of $P$ and the distribution of the binarization variables:
$$\mathbb{E}_{l^{t-1} \sim P^{t-1}}\big[ w^b_t(l^{t-1}) \big] = \mathbb{E}_{l^{t-1} \sim P^{t-1},\, b^{t-1}}\big[ w^*_t(b^{t-1}) \big],$$
and for any $b_{t,k}$, the probability (jointly over $P$ and the binarization variables) of $b_{t,k} = 1$ is the same as the probability of $l_{t,k} = 1$ under the distribution $P^{\mathrm{bin}}_k$:
$$P(b_{t,k} = 1) = \int_{[0,1]} P(b_{t,k} = 1 \mid l_{t,k})\, \mathrm{d}P_k(l_{t,k}) = \int_{[0,1]} l_{t,k}\, \mathrm{d}P_k(l_{t,k}) = p_k = P^{\mathrm{bin}}_k(l_{t,k} = 1).$$
Hence,
$$\mathbb{E}_{l^{t-1} \sim P^{t-1},\, b^{t-1}}\big[ w^*_t(b^{t-1}) \big] = \mathbb{E}_{l^{t-1} \sim (P^{\mathrm{bin}})^{t-1}}\big[ w^*_t(l^{t-1}) \big].$$
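Lemma 5 can be illustrated with a quick Monte Carlo check (our own sketch under assumed uniform continuous distributions; the helper names are ours): the expected FL weights under $P$ with binarization match the expected FL weights under the Bernoulli set $P^{\mathrm{bin}}$ with the same means, up to sampling error.

```python
import random

def fl_weights(cum):
    # Follow the Leader: uniform over the leader set
    m = min(cum)
    n = sum(1 for L in cum if L == m)
    return [1.0 / n if L == m else 0.0 for L in cum]

def binarize(l, rng):
    # Bernoulli rounding: b = 1 with probability l, so E[b] = l
    return [int(rng.random() < x) for x in l]

rng = random.Random(1)
N = 50_000
means = [0.3, 0.45]     # common expectations of P_k and P_k^bin
T = 3                   # weights computed from t - 1 = 2 past trials

lhs = [0.0, 0.0]        # estimates E_{P, binarization}[w_t^*(b^{t-1})]
rhs = [0.0, 0.0]        # estimates E_{P^bin}[w_t^*(l^{t-1})]
for _ in range(N):
    # P_k: uniform on [0, 2 * means[k]], a continuous distribution with mean means[k]
    cont = [[rng.uniform(0.0, 2.0 * m) for m in means] for _ in range(T - 1)]
    b = [binarize(l, rng) for l in cont]
    w = fl_weights([sum(l[k] for l in b) for k in range(2)])
    lhs = [a + x / N for a, x in zip(lhs, w)]
    # P_k^bin: Bernoulli(means[k]) directly
    binary = [[int(rng.random() < m) for m in means] for _ in range(T - 1)]
    w2 = fl_weights([sum(l[k] for l in binary) for k in range(2)])
    rhs = [a + x / N for a, x in zip(rhs, w2)]

assert all(abs(x - y) < 0.03 for x, y in zip(lhs, rhs))
```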
Proof of Lemma 6

Lemma 5 shows that the expected loss of $\omega^b$ on $P$ is the same as the expected loss of $\omega^*$ on $P^{\mathrm{bin}}$. Hence, to prove the inequality, we only need to consider the "loss of the best expert" part of each measure. For the expected redundancy and the excess risk, it directly follows from the definition of $P^{\mathrm{bin}}$ that for any $t, k$, $\mathbb{E}_P[l_{t,k}] = \mathbb{E}_{P^{\mathrm{bin}}}[l_{t,k}]$; hence $\min_k \mathbb{E}_P[l_{T,k}] = \min_k \mathbb{E}_{P^{\mathrm{bin}}}[l_{T,k}]$, and similarly, $\min_k \mathbb{E}_P[L_{T,k}] = \min_k \mathbb{E}_{P^{\mathrm{bin}}}[L_{T,k}]$. Thus, for the expected redundancy and the excess risk, the lemma actually holds with equality. For the expected regret, we will show that $\mathbb{E}_P[\min_k L_{T,k}] \ge \mathbb{E}_{P^{\mathrm{bin}}}[\min_k L_{T,k}]$, which will finish the proof. Using the argument from the proof of Lemma 5, and denoting $B_{T,k} = \sum_{t=1}^T b_{t,k}$, we have:
$$\mathbb{E}_{l^T \sim (P^{\mathrm{bin}})^T}\big[ \min_k L_{T,k} \big] = \mathbb{E}_{l^T \sim P^T,\, b^T}\big[ \min_k B_{T,k} \big] \le \mathbb{E}_{l^T \sim P^T}\big[ \min_k \mathbb{E}_{b^T}[B_{T,k}] \big] = \mathbb{E}_{l^T \sim P^T}\big[ \min_k L_{T,k} \big],$$
where the inequality follows from Jensen's inequality applied to the concave function $\min(\cdot)$.

Proof of Theorem 7

Since $\omega^b$ is the same as $\omega^*$ when all the losses are binary, $R(\omega^b, P) = R(\omega^*, P)$ for every $P$ over binary losses. Furthermore, Lemma 6 states that $R(\omega^b, P) \le R(\omega^*, P^{\mathrm{bin}})$, i.e., for every $P$ over continuous losses there is a corresponding $P^{\mathrm{bin}}$ over binary losses which incurs at least the same regret/redundancy/risk to $\omega^b$. Therefore,
$$\sup_{P \text{ on } [0,1]} R(\omega^b, P) = \sup_{P \text{ on } \{0,1\}} R(\omega^b, P) = \sup_{P \text{ on } \{0,1\}} R(\omega^*, P).$$
By the second part of Corollary 4, for any prediction strategy $\omega$:
$$\sup_{P \text{ on } \{0,1\}} R(\omega^*, P) \le \sup_{P \text{ on } \{0,1\}} R(\omega, P) \le \sup_{P \text{ on } [0,1]} R(\omega, P),$$
which finishes the proof.

A counterexample to the FL strategy

We now argue that the vanilla FL is not the minimax strategy for continuous losses, so that the binarization procedure is justified. We will only consider the excess risk for simplicity, but similar arguments give a counterexample for the expected regret and the expected redundancy as well. Take $K = 2$ experts, $T = 2$ iterations, and distributions $P_1, P_2$ on binary losses. Denote $p_1 = P_1(l_{t,1} = 1)$ and $p_2 = P_2(l_{t,2} = 1)$, and assume $p_1 \le p_2$.
The excess risk of the FL strategy is given by:
$$\Big( \underbrace{p_1 p_2 + (1-p_1)(1-p_2)}_{\text{ties}} \Big) \frac{p_1 + p_2}{2} + p_1(1-p_2)\, p_2 + p_2(1-p_1)\, p_1 - p_1 = \frac{\delta}{2} - \frac{\delta^2}{2},$$
where $\delta = p_2 - p_1$. Maximizing the expression above over $\delta$ gives $\delta = \frac{1}{2}$, and the maximum risk of FL on binary losses is equal to $\frac{1}{8}$. This is also the minimax risk on continuous losses, as follows from the proof of Theorem 7 (because the binarized FL is the minimax strategy on continuous losses, and it achieves its maximum risk on binary losses). We now show that there exist distributions $P_1, P_2$ on continuous losses which force FL to suffer excess risk greater than $\frac{1}{8}$. We take $P_1$ with support on the two points $\{\epsilon, 1\}$, where $\epsilon$ is a very small positive number, and $p_1 = P_1(l_{t,1} = 1)$. Note that $\mathbb{E}[l_{t,1}] = p_1 + \epsilon(1-p_1)$. $P_2$ has support on $\{0, 1-\epsilon\}$, and we let $p_2 = P_2(l_{t,2} = 1-\epsilon)$. We also assume $p_1 < p_2$, i.e., expert 1 is the better expert. The main idea in this counterexample is that, due to the $\epsilon$ shifts, all ties (in the sense of the binary outcomes) are resolved in favor of expert 2, which makes the FL algorithm suffer more loss. More precisely, the risk of FL is now given by:
$$\Big( \underbrace{p_1 p_2 + (1-p_1)(1-p_2)}_{\text{ties}} \Big) p_2 + p_1(1-p_2)\, p_2 + p_2(1-p_1)\, p_1 - p_1 + O(\epsilon).$$
Choosing, e.g., $p_1 = 0$ and $p_2 = 0.5$, this gives $\frac{1}{4} - O(\epsilon)$ excess risk, which is more than $\frac{1}{8}$ given that we take $\epsilon$ sufficiently small.
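The two risk expressions above are easy to verify numerically; the sketch below (our own check, with hypothetical helper names) evaluates the exact $T = 2$ excess risk in the binary case and in the $\epsilon \to 0$ limit of the shifted distributions:

```python
def fl_binary_risk(p1, p2):
    """Exact excess risk of FL for K = 2, T = 2 with Bernoulli(p1), Bernoulli(p2),
    assuming p1 <= p2 (expert 1 is the better expert)."""
    ties = p1 * p2 + (1 - p1) * (1 - p2)          # both experts got the same loss
    return (ties * (p1 + p2) / 2                   # tie -> play uniformly in round 2
            + p1 * (1 - p2) * p2                   # expert 1 lost -> follow expert 2
            + p2 * (1 - p1) * p1                   # expert 2 lost -> follow expert 1
            - p1)                                  # minus the best expert's risk

# delta/2 - delta^2/2 is maximized at delta = p2 - p1 = 1/2, giving 1/8
assert abs(fl_binary_risk(0.0, 0.5) - 0.125) < 1e-9
assert abs(fl_binary_risk(0.25, 0.75) - 0.125) < 1e-9

def fl_shifted_risk(p1, p2):
    """eps -> 0 limit of the excess risk when P1 has support {eps, 1} and
    P2 has support {0, 1 - eps}: every tie is resolved in favor of expert 2."""
    ties = p1 * p2 + (1 - p1) * (1 - p2)
    return ties * p2 + p1 * (1 - p2) * p2 + p2 * (1 - p1) * p1 - p1

assert abs(fl_shifted_risk(0.0, 0.5) - 0.25) < 1e-9   # 1/4 > 1/8
```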
Randomized Smoothing Netorks Maurice Herlihy Computer Science Dept., Bron University, Providence, RI, USA Srikanta Tirthapura Dept. of Electrical and Computer Engg., Ioa State University, Ames, IA, USA
More informationPerceptron Mistake Bounds
Perceptron Mistake Bounds Mehryar Mohri, and Afshin Rostamizadeh Google Research Courant Institute of Mathematical Sciences Abstract. We present a brief survey of existing mistake bounds and introduce
More informationA Generalization of a result of Catlin: 2-factors in line graphs
AUSTRALASIAN JOURNAL OF COMBINATORICS Volume 72(2) (2018), Pages 164 184 A Generalization of a result of Catlin: 2-factors in line graphs Ronald J. Gould Emory University Atlanta, Georgia U.S.A. rg@mathcs.emory.edu
More informationOptimal and Adaptive Online Learning
Optimal and Adaptive Online Learning Haipeng Luo Advisor: Robert Schapire Computer Science Department Princeton University Examples of Online Learning (a) Spam detection 2 / 34 Examples of Online Learning
More informationUC Berkeley Department of Electrical Engineering and Computer Sciences. EECS 126: Probability and Random Processes
UC Berkeley Department of Electrical Engineering and Computer Sciences EECS 26: Probability and Random Processes Problem Set Spring 209 Self-Graded Scores Due:.59 PM, Monday, February 4, 209 Submit your
More informationMinimax Fixed-Design Linear Regression
JMLR: Workshop and Conference Proceedings vol 40:1 14, 2015 Mini Fixed-Design Linear Regression Peter L. Bartlett University of California at Berkeley and Queensland University of Technology Wouter M.
More informationGeneralization bounds
Advanced Course in Machine Learning pring 200 Generalization bounds Handouts are jointly prepared by hie Mannor and hai halev-hwartz he problem of characterizing learnability is the most basic question
More informationEfficient Minimax Strategies for Square Loss Games
Efficient Minimax Strategies for Square Loss Games Wouter M. Koolen Queensland University of Technology and UC Berkeley wouter.koolen@qut.edu.au Alan Malek University of California, Berkeley malek@eecs.berkeley.edu
More informationFoundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationAdvanced Machine Learning
Advanced Machine Learning Learning and Games MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Normal form games Nash equilibrium von Neumann s minimax theorem Correlated equilibrium Internal
More informationOnline Bounds for Bayesian Algorithms
Online Bounds for Bayesian Algorithms Sham M. Kakade Computer and Information Science Department University of Pennsylvania Andrew Y. Ng Computer Science Department Stanford University Abstract We present
More informationApplications of on-line prediction. in telecommunication problems
Applications of on-line prediction in telecommunication problems Gábor Lugosi Pompeu Fabra University, Barcelona based on joint work with András György and Tamás Linder 1 Outline On-line prediction; Some
More informationBandits for Online Optimization
Bandits for Online Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Bandits for Online Optimization 1 / 16 The multiarmed bandit problem... K slot machines Each
More informationUnconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations
JMLR: Workshop and Conference Proceedings vol 35:1 20, 2014 Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations H. Brendan McMahan MCMAHAN@GOOGLE.COM Google,
More informationBandit models: a tutorial
Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses
More informationGambling in a rigged casino: The adversarial multi-armed bandit problem
Gambling in a rigged casino: The adversarial multi-armed bandit problem Peter Auer Institute for Theoretical Computer Science University of Technology Graz A-8010 Graz (Austria) pauer@igi.tu-graz.ac.at
More informationPreliminaries. Definition: The Euclidean dot product between two vectors is the expression. i=1
90 8 80 7 70 6 60 0 8/7/ Preliminaries Preliminaries Linear models and the perceptron algorithm Chapters, T x + b < 0 T x + b > 0 Definition: The Euclidean dot product beteen to vectors is the expression
More informationMax-Margin Ratio Machine
JMLR: Workshop and Conference Proceedings 25:1 13, 2012 Asian Conference on Machine Learning Max-Margin Ratio Machine Suicheng Gu and Yuhong Guo Department of Computer and Information Sciences Temple University,
More informationCS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality
CS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality Tim Roughgarden December 3, 2014 1 Preamble This lecture closes the loop on the course s circle of
More informationBandit Convex Optimization: T Regret in One Dimension
Bandit Convex Optimization: T Regret in One Dimension arxiv:1502.06398v1 [cs.lg 23 Feb 2015 Sébastien Bubeck Microsoft Research sebubeck@microsoft.com Tomer Koren Technion tomerk@technion.ac.il February
More informationCS281B/Stat241B. Statistical Learning Theory. Lecture 14.
CS281B/Stat241B. Statistical Learning Theory. Lecture 14. Wouter M. Koolen Convex losses Exp-concave losses Mixable losses The gradient trick Specialists 1 Menu Today we solve new online learning problems
More informationEfficient learning by implicit exploration in bandit problems with side observations
Efficient learning by implicit exploration in bandit problems with side observations omáš Kocák Gergely Neu Michal Valko Rémi Munos SequeL team, INRIA Lille Nord Europe, France {tomas.kocak,gergely.neu,michal.valko,remi.munos}@inria.fr
More informationTowards a Theory of Societal Co-Evolution: Individualism versus Collectivism
Toards a Theory of Societal Co-Evolution: Individualism versus Collectivism Kartik Ahuja, Simpson Zhang and Mihaela van der Schaar Department of Electrical Engineering, Department of Economics, UCLA Theorem
More informationTime Series Prediction & Online Learning
Time Series Prediction & Online Learning Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values. earthquakes.
More informationOnline Learning for Time Series Prediction
Online Learning for Time Series Prediction Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values.
More informationA survey: The convex optimization approach to regret minimization
A survey: The convex optimization approach to regret minimization Elad Hazan September 10, 2009 WORKING DRAFT Abstract A well studied and general setting for prediction and decision making is regret minimization
More informationExplore no more: Improved high-probability regret bounds for non-stochastic bandits
Explore no more: Improved high-probability regret bounds for non-stochastic bandits Gergely Neu SequeL team INRIA Lille Nord Europe gergely.neu@gmail.com Abstract This work addresses the problem of regret
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Adversarial bandits Definition: sequential game. Lower bounds on regret from the stochastic case. Exp3: exponential weights
More informationk-fold unions of low-dimensional concept classes
k-fold unions of low-dimensional concept classes David Eisenstat September 2009 Abstract We show that 2 is the minimum VC dimension of a concept class whose k-fold union has VC dimension Ω(k log k). Keywords:
More informationBeyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization
JMLR: Workshop and Conference Proceedings vol (2010) 1 16 24th Annual Conference on Learning heory Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization
More informationModèles stochastiques II
Modèles stochastiques II INFO 154 Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 1 http://ulbacbe/di Modéles stochastiques II p1/50 The basics of statistics Statistics starts ith
More informationCompeting With Strategies
JMLR: Workshop and Conference Proceedings vol (203) 27 Competing With Strategies Wei Han Alexander Rakhlin Karthik Sridharan wehan@wharton.upenn.edu rakhlin@wharton.upenn.edu skarthik@wharton.upenn.edu
More informationLecture 3: Lower Bounds for Bandit Algorithms
CMSC 858G: Bandits, Experts and Games 09/19/16 Lecture 3: Lower Bounds for Bandit Algorithms Instructor: Alex Slivkins Scribed by: Soham De & Karthik A Sankararaman 1 Lower Bounds In this lecture (and
More informationOnline Learning with Predictable Sequences
JMLR: Workshop and Conference Proceedings vol (2013) 1 27 Online Learning with Predictable Sequences Alexander Rakhlin Karthik Sridharan rakhlin@wharton.upenn.edu skarthik@wharton.upenn.edu Abstract We
More information0.1 Motivating example: weighted majority algorithm
princeton univ. F 16 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: Sanjeev Arora (Today s notes
More informationThe best expert versus the smartest algorithm
Theoretical Computer Science 34 004 361 380 www.elsevier.com/locate/tcs The best expert versus the smartest algorithm Peter Chen a, Guoli Ding b; a Department of Computer Science, Louisiana State University,
More informationLong run input use-input price relations and the cost function Hessian. Ian Steedman Manchester Metropolitan University
Long run input use-input price relations and the cost function Hessian Ian Steedman Manchester Metropolitan University Abstract By definition, to compare alternative long run equilibria is to compare alternative
More informationThe Online Approach to Machine Learning
The Online Approach to Machine Learning Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53 Summary 1 My beautiful regret 2 A supposedly fun game I
More informationA Second-order Bound with Excess Losses
A Second-order Bound with Excess Losses Pierre Gaillard 12 Gilles Stoltz 2 Tim van Erven 3 1 EDF R&D, Clamart, France 2 GREGHEC: HEC Paris CNRS, Jouy-en-Josas, France 3 Leiden University, the Netherlands
More informationToward a Classification of Finite Partial-Monitoring Games
Toward a Classification of Finite Partial-Monitoring Games Gábor Bartók ( *student* ), Dávid Pál, and Csaba Szepesvári Department of Computing Science, University of Alberta, Canada {bartok,dpal,szepesva}@cs.ualberta.ca
More informationCHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum
CHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum 1997 19 CHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION 3.0. Introduction
More informationEndogeny for the Logistic Recursive Distributional Equation
Zeitschrift für Analysis und ihre Anendungen Journal for Analysis and its Applications Volume 30 20, 237 25 DOI: 0.47/ZAA/433 c European Mathematical Society Endogeny for the Logistic Recursive Distributional
More informationBregman Divergence and Mirror Descent
Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,
More informationLecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora
princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: (Today s notes below are
More informationAsymptotic redundancy and prolixity
Asymptotic redundancy and prolixity Yuval Dagan, Yuval Filmus, and Shay Moran April 6, 2017 Abstract Gallager (1978) considered the worst-case redundancy of Huffman codes as the maximum probability tends
More informationLecture 4: Lower Bounds (ending); Thompson Sampling
CMSC 858G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling Instructor: Alex Slivkins Scribed by: Guowei Sun,Cheng Jie 1 Lower bounds on regret (ending) Recap from
More informationMulti-armed bandit models: a tutorial
Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)
More informationarxiv: v1 [cs.lg] 1 Jun 2016
Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits arxiv:1606.00313v1 cs.g 1 Jun 2016 Vasilis Syrgkanis Microsoft Research Haipeng uo Princeton University Robert E. Schapire Microsoft
More informationOnline prediction with expert advise
Online prediction with expert advise Jyrki Kivinen Australian National University http://axiom.anu.edu.au/~kivinen Contents 1. Online prediction: introductory example, basic setting 2. Classification with
More informationOn a generalized combinatorial conjecture involving addition mod 2 k 1
On a generalized combinatorial conjecture involving addition mod k 1 Gérard Cohen Jean-Pierre Flori Tuesday 14 th February, 01 Abstract In this note, e give a simple proof of the combinatorial conjecture
More informationBivariate Uniqueness in the Logistic Recursive Distributional Equation
Bivariate Uniqueness in the Logistic Recursive Distributional Equation Antar Bandyopadhyay Technical Report # 629 University of California Department of Statistics 367 Evans Hall # 3860 Berkeley CA 94720-3860
More informationCombinatorial Bandits
Combinatorial Bandits Nicolò Cesa-Bianchi Università degli Studi di Milano, Italy Gábor Lugosi 1 ICREA and Pompeu Fabra University, Spain Abstract We study sequential prediction problems in which, at each
More informationOn the approximation of real powers of sparse, infinite, bounded and Hermitian matrices
On the approximation of real poers of sparse, infinite, bounded and Hermitian matrices Roman Werpachoski Center for Theoretical Physics, Al. Lotnikó 32/46 02-668 Warszaa, Poland Abstract We describe a
More informationAdaptive Online Learning in Dynamic Environments
Adaptive Online Learning in Dynamic Environments Lijun Zhang, Shiyin Lu, Zhi-Hua Zhou National Key Laboratory for Novel Software Technology Nanjing University, Nanjing 210023, China {zhanglj, lusy, zhouzh}@lamda.nju.edu.cn
More informationLecture 16: Perceptron and Exponential Weights Algorithm
EECS 598-005: Theoretical Foundations of Machine Learning Fall 2015 Lecture 16: Perceptron and Exponential Weights Algorithm Lecturer: Jacob Abernethy Scribes: Yue Wang, Editors: Weiqing Yu and Andrew
More informationy(x) = x w + ε(x), (1)
Linear regression We are ready to consider our first machine-learning problem: linear regression. Suppose that e are interested in the values of a function y(x): R d R, here x is a d-dimensional vector-valued
More informationLecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem
Lecture 9: UCB Algorithm and Adversarial Bandit Problem EECS598: Prediction and Learning: It s Only a Game Fall 03 Lecture 9: UCB Algorithm and Adversarial Bandit Problem Prof. Jacob Abernethy Scribe:
More informationThe sample complexity of agnostic learning with deterministic labels
The sample complexity of agnostic learning with deterministic labels Shai Ben-David Cheriton School of Computer Science University of Waterloo Waterloo, ON, N2L 3G CANADA shai@uwaterloo.ca Ruth Urner College
More informationEarly & Quick COSMIC-FFP Analysis using Analytic Hierarchy Process
Early & Quick COSMIC-FFP Analysis using Analytic Hierarchy Process uca Santillo (luca.santillo@gmail.com) Abstract COSMIC-FFP is a rigorous measurement method that makes possible to measure the functional
More informationExtracting Certainty from Uncertainty: Regret Bounded by Variation in Costs
Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Microsoft Research 1 Microsoft Way, Redmond,
More informationOnline Learning with Feedback Graphs
Online Learning with Feedback Graphs Nicolò Cesa-Bianchi Università degli Studi di Milano Joint work with: Noga Alon (Tel-Aviv University) Ofer Dekel (Microsoft Research) Tomer Koren (Technion and Microsoft
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning
More informationNeural Networks. Associative memory 12/30/2015. Associative memories. Associative memories
//5 Neural Netors Associative memory Lecture Associative memories Associative memories The massively parallel models of associative or content associative memory have been developed. Some of these models
More informationThe Free Matrix Lunch
The Free Matrix Lunch Wouter M. Koolen Wojciech Kot lowski Manfred K. Warmuth Tuesday 24 th April, 2012 Koolen, Kot lowski, Warmuth (RHUL) The Free Matrix Lunch Tuesday 24 th April, 2012 1 / 26 Introduction
More informationCHARACTERIZATION OF ULTRASONIC IMMERSION TRANSDUCERS
CHARACTERIZATION OF ULTRASONIC IMMERSION TRANSDUCERS INTRODUCTION David D. Bennink, Center for NDE Anna L. Pate, Engineering Science and Mechanics Ioa State University Ames, Ioa 50011 In any ultrasonic
More informationRegret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I Sébastien Bubeck Theory Group i.i.d. multi-armed bandit, Robbins [1952] i.i.d. multi-armed bandit, Robbins [1952] Known
More informationHedging Structured Concepts
Hedging Structured Concepts Wouter M. Koolen Advanced Systems Research Centrum Wiskunde en Informatica wmkoolen@cwi.nl Manfred K. Warmuth Department of Computer Science UC Santa Cruz manfred@cse.ucsc.edu
More informationTheory and Applications of A Repeated Game Playing Algorithm. Rob Schapire Princeton University [currently visiting Yahoo!
Theory and Applications of A Repeated Game Playing Algorithm Rob Schapire Princeton University [currently visiting Yahoo! Research] Learning Is (Often) Just a Game some learning problems: learn from training
More information