A Second-order Bound with Excess Losses


1 A Second-order Bound with Excess Losses
Pierre Gaillard (1,2), Gilles Stoltz (2), Tim van Erven (3)
(1) EDF R&D, Clamart, France
(2) GREGHEC: HEC Paris -- CNRS, Jouy-en-Josas, France
(3) Leiden University, the Netherlands
June 14, 2014

2 Setting of prediction with expert advice

In each round t:
- the learner makes a prediction by choosing a vector $p_t = (p_{1,t}, \ldots, p_{K,t})$ of non-negative weights that sum to one;
- every expert $k \in \{1, \ldots, K\}$ incurs a loss $\ell_{k,t} \in [0, 1]$;
- the learner's loss is $\hat\ell_t = p_t^\top \ell_t = \sum_{k=1}^K p_{k,t}\,\ell_{k,t}$.

The goal of the learner is to control his cumulative loss, which he can do by controlling his cumulative regret against any expert k:
$R_{k,T} = \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)$
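As an illustration of this protocol (a minimal sketch on synthetic losses, not part of the talk), the learner's loss and the regret against each expert can be computed as follows:

```python
import numpy as np

# Hypothetical data: T rounds, K experts, losses drawn uniformly in [0, 1].
rng = np.random.default_rng(0)
T, K = 100, 3
losses = rng.random((T, K))          # l_{k,t} in [0, 1]

cum_learner = 0.0                    # learner's cumulative loss
cum_experts = np.zeros(K)            # each expert's cumulative loss L_{k,T}
for t in range(T):
    p_t = np.full(K, 1.0 / K)        # here: uniform weights (any non-negative vector summing to 1)
    ell_hat = p_t @ losses[t]        # learner's loss \hat l_t = sum_k p_{k,t} l_{k,t}
    cum_learner += ell_hat
    cum_experts += losses[t]

regret = cum_learner - cum_experts   # R_{k,T} = sum_t (\hat l_t - l_{k,t}), one entry per expert
```

With uniform weights the learner's loss is the average expert loss, so the K regrets sum to zero; the algorithms of the talk instead choose $p_t$ adaptively so as to keep every $R_{k,T}$ small.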

3 Regret bounds

Worst-case: $R_{k,T} \lesssim \sqrt{T \ln K}$

Improvement for small losses [Cesa-Bianchi and Lugosi, 2006]: $R_{k,T} \lesssim \sqrt{L_{k,T} \ln K}$ where $L_{k,T} = \sum_{t=1}^T \ell_{k,t}$

Second-order [Cesa-Bianchi et al., 2007; Hazan and Kale, 2011]:
$R_{k,T} \lesssim \frac{\ln K}{\eta} + \eta \sum_{t=1}^T \ell_{k,t}^2$, but no method to optimize $\eta$ and reach $R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \ell_{k,t}^2}$
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T v_t}$ where $v_t = \sum_{k=1}^K p_{k,t}\big(\ell_{k,t} - \hat\ell_t\big)^2$

Our contribution: a new second-order bound in terms of excess losses:
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2}$

9 A Second-order Bound with Excess Losses

We provide a third form of second-order bound:
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2}$.  (1)

Features of the bound:
- bounds of the form (1) entail optimal scaling in the setting of experts reporting confidences [Blum and Mansour, 2007];
- improvement for small excess losses;
- constant regret in the special case of i.i.d. losses [Van Erven et al., 2011];
- probabilistic bounds on the cumulative predictive risk [Wintenberger, 2014].

Key element in the analysis: consider multiple learning rates [Blum and Mansour, 2007] and develop the tuning techniques that go with them.

10 The Prod forecaster [Cesa-Bianchi et al., 2007]

Parameter: $\eta > 0$
Initialization: $w_0 = (1/K, \ldots, 1/K)$
For each round $t = 1, 2, \ldots$:
- assign to each expert k the weight $p_{k,t} = w_{k,t-1} \big/ \sum_j w_{j,t-1}$;
- for each expert k perform the update $w_{k,t} = w_{k,t-1}\big(1 + \eta\,(\hat\ell_t - \ell_{k,t})\big)$.

If $\eta \le 1/2$ and $\ell_t \in [0,1]^K$, the cumulative regret is bounded as
$R_{k,T} \le \frac{\ln K}{\eta} + \eta \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2$
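A direct transcription of this pseudocode (an illustrative sketch; the data and the choice $\eta = 1/4$ are hypothetical):

```python
import numpy as np

def prod_forecaster(losses, eta=0.25):
    """Prod [Cesa-Bianchi et al., 2007]: returns the regret R_{k,T} per expert.

    losses: (T, K) array with entries in [0, 1]; the stated bound needs eta <= 1/2.
    """
    K = losses.shape[1]
    w = np.full(K, 1.0 / K)                        # w_0 = (1/K, ..., 1/K)
    cum_learner, cum_experts = 0.0, np.zeros(K)
    for l_t in losses:
        p = w / w.sum()                            # p_{k,t} = w_{k,t-1} / sum_j w_{j,t-1}
        ell_hat = p @ l_t                          # learner's loss
        w = w * (1.0 + eta * (ell_hat - l_t))      # w_{k,t} = w_{k,t-1}(1 + eta (\hat l_t - l_{k,t}))
        cum_learner += ell_hat
        cum_experts += l_t
    return cum_learner - cum_experts

regret = prod_forecaster(np.random.default_rng(1).random((200, 3)))
```

Since $\eta \le 1/2$ and excess losses lie in $[-1, 1]$, every multiplicative factor stays positive, so the weights never vanish.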

11 Prod with multiple learning rates (ML-Prod)

Parameters: $\eta_1, \ldots, \eta_K > 0$
Initialization: $w_0 = (1/K, \ldots, 1/K)$
For each round $t = 1, 2, \ldots$:
- assign to each expert k the weight $p_{k,t} = \eta_k w_{k,t-1} \big/ \sum_j \eta_j w_{j,t-1}$;
- for each expert k perform the update $w_{k,t} = w_{k,t-1}\big(1 + \eta_k\,(\hat\ell_t - \ell_{k,t})\big)$.

If $\eta_k \le 1/2$ and $\ell_t \in [0,1]^K$, the cumulative regret is bounded as
$R_{k,T} \le \frac{\ln K}{\eta_k} + \eta_k \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2$
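The change relative to Prod is small: each expert keeps its own rate $\eta_k$, which enters both the weight normalization and the update. A sketch (hypothetical data and rates):

```python
import numpy as np

def ml_prod(losses, etas):
    """ML-Prod with one fixed learning rate eta_k per expert (illustrative sketch).

    losses: (T, K) array in [0, 1]; etas: (K,) array with entries <= 1/2.
    Returns the regret R_{k,T} against each expert k.
    """
    K = losses.shape[1]
    w = np.full(K, 1.0 / K)
    cum_learner, cum_experts = 0.0, np.zeros(K)
    for l_t in losses:
        p = etas * w / (etas * w).sum()            # p_{k,t} proportional to eta_k w_{k,t-1}
        ell_hat = p @ l_t
        w = w * (1.0 + etas * (ell_hat - l_t))     # per-expert Prod update
        cum_learner += ell_hat
        cum_experts += l_t
    return cum_learner - cum_experts

etas = np.array([0.5, 0.1, 0.02])                  # different rates, all <= 1/2
regret = ml_prod(np.random.default_rng(2).random((200, 3)), etas)
```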

12 Prod with multiple learning rates (ML-Prod)

(Same algorithm as on the previous slide.)

If we could optimize $\eta_k = \sqrt{\ln K \big/ \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2}$, we would get
$R_{k,T} \le 2\sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2}$.

The learning rates can be calibrated online at the multiplicative cost $\ln\ln T$ in the regret bound.

13 Improvement for small excess losses

If a strategy satisfies a bound of the form
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2} + \ldots$
then, if $\ell_t \in [0,1]^K$, it also satisfies
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t:\,\ell_{k,t} \le \hat\ell_t} \big(\hat\ell_t - \ell_{k,t}\big)} + \ldots$

This bound is invariant by translation of the losses and implies the improvement for small losses, $R_{k,T} \lesssim \sqrt{(\ln K)\,L_{k,T}}$.
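The first implication can be sketched in a couple of lines (a reconstruction with constants omitted, not taken from the slides). Since the losses lie in $[0,1]$, every squared excess is at most the absolute excess, so with $a = \sum_{t:\,\ell_{k,t}\le\hat\ell_t}(\hat\ell_t-\ell_{k,t})$ and $b = \sum_{t:\,\ell_{k,t}>\hat\ell_t}(\ell_{k,t}-\hat\ell_t)$:

```latex
\begin{aligned}
\sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2
  &\le \sum_{t=1}^T \big|\hat\ell_t - \ell_{k,t}\big| = a + b,
  \qquad R_{k,T} = a - b, \\
R_{k,T} &\lesssim \sqrt{(\ln K)(a+b)}
  = \sqrt{(\ln K)\,(2a - R_{k,T})}
  \le \sqrt{2a \ln K}
  \quad \text{whenever } R_{k,T} \ge 0.
\end{aligned}
```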

14 Experts that report their confidence [Blum and Mansour, 2007]

In each round $t = 1, \ldots, T$:
- each expert k expresses his confidence as a number $I_{k,t} \in [0, 1]$;
- the learner makes a prediction by choosing a vector $p_t = (p_{1,t}, \ldots, p_{K,t})$ of non-negative weights that sum to one;
- every expert k incurs a loss $\ell_{k,t} \in [0, 1]$;
- the learner's loss is $\hat\ell_t = p_t^\top \ell_t = \sum_{k=1}^K p_{k,t}\,\ell_{k,t}$.

The learner aims at minimizing his confidence regret simultaneously for all experts:
$R^c_{k,T} = \sum_{t=1}^T I_{k,t}\big(\hat\ell_t - \ell_{k,t}\big)$

The special case $I_{k,t} = 0$ expresses that expert k is inactive in round t.

15 Experts that report their confidence [Blum and Mansour, 2007]

(Same setting as on the previous slide.)

The best available stated bound [Blum and Mansour, 2007] is of the form
$R^c_{k,T} = \sum_{t=1}^T I_{k,t}\big(\hat\ell_t - \ell_{k,t}\big) \lesssim \sqrt{(\ln K) \sum_{t=1}^T I_{k,t}\,\ell_{k,t}}$.

16 Application to experts that report their confidence

If a strategy satisfies a standard regret bound of the form
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2} + \ldots$
then, if $\ell_t \in [0,1]^K$, applying the strategy to the modified losses
$\tilde\ell_{k,t} = I_{k,t}\,\ell_{k,t} + (1 - I_{k,t})\,\hat\ell_t$
leads to an algorithm with a confidence regret bound of the form
$R^c_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T I_{k,t}^2\big(\hat\ell_t - \ell_{k,t}\big)^2} + \ldots$ for all k.
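One round of this reduction can be sketched as below (the helper name `modified_losses` and the numbers are ours). The useful identity is $\hat\ell_t - \tilde\ell_{k,t} = I_{k,t}(\hat\ell_t - \ell_{k,t})$: the excess of the modified loss is exactly the summand of the confidence regret.

```python
import numpy as np

def modified_losses(p, l_t, I_t):
    """One round of the reduction for confidence experts (illustrative sketch).

    p: weights played, l_t: true losses, I_t: confidences, all of length K.
    """
    ell_hat = p @ l_t                              # learner's loss on the true losses
    tilde_l = I_t * l_t + (1.0 - I_t) * ell_hat    # modified losses fed to the base strategy
    return ell_hat, tilde_l

p = np.array([0.5, 0.3, 0.2])
l_t = np.array([0.2, 0.9, 0.4])
I_t = np.array([1.0, 0.5, 0.0])                    # expert 3 is inactive this round
ell_hat, tilde_l = modified_losses(p, l_t, I_t)
# An inactive expert (I = 0) gets the learner's own loss, hence zero excess this round.
```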

17 Stochastic (i.i.d.) losses

We now turn to a stochastic setting considered by [Van Erven et al., 2011], in which the loss vectors are identically distributed.

Assumption [Van Erven et al., 2011]: the loss vectors $\ell_t \in [0,1]^K$ are independent random variables such that there exist an expert $k^\star$ and some $\alpha \in (0, 1]$ with, for all $t \ge 1$, $\min_{k \ne k^\star} \mathbb{E}\big[\ell_{k,t} - \ell_{k^\star,t}\big] \ge \alpha$.

If some strategy satisfies $R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2} + \ldots$, then
$\mathbb{E}\big[R_{k^\star,T}\big] \lesssim \frac{\ln K}{\alpha}$
and, for any $\delta \in (0, 1)$, with probability at least $1 - \delta$,
$R_{k^\star,T} \lesssim \frac{\ln K + \ln(1/\delta)}{\alpha}$

18 Application to cumulative risk [Wintenberger, 2014]

Some additional results were obtained recently by [Wintenberger, 2014], who
- extends the analysis to exponential updates;
- proves that deterministic second-order bounds in excess losses imply bounds on the cumulative risk in a quite general stochastic setting.

19 Summary

We provide a new form of second-order bound with several desirable features:
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2} + \ldots$
- optimal scaling in the setting of experts reporting confidences;
- improvement for small excess losses;
- constant regret for i.i.d. losses;
- probabilistic bounds on the cumulative risk.

Thank you!

20 Adaptive version of ML-Prod

Parameters: a rule to pick $\eta_{k,t}$ online
Initialization: $w_0 = (1/K, \ldots, 1/K)$
For each round $t = 1, 2, \ldots$:
- assign to each expert k the weight $p_{k,t} \propto \eta_{k,t-1}\,w_{k,t-1}$;
- for each expert k perform the update $w_{k,t} = \Big(w_{k,t-1}\big(1 + \eta_{k,t-1}(\hat\ell_t - \ell_{k,t})\big)\Big)^{\eta_{k,t}/\eta_{k,t-1}}$.

If $0 \le \eta_{k,t} \le 1/2$, $(\eta_{k,t})_t$ is non-increasing in t and $\ell_t \in [0,1]^K$, then
$R_{k,T} \le \dfrac{\ln K}{\eta_{k,T}} + \sum_{t=1}^T \eta_{k,t-1}\big(\hat\ell_t - \ell_{k,t}\big)^2 + \underbrace{\dfrac{1}{e}\sum_{t=1}^T \Big(\dfrac{\eta_{k,t-1}}{\eta_{k,t}} - 1\Big)}_{\text{cost of tuning multiple learning rates}}$

21 Adaptive version of ML-Prod

(Same algorithm as on the previous slide.)

With the learning rates, for $t \ge 1$,
$\eta_{k,t-1} = \min\left\{ \dfrac{1}{2},\ \sqrt{\dfrac{\ln K}{1 + \sum_{s=1}^{t-1} \big(\hat\ell_s - \ell_{k,s}\big)^2}} \right\}$,
the cumulative regret is bounded simultaneously for all experts k as
$R_{k,T} = O\left( \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2} + K \ln\ln T \right)$.
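Putting the renormalizing update and this learning-rate schedule together gives the sketch below (illustrative, on synthetic data; we take $\eta_{k,0} = \min\{1/2, \sqrt{\ln K}\}$, corresponding to an empty past sum):

```python
import numpy as np

def adaptive_ml_prod(losses):
    """Adaptive ML-Prod with the learning-rate rule above (illustrative sketch).

    losses: (T, K) array in [0, 1]. Returns the regret R_{k,T} per expert.
    """
    K = losses.shape[1]
    ln_k = np.log(K)
    w = np.full(K, 1.0 / K)
    sq = np.zeros(K)                                # running sums of (ell_hat - l_k)^2
    eta = np.full(K, min(0.5, np.sqrt(ln_k)))       # eta_{k,0}: empty past sum
    cum_learner, cum_experts = 0.0, np.zeros(K)
    for l_t in losses:
        p = eta * w / (eta * w).sum()               # p_{k,t} proportional to eta_{k,t-1} w_{k,t-1}
        ell_hat = p @ l_t
        excess = ell_hat - l_t
        sq += excess ** 2
        new_eta = np.minimum(0.5, np.sqrt(ln_k / (1.0 + sq)))  # non-increasing in t
        w = (w * (1.0 + eta * excess)) ** (new_eta / eta)      # update, then switch rate
        eta = new_eta
        cum_learner += ell_hat
        cum_experts += l_t
    return cum_learner - cum_experts

regret = adaptive_ml_prod(np.random.default_rng(3).random((200, 3)))
```

Because $\eta_{k,t} \le 1/2$ and excesses lie in $[-1, 1]$, the base of the power stays positive, so the exponent $\eta_{k,t}/\eta_{k,t-1} \le 1$ is always well defined.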


More information

Online Learning with Experts & Multiplicative Weights Algorithms

Online Learning with Experts & Multiplicative Weights Algorithms Online Learning with Experts & Multiplicative Weights Algorithms CS 159 lecture #2 Stephan Zheng April 1, 2016 Caltech Table of contents 1. Online Learning with Experts With a perfect expert Without perfect

More information

Online learning with noisy side observations

Online learning with noisy side observations Online learning with noisy side observations Tomáš Kocák Gergely Neu Michal Valko Inria Lille - Nord Europe, France DTIC, Universitat Pompeu Fabra, Barcelona, Spain Inria Lille - Nord Europe, France SequeL

More information

A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints

A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints Hao Yu and Michael J. Neely Department of Electrical Engineering

More information

Exponentiated Gradient Descent

Exponentiated Gradient Descent CSE599s, Spring 01, Online Learning Lecture 10-04/6/01 Lecturer: Ofer Dekel Exponentiated Gradient Descent Scribe: Albert Yu 1 Introduction In this lecture we review norms, dual norms, strong convexity,

More information

Partial monitoring classification, regret bounds, and algorithms

Partial monitoring classification, regret bounds, and algorithms Partial monitoring classification, regret bounds, and algorithms Gábor Bartók Department of Computing Science University of Alberta Dávid Pál Department of Computing Science University of Alberta Csaba

More information

Stochastic and Adversarial Online Learning without Hyperparameters

Stochastic and Adversarial Online Learning without Hyperparameters Stochastic and Adversarial Online Learning without Hyperparameters Ashok Cutkosky Department of Computer Science Stanford University ashokc@cs.stanford.edu Kwabena Boahen Department of Bioengineering Stanford

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Learning and Games MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Normal form games Nash equilibrium von Neumann s minimax theorem Correlated equilibrium Internal

More information

Strategies for Prediction Under Imperfect Monitoring

Strategies for Prediction Under Imperfect Monitoring MATHEMATICS OF OPERATIONS RESEARCH Vol. 33, No. 3, August 008, pp. 53 58 issn 0364-765X eissn 56-547 08 3303 053 informs doi 0.87/moor.080.03 008 INFORMS Strategies for Prediction Under Imperfect Monitoring

More information

Robust Selective Sampling from Single and Multiple Teachers

Robust Selective Sampling from Single and Multiple Teachers Robust Selective Sampling from Single and Multiple Teachers Ofer Dekel Microsoft Research oferd@microsoftcom Claudio Gentile DICOM, Università dell Insubria claudiogentile@uninsubriait Karthik Sridharan

More information

Minimizing Adaptive Regret with One Gradient per Iteration

Minimizing Adaptive Regret with One Gradient per Iteration Minimizing Adaptive Regret with One Gradient per Iteration Guanghui Wang, Dakuan Zhao, Lijun Zhang National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China wanggh@lamda.nju.edu.cn,

More information

Worst-Case Bounds for Gaussian Process Models

Worst-Case Bounds for Gaussian Process Models Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis

More information

Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization

Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization JMLR: Workshop and Conference Proceedings vol (2010) 1 16 24th Annual Conference on Learning heory Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization

More information

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.

More information

Online Prediction: Bayes versus Experts

Online Prediction: Bayes versus Experts Marcus Hutter - 1 - Online Prediction Bayes versus Experts Online Prediction: Bayes versus Experts Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza Artificiale IDSIA, Galleria 2, CH-6928 Manno-Lugano,

More information

New Algorithms for Contextual Bandits

New Algorithms for Contextual Bandits New Algorithms for Contextual Bandits Lev Reyzin Georgia Institute of Technology Work done at Yahoo! 1 S A. Beygelzimer, J. Langford, L. Li, L. Reyzin, R.E. Schapire Contextual Bandit Algorithms with Supervised

More information

Exponential Weights on the Hypercube in Polynomial Time

Exponential Weights on the Hypercube in Polynomial Time European Workshop on Reinforcement Learning 14 (2018) October 2018, Lille, France. Exponential Weights on the Hypercube in Polynomial Time College of Information and Computer Sciences University of Massachusetts

More information

Bandit models: a tutorial

Bandit models: a tutorial Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses

More information

Minimax strategy for prediction with expert advice under stochastic assumptions

Minimax strategy for prediction with expert advice under stochastic assumptions Minimax strategy for prediction ith expert advice under stochastic assumptions Wojciech Kotłosi Poznań University of Technology, Poland otlosi@cs.put.poznan.pl Abstract We consider the setting of prediction

More information

arxiv: v3 [cs.lg] 30 Jun 2012

arxiv: v3 [cs.lg] 30 Jun 2012 arxiv:05874v3 [cslg] 30 Jun 0 Orly Avner Shie Mannor Department of Electrical Engineering, Technion Ohad Shamir Microsoft Research New England Abstract We consider a multi-armed bandit problem where the

More information

COS 402 Machine Learning and Artificial Intelligence Fall Lecture 22. Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3

COS 402 Machine Learning and Artificial Intelligence Fall Lecture 22. Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3 COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 22 Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3 How to balance exploration and exploitation in reinforcement

More information

Online Density Estimation of Nonstationary Sources Using Exponential Family of Distributions

Online Density Estimation of Nonstationary Sources Using Exponential Family of Distributions IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 1 Online Density Estimation of Nonstationary Sources Using Exponential Family of Distributions Kaan Gokcesu and Suleyman S. Kozat, Senior Member,

More information

Pure Exploration in Multi-armed Bandits Problems

Pure Exploration in Multi-armed Bandits Problems Pure Exploration in Multi-armed Bandits Problems Sébastien Bubeck 1,Rémi Munos 1, and Gilles Stoltz 2,3 1 INRIA Lille, SequeL Project, France 2 Ecole normale supérieure, CNRS, Paris, France 3 HEC Paris,

More information

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning Proceedings of Machine Learning Research vol 65:1 17, 2017 Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning Nicolò Cesa-Bianchi Università degli Studi di Milano, Milano,

More information

Pure Exploration in Finitely Armed and Continuous Armed Bandits

Pure Exploration in Finitely Armed and Continuous Armed Bandits Pure Exploration in Finitely Armed and Continuous Armed Bandits Sébastien Bubeck INRIA Lille Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d Ascq, France Rémi Munos INRIA Lille Nord Europe,

More information

Multi-armed bandit models: a tutorial

Multi-armed bandit models: a tutorial Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)

More information