Full-information Online Learning


1 Full-information Online Learning. Lijun Zhang, LAMDA Group, Nanjing University, China. June 2, 2017

2 Outline
1. Introduction: Definitions; Regret
2. Prediction with Expert Advice
3. Online Convex Optimization: Convex Functions; Strongly Convex Functions; Exponentially Concave Functions


4 What Happens in an Internet Minute? [motivating figure]


10 Online Algorithm vs Online Learning

[Karp, 1992] An online algorithm is one that receives a sequence of requests and performs an immediate action in response to each request.
Studied in theoretical computer science and computer networks.
Example, the ski-rental problem: on the t-th day, rent ($1) or buy ($10)?

[Shalev-Shwartz, 2011] Online learning is the process of answering a sequence of questions given (maybe partial) knowledge of answers to previous questions and possibly additional information.
Studied in machine learning, game theory, and information theory, with applications in computer vision and data mining.
Example, online classification: in the t-th round, receive $x_t \in \mathbb{R}^d$, predict $\hat{y}_t$, observe $y_t$.

13 Full-information vs Bandit

[Shalev-Shwartz, 2011] Online learning is the process of answering a sequence of questions given (maybe partial) knowledge of answers to previous questions and possibly additional information.

Full-information: the loss function itself is revealed to the learner after each decision.
Bandit: only the loss value of the chosen decision is revealed.

18 Formal Definitions

1: for t = 1, 2, ..., T do
2:   Learner picks a decision $w_t \in W$; Adversary chooses a function $f_t(\cdot)$
3:   Learner suffers loss $f_t(w_t)$ and updates $w_t$
4: end for

(Diagram: the Learner plays a classifier, the Adversary supplies an example, and the Learner suffers a loss and updates.)

Cumulative Loss $= \sum_{t=1}^{T} f_t(w_t)$

Full-information: the learner observes the function $f_t(\cdot)$ (online first-order optimization).
Bandit: the learner only observes the value $f_t(w_t)$ (online zero-order optimization).


24 Regret

Cumulative Loss $= \sum_{t=1}^{T} f_t(w_t)$

$$\text{Regret} = \underbrace{\sum_{t=1}^{T} f_t(w_t)}_{\text{Cumulative Loss of Online Learner}} - \underbrace{\min_{w \in W} \sum_{t=1}^{T} f_t(w)}_{\text{Minimal Loss of Batch Learner}}$$

Hannan Consistent:
$$\limsup_{T \to \infty} \frac{1}{T}\left(\sum_{t=1}^{T} f_t(w_t) - \min_{w \in W} \sum_{t=1}^{T} f_t(w)\right) = 0, \quad \text{with probability } 1$$
equivalently,
$$\sum_{t=1}^{T} f_t(w_t) - \min_{w \in W} \sum_{t=1}^{T} f_t(w) = o(T), \quad \text{with probability } 1$$


32 The Prediction Protocol

1: for t = 1, ..., T do
2:   The environment chooses the next outcome $y_t$ and the expert advice $\{f_{i,t} : i \in [N]\}$
3:   The expert advice is revealed to the forecaster
4:   The forecaster chooses the prediction $\hat{p}_t$
5:   The environment reveals the outcome $y_t$
6:   The forecaster incurs loss $l(\hat{p}_t, y_t)$ and each expert $i$ incurs loss $l(f_{i,t}, y_t)$
7: end for

Regret with respect to expert $i$:
$$R_{i,T} = \sum_{t=1}^{T} \bigl( l(\hat{p}_t, y_t) - l(f_{i,t}, y_t) \bigr) = \hat{L}_T - L_{i,T}$$

34 Online Algorithms: Weighted Average Prediction

$$\hat{p}_t = \frac{\sum_{i=1}^{N} w_{i,t-1} f_{i,t}}{\sum_{j=1}^{N} w_{j,t-1}}$$
where $w_{i,t-1}$ is the weight assigned to expert $i$.

Polynomially Weighted Average Forecaster:
$$w_{i,t-1} = \frac{2\bigl(R_{i,t-1}\bigr)_+^{p-1}}{\bigl\|(R_{t-1})_+\bigr\|_p^{p-2}}
\quad\text{and}\quad
\hat{p}_t = \frac{\sum_{i=1}^{N} \Bigl( \sum_{s=1}^{t-1} \bigl( l(\hat{p}_s, y_s) - l(f_{i,s}, y_s) \bigr) \Bigr)_+^{p-1} f_{i,t}}{\sum_{j=1}^{N} \Bigl( \sum_{s=1}^{t-1} \bigl( l(\hat{p}_s, y_s) - l(f_{j,s}, y_s) \bigr) \Bigr)_+^{p-1}}$$

35 Exponentially Weighted Average Forecaster

$$w_{i,t-1} = \frac{e^{\eta R_{i,t-1}}}{\sum_{j=1}^{N} e^{\eta R_{j,t-1}}}
\quad\text{and}\quad
\hat{p}_t = \frac{\sum_{i=1}^{N} \exp\bigl(\eta(\hat{L}_{t-1} - L_{i,t-1})\bigr) f_{i,t}}{\sum_{j=1}^{N} \exp\bigl(\eta(\hat{L}_{t-1} - L_{j,t-1})\bigr)} = \frac{\sum_{i=1}^{N} e^{-\eta L_{i,t-1}} f_{i,t}}{\sum_{j=1}^{N} e^{-\eta L_{j,t-1}}}$$
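To make the exponential-weights update concrete, here is a minimal Python sketch (a toy illustration, not from the slides; the absolute loss, the random advice, and the name ewa_predict are assumptions made for the example):

import numpy as np

def ewa_predict(advice, cum_losses, eta):
    # Exponentially weighted average forecaster: one prediction.
    # advice:     (N,) expert predictions f_{i,t} for the current round
    # cum_losses: (N,) cumulative expert losses L_{i,t-1}
    # eta:        learning rate > 0
    # Shifting by the minimum loss leaves the weighted average unchanged
    # (the shift cancels in the normalization) and avoids overflow.
    w = np.exp(-eta * (cum_losses - cum_losses.min()))
    return float(np.dot(w, advice) / w.sum())

# Toy run with the absolute loss l(p, y) = |p - y| and binary outcomes.
rng = np.random.default_rng(0)
N, T = 5, 1000
eta = np.sqrt(2 * np.log(N) / T)  # the tuning from Corollary 2.2 below
cum_losses, forecaster_loss = np.zeros(N), 0.0
for t in range(T):
    advice = rng.random(N)        # expert advice f_{i,t} in [0, 1]
    y = rng.integers(0, 2)        # outcome y_t
    p = ewa_predict(advice, cum_losses, eta)
    forecaster_loss += abs(p - y)
    cum_losses += np.abs(advice - y)
print(forecaster_loss - cum_losses.min())  # empirical regret vs. the best expert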

37 Regret Bounds I

Corollary 2.1 of [Cesa-Bianchi and Lugosi, 2006]. Assume that the loss function $l$ is convex in its first argument and that it takes values in $[0,1]$. Then, for any sequence $y_1, y_2, \ldots$ of outcomes and for any $T \ge 1$, the regret of the polynomially weighted average forecaster satisfies
$$\hat{L}_T - \min_{i=1,\ldots,N} L_{i,T} \le \sqrt{T(p-1)N^{2/p}}$$

When $p = 2$, we have $\hat{L}_T - \min_{i} L_{i,T} \le \sqrt{TN}$.
When $p = 2\ln N$, we have $\hat{L}_T - \min_{i} L_{i,T} \le \sqrt{Te(2\ln N - 1)}$.

39 Regret Bounds II

Corollary 2.2 of [Cesa-Bianchi and Lugosi, 2006]. Under the same assumptions, the regret of the exponentially weighted average forecaster satisfies
$$\hat{L}_T - \min_{i=1,\ldots,N} L_{i,T} \le \frac{\ln N}{\eta} + \frac{\eta T}{2}$$

When $\eta = \sqrt{2\ln N / T}$, we have $\hat{L}_T - \min_{i} L_{i,T} \le \sqrt{2T\ln N}$.
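The choice of $\eta$ comes from balancing the two terms of the bound; a one-line calculus check (my derivation, consistent with the stated constants):
$$-\frac{\ln N}{\eta^2} + \frac{T}{2} = 0 \;\Longrightarrow\; \eta^* = \sqrt{\frac{2\ln N}{T}}, \qquad \frac{\ln N}{\eta^*} + \frac{\eta^* T}{2} = \sqrt{\frac{T\ln N}{2}} + \sqrt{\frac{T\ln N}{2}} = \sqrt{2T\ln N}$$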

41 Extensions

Tighter Regret Bounds.
Small losses:
$$\hat{L}_T - \min_{i=1,\ldots,N} L_{i,T} \le \sqrt{2 L_T^* \ln N} + \ln N, \quad \text{where } L_T^* = \min_{i=1,\ldots,N} L_{i,T}$$
Exp-concave losses:
$$\hat{L}_T - \min_{i=1,\ldots,N} L_{i,T} \le \frac{\ln N}{\eta}$$

Tracking Regret:
$$R(i_1, \ldots, i_T) = \sum_{t=1}^{T} \bigl( l(\hat{p}_t, y_t) - l(f_{i_t,t}, y_t) \bigr)$$


44 Online Convex Optimization

1: for t = 1, 2, ..., T do
2:   Learner picks a decision $w_t \in W$; Adversary chooses a function $f_t(\cdot)$
3:   Learner suffers loss $f_t(w_t)$ and updates $w_t$
4: end for
where $W$ and the $f_t$'s are convex.

Online Gradient Descent:
$$w_{t+1} = \Pi_W\bigl(w_t - \eta_t \nabla f_t(w_t)\bigr)$$
where $\Pi_W(\cdot)$ is the projection operator onto $W$.
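A minimal Python sketch of projected online gradient descent; the callables grad, project, and step_size are placeholders for the problem at hand, and the ball projection is just one common choice of $W$:

import numpy as np

def ogd(grad, project, w0, step_size, T):
    # Projected online gradient descent.
    # grad(t, w):   (sub)gradient of f_t at w, revealed after playing w
    # project(w):   Euclidean projection onto the feasible set W
    # step_size(t): eta_t, e.g. 1/sqrt(t) for convex losses (slide 47)
    #               or 1/(lam * t) for lam-strongly-convex losses (slide 50)
    w, iterates = np.asarray(w0, dtype=float), []
    for t in range(1, T + 1):
        iterates.append(w)
        w = project(w - step_size(t) * grad(t, w))
    return iterates

def project_ball(w, D=1.0):
    # Euclidean projection onto the ball of radius D centered at the origin.
    norm = np.linalg.norm(w)
    return w if norm <= D else (D / norm) * w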


46 Analysis I

Define $w'_{t+1} = w_t - \eta_t \nabla f_t(w_t)$, so that $w_{t+1} = \Pi_W(w'_{t+1})$. For any $w \in W$, we have
$$\begin{aligned}
f_t(w_t) - f_t(w) &\le \langle \nabla f_t(w_t),\, w_t - w \rangle = \frac{1}{\eta_t}\langle w_t - w'_{t+1},\, w_t - w \rangle \\
&= \frac{1}{2\eta_t}\Bigl( \|w_t - w\|_2^2 - \|w'_{t+1} - w\|_2^2 \Bigr) + \frac{\eta_t}{2}\|\nabla f_t(w_t)\|_2^2 \\
&\le \frac{1}{2\eta_t}\Bigl( \|w_t - w\|_2^2 - \|w_{t+1} - w\|_2^2 \Bigr) + \frac{\eta_t}{2}\|\nabla f_t(w_t)\|_2^2
\end{aligned}$$
where the last step uses the nonexpansiveness of the projection. By adding the inequalities of all iterations, we have
$$\sum_{t=1}^{T} \bigl( f_t(w_t) - f_t(w) \bigr) \le \frac{1}{2\eta_1}\|w_1 - w\|_2^2 - \frac{1}{2\eta_T}\|w_{T+1} - w\|_2^2 + \frac{1}{2}\sum_{t=2}^{T} \Bigl( \frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} \Bigr) \|w_t - w\|_2^2 + \sum_{t=1}^{T} \frac{\eta_t}{2}\|\nabla f_t(w_t)\|_2^2$$

47 Analysis II

Assuming $\eta_t \le \eta_{t-1}$, $\|w_t - w\|_2^2 \le D^2$, and $\|\nabla f_t(w_t)\|_2^2 \le G^2$, we have
$$\sum_{t=1}^{T} \bigl( f_t(w_t) - f_t(w) \bigr) \le \frac{D^2}{2\eta_1} + \frac{D^2}{2}\sum_{t=2}^{T}\Bigl( \frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} \Bigr) + \frac{G^2}{2}\sum_{t=1}^{T}\eta_t = \frac{D^2}{2\eta_T} + \frac{G^2}{2}\sum_{t=1}^{T}\eta_t$$
By setting $\eta_t = 1/\sqrt{t}$ and using $\sum_{t=1}^{T} 1/\sqrt{t} \le 2\sqrt{T}$, we have
$$\sum_{t=1}^{T} \bigl( f_t(w_t) - f_t(w) \bigr) \le \Bigl( \frac{D^2}{2} + G^2 \Bigr)\sqrt{T}$$


49 Analysis I

Define $w'_{t+1} = w_t - \eta_t \nabla f_t(w_t)$. For any $w \in W$, strong convexity gives
$$\begin{aligned}
f_t(w_t) - f_t(w) &\le \langle \nabla f_t(w_t),\, w_t - w \rangle - \frac{\lambda}{2}\|w_t - w\|_2^2 \\
&\le \frac{1}{2\eta_t}\Bigl( \|w_t - w\|_2^2 - \|w_{t+1} - w\|_2^2 \Bigr) + \frac{\eta_t}{2}\|\nabla f_t(w_t)\|_2^2 - \frac{\lambda}{2}\|w_t - w\|_2^2
\end{aligned}$$
By adding the inequalities of all iterations, we have
$$\sum_{t=1}^{T}\bigl( f_t(w_t) - f_t(w) \bigr) \le \Bigl( \frac{1}{2\eta_1} - \frac{\lambda}{2} \Bigr)\|w_1 - w\|_2^2 - \frac{1}{2\eta_T}\|w_{T+1} - w\|_2^2 + \frac{1}{2}\sum_{t=2}^{T}\Bigl( \frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} - \lambda \Bigr)\|w_t - w\|_2^2 + \sum_{t=1}^{T}\frac{\eta_t}{2}\|\nabla f_t(w_t)\|_2^2$$

50 Analysis II

Assuming $\|\nabla f_t(w_t)\|_2^2 \le G^2$ and setting $\eta_t = 1/(\lambda t)$, the coefficients of all $\|w_t - w\|_2^2$ terms vanish (since $1/\eta_t - 1/\eta_{t-1} = \lambda$), and we have
$$\sum_{t=1}^{T}\bigl( f_t(w_t) - f_t(w) \bigr) \le \frac{G^2}{2\lambda}\sum_{t=1}^{T}\frac{1}{t} \le \frac{G^2}{2\lambda}\bigl( \log T + 1 \bigr)$$
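The final inequality uses the standard integral bound on the harmonic sum:
$$\sum_{t=1}^{T}\frac{1}{t} \le 1 + \int_{1}^{T}\frac{dx}{x} = 1 + \log T$$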


53 Definitions: Exponential Concavity

A function $f(\cdot): W \to \mathbb{R}$ is $\alpha$-exp-concave if $\exp(-\alpha f(\cdot))$ is concave over $W$.

Examples:
Logistic loss: $f(w) = \log\bigl( 1 + \exp(-y\, x^\top w) \bigr)$
Square loss: $f(w) = (x^\top w - y)^2$
Negative logarithm loss: $f(w) = -\log(x^\top w)$
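For twice-differentiable losses there is a convenient second-order test (a standard fact, stated here for reference rather than taken from the slides): $\exp(-\alpha f)$ is concave if and only if
$$\nabla^2 f(w) \succeq \alpha\, \nabla f(w)\, \nabla f(w)^\top \quad \text{for all } w \in W$$
For the square loss, $\nabla f(w) = 2(x^\top w - y)x$ and $\nabla^2 f(w) = 2xx^\top$, so the condition holds with $\alpha = 1/(2C^2)$ whenever $|x^\top w - y| \le C$ on $W$.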

54 Online Newton Step

1: $A_1 = \frac{1}{\beta^2 D^2} I$
2: for t = 1, 2, ..., T do
3:   $A_{t+1} = A_t + \nabla f_t(w_t) [\nabla f_t(w_t)]^\top$
4:   $w_{t+1} = \Pi_W^{A_{t+1}}(\tilde{w}_{t+1}) = \operatorname{argmin}_{w \in W} (w - \tilde{w}_{t+1})^\top A_{t+1} (w - \tilde{w}_{t+1})$, where $\tilde{w}_{t+1} = w_t - \frac{1}{\beta} A_{t+1}^{-1} \nabla f_t(w_t)$
5: end for
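A minimal Python sketch of the loop above; grad and the generalized projection project_gen (the $A$-weighted argmin in step 4) are placeholders, and for $W = \mathbb{R}^d$ the projection is the identity:

import numpy as np

def ons(grad, project_gen, w0, D, G, alpha, T):
    # Online Newton Step, following the loop above.
    # grad(t, w):        gradient of f_t at w
    # project_gen(v, A): argmin over w in W of (w - v)^T A (w - v)
    # D, G, alpha:       diameter, gradient bound, exp-concavity parameter
    w = np.asarray(w0, dtype=float)
    beta = 0.5 * min(1.0 / (4.0 * G * D), alpha)  # beta from Theorem 2 below
    A = np.eye(w.size) / (beta ** 2 * D ** 2)     # A_1
    iterates = []
    for t in range(1, T + 1):
        iterates.append(w)
        g = grad(t, w)
        A = A + np.outer(g, g)                    # A_{t+1}
        v = w - np.linalg.solve(A, g) / beta      # w_t - (1/beta) A_{t+1}^{-1} g
        w = project_gen(v, A)                     # generalized projection
    return iterates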

55 Regret Bound

Theorem 2 of [Hazan et al., 2007]. Assume that for all $t$, the loss function $f_t: W \subseteq \mathbb{R}^d \to \mathbb{R}$ is $\alpha$-exp-concave and has the property that
$$\|\nabla f_t(w)\|_2 \le G, \quad \forall w \in W,\ t \in [T]$$
Then Online Newton Step with $\beta = \frac{1}{2}\min\bigl\{ \frac{1}{4GD},\ \alpha \bigr\}$ has the following regret bound:
$$\sum_{t=1}^{T} f_t(w_t) - \min_{w \in W} \sum_{t=1}^{T} f_t(w) \le 5\Bigl( \frac{1}{\alpha} + GD \Bigr) d \log T$$

58 Online-to-batch Conversion

Statistical assumption: $f_1, f_2, \ldots$ are sampled independently from a distribution $P$. Define $F(w) = \mathbb{E}_{f \sim P}[f(w)]$.

Regret bound:
$$\sum_{t=1}^{T} f_t(w_t) - \sum_{t=1}^{T} f_t(w) \le C$$
Taking expectation over both sides, we have
$$\mathbb{E}\Bigl[ \sum_{t=1}^{T} \bigl( F(w_t) - F(w) \bigr) \Bigr] \le C$$

Risk bound:
$$\mathbb{E}[F(\bar{w})] - F(w) \le \frac{C}{T}, \quad \text{where } \bar{w} = \frac{1}{T}\sum_{t=1}^{T} w_t$$
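The risk bound follows from Jensen's inequality (for convex $F$) together with the observation that $w_t$ depends only on $f_1, \ldots, f_{t-1}$ and is therefore independent of $f_t$:
$$\mathbb{E}[F(\bar{w})] \le \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}[F(w_t)] = \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}[f_t(w_t)] \le F(w) + \frac{C}{T}$$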

61 Examples of Conversion

Convex functions:
$$\mathbb{E}[F(\bar{w})] - F(w) \le \frac{1}{\sqrt{T}}\Bigl( \frac{D^2}{2} + G^2 \Bigr)$$
High-probability bound: [Cesa-Bianchi et al., 2002]

Strongly convex functions:
$$\mathbb{E}[F(\bar{w})] - F(w) \le \frac{G^2(\log T + 1)}{2\lambda T}$$
High-probability bound: [Kakade and Tewari, 2009]

Exponentially concave functions:
$$\mathbb{E}[F(\bar{w})] - F(w) \le 5\Bigl( \frac{1}{\alpha} + GD \Bigr)\frac{d \log T}{T}$$
High-probability bound: [Mahdavi et al., 2015]

64 Extensions

Faster rates [Srebro et al., 2010]:
$$\sum_{t=1}^{T} f_t(w_t) - \min_{w \in W} \sum_{t=1}^{T} f_t(w) \le O\bigl( \sqrt{T f_*} \bigr), \quad \text{where } f_* = \min_{w \in W} \frac{1}{T}\sum_{t=1}^{T} f_t(w)$$

Dynamic regret [Zinkevich, 2003; Yang et al., 2016]:
$$R(u_1, \ldots, u_T) = \sum_{t=1}^{T} f_t(w_t) - \sum_{t=1}^{T} f_t(u_t)$$

Adaptive regret [Hazan and Seshadhri, 2007; Daniely et al., 2015]:
$$R(T, \tau) = \max_{[s,\, s+\tau-1] \subseteq [T]} \Bigl( \sum_{t=s}^{s+\tau-1} f_t(w_t) - \min_{w \in W} \sum_{t=s}^{s+\tau-1} f_t(w) \Bigr)$$

65 References I

Cesa-Bianchi, N., Conconi, A., and Gentile, C. (2002). On the generalization ability of on-line learning algorithms. In Advances in Neural Information Processing Systems 14.
Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.
Daniely, A., Gonen, A., and Shalev-Shwartz, S. (2015). Strongly adaptive online learning. In Proceedings of the 32nd International Conference on Machine Learning.
Hazan, E., Agarwal, A., and Kale, S. (2007). Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3).
Hazan, E. and Seshadhri, C. (2007). Adaptive algorithms for online decision problems. Electronic Colloquium on Computational Complexity, 88.
Kakade, S. M. and Tewari, A. (2009). On the generalization ability of online strongly convex programming algorithms. In Advances in Neural Information Processing Systems 21.

66 References II

Karp, R. M. (1992). On-line algorithms versus off-line algorithms: How much is it worth to know the future? In Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software, Architecture - Information Processing.
Mahdavi, M., Zhang, L., and Jin, R. (2015). Lower and upper bounds on the generalization of stochastic exponentially concave optimization. In Proceedings of the 28th Conference on Learning Theory.
Shalev-Shwartz, S. (2011). Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2).
Srebro, N., Sridharan, K., and Tewari, A. (2010). Optimistic rates for learning with a smooth loss. arXiv e-prints.
Yang, T., Zhang, L., Jin, R., and Yi, J. (2016). Tracking slowly moving clairvoyant: Optimal dynamic regret of online learning with true and noisy gradient. In Proceedings of the 33rd International Conference on Machine Learning.

67 References III

Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning.
