Uniform concentration inequalities, martingales, Rademacher complexity and symmetrization

Size: px

Start display at page:

Download "Uniform concentration inequalities, martingales, Rademacher complexity and symmetrization"

Isaac Underwood
5 years ago
Views:

1 Uniform concentration inequalities, martingales, Rademacher complexity and symmetrization John Duchi

2 Outline I Motivation 1 Uniform laws of large numbers 2 Loss minimization and data dependence II Uniform laws of large numbers III Bounded di erences 1 Martingales 2 Martingale concentration 3 Functions with bounded di erences IV Rademacher complexities 1 Bounded di erences in loss minimization 2 Symmetrization 3 Rademacher complexity V Examples 1 Binary classification 2 Multiclass classification

3 Standard recipe In machine learning problem (say of predicting y 2Y from x 2X) 1 Choose data representation for x and parameter space (said di erently, hypothesis class H) 2 Choose loss function ` 3 Given sample (X 1,Y 1 ),...,(X n,y n ),minimize 1 n nx `( ;(X i,y i )). i=1 What is actual goal? Minimize risk/expected loss L( ) :=E[`( ;(X, Y ))] or L(h) :=E[`(h;(X, Y ))]

4 Does ML work? Fixed : b n 2 argmin 2 bl n ( ) where b L n ( ) := 1 n bl n ( )! L( ) nx `( ;(X i,y i )) i=1 But b n depends on data. Example: Failure when X i iid N(0,Id ), Y i? X i, =R d

5 Does ML work? Definition (Uniform law of large numbers) sup 2 bl n ( ) L( ) p! 0 More generally, a collection of functions F, f : X!R, satisfies ULLN if 1 nx p f(x i ) E[f(X)]! 0. n sup f2f i=1 Consequence for risk minimization

6 One picture of ULLNs Covering idea (come back later)

7 Bounded di erences and martingales Definition (Martingales) Let X 1,X 2,... be random vectors and Z 1,Z 2,... be a sequence of random variables. Then {X n } is a martingale sequence adapted to {Z n } if i X i is a function of Z 1,Z 2,...,Z i ii E[X i Z 1,...,Z i 1 ]=X i 1. Example: independent sums

8 Concentration of martingales Definition (Martingale di erences) Let {X i } be a martingale adapted to {Z i } and define D i = X i X i 1.Then{D i } is a martingale di erence sequence Example: independent sums Definition (Sub-gaussian martingale) D i is a 2 -sub-gaussian martingale di erence if E[e D i Z 1:i 1 ] apple e for all 2 R

9 Azuma-Hoe ding inequality If D i is a 2 -sub-gaussian martingale di erence sequence, for t 0! nx t 2 P D i t apple exp 2n 2 i=1! nx t 2 P D i apple t apple exp 2n 2 i=1

10 Doob martingales Let X 1,...,X n 2X be a sequence of independent random variables and f : X n! R. TheDoob martingale di erence is D i := E[f(X 1:n ) X 1:i ] E[f(X 1:n ) X 1:i 1 ] Remark: look at expectations and sums

11 Concentration of functions with bounded di erences A function f : X n! R has bounded di erences if f(x 1:i 1,x i,x i+1:n ) f(x 1:i 1,x 0 i,x i+1:n ) apple c i for all i and x, x 0 Theorem (McDiarmid s or bounded-di erences inequality) Let f satisfy bounded di erences and X i be independent RVs. Then! 2t 2 P ( f(x 1:n ) E[f(X 1:n )] t) apple exp kck 2 2

12 Proof of McDiarmid s inequality

13 Bounded di erences in risk minimization Let ` : X![a, b]. Then sup 2 bl n ( ) L( ) =sup 2 1 n nx `( ; X i ) i=1 L( ) satisfies bounded di erences

14 From probability to expectation Corollary Let ` : X!R take values in [a, b]. Then apple P sup 2 bl n ( ) L( ) E sup 2 bl n ( ) L( ) + t apple exp 2nt 2 (b a) 2

15 Symmetrization Proposition (Symmetrization inequality) Let F be a collection of f : X!Rand X 1,...,X n be independent. Then " # " nx nx E (f(x i ) E[f(X i )]) apple 2E " i f(x i ) sup f2f i=1 sup f2f i=1 # where " i 2 { 1, 1} are i.i.d. random signs

16 Rademacher complexity Let x 1,...,x n 2X be arbitrary and F a collection of f : X!R. The empirical Rademacher complexity of F on x 1:n is " # nx br n (F) :=E " i f(x i ) sup f2f where " i 2 { 1, 1} are i.i.d. random signs. The Rademacher complexity of F is h i R n (F) :=E brn (F) i=1 where expectation is over X 1,...,X n

17 Rademacher complexity and losses Theorem (Concentration and Rademacher complexity) Let F := {`(, ) 2 } (viewed as functions on X )and `(, x) 2 [a, b]. Then P 9 2 s.t. Ln b ( ) L( ) 2R n (F)+t apple exp 2nt 2 (b a) 2

18 Rademacher complexity of norm balls (`2)

19 Rademacher complexity of norm balls (`1)

20 Properties of Rademacher complexity (1) Containment: if F Hthen b R n (F) apple b R n (H) (2) Convex hulls: b Rn (F) = b R n (Conv(F)) = b R n (absconv(f)) (3) Single functions: b Rn ({f}) apple 1 p n kfk L 2 (P n ) (4) Sums of function classes: br n (F 1 + F F k ) apple P k i=1 b R n (F i ) (5) Contraction [Ledoux & Talagrand, Thm. 4.12]: if : R! R is L -Lipschitz and (0) = 0, then br n ( F) apple 2L b R n (F)

21 Example: margin-based classification Setting: Data (X, Y ) 2 R d { 1, 1} where `( ;(x, y)) = (y T x) for : R! R, non-increasing

22 Margin-based classification and generalization Theorem Assume that is c -Lipschitz, that kxk 2 apple b X and k k 2 apple b. Then with probability at least 1,forall 2 " r # L( ) apple L b L b X b n ( )+O(1) p log 1 + p (0). n n

23 Proof of margin-based classifiers

24 Multiclass classification Consider k-class classification problem, = 1 2 k 2 R d k Let margin s = T x 2 R k, loss : R k! R of form for some labeling matrix y `( ; x, y) = ( y s)= ( y T x)

25 Multiclass margin-based losses 1. Multiclass logistic 2. Multiclass hinge/svm

26 Additional comments

27 Reading and bibliography 1. M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer, P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3: , S. Boucheron, O. Bousquet, and G. Lugosi. Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 9: , S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: a Nonasymptotic Theory of Independence. Oxford University Press, 2013

Concentration inequalities and tail bounds

Concentration inequalities and tail bounds John Duchi Outline I Basics and motivation 1 Law of large numbers 2 Markov inequality 3 Cherno bounds II Sub-Gaussian random variables 1 Definitions 2 Examples