
1. Subgaussian tails

<1> Definition. Say that a random variable $X$ has a subgaussian distribution with scale factor $\sigma < \infty$ if $P\exp(tX) \le \exp(\sigma^2 t^2/2)$ for all real $t$. For example, if $X$ is distributed $N(0,\sigma^2)$ then it is subgaussian.

<2> Example. Suppose $X$ is a bounded random variable with a symmetric distribution. That is, $|X| \le M$ for some constant $M$ and $-X$ has the same distribution as $X$. Then
$$P\exp(tX) = 1 + \sum_{k \in \mathbb{N}} \frac{t^k\, PX^k}{k!}.$$
By symmetry, $PX^k = 0$ for each odd $k$. For even $k$, bound $PX^k$ by $M^k$, leaving
$$P\exp(tX) \le 1 + \sum_{k \in \mathbb{N}} \frac{t^{2k} M^{2k}}{(2k)!} \le \exp(M^2 t^2/2)$$
because $(2k)! \ge 2^k k!$ for each $k$ in $\mathbb{N}$.

The argument for bounding the maximum of normal random variables carries over to subgaussians.

<3> Theorem. Suppose $X_1, \ldots, X_n$ are subgaussian with scale factors bounded by a constant $\sigma$. Then $P\max_{i \le n}|X_i| \le \tfrac{3}{2}\,\sigma\sqrt{1 + \log(2n)}$.

Proof. For each $t > 0$,
$$\exp\Big(t\,P\max_{i \le n}|X_i|\Big) \le P\max_{i \le n}\exp(t|X_i|) \le \sum_{i \le n}\big(Pe^{tX_i} + Pe^{-tX_i}\big) \le 2n\exp(\tfrac{1}{2}\sigma^2 t^2),$$
so that $P\max_{i \le n}|X_i| \le \log(2n)/t + \sigma^2 t/2$. Choose $t = \sqrt{\log(2n)}/\sigma$.

In fact, we could improve the inequality to give similar bounds for various $L^p$ norms of $\max_{i \le n}|X_i|$ by choosing slightly different convex functions instead of $x \mapsto \exp(tx)$. I won't derive these bounds explicitly because there is an even better inequality obtainable from another characterization of subgaussianity.

<4> Theorem. Suppose $PX = 0$. Then $X$ is subgaussian if and only if there exists a finite constant $C$ for which $P\exp(X^2/C^2) < \infty$.

Proof. If $P\exp(tX) \le \exp(\sigma^2 t^2/2)$ for all real $t$ then
$$P\exp(X^2/4\sigma^2) - 1 = \int_0^\infty P\{X^2/4\sigma^2 \ge t\}\,e^t\,dt = \int_0^\infty P\{|X|\sqrt{t}/\sigma \ge 2t\}\,e^t\,dt$$
$$\le \int_0^\infty \big(P\exp(X\sqrt{t}/\sigma) + P\exp(-X\sqrt{t}/\sigma)\big)e^{-2t}\,e^t\,dt \le \int_0^\infty 2e^{t/2}\,e^{-t}\,dt < \infty.$$
Conversely, if $P\exp(X^2/C^2) = D < \infty$ then, from the inequality $ab \le (a^2 + b^2)/2$, we get
$$P\exp(tX) \le P\exp\Big(\frac{X^2}{C^2} + \frac{C^2 t^2}{4}\Big) = D\exp(C^2 t^2/4).$$
This bound is not quite what we need for subgaussianity. If we bound $t$ away from zero we can eliminate the $D$: if $D \le \exp(MC^2\epsilon^2)$ for some constant $M$ then $P\exp(tX) \le \exp\big((M+1)C^2 t^2\big)$ for $|t| \ge \epsilon$. If $\epsilon$ is small enough, the Taylor expansion gives
$$P\exp(tX) = 1 + tPX + \tfrac{1}{2}t^2 PX^2 + o(t^2) \le \exp\big(\tfrac{1}{2}t^2(1 + PX^2)\big) \quad\text{when } |t| \le \epsilon.$$
The subgaussianity bound follows.

Subgaussian random variables can also be characterized by an exponential tail bound. Take $t = x/\sigma^2$ in the inequality
$$P\{X \ge x\} \le \exp(-tx)\,P\exp(tX) \le \exp(-tx + \sigma^2 t^2/2)$$
to deduce that $P\{X \ge x\} \le \exp(-x^2/2\sigma^2)$ for $x \ge 0$. Replace $X$ by $-X$, which is also subgaussian, then add, to derive the analogous two-sided bound. Conversely, if $P\{|X| \ge x\} \le C\exp(-x^2/2\sigma^2)$ then
$$P\exp(X^2/9\sigma^2) - 1 = \int_0^\infty P\{X^2 \ge 9\sigma^2 t\}\,e^t\,dt = \int_0^\infty P\{|X| \ge 3\sigma\sqrt{t}\}\,e^t\,dt \le \int_0^\infty C\exp(-9t/2 + t)\,dt < \infty,$$
which, via Theorem <4>, gives subgaussianity.
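Both the maximal inequality and the tail bound are easy to sanity-check by simulation. Here is a minimal sketch (my addition, not part of the original notes; it assumes NumPy is available) that compares a Monte Carlo estimate of $P\max_{i \le n}|X_i|$ for $N(0,\sigma^2)$ variables, which have scale factor exactly $\sigma$, against the bound of Theorem <3>:

    import numpy as np

    rng = np.random.default_rng(seed=0)
    sigma, n, reps = 1.0, 10_000, 200

    # N(0, sigma^2) variables are subgaussian with scale factor sigma.
    samples = rng.normal(0.0, sigma, size=(reps, n))

    # Monte Carlo estimate of P max_{i<=n} |X_i| (P denotes expectation).
    estimate = np.abs(samples).max(axis=1).mean()

    # The bound of Theorem <3>: (3/2) * sigma * sqrt(1 + log(2n)).
    bound = 1.5 * sigma * np.sqrt(1.0 + np.log(2.0 * n))
    print(f"estimate {estimate:.3f} <= bound {bound:.3f}")

For $n = 10{,}000$ standard normals the estimate sits near $\sqrt{2\log n} \approx 4.3$, comfortably below the bound of about $4.9$.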

2. Orlicz norms

The convexity argument used to prove Theorem <3> also works for higher moments:
$$\Big(P\max_{i \le N}|X_i|\Big)^p \le P\max_{i \le N}|X_i|^p \le \sum_{i \le N} P|X_i|^p \le N\max_{i \le N} P|X_i|^p.$$
Thus

<5>   $P\max_{i \le N}|X_i| \le N^{1/p}\max_{i \le N}\|X_i\|_p$ for $p \ge 1$.

More generally, if $\Psi$ is a nonnegative, convex, strictly increasing function on $\mathbb{R}^+$, then, for each $\sigma > 0$,
$$\Psi\Big(P\max_{i \le N}\frac{|X_i|}{\sigma}\Big) \le P\,\Psi\Big(\max_{i \le N}\frac{|X_i|}{\sigma}\Big) = P\max_{i \le N}\Psi\Big(\frac{|X_i|}{\sigma}\Big) \le N\max_{i \le N} P\,\Psi\Big(\frac{|X_i|}{\sigma}\Big).$$
If $\sigma$ is such that $P\,\Psi(|X_i|/\sigma) \le 1$ for each $i$ then we have
$$P\max_{i \le N}|X_i| \le \sigma\,\Psi^{-1}(N).$$
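As a concrete instance (my illustration, not in the original text): for $\Psi(x) = \exp(x^2) - 1$ we have $\Psi^{-1}(N) = \sqrt{\log(1+N)}$, so the display above reads
$$P\max_{i \le N}|X_i| \le \sigma\sqrt{\log(1+N)} \quad\text{whenever } P\exp(X_i^2/\sigma^2) \le 2 \text{ for each } i,$$
the same $\sqrt{\log N}$ growth as in Theorem <3>, though now normalized through $P\,\Psi(|X_i|/\sigma) \le 1$ rather than through the scale factor.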

<6> Definition. An Orlicz function is a convex, increasing function $\Psi$ on $\mathbb{R}^+$ with $\Psi(0) < 1$. (Most authors actually require $\Psi(0) = 0$.) Define the Orlicz norm $\|X\|_\Psi$ (a seminorm actually, unless $\Psi(0) = 0$ and one identifies random variables that are almost everywhere equal) by
$$\|X\|_\Psi = \inf\{c > 0 : P\,\Psi(|X|/c) \le 1\},$$
with the understanding that $\|X\|_\Psi = \infty$ if the infimum runs over an empty set.

It is not hard to show (Pollard 2001, Problems 2.22 through 2.24) that $\|X\|_\Psi < \infty$ if and only if $P\,\Psi(|X|/C) < \infty$ for at least one finite constant $C$. The infimum defining $\|X\|_\Psi$ is achieved when the norm is finite.

<7> Example. Let $\Psi(x) = \exp(x^2) - 1$. Then $\|X\|_\Psi < \infty$ if and only if $X - PX$ is subgaussian.

Notice that a bound on an Orlicz norm, $\|X\|_\Psi \le \sigma$, automatically gives a tail bound,
$$P\{|X| \ge x\} \le \frac{P\,\Psi(|X|/\sigma)}{\Psi(x/\sigma)} \le \frac{1}{\Psi(x/\sigma)} \quad\text{for } x \ge 0.$$
For example, if $\Psi(x) = \tfrac{1}{2}\exp(x^2)$ then we get a subgaussian tail bound.

Sometimes it is possible to find $\sigma_0$ such that $P\,\Psi(|X|/\sigma_0) \le K$, for a constant $K > 1$. It then follows from convexity of $\Psi$ that

<8>   $\|X\|_\Psi \le \sigma_0/\theta$ where $\theta = \dfrac{1 - \Psi(0)}{K - \Psi(0)}$,

because $P\,\Psi(\theta|X|/\sigma_0) \le \theta P\,\Psi(|X|/\sigma_0) + (1-\theta)\Psi(0) \le \theta K + (1-\theta)\Psi(0) = 1$.

<9> Example. (Compare with page 96 of van der Vaart & Wellner (1996).) Let $\Psi$ be an Orlicz function (such as $\exp(x^2) - 1$, as in Problem [1]) for which there exists a finite constant $C$ such that $\Psi(\alpha)\Psi(\beta) \le \Psi(C\alpha\beta)$ when $\Psi(\alpha) \wedge \Psi(\beta) \ge 1$. Then

<10>   $\big\|\max_{i \le N}|X_i|\big\|_\Psi \le C_0\,\Psi^{-1}(N)\max_{i \le N}\|X_i\|_\Psi$ where $C_0 := \dfrac{2 - \Psi(0)}{1 - \Psi(0)}\,C$.

To prove the assertion, define $\Delta = \max_{i \le N}\|X_i\|_\Psi$ and $D = C\Delta\,\Psi^{-1}(N)$. Notice that $\Psi(D/C\Delta) = N \ge 1$. When $\Psi(\max_{i \le N}|X_i|/D) \ge 1$,
$$\Psi\Big(\frac{\max_i |X_i|}{D}\Big)\Psi\Big(\frac{D}{C\Delta}\Big) \le \Psi\Big(\frac{\max_i |X_i|}{\Delta}\Big) \le \sum_{i \le N}\Psi\Big(\frac{|X_i|}{\Delta}\Big).$$
That is, whether or not $\Psi(\max_i |X_i|/D) \ge 1$,
$$\Psi\Big(\frac{\max_i |X_i|}{D}\Big) \le 1 + N^{-1}\sum_{i \le N}\Psi\Big(\frac{|X_i|}{\Delta}\Big).$$
Take expectations:
$$P\,\Psi\Big(\frac{\max_i |X_i|}{D}\Big) \le 1 + N^{-1}\sum_{i \le N} P\,\Psi\Big(\frac{|X_i|}{\Delta}\Big) \le 2.$$
Invoke inequality <8> with $K = 2$.

Finally, notice that if $\|X\|_\Psi = \sigma$ for $\Psi(x) = \exp(x^2) - 1$ then
$$\frac{P|X|^{2p}}{\sigma^{2p}} \le p!\,P\exp(X^2/\sigma^2) \le 2\,p!\,.$$
A bound on the Orlicz norm, for this particular $\Psi$, gives a bound on moments of all orders.
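The infimum in Definition <6> is easy to locate numerically because $c \mapsto P\,\Psi(|X|/c)$ is nonincreasing in $c$, so bisection finds the norm. A minimal sketch (my addition; it replaces the expectation by a sample average, so it only estimates $\|X\|_\Psi$ from data):

    import numpy as np

    def empirical_orlicz_norm(x, psi, lo=1e-8, hi=1e8, rel_tol=1e-9):
        """Approximate ||X||_Psi = inf{c > 0 : P Psi(|X|/c) <= 1} by bisection,
        with the expectation P replaced by the average over the sample x."""
        criterion = lambda c: np.mean(psi(np.abs(x) / c))
        if criterion(hi) > 1.0:           # norm lies beyond the search bracket
            return np.inf
        while hi - lo > rel_tol * hi:
            mid = 0.5 * (lo + hi)
            if criterion(mid) <= 1.0:
                hi = mid                  # mid is feasible: the norm is <= mid
            else:
                lo = mid                  # mid is infeasible: the norm exceeds mid
        return hi

    psi2 = lambda u: np.expm1(u ** 2)     # Psi(x) = exp(x^2) - 1
    rng = np.random.default_rng(1)
    sample = rng.normal(size=100_000)
    print(empirical_orlicz_norm(sample, psi2))

For $X \sim N(0,1)$ and $\Psi(x) = \exp(x^2) - 1$, solving $P\exp(X^2/c^2) = 2$ exactly gives $c = \sqrt{8/3} \approx 1.633$, which the empirical estimate should approach as the sample grows.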

<11> Example. For each event $A$ with $PA > 0$, write $P_A$ for the conditional expectation given $A$. Suppose $\|X\|_\Psi < \infty$. From Jensen's inequality and the definition of the Orlicz norm we get
$$\Psi\big(P_A|X|/\|X\|_\Psi\big) \le P_A\,\Psi\big(|X|/\|X\|_\Psi\big) = \frac{P\,\Psi\big(|X|/\|X\|_\Psi\big)\mathbb{1}_A}{PA} \le \frac{1}{PA},$$
from which it follows that

<12>   $P_A|X| \le \|X\|_\Psi\,\Psi^{-1}(1/PA)$.

With cunning choices of $A$, this inequality will deliver a useful maximal inequality for finite collections of random variables, namely,

<13>   $P_A\max_{i \le N}|X_i| \le \sigma\,\Psi^{-1}(N/PA)$ if $\max_{i \le N}\|X_i\|_\Psi \le \sigma$.

Indeed, if $A_1, \ldots, A_N$ denotes a partition of $A$ into subsets, such that $|X_i|$ is the largest of the $|X_j|$ on the set $A_i$, then
$$P_A\max_{i \le N}|X_i| = \sum_{i \le N}\frac{P\,|X_i|\mathbb{1}_{A_i}}{PA} = \sum_{i \le N}\frac{PA_i}{PA}\,P_{A_i}|X_i|.$$
Inequality <12> and concavity of the function $\Psi^{-1}$ bound the last sum by
$$\sigma\sum_{i \le N}\frac{PA_i}{PA}\,\Psi^{-1}\Big(\frac{1}{PA_i}\Big) \le \sigma\,\Psi^{-1}\Big(\sum_{i \le N}\frac{PA_i}{PA}\cdot\frac{1}{PA_i}\Big) = \sigma\,\Psi^{-1}\Big(\frac{N}{PA}\Big).$$

The bound <13> will turn out to be much more powerful than one might at first glance suspect. If we choose $A = \{\max_{i \le N}|X_i| \ge \epsilon\}$ then we get a lower bound for $1/PA$. The full power of this trick will appear in the Chapter on chaining.
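To make that last remark concrete (my unpacking, not spelled out in the original): on $A = \{\max_{i \le N}|X_i| \ge \epsilon\}$ the conditional expectation is at least $\epsilon$, so <13> forces
$$\epsilon \le \sigma\,\Psi^{-1}(N/PA), \qquad\text{that is,}\qquad P\Big\{\max_{i \le N}|X_i| \ge \epsilon\Big\} \le \frac{N}{\Psi(\epsilon/\sigma)},$$
a tail bound for the maximum obtained without any direct appeal to the individual tails.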

3. Problems

[1] Show that $(\exp(x^2) - 1)(\exp(y^2) - 1) \le \exp(2x^2y^2) - 1$ for $x \wedge y \ge 1$.

[2] Suppose $X$ has a symmetric distribution. Show that it is subgaussian if and only if there exists some constant $c$ for which $\|X\|_k \le c\sqrt{k}$ for each $k$ in $\mathbb{N}$. Hints: Note that $\|X\|_k$ is an increasing function of $k$. For $k$ even, try to show that
$$\|X\|_k^k \le \inf_{t > 0}\frac{k!\,P\exp(tX)}{t^k}.$$

[3] Let $X$ and $Y$ be independent, identically distributed random variables with $PX = PY = 0$.
(i) Let $H$ be a convex function. [Any other regularity conditions?] Show that $PH(X) = PH(X - PY) \le PH(X - Y)$.
(ii) Show that $\|X\|_\Psi \le \|X - Y\|_\Psi \le 2\|X\|_\Psi$ for each Orlicz function $\Psi$.
(iii) Generalize the result from Problem [2]: show that the moment characterization of subgaussianity still holds if we replace the symmetry assumption on $X$ by the assumption that $PX = 0$.

4. Notes

Acknowledge Ledoux & Talagrand (1991) for several of the ideas used in this Chapter, including Example <11>. Cite Aad van der Vaart (personal communication; or van der Vaart & Wellner 1996) for the improvement on the method used in Pollard (1990, Section 3). Who first got the characterization in Problems [2] and [3]? I got it from a sharper result in Lugosi (2003, Section 2), but it must be older. Give some history of earlier work: Dudley? Pisier?

References

Ledoux, M. & Talagrand, M. (1991), Probability in Banach Spaces: Isoperimetry and Processes, Springer, New York.

Lugosi, G. (2003), Concentration-of-measure inequalities, notes from the Summer School on Machine Learning, Australian National University. Available at http://www.econ.upf.es/~lugosi/.

Pollard, D. (1990), Empirical Processes: Theory and Applications, Vol. 2 of NSF-CBMS Regional Conference Series in Probability and Statistics, Institute of Mathematical Statistics, Hayward, CA.

Pollard, D. (2001), A User's Guide to Measure Theoretic Probability, Cambridge University Press.

van der Vaart, A. W. & Wellner, J. A. (1996), Weak Convergence and Empirical Processes: With Applications to Statistics, Springer-Verlag, New York.