Chapter I

Some basic elements of Probability Theory

1 Terminology and elementary observations

Probability theory and the material covered in a basic Real Variables course have much in common. However, the terminology is often different, for historical reasons and because the focus of probability theory is somewhat different. In this section we introduce some of the standard terminology and basic facts of probability. Additional results that we shall need will be introduced as needed.

2 Terminology, conventions.

2.1 Definitions:

a. A real-valued random variable: a measurable real-valued function on some probability measure space $(\Omega, \mathcal{F}, P)$. Similarly, a complex-valued random variable, a vector-valued random variable (aka random vector). The range space of a random vector will be a topological vector space, typically $\mathbb{R}^n$, $\mathbb{C}^n$, etc. Random variables are often denoted by $X$, $Y$, etc., but may be denoted by other letters, e.g., by $f$, $g$, $r$, $\varphi$, etc.

Example. A Bernoulli variable (with success probability $p$) is a variable $X$ that assumes two values: $1$ with probability $p$, and $0$ with probability $1 - p$.

b. Event: a measurable set, i.e., an element of $\mathcal{F}$.

Example. If $X$ is a real-valued random variable, then the event $\{X \le \lambda\}$ is the set $\{\omega : X(\omega) \le \lambda\}$.

c. The probability of an event $A$ is its measure: $P(A)$. Events of probability 1 are said to hold almost surely, or a.s. (almost everywhere, or a.e., in the standard language of measure theory).
PROBABILISTIC METHODS IN ANALYSIS

d. The field $\mathcal{F}_X$ of a real-valued variable $X$ is the sub-sigma-algebra of $\mathcal{F}$ spanned by the events $\{X \in I\}$, $I \subset \mathbb{R}$ an open set. Similarly for complex-valued or vector-valued variables.

e. The distribution of a random variable $X$ is the image of $P$ under $X$; it is a probability measure on the range of $X$.

f. The (cumulative) distribution function of a real-valued random variable $X$ is the function $F_X(\lambda) = P\{X \le \lambda\}$. In this case the distribution of $X$ is simply the measure $dF_X$. For a single random variable $X$ the distribution $dF_X$ is the complete information offered by $X$. The variables $X$ and $Y$ are similar if they have the same distribution.

g. The joint distribution of $k$ random variables $X_1,\dots,X_k$ is, by definition, the distribution of the random vector $(X_1,\dots,X_k) \in \mathbb{R}^k$. It is the measure on $\mathbb{R}^k$ that is the image of $P$ under $(X_1,\dots,X_k)$. The corresponding distribution function is the function on $\mathbb{R}^k$ defined by
$$F_{X_1,\dots,X_k}(\lambda_1,\dots,\lambda_k) = P\{X_1 \le \lambda_1,\dots,X_k \le \lambda_k\} = P\Big(\bigcap_{j=1}^k \{X_j \le \lambda_j\}\Big).$$

h. The expectation $E(X)$ of a random variable $X$: its integral (assuming that $X$ is integrable),
$$E(X) = \int X\,dP = \int \lambda\,dF_X(\lambda).$$

i. The moment of order $k$ of a random variable $X$: it is $E(|X|^k)$. Similar definitions are used for random vectors, with the absolute value replaced by the norm.

j. The variance $V(X)$ of a random variable $X$: assuming that $X$ has a (finite) second moment, i.e., $X \in L^2(\Omega,\mathcal{F},P)$, the variance is defined by
$$V(X) = E\big((X - E(X))^2\big) = E(X^2) - (E(X))^2.$$
($V(X) = \infty$ means that $X$ is not square integrable.)
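As a quick sanity check of definitions a. and j. (a sketch, not from the text; the helper name `bernoulli_moments` is ours), one can compute $E(X)$ and $V(X) = E(X^2) - (E(X))^2$ for a Bernoulli variable directly from its two-point distribution, obtaining $V(X) = p(1-p)$:

```python
from fractions import Fraction

def bernoulli_moments(p):
    """Expectation and variance of a Bernoulli(p) variable,
    computed directly from its two-point distribution."""
    # E(X) = 1*p + 0*(1-p) and E(X^2) = 1^2*p + 0^2*(1-p) both equal p.
    ex = p
    ex2 = p
    var = ex2 - ex ** 2          # V(X) = E(X^2) - (E(X))^2 = p(1-p)
    return ex, var

ex, var = bernoulli_moments(Fraction(1, 3))
```

Exact rational arithmetic makes the identity $V(X) = p - p^2$ visible without rounding.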
2.2 Theorem. If $\Phi$ is continuous on $\mathbb{R}$, then $\Phi \circ X$ is integrable (has expectation, has first moment) on $(\Omega,\mathcal{F},P)$ if, and only if, $\Phi$ is integrable $dF_X$, and then
$$(2.1)\qquad E(\Phi \circ X) = \int \Phi(\lambda)\,dF_X(\lambda).$$

2.3 Proposition. Let $X$ and $Y$ be nonnegative random variables such that $F_X(\lambda) \le F_Y(\lambda)$ for all $\lambda \in [0,\infty)$. If $\Phi(\lambda)$ is monotone nondecreasing, then
$$(2.2)\qquad \int \Phi(\lambda)\,dF_X(\lambda) \ge \int \Phi(\lambda)\,dF_Y(\lambda).$$

2.4 Conditional expectation. If $X$ is an integrable random variable, so that $X\,dP$ is a (signed) measure on $\mathcal{F}$, and $\mathcal{D} \subset \mathcal{F}$ is a sub-sigma-algebra, the conditional expectation of $X$ given $\mathcal{D}$, denoted $E(X \mid \mathcal{D})$, is defined as follows: the restriction of $P$ to $\mathcal{D}$ is a probability measure $P_{\mathcal{D}}$ on $\mathcal{D}$, and the restriction $(X\,dP)_{\mathcal{D}}$ of $X\,dP$ to $\mathcal{D}$ is a (signed) measure on $\mathcal{D}$, absolutely continuous with respect to $P_{\mathcal{D}}$. The Radon–Nikodym derivative of $(X\,dP)_{\mathcal{D}}$ with respect to $P_{\mathcal{D}}$ is denoted $E(X \mid \mathcal{D})$ and called the conditional expectation of $X$ given $\mathcal{D}$. It is $\mathcal{D}$-measurable, and its integrals on elements of $\mathcal{D}$ are equal to those of $X$ on the same sets.

In particular, if $\mathcal{D} = \mathcal{F}_Y$ for some random variable $Y$ we write $E(X \mid \mathcal{F}_Y)$ simply as $E(X \mid Y)$ (the conditional expectation of $X$ given $Y$).

Observations: If $\mathcal{B}_1 \subset \mathcal{B}_2$ are subalgebras of $\mathcal{F}$, and $f \in L^p(\Omega,\mathcal{F},P)$, $p \ge 1$, then
$$(2.3)\qquad E\big(E(f \mid \mathcal{B}_2) \mid \mathcal{B}_1\big) = E(f \mid \mathcal{B}_1)$$
and
$$(2.4)\qquad E(fg) = E\big(g\,E(f \mid g)\big).$$
If $1 < p < \infty$, then $\|E(f \mid \mathcal{B}_1)\|_p \le \|E(f \mid \mathcal{B}_2)\|_p$, with equality only if the functions are equal (the uniform convexity of $L^p(\Omega,\mathcal{B},\mu)$).

2.5 The following are also known as the weak-type inequalities.

Theorem (Chebyshev's inequalities). For $p > 0$,
$$(2.5)\qquad P\{|X| \ge \lambda\} \le \frac{1}{\lambda^p}\,E(|X|^p).$$
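For a finitely supported distribution, both sides of (2.5) can be computed exactly, which makes the inequality concrete (a sketch of ours; the helper name `chebyshev_check` is an assumption, not from the text):

```python
def chebyshev_check(values, probs, lam, p):
    """Exact tail P{|X| >= lam} and the Chebyshev bound lam^-p * E|X|^p
    for a finitely supported distribution given as values/probabilities."""
    tail = sum(q for v, q in zip(values, probs) if abs(v) >= lam)
    moment = sum(q * abs(v) ** p for v, q in zip(values, probs))
    return tail, moment / lam ** p

# X takes -2, 0, 1, 3 with probabilities 0.1, 0.4, 0.3, 0.2.
tail, bound = chebyshev_check([-2, 0, 1, 3], [0.1, 0.4, 0.3, 0.2], 2.0, 2)
```

Here the exact tail is $0.3$ while the bound $\lambda^{-2}E(X^2) = 2.5/4 = 0.625$; the inequality holds with room to spare, as is typical.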
2.6 Lemma. Let $X$ be a non-negative random variable with $E(X^2) < \infty$. Then for $0 < \lambda < 1$,
$$(2.6)\qquad P\big(\{\omega : X(\omega) \ge \lambda E(X)\}\big) \ge (1-\lambda)^2\,\frac{E(X)^2}{E(X^2)}.$$

PROOF: Denote $A = \{\omega : X(\omega) \ge \lambda E(X)\}$, $a = P(A)$. As $X < \lambda E(X)$ on the complement of $A$, the contribution of $A$ to $E(X)$ is at least $(1-\lambda)E(X)$, which means that the average of $X$ on $A$ is at least $a^{-1}(1-\lambda)E(X)$, and $A$'s contribution to $E(X^2)$ is at least $a^{-1}(1-\lambda)^2 E(X)^2$. It follows that $a^{-1}(1-\lambda)^2 E(X)^2 \le E(X^2)$, i.e., $a \ge (1-\lambda)^2 E(X)^2 / E(X^2)$.

3 The characteristic function

The characteristic function of a real-valued random variable $X$ is the Fourier–Stieltjes transform $\chi_X(\xi)$ of its distribution. Taking $\Phi(X) = e^{i\xi X}$ in equation (2.1), we have
$$(3.1)\qquad \chi_X(\xi) = E\big(e^{i\xi X}\big) = \int e^{i\xi x}\,dF_X(x).$$

3.1 As the name suggests, the characteristic function $\chi_X(\xi)$ determines the distribution $dF_X(x)$ of $X$. This is the uniqueness theorem for Fourier–Stieltjes transforms on $\mathbb{R}$, an immediate consequence of Parseval's formula.

Theorem (Parseval's formula). Let $\mu$ be a finite measure on $\mathbb{R}$ and let $f$ be a continuous function in $L^1(\mathbb{R})$ such that $\hat{f} \in L^1(\hat{\mathbb{R}})$. Then²
$$(3.2)\qquad \int f(x)\,d\mu(x) = \frac{1}{2\pi}\int \hat{f}(\xi)\,\hat{\mu}(-\xi)\,d\xi.$$

PROOF: By [15], VI.1.12 (page 158),
$$f(x) = \frac{1}{2\pi}\int \hat{f}(\xi)\,e^{i\xi x}\,d\xi;$$
hence
$$\int f(x)\,d\mu(x) = \frac{1}{2\pi}\iint \hat{f}(\xi)\,e^{i\xi x}\,d\mu(x)\,d\xi = \frac{1}{2\pi}\int \hat{f}(\xi)\,\hat{\mu}(-\xi)\,d\xi.$$

² Notice that (3.2) is equivalent to $\int f(x)\,d\mu(x) = \frac{1}{2\pi}\int \hat{f}(-\xi)\,\hat{\mu}(\xi)\,d\xi$.
Corollary (uniqueness theorem). If $\hat{\mu}(\xi) = 0$ for all $\xi$, then $\mu = 0$.

If $\varphi(x) = \int_{\mathbb{R}} \Psi(\xi)\,e^{i\xi x}\,d\xi$, with $\Psi$ smooth and of compact support on $\mathbb{R}$, then
$$(3.3)\qquad \int \varphi(x)\,dF_X(x) = \iint \Psi(\xi)\,e^{i\xi x}\,d\xi\,dF_X(x) = \int\!\Big(\int e^{i\xi x}\,dF_X(x)\Big)\Psi(\xi)\,d\xi = \int \chi_X(\xi)\,\Psi(\xi)\,d\xi.$$
Since such $\varphi$'s are sufficient to determine measures on $\mathbb{R}$ (if $\mu_1$, $\mu_2$ are finite measures on $\mathbb{R}$ and
$$(3.4)\qquad \int \varphi\,d\mu_1 = \int \varphi\,d\mu_2$$
for all such $\varphi$'s, then $\mu_1 = \mu_2$), it follows that if $\chi_{X_1} = \chi_{X_2}$ then $dF_{X_1} = dF_{X_2}$.

3.2 Similarly, the characteristic function of an $\mathbb{R}^n$-valued random vector $V$ is the Fourier–Stieltjes transform $\chi_V(\xi)$ of its distribution on $\mathbb{R}^n$. Taking $\Phi(V) = e^{i\xi\cdot V}$ in equation (2.1), we have
$$(3.5)\qquad \chi_V(\xi) = E\big(e^{i\xi\cdot V}\big) = \int e^{i\xi\cdot v}\,dF_V(v).$$

3.3 If $X$ has finite moment of order $k$, then $\chi_X(\xi)$ is $k$-times continuously differentiable. For $j \le k$,
$$(3.6)\qquad \chi_X^{(j)}(\xi) = \Big(\frac{d}{d\xi}\Big)^{j}\int e^{i\xi\lambda}\,dF_X(\lambda) = \int (i\lambda)^j e^{i\xi\lambda}\,dF_X(\lambda),$$
and $|\chi_X^{(j)}(\xi)| \le E(|X|^j)$ for $j \le k$.

Other properties:
$$(3.7)\qquad \chi_{cX}(\xi) = E\big(e^{i\xi cX}\big) = E\big(e^{ic\xi X}\big) = \chi_X(c\xi),$$
so that $\chi_{cX}^{(j)}(\xi) = c^j \chi_X^{(j)}(c\xi)$.

4 Independence.

4.1 Let $\mathcal{F}_j$, $1 \le j \le k$, be sub-sigma-algebras of $\mathcal{F}$.

DEFINITION: The algebras $\mathcal{F}_j$ are independent if for $A_j \in \mathcal{F}_j$, $1 \le j \le k$, we have
$$(4.1)\qquad P\Big\{\bigcap_{j=1}^k A_j\Big\} = \prod_{j=1}^k P\{A_j\}.$$
The variables $X_1,\dots,X_k$ are independent if the algebras $\mathcal{F}_{X_j}$ are independent. An infinite set of variables is independent if every finite subset thereof is independent.
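The product rule (4.1) can be verified exactly on the smallest nontrivial example, two independent fair $\pm 1$ coins modeled on a four-point product space (a sketch of ours; the names `prob`, `A`, `B` are illustrative):

```python
from itertools import product
from fractions import Fraction

# Two independent fair +-1 coins on the 4-point product space.
omega = list(product([1, -1], repeat=2))
P = {w: Fraction(1, 4) for w in omega}

def prob(event):
    """P of an event, an indicator function on omega."""
    return sum(P[w] for w in omega if event(w))

# A is measurable w.r.t. the first coordinate, B w.r.t. the second.
A = lambda w: w[0] == 1
B = lambda w: w[1] == -1
lhs = prob(lambda w: A(w) and B(w))   # P(A and B)
rhs = prob(A) * prob(B)               # P(A) P(B)
```

Each generating event has probability $\tfrac12$ and the intersection has probability $\tfrac14$, exactly as (4.1) demands.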
4.2 If $X$ is a random variable and $f$ is a continuous function on the range of $X$, then $\mathcal{F}_{f(X)} \subset \mathcal{F}_X$. It follows that if $X_j$ are independent and $f_j$ are continuous functions on the ranges of $X_j$ such that the $f_j(X_j)$ have finite expectation (are integrable), then
$$(4.2)\qquad E\Big(\prod f_j(X_j)\Big) = \prod E\big(f_j(X_j)\big).$$

4.3 For real-valued variables the condition (4.1) is equivalent to: for all real $\lambda_1,\dots,\lambda_k$ we have
$$(4.3)\qquad F_{X_1,\dots,X_k}(\lambda_1,\dots,\lambda_k) = \prod_j F_{X_j}(\lambda_j),$$
that is, the $k$-dimensional distribution of the vector $(X_1,\dots,X_k)$ is the direct product of the distributions of its components.

The convolution $\ast\mu_j$ of $k$ probability measures $\{\mu_j\}_{j=1}^k$ on $\mathbb{R}$ can be defined by the condition that for all (bounded) continuous functions $\varphi$ on $\mathbb{R}$
$$(4.4)\qquad \int \varphi\,d\big(\ast\mu_j\big) = \int \varphi(x_1 + \dots + x_k)\,d\mu_1(x_1)\cdots d\mu_k(x_k).$$
In other words, $\ast\mu_j$ is the image of the product measure $\mu_1 \times\dots\times \mu_k$ under the projection of $\mathbb{R}^k$ onto $\mathbb{R}$ modulo the subspace of codimension 1 defined by $\{(x_1,\dots,x_k) : \sum_j x_j = 0\}$. A single projection does not identify a probability measure on $\mathbb{R}^n$, but the projections modulo all subspaces of codimension 1 do, and we have the following theorem.

Theorem. The following conditions are equivalent:

a. The variables $X_1,\dots,X_k$ are independent.

b. For all $(a_1,\dots,a_k) \in \mathbb{R}^k$, $dF_{\sum_1^k a_j X_j} = \ast\,dF_{a_j X_j}$ (convolution product).

c. For all $(a_1,\dots,a_k) \in \mathbb{R}^k$, $\chi_{\sum a_j X_j}(\xi) = \prod \chi_{a_j X_j}(\xi)$.
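Conditions b. and c. of the theorem can be checked numerically for two finitely supported independent variables: the distribution of the sum is the convolution of the two distributions, and its characteristic function is the product of the two characteristic functions. A sketch of ours (the helper name `char` is an assumption):

```python
import cmath
from itertools import product

def char(dist, xi):
    """Characteristic function E(e^{i xi X}) of a finitely supported
    distribution, given as a dict value -> probability."""
    return sum(p * cmath.exp(1j * xi * x) for x, p in dist.items())

X = {-1: 0.5, 1: 0.5}      # a Rademacher-type variable
Y = {0: 0.25, 1: 0.75}
# Distribution of X + Y under independence: the convolution of the measures.
conv = {}
for (x, px), (y, py) in product(X.items(), Y.items()):
    conv[x + y] = conv.get(x + y, 0.0) + px * py

xi = 0.7
lhs = char(conv, xi)           # chi_{X+Y}(xi)
rhs = char(X, xi) * char(Y, xi)
```

The two sides agree to machine precision, illustrating c. (and hence, via the uniqueness theorem, b.).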
4.4 Bernoulli, Rademacher, and Steinhaus. Very useful sequences of independent variables are the Bernoulli, Rademacher, and Steinhaus sequences.

The Bernoulli variables $\xi_{n,\delta}$, where $\delta \in (0,1)$, are independent copies of the Bernoulli variable that takes the value 1 (success) with probability $\delta$ and the value 0 (failure) with probability $1-\delta$.

The Rademacher variables $\{r_n\}$ are independent random variables taking the values $1$ and $-1$ with probability $\frac12$ each. The classical (concrete) representation is as follows: the probability space is the interval $[0,1]$, endowed with the Lebesgue measure. If we denote by $\varepsilon_n(x)$ the coefficients in the binary expansion of $x \in [0,1)$, that is, write $x = \sum \varepsilon_j(x)\,2^{-j}$ with $\varepsilon_j(x)$ either zero or one, we can take $r_n(x) = (-1)^{\varepsilon_n(x)}$.

Another common representation of the Rademacher variables is as functions on the Cantor group $D$, the direct product of a sequence of groups of order 2. This is the group of all sequences $\{\varepsilon_n\}_{n=1}^\infty$, $\varepsilon_n = \pm 1$, with the group action defined as pointwise multiplication. Endowed with the topology of pointwise convergence (the product topology) it is compact, homeomorphic to the Cantor set. The Haar measure on $D$ is the product measure of the $(\frac12,\frac12)$ measure on the components. The Rademacher functions can be taken as the coordinate functions on $D$; they are characters on the group, generating its dual group.

The Steinhaus variables $\omega_n$ are independent copies of a real-valued variable with the Lebesgue measure on $[0,1]$ as distribution. The variables $s_n$ are, by definition, $e^{2\pi i\omega_n}$. These are independent, with uniform distribution on the unit circle $\{z : |z| = 1\}$, and are often referred to as Steinhaus variables as well. An equivalent definition of $s_n$, natural in uses in harmonic analysis, is to consider them as characters, or coordinate functions, on the group³ $\mathbb{T}^{\mathbb{N}}$, which can be taken, endowed with the Haar measure, as the probability space $\Omega$.

4.5 Pairwise independence.
Notice that pairwise independence does not imply independence: if $X_j = \pm 1$ with probability $1/2$ each, $j = 1,2$, and we set $X_3 = X_1 X_2$, then if $X_1$, $X_2$ are independent, the trio $\{X_j\}_{j=1}^3$ is pairwise independent but not independent.

³ Countable infinite product of the circle group $\mathbb{T}$.

4.6 Independence vs. orthogonality. Pairwise independent random variables $X_j$ with $E(X_j) = 0$ and finite variance (square integrable) are orthogonal in $L^2(\Omega,\mathcal{B},\mu)$, since, for $i \ne j$,
$$(4.5)\qquad E(X_i X_j) = E(X_i)\,E(X_j) = 0.$$
Notice that the converse, "orthogonality implies independence," is false in general. We shall see later (see 8.3) special situations in which the converse does hold.

5 The zero–one law.

This theorem states that an event defined by a sequence of independent variables which is a tail event, that is, independent of any finite subset of the defining variables, is trivial: its probability is either zero or one. Similarly, a variable defined by a sequence of independent variables, which is independent of any finite subset of these, is a.s. equal to a constant.

EXAMPLES

a. For a sequence of real-valued variables $\{X_n\}$ and $a \in \mathbb{R}$, $\limsup_n X_n$ is independent of any finite subset of the sequence, so that $\{\limsup_n X_n > a\}$ is a tail event. If the $X_n$ are independent variables, then $\limsup_n X_n = \mathrm{Const}$ a.s. (The constant may be $\pm\infty$.)

b. A more interesting example: a random Taylor series is a series of the form
$$(5.1)\qquad F(z) = \sum X_n z^n,$$
where the $X_n$ are independent numerical variables. Given $r > 0$, the event "the series (5.1) converges in the disc $\{z : |z| < r\}$" is a tail event, hence of probability zero or one. Therefore the radius of convergence of the series is constant a.s.

Borel expressed (in 1896) the belief that the circle of convergence of a general random Taylor series is a.s. a natural boundary, that is, a.s. the series admits no analytic continuation. An analytic continuation across a given arc $I$ (on the a.s. common circle of convergence) is a tail event, and therefore is a.s. true or a.s. false.
Assume that the a.s. radius of convergence of the series (5.1) is $r > 0$. If the circle $\{z : |z| = r\}$ is not an a.s. natural boundary, there is an arc $I$ of angular length $a > 0$ such that a.s. $F$ has an analytic continuation across $I$.

Consider first the case of symmetric $X_n$ ($X_n \sim -X_n$). Take $N > 2\pi/a$, and write (for $l = 0,\dots,N-1$)
$$Y_n = \begin{cases} X_n & \text{if } n \equiv l \pmod N, \\ -X_n & \text{otherwise.} \end{cases}$$
$\sum Y_n z^n$ is similar to $\sum X_n z^n$, hence can be continued, a.s., across $I$, and so can $\sum (X_n + Y_n) z^n = 2 z^l \sum_j X_{jN+l}\, z^{jN}$. Since the rotational period $2\pi/N$ of $\sum_j X_{jN+l}\, z^{jN}$ is smaller than the length of $I$, it extends analytically beyond the original circle of convergence. This is the case for every $0 \le l < N$, and hence for their sum, which is the original series; a contradiction.

Without the assumption of symmetry of the coefficients $X_n$ the claim is false (e.g., constant $X_n$). The correct statement is a theorem conjectured by D. Blackwell and proved by Ryll-Nardzewski (1953):

Theorem. Let $X_n$ be independent numerical variables. Assume that the almost sure radius of convergence $R$ of the series $F(z) = \sum X_n z^n$ is finite. Then there exists a constant series $\sum b_n z^n$ such that the radius of convergence of the series $G(z) = \sum (X_n - b_n) z^n$ is at least $R$, and such that $G$ has its circle of convergence as a natural boundary.

PROOF: Denote the probability space on which the $X_n$'s are defined by $(\Omega,\mathcal{B},\mu)$. Let $(\Omega',\mathcal{B}',\mu')$ be a copy of $(\Omega,\mathcal{B},\mu)$ and $\{X_n'\}$ a copy of $\{X_n\}$. Consider the series $\sum (X_n(\omega) - X_n'(\omega'))\, z^n$. Since this series is the difference $\sum X_n z^n - \sum X_n' z^n$, and both series converge a.s. for $|z| < R$, the a.s. radius of convergence of $\sum (X_n(\omega) - X_n'(\omega'))\, z^n$ is at least $R$. Since our coefficients are now symmetric, the circle of convergence is a.s. a natural boundary. By Fubini's theorem, we can take $b_n = X_n'(\omega')$ with almost every (fixed) $\omega' \in \Omega'$, use this to define $G$, and have our claim satisfied.
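For Rademacher-type coefficients the "a.s. constant radius" in example b. is visible directly: every sign pattern has $|X_n| = 1$, so Cauchy–Hadamard gives $R = 1/\limsup |X_n|^{1/n} = 1$ deterministically. A small sketch of ours (the helper `radius_of_convergence` and its crude tail-maximum estimate of the limsup are illustrative assumptions):

```python
def radius_of_convergence(coeffs):
    """Cauchy-Hadamard: R = 1 / limsup |a_n|^{1/n}, with the limsup
    estimated crudely by the maximum over the tail of a finite prefix."""
    vals = [abs(a) ** (1.0 / n) for n, a in enumerate(coeffs, start=1) if a != 0]
    return 1.0 / max(vals[len(vals) // 2:])

# Every +-1 sign pattern gives |X_n| = 1, hence R = 1 for each of them.
patterns = [[1] * 200, [-1] * 200, [(-1) ** n for n in range(1, 201)]]
radii = [radius_of_convergence(p) for p in patterns]
```

Whether the circle $|z| = 1$ is a natural boundary is, of course, the deeper question answered by the theorem above.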
5.1 Theorem (Borel–Cantelli). Let $(\Omega,\mathcal{F},P)$ be a probability space and for all $n$ let $\mathbb{1}_{E_n}$ be the indicator function of an event $E_n \in \mathcal{F}$.

a. If $\sum P\{E_n\} < \infty$ then $\sum \mathbb{1}_{E_n}$ converges a.s. (The set of points that belong to infinitely many $E_n$'s has zero measure.)

b. If $\sum P\{E_n\} = \infty$, and if the $\{\mathbb{1}_{E_n}\}$ are independent, then $\sum \mathbb{1}_{E_n} = \infty$ a.s. (The set of points that belong to infinitely many $E_n$'s has full measure.)

Notice that in part a. there is no assumption of independence, while the assumption in part b. that the $\{\mathbb{1}_{E_n}\}$ are independent is crucial: if $E_n = (0,1/n)$ in the unit interval endowed with the Lebesgue measure, then $\sum P\{E_n\} = \infty$ and yet the set of points that belong to infinitely many $E_n$'s is empty.

EXERCISES FOR SECTION 5.

exI.5.1 Show that the conclusion of b. above is false without the assumption of independence.

exI.5.2 Show that if $\{f_i\}_{i=1}^\infty$ is independent and if the $F_i$ are continuous real-valued functions on $\mathbb{R}$, then $\{F_i \circ f_i\}$ is independent.

6 Martingales.

Many results concerning sums of independent random variables are valid in the more general framework of martingales.

DEFINITION: A (discrete-time) martingale is a sequence $\{g_n\}$ of (integrable) random variables which satisfies the condition $E(g_{n+1} \mid \mathcal{F}_n) = g_n$, where $\mathcal{F}_n$ is the span of the fields $\mathcal{F}_{g_j}$, $j \le n$. The condition is equivalent to: for all $l > n$,
$$(6.1)\qquad E(g_l \mid \mathcal{F}_n) = g_n.$$
Notice that if $\{g_n\}$ is a martingale then $\|g_n\|_{L^p}$ is monotone non-decreasing for every $p \ge 1$.

Examples:

a. Let $\{\varphi_n\}$ be independent. The sequence of partial sums $g_n = \sum_1^n \varphi_j$ is a martingale if, and only if, $E(\varphi_j) = 0$ for all $j > 1$.
b. Let $f$ be integrable, $\{\mathcal{G}_n\}$ a monotone increasing sequence of sigma-algebras in $\mathcal{B}$, and $f_n = E(f \mid \mathcal{G}_n)$.

6.1 Stopping time. Given a monotone increasing sequence $\mathcal{F}_n$ of sigma-algebras (in $\mathcal{B}$), a stopping time for $\{\mathcal{F}_n\}$ is a (positive) integer-valued function $\tau$ such that
$$(6.2)\qquad \{\tau \le k\} = \{x : \tau(x) \le k\} \in \mathcal{F}_k.$$
Notice that if $\tau$ is a stopping time and $\tau \ge n$, then
$$(6.3)\qquad E(g_\tau \mid \mathcal{F}_n) = g_n.$$
This is seen by looking at each of the sets $\{x : \tau(x) = l\}$, $l = n, n+1,\dots$, and applying (6.1).

6.2 Inequalities. Let $\{f_n\}$ be a martingale, $1 \le p < \infty$, and assume $\sup \|f_n\|_{L^p} = C < \infty$.

a. Let $\tau$ be a stopping time for $\{f_n\}$. Then $\|f_\tau\|_{L^p} \le C$.

b. Kolmogorov's inequality: $P\{\sup_n |f_n| \ge \lambda\} \le \dfrac{C^p}{\lambda^p}$.

7 Poisson variables.

The Poisson distribution with parameter $\lambda$ is the measure carried by the non-negative integers, assigning to the integer $k = 0,1,2,\dots$ the mass $e^{-\lambda}\frac{\lambda^k}{k!}$. A Poisson variable with parameter $\lambda$ is a random variable whose distribution is the Poisson distribution with parameter $\lambda$, that is, an integer-valued function $X$ on $(\Omega,\mathcal{B},\mu)$ satisfying, for $k = 0,1,\dots$,
$$(7.1)\qquad P(X = k) = e^{-\lambda}\frac{\lambda^k}{k!}.$$
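A quick numerical sanity check of (7.1) and of the moment computations that follow (a sketch of ours; the helper `poisson_moments` and the truncation level are assumptions): summing the masses up to a large cutoff should give total probability $\approx 1$, mean $\approx \lambda$, and variance $\approx \lambda$.

```python
from math import exp

def poisson_moments(lam, N=60):
    """Truncated total mass, mean, and variance of the Poisson(lam)
    distribution, with masses built iteratively to avoid overflow."""
    p = exp(-lam)                  # mass at k = 0
    total = mean = second = 0.0
    for k in range(N):
        total += p
        mean += k * p
        second += k * k * p
        p *= lam / (k + 1)         # next mass: e^-lam lam^{k+1} / (k+1)!
    return total, mean, second - mean ** 2

total, mean, var = poisson_moments(3.0)
```

For $\lambda = 3$ the truncation error at $N = 60$ is astronomically small, so all three values agree with $1$, $\lambda$, $\lambda$ to high precision.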
We have
$$(7.2)\qquad E(X) = e^{-\lambda}\sum_{k=0}^\infty \frac{k\,\lambda^k}{k!} = e^{-\lambda}\lambda\sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} = \lambda,$$
$$E(X^2) = e^{-\lambda}\sum_{k=1}^\infty \frac{k^2\lambda^k}{k!} = \lambda e^{-\lambda}\Big(\sum_{k=1}^\infty \frac{(k-1)\,\lambda^{k-1}}{(k-1)!} + \sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!}\Big) = \lambda^2 + \lambda,$$
so that
$$(7.3)\qquad V(X) = \lambda^2 + \lambda - \lambda^2 = \lambda.$$

The characteristic function:
$$(7.4)\qquad \chi_X(\xi) = E\big(e^{i\xi X}\big) = e^{-\lambda}\sum_{k=0}^\infty \frac{(e^{i\xi}\lambda)^k}{k!} = e^{(e^{i\xi}-1)\lambda}.$$

8 Gaussian variables.

The normal Gaussian density is the function $g(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$. The measure $g\,dx$ is the normal Gaussian distribution, and $G(\lambda) = \int_{-\infty}^\lambda g(x)\,dx$ is the normal Gaussian distribution function.

DEFINITION: A real-valued random variable $X$ is a centered normal variable if its distribution is the normal Gaussian distribution $g\,dx$. Equivalently, $X$ is a centered normal variable if $P(X < \lambda) = G(\lambda)$ for all $\lambda \in \mathbb{R}$. A variable $X$ is called (centered) gaussian if it is a constant multiple of a normal variable. A gaussian variable is a variable of the form $Y = a + X$ where $a$ is a constant and $X$ is centered gaussian.

If $X$ is (centered) normal, then $E(X) = 0$, and
$$(8.1)\qquad V(X) = E(X^2) = \frac{1}{\sqrt{2\pi}}\int x^2 e^{-x^2/2}\,dx = 1.$$

8.1 The characteristic function $\chi_X(\xi)$ of a normal variable $X$ is the Fourier transform $\hat g$ of $g$, that is,
$$(8.2)\qquad \hat g(\xi) = \frac{1}{\sqrt{2\pi}}\int e^{i\xi x} e^{-x^2/2}\,dx = e^{-\xi^2/2}.$$
To prove the second equality we observe that
$$\frac{d}{d\xi}\,\frac{1}{\sqrt{2\pi}}\int e^{i\xi x} e^{-x^2/2}\,dx = \frac{i}{\sqrt{2\pi}}\int e^{i\xi x}\,x e^{-x^2/2}\,dx = -\frac{\xi}{\sqrt{2\pi}}\int e^{i\xi x} e^{-x^2/2}\,dx$$
(the second equality by integration by parts). Since $\frac{d}{d\xi}\hat g(\xi) = -\xi\,\hat g(\xi)$ and $\frac{d}{d\xi}e^{-\xi^2/2} = -\xi\,e^{-\xi^2/2}$, we have $\frac{d}{d\xi}\big(\hat g(\xi)/e^{-\xi^2/2}\big) = 0$, and since both are equal to 1 at $\xi = 0$ we have $\hat g(\xi) = e^{-\xi^2/2}$.

8.2 Taylor's theorem for $e^x$, written for $x = -\frac{\xi^2}{2}$, compared with the Taylor expansion of $\hat g(\xi) = e^{-\xi^2/2}$, gives
$$(8.3)\qquad e^{-\xi^2/2} = \sum \frac{(-1)^n\,\xi^{2n}}{2^n\,n!} = \sum \frac{\hat g^{(2n)}(0)\,\xi^{2n}}{(2n)!}.$$
Combining this with (3.6) we obtain, for normal $X$,
$$(8.4)\qquad E(X^{2n}) = \frac{1}{\sqrt{2\pi}}\int x^{2n} e^{-x^2/2}\,dx = (-1)^n\,\hat g^{(2n)}(0) = \frac{(2n)!}{2^n\,n!},$$
so that for $n \ge 2$, $2^{-\frac12}\sqrt{n} < \|X\|_{L^{2n}} < \sqrt{n}$, and the monotonicity in $p$ of $\|X\|_{L^p}$ gives, for gaussian $X$,
$$(8.5)\qquad \|X\|_{L^p} \approx \sqrt{p}.$$

exI.8.1 Observe that
$$(8.6)\qquad \int_0^\infty e^{-x^2/2}\,dx = \sqrt{\frac{\pi}{2}}, \qquad \int_0^\infty x\,e^{-x^2/2}\,dx = 1,$$
and prove (using integration by parts)
$$(8.7)\qquad \int_0^\infty x^p e^{-x^2/2}\,dx = (p-1)\int_0^\infty x^{p-2} e^{-x^2/2}\,dx,$$
so that $E(|X|^p) = (p-1)\,E(|X|^{p-2})$.
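Both (8.4) and the recursion from (8.7) can be cross-checked mechanically (a sketch of ours; `even_moment` and the crude Riemann-sum check are illustrative): the closed form $(2n)!/(2^n n!)$ satisfies $E(X^{2n}) = (2n-1)\,E(X^{2n-2})$, the even case of $E(|X|^p) = (p-1)E(|X|^{p-2})$, and a direct numerical integration recovers $E(X^4) = 3$.

```python
from math import exp, factorial, pi, sqrt

def even_moment(n):
    """E(X^{2n}) = (2n)! / (2^n n!) for X standard normal, as in (8.4)."""
    return factorial(2 * n) // (2 ** n * factorial(n))

# Recursion E(X^{2n}) = (2n - 1) E(X^{2n-2}), i.e. (8.7) with p = 2n.
recursion_ok = all(even_moment(n) == (2 * n - 1) * even_moment(n - 1)
                   for n in range(1, 8))

# Crude Riemann-sum check of E(X^4) = 3 over [-10, 10].
h, L = 1e-3, 10.0
s = sum((k * h) ** 4 * exp(-((k * h) ** 2) / 2)
        for k in range(-int(L / h), int(L / h))) * h / sqrt(2 * pi)
```

The first few values $1, 3, 15, 105, \dots$ are the odd double factorials $(2n-1)!!$.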
8.3 Gaussian Hilbert spaces.

DEFINITION: A Gaussian Hilbert space $H$ is a closed subspace of $L^2(\Omega,\mathcal{F},P)$ all of whose elements are centered gaussian variables.

Let $\{X_j\}_1^\infty$ be a sequence of independent normal variables, and denote by $H$ the closed subspace they span in $L^2(\Omega,\mathcal{B},\mu)$. Since the $X_j$ are independent with $E(X_j) = 0$ and $E(X_j^2) = 1$, it follows that $\{X_j\}_1^\infty$ is in fact orthonormal and
$$(8.8)\qquad H = \Big\{X : X = \sum a_j X_j,\ \sum |a_j|^2 < \infty\Big\}.$$
If $X = \sum a_j X_j$, then
$$(8.9)\qquad E\big(e^{i\xi X}\big) = \prod E\big(e^{i\xi a_j X_j}\big) = \prod e^{-\frac{\xi^2 |a_j|^2}{2}} = e^{-\frac{\|X\|^2 \xi^2}{2}}.$$
Thus every element of $H$ is gaussian.

If $X, Y \in H$ are independent, then they are mutually orthogonal. Conversely, if $X, Y \in H$ are mutually orthogonal, then they are independent: for all $a, b \in \mathbb{R}$, $\|aX + bY\|^2 = \|aX\|^2 + \|bY\|^2$, which, along with the fact that $aX + bY$ is gaussian, implies that the characteristic function of the sum is the product of the characteristic functions, and by Theorem 4.3 the two are independent.

Without the assumption that $aX + bY$ is gaussian, mutually orthogonal gaussian variables need not be independent, as shown in the following example:

Example. $X$ normal, $A \subset \Omega$, $P(A) = 1/2$, and $\mathbb{1}_A$ independent of $X$. Set $Y = X$ on $A$, and $Y = -X$ on the complement.

9 Some basic estimates.

9.1 Lemma. Assume that $X$ is real-valued, $|X| \le a$, and $E(X) = 0$. Then
$$(9.1)\qquad E\big(e^X\big) \le \cosh a \le e^{a^2/2}.$$

PROOF: The distribution of $(X, e^X)$ lies on the graph of $y = e^x$ above the interval $-a \le x \le a$, and its center of gravity is $(0, E(e^X))$. The highest point on the intersection of the $y$-axis with the convex hull of the graph is $(0, \cosh a)$. Also
$$\cosh a = \sum \frac{a^{2n}}{(2n)!} \le \sum \frac{a^{2n}}{2^n\,n!} = e^{a^2/2}.$$
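The two inequalities in (9.1) can be checked on a concrete centered variable supported in $[-a,a]$ (a sketch of ours; the asymmetric two-point distribution below is an illustrative choice): for $X$ taking the values $a$ and $-a/2$ with probabilities $\frac13$ and $\frac23$, $E(X) = 0$ and $E(e^X)$ sits strictly below $\cosh a$, which in turn sits below $e^{a^2/2}$.

```python
from math import cosh, exp

def exp_moment(values, probs):
    """E(e^X) for a finitely supported X."""
    return sum(p * exp(v) for v, p in zip(values, probs))

a = 1.0
# Centered X supported in [-a, a]: takes a w.p. 1/3 and -a/2 w.p. 2/3.
values, probs = [a, -a / 2], [1 / 3, 2 / 3]
mean = sum(p * v for v, p in zip(values, probs))   # should be 0
m = exp_moment(values, probs)
```

The symmetric two-point variable $X = \pm a$ with probability $\frac12$ each realizes $E(e^X) = \cosh a$ exactly, so the first inequality is sharp.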
Remark: The inequality $\cosh a \le e^{a^2/2}$ is useful for small values of $a$; for $a \ge 2$ we can use $\cosh a \le e^a$.

Assume that the $X_j$ are real-valued, independent, $E(X_j) = 0$, and $|X_j| \le 1$, $j = 1,\dots$. Let $a_j$ be real-valued, $a = \big(\sum a_j^2\big)^{1/2} < \infty$, and $Y = \sum a_j X_j$. Then
$$(9.2)\qquad E\big(e^{\lambda Y}\big) = \prod E\big(e^{\lambda a_j X_j}\big) \le \prod e^{\frac12 a_j^2\lambda^2} = e^{\frac12 a^2\lambda^2}.$$
As $E\big(e^{a^{-1}\lambda Y}\big) \ge e^{\lambda^2} P\{Y \ge a\lambda\}$, we obtain
$$(9.3)\qquad P\{Y \ge a\lambda\} \le e^{-\lambda^2/2}.$$
Applying the same inequality to $-Y$ we have
$$(9.4)\qquad P\big\{|Y| \ge a\lambda\big\} \le 2e^{-\lambda^2/2}.$$

Let $Z_j$ be independent complex-valued variables such that $|Z_j| \le 1$ and $E(Z_j) = 0$, and let $a_j$ be complex numbers, $\sum |a_j|^2 = a^2$. Applying the estimates above separately to the real parts and the imaginary parts of $\sum a_j Z_j$, and using, for $z = x + iy$, $|z| \le 2\max(|x|, |y|)$, we obtain
$$(9.5)\qquad P\Big\{\Big|\sum a_j Z_j\Big| \ge 2\lambda\Big\} \le 4e^{-\frac{\lambda^2}{2a^2}}.$$

If the variables $Z_j$ are constant multiples of real-valued variables, that is, $Z_j = a_j X_j$ where the $X_j$ are real-valued and bounded by 1, the $a_j$ may be complex, $\sum |a_j|^2 = a^2$, we write $a_j = c_j + i d_j$, the decomposition into real and imaginary parts, and notice that $|a_j|^2 = c_j^2 + d_j^2$, so that if $\sum c_j^2 = c^2$ and $\sum d_j^2 = d^2$, then $a^2 = c^2 + d^2$. If $|\sum a_j X_j| > a\lambda$ then either $|\sum c_j X_j| > c\lambda$ or $|\sum d_j X_j| > d\lambda$, so that in this case the factor 2 is not needed and we have
$$(9.6)\qquad P\Big\{\Big|\sum a_j X_j\Big| > a\lambda\Big\} \le 4e^{-\lambda^2/2}.$$
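For a small number of Rademacher-type signs, the tail in (9.3) can be computed exactly by enumerating all sign patterns and compared with the bound $e^{-\lambda^2/2}$ (a sketch of ours; the helper `rademacher_tail` and the sample coefficients are illustrative):

```python
from itertools import product
from math import exp, sqrt

def rademacher_tail(coeffs, t):
    """Exact P{ sum a_j r_j >= t } by enumerating all 2^n sign patterns,
    each pattern having probability 2^-n."""
    n = len(coeffs)
    hits = sum(1 for signs in product([1, -1], repeat=n)
               if sum(s * c for s, c in zip(signs, coeffs)) >= t)
    return hits / 2 ** n

coeffs = [0.5, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1]
a = sqrt(sum(c * c for c in coeffs))
lam = 1.5
tail = rademacher_tail(coeffs, a * lam)   # exact P{Y >= a*lambda}
bound = exp(-lam ** 2 / 2)
```

The exact tail always falls below the bound, as (9.3) guarantees for bounded centered independent summands.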
9.2 Combining Proposition 2.3 with (9.5) we obtain the following theorem:

Theorem. There exists a universal constant $C$ such that if $X = \sum a_j Z_j$, where the $Z_j$ are independent complex-valued variables, $|Z_j| \le 1$ and $E(Z_j) = 0$; the $a_j$ are complex numbers, $\sum |a_j|^2 = a^2$; and $2 \le p < \infty$; then
$$(9.7)\qquad \|X\|_{L^p} \le C\sqrt{p}\,\|X\|_{L^2}.$$

9.3 The Rademacher and Steinhaus variables, basic estimates. The Rademacher variables $\{r_n\}$ and the Steinhaus variables $s_n$ were introduced in Section 4.4. The results of the previous subsection apply and give the following estimates:

Proposition. Let $a_n$ be real numbers such that $\sum a_n^2 = 1$. Then
$$(9.8)\qquad P\Big\{\sum a_n r_n > \lambda\Big\} \le e^{-\lambda^2/2}.$$
If $\sum a_n^2 = a^2$ instead of 1, the inequality reads as either
$$(9.9)\qquad P\Big\{\sum a_n r_n > a\lambda\Big\} \le e^{-\frac{\lambda^2}{2}}, \quad\text{or}\quad P\Big\{\sum a_n r_n > \lambda\Big\} \le e^{-\frac{\lambda^2}{2a^2}},$$
and
$$(9.10)\qquad P\Big\{\Big|\sum a_n r_n\Big| > a\lambda\Big\} \le 2e^{-\lambda^2/2}.$$
For complex $a_n$ with $\sum |a_n|^2 = a^2$,
$$(9.11)\qquad P\Big\{\Big|\sum a_n r_n\Big| > a\lambda\Big\} \le 4e^{-\lambda^2/2}.$$
For real- or complex-valued coefficients $a_n$, if $\sum |a_j|^2 = a^2$, we have
$$(9.12)\qquad P\Big\{\Big|\sum a_j s_j\Big| \ge 2a\lambda\Big\} \le 4e^{-\frac12\lambda^2}.$$

9.4 Subgaussian variables.

DEFINITION: A random variable $X$ is subgaussian if $e^{c|X|^2}$ is integrable for some constant $c > 0$.

The inequalities in 9.1 imply that if the $\{Z_n\}$ are uniformly bounded, centered (i.e., $E(Z_n) = 0$), and independent, and $\{a_j\} \in \ell^2$, then $Y = \sum a_n Z_n$ is subgaussian. In particular, if $\{a_j\} \in \ell^2$ then $X = \sum a_n r_n$ and $Y = \sum a_n s_n$ are subgaussian.
Proposition. The variable $X$ is subgaussian if, and only if, for some positive constants $C$ and $\eta$ and every $\lambda > 0$,
$$(9.13)\qquad P(|X| > \lambda) \le C e^{-\frac{\eta\lambda^2}{2}}.$$

PROOF: Assume $E\big(e^{c|X|^2}\big) < \infty$. Then, since $E\big(e^{c|X|^2}\big) \ge e^{c\lambda^2} P(|X| > \lambda)$, we have $P(|X| > \lambda) \le E\big(e^{c|X|^2}\big)\,e^{-c\lambda^2}$. On the other hand, (9.13) implies
$$E\big(e^{c|X|^2}\big) \le \sum e^{c(n+1)^2}\,P(|X| > n) \le C\sum e^{c(n+1)^2}\,e^{-\frac{\eta n^2}{2}} < \infty,$$
provided $c < \eta/2$.

Corollary. Assume that the complex-valued random variables $Z_j$ are independent, $E(Z_j) = 0$, and $|Z_j| \le 1$. Let $\{a_j\} \in \ell^2$. Then
$$(9.14)\qquad e^{|\sum a_j Z_j|^2} \in L^p \text{ for all } p.$$

PROOF: Given $p$, take $N$ such that $\sum_{j>N} |a_j|^2 < 1/10p$, and write $Y = Y' + Y''$, with
$$Y' = \sum_{j \le N} a_j Z_j, \qquad Y'' = \sum_{j > N} a_j Z_j.$$
Write $C_1 = \sup |Y'| \le \sum_{j \le N} |a_j|$. Now $|Y| \le C_1 + |Y''|$. By (9.5), (9.13) is satisfied for $X = Y''$ with $\eta > 5p$, and by the proposition $e^{2p|Y''|^2} \in L^1$. Since $|Y|^2 \le 2C_1^2 + 2|Y''|^2$, we get $e^{p|Y|^2} \le e^{2pC_1^2}\,e^{2p|Y''|^2} \in L^1$, i.e., $e^{|Y|^2} \in L^p$.