
Chapter I

Some basic elements of Probability Theory

1 Terminology (and elementary observations)

Probability theory and the material covered in a basic Real Variables course have much in common. However, the terminology is often different, for historical reasons and because the focus of probability theory is somewhat different. In this section we introduce some of the standard terminology and basic facts of probability. Additional results that we shall need will be introduced as needed.

2 Terminology, conventions.

2.1 Definitions:

a. A real-valued random variable: a measurable real-valued function on some probability measure space $(\Omega,\mathcal{F},P)$. Similarly, a complex-valued random variable, a vector-valued random variable (aka random vector). The range space of a random vector will be a topological vector space, typically $\mathbb{R}^n$, $\mathbb{C}^n$, etc. Random variables are often denoted by $X$, $Y$, etc., but may be denoted by other letters, e.g., by $f$, $g$, $r$, $\varphi$, etc.

Example. A Bernoulli variable (with success probability $p$) is a variable $X$ that assumes two values: $1$ with probability $p$, and $0$ with probability $1-p$.

b. Event: a measurable set, an element of $\mathcal{F}$.

Example. If $X$ is a real-valued random variable, then the event $\{X \le \lambda\}$ is the set $\{\omega : X(\omega) \le \lambda\}$.

c. The probability of an event $A$ is its measure: $P(A)$. Events of probability $1$ are said to hold almost surely, or a.s. (almost everywhere, or a.e., in the standard language of measure theory).

d. The field $\mathcal{F}_X$ of a real-valued variable $X$ is the sub-sigma-algebra of $\mathcal{F}$ spanned by the events $\{X \in I\}$, $I \subset \mathbb{R}$ an open set. Similarly for complex-valued or vector-valued variables.

e. The distribution of a random variable $X$ is the image of $P$ under $X$; it is a probability measure on the range of $X$.

f. The (cumulative) distribution function of a real-valued random variable $X$ is the function $F_X(\lambda) = P\{X \le \lambda\}$. In this case the distribution of $X$ is simply the measure $dF_X$. For a single random variable $X$, the distribution $dF_X$ is the complete information offered by $X$. The variables $X$ and $Y$ are similar if they have the same distribution.

g. The joint distribution of $k$ random variables $X_1,\dots,X_k$ is, by definition, the distribution of the random vector $(X_1,\dots,X_k) \in \mathbb{R}^k$. It is the measure on $\mathbb{R}^k$ that is the image of $P$ under $(X_1,\dots,X_k)$. The corresponding distribution function is the function on $\mathbb{R}^k$ defined by

$F_{X_1,\dots,X_k}(\lambda_1,\dots,\lambda_k) = P\{X_1 \le \lambda_1,\dots,X_k \le \lambda_k\} = P\big(\bigcap_{j=1}^k \{X_j \le \lambda_j\}\big).$

h. The expectation $E(X)$ of a random variable $X$: it is its integral (assuming that $X$ is integrable),

$E(X) = \int X\,dP = \int \lambda\,dF_X(\lambda).$

i. The moment of order $k$ of a random variable $X$: it is $E(|X|^k)$. Similar definitions are used for random vectors, with the absolute value replaced by the norm.

j. The variance $V(X)$ of a random variable $X$: assuming that $X$ has a (finite) second moment, i.e. $X \in L^2(\Omega,\mathcal{F},P)$, the variance is defined by

$V(X) = E\big((X - E(X))^2\big) = E(X^2) - (E(X))^2.$

($V(X) = \infty$ means that $X$ is not square integrable.)
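These definitions are easy to experiment with numerically. The following is a minimal illustrative sketch in Python (NumPy assumed); the Bernoulli parameter, the choice of an exponential variable, the sample sizes and the seed are all arbitrary choices made for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    # A Bernoulli variable with success probability p (example in a.):
    p = 0.3
    B = (rng.random(100_000) < p).astype(int)
    print(B.mean())                        # empirical P{B = 1}, close to 0.3

    # An integrable variable; empirical versions of f., h. and j.:
    X = rng.exponential(scale=2.0, size=200_000)
    F = lambda lam: (X <= lam).mean()      # distribution function F_X
    print(F(2.0))                          # exact value 1 - e^{-1} ~ 0.632
    EX = X.mean()                          # E(X) = 2 for this distribution
    print(EX, (X**2).mean() - EX**2)       # V(X) = E(X^2) - (E X)^2 = 4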

2.2 Theorem. If $\Phi$ is continuous on $\mathbb{R}$, then $\Phi \circ X$ is integrable (has expectation, has first moment) on $(\Omega,\mathcal{F},P)$ if, and only if, $\Phi$ is integrable $dF_X$, and then

(2.1) $E(\Phi \circ X) = \int \Phi(\lambda)\,dF_X(\lambda).$

2.3 Proposition. Let $X$ and $Y$ be nonnegative random variables such that $F_X(\lambda) \ge F_Y(\lambda)$ for all $\lambda \in [0,\infty)$. If $\Phi(\lambda)$ is monotone nondecreasing, then

(2.2) $\int \Phi(\lambda)\,dF_X(\lambda) \le \int \Phi(\lambda)\,dF_Y(\lambda).$

2.4 Conditional expectation. If $X$ is an integrable random variable, so that $X\,dP$ is a (signed) measure on $\mathcal{F}$, and $\mathcal{D} \subset \mathcal{F}$ is a sub-sigma-algebra, the conditional expectation of $X$ given $\mathcal{D}$, denoted $E(X \mid \mathcal{D})$, is defined as follows. The restriction of $P$ to $\mathcal{D}$ is a probability measure $P_{\mathcal{D}}$ on $\mathcal{D}$, and the restriction $(X\,dP)_{\mathcal{D}}$ of $X\,dP$ to $\mathcal{D}$ is a (signed) measure on $\mathcal{D}$, absolutely continuous with respect to $P_{\mathcal{D}}$. The Radon-Nikodym derivative of $(X\,dP)_{\mathcal{D}}$ with respect to $P_{\mathcal{D}}$ is denoted $E(X \mid \mathcal{D})$ and called the conditional expectation of $X$ given $\mathcal{D}$. It is $\mathcal{D}$-measurable, and its integrals on elements of $\mathcal{D}$ are equal to those of $X$ on the same sets. In particular, if $\mathcal{D} = \mathcal{F}_Y$ for some random variable $Y$, we write $E(X \mid \mathcal{F}_Y)$ simply as $E(X \mid Y)$ (the conditional expectation of $X$ given $Y$).

Observations: If $\mathcal{B}_1 \subset \mathcal{B}_2$ are subalgebras of $\mathcal{F}$, and $f \in L^p(\Omega,\mathcal{F},P)$, $p \ge 1$, then

(2.3) $E\big(E(f \mid \mathcal{B}_2) \mid \mathcal{B}_1\big) = E(f \mid \mathcal{B}_1)$

and

(2.4) $E(fg) = E\big(g\,E(f \mid g)\big).$

If $1 < p < \infty$, then $\|E(f \mid \mathcal{B}_1)\|_p \le \|E(f \mid \mathcal{B}_2)\|_p$, with equality only if the functions are equal (the uniform convexity of $L^p(\Omega,\mathcal{B},\mu)$).

2.5 The following are also known as the weak-type inequalities.

Theorem (Chebyshev's inequalities). For $p > 0$,

(2.5) $P\{|X| \ge \lambda\} \le \frac{1}{\lambda^p}\,E(|X|^p).$
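When $\mathcal{D}$ is generated by a finite partition, the Radon-Nikodym description of $E(X \mid \mathcal{D})$ in 2.4 reduces to averaging $X$ over the cells of the partition. A minimal sketch in Python (NumPy assumed; the partition of $[0,1)$ into four intervals and the variable $X = \omega^2$ are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    omega = rng.random(100_000)            # Omega = [0,1) with Lebesgue measure
    X = omega**2                           # an integrable random variable

    # D: the sigma-algebra generated by the partition {[k/4, (k+1)/4)}.
    cell = (4 * omega).astype(int)
    EXD = np.empty_like(X)
    for k in range(4):
        # E(X|D) is constant on each cell, equal to the average of X there.
        EXD[cell == k] = X[cell == k].mean()

    # Defining property: integrals over elements of D agree with those of X.
    A = cell == 1                          # an element of D
    print((X * A).mean(), (EXD * A).mean())   # both ~ the integral of X over A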

2.6 Lemma. Let $X$ be a non-negative random variable with $E(X^2) < \infty$. Then for $0 < \lambda < 1$,

(2.6) $P\big(\{\omega : X(\omega) \ge \lambda E(X)\}\big) \ge (1-\lambda)^2\,\frac{E(X)^2}{E(X^2)}.$

PROOF: Denote $A = \{\omega : X(\omega) \ge \lambda E(X)\}$, $a = P(A)$. As $X \le \lambda E(X)$ on the complement of $A$, the contribution of $A$ to $E(X)$ is at least $(1-\lambda)E(X)$, which means that the average of $X$ on $A$ is at least $a^{-1}(1-\lambda)E(X)$, and $A$'s contribution to $E(X^2)$ is at least $a^{-1}(1-\lambda)^2 E(X)^2$. It follows that $a^{-1}(1-\lambda)^2 E(X)^2 \le E(X^2)$, i.e., $a \ge (1-\lambda)^2 E(X)^2 / E(X^2)$.

3 The characteristic function

The characteristic function of a real-valued random variable $X$ is the Fourier-Stieltjes transform $\chi_X(\xi)$ of its distribution. Taking $\Phi(X) = e^{i\xi X}$ in equation (2.1), we have

(3.1) $\chi_X(\xi) = E(e^{i\xi X}) = \int e^{i\xi x}\,dF_X(x).$
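As a quick illustration of (3.1), one can estimate $\chi_X$ by Monte Carlo and compare with a closed form; for a Bernoulli variable, $\chi_X(\xi) = (1-p) + p\,e^{i\xi}$. A sketch in Python (NumPy assumed; parameter, sample size and seed arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    p = 0.3
    X = (rng.random(200_000) < p).astype(float)     # Bernoulli(p)

    for xi in (0.5, 1.0, 2.0):
        emp = np.exp(1j * xi * X).mean()            # E(e^{i xi X}), empirical
        exact = (1 - p) + p * np.exp(1j * xi)       # integral of e^{i xi x} dF_X
        print(xi, emp, exact)                       # agree to a few digits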

3.1 As the name suggests, the characteristic function $\chi_X(\xi)$ determines the distribution $dF_X(x)$ of $X$. This is the uniqueness theorem for Fourier-Stieltjes transforms on $\mathbb{R}$, an immediate consequence of Parseval's formula.

Theorem (Parseval's formula). Let $\mu$ be a finite measure on $\mathbb{R}$ and let $f$ be a continuous function in $L^1(\mathbb{R})$ such that $\hat f \in L^1(\hat{\mathbb{R}})$. Then

(3.2) $\int f(x)\,d\mu(x) = \frac{1}{2\pi}\int \hat f(\xi)\,\hat\mu(-\xi)\,d\xi.$

(Notice that, by the change of variable $\xi \mapsto -\xi$, (3.2) is equivalent to $\int f(x)\,d\mu(x) = \frac{1}{2\pi}\int \hat f(-\xi)\,\hat\mu(\xi)\,d\xi$.)

PROOF: By [15], VI.1.12 (page 158),

$f(x) = \frac{1}{2\pi}\int \hat f(\xi)\,e^{i\xi x}\,d\xi;$

hence

$\int f(x)\,d\mu(x) = \frac{1}{2\pi}\iint \hat f(\xi)\,e^{i\xi x}\,d\mu(x)\,d\xi = \frac{1}{2\pi}\int \hat f(\xi)\,\hat\mu(-\xi)\,d\xi.$

Corollary (uniqueness theorem). If $\hat\mu(\xi) = 0$ for all $\xi$, then $\mu = 0$.

If $\varphi(x) = \int_{\mathbb{R}} \Psi(\xi)\,e^{i\xi x}\,d\xi$, with $\Psi$ smooth and of compact support on $\mathbb{R}$, then

(3.3) $\int \varphi(x)\,dF_X(x) = \iint \Psi(\xi)\,e^{i\xi x}\,d\xi\,dF_X(x) = \int\Big(\int e^{i\xi x}\,dF_X(x)\Big)\Psi(\xi)\,d\xi = \int \chi_X(\xi)\,\Psi(\xi)\,d\xi.$

Since such $\varphi$'s are sufficient to determine measures on $\mathbb{R}$ (if $\mu_1$, $\mu_2$ are finite measures on $\mathbb{R}$ and

(3.4) $\int \varphi\,d\mu_1 = \int \varphi\,d\mu_2$

for all such $\varphi$'s, then $\mu_1 = \mu_2$), it follows that if $\chi_{X_1} = \chi_{X_2}$ then $dF_{X_1} = dF_{X_2}$.

3.2 Similarly, the characteristic function of an $\mathbb{R}^n$-valued random vector $V$ is the Fourier-Stieltjes transform $\chi_V(\xi)$ of its distribution on $\mathbb{R}^n$. Taking $\Phi(V) = e^{i\langle\xi,V\rangle}$ in equation (2.1), we have

(3.5) $\chi_V(\xi) = E(e^{i\langle\xi,V\rangle}) = \int e^{i\langle\xi,v\rangle}\,dF_V(v).$

3.3 If $X$ has a finite moment of order $k$, then $\chi_X(\xi)$ is $k$ times continuously differentiable. For $j \le k$,

(3.6) $\chi_X^{(j)}(\xi) = \Big(\frac{d}{d\xi}\Big)^j \int e^{i\xi\lambda}\,dF_X(\lambda) = \int (i\lambda)^j\,e^{i\xi\lambda}\,dF_X(\lambda),$

and $|\chi_X^{(j)}(\xi)| \le E(|X|^j)$ for $j \le k$.

Other properties:

(3.7) $\chi_{cX}(\xi) = E(e^{i\xi cX}) = E(e^{ic\xi X}) = \chi_X(c\xi),$ so that $\chi_{cX}^{(j)}(\xi) = c^j\,\chi_X^{(j)}(c\xi)$.
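Relation (3.6) can also be watched numerically: finite differences of an empirical characteristic function recover the first moments. A sketch in Python (NumPy assumed) with $X$ uniform on $[0,1)$, so that $\chi_X'(0) = iE(X) = i/2$ and $\chi_X''(0) = -E(X^2) = -1/3$; the step size, sample size and seed are arbitrary.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.random(400_000)                      # uniform on [0,1)

    def chi(xi):                                 # empirical characteristic function
        return np.exp(1j * xi * X).mean()

    h = 1e-3
    d1 = (chi(h) - chi(-h)) / (2 * h)            # ~ chi'(0)  = i E(X)
    d2 = (chi(h) - 2 * chi(0) + chi(-h)) / h**2  # ~ chi''(0) = -E(X^2)
    print(d1, 1j * X.mean())                     # both ~ 0.5i
    print(d2, -(X**2).mean())                    # both ~ -1/3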

4 Independence.

4.1 Let $\mathcal{F}_j$, $1 \le j \le k$, be sub-sigma-algebras of $\mathcal{F}$.

DEFINITION: The algebras $\mathcal{F}_j$ are independent if for $A_j \in \mathcal{F}_j$, $1 \le j \le k$, we have

(4.1) $P\Big\{\bigcap_{j=1}^k A_j\Big\} = \prod_{j=1}^k P\{A_j\}.$

The variables $X_1,\dots,X_k$ are independent if the $\mathcal{F}_{X_j}$ are independent. An infinite set of variables is independent if any finite subset thereof is independent.

4.2 If $X$ is a random variable, and $f$ is a continuous function on the range of $X$, then $\mathcal{F}_{f(X)} \subset \mathcal{F}_X$. It follows that if the $X_j$ are independent and the $f_j$ are continuous functions on the ranges of the $X_j$ such that the $f_j(X_j)$ have finite expectation (are integrable), then

(4.2) $E\Big(\prod f_j(X_j)\Big) = \prod E\big(f_j(X_j)\big).$

4.3 For real-valued variables the condition (4.1) is equivalent to: for all real $\lambda_1,\dots,\lambda_k$ we have

(4.3) $F_{X_1,\dots,X_k}(\lambda_1,\dots,\lambda_k) = \prod_j F_{X_j}(\lambda_j),$

that is, the $k$-dimensional distribution of the vector $(X_1,\dots,X_k)$ is the direct product of the distributions of its components.

The convolution $\ast\mu_j$ of $k$ probability measures $\{\mu_j\}_{j=1}^k$ on $\mathbb{R}$ can be defined by the condition that for all (bounded continuous) functions $\varphi$ on $\mathbb{R}$

(4.4) $\int \varphi\,d\big(\ast\mu_j\big) = \int \varphi(x_1+\cdots+x_k)\,d\mu_1(x_1)\cdots d\mu_k(x_k).$

In other words, $\ast\mu_j$ is the image of the product measure $\mu_1 \times \cdots \times \mu_k$ under the projection of $\mathbb{R}^k$ onto $\mathbb{R}$ modulo the subspace of codimension 1 defined by $\{(x_1,\dots,x_k) : \sum_{j=1}^k x_j = 0\}$. A single projection does not identify a probability measure on $\mathbb{R}^n$, but the projections modulo all subspaces of codimension 1 do, and we have the following theorem.

Theorem. The following conditions are equivalent:

a. The variables $X_1,\dots,X_k$ are independent.

b. For all $(a_1,\dots,a_k) \in \mathbb{R}^k$, $dF_{\sum_1^k a_j X_j} = \ast\,dF_{a_j X_j}$ (convolution product).

c. For all $(a_1,\dots,a_k) \in \mathbb{R}^k$, $\chi_{\sum a_j X_j}(\xi) = \prod \chi_{a_j X_j}(\xi)$.
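Condition c. of the theorem is easy to see in simulation: for independent summands the empirical characteristic function of the sum factors. An illustrative sketch in Python (NumPy assumed; a $\pm1$ variable plus an independent uniform variable, all parameters arbitrary):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 300_000
    X = rng.integers(0, 2, n) * 2.0 - 1.0   # +-1 with probability 1/2 each
    Y = rng.random(n)                       # uniform on [0,1), independent of X

    chi = lambda Z, xi: np.exp(1j * xi * Z).mean()
    for xi in (0.7, 1.3):
        # chi_{X+Y}(xi) = chi_X(xi) chi_Y(xi) for independent X, Y:
        print(chi(X + Y, xi), chi(X, xi) * chi(Y, xi))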

4.4 Bernoulli, Rademacher and Steinhaus. Very useful sequences of independent variables are the Bernoulli, Rademacher and Steinhaus sequences.

The Bernoulli variables $\xi_{n,\delta}$, where $\delta \in (0,1)$, are independent copies of the Bernoulli variable that takes the value $1$ (success) with probability $\delta$ and the value $0$ (failure) with probability $1-\delta$.

The Rademacher variables $\{r_n\}$ are independent random variables taking the values $1$ and $-1$ with probability $\frac12$ each. The classical (concrete) representation is as follows: the probability space is the interval $[0,1]$, endowed with the Lebesgue measure. If we denote by $\varepsilon_n(x)$ the coefficients in the binary expansion of $x \in [0,1)$, that is, write $x = \sum \varepsilon_j(x)\,2^{-j}$ with $\varepsilon_j(x)$ either zero or one, we can take $r_n(x) = (-1)^{\varepsilon_n(x)}$.

Another common representation of the Rademacher variables is as functions on the Cantor group $D$, the direct product of a sequence of groups of order 2. This is the group of all sequences $\{\varepsilon_n\}_{n=1}^\infty$, $\varepsilon_n = \pm 1$, with the group action defined as pointwise multiplication. Endowed with the topology of pointwise convergence (the product topology) it is compact, homeomorphic to the Cantor set. The Haar measure on $D$ is the product measure of the $(\frac12, \frac12)$ measure on the components. The Rademacher functions can be taken as the coordinate functions on $D$; they are characters on the group, generating its dual group.

The Steinhaus variables $\omega_n$ are independent copies of a real-valued variable with the Lebesgue measure on $[0,1]$ as distribution. The variables $s_n$ are, by definition, $e^{2\pi i \omega_n}$. These are independent, with uniform distribution on the unit circle $\{z : |z| = 1\}$, and are often referred to as Steinhaus variables as well. An equivalent definition of the $s_n$, natural in uses in harmonic analysis, is to consider them as characters, or coordinate functions, on the group $\mathbb{T}^{\mathbb{N}}$ (the countably infinite product of the circle group $\mathbb{T}$), which can be taken, endowed with the Haar measure, as the probability space $\Omega$.
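The concrete representation of the Rademacher functions is a one-liner. The sketch below (Python with NumPy assumed; sample size and seed arbitrary) extracts binary digits of points of $[0,1)$ and exhibits the means and joint sign frequencies expected from independence:

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.random(200_000)                 # points of ([0,1), Lebesgue)

    def r(n, x):
        """r_n(x) = (-1)^{eps_n(x)}, eps_n(x) = n-th binary digit of x."""
        eps = np.floor(x * 2**n).astype(np.int64) % 2
        return 1 - 2 * eps                  # values +-1

    print(r(1, x).mean(), r(2, x).mean())            # both ~ 0
    print(((r(1, x) == 1) & (r(2, x) == 1)).mean())  # ~ 1/4, as independence requires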

4.5 Pairwise independence. Notice that pairwise independence does not imply independence: if $X_j = \pm 1$ with probability $1/2$ each, $j = 1, 2$, and we set $X_3 = X_1 X_2$, then if $X_1$, $X_2$ are independent, the trio $\{X_j\}_{j=1}^3$ is pairwise independent but not independent.

4.6 Independence and orthogonality. Pairwise independent random variables $X_j$ with $E(X_j) = 0$ and finite variance (square summable) are orthogonal in $L^2(\Omega,\mathcal{B},\mu)$ since, for $i \ne j$,

(4.5) $E(X_i X_j) = E(X_i)\,E(X_j) = 0.$

Notice that the converse, "orthogonality implies independence", is false in general. We shall see later (see 8.3) special situations in which the converse does hold.

5 The zero-one law.

This theorem states that an event defined by a sequence of independent variables, which is a tail event, that is, independent of any finite subset of the defining variables, is trivial: its probability is either zero or one. Similarly, a variable defined by a sequence of independent variables, which is independent of any finite subset of these, is a.s. equal to a constant.

EXAMPLES

a. For a sequence of real-valued variables $\{X_n\}$ and $a \in \mathbb{R}$, $\limsup_n X_n$ is independent of any finite subset of the sequence, so that $\{\limsup_n X_n > a\}$ is a tail event. If the $X_n$ are independent variables, then $\limsup_n X_n = \text{Const}$ a.s. (The constant may be $\pm\infty$.)

b. A more interesting example: a random Taylor series is a series of the form

(5.1) $F(z) = \sum X_n z^n,$

where the $X_n$ are independent numerical variables. Given $r > 0$, the event "the series (5.1) converges in the disc $\{z : |z| < r\}$" is a tail event, hence of probability zero or one. Therefore, the radius of convergence of the series is constant a.s.

Borel expressed (in 1896) the belief that the circle of convergence of a general random Taylor series is a.s. a natural boundary, that is, a.s. the series admits no analytic continuation. An analytic continuation across a given arc $I$ (on the a.s. common circle of convergence) is a tail event, and therefore is a.s. true or a.s. false.

Assume that the a.s. radius of convergence of the series (5.1) is $r > 0$. If the circle $\{z : |z| = r\}$ is not an a.s. natural boundary, there is an arc $I$ of length $ar > 0$ (i.e., of angular measure $a$) such that a.s. $F$ has an analytic continuation across $I$.

Consider first the case of symmetric $X_n$ ($X_n \sim -X_n$). Take $N > 2\pi/a$, and write (for $l = 0, \dots, N-1$)

$Y_n = \begin{cases} X_n & \text{if } n \equiv l \pmod N \\ -X_n & \text{otherwise.} \end{cases}$

$\sum Y_n z^n$ is similar to $\sum X_n z^n$, hence can be continued, a.s., across $I$, and so can $\sum (X_n + Y_n) z^n = 2 z^l \sum_j X_{jN+l}\, z^{jN}$. Since the series $\sum_j X_{jN+l}\, z^{jN}$ is invariant under the rotation $z \mapsto e^{2\pi i/N} z$, and $2\pi/N$ is smaller than the angular length of $I$, it extends analytically beyond the original circle of convergence. This is the case for every $0 \le l < N$, and hence for their sum, which is the original series: a contradiction.

Without the assumption of symmetry of the coefficients $X_n$, the claim is false (e.g., constant $X_n$). The correct statement is a theorem conjectured by D. Blackwell and proved by Ryll-Nardzewski (1953):

Theorem. Let the $X_n$ be independent numerical variables. Assume that the almost sure radius of convergence $R$ of the series $F(z) = \sum X_n z^n$ is finite. Then there exists a constant series $\sum b_n z^n$ such that the radius of convergence of the series $G(z) = \sum (X_n - b_n) z^n$ is at least $R$, and such that $G$ has its circle of convergence as a natural boundary.

PROOF: Denote the probability space on which the $X_n$'s are defined by $(\Omega,\mathcal{B},\mu)$. Let $(\Omega',\mathcal{B}',\mu')$ be a copy of $(\Omega,\mathcal{B},\mu)$ and $\{X_n'\}$ a copy of $\{X_n\}$. Consider the series $\tilde F(z) = \sum \big(X_n(\omega) - X_n'(\omega')\big)\,z^n$. Since this series is the difference $\sum X_n z^n - \sum X_n' z^n$, and both series converge a.s. for $|z| < R$, the a.s. radius of convergence $\tilde R$ of $\sum (X_n(\omega) - X_n'(\omega'))\,z^n$ is at least $R$. Since the coefficients are now symmetric, the circle of convergence is a.s. a natural boundary. By Fubini's theorem, we can take $b_n = X_n'(\omega')$ for almost every (fixed) $\omega' \in \Omega'$, use this to define $G$, and have our claim satisfied.
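The a.s. constancy of the radius of convergence (example b. above) can be observed in simulation. A crude sketch in Python (NumPy assumed), with coefficients uniform on $[0,1]$ and the limsup in the Cauchy-Hadamard formula $R = 1/\limsup |X_n|^{1/n}$ approximated over the tail of a long sample; all of these choices are arbitrary:

    import numpy as np

    rng = np.random.default_rng(6)
    N = 5000
    n = np.arange(1, N + 1)
    for trial in range(3):                     # three independent realizations
        X = rng.random(N)                      # coefficients X_n
        tail = slice(N // 2, N)                # approximate the limsup on a tail
        limsup = np.max(X[tail] ** (1.0 / n[tail]))
        print(1.0 / limsup)                    # ~ 1 in every run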

5.1 Theorem (Borel-Cantelli). Let $(\Omega,\mathcal{F},P)$ be a probability space and, for each $n$, let $\mathbb{1}_{E_n}$ be the indicator function of an event $E_n \in \mathcal{F}$.

a. If $\sum P\{E_n\} < \infty$ then $\sum \mathbb{1}_{E_n}$ converges a.s. (The set of points that belong to infinitely many $E_n$'s has zero measure.)

b. If $\sum P\{E_n\} = \infty$, and if the $\{\mathbb{1}_{E_n}\}$ are independent, then $\sum \mathbb{1}_{E_n} = \infty$ a.s. (The set of points that belong to infinitely many $E_n$'s has full measure.)

Notice that in part a. there is no assumption of independence, while the assumption in part b. that the $\{\mathbb{1}_{E_n}\}$ are independent is crucial: if $E_n = (0,1/n)$ in the unit interval endowed with the Lebesgue measure, then $\sum P\{E_n\} = \infty$, yet the set of points that belong to infinitely many $E_n$'s is empty.

EXERCISES FOR SECTION 5.

exI.5.1 Show that the conclusion of b. above is false without the assumption of independence.

exI.5.2 Show that if $\{f_i\}_{i=1}^\infty$ is independent and if the $F_i$ are continuous real-valued functions on $\mathbb{R}$, then $\{F_i \circ f_i\}$ is independent.

6 Martingales.

Many results concerning sums of independent random variables are valid in the more general framework of martingales.

DEFINITION: A (discrete-time) martingale is a sequence $\{g_n\}$ of (integrable) random variables which satisfies the condition $E(g_{n+1} \mid \mathcal{F}_n) = g_n$, where $\mathcal{F}_n$ is the span of the fields $\mathcal{F}_{g_j}$, $j \le n$. The condition is equivalent to: for all $l > n$,

(6.1) $E(g_l \mid \mathcal{F}_n) = g_n.$

Notice that if $\{g_n\}$ is a martingale then $\|g_n\|_{L^p}$ is monotone non-decreasing for every $p \ge 1$.
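A small empirical check of the defining identity, for the simplest case of a centered $\pm1$ walk (taken up again in example a. below): we group sample paths on the current value $g_n$, a coarser conditioning than the full $\mathcal{F}_n$ but enough to exhibit $E(g_{n+1} \mid g_n) = g_n$. A sketch in Python (NumPy assumed; sizes and seed arbitrary):

    import numpy as np

    rng = np.random.default_rng(7)
    steps = rng.choice([-1.0, 1.0], size=(200_000, 6))  # independent, mean 0
    S = steps.cumsum(axis=1)                            # g_n = partial sums

    n = 4                                               # condition on g_5
    for v in (-3, -1, 1, 3):
        sel = S[:, n] == v
        print(v, S[sel, n + 1].mean())                  # ~ v in every group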

Examples:

a. Let $\{\varphi_n\}$ be independent. The sequence of partial sums $g_n = \sum_1^n \varphi_j$ is a martingale if, and only if, $E(\varphi_j) = 0$ for all $j > 1$.

b. Let $f$ be integrable, $\{\mathcal{G}_n\}$ a monotone increasing sequence of sigma-algebras in $\mathcal{B}$, and $f_n = E(f \mid \mathcal{G}_n)$. Then $\{f_n\}$ is a martingale.

6.1 Stopping time. Given a monotone increasing sequence $\{\mathcal{F}_n\}$ of sigma-algebras (in $\mathcal{B}$), a stopping time for $\{\mathcal{F}_n\}$ is a positive integer-valued function $\tau$ such that

(6.2) $\{\tau(x) \le k\} = \{x : \tau(x) \le k\} \in \mathcal{F}_k.$

Notice that if $\tau$ is a stopping time and $\tau \ge n$, then

(6.3) $E(g_\tau \mid \mathcal{F}_n) = g_n.$

This is seen by looking at each of the sets $\{x : \tau(x) = l\}$, $l = n, n+1, \dots$, and applying (6.1).

6.2 Inequalities. Let $\{f_n\}$ be a martingale, $1 \le p < \infty$, and assume $\sup \|f_n\|_{L^p} = C < \infty$.

a. Let $\tau$ be a stopping time for $\{f_n\}$. Then $\|f_\tau\|_{L^p} \le C$.

b. Kolmogorov's inequality: $P\{\sup_n |f_n| \ge \lambda\} \le \dfrac{C^p}{\lambda^p}$.
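Inequality b. can be sanity-checked on the $\pm1$ random walk, where with $p = 2$ one has $C^2 = \sup_n \|f_n\|_{L^2}^2 = N$ after $N$ steps. A sketch in Python (NumPy assumed; sizes and seed arbitrary); the bound is far from tight but holds comfortably:

    import numpy as np

    rng = np.random.default_rng(8)
    N, trials = 100, 200_000
    S = rng.choice([-1.0, 1.0], size=(trials, N)).cumsum(axis=1)

    M = np.abs(S).max(axis=1)                       # sup_n |f_n| along each path
    for lam in (15.0, 20.0, 30.0):
        print(lam, (M >= lam).mean(), N / lam**2)   # empirical tail vs C^2/lambda^2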

7 Poisson variables.

The Poisson distribution with parameter $\lambda$ is the measure carried by the nonnegative integers, assigning to the integer $k = 0, 1, 2, \dots$ the mass $e^{-\lambda}\frac{\lambda^k}{k!}$. A Poisson variable with parameter $\lambda$ is a random variable whose distribution is the Poisson distribution with parameter $\lambda$, that is, an integer-valued function $X$ on $(\Omega,\mathcal{B},\mu)$ satisfying, for $k = 0, 1, \dots$,

(7.1) $P(X = k) = e^{-\lambda}\frac{\lambda^k}{k!}.$

We have

(7.2) $E(X) = e^{-\lambda}\sum_{k=0}^\infty \frac{k\,\lambda^k}{k!} = e^{-\lambda}\,\lambda \sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} = \lambda,$

and

$E(X^2) = e^{-\lambda}\sum_{k=1}^\infty \frac{k^2\lambda^k}{k!} = \lambda e^{-\lambda}\Big(\sum_{k=1}^\infty \frac{(k-1)\,\lambda^{k-1}}{(k-1)!} + \sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!}\Big) = \lambda^2 + \lambda,$

so that

(7.3) $V(X) = \lambda^2 + \lambda - \lambda^2 = \lambda.$

The characteristic function is

(7.4) $\chi_X(\xi) = E(e^{i\xi X}) = e^{-\lambda}\sum_{k=0}^\infty \frac{(e^{i\xi}\lambda)^k}{k!} = e^{(e^{i\xi}-1)\lambda}.$
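A numerical check of (7.2), (7.3) and (7.4), here with $\lambda = 3$ (parameter, sample size and seed arbitrary; Python with NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(9)
    lam = 3.0
    X = rng.poisson(lam, size=300_000)

    print(X.mean(), X.var())                     # E(X) and V(X), both ~ 3

    xi = 0.8
    print(np.exp(1j * xi * X).mean(),            # empirical chi_X(xi)
          np.exp((np.exp(1j * xi) - 1) * lam))   # e^{(e^{i xi}-1) lambda}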

8 Gaussian variables.

The normal Gaussian density is the function $g(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$. The measure $g\,dx$ is the normal Gaussian distribution, and $G(\lambda) = \int_{-\infty}^\lambda g(x)\,dx$ is the normal Gaussian distribution function.

DEFINITION: A real-valued random variable $X$ is a centered normal variable if its distribution is the normal Gaussian distribution $g\,dx$. Equivalently, $X$ is a centered normal variable if $P(X < \lambda) = G(\lambda)$ for all $\lambda \in \mathbb{R}$. A variable $X$ is called (centered) gaussian if it is a constant multiple of a normal variable. A gaussian variable is a variable of the form $Y = a + X$ where $a$ is a constant and $X$ is centered gaussian.

If $X$ is (centered) normal, then $E(X) = 0$, and

(8.1) $V(X) = E(X^2) = \frac{1}{\sqrt{2\pi}}\int x^2\,e^{-x^2/2}\,dx = 1.$

8.1 The characteristic function $\chi_X(\xi)$ of a normal variable $X$ is the Fourier transform $\hat g$ of $g$ ($g$ is even, so the sign of the exponent is immaterial), that is,

(8.2) $\hat g(\xi) = \frac{1}{\sqrt{2\pi}}\int e^{-i\xi x}\,e^{-x^2/2}\,dx = e^{-\xi^2/2}.$

To prove the second equality we observe that

$\frac{d}{d\xi}\,\frac{1}{\sqrt{2\pi}}\int e^{-i\xi x}\,e^{-x^2/2}\,dx = \frac{-i}{\sqrt{2\pi}}\int e^{-i\xi x}\,x\,e^{-x^2/2}\,dx = \frac{-\xi}{\sqrt{2\pi}}\int e^{-i\xi x}\,e^{-x^2/2}\,dx$

(the second equality by integration by parts). Since $\frac{d}{d\xi}\hat g(\xi) = -\xi\,\hat g(\xi)$ and $\frac{d}{d\xi}e^{-\xi^2/2} = -\xi\,e^{-\xi^2/2}$, we have $\frac{d}{d\xi}\big(\hat g(\xi)/e^{-\xi^2/2}\big) = 0$, and since both are equal to $1$ at $\xi = 0$ we have $\hat g(\xi) = e^{-\xi^2/2}$.

8.2 Taylor's theorem for $e^x$, written for $x = -\xi^2/2$, compared with the Taylor expansion of $\hat g(\xi) = e^{-\xi^2/2}$, gives

(8.3) $e^{-\xi^2/2} = \sum \frac{(-1)^n\,\xi^{2n}}{2^n\,n!} = \sum \frac{\hat g^{(2n)}(0)\,\xi^{2n}}{(2n)!}.$

Combining this with (3.6) we obtain, for normal $X$,

(8.4) $E(X^{2n}) = \frac{1}{\sqrt{2\pi}}\int x^{2n}\,e^{-x^2/2}\,dx = (-1)^n\,\hat g^{(2n)}(0) = \frac{(2n)!}{2^n\,n!},$

so that for $n \ge 2$, $2^{-1/2}\sqrt{n} < \|X\|_{L^{2n}} < \sqrt{n}$, and the monotonicity in $p$ of $\|X\|_{L^p}$ for gaussian $X$ gives

(8.5) $\|X\|_{L^p} \approx \sqrt{p}.$

exI.8.1 Observe that

(8.6) $\int_0^\infty e^{-x^2/2}\,dx = \sqrt{\frac{\pi}{2}}, \qquad \int_0^\infty x\,e^{-x^2/2}\,dx = 1,$

and prove (using integration by parts) that

(8.7) $\int_0^\infty x^p\,e^{-x^2/2}\,dx = (p-1)\int_0^\infty x^{p-2}\,e^{-x^2/2}\,dx,$

so that $E(|X|^p) = (p-1)\,E(|X|^{p-2})$.
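The moment formula (8.4) is easy to confirm by simulation: $E(X^2) = 1$, $E(X^4) = 3$, $E(X^6) = 15$. A sketch in Python (NumPy assumed; sample size and seed arbitrary):

    import numpy as np
    from math import factorial

    rng = np.random.default_rng(10)
    X = rng.standard_normal(2_000_000)

    for n in (1, 2, 3):
        exact = factorial(2 * n) / (2**n * factorial(n))   # (2n)!/(2^n n!)
        print(n, (X**(2 * n)).mean(), exact)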

8.3 Gaussian Hilbert spaces.

DEFINITION: A Gaussian Hilbert space $H$ is a closed subspace of $L^2(\Omega,\mathcal{F},P)$ all of whose elements are centered gaussian variables.

Let $\{X_j\}_1^\infty$ be a sequence of independent normal variables, and denote by $H$ the closed subspace they span in $L^2(\Omega,\mathcal{B},\mu)$. Since $E(X_j) = 0$, it follows that $\{X_j\}_1^\infty$ is in fact orthonormal, and

(8.8) $H = \Big\{X : X = \sum a_j X_j,\ \sum |a_j|^2 < \infty\Big\}.$

If $X = \sum a_j X_j$ then

(8.9) $E(e^{i\xi X}) = \prod E(e^{i\xi a_j X_j}) = \prod e^{-\xi^2 a_j^2/2} = e^{-\|X\|^2 \xi^2/2}.$

Thus every element of $H$ is gaussian.

If $X, Y \in H$ are independent then they are mutually orthogonal. Conversely, if $X, Y \in H$ are mutually orthogonal then they are independent, since for all $a, b \in \mathbb{R}$, $\|aX + bY\|^2 = \|aX\|^2 + \|bY\|^2$, which, along with the fact that $aX + bY$ is gaussian, implies that the characteristic function of the sum is the product of the characteristic functions, and by the theorem of 4.3 the two are independent.

Without the assumption that $aX + bY$ is gaussian, mutually orthogonal gaussian variables need not be independent, as shown in the following example:

Example. $X$ normal, $A \subset \Omega$, $P(A) = 1/2$, and $\mathbb{1}_A$ independent of $X$. Set $Y = X$ on $A$, and $Y = -X$ on the complement. (Then $Y$ is normal and $E(XY) = 0$, but $|Y| = |X|$, so $X$ and $Y$ are not independent.)
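The example is easy to realize numerically: $X$ and $Y$ come out uncorrelated, yet $|X| = |Y|$, so they are far from independent. A sketch in Python (NumPy assumed; sample size and seed arbitrary):

    import numpy as np

    rng = np.random.default_rng(11)
    n = 400_000
    X = rng.standard_normal(n)
    A = rng.random(n) < 0.5                 # P(A) = 1/2, independent of X
    Y = np.where(A, X, -X)                  # Y = X on A, Y = -X off A

    print((X * Y).mean())                   # ~ 0: orthogonality
    pX = (np.abs(X) > 1).mean()
    pXY = ((np.abs(X) > 1) & (np.abs(Y) > 1)).mean()
    print(pXY, pX * pX)                     # ~0.32 vs ~0.10: not independent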

9 Some basic estimates.

9.1 Lemma. Assume that $X$ is real-valued, $|X| \le a$, and $E(X) = 0$. Then

(9.1) $E(e^X) \le \cosh a \le e^{a^2/2}.$

PROOF: The distribution of $(X, e^X)$ lies on the graph of $y = e^x$ above the interval $-a \le x \le a$, and its center of gravity is $(0, E(e^X))$. The highest point on the intersection of the $y$-axis with the convex hull of the graph is $(0, \cosh a)$. Also,

$\cosh a = \sum \frac{a^{2n}}{(2n)!} \le \sum \frac{a^{2n}}{2^n\,n!} = e^{a^2/2}.$

Remark: The inequality $\cosh a \le e^{a^2/2}$ is useful for small values of $a$; for $a \ge 2$ we can use $\cosh a \le e^a$.

Assume that the $X_j$, $j = 1, \dots$, are real-valued, independent, $E(X_j) = 0$, and $|X_j| \le 1$. Let the $a_j$ be real-valued, $a = \big(\sum a_j^2\big)^{1/2} < \infty$, and $Y = \sum a_j X_j$. Then

(9.2) $E(e^{\lambda Y}) = \prod E(e^{\lambda a_j X_j}) \le \prod e^{\frac12 a_j^2 \lambda^2} = e^{\frac12 a^2 \lambda^2}.$

As $E(e^{a^{-1}\lambda Y}) \ge e^{\lambda^2}\,P\{Y \ge a\lambda\}$, we obtain

(9.3) $P\{Y \ge a\lambda\} \le e^{-\lambda^2/2}.$

Applying the same inequality to $-Y$ we have

(9.4) $P\{|Y| \ge a\lambda\} \le 2e^{-\lambda^2/2}.$

Let the $Z_j$ be independent complex-valued variables such that $|Z_j| \le 1$ and $E(Z_j) = 0$, and let the $a_j$ be complex numbers, $\sum |a_j|^2 = a^2$. Applying the estimates above separately to the real parts and the imaginary parts of $\sum a_j Z_j$, and noting that if $z = x + iy$ then $|z| \le 2\max(|x|, |y|)$, we obtain

(9.5) $P\Big\{\big|\sum a_j Z_j\big| \ge 2\lambda\Big\} \le 4e^{-\frac{\lambda^2}{2a^2}}.$

If the terms are constant multiples of real-valued variables, that is, $a_j Z_j = a_j X_j$ where the $X_j$ are real-valued and bounded by 1 and the $a_j$ may be complex, $\sum |a_j|^2 = a^2$, we write $a_j = c_j + i d_j$, the decomposition into real and imaginary parts, and notice that $|a_j|^2 = c_j^2 + d_j^2$, so that if $\sum c_j^2 = c^2$ and $\sum d_j^2 = d^2$, then $a^2 = c^2 + d^2$. If $\big|\sum a_j X_j\big| > a\lambda$ then either $\big|\sum c_j X_j\big| > c\lambda$ or $\big|\sum d_j X_j\big| > d\lambda$, so that in this case the factor 2 is not needed and we have

(9.6) $P\Big\{\big|\sum a_j X_j\big| > a\lambda\Big\} \le 4e^{-\lambda^2/2}.$
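A numerical look at (9.4) for a normalized Rademacher-type sum ($\sum a_j^2 = 1$, so $a = 1$); a sketch in Python (NumPy assumed; sizes and seed arbitrary). The bound is not tight, but it visibly holds:

    import numpy as np

    rng = np.random.default_rng(12)
    N, trials = 50, 300_000
    a = np.full(N, N ** -0.5)                       # sum of a_j^2 = 1
    Y = rng.choice([-1.0, 1.0], size=(trials, N)) @ a

    for lam in (1.0, 2.0, 3.0):
        print(lam, (np.abs(Y) >= lam).mean(), 2 * np.exp(-lam**2 / 2))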

9.2 Combining Proposition 2.3 with (9.5) we obtain the following theorem:

Theorem. There exists a universal constant $C$ such that if $X = \sum a_j Z_j$, where the $Z_j$ are independent complex-valued variables, $|Z_j| \le 1$ and $E(Z_j) = 0$; the $a_j$ are complex numbers, $\sum |a_j|^2 = a^2$; and $2 \le p < \infty$; then

(9.7) $\|X\|_{L^p} \le C\sqrt{p}\,\|X\|_{L^2}.$

9.3 The Rademacher and Steinhaus variables, basic estimates. The Rademacher variables $\{r_n\}$ and the Steinhaus variables $s_n$ were introduced in section 4.4. The results of the previous subsection apply and give the following estimates:

Proposition. Let the $a_n$ be real numbers such that $\sum a_n^2 = 1$. Then

(9.8) $P\big\{\sum a_n r_n > \lambda\big\} \le e^{-\lambda^2/2}.$

If $\sum a_n^2 = a^2$ instead of 1, the inequality reads as either

(9.9) $P\big\{\sum a_n r_n > a\lambda\big\} \le e^{-\lambda^2/2}, \quad\text{or}\quad P\big\{\sum a_n r_n > \lambda\big\} \le e^{-\frac{\lambda^2}{2a^2}},$

and

(9.10) $P\big\{\big|\sum a_n r_n\big| > a\lambda\big\} \le 2e^{-\lambda^2/2}.$

For complex $a_n$ with $\sum |a_n|^2 = a^2$,

(9.11) $P\big\{\big|\sum a_n r_n\big| > a\lambda\big\} \le 4e^{-\lambda^2/2}.$

For real- or complex-valued coefficients $a_j$ with $\sum |a_j|^2 = a^2$, we have

(9.12) $P\big\{\big|\sum a_j s_j\big| \ge 2a\lambda\big\} \le 4e^{-\lambda^2/2}.$
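The $\sqrt{p}$ growth in (9.7) can also be watched numerically: for a Rademacher sum with $\|X\|_{L^2} = 1$, the ratio $\|X\|_{L^p}/\sqrt{p}$ stays bounded. A sketch in Python (NumPy assumed; sizes and seed arbitrary; the highest moments are the noisiest to estimate):

    import numpy as np

    rng = np.random.default_rng(13)
    N, trials = 50, 400_000
    a = rng.random(N)
    a /= np.linalg.norm(a)                              # sum of |a_j|^2 = 1
    Y = rng.choice([-1.0, 1.0], size=(trials, N)) @ a   # ||Y||_{L^2} = 1

    for p in (2, 4, 8, 12):
        norm_p = ((np.abs(Y)**p).mean())**(1.0 / p)
        print(p, norm_p, norm_p / np.sqrt(p))           # ratio stays bounded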

9.4 Subgaussian variables.

DEFINITION: A random variable $X$ is subgaussian if $e^{c|X|^2}$ is integrable for some constant $c > 0$.

The inequalities in 9.1 imply that if the $\{Z_n\}$ are uniformly bounded, centered (i.e. $E(Z_n) = 0$) and independent, and $\{a_j\} \in \ell^2$, then $Y = \sum a_n Z_n$ is subgaussian. In particular, if $\{a_j\} \in \ell^2$ then $X = \sum a_n r_n$ and $Y = \sum a_n s_n$ are subgaussian.

Proposition. The variable $X$ is subgaussian if, and only if, for some positive constants $C$ and $\eta$ and every $\lambda > 0$,

(9.13) $P(|X| > \lambda) \le C e^{-\eta\lambda^2}.$

PROOF: Assume $E(e^{c|X|^2}) < \infty$. Then, since $E(e^{c|X|^2}) \ge e^{c\lambda^2}\,P(|X| > \lambda)$, we have $P(|X| > \lambda) \le E(e^{c|X|^2})\,e^{-c\lambda^2}$. On the other hand, (9.13) implies

$E(e^{c|X|^2}) \le \sum_{n \ge 0} e^{c(n+1)^2}\,P(|X| > n) \le C \sum_{n \ge 0} e^{c(n+1)^2}\,e^{-\eta n^2} < \infty$

provided $c < \eta/2$.

Corollary. Assume that the complex-valued random variables $Z_j$ are independent, $E(Z_j) = 0$, and $|Z_j| \le 1$. Let $\{a_j\} \in \ell^2$. Then

(9.14) $e^{|\sum a_j Z_j|^2} \in L^p \quad\text{for all } p.$

PROOF: Given $p$, take $N$ such that $\sum_{j>N} |a_j|^2 < 1/10p$, and write $Y = Y' + Y''$, with

$Y' = \sum_{j \le N} a_j Z_j, \qquad Y'' = \sum_{j > N} a_j Z_j.$

Write $C_1 = \sup |Y'| \le \sum_{j \le N} |a_j|$. Now $|Y| \le C_1 + |Y''|$. By (9.5), (9.13) is satisfied for $X = Y''$ with $\eta > 5p$, and by the proposition $e^{p|Y''|^2} \in L^2$. Since $|Y|^2 \le (C_1 + |Y''|)^2 \le 2C_1^2 + 2|Y''|^2$, we have $e^{p|Y|^2} \le e^{2pC_1^2}\big(e^{p|Y''|^2}\big)^2 \in L^1$, and hence $e^{|Y|^2} \in L^p$.
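Finally, the definition itself can be probed numerically: for a normalized Rademacher sum, the empirical $E(e^{c|Y|^2})$ stabilizes at a finite value for small $c$ (for a normal variable the exact value would be $(1-2c)^{-1/2}$). A sketch in Python (NumPy assumed; constants, sizes and seed arbitrary):

    import numpy as np

    rng = np.random.default_rng(14)
    N, trials = 100, 300_000
    a = rng.random(N)
    a /= np.linalg.norm(a)                               # sum of |a_j|^2 = 1
    Y = rng.choice([-1.0, 1.0], size=(trials, N)) @ a    # subgaussian, by 9.4

    for c in (0.05, 0.1, 0.2):
        print(c, np.exp(c * Y**2).mean())                # finite, ~ (1-2c)^{-1/2}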