Lecture Notes 9: Probability


University of Warwick, EC9A0 Maths for Economists, Peter J. Hammond
Lecture Notes 9: Probability
Andrés Carvajal, with many revisions by Peter Hammond
September 2011; latest revision September 2018

Outline
Measures and Integrals
Measurable Spaces
Measure Spaces
Lebesgue Integration
Integrating Simple Functions
Integrating Integrable Functions
Non-integrability for Macroeconomists
Kolmogorov's Definition of Probability
Random Variables and Their Distribution and Density Functions
Expected Values
Limit Theorems
Convergence Results

Power Sets

Fix an abstract set $S$. In case it is finite, its cardinality, denoted by $\#S$, is the number of distinct elements of $S$. The power set of $S$ is the family $\mathcal{P}(S) := \{T \mid T \subseteq S\}$ of all subsets of $S$.

Exercise. Show that the mapping
$$\mathcal{P}(S) \ni T \mapsto f(T) \in \{0,1\}^S := \{\langle x_s \rangle_{s \in S} \mid \forall s \in S : x_s \in \{0,1\}\}$$
defined by
$$f(T)_s = 1_T(s) = \begin{cases} 1 & \text{if } s \in T \\ 0 & \text{if } s \notin T \end{cases}$$
is a bijection.

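Constructing a Bijection

Proof. Evidently the mapping $T \mapsto f(T) = \langle 1_T(s) \rangle_{s \in S} \in \{0,1\}^S$ is defined for every $T \subseteq S$, i.e., for every $T \in \mathcal{P}(S)$. Conversely, for each $\langle x_s \rangle_{s \in S} \in \{0,1\}^S$, there is a unique set $T(\langle x_s \rangle_{s \in S}) = \{s \in S \mid x_s = 1\}$. Now, given this $\langle x_s \rangle_{s \in S}$, for each $r \in S$, one has
$$1_{\{s \in S \mid x_s = 1\}}(r) = \begin{cases} 1 & \text{if } x_r = 1 \\ 0 & \text{if } x_r = 0 \end{cases}$$
So $f(T(\langle x_s \rangle_{s \in S})) = \langle 1_{\{s \in S \mid x_s = 1\}}(r) \rangle_{r \in S} = \langle x_r \rangle_{r \in S}$. This implies that $T(\langle x_s \rangle_{s \in S}) = f^{-1}(\langle x_s \rangle_{s \in S})$.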

Counting a Finite Power Set

The existence of the bijection $\mathcal{P}(S) \ni T \mapsto f(T) \in \{0,1\}^S$ proves that when $S$ is finite, the cardinality of the power set is $\#\mathcal{P}(S) = 2^{\#S}$. This helps explain why the power set $\mathcal{P}(S)$ is often denoted by $2^S$.
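For a small finite set, the bijection and the count $2^{\#S}$ can be checked by brute force. The following Python sketch is only an illustration; the helper names `power_set`, `indicator_vector` and `subset_from_vector` are hypothetical, and elements are ordered with `sorted` purely so that vectors have a fixed coordinate order.

```python
from itertools import combinations

def power_set(S):
    """Return all subsets of the finite set S as frozensets."""
    items = sorted(S)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def indicator_vector(T, S):
    """Map a subset T of S to its 0/1 indicator vector, ordered by sorted(S)."""
    return tuple(1 if s in T else 0 for s in sorted(S))

def subset_from_vector(x, S):
    """Inverse map: recover the set {s in S : x_s = 1}."""
    return frozenset(s for s, x_s in zip(sorted(S), x) if x_s == 1)

S = {"a", "b", "c"}
subsets = power_set(S)
vectors = [indicator_vector(T, S) for T in subsets]
# Number of subsets and number of distinct indicator vectors
print(len(subsets), len(set(vectors)))
```

Distinct subsets give distinct vectors, and applying `subset_from_vector` after `indicator_vector` returns the original subset, which is the finite analogue of the argument in the proof.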

Boolean Algebras, Sigma-Algebras, and Measurable Spaces

The family $\mathcal{A} \subseteq \mathcal{P}(S)$ is a Boolean algebra on $S$ just in case
1. $\emptyset \in \mathcal{A}$;
2. $A \in \mathcal{A}$ implies $S \setminus A \in \mathcal{A}$;
3. if $A, B$ lie in $\mathcal{A}$, then $A \cup B \in \mathcal{A}$.

The family $\Sigma \subseteq \mathcal{P}(S)$ is a σ-algebra just in case it is a Boolean algebra with the stronger property: whenever $\{A_n\}_{n=1}^{\infty}$ is a countably infinite family of sets in $\Sigma$, then $\bigcup_{n=1}^{\infty} A_n \in \Sigma$.

The pair $(S, \Sigma)$ is a measurable space just in case $\Sigma$ is a σ-algebra.

Exercise on Boolean Algebras and Sigma-Algebras

Exercise
1. Let $\mathcal{A}$ be a Boolean algebra on $S$. Prove that if $A, B \in \mathcal{A}$, then $A \cap B \in \mathcal{A}$.
2. Let $\Sigma$ be a σ-algebra on $S$. Prove that if $\{A_n\}_{n=1}^{\infty}$ is a countably infinite family of sets in $\Sigma$, then $\bigcap_{n=1}^{\infty} A_n \in \Sigma$.

Hint
1. For part 1, use de Morgan's laws: $S \setminus (A \cup B) = (S \setminus A) \cap (S \setminus B)$ and $S \setminus (A \cap B) = (S \setminus A) \cup (S \setminus B)$.
2. For part 2, use the infinite extension of de Morgan's laws: $S \setminus \left(\bigcup_{n=1}^{\infty} A_n\right) = \bigcap_{n=1}^{\infty} (S \setminus A_n)$ and $S \setminus \left(\bigcap_{n=1}^{\infty} A_n\right) = \bigcup_{n=1}^{\infty} (S \setminus A_n)$.

Generating a Sigma-Algebra

Theorem. Let $\{\Sigma_i \mid i \in I\}$ be any indexed family of σ-algebras. Then the intersection $\Sigma := \bigcap_{i \in I} \Sigma_i$ is also a σ-algebra.

Proof left as an exercise.

Let $X$ be a space, and $\mathcal{F} \subseteq 2^X$ any family of subsets. Since $2^X$ is obviously a σ-algebra, there exists a non-empty set $S(\mathcal{F})$ of σ-algebras that include $\mathcal{F}$. Let $\sigma(\mathcal{F})$ denote the intersection $\bigcap \{\Sigma \mid \Sigma \in S(\mathcal{F})\}$; it is the smallest σ-algebra that includes $\mathcal{F}$.

Exercise. Let $X$ be any uncountably infinite set, and let $\mathcal{F} := \{\{x\} \mid x \in X\}$ denote the family of all singleton subsets of $X$. Show that $\sigma(\mathcal{F})$ consists of all subsets of $X$ that are either countable or co-countable (i.e., have a countable complement).
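When $X$ is finite, every σ-algebra is just a Boolean algebra, so the generated σ-algebra can be computed directly. The Python sketch below does not use the intersection definition above; instead it assumes a finite $X$ and repeatedly closes the family under complements and pairwise unions until nothing new appears. The function name `generate_sigma_algebra` is hypothetical.

```python
def generate_sigma_algebra(X, family):
    """Smallest family of subsets of the finite set X that contains `family`
    and is closed under complementation and (finite) union."""
    X = frozenset(X)
    sigma = {frozenset(), X} | {frozenset(A) for A in family}
    changed = True
    while changed:
        changed = False
        current = list(sigma)  # snapshot, so sigma can grow inside the loop
        for A in current:
            candidates = [X - A] + [A | C for C in current]
            for B in candidates:
                if B not in sigma:
                    sigma.add(B)
                    changed = True
    return sigma

X = {1, 2, 3, 4}
sigma = generate_sigma_algebra(X, [{1}, {2}])
for A in sorted(sigma, key=lambda A: (len(A), sorted(A))):
    print(set(A) if A else "{}")
```

Here the generating sets $\{1\}$ and $\{2\}$ cannot separate the points 3 and 4, so $\{3, 4\}$ belongs to the result but $\{3\}$ does not.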

Topological Spaces

Given a set $X$, a topology $\mathcal{T}$ on $X$ is a family of open subsets $U \subseteq X$ satisfying:
1. $\emptyset \in \mathcal{T}$ and $X \in \mathcal{T}$;
2. if $U, V \in \mathcal{T}$, then $U \cap V \in \mathcal{T}$;
3. if $\{U_\alpha \mid \alpha \in A\}$ is any family of open sets in $\mathcal{T}$, then the union $\bigcup_{\alpha \in A} U_\alpha \in \mathcal{T}$.

Thus, finite intersections and arbitrary unions of open sets are open. A topological space $(X, \mathcal{T})$ is any set $X$ together with a topology $\mathcal{T}$ that consists of all the open subsets of $X$.

The Metric Topology

Let $(X, d)$ be any metric space. The open ball of radius $r$ centred at $x$ is the set
$$B_r(x) := \{y \in X \mid d(x, y) < r\}$$
The metric topology $\mathcal{T}_d$ of $(X, d)$ is the smallest topology that includes the entire family $\{B_r(x) \mid x \in X \ \&\ r > 0\}$ of all open balls in $X$.

Borel Sigma-Algebra

Let $(X, \mathcal{T})$ be any topological space. Its Borel σ-algebra is defined as $\sigma(\mathcal{T})$, i.e., the smallest σ-algebra containing every open set of $X$.

Suppose the topological space is a metric space $(X, d)$ with its metric topology $\mathcal{T}_d$. Then the Borel σ-algebra is generated by all the open balls $B_r(x) := \{x' \in X \mid d(x, x') < r\}$ in $X$.

For the case of the real line when $X = \mathbb{R}$, its Borel σ-algebra is generated by all the open intervals of $\mathbb{R}$.


Finitely Additive Set Functions

Let $\bar{\mathbb{R}} := \mathbb{R} \cup \{-\infty, +\infty\} = [-\infty, +\infty]$ denote the extended real line which, at each end, has an endpoint added at infinity. Let $\bar{\mathbb{R}}_+ := \mathbb{R}_+ \cup \{+\infty\} = [0, +\infty]$ be the non-negative part of $\bar{\mathbb{R}}$.

Any family $\mathcal{F}$ of subsets $A \subseteq X$ is said to be pairwise disjoint just in case $A \cap B = \emptyset$ whenever $A, B \in \mathcal{F}$ with $A \neq B$.

Definition. Let $(X, \Sigma)$ be a measurable space. A mapping $\mu : \Sigma \to \bar{\mathbb{R}}_+$ is said to be a set function. The set function $\mu : \Sigma \to \bar{\mathbb{R}}_+$ is said to be additive (or finitely additive) just in case, for any pair $\{A, B\}$ of disjoint sets in $\Sigma$, one has $\mu(A \cup B) = \mu(A) + \mu(B)$.

Implications of Finite Additivity

Lemma. If the set function $\mu : \Sigma \to \bar{\mathbb{R}}_+$ is finitely additive, then $\mu(\emptyset) = 0$.

Proof. For any non-empty $A \in \Sigma$, the sets $A$ and $\emptyset$ are disjoint. Additivity implies that $\mu(A) = \mu(A \cup \emptyset) = \mu(A) + \mu(\emptyset)$, so $\mu(\emptyset) = 0$.

Exercise. For any finite collection $\{A_n\}_{n=1}^{k}$ of pairwise disjoint sets in $\Sigma$, prove by induction on $k$ that finite additivity implies
$$\mu\left(\bigcup_{n=1}^{k} A_n\right) = \sum_{n=1}^{k} \mu(A_n)$$

Measure as a Countably Additive Set Function

Definition. The set function $\mu : \Sigma \to \bar{\mathbb{R}}_+$ on a measurable space $(X, \Sigma)$ is said to be σ-additive or countably additive just in case, for any countable collection $\{A_n\}_{n=1}^{\infty}$ of pairwise disjoint sets in $\Sigma$, one has
$$\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n)$$
A measure on a measurable space $(X, \Sigma)$ is a countably additive set function $\mu : \Sigma \to \bar{\mathbb{R}}_+$.

Measure Space

A measure space is a triple $(X, \Sigma, \mu)$ where
1. $\Sigma$ is a σ-algebra on $X$;
2. $\mu$ is a measure on the measurable space $(X, \Sigma)$.

Example. A prominent example of a measure space is $(\mathbb{R}, \mathcal{B}, \ell)$ where:
1. $\mathcal{B}$ is the Borel σ-algebra induced by the open sets of the real line $\mathbb{R}$;
2. the measure $\ell(J)$ of any interval $J \subset \mathbb{R}$ is its length, defined for all $(a, b) \in \mathbb{R}^2$ with $a < b$ by $\ell([a, b]) = \ell([a, b)) = \ell((a, b]) = \ell((a, b)) = b - a$;
3. $\ell$ is extended to all of $\mathcal{B}$ to satisfy countable additivity (it can be shown that this extension is unique).

Exercise. Prove that if $S \subset \mathbb{R}$ is countable, then $S \in \mathcal{B}$ and $\ell(S) = 0$.

Lebesgue Measurable Subsets of the Real Line

A set $N \subset \mathbb{R}$ is null just in case there exists a Borel subset $B \in \mathcal{B}$ with $\ell(B) = 0$ such that $N \subseteq B$. This is possible for some non-Borel subsets of $\mathbb{R}$. Let $\mathcal{N}$ denote the family of null subsets of $\mathbb{R}$. These null sets can be used to generate the Lebesgue σ-algebra of Lebesgue measurable sets, which is $\sigma(\mathcal{B} \cup \mathcal{N})$.

The symmetric difference of any two sets $S$ and $B$ is defined as the set
$$S \,\triangle\, B := (S \setminus B) \cup (B \setminus S) = (S \cup B) \setminus (S \cap B)$$
of elements $s$ that belong to one of the two sets, but not to both.

One can show that $S \in \sigma(\mathcal{B} \cup \mathcal{N})$ if and only if there exists a Borel set $B \in \mathcal{B}$ such that $S \,\triangle\, B \in \mathcal{N}$, i.e., $S$ differs from a Borel set only by a null set.

The Lebesgue Real Line

There is a well-defined function $\lambda : \sigma(\mathcal{B} \cup \mathcal{N}) \to \bar{\mathbb{R}}_+$ that satisfies $\lambda(S) := \ell(B)$ whenever $S \,\triangle\, B \in \mathcal{N}$. Moreover, one can prove that the function $S \mapsto \lambda(S)$ is countably additive. This makes $\lambda$ a measure, called the Lebesgue measure. The associated measure space $(\mathbb{R}, \sigma(\mathcal{B} \cup \mathcal{N}), \lambda)$ is called the Lebesgue real line.


Measurable Partitions

Let $(X, \Sigma, \mu)$ be a measure space. Given any set $E \in \Sigma$, the indicator function of $E$ is defined by
$$X \ni x \mapsto 1_E(x) := \begin{cases} 1 & \text{if } x \in E \\ 0 & \text{if } x \notin E \end{cases}$$
The finite collection $\{E_k \mid k \in K\}$ of pairwise disjoint measurable sets $E_k \in \Sigma$ is a measurable partition of $X$ just in case $\bigcup_{k \in K} E_k = X$.

Simple Functions

The function $f : X \to \mathbb{R}$ is simple just in case there exist a measurable partition $\{E_k \mid k \in K\}$ of $X$ together with a corresponding finite collection $(a_k)_{k \in K}$ of real constants such that $f(x) = \sum_{k \in K} a_k 1_{E_k}(x)$.

Note that the range $f(X) := \{y \in \mathbb{R} \mid \exists x \in X : y = f(x)\}$ of this step function is precisely the set $\{a_k \mid k \in K\}$ of real constants.

Let $F(X, \Sigma)$ denote the set of all simple functions on the measurable space $(X, \Sigma)$; in fact it is a real vector space.

Integrating Simple Functions

Given a function $f : X \to \mathbb{R}$, whenever possible we want to define the integral $\int_X f(x)\, d\mu = \int_X f(x)\, \mu(dx)$.

The simple function $f(x) = \sum_{k \in K} a_k 1_{E_k}(x)$ is µ-integrable just in case one has $\sum_{k \in K} |a_k|\, \mu(E_k) < +\infty$. In particular, this requires that $a_k \neq 0 \Longrightarrow \mu(E_k) < +\infty$ or equivalently, that $\mu(E_k) = +\infty \Longrightarrow a_k = 0$.

Then we can define the integral $\int_X f(x)\, \mu(dx)$ of the µ-integrable simple function $f(x) = \sum_{k \in K} a_k 1_{E_k}(x)$ as the real number $\sum_{k \in K} a_k\, \mu(E_k)$.
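The definition can be turned into a few lines of Python. This is a minimal sketch under the assumption that the simple function is represented by two parallel lists: the constants $a_k$ and the measures $\mu(E_k)$ of the cells of the partition. The function name `integrate_simple` and the choice to return `None` for a non-integrable function are illustrative conventions, not part of the notes.

```python
import math

def integrate_simple(values, measures):
    """Integral of sum_k a_k 1_{E_k}, given the constants a_k and the
    measures mu(E_k). Returns None when the function is not integrable."""
    total = 0.0
    for a_k, mu_k in zip(values, measures):
        if a_k == 0:
            continue              # a cell with a_k = 0 contributes nothing,
                                  # even if mu(E_k) = +infinity
        if math.isinf(mu_k):
            return None           # a_k != 0 on a cell of infinite measure
        total += a_k * mu_k
    return total

# Partition of the real line into (-inf, 0), [0, 2), [2, +inf),
# with Lebesgue measures +inf, 2, +inf.
measures = [math.inf, 2.0, math.inf]
print(integrate_simple([0.0, 3.0, 0.0], measures))  # f = 3 on [0, 2), 0 elsewhere
print(integrate_simple([0.0, 3.0, 1.0], measures))  # f = 1 on [2, +inf)
```

The second function fails the requirement $\mu(E_k) = +\infty \Longrightarrow a_k = 0$, so it has no finite integral.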

Measurable Functions

Given the measure space $(X, \Sigma, \mu)$, the function $f : X \to \mathbb{R}$ is measurable just in case, for every Borel set $B \subseteq \mathbb{R}$, its inverse image satisfies
$$f^{-1}(B) := \{x \in X \mid f(x) \in B\} \in \Sigma$$
Note that we have defined a simple function to be measurable.

Upper and Lower Bounds

In this subsection, we consider the case of a finite measure satisfying $\mu(X) < +\infty$. In case $X \subseteq \mathbb{R}$ and $\mu$ is Lebesgue measure, this implies that $X$ must be bounded, for example $X = [a, b]$.

Given any function $f : X \to \mathbb{R}$, define the two sets
$$F^*(f; X, \Sigma) := \{f^* \in F(X, \Sigma) \mid \forall x \in X : f^*(x) \geq f(x)\}$$
$$F_*(f; X, \Sigma) := \{f_* \in F(X, \Sigma) \mid \forall x \in X : f_*(x) \leq f(x)\}$$
of simple functions that are respectively upper or lower bounds for the function $f$.

Upper and Lower Integrals

The integral $\int_X f^*(x)\, \mu(dx)$ of each simple function $f^* \in F^*(f; X, \Sigma)$ is an over-estimate of the true integral of $f$. But the integral $\int_X f_*(x)\, \mu(dx)$ of each simple function $f_* \in F_*(f; X, \Sigma)$ is an under-estimate of the true integral of $f$.

Define the upper integral and lower integral of $f$ as, respectively,
$$I^*(f) := \inf_{f^* \in F^*(f; X, \Sigma)} \int_X f^*(x)\, \mu(dx) \quad \text{and} \quad I_*(f) := \sup_{f_* \in F_*(f; X, \Sigma)} \int_X f_*(x)\, \mu(dx)$$
Of course, in case $f$ is itself a simple function, one has $I^*(f) = I_*(f) = \int_X f(x)\, \mu(dx)$.

Integrability and Integration

Definition
1. The function $f : X \to \mathbb{R}$ is integrable just in case the upper and lower integrals $I^*(f)$ and $I_*(f)$ are equal.
2. In case $f : X \to \mathbb{R}$ is integrable, its integral $\int_X f(x)\, \mu(dx)$ is defined as the common value of its upper integral $I^*(f)$ and its lower integral $I_*(f)$.

A Continuum of Independent Random Variables

Hotelling had a continuum of ice cream buyers distributed along a finite beach.

Vickrey and Mirrlees considered optimal income taxation when a population of workers has continuously distributed skills.

Aumann and Hildenbrand modelled a perfectly competitive market system with a continuum of traders who each have negligible individual influence over market prices. Market clearing in the economy as a whole requires, for each separate commodity, equality between: (i) mean demand per trader; and (ii) mean endowment per trader.

Bewley considered a continuum of consumers with independently fluctuating random endowments. Then, is mean endowment in the population even defined?

Difficulty Illustrated

In the following graphs, think of the horizontal axis as the Lebesgue unit interval, indicating something like a U.S. social security number (SSN) ($\approx 10^9$). Think of the vertical axis as the Lebesgue unit interval, indicating something like an individual's height, measured as a percentile. Assume that SSN gives no information about height. Then the heights of, approximately, a continuum of individuals may be regarded as statistically IID random variables.

Difficulty Illustrated: 25 Random Draws

[Figure: scatter plot.] The points have $x \in \{(1/25)(n - \tfrac{1}{2}) \mid n \in \{1, \ldots, 25\}\}$ and $y$ pseudo-randomly drawn from a uniform distribution on $[0, 1]$.

Difficulty Illustrated: 200 Random Draws

[Figure: scatter plot.] The points have $x \in \{(1/200)(n - \tfrac{1}{2}) \mid n \in \{1, \ldots, 200\}\}$ and $y$ pseudo-randomly drawn from a uniform distribution on $[0, 1]$.

A Theorem on Non-Measurable Sample Paths

Recall that the supremum is the least upper bound, and the infimum is the greatest lower bound. Define the essential supremum and infimum of each random variable $\omega \mapsto f(i, \omega)$ as:
$$\operatorname{ess\,sup} f(i, \omega) := \inf\{b \in \mathbb{R} \mid P(\{\omega \in \Omega \mid f(i, \omega) \leq b\}) = 1\}$$
$$\operatorname{ess\,inf} f(i, \omega) := \sup\{a \in \mathbb{R} \mid P(\{\omega \in \Omega \mid f(i, \omega) \geq a\}) = 1\}$$
For all the IID random variables $\omega \mapsto f(i, \omega)$, let $a \in \mathbb{R}$ denote the common value of $\operatorname{ess\,inf} f(i, \omega)$ and $b \in \mathbb{R}$ the common value of $\operatorname{ess\,sup} f(i, \omega)$.

Theorem. Whenever $a < b$, the sample path $I \ni i \mapsto f(i, \omega)$ is $P$-a.s. non-measurable.

Of course, when $a = b$, then $P$-a.s. one has $f(i, \omega) = a = b$, i.e., the process $(i, \omega) \mapsto f(i, \omega)$ is essentially constant.

The key idea in the proof is to show that the sample path has a lower integral $a$ and an upper integral $b$.


Probability Measure and Probability Space

Fix a measurable space $(S, \Sigma)$, where $S$ is a set of unknown states of the world. Then $\Sigma$ is a σ-algebra of unknown events.

A probability measure on $(S, \Sigma)$ is a measure $P : \Sigma \to \mathbb{R}_+$ satisfying the requirement that $P(S) = 1$. Countable additivity implies that $P(E) + P(E^c) = 1$ for every event $E \in \Sigma$, where $E^c := S \setminus E$. It follows that $P(E) \in [0, 1]$ for every $E \in \Sigma$.

A probability space is a triple $(S, \Sigma, P)$ where:
1. $(S, \Sigma)$ is a measurable space;
2. $\Sigma \ni E \mapsto P(E) \in [0, 1]$ is a probability measure on $(S, \Sigma)$.

Properties of Probability

Theorem. Let $(S, \Sigma, P)$ be a probability space. Then the following hold for all Σ-measurable sets $E$, $E'$ etc.
1. $P(E) \leq 1$ and $P(S \setminus E) = 1 - P(E)$;
2. $P(E \setminus E') = P(E) - P(E \cap E')$ and $P(E \cup E') = P(E) + P(E') - P(E \cap E')$;
3. for every partition $\{E_n\}_{n=1}^{m}$ of $S$ into $m$ pairwise disjoint Σ-measurable sets, one has $P(E) = \sum_{n=1}^{m} P(E \cap E_n)$;
4. $P(E \cup E') \leq P(E) + P(E')$ and $P\left(\bigcup_{n=1}^{\infty} E_n\right) \leq \sum_{n=1}^{\infty} P(E_n)$.

Proof. We leave the routine proof as an exercise.

Two Limiting Properties

Theorem. Let $(S, \Sigma, P)$ be a probability space, and $(E_n)_{n=1}^{\infty}$ an infinite sequence of Σ-measurable sets.
1. If $E_n \subseteq E_{n+1}$ for all $n \in \mathbb{N}$, then $P\left(\bigcup_{n=1}^{\infty} E_n\right) = \lim_{n \to \infty} P(E_n) = \sup_n P(E_n)$.
2. If $E_n \supseteq E_{n+1}$ for all $n \in \mathbb{N}$, then $P\left(\bigcap_{n=1}^{\infty} E_n\right) = \lim_{n \to \infty} P(E_n) = \inf_n P(E_n)$.

Proving the Two Limiting Properties

Proof.
1. Because $E_n \subseteq E_{n+1}$ for all $n \in \mathbb{N}$, one has
$$E_n = E_1 \cup \left[\bigcup_{k=2}^{n} (E_k \setminus E_{k-1})\right] \quad \text{and} \quad \bigcup_{n=1}^{\infty} E_n = E_1 \cup \left[\bigcup_{k=2}^{\infty} (E_k \setminus E_{k-1})\right]$$
where the sets $E_1$ and $\{E_k \setminus E_{k-1} \mid k = 2, 3, \ldots\}$ are all pairwise disjoint. Hence $P(E_n) = P(E_1) + \sum_{k=2}^{n} P(E_k \setminus E_{k-1})$ and
$$P\left(\bigcup_{n=1}^{\infty} E_n\right) = P(E_1) + \sum_{k=2}^{\infty} P(E_k \setminus E_{k-1}) = \lim_{n \to \infty} \left[P(E_1) + \sum_{k=2}^{n} P(E_k \setminus E_{k-1})\right] = \lim_{n \to \infty} P(E_n)$$
2. Apply part 1 to the complements of the sets $E_n$.

Conditional Probability: First Definition

Let $E' \in \Sigma$ be such that $P(E') > 0$. The conditional probability measure on $E'$ is the mapping
$$\Sigma \ni E \mapsto P(E \mid E') := \frac{P(E \cap E')}{P(E')} \in [0, 1]$$
The triple $(E', \Sigma(E'), P(\cdot \mid E'))$ with
$$\Sigma(E') := \{E \cap E' \mid E \in \Sigma\} = \{E \in \Sigma \mid E \subseteq E'\}$$
is then a conditional probability space given the event $E'$.

Conditional Probability: Two Key Properties

Theorem. Provided that $P(E) \in (0, 1)$, one has
$$P(E') = P(E)\, P(E' \mid E) + (1 - P(E))\, P(E' \mid E^c)$$

Theorem. Let $(E_k)_{k=1}^{n}$ be any finite list of sets in $\Sigma$. Provided that $P\left(\bigcap_{k=1}^{n-1} E_k\right) > 0$, one has
$$P\left(\bigcap_{k=1}^{n} E_k\right) = P(E_1)\, P(E_2 \mid E_1)\, P(E_3 \mid E_1 \cap E_2) \cdots P\left(E_n \,\Big|\, \bigcap_{k=1}^{n-1} E_k\right)$$
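Both properties can be checked exactly on a small finite probability space. The Python sketch below assumes a fair six-sided die with the uniform probability measure; the particular events, and the helper names `P` and `P_cond`, are illustrative choices that do not appear in the notes.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # outcomes of one roll of a fair die

def P(E):
    """Uniform probability of the event E, a subset of S."""
    return Fraction(len(E & S), len(S))

def P_cond(E, given):
    """Conditional probability P(E | given); requires P(given) > 0."""
    return P(E & given) / P(given)

# Law of total probability, splitting on E and its complement
E = frozenset({2, 4, 6})        # "the roll is even"
E_prime = frozenset({4, 5, 6})  # "the roll is at least 4"
total = P(E) * P_cond(E_prime, E) + (1 - P(E)) * P_cond(E_prime, S - E)

# Multiplication (chain) rule for three events
E1 = frozenset({1, 2, 3, 4})
E2 = frozenset({2, 3, 4, 5})
E3 = frozenset({3, 4, 5, 6})
chain = P(E1) * P_cond(E2, E1) * P_cond(E3, E1 & E2)

print(total, P(E_prime))
print(chain, P(E1 & E2 & E3))
```

Using `Fraction` keeps all probabilities exact, so the two sides of each identity can be compared with `==` rather than up to rounding error.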

Independence

The finite or countably infinite family $\{E_k\}_{k \in K}$ of events in $\Sigma$ is:
pairwise independent if $P(E \cap E') = P(E)\, P(E')$ whenever $E \neq E'$;
independent if for any finite subfamily $\{E_k\}_{k=1}^{n}$, one has $P\left(\bigcap_{k=1}^{n} E_k\right) = \prod_{k=1}^{n} P(E_k)$.

Exercise. Let $S$ be the set $\{1, 2, 3, 4, 5, 6, 7, 8, 9\}$, and $P$ the probability measure on $2^S$ satisfying $P(\{s\}) = 1/9$ for all $s \in S$. Consider the three events $E_1 = \{1, 2, 7\}$, $E_2 = \{3, 4, 7\}$ and $E_3 = \{5, 6, 7\}$. Are these events pairwise independent? Are they independent?

Exercise. Prove that if $\{E, E'\}$ is independent, then so is $\{E^c, E'\}$.
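After attempting the first exercise by hand, the answer can be checked mechanically. The Python sketch below tests the product rule on every subfamily of the three events; the helper name `product_rule_holds` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 10))
events = [frozenset({1, 2, 7}), frozenset({3, 4, 7}), frozenset({5, 6, 7})]

def P(E):
    """Uniform probability on S, with P({s}) = 1/9 for each s."""
    return Fraction(len(E), len(S))

def product_rule_holds(subfamily):
    """True if P(intersection of the events) equals the product of the P(E_k)."""
    intersection = S
    product = Fraction(1)
    for E in subfamily:
        intersection = intersection & E
        product *= P(E)
    return P(intersection) == product

# Pairwise independence: the product rule for every pair
pairwise = all(product_rule_holds(pair) for pair in combinations(events, 2))
# Independence: the product rule for every subfamily with at least two events
independent = all(product_rule_holds(sub)
                  for k in range(2, len(events) + 1)
                  for sub in combinations(events, k))
print("pairwise independent:", pairwise)
print("independent:", independent)
```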


Random Variables

Definition. The function $X : S \to \mathbb{R}$ is Σ-measurable just in case for every $x \in \mathbb{R}$ one has
$$X^{-1}((-\infty, x]) := \{s \in S \mid X(s) \leq x\} \in \Sigma$$
A random variable (with values in $\mathbb{R}$) is a Σ-measurable function $S \ni s \mapsto X(s) \in \mathbb{R}$.

The distribution function or cumulative distribution function (cdf) of $X$ is the mapping $F_X : \mathbb{R} \to [0, 1]$ defined by
$$x \mapsto F_X(x) = P(\{s \in S \mid X(s) \leq x\}) = P\left(X^{-1}((-\infty, x])\right)$$

Properties of Distribution Functions, I

Theorem. The CDF of any random variable $s \mapsto X(s)$ satisfies:
1. $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to +\infty} F_X(x) = 1$.
2. $x \geq x'$ implies $F_X(x) \geq F_X(x')$.
3. $\lim_{h \downarrow 0} F_X(x + h) = F_X(x)$.
4. $P(\{s \in S \mid X(s) > x\}) = 1 - F_X(x)$.
5. $P(\{s \in S : x < X(s) \leq x'\}) = F_X(x') - F_X(x)$ whenever $x < x'$.
6. $P(\{s \in S : X(s) = x\}) = F_X(x) - \lim_{h \uparrow 0} F_X(x + h)$.

CDFs are sometimes said to be càdlàg, which is a French acronym for continue à droite, limite à gauche (continuous on the right, limit on the left).

Properties of Distribution Functions, II

Definition. A continuity point of the CDF $F_X : \mathbb{R} \to [0, 1]$ is an $x \in \mathbb{R}$ at which the mapping $x \mapsto F_X(x)$ is continuous.

Is it always true that $\lim_{h \uparrow 0} F_X(x + h) = F_X(x)$?

Exercise. Let $F_X : \mathbb{R} \to [0, 1]$ be the CDF of any random variable $S \ni s \mapsto X(s) \in \mathbb{R}$, and $x \in \mathbb{R}$ any point. Prove that the following three conditions are equivalent:
1. $x$ is a continuity point of $F_X$;
2. $P(\{s \in S \mid X(s) = x\}) = 0$;
3. $\lim_{h \uparrow 0} F_X(x + h) = F_X(x)$.

Continuous Random Variable

Definition. A random variable $S \ni s \mapsto X(s) \in \mathbb{R}$ is
1. continuous just in case $x \mapsto F_X(x)$ is continuous;
2. absolutely continuous just in case there exists a density function $\mathbb{R} \ni x \mapsto f_X(x) \in \mathbb{R}_+$ such that $F_X(x) = \int_{-\infty}^{x} f_X(u)\, du$ for all $x \in \mathbb{R}$.

The support of the random variable $S \ni s \mapsto X(s) \in \mathbb{R}$ is the closure of the set on which $F_X$ is strictly increasing.

Example. The uniform distribution on a closed interval $[a, b]$ of $\mathbb{R}$ has density function $f_X$ and distribution function $F_X$ given by
$$f_X(x) := \frac{1}{b - a}\, 1_{[a, b]}(x) \quad \text{and} \quad F_X(x) := \begin{cases} 0 & \text{if } x < a \\ \dfrac{x - a}{b - a} & \text{if } x \in [a, b] \\ 1 & \text{if } x > b \end{cases}$$
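The relation $F_X(x) = \int_{-\infty}^{x} f_X(u)\, du$ for the uniform example can be checked numerically. The Python sketch below uses a midpoint Riemann sum, which is an illustrative numerical shortcut and not the Lebesgue construction from earlier in the notes; the function names and the number of steps are assumptions.

```python
def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """Distribution function of the uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def integrate_pdf(x, a, b, steps=10_000):
    """Midpoint Riemann sum of the density from a - 1 up to x.
    The density vanishes below a, so starting at a - 1 loses nothing."""
    lower = a - 1.0
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    return h * sum(uniform_pdf(lower + (i + 0.5) * h, a, b) for i in range(steps))

a, b = 0.0, 2.0
for x in (-0.5, 0.5, 1.0, 2.5):
    print(x, uniform_cdf(x, a, b), round(integrate_pdf(x, a, b), 4))
```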

The Normal or Gaussian Distribution

Example. The standard normal distribution on $\mathbb{R}$ has density function $f_X$ given by $f_X(x) := k e^{-\frac{1}{2} x^2}$, where $k := 1/\sqrt{2\pi}$ is chosen so that $\int_{-\infty}^{+\infty} k e^{-\frac{1}{2} x^2}\, dx = 1$.

Its mean and variance are
$$\int_{-\infty}^{+\infty} k x e^{-\frac{1}{2} x^2}\, dx = \lim_{a \to \infty} \int_{-a}^{+a} k x e^{-\frac{1}{2} x^2}\, dx = \lim_{a \to \infty} \left[\int_{-a}^{0} k x e^{-\frac{1}{2} x^2}\, dx + \int_{0}^{a} k x e^{-\frac{1}{2} x^2}\, dx\right] = 0$$
$$\int_{-\infty}^{+\infty} k x^2 e^{-\frac{1}{2} x^2}\, dx = 1$$

The Gaussian Integral, I

Define $I(a) := \int_{-a}^{+a} e^{-\frac{1}{2} x^2}\, dx$ for each $a \in \mathbb{R}$. Then
$$[I(a)]^2 = \left(\int_{-a}^{+a} e^{-\frac{1}{2} x^2}\, dx\right) \left(\int_{-a}^{+a} e^{-\frac{1}{2} y^2}\, dy\right) = \int_{-a}^{+a} \left(\int_{-a}^{+a} e^{-\frac{1}{2} y^2}\, dy\right) e^{-\frac{1}{2} x^2}\, dx$$
$$= \int_{-a}^{+a} \int_{-a}^{+a} e^{-\frac{1}{2} x^2} e^{-\frac{1}{2} y^2}\, dx\, dy = \iint_{S(a)} e^{-\frac{1}{2}(x^2 + y^2)}\, dx\, dy$$
where $S(a) := [-a, a]^2$ denotes the Cartesian product of the line interval $[-a, a]$ with itself. Thus, $S(a)$ is the solid square subset of $\mathbb{R}^2$ that is centred at the origin and has sides of length $2a$.

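The Gaussian Integral, II

Let $D(b) := \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \leq b^2\}$ denote the disk of radius $b$ centred at the origin. Consider the transformation $(r, \theta) \mapsto (x, y) = (r \cos\theta, r \sin\theta)$ from polar to Cartesian coordinates. The Jacobian determinant of this transformation is
$$\begin{vmatrix} \partial x/\partial r & \partial x/\partial \theta \\ \partial y/\partial r & \partial y/\partial \theta \end{vmatrix} = \begin{vmatrix} \cos\theta & -r \sin\theta \\ \sin\theta & r \cos\theta \end{vmatrix} = r(\cos^2\theta + \sin^2\theta) = r$$
It follows that changing to polar coordinates in the double integral $J(b) = \iint_{D(b)} e^{-\frac{1}{2}(x^2 + y^2)}\, dx\, dy$ transforms it to
$$J(b) = \int_{0}^{2\pi} \int_{0}^{b} r e^{-\frac{1}{2} r^2}\, dr\, d\theta = \left(\int_{0}^{b} r e^{-\frac{1}{2} r^2}\, dr\right) \left(\int_{0}^{2\pi} 1\, d\theta\right) = \left[-e^{-\frac{1}{2} r^2}\right]_{0}^{b} \cdot 2\pi = 2\pi\left(1 - e^{-\frac{1}{2} b^2}\right)$$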

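The Gaussian Integral, III

Note that $D(a) \subset S(a) \subset D(a\sqrt{2})$ and so
$$\iint_{D(a)} e^{-\frac{1}{2}(x^2 + y^2)}\, dx\, dy \leq \iint_{S(a)} e^{-\frac{1}{2}(x^2 + y^2)}\, dx\, dy \leq \iint_{D(a\sqrt{2})} e^{-\frac{1}{2}(x^2 + y^2)}\, dx\, dy$$
It follows that $J(a) \leq [I(a)]^2 = \iint_{S(a)} e^{-\frac{1}{2}(x^2 + y^2)}\, dx\, dy \leq J(a\sqrt{2})$ and so
$$2\pi\left(1 - e^{-\frac{1}{2} a^2}\right) \leq [I(a)]^2 \leq 2\pi\left(1 - e^{-a^2}\right)$$
Taking limits as $a \to \infty$, one has $2\pi(1 - e^{-\frac{1}{2} a^2}) \to 2\pi$ and also $2\pi(1 - e^{-a^2}) \to 2\pi$, implying that $[I(a)]^2 \to 2\pi$.

Theorem. The Gaussian integral $\int_{-\infty}^{+\infty} e^{-\frac{1}{2} x^2}\, dx$ equals $\sqrt{2\pi}$.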
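The sandwich $J(a) \leq [I(a)]^2 \leq J(a\sqrt{2})$ can be checked numerically. The Python sketch below evaluates $I(a)$ through the standard library's error function, using the identity $I(a) = \sqrt{2\pi}\, \operatorname{erf}(a/\sqrt{2})$; this is a convenience assumed here for the check and plays no part in the proof above.

```python
import math

def I(a):
    """Integral of exp(-x^2 / 2) over [-a, a], via the error function."""
    return math.sqrt(2.0 * math.pi) * math.erf(a / math.sqrt(2.0))

def J(b):
    """Integral of exp(-(x^2 + y^2) / 2) over the disk of radius b."""
    return 2.0 * math.pi * (1.0 - math.exp(-0.5 * b * b))

for a in (0.5, 1.0, 2.0, 4.0):
    lower, middle, upper = J(a), I(a) ** 2, J(a * math.sqrt(2.0))
    print(f"a={a}: {lower:.6f} <= {middle:.6f} <= {upper:.6f}")

print("sqrt(2*pi) =", math.sqrt(2.0 * math.pi), " I(10) =", I(10.0))
```

As $a$ grows, both bounds and the middle term approach $2\pi \approx 6.283$.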


Expectation

Let $g : \mathbb{R} \to \mathbb{R}$ be any Borel function, and $x \mapsto f_X(x)$ the density function of the random variable $X$. Whenever the integral $\int_{-\infty}^{+\infty} |g(x)|\, f_X(x)\, dx$ exists, the expectation of $g \circ X$ is defined as
$$E(g(X)) = \int_{-\infty}^{+\infty} g(x)\, f_X(x)\, dx$$

Theorem. Let $g_1, g_2 : \mathbb{R} \to \mathbb{R}$ and $a, b, c \in \mathbb{R}$. Then:
1. $E(a g_1(X) + b g_2(X) + c) = a E(g_1(X)) + b E(g_2(X)) + c$.
2. If $g_1 \geq 0$, then $E(g_1(X)) \geq 0$.
3. If $g_1 \geq g_2$, then $E(g_1(X)) \geq E(g_2(X))$.

Chebychev's Inequality

Theorem. For any random variable $S \ni s \mapsto X(s) \in Z$, fix any measurable function $g : Z \to \mathbb{R}_+$ with $E[g(X(s))] < +\infty$. Then for all $r > 0$ one has
$$P(g(X) \geq r) \leq \frac{1}{r} E[g(X)]$$

Proof. The two indicator functions $s \mapsto 1_{g(X) \geq r}(s)$ and $s \mapsto 1_{g(X) < r}(s)$ satisfy $1_{g(X) \geq r}(s) + 1_{g(X) < r}(s) = 1$ for all $s \in S$. Because $g(X(s)) \geq 0$ for all $s \in S$, one has
$$E[g(X)] = E[\{1_{g(X) \geq r}(s) + 1_{g(X) < r}(s)\}\, g(X(s))] = E[1_{g(X) \geq r}(s)\, g(X(s))] + E[1_{g(X) < r}(s)\, g(X(s))]$$
$$\geq r\, E[1_{g(X) \geq r}(s)] = r\, P(g(X) \geq r)$$
Dividing by $r$ implies that $\frac{1}{r} E[g(X)] \geq P(g(X) \geq r)$.
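The inequality is easy to verify exactly for a discrete random variable. The Python sketch below assumes, purely for illustration, that $X$ is uniform on $\{-2, -1, 0, 1, 2\}$ and that $g(x) = x^2$; neither choice comes from the notes, which work with general measurable $g$.

```python
from fractions import Fraction

# Hypothetical example: X uniform on {-2, ..., 2}, g(x) = x^2
support = [-2, -1, 0, 1, 2]
prob = {x: Fraction(1, len(support)) for x in support}

def g(x):
    return x * x

def expectation(h):
    """E[h(X)] for the discrete distribution `prob`."""
    return sum(p * h(x) for x, p in prob.items())

def tail_probability(r):
    """P(g(X) >= r)."""
    return sum(p for x, p in prob.items() if g(x) >= r)

mean_g = expectation(g)
for r in (1, 2, 3, 4):
    lhs = tail_probability(r)
    rhs = mean_g / r
    print(f"r={r}: P(g(X) >= r) = {lhs} <= E[g(X)]/r = {rhs}")
```

With $g(x) = (x - E[X])^2$ and $r = t^2$, the same bound gives the familiar form $P(|X - E[X]| \geq t) \leq \operatorname{Var} X / t^2$.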

Moments and Central Moments

For a random variable $X$ and any $k \in \mathbb{N}$:
its $k$th (noncentral) moment is $E[X^k]$;
its $k$th central moment is $E[(X - E[X])^k]$, assuming that $E[X]$ exists in $\mathbb{R}$;
its variance, $\operatorname{Var} X$, is its second central moment.

Odd Central Moments of the Gaussian Distribution

Let $m_n := \int_{-\infty}^{+\infty} k x^n e^{-\frac{1}{2} x^2}\, dx$ denote the $n$th central moment of the standard normal distribution. When $n$ is odd, one has $(-x)^n = -x^n$, so
$$m_n = \lim_{a \to \infty} \int_{-a}^{0} k x^n e^{-\frac{1}{2} x^2}\, dx + \lim_{a \to \infty} \int_{0}^{+a} k x^n e^{-\frac{1}{2} x^2}\, dx$$
$$= -\lim_{a \to \infty} \int_{0}^{+a} k x^n e^{-\frac{1}{2} x^2}\, dx + \lim_{a \to \infty} \int_{0}^{+a} k x^n e^{-\frac{1}{2} x^2}\, dx = 0$$

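Even Central Moments of the Gaussian Distribution

Now suppose $n = 2r$, where $r \in \mathbb{N}$. Because $\frac{d}{dx} e^{-\frac{1}{2} x^2} = -x e^{-\frac{1}{2} x^2}$, integrating by parts gives
$$\int_{-a}^{+a} k x^n e^{-\frac{1}{2} x^2}\, dx = -\int_{-a}^{+a} k x^{n-1} \left(\frac{d}{dx} e^{-\frac{1}{2} x^2}\right) dx$$
$$= -\left[k x^{n-1} e^{-\frac{1}{2} x^2}\right]_{-a}^{+a} + \int_{-a}^{+a} k(n - 1) x^{n-2} e^{-\frac{1}{2} x^2}\, dx$$
$$= -k\left[a^{n-1} - (-a)^{n-1}\right] e^{-\frac{1}{2} a^2} + \int_{-a}^{+a} k(n - 1) x^{n-2} e^{-\frac{1}{2} x^2}\, dx$$
Taking the limit as $a \to \infty$, one obtains $m_n = (n - 1)\, m_{n-2}$.

Note that $m_0 = 1$, so when $n$ is an even integer $2r$, one has
$$m_{2r} = (2r - 1)(2r - 3) \cdots 5 \cdot 3 \cdot 1 = \frac{2r(2r - 1)(2r - 2)(2r - 3) \cdots 3 \cdot 2 \cdot 1}{2r(2r - 2)(2r - 4) \cdots 4 \cdot 2} = \frac{(2r)!}{2^r\, r!}$$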
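The recursion $m_n = (n - 1)\, m_{n-2}$ and the closed form $(2r)!/(2^r r!)$ can be compared directly. The Python sketch below also estimates one moment with a midpoint Riemann sum over $[-12, 12]$; the truncation interval, step count and function names are assumptions made for the illustration.

```python
import math

def moment_by_recursion(n):
    """n-th moment of the standard normal via m_n = (n - 1) m_{n-2}, m_0 = 1."""
    if n % 2 == 1:
        return 0              # odd moments vanish by symmetry
    m = 1
    for j in range(2, n + 1, 2):
        m *= j - 1
    return m

def moment_closed_form(r):
    """m_{2r} = (2r)! / (2^r r!)."""
    return math.factorial(2 * r) // (2 ** r * math.factorial(r))

def moment_numeric(n, a=12.0, steps=24_000):
    """Midpoint Riemann sum of k x^n exp(-x^2 / 2) over [-a, a]."""
    k = 1.0 / math.sqrt(2.0 * math.pi)
    h = 2.0 * a / steps
    total = 0.0
    for i in range(steps):
        x = -a + (i + 0.5) * h
        total += k * x ** n * math.exp(-0.5 * x * x)
    return h * total

for r in range(5):
    print(2 * r, moment_by_recursion(2 * r), moment_closed_form(r))
print("numeric m_4 ~", round(moment_numeric(4), 6))
```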

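Multiple Random Variables

Let $S \ni s \mapsto \mathbf{X}(s) = (X_n(s))_{n=1}^{N}$ be an $N$-dimensional vector of random variables defined on the probability space $(S, \Sigma, P)$.

Its joint distribution function is the mapping defined by
$$\mathbb{R}^N \ni \mathbf{x} \mapsto F_{\mathbf{X}}(\mathbf{x}) := P(\{s \in S \mid \mathbf{X}(s) \leq \mathbf{x}\})$$
where the inequality holds component by component.

The random vector $\mathbf{X}$ is absolutely continuous just in case there exists a density function $f_{\mathbf{X}} : \mathbb{R}^N \to \mathbb{R}_+$ such that
$$F_{\mathbf{X}}(\mathbf{x}) = \int_{\mathbf{u} \leq \mathbf{x}} f_{\mathbf{X}}(\mathbf{u})\, d\mathbf{u} \quad \text{for all } \mathbf{x} \in \mathbb{R}^N$$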

Independent Random Variables

Let $\mathbf{X}$ be an $N$-dimensional vector valued random variable. If $\mathbf{X}$ is absolutely continuous, the marginal density $\mathbb{R} \ni x \mapsto f_{X_n}(x)$ of its $n$th component $X_n$ is defined as the $(N - 1)$-dimensional iterated integral
$$f_{X_n}(x) = \int \cdots \int f_{\mathbf{X}}(x_1, \ldots, x_{n-1}, x, x_{n+1}, \ldots, x_N)\, dx_1 \cdots dx_{n-1}\, dx_{n+1} \cdots dx_N$$
in which every variable except $x_n$ gets integrated out.

The $N$ components of $\mathbf{X}$ are independent just in case $f_{\mathbf{X}} = \prod_{n=1}^{N} f_{X_n}$.

The infinite sequence $(X_n)_{n=1}^{\infty}$ of random variables is independent just in case every finite subsequence $(X_n)_{n \in K}$ ($K$ finite) is independent.

Expectations

Let $\mathbf{X}$ be an $N$-dimensional vector valued random variable, and $g : \mathbb{R}^N \to \mathbb{R}$ a measurable function. The expectation of $g(\mathbf{X})$ is defined as the $N$-dimensional integral
$$E[g(\mathbf{X})] := \int_{\mathbb{R}^N} g(\mathbf{u})\, f_{\mathbf{X}}(\mathbf{u})\, d\mathbf{u}$$

Theorem. If the collection $(X_n)_{n=1}^{N}$ of random variables is independent, then
$$E\left[\prod_{n=1}^{N} X_n\right] = \prod_{n=1}^{N} E(X_n)$$

Exercise. Prove that if the pair $(X_1, X_2)$ of r.v.s is independent, then its covariance satisfies
$$\operatorname{Cov}(X_1, X_2) := E[(X_1 - E[X_1])(X_2 - E[X_2])] = 0$$

Marginal and Conditional Density

Fix the pair $(X_1, X_2)$ of random variables. The marginal density of $X_1$ is $f_{X_1}(x_1) = \int_{-\infty}^{+\infty} f_{(X_1, X_2)}(x_1, x_2)\, dx_2$.

At points $x_1$ where $f_{X_1}(x_1) > 0$, the conditional density of $X_2$ given that $X_1 = x_1$ is
$$f_{X_2 \mid X_1}(x_2 \mid x_1) = \frac{f_{(X_1, X_2)}(x_1, x_2)}{f_{X_1}(x_1)}$$

Theorem. If the pair $(X_1, X_2)$ is independent and $f_{X_1}(x_1) > 0$, then $f_{X_2 \mid X_1}(x_2 \mid x_1) = f_{X_2}(x_2)$.

Conditional Expectations

Fix the pair $(X_1, X_2)$ of random variables. The conditional expectation of $g(X_2)$ given that $X_1 = x_1$ is
$$E[g(X_2) \mid X_1 = x_1] = \int_{-\infty}^{+\infty} g(x_2)\, f_{X_2 \mid X_1}(x_2 \mid x_1)\, dx_2$$
Given any measurable function $(x_1, x_2) \mapsto g(x_1, x_2)$, the law of iterated expectations states that
$$E_{f_{(X_1, X_2)}}[g(X_1, X_2)] = E_{f_{X_1}}\left[E_{f_{X_2 \mid X_1}}[g(X_1, X_2)]\right]$$

Proof.
$$E_{f_{(X_1, X_2)}}[g] = \int_{\mathbb{R}^2} g(x_1, x_2)\, f_{(X_1, X_2)}(x_1, x_2)\, dx_1\, dx_2 = \int_{\mathbb{R}} \left[\int_{\mathbb{R}} g(x_1, x_2)\, f_{X_2 \mid X_1}(x_2 \mid x_1)\, dx_2\right] f_{X_1}(x_1)\, dx_1 = E_{f_{X_1}}\left[E_{f_{X_2 \mid X_1}}[g(X_1, X_2)]\right]$$
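The same identity holds when densities are replaced by probability mass functions and integrals by sums. The Python sketch below checks it exactly for a small discrete joint distribution; the joint probabilities, the function $g$, and all names are hypothetical values chosen only to illustrate the discrete analogue.

```python
from fractions import Fraction

# Hypothetical joint probability mass function of (X1, X2)
joint = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

def g(x1, x2):
    return x1 * x2 + x2

# Direct expectation under the joint distribution
direct = sum(p * g(x1, x2) for (x1, x2), p in joint.items())

# Marginal distribution of X1
marginal = {}
for (x1, _x2), p in joint.items():
    marginal[x1] = marginal.get(x1, Fraction(0)) + p

def conditional_expectation(x1):
    """E[g(X1, X2) | X1 = x1], using the conditional pmf joint / marginal."""
    return sum((p / marginal[x1]) * g(a, x2)
               for (a, x2), p in joint.items() if a == x1)

# Outer expectation over X1 of the inner conditional expectation
iterated = sum(marginal[x1] * conditional_expectation(x1) for x1 in marginal)
print(direct, iterated)
```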


Convergence of Random Variables

The sequence $(X_n)_{n=1}^{\infty}$ of random variables:
converges in probability to $X$ (written as $X_n \xrightarrow{p} X$) just in case, for all $\epsilon > 0$, one has $\lim_{n \to \infty} P(|X_n - X| < \epsilon) = 1$;
converges in distribution to $X$ (written as $X_n \xrightarrow{d} X$) just in case, for all $x$ at which $F_X$ is continuous, $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$.

Definition of Weak Convergence

Definition. Let $(X, \Sigma, P)$ be any probability space. Then a continuity set of $(X, \Sigma, P)$ is any set $B \in \Sigma$ whose boundary $\partial B$ satisfies $P(\partial B) = 0$.

Definition. Let $(X, d)$ be a metric space with its Borel σ-algebra $\Sigma$. A sequence $(P_n)_{n \in \mathbb{N}}$ of probability measures on the measurable space $(X, \Sigma)$ converges weakly to the probability measure $P$, written $P_n \Rightarrow P$, just in case $P_n(B) \to P(B)$ as $n \to \infty$ for any continuity set $B$ of $(X, \Sigma, P)$.

Portmanteau Theorem

Theorem. Let $P$ and $(P_n)_{n \in \mathbb{N}}$ be probability measures on the measurable space $(X, \Sigma)$. Then $P_n \Rightarrow P$ if and only if any one (and hence all) of the following equivalent conditions holds:
1. for all bounded continuous functions $f : X \to \mathbb{R}$, one has $\int_X f(x)\, P_n(dx) \to \int_X f(x)\, P(dx)$;
2. $\limsup_{n \to \infty} P_n(C) \leq P(C)$ for every closed subset $C \subseteq X$;
3. $\liminf_{n \to \infty} P_n(U) \geq P(U)$ for every open set $U \subseteq X$.

Convergence of Distribution Functions

Theorem. Let $F$ and $(F_n)_{n \in \mathbb{N}}$ be cumulative distribution functions on $\mathbb{R}$ with associated probability measures $P$ and $(P_n)_{n \in \mathbb{N}}$ on the Lebesgue real line that satisfy
$$F(x) = P((-\infty, x]) \quad \text{and} \quad F_n(x) = P_n((-\infty, x]) \quad (n \in \mathbb{N})$$
on the measurable space $(X, \Sigma)$. Then $P_n \Rightarrow P$ if and only if $F_n(x) \to F(x)$ for all $x$ at which $F$ is continuous.

The Weak Law of Large Numbers

The sequence $(X_n)_{n=1}^{\infty}$ of random variables is i.i.d., i.e., independently and identically distributed, just in case
1. it is independent, and
2. for every Borel set $D \subseteq \mathbb{R}$ and all $n, n' \in \mathbb{N}$, one has $P(X_n \in D) = P(X_{n'} \in D)$.

The weak law of large numbers: Let $(X_n)_{n=1}^{\infty}$ be i.i.d. with $E(X_n) = \mu$. Define the sequence
$$(\bar{X}_n)_{n=1}^{\infty} := \left(\frac{1}{n} \sum_{k=1}^{n} X_k\right)_{n=1}^{\infty}$$
of sample means. Then $\bar{X}_n \xrightarrow{p} \mu$.
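A small simulation makes the statement concrete. The Python sketch below estimates $P(|\bar{X}_n - \mu| < \epsilon)$ for i.i.d. uniform draws on $[0, 1]$, so that $\mu = 1/2$; the distribution, the tolerance $\epsilon = 0.05$, the number of replications and the seed are all arbitrary choices for the illustration, and a simulation of this kind suggests the result without proving it.

```python
import random

def sample_mean(n, rng):
    """Mean of n independent uniform draws on [0, 1]."""
    return sum(rng.random() for _ in range(n)) / n

def prob_within(n, eps, reps, rng):
    """Monte Carlo estimate of P(|sample mean - 0.5| < eps)."""
    hits = sum(abs(sample_mean(n, rng) - 0.5) < eps for _ in range(reps))
    return hits / reps

rng = random.Random(0)  # fixed seed so that runs are reproducible
estimates = {n: prob_within(n, 0.05, 500, rng) for n in (10, 100, 1000)}
for n, estimate in estimates.items():
    print(f"n={n}: estimated P(|mean - 0.5| < 0.05) = {estimate:.3f}")
```

The estimated probabilities rise towards 1 as $n$ grows, as convergence in probability requires.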

A Frequentist Interpretation of Probability

Prove the following: Let $\gamma = P(X \in \Omega) \in (0, 1)$. Consider the following experiment: $n$ realizations of $X$ are taken independently. Let $G_n$ be the relative frequency with which a realization in $\Omega$ is obtained in the experiment. Then $G_n \xrightarrow{p} \gamma$.

The Central Limit Theorem

The central limit theorem: Let $(X_k)_{k=1}^{\infty}$ be an infinite sequence of i.i.d. random variables with common mean $E(X_k) = \mu$ and variance $V(X_k) = \sigma^2$. For each $n \in \mathbb{N}$, define $\bar{X}_n := \frac{1}{n} \sum_{k=1}^{n} X_k$ as the sample average of $n$ observations. Then:

1. $E(\bar{X}_n) = \mu$ and $V(\bar{X}_n) = \frac{1}{n^2} \sum_{k=1}^{n} V(X_k) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$;
2. For each $n \in \mathbb{N}$, the random variable $Z_n := \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}$ is standardized in the sense that $E[Z_n] = 0$ and $E[Z_n^2] = 1$.
3. One has $Z_n \xrightarrow{d} Y$ where $Y$ has the standard normal cdf given by
$$F_Y(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}u^2}\,du \quad \text{for all } x \in \mathbb{R}.$$
In particular, $E(Y) = 0$ and $E(Y^2) = 1$.
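A small simulation can make the third claim concrete. This is a sketch only, assuming Uniform(0, 1) observations (mean 1/2, variance 1/12); the function names are hypothetical. It compares the empirical distribution function of the standardized mean with the standard normal cdf, which is expressed through the error function.

```python
import math
import random


def standard_normal_cdf(x):
    """F_Y(x) for the standard normal, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def standardized_mean(n, rng):
    """Z_n = sqrt(n) * (sample mean - mu) / sigma for n Uniform(0, 1) draws."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    sample_mean = sum(rng.random() for _ in range(n)) / n
    return math.sqrt(n) * (sample_mean - mu) / sigma


def empirical_cdf(x, n, reps, rng):
    """Share of reps independent copies of Z_n that are <= x."""
    return sum(1 for _ in range(reps) if standardized_mean(n, rng) <= x) / reps


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are reproducible
    for x in (-1.0, 0.0, 1.0):
        print(x, empirical_cdf(x, n=50, reps=5000, rng=rng), standard_normal_cdf(x))
```

The two printed columns should be close for each x, up to simulation noise of order one over the square root of the number of replications.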

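The Fundamental Theorems

Let $(X_n)_{n=1}^{\infty}$ be i.i.d., with $E[X_n] = \mu$ and $V(X_n) = \sigma^2$. Then:

by the law of large numbers, $\bar{X}_n \xrightarrow{p} \mu$; so $\bar{X}_n \xrightarrow{d} \mu$;

but by the central limit theorem, the standardized mean $\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$ converges in distribution to a limit whose cdf is $x \mapsto \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}u^2}\,du$.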

Concepts of Convergence, I

Definition. Say that the sequence $X_n$ of random variables converges almost surely or with probability 1 or strongly towards $X$ just in case, for every $\epsilon > 0$,
$$P\left( \liminf_{n \to \infty} \{\omega \in \Omega \mid |X_n(\omega) - X(\omega)| < \epsilon\} \right) = 1,$$
where the lim inf of the events is the event that $|X_n(\omega) - X(\omega)| < \epsilon$ holds for all sufficiently large $n$.

Hence, the values of $X_n$ approach those of $X$, in the sense that the event that $X_n(\omega)$ does not converge to $X(\omega)$ has probability 0.

Almost sure convergence is often denoted by $X_n \to X$, $P$-a.s., or by $X_n \xrightarrow{a.s.} X$ with the letters a.s. over the arrow that indicates convergence.

Concepts of Convergence, II

For generic random elements $X_n$ on a metric space $(S, d)$, almost sure convergence is defined similarly, replacing the absolute value $|X_n(\omega) - X(\omega)|$ by the distance $d(X_n(\omega), X(\omega))$.

Almost sure convergence implies convergence in probability, and a fortiori convergence in distribution. It is the notion of convergence used in the strong law of large numbers.

The Strong Law

Definition. The strong law of large numbers (or SLLN) states that the sample average $\bar{X}_n$ converges almost surely to the expected value $\mu = EX$.

It is this law (rather than the weak LLN) that justifies the intuitive interpretation of the expected value of a random variable as its long-term average when sampling repeatedly.

Differences Between the Weak and Strong Laws

The weak law states that for a specified large $n$, the average $\bar{X}_n$ is likely to be near $\mu$. This leaves open the possibility that $|\bar{X}_n - \mu| \geq \epsilon$ happens an infinite number of times, although at infrequent intervals.

The strong law shows that this almost surely will not occur. In particular, it implies that with probability 1, for any $\epsilon > 0$ there exists $n_\epsilon$ such that $|\bar{X}_n - \mu| < \epsilon$ holds for all $n > n_\epsilon$.

Monte Carlo Integration

Because of the strong law of large numbers, here is one way to approximate numerically the integral $\int_K f(x)\,\mu(dx)$ of a complicated function of $l$ variables, where $K \subset \mathbb{R}^l$ has an $l$-dimensional Lebesgue measure $\mu(K) < \infty$.

1. First, choose a large sample $\langle x^r \rangle_{r=1}^{n}$ of $n$ points that are independent and identically distributed random draws from the set $K$, with common probability measure $\pi$ satisfying $\pi(B) = \mu(B)/\mu(K)$ for all Borel sets $B \subseteq K$.
2. Second, calculate the sample average function value $M_n(\langle x^r \rangle_{r=1}^{n}) := \frac{1}{n} \sum_{r=1}^{n} f(x^r)$.
3. Third, observe that, by the strong law of large numbers, the sample average $M_n(\langle x^r \rangle_{r=1}^{n})$ converges almost surely as $n \to \infty$ to the theoretical mean
$$E_\pi[f(x)] = \int_K f(x)\,\pi(dx) = \frac{1}{\mu(K)} \int_K f(x)\,\mu(dx).$$

Multiplying the sample average by $\mu(K)$ therefore approximates the integral itself.
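The three steps can be sketched in a few lines. The code below is an illustration, not a definitive implementation: it assumes that K is a rectangular box, so that uniform draws from K and the measure of K are both easy to compute, and the name `monte_carlo_integral` is hypothetical.

```python
import random


def monte_carlo_integral(f, box, n, rng):
    """Approximate the integral of f over K = [a_1, b_1] x ... x [a_l, b_l].

    Assumes K is a box, given as a list of (a, b) pairs, one per coordinate.
    """
    volume = 1.0  # the Lebesgue measure mu(K) of the box
    for a, b in box:
        volume *= (b - a)
    total = 0.0
    for _ in range(n):
        # Step 1: one draw from the uniform distribution pi on K.
        x = [rng.uniform(a, b) for a, b in box]
        total += f(x)
    # Step 2: the sample average; then rescale by mu(K) to undo the 1/mu(K).
    return volume * total / n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are reproducible
    # Integral of x * y^2 over [0, 1] x [0, 2]; the exact value is 4/3.
    estimate = monte_carlo_integral(lambda x: x[0] * x[1] ** 2,
                                    [(0.0, 1.0), (0.0, 2.0)], 200_000, rng)
    print(estimate, 4.0 / 3.0)
```

For a non-rectangular K, one option is to sample from an enclosing box and set the integrand to zero outside K, at the cost of a higher variance.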

A Process with a Continuum of IID Random Variables

Let $\mathcal{B}$ denote the Borel $\sigma$-field on $\mathbb{R}$, and let $I$ denote the unit interval $[0, 1]$.

Definition. A process with a continuum of iid random variables on the Lebesgue unit interval $(I, \mathcal{L}, \lambda)$ is:

a sample probability space $(\Omega, \mathcal{F}, P)$;

a mapping $I \times \Omega \ni (i, \omega) \mapsto f(i, \omega) \in \mathbb{R}$ satisfying
$$P\left( \bigcap_{n \in \mathbb{N}} \{\omega \in \Omega \mid f(i_n, \omega) \in B_n\} \right) = \prod_{n \in \mathbb{N}} P(\{\omega \in \Omega \mid f(i_n, \omega) \in B_n\})$$
for every countable collection $(i_n, B_n)_{n \in \mathbb{N}}$ of pairs $(i, B) \in I \times \mathcal{B}$.

The Monte Carlo Integral: Rescuing Macroeconomics

Definition. Given the process $I \times \Omega \ni (i, \omega) \mapsto f(i, \omega) \in \mathbb{R}$ with a continuum of iid random variables, define the Monte Carlo integral, which is the random variable
$$\Omega \ni \omega \mapsto \mathrm{MC}\!\int_I f(i, \omega)\,\lambda(di) \in \mathbb{R},$$
as the almost sure limit as $n \to \infty$ of the average $\frac{1}{n} \sum_{k=1}^{n} f(i_k, \omega)$ when the $n$ points $\langle i_k \rangle_{k=1}^{n}$ are independent draws from the Lebesgue unit interval $(I, \mathcal{L}, \lambda)$.

Then, even though the Lebesgue integral $\int_I f(i, \omega)\,\lambda(di)$ is almost surely undefined, the strong law of large numbers implies that the Monte Carlo integral $\mathrm{MC}\!\int_I f(i, \omega)\,\lambda(di)$ is well defined as a degenerate random variable $\Omega \ni \omega \mapsto \delta_m(\omega)$ that attaches probability one to the common theoretical mean $m := \int_\Omega f(i, \omega)\,P(d\omega)$.


Moment-Generating Functions

Definition. The $n$th moment about the origin is defined as $m_n := E[X^n]$.

This may not exist for large $n$ unless the random variable $X$ is essentially bounded, meaning that there exists an upper bound $\bar{x}$ such that $P(\{\omega \in \Omega \mid |X(\omega)| \leq \bar{x}\}) = 1$.

Definition. The moment-generating function of a random variable $X$ is
$$\mathbb{R} \ni t \mapsto M_X(t) := E[e^{tX}]$$
wherever this expectation exists.

At $t = 0$, of course, $M_X(0) = 1$. For $t \neq 0$, however, unless $X$ is essentially bounded (above when $t > 0$, below when $t < 0$), the expectation may not exist because $e^{tX}$ can be unbounded.

The Gaussian Case

For a normal or Gaussian distribution $N(\mu, \sigma^2)$, even though the random variable is unbounded, the tails of the distribution are thin enough to ensure that the moment-generating function exists and is given by
$$M(t; \mu, \sigma^2) = E[e^{tX}] = \int_{-\infty}^{+\infty} e^{tx}\,\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x-\mu)^2/2\sigma^2}\,dx = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{tx - (x-\mu)^2/2\sigma^2}\,dx.$$

Now make the substitution $y = (x - \mu - \sigma^2 t)/\sigma$, implying that $dx = \sigma\,dy$ and that
$$tx - \frac{(x-\mu)^2}{2\sigma^2} = -\tfrac{1}{2}y^2 + \mu t + \tfrac{1}{2}\sigma^2 t^2.$$

This transforms the integral to
$$M(t; \mu, \sigma^2) = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}y^2}\,e^{\mu t + \frac{1}{2}\sigma^2 t^2}\,dy = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$$
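The closed form can be checked numerically. The sketch below assumes a plain midpoint rule on a truncated interval, centred where the integrand peaks (at x = mu + sigma^2 t, as the substitution suggests); the truncation width, step count, and function names are all illustrative choices rather than part of the notes.

```python
import math


def gaussian_mgf_numeric(t, mu, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule approximation of E[exp(tX)] for X ~ N(mu, sigma^2)."""
    centre = mu + sigma ** 2 * t          # where exp(tx) * density peaks
    lo = centre - half_width * sigma      # truncate the integral at +/- 12 sigma
    dx = 2.0 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += math.exp(t * x - (x - mu) ** 2 / (2.0 * sigma ** 2))
    return total * dx / math.sqrt(2.0 * math.pi * sigma ** 2)


def gaussian_mgf_closed_form(t, mu, sigma):
    """exp(mu t + sigma^2 t^2 / 2), the formula derived in the text."""
    return math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)


if __name__ == "__main__":
    for t in (-1.0, 0.0, 0.5, 2.0):
        print(t, gaussian_mgf_numeric(t, 1.0, 1.5), gaussian_mgf_closed_form(t, 1.0, 1.5))
```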

From Moment-Generating Functions to Moments

Note that
$$e^{tX} = 1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + \cdots + \frac{t^n X^n}{n!} + \cdots = \sum_{n=0}^{\infty} \frac{t^n X^n}{n!}.$$

Taking the expectation term by term and then using the definition of the moments of the distribution, one obtains
$$M_X(t) = E[e^{tX}] = 1 + tE[X] + \frac{t^2}{2!}E[X^2] + \frac{t^3}{3!}E[X^3] + \cdots + \frac{t^n}{n!}E[X^n] + \cdots$$
$$= 1 + t m_1 + \frac{t^2}{2!} m_2 + \frac{t^3}{3!} m_3 + \cdots + \frac{t^k}{k!} m_k + \cdots = \sum_{k=0}^{\infty} \frac{t^k}{k!} m_k.$$

Derivatives of the Moment-Generating Function

Suppose we find the $n$th derivative with respect to $t$ of $M_X(t) = \sum_{k=0}^{\infty} \frac{t^k}{k!} m_k$.

One can easily prove by induction on $n$ that, for $k \geq n$,
$$\frac{d^n}{dt^n} t^k = k(k-1)(k-2) \cdots (k-n+1)\,t^{k-n} = \frac{k!}{(k-n)!}\,t^{k-n},$$
whereas the derivative is zero for $k < n$.

So differentiating $t \mapsto M_X(t)$ term by term $n$ times, one obtains
$$M_X^{(n)}(t) = E\left[ \frac{d^n}{dt^n} e^{tX} \right] = \sum_{k=n}^{\infty} \frac{k!}{(k-n)!}\,t^{k-n}\,\frac{m_k}{k!} = \sum_{k=n}^{\infty} \frac{t^{k-n}}{(k-n)!}\,m_k.$$

Putting $t = 0$ leaves only the term with $k = n$, which yields the equality $M_X^{(n)}(0) = \frac{1}{0!}\,m_n = m_n$.

In this sense, the moment-generating function does exponentially generate the moments of the probability distribution.
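As a quick numerical check of the relation between derivatives at zero and moments, one can differentiate a known moment-generating function by finite differences. The sketch below assumes the Gaussian case, whose first two moments are mu and sigma^2 + mu^2; the central-difference step size and the function names are illustrative choices.

```python
import math


def gaussian_mgf(t, mu, sigma):
    """Moment-generating function of N(mu, sigma^2)."""
    return math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)


def first_two_moments(mgf, h=1e-3):
    """Approximate M'(0) and M''(0) by central finite differences."""
    m1 = (mgf(h) - mgf(-h)) / (2.0 * h)
    m2 = (mgf(h) - 2.0 * mgf(0.0) + mgf(-h)) / h ** 2
    return m1, m2


if __name__ == "__main__":
    mu, sigma = 1.0, 2.0
    m1, m2 = first_two_moments(lambda t: gaussian_mgf(t, mu, sigma))
    # Compare with the exact moments mu and sigma^2 + mu^2.
    print(m1, mu)
    print(m2, sigma ** 2 + mu ** 2)
```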

Definition of Characteristic Functions

The moment-generating function may not exist because the expectation need not converge absolutely. By contrast, the expectation of the bounded function $e^{itX}$ always lies in the unit disc of the complex plane $\mathbb{C}$. So the characteristic function that we are about to introduce always exists, which makes it more useful in many contexts.

Definition. For a scalar random variable $X$ with CDF $x \mapsto F_X(x)$, the characteristic function is defined as the (complex) expected value of $e^{itX} = \cos tX + i \sin tX$, where $i = \sqrt{-1}$ is the imaginary unit, and $t \in \mathbb{R}$ is the argument of the characteristic function:
$$\mathbb{R} \ni t \mapsto \varphi_X(t) = E e^{itX} = \int_{-\infty}^{+\infty} e^{itx}\,dF_X(x) \in \mathbb{C}.$$

Gaussian Case

Consider a normally distributed random variable $X$ with mean $\mu$ and variance $\sigma^2$. Its characteristic function can be found by replacing $t$ by $it$ in the expression for the moment-generating function
$$M(t; \mu, \sigma^2) = E[e^{tX}] = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$$

Recalling that $(it)^2 = -t^2$, the result is
$$\varphi(t; \mu, \sigma^2) = \int_{-\infty}^{+\infty} e^{itx}\,\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2}(x-\mu)^2/\sigma^2}\,dx = e^{i\mu t - \frac{1}{2}\sigma^2 t^2}.$$

In the standard normal or $N(0, 1)$ case, when $\mu = 0$ and $\sigma^2 = 1$, one has $\varphi(t; 0, 1) = e^{-\frac{1}{2}t^2}$.
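Because the characteristic function is an expected value, it can be estimated by a sample average of complex exponentials. The following sketch, with hypothetical function names and an arbitrary choice of parameters, compares such an estimate from simulated Gaussian draws with the closed form above.

```python
import cmath
import random


def gaussian_cf_closed_form(t, mu, sigma):
    """exp(i mu t - sigma^2 t^2 / 2), the formula stated in the text."""
    return cmath.exp(1j * mu * t - 0.5 * sigma ** 2 * t ** 2)


def empirical_cf(t, draws):
    """Sample average of exp(i t x) over the observed draws."""
    return sum(cmath.exp(1j * t * x) for x in draws) / len(draws)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are reproducible
    mu, sigma = 1.0, 2.0
    draws = [rng.gauss(mu, sigma) for _ in range(100_000)]
    for t in (0.0, 0.5, 1.0):
        print(t, empirical_cf(t, draws), gaussian_cf_closed_form(t, mu, sigma))
```

Every term in the average has modulus 1, so the estimate always lies in the unit disc, whatever the distribution of the draws.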

Use of Characteristic Functions

Characteristic functions can be used to give superficially simple proofs of both the LLN and the classical central limit theorems. The following merely sketches the argument. For much more careful detail, see Richard M. Dudley's major text, Real Analysis and Probability.

A key tool is Lévy's continuity theorem. For a sequence of random variables, this connects convergence in distribution to pointwise convergence of their characteristic functions.

Statement of Lévy's Continuity Theorem

Theorem. Suppose $(X_n)_{n=1}^{\infty}$ is a sequence of random variables, not necessarily sharing a common probability space, with the corresponding sequence
$$\mathbb{R} \ni t \mapsto \varphi_n(t) = E e^{itX_n} \in \mathbb{C} \quad (n \in \mathbb{N})$$
of complex-valued characteristic functions.

If $X_n$ converges in distribution to the random variable $X$, then $t \mapsto \varphi_n(t)$ converges pointwise to $t \mapsto \varphi(t) = E e^{itX}$, the characteristic function of $X$.

Conversely, if $t \mapsto \varphi_n(t)$ converges pointwise to a function $t \mapsto \varphi(t)$ which is continuous at $t = 0$, then $t \mapsto \varphi(t)$ is the characteristic function $E e^{itX}$ of a random variable $X$, and $X_n$ converges in distribution to $X$.

Linear Approximation to the Characteristic Function

Suppose that the random variable $X$ has a mean $\mu = \mu_X := EX = \int x\,dF(x)$. One can then differentiate within the expectation to obtain
$$\frac{d}{dt} E e^{itX} = E\left[ \frac{d}{dt} e^{itX} \right] = E[iXe^{itX}].$$

Consider the linear approximation
$$E e^{ihX} = 1 + i[\mu + \xi(h)]h \quad \text{where} \quad \xi(h) := \frac{E e^{ihX} - 1 - ih\mu}{ih}.$$

The limit of $\xi(h)$ as $h \to 0$ has the indeterminate form $0/0$, so by l'Hôpital's rule one has
$$\lim_{h \to 0} \xi(h) = \lim_{h \to 0} \frac{E[iXe^{ihX}] - i\mu}{i} = \frac{E[iX] - i\mu}{i} = 0.$$

Quadratic Approximation to the Characteristic Function

Next, suppose that the random variable $X$ has not only a mean $\mu = \mu_X := \int x\,dF(x)$, but also a variance $\sigma^2 = \sigma_X^2 := \int (x - \mu)^2\,dF(x)$. One can then differentiate twice within the expectation to obtain
$$\frac{d^2}{dt^2} E e^{itX} = E\left[ \frac{d^2}{dt^2} e^{itX} \right] = E[(iX)^2 e^{itX}] = -E[X^2 e^{itX}].$$

Consider the quadratic approximation
$$E e^{ihX} = 1 + i\mu h - \tfrac{1}{2}[\sigma^2 + \mu^2 + \eta(h)]h^2 \quad \text{where} \quad \eta(h) := -\frac{2}{h^2}\left[ E e^{ihX} - 1 - ih\mu \right] - (\sigma^2 + \mu^2).$$

Applying l'Hôpital's rule twice, because each limit in turn has the indeterminate form $0/0$, one has
$$\lim_{h \to 0} \frac{1}{h^2}\left[ E e^{ihX} - 1 - ih\mu \right] = \lim_{h \to 0} \frac{1}{2h}\left( E[iXe^{ihX}] - i\mu \right) = \lim_{h \to 0} \tfrac{1}{2} E[(iX)^2 e^{ihX}] = -\tfrac{1}{2} EX^2 = -\tfrac{1}{2}(\sigma^2 + \mu^2),$$
implying that $\eta(h) \to 0$ as $h \to 0$.

A Useful Lemma

Lemma. Suppose that $\mathbb{R} \ni h \mapsto \zeta(h) \in \mathbb{C}$ satisfies $\zeta(h) \to 0$ as $h \to 0$. Then for all $z \in \mathbb{C}$, one has $\{1 + \frac{1}{n}[z + \zeta(1/n)]\}^n \to e^z$ as $n \to \infty$.

For a sketch proof, first one can show that
$$\lim_{n \to \infty} \left\{ 1 + \tfrac{1}{n}[z + \zeta(1/n)] \right\}^n = \lim_{n \to \infty} \left( 1 + \tfrac{1}{n} z \right)^n.$$

Second, in case $z \in \mathbb{R}$, putting $h = 1/n$ and taking logs gives
$$\ln\left[ \lim_{n \to \infty} \left( 1 + \tfrac{1}{n} z \right)^n \right] = \ln\left[ \lim_{h \to 0} (1 + hz)^{1/h} \right] = \lim_{h \to 0} \frac{1}{h}[\ln(1 + hz) - \ln 1] = \left. \frac{d}{dh} \ln(1 + hz) \right|_{h=0} = z,$$
implying that $(1 + \frac{1}{n} z)^n \to e^z$ as $n \to \infty$.

Dealing with the case when $z$ is complex is more tricky.
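The lemma is easy to probe numerically, including for complex z. The sketch below is an illustration only: the perturbation zeta(h) = (1 + i) sqrt(h) is an arbitrary choice that vanishes as h tends to 0, and the function name is hypothetical.

```python
import cmath
import math


def perturbed_power(z, n, zeta):
    """Evaluate {1 + (1/n) [z + zeta(1/n)]}^n for a complex number z."""
    return (1 + (z + zeta(1.0 / n)) / n) ** n


if __name__ == "__main__":
    z = 1 + 2j
    # An arbitrary perturbation with zeta(h) -> 0 as h -> 0.
    zeta = lambda h: (1 + 1j) * math.sqrt(h)
    for n in (10, 1_000, 1_000_000):
        value = perturbed_power(z, n, zeta)
        print(n, value, abs(value - cmath.exp(z)))
```

The printed error should fall towards 0 as n grows, although slowly here, because the chosen perturbation only shrinks like the square root of 1/n.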

Sketch Proof of the Weak LLN, I

Consider now any infinite sequence $X_1, X_2, \ldots$ of observations of IID random variables drawn from a common CDF $F(x)$ on $\mathbb{R}$, with common characteristic function $t \mapsto \varphi_X(t) = E[e^{itX}]$.

For each $n \in \mathbb{N}$, let $\bar{X}_n := \frac{1}{n} \sum_{j=1}^{n} X_j$ denote the random variable whose value is the sample mean of the first $n$ observations. This sample mean has its own characteristic function
$$\varphi_{\bar{X}_n}(t) := E[e^{it\bar{X}_n}] = E\left[ \prod_{j=1}^{n} e^{itX_j/n} \right].$$

Then
$$\varphi_{\bar{X}_n}(t) = \prod_{j=1}^{n} E[e^{itX_j/n}] = \left( E[e^{itX/n}] \right)^n$$
because the random variables $X_j$ are respectively independently and identically distributed.

Sketch Proof of the Weak LLN, II

Suppose we take the linear approximation $E e^{ihX} = 1 + i[\mu + \xi(h)]h$, where $\xi(h) \to 0$ as $h \to 0$, and replace $h$ by $t/n$ to obtain
$$E[e^{it\bar{X}_n}] = \{1 + (it/n)[\mu + \xi(t/n)]\}^n.$$

Because $\xi(t/n) \to 0$ as $n \to \infty$ and so $h = t/n \to 0$, one has
$$\lim_{n \to \infty} \left\{ 1 + \tfrac{1}{n}\,it[\mu + \xi(t/n)] \right\}^n = \lim_{n \to \infty} \left( 1 + \tfrac{1}{n}\,it\mu \right)^n = e^{it\mu} = E[e^{itY}]$$
where $E[e^{itY}]$ is the characteristic function of a degenerate random variable $Y$ which is equal to $\mu$ with probability 1.

Using the Lévy theorem, it follows that the distribution of $\bar{X}_n$ converges to this degenerate distribution, implying that $\bar{X}_n$ converges to $\mu$ in probability.
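The convergence of characteristic functions in this sketch proof can be seen in a concrete case. The code below assumes, purely for illustration, Bernoulli(p) observations, whose characteristic function is (1 - p) + p e^{it} and whose mean is p; the function names are hypothetical.

```python
import cmath


def bernoulli_cf(t, p):
    """Characteristic function of a Bernoulli(p) random variable."""
    return (1 - p) + p * cmath.exp(1j * t)


def sample_mean_cf(t, p, n):
    """Characteristic function of the mean of n IID Bernoulli(p) draws.

    Uses the product formula from the text: (E[exp(i t X / n)])^n.
    """
    return bernoulli_cf(t / n, p) ** n


if __name__ == "__main__":
    t, p = 2.0, 0.3
    limit = cmath.exp(1j * t * p)  # characteristic function of the constant p
    for n in (10, 1_000, 100_000):
        value = sample_mean_cf(t, p, n)
        print(n, value, abs(value - limit))
```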

Sketch Proof of the Classical CLT, I

For each $j \in \mathbb{N}$, let $Z_j$ denote the standardized value $(X_j - \mu)/\sigma$ of $X_j$, defined to have the property that $EZ_j = 0$ and $EZ_j^2 = 1$.

Now define $\bar{Z}_n := \frac{1}{\sqrt{n}} \sum_{j=1}^{n} Z_j$. This is called the standardized mean because:

1. linearity implies that $E\bar{Z}_n = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} EZ_j = 0$;
2. independence implies that $E\bar{Z}_n^2 = \frac{1}{n} \sum_{j=1}^{n} EZ_j^2 = 1$.

Putting $\mu = 0$ and $\sigma^2 = 1$ in the quadratic approximation
$$E e^{ihX} = 1 + i\mu h - \tfrac{1}{2}[\sigma^2 + \mu^2 + \eta(h)]h^2$$
implies $E e^{ihZ} = 1 - \tfrac{1}{2}[1 + \eta(h)]h^2$ where $\eta(h) \to 0$ as $h \to 0$.

Replacing $hX$ by $tZ_j/\sqrt{n}$ in this quadratic approximation yields
$$E[e^{itZ_j/\sqrt{n}}] = 1 - \frac{t^2}{2n}[1 + \eta(t/\sqrt{n})].$$

Sketch Proof of the Classical CLT, II

Now independence implies that
$$E[e^{it\bar{Z}_n}] = E\left[ \exp\left( it\,\frac{1}{\sqrt{n}} \sum_{j=1}^{n} Z_j \right) \right] = \prod_{j=1}^{n} E[e^{itZ_j/\sqrt{n}}].$$

Hence, another careful limiting argument shows that
$$E[e^{it\bar{Z}_n}] = \left\{ 1 - \frac{t^2}{2n}[1 + \eta(t/\sqrt{n})] \right\}^n \to e^{-\frac{1}{2}t^2} \quad \text{as } n \to \infty.$$

But we showed that this limit $e^{-\frac{1}{2}t^2}$ is precisely the characteristic function of a standard normal distribution $N(0, 1)$.

Because $t \mapsto e^{-\frac{1}{2}t^2}$ is continuous at $t = 0$, the central limit theorem follows from the Lévy continuity theorem, which confirms that the convergence of characteristic functions implies convergence in distribution.
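The limit in this argument can be checked in a case where every characteristic function is known exactly. The sketch below assumes, for illustration only, that each standardized observation equals +1 or -1 with probability 1/2 (so its mean is 0, its variance is 1, and its characteristic function is cos t); the function name is hypothetical.

```python
import math


def standardized_mean_cf(t, n):
    """Characteristic function of the standardized mean of n IID +/-1 draws.

    By independence it is the n-fold product of cos(t / sqrt(n)).
    """
    return math.cos(t / math.sqrt(n)) ** n


if __name__ == "__main__":
    t = 1.5
    limit = math.exp(-0.5 * t ** 2)  # standard normal characteristic function
    for n in (1, 10, 100, 10_000):
        value = standardized_mean_cf(t, n)
        print(n, value, abs(value - limit))
```

The characteristic function is real here because the assumed distribution is symmetric about zero.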
