Elements of Probability Theory


Chapter 5: Elements of Probability Theory

The purpose of this chapter is to summarize some important concepts and results in probability theory. Of particular interest to us are the limit theorems, which are powerful tools for analyzing the convergence behavior of econometric estimators and test statistics. These properties are the core of the asymptotic analysis in subsequent chapters. For a more complete and thorough treatment of probability theory, see Davidson (1994) and other probability textbooks, such as Ash (1972) and Billingsley (1979). Bierens (1994), Gallant (1997) and White (2001) also provide concise coverage of the topics in this chapter. Many results here are taken freely from the references cited above; we will not refer to them again in the text unless necessary.

5.1 Probability Space and Random Variables

Probability Space

The probability space associated with a random experiment is determined by three components: the outcome space $\Omega$, whose elements $\omega$ are the outcomes of the experiment; a collection of events $\mathcal{F}$, whose elements are subsets of $\Omega$; and a probability measure $\mathrm{IP}$ assigned to the elements of $\mathcal{F}$. Given a subset $A$ of $\Omega$, its complement is defined as $A^c = \{\omega \in \Omega : \omega \notin A\}$. In the probability space $(\Omega, \mathcal{F}, \mathrm{IP})$, $\mathcal{F}$ is a $\sigma$-algebra ($\sigma$-field) in the sense that it satisfies the following requirements.

1. $\Omega \in \mathcal{F}$.
2. If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
3. If $A_1, A_2, \ldots$ are in $\mathcal{F}$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$.

The first and second properties together imply that $\Omega^c = \emptyset$ is also in $\mathcal{F}$. Combining the second and third properties, we have from de Morgan's law that

$$\Big( \bigcup_{n=1}^{\infty} A_n \Big)^c = \bigcap_{n=1}^{\infty} A_n^c \in \mathcal{F}.$$

A $\sigma$-algebra is thus closed under complementation, countable union and countable intersection.

The probability measure $\mathrm{IP} : \mathcal{F} \to [0, 1]$ is a real-valued set function satisfying the following axioms.

1. $\mathrm{IP}(\Omega) = 1$.
2. $\mathrm{IP}(A) \geq 0$ for all $A \in \mathcal{F}$.
3. If $A_1, A_2, \ldots \in \mathcal{F}$ are disjoint, then $\mathrm{IP}(\bigcup_{n=1}^{\infty} A_n) = \sum_{n=1}^{\infty} \mathrm{IP}(A_n)$.

From these axioms we easily deduce that $\mathrm{IP}(\emptyset) = 0$, $\mathrm{IP}(A^c) = 1 - \mathrm{IP}(A)$, $\mathrm{IP}(A) \leq \mathrm{IP}(B)$ if $A \subseteq B$, and $\mathrm{IP}(A \cup B) = \mathrm{IP}(A) + \mathrm{IP}(B) - \mathrm{IP}(A \cap B)$. Moreover, if $\{A_n\}$ is an increasing (decreasing) sequence in $\mathcal{F}$ with the limiting set $A$, then $\lim_{n \to \infty} \mathrm{IP}(A_n) = \mathrm{IP}(A)$.

Let $\mathcal{C}$ be a collection of subsets of $\Omega$. The intersection of all the $\sigma$-algebras that contain $\mathcal{C}$ is the smallest $\sigma$-algebra containing $\mathcal{C}$; see Exercise 5.1. This $\sigma$-algebra is referred to as the $\sigma$-algebra generated by $\mathcal{C}$, denoted $\sigma(\mathcal{C})$. When $\Omega = \mathbb{R}$, the Borel field is the $\sigma$-algebra generated by all open intervals $(a, b)$ in $\mathbb{R}$, usually denoted $\mathcal{B}$. Note that open intervals, closed intervals $[a, b]$, half-open intervals $(a, b]$ and half-lines $(-\infty, b]$ can be obtained from one another by taking complements, unions and/or intersections. For example,

$$(a, b] = \bigcap_{n=1}^{\infty} \Big( a, \, b + \frac{1}{n} \Big), \qquad (a, b) = \bigcup_{n=1}^{\infty} \Big( a, \, b - \frac{1}{n} \Big].$$

Thus, the collection of all closed intervals (half-open intervals, half-lines) generates the same Borel field. This is why open intervals, closed intervals, half-open intervals and half-lines are all known as Borel sets. The Borel field on $\mathbb{R}^d$, denoted $\mathcal{B}^d$, is generated by all open hypercubes:

$$(a_1, b_1) \times (a_2, b_2) \times \cdots \times (a_d, b_d).$$

Equivalently, $\mathcal{B}^d$ can be generated by all closed hypercubes:

$$[a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_d, b_d],$$

or by

$$(-\infty, b_1] \times (-\infty, b_2] \times \cdots \times (-\infty, b_d].$$

The sets that generate the Borel field $\mathcal{B}^d$ are all Borel sets.

Random Variables

A random variable $z$ defined on $(\Omega, \mathcal{F}, \mathrm{IP})$ is a function $z : \Omega \to \mathbb{R}$ such that for every $B$ in the Borel field $\mathcal{B}$, its inverse image is in $\mathcal{F}$, i.e.,

$$z^{-1}(B) = \{\omega : z(\omega) \in B\} \in \mathcal{F}.$$

We also say that $z$ is an $\mathcal{F}/\mathcal{B}$-measurable (or simply $\mathcal{F}$-measurable) function. Non-measurable functions are very exceptional in practice and hence are not of general interest. Given the random outcome $\omega$, the resulting value $z(\omega)$ is known as a realization of $z$. The realization of $z$ varies with $\omega$ and hence is governed by the random mechanism of the underlying experiment.

An $\mathbb{R}^d$-valued random variable (random vector) $z$ defined on $(\Omega, \mathcal{F}, \mathrm{IP})$ is a function $z : \Omega \to \mathbb{R}^d$ such that for every $B \in \mathcal{B}^d$,

$$z^{-1}(B) = \{\omega : z(\omega) \in B\} \in \mathcal{F};$$

that is, $z$ is an $\mathcal{F}/\mathcal{B}^d$-measurable function. Given the random vector $z$, its inverse images $z^{-1}(B)$ form a $\sigma$-algebra, denoted $\sigma(z)$. This $\sigma$-algebra must be in $\mathcal{F}$, and it is the smallest $\sigma$-algebra contained in $\mathcal{F}$ such that $z$ is measurable. It is known as the $\sigma$-algebra generated by $z$ or, more intuitively, the information set associated with $z$.

A function $g : \mathbb{R} \to \mathbb{R}$ is said to be $\mathcal{B}$-measurable or Borel measurable if $\{\zeta \in \mathbb{R} : g(\zeta) \leq b\} \in \mathcal{B}$ for every $b \in \mathbb{R}$. If $z$ is a random variable defined on $(\Omega, \mathcal{F}, \mathrm{IP})$, then $g(z)$ is also a random variable defined on the same probability space, provided that $g$ is Borel measurable. Note that the functions we usually encounter (e.g., continuous functions and integrable functions) are Borel measurable. Similarly, for a $d$-dimensional random vector $z$, $g(z)$ is a random variable provided that $g$ is $\mathcal{B}^d$-measurable.
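On a finite outcome space, the $\sigma$-algebra generated by a collection $\mathcal{C}$ can be computed by brute force, since countable unions reduce to finite ones. The sketch below (the function name and the particular $\Omega$ and $\mathcal{C}$ are our own illustrative choices, not from the text) closes a collection under complementation and pairwise union:

```python
from itertools import combinations

def generate_sigma_algebra(omega, collection):
    """Brute-force closure of `collection` under complement and pairwise union.

    On a finite outcome space, countable unions reduce to finite ones, so this
    closure is exactly the sigma-algebra generated by `collection`.
    """
    sets = {frozenset(), frozenset(omega)} | {frozenset(c) for c in collection}
    changed = True
    while changed:
        changed = False
        for a in list(sets):
            comp = frozenset(omega) - a
            if comp not in sets:
                sets.add(comp)
                changed = True
        for a, b in combinations(list(sets), 2):
            if a | b not in sets:
                sets.add(a | b)
                changed = True
    return sets

omega = {1, 2, 3, 4}
sigma = generate_sigma_algebra(omega, [{1}])
print(sorted(map(sorted, sigma)))   # [[], [1], [1, 2, 3, 4], [2, 3, 4]]
```

As expected, $\sigma(\{\{1\}\})$ contains exactly $\emptyset$, $\{1\}$, its complement $\{2,3,4\}$, and $\Omega$; closure under complement and union also yields closure under intersection, as de Morgan's law requires.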

Recall from Section 2.1 that the joint distribution function of $z$ is the non-decreasing, right-continuous function $F_z$ such that for $\zeta = (\zeta_1, \ldots, \zeta_d)' \in \mathbb{R}^d$,

$$F_z(\zeta) = \mathrm{IP}\{\omega \in \Omega : z_1(\omega) \leq \zeta_1, \ldots, z_d(\omega) \leq \zeta_d\},$$

with

$$\lim_{\zeta_1, \ldots, \zeta_d \to -\infty} F_z(\zeta) = 0, \qquad \lim_{\zeta_1, \ldots, \zeta_d \to \infty} F_z(\zeta) = 1.$$

The marginal distribution function of the $i$-th component of $z$ is such that

$$F_{z_i}(\zeta_i) = \mathrm{IP}\{\omega \in \Omega : z_i(\omega) \leq \zeta_i\} = F_z(\infty, \ldots, \infty, \zeta_i, \infty, \ldots, \infty).$$

Note that while $\mathrm{IP}$ is a set function defined on $\mathcal{F}$, the distribution function of $z$ is a point function defined on $\mathbb{R}^d$.

Two random variables $y$ and $z$ are said to be (pairwise) independent if, and only if, for any Borel sets $B_1$ and $B_2$,

$$\mathrm{IP}(y \in B_1 \text{ and } z \in B_2) = \mathrm{IP}(y \in B_1) \, \mathrm{IP}(z \in B_2).$$

This immediately leads to the standard definition of independence: $y$ and $z$ are independent if, and only if, their joint distribution is the product of their marginal distributions, as in Section 2.1. A sequence of random variables $\{z_i\}$ is said to be totally independent if

$$\mathrm{IP}\Big( \bigcap_{\text{all } i} \{z_i \in B_i\} \Big) = \prod_{\text{all } i} \mathrm{IP}(z_i \in B_i),$$

for any Borel sets $B_i$. In what follows, a totally independent sequence will be referred to as an independent sequence, or a sequence of independent variables, for convenience. For an independent sequence, we have the following generalization of Lemma 2.1.

Lemma 5.1 Let $\{z_i\}$ be a sequence of independent random variables and $h_i$, $i = 1, 2, \ldots$, be Borel-measurable functions. Then $\{h_i(z_i)\}$ is also a sequence of independent random variables.

Moments and Norms

The expectation of the $i$-th element of $z$ is

$$\mathrm{IE}(z_i) = \int_{\Omega} z_i(\omega) \, \mathrm{d}\mathrm{IP}(\omega),$$

where the right-hand side is a Lebesgue integral. In view of the distribution function defined above, this expectation can equivalently be written as

$$\mathrm{IE}(z_i) = \int_{\mathbb{R}^d} \zeta_i \, \mathrm{d}F_z(\zeta) = \int_{\mathbb{R}} \zeta_i \, \mathrm{d}F_{z_i}(\zeta_i),$$

where $F_{z_i}$ is the marginal distribution function of the $i$-th component of $z$, as defined in Section 2.2. For a Borel measurable function $g$ of $z$,

$$\mathrm{IE}[g(z)] = \int_{\Omega} g(z(\omega)) \, \mathrm{d}\mathrm{IP}(\omega) = \int_{\mathbb{R}^d} g(\zeta) \, \mathrm{d}F_z(\zeta).$$

Other moments, such as variance and covariance, can also be defined as Lebesgue integrals with respect to the probability measure; see Section 2.2.

A function $g$ is said to be convex on a set $S$ if for any $a \in [0, 1]$ and any $x, y$ in $S$,

$$g(ax + (1 - a)y) \leq a \, g(x) + (1 - a) \, g(y);$$

$g$ is concave on $S$ if the inequality above is reversed. For example, $g(x) = x^2$ is convex, and $g(x) = \log x$ for $x > 0$ is concave. The result below is concerned with convex (concave) transformations of random variables.

Lemma 5.2 (Jensen) For a Borel measurable function $g$ that is convex on the support of the integrable random variable $z$, suppose that $g(z)$ is also integrable. Then,

$$g(\mathrm{IE}(z)) \leq \mathrm{IE}[g(z)];$$

the inequality reverses if $g$ is concave.

For a random variable $z$ with finite $p$-th moment, let $\|z\|_p = [\mathrm{IE}|z|^p]^{1/p}$ denote its $L_p$-norm. Also define the inner product of two square integrable random variables $z_i$ and $z_j$ as their cross moment:

$$\langle z_i, z_j \rangle = \mathrm{IE}(z_i z_j).$$

The $L_2$-norm can then be obtained from the inner product as $\|z_i\|_2 = \langle z_i, z_i \rangle^{1/2}$. It is easily seen that for any $c > 0$ and $p > 0$,

$$c^p \, \mathrm{IP}(|z| \geq c) = c^p \int \mathbf{1}_{\{\zeta : |\zeta| \geq c\}} \, \mathrm{d}F_z(\zeta) \leq \int_{\{\zeta : |\zeta| \geq c\}} |\zeta|^p \, \mathrm{d}F_z(\zeta) \leq \mathrm{IE}|z|^p,$$

where $\mathbf{1}_{\{\zeta : |\zeta| \geq c\}}$ is the indicator function which equals one if $|\zeta| \geq c$ and equals zero otherwise. This establishes the following result.

Lemma 5.3 (Markov) Let $z$ be a random variable with finite $p$-th moment. Then,

$$\mathrm{IP}(|z| \geq c) \leq \frac{\mathrm{IE}|z|^p}{c^p},$$

where $c$ is a positive real number.

For $p = 2$, Lemma 5.3 is also known as the Chebyshev inequality. If $c$ is small, so that $\mathrm{IE}|z|^p / c^p > 1$, Markov's inequality is trivial. When $c$ becomes large, the probability that $z$ assumes very extreme values vanishes at the rate $c^{-p}$.

Another useful result in probability theory is stated below without proof.

Lemma 5.4 (Hölder) Let $y$ be a random variable with finite $p$-th moment ($p > 1$) and $z$ a random variable with finite $q$-th moment ($q = p/(p-1)$). Then,

$$\mathrm{IE}|yz| \leq \|y\|_p \, \|z\|_q.$$

For $p = 2$, we have $\mathrm{IE}|yz| \leq \|y\|_2 \|z\|_2$. By noting that $|\mathrm{IE}(yz)| \leq \mathrm{IE}|yz|$, we immediately have the next result; cf. Lemma 2.3.

Lemma 5.5 (Cauchy-Schwarz) Let $y$ and $z$ be two square integrable random variables. Then,

$$|\mathrm{IE}(yz)| \leq \|y\|_2 \, \|z\|_2.$$

Let $y = 1$ and $x = |z|^p$. Then for $q > p$ and $r = q/p$, Hölder's inequality also ensures that

$$\mathrm{IE}|z|^p = \mathrm{IE}|xy| \leq \|x\|_r \, \|y\|_{r/(r-1)} = [\mathrm{IE}(|z|^{pr})]^{1/r} = [\mathrm{IE}(|z|^q)]^{p/q}.$$

This shows that when a random variable has finite $q$-th moment, it must also have finite $p$-th moment for any $p < q$, as stated below.

Lemma 5.6 (Liapunov) Let $z$ be a random variable with finite $q$-th moment. Then for $p < q$, $\|z\|_p \leq \|z\|_q$.

The inequality below states that the $L_p$-norm of a finite sum is no greater than the sum of the individual $L_p$-norms.

Lemma 5.7 (Minkowski) Let $z_i$, $i = 1, \ldots, n$, be random variables with finite $p$-th moment ($p \geq 1$). Then,

$$\Big\| \sum_{i=1}^{n} z_i \Big\|_p \leq \sum_{i=1}^{n} \|z_i\|_p.$$

When there are only two random variables in the sum, this is just the triangle inequality for $L_p$-norms; see also the exercises.

5.2 Conditional Distributions and Moments

Given two events $A$ and $B$ in $\mathcal{F}$, if it is known that $B$ has occurred, the outcome space is restricted to $B$, so that the outcomes of $A$ must be in $A \cap B$. The likelihood of $A$ is thus characterized by the conditional probability

$$\mathrm{IP}(A \mid B) = \mathrm{IP}(A \cap B) / \mathrm{IP}(B),$$

for $\mathrm{IP}(B) \neq 0$. It can be shown that $\mathrm{IP}(\cdot \mid B)$ satisfies the axioms for probability measures; see Exercise 5.4. This concept is readily extended to construct the conditional density function and conditional distribution function.

Conditional Distributions

Let $y$ and $z$ denote two integrable random vectors such that $y$ has the density function $f_y$. For $f_y(\eta) \neq 0$, define the conditional density function of $z$ given $y = \eta$ as

$$f_{z|y}(\zeta \mid y = \eta) = \frac{f_{z,y}(\zeta, \eta)}{f_y(\eta)},$$

which is clearly non-negative whenever it is defined. This function also integrates to one on $\mathbb{R}^d$ because

$$\int_{\mathbb{R}^d} f_{z|y}(\zeta \mid y = \eta) \, \mathrm{d}\zeta = \frac{1}{f_y(\eta)} \int_{\mathbb{R}^d} f_{z,y}(\zeta, \eta) \, \mathrm{d}\zeta = \frac{1}{f_y(\eta)} \, f_y(\eta) = 1.$$

Thus, $f_{z|y}$ is a legitimate density function. For example, the bivariate density function of two random variables $z$ and $y$ forms a surface over the $zy$-plane. By fixing $y = \eta$, we obtain a cross section (slice) under this surface. Dividing the joint density by the marginal density $f_y(\eta)$ amounts to adjusting the height of this slice so that the resulting area integrates to one.
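The slicing argument above can be checked numerically. The sketch below (our own illustration, with an arbitrarily chosen correlation of 0.6 and conditioning value $\eta = 0.8$) builds a bivariate normal joint density, divides the slice at $y = \eta$ by the marginal $f_y(\eta)$, and integrates the result by the trapezoidal rule:

```python
import numpy as np

# Joint density of (z, y): bivariate normal with unit variances and
# correlation rho (rho = 0.6 and eta = 0.8 are illustrative choices).
rho, eta = 0.6, 0.8

def f_zy(zeta, y):
    det = 1.0 - rho**2
    q = (zeta**2 - 2 * rho * zeta * y + y**2) / det
    return np.exp(-q / 2) / (2 * np.pi * np.sqrt(det))

def f_y(y):                      # standard normal marginal of y
    return np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)

grid = np.linspace(-10, 10, 20001)
cond = f_zy(grid, eta) / f_y(eta)          # f_{z|y}(zeta | y = eta)

# Trapezoidal integration: the conditional density integrates to one,
# and its mean is rho * eta (the normal conditional mean; see Example 5.12).
w = np.diff(grid)
area = float(np.sum((cond[1:] + cond[:-1]) * w) / 2)
cmean = float(np.sum((grid[1:] * cond[1:] + grid[:-1] * cond[:-1]) * w) / 2)
print(round(area, 6), round(cmean, 4))
```

The computed area is 1 up to numerical error, confirming that the rescaled slice is a legitimate density.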

Given the conditional density function $f_{z|y}$, we have for $A \in \mathcal{B}^d$,

$$\mathrm{IP}(z \in A \mid y = \eta) = \int_A f_{z|y}(\zeta \mid y = \eta) \, \mathrm{d}\zeta.$$

Note that this conditional probability is defined even when $\mathrm{IP}(y = \eta)$ may be zero. In particular, when $A = (-\infty, \zeta_1] \times \cdots \times (-\infty, \zeta_d]$, we obtain the conditional distribution function:

$$F_{z|y}(\zeta \mid y = \eta) = \mathrm{IP}(z_1 \leq \zeta_1, \ldots, z_d \leq \zeta_d \mid y = \eta).$$

When $z$ and $y$ are independent, the conditional density (distribution) simply reduces to the unconditional density (distribution).

Conditional Moments

Analogous to unconditional expectation, the conditional expectation of the integrable random variable $z_i$ given the information $y = \eta$ is

$$\mathrm{IE}(z_i \mid y = \eta) = \int_{\mathbb{R}} \zeta_i \, \mathrm{d}F_{z_i|y}(\zeta_i \mid y = \eta);$$

the conditional expectation of the random vector $z$ is $\mathrm{IE}(z \mid y = \eta)$, which is defined elementwise. By allowing $y$ to vary across all possible values $\eta$, we obtain the conditional expectation function $\mathrm{IE}(z \mid y)$, whose value depends on $\eta$, the realization of $y$. Thus, $\mathrm{IE}(z \mid y)$ is a function of $y$ and hence also a random vector.

More generally, the conditional expectation can be defined by taking a suitable $\sigma$-algebra as the conditioning set. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. The conditional expectation $\mathrm{IE}(z \mid \mathcal{G})$ is the integrable and $\mathcal{G}$-measurable random variable satisfying

$$\int_G \mathrm{IE}(z \mid \mathcal{G}) \, \mathrm{d}\mathrm{IP} = \int_G z \, \mathrm{d}\mathrm{IP}, \qquad G \in \mathcal{G}.$$

This definition basically says that the conditional expectation with respect to $\mathcal{G}$ is such that its weighted sum is the same as that of $z$ over any $G$ in $\mathcal{G}$.

Suppose that $\mathcal{G}$ is the trivial $\sigma$-algebra $\{\Omega, \emptyset\}$, i.e., the smallest $\sigma$-algebra, which contains no extra information from any random vectors. The conditional expectation with respect to the trivial $\sigma$-algebra must be a constant $c$ with probability one, so as to be measurable with respect to $\{\Omega, \emptyset\}$. Then,

$$\mathrm{IE}(z) = \int_{\Omega} z \, \mathrm{d}\mathrm{IP} = \int_{\Omega} c \, \mathrm{d}\mathrm{IP} = c.$$

That is, the conditional expectation with respect to the trivial $\sigma$-algebra is the unconditional expectation $\mathrm{IE}(z)$. Consider now $\mathcal{G} = \sigma(y)$, the $\sigma$-algebra generated by $y$. We also write

$$\mathrm{IE}(z \mid y) = \mathrm{IE}[z \mid \sigma(y)],$$

which is interpreted as the prediction of $z$ given all the information associated with $y$.

Similar to unconditional expectations, conditional expectations are monotonic: if $z \geq x$ with probability one, then $\mathrm{IE}(z \mid \mathcal{G}) \geq \mathrm{IE}(x \mid \mathcal{G})$ with probability one; in particular, if $z$ is non-negative with probability one, then $\mathrm{IE}(z \mid \mathcal{G}) \geq 0$ with probability one. Moreover, if $z$ is independent of $y$, then $\mathrm{IE}(z \mid y) = \mathrm{IE}(z)$. For example, when $z$ is a constant vector $c$, which is independent of any random variable, $\mathrm{IE}(z \mid y) = c$. The linearity result below is analogous to Lemma 2.2 for unconditional expectations.

Lemma 5.8 Let $z$ ($d \times 1$) and $y$ ($c \times 1$) be integrable random vectors and $A$ ($n \times d$) and $B$ ($n \times c$) be non-stochastic matrices. Then with probability one,

$$\mathrm{IE}(Az + By \mid \mathcal{G}) = A \, \mathrm{IE}(z \mid \mathcal{G}) + B \, \mathrm{IE}(y \mid \mathcal{G}).$$

If $b$ ($n \times 1$) is a non-stochastic vector, $\mathrm{IE}(Az + b \mid \mathcal{G}) = A \, \mathrm{IE}(z \mid \mathcal{G}) + b$ with probability one.

From the definition of conditional expectation, we immediately have

$$\mathrm{IE}[\mathrm{IE}(z \mid \mathcal{G})] = \int_{\Omega} \mathrm{IE}(z \mid \mathcal{G}) \, \mathrm{d}\mathrm{IP} = \int_{\Omega} z \, \mathrm{d}\mathrm{IP} = \mathrm{IE}(z);$$

this is known as the law of iterated expectations. This result also suggests that if conditional expectations are taken sequentially with respect to a collection of nested $\sigma$-algebras, only the smallest $\sigma$-algebra matters. For example, for $k$ random vectors $y_1, \ldots, y_k$,

$$\mathrm{IE}[\mathrm{IE}(z \mid y_1, \ldots, y_k) \mid y_1, \ldots, y_{k-1}] = \mathrm{IE}(z \mid y_1, \ldots, y_{k-1}).$$

A formal result is stated below; see Exercise 5.5.

Lemma 5.9 (Law of Iterated Expectations) Let $\mathcal{G}$ and $\mathcal{H}$ be two sub-$\sigma$-algebras of $\mathcal{F}$ such that $\mathcal{G} \subseteq \mathcal{H}$. Then for the integrable random vector $z$,

$$\mathrm{IE}[\mathrm{IE}(z \mid \mathcal{H}) \mid \mathcal{G}] = \mathrm{IE}[\mathrm{IE}(z \mid \mathcal{G}) \mid \mathcal{H}] = \mathrm{IE}(z \mid \mathcal{G});$$

in particular, $\mathrm{IE}[\mathrm{IE}(z \mid \mathcal{G})] = \mathrm{IE}(z)$.
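The law of iterated expectations is easy to see in a simulation. In the sketch below (the two-group design and all numbers are our own illustration), $y$ takes two values, the empirical conditional expectation $\mathrm{IE}(z \mid y)$ is the group mean of $z$ within each value of $y$, and its average recovers the overall mean of $z$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
y = rng.integers(0, 2, size=n)                            # conditioning variable, y in {0, 1}
z = np.where(y == 1, 3.0, 1.0) + rng.standard_normal(n)   # E(z|y=0)=1, E(z|y=1)=3

# Empirical E(z | y): replace each observation by the average of its y-group.
e_z_given_y = np.where(y == 1, z[y == 1].mean(), z[y == 0].mean())

# IE[IE(z|y)] equals IE(z); with group means this identity holds exactly,
# since averaging the group means weights them back by group frequencies.
print(e_z_given_y.mean(), z.mean())
```

Both printed numbers coincide (up to floating-point rounding), illustrating $\mathrm{IE}[\mathrm{IE}(z \mid y)] = \mathrm{IE}(z)$.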

If $z$ is $\mathcal{G}$-measurable, all the information resulting from $z$ is already contained in $\mathcal{G}$, so that $z$ can be treated as known in $\mathrm{IE}(z \mid \mathcal{G})$ and taken out of the conditional expectation. That is, $\mathrm{IE}(z \mid \mathcal{G}) = z$ with probability one, and hence $\mathrm{IE}(zx \mid \mathcal{G}) = z \, \mathrm{IE}(x \mid \mathcal{G})$. In particular, $z$ can be taken out of the conditional expectation when $z$ itself is a conditioning variable. This result is generalized as follows.

Lemma 5.10 Let $z$ be a $\mathcal{G}$-measurable random vector. Then for any Borel-measurable function $g$,

$$\mathrm{IE}[g(z) x \mid \mathcal{G}] = g(z) \, \mathrm{IE}(x \mid \mathcal{G}),$$

with probability one.

Two square integrable random variables $z$ and $y$ are said to be orthogonal if their inner product $\mathrm{IE}(zy) = 0$. This definition allows us to discuss orthogonal projection in the space of square integrable random variables. Let $z$ be a square integrable random variable and $\tilde{z}$ a $\mathcal{G}$-measurable random variable. Then, by Lemma 5.9 (law of iterated expectations) and Lemma 5.10,

$$\mathrm{IE}\big[ \big( z - \mathrm{IE}(z \mid \mathcal{G}) \big) \tilde{z} \big] = \mathrm{IE}\big[ \mathrm{IE}\big[ \big( z - \mathrm{IE}(z \mid \mathcal{G}) \big) \tilde{z} \,\big|\, \mathcal{G} \big] \big] = \mathrm{IE}\big[ \mathrm{IE}(z \mid \mathcal{G}) \tilde{z} - \mathrm{IE}(z \mid \mathcal{G}) \tilde{z} \big] = 0.$$

That is, the difference between $z$ and its conditional expectation $\mathrm{IE}(z \mid \mathcal{G})$ must be orthogonal to any $\mathcal{G}$-measurable random variable. It can then be seen that for any square integrable, $\mathcal{G}$-measurable random variable $\tilde{z}$,

$$\mathrm{IE}(z - \tilde{z})^2 = \mathrm{IE}[z - \mathrm{IE}(z \mid \mathcal{G}) + \mathrm{IE}(z \mid \mathcal{G}) - \tilde{z}]^2 = \mathrm{IE}[z - \mathrm{IE}(z \mid \mathcal{G})]^2 + \mathrm{IE}[\mathrm{IE}(z \mid \mathcal{G}) - \tilde{z}]^2 \geq \mathrm{IE}[z - \mathrm{IE}(z \mid \mathcal{G})]^2,$$

where in the second equality the cross-product term vanishes because both $\mathrm{IE}(z \mid \mathcal{G})$ and $\tilde{z}$ are $\mathcal{G}$-measurable and hence orthogonal to $z - \mathrm{IE}(z \mid \mathcal{G})$. That is, among all $\mathcal{G}$-measurable random variables that are also square integrable, $\mathrm{IE}(z \mid \mathcal{G})$ is the closest to $z$ in terms of the $L_2$-norm. This shows that $\mathrm{IE}(z \mid \mathcal{G})$ is the orthogonal projection of $z$ onto the space of all $\mathcal{G}$-measurable, square integrable random variables.
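The projection property can also be seen by Monte Carlo. In the sketch below (a design of our own: $z = y^2 + \text{noise}$, so that $\mathrm{IE}(z \mid y) = y^2$ by construction), the conditional mean attains a smaller mean squared error than other functions of $y$, such as $|y|$ or the best constant:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
y = rng.standard_normal(n)
z = y**2 + rng.standard_normal(n)       # by construction IE(z | y) = y**2

def mse(pred):                          # empirical squared L2 distance from z
    return float(np.mean((z - pred)**2))

mse_cond = mse(y**2)                    # the conditional mean itself
mse_abs = mse(np.abs(y))                # some other function of y
mse_const = mse(np.full(n, z.mean()))   # the best constant predictor
print(mse_cond, mse_abs, mse_const)     # mse_cond is smallest, close to 1
```

The conditional mean's MSE is close to 1 (the variance of the noise), while any other predictor built from $y$ pays an extra penalty equal to its squared $L_2$ distance from $\mathrm{IE}(z \mid y)$, exactly as in the decomposition above.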

Lemma 5.11 Let $z$ be a square integrable random variable. Then

$$\mathrm{IE}[z - \mathrm{IE}(z \mid \mathcal{G})]^2 \leq \mathrm{IE}(z - \tilde{z})^2,$$

for any $\mathcal{G}$-measurable random variable $\tilde{z}$.

In particular, let $\mathcal{G} = \sigma(y)$, where $y$ is a square integrable random vector. Lemma 5.11 implies that

$$\mathrm{IE}\big[ z - \mathrm{IE}\big( z \mid \sigma(y) \big) \big]^2 \leq \mathrm{IE}\big( z - h(y) \big)^2,$$

for any Borel-measurable function $h$ such that $h(y)$ is also square integrable. Thus, $\mathrm{IE}[z \mid \sigma(y)]$ minimizes the $L_2$-norm $\|z - h(y)\|_2$, and its difference from $z$ is orthogonal to any function of $y$ that is also square integrable. We may then say that, given all the information generated from $y$, $\mathrm{IE}[z \mid \sigma(y)]$ is the best approximation of $z$ in terms of the $L_2$-norm (or simply the best $L_2$ predictor).

The conditional variance-covariance matrix of $z$ given $y$ is

$$\mathrm{var}(z \mid y) = \mathrm{IE}\big( [z - \mathrm{IE}(z \mid y)][z - \mathrm{IE}(z \mid y)]' \,\big|\, y \big) = \mathrm{IE}(zz' \mid y) - \mathrm{IE}(z \mid y) \, \mathrm{IE}(z \mid y)'.$$

Similar to the unconditional variance-covariance matrix, we have for a non-stochastic matrix $A$ and vector $b$,

$$\mathrm{var}(Az + b \mid y) = A \, \mathrm{var}(z \mid y) \, A',$$

which is nonsingular provided that $A$ has full row rank and $\mathrm{var}(z \mid y)$ is positive definite. It can also be shown that

$$\mathrm{var}(z) = \mathrm{IE}[\mathrm{var}(z \mid y)] + \mathrm{var}\big( \mathrm{IE}(z \mid y) \big);$$

see Exercise 5.6. That is, the variance of $z$ can be expressed as the sum of two components: the mean of its conditional variance and the variance of its conditional mean. This is also known as the analysis-of-variance decomposition.

Example 5.12 Suppose that $(y' \; x')'$ is distributed as a multivariate normal random vector:

$$\begin{bmatrix} y \\ x \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_y \\ \mu_x \end{bmatrix}, \; \begin{bmatrix} \Sigma_y & \Sigma_{xy}' \\ \Sigma_{xy} & \Sigma_x \end{bmatrix} \right).$$

It is easy to see that the conditional density function of $y$ given $x$, obtained by dividing the joint multivariate normal density of $y$ and $x$ by the normal density of $x$, is also normal, with conditional mean

$$\mathrm{IE}(y \mid x) = \mu_y + \Sigma_{xy}' \Sigma_x^{-1} (x - \mu_x),$$

and conditional variance-covariance matrix

$$\mathrm{var}(y \mid x) = \mathrm{var}(y) - \mathrm{var}\big( \mathrm{IE}(y \mid x) \big) = \Sigma_y - \Sigma_{xy}' \Sigma_x^{-1} \Sigma_{xy}.$$

Note that $\mathrm{IE}(y \mid x)$ is a linear function of $x$ and that $\mathrm{var}(y \mid x)$ does not vary with $x$.

5.3 Modes of Convergence

Consider now a sequence of random variables $\{z_n(\omega)\}_{n=1,2,\ldots}$ defined on the probability space $(\Omega, \mathcal{F}, \mathrm{IP})$. For a given $\omega$, $\{z_n(\omega)\}$ is a realization (a sequence of sample values) indexed by $n$; for a given $n$, $z_n$ is a random variable which assumes different values depending on $\omega$. In this section we discuss various modes of convergence for sequences of random variables.

Almost Sure Convergence

We first introduce the concept of almost sure convergence (convergence with probability one). Suppose that $\{z_n\}$ is a sequence of random variables and $z$ is a random variable, all defined on the probability space $(\Omega, \mathcal{F}, \mathrm{IP})$. The sequence $\{z_n\}$ is said to converge to $z$ almost surely if, and only if,

$$\mathrm{IP}(\omega : z_n(\omega) \to z(\omega) \text{ as } n \to \infty) = 1,$$

denoted $z_n \xrightarrow{\text{a.s.}} z$ or $z_n \to z$ a.s. Note that for a given $\omega$, the realization $z_n(\omega)$ may or may not converge to $z(\omega)$. Almost sure convergence requires that $z_n(\omega) \to z(\omega)$ for almost all $\omega \in \Omega$, i.e., for all $\omega$ except those in a set with probability zero. That is, almost all realizations $z_n(\omega)$ will eventually be close to $z(\omega)$ for all $n$ sufficiently large; the event that $z_n$ does not approach $z$ is improbable. When $z_n$ and $z$ are both $\mathbb{R}^d$-valued, almost sure convergence is defined elementwise: $z_n \to z$ a.s. if every element of $z_n$ converges almost surely to the corresponding element of $z$. The following result shows that continuous transformations preserve almost sure convergence.
Lemma 5.13 Let $g : \mathbb{R} \to \mathbb{R}$ be a function continuous on $S_g \subseteq \mathbb{R}$.

[a] If $z_n \xrightarrow{\text{a.s.}} z$, where $z$ is a random variable such that $\mathrm{IP}(z \in S_g) = 1$, then $g(z_n) \xrightarrow{\text{a.s.}} g(z)$.

[b] If $z_n \xrightarrow{\text{a.s.}} c$, where $c$ is a real number at which $g$ is continuous, then $g(z_n) \xrightarrow{\text{a.s.}} g(c)$.

Proof: Let $\Omega_0 = \{\omega : z_n(\omega) \to z(\omega)\}$ and $\Omega_1 = \{\omega : z(\omega) \in S_g\}$. For $\omega \in (\Omega_0 \cap \Omega_1)$, continuity of $g$ ensures that $g(z_n(\omega)) \to g(z(\omega))$. Note that $(\Omega_0 \cap \Omega_1)^c = \Omega_0^c \cup \Omega_1^c$, which has probability zero because $\mathrm{IP}(\Omega_0^c) = \mathrm{IP}(\Omega_1^c) = 0$. It follows that $\Omega_0 \cap \Omega_1$ has probability one. This proves that $g(z_n) \to g(z)$ with probability one. The second assertion is just a special case of the first result.

Lemma 5.13 is easily generalized to $\mathbb{R}^d$-valued random variables. For example, $z_n \xrightarrow{\text{a.s.}} z$ implies

$$z_{1,n} + z_{2,n} \xrightarrow{\text{a.s.}} z_1 + z_2, \qquad z_{1,n} z_{2,n} \xrightarrow{\text{a.s.}} z_1 z_2, \qquad z_{1,n}^2 + z_{2,n}^2 \xrightarrow{\text{a.s.}} z_1^2 + z_2^2,$$

where $z_{1,n}, z_{2,n}$ are two elements of $z_n$ and $z_1, z_2$ are the corresponding elements of $z$. Also, provided that $z_2 \neq 0$ with probability one, $z_{1,n}/z_{2,n} \xrightarrow{\text{a.s.}} z_1/z_2$.

Convergence in Probability

A convergence concept weaker than almost sure convergence is convergence in probability. A sequence of random variables $\{z_n\}$ is said to converge to $z$ in probability if for every $\epsilon > 0$,

$$\lim_{n \to \infty} \mathrm{IP}(\omega : |z_n(\omega) - z(\omega)| > \epsilon) = 0,$$

or equivalently,

$$\lim_{n \to \infty} \mathrm{IP}(\omega : |z_n(\omega) - z(\omega)| \leq \epsilon) = 1,$$

denoted $z_n \xrightarrow{\mathrm{IP}} z$ or $z_n \to z$ in probability. We also say that $z$ is the probability limit of $z_n$, denoted $\mathrm{plim}\, z_n = z$. In particular, if the probability limit of $z_n$ is a constant $c$, all the probability mass of $z_n$ will concentrate around $c$ when $n$ becomes

large. For $\mathbb{R}^d$-valued random variables $z_n$ and $z$, convergence in probability is also defined elementwise.

In the definition of convergence in probability, the events

$$\Omega_n(\epsilon) = \{\omega : |z_n(\omega) - z(\omega)| \leq \epsilon\}$$

vary with $n$, and convergence refers to the probabilities of such events, $p_n = \mathrm{IP}(\Omega_n(\epsilon))$, rather than to the random variables $z_n$ themselves. By contrast, almost sure convergence relates directly to the behavior of the random variables. For convergence in probability, the event $\Omega_n(\epsilon)$ that $z_n$ is close to $z$ becomes highly likely when $n$ tends to infinity, or its complement (that $z_n$ deviates from $z$ by a certain distance) becomes highly unlikely when $n$ tends to infinity. Whether $z_n(\omega)$ converges to $z(\omega)$ is not of any concern in convergence in probability.

More specifically, let $\Omega_0$ denote the set of $\omega$ such that $z_n(\omega)$ converges to $z(\omega)$. For $\omega \in \Omega_0$, there is some $m$ such that $\omega$ is in $\Omega_n(\epsilon)$ for all $n > m$. That is,

$$\Omega_0 \subseteq \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} \Omega_n(\epsilon) \in \mathcal{F}.$$

As $\bigcap_{n=m}^{\infty} \Omega_n(\epsilon)$ is also in $\mathcal{F}$ and non-decreasing in $m$, it follows that

$$\mathrm{IP}(\Omega_0) \leq \mathrm{IP}\Big( \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} \Omega_n(\epsilon) \Big) = \lim_{m \to \infty} \mathrm{IP}\Big( \bigcap_{n=m}^{\infty} \Omega_n(\epsilon) \Big) \leq \lim_{m \to \infty} \mathrm{IP}\big( \Omega_m(\epsilon) \big).$$

This inequality proves that almost sure convergence implies convergence in probability, but the converse is not true in general. We state this result below.

Lemma 5.14 If $z_n \xrightarrow{\text{a.s.}} z$, then $z_n \xrightarrow{\mathrm{IP}} z$.

The following well-known example shows that when there is convergence in probability, the random variables themselves may not converge for any $\omega$.

Example 5.15 Let $\Omega = [0, 1]$ and $\mathrm{IP}$ be the Lebesgue measure (i.e., $\mathrm{IP}\{(a, b]\} = b - a$ for $(a, b] \subseteq [0, 1]$). Consider the sequence $\{I_n\}$ of intervals

$$[0, 1], \; [0, 1/2), \; [1/2, 1], \; [0, 1/3), \; [1/3, 2/3), \; [2/3, 1], \; \ldots,$$

and let $z_n = \mathbf{1}_{I_n}$ be the indicator function of $I_n$: $z_n(\omega) = 1$ if $\omega \in I_n$ and $z_n(\omega) = 0$ otherwise. As $n$ tends to infinity, the length of $I_n$ shrinks to zero. For $0 < \epsilon < 1$, we then have

$$\mathrm{IP}(|z_n| > \epsilon) = \mathrm{IP}(I_n) \to 0,$$

which shows $z_n \xrightarrow{\mathrm{IP}} 0$. On the other hand, it is easy to see that each $\omega \in [0, 1]$ must be covered by infinitely many intervals. Thus, given any $\omega \in [0, 1]$, $z_n(\omega) = 1$ for

infinitely many $n$, and hence $z_n(\omega)$ does not converge to zero. Note that convergence in probability permits $z_n$ to deviate from the probability limit infinitely often, but almost sure convergence does not, except for those $\omega$ in a set of probability zero.

Intuitively, when $z_n$ has finite variance such that $\mathrm{var}(z_n)$ vanishes asymptotically, the distribution of $z_n$ shrinks toward its mean $\mathrm{IE}(z_n)$. If, in addition, $\mathrm{IE}(z_n)$ tends to a constant $c$ (or $\mathrm{IE}(z_n) = c$), then $z_n$ ought to be degenerate at $c$ in the limit. These observations suggest the following sufficient conditions for convergence in probability; see Exercises 5.7 and 5.8. In many cases, it is easier to establish convergence in probability by verifying these conditions.

Lemma 5.16 Let $\{z_n\}$ be a sequence of square integrable random variables. If $\mathrm{IE}(z_n) \to c$ and $\mathrm{var}(z_n) \to 0$, then $z_n \xrightarrow{\mathrm{IP}} c$.

Analogous to Lemma 5.13, continuous functions also preserve convergence in probability.

Lemma 5.17 Let $g : \mathbb{R} \to \mathbb{R}$ be a function continuous on $S_g \subseteq \mathbb{R}$.

[a] If $z_n \xrightarrow{\mathrm{IP}} z$, where $z$ is a random variable such that $\mathrm{IP}(z \in S_g) = 1$, then $g(z_n) \xrightarrow{\mathrm{IP}} g(z)$.

[b] (Slutsky) If $z_n \xrightarrow{\mathrm{IP}} c$, where $c$ is a real number at which $g$ is continuous, then $g(z_n) \xrightarrow{\mathrm{IP}} g(c)$.

Proof: By the continuity of $g$, for each $\epsilon > 0$ we can find a $\delta > 0$ such that

$$\{\omega : |z_n(\omega) - z(\omega)| \leq \delta\} \cap \{\omega : z(\omega) \in S_g\} \subseteq \{\omega : |g(z_n(\omega)) - g(z(\omega))| \leq \epsilon\}.$$

Taking complements of both sides and noting that the complement of $\{\omega : z(\omega) \in S_g\}$ has probability zero, we have

$$\mathrm{IP}(|g(z_n) - g(z)| > \epsilon) \leq \mathrm{IP}(|z_n - z| > \delta).$$

As $z_n$ converges to $z$ in probability, the right-hand side converges to zero, and so does the left-hand side.
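Example 5.15 above can be checked directly by enumerating the intervals. The sketch below (the half-open convention for each block and the particular $\omega = 0.37$ are our own illustrative choices) confirms that the interval lengths, and hence $\mathrm{IP}(|z_n| > \epsilon)$, shrink to zero, while any fixed $\omega$ keeps being hit, once per block:

```python
import numpy as np

# Enumerate the intervals of Example 5.15: [0,1), then halves, thirds, ...
# (we use half-open intervals so each block partitions [0, 1) exactly once).
def intervals(n_blocks):
    return [(j / k, (j + 1) / k) for k in range(1, n_blocks + 1) for j in range(k)]

I = intervals(200)                       # 200 blocks: 1 + 2 + ... + 200 intervals
omega = 0.37                             # one fixed outcome in [0, 1)
hits = [a <= omega < b for a, b in I]

print(len(I))                            # 20100 intervals
print(I[-1][1] - I[-1][0])               # last interval length: 1/200, shrinking
print(sum(hits))                         # omega hit once per block: 200 times
```

So $z_n(\omega) = 1$ infinitely often along the sequence even though $\mathrm{IP}(I_n) \to 0$: convergence in probability without pointwise convergence.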

Lemma 5.17 is readily generalized to $\mathbb{R}^d$-valued random variables. For instance, $z_n \xrightarrow{\mathrm{IP}} z$ implies

$$z_{1,n} + z_{2,n} \xrightarrow{\mathrm{IP}} z_1 + z_2, \qquad z_{1,n} z_{2,n} \xrightarrow{\mathrm{IP}} z_1 z_2, \qquad z_{1,n}^2 + z_{2,n}^2 \xrightarrow{\mathrm{IP}} z_1^2 + z_2^2,$$

where $z_{1,n}, z_{2,n}$ are two elements of $z_n$ and $z_1, z_2$ are the corresponding elements of $z$. Also, provided that $z_2 \neq 0$ with probability one, $z_{1,n}/z_{2,n} \xrightarrow{\mathrm{IP}} z_1/z_2$.

Convergence in Distribution

Another convergence mode, known as convergence in distribution or convergence in law, concerns the behavior of the distribution functions of random variables. Let $F_{z_n}$ and $F_z$ be the distribution functions of $z_n$ and $z$, respectively. A sequence of random variables $\{z_n\}$ is said to converge to $z$ in distribution, denoted $z_n \xrightarrow{D} z$, if

$$\lim_{n \to \infty} F_{z_n}(\zeta) = F_z(\zeta),$$

for every continuity point $\zeta$ of $F_z$. That is, regardless of the distributions of $z_n$, convergence in distribution ensures that $F_{z_n}$ will be arbitrarily close to $F_z$ for all $n$ sufficiently large. The distribution $F_z$ is thus known as the limiting distribution of $z_n$. We also say that $z_n$ is asymptotically distributed as $F_z$, denoted $z_n \overset{A}{\sim} F_z$.

For random vectors $\{z_n\}$ and $z$, $z_n \xrightarrow{D} z$ if the joint distributions $F_{z_n}$ converge to $F_z$ at every continuity point $\zeta$ of $F_z$. It is, however, more cumbersome to show convergence in distribution for a sequence of random vectors. The so-called Cramér-Wold device allows us to transform this multivariate convergence problem into a univariate one. This result is stated below without proof.

Lemma 5.18 (Cramér-Wold Device) Let $\{z_n\}$ be a sequence of random vectors in $\mathbb{R}^d$. Then $z_n \xrightarrow{D} z$ if and only if $\alpha' z_n \xrightarrow{D} \alpha' z$ for every $\alpha \in \mathbb{R}^d$ such that $\alpha'\alpha = 1$.

There is also a uni-directional relationship between convergence in probability and convergence in distribution. To see this, note that for some arbitrary $\epsilon > 0$ and a continuity point $\zeta$ of $F_z$, we have

$$\mathrm{IP}(z_n \leq \zeta) = \mathrm{IP}(\{z_n \leq \zeta\} \cap \{|z_n - z| \leq \epsilon\}) + \mathrm{IP}(\{z_n \leq \zeta\} \cap \{|z_n - z| > \epsilon\}) \leq \mathrm{IP}(z \leq \zeta + \epsilon) + \mathrm{IP}(|z_n - z| > \epsilon).$$

Similarly,

$$\mathrm{IP}(z \leq \zeta - \epsilon) \leq \mathrm{IP}(z_n \leq \zeta) + \mathrm{IP}(|z_n - z| > \epsilon).$$

If $z_n \xrightarrow{\mathrm{IP}} z$, then by passing to the limit and noting that $\epsilon$ is arbitrary, the inequalities above imply

$$\lim_{n \to \infty} \mathrm{IP}(z_n \leq \zeta) = \mathrm{IP}(z \leq \zeta).$$

That is, $F_{z_n}(\zeta) \to F_z(\zeta)$. The converse is not true in general, however. When $z_n$ converges in distribution to a real number $c$, it is not difficult to show that $z_n$ also converges to $c$ in probability; in this case, these two convergence modes are equivalent. To see this, note that a real number $c$ can be viewed as a degenerate random variable with the distribution function

$$F(\zeta) = \begin{cases} 0, & \zeta < c, \\ 1, & \zeta \geq c, \end{cases}$$

which is a step function with a jump at $c$. When $z_n \xrightarrow{D} c$, all the probability mass of $z_n$ will concentrate at $c$ as $n$ becomes large; this is precisely what $z_n \xrightarrow{\mathrm{IP}} c$ means. More formally, for any $\epsilon > 0$,

$$\mathrm{IP}(|z_n - c| > \epsilon) = 1 - [F_{z_n}(c + \epsilon) - F_{z_n}((c - \epsilon)^-)],$$

where $(c - \epsilon)^-$ denotes the limit from the left at $c - \epsilon$. Now, $z_n \xrightarrow{D} c$ implies that $F_{z_n}(c + \epsilon) - F_{z_n}((c - \epsilon)^-)$ converges to one, so that $\mathrm{IP}(|z_n - c| > \epsilon)$ converges to zero. We summarize these results below.

Lemma 5.19 If $z_n \xrightarrow{\mathrm{IP}} z$, then $z_n \xrightarrow{D} z$. For a constant $c$, $z_n \xrightarrow{D} c$ is equivalent to $z_n \xrightarrow{\mathrm{IP}} c$.

The continuous mapping theorem below asserts that continuous functions preserve convergence in distribution; cf. Lemmas 5.13 and 5.17.

Lemma 5.20 (Continuous Mapping Theorem) Let $g : \mathbb{R} \to \mathbb{R}$ be a function continuous almost everywhere on $\mathbb{R}$, except for at most countably many points. If $z_n \xrightarrow{D} z$, then $g(z_n) \xrightarrow{D} g(z)$.

For example, if $z_n$ converges in distribution to the standard normal random variable, the limiting distribution of $z_n^2$ is $\chi^2(1)$. Generalizing this result to $\mathbb{R}^d$-valued random variables, we can see that when $z_n$ converges in distribution to the $d$-dimensional standard normal random vector, the limiting distribution of $z_n' z_n$ is $\chi^2(d)$.
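The $\chi^2(1)$ example above can be illustrated by simulation. In the sketch below (uniform summands, with sample sizes and replication counts chosen only for illustration), $z_n$ is a standardized sample mean, so $z_n \xrightarrow{D} N(0,1)$ by a central limit theorem; by the continuous mapping theorem, $z_n^2$ should then behave like a $\chi^2(1)$ variable:

```python
import numpy as np

rng = np.random.default_rng(2)
reps, n = 20_000, 300
# z_n: standardized sample means of uniforms; a CLT gives z_n -> N(0,1) in distribution.
u = rng.uniform(0.0, 1.0, size=(reps, n))
z_n = (u.mean(axis=1) - 0.5) * np.sqrt(12 * n)   # var of a uniform mean is 1/(12n)

# Continuous mapping: z_n**2 is then approximately chi-square(1), so
# P(z_n**2 <= 1) should be near P(|N(0,1)| <= 1), about 0.6827.
p_hat = float(np.mean(z_n**2 <= 1.0))
print(p_hat)
```

The empirical frequency lands near 0.6827 up to Monte Carlo error, even though the summands themselves are uniform, not normal.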

Two sequences of random variables $\{y_n\}$ and $\{z_n\}$ are said to be asymptotically equivalent if their difference $y_n - z_n$ converges to zero in probability. Intuitively, the limiting distributions of two asymptotically equivalent sequences, if they exist, ought to be the same. This is stated in the next result without proof.

Lemma 5.21 Let $\{y_n\}$ and $\{z_n\}$ be two sequences of random vectors such that $y_n - z_n \xrightarrow{\mathrm{IP}} 0$. If $z_n \xrightarrow{D} z$, then $y_n \xrightarrow{D} z$.

The next result is concerned with two sequences of random variables such that one converges in distribution and the other converges in probability.

Lemma 5.22 If $y_n$ converges in probability to a constant $c$ and $z_n$ converges in distribution to $z$, then

$$y_n + z_n \xrightarrow{D} c + z, \qquad y_n z_n \xrightarrow{D} cz, \qquad z_n / y_n \xrightarrow{D} z/c \text{ if } c \neq 0.$$

5.4 Stochastic Order Notations

It is typical to use order notations to describe the behavior of a sequence of numbers, whether or not it converges. Let $\{c_n\}$ denote a sequence of positive real numbers.

1. Given a sequence $\{b_n\}$, we say that $b_n$ is (at most) of order $c_n$, denoted $b_n = O(c_n)$, if there exists a $\Delta < \infty$ such that $|b_n| / c_n \leq \Delta$ for all sufficiently large $n$. When $c_n$ diverges, $b_n$ cannot diverge faster than $c_n$; when $c_n$ converges to zero, the rate of convergence of $b_n$ is no slower than that of $c_n$. For example, the polynomial $a + bn$ is $O(n)$, and the partial sum of a bounded sequence, $\sum_{i=1}^{n} b_i$, is $O(n)$. Note that an $O(1)$ sequence is a bounded sequence.

2. Given a sequence $\{b_n\}$, we say that $b_n$ is of smaller order than $c_n$, denoted $b_n = o(c_n)$, if $b_n / c_n \to 0$. When $c_n$ diverges, $b_n$ must diverge slower than $c_n$; when $c_n$ converges to zero, the rate of convergence of $b_n$ must be faster than that of $c_n$. For example, the polynomial $a + bn$ is $o(n^{1+\delta})$ for any $\delta > 0$, and the partial sum $\sum_{i=1}^{n} \alpha^i$, $|\alpha| < 1$, is $o(n)$. Note that an $o(1)$ sequence is a sequence that converges to zero.

If $b_n$ is a vector (matrix), $b_n$ is said to be $O(c_n)$ ($o(c_n)$) if every element of $b_n$ is $O(c_n)$ ($o(c_n)$). It is also easy to verify the following results; see the exercises.

Lemma 5.23 Let $\{a_n\}$ and $\{b_n\}$ be two non-stochastic sequences.

(a) If $a_n = O(n^r)$ and $b_n = O(n^s)$, then $a_n b_n = O(n^{r+s})$ and $a_n + b_n = O(n^{\max(r,s)})$.

(b) If $a_n = o(n^r)$ and $b_n = o(n^s)$, then $a_n b_n = o(n^{r+s})$ and $a_n + b_n = o(n^{\max(r,s)})$.

(c) If $a_n = O(n^r)$ and $b_n = o(n^s)$, then $a_n b_n = o(n^{r+s})$ and $a_n + b_n = O(n^{\max(r,s)})$.

The order notations can be easily extended to describe the behavior of sequences of random variables. A sequence of random variables $\{z_n\}$ is said to be $O_{\text{a.s.}}(c_n)$ (or $O(c_n)$ almost surely) if $z_n / c_n$ is $O(1)$ a.s., and it is said to be $O_{\mathrm{IP}}(c_n)$ (or $O(c_n)$ in probability) if for every $\epsilon > 0$, there is some $\Delta < \infty$ such that

$$\mathrm{IP}(|z_n| / c_n > \Delta) \leq \epsilon,$$

for all $n$ sufficiently large. Similarly, $\{z_n\}$ is $o_{\text{a.s.}}(c_n)$ (or $o(c_n)$ almost surely) if $z_n / c_n \xrightarrow{\text{a.s.}} 0$, and it is $o_{\mathrm{IP}}(c_n)$ (or $o(c_n)$ in probability) if $z_n / c_n \xrightarrow{\mathrm{IP}} 0$. If $\{z_n\}$ is $O_{\text{a.s.}}(1)$ ($o_{\text{a.s.}}(1)$), we say that $z_n$ is bounded (vanishing) almost surely; if $\{z_n\}$ is $O_{\mathrm{IP}}(1)$ ($o_{\mathrm{IP}}(1)$), $z_n$ is bounded (vanishing) in probability.

Note that Lemma 5.23 also holds for the stochastic order notations. In particular, if a sequence of random variables is bounded almost surely (in probability) and another sequence of random variables is vanishing almost surely (in probability), the products of their corresponding elements are vanishing almost surely (in probability). That is, if $y_n = O_{\text{a.s.}}(1)$ and $z_n = o_{\text{a.s.}}(1)$, then $y_n z_n$ is $o_{\text{a.s.}}(1)$.

When $z_n \xrightarrow{D} z$, we know that $z_n$ does not converge in probability to $z$ in general, but more can be said about the behavior of $z_n$. Let $\zeta$ be a continuity point of $F_z$. Then for any $\epsilon > 0$, we can choose $\zeta$ sufficiently large such that $\mathrm{IP}(|z| > \zeta) < \epsilon/2$. As $z_n \xrightarrow{D} z$, we can also choose $n$ large enough such that

$$|\mathrm{IP}(|z_n| > \zeta) - \mathrm{IP}(|z| > \zeta)| < \epsilon/2,$$

which implies $\mathrm{IP}(|z_n| > \zeta) < \epsilon$. This leads to the following conclusion.

Lemma 5.24 Let $\{z_n\}$ be a sequence of random vectors such that $z_n \xrightarrow{D} z$. Then $z_n = O_{\mathrm{IP}}(1)$.

5.5 Law of Large Numbers

We first discuss the law of large numbers, which is concerned with the averaging behavior of random variables. Intuitively, a sequence of random variables obeys a law of

20 130 CHAPTER 5. ELEMENTS OF PROBABILITY THEORY large numbers when its sample average essentially follows its mean behavior; random irregularities (deviations from the mean) are wiped out in the limit by averaging. When a law of large numbers holds almost surely, it is a strong law of large numbers (SLLN); when a law of large numbers holds in probability, it is a weak law of large numbers (WLLN). For a sequence of random vectors (matrices), a SLLN (WLLN) is defined elementwise. There are different versions of the SLLN (WLLN) for various types of random variables. Below is a well known SLLN for i.i.d. random variables. Lemma 5.25 (Kolmogorov) Let {z t } be a sequence of i.i.d. random variables with mean μ o.then, 1 a.s. z T t μ o. This result asserts that, when z t have a finite, common mean μ o, the sample average of z t is essentially close to μ o, a non-stochastic number. Note, however, that i.i.d. random variables need not obey Kolmogorov s SLLN if they do not have a finite mean; for instance, Lemma 5.25 does not apply to i.i.d. Cauchy random variables. As almost sure convergence implies convergence in probability, the same condition in Lemma 5.25 ensures that {z t } also obeys a WLLN. When {z t } is a sequence of independent random variables with possibly heterogeneous distributions, it may still obey a SLLN (WLLN) under a stronger condition. Lemma 5.26 (Markov) Let {z t } be a sequence of independent random variables with non-degenerate distributions such that for some δ>0, IE z t 1+δ is bounded for all t. Then, 1 [z T t IE(z t )] a.s. 0, Comparing to Kolmogorov s SLLN, Lemma 5.26 requires a stronger moment condition: bounded (1+δ) th moment, yet z t need not have a common mean. This SLLN indicates that the sample average of z t eventually behaves like the average of IE(z t ). Note that the average of IE(z t ) may or may not converge; if it does converge to, say, μ, 1 a.s. 1 z T t lim IE(z T T t )=:μ.
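The contrast between Lemma 5.25 and the Cauchy exception is easy to see in a small simulation. The Python sketch below is illustrative only (the distributions, sample sizes, and seed are arbitrary choices, not from the text): averaging wipes out the noise when the mean is finite, whereas the average of $T$ standard Cauchy draws is again standard Cauchy, so its spread never shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)

# i.i.d. exponential(1) draws: finite mean 1, so Kolmogorov's SLLN applies
# and the sample average settles down near 1 as T grows.
T = 100_000
z = rng.exponential(scale=1.0, size=T)
avg_finite_mean = z.mean()

# i.i.d. standard Cauchy draws: no finite mean, so the SLLN fails.  The
# interquartile range of the sample average across replications stays
# near 2 (the IQR of a standard Cauchy) no matter how large T is.
reps = 2_000
cauchy_avgs_small = rng.standard_cauchy((reps, 10)).mean(axis=1)
cauchy_avgs_large = rng.standard_cauchy((reps, 1_000)).mean(axis=1)
iqr_small = np.subtract(*np.percentile(cauchy_avgs_small, [75, 25]))
iqr_large = np.subtract(*np.percentile(cauchy_avgs_large, [75, 25]))

print(avg_finite_mean)       # close to 1
print(iqr_small, iqr_large)  # both close to 2: no averaging-out
```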

Finally, as non-stochastic numbers can be viewed as independent random variables with degenerate distributions, it is understood that a non-stochastic sequence obeys a SLLN if its sample average converges. The following example shows that a sequence of correlated random variables may also obey a WLLN.

Example 5.27 Suppose that $y_t$ is generated as a weakly stationary AR(1) process:

$$y_t = \alpha_o y_{t-1} + u_t, \qquad |\alpha_o| < 1,$$

where the $u_t$ are i.i.d. random variables with mean zero and variance $\sigma_u^2$. In view of Section 4.3, we have $\mathrm{IE}(y_t) = 0$, $\mathrm{var}(y_t) = \sigma_u^2/(1-\alpha_o^2)$, and

$$\mathrm{cov}(y_t, y_{t-j}) = \alpha_o^j\,\frac{\sigma_u^2}{1-\alpha_o^2}.$$

These results imply that $\mathrm{IE}(T^{-1}\sum_{t=1}^{T} y_t) = 0$ and

$$\mathrm{var}\Big(\sum_{t=1}^{T} y_t\Big) = \sum_{t=1}^{T}\mathrm{var}(y_t) + 2\sum_{\tau=1}^{T-1}(T-\tau)\,\mathrm{cov}(y_t, y_{t-\tau}) \leq T\,\mathrm{var}(y_t) + 2T\sum_{\tau=1}^{T-1}\mathrm{cov}(y_t, y_{t-\tau}) = O(T).$$

The latter result shows that $\mathrm{var}(T^{-1}\sum_{t=1}^{T} y_t) = O(T^{-1})$, which converges to zero as $T$ approaches infinity. It follows from Lemma 5.16 that

$$\frac{1}{T}\sum_{t=1}^{T} y_t \xrightarrow{\mathrm{IP}} 0;$$

that is, $\{y_t\}$ obeys a WLLN. A key condition in the argument above is that the variance of $\sum_{t=1}^{T} y_t$ does not grow too rapidly (it is $O(T)$). The facts that $y_t$ has a constant variance and that $\mathrm{cov}(y_t, y_{t-j})$ goes to zero exponentially fast as $j$ tends to infinity are sufficient for this condition. This WLLN result is readily generalized to weakly stationary AR(p) processes.

The example above shows that it may be quite cumbersome to establish a WLLN for weakly stationary processes. The lemma below gives a strong law for correlated random variables and is convenient in practice; see Davidson (1994, p. 326) for a more general result.
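Before stating it, note that the WLLN of Example 5.27 is easy to check by simulation: for a stationary AR(1) process, $T\,\mathrm{var}(\bar y_T)$ stabilizes near the long-run variance $\sigma_u^2/(1-\alpha_o)^2$ instead of growing, so $\mathrm{var}(\bar y_T) = O(T^{-1})$ and the sample average collapses to the mean zero. The Python sketch below is illustrative only; the parameter values, replication counts, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, sigma_u = 0.5, 1.0  # long-run variance sigma_u^2/(1-alpha)^2 = 4

def ar1_path(T):
    """Simulate y_t = alpha*y_{t-1} + u_t from y_0 = 0."""
    u = rng.normal(0.0, sigma_u, T)
    y = np.empty(T)
    prev = 0.0
    for t in range(T):
        prev = alpha * prev + u[t]
        y[t] = prev
    return y

# var(sum y_t) = O(T) means T*var(ybar_T) should stabilize (near 4 here)
# rather than grow with T; estimate it across replications.
reps = 4_000
scaled_var = {}
for T in (50, 200):
    means = np.array([ar1_path(T).mean() for _ in range(reps)])
    scaled_var[T] = T * means.var()

print(scaled_var)  # both entries near 4
```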

Lemma 5.28 Let $y_t = \sum_{i=0}^{\infty}\pi_i u_{t-i}$, where the $u_t$ are i.i.d. random variables with mean zero and variance $\sigma_u^2$. If the $\pi_i$ are absolutely summable, i.e., $\sum_{i=0}^{\infty}|\pi_i| < \infty$, then

$$\frac{1}{T}\sum_{t=1}^{T} y_t \xrightarrow{\text{a.s.}} 0.$$

In Example 5.27, $y_t = \sum_{i=0}^{\infty}\alpha_o^i u_{t-i}$ with $|\alpha_o| < 1$, so that $\sum_{i=0}^{\infty}|\alpha_o|^i < \infty$. Hence, Lemma 5.28 ensures that the average of $y_t$ converges to its mean (zero) almost surely. If $y_t = z_t - \mu_o$, then the average of $z_t$ converges to $\mathrm{IE}(z_t) = \mu_o$ almost surely. Compared with Example 5.27, Lemma 5.28 is quite general and applicable to any process that can be expressed as an MA process with absolutely summable weights.

From Lemmas 5.25, 5.26 and 5.28 we can see that a SLLN (WLLN) does not always hold. The random variables in a sequence must be well behaved (i.e., satisfy certain regularity conditions) to ensure a SLLN (WLLN). In particular, the sufficient conditions for a SLLN (WLLN) usually regulate the moments and the dependence structure of the random variables. Intuitively, random variables without suitably bounded moments may exhibit aberrant behavior, so that their random irregularities cannot be completely averaged out. For random variables with strong correlations over time, the variation of their partial sums may grow too rapidly to be eliminated by simple averaging. More generally, it is also possible for a sequence of weakly dependent and heterogeneously distributed random variables to obey a SLLN (WLLN); this usually requires even stronger conditions on their moments and dependence structure. To avoid technicality, we will not give a SLLN (WLLN) for such general sequences but refer to White (2001) and Davidson (1994) for details. The following examples illustrate why a SLLN (WLLN) may fail to hold.

Example 5.29 Consider the sequences $\{t\}$ and $\{t^2\}$, $t = 1, 2, \ldots$. It is well known that

$$\sum_{t=1}^{T} t = \frac{T(T+1)}{2}, \qquad \sum_{t=1}^{T} t^2 = \frac{T(T+1)(2T+1)}{6}.$$

Hence, $T^{-1}\sum_{t=1}^{T} t$ and $T^{-1}\sum_{t=1}^{T} t^2$ both diverge. Thus, these sequences do not obey a SLLN.

Example 5.30 Suppose that the $u_t$ are i.i.d. random variables with mean zero and variance $\sigma_u^2$. Thus, $T^{-1}\sum_{t=1}^{T} u_t \xrightarrow{\text{a.s.}} 0$ by Kolmogorov's SLLN (Lemma 5.25). Consider now $\{t u_t\}$.

This sequence does not have a bounded $(1+\delta)$th moment, because $\mathrm{IE}|t u_t|^{1+\delta}$ grows with $t$, and it therefore does not obey Markov's SLLN (Lemma 5.26). Moreover, note that

$$\mathrm{var}\Big(\sum_{t=1}^{T} t u_t\Big) = \sum_{t=1}^{T} t^2\,\mathrm{var}(u_t) = \sigma_u^2\,\frac{T(T+1)(2T+1)}{6}.$$

By Exercise 5.11, $\sum_{t=1}^{T} t u_t = O_{\mathrm{IP}}(T^{3/2})$. It follows that $T^{-1}\sum_{t=1}^{T} t u_t = O_{\mathrm{IP}}(T^{1/2})$, which diverges in probability. This shows that the sequence $\{t u_t\}$ does not obey a WLLN either.

Example 5.31 Suppose that $y_t$ is generated as a random walk:

$$y_t = y_{t-1} + u_t, \qquad t = 1, 2, \ldots,$$

with $y_0 = 0$, where the $u_t$ are i.i.d. random variables with mean zero and variance $\sigma_u^2$. Clearly,

$$y_t = \sum_{i=1}^{t} u_i,$$

which has mean zero and unbounded variance $t\sigma_u^2$. For $s < t$, write

$$y_t = y_s + \sum_{i=s+1}^{t} u_i = y_s + v_{t-s},$$

where $v_{t-s} = \sum_{i=s+1}^{t} u_i$ is independent of $y_s$. We then have

$$\mathrm{cov}(y_t, y_s) = \mathrm{IE}(y_s^2) = s\sigma_u^2, \qquad t > s.$$

Consequently,

$$\mathrm{var}\Big(\sum_{t=1}^{T} y_t\Big) = \sum_{t=1}^{T}\mathrm{var}(y_t) + 2\sum_{\tau=1}^{T-1}\sum_{t=\tau+1}^{T}\mathrm{cov}(y_t, y_{t-\tau}).$$

It can be verified that the first term on the right-hand side is

$$\sum_{t=1}^{T}\mathrm{var}(y_t) = \sum_{t=1}^{T} t\sigma_u^2 = O(T^2),$$

and that the second term is

$$2\sum_{\tau=1}^{T-1}\sum_{t=\tau+1}^{T}\mathrm{cov}(y_t, y_{t-\tau}) = 2\sum_{\tau=1}^{T-1}\sum_{t=\tau+1}^{T}(t-\tau)\sigma_u^2 = O(T^3).$$

Thus, $\mathrm{var}(\sum_{t=1}^{T} y_t) = O(T^3)$, so that $\sum_{t=1}^{T} y_t = O_{\mathrm{IP}}(T^{3/2})$ by Exercise 5.11. This shows that

$$\frac{1}{T}\sum_{t=1}^{T} y_t = O_{\mathrm{IP}}(T^{1/2}),$$

which diverges in probability. Hence, when $\{y_t\}$ is a random walk, it does not obey a WLLN. In this case, the $y_t$ have unbounded variances and strong correlations over time. Due to these correlations, the variation of the partial sum of $y_t$ grows much too fast. (Recall that the variance of $\sum_{t=1}^{T} y_t$ is only $O(T)$ in Example 5.27.) These conclusions are not altered when $\{u_t\}$ is a white noise or a weakly stationary process.

Example 5.32 Suppose that $y_t$ is generated as a random walk, $y_t = y_{t-1} + u_t$, $t = 1, 2, \ldots$, with $y_0 = 0$, as in Example 5.31. Then the sequence $\{y_{t-1} u_t\}$ has mean zero and

$$\mathrm{var}(y_{t-1} u_t) = \mathrm{IE}(y_{t-1}^2)\,\mathrm{IE}(u_t^2) = (t-1)\sigma_u^4.$$

More interestingly, it can be seen that for $s < t$,

$$\mathrm{cov}(y_{t-1} u_t, y_{s-1} u_s) = \mathrm{IE}(y_{t-1} y_{s-1} u_s)\,\mathrm{IE}(u_t) = 0.$$

We then have

$$\mathrm{var}\Big(\sum_{t=1}^{T} y_{t-1} u_t\Big) = \sum_{t=1}^{T}\mathrm{var}(y_{t-1} u_t) = \sum_{t=1}^{T}(t-1)\sigma_u^4 = O(T^2),$$

and $\sum_{t=1}^{T} y_{t-1} u_t = O_{\mathrm{IP}}(T)$. Note, however, that $\mathrm{var}(T^{-1}\sum_{t=1}^{T} y_{t-1} u_t)$ converges to $\sigma_u^4/2$ rather than to $0$. Thus, $T^{-1}\sum_{t=1}^{T} y_{t-1} u_t$ cannot behave like a non-stochastic number in the limit. This shows that $\{y_{t-1} u_t\}$ does not obey a WLLN, even though its partial sums are $O_{\mathrm{IP}}(T)$.

In the asymptotic analysis of econometric estimators and test statistics, we usually encounter functions of several random variables, e.g., the product of two random variables. In some cases, it is easy to find sufficient conditions ensuring a SLLN (WLLN) for such functions. For example, suppose that $z_t = x_t y_t$, where $\{x_t\}$ and $\{y_t\}$ are two mutually independent sequences of independent random variables, each with a bounded $(2+\delta)$th moment. Then the $z_t$ are also independent random variables and have a bounded

$(1+\delta)$th moment by the Cauchy–Schwarz inequality. Lemma 5.26 then provides the SLLN for $\{z_t\}$. When $\{x_t\}$ and $\{y_t\}$ are two sequences of correlated (or weakly dependent) random variables, it is more cumbersome to find suitable conditions on $x_t$ and $y_t$ that ensure a SLLN (WLLN).

In what follows, a sequence of integrable random variables $z_t$ is said to obey a SLLN if

$$\frac{1}{T}\sum_{t=1}^{T}\left[z_t - \mathrm{IE}(z_t)\right] \xrightarrow{\text{a.s.}} 0; \tag{5.1}$$

it is said to obey a WLLN if the almost sure convergence above is replaced by convergence in probability. When $\mathrm{IE}(z_t)$ is a constant $\mu_o$, (5.1) simplifies to

$$\frac{1}{T}\sum_{t=1}^{T} z_t \xrightarrow{\text{a.s.}} \mu_o.$$

In our analysis, we may only invoke this generic SLLN (WLLN).

5.6 Uniform Law of Large Numbers

It is also common to deal with functions of random variables and model parameters. For example, $q(z_t(\omega); \theta)$ is a random variable for a given parameter $\theta$, and it is a function of $\theta$ for a given $\omega$. When $\theta$ is fixed, we may impose conditions on $q$ and $z_t$ such that $\{q(z_t(\omega); \theta)\}$ obeys a SLLN (WLLN), as discussed in Section 5.5. When $\theta$ assumes values in the parameter space $\Theta$, a SLLN (WLLN) that does not depend on $\theta$ is then needed. More specifically, suppose that $\{q(z_t; \theta)\}$ obeys a SLLN for each $\theta \in \Theta$:

$$Q_T(\omega; \theta) = \frac{1}{T}\sum_{t=1}^{T} q(z_t(\omega); \theta) \xrightarrow{\text{a.s.}} Q(\theta),$$

where $Q(\theta)$ is a non-stochastic function of $\theta$. As this convergence behavior may depend on $\theta$, the non-convergence set $\Omega_0^c(\theta) = \{\omega : Q_T(\omega; \theta) \not\to Q(\theta)\}$ varies with $\theta$. When $\Theta$ is an interval of $\mathbb{R}$, $\cup_{\theta\in\Theta}\,\Omega_0^c(\theta)$ is an uncountable union of non-convergence sets and hence may not have probability zero, even though each $\Omega_0^c(\theta)$ does. Thus, the event that $Q_T(\omega; \theta) \to Q(\theta)$ for all $\theta$, i.e., $\cap_{\theta\in\Theta}\,\Omega_0(\theta)$, may occur with probability less than one. In fact, the union of all $\Omega_0^c(\theta)$ may not even be in $\mathcal{F}$ (only countable unions of the elements in $\mathcal{F}$ are guaranteed to be in $\mathcal{F}$). If so, we cannot conclude anything regarding the convergence of $Q_T(\omega; \theta)$. Worse still is when $\theta$ also depends on $T$, as in the case where $\theta$ is replaced

by an estimator $\hat\theta_T$. There may not exist a finite $\bar T$ such that $Q_T(\omega; \hat\theta_T)$ is arbitrarily close to $Q(\hat\theta_T)$ for all $T > \bar T$. These observations suggest that we should study convergence that is uniform on the parameter space $\Theta$. In particular, $Q_T(\omega; \theta)$ converges to $Q(\theta)$ uniformly in $\theta$ almost surely (in probability) if the largest possible difference vanishes:

$$\sup_{\theta\in\Theta}\left|Q_T(\theta) - Q(\theta)\right| \to 0, \quad \text{a.s. (in probability)}.$$

In what follows we always assume that this supremum is a random variable for all $T$. The example below, similar to Example 2.14 of Davidson (1994), illustrates the difference between uniform and pointwise convergence.

Example 5.33 Let $z_t$ be i.i.d. random variables with mean zero, and

$$q_T(z_t(\omega); \theta) = z_t(\omega) + \begin{cases} T\theta, & 0 \leq \theta \leq \frac{1}{2T}, \\ 1 - T\theta, & \frac{1}{2T} < \theta \leq \frac{1}{T}, \\ 0, & \frac{1}{T} < \theta < \infty. \end{cases}$$

Observe that for $\theta \geq 1/T$ and for $\theta = 0$,

$$Q_T(\omega; \theta) = \frac{1}{T}\sum_{t=1}^{T} q_T(z_t; \theta) = \frac{1}{T}\sum_{t=1}^{T} z_t,$$

which converges to zero almost surely by Kolmogorov's SLLN. Thus, for any given $\theta$, we can always choose $T$ large enough such that $Q_T(\omega; \theta) \xrightarrow{\text{a.s.}} 0$, where $0$ is the pointwise limit. On the other hand, with $\Theta = [0, \infty)$,

$$\sup_{\theta\in\Theta} Q_T(\omega; \theta) = \bar z_T + \frac{1}{2} \xrightarrow{\text{a.s.}} \frac{1}{2},$$

so that the uniform limit is different from the pointwise limit.

Let $z_{Tt}$ denote the $t$-th random variable in a sample of $T$ variables. These random variables are indexed by both $T$ and $t$ and form a triangular array. In this array, there is only one random variable $z_{11}$ when $T = 1$, there are two random variables $z_{21}$ and $z_{22}$ when $T = 2$, there are three random variables $z_{31}$, $z_{32}$ and $z_{33}$ when $T = 3$, and so on. If this array does not depend on $T$, it is simply a sequence of random variables. We now consider a triangular array of functions $q_{Tt}(z_t; \theta)$, $t = 1, 2, \ldots, T$, where the $z_t$ are integrable random vectors and $\theta$ is the parameter vector taking values in the parameter

space $\Theta \subseteq \mathbb{R}^m$. For notational simplicity, we will not explicitly write $\omega$ in the functions. We say that $\{q_{Tt}(z_t; \theta)\}$ obeys a strong uniform law of large numbers (SULLN) if

$$\sup_{\theta\in\Theta}\left|\frac{1}{T}\sum_{t=1}^{T}\left[q_{Tt}(z_t; \theta) - \mathrm{IE}(q_{Tt}(z_t; \theta))\right]\right| \xrightarrow{\text{a.s.}} 0; \tag{5.2}$$

cf. (5.1). Similarly, $\{q_{Tt}(z_t; \theta)\}$ is said to obey a weak uniform law of large numbers (WULLN) if the convergence above holds in probability. If $q_{Tt}$ is vector-valued, the SULLN (WULLN) is defined elementwise.

We have seen that pointwise convergence alone does not imply uniform convergence. An interesting question is: what additional conditions are required to guarantee uniform convergence? Let

$$Q_T(\theta) = \frac{1}{T}\sum_{t=1}^{T}\left[q_{Tt}(z_t; \theta) - \mathrm{IE}(q_{Tt}(z_t; \theta))\right].$$

Suppose that $Q_T$ satisfies the following Lipschitz-type continuity requirement: for $\theta$ and $\theta^{\dagger}$ in $\Theta$,

$$\left|Q_T(\theta) - Q_T(\theta^{\dagger})\right| \leq C_T\,\|\theta - \theta^{\dagger}\| \quad \text{a.s.},$$

where $\|\cdot\|$ denotes the Euclidean norm, and $C_T$ is a random variable that is bounded almost surely and does not depend on $\theta$. Under this condition, $Q_T(\theta)$ can be made arbitrarily close to $Q_T(\theta^{\dagger})$, provided that $\theta$ is sufficiently close to $\theta^{\dagger}$. Using the triangle inequality and taking the supremum over $\theta$, we have

$$\sup_{\theta\in\Theta}\left|Q_T(\theta)\right| \leq \sup_{\theta\in\Theta}\left|Q_T(\theta) - Q_T(\theta^{\dagger})\right| + \left|Q_T(\theta^{\dagger})\right|.$$

Let $\Delta$ denote an almost sure bound of $C_T$. Then, given any $\epsilon > 0$, choosing $\theta^{\dagger}$ such that $\|\theta - \theta^{\dagger}\| < \epsilon/(2\Delta)$ implies

$$\sup_{\theta\in\Theta}\left|Q_T(\theta) - Q_T(\theta^{\dagger})\right| \leq C_T\,\frac{\epsilon}{2\Delta} \leq \frac{\epsilon}{2},$$

uniformly in $T$. Moreover, because $Q_T(\theta)$ converges to $0$ almost surely for each $\theta$ in $\Theta$, $|Q_T(\theta^{\dagger})|$ is also less than $\epsilon/2$ for sufficiently large $T$. Consequently,

$$\sup_{\theta\in\Theta}\left|Q_T(\theta)\right| \leq \epsilon,$$

for all $T$ sufficiently large. As these results hold almost surely, we have a SULLN for $Q_T(\theta)$; the conditions ensuring a WULLN are analogous.
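The Lipschitz argument above can be seen at work in a simple case. Take $q(z_t; \theta) = (z_t - \theta)^2$ with $z_t \sim N(0,1)$ and $\Theta = [-1, 1]$, so that $\mathrm{IE}\,q(z_t; \theta) = 1 + \theta^2$ and a Lipschitz constant is $C_T = 2(|\bar z_T| + 1)$, which is $O_{\mathrm{IP}}(1)$. The Python sketch below is illustrative only (this particular $q$, the grid, and the seed are choices made here, not taken from the text); it checks that the sample criterion converges to its limit uniformly over a fine grid of $\theta$ values.

```python
import numpy as np

rng = np.random.default_rng(2)

# q(z; theta) = (z - theta)^2 with z ~ N(0,1), so IE q(z; theta) = 1 + theta^2.
# For each theta the SLLN applies, and |Q_T(a) - Q_T(b)| <= C_T |a - b|
# with C_T = 2*(|z.mean()| + 1) on Theta = [-1, 1], so the pointwise
# convergence upgrades to uniform convergence.
T = 100_000
z = rng.normal(size=T)
thetas = np.linspace(-1.0, 1.0, 401)

Q_T = np.array([np.mean((z - th) ** 2) for th in thetas])
Q_limit = 1.0 + thetas ** 2
sup_dev = np.max(np.abs(Q_T - Q_limit))

print(sup_dev)  # small: near-uniform convergence over the grid
```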

Lemma 5.34 Suppose that for each $\theta \in \Theta$, $\{q_{Tt}(z_t; \theta)\}$ obeys a SLLN (WLLN), and that for $\theta, \theta^{\dagger} \in \Theta$,

$$\left|Q_T(\theta) - Q_T(\theta^{\dagger})\right| \leq C_T\,\|\theta - \theta^{\dagger}\| \quad \text{a.s.},$$

where $C_T$ is a random variable bounded almost surely (in probability) that does not depend on $\theta$. Then $\{q_{Tt}(z_t; \theta)\}$ obeys a SULLN (WULLN).

Lemma 5.34 is quite convenient for establishing a SULLN (WULLN) because it requires only two conditions. First, the random functions must obey a standard SLLN (WLLN) for each $\theta$ in the parameter space. Second, the function $q_{Tt}$ must satisfy a Lipschitz-type continuity condition, which amounts to requiring $q_{Tt}$ to be sufficiently smooth in its second argument. Note, however, that $C_T$ being bounded almost surely may imply that the random variables in $q_{Tt}$ are also bounded almost surely; this requirement is much too restrictive in applications. Hence, a SULLN may not be readily obtained from Lemma 5.34. On the other hand, a WULLN is practically more plausible because the requirement that $C_T$ be $O_{\mathrm{IP}}(1)$ is much weaker. For example, boundedness of $\mathrm{IE}|C_T|$ is sufficient for $C_T$ to be $O_{\mathrm{IP}}(1)$, by Markov's inequality. For more specific conditions ensuring these requirements we refer to Gallant and White (1988) and Bierens (1994).

5.7 Central Limit Theorem

The central limit theorem (CLT) is another important result in probability theory. When a CLT holds, the distributions of suitably normalized averages of random variables are close to the standard normal distribution in the limit, regardless of the original distributions of these random variables. This is a very powerful result in applications because, as far as the approximation of normalized sample averages is concerned, only the standard normal distribution matters. There are also different versions of the CLT for various types of random variables. The following CLT applies to i.i.d. random variables.

Lemma 5.35 (Lindeberg–Lévy) Let $\{z_t\}$ be a sequence of i.i.d. random variables with mean $\mu_o$ and variance $\sigma_o^2 > 0$. Then,

$$\frac{\sqrt{T}\,(\bar z_T - \mu_o)}{\sigma_o} \xrightarrow{D} N(0, 1).$$
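A simulation conveys what Lemma 5.35 asserts: even for markedly skewed summands, the normalized average is approximately standard normal. The Python sketch below uses exponential(1) draws, for which $\mu_o = \sigma_o = 1$; the sample sizes and seed are arbitrary choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(3)

# z_t i.i.d. exponential(1): mean mu_o = 1, variance sigma_o^2 = 1.
# Form sqrt(T)*(zbar_T - mu_o)/sigma_o across many replications and
# compare with N(0,1): unit spread, and P(stat <= 1.96) near 0.975.
T, reps = 400, 10_000
zbar = rng.exponential(1.0, size=(reps, T)).mean(axis=1)
stat = np.sqrt(T) * (zbar - 1.0) / 1.0

print(stat.std())             # close to 1
print(np.mean(stat <= 1.96))  # close to 0.975
```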

A sequence of i.i.d. random variables need not obey this CLT if they do not have a finite variance, e.g., random variables with the $t(2)$ distribution. Comparing with Lemma 5.25, one can immediately see that the Lindeberg–Lévy CLT requires a stronger condition (i.e., finite variance) than does Kolmogorov's SLLN.

Remark: Here, $\bar z_T$ converges to $\mu_o$ in probability, and its variance $\sigma_o^2/T$ vanishes as $T$ tends to infinity. To prevent a degenerate distribution in the limit, it is natural to consider the normalized average $T^{1/2}(\bar z_T - \mu_o)$, which has the constant variance $\sigma_o^2$ for all $T$. This explains why the normalizing factor $T^{1/2}$ is needed. For a normalizing factor $T^a$ with $a < 1/2$, the normalized average still converges to zero because its variance vanishes in the limit; for a normalizing factor $T^a$ with $a > 1/2$, the normalized average diverges. In both cases, the resulting normalized averages cannot have a well-behaved, non-degenerate distribution in the limit. Thus, when $\{z_t\}$ obeys a CLT, $\bar z_T$ is said to converge to $\mu_o$ at the rate $T^{1/2}$.

Independent but not identically distributed random variables may also obey a CLT. Below is a version of Liapunov's CLT for such random variables.

Lemma 5.36 Let $\{z_{Tt}\}$ be a triangular array of independent random variables with means $\mu_{Tt}$ and variances $\sigma_{Tt}^2 > 0$ such that

$$\bar\sigma_T^2 = \frac{1}{T}\sum_{t=1}^{T}\sigma_{Tt}^2 \to \sigma_o^2 > 0.$$

If for some $\delta > 0$, $\mathrm{IE}|z_{Tt}|^{2+\delta}$ is bounded for all $t$, then

$$\frac{\sqrt{T}\,(\bar z_T - \bar\mu_T)}{\sigma_o} \xrightarrow{D} N(0, 1),$$

where $\bar z_T$ and $\bar\mu_T$ are the averages of the $z_{Tt}$ and $\mu_{Tt}$, respectively. Note that this result requires a stronger condition (a bounded $(2+\delta)$th moment) than does Markov's SLLN (Lemma 5.26). Compared with the Lindeberg–Lévy CLT, Lemma 5.36 allows the means and variances to vary with $t$, at the expense of a stronger moment condition.

The sufficient conditions for a CLT are similar to, but usually stronger than, those for a WLLN. In particular, the random variables that obey a CLT have bounded moments up to some higher order and are asymptotically independent, with dependence vanishing sufficiently fast. Moreover, every random variable must also be asymptotically negligible, in the sense that no single random variable dominates the behavior of the partial sums.
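As a final illustration, the heterogeneous-variance case of Lemma 5.36 can also be simulated: independent draws whose scale varies with $t$ still yield an asymptotically standard normal normalized average once scaled by $\bar\sigma_T$. The Python sketch below is illustrative only; the scale pattern, sample sizes, and seed are choices made here, not from the text.

```python
import numpy as np

rng = np.random.default_rng(4)

# Independent z_Tt ~ Uniform[-c_t, c_t] with t-varying scale c_t, so
# mu_Tt = 0 and sigma_Tt^2 = c_t^2/3: heterogeneous variances, but all
# (2+delta)th moments are bounded, as Lemma 5.36 requires.
T, reps = 1_000, 5_000
t = np.arange(1, T + 1)
c = 1.0 + 0.5 * np.sin(t)           # scales bounded away from 0 and infinity
sigma2_bar = np.mean(c ** 2 / 3.0)  # average variance, sigma_bar_T^2

u = rng.uniform(-1.0, 1.0, size=(reps, T)) * c  # row-wise heterogeneous draws
stat = np.sqrt(T) * u.mean(axis=1) / np.sqrt(sigma2_bar)

print(stat.std())  # close to 1: the CLT survives the heterogeneity
```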

Elements of Probability Theory. Chung-Ming Kuan, Department of Finance, National Taiwan University, December 5, 2009.


+ 2x sin x. f(b i ) f(a i ) < ɛ. i=1. i=1

+ 2x sin x. f(b i ) f(a i ) < ɛ. i=1. i=1 Appendix To understand weak derivatives and distributional derivatives in the simplest context of functions of a single variable, we describe without proof some results from real analysis (see [7] and

More information

CHAPTER I THE RIESZ REPRESENTATION THEOREM

CHAPTER I THE RIESZ REPRESENTATION THEOREM CHAPTER I THE RIESZ REPRESENTATION THEOREM We begin our study by identifying certain special kinds of linear functionals on certain special vector spaces of functions. We describe these linear functionals

More information

Analysis Finite and Infinite Sets The Real Numbers The Cantor Set

Analysis Finite and Infinite Sets The Real Numbers The Cantor Set Analysis Finite and Infinite Sets Definition. An initial segment is {n N n n 0 }. Definition. A finite set can be put into one-to-one correspondence with an initial segment. The empty set is also considered

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

Measurable functions are approximately nice, even if look terrible.

Measurable functions are approximately nice, even if look terrible. Tel Aviv University, 2015 Functions of real variables 74 7 Approximation 7a A terrible integrable function........... 74 7b Approximation of sets................ 76 7c Approximation of functions............

More information

Economics 620, Lecture 8: Asymptotics I

Economics 620, Lecture 8: Asymptotics I Economics 620, Lecture 8: Asymptotics I Nicholas M. Kiefer Cornell University Professor N. M. Kiefer (Cornell University) Lecture 8: Asymptotics I 1 / 17 We are interested in the properties of estimators

More information

A Primer on Asymptotics

A Primer on Asymptotics A Primer on Asymptotics Eric Zivot Department of Economics University of Washington September 30, 2003 Revised: October 7, 2009 Introduction The two main concepts in asymptotic theory covered in these

More information

Proofs for Large Sample Properties of Generalized Method of Moments Estimators

Proofs for Large Sample Properties of Generalized Method of Moments Estimators Proofs for Large Sample Properties of Generalized Method of Moments Estimators Lars Peter Hansen University of Chicago March 8, 2012 1 Introduction Econometrica did not publish many of the proofs in my

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

P (A G) dp G P (A G)

P (A G) dp G P (A G) First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume

More information

01. Review of metric spaces and point-set topology. 1. Euclidean spaces

01. Review of metric spaces and point-set topology. 1. Euclidean spaces (October 3, 017) 01. Review of metric spaces and point-set topology Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 017-18/01

More information

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero Chapter Limits of Sequences Calculus Student: lim s n = 0 means the s n are getting closer and closer to zero but never gets there. Instructor: ARGHHHHH! Exercise. Think of a better response for the instructor.

More information

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space.

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Hilbert Spaces Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Vector Space. Vector space, ν, over the field of complex numbers,

More information

Probability and Measure

Probability and Measure Probability and Measure Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Convergence of Random Variables 1. Convergence Concepts 1.1. Convergence of Real

More information

5: MULTIVARATE STATIONARY PROCESSES

5: MULTIVARATE STATIONARY PROCESSES 5: MULTIVARATE STATIONARY PROCESSES 1 1 Some Preliminary Definitions and Concepts Random Vector: A vector X = (X 1,..., X n ) whose components are scalarvalued random variables on the same probability

More information

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory Part V 7 Introduction: What are measures and why measurable sets Lebesgue Integration Theory Definition 7. (Preliminary). A measure on a set is a function :2 [ ] such that. () = 2. If { } = is a finite

More information

Continuity. Chapter 4

Continuity. Chapter 4 Chapter 4 Continuity Throughout this chapter D is a nonempty subset of the real numbers. We recall the definition of a function. Definition 4.1. A function from D into R, denoted f : D R, is a subset of

More information

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms.

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. Vector Spaces Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. For each two vectors a, b ν there exists a summation procedure: a +

More information

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2) 14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence

More information

Building Infinite Processes from Finite-Dimensional Distributions

Building Infinite Processes from Finite-Dimensional Distributions Chapter 2 Building Infinite Processes from Finite-Dimensional Distributions Section 2.1 introduces the finite-dimensional distributions of a stochastic process, and shows how they determine its infinite-dimensional

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Section 2.6 (cont.) Properties of Real Functions Here we first study properties of functions from R to R, making use of the additional structure

More information

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality. 88 Chapter 5 Distribution Theory In this chapter, we summarize the distributions related to the normal distribution that occur in linear models. Before turning to this general problem that assumes normal

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

212a1214Daniell s integration theory.

212a1214Daniell s integration theory. 212a1214 Daniell s integration theory. October 30, 2014 Daniell s idea was to take the axiomatic properties of the integral as the starting point and develop integration for broader and broader classes

More information

Lebesgue Measure on R n

Lebesgue Measure on R n 8 CHAPTER 2 Lebesgue Measure on R n Our goal is to construct a notion of the volume, or Lebesgue measure, of rather general subsets of R n that reduces to the usual volume of elementary geometrical sets

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information

REAL VARIABLES: PROBLEM SET 1. = x limsup E k

REAL VARIABLES: PROBLEM SET 1. = x limsup E k REAL VARIABLES: PROBLEM SET 1 BEN ELDER 1. Problem 1.1a First let s prove that limsup E k consists of those points which belong to infinitely many E k. From equation 1.1: limsup E k = E k For limsup E

More information

5 Birkhoff s Ergodic Theorem

5 Birkhoff s Ergodic Theorem 5 Birkhoff s Ergodic Theorem Birkhoff s Ergodic Theorem extends the validity of Kolmogorov s strong law to the class of stationary sequences of random variables. Stationary sequences occur naturally even

More information

Probability Background

Probability Background Probability Background Namrata Vaswani, Iowa State University August 24, 2015 Probability recap 1: EE 322 notes Quick test of concepts: Given random variables X 1, X 2,... X n. Compute the PDF of the second

More information

II - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define

II - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define 1 Measures 1.1 Jordan content in R N II - REAL ANALYSIS Let I be an interval in R. Then its 1-content is defined as c 1 (I) := b a if I is bounded with endpoints a, b. If I is unbounded, we define c 1

More information

COMPLEX ANALYSIS Spring 2014

COMPLEX ANALYSIS Spring 2014 COMPLEX ANALYSIS Spring 204 Cauchy and Runge Under the Same Roof. These notes can be used as an alternative to Section 5.5 of Chapter 2 in the textbook. They assume the theorem on winding numbers of the

More information

Random Process Lecture 1. Fundamentals of Probability

Random Process Lecture 1. Fundamentals of Probability Random Process Lecture 1. Fundamentals of Probability Husheng Li Min Kao Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Spring, 2016 1/43 Outline 2/43 1 Syllabus

More information

Abstract Measure Theory

Abstract Measure Theory 2 Abstract Measure Theory Lebesgue measure is one of the premier examples of a measure on R d, but it is not the only measure and certainly not the only important measure on R d. Further, R d is not the

More information

A User s Guide to Measure Theoretic Probability Errata and comments

A User s Guide to Measure Theoretic Probability Errata and comments A User s Guide to Measure Theoretic Probability Errata and comments Chapter 2. page 25, line -3: Upper limit on sum should be 2 4 n page 34, line -10: case of a probability measure page 35, line 20 23:

More information

PROBLEMS. (b) (Polarization Identity) Show that in any inner product space

PROBLEMS. (b) (Polarization Identity) Show that in any inner product space 1 Professor Carl Cowen Math 54600 Fall 09 PROBLEMS 1. (Geometry in Inner Product Spaces) (a) (Parallelogram Law) Show that in any inner product space x + y 2 + x y 2 = 2( x 2 + y 2 ). (b) (Polarization

More information

( f ^ M _ M 0 )dµ (5.1)

( f ^ M _ M 0 )dµ (5.1) 47 5. LEBESGUE INTEGRAL: GENERAL CASE Although the Lebesgue integral defined in the previous chapter is in many ways much better behaved than the Riemann integral, it shares its restriction to bounded

More information

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770 Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770 Jonathan B. Hill Dept. of Economics University of North Carolina - Chapel Hill November

More information

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3 Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,

More information

JUHA KINNUNEN. Harmonic Analysis

JUHA KINNUNEN. Harmonic Analysis JUHA KINNUNEN Harmonic Analysis Department of Mathematics and Systems Analysis, Aalto University 27 Contents Calderón-Zygmund decomposition. Dyadic subcubes of a cube.........................2 Dyadic cubes

More information

Problem List MATH 5143 Fall, 2013

Problem List MATH 5143 Fall, 2013 Problem List MATH 5143 Fall, 2013 On any problem you may use the result of any previous problem (even if you were not able to do it) and any information given in class up to the moment the problem was

More information

IEOR 6711: Stochastic Models I Fall 2013, Professor Whitt Lecture Notes, Thursday, September 5 Modes of Convergence

IEOR 6711: Stochastic Models I Fall 2013, Professor Whitt Lecture Notes, Thursday, September 5 Modes of Convergence IEOR 6711: Stochastic Models I Fall 2013, Professor Whitt Lecture Notes, Thursday, September 5 Modes of Convergence 1 Overview We started by stating the two principal laws of large numbers: the strong

More information

Notes 1 : Measure-theoretic foundations I

Notes 1 : Measure-theoretic foundations I Notes 1 : Measure-theoretic foundations I Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Section 1.0-1.8, 2.1-2.3, 3.1-3.11], [Fel68, Sections 7.2, 8.1, 9.6], [Dur10,

More information

In terms of measures: Exercise 1. Existence of a Gaussian process: Theorem 2. Remark 3.

In terms of measures: Exercise 1. Existence of a Gaussian process: Theorem 2. Remark 3. 1. GAUSSIAN PROCESSES A Gaussian process on a set T is a collection of random variables X =(X t ) t T on a common probability space such that for any n 1 and any t 1,...,t n T, the vector (X(t 1 ),...,X(t

More information

CONVERGENCE OF RANDOM SERIES AND MARTINGALES

CONVERGENCE OF RANDOM SERIES AND MARTINGALES CONVERGENCE OF RANDOM SERIES AND MARTINGALES WESLEY LEE Abstract. This paper is an introduction to probability from a measuretheoretic standpoint. After covering probability spaces, it delves into the

More information

MORE ON CONTINUOUS FUNCTIONS AND SETS

MORE ON CONTINUOUS FUNCTIONS AND SETS Chapter 6 MORE ON CONTINUOUS FUNCTIONS AND SETS This chapter can be considered enrichment material containing also several more advanced topics and may be skipped in its entirety. You can proceed directly

More information

Basic Measure and Integration Theory. Michael L. Carroll

Basic Measure and Integration Theory. Michael L. Carroll Basic Measure and Integration Theory Michael L. Carroll Sep 22, 2002 Measure Theory: Introduction What is measure theory? Why bother to learn measure theory? 1 What is measure theory? Measure theory is

More information

Integration on Measure Spaces

Integration on Measure Spaces Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of

More information

1 Sequences of events and their limits

1 Sequences of events and their limits O.H. Probability II (MATH 2647 M15 1 Sequences of events and their limits 1.1 Monotone sequences of events Sequences of events arise naturally when a probabilistic experiment is repeated many times. For

More information

Part III. 10 Topological Space Basics. Topological Spaces

Part III. 10 Topological Space Basics. Topological Spaces Part III 10 Topological Space Basics Topological Spaces Using the metric space results above as motivation we will axiomatize the notion of being an open set to more general settings. Definition 10.1.

More information

Contents Ordered Fields... 2 Ordered sets and fields... 2 Construction of the Reals 1: Dedekind Cuts... 2 Metric Spaces... 3

Contents Ordered Fields... 2 Ordered sets and fields... 2 Construction of the Reals 1: Dedekind Cuts... 2 Metric Spaces... 3 Analysis Math Notes Study Guide Real Analysis Contents Ordered Fields 2 Ordered sets and fields 2 Construction of the Reals 1: Dedekind Cuts 2 Metric Spaces 3 Metric Spaces 3 Definitions 4 Separability

More information

Mathematical Methods for Physics and Engineering

Mathematical Methods for Physics and Engineering Mathematical Methods for Physics and Engineering Lecture notes for PDEs Sergei V. Shabanov Department of Mathematics, University of Florida, Gainesville, FL 32611 USA CHAPTER 1 The integration theory

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 2017 Nadia S. Larsen. 17 November 2017.

Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 2017 Nadia S. Larsen. 17 November 2017. Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 017 Nadia S. Larsen 17 November 017. 1. Construction of the product measure The purpose of these notes is to prove the main

More information