
Large Sample Theory

In statistics, we are interested in the properties of particular random variables (or "estimators"), which are functions of our data. In asymptotic analysis, we focus on describing the properties of estimators when the sample size becomes arbitrarily large. The idea is that, given a reasonably large dataset, the properties of an estimator even when the sample size is finite are similar to its properties when the sample size is arbitrarily large.

In these notes we focus on the large sample properties of sample averages formed from i.i.d. data. That is, assume that X_i ~ i.i.d. F, for i = 1, ..., n, .... Assume EX_i = µ for all i. The sample average after n draws is X̄_n ≡ (1/n) Σ_i X_i. We focus on two important sets of large sample results:

(1) Law of large numbers: X̄_n → EX_i = µ.

(2) Central limit theorem: √n(X̄_n − EX_i) →d N(0, σ²), where σ² = Var(X_i). That is, √n times a sample average looks like (in a precise sense to be defined later) a normal random variable as n gets large.

An important endeavor of asymptotic statistics is to show (1) and (2) under various assumptions on the data sampling process.

Consider a sequence of random variables Z_1, Z_2, ....

Convergence in probability: Z_n →p Z iff for all ϵ > 0, lim_n Prob(|Z_n − Z| < ϵ) = 1. More formally: for all ϵ > 0 and δ > 0, there exists n_0(ϵ, δ) such that for all n > n_0(ϵ, δ): Prob(|Z_n − Z| < ϵ) > 1 − δ.

Note that the limiting variable Z can itself be a random variable. Furthermore, for convergence in probability, the random variables Z_1, Z_2, ... and Z should be defined on the same probability space: if this common sample space is Ω with elements ω, then the statement of convergence in probability is that

Prob(ω ∈ Ω : |Z_n(ω) − Z(ω)| < ϵ) > 1 − δ.

Corresponding to this convergence concept, we have the Weak Law of Large Numbers (WLLN), which is the result that X̄_n →p µ.
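We can sanity-check the WLLN with a small simulation sketch (in Python; the Bernoulli(0.3) distribution, the tolerance ϵ = 0.05, and the 200 replications are illustrative choices, not part of the theory):

```python
import random

# Simulate P(|X_bar_n - mu| < eps) for X_i ~ i.i.d. Bernoulli(0.3), mu = 0.3.
# The WLLN says this probability tends to 1 as n grows.
random.seed(0)

def sample_average(n, p=0.3):
    return sum(1 if random.random() < p else 0 for _ in range(n)) / n

eps = 0.05
for n in [10, 100, 10_000]:
    # estimate P(|X_bar_n - mu| < eps) over 200 replications
    hits = sum(abs(sample_average(n) - 0.3) < eps for _ in range(200))
    print(n, hits / 200)
```

The printed fractions rise toward 1 as n grows, which is exactly the ϵ-δ statement above, with δ shrinking in n.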

Earlier, we had used Chebyshev's inequality to prove a version of the WLLN, under the assumption that the X_i are i.i.d. with mean µ and variance σ². Khinchine's law of large numbers only requires that the mean exists (i.e., is finite), but does not require existence of variances.

Recall Markov's inequality: for positive random variables Z, and p > 0, ϵ > 0, we have P(Z > ϵ) ≤ E(Z^p)/ϵ^p. Take Z = |X_n − X|. Then if E[|X_n − X|^p] → 0 (that is, X_n converges in p-th mean to X), then by Markov's inequality we also have P(|X_n − X| ≥ ϵ) → 0. That is, convergence in p-th mean implies convergence in probability.

Some definitions and results:

Define: the probability limit of a random sequence, denoted plim_n Z_n, is a non-random quantity α such that Z_n →p α.

Stochastic orders of magnitude:

big-O_p (bounded in probability): Z_n = O_p(n^λ) iff for every δ > 0 there exist a finite Δ(δ) and n_0(δ) such that Prob(|Z_n/n^λ| > Δ(δ)) < δ for all n ≥ n_0(δ).

little-o_p: Z_n = o_p(n^λ) iff Z_n/n^λ →p 0. In particular, Z_n = o_p(1) iff Z_n →p 0.

Plim operator theorem: Let Z_n be a k-dimensional random vector, and g(·) a function which is continuous at a constant k-vector point α. Then Z_n →p α implies g(Z_n) →p g(α). In other words: g(plim Z_n) = plim g(Z_n). Proof: see Serfling, Approximation Theorems of Mathematical Statistics, p. 24.

Don't confuse this: plim (1/n) Σ_i g(Z_i) = Eg(Z_i), from the LLN, but this is distinct from plim g((1/n) Σ_i Z_i) = g(EZ_i), using the plim operator result. The two are generally not the same; if g(·) is convex, then plim (1/n) Σ_i g(Z_i) ≥ plim g((1/n) Σ_i Z_i).
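This last distinction is easy to check numerically. A small Python sketch (the choices g(z) = z² and Z_i ~ U[0, 1] are illustrative): here Eg(Z) = 1/3 while g(EZ) = 1/4, and the convexity inequality is visible.

```python
import random

# Compare (1/n) sum g(Z_i)  (->p E g(Z) = 1/3, by the LLN)
# with    g((1/n) sum Z_i)  (->p g(E Z) = 1/4, by the plim operator theorem),
# for g(z) = z^2 (convex) and Z_i ~ i.i.d. U[0,1].
random.seed(0)
n = 100_000
z = [random.random() for _ in range(n)]

mean_of_g = sum(zi ** 2 for zi in z) / n   # approx E[Z^2] = 1/3
g_of_mean = (sum(z) / n) ** 2              # approx (E Z)^2 = 1/4
print(mean_of_g, g_of_mean)
```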

A useful intermediate result ("squeezing"): if X_n →p α (a constant) and Z_n lies between X_n and α with probability 1, then Z_n →p α. (Quick proof: for all ϵ > 0, |X_n − α| < ϵ implies |Z_n − α| < ϵ. Hence P(|Z_n − α| < ϵ) ≥ P(|X_n − α| < ϵ) → 1.)

Convergence almost surely: Z_n →a.s. Z iff P(ω : lim_n Z_n(ω) = Z(ω)) = 1.

As with convergence in probability, almost sure convergence makes sense when the random variables Z_1, ..., Z_n, ... and Z are all defined on the same sample space. Hence, for each element ω ∈ Ω of the sample space, we can associate the sequence Z_1(ω), ..., Z_n(ω), ... as well as the limit point Z(ω).

To understand the probability statement more clearly, assume that all these RVs are defined on the probability space (Ω, B(Ω), P), and define some set-theoretic concepts. Consider a sequence of sets S_1, ..., S_n, ..., all in B(Ω). Unless this sequence is monotonic, it is difficult to talk about a limit of this sequence, so we define the liminf and limsup of the set sequence.

liminf_n S_n ≡ lim_n ∩_{m=n}^∞ S_m = ∪_{n=1}^∞ ∩_{m=n}^∞ S_m = {ω ∈ Ω : ω ∈ S_n for all n ≥ n_0(ω)}.

Note that the sequence of sets ∩_{m=n}^∞ S_m, for n = 1, 2, 3, ..., is a non-decreasing sequence of sets. By taking the union, the liminf is thus the limit of this monotone sequence of sets. That is, for all ω ∈ liminf S_n, there exists some number n_0(ω) such that ω ∈ S_n for all n ≥ n_0(ω). Hence we can say that liminf S_n is the set of outcomes which occur "eventually".

limsup_n S_n ≡ lim_n ∪_{m=n}^∞ S_m = ∩_{n=1}^∞ ∪_{m=n}^∞ S_m = {ω ∈ Ω : for every m, ω ∈ S_{n_0(ω,m)} for some n_0(ω, m) ≥ m}.

Note that the sequence of sets ∪_{m=n}^∞ S_m, for n = 1, 2, 3, ..., is a non-increasing sequence of sets. Hence, the limsup is the limit of this monotone sequence of sets. The limsup S_n is the set of outcomes ω which occur within every tail of the set sequence S_n. Hence we say that an outcome ω ∈ limsup S_n occurs "infinitely often". (Note, however, that all outcomes ω ∈ liminf S_n also occur an infinite number of times along the infinite sequence S_n.)

Note that liminf S_n ⊆ limsup S_n. Hence

P(limsup S_n) ≥ P(liminf S_n);
P(limsup S_n) = 0 implies P(liminf S_n) = 0; and
P(liminf S_n) = 1 implies P(limsup S_n) = 1.

Borel-Cantelli Lemma: if Σ_{i=1}^∞ P(S_i) < ∞ then P(limsup_n S_n) = 0.

Proof:

P(limsup_n S_n) = P(lim_n {∪_{m=n}^∞ S_m}) = lim_n P(∪_{m=n}^∞ S_m) ≤ lim_n Σ_{m=n}^∞ P(S_m),

which equals zero (by the assumption that Σ_{i=1}^∞ P(S_i) < ∞, the tail sums vanish). The first equality (interchange of limit and probability operations) is an application of the Monotone Convergence Theorem, which holds only because the sequence ∪_{m=n}^∞ S_m is monotone.

To apply this to almost-sure convergence, consider the sequence of sets S_n ≡ {ω : |Z_n(ω) − Z(ω)| > ϵ}, for ϵ > 0. Almost-sure convergence is the statement that P(limsup S_n) = 0. Then ∪_{m=n}^∞ S_m denotes the ω's for which the sequence |Z_m(ω) − Z(ω)| exceeds ϵ somewhere in the tail beyond n.
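Before continuing, the Borel-Cantelli Lemma itself can be illustrated by simulation (a Python sketch; independence of the S_n and the choice P(S_n) = 1/n² are illustrative assumptions, used only to make sample paths easy to generate):

```python
import random

# Let S_1, S_2, ... be independent events with P(S_n) = 1/n^2, so that
# sum_n P(S_n) < infinity.  Borel-Cantelli says P(S_n infinitely often) = 0:
# along almost every simulated path, the events stop occurring after some
# finite index.
random.seed(0)

def last_occurrence(max_n=100_000):
    # index of the last S_n observed to occur along one simulated path
    last = 0
    for n in range(1, max_n + 1):
        if random.random() < 1.0 / n ** 2:
            last = n
    return last

paths = [last_occurrence() for _ in range(20)]
print(paths)  # typically small indices: each path sees only finitely many S_n
```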

Then lim_n ∪_{m=n}^∞ S_m denotes all ω's for which the sequence |Z_n(ω) − Z(ω)| exceeds ϵ in every tail: those ω for which Z_n(ω) escapes the ϵ-ball around Z(ω) infinitely often. For these ω's, Z_n(ω) does not converge to Z(ω). For almost-sure convergence, you require the probability of this set to equal zero, i.e. P(lim_n ∪_{m=n}^∞ S_m) = 0.

Corresponding to this convergence concept, we have the Strong Law of Large Numbers (SLLN), which is the result that X̄_n →a.s. µ. Consider that the sample averages are random variables X̄_1(ω), X̄_2(ω), ... defined on the same probability space, say (Ω, B, P), where each ω indexes a sequence X_1(ω), X_2(ω), X_3(ω), .... Consider the set sequence S_n = {ω : |X̄_n(ω) − µ| > ϵ}. The SLLN is the statement that P(limsup S_n) = 0.

Proof (sketch; Davidson, p. 96): For convenience, consider µ = 0. From the above discussion, we see that the two statements "X̄_n →a.s. 0" and "P(|X̄_n| > ϵ, i.o.) = 0 for any ϵ > 0" are equivalent. Therefore, we proceed in proving the SLLN by verifying that Σ_{n=1}^∞ P(|X̄_n| > ϵ) < ∞ for all ϵ > 0, and then applying the Borel-Cantelli Lemma.

As a starting point, we see that Chebyshev's inequality is not good enough by itself; it tells us P(|X̄_n| > ϵ) ≤ σ²/(nϵ²), but Σ_n 1/(nϵ²) = ∞. However, consider the subsequence with indices n²: {1, 4, 9, 16, ...}. Again using Chebyshev's inequality, we have P(|X̄_{n²}| > ϵ) ≤ σ²/(n²ϵ²), and

Σ_{n=1}^∞ σ²/(n²ϵ²) = (σ²/ϵ²)(π²/6) ≈ 1.645 σ²/ϵ² < ∞.

(Another of Euler's greatest hits: Σ_{n=1}^∞ 1/n² = π²/6, a value of the Riemann zeta function.) Hence, by the BC Lemma, we have that X̄_{n²} →a.s. 0.

Next, examine the omitted terms from the sum Σ_{n=1}^∞ P(|X̄_n| > ϵ). Define D_n = max_{n² ≤ k < (n+1)²} |X̄_k − X̄_{n²}|. Note that

X̄_k − X̄_{n²} = (n²/k − 1) X̄_{n²} + (1/k) Σ_{t=n²+1}^k X_t

and, by independence of the X_i, the two terms on the RHS are uncorrelated. Hence

Var(X̄_k − X̄_{n²}) = (1 − n²/k)² (σ²/n²) + ((k − n²)/k²) σ² = σ² (1/n² − 1/k).

By Chebyshev's inequality, then, we have, for n² ≤ k < (n+1)²,

P(D_n > ϵ) ≤ (σ²/ϵ²)(1/n² − 1/(n+1)²);

so

Σ_n P(D_n > ϵ) ≤ (σ²/ϵ²) Σ_n (1/n² − 1/(n+1)²) = σ²/ϵ² < ∞

(the sum telescopes), implying, using the BC Lemma, that D_n →a.s. 0.

Now consider, for n² ≤ l < (n+1)²,

|X̄_l| = |X̄_l − X̄_{n²} + X̄_{n²}| ≤ |X̄_l − X̄_{n²}| + |X̄_{n²}| ≤ D_n + |X̄_{n²}|.

By the above discussion, the RHS converges a.s. to 0. Hence, for all integers l, |X̄_l| is bounded by a random variable which converges a.s. to 0. Thus X̄_l →a.s. 0.

Example: X_i ~ i.i.d. U[0, 1]. Show that X_(1:n) ≡ min_{i=1,...,n} X_i →a.s. 0. Take S_n ≡ {|X_(1:n)| > ϵ}. For all ϵ ∈ (0, 1),

P(|X_(1:n)| > ϵ) = P(X_(1:n) > ϵ) = P(X_i > ϵ, i = 1, ..., n) = (1 − ϵ)^n.

Hence Σ_{n=1}^∞ (1 − ϵ)^n ≤ 1/ϵ < ∞, so the conclusion follows by the BC Lemma.

Theorem: Z_n →a.s. Z implies Z_n →p Z.

Proof:

0 = P(lim_n ∪_{m≥n} {ω : |Z_m(ω) − Z(ω)| > ϵ})
  = lim_n P(∪_{m≥n} {ω : |Z_m(ω) − Z(ω)| > ϵ})
  ≥ lim_n P({ω : |Z_n(ω) − Z(ω)| > ϵ}).

Convergence in Distribution: Z_n →d Z. A sequence of real-valued random variables Z_1, Z_2, Z_3, ... converges in distribution to a random variable Z if

lim_n F_{Z_n}(z) = F_Z(z) for all z s.t. F_Z(z) is continuous.  (1)

This is a statement about the CDFs of the random variables Z and Z_1, Z_2, .... These random variables do not need to be defined on the same probability space. F_Z is also called the limiting distribution of the random sequence.

Alternative definitions of convergence in distribution ("Portmanteau theorem"): Letting F([a, b]) ≡ F(b) − F(a) for a < b, convergence (1) is equivalent to:

F_{Z_n}([a, b]) → F_Z([a, b]) for all [a, b] s.t. F_Z is continuous at a, b.

For all bounded, continuous functions g(·),

∫ g(z) dF_{Z_n}(z) → ∫ g(z) dF_Z(z), i.e., Eg(Z_n) → Eg(Z).  (2)

This second definition of distributional convergence is more useful in advanced settings, because it is extendable to settings where Z_n and Z are general random elements taking values in a metric space.

Lévy's continuity theorem: A sequence {X_n} of random variables converges in distribution to a random variable X if and only if the sequence of characteristic functions {ϕ_{X_n}(t)} converges pointwise (in t) to a function ϕ_X(t) which is continuous at the origin. Then ϕ_X is the characteristic function of X.
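Definition (1) can be checked numerically in a case where F_{Z_n} is computable exactly (a Python sketch; the standardized Binomial(n, 1/2) sequence is an illustrative choice, whose standard normal limit is the CLT discussed below):

```python
import math

# Z_n = (S_n - n p)/sqrt(n p (1-p)) for S_n ~ Binomial(n, p).  Its exact CDF
# F_{Z_n}(z) should approach the N(0,1) CDF at every z (the normal CDF is
# continuous everywhere, so every z qualifies in definition (1)).
def binom_cdf_standardized(n, z, p=0.5):
    cutoff = n * p + z * math.sqrt(n * p * (1 - p))
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1) if k <= cutoff)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for n in [10, 100, 1000]:
    print(n, [round(abs(binom_cdf_standardized(n, z) - norm_cdf(z)), 4)
              for z in (-1.0, 0.5, 1.5)])
```

The printed gaps |F_{Z_n}(z) − F_Z(z)| shrink as n grows, at each of the test points z.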

Hence, this ties together convergence of characteristic functions and convergence in distribution. This theorem will be used to prove the CLT later. For proofs of most results here, see Serfling, Approximation Theorems of Mathematical Statistics, ch. 1.

Distributional convergence, defined above, is also called weak convergence. This is because there are stronger notions of distributional convergence. One such notion is

sup_{A ∈ B(R)} |P_{Z_n}(A) − P_Z(A)| → 0,

which is convergence in total variation norm.

Example: Multinomial distribution on [0, 1]: X_n ∈ {1/n, 2/n, 3/n, ..., n/n = 1}, each with probability 1/n. Consider A = Q, the set of rational numbers in [0, 1]. Then P_{X_n}(A) = 1 for every n, while X_n →d U[0, 1] and the uniform distribution assigns probability zero to A: distributional convergence holds, but convergence in total variation fails.

Some definitions and results:

Slutsky Theorem: If Z_n →d Z, and Y_n →p α (a constant), then (a) Y_n Z_n →d αZ; (b) Z_n + Y_n →d Z + α.

Theorem *: (a) Z_n →p Z implies Z_n →d Z. (b) Z_n →d Z implies Z_n →p Z if Z is a constant.

Note: convergence in probability implies that (roughly speaking) the random variables Z_n (for n large enough) and Z frequently have the same numerical value. Convergence in distribution need not imply this, only that the CDFs of Z_n and Z are similar.

Z_n →d Z implies Z_n = O_p(1). (Use Δ = max(|F_Z^{-1}(1 − ϵ)|, |F_Z^{-1}(ϵ)|) in the definition of O_p(1).)

Note that the LLN tells us that X̄_n →p µ, which implies (trivially) that X̄_n →d µ, a degenerate limiting distribution.

This is not very useful for our purposes, because we are interested in knowing (say) how far X̄_n is from µ, which is unknown. How do we rescale X̄_n so that it has a non-degenerate limiting distribution?

Central Limit Theorem (Lindeberg-Lévy): Let X_i be i.i.d. with mean µ and variance σ². Then

√n (X̄_n − µ)/σ →d N(0, 1).

√n is also called the rate of convergence of the sequence X̄_n. By definition, a rate of convergence is the lowest polynomial in n for which n^p (X̄_n − µ) converges in distribution to a nondegenerate distribution. The rate of convergence √n makes sense: if you blow up X̄_n − µ by a constant (no matter how big), you still get a degenerate limiting distribution. If you blow up by n, then the sequence n(X̄_n − µ) = Σ_{i=1}^n (X_i − µ) will diverge.

Proof via characteristic functions: Consider X_i i.i.d. with mean µ and variance σ². Define

Z_n ≡ √n (X̄_n − µ)/σ = Σ_i (X_i − µ)/(σ√n) ≡ (1/√n) Σ_i W_i.

Note that the W_i ≡ (X_i − µ)/σ are i.i.d. with mean EW_i = 0 and variance VW_i = EW_i² = 1. Let ϕ(t) denote the characteristic function. Then

ϕ_{Z_n}(t) = [ϕ_{W/√n}(t)]^n = [ϕ_W(t/√n)]^n.

Using the second-order Taylor expansion of ϕ_W(t/√n) around 0, we have

ϕ_W(t/√n) = ϕ_W(0) + (t/√n) i EW + (t²/(2n)) i² EW² + o(t²/n)
           = ϕ_W(0) + 0 − t²/(2n) + o(t²/n).

Now we have

log ϕ_{Z_n}(t) = n log{ϕ_W(0) + 0 − t²/(2n) + o(t²/n)}
              = n log{1 − t²/(2n) + o(t²/n)}
              ≈ n [−t²/(2n) + o(t²/n)]
              → −t²/2.

The third line comes from Taylor-expanding log{·} around 1. Hence

ϕ_{Z_n}(t) → exp(−t²/2) = ϕ_{N(0,1)}(t).

Since ϕ_{N(0,1)}(t) is continuous at t = 0, we have Z_n →d N(0, 1) by Lévy's continuity theorem.

Asymptotic approximations for X̄_n

The CLT tells us that √n(X̄_n − µ)/σ has a limiting standard normal distribution. We can use this result to say something about the distribution of X̄_n even when n is finite. That is, we use the CLT to derive an asymptotic approximation for the finite-sample distribution of X̄_n.

The approximation is as follows: starting with the result of the CLT, √n(X̄_n − µ)/σ →d N(0, 1), we "flip over" the result of the CLT:

X̄_n ~a (σ/√n) N(0, 1) + µ, i.e., X̄_n ~a N(µ, σ²/n).

The notation ~a makes explicit that what is on the RHS is an approximation. Note that "X̄_n →d N(µ, σ²/n)" is definitely not true!

This approximation intuitively makes sense: under the assumptions of the LLCLT (the Lindeberg-Lévy CLT), we know that E X̄_n = µ and Var(X̄_n) = σ²/n. What the asymptotic approximation tells us is that the distribution of X̄_n is approximately normal. (Indeed, the approximation is exactly right if we assume that X_i ~ N(µ, σ²) for all i.)
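The quality of this approximation can be checked by simulation with deliberately non-normal data. A Python sketch (the Exponential(1) distribution, n = 200, and 5000 replications are illustrative choices; here µ = σ² = 1):

```python
import math
import random

# Draw many sample means of n Exponential(1) variables; if the asymptotic
# approximation is good, sqrt(n)(X_bar_n - mu) should look like N(0, sigma^2)
# even though the X_i themselves are heavily skewed.
random.seed(0)
n, reps = 200, 5000
means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

z = [math.sqrt(n) * (m - 1.0) for m in means]
frac_within_1 = sum(abs(zi) <= 1.0 for zi in z) / reps
print(frac_within_1)  # N(0,1) puts probability ~0.6827 on [-1, 1]
```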

Asymptotic approximations for functions of X̄_n

Oftentimes, we are not interested per se in approximating the finite-sample distribution of the sample mean X̄_n, but rather of functions of a sample mean. (Later, you will see that the asymptotic approximations for many statistics and estimators that you run across are derived by expressing them as sample averages.)

Continuous mapping theorem: Let g(·) be a continuous function. Then Z_n →d Z implies g(Z_n) →d g(Z). Proof: Serfling, Approximation Theorems of Mathematical Statistics, p. 25. (Note: you still have to figure out what the limiting distribution of g(Z) is. But if you know F_X, then you can get F_{g(X)} by the change of variables formula.)

Note that for any linear function g(X̄_n) = a X̄_n + b, deriving the limiting distribution of g(X̄_n) is no problem (just use Slutsky's Theorem to get a X̄_n + b ~a N(aµ + b, a²σ²/n)). The problem in deriving the distribution of g(X̄_n) arises when g(·) is nonlinear, so that X̄_n is inside the g function:

1. Use the Mean Value Theorem: Let g be a continuous function on [a, b] that is differentiable on (a, b). Then there exists (at least one) λ ∈ (a, b) such that

g′(λ) = (g(b) − g(a))/(b − a), i.e., g(b) − g(a) = g′(λ)(b − a).

Using the MVT, we can write

g(X̄_n) − g(µ) = g′(X_n*)(X̄_n − µ)  (3)

where X_n* is an RV strictly between X̄_n and µ.

2. On the RHS of Eq. (3):

(a) g′(X_n*) →p g′(µ), by the squeezing result and the plim operator theorem.

(b) If we multiply by √n and divide by σ, we can apply the CLT to get √n(X̄_n − µ)/σ →d N(0, 1).

(c) Hence, using Slutsky's theorem,

√n [g(X̄_n) − g(µ)] →d N(0, g′(µ)² σ²).  (4)

3. Now, in order to get the asymptotic approximation for the distribution of g(X̄_n), we flip over to get

g(X̄_n) ~a N(g(µ), g′(µ)² σ²/n).

4. Check that g satisfies the assumptions needed for these results to go through: continuity and differentiability for the MVT, and g′ continuous at µ for the plim operator theorem.

Examples: 1/X̄_n, exp(X̄_n), (X̄_n)², etc.

Eq. (4) is a general result, known as the Delta method. For the purposes of this class, I want you to derive the approximate distributions for g(X̄_n) from first principles (as we did above).
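A simulation sketch of the Delta method conclusion, for one of the examples listed above (Python; g(x) = exp(x) with X_i ~ U[0, 1] is an illustrative choice, so µ = 1/2, σ² = 1/12, and g′(µ)² = e):

```python
import math
import random

# Delta method prediction: g(X_bar_n) ~a N(g(mu), g'(mu)^2 sigma^2 / n).
random.seed(0)
n, reps = 400, 4000
mu, sigma2 = 0.5, 1.0 / 12.0
g_mu = math.exp(mu)
asym_var = math.exp(mu) ** 2 * sigma2 / n  # g'(mu)^2 sigma^2 / n

vals = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    vals.append(math.exp(xbar))

emp_mean = sum(vals) / reps
emp_var = sum((v - emp_mean) ** 2 for v in vals) / reps
print(emp_mean, g_mu)     # empirical mean of g(X_bar_n) vs. g(mu)
print(emp_var, asym_var)  # empirical variance vs. delta-method variance
```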

1 Some extra topics (CAN SKIP)

1.1 Triangular array CLT

The LLCLT assumes the observations are independent as well as identically distributed. We extend this to the independent, non-identically distributed setting.

Lindeberg-Feller CLT: For each n, let Y_{n,1}, ..., Y_{n,n} be independent random variables with finite (possibly non-identical) means and variances σ²_{n,i} such that

(1/C_n²) Σ_{i=1}^n E[ |Y_{n,i} − EY_{n,i}|² 1{|Y_{n,i} − EY_{n,i}|/C_n > ϵ} ] → 0 for every ϵ > 0,

where C_n ≡ [Σ_{i=1}^n σ²_{n,i}]^{1/2}. Then

[Σ_{i=1}^n (Y_{n,i} − EY_{n,i})] / C_n →d N(0, 1).

The sampling framework is known as a triangular array. Note that the (n−1)-th observation in the n-th sample (Y_{n,n−1}) need not coincide with Y_{n−1,n−1}, the (n−1)-th observation in the (n−1)-th sample.

C_n is just the standard deviation of Σ_{i=1}^n Y_{n,i}. Hence [Σ_{i=1}^n (Y_{n,i} − EY_{n,i})]/C_n is a standardized sum.

This is useful for showing asymptotic normality of Least-Squares regression. Let y = Xβ_0 + ϵ, with the ϵ_i i.i.d. with mean zero and variance σ², and X an n × 1 vector of covariates (here we have just one RHS variable). The OLS estimator is β̂ = (X′X)^{-1} X′y. To apply the LFCLT, we consider the normalized difference between the estimated and true β's:

((X′X)^{1/2}/σ) (β̂ − β_0) = (1/σ)(X′X)^{-1/2} X′ϵ ≡ (1/σ) Σ_{i=1}^n a_{n,i} ϵ_i

where a_{n,i} corresponds to the i-th component of the 1 × n vector (X′X)^{-1/2} X′. So we just need to show that the Lindeberg condition applies to Y_{n,i} = (1/σ) a_{n,i} ϵ_i. Note that

Σ_{i=1}^n V((1/σ) a_{n,i} ϵ_i) = V((1/σ) Σ_{i=1}^n a_{n,i} ϵ_i) = (1/σ²) V((X′X)^{-1/2} X′ϵ) = σ²/σ² = 1 = C_n².

1.2 Convergence in distribution vs. a.s.

Combining two results above, we have Z_n →a.s. Z implies Z_n →d Z.

The converse is not generally true. (Indeed, {Z_1, Z_2, Z_3, ...} need not be defined on the same probability space, in which case a.s. convergence makes no sense.) However, consider Z_n* = F_{Z_n}^{-1}(U) and Z* = F_Z^{-1}(U), with U ~ U[0, 1]. Here F^{-1}(τ) denotes the quantile function corresponding to the CDF F(z):

F^{-1}(τ) = inf{z : F(z) > τ}, τ ∈ [0, 1].

We have F(F^{-1}(τ)) = τ. (Note that this quantile function is also right-continuous; discontinuity points of the quantile function arise where the CDF is flat.) Then

P(Z_n* ≤ z) = P(F_{Z_n}^{-1}(U) ≤ z) = P(U ≤ F_{Z_n}(z)) = F_{Z_n}(z),

so that Z_n* =d Z_n (even though their domains are different!). Similarly, Z* =d Z. The notation =d means "identically distributed".

Moreover, it turns out (quite intuitively) that the convergence F_{Z_n}^{-1}(U) → F_Z^{-1}(U) fails only at points where F_Z^{-1}(U) is discontinuous (corresponding to flat portions of F_Z(z)). Since these points of discontinuity are a countable set, their probability (under U[0, 1]) is equal to zero, so that F_{Z_n}^{-1}(U) → F_Z^{-1}(U) for almost all U.

So what we have here is a result that, for real-valued random variables with Z_n →d Z, we can construct identically distributed variables Z_n*, Z* such that both Z_n* →d Z* and Z_n* →a.s. Z*. This is called the Skorokhod construction.

Skorokhod representation: Let Z_n (n = 1, 2, ...) be random elements defined on probability spaces (Ω_n, B(Ω_n), P_n), with Z_n →d Z, where Z is defined on (Ω, B(Ω), P). Then there exist random variables Z_n* (n = 1, 2, ...) and Z* defined on a common probability space (Ω̃, B(Ω̃), P̃) such that Z_n* =d Z_n (n = 1, 2, ...); Z* =d Z; and Z_n* →a.s. Z*.

Applications: Continuous mapping theorem: Z_n →d Z implies Z_n* →a.s. Z*; for h(·) continuous, we have h(Z_n*) →a.s. h(Z*), hence h(Z_n*) →d h(Z*), which implies h(Z_n) →d h(Z).
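As an aside, the quantile-transform step used above, that F^{-1}(U) with U ~ U[0, 1] has CDF F, is easy to verify numerically (a Python sketch; the Exponential(1) distribution is an illustrative choice because its quantile function is available in closed form):

```python
import math
import random

# If U ~ U[0,1], then F^{-1}(U) ~ F.  For F = Exponential(1):
# F(z) = 1 - e^{-z}, so F^{-1}(tau) = -log(1 - tau).
random.seed(0)

def exp_quantile(tau):
    return -math.log(1.0 - tau)

draws = [exp_quantile(random.random()) for _ in range(100_000)]
frac_le_1 = sum(d <= 1.0 for d in draws) / len(draws)
print(frac_le_1)                # should be near F(1) = 1 - e^{-1} = 0.632...
print(sum(draws) / len(draws))  # should be near E Z = 1
```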

Building on the above, if h is bounded, then we get Eh(Z_n) = Eh(Z_n*) → Eh(Z*) = Eh(Z) by the bounded convergence theorem, which shows one direction of the Portmanteau theorem, Eq. (2).

1.3 Functional central limit theorems

There is a set of distributional convergence results known as functional CLTs (or Donsker theorems), because they deal with convergence of random functions (or, interchangeably, processes). These are indispensable tools in finance.

One of the simplest random functions is the Wiener process W(t), which is viewed as a random function on the unit interval t ∈ [0, 1]. This is also known as a Brownian motion process. Features of the Wiener process:

1. W(0) = 0.

2. Gaussian marginals: W(t) =d N(0, t); that is,

P(W(t) ≤ a) = (1/√(2πt)) ∫_{−∞}^a exp(−u²/(2t)) du.

3. Independent increments: Define x_t ≡ W(t). For any set 0 ≤ t_0 ≤ t_1 ≤ ... ≤ 1, the differences x_{t_1} − x_{t_0}, x_{t_2} − x_{t_1}, ... are independent.

4. Given the two above features, the increments are themselves normally distributed: x_{t_i} − x_{t_{i−1}} =d N(0, t_i − t_{i−1}). Moreover, from

t_1 − t_0 = V(x_{t_1} − x_{t_0}) = E[(x_{t_1} − x_{t_0})²] = E x_{t_1}² − 2 E x_{t_1} x_{t_0} + E x_{t_0}² = t_1 + t_0 − 2 E x_{t_1} x_{t_0},

we get E x_{t_1} x_{t_0} = Cov(x_{t_1}, x_{t_0}) = t_0.

5. Furthermore, we know that any finite collection (x_{t_1}, x_{t_2}, ...) is jointly distributed multivariate normal, with mean 0 and the covariance matrix Σ described above.

Take the conditions of the LLCLT: X_1, X_2, ... are i.i.d. with mean zero and finite variance σ². For a given n, define the partial sum S_k = X_1 + X_2 + ... + X_k for k ≤ n. Now, for t ∈ [0, 1], define the normalized partial-sum process

S_n(t) = S_[tn]/(σ√n) + (tn − [tn]) X_{[tn]+1}/(σ√n),

where [a] denotes the largest integer ≤ a. S_n(t) is a random function on [0, 1] (graph it: it linearly interpolates the points (k/n, S_k/(σ√n))).

Then the functional CLT is the following result:

Functional CLT: S_n →d W.

Note that S_n is a random element taking values in C[0, 1], the space of continuous functions on [0, 1]. Proving this result is beyond our focus in this class. (See, for instance, Billingsley, Convergence of Probability Measures, ch. 2, section 10.)

Note that the functional CLT implies more than the LLCLT. For example, take t = 1. Then S_n(1) = Σ_{i=1}^n X_i/(σ√n) = √n X̄_n/σ. By the FCLT, we have

P(S_n(1) ≤ a) → P(W(1) ≤ a) = P(N(0, 1) ≤ a),

which is the same result as the LLCLT. Similarly, for k_n < n but k_n/n → τ ∈ [0, 1], we have

S_n(k_n/n) = Σ_{i=1}^{k_n} X_i/(σ√n) = k_n X̄_{k_n}/(σ√n) →d N(0, τ).

Also, it turns out that, by a functional version of the continuous mapping theorem, we can get convergence results for functionals of the partial-sum process. For instance, sup_t S_n(t) →d sup_t W(t), and it turns out that

P(sup_t W(t) ≤ a) = √(2/π) ∫_0^a exp(−u²/2) du, a ≥ 0.

(Recall that W(0) = 0, so a < 0 makes no sense.)
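A final simulation sketch, checking the last display through the partial-sum process (Python; evaluating S_n(t) only at the grid points k/n, where the interpolation term vanishes, and the choices n = 400 with 4000 replications, are illustrative; the discrete maximum slightly undershoots the true supremum, so the match is only approximate):

```python
import math
import random

# Approximate sup_t S_n(t) by max_k S_k / (sigma sqrt(n)) and compare
# P(sup <= a) against the reflection-principle value 2*Phi(a) - 1,
# which equals sqrt(2/pi) * int_0^a exp(-u^2/2) du.
random.seed(0)

def sup_partial_sum(n):
    s, sup = 0.0, 0.0                 # S_n(0) = 0, matching W(0) = 0
    for _ in range(n):
        s += random.gauss(0.0, 1.0)   # sigma = 1
        sup = max(sup, s)
    return sup / math.sqrt(n)

a, reps = 1.0, 4000
frac = sum(sup_partial_sum(400) <= a for _ in range(reps)) / reps
target = 2 * 0.5 * (1 + math.erf(a / math.sqrt(2))) - 1   # 2*Phi(a) - 1
print(frac, target)
```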