Lecture Notes on Asymptotic Statistics. Changliang Zou


Prologue

Why asymptotic statistics? The use of asymptotic approximations is two-fold. First, they enable us to find approximate tests and confidence regions. Second, approximations can be used theoretically to study the quality (efficiency) of statistical procedures (Van der Vaart).

Approximate statistical procedures

To carry out a statistical test, we need to know the critical value of the test statistic. Roughly speaking, this means we must know the distribution of the test statistic under the null hypothesis. Because such distributions are often analytically intractable, only approximations are available in practice.

Consider for instance the classical t-test for location. Given a sample of iid observations $X_1, \ldots, X_n$, we wish to test $H_0: \mu = \mu_0$. If the observations arise from a normal distribution with mean $\mu_0$, then the distribution of the t-test statistic $\sqrt{n}(\bar{X}_n - \mu_0)/S_n$ is exactly known, namely $t(n-1)$. However, we may have doubts regarding the normality. If the number of observations is not too small, this does not matter too much: we may act as if $\sqrt{n}(\bar{X}_n - \mu_0)/S_n \sim N(0,1)$. The theoretical justification is the limiting result, as $n \to \infty$,
$$\sup_x \left| P\left( \frac{\sqrt{n}(\bar{X}_n - \mu)}{S_n} \le x \right) - \Phi(x) \right| \to 0,$$
provided that the variables $X_i$ have a finite second moment. A large-sample or asymptotic level-$\alpha$ test is then to reject $H_0$ if $|\sqrt{n}(\bar{X}_n - \mu_0)/S_n| > z_{\alpha/2}$. When the underlying distribution is exponential, the approximation is satisfactory for $n \ge 100$ or so. Thus, one aim of asymptotic statistics is to derive the asymptotic distributions of many types of statistics.

There are similar benefits when obtaining confidence intervals. For instance, consider the maximum likelihood estimator $\hat{\theta}_n$ of a parameter $\theta$ of dimension $p$, based on a sample of size $n$ from a density $f(x; \theta)$. A major result in asymptotic statistics is that in many situations $\sqrt{n}(\hat{\theta}_n - \theta)$ is asymptotically normally distributed with zero mean and covariance matrix $I_\theta^{-1}$, where
$$I_\theta = E\left[ \left( \frac{\partial \log f(X; \theta)}{\partial \theta} \right) \left( \frac{\partial \log f(X; \theta)}{\partial \theta} \right)^T \right]$$
is the Fisher information matrix. Thus, acting as if $\sqrt{n}(\hat{\theta}_n - \theta) \sim N_p(0, I_\theta^{-1})$, we find that the ellipsoid
$$\left\{ \theta : (\theta - \hat{\theta}_n)^T I_\theta (\theta - \hat{\theta}_n) \le \frac{\chi^2_{p,\alpha}}{n} \right\}$$
is an approximate $1-\alpha$ confidence region.
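The quality of such large-sample approximations is easy to probe by simulation. The following sketch (a minimal illustration added to these notes, not part of the original derivation; the Exponential(1) parent distribution and the sample sizes are arbitrary choices) estimates the actual rejection probability of the asymptotic level-$\alpha$ t-test above when the data are exponential rather than normal.

```python
import numpy as np
from scipy import stats

def t_test_level(n, alpha=0.05, reps=20000, rng=np.random.default_rng(0)):
    """Monte Carlo estimate of the true level of the large-sample t-test
    |sqrt(n)(Xbar - mu0)/S_n| > z_{alpha/2} when X_i ~ Exponential(1),
    so that H0: mu = 1 is true but the data are not normal."""
    z = stats.norm.ppf(1 - alpha / 2)
    x = rng.exponential(scale=1.0, size=(reps, n))          # mu0 = 1
    t = np.sqrt(n) * (x.mean(axis=1) - 1.0) / x.std(axis=1, ddof=1)
    return np.mean(np.abs(t) > z)

for n in (10, 30, 100, 500):
    print(n, round(t_test_level(n), 4))   # should approach 0.05 as n grows
```

In line with the remark above, the estimated level is noticeably off for small $n$ and settles near $\alpha$ once $n$ is in the hundreds.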

Efficiency of statistical procedures

For a relatively small number of statistical problems there exists an exact, optimal solution: for example, the Neyman-Pearson lemma for finding UMP tests, the Rao-Blackwell theory for finding MVUEs, and the Cramer-Rao theorem. An exact optimality theory or procedure is not always available, however, and then asymptotic optimality theory may help. For instance, to compare two tests we might compare approximations to their power functions.

Consider the foregoing hypothesis problem for location. A well-known nonparametric test statistic is the sign statistic $T_n = n^{-1} \sum_{i=1}^n I_{\{X_i > \theta_0\}}$, where the null hypothesis is $H_0: \theta = \theta_0$ and $\theta$ denotes the median of the distribution of $X$. Comparing the efficiency of the sign test and the t-test directly is rather difficult because the exact power functions of the two tests are intractable. However, using the definitions and methods introduced later, we can show that the asymptotic relative efficiency of the sign test versus the t-test equals
$$4 f^2(0) \int x^2 f(x)\, dx.$$

To compare estimators, we might compare asymptotic variances rather than exact variances. A major result in this area is that for smooth parametric models maximum likelihood estimators are asymptotically optimal. This roughly means the following. First, MLEs are consistent; second, the rate at which MLEs converge to the true value is the fastest possible, typically $\sqrt{n}$; third, their asymptotic variance attains the Cramer-Rao bound. Thus, asymptotics justify the use of the MLE in certain situations. (Even though the MLE does not lead to the best finite-sample estimator in many cases, it is never a disastrous choice and always leads to a reasonable estimator.)

Contents

Basic convergence concepts and preliminary theorems (8)
Transformations of given statistics: the Delta method (4)

The basic sample statistics: distribution function, moments, quantiles, and order statistics (3)
Asymptotic theory in parametric inference: MLE, likelihood ratio test, etc. (6)
U-statistics, M-estimates and R-estimates (6)
Asymptotic relative efficiency (6)
Asymptotic theory in nonparametric inference: rank and sign tests (6)
Goodness of fit (3)
Nonparametric regression and density estimation (4)
Advanced selected topics: bootstrap and empirical likelihood (4)

Text books

Billingsley, P. (1995). Probability and Measure, 3rd edition. John Wiley, New York.
DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability. Springer, New York.
Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. John Wiley, New York.
Shao, J. (2003). Mathematical Statistics, 2nd edition. Springer, New York.
Van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge University Press, Cambridge.

Chapter 1

Basic convergence concepts and preliminary theorems

Throughout this course there will usually be an underlying probability space $(\Omega, \mathcal{F}, P)$, where $\Omega$ is a set of points, $\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$, and $P$ is a probability distribution or measure defined on the elements of $\mathcal{F}$. A random variable $X(\omega)$ is a transformation of $\Omega$ into the real line $R$ such that the preimages $X^{-1}(B)$ of Borel sets $B$ are elements of $\mathcal{F}$. A collection of random variables $X_1(\omega), X_2(\omega), \ldots$ on a given $(\Omega, \mathcal{F})$ will typically be denoted by $X_1, X_2, \ldots$.

1.1 Modes of convergence of a sequence of random variables

Definition (convergence in probability) Let $\{X_n\}$ and $X$ be random variables defined on a common probability space. We say $X_n$ converges to $X$ in probability if, for any $\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$, or equivalently
$$\lim_{n \to \infty} P(|X_n - X| < \epsilon) = 1, \quad \text{for every } \epsilon > 0.$$

This is usually written as $X_n \xrightarrow{p} X$. Extension to the vector case: for random $p$-vectors $X_1, X_2, \ldots$ and $X$, we say $X_n \xrightarrow{p} X$ if $\|X_n - X\| \xrightarrow{p} 0$, where $\|z\| = (\sum_{i=1}^p z_i^2)^{1/2}$ denotes the Euclidean distance ($L_2$-norm) for $z \in R^p$. It is easily seen that $X_n \xrightarrow{p} X$ iff the corresponding component-wise convergence holds.

Example For iid Bernoulli trials with success probability $p = 1/2$, let $X_n$ denote the number of times in the first $n$ trials that a success is followed by a failure. Writing $T_i = I\{\text{the $i$th trial is a success and the $(i+1)$st trial is a failure}\}$, we have $X_n = \sum_{i=1}^{n-1} T_i$, and therefore $E[X_n] = (n-1)/4$ and
$$\mathrm{Var}[X_n] = \sum_{i=1}^{n-1} \mathrm{Var}[T_i] + 2 \sum_{i=1}^{n-2} \mathrm{Cov}[T_i, T_{i+1}] = \frac{3(n-1)}{16} - \frac{2(n-2)}{16} = \frac{n+1}{16}.$$
It then follows by an application of Chebyshev's inequality [$P(|X - \mu| \ge \epsilon) \le \sigma^2/\epsilon^2$] that $X_n/n \xrightarrow{p} 1/4$.

Definition (bounded in probability) A sequence of random variables $X_n$ is said to be bounded in probability if, for any $\epsilon > 0$, there exists a constant $k$ such that $P(|X_n| > k) \le \epsilon$ for all $n$. Any single random variable (or vector) is bounded in probability.

It is convenient to have short expressions for terms that converge or are bounded in probability. If $X_n \xrightarrow{p} 0$, we write $X_n = o_p(1)$, pronounced "small oh-p-one"; the expression $O_p(1)$ ("big oh-p-one") denotes a sequence that is bounded in probability, written $X_n = O_p(1)$. These are the stochastic versions of $o(\cdot)$ and $O(\cdot)$. More generally, for a given sequence of random variables $R_n$, $X_n = o_p(R_n)$ means $X_n = Y_n R_n$ with $Y_n \xrightarrow{p} 0$, and $X_n = O_p(R_n)$ means $X_n = Y_n R_n$ with $Y_n = O_p(1)$. This expresses that the sequence $X_n$ converges in probability to zero, or is bounded in probability, at the rate $R_n$. For deterministic sequences $X_n$ and $R_n$, $o_p(\cdot)$ and $O_p(\cdot)$ reduce to the usual $o(\cdot)$ and $O(\cdot)$ from calculus. Obviously, $X_n = o_p(R_n)$ implies $X_n = O_p(R_n)$. An expression we will often use is: for some sequence $a_n$, if $a_n X_n \xrightarrow{p} 0$, we write $X_n = o_p(a_n^{-1})$; if $a_n X_n = O_p(1)$, we write $X_n = O_p(a_n^{-1})$.
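To make the $O_p$ notation concrete, here is a small simulation (our own illustration; the sample sizes and replication count are arbitrary) of the success-followed-by-failure counts from the example above: $X_n/n - 1/4$ shrinks as $n$ grows, while the rescaled quantity $\sqrt{n}(X_n/n - 1/4)$ keeps a stable spread, i.e. $X_n/n - 1/4 = O_p(n^{-1/2})$.

```python
import numpy as np

rng = np.random.default_rng(1)

def success_then_failure_count(n):
    """X_n = number of times a success (1) is immediately followed by a failure (0)."""
    trials = rng.integers(0, 2, size=n)            # iid Bernoulli(1/2) trials
    return np.sum((trials[:-1] == 1) & (trials[1:] == 0))

for n in (100, 1000, 10000, 100000):
    reps = 500
    props = np.array([success_then_failure_count(n) / n for _ in range(reps)])
    dev = props - 0.25
    print(f"n={n:6d}  sd(X_n/n - 1/4)={dev.std():.5f}  "
          f"sd(sqrt(n)*(X_n/n - 1/4))={np.sqrt(n) * dev.std():.3f}")
```

The second column shrinks roughly like $n^{-1/2}$, while the third settles near $1/4$, the square root of $\lim_n n\,\mathrm{Var}(X_n/n) = 1/16$.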

Definition (convergence with probability one) Let $\{X_n\}$ and $X$ be random variables defined on a common probability space. We say $X_n$ converges to $X$ with probability 1 (or almost surely, strongly, almost everywhere) if
$$P\left( \lim_{n \to \infty} X_n = X \right) = 1.$$
This can be written as $P(\omega : X_n(\omega) \to X(\omega)) = 1$. We denote this mode of convergence by $X_n \xrightarrow{wp1} X$ or $X_n \xrightarrow{a.s.} X$. The extension to the random vector case is straightforward.

Almost sure convergence is a stronger mode of convergence than convergence in probability. In fact, a characterization of convergence with probability 1 is that
$$\lim_{n \to \infty} P(|X_m - X| < \epsilon, \ \text{all } m \ge n) = 1, \quad \text{every } \epsilon > 0. \tag{1.1}$$
It is clear from this equivalent condition that convergence with probability 1 is stronger than convergence in probability. Its proof can be found on page 7 of Serfling (1980).

Example Suppose $X_1, X_2, \ldots$ is an infinite sequence of iid $U[0,1]$ random variables, and let $X_{(n)} = \max\{X_1, \ldots, X_n\}$. We show $X_{(n)} \xrightarrow{wp1} 1$. Note that
$$P(|X_{(n)} - 1| \le \epsilon, \ \text{all } n \ge m) = P(X_{(n)} \ge 1 - \epsilon, \ \text{all } n \ge m) = P(X_{(m)} \ge 1 - \epsilon) = 1 - (1-\epsilon)^m \to 1$$
as $m \to \infty$.

Definition (convergence in $r$th mean) Let $\{X_n\}$ and $X$ be random variables defined on a common probability space. For $r > 0$, we say $X_n$ converges to $X$ in $r$th mean if
$$\lim_{n \to \infty} E|X_n - X|^r = 0.$$
This is written $X_n \xrightarrow{rth} X$. It is easily shown that
$$X_n \xrightarrow{rth} X \ \Longrightarrow\ X_n \xrightarrow{sth} X, \quad 0 < s < r,$$
by Jensen's inequality (if $g(\cdot)$ is a convex function on $R$, and $X$ and $g(X)$ are integrable random variables, then $g(E[X]) \le E[g(X)]$).

Definition (convergence in distribution) Let $\{X_n\}$ and $X$ be random variables with distribution functions $F_{X_n}(\cdot)$ and $F_X(\cdot)$. We say that $X_n$ converges in distribution (in law) to $X$ if
$$\lim_{n \to \infty} F_{X_n}(t) = F_X(t)$$
at every point $t$ that is a continuity point of $F_X$. This is written as $X_n \xrightarrow{d} X$ or $F_{X_n} \Rightarrow F_X$.

Example Consider $X_n \sim$ Uniform$\{\frac{1}{n}, \frac{2}{n}, \ldots, \frac{n-1}{n}, 1\}$. Then the sequence $X_n$ converges in law to $U[0,1]$. Indeed, for any $t \in [\frac{i}{n}, \frac{i+1}{n})$, the difference between $F_{X_n}(t) = \frac{i}{n}$ and $F_X(t) = t$ satisfies $|\frac{i}{n} - t| < n^{-1}$ and hence can be made arbitrarily small if $n$ is sufficiently large. The result follows from the definition of $\xrightarrow{d}$.

Example Let $\{X_n\}_{n=1}^\infty$ be a sequence of random variables with $X_n \sim N(0, 1 + n^{-1})$. Taking the limit of the distribution function of $X_n$ as $n \to \infty$ yields $\lim_{n \to \infty} F_{X_n}(x) = \Phi(x)$ for all $x \in R$. Thus, $X_n \xrightarrow{d} N(0, 1)$.

According to the assertion below the definition of $\xrightarrow{p}$, we know that $X_n \xrightarrow{p} X$ is equivalent to convergence in probability of every one of the sequences of components. The analogous statement for convergence in distribution is false: convergence in distribution of the sequence $X_n$ is stronger than convergence of every one of the sequences of components $X_{ni}$. The point is that the distributions of the components $X_{ni}$ separately do not determine their joint distribution (they might be independent or dependent in many ways). We speak of joint convergence in law versus marginal convergence.

Example If $X \sim U[0,1]$, $X_n = X$ for all $n$, and $Y_n = X$ for $n$ odd and $Y_n = 1 - X$ for $n$ even, then $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} U[0,1]$, yet $(X_n, Y_n)$ does not converge in law.

Suppose $\{X_n\}$ and $X$ are integer-valued random variables. It is not hard to show that $X_n \xrightarrow{d} X$ iff $P(X_n = k) \to P(X = k)$ for every integer $k$. This is a useful characterization of convergence in law for integer-valued random variables.

1.2 Fundamental results and theorems on convergence

1.2.1 Relationships

The relationships among the four modes of convergence are summarized as follows.

Theorem Let $\{X_n\}$ and $X$ be random variables (vectors).
(i) If $X_n \xrightarrow{wp1} X$, then $X_n \xrightarrow{p} X$.
(ii) If $X_n \xrightarrow{rth} X$ for some $r > 0$, then $X_n \xrightarrow{p} X$.
(iii) If $X_n \xrightarrow{p} X$, then $X_n \xrightarrow{d} X$.
(iv) If, for every $\epsilon > 0$, $\sum_{n=1}^\infty P(|X_n - X| > \epsilon) < \infty$, then $X_n \xrightarrow{wp1} X$.

Proof. (i) is an obvious consequence of the equivalent characterization (1.1). (ii) For any $\epsilon > 0$,
$$E|X_n - X|^r \ge E[|X_n - X|^r I(|X_n - X| > \epsilon)] \ge \epsilon^r P(|X_n - X| > \epsilon),$$
and thus $P(|X_n - X| > \epsilon) \le \epsilon^{-r} E|X_n - X|^r \to 0$ as $n \to \infty$. (iii) This is a direct application of Slutsky's theorem. (iv) Let $\epsilon > 0$ be given. We have
$$P(|X_m - X| \ge \epsilon \ \text{for some } m \ge n) = P\left( \bigcup_{m=n}^{\infty} \{|X_m - X| \ge \epsilon\} \right) \le \sum_{m=n}^{\infty} P(|X_m - X| \ge \epsilon).$$
The last term is the tail of a convergent series and hence goes to zero as $n \to \infty$.

Example Consider iid $N(0,1)$ random variables $X_1, X_2, \ldots$, and let $\bar{X}_n$ be the mean of the first $n$ observations. For an $\epsilon > 0$, consider $\sum_{n=1}^\infty P(|\bar{X}_n| > \epsilon)$. By Markov's inequality,
$$P(|\bar{X}_n| > \epsilon) \le \frac{E[\bar{X}_n^4]}{\epsilon^4} = \frac{3}{\epsilon^4 n^2}.$$
Since $\sum_{n=1}^\infty n^{-2} < \infty$, it follows from part (iv) of the theorem above that $\bar{X}_n \xrightarrow{wp1} 0$.

1.2.2 Transformations

It turns out that continuous transformations preserve many types of convergence, and this fact is useful in many applications. We record it next; its proof can be found on page 24 of Serfling (1980).

Theorem (Continuous Mapping Theorem) Let $X_1, X_2, \ldots$ and $X$ be random $p$-vectors defined on a probability space, and let $g(\cdot)$ be a vector-valued (including real-valued) continuous function defined on $R^p$. If $X_n$ converges to $X$ in probability, almost surely, or in law, then $g(X_n)$ converges to $g(X)$ in probability, almost surely, or in law, respectively.

Example (i) If $X_n \xrightarrow{d} N(0,1)$, then $X_n^2 \xrightarrow{d} \chi^2_1$. (ii) If $(X_n, Y_n) \xrightarrow{d} N_2(0, I_2)$, then $\max\{X_n, Y_n\} \xrightarrow{d} \max\{X, Y\}$, which has the CDF $[\Phi(x)]^2$.

The most commonly considered functions of vectors converging in some stochastic sense are linear and quadratic forms, as summarized in the following result.

Corollary Suppose that the $p$-vectors $X_n$ converge to the $p$-vector $X$ in probability, almost surely, or in law. Let $A_{q \times p}$ and $B_{p \times p}$ be matrices. Then $A X_n \to A X$ and $X_n^T B X_n \to X^T B X$ in the given mode of convergence.

Proof. The vector-valued function
$$Ax = \left( \sum_{i=1}^p a_{1i} x_i, \ldots, \sum_{i=1}^p a_{qi} x_i \right)^T$$
and the real-valued function
$$x^T B x = \sum_{i=1}^p \sum_{j=1}^p b_{ij} x_i x_j$$
are continuous functions of $x = (x_1, \ldots, x_p)^T$.

Example (i) If $X_n \xrightarrow{d} N_p(\mu, \Sigma)$, then $C X_n \xrightarrow{d} N_q(C\mu, C \Sigma C^T)$, where $C_{q \times p}$ is a matrix; also, $(X_n - \mu)^T \Sigma^{-1} (X_n - \mu) \xrightarrow{d} \chi^2_p$. (ii) (Sums and products of random variables converging wp1 or in probability) If $X_n \xrightarrow{wp1} X$ and $Y_n \xrightarrow{wp1} Y$, then $X_n + Y_n \xrightarrow{wp1} X + Y$ and $X_n Y_n \xrightarrow{wp1} XY$. The same statements hold with "wp1" replaced by "in probability".

Remark The condition that $g(\cdot)$ is continuous in the continuous mapping theorem can be relaxed to $g(\cdot)$ being continuous a.s., i.e., $P(X \in C(g)) = 1$, where $C(g) = \{x : g \text{ is continuous at } x\}$ is called the continuity set of $g$.

Example (i) If $X_n \xrightarrow{d} X \sim N(0,1)$, then $1/X_n \xrightarrow{d} Z$, where $Z$ has the distribution of $1/X$, even though the function $g(x) = 1/x$ is not continuous at 0. This is because $P(X = 0) = 0$. However, if $X_n = 1/n$ (a degenerate distribution) and
$$g(x) = \begin{cases} 1, & x > 0, \\ 0, & x \le 0, \end{cases}$$
then $X_n \xrightarrow{d} 0$ but $g(X_n) \xrightarrow{d} 1 \ne g(0)$. (ii) If $(X_n, Y_n) \xrightarrow{d} N_2(0, I_2)$, then $X_n / Y_n \xrightarrow{d}$ Cauchy.

Example Let $\{X_n\}_{n=1}^\infty$ be a sequence of independent random variables, each with a Poisson($\theta$) distribution, and let $\bar{X}_n$ be the sample mean computed from $X_1, \ldots, X_n$; then $\bar{X}_n \xrightarrow{p} \theta$ as $n \to \infty$. If we wish to find a consistent estimator of the standard deviation of $X_n$, which is $\theta^{1/2}$, we can consider $\bar{X}_n^{1/2}$. Since the square root transformation is continuous at $\theta$ whenever $\theta > 0$, the CMT implies that $\bar{X}_n^{1/2} \xrightarrow{p} \theta^{1/2}$ as $n \to \infty$.

In Example 1.2.2, the condition that $(X_n, Y_n) \xrightarrow{d} N_2(0, I_2)$ cannot be relaxed to $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ with $X$ and $Y$ independent; i.e., we need convergence of the joint CDF of $(X_n, Y_n)$. This is different when $\xrightarrow{d}$ is replaced by $\xrightarrow{p}$ or $\xrightarrow{wp1}$, as in Example (ii) above. The following result, which plays an important role in probability and statistics, establishes the convergence in distribution of $X_n + Y_n$ or $X_n Y_n$ when no information regarding the joint CDF of $(X_n, Y_n)$ is provided.

Theorem (Slutsky's Theorem) Let $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} c$, where $c$ is a finite constant. Then,

(i) $X_n + Y_n \xrightarrow{d} X + c$;
(ii) $X_n Y_n \xrightarrow{d} cX$;
(iii) $X_n / Y_n \xrightarrow{d} X/c$ if $c \ne 0$.

Proof. The method of proof is demonstrated sufficiently by proving (i). Choose and fix $t$ such that $t - c$ is a continuity point of $F_X$. Let $\varepsilon > 0$ be such that $t - c + \varepsilon$ and $t - c - \varepsilon$ are also continuity points of $F_X$. Then
$$F_{X_n + Y_n}(t) = P(X_n + Y_n \le t) \le P(X_n + Y_n \le t, |Y_n - c| < \varepsilon) + P(|Y_n - c| \ge \varepsilon) \le P(X_n \le t - c + \varepsilon) + P(|Y_n - c| \ge \varepsilon)$$
and, similarly,
$$F_{X_n + Y_n}(t) \ge P(X_n \le t - c - \varepsilon) - P(|Y_n - c| \ge \varepsilon).$$
It follows from the previous two inequalities and the hypotheses of the theorem that
$$F_X(t - c - \varepsilon) \le \liminf_{n \to \infty} F_{X_n + Y_n}(t) \le \limsup_{n \to \infty} F_{X_n + Y_n}(t) \le F_X(t - c + \varepsilon).$$
Since $t - c$ is a continuity point of $F_X$, and since $\varepsilon$ can be taken arbitrarily small, this yields $\lim_{n \to \infty} F_{X_n + Y_n}(t) = F_X(t - c)$. The result follows from $F_X(t - c) = F_{X + c}(t)$.

The extension to the vector case is straightforward; (iii) remains valid provided $c \ne 0$ is understood as the matrix $c$ being invertible. A straightforward but often used consequence of this theorem is that if $X_n \xrightarrow{d} X$ and $X_n - Y_n \xrightarrow{p} 0$, then $Y_n \xrightarrow{d} X$. In asymptotic practice, we often first derive a result such as $Y_n = X_n + o_p(1)$ and then investigate the asymptotic distribution of $X_n$.

Example (i) Convergence in probability to a constant is equivalent to convergence in law to that constant. One direction follows from part (iii) of the relationship theorem above; the converse can be proved by definition. Because the degenerate distribution function of the constant $c$ is continuous everywhere except at the point $c$, for any $\epsilon > 0$,
$$P(|X_n - c| \ge \epsilon) = P(X_n \ge c + \epsilon) + P(X_n \le c - \epsilon) \to 1 - F_X(c + \epsilon) + F_X(c - \epsilon) = 0.$$
The result follows from the definition of convergence in probability.

Example Let $\{X_n\}_{n=1}^\infty$ be a sequence of independent random variables with $X_n \sim$ Gamma$(\alpha_n, \beta_n)$, where $\alpha_n$ and $\beta_n$ are sequences of positive real numbers such that $\alpha_n \to \alpha$ and $\beta_n \to \beta$ for some positive real numbers $\alpha$ and $\beta$. Also, let $\hat{\beta}_n$ be a consistent estimator of $\beta$. We can conclude that $X_n / \hat{\beta}_n \xrightarrow{d}$ Gamma$(\alpha, 1)$.

Example (t-statistic) Let $X_1, X_2, \ldots$ be iid random variables with $E X_1 = 0$ and $E X_1^2 < \infty$. Then the t-statistic $\sqrt{n} \bar{X}_n / S_n$, where $S_n^2 = (n-1)^{-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2$ is the sample variance, is asymptotically standard normal. To see this, first note that by two applications of the WLLN and the CMT,
$$S_n^2 = \frac{n}{n-1} \left( \frac{1}{n} \sum_{i=1}^n X_i^2 - \bar{X}_n^2 \right) \xrightarrow{p} 1 \cdot (E X_1^2 - (E X_1)^2) = \mathrm{Var}(X_1).$$
Again by the CMT, $S_n \xrightarrow{p} \sqrt{\mathrm{Var}(X_1)}$. By the CLT, $\sqrt{n} \bar{X}_n \xrightarrow{d} N(0, \mathrm{Var}(X_1))$. Finally, Slutsky's theorem gives that the sequence of t-statistics converges in law to $N(0, \mathrm{Var}(X_1)) / \sqrt{\mathrm{Var}(X_1)} = N(0,1)$.
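As a quick numerical check of the t-statistic example (our own illustration; the centered exponential parent distribution and the sample sizes are arbitrary choices), the following sketch compares the simulated distribution of $\sqrt{n}\,\bar{X}_n/S_n$ with the standard normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def t_stats(n, reps=20000):
    """Simulate sqrt(n)*Xbar/S_n for iid centered Exponential(1) data (EX=0, EX^2<inf)."""
    x = rng.exponential(size=(reps, n)) - 1.0        # centered so that E X_1 = 0
    return np.sqrt(n) * x.mean(axis=1) / x.std(axis=1, ddof=1)

for n in (10, 50, 500):
    t = t_stats(n)
    # Kolmogorov-Smirnov distance to N(0,1): should shrink as n grows
    print(n, round(stats.kstest(t, "norm").statistic, 4))
```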

1.2.3 WLLN and SLLN

We next state some theorems known as the laws of large numbers, which concern the limiting behavior of sums of independent random variables. The weak law of large numbers (WLLN) refers to convergence in probability, whereas the strong law of large numbers (SLLN) refers to a.s. convergence. Our first result gives the WLLN and SLLN for a sequence of iid random variables.

Theorem Let $X_1, X_2, \ldots$ be iid random variables having CDF $F$.
(i) (The WLLN) Constants $a_n$ for which
$$\frac{1}{n} \sum_{i=1}^n X_i - a_n \xrightarrow{p} 0$$
exist iff $\lim_{x \to \infty} x[1 - F(x) + F(-x)] = 0$, in which case we may choose $a_n = \int_{-n}^{n} x \, dF(x)$.
(ii) (The SLLN) A constant $c$ for which
$$\frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{wp1} c$$
exists iff $E[X_1]$ is finite and equals $c$.

Example Suppose $\{X_i\}$ is a sequence of iid random variables with $X_i \sim t(2)$. The variance of $X_i$ does not exist, but the theorem above still applies, and we can therefore conclude that $\bar{X}_n \xrightarrow{p} 0$ as $n \to \infty$.

The next result is for sequences of independent but not necessarily identically distributed random variables.

Theorem Let $X_1, X_2, \ldots$ be random variables with finite expectations.
(i) (The WLLN) Let $X_1, X_2, \ldots$ be uncorrelated with means $\mu_1, \mu_2, \ldots$ and variances $\sigma_1^2, \sigma_2^2, \ldots$. If $\lim_{n \to \infty} n^{-2} \sum_{i=1}^n \sigma_i^2 = 0$, then
$$\frac{1}{n} \sum_{i=1}^n X_i - \frac{1}{n} \sum_{i=1}^n \mu_i \xrightarrow{p} 0.$$
(ii) (The SLLN) Let $X_1, X_2, \ldots$ be independent with means $\mu_1, \mu_2, \ldots$ and variances $\sigma_1^2, \sigma_2^2, \ldots$. If $\sum_{i=1}^\infty \sigma_i^2 / c_i^2 < \infty$, where $c_n$ is ultimately monotone and $c_n \to \infty$, then
$$c_n^{-1} \sum_{i=1}^n (X_i - \mu_i) \xrightarrow{wp1} 0.$$

(iii) (The SLLN with common mean) Let $X_1, X_2, \ldots$ be independent with common mean $\mu$ and variances $\sigma_1^2, \sigma_2^2, \ldots$. If $\sum_{i=1}^\infty \sigma_i^{-2} = \infty$, then
$$\frac{\sum_{i=1}^n X_i / \sigma_i^2}{\sum_{i=1}^n 1/\sigma_i^2} \xrightarrow{wp1} \mu.$$

A special case of part (ii) is obtained by setting $c_i = i$, in which case
$$\frac{1}{n} \sum_{i=1}^n X_i - \frac{1}{n} \sum_{i=1}^n \mu_i \xrightarrow{wp1} 0.$$
The proofs of the two theorems above can be found in Billingsley (1995).

Example Suppose $X_i \stackrel{indep}{\sim} (\mu, \sigma_i^2)$. Then, by simple calculus, the BLUE (best linear unbiased estimate) of $\mu$ is $\sum_{i=1}^n \sigma_i^{-2} X_i / \sum_{i=1}^n \sigma_i^{-2}$. Suppose now that the $\sigma_i^2$ do not grow at a rate faster than $i$; i.e., for some constant $K$, $\sigma_i^2 \le iK$. Then $\sum_{i=1}^n \sigma_i^{-2}$ clearly diverges as $n \to \infty$, and so by part (iii) above the BLUE of $\mu$ is strongly consistent.

Example Suppose $(X_i, Y_i)$, $i = 1, \ldots, n$, are iid bivariate samples from some distribution with $E(X_1) = \mu_1$, $E(Y_1) = \mu_2$, $\mathrm{Var}(X_1) = \sigma_1^2$, $\mathrm{Var}(Y_1) = \sigma_2^2$, and $\mathrm{corr}(X_1, Y_1) = \rho$. Let $r_n$ denote the sample correlation coefficient. The almost sure convergence of $r_n$ to $\rho$ follows very easily. We write
$$r_n = \frac{\frac{1}{n} \sum_{i=1}^n X_i Y_i - \bar{X} \bar{Y}}{\sqrt{\left(\frac{1}{n} \sum_{i=1}^n X_i^2 - \bar{X}^2\right)\left(\frac{1}{n} \sum_{i=1}^n Y_i^2 - \bar{Y}^2\right)}};$$
then from the SLLN for iid random variables (Theorem 1.2.4) and the continuous mapping theorem (Theorem 1.2.2; Example (ii)),
$$r_n \xrightarrow{wp1} \frac{E(X_1 Y_1) - \mu_1 \mu_2}{\sqrt{\sigma_1^2 \sigma_2^2}} = \rho.$$

1.2.4 Characterization of convergence in law

Next we provide a collection of basic facts about convergence in distribution. The following theorems provide methodology for establishing convergence in distribution.

Theorem Let $X, X_1, X_2, \ldots$ be random $p$-vectors.
(i) (The Portmanteau Theorem) $X_n \xrightarrow{d} X$ is equivalent to the following condition: $E[g(X_n)] \to E[g(X)]$ for every bounded continuous function $g$.
(ii) (Levy-Cramer continuity theorem) Let $\Phi_X, \Phi_{X_1}, \Phi_{X_2}, \ldots$ be the characteristic functions of $X, X_1, X_2, \ldots$, respectively. Then $X_n \xrightarrow{d} X$ iff $\lim_{n \to \infty} \Phi_{X_n}(t) = \Phi_X(t)$ for all $t \in R^p$.
(iii) (Cramer-Wold device) $X_n \xrightarrow{d} X$ iff $c^T X_n \xrightarrow{d} c^T X$ for every $c \in R^p$.

Proof. (i) See Serfling (1980), page 16. (ii) See Shao (2003), page 57. (iii) Assume $c^T X_n \xrightarrow{d} c^T X$ for every $c$; then by part (ii),
$$\lim_{n \to \infty} \Phi_{X_n}(t c_1, \ldots, t c_p) = \Phi_X(t c_1, \ldots, t c_p) \quad \text{for all } t.$$
With $t = 1$, and since $c$ is arbitrary, it follows by part (ii) again that $X_n \xrightarrow{d} X$. The converse can be proved by a similar argument. [Note that $\Phi_{c^T X_n}(t) = \Phi_{X_n}(tc)$ and $\Phi_{c^T X}(t) = \Phi_X(tc)$ for any $t \in R$ and any $c \in R^p$.]

A straightforward application of this theorem is that if $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} c$ for a constant vector $c$, then $(X_n, Y_n) \xrightarrow{d} (X, c)$.

Example (Uniform grid example revisited) Consider now the function $g(x) = x^{10}$, $0 \le x \le 1$. Note that $g$ is continuous and bounded. Therefore, by the Portmanteau theorem,
$$E(g(X_n)) = \frac{1}{n} \sum_{i=1}^n \left( \frac{i}{n} \right)^{10} \to E(g(X)) = \int_0^1 x^{10} \, dx = \frac{1}{11}.$$

Example For $n \ge 1$, $0 \le p \le 1$, and a given continuous function $g: [0,1] \to R$, define the sequence
$$B_n(p) = \sum_{k=0}^n g\!\left( \frac{k}{n} \right) C_n^k p^k (1-p)^{n-k},$$
the so-called Bernstein polynomial. Note that $B_n(p) = E[g(\frac{X}{n})]$, where $X \sim$ Bin$(n, p)$. As $n \to \infty$, $\frac{X}{n} \xrightarrow{p} p$ (WLLN), and it follows that $\frac{X}{n} \xrightarrow{d} \delta_p$, the point mass at $p$. Since $g$ is continuous and hence bounded (on a compact interval), it follows from the Portmanteau theorem that $B_n(p) \to g(p)$.
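To visualize this approximation-by-expectation idea, here is a small sketch (our own addition; the test function $g(x) = |x - 1/2|$ and the grid are arbitrary choices) that evaluates the Bernstein polynomial $B_n(p)$ on a grid and reports its maximum deviation from $g$.

```python
import numpy as np
from scipy.stats import binom

def bernstein(g, n, p):
    """B_n(p) = E[g(X/n)] with X ~ Bin(n, p), i.e. the n-th Bernstein polynomial of g."""
    k = np.arange(n + 1)
    return np.sum(g(k / n) * binom.pmf(k, n, p))

g = lambda x: np.abs(x - 0.5)          # continuous (not smooth) test function
grid = np.linspace(0, 1, 201)

for n in (5, 20, 100, 500):
    err = max(abs(bernstein(g, n, p) - g(p)) for p in grid)
    print(f"n={n:4d}  max_p |B_n(p) - g(p)| = {err:.4f}")
```

The maximum error decreases with $n$, which is exactly the Weierstrass-type approximation the Portmanteau argument delivers pointwise (and, with a little more work, uniformly).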

Example (i) Let $X_1, \ldots, X_n$ be iid random variables having a common CDF, and let $T_n = X_1 + \cdots + X_n$, $n = 1, 2, \ldots$. Suppose that $E|X_1| < \infty$. It follows from the properties of characteristic functions ($[\partial \Phi_X(t)/\partial t]_{t=0} = i\,EX$ and $[\partial^2 \Phi_X(t)/\partial t^2]_{t=0} = -EX^2$) and a Taylor expansion that the CHF of $X_1$ satisfies
$$\Phi_{X_1}(t) = \Phi_{X_1}(0) + i \mu t + o(|t|) \quad \text{as } t \to 0,$$
where $\mu = E X_1$. Then the CHF of $T_n/n$ satisfies
$$\Phi_{T_n/n}(t) = \left[ \Phi_{X_1}\!\left( \frac{t}{n} \right) \right]^n = \left[ 1 + \frac{i \mu t}{n} + o\!\left( \frac{|t|}{n} \right) \right]^n$$
for any $t \in R$ as $n \to \infty$. Since $(1 + c_n/n)^n \to \exp\{c\}$ for any complex sequence $c_n$ satisfying $c_n \to c$, we obtain that $\Phi_{T_n/n}(t) \to \exp\{i \mu t\}$, which is the CHF of the distribution degenerate at $\mu$. By the continuity theorem (part (ii) above), $T_n/n \xrightarrow{d} \mu$. Since convergence in law to a constant implies convergence in probability, this also shows that $T_n/n \xrightarrow{p} \mu$ (an informal proof of the WLLN).

(ii) Similarly, $\mu = 0$ and $\sigma^2 = \mathrm{Var}(X_1) < \infty$ imply, by a second-order Taylor expansion,
$$\Phi_{T_n/\sqrt{n}}(t) = \left[ 1 - \frac{\sigma^2 t^2}{2n} + o(t^2 n^{-1}) \right]^n$$
for any $t \in R$ as $n \to \infty$, which implies that $\Phi_{T_n/\sqrt{n}}(t) \to \exp\{-\sigma^2 t^2/2\}$, the CHF of $N(0, \sigma^2)$. Hence $T_n/\sqrt{n} \xrightarrow{d} N(0, \sigma^2)$.

(iii) Suppose now that $X_1, \ldots, X_n$ are random $p$-vectors and that $\mu = E X_1$ and $\Sigma = \mathrm{Cov}(X_1)$ are finite. For any fixed $c \in R^p$, it follows from the previous discussion that $(c^T T_n - n c^T \mu)/\sqrt{n} \xrightarrow{d} N(0, c^T \Sigma c)$. From the Cramer-Wold device, we conclude that $(T_n - n\mu)/\sqrt{n} \xrightarrow{d} N_p(0, \Sigma)$.

The following two simple results are frequently useful in calculations.

Theorem (i) (Prohorov's Theorem) If $X_n \xrightarrow{d} X$ for some $X$, then $X_n = O_p(1)$.
(ii) (Polya's Theorem) If $F_{X_n} \Rightarrow F_X$ and $F_X$ is continuous, then, as $n \to \infty$,
$$\sup_{-\infty < x < \infty} |F_{X_n}(x) - F_X(x)| \to 0.$$

Proof. (i) For any given $\varepsilon > 0$, fix a constant $M$ such that $P(|X| \ge M) < \varepsilon$. By the definition of convergence in law, $P(|X_n| \ge M)$ exceeds $P(|X| \ge M)$ by an arbitrarily small amount for sufficiently large $n$. Thus, there exists $N$ such that $P(|X_n| \ge M) < 2\varepsilon$ for all $n \ge N$. The result follows from the definition of $O_p(1)$.

(ii) First, fix $k \in N$. By the continuity of $F_X$ there exist points $-\infty = x_0 < x_1 < \cdots < x_k = \infty$ with $F_X(x_i) = i/k$. By monotonicity we have, for $x_{i-1} \le x \le x_i$,
$$F_{X_n}(x) - F_X(x) \le F_{X_n}(x_i) - F_X(x_{i-1}) = F_{X_n}(x_i) - F_X(x_i) + 1/k,$$
$$F_{X_n}(x) - F_X(x) \ge F_{X_n}(x_{i-1}) - F_X(x_i) = F_{X_n}(x_{i-1}) - F_X(x_{i-1}) - 1/k.$$
Thus $|F_{X_n}(x) - F_X(x)|$ is bounded above by $\sup_i |F_{X_n}(x_i) - F_X(x_i)| + 1/k$, for every $x$. The latter, finite supremum converges to zero, because each term converges to zero by the hypothesis, for each fixed $k$. Because $k$ is arbitrary, the result follows.

The following result can be used to check whether $X_n \xrightarrow{d} X$ when $X$ has a PDF $f$ and $X_n$ has a PDF $f_n$.

Theorem (Scheffe's Theorem) Let $f_n$ be a sequence of densities of absolutely continuous distributions, with $\lim_{n \to \infty} f_n(x) = f(x)$ for each $x \in R^p$. If $f$ is a density function, then
$$\lim_{n \to \infty} \int |f_n(x) - f(x)| \, dx = 0.$$

Proof. Put $g_n(x) = [f(x) - f_n(x)] I_{\{f(x) \ge f_n(x)\}}$. By noting that $\int [f_n(x) - f(x)] \, dx = 0$, we have
$$\int |f_n(x) - f(x)| \, dx = 2 \int g_n(x) \, dx.$$
Now $0 \le g_n(x) \le f(x)$ for all $x$, and $g_n(x) \to 0$ pointwise. Hence, by dominated convergence, $\lim_{n \to \infty} \int g_n(x) \, dx = 0$. [Dominated convergence theorem: if $\lim_{n \to \infty} f_n = f$ and there exists an integrable function $g$ such that $|f_n| \le g$, then $\lim_{n \to \infty} \int f_n(x) \, dx = \int \lim_{n \to \infty} f_n(x) \, dx$.]

As an example, consider the PDF $f_n$ of the t-distribution $t_n$, $n = 1, 2, \ldots$. One can show (exercise) that $f_n \to f$ pointwise, where $f$ is the standard normal PDF, so the $t_n$ distribution converges in law to $N(0,1)$.
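As a quick numerical companion to Scheffe's theorem (our own sketch; the grid and degrees of freedom are arbitrary choices), one can approximate the $L_1$ distance $\int |f_n(x) - \phi(x)|\,dx$ between the $t_n$ density and the standard normal density and watch it go to zero.

```python
import numpy as np
from scipy import stats

x = np.linspace(-30, 30, 200001)          # wide grid; both densities are tiny outside it
dx = x[1] - x[0]
phi = stats.norm.pdf(x)

for df in (1, 2, 5, 20, 100):
    # Riemann-sum approximation of the L1 distance between t_df and N(0,1) densities
    l1 = np.sum(np.abs(stats.t.pdf(x, df) - phi)) * dx
    print(f"df={df:4d}  approx int |f_n - phi| dx = {l1:.4f}")
```

For very small degrees of freedom the heavy tails make the truncation at $\pm 30$ a mild underestimate, but the decreasing trend toward zero is clear.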

The following result provides a convergence-of-moments criterion for convergence in law.

Theorem (Frechet and Shohat Theorem) Let the distribution functions $F_n$ possess finite moments $\alpha_{nk} = \int t^k \, dF_n(t)$ for $k = 1, 2, \ldots$ and $n = 1, 2, \ldots$. Assume that the limits $\alpha_k = \lim_{n \to \infty} \alpha_{nk}$ exist (finite) for each $k$. Then,
(i) the limits $\alpha_k$ are the moments of some distribution function $F$;
(ii) if the $F$ given by (i) is unique, then $F_n \Rightarrow F$.
[A sufficient condition: the moment sequence $\{\alpha_k\}$ determines the distribution $F$ uniquely if the Carleman condition $\sum_{i=1}^\infty \alpha_{2i}^{-1/(2i)} = \infty$ holds.]

1.2.5 Results on $o_p$ and $O_p$

There are many rules of calculus with $o_p$ and $O_p$ symbols, which we will apply without comment. For instance,
$$o_p(1) + o_p(1) = o_p(1), \quad o_p(1) + O_p(1) = O_p(1), \quad O_p(1)\, o_p(1) = o_p(1),$$
$$(1 + o_p(1))^{-1} = O_p(1), \quad o_p(R_n) = R_n\, o_p(1), \quad O_p(R_n) = R_n\, O_p(1), \quad o_p(O_p(1)) = o_p(1).$$
Two more complicated rules are given by the following lemma.

Lemma Let $g$ be a function defined on $R^p$ such that $g(0) = 0$. Let $X_n$ be a sequence of random vectors with values in $R^p$ that converges in probability to zero. Then, for every $r > 0$,
(i) if $g(t) = o(\|t\|^r)$ as $t \to 0$, then $g(X_n) = o_p(\|X_n\|^r)$;
(ii) if $g(t) = O(\|t\|^r)$ as $t \to 0$, then $g(X_n) = O_p(\|X_n\|^r)$.

Proof. Define $f(t) = g(t)/\|t\|^r$ for $t \ne 0$ and $f(0) = 0$. Then $g(X_n) = f(X_n) \|X_n\|^r$.
(i) Because the function $f$ is continuous at zero by assumption, $f(X_n) \xrightarrow{p} f(0) = 0$ by the continuous mapping theorem.
(ii) By assumption there exist $M$ and $\delta > 0$ such that $|f(t)| \le M$ whenever $\|t\| \le \delta$. Thus $P(|f(X_n)| > M) \le P(\|X_n\| > \delta) \to 0$, and the sequence $f(X_n)$ is bounded in probability.

1.3 The central limit theorem

The most fundamental result on convergence in law is the central limit theorem (CLT) for sums of random variables. We first state the case of chief importance, iid summands.

Definition A sequence of random variables $X_n$ is asymptotically normal with $\mu_n$ and $\sigma_n^2$ if $(X_n - \mu_n)/\sigma_n \xrightarrow{d} N(0,1)$; we write "$X_n$ is AN$(\mu_n, \sigma_n^2)$".

1.3.1 The CLT for the iid case

Theorem (Lindeberg-Levy) Let $X_i$ be iid with mean $\mu$ and finite variance $\sigma^2$. Then
$$\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \xrightarrow{d} N(0, 1).$$
By Slutsky's theorem, we can also write $\sqrt{n}(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$, and $\bar{X}$ is AN$(\mu, \sigma^2/n)$. See Billingsley (1995) for a proof.

Example (Confidence intervals) This theorem can be used to approximate $P(\bar{X} \le \mu + k\sigma/\sqrt{n})$ by $\Phi(k)$. This is very useful because the sampling distribution of $\bar{X}$ is not available except in some special cases. Setting $k = \Phi^{-1}(1 - \alpha) = z_\alpha$, the interval
$$[\bar{X}_n - \sigma z_\alpha / \sqrt{n}, \ \bar{X}_n + \sigma z_\alpha / \sqrt{n}]$$
is a confidence interval for $\mu$ of asymptotic level $1 - 2\alpha$. More precisely, the probability that $\mu$ is contained in this interval converges to $1 - 2\alpha$ (how accurate is this?).

Example (Sample variance) Suppose $X_1, \ldots, X_n$ are iid with mean $\mu$, variance $\sigma^2$, and $E(X_1^4) < \infty$. Consider the asymptotic distribution of $S_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2$. Write
$$\sqrt{n}(S_n^2 - \sigma^2) = \frac{n}{n-1} \left[ \sqrt{n} \left( \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 - \sigma^2 \right) - \sqrt{n}(\bar{X}_n - \mu)^2 \right] + \frac{\sqrt{n}}{n-1}\, \sigma^2.$$
The last two terms converge to zero in probability, and the first term is asymptotically normal by the CLT. The whole expression is asymptotically normal by Slutsky's theorem, i.e.,
$$\sqrt{n}(S_n^2 - \sigma^2) \xrightarrow{d} N(0, \mu_4 - \sigma^4),$$
where $\mu_4$ denotes the centered fourth moment of $X_1$, and $\mu_4 - \sigma^4$ comes from computing the variance of $(X_1 - \mu)^2$.

Example (Level of the chi-square test) Normal theory prescribes rejecting the null hypothesis $H_0: \sigma^2 \le 1$ for values of $n S_n^2$ exceeding the upper $\alpha$ point $\chi^2_{n-1,\alpha}$ of the $\chi^2_{n-1}$ distribution. If the observations are sampled from a normal distribution, the test has exactly level $\alpha$. However, this is no longer the case, even approximately, if the underlying distribution is not normal. The CLT and the previous example yield the following two statements:
$$\frac{\chi^2_{n-1} - (n-1)}{\sqrt{2(n-1)}} \xrightarrow{d} N(0, 1), \qquad \sqrt{n} \left( \frac{S_n^2}{\sigma^2} - 1 \right) \xrightarrow{d} N(0, \kappa + 2),$$
where $\kappa = \mu_4/\sigma^4 - 3$ is the kurtosis of the underlying distribution. The first statement implies that $(\chi^2_{n-1,\alpha} - (n-1))/\sqrt{2(n-1)}$ converges to the upper $\alpha$ point $z_\alpha$ of $N(0,1)$. Thus, the level of the chi-square test satisfies
$$P_{H_0}(n S_n^2 > \chi^2_{n-1,\alpha}) = P\left( \sqrt{n} \left( \frac{S_n^2}{\sigma^2} - 1 \right) > \frac{\chi^2_{n-1,\alpha} - n}{\sqrt{n}} \right) \to 1 - \Phi\left( z_\alpha \sqrt{\frac{2}{\kappa + 2}} \right).$$
So, the asymptotic level equals $1 - \Phi(z_\alpha) = \alpha$ iff the kurtosis of the underlying distribution is 0. If the kurtosis goes to infinity, then the asymptotic level approaches $1 - \Phi(0) = 1/2$. We conclude that the level of the chi-square test is nonrobust against departures from normality that affect the value of the kurtosis. If, instead, we used a normal approximation to the distribution of $\sqrt{n}(S_n^2/\sigma^2 - 1)$, the problem would not arise, provided that the asymptotic variance $\kappa + 2$ is estimated accurately.
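The nonrobustness of the chi-square test's level is easy to see numerically. The following sketch (our own addition; the Laplace distribution, whose kurtosis is 3, and the sample sizes are arbitrary choices) estimates the actual rejection probability at the boundary $\sigma^2 = 1$ and compares it with the limit $1 - \Phi(z_\alpha\sqrt{2/(\kappa+2)})$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, kappa = 0.05, 3.0                       # Laplace kurtosis (excess) is 3

def chi2_test_level(n, reps=20000):
    """Rejection rate of n*S_n^2 > chi2_{n-1,alpha} when X_i ~ Laplace with variance 1."""
    x = rng.laplace(scale=1 / np.sqrt(2), size=(reps, n))   # Var = 2*scale^2 = 1
    s2 = x.var(axis=1, ddof=1)
    crit = stats.chi2.ppf(1 - alpha, df=n - 1)
    return np.mean(n * s2 > crit)

limit = 1 - stats.norm.cdf(stats.norm.ppf(1 - alpha) * np.sqrt(2 / (kappa + 2)))
for n in (20, 100, 500):
    print(n, round(chi2_test_level(n), 3), "limit:", round(limit, 3))
```

The simulated level sits far above the nominal 0.05 and approaches the predicted limit as $n$ grows.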

Theorem (Multivariate CLT for the iid case) Let $X_i$ be iid random $p$-vectors with mean $\mu$ and covariance matrix $\Sigma$. Then
$$\sqrt{n}(\bar{X} - \mu) \xrightarrow{d} N_p(0, \Sigma).$$

Proof. By the Cramer-Wold device, this can be proved by finding the limit distribution of the sequence of real variables
$$c^T \left( \frac{1}{\sqrt{n}} \sum_{i=1}^n (X_i - \mu) \right) = \frac{1}{\sqrt{n}} \sum_{i=1}^n (c^T X_i - c^T \mu).$$
Because the random variables $c^T X_i - c^T \mu$ are iid with zero mean and variance $c^T \Sigma c$, this sequence is AN$(0, c^T \Sigma c)$ by the Lindeberg-Levy theorem. This is exactly the distribution of $c^T X$ if $X$ possesses an $N_p(0, \Sigma)$ distribution.

Example Suppose that $X_1, \ldots, X_n$ is a random sample from the Poisson distribution with mean $\theta$. Let $Z_n$ be the proportion of zeros observed, i.e., $Z_n = \frac{1}{n} \sum_{j=1}^n I_{\{X_j = 0\}}$. Let us find the joint asymptotic distribution of $(\bar{X}_n, Z_n)$. Note that $E(X_1) = \theta$, $E I_{\{X_1 = 0\}} = e^{-\theta}$, $\mathrm{Var}(X_1) = \theta$, $\mathrm{Var}(I_{\{X_1 = 0\}}) = e^{-\theta}(1 - e^{-\theta})$, and $E[X_1 I_{\{X_1 = 0\}}] = 0$. So $\mathrm{Cov}(X_1, I_{\{X_1 = 0\}}) = -\theta e^{-\theta}$. Hence,
$$\sqrt{n}\left( (\bar{X}_n, Z_n) - (\theta, e^{-\theta}) \right) \xrightarrow{d} N_2(0, \Sigma), \quad \text{where } \Sigma = \begin{pmatrix} \theta & -\theta e^{-\theta} \\ -\theta e^{-\theta} & e^{-\theta}(1 - e^{-\theta}) \end{pmatrix}.$$

It is not as widely known that the existence of a variance is not necessary for asymptotic normality of partial sums of iid random variables. A CLT without a finite variance can sometimes be useful. We present the general result below and then give an illustrative example. Feller (1966) contains detailed information on the availability of CLTs without the existence of a variance, along with proofs. First, we need a definition.

Definition A function $g: R \to R$ is called slowly varying at $\infty$ if, for every $t > 0$, $\lim_{x \to \infty} g(tx)/g(x) = 1$.

Examples of slowly varying functions are $\log x$, $x/(1+x)$, and indeed any function with a finite nonzero limit as $x \to \infty$. But, for example, $\sqrt{x}$ and $e^x$ are not slowly varying.

Theorem Let $X_1, X_2, \ldots$ be iid from a CDF $F$ on $R$. Let $v(x) = \int_{-x}^{x} y^2 \, dF(y)$. Then there exist constants $\{a_n\}$ and $\{b_n\}$ such that
$$\frac{\sum_{i=1}^n X_i - a_n}{b_n} \xrightarrow{d} N(0, 1)$$
if and only if $v(x)$ is slowly varying at $\infty$.

If $F$ has a finite second moment, then $v(x)$ is automatically slowly varying at $\infty$. We present an example below where asymptotic normality of the partial sums still holds although the summands do not have a finite variance.

Example Suppose $X_1, X_2, \ldots$ are iid from a t-distribution with 2 degrees of freedom ($t(2)$), which has a finite mean but not a finite variance. The density is given by $f(y) = c/(2 + y^2)^{3/2}$ for some positive $c$. Hence, by direct integration, for some other constant $k$,
$$v(x) = k\left[ \mathrm{arcsinh}(x/\sqrt{2}) - \frac{x}{\sqrt{2 + x^2}} \right].$$
Therefore, using the fact that $\mathrm{arcsinh}(x) = \log(2x) + O(x^{-2})$ as $x \to \infty$, we get $\frac{v(tx)}{v(x)} \to 1$ for any $t > 0$ after some algebra. It follows that for iid observations from a $t(2)$ distribution, with suitable centering and normalizing, the partial sums $\sum_{i=1}^n X_i$ converge to a normal distribution, although the $X_i$ do not have a finite variance. The centering can be taken to be zero for the centered t-distribution; it can be shown that the required normalizing is $b_n = \sqrt{n \log n}$ (why?).
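A simulation makes the unusual $\sqrt{n\log n}$ normalization tangible (our own sketch; the sample sizes and replication count are arbitrary choices): partial sums of $t(2)$ variables scaled by $\sqrt{n\log n}$ stay stable and look approximately normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

for n in (10**3, 10**4, 10**5):
    reps = 200
    sums = rng.standard_t(df=2, size=(reps, n)).sum(axis=1)
    scaled = sums / np.sqrt(n * np.log(n))
    ks = stats.kstest(scaled / scaled.std(), "norm").statistic
    print(f"n={n:6d}  sd of S_n/sqrt(n log n) = {scaled.std():.3f}  "
          f"KS distance to fitted normal = {ks:.3f}")
```

The standard-deviation column stabilizes (around a constant depending on the scaling convention), and the Kolmogorov-Smirnov distance to a fitted normal stays small, consistent with the theorem.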

1.3.2 The CLT for the independent, not necessarily iid case

Theorem (Lindeberg-Feller) Suppose $X_n$ is a sequence of independent variables with means $\mu_n$ and variances $\sigma_n^2 < \infty$. Let $s_n^2 = \sum_{i=1}^n \sigma_i^2$. If for any $\epsilon > 0$
$$\frac{1}{s_n^2} \sum_{j=1}^n \int_{|x - \mu_j| > \epsilon s_n} (x - \mu_j)^2 \, dF_j(x) \to 0, \tag{1.2}$$
where $F_i$ is the CDF of $X_i$, then
$$\frac{\sum_{i=1}^n (X_i - \mu_i)}{s_n} \xrightarrow{d} N(0, 1).$$
A proof can be found on page 67 of Shao (2003). Condition (1.2) is called the Lindeberg-Feller condition.

Example Let $X_1, X_2, \ldots$ be independent variables such that $X_j$ has the uniform distribution on $[-j, j]$, $j = 1, 2, \ldots$. Let us verify that the conditions of the Lindeberg-Feller theorem are satisfied. Note that $E X_j = 0$ and $\sigma_j^2 = \frac{1}{2j} \int_{-j}^{j} x^2 \, dx = j^2/3$ for all $j$. Hence,
$$s_n^2 = \sum_{j=1}^n \sigma_j^2 = \frac{1}{3} \sum_{j=1}^n j^2 = \frac{n(n+1)(2n+1)}{18}.$$
For any $\epsilon > 0$ we have $n < \epsilon s_n$ for sufficiently large $n$, since $\lim_{n \to \infty} n/s_n = 0$. Because $|X_j| \le j \le n$, when $n$ is sufficiently large $E(X_j^2 I_{\{|X_j| > \epsilon s_n\}}) = 0$ for every $j \le n$, and consequently $\sum_{j=1}^n E(X_j^2 I_{\{|X_j| > \epsilon s_n\}}) = 0$ for all large $n$. Since also $s_n \to \infty$, Lindeberg's condition holds.

The Lindeberg-Feller theorem is a landmark theorem in probability and statistics. Generally, however, it is hard to verify the Lindeberg-Feller condition. A simpler theorem is the following.

Theorem (Liapounov) Suppose $X_n$ is a sequence of independent variables with means $\mu_n$ and variances $\sigma_n^2 < \infty$. Let $s_n^2 = \sum_{i=1}^n \sigma_i^2$. If for some $\delta > 0$
$$\frac{1}{s_n^{2+\delta}} \sum_{j=1}^n E|X_j - \mu_j|^{2+\delta} \to 0 \tag{1.3}$$
as $n \to \infty$, then
$$\frac{\sum_{i=1}^n (X_i - \mu_i)}{s_n} \xrightarrow{d} N(0, 1).$$
A proof is given in Sen and Singer (1993). For instance, if $s_n \to \infty$, $\sup_{j \ge 1} E|X_j - \mu_j|^{2+\delta} < \infty$, and $s_n^2/n$ is bounded away from zero, then the condition of Liapounov's theorem is satisfied. In practice one usually tries to work with $\delta = 1$ or $2$ for algebraic convenience. It can easily be checked that if the $X_i$ are uniformly bounded and $s_n \to \infty$, the condition is immediately satisfied with $\delta = 1$.

Example Let $X_1, X_2, \ldots$ be independent random variables, and suppose that $X_i$ has the Bernoulli distribution BIN$(p_i, 1)$, $i = 1, 2, \ldots$. For each $i$, $E X_i = p_i$ and
$$E|X_i - E X_i|^3 = (1 - p_i)^3 p_i + p_i^3 (1 - p_i) \le 2 p_i (1 - p_i).$$

Hence,
$$\sum_{i=1}^n E|X_i - E X_i|^3 \le 2 \sum_{i=1}^n E|X_i - E X_i|^2 = 2 \sum_{i=1}^n p_i (1 - p_i) = 2 s_n^2,$$
and Liapounov's condition (1.3) holds with $\delta = 1$ if $s_n \to \infty$, since $2 s_n^2/s_n^3 = 2/s_n \to 0$. For example, if $p_i = 1/i$, or if $M_1 \le p_i \le M_2$ for two constants $M_1, M_2 \in (0,1)$, then $s_n \to \infty$ holds. Accordingly, by Liapounov's theorem,
$$\frac{\sum_{i=1}^n (X_i - p_i)}{s_n} \xrightarrow{d} N(0, 1).$$

A consequence especially useful in regression is the following theorem, which is also proved in Sen and Singer (1993).

Theorem (Hajek-Sidak) Suppose $X_1, X_2, \ldots$ are iid random variables with mean $\mu$ and variance $\sigma^2 < \infty$. Let $c_n = (c_{n1}, c_{n2}, \ldots, c_{nn})$ be a vector of constants such that
$$\max_{1 \le i \le n} \frac{c_{ni}^2}{\sum_{j=1}^n c_{nj}^2} \to 0 \tag{1.4}$$
as $n \to \infty$. Then
$$\frac{\sum_{i=1}^n c_{ni}(X_i - \mu)}{\sigma \sqrt{\sum_{j=1}^n c_{nj}^2}} \xrightarrow{d} N(0, 1).$$
Condition (1.4) ensures that no coefficient dominates the vector $c_n$; it is referred to as the Hajek-Sidak condition in the literature. For example, if $c_n = (1, 0, \ldots, 0)$, the condition would fail, and so would the theorem. The Hajek-Sidak theorem has many applications, including in the regression problem. Here is an important example.

Example (Simple linear regression) Consider the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the $\varepsilon_i$ are iid with mean 0 and variance $\sigma^2$ but are not necessarily normally distributed. The least squares estimate of $\beta_1$ based on $n$ observations is
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (y_i - \bar{y}_n)(x_i - \bar{x}_n)}{\sum_{i=1}^n (x_i - \bar{x}_n)^2} = \beta_1 + \frac{\sum_{i=1}^n \varepsilon_i (x_i - \bar{x}_n)}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}.$$
So $\hat{\beta}_1 = \beta_1 + \sum_{i=1}^n \varepsilon_i c_{ni} / \sum_{j=1}^n c_{nj}^2$, where $c_{ni} = x_i - \bar{x}_n$. Hence, by the Hajek-Sidak theorem,
$$\frac{\sqrt{\sum_{j=1}^n c_{nj}^2}\,(\hat{\beta}_1 - \beta_1)}{\sigma} = \frac{\sum_{i=1}^n \varepsilon_i c_{ni}}{\sigma \sqrt{\sum_{j=1}^n c_{nj}^2}} \xrightarrow{d} N(0, 1),$$
provided
$$\frac{\max_{1 \le i \le n} (x_i - \bar{x}_n)^2}{\sum_{j=1}^n (x_j - \bar{x}_n)^2} \to 0 \quad \text{as } n \to \infty.$$
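The following sketch (our own illustration; the uniformly drawn design points and the centered exponential errors are arbitrary, hypothetical choices) checks the Hajek-Sidak conclusion for the regression slope by simulating the standardized estimator $\sqrt{\sum_j c_{nj}^2}\,(\hat\beta_1-\beta_1)/\sigma$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def standardized_slope(n, reps=5000, beta0=1.0, beta1=2.0):
    """Simulate sqrt(sum c_nj^2) * (beta1_hat - beta1) / sigma with non-normal errors."""
    x = rng.uniform(0, 10, size=n)                  # fixed design, reused across replications
    cx = x - x.mean()
    sigma = 1.0
    out = np.empty(reps)
    for r in range(reps):
        eps = rng.exponential(sigma, size=n) - sigma        # mean 0, variance sigma^2
        y = beta0 + beta1 * x + eps
        b1 = np.sum((y - y.mean()) * cx) / np.sum(cx ** 2)
        out[r] = np.sqrt(np.sum(cx ** 2)) * (b1 - beta1) / sigma
    return out

for n in (10, 50, 200):
    z = standardized_slope(n)
    print(n, "KS distance to N(0,1):", round(stats.kstest(z, "norm").statistic, 3))
```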

For most reasonable designs, the Hajek-Sidak condition above is satisfied. Thus the asymptotic normality of the LSE (least squares estimate) is established under some conditions on the design variables, an important result.

Theorem (Lindeberg-Feller, multivariate) Suppose $X_i$ is a sequence of independent random vectors with means $\mu_i$, covariance matrices $\Sigma_i$, and distribution functions $F_i$. Suppose that $\frac{1}{n} \sum_{i=1}^n \Sigma_i \to \Sigma$ as $n \to \infty$, and that for any $\epsilon > 0$,
$$\frac{1}{n} \sum_{j=1}^n \int_{\|x - \mu_j\| > \epsilon \sqrt{n}} \|x - \mu_j\|^2 \, dF_j(x) \to 0.$$
Then
$$\frac{1}{\sqrt{n}} \sum_{i=1}^n (X_i - \mu_i) \xrightarrow{d} N(0, \Sigma).$$

Example (Multiple regression) In the linear regression problem we observe a vector $y = X\beta + \varepsilon$ for a fixed or random matrix $X$ of full rank and an error vector $\varepsilon$ with iid components with mean zero and variance $\sigma^2$. The least squares estimator of $\beta$ is $\hat{\beta} = (X^T X)^{-1} X^T y$. This estimator is unbiased and has covariance matrix $\sigma^2 (X^T X)^{-1}$. If the error vector $\varepsilon$ is normally distributed, then $\hat{\beta}$ is exactly normally distributed. Under reasonable conditions on the design matrix, $\hat{\beta}$ is asymptotically normally distributed for a large range of error distributions. Here we fix $p$ and let $n$ tend to infinity. This follows from the representation
$$(X^T X)^{1/2} (\hat{\beta} - \beta) = (X^T X)^{-1/2} X^T \varepsilon = \sum_{i=1}^n a_{ni} \varepsilon_i,$$
where $a_{n1}, \ldots, a_{nn}$ are the columns of the $(p \times n)$ matrix $(X^T X)^{-1/2} X^T =: A$. This sequence is asymptotically normal if the vectors $a_{n1} \varepsilon_1, \ldots, a_{nn} \varepsilon_n$ satisfy the Lindeberg condition. The norming matrix $(X^T X)^{1/2}$ has been chosen to ensure that the vectors in the display have covariance matrix $\sigma^2 I_p$ for every $n$. The remaining condition is
$$\sum_{i=1}^n \|a_{ni}\|^2\, E\, \varepsilon_i^2 I_{\{\|a_{ni}\| |\varepsilon_i| > \epsilon\}} \to 0.$$

This can be simplified to other conditions in several ways. Because $\sum_{i=1}^n \|a_{ni}\|^2 = \mathrm{tr}(A A^T) = p$, it suffices that $\max_i E\, \varepsilon_i^2 I_{\{\|a_{ni}\| |\varepsilon_i| > \epsilon\}} \to 0$, which (given $E \varepsilon_1^2 < \infty$) is equivalent to $\max_i \|a_{ni}\| \to 0$. Alternatively, the expectation $E\, \varepsilon_i^2 I_{\{\|a_{ni}\| |\varepsilon_i| > \epsilon\}}$ can be bounded by $\epsilon^{-k} E|\varepsilon_i|^{k+2} \|a_{ni}\|^k$, and a second set of sufficient conditions is
$$\sum_{i=1}^n \|a_{ni}\|^k \to 0 \quad \text{and} \quad E|\varepsilon_1|^k < \infty, \quad k > 2.$$

1.3.3 CLT for a random number of summands

The canonical CLT for the iid case says that if $X_1, X_2, \ldots$ are iid with mean zero and a finite variance $\sigma^2$, then the sequence of partial sums $T_n = \sum_{i=1}^n X_i$ obeys the central limit theorem in the sense that $\frac{T_n}{\sigma \sqrt{n}} \xrightarrow{d} N(0, 1)$. In some practical problems, for example in sequential statistical analysis, the number of terms in a partial sum is a random variable. Precisely, $\{N(t)\}$, $t \ge 0$, is a family of (nonnegative) integer-valued random variables, and we want to approximate the distribution of $T_{N(t)}$, where for each fixed $n$, $T_n$ is still the sum of $n$ iid variables as above. The question is whether a CLT still holds under appropriate conditions. Here is the Anscombe-Renyi theorem.

Theorem (Anscombe-Renyi) Let $X_i$ be iid with mean $\mu$ and a finite variance $\sigma^2$, let $\{N_n\}$ be a sequence of (nonnegative) integer-valued random variables, and let $\{a_n\}$ be a sequence of positive constants tending to $\infty$ such that $N_n / a_n \xrightarrow{p} c$, $0 < c < \infty$, as $n \to \infty$. Then
$$\frac{T_{N_n} - N_n \mu}{\sigma \sqrt{N_n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty.$$

Example (Coupon collection problem) Consider a problem in which a person keeps purchasing boxes of cereal until she obtains a full set of some $n$ coupons. The assumptions are that each box contains each of the $n$ coupons with equal probability, mutually independently. Suppose that the costs of buying the cereal boxes are iid with some mean $\mu$ and some variance $\sigma^2$. If it takes $N_n$ boxes to obtain the complete set of all $n$ coupons, then $N_n / (n \ln n) \xrightarrow{p} 1$ as $n \to \infty$. The total cost to the customer of obtaining the complete set of coupons is $T_{N_n} = X_1 + \cdots + X_{N_n}$.

By the Anscombe-Renyi theorem and Slutsky's theorem, we have that
$$\frac{T_{N_n} - N_n \mu}{\sigma \sqrt{n \ln n}}$$
is approximately $N(0, 1)$.

[On the distribution of $N_n$: let $t_i$ be the number of boxes needed to collect the $i$th new coupon after $i - 1$ coupons have been collected. The probability of obtaining a new coupon given that $i - 1$ have been collected is $p_i = (n - i + 1)/n$. Therefore $t_i$ has a geometric distribution with expectation $1/p_i$, and $N_n = \sum_{i=1}^n t_i$. By Theorem 1.2.5, we know
$$\frac{1}{n \ln n} N_n - \frac{1}{n \ln n} \sum_{i=1}^n \frac{1}{p_i} \xrightarrow{p} 0, \quad \text{where} \quad \frac{1}{n \ln n} \sum_{i=1}^n \frac{1}{p_i} = \frac{1}{n \ln n} \sum_{i=1}^n \frac{n}{n - i + 1} = \frac{1}{\ln n} \sum_{i=1}^n \frac{1}{i} =: \frac{H_n}{\ln n}.$$
Note that $H_n$ is the harmonic number, and hence, using the asymptotics of the harmonic numbers ($H_n = \ln n + \gamma + o(1)$, where $\gamma$ is Euler's constant), we obtain $\frac{N_n}{n \ln n} \xrightarrow{p} 1$.]

1.3.4 Central limit theorems for dependent sequences

The assumption that observed data $X_1, X_2, \ldots$ form an independent sequence is often one of technical convenience. Real data frequently exhibit some dependence, and at the least some correlation at small lags. Exact sampling distributions for fixed $n$ are even more complicated for dependent data than in the independent case, and so asymptotics remain useful. In this subsection we present CLTs for some important dependence structures: stationary m-dependence and sampling without replacement.

Stationary m-dependence

We start with an example to illustrate that a CLT for sample means can hold even if the summands are not independent.

Example Suppose $X_1, X_2, \ldots$ is a stationary Gaussian sequence with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$. Then, for each $n$, $\sqrt{n}(\bar{X}_n - \mu)$ is normally distributed, and so $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \tau^2)$, provided $\tau^2 = \lim_{n \to \infty} \mathrm{Var}(\sqrt{n}(\bar{X}_n - \mu)) < \infty$. But
$$\mathrm{Var}(\sqrt{n}(\bar{X}_n - \mu)) = \sigma^2 + \frac{2}{n} \sum_{i < j} \mathrm{Cov}(X_i, X_j) = \sigma^2 + \frac{2}{n} \sum_{i=1}^{n-1} (n - i) \gamma_i,$$

where $\gamma_i = \mathrm{Cov}(X_1, X_{i+1})$. Therefore, $\tau^2 < \infty$ if and only if $\frac{2}{n} \sum_{i=1}^{n-1} (n - i) \gamma_i$ has a finite limit, say $\rho$, in which case $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2 + \rho)$. What is going on qualitatively is that $\frac{1}{n} \sum_{i=1}^{n-1} (n - i) \gamma_i$ has a finite limit when $\gamma_i \to 0$ adequately fast. Instances of this are when only a fixed finite number of the $\gamma_i$ are nonzero, or when $\gamma_i$ is damped exponentially, i.e., $\gamma_i = O(a^i)$ for some $a < 1$. It turns out that there are general CLTs for sample averages under such conditions. The case of m-dependence is treated below.

Definition A stationary sequence $\{X_n\}$ is called m-dependent for a given fixed $m$ if $(X_1, \ldots, X_i)$ and $(X_j, X_{j+1}, \ldots)$ are independent whenever $j - i > m$.

Theorem (m-dependent sequence) Let $\{X_i\}$ be a stationary m-dependent sequence with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$. Then $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \tau^2)$, where
$$\tau^2 = \sigma^2 + 2 \sum_{i=2}^{m+1} \mathrm{Cov}(X_1, X_i).$$
See Lehmann (1999) for a proof. m-dependent data arise either from standard time series models or as models in their own right. For example, if $\{Z_i\}$ are iid random variables and $X_i = a_1 Z_{i-1} + a_2 Z_{i-2}$, $i \ge 3$, then $\{X_i\}$ is 1-dependent; this is a simple moving average process of use in time series analysis. A more general m-dependent sequence is $X_i = h(Z_i, Z_{i+1}, \ldots, Z_{i+m})$ for some function $h$.

Example Suppose the $Z_i$ are iid with mean $\mu$ and a finite variance $\sigma^2$, and let $X_i = (Z_i + Z_{i+1})/2$. Then, obviously,
$$\sum_{i=1}^n X_i = \frac{Z_1 + Z_{n+1}}{2} + \sum_{i=2}^n Z_i.$$
Then, by Slutsky's theorem, $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)$. Notice that we write $\sqrt{n}(\bar{X}_n - \mu)$ as a sum of two parts, one of which is dominant and produces the CLT, while the other is asymptotically negligible. This is essentially the method of proof of the CLT for more general m-dependent sequences.

Sampling without replacement

Dependent data also arise naturally in sampling without replacement from a finite population. Central limit theorems are available, and we will present them shortly, starting with an illustrative example.

Example Suppose that, among $N$ objects in a population, $D$ are of type 1 and $N - D$ are of type 2. A sample without replacement of size $n$ is taken, and let $X$ be the number of sampled units of type 1. We can regard the $D$ type-1 units as having numerical values $X_1 = \cdots = X_D = 1$ and the rest as having values $X_{D+1} = \cdots = X_N = 0$, so that $X = \sum_{i=1}^n X_{N_i}$, where $X_{N_1}, \ldots, X_{N_n}$ correspond to the sampled units. Of course, $X$ has the hypergeometric distribution
$$P(X = x) = \frac{C_x^D C_{n-x}^{N-D}}{C_n^N}, \quad 0 \le x \le D.$$
Two configurations can be considered:
(a) $n$ is fixed, and $D/N \to p$, $0 < p < 1$, with $N \to \infty$. In this case, by applying Stirling's approximation to $N!$ and $D!$, $P(X = x) \to C_x^n p^x (1 - p)^{n-x}$, and so $X \xrightarrow{d}$ Bin$(n, p)$;
(b) $n \to \infty$, $N \to \infty$, $N - n \to \infty$, $D/N \to p$, $0 < p < 1$. This is the case in which convergence of $X$ to normality holds.

Here is a general result; again, see Lehmann (1999) for a proof.

Theorem For $N \ge 1$, let $\pi_N$ be a finite population with numerical values $X_1, X_2, \ldots, X_N$. Let $X_{N1}, X_{N2}, \ldots, X_{Nn}$ be the values of the units of a sample without replacement of size $n$. Let $\bar{X}_n = \sum_{i=1}^n X_{Ni}/n$ and $\bar{X}_N = \sum_{i=1}^N X_i/N$. Suppose $n \to \infty$, $N - n \to \infty$, and that either of the following holds:
(a) $\dfrac{\max_{1 \le i \le N} (X_i - \bar{X}_N)^2}{\sum_{i=1}^N (X_i - \bar{X}_N)^2} \to 0$ and $n/N \to \tau$, $0 < \tau < 1$, as $N \to \infty$;
(b) $\dfrac{N \max_{1 \le i \le N} (X_i - \bar{X}_N)^2}{\sum_{i=1}^N (X_i - \bar{X}_N)^2} = O(1)$ as $N \to \infty$.
Then
$$\frac{\bar{X}_n - E(\bar{X}_n)}{\sqrt{\mathrm{Var}(\bar{X}_n)}} \xrightarrow{d} N(0, 1).$$

Example Suppose $X_{N1}, \ldots, X_{Nn}$ is a sample without replacement from the set $\{1, 2, \ldots, N\}$, and let $\bar{X}_n = \sum_{i=1}^n X_{Ni}/n$. Then, by direct calculation,
$$E(\bar{X}_n) = \frac{N+1}{2}, \qquad \mathrm{Var}(\bar{X}_n) = \frac{(N - n)(N + 1)}{12 n}.$$
Furthermore,
$$\frac{N \max_{1 \le i \le N} (X_i - \bar{X}_N)^2}{\sum_{i=1}^N (X_i - \bar{X}_N)^2} = \frac{3(N - 1)}{N + 1} = O(1).$$
Hence, by the theorem above,
$$\frac{\bar{X}_n - E(\bar{X}_n)}{\sqrt{\mathrm{Var}(\bar{X}_n)}} \xrightarrow{d} N(0, 1).$$

1.3.5 Accuracy of the CLT

Suppose a sequence of CDFs $F_{X_n} \xrightarrow{d} F_X$ for some $F_X$. Such a weak convergence result is usually used to approximate the true value of $F_{X_n}(x)$ at some fixed $n$ and $x$ by $F_X(x)$. However, the weak convergence result by itself says absolutely nothing about the accuracy of approximating $F_{X_n}(x)$ by $F_X(x)$ for that particular value of $n$. To approximate $F_{X_n}(x)$ by $F_X(x)$ for a given finite $n$ is a leap of faith unless we have some idea of the error committed, i.e., of $|F_{X_n}(x) - F_X(x)|$. More specifically, if for a sequence of random variables $X_1, \ldots, X_n$
$$\frac{\bar{X}_n - E(\bar{X}_n)}{\sqrt{\mathrm{Var}(\bar{X}_n)}} \xrightarrow{d} Z \sim N(0, 1),$$
then we need some idea of the error
$$\left| P\left( \frac{\bar{X}_n - E(\bar{X}_n)}{\sqrt{\mathrm{Var}(\bar{X}_n)}} \le x \right) - \Phi(x) \right|$$
in order to use the central limit theorem for a practical approximation with some degree of confidence. The first result in this direction for the iid case is the classic Berry-Esseen theorem. Typically, these accuracy measures give bounds on the error in the appropriate CLT for any fixed $n$, under assumptions on the moments of the $X_i$. In the canonical iid case with a finite variance, the CLT says that $\sqrt{n}(\bar{X} - \mu)/\sigma$ converges in law to $N(0, 1)$. By Polya's theorem, the uniform error
$$\Delta_n = \sup_{-\infty < x < \infty} \left| P\left( \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \le x \right) - \Phi(x) \right| \to 0$$
as $n \to \infty$. Bounds on $\Delta_n$ for any given $n$ are called uniform bounds.
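Before stating the Berry-Esseen theorem, one can get a feel for the size of $\Delta_n$ by brute force (a sketch of our own; the Exponential(1) parent and the grid of sample sizes are arbitrary choices): simulate many standardized means and take the maximum gap between their empirical CDF and $\Phi$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

def uniform_error(n, reps=50000):
    """Monte Carlo estimate of Delta_n = sup_x |P(sqrt(n)(Xbar-mu)/sigma <= x) - Phi(x)|
    for Exponential(1) data (mu = sigma = 1)."""
    z = np.sort(np.sqrt(n) * (rng.exponential(size=(reps, n)).mean(axis=1) - 1.0))
    ecdf_hi = np.arange(1, reps + 1) / reps        # empirical CDF just after each point
    ecdf_lo = np.arange(0, reps) / reps            # ... and just before
    phi = stats.norm.cdf(z)
    return max(np.max(np.abs(ecdf_hi - phi)), np.max(np.abs(ecdf_lo - phi)))

for n in (5, 20, 100, 400):
    print(f"n={n:4d}  estimated Delta_n ~ {uniform_error(n):.3f}")
```

The estimated $\Delta_n$ shrinks roughly like $n^{-1/2}$, matching the rate that the Berry-Esseen theorem below guarantees.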

The following results are the classic Berry-Esseen uniform bound and an extension of the Berry-Esseen inequality to the case of independent but not iid variables; a proof can be found in Petrov (1975). Introducing a higher-order (third) moment assumption, the Berry-Esseen inequality asserts the rate $O(n^{-1/2})$ for this convergence.

Theorem (i) (Berry-Esseen; iid case) Let $X_1, \ldots, X_n$ be iid with $E(X_1) = \mu$, $\mathrm{Var}(X_1) = \sigma^2$, and $\beta_3 = E|X_1 - \mu|^3 < \infty$. Then there exists a universal constant $C$, not depending on $n$ or the distribution of the $X_i$, such that
$$\sup_x \left| P\left( \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le x \right) - \Phi(x) \right| \le \frac{C \beta_3}{\sigma^3 \sqrt{n}}.$$
(ii) (Independent but not iid case) Let $X_1, \ldots, X_n$ be independent with $E(X_i) = \mu_i$, $\mathrm{Var}(X_i) = \sigma_i^2$, and $\beta_{3i} = E|X_i - \mu_i|^3 < \infty$. Then there exists a universal constant $C$, not depending on $n$ or the distributions of the $X_i$, such that
$$\sup_x \left| P\left( \frac{\bar{X}_n - E(\bar{X}_n)}{\sqrt{\mathrm{Var}(\bar{X}_n)}} \le x \right) - \Phi(x) \right| \le \frac{C \sum_{i=1}^n \beta_{3i}}{\left( \sum_{i=1}^n \sigma_i^2 \right)^{3/2}}.$$

This is the best possible rate in the sense that it cannot be improved without narrowing the class of distribution functions considered. For some specific underlying CDFs $F_X$, better rates of convergence in the CLT may be possible. This issue will become clearer when we discuss asymptotic expansions for $P(\sqrt{n}(\bar{X}_n - \mu)/\sigma \le x)$. In part (i), the universal constant $C$ may be taken to be $C = 0.8$.

Example The Berry-Esseen bound is uniform in $x$, and it is valid for any $n \ge 1$. While these are positive features of the theorem, it may not be possible to establish that $\Delta_n \le \epsilon$ for some preassigned $\epsilon > 0$ by using the Berry-Esseen theorem unless $n$ is very large. Let us see an illustrative example. Suppose $X_1, \ldots, X_n$ are iid BIN$(p, 1)$ and $n = 100$, and suppose we want the CLT approximation to be accurate to within an error of $\Delta_n = 0.005$. In the Bernoulli case, $\beta_3 = pq(1 - 2pq)$, where $q = 1 - p$. Using $C = 0.8$, the uniform Berry-Esseen bound is
$$\Delta_n \le \frac{0.8\, pq(1 - 2pq)}{(pq)^{3/2} \sqrt{n}}.$$
This is less than the prescribed $\Delta_n = 0.005$ only if $pq$ exceeds a threshold that is not attained for any $0 < p < 1$. Even for $p = 0.5$, the bound is less than or equal to $\Delta_n = 0.005$ only when $n$ exceeds roughly 25,000, which is a very large sample size. Of course, this is not necessarily a flaw of the Berry-Esseen inequality itself, because the desire to have a uniform error of at most $\Delta_n = 0.005$ is a tough demand, and a fairly large value of $n$ is probably needed to have such a small error in the CLT.
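The Bernoulli calculation above is easy to reproduce numerically; the helper below (our own addition, using the constant $C = 0.8$ quoted above) evaluates the Berry-Esseen bound as a function of $n$ and $p$ and searches for the smallest $n$ meeting a prescribed accuracy.

```python
import numpy as np

def be_bound(n, p, C=0.8):
    """Berry-Esseen uniform bound C*beta_3/(sigma^3*sqrt(n)) for iid Bernoulli(p)."""
    q = 1 - p
    beta3 = p * q * (1 - 2 * p * q)        # E|X - p|^3
    sigma3 = (p * q) ** 1.5
    return C * beta3 / (sigma3 * np.sqrt(n))

print(be_bound(100, 0.5))                   # about 0.08, far above the target 0.005

# smallest n for which the bound drops below 0.005 at p = 0.5
n = 1
while be_bound(n, 0.5) > 0.005:
    n += 1
print("smallest n with bound <= 0.005 at p = 0.5:", n)
```

With $C = 0.8$ the loop stops at $n = 25{,}600$, in line with the "well over 25,000" figure quoted above.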

Example As an example of independent variables that are not iid, consider $X_i \sim$ BIN$(i^{-1}, 1)$, $i \ge 1$, and let $S_n = \sum_{i=1}^n X_i$. Then $E(S_n) = \sum_{i=1}^n i^{-1}$, $\mathrm{Var}(S_n) = \sum_{i=1}^n (i - 1)/i^2$, and $\beta_{3i} = (i - 1)(i^2 - 2i + 2)/i^4$. Therefore, from part (ii) of the theorem above,
$$\Delta_n \le \frac{C \sum_{i=1}^n (i - 1)(i^2 - 2i + 2)/i^4}{\left[ \sum_{i=1}^n (i - 1)/i^2 \right]^{3/2}}.$$
Observe now that $\sum_{i=1}^n (i - 1)/i^2 = \log n + O(1)$ and $\sum_{i=1}^n (i - 1)(i^2 - 2i + 2)/i^4 = \log n + O(1)$. Substituting these back into the Berry-Esseen bound, one obtains with some minor algebra that $\Delta_n = O((\log n)^{-1/2})$.

For $x$ sufficiently large, while $n$ remains fixed, the quantities $F_{X_n}(x)$ and $F_X(x)$ each become so close to 1 that the bound given in the Berry-Esseen theorem is too crude. There has been a parallel development of bounds on the error in the CLT at a particular $x$, as opposed to bounds on the uniform error. Such bounds are called local Berry-Esseen bounds. Many different types of local bounds are available; we present here just one.

Theorem Let $X_1, \ldots, X_n$ be independent with $E(X_i) = \mu_i$, $\mathrm{Var}(X_i) = \sigma_i^2$, and $E|X_i - \mu_i|^{2+\delta} < \infty$ for some $0 < \delta \le 1$. Then
$$\left| P\left( \frac{\bar{X}_n - E(\bar{X}_n)}{\sqrt{\mathrm{Var}(\bar{X}_n)}} \le x \right) - \Phi(x) \right| \le \frac{D \sum_{i=1}^n E|X_i - \mu_i|^{2+\delta}}{(1 + |x|^{2+\delta}) \left( \sum_{i=1}^n \sigma_i^2 \right)^{1 + \delta/2}}$$
for some universal constant $0 < D < \infty$.

Such local bounds are useful in proving convergence of global error criteria such as $\int |F_{X_n}(x) - \Phi(x)|^p \, dx$, or for establishing approximations to the moments of $F_{X_n}$; uniform error bounds would be useless for these purposes. If the third absolute moments are finite, ...


More information

Notes on Random Vectors and Multivariate Normal

Notes on Random Vectors and Multivariate Normal MATH 590 Spring 06 Notes on Random Vectors and Multivariate Normal Properties of Random Vectors If X,, X n are random variables, then X = X,, X n ) is a random vector, with the cumulative distribution

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

The main results about probability measures are the following two facts:

The main results about probability measures are the following two facts: Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Recall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n

Recall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n Chapter 9 Hypothesis Testing 9.1 Wald, Rao, and Likelihood Ratio Tests Suppose we wish to test H 0 : θ = θ 0 against H 1 : θ θ 0. The likelihood-based results of Chapter 8 give rise to several possible

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Chapter 6. Convergence. Probability Theory. Four different convergence concepts. Four different convergence concepts. Convergence in probability

Chapter 6. Convergence. Probability Theory. Four different convergence concepts. Four different convergence concepts. Convergence in probability Probability Theory Chapter 6 Convergence Four different convergence concepts Let X 1, X 2, be a sequence of (usually dependent) random variables Definition 1.1. X n converges almost surely (a.s.), or with

More information

The Central Limit Theorem: More of the Story

The Central Limit Theorem: More of the Story The Central Limit Theorem: More of the Story Steven Janke November 2015 Steven Janke (Seminar) The Central Limit Theorem:More of the Story November 2015 1 / 33 Central Limit Theorem Theorem (Central Limit

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Lecture 1: August 28

Lecture 1: August 28 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random

More information

Module 3. Function of a Random Variable and its distribution

Module 3. Function of a Random Variable and its distribution Module 3 Function of a Random Variable and its distribution 1. Function of a Random Variable Let Ω, F, be a probability space and let be random variable defined on Ω, F,. Further let h: R R be a given

More information

Probability and Measure

Probability and Measure Chapter 4 Probability and Measure 4.1 Introduction In this chapter we will examine probability theory from the measure theoretic perspective. The realisation that measure theory is the foundation of probability

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

DA Freedman Notes on the MLE Fall 2003

DA Freedman Notes on the MLE Fall 2003 DA Freedman Notes on the MLE Fall 2003 The object here is to provide a sketch of the theory of the MLE. Rigorous presentations can be found in the references cited below. Calculus. Let f be a smooth, scalar

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Convergence in Distribution

Convergence in Distribution Convergence in Distribution Undergraduate version of central limit theorem: if X 1,..., X n are iid from a population with mean µ and standard deviation σ then n 1/2 ( X µ)/σ has approximately a normal

More information

Section 27. The Central Limit Theorem. Po-Ning Chen, Professor. Institute of Communications Engineering. National Chiao Tung University

Section 27. The Central Limit Theorem. Po-Ning Chen, Professor. Institute of Communications Engineering. National Chiao Tung University Section 27 The Central Limit Theorem Po-Ning Chen, Professor Institute of Communications Engineering National Chiao Tung University Hsin Chu, Taiwan 3000, R.O.C. Identically distributed summands 27- Central

More information

Chapter 7: Special Distributions

Chapter 7: Special Distributions This chater first resents some imortant distributions, and then develos the largesamle distribution theory which is crucial in estimation and statistical inference Discrete distributions The Bernoulli

More information

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

Lecture 32: Asymptotic confidence sets and likelihoods

Lecture 32: Asymptotic confidence sets and likelihoods Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Chapter 2. Discrete Distributions

Chapter 2. Discrete Distributions Chapter. Discrete Distributions Objectives ˆ Basic Concepts & Epectations ˆ Binomial, Poisson, Geometric, Negative Binomial, and Hypergeometric Distributions ˆ Introduction to the Maimum Likelihood Estimation

More information

8 Laws of large numbers

8 Laws of large numbers 8 Laws of large numbers 8.1 Introduction We first start with the idea of standardizing a random variable. Let X be a random variable with mean µ and variance σ 2. Then Z = (X µ)/σ will be a random variable

More information

1 Probability theory. 2 Random variables and probability theory.

1 Probability theory. 2 Random variables and probability theory. Probability theory Here we summarize some of the probability theory we need. If this is totally unfamiliar to you, you should look at one of the sources given in the readings. In essence, for the major

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

Lecture 2: Review of Probability

Lecture 2: Review of Probability Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................

More information

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests

More information

6 The normal distribution, the central limit theorem and random samples

6 The normal distribution, the central limit theorem and random samples 6 The normal distribution, the central limit theorem and random samples 6.1 The normal distribution We mentioned the normal (or Gaussian) distribution in Chapter 4. It has density f X (x) = 1 σ 1 2π e

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

Gaussian vectors and central limit theorem

Gaussian vectors and central limit theorem Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables

More information

Lecture Notes 5 Convergence and Limit Theorems. Convergence with Probability 1. Convergence in Mean Square. Convergence in Probability, WLLN

Lecture Notes 5 Convergence and Limit Theorems. Convergence with Probability 1. Convergence in Mean Square. Convergence in Probability, WLLN Lecture Notes 5 Convergence and Limit Theorems Motivation Convergence with Probability Convergence in Mean Square Convergence in Probability, WLLN Convergence in Distribution, CLT EE 278: Convergence and

More information

Stat 5101 Lecture Slides: Deck 7 Asymptotics, also called Large Sample Theory. Charles J. Geyer School of Statistics University of Minnesota

Stat 5101 Lecture Slides: Deck 7 Asymptotics, also called Large Sample Theory. Charles J. Geyer School of Statistics University of Minnesota Stat 5101 Lecture Slides: Deck 7 Asymptotics, also called Large Sample Theory Charles J. Geyer School of Statistics University of Minnesota 1 Asymptotic Approximation The last big subject in probability

More information

Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama

Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama Instructions This exam has 7 pages in total, numbered 1 to 7. Make sure your exam has all the pages. This exam will be 2 hours

More information

Mathematical Statistics

Mathematical Statistics Mathematical Statistics Chapter Three. Point Estimation 3.4 Uniformly Minimum Variance Unbiased Estimator(UMVUE) Criteria for Best Estimators MSE Criterion Let F = {p(x; θ) : θ Θ} be a parametric distribution

More information

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 1: Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section

More information

STAT 7032 Probability. Wlodek Bryc

STAT 7032 Probability. Wlodek Bryc STAT 7032 Probability Wlodek Bryc Revised for Spring 2019 Printed: January 14, 2019 File: Grad-Prob-2019.TEX Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH 45221 E-mail address:

More information

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018 Mathematics Ph.D. Qualifying Examination Stat 52800 Probability, January 2018 NOTE: Answers all questions completely. Justify every step. Time allowed: 3 hours. 1. Let X 1,..., X n be a random sample from

More information

Mathematics Qualifying Examination January 2015 STAT Mathematical Statistics

Mathematics Qualifying Examination January 2015 STAT Mathematical Statistics Mathematics Qualifying Examination January 2015 STAT 52800 - Mathematical Statistics NOTE: Answer all questions completely and justify your derivations and steps. A calculator and statistical tables (normal,

More information

Stat 710: Mathematical Statistics Lecture 31

Stat 710: Mathematical Statistics Lecture 31 Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

P (A G) dp G P (A G)

P (A G) dp G P (A G) First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume

More information

P n. This is called the law of large numbers but it comes in two forms: Strong and Weak.

P n. This is called the law of large numbers but it comes in two forms: Strong and Weak. Large Sample Theory Large Sample Theory is a name given to the search for approximations to the behaviour of statistical procedures which are derived by computing limits as the sample size, n, tends to

More information

Chp 4. Expectation and Variance

Chp 4. Expectation and Variance Chp 4. Expectation and Variance 1 Expectation In this chapter, we will introduce two objectives to directly reflect the properties of a random variable or vector, which are the Expectation and Variance.

More information

Introduction to Probability

Introduction to Probability LECTURE NOTES Course 6.041-6.431 M.I.T. FALL 2000 Introduction to Probability Dimitri P. Bertsekas and John N. Tsitsiklis Professors of Electrical Engineering and Computer Science Massachusetts Institute

More information

Quick Review on Linear Multiple Regression

Quick Review on Linear Multiple Regression Quick Review on Linear Multiple Regression Mei-Yuan Chen Department of Finance National Chung Hsing University March 6, 2007 Introduction for Conditional Mean Modeling Suppose random variables Y, X 1,

More information

1 Glivenko-Cantelli type theorems

1 Glivenko-Cantelli type theorems STA79 Lecture Spring Semester Glivenko-Cantelli type theorems Given i.i.d. observations X,..., X n with unknown distribution function F (t, consider the empirical (sample CDF ˆF n (t = I [Xi t]. n Then

More information

Product measure and Fubini s theorem

Product measure and Fubini s theorem Chapter 7 Product measure and Fubini s theorem This is based on [Billingsley, Section 18]. 1. Product spaces Suppose (Ω 1, F 1 ) and (Ω 2, F 2 ) are two probability spaces. In a product space Ω = Ω 1 Ω

More information

Chapter 7. Confidence Sets Lecture 30: Pivotal quantities and confidence sets

Chapter 7. Confidence Sets Lecture 30: Pivotal quantities and confidence sets Chapter 7. Confidence Sets Lecture 30: Pivotal quantities and confidence sets Confidence sets X: a sample from a population P P. θ = θ(p): a functional from P to Θ R k for a fixed integer k. C(X): a confidence

More information

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions.

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions. Large Sample Theory Study approximate behaviour of ˆθ by studying the function U. Notice U is sum of independent random variables. Theorem: If Y 1, Y 2,... are iid with mean µ then Yi n µ Called law of

More information

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM c 2007-2016 by Armand M. Makowski 1 ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM 1 The basic setting Throughout, p, q and k are positive integers. The setup With

More information

1 Presessional Probability

1 Presessional Probability 1 Presessional Probability Probability theory is essential for the development of mathematical models in finance, because of the randomness nature of price fluctuations in the markets. This presessional

More information

On the convergence of sequences of random variables: A primer

On the convergence of sequences of random variables: A primer BCAM May 2012 1 On the convergence of sequences of random variables: A primer Armand M. Makowski ECE & ISR/HyNet University of Maryland at College Park armand@isr.umd.edu BCAM May 2012 2 A sequence a :

More information

Week 9 The Central Limit Theorem and Estimation Concepts

Week 9 The Central Limit Theorem and Estimation Concepts Week 9 and Estimation Concepts Week 9 and Estimation Concepts Week 9 Objectives 1 The Law of Large Numbers and the concept of consistency of averages are introduced. The condition of existence of the population

More information

Chapter 5. Chapter 5 sections

Chapter 5. Chapter 5 sections 1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Expectation. DS GA 1002 Probability and Statistics for Data Science. Carlos Fernandez-Granda

Expectation. DS GA 1002 Probability and Statistics for Data Science.   Carlos Fernandez-Granda Expectation DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Aim Describe random variables with a few numbers: mean,

More information

Lecture 5: Expectation

Lecture 5: Expectation Lecture 5: Expectation 1. Expectations for random variables 1.1 Expectations for simple random variables 1.2 Expectations for bounded random variables 1.3 Expectations for general random variables 1.4

More information