Sampling Distributions and Asymptotics Part II


Sampling Distributions and Asymptotics Part II
İnsan Tunalı
Econ 511 - Econometrics I
Koç University
18 October 2018
İ. Tunalı (Koç University) Week 5 18 October 2018 1 / 29

Lecture Outline

Sampling Distributions and Asymptotics Part II: bivariate sampling distributions. We closely follow Goldberger Ch. 10. Two-digit equation numbers (such as 10.1, 10.2, ...) and result references (S1, S2, ...) in Goldberger have been preserved. See the syllabus for references to secondary sources (Greene).

Bivariate sampling distributions - 1 (Bivariate set-up)

The primary objective of econometrics is to uncover relations between variables. This requires shifting the focus from univariate to multivariate sampling distributions. As usual, there is much to be gained from studying the bivariate case.

Consider a bivariate population for which the pmf/pdf of the pair (X, Y) is f(x, y). The first and second moments include E(X) = μ_X, E(Y) = μ_Y, V(X) = σ_X², V(Y) = σ_Y², C(X, Y) = σ_XY.

For nonnegative integers (r, s), the raw and central moments are

E(X^r Y^s) = μ'_rs,  E(X°^r Y°^s) = μ_rs,  where X° = X − μ_X, Y° = Y − μ_Y.

For example, we may write μ_20 = σ_X², μ_02 = σ_Y², μ_11 = σ_XY.

Bivariate sampling distributions - 2 (Bivariate set-up)

A random sample of size n from the population f(x, y) consists of n independent draws on the pair (X, Y). Thus {(X_i, Y_i), i = 1, 2, ..., n} are independently and identically distributed. Independence applies across observations, not within each observation: in general, C(X, Y) = σ_XY ≠ 0. Armed with this knowledge, the joint pmf/pdf for the random sample can be derived as

g_n(x_1, y_1, x_2, y_2, ..., x_n, y_n) = ∏_{i=1}^n f(x_i, y_i).

Bivariate sampling distributions - 3 (Bivariate set-up)

Sample statistics that emerge under random sampling from a bivariate pdf/pmf include single-variable statistics such as X̄, S_X², Ȳ, S_Y², but also joint statistics that involve both components of the random vector (X, Y). One important example is the sample covariance:

S_XY = (1/n) Σ_i (X_i − X̄)(Y_i − Ȳ).

We may also be concerned with the joint distribution of several statistics, such as two sample means:

X̄ = (1/n) Σ_i X_i,  Ȳ = (1/n) Σ_i Y_i.

Obviously (X̄, Ȳ) will have a joint (bivariate) sampling distribution, by virtue of (X_i, Y_i) ~ f(x, y).

Bivariate sampling distributions - 4 (Bivariate set-up)

What can we say about the sampling distribution of (X̄, Ȳ)? From the S.M.T., we know the means and variances of X̄ and Ȳ. We can compute their covariance by extending T5 to sums of n random variables, and invoking the i.i.d. property of the pairs:

C(X̄, Ȳ) = (1/n²) C(Σ_i X_i, Σ_i Y_i) = (1/n²) Σ_i C(X_i, Y_i) = (1/n²) n σ_XY = σ_XY/n.

In deriving the asymptotic results we will exploit the S.M.T., the L.L.N. and the C.L.T., along with the Slutsky theorems (and their extensions).
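
As a quick numerical illustration (ours, not Goldberger's), the result C(X̄, Ȳ) = σ_XY/n can be checked by Monte Carlo. The bivariate normal population, seed, and sample sizes below are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of C(Xbar, Ybar) = sigma_XY / n.
# Population, seed, n and reps are illustrative choices, not from the lecture.
rng = np.random.default_rng(0)
n, reps = 50, 20000
sigma_xy = 0.6
cov = np.array([[1.0, sigma_xy], [sigma_xy, 1.0]])

# reps independent random samples of size n on the pair (X, Y)
draws = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))
xbar = draws[:, :, 0].mean(axis=1)   # one Xbar per sample
ybar = draws[:, :, 1].mean(axis=1)   # one Ybar per sample

emp_cov = np.cov(xbar, ybar)[0, 1]   # empirical covariance of the two sample means
theory = sigma_xy / n                # = 0.012 here
```

Across 20,000 replications the empirical covariance of the pair of sample means settles very close to σ_XY/n.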

Bivariate derivations - 1: Insights from the univariate case

The derivations we examined in the univariate case are instructive for the bivariate case. For example, consider the theory for the sampling distribution of the sample covariance:

S_XY = (1/n) Σ_i (X_i − X̄)(Y_i − Ȳ).

This is a single-variable statistic, so we should be able to apply what we learned directly. S_XY is an average, but since it is a function of sample means, the terms in the summation are not independently distributed. We need to follow the steps we used in the case of S²: work with the "ideal" covariance first, then turn to S_XY itself.

The theory for the sampling distribution of a pair of sample means requires more work: to examine how the sampling distribution of (X̄, Ȳ) behaves as n → ∞, we need bivariate versions of the convergence results we studied earlier.

Remark: Technically S_XY and (X̄, Ȳ) are sequences. In the interest of brevity, we drop the subscript "n" that identifies a sequence.

Bivariate derivations - 2. Example 4a: Ideal sample covariance

This is the sample second joint moment about the population means:

M°_11 = (1/n) Σ_i (X_i − μ_X)(Y_i − μ_Y) = (1/n) Σ_i V_i = V̄,

where V_i = X°_i Y°_i. Now {V_i, i = 1, 2, ..., n} is a random sample on the variable V = X°Y°, and M°_11 = V̄ is the sample mean in that random sample. Hence the earlier theory about the sample mean applies directly.

Bivariate derivations - 3

The population mean and variance are

E(V) = E(X°Y°) = C(X, Y) = σ_XY = μ_11,
V(V) = E(V²) − [E(V)]² = E(X°²Y°²) − [E(X°Y°)]² = μ_22 − μ_11².

From the S.M.T. we have the exact results

E(M°_11) = E(V̄) = E(V) = μ_11 = σ_XY,
V(M°_11) = V(V̄) = V(V)/n = (μ_22 − μ_11²)/n.

From the L.L.N. and the C.L.T. we also have the asymptotic results

M°_11 →p μ_11,
√n(M°_11 − μ_11) →d N(0, μ_22 − μ_11²),
M°_11 ~A N[μ_11, (μ_22 − μ_11²)/n].
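
Since the ideal covariance M°_11 (the joint moment about the population means) is just the sample mean of V_i = X°_i Y°_i, the univariate L.L.N./C.L.T. conclusions can be illustrated by simulation. The sketch below uses an arbitrary standard bivariate normal population, for which μ_22 = 1 + 2ρ² (Isserlis' theorem); all parameter choices are ours.

```python
import numpy as np

# M°_11 is the sample mean of V_i = X°_i Y°_i, so sqrt(n)(M°_11 - mu_11)
# should be approximately N(0, mu_22 - mu_11^2). Illustrative setup.
rng = np.random.default_rng(1)
n, reps = 400, 10000
rho = 0.5                                  # sigma_XY = mu_11 for this population
cov = np.array([[1.0, rho], [rho, 1.0]])

xy = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))
v = xy[:, :, 0] * xy[:, :, 1]              # V_i = X°_i Y°_i (population means are 0)
m11 = v.mean(axis=1)                       # one M°_11 per replication

var_v = (1 + 2 * rho**2) - rho**2          # mu_22 - mu_11^2; mu_22 = 1 + 2 rho^2 here
z = np.sqrt(n) * (m11 - rho)               # should look N(0, var_v)
```

The mean and variance of the standardized statistic z match 0 and μ_22 − μ_11² up to simulation noise.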

Bivariate derivations - 4. Example 4b: Sample covariance

The statistic of interest is the sample second joint moment about the sample means:

S_XY = M_11 = (1/n) Σ_i (X_i − X̄)(Y_i − Ȳ).

We may use algebra to write this as

M_11 = M°_11 − (X̄ − μ_X)(Ȳ − μ_Y). (10.1)

Direct calculation yields the exact results

E(S_XY) = E[M°_11 − (X̄ − μ_X)(Ȳ − μ_Y)] = σ_XY − C(X̄, Ȳ) = σ_XY − σ_XY/n = (1 − 1/n) σ_XY,

and

V(S_XY) = V[M°_11 − (X̄ − μ_X)(Ȳ − μ_Y)] = ... = [(n − 1)²(μ_22 − μ_11²) + (n − 1)(μ_20 μ_02 + μ_11²)]/n³.

Bivariate derivations - 5

Turning to asymptotics,

S_XY →p σ_XY,
√n(S_XY − σ_XY) →d N(0, μ_22 − μ_11²),
S_XY ~A N[σ_XY, (μ_22 − μ_11²)/n].

The first two convergence results require use of the Slutsky theorems. The third is the asymptotic distribution obtained by applying the limiting distribution to a finite sample.

Bivariate derivations - 6

To prove the first convergence result, recall that S_XY = M°_11 − (X̄ − μ_X)(Ȳ − μ_Y). Using S2:

plim S_XY = plim M°_11 − plim (X̄ − μ_X)(Ȳ − μ_Y) = plim M°_11 − plim(X̄ − μ_X) · plim(Ȳ − μ_Y).

Since plim M°_11 = σ_XY, while plim(X̄ − μ_X) = 0 and plim(Ȳ − μ_Y) = 0, the result follows.

Bivariate derivations - 7

To prove the second, we first rewrite (10.1) as

√n(S_XY − σ_XY) = √n(M°_11 − μ_11) − UW, (10.2)

where U = n^(1/4)(X̄ − μ_X), W = n^(1/4)(Ȳ − μ_Y). Let's focus on the last term, UW. Since plim U = 0 and plim W = 0, by S2 we have plim UW = 0. So by S3 the limiting distribution of √n(S_XY − σ_XY) is the same as the limiting distribution of √n(M°_11 − μ_11).

Bivariate derivations - 8. Example 5: Pair of sample means

As we noted earlier, the joint distribution of a pair of sample means is a bivariate distribution. To proceed, we need the bivariate versions of the convergence results we studied earlier. Before we state these, note that the fundamentals will not change: for a random vector, convergence in probability means that each component of the vector converges in probability. Convergence in distribution means that the sequence of joint cdfs has as its limit some fixed joint cdf.

Bivariate theorems - 9

BIVARIATE LAW OF LARGE NUMBERS: In random sampling from any bivariate population, the sample mean vector (X̄, Ȳ) converges in probability to the population mean vector (μ_X, μ_Y). We may write this as (X̄, Ȳ) →p (μ_X, μ_Y).

BIVARIATE CENTRAL LIMIT THEOREM: In random sampling from any bivariate population, the standardized sample mean vector converges in distribution to the SBVN(ρ) distribution, where ρ = σ_XY/(σ_X σ_Y). Equivalently, in random sampling from any bivariate population,

(X̄, Ȳ) ~A BVN(μ_X, μ_Y, σ_X²/n, σ_Y²/n, σ_XY/n).
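
One implication worth illustrating (our example, not Goldberger's): since every entry of the covariance matrix of (X̄, Ȳ) scales by 1/n, the correlation between the two sample means equals the population ρ whatever the sample size, even for non-normal populations. A simulation sketch with an arbitrary exponential population:

```python
import numpy as np

# For i.i.d. pairs, C(Xbar, Ybar) = sigma_XY/n and V(Xbar) = sigma_X^2/n, so
# Corr(Xbar, Ybar) = rho regardless of n. Illustrative non-normal population:
# X ~ Exp(1), Y = X + E with E ~ Exp(1) independent, so
# V(X) = 1, V(Y) = 2, C(X, Y) = 1, and rho = 1/sqrt(2).
rng = np.random.default_rng(6)
n, reps = 100, 20000
x = rng.exponential(1.0, size=(reps, n))
y = x + rng.exponential(1.0, size=(reps, n))

rho = 1 / np.sqrt(2)
corr = np.corrcoef(x.mean(axis=1), y.mean(axis=1))[0, 1]
```

The empirical correlation between the 20,000 pairs of sample means is close to 1/√2 ≈ 0.707, matching the bivariate C.L.T.'s normalized covariance structure.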

Bivariate theorems - 10

Remarks:
R1. Although the two theorems on the previous page were stated in the context of Example 5, they can be invoked in deriving the convergence results for any pair of sample moments under random sampling.
R2. Slutsky theorems S1-S4 extend to vectors in an obvious way.
R3. We need an additional tool (an extension of S5) for handling functions of sample means.

Bivariate theorems - 11

BIVARIATE DELTA METHOD: Suppose (T_1, T_2) ~A BVN(θ_1, θ_2, φ_1²/n, φ_2²/n, φ_12/n). Let U = h(T_1, T_2) denote a function that is twice differentiable at the point (θ_1, θ_2). Then

U ~A N[h(θ_1, θ_2), φ²/n], where φ² = h_1² φ_1² + h_2² φ_2² + 2 h_1 h_2 φ_12,

h_1 ≡ h_1(θ_1, θ_2) = ∂h(T_1, T_2)/∂T_1 evaluated at (T_1, T_2) = (θ_1, θ_2),
h_2 ≡ h_2(θ_1, θ_2) = ∂h(T_1, T_2)/∂T_2 evaluated at (T_1, T_2) = (θ_1, θ_2).
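
The recipe is mechanical enough to code. The helper below is our own sketch (not Goldberger's notation): it approximates h_1 and h_2 by central differences at (θ_1, θ_2) and returns φ². The ratio h(t_1, t_2) = t_1/t_2 anticipates Example 6; the values fed in are illustrative.

```python
def delta_var(h, theta1, theta2, phi1_sq, phi2_sq, phi12, eps=1e-6):
    """phi^2 in the bivariate delta method: h(T1, T2) ~A N(h(theta1, theta2), phi^2/n).

    The partials h_1, h_2 are approximated by central differences at (theta1, theta2).
    """
    h1 = (h(theta1 + eps, theta2) - h(theta1 - eps, theta2)) / (2 * eps)
    h2 = (h(theta1, theta2 + eps) - h(theta1, theta2 - eps)) / (2 * eps)
    return h1**2 * phi1_sq + h2**2 * phi2_sq + 2 * h1 * h2 * phi12

# Ratio of means with illustrative values mu_X = 2, mu_Y = 4,
# phi_1^2 = phi_2^2 = 1, phi_12 = 0.5. Here h_1 = 1/4, h_2 = -2/16, so
# phi^2 = (1/16) + (1/64) - (1/32) = 0.046875.
phi_sq = delta_var(lambda t1, t2: t1 / t2, 2.0, 4.0, 1.0, 1.0, 0.5)
```

The same number falls out of the closed-form expression (10.3) for the ratio of sample means, which is derived on the next slides.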

Bivariate derivations - 12. Example 6: Ratio of sample means

Let T = h(X̄, Ȳ) = X̄/Ȳ, and suppose that μ_Y ≠ 0. Then h(μ_X, μ_Y) = μ_X/μ_Y, and h(.) is twice differentiable at the point (μ_X, μ_Y). Since X̄ →p μ_X and Ȳ →p μ_Y (by the L.L.N.), h(X̄, Ȳ) →p μ_X/μ_Y (by S2). That is, T →p θ = μ_X/μ_Y. So we know the asymptotic mean of T... To find the asymptotic variance, we need to apply the bivariate delta method.

Bivariate derivations - 13

We compute, in turn,

h_1(X̄, Ȳ) = ∂h(X̄, Ȳ)/∂X̄ = 1/Ȳ, so h_1(μ_X, μ_Y) = 1/μ_Y,

and

h_2(X̄, Ȳ) = ∂h(X̄, Ȳ)/∂Ȳ = −X̄/Ȳ², so h_2(μ_X, μ_Y) = −μ_X/μ_Y² = −θ/μ_Y.

From the bivariate C.L.T., (X̄, Ȳ) ~A BVN(μ_X, μ_Y, σ_X²/n, σ_Y²/n, σ_XY/n), so T ~A N(θ, φ²/n) where

φ² = (1/μ_Y)² (σ_X² + θ² σ_Y² − 2θ σ_XY). (10.3)

Remark: Observe that this example also illustrates that the asymptotic mean (variance) and the exact mean (variance) can be different:

E(T) = E(X̄/Ȳ) ≠ E(X̄)/E(Ȳ) = μ_X/μ_Y = θ,
V(T) = V(X̄/Ȳ) ≠ φ²/n.
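
A simulation sketch of (10.3), with an illustrative population whose mean μ_Y is safely bounded away from zero: across repeated samples, n·V(T) should be close to φ², while the mean of T is close to (but, per the remark above, not exactly) θ.

```python
import numpy as np

# Monte Carlo check of T = Xbar/Ybar ~A N(theta, phi^2/n), eq. (10.3).
# Population parameters, seed and sizes are illustrative choices.
rng = np.random.default_rng(3)
n, reps = 200, 40000
mu_x, mu_y = 1.0, 2.0
sx2, sy2, sxy = 1.0, 1.0, 0.3

xy = rng.multivariate_normal([mu_x, mu_y], [[sx2, sxy], [sxy, sy2]], size=(reps, n))
t = xy[:, :, 0].mean(axis=1) / xy[:, :, 1].mean(axis=1)   # one T per sample

theta = mu_x / mu_y                                               # 0.5
phi_sq = (1 / mu_y)**2 * (sx2 + theta**2 * sy2 - 2 * theta * sxy) # (10.3): 0.2375
```

The scaled sampling variance n·V(T) lands near 0.2375, the value of φ² implied by (10.3) for this population.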

Bivariate derivations - 14. Example 7: Sample slope

In Week 2 we introduced the BLP of Y given X in a bivariate distribution, namely the line BLP(Y|X) = α + βX, where β = σ_XY/σ_X² and α = μ_Y − βμ_X. This line is also known as the population linear projection of Y on X in a bivariate population. The sample analog is the sample linear projection of Y on X in a bivariate random sample, namely the line Ŷ = A + BX with B = S_XY/S_X² = M_11/M_20, and A = Ȳ − BX̄.

Remark: "Projection" is a linear algebra term. In our bivariate context it describes the best linear approximation to Y that can be formed using X.

Bivariate derivations - 15

Observe that the sample slope B = S_XY/S_X² is a function of two sample statistics, S_XY and S_X². The asymptotic properties of these statistics, examined one by one, were the subjects of Examples 1 and 4. We now examine their ratio, following the well-established two-step procedure.

Bivariate derivations - 16. Example 7a: Ideal sample slope

Consider B° = M°_11/M°_20, which is obtained by replacing the sample means with their population counterparts:

M°_11 = (1/n) Σ_i X°_i Y°_i = V̄,  M°_20 = (1/n) Σ_i X°_i² = W̄,

where V = X°Y°, W = X°², and X° = X − μ_X, Y° = Y − μ_Y. Inspection reveals that

B° = M°_11/M°_20 = V̄/W̄ = h(V̄, W̄)

is the ratio of sample means in a random sample on (V, W), and the function h(.) has the same form as in Example 6. Thus we may anticipate the asymptotic distribution of the ideal sample slope to have the form B° ~A N(θ, φ²/n), and focus on the derivation of θ and φ² for the case at hand.

Bivariate derivations - 17

Now, by the L.L.N.,

V̄ →p μ_V = E(X°Y°) = σ_XY and W̄ →p μ_W = E(X°²) = σ_X².

So

h(V̄, W̄) = V̄/W̄ →p h(μ_V, μ_W) = μ_V/μ_W = σ_XY/σ_X² = β.

Thus θ = β, so we may write B° ~A N(β, φ²/n) where

φ² = (1/μ_W)² (σ_V² + β² σ_W² − 2β σ_VW). (10.4)

To render (10.4) operational, we need to express the moments of V and W in terms of the moments of X and Y.

Bivariate derivations - 18

We know μ_W = σ_X² = μ_20. It is straightforward to apply the S.M.T. to obtain the expressions for the other variances and the covariance:

σ_V² = V(V) = E(V²) − [E(V)]² = E(X°²Y°²) − [E(X°Y°)]² = μ_22 − μ_11²,
σ_W² = V(W) = E(W²) − [E(W)]² = E(X°⁴) − [E(X°²)]² = μ_40 − μ_20²,
σ_VW = C(V, W) = E(VW) − E(V)E(W) = E(X°Y°X°²) − E(X°Y°)E(X°²) = μ_31 − μ_11 μ_20.

Substitution and rearrangement allow us to express φ² as a function of the moments of X and Y. Thus B° ~A N(β, φ²/n), with

φ² = (μ_22 + β² μ_40 − 2β μ_31)/μ_20². (10.5)

Bivariate derivations - 19. Example 7b: Sample slope

We now return to B = S_XY/S_X² = M_11/M_20. Based on the work we did earlier (Examples 1 and 3), we know that M_11 →p μ_11 and M_20 →p μ_20, so we can apply S2 to get B →p μ_11/μ_20 = β.

Next, we shift focus to B − β and express it as

B − β = (B° − β) + (B − B°).

Clearly plim(B − B°) = 0, and from 7a, √n(B° − β) →d N(0, φ²). What remains to be shown is that √n(B − B°), the scaled second term, vanishes in probability.

Bivariate derivations - 20

The second term is

B − B° = M_11/M_20 − M°_11/M°_20 = (1/M_20)[(M_11 − M°_11) − (M°_11/M°_20)(M_20 − M°_20)],

so

√n(B − B°) = (1/M_20)[√n(M_11 − M°_11) − (M°_11/M°_20) √n(M_20 − M°_20)].

Now, based on our earlier work, M_20 →p μ_20, M°_11/M°_20 →p μ_11/μ_20, √n(M_11 − M°_11) →p 0, and √n(M_20 − M°_20) →p 0. So by S3, the limiting distribution of √n(B − β) is the same as that of √n(B° − β). We conclude that

B ~A N(β, φ²/n), with φ² = (μ_22 + β² μ_40 − 2β μ_31)/μ_20².
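
A simulation sketch of the conclusion B ~A N(β, φ²/n). For the illustrative population below (X standard normal, U independent homoskedastic noise), (10.5) reduces to φ² = σ²: here μ_20 = 1, μ_40 = 3, μ_31 = 3β, μ_22 = 3β² + σ², so μ_22 + β²μ_40 − 2βμ_31 = σ². All parameter choices are ours.

```python
import numpy as np

# Monte Carlo check of B ~A N(beta, phi^2/n) with a homoskedastic linear population,
# for which (10.5) reduces to phi^2 = sigma^2. Illustrative parameter choices.
rng = np.random.default_rng(5)
n, reps = 100, 20000
alpha, beta, sigma = 1.0, 0.7, 1.5

x = rng.normal(0.0, 1.0, size=(reps, n))
y = alpha + beta * x + rng.normal(0.0, sigma, size=(reps, n))

xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b = (xd * yd).mean(axis=1) / (xd**2).mean(axis=1)   # B = S_XY / S_X^2, one per sample

phi_sq = sigma**2           # what (10.5) gives for this population
scaled_var = n * b.var()    # should be close to phi^2 (up to O(1/n) terms)
```

With n = 100 the scaled variance of the sample slope sits near σ² = 2.25, modulo the O(1/n) finite-sample inflation that the asymptotic approximation ignores.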

Bivariate derivations - 21: Variance of the sample slope

The sample slope is a good candidate for quantifying the sample relation between two variables. Indeed, it will emerge as a commonly used measure. Thus its sampling variation, captured by φ², deserves further scrutiny.

Note that the denominator of φ² is μ_20² = V²(X). The numerator can be written as

μ_22 + β² μ_40 − 2β μ_31 = E(X°²Y°²) + β² E(X°⁴) − 2β E(X°³Y°) = E[X°²(Y° − βX°)²] = E(X°²U²), say,

where U = Y° − βX° = (Y − μ_Y) − β(X − μ_X) = Y − (α + βX) is the deviation from the population BLP. So

φ² = E(X°²U²)/V²(X). (10.6)
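
Formula (10.6) suggests a sample analog: replace E(X°²U²) and V(X) by their sample counterparts, with Û the deviation from the sample projection. (This is the heteroskedasticity-robust "sandwich" form familiar from regression; the connection, and the data-generating process below, are our illustration, not part of the lecture.) When the conditional variance of U rises with |x|, the robust estimate exceeds the naive σ̂²/S_X²:

```python
import numpy as np

# Sample analog of (10.6): phi2_robust = mean(xd^2 * uhat^2) / (S_X^2)^2.
# The DGP (error sd rising with |x|) is an illustrative assumption.
rng = np.random.default_rng(4)
n = 100_000
x = rng.normal(0.0, 1.0, n)
u = rng.normal(0.0, 1.0 + 0.5 * np.abs(x))   # sd depends on x: CVF not constant
beta_true = 2.0
y = 1.0 + beta_true * x + u

xd = x - x.mean()
sx2 = (xd**2).mean()                          # S_X^2
b = (xd * (y - y.mean())).mean() / sx2        # sample slope B = S_XY / S_X^2
u_hat = (y - y.mean()) - b * xd               # deviations from the sample projection

phi2_robust = (xd**2 * u_hat**2).mean() / sx2**2   # sample analog of (10.6)
phi2_naive = (u_hat**2).mean() / sx2               # sigma^2 / V(X) form (next slide)
```

Because X°² and U² are positively correlated here, phi2_robust comes out well above phi2_naive; the two coincide only in the special case discussed on the next slide.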

Bivariate derivations - 22

A special case arises when E(U²|X) does not vary with X. Let σ² denote the constant value of E(U²|X). In this case the numerator of (10.6) simplifies:

E(X°²U²) = E_X[E(X°²U²|X)] = E_X[X°² E(U²|X)] = E(X°² σ²) = σ² E(X°²) = σ² V(X).

So in this special case φ² = σ²/V(X), and we may conclude that the asymptotic variance of the sample slope will be large when the deviation from the BLP has a large (and constant) variance and/or the marginal variance of X is small.

Bivariate derivations - 23

How can this situation arise, and how can we detect it? Suppose the conditional expectation function E(Y|X) is linear, and let U = Y − E(Y|X). Then E(U²|X) = V(Y|X). Thus constant E(U²|X) means the conditional variance function V(Y|X) is also constant over X. In this case we can say that the asymptotic variance of the sample slope will be large if the conditional variance of Y is large and/or the marginal variance of X is small.

These conditions (linear CEF, constant CVF) imply restrictions on magnitudes that can be observed in a sample, and can therefore be verified (i.e. tested).

Remark: We studied one population, the bivariate normal, in which both conditions, linearity of the CEF and constancy of the CVF, are satisfied.