Sampling Distributions and Asymptotics, Part II
Insan TUNALI
Econ 511 - Econometrics I
Koç University
18 October 2018

I. Tunalı (Koç University) Week 5 18 October 2018 1 / 29
Lecture Outline

Sampling Distributions and Asymptotics, Part II: bivariate sampling distributions. We closely follow Goldberger Ch. 10. Two-digit equation numbers (such as 10.1, 10.2, ...) and result references (S1, S2, ...) in Goldberger have been preserved. See the syllabus for references to secondary sources (Greene).
Bivariate sampling distributions - 1

The primary objective of econometrics is to uncover relations between variables. This requires shifting the focus from univariate to multivariate sampling distributions. As usual, there is much to be gained from studying the bivariate case.

Consider a bivariate population for which the pmf/pdf of the pair $(X, Y)$ is $f(x, y)$. The first and second moments include $E(X) = \mu_X$, $E(Y) = \mu_Y$, $V(X) = \sigma_X^2$, $V(Y) = \sigma_Y^2$, $C(X, Y) = \sigma_{XY}$.

For nonnegative integers $(r, s)$, the raw and central moments are
$$E(X^r Y^s) = \mu'_{rs}, \qquad E(\tilde{X}^r \tilde{Y}^s) = \mu_{rs},$$
where $\tilde{X} = X - \mu_X$, $\tilde{Y} = Y - \mu_Y$. For example, we may write $\mu_{20} = \sigma_X^2$, $\mu_{02} = \sigma_Y^2$, $\mu_{11} = \sigma_{XY}$.
Bivariate sampling distributions - 2

A random sample of size $n$ from the population $f(x, y)$ consists of $n$ independent draws on the pair $(X, Y)$. Thus $\{(X_i, Y_i),\ i = 1, 2, \ldots, n\}$ are independently and identically distributed. Independence applies across observations, not within each observation: in general $C(X, Y) = \sigma_{XY} \neq 0$.

Armed with this knowledge, the joint pmf/pdf for the random sample can be derived as
$$g_n(x_1, y_1, x_2, y_2, \ldots, x_n, y_n) = \prod_{i=1}^{n} f(x_i, y_i).$$
Bivariate sampling distributions - 3

Sample statistics that emerge under random sampling from a bivariate pmf/pdf include single-variable statistics such as $\bar{X}$, $S_X^2$, $\bar{Y}$, $S_Y^2$, but also joint statistics that involve both components of the random vector $(X, Y)$. One important example is the sample covariance:
$$S_{XY} = \frac{1}{n} \sum_i (X_i - \bar{X})(Y_i - \bar{Y}).$$

We may also be concerned with the joint distribution of several statistics, such as two sample means:
$$\bar{X} = \frac{1}{n} \sum_i X_i, \qquad \bar{Y} = \frac{1}{n} \sum_i Y_i.$$
Obviously $(\bar{X}, \bar{Y})$ will have a joint (bivariate) sampling distribution, by virtue of $(X_i, Y_i) \sim f(x, y)$.
Bivariate sampling distributions - 4

What can we say about the sampling distribution of $(\bar{X}, \bar{Y})$? From the S.M.T., we know the means and variances of $\bar{X}$ and $\bar{Y}$. We can compute their covariance by extending T5 to sums of $n$ random variables, and invoking the i.i.d. property of the pairs:
$$C(\bar{X}, \bar{Y}) = \frac{1}{n^2}\, C\Big(\sum_i X_i,\ \sum_i Y_i\Big) = \frac{1}{n^2} \sum_i C(X_i, Y_i) = \frac{1}{n^2}\, n \sigma_{XY} = \sigma_{XY}/n.$$
(The cross terms $C(X_i, Y_j)$ with $i \neq j$ vanish because the pairs are independent across observations.)

In deriving the asymptotic results we will exploit the S.M.T., the L.L.N. and the C.L.T., along with the Slutsky theorems (and their extensions).
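A quick Monte Carlo check of the result $C(\bar{X}, \bar{Y}) = \sigma_{XY}/n$ (not in Goldberger; the bivariate normal population, $n$, and the number of replications below are illustrative choices):

```python
# Sketch: simulate many samples of size n and compare the empirical
# covariance of (Xbar, Ybar) across replications with sigma_XY / n.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 200_000
sigma_xy = 0.6                      # population covariance (unit variances)
cov = np.array([[1.0, sigma_xy], [sigma_xy, 1.0]])

draws = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))
xbar = draws[:, :, 0].mean(axis=1)  # one sample mean of X per replication
ybar = draws[:, :, 1].mean(axis=1)  # one sample mean of Y per replication

emp_cov = np.cov(xbar, ybar)[0, 1]  # empirical C(Xbar, Ybar)
print(emp_cov, sigma_xy / n)        # the two should be close
```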
Bivariate derivations - 1
Insights from the univariate case

The derivations we examined in the univariate case are instructive for the bivariate case. For example, consider the theory for the sampling distribution of the sample covariance:
$$S_{XY} = \frac{1}{n} \sum_i (X_i - \bar{X})(Y_i - \bar{Y}).$$
This is a single-variable statistic, so we should be able to apply what we learned directly. $S_{XY}$ is an average, but since it is a function of sample means, the terms in the summation are not independently distributed. We need to follow the steps we used in the case of $S^2$: work with the "ideal" covariance first, then turn to $S_{XY}$ itself.

The theory for the sampling distribution of a pair of sample means requires more work: to examine how the sampling distribution of $(\bar{X}, \bar{Y})$ behaves as $n \to \infty$, we need bivariate versions of the convergence results we studied earlier.

Remark: Technically $S_{XY}$ and $(\bar{X}, \bar{Y})$ are sequences. In the interest of brevity, we drop the subscript "$n$" that identifies a sequence.
Bivariate derivations - 2
Example 4a: Ideal sample covariance

This is the sample second joint moment about the population means (the asterisk marks a moment taken about population means):
$$M_{11}^* = \frac{1}{n} \sum_i (X_i - \mu_X)(Y_i - \mu_Y) = \frac{1}{n} \sum_i V_i = \bar{V},$$
where $V_i = \tilde{X}_i \tilde{Y}_i$. Now $\{V_i,\ i = 1, 2, \ldots, n\}$ is a random sample on the variable $V = \tilde{X}\tilde{Y}$, and $M_{11}^* = \bar{V}$ is a sample mean in that random sample. Hence the earlier theory about the sample mean applies directly.
Bivariate derivations - 3

The population mean and variance are
$$E(V) = E(\tilde{X}\tilde{Y}) = C(X, Y) = \sigma_{XY} = \mu_{11},$$
$$V(V) = E(V^2) - E^2(V) = E(\tilde{X}^2 \tilde{Y}^2) - E^2(\tilde{X}\tilde{Y}) = \mu_{22} - \mu_{11}^2.$$

From the S.M.T. we have the exact results
$$E(M_{11}^*) = E(\bar{V}) = E(V) = \mu_{11} = \sigma_{XY}, \qquad V(M_{11}^*) = V(\bar{V}) = V(V)/n = (\mu_{22} - \mu_{11}^2)/n.$$

From the L.L.N. and the C.L.T. we also have the asymptotic results
$$M_{11}^* \overset{P}{\to} \mu_{11}, \qquad \sqrt{n}(M_{11}^* - \mu_{11}) \overset{D}{\to} N(0,\ \mu_{22} - \mu_{11}^2), \qquad M_{11}^* \overset{A}{\sim} N[\mu_{11},\ (\mu_{22} - \mu_{11}^2)/n].$$
Bivariate derivations - 4
Example 4b: Sample covariance

The statistic of interest is the sample second joint moment about the sample means:
$$S_{XY} = M_{11} = \frac{1}{n} \sum_i (X_i - \bar{X})(Y_i - \bar{Y}).$$
We may use algebra to write this as
$$M_{11} = M_{11}^* - (\bar{X} - \mu_X)(\bar{Y} - \mu_Y). \tag{10.1}$$

Direct calculation yields the exact results
$$E(S_{XY}) = E[M_{11}^* - (\bar{X} - \mu_X)(\bar{Y} - \mu_Y)] = \sigma_{XY} - C(\bar{X}, \bar{Y}) = \sigma_{XY} - \sigma_{XY}/n = \Big(1 - \frac{1}{n}\Big)\sigma_{XY}$$
and
$$V(S_{XY}) = V[M_{11}^* - (\bar{X} - \mu_X)(\bar{Y} - \mu_Y)] = \cdots = [(n-1)^2(\mu_{22} - \mu_{11}^2) + (n-1)(\mu_{20}\mu_{02} + \mu_{11}^2)]/n^3.$$
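The small-sample bias $E(S_{XY}) = (1 - 1/n)\sigma_{XY}$ can be seen in a short simulation (not from the text; the population, $n$, and replication count are illustrative choices):

```python
# Sketch: average S_XY (divisor n, as in the text) over many replications
# and compare with (1 - 1/n) * sigma_XY.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 10, 400_000
sigma_xy = 0.5
cov = np.array([[1.0, sigma_xy], [sigma_xy, 1.0]])

draws = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))
x, y = draws[:, :, 0], draws[:, :, 1]
# one sample covariance per replication, computed about the sample means
s_xy = ((x - x.mean(axis=1, keepdims=True)) *
        (y - y.mean(axis=1, keepdims=True))).mean(axis=1)

print(s_xy.mean(), (1 - 1 / n) * sigma_xy)   # both close to 0.45
```

With $n = 10$ the bias is substantial: the average of $S_{XY}$ sits near $0.9 \sigma_{XY}$, not $\sigma_{XY}$.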
Bivariate derivations - 5

Turning to asymptotics,
$$S_{XY} \overset{P}{\to} \sigma_{XY}, \qquad \sqrt{n}(S_{XY} - \sigma_{XY}) \overset{D}{\to} N(0,\ \mu_{22} - \mu_{11}^2), \qquad S_{XY} \overset{A}{\sim} N[\sigma_{XY},\ (\mu_{22} - \mu_{11}^2)/n].$$
The first two convergence results require use of the Slutsky theorems. The third is the asymptotic distribution obtained by applying the limiting distribution to a finite sample.
Bivariate derivations - 6

To prove the first convergence result, recall that $S_{XY} = M_{11}^* - (\bar{X} - \mu_X)(\bar{Y} - \mu_Y)$. Using S2:
$$\operatorname{plim} S_{XY} = \operatorname{plim} M_{11}^* - \operatorname{plim}(\bar{X} - \mu_X)(\bar{Y} - \mu_Y) = \operatorname{plim} M_{11}^* - \operatorname{plim}(\bar{X} - \mu_X) \cdot \operatorname{plim}(\bar{Y} - \mu_Y).$$
Since $\operatorname{plim} M_{11}^* = \sigma_{XY}$, while $\operatorname{plim}(\bar{X} - \mu_X) = 0$ and $\operatorname{plim}(\bar{Y} - \mu_Y) = 0$, the result follows.
Bivariate derivations - 7

To prove the second, we first rewrite (10.1) as
$$\sqrt{n}(M_{11} - \mu_{11}) = \sqrt{n}(M_{11}^* - \mu_{11}) - UW, \tag{10.2}$$
where $U = \sqrt[4]{n}\,(\bar{X} - \mu_X)$ and $W = \sqrt[4]{n}\,(\bar{Y} - \mu_Y)$, so that $UW = \sqrt{n}(\bar{X} - \mu_X)(\bar{Y} - \mu_Y)$.

Let's focus on the last term, $UW$. Since $\operatorname{plim} U = 0$ and $\operatorname{plim} W = 0$ (each is $n^{-1/4}$ times a term that converges in distribution), by S2 we have $\operatorname{plim} UW = 0$. So by S3 the limiting distribution of $\sqrt{n}(S_{XY} - \sigma_{XY})$ is the same as the limiting distribution of $\sqrt{n}(M_{11}^* - \mu_{11})$.
Bivariate derivations - 8
Example 5: Pair of sample means

As we noted earlier, the joint distribution of a pair of sample means is a bivariate distribution. To proceed, we need the bivariate versions of the convergence results we studied earlier. Before we state these, note that the fundamentals will not change: for a random vector, convergence in probability means that each component of the vector converges in probability. Convergence in distribution means that the sequence of joint cdfs has as its limit some fixed joint cdf.
Bivariate theorems - 9

BIVARIATE LAW OF LARGE NUMBERS: In random sampling from any bivariate population, the sample mean vector $(\bar{X}, \bar{Y})$ converges in probability to the population mean vector $(\mu_X, \mu_Y)$. We may write this as
$$(\bar{X}, \bar{Y}) \overset{P}{\to} (\mu_X, \mu_Y).$$

BIVARIATE CENTRAL LIMIT THEOREM: In random sampling from any bivariate population, the standardized sample mean vector $(\bar{X}, \bar{Y})$ converges in distribution to the SBVN($\rho$) distribution, where $\rho = \sigma_{XY}/\sigma_X \sigma_Y$. Equivalently, in random sampling from any bivariate population,
$$(\bar{X}, \bar{Y}) \overset{A}{\sim} BVN(\mu_X,\ \mu_Y,\ \sigma_X^2/n,\ \sigma_Y^2/n,\ \sigma_{XY}/n).$$
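Both theorems hold for *any* bivariate population with the required moments, normal or not. A sketch with a deliberately skewed (exponential-based) population; the construction and all sizes are illustrative choices, not from the text:

```python
# Sketch of the bivariate LLN/CLT: sample means from a skewed population.
# With Z1, Z2 iid exponential(1), X = Z1 and Y = Z1 + Z2 give
# mu_X = 1, mu_Y = 2, var(X) = 1, var(Y) = 2, cov(X, Y) = 1.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 100_000
z1 = rng.exponential(1.0, size=(reps, n))
z2 = rng.exponential(1.0, size=(reps, n))
x, y = z1, z1 + z2

xbar, ybar = x.mean(axis=1), y.mean(axis=1)

# LLN: the sample mean vector is close to (mu_X, mu_Y) = (1, 2)
print(xbar.mean(), ybar.mean())
# CLT ingredients: n * C(Xbar, Ybar) approaches sigma_XY = 1, and the
# standardized mean sqrt(n)(Xbar - mu_X) is approximately N(0, 1)
z = np.sqrt(n) * (xbar - 1.0)
print(np.cov(xbar, ybar)[0, 1] * n)
print(z.std(), (z**3).mean())   # sd near 1; third moment ~ 2/sqrt(n), shrinking
```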
Bivariate theorems - 10

Remarks:
R1. Although the two theorems on the previous page were stated in the context of Example 5, they can be invoked in deriving the convergence results for any pair of sample moments under random sampling.
R2. Slutsky theorems S1-S4 extend to vectors in an obvious way.
R3. We need an additional tool (an extension of S5) for handling functions of sample means.
Bivariate theorems - 11

BIVARIATE DELTA METHOD: Suppose $(T_1, T_2) \overset{A}{\sim} BVN(\theta_1,\ \theta_2,\ \phi_1^2/n,\ \phi_2^2/n,\ \phi_{12}/n)$. Let $U = h(T_1, T_2)$ denote a function that is twice differentiable at the point $(\theta_1, \theta_2)$. Then
$$U \overset{A}{\sim} N[h(\theta_1, \theta_2),\ \phi^2/n], \qquad \text{where } \phi^2 = h_1^2 \phi_1^2 + h_2^2 \phi_2^2 + 2 h_1 h_2 \phi_{12},$$
$$h_1 \equiv h_1(\theta_1, \theta_2) = \partial h(T_1, T_2)/\partial T_1 \ \text{ evaluated at } (T_1, T_2) = (\theta_1, \theta_2),$$
$$h_2 \equiv h_2(\theta_1, \theta_2) = \partial h(T_1, T_2)/\partial T_2 \ \text{ evaluated at } (T_1, T_2) = (\theta_1, \theta_2).$$
Bivariate derivations - 12
Example 6: Ratio of sample means

Let $T = h(\bar{X}, \bar{Y}) = \bar{X}/\bar{Y}$, and suppose that $\mu_Y \neq 0$. Then $h(\mu_X, \mu_Y) = \mu_X/\mu_Y$, and $h(\cdot)$ is twice differentiable at the point $(\mu_X, \mu_Y)$. Since $\bar{X} \overset{P}{\to} \mu_X$ and $\bar{Y} \overset{P}{\to} \mu_Y$ (by the L.L.N.), $h(\bar{X}, \bar{Y}) \overset{P}{\to} \mu_X/\mu_Y$ (by S2). That is, $T \overset{P}{\to} \theta = \mu_X/\mu_Y$. So we know the asymptotic mean of $T$. To find the asymptotic variance, we need to apply the bivariate delta method.
Bivariate derivations - 13

We compute, in turn,
$$h_1(\bar{X}, \bar{Y}) = \partial h(\bar{X}, \bar{Y})/\partial \bar{X} = 1/\bar{Y}, \qquad \text{so } h_1(\mu_X, \mu_Y) = 1/\mu_Y,$$
and
$$h_2(\bar{X}, \bar{Y}) = \partial h(\bar{X}, \bar{Y})/\partial \bar{Y} = -\bar{X}/\bar{Y}^2, \qquad \text{so } h_2(\mu_X, \mu_Y) = -\mu_X/\mu_Y^2 = -\theta/\mu_Y.$$
From the bivariate C.L.T., $(\bar{X}, \bar{Y}) \overset{A}{\sim} BVN(\mu_X,\ \mu_Y,\ \sigma_X^2/n,\ \sigma_Y^2/n,\ \sigma_{XY}/n)$, so
$$T \overset{A}{\sim} N(\theta,\ \phi^2/n), \qquad \text{where } \phi^2 = (1/\mu_Y)^2(\sigma_X^2 + \theta^2 \sigma_Y^2 - 2\theta\sigma_{XY}). \tag{10.3}$$

Remark: Observe that this example also illustrates that the asymptotic mean (variance) and the exact mean (variance) can be different:
$$E(T) = E(\bar{X}/\bar{Y}) \neq E(\bar{X})/E(\bar{Y}) = \mu_X/\mu_Y = \theta, \qquad V(T) = V(\bar{X}/\bar{Y}) \neq \phi^2/n.$$
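Formula (10.3) can be checked by simulation (a sketch, not from the text; the population values, $n$, and replication count are illustrative choices):

```python
# Sketch: compare the Monte Carlo mean and variance of T = Xbar/Ybar
# with the delta-method values theta and phi^2/n from (10.3).
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 40_000
mu_x, mu_y = 2.0, 4.0
sigma2_x, sigma2_y, sigma_xy = 1.0, 1.0, 0.5
cov = np.array([[sigma2_x, sigma_xy], [sigma_xy, sigma2_y]])

draws = rng.multivariate_normal([mu_x, mu_y], cov, size=(reps, n))
t = draws[:, :, 0].mean(axis=1) / draws[:, :, 1].mean(axis=1)  # Xbar / Ybar

theta = mu_x / mu_y
phi2 = (1 / mu_y) ** 2 * (sigma2_x + theta**2 * sigma2_y - 2 * theta * sigma_xy)

print(t.mean(), theta)     # asymptotic mean
print(t.var() * n, phi2)   # n * V(T) vs. the delta-method phi^2
```

The last line also illustrates the closing remark: $n\,V(T)$ is close to, but not exactly equal to, $\phi^2$ in a finite sample.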
Bivariate derivations - 14
Example 7: Sample slope

In Week 2 we introduced the BLP of $Y$ given $X$ in a bivariate distribution, namely the line $BLP(Y|X) = \alpha + \beta X$, where $\beta = \sigma_{XY}/\sigma_X^2$ and $\alpha = \mu_Y - \beta\mu_X$. This line is also known as the population linear projection of $Y$ on $X$ in a bivariate population.

The sample analog is the sample linear projection of $Y$ on $X$ in a bivariate random sample, namely the line $\hat{Y} = A + BX$ with $B = S_{XY}/S_X^2 = M_{11}/M_{20}$ and $A = \bar{Y} - B\bar{X}$.

Remark: Projection is a linear algebra term. In our bivariate context it describes the best linear approximation to $Y$ that can be formed using $X$.
Bivariate derivations - 15

Observe that the sample slope $B = S_{XY}/S_X^2$ is a function of two sample statistics, $S_{XY}$ and $S_X^2$. The asymptotic properties of these statistics, examined one by one, were the subjects of Examples 1 and 4. We now examine their ratio, following the well-established two-step procedure: first the "ideal" version built on population means, then the feasible statistic itself.
Bivariate derivations - 16
Example 7a: Ideal sample slope

Consider $B^* = M_{11}^*/M_{20}^*$, which is obtained by replacing the sample means with their population counterparts:
$$M_{11}^* = \frac{1}{n} \sum_i \tilde{X}_i \tilde{Y}_i = \bar{V}, \qquad M_{20}^* = \frac{1}{n} \sum_i \tilde{X}_i^2 = \bar{W},$$
where $V = \tilde{X}\tilde{Y}$, $W = \tilde{X}^2$, and $\tilde{X} = X - \mu_X$, $\tilde{Y} = Y - \mu_Y$.

Inspection reveals that $B^* = M_{11}^*/M_{20}^* = \bar{V}/\bar{W} = h(\bar{V}, \bar{W})$ is the ratio of sample means in a random sample on $(V, W)$, and the function $h(\cdot)$ has the same form as in Example 6. Thus we may anticipate the asymptotic distribution of the ideal sample slope to have the form $B^* \overset{A}{\sim} N(\theta,\ \phi^2/n)$, and focus on the derivation of $\theta$ and $\phi^2$ for the case at hand.
Bivariate derivations - 17

Now, by the L.L.N.,
$$\bar{V} \overset{P}{\to} \mu_V = E(\tilde{X}\tilde{Y}) = \sigma_{XY} \qquad \text{and} \qquad \bar{W} \overset{P}{\to} \mu_W = E(\tilde{X}^2) = \sigma_X^2.$$
So
$$h(\bar{V}, \bar{W}) = \bar{V}/\bar{W} \overset{P}{\to} h(\mu_V, \mu_W) = \mu_V/\mu_W = \sigma_{XY}/\sigma_X^2 = \beta.$$
Thus $\theta = \beta$, so we may write $B^* \overset{A}{\sim} N(\beta,\ \phi^2/n)$, where
$$\phi^2 = (1/\mu_W)^2(\sigma_V^2 + \beta^2 \sigma_W^2 - 2\beta\sigma_{VW}). \tag{10.4}$$
To render (10.4) operational, we need to express the moments of $V$ and $W$ in terms of the moments of $X$ and $Y$.
Bivariate derivations - 18

We know $\mu_W = \sigma_X^2 = \mu_{20}$. It is straightforward to apply the S.M.T. to obtain the expressions for the other variances and the covariance:
$$\sigma_V^2 = V(V) = E(V^2) - E^2(V) = E(\tilde{X}^2\tilde{Y}^2) - E^2(\tilde{X}\tilde{Y}) = \mu_{22} - \mu_{11}^2,$$
$$\sigma_W^2 = V(W) = E(W^2) - E^2(W) = E(\tilde{X}^4) - E^2(\tilde{X}^2) = \mu_{40} - \mu_{20}^2,$$
$$\sigma_{VW} = C(V, W) = E(VW) - E(V)E(W) = E(\tilde{X}^3\tilde{Y}) - E(\tilde{X}\tilde{Y})E(\tilde{X}^2) = \mu_{31} - \mu_{11}\mu_{20}.$$
Substitution and rearrangement (the terms involving $\mu_{11}$ cancel because $\beta = \mu_{11}/\mu_{20}$) allows us to express $\phi^2$ as a function of the moments of $X$ and $Y$. Thus $B^* \overset{A}{\sim} N(\beta,\ \phi^2/n)$, with
$$\phi^2 = (\mu_{22} + \beta^2\mu_{40} - 2\beta\mu_{31})/\mu_{20}^2. \tag{10.5}$$
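A Monte Carlo check of (10.5) (a sketch, not from the text). For a bivariate normal population with unit variances and correlation $\rho$, the needed central moments are $\mu_{20} = 1$, $\mu_{40} = 3$, $\mu_{22} = 1 + 2\rho^2$, $\mu_{31} = 3\rho$; the choice $\rho = 0.5$ and all sizes below are illustrative:

```python
# Sketch: compare the Monte Carlo variance of the sample slope
# B = S_XY / S_X^2 with phi^2 / n from (10.5).
import numpy as np

rng = np.random.default_rng(4)
n, reps, rho = 300, 20_000, 0.5
cov = np.array([[1.0, rho], [rho, 1.0]])

draws = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))
x, y = draws[:, :, 0], draws[:, :, 1]
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
b = (xc * yc).mean(axis=1) / (xc**2).mean(axis=1)   # B = M11 / M20, per rep

beta = rho          # sigma_XY / sigma_X^2 for this population
# BVN central moments: mu20 = 1, mu40 = 3, mu22 = 1 + 2 rho^2, mu31 = 3 rho
phi2 = (1 + 2 * rho**2 + beta**2 * 3 - 2 * beta * 3 * rho) / 1.0**2

print(b.mean(), beta)       # B is centered on beta
print(b.var() * n, phi2)    # n * V(B) vs. phi^2, both near 0.75 here
```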
Bivariate derivations - 19
Example 7b: Sample slope

We now return to $B = S_{XY}/S_X^2 = M_{11}/M_{20}$. Based on the work we did earlier (Examples 1 and 3) we know that $M_{11} \overset{P}{\to} \mu_{11}$ and $M_{20} \overset{P}{\to} \mu_{20}$, so we can apply S2 to get $B \overset{P}{\to} \mu_{11}/\mu_{20} = \beta$.

Next, we shift focus to $B - \beta$ and express it as
$$B - \beta = (B^* - \beta) + (B - B^*).$$
Clearly $\operatorname{plim}(B^* - \beta) = 0$, and from 7a, $\sqrt{n}(B^* - \beta) \overset{D}{\to} N(0,\ \phi^2)$. What remains to be shown is that the second term, $B - B^*$, vanishes after scaling by $\sqrt{n}$.
Bivariate derivations - 20

The second term is
$$B - B^* = M_{11}/M_{20} - M_{11}^*/M_{20}^* = (1/M_{20})[(M_{11} - M_{11}^*) - (M_{11}^*/M_{20}^*)(M_{20} - M_{20}^*)],$$
so
$$\sqrt{n}(B - B^*) = (1/M_{20})[\sqrt{n}(M_{11} - M_{11}^*) - (M_{11}^*/M_{20}^*)\sqrt{n}(M_{20} - M_{20}^*)].$$
Now, based on our earlier work, $M_{20} \overset{P}{\to} \mu_{20}$, $(M_{11}^*/M_{20}^*) \overset{P}{\to} \mu_{11}/\mu_{20}$, $\sqrt{n}(M_{11} - M_{11}^*) \overset{P}{\to} 0$, and $\sqrt{n}(M_{20} - M_{20}^*) \overset{P}{\to} 0$. Hence $\sqrt{n}(B - B^*) \overset{P}{\to} 0$, and by S3 the limiting distribution of $\sqrt{n}(B - \beta)$ is the same as that of $\sqrt{n}(B^* - \beta)$. We conclude that
$$B \overset{A}{\sim} N(\beta,\ \phi^2/n), \qquad \text{with } \phi^2 = (\mu_{22} + \beta^2\mu_{40} - 2\beta\mu_{31})/\mu_{20}^2.$$
Bivariate derivations - 21
Variance of the sample slope

The sample slope is a good candidate for quantifying the sample relation between two variables. Indeed, it will emerge as a commonly used measure. Thus its sampling variation, captured by $\phi^2$, deserves further scrutiny.

Note that the denominator of $\phi^2$ is $\mu_{20}^2 = V^2(X)$. The numerator can be written as
$$\mu_{22} + \beta^2\mu_{40} - 2\beta\mu_{31} = E(\tilde{X}^2\tilde{Y}^2) + \beta^2 E(\tilde{X}^4) - 2\beta E(\tilde{X}^3\tilde{Y}) = E[\tilde{X}^2(\tilde{Y} - \beta\tilde{X})^2] = E(\tilde{X}^2 U^2), \text{ say},$$
where $U = \tilde{Y} - \beta\tilde{X} = (Y - \mu_Y) - \beta(X - \mu_X) = Y - (\alpha + \beta X)$ is the deviation from the population BLP. So
$$\phi^2 = E(\tilde{X}^2 U^2)/V^2(X). \tag{10.6}$$
Bivariate derivations - 22

A special case arises when $E(U^2|X)$ does not vary with $X$. Let $\sigma^2$ denote the constant value of $E(U^2|X)$. In this case the numerator of (10.6) simplifies:
$$E(\tilde{X}^2 U^2) = E_X[E(\tilde{X}^2 U^2|X)] = E_X[\tilde{X}^2 E(U^2|X)] = E[\tilde{X}^2 \sigma^2] = \sigma^2 E(\tilde{X}^2) = \sigma^2 V(X).$$
So in this special case, $\phi^2 = \sigma^2/V(X)$, and we may conclude that the asymptotic variance of the sample slope will be large when the deviation from the BLP has a large (and constant) variance and/or the marginal variance of $X$ is small.
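The simplification really does depend on $E(U^2|X)$ being constant. A sketch contrasting a homoskedastic design with a heteroskedastic one that has the same unconditional $E(U^2)$; the whole design is an illustrative choice, not from the text:

```python
# Sketch: estimate phi^2 = E(X~^2 U^2) / V^2(X) from one large "population"
# sample, under constant and non-constant E(U^2 | X).
import numpy as np

rng = np.random.default_rng(5)
N = 2_000_000
x = rng.normal(0.0, 1.0, N)              # centered X, V(X) = 1

# homoskedastic: E(U^2 | X) = sigma^2 = 0.5 for every x
u_hom = rng.normal(0.0, np.sqrt(0.5), N)
phi2_hom = np.mean(x**2 * u_hom**2) / np.var(x) ** 2

# heteroskedastic: E(U^2 | X) = 0.5 x^2, same unconditional E(U^2) = 0.5
u_het = rng.normal(0.0, 1.0, N) * np.sqrt(0.5) * np.abs(x)
phi2_het = np.mean(x**2 * u_het**2) / np.var(x) ** 2

print(phi2_hom)   # close to sigma^2 / V(X) = 0.5
print(phi2_het)   # larger: E(X^2 * 0.5 X^2) = 0.5 * mu40 = 1.5 here
```

Under heteroskedasticity the general formula (10.6) must be used; the shortcut $\sigma^2/V(X)$ would understate the slope's asymptotic variance in this design.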
Bivariate derivations - 23

How can this situation arise, and how can we detect it? Suppose the conditional expectation function $E(Y|X)$ is linear, and let $U = Y - E(Y|X)$. Then $E(U^2|X) = V(Y|X)$. Thus constant $E(U^2|X)$ means the conditional variance function $V(Y|X)$ is also constant over $X$. In this case we can say that the asymptotic variance of the sample slope will be large if the conditional variance of $Y$ is large and/or the marginal variance of $X$ is small.

These conditions (linear CEF, constant CVF) imply restrictions on magnitudes that can be observed in a sample, and can therefore be verified (i.e., tested).

Remark: We studied one population, the bivariate normal, in which both conditions, linearity of the CEF and constancy of the CVF, are satisfied.