Lecture 19. Condence Interval December 5, 2011 The phrase condence interval can refer to a random interval, called an interval estimator, that covers the true value θ 0 of a parameter of interest with a prespecied probability. The prespecied probability (usually something 90%, 95% or 99%) is called the coverage probability or the condence level. The phrase could also refer to a realization of that random interval. Theoretical econometricians usually use the rst meaning whereas applied econometricians usually use the second meaning. In this lecture, we do not explicitly distinguish the two. It is worth noting that when the second meaning is used, it does not make sense to say the true value lies in the condence interval with such and such probability. 1 This is because in classical statistics, θ 0 is not considered random, the realization of a random interval is also nonrandom. The probability that a non-random interval covers a nonrandom true value is either zero or one and cannot be anything in between. Thus, it is more conventional to say that the condence interval of condence level 100(1 α)% is [ 0.01, 0.01]. Construct a condence interval To construct a condence interval, one typically start with an estimator and build an interval around it. The idea is that, if the estimator is good then it is likely to be close to the true value. Thus an interval around the estimator is likely to cover the true value. Example 1. Suppose we would like to construct a condence interval for the expection, µ, of X using a random n-sample {X 1,..., X n }. We start with an estimator, say the sample mean X n and construct an interval of the shape: [ X n c n, X n + c n ], for some positive (possibly random) c n. The c n is chosen to make the interval have a pre- 1 For example, it does not make sense to say that θ 0 lies in [ 0.01, 0.01] with 100(1 α)% probability. 1
specied coverage probability either exactly: Pr µ ( X n c n µ X n + c n ) = 1 α; (1) or asymptotically: lim Pr µ( X n c n µ X n + c n ) = 1 α. (2) n Case 1. X N(µ, σ 2 ) with known σ 2. Then, Xn N(µ, σ 2 /n), and hence n( X n µ)/σ N(0, 1). Using this fact, we can determine c n as follows: Pr µ ( X n c n µ X n + c n ) = Pr µ ( X n µ c n ) = Pr µ ( n( X n µ)/σ nc n /σ) = 1 2(1 Φ( nc n /σ)) = 2Φ( nc n /σ) 1. (3) The second to the last step uses the fact that the density of standard normal is symmetric around zero, thus Φ(x) = 1 Φ( x). Set the coverage probability to 1 α, and we have Φ( nc n /σ) = 1 α/2. (4) Use z α to denote the (1 α) quantile of N(0, 1). Then the appropriate c n should be σz α/2 / n. And the condence interval for µ is [ X n σz α/2 / n, X n + σz α/2 / n]. (5) Remark. This is a symmetric condence interval in that it is symmetric around the estimator X n. It is also an equal-tailed condence interval in that the probability that the whole interval falls on one side of µ equals the probability that it falls on the other side of µ: Pr µ ( X n σz α/2 / n µ) = Pr µ ( X n + σz α/2 / n µ) = α/2. (6) The symmetric condence interval and the equal-tailed condence interval coincide in this case because X n (the estimator)'s distribution is symmetric around µ (the true value) Why does one want to restrict the condence interval to be symmetric or equal-tailed? Why not consider condence intervals of the shape, for example, [ X n c n, X n + 10c n ]? It is as easy to nd c n that makes this condence interval have the prespecied coverage probability as above. But the problem is that this interval will be longer than the symmetric one. (c n will be smaller than σz α/2 / n, but 10c n will be much bigger than σz α/2 / n. ) 2
Longer interval estimators are considered less informative typically. There is one special case, though, which is the one-sided condence interval. The one sided condence interval is of the shape (, X n +c n ] or [ X n c n, ). These intervals are of innite length, but they are frequently used when one side of the parameter space (space of µ) is more interesting than the other. For example, when it is of interest to conclude we have 95% condence that µ is above (some number), one can use [ X n c n, ) instead of the two-sided intervals. At the same condence level, this interval gives you the largest lower bound number that you can insert in your conclusion. (Exercise: nd c n for the condence interval [ X n c n, X n + 10c n ] and for the condence interval [ X n c n, ) to have 95% coverage probability.) Case 1. In the above case, we assume that σ is known. If σ is not known, then the condence interval above is not feasible. In order to nd c n, in the second step of (3), we should not divide by σ. Instead, we can divide by the sample variance S X. Then Pr µ ( X n c n µ X n + c n ) = Pr µ ( n X n µ /S X nc n /S X ). (7) The random variable n( X n µ)/s X is not N(0, 1), but it has the student t-distribution, which is shown next. As above we know n( X n µ)/σ N(0, 1). The random variable (n 1)SX 2 /σ2 χ 2 (1) because: (n 1)S 2 X/σ 2 = n (X i X n ) 2 /σ 2 i=1 σ 1 (X 1 µ)... σ 1 (X n µ) / (I n n 1 1 n 1 n) σ 1 (X 1 µ)... σ 1 (X n µ), where I n is a n-dimensional identity matrix, 1 n is a n-vector of ones. The matrix I n n 1 1 n 1 n σ 1 (X 1 µ) is idempotent, and the vector... N(0, I n ). The quadractic form of a σ 1 (X n µ) standard normal vector with an idempotent weight matrix follows a chi-squared distribution 3
of degree equal to the trace of the weight matrix. Thus, (n 1)S 2 X/σ 2 χ 2 (trace(i n n 1 1 n 1 n)) = χ 2 (trace(i n ) n 1 trace(1 n 1 n)) = χ 2 (trace(i n ) n 1 trace(1 n1 n )) = χ 2 (n 1). (8) It can also be shown that n( X n µ)/σ is independent of (n 1)S 2 X /σ2. Therefore, n( Xn µ)/s X = = n( Xn µ)/σ ((n 1)S 2 X /σ 2 )/n 1 Z W/n 1, where Z N(0, 1) and W χ 2 (n 1) and Z is independent of W. By direct calculation Z of the density of, one can show that it is the density for the student t distribution W/n 1 with degree of freedom n 1: t(n 1). Thus, setting (7) to 1 α gives: 1 α = Pr( t(n 1) nc n /S X ) = 2F t(n 1) ( nc n /S X ) 1. Thus, c n = t α/2,n 1 S X / n, where t α/2,n 1 is the 1 α/2 quantile of t(n 1). The condence interval for µ is [ X n t α/2,n 1 S X / n, X n + t α/2,n 1 S X / n]. Case 2. The condence intervals in the above two cases are exact condence intervals, i.e. their coverage probabilities equals the prespecied condence level 1 α (called nominal condence level) exactly for any sample size. This is possible only because the estimator (and its sample standard deviation)'s distribution is known and tractable, which is so because we assume the X has normal distribution. Without the normality assumption, the distributions of Xn and S X are either unknown or in principal known but very untractable. Then, we will need to use condence intervals that do not have exact coverage probability (1), but have asymptotic coverage probability (2). We have shown that (suppose E[X 2 ] < and V ar(x) > 0) n( Xn µ) d N(0, σ 2 ) and S X p σ. By the continuous mappint theorem: n( X n µ)/s X d N(0, 1). If we set nc n /S X to a 4
xed number c, then lim Pr µ( X n c n µ X n + c n ) = lim Pr µ ( n( X n µ)/s X c) n n =2Φ(c) 1. Setting it to 1 α gives c = z α/2. Therefore, we can set c n = z α/2 S X / n and the random interval [ X n c n, X n + c n ] has coverage probability 1 α approximately in large samples. Example 2. The large sample condence interval can be constructed for any θ that has a n-consistent and asymptotically normal estimator ˆθn in the same way as Case 3 above. Suppose that n(ˆθ n θ) d N(0, σ 2 ), σ 2 > 0 and there is a consistent estimator ˆσ n for σ. Then, a two-sided symmetric condence interval for θ of nominal condence level 1 α can be [ˆθ n z α/2ˆσ n / n, ˆθ n + z α/2ˆσ n / n]. (Exercise, nd the one-sided condence interval of nominal condence level 1 α for θ and show that it asymptotically has coverage probability 1 α.) Example 3 (Condence interval for Variance). Condence intervals do not have to be of the shape as above. Suppose X N(µ, σ 2 ) with unknown σ 2. Find a condence interval for σ 2. Suppose that the interval is [c 1n, c 2n ]. Then we want it to satisfy: Pr(c 1n σ 2 c 2n ) = 1 α. In Case 2 above, we have found that (n 1)S 2 X /σ2 χ 2 (n 1). This suggests that we can do the following manipulation to the above equation: 1 α = Pr(c 1n σ 2 c 2n ) = Pr(c 1n /((n 1)SX) 2 σ 2 /((n 1)SX) 2 c 2n /((n 1)SX)) 2 = Pr((n 1)SX/c 2 2n (n 1)SX/σ 2 2 (n 1)SX/c 2 1n ) = F χ 2 (n 1)((n 1)SX/c 2 1n ) F χ 2 (n 1)((n 1)SX/c 2 2n ). Because F χ 2 (n 1)(), SX 2 and n are known, we can choose c 1n and c 2n to make the last expression 1 α. Clearly, there are many pairs (c 1n, c 2n ) that can satisfy the coverage probability requirement. Which ones to use depends on what one is interested in. If one has no preference, a good choice is the equal-tailed condence interval, which makes Pr(c 2n σ 2 ) = α/2 = Pr(c 1n > σ 2 ) 5
or equivalently: 1 F χ 2 (n 1)((n 1)SX/c 2 1n ) = F χ 2 (n 1)((n 1)SX/c 2 2n ) = α/2, or (n 1)SX 2 /c 1n = χ 2 α/2,n 1 and (n 1)S2 X /c 2n = χ 2 1 α/2,n 1, where χ2 α,n 1 is the 1 α quantile of χ 2 (n 1). Consequently, c 1n = (n 1)SX 2 /χ2 α/2,n 1 and c 2n = (n 1)SX 2 /χ2 1 α/2,n 1. (Think: how to construct (large sample) condence intervals for σ 2 without the normality assumption on X? Hint: we have derived the asymptotic distribution of SX 2 in a previous lecture.) 6