Point and Interval Estimation II

BIOS 662
Michael G. Hudgens, Ph.D.
mhudgens@bios.unc.edu
http://www.bios.unc.edu/ mhudgens
2006-09-13 17:17

Nonparametric CI for the Median

Suppose $X_1, \ldots, X_n$ are iid according to a continuous distribution $F$.
Let $\zeta_{1/2}$ be the population median.
We will show

$$\Pr[X_{(r)} < \zeta_{1/2} < X_{(n-r+1)}] = \frac{1}{2^n} \sum_{i=r}^{n-r} \binom{n}{i}$$

Therefore, for fixed $n$, we choose the largest $r$ such that

$$\frac{1}{2^n} \sum_{i=r}^{n-r} \binom{n}{i} \ge 1 - \alpha$$

Bernoulli RV

Let $Y$ be a Bernoulli r.v.
$Y$ can take on two values, 0 or 1.
$\Pr[Y = 1] = \pi$; $\Pr[Y = 0] = 1 - \pi$
$E(Y) = \pi$; $Var(Y) = \pi(1 - \pi)$

Binomial RV

Consider a process that produces independent Bernoulli RVs with the same probability of success $\pi$.
Let $Y$ count the number of successes in $n$ trials.
$Y \sim \text{Binomial}(n, \pi)$

$$\Pr[Y = y] = \binom{n}{y} \pi^y (1 - \pi)^{n-y}$$

$E(Y) = n\pi$; $Var(Y) = n\pi(1 - \pi)$

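The binomial probabilities and moments above can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function name `binom_pmf` and the values n = 23, pi = 0.5 are illustrative choices, not part of the original slides.

```python
from math import comb

def binom_pmf(y: int, n: int, pi: float) -> float:
    """Pr[Y = y] for Y ~ Binomial(n, pi)."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

# Illustrative values (assumed for this sketch only).
n, pi = 23, 0.5
pmf = [binom_pmf(y, n, pi) for y in range(n + 1)]

total = sum(pmf)                                   # should be 1
mean = sum(y * p for y, p in enumerate(pmf))       # should be n * pi
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))  # n * pi * (1 - pi)
print(total, mean, var)
```

The same sum over a range of `y` values is what R's `sum(dbinom(...))` computes in the examples later in these notes.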
Derivation of CI for Median

CDF: $\Pr[X_i \le x] = F(x)$
Therefore

$$\Pr[X_{(r)} \le x] = \Pr[\text{at least } r \text{ of the } X_i \le x] = \sum_{i=r}^{n} \binom{n}{i} F(x)^i \{1 - F(x)\}^{n-i}$$

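The distribution function of an order statistic is a binomial tail sum, which is easy to evaluate directly. The sketch below assumes the caller supplies the value F(x); the function name `order_stat_cdf` is hypothetical.

```python
from math import comb

def order_stat_cdf(F_x: float, r: int, n: int) -> float:
    """Pr[X_(r) <= x] for an iid sample of size n, where F_x = F(x)."""
    return sum(
        comb(n, i) * F_x**i * (1 - F_x) ** (n - i)
        for i in range(r, n + 1)
    )

# Sanity checks with n = 2 and F(x) = 0.3:
# the maximum is <= x only if both observations are, i.e. F(x)^2;
# the minimum is <= x unless both exceed x, i.e. 1 - (1 - F(x))^2.
p_max = order_stat_cdf(0.3, r=2, n=2)
p_min = order_stat_cdf(0.3, r=1, n=2)
print(p_max, p_min)
```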
Derivation of CI for Median

By the law of total probability

$$\Pr[X_{(r)} \le \zeta_p] = \Pr[X_{(r)} \le \zeta_p,\ X_{(s)} \ge \zeta_p] + \Pr[X_{(r)} \le \zeta_p,\ X_{(s)} < \zeta_p]$$

If $s > r$, then $X_{(s)} < \zeta_p \Rightarrow X_{(r)} < \zeta_p$.
Therefore

$$\Pr[X_{(r)} \le \zeta_p] = \Pr[X_{(r)} \le \zeta_p \le X_{(s)}] + \Pr[X_{(s)} < \zeta_p]$$

Derivation of CI for Median

$$\begin{aligned}
\Pr[X_{(r)} \le \zeta_p \le X_{(s)}] &= \Pr[X_{(r)} \le \zeta_p] - \Pr[X_{(s)} < \zeta_p] \\
&= \sum_{i=r}^{n} \binom{n}{i} F(\zeta_p)^i \{1 - F(\zeta_p)\}^{n-i} - \sum_{i=s}^{n} \binom{n}{i} F(\zeta_p)^i \{1 - F(\zeta_p)\}^{n-i} \\
&= \sum_{i=r}^{s-1} \binom{n}{i} F(\zeta_p)^i \{1 - F(\zeta_p)\}^{n-i}
\end{aligned}$$

If $p = 1/2$, then $F(\zeta_p) = 1/2$, such that

$$\Pr[X_{(r)} \le \zeta_{0.5} \le X_{(s)}] = \frac{1}{2^n} \sum_{i=r}^{s-1} \binom{n}{i}$$

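The coverage formula for the median depends only on n, r and s, so it can be evaluated exactly with integer arithmetic. This is a small sketch; the function name `median_ci_coverage` is an illustrative choice.

```python
from math import comb

def median_ci_coverage(n: int, r: int, s: int) -> float:
    """Pr[X_(r) <= median <= X_(s)] = 2^(-n) * sum_{i=r}^{s-1} C(n, i)."""
    return sum(comb(n, i) for i in range(r, s)) / 2**n

# n = 23 with ranks r = 7 and s = 17, the case used in the
# betacarotene example of these notes.
cov_23 = median_ci_coverage(23, 7, 17)
print(cov_23)
```

Because the sum runs from r to s - 1, this matches the R idiom `sum(dbinom(r:(s-1), n, 1/2))`.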
Derivation of CI for Median

We could choose any $r$ and $s$ such that

$$\Pr(X_{(r)} \le \zeta_{0.5} \le X_{(s)}) = \frac{1}{2^n} \sum_{i=r}^{s-1} \binom{n}{i} \ge 1 - \alpha$$

But the best choice for $s$ is $n - r + 1$ (why?)
Thus we choose $r$ such that

$$\frac{1}{2^n} \sum_{i=r}^{n-r} \binom{n}{i} \ge 1 - \alpha$$

Derivation of CI for Median

Values of r for 95% CI for Median

    n        r
    1-5      0
    6-8      1
    9-11     2
    12-14    3
    15-16    4
    17-19    5
    20-22    6
    23-24    7
    25-27    8
    28-29    9
    30-32    10
    33-34    11

Cf. pages 269-270, van Belle et al.

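The tabulated values of r can be recomputed by searching for the largest r whose exact coverage is at least 1 - alpha. The sketch below is one way to do this; the names `coverage` and `largest_r` are hypothetical, and r = 0 is used here to mean that no finite interval reaches the requested level.

```python
from math import comb

def coverage(n: int, r: int) -> float:
    """Exact coverage of (X_(r), X_(n-r+1)) for the median."""
    return sum(comb(n, i) for i in range(r, n - r + 1)) / 2**n

def largest_r(n: int, alpha: float = 0.05) -> int:
    """Largest r with coverage >= 1 - alpha (0 if even r = 1 falls short)."""
    r = 0
    # Coverage decreases as r grows, so stop at the first failure.
    while r + 1 <= n // 2 and coverage(n, r + 1) >= 1 - alpha:
        r += 1
    return r

# Print r for each sample size covered by the table.
for n in range(1, 35):
    print(n, largest_r(n))
```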
95% CI for Betacarotene Example

For $n = 23$, choose $r = 7$ such that $n - r + 1 = 17$.
Therefore $(y_{(7)} = 106,\ y_{(17)} = 186)$ gives a 95% CI for the median betacarotene value.
This CI makes no assumptions about the distribution of the $Y$s.
Note:

$$\frac{1}{2^{23}} \sum_{i=7}^{23-7} \binom{23}{i} = 0.9653 \ge 1 - \alpha$$

    > sum(dbinom(7:16,23,1/2))
    [1] 0.9653103

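Given raw data, the interval is read off the sorted sample at ranks r and n - r + 1. The sketch below shows this end to end; the function name `median_ci` is hypothetical, and because the betacarotene measurements themselves are not listed in these notes, the sample used is a made-up placeholder (the integers 1 to 23) chosen only so that n = 23.

```python
from math import comb

def median_ci(data, alpha: float = 0.05):
    """Distribution-free CI (X_(r), X_(n-r+1)) for the median, plus exact coverage."""
    y = sorted(data)
    n = len(y)

    def cover(r: int) -> float:
        return sum(comb(n, i) for i in range(r, n - r + 1)) / 2**n

    r = 0
    while r + 1 <= n // 2 and cover(r + 1) >= 1 - alpha:
        r += 1
    if r == 0:
        raise ValueError("sample too small for a finite interval at this level")
    # Ranks are 1-based: rank r is index r - 1, rank n - r + 1 is index n - r.
    return y[r - 1], y[n - r], cover(r)

# Placeholder data, NOT the betacarotene values: the integers 1..23.
sample = list(range(1, 24))
lower, upper, cov = median_ci(sample)
print(lower, upper, cov)
```

With this placeholder sample the returned endpoints are simply the 7th and 17th smallest values, mirroring the ranks used in the example above.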
SAS Code and Output

    proc univariate data=beta cipctldf;
      var base1;
    run;

                 95% Confidence Limits
                   Distribution Free     ---Order Statistics---
    Quantile        LCL        UCL       LCL Rank    UCL Rank    Coverage
    99%               .          .              .           .           .
    95%             212        298             21          23       58.75
    90%             202        298             19          23       83.83
    75% Q3          162        252             13          22       97.35
    50% Median      106        186              7          17       96.53
    25% Q1           74        124              2          11       97.35
    10%              68         92              1           5       83.83
    5%               68         80              1           3       58.75
    1%                .          .              .           .           .
    0% Min

Large sample CI for median

If $n$ is sufficiently large, say $n > 25$, we can get an approximate $100(1-\alpha)\%$ CI for the median by counting

$$z_{1-\alpha/2} \frac{\sqrt{n}}{2}$$

ordered observations to the left and right of the median and rounding out to the next integer.
Cf. Lehmann (1998, p. 84)

Large Sample CI for Median: Example

Suppose $n = 100$ and $\alpha = 0.05$.
Then

$$z_{1-\alpha/2} \frac{\sqrt{n}}{2} = 5(1.96) = 9.8$$

$$50.5 \pm 9.8$$

Rounding yields: $(y_{(40)},\ y_{(61)})$
Can show $r = 40$ using the exact method:

    > sum(dbinom(40:60,100,1/2))
    [1] 0.9647998
    > sum(dbinom(41:59,100,1/2))
    [1] 0.943112

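The large-sample ranks and their exact coverage can be computed together, which makes it easy to see how well the approximation does for a given n. This is a sketch using Python's standard library; the function names are hypothetical, and "rounding out" is implemented as floor for the lower rank and ceiling for the upper rank.

```python
from math import ceil, comb, floor, sqrt
from statistics import NormalDist

def large_sample_median_ranks(n: int, alpha: float = 0.05):
    """Approximate ranks (r, s) for a 100(1 - alpha)% CI for the median."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(n) / 2
    center = (n + 1) / 2
    # Round outward so the interval is not narrower than intended.
    return floor(center - half_width), ceil(center + half_width)

def exact_coverage(n: int, r: int, s: int) -> float:
    """Exact Pr[X_(r) <= median <= X_(s)]."""
    return sum(comb(n, i) for i in range(r, s)) / 2**n

r, s = large_sample_median_ranks(100)
print(r, s, exact_coverage(100, r, s))
```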
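Large sample CI for any quantile

In general,

$$\Pr[\zeta_p < X_{(r)}] = \sum_{i=0}^{r-1} \binom{n}{i} F(\zeta_p)^i \{1 - F(\zeta_p)\}^{n-i} = \sum_{i=0}^{r-1} \binom{n}{i} p^i q^{n-i}$$

where $q = 1 - p$.
From the CLT (with continuity correction), if $Y \sim \text{Bin}(n, p)$, then

$$\frac{Y - np + 1/2}{\sqrt{npq}} \overset{\cdot}{\sim} N(0, 1)$$

Thus, with $Z \sim N(0, 1)$,

$$\Pr[\zeta_p \le X_{(r)}] = \Pr[Y \le r - 1] \approx \Pr\left[Z \le \frac{(r - 1) - np + 1/2}{\sqrt{npq}}\right] = \Phi\left(\frac{r - np - 1/2}{\sqrt{npq}}\right)$$
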
Large sample CI for any quantile

The goal is a symmetric $100(1-\alpha)\%$ CI, so we want

$$\alpha/2 = \Pr[\zeta_p < X_{(r)}] = \Phi\left(\frac{r - np - 1/2}{\sqrt{npq}}\right)$$

That is,

$$-z_{1-\alpha/2} = \frac{r - np - 1/2}{\sqrt{npq}}$$

Implying

$$r = np + \frac{1}{2} - z_{1-\alpha/2} \sqrt{npq}$$

For $p = 1/2$, this yields

$$r = \frac{n+1}{2} - z_{1-\alpha/2} \frac{\sqrt{n}}{2}$$

Large sample CI for any quantile

Similar reasoning yields

$$s = np + \frac{1}{2} + z_{1-\alpha/2} \sqrt{npq}$$

Thus a $100(1-\alpha)\%$ CI for $\zeta_p$ is given by

$$(X_{(\lfloor r \rfloor)},\ X_{(\lceil s \rceil)})$$

Note: $n$ large enough ensures $\lfloor r \rfloor, \lceil s \rceil \in \{1, \ldots, n\}$.

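The rank formulas for a general quantile translate directly into code. The sketch below assumes that non-integer ranks are rounded outward (floor for r, ceiling for s), consistent with the "rounding out" rule stated for the median; the function name `quantile_ci_ranks` is hypothetical.

```python
from math import ceil, floor, sqrt
from statistics import NormalDist

def quantile_ci_ranks(n: int, p: float, alpha: float = 0.05):
    """Approximate ranks (r, s) so that (X_(r), X_(s)) covers the p-th quantile."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    q = 1 - p
    half_width = z * sqrt(n * p * q)
    r = floor(n * p + 0.5 - half_width)   # assumed: round the lower rank down
    s = ceil(n * p + 0.5 + half_width)    # assumed: round the upper rank up
    if r < 1 or s > n:
        # The approximation needs n large enough for both ranks to exist.
        raise ValueError("n is too small for this quantile and confidence level")
    return r, s

print(quantile_ci_ranks(100, 0.5))    # median
print(quantile_ci_ranks(100, 0.25))   # first quartile
```

Raising an error when a rank falls outside 1..n is a design choice for this sketch; it makes the "n large enough" condition explicit rather than silently clipping the ranks.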
References for Order Statistics

A. E. Sarhan and B. G. Greenberg, Contributions to Order Statistics, 1962.
H. A. David, Order Statistics.
D. B. Owen, Handbook of Statistical Tables.
E. L. Lehmann, Nonparametrics: Statistical Methods Based on Ranks, 1998.

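CI for Variance

Recall (result 4.4, p. 95 of the text)

$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$

Therefore

$$1 - \alpha = \Pr\left[\chi^2_{\alpha/2,\,n-1} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{1-\alpha/2,\,n-1}\right]$$

Implying

$$1 - \alpha = \Pr\left[\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}\right]$$
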
CI for Variance

Since the $\chi^2$ distribution is not symmetric, we need to look up both $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$.
This CI depends on the $Y$s being from a normal distribution.

CI for Variance for Betacarotene Example

$n = 23$; $s^2 = 3701.36$
$\chi^2_{.025,22} = 10.98$; $\chi^2_{.975,22} = 36.78$
Therefore, the 95% CI for $\sigma^2$ is

$$(22(3701.36)/36.78,\ 22(3701.36)/10.98) = (2213.973,\ 7416.203)$$

95% CI for $\sigma$: $(47.05,\ 86.12)$

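The arithmetic in this example can be reproduced in a few lines. The sketch below takes the two chi-square quantiles as inputs, using the table values quoted above, because Python's standard library has no chi-square quantile function; the function name `variance_ci` is hypothetical.

```python
from math import sqrt

def variance_ci(n: int, s2: float, chi2_lower: float, chi2_upper: float):
    """Normal-theory CI for sigma^2.

    chi2_lower is the alpha/2 quantile and chi2_upper the 1 - alpha/2
    quantile of the chi-square distribution with n - 1 degrees of freedom.
    """
    df = n - 1
    # The larger quantile gives the lower limit, and vice versa.
    return df * s2 / chi2_upper, df * s2 / chi2_lower

# Values quoted in the betacarotene example.
var_lo, var_hi = variance_ci(n=23, s2=3701.36, chi2_lower=10.98, chi2_upper=36.78)
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)
print(var_lo, var_hi, sd_lo, sd_hi)
```

Limits computed from quantiles rounded to two decimals can differ slightly in the last digits from software output that uses unrounded quantiles.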
SAS Code and Output

    proc univariate data=beta cibasic;
      var base1;
    run;

    Basic Confidence Limits Assuming Normality

    Parameter        Estimate     95% Confidence Limits
    Mean            150.78261     124.47394   177.09128
    Std Deviation    60.83880      47.05242    86.10828
    Variance             3701          2214        7415

CI for Variance - Nonnormal data

Large sample theory:

$$\sqrt{n}\,(s_n^2 - \sigma^2) \xrightarrow{d} N(0,\ (\alpha_4 - 1)\sigma^4)$$

where $\alpha_4 = E(X - \mu)^4 / \sigma^4$ is the kurtosis (cf. Dudewicz and Mishra, Modern Mathematical Statistics, p. 325).
Crude approximation: replace the usual CI endpoints $\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$ and $\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}$ with versions adjusted by the factor $(1 + g_2/n)$, where $g_2 = b_2 - 3$ and $b_2$ is an estimate of $\alpha_4$ (cf. Solomon and Stephens, Encyclopedia of Statistical Sciences).

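One simple way to use the limiting normal distribution is a Wald-type interval: plug the sample variance and a moment estimate of the kurtosis into the asymptotic variance. The sketch below illustrates that idea only; it is not the crude chi-square adjustment mentioned above, the function name `asymptotic_variance_ci` is hypothetical, and the data are a made-up toy sample that is far too small for the asymptotics to be trusted.

```python
from math import sqrt
from statistics import NormalDist

def asymptotic_variance_ci(data, alpha: float = 0.05):
    """Wald-type CI for sigma^2 based on sqrt(n)(s^2 - sigma^2) -> N(0, (a4 - 1) sigma^4)."""
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)   # sample variance
    m2 = sum((x - xbar) ** 2 for x in data) / n         # second central moment
    m4 = sum((x - xbar) ** 4 for x in data) / n         # fourth central moment
    b2 = m4 / m2**2                                     # moment estimate of alpha_4
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s2 * sqrt((b2 - 1) / n)
    # The lower limit can be negative for small n; this sketch does not truncate it.
    return s2 - half_width, s2 + half_width, b2

# Toy data for illustration only (assumed, not from the notes).
toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
lower, upper, b2_hat = asymptotic_variance_ci(toy)
print(lower, upper, b2_hat)
```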
CI for Variance - Nonnormal data

Nonparametric approach such as the bootstrap (cf. Efron and Tibshirani, An Introduction to the Bootstrap, Ch. 14)
Software?
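As one possible answer to the software question, a basic percentile bootstrap needs nothing beyond a random number generator. The sketch below uses the plain percentile method, which is simpler than the refined intervals discussed in the bootstrap literature; the function name `bootstrap_variance_ci`, the number of resamples, the seed, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import variance

def bootstrap_variance_ci(data, alpha: float = 0.05, n_boot: int = 2000, seed: int = 0):
    """Percentile bootstrap CI for the variance (plain percentile method)."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement and recompute the sample variance each time.
    boot = sorted(variance(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = round((alpha / 2) * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return boot[lo_idx], boot[hi_idx]

# Toy data for illustration only (assumed, not from the notes).
toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
boot_lower, boot_upper = bootstrap_variance_ci(toy)
print(boot_lower, boot_upper)
```

Unlike the chi-square interval, this approach does not rely on normality, although with very small samples the percentile interval can undercover.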