Confidence intervals summary. Conservative and approximate confidence intervals for a binomial p. Examples. MATH1005 Statistics, Lecture 24. M. Stewart.


MATH1005 Statistics, Lecture 24. M. Stewart, School of Mathematics and Statistics, University of Sydney.

Outline: Confidence intervals summary; Conservative and approximate confidence intervals for a binomial p (the naïve interval and problems with it, the conservative interval, examples).

Pivots and Confidence Intervals. For a given statistical model, a pivot is a function of the data and the parameters which always has the same distribution (whatever the values of the parameters). For the Z-test model (data modelled as a random sample of size n from a population with unknown mean µ, known variance σ², and population normal and/or sample size large), then with X̄ the sample average, the pivot (X̄ − µ)/(σ/√n) ∼ N(0, 1) (whatever µ is). For the (one-sample) t-test model (random sample of size n from a normal population with unknown mean µ and variance), with X̄ and S the sample average and sd (resp.), the pivot (X̄ − µ)/(S/√n) ∼ t_{n−1} (whatever µ is).

For the two-sample t-test model (two random samples, of sizes n_x and n_y, from normal populations with unknown means µ_x, µ_y and unknown but equal variances), then with X̄, Ȳ the sample averages, S_X, S_Y the sample sds, and

S_p = √( ((n_x − 1)S_X² + (n_y − 1)S_Y²) / (n_x + n_y − 2) )

the pooled sample sd, the pivot

( X̄ − Ȳ − (µ_x − µ_y) ) / ( S_p √(1/n_x + 1/n_y) ) ∼ t_{n_x + n_y − 2}

(whatever the values of µ_x, µ_y). In all cases the pivot is of the form (EST − PARAM)/SE, where the numerator is the difference between a parameter and an estimate of it, the so-called estimation error, and SE is the sd (or an estimate thereof) of the estimation error in the numerator, regarded as a random variable.

Suppose we can find a c so that for the pivot in question,

P( −c ≤ (EST − PARAM)/SE ≤ c ) = 0.95.

Then we can say that the random interval of the form EST ± c·SE contains PARAM with probability 0.95, that is

P( EST − c·SE ≤ PARAM ≤ EST + c·SE ) = 0.95.

The observed value of this random interval is called a 95% confidence interval. Different confidence levels (e.g. 90%, 99%) can be obtained by choosing c differently, such that the right-hand side above is 0.90, 0.99, etc.

Thus for the Z-test model, if x̄ is the observed value of X̄, we need upper percentage points from N(0, 1) (available on the bottom row of a t-table). For a 100(1 − α)% confidence interval we need c such that P(Z ≥ c) = α/2, since then we also have P(Z ≤ −c) = α/2 and so

P(−c ≤ Z ≤ c) = 1 − P(Z ≤ −c) − P(Z ≥ c) = 1 − α/2 − α/2 = 1 − α.

The 95% confidence interval is x̄ ± 1.96 σ/√n; the 90% confidence interval is x̄ ± 1.645 σ/√n; the 99% confidence interval is x̄ ± 2.576 σ/√n.
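These multipliers can be reproduced with qnorm(); a minimal R sketch (the values of xbar, sigma and n below are placeholders chosen only for illustration, not from the lecture):

# z-interval for a population mean with known sigma
xbar  <- 10.3    # observed sample average (illustrative)
sigma <- 2.5     # known population sd (illustrative)
n     <- 25      # sample size (illustrative)
alpha <- 0.05    # 1 minus the confidence level
zc    <- qnorm(1 - alpha/2)              # 1.96 for 95%, 1.645 for 90%, 2.576 for 99%
xbar + c(-1, 1) * zc * sigma / sqrt(n)   # the 95% confidence interval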

Suppose we have a (one-sample) t-test model with 16 observations, and that x̄ and s are the observed values of the sample average X̄ and sd S. The pivot (X̄ − µ)/(S/√16) ∼ t_{15}, so we consult that row of the t-table: For 95% (corresp. to α = 0.05), since P(t_{15} > 2.131) = 0.025 (i.e. α/2), we use x̄ ± 2.131(s/4). For 90% (corresp. to α = 0.1), since P(t_{15} > 1.753) = 0.05 (i.e. α/2), we use x̄ ± 1.753(s/4). For 99% (corresp. to α = 0.01), since P(t_{15} > 2.947) = 0.005 (i.e. α/2), we use x̄ ± 2.947(s/4).
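The t_{15} multipliers can be checked with qt(); a sketch, with xbar and s as illustrative placeholder values:

# t-interval with n = 16 observations (15 degrees of freedom)
xbar <- 4.2    # observed sample average (illustrative)
s    <- 1.1    # observed sample sd (illustrative)
n    <- 16
qt(0.975, df = n - 1)    # about 2.131, the 95% multiplier
qt(0.95,  df = n - 1)    # about 1.753, the 90% multiplier
qt(0.995, df = n - 1)    # about 2.947, the 99% multiplier
xbar + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)   # 95% interval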

Suppose we have a two-sample t-test model with sample sizes 11 and 15. Let x̄ and ȳ denote the observed sample averages, s_x and s_y the observed sample sds, and s_p = √((10 s_x² + 14 s_y²)/24) the observed pooled sample sd. The pivot here has a t_{24} distribution. For 95% confidence (corresp. to α = 0.05), the multiplier we need is 2.064 (since P(t_{24} > 2.064) = 0.025, i.e. α/2). A 95% confidence interval for the population mean difference µ_x − µ_y is therefore given by

(x̄ − ȳ) ± 2.064 · s_p √(1/11 + 1/15).

For 90% confidence (α = 0.1), since P(t_{24} > 1.711) = 0.05 (i.e. α/2), use 1.711. For 99% confidence (α = 0.01), since P(t_{24} > 2.797) = 0.005 (i.e. α/2), use 2.797.
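As a sketch, the whole calculation in R; the sample sizes 11 and 15 are from the example, while the summary statistics are made-up placeholders:

# pooled two-sample t-interval, sample sizes 11 and 15 (24 df)
nx <- 11; ny <- 15
xbar <- 5.8; ybar <- 5.1    # illustrative sample averages
sx <- 1.3;  sy <- 1.5       # illustrative sample sds
sp <- sqrt(((nx - 1) * sx^2 + (ny - 1) * sy^2) / (nx + ny - 2))   # pooled sd
tc <- qt(0.975, df = nx + ny - 2)                                 # about 2.064
(xbar - ybar) + c(-1, 1) * tc * sp * sqrt(1/nx + 1/ny)            # 95% interval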

Interpretation. Many have difficulty in properly interpreting a confidence interval. The confidence level is a property of the procedure you have used: it says how often it covers the target in the long run. This is thus a property only realised after many repetitions. If we just compute a single confidence interval in practice, then we may or may not have covered the target. We don't know, and we possibly never will know exactly. However, we know that if we repeated this procedure many times, then 95% (or whatever the confidence level is) of the time the confidence interval would include the unknown parameter value.
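This long-run interpretation can be illustrated by simulation; a minimal sketch for the known-σ z-interval, where µ = 0, σ = 1 and n = 25 are assumed values chosen purely for illustration:

# long-run coverage of the 95% z-interval, by simulation
set.seed(1)
mu <- 0; sigma <- 1; n <- 25; nsim <- 10000
xbar  <- replicate(nsim, mean(rnorm(n, mu, sigma)))   # one sample average per repetition
lower <- xbar - 1.96 * sigma / sqrt(n)
upper <- xbar + 1.96 * sigma / sqrt(n)
mean(lower <= mu & upper >= mu)    # proportion of intervals covering mu; close to 0.95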

Conservative and approximate confidence intervals. Our ability to construct exact confidence intervals for the 3 models considered depended crucially on the fact that we had a pivot with a known distribution of the form (EST − PARAM)/SE. In that case we can define a random interval with the property that P(interval includes parameter) = 0.95. In some models the form of the (approximate) pivots and/or SEs is not so convenient.

Fallback options are:
- a conservative 95% confidence interval, obtained by defining a random interval such that P(interval includes parameter) ≥ 0.95, so that the interval is possibly wider than it really needs to be, but at least still has the nominal coverage probability;
- an approximate 95% confidence interval, whereby P(interval includes parameter) ≈ 0.95. Such intervals should be used with caution. Strictly speaking, intervals in the Z-test model where we are using a Central-Limit-Theorem-approximately-normal argument are of this type, although in those cases the approximation is often quite accurate.
We shall examine such things in one particular example: confidence intervals for a binomial p-parameter.

Confidence intervals for a binomial p. Suppose we model a count X as a B(n, p) for some known n but unknown p. Example: in a clinical trial, of 100 patients suffering from a certain condition, 68 obtain relief. Modelling this count as a B(100, p) random variable, provide a 95% confidence interval for p. The estimate is just p̂ = 68/100 = 0.68, the observed proportion obtaining relief. A first guess would be to work out the standard error of the estimate and then, since X, and thus p̂, are approximately normal, use p̂ ± 1.96 SE(p̂). What is the standard error of the estimate p̂ in general?

The random variable p̂ = X/n has

Var(p̂) = Var(X/n) = (1/n)² Var(X) = (1/n)² · n p(1 − p) = p(1 − p)/n.

Thus the standard deviation of the estimator p̂ = X/n is √(p(1 − p)/n). However, this depends on the unknown p, so a computable version (i.e. the standard error) is obtained by plugging the estimate into this expression. Thus

SE(p̂) = √(p̂(1 − p̂)/n).

So in our example, the estimate is p̂ = 0.68 with standard error √(p̂(1 − p̂)/100) ≈ 0.047. Can we use an interval of the form p̂ ± c·SE(p̂)?
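(A quick numerical check in R of the estimate and standard error just computed for the clinical-trial counts:)

# point estimate and standard error for the 68-out-of-100 example
x <- 68; n <- 100
phat <- x / n                        # 0.68
se   <- sqrt(phat * (1 - phat) / n)  # about 0.047
c(phat = phat, se = se)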

To do this, we need

(p̂ − p)/SE(p̂) = (p̂ − p)/√(p̂(1 − p̂)/n)

to be a pivot, that is, to have a known distribution not depending on p. Is this a pivot? Is it approximately? If so, with what distribution? If n is large enough so that the Central Limit Theorem applies, we do have an (approximate) pivot here, but not the ratio above, rather a version of it with the true p put back into the denominator in place of p̂ (i.e. the exact SD instead of its approximation, the SE):

(p̂ − p)/SD(p̂) = (p̂ − p)/√(p(1 − p)/n) ∼ N(0, 1) approximately

(note the difference between these two ratios, in particular the denominators!)

Unfortunately the approximate N(0, 1) pivot has the unknown p appearing in the denominator; it can't be used to construct confidence intervals directly; the ±-factor is not computable. Even more unfortunately, if we revert back to our first guess and plug in p̂ for p in the denominator, the resultant ratio is in general not at all an approximate N(0, 1) pivot: the distribution of

(p̂ − p)/SE(p̂) = (p̂ − p)/√(p̂(1 − p̂)/n)

changes significantly for different p's (particularly for small-to-moderate n, say n ≤ 50). Even more unfortunately than that, this is still recommended in many textbooks as a good idea. As we illustrate below, this interval can have a serious problem. More precisely, for certain unlucky choices of n and p, the coverage probability of the interval p̂ ± 1.96 SE(p̂) is notably below 0.95.

We illustrate this phenomenon with a particularly unlucky pair, n = 32, p = 0.2:

> x=rbinom(10000,32,.2)
> phat=x/32
> se=sqrt(phat*(1-phat)/32)
> lower=phat-1.96*se
> upper=phat+1.96*se
> # count how many simulated intervals cover the true value of 0.2
> sum((lower<=.2)*(upper>=.2))
[1] 8889

This is significantly less than the expected 9500 (the P-value of a one-sided test of H_0: p = 0.95 versus H_1: p < 0.95 is pretty small!):

> pbinom(8889,10000,.95)
[1] 5.488768e-131

FIG. 1 (from Brown et al., Annals of Statistics 2002): coverage probability of the standard interval for p = 0.5 and n = 10 to 100. The figure shows P(p ∈ p̂ ± 1.96√(p̂(1 − p̂)/n)) for p = 0.5 and n ranging from 10 to 100.
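For p = 0.5 the quantity plotted in the figure can be computed exactly, without simulation, by summing binomial probabilities over the values of X for which the interval covers p. A sketch of one way to do this (the function name coverage is ours, chosen for illustration):

# exact coverage probability of the "standard" interval phat +/- 1.96*SE
coverage <- function(n, p) {
  x      <- 0:n
  phat   <- x / n
  se     <- sqrt(phat * (1 - phat) / n)
  covers <- (phat - 1.96 * se <= p) & (phat + 1.96 * se >= p)
  sum(dbinom(x[covers], n, p))    # P(interval covers p) under B(n, p)
}
ns <- 10:100
plot(ns, sapply(ns, coverage, p = 0.5), type = "l",
     xlab = "n", ylab = "coverage probability")
abline(h = 0.95, lty = 2)   # nominal 95% level for reference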

This graph seems to suggest that, at least for p = 0.5, the situation improves as n gets bigger (as we expect it should, because then the SE should be almost perfect at estimating SD(p̂), and so the ratio should be like an N(0, 1) then). Let us consider n = 100, n = 1000 and n = 2000 with p = 0.2 again. Firstly, n = 100, p = 0.2:

> x=rbinom(10000,100,.2)
> phat=x/100
> se=sqrt(phat*(1-phat)/100)
> lower=phat-1.96*se
> upper=phat+1.96*se
> sum((lower<=.2)*(upper>=.2))
[1] 9343
> pbinom(9343,10000,.95)
[1] 3.154453e-12

Again, the coverage probability is clearly less than 0.95.

Next, n = 1000, p = 0.2:

> x=rbinom(10000,1000,.2)
> phat=x/1000
> se=sqrt(phat*(1-phat)/1000)
> lower=phat-1.96*se
> upper=phat+1.96*se
> sum((lower<=.2)*(upper>=.2))
[1] 9429
> pbinom(9429,10000,.95)
[1] 0.0007532143

Even here with n = 1000, the number of intervals that work is significantly less than the 9500 that one would expect if the confidence level really was 95%.

Finally, n = 2000, p = 0.2:

> x=rbinom(10000,2000,.2)
> phat=x/2000
> se=sqrt(phat*(1-phat)/2000)
> lower=phat-1.96*se
> upper=phat+1.96*se
> sum((lower<=.2)*(upper>=.2))
[1] 9485
> pbinom(9485,10000,.95)
[1] 0.251702

Here, although less than 9500, it is not significantly less, and so we would be happy believing that the actual confidence level is 95% here. So only use this interval for massive sample sizes (well over 1000).
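The simulations above all follow the same pattern, differing only in n; as a convenience they can be wrapped in a single function (sim_coverage is our own name, not part of the lecture code):

# number of simulated intervals (out of nsim) covering the true p
sim_coverage <- function(n, p, nsim = 10000) {
  x    <- rbinom(nsim, n, p)
  phat <- x / n
  se   <- sqrt(phat * (1 - phat) / n)
  sum(phat - 1.96 * se <= p & phat + 1.96 * se >= p)
}
set.seed(1)
sapply(c(32, 100, 1000, 2000), sim_coverage, p = 0.2)   # counts out of 10000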

Two sources of error in the approximate confidence interval for p. There are two approximations at work with the so-called approximate interval p̂ ± 1.96√(p̂(1 − p̂)/n), where p̂ = X/n and X ∼ B(n, p):
- approximating SD(p̂) = √(p(1 − p)/n) with the SE = √(p̂(1 − p̂)/n);
- approximating the binomial distribution of X with a normal.
The main source of error is the first one; so long as np and n(1 − p) are both bigger than 5, we are happy that the normal approximation to the binomial is pretty good.

That is to say, we are reasonably happy that

(p̂ − p)/√(p(1 − p)/n) ∼ N(0, 1) approximately

(note: the true p appears in the denominator here, not p̂), and so that

P( p ∈ p̂ ± 1.96 √(p(1 − p)/n) ) ≈ 0.95

is a pretty accurate approximation. The problem here is that this interval, while having a close-to-95% coverage probability, cannot be computed!

The poor performance of the so-called approximate confidence interval is because of the difficulty in accurately approximating the quantity √(p(1 − p)/n). Another approach is to determine an upper bound for this. For 0 ≤ p ≤ 1, p(1 − p) is maximised at p = 0.5, where it equals 0.25. Thus, because √(p(1 − p)) ≤ 1/2 for all p, the uncomputable interval p̂ ± 1.96 √(p(1 − p)/n) (which has coverage probability ≈ 95%) is always included in the conservative interval p̂ ± 1.96 · 1/(2√n).
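(The claim that p(1 − p) peaks at p = 0.5 with value 0.25 can be checked numerically:)

# numerical check: p(1 - p) is maximised at p = 0.5 with value 0.25
optimize(function(p) p * (1 - p), interval = c(0, 1), maximum = TRUE)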

Conservative confidence interval for a binomial p. Thus we have that

P( p̂ − 1.96/(2√n) ≤ p ≤ p̂ + 1.96/(2√n) ) ≥ 0.95 (approximately).

(Note: it is still technically only approximately conservative, since we are using a normal approximation to a binomial distribution.) We refer to the observed value of the random interval p̂ ± 1.96 · 1/(2√n) as a conservative 95% confidence interval for p. It always has the maximum width of any corresponding approximate interval for that value of n.

Summary. Thus, we have the following two options for providing a 95%¹ confidence interval for the binomial p parameter, based on a single observation x modelled as the observed value of a random variable X ∼ B(n, p) for n known but p unknown:
1. The approximate 95% C.I. for p: p̂ ± 1.96 √(p̂(1 − p̂)/n), which should only be used for massive n (n > 1000).
2. The conservative 95% C.I. for p: p̂ ± 1.96 · 1/(2√n), which can be (needlessly) wide, but is (at least approximately) valid.
¹ Different confidence levels are obtained by replacing 1.96 with the appropriate value from the N(0, 1) table: 1.645 for 90%, 2.326 for 98%, 2.576 for 99%, etc.
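A hypothetical helper combining both options (binom_ci is our own illustrative name, not a standard R function), applied here to the 68-out-of-100 example; the approximate call is shown only to illustrate the formula, since n = 100 is smaller than the lecture recommends for that interval:

# both 95% intervals for a binomial p, given x successes out of n
binom_ci <- function(x, n, conf = 0.95, conservative = TRUE) {
  phat <- x / n
  zc   <- qnorm(1 - (1 - conf) / 2)
  half <- if (conservative) zc / (2 * sqrt(n)) else zc * sqrt(phat * (1 - phat) / n)
  c(lower = phat - half, upper = phat + half)
}
binom_ci(68, 100)                        # conservative: about (0.582, 0.778)
binom_ci(68, 100, conservative = FALSE)  # approximate:  about (0.589, 0.771)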

Examples. Continuing our earlier example with 100 patients, 68 of whom experience relief, our point estimate of p is p̂ = 68/100 = 0.68, and the standard error is √(p̂(1 − p̂)/100) ≈ 0.047. The sample size is too small to use the approximate interval. The conservative 95% interval is thus p̂ ± 1.96 · 1/(2√100) = 0.68 ± 0.098 = [0.582, 0.778].

Left-handedness. A random sample of 1500 people from a certain population was found to contain 129 left-handed people. Provide a 95% confidence interval for the true proportion of left-handers in the population. Our point estimate is p̂ = 129/1500 ≈ 0.086 and its standard error is √(0.086 × 0.914/1500) ≈ 0.00724. Since our n here is in the thousands, we can perhaps use the approximate interval. It yields p̂ ± 1.96 SE ≈ 0.086 ± (1.96 × 0.00724) ≈ [0.072, 0.100]. It is of interest to compare this to the conservative interval: 0.086 ± 1.96 · 1/(2√1500) ≈ [0.061, 0.111]. The conservative interval is considerably wider, which will of course happen whenever p̂ is far from 0.5, as it is here.

In light of the last example, we can do a little simulation to see how reliable that approximate 95% confidence interval is: we simulate from B(1500, 0.08) many times and see how often the interval covers 0.08:

> x=rbinom(10000,1500,.08)
> phat=x/1500
> l1=phat-1.96*sqrt(phat*(1-phat)/1500)
> u1=phat+1.96*sqrt(phat*(1-phat)/1500)
> sum((l1<=.08)*(u1>=.08))
[1] 9494

This is clearly not significantly different from the ideal 9500! This makes us feel good about the approximate interval here. How about the conservative 95% interval?

> l2=phat-1.96*sqrt(1/(4*1500))
> u2=phat+1.96*sqrt(1/(4*1500))
> sum((l2<=.08)*(u2>=.08))
[1] 9997

Wow! In all but 3 of the 10000 simulations the conservative interval covered 0.08. So although very wide, it will cover the true p at least 95% of the time.

Goodness-of-fit tests. Our last topic relates to discrete data, e.g. counts or frequencies. Sometimes it is desired to compare a set of observed frequencies to either 1. a given set of expected probabilities/proportions, or 2. a family of such sets, to see if the set of probabilities (or one member of the family of such sets) can well explain what is observed.