Tests for Population Proportion(s)

Esra Akdeniz, April 6th, 2016

Motivation

We are interested in estimating the prevalence rate of breast cancer among 50- to 54-year-old women whose mothers have had breast cancer. Suppose that in a random sample of 10,000 such women, 400 are found to have had breast cancer at some point in their lives. The best point estimate of the prevalence rate $p$ is the sample proportion

$$\hat{p} = \frac{400}{10000} = 0.04.$$

Based on large studies, assume the prevalence rate of breast cancer for U.S. women in this age group is about 2%. The question is: how compatible is the sample rate of 4% with a population rate of 2%?

Motivation

Another way of asking this question is to restate it in terms of hypothesis testing. Let $p$ be the prevalence rate of breast cancer in 50- to 54-year-old women whose mothers have had breast cancer.

$$H_0: p = 0.02 = p_0 \quad \text{vs.} \quad H_1: p \neq 0.02$$

How do we test this hypothesis?

Introduction

So far, we have dealt with hypotheses concerning the population mean. In this lecture, we will learn how to make inference on population proportions: the proportion of times an event occurs rather than the number of times.

Example: In the mid-nineteenth century, approximately 3500 babies were delivered every year in the Vienna General Hospital. Approximately 500 women developed puerperal fever, an infection developing during childbirth.

Assume $X \sim \text{Bin}(n, p)$. Normal approximation to the binomial distribution: if $np \geq 5$ and $n(1-p) \geq 5$, then under $H_0$, $\hat{p}$ is approximately normally distributed.

Test about a population proportion

Let $p$ denote the proportion of individuals or objects in a population who possess a specified property (labeled as $S$). Let $X$ be the number of $S$s in the sample. Then $\hat{p} = X/n$ is the sample proportion. $X$ is a binomial random variable with parameters $n$ and $p$, i.e. $X \sim \text{Bin}(n, p)$. Furthermore, when the sample size $n$ is large, both $X$ and $\hat{p}$ are approximately normally distributed, i.e.

$$X \overset{\text{approx.}}{\sim} N\big(np,\; np(1-p)\big) \quad \text{and} \quad \hat{p} \overset{\text{approx.}}{\sim} N\!\left(p,\; \frac{p(1-p)}{n}\right).$$

The appropriate test about the population proportion $p$ depends on the sample size.

Test about a population proportion
Large-Sample Tests

When the sample size is large ($n \geq 30$, with $np \geq 5$ and $n(1-p) \geq 5$), $\hat{p}$ is approximately normally distributed with mean $p$ and variance $p(1-p)/n$. In particular, under the null hypothesis $H_0: p = p_0$, $\hat{p}$ is approximately normally distributed with mean $p_0$ and variance $p_0(1-p_0)/n$, i.e. $\hat{p} \overset{\text{approx.}}{\sim} N\big(p_0,\; p_0(1-p_0)/n\big)$. Therefore the test statistic

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$

has approximately a standard normal distribution under $H_0$.

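Example

For the breast cancer example, we compute the test statistic

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.04 - 0.02}{\sqrt{0.02(0.98)/10000}} = \frac{0.02}{0.0014} = 14.3$$

and compare it with the critical value $z_{1-\alpha/2} = z_{0.975} = 1.96$. Since $14.3 > 1.96$, $H_0$ is rejected using a two-sided test with $\alpha = 0.05$.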
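The arithmetic of the breast cancer example can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `one_sample_prop_z` is a hypothetical helper introduced here for illustration, not something from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_prop_z(x, n, p0):
    """Large-sample z test of H0: p = p0 (hypothetical helper name)."""
    p_hat = x / n                          # sample proportion
    se0 = sqrt(p0 * (1 - p0) / n)          # standard error computed under H0
    z = (p_hat - p0) / se0                 # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return p_hat, z, p_value

# Breast cancer example: 400 cases among 10,000 women, p0 = 0.02
p_hat, z, p_value = one_sample_prop_z(400, 10_000, 0.02)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{0.975}
reject = abs(z) > z_crit

print(f"p_hat = {p_hat:.2f}, z = {z:.1f}, critical value = {z_crit:.2f}")
print("reject H0" if reject else "do not reject H0")
```

The standard error uses $p_0$ rather than $\hat{p}$ because the test statistic is evaluated under the null hypothesis.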

Figure: Acceptance and rejection regions for the one-sample binomial test, normal-theory method (two-sided alternative)

Example

Suppose that we select a random sample of 30 individuals from the population of adults in Turkey. Assume that the probability that a member of this population currently smokes cigarettes, cigars or pipes is equal to 0.29. Therefore, the total number of smokers in the sample is binomial with $n = 30$ and $p = 0.29$. For this sample, what is the probability that six or fewer of its members smoke? When this probability is approximated with the normal distribution, a continuity correction is applied.
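One way to see what the continuity correction does is to compare the exact binomial probability with the normal approximation, with and without the correction. The sketch below assumes $n = 30$ and $p = 0.29$ as in the example and uses only the Python standard library; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 30, 0.29   # sample size and smoking probability from the example

# Exact binomial probability P(X <= 6)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(7))

# Normal approximation: X is roughly N(np, np(1-p))
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx_plain = approx.cdf(6)     # no continuity correction
approx_cc = approx.cdf(6.5)      # continuity correction: P(X <= 6) ~ P(Y <= 6.5)

print(f"exact binomial        : {exact:.4f}")
print(f"normal, no correction : {approx_plain:.4f}")
print(f"normal, with correction: {approx_cc:.4f}")
```

The correction adds 0.5 to the cutoff because the discrete value 6 is represented by the interval from 5.5 to 6.5 on the continuous scale.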

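Sample proportion and its distribution

The sample proportion is denoted by $\hat{p}$, and $\hat{q} = 1 - \hat{p}$. By the CLT, its distribution is approximately normal with mean $p$ and variance $p(1-p)/n$. Confidence interval for $p$:

$$\left( \hat{p} - z_{1-\alpha/2}\sqrt{\hat{p}\hat{q}/n},\;\; \hat{p} + z_{1-\alpha/2}\sqrt{\hat{p}\hat{q}/n} \right)$$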

Hypothesis Testing

Null hypothesis: $H_0: p = p_0$ or $H_0: p \leq p_0$ or $H_0: p \geq p_0$.

Test statistic:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$

Example

Consider the distribution of five-year survival for individuals under 40 who have been diagnosed with lung cancer. This distribution has an unknown population mean $p$. In a randomly selected sample of 52 patients, only six survive five years. Compute the sample proportion. Find the 95% confidence interval for the population proportion. Test the hypothesis that the population proportion is equal to 0.082 at significance level 0.05.
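The three parts of this exercise can be sketched as follows, assuming a two-sided alternative and the large-sample formulas from the previous slides. Note that $np_0 = 52 \times 0.082 \approx 4.26$ falls slightly below the rule-of-thumb threshold of 5, so the normal approximation is rough here and the result should be read with that in mind.

```python
from math import sqrt
from statistics import NormalDist

x, n = 6, 52              # survivors and sample size from the example
p0, alpha = 0.082, 0.05   # hypothesized proportion and significance level

# 1. Sample proportion
p_hat = x / n

# 2. 95% confidence interval (standard error uses p_hat)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
se_hat = sqrt(p_hat * (1 - p_hat) / n)
ci_low, ci_high = p_hat - z_crit * se_hat, p_hat + z_crit * se_hat

# 3. Two-sided test of H0: p = p0 (standard error uses p0)
#    Caveat: n * p0 is about 4.26 < 5, so the approximation is rough.
se0 = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = abs(z) > z_crit

print(f"p_hat = {p_hat:.4f}")
print(f"95% CI = ({ci_low:.4f}, {ci_high:.4f})")
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

The confidence interval and the test use different standard errors: the interval plugs in $\hat{p}$, while the test plugs in $p_0$.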

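Comparison of two Population Proportions

Assume $X \sim \text{Bin}(m, p_1)$ and $Y \sim \text{Bin}(n, p_2)$, and that they are independent. The normal approximation applies under the conditions $m\hat{p}_1 \geq 5$, $m(1-\hat{p}_1) \geq 5$, $n\hat{p}_2 \geq 5$, $n(1-\hat{p}_2) \geq 5$. Confidence interval for $p_1 - p_2$:

$$\hat{p}_1 - \hat{p}_2 \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{m} + \frac{\hat{p}_2(1-\hat{p}_2)}{n}}$$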
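A minimal sketch of the interval for a difference of two proportions is given below. The function name `two_prop_ci` and the counts passed to it are hypothetical, chosen only to illustrate the formula; they do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ci(x1, m, x2, n, alpha=0.05):
    """Large-sample CI for p1 - p2 (hypothetical helper name)."""
    p1_hat, p2_hat = x1 / m, x2 / n
    # Unpooled standard error: each sample contributes its own variance term
    se = sqrt(p1_hat * (1 - p1_hat) / m + p2_hat * (1 - p2_hat) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se

# Illustrative (made-up) counts: 40 of 200 in sample 1, 30 of 250 in sample 2
low, high = two_prop_ci(40, 200, 30, 250)
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

If the resulting interval excludes 0, the data are not compatible with equal proportions at the chosen level.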

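Hypothesis Testing

Null hypothesis: $H_0: p_1 - p_2 = 0$.

Test statistic:

$$z = \frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{m} + \dfrac{\hat{p}(1-\hat{p})}{n}}} \overset{\text{approx.}}{\sim} N(0, 1)$$

Under $H_0$ the proportions are assumed equal; therefore a common value of $p$ is estimated by

$$\hat{p} = \frac{x_1 + x_2}{m + n},$$

where $x_1$ and $x_2$ are the observed numbers of successes in the two samples.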

Example

In a study investigating morbidity and mortality among pediatric victims of motor vehicle accidents, information regarding the effectiveness of seat belts was collected over an 18-month period. Two random samples were selected, one from the population of children who were wearing a seat belt at the time of the accident, and the other from the population of children who were not. In the sample of 123 children who were wearing a seat belt at the time of the accident, 3 died. In the sample of 290 children who were not wearing a seat belt, 13 died. We want to test whether the proportions in these two populations are equal at significance level 0.05.
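The seat belt comparison can be worked through with the pooled two-sample z statistic. The sketch below assumes a two-sided alternative; the helper name `two_prop_z` is hypothetical. Note that only 3 deaths were observed in the seat belt group, so the condition $m\hat{p}_1 \geq 5$ is not met and the normal approximation should be treated with caution.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z(x1, m, x2, n):
    """Pooled z test of H0: p1 - p2 = 0 (hypothetical helper name)."""
    p1_hat, p2_hat = x1 / m, x2 / n
    p_pool = (x1 + x2) / (m + n)          # common estimate of p under H0
    se = sqrt(p_pool * (1 - p_pool) / m + p_pool * (1 - p_pool) / n)
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p_value

# Seat belt example: 3 deaths of 123 (belted), 13 deaths of 290 (unbelted)
# Caveat: m * p1_hat = 3 < 5, so the approximation is rough here.
z, p_value = two_prop_z(3, 123, 13, 290)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > z_crit

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
print("reject H0" if reject else "do not reject H0")
```

The pooled estimate is used in the standard error because the null hypothesis states that both populations share a single proportion.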