Module 1. Study Population


A study population is a clearly defined collection of objects to be investigated by the researcher. In social and behavioral research, the objects are usually people, but the objects also could be other entities such as newspaper articles, historical documents, speeches, patient case files, households, or advertisements. A few examples of study populations are: all preschool children in San Jose, all UCSC undergraduate students, all LA Times newspaper articles from 2010 to 2017, and all NPR health and science podcasts.

Measurement Properties

In addition to specifying the study population of interest, a researcher will specify some attribute to measure. When studying populations of people, the attribute of interest could be a specific type of academic ability, a personality trait, some particular behavior, an opinion, or a physiological measure. When studying media populations (articles, speeches, TV ads, etc.), the attribute of interest could be the presence or absence of certain themes or the amount of certain types of information or opinions. The measurement of the attribute that the researcher wants to examine is called the response variable (or dependent variable).

To measure some attribute of an object is to assign a numerical value to that object. These measurements can have different properties. With nominal scale measurements, the numbers are simply names for qualitatively different attributes. For example, Democrat, Republican, and Libertarian voters could be described using nominal scale scores of 1, 2, and 3. A dichotomous scale is a nominal scale with only two categories (e.g., disagree/agree, pass/fail, or correct/incorrect). It is customary to use the numbers 0 and 1 to represent the two categories. A nominal scale measurement is also called a categorical measurement. A categorical measurement can be a nominal scale measurement or an ordinal scale measurement.
With an ordinal scale categorical measurement, the numbers assigned to each category reflect an ordering of the attribute. For example, with ordinal scale measurements of 1, 2, and 3 corresponding to a response of disagree, neutral, or agree, a score of 3 indicates greater agreement than a score of 2, and a score of 2 indicates greater agreement than a score of 1.

Ordinal scale measurements lack important properties of interval scale and ratio scale measurements. Unlike an interval scale measurement, the difference between ordinal scores of 1 and 2 does not necessarily represent the same difference in the attribute as the difference between ordinal scores of 2 and 3 or the difference between ordinal scores of 3 and 4. Unlike a ratio scale measurement, an ordinal scale score of 0 does not represent a complete absence of the attribute. Interval and ratio scale variables will hereafter be referred to as quantitative variables. This course will focus on statistical methods for nominal (primarily dichotomous) and ordinal response variables. Some statistical methods presented in this course will involve variables, called predictor variables (or independent variables), that are assumed to predict or explain the response variable. The predictor variables can be quantitative, ordinal, or nominal.

Population Parameters

A population parameter is a single unknown numeric value that describes the measurements that could have been assigned to all N objects in the study population. Researchers are interested in discovering the value of a population parameter because this information could be used to make an important decision or to advance knowledge in some area of research. With a dichotomous response variable, researchers are often interested in the population proportion, denoted by the Greek letter π (not to be confused with the irrational number 3.14159…). A population proportion could be used to describe a population of dichotomous measurements.
For example, in a study population consisting of all 2,450 teachers who work in a particular school district, suppose all 2,450 teachers were asked if they are satisfied or dissatisfied with their job. The population proportion of satisfied teachers is

π = (Σ_{i=1}^N y_i)/N   (1.1)

where y_i = 1 if teacher i is satisfied and y_i = 0 if teacher i is dissatisfied. Note that a population proportion is simply the number of objects in the population that have a particular attribute (e.g., satisfied with job) divided by the population size. Note also that 1 − π is the proportion of objects in the population that do not have the specified attribute.

Random Samples and Parameter Estimates

In applications where the study population is large or the cost of the dichotomous measurement is high, the researcher may not have the necessary resources to measure all N objects in the study population. In these applications, the researcher could take a random sample of n objects from the study population of N objects. In studies where random sampling is used, the study population is defined as the population from which the random sample was obtained. A random sample of size n is selected in such a way that every possible sample of size n has the same chance of being selected. Computer programs can be used to obtain a random sample of size n from a population of size N.

A population proportion can be estimated from a random sample. The sample proportion

π̂ = (Σ_{i=1}^n y_i)/n   (1.2)

is an estimate of π where y_i = 0 or 1. Note that a sample proportion is simply the number of objects in the sample that have a particular attribute (denoted as f) divided by the sample size, so that Equation 1.2 also can be written as π̂ = f/n. A caret (^) is placed over π to indicate that it is an estimate and not the actual value of π. Equation 1.2 is the maximum likelihood estimate of π (see Appendix). Researchers would like to know the exact value of π but they usually must settle for a sample estimate of π because the population size is too large or the measurement process is too costly. Sample estimates by themselves are not very informative for the following two reasons.
A sample estimate of a population proportion might be larger than the population proportion or smaller than the population proportion, but the researcher will not know if the estimate is too small or too large. Furthermore, the value of |π̂ − π| could be small or large but the

researcher will not know how close the estimate actually is to the population parameter.

Standard Error of Estimate

The standard error of a parameter estimate provides information about the accuracy of the estimate. A small value for the standard error indicates that the parameter estimate (e.g., a sample proportion) is likely to be close to the unknown population parameter value, and a large standard error value indicates that the parameter estimate could be much larger or smaller than the population parameter value. The standard error of an estimated proportion can be estimated from a random sample. The estimated standard error of π̂ is

SE_π̂ = √(π̂(1 − π̂)/n)   (1.3)

From Equation 1.3, it is clear that increasing the sample size will decrease the value of the standard error, which in turn will increase the accuracy of the parameter estimate. Note also that SE_π̂ is largest when π̂ = .5. The estimated standard error by itself is not usually an interesting value to interpret, but standard errors are used to compute confidence intervals, which do have interesting and important interpretations.

Confidence Interval for π

One goal of survey research is to learn something about the value of the population parameter by using information from a random sample. By using both the parameter estimate and its standard error, both of which can be computed from a random sample, it is possible to say something about the unknown population parameter in the form of a confidence interval. A confidence interval is a range of values that is believed to contain an unknown population parameter value (e.g., a population proportion) with some specified degree of confidence. An approximate 100(1 − α)% confidence interval for π is

π̂ ± z_{α/2} SE_π̂   (1.4)

where 100(1 − α)% is the confidence level and z_{α/2} is a two-sided critical z-value. The lower confidence interval limit is π̂ − z_{α/2} SE_π̂ and the upper confidence interval limit is π̂ + z_{α/2} SE_π̂. The American Psychological Association (APA) recommends reporting the lower and upper confidence interval values within square brackets separated by a comma. For example, if the lower and upper limits from Equation 1.4 are .31 and .45, respectively, these numbers would be reported as [.31, .45]. The APA also recommends that proportions be reported without a leading zero (e.g., .31 rather than 0.31). The lower and upper limits of Equation 1.4 can be multiplied by 100 to give a confidence interval for the percent of objects in the study population that have the specified attribute.

Confidence levels of 90% (α = .1), 95% (α = .05), and 99% (α = .01) are common. The values of z_{α/2} for these confidence levels are given below.

100(1 − α)%   z_{α/2}
90%           1.65
95%           1.96
99%           2.58

For example, to compute Equation 1.4 with a 90% confidence level, z_{α/2} is set to 1.65; to compute Equation 1.4 with a 95% confidence level, z_{α/2} is set to 1.96; and to compute Equation 1.4 with a 99% confidence level, z_{α/2} is set to 2.58.

Using a larger sample size will reduce the value of SE_π̂, which in turn will reduce the width of the confidence interval. Narrow confidence intervals are more informative than wide confidence intervals. If Equation 1.4 was computed using a very small sample size and gave a 95% confidence interval of [.04, .9], such a result would not provide much information (because we already knew that π is some value between 0 and 1). In comparison, a narrow 95% confidence interval such as [.41, .52] would be very informative. However, a narrow confidence interval for π will require a large sample size.
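The steps above can be sketched in a few lines of code. This is a minimal illustration of Equations 1.2 through 1.4; the function name and the sample counts in the usage line are hypothetical, not from the text.

```python
import math

def prop_ci(f, n, z=1.96):
    """Approximate 100(1 - alpha)% confidence interval for a population
    proportion (Equation 1.4). f is the number of sampled objects with the
    attribute, n is the sample size, and z is the two-sided critical
    z-value (1.65, 1.96, or 2.58 for 90%, 95%, or 99% confidence)."""
    p = f / n                          # sample proportion (Equation 1.2)
    se = math.sqrt(p * (1 - p) / n)    # estimated standard error (Equation 1.3)
    return p - z * se, p + z * se

# Hypothetical sample: 150 of 500 objects have the attribute.
lower, upper = prop_ci(150, 500)
```

Here the estimate is .30 and the 95% interval is roughly [.26, .34]; passing z=2.58 instead would widen the interval to the 99% confidence level.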

The value 1 − π represents the proportion of objects in the study population that do not have the specified attribute. To obtain a confidence interval for 1 − π using the results of Equation 1.4, simply subtract the lower and upper confidence interval endpoints from 1.

Example 1.1. A random sample of 300 participants was obtained from a directory of 43,239 registered UCLA students. All 300 students were interviewed, and one of the questions asked if the student had been the target of a micro-aggression on campus within the last week. In the sample of 300 students, 120 answered affirmatively. A 95% confidence interval for π, the proportion of all 43,239 UCLA students who would have said they had been the target of a micro-aggression within the last week, is computed below:

π̂ = 120/300 = 0.40
SE_π̂ = √(0.4(0.6)/300) = 0.0283
upper 95% limit = .40 + 1.96(0.0283) = .455
lower 95% limit = .40 − 1.96(0.0283) = .345

The researcher can be 95% confident that between 34.5% and 45.5% of the 43,239 UCLA students have been the target of a micro-aggression within a one-week period. This confidence interval result would be reported using APA style as: 95% CI [34.5%, 45.5%].

Confidence Interval for a Population Total Frequency

In some applications the researcher wants to estimate the number of members in the study population, rather than the proportion of members, that have a particular attribute. For example, a researcher might want to estimate the number of students at a university who are skipping meals, the number of elderly residents in a community who need transportation assistance, or the number of children in a school district who need after-school tutoring. If the exact size of the study population (N) is known, a population total frequency can be defined as Nπ. A confidence interval for Nπ is obtained by simply multiplying the endpoints of Equation 1.4 by N.

Example 1.2. A random sample was taken from a public university with an enrollment of 26,450 full-time students.
A short questionnaire was sent to a random sample of 400 full-time students at this university. One question asked if the student anticipated needing temporary housing assistance within the next three months, and 88 responded affirmatively. The 95% confidence interval for π is [.18, .26], and a 95% confidence interval for the total number of students at this university who anticipate needing temporary housing is [4761, 6877].
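Example 1.2's arithmetic can be sketched as follows. Following the text's reported limits, the proportion limits are rounded to two decimals before multiplying by N; the variable names are my own.

```python
import math

N, n, f = 26450, 400, 88           # Example 1.2: population, sample, frequency
p = f / n                          # sample proportion = .22
se = math.sqrt(p * (1 - p) / n)    # estimated standard error (Equation 1.3)
lower = round(p - 1.96 * se, 2)    # .18
upper = round(p + 1.96 * se, 2)    # .26

# Multiply the endpoints by N for the population total frequency interval.
total_ci = (round(N * lower), round(N * upper))   # (4761, 6877)
```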

Properties of Confidence Intervals

There are two important properties of confidence intervals: increasing the sample size will tend to decrease the width of a confidence interval, and increasing the confidence level will increase the width of a confidence interval. Increasing the confidence level in Equation 1.4 increases the probability that the confidence interval will capture the unknown value of π. Imagine taking every possible sample of size n from a study population and computing Equation 1.4 in each of these samples. If a 90% confidence interval was computed in each of these samples, it can be shown that the confidence interval will include π in about 90% of the samples. If a 99% confidence interval had been computed in each of these samples, the confidence interval will include π in about 99% of the samples. A 99% confidence interval is more desirable than a 90% confidence interval because it is more likely to include the unknown value of π. However, for any given sample a 99% confidence interval will be wider (less precise) and hence less informative than a 90% interval. For any given sample, a 99% confidence interval is less desirable than a 90% confidence interval in terms of precision.

In practice, the researcher should choose a confidence level that represents an acceptable compromise between the probability of a confidence interval including π and the precision of the confidence interval. A 95% confidence interval represents a good compromise between the level of confidence and the confidence interval width, as shown in the following graph. Notice that the confidence interval width increases almost linearly up to a confidence level of about 95%, and then the width begins to increase dramatically with increasing confidence. Thus, small increases in the level of confidence beyond 95% lead to relatively large increases in the confidence interval width.

[Figure: confidence interval width as a function of the confidence level]
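The trade-off between confidence level and interval width can also be seen numerically. A small sketch, using a hypothetical sample proportion and sample size:

```python
import math

p, n = 0.40, 300                       # hypothetical sample proportion and size
se = math.sqrt(p * (1 - p) / n)        # estimated standard error (Equation 1.3)

# Interval width (upper limit minus lower limit) at each confidence level
widths = {level: 2 * z * se for level, z in [(90, 1.65), (95, 1.96), (99, 2.58)]}
```

The widths here are about .093, .111, and .146, and the step from 95% to 99% is the largest of the increases, consistent with the graph described above.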

Directional Two-sided Hypothesis Test

In some applications, the researcher simply needs to decide if the population parameter (e.g., π) is greater than some value or less than some value. This type of information can be obtained using a directional two-sided hypothesis test. If the parameter is greater than some value, then one course of action will be taken or one theory will be supported; if the parameter is less than some value, then another course of action will be taken or another theory will be supported. The following notation is used to specify a set of hypotheses regarding the value of π:

H0: π = h
H1: π < h
H2: π > h

where h is some number specified by the researcher and H0 is called the null hypothesis. H1 and H2 are called the alternative hypotheses. In virtually all applications, H0 is known to be false (because π will almost never exactly equal h) and the researcher's goal is to decide if H1 is true or if H2 is true.

A confidence interval can be used to perform a directional two-sided test using the following rules. If the upper limit of a 100(1 − α)% confidence interval is less than h, then H0 is rejected and H1: π < h is accepted. If the lower limit of a 100(1 − α)% confidence interval is greater than h, then H0 is rejected and H2: π > h is accepted. If the confidence interval includes h, H0 cannot be rejected and the test is said to be "inconclusive". If the test is inconclusive but the confidence interval is narrow, the researcher could legitimately claim that the value of π is "close to" h. A directional two-sided test also can be performed using a test statistic rather than a confidence interval. The decision rule is given below.

reject H0 and accept H1: π < h if z > z_{α/2} and π̂ < h
reject H0 and accept H2: π > h if z > z_{α/2} and π̂ > h
fail to reject H0 if z < z_{α/2}

where

z = (|π̂ − h| − 1/(2n)) / √(h(1 − h)/n)   (1.5)

is the test statistic and the value 1/(2n) is called a correction for continuity that improves the performance of the test statistic in small samples.

Directional Errors and p-values

Most statistical packages compute a p-value for the Equation 1.5 test statistic. The magnitude of the test statistic determines the p-value, with smaller p-values corresponding to larger test statistic values. The p-value can be used to reject H0 in a three-decision rule. Specifically, H0 is rejected if the p-value is less than α (usually .05). If H0 is rejected, then H1 or H2 is selected according to the sign of π̂ − h. The p-value is related to the sample size, with larger sample sizes tending to give smaller p-values. Thus, the null hypothesis can be rejected with near certainty with a large sample size.

It is a common practice to report the results of a statistical test to be "significant" if the p-value is less than .05 and "nonsignificant" if the p-value is greater than .05. This approach is referred to as significance testing rather than hypothesis testing. Significance test results are routinely misinterpreted. A p-value less than .05 simply indicates that z > z_{α/2} and does not indicate that the population proportion is meaningfully different from the hypothesized value. Researchers also misinterpret p-values greater than .05 to imply that H0 is true.

In a directional two-sided test, a directional error occurs when H1: π < h has been accepted but π > h is true, or when H2: π > h has been accepted but π < h is true. The probability of making a directional error is at most α/2. For instance, if a 95% confidence interval is used to select H1 or H2, the probability of making a directional error is at most .025.
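The decision rule and the Equation 1.5 test statistic can be sketched as a small function; the function name and the sample values in the usage line are hypothetical.

```python
import math

def directional_test(f, n, h, z_crit=1.96):
    """Directional two-sided test of H0: pi = h using Equation 1.5,
    with the 1/(2n) continuity correction. Returns the accepted
    hypothesis, or 'inconclusive' when H0 cannot be rejected."""
    p = f / n
    z = (abs(p - h) - 1 / (2 * n)) / math.sqrt(h * (1 - h) / n)
    if z > z_crit:
        return "H1: pi < h" if p < h else "H2: pi > h"
    return "inconclusive"

# Hypothetical sample: 120 of 300, so p = .40 is well below h = .50.
result = directional_test(120, 300, 0.5)
```

Here z is about 3.4, which exceeds 1.96, so H1: π < h is accepted.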

Power of a Hypothesis Test

In a directional two-sided test, the goal is to reject H0: π = h and then choose either H1: π < h or H2: π > h. Although we know H0: π = h is certainly false, we are not sure if π > h or if π < h. Thus, useful scientific information could be obtained from a statistical test that lets us decide if π > h or if π < h. The power of a test is the probability of rejecting H0. If the power of the test is low, then the probability of an inconclusive result will be high. An inconclusive result, of course, is an undesirable outcome of a study. From an examination of Equation 1.5, we see that using a larger sample size (n) will tend to increase the value of the test statistic, which results in a smaller probability of an inconclusive result.

The power of a test of H0: π = h depends on the sample size, the magnitude of |π − h|, and the value of α. Increasing the sample size will increase the power of the test, as illustrated below for α = .05 and |π − h| = .2. Note that increasing the sample size can dramatically increase the power of the test up to a point, but then further increases in the sample size will give relatively small increases in power.

[Figure: power as a function of sample size]

Decreasing α will reduce the probability of a directional error (which is desirable) but will also decrease the power of the test (which is undesirable), as illustrated in the graph below for n = 30 and |π − h| = 0.2. Note that there is little loss in power for reductions in α from .20 down to about .10, with power decreasing more dramatically for α values below .05. This relation between power and α explains why α = .05 is a popular choice.

[Figure: power as a function of α]
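The effect of sample size on power can be checked by simulation rather than a formula. This is a sketch using the Equation 1.5 test; the function name and the settings in the usage lines are hypothetical.

```python
import math
import random

def simulated_power(pi, h, n, z_crit=1.96, reps=5000, seed=1):
    """Approximate power: the proportion of simulated random samples,
    drawn from a population with proportion pi, in which H0: pi = h
    is rejected by the Equation 1.5 test."""
    random.seed(seed)
    rejected = 0
    for _ in range(reps):
        f = sum(random.random() < pi for _ in range(n))   # simulated frequency
        z = (abs(f / n - h) - 1 / (2 * n)) / math.sqrt(h * (1 - h) / n)
        rejected += z > z_crit
    return rejected / reps

low = simulated_power(0.5, 0.3, n=30)     # |pi - h| = .2, small sample
high = simulated_power(0.5, 0.3, n=100)   # same |pi - h|, larger sample
```

With these settings the simulated power rises from roughly .6 to nearly 1 as n goes from 30 to 100, consistent with the graph described above.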

For a given sample size and α level, the power of the test increases as the value of |π − h| increases, as illustrated in the graph below for n = 30 and α = .05.

[Figure: power as a function of |π − h|]

Sampling Distribution of π̂

Consider a study population consisting of N objects, with y_i representing some dichotomous measurement of the i-th object. Imagine taking a sample of n objects from this study population, recording their y scores, and then computing the sample proportion (π̂). Now imagine doing this for every possible sample of size n. The set of all possible sample proportions, for samples of size n, is called the sampling distribution of the sample proportion. The sampling distribution of π̂ has three important features, which are summarized below.

The mean of the sampling distribution of π̂ is equal to the population proportion π.

If the sample size is sufficiently large, the sampling distribution of π̂ will be closely approximated by a normal distribution (Central Limit Theorem).

The standard deviation of the sampling distribution of π̂ is equal to √(π(1 − π)/n) √((N − n)/(N − 1)).

Because the mean of the sampling distribution of π̂ is equal to the population proportion π, π̂ is said to be unbiased. Unbiased estimates are attractive because they overestimate the population parameter with about the same tendency as they underestimate the population parameter. The standard deviation of the sampling distribution of π̂ tends to be smaller with larger sample sizes. For large sample sizes, the sample proportions in a sampling distribution will be similar to each other and, because the sample proportion is unbiased, they will tend to be close to the population proportion. In applications where n is a small fraction of N, the finite population correction factor √((N − n)/(N − 1)) will be close to 1 and can be ignored. Ignoring this correction factor, the standard deviation of the sampling distribution of π̂ is approximately √(π(1 − π)/n). The standard error of π̂ defined in Equation 1.3 is an estimate of the standard deviation of the sampling distribution of π̂.

A sampling distribution of π̂ consists of N!/[(N − n)!n!] values of π̂, which is an astronomically large number in typical applications. To concretely illustrate some properties of a sampling distribution, consider a very small population of N = 5 people who have dichotomous scores of y_1 = 1, y_2 = 0, y_3 = 1, y_4 = 0, and y_5 = 0, where the population proportion is π = (1 + 0 + 1 + 0 + 0)/5 = .4. With n = 2 as an example, the standard deviation of the sampling distribution of π̂ is √(π(1 − π)/n) √((N − n)/(N − 1)) = √(.4(1 − .4)/2) √(3/4) = 0.3. With n = 2, the sampling distribution of π̂ consists of only N!/[(N − n)!n!] = 5!/(3!2!) = 10 sample proportions, which are shown below.

Sample   Participants   Sample Scores   π̂
1        1 and 2        1, 0            .5
2        1 and 3        1, 1            1
3        1 and 4        1, 0            .5
4        1 and 5        1, 0            .5
5        2 and 3        0, 1            .5
6        2 and 4        0, 0            0
7        2 and 5        0, 0            0
8        3 and 4        1, 0            .5
9        3 and 5        1, 0            .5
10       4 and 5        0, 0            0

The mean of all 10 possible sample proportions is (.5 + 1 + .5 + .5 + .5 + 0 + 0 + .5 + .5 + 0)/10 = .4, which is identical to the population proportion. The standard deviation of all 10 possible proportions is √{[(.5 − .4)² + (1 − .4)² + … + (0 − .4)²]/10} = √.09 = .3, which is identical to the standard deviation of the sampling distribution. This example illustrates two facts about a sampling distribution of sample proportions: the mean of the sampling distribution is equal to π, and the standard error of a sample proportion (Equation 1.3) is an estimate (ignoring the finite population correction) of the standard deviation of the sampling distribution.

The confidence interval for π (Equation 1.4) follows from the fact that the sampling distribution of π̂ has an approximate normal distribution with a mean of π and a standard deviation that is estimated by SE_π̂. Consequently, in about 100(1 − α)% of all possible samples of size n, (π̂ − π)/SE_π̂ will be between −z_{α/2} and z_{α/2}, which can be expressed as the following probability statement:

P(−z_{α/2} < (π̂ − π)/SE_π̂ < z_{α/2}) = 1 − α.

Multiplying each term by SE_π̂ gives P(−z_{α/2} SE_π̂ < π̂ − π < z_{α/2} SE_π̂) = 1 − α, and subtracting π̂ from each term gives P(−π̂ − z_{α/2} SE_π̂ < −π < −π̂ + z_{α/2} SE_π̂) = 1 − α. Finally, multiplying each term by −1 gives P(π̂ + z_{α/2} SE_π̂ > π > π̂ − z_{α/2} SE_π̂) = 1 − α, where π̂ + z_{α/2} SE_π̂ is the upper limit of Equation 1.4 and π̂ − z_{α/2} SE_π̂ is the lower limit of Equation 1.4.

Illustration of the Central Limit Theorem

From the Central Limit Theorem, the sampling distribution of π̂ is known to have an approximate normal distribution, with the approximation improving as the sample size increases. The Central Limit Theorem, which applies to sample means, also

applies to a sample proportion because π̂ is a mean of 0 and 1 scores. When π is close to 0.5, the sampling distribution of π̂ (represented by a histogram) is closely approximated by a normal distribution even when the sample size is as small as n = 10. When π is close to 0 or 1, however, the sample size must be large before the sampling distribution of π̂ is closely approximated by a normal distribution. For π = 0.9 and n = 10, the normal approximation is poor. For π = 0.9 and n = 30, the normal approximation is much better.

[Figures: histograms of the sampling distribution of π̂ with normal curves overlaid, for π = 0.5 with n = 10, for π = 0.9 with n = 10, and for π = 0.9 with n = 30]
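The N = 5 illustration from the previous section can be verified by enumerating every possible sample, which is feasible here because the sampling distribution has only 10 members. A direct sketch of that example:

```python
import math
from itertools import combinations

y = [1, 0, 1, 0, 0]      # the N = 5 population scores; pi = .4
n = 2

# All C(5, 2) = 10 possible sample proportions
props = [sum(sample) / n for sample in combinations(y, n)]

# Mean and standard deviation of the sampling distribution
mean = sum(props) / len(props)
sd = math.sqrt(sum((p - mean) ** 2 for p in props) / len(props))
```

The mean works out to .4, equal to π, and the standard deviation to .3, matching √(π(1 − π)/n) √((N − n)/(N − 1)).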

Interpreting Confidence Interval Results

A 100(1 − α)% confidence interval for π will contain the value of π in about 100(1 − α)% of all possible samples, but a confidence interval for π either will or will not contain the value of π in the one random sample that was used in a particular study. There will be some degree of uncertainty about whether or not a reported 95% confidence interval for π actually contains the unknown value of π. Researchers need to communicate the certainty of their confidence interval results using some agreed-upon quantitative scale. The certainty of confidence interval results can be quantified using a confidence scale that ranges from 0% to 100%.

To assign meaning to specific confidence values, it is helpful to use a concrete example, such as randomly selecting one marble from a jar containing many marbles of equal size and weight that have been thoroughly mixed. Suppose the marbles are either red or green, and suppose we know the proportion of green marbles. Assume our subjective probability of selecting a green marble is equal to the proportion of green marbles. This marble example will be more similar to confidence interval problems if we also imagine that the marble turns white as soon as it is removed from the jar and that its original color will never be known. We will agree to describe our level of confidence that one randomly selected marble is green by setting our confidence level (a subjective probability × 100) equal to the known percentage of green marbles in the jar. For example, suppose we are told that 95% of the marbles are green and we randomly select one marble from the jar. Although this one randomly selected marble has turned white and we can never know its original color, our subjective probability of selecting a green marble is .95 and we will say that we are 95% confident that the selected marble was green.
Our 95% level of confidence in the above example is predicated on two critical assumptions: the marble was randomly selected, and 95% of the marbles in the jar are actually green. If either of these two assumptions does not hold, our stated 95% level of confidence will be misleading. For example, suppose the marbles were not thoroughly mixed and the green marbles might be clustered at the top (where they are more likely to be selected) or at the bottom (where they are less likely to be selected). In this situation, the random selection assumption will be violated and we have no way to assign a level of confidence about the marble's

original color that everyone would agree upon. Alternatively, if we do not know the true proportion of green marbles, we would have no way of assigning a level of confidence about the marble's original color that everyone would agree upon.

The confidence level in the marble example can be used to interpret a 100(1 − α)% confidence interval. Consider a 95% confidence interval for π. If a 95% confidence interval for π was computed from every possible sample of size n in a given study population, we know from statistical theory that about 95% of these confidence intervals will capture the unknown value of π. With random sampling, we know that every possible sample of size n has the same chance of being selected (which is analogous to randomly selecting one marble). We know that each sample will be one of two types: samples where the 95% confidence interval contains π and samples where the 95% confidence interval does not contain π (which is analogous to marbles being either green or red). Furthermore, the percentage of all possible samples for which a 95% confidence interval contains π is known to be about 95% (which is analogous to knowing the proportion of green marbles). Knowing that a 95% confidence interval for π will capture π in about 95% of all possible samples, and knowing that the one sample the researcher has used to compute the 95% confidence interval is a random sample, we can say that we are 95% confident that the computed confidence interval includes π. Finally, like the marble that turns white when removed from the jar so that its original color is unknown, we will not know if our one confidence interval actually captured or failed to capture the value of π.

Another way to think about confidence intervals is to consider a test of H0: π = h for many different values of h. For a given value of α, if H0 is tested for all possible values of h, a 100(1 − α)% confidence interval for π is the set of all values of h for which H0 cannot be rejected.
All values of h that are not included in the confidence interval are values for which H0 would have been rejected at the specified α level. For example, if a 95% confidence interval for π is [.24, .29], then a test of H0: π = h will not reject H0 if h is any value in the range .24 to .29, but will reject H0 for any value of h that is less than .24 or greater than .29.
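This correspondence between the confidence interval and the set of non-rejected values of h can be checked numerically. A sketch using the Example 1.1 data (120 of 300): because Equation 1.5 uses h rather than π̂ in the standard error and adds a continuity correction, the match to the Equation 1.4 interval [.345, .455] is only approximate, though here the endpoints agree at a grid resolution of .005.

```python
import math

f, n = 120, 300          # Example 1.1 data
p = f / n

def rejected(h, z_crit=1.96):
    """Reject H0: pi = h using the Equation 1.5 test statistic."""
    z = (abs(p - h) - 1 / (2 * n)) / math.sqrt(h * (1 - h) / n)
    return z > z_crit

# Test H0 over a grid of h values and keep the values that are not rejected.
accepted = [round(0.3 + 0.005 * i, 3) for i in range(41)
            if not rejected(0.3 + 0.005 * i)]
```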

Polychotomous Response Variables

All of the methods described in this module can be applied to each level of a nominal or ordinal response scale that has three or more categories. A response variable with three or more categories is referred to as polychotomous. For example, suppose one questionnaire item asked if the respondent would be most likely to support a sales tax increase if the money was used to: a) provide more housing for the homeless, b) replace outdated textbooks in the local public schools, c) add new bike lanes and walking paths, or d) none of the above. Suppose another questionnaire item asked if the respondent would: a) strongly agree, b) agree, c) disagree, or d) strongly disagree with the statement "the DACA program should be continued". Both of the above examples have four categories; let π_a, π_b, π_c, and π_d represent the proportion of people in the study population who would select option a, b, c, or d, respectively. Using the number of respondents who select each option, Equation 1.2 can be used to estimate π_a, π_b, π_c, and π_d, and Equation 1.4 can be used to obtain a confidence interval for each of π_a, π_b, π_c, and π_d.

Recall that with a dichotomous response, it was not necessary to compute Equation 1.4 for each of the two categories because the confidence interval for one category could be directly transformed into a confidence interval for the second category. With three or more response categories, a confidence interval for one category cannot be computed from the confidence intervals for the other categories, and it is necessary to compute Equation 1.4 for each response category.

Example 1.3. A survey questionnaire was given to a random sample of 450 donors from a list of about 4,000 donors at a recent fundraiser in San Francisco.
One of the survey questions asked about marital status with response options: a) Single (never married), b) Married or in a domestic partnership, c) Widowed, d) Divorced, or e) Separated. The numbers of respondents selecting each category were f_a = 103, f_b = 196, f_c = 14, f_d = 76, and f_e = 62. The 95% confidence intervals for the five population proportions are [.190, .268], [.390, .482], [.015, .047], [.134, .204], and [.106, .170].
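Example 1.3 can be reproduced by applying Equations 1.2 through 1.4 to each category separately; the variable names below are my own.

```python
import math

freqs = {"a": 103, "b": 196, "c": 14, "d": 76, "e": 62}   # Example 1.3 counts
n = 450

cis = {}
for cat, f in freqs.items():
    p = f / n                          # category proportion (Equation 1.2)
    se = math.sqrt(p * (1 - p) / n)    # standard error (Equation 1.3)
    cis[cat] = (p - 1.96 * se, p + 1.96 * se)
```

This yields [.190, .268] for category a, [.015, .047] for category c, and so on, matching the intervals reported above up to rounding.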

Sample Size Requirements for Desired Precision

Larger sample sizes give narrower confidence intervals, and it is possible to approximate the sample size that will give the desired width (upper limit minus lower limit) of a confidence interval with a desired level of confidence. The sample size needed to obtain a 100(1 − α)% confidence interval for π having a desired width of w is approximately

n = 4π̃(1 − π̃)(z_{α/2}/w)²   (1.6)

where π̃ is a planning value of the population proportion. Planning values of π can be obtained from expert opinion, pilot studies, or previously published research. In applications where the researcher has no idea about the possible value of π, π̃ can be set to .5, which will give a sample size requirement that is always larger than necessary because the product π̃(1 − π̃) in Equation 1.6 is maximized at π̃ = .5. If a confidence interval for π from a prior study can be obtained, π̃ could be set to the value closest to .5 within the confidence interval range to give a conservatively large sample size.

Example 1.4. A researcher wants to estimate the proportion of California's approximately 18 million registered voters who would support a $2.00 per pack increase in cigarette tax. The researcher suspects that about 60% of all voters will support the tax, and so π̃ was set to .6. The researcher would like the 95% confidence interval for π (the proportion of all registered voters who would support the tax) to have a width of about 0.1. The required sample size is approximately n = 4(.6)(1 − .6)(1.96/0.1)² = 369. Suppose the researcher had no prior information about the value of π and used .5 as its planning value. A conservatively large sample size requirement would then be n = 4(.5)(1 − .5)(1.96/0.1)² = 385.

Sample Size Requirements for Desired Power

The power of a directional two-sided test increases with larger sample sizes, and it is possible to approximate the sample size for which the test will have some desired level of power for a specified value of α.
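Equation 1.6 and Example 1.4 can be sketched in a few lines; the function name is my own.

```python
import math

def n_for_width(p_plan, w, z=1.96):
    """Approximate sample size for a confidence interval of desired
    width w (Equation 1.6), given a planning value p_plan of the
    population proportion; rounded up to the nearest integer."""
    return math.ceil(4 * p_plan * (1 - p_plan) * (z / w) ** 2)

n1 = n_for_width(0.6, 0.1)   # Example 1.4 planning value
n2 = n_for_width(0.5, 0.1)   # no prior information: conservative .5
```

Here n1 is 369 and n2 is 385; the .5 planning value always gives the larger, conservative requirement.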
The sample size needed to perform a directional two-sided test with a specified value of α and desired power is approximately

n = π*(1 − π*)(z_α/2 + z_β)²/(π* − h)²   (1.7)

where 1 − β is the desired power of the test, π* is a planning value of the population proportion, h is the value of π specified in the null hypothesis, and z_β is a one-sided critical z-value. Equation 1.7 shows that larger sample sizes are needed with: 1) smaller values of α, 2) greater desired power, and 3) values of π* that are closer to h. It is not necessary or desirable to use the same value of π* in the numerator and denominator of Equation 1.7. For instance, a researcher could set π* = .5 in the numerator and set π* − h to some value that represents the smallest difference that would have scientific or practical importance. It is customary to round sample size values from formulas such as Equations 1.6 and 1.7 up to the nearest integer.

Example 1.5. A researcher wants to determine if the proportion of California stores that sell tobacco products to minors is less than .15 or greater than .15. Based on previous research and expert opinion, the researcher set π* = .07. The researcher will send an underage "customer" to a random sample of stores and count the number of stores where the minor was able to purchase a tobacco product. The researcher wants the test of H0: π = .15 to have power of .9 with α = .05. The required sample size (number of stores) is approximately n = (.07)(1 − .07)(1.96 + 1.28)²/(.07 − .15)² = 107.

Sampling in Two Stages

In applications where data can be collected in two stages, the confidence interval obtained in the first stage can be used to determine how many more objects should be sampled in the second stage and added to the initial sample in order to achieve a desired confidence interval width. If the 100(1 − α)% confidence interval width from a first-stage sample size of n is w_0, then the number of additional objects (n⁺) to sample and add to the first sample in order to obtain a 100(1 − α)% confidence interval width of w is approximately

n⁺ = [(w_0/w)² − 1]n.   (1.8)

Example 1.6.
In a survey study with 25 participants, a 95% confidence interval for π had a width of 0.38. The results of this study are unlikely to be published because of the wide confidence interval. The researcher would like to obtain a 95% confidence interval for π that has a width of 0.15. The researcher should sample [(0.38/0.15)² − 1]25 = 136 additional participants and re-compute the 95% confidence interval in the sample of 25 + 136 = 161 participants.
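The three sample-size formulas above (Equations 1.6, 1.7, and 1.8) are simple enough to script. The sketch below is a minimal Python rendering that reproduces Examples 1.4 through 1.6; the function names are illustrative, and the z-values are computed from the standard normal distribution rather than taken as the rounded 1.96 and 1.28.

```python
from math import ceil
from statistics import NormalDist

def n_for_width(pi_plan, w, conf=0.95):
    """Equation 1.6: sample size for a CI of desired width w."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(4 * pi_plan * (1 - pi_plan) * (z / w) ** 2)

def n_for_power(pi_plan, h, alpha=0.05, power=0.90):
    """Equation 1.7: sample size for a directional two-sided test of H0: pi = h."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(pi_plan * (1 - pi_plan) * (z_a + z_b) ** 2 / (pi_plan - h) ** 2)

def n_additional(n0, w0, w):
    """Equation 1.8: second-stage sample size to shrink width w0 to w."""
    return ceil(((w0 / w) ** 2 - 1) * n0)

print(n_for_width(0.6, 0.1))         # Example 1.4 -> 369
print(n_for_power(0.07, 0.15))       # Example 1.5 -> 107
print(n_additional(25, 0.38, 0.15))  # Example 1.6 -> 136
```

Sample sizes are rounded up with `ceil`, matching the convention noted in the text.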

Target Population

The confidence interval for π (Equation 1.4) provides information about the study population from which the random sample was taken. In most applications, the study population will be a small subset of some larger and more interesting population called the target population. For instance, a researcher might take a random sample of 200 undergraduate students from a particular university because the researcher has easy access to a complete list of all 8,000 undergraduates at that university. The results of Equation 1.4 will apply only to those 8,000 undergraduate students, but the researcher is more interested in the value of π for a target population of all young adults. It might be possible for the researcher to make a persuasive argument that the study population proportion should be very similar to the target population proportion. For instance, suppose the above researcher measured the absence or presence of color blindness in the random sample of 200 college students. The researcher could argue that the proportion of color blindness in the study population of 8,000 undergraduates should be no different from the proportion of color blindness among all young adults. Now suppose that the researcher instead asked the 200 students whether they did or did not support abortion, and also suppose that the university was a church-affiliated university. In this situation, the researcher would have serious doubts that the proportion of students who support abortion in the study population is similar to the proportion supporting abortion in a target population of all young adults.
Researchers in the natural sciences seldom worry about the distinction between a study population and a target population because the parameter values for many physical or biological attributes (like the color blindness example) are much less likely to differ across study populations. Consequently, the study population parameter values are almost automatically assumed to generalize to some larger target population. In contrast, social and behavioral researchers, who study complex human behavior that can vary considerably across different study populations, need to be very cautious about how they interpret their confidence interval and hypothesis testing results. It is necessary for social and behavioral

researchers to clearly describe the characteristics of the study population so that the statistical results can be appropriately interpreted by other researchers.

Convenience Sample

A convenience sample is a nonrandom sample obtained for reasons of ease or ready availability. Interviewing customers who are exiting a store or asking the students in a particular classroom to fill out a questionnaire are examples of convenience sampling. Confidence intervals and hypothesis testing results are uninterpretable without the random sampling assumption. However, in certain applications, a confidence interval or hypothesis test from a convenience sample might be interpretable. In order to interpret a confidence interval or test from a convenience sample, the researcher must first argue that the convenience sample is a random sample from some hypothetical population. The researcher must then argue that the parameter value in the hypothetical population should be very similar to the parameter value in some definable target population. If these two arguments can be made persuasively, the researcher can interpret confidence interval or hypothesis testing results from a convenience sample as a description of the specified target population. These arguments usually require extensive subject-matter expertise.

Assumptions for Confidence Intervals and Tests

The confidence intervals and hypothesis tests for π require only two assumptions: the sample is a random sample from the study population (random sampling assumption) and the responses from each object in the sample are independent of each other (independence assumption). Depending on the type of application, violation of the random sampling assumption can be very serious. When a convenience sample is taken, researchers typically argue that the sample is a random sample of some hypothetical population and that the confidence interval or test results then apply to that hypothetical population.
In order for the results of such a study to have any scientific value, it is necessary to accurately describe the characteristics of the hypothetical population, and this can be difficult when working with human populations.

The confidence interval and hypothesis test will not have their intended interpretations when the independence assumption is violated. Recall that the interpretation of a confidence interval assumed that a 100(1 − α)% confidence interval would include the unknown population parameter value (e.g., π) in about 100(1 − α)% of all possible samples of a given size. The proportion of all possible samples of a given sample size in which a 100(1 − α)% confidence interval includes the population parameter is called the true coverage probability. When the independence assumption is violated, the true coverage probability can be far less than 1 − α, and the researcher's degree of belief regarding the computed confidence interval will be mistakenly too high.

The confidence interval for π (Equation 1.4) is called a Wald interval and can have a true coverage probability that is far less than 1 − α if π is close to 0 or 1. Furthermore, the true coverage probability of the Wald confidence interval can change unpredictably with small changes in n and slightly different values of π. In practice, the actual value of π is unknown and the researcher will not know if the Wald confidence interval has a true coverage probability that is close to the specified 1 − α value or possibly much smaller. The Wald interval, because of its simplicity, is useful for teaching purposes but is not recommended for research applications. Two useful alternatives to the Wald confidence interval are described below.

A Wilson confidence interval with a continuity correction will have a true coverage probability that is greater than 1 − α for almost every value of π with sample sizes as small as 10. The lower limit of the Wilson interval is

[2nπ̂ + z²_α/2 − 1 − z_α/2 √(z²_α/2 − (2 + 1/n) + 4π̂(n(1 − π̂) + 1))]/b   (1.9a)

and the upper limit of the Wilson interval is

[2nπ̂ + z²_α/2 + 1 + z_α/2 √(z²_α/2 + (2 − 1/n) + 4π̂(n(1 − π̂) − 1))]/b   (1.9b)

where b = 2(n + z²_α/2).
If π̂ = 0, the lower Wilson limit should be set to 0, and if π̂ = 1, the upper Wilson limit should be set to 1.
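A minimal sketch of Equations 1.9a and 1.9b, assuming z_α/2 = 1.96 for a 95% interval; the function name `wilson_cc` is an illustrative label, and the boundary rules for π̂ = 0 and π̂ = 1 are built in.

```python
from math import sqrt

def wilson_cc(f, n, z=1.96):
    """Continuity-corrected Wilson interval (Equations 1.9a and 1.9b)."""
    p = f / n
    b = 2 * (n + z ** 2)
    if f == 0:
        lower = 0.0  # boundary rule: lower limit set to 0 when p-hat = 0
    else:
        lower = (2 * n * p + z ** 2 - 1
                 - z * sqrt(z ** 2 - (2 + 1 / n) + 4 * p * (n * (1 - p) + 1))) / b
    if f == n:
        upper = 1.0  # boundary rule: upper limit set to 1 when p-hat = 1
    else:
        upper = (2 * n * p + z ** 2 + 1
                 + z * sqrt(z ** 2 + (2 - 1 / n) + 4 * p * (n * (1 - p) - 1))) / b
    return lower, upper

# e.g., 7 "successes" in a sample of 50
lo, hi = wilson_cc(7, 50)
print(f"[{lo:.3f}, {hi:.3f}]")
```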

The continuity-corrected Wilson interval can be substantially wider than the Wald interval because the Wilson interval can have a true coverage probability that is substantially greater than 1 − α. Unless the research application demands a coverage probability that is virtually guaranteed to be greater than 1 − α, the following adjusted Wald confidence interval is a good choice for most research applications

π̃ ± z_α/2 √(π̃(1 − π̃)/(n + 4))   (1.10)

where π̃ = (f + 2)/(n + 4). An adjusted Wald interval is generally narrower, and hence more informative, than a continuity-corrected Wilson interval. If n > 10, the adjusted Wald interval will have a true coverage probability that is slightly greater than 1 − α in most situations and only slightly less than 1 − α in rare situations. In rare cases, the adjusted Wald interval can give an upper limit that is greater than 1 or a lower limit that is less than 0. In these situations, the upper limit should be set to 1 or the lower limit should be set to 0. If a Wald confidence interval (Equation 1.4) or an adjusted Wald confidence interval (Equation 1.10) is used in a directional two-sided test, the decision will not always match the decision obtained when the Equation 1.5 test statistic is used. However, the continuity-corrected Wilson confidence interval and Equation 1.5 test statistic will always lead to identical decisions. Other confidence intervals for π have been proposed. In SPSS, the user can request a Jeffreys confidence interval for π that is computationally intensive but could be more accurate than the adjusted Wald confidence interval in some applications. SPSS also has an option to compute a Clopper-Pearson confidence interval for π that is guaranteed to capture π with probability no less than 1 − α but can be very wide like the continuity-corrected Wilson interval. SAS will compute adjusted Wald, continuity-corrected Wilson, Jeffreys, Clopper-Pearson, and several other confidence intervals for π.
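Equation 1.10 amounts to adding two successes and two failures to the sample before applying the Wald formula. The sketch below (with an illustrative function name and z = 1.96) also clips the limits to [0, 1], as the text recommends for the rare boundary cases.

```python
from math import sqrt

def adjusted_wald(f, n, z=1.96):
    """Adjusted Wald interval (Equation 1.10): add 2 successes and 2 failures."""
    p_adj = (f + 2) / (n + 4)
    half = z * sqrt(p_adj * (1 - p_adj) / (n + 4))
    # Clip to [0, 1] in the rare cases where a limit falls outside the range
    return max(0.0, p_adj - half), min(1.0, p_adj + half)

# e.g., 7 "successes" in a sample of 50
lo, hi = adjusted_wald(7, 50)
print(f"[{lo:.3f}, {hi:.3f}]")
```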

Appendix. Maximum Likelihood Estimate of π

Assume that a randomly selected member of the population will exhibit some attribute with probability π. It follows from the complement rule of probability that the probability of not exhibiting the attribute is 1 − π. This information can be summarized by the following equation

P(y) = π^y(1 − π)^(1−y)   (A.1)

where y = 1 if the attribute is present and y = 0 if the attribute is absent. Note that P(1) = π¹(1 − π)^(1−1) = π¹(1 − π)⁰ = π and P(0) = π⁰(1 − π)^(1−0) = π⁰(1 − π)¹ = 1 − π.

In a random sample of size n, the likelihood of observing the n values y_1, y_2, …, y_n can be expressed as a product of probabilities

P(y_1 and y_2 and … and y_n) = P(y_1)P(y_2) ⋯ P(y_n)   (A.2)

assuming that the n observations are independent. The likelihood can be written as a function of π

P(y_1)P(y_2) ⋯ P(y_n) = π^(y_1)(1 − π)^(1−y_1) π^(y_2)(1 − π)^(1−y_2) ⋯ π^(y_n)(1 − π)^(1−y_n)   (A.3)

which can be expressed more simply as

L(π) = π^(Σy_i)(1 − π)^(n − Σy_i) = π^f(1 − π)^(n−f)   (A.4)

where f = Σ_{i=1}^{n} y_i. L(π) is the likelihood function of π. The maximum likelihood estimate (MLE) of π is the value of π that maximizes L(π). In other words, the MLE of π is the value of π that is most likely to have produced the observed set of y_1, y_2, …, y_n scores in the sample.

It is possible to solve for the value of π that maximizes L(π) using calculus. It is easier to solve for the maximum value of the natural logarithm of L(π), called the log-likelihood and denoted lnL(π). Because the logarithm is a monotonically increasing function, the value of π that maximizes lnL(π) is exactly the same value of π that maximizes L(π). To solve for the value of π that maximizes lnL(π), we first find the derivative of lnL(π) with respect to π, shown below.

d lnL(π)/dπ = f/π − (n − f)/(1 − π)   (A.5)

Equation A.5 defines the slope of a line that is tangent to the lnL(π) curve at any specified value of π. The maximum value of lnL(π) occurs when the slope of the tangent line is 0. Setting Equation A.5 to 0 and solving for π gives

π = f/n   (A.6)

which is the MLE of π and is denoted π̂. To illustrate the fact that f/n maximizes the value of the likelihood function (Equation A.4), suppose f = 4, n = 10, and we compute the likelihood function for nine values of π as shown below. Note that the likelihood function is maximized at π = .4.

π     π⁴(1 − π)⁶
.1    .000053
.2    .000419
.3    .000953
.4    .001194
.5    .000977
.6    .000531
.7    .000175
.8    .000026
.9    .000001
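The grid evaluation above is easy to reproduce. A minimal sketch: the code evaluates L(π) = π⁴(1 − π)⁶ over π = .1, .2, …, .9 and confirms that the maximum occurs at π = f/n = .4, in agreement with Equation A.6.

```python
def likelihood(pi, f=4, n=10):
    """L(pi) = pi^f (1 - pi)^(n - f) for f successes in n Bernoulli trials."""
    return pi ** f * (1 - pi) ** (n - f)

# Evaluate the likelihood on the grid pi = .1, .2, ..., .9
grid = [round(0.1 * k, 1) for k in range(1, 10)]
values = {pi: likelihood(pi) for pi in grid}

mle = max(values, key=values.get)
print(mle)  # 0.4, the value of f/n
```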


More information

You have 3 hours to complete the exam. Some questions are harder than others, so don t spend too long on any one question.

You have 3 hours to complete the exam. Some questions are harder than others, so don t spend too long on any one question. Data 8 Fall 2017 Foundations of Data Science Final INSTRUCTIONS You have 3 hours to complete the exam. Some questions are harder than others, so don t spend too long on any one question. The exam is closed

More information

Conditions for Regression Inference:

Conditions for Regression Inference: AP Statistics Chapter Notes. Inference for Linear Regression We can fit a least-squares line to any data relating two quantitative variables, but the results are useful only if the scatterplot shows a

More information

ANOVA - analysis of variance - used to compare the means of several populations.

ANOVA - analysis of variance - used to compare the means of several populations. 12.1 One-Way Analysis of Variance ANOVA - analysis of variance - used to compare the means of several populations. Assumptions for One-Way ANOVA: 1. Independent samples are taken using a randomized design.

More information

Chapter 19: Logistic regression

Chapter 19: Logistic regression Chapter 19: Logistic regression Self-test answers SELF-TEST Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis To open the main Logistic Regression dialog

More information

DETERMINE whether the conditions for performing inference are met. CONSTRUCT and INTERPRET a confidence interval to compare two proportions.

DETERMINE whether the conditions for performing inference are met. CONSTRUCT and INTERPRET a confidence interval to compare two proportions. Section 0. Comparing Two Proportions Learning Objectives After this section, you should be able to DETERMINE whether the conditions for performing inference are met. CONSTRUCT and INTERPRET a confidence

More information

Chapter 12: Inference about One Population

Chapter 12: Inference about One Population Chapter 1: Inference about One Population 1.1 Introduction In this chapter, we presented the statistical inference methods used when the problem objective is to describe a single population. Sections 1.

More information

AP Statistics Ch 12 Inference for Proportions

AP Statistics Ch 12 Inference for Proportions Ch 12.1 Inference for a Population Proportion Conditions for Inference The statistic that estimates the parameter p (population proportion) is the sample proportion p ˆ. p ˆ = Count of successes in the

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

1 MA421 Introduction. Ashis Gangopadhyay. Department of Mathematics and Statistics. Boston University. c Ashis Gangopadhyay

1 MA421 Introduction. Ashis Gangopadhyay. Department of Mathematics and Statistics. Boston University. c Ashis Gangopadhyay 1 MA421 Introduction Ashis Gangopadhyay Department of Mathematics and Statistics Boston University c Ashis Gangopadhyay 1.1 Introduction 1.1.1 Some key statistical concepts 1. Statistics: Art of data analysis,

More information

The Multinomial Model

The Multinomial Model The Multinomial Model STA 312: Fall 2012 Contents 1 Multinomial Coefficients 1 2 Multinomial Distribution 2 3 Estimation 4 4 Hypothesis tests 8 5 Power 17 1 Multinomial Coefficients Multinomial coefficient

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Section 7.2 Homework Answers

Section 7.2 Homework Answers 25.5 30 Sample Mean P 0.1226 sum n b. The two z-scores are z 25 20(1.7) n 1.0 20 sum n 2.012 and z 30 20(1.7) n 1.0 0.894, 20 so the probability is approximately 0.1635 (0.1645 using Table A). P14. a.

More information

HYPOTHESIS TESTING. Hypothesis Testing

HYPOTHESIS TESTING. Hypothesis Testing MBA 605 Business Analytics Don Conant, PhD. HYPOTHESIS TESTING Hypothesis testing involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population.

More information

Applied Statistics in Business & Economics, 5 th edition

Applied Statistics in Business & Economics, 5 th edition A PowerPoint Presentation Package to Accompany Applied Statistics in Business & Economics, 5 th edition David P. Doane and Lori E. Seward Prepared by Lloyd R. Jaisingh McGraw-Hill/Irwin Copyright 2015

More information

Upon completion of this chapter, you should be able to:

Upon completion of this chapter, you should be able to: 1 Chaptter 7:: CORRELATIION Upon completion of this chapter, you should be able to: Explain the concept of relationship between variables Discuss the use of the statistical tests to determine correlation

More information

Statistics, continued

Statistics, continued Statistics, continued Visual Displays of Data Since numbers often do not resonate with people, giving visual representations of data is often uses to make the data more meaningful. We will talk about a

More information

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies The t-test: So Far: Sampling distribution benefit is that even if the original population is not normal, a sampling distribution based on this population will be normal (for sample size > 30). Benefit

More information

Logistic Regression Analysis

Logistic Regression Analysis Logistic Regression Analysis Predicting whether an event will or will not occur, as well as identifying the variables useful in making the prediction, is important in most academic disciplines as well

More information

Massachusetts Tests for Educator Licensure (MTEL )

Massachusetts Tests for Educator Licensure (MTEL ) Massachusetts Tests for Educator Licensure (MTEL ) BOOKLET 2 Mathematics Subtest Copyright 2010 Pearson Education, Inc. or its affiliate(s). All rights reserved. Evaluation Systems, Pearson, P.O. Box 226,

More information

Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January

More information

Practice Questions: Statistics W1111, Fall Solutions

Practice Questions: Statistics W1111, Fall Solutions Practice Questions: Statistics W, Fall 9 Solutions Question.. The standard deviation of Z is 89... P(=6) =..3. is definitely inside of a 95% confidence interval for..4. (a) YES (b) YES (c) NO (d) NO Questions

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Lecture No. # 36 Sampling Distribution and Parameter Estimation

More information

[ z = 1.48 ; accept H 0 ]

[ z = 1.48 ; accept H 0 ] CH 13 TESTING OF HYPOTHESIS EXAMPLES Example 13.1 Indicate the type of errors committed in the following cases: (i) H 0 : µ = 500; H 1 : µ 500. H 0 is rejected while H 0 is true (ii) H 0 : µ = 500; H 1

More information

Last week: Sample, population and sampling distributions finished with estimation & confidence intervals

Last week: Sample, population and sampling distributions finished with estimation & confidence intervals Past weeks: Measures of central tendency (mean, mode, median) Measures of dispersion (standard deviation, variance, range, etc). Working with the normal curve Last week: Sample, population and sampling

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Statistics Introductory Correlation

Statistics Introductory Correlation Statistics Introductory Correlation Session 10 oscardavid.barrerarodriguez@sciencespo.fr April 9, 2018 Outline 1 Statistics are not used only to describe central tendency and variability for a single variable.

More information

Hypothesis testing. 1 Principle of hypothesis testing 2

Hypothesis testing. 1 Principle of hypothesis testing 2 Hypothesis testing Contents 1 Principle of hypothesis testing One sample tests 3.1 Tests on Mean of a Normal distribution..................... 3. Tests on Variance of a Normal distribution....................

More information

Math 124: Modules Overall Goal. Point Estimations. Interval Estimation. Math 124: Modules Overall Goal.

Math 124: Modules Overall Goal. Point Estimations. Interval Estimation. Math 124: Modules Overall Goal. What we will do today s David Meredith Department of Mathematics San Francisco State University October 22, 2009 s 1 2 s 3 What is a? Decision support Political decisions s s Goal of statistics: optimize

More information

Business Statistics: A First Course

Business Statistics: A First Course Business Statistics: A First Course 5 th Edition Chapter 7 Sampling and Sampling Distributions Basic Business Statistics, 11e 2009 Prentice-Hall, Inc. Chap 7-1 Learning Objectives In this chapter, you

More information

Probability Distributions

Probability Distributions CONDENSED LESSON 13.1 Probability Distributions In this lesson, you Sketch the graph of the probability distribution for a continuous random variable Find probabilities by finding or approximating areas

More information

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i B. Weaver (24-Mar-2005) Multiple Regression... 1 Chapter 5: Multiple Regression 5.1 Partial and semi-partial correlation Before starting on multiple regression per se, we need to consider the concepts

More information

Categorical Data Analysis 1

Categorical Data Analysis 1 Categorical Data Analysis 1 STA 312: Fall 2012 1 See last slide for copyright information. 1 / 1 Variables and Cases There are n cases (people, rats, factories, wolf packs) in a data set. A variable is

More information

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new

More information

79 Wyner Math Academy I Spring 2016

79 Wyner Math Academy I Spring 2016 79 Wyner Math Academy I Spring 2016 CHAPTER NINE: HYPOTHESIS TESTING Review May 11 Test May 17 Research requires an understanding of underlying mathematical distributions as well as of the research methods

More information

Ch. 16: Correlation and Regression

Ch. 16: Correlation and Regression Ch. 1: Correlation and Regression With the shift to correlational analyses, we change the very nature of the question we are asking of our data. Heretofore, we were asking if a difference was likely to

More information

ECON1310 Quantitative Economic and Business Analysis A

ECON1310 Quantitative Economic and Business Analysis A ECON1310 Quantitative Economic and Business Analysis A Topic 1 Descriptive Statistics 1 Main points - Statistics descriptive collecting/presenting data; inferential drawing conclusions from - Data types

More information

1 Probability Distributions

1 Probability Distributions 1 Probability Distributions In the chapter about descriptive statistics sample data were discussed, and tools introduced for describing the samples with numbers as well as with graphs. In this chapter

More information

Two-sample inference: Continuous Data

Two-sample inference: Continuous Data Two-sample inference: Continuous Data November 5 Diarrhea Diarrhea is a major health problem for babies, especially in underdeveloped countries Diarrhea leads to dehydration, which results in millions

More information

Lectures of STA 231: Biostatistics

Lectures of STA 231: Biostatistics Lectures of STA 231: Biostatistics Second Semester Academic Year 2016/2017 Text Book Biostatistics: Basic Concepts and Methodology for the Health Sciences (10 th Edition, 2014) By Wayne W. Daniel Prepared

More information

Advanced Experimental Design

Advanced Experimental Design Advanced Experimental Design Topic Four Hypothesis testing (z and t tests) & Power Agenda Hypothesis testing Sampling distributions/central limit theorem z test (σ known) One sample z & Confidence intervals

More information