Testing hypotheses
hypothesis: a claim about the value of some parameter (such as $p$)
significance test: a procedure for assessing the strength of the evidence that a sample of data provides against a claimed (hypothesized) value of a parameter
null hypothesis ($H_0$): the hypothesis identifying a target value of the parameter, often a status quo value; it has the form $H_0$: parameter = value
alternative hypothesis ($H_A$): the hypothesis stating that the parameter differs from the value hypothesized in $H_0$. It is often the investigator's hypothesis and takes one of the forms $H_A$: parameter > value, $H_A$: parameter < value, or $H_A$: parameter $\neq$ value. The first two forms give a one-tailed (one-sided) alternative; the last gives a two-tailed (two-sided) alternative.
test statistic: a statistic computed from the sample data whose value, judged against its sampling distribution, provides evidence against the null hypothesis (e.g., the statistic $\hat{p}$ can be used to test hypotheses about $p$)
P-value: the conditional probability, given that $H_0$ is true, of obtaining a value of the test statistic at least as inconsistent with the null hypothesis as the value actually observed in the sample; the smaller the P-value, the stronger the evidence against the null hypothesis

Structure of a hypothesis test
1. State hypotheses: both the null hypothesis ($H_0$) and the alternative hypothesis ($H_A$).
2. Apply the sampling distribution model.
3. Calculate the test statistic and its associated P-value.
4. Determine the conclusion: assess the evidence against $H_0$ in favor of $H_A$ according to how small the P-value is.
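The definition of the P-value as a probability computed *given that $H_0$ is true* can be made concrete by simulation. The sketch below is illustrative only: the function name `simulated_p_value` and the data (60 successes in 100 trials, testing $H_0: p = 0.5$ against $H_A: p > 0.5$) are hypothetical. It draws many samples from a population in which $H_0$ holds and reports the fraction whose success count is at least as large as the observed one.

```python
import random


def simulated_p_value(p0, n, observed_successes, trials=10_000, seed=1):
    """Estimate an upper-tail P-value by simulating samples with H0: p = p0 true.

    Hypothetical helper for illustration; the result is a Monte Carlo estimate.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    at_least_as_extreme = 0
    for _ in range(trials):
        # One simulated sample of size n drawn from a population with p = p0
        successes = sum(rng.random() < p0 for _ in range(n))
        if successes >= observed_successes:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


# Hypothetical data: 60 successes in 100 trials, HA: p > 0.5
print(simulated_p_value(0.5, 100, 60))
```

Because only samples generated under $H_0$ are counted, the estimate answers "how surprising is this sample if $H_0$ is true?", not "how likely is $H_0$?".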
The 1-Proportion z-test
State hypotheses:
Null hypothesis: $H_0: p = p_0$
Alternative hypothesis: $H_A: p > p_0$; or $p < p_0$; or $p \neq p_0$
Choose model: individuals are independently selected to form an SRS from a population, and the sample satisfies the 10% Condition and the Success/Failure Condition, so a normal model applies to the sampling distribution of $\hat{p}$.
Mechanics: compute the z statistic from the sample value $\hat{p}$ and the value $p_0$ hypothesized in $H_0$ (with $q_0 = 1 - p_0$):
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$$
Find the P-value associated with the form of $H_A$: $P = P(Z \ge z \mid H_0)$ for $H_A: p > p_0$; $P = P(Z \le z \mid H_0)$ for $H_A: p < p_0$; or $P = 2P(Z \ge |z| \mid H_0)$ for $H_A: p \neq p_0$.
Conclusion: assess the evidence to reject (or fail to reject) $H_0$ according to how small $P$ is.
[TI-83: STAT TESTS 1-PropZTest]
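The mechanics of the 1-proportion z-test can be sketched in Python as follows. This is a minimal illustration, not the TI-83 implementation: the function name `one_prop_z_test` and the strings `"greater"`, `"less"`, and `"two-sided"` for the three forms of $H_A$ are hypothetical choices. It uses the standard library's `statistics.NormalDist` for the standard normal model, and it assumes the conditions for the normal model have already been checked.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """1-proportion z-test under the normal model for p-hat.

    alternative: "greater" (HA: p > p0), "less" (HA: p < p0),
    or "two-sided" (HA: p != p0). Returns (z, P-value).
    """
    p_hat = successes / n
    sd0 = sqrt(p0 * (1 - p0) / n)  # SD(p-hat) computed from the H0 value p0
    z = (p_hat - p0) / sd0
    std_normal = NormalDist()  # standard normal model: mean 0, sd 1
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)  # P(Z >= z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)  # P(Z <= z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # 2 P(Z >= |z|)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5, HA: p > 0.5
z, p = one_prop_z_test(60, 100, 0.5, alternative="greater")
print(round(z, 2), round(p, 4))  # prints: 2.0 0.0228
```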
Other considerations for hypothesis testing
Decide on hypotheses before collecting data: crafting a hypothesis to test based on what the data will allow you to conclude is inappropriate (cheating!).
How small should P be to reject $H_0$? Ultimately, this depends on context. We may require a very small P-value when the null hypothesis is a long-standing belief, or when accepting the alternative hypothesis would lead to a serious outlay of effort or money. Conversely, we may be more generous to the alternative hypothesis and accept a larger P-value when it represents an intervention to treat a disease or would lead to some other relatively straightforward course of action.
Use a one-sided or two-sided test? This depends on the Why of the scenario. If departures of $p$ in either direction (larger or smaller than the hypothesized value) are of interest, use a two-sided alternative. If only values larger (or only values smaller) than the hypothesized value are of interest, use a one-sided test.
Use a confidence interval as well
To show how strong the evidence is for or against the null hypothesis, include a confidence interval for $p$ with your conclusion.
An even better confidence interval estimate
A more reliable method of estimating $p$ is the plus-four method: it replaces the sample proportion with one computed as if the sample contained two extra successes and two extra failures. If $x$ counts the successes in the sample, the plus-four estimate of $p$ is
$$\tilde{p} = \frac{\tilde{x}}{\tilde{n}} = \frac{x + 2}{n + 4},$$
and the corresponding confidence interval is
$$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}\,\tilde{q}}{\tilde{n}}}, \qquad \tilde{q} = 1 - \tilde{p}.$$
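The plus-four interval can be computed as in the sketch below. The function name `plus_four_interval` and the example data are hypothetical; the critical value $z^*$ is obtained from the standard library's `statistics.NormalDist().inv_cdf`, assuming a two-sided interval at the requested confidence level.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_interval(successes, n, confidence=0.95):
    """Plus-four confidence interval for a population proportion p."""
    n_tilde = n + 4                        # two extra successes + two extra failures
    p_tilde = (successes + 2) / n_tilde    # plus-four estimate of p
    q_tilde = 1 - p_tilde
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z*
    margin = z_star * sqrt(p_tilde * q_tilde / n_tilde)
    return p_tilde - margin, p_tilde + margin


# Hypothetical data: 60 successes in 100 trials
low, high = plus_four_interval(60, 100)
print(round(low, 3), round(high, 3))
```

Note that the interval is centered at $\tilde{p}$, not at $\hat{p}$; the shift toward 1/2 matters most for small samples or proportions near 0 or 1.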
$H_A$ is the experimenter's hypothesis
There is a tendency to identify the experimenter's claim as the null hypothesis. This is rarely correct: the null hypothesis is the one the experimenter hopes to rule out by marshaling evidence from the data!
Interpret the P-value correctly
The P-value is not the probability that the null hypothesis is true! It is a conditional probability: the probability, if the null hypothesis is true, that the test statistic would be at least as extreme as the value actually observed.
level of significance ($\alpha$): in some hypothesis tests we set a predetermined level $\alpha$ and deem the result statistically significant when the P-value falls at or below it: if the P-value is $\le \alpha$, we reject $H_0$; if the P-value is $> \alpha$, we do not reject $H_0$. Standard values of $\alpha$ are 0.10, 0.05, 0.01, and sometimes even smaller values.
Statistical significance is not the only criterion
A test can give statistically significant evidence that a parameter is not equal to some value even when the actual difference between the true and hypothesized values is so small as to be of no practical importance. Look also at a confidence interval for the parameter to help decide this.
Confidence intervals and hypothesis tests
Confidence intervals and hypothesis tests are computed from the same data and statistics and rest on the same model assumptions. A level-$\alpha$ two-sided hypothesis test rejects the null hypothesis $H_0: p = p_0$ whenever $p_0$ falls outside a level $C = 1 - \alpha$ confidence interval for $p$. Similarly, a level-$\alpha$ one-sided test rejects the same null hypothesis whenever $p_0$ falls outside a level $C = 1 - 2\alpha$ confidence interval for $p$, on the side opposite the direction named in $H_A$. For proportions this correspondence is approximate, because the test uses $SD(\hat{p}) = \sqrt{p_0 q_0 / n}$ while the interval uses $SE(\hat{p}) = \sqrt{\hat{p}\hat{q} / n}$.
Errors and the power of a test
errors in hypothesis testing: If we mistakenly reject a null hypothesis that is in fact true, we have committed a Type I error; in disease testing, this is a false positive. It happens whenever $H_0$ is true but we select a sample that produces too low a P-value, and it occurs with probability equal to $\alpha$, the significance level of the test. If we mistakenly fail to reject a false null hypothesis, we have committed a Type II error; in disease testing, this is a false negative. It happens when the population parameter truly differs from the hypothesized value but we select a sample whose P-value is large enough to keep us from rejecting the incorrect null hypothesis. We denote the probability of this event by $\beta$; its value is hard to determine because the true value of the parameter is unknown.

                       $H_0$ is true       $H_0$ is false
reject $H_0$           Type I error        correct decision
fail to reject $H_0$   correct decision    Type II error
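A simulation can show that, when $H_0$ is true, a level-$\alpha$ test rejects in roughly a fraction $\alpha$ of samples. The sketch below assumes a hypothetical setting ($p_0 = 0.5$, $n = 100$, two-sided test at $\alpha = 0.05$); the function name `simulated_type_i_rate` is illustrative. The observed rate will be close to $\alpha$ but not exactly equal to it, since the count of successes is discrete and the normal model only approximates its distribution.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_type_i_rate(p0=0.5, n=100, alpha=0.05, trials=5_000, seed=2):
    """Fraction of samples, drawn with H0: p = p0 true, in which H0 is rejected."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    sd0 = sqrt(p0 * (1 - p0) / n)
    rejections = 0
    for _ in range(trials):
        successes = sum(rng.random() < p0 for _ in range(n))  # H0 is true here
        z = (successes / n - p0) / sd0
        if abs(z) >= z_star:  # equivalent to two-sided P-value <= alpha
            rejections += 1   # each rejection is a Type I error
    return rejections / trials


print(simulated_type_i_rate())
```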
power of a test: the probability that the test correctly rejects a false null hypothesis (see Fig. 21.4, p. 484). Since $P(\text{fail to reject } H_0 \mid H_0 \text{ is false}) = P(\text{Type II error}) = \beta$, the power of the test is
$$\text{power} = P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - \beta.$$
Unfortunately, $\beta$ depends on the unknown true value of the parameter, so this formula is merely a theoretical result; it does not make the power of the test any easier to calculate!
effect size: the difference between the hypothesized value of the parameter and its true value; the larger the effect size, the greater the power of the test and the smaller the probability of a Type II error
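Although the true value of $p$ is unknown, power can be computed for any *assumed* true value, which shows how power grows with effect size. The sketch below assumes a one-sided test of $H_0: p = p_0$ against $H_A: p > p_0$ under the normal model; the function name `power_upper_tail` and the numbers used are hypothetical. It finds the cutoff for $\hat{p}$ above which $H_0$ is rejected, then asks how often $\hat{p}$ exceeds that cutoff when $p$ actually equals `p_true`.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail(p0, p_true, n, alpha=0.05):
    """Power of a one-sided (HA: p > p0) z-test if the true proportion is p_true."""
    sd0 = sqrt(p0 * (1 - p0) / n)  # SD(p-hat) assuming H0 is true
    z_star = NormalDist().inv_cdf(1 - alpha)
    cutoff = p0 + z_star * sd0     # reject H0 when p-hat exceeds this value
    sd_true = sqrt(p_true * (1 - p_true) / n)  # SD(p-hat) at the assumed true p
    return 1 - NormalDist(mu=p_true, sigma=sd_true).cdf(cutoff)


# Hypothetical: p0 = 0.5, n = 100, several assumed true values of p
for p_true in (0.55, 0.60, 0.65):
    power = power_upper_tail(0.5, p_true, 100)
    print(p_true, round(power, 3), round(1 - power, 3))  # power and beta
```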
reducing errors: An investigator controls $\alpha$ but not $\beta$. Decreasing $\alpha$ demands stronger evidence against the null hypothesis before rejecting it, but for a fixed sample size this increases $\beta$. To decrease both types of error at once, one must reduce the standard deviation of the sampling distribution, $SD(\hat{p}) = \sqrt{pq/n}$, by increasing the sample size $n$.
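The trade-off between $\alpha$ and $\beta$, and the role of $n$, can be tabulated for a hypothetical scenario. This sketch assumes a one-sided test of $H_0: p = 0.5$ against $H_A: p > 0.5$ with an assumed true value $p = 0.6$; the function name `type_ii_error` is illustrative. For each $\alpha$ and $n$ it prints $SD(\hat{p})$ under $H_0$ and the resulting $\beta$.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(p0, p_true, n, alpha):
    """beta for a one-sided (HA: p > p0) z-test if the true proportion is p_true."""
    sd0 = sqrt(p0 * (1 - p0) / n)
    cutoff = p0 + NormalDist().inv_cdf(1 - alpha) * sd0  # rejection cutoff for p-hat
    sd_true = sqrt(p_true * (1 - p_true) / n)
    # P(p-hat falls below the cutoff, so H0 is not rejected, when p = p_true)
    return NormalDist(mu=p_true, sigma=sd_true).cdf(cutoff)


for alpha in (0.05, 0.01):
    for n in (50, 100, 200, 400):
        sd0 = sqrt(0.5 * 0.5 / n)
        print(alpha, n, round(sd0, 4), round(type_ii_error(0.5, 0.6, n, alpha), 3))
```

At each fixed $n$, the smaller $\alpha$ gives the larger $\beta$; at each fixed $\alpha$, increasing $n$ shrinks $SD(\hat{p})$ and drives $\beta$ down.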