4 Hypothesis testing

Roxanne Roberts
6 years ago
In Sections 2 and 3 we considered the problem of estimating a single parameter of interest, $\theta$. In this section we consider the related problem of testing whether or not $\theta$ equals a particular value of interest, or lies in a particular range of values of interest. Estimation and hypothesis testing can be thought of as two related (dual) aspects of the inference problem, as we shall see later.

4.1 Types of hypothesis and types of error

Suppose $X_1, X_2, \ldots, X_n$ are an independent random sample from a probability density function $f_X(x \mid \theta)$. Instead of estimating $\theta$, we now wish to use the sample to test hypotheses about $\theta$.

Definition 4.1.1: Simple and composite hypotheses

We define a hypothesis to be an assertion or conjecture about $\theta$. If the hypothesis completely specifies the distribution of $X$, it is called a simple hypothesis. Otherwise it is called a composite hypothesis.

Example

Suppose we take an independent random sample $X_1, X_2, \ldots, X_n$ from a random variable $X \sim N(\mu, \sigma^2)$. Consider the following hypotheses. Which are simple and which are composite?

(i) $H_1: \mu = 100,\ \sigma = 15$;
(ii) $H_2: \mu > 100,\ \sigma = 15$;
(iii) $H_3: \mu > 100,\ \sigma = \mu/10$;
(iv) $H_4: \mu = 100$;
(v) $H_5: \sigma = 15$;
(vi) $H_6: \mu < 100$.

Comparing two hypotheses

Usually in hypothesis testing we compare two hypotheses. The first, called the null hypothesis, is $H_0: \theta \in \omega$, and the second, the alternative hypothesis, is $H_1: \theta \in \bar{\omega}$, where $\omega \subset S$, $\omega \cup \bar{\omega} = S$, $\omega \cap \bar{\omega} = \emptyset$, and $S$ is the set of all possible values for the parameter $\theta$ of the distribution of the random variable $X$.

Example

We are interested in whether a new method of sealing light bulbs increases the average lifetime of the bulbs. Here, if $\theta$ is the mean lifetime of the bulbs sealed by the new method, and we know the mean lifetime of standard bulbs is 140 hours, our hypothesis test will be a test of $H_0: \theta = 140$ versus $H_1: \theta > 140$.

Now suppose we assume that the lifetime $X$ of a new bulb follows an Exponential distribution, i.e. $X \sim \mathrm{Exp}(1/\theta)$. Which of $H_0$ and $H_1$ is simple and which is composite? What are the sets $S$, $\omega$ and $\bar{\omega}$ which define this hypothesis test?

Definition 4.1.2: Acceptance region and rejection region

Let $A$ be the sample space of $\mathbf{X}$, i.e. the set of all possible values of a random sample of size $n$ from $X$. A test procedure divides $A$ into subsets $A_0$ and $A_1$ (with $A_0 \cup A_1 = A$, $A_0 \cap A_1 = \emptyset$) such that

if $\mathbf{X} \in A_0$, we accept $H_0$, and
if $\mathbf{X} \in A_1$, we reject $H_0$ and accept $H_1$.

$A_0$ is called the acceptance region and $A_1$ the rejection region of the test.

Definition 4.1.3: Type I error and type II error

When performing a test we may make the correct decision, or one of two possible errors:

(i) Type I error: reject $H_0$ when it is true;
(ii) Type II error: accept $H_0$ when it is false.

The type I error is usually regarded as the more serious mistake. The probabilities of making type I and type II errors are usually denoted by $\alpha(\theta)$ and $\beta(\theta)$ respectively.

Example

Now returning to the light bulbs sealed by the new method in Example 4.1.2, suppose that once again we wish to test $H_0: \theta = 140$ versus $H_1: \theta > 140$, and we collect some data consisting of ten measurements of lifetimes $x_1, \ldots, x_{10}$. Suppose we choose to accept $H_0$ if the sample mean $\bar{x}$ satisfies $\bar{x} < 150$, and to reject $H_0$ (and hence accept $H_1$) if $\bar{x} \ge 150$.

What are the sample space, the acceptance region and the rejection region for this test? What are the Type I and Type II errors in this specific case?
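The error probabilities in the light bulb example can be evaluated numerically. The following is a minimal sketch, assuming the Exponential model with mean $\theta$ from Example 4.1.2; it uses the fact that a sum of $n$ independent exponential lifetimes has an Erlang distribution, whose upper tail is a finite Poisson sum. The function names and the alternative value $\theta = 200$ are illustrative assumptions, not part of the notes.

```python
import math

def erlang_upper_tail(n, mean, total):
    """P(X_1 + ... + X_n >= total) for i.i.d. exponential lifetimes with the given mean."""
    lam_t = total / mean  # rate * total, where rate = 1 / mean
    return sum(math.exp(-lam_t) * lam_t**k / math.factorial(k) for k in range(n))

def prob_reject(theta, n=10, cutoff=150.0):
    """P(sample mean >= cutoff) when the true mean lifetime is theta."""
    return erlang_upper_tail(n, theta, n * cutoff)

# Type I error probability: rejecting H0 when theta = 140 is true.
alpha = prob_reject(140.0)

# Type II error probability at a hypothetical alternative theta = 200
# (an assumed value, chosen only for illustration).
beta_200 = 1 - prob_reject(200.0)

print(f"alpha = {alpha:.3f}, beta(200) = {beta_200:.3f}")
```

Because $H_1$ is composite, the Type II error probability is a function $\beta(\theta)$ rather than a single number; calling `prob_reject` over a range of $\theta$ values traces it out.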

In Sections 4.2 to 4.6 we will develop the ideas of hypothesis testing by studying the most important cases.

4.2 Inference for a single Normal sample

For this section we will assume that $X_1, X_2, \ldots, X_n$ is an i.i.d. random sample from a $N(\mu, \sigma^2)$ distribution. For the time being, we assume $\sigma^2$ is known, i.e. a constant. Moreover, a particular value $\mu = \mu_0$ for the population mean has been suggested by previous work or ideas. In this case the null hypothesis is denoted by $H_0: \mu = \mu_0$.

There are a variety of options for the alternative hypothesis. Commonly used alternative hypotheses are:

(A) $H_1: \mu = \mu_1 > \mu_0$ ($\mu_1$ fixed constant)
(B) $H_1: \mu = \mu_1 < \mu_0$ ($\mu_1$ fixed constant)
(C) $H_1: \mu > \mu_0$
(D) $H_1: \mu < \mu_0$
(E) $H_1: \mu \ne \mu_0$.

Example

Suppose the marks for a particular test are believed to follow a $N(\mu, 100)$ distribution, and the null hypothesis is $H_0: \mu = 50$. In which category (A) - (E) are each of the following alternative hypotheses?

1. $H_1: \mu < 50$;
2. $H_1: \mu = 57$;
3. $H_1: \mu \ne 50$.

Alternative (E) is the most commonly used, and the easiest to justify in most real life situations. All the others assume some knowledge which it is usually unrealistic to assume.

The null and alternative hypotheses are treated in the following way: we adopt the null hypothesis unless there is evidence against it.

The test statistic we choose to use for a single Normal sample is $\bar{X}$, the sample mean. It makes sense to test a hypothesis about the population mean $\mu$ using the sample mean $\bar{X}$, but more than this, we know the distribution of $\bar{X}$ under the null hypothesis, which is crucial. If $H_0$ is true, $X_1, \ldots, X_n$ are i.i.d. $N(\mu_0, \sigma^2)$ random variables, and so

$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) \quad \Longrightarrow \quad Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1).$$

We now need to decide for which values of the test statistic we will reject $H_0$. These values will comprise the rejection region $A_1$. We reject $H_0$ in cases

(A) or (C): if $Z$ is sufficiently far into the right-hand tail;
(B) or (D): if $Z$ is sufficiently far into the left-hand tail;
(E): if $Z$ is sufficiently far into either tail.

In case (E) the rejection region is split between the tails of the distribution, giving a two-tailed test. The other cases are one-tailed tests.

If P(Type I error) $= \alpha$, the test is said to have significance level $\alpha$. Commonly used significance levels are 0.05 (5%), 0.01 (1%) and 0.001 (0.1%). Once the significance level is chosen, the rejection region is precisely determined.
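As an illustration of how the significance level fixes the rejection region, the sketch below computes the critical values of $Z$ for each category of alternative, using the standard Normal quantile function from Python's standard library. The function name and the `(lower, upper)` return convention are assumptions made for this example.

```python
from statistics import NormalDist

def rejection_region(alpha, case):
    """Return (lower, upper) cut-offs: reject H0 if z < lower or z > upper.

    None means there is no cut-off on that side.
    """
    std_normal = NormalDist()  # N(0, 1)
    if case in ("A", "C"):     # right-hand tail only
        return None, std_normal.inv_cdf(1 - alpha)
    if case in ("B", "D"):     # left-hand tail only
        return std_normal.inv_cdf(alpha), None
    if case == "E":            # alpha split equally between both tails
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return -upper, upper
    raise ValueError("case must be one of A, B, C, D, E")

for case in "ABCDE":
    print(case, rejection_region(0.05, case))
```

For $\alpha = 0.05$ this gives a cut-off of about 1.645 for the one-tailed cases and about $\pm 1.96$ for the two-tailed case (E).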

Example

For $\alpha = 0.05$, calculate the rejection regions (in terms of $z$) for each category of alternative hypothesis (A) - (E).

Example

The widths (mm) of 64 beetles chosen from a particular locality were measured and the sample mean was found to be $\bar{x} =$ [value missing]. Previous extensive measurements of beetles of the same species had shown the widths to be Normally distributed with mean 23 mm and variance 16 mm$^2$. Test at the 5% level whether or not the beetles from the chosen locality have a different mean width from the main population, assuming that they have the same variance.
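A one sample z test of this kind can be sketched as follows. The population values ($\mu_0 = 23$, $\sigma^2 = 16$, $n = 64$) are those of the beetle example, but the sample mean of 24.0 used here is a hypothetical stand-in, since the observed value is not given above. The function name and the `alternative` argument are likewise assumptions for illustration.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """One sample z test with known sigma; returns (z, p_value)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    phi = NormalDist().cdf
    if alternative == "greater":   # cases (A) and (C)
        p_value = 1 - phi(z)
    elif alternative == "less":    # cases (B) and (D)
        p_value = phi(z)
    else:                          # case (E)
        p_value = 2 * (1 - phi(abs(z)))
    return z, p_value

# xbar = 24.0 is a HYPOTHETICAL sample mean, used only to show the mechanics.
z, p_value = z_test(xbar=24.0, mu0=23.0, sigma=4.0, n=64)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

With this assumed sample mean the two-tailed test would reject $H_0$ at the 5% level, since $|z| = 2.0$ exceeds 1.96.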

4.2 (cont.) A single Normal sample with unknown variance $\sigma^2$

Now we consider hypothesis tests about $\mu$ where $X_1, X_2, \ldots, X_n$ is an i.i.d. random sample from a $N(\mu, \sigma^2)$ distribution, and $\sigma^2$ is unknown. This is usually more realistic than assuming we know $\sigma^2$, but it is also a more complex problem. We have to estimate $\mu$ in the presence of the nuisance parameter $\sigma^2$. The solution is to replace $\sigma^2$ with a suitable estimate; here we use the sample variance $S^2$.

Example

Cola makers test new recipes for loss of sweetness during storage. For one particular recipe, ten trained tasters rate the sweetness before and after, enabling us to calculate the change (sweetness after storage minus sweetness before storage).

[Table of Before, After and Change ratings for the ten tasters: values missing.]

Is there evidence that, in general, the storage causes the cola to lose sweetness?

Solution/cont.

When we knew $\sigma^2$, we used the test statistic

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}},$$

which we know has a $N(0,1)$ distribution. Now we are estimating $\sigma^2$ using $S^2$, so our test statistic becomes

$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}},$$

and this has a slightly different distribution, called the Student t distribution, or just the t distribution.

Definition 4.2.1: The Student t distribution

If $Z \sim N(0,1)$ and $U \sim \chi^2_n$ are independent random variables then

$$T_n = \frac{Z}{\sqrt{U/n}}$$

has a Student t-distribution on $n$ degrees of freedom. The distribution is denoted by $t_n$.

Example

Sketch the t distribution with (a) 1; (b) 5; (c) 100 degrees of freedom.

Figure 2: the p.d.f.s of the $t_1$, $t_5$ and $t_{100}$ distributions.

The t-distribution with $n$ degrees of freedom has a p.d.f. which is symmetric and bell-shaped, like the Normal, but with somewhat thicker tails. Smaller values of $n$ correspond to the thickest tails. Larger values of $n$ cause the $t_n$ distribution to be more like the Normal distribution.

All we have to be able to do is to use statistical tables or R to look up the appropriate tail probability, since the distribution of our test statistic is given by:

$$T_{n-1} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}.$$

Example

For the cola example above, the test statistic was $t = -2.697$, and the sample size was $n = 10$. Carry out the test of

$H_0: \mu = 0$ (no loss in sweetness) against
$H_1: \mu < 0$ (some loss in sweetness).
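The mechanics of the one sample t test can be sketched in a few lines. The ten change values below are an assumption: the original data table is not available here, so these are illustrative values chosen to be consistent with the quoted statistic $t = -2.697$. The critical value 1.833 is the upper 5% point of $t_9$ from standard tables; the Python standard library has no t distribution, so a table look-up stands in for an exact p-value.

```python
import math
import statistics

# ASSUMED illustrative data (after minus before); not taken from the notes.
changes = [-2.0, -0.4, -0.7, -2.0, 0.4, -2.2, 1.3, -1.2, -1.1, -2.3]

def t_statistic(sample, mu0):
    """One sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
    return (xbar - mu0) / (s / math.sqrt(n))

t = t_statistic(changes, mu0=0.0)

# Upper 5% point of the t distribution on 9 degrees of freedom (from tables).
T_CRIT_9_DF_5PCT = 1.833

# H1: mu < 0 is a left-tailed alternative, so reject H0 for very negative t.
reject_h0 = t < -T_CRIT_9_DF_5PCT
print(f"t = {t:.3f}, reject H0 at 5%: {reject_h0}")
```

Since $-2.697 < -1.833$, the test rejects $H_0$ at the 5% level, in line with a loss of sweetness.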

4.3 Hypothesis test for two Normal means: two sample t test

Now suppose we have two samples $(x_1, x_2, \ldots, x_{n_1})$ and $(x_{n_1+1}, x_{n_1+2}, \ldots, x_{n_1+n_2})$, i.e. samples of sizes $n_1$ and $n_2$ from two different populations. We are interested in whether the two population means are equal.

Assuming that the data are sampled from Normally distributed populations with equal variance $\sigma^2$ in each population, then if we want to test

$H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$,

where $\mu_1$ and $\mu_2$ are the means of each population, we can perform a t-test with test statistic given by

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad \text{where} \quad s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$

and $\bar{x}_1$, $\bar{x}_2$, $s_1$ and $s_2$ are the sample means and standard deviations from each population. Here $s = \sqrt{s^2}$ is the pooled estimate of the common standard deviation $\sigma$.

If the null hypothesis is true, then the test statistic comes from a t distribution on $n_1 + n_2 - 2$ degrees of freedom, so we use the tables for $t_{n_1+n_2-2}$ to carry out the test. This test is called the two sample t test.

Example

Consider the lifetime of two brands of light bulbs. For a random sample of $n_1 = 12$ bulbs of one brand the mean bulb life is $\bar{x}_1 = 3{,}400$ hours with a sample standard deviation of $s_1 = 240$ hours. For the second brand of bulbs the mean bulb life for a sample of $n_2 = 8$ bulbs is $\bar{x}_2 = 2{,}800$ hours with $s_2 = 210$ hours. We assume that the distribution of bulb life is approximately Normal, and the standard deviations of the two populations are assumed to be equal. Test

$H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$

using a two sample t-test at the 1% level.
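The pooled test statistic for the two brands can be computed directly from the summary statistics. This is a sketch only: the function name is an assumption, and the critical value 2.878 is the upper 0.5% point of $t_{18}$ read from standard tables rather than computed.

```python
import math

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Two sample t statistic with pooled variance; returns (t, degrees_of_freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    s = math.sqrt(pooled_var)  # pooled estimate of the common sigma
    t = (xbar1 - xbar2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, df

t, df = pooled_t(xbar1=3400.0, s1=240.0, n1=12, xbar2=2800.0, s2=210.0, n2=8)

# Two-tailed test at the 1% level: compare |t| with the upper 0.5% point
# of t on 18 degrees of freedom (value taken from tables).
T_CRIT_18_DF_HALF_PCT = 2.878
reject_h0 = abs(t) > T_CRIT_18_DF_HALF_PCT
print(f"t = {t:.3f} on {df} d.f., reject H0 at 1%: {reject_h0}")
```

The statistic is about 5.75 on 18 degrees of freedom, well beyond the critical value, so the two-tailed test rejects equal means at the 1% level.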

4.4 Two Normal populations: testing the assumption of equal variances

In Section 4.3 we had to make the assumption that our two Normal populations had equal variance $\sigma^2$. Here we see how we can carry out a hypothesis test to check this assumption!

We denote the two population variances by $\sigma_1^2$ and $\sigma_2^2$. We wish to test

$H_0: \sigma_1^2 = \sigma_2^2$ versus $H_1: \sigma_1^2 \ne \sigma_2^2$.

Notice that these hypotheses don't make any assumptions about the values of $\mu_1$ and $\mu_2$. If the null hypothesis is true, then the ratio of sample variances

$$\frac{S_1^2}{S_2^2}$$

will have a distribution called the F-distribution, on $n_1 - 1$ and $n_2 - 1$ degrees of freedom.

Definition 4.4.1: The F distribution

If $U$ and $V$ are independent chi-square random variables such that $U \sim \chi^2_r$ and $V \sim \chi^2_s$, then

$$F = \frac{U/r}{V/s}$$

has an F distribution on $r$ and $s$ degrees of freedom. The distribution is denoted by $F_{r,s}$.

Note that the F distribution is characterized by two separate measures of degrees of freedom: $r$ corresponds to the numerator and $s$ corresponds to the denominator. Printed F tables are available, and of course we can always use R (except in an exam!).

Note that it follows immediately that the reciprocal ratio of sample variances, $S_2^2 / S_1^2$, will have an F distribution on $n_2 - 1$ and $n_1 - 1$ degrees of freedom.

In practice, we carry out the hypothesis test for equal variances as follows. We will only consider the case of the two-sided alternative ("not equal"), giving rise to a two-tailed test. In this case it is sensible to reject $H_0$ if either $s_1^2/s_2^2$ or $s_2^2/s_1^2$ is large. We form our test statistic as

$$F = \max\left\{ \frac{s_1^2}{s_2^2}, \frac{s_2^2}{s_1^2} \right\},$$

and compare this with $F_{r,s}$ tables, where if $s_1^2 > s_2^2$ we set $r = n_1 - 1$ and $s = n_2 - 1$, while if $s_2^2 > s_1^2$ we set $r = n_2 - 1$ and $s = n_1 - 1$. To account for the fact that under $H_0$ these two outcomes could happen with equal probability, the significance level of the test is *double* the upper tail probability of the F distribution (obtained from tables or R).

Example

For the data in Example 4.3.1, test the assumption that the standard deviations of the two populations are equal.
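The bookkeeping for the variance ratio test (which variance goes on top, and which degrees of freedom to look up) can be sketched as below, using the standard deviations $s_1 = 240$ ($n_1 = 12$) and $s_2 = 210$ ($n_2 = 8$) from the light bulb data. The function name is an assumption; the tail probability itself still has to come from F tables or R, since the Python standard library has no F distribution.

```python
def variance_ratio(s1, n1, s2, n2):
    """Return (F, r, s): the larger-over-smaller variance ratio and its d.f."""
    var1, var2 = s1**2, s2**2
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

f_stat, r, s = variance_ratio(s1=240.0, n1=12, s2=210.0, n2=8)
print(f"F = {f_stat:.3f}, compare with F tables on ({r}, {s}) d.f.")
# The significance level is DOUBLE the upper tail probability found in the tables.
```

Here the ratio is about 1.31 on (11, 7) degrees of freedom, which is to be compared with the tabulated upper tail of $F_{11,7}$.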

4.5 Inference for a single Binomial proportion ($r$ not small!)

Here we consider the situation where we have a single observation $x$ from a Binomial random variable $X \sim \mathrm{Bin}(r, \theta)$, and we are interested in testing hypotheses about $\theta$. Note that $x$ can be viewed as the number of successes from $r$ independent trials, each with success probability $\theta$. In this section we consider the case where $r$ is not small, i.e. $r > 20$.

We will test $H_0: \theta = \theta_0$ against an alternative from one of the categories (A) to (E) above.

Example

UK survey of sexual behaviour: in 2004/05, 11% of UK residents in a particular age group claimed to have had more than one sexual partner. Suppose that in a later survey, a random sample of 600 UK residents in the same age group shows that 83 had more than one sexual partner. Is this evidence for an increase in the population proportion having more than one sexual partner? Formulate this problem as a hypothesis test.

We need to derive a test statistic whose distribution we can evaluate conditional on $H_0$ being true. We use the Normal approximation to the Binomial distribution, i.e. if $X \sim \mathrm{Bin}(r, \theta)$ with $r > 20$, then to a reasonable approximation

$$X \sim N[r\theta,\ r\theta(1-\theta)].$$

(Note that the approximation involves rounding the outcome of a Normal random variable to the nearest integer! See below.)

Now suppose the null hypothesis $H_0$ is true, i.e. $\theta = \theta_0$. Then the Normal approximation implies $X \sim N[r\theta_0,\ r\theta_0(1-\theta_0)]$, and hence the test statistic

$$Z = \frac{X - r\theta_0}{\sqrt{r\theta_0(1-\theta_0)}}$$

has a $N(0,1)$ distribution. This means we can carry out a one sample z test exactly as we did in Section 4.2.

N.B. because of the rounding issue, it makes sense to replace $x$ in the test statistic by $x - 0.5$ when $x > r\theta_0$, and by $x + 0.5$ when $x < r\theta_0$. This is called a continuity correction.

Example

For the sexual behaviour data in Example 4.5.1 we have $r = 600$, we have observed $x = 83$, and we want to test $H_0: \theta = 0.11$ against $H_1: \theta > 0.11$. Carry out the hypothesis test.
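The z test with continuity correction can be sketched as follows for the survey data ($r = 600$, $x = 83$, $\theta_0 = 0.11$). The function name is an assumption; the correction shifts $x$ by 0.5 towards $r\theta_0$, as described above.

```python
import math
from statistics import NormalDist

def binomial_z(x, r, theta0):
    """z statistic for H0: theta = theta0, with continuity correction."""
    mean = r * theta0
    sd = math.sqrt(r * theta0 * (1 - theta0))
    if x > mean:
        x_corrected = x - 0.5
    elif x < mean:
        x_corrected = x + 0.5
    else:
        x_corrected = x
    return (x_corrected - mean) / sd

z = binomial_z(x=83, r=600, theta0=0.11)

# H1: theta > 0.11 is a right-tailed alternative, so use the upper tail.
p_value = 1 - NormalDist().cdf(z)
print(f"z = {z:.3f}, one-tailed p = {p_value:.4f}")
```

The resulting p-value lies between 0.01 and 0.05, so the increase is significant at the 5% level but not at the 1% level.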

Notes on significance levels and p values

1. If you are not told what level of significance to use, a sensible procedure is to test at the 5% level. If not significant then stop, otherwise test at the 1% level. If not significant then stop, otherwise test at the 0.1% level.

2. If you have access to the p-value, e.g. from Normal tables, or from R (see Exercises 4B Questions 1 and 2), then you immediately have the result of a hypothesis test at any given significance level. E.g. in the example immediately above, the p-value $p$ satisfies $0.05 > p > 0.01$, so it follows immediately that our test is significant at 5% but not at 1%.

4.6 Inference for two Binomial proportions (samples not small!)

Example

Consider a survey of employment carried out separately in Northern England and Scotland, among people who had left school six months earlier. Suppose we obtain the following data:

[Table of Unemployed and Employed counts for Scotland, Northern England and Total: values missing.]

In general we have two independent samples of size $n_1$ and $n_2$, with each observation classified as success or failure:

            Sample 1    Sample 2    Total
Success     O_11        O_12        R_1 = O_11 + O_12
Failure     O_21        O_22        R_2 = O_21 + O_22
Total       n_1         n_2         n = n_1 + n_2

Assuming all observations are independent, and that the success probability is constant within each sample, we have two Binomial samples. Suppose that the true probabilities of success are $\theta_1$ and $\theta_2$. We wish to test

$H_0: \theta_1 = \theta_2$ versus $H_1: \theta_1 \ne \theta_2$.

As always with a hypothesis test, we need to find a test statistic whose distribution is known when $H_0$ is true. Now if $H_0$ is true, then $\theta_1 = \theta_2 = \theta$, say. The combined samples give the number of successes in $n_1 + n_2$ trials, in each of which there is a probability $\theta$ of a success. So we may estimate $\theta$ by

$$\hat{\theta} = R_1 / n,$$

where $R_1 = O_{11} + O_{12}$ (total for first row) and $n = n_1 + n_2$ (grand total).

Hence, under $H_0$, the expected numbers of successes and failures in each of the samples are

$$E_{11} = \frac{n_1 R_1}{n}; \quad E_{21} = \frac{n_1 R_2}{n}; \quad E_{12} = \frac{n_2 R_1}{n}; \quad E_{22} = \frac{n_2 R_2}{n},$$

where $R_2 = O_{21} + O_{22}$ (total for second row).

To measure how closely the expected values match the observed values we calculate the test statistic

$$X^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}.$$

Under $H_0$, $X^2$ has an asymptotic distribution which is a $\chi^2_1$ distribution (a chi-square distribution with 1 degree of freedom).

Definition 4.6.1: The chi-square distribution $\chi^2_n$

If $Z_1, \ldots, Z_n$ are independent $N(0,1)$ random variables, then

$$X^2 = \sum_{i=1}^{n} Z_i^2$$

has a chi-square distribution on $n$ degrees of freedom. The distribution is denoted by $\chi^2_n$.

If $H_0$ is true, the observed values should be close to the expected values, and so $X^2$ will be small. Hence we reject $H_0$ if $X^2$ is large enough, using Tables (or R).
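The calculation of the expected counts and of $X^2$ can be sketched as follows. The counts in the usage line are hypothetical, chosen only to exercise the formulae; they are not the survey data. The p-value uses Definition 4.6.1 with $n = 1$: since $\chi^2_1$ is the distribution of $Z^2$, the upper tail is $P(X^2 > x) = 2\{1 - \Phi(\sqrt{x})\}$, which needs only the standard Normal c.d.f.

```python
import math
from statistics import NormalDist

def chi_square_2x2(o11, o12, o21, o22):
    """Chi-square test for a 2 x 2 table; columns are samples, rows are outcomes.

    Returns (x2, p_value) with the p-value from the chi-square(1) upper tail.
    """
    n1, n2 = o11 + o21, o12 + o22  # column totals (sample sizes)
    r1, r2 = o11 + o12, o21 + o22  # row totals
    n = n1 + n2
    observed = [[o11, o12], [o21, o22]]
    expected = [[n1 * r1 / n, n2 * r1 / n],
                [n1 * r2 / n, n2 * r2 / n]]
    x2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
             for i in range(2) for j in range(2))
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(x2)))
    return x2, p_value

# HYPOTHETICAL counts: 20 of 100 successes in sample 1, 30 of 100 in sample 2.
x2, p_value = chi_square_2x2(20, 30, 80, 70)
print(f"X^2 = {x2:.3f}, p = {p_value:.4f}")
```

The sketch assumes every expected count is positive and reasonably large, in keeping with the "samples not small" setting of this section.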

Example

Consider the data in Example 4.6.1. Test

$H_0$: the unemployment rates are equal against
$H_1$: the unemployment rates are not equal.

Notes

1. The method we just described for $2 \times 2$ tables also works for $r \times c$ tables, that is, tables with $r$ rows and $c$ columns. The test statistic is given by

$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$

and this is compared with a chi-square distribution with $(r-1) \times (c-1)$ degrees of freedom, i.e. $\chi^2_{(r-1)(c-1)}$.

2. Since deviation from what is expected under $H_0$ always corresponds to higher values of $X^2$, chi-square tests for two proportions (and for $r \times c$ contingency tables) are *always* one-tailed, and always use the upper tail of the chi-square distribution!

4.7 The relationship between hypothesis tests and confidence intervals

Every hypothesis test we carry out has a corresponding confidence interval associated with it!

Example

For the beetle widths given in Example 4.2.3, calculate a 95% confidence interval for the population mean $\mu$.

Looking back at that example, we can deduce immediately that 23 also lies outside the 99% confidence interval, and the 99.9% confidence interval. (Exercise: check this!)

The general rule is: the $100(1-\alpha)\%$ confidence interval consists precisely of all those values which would not be rejected at the $100\alpha\%$ significance level.
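The general rule can be checked numerically for the known-variance Normal case. In this sketch the sample mean of 24.0 is a hypothetical value (with $\sigma = 4$ and $n = 64$ as in the beetle example), and the function names are assumptions; the point is only that a candidate $\mu_0$ lies inside the interval exactly when the two-tailed z test fails to reject it.

```python
import math
from statistics import NormalDist

def confidence_interval(xbar, sigma, n, alpha):
    """100(1 - alpha)% confidence interval for mu when sigma is known."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def rejected(mu0, xbar, sigma, n, alpha):
    """Two-tailed z test of H0: mu = mu0 at significance level alpha."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

# xbar = 24.0 is a HYPOTHETICAL sample mean, used only for illustration.
xbar, sigma, n, alpha = 24.0, 4.0, 64, 0.05
low, high = confidence_interval(xbar, sigma, n, alpha)

candidates = [22.5, 23.0, 23.5, 24.0, 24.5, 25.5]
agreement = all((low <= mu0 <= high) == (not rejected(mu0, xbar, sigma, n, alpha))
                for mu0 in candidates)
print(f"95% CI = ({low:.2f}, {high:.2f}), duality holds: {agreement}")
```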

4.8 Hypothesis tests: size and power function

Hypothesis tests can be described in terms of their size and power.

Definition 4.8.1: the size of a hypothesis test

Consider a particular hypothesis test on a single parameter $\theta$. We define the size of the test to be

$$\sup_{\theta \in \omega} \{\Pr(\text{reject } H_0)\}.$$

Note that for a simple null hypothesis, this is just the probability we reject $H_0$ if it's true, i.e. the probability of a Type I error. For a composite null hypothesis, it is the supremum of this rejection probability over all the values of $\theta$ for which the null hypothesis holds.

Definition 4.8.2: the power function for a hypothesis test

The power $K(\theta)$ is the probability of rejecting $H_0$, considered as a function of $\theta$. A plot of the power function is helpful in determining how good our test is at rejecting the null hypothesis when it is false.

Informally, the power of a test is often used to refer to the probability that it will reject the null hypothesis when it is false. However, from our different categories of alternative hypothesis (A) - (E), this only makes real sense for (A) and (B), i.e. when we are comparing two simple hypotheses.

Example

Suppose $X_1, \ldots, X_4$ is a random sample from $X \sim N(\mu, 36)$, and we wish to test $H_0: \mu = 10$ against $H_1: \mu > 10$. Note that this is a one-tailed alternative. Now suppose we base our rejection region on the value of $\bar{X}$; specifically we construct it as $A_1 = \{\mathbf{X} : \bar{X} > 17\}$.

(a) Plot the power function for this test in the range $10 \le \mu \le 20$.
(b) What is the size of this test?
(c) What would be the power of the test if the alternative was, in fact, $H_1: \mu = 22$?
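For this example the power function has a closed form: $\bar{X} \sim N(\mu, 36/4)$, so $K(\mu) = \Pr(\bar{X} > 17) = 1 - \Phi\{(17 - \mu)/3\}$. A minimal sketch follows, with the function name and default arguments chosen for illustration; the list of points could be passed to any plotting tool for part (a).

```python
import math
from statistics import NormalDist

def power(mu, n=4, sigma=6.0, cutoff=17.0):
    """K(mu) = P(sample mean > cutoff) when the true mean is mu."""
    standard_error = sigma / math.sqrt(n)
    return 1 - NormalDist(mu, standard_error).cdf(cutoff)

# (a) Points on the power curve for 10 <= mu <= 20.
curve = [(mu, power(mu)) for mu in range(10, 21)]

# (b) Size: the null hypothesis is simple, so the size is K(10).
size = power(10.0)

# (c) Power against the simple alternative mu = 22.
power_at_22 = power(22.0)

print(f"size = {size:.4f}, power at mu = 22: {power_at_22:.4f}")
```

The size comes out at just under 1%, and the power function increases with $\mu$, passing through 0.5 at $\mu = 17$.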

Choice of rejection region

In the example above we found that our rejection region gave a test with desirable properties: a standard size of 1%, and a well-defined power function. So how can we design such a test ourselves?

Fortunately there is a very useful theorem which helps us to define an optimal rejection region.

The Neyman-Pearson Lemma

Suppose we have a random sample $x_1, x_2, \ldots, x_n$ from a random variable $X$ with density $f_X(x \mid \theta)$, and we wish to test $H_0: \theta = \theta_0$ against the simple alternative $H_1: \theta = \theta_1$. Consider the Likelihood Ratio defined as

$$\Lambda(\mathbf{x}) = \frac{L(\theta_0 \mid \mathbf{x})}{L(\theta_1 \mid \mathbf{x})}.$$

Suppose we define a test by rejecting $H_0$ in favour of $H_1$ if $\Lambda(\mathbf{x})$ is small enough. Specifically, suppose we choose a cut-off point $\eta$ such that

$$\Pr(\Lambda(\mathbf{x}) \le \eta \mid H_0) = \alpha.$$

Then the test based on the rejection region $A_1 = \{\mathbf{x} : \Lambda(\mathbf{x}) \le \eta\}$ is the most powerful test of size $\alpha$.

Now suppose we have a composite alternative hypothesis $H_1: \theta \in \Theta_1$. If the test is the most powerful for all $\theta_1 \in \Theta_1$, then it is said to be the uniformly most powerful (UMP) test for alternatives in the set $\Theta_1$.

Notes

1. Informally, the Neyman-Pearson Lemma says that if we base our test on the value of the likelihood ratio, then we get the best possible test (in the sense of being the most powerful).

2. Note that if we need to define a rejection region in terms of

$$\Lambda(\mathbf{x}) = \frac{L(\theta_0 \mid \mathbf{x})}{L(\theta_1 \mid \mathbf{x})},$$

it is often easier to work with

$$\log[\Lambda(\mathbf{x})] = \log[L(\theta_0 \mid \mathbf{x})] - \log[L(\theta_1 \mid \mathbf{x})].$$

Example

Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a $N(\mu, \sigma^2)$ distribution where $\mu$ is known, and we wish to test

$H_0: \sigma^2 = \sigma_0^2$ versus $H_1: \sigma^2 = \sigma_1^2$,

where $\sigma_0^2 < \sigma_1^2$. Find an appropriate test statistic on which to base a rejection region.
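One way the log likelihood ratio might be worked through for this Normal variance example is sketched below. The shorthand $T = \sum_i (x_i - \mu)^2$ and the constant $c$ are introduced here for convenience and do not appear in the notes.

```latex
% Likelihood for known \mu, with T = \sum_{i=1}^{n} (x_i - \mu)^2 :
L(\sigma^2 \mid \mathbf{x})
  = (2\pi\sigma^2)^{-n/2} \exp\!\left( -\frac{T}{2\sigma^2} \right)

% Log likelihood ratio of H_0 against H_1 :
\log \Lambda(\mathbf{x})
  = \log L(\sigma_0^2 \mid \mathbf{x}) - \log L(\sigma_1^2 \mid \mathbf{x})
  = \frac{n}{2} \log\frac{\sigma_1^2}{\sigma_0^2}
    - \frac{T}{2} \left( \frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2} \right)

% Since \sigma_0^2 < \sigma_1^2 , the bracket is positive, so \log\Lambda
% is a decreasing function of T . Hence
\Lambda(\mathbf{x}) \le \eta
  \iff
  T = \sum_{i=1}^{n} (x_i - \mu)^2 \ge c

% Under H_0 , T / \sigma_0^2 \sim \chi^2_n , which fixes c for a given \alpha .
```

On this reading, the rejection region is based on $T$, rejecting $H_0$ when $T$ is large; the cut-off follows from the upper tail of the $\chi^2_n$ distribution (Definition 4.6.1).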

4.9 Small sample methods

In this section we consider statistical inference (estimation and hypothesis testing) in situations where the sample size is small. The crucial change from large sample methods is that we can no longer rely on the asymptotic distribution of either the maximum likelihood estimator, or the test statistic, in a hypothesis test.

In fact the cases for one and two Normal means have already been dealt with, because the adjustments made to deal with unknown variance (t tests!) work for arbitrarily small samples. The cases which need special treatment are (a) inference on one Binomial proportion, and (b) the comparison of two Binomial proportions.

Inference for a single Binomial proportion ($r$ is small!)

Suppose we have a single observation $x$ from a Binomial random variable $X \sim \mathrm{Bin}(r, \theta)$, and we want to test hypotheses about $\theta$. This is the same kind of problem as we considered in Section 4.5, but this time we assume that the number of trials $r$ is small, i.e. $r \le 20$.

The crucial difference is that the Normal approximation is now too poor to use, and we should use the Binomial distribution directly (using Tables or R). The fact that we are now working with a genuinely discrete distribution leads to a complication: we cannot carry out a test precisely for any specified significance level; we have to use the nearest approximate significance level.

Example

A leading cat food manufacturer has a slogan which could be interpreted as follows: "80% of cats prefer our product". In an experiment to test this, 20 cats are each given the choice between the product in question, Brand W, and the leading market competitor, Brand X. Result: 12 cats go for Brand W, and 8 cats go for Brand X. Is this evidence against Brand W's claim?
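Using the Binomial distribution directly amounts to summing exact probabilities. The sketch below treats the cat food example as a test of $H_0: \theta = 0.8$ against the one-sided alternative $H_1: \theta < 0.8$; that choice of alternative is an assumption made here for illustration, as are the function names.

```python
from math import comb

def binomial_pmf(k, r, theta):
    """P(X = k) for X ~ Bin(r, theta)."""
    return comb(r, k) * theta**k * (1 - theta) ** (r - k)

def lower_tail(x, r, theta):
    """P(X <= x) for X ~ Bin(r, theta)."""
    return sum(binomial_pmf(k, r, theta) for k in range(x + 1))

# Exact one-sided p-value: probability of 12 or fewer successes in 20 trials
# when the claimed success probability 0.8 is true.
p_value = lower_tail(12, 20, 0.8)
print(f"P(X <= 12 | theta = 0.8) = {p_value:.4f}")

# Attainable significance levels are the tail probabilities themselves,
# which is why an exact 5% test is generally not available.
attainable_levels = [(x, lower_tail(x, 20, 0.8)) for x in range(10, 15)]
```

The exact p-value is about 0.032, so under this one-sided formulation the result is significant at the 5% level but not at the 1% level.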

Inference for two Binomial proportions (small samples!)

Here we consider the same kind of problem as in Section 4.6, i.e. two Binomial proportions, with the data arranged in a $2 \times 2$ contingency table. Here we consider the case when one or both samples are small.

Example

A small study into the dieting habits of teenagers is undertaken, to investigate whether or not the proportions of males and females who diet are equal. Suppose the population proportions of males and females who are dieting at any one time are denoted by $\theta_M$ and $\theta_F$ respectively. We wish to test

$H_0: \theta_M = \theta_F$ against $H_1: \theta_M \ne \theta_F$.

A random sample of 12 boys and 12 girls is selected, and we ascertain whether each individual is currently on a diet. Data:

Table 4.1
          dieting    not dieting    Total
boys      1          11             12
girls     9          3              12
Total     10         14             24

It certainly appears that in the population, girls are more likely to be dieting, since in our sample:

9 out of 12 girls are dieting;
1 out of 12 boys are dieting.

The question is: how significant are these results? In other words, how much evidence do we have against $H_0: \theta_M = \theta_F$?

The way we answer this is that we assume the row totals and the column totals are fixed at the observed values. We then assume that $H_0$ is true (as ever!) and we ask, how unlikely is the result we have observed? In other words: if we were to choose 10 of the teenagers at random, what is the probability that 9 of them would be among the 12 girls, and only 1 from among the 12 boys?

The p value for this test will be the probability of all outcomes which are as extreme as this one, or more so.

We introduce the notation:

[Notation table with rows boys, girls, Total and columns dieting, not dieting, Total: entries missing.]

Table 4.2

[Table with rows boys, girls, Total and columns dieting, not dieting, Total: entries missing.]
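With the margins held fixed, the number of boys among the 10 dieters follows a hypergeometric distribution, so the p value can be computed exactly. The sketch below uses the counts quoted in the text (1 of 12 boys and 9 of 12 girls dieting). It takes "as extreme or more so" to mean every table whose probability is no larger than that of the observed table; this is one common convention and is an assumption here, as are the function names.

```python
from math import comb

def hypergeom_pmf(k, boys, girls, dieters):
    """P(exactly k of the dieters are boys), with all margins fixed."""
    return comb(boys, k) * comb(girls, dieters - k) / comb(boys + girls, dieters)

def exact_two_sided_p(k_obs, boys, girls, dieters):
    """Sum the probabilities of all tables no more likely than the observed one."""
    k_min = max(0, dieters - girls)
    k_max = min(boys, dieters)
    p_obs = hypergeom_pmf(k_obs, boys, girls, dieters)
    probs = [hypergeom_pmf(k, boys, girls, dieters) for k in range(k_min, k_max + 1)]
    # Small relative tolerance so that ties with p_obs are counted.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))

p_observed = hypergeom_pmf(1, boys=12, girls=12, dieters=10)
p_value = exact_two_sided_p(1, boys=12, girls=12, dieters=10)
print(f"P(observed table) = {p_observed:.6f}, two-sided p = {p_value:.6f}")
```

The p value is roughly 0.003, which is strong evidence against $H_0: \theta_M = \theta_F$.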

Review of Section 4

In this section we have:

1. Introduced the principles of hypothesis testing.
2. Seen how to carry out hypothesis tests in some specific cases when the sample size $n$ is reasonably large:
   (a) the mean of a single Normal population (variance known and unknown);
   (b) the means of two Normal populations (variances assumed equal);
   (c) the variances of two Normal populations;
   (d) the success probability for a single Binomial proportion;
   (e) the success probabilities for two Binomial proportions.
3. Introduced three new probability distributions needed to carry out the tests: the Student t distribution, the F distribution and the chi-square distribution.
4. Learned how to use Statistical Tables to carry out the tests at specific significance levels.
5. Learned how to use R to do some of these tests, and to interpret the precise p value obtained.
6. Understood the relationship between hypothesis tests and confidence intervals.
7. Considered the properties of hypothesis tests, namely size and power.
8. Seen how we may construct the most powerful tests using the Neyman-Pearson Lemma.
9. Seen how to carry out hypothesis tests relating to the success probability of a single Binomial distribution when the number of trials is small ($\le 20$).
10. Seen how to carry out hypothesis tests to compare two Binomial proportions when the sample sizes in a $2 \times 2$ contingency table are small.

[Note that for the cases of one Normal mean and two Normal means, the methods we developed in Section 3 (z-tests and t-tests) already work for arbitrarily small samples.]
