Statistical Inference: Hypothesis Testing
Previously, we introduced point and interval estimation of unknown parameters, say µ and σ². However, in practice, the problem confronting the scientist or engineer may not be so much parameter estimation, but rather the formulation of a data-based decision procedure that can produce a conclusion about some scientific system.
For instance, an engineer may decide on the basis of sample data whether the true average lifetime of a certain kind of tire is at least 22,000 miles; an agronomist may want to decide on the basis of experiments whether one kind of fertilizer produces a higher yield of soybeans than another; a manufacturer of pharmaceutical products may decide on the basis of samples whether 90% of all patients given a new medication will recover from a certain disease.
In each of these cases, the scientist or engineer conjectures something about a system. In addition, each must involve the use of experimental data and decision making that is based on the data. Formally, in each case, the conjecture can be put in the form of a statistical test of hypothesis.
In the first case, we might say that the engineer has to test the hypothesis that θ = E(X) = 1/λ is at least 22,000, where λ is the parameter of an exponential population; in the second case, we might say that the agronomist has to decide whether µ1 > µ2, where µ1 and µ2 are the means of two normal populations; and in the last case, we might say that the manufacturer has to decide whether θ, the parameter of a binomial population, equals 0.90.
Statistical hypothesis
Let us define precisely what we mean by a statistical hypothesis. Definition: A statistical hypothesis is an assertion, conjecture, or statement about a population, usually formulated in terms of population parameters. The two complementary statements concerning the population are called the null hypothesis H0 and the alternative hypothesis H1. Remark that in hypothesis testing, what we are most interested in is testing the null hypothesis H0.
Example 1
H0: µ = µ0 (null) vs. H1: µ ≠ µ0 (alternative)
H0: µ = µ0 (null) vs. H1: µ > µ0 (alternative)
H0: σ = 1 (null) vs. H1: σ ≠ 1 (alternative)
A test of hypothesis
Definition: A test of a statistical hypothesis H0 is a procedure, based upon the observed values of the random sample obtained, that leads to the rejection or non-rejection of H0. For instance, for testing H0: µ = 1, the following procedure is a test: reject H0 if X̄ > 2 or X̄ < 0.
Caution
In statistical hypothesis testing, we test H0 against H1, i.e., we observe the data to see if there is enough evidence to reject H0. If we have enough evidence to reject H0, we can have great confidence that H0 is false and H1 is true. However, if we observe the data and find that H0 is not rejected, it does not mean that we have great confidence in the truth of H0; it only means that we do not have enough evidence to reject H0. So, in statistical hypothesis testing we should say "do not reject H0", instead of "accept H0".
In statistical hypothesis testing, when we draw the conclusion, we either
- Reject H0, i.e., we have enough evidence to reject H0, or
- Do not reject H0, i.e., we do not have enough evidence to reject H0.
And we never say "Accept H0", i.e., that we have enough evidence to accept H0.
Test errors
Each test procedure can lead to two kinds of errors.
Type I error: the error of rejecting H0 when it is in fact true.
Type II error: the error of not rejecting H0 when it is in fact false.

                  Do not reject H0    Reject H0
If H0 is true     No error            Type I error
If H0 is false    Type II error       No error
Error probability
Define
α = P(Type I error) = P(reject H0 | H0 is true)
β = P(Type II error) = P(not reject H0 | H0 is false)
Ideally, we want a test such that these two error probabilities are both minimized. However, normally, we cannot control them simultaneously.
More precisely, there is a trade-off between the two types of error: making α smaller will lead to a larger β, and vice versa (see a picture later). Therefore, in designing a test we can only control one of them: we guarantee α at a desired low level and then try to reduce β as much as we can (i.e., a Type I error is considered more serious than a Type II error).
Example 3
Suppose we knew that the light bulbs produced by a standard manufacturing process have lifetimes distributed as normal with standard deviation σ = 300 hours. However, we did not know the mean lifetime µ. For simplicity, assume that we were sure that the mean lifetime is either 1200 or 1240 hours. Then we may set up the following hypotheses:
H0: µ = 1200 (null)
H1: µ = 1240 (alternative)
Suppose that we draw a sample of 100 light bulbs and measure their lifetimes. The sample mean X̄ can be used to estimate the true population mean µ.
Example 3
Intuitively, a large value of the sample mean will lead to the rejection of the null hypothesis H0. So, if we construct a test as
Reject H0 if X̄ > 1249,
then
α = P(reject H0 | H0 is true) = P(X̄ > 1249 | µ = 1200),
β = P(not reject H0 | H0 is false) = P(X̄ ≤ 1249 | µ = 1240).
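As a quick numerical sketch of Example 3 (not part of the original slides): under either hypothesis, X̄ is normal with standard deviation σ/√n = 300/√100 = 30, so both error probabilities follow from the normal cdf. Python's standard-library `statistics.NormalDist` suffices:

```python
from statistics import NormalDist

# Example 3: n = 100 bulbs, sigma = 300, so the sample mean has sd 30.
n, sigma = 100, 300
sd_xbar = sigma / n ** 0.5      # 30.0
cutoff = 1249                   # rule: reject H0 if xbar > 1249

# alpha = P(Xbar > 1249 | mu = 1200)
alpha = 1 - NormalDist(mu=1200, sigma=sd_xbar).cdf(cutoff)
# beta = P(Xbar <= 1249 | mu = 1240)
beta = NormalDist(mu=1240, sigma=sd_xbar).cdf(cutoff)

print(round(alpha, 4))  # about 0.0512
print(round(beta, 4))   # about 0.6179
```

Note how α (computed under µ = 1200) and β (computed under µ = 1240) use different means for the same cutoff; moving the cutoff up shrinks α but inflates β.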
Why we cannot control the Type I and Type II errors simultaneously
Rejection/acceptance region
In Example 3, our test procedure is
Reject H0 if X̄ > 1249.
It means that if the observed value of the random sample, say {x1, …, xn}, is an element of the set
{(x1, …, xn) : x̄ > 1249},
then we reject H0; otherwise, if the observed value does not belong to this set, then we do not reject H0.
Rejection/acceptance region
It is easy to see that we partition the sample space of the random sample by X̄ in order to take the action of rejecting or not rejecting H0. More formally, the partitions of the sample space of the random sample are defined as follows:
Rejection/acceptance region
The rejection region (also called the critical region) of the null hypothesis H0, or of the test, denoted by C1, is the set of points in the sample space which leads to the rejection of H0; the set of points in the sample space which leads to the non-rejection of H0 is called the acceptance region, denoted by C0. The statistic used to define the rejection region is called a test statistic. Normally, we use the point estimator of the unknown parameter being tested as our test statistic. For instance, when we want to test a hypothesis about µ, we use the sample mean.
Rejection/acceptance region
In Example 3, our test procedure is
Reject H0 if X̄ > 1249.
Then the rejection region is C1 = {(x1, …, xn) : x̄ > 1249}.
Recall that when we design a test of hypothesis, we cannot control the two error probabilities at the same time; what we can do is control the Type I error probability α at a desired low level (often chosen as 0.01, 0.05 or 0.1), and then reduce the Type II error probability β as much as we can. How do we design a test procedure under this restriction?
Again, in this example, a large value of the sample mean will lead to the rejection of the null hypothesis H0. So, we consider a test of the form
Reject H0 if X̄ > c,
where the constant c is called the critical value.
How to determine the critical value
We choose the critical value c so that the Type I error probability equals the desired level, say α = 0.05:
α = P(X̄ > c | µ = 1200) = 0.05.
Under H0, X̄ is normal with mean 1200 and standard deviation σ/√n = 300/√100 = 30, so
c = 1200 + z0.05 × 30 = 1200 + 1.645 × 30 ≈ 1249.35.
Thus, the rejection region is C1 = {(x1, …, xn) : x̄ > 1249.35}, and we would say that we reject H0 at level α = 0.05 if the observed sample mean exceeds 1249.35.
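The critical-value computation for Example 3 can be sketched numerically (this snippet is illustrative, not from the original slides): invert the standard normal cdf at 1 − α to get the upper 5% point, then rescale by the standard deviation of X̄.

```python
from statistics import NormalDist

# Example 3: choose c so that P(Xbar > c | mu = 1200) = 0.05.
alpha = 0.05
sd_xbar = 300 / 100 ** 0.5            # sd of the sample mean: 30.0
z = NormalDist().inv_cdf(1 - alpha)   # upper 5% point, about 1.645
c = 1200 + z * sd_xbar
print(round(c, 2))  # about 1249.35
```

This matches the hand computation 1200 + 1.645 × 30 ≈ 1249.35 (the slide's earlier cutoff of 1249 was a rounded version of this value).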
Suppose that the observed value of the sample mean does not exceed the critical value; then we could conclude that we DO NOT HAVE ENOUGH EVIDENCE TO REJECT H0 at level α = 0.05.
Power of a test
The power of a test is defined as 1 − β = P(reject H0 | H0 is false). In Example 3, β ≈ 0.6225, so 1 − β ≈ 0.3775, i.e., P(reject H0 | H0 is false) ≈ 0.3775. The power of a test is a quantity used to evaluate the goodness of a test. To compare two tests, first we need both tests to have a common α; then the test with the higher power is the better one.
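A short check of the power figure quoted above (a sketch, not from the original slides): with the critical value c ≈ 1249.35 and the alternative mean µ = 1240, β is just the normal cdf at c.

```python
from statistics import NormalDist

# Example 3: power of "reject H0 if Xbar > 1249.35" when H1 (mu = 1240) is true.
sd_xbar = 300 / 100 ** 0.5                                # 30.0
beta = NormalDist(mu=1240, sigma=sd_xbar).cdf(1249.35)    # P(not reject | H0 false)
power = 1 - beta
# beta is about 0.622 and power about 0.378; the slides round these
# to 0.6225 and 0.3775.
print(round(beta, 3), round(power, 3))
```

A power below 0.4 means this test misses the shift to µ = 1240 more often than it detects it, which motivates the question of how to raise the power without touching α.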
Power of a test
Increasing the power of a test, 1 − β, means decreasing the Type II error probability β, which in turn (for a fixed sample size) increases the Type I error probability α. So, how can we increase the power of a test while the value of α remains unchanged?
The answer is to increase the sample size, say from n = 100 to n = 400. With n = 400, the standard deviation of X̄ becomes 300/√400 = 15, so the critical value is c = 1200 + 1.645 × 15 ≈ 1224.67 and the rejection region becomes C1 = {(x1, …, xn) : x̄ > 1224.67}. Now, let's calculate the power of this test with n = 400 and α = 0.05:
Power = 0.3775 (n = 100, α = 0.05)
Power = 0.8466 (n = 400, α = 0.05)
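The n = 400 power figure can be reproduced in a few lines (an illustrative sketch, not from the original slides). With a larger sample, the sd of X̄ drops from 30 to 15, so the critical value moves toward 1200 while α stays at 0.05, and the power against µ = 1240 rises:

```python
from statistics import NormalDist

# Example 3 with n = 400: recompute the critical value at alpha = 0.05,
# then evaluate the power against the alternative mu = 1240.
n, sigma, alpha = 400, 300, 0.05
sd_xbar = sigma / n ** 0.5                               # 15.0
c = 1200 + NormalDist().inv_cdf(1 - alpha) * sd_xbar     # about 1224.67
power = 1 - NormalDist(mu=1240, sigma=sd_xbar).cdf(c)
print(round(c, 2), round(power, 4))  # about 1224.67 and 0.8466
```

Quadrupling n roughly doubles the power here (0.3775 → 0.8466) while α is held fixed, which is exactly the trade the slides describe.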
Similarly, we will also consider these tests for σ².