1 Statistical inference for a population mean

Cecil Flynn

1. Inference for a large sample, known variance

Suppose $X_1, \ldots, X_n$ represents a large random sample of data from a population with unknown mean $\mu$ and known variance $\sigma^2$. Consider testing the hypothesis $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$. By the CLT,

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \overset{\text{approx}}{\sim} N(0, 1).$$

Then, an approximate test at Type 1 error probability $0 < \alpha < 1$ rejects $H_0$ if $Z > z_{1-\alpha/2}$ or $Z < z_{\alpha/2}$. This test is equivalent to the large-sample confidence interval for $\mu$ with known variance, so it corresponds to rejecting $H_0$ if $\mu_0$ is not in the confidence interval that has coverage approximately $1 - \alpha$.

2. Inference for a normal population with known variance $\sigma^2$

Suppose $X_1, \ldots, X_n$ represents a random sample of data from a normal population with unknown mean $\mu$ and known variance $\sigma^2$. Consider testing the hypothesis $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$. The test statistic

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1),$$

exactly. So, the test at Type 1 error probability $0 < \alpha < 1$ that rejects $H_0$ if $Z > z_{1-\alpha/2}$ or $Z < z_{\alpha/2}$ is an exact test. This test is equivalent to the exact confidence interval for $\mu$, so it corresponds to rejecting $H_0$ if $\mu_0$ is not in the confidence interval that has coverage exactly $1 - \alpha$.

3. Inference for a large sample, unknown variance

Suppose $X_1, \ldots, X_n$ represents a large random sample of data from a population with unknown mean $\mu$ and unknown variance $\sigma^2$. Consider testing the hypothesis $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$. For a large sample, $n \geq 30$ or so,

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \overset{\text{approx}}{\sim} t(n-1),$$

where $s$ is the sample standard deviation. Then, an approximate test at Type 1 error probability $0 < \alpha < 1$ rejects $H_0$ if $t > t_{1-\alpha/2}(n-1)$ or $t < t_{\alpha/2}(n-1)$. This test is equivalent to the large-sample confidence interval for $\mu$ with unknown variance, so it corresponds to rejecting $H_0$ if $\mu_0$ is not in the $t$ confidence interval that has coverage approximately $1 - \alpha$.

4. Inference for a normal population with unknown variance $\sigma^2$

Suppose $X_1, \ldots, X_n$ represents a random sample of data from a normal population with unknown mean $\mu$ and unknown variance $\sigma^2$. Consider testing the hypothesis $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$. Then,

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t(n-1)$$

exactly, where $s$ is the sample standard deviation. Then, an exact test at Type 1 error probability $0 < \alpha < 1$ rejects $H_0$ if $t > t_{1-\alpha/2}(n-1)$ or $t < t_{\alpha/2}(n-1)$. This test is equivalent to the $t$ confidence interval for $\mu$ with unknown variance, so it corresponds to rejecting $H_0$ if $\mu_0$ is not in the $t$ confidence interval that has coverage exactly $1 - \alpha$.

5. Paired t-test: inference on mean difference for paired data

Suppose $(X_1, Y_1), \ldots, (X_n, Y_n)$ are pairs of samples. Suppose $D_i = Y_i - X_i$ has a normal distribution with mean $\mu_D$ and variance $\sigma_D^2$. Then, to test $H_0: \mu_D = \mu_0$ versus $H_a: \mu_D \neq \mu_0$ when $\sigma_D^2$ is unknown, use the paired t-test, i.e., reject $H_0$ if $t > t_{1-\alpha/2}(n-1)$ or $t < t_{\alpha/2}(n-1)$, where

$$t = \frac{\bar{D} - \mu_0}{s_D/\sqrt{n}}.$$

2 Power and sample size for tests for one population mean

Consider the case with known variance, either large sample or for a normal population. Suppose the test is one-sided, e.g. $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$.

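Then, we reject $H_0$ if

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > z_{1-\alpha}.$$

Suppose $H_a$ is true, and $\mu = \mu^* > \mu_0$. Then, the power of the test is

$$\text{Power} = P(\text{reject } H_0 \text{ given } \mu = \mu^*) = P\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > z_{1-\alpha}\right).$$

Recentering,

$$\text{Power} = P\left(\frac{\bar{X} - \mu^*}{\sigma/\sqrt{n}} > z_{1-\alpha} + \frac{\mu_0 - \mu^*}{\sigma/\sqrt{n}}\right).$$

Now the statistic $\frac{\bar{X} - \mu^*}{\sigma/\sqrt{n}}$ is standard normal (exactly for a normal population, approximately for a large-sample problem), so

$$\text{Power} = P\left(Z > z_{1-\alpha} + \frac{\mu_0 - \mu^*}{\sigma/\sqrt{n}}\right),$$

and $z_{1-\alpha} + \frac{\mu_0 - \mu^*}{\sigma/\sqrt{n}}$ is just a number that we can look up in the normal table to evaluate the probability.

Instead, suppose the power is fixed at $1 - \beta$ for some $0 < \beta < 1$ (then $\beta$ is the Type 2 error probability). We can solve for the sample size:

$$\text{Power} = P(\text{reject } H_0 \text{ given } \mu = \mu^*)$$
$$1 - \beta = P\left(Z > z_{1-\alpha} + \frac{\mu_0 - \mu^*}{\sigma/\sqrt{n}}\right)$$
$$\beta = P\left(Z < z_{1-\alpha} + \frac{\mu_0 - \mu^*}{\sigma/\sqrt{n}}\right)$$
$$z_\beta = z_{1-\alpha} + \frac{\mu_0 - \mu^*}{\sigma/\sqrt{n}}$$
$$\left[\frac{\sigma(z_\beta - z_{1-\alpha})}{\mu_0 - \mu^*}\right]^2 = n.$$

In practice, $n$ is rounded up to the next integer so that the power is at least $1 - \beta$.

Remark: The power calculation can be done for one-sample t-tests in the same manner by replacing normal quantiles with quantiles of $t(n-1)$. The sample size calculations are a bit more involved, so are omitted from the final exam.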
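The power and sample size formulas above can be checked numerically. The following is a minimal sketch using only Python's standard library; the function names `z_test_power` and `z_test_sample_size` and the argument name `mu_alt` (standing for $\mu^*$) are illustrative choices, not part of the notes.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def z_test_power(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power of the one-sided z-test of H0: mu = mu0 vs Ha: mu > mu0
    when the true mean is mu_alt (known sigma)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)       # z_{1-alpha}
    shift = (mu0 - mu_alt) / (sigma / sqrt(n))   # recentering term
    return 1 - STD_NORMAL.cdf(z_crit + shift)    # P(Z > z_{1-alpha} + shift)


def z_test_sample_size(mu0, mu_alt, sigma, alpha=0.05, beta=0.20):
    """Smallest integer n giving power at least 1 - beta at mu_alt."""
    z_beta = STD_NORMAL.inv_cdf(beta)            # z_beta
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)       # z_{1-alpha}
    n_exact = (sigma * (z_beta - z_crit) / (mu0 - mu_alt)) ** 2
    return ceil(n_exact)                         # round up to an integer


if __name__ == "__main__":
    n = z_test_sample_size(mu0=0.0, mu_alt=0.5, sigma=1.0)
    print(n, z_test_power(0.0, 0.5, 1.0, n))
```

With $\mu_0 = 0$, $\mu^* = 0.5$, $\sigma = 1$, $\alpha = 0.05$ and $\beta = 0.2$, the formula gives $n \approx 24.7$, so the sketch returns 25. Setting `mu_alt` equal to `mu0` returns $\alpha$, as expected for a test of size $\alpha$.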

3 Statistical inference for a population proportion

1. Inference for one Bernoulli population proportion

Suppose $X_1, \ldots, X_n$ are a random sample of Bernoulli trials, i.e. $X_i \in \{0, 1\}$, $P(X_i = 1) = p$. Consider testing the hypothesis $H_0: p = p_0$ versus $H_a: p \neq p_0$. There is no exact test of $H_0$ for arbitrary Type 1 error $\alpha$ (recall the Binomial distribution will admit an exact test only for some finite set of $\alpha$ values). Instead, consider the DeMoivre-Laplace CLT, which says

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \overset{\text{approx}}{\sim} N(0, 1)$$

for large $n$. Then, an approximate test for Type 1 error $\alpha$ rejects $H_0$ when $Z > z_{1-\alpha/2}$ or $Z < z_{\alpha/2}$. Note, this is NOT equivalent to the CI for $p$, which is

$$\left(\hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n},\ \hat{p} + z_{1-\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}\right);$$

the standard errors used in the CI and the test statistic are different.

4 Statistical inference for the difference of population means

Suppose $X_1, \ldots, X_n$ is a random sample from a normal population with mean $\mu_X$ and unknown variance $\sigma_X^2$. Suppose $Y_1, \ldots, Y_m$ is a random sample from a normal population with mean $\mu_Y$ and unknown variance $\sigma_Y^2$. Consider testing the hypothesis $H_0: \mu_X - \mu_Y = \delta$ versus $H_a: \mu_X - \mu_Y \neq \delta$.

1. Suppose we assume $\sigma_X^2 = \sigma_Y^2$, i.e. the population variances are equal. Then, under $H_0$,

$$t = \frac{\bar{X} - \bar{Y} - \delta}{\sqrt{S_p^2(1/n + 1/m)}} \sim t(n + m - 2), \quad \text{where } S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n + m - 2}.$$

The test that rejects $H_0$ when $t > t_{1-\alpha/2}(n + m - 2)$ or $t < t_{\alpha/2}(n + m - 2)$ has Type 1 error probability exactly $\alpha$.

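2. If we do not assume $\sigma_X^2 = \sigma_Y^2$, then let

$$t = \frac{\bar{X} - \bar{Y} - \delta}{\sqrt{S_X^2/n + S_Y^2/m}} \overset{\text{approx}}{\sim} t(\lfloor v \rfloor), \quad \text{where } v = \frac{\left(\frac{S_X^2}{n} + \frac{S_Y^2}{m}\right)^2}{\frac{(S_X^2/n)^2}{n-1} + \frac{(S_Y^2/m)^2}{m-1}}.$$

The test that rejects $H_0$ when $t > t_{1-\alpha/2}(\lfloor v \rfloor)$ or $t < t_{\alpha/2}(\lfloor v \rfloor)$ has Type 1 error probability approximately $\alpha$.

5 Power for testing the difference of population means

Consider testing $H_0: \mu_X - \mu_Y = \delta$ versus $H_a: \mu_X - \mu_Y > \delta$ when the population variances are assumed equal, $\sigma_X^2 = \sigma_Y^2$. Consider the power of this test when $\mu_X - \mu_Y = \delta^* > \delta$. The power of the test is

$$\text{Power} = P(\text{reject } H_0 \text{ given } \mu_X - \mu_Y = \delta^*) = P\left(\frac{\bar{X} - \bar{Y} - \delta}{\sqrt{S_p^2(1/n + 1/m)}} > t_{1-\alpha}(n + m - 2)\right).$$

Recentering,

$$\text{Power} = P\left(\frac{\bar{X} - \bar{Y} - \delta^*}{\sqrt{S_p^2(1/n + 1/m)}} > t_{1-\alpha}(n + m - 2) + \frac{\delta - \delta^*}{\sqrt{S_p^2(1/n + 1/m)}}\right)$$
$$= 1 - P\left(t(n + m - 2) < t_{1-\alpha}(n + m - 2) + \frac{\delta - \delta^*}{\sqrt{S_p^2(1/n + 1/m)}}\right),$$

and $t_{1-\alpha}(n + m - 2) + \frac{\delta - \delta^*}{\sqrt{S_p^2(1/n + 1/m)}}$ is just a number, so this probability can be evaluated using the t-table (values for $\alpha$ and the pooled sample variance must be specified).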
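The two-sample statistics of Section 4 are easy to get wrong by hand, so a small numerical sketch may help. It computes the pooled t statistic and the unequal-variance statistic with its approximate degrees of freedom, using only Python's standard library. The function names `pooled_t` and `welch_t` are illustrative, and rounding $v$ down to an integer is an assumption of this sketch (the conservative convention). Critical values still have to be read from a t-table, since the standard library has no t quantile function.

```python
from math import floor, sqrt
from statistics import mean, variance  # variance() is the sample variance S^2


def pooled_t(x, y, delta=0.0):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n, m = len(x), len(y)
    # Pooled variance S_p^2 = ((n-1) S_X^2 + (m-1) S_Y^2) / (n + m - 2)
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    t = (mean(x) - mean(y) - delta) / sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2


def welch_t(x, y, delta=0.0):
    """Unequal-variance t statistic and approximate degrees of freedom."""
    n, m = len(x), len(y)
    vx, vy = variance(x) / n, variance(y) / m    # S_X^2/n and S_Y^2/m
    t = (mean(x) - mean(y) - delta) / sqrt(vx + vy)
    v = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return t, floor(v)                           # assumption: round v down


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]
    print(pooled_t(x, y))
    print(welch_t(x, y))
```

When $n = m$ the two statistics coincide and only the degrees of freedom differ, which the example data above illustrates.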

6 Statistical inference for the difference of Bernoulli proportions

Suppose $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ are independent samples, each a random sample from a Bernoulli population, with population proportions $p_X$ and $p_Y$. To test $H_0: p_X - p_Y = \delta$ versus $H_a: p_X - p_Y \neq \delta$, consider the statistics

$$Z = \frac{\hat{p}_X - \hat{p}_Y - \delta}{\sqrt{\hat{p}_X(1 - \hat{p}_X)/n + \hat{p}_Y(1 - \hat{p}_Y)/m}} \quad \text{for } \delta \neq 0$$

and

$$Z = \frac{\hat{p}_X - \hat{p}_Y}{\sqrt{\bar{p}(1 - \bar{p})(1/n + 1/m)}} \quad \text{for } \delta = 0, \quad \text{where } \bar{p} = \frac{n\hat{p}_X + m\hat{p}_Y}{n + m}.$$

In either case, the test that rejects $H_0$ if $Z > z_{1-\alpha/2}$ or $Z < z_{\alpha/2}$ has Type 1 error probability approximately $\alpha$. However, only the first test, where $\delta \neq 0$, corresponds to the CI for $p_X - p_Y$, which is

$$\left(\hat{p}_X - \hat{p}_Y + z_{\alpha/2}\sqrt{\hat{p}_X(1 - \hat{p}_X)/n + \hat{p}_Y(1 - \hat{p}_Y)/m},\ \hat{p}_X - \hat{p}_Y + z_{1-\alpha/2}\sqrt{\hat{p}_X(1 - \hat{p}_X)/n + \hat{p}_Y(1 - \hat{p}_Y)/m}\right).$$

7 Testing the equality of independent normal population variances

Suppose $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ are independent samples, each a random sample from a normal population, with unknown variances $\sigma_X^2$ and $\sigma_Y^2$. Consider testing $H_0: \sigma_X^2/\sigma_Y^2 = 1$ versus $H_a: \sigma_X^2/\sigma_Y^2 \neq 1$. Under $H_0$,

$$F = \frac{S_X^2}{S_Y^2} \sim F(n - 1, m - 1),$$

i.e., the ratio of sample variances has an F distribution with degrees of freedom $n - 1$ and $m - 1$. Without loss of generality, suppose $S_X^2 > S_Y^2$; then the test that rejects $H_0$ when $F > F_{1-\alpha/2}(n - 1, m - 1)$ has Type 1 error probability exactly $\alpha$.
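Returning to the proportion tests of Sections 3 and 6, the difference between the standard error used in the test and the one used in the CI can be seen numerically. The sketch below uses only Python's standard library; the function names and the counts in the example are hypothetical and chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z(successes, n, p0):
    """Test statistic for H0: p = p0, with the standard error computed at p0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def one_prop_ci(successes, n, alpha=0.05):
    """CI for p, with the standard error computed at p_hat."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{1-alpha/2}
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def two_prop_z(sx, n, sy, m, delta=0.0):
    """Test statistic for H0: pX - pY = delta (pooled SE only when delta = 0)."""
    px, py = sx / n, sy / m
    if delta == 0:
        p_bar = (n * px + m * py) / (n + m)      # pooled proportion
        se = sqrt(p_bar * (1 - p_bar) * (1 / n + 1 / m))
    else:
        se = sqrt(px * (1 - px) / n + py * (1 - py) / m)
    return (px - py - delta) / se


if __name__ == "__main__":
    # Hypothetical data: 1 success in 20 trials, testing p0 = 0.2.
    print(one_prop_z(1, 20, 0.2))    # |Z| < 1.96, so H0 is not rejected
    print(one_prop_ci(1, 20))        # yet the 95% CI does not contain 0.2
    print(two_prop_z(30, 100, 20, 100))
```

In the first example the test does not reject $H_0: p = 0.2$ at $\alpha = 0.05$ even though the 95% CI excludes 0.2, which is exactly the non-equivalence noted in Section 3.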

8 Bonferroni multiple testing procedure

Suppose we are testing independent hypotheses: $H_0^1, \ldots, H_0^m$. In order to bound the Type 1 error rate of all the $m$ tests at no greater than $\alpha$, carry out each test at $\alpha' = \alpha/m$.

9 P-values

Suppose I have a test statistic $Z$ and my test of some null hypothesis $H_0$ rejects if $Z > Z_{1-\alpha/2}$ for the $1 - \alpha/2$ quantile of the distribution of $Z$. (Here $Z$ does not necessarily denote a normal random variable and $Z_q$ does not necessarily denote a normal quantile. All of our tests are set up in this manner.) Then, suppose data is collected and the test statistic $Z$ has value $z$. The p-value for this test statistic is given by $p$, where $p$ solves $z = Z_{1-p/2}$. Then, $p$ can be seen to be the smallest Type 1 error probability for which the test rejects $H_0$ when the test statistic takes value $Z = z$.
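As a small illustration of Sections 8 and 9, the sketch below solves $|z| = Z_{1-p/2}$ for $p$ in the special case where the test statistic is standard normal under $H_0$ (an assumption of this example; for a t statistic the t distribution would be used instead), and then applies the Bonferroni level $\alpha/m$ to a list of p-values. The function names and the p-values in the example are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Solve |z| = z_{1 - p/2} for p, assuming Z ~ N(0, 1) under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bonferroni_reject(p_values, alpha=0.05):
    """Carry out each of the m tests at level alpha / m."""
    level = alpha / len(p_values)
    return [p <= level for p in p_values]


if __name__ == "__main__":
    print(two_sided_p_value(1.96))                   # close to 0.05
    print(bonferroni_reject([0.001, 0.02, 0.04]))    # each test at 0.05 / 3
```

With three tests at overall $\alpha = 0.05$, each p-value is compared with $0.05/3 \approx 0.0167$, so only the first hypothesis in the example is rejected.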
