F79SM STATISTICAL METHODS

SUMMARY NOTES 9: Hypothesis testing

9.1 Introduction

As before, we have a random sample x of size n from a population r.v. X with pdf/pf f(x; θ). The distribution we assign to X is our model for the process which has generated our data, e.g. X ~ N(µ, 1), X ~ Poisson(λ).

A hypothesis H is a statement about the distribution of X; in particular, in this chapter, it is a statement about the unknown value of a parameter θ. A simple hypothesis is a statement which completely specifies the distribution: e.g. if X ~ N(µ, 1) then H: µ = 5 is a simple hypothesis. If H is not simple, it is composite, e.g. H: µ > 5.

A test of H is a rule which partitions the sample space into two subsets:
critical region: data in this subset are not consistent with H, and we reject H
acceptance region: data in this subset are consistent with H, and we accept H.

The null hypothesis H₀ represents the current theory (the "status quo"), e.g.
H₀: θ = 0, H₀: θ = θ₀, H₀: P < 0.4, H₀: P > P₀, H₀: µ = 5, H₀: µ > 5, H₀: µ < µ₀
H₀: µ₁ − µ₂ = 0 (this is the "no difference" or "no treatment effect" hypothesis)
H₀: σ₁² = σ₂² (this is the "equal variances" or "homoscedasticity" hypothesis)

The null hypothesis H₀ is contrasted with an alternative hypothesis H₁, and our test is written, for example, as follows:
H₀: θ = θ₀ v H₁: θ = θ₁   a test with simple null and alternative hypotheses
H₀: θ = θ₀ v H₁: θ > θ₀   a one-sided test with simple null and composite alternative hypotheses
H₀: θ ≤ θ₀ v H₁: θ > θ₀   a one-sided test with composite null and alternative hypotheses
H₀: θ = θ₀ v H₁: θ ≠ θ₀   a two-sided test with simple null and composite alternative hypotheses

The fundamental questions we are asking are:
do our data provide strong enough evidence to justify our rejecting the null hypothesis?
how strong is our evidence against the null hypothesis?
and, on a more general plane of enquiry,
how good is our procedure: given the data we have, are we using the best available test?
is there perhaps a better experimental procedure we could have used to make a better testing approach available to us?

The decision is based on the value of an appropriate function of the data called the test statistic (e.g. the sample mean x̄, sample proportion P, sample variance s², maximum value in the sample), whose distribution is completely known under H₀, that is, when H₀ is true.

9.2 Classical (Neyman-Pearson) methodology

(a) Simple H₀ v simple H₁

There are two types of testing errors we are exposed to when making our decision:
type I error: reject H₀ when it is true
type II error: accept H₀ when it is false

The probabilities of making these errors are conventionally denoted α and β:
α = P(commit a type I error) = P(reject H₀ | H₀ true)
β = P(commit a type II error) = P(accept H₀ | H₀ false)
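To make these definitions concrete, here is a small R illustration (R as used in the Appendix; the setting, a N(µ, 1) sample of size 25 testing H₀: µ = 10 v H₁: µ = 10.5 with critical region x̄ > 10.329, anticipates Ex9.1 below, and the variable names are mine):

alpha = 1 - pnorm((10.329 - 10) * sqrt(25))   # P(reject H0 | H0 true) = 0.05
beta = pnorm((10.329 - 10.5) * sqrt(25))      # P(accept H0 | H1 true) = 0.196
c(alpha = alpha, beta = beta)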

1 − β = P(reject H₀ | H₀ false) is called the power of the test: it is the probability of making a correct decision to reject the null hypothesis, and it measures the effectiveness of the test at detecting departures from the null hypothesis.

We want both α and β to be small but, for a fixed sample size, it is not possible to lower both probabilities of error simultaneously; we can, of course, lower both by increasing the sample size.

The classical, Neyman-Pearson, approach to testing H₀: θ = θ₀ v H₁: θ = θ₁ is as follows:
(i) fix/choose the value of α; once chosen, α is called the level of significance of the test (popular choices are α = 0.05, giving a "5% test", and α = 0.01, giving a "1% test"), and then
(ii) use the test for which β is smallest, that is to say the test with the highest power, i.e. choose the most powerful available test of level α.

The method for finding this best test is based on the likelihood function; the result is the Neyman-Pearson Lemma, which is expressed in terms of the likelihood ratio L(θ₀)/L(θ₁), or L₀/L₁ for short. The lemma states that the form of the best test is given by finding the form of the critical region C such that C = {x: L₀ ≤ kL₁} for some constant k. The exact specification of C comes from the chosen level of the test α and depends on θ₀; the power of the resulting test depends on θ₁. The criterion comes down in practice to defining C in terms of a range of values of the test statistic. For a formal statement and proof of the Lemma see Miller & Miller §12.4.

A test with pre-assigned level is often called a significance test. If the level is α and our decision is "reject H₀", we say that our result is statistically significant at 100α%.

(b) Composite hypotheses

In some cases we can use the N-P Lemma when we have a composite alternative hypothesis: we may be able to find a test which is best for every value of the parameter specified by H₁. Such a test, if it exists, is said to be uniformly most powerful (UMP).

Ex9.1 Random sample, size n, of X ~ N(µ, 1). Test of H₀: µ = µ₀ v H₁: µ > µ₀.

Consider first a test of H₀: µ = µ₀ v H₁: µ = µ₁ (where µ₁ > µ₀). Here

f(x; µ) = (2π)^(−1/2) exp{−(x − µ)²/2}

so

L₀/L₁ = exp{−½ Σ(xᵢ − µ₀)²} / exp{−½ Σ(xᵢ − µ₁)²} = exp{−n x̄(µ₁ − µ₀) + n(µ₁² − µ₀²)/2}

The best test has critical region defined by those data values x such that L₀/L₁ ≤ k, which holds for −n x̄(µ₁ − µ₀) + n(µ₁² − µ₀²)/2 ≤ ln k, that is for x̄(µ₁ − µ₀) ≥ (µ₁² − µ₀²)/2 − (ln k)/n, that is, since µ₁ − µ₀ > 0, for x̄ ≥ K.

So the best test is such that we reject H₀ if x̄ exceeds some value, i.e. we reject H₀ for large values of X̄.
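As an aside, the monotonicity driving this argument is easy to check numerically. A small R sketch (the values µ₀ = 10, µ₁ = 10.5, n = 25 anticipate the numerical case below; variable names are illustrative only):

mu0 = 10; mu1 = 10.5; n = 25
xbar = seq(9.8, 10.8, by = 0.2)
# L0/L1 as a function of the sample mean: decreasing in xbar,
# so {L0/L1 <= k} is a region of the form {xbar >= K}
ratio = exp(-n * xbar * (mu1 - mu0) + n * (mu1^2 - mu0^2) / 2)
cbind(xbar, ratio)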

Suppose we want to perform a test at the 100α% level. Under H₀, X̄ ~ N(µ₀, 1/n), so

α = P(type I error) = P(X̄ ≥ K | µ = µ₀) = P(Z ≥ √n(K − µ₀)) = 1 − Φ(√n(K − µ₀)), giving √n(K − µ₀) = z_α.

[For a 5% test, z_0.05 = 1.645, so we reject H₀ for X̄ > µ₀ + 1.645/√n. For the case µ₀ = 10, µ₁ = 10.5 and n = 25, we reject H₀ for X̄ > 10.329. The power of the test is then given by P(X̄ > 10.329 | µ = µ₁), which is P(Z > (10.329 − 10.5)/(1/√25)) = P(Z > −0.855) = 0.804.]

The test is best whatever the particular value of µ specified in H₁ and so is UMP for testing H₀: µ = µ₀ v H₁: µ > µ₀. No UMP test exists for testing H₀: µ = µ₀ v H₁: µ ≠ µ₀.

The power function of a test (which generalises the concept of power we met earlier) is a function of the parameter, given by

π(θ) = P(reject H₀ | θ)

Ex9.1 continued Consider testing H₀: µ ≤ µ₀ v H₁: µ > µ₀ at the 5% level of significance. The best 5% test is as above and is UMP: it rejects H₀ for X̄ > µ₀ + 1.645/√n. For general µ, X̄ ~ N(µ, 1/n), so

π(µ) = P(reject H₀ | µ) = P(X̄ > µ₀ + 1.645/√n | µ) = P(Z > 1.645 − √n(µ − µ₀)) = 1 − Φ(1.645 − √n(µ − µ₀))

This function is graphed below for the case n = 9, µ₀ = 1: at µ = µ₀ = 1, power = 0.05 = level of the test of H₀: µ = 1 v H₁: µ = µ₁ (> 1).

[Figure: power function π(µ) plotted against µ for n = 9, µ₀ = 1. The curve rises from near 0, passes through 0.05 at µ = µ₀ = 1, and approaches 1; dashed lines mark the level 0.05 at µ = 1. R code to produce the display is given in the Appendix.]

When working with composite hypotheses, the largest value of the power function π(θ) under H₀ is called the size of the test (this generalises the concept of the level of the test).
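The numbers quoted for Ex9.1 can be reproduced in R (a sketch; the helper name pi.mu is mine, not part of the notes):

n = 25; mu0 = 10
K = mu0 + qnorm(0.95) / sqrt(n)   # critical value: 10.329
# power function of the UMP 5% test, as a function of mu
pi.mu = function(mu, mu0, n) 1 - pnorm(1.645 - sqrt(n) * (mu - mu0))
pi.mu(10.5, mu0 = 10, n = 25)     # 0.804: the power found in Ex9.1
pi.mu(1, mu0 = 1, n = 9)          # 0.05: power = level at mu = mu0 (the case plotted)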

9.3 Some standard cases

9.3.1 Testing a population mean
Suppose X ~ N(µ, σ²), random sample, size n, testing H₀: µ = µ₀.
(a) σ² known
Test statistic: (X̄ − µ₀)/(σ/√n), which is ~ N(0, 1) under H₀
(b) σ² unknown
Test statistic: (X̄ − µ₀)/(S/√n), which is ~ t with n − 1 df under H₀; this gives the famous "t test"
Large samples from any distribution:
Test statistic: (X̄ − µ₀)/(S/√n), which is ~ N(0, 1) (approximately) under H₀

9.3.2 Testing a population variance
Suppose X ~ N(µ, σ²), random sample, size n, testing H₀: σ² = σ₀².
Test statistic: (n − 1)S²/σ₀², which is ~ χ² with n − 1 df under H₀

9.3.3 Testing a population proportion
Let X be the number of successes in n Bernoulli trials with P(success) = θ, testing H₀: θ = θ₀.
Test statistic: X, which is ~ b(n, θ₀) under H₀; for large n, (X − nθ₀)/√(nθ₀(1 − θ₀)) is ~ N(0, 1) (approximately) under H₀

9.3.4 Testing a Poisson mean
Suppose X ~ Poisson(λ), random sample, size n, testing H₀: λ = λ₀.
Test statistic: ΣXᵢ, which is ~ Poisson(nλ₀) under H₀; for large n, ΣXᵢ ~ N(nλ₀, nλ₀), or equivalently X̄ ~ N(λ₀, λ₀/n), (approximately) under H₀

Ex9.2 Random sample of X ~ N(µ, σ²). We want to test H₀: µ = 10.5 v H₁: µ < 10.5 at the 5% level. We have data from a random sample of size 10: Σxᵢ = 92.1, Σxᵢ² = 877.47.

We reject H₀ for small values of X̄. The test statistic is (X̄ − 10.5)/(S/√10), which is ~ t₉ under H₀.

The data give x̄ = 9.21 and s² = (1/9)(877.47 − 92.1²/10) = 3.2477.

The lower 5% point of t₉ is −1.833, so we reject H₀ for (X̄ − 10.5)/(S/√10) < −1.833. This defines the critical region.

For our sample, x̄ = 9.21 and s² = 3.2477; the test statistic has value −2.26, so we do reject H₀ and accept H₁.
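The arithmetic of Ex9.2 can be reproduced in R from the summary statistics (a sketch; variable names are illustrative):

n = 10; sum.x = 92.1; sum.x2 = 877.47
xbar = sum.x / n                       # 9.21
s2 = (sum.x2 - n * xbar^2) / (n - 1)   # 3.2477
t.obs = (xbar - 10.5) / sqrt(s2 / n)   # -2.26
t.obs < qt(0.05, df = n - 1)           # TRUE: below -1.833, so reject H0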

An alternative, and simpler, approach is to calculate the observed value of the test statistic for the sample in hand and compare it with the tabulated percentage point (or go further: see P-values later). Here, our observed t = (9.21 − 10.5)/(3.2477/10)^(1/2) = −2.26, which is lower than the relevant percentage point (−1.833); our observed value is low enough to be in the tail of the reference distribution, and we reject H₀.

Ex9.3 A coin is tossed 20 times and lands heads 5 times and tails 15 times. Investigate whether the coin is fair or biased in favour of tails (i.e. do we have strong enough evidence to conclude that the coin is biased in favour of tails?).

Let X be the number of heads. Then X ~ b(20, θ), where P(head) = θ. We will test H₀: θ = 0.5 v H₁: θ < 0.5 at 5%. We reject H₀ for small values of X. From NCST, P(X ≤ 5 | θ = 0.5) = 0.0207, which is less than 0.05. Our observation x = 5 is in the lower tail of the reference binomial distribution, so we reject H₀. We conclude that the coin is biased in favour of tails.

Suppose the coin was tossed 20 times and landed heads 8 times. P(X ≤ 8 | θ = 0.5) = 0.2517. This is far too high to provide evidence against H₀, which can stand.

But suppose now that the coin was tossed 100 times and landed heads 40 times (the same proportion of heads, but on many more tosses). Now X ~ b(100, θ) and we can use the test statistic (X − nθ₀)/√(nθ₀(1 − θ₀)), which is ~ N(0, 1) (approximately) under H₀.

Our observed statistic is (40 − 50)/5 = −2, which is less than the lower 5% point of the N(0, 1) distribution (−1.645); our observed value is in the tail of the reference distribution, so this time we have sufficiently strong evidence against H₀ to justify our rejecting it. We reject H₀ and conclude that the coin is biased in favour of tails (but see the next section for improved methodology which allows naturally for the use of a continuity correction).

9.4 Significance and P-values

A typical conclusion of a significance test is simply "reject H₀ at the 5% level of significance", or just "reject H₀ at 5%". This is not as informative as we can be. It is more informative to quantify the strength of the evidence the data provide against H₀. We do this by calculating the probability value (P-value) of our observed test statistic.

The P-value is the observed significance level of the test statistic: it is the probability, assuming H₀ is true, of observing a value of the test statistic as extreme (that is, as inconsistent with H₀) as the value we have actually observed. The P-value is the probability of the smallest critical region which includes the observed test statistic. Given the data we have, the P-value is the lowest level at which we can reject H₀. The smaller the P-value, the stronger our evidence against H₀.

The use of P-values is very widespread in published statistical work and is strongly recommended.

In Ex9.1, consider again the case µ₀ = 10, µ₁ = 10.5 and n = 25. Suppose we observe x̄ = 10.41. This value is in the critical region (which is x̄ > 10.329) and has P-value given by P(X̄ ≥ 10.41 | µ₀) = P(Z > 2.05) = 0.02 (or 2%). So we have strong enough evidence to justify rejecting H₀, at levels of testing down to 2%.

Suppose however we observe x̄ = 10.27. This value is not in the critical region and has P-value given by P(X̄ ≥ 10.27 | µ₀) = P(Z > 1.35) = 0.089 (or 8.9%). The P-value is higher and the evidence is not strong enough to justify rejecting H₀.

In Ex9.2, the observed test statistic is −2.26 and its P-value is P(t₉ < −2.26) = 0.025 (from NCST). So we have strong enough evidence to justify rejecting H₀, at levels of testing down to 2.5%.

In Ex9.3 with 100 tosses, under H₀, X ~ N(50, 25) approximately, and the P-value of our observation of 40 heads is calculated as

P(X ≤ 40 | H₀) = P(Z < (40.5 − 50)/5) = P(Z < −1.9) = 0.029.

We have strong enough evidence to justify rejecting H₀, at levels of testing down to about 3%. [Note the use of the continuity correction when using the normal distribution (which is continuous) to calculate an approximation to a probability for the binomial distribution (which is discrete).]

P-value: suitable language for your conclusions (in most applications)

> 0.05: insufficient evidence against H₀ to justify rejecting it; evidence not strong enough to justify rejecting H₀; H₀ can stand
< 0.05: we have some evidence against H₀; we can reject H₀ at the 5% level of testing
< 0.01: we have strong evidence against H₀; we can reject H₀ at the 1% level of testing, i.e. at levels of testing down to 1%
< 0.001: we have overwhelming evidence against H₀; we can reject H₀ at the 0.1% level of testing, i.e. at levels of testing down to 0.1%
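The tail probabilities and P-values quoted above can be verified in R:

pbinom(5, 20, 0.5)              # Ex9.3, 5 heads in 20 tosses: 0.0207
pbinom(8, 20, 0.5)              # Ex9.3, 8 heads in 20 tosses: 0.2517
1 - pnorm((10.41 - 10) / 0.2)   # Ex9.1, xbar = 10.41: 0.020
1 - pnorm((10.27 - 10) / 0.2)   # Ex9.1, xbar = 10.27: 0.089
pt(-2.26, df = 9)               # Ex9.2: 0.025
pnorm((40.5 - 50) / 5)          # Ex9.3, 100 tosses, continuity-corrected: 0.029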

9.5 Two sample situations

9.5.1 Difference between two population means
Random sample, size n₁, from X₁ ~ N(µ₁, σ₁²); random sample, size n₂, from X₂ ~ N(µ₂, σ₂²). All variables are independent. Sample means X̄₁ and X̄₂, sample variances S₁² and S₂². We want to test hypotheses about µ₁ − µ₂.
H₀: µ₁ − µ₂ = δ₀ (δ₀ = 0 is the "no difference" or "no treatment effect" hypothesis).
(a) Population variances known
Test statistic: (X̄₁ − X̄₂ − δ₀)/√(σ₁²/n₁ + σ₂²/n₂), which is ~ N(0, 1) under H₀
(b) Common population variance σ₁² = σ₂² = σ²
Test statistic: (X̄₁ − X̄₂ − δ₀)/(Sₚ√(1/n₁ + 1/n₂)), which is ~ t with n₁ + n₂ − 2 df under H₀
(recall the pooled estimator of σ² is Sₚ² = ((n₁ − 1)S₁² + (n₂ − 1)S₂²)/(n₁ + n₂ − 2))
This gives the famous "two sample t test".
Large samples from any distribution:
Test statistic: (X̄₁ − X̄₂ − δ₀)/(Sₚ√(1/n₁ + 1/n₂)) or (X̄₁ − X̄₂ − δ₀)/√(S₁²/n₁ + S₂²/n₂), both of which are ~ N(0, 1) (approximately) under H₀

9.5.2 Ratio of two population variances
Random sample, size n₁, from X₁ ~ N(µ₁, σ₁²); random sample, size n₂, from X₂ ~ N(µ₂, σ₂²). All variables are independent. Sample variances S₁² and S₂². We want to test hypotheses about σ₁²/σ₂².
H₀: σ₁²/σ₂² = 1 (i.e. σ₁² = σ₂²: this is the homoscedasticity hypothesis)
Test statistic: S₁²/S₂², which is ~ F with (n₁ − 1, n₂ − 1) df under H₀

9.5.3 Difference between two population proportions
X₁ ~ b(n₁, θ₁), X₂ ~ b(n₂, θ₂); large samples; sample proportions P₁ and P₂ respectively.
H₀: θ₁ − θ₂ = δ₀ (δ₀ = 0 is the "no difference" hypothesis in regard to the population proportions)
Test statistic: (P₁ − P₂ − δ₀)/√(P₁(1 − P₁)/n₁ + P₂(1 − P₂)/n₂), which is ~ N(0, 1) (approximately) under H₀
In the case δ₀ = 0, H₀ specifies a common population proportion θ = θ₁ = θ₂ and, under H₀, X₁ + X₂ ~ b(n₁ + n₂, θ). The MLE of the common proportion is then θ̂ = (X₁ + X₂)/(n₁ + n₂).

In this case the estimated standard error of P₁ − P₂ under H₀ is ese(P₁ − P₂) = √(θ̂(1 − θ̂)(1/n₁ + 1/n₂)), and the test statistic is (P₁ − P₂)/ese(P₁ − P₂), which is ~ N(0, 1) (approximately) under H₀.

9.5.4 Difference between two Poisson means
Random sample, size n₁, from X₁ ~ Poisson(λ₁); random sample, size n₂, from X₂ ~ Poisson(λ₂); large samples; all variables independent. Sample means X̄₁ and X̄₂.
H₀: λ₁ = λ₂
The test statistic normally used is (X̄₁ − X̄₂)/√(X̄₁/n₁ + X̄₂/n₂), which is ~ N(0, 1) (approximately) under H₀.
Noting that, under H₀, the MLE of λ = λ₁ = λ₂ is λ̂ = (ΣX₁ᵢ + ΣX₂ᵢ)/(n₁ + n₂), one can also argue for the test statistic (X̄₁ − X̄₂)/ese(X̄₁ − X̄₂), where ese(X̄₁ − X̄₂) = √(λ̂(1/n₁ + 1/n₂)).

9.5.5 Paired data (non-independent samples)
Data arise as physical pairs (xᵢ, yᵢ), i = 1, 2, …, n, with differences dᵢ = xᵢ − yᵢ.
H₀: µ_D = µ_X − µ_Y = 0
The problem reverts to the one-sample problem of 9.3.1.

Ex9.4 See Ex8.10. Test H₀: µ₁ = µ₂ v H₁: µ₁ ≠ µ₂.
The test statistic is t = (X̄₁ − X̄₂)/(Sₚ√(1/n₁ + 1/n₂)) and, for a 5% test, we reject H₀ for |t| > 2.08 (t has 21 df).
For our data, t = 1.87/0.7317 = 2.56, and we reject H₀. The P-value of our statistic is P(|t| > 2.56) = 2 × 0.009 = 0.018 (approx., from NCST).

Ex9.5 See Ex8.11. Test H₀: θ₁ = θ₂ v H₁: θ₁ ≠ θ₂.
P₁ = 0.18, P₂ = 0.115, P₁ − P₂ = 0.065
Under H₀, θ̂ = 77/500 = 0.154 and ese(P₁ − P₂) = √(0.154 × 0.846 × (1/300 + 1/200)) = 0.0329.
Test statistic = 0.065/0.0329 = 1.973
P-value of result = 2 × P(Z > 1.973) = 2 × 0.024 = 0.048
We reject H₀ at levels of testing down to 4.8%.
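The arithmetic of Ex9.4 and Ex9.5 in R (a sketch; the Ex9.5 sample sizes n₁ = 300, n₂ = 200 and success counts 54 and 23 are inferred from the quoted proportions and from θ̂ = 77/500):

2 * pt(-2.56, df = 21)                        # Ex9.4 two-sided P-value: 0.018
p1 = 54 / 300; p2 = 23 / 200                  # Ex9.5: 0.18 and 0.115
theta.hat = (54 + 23) / (300 + 200)           # pooled estimate: 0.154
ese = sqrt(theta.hat * (1 - theta.hat) * (1/300 + 1/200))   # 0.0329
z = (p1 - p2) / ese                           # 1.973
2 * (1 - pnorm(z))                            # two-sided P-value: 0.048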

9.6 Tests and confidence intervals

A CI for a parameter θ is a set of values which, given the data we have, are plausible for the parameter. So any value θ₀ contained in the CI should be such that the hypothesis H₀: θ = θ₀ will be accepted in a corresponding hypothesis test. This is in fact generally the case.

For example, sampling from N(µ, 1): a 95% two-sided CI for µ is given by (X̄ − 1.96/√n, X̄ + 1.96/√n), and this interval contains µ₀ if and only if −1.96 < (X̄ − µ₀)/(1/√n) < 1.96, which is the condition under which H₀: µ = µ₀ is accepted in a 5% significance test of H₀: µ = µ₀ v H₁: µ ≠ µ₀.

In general there is this direct link between the two-sided 100(1 − α)% CI and the 100α% two-sided test. Similarly, one-sided CIs correspond to one-sided tests. For example, consider again sampling from N(µ, 1): a 95% lower CI for µ is given by (X̄ − 1.645/√n, ∞), and this interval contains precisely those values of µ₀ which, when specified under H₀ in the 5% test of H₀: µ = µ₀ v H₁: µ > µ₀, result in H₀ being accepted.

If a CI has already been calculated for a parameter, then many questions which arise in a hypothesis-testing framework are answerable immediately, at least in so far as giving us a basic "reject" or "accept" decision.

Ex9.6 Ex9.1 revisited: N(µ, 1), n = 25, H₀: µ = 10 v H₁: µ > 10.
Suppose we observe x̄ = 10.4. Then a lower 95% CI for µ is given by (10.4 − 1.645/√25, ∞), i.e. (10.071, ∞). This interval does not contain the value µ = 10, which we therefore reject as being implausible (inconsistent with the value of the sample mean): it is too low. This is the same conclusion we come to in the test, for which the critical region is X̄ > 10.329.
Suppose we observe x̄ = 10.3. Then a lower 95% CI for µ is given by (10.3 − 1.645/√25, ∞), i.e. (9.971, ∞). This interval does contain the value µ = 10, which we therefore accept as being plausible (consistent with the value of the sample mean). This again is the same conclusion we come to in the test.
For a general x̄, the lower limit of the CI is x̄ − 0.329, and so any hypothesised value µ₀ such that µ₀ > x̄ − 0.329 is contained in the CI, i.e. we accept a hypothesised µ₀ provided x̄ < µ₀ + 0.329, and hence reject it for x̄ > µ₀ + 0.329, as in Ex9.1.

Ex9.7 Ex9.2 revisited: an upper 95% CI for µ is given by (−∞, X̄ + 1.833 S/√10), which with x̄ = 9.21 and s² = 3.2477 gives (−∞, 10.25). The interval does not contain the value µ = 10.5, which we therefore reject as being inconsistent with the value of the sample mean: it is too high. This is the same conclusion we come to in the test.
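The interval limits in Ex9.6 and Ex9.7 can be checked in R:

10.4 - qnorm(0.95) / sqrt(25)                  # 10.071: CI excludes mu0 = 10, reject
10.3 - qnorm(0.95) / sqrt(25)                  # 9.971: CI includes mu0 = 10, accept
9.21 + qt(0.95, df = 9) * sqrt(3.2477 / 10)    # 10.25: CI excludes 10.5, reject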

Ex9.8 See Ex8.2. Testing a population proportion: H₀: θ = 0.38 v H₁: θ ≠ 0.38, based on the result that a random sample of 1200 includes 420 with the property.
We reject H₀ for extreme values of X, where X ~ b(1200, 0.38) ≈ N(456, 282.7).
P(X ≤ 420) = P[Z < (420.5 − 456)/√282.7] = P(Z < −2.11) = 0.0174, so the P-value of this (two-sided) test is 0.0348, and at 5% we reject H₀.
The 95% CI for θ is 0.35 ± 0.027, i.e. (0.323, 0.377); the value 0.38 is not contained in this interval.

9.7 Other matters

(a) When a single best test (in the Neyman-Pearson sense) is not available, another, more general, approach is used. The test statistic and critical region are found by setting an upper bound on the ratio max L₀ / max L, where max L₀ is the maximum value of the likelihood L under the restrictions imposed by H₀, and max L is the unrestricted maximum value of L. This method produces tests called likelihood ratio tests. For example, in sampling from N(µ, σ²) and testing H₀: µ = µ₀, the method leads to the t test of 9.3.1.

(b) We may be able to reject H₀ at a specified level simply by using so much data that our test statistic has a small enough standard error to enable us to detect a departure from H₀. This departure may, however, be of little or no physical significance.

(c) A failure to reject H₀ does not imply that H₀ is true. It indicates that we have failed to reject it: our data do not provide sufficiently strong evidence against it. H₀ represents a theory which lives on to fight another day.

(d) Good practice in testing. State:
your hypotheses
the test statistic
the distribution of the test statistic under H₀
the observed value of the test statistic
the P-value (at least approximately) of the test statistic
your conclusion as regards the hypotheses
your conclusion in words which relate to the physical situation concerned
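The Ex9.8 figures in R (continuity-corrected normal approximation, then the 95% CI):

p.low = pnorm((420.5 - 456) / sqrt(1200 * 0.38 * 0.62))   # 0.0174
2 * p.low                                                 # two-sided P-value: 0.035
p.hat = 420 / 1200                                        # 0.35
p.hat + c(-1, 1) * 1.96 * sqrt(p.hat * (1 - p.hat) / 1200)  # (0.323, 0.377)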

Appendix: R code to produce the display in Ex9.1 continued

x = c(-20:60) * 0.05                  # grid of mu values from -1 to 3
y = 1 - pnorm(3 * (1 - x) + 1.6449)   # power function, n = 9, mu0 = 1
c = c(-1, 1)
d = c(0.05, 0.05)
e = c(1, 1)
f = c(0, 0.05)
plot(x, y, type = "l", xlab = "mu", ylab = "power", main = "Power function")
lines(c, d, lty = 2)                  # dashed horizontal line at the 0.05 level
lines(e, f, lty = 2)                  # dashed vertical line at mu = mu0 = 1