Ch. 5 Hypothesis Testing

The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s and early 1930s, complementing Fisher's work on estimation. As in estimation, we begin by postulating a statistical model, but instead of seeking an estimator of $\theta$ in $\Theta$ we consider the question of whether $\theta \in \Theta_0 \subset \Theta$ or $\theta \in \Theta_1 = \Theta - \Theta_0$ is most supported by the observed data. The discussion which follows will proceed in a similar way, though less systematically and formally, to the discussion of estimation. This is due to the complexity of the topic, which arises mainly because one is asked to assimilate too many concepts too quickly just to be able to define the problem properly. This difficulty, however, is inherent in testing if any proper understanding of the topic is to be attempted, and is thus unavoidable.

1 Testing: Definition and Concepts

1.1 The Decision Rule

Let $X$ be a random variable defined on the probability space $(S, \mathcal{F}, P(\cdot))$ and consider the statistical model associated with $X$:

(a) $\Phi = \{f(x; \theta), \ \theta \in \Theta\}$;

(b) $\mathbf{x} = (X_1, X_2, \ldots, X_n)'$ is a random sample from $f(x; \theta)$.

The problem of hypothesis testing is one of deciding whether or not some conjecture about $\theta$, of the form "$\theta$ belongs to some subset $\Theta_0$ of $\Theta$", is supported by the data $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$. We call such a conjecture the null hypothesis and denote it by $H_0: \theta \in \Theta_0$. The decision takes the form: if the sample realization $\mathbf{x} \in C_0$ we accept $H_0$; if $\mathbf{x} \in C_1$ we reject it. Since the observation space is $\mathcal{X} \subset \mathbb{R}^n$, but both the acceptance region $C_0 \subset \mathbb{R}^1$ and the rejection region $C_1 \subset \mathbb{R}^1$, we need a mapping from $\mathbb{R}^n$ to $\mathbb{R}^1$.

Definition (The Test Statistic): The mapping $\tau(\cdot)$ which enables us to define $C_0$ and $C_1$ from the observation space $\mathcal{X}$ is called a test statistic, i.e. $\tau(\mathbf{x}): \mathcal{X} \to \mathbb{R}^1$.

Example: Let $X$ be the random variable representing the marks achieved by students in an econometric theory paper, and let the statistical model be:

(a) $\Phi = \left\{ f(x; \theta) = \frac{1}{8\sqrt{2\pi}} \exp\left[ -\frac{1}{2}\left(\frac{x - \theta}{8}\right)^2 \right], \ \theta \in \Theta \equiv [0, 100] \right\}$;

(b) $\mathbf{x} = (X_1, X_2, \ldots, X_n)'$, $n = 40$, is a random sample from $\Phi$.

The hypothesis to be tested is

$$H_0: \theta = 60 \ \ (\text{i.e. } X \sim N(60, 64)), \quad \Theta_0 = \{60\},$$

against

$$H_1: \theta \neq 60 \ \ (\text{i.e. } X \sim N(\mu, 64), \ \mu \neq 60), \quad \Theta_1 = [0, 100] - \{60\}.$$

Common sense suggests that if some good estimator of $\theta$, say $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$, takes a value around 60 for the sample realization $\mathbf{x}$, then we will be inclined to accept $H_0$. Let us formalise this argument. The acceptance region takes the form

$$60 - \varepsilon \le \bar{X}_n \le 60 + \varepsilon, \quad \varepsilon > 0,$$

or

$$C_0 = \{\mathbf{x}: |\bar{X}_n - 60| \le \varepsilon\},$$

and

$$C_1 = \{\mathbf{x}: |\bar{X}_n - 60| > \varepsilon\}$$

is the rejection region. Formally, if $\mathbf{x} \in C_1$ (reject $H_0$) while $\theta \in \Theta_0$ ($H_0$ is true) we call it a type I error; if $\mathbf{x} \in C_0$ (accept $H_0$) while $\theta \in \Theta_1$ ($H_0$ is false) we call it a type II error.

The hypothesis to be tested is formally stated as follows:

$$H_0: \theta \in \Theta_0, \quad \Theta_0 \subset \Theta.$$

Against the null hypothesis $H_0$ we postulate the alternative $H_1$, which takes the form

$$H_1: \theta \in \Theta_1 \equiv \Theta - \Theta_0.$$

It is important to note at the outset that $H_0$ and $H_1$ are in effect hypotheses about the distribution of the sample, $f(\mathbf{x}; \theta)$, i.e.

$$H_0: f(\mathbf{x}; \theta), \ \theta \in \Theta_0, \qquad H_1: f(\mathbf{x}; \theta), \ \theta \in \Theta_1.$$

In testing a null hypothesis $H_0$ against an alternative $H_1$ the issue is to decide whether the sample realization $\mathbf{x}$ supports $H_0$ or $H_1$. In the former case we say that $H_0$ is accepted, in the latter that $H_0$ is rejected. In order to be able to make such a decision we need to formulate a mapping which relates $\Theta_0$ to some subset of the observation space $\mathcal{X}$, say $C_0$, which we call the acceptance region, and its complement $C_1$ ($C_0 \cup C_1 = \mathcal{X}$, $C_0 \cap C_1 = \emptyset$), which we call the rejection region.

The description of testable implications in the preceding paragraph suggests that the set of values in the null hypothesis is contained within the unrestricted set, i.e. $\Theta_0 \subset \Theta$. In this case the models are said to be nested. Now consider an alternative pair of models:

$$H_0: f(\mathbf{x}; \theta) \quad \text{vs.} \quad H_1: g(\mathbf{x}; \vartheta).$$

These two models are non-nested. We are concerned only with nested models in this chapter.

1.2 Types of Hypothesis Testing

A hypothesis $H_0$ or $H_1$ is called simple if $\Theta_0$ or $\Theta_1$, respectively, contains just one point; otherwise it is called composite. There are three types of hypothesis testing:

(a) Simple null and simple alternative. Ex.: $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$.

(b) Composite null and composite alternative. Ex.: $H_0: \theta \le \theta_0$ against $H_1: \theta > \theta_0$.

(c) Simple null and composite alternative.

(i) One-sided alternative. Ex.: $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$ (or $\theta < \theta_0$).

(ii) Two-sided alternative. Ex.: $H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$.

1.3 Type I and Type II Errors

The next question is how do we choose $\varepsilon$? If $\varepsilon$ is too small we run the risk of rejecting $H_0$ when it is true; we call this a type I error. On the other hand, if $\varepsilon$ is too large we run the risk of accepting $H_0$ when it is false; we call this a type II error. That is, if we were to choose $\varepsilon$ too small we would run a higher risk of committing a type I error than of committing a type II error, and vice versa. In other words, there is a trade-off between the probability of a type I error, i.e.

$$\Pr(\mathbf{x} \in C_1; \theta \in \Theta_0) = \alpha,$$

and the probability $\beta$ of a type II error, i.e.

$$\Pr(\mathbf{x} \in C_0; \theta \in \Theta_1) = \beta.$$

Ideally we would like $\alpha = \beta = 0$ for all $\theta \in \Theta$, which is not possible for a fixed $n$. Moreover, we cannot control both simultaneously because of the trade-off between them. The strategy adopted in hypothesis testing is one where a small value of $\alpha$ is chosen and, for the given $\alpha$, $\beta$ is minimized. Formally, this amounts to choosing $\alpha$ such that

$$\Pr(\mathbf{x} \in C_1; \theta \in \Theta_0) = \alpha(\theta) \le \alpha \quad \text{for } \theta \in \Theta_0,$$

and

$$\Pr(\mathbf{x} \in C_0; \theta \in \Theta_1) = \beta(\theta)$$

is minimized for $\theta \in \Theta_1$ by choosing $C_1$ (or $C_0$) appropriately.

In the case of the above example, if we were to choose $\alpha$, say $\alpha = 0.05$, then

$$\Pr(|\bar{X}_n - 60| > \varepsilon; \theta = 60) = 0.05.$$

How do we determine $\varepsilon$, then? The only random variable involved in the statement is $\bar{X}_n$, and hence it has to be its sampling distribution. For the above probabilistic statement to have any operational meaning which enables us to determine $\varepsilon$, the distribution of $\bar{X}_n$ must be known. In the present case we know that

$$\bar{X}_n \sim N\left(\theta, \frac{\sigma^2}{n}\right), \quad \text{where } \frac{\sigma^2}{n} = \frac{64}{40} = 1.6.$$

For $\theta = 60$ (i.e. when $H_0$ is true) this implies that we can construct a test statistic $\tau(\mathbf{x})$ from the sample $\mathbf{x}$ such that

$$\tau(\mathbf{x}) = \frac{\bar{X}_n - \theta}{\sqrt{1.6}} = \frac{\bar{X}_n - 60}{\sqrt{1.6}} \sim N(0, 1),$$

and thus the distribution of $\tau(\cdot)$ is known completely (no unknown parameters). When this is the case, this distribution can be used in conjunction with the above probabilistic statement to determine $\varepsilon$. In order to do this we need to relate $|\bar{X}_n - 60|$ to $\tau(\mathbf{x})$ (a statistic) whose distribution is known. The obvious way is to standardize the former. This suggests changing the above probabilistic statement to the equivalent statement

$$\Pr\left( \frac{|\bar{X}_n - 60|}{\sqrt{1.6}} \ge c_\alpha; \ \theta = 60 \right) = 0.05, \quad \text{where } c_\alpha = \frac{\varepsilon}{\sqrt{1.6}}.$$

The value of $c_\alpha$ given by the $N(0, 1)$ table is $c_\alpha = 1.96$. This in turn implies that the rejection region for the test is

$$C_1 = \left\{ \mathbf{x}: \frac{|\bar{X}_n - 60|}{\sqrt{1.6}} \ge 1.96 \right\} = \{\mathbf{x}: |\tau(\mathbf{x})| \ge 1.96\},$$

or

$$C_1 = \{\mathbf{x}: |\bar{X}_n - 60| \ge 2.48\}.$$

That is, for sample realizations $\mathbf{x}$ which give rise to $\bar{X}_n$ falling outside the interval $(57.52, 62.48)$ we reject $H_0$.

Let us summarize the argument so far. We set out to construct a test for $H_0: \theta = 60$ against $H_1: \theta \neq 60$, and intuition suggested the rejection region $(|\bar{X}_n - 60| \ge \varepsilon)$. In order to determine $\varepsilon$ we have to (a) choose an $\alpha$; and then (b) define the rejection region in terms of some statistic $\tau(\mathbf{x})$. The latter is necessary to enable us to determine $\varepsilon$ via some known distribution. This is the distribution of the test statistic $\tau(\mathbf{x})$ under $H_0$ (i.e. when $H_0$ is true).
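
As a minimal sketch of the test just derived (assuming SciPy is available), the following code recovers $c_\alpha$ and the rejection region; the sample mean used at the end is hypothetical, since the example specifies no actual data.

```python
# Sketch of the test above: H0: theta = 60 vs H1: theta != 60,
# X ~ N(theta, 64), n = 40, size alpha = 0.05.
# Assumes SciPy; the sample mean used at the end is hypothetical.
from scipy.stats import norm

sigma2, n, theta0, alpha = 64.0, 40, 60.0, 0.05
se = (sigma2 / n) ** 0.5                 # sqrt(1.6) ~= 1.265

c_alpha = norm.ppf(1 - alpha / 2)        # 1.96
eps = c_alpha * se                       # ~= 2.48
print(f"accept H0 if {theta0 - eps:.2f} < x_bar < {theta0 + eps:.2f}")

x_bar = 61.0                             # hypothetical sample mean
tau = (x_bar - theta0) / se              # test statistic tau(x)
print(f"tau = {tau:.2f}, reject H0: {abs(tau) >= c_alpha}")
```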

The next question which naturally arises is: what do we need the probability of a type II error, $\beta$, for? The answer is that we need $\beta$ to decide whether the test defined in terms of $C_1$ (and of course $C_0$) is a good or a bad test. As we mentioned at the outset, the way we decided to solve the problem of the trade-off between $\alpha$ and $\beta$ was to choose a given small value of $\alpha$ and define $C_1$ so as to minimize $\beta$. At this stage we do not know whether the test defined above is a good test or not. Let us consider setting up the apparatus to enable us to consider the question of optimality.

2 Optimal Tests

First we note that minimization of $\Pr(\mathbf{x} \in C_0)$ for all $\theta \in \Theta_1$ is equivalent to maximizing $\Pr(\mathbf{x} \in C_1)$ for all $\theta \in \Theta_1$.

Definition (Power of the Test): The probability of rejecting $H_0$ when it is false at some point $\theta_1 \in \Theta_1$, i.e. $\Pr(\mathbf{x} \in C_1; \theta = \theta_1)$, is called the power of the test at $\theta = \theta_1$. Note that

$$\Pr(\mathbf{x} \in C_1; \theta = \theta_1) = 1 - \Pr(\mathbf{x} \in C_0; \theta = \theta_1) = 1 - \beta(\theta_1).$$

In the above example we can define the power of the test at some $\theta_1 \in \Theta_1$, say $\theta = 54$, to be $\Pr\left( |\bar{X}_n - 60|/\sqrt{1.6} \ge 1.96; \ \theta = 54 \right)$. Under the alternative hypothesis that $\theta = 54$ it is true that

$$\frac{\bar{X}_n - 54}{\sqrt{1.6}} \sim N(0, 1).$$

We would like to know the probability that the statistic constructed under the null hypothesis, $(\bar{X}_n - 60)/\sqrt{1.6}$, falls in the rejection region; that is, the power of the test at $\theta = 54$ is

$$\Pr\left( \frac{|\bar{X}_n - 60|}{\sqrt{1.6}} \ge 1.96; \ \theta = 54 \right) = \Pr\left( \frac{\bar{X}_n - 54}{\sqrt{1.6}} \ge 1.96 + \frac{60 - 54}{\sqrt{1.6}} \right) + \Pr\left( \frac{\bar{X}_n - 54}{\sqrt{1.6}} \le -1.96 + \frac{60 - 54}{\sqrt{1.6}} \right) = 0.9973.$$

Hence the power of the test defined by $C_1$ above is indeed very high for $\theta = 54$. From this we see that to calculate the power of a test we need to know the distribution of the test statistic $\tau(\mathbf{x})$ under the alternative hypothesis; in this case it is the distribution of $(\bar{X}_n - 54)/\sqrt{1.6}$. (In the example above the test statistic $\tau(\mathbf{x})$ has a standard normal distribution under both the null and the alternative hypothesis. However, it is quite often the case that a test statistic has a different distribution under the null and under the alternative hypothesis; the unit root test is one example. See Chapter 21.)

Following the same procedure, the power of the test defined by $C_1$ can be computed for a range of values of $\theta$:

$$\Pr(|\tau(\mathbf{x})| \ge 1.96; \theta = 56) = 0.8849;$$
$$\Pr(|\tau(\mathbf{x})| \ge 1.96; \theta = 58) = 0.3520;$$
$$\Pr(|\tau(\mathbf{x})| \ge 1.96; \theta = 60) = 0.0500;$$
$$\Pr(|\tau(\mathbf{x})| \ge 1.96; \theta = 62) = 0.3520;$$
$$\Pr(|\tau(\mathbf{x})| \ge 1.96; \theta = 64) = 0.8849;$$
$$\Pr(|\tau(\mathbf{x})| \ge 1.96; \theta = 66) = 0.9973.$$

As we can see, the power of the test increases as we move further away from $\theta = 60$ ($H_0$), and the power at $\theta = 60$ equals the probability of a type I error. This prompts us to define the power function as follows.

Definition (Power Function): $\mathcal{P}(\theta) = \Pr(\mathbf{x} \in C_1)$, $\theta \in \Theta$, is called the power function of the test defined by the rejection region $C_1$.

Definition (Size of a Test): $\alpha = \max_{\theta \in \Theta_0} \mathcal{P}(\theta)$ is defined to be the size (or the significance level) of the test, that is, the probability that $\mathbf{x} \in C_1$ when $H_0$ is correct, i.e. $\theta \in \Theta_0$. In the case where $H_0$ is simple, say $\theta = \theta_0$, then $\alpha = \mathcal{P}(\theta_0)$.

Definition (Uniformly Most Powerful Test): A test of $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ as defined by some rejection region $C_1$ is said to be a uniformly most powerful (UMP) test of size $\alpha$ if

(a) $\max_{\theta \in \Theta_0} \mathcal{P}(\theta) = \alpha$;

(b) $\mathcal{P}(\theta) \ge \mathcal{P}^*(\theta)$ for all $\theta \in \Theta_1$, where $\mathcal{P}^*(\theta)$ is the power function of any other test of size $\alpha$.

As will be seen in the sequel, no UMP test exists in most situations of interest in practice. The procedure adopted in such cases is to reduce the class of all tests to some subclass by imposing further criteria and to consider the question of UMP tests within the subclass.
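
These power values can be reproduced, up to rounding, with a short sketch, again assuming SciPy is available:

```python
# Power of the test C1 = {x: |(x_bar - 60)/sqrt(1.6)| >= 1.96} at various theta.
# Under theta, (x_bar - theta)/sqrt(1.6) ~ N(0,1), so the power is
# P(Z >= 1.96 + (60 - theta)/se) + P(Z <= -1.96 + (60 - theta)/se).
from scipy.stats import norm

se, c = (64.0 / 40) ** 0.5, 1.96
for theta in (54, 56, 58, 60, 62, 64, 66):
    shift = (60.0 - theta) / se
    power = norm.sf(c + shift) + norm.cdf(-c + shift)
    print(f"theta = {theta}: power = {power:.4f}")
```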

Definition (Unbiased Test): A test of $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ is said to be unbiased if

$$\max_{\theta \in \Theta_0} \mathcal{P}(\theta) \le \max_{\theta \in \Theta_1} \mathcal{P}(\theta).$$

In other words, a test is unbiased if it rejects $H_0$ more often when it is false than when it is true.

Collecting all the above concepts together, we say that a test has been defined when the following components have been specified:

(a) a test statistic $\tau(\mathbf{x})$;

(b) the size of the test $\alpha$;

(c) the distribution of $\tau(\mathbf{x})$ under $H_0$ and $H_1$;

(d) the rejection region $C_1$ (or, equivalently, $C_0$).

The most important component in defining a test is the test statistic, for which we need to know its distribution under both $H_0$ and $H_1$. Hence, constructing an optimal test is largely a matter of being able to find a statistic $\tau(\mathbf{x})$ with the following properties:

(a) $\tau(\mathbf{x})$ depends on $\mathbf{x}$ via a good estimator of $\theta$; and

(b) the distribution of $\tau(\mathbf{x})$ under both $H_0$ and $H_1$ does not depend on any unknown parameters.

We call such a statistic a pivot.

Example: Assume a random sample of size 11 is drawn from a normal distribution $N(\mu, 400)$. In particular, $y_1 = 62$, $y_2 = 52$, $y_3 = 68$, $y_4 = 23$, $y_5 = 34$, $y_6 = 45$, $y_7 = 27$, $y_8 = 42$, $y_9 = 83$, $y_{10} = 56$ and $y_{11} = 40$. Test the null hypothesis $H_0: \mu = 55$ versus $H_1: \mu \neq 55$. Since $\sigma^2$ is known, the sample mean is distributed as $\bar{Y} \sim N(\mu, \sigma^2/n) \equiv N(\mu, 400/11)$; therefore under $H_0: \mu = 55$,

$$\bar{Y} \sim N(55, 36.36), \quad \text{or} \quad \frac{\bar{Y} - 55}{\sqrt{36.36}} \sim N(0, 1).$$

We accept $H_0$ when the test statistic $\tau(\mathbf{x}) = (\bar{Y} - 55)/\sqrt{36.36}$ lies in the interval $C_0 = [-1.96, 1.96]$, for a test of size $\alpha = 0.05$. We now have

$$\sum_{i=1}^{11} y_i = 532 \quad \text{and} \quad \bar{y} = \frac{532}{11} = 48.4.$$

Then

$$\frac{48.4 - 55}{\sqrt{36.36}} = -1.09,$$

which is in the acceptance region. Therefore we accept the null hypothesis $H_0: \mu = 55$.

Example: Assume a random sample of size 11 is drawn from a normal distribution $N(\mu, \sigma^2)$. In particular, $y_1 = 62$, $y_2 = 52$, $y_3 = 68$, $y_4 = 23$, $y_5 = 34$, $y_6 = 45$, $y_7 = 27$, $y_8 = 42$, $y_9 = 83$, $y_{10} = 56$ and $y_{11} = 40$. Test the null hypothesis $H_0: \mu = 55$ versus $H_1: \mu \neq 55$. Since $\sigma^2$ is unknown, the sample mean is distributed as $\bar{Y} \sim N(\mu, \sigma^2/n)$; therefore under $H_0: \mu = 55$,

$$\bar{Y} \sim N(55, \sigma^2/n), \quad \text{or} \quad \frac{\bar{Y} - 55}{\sqrt{\sigma^2/n}} \sim N(0, 1);$$

however, this is not a pivotal test statistic, since it involves the unknown parameter $\sigma^2$. From the fact that $\sum_{i=1}^n (Y_i - \bar{Y})^2/\sigma^2 \sim \chi^2_{n-1}$, or $s^2(n-1)/\sigma^2 \sim \chi^2_{n-1}$, where $s^2 = \sum_{i=1}^n (Y_i - \bar{Y})^2/(n-1)$ is an unbiased estimator of $\sigma^2$ (see p. 22 of Chapter 3), we have

$$\frac{(\bar{Y} - 55)/\sqrt{\sigma^2/n}}{\sqrt{(n-1)s^2/\sigma^2 \,/\, (n-1)}} = \frac{\bar{Y} - 55}{\sqrt{s^2/n}} \sim t_{n-1}.$$

We accept $H_0$ when the test statistic $\tau(\mathbf{x}) = (\bar{Y} - 55)/\sqrt{s^2/n}$ lies in the interval $C_0 = [-2.23, 2.23]$. We now have

$$\sum_{i=1}^{11} y_i = 532, \quad \sum_{i=1}^{11} y_i^2 = 29000, \quad \text{and} \quad s^2 = \frac{\sum y_i^2 - n\bar{y}^2}{10} = \frac{29000 - 11(48.4)^2}{10} = 323.19.$$

Then

$$\frac{48.4 - 55}{\sqrt{323.19/11}} = -1.2,$$

which is also in the acceptance region $C_0$. Therefore we accept the null hypothesis $H_0: \mu = 55$.
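
Both worked examples can be checked with the following sketch (assuming SciPy); small differences from the figures in the text arise because the notes round $\bar{y}$ to 48.4.

```python
# Sketch of the two worked examples above (assumes SciPy): the same 11
# observations, testing H0: mu = 55 vs H1: mu != 55 at alpha = 0.05,
# first with sigma^2 = 400 known (z-test), then with sigma^2 unknown (t-test).
from scipy.stats import norm, t

y = [62, 52, 68, 23, 34, 45, 27, 42, 83, 56, 40]
n = len(y)
y_bar = sum(y) / n                                  # ~= 48.4

# (a) sigma^2 = 400 known: tau = (y_bar - 55)/sqrt(400/n) ~ N(0,1) under H0
tau_z = (y_bar - 55) / (400 / n) ** 0.5
print(f"z statistic {tau_z:.2f}, accept region [-{norm.ppf(0.975):.2f}, {norm.ppf(0.975):.2f}]")

# (b) sigma^2 unknown: tau = (y_bar - 55)/sqrt(s^2/n) ~ t_{n-1} under H0
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)   # ~= 327 (the notes' rounding gives 323.19)
tau_t = (y_bar - 55) / (s2 / n) ** 0.5
print(f"t statistic {tau_t:.2f}, accept region [-{t.ppf(0.975, n-1):.2f}, {t.ppf(0.975, n-1):.2f}]")
```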

3 Asymptotic Test Procedures

As discussed in the last section, the main problem in hypothesis testing is to construct a test statistic $\tau(\mathbf{x})$ whose distribution we know under both the null hypothesis $H_0$ and the alternative $H_1$, and which does not depend on the unknown parameters $\theta$. The first part of the problem, that of constructing $\tau(\mathbf{x})$, can be handled relatively easily using various methods (e.g. the Neyman-Pearson likelihood ratio) when certain conditions are satisfied. The second part of the problem, that of determining the distribution of $\tau(\mathbf{x})$ under both $H_0$ and $H_1$, is much more difficult to solve, and often we have to resort to asymptotic theory. This amounts to deriving the asymptotic distribution of $\tau(\mathbf{x})$ and using that to determine the rejection region $C_1$ (or $C_0$) and the associated probabilities.

3.1 Asymptotic Properties

For a given sample size $n$, if the distribution of $\tau_n(\mathbf{x})$ is not known (otherwise we would use it), we do not know how accurately the asymptotic distribution of $\tau_n(\mathbf{x})$ approximates its finite sample distribution. This suggests that when asymptotic results are used we should be aware of their limitations and of the inaccuracies they can lead to.

Consider the test defined by the rejection region

$$C_1^n = \{\mathbf{x}: \tau_n(\mathbf{x}) \ge c_n\},$$

whose power function is

$$\mathcal{P}_n(\theta) = \Pr(\mathbf{x} \in C_1^n), \quad \theta \in \Theta.$$

Since the distribution of $\tau_n(\mathbf{x})$ is not known, we cannot determine $c_n$ or $\mathcal{P}_n$. If the asymptotic distribution of $\tau_n(\mathbf{x})$ is available, however, we can use that instead to define $c_n$ for some fixed $\alpha$, together with the asymptotic power function

$$\pi(\theta) = \lim_{n \to \infty} \Pr(\mathbf{x} \in C_1^n), \quad \theta \in \Theta.$$

In this sense we can think of $\{\tau_n(\mathbf{x}), n \ge 1\}$ as a sequence of test statistics defining a sequence of rejection regions $\{C_1^n, n \ge 1\}$ with power functions $\{\mathcal{P}_n(\theta), n \ge 1, \theta \in \Theta\}$.
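
As a small illustration of how misleading the asymptotic approximation can be in a finite sample, the following sketch (assuming SciPy) reuses the $n = 11$ example of Section 2, where the exact 5% critical value 2.23 comes from $t_{10}$ while the asymptotic $N(0,1)$ value is 1.96.

```python
# If the exact distribution of the statistic is t_10 but the asymptotic N(0,1)
# critical value 1.96 is used, the true size exceeds the nominal 5%.
from scipy.stats import norm, t

n, alpha = 11, 0.05
c_exact = t.ppf(1 - alpha / 2, n - 1)        # ~= 2.23
c_asym = norm.ppf(1 - alpha / 2)             # ~= 1.96
true_size = 2 * t.sf(c_asym, n - 1)          # P(|t_10| >= 1.96) ~= 0.078
print(f"exact c = {c_exact:.2f}, asymptotic c = {c_asym:.2f}, "
      f"true size with asymptotic c = {true_size:.3f}")
```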

Definition (Consistency of a Test): The sequence of tests for $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ defined by $\{C_1^n, n \ge 1\}$ is said to be consistent of size $\alpha$ if

$$\max_{\theta \in \Theta_0} \pi(\theta) = \alpha$$

and

$$\pi(\theta) = 1, \quad \theta \in \Theta_1.$$

Definition (Asymptotic Unbiasedness of a Test): A sequence of tests as defined above is said to be asymptotically unbiased of size $\alpha$ if

$$\max_{\theta \in \Theta_0} \pi(\theta) = \alpha$$

and

$$\alpha < \pi(\theta) < 1, \quad \theta \in \Theta_1.$$

3.2 Three Asymptotically Equivalent Test Procedures

In this section three general test procedures, the "Holy Trinity", which give rise to asymptotically optimal tests will be considered: the likelihood ratio, Wald and Lagrange multiplier tests. All three test procedures can be interpreted as utilizing the information incorporated in the log-likelihood function in different but asymptotically equivalent ways. For expositional purposes the test procedures will be considered in the context of the simplest statistical model, where

(a) $\Phi = \{f(x; \theta), \ \theta \in \Theta\}$ is the probability model; and

(b) $\mathbf{x} \equiv (X_1, X_2, \ldots, X_n)'$ is a random sample,

and we consider the simple null hypothesis

$$H_0: \theta = \theta_0, \ \theta \in \Theta \subset \mathbb{R}^m, \quad \text{against} \quad H_1: \theta \neq \theta_0.$$

3.2.1 The Likelihood Ratio Test (test statistic is calculated under both the null and the alternative hypothesis)

The likelihood ratio test statistic takes the form

$$\lambda(\mathbf{x}) = \frac{L(\theta_0; \mathbf{x})}{\max_{\theta \in \Theta} L(\theta; \mathbf{x})} = \frac{L(\theta_0; \mathbf{x})}{L(\hat{\theta}; \mathbf{x})},$$

where $\hat{\theta}$ is the MLE of $\theta$. Under certain regularity conditions, which include RC1-RC3 (see Chapter 3), $\log L(\theta; \mathbf{x})$ can be expanded in a Taylor series at $\theta = \hat{\theta}$:

$$\log L(\theta; \mathbf{x}) \approx \log L(\hat{\theta}; \mathbf{x}) + \frac{\partial \log L(\hat{\theta}; \mathbf{x})}{\partial \theta'}(\theta - \hat{\theta}) + \frac{1}{2}(\theta - \hat{\theta})' \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta \partial \theta'}(\theta - \hat{\theta})$$

(see Alpha Chiang (1984), p. 261, Lagrange form of the remainder). Since $\partial \log L(\hat{\theta}; \mathbf{x})/\partial \theta = 0$, $\hat{\theta}$ being the solution of the first-order condition of the MLE, and

$$-\frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta \partial \theta'} \xrightarrow{p} \mathbf{I}_n(\theta)$$

(see, for example, Greene, pp. 131-132), the above expansion can be simplified to

$$\log L(\theta; \mathbf{x}) \approx \log L(\hat{\theta}; \mathbf{x}) - \frac{1}{2}(\hat{\theta} - \theta)' \mathbf{I}_n(\theta)(\hat{\theta} - \theta).$$

This implies that under the null hypothesis $H_0: \theta = \theta_0$,

$$-2 \log \lambda(\mathbf{x}) = 2[\log L(\hat{\theta}; \mathbf{x}) - \log L(\theta_0; \mathbf{x})] \approx (\hat{\theta} - \theta_0)' \mathbf{I}_n(\theta)(\hat{\theta} - \theta_0).$$

From the asymptotic properties of the MLE it is known that under certain regularity conditions

$$(\hat{\theta} - \theta_0) \sim N(0, \mathbf{I}_n^{-1}(\theta)) \quad \text{asymptotically under } H_0.$$

Using this we can deduce that

$$LR = -2 \log \lambda(\mathbf{x}) \approx (\hat{\theta} - \theta_0)' \mathbf{I}_n(\theta)(\hat{\theta} - \theta_0) \overset{H_0}{\sim} \chi^2(m),$$

being a quadratic form in asymptotically normal random variables.

3.2.2 The Wald Test (test statistic is computed under $H_1$)

Wald (1943), using the above approximation of $-2 \log \lambda(\mathbf{x})$, proposed an alternative test statistic by replacing $\mathbf{I}_n(\theta)$ with $\mathbf{I}_n(\hat{\theta})$:

$$W = (\hat{\theta} - \theta_0)' \mathbf{I}_n(\hat{\theta})(\hat{\theta} - \theta_0) \overset{H_0}{\sim} \chi^2(m),$$

given that $\mathbf{I}_n(\hat{\theta}) \xrightarrow{p} \mathbf{I}_n(\theta)$. ($\mathbf{I}_n(\hat{\theta})$ can be any one of the three estimators of the asymptotic variance of the MLE; see Section 3.3.6 of Chapter 3.)

3.2.3 The Lagrange Multiplier Test (test statistic is computed under $H_0$)

Rao (1947) proposed the LM test. Expanding the score function of $\log L(\theta; \mathbf{x})$ (i.e. $\partial \log L(\theta; \mathbf{x})/\partial \theta$) around $\hat{\theta}$ we have

$$\frac{\partial \log L(\theta; \mathbf{x})}{\partial \theta} \approx \frac{\partial \log L(\hat{\theta}; \mathbf{x})}{\partial \theta} + \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta \partial \theta'}(\theta - \hat{\theta}).$$

As in the LR test, $\partial \log L(\hat{\theta}; \mathbf{x})/\partial \theta = 0$, and the above equation reduces to

$$\frac{\partial \log L(\theta; \mathbf{x})}{\partial \theta} \approx \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta \partial \theta'}(\theta - \hat{\theta}).$$

Now we consider the test statistic under the null hypothesis $H_0: \theta = \theta_0$,

$$LM = \left( \frac{\partial \log L(\theta_0; \mathbf{x})}{\partial \theta} \right)' \mathbf{I}_n^{-1}(\theta_0) \left( \frac{\partial \log L(\theta_0; \mathbf{x})}{\partial \theta} \right),$$

which is equivalent to

$$LM \approx (\theta_0 - \hat{\theta})' \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta \partial \theta'} \, \mathbf{I}_n^{-1}(\theta_0) \, \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta \partial \theta'} (\theta_0 - \hat{\theta}).$$

Given that $-\partial^2 \log L(\hat{\theta}; \mathbf{x})/\partial \theta \partial \theta' \xrightarrow{p} \mathbf{I}_n(\theta)$ and that under $H_0$, $\mathbf{I}_n(\theta_0) \approx \mathbf{I}_n(\theta)$, we have

$$LM = \left( \frac{\partial \log L(\theta_0; \mathbf{x})}{\partial \theta} \right)' \mathbf{I}_n^{-1}(\theta_0) \left( \frac{\partial \log L(\theta_0; \mathbf{x})}{\partial \theta} \right) \approx (\theta_0 - \hat{\theta})' \mathbf{I}_n(\theta)(\theta_0 - \hat{\theta}) \overset{H_0}{\sim} \chi^2(m),$$

as in the proof of the LR test.
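
To make the three statistics concrete, here is a minimal sketch (not an example from the notes): an i.i.d. Poisson($\theta$) sample with hypothetical data, for which $\hat{\theta} = \bar{x}$, the score at $\theta_0$ is $n(\bar{x}/\theta_0 - 1)$ and $\mathbf{I}_n(\theta) = n/\theta$, so all three statistics have closed forms.

```python
# Minimal sketch (hypothetical data): LR, Wald and LM statistics for
# H0: theta = theta0 with an i.i.d. Poisson(theta) sample, where the MLE
# is theta_hat = x_bar, the score is n*(x_bar/theta - 1) and I_n(theta) = n/theta.
# All three are compared with the chi-square(1) 5% critical value.
import math
from scipy.stats import chi2

x = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4]          # hypothetical data
n, theta0 = len(x), 3.0
x_bar = sum(x) / n                           # MLE theta_hat

LR = 2 * n * (x_bar * math.log(x_bar / theta0) - (x_bar - theta0))
W = n * (x_bar - theta0) ** 2 / x_bar        # uses I_n evaluated at theta_hat
LM = n * (x_bar - theta0) ** 2 / theta0      # uses the score and I_n at theta0

crit = chi2.ppf(0.95, df=1)                  # ~= 3.84
for name, stat in (("LR", LR), ("Wald", W), ("LM", LM)):
    print(f"{name} = {stat:.3f}, reject H0: {stat >= crit}")
```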

Example: Reproduce the results of the example on p. 157 of Greene (4th ed.), or Table 17.1 on p. 490 of Greene (5th ed.).

Example: Of particular interest in practice is the case where $\theta \equiv (\theta_1', \theta_2')'$ and $H_0: \theta_1 = \theta_1^0$ against $H_1: \theta_1 \neq \theta_1^0$, where $\theta_1$ is $r \times 1$ and $\theta_2$, of dimension $(m - r) \times 1$, is left unrestricted. In this case the three test statistics take the form

$$LR = -2[\ln L(\tilde{\theta}; \mathbf{x}) - \ln L(\hat{\theta}; \mathbf{x})];$$

$$W = (\hat{\theta}_1 - \theta_1^0)' [\mathbf{I}_{11}(\hat{\theta}) - \mathbf{I}_{12}(\hat{\theta}) \mathbf{I}_{22}^{-1}(\hat{\theta}) \mathbf{I}_{21}(\hat{\theta})] (\hat{\theta}_1 - \theta_1^0);$$

$$LM = \mu(\tilde{\theta})' [\mathbf{I}_{11}(\tilde{\theta}) - \mathbf{I}_{12}(\tilde{\theta}) \mathbf{I}_{22}^{-1}(\tilde{\theta}) \mathbf{I}_{21}(\tilde{\theta})]^{-1} \mu(\tilde{\theta}),$$

where $\hat{\theta} \equiv (\hat{\theta}_1', \hat{\theta}_2')'$, $\tilde{\theta} \equiv (\theta_1^{0\prime}, \tilde{\theta}_2')'$, $\tilde{\theta}_2$ is the solution to

$$\left. \frac{\partial \ln L(\theta; \mathbf{x})}{\partial \theta_2} \right|_{\theta_1 = \theta_1^0} = 0,$$

and

$$\mu(\tilde{\theta}) = \left. \frac{\partial \ln L(\theta; \mathbf{x})}{\partial \theta_1} \right|_{\theta_1 = \theta_1^0,\, \theta_2 = \tilde{\theta}_2}.$$

Proof: Since $\hat{\theta} \sim N(\theta, \mathbf{I}^{-1}(\hat{\theta}))$ asymptotically, under $H_0: \theta_1 = \theta_1^0$ we have, from the partitioned inverse rule,

$$\hat{\theta}_1 \sim N\left(\theta_1^0, \ [\mathbf{I}_{11}(\hat{\theta}) - \mathbf{I}_{12}(\hat{\theta}) \mathbf{I}_{22}^{-1}(\hat{\theta}) \mathbf{I}_{21}(\hat{\theta})]^{-1}\right).$$

That is, the Wald test statistic is

$$W = (\hat{\theta}_1 - \theta_1^0)' [\mathbf{I}_{11}(\hat{\theta}) - \mathbf{I}_{12}(\hat{\theta}) \mathbf{I}_{22}^{-1}(\hat{\theta}) \mathbf{I}_{21}(\hat{\theta})] (\hat{\theta}_1 - \theta_1^0).$$

By definition the LM test statistic is

$$LM = \left( \frac{\partial \log L(\tilde{\theta}; \mathbf{x})}{\partial \theta} \right)' \mathbf{I}_n^{-1}(\tilde{\theta}) \left( \frac{\partial \log L(\tilde{\theta}; \mathbf{x})}{\partial \theta} \right),$$

but

$$\frac{\partial \log L(\tilde{\theta}; \mathbf{x})}{\partial \theta} = \begin{pmatrix} \partial \log L(\tilde{\theta}; \mathbf{x})/\partial \theta_1 \\ \partial \log L(\tilde{\theta}; \mathbf{x})/\partial \theta_2 \end{pmatrix} = \begin{pmatrix} \mu(\tilde{\theta}) \\ 0 \end{pmatrix};$$

using the partitioned inverse rule we have

$$LM = \mu(\tilde{\theta})' [\mathbf{I}_{11}(\tilde{\theta}) - \mathbf{I}_{12}(\tilde{\theta}) \mathbf{I}_{22}^{-1}(\tilde{\theta}) \mathbf{I}_{21}(\tilde{\theta})]^{-1} \mu(\tilde{\theta}).$$

The proof for the LR test statistic is straightforward.

Reminder: for a general $2 \times 2$ partitioned matrix,

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} F_1 & -F_1 A_{12} A_{22}^{-1} \\ -A_{22}^{-1} A_{21} F_1 & A_{22}^{-1}(I + A_{21} F_1 A_{12} A_{22}^{-1}) \end{bmatrix},$$

where $F_1 = (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1}$.
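
The partitioned-inverse reminder can be verified numerically; the following is a quick sketch assuming NumPy is available, with an arbitrary positive definite matrix standing in for the information matrix.

```python
# Numerical check of the partitioned-inverse reminder above: build a random
# positive definite matrix, partition it into blocks A11, A12, A21, A22,
# and compare the block formula with a direct inverse.
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)                  # symmetric positive definite, hence invertible
r = 2                                        # size of the (1,1) block
A11, A12 = A[:r, :r], A[:r, r:]
A21, A22 = A[r:, :r], A[r:, r:]

A22i = np.linalg.inv(A22)
F1 = np.linalg.inv(A11 - A12 @ A22i @ A21)
top = np.hstack([F1, -F1 @ A12 @ A22i])
bottom = np.hstack([-A22i @ A21 @ F1, A22i @ (np.eye(5 - r) + A21 @ F1 @ A12 @ A22i)])
block_inverse = np.vstack([top, bottom])

print(np.allclose(block_inverse, np.linalg.inv(A)))   # True
```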