Some General Types of Tests

Some General Types of Tests We may not be able to find a UMP or UMPU test in a given situation. In that case, we may use a test from some general class of tests that often have good asymptotic properties.

Likelihood Ratio Tests We see that the Neyman-Pearson Lemma leads directly to use of the ratio of the likelihoods in constructing tests. Now we want to generalize this approach and study the properties of tests based on that ratio. Although, as we have emphasized, the likelihood is a function of the distribution rather than of the random variable, we want to study its properties under the distribution of the random variable.

Likelihood Ratio Tests Using the idea of the ratio as in the test of H₀: θ ∈ Θ₀, but inverting that ratio and including both hypotheses in the denominator, we define the likelihood ratio as
λ(X) = sup_{θ∈Θ₀} L(θ; X) / sup_{θ∈Θ} L(θ; X).
The test rejects H₀ if λ(X) < c, where c is some value in [0, 1]. (The inequality goes in the opposite direction because we have inverted the ratio.) Tests such as this are called likelihood ratio tests.
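
As a concrete illustration (not from the notes), here is a minimal sketch of λ(X) for i.i.d. N(θ, 1) data with Θ₀ = {θ₀} and Θ = IR; the data, seed, and θ₀ are made up, and the supremum over Θ is attained at the sample mean.

```python
import numpy as np

# Minimal sketch: the likelihood ratio lambda(X) for i.i.d. N(theta, 1) data,
# testing H0: theta = theta0 against theta unrestricted (illustrative numbers).
rng = np.random.default_rng(42)
x = rng.normal(loc=0.3, scale=1.0, size=50)
theta0 = 0.0

def loglik(theta, sample):
    # log L(theta; x) for the N(theta, 1) model, up to an additive constant
    return -0.5 * np.sum((sample - theta) ** 2)

# sup over Theta_0 = {theta0} is at theta0; sup over Theta is at the MLE, xbar
lam = np.exp(loglik(theta0, x) - loglik(x.mean(), x))
print(lam)  # lambda(X) lies in [0, 1]; small values are evidence against H0
```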

Likelihood Ratio Tests We should note that there are other definitions of a likelihood ratio; in particular, in TSH3 its denominator is the sup over the alternative hypothesis. If the alternative hypothesis does not specify Θ \ Θ₀, such a definition requires specification of both H₀ and H₁, whereas the definition above requires specification only of H₀. The likelihood ratio may not exist, but if it is well defined, clearly it is in the interval [0, 1]; values close to 1 provide evidence that the null hypothesis is true, and values close to 0 provide evidence that it is false.

Asymptotic Likelihood Ratio Tests Some of the most important properties of LR tests are asymptotic ones. There are various ways of using the likelihood to build practical tests. Some are asymptotic tests that use MLEs (or RLEs).

Asymptotic Properties of Tests For use of asymptotic approximations for hypothesis testing, we need a concept of asymptotic significance. We assume a family of distributions P and a sequence of statistics {δ_n} based on a random sample X₁, ..., X_n. In hypothesis testing, the standard setup is that we have an observable random variable with a distribution in the family P. Our hypotheses concern a specific member P ∈ P. We want to test H₀: P ∈ P₀ versus H₁: P ∈ P₁, where P₀ ⊂ P, P₁ ⊂ P, and P₀ ∩ P₁ = ∅.

Asymptotic Properties of Tests Letting β(δ_n, P) = Pr(δ_n = 1), we define lim sup_n β(δ_n, P), P ∈ P₀, if it exists, as the asymptotic size of the test. If lim sup_n β(δ_n, P) ≤ α for all P ∈ P₀, then α is an asymptotic significance level of the test. δ_n is consistent for the test iff lim sup_n (1 − β(δ_n, P)) = 0 for all P ∈ P₁. δ_n is Chernoff-consistent for the test iff δ_n is consistent and, furthermore, lim sup_n β(δ_n, P) = 0 for all P ∈ P₀. The asymptotic distribution of a maximum of a likelihood is a chi-squared, and the ratio of two is asymptotically an F.

Regularity Conditions The interesting asymptotic properties of LR tests depend on the Le Cam regularity conditions, which go slightly beyond the Fisher information regularity conditions. These are the conditions to ensure that superefficiency can only occur over a set of Lebesgue measure 0 (Shao Theorem 4.16), the asymptotic efficiency of RLEs (Shao Theorem 4.17), and the chi-squared asymptotic significance of LR tests (Shao Theorem 6.5).

Asymptotic Significance of LR Tests We consider a general form of the null hypothesis
H₀: R(θ) = 0
versus the alternative
H₁: R(θ) ≠ 0,
for a continuously differentiable function R(θ) from IR^k to IR^r. (Shao's notation, H₀: θ = g(ϑ) where ϑ is a (k − r)-vector, although slightly different, is equivalent.)

Asymptotic Significance of LR Tests The key result is Theorem 6.5 in Shao, which, assuming the Le Cam regularity conditions, says that under H₀,
−2 log(λ_n) →_d χ²_r,
where χ²_r is a random variable with a chi-squared distribution with r degrees of freedom and r is the number of elements in R(θ). (In the simple case, r is the number of equations in the null hypothesis.) This allows us to determine the asymptotic significance of an LR test. It is also the basis for constructing asymptotically correct confidence sets.
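
As a quick check of this approximation (again in the illustrative N(θ, 1) setting, where −2 log λ_n has the closed form n(X̄ − θ₀)² and r = 1), the following sketch simulates data under H₀ and compares the rejection rate at the χ²₁ critical value with the nominal α; the simulation settings are made up.

```python
import numpy as np
from scipy import stats

# Sketch: Monte Carlo check that -2 log(lambda_n) is approximately chi-squared(1)
# under H0 for i.i.d. N(theta, 1) data (illustrative settings).
rng = np.random.default_rng(0)
theta0, n, alpha = 0.0, 100, 0.05
lr_stats = []
for _ in range(5000):
    x = rng.normal(theta0, 1.0, size=n)            # data generated under H0
    lr_stats.append(n * (x.mean() - theta0) ** 2)  # -2 log lambda_n in closed form
crit = stats.chi2.ppf(1 - alpha, df=1)
print(np.mean(np.array(lr_stats) > crit))          # empirical size; should be near alpha
```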

Wald Tests and Score Tests There are two types of tests that arise from likelihood ratio tests. These are called Wald tests and score tests. Score tests are also called Rao tests or Lagrange multiplier tests. These tests are asymptotically equivalent. They are consistent under the Le Cam regularity conditions, and they are Chernoff-consistent if the level α_n is chosen so that, as n → ∞, α_n → 0 and χ²_{r,α_n} = o(n).

Wald Tests The Wald test uses the test statistic
W_n = (R(θ̂))ᵀ ( (S(θ̂))ᵀ (I_n(θ̂))⁻¹ S(θ̂) )⁻¹ R(θ̂),
where S(θ) = ∂R(θ)/∂θ and I_n(θ) is the Fisher information matrix, and these two quantities are evaluated at an MLE or RLE θ̂. The test rejects the null hypothesis when this value is large. Notice that for the simple hypothesis H₀: θ = θ₀, this simplifies to (θ̂ − θ₀)ᵀ I_n(θ̂) (θ̂ − θ₀).

Wald Tests An asymptotic test can be constructed because W_n →_d Y, where Y ∼ χ²_r and r is the number of elements in R(θ). (This is proved in Theorem 6.6 of Shao, page 434.) The test rejects at the α level if W_n > χ²_{r,1−α}, where χ²_{r,1−α} is the 1 − α quantile of the chi-squared distribution with r degrees of freedom. (Note that Shao denotes this quantity as χ²_{r,α}.)
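
A hedged sketch of the simple-hypothesis form (θ̂ − θ₀)ᵀ I_n(θ̂)(θ̂ − θ₀) and the rejection rule above, for an i.i.d. Poisson(θ) model in which the MLE is the sample mean and I_n(θ) = n/θ; the data and θ₀ are illustrative.

```python
import numpy as np
from scipy import stats

# Sketch of the Wald statistic for the simple hypothesis H0: theta = theta0
# in an i.i.d. Poisson(theta) model, where I_n(theta) = n / theta.
rng = np.random.default_rng(1)
x = rng.poisson(lam=2.3, size=200)               # illustrative data
theta0, alpha = 2.0, 0.05
theta_hat = x.mean()                             # MLE
W_n = (theta_hat - theta0) ** 2 * len(x) / theta_hat
reject = W_n > stats.chi2.ppf(1 - alpha, df=1)   # r = 1 restriction
print(W_n, reject)
```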

Score Tests A related test is the Rao score test, sometimes called a Lagrange multiplier test. It is based on an MLE or RLE θ̃ computed under the restriction that R(θ) = 0, and it rejects H₀ when the following is large:
R_n = (s_n(θ̃))ᵀ (I_n(θ̃))⁻¹ s_n(θ̃),
where s_n(θ) = ∂l_L(θ)/∂θ is called the score function. The information matrix can be either the Fisher information matrix (that is, the expected values of the derivatives) evaluated at the RLE or the observed information matrix, in which the observed values are used instead of the expected values. An asymptotic test can be constructed because R_n →_d Y, where Y ∼ χ²_r and r is the number of elements in R(θ). This is proved in Theorem 6.6(ii) of Shao.
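
For comparison, a sketch of the score statistic in the same illustrative Poisson setting; under H₀: θ = θ₀ the restricted estimate is θ₀ itself, s_n(θ) = Σ x_i/θ − n, and I_n(θ) = n/θ.

```python
import numpy as np
from scipy import stats

# Sketch of the Rao score (Lagrange multiplier) statistic for H0: theta = theta0
# in an i.i.d. Poisson(theta) model (illustrative data, same setting as above).
rng = np.random.default_rng(2)
x = rng.poisson(lam=2.3, size=200)
theta0, alpha = 2.0, 0.05
score = x.sum() / theta0 - len(x)            # s_n evaluated at the restricted estimate
R_n = score ** 2 / (len(x) / theta0)         # s_n' I_n(theta0)^{-1} s_n
reject = R_n > stats.chi2.ppf(1 - alpha, df=1)
print(R_n, reject)
```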

Example 1: tests in a linear model Consider a general regression model: X_i = f(z_i, β) + ε_i, where the ε_i are i.i.d. N(0, σ²). For a given r × k matrix L, we want to test H₀: Lβ = β₀. Let X be the sample (it is an n-vector), and let Z be the matrix whose rows are the z_i. The log-likelihood is
log l(β; X) = c(σ²) − (1/(2σ²)) (X − f(Z, β))ᵀ (X − f(Z, β)).
The MLE is the LSE, β̂.

Let β̃ be the maximizer of the log-likelihood under the restriction Lβ = β₀. The log of the likelihood ratio is the difference in the log-likelihoods. The maximum of the unrestricted log-likelihood (minus a constant) is determined by the minimized residual sum of squares:
−(1/(2σ²)) (X − f(Z, β̂))ᵀ (X − f(Z, β̂)) = −(1/(2σ²)) SSE(β̂),
and likewise, for the restricted maximum,
−(1/(2σ²)) (X − f(Z, β̃))ᵀ (X − f(Z, β̃)) = −(1/(2σ²)) SSE(β̃).

Now, the difference
(SSE(β̃) − SSE(β̂)) / σ²
has an asymptotic χ²(r) distribution. (Note that the 2 goes away.) We also have that SSE(β̂)/σ² has an asymptotic χ²(n − k) distribution. So for the likelihood ratio test we get an F-type statistic:
( (SSE(β̃) − SSE(β̂)) / r ) / ( SSE(β̂) / (n − k) ).
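
A small sketch of this F-type statistic in the linear case f(z, β) = zᵀβ, testing the single restriction β₂ = 0 (so r = 1); the design, coefficients, and seed are illustrative, and the restricted fit is obtained simply by dropping the corresponding column.

```python
import numpy as np
from scipy import stats

# Sketch: (SSE(beta_tilde) - SSE(beta_hat))/r over SSE(beta_hat)/(n - k)
# for a linear model, testing beta_2 = 0 (illustrative data).
rng = np.random.default_rng(3)
n, k, r = 100, 3, 1
Z = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
X = Z @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

def sse(design, y):
    # residual sum of squares from an ordinary least-squares fit
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ b) ** 2)

sse_unres = sse(Z, X)          # SSE(beta_hat), unrestricted fit
sse_res = sse(Z[:, :2], X)     # SSE(beta_tilde), fit with beta_2 forced to 0
F = ((sse_res - sse_unres) / r) / (sse_unres / (n - k))
print(F, F > stats.f.ppf(0.95, r, n - k))
```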

Use the unrestricted MLE β̂ and consider Lβ̂ − β₀. We have
V(β̂) ≈ (J_f(β̂)ᵀ J_f(β̂))⁻¹ σ²,
where J_f(β̂) is the n × k Jacobian matrix, and so
V(Lβ̂) ≈ L (J_f(β̂)ᵀ J_f(β̂))⁻¹ Lᵀ σ².
Hence, we can write an asymptotic χ²(r) statistic as
(Lβ̂ − β₀)ᵀ ( L (J_f(β̂)ᵀ J_f(β̂))⁻¹ Lᵀ s² )⁻¹ (Lβ̂ − β₀).

We can form a Wishart-type statistic from this. If r = 1, L is just a vector (the linear combination), and we can take the square root and form a pseudo-t:
(Lᵀβ̂ − β₀) / ( s ( Lᵀ (J_f(β̂)ᵀ J_f(β̂))⁻¹ L )^{1/2} ).
Get the MLE with the restriction Lβ = β₀ using a Lagrange multiplier λ of length r. Minimize
(1/(2σ²)) (X − f(Z, β))ᵀ (X − f(Z, β)) + (1/σ²) (Lβ − β₀)ᵀ λ.

To minimize, differentiate and set equal to 0:
−J_f(β̃)ᵀ (X − f(Z, β̃)) + Lᵀ λ = 0,
Lβ̃ − β₀ = 0.
J_f(β̃)ᵀ (X − f(Z, β̃)) is called the score vector. It is of length k. Now V(X − f(Z, β̃)) ≈ σ² I_n, so the variance of the score vector, and hence also of Lᵀλ, goes to σ² J_f(β)ᵀ J_f(β). (Note this is the true β in this expression.) Estimate the variance of the score vector with σ̃² J_f(β̃)ᵀ J_f(β̃), where σ̃² = SSE(β̃)/(n − k + r). Hence, we use Lᵀλ and its estimated variance (previous slide). We get
(1/σ̃²) λᵀ L (J_f(β̃)ᵀ J_f(β̃))⁻¹ Lᵀ λ.

It is asymptotically χ²(r). This is the Lagrange multiplier form. Another form: use J_f(β̃)ᵀ (X − f(Z, β̃)) in place of Lᵀλ. We get
(1/σ̃²) (X − f(Z, β̃))ᵀ J_f(β̃) (J_f(β̃)ᵀ J_f(β̃))⁻¹ J_f(β̃)ᵀ (X − f(Z, β̃)).
This is the score form. Except for the method of computing it, it is the same as the Lagrange multiplier form. This is the SSReg in the AOV for a regression model.
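
A sketch of the score form in the same illustrative linear setting, where the Jacobian J_f is simply the design matrix Z: it projects the restricted residuals onto the column space of Z and scales by σ̃² = SSE(β̃)/(n − k + r).

```python
import numpy as np
from scipy import stats

# Sketch of the score (Lagrange multiplier) form for the linear model above,
# testing beta_2 = 0 (so r = 1); the Jacobian of f(z, beta) = z'beta is Z itself.
rng = np.random.default_rng(3)
n, k, r = 100, 3, 1
Z = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
X = Z @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

b_res, *_ = np.linalg.lstsq(Z[:, :2], X, rcond=None)   # restricted fit (beta_2 = 0)
e = X - Z[:, :2] @ b_res                                # restricted residuals
sigma2 = e @ e / (n - k + r)                            # sigma_tilde^2 = SSE(beta_tilde)/(n - k + r)
proj = Z @ np.linalg.solve(Z.T @ Z, Z.T @ e)            # J (J'J)^{-1} J' e with J = Z
score_stat = (e @ proj) / sigma2
print(score_stat, score_stat > stats.chi2.ppf(0.95, df=r))
```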

Example 2: an anomalous score test Morgan, Palmer, and Ridout (2007) illustrate some interesting issues using a simple example of counts of numbers of stillbirths in each of a sample of litters of laboratory animals. They suggest that a zero-inflated Poisson is an appropriate model. This distribution is an ω mixture of a point mass at 0 and a Poisson distribution. The CDF (in a notation we will use often later) is
P_{0,ω}(x|λ) = (1 − ω) P(x|λ) + ω I_{[0,∞[}(x),
where P(x|λ) is the Poisson CDF with parameter λ. (Write the PDF (under the counting measure). Is this a reasonable probability model? What are the assumptions? Do the litter sizes matter?)
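
In answer to the parenthetical question, here is a sketch of the corresponding PMF with respect to counting measure: p(0) = ω + (1 − ω)e^{−λ} and p(x) = (1 − ω)e^{−λ}λ^x/x! for x = 1, 2, ...; the parameter values in the example call are illustrative.

```python
import numpy as np
from scipy import stats

# Sketch of the zero-inflated Poisson PMF (density with respect to counting measure).
def zip_pmf(x, omega, lam):
    pois = stats.poisson.pmf(x, lam)
    # extra mass omega at zero; Poisson part downweighted by (1 - omega)
    return np.where(x == 0, omega + (1 - omega) * pois, (1 - omega) * pois)

x = np.arange(6)
print(zip_pmf(x, omega=0.2, lam=0.46))   # illustrative parameter values
```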

If we denote the number of litters in which the number of observed stillbirths is i by n_i, the log-likelihood function is (up to an additive constant)
l(ω, λ) = n₀ log(ω + (1 − ω)e^{−λ}) + Σ_{i≥1} n_i log(1 − ω) − Σ_{i≥1} n_i λ + Σ_{i≥1} i n_i log(λ).
Suppose we want to test the null hypothesis that ω = 0. The score test has the form sᵀ J⁻¹ s, where s is the score vector and J is either the observed or the expected information matrix. For each we substitute ω = 0 and λ = λ̂₀, where λ̂₀ = Σ_{i≥1} i n_i / n with n = Σ_{i≥0} n_i, which is the MLE of λ when ω = 0.

Let n₊ = Σ_{i≥1} n_i and d = Σ_{i≥0} i n_i. The frequency of 0s is important. Let f₀ = n₀/n. Taking the derivatives and setting ω = 0, we have
∂l/∂ω = n₀ e^{λ} − n,
∂l/∂λ = −n + d/λ,
∂²l/∂ω² = −n − n₀ e^{2λ} + 2 n₀ e^{λ},

continued... and
∂²l/∂ω∂λ = n₀ e^{λ},
∂²l/∂λ² = −d/λ².
So, substituting the observed data and the restricted MLE, we have the observed information matrix
O(0, λ̂₀) = n [ 1 + f₀e^{2λ̂₀} − 2f₀e^{λ̂₀}    −f₀e^{λ̂₀} ]
              [ −f₀e^{λ̂₀}                    1/λ̂₀      ].
Now, for the expected information matrix when ω = 0, we first observe that E(n₀) = ne^{−λ}, E(d) = nλ, and E(n₊) = n(1 − e^{−λ}); hence
I(0, λ̂₀) = n [ e^{λ̂₀} − 1    −1   ]
              [ −1            1/λ̂₀ ].

continued... Hence, the score test statistic can be written as
κ(λ̂₀) (n₀ e^{λ̂₀} − n)²,
where κ(λ̂₀) is the (1,1) element of the inverse of either O(0, λ̂₀) or I(0, λ̂₀).

Inverting the matrices (they are 2 × 2), we have as the test statistic for the score test either
s_O = n e^{−λ̂₀} (1 − θ)² / ( e^{−λ̂₀} + θ − 2θe^{−λ̂₀} − θ² λ̂₀ e^{−λ̂₀} )
or
s_I = n e^{−λ̂₀} (1 − θ)² / ( 1 − e^{−λ̂₀} − λ̂₀ e^{−λ̂₀} ),
where θ = f₀ e^{λ̂₀}, which is the ratio of the observed proportion of 0 counts to the estimated probability of a zero count under the Poisson model. (If n₀ is actually the number expected under the Poisson model, then θ = 1.)

Now consider the actual data reported by Morgan, Palmer, and Ridout (2007) for stillbirths in each litter of a sample of 402 litters of laboratory animals.

No. stillbirths   0    1    2    3    4    5    6    7    8    9   10   11
No. litters     314   48   20    7    5    2    2    1    2    0    0    1

For these data, we have n = 402, d = 185, λ̂₀ = 0.4602, e^{−λ̂₀} = 0.6312, and θ = 1.2376. What is interesting is the difference in s_I and s_O. In this particular example, if all n_i for i ≥ 1 are held constant at the observed values, but different values of n₀ are considered, then as n₀ increases the ratio s_I/s_O increases from about 1/4 to 1 (when n₀ is the number expected under the Poisson model, i.e., θ = 1), and then decreases, actually becoming negative (around n₀ = 100).
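
The quantities above can be reproduced directly from the table; the following sketch evaluates λ̂₀, θ, and the two score statistics s_O and s_I using the formulas on the previous slide (for these data the observed-information denominator turns out negative, which is the anomaly discussed next).

```python
import numpy as np

# Sketch: evaluate s_O and s_I for the stillbirth data, with theta = f0 * exp(lambda0_hat).
counts = np.arange(12)                                   # numbers of stillbirths 0..11
litters = np.array([314, 48, 20, 7, 5, 2, 2, 1, 2, 0, 0, 1])
n = litters.sum()                                        # 402 litters
d = (counts * litters).sum()                             # 185 stillbirths in total
f0 = litters[0] / n                                      # observed proportion of zeros
lam0 = d / n                                             # restricted MLE of lambda
theta = f0 * np.exp(lam0)
e = np.exp(-lam0)
num = n * e * (1 - theta) ** 2
s_I = num / (1 - e - lam0 * e)
s_O = num / (e + theta - 2 * theta * e - theta ** 2 * lam0 * e)   # can be negative here
print(lam0, theta, s_I, s_O)
```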

This example illustrates an interesting case. The score test is inconsistent because the observed information generates negative variance estimates at the MLE under the null hypothesis. The score test can also be inconsistent if the expected likelihood equation has spurious roots.

Sequential Tests In the simplest formulation of statistical hypothesis testing, corresponding to the setup of the Neyman-Pearson lemma, we test a given hypothesized distribution versus another given distribution. After setting some ground rules regarding the probability of falsely rejecting the null hypothesis, we determined the optimal test in the case of simple hypotheses; we then determined more general optimal tests in cases for which they exist, and, for other cases, optimal tests among classes of tests that have certain desirable properties. In some cases, the tests involved regions within the sample space in which the decision between the two hypotheses was made randomly; that is, based on a random process over and above the randomness of the distributions of interest.

Another logical approach to take when the data generated by the process of interest does not lead to a clear decision is to decide to take more observations. Recognizing at the outset that this is a possibility, we may decide to design the test as a sequential procedure. We take a small number of observations, and if the evidence is strong enough either to accept the null hypothesis or to reject it, the test is complete and we make the appropriate decision. On the other hand, if the evidence from the small sample is not strong enough, we take some additional observations and perform the test again. We repeat these steps as necessary.

Multiple Tests There are several ways to measure the error rate. Letting m be the number of tests, F be the number of false positives, T be the number of true positives, and S = F + T be the total number of discoveries, or rejected null hypotheses, we define three error measures: the per-comparison error rate, the familywise error rate, and the false discovery rate,
PCER = E(F)/m;
FWER = Pr(F ≥ 1);
FDR = E(F/S),
where F/S is taken to be 0 when S = 0.

The Benjamini-Hochberg (BH) method for controlling the FDR works as follows. First, order the m p-values from the tests: p_(1) ≤ ··· ≤ p_(m). Then determine a threshold value for rejection by finding the largest integer j such that p_(j) ≤ jα/m. Finally, reject any hypothesis whose p-value is smaller than or equal to p_(j). Benjamini and Hochberg (1995) prove that this procedure is guaranteed to force FDR ≤ α. Storey (2003) proposed using, for any hypothesis (feature), the proportion of false positives incurred, on average, when that feature defines the threshold value; this q-value can be calculated for each feature under investigation.
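
A minimal sketch of the BH step-up rule just described; the p-values are made up, and the function returns a boolean rejection indicator for each hypothesis in the original order.

```python
import numpy as np

# Sketch of the Benjamini-Hochberg procedure for controlling FDR at level alpha.
def benjamini_hochberg(pvals, alpha=0.05):
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m        # j * alpha / m
    below = p[order] <= thresholds
    if not below.any():
        return np.zeros(m, dtype=bool)                  # no rejections
    j = np.max(np.nonzero(below)[0])                    # largest j with p_(j) <= j*alpha/m
    reject = np.zeros(m, dtype=bool)
    reject[order[: j + 1]] = True                       # reject all p-values <= p_(j)
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.300, 0.740]
print(benjamini_hochberg(pvals, alpha=0.05))
```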