Let us first identify some classes of hypotheses.

simple versus simple:  H_0: θ = θ_0  versus  H_1: θ = θ_1.  (1)

one-sided:  H_0: θ ≤ θ_0  versus  H_1: θ > θ_0.  (2)

two-sided, null on extremes:  H_0: θ ≤ θ_1 or θ ≥ θ_2  versus  H_1: θ_1 < θ < θ_2.  (3)

two-sided, null in center:  H_0: θ_1 ≤ θ ≤ θ_2  versus  H_1: θ < θ_1 or θ > θ_2.  (4)

Whether or not a UMP test exists for the last three kinds of hypotheses depends on the distribution and on the regions Θ_0 and Θ_1; we usually cannot make blanket statements. In a one-parameter exponential family, the case of a two-sided null, hypothesis (3), is addressed by Theorem 6.3 in Shao. Exercises 6.28 and 6.29 in Shao give examples for the case of a two-sided alternative, hypothesis (4).

One of the most interesting cases in which a UMP test cannot exist is when the alternative hypothesis is two-sided, as in hypothesis (4). Hypothesis (4) is essentially equivalent to the pair of hypotheses with a simple null: H_0: θ = θ_0 versus H_1: θ ≠ θ_0. It is easy to see that no UMP test can exist for these hypotheses, and you should reason through this statement to see that it is true. So what can we do? There are basically two ways of approaching the problem: we can restrict the class of tests over which uniformity is required, or we can introduce an additional criterion.

In point estimation, when we realized we could not have an estimator that would uniformly minimize the risk, we required unbiasedness or invariance, or we added some global property of the risk, such as minimum average risk or minimum maximum risk. We might introduce similar criteria for the testing problem. First, let's consider a desirable property that we will call unbiasedness.

Unbiasedness Recall that there are a couple of standard definitions of unbiasedness. If a random variable X has a distribution with parameter θ, a point estimator T(X) of an estimand g(θ) is said to be unbiased if E_θ(T(X)) = g(θ). Although no loss function is specified in this meaning of unbiasedness, it coincides with the loss-based definition given next (L-unbiasedness) when the loss is squared error.

Unbiasedness Another definition of unbiasedness is given with direct reference to a loss function; this is sometimes called L-unbiasedness. The estimator (or, more generally, the procedure) T(X) is said to be L-unbiased under the loss function L if, for all θ and θ̃, E_θ(L(θ, T(X))) ≤ E_θ(L(θ̃, T(X))). Notice the subtle difference between this property and the property of an estimator that may result from an approach in which we seek a minimum-risk estimator, that is, an approach in which we seek to solve the minimization problem min_T E_θ(L(θ, T(X))) for all θ. This latter problem does not have a solution.

UMP Unbiased Tests Unbiasedness in hypothesis testing is the property that the test is more likely to reject the null hypothesis at any point in the parameter space specified by the alternative hypothesis than it is at any point in the parameter space specified by the null hypothesis. More formally, the α-level test δ with power function β_δ(θ) = E_θ(δ(X)) of the hypothesis H_0: θ ∈ Θ_{H0} versus H_1: θ ∈ Θ_{H1} is said to be unbiased if β_δ(θ) ≤ α for all θ ∈ Θ_{H0} and β_δ(θ) ≥ α for all θ ∈ Θ_{H1}. Notice that this unbiasedness depends not only on the hypotheses, but also on the significance level.
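As a quick numerical illustration of these definitions (my own sketch, not part of the notes), the following Python fragment computes the power function of the familiar two-sided z-test for a normal mean with known variance and checks that it equals α on the null and is at least α everywhere else, so the test is unbiased.

# A minimal numerical illustration (not from the notes): the power function of the
# usual two-sided z-test of H_0: theta = theta_0 based on X_1,...,X_n iid N(theta, 1).
# The test rejects when sqrt(n)|Xbar - theta_0| > z_{1-alpha/2}.
import numpy as np
from scipy.stats import norm

def power(theta, theta_0=0.0, n=25, alpha=0.05):
    """beta_delta(theta) = P_theta(reject H_0) for the two-sided z-test."""
    z = norm.ppf(1 - alpha / 2)
    shift = np.sqrt(n) * (theta - theta_0)
    return norm.cdf(-z - shift) + 1 - norm.cdf(z - shift)

thetas = np.linspace(-1, 1, 201)
beta = power(thetas)
print(power(0.0))                     # equals alpha = 0.05 on the null
print(np.all(beta >= 0.05 - 1e-12))   # power >= alpha everywhere: the test is unbiased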

This definition of unbiasedness for a test is L-unbiasedness if the loss function is 0-1. In many cases of interest, the power function β_δ(θ) is a continuous function of θ. In such cases, we may be particularly interested in the power on any common boundary point of Θ_{H0} and Θ_{H1}, that is, on B = Θ̄_{H0} ∩ Θ̄_{H1}, the intersection of the closures of the two regions. The unbiasedness condition of the definition, together with continuity of the power function, implies that β_δ(θ) = α for any θ ∈ B. We recognize this condition in terms of the similar regions that we have previously defined, and we immediately have Theorem 1: An unbiased test with continuous power function is similar on the boundary.

UMP Unbiased (UMPU) Tests We will be interested in UMP tests that are unbiased, that is, in UMPU tests. We first note that if an α-level UMP test exists, it is unbiased, because its power is at least as great as the power of the constant test δ(x) ≡ α (for all x). Hence, any UMP test is automatically UMPU. Unbiasedness becomes relevant when no UMP test exists, such as when the alternative hypothesis is two-sided: H_0: θ = θ_0 versus H_1: θ ≠ θ_0. In that case we may restrict our attention to tests with the desirable property of unbiasedness.

In the following we consider the hypothesis H_0: θ ∈ Θ_{H0} versus H_1: θ ∈ Θ_{H1}, and we seek a test that is UMP within the restricted class of unbiased tests. We will also restrict our attention to hypotheses for which B = Θ̄_{H0} ∩ Θ̄_{H1}, (5) and to tests whose power functions are continuous in θ. Theorem 2: Let δ*(X) be an α-level test of hypotheses satisfying (5) that is similar on B and has power function continuous in θ ∈ Θ_{H0} ∪ Θ_{H1}. If δ*(X) is uniformly most powerful among such tests, then δ*(X) is a UMPU test. Proof. Because δ*(X) is uniformly at least as powerful as the constant test δ(x) ≡ α, which belongs to this class, δ*(X) is unbiased. By Theorem 1, every unbiased test with continuous power function is similar on B and so also belongs to the class; hence δ*(X) is uniformly at least as powerful as every such unbiased test, and therefore δ*(X) is a UMPU test.

Use of Theorem 2, when it applies, is one of the simplest ways of determining a UMPU test or, given a test, of showing that it is UMPU. This theorem has immediate applications in tests of hypotheses in exponential families; Theorem 6.4 in Shao summarizes those results. Similar UMPU tests remain so in the presence of nuisance parameters.

Likelihood Ratio Tests We see that the Neyman-Pearson Lemma leads directly to use of the ratio of the likelihoods in constructing tests. Now we want to generalize this approach and to study the properties of tests based on that ratio. Although, as we have emphasized, the likelihood is a function of the distribution rather than of the random variable, we want to study its properties under the distribution of the random variable.

Likelihood Ratio Tests Using the idea of the ratio as in the test of H_0: θ ∈ Θ_0, but inverting that ratio and including both hypotheses in the denominator, we define the likelihood ratio as

λ(X) = sup_{θ∈Θ_0} L(θ; X) / sup_{θ∈Θ} L(θ; X).

The test rejects H_0 if λ(X) < c, where c is some value in [0, 1]. (The inequality goes in the opposite direction because we have inverted the ratio.) Tests such as this are called likelihood ratio tests.
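As a concrete illustration of this definition (a hypothetical example of mine, with assumed names), the following sketch computes λ(X) for H_0: θ = θ_0 about a normal mean with known variance; the supremum over Θ_0 is L(θ_0; X) and the supremum over Θ is attained at the MLE, the sample mean.

# A small sketch: the likelihood ratio lambda(X) for H_0: theta = theta_0 when
# X_1,...,X_n are iid N(theta, sigma^2) with sigma known.  Here
# lambda = exp(loglik(theta_0) - loglik(theta_hat)) = exp(-n (Xbar - theta_0)^2 / (2 sigma^2)).
import numpy as np

def normal_loglik(theta, x, sigma=1.0):
    return -0.5 * np.sum((x - theta) ** 2) / sigma**2 - len(x) * np.log(sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=50)
theta_0 = 0.0
lam = np.exp(normal_loglik(theta_0, x) - normal_loglik(x.mean(), x))
print(lam)   # in [0, 1]; values near 0 are evidence against H_0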

Likelihood Ratio Tests We should note that there are other definitions of a likelihood ratio; in particular, in TSH3 its denominator is the sup over the alternative hypothesis. If the alternative hypothesis does not specify Θ − Θ_0, such a definition requires specification of both H_0 and H_1, whereas the definition above requires specification only of H_0. The likelihood ratio may not exist, but if it is well defined, clearly it is in the interval [0, 1]; values close to 1 provide evidence that the null hypothesis is true, and values close to 0 provide evidence that it is false.

Asymptotic Likelihood Ratio Tests Some of the most important properties of LR tests are asymptotic ones. There are various ways of using the likelihood to build practical tests. Some are asymptotic tests that use MLEs (or RLEs).

Asymptotic Properties of Tests For use of asymptotic approximations for hypothesis testing, we need a concept of asymptotic significance. We assume a family of distributions 𝒫 and a sequence of test statistics {δ_n} based on a random sample X_1, ..., X_n. In hypothesis testing, the standard setup is that we have an observable random variable with a distribution in the family 𝒫. Our hypotheses concern a specific member P ∈ 𝒫. We want to test H_0: P ∈ 𝒫_0 versus H_1: P ∈ 𝒫_1, where 𝒫_0 ⊂ 𝒫, 𝒫_1 ⊂ 𝒫, and 𝒫_0 ∩ 𝒫_1 = ∅.

Asymptotic Properties of Tests Letting β(δ_n, P) = Pr_P(δ_n = 1), we define limsup_n β(δ_n, P) for P ∈ 𝒫_0, if it exists, as the asymptotic size of the test. If limsup_n β(δ_n, P) ≤ α for all P ∈ 𝒫_0, then α is an asymptotic significance level of the test. δ_n is consistent for the test iff limsup_n (1 − β(δ_n, P)) = 0 for all P ∈ 𝒫_1, that is, the power tends to 1 under every alternative. δ_n is Chernoff-consistent for the test iff δ_n is consistent and, furthermore, limsup_n β(δ_n, P) = 0 for all P ∈ 𝒫_0. As we will see, −2 times the log of a maximized likelihood ratio is asymptotically chi-squared, and a ratio of two independent chi-squared variables, each divided by its degrees of freedom, has an F distribution.
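A small Monte Carlo sketch (my own illustration, with assumed settings) of these definitions: for the two-sided one-sample t-test of H_0: θ = 0 with N(θ, 1) data, the estimated rejection probability β(δ_n, P) stays near α under the null and tends to 1 under a fixed alternative, illustrating an asymptotic significance level and consistency.

# Monte Carlo estimate of the rejection probability beta(delta_n, P) of the
# two-sided one-sample t-test under the null (theta = 0) and a fixed alternative.
import numpy as np
from scipy import stats

def rejection_rate(theta, n, alpha=0.05, reps=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(theta, 1.0, size=(reps, n))
    pvals = stats.ttest_1samp(x, 0.0, axis=1).pvalue
    return np.mean(pvals < alpha)

for n in (20, 100, 500):
    # near alpha under the null; increasing toward 1 under theta = 0.3 (consistency)
    print(n, rejection_rate(0.0, n), rejection_rate(0.3, n))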

Regularity Conditions The interesting asymptotic properties of LR tests depend on the Le Cam regularity conditions, which go slightly beyond the Fisher information regularity conditions. These are the conditions that ensure that superefficiency can occur only on a set of Lebesgue measure 0 (Shao Theorem 4.16), that RLEs are asymptotically efficient (Shao Theorem 4.17), and that LR tests have the chi-squared asymptotic distribution that determines their asymptotic significance (Shao Theorem 6.5).

Asymptotic Significance of LR Tests We consider a general form of the null hypothesis, H_0: R(θ) = 0, versus the alternative H_1: R(θ) ≠ 0, for a continuously differentiable function R(θ) from ℝ^k to ℝ^r. (Shao's notation, H_0: θ = g(ϑ) where ϑ is a (k − r)-vector, although slightly different, is equivalent.)

Asymptotic Significance of LR Tests The key result is Theorem 6.5 in Shao, which, assuming the Le Cam regularity conditions, says that under H_0, −2 log(λ_n) →_d χ²_r, where χ²_r is a random variable with a chi-squared distribution with r degrees of freedom and r is the number of elements in R(θ). (In the simple case, r is the number of equations in the null hypothesis.) This allows us to determine the asymptotic significance of an LR test. It is also the basis for constructing asymptotically correct confidence sets.
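As a hedged sketch of how this result is used in practice (my own example, not from Shao), the following computes the asymptotic LR test of H_0: λ = λ_0 for i.i.d. Poisson counts; here r = 1, the unrestricted MLE is the sample mean, and the p-value comes from the χ²_1 approximation to −2 log(λ_n).

# Asymptotic LR test of H_0: lambda = lambda_0 for iid Poisson counts (r = 1).
import numpy as np
from scipy.stats import poisson, chi2

rng = np.random.default_rng(1)
x = rng.poisson(lam=2.4, size=200)
lam0 = 2.0

lam_hat = x.mean()                                  # unrestricted MLE
loglik = lambda lam: poisson.logpmf(x, lam).sum()
lr_stat = -2.0 * (loglik(lam0) - loglik(lam_hat))   # -2 log(lambda_n) >= 0
p_value = chi2.sf(lr_stat, df=1)                    # chi-squared limit with r = 1
print(lr_stat, p_value)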

Wald Tests and Score Tests There are two types of tests that are closely related to likelihood ratio tests. These are called Wald tests and score tests. Score tests are also called Rao tests or Lagrange multiplier tests. These tests are asymptotically equivalent. They are consistent under the Le Cam regularity conditions, and they are Chernoff-consistent if the level α_n is chosen so that, as n → ∞, α_n → 0 and χ²_{r,α_n} = o(n).

Wald Tests The Wald test uses the test statistic

W_n = (R(θ̂))^T [ (S(θ̂))^T (I_n(θ̂))^{-1} S(θ̂) ]^{-1} R(θ̂),

where S(θ) = ∂R(θ)/∂θ and I_n(θ) is the Fisher information matrix, both evaluated at an MLE or RLE θ̂. The test rejects the null hypothesis when this value is large. Notice that for the simple hypothesis H_0: θ = θ_0, this simplifies to (θ̂ − θ_0)^T I_n(θ̂) (θ̂ − θ_0).

Wald Tests An asymptotic test can be constructed because W_n →_d Y, where Y ∼ χ²_r and r is the number of elements in R(θ). (This is proved in Theorem 6.6 of Shao, page 434.) The test rejects at the α level if W_n > χ²_{r,1−α}, where χ²_{r,1−α} is the 1 − α quantile of the chi-squared distribution with r degrees of freedom. (Note that Shao denotes this quantity as χ²_{r,α}.)
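A minimal sketch of the Wald test for a simple hypothesis (my own example): for i.i.d. Poisson data and H_0: λ = λ_0, the Fisher information from n observations is I_n(λ) = n/λ, so W_n = (λ̂ − λ_0)² n/λ̂ with λ̂ the MLE.

# Wald test of the simple hypothesis H_0: lambda = lambda_0 with iid Poisson data.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
x = rng.poisson(lam=2.4, size=200)
lam0, n = 2.0, x.size

lam_hat = x.mean()                         # MLE
W = (lam_hat - lam0) ** 2 * n / lam_hat    # Wald statistic, r = 1
print(W, W > chi2.ppf(0.95, df=1))         # reject at alpha = 0.05 if W exceeds the quantile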

Score Tests A related test is the Rao score test, sometimes called a Lagrange multiplier test. It is based on an MLE or RLE θ̃ computed under the restriction that R(θ) = 0, and it rejects H_0 when the following is large:

R_n = (s_n(θ̃))^T (I_n(θ̃))^{-1} s_n(θ̃),

where s_n(θ) = ∂l_L(θ)/∂θ, the derivative of the log-likelihood, is called the score function. The information matrix can be either the Fisher information matrix (that is, the one based on expected values of the derivatives) evaluated at the RLE, or the observed information matrix, in which the observed values are used instead of expected values. An asymptotic test can be constructed because R_n →_d Y, where Y ∼ χ²_r and r is the number of elements in R(θ). This is proved in Theorem 6.6(ii) of Shao.
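A companion sketch (the same assumed Poisson example as above): under a simple null the restricted MLE is λ_0 itself, the score is s_n(λ_0) = n(x̄/λ_0 − 1), and the Fisher information is I_n(λ_0) = n/λ_0, giving R_n = n(x̄ − λ_0)²/λ_0.

# Rao score test of H_0: lambda = lambda_0 with iid Poisson data.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
x = rng.poisson(lam=2.4, size=200)
lam0, n = 2.0, x.size

score = n * (x.mean() / lam0 - 1.0)        # s_n(lambda_0)
info = n / lam0                            # I_n(lambda_0)
R = score**2 / info                        # = n (xbar - lambda_0)^2 / lambda_0
print(R, chi2.sf(R, df=1))                 # statistic and asymptotic p-value (r = 1)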

Example 1: tests in a linear model. Consider a general regression model: X_i = f(z_i, β) + ε_i, where the ε_i are i.i.d. N(0, σ²). For a given r × k matrix L, we want to test H_0: Lβ = β_0. Let X be the sample (it is an n-vector), and let Z be the matrix whose rows are the z_i. The log-likelihood is log l(β; X) = c(σ²) − (1/(2σ²)) (X − f(Z, β))^T (X − f(Z, β)). The MLE is the LSE, β̂.

Let β̃ be the maximizer of the log-likelihood under the restriction Lβ = β_0. The log of the likelihood ratio is the difference of the two maximized log-likelihoods. The maximum of the unrestricted log-likelihood (apart from the constant) is determined by the minimized residual sum of squares: −(1/(2σ²)) (X − f(Z, β̂))^T (X − f(Z, β̂)) = −(1/(2σ²)) SSE(β̂), and likewise, for the restricted maximum, −(1/(2σ²)) (X − f(Z, β̃))^T (X − f(Z, β̃)) = −(1/(2σ²)) SSE(β̃).

Now the difference, (SSE(β̃) − SSE(β̂))/σ², has an asymptotic χ²(r) distribution. (Note that the 2 goes away: −2 log λ = (SSE(β̃) − SSE(β̂))/σ².) We also have that SSE(β̂)/σ² has an asymptotic χ²(n − k) distribution. So for the likelihood ratio test we get an F-type statistic:

[(SSE(β̃) − SSE(β̂))/r] / [SSE(β̂)/(n − k)].
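The following sketch (my own example with an assumed design) computes this F-type statistic for a linear model X = Zβ + ε, where the restriction is that the last r coefficients are zero, so the restricted fit simply drops those columns.

# F-type statistic from restricted and unrestricted least-squares fits.
import numpy as np

rng = np.random.default_rng(3)
n, k, r = 100, 5, 2
Z = rng.normal(size=(n, k))
beta_true = np.array([1.0, -0.5, 0.3, 0.0, 0.0])   # H_0 happens to hold here
X = Z @ beta_true + rng.normal(size=n)

def sse(Zmat):
    beta_hat, *_ = np.linalg.lstsq(Zmat, X, rcond=None)
    resid = X - Zmat @ beta_hat
    return resid @ resid

sse_full = sse(Z)                 # SSE(beta_hat)
sse_restr = sse(Z[:, : k - r])    # SSE(beta_tilde), fit without the last r columns
F = ((sse_restr - sse_full) / r) / (sse_full / (n - k))
print(F)                          # compare with an F(r, n-k) quantile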

Use the unrestricted MLE β̂ and consider Lβ̂ − β_0. We have V(β̂) ≈ (J_f(β̂)^T J_f(β̂))^{-1} σ², where J_f(β̂) is the n × k Jacobian matrix of f evaluated at β̂, and so V(Lβ̂) ≈ L (J_f(β̂)^T J_f(β̂))^{-1} L^T σ². Hence, we can write an asymptotic χ²(r) statistic as

(Lβ̂ − β_0)^T [ L (J_f(β̂)^T J_f(β̂))^{-1} L^T s² ]^{-1} (Lβ̂ − β_0).

We can form a Wald-type statistic from this. If r = 1, L is just a vector (defining the linear combination), and we can take the square root and form a pseudo t:

(L^T β̂ − β_0) / ( s √( L^T (J_f(β̂)^T J_f(β̂))^{-1} L ) ).

For the Lagrange multiplier (score) approach, get the MLE under the restriction Lβ = β_0 using a Lagrange multiplier λ of length r; that is, minimize

(1/(2σ²)) (X − f(Z, β))^T (X − f(Z, β)) + (1/σ²) (Lβ − β_0)^T λ.

To minimize, differentiate and set the derivatives equal to 0:

−J_f(β̃)^T (X − f(Z, β̃)) + L^T λ = 0 and Lβ̃ − β_0 = 0.

The quantity J_f(β̃)^T (X − f(Z, β̃)) is called the score vector; it is of length k. Now V(X − f(Z, β)) = σ² I_n at the true β, so the variance of the score vector, and hence also of L^T λ, is approximately σ² J_f(β)^T J_f(β). (Note that this is the true β in this expression.) Estimate the variance of the score vector with σ̃² J_f(β̃)^T J_f(β̃), where σ̃² = SSE(β̃)/(n − k + r). Hence, using L^T λ and its estimated variance, we get

(1/σ̃²) λ^T L (J_f(β̃)^T J_f(β̃))^{-1} L^T λ.

It is asymptotically χ²(r). This is the Lagrange multiplier form. Another form: use J_f(β̃)^T (X − f(Z, β̃)) in place of L^T λ. We get

(1/σ̃²) (X − f(Z, β̃))^T J_f(β̃) (J_f(β̃)^T J_f(β̃))^{-1} J_f(β̃)^T (X − f(Z, β̃)).

This is the score form. Except for the method of computing it, it is the same as the Lagrange multiplier form. This is the SSReg in the AOV for a regression model.
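Continuing the same assumed linear-model example, the sketch below computes the score (Lagrange multiplier) form for f(Z, β) = Zβ, where the Jacobian J_f is simply Z, and checks numerically that its numerator equals SSE(β̃) − SSE(β̂).

# Score / Lagrange multiplier form for a linear model; restriction: last r coefficients are 0.
import numpy as np

rng = np.random.default_rng(3)
n, k, r = 100, 5, 2
Z = rng.normal(size=(n, k))
X = Z @ np.array([1.0, -0.5, 0.3, 0.0, 0.0]) + rng.normal(size=n)

b_restr, *_ = np.linalg.lstsq(Z[:, : k - r], X, rcond=None)
e = X - Z[:, : k - r] @ b_restr                  # residuals from the restricted fit
sigma2_tilde = (e @ e) / (n - k + r)             # SSE(beta_tilde)/(n - k + r)

ZtZ_inv = np.linalg.inv(Z.T @ Z)
score_stat = (e @ Z @ ZtZ_inv @ Z.T @ e) / sigma2_tilde
print(score_stat)                                # asymptotically chi-squared with r df

# Check: the numerator equals SSE(beta_tilde) - SSE(beta_hat) from the full fit.
b_full, *_ = np.linalg.lstsq(Z, X, rcond=None)
e_full = X - Z @ b_full
print(np.isclose(e @ Z @ ZtZ_inv @ Z.T @ e, e @ e - e_full @ e_full))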

Example 2: an anomalous score test. Morgan, Palmer, and Ridout (2007) illustrate some interesting issues using a simple example of counts of the number of stillbirths in each of a sample of litters of laboratory animals. They suggest that a zero-inflated Poisson is an appropriate model. This distribution is an ω-mixture of a point mass at 0 and a Poisson distribution. The CDF (in a notation we will use often later) is P_{0,ω}(x | λ) = (1 − ω) P(x | λ) + ω I_{[0,∞)}(x), where P(x | λ) is the Poisson CDF with parameter λ. (Write the PDF with respect to the counting measure. Is this a reasonable probability model? What are the assumptions? Do the litter sizes matter?)

If we denote by n_i the number of litters in which the number of observed stillbirths is i, the log-likelihood function is

l(ω, λ) = n_0 log(ω + (1 − ω)e^{−λ}) + Σ_{i≥1} n_i log(1 − ω) − Σ_{i≥1} n_i λ + Σ_{i≥1} i n_i log(λ),

up to an additive term that does not involve the parameters. Suppose we want to test the null hypothesis that ω = 0. The score test statistic has the form s^T J^{−1} s, where s is the score vector and J is either the observed or the expected information matrix. In each we substitute ω = 0 and λ = λ̂_0, where λ̂_0 = Σ_{i≥1} i n_i / n with n = Σ_{i≥0} n_i, which is the MLE of λ when ω = 0.

Let n_+ = Σ_{i≥1} n_i and d = Σ_{i≥0} i n_i. The frequency of 0s is important; let f_0 = n_0/n. Taking the derivatives and setting ω = 0, we have

∂l/∂ω = n_0 e^{λ} − n,
∂l/∂λ = −n + d/λ,
∂²l/∂ω² = −n − n_0 e^{2λ} + 2 n_0 e^{λ},

and

∂²l/∂ω∂λ = n_0 e^{λ},
∂²l/∂λ² = −d/λ².

So, substituting the observed data and the restricted MLE, we have the observed information matrix

O(0, λ̂_0) = n [ 1 + f_0 e^{2λ̂_0} − 2 f_0 e^{λ̂_0},  −f_0 e^{λ̂_0} ;  −f_0 e^{λ̂_0},  1/λ̂_0 ].

Now, for the expected information matrix when ω = 0, we first observe that E(n_0) = n e^{−λ}, E(d) = nλ, and E(n_+) = n(1 − e^{−λ}); hence

I(0, λ̂_0) = n [ e^{λ̂_0} − 1,  −1 ;  −1,  1/λ̂_0 ].

Because the λ-component of the score vector vanishes at λ̂_0, only its first component matters. Hence, the score test statistic can be written as κ(λ̂_0)(n_0 e^{λ̂_0} − n)², where κ(λ̂_0) is the (1,1) element of the inverse of either O(0, λ̂_0) or I(0, λ̂_0).

Inverting the matrices (they are 2 × 2), we have as the test statistic for the score test either

s_O = n e^{−λ̂_0} (1 − θ)² / ( e^{−λ̂_0} + θ − 2θ e^{−λ̂_0} − θ² λ̂_0 e^{−λ̂_0} )

or

s_I = n e^{−λ̂_0} (1 − θ)² / ( 1 − e^{−λ̂_0} − λ̂_0 e^{−λ̂_0} ),

where θ = f_0 e^{λ̂_0}, which is the ratio of the observed proportion of 0 counts to the estimated probability of a zero count under the Poisson model. (If n_0 is exactly the number expected under the Poisson model, then θ = 1.)

Now consider the actual data reported by Morgan, Palmer, and Ridout (2007) for stillbirths in each litter of a sample of 402 litters of laboratory animals.

No. stillbirths:  0    1   2   3  4  5  6  7  8  9  10  11
No. litters:      314  48  20  7  5  2  2  1  2  0  0   1

For these data, we have n = 402, d = 185, λ̂_0 = 0.4602, e^{−λ̂_0} = 0.6312, and θ = 1.2376. What is interesting is the difference in s_I and s_O. In this particular example, if all n_i for i ≥ 1 are held constant at the observed values, but different values of n_0 are considered, then as n_0 increases the ratio s_I/s_O increases from about 1/4 to 1 (when n_0 is the expected number under the Poisson model, i.e., θ = 1), and then decreases, actually becoming negative (around n_0 = 100).
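The following short computation (a sketch of mine) reproduces the quoted summary quantities from the tabulated data and evaluates s_I and s_O using the expressions given above.

# Reproduce n, d, lambda_hat_0, theta from the stillbirth table and compute s_I, s_O.
import numpy as np

counts = np.array([314, 48, 20, 7, 5, 2, 2, 1, 2, 0, 0, 1])   # n_i for i = 0,...,11
i = np.arange(counts.size)

n = counts.sum()                 # 402
d = (i * counts).sum()           # 185
lam0 = d / n                     # restricted MLE of lambda, about 0.4602
f0 = counts[0] / n
theta = f0 * np.exp(lam0)        # about 1.2376
e = np.exp(-lam0)                # about 0.6312

s_I = n * e * (1 - theta) ** 2 / (1 - e - lam0 * e)
s_O = n * e * (1 - theta) ** 2 / (e + theta - 2 * theta * e - theta**2 * lam0 * e)
print(n, d, round(lam0, 4), round(theta, 4))
print(s_I, s_O)                  # s_O comes out negative for these data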

This example illustrates an interesting case. The score test is inconsistent because the observed information generates negative variance estimates at the MLE under the null hypothesis. The score test can also be inconsistent if the expected likelihood equation has spurious roots.

Sequential Tests In the simplest formulation of statistical hypothesis testing, corresponding to the setup of the Neyman-Pearson lemma, we test a given hypothesized distribution versus another given distribution. After setting some ground rules regarding the probability of falsely rejecting the null hypothesis, and then determining the optimal test in the case of simple hypotheses, we determined more general optimal tests in cases for which they exist, and for other cases, we determined optimal tests among classes of tests that had certain desirable properties. In some cases, the tests involved regions within the sample space in which the decision between the two hypotheses was made randomly; that is, based on a random process over and above the randomness of the distributions of interest.

Another logical approach to take when the data generated by the process of interest does not lead to a clear decision is to decide to take more observations. Recognizing at the outset that this is a possibility, we may decide to design the test as a sequential procedure. We take a small number of observations, and if the evidence is strong enough either to accept the null hypothesis or to reject it, the test is complete and we make the appropriate decision. On the other hand, if the evidence from the small sample is not strong enough, we take some additional observations and perform the test again. We repeat these steps as necessary.

Multiple Tests There are several ways to measure the error rate. Letting m be the number of tests, F be the number of false positives, T be the number of true positives, and S = F + T be the total number of discoveries, or rejected null hypotheses, we define three error measures, the per-comparison error rate, the familywise error rate, and the false discovery rate:

PCER = E(F)/m;  FWER = Pr(F ≥ 1);  FDR = E(F/S),

where F/S is taken to be 0 when S = 0.

The Benjamini-Hochberg (BH) method for controlling the FDR works as follows. First, order the m p-values from the tests: p_(1) ≤ ··· ≤ p_(m). Then determine a threshold value for rejection by finding the largest integer j such that p_(j) ≤ jα/m. Finally, reject any hypothesis whose p-value is smaller than or equal to p_(j). Benjamini and Hochberg (1995) prove that this procedure is guaranteed to force FDR ≤ α. Storey (2003) proposed use of the q-value, the proportion of false positives incurred, on average, when a given hypothesis (feature) defines the threshold value; a q-value can be calculated for each feature under investigation.
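A minimal sketch of the BH step-up procedure as just described (my own implementation, with made-up example p-values):

# Benjamini-Hochberg step-up procedure: reject all hypotheses with p <= p_(j),
# where j is the largest index with p_(j) <= j * alpha / m.
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    sorted_p = np.sort(p)                            # p_(1) <= ... <= p_(m)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = np.nonzero(sorted_p <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size > 0:
        j = below.max()                              # largest j with p_(j) <= j*alpha/m
        reject = p <= sorted_p[j]                    # reject everything at or below p_(j)
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))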