Composite Hypotheses and Generalized Likelihood Ratio Tests

Rebecca Willett, 2016

In many real-world problems, it is difficult to precisely specify probability distributions. Our models for data may involve unknown parameters or other characteristics. Here are a few motivating examples.

Example: Unknown amplitudes/delays in wireless communications. We don't always know how many relays a signal will go through, how strong the signal will be at each receiver, the distance between relay stations, etc.

Example: Unknown signal amplitudes in functional brain imaging.

Example: Unknown expression levels in gene microarray experiments.

1 Composite Hypothesis Tests

We can represent uncertainty by specifying a collection of possible models for each hypothesis. The collections are indexed by a parameter:

H_0 : X ~ p_0(x|θ_0), θ_0 ∈ Θ_0
H_1 : X ~ p_1(x|θ_1), θ_1 ∈ Θ_1

In general, the distributions p_0 and p_1 may have different parametric forms. The sets Θ_0 and Θ_1 represent the possible values for the parameters. If a set contains a single element (i.e., a single value for the parameter), then we have a simple hypothesis, as discussed in past lectures. When a set contains more than one parameter value, the hypothesis is called a composite hypothesis, because it involves more than one model. The name is even clearer if we consider the following equivalent expression for the hypotheses above:

H_0 : X ~ p_0, p_0 ∈ {p_0(x|θ_0) : θ_0 ∈ Θ_0}
H_1 : X ~ p_1, p_1 ∈ {p_1(x|θ_1) : θ_1 ∈ Θ_1}

Example: Brain imaging. Recall the brain imaging problem:

H_0 : X ~ N(0, 1)
H_1 : X ~ N(µ, 1), µ > 0 but otherwise unknown; equivalently X ~ p, p ∈ {N(µ, 1) : µ > 0}

In this example, H_0 is simple and H_1 is composite.

2 Uniformly Most Powerful Tests

Let us begin by considering special cases in which the usual likelihood ratio test is computable and optimal. Here is an example:

H_0 : x_1, ..., x_n iid N(0, 1)
H_1 : x_1, ..., x_n iid N(µ, 1), µ > 0

Log LRT:

log ∏_{i=1}^n [ (2π)^{-1/2} e^{-(x_i - µ)²/2} ] / [ (2π)^{-1/2} e^{-x_i²/2} ]
  = (1/2) Σ_{i=1}^n [ x_i² - (x_i - µ)² ]
  = µ Σ_{i=1}^n x_i - nµ²/2

Test statistic: absorbing the constant nµ²/2 into the threshold, the log-LRT reduces to

µ Σ_{i=1}^n x_i ≷ γ  ⟺  Σ_{i=1}^n x_i ≷ γ/µ =: γ'.

We were able to divide both sides by µ since µ > 0. Thus we do not need to know the exact value of µ in order to compute the test Σ_i x_i ≷ γ' for any value of γ'. Let t = Σ_{i=1}^n x_i denote the test statistic. It is easy to determine its distribution(s) under each hypothesis (a composite family in the case of H_1):

H_0 : t ~ N(0, n)
H_1 : t ~ N(nµ, n), µ > 0 unknown

Since the distribution of t under H_0 is known, we can choose the threshold to control P_FA:

P_FA = Q(γ'/√n)  ⟹  γ' = √n Q^{-1}(P_FA).

This is the optimal (most powerful) detector according to the Neyman-Pearson lemma. Several ROC curves corresponding to different values of the unknown parameter µ > 0 are depicted below. We cannot know which curve we are operating on, but we can choose a threshold for a desired P_FA, and the resulting P_D is the best possible (for the unknown value of µ). In such cases we say that the test is uniformly most powerful, that is, most powerful no matter what the value of the unknown parameter.

Figure 1: ROC curves for various µ > 0 for the one-sided case.

Definition: Uniformly Most Powerful Test
A uniformly most powerful (UMP) test is a hypothesis test which has the greatest power (i.e., the greatest probability of detection) among all possible tests yielding a given false alarm rate, regardless of the true underlying parameter(s).
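A minimal Monte Carlo sketch of this UMP test (Python; the sample size, P_FA target, and variable names are illustrative assumptions, not from the notes). The threshold γ' = √n Q^{-1}(P_FA) is computed once, without knowledge of µ, and the same test is applied to data generated under several values of µ:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, alpha = 25, 0.05
gamma = np.sqrt(n) * norm.isf(alpha)   # gamma' = sqrt(n) * Q^{-1}(P_FA)

# Empirical P_FA under H0: sum(x) ~ N(0, n)
trials = 100_000
x0 = rng.standard_normal((trials, n))
print("P_FA ~", np.mean(x0.sum(axis=1) > gamma))        # should be ~0.05

# Empirical P_D for several (in practice unknown) values of mu > 0
for mu in (0.1, 0.3, 0.5):
    x1 = rng.standard_normal((trials, n)) + mu
    print(f"mu={mu}: P_D ~", np.mean(x1.sum(axis=1) > gamma))
```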

3 Two-sided Tests

To see how special the UMP condition is, consider the following simple generalization of the testing problems above:

H_0 : x ~ N(0, 1)
H_1 : x ~ N(µ, 1), µ ≠ 0

The log-likelihood ratio statistic is

log Λ(x) = -(x - µ)²/2 + x²/2 = µx - µ²/2,

and the log-LRT has the form

µx - µ²/2 ≷ γ.

We can move the term µ²/2 to the other side and absorb it into the threshold, but this leaves us with a test of the form µx ≷ γ'. Since µ is unknown (and not necessarily positive), the test is not computable. How can we proceed? Looking at the two densities in the microarray experiment, intuitively the test |x| ≷ γ seems reasonable. This is called the Wald test. The P_FA of the Wald test is

P_FA = 2Q(γ)  ⟹  γ = Q^{-1}(P_FA/2),

and its detection probability is

P_D = ∫_{-∞}^{-γ} (2π)^{-1/2} e^{-(x-µ)²/2} dx + ∫_{γ}^{∞} (2π)^{-1/2} e^{-(x-µ)²/2} dx
    = ∫_{-∞}^{-γ-µ} N(0,1) dy + ∫_{γ-µ}^{∞} N(0,1) dy   (substituting y = x - µ)
    = Q(γ + µ) + Q(γ - µ).
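As a quick check, the closed-form P_D = Q(γ - µ) + Q(γ + µ) can be verified against simulation. A small sketch (Python; the values of µ and P_FA are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha = 0.05
gamma = norm.isf(alpha / 2)            # gamma = Q^{-1}(P_FA / 2)

for mu in (-1.0, 0.5, 2.0):            # true mu is unknown in practice
    p_d_formula = norm.sf(gamma - mu) + norm.sf(gamma + mu)  # Q(g-mu) + Q(g+mu)
    x = rng.standard_normal(200_000) + mu
    p_d_mc = np.mean(np.abs(x) > gamma)                      # Wald test |x| > gamma
    print(f"mu={mu:+.1f}: formula {p_d_formula:.4f}, Monte Carlo {p_d_mc:.4f}")
```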

Figure 2: The P_D depends on µ, which is unknown.

Instead, model µ as a deterministic but unknown parameter: estimate µ from the data and plug the estimate into the LRT. Under H_1 the distribution is X ~ N(µ, 1), so a natural estimate for µ is µ̂ = x, the observation itself. Plugging this into the likelihood ratio yields

Λ(x) = p(x|µ̂) / p(x|0) = exp(-(x - µ̂)²/2) / exp(-x²/2) = e^{x²/2}.

This is the generalized likelihood ratio. In effect, it compares the best-fitting model in the composite hypothesis H_1 with the H_0 model. Taking the log yields the test

log Λ(x) = x²/2 ≷ γ,

which is equivalent to the Wald test.
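Since log Λ(x) = x²/2 is a monotone function of |x|, thresholding the scalar GLR gives exactly the Wald decision rule. A tiny sketch (Python; the threshold and toy data are illustrative assumptions) making the equivalence explicit:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(10) + 1.0      # toy observations (here mu = 1)
gamma = 1.96                           # Wald threshold for P_FA ~ 0.05

# GLR test: x^2/2 > gamma^2/2 is the same event as |x| > gamma
glr_decisions  = (x**2 / 2) > (gamma**2 / 2)
wald_decisions = np.abs(x) > gamma
assert np.array_equal(glr_decisions, wald_decisions)
```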

4 The Generalized Likelihood Ratio Test (GLRT)

Consider a composite hypothesis test of the form

H_0 : X ~ p_0(x|θ_0), θ_0 ∈ Θ_0
H_1 : X ~ p_1(x|θ_1), θ_1 ∈ Θ_1.

The parametric densities p_0 and p_1 need not have the same form. The generalized likelihood ratio test (GLRT) is a general procedure for composite testing problems. The basic idea is to compare the best model in class H_1 to the best in H_0, which is formalized as follows.

Definition: Generalized Likelihood Ratio Test (GLRT)
The GLRT based on an observation x of X is

Λ(x) = [ max_{θ_1 ∈ Θ_1} p_1(x|θ_1) ] / [ max_{θ_0 ∈ Θ_0} p_0(x|θ_0) ] ≷ γ,

or equivalently log Λ(x) ≷ γ'.

Example: We observe a vector X ∈ R^n and consider two hypotheses:

H_0 : X ~ N(0, σ² I_n)
H_1 : X ~ N(Hθ, σ² I_n),

where σ² > 0 is known, θ ∈ R^k is unknown, and H ∈ R^{n×k} is known and full rank. The mean vector Hθ is a model for a signal that lies in the k-dimensional subspace spanned by the columns of H (e.g., a narrowband subspace, polynomial subspace, etc.). In other words, the signal has the representation

s = Σ_{i=1}^k θ_i h_i,  H = [h_1, ..., h_k].

The null hypothesis is that no signal is present (noise only). The log-likelihood ratio is

log Λ(x) = (1/(2σ²)) [ x^T x - (x - Hθ)^T (x - Hθ) ] = (1/(2σ²)) [ 2θ^T H^T x - θ^T H^T H θ ].

Thus we may consider the test

θ^T H^T x ≷ γ,

but this test is not computable without knowledge of θ. Recall that under H_1,

x ~ N(Hθ, σ² I), θ ∈ R^k;  equivalently  x ~ p, p ∈ {N(Hθ, σ² I) : θ ∈ R^k}.

We want to pick the p in {N(Hθ, σ² I)} that best matches x:

p(x|θ, H_1) = (2πσ²)^{-n/2} exp{ -(1/(2σ²)) (x - Hθ)^T (x - Hθ) }.

Find the θ that maximizes the likelihood of observing x:

θ̂ = arg min_{θ ∈ R^k} (x - Hθ)^T (x - Hθ) = arg min_{θ ∈ R^k} ||x - Hθ||².

Taking the gradient with respect to θ and setting it to zero:

∇_θ (x^T x - 2θ^T H^T x + θ^T H^T H θ) = 0
-2H^T x + 2H^T H θ = 0
θ̂ = (H^T H)^{-1} H^T x.

Plugging θ̂ into the test statistic θ^T H^T x, we have

θ̂^T H^T x = x^T H (H^T H)^{-1} H^T x ≷ γ.

Recall that the projection matrix onto the subspace is defined as P_H := H (H^T H)^{-1} H^T, so

log Λ(x) = (1/(2σ²)) x^T P_H x = (1/(2σ²)) ||P_H x||².

Thus the GLRT computes the energy in the signal subspace, and if that energy is large enough, H_1 is accepted. In other words, we project x onto the subspace spanned by the columns of H and measure the energy of the projection. The expected value of this energy under H_0 (noise only) is

E_{H_0}[ ||P_H X||² ] = kσ²,

since a fraction k/n of the total noise energy nσ² falls into this subspace.

The performance of the subspace energy detector can be quantified as follows. We choose a γ for the desired P_FA:

(1/σ²) x^T P_H x ≷ γ.

What is the distribution of x^T P_H x under H_0? First use the decomposition P_H = U U^T, where U ∈ R^{n×k} has orthonormal columns spanning the column space of H, and let y := U^T x. Then

(1/σ²) x^T P_H x = (1/σ²) x^T U U^T x = (1/σ²) y^T y.

Under H_0, y ~ N(0, σ² U^T U) = N(0, σ² I_{k×k}), so y_i/σ iid N(0, 1) for i = 1, ..., k, and

y^T y / σ² ~ χ²_k,

chi-squared with k degrees of freedom. Therefore, under H_0,

(1/σ²) x^T P_H x ~ χ²_k  ⟹  P_FA = P(χ²_k > γ).

To calculate the tails of χ²_k distributions you can use software such as Matlab (chi2cdf(x,k), chi2inv(p,k)). Remember that the mean of a χ²_k distribution is k, so we want to choose a γ larger than k to produce a small P_FA.
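A minimal sketch of the subspace energy detector (Python; the basis H, dimensions, and noise level are illustrative assumptions). The threshold comes from the χ²_k tail, and the empirical false-alarm rate and the mean of the statistic under H_0 match the theory above:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, k, sigma2, alpha = 100, 5, 1.0, 0.05
H = rng.standard_normal((n, k))                  # known, full-rank subspace basis
P_H = H @ np.linalg.solve(H.T @ H, H.T)          # projection onto col(H)
gamma = chi2.isf(alpha, df=k)                    # P(chi2_k > gamma) = alpha

def glrt_stat(x):
    """Subspace energy statistic (1/sigma^2) x^T P_H x."""
    return x @ P_H @ x / sigma2

# Empirical behavior under H0 (noise only)
trials = 20_000
stats0 = np.array([glrt_stat(np.sqrt(sigma2) * rng.standard_normal(n))
                   for _ in range(trials)])
print("P_FA ~", np.mean(stats0 > gamma))                        # ~0.05
print("mean under H0 ~", stats0.mean(), "(theory: k =", k, ")")
```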

5 Wilks' Theorem

Wilks' Theorem (1938). Consider a composite hypothesis testing problem

H_0 : X_1, X_2, ..., X_n iid p(x|θ_0), where θ_{0,1}, ..., θ_{0,l} ∈ R are free parameters and θ_{0,l+1} = a_{l+1}, ..., θ_{0,k} = a_k are fixed at the values a_{l+1}, ..., a_k
H_1 : X_1, X_2, ..., X_n iid p(x|θ_1), where all components of θ_1 ∈ R^k are free parameters,

and the parametric density has the same form in each hypothesis. In this case the family of models under H_0 is a subset of those under H_1, and we say that the hypotheses are nested. (This is a key condition that must hold for this theorem.) If the 1st and 2nd order derivatives of p(x|θ_i) with respect to θ_i exist and if

E[ ∂ log p(X|θ_i) / ∂θ_i ] = 0

(which guarantees that the MLE θ̂_i → θ_i as n → ∞), then the generalized likelihood ratio statistic, based on an observation X = (X_1, ..., X_n),

Λ_n(X) = max_{θ_1} p(X|θ_1) / max_{θ_0} p(X|θ_0)    (1)

has the following asymptotic distribution under H_0:

2 log Λ_n(X) ~ χ²_{k-l} as n → ∞, i.e., 2 log Λ_n(X) →_D χ²_{k-l}.

Proof (sketch): Under the conditions of the theorem, the log GLRT tends to the log GLRT in a Gaussian setting, by the Central Limit Theorem (CLT).

Example: Nested Condition

H_0 : x_i iid N(0, 1)
H_1 : x_i iid N(0, σ²), i = 1, 2, ..., n, σ² > 0 unknown

Log LR (for a fixed σ²):

log Λ = Σ_{i=1}^n [ -(1/2) log σ² + (x_i²/2)(1 - 1/σ²) ].

MLE of σ²:

σ̂² = (1/n) Σ_{i=1}^n x_i².

Log GLR: plugging σ̂² into the expression above and multiplying by 2 gives

2 log Λ_n = -n log σ̂² + Σ_{i=1}^n x_i² (1 - 1/σ̂²) = n ( σ̂² - log σ̂² - 1 ),

which, under H_0, converges in distribution to χ²_1 (here k = 1 free parameter under H_1 and l = 0 under H_0).
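This nested example can be checked numerically. A short sketch (Python; sample sizes are illustrative assumptions) comparing the empirical quantiles of 2 log Λ_n = n(σ̂² - log σ̂² - 1) under H_0 against the χ²_1 quantiles predicted by Wilks' theorem:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n, trials = 200, 50_000

# 2 log GLR for H0: N(0,1) vs H1: N(0, sigma^2), sigma^2 unknown
x = rng.standard_normal((trials, n))             # data generated under H0
s2 = np.mean(x**2, axis=1)                       # MLE of sigma^2
stat = n * (s2 - np.log(s2) - 1.0)               # 2 log Lambda_n

# Empirical tail quantiles vs. the chi-squared(1) limit
for q in (0.90, 0.95, 0.99):
    print(f"quantile {q}: empirical {np.quantile(stat, q):.3f}, "
          f"chi2_1 {chi2.ppf(q, df=1):.3f}")
```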