Topic 5: Generalized Likelihood Ratio Test


CS 885: Advanced Machine Learning, Fall 2017. Instructor: Daniel L. Pimentel-Alarcón. Copyright 2017.

5.1 Introduction

We already started studying composite hypothesis problems of the form:

\[
H_0: x \sim p(x \mid \theta_0), \quad \theta_0 \in \Theta_0,
\qquad
H_1: x \sim p(x \mid \theta_1), \quad \theta_1 \in \Theta_1,
\qquad (5.1)
\]

where the parameters θ0 and/or θ1 are unknown.

Example 5.1. In Example 3.3 we wanted to determine whether two meteorites came from the same asteroid in space using the difference x of their magnesium compositions. In Example 3.4 we wanted to determine whether a certain gene was associated with a disease, using the difference x of the gene's average activation levels between healthy and sick people. Both of these problems can be modeled as

\[
H_0: x \sim \mathcal{N}(0, 1) \quad \text{(same asteroid / gene unrelated to disease)},
\qquad
H_1: x \sim \mathcal{N}(\mu, 1) \quad \text{(different asteroids / gene related to disease)},
\]

where µ is unknown.

Example 5.2. In Example 4.1 we use an array of sensors to record small voltages generated by your brain and store them in a signal vector x ∈ R^D. A machine (phone, computer, server, etc.) should

Figure 5.1: Gene microarrays are data matrices indicating gene activation levels. Each row corresponds to one gene, and each column corresponds to one individual. We want to know which genes are related to a disease. See Example 5.1.

Figure 5.2: Sensors record small voltages generated by your brain and store them in a signal vector x ∈ R^D. A machine (phone, computer, server, etc.) should interpret x and skip the song if that is what you thought about. See Example 5.2.

interpret x and skip the song if that is what you thought about. This can be modeled as

\[
H_0: x \sim \mathcal{N}(0, I) \quad \text{(do nothing)},
\qquad
H_1: x \sim \mathcal{N}(\mu, I) \quad \text{(skip song)},
\]

where µ is unknown.

In Section 3.7 we already studied the hypothesis problems in Example 5.1, which led us to the intuitive answer of Wald's test in Example 3.4. We now generalize this using our knowledge from estimation theory to obtain the generalized likelihood ratio test (GLRT).

Definition 5.1 (Generalized likelihood ratio test (GLRT)). Consider a hypothesis problem as in (5.1), where θ0 and/or θ1 are unknown. The generalized likelihood ratio statistic is defined as

\[
\hat{\Lambda}(x) := \frac{\max_{\theta_1 \in \Theta_1} p(x \mid \theta_1)}{\max_{\theta_0 \in \Theta_0} p(x \mid \theta_0)},
\]

and the generalized likelihood ratio test is defined as

\[
\hat{\Lambda}(x) \ \overset{H_1}{\underset{H_0}{\gtrless}} \ \tau.
\]

The idea behind the GLRT is actually quite simple: if you don't know a parameter, first estimate it using maximum likelihood, and then use the estimate as if it were the true parameter in a likelihood ratio test (LRT).

Example 5.3 (Derivation of Wald's test as a GLRT). Let us show that Wald's test is just one particular case of the GLRT. Consider the setup in Example 5.1. Here θ0 is known, and θ1 = µ, so Θ1 = ℝ.

Then

\[
\arg\max_{\theta_1 \in \Theta_1} p(x \mid \theta_1)
= \arg\max_{\mu \in \mathbb{R}} p(x \mid \mu)
= \arg\max_{\mu \in \mathbb{R}} \log\!\left( \frac{1}{\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2}} \right)
= \arg\max_{\mu \in \mathbb{R}} \Big( \underbrace{\log \tfrac{1}{\sqrt{2\pi}}}_{\text{constant}} - \tfrac{(x-\mu)^2}{2} \Big)
= \arg\min_{\mu \in \mathbb{R}} \ (x-\mu)^2.
\]

To find this maximum we use our usual tricks: take the derivative with respect to (w.r.t.) µ, set it to zero, and solve for µ, which yields µ̂ = x. Then

\[
\hat{\Lambda}(x)
= \frac{\max_{\theta_1 \in \Theta_1} p(x \mid \theta_1)}{\max_{\theta_0 \in \Theta_0} p(x \mid \theta_0)}
= \frac{\max_{\mu \in \mathbb{R}} p(x \mid \mu)}{p(x \mid \theta_0)}
= \frac{p(x \mid \hat{\mu})}{p(x \mid \theta_0)}
= \frac{e^{-\frac{(x-\hat{\mu})^2}{2}}}{e^{-\frac{x^2}{2}}}
= \frac{e^{-\frac{(x-x)^2}{2}}}{e^{-\frac{x^2}{2}}}
= e^{\frac{x^2}{2}},
\]

and our GLRT is

\[
e^{\frac{x^2}{2}} \ \overset{H_1}{\underset{H_0}{\gtrless}} \ \tau.
\]

Taking log on both sides this becomes

\[
\frac{x^2}{2} \ \overset{H_1}{\underset{H_0}{\gtrless}} \ \log\tau,
\qquad \text{or equivalently} \qquad
x^2 \ \overset{H_1}{\underset{H_0}{\gtrless}} \ \tau',
\]

where τ′ = 2 log τ, which is precisely Wald's test from Example 3.4.

5.2 Asymptotics

Recall that in hypothesis testing we often want to bound the probability p₀ of deciding H1 when H0 was true, and so we select our test threshold τ accordingly. For instance, in Wald's test, p₀ = P(x² > τ′) given that H0 is true. Since H0: x ~ N(0, 1), if we want p₀ ≤ α, all we need to do is find the τ′ such that the probability of the tails of a N(0, 1) is α (see Figure 5.3 to build some intuition). We can do this because we know the distribution of x. Similarly, if we had a test like

\[
\sum_{i=1}^{N} x_i^2 \ \overset{H_1}{\underset{H_0}{\gtrless}} \ \tau,
\qquad (5.2)
\]

where H0: xᵢ ~ N(0, 1), we would also know that Σᵢ xᵢ² ~ χ²(N). Again, if we wanted p₀ ≤ α, all we would need to do is find the τ such that the tail probability of a χ²(N) is α (see Figure 5.3 for more intuition).

However, we don't always know the distribution of our test statistic. For example, if we had the same test in (5.2), but with H0: xᵢ ~ Poisson, or Cauchy, or Weibull, then what is the distribution of Σᵢ xᵢ²? The following theorem states that if N is large enough, we don't need to worry about it, because under H0 the GLRT statistic is

Figure 5.3: We want our test to have p₀ ≤ α. Left: if our test is x² ≷ τ′ and x ~ N(0, 1), then we need to find the τ′ such that the probability of the tails of a N(0, 1) is α. Right: if our test is Σᵢ xᵢ² ≷ τ and xᵢ ~ N(0, 1), then we need to find the τ such that the tail probability of a χ²(N) is α. We can do this because we know the distribution of our test statistic. What happens if we don't? Wilks' theorem gives us an answer.

asymptotically χ²-distributed.

Theorem 5.1 (Wilks' theorem (asymptotic distribution of the GLRT)). Consider a composite hypothesis problem of the form:

\[
H_0: x_1, \dots, x_N \sim p(x \mid \theta_0), \quad \theta_0 \in \mathbb{R}^{K_0},
\qquad
H_1: x_1, \dots, x_N \sim p(x \mid \theta_1), \quad \theta_1 \in \mathbb{R}^{K_1},
\]

where p has the same form in both hypotheses, and the K₀ unknown parameters in θ0 are a subset of the K₁ unknown parameters in θ1 (these are called nested hypotheses). Suppose that for every k, l ∈ {1, ..., K₁}, the derivatives \( \frac{\partial p(x \mid \theta)}{\partial \theta_k} \) and \( \frac{\partial^2 p(x \mid \theta)}{\partial \theta_k \partial \theta_l} \) exist, and that \( \mathbb{E}\!\left[ \frac{\partial \log p(x \mid \theta)}{\partial \theta_k} \right] = 0 \) (this guarantees that the MLE θ̂_ML converges to the true θ). Then under H0,

\[
2 \log \hat{\Lambda}(x) \ \xrightarrow[N \to \infty]{d} \ \chi^2(K_1 - K_0).
\]

Proof. The proof follows as a consequence of the central limit theorem. For a detailed proof see Theorems 10.3.1 and 10.3.3 in Statistical Inference by George Casella and Roger L. Berger, second edition.

Example 5.4. Consider

\[
H_0: x_1, \dots, x_N \sim \mathcal{N}(\mu, 1),
\qquad
H_1: x_1, \dots, x_N \sim \mathcal{N}(\mu, \sigma^2), \quad \sigma^2 > 0,
\]

where µ is known and σ² is unknown.

The MLE of σ² is

\[
\arg\max_{\theta_1 \in \Theta_1} p(x \mid \theta_1)
= \arg\max_{\sigma^2 \in \mathbb{R}^+} \log p(x \mid \sigma^2)
= \arg\max_{\sigma^2 \in \mathbb{R}^+} \log\!\left[ \left( \frac{1}{2\pi\sigma^2} \right)^{\!N/2} e^{-\frac{(x-\mu)^T(x-\mu)}{2\sigma^2}} \right]
= \arg\max_{\sigma^2 \in \mathbb{R}^+} \Big( \underbrace{-\tfrac{N}{2} \log(2\pi)}_{\text{constant}} - \tfrac{N}{2} \log \sigma^2 - \tfrac{(x-\mu)^T(x-\mu)}{2\sigma^2} \Big).
\]

Taking the derivative w.r.t. σ² and setting it to zero we have:

\[
\frac{(x-\mu)^T(x-\mu)}{2\sigma^4} - \frac{N}{2\sigma^2} = 0
\quad \Longrightarrow \quad
\frac{(x-\mu)^T(x-\mu)}{\sigma^2} - N = 0.
\]

Solving for σ² we obtain the MLE:

\[
\hat{\sigma}^2 = \frac{1}{N}(x-\mu)^T(x-\mu) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2.
\]

Then

\[
\hat{\Lambda}(x)
= \frac{\max_{\theta_1 \in \Theta_1} p(x \mid \theta_1)}{\max_{\theta_0 \in \Theta_0} p(x \mid \theta_0)}
= \frac{\max_{\sigma^2 \in \mathbb{R}^+} p(x \mid \sigma^2)}{p(x \mid \sigma^2 = 1)}
= \frac{p(x \mid \hat{\sigma}^2)}{p(x \mid 1)}
= \frac{\left( \frac{1}{2\pi\hat{\sigma}^2} \right)^{N/2} e^{-\frac{(x-\mu)^T(x-\mu)}{2\hat{\sigma}^2}}}{\left( \frac{1}{2\pi} \right)^{N/2} e^{-\frac{(x-\mu)^T(x-\mu)}{2}}}
= \left( \frac{1}{\hat{\sigma}^2} \right)^{\!N/2} e^{-\frac{N}{2}} \, e^{\frac{N\hat{\sigma}^2}{2}},
\]

where the last step uses (x − µ)ᵀ(x − µ) = Nσ̂². Taking 2 log on both sides we obtain:

\[
2 \log \hat{\Lambda}(x)
= -N \log \underbrace{\Big( \tfrac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \Big)}_{\frac{1}{N} \chi^2(N) \text{ under } H_0}
+ \underbrace{\sum_{i=1}^{N} (x_i - \mu)^2}_{\chi^2(N) \text{ under } H_0}
- \ N.
\]

Now the question is: do you know the distribution of −N log(χ²(N)/N) + χ²(N) − N? Sure as hell I don't! Luckily, there are no unknown parameters under H0 (K₀ = 0) and one unknown parameter under H1 (namely σ², so K₁ = 1). These are nested hypotheses, and hence we can use Theorem 5.1 to know that under H0,

\[
2 \log \hat{\Lambda}(x) \ \xrightarrow[N \to \infty]{d} \ \chi^2(1).
\]

Hence, if we want p₀ ≤ α, it suffices to select the threshold τ for which the tail probability of a χ²(1) random variable is α.
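The statistic from Example 5.4 is easy to check numerically. The following Python sketch (ours, not part of the original notes; the helper names `two_log_glrt` and `chi2_1_tail` are our own) computes 2 log Λ̂(x) and the χ²(1) tail probability using only the standard library, since for one degree of freedom P(χ²(1) > t) = erfc(√(t/2)):

```python
import math

def two_log_glrt(xs, mu):
    """2 log Lambda_hat for H0: N(mu, 1) vs H1: N(mu, sigma^2), sigma^2 unknown."""
    n = len(xs)
    s = sum((x - mu) ** 2 for x in xs)   # (x - mu)^T (x - mu); chi^2(N) under H0
    sigma2_hat = s / n                   # MLE of sigma^2 under H1
    return -n * math.log(sigma2_hat) + s - n

def chi2_1_tail(t):
    """P(chi^2(1) > t) = erfc(sqrt(t / 2)); no scipy needed for 1 degree of freedom."""
    return math.erfc(math.sqrt(t / 2.0))

# If every (x_i - mu)^2 equals 1, then sigma2_hat = 1 and the statistic is 0.
print(two_log_glrt([6.0, 4.0, 6.0, 4.0], mu=5.0))   # -> 0.0
# The tail probability at the usual chi^2(1) threshold 3.84 is about 0.05.
print(chi2_1_tail(3.8415))
```

Setting τ so that chi2_1_tail(τ) = α then gives the (asymptotic) level-α test.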

Figure 5.4: Results of the simulation of Example 5.4. In black is the histogram of 2 log Λ̂(x) over T trials. In blue is the asymptotic χ²(1) distribution of 2 log Λ̂(x). In red is the tail whose probability (the probability of error p₀) is set to α = 0.05. The code for this simulation is in the appendix.

5.3 Experiments

Let us now run some simulations to verify Theorem 5.1 and our results from Example 5.4. We will generate x₁, ..., x_N ~ N(µ, 1) and compute 2 log Λ̂(x) as in Example 5.4. We will repeat this for T trials, plot a histogram, and compare it with the χ²(1) distribution predicted by Theorem 5.1. Figure 5.4 shows some results. The code for this simulation is in the appendix.
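As a rough cross-check of this experiment, here is a small Monte Carlo sketch in Python (our own analogue of the MATLAB code in the appendix, with representative values of T and N): under H0 the sample mean of the statistics should be near 1, the mean of a χ²(1), and roughly 5% of them should exceed the 0.95 quantile 3.84.

```python
import math
import random

def two_log_glrt(xs, mu):
    # 2 log Lambda_hat from Example 5.4 (sigma^2 = 1 known under H0).
    n = len(xs)
    s = sum((x - mu) ** 2 for x in xs)
    return -n * math.log(s / n) + s - n

random.seed(0)
T, N, mu = 2000, 100, 5.0    # trials, samples per trial, known mean (our choices)
stats = [two_log_glrt([random.gauss(mu, 1.0) for _ in range(N)], mu)
         for _ in range(T)]

# Wilks' theorem: these should behave like chi^2(1) draws.
mean_stat = sum(stats) / T
frac_reject = sum(v > 3.841 for v in stats) / T
print(mean_stat, frac_reject)
```

Because the χ²(1) limit only kicks in as N grows, the empirical rejection fraction will hover around α rather than match it exactly.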

A Code for Simulation of Example 5.4

clear all; close all; clc; warning('off','all');

% === Code to simulate the asymptotic distribution of 2 log Lambda hat ===
% ===== See Example 5.4.
% NOTE: the numeric constants below were garbled in the source transcript;
% the values used here are representative choices.

T = 10000;          % Number of trials.
N = 100;            % Number of samples.
mu = 5;             % Known mean under both hypotheses.
sigma_star0 = 1;    % True standard deviation of X_i under H0.
sigma_star1 = 2;    % True standard deviation of X_i under H1.
alpha = 0.05;       % Allowed probability of error (p0).

log_Lambda_hat = zeros(T,1);    % log Lambda hat over trials.
for t = 1:T,
    X = randn(N,1)*sigma_star0 + mu;

    log_Lambda_hat(t) = -N/2 * log( 1/N * sum(((X-mu)/sigma_star0).^2) ) ...
                        + 1/2 * sum(((X-mu)/sigma_star0).^2) ...
                        - N/2;
end

% Plot results.
figure(1);
axes('box','on');
hold on;
[h,rangeh] = hist(2*log_Lambda_hat);
stem(rangeh,h/T,'k','linewidth',4,'markersize',10);

% Asymptotic distribution of 2*log Lambda hat.
rangeh = min(rangeh):.01:max(rangeh);
chi2 = chi2pdf(rangeh,1);
plot(rangeh,chi2,'b','linewidth',4);

% Highlight the probability alpha.
tau = chi2inv(1-alpha,1);
rangeh = tau:.05:max(rangeh);
chi2 = chi2pdf(rangeh,1);
stem(rangeh,chi2,'r','linewidth',5,'markersize',10);

% Make figure look sexy.
axis tight;
set(gca,'xtick',[],'xticklabel',[]);
set(gca,'ytick',[],'yticklabel',[]);
ylabel('$\mathsf{P}(2\log\hat{\Lambda}(\mathbf{x}))$','interpreter','latex','fontsize',25);
xlabel('$2\log\hat{\Lambda}(\mathbf{x})$','interpreter','latex','fontsize',30);

% Save figure.
set(gcf,'renderer','default');
figurename = 'wilks.pdf';
saveas(gcf,figurename);