Composite Hypothesis Testing


In many hypothesis-testing problems there are many possible distributions that can occur under each of the hypotheses. The output of the source is a set of parameters (points in a parameter space χ). The hypotheses correspond to subsets of χ. The probability density governing the mapping from the parameter space to the observation space is denoted by $p_{r|\theta}(R|\theta)$ and is assumed to be known for all values of θ in χ. The final component is a decision rule.

[Figure: block diagram of the composite hypothesis testing problem: a source produces θ in the parameter space χ, a probabilistic transition mechanism maps it into the observation space R, and a decision rule produces the decision.]

Example: For two hypotheses the observed variable will be

$$H_0:\quad p_{r|H_0}(R|H_0) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{R^2}{2\sigma^2}\right),$$

$$H_1:\quad p_{r|M,H_1}(R|M,H_1) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(R-M)^2}{2\sigma^2}\right).$$

Bayesian formulation of the composite hypothesis testing problem. We assume that the parameter is a random variable θ taking on values in χ. The known probability density on θ enables us to reduce the problem to a simple hypothesis-testing problem by integrating over θ. Here $p_{r|\theta}(R|\theta)$ is interpreted as the conditional distribution of R given θ, and the likelihood ratio is

$$\Lambda(R) = \frac{\int_{\chi_1} p_{r|\theta}(R|\theta)\, p_\theta(\theta)\, d\theta}{\int_{\chi_0} p_{r|\theta}(R|\theta)\, p_\theta(\theta)\, d\theta}.$$

Example. We assume that the probability density governing m on $H_1$ is

$$p_m(M) = \frac{1}{\sqrt{2\pi}\,\sigma_m}\exp\left(-\frac{M^2}{2\sigma_m^2}\right), \qquad -\infty < M < \infty.$$

The likelihood ratio becomes

$$\Lambda(R) = \frac{\displaystyle\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma_m}\exp\left(-\frac{M^2}{2\sigma_m^2}\right)\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(R-M)^2}{2\sigma^2}\right)dM}{\dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\dfrac{R^2}{2\sigma^2}\right)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta.$$

Integrating and taking the logarithm gives the equivalent test

$$R^2 \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \frac{2\sigma^2(\sigma^2+\sigma_m^2)}{\sigma_m^2}\left[\ln\eta + \frac{1}{2}\ln\frac{\sigma^2+\sigma_m^2}{\sigma^2}\right].$$
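As a sanity check, the closed-form ratio $\Lambda(R) = \sqrt{\sigma^2/(\sigma^2+\sigma_m^2)}\,\exp\!\left(\sigma_m^2 R^2 / (2\sigma^2(\sigma^2+\sigma_m^2))\right)$ obtained from this integral can be compared against direct numerical integration. A minimal sketch, with illustrative values of σ, σ_m, and R that are not taken from the notes:

```python
# Numerical check of the Bayesian likelihood ratio in the example above.
import numpy as np
from scipy import integrate
from scipy.stats import norm

sigma, sigma_m = 1.0, 2.0   # illustrative noise and prior standard deviations
R = 1.3                     # an arbitrary observation

# Numerator: integrate p_m(M) * p(R | M, H1) over M.
num, _ = integrate.quad(
    lambda M: norm.pdf(M, 0.0, sigma_m) * norm.pdf(R, M, sigma),
    -np.inf, np.inf)
# Denominator: p(R | H0).
den = norm.pdf(R, 0.0, sigma)
lam_numeric = num / den

# Closed form obtained by carrying out the Gaussian integral analytically.
s2 = sigma**2 + sigma_m**2
lam_closed = np.sqrt(sigma**2 / s2) * np.exp(sigma_m**2 * R**2 / (2 * sigma**2 * s2))

print(lam_numeric, lam_closed)  # the two values agree up to quadrature error
```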

When θ is a random variable with an unknown density, the test procedure is not clearly specified. Two possible approaches are:
- a minimax test over the unknown density;
- trying several densities based on whatever partial knowledge of θ is available.
In many cases the test structure will be insensitive to the detailed behavior of the probability density.

θ a nonrandom variable. Because θ has no probability density over which to average, the Bayes test is not meaningful; we use Neyman-Pearson tests instead. Over all possible detectors that have a given $P_F$, the one that yields the highest $P_D$ is called the Uniformly Most Powerful (UMP) test. The best performance we could achieve would be obtained if the actual test curve equaled the bound for all M in χ. For a given $P_F$, a UMP test exists if we are able to design a complete likelihood ratio test (including the threshold) for every M in χ without knowing M. In general, the bound can be reached for any particular θ simply by designing an ordinary LRT for that particular θ; the UMP test must be as good as any such test for every θ.

[Figure: the densities $p_{r|H_0}(R|H_0)$ and $p_{r|M}(R|M)$ for two values of M (0.5 and 1), with the threshold set by $P_F$; the shaded tail areas give $P_D$ for M = 0.5 and for M = 1.]
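A short sketch of the UMP idea for the Gaussian example above: for the one-sided alternative M > 0, the LRT reduces to comparing R with a threshold fixed by $P_F$ alone, so the same detector is simultaneously optimal for every M. The numerical values below are illustrative assumptions:

```python
# Why a UMP test exists for M > 0 in the Gaussian mean example:
# the threshold depends only on P_F, never on M.
import numpy as np
from scipy.stats import norm

sigma, P_F = 1.0, 0.1          # illustrative values
gamma = sigma * norm.isf(P_F)  # threshold from the H0 distribution alone

for M in (0.5, 1.0, 2.0):      # the same test is applied for every M > 0
    P_D = norm.sf((gamma - M) / sigma)
    print(f"M={M}: P_D={P_D:.3f}")  # P_D grows with M while P_F stays at 0.1
```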

A necessary and sufficient condition for UMP: a UMP test exists if and only if the likelihood ratio test for every θ in χ can be completely defined (including the threshold) without knowledge of θ. If a UMP test does not exist, we turn to the generalized likelihood ratio test (GLRT). The perfect measurement bound suggests that a logical procedure is to estimate θ assuming $H_1$ is true, then estimate θ assuming $H_0$ is true, and use these estimates in a likelihood ratio test as if they were correct:

$$\Lambda_g(R) = \frac{\max_{\theta_1} p_{r|\theta}(R|\theta_1)}{\max_{\theta_0} p_{r|\theta}(R|\theta_0)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \gamma,$$

where $\theta_1$ ranges over all θ in $\chi_1$ and $\theta_0$ ranges over all θ in $\chi_0$. We make an ML estimate $\hat\theta_1$ of θ assuming that $H_1$ is true, evaluate $p_{r|\theta}(R|\hat\theta_1)$, and use this value in the numerator; the denominator is obtained in the same way under $H_0$. A test may also contain a nuisance parameter: we are not directly concerned with the parameter, but it enters the problem since it affects the PDF under $H_0$ and $H_1$. The GLRT decides $H_1$ if the fit to the data of the signal under $H_1$, with estimated parameter $\hat\theta_1$, produces a much smaller error than the fit to the signal under $H_0$ with estimated parameter $\hat\theta_0$.

For large data records the performance of the GLRT is easy to find. The conditions under which the asymptotic results hold are:
- the data record is large and the signal is weak;
- the maximum likelihood estimate (MLE) attains its asymptotic PDF.
The composite hypothesis testing problem can be cast as a parameter test of the PDF. Consider a PDF $p_r(R;\theta)$, where θ is a p×1 vector of unknown parameters. The parameter test is

$$\Lambda_g(R) = \frac{p_r(R;\hat\theta_1, H_1)}{p_r(R;\hat\theta_0, H_0)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \gamma,$$

where $\hat\theta_1$ is the MLE of θ under $H_1$ (the unrestricted MLE) and $\hat\theta_0$ is the MLE of θ under $H_0$ (the restricted MLE). As N → ∞, for unbiased estimation, the variance of the estimate is given by the Cramér-Rao bound. We can thus express the ML estimate of the parameter $\hat\theta$ and use this value in the GLRT calculation.
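A minimal GLRT sketch for a standard textbook instance (not part of the original notes): detecting an unknown DC level A in white Gaussian noise with known σ². The MLE under $H_1$ is the sample mean, and here $2\ln\Lambda_g = N\bar{x}^2/\sigma^2$ is exactly $\chi^2_1$-distributed under $H_0$, illustrating the asymptotic result above:

```python
# GLRT for H0: x[n] = w[n] vs H1: x[n] = A + w[n], A unknown, sigma known.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
N, sigma, A_true = 100, 1.0, 0.4              # illustrative values
x = A_true + sigma * rng.standard_normal(N)   # data generated under H1

A_hat = x.mean()                              # unrestricted MLE under H1
T = N * A_hat**2 / sigma**2                   # 2 ln Lambda_g for this model

# Under H0, 2 ln Lambda_g is chi-squared with 1 degree of freedom,
# so the threshold for a given P_F comes from the chi-squared tail.
gamma = chi2.isf(0.05, df=1)                  # P_F = 0.05
print("decide H1" if T > gamma else "decide H0", T, gamma)
```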

Detection of Gaussian distributed random variables. The general Gaussian problem: hypothesis testing in the case of Gaussian distributions. Two special cases are treated: equal covariance matrices and equal mean vectors.

Definition. A set of random variables $r_1, r_2, \ldots, r_N$ is defined as jointly Gaussian if all of their linear combinations are Gaussian random variables. A vector r is a jointly Gaussian random vector when its components are jointly Gaussian. In other words, if $z = \sum_{i=1}^{N} g_i r_i = G^T r$ is a Gaussian random variable for every finite vector G, then r is a Gaussian vector. A hypothesis-testing problem is called a general Gaussian problem if $p_{r|H_i}(R|H_i)$ is a Gaussian density on all hypotheses.

Let the observation be an N-dimensional vector (column matrix)

$$r = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_N \end{bmatrix}.$$

We define

$$E(r) = m, \qquad \mathrm{Cov}(r) = \Lambda = E\left[(r-m)(r-m)^T\right],$$

with characteristic function

$$M_r(jv) = E\left(e^{jv^T r}\right) = \exp\left(jv^T m - \tfrac{1}{2}\, v^T \Lambda v\right)$$

and probability density

$$p_r(R) = (2\pi)^{-N/2}\, |\Lambda|^{-1/2} \exp\left(-\tfrac{1}{2}(R-m)^T \Lambda^{-1} (R-m)\right).$$

Under each hypothesis we assume that r is a Gaussian random vector, completely specified by its mean vector and covariance matrix:

$$E(r) = \begin{bmatrix} E(r_1) \\ E(r_2) \\ \vdots \\ E(r_N) \end{bmatrix} = \begin{bmatrix} m_1 \\ m_2 \\ \vdots \\ m_N \end{bmatrix} = m.$$

The covariance matrix is

$$K = E\left\{(r-m)(r-m)^T\right\} = \begin{bmatrix} K_{11} & K_{12} & \cdots & K_{1N} \\ K_{21} & K_{22} & \cdots & K_{2N} \\ \vdots & & & \vdots \\ K_{N1} & K_{N2} & \cdots & K_{NN} \end{bmatrix}.$$

The inverse of K is Q:

$$KQ = QK = I, \qquad Q = K^{-1}.$$

The probability density of r on $H_1$ is

$$p_{r|H_1}(R|H_1) = (2\pi)^{-N/2}\, |K_1|^{-1/2} \exp\left(-\tfrac{1}{2}(R-m_1)^T Q_1 (R-m_1)\right),$$

and the probability density of r on $H_0$ is

$$p_{r|H_0}(R|H_0) = (2\pi)^{-N/2}\, |K_0|^{-1/2} \exp\left(-\tfrac{1}{2}(R-m_0)^T Q_0 (R-m_0)\right).$$

Likelihood ratio test:

$$\Lambda(R) = \frac{|K_1|^{-1/2} \exp\left(-\tfrac{1}{2}(R-m_1)^T Q_1 (R-m_1)\right)}{|K_0|^{-1/2} \exp\left(-\tfrac{1}{2}(R-m_0)^T Q_0 (R-m_0)\right)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta.$$

Taking logarithms,

$$\tfrac{1}{2}(R-m_0)^T Q_0 (R-m_0) - \tfrac{1}{2}(R-m_1)^T Q_1 (R-m_1) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \ln\eta + \tfrac{1}{2}\ln|K_1| - \tfrac{1}{2}\ln|K_0| \triangleq \gamma^*.$$

The test consists of finding the difference between two quadratic forms.
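The quadratic-form expression can be verified numerically against a direct evaluation of the two Gaussian densities. A sketch with illustrative means and covariances (not taken from the notes):

```python
# The general Gaussian log-likelihood ratio as a difference of quadratic
# forms, checked against scipy's multivariate normal density.
import numpy as np
from scipy.stats import multivariate_normal

m0 = np.array([0.0, 0.0])
m1 = np.array([1.0, 0.5])
K0 = np.array([[1.0, 0.3], [0.3, 1.0]])
K1 = np.array([[1.5, 0.2], [0.2, 0.8]])
Q0, Q1 = np.linalg.inv(K0), np.linalg.inv(K1)
R = np.array([0.7, 0.9])

# 2 ln Lambda(R) from the quadratic-form expression above
two_llr = ((R - m0) @ Q0 @ (R - m0) - (R - m1) @ Q1 @ (R - m1)
           + np.log(np.linalg.det(K0)) - np.log(np.linalg.det(K1)))

# Direct evaluation of 2 [ln p1(R) - ln p0(R)]
check = 2 * (multivariate_normal.logpdf(R, m1, K1)
             - multivariate_normal.logpdf(R, m0, K0))
print(two_llr, check)  # identical up to rounding
```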

Special case: equal covariance matrices, $K_0 = K_1 = K$, $Q = K^{-1}$. The test reduces to

$$(m_1 - m_0)^T Q R \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \ln\eta + \tfrac{1}{2}\left(m_1^T Q\, m_1 - m_0^T Q\, m_0\right).$$

Defining $\Delta m \triangleq m_1 - m_0$, the sufficient statistic is

$$l(R) \triangleq \Delta m^T Q\, R = R^T Q\, \Delta m \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \gamma.$$

l(R) is a scalar Gaussian random variable, obtained by a linear transformation of jointly Gaussian random variables.

The test can be completely described by the distance between the means under the two hypotheses when the variance is normalized to one:

$$d \triangleq \frac{E(l|H_1) - E(l|H_0)}{\sqrt{\mathrm{Var}(l|H_0)}}.$$

Here $E(l|H_i) = \Delta m^T Q\, m_i$ and

$$\mathrm{Var}(l|H_i) = E\left\{\Delta m^T Q\, (R-m_i)(R-m_i)^T Q\, \Delta m\right\} = \Delta m^T Q\, \Delta m,$$

so that

$$d^2 = \Delta m^T Q\, \Delta m.$$

The performance for the equal-covariance Gaussian case is completely determined by this quadratic form.
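Since l(R) is Gaussian under both hypotheses with means separated by d after normalization, the ROC follows the standard Gaussian detection relation $P_D = Q\left(Q^{-1}(P_F) - d\right)$. A sketch with illustrative parameters:

```python
# Performance of the equal-covariance Gaussian test from d alone.
import numpy as np
from scipy.stats import norm

m0 = np.array([0.0, 0.0, 0.0])
m1 = np.array([1.0, 0.5, -0.5])
K  = np.array([[1.0, 0.2, 0.0],
               [0.2, 1.0, 0.1],
               [0.0, 0.1, 1.0]])
dm = m1 - m0
d = np.sqrt(dm @ np.linalg.solve(K, dm))   # d^2 = dm^T Q dm

for P_F in (0.1, 0.01):
    P_D = norm.sf(norm.isf(P_F) - d)       # P_D = Q(Q^{-1}(P_F) - d)
    print(f"d={d:.3f}, P_F={P_F}: P_D={P_D:.3f}")
```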

Examples. Case 1: independent components with equal variance. Each $r_i$ has the same variance σ² and is statistically independent of the others:

$$K = \sigma^2 I, \qquad Q = \frac{1}{\sigma^2} I.$$

The sufficient statistic is just the dot product of the observed vector R and the mean difference vector Δm:

$$l(R) = \frac{1}{\sigma^2}\,\Delta m^T R.$$

[Figure: the conditional densities $p_{r|H_0}(R|H_0)$ and $p_{r|H_1}(R|H_1)$ along the decision axis.]

$$d^2 = \Delta m^T \frac{1}{\sigma^2} I\, \Delta m = \frac{\Delta m^T \Delta m}{\sigma^2} = \frac{\|\Delta m\|^2}{\sigma^2}.$$

d corresponds to the distance between the two mean-value vectors divided by the standard deviation of R.

Case 2: independent components with unequal variances.

$$K = \mathrm{diag}\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2\right), \qquad Q = \mathrm{diag}\left(\sigma_1^{-2}, \sigma_2^{-2}, \ldots, \sigma_N^{-2}\right).$$

The sufficient statistic is

$$l(R) = \sum_{i=1}^{N} \frac{\Delta m_i\, R_i}{\sigma_i^2}, \qquad d^2 = \sum_{i=1}^{N} \frac{(\Delta m_i)^2}{\sigma_i^2}.$$

The result can be interpreted in a new coordinate system in which

$$\Delta m_i' = \frac{\Delta m_i}{\sigma_i} \quad\text{and}\quad R_i' = \frac{R_i}{\sigma_i}.$$

The scale of each axis is changed so that the variances are all equal to one; d then corresponds to the length of the difference vector in this scaled coordinate system. In the scaled coordinate system, $l(R) = \Delta m'^T R'$.

Case 3: eigenvector representation. We represent R in a new coordinate system in which the components are statistically independent random variables. The new set of coordinate axes is defined by the orthonormal vectors $\phi_1, \phi_2, \ldots, \phi_N$,

$$\phi_i^T \phi_j = \delta_{ij}.$$

We denote the observation in the new coordinate system by r′, and we select the orientation of the new system so that the components $r_i'$ and $r_j'$ are uncorrelated. Each new component is expressed simply as a dot product:

$$r_i' = r^T \phi_i.$$

[Figure: the observation R expressed in the original coordinates (R₁, R₂) and in a rotated coordinate system (R₁′, R₂′) defined by φ₁, φ₂, here at 45° to the original axes.]

The covariance matrix in the new coordinate system is

$$\lambda_i \delta_{ij} = \phi_i^T K \phi_j,$$

so the coordinate vectors must satisfy the eigenvalue equation

$$\lambda_i \phi_i = K \phi_i.$$

Properties of K:
- Because K is symmetric, its eigenvalues are real.
- Because K is a covariance matrix, the eigenvalues are nonnegative.
- If the roots $\lambda_i$ are distinct, the corresponding eigenvectors are orthogonal.
- If a particular root is of multiplicity M, the M associated eigenvectors are linearly independent.

The mean difference vector has components $\Delta m_i' = \phi_i^T \Delta m$ for i = 1, …, N. The resulting sufficient statistic in the new coordinate system is

$$l(R) = \sum_{i=1}^{N} \frac{\Delta m_i'\, R_i'}{\lambda_i}.$$

There always exists a coordinate system in which the random variables are uncorrelated, and the new system is related to the old one by a linear transformation.
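A short numerical sketch of the eigenvector representation, using an illustrative K, Δm, and R: the statistic computed in eigenvector coordinates agrees with $\Delta m^T Q R$ computed directly.

```python
# Project onto the eigenvectors of K and verify that
# sum_i dm'_i R'_i / lambda_i equals dm^T Q R.
import numpy as np

K  = np.array([[2.0, 0.5, 0.0],
               [0.5, 1.0, 0.3],
               [0.0, 0.3, 1.5]])
dm = np.array([1.0, -0.5, 0.2])   # mean difference vector
R  = np.array([0.4, 1.1, -0.3])   # an observation

lam, Phi = np.linalg.eigh(K)      # K phi_i = lambda_i phi_i, columns orthonormal
dm_new = Phi.T @ dm               # dm'_i = phi_i^T dm
R_new  = Phi.T @ R                # R'_i  = phi_i^T R

l_eig    = np.sum(dm_new * R_new / lam)   # statistic in the new coordinates
l_direct = dm @ np.linalg.solve(K, R)     # dm^T Q R in the old coordinates
print(l_eig, l_direct)                    # agree up to rounding
```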

Equal mean vectors. The mean vectors on the two hypotheses are equal, $m_0 = m_1 = m$. The general test becomes

$$(R-m)^T (Q_0 - Q_1)(R-m) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 2\ln\eta + \ln\frac{|K_1|}{|K_0|} \triangleq \gamma.$$

The mean-value vector does not contain any information telling us which hypothesis is true, so the likelihood test subtracts it from the received vector (we may assume m = 0). Defining the difference of the inverse matrices, $\Delta Q \triangleq Q_0 - Q_1$, the likelihood ratio test is

$$l(R) = R^T \Delta Q\, R \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \gamma.$$

Special cases. Case 1: diagonal signal covariance matrix, equal variances. On $H_1$, r contains the same noise as on $H_0$ plus additional signal components:

$$H_0:\; r = n, \qquad H_1:\; r = s + n,$$

$$K_0 = \sigma_n^2 I, \qquad K_1 = K_s + \sigma_n^2 I.$$

For the special case $K_s = \sigma_s^2 I$,

$$Q_0 = \frac{1}{\sigma_n^2} I, \qquad Q_1 = \left(\sigma_s^2 + \sigma_n^2\right)^{-1} I, \qquad \Delta Q = \frac{\sigma_s^2}{\sigma_n^2\left(\sigma_s^2 + \sigma_n^2\right)}\, I.$$

Since ΔQ is a multiple of the identity, the constant can be absorbed into the threshold, leaving an energy detector:

$$l(R) = R^T R \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \gamma'.$$
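A sketch of this energy detector: under $H_0$, $R^T R/\sigma_n^2$ is chi-squared with N degrees of freedom, which fixes the threshold for a given $P_F$. All values are illustrative assumptions:

```python
# Energy detector for the diagonal-covariance, equal-means case.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
N, sigma_n, sigma_s, P_F = 20, 1.0, 1.5, 0.05

# Threshold from the H0 distribution of R^T R.
gamma = sigma_n**2 * chi2.isf(P_F, df=N)

# One draw under H1: independent components with variance sigma_s^2 + sigma_n^2.
R = rng.normal(0, np.sqrt(sigma_s**2 + sigma_n**2), N)
print("decide H1" if R @ R > gamma else "decide H0")
```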

Case 2: symmetric hypotheses, uncorrelated noise.

$$H_1:\; r_1 = s + n_1,\; r_2 = n_2; \qquad H_0:\; r_1 = n_1,\; r_2 = s + n_2,$$

$$K_1 = \begin{bmatrix} K_s + \sigma_n^2 I & 0 \\ 0 & \sigma_n^2 I \end{bmatrix}, \qquad K_0 = \begin{bmatrix} \sigma_n^2 I & 0 \\ 0 & K_s + \sigma_n^2 I \end{bmatrix},$$

$$Q_1 = \begin{bmatrix} \left(K_s + \sigma_n^2 I\right)^{-1} & 0 \\ 0 & \sigma_n^{-2} I \end{bmatrix}, \qquad Q_0 = \begin{bmatrix} \sigma_n^{-2} I & 0 \\ 0 & \left(K_s + \sigma_n^2 I\right)^{-1} \end{bmatrix}.$$

With $K_s = \sigma_s^2 I$ and $R = \begin{bmatrix} R_1 \\ R_2 \end{bmatrix}$, the test reduces to a comparison of the energies in the two halves of the observation:

$$l(R) = R_1^T R_1 - R_2^T R_2 \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \gamma.$$
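A Monte Carlo sketch of the symmetric detector, assuming $K_s = \sigma_s^2 I$ and the symmetric threshold γ = 0 (appropriate for equal priors); all values are illustrative:

```python
# Decide H1 when the first half of the observation carries more energy.
import numpy as np

rng = np.random.default_rng(2)
N, sigma_n, sigma_s, trials = 10, 1.0, 1.0, 20000

correct = 0
for _ in range(trials):
    # Generate data under H1: signal in the first half, noise only in the second.
    R1 = rng.normal(0, np.sqrt(sigma_s**2 + sigma_n**2), N)
    R2 = rng.normal(0, sigma_n, N)
    if R1 @ R1 - R2 @ R2 > 0:   # l(R) = R1^T R1 - R2^T R2, gamma = 0 by symmetry
        correct += 1
print("P_D at the symmetric threshold:", correct / trials)
```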

Conclusions. The sufficient statistic for the general Gaussian problem is the difference between two quadratic forms,

$$l(R) = (R - m_0)^T Q_0 (R - m_0) - (R - m_1)^T Q_1 (R - m_1).$$

A particularly simple case is the one where the covariance matrices of the two hypotheses are equal; then the LLR test is $l(R) = \Delta m^T Q\, R$, and the performance is characterized by $d^2 = \Delta m^T Q\, \Delta m$. The results described above can be obtained similarly for the M-hypothesis case.