Applications of Information Geometry to Hypothesis Testing and Signal Detection

CMCAA 2016 Applications of Information Geometry to Hypothesis Testing and Signal Detection Yongqiang Cheng National University of Defense Technology July 2016

Outline 1. Principles of Information Geometry 2. Geometry of Hypothesis Testing 3. Matrix CFAR Detection on Manifold of Symmetric Positive-Definite Matrices 4. Geometry of Matrix CFAR Detection

1. Principles of Information Geometry

Important problems in statistics. Given a distribution (likelihood) $p(x \mid \theta)$, where $x$ is a vector of data and $\theta$ is a vector of unknowns:
1. How much does the data $x$ tell about the unknown $\theta$?
2. How good is an estimator $\hat\theta$?
3. How do we measure the difference between two distributions?
4. What is the structure of a statistical model specified by a family of distributions?

1. Principles of Information Geometry

What is information geometry?
[Diagram: data are handled by data processing and statistics; distributions form a manifold; information geometry is statistics carried out on the manifold of distributions.]

1. Principles of Information Geometry

Information geometry is the study of the intrinsic properties of manifolds of probability distributions by way of differential geometry. Its main tenet is that many important structures in information theory and statistics can be treated as structures in differential geometry, by regarding a space of probability distributions as a differentiable manifold endowed with a Riemannian metric and a family of affine connections.
[Diagram: relationships with other subjects: information theory, statistics, probability theory, physics, systems theory, differential geometry, Riemannian geometry.]

1. Principles of Information Geometry

Statistical manifold: $S = \{\, p(x \mid \theta) : x \in \mathbb{R}^n,\ \theta \in \Theta \,\}$, with $\theta$ serving as coordinates.

Riemannian metric (the Fisher information matrix $G(\theta) = [g_{ij}]$):
$$g_{ij}(\theta) = E_\theta\!\left[ \frac{\partial \log p}{\partial \theta_i} \frac{\partial \log p}{\partial \theta_j} \right]$$

Distance and geodesic:
$$ds^2 = \sum_{i,j} g_{ij}(\theta)\, d\theta_i\, d\theta_j = d\theta^{T} G(\theta)\, d\theta$$

Affine connections (with $l(x,\theta) = \log p(x \mid \theta)$):
$$\Gamma_{ij,m}(\theta) = E_\theta\big[ \partial_i \partial_j l(x,\theta)\, \partial_m l(x,\theta) \big] + \tfrac{1}{2} E_\theta\big[ \partial_i l(x,\theta)\, \partial_j l(x,\theta)\, \partial_m l(x,\theta) \big]$$

Curvature:
$$R^{l}{}_{ijk} = \partial_j \Gamma^{l}_{ik} - \partial_k \Gamma^{l}_{ij} + \Gamma^{l}_{js} \Gamma^{s}_{ik} - \Gamma^{l}_{ks} \Gamma^{s}_{ij}$$
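
To make the Fisher metric concrete, here is a minimal numerical sketch (an illustration added to the transcript, not from the slides): a Monte Carlo estimate of $G(\theta)$ for the univariate Gaussian family $p(x;\mu,\sigma)$, checked against the known closed form $G = \mathrm{diag}(1/\sigma^2,\ 2/\sigma^2)$.

```python
# Monte Carlo estimate of the Fisher metric G(theta) for the univariate
# Gaussian family p(x; mu, sigma), theta = (mu, sigma); the closed form
# G = diag(1/sigma^2, 2/sigma^2) serves as a check.
import numpy as np

def score(x, mu, sigma):
    """Gradient of log p(x; mu, sigma) with respect to (mu, sigma)."""
    d_mu = (x - mu) / sigma**2
    d_sigma = (x - mu)**2 / sigma**3 - 1.0 / sigma
    return np.stack([d_mu, d_sigma], axis=-1)

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=200_000)

s = score(x, mu, sigma)        # (N, 2) array of score samples
G = s.T @ s / len(x)           # E[score score^T] approximates G(theta)
print(np.round(G, 3))          # close to [[0.25, 0], [0, 0.5]]
```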

Outline 1. Principles of Information Geometry 2. Geometry of Hypothesis Testing 3. Matrix CFAR Detection on Manifold of Symmetric Positive-Definite Matrices 4. Geometry of Matrix CFAR Detection

2. Geometry of Hypothesis Testing

1) Start from target detection
[Figure: radar target detection scene and the two conditional densities $P(x \mid H_0)$ and $P(x \mid H_1)$ of the received data $x$.]
Target detection is a hypothesis testing problem.

2. Geometry of Hypothesis Testing

2) Likelihood ratio test
Principle: partition the observation space into decision regions.
Basic method: the detector decides $H_1$ if the likelihood ratio exceeds a threshold,
$$L(x) = \frac{p(x \mid H_1)}{p(x \mid H_0)} \gtrless \gamma$$
Essence of signal detection: discriminating between two distributions from the same family that differ only in their parameters.
[Figure: the densities $p(x \mid H_0)$ and $p(x \mid H_1)$ over $x$, with the decision threshold between them.]
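
A minimal likelihood ratio test sketch for a toy problem (an assumed example, not from the slides: a unit mean shift in unit-variance Gaussian noise; the log-threshold $\gamma = 0$ corresponds to equal priors and costs):

```python
# Minimal likelihood ratio test (illustrative setup):
# H0: x_i ~ N(0,1)  vs  H1: x_i ~ N(1,1), i.i.d. samples.
import numpy as np
from scipy.stats import norm

def log_lrt(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """log L(x) = sum_i [log p(x_i|H1) - log p(x_i|H0)]."""
    return np.sum(norm.logpdf(x, mu1, sigma) - norm.logpdf(x, mu0, sigma))

rng = np.random.default_rng(1)
x = rng.normal(1.0, 1.0, size=20)         # data actually generated under H1
gamma = 0.0                               # log-threshold (equal priors/costs)
print("decide H1" if log_lrt(x) > gamma else "decide H0")
```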

2. Geometry of Hypothesis Testing

3) Geometric interpretation of hypothesis testing
Univariate Gaussian family:
$$p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right), \qquad H = \{ (\mu, \sigma) : \mu \in \mathbb{R},\ \sigma > 0 \}$$
A family of distributions forms a statistical manifold: each distribution is a point, and pairs of distributions $A, B$ and $C, D$ are compared through manifold distances $d_F(A,B)$ and $d_F(C,D)$.
Consider hypothesis testing from a geometric viewpoint.

2. Geometry of Hypothesis Testing

3) Geometric interpretation of hypothesis testing
Equivalence between the LRT and the Kullback-Leibler divergence (KLD). Suppose $x_1, x_2, \ldots, x_N$ are i.i.d. observations from a distribution $q(x)$, and there are two models (hypotheses) for $q(x)$, denoted $p_0(x)$ and $p_1(x)$. The KLD is
$$D(q \,\|\, p) = \int q(x) \ln \frac{q(x)}{p(x)}\, dx$$
and the likelihood ratio is
$$L = \prod_{i=1}^{N} \frac{p_1(x_i)}{p_0(x_i)}, \qquad \frac{1}{N} \log L = D(\hat q \,\|\, p_0) - D(\hat q \,\|\, p_1),$$
where $\hat q$ is the empirical distribution of the data. Deciding $H_1$ when $D(\hat q \,\|\, p_0) \ge D(\hat q \,\|\, p_1)$ is a minimum distance detector.
Error exponent (Stein's lemma):
$$K = \lim_{N \to \infty} -\frac{1}{N} \log P_M = D(p_0 \,\|\, p_1), \qquad P_M \doteq 2^{-NK}$$
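
The identity $(1/N)\log L = D(\hat q \| p_0) - D(\hat q \| p_1)$ can be checked numerically: the entropy of the empirical distribution $\hat q$ cancels in the difference, leaving only cross-entropies. A small sketch under assumed Gaussian hypotheses:

```python
# Numerical check of (1/N) log L = D(q^||p0) - D(q^||p1).
# The Gaussian hypotheses below are assumed for illustration.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(0.7, 1.0, size=5000)       # i.i.d. observations from some q

ll0 = norm.logpdf(x, 0.0, 1.0)            # log p0(x_i) under H0
ll1 = norm.logpdf(x, 1.0, 1.0)            # log p1(x_i) under H1

lhs = np.mean(ll1 - ll0)                  # (1/N) log L
rhs = (-np.mean(ll0)) - (-np.mean(ll1))   # cross-entropy difference:
                                          # D(q^||p0) - D(q^||p1)
print(lhs, rhs)                           # equal up to floating-point error
```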

2. Geometry of Hypothesis Testing

3) Geometric interpretation of hypothesis testing
Hypothesis testing can thus be regarded as a discrimination problem: the decision is made by comparing the KL divergences from the estimated signal distribution to the two hypotheses, i.e., selecting the model that is closer to the estimate.
[Figure: the statistical manifold $S$ with the estimate $p(x \mid \hat\theta)$ and the hypothesis points $p(x \mid \theta_0)$, $p(x \mid \theta_1)$ at divergences $d_0$ and $d_1$.]
Decide $H_1$ iff $D(\hat q \,\|\, p_0) \ge D(\hat q \,\|\, p_1)$: a minimum distance detector.

Outline 1. Principles of Information Geometry 2. Geometry of Hypothesis Testing 3. Matrix CFAR Detection on Manifold of Symmetric Positive-Definite Matrices 4. Geometry of Matrix CFAR Detection

3. Matrix CFAR Detection

1) Constant false alarm rate (CFAR) detector
Classical CFAR detector: the decision is made by comparing the content of the cell under test with an adaptive threshold given by the arithmetic mean of the reference cells, so as to achieve the desired constant probability of false alarm.
[Diagram: reference cells $x_1, \ldots, x_N$ surrounding the detection cell $x_D$; the arithmetic mean of the reference cells, scaled by a factor $c$, forms the decision threshold. $H_0$: target absent; $H_1$: target present.]
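
A minimal cell-averaging CFAR sketch (assumptions: scalar power samples, exponential clutter, and an illustrative scale factor standing in for one derived from the desired $P_{FA}$):

```python
# Sketch of a cell-averaging CFAR decision on scalar power samples.
import numpy as np

def ca_cfar(cells, cut, num_ref=8, num_guard=2, scale=3.0):
    """Compare the cell under test with scale * mean(reference cells)."""
    left = cells[cut - num_guard - num_ref : cut - num_guard]
    right = cells[cut + num_guard + 1 : cut + num_guard + 1 + num_ref]
    threshold = scale * np.mean(np.concatenate([left, right]))
    return cells[cut] > threshold          # True -> H1 (target present)

rng = np.random.default_rng(3)
power = rng.exponential(1.0, size=64)      # exponential clutter power
power[32] += 12.0                          # inject a target at cell 32
print(ca_cfar(power, 32))                  # -> True
```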

3. Matrix CFAR Detection

2) Matrix CFAR detector
In 2008, F. Barbaresco proposed a generalized CFAR technique based on the manifold of symmetric positive-definite (SPD) matrices: the reference cells $R_1, \ldots, R_N$ and the cell under test $R_D$ are covariance matrices, and the comparison is carried out on the matrix manifold. The Riemannian distance-based detector has been shown to outperform the classical CFAR detector.

3. Matrix CFAR Detection

2) Matrix CFAR detector
Riemannian distance between two SPD matrices:
$$d^2(R_1, R_2) = \big\| \ln\!\big( R_1^{-1/2} R_2 R_1^{-1/2} \big) \big\|_F^2 = \sum_{k=1}^{n} \ln^2 \lambda_k,$$
where the $\lambda_k$ are the eigenvalues of $R_1^{-1} R_2$.
Riemannian center of $N$ SPD matrices:
$$\bar R = \arg\min_{R} \sum_{i=1}^{N} w_i\, d^{\,p}(R, R_i),$$
where $p = 1$ gives the median and $p = 2$ the mean.
The matrix CFAR detector compares $d(\bar R, R_D)$ with a threshold.
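
A sketch of the Riemannian distance and a fixed-point computation of the Riemannian ($p = 2$) mean, assuming small, well-conditioned SPD matrices; the eigenvalue route for the distance follows the formula above:

```python
# Sketch: affine-invariant Riemannian distance between SPD matrices and a
# fixed-point computation of the Riemannian (p=2) mean.
import numpy as np
from scipy.linalg import eigh, expm, logm, fractional_matrix_power as powm

def riem_dist(R1, R2):
    """d(R1,R2) = ||ln(R1^{-1/2} R2 R1^{-1/2})||_F = sqrt(sum_k ln^2 lambda_k),
    with lambda_k the generalized eigenvalues of (R2, R1)."""
    lam = eigh(R2, R1, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def riem_mean(Rs, iters=50):
    R = np.mean(Rs, axis=0)                 # start from the arithmetic mean
    for _ in range(iters):
        Rh, Rih = powm(R, 0.5), powm(R, -0.5)
        # average the log-maps of the R_i at the current estimate
        T = np.mean([logm(Rih @ Ri @ Rih) for Ri in Rs], axis=0)
        R = np.real(Rh @ expm(T) @ Rh)      # exp-map back; strip round-off
    return R

rng = np.random.default_rng(4)
Rs = []
for _ in range(5):
    A = rng.normal(size=(4, 4))
    Rs.append(A @ A.T + 4 * np.eye(4))      # random SPD matrices

Rbar = riem_mean(Rs)
print([round(riem_dist(Rbar, Ri), 3) for Ri in Rs])
```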

3. Matrix CFAR Detection

2) Matrix CFAR detector
[Figure: initial and mean spectra (intensity) of the measurements, and the outputs of the classical detector and the geometric detector.]

3. Matrix CFAR Detection

3) Robust matrix CFAR detector
Two shortcomings of the Riemannian distance-based matrix CFAR detector:
a) the computational cost is high, owing to the matrix logarithm and exponential operations needed to compute the Riemannian distance and its average;
b) the Riemannian mean and median are not robust to outliers.

3. Matrix CFAR Detection

3) Robust matrix CFAR detector
Two alternatives:
- the symmetrized Kullback-Leibler (sKL) divergence based matrix CFAR detector;
- the total Kullback-Leibler (tKL) divergence based matrix CFAR detector.
[Diagram: sample covariance matrices $R_1, \ldots, R_N$ around the cell under test; the sKL mean, sKL median, or tKL t-center is computed from the reference cells, the divergence to the cell under test is evaluated, and the result is compared with a threshold.]

3. Matrix CFAR Detection

3) Robust matrix CFAR detector
sKL divergence between two SPD matrices:
$$\mathrm{skl}(R_1, R_2) = \tfrac{1}{2} \operatorname{tr}\!\big( R_1^{-1} R_2 + R_2^{-1} R_1 - 2I \big)$$
sKL mean of $N$ SPD matrices: with the arithmetic mean $A = \frac{1}{N}\sum_{i=1}^{N} R_i$ and the harmonic mean $H = \big( \frac{1}{N}\sum_{k=1}^{N} R_k^{-1} \big)^{-1}$, the mean is the solution of $\bar R\, H^{-1} \bar R = A$:
$$\bar R = H^{1/2} \big( H^{-1/2} A\, H^{-1/2} \big)^{1/2} H^{1/2}$$
sKL median of $N$ SPD matrices: obtained by the fixed-point iteration $R_{k+1} P_k R_{k+1} = Q_k$, with
$$Q_k = \sum_{i=1}^{N} \frac{R_i}{\mathrm{skl}^{1/2}(R_k, R_i)}, \qquad P_k = \sum_{j=1}^{N} \frac{R_j^{-1}}{\mathrm{skl}^{1/2}(R_k, R_j)}$$
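
A sketch of the sKL divergence and its closed-form mean, under the reconstruction above (arithmetic mean $A$, harmonic mean $H$, mean solving $\bar R H^{-1} \bar R = A$); the final prints compare the objective at the sKL mean with the objective at the arithmetic mean:

```python
# Sketch: sKL (Jeffreys) divergence between zero-mean Gaussians with
# covariances R1, R2, and its mean as the solution of R H^{-1} R = A.
import numpy as np
from scipy.linalg import fractional_matrix_power as powm

def skl(R1, R2):
    n = R1.shape[0]
    return 0.5 * np.trace(np.linalg.solve(R1, R2)
                          + np.linalg.solve(R2, R1) - 2 * np.eye(n))

def skl_mean(Rs):
    A = np.mean(Rs, axis=0)                                   # arithmetic mean
    H = np.linalg.inv(np.mean([np.linalg.inv(R) for R in Rs], axis=0))
    Hh, Hih = powm(H, 0.5), powm(H, -0.5)
    return Hh @ powm(Hih @ A @ Hih, 0.5) @ Hh                 # R H^-1 R = A

rng = np.random.default_rng(5)
Rs = []
for _ in range(6):
    M = rng.normal(size=(3, 3))
    Rs.append(M @ M.T + 3 * np.eye(3))                        # random SPD

Rbar = skl_mean(Rs)
print(sum(skl(Rbar, Ri) for Ri in Rs))            # objective at the sKL mean
print(sum(skl(np.mean(Rs, axis=0), Ri) for Ri in Rs))  # larger, at arith. mean
```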

3. Matrix CFAR Detection

3) Robust matrix CFAR detector
The tKL divergence is a special case of the total Bregman divergence (tBD), which is invariant to rotations of the coordinate system:
$$\mathrm{BD}(x, y) = f(x) - f(y) - \langle x - y,\ \nabla f(y) \rangle$$
$$\mathrm{tbd}(x, y) = \frac{f(x) - f(y) - \langle x - y,\ \nabla f(y) \rangle}{\sqrt{1 + \|\nabla f(y)\|^2}}$$
Since $\mathrm{tbd}(x, y) \le \mathrm{BD}(x, y)$, with the gap growing as $\|\nabla f(y)\|$ grows, tBD down-weights points with steep gradients and is therefore more robust.

3. Matrix CFAR Detection

3) Robust matrix CFAR detector
tKL divergence between two $n \times n$ SPD matrices:
$$\mathrm{tkl}(R_1, R_2) = \frac{\tfrac{1}{2}\big[ \operatorname{tr}(R_2^{-1} R_1) - \log\det(R_2^{-1} R_1) - n \big]}{\sqrt{c + \tfrac{1}{4} \log^2\det R_2 - \tfrac{n(1+\log 2)}{2} \log\det R_2}},$$
where $c$ is a constant depending only on the dimension $n$.
tKL t-center of $N$ SPD matrices:
$$\bar R = \Big( \sum_i w_i R_i^{-1} \Big)^{-1}, \qquad w_i = \frac{\big( c + \tfrac{1}{4}\log^2\det R_i - \tfrac{n(1+\log 2)}{2}\log\det R_i \big)^{-1/2}}{\sum_j \big( c + \tfrac{1}{4}\log^2\det R_j - \tfrac{n(1+\log 2)}{2}\log\det R_j \big)^{-1/2}}$$
Each weight $w_i$ is inversely proportional to the norm of the divergence gradient at $R_i$, which makes the t-center robust to outliers.
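
A sketch of the tKL t-center under the reconstruction above. The constant $c$ used here is an assumption, chosen only to keep the radicand positive; the point of the demo is structural: a matrix with a large gradient norm (an outlier) receives a comparatively small weight.

```python
# Sketch of the tKL t-center: inverse of a weighted sum of inverses, with
# weights proportional to 1/sqrt(c + (1/4)log^2 det R - (n(1+log2)/2) log det R).
# The value of c below is an assumed dimension-only constant, not from the slides.
import numpy as np

def tkl_weights(Rs, c=None):
    n = Rs[0].shape[0]
    b = n * (1 + np.log(2))
    if c is None:
        c = b**2 / 4 + 1.0          # keeps the radicand strictly positive
    g = []
    for R in Rs:
        ld = np.log(np.linalg.det(R))
        g.append(np.sqrt(c + 0.25 * ld**2 - 0.5 * b * ld))
    w = 1.0 / np.asarray(g)
    return w / w.sum()              # normalized weights, small for outliers

def tkl_center(Rs, c=None):
    w = tkl_weights(Rs, c)
    P = sum(wi * np.linalg.inv(Ri) for wi, Ri in zip(w, Rs))
    return np.linalg.inv(P)

rng = np.random.default_rng(6)
Rs = []
for _ in range(8):
    M = rng.normal(size=(3, 3))
    Rs.append(M @ M.T + 2 * np.eye(3))
Rs.append(50.0 * np.eye(3))           # outlier with a large log-det
print(np.round(tkl_weights(Rs), 3))   # the outlier's weight is the smallest
print(tkl_center(Rs).round(2))
```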

3. Matrix CFAR Detection

3) Robust matrix CFAR detector
Comparison of the dissimilarity measures: Riemannian distance, sKL divergence, and tKL divergence.
The signal-to-clutter ratio (SCR) is significantly improved by the tKL-divergence mapping.

3. Matrix CFAR Detection

3) Robust matrix CFAR detector
Comparison of detection performance among the Riemannian distance, sKL divergence, and tKL divergence detectors.
The tKL divergence based matrix CFAR detector achieves the best detection performance.

3. Matrix CFAR Detection

3) Robust matrix CFAR detector

Table I. Time taken by the different algorithms

    Algorithm                     Time (s)
    Riemannian mean detector         29.74
    Riemannian median detector       41.66
    sKL mean detector                 0.09
    sKL median detector               2.81
    tKL t-center detector             0.15

Outline 1. Principles of Information Geometry 2. Geometry of Hypothesis Testing 3. Matrix CFAR Detection on Manifold of Symmetric Positive-Definite Matrices 4. Geometry of Matrix CFAR Detection

4. Geometry of Matrix CFAR Detection

Classical CFAR detector: Euclidean space, Euclidean distance measure.
Matrix CFAR detector: matrix manifold; Riemannian distance measure, KL divergence, etc.
A good detector should:
- properly characterize the intrinsic structure of the measurement space;
- maximize the divergence between the two hypotheses (clusters).

4. Geometry of Matrix CFAR Detection

Future work:
- other divergences that measure the dissimilarity between distributions with better performance;
- better approaches for clustering the distributions;
- detectors for heavy-tailed clutter;
- detectors for nonstationary clutter;
- detectors for small sample sizes.

Thank you for your attention!