Statistical pattern recognition
Ricardo Gutierrez-Osuna, TAMU CSE


Bayes theorem. Problem: deciding if a patient has a particular condition based on a particular test. However, the test is imperfect: someone with the condition may go undetected (false negative), and someone without the condition may come out positive (false positive). Test properties: SPECIFICITY, or true-negative rate, P(NEG|¬COND); SENSITIVITY, or true-positive rate, P(POS|COND).

Problem definition. Assume a population of 10,000, where 1 out of every 100 people has the medical condition. Assume that we design a test with 98% specificity (P(NEG|¬COND) = 0.98) and 90% sensitivity (P(POS|COND) = 0.90). You take the test, and it comes out POSITIVE. What conditional probability are we after? How likely is it that you have the condition?

Solution: joint frequency table. The answer is the ratio of individuals with the condition to the total number of individuals, considering only individuals that tested positive: 90/288 ≈ 0.31.

                     HAS CONDITION             FREE OF CONDITION           ROW TOTAL
TEST IS POSITIVE     true positive             false positive              288
                     0.90 x 100 = 90           (1 - 0.98) x 9,900 = 198
TEST IS NEGATIVE     false negative            true negative               9,712
                     (1 - 0.90) x 100 = 10     0.98 x 9,900 = 9,702
COLUMN TOTAL         100                       9,900                       10,000

Conditional probability:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \qquad \text{(the probability of A given that B has occurred)}$$
Total probability, for a partition $B_1, \ldots, B_N$ of the sample space $S$:
$$P(A) = P(A \cap S) = P(A \cap B_1) + \cdots + P(A \cap B_N) = P(A \mid B_1)P(B_1) + \cdots + P(A \mid B_N)P(B_N) = \sum_{k=1}^{N} P(A \mid B_k)\,P(B_k)$$

Alternative solution: Bayes theorem
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
$$P(\text{cond} \mid +) = \frac{P(+ \mid \text{cond})\,P(\text{cond})}{P(+)} = \frac{P(+ \mid \text{cond})\,P(\text{cond})}{P(+ \mid \text{cond})\,P(\text{cond}) + P(+ \mid \neg\text{cond})\,P(\neg\text{cond})} = \frac{0.90 \times 0.01}{0.90 \times 0.01 + (1 - 0.98) \times 0.99} \approx 0.31$$
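As an illustration (not part of the original slides), here is a minimal Python sketch that computes this posterior two ways, assuming the figures reconstructed above (1% prevalence, 90% sensitivity, 98% specificity, population of 10,000):

```python
# Posterior probability of having the condition given a positive test,
# computed from population counts and from Bayes theorem.
prevalence = 0.01      # P(COND)
sensitivity = 0.90     # P(POS | COND)
specificity = 0.98     # P(NEG | not COND)

# 1) Joint-frequency table over a population of 10,000
population = 10_000
has_cond = population * prevalence                  # 100
free_of_cond = population - has_cond                # 9,900
true_pos = sensitivity * has_cond                   # 90
false_pos = (1 - specificity) * free_of_cond        # 198
posterior_counts = true_pos / (true_pos + false_pos)

# 2) Bayes theorem directly on the probabilities
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
posterior_bayes = sensitivity * prevalence / p_pos

print(posterior_counts, posterior_bayes)            # both ~0.3125
```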

In SPR (statistical pattern recognition), Bayes theorem is expressed as
$$P(\omega_j \mid x) = \frac{P(x \mid \omega_j)\,P(\omega_j)}{\sum_{k=1}^{N} P(x \mid \omega_k)\,P(\omega_k)}$$
where $P(\omega_j \mid x)$ is the posterior, $P(x \mid \omega_j)$ the likelihood, $P(\omega_j)$ the prior, and $P(x)$ (the denominator) the normalization constant. We assign sample $x$ to the class $\omega_k$ with the highest posterior; it can be shown that this rule minimizes the probability of error.
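A small numpy sketch of this decision rule (the likelihood and prior values below are made-up placeholders, only to show how the posterior and arg-max are computed):

```python
import numpy as np

# Hypothetical class-conditional likelihoods P(x | w_j) evaluated at one sample x,
# and class priors P(w_j); the numbers are placeholders for illustration only.
likelihoods = np.array([0.05, 0.20, 0.10])   # P(x | w_1), P(x | w_2), P(x | w_3)
priors = np.array([0.5, 0.3, 0.2])           # P(w_1), P(w_2), P(w_3)

evidence = np.sum(likelihoods * priors)       # P(x), the normalization constant
posteriors = likelihoods * priors / evidence  # P(w_j | x)

predicted_class = np.argmax(posteriors)       # assign x to the class with highest posterior
print(posteriors, predicted_class)
```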

Discriminant functions. [Block diagram: the features $x_1, \ldots, x_d$ feed C discriminant functions $g_1(x), \ldots, g_C(x)$; class assignment selects the maximum.] Assign $x$ to $\omega_i$ where $g_i(x) > g_j(x)$ for all $j \neq i$, with $g_i(x) = P(\omega_i \mid x)$.

Quadratic classifiers. For normally distributed classes, the posterior can be reduced to a very simple expression. Recall that an n-dimensional Gaussian density is
$$p(x) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
Using Bayes rule, the discriminant function can be written as
$$g_i(x) = P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\,P(\omega_i)}{P(x)} = \frac{1}{(2\pi)^{n/2}\,|\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right) \frac{P(\omega_i)}{P(x)}$$

Eliminating constant terms:
$$g_i(x) = |\Sigma_i|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right) P(\omega_i)$$
And taking logs:
$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i)$$
This is known as a quadratic discriminant function (because it is a quadratic function of x).
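A minimal numpy sketch of this quadratic discriminant function, assuming the class means, covariances and priors have already been estimated (the parameter values below are illustrative assumptions, not from the slides):

```python
import numpy as np

def quadratic_discriminant(x, mu, sigma, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma^{-1} (x-mu) - 1/2 log|Sigma| + log P(w_i)."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(sigma) @ diff
            - 0.5 * np.log(np.linalg.det(sigma))
            + np.log(prior))

# Illustrative two-class example in 2D
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
sigma = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
prior = [0.5, 0.5]

x = np.array([2.0, 2.5])
scores = [quadratic_discriminant(x, mu[i], sigma[i], prior[i]) for i in range(2)]
print(np.argmax(scores))   # the class with the largest discriminant wins
```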

Case 1: $\Sigma_i = \sigma^2 I$. Features are statistically independent and have the same variance for all classes. In this case, the quadratic discriminant function becomes
$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T (\sigma^2 I)^{-1} (x - \mu_i) - \frac{1}{2}\log|\sigma^2 I| + \log P(\omega_i) = -\frac{1}{2\sigma^2}(x - \mu_i)^T (x - \mu_i) + \log P(\omega_i) + \text{const}$$
Assuming equal priors and dropping constant terms:
$$g_i(x) = -(x - \mu_i)^T (x - \mu_i) = -\sum_{d=1}^{\text{DIM}} (x_d - \mu_{i,d})^2$$
This is called a Euclidean-distance or nearest-mean classifier.
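A short sketch of the resulting nearest-mean rule (the class means are placeholder assumptions):

```python
import numpy as np

def nearest_mean_classify(x, means):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    dists = [np.sum((x - m) ** 2) for m in means]   # squared Euclidean distances
    return int(np.argmin(dists))

means = [np.array([0.0, 0.0]), np.array([4.0, 1.0]), np.array([1.0, 5.0])]  # assumed
print(nearest_mean_classify(np.array([3.5, 0.5]), means))   # -> 1
```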

[Figure: numerical example for Case 1, showing three class means μ_i and their covariance matrices Σ_i, with the resulting decision regions.]

Case 2: $\Sigma_i = \Sigma$. All classes have the same covariance matrix, but the matrix is not diagonal. In this case, the quadratic discriminant becomes
$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma^{-1} (x - \mu_i) - \frac{1}{2}\log|\Sigma| + \log P(\omega_i)$$
Assuming equal priors and eliminating constants:
$$g_i(x) = -(x - \mu_i)^T \Sigma^{-1} (x - \mu_i)$$
This is known as a Mahalanobis-distance classifier.
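Correspondingly, a minimal sketch of a Mahalanobis-distance classifier under a shared covariance (means and covariance below are illustrative assumptions):

```python
import numpy as np

def mahalanobis_classify(x, means, shared_cov):
    """Assign x to the class mean with the smallest Mahalanobis distance."""
    cov_inv = np.linalg.inv(shared_cov)
    dists = [(x - m) @ cov_inv @ (x - m) for m in means]
    return int(np.argmin(dists))

means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # assumed class means
shared_cov = np.array([[2.0, 1.2], [1.2, 2.0]])        # assumed common covariance
print(mahalanobis_classify(np.array([1.0, 2.0]), means, shared_cov))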

[Figure: numerical example for Case 2, showing three class means μ_i and their identical covariance matrices Σ_i.]

General case: arbitrary Σ_i. [Figure: numerical example with three class means μ_i and distinct covariance matrices Σ_i, shown close up and zoomed out.]

k nearest neighbors: a non-parametric approximation. Let V be the volume of a region around x that contains the k nearest training samples, k_i of which belong to class ω_i, and let N_i be the number of training samples from class ω_i (N in total). The likelihood of each class is
$$P(x \mid \omega_i) = \frac{k_i}{N_i V}$$
and the priors are
$$P(\omega_i) = \frac{N_i}{N}$$
Then the posterior becomes
$$P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\,P(\omega_i)}{P(x)} = \frac{\frac{k_i}{N_i V}\,\frac{N_i}{N}}{\frac{k}{N V}} = \frac{k_i}{k}$$

Example. Given three classes, assign a class label to the unknown example x_u. Assume the Euclidean distance and k neighbors. Of the k closest neighbors, the majority belong to ω_1 and one belongs to ω_3, so x_u is assigned to ω_1, the predominant class.
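A minimal k-NN classifier along these lines (Euclidean distance, majority vote); the toy training data are placeholders:

```python
import numpy as np
from collections import Counter

def knn_classify(x_u, X_train, y_train, k=5):
    """Classify x_u by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x_u, axis=1)   # Euclidean distances to all samples
    nearest = np.argsort(dists)[:k]                 # indices of the k closest samples
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]               # the predominant class

# Toy training set (assumed for illustration): two Gaussian blobs
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)

print(knn_classify(np.array([3.5, 3.8]), X_train, y_train, k=5))   # -> 1
```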


[Figure: k-NN decision boundaries for three different values of k.]

Advantages:
- Simple implementation.
- Nearly optimal in the large-sample limit (N → ∞): P[error]_Bayes ≤ P[error]_1NN ≤ 2 P[error]_Bayes.
- Uses local information, which can yield highly adaptive behavior.
- Lends itself very easily to parallel implementations.
Disadvantages:
- Large storage requirements.
- Computationally intensive recall.
- Highly susceptible to the curse of dimensionality.

Dimensionality reduction

Why do dimensionality reduction? The so-called curse of dimensionality: exponential growth in the number of examples required to accurately estimate a function. Exploratory data analysis: visualizing the structure of the data in a low-dimensional subspace.

Two approaches to perform dimensionality reduction.
Feature selection: choose a subset of all the features:
$$[x_1\; x_2 \ldots x_N] \rightarrow [x_{i_1}\; x_{i_2} \ldots x_{i_M}]$$
Feature extraction: create new features by combining the existing ones:
$$[x_1\; x_2 \ldots x_N] \rightarrow [y_1\; y_2 \ldots y_M] = f([x_1\; x_2 \ldots x_N])$$
Feature extraction is typically a linear transform:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1N} \\ w_{21} & w_{22} & \cdots & w_{2N} \\ \vdots & & \ddots & \vdots \\ w_{M1} & w_{M2} & \cdots & w_{MN} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}$$
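In code, linear feature extraction is a single matrix multiply; the projection matrix W below is a random placeholder (PCA and LDA, discussed next, are two principled ways of choosing it):

```python
import numpy as np

N, M = 5, 2                                        # original and reduced dimensionality
x = np.arange(N, dtype=float)                      # an example feature vector (placeholder)

W = np.random.default_rng(0).normal(size=(M, N))   # assumed M x N projection matrix
y = W @ x                                          # extracted features: y = W x
print(y.shape)                                     # (2,)
```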

Representation vs. classification. [Figure: two-class data plotted over Feature 1 and Feature 2, contrasting the projection that best represents the data with the one that best separates the classes.]

PCA.
Solution: project the data onto the eigenvectors corresponding to the largest eigenvalues of the covariance matrix. PCA finds orthogonal directions of largest variance.
Properties:
- If the data is Gaussian, PCA finds independent axes.
- Otherwise, it simply de-correlates the axes.
Limitation: directions of high variance do not necessarily contain discriminatory information.
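A minimal PCA sketch following the slide: center the data, eigendecompose the covariance matrix, and project onto the eigenvectors with the largest eigenvalues (the random data are illustrative only):

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the top principal components of their covariance."""
    Xc = X - X.mean(axis=0)                           # center the data
    cov = np.cov(Xc, rowvar=False)                    # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    return Xc @ eigvecs[:, order]

X = np.random.default_rng(0).normal(size=(100, 5))    # assumed toy data
print(pca_project(X, 2).shape)                        # (100, 2)
```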

LDA. Define the scatter matrices.
Within-class:
$$S_W = \sum_{i=1}^{C} S_i, \qquad S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T$$
Between-class:
$$S_B = \sum_{i=1}^{C} N_i\,(\mu_i - \mu)(\mu_i - \mu)^T$$
Then maximize the ratio
$$J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}$$

Solution. The optimal projections are the eigenvectors corresponding to the largest eigenvalues of the generalized eigenvalue problem
$$S_B w = \lambda\, S_W w$$
NOTE: $S_B$ is the sum of C matrices of rank one or less, and the mean vectors are constrained by $\frac{1}{C}\sum_{i=1}^{C}\mu_i = \mu$. Therefore, $S_B$ will be at most of rank (C − 1), and LDA produces at most C − 1 feature projections.
Limitations:
- Overfitting.
- Information not in the mean of the data.
- Classes significantly non-Gaussian.
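A sketch of the LDA computation described above: build S_W and S_B from labeled data and solve the generalized eigenvalue problem via S_W^{-1} S_B (toy data assumed; a real implementation would guard against a singular S_W):

```python
import numpy as np

def lda_projections(X, y, n_components):
    """Return the top LDA projection vectors from within/between-class scatter."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)                      # within-class scatter
        diff = (mu_c - mu).reshape(-1, 1)
        S_B += Xc.shape[0] * (diff @ diff.T)                    # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)  # generalized eigenproblem
    order = np.argsort(eigvals.real)[::-1][:n_components]       # at most C-1 useful directions
    return eigvecs[:, order].real

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(2, 1, (30, 4)), rng.normal(4, 1, (30, 4))])
y = np.repeat([0, 1, 2], 30)
print(lda_projections(X, y, 2).shape)    # (4, 2)
```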

[Figure: the data projected onto the first PCA axes versus the first LDA axes, with samples plotted by class label.]

LDA and overfitting. Generate an artificial dataset: three classes, each with the same number of examples per class, and with the exact same likelihood: a multivariate Gaussian with zero mean and identity covariance. [Figure: LDA projections of this dataset for four increasing numbers of dimensions; the apparent class separation grows with dimensionality even though the classes are identical, illustrating overfitting.]