Pattern Classification, Ch4 (Part 1)


Pattern Classification. All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 4 (Part 1): Non-Parametric Classification (Sections 4.1-4.3): Introduction, Density Estimation, Parzen Windows

Introduction. The underlying density functions are rarely known, and common parametric forms rarely fit the densities actually encountered. For example, all parametric densities are unimodal (have a single local maximum), whereas many practical problems involve multi-modal densities. Goal: non-parametric procedures that can be used with arbitrary distributions and without the assumption that the forms of the underlying densities are known. There are two types of non-parametric methods: estimating the density function p(x | ω_j), or bypassing probability and going directly to a-posteriori probability estimation P(ω_j | x).

Density Estimation. Basic idea: the probability that a vector x will fall in a region R is P = ∫_R p(x′) dx′. P is a smoothed (or averaged) version of the density function p(x). Given a sample of size n, the probability that exactly k points fall in R is binomial: P_k = (n choose k) P^k (1 − P)^(n−k), and the expected value of k is E[k] = nP.

Assume p(x) is continuous and that the region R is so small that p does not vary significantly within it. Then ∫_R p(x′) dx′ ≈ p(x) ∫_R dx′ = p(x) μ(R), where μ(R) is a surface area in the Euclidean space R², a volume in R³, and a hypervolume in R^n.

Since p(x) ≈ p(x′) ≈ constant over R, in the Euclidean space R³ we have ∫_R p(x′) dx′ ≈ p(x) V, and hence p(x) ≈ (k/n)/V, where x is a point within R and V is the volume enclosed by R.

The binomial P_k peaks sharply about its mean, so the most probable value of k is k ≈ nP. Therefore the ratio k/n is a good estimate for the probability P and hence for the density function: p(x) ≈ (k/n)/V.
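The estimate p(x) ≈ (k/n)/V can be checked numerically. Below is a minimal sketch (not from the slides) that draws samples from a known N(0, 1) density, counts how many fall in a small interval R centered at x, and compares (k/n)/V with the true density value:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw n samples from a standard normal and estimate p(x) at x = 0
# by counting the fraction that falls in a small region R of
# volume V centered at x (here, an interval of width h).
n = 100_000
samples = rng.standard_normal(n)

x, h = 0.0, 0.2                           # point of interest, width of R
k = np.sum(np.abs(samples - x) <= h / 2)  # number of samples inside R
p_hat = (k / n) / h                       # p(x) ≈ (k/n) / V, with V = h

true_p = 1.0 / np.sqrt(2 * np.pi)         # N(0,1) density at 0 ≈ 0.3989
print(p_hat, true_p)
```

With n large and R small, the estimate agrees with the true density to within a few thousandths, illustrating the smoothing trade-off: a larger h averages p over R, a smaller h raises the variance of k/n.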


Convergence. The fraction k/(nV) is a space-averaged value of p(x); the exact p(x) is obtained only as V approaches zero. If V → 0 with k = 0, then p(x) = 0 (for n fixed): this is the case where no samples are included in R — an uninteresting case! If V → 0 with k ≠ 0, then p(x) → ∞: in this case the estimate diverges — also an uninteresting case!

The volume V needs to approach 0 anyway if we want to use this estimation. Practically, V cannot be allowed to become arbitrarily small since the number of samples is always limited, so one has to accept a certain amount of variance in the ratio k/n. Theoretically, if an unlimited number of samples is available, we can circumvent this difficulty. To estimate the density at x, we form a sequence of regions R_1, R_2, … containing x: the first region contains one sample, the second two samples, and so on. Let V_n be the volume of R_n, k_n the number of samples falling in R_n, and p_n(x) the nth estimate of p(x): p_n(x) = (k_n/n)/V_n (7)

Three necessary conditions must hold if we want p_n(x) to converge to p(x): 1) lim_{n→∞} V_n = 0, 2) lim_{n→∞} k_n = ∞, 3) lim_{n→∞} k_n/n = 0. There are two different ways of obtaining sequences of regions that satisfy these conditions: (a) shrink an initial region, with V_n = 1/√n, and show that p_n(x) → p(x) — this is called the Parzen-window estimation method; (b) specify k_n as some function of n, such as k_n = √n, and grow the volume V_n until it encloses k_n neighbors of x — this is called the k_n-nearest-neighbor estimation method.


Parzen Windows. The Parzen-window approach to estimating densities assumes that the region R_n is a d-dimensional hypercube of volume V_n = h_n^d, where h_n is the length of an edge of R_n. Let φ(u) be the following window function: φ(u) = 1 if |u_j| ≤ 1/2 for j = 1, …, d, and 0 otherwise. Then φ((x − x_i)/h_n) is equal to unity if x_i falls within the hypercube of volume V_n centered at x, and equal to zero otherwise.

The number of samples in this hypercube is: k_n = Σ_{i=1}^{n} φ((x − x_i)/h_n). By substituting k_n in equation (7), we obtain the following estimate: p_n(x) = (1/n) Σ_{i=1}^{n} (1/V_n) φ((x − x_i)/h_n). Thus p_n(x) estimates p(x) as an average of functions of x and the samples x_i (i = 1, …, n). These window functions φ can be general!
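The hypercube estimate above translates almost line for line into code. This is a minimal sketch (the function name `parzen_estimate` is ours, not from the slides), using the hard-cube window φ from the previous slide:

```python
import numpy as np

def parzen_estimate(x, samples, h):
    """Parzen-window estimate p_n(x) with a d-dimensional hypercube
    window of edge h: phi(u) = 1 iff |u_j| <= 1/2 for every j."""
    x = np.atleast_1d(x)
    samples = np.atleast_2d(samples)           # shape (n, d)
    n, d = samples.shape
    u = (x - samples) / h                      # u_i = (x - x_i) / h_n
    inside = np.all(np.abs(u) <= 0.5, axis=1)  # phi((x - x_i)/h_n)
    k = inside.sum()                           # k_n: samples in the cube
    V = h ** d                                 # V_n = h^d
    return (k / n) / V                         # equation (7)

# Example: samples uniform on [0, 1), where the true density is 1.
rng = np.random.default_rng(1)
samples = rng.random((10_000, 1))
est = parzen_estimate(0.5, samples, h=0.1)
print(est)
```

The same function works unchanged in any dimension d, since the cube test `np.all(..., axis=1)` is exactly the product of the per-coordinate conditions |u_j| ≤ 1/2.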

Parzen-Window Density Estimates

Illustration: the behavior of the Parzen-window method in the case where p(x) ~ N(0,1). Let φ(u) = (1/√(2π)) exp(−u²/2) and h_n = h_1/√n (n > 1), where h_1 is a known parameter. Thus p_n(x) = (1/n) Σ_{i=1}^{n} (1/h_n) φ((x − x_i)/h_n) is an average of normal densities centered at the samples x_i.

Numerical results: For n = 1 and h_1 = 1, p_1(x) = φ(x − x_1) = (1/√(2π)) e^{−(x−x_1)²/2} ~ N(x_1, 1). For n = 10 and h_1 = 0.1, the contributions of the individual samples are clearly observable!


Analogous results are also obtained in two dimensions, as illustrated:


Case where p(x) = λ₁ U(a,b) + λ₂ T(c,d) (unknown density: a mixture of a uniform and a triangle density)


Classification example. In classifiers based on Parzen-window estimation, we estimate the densities for each category and classify a test point by the label corresponding to the maximum posterior. The decision region for a Parzen-window classifier depends upon the choice of window function, as illustrated in the following figure.
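The classification rule just described — one Parzen estimate per category, pick the maximum posterior — can be sketched as follows (a minimal one-dimensional illustration with a Gaussian window; the function name `parzen_classify` and the two synthetic classes are ours, not from the slides):

```python
import numpy as np

def parzen_classify(x, class_samples, h):
    """Classify x by the class whose Parzen density estimate,
    weighted by the class prior n_j / n, is largest."""
    total = sum(len(s) for s in class_samples)
    scores = []
    for samples in class_samples:
        u = (x - samples) / h
        phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
        density = (phi / h).mean()                     # p_n(x | omega_j)
        scores.append(density * len(samples) / total)  # ~ posterior
    return int(np.argmax(scores))

rng = np.random.default_rng(3)
class0 = rng.normal(-2.0, 1.0, 500)    # class 0 centered at -2
class1 = rng.normal(+2.0, 1.0, 500)    # class 1 centered at +2
pred_a = parzen_classify(-1.5, [class0, class1], h=0.5)
pred_b = parzen_classify(+1.5, [class0, class1], h=0.5)
print(pred_a, pred_b)   # -> 0 1
```

Swapping the Gaussian `phi` for the hypercube window changes the shape of the resulting decision regions — which is exactly the dependence on the window function that the slide points out.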


Parzen Windows: Probabilistic Neural Networks (Pattern Classification, Chapter 4, Part 2)

A probabilistic neural network (PNN) computes a Parzen estimate based on n patterns. Input: patterns with d features, sampled from c classes. Each input unit is connected to every pattern unit. [Network diagram: input units x_1, x_2, …, x_d; modifiable (trained) weights w_jk; pattern units p_1, p_2, …, p_n.]

[Network diagram, continued: pattern units p_1, p_2, …, p_n connect to category units ω_1, ω_2, …, ω_c; the pattern units emit nonlinear activation functions.]

Training the network

Training algorithm: 1. Normalize each pattern x of the training set to unit length. 2. Place the first training pattern on the input units. 3. Set the weights linking the input units and the first pattern unit such that w_1 = x_1. 4. Make a single connection from the first pattern unit to the category unit corresponding to the known class of that pattern. 5. Repeat the process for all remaining training patterns, setting the weights such that w_k = x_k (k = 1, 2, …, n).

Testing: 1. Normalize the test pattern x and place it at the input units. 2. Each pattern unit computes the inner product to yield the net activation net_k = w_k^T x and emits the nonlinear function φ(net_k) = exp[(net_k − 1)/σ²]; since x and w_k are both normalized, this equals exp[−(x − w_k)^T(x − w_k)/(2σ²)]. 3. Each output unit sums the contributions from all pattern units connected to it: P_n(x | ω_j) ∝ Σ_i φ_i. 4. Classify by selecting the maximum value of P_n(x | ω_j) (j = 1, …, c).
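The PNN training and testing procedures above amount to very little code: training just stores the normalized patterns as weight vectors. A minimal sketch (function names, the toy two-class data, and σ = 0.3 are our choices, not from the slides):

```python
import numpy as np

def pnn_train(patterns, labels):
    """'Training' a PNN simply stores each normalized training
    pattern as the weight vector of one pattern unit: w_k = x_k."""
    W = patterns / np.linalg.norm(patterns, axis=1, keepdims=True)
    return W, np.asarray(labels)

def pnn_classify(x, W, labels, sigma, n_classes):
    x = x / np.linalg.norm(x)                   # normalize test pattern
    net = W @ x                                 # net_k = w_k^T x
    act = np.exp((net - 1.0) / sigma**2)        # phi(net_k)
    scores = [act[labels == j].sum() for j in range(n_classes)]
    return int(np.argmax(scores))               # max posterior

rng = np.random.default_rng(4)
# Two classes of 2-D patterns near different directions on the unit circle.
c0 = rng.normal([1.0, 0.0], 0.1, size=(20, 2))
c1 = rng.normal([0.0, 1.0], 0.1, size=(20, 2))
W, y = pnn_train(np.vstack([c0, c1]), [0] * 20 + [1] * 20)
pred_a = pnn_classify(np.array([0.9, 0.1]), W, y, sigma=0.3, n_classes=2)
pred_b = pnn_classify(np.array([0.1, 0.9]), W, y, sigma=0.3, n_classes=2)
print(pred_a, pred_b)   # -> 0 1
```

Note that the activation exp[(net_k − 1)/σ²] is only a valid Gaussian window because both x and w_k are normalized, which is why the normalization steps in training and testing are essential.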