ENG 8801 - Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland


ENG 8801 - Special Topics in Computer Engineering: Pattern Recognition
Memorial University of Newfoundland
Pattern Recognition, Lecture 7, May 3, 2006
http://www.engr.mun.ca/~charlesr
Office Hours: Tuesdays and Thursdays, 8:30-9:30 PM, E-000 (until further notice)

Review: Bayesian Classification

For a given pattern x, classify it to the most probable class:

    x ∈ C_i if P(C_i | x) > P(C_j | x) for all j ≠ i

Use Bayes' theorem to find the probabilities:

    P(C_i | x) = p(x | C_i) P(C_i) / p(x)

where P(C_i | x) is the a posteriori probability of C_i given x (after measurement), p(x | C_i) is the class-conditional pdf (derived from samples), P(C_i) is the a priori probability of class C_i, and p(x) is the total pdf of x, independent of class.
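To make the review concrete, here is a minimal sketch of the posterior computation and the resulting decision in Python; the priors and the class-conditional density values are made-up numbers, not from the lecture.

```python
# Minimal sketch of Bayes' theorem for classification (hypothetical numbers).
priors = {"C1": 0.7, "C2": 0.3}          # P(C_i), assumed known
likelihoods = {"C1": 0.02, "C2": 0.05}   # p(x | C_i) evaluated at the observed x

evidence = sum(likelihoods[c] * priors[c] for c in priors)           # p(x)
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}

decision = max(posteriors, key=posteriors.get)                       # most probable class
print(posteriors, "->", decision)
```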

Review: Maximum A Posteriori (Bayes) Classifier

Decide C_i over C_j when

    p(x | C_i) P(C_i) > p(x | C_j) P(C_j),

or equivalently when the ratio p(x | C_i) P(C_i) / [ p(x | C_j) P(C_j) ] exceeds the threshold θ (θ = 1 for the MAP rule).

Review: Probability of Error

    P_MAP(error) ≤ P_ML(error) ≤ P_MICD(error) ≤ P_MED(error)

    P(error) = P(classifying as C_i | C_j) P(C_j) + P(classifying as C_j | C_i) P(C_i)
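As an illustration of the error expression, the following sketch numerically evaluates P(error) for two hypothetical 1-D Gaussian classes under the MAP rule; the means, variances, and priors are assumed for the example, and SciPy is used only to evaluate the densities.

```python
# Numerical P(error) for two 1-D Gaussian classes under the MAP rule (illustrative parameters).
import numpy as np
from scipy.stats import norm

p1, p2 = 0.6, 0.4                          # priors P(C1), P(C2)
c1, c2 = norm(0.0, 1.0), norm(2.0, 1.0)    # class-conditional densities p(x|C1), p(x|C2)

x = np.linspace(-8, 10, 20001)
decide_c1 = c1.pdf(x) * p1 > c2.pdf(x) * p2     # MAP decision at each x

# P(error) = P(decide C1 | C2) P(C2) + P(decide C2 | C1) P(C1)
p_err = np.trapz(c2.pdf(x) * decide_c1, x) * p2 + np.trapz(c1.pdf(x) * ~decide_c1, x) * p1
print(f"P(error) ≈ {p_err:.4f}")
```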

Loss and Conditional Risk

Loss: λ_ij = λ(α_i | C_j) = cost of action α_i given class C_j.

Conditional risk:

    R(α_i | x) = Σ_{j=1}^{c} λ(α_i | C_j) P(C_j | x)

Classifier: choose the action α_i that minimizes the conditional risk, i.e. take α_i if R(α_i | x) ≤ R(α_j | x) for all j.

Probabilistic Classification

With prior probabilities and class-conditional densities, we can design optimal classifiers based on the MAP classifier. Unfortunately, in the real world we rarely have this information available. We typically have a number of training samples from which we must determine information about the classes. If we can assume the form of the probability distributions, such as Gaussian, then we simply need to find (estimate) the necessary parameters (mean and covariance). Otherwise we need to use nonparametric techniques to estimate the distribution.
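A minimal sketch of the conditional-risk rule above, assuming a hypothetical 2-action, 2-class loss matrix and posterior values: it evaluates R(α_i | x) for each action and selects the smallest.

```python
# Conditional risk R(alpha_i | x) = sum_j lambda(alpha_i | C_j) P(C_j | x); pick the argmin (illustrative numbers).
import numpy as np

loss = np.array([[0.0, 10.0],     # lambda(alpha_1 | C_1), lambda(alpha_1 | C_2)
                 [1.0,  0.0]])    # lambda(alpha_2 | C_1), lambda(alpha_2 | C_2)
posterior = np.array([0.3, 0.7])  # P(C_1 | x), P(C_2 | x) for the observed x

risk = loss @ posterior           # R(alpha_i | x) for each action
best_action = np.argmin(risk)
print(risk, "-> take action", best_action + 1)
```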

Probabilistic Classification

Given labelled samples {x_i : x_i ∈ C_j, i = 1, ..., N}:

1. Estimate p(x | C_i), P(C_i)
   (a) Parameter estimation - assume a known form for the pdf and estimate the necessary parameters (mean and variance for normal distributions)
   (b) Density estimation - estimate non-parametric pdfs from the given samples
2. Evaluate the discriminant
3. Assign x accordingly, using the estimated values:

    x ∈ C_1 if p̂(x | C_1) P̂(C_1) > p̂(x | C_2) P̂(C_2), otherwise x ∈ C_2

Parameter Estimation

- Assume a form for the distribution, which will dictate the required parameters
- Gaussian pdf: µ̂, σ̂² for 1-D and µ̂, Σ̂ for multi-D

Assuming two classes, there are N = N_1 + N_2 samples, with P(C_1) = P and P(C_2) = 1 - P. We need to find P!

There are two approaches to estimating the parameters:
1. Maximum likelihood
2. Bayes estimation (Bayesian learning)

Maximum Likelihood Parameter Estimation

Choose as estimates the values of the parameters that maximize the likelihood (probability) of the observed set of (labelled) training samples.

A Priori Probability Estimates

Assuming that we knew P(C_1) = P and P(C_2) = 1 - P, then we could write P(N_1, N | P), the probability of N_1 occurrences of C_1 in N samples, with P(C_1) for any single sample given by P:

    P(N_1, N | P) = (N choose N_1) P^{N_1} (1 - P)^{N - N_1} = [N! / (N_1! (N - N_1)!)] P^{N_1} (1 - P)^{N - N_1}

We need to find the value of P that maximizes P(N_1, N | P). Label this value of P as P̂_ML. So we take the derivative with respect to P and set it to zero:

    d/dP {P(N_1, N | P)} = (N choose N_1) { N_1 P^{N_1 - 1} (1 - P)^{N - N_1} - (N - N_1) P^{N_1} (1 - P)^{N - N_1 - 1} } = 0

    N_1 (1 - P) - (N - N_1) P = 0
    N_1 - N P = 0
    P̂_ML = N_1 / N
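As a quick sanity check (not part of the lecture), the sketch below evaluates the binomial likelihood on a grid of P values for made-up counts N and N_1, and confirms that it peaks at N_1/N.

```python
# Numerical check that the binomial likelihood P(N1, N | P) peaks at P = N1/N (made-up counts).
import numpy as np
from scipy.stats import binom

N, N1 = 50, 18                            # hypothetical sample counts
P_grid = np.linspace(1e-6, 1 - 1e-6, 100001)
likelihood = binom.pmf(N1, N, P_grid)     # (N choose N1) P^N1 (1-P)^(N-N1)

P_hat = P_grid[np.argmax(likelihood)]
print(f"argmax ≈ {P_hat:.4f}, N1/N = {N1/N:.4f}")
```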

So the maximum likelihood estimate of the a priori probability of C_1 is the relative frequency of occurrence of C_1 in the N samples.

Gaussian Parameter Estimation

Follow a similar process to find the parameters for an assumed Gaussian density. If we knew µ, Σ then we could write the probability of the samples x_1, ..., x_N as

    p(x_1, ..., x_N | µ, Σ) = Π_{i=1}^{N} [ 1 / ((2π)^{d/2} |Σ|^{1/2}) ] exp( -(1/2) (x_i - µ)^T Σ^{-1} (x_i - µ) )

We are looking for the µ, Σ that maximize this. That is, for

    ∂/∂µ p(x_1, ..., x_N | µ, Σ) = 0
    ∂/∂Σ p(x_1, ..., x_N | µ, Σ) = 0

First, take the log to simplify:

    log p(x_1, ..., x_N | µ, Σ) = -(Nd/2) log(2π) - (N/2) log|Σ| - (1/2) Σ_{i=1}^{N} (x_i - µ)^T Σ^{-1} (x_i - µ)

Differentiating with respect to µ and setting the result to zero gives

    Σ_{i=1}^{N} Σ^{-1} (x_i - µ) = 0    ⟹    µ̂_ML = (1/N) Σ_{i=1}^{N} x_i = m, the sample mean.

Similarly for the covariance matrix Σ. Consider θ = Σ^{-1} for convenience:

    log p = -(Nd/2) log(2π) + (N/2) log|θ| - (1/2) Σ_{i=1}^{N} (x_i - µ)^T θ (x_i - µ)

Note that the transpose changes position after differentiation. We use the results that

    ∂/∂θ log|θ| = (θ^{-1})^T    and    ∂/∂θ (x^T θ x) = x x^T

For the maximum value (set the equation to zero):

    (N/2) (θ̂_ML^{-1})^T - (1/2) Σ_{i=1}^{N} (x_i - µ)(x_i - µ)^T = 0
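A short sketch of these two ML estimates applied to synthetic 2-D Gaussian data; the true mean and covariance are invented for the example, and the (1/N) covariance form corresponds to NumPy's biased option.

```python
# Sketch: ML estimates of a Gaussian's mean and covariance from samples (synthetic data, not lecture data).
import numpy as np

rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=500)   # N x d sample matrix

N = X.shape[0]
mu_ml = X.mean(axis=0)                    # (1/N) sum_i x_i
diff = X - mu_ml
Sigma_ml = diff.T @ diff / N              # (1/N) sum_i (x_i - mu)(x_i - mu)^T

print(mu_ml)
print(Sigma_ml)                           # same as np.cov(X.T, bias=True)
```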

    (θ̂_ML^{-1})^T = (1/N) Σ_{i=1}^{N} (x_i - µ)(x_i - µ)^T

But θ = Σ^{-1} and θ^{-1} = Σ, so

    Σ̂_ML = (1/N) Σ_{i=1}^{N} (x_i - µ̂_ML)(x_i - µ̂_ML)^T = S

which is the sample covariance matrix! Note that this form is biased.

Bias and Biased Estimates

We now have maximum likelihood values for the pdf parameters. These values maximize the probability of the samples observed in the training set:

- the estimated class a priori probabilities
- the estimated mean and covariance matrices for a normal distribution,

    µ̂_ML = (1/N) Σ_{i=1}^{N} x_i = m
    Σ̂_ML = (1/N) Σ_{i=1}^{N} (x_i - µ̂_ML)(x_i - µ̂_ML)^T

Are these formulations biased?

Definition: An estimate is unbiased if its expected value is its true value.

Is the maximum likelihood estimate of the mean biased?

    E[µ̂_ML] = E[ (1/N) Σ_{i=1}^{N} x_i ] = (1/N) Σ_{i=1}^{N} E[x_i] = (1/N)(N µ) = µ

So the ML mean is unbiased.

What is the expected value of the ML estimate of the covariance matrix?

    E[Σ̂_ML] = E[ (1/N) Σ_{i=1}^{N} (x_i - µ̂_ML)(x_i - µ̂_ML)^T ] = (1/N) Σ_{i=1}^{N} E[ (x_i - µ̂_ML)(x_i - µ̂_ML)^T ]

Writing x_i - µ̂_ML = (x_i - µ) - (µ̂_ML - µ) and expanding:

    E[ (x_i - µ̂_ML)(x_i - µ̂_ML)^T ]
      = E[ (x_i - µ)(x_i - µ)^T - (x_i - µ)(µ̂_ML - µ)^T - (µ̂_ML - µ)(x_i - µ)^T + (µ̂_ML - µ)(µ̂_ML - µ)^T ]

However, since µ̂_ML = (1/N) Σ_{i=1}^{N} x_i, averaging the cross terms over i leaves

    E[Σ̂_ML] = Σ - E[ (µ̂_ML - µ)(µ̂_ML - µ)^T ]

So the expected value of the ML covariance matrix is the actual covariance matrix minus a small amount: it is biased.

    E[ (µ̂_ML - µ)(µ̂_ML - µ)^T ] = E[ ( (1/N) Σ_{i=1}^{N} (x_i - µ) ) ( (1/N) Σ_{j=1}^{N} (x_j - µ) )^T ]
      = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} E[ (x_i - µ)(x_j - µ)^T ]
      = (1/N²) Σ_{i=1}^{N} E[ (x_i - µ)(x_i - µ)^T ]   (the samples are independent)
      = Σ / N

In the univariate case, this is equivalent to saying var(x̄) = σ²/N: the variance of the sample mean is the variance divided by the number of samples.

The ML covariance matrix is slightly less than the actual covariance matrix, but becomes a better estimate for larger values of N:

    E[Σ̂_ML] = Σ - Σ/N = ((N - 1)/N) Σ

An unbiased estimate of the covariance matrix is then:

    Σ̂_u = (N/(N - 1)) Σ̂_ML = (1/(N - 1)) Σ_{i=1}^{N} (x_i - m)(x_i - m)^T
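As an illustration (with assumed parameters, not lecture data), a small Monte Carlo run in the univariate case showing that the 1/N estimate averages to ((N-1)/N) σ² while the 1/(N-1) form averages to σ².

```python
# Monte Carlo sketch of the bias: average ML variance over many trials vs. the unbiased (N-1) form.
# Synthetic 1-D data with an assumed true variance; not lecture data.
import numpy as np

rng = np.random.default_rng(1)
true_var, N, trials = 4.0, 5, 200_000

x = rng.normal(0.0, np.sqrt(true_var), size=(trials, N))
mu_hat = x.mean(axis=1, keepdims=True)

var_ml = ((x - mu_hat) ** 2).sum(axis=1) / N              # biased: E ≈ (N-1)/N * sigma^2
var_unbiased = ((x - mu_hat) ** 2).sum(axis=1) / (N - 1)  # unbiased form

print(var_ml.mean(), (N - 1) / N * true_var)              # ~3.2 for N=5, sigma^2=4
print(var_unbiased.mean(), true_var)                      # ~4.0
```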