The Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD


Bayesian decision theory. Recall that we have: a state of the world $y$, observations $x$, a decision function $g(x)$, and a loss $L[g(x), y]$ of predicting $y$ with $g(x)$. The Bayes decision rule is the rule that minimizes the risk $\mathrm{Risk} = E_{X,Y}\left[ L(g(X), Y) \right]$. Given $x$, it consists of picking the prediction of minimum conditional risk:
$$g^*(x) = \arg\min_{g} E_{Y|X}\left[ L[g, Y] \mid X = x \right]$$
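A minimal numeric sketch of this rule (not from the slides), with an assumed two-class posterior and an assumed asymmetric loss matrix: the Bayes decision is the prediction with the smallest conditional risk.

```python
import numpy as np

# assumed posterior P_{Y|X}(y|x) for a single observation x, classes y = 0, 1
posterior = np.array([0.2, 0.8])

# assumed loss matrix L[g, y]: predicting 0 when y = 1 is made very costly
L = np.array([[0.0, 10.0],
              [1.0,  0.0]])

cond_risk = L @ posterior              # conditional risk of each prediction g
g_star = int(np.argmin(cond_risk))     # Bayes decision: minimum conditional risk
print(cond_risk, g_star)               # [8.  0.2] 1
```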

MAP rule. For the 0-1 loss
$$L[g, y] = \begin{cases} 0, & g = y \\ 1, & g \neq y \end{cases}$$
the optimal decision rule is the maximum a-posteriori probability rule
$$i^*(x) = \arg\max_i P_{Y|X}(i|x).$$
The associated risk is the probability of error of this rule (the Bayes error): there is no other decision function with lower error.

MAP rule. By application of simple mathematical laws (Bayes rule, monotonicity of the log) we have shown that the following three decision rules are optimal and equivalent:
$$1)\quad i^*(x) = \arg\max_i P_{Y|X}(i|x)$$
$$2)\quad i^*(x) = \arg\max_i \left[ P_{X|Y}(x|i)\, P_Y(i) \right]$$
$$3)\quad i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right]$$
1) is usually hard to use, 3) is frequently easier than 2).
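A small sketch checking the equivalence numerically on a one-dimensional Gaussian problem; the means, standard deviation, and priors below are assumed for illustration only.

```python
import numpy as np
from scipy.stats import norm

mu = np.array([0.0, 1.0])      # assumed class means mu_0, mu_1
sigma = 0.5                    # assumed common standard deviation
prior = np.array([0.3, 0.7])   # assumed class priors P_Y(0), P_Y(1)

def rule_1(x):
    # 1) argmax_i P_{Y|X}(i|x)
    joint = norm.pdf(x, mu, sigma) * prior
    return int(np.argmax(joint / joint.sum()))

def rule_2(x):
    # 2) argmax_i P_{X|Y}(x|i) P_Y(i)
    return int(np.argmax(norm.pdf(x, mu, sigma) * prior))

def rule_3(x):
    # 3) argmax_i [log P_{X|Y}(x|i) + log P_Y(i)]
    return int(np.argmax(norm.logpdf(x, mu, sigma) + np.log(prior)))

for x in np.linspace(-1.0, 2.0, 13):
    assert rule_1(x) == rule_2(x) == rule_3(x)
```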

Example. The Bayes decision rule is usually highly intuitive. We have used an example from communications: a bit is transmitted by a source, corrupted by noise, and received by a decoder. [figure: channel diagram, source bit $Y$ passing through a noisy channel to the observation $X$] Q: what should the optimal decoder do to recover $Y$?

Example. This was modeled as a classification problem with Gaussian classes:
$$P_{X|Y}(x|0) = G(x, \mu_0, \sigma), \qquad P_{X|Y}(x|1) = G(x, \mu_1, \sigma),$$
where
$$G(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$
or, graphically, [figure: the two class-conditional densities].

BDR. For which (with the uniform priors assumed here) the optimal decision boundary is a threshold at the midpoint of the two means: pick 0 if $x < \frac{\mu_0 + \mu_1}{2}$, pick 1 otherwise. [figure: the two densities and the threshold between the means]
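A quick numeric check (assumed parameters, not from the slides) that with equal priors and equal variances the midpoint threshold reproduces the MAP rule.

```python
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma = 0.0, 1.0, 0.5      # assumed example parameters
for x in np.linspace(-1.0, 2.0, 101):
    map_decision = int(norm.pdf(x, mu1, sigma) > norm.pdf(x, mu0, sigma))
    threshold_decision = int(x > (mu0 + mu1) / 2)
    assert map_decision == threshold_decision
```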

BDR. What is the point of going through all the math? Now we know that the intuitive threshold is actually optimal, and in which sense it is optimal (minimum probability of error). The Bayesian solution keeps us honest: it forces us to make all our assumptions explicit. Assumptions we have made:
- uniform class probabilities, $P_Y(0) = P_Y(1) = 1/2$
- Gaussianity, $P_{X|Y}(x|i) = G(x, \mu_i, \sigma)$
- the variance is the same under the two states
- the noise is additive, $x = y + \varepsilon$
Even for a trivial problem, we have made lots of assumptions.

BDR. What if the class probabilities are not the same? E.g. under a coding scheme in which one bit value is transmitted much more often than the other, so that, say, $P_Y(1) \gg P_Y(0)$. How does this change the optimal decision rule?
$$i^*(x) = \arg\max_i P_{Y|X}(i|x)$$
$$= \arg\max_i \left[ P_{X|Y}(x|i)\, P_Y(i) \right]$$
$$= \arg\max_i \left[ \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu_i)^2}{2\sigma^2}}\, P_Y(i) \right]$$
$$= \arg\max_i \left[ -\frac{(x-\mu_i)^2}{2\sigma^2} + \log P_Y(i) \right]$$
$$= \arg\min_i \left[ \frac{(x-\mu_i)^2}{2\sigma^2} - \log P_Y(i) \right]$$

BDR. Or
$$i^*(x) = \arg\min_i \left[ (x-\mu_i)^2 - 2\sigma^2 \log P_Y(i) \right]$$
$$= \arg\min_i \left[ x^2 - 2x\mu_i + \mu_i^2 - 2\sigma^2 \log P_Y(i) \right]$$
$$= \arg\min_i \left[ -2x\mu_i + \mu_i^2 - 2\sigma^2 \log P_Y(i) \right].$$
The optimal decision is, therefore (assuming $\mu_1 > \mu_0$): pick 0 if
$$-2x\mu_0 + \mu_0^2 - 2\sigma^2 \log P_Y(0) < -2x\mu_1 + \mu_1^2 - 2\sigma^2 \log P_Y(1),$$
or, pick 0 if
$$x < \frac{\mu_0 + \mu_1}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{P_Y(0)}{P_Y(1)}.$$
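A sketch (with assumed means, variance, and priors) checking that this closed-form threshold agrees with the argmin form of the BDR derived above.

```python
import numpy as np

mu0, mu1 = 0.0, 1.0            # assumed means, with mu1 > mu0
sigma = 0.5                    # assumed common standard deviation
p0, p1 = 0.8, 0.2              # assumed priors P_Y(0), P_Y(1)

# closed-form threshold: pick 0 if x < T
T = (mu0 + mu1) / 2 + sigma**2 / (mu1 - mu0) * np.log(p0 / p1)

def bdr(x):
    # argmin_i [(x - mu_i)^2 - 2 sigma^2 log P_Y(i)]
    costs = [(x - m)**2 - 2 * sigma**2 * np.log(p)
             for m, p in ((mu0, p0), (mu1, p1))]
    return int(np.argmin(costs))

for x in np.linspace(-1.0, 2.0, 201):
    assert bdr(x) == (0 if x < T else 1)
```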

BDR. What is the role of the prior for class probabilities?
$$x < \frac{\mu_0 + \mu_1}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{P_Y(0)}{P_Y(1)}$$
The prior moves the threshold up or down, in an intuitive way:
- $P_Y(0) > P_Y(1)$: the threshold increases. Since 0 has higher probability, we care more about errors on the 0 side; by using a higher threshold we are making it more likely to pick 0.
- if $P_Y(1) = 0$, all we care about is 0: the threshold becomes infinite and we never say 1.
How relevant is the prior? It is weighed by $\frac{\sigma^2}{\mu_1 - \mu_0}$.

BDR. How relevant is the prior? It is weighed by the inverse of the normalized distance between the means, i.e. the distance between the means in units of variance:
- if the classes are very far apart, the prior makes no difference. This is the easy situation: the observations are very clear, and Bayes says "forget the prior knowledge".
- if the classes are exactly equal (same mean), the prior gets infinite weight. In this case the observations do not say anything about the class, and Bayes says "forget about the data, just use the knowledge that you started with", even if that means "always say 0" or "always say 1".

The Gaussian classifier. This is one example of a Gaussian classifier. In practice we rarely have only one variable: typically $X = (X_1, \ldots, X_d)$ is a vector of observations. The BDR for this case is equivalent, but more interesting. The central difference is that the class-conditional distributions are multivariate Gaussian:
$$P_{X|Y}(x|i) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_i|}} \exp\left( -\frac{1}{2} (x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) \right)$$

The Gaussian classifier. In this case, with
$$P_{X|Y}(x|i) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_i|}} \exp\left( -\frac{1}{2} (x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) \right),$$
the BDR
$$i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right]$$
becomes
$$i^*(x) = \arg\max_i \left[ -\frac{1}{2} (x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2} \log (2\pi)^d |\Sigma_i| + \log P_Y(i) \right].$$

The Gaussian classifier. This can be written as
$$i^*(x) = \arg\min_i \left[ d_i(x, \mu_i) + \alpha_i \right]$$
with
$$d_i(x, y) = (x-y)^T \Sigma_i^{-1} (x-y), \qquad \alpha_i = \log (2\pi)^d |\Sigma_i| - 2 \log P_Y(i).$$
[figure: discriminant surface $P_{Y|X}(1|x) = 0.5$]
The optimal rule is to assign $x$ to the closest class, where "closest" is measured with the Mahalanobis distance $d_i(x, y)$, to which the constant $\alpha_i$ is added to account for the class prior.
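A minimal multivariate sketch of this "closest class under the Mahalanobis distance plus offset" rule; the two-dimensional means, covariances, and priors are assumed for illustration.

```python
import numpy as np

mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]          # assumed class means
Sigmas = [np.eye(2), np.array([[1.5, 0.4], [0.4, 0.8]])]    # assumed covariances
priors = [0.5, 0.5]                                         # assumed priors
d = 2

def bdr(x):
    costs = []
    for mu, Sigma, p in zip(mus, Sigmas, priors):
        diff = x - mu
        mahal = diff @ np.linalg.inv(Sigma) @ diff           # d_i(x, mu_i)
        alpha = np.log((2 * np.pi)**d * np.linalg.det(Sigma)) - 2 * np.log(p)
        costs.append(mahal + alpha)
    return int(np.argmin(costs))                             # closest class wins

print(bdr(np.array([1.5, 0.5])))
```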

The Gaussian classifier. First special case of interest: all classes have the same covariance, $\Sigma_i = \Sigma$. The BDR becomes
$$i^*(x) = \arg\min_i \left[ d(x, \mu_i) + \alpha_i \right]$$
with
$$d(x, y) = (x-y)^T \Sigma^{-1} (x-y) \quad \text{(the same metric for all classes)}, \qquad \alpha_i = -2 \log P_Y(i),$$
since the term $\log (2\pi)^d |\Sigma|$ is constant (not a function of $i$) and can be dropped.

The Gaussian classifier. In detail:
$$i^*(x) = \arg\min_i \left[ (x-\mu_i)^T \Sigma^{-1} (x-\mu_i) - 2 \log P_Y(i) \right]$$
$$= \arg\min_i \left[ x^T \Sigma^{-1} x - 2\mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i - 2 \log P_Y(i) \right]$$
$$= \arg\min_i \left[ -2\mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i - 2 \log P_Y(i) \right] \quad (x^T \Sigma^{-1} x \text{ does not depend on } i)$$
$$= \arg\max_i \Big[ \underbrace{\mu_i^T \Sigma^{-1}}_{w_i^T} x \underbrace{- \tfrac{1}{2}\mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i)}_{w_{i0}} \Big]$$

The Gaussian classifier. In summary:
$$i^*(x) = \arg\max_i g_i(x)$$
with
$$g_i(x) = w_i^T x + w_{i0}, \qquad w_i = \Sigma^{-1} \mu_i, \qquad w_{i0} = -\tfrac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i).$$
The BDR is a linear function, or a linear discriminant. [figure: linear discriminant boundary $P_{Y|X}(1|x) = 0.5$]
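A short sketch of this shared-covariance case with assumed parameters: it builds the linear discriminants $g_i(x) = w_i^T x + w_{i0}$ and checks, at one assumed test point, that they give the same decision as the equivalent "Mahalanobis distance minus prior" form.

```python
import numpy as np

mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]   # assumed class means
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])           # assumed shared covariance
priors = [0.6, 0.4]                                  # assumed priors
Sinv = np.linalg.inv(Sigma)

# linear discriminant parameters from the slide
ws  = [Sinv @ mu for mu in mus]                                          # w_i
w0s = [-0.5 * mu @ Sinv @ mu + np.log(p) for mu, p in zip(mus, priors)]  # w_i0

def classify(x):
    return int(np.argmax([w @ x + w0 for w, w0 in zip(ws, w0s)]))

def classify_mahalanobis(x):
    costs = [(x - mu) @ Sinv @ (x - mu) - 2 * np.log(p)
             for mu, p in zip(mus, priors)]
    return int(np.argmin(costs))

x = np.array([1.0, 0.4])                 # assumed test point
assert classify(x) == classify_mahalanobis(x)
print(classify(x))
```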
