The Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD


Bayesian decision theory: recall that we have a state of the world $Y$, observations $X$, a decision function $g(x)$, and a loss $L[g(x), y]$ of predicting $y$ with $g(x)$. The Bayes decision rule is the rule that minimizes the risk
$\mathrm{Risk} = E_{X,Y}\left[ L(g(X), Y) \right].$
For the 0-1 loss,
$L[g(x), y] = \begin{cases} 1, & g(x) \neq y \\ 0, & g(x) = y. \end{cases}$

MAP rule: the optimal decision rule can be written as
$i^*(x) = \arg\max_i P_{Y|X}(i|x) = \arg\max_i P_{X|Y}(x|i)\, P_Y(i) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right].$
We have started to study the case of Gaussian classes,
$P_{X|Y}(x|i) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_i|}} \exp\left\{ -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right\}.$

The Gaussian classifier: the BDR can be written as
$i^*(x) = \arg\min_i \left[ d_i(x, \mu_i) + \alpha_i \right]$
with the discriminant terms
$d_i(x, y) = (x - y)^T \Sigma_i^{-1} (x - y), \qquad \alpha_i = \log (2\pi)^d |\Sigma_i| - 2 \log P_Y(i).$
The optimal rule is to assign $x$ to the closest class, where "closest" is measured with the Mahalanobis distance $d_i(x, y)$, to which the constant $\alpha_i$ is added to account for the class prior.

The Gaussian classifier: if all classes share the same covariance, $\Sigma_i = \Sigma$, then the BDR reduces to
$i^*(x) = \arg\max_i g_i(x)$
with discriminant
$g_i(x) = w_i^T x + b_i, \qquad w_i = \Sigma^{-1} \mu_i, \qquad b_i = -\frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i),$
i.e. the BDR is a linear function, or a linear discriminant.
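As an illustration, here is a minimal NumPy sketch of this linear discriminant; the function names and the way the class parameters are passed are hypothetical, not part of the lecture.

```python
import numpy as np

def linear_discriminants(mus, Sigma, priors):
    """w_i = Sigma^{-1} mu_i and b_i = -0.5 mu_i^T Sigma^{-1} mu_i + log P_Y(i)."""
    Sigma_inv = np.linalg.inv(Sigma)
    ws = [Sigma_inv @ mu for mu in mus]
    bs = [-0.5 * mu @ Sigma_inv @ mu + np.log(p) for mu, p in zip(mus, priors)]
    return ws, bs

def bdr(x, ws, bs):
    """BDR for equal covariances: pick the class with the largest g_i(x)."""
    return int(np.argmax([w @ x + b for w, b in zip(ws, bs)]))
```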

Geometric interpretation: two classes $i$ and $j$ share a boundary if there is a set of points $x$ such that
$g_i(x) = g_j(x),$
or
$w_i^T x + b_i = w_j^T x + b_j.$

Geometric interpretation: note that $g_i(x) = g_j(x)$ can be written as
$(w_i - w_j)^T x + (b_i - b_j) = 0.$
Next, we use
$w_i - w_j = \Sigma^{-1}(\mu_i - \mu_j)$

Geometric interpretation: and
$b_i - b_j = -\frac{1}{2}(\mu_i - \mu_j)^T \Sigma^{-1}(\mu_i + \mu_j) + \log\frac{P_Y(i)}{P_Y(j)},$
which follows from the identity $\mu_i^T \Sigma^{-1}\mu_i - \mu_j^T \Sigma^{-1}\mu_j = (\mu_i - \mu_j)^T \Sigma^{-1}(\mu_i + \mu_j)$.

Geometric interpretation: using this in the boundary equation leads to
$\underbrace{\left[\Sigma^{-1}(\mu_i - \mu_j)\right]}_{w}{}^{T} x + \underbrace{\left[-\tfrac{1}{2}(\mu_i - \mu_j)^T \Sigma^{-1}(\mu_i + \mu_j) + \log\tfrac{P_Y(i)}{P_Y(j)}\right]}_{b} = 0;$
this is the equation of a hyper-plane with parameters $w$ and $b$.

Geometric interpretation: which can also be written as
$w^T (x - x_0) = 0,$
with
$w = \Sigma^{-1}(\mu_i - \mu_j)$
and
$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\log\left[P_Y(i)/P_Y(j)\right]}{(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)}\,(\mu_i - \mu_j).$

Geometric interpretation: this is the equation of the hyper-plane with normal vector $w$ that passes through $x_0$. It is the optimal decision boundary for Gaussian classes with equal covariance.
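A small sketch of these boundary parameters, assuming two classes with means mu_i, mu_j, shared covariance Sigma, and priors p_i, p_j (the helper name is illustrative):

```python
import numpy as np

def boundary(mu_i, mu_j, Sigma, p_i, p_j):
    """Normal vector w and point x0 of the hyper-plane w^T (x - x0) = 0."""
    Sigma_inv = np.linalg.inv(Sigma)
    diff = mu_i - mu_j
    w = Sigma_inv @ diff
    maha2 = diff @ Sigma_inv @ diff          # squared Mahalanobis distance between means
    x0 = 0.5 * (mu_i + mu_j) - (np.log(p_i / p_j) / maha2) * diff
    return w, x0
```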

Geometric interpretation, special case $\Sigma = \sigma^2 I$: the optimal boundary has
$w = \frac{1}{\sigma^2}(\mu_i - \mu_j), \qquad x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2 \log\left[P_Y(i)/P_Y(j)\right]}{\|\mu_i - \mu_j\|^2}\,(\mu_i - \mu_j).$

Geometric interpretation: $w$ is a vector along the line through $\mu_i$ and $\mu_j$. (Figure: Gaussian classes, equal covariance $\sigma^2 I$.)

Geometric interpretation: for equal prior probabilities the optimal boundary is
- the plane through the mid-point between $\mu_i$ and $\mu_j$,
- orthogonal to the line that joins $\mu_i$ and $\mu_j$.
(Figure: Gaussian classes, equal covariance $\sigma^2 I$.)

Geometric interpretation: with different prior probabilities, $x_0$ moves along the line through $\mu_i$ and $\mu_j$. (Figure: Gaussian classes, equal covariance $\sigma^2 I$.)

Geometric interpretation: what is the effect of the prior? $x_0$ moves away from $\mu_i$ if $P_Y(i) > P_Y(j)$, making it more likely to pick class $i$. (Figure: Gaussian classes, equal covariance $\sigma^2 I$.)

Geometric interpretation: what is the strength of this effect? It is inversely proportional to the distance between the means, in units of standard deviation. (Figure: Gaussian classes, equal covariance $\sigma^2 I$.)
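A toy numeric check of this effect, reusing the hypothetical boundary helper sketched above: as the prior of class $i$ grows, $x_0$ drifts from the midpoint toward $\mu_j$.

```python
import numpy as np

mu_i, mu_j = np.array([0.0, 0.0]), np.array([2.0, 0.0])
Sigma = np.eye(2)                      # sigma^2 I with sigma = 1
for p_i in (0.5, 0.7, 0.9):
    w, x0 = boundary(mu_i, mu_j, Sigma, p_i, 1.0 - p_i)
    print(p_i, x0)                     # (1, 0), then ~(1.42, 0), then ~(2.10, 0)
```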

Geometric interpretation: note the similarities with the scalar case, where the threshold is
$x^* = \frac{\mu_i + \mu_j}{2} - \frac{\sigma^2}{\mu_i - \mu_j}\log\frac{P_Y(i)}{P_Y(j)},$
while here we have
$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2 \log\left[P_Y(i)/P_Y(j)\right]}{\|\mu_i - \mu_j\|^2}\,(\mu_i - \mu_j):$
the hyper-plane is the high-dimensional version of the threshold!

Geometric interpretation: boundary hyper-plane in 1, 2, and 3D for various prior configurations. (Figures.)

Geometric interpretation, special case $\Sigma_i = \Sigma$: the optimal boundary is basically the same,
$w = \Sigma^{-1}(\mu_i - \mu_j), \qquad x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\log\left[P_Y(i)/P_Y(j)\right]}{(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)}\,(\mu_i - \mu_j);$
the strength of the prior is now inversely proportional to the Mahalanobis distance between the means, and $(\mu_i - \mu_j)$ is multiplied by $\Sigma^{-1}$, which changes the direction of $w$ and the slope of the hyper-plane.

Geometric interpretation: equal but arbitrary covariance. (Figure: Gaussian classes, equal covariance $\Sigma$.)

Geometric interpretation: in the homework you will show that the separating plane is tangent to the pdf iso-contours at $x_0$. This reflects the fact that the natural distance is now the Mahalanobis distance. (Figure: Gaussian classes, equal covariance $\Sigma$.)

Geometric interpretation: boundary hyper-plane in 1, 2, and 3D for various prior configurations. (Figures.)

Geometric interpretation: what about the generic case, where the covariances are different? In this case
$i^*(x) = \arg\min_i \left[ d_i(x, \mu_i) + \alpha_i \right],$
$d_i(x, y) = (x - y)^T \Sigma_i^{-1} (x - y), \qquad \alpha_i = \log (2\pi)^d |\Sigma_i| - 2 \log P_Y(i),$
and there is not much to simplify.

Geometric interpretation: the discriminant can be written as
$g_i(x) = x^T W_i x + w_i^T x + b_i$
with
$W_i = -\frac{1}{2}\Sigma_i^{-1}, \qquad w_i = \Sigma_i^{-1}\mu_i, \qquad b_i = -\frac{1}{2}\mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2}\log|\Sigma_i| + \log P_Y(i).$
For two classes the decision boundary is hyper-quadratic: this could mean a hyper-plane, a pair of hyper-planes, hyper-spheres, hyper-ellipsoids, hyper-hyperboloids, etc.
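A minimal sketch of this quadratic discriminant for one class; the shared $-(d/2)\log 2\pi$ term is dropped since it cancels across classes, and the function name is illustrative.

```python
import numpy as np

def quadratic_discriminant(mu, Sigma, prior):
    """Return g(x) = x^T W x + w^T x + b for one Gaussian class."""
    S_inv = np.linalg.inv(Sigma)
    W = -0.5 * S_inv
    w = S_inv @ mu
    b = (-0.5 * mu @ S_inv @ mu
         - 0.5 * np.log(np.linalg.det(Sigma))
         + np.log(prior))
    return lambda x: x @ W @ x + w @ x + b
```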

Geometric interpretation: boundaries in 2 and 3D. (Figures.)

The sigmoid: we have derived all of this from the log-based BDR
$i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right].$
When there are only two classes, it is also interesting to look at the original definition
$i^*(x) = \arg\max_i P_{Y|X}(i|x)$
with
$P_{Y|X}(i|x) = \frac{P_{X|Y}(x|i)\, P_Y(i)}{P_X(x)}.$

The sigmoid: for Gaussian classes, the posterior probabilities are
$P_{Y|X}(1|x) = \frac{1}{1 + \exp\left\{ \frac{1}{2}\left[ d_1(x, \mu_1) + \alpha_1 - d_2(x, \mu_2) - \alpha_2 \right] \right\}}$
where, as before,
$d_i(x, y) = (x - y)^T \Sigma_i^{-1} (x - y), \qquad \alpha_i = \log (2\pi)^d |\Sigma_i| - 2 \log P_Y(i).$
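A sketch of this posterior for two Gaussian classes, assuming the same $d_i$ and $\alpha_i$ definitions as above (the function name is illustrative):

```python
import numpy as np

def posterior_class1(x, mu1, mu2, S1, S2, p1, p2):
    """P(Y=1|x) as a sigmoid of half the discriminant difference."""
    def d_plus_alpha(mu, S, p):
        S_inv = np.linalg.inv(S)
        d = (x - mu) @ S_inv @ (x - mu)   # Mahalanobis distance d_i(x, mu_i)
        alpha = np.log((2 * np.pi) ** len(x) * np.linalg.det(S)) - 2 * np.log(p)
        return d + alpha
    z = 0.5 * (d_plus_alpha(mu1, S1, p1) - d_plus_alpha(mu2, S2, p2))
    return 1.0 / (1.0 + np.exp(z))
```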

The sigmoid: the posterior $P_{Y|X}(1|x)$ is a sigmoid, and looks like this. (Figure: sigmoid posterior curve.)

The sigmoid: the sigmoid appears in neural networks. It is the true posterior for Gaussian problems where the covariances are the same. (Figure: equal variances, single boundary halfway between the means.)

The sigmoid: but not necessarily when the covariances are different. (Figure: variances are different, two boundaries.)

Bayesian decision theory. Advantages:
- the BDR is optimal and cannot be beaten
- Bayes keeps you honest: models reflect a causal interpretation of the problem, this is how we think
- natural decomposition into what we knew already (the prior) and what the data tells us (the class-conditional density, CCD)
- no need for heuristics to combine these two sources of information
- the BDR is, almost invariably, intuitive
- Bayes rule, the chain rule, and marginalization enable modularity, and scalability to very complicated models and problems.
Problems: the BDR is optimal only insofar as the models are correct.
