Generative classification models


CS 1675 Intro to Machine Learning
Lecture: Generative classification models
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Data: D = {d_1, d_2, ..., d_n}, where d_i = <x_i, y_i>. y_i represents a discrete class value. Goal: learn f : X → Y. Binary classification: a special case when Y = {0, 1}. First step: we need to devise a model of the function f.

Discriminant functions

A common way to represent a classifier is by using discriminant functions. Works for both binary and multi-way classification. Idea: for every class i = 0, 1, ..., k, define a function g_i(x) mapping X → R. When the decision on input x should be made, choose the class with the highest value of g_i(x):

  y* = arg max_i g_i(x)

Logistic regression model

Discriminant functions: g_1(x) = g(w^T x) and g_0(x) = 1 - g(w^T x), where g(z) = 1 / (1 + e^{-z}) is the logistic (sigmoid) function, x is the input vector x_1, ..., x_d, and w are the weights w_0, w_1, ..., w_d. Values of the discriminant functions vary in the interval [0, 1]; probabilistic interpretation: f(x, w) = p(y = 1 | x, w) = g_1(x) = g(w^T x). (Figure: logistic regression drawn as a single-layer network: inputs x_1, ..., x_d, weights w, sigmoid output.)
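
A minimal sketch of this decision rule in Python (not from the slides; the weight and input values are made up for illustration, with a constant 1 appended to x for the bias w_0):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(x, w):
    """Return the class with the highest discriminant value."""
    g1 = sigmoid(w @ x)          # g_1(x) = g(w^T x)
    g0 = 1.0 - g1                # g_0(x) = 1 - g(w^T x)
    return int(np.argmax([g0, g1]))

# Hypothetical weights and input (x_0 = 1 carries the bias term w_0).
w = np.array([0.5, 2.0, -1.0])
x = np.array([1.0, 0.3, 0.8])
print(classify(x, w))            # -> 0 or 1
```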

Logistic regression

We learn a probabilistic function f : X → [0, 1], where f describes the probability of class 1 given x:

  f(x, w) = p(y = 1 | x, w) = g(w^T x)

Note that p(y = 0 | x, w) = 1 - p(y = 1 | x, w).

Making decisions with the logistic regression model: if p(y = 1 | x) ≥ 1/2 then choose class 1, else choose class 0.

When does the logistic regression fail? When a quadratic decision boundary is needed. (Figure: a two-class dataset whose decision boundary is quadratic, which a linear logistic model cannot represent.)

When does the logistic regression fail? Another example of a non-linear decision boundary. (Figure: a two-class dataset that no straight line can separate.)

Non-linear extension of logistic regression: use feature (basis) functions to model the nonlinearities, the same trick as used for the linear regression.

  Linear regression:   f(x) = w_0 + Σ_{j=1..m} w_j φ_j(x)
  Logistic regression: f(x) = g( w_0 + Σ_{j=1..m} w_j φ_j(x) )

where φ_j(x) is an arbitrary function of x.
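
To make the basis-function trick concrete, a small sketch assuming quadratic polynomial features for a 2-D input (the particular φ and the weights are illustrative choices, not from the slides):

```python
import numpy as np

def phi(x):
    """Map a 2-D input to quadratic basis functions (plus a constant)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])

def f(x, w):
    z = w @ phi(x)                     # w_0 + sum_j w_j * phi_j(x)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights realizing a circular boundary x1^2 + x2^2 = 1.
w = np.array([1.0, 0.0, 0.0, -1.0, -1.0, 0.0])
print(f(np.array([0.0, 0.0]), w) > 0.5)   # inside the circle -> class 1
print(f(np.array([2.0, 2.0]), w) > 0.5)   # outside the circle -> class 0
```

The model stays linear in w, so the learning procedure is unchanged; only the input representation is expanded.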

Generative approach to classification

Logistic regression represents and learns a model of p(y | x): an example of a discriminative approach. The generative approach:
1. Represents and learns the joint distribution p(x, y).
2. Uses it to define probabilistic discriminant functions, e.g. g_0(x) = p(y = 0 | x) and g_1(x) = p(y = 1 | x).

How? Typically the joint is decomposed as p(x, y) = p(x | y) p(y).

Typical joint model p(x, y) = p(x | y) p(y):
- p(x | y) = class-conditional distributions (densities); for binary classification: two class-conditional distributions, p(x | y = 0) and p(x | y = 1).
- p(y) = priors on classes, the probability of class y; for binary classification: a Bernoulli distribution.

Quadratic discriminant analysis (QDA)

Model:
- Class-conditional distributions are multivariate normal distributions:
  x ~ N(μ_0, Σ_0) for y = 0 and x ~ N(μ_1, Σ_1) for y = 1,
  where the multivariate normal density is
  p(x | μ, Σ) = (2π)^{-d/2} |Σ|^{-1/2} exp( -(1/2)(x - μ)^T Σ^{-1} (x - μ) ).
- Priors on classes: y ~ Bernoulli(θ), y ∈ {0, 1}.

Learning of parameters of the QDA model: density estimation in statistics. We see examples; we do not know the parameters of the Gaussians (class-conditional densities). The ML estimate of the parameters of a multivariate normal N(μ, Σ) for a set of n examples x_1, ..., x_n optimizes the log-likelihood l(D; μ, Σ) = log Π_{i=1..n} p(x_i | μ, Σ), giving

  μ̂ = (1/n) Σ_{i=1..n} x_i
  Σ̂ = (1/n) Σ_{i=1..n} (x_i - μ̂)(x_i - μ̂)^T

How about the class priors?
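
A short sketch of these ML estimates, plus the Bernoulli ML estimate for the class prior (the fraction of class-1 labels); the toy data is made up:

```python
import numpy as np

def mvn_mle(X):
    """ML estimates for one class. X: (n, d) array of examples."""
    mu = X.mean(axis=0)                  # mu_hat = (1/n) sum_i x_i
    Z = X - mu
    Sigma = (Z.T @ Z) / X.shape[0]       # Sigma_hat: 1/n (not 1/(n-1)) covariance
    return mu, Sigma

def bernoulli_mle(y):
    """ML estimate of the class prior theta = p(y = 1)."""
    return np.mean(y)

X = np.array([[0.0, 0.1], [1.0, 0.9], [0.5, 0.4]])
mu, Sigma = mvn_mle(X)
print(mu, Sigma, bernoulli_mle(np.array([0, 1, 1, 0])))
```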

Learning quadratic discriminant analysis (QDA)

- Learning the class-conditional distributions: learn the parameters of the two multivariate normal distributions, x ~ N(μ_0, Σ_0) for y = 0 and x ~ N(μ_1, Σ_1) for y = 1, using the density estimation methods.
- Learning the priors on classes: y ~ Bernoulli(θ), y ∈ {0, 1}; learn the parameter θ of the Bernoulli distribution, again using the density estimation methods.

(Figure: QDA example with the two Gaussian class-conditional densities and the discriminant functions g_1(x) and g_0(x).)

QDA: making a class decision

Basically we need to design discriminant functions, e.g. the posteriors of a class with Gaussian class-conditional densities: choose the class with the better posterior probability.

  If p(y = 1 | x) > p(y = 0 | x) then y = 1, else y = 0.

Since both posteriors share the same normalizer p(x), it is sufficient to compare

  p(x | μ_1, Σ_1) p(y = 1)  vs.  p(x | μ_0, Σ_0) p(y = 0).
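
A sketch of this comparison in code, done in log space for numerical stability (a common implementation choice, not mandated by the slide; all parameter values here are hypothetical):

```python
import numpy as np

def log_mvn(x, mu, Sigma):
    """Log-density of a multivariate normal N(mu, Sigma) at x."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(Sigma, diff))

def qda_decide(x, mu0, S0, mu1, S1, prior1):
    """Compare log p(x | class) + log p(class) for the two classes."""
    g1 = log_mvn(x, mu1, S1) + np.log(prior1)
    g0 = log_mvn(x, mu0, S0) + np.log(1.0 - prior1)
    return int(g1 > g0)

mu0, mu1 = np.zeros(2), np.ones(2)
S0, S1 = np.eye(2), 0.5 * np.eye(2)
print(qda_decide(np.array([0.9, 0.8]), mu0, S0, mu1, S1, prior1=0.5))  # -> 1
```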

QDA: quadratic decision boundary. (Figure: contours of the two class-conditional densities.) (Figure: the resulting quadratic decision boundary.)

Linear discriminant analysis (LDA)

Assume the covariances are the same: x ~ N(μ_0, Σ) for y = 0 and x ~ N(μ_1, Σ) for y = 1.

LDA: linear decision boundary. (Figure: contours of the two class-conditional densities, which now share the same shape.)

LDA: linear decision boundary. (Figure: the resulting linear decision boundary.)

Generative classification models (recap)

Idea:
1. Represent and learn the distribution p(x, y).
2. Use it to define probabilistic discriminant functions, e.g. g_0(x) = p(y = 0 | x) and g_1(x) = p(y = 1 | x).

Typical model p(x, y) = p(x | y) p(y): p(x | y) are the class-conditional distributions (densities), two of them for binary classification; p(y) is the prior on classes, the probability of class y, a Bernoulli distribution for binary classification.

Naïve Bayes classifier

A generative classifier model with an additional simplifying assumption; one of the basic ML classification models, and it very often performs very well in practice. All input attributes are conditionally independent of each other given the class, so we have:

  p(x | y) = Π_{i=1..d} p(x_i | y)

Learning the parameters of the model reduces to much simpler density estimation problems. We need to learn p(x | y = 0), p(x | y = 1) and p(y). Because of the conditional independence assumption, we only need to learn, for every input variable i, p(x_i | y = 0) and p(x_i | y = 1). This is much easier when the number of input attributes is large. Also, the model gives us the flexibility to represent input attributes of different forms: e.g. one attribute can be modeled using a Bernoulli distribution, another as a Gaussian density, or as a Poisson distribution.
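
A minimal sketch of one concrete instantiation, with every attribute modeled as a 1-D Gaussian (one of the choices the slide mentions); the toy data and the small variance floor are illustrative:

```python
import numpy as np

def fit_nb(X, y):
    """Per-class attribute means/variances and the class prior."""
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(X))
    return params

def log_score(x, mu, var, prior):
    """sum_i log N(x_i | mu_i, var_i) + log p(y = c), up to p(x)."""
    return (-0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)).sum() \
           + np.log(prior)

def predict_nb(x, params):
    return int(np.argmax([log_score(x, *params[c]) for c in (0, 1)]))

X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.1], [1.0, 0.9]])
y = np.array([0, 0, 1, 1])
print(predict_nb(np.array([0.95, 1.0]), fit_nb(X, y)))   # -> 1
```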

Making a class decision with the Naïve Bayes classifier: the discriminant functions are the posteriors of a class; choose the class with the better posterior probability.

  If Π_{i=1..d} p(x_i | y = 1) p(y = 1) > Π_{i=1..d} p(x_i | y = 0) p(y = 0) then y = 1, else y = 0.

Next: two interesting questions.

Two models with linear decision boundaries: logistic regression, and the LDA model (Gaussians with the same covariance matrices), x ~ N(μ_0, Σ) for y = 0 and x ~ N(μ_1, Σ) for y = 1. Question: is there any relation between the two models?

Two models with the same gradient: the linear model for regression and the logistic regression model for classification have the same gradient update,

  w ← w + α Σ_{i=1..n} (y_i - f(x_i, w)) x_i

Question: why is the gradient the same?

Logistic regression and generative models

Two models with linear decision boundaries: logistic regression, and the generative model with Gaussians with the same covariance matrices. Question: is there any relation between the two models? Answer: yes, the two models are related. When we have Gaussians with the same covariance matrix, the probability of y given x has the form of a logistic regression model:

  p(y = 1 | x, μ_0, μ_1, Σ) = g(w^T x)

Members of the exponential family can often be more naturally described as

  p(x | θ, φ) = h(x, φ) exp( (θ^T x - A(θ)) / a(φ) )

where θ is a location parameter and φ is a scale parameter.

Claim: logistic regression is a correct model whenever the class-conditional densities come from the same distribution in the exponential family and have the same scale factor φ. A very powerful result: we can represent the posteriors of many distributions with the same small logistic regression model.
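
The equal-covariance Gaussian case can be verified directly; a short derivation (standard material, not spelled out on the slide), with shared covariance Σ and prior p(y = 1) = θ:

```latex
% amsmath assumed
\begin{align*}
p(y = 1 \mid x)
  &= \frac{p(x \mid y{=}1)\,\theta}
          {p(x \mid y{=}1)\,\theta + p(x \mid y{=}0)(1-\theta)}
   = \frac{1}{1 + e^{-z}},
\qquad
z = \log \frac{p(x \mid y{=}1)\,\theta}{p(x \mid y{=}0)(1-\theta)}. \\[4pt]
% The quadratic terms -x^T Sigma^{-1} x / 2 cancel because both
% classes share Sigma, so z is linear in x:
z &= (\mu_1 - \mu_0)^{\top} \Sigma^{-1} x
   - \tfrac{1}{2}\,\mu_1^{\top} \Sigma^{-1} \mu_1
   + \tfrac{1}{2}\,\mu_0^{\top} \Sigma^{-1} \mu_0
   + \log\frac{\theta}{1-\theta}.
\end{align*}
```

Appending a constant input x_0 = 1 absorbs the scalar terms into w_0, giving exactly z = w^T x, i.e. the logistic regression form g(w^T x).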

The gradient puzzle

  Linear regression:   f(x) = w^T x
  Logistic regression: f(x) = g(w^T x) = 1 / (1 + e^{-w^T x})

Gradient update: w ← w + α Σ_{i=1..n} (y_i - f(x_i, w)) x_i; online version: w ← w + α (y - f(x, w)) x. The same update in both cases.

The same simple gradient update rule is derived for both the linear and the logistic regression models. Where does the magic come from? Under the log-likelihood measure, the function models and the models for the output selection fit together:
- Linear model + Gaussian noise: y = w^T x + ε, with ε ~ N(0, σ²).
- Logistic + Bernoulli: p(y = 1 | x) = g(w^T x), with y the outcome of a Bernoulli trial.
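
A sketch of the shared online update, where only the response function f differs between the two models (the values are made up):

```python
import numpy as np

def sgd_step(w, x, y, f, alpha=0.1):
    """Online update w <- w + alpha * (y - f(w^T x)) * x."""
    return w + alpha * (y - f(w @ x)) * x

linear = lambda z: z                              # linear model + Gaussian noise
logistic = lambda z: 1.0 / (1.0 + np.exp(-z))     # logistic + Bernoulli

w = np.zeros(3)
x, y = np.array([1.0, 0.5, -0.2]), 1.0
print(sgd_step(w, x, y, linear))      # identical update form,
print(sgd_step(w, x, y, logistic))    # only f(.) differs
```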

Generalized linear models (GLIMs)

Assumptions: the conditional mean (expectation) is μ = f(w^T x), where f(.) is a response function, and the output y is characterized by an exponential family distribution with conditional mean μ.

Examples: the linear model + Gaussian noise, y = w^T x + ε with ε ~ N(0, σ²); and logistic + Bernoulli, p(y = 1 | x) = g(w^T x) = 1 / (1 + e^{-w^T x}), with y a Bernoulli trial.

A canonical response function f(.) is one encoded in the sampling distribution

  p(y | θ, φ) = h(y, φ) exp( (θ y - A(θ)) / a(φ) )

and it leads to the simple gradient form above. Example: the Bernoulli distribution,

  p(y) = μ^y (1 - μ)^{1-y} = exp( y log(μ / (1 - μ)) + log(1 - μ) )

so θ = log(μ / (1 - μ)) and μ = 1 / (1 + e^{-θ}): the logistic function matches the Bernoulli.

Evaluation of classifiers: ROC

Evaluation: for any data set we use to test the classification model, we can build a confusion matrix, with the counts of examples of class label i that are classified with a label j:

                 target
                 0      1
  predict  0     140    20
           1     17     54

  Accuracy = 194/231        Error = 37/231 = 1 - Accuracy

Evaluation for binary classification

Entries in the confusion matrix for binary classification have names:

                 target
                 1      0
  predict  1     TP     FP
           0     FN     TN

TP: true positive (hit); FP: false positive (false alarm); TN: true negative (correct rejection); FN: false negative (a miss).

Additional statistics:
  Sensitivity (recall):                  SENS = TP / (TP + FN)
  Specificity:                           SPEC = TN / (TN + FP)
  Positive predictive value (precision): PPV = TP / (TP + FP)
  Negative predictive value:             NPV = TN / (TN + FN)
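
These four statistics are direct ratios of the confusion-matrix counts; a small sketch (the counts match the worked example on the next slide):

```python
def stats(tp, fp, fn, tn):
    """Evaluation statistics from the four confusion matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "SENS": tp / (tp + fn),   # sensitivity / recall
        "SPEC": tn / (tn + fp),   # specificity
        "PPV":  tp / (tp + fp),   # positive predictive value / precision
        "NPV":  tn / (tn + fn),   # negative predictive value
    }

print(stats(tp=140, fp=10, fn=20, tn=80))
# accuracy 0.88, SENS 0.875, SPEC ~0.889, PPV ~0.933, NPV 0.8
```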

Binary classification: additional statistics

Example confusion matrix:

                 target
                 1      0
  predict  1     140    10
           0     20     80

  SENS = 140/160    SPEC = 80/90    PPV = 140/150    NPV = 80/100

Row and column quantities: sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV).

Classifiers project the datapoints onto a one-dimensional space, defined for example by w^T x or by p(y = 1 | x, w). (Figure: two-class data projected onto the direction w; the decision boundary w^T x = 0 separates the region w^T x > 0 from w^T x < 0.)
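
A tiny sketch of the projection view (the direction w and the datapoints are hypothetical):

```python
import numpy as np

w = np.array([1.0, -0.5])                   # hypothetical projection direction
X = np.array([[2.0, 1.0], [0.2, 1.5], [1.0, 0.1]])
scores = X @ w                              # one-dimensional projections w^T x
print(scores)                               # [ 1.5  -0.55  0.95]
print(scores > 0.0)                         # decision: threshold at w^T x = 0
```

Moving the threshold away from 0 trades sensitivity against specificity, which is exactly what the ROC analysis below sweeps over.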

Binary decisions: receiver operating curves. (Figure: the two class-conditional distributions of the projected score, with a decision threshold x*.) Probabilities:

  SENS = p(x > x* | y = 1)
  SPEC = p(x ≤ x* | y = 0)

Receiver operating characteristic (ROC): the ROC curve plots SENS = p(x > x* | y = 1) against 1 - SPEC = p(x > x* | y = 0) for different thresholds x*. (Figure: the ROC curve, SENS on the vertical axis versus 1 - SPEC on the horizontal axis.)

ROC curve. (Figure: three cases of class-conditional score distributions, Case 1 to Case 3, with increasing overlap, and the corresponding ROC curves.)

The receiver operating characteristic (ROC) shows the discriminability between the two classes under different decision biases; the decision bias can be changed using a different loss function. Quality of a classification model: the area under the ROC curve. Best value: 1; worst (no discriminability): 0.5.
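
A minimal sketch of building an ROC curve by sweeping a threshold over classifier scores and estimating the area under it (the scores and labels are made up; ties between scores would need extra care):

```python
import numpy as np

def roc_curve(scores, labels):
    """SENS vs. 1 - SPEC as the decision threshold sweeps the scores."""
    order = np.argsort(-scores)           # sort scores in decreasing order
    labels = labels[order]
    tp = np.cumsum(labels)                # true positives above each threshold
    fp = np.cumsum(1 - labels)            # false positives above each threshold
    sens = tp / labels.sum()              # SENS = p(score > x* | y = 1)
    fpr = fp / (1 - labels).sum()         # 1 - SPEC = p(score > x* | y = 0)
    return fpr, sens

scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
labels = np.array([1, 1, 0, 1, 0, 0])
fpr, sens = roc_curve(scores, labels)

# Trapezoidal estimate of the area under the ROC curve, starting from (0, 0).
xs = np.concatenate(([0.0], fpr))
ys = np.concatenate(([0.0], sens))
auc = np.sum(np.diff(xs) * (ys[1:] + ys[:-1]) / 2)
print(auc)   # about 0.89 here; 1.0 = perfect, 0.5 = no discriminability
```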