UVA CS 6316 Fall 2015 Graduate: Machine Learning. Lecture 15: Logistic Regression / Generative vs. Discriminative


10/21/15, Dr. Yanjun Qi, University of Virginia, Department of Computer Science

Where are we? Five major sections of this course:
- Regression (supervised)
- Classification (supervised)
- Unsupervised models
- Learning theory
- Graphical models

Where are we? Three major sections for classification

We can divide the large variety of classification approaches into roughly three major types:
1. Discriminative: directly estimate a decision rule/boundary; e.g., logistic regression, support vector machine, decision tree
2. Generative: build a generative statistical model; e.g., naive Bayes classifier, Bayesian networks
3. Instance-based classifiers: use observations directly (no models); e.g., K nearest neighbors

A dataset for classification

Output is a discrete class label C_1, C_2, ..., C_L.

Generative:     \arg\max_C P(C|X) = \arg\max_C P(X,C) = \arg\max_C P(X|C)\,P(C)
Discriminative: P(C|X), with C = c_1, ..., c_L

- Data: points/instances/examples/samples/records [rows]
- Features: attributes/dimensions/independent variables/covariates/predictors/regressors [columns, except the last]
- Target: outcome/response/label/dependent variable, the special column to be predicted [last column]

Establishing a probabilistic model for classification (cont.)

(1) Generative model: \arg\max_C P(C|X) = \arg\max_C P(X,C) = \arg\max_C P(X|C)\,P(C). One generative probabilistic model P(x|c_i) is built per class (Class 1, Class 2, ..., Class L), each over the input x = (x_1, x_2, ..., x_p). (Adapted from Prof. Ke Chen's NB slides)

(2) Discriminative model: P(C|X), with C = c_1, ..., c_L and X = (X_1, ..., X_n). A single discriminative probabilistic classifier outputs P(c_1|x), P(c_2|x), ..., P(c_L|x) directly from the input x = (x_1, x_2, ..., x_n). (Adapted from Prof. Ke Chen's NB slides)
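To make the generative decision rule concrete, here is a minimal sketch (not from the slides; the class-conditional Gaussians, parameter values, and function names are illustrative assumptions) that classifies by \arg\max_c P(x|c)\,P(c) in log space:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Hypothetical per-class parameters: one Gaussian model P(x|c) per class, plus priors P(c).
    class_means = {0: np.array([0.0, 0.0]), 1: np.array([2.0, 2.0])}
    class_covs  = {0: np.eye(2), 1: np.eye(2)}
    priors      = {0: 0.7, 1: 0.3}

    def generative_predict(x):
        # argmax_c [ log P(x|c) + log P(c) ], i.e. the log of P(x|c) P(c)
        scores = {c: multivariate_normal.logpdf(x, class_means[c], class_covs[c])
                     + np.log(priors[c]) for c in priors}
        return max(scores, key=scores.get)

    print(generative_predict(np.array([1.8, 1.5])))  # predicts class 1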

Today:
- Logistic regression
- Generative vs. Discriminative

From multivariate linear regression to logistic regression

Multivariate linear regression: y = \alpha + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p

Names for the two sides of the equation:
- Left: dependent variable / predicted / response variable / outcome variable
- Right: independent variables / predictor variables / explanatory variables / covariables

Logistic regression for binary classification:

\ln\frac{P(y|x)}{1 - P(y|x)} = \alpha + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p

Here y \in {0, 1}.

(1) Linear decision boundary: \ln\frac{P(y|x)}{1 - P(y|x)} = \alpha + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p
(2) p(y|x)

The logistic function (1)

It is a common "S"-shaped function, e.g. for the probability of disease P(Y=1|X):

P(y|x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}

[Figure: sigmoid curve of P(Y=1|X) vs. x, rising from 0.0 to 1.0]
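A minimal numeric sketch of this S-shaped curve (illustrative only; the alpha and beta values below are made up):

    import numpy as np

    def logistic(x, alpha=-3.0, beta=1.5):
        # P(y=1|x) = e^(alpha + beta*x) / (1 + e^(alpha + beta*x)),
        # written in the equivalent, numerically stabler 1 / (1 + e^(-z)) form
        z = alpha + beta * x
        return 1.0 / (1.0 + np.exp(-z))

    xs = np.linspace(-4.0, 8.0, 7)
    print(np.round(logistic(xs), 3))  # values rise from near 0 to near 1 in an "S" shape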

RECAP: Probabilistic interpretation of linear regression

Let us assume that the target variable and the inputs are related by the equation

y_i = \theta^T x_i + \varepsilon_i

where \varepsilon_i is an error term capturing unmodeled effects or random noise. Now assume that \varepsilon_i follows a Gaussian N(0, \sigma). Then we have:

p(y_i | x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right)

By the IID assumption → likelihood → MLE estimator:

L(\theta) = \prod_{i=1}^n p(y_i | x_i; \theta) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^n \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \theta^T x_i)^2 \right)

l(\theta) = n \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^n (y_i - \theta^T x_i)^2

Logistic regression: when?

Logistic regression models are appropriate for a target variable coded as 0/1. We only observe "0" and "1" for the target variable, but we think of the target variable conceptually as the probability that "1" will occur. This means we use a Bernoulli distribution to model the target variable, with its Bernoulli parameter p = p(y=1|x) predefined. The main interest → predicting the probability that an event occurs (i.e., the probability p(y=1|x)).
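As a quick numeric check of the linear-regression recap above (a sketch with synthetic data; none of the names or values come from the slides): since maximizing l(\theta) is the same as minimizing the residual sum of squares, the MLE is exactly the least-squares solution.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 200, 3
    X = rng.normal(size=(n, p))
    theta_true = np.array([1.0, -2.0, 0.5])
    y = X @ theta_true + rng.normal(scale=0.3, size=n)  # y_i = theta^T x_i + eps_i

    # Minimizing sum_i (y_i - theta^T x_i)^2 gives the Gaussian-noise MLE of theta
    theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    print(np.round(theta_hat, 2))  # close to theta_true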

Discriminative: logistic regression models for a binary target variable coded 0/1

e.g., the probability of disease P(C=1|X).

[Figure: sigmoid curve of P(C=1|X) vs. x, rising from 0.0 to 1.0]

Logistic function:

P(c=1|x) = \frac{e^{\alpha + \beta^T x}}{1 + e^{\alpha + \beta^T x}}

Logit function (the decision boundary is where it equals zero!):

\ln\frac{P(c=1|x)}{P(c=0|x)} = \ln\frac{P(c=1|x)}{1 - P(c=1|x)} = \alpha + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p

The logistic function (2)

P(y|x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}

\ln\frac{P(y|x)}{1 - P(y|x)} = \alpha + \beta x    (the logit of P(y|x))

From probability to logit, i.e. log odds (and back again)

z = \log\frac{p}{1-p}    (logit / log-odds function)

\frac{p}{1-p} = e^z

p = \frac{e^z}{1+e^z} = \frac{1}{1+e^{-z}}    (logistic function)

The logistic function (3)

Advantages of the logit:
- Simple transformation of P(y|x)
- Linear relationship with x
- Can be continuous (the logit ranges from -infinity to +infinity)
- Directly related to the notion of the log odds of the target event:

\ln\frac{P}{1-P} = \alpha + \beta x,    \frac{P}{1-P} = e^{\alpha + \beta x}
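A minimal sketch of this probability-to-logit round trip (the function names are my own):

    import numpy as np

    def logit(p):
        # z = log(p / (1 - p)): maps probabilities in (0, 1) onto (-inf, +inf)
        return np.log(p / (1.0 - p))

    def inv_logit(z):
        # p = e^z / (1 + e^z) = 1 / (1 + e^(-z)): maps logits back onto (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    p = np.array([0.1, 0.5, 0.9])
    print(np.round(logit(p), 3))             # [-2.197  0.     2.197]
    print(np.round(inv_logit(logit(p)), 3))  # recovers [0.1 0.5 0.9]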

Logistic regression: binary outcome target variable Y

Logistic regression assumptions

- Linearity in the logit: the regression equation should have a linear relationship with the logit form of the target variable.
- There is no assumption about the feature variables/predictors being linearly related to each other.

Binary logistic regression

In summary, logistic regression tells us two things at once:
- Transformed, the log odds (logit) \ln[p/(1-p)] are linear in x.
- On the probability scale, P(Y=1|x) follows the S-shaped logistic curve in x, with odds p/(1-p).

[Figure: left panel, \ln[p/(1-p)] vs. x is a straight line; right panel, the logistic distribution P(Y=1|x) vs. x is the S-shaped curve]

This means we use a Bernoulli distribution to model the target variable, with its Bernoulli parameter p = p(y=1|x) predefined.

Binary → Multinomial logistic regression model

Directly models the posterior probabilities as the output of regression:

\Pr(G = k | X = x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)},    k = 1, ..., K-1

\Pr(G = K | X = x) = \frac{1}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)}

- x is a p-dimensional input vector
- \beta_k is a p-dimensional vector for each k
- The total number of parameters is (K-1)(p+1)
- Note that the class boundaries are linear
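A short sketch of these K-class posteriors (illustrative; the coefficient values are random placeholders, and class K plays the role of the reference class with all-zero coefficients):

    import numpy as np

    rng = np.random.default_rng(1)
    K, p = 3, 4
    B = rng.normal(size=(K - 1, p + 1))  # the (K-1)(p+1) parameters: each row is (beta_k0, beta_k)

    def multinomial_posteriors(x):
        # Scores for classes 1..K-1; the reference class K contributes exp(0) = 1.
        scores = B[:, 0] + B[:, 1:] @ x
        denom = 1.0 + np.exp(scores).sum()
        return np.append(np.exp(scores), 1.0) / denom  # Pr(G=1|x), ..., Pr(G=K|x)

    x = rng.normal(size=p)
    posteriors = multinomial_posteriors(x)
    print(np.round(posteriors, 3), posteriors.sum())  # K probabilities summing to 1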

Today:
- Logistic regression
  - Parameter estimation
- Generative vs. Discriminative

Parameter estimation for LR → MLE from the data

- RECAP: linear regression → least squares
- Logistic regression → maximum likelihood estimation

RECAP: Probabilistic interpretation of linear regression (cont.)

Hence the log-likelihood is:

l(\theta) = n \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^n (y_i - \theta^T x_i)^2

MLE: do you recognize the last term? Yes, it is

J(\theta) = \frac{1}{2} \sum_{i=1}^n (x_i^T \theta - y_i)^2

Thus, under the independence assumption, minimizing the residual square error (RSS) is equivalent to the MLE of \theta!

MLE for logistic regression training

Let's fit the logistic regression model for K = 2, i.e., the number of classes is 2. Training set: (x_i, y_i), i = 1, ..., N. For the Bernoulli distribution, the conditional likelihood of one example is p(y|x)^y (1 - p(y|x))^{1-y}. Log-likelihood:

l(\beta) = \sum_{i=1}^N \log \Pr(Y = y_i | X = x_i)
         = \sum_{i=1}^N \left[ y_i \log \Pr(Y=1|X=x_i) + (1 - y_i) \log \Pr(Y=0|X=x_i) \right]
         = \sum_{i=1}^N \left[ y_i \log\frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} + (1 - y_i) \log\frac{1}{1 + \exp(\beta^T x_i)} \right]
         = \sum_{i=1}^N \left[ y_i \beta^T x_i - \log(1 + \exp(\beta^T x_i)) \right]

Here the x_i are (p+1)-dimensional input vectors with leading entry 1, and \beta is a (p+1)-dimensional vector. We want to maximize the log-likelihood in order to estimate \beta. How?
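The final simplified form transcribes directly into code (a sketch; it assumes X already carries the leading column of 1s, and the tiny dataset is made up):

    import numpy as np

    def log_likelihood(beta, X, y):
        # l(beta) = sum_i [ y_i * beta^T x_i - log(1 + exp(beta^T x_i)) ]
        z = X @ beta
        return np.sum(y * z - np.log1p(np.exp(z)))

    X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])  # leading 1s, one feature
    y = np.array([1.0, 0.0, 1.0])
    print(log_likelihood(np.array([0.0, 1.0]), X, y))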

l(\beta) = \sum_{i=1}^N \log \Pr(Y = y_i | X = x_i)

Newton-Raphson for LR (optional)

Setting the gradient to zero,

\frac{\partial l(\beta)}{\partial \beta} = \sum_{i=1}^N \left( y_i - \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \right) x_i = 0

gives (p+1) non-linear equations to solve for the (p+1) unknowns. Solve by the Newton-Raphson method:

\beta^{new} \leftarrow \beta^{old} - \left[ \frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T} \right]^{-1} \frac{\partial l(\beta)}{\partial\beta}

where

\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T} = -\sum_{i=1}^N x_i x_i^T \, \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \cdot \frac{1}{1 + \exp(\beta^T x_i)} = -\sum_{i=1}^N x_i x_i^T \, p(x_i;\beta)\,(1 - p(x_i;\beta))

Each Newton step minimizes a quadratic approximation to the function we are really interested in.

Newton-Raphson for LR

In matrix notation, let

- X: the N × (p+1) matrix whose rows are the x_i^T
- y: the N × 1 vector of the y_i
- p: the N × 1 vector of p(x_i; \beta^{old}) = \exp(\beta^{old\,T} x_i) / (1 + \exp(\beta^{old\,T} x_i))
- W: the N × N diagonal matrix with entries p(x_i; \beta^{old})\,(1 - p(x_i; \beta^{old}))

Then the gradient and Hessian are

\frac{\partial l(\beta)}{\partial\beta} = \sum_{i=1}^N \left( y_i - \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \right) x_i = X^T (y - p)

\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T} = -X^T W X

So the NR rule becomes:

\beta^{new} \leftarrow \beta^{old} + (X^T W X)^{-1} X^T (y - p)
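A compact sketch of this update rule iterated to convergence (synthetic data; an illustration under the slide's notation, not code from the course):

    import numpy as np

    def fit_logistic_newton(X, y, n_iter=25):
        # X: N x (p+1) with a leading column of 1s; y: N-vector of 0/1 labels
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-X @ beta))              # p(x_i; beta^old)
            H = X.T @ ((p * (1.0 - p))[:, None] * X)         # X^T W X
            beta = beta + np.linalg.solve(H, X.T @ (y - p))  # NR update
        return beta

    rng = np.random.default_rng(2)
    X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
    beta_true = np.array([-1.0, 2.0, 0.5])
    y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
    print(np.round(fit_logistic_newton(X, y), 2))  # roughly recovers beta_true

The same update, re-expressed as a weighted least-squares problem, is the IRLS form derived on the next slide.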

YanjunQ/UVACS4501e01e6501e07 Logstc Regresson ask classfcaton Representaton Score Functon Log-odds = lnear functon of X s EPE, wth condtonal Log-lkelhood Search/Optmzaton Iteratve (Newton method Models, Parameters Logstc weghts eα+βx P(c =1 x = 1+ e α+βx 10/21/15 29 oday: Dr.YanjunQ/UVACS6316/f15 # LogsHcregresson # GeneraHvevs.DscrmnaHve 10/21/15 30

Discriminative vs. Generative

Generative approach:
- Model the joint distribution P(X, C), using p(X | C = c_k) and the class prior p(C = c_k)

Discriminative approach:
- Model the conditional distribution p(C | X) directly

Discriminative vs. Generative: example

[Figure: modeling a Height feature. The generative route fits a Gaussian per class to Height; the discriminative route fits a logistic regression for Pr(class | Height) directly.]

LDA vs. Logistic Regression

Discriminative vs. Generative: definitions

- h_gen and h_dis: generative and discriminative classifiers
- h_gen,inf and h_dis,inf: the same classifiers but trained on the entire population (asymptotic classifiers)
- As n → infinity, h_gen → h_gen,inf and h_dis → h_dis,inf

Ng, Jordan. "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes." Advances in Neural Information Processing Systems 14 (2002): 841.

Discriminative vs. Generative

Proposition 1: [statement from Ng & Jordan, not captured in the transcription]
Proposition 2: [statement from Ng & Jordan, not captured in the transcription]

Notation:
- p: number of dimensions
- n: number of observations
- ε: generalization error

Logistic Regression vs. NBC

Discriminative classifier (logistic regression):
- smaller asymptotic error
- slow convergence: needs on the order of O(p) training examples

Generative classifier (naive Bayes):
- larger asymptotic error
- can handle missing data (EM)
- fast convergence: needs on the order of O(lg(p)) training examples

[Figure: generalization error vs. size of training set, for logistic regression and for naive Bayes, from Ng & Jordan (2002)]

Ng, Jordan. "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes." Advances in Neural Information Processing Systems 14 (2002): 841.

Xue, Jing-Hao, and D. Michael Titterington. "Comment on 'On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes'." Neural Processing Letters 28.3 (2008): 169-187.
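One might reproduce this kind of learning-curve comparison with a sketch like the following (assumes scikit-learn; the dataset and training-set sizes are arbitrary choices, not the paper's setup):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, y_tr, X_te, y_te = X[:1000], y[:1000], X[1000:], y[1000:]

    print("m  err(LR)  err(NB)")
    for m in [20, 50, 100, 500, 1000]:  # growing training-set size
        lr = LogisticRegression(max_iter=1000).fit(X_tr[:m], y_tr[:m])
        nb = GaussianNB().fit(X_tr[:m], y_tr[:m])
        print(m, round(1 - lr.score(X_te, y_te), 3), round(1 - nb.score(X_te, y_te), 3))

In the spirit of the figure, naive Bayes often flattens out sooner while logistic regression tends to end lower, though the exact curves depend on the data.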

Discriminative vs. Generative

- Empirically, generative classifiers approach their asymptotic error faster than discriminative ones
  - good for small training sets
  - handle missing data well (EM)
- Empirically, discriminative classifiers have a lower asymptotic error than generative ones
  - good for larger training sets

References

- Prof. Tan, Steinbach, Kumar's "Introduction to Data Mining" slides
- Prof. Andrew Moore's slides
- Prof. Eric Xing's slides
- Hastie, Trevor, et al. The Elements of Statistical Learning. Vol. 2. No. 1. New York: Springer, 2009.