Generative classification models

CS 2750 Machine Learning
Lecture: Generative classification models
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Data: $D = \{d_1, d_2, \ldots, d_n\}$, where $d_i = \langle \mathbf{x}_i, y_i \rangle$ and $y$ represents a discrete class value.
Goal: learn a classification function $f: X \to Y$.
Binary classification: a special case when $Y = \{0, 1\}$.
First step: we need to devise a model of the function $f$.

Discriminant functions

A common way to represent a classifier is by using discriminant functions. This works for both the binary and the multi-way classification.
Idea: for every class $i = 0, 1, \ldots, k$ define a function $g_i(\mathbf{x})$ mapping $X \to \mathbb{R}$. When the decision on input $\mathbf{x}$ should be made, choose the class with the highest value of $g_i(\mathbf{x})$:
$y^* = \arg\max_i g_i(\mathbf{x})$

Logistic regression model

Discriminant functions: $g_1(\mathbf{x}) = g(\mathbf{w}^T \mathbf{x})$ and $g_0(\mathbf{x}) = 1 - g(\mathbf{w}^T \mathbf{x})$.
Values of the discriminant functions vary in the interval $[0, 1]$.
Probabilistic interpretation: $f(\mathbf{x}, \mathbf{w}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}) = g(\mathbf{w}^T \mathbf{x})$, where $\mathbf{x}$ is the input vector, $\mathbf{w}$ the weights, and $g(z) = 1/(1 + e^{-z})$ is the logistic (sigmoid) function applied to the linear score $z = \mathbf{w}^T \mathbf{x}$.
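The discriminant-function view of logistic regression can be made concrete in a few lines. The sketch below is illustrative only; the weights are assumed to be already known, and the input is assumed to carry a leading 1 for the bias term $w_0$.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def discriminant_functions(x, w):
    """Return (g1, g0) for an input vector x and weight vector w.

    g1(x) = g(w^T x) plays the role of p(y=1 | x, w);
    g0(x) = 1 - g(w^T x) plays the role of p(y=0 | x, w).
    Assumes x already contains a leading 1 for the bias term w_0.
    """
    g1 = sigmoid(w @ x)
    return g1, 1.0 - g1

# Example: pick the class with the higher discriminant value.
x = np.array([1.0, 0.5, -1.2])   # [bias, x1, x2]
w = np.array([0.1, 2.0, -0.7])   # hypothetical weights
g1, g0 = discriminant_functions(x, w)
y_hat = 1 if g1 >= g0 else 0
```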

When does the logistic regression fail?
[Figure: a dataset with a nonlinear decision boundary that a linear logistic regression model cannot capture.]

When does the logistic regression fail?
[Figure: another example of a non-linear decision boundary.]

Non-linear extension of logistic regression

Use feature (basis) functions to model nonlinearities; this is the same trick as used for the linear regression.
Linear regression: $f(\mathbf{x}) = w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})$, where $\phi_j(\mathbf{x})$ is an arbitrary function of $\mathbf{x}$.
Logistic regression: $p(y = 1 \mid \mathbf{x}) = g\!\left(w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})\right)$

Regularized logistic regression

If the model is too complex and can cause overfitting, its prediction accuracy can be improved by removing some inputs from the model, i.e. setting their coefficients to zero.
Recall the linear model: $f(\mathbf{x}) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \ldots + w_d x_d$, where $\mathbf{x}$ is the input vector.
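As an illustration of the basis-function trick, here is a minimal sketch assuming hypothetical quadratic features $\phi_j$ for a 2-D input; the model stays linear in the weights while the decision boundary becomes non-linear in the original inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_features(x):
    """Hypothetical basis functions phi_j(x) for a 2-D input x = (x1, x2):
    [1, x1, x2, x1^2, x2^2, x1*x2]."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

def p_y1_given_x(x, w):
    """Non-linear logistic regression: p(y=1|x) = g(w^T phi(x))."""
    return sigmoid(w @ quadratic_features(x))

# The decision boundary w^T phi(x) = 0 is now a quadratic curve in the
# original (x1, x2) space, even though the model is linear in w.
```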

Regularized logistic regression

If the model is too complex and can cause overfitting, its prediction accuracy can be improved by removing some inputs from the model, i.e. setting their coefficients to zero.
We can apply the same idea to the logistic regression:
$p(y = 1 \mid \mathbf{x}) = g(w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \ldots + w_d x_d)$,
where $\mathbf{x}$ is the input vector and $w_0, w_1, \ldots, w_d$ are the parameters (weights).

Ridge (L2) penalty

Linear regression with the ridge penalty (fit to data + model complexity penalty):
$J_n(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \mathbf{w}^T \mathbf{x}_i \right)^2 + \lambda \|\mathbf{w}\|_2^2$
Logistic regression (fit to data + model complexity penalty):
$J_n(\mathbf{w}) = -\log P(D \mid \mathbf{w}) + \lambda \|\mathbf{w}\|_2^2$,
where the fit to data is measured using the negative log likelihood
$-\log P(D \mid \mathbf{w}) = -\sum_{i=1}^{n} \left[ y_i \log g(\mathbf{w}^T \mathbf{x}_i) + (1 - y_i)\log\left(1 - g(\mathbf{w}^T \mathbf{x}_i)\right) \right]$.
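A small sketch of the ridge-penalized logistic regression objective above; the epsilon guard against log(0) and the choice to penalize all weights (including the bias) are implementation assumptions, not part of the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ridge_logistic_objective(w, X, y, lam):
    """J(w) = negative log likelihood + lambda * ||w||_2^2.

    X: (n, d) input matrix (first column assumed to be all ones),
    y: (n,) vector of 0/1 labels, lam: regularization strength lambda.
    """
    p = sigmoid(X @ w)                 # p(y=1 | x_i, w) for every example
    eps = 1e-12                        # numerical guard against log(0)
    nll = -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return nll + lam * np.sum(w ** 2)
```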

Lasso (L1) penalty

Linear regression with the lasso penalty (fit to data + model complexity penalty):
$J_n(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \mathbf{w}^T \mathbf{x}_i \right)^2 + \lambda \|\mathbf{w}\|_1$
Logistic regression:
$J_n(\mathbf{w}) = -\log P(D \mid \mathbf{w}) + \lambda \|\mathbf{w}\|_1$,
with the fit to data again measured using the negative log likelihood
$-\sum_{i=1}^{n} \left[ y_i \log g(\mathbf{w}^T \mathbf{x}_i) + (1 - y_i)\log\left(1 - g(\mathbf{w}^T \mathbf{x}_i)\right) \right]$.

Generative approach to classification

Logistic regression: represents and learns a model of $p(y \mid \mathbf{x})$; an example of a discriminative classification approach. The model is unable to sample (generate) data instances $(\mathbf{x}, y)$.
Generative approach: represents and learns the joint distribution $p(\mathbf{x}, y)$. The model is able to sample (generate) data instances $(\mathbf{x}, y)$.
The joint model defines probabilistic discriminant functions. How?
$g_1(\mathbf{x}) = p(y = 1 \mid \mathbf{x}) = \dfrac{p(\mathbf{x}, y = 1)}{p(\mathbf{x})} = \dfrac{p(\mathbf{x} \mid y = 1)\,p(y = 1)}{p(\mathbf{x} \mid y = 1)\,p(y = 1) + p(\mathbf{x} \mid y = 0)\,p(y = 0)}$
$g_0(\mathbf{x}) = p(y = 0 \mid \mathbf{x}) = 1 - p(y = 1 \mid \mathbf{x})$
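A minimal sketch of how the joint model yields the discriminant function $g_1(\mathbf{x})$ via Bayes' rule; the class-conditional density callables and the prior argument are placeholders for whatever generative model is plugged in.

```python
def posterior_g1(x, p_x_given_y1, p_x_given_y0, prior_y1):
    """g1(x) = p(y=1|x) computed from a generative model via Bayes' rule.

    p_x_given_y1, p_x_given_y0: callables returning the class-conditional
    densities p(x|y=1) and p(x|y=0); prior_y1: the class prior p(y=1).
    """
    num = p_x_given_y1(x) * prior_y1
    den = num + p_x_given_y0(x) * (1.0 - prior_y1)
    return num / den   # g0(x) is simply 1 - g1(x)
```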

Generative approach to classification

Typical joint model: $p(\mathbf{x}, y) = p(\mathbf{x} \mid y)\,p(y)$
$p(\mathbf{x} \mid y)$ = class-conditional distributions (densities); for binary classification there are two class-conditional distributions, $p(\mathbf{x} \mid y = 0)$ and $p(\mathbf{x} \mid y = 1)$.
$p(y)$ = priors on classes, the probability of class $y$; for binary classification: Bernoulli distribution.

Quadratic discriminant analysis (QDA)

Model: class-conditional distributions are multivariate normal distributions,
$\mathbf{x} \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ for $y = 0$ and $\mathbf{x} \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)$ for $y = 1$,
with the multivariate normal density
$p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \dfrac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$.
Priors on classes (class 0, 1): Bernoulli distribution,
$p(y \mid \theta) = \theta^{y} (1 - \theta)^{1 - y}$, $y \in \{0, 1\}$.

Learning of parameters of the QDA model

Density estimation in statistics: we see examples, but we do not know the parameters of the Gaussians (the class-conditional densities).
ML estimate of the parameters of a multivariate normal $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ for a set of $n$ examples $\mathbf{x}_1, \ldots, \mathbf{x}_n$: optimize the log-likelihood
$l(D, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \log \prod_{i=1}^{n} p(\mathbf{x}_i \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$,
which gives
$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i$ and $\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T$.
How about class priors?

Learning quadratic discriminant analysis (QDA)

Learning class-conditional distributions: learn the parameters of the multivariate normal distributions $\mathbf{x} \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ for $y = 0$ and $\mathbf{x} \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)$ for $y = 1$, using the density estimation methods.
Learning priors on classes (class 0, 1): $y \sim \text{Bernoulli}(\theta)$, $p(y \mid \theta) = \theta^{y}(1 - \theta)^{1 - y}$, $y \in \{0, 1\}$; learn the parameter $\theta$ of the Bernoulli distribution, again using the density estimation methods.
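A rough sketch of the ML estimation step for QDA, assuming a binary 0/1 label vector; the per-class sample means and covariances follow the formulas above, and the class prior is the ML estimate of the Bernoulli parameter. No regularization of the covariance estimates is applied.

```python
import numpy as np

def fit_qda(X, y):
    """ML estimates of the QDA parameters from inputs X (n, d) and labels y (n,).

    Returns per-class means and covariances plus the class prior p(y=1).
    """
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)                 # mu_hat = (1/n_c) sum x_i
        diff = Xc - mu
        sigma = diff.T @ diff / Xc.shape[0]  # Sigma_hat = (1/n_c) sum (x_i - mu)(x_i - mu)^T
        params[c] = (mu, sigma)
    theta = np.mean(y == 1)                  # ML estimate of the Bernoulli prior p(y=1)
    return params, theta
```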

QDA
[Figure: discriminant functions $g_1(\mathbf{x})$ and $g_0(\mathbf{x})$ for a QDA model.]
[Figure: Gaussian class-conditional densities.]

QDA: making a class decision

Basically we need to design discriminant functions based on the posterior of a class, and choose the class with the better posterior probability:
if $p(y = 1 \mid \mathbf{x}) > p(y = 0 \mid \mathbf{x})$ then $y = 1$, else $y = 0$.
Notice it is sufficient to compare:
$p(\mathbf{x} \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)\,p(y = 1)$ versus $p(\mathbf{x} \mid \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)\,p(y = 0)$.

QDA: quadratic decision boundary
[Figure: contours of the class-conditional densities.]
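A sketch of the class decision, reusing the hypothetical fit_qda helper from the previous block; the comparison is done in log space, which is equivalent to comparing $p(\mathbf{x} \mid \boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c)\,p(y = c)$ directly but numerically safer.

```python
import numpy as np
from scipy.stats import multivariate_normal

def qda_predict(x, params, theta):
    """Class decision for QDA: compare p(x|mu_c, Sigma_c) p(y=c) in log space.

    params: {0: (mu0, Sigma0), 1: (mu1, Sigma1)} as returned by fit_qda;
    theta: class prior p(y=1).
    """
    mu0, sigma0 = params[0]
    mu1, sigma1 = params[1]
    score0 = multivariate_normal.logpdf(x, mean=mu0, cov=sigma0) + np.log(1.0 - theta)
    score1 = multivariate_normal.logpdf(x, mean=mu1, cov=sigma1) + np.log(theta)
    return 1 if score1 > score0 else 0
```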

QDA: quadratic decision boundary
[Figure: the resulting decision boundary is a quadratic curve.]

Linear discriminant analysis (LDA)

Assumes the covariances are the same:
$\mathbf{x} \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma})$ for $y = 0$ and $\mathbf{x} \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma})$ for $y = 1$.

LDA: linear decision boundary
[Figure: contours of the class-conditional densities.]
[Figure: the resulting decision boundary is a straight line.]

Generative classification models

Idea:
1. Represent and learn the distribution $p(\mathbf{x}, y)$.
2. The model is able to sample (generate) data instances $(\mathbf{x}, y)$.
3. The model is used to get probabilistic discriminant functions $g_0(\mathbf{x}) = p(y = 0 \mid \mathbf{x})$ and $g_1(\mathbf{x}) = p(y = 1 \mid \mathbf{x})$.

Typical model: $p(\mathbf{x}, y) = p(\mathbf{x} \mid y)\,p(y)$
$p(\mathbf{x} \mid y)$ = class-conditional distributions (densities); binary classification: two class-conditional distributions, $p(\mathbf{x} \mid y = 0)$ and $p(\mathbf{x} \mid y = 1)$.
$p(y)$ = priors on classes, the probability of class $y$; binary classification: Bernoulli distribution.

Naïve Bayes classifier

A generative classifier model with an additional simplifying assumption: all input attributes are conditionally independent of each other given the class. One of the basic ML classification models (often performs very well in practice).
So we have:
$p(\mathbf{x}, y) = p(\mathbf{x} \mid y)\,p(y) = p(y) \prod_{i=1}^{d} p(x_i \mid y)$

Learning parameters of the model

Much simpler density estimation problems. We need to learn $p(\mathbf{x} \mid y = 1)$, $p(\mathbf{x} \mid y = 0)$ and $p(y)$. Because of the assumption of conditional independence we only need to learn, for every input variable $i$: $p(x_i \mid y = 1)$ and $p(x_i \mid y = 0)$.
This is much easier if the number of input attributes is large.
Also, the model gives us the flexibility to represent input attributes of different forms! E.g. one attribute can be modeled using the Bernoulli, another using a Gaussian density, or a Poisson distribution.

Making a class decision for the Naïve Bayes

Discriminant functions based on the posterior of a class; choose the class with the better posterior probability:
if $p(y = 1) \prod_{i=1}^{d} p(x_i \mid y = 1) > p(y = 0) \prod_{i=1}^{d} p(x_i \mid y = 0)$ then $y = 1$, else $y = 0$.
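A sketch of a Naïve Bayes classifier under the assumption that all attributes are binary and modeled with Bernoulli distributions (the slides allow other per-attribute distributions); the Laplace smoothing constant alpha is an added assumption to avoid zero probabilities.

```python
import numpy as np

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Naive Bayes with binary (0/1) attributes.

    Learns p(x_i = 1 | y = c) for every attribute i and the class prior p(y=1).
    alpha is a Laplace smoothing count (an assumption, not part of the slides).
    """
    theta = y.mean()                                  # p(y = 1)
    p_xi = {c: (X[y == c].sum(axis=0) + alpha) /
               (np.sum(y == c) + 2 * alpha) for c in (0, 1)}
    return p_xi, theta

def nb_predict(x, p_xi, theta):
    """Choose the class with the larger log p(y=c) + sum_i log p(x_i | y=c)."""
    scores = {}
    for c, prior in ((0, 1.0 - theta), (1, theta)):
        p = p_xi[c]
        scores[c] = np.log(prior) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return 1 if scores[1] > scores[0] else 0
```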

Next: two interesting questions

(1) Two models with linear decision boundaries: logistic regression, and the LDA model (2 Gaussians with the same covariance matrices, $\mathbf{x} \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma})$ for $y = 0$ and $\mathbf{x} \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma})$ for $y = 1$).
Question: is there a relation between the two models?

(2) Two models with the same gradient: the linear model for regression and the logistic regression model for classification have the same gradient update, $\mathbf{w} \leftarrow \mathbf{w} + \alpha\,(y - f(\mathbf{x}))\,\mathbf{x}$.
Question: why is the gradient the same?

Logistic regression and generative models

Two models with linear decision boundaries: logistic regression, and the generative model with 2 Gaussians with the same covariance matrices ($\mathbf{x} \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma})$ for $y = 0$, $\mathbf{x} \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma})$ for $y = 1$).
Question: is there a relation between the two models?
Answer: yes, the two models are related! When we have 2 Gaussians with the same covariance matrix, the probability of $y$ given $\mathbf{x}$ has the form of a logistic regression model:
$p(y = 1 \mid \mathbf{x}, \boldsymbol{\mu}_0, \boldsymbol{\mu}_1, \boldsymbol{\Sigma}) = g(\mathbf{w}^T \mathbf{x})$
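The claim can be verified with a short derivation (not spelled out on the slides, but following directly from Bayes' rule and the shared-covariance Gaussian densities):

$$
\begin{aligned}
p(y = 1 \mid \mathbf{x})
  &= \frac{p(\mathbf{x} \mid y = 1)\,p(y = 1)}{p(\mathbf{x} \mid y = 1)\,p(y = 1) + p(\mathbf{x} \mid y = 0)\,p(y = 0)}
   = \frac{1}{1 + \exp(-a(\mathbf{x}))}, \\[4pt]
a(\mathbf{x}) &= \log\frac{p(\mathbf{x} \mid y = 1)\,p(y = 1)}{p(\mathbf{x} \mid y = 0)\,p(y = 0)}
   = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)^T \boldsymbol{\Sigma}^{-1} \mathbf{x}
     - \tfrac{1}{2}\boldsymbol{\mu}_1^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_1
     + \tfrac{1}{2}\boldsymbol{\mu}_0^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_0
     + \log\frac{\theta}{1 - \theta}.
\end{aligned}
$$

Because both classes share the same $\boldsymbol{\Sigma}$, the quadratic terms $\mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x}$ cancel, so $a(\mathbf{x})$ is linear in $\mathbf{x}$ and $p(y = 1 \mid \mathbf{x}) = g(w_0 + \mathbf{w}^T \mathbf{x})$.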

Logistic regression and generative models

Members of the exponential family can often be more naturally described as
$f(x \mid \theta, \phi) = h(x, \phi) \exp\!\left( \dfrac{\theta x - A(\theta)}{a(\phi)} \right)$,
where $\theta$ is a location parameter and $\phi$ is a scale parameter.
Claim: a logistic regression is a correct model when the class-conditional densities are from the same distribution in the exponential family and have the same scale factor $\phi$.
This is a very powerful result: we can represent the posteriors of many distributions with the same small logistic regression model.

The gradient puzzle

Linear regression: $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x}$
Logistic regression: $f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}) = g(\mathbf{w}^T \mathbf{x})$
Gradient update (the same for both models): $\mathbf{w} \leftarrow \mathbf{w} + \alpha \sum_{i=1}^{n} (y_i - f(\mathbf{x}_i))\,\mathbf{x}_i$
Online version: $\mathbf{w} \leftarrow \mathbf{w} + \alpha\,(y - f(\mathbf{x}))\,\mathbf{x}$

The gradient puzzle

The same simple gradient update rule is derived for both the linear and the logistic regression models. Where does the magic come from?
Under the log-likelihood measure, the function models and the models for the output selection fit together:
Linear model + Gaussian noise: $y = \mathbf{w}^T \mathbf{x} + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$.
Logistic + Bernoulli: $y \sim \text{Bernoulli}(\theta)$, with $\theta = p(y = 1 \mid \mathbf{x}) = g(\mathbf{w}^T \mathbf{x})$.

Generalized linear models (GLIMs)

Assumptions: the conditional mean (expectation) is $\mu = f(\mathbf{w}^T \mathbf{x})$, where $f(\cdot)$ is a response function, and the output $y$ is characterized by an exponential family distribution with conditional mean $\mu$.
Examples: linear model + Gaussian noise ($y = \mathbf{w}^T \mathbf{x} + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$); logistic + Bernoulli ($y \sim \text{Bernoulli}(\theta)$, $\theta = g(\mathbf{w}^T \mathbf{x})$).
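To see that the two models really share one update rule, here is a minimal sketch of the online step; only the prediction function differs between the linear and the logistic case.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_update(w, x, y, alpha, predict):
    """One online gradient step w <- w + alpha * (y - f(x)) * x.

    The same rule applies to both models; only the prediction f differs:
    predict = lambda w, x: w @ x              # linear regression + Gaussian noise
    predict = lambda w, x: sigmoid(w @ x)     # logistic regression + Bernoulli output
    """
    return w + alpha * (y - predict(w, x)) * x
```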

Generalized linear models (GLIMs)

A canonical response function $f(\cdot)$ is encoded in the sampling distribution
$p(y \mid \theta, \phi) = h(y, \phi) \exp\!\left( \dfrac{\theta y - A(\theta)}{a(\phi)} \right)$
and leads to a simple gradient form.
Example: Bernoulli distribution
$p(y) = \mu^{y} (1 - \mu)^{1 - y} = \exp\!\left( y \log\dfrac{\mu}{1 - \mu} + \log(1 - \mu) \right)$,
so $\theta = \log\dfrac{\mu}{1 - \mu}$ and $\mu = \dfrac{1}{1 + e^{-\theta}}$: the logistic function matches the Bernoulli.