Classification: Logistic regression. Generative classification model.


CS 2750 Machine Learning, Lecture 8
Classification: Logistic regression. Generative classification model.
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Binary classification
Two classes Y = {0, 1}. Our goal is to learn to classify correctly two types of examples:
- Class 0, labeled as y = 0
- Class 1, labeled as y = 1
We would like to learn a function f : X -> {0, 1}.
Zero-one error (loss) function:
$$\text{Error}_1(\mathbf{x}_i, y_i) = \begin{cases} 1 & \text{if } y_i \neq f(\mathbf{x}_i, \mathbf{w}) \\ 0 & \text{if } y_i = f(\mathbf{x}_i, \mathbf{w}) \end{cases}$$
We would like to minimize the expected error $E[\text{Error}_1]$.
First step: we need to devise a model of the function f.

Discriminant functions
One way to represent a classifier is by using discriminant functions. This works for both binary and multi-way classification.
Idea: for every class i = 0, 1, ..., k define a function $g_i(\mathbf{x})$ mapping X -> R. When the decision on an input $\mathbf{x}$ should be made, choose the class with the highest value of $g_i(\mathbf{x})$.
So what happens with the input space? Assume a binary case.

[Figures: the two discriminant functions g_0(x) and g_1(x) over a two-dimensional input space]

Discriminant functions define a decision boundary: the set of points where $g_0(\mathbf{x}) = g_1(\mathbf{x})$.

[Figure: a quadratic decision boundary separating the two classes]

Logistic regression model
Defines a linear decision boundary. Discriminant functions:
$$g_1(\mathbf{x}) = g(\mathbf{w}^T\mathbf{x}), \qquad g_0(\mathbf{x}) = 1 - g(\mathbf{w}^T\mathbf{x}),$$
where $g(z) = 1/(1 + e^{-z})$ is a logistic function, $\mathbf{x} = (1, x_1, \dots, x_d)^T$ is the input vector (with a constant bias component), and $\mathbf{w} = (w_0, w_1, \dots, w_d)^T$ are the weights.

Logistic function
$$g(z) = \frac{1}{1 + e^{-z}}$$
Also referred to as a sigmoid function. It replaces the hard threshold function with a smooth switch: it takes a real number and outputs a number in the interval [0, 1].

[Figure: the logistic (sigmoid) function g(z)]

Logistic regression model
Discriminant functions: $g_1(\mathbf{x}) = g(\mathbf{w}^T\mathbf{x})$, $g_0(\mathbf{x}) = 1 - g(\mathbf{w}^T\mathbf{x})$. The values of the discriminant functions vary in [0, 1].
Probabilistic interpretation:
$$f(\mathbf{x}, \mathbf{w}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}) = g_1(\mathbf{x}) = g(\mathbf{w}^T\mathbf{x}).$$

Logistic regression
We learn a probabilistic function f : X -> [0, 1], where f describes the probability of class 1 given x:
$$f(\mathbf{x}, \mathbf{w}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}).$$
Note that $p(y = 0 \mid \mathbf{x}, \mathbf{w}) = 1 - p(y = 1 \mid \mathbf{x}, \mathbf{w})$.
Transformation to binary class values: if $p(y = 1 \mid \mathbf{x}) \ge 1/2$ then choose 1, else choose 0.
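To make the mapping from inputs to class probabilities concrete, here is a minimal prediction-step sketch in Python (NumPy); the function and variable names are illustrative, not from the lecture.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    """p(y = 1 | x, w) for an input x, prepending the constant bias term."""
    x_aug = np.concatenate(([1.0], x))   # x = (1, x_1, ..., x_d)
    return sigmoid(w @ x_aug)

def predict_class(w, x):
    """Binary decision: 1 if p(y = 1 | x, w) >= 1/2, else 0."""
    return int(predict_proba(w, x) >= 0.5)
```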

Linear decision boundary
The logistic regression model defines a linear decision boundary. Why? Answer: compare the two discriminant functions.
Decision boundary: $g_1(\mathbf{x}) = g_0(\mathbf{x})$. On the boundary it must hold that
$$0 = \log\frac{g_0(\mathbf{x})}{g_1(\mathbf{x})} = \log\frac{1 - g(\mathbf{w}^T\mathbf{x})}{g(\mathbf{w}^T\mathbf{x})} = \log\frac{e^{-\mathbf{w}^T\mathbf{x}}/(1 + e^{-\mathbf{w}^T\mathbf{x}})}{1/(1 + e^{-\mathbf{w}^T\mathbf{x}})} = \log e^{-\mathbf{w}^T\mathbf{x}} = -\mathbf{w}^T\mathbf{x},$$
so the boundary is the hyperplane $\mathbf{w}^T\mathbf{x} = 0$, which is linear in $\mathbf{x}$.

Logistic regression model: decision boundary
LR defines a linear decision boundary. Example: two classes (blue and red points).

[Figure: data points of the two classes and the linear decision boundary]

Logistic regression: parameter learning
Likelihood of outputs. Let $D_i = \langle \mathbf{x}_i, y_i \rangle$ and $\mu_i = p(y_i = 1 \mid \mathbf{x}_i, \mathbf{w}) = g(\mathbf{w}^T\mathbf{x}_i)$. Then
$$L(D, \mathbf{w}) = \prod_{i=1}^{n} p(y_i \mid \mathbf{x}_i, \mathbf{w}) = \prod_{i=1}^{n} \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}.$$
Find the weights $\mathbf{w}$ that maximize the likelihood of the outputs. Apply the log-likelihood trick: the optimal weights are the same for both the likelihood and the log-likelihood,
$$l(D, \mathbf{w}) = \log L(D, \mathbf{w}) = \sum_{i=1}^{n} y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i).$$

Logistic regression: parameter learning
Derivatives of the log-likelihood:
$$\nabla_{\mathbf{w}}\, l(D, \mathbf{w}) = \sum_{i=1}^{n} \big[y_i - f(\mathbf{w}, \mathbf{x}_i)\big]\, \mathbf{x}_i, \qquad \frac{\partial l(D, \mathbf{w})}{\partial w_j} = \sum_{i=1}^{n} \big[y_i - f(\mathbf{w}, \mathbf{x}_i)\big]\, x_{i,j}.$$
The log-likelihood is nonlinear in the weights, so there is no closed-form solution. Gradient descent on the negative log-likelihood (equivalently, gradient ascent on l):
$$\mathbf{w} \leftarrow \mathbf{w} + \alpha \sum_{i=1}^{n} \big[y_i - f(\mathbf{w}, \mathbf{x}_i)\big]\, \mathbf{x}_i.$$

Logistic regression: online gradient descent
On-line component of the log-likelihood, for the k-th data point $D_k = \langle \mathbf{x}_k, y_k \rangle$:
$$J_{\text{online}}(D_k, \mathbf{w}) = -\big[y_k \log \mu_k + (1 - y_k)\log(1 - \mu_k)\big].$$
On-line learning update for the weights:
$$\mathbf{w} \leftarrow \mathbf{w} - \alpha(k)\, \nabla_{\mathbf{w}} J_{\text{online}}(D_k, \mathbf{w}).$$
The k-th update for logistic regression and $D_k = \langle \mathbf{x}_k, y_k \rangle$:
$$\mathbf{w} \leftarrow \mathbf{w} + \alpha(k)\, \big[y_k - f(\mathbf{w}, \mathbf{x}_k)\big]\, \mathbf{x}_k.$$

Online logistic regression algorithm
Online-logistic-regression (D, number of iterations):
  initialize weights w = (w_0, w_1, ..., w_d)
  for i = 1 to number of iterations do
    select a data point D_i = <x_i, y_i> from D
    set alpha = 1 / i
    update weights (in parallel): w <- w + alpha * [y_i - f(w, x_i)] * x_i
  end for
  return weights w
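A minimal Python (NumPy) sketch of the online update above; the data layout (rows of X already contain the constant bias term) and the random selection of data points are my assumptions, not spelled out on the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_logistic_regression(X, y, num_iterations, rng=None):
    """Online updates w <- w + alpha(i) * [y_i - g(w^T x_i)] * x_i.

    X: (n, d+1) array whose first column is the constant 1 (bias term).
    y: (n,) array of 0/1 labels.
    """
    rng = rng or np.random.default_rng(0)
    w = np.zeros(X.shape[1])                 # initialize weights
    for i in range(1, num_iterations + 1):
        k = rng.integers(len(y))             # select a data point <x_k, y_k>
        alpha = 1.0 / i                      # annealed learning rate alpha(i) = 1/i
        w += alpha * (y[k] - sigmoid(w @ X[k])) * X[k]
    return w
```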

Online algorithm: example
[Figures: the decision boundary after successive online updates]

Derivation of the gradient
Log-likelihood: $l(D, \mathbf{w}) = \sum_i y_i \log \mu_i + (1 - y_i)\log(1 - \mu_i)$, with $\mu_i = g(z_i)$ and $z_i = \mathbf{w}^T\mathbf{x}_i$. By the chain rule,
$$\frac{\partial l(D, \mathbf{w})}{\partial w_j} = \sum_i \left(\frac{y_i}{\mu_i} - \frac{1 - y_i}{1 - \mu_i}\right)\frac{\partial \mu_i}{\partial w_j} = \sum_i \left(\frac{y_i}{\mu_i} - \frac{1 - y_i}{1 - \mu_i}\right)\mu_i(1 - \mu_i)\, x_{i,j} = \sum_i \big[y_i - g(\mathbf{w}^T\mathbf{x}_i)\big]\, x_{i,j},$$
using the derivative of the logistic function, $\partial g(z)/\partial z = g(z)\,(1 - g(z))$.

Generative approach to classification
Idea:
1. Represent and learn the distribution p(x, y).
2. Use it to define probabilistic discriminant functions, e.g. $g_0(\mathbf{x}) = p(y = 0 \mid \mathbf{x})$, $g_1(\mathbf{x}) = p(y = 1 \mid \mathbf{x})$.
Typical model: $p(\mathbf{x}, y) = p(\mathbf{x} \mid y)\, p(y)$.
- $p(\mathbf{x} \mid y)$: class-conditional distributions (densities). Binary classification: two class-conditional distributions, $p(\mathbf{x} \mid y = 0)$ and $p(\mathbf{x} \mid y = 1)$.
- $p(y)$: priors on classes, the probability of class y. Binary classification: a Bernoulli distribution, so that $p(y = 0) + p(y = 1) = 1$.

Generative approach to classification: example
Class-conditional distributions are multivariate normal distributions:
$$\mathbf{x} \sim N(\boldsymbol{\mu}_0, \Sigma_0) \text{ for } y = 0, \qquad \mathbf{x} \sim N(\boldsymbol{\mu}_1, \Sigma_1) \text{ for } y = 1.$$
Multivariate normal $\mathbf{x} \sim N(\boldsymbol{\mu}, \Sigma)$:
$$p(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right).$$
Priors on classes: the class label follows a Bernoulli distribution, $y \sim \text{Bernoulli}(\theta)$, $y \in \{0, 1\}$, with $p(y \mid \theta) = \theta^{y}(1 - \theta)^{1 - y}$.
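As an illustration of what this generative model assumes, here is a small sketch that samples labeled data from it (a Bernoulli prior over the class, then a class-conditional Gaussian); the parameter values are made up for the example and are not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters of the generative model p(x, y) = p(x | y) p(y)
theta = 0.5                                               # prior p(y = 1)
mu = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]         # class means
Sigma = [np.eye(2), np.array([[1.0, 0.3], [0.3, 1.0]])]   # class covariances

def sample(n):
    """Draw n labeled examples: y ~ Bernoulli(theta), x | y ~ N(mu_y, Sigma_y)."""
    ys = rng.binomial(1, theta, size=n)
    xs = np.array([rng.multivariate_normal(mu[c], Sigma[c]) for c in ys])
    return xs, ys

X, y = sample(200)
```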

Learning the parameters of the model
This is a density estimation problem from statistics. We see examples; we do not know the parameters of the Gaussian class-conditional densities
$$p(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right).$$
ML estimate of the parameters of a multivariate normal $N(\boldsymbol{\mu}, \Sigma)$ for a set of n examples $\mathbf{x}_1, \dots, \mathbf{x}_n$: optimize the log-likelihood
$$l(D, \boldsymbol{\mu}, \Sigma) = \sum_{i=1}^{n} \log p(\mathbf{x}_i \mid \boldsymbol{\mu}, \Sigma),$$
which gives
$$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T.$$
How about the class priors?

Generative model
[Figure: data points generated from the two class-conditional Gaussians]
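A short sketch of these ML estimates, computed separately for each class, with the class prior estimated as the fraction of class-1 examples (my reading of the "how about the class priors?" question); the function name is illustrative.

```python
import numpy as np

def fit_generative_gaussians(X, y):
    """ML estimates: per-class mean and covariance, plus the Bernoulli prior."""
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        mu_hat = Xc.mean(axis=0)                 # mu_hat = (1/n_c) sum_i x_i
        diff = Xc - mu_hat
        Sigma_hat = diff.T @ diff / len(Xc)      # Sigma_hat = (1/n_c) sum_i (x_i - mu)(x_i - mu)^T
        params[c] = (mu_hat, Sigma_hat)
    theta_hat = y.mean()                         # prior p(y = 1): fraction of class-1 examples
    return params, theta_hat
```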

Gaussian class-conditional densities
[Figure: contours of the two fitted class-conditional densities]

Making the class decision
Basically we need to design discriminant functions. Two possible choices:
1. Likelihood of the data: choose the class (Gaussian) that explains the input data better, i.e. if $p(\mathbf{x} \mid \boldsymbol{\mu}_1, \Sigma_1) > p(\mathbf{x} \mid \boldsymbol{\mu}_0, \Sigma_0)$ then y = 1, else y = 0.
2. Posterior of a class: choose the class with the higher posterior probability, i.e. if $p(y = 1 \mid \mathbf{x}) > p(y = 0 \mid \mathbf{x})$ then y = 1, else y = 0, where
$$p(y = 1 \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \boldsymbol{\mu}_1, \Sigma_1)\, p(y = 1)}{p(\mathbf{x} \mid \boldsymbol{\mu}_0, \Sigma_0)\, p(y = 0) + p(\mathbf{x} \mid \boldsymbol{\mu}_1, \Sigma_1)\, p(y = 1)}.$$
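A minimal sketch of the posterior-based decision rule, using SciPy's multivariate normal density and the `params`/`theta` estimated in the sketch above; the function names are my own, not from the lecture.

```python
from scipy.stats import multivariate_normal

def posterior_class1(x, params, theta):
    """p(y = 1 | x) via Bayes' theorem with Gaussian class-conditionals."""
    lik0 = multivariate_normal.pdf(x, mean=params[0][0], cov=params[0][1])
    lik1 = multivariate_normal.pdf(x, mean=params[1][0], cov=params[1][1])
    return lik1 * theta / (lik0 * (1.0 - theta) + lik1 * theta)

def decide(x, params, theta):
    """Choose the class with the higher posterior probability."""
    return int(posterior_class1(x, params, theta) > 0.5)
```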

Gaussians: quadratic decision boundary
[Figure: contours of the class-conditional densities]
[Figure: the resulting quadratic decision boundary]

Gaussians: linear decision boundary
When the covariances are the same, $\mathbf{x} \mid y = 0 \sim N(\boldsymbol{\mu}_0, \Sigma)$ and $\mathbf{x} \mid y = 1 \sim N(\boldsymbol{\mu}_1, \Sigma)$, the decision boundary becomes linear.

Gaussians: linear decision boundary
[Figure: contours of the two class-conditional densities with a shared covariance]
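To see why a shared covariance yields a linear boundary, here is a short derivation of the log-posterior ratio (not spelled out on the slides, but following directly from the model above, with prior $\theta = p(y = 1)$):
$$\log\frac{p(y = 1 \mid \mathbf{x})}{p(y = 0 \mid \mathbf{x})} = -\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_1)^T\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}_1) + \tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_0)^T\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}_0) + \log\frac{\theta}{1 - \theta}$$
$$= (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)^T\Sigma^{-1}\mathbf{x} - \tfrac{1}{2}\boldsymbol{\mu}_1^T\Sigma^{-1}\boldsymbol{\mu}_1 + \tfrac{1}{2}\boldsymbol{\mu}_0^T\Sigma^{-1}\boldsymbol{\mu}_0 + \log\frac{\theta}{1 - \theta}.$$
The quadratic terms $\mathbf{x}^T\Sigma^{-1}\mathbf{x}$ cancel, so the log-odds are linear in $\mathbf{x}$ and the boundary (log-odds = 0) is a hyperplane; with unequal covariances the quadratic terms remain and the boundary is quadratic, as in the figures above.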

Gaussians: linear decision boundary
[Figure: the resulting linear decision boundary between the two classes]