Lecture 7: Linear Classification Methods


Lecture 7: Linear Classification Methods

Final projects? Groups, topics, proposal week 5. Lecture is a poster session (Jacobs Hall lobby, snacks). Final report 5 June.

What is linear classification?

Classification is intrinsically non-linear: it puts non-identical things in the same class, so a difference in the input vector sometimes causes zero change in the answer. Linear classification means that the part that adapts is linear. The adaptive part is followed by a fixed non-linearity, and it may also be preceded by a fixed non-linearity (e.g. non-linear basis functions):

$z = \mathbf{w}^T\mathbf{x} + w_0$ (adaptive linear function), $y = f(z)$ (fixed non-linear function), with the decision made by, e.g., thresholding $y$ at 0.5.
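
A minimal sketch of this pipeline in NumPy, assuming a logistic output non-linearity; the basis function `phi`, the weights, and the test point are purely illustrative, not from the lecture:

```python
import numpy as np

def phi(x):
    # fixed non-linear basis: append the product x1*x2 (an arbitrary choice)
    return np.append(x, x[0] * x[1])

def predict(w, w0, x):
    z = w @ phi(x) + w0               # adaptive linear function
    y = 1.0 / (1.0 + np.exp(-z))      # fixed output non-linearity (logistic)
    return 1 if y > 0.5 else 0        # threshold decision at 0.5

print(predict(np.array([1.0, -2.0, 0.5]), 0.1, np.array([0.3, 0.7])))
```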

Representing the target values for classification

For two classes, we use a single real-valued output with target value 1 for the positive class and 0 (or -1) for the other class. For probabilistic class labels the target value can then be $P(t=1)$ and the model output can also represent $P(y=1)$. For N classes we often use a vector of N target values containing a single 1 for the correct class and zeros elsewhere. For probabilistic labels we can then use a vector of class probabilities as the target vector.

Three approaches to classification

1. Use discriminant functions directly, without probabilities: convert the input vector into one or more real values so that a simple operation (like thresholding) can get the class. Choose the real values to maximize the usable information about the class label that is in the real value.
2. Infer conditional class probabilities: compute $p(\text{class} = C_k \mid \mathbf{x})$ for each class, then make a decision that minimizes some loss function.
3. Compare the probability of the input under separate, class-specific generative models. E.g. fit a multivariate Gaussian to the input vectors of each class and see which Gaussian makes a test data vector most probable. Is this the best bet?

Discriminant functions

The planar decision surface in data-space for the simple linear discriminant function:

$y(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0 \geq 0$

[Figure: for a point $\mathbf{x}$ on the plane, $y(\mathbf{x}) = 0$; the signed distance of any $\mathbf{x}$ from the plane is $y(\mathbf{x}) / \lVert\mathbf{w}\rVert$.]
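
A small NumPy example of the discriminant value and the resulting signed distance; the plane and the point are made up:

```python
import numpy as np

w, w0 = np.array([3.0, 4.0]), -5.0
x = np.array([2.0, 1.0])
y = w @ x + w0                     # discriminant value: 3*2 + 4*1 - 5 = 5
dist = y / np.linalg.norm(w)       # signed distance: 5 / 5 = 1.0
print(y >= 0, dist)                # class decision and distance from plane
```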

Discriminant functions for N > 2 classes

One possibility is to use N two-way discriminant functions; each function discriminates one class from the rest. Another possibility is to use N(N-1)/2 two-way discriminant functions; each function discriminates between two particular classes. Both of these methods have problems: there can be more than one good answer (ambiguous regions), and two-way preferences need not be transitive!

A simple solution

Use N discriminant functions $y_k(\mathbf{x})$ and pick the max. This is guaranteed to give consistent and convex decision regions if the $y_k$ are linear: $y_k(\mathbf{x}_A) > y_j(\mathbf{x}_A)$ and $y_k(\mathbf{x}_B) > y_j(\mathbf{x}_B)$ imply, for positive $\alpha$ and $\beta$ with $\alpha + \beta = 1$,

$y_k(\alpha\mathbf{x}_A + \beta\mathbf{x}_B) > y_j(\alpha\mathbf{x}_A + \beta\mathbf{x}_B),$

so every point on the segment between $\mathbf{x}_A$ and $\mathbf{x}_B$ is also assigned to class $k$. The decision boundary between classes $k$ and $j$ is where $y_k(\mathbf{x}) = y_j(\mathbf{x})$.
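
A minimal NumPy sketch of pick-the-max classification; the weight rows and biases are arbitrary, not fitted to anything:

```python
import numpy as np

W = np.array([[1.0, 0.0],          # one row of weights per class
              [0.0, 1.0],
              [-1.0, -1.0]])
b = np.array([0.0, 0.1, -0.2])     # one bias per class

def classify(x):
    return int(np.argmax(W @ x + b))   # class with the largest y_k(x)

print(classify(np.array([2.0, -0.5])))  # -> 0
```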

Maximum likelihood and least squares (from lecture 3)

Computing the gradient of the log likelihood and setting it to zero yields

$\sum_{n=1}^{N} \left\{ t_n - \mathbf{w}^T \phi(\mathbf{x}_n) \right\} \phi(\mathbf{x}_n)^T = 0.$

Solving for $\mathbf{w}$:

$\mathbf{w}_{ML} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t} = \Phi^{\dagger}\mathbf{t},$

where $\Phi^{\dagger} \equiv (\Phi^T \Phi)^{-1} \Phi^T$ is the Moore-Penrose pseudo-inverse.
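
A minimal NumPy sketch of the pseudo-inverse solution on synthetic regression data; `np.linalg.pinv` computes $\Phi^{\dagger}$:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 3))            # design matrix: 100 cases, 3 basis fns
t = Phi @ np.array([0.5, -1.0, 2.0]) + 0.01 * rng.normal(size=100)
w_ml = np.linalg.pinv(Phi) @ t             # w_ML = (Phi^T Phi)^-1 Phi^T t
print(w_ml)                                # close to [0.5, -1.0, 2.0]
```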

Least squares for classification

Each class $C_k$ is described by its own linear model, so that

$y_k(\mathbf{x}) = \mathbf{w}_k^T\mathbf{x} + w_{k0}, \qquad k = 1,\dots,K. \quad (4.13)$

We can conveniently group these together using vector notation, so that

$\mathbf{y}(\mathbf{x}) = \tilde{W}^T\tilde{\mathbf{x}}. \quad (4.14)$

Consider a training set $\{\mathbf{x}_n, \mathbf{t}_n\}$, $n = 1,\dots,N$, and define the matrices $\tilde{X}$ (with rows $\tilde{\mathbf{x}}_n^T$) and $T$ (with rows $\mathbf{t}_n^T$). The least-squares solution is

$\tilde{W} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T T = \tilde{X}^{\dagger} T, \quad (4.16)$

and the prediction is

$\mathbf{y}(\mathbf{x}) = \tilde{W}^T\tilde{\mathbf{x}} = T^T(\tilde{X}^{\dagger})^T\tilde{\mathbf{x}}. \quad (4.17)$
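
A minimal NumPy sketch of (4.16) and (4.17) with one-hot targets; the three class means and the sample counts are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([rng.normal(m, 0.5, size=(30, 2)) for m in means])
labels = np.repeat(np.arange(3), 30)
T = np.eye(3)[labels]                      # one-hot target matrix (N x K)

Xt = np.hstack([np.ones((len(X), 1)), X])  # prepend 1 for the bias: x-tilde
W = np.linalg.pinv(Xt) @ T                 # W = X-tilde^dagger T   (4.16)
pred = np.argmax(Xt @ W, axis=1)           # pick the largest y_k(x) (4.17)
print((pred == labels).mean())             # training accuracy
```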

Using least squares for classification

It does not work as well as better methods, but it is easy: it reduces classification to least squares regression.

[Figure: decision boundaries found by logistic regression vs. least squares regression on the same data.]

PCA does not work well for discrimination: it finds the directions of largest variance, which need not be the directions that separate the classes.

Picture showing the advantage of Fisher's linear discriminant

When projected onto the line joining the class means, the classes are not well separated. Fisher chooses a direction that makes the projected classes much tighter, even though their projected means are less far apart.

Math of Fisher's linear discriminants

What linear transformation is best for discrimination? The projection onto the vector separating the class means seems sensible:

$\mathbf{w} \propto \mathbf{m}_2 - \mathbf{m}_1$

But we also want small variance within each class:

$s_1^2 = \sum_{n \in C_1} (y_n - m_1)^2, \qquad s_2^2 = \sum_{n \in C_2} (y_n - m_2)^2$

Fisher's objective function is between-class separation over within-class variance:

$J(\mathbf{w}) = \dfrac{(m_2 - m_1)^2}{s_1^2 + s_2^2}$

More math of Fisher's linear discriminants

$J(\mathbf{w}) = \dfrac{(m_2 - m_1)^2}{s_1^2 + s_2^2} = \dfrac{\mathbf{w}^T S_B\, \mathbf{w}}{\mathbf{w}^T S_W\, \mathbf{w}}$

where

$S_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T$

$S_W = \sum_{n \in C_1} (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T + \sum_{n \in C_2} (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T$

Optimal solution: $\mathbf{w} \propto S_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$
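
A minimal NumPy sketch of the optimal direction $\mathbf{w} \propto S_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$; the class means and spreads are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
X1 = rng.normal([0.0, 0.0], [2.0, 0.3], size=(50, 2))   # class 1 samples
X2 = rng.normal([2.0, 1.0], [2.0, 0.3], size=(50, 2))   # class 2 samples
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# within-class scatter matrix S_W
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w = np.linalg.solve(S_W, m2 - m1)                       # Fisher direction
print(w / np.linalg.norm(w))
```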

We have probabilistic classification!

Probabilistic generative models for discrimination (Bishop p. 196)

Use a generative model of the input vectors for each class, and see which model makes a test input vector most probable. The posterior probability of class $C_1$ is

$p(C_1 \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_1)\,p(C_1)}{p(\mathbf{x} \mid C_1)\,p(C_1) + p(\mathbf{x} \mid C_2)\,p(C_2)} = \dfrac{1}{1 + e^{-a}},$

where $a = \ln \dfrac{p(\mathbf{x} \mid C_1)\,p(C_1)}{p(\mathbf{x} \mid C_2)\,p(C_2)}$ is called the logit and is given by the log odds.

An example for continuous inputs

Assume the input vectors for each class are Gaussian, and all classes have the same covariance matrix:

$p(\mathbf{x} \mid C_k) = \dfrac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\!\left\{ -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) \right\}$

(the leading factor is a normalizing constant; $\Sigma^{-1}$ is the inverse covariance matrix). For two classes, $C_1$ and $C_2$, the posterior is a logistic, $p(C_1 \mid \mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x} + w_0)$, with

$\mathbf{w} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$

$w_0 = -\tfrac{1}{2}\boldsymbol{\mu}_1^T \Sigma^{-1} \boldsymbol{\mu}_1 + \tfrac{1}{2}\boldsymbol{\mu}_2^T \Sigma^{-1} \boldsymbol{\mu}_2 + \ln\dfrac{p(C_1)}{p(C_2)}$
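
A minimal NumPy sketch of this closed form; the means, the shared covariance, and the priors are made up:

```python
import numpy as np

mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.5])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
p1, p2 = 0.6, 0.4                        # class priors

Sinv = np.linalg.inv(Sigma)
w = Sinv @ (mu1 - mu2)
w0 = -0.5 * mu1 @ Sinv @ mu1 + 0.5 * mu2 @ Sinv @ mu2 + np.log(p1 / p2)

x = np.array([0.2, 0.1])
posterior = 1.0 / (1.0 + np.exp(-(w @ x + w0)))   # p(C1 | x)
print(posterior)
```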


The role of the inverse covariance matrix

If the Gaussian is spherical we do not need to worry about the covariance matrix. So, start by transforming the data space to make the Gaussian spherical; this is called whitening the data. It pre-multiplies by the matrix square root of the inverse covariance matrix. In the transformed space, the weight vector is just the difference between the transformed means:

$\mathbf{x}_{\text{aff}} = \Sigma^{-1/2}\mathbf{x}$ and $\mathbf{w}_{\text{aff}} = \Sigma^{-1/2}\boldsymbol{\mu}_1 - \Sigma^{-1/2}\boldsymbol{\mu}_2$ give $\mathbf{w}_{\text{aff}}^T \mathbf{x}_{\text{aff}}$ the same value as $\mathbf{w}^T\mathbf{x}$ with $\mathbf{w} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$.
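
A small NumPy check of this equivalence, computing $\Sigma^{-1/2}$ by eigendecomposition; the numbers are made up:

```python
import numpy as np

mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.5])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
x = np.array([0.2, 0.1])

vals, vecs = np.linalg.eigh(Sigma)                    # Sigma is symmetric PD
S_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T    # Sigma^(-1/2)
w = np.linalg.inv(Sigma) @ (mu1 - mu2)
# w^T x in the original space equals w_aff^T x_aff in whitened space
print(w @ x, (S_inv_half @ (mu1 - mu2)) @ (S_inv_half @ x))
```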

The posterior when the covariance matrices are different for different classes (figure from Bishop)

The decision surface is planar when the covariance matrices are the same, and quadratic when they are not.

Bernoulli distribution

Random variable $x \in \{0, 1\}$; coin flipping: heads $= 1$, tails $= 0$.

Bernoulli distribution: $\mathrm{Bern}(x \mid \mu) = \mu^x (1 - \mu)^{1 - x}$

ML for Bernoulli: given a data set $D = \{x_1, \dots, x_N\}$, the maximum likelihood estimate is $\mu_{ML} = \frac{1}{N}\sum_{n=1}^N x_n$.
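
A minimal NumPy sketch: the ML estimate is just the fraction of 1s in simulated flips:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.binomial(1, 0.7, size=1000)   # 1000 coin flips with true mu = 0.7
mu_ml = x.mean()                      # mu_ML = (1/N) * sum(x_n)
print(mu_ml)                          # close to 0.7
```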

The logistic function

The output is a smooth function of the inputs and the weights:

$z = w_0 + \mathbf{x}^T\mathbf{w}, \qquad y = \dfrac{1}{1 + e^{-z}}$

$\dfrac{\partial z}{\partial w_i} = x_i, \qquad \dfrac{\partial z}{\partial x_i} = w_i, \qquad \dfrac{dy}{dz} = y(1 - y)$

It is odd to express the derivative in terms of $y$, but it makes the algebra convenient.
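
A quick NumPy check that $dy/dz = y(1 - y)$ agrees with a finite difference:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z, eps = 0.3, 1e-6
y = sigmoid(z)
print(y * (1 - y),                                    # analytic derivative
      (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps))  # numeric derivative
```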

Logistic regression (Bishop p. 205)

Observations: a training set $\{(\mathbf{x}_n, t_n)\}$ with $t_n \in \{0, 1\}$ and model outputs $y_n = \sigma(\mathbf{w}^T\mathbf{x}_n)$.

Likelihood: $p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^N y_n^{t_n} (1 - y_n)^{1 - t_n}$

Log-likelihood: minimize the negative log-likelihood,

$E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^N \{ t_n \ln y_n + (1 - t_n)\ln(1 - y_n) \}$

Derivative: $\nabla E(\mathbf{w}) = \sum_{n=1}^N (y_n - t_n)\,\mathbf{x}_n$

Logistic regression (page 205)

When there are only two classes we can model the conditional probability of the positive class as

$p(C_1 \mid \mathbf{x}) = y = \sigma(\mathbf{w}^T\mathbf{x} + w_0), \qquad \sigma(z) = \dfrac{1}{1 + e^{-z}}$

If we use the right error function, something nice happens: the gradient of the logistic and the gradient of the error function cancel each other:

$E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}), \qquad \nabla E(\mathbf{w}) = \sum_{n=1}^N (y_n - t_n)\,\mathbf{x}_n$

The natural error function for the logistic

Fitting a logistic model using maximum likelihood requires minimizing the negative log probability of the correct answer, summed over the training set:

$E = -\sum_{n=1}^N \ln p(t_n \mid y_n) = -\sum_{n=1}^N \{ t_n \ln y_n + (1 - t_n)\ln(1 - y_n) \}$

The error derivative on training case $n$ is

$\dfrac{\partial E}{\partial y_n} = -\dfrac{t_n}{y_n} + \dfrac{1 - t_n}{1 - y_n},$

which is $-1/y_n$ if $t_n = 1$ and $1/(1 - y_n)$ if $t_n = 0$.

Using the chain rule to get the error derivatives

With $z = \mathbf{w}^T\mathbf{x} + w_0$ and $y = \sigma(z)$:

$\dfrac{\partial E}{\partial w_i} = \dfrac{\partial E}{\partial y}\,\dfrac{dy}{dz}\,\dfrac{\partial z}{\partial w_i} = (y - t)\,x_i$

The $y(1 - y)$ from $dy/dz$ exactly cancels the denominator of $\partial E / \partial y$.
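
A minimal NumPy sketch of batch gradient descent using the $(y_n - t_n)\,\mathbf{x}_n$ gradient; the data, learning rate, and iteration count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
t = np.repeat([0.0, 1.0], 50)
Xt = np.hstack([np.ones((100, 1)), X])      # absorb the bias w0 into w

w = np.zeros(3)
for _ in range(500):
    y = 1.0 / (1.0 + np.exp(-Xt @ w))       # y_n = sigma(w^T x_n)
    w -= 0.1 / len(t) * (Xt.T @ (y - t))    # grad E = sum_n (y_n - t_n) x_n

y = 1.0 / (1.0 + np.exp(-Xt @ w))
print(((y > 0.5) == (t == 1.0)).mean())     # training accuracy
```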

Softmax function

For the case of K > 2 classes, we have

$p(C_k \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_k)\,p(C_k)}{\sum_j p(\mathbf{x} \mid C_j)\,p(C_j)} = \dfrac{\exp(a_k)}{\sum_j \exp(a_j)} \quad (4.62)$

where

$a_k = \ln p(\mathbf{x} \mid C_k)\,p(C_k). \quad (4.63)$

This is also known as the softmax function, as it represents a smoothed version of the max.

Cross-entropy, or softmax, error function for multi-class classification

The output units use a non-local non-linearity:

$y_i = \dfrac{e^{z_i}}{\sum_j e^{z_j}}, \qquad \dfrac{\partial y_i}{\partial z_i} = y_i(1 - y_i)$

The natural cost function is the negative log probability of the right answer:

$E = -\sum_j t_j \ln y_j, \qquad \dfrac{\partial E}{\partial z_i} = \sum_j \dfrac{\partial E}{\partial y_j}\,\dfrac{\partial y_j}{\partial z_i} = y_i - t_i$

The steepness of $E$ exactly balances the flatness of the softmax.

[Figure: output units $z_1, z_2, z_3 \to y_1, y_2, y_3$, compared against the target values.]
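
A small NumPy check that the softmax/cross-entropy gradient is $y - t$, against a finite difference; the logits and the one-hot target are made up:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())           # subtract max for numerical stability
    return e / e.sum()

z = np.array([0.5, -1.0, 2.0])
t = np.array([0.0, 0.0, 1.0])         # one-hot target

def E(z):
    return -np.sum(t * np.log(softmax(z)))

grad = softmax(z) - t                 # analytic gradient: y - t
eps = 1e-6
num = np.array([(E(z + eps * np.eye(3)[i]) - E(z - eps * np.eye(3)[i]))
                / (2 * eps) for i in range(3)])
print(grad, num)                      # the two should agree
```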

A special case of softmax for two classes

$y_1 = \dfrac{e^{z_1}}{e^{z_1} + e^{z_2}} = \dfrac{1}{1 + e^{-(z_1 - z_2)}}$

So the logistic is just a special case that avoids using redundant parameters: adding the same constant to both $z_1$ and $z_2$ has no effect. The over-parameterization of the softmax is because the probabilities must add to 1.
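
A quick numeric check of this identity, with arbitrary logits:

```python
import numpy as np

z1, z2 = 1.3, -0.4
y1_softmax = np.exp(z1) / (np.exp(z1) + np.exp(z2))
y1_logistic = 1.0 / (1.0 + np.exp(-(z1 - z2)))
print(y1_softmax, y1_logistic)        # identical up to rounding
```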