Lecture 7: Linear Classification Methods

Leslie Owen
5 years ago

1 Homework

2 Homework

3 Lecture 7: Linear Classification Methods

Final projects? Groups, topics. Proposal: week 5. Lecture is a poster session, Jacobs Hall Lobby, snacks. Final report: 5 June.

4 What is linear classification?

Classification is intrinsically nonlinear: it puts non-identical things in the same class, so a difference in the input vector sometimes causes zero change in the answer. Linear classification means that the part that adapts is linear. The adaptive part is followed by a fixed nonlinearity. It may be preceded by a fixed nonlinearity (e.g. nonlinear basis functions).

$$y(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x} + w_0 \quad \text{(adaptive linear function)}$$

$$\text{Decision} = f\big(y(\mathbf{x})\big) \quad \text{(fixed nonlinear function)}$$

[Figure: the fixed nonlinearity plotted against z, with the level 0.5 marked.]
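
The split between an adaptive linear part and a fixed nonlinearity can be sketched in a few lines of Python. This is a minimal illustration, assuming a hard threshold at zero as the fixed nonlinearity and hand-picked weights; the function names are hypothetical.

```python
def linear_score(w, w0, x):
    """Adaptive part: y(x) = w^T x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def decide(y):
    """Fixed nonlinearity (assumed here): a hard threshold at zero."""
    return 1 if y >= 0 else 0

def classify(w, w0, x):
    return decide(linear_score(w, w0, x))

# Hypothetical weights: the decision boundary is the line x1 + x2 = 1.
w, w0 = [1.0, 1.0], -1.0
print(classify(w, w0, [2.0, 0.5]))   # 1
print(classify(w, w0, [0.2, 0.3]))   # 0
```

Both `[2.0, 0.5]` and, say, `[3.0, 3.0]` land in class 1 even though they differ, which is the sense in which classification is intrinsically nonlinear.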

5 Representing the target values for classification

For two classes, we use a single-valued output that has target value 1 for the positive class and 0 (or -1) for the other class. For probabilistic class labels the target value can then be $P(t = 1)$ and the model output can also represent $P(y = 1)$.

For N classes we often use a vector of N target values containing a single 1 for the correct class and zeros elsewhere. For probabilistic labels we can then use a vector of class probabilities as the target vector.
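
A minimal sketch of the two target encodings, in Python with hypothetical helper names: a single-valued target for two classes and a 1-of-N vector for N classes.

```python
def binary_target(is_positive, negative_value=0.0):
    """Single-valued target for two classes: 1 for positive, 0 (or -1) otherwise."""
    return 1.0 if is_positive else negative_value

def one_hot(label, n_classes):
    """1-of-N target vector: a single 1 for the correct class, zeros elsewhere."""
    t = [0.0] * n_classes
    t[label] = 1.0
    return t

print(one_hot(2, 4))                 # [0.0, 0.0, 1.0, 0.0]
print(binary_target(False, -1.0))    # -1.0
```

With probabilistic labels, `one_hot` would simply be replaced by a vector of class probabilities that sums to one.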

6 Three approaches to classification

Use discriminant functions directly, without probabilities: convert the input vector into one or more real values so that a simple operation (like thresholding) can be used to get the class. Choose the real values to maximize the usable information about the class label that is in the real value.

Infer conditional class probabilities $p(\text{class} = C_k \mid \mathbf{x})$: compute the conditional probability of each class, then make a decision that minimizes some loss function.

Compare the probability of the input under separate, class-specific, generative models. E.g. fit a multivariate Gaussian to the input vectors of each class and see which Gaussian makes a test data vector most probable. (Is this the best bet?)

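7 Discriminant functions

The planar decision surface in data space for the simple linear discriminant function:

$$y(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x} + w_0 \ge 0$$

[Figure: the decision plane $y = 0$ separates the region $y > 0$ from the region $y < 0$; a point $\mathbf{x}$ lies at signed distance $y(\mathbf{x})/\lVert\mathbf{w}\rVert$ from the plane.]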
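
The signed distance from the plane follows directly from $y(\mathbf{x})/\lVert\mathbf{w}\rVert$. A small sketch, with a made-up plane chosen so that $\lVert\mathbf{w}\rVert = 5$; `signed_distance` is a hypothetical name.

```python
import math

def signed_distance(w, w0, x):
    """Signed distance of x from the plane w^T x + w0 = 0, i.e. y(x) / ||w||.
    Positive on the side where y(x) > 0."""
    y = sum(wi * xi for wi, xi in zip(w, x)) + w0
    return y / math.sqrt(sum(wi * wi for wi in w))

w, w0 = [3.0, 4.0], -5.0                    # hypothetical plane 3*x1 + 4*x2 = 5
print(signed_distance(w, w0, [3.0, 4.0]))   # (9 + 16 - 5) / 5 = 4.0
print(signed_distance(w, w0, [0.0, 0.0]))   # -5 / 5 = -1.0: the origin is on the y < 0 side
```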

8 Discriminant functions for $N > 2$ classes

One possibility is to use N two-way discriminant functions. Each function discriminates one class from the rest.

Another possibility is to use $N(N-1)/2$ two-way discriminant functions. Each function discriminates between two particular classes.

Both these methods have problems: there can be more than one good answer, and two-way preferences need not be transitive!

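9 Use N discriminant functions, $y_i, y_j, y_k, \ldots$, and pick the max

$$y_k(\mathbf{x}) = \mathbf{w}_k^{T}\mathbf{x} + w_{k0}$$

A simple solution: assign $\mathbf{x}$ to class $C_k$ if $y_k(\mathbf{x}) > y_j(\mathbf{x})$ for all $j \ne k$. This is guaranteed to give consistent and convex decision regions if $y$ is linear:

$$y_k(\mathbf{x}_A) > y_j(\mathbf{x}_A) \ \text{and}\ y_k(\mathbf{x}_B) > y_j(\mathbf{x}_B) \implies y_k\big(\alpha\mathbf{x}_A + (1-\alpha)\mathbf{x}_B\big) > y_j\big(\alpha\mathbf{x}_A + (1-\alpha)\mathbf{x}_B\big) \quad \text{for } 0 \le \alpha \le 1.$$

Decision boundary? Between classes $C_k$ and $C_j$ it is $y_k(\mathbf{x}) = y_j(\mathbf{x})$, i.e. the hyperplane $(\mathbf{w}_k - \mathbf{w}_j)^{T}\mathbf{x} + (w_{k0} - w_{j0}) = 0$.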
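
Picking the largest of N linear discriminants can be sketched as follows. The weights are hypothetical, `pick_max` is an illustrative name, and ties go to the lowest class index.

```python
def pick_max(W, w0, x):
    """Return argmax_k of y_k(x) = W[k]^T x + w0[k]."""
    scores = [sum(wk_i * x_i for wk_i, x_i in zip(wk, x)) + bk
              for wk, bk in zip(W, w0)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical 3-class problem in 2-D.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
w0 = [0.0, 0.0, 0.0]
print(pick_max(W, w0, [2.0, 1.0]))    # 0
print(pick_max(W, w0, [0.5, 3.0]))    # 1
print(pick_max(W, w0, [-1.0, -2.0]))  # 2
```

The convexity argument can be spot-checked: `[2.0, 1.0]` and `[3.0, -1.0]` are both assigned to class 0, and so is their midpoint `[2.5, 0.0]`.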

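10 Maximum Likelihood and Least Squares (from lecture 3)

Computing the gradient and setting it to zero yields

$$\nabla_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta \sum_{n=1}^{N} \{t_n - \mathbf{w}^{T}\boldsymbol{\phi}(\mathbf{x}_n)\}\, \boldsymbol{\phi}(\mathbf{x}_n)^{T} = 0.$$

Solving for $\mathbf{w}$,

$$\mathbf{w}_{\text{ML}} = (\boldsymbol{\Phi}^{T}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{T}\mathbf{t} = \boldsymbol{\Phi}^{\dagger}\mathbf{t},$$

where $\boldsymbol{\Phi}^{\dagger} \equiv (\boldsymbol{\Phi}^{T}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{T}$ is the Moore-Penrose pseudo-inverse.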

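11 LSQ for classification

Each class $C_k$ is described by its own linear model so that

$$y_k(\mathbf{x}) = \mathbf{w}_k^{T}\mathbf{x} + w_{k0} \qquad (4.13)$$

where $k = 1, \ldots, K$. We can conveniently group these together using vector notation so that

$$\mathbf{y}(\mathbf{x}) = \widetilde{\mathbf{W}}^{T}\tilde{\mathbf{x}} \qquad (4.14)$$

where the $k$th column of $\widetilde{\mathbf{W}}$ is $\tilde{\mathbf{w}}_k = (w_{k0}, \mathbf{w}_k^{T})^{T}$ and $\tilde{\mathbf{x}} = (1, \mathbf{x}^{T})^{T}$. Consider a training set $\{\mathbf{x}_n, \mathbf{t}_n\}$, $n = 1, \ldots, N$. Define $\widetilde{\mathbf{X}}$ as the matrix whose $n$th row is $\tilde{\mathbf{x}}_n^{T}$ and $\mathbf{T}$ as the matrix whose $n$th row is $\mathbf{t}_n^{T}$.

LSQ solution:

$$\widetilde{\mathbf{W}} = (\widetilde{\mathbf{X}}^{T}\widetilde{\mathbf{X}})^{-1}\widetilde{\mathbf{X}}^{T}\mathbf{T} = \widetilde{\mathbf{X}}^{\dagger}\mathbf{T} \qquad (4.16)$$

And prediction:

$$\mathbf{y}(\mathbf{x}) = \widetilde{\mathbf{W}}^{T}\tilde{\mathbf{x}} = \mathbf{T}^{T}(\widetilde{\mathbf{X}}^{\dagger})^{T}\tilde{\mathbf{x}} \qquad (4.17)$$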
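
A sketch of the least-squares classifier, using NumPy's `np.linalg.pinv` for $\widetilde{\mathbf{X}}^{\dagger}$. The toy data and function names are assumptions made for illustration; labels are integers $0, \ldots, K-1$.

```python
import numpy as np

def fit_lsq_classifier(X, labels, n_classes):
    """Least-squares fit W~ = pinv(X~) T with 1-of-K targets (eq. 4.16)."""
    N = X.shape[0]
    X_tilde = np.hstack([np.ones((N, 1)), X])   # prepend x0 = 1 for the biases w_k0
    T = np.eye(n_classes)[labels]               # N x K matrix of 1-of-K rows
    return np.linalg.pinv(X_tilde) @ T          # (D+1) x K

def predict(W_tilde, X):
    """y(x) = W~^T x~ for each row, then pick the class with the largest output."""
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(X_tilde @ W_tilde, axis=1)

# Hypothetical toy data: three well-separated clusters in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
labels = np.repeat([0, 1, 2], 20)
X = centers[labels] + 0.5 * rng.standard_normal((60, 2))

W_tilde = fit_lsq_classifier(X, labels, 3)
print((predict(W_tilde, X) == labels).mean())   # training accuracy
```

With 1-of-K targets and a bias column, each row of outputs sums to one, but the outputs are not constrained to lie in [0, 1], so they should not be read as probabilities.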

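12 Using least squares for classification

It does not work as well as better methods (for example, it is sensitive to outliers, and its outputs are not probabilities), but it is easy: it reduces classification to least squares regression.

[Figure: decision boundaries found by logistic regression and by least squares regression.]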

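13 PCA doesn't work well for discrimination: the direction of greatest variance is chosen without using the class labels, so projecting onto it can mix the classes.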

14 Picture showing the advantage of Fisher's linear discriminant

When projected onto the line joining the class means, the classes are not well separated. Fisher chooses a direction that makes the projected classes much tighter, even though their projected means are less far apart.

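15 Math of Fisher's linear discriminants

What linear transformation is best for discrimination?

$$y = \mathbf{w}^{T}\mathbf{x}$$

The projection onto the vector separating the class means seems sensible:

$$\mathbf{w} \propto \mathbf{m}_2 - \mathbf{m}_1$$

But we also want small variance within each class:

$$s_1^2 = \sum_{n \in C_1} (y_n - m_1)^2, \qquad s_2^2 = \sum_{n \in C_2} (y_n - m_2)^2$$

where $m_k = \mathbf{w}^{T}\mathbf{m}_k$ is the projected mean of class $C_k$. Fisher's objective function is:

$$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2} = \frac{\text{between}}{\text{within}}$$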

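16 More math of Fisher's linear discriminants

$$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2} = \frac{\mathbf{w}^{T}\mathbf{S}_B\mathbf{w}}{\mathbf{w}^{T}\mathbf{S}_W\mathbf{w}}$$

$$\mathbf{S}_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^{T}$$

$$\mathbf{S}_W = \sum_{n \in C_1} (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^{T} + \sum_{n \in C_2} (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^{T}$$

Optimal solution:

$$\mathbf{w} \propto \mathbf{S}_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$$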
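
Fisher's direction can be computed directly from the two class means and the within-class scatter. The sketch below (NumPy, hypothetical data and names) compares $J(\mathbf{w})$ for Fisher's direction with $J(\mathbf{w})$ for the direction joining the means, on elongated, correlated classes.

```python
import numpy as np

def fisher_direction(X1, X2):
    """w proportional to S_W^{-1} (m2 - m1), normalised to unit length."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of outer products of centred vectors in each class.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """J(w) = (m2 - m1)^2 / (s1^2 + s2^2), computed on the projections y = w^T x."""
    y1, y2 = X1 @ w, X2 @ w
    between = (y2.mean() - y1.mean()) ** 2
    within = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
    return between / within

# Hypothetical classes sharing a strongly correlated covariance.
rng = np.random.default_rng(1)
C = np.array([[3.0, 2.5], [2.5, 3.0]])
X1 = rng.multivariate_normal([0.0, 0.0], C, size=200)
X2 = rng.multivariate_normal([2.0, 0.0], C, size=200)

w_fisher = fisher_direction(X1, X2)
d = X2.mean(axis=0) - X1.mean(axis=0)
w_means = d / np.linalg.norm(d)        # projection onto the line joining the means
print(fisher_criterion(w_fisher, X1, X2), fisher_criterion(w_means, X1, X2))
```

Because Fisher's $\mathbf{w}$ maximizes $J$ on the sample it was computed from, the first printed value is never smaller than the second.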

17 We have probabilistic classification!

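18 Probabilistic Generative Models for Discrimination (Bishop, p. 196)

Use a generative model of the input vectors for each class, and see which model makes a test input vector most probable. The posterior probability of class $C_1$ is:

$$p(C_1 \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_1)\,p(C_1)}{p(\mathbf{x} \mid C_1)\,p(C_1) + p(\mathbf{x} \mid C_2)\,p(C_2)} = \frac{1}{1 + e^{-z}}$$

where

$$z = \ln \frac{p(\mathbf{x} \mid C_1)\,p(C_1)}{p(\mathbf{x} \mid C_2)\,p(C_2)} = \ln \frac{p(C_1 \mid \mathbf{x})}{1 - p(C_1 \mid \mathbf{x})}$$

$z$ is called the logit and is given by the log odds.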
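
Writing the posterior as a logistic of the log odds is only a rearrangement of Bayes' rule. A numerical check in Python, with made-up values standing in for $p(\mathbf{x} \mid C_k)$ and $p(C_k)$ at one test input (function names are illustrative):

```python
import math

def posterior_c1(lik1, prior1, lik2, prior2):
    """p(C1|x) via Bayes' rule directly."""
    return lik1 * prior1 / (lik1 * prior1 + lik2 * prior2)

def posterior_c1_from_logit(lik1, prior1, lik2, prior2):
    """The same quantity as sigma(z), with z = ln[p(x|C1)p(C1) / (p(x|C2)p(C2))]."""
    z = math.log(lik1 * prior1) - math.log(lik2 * prior2)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values: p(x|C1), p(C1), p(x|C2), p(C2) at one test input x.
args = (0.30, 0.4, 0.05, 0.6)
print(posterior_c1(*args), posterior_c1_from_logit(*args))   # both approximately 0.8
```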

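19 An example for continuous inputs

Assume the input vectors for each class are Gaussian and that all classes have the same covariance matrix:

$$p(\mathbf{x} \mid C_k) = a \exp\left\{-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_k)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu}_k)\right\}$$

where $a$ is the normalizing constant and $\boldsymbol{\Sigma}^{-1}$ is the inverse covariance matrix. For two classes, $C_1$ and $C_2$, the posterior is a logistic:

$$p(C_1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{T}\mathbf{x} + w_0)$$

$$\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$$

$$w_0 = -\tfrac{1}{2}\boldsymbol{\mu}_1^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1 + \tfrac{1}{2}\boldsymbol{\mu}_2^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_2 + \ln\frac{p(C_1)}{p(C_2)}$$

The quadratic terms in $\mathbf{x}$ cancel because the covariance matrix is shared, which is why the argument of the logistic is linear in $\mathbf{x}$.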
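
A numerical check, under hypothetical Gaussian parameters, that the logistic with the $\mathbf{w}$ and $w_0$ above reproduces the posterior computed directly from Bayes' rule with shared-covariance Gaussians (NumPy; names are illustrative):

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

def logistic_params(mu1, mu2, Sigma, p1, p2):
    """w = Sigma^{-1}(mu1 - mu2),
    w0 = -1/2 mu1^T Sigma^{-1} mu1 + 1/2 mu2^T Sigma^{-1} mu2 + ln(p1/p2)."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu2)
    w0 = -0.5 * mu1 @ Sinv @ mu1 + 0.5 * mu2 @ Sinv @ mu2 + np.log(p1 / p2)
    return w, w0

# Hypothetical parameters and test input.
mu1, mu2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
p1, p2 = 0.3, 0.7
x = np.array([0.4, -0.2])

w, w0 = logistic_params(mu1, mu2, Sigma, p1, p2)
via_logistic = 1.0 / (1.0 + np.exp(-(w @ x + w0)))
a1 = gaussian_density(x, mu1, Sigma) * p1
a2 = gaussian_density(x, mu2, Sigma) * p2
via_bayes = a1 / (a1 + a2)
print(via_logistic, via_bayes)   # the two agree up to rounding
```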

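21 The role of the inverse covariance matrix

If the Gaussian is spherical, there is no need to worry about the covariance matrix. So, start by transforming the data space to make the Gaussian spherical. This is called whitening the data. It pre-multiplies by the matrix square root of the inverse covariance matrix. In the transformed space, the weight vector is the difference between the transformed means.

$$\mathbf{x}_{\text{aff}} = \boldsymbol{\Sigma}^{-1/2}\mathbf{x}, \qquad \boldsymbol{\mu}_{\text{aff}} = \boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}$$

$$\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \text{ gives for } \mathbf{w}^{T}\mathbf{x} \text{ the same value as } \mathbf{w}_{\text{aff}} = \boldsymbol{\mu}_{1,\text{aff}} - \boldsymbol{\mu}_{2,\text{aff}} \text{ gives for } \mathbf{w}_{\text{aff}}^{T}\mathbf{x}_{\text{aff}},$$

since $\boldsymbol{\Sigma}^{-1/2}$ is symmetric and $\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\Sigma}^{-1/2} = \boldsymbol{\Sigma}^{-1}$.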
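
The equivalence between whitening and using $\boldsymbol{\Sigma}^{-1}$ can be checked numerically. This NumPy sketch builds $\boldsymbol{\Sigma}^{-1/2}$ from an eigendecomposition (one of several ways to get a symmetric square root); the parameters and the helper name `inv_sqrt` are made up.

```python
import numpy as np

def inv_sqrt(Sigma):
    """Symmetric matrix square root of Sigma^{-1}, via the eigendecomposition."""
    vals, vecs = np.linalg.eigh(Sigma)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])   # hypothetical shared covariance
mu1, mu2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5])
x = np.array([0.4, -0.2])

A = inv_sqrt(Sigma)                          # whitening matrix Sigma^{-1/2}
w = np.linalg.solve(Sigma, mu1 - mu2)        # w = Sigma^{-1}(mu1 - mu2), original space
w_aff = A @ mu1 - A @ mu2                    # difference of whitened means
x_aff = A @ x                                # whitened input

print(w @ x, w_aff @ x_aff)                  # equal up to rounding
print(np.round(A @ Sigma @ A.T, 10))         # whitened covariance is the identity
```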

22 The posterior when the covariance matrices are different for different classes (figure from Bishop)

The decision surface is planar when the covariance matrices are the same and quadratic when they are not.

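23 Bernoulli distribution

Random variable $x \in \{0, 1\}$, e.g. coin flipping: heads = 1, tails = 0.

Bernoulli distribution:

$$\text{Bern}(x \mid \mu) = \mu^{x}(1 - \mu)^{1 - x}$$

ML for Bernoulli. Given $\mathcal{D} = \{x_1, \ldots, x_N\}$:

$$p(\mathcal{D} \mid \mu) = \prod_{n=1}^{N} \mu^{x_n}(1 - \mu)^{1 - x_n}, \qquad \mu_{\text{ML}} = \frac{1}{N}\sum_{n=1}^{N} x_n$$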
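
For the Bernoulli model the ML estimate is just the fraction of ones. A small Python sketch with hypothetical coin-flip data, cross-checked against a grid search over $\mu$ (the helper name is illustrative):

```python
import math

def bernoulli_log_likelihood(mu, data):
    """ln p(D | mu) = sum_n [x_n ln mu + (1 - x_n) ln(1 - mu)]."""
    return sum(x * math.log(mu) + (1 - x) * math.log(1 - mu) for x in data)

flips = [1, 0, 1, 1, 0, 1, 1, 0]        # hypothetical data: 1 = heads, 0 = tails
mu_ml = sum(flips) / len(flips)         # closed-form ML estimate: fraction of heads
print(mu_ml)                            # 0.625

# A coarse grid search over (0, 1) finds the same maximiser.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda m: bernoulli_log_likelihood(m, flips))
print(best)                             # 0.625
```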

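24 The logistic function

The output is a smooth function of the inputs and the weights:

$$z = b + \sum_i x_i w_i, \qquad y = \frac{1}{1 + e^{-z}}$$

$$\frac{\partial z}{\partial w_i} = x_i, \qquad \frac{\partial z}{\partial x_i} = w_i, \qquad \frac{dy}{dz} = y(1 - y)$$

[Figure: the logistic $y = \sigma(z)$ plotted against $z$, with $y = 0.5$ marked.]

It's odd to express the derivative in terms of $y$.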
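
The derivatives above can be checked numerically. This plain-Python sketch (illustrative names) compares $dy/dz = y(1-y)$ with a central finite difference and applies the chain rule for $\partial y / \partial w_i$:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def dy_dz(z):
    """Derivative of the logistic written in terms of its own output: y (1 - y)."""
    y = logistic(z)
    return y * (1.0 - y)

def dy_dw(x, w, b, i):
    """Chain rule: dy/dw_i = (dz/dw_i)(dy/dz) = x_i * y (1 - y), with z = b + sum_i x_i w_i."""
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    return x[i] * dy_dz(z)

# Check dy/dz against a central finite difference.
z0, h = 0.7, 1e-6
numeric = (logistic(z0 + h) - logistic(z0 - h)) / (2 * h)
print(abs(numeric - dy_dz(z0)) < 1e-8)   # True
print(logistic(0.0))                      # 0.5
```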

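25 Logistic regression (Bishop, p. 205)

Observations: $\{\boldsymbol{\phi}_n, t_n\}$, $t_n \in \{0, 1\}$, $n = 1, \ldots, N$, with $y_n = p(C_1 \mid \boldsymbol{\phi}_n) = \sigma(\mathbf{w}^{T}\boldsymbol{\phi}_n)$.

Likelihood:

$$p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n}(1 - y_n)^{1 - t_n}$$

Log-likelihood: minimize the negative log-likelihood (the cross-entropy error)

$$E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^{N}\{t_n \ln y_n + (1 - t_n)\ln(1 - y_n)\}$$

Derivative:

$$\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\,\boldsymbol{\phi}_n$$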

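26 Logistic regression (Bishop, p. 205)

When there are only two classes we can model the conditional probability of the positive class as

$$p(C_1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{T}\mathbf{x} + w_0), \qquad \text{where } \sigma(z) = \frac{1}{1 + e^{-z}}$$

If we use the right error function, something nice happens: the gradient of the logistic and the gradient of the error function cancel each other, because $\partial E / \partial y_n = (y_n - t_n)/\{y_n(1 - y_n)\}$ and $dy_n/dz_n = y_n(1 - y_n)$:

$$E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}), \qquad \nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\,\mathbf{x}_n$$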
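
A minimal sketch of logistic regression trained by batch gradient descent on $E(\mathbf{w})$, assuming NumPy, a bias feature $\phi_0 = 1$, and made-up 1-D data. Gradient descent and its step size are swapped-in choices for simplicity; Bishop also describes iterative reweighted least squares (IRLS) for this model. All names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, Phi, t):
    """E(w) = -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)], with y_n = sigma(w^T phi_n)."""
    y = sigmoid(Phi @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def gradient(w, Phi, t):
    """grad E(w) = sum_n (y_n - t_n) phi_n: the sigmoid's derivative has cancelled."""
    return Phi.T @ (sigmoid(Phi @ w) - t)

def fit(Phi, t, lr=0.1, n_steps=2000):
    """Plain batch gradient descent on the mean error, starting from w = 0."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        w -= lr * gradient(w, Phi, t) / len(t)
    return w

# Hypothetical overlapping 1-D data, with a bias feature phi_0 = 1.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-1.0, 1.0, 100), rng.normal(1.0, 1.0, 100)])
t = np.concatenate([np.zeros(100), np.ones(100)])
Phi = np.column_stack([np.ones_like(x), x])

w = fit(Phi, t)
print(w, cross_entropy(w, Phi, t))
```

`gradient` never evaluates the sigmoid's derivative: the cancellation described above is what makes `Phi.T @ (y - t)` the whole gradient. The classes overlap, so the weights stay finite; on linearly separable data, unregularized gradient descent would keep growing $\lVert\mathbf{w}\rVert$.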
