Machine Learning: Fisher's Linear Discriminant. Lecture 05

1 Machine Learning: Fisher's Linear Discriminant
Lecture 05
Razvan C. Bunescu
School of Electrical Engineering and Computer Science
bunescu@ohio.edu

2 Supervised Learning
Task = learn an (unknown) function $t : X \to T$ that maps input instances $\mathbf{x} \in X$ to output targets $t(\mathbf{x}) \in T$:
- Classification: the output $t(\mathbf{x}) \in T$ is one of a finite set of discrete categories.
- Regression: the output $t(\mathbf{x}) \in T$ is continuous, or has a continuous component.
The target function $t(\mathbf{x})$ is known (only) through a (noisy) set of training examples: $(\mathbf{x}_1, t_1), (\mathbf{x}_2, t_2), \ldots, (\mathbf{x}_n, t_n)$

3 Three Parametric Approaches to Classification
1) Discriminant Functions: construct $f : X \to T$ that directly assigns a vector $\mathbf{x}$ to a specific class $C_k$.
- Inference and decision are combined into a single learning problem.
- Linear Discriminant: the decision surface is a hyperplane in $X$:
  - Fisher's Linear Discriminant
  - Perceptron
  - Support Vector Machines

4 Three Parametric Approaches to Classification
2) Probabilistic Discriminative Models: directly model the posterior class probabilities $p(C_k \mid \mathbf{x})$.
- Inference and decision are separate.
- Less data is needed to estimate $p(C_k \mid \mathbf{x})$ than $p(\mathbf{x} \mid C_k)$.
- Can accommodate many overlapping features.
  - Logistic Regression
  - Conditional Random Fields

5 Three Parametric Approaches to Classification
3) Probabilistic Generative Models:
- Model the class-conditional densities $p(\mathbf{x} \mid C_k)$ as well as the priors $p(C_k)$, then use Bayes's theorem to find $p(C_k \mid \mathbf{x})$,
- or model $p(\mathbf{x}, C_k)$ directly, then marginalize to obtain the posterior probabilities $p(C_k \mid \mathbf{x})$.
- Inference and decision are separate.
- Can use $p(\mathbf{x})$ for outlier or novelty detection.
- Need to model dependencies between features.
  - Naïve Bayes
  - Hidden Markov Models
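The Bayes step named on this slide can be written out explicitly. The block below is a standard statement of the theorem for $K$ classes, given here as a reading aid rather than as a formula taken from the slides; the index $j$ is introduced only for the sum.

```latex
p(C_k \mid \mathbf{x})
  = \frac{p(\mathbf{x} \mid C_k)\, p(C_k)}{p(\mathbf{x})},
\qquad
p(\mathbf{x}) = \sum_{j=1}^{K} p(\mathbf{x}, C_j) = \sum_{j=1}^{K} p(\mathbf{x} \mid C_j)\, p(C_j)
```

The denominator $p(\mathbf{x})$ is the same quantity the slide mentions for outlier or novelty detection.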

6 Generative vs. Discriminative
Left-hand mode has no effect on posterior class probabilities.

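7 Linear Discriminant Functions: Two Classes (K = 2)
Use a linear function of the input vector:
$y(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}) + w_0$, where $\mathbf{w}$ is the weight vector and $w_0$ is the bias ($= -$threshold).
Decision: $\mathbf{x} \in C_1$ if $y(\mathbf{x}) \geq 0$, otherwise $\mathbf{x} \in C_2$.
$\Rightarrow$ the decision boundary is the hyperplane $y(\mathbf{x}) = 0$.
Properties:
- $\mathbf{w}$ is orthogonal to vectors lying within the decision surface.
- $w_0$ controls the location of the decision hyperplane.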
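As a minimal sketch of the decision rule above, assuming an identity feature map $\boldsymbol{\phi}(\mathbf{x}) = \mathbf{x}$ and made-up values for $\mathbf{w}$ and $w_0$ (none of these numbers come from the lecture), the classifier and the orthogonality property might look like this:

```python
import numpy as np

def phi(x):
    # Hypothetical feature map: the identity, chosen only for illustration.
    return np.asarray(x, dtype=float)

def y(x, w, w0):
    # Linear discriminant y(x) = w^T phi(x) + w0.
    return float(np.dot(w, phi(x)) + w0)

def classify(x, w, w0):
    # Returns 1 for class C_1 (y(x) >= 0), otherwise 2 for class C_2.
    return 1 if y(x, w, w0) >= 0 else 2

w = np.array([1.0, -1.0])  # illustrative weight vector
w0 = -0.5                  # illustrative bias; the threshold is -w0 = 0.5

print(classify([2.0, 0.0], w, w0))  # y = 2 - 0 - 0.5 = 1.5, so C_1
print(classify([0.0, 2.0], w, w0))  # y = 0 - 2 - 0.5 = -2.5, so C_2

# Two points on the boundary y(x) = 0; their difference lies within the
# decision surface, so its dot product with w should vanish.
a = np.array([0.5, 0.0])
b = np.array([1.5, 1.0])
print(np.dot(w, b - a))
```

Changing `w0` while keeping `w` fixed shifts the hyperplane without rotating it, which is the sense in which the bias controls its location.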

8 Linear Discriminant Functions: Two Classes (K = 2)

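9 Linear Discriminant Functions: Multiple Classes (K > 2)
1) Train K or K-1 one-versus-the-rest classifiers.
2) Train K(K-1)/2 one-versus-one classifiers.
3) Train K linear functions: $y_k(\mathbf{x}) = \mathbf{w}_k^T \boldsymbol{\phi}(\mathbf{x}) + w_{k0}$
Decision: $\mathbf{x} \in C_k$ if $y_k(\mathbf{x}) > y_j(\mathbf{x})$, for all $j \neq k$.
$\Rightarrow$ the decision boundary between classes $C_k$ and $C_j$ is the hyperplane defined by $y_k(\mathbf{x}) = y_j(\mathbf{x})$, i.e. $(\mathbf{w}_k - \mathbf{w}_j)^T \boldsymbol{\phi}(\mathbf{x}) + (w_{k0} - w_{j0}) = 0$
$\Rightarrow$ same geometrical properties as in the binary case.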
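Option 3 can be sketched in a few lines. The weights below are invented for K = 3 classes with an identity feature map, and classes are indexed from 0 purely for programming convenience; this illustrates the argmax decision and is not code from the lecture.

```python
import numpy as np

# Illustrative parameters for K = 3 classes and 2-dimensional inputs
# (made-up values; phi is assumed to be the identity).
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])    # row k holds w_k
w0 = np.array([0.0, 0.0, 0.0])  # entry k holds w_k0

def scores(x):
    # y_k(x) = w_k^T x + w_k0 for every class k at once.
    return W @ np.asarray(x, dtype=float) + w0

def classify(x):
    # 0-based index k of the largest y_k(x).
    return int(np.argmax(scores(x)))

print(classify([2.0, 0.5]))    # scores are [2.0, 0.5, -2.5]
print(classify([-1.0, -1.0]))  # scores are [-1.0, -1.0, 2.0]

# A point on the boundary between classes 0 and 1 satisfies
# (w_0 - w_1)^T x + (w_00 - w_10) = 0, i.e. the two scores are equal.
x_boundary = np.array([1.0, 1.0])
print((W[0] - W[1]) @ x_boundary + (w0[0] - w0[1]))
```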

10 Linear Discriminant Functions: Multiple Classes (K > 2)
4) More general ranking approach:
$y(\mathbf{x}) = \arg\max_{t \in T} \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}, t)$, where $T = \{c_1, c_2, \ldots, c_K\}$
- It subsumes the approach with K separate linear functions.
- Useful when $T$ is very large (e.g. exponential in the size of input $\mathbf{x}$), assuming inference can be done efficiently.
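One way to see why the ranking formulation subsumes K separate linear functions is to build a joint feature map that copies the input into the block belonging to the candidate class. The sketch below assumes that particular block construction (the helper `joint_phi` and all numbers are hypothetical, not taken from the lecture) and compares the two predictors.

```python
import numpy as np

K, D = 3, 2  # illustrative sizes: 3 classes, 2-dimensional inputs

# Parameters of K separate linear functions (made-up values).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # row k is w_k
w0 = np.array([0.5, -0.2, 0.1])                       # entry k is w_k0

def joint_phi(x, t):
    # Hypothetical joint feature map: copy [x, 1] into block t, zeros elsewhere.
    out = np.zeros(K * (D + 1))
    out[t * (D + 1):(t + 1) * (D + 1)] = np.append(np.asarray(x, dtype=float), 1.0)
    return out

# Single weight vector: the concatenation of [w_k, w_k0] over all classes k.
w = np.hstack([W, w0[:, None]]).ravel()

def rank_predict(x):
    # y(x) = argmax_t w^T phi(x, t), found by enumerating the K candidates.
    return max(range(K), key=lambda t: float(w @ joint_phi(x, t)))

def separate_predict(x):
    # The K-separate-functions rule: argmax_k of w_k^T x + w_k0.
    return int(np.argmax(W @ np.asarray(x, dtype=float) + w0))

x = np.array([0.3, -1.2])
print(rank_predict(x), separate_predict(x))
```

Enumerating all of $T$ as done here is only feasible for small K; the slide's efficiency caveat concerns the case where $T$ is too large to enumerate.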

11 Linear Discriminant Functions: Two Classes (K = 2)
What algorithms can be used to learn $y(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}) + w_0$?
Assume a training dataset of $N = N_1 + N_2$ examples in $C_1$ and $C_2$.
- Fisher's Linear Discriminant
- Perceptron:
  - Voted/Averaged Perceptron
  - Kernel Perceptron
- Support Vector Machines:
  - Linear
  - Kernel

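12 Fisher's Linear Discriminant
The discriminant function $y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$ can be interpreted as follows:
1. Project the D-dimensional $\mathbf{x}$ down to one dimension $\Rightarrow \mathbf{w}^T \mathbf{x}$
2. Use a threshold $-w_0$ to classify $\mathbf{x}$ $\Rightarrow$ $\mathbf{x} \in C_1$ if $\mathbf{w}^T \mathbf{x} \geq -w_0$; $\mathbf{x} \in C_2$ otherwise.
Fisher's idea:
- Maximize the between-class separation of the projected dataset.
- Minimize the within-class variance of the projected dataset.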

13 Fisher's Linear Discriminant
Line joining the class means vs. line inferred with Fisher's criterion.

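14 Fisher's Linear Discriminant
1) A measure of the separation between the classes is the between-class variance:
$\mathbf{m}_1 = \frac{1}{N_1} \sum_{n \in C_1} \mathbf{x}_n, \quad \mathbf{m}_2 = \frac{1}{N_2} \sum_{n \in C_2} \mathbf{x}_n$
$m_2 - m_1 = \mathbf{w}^T (\mathbf{m}_2 - \mathbf{m}_1)$, where $m_k = \mathbf{w}^T \mathbf{m}_k$ is the projected mean of class $C_k$
Between-class variance: $(m_2 - m_1)^2$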

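15 Fisher's Linear Discriminant
2) A measure of the within-class variance:
$s_1^2 = \sum_{n \in C_1} (\mathbf{w}^T \mathbf{x}_n - m_1)^2, \quad s_2^2 = \sum_{n \in C_2} (\mathbf{w}^T \mathbf{x}_n - m_2)^2$
Total within-class variance: $s_1^2 + s_2^2$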

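16 Fisher's Linear Discriminant
Maximize the between-class separation and minimize the within-class variance $\Rightarrow$ Fisher's criterion:
$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2}, \quad \mathbf{w}^* = \arg\max_{\mathbf{w}} J(\mathbf{w})$
The objective function can be rewritten as:
$J(\mathbf{w}) = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_W \mathbf{w}}$, where
$S_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T$
$S_W = \sum_{n \in C_1} (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T + \sum_{n \in C_2} (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T$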
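The equivalence of the two forms of the criterion can be checked numerically. The sketch below uses synthetic Gaussian data (sample sizes, class means and the test direction `w` are arbitrary assumptions) and evaluates $J(\mathbf{w})$ once from the projected means and variances and once from the scatter matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic two-class data; all parameters here are arbitrary assumptions.
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))  # class C_1
X2 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(60, 2))  # class C_2

def fisher_projected(w, X1, X2):
    # J(w) = (m2 - m1)^2 / (s1^2 + s2^2), computed on the 1-D projections.
    p1, p2 = X1 @ w, X2 @ w
    m1, m2 = p1.mean(), p2.mean()
    s1_sq = np.sum((p1 - m1) ** 2)
    s2_sq = np.sum((p2 - m2) ** 2)
    return (m2 - m1) ** 2 / (s1_sq + s2_sq)

def fisher_matrix(w, X1, X2):
    # J(w) = (w^T S_B w) / (w^T S_W w), computed from the scatter matrices.
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    d = m2 - m1
    S_B = np.outer(d, d)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return (w @ S_B @ w) / (w @ S_W @ w)

w = np.array([1.0, 0.5])  # arbitrary direction, not the optimum
print(fisher_projected(w, X1, X2), fisher_matrix(w, X1, X2))
print(fisher_matrix(3.0 * w, X1, X2))  # rescaling w leaves J unchanged
```

Both functions use unnormalized sums of squares, matching the definitions of $s_k^2$ and $S_W$ on the slides.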

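17 Fisher's Linear Discriminant
Optimization formulation:
$\mathbf{w}^* = \arg\max_{\mathbf{w}} J(\mathbf{w}) = \arg\max_{\mathbf{w}} \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_W \mathbf{w}}$
Solution:
$\nabla J(\mathbf{w}) = 0 \Rightarrow (\mathbf{w}^T S_B \mathbf{w}) S_W \mathbf{w} = (\mathbf{w}^T S_W \mathbf{w}) S_B \mathbf{w} \Rightarrow S_B \mathbf{w} = \lambda S_W \mathbf{w}$ (generalized eigenvalue problem)
If $S_W$ is nonsingular: $\Rightarrow S_W^{-1} S_B \mathbf{w} = \lambda \mathbf{w}$ (conventional eigenvalue problem)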
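The gradient step is compressed on the slide. A worked version, using the quotient rule and the symmetry of $S_B$ and $S_W$, is sketched below; identifying $\lambda$ with the value of the criterion follows from the algebra and is added here as a reading aid.

```latex
\nabla_{\mathbf{w}} J(\mathbf{w})
  = \frac{2\,(\mathbf{w}^T S_W \mathbf{w})\, S_B \mathbf{w}
        - 2\,(\mathbf{w}^T S_B \mathbf{w})\, S_W \mathbf{w}}
         {(\mathbf{w}^T S_W \mathbf{w})^2} = 0
\;\Rightarrow\;
(\mathbf{w}^T S_B \mathbf{w})\, S_W \mathbf{w} = (\mathbf{w}^T S_W \mathbf{w})\, S_B \mathbf{w}
\;\Rightarrow\;
S_B \mathbf{w} = \lambda\, S_W \mathbf{w},
\qquad
\lambda = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_W \mathbf{w}} = J(\mathbf{w})
```

Since $\lambda = J(\mathbf{w})$ at a stationary point, maximizing the criterion corresponds to picking the eigenvector with the largest eigenvalue.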

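18 Fisher's Linear Discriminant
No need to solve the eigenvalue problem:
- $S_B \mathbf{w} = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T \mathbf{w}$ is a vector in the direction $(\mathbf{m}_2 - \mathbf{m}_1)$.
- The norm of $\mathbf{w}$ is immaterial, only its direction is important.
$\Rightarrow$ can take $\mathbf{w} = S_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$
How to find $w_0$:
- Assume $p(\mathbf{w}^T \mathbf{x} \mid C_1)$ and $p(\mathbf{w}^T \mathbf{x} \mid C_2)$ are Gaussians.
- Estimate the means and variances using maximum likelihood.
- Use decision theory to find $w_0$, i.e. $p(-w_0 \mid C_1) = p(-w_0 \mid C_2)$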
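The closed-form direction and the Gaussian threshold can be sketched together. The data below are synthetic, and the helper names `fisher_direction` and `gaussian_threshold` are hypothetical. The threshold is taken as the root of the density-equality equation closest to the midpoint of the projected means, which is one reasonable choice rather than something the slides prescribe. Note that with $\mathbf{w} = S_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$ class $C_2$ has the larger projected mean, so this sketch assigns $C_2$ on the upper side of the threshold; flipping the sign of $\mathbf{w}$ recovers the earlier convention that $C_1$ lies on the side $\mathbf{w}^T \mathbf{x} \geq -w_0$.

```python
import numpy as np

def fisher_direction(X1, X2):
    # w = S_W^{-1} (m2 - m1), solved as a linear system instead of inverting S_W.
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return np.linalg.solve(S_W, m2 - m1)

def gaussian_threshold(p1, p2):
    # Fit a 1-D Gaussian to each projected class by maximum likelihood and
    # return a point z where the two densities are equal. Equating the log
    # densities gives a quadratic a*z^2 + b*z + c = 0 in z.
    mu1, mu2 = p1.mean(), p2.mean()
    v1, v2 = p1.var(), p2.var()  # ML variance estimates (divide by N)
    a = 0.5 / v1 - 0.5 / v2
    b = mu2 / v2 - mu1 / v1
    c = 0.5 * mu1**2 / v1 - 0.5 * mu2**2 / v2 + 0.5 * np.log(v1 / v2)
    if abs(a) < 1e-12:  # (nearly) equal variances: the equation is linear
        return float(-c / b)
    disc = max(b * b - 4.0 * a * c, 0.0)
    roots = np.array([(-b + np.sqrt(disc)) / (2.0 * a),
                      (-b - np.sqrt(disc)) / (2.0 * a)])
    mid = 0.5 * (mu1 + mu2)
    # Assumption of this sketch: keep the root closest to the midpoint.
    return float(roots[np.argmin(np.abs(roots - mid))])

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))  # synthetic C_1
X2 = rng.normal(loc=[4.0, 2.0], scale=1.0, size=(60, 2))  # synthetic C_2

w = fisher_direction(X1, X2)
z = gaussian_threshold(X1 @ w, X2 @ w)  # decision threshold on w^T x
w0 = -z                                 # bias, since threshold = -w0

# With this sign of w, C_2 projects above the threshold.
correct = np.sum(X1 @ w < z) + np.sum(X2 @ w >= z)
print("training accuracy:", correct / (len(X1) + len(X2)))
```

Equating the densities alone, as on the slide, does not involve the class priors; weighting each density by its prior would shift the threshold when the classes are unbalanced.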

19 Supplementary Reading
- PRML Section 1.4 (The Curse of Dimensionality).
- PRML Section 1.5 (Decision Theory).
- PRML Section 4 (Linear Models for Classification): 4.. to
