Lecture 8: Multiclass Classification (I)


1 Lecture 8: Multiclass Classification (I)
Hao Helen Zhang, Fall 2017
Outline: Bayes Rule for Multiclass Problems; Traditional Methods for Multiclass Problems; Linear Regression Models


3 Multiclass Classification: Outline
General setup
Bayes rule for multiclass problems: under equal costs; under unequal costs
Linear multiclass methods: linear regression model; linear discriminant analysis (LDA); multiple logistic regression models

4 Multiclass Problems
Class label Y in {1, ..., K}, with K >= 3. The classifier is a map f: R^d -> {1, ..., K}.
Examples: zip code classification, multiple-type cancer classification, vowel recognition.
The loss function C(k, l) = cost of classifying a sample in class k to class l. In general, C(k, k) = 0 for every k = 1, ..., K. The loss can be described as a K x K matrix. For example, with K = 3 and equal misclassification costs,

    C = [ 0 1 1
          1 0 1
          1 1 0 ].

5 Terminologies
Prior class probabilities (marginal distribution of Y): pi_k = P(Y = k), k = 1, ..., K.
Conditional distribution of X given Y: assume g_k is the conditional density of X given Y = k, k = 1, ..., K.
Marginal density of X (a mixture): g(x) = sum_{k=1}^K pi_k g_k(x).
Posterior class probabilities (conditional distribution of Y given X):

    P(Y = k | X = x) = pi_k g_k(x) / g(x), k = 1, ..., K.
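The posterior formula above can be checked numerically. A minimal sketch, assuming a hypothetical three-class problem with univariate Gaussian class-conditional densities (the priors, means, and evaluation point below are made up for illustration):

```python
import numpy as np

def gaussian_density(x, mean, sd):
    # Univariate normal density, used as g_k(x).
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

priors = np.array([0.5, 0.3, 0.2])      # pi_k = P(Y = k)
means = np.array([-2.0, 0.0, 2.0])      # class-conditional means

x = 0.5
g = gaussian_density(x, means, 1.0)     # g_k(x), k = 1, 2, 3
marginal = np.sum(priors * g)           # g(x) = sum_k pi_k g_k(x)
posterior = priors * g / marginal       # P(Y = k | X = x)
```

By construction the posterior probabilities are nonnegative and sum to one; here the middle class is the most probable at x = 0.5.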

6 Bayes Rule: General Setup
The optimal (Bayes) rule minimizes the average loss over the population:

    phi_B(x) = arg min_f E_{X,Y} C(Y, f(X)) = arg min_f E_X E_{Y|X} C(Y, f(X)),

where

    E_{Y|X} C(Y, f(X)) = sum_{k=1}^K I(f(X) = k) sum_{l=1}^K C(l, k) P(Y = l | X).

7 Bayes Rule Under Equal Costs
If C(k, l) = I(k != l), then R[f] = E_X E_{Y|X} I(Y != f(X)) and

    E_{Y|X} C(Y, f(X)) = sum_{k=1}^K I(f(X) = k) sum_{l != k} P(Y = l | X)
                       = sum_{k=1}^K I(f(X) = k) [1 - P(Y = k | X)].

Given x, the minimizer of E_{Y|X=x} C(Y, f(X)) is f(x) = k*, where k* = arg max_{k=1,...,K} P(Y = k | x). The Bayes rule is

    phi_B(x) = k  if  P(Y = k | x) = max_{l=1,...,K} P(Y = l | x),

which assigns x to the most probable class under P(Y = k | X = x).

8 Bayes Rule Under Unequal Costs
For a general loss function, the Bayes rule can be derived as

    phi_B(x) = k  if  k = arg min_{k=1,...,K} sum_{l=1}^K C(l, k) P(Y = l | x).

There is no simple analytic form for the solution in general.
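The cost-sensitive rule is easy to evaluate once the posteriors are known. A sketch with a hypothetical 3 x 3 cost matrix and posterior vector (both invented for illustration):

```python
import numpy as np

# C[l, k] = cost of predicting class k when the truth is class l,
# with zeros on the diagonal (hypothetical values).
C = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 1.0],
              [8.0, 1.0, 0.0]])

posterior = np.array([0.2, 0.3, 0.5])   # P(Y = l | x), l = 1, 2, 3

expected_cost = C.T @ posterior         # entry k: sum_l C[l, k] * p_l
bayes_class = int(np.argmin(expected_cost))
```

Note that the most probable class (index 2) is not chosen here: its large misclassification cost C[2, 0] = 8 pushes the cost-minimizing decision to the middle class, which is exactly how unequal costs change the equal-cost rule.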

9 Decision (Discriminant) Functions
For multiclass problems, we generally need to estimate multiple discriminant functions f_k(x), k = 1, ..., K:
Each f_k(x) is associated with class k; f_k(x) represents the strength of evidence that a sample x belongs to class k.
The decision rule constructed from the f_k's is f_hat(x) = k*, where k* = arg max_{k=1,...,K} f_k(x).
The decision boundary between class k and class l is defined as {x : f_k(x) = f_l(x)}.

10 Traditional Methods: Divide and Conquer
Main ideas: (i) decompose the multiclass classification problem into multiple binary classification problems; (ii) use the majority voting principle (a combined decision from the committee) to predict the label.
Common approaches, simple but effective: one-vs-rest (one-vs-all) approaches; pairwise (one-vs-one, all-vs-all) approaches.

11 One-vs-rest Approach
One of the simplest multiclass classifiers; commonly used with SVMs; also known as the one-vs-all (OVA) approach.
(i) Solve K different binary problems: classify class k versus the rest, for k = 1, ..., K.
(ii) Assign a test sample to the class giving the largest (most positive) value f_k(x), where f_k is the solution of the kth problem.
Properties: very simple to implement; performs well in practice. Not optimal (asymptotically): the decision rule is not Fisher consistent if there is no dominating class (i.e., max_k p_k(x) < 1/2).
Read: Rifkin and Klautau (2004), "In Defense of One-vs-all Classification".
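The two steps above can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the base learner is a least-squares linear scorer fit to +1/-1 targets (a stand-in for an SVM), and the data, seeds, and helper names are all invented:

```python
import numpy as np

def fit_binary_ls(X, y_pm):
    # Least-squares linear scorer with intercept, targets in {+1, -1}.
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.linalg.lstsq(X1, y_pm, rcond=None)[0]

def ova_fit(X, y, K):
    # Step (i): K binary problems, class k versus the rest.
    return [fit_binary_ls(X, np.where(y == k, 1.0, -1.0)) for k in range(K)]

def ova_predict(models, X):
    # Step (ii): assign to the class with the most positive score f_k(x).
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    scores = np.column_stack([X1 @ b for b in models])
    return scores.argmax(axis=1)

# Synthetic three-class data with well-separated centers.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + rng.normal(size=(30, 2)) for c in centers])
y = np.repeat(np.arange(3), 30)

models = ova_fit(X, y, K=3)
train_acc = np.mean(ova_predict(models, X) == y)
```

With centers this far apart the OVA committee recovers the labels almost perfectly, which matches the "simple but performs well" point above.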

12 Pairwise Approach
Also known as the all-vs-all (AVA) approach.
(i) Solve (K choose 2) = K(K-1)/2 different binary problems: classify class k versus class j for all j != k. Call each classifier g_kj.
(ii) For prediction at a point, each classifier is queried once and issues a vote. The class with the maximum number of (weighted) votes is the winner.
Properties: training is efficient, since each binary problem is small. If K is big, there are too many problems to solve: for K = 10 we need to train 45 binary classifiers. Simple to implement; performs competitively in practice.
Read: Park and Fuernkranz (2007), "Efficient Pairwise Classification".
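The voting scheme can be sketched with the same hypothetical least-squares base learner as before (again an illustrative stand-in, with invented data and helper names):

```python
from itertools import combinations

import numpy as np

def fit_pair(X, y, k, j):
    # One binary problem: class k (+1) versus class j (-1).
    mask = (y == k) | (y == j)
    X1 = np.hstack([np.ones((mask.sum(), 1)), X[mask]])
    t = np.where(y[mask] == k, 1.0, -1.0)
    return np.linalg.lstsq(X1, t, rcond=None)[0]

def ava_fit(X, y, K):
    # K choose 2 binary problems.
    return {(k, j): fit_pair(X, y, k, j) for k, j in combinations(range(K), 2)}

def ava_predict(models, X, K):
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    votes = np.zeros((X.shape[0], K))
    for (k, j), b in models.items():            # each classifier votes once
        winner = np.where(X1 @ b > 0, k, j)
        votes[np.arange(X.shape[0]), winner] += 1
    return votes.argmax(axis=1)                 # most votes wins

rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + rng.normal(size=(30, 2)) for c in centers])
y = np.repeat(np.arange(3), 30)

models = ava_fit(X, y, K=3)
train_acc = np.mean(ava_predict(models, X, K=3) == y)
```

For K = 3 there are 3 pairwise classifiers; the count grows as K(K-1)/2, which is the 45-classifier burden quoted above for K = 10.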

13 Linear Classifiers for Multiclass Problems
Assume the decision boundaries are linear. Common linear classification methods:
Multivariate linear regression methods
Linear log-odds (logit) models: linear discriminant analysis (LDA); multiple logistic regression
Separating hyperplanes (explicitly model the boundaries as linear): perceptron model, SVMs

14 Coding for Linear Regression
For each class k, code it using an indicator variable Z_k: Z_k = 1 if Y = k; Z_k = 0 if Y != k.
The response y is coded by a vector z = (z_1, ..., z_K). If Y = k, only the kth component of z equals 1.
The entire training sample can then be coded by an n x K matrix, denoted by Z. Call Z the indicator response matrix.
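One way the indicator response matrix might be built with NumPy (labels coded 0, ..., K-1 here for zero-based indexing):

```python
import numpy as np

y = np.array([0, 2, 1, 1, 0])    # n = 5 training labels, K = 3 classes
K = 3

Z = np.zeros((y.size, K))        # the n x K indicator response matrix
Z[np.arange(y.size), y] = 1.0    # row i has a single 1, in column y_i
```

Each row of Z is the coding vector z for one sample: exactly one entry is 1 and the rest are 0.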

15 Multivariate Linear Regression
Consider the multivariate model

    Z = XB + E,  or  Z_k = X b_k + eps_k, k = 1, ..., K,

where Z_k is the kth column of the indicator response matrix Z, X is the n x (d + 1) input matrix, E is the n x K matrix of errors, and B is the (d + 1) x K matrix of parameters whose kth column is b_k.

16 Ordinary Least Squares Estimates
Minimize the residual sum of squares

    RSS(B) = sum_{k=1}^K sum_{i=1}^n [z_ik - beta_0k - sum_{j=1}^d beta_jk x_ij]^2 = tr[(Z - XB)^T (Z - XB)].

The least squares estimate has the same form as in the univariate case:

    B_hat = (X^T X)^{-1} X^T Z,  Z_hat = X (X^T X)^{-1} X^T Z = X B_hat.

The coefficient vector b_hat_k for Z_k is the same as the univariate OLS estimate (multiple outputs do not affect the OLS estimates).
For a new x, compute the K-vector f_hat(x) = [(1, x^T) B_hat]^T and classify as f_hat(x) = arg max_{k=1,...,K} f_hat_k(x).
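The whole fit-and-classify recipe above is a few lines of linear algebra. A sketch on invented three-class data, which also checks the intercept property discussed on the next slide (fitted values sum to one):

```python
import numpy as np

rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + rng.normal(size=(25, 2)) for c in centers])
y = np.repeat(np.arange(3), 25)

n, K = X.shape[0], 3
Z = np.zeros((n, K))
Z[np.arange(n), y] = 1.0                        # indicator response matrix

X1 = np.hstack([np.ones((n, 1)), X])            # intercept column of 1's
B_hat = np.linalg.lstsq(X1, Z, rcond=None)[0]   # (d + 1) x K coefficients

F_hat = X1 @ B_hat                # fitted f_hat_k(x) at the training points
row_sums = F_hat.sum(axis=1)      # with an intercept, each row sums to 1
preds = F_hat.argmax(axis=1)      # classify to the largest component
train_acc = np.mean(preds == y)
```

Using `lstsq` rather than forming (X^T X)^{-1} explicitly is the numerically safer way to compute the same OLS estimate.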

17 About Multivariate Linear Regression
Justification: each f_hat_k is an estimate of the conditional expectation of Y_k, which equals the conditional probability of class k:

    E(Y_k | X = x) = P(Y = k | X = x).

If the intercept is in the model (a column of 1's in X), then sum_{k=1}^K f_hat_k(x) = 1 for any x.
If the linear assumption is appropriate, then the f_hat_k give consistent estimates of the class probabilities, k = 1, ..., K, as the sample size n goes to infinity.

18 Limitations of Linear Regression
There is no guarantee that all f_hat_k(x) lie in [0, 1]; some can be negative or greater than 1.
The linear structure in the X's can be rigid.
There is a serious masking problem for K >= 3, due to the rigid linear structure.

19 [Figure (Elements of Statistical Learning, Hastie, Tibshirani & Friedman 2001, Chapter 4): data from three classes in R^2, easily separated by linear decision boundaries. The right plot shows the boundaries found by linear discriminant analysis; the left plot shows the boundaries found by linear regression of the indicator response variables. The middle class is completely masked (never dominates).]

20 [Figure (Elements of Statistical Learning, Hastie, Tibshirani & Friedman 2001, Chapter 4): the effects of masking on linear regression for a three-class problem, with panels "Degree = 1; Error = 0.25" and "Degree = 2; Error = 0.03". The rug plot indicates the positions and class memberships of the observations. The three curves in each panel are the fitted regressions to the three-class indicator variables; the fits are linear and quadratic polynomials. The Bayes error rate is 0.025 for this problem, as is the LDA error rate.]

21 About the Masking Problem
One-dimensional example (see the previous figure):
Project the data onto the line joining the three centroids; there is no information in the orthogonal (northwest-southeast) direction.
The three curves are the fitted regression lines.
The left panel is the linear regression fit: the middle class is never dominant!
The right panel is the quadratic fit: the masking problem is solved.

22 How to Solve the Masking Problem
In the previous example with K = 3, use quadratic rather than linear regression: linear regression error 0.25; quadratic regression error 0.03; Bayes error 0.025.
In general, if K classes are lined up, we need polynomial terms up to degree K - 1 to resolve the masking problem; in the worst case, O(p^{K-1}) terms.
Though LDA is also based on linear functions, it does not suffer from masking problems.
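The lined-up-classes phenomenon is easy to reproduce. A sketch with synthetic one-dimensional data at three centroids: under the degree-1 fit the middle class is never the argmax, and adding the quadratic term recovers it (data and grid are invented for the demonstration):

```python
import numpy as np

# Three classes lined up at -2, 0, 2 with small symmetric offsets.
offsets = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])
x = np.concatenate([m + offsets for m in (-2.0, 0.0, 2.0)])
y = np.repeat([0, 1, 2], offsets.size)

Z = np.zeros((x.size, 3))
Z[np.arange(x.size), y] = 1.0          # indicator responses

grid = np.linspace(-2.95, 2.95, 60)    # evaluation points (avoids x = 0 tie)

# Degree-1 design: the middle-class curve is flat, so it never dominates.
X1 = np.column_stack([np.ones_like(x), x])
B1 = np.linalg.lstsq(X1, Z, rcond=None)[0]
G1 = np.column_stack([np.ones_like(grid), grid])
preds_linear = (G1 @ B1).argmax(axis=1)

# Degree-2 design: the quadratic term lets the middle curve peak at 0.
X2 = np.column_stack([np.ones_like(x), x, x ** 2])
B2 = np.linalg.lstsq(X2, Z, rcond=None)[0]
G2 = np.column_stack([np.ones_like(grid), grid, grid ** 2])
preds_quad = (G2 @ B2).argmax(axis=1)

middle_masked = 1 not in set(preds_linear.tolist())
middle_recovered = 1 in set(preds_quad.tolist())
```

This mirrors the two panels of the figure: degree 1 masks the middle class everywhere, degree 2 restores it near its centroid.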

23 Large K, Small d Problem
Masking often occurs for large K and small d. Example: vowel data, with K = 11 and d = 10. (The figure shows two dimensions of the training data, via the LDA projection.)
A difficult classification problem: the class overlap is considerable, and the best methods achieve about 40% test error.

    Method               Training Error  Test Error
    Linear regression    0.48            0.67
    LDA                  0.32            0.56
    QDA                  0.01            0.53
    Logistic regression  0.22            0.51

24 [Figure 4.4 (Elements of Statistical Learning, Hastie, Tibshirani & Friedman 2001, Chapter 4): a two-dimensional plot of the vowel training data. There are eleven classes with X in R^10, and this is the best view in terms of an LDA model (Section 4.3.3). The heavy circles are the projected mean vectors for each class. The class overlap is considerable.]

25 LDA: Model Setup and Assumptions
Let pi_k be the prior probability of class k, and g_k(x) the class-conditional density of X in class k. The posterior probability is

    P(Y = k | X = x) = g_k(x) pi_k / sum_{l=1}^K g_l(x) pi_l.

Model assumptions: each class density is multivariate Gaussian N(mu_k, Sigma_k); further, we assume Sigma_k = Sigma for all k (equal covariance required!).

26 Linear Discriminant Function
The log-ratio between classes k and l is

    log [P(Y = k | X = x) / P(Y = l | X = x)] = log(pi_k / pi_l) + log[g_k(x) / g_l(x)]
      = [x^T Sigma^{-1} mu_k - (1/2) mu_k^T Sigma^{-1} mu_k + log pi_k]
      - [x^T Sigma^{-1} mu_l - (1/2) mu_l^T Sigma^{-1} mu_l + log pi_l].

For each class k, its discriminant function is

    f_k(x) = x^T Sigma^{-1} mu_k - (1/2) mu_k^T Sigma^{-1} mu_k + log pi_k.

The quadratic term cancels because of the equal covariance matrices. The decision rule is Y_hat(x) = arg max_{k=1,...,K} f_k(x).
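The discriminant functions are linear in x and cheap to evaluate. A sketch with hypothetical class means, common covariance, and priors (all invented for illustration):

```python
import numpy as np

mus = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])   # class means mu_k
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])             # common covariance
priors = np.array([0.5, 0.25, 0.25])                   # pi_k

Sigma_inv = np.linalg.inv(Sigma)

def lda_scores(x):
    # f_k(x) = x^T Sigma^{-1} mu_k - 0.5 mu_k^T Sigma^{-1} mu_k + log pi_k
    return np.array([x @ Sigma_inv @ m - 0.5 * m @ Sigma_inv @ m + np.log(p)
                     for m, p in zip(mus, priors)])

x = np.array([2.5, 0.2])
k_hat = int(np.argmax(lda_scores(x)))   # classify to the largest f_k(x)
```

A point near the second centroid is assigned to the second class; at the origin, the larger prior of the first class tips the decision its way.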

27 [Figure (Elements of Statistical Learning, Hastie, Tibshirani & Friedman 2001, Chapter 4): the left panel shows three Gaussian distributions with the same covariance and different means, with contours of constant density enclosing 95% of the probability in each case. The Bayes decision boundaries between each pair of classes are shown (broken straight lines), and the Bayes boundaries separating all three classes are the thicker solid lines (a subset of the former). On the right is a sample of 30 points drawn from each Gaussian distribution, with the fitted LDA decision boundaries.]

28 Interpretation of LDA

    log P(Y = k | X = x) = -(1/2)(x - mu_k)^T Sigma^{-1} (x - mu_k) + log pi_k + const,

where the constant term does not involve x.
If the prior probabilities are equal, LDA classifies x to the class whose centroid is closest to x in the squared Mahalanobis distance d_Sigma(x, mu_k), based on the common covariance matrix.
If d_Sigma(x, mu_k) = d_Sigma(x, mu_l), then the priors determine the classification.
Special case Sigma_k = I for all k: f_k(x) = -(1/2) ||x - mu_k||^2 + log pi_k; only Euclidean distance is needed!

29 Mahalanobis Distance
The Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936.
Definition: the Mahalanobis distance of x = (x_1, ..., x_d)^T from a set of points with mean mu = (mu_1, ..., mu_d)^T and covariance matrix Sigma is

    d_Sigma(x, mu) = [(x - mu)^T Sigma^{-1} (x - mu)]^{1/2}.

It differs from the Euclidean distance in that it takes the correlations of the data set into account; it is scale-invariant.
It can also be defined as a dissimilarity measure between two points x and x' from the same distribution with covariance matrix Sigma:

    d_Sigma(x, x') = [(x - x')^T Sigma^{-1} (x - x')]^{1/2}.

30 Special Cases
If Sigma is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance.
If Sigma is diagonal, diag(sigma_1^2, ..., sigma_d^2), the resulting distance measure is called the normalized Euclidean distance:

    d_Sigma(x, x') = [sum_{i=1}^d (x_i - x'_i)^2 / sigma_i^2]^{1/2},

where sigma_i is the standard deviation of the ith coordinate.
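Both the general definition and the diagonal special case are one-liners. A small sketch with an invented diagonal covariance, where the Mahalanobis distance is exactly the normalized Euclidean distance:

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    # d_Sigma(x, mu) = sqrt((x - mu)^T Sigma^{-1} (x - mu));
    # solve() avoids forming the inverse explicitly.
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.solve(Sigma, diff)))

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0], [0.0, 1.0]])   # diagonal: sd 2 and sd 1
x = np.array([2.0, 2.0])

d_mahal = mahalanobis(x, mu, Sigma)          # sqrt((2/2)^2 + (2/1)^2) = sqrt(5)
d_eucl = float(np.linalg.norm(x - mu))       # sqrt(8)
```

The first coordinate is "discounted" by its larger standard deviation, so the Mahalanobis distance is smaller than the Euclidean one here.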

31 Normalized Distance
Suppose we want to decide whether x belongs to a class. Intuitively, the decision relies on the distance of x to the class center mu: the closer to mu, the more likely that x belongs to the class.
To decide whether a given distance is large or small, we need to know whether the points in the class are spread out over a large range or a small one. Statistically, we measure the spread by the standard deviation of the distances of the sample points to mu.
If d_Sigma(x, mu) is less than one standard deviation, it is highly probable that x belongs to the class; the further away it is, the more likely that x does not belong to the class. This is the concept of the normalized distance.

32 Motivation for the Mahalanobis Distance
The drawback of the normalized distance is that the sample points are assumed to be distributed about mu in a spherical manner.
In non-spherical situations, for instance ellipsoidal ones, the probability of x belonging to the class depends not only on the distance from mu but also on the direction: in directions where the ellipsoid has a short axis, x must be closer; where the axis is long, x can be further from the center.
The ellipsoid that best represents the class's probability distribution can be estimated by the covariance matrix of the samples.

33 Linear Decision Boundary of LDA
Linear discriminant functions:

    f_k(x) = beta_k^T x + beta_0k,  f_j(x) = beta_j^T x + beta_0j.

The decision boundary between classes k and j is

    (beta_k - beta_j)^T x + (beta_0k - beta_0j) = beta^T x + beta_0 = 0.

Since beta_k = Sigma^{-1} mu_k and beta_j = Sigma^{-1} mu_j, the decision boundary has the directional vector beta = Sigma^{-1}(mu_k - mu_j).
beta is in general not in the direction of mu_k - mu_j; the discriminant direction beta minimizes the overlap for Gaussian data.

34 [Figure (Elements of Statistical Learning, Hastie, Tibshirani & Friedman 2001, Chapter 4): although the line joining the centroids defines the direction of greatest centroid spread, the projected data overlap because of the covariance (left panel). The discriminant direction minimizes this overlap for Gaussian data (right panel).]

35 Parameter Estimation in LDA
In practice, we estimate the parameters from the training data:
pi_hat_k = n_k / n, where n_k is the sample size of class k;
mu_hat_k = sum_{Y_i = k} x_i / n_k;
the sample covariance matrix for the kth class is

    S_k = (1 / (n_k - 1)) sum_{Y_i = k} (x_i - mu_hat_k)(x_i - mu_hat_k)^T.

The (unbiased) pooled sample covariance is a weighted average:

    Sigma_hat = sum_{k=1}^K [(n_k - 1) / sum_{l=1}^K (n_l - 1)] S_k
              = sum_{k=1}^K sum_{Y_i = k} (x_i - mu_hat_k)(x_i - mu_hat_k)^T / (n - K).
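The three estimates above translate directly into code. A sketch on invented two-class data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([m + rng.normal(size=(20, 2)) for m in ([0, 0], [3, 0])])
y = np.repeat([0, 1], 20)
n, K = X.shape[0], 2

pi_hat = np.array([(y == k).mean() for k in range(K)])        # n_k / n
mu_hat = np.array([X[y == k].mean(axis=0) for k in range(K)]) # class centroids

# Unbiased pooled covariance: within-class scatter divided by n - K.
Sigma_hat = sum((X[y == k] - mu_hat[k]).T @ (X[y == k] - mu_hat[k])
                for k in range(K)) / (n - K)
```

The pooled estimate weights each class's scatter by its degrees of freedom, which is exactly the (n_k - 1)/(n - K) weighting in the formula.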

36 Implementation of LDA
Compute the eigen-decomposition Sigma_hat = U D U^T.
Sphere the data using Sigma_hat, via the transformation X* = D^{-1/2} U^T X. The common covariance estimate of X* is then the identity matrix.
In the transformed space, classify a point to the closest transformed centroid, with an adjustment by the log pi_k's.
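The sphering step can be verified directly: after the transformation, the covariance becomes the identity. A sketch with an invented pooled covariance estimate:

```python
import numpy as np

Sigma_hat = np.array([[2.0, 0.6], [0.6, 1.0]])   # hypothetical pooled estimate

D, U = np.linalg.eigh(Sigma_hat)    # eigenvalues D, eigenvectors (columns of U)

W = np.diag(D ** -0.5) @ U.T        # sphering transformation x* = W x
Sigma_star = W @ Sigma_hat @ W.T    # covariance after sphering: the identity
```

Applying W to the data and to the centroids reduces LDA to nearest-centroid classification (in Euclidean distance) plus the log-prior adjustment.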

37 Connection between LDA and Linear Regression for K = 2
Suppose we code the targets in the two classes as +1 and -1, and fit the linear model

    y_i = beta_0 + beta^T x_i,  i = 1, ..., n.

Then the OLS solution (beta_hat_0, beta_hat) satisfies beta_hat proportional to Sigma_hat^{-1}(mu_hat_2 - mu_hat_1), the same direction as LDA.
However, beta_hat_0 differs from the LDA intercept unless n_1 = n_2, so linear regression gives a decision rule different from LDA unless n_1 = n_2.

38 Multiple Logistic Models
Model the K posterior probabilities by linear functions of x, via logit transformations:

    log [P(Y = 1 | X = x) / P(Y = K | X = x)] = beta_10 + beta_1^T x
    log [P(Y = 2 | X = x) / P(Y = K | X = x)] = beta_20 + beta_2^T x
    ...
    log [P(Y = K-1 | X = x) / P(Y = K | X = x)] = beta_(K-1)0 + beta_(K-1)^T x

The choice of denominator in the odds-ratios is arbitrary. The parameter vector is theta = {beta_10, beta_1, ..., beta_(K-1)0, beta_(K-1)}.

39 Probability Estimates
The logit transformation ensures that the total probability sums to one and that each estimated probability lies in [0, 1]:

    p_k(x) = P(Y = k | x) = exp(beta_k0 + beta_k^T x) / [1 + sum_{l=1}^{K-1} exp(beta_l0 + beta_l^T x)],  k = 1, ..., K - 1,
    p_K(x) = P(Y = K | x) = 1 / [1 + sum_{l=1}^{K-1} exp(beta_l0 + beta_l^T x)].

Check that sum_{k=1}^K p_k(x) = 1 for any x.
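The sum-to-one check can be done numerically. A sketch with class K as the reference and hypothetical coefficients (the intercepts and slopes below are invented):

```python
import numpy as np

betas0 = np.array([0.5, -0.2])                 # intercepts, k = 1, ..., K-1
betas = np.array([[1.0, -1.0], [0.5, 0.5]])    # slope vectors, one row per class

def multilogit_probs(x):
    # p_k(x) for k < K, then p_K(x) with the reference-class denominator.
    eta = betas0 + betas @ x                   # linear predictors
    denom = 1.0 + np.exp(eta).sum()
    return np.append(np.exp(eta) / denom, 1.0 / denom)

p = multilogit_probs(np.array([0.3, 0.7]))     # K = 3 probabilities
```

By construction every p_k(x) is strictly positive and the vector sums to one, whatever x is.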

40 Maximum Likelihood Estimation for Logistic Models
The joint conditional log-likelihood of the y_i given the x_i is

    l(theta) = sum_{i=1}^n log p_{y_i}(theta; x_i).

Using the Newton-Raphson method, we solve iteratively by minimizing re-weighted least squares. This provides consistent estimates if the model is specified correctly.


More information

Lecture 3: Principal Components Analysis (PCA)

Lecture 3: Principal Components Analysis (PCA) Lecture 3: Principal Cmpnents Analysis (PCA) Reading: Sectins 6.3.1, 10.1, 10.2, 10.4 STATS 202: Data mining and analysis Jnathan Taylr, 9/28 Slide credits: Sergi Bacallad 1 / 24 The bias variance decmpsitin

More information

Computational modeling techniques

Computational modeling techniques Cmputatinal mdeling techniques Lecture 4: Mdel checing fr ODE mdels In Petre Department f IT, Åb Aademi http://www.users.ab.fi/ipetre/cmpmd/ Cntent Stichimetric matrix Calculating the mass cnservatin relatins

More information

COMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d)

COMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d) COMP 551 Applied Machine Learning Lecture 9: Supprt Vectr Machines (cnt d) Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Class web page: www.cs.mcgill.ca/~hvanh2/cmp551 Unless therwise

More information

Linear programming III

Linear programming III Linear prgramming III Review 1/33 What have cvered in previus tw classes LP prblem setup: linear bjective functin, linear cnstraints. exist extreme pint ptimal slutin. Simplex methd: g thrugh extreme pint

More information

Multiple Source Multiple. using Network Coding

Multiple Source Multiple. using Network Coding Multiple Surce Multiple Destinatin Tplgy Inference using Netwrk Cding Pegah Sattari EECS, UC Irvine Jint wrk with Athina Markpulu, at UCI, Christina Fraguli, at EPFL, Lausanne Outline Netwrk Tmgraphy Gal,

More information

initially lcated away frm the data set never win the cmpetitin, resulting in a nnptimal nal cdebk, [2] [3] [4] and [5]. Khnen's Self Organizing Featur

initially lcated away frm the data set never win the cmpetitin, resulting in a nnptimal nal cdebk, [2] [3] [4] and [5]. Khnen's Self Organizing Featur Cdewrd Distributin fr Frequency Sensitive Cmpetitive Learning with One Dimensinal Input Data Aristides S. Galanpuls and Stanley C. Ahalt Department f Electrical Engineering The Ohi State University Abstract

More information

Statistical classifiers: Bayesian decision theory and density estimation

Statistical classifiers: Bayesian decision theory and density estimation 3 rd NOSE Shrt Curse Alpbach, st 6 th Mar 004 Statistical classifiers: Bayesian decisin thery and density estimatin Ricard Gutierrez- Department f Cmputer Science rgutier@cs.tamu.edu http://research.cs.tamu.edu/prism

More information

The blessing of dimensionality for kernel methods

The blessing of dimensionality for kernel methods fr kernel methds Building classifiers in high dimensinal space Pierre Dupnt Pierre.Dupnt@ucluvain.be Classifiers define decisin surfaces in sme feature space where the data is either initially represented

More information

Overview of Supervised Learning

Overview of Supervised Learning 2 Overview f Supervised Learning 2.1 Intrductin The first three examples described in Chapter 1 have several cmpnents in cmmn. Fr each there is a set f variables that might be dented as inputs, which are

More information

CS 109 Lecture 23 May 18th, 2016

CS 109 Lecture 23 May 18th, 2016 CS 109 Lecture 23 May 18th, 2016 New Datasets Heart Ancestry Netflix Our Path Parameter Estimatin Machine Learning: Frmally Many different frms f Machine Learning We fcus n the prblem f predictin Want

More information

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers LHS Mathematics Department Hnrs Pre-alculus Final Eam nswers Part Shrt Prblems The table at the right gives the ppulatin f Massachusetts ver the past several decades Using an epnential mdel, predict the

More information

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition The Kullback-Leibler Kernel as a Framewrk fr Discriminant and Lcalized Representatins fr Visual Recgnitin Nun Vascncels Purdy H Pedr Mren ECE Department University f Califrnia, San Dieg HP Labs Cambridge

More information

Checking the resolved resonance region in EXFOR database

Checking the resolved resonance region in EXFOR database Checking the reslved resnance regin in EXFOR database Gttfried Bertn Sciété de Calcul Mathématique (SCM) Oscar Cabells OECD/NEA Data Bank JEFF Meetings - Sessin JEFF Experiments Nvember 0-4, 017 Bulgne-Billancurt,

More information

7 TH GRADE MATH STANDARDS

7 TH GRADE MATH STANDARDS ALGEBRA STANDARDS Gal 1: Students will use the language f algebra t explre, describe, represent, and analyze number expressins and relatins 7 TH GRADE MATH STANDARDS 7.M.1.1: (Cmprehensin) Select, use,

More information

Chapter 3: Cluster Analysis

Chapter 3: Cluster Analysis Chapter 3: Cluster Analysis } 3.1 Basic Cncepts f Clustering 3.1.1 Cluster Analysis 3.1. Clustering Categries } 3. Partitining Methds 3..1 The principle 3.. K-Means Methd 3..3 K-Medids Methd 3..4 CLARA

More information

Differentiation Applications 1: Related Rates

Differentiation Applications 1: Related Rates Differentiatin Applicatins 1: Related Rates 151 Differentiatin Applicatins 1: Related Rates Mdel 1: Sliding Ladder 10 ladder y 10 ladder 10 ladder A 10 ft ladder is leaning against a wall when the bttm

More information

, which yields. where z1. and z2

, which yields. where z1. and z2 The Gaussian r Nrmal PDF, Page 1 The Gaussian r Nrmal Prbability Density Functin Authr: Jhn M Cimbala, Penn State University Latest revisin: 11 September 13 The Gaussian r Nrmal Prbability Density Functin

More information

Slide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons

Slide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons Slide04 supplemental) Haykin Chapter 4 bth 2nd and 3rd ed): Multi-Layer Perceptrns CPSC 636-600 Instructr: Ynsuck Che Heuristic fr Making Backprp Perfrm Better 1. Sequential vs. batch update: fr large

More information

SURVIVAL ANALYSIS WITH SUPPORT VECTOR MACHINES

SURVIVAL ANALYSIS WITH SUPPORT VECTOR MACHINES 1 SURVIVAL ANALYSIS WITH SUPPORT VECTOR MACHINES Wlfgang HÄRDLE Ruslan MORO Center fr Applied Statistics and Ecnmics (CASE), Humbldt-Universität zu Berlin Mtivatin 2 Applicatins in Medicine estimatin f

More information

Reinforcement Learning" CMPSCI 383 Nov 29, 2011!

Reinforcement Learning CMPSCI 383 Nov 29, 2011! Reinfrcement Learning" CMPSCI 383 Nv 29, 2011! 1 Tdayʼs lecture" Review f Chapter 17: Making Cmple Decisins! Sequential decisin prblems! The mtivatin and advantages f reinfrcement learning.! Passive learning!

More information

Principal Components

Principal Components Principal Cmpnents Suppse we have N measurements n each f p variables X j, j = 1,..., p. There are several equivalent appraches t principal cmpnents: Given X = (X 1,... X p ), prduce a derived (and small)

More information

CMSC858P Supervised Learning Methods

CMSC858P Supervised Learning Methods CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors

More information

MATHEMATICS SYLLABUS SECONDARY 5th YEAR

MATHEMATICS SYLLABUS SECONDARY 5th YEAR Eurpean Schls Office f the Secretary-General Pedaggical Develpment Unit Ref. : 011-01-D-8-en- Orig. : EN MATHEMATICS SYLLABUS SECONDARY 5th YEAR 6 perid/week curse APPROVED BY THE JOINT TEACHING COMMITTEE

More information

AP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date

AP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date AP Statistics Practice Test Unit Three Explring Relatinships Between Variables Name Perid Date True r False: 1. Crrelatin and regressin require explanatry and respnse variables. 1. 2. Every least squares

More information

Elements of Machine Intelligence - I

Elements of Machine Intelligence - I ECE-175A Elements f Machine Intelligence - I Ken Kreutz-Delgad Nun Vascncels ECE Department, UCSD Winter 2011 The curse The curse will cver basic, but imprtant, aspects f machine learning and pattern recgnitin

More information

AP Statistics Notes Unit Two: The Normal Distributions

AP Statistics Notes Unit Two: The Normal Distributions AP Statistics Ntes Unit Tw: The Nrmal Distributins Syllabus Objectives: 1.5 The student will summarize distributins f data measuring the psitin using quartiles, percentiles, and standardized scres (z-scres).

More information

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours STATS216v Intrductin t Statistical Learning Stanfrd University, Summer 2016 Practice Final (Slutins) Duratin: 3 hurs Instructins: (This is a practice final and will nt be graded.) Remember the university

More information

Lead/Lag Compensator Frequency Domain Properties and Design Methods

Lead/Lag Compensator Frequency Domain Properties and Design Methods Lectures 6 and 7 Lead/Lag Cmpensatr Frequency Dmain Prperties and Design Methds Definitin Cnsider the cmpensatr (ie cntrller Fr, it is called a lag cmpensatr s K Fr s, it is called a lead cmpensatr Ntatin

More information

T Algorithmic methods for data mining. Slide set 6: dimensionality reduction

T Algorithmic methods for data mining. Slide set 6: dimensionality reduction T-61.5060 Algrithmic methds fr data mining Slide set 6: dimensinality reductin reading assignment LRU bk: 11.1 11.3 PCA tutrial in mycurses (ptinal) ptinal: An Elementary Prf f a Therem f Jhnsn and Lindenstrauss,

More information

NOTE ON THE ANALYSIS OF A RANDOMIZED BLOCK DESIGN. Junjiro Ogawa University of North Carolina

NOTE ON THE ANALYSIS OF A RANDOMIZED BLOCK DESIGN. Junjiro Ogawa University of North Carolina NOTE ON THE ANALYSIS OF A RANDOMIZED BLOCK DESIGN by Junjir Ogawa University f Nrth Carlina This research was supprted by the Office f Naval Research under Cntract N. Nnr-855(06) fr research in prbability

More information

CN700 Additive Models and Trees Chapter 9: Hastie et al. (2001)

CN700 Additive Models and Trees Chapter 9: Hastie et al. (2001) CN700 Additive Mdels and Trees Chapter 9: Hastie et al. (2001) Madhusudana Shashanka Department f Cgnitive and Neural Systems Bstn University CN700 - Additive Mdels and Trees March 02, 2004 p.1/34 Overview

More information

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India CHAPTER 3 INEQUALITIES Cpyright -The Institute f Chartered Accuntants f India INEQUALITIES LEARNING OBJECTIVES One f the widely used decisin making prblems, nwadays, is t decide n the ptimal mix f scarce

More information

The general linear model and Statistical Parametric Mapping I: Introduction to the GLM

The general linear model and Statistical Parametric Mapping I: Introduction to the GLM The general linear mdel and Statistical Parametric Mapping I: Intrductin t the GLM Alexa Mrcm and Stefan Kiebel, Rik Hensn, Andrew Hlmes & J-B J Pline Overview Intrductin Essential cncepts Mdelling Design

More information

Fall 2013 Physics 172 Recitation 3 Momentum and Springs

Fall 2013 Physics 172 Recitation 3 Momentum and Springs Fall 03 Physics 7 Recitatin 3 Mmentum and Springs Purpse: The purpse f this recitatin is t give yu experience wrking with mmentum and the mmentum update frmula. Readings: Chapter.3-.5 Learning Objectives:.3.

More information

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation III-l III. A New Evaluatin Measure J. Jiner and L. Werner Abstract The prblems f evaluatin and the needed criteria f evaluatin measures in the SMART system f infrmatin retrieval are reviewed and discussed.

More information

Building to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems.

Building to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems. Building t Transfrmatins n Crdinate Axis Grade 5: Gemetry Graph pints n the crdinate plane t slve real-wrld and mathematical prblems. 5.G.1. Use a pair f perpendicular number lines, called axes, t define

More information

PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa

PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa There are tw parts t this lab. The first is intended t demnstrate hw t request and interpret the spatial diagnstics f a standard OLS regressin mdel using GeDa. The diagnstics prvide infrmatin abut the

More information

Math Foundations 20 Work Plan

Math Foundations 20 Work Plan Math Fundatins 20 Wrk Plan Units / Tpics 20.8 Demnstrate understanding f systems f linear inequalities in tw variables. Time Frame December 1-3 weeks 6-10 Majr Learning Indicatrs Identify situatins relevant

More information

Admin. MDP Search Trees. Optimal Quantities. Reinforcement Learning

Admin. MDP Search Trees. Optimal Quantities. Reinforcement Learning Admin Reinfrcement Learning Cntent adapted frm Berkeley CS188 MDP Search Trees Each MDP state prjects an expectimax-like search tree Optimal Quantities The value (utility) f a state s: V*(s) = expected

More information

More Tutorial at

More Tutorial at Answer each questin in the space prvided; use back f page if extra space is needed. Answer questins s the grader can READILY understand yur wrk; nly wrk n the exam sheet will be cnsidered. Write answers,

More information

Smoothing, penalized least squares and splines

Smoothing, penalized least squares and splines Smthing, penalized least squares and splines Duglas Nychka, www.image.ucar.edu/~nychka Lcally weighted averages Penalized least squares smthers Prperties f smthers Splines and Reprducing Kernels The interplatin

More information

4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression

4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression 4th Indian Institute f Astrphysics - PennState Astrstatistics Schl July, 2013 Vainu Bappu Observatry, Kavalur Crrelatin and Regressin Rahul Ry Indian Statistical Institute, Delhi. Crrelatin Cnsider a tw

More information

Lecture 5: Equilibrium and Oscillations

Lecture 5: Equilibrium and Oscillations Lecture 5: Equilibrium and Oscillatins Energy and Mtin Last time, we fund that fr a system with energy cnserved, v = ± E U m ( ) ( ) One result we see immediately is that there is n slutin fr velcity if

More information

Lecture 17: Free Energy of Multi-phase Solutions at Equilibrium

Lecture 17: Free Energy of Multi-phase Solutions at Equilibrium Lecture 17: 11.07.05 Free Energy f Multi-phase Slutins at Equilibrium Tday: LAST TIME...2 FREE ENERGY DIAGRAMS OF MULTI-PHASE SOLUTIONS 1...3 The cmmn tangent cnstructin and the lever rule...3 Practical

More information

ECEN 4872/5827 Lecture Notes

ECEN 4872/5827 Lecture Notes ECEN 4872/5827 Lecture Ntes Lecture #5 Objectives fr lecture #5: 1. Analysis f precisin current reference 2. Appraches fr evaluating tlerances 3. Temperature Cefficients evaluatin technique 4. Fundamentals

More information

Equilibrium of Stress

Equilibrium of Stress Equilibrium f Stress Cnsider tw perpendicular planes passing thrugh a pint p. The stress cmpnents acting n these planes are as shwn in ig. 3.4.1a. These stresses are usuall shwn tgether acting n a small

More information

Logistic Regression. and Maximum Likelihood. Marek Petrik. Feb

Logistic Regression. and Maximum Likelihood. Marek Petrik. Feb Lgistic Regressin and Maximum Likelihd Marek Petrik Feb 09 2017 S Far in ML Regressin vs Classificatin Linear regressin Bias-variance decmpsitin Practical methds fr linear regressin Simple Linear Regressin

More information

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint Biplts in Practice MICHAEL GREENACRE Prfessr f Statistics at the Pmpeu Fabra University Chapter 13 Offprint CASE STUDY BIOMEDICINE Cmparing Cancer Types Accrding t Gene Epressin Arrays First published:

More information

Inference in the Multiple-Regression

Inference in the Multiple-Regression Sectin 5 Mdel Inference in the Multiple-Regressin Kinds f hypthesis tests in a multiple regressin There are several distinct kinds f hypthesis tests we can run in a multiple regressin. Suppse that amng

More information

Maximum A Posteriori (MAP) CS 109 Lecture 22 May 16th, 2016

Maximum A Posteriori (MAP) CS 109 Lecture 22 May 16th, 2016 Maximum A Psteriri (MAP) CS 109 Lecture 22 May 16th, 2016 Previusly in CS109 Game f Estimatrs Maximum Likelihd Nn spiler: this didn t happen Side Plt argmax argmax f lg Mther f ptimizatins? Reviving an

More information

MODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards:

MODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards: MODULE FOUR This mdule addresses functins SC Academic Standards: EA-3.1 Classify a relatinship as being either a functin r nt a functin when given data as a table, set f rdered pairs, r graph. EA-3.2 Use

More information

[COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t o m a k e s u r e y o u a r e r e a d y )

[COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t o m a k e s u r e y o u a r e r e a d y ) (Abut the final) [COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t m a k e s u r e y u a r e r e a d y ) The department writes the final exam s I dn't really knw what's n it and I can't very well

More information

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007 CS 477/677 Analysis f Algrithms Fall 2007 Dr. Gerge Bebis Curse Prject Due Date: 11/29/2007 Part1: Cmparisn f Srting Algrithms (70% f the prject grade) The bjective f the first part f the assignment is

More information

8. Issues with Bases

8. Issues with Bases 8. Issues with Bases Nnparametric regressin ften tries t fit a mdel f the frm f(x) = M j=1 β j h j (x) + ǫ where the h j functins may pick ut specific cmpnents f x (multiple linear regressin), r be pwers

More information

Lecture 18: Multiclass Support Vector Machines

Lecture 18: Multiclass Support Vector Machines Fall, 2017 Outlines Overview of Multiclass Learning Traditional Methods for Multiclass Problems One-vs-rest approaches Pairwise approaches Recent development for Multiclass Problems Simultaneous Classification

More information

Computational modeling techniques

Computational modeling techniques Cmputatinal mdeling techniques Lecture 11: Mdeling with systems f ODEs In Petre Department f IT, Ab Akademi http://www.users.ab.fi/ipetre/cmpmd/ Mdeling with differential equatins Mdeling strategy Fcus

More information

Versatility of Singular Value Decomposition (SVD) January 7, 2015

Versatility of Singular Value Decomposition (SVD) January 7, 2015 Versatility f Singular Value Decmpsitin (SVD) January 7, 2015 Assumptin : Data = Real Data + Nise Each Data Pint is a clumn f the n d Data Matrix A. Assumptin : Data = Real Data + Nise Each Data Pint is

More information

UNIV1"'RSITY OF NORTH CAROLINA Department of Statistics Chapel Hill, N. C. CUMULATIVE SUM CONTROL CHARTS FOR THE FOLDED NORMAL DISTRIBUTION

UNIV1'RSITY OF NORTH CAROLINA Department of Statistics Chapel Hill, N. C. CUMULATIVE SUM CONTROL CHARTS FOR THE FOLDED NORMAL DISTRIBUTION UNIV1"'RSITY OF NORTH CAROLINA Department f Statistics Chapel Hill, N. C. CUMULATIVE SUM CONTROL CHARTS FOR THE FOLDED NORMAL DISTRIBUTION by N. L. Jlmsn December 1962 Grant N. AFOSR -62..148 Methds f

More information

24 Multiple Eigenvectors; Latent Factor Analysis; Nearest Neighbors

24 Multiple Eigenvectors; Latent Factor Analysis; Nearest Neighbors Multiple Eigenvectrs; Latent Factr Analysis; Nearest Neighbrs 47 24 Multiple Eigenvectrs; Latent Factr Analysis; Nearest Neighbrs Clustering w/multiple Eigenvectrs [When we use the Fiedler vectr fr spectral

More information

MATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank

MATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank MATCHING TECHNIQUES Technical Track Sessin VI Emanuela Galass The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Emanuela Galass fr the purpse f this wrkshp When can we use

More information

MODULE 1. e x + c. [You can t separate a demominator, but you can divide a single denominator into each numerator term] a + b a(a + b)+1 = a + b

MODULE 1. e x + c. [You can t separate a demominator, but you can divide a single denominator into each numerator term] a + b a(a + b)+1 = a + b . REVIEW OF SOME BASIC ALGEBRA MODULE () Slving Equatins Yu shuld be able t slve fr x: a + b = c a d + e x + c and get x = e(ba +) b(c a) d(ba +) c Cmmn mistakes and strategies:. a b + c a b + a c, but

More information

Comparison of hybrid ensemble-4dvar with EnKF and 4DVar for regional-scale data assimilation

Comparison of hybrid ensemble-4dvar with EnKF and 4DVar for regional-scale data assimilation Cmparisn f hybrid ensemble-4dvar with EnKF and 4DVar fr reginal-scale data assimilatin Jn Pterjy and Fuqing Zhang Department f Meterlgy The Pennsylvania State University Wednesday 18 th December, 2013

More information

Eric Klein and Ning Sa

Eric Klein and Ning Sa Week 12. Statistical Appraches t Netwrks: p1 and p* Wasserman and Faust Chapter 15: Statistical Analysis f Single Relatinal Netwrks There are fur tasks in psitinal analysis: 1) Define Equivalence 2) Measure

More information

IN a recent article, Geary [1972] discussed the merit of taking first differences

IN a recent article, Geary [1972] discussed the merit of taking first differences The Efficiency f Taking First Differences in Regressin Analysis: A Nte J. A. TILLMAN IN a recent article, Geary [1972] discussed the merit f taking first differences t deal with the prblems that trends

More information

A Correlation of. to the. South Carolina Academic Standards for Mathematics Precalculus

A Correlation of. to the. South Carolina Academic Standards for Mathematics Precalculus A Crrelatin f Suth Carlina Academic Standards fr Mathematics Precalculus INTRODUCTION This dcument demnstrates hw Precalculus (Blitzer), 4 th Editin 010, meets the indicatrs f the. Crrelatin page references

More information

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank CAUSAL INFERENCE Technical Track Sessin I Phillippe Leite The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Phillippe Leite fr the purpse f this wrkshp Plicy questins are causal

More information