Suggestions - Problem Set 3

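4.2 (a) Show the discriminant condition (1) takes the form

\[ x^T \Sigma^{-1}(\mu_2 - \mu_1) > \tfrac{1}{2}\mu_2^T \Sigma^{-1}\mu_2 - \tfrac{1}{2}\mu_1^T \Sigma^{-1}\mu_1 + \ln\frac{N_1}{N} - \ln\frac{N_2}{N}, \]

as desired. We then replace the quantities $\mu_i$, $\Sigma$ by their estimates $\hat\mu_i$, $\hat\Sigma$ to get the proper form for this discriminant.

(b) Here, using the output notation $y = -N/N_1$ and $y = N/N_2$ for classes 1 and 2 respectively, you want to minimize

\[ \sum_{i=1}^{N} (y_i - \beta_0 - x_i^T\beta)^2 = \| y - \tilde{X}\tilde\beta \|^2, \]

where $\tilde\beta = \begin{pmatrix} \beta_0 \\ \beta \end{pmatrix}$, letting

\[ X = \begin{pmatrix} x_1^T \\ \vdots \\ x_N^T \end{pmatrix} \quad\text{and}\quad \tilde{X} = [\,\mathbf{1}\ \ X\,] = \begin{pmatrix} 1 & x_1^T \\ \vdots & \vdots \\ 1 & x_N^T \end{pmatrix}. \]

In general, vectors and matrices with a ~ on them represent vectors augmented with 1's (and in some cases 0's). Use the usual least squares so that $\tilde{X}\tilde\beta$ approximates $y$: $\tilde\beta = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T y$. Thus

\[ \tilde{X}^T\tilde{X}\,\tilde\beta = \tilde{X}^T y. \qquad (1) \]

First consider the right-hand side, $\tilde{X}^T y$. Without loss of generality you can arrange the data so the first $N_1$ examples $(x_i, y_i)$ are in the first class and the last $N_2$ are in the second. Thus show the right side of (1) becomes

\[ \tilde{X}^T y = \begin{pmatrix} \mathbf{1}^T \\ X^T \end{pmatrix} y = \begin{pmatrix} \mathbf{1}^T y \\ X^T y \end{pmatrix}, \qquad \mathbf{1}^T y = 0. \]

Meantime show

\[ X^T y = -\frac{N}{N_1}\sum_{j=1}^{N_1} x_j + \frac{N}{N_2}\sum_{j=N_1+1}^{N} x_j = N(\hat\mu_2 - \hat\mu_1). \]

So

\[ \tilde{X}^T y = \begin{pmatrix} 0 \\ N(\hat\mu_2 - \hat\mu_1) \end{pmatrix}. \qquad (2) \]

To calculate the left side of (1), you can write

\[ \tilde{X} = \begin{pmatrix} 1 & x_1^T \\ \vdots & \vdots \\ 1 & x_N^T \end{pmatrix} = [\,\mathbf{1}\ \ X\,]. \]

Let $\tilde{M} = [\,\mathbf{1}\ \ M\,]$, where $M$ is the $N \times p$ matrix whose first $N_1$ rows are copies of $\hat\mu_1^T$ and whose last $N_2$ rows are copies of $\hat\mu_2^T$. Here $\mathbf{1}$ is always a column vector of length $N$ with all 1's. Then show

\[ \mathbf{1}^T X = \mathbf{1}^T M = N_1\hat\mu_1^T + N_2\hat\mu_2^T. \]

Thus show

\[ \tilde{X}^T\tilde{X} = \begin{pmatrix} N & N_1\hat\mu_1^T + N_2\hat\mu_2^T \\ N_1\hat\mu_1 + N_2\hat\mu_2 & X^T X \end{pmatrix}. \]

Now from the relationship $(\tilde{X}^T\tilde{X})\tilde\beta = \begin{pmatrix} 0 \\ N(\hat\mu_2 - \hat\mu_1) \end{pmatrix}$ and $\hat{y} = \tilde{X}\tilde\beta = [\,\mathbf{1}\ \ X\,]\begin{pmatrix} \beta_0 \\ \beta \end{pmatrix} = \beta_0\mathbf{1} + X\beta$, if you average over the entries $\hat{y}_i$ of $\hat{y}$, show

\[ 0 = \frac{1}{N}\mathbf{1}^T\hat{y} = \beta_0 + \frac{1}{N}(N_1\hat\mu_1 + N_2\hat\mu_2)^T\beta, \]

so now

\[ \beta_0 = -\frac{1}{N}(N_1\hat\mu_1 + N_2\hat\mu_2)^T\beta. \]

You can write

\[ \tilde{X}^T\tilde{X}\,\tilde\beta = \begin{pmatrix} 0 \\ \left[ X^T X - \frac{1}{N}(N_1\hat\mu_1 + N_2\hat\mu_2)(N_1\hat\mu_1 + N_2\hat\mu_2)^T \right]\beta \end{pmatrix}. \qquad (3) \]

Next,

\[ X^T X = (X - M)^T(X - M) + X^T M + M^T X - M^T M. \]

But show $(X - M)^T(X - M) = (N-2)\hat\Sigma$ and

\[ X^T M = \sum_{j=1}^{N_1} x_j\hat\mu_1^T + \sum_{j=N_1+1}^{N} x_j\hat\mu_2^T = N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T. \]

Thus $M^T X = M^T M = N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T$, and

\[ X^T X = (N-2)\hat\Sigma + N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T. \]

So by (3) above, show

\[ \tilde{X}^T\tilde{X}\,\tilde\beta = \begin{pmatrix} 0 \\ \left[ (N-2)\hat\Sigma + N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T - \frac{1}{N}(N_1\hat\mu_1 + N_2\hat\mu_2)(N_1\hat\mu_1 + N_2\hat\mu_2)^T \right]\beta \end{pmatrix}. \qquad (4) \]

Now show the bottom term coefficient is

\[ (N-2)\hat\Sigma + N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T - \frac{1}{N}(N_1\hat\mu_1 + N_2\hat\mu_2)(N_1\hat\mu_1 + N_2\hat\mu_2)^T = (N-2)\hat\Sigma + \frac{N_1 N_2}{N}(\hat\mu_2 - \hat\mu_1)(\hat\mu_2 - \hat\mu_1)^T = (N-2)\hat\Sigma + N\hat\Sigma_B, \qquad (5) \]

where $\hat\Sigma_B = \frac{N_1 N_2}{N^2}(\hat\mu_2 - \hat\mu_1)(\hat\mu_2 - \hat\mu_1)^T$. Now use (1), (2), (4) and (5) to obtain (4.56): $[(N-2)\hat\Sigma + N\hat\Sigma_B]\beta = N(\hat\mu_2 - \hat\mu_1)$.

(c) It follows that

\[ \hat\Sigma_B\beta = \frac{N_1 N_2}{N^2}(\hat\mu_2 - \hat\mu_1)\left[(\hat\mu_2 - \hat\mu_1)^T\beta\right] = \frac{N_1 N_2}{N^2}\left[(\hat\mu_2 - \hat\mu_1)^T\beta\right](\hat\mu_2 - \hat\mu_1), \]

which is clearly in the direction of $(\hat\mu_2 - \hat\mu_1)$, since $[(\hat\mu_2 - \hat\mu_1)^T\beta]$ is a scalar (why?). Finally from (4.56),

\[ \beta = \left((N-2)\hat\Sigma\right)^{-1}\left( N(\hat\mu_2 - \hat\mu_1) - \frac{N_1 N_2}{N}\left[(\hat\mu_2 - \hat\mu_1)^T\beta\right](\hat\mu_2 - \hat\mu_1) \right) = (\text{scalar})\cdot\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1). \]

(d) Changing the coding for the two $y$ values transforms the pair of numbers $-N/N_1$ and $N/N_2$ respectively into another pair $a$ and $b$ of possible $y$ values. Show that there is a linear scalar transformation $y' = cy + d = f(y)$ such that $f(-N/N_1) = a$ and $f(N/N_2) = b$. What are $c$ and $d$? Now show that if $y$ has only entries $-N/N_1$ and $N/N_2$, then in their place the vector $y' = cy + d\mathbf{1}$ will have $a$ and $b$ respectively. Further show that if we replace $y$ by $y'$ in the dataset $\{(x_i, y_i)\}_{i=1}^N$, then we will have a new

\[ \hat{y}' = \tilde{X}(\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T y' = Hy' = H(cy + d\mathbf{1}) = c\hat{y} + d\mathbf{1} \]

(why is $H\mathbf{1} = \mathbf{1}$? Recall $H$ is a projection). Thus the transformation to $\hat{y}'$ is exactly the same as the transformation to $y'$ above. Show in fact that the transformation acts in exactly the same way on each component of $\hat{y}$. Now show that the final selection of classes based on the new $\hat{y}' = (\hat{y}'_1, \ldots, \hat{y}'_N)^T$ will be based on each entry $\hat{y}'_i$, and whether it is closer to $a$ (choose class 1) or $b$ (choose class 2). Show that $\hat{y}'_i$ is closer to $a$ iff $\hat{y}_i$ is closer to $-N/N_1$.

(e) Now you have $\beta$ and $\beta_0$ and the regression function $f(x) = \beta_0 + x^T\beta$.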
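As a sanity check on parts (b) and (c), and on the quantities $\beta$ and $\beta_0$ used in part (e), the following is a minimal Python/NumPy sketch on simulated data. The sample sizes, the random seed, and all variable names are assumptions made for illustration and are not part of the problem. It fits least squares with the $-N/N_1$, $N/N_2$ coding and compares the slope with the LDA direction $\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1)$.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed (assumption)
N1, N2, p = 30, 50, 3                   # illustrative sizes (assumption)
N = N1 + N2
X = np.vstack([rng.normal(0.0, 1.0, (N1, p)),
               rng.normal(1.0, 1.0, (N2, p))])
# Target coding from the problem: -N/N1 for class 1, N/N2 for class 2.
y = np.concatenate([np.full(N1, -N / N1), np.full(N2, N / N2)])

# Least squares with an intercept column of 1's.
X_tilde = np.column_stack([np.ones(N), X])
coef, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)
beta0, beta = coef[0], coef[1:]

# Class means and pooled covariance with divisor N - 2.
mu1, mu2 = X[:N1].mean(axis=0), X[N1:].mean(axis=0)
R = np.vstack([X[:N1] - mu1, X[N1:] - mu2])
Sigma_hat = R.T @ R / (N - 2)
lda_dir = np.linalg.solve(Sigma_hat, mu2 - mu1)

# Best scalar k with beta ~ k * lda_dir, and the size of the mismatch.
k = beta @ lda_dir / (lda_dir @ lda_dir)
prop_err = np.linalg.norm(beta - k * lda_dir)

# Intercept predicted by the first normal equation.
beta0_formula = -(N1 * mu1 + N2 * mu2) @ beta / N
```

Here `prop_err` should be at rounding-error level, and `beta0` should agree with `beta0_formula`, which is what parts (b) and (c) predict.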

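From part (c), $\beta = k\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1)$ for some $k$. Thus show from above that

\[ \beta_0 = -\frac{1}{N}(N_1\hat\mu_1 + N_2\hat\mu_2)^T k\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1). \]

Recall the group targets ($y$-values) on which we have trained the regression are: class 1: $y = -N/N_1$; class 2: $y = N/N_2$. For an input test vector $x$, show that the corresponding $y$ will be in class 1 if $f(x)$ is closer to $-N/N_1$ than to $N/N_2$, and otherwise class 2. Show $y$ should be assigned to class 2 if $f(x) > \frac{1}{2}\left(\frac{N}{N_2} - \frac{N}{N_1}\right)$. Show from above that the criterion for class 2 assignment is

\[ f(x) = \left( x - \frac{1}{N}(N_1\hat\mu_1 + N_2\hat\mu_2) \right)^T k\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) > \frac{1}{2}\left(\frac{N}{N_2} - \frac{N}{N_1}\right), \]

or, dividing by $k$ (which is positive),

\[ x^T\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) > \frac{1}{N}(N_1\hat\mu_1 + N_2\hat\mu_2)^T\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) + \frac{1}{2k}\left(\frac{N}{N_2} - \frac{N}{N_1}\right). \]

Is this the same as the LDA criterion in (a)? Now assume $N_1 = N_2 = N/2$: what happens then?

4.3 Recall the LDA criterion for choosing the group $l$ out of groups $1, \ldots, K$ given a test feature vector $x$ is

\[ l = \arg\max_k \delta_k(x), \]

i.e., finding the $l = k$ which makes $\delta_k(x)$ the largest. Here as usual

\[ \delta_k(x) = x^T\Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \ln\pi_k. \]

This problem is related to the discussion in section 4.2 involving the use of a regression approach to distinguish among the $K$ predictions. This works by choosing targets (representatives of the $K$ classes, to be set equal to the response variable $y$) as follows. For a vector $x$ whose class is $k$, we choose the response variable to be $y = (0, \ldots, 0, 1, 0, \ldots, 0)$ (a row vector), with a 1 only in the $k$-th position. Then if we are given a training set $\mathcal{T} = \{(x_i, y_i)\}_{i=1}^N$, the responses are no longer $y_i = 0$ or $1$, but vectors $y_i$ with a 1 in the $k$-th position if the class assigned to $x_i$ is group $k$.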
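The indicator coding just described can be sketched in a few lines of Python/NumPy. The helper name `indicator_matrix` and the zero-based class labels are assumptions made for this illustration; the text numbers classes $1, \ldots, K$.

```python
import numpy as np

def indicator_matrix(labels, K):
    """Return the N x K matrix whose i-th row has a single 1 in column labels[i].

    Hypothetical helper: labels are zero-based here (0, ..., K-1).
    """
    labels = np.asarray(labels)
    Y = np.zeros((labels.size, K))
    Y[np.arange(labels.size), labels] = 1.0
    return Y

labels = [0, 2, 1, 2]          # four samples, three classes (illustrative)
Y = indicator_matrix(labels, K=3)
```

Each row of `Y` is one response vector $y_i$, so every row contains exactly one 1.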

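As shown in the text, the appropriate regression here works exactly as in the case where the responses are scalars, except that the usual vector

\[ y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \]

with each row a scalar (0 or 1) representing the class of the $x_i$, is replaced by a matrix

\[ Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \]

with each class indicator $y_i$ indicating the class through the position of its only entry 1 (note again that each $y_i$ is a row vector). By adding the usual column of 1's we form

\[ \tilde{X} = \begin{pmatrix} 1 & x_1^T \\ \vdots & \vdots \\ 1 & x_N^T \end{pmatrix}. \]

Otherwise the regression process is the same, with the vector $y$ replaced by the matrix $Y$. Now following the regression discussion in the text, the usual estimated value $\hat{y}$ of $y$ is replaced, using the same formula, to get an estimated value $\hat{Y}$ of $Y$:

\[ \hat{Y} = \tilde{X}(\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T Y, \qquad (7) \]

which has exactly the same form as standard regression, with $y$ replaced by $Y$. Notice

\[ \hat{Y} = \tilde{X}\hat{B} \qquad (8a) \]

with

\[ \hat{B} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T Y. \qquad (8c) \]

Note that, as in our general regression discussion, the matrix $\tilde{X}$ is assumed to already contain an initial column of 1's. Row-wise, defining $\hat{y}_i$ to be the $i$-th row of

\[ \hat{Y} = \begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_N \end{pmatrix}, \]

equation (8a) is equivalent to $\hat{y}_i = \tilde{x}_i^T\hat{B}$; here and elsewhere the tilde ~ on a vector means we have added a 1 in the initial position: $\tilde{x}_i = \begin{pmatrix} 1 \\ x_i \end{pmatrix}$.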
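Formulas (7), (8a) and (8c) can be tried out directly. The sketch below uses simulated inputs and random class labels; the sizes, seed, and variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)           # arbitrary seed (assumption)
N, p, K = 60, 4, 3                       # illustrative sizes (assumption)
X = rng.normal(size=(N, p))
labels = rng.integers(0, K, size=N)      # zero-based classes
Y = np.eye(K)[labels]                    # indicator response matrix

# Augment with the initial column of 1's.
X_tilde = np.column_stack([np.ones(N), X])

# (8c): B_hat = (X~^T X~)^{-1} X~^T Y, computed with a linear solve.
B_hat = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ Y)

# (8a): Y_hat = X~ B_hat, which is the same matrix as in (7).
Y_hat = X_tilde @ B_hat

# Row-wise form of (8a): y_hat_i = x~_i^T B_hat, shown for the first sample.
row0 = X_tilde[0] @ B_hat
```

`B_hat` has shape `(p + 1, K)`, one column of coefficients per class. Each row of `Y_hat` sums to 1, because the rows of `Y` sum to 1 and the column of 1's lies in the column space of `X_tilde`.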

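Note that whereas previously $\hat\beta$ was given by the same formula (8c), now $\hat{B}$ is a matrix instead of a vector. Whereas previously we had $\hat{y}_i = \tilde{x}_i^T\hat\beta$ as the estimated value of $y_i$ within the dataset, we now have instead

\[ \hat{y}_i = \tilde{x}_i^T\hat{B}, \qquad (9) \]

where $\hat{y}_i$ is a vector (the $i$-th row of $\hat{Y}$).

We are asking what would happen if we simply replace this training set $\mathcal{T} = \{(x_i, y_i)\}_{i=1}^N$ with a new training set, replacing the input vectors $x_i$ by the corresponding estimates $\hat{y}_i^T$, so that the training set now looks like $\mathcal{T}' = \{(\hat{y}_i^T, y_i)\}_{i=1}^N$. Note that we are using the transpose in $\hat{y}_i^T$ because we want it to be a column vector (why?), replacing the original column input vector $x_i$. Equivalently, we are replacing the training matrix

\[ X = \begin{pmatrix} x_1^T \\ \vdots \\ x_N^T \end{pmatrix} \quad\text{by the response vector set}\quad \hat{Y} = \begin{pmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_N \end{pmatrix}. \]

We wish to show that if we use this new dataset in both training and testing, then we will still get the same class predictions for a new test vector $x$, but using LDA (not regression here). Show that, given that $l = \arg\max_k \delta_k(x)$, we just need to check how the computation of the $\delta_k(x)$ changes using the new data set. Note the training data now have the form

\[ \mathcal{T}' = \{(\hat{y}_i^T, y_i)\}_{i=1}^N = \{(\hat{B}^T\tilde{x}_i, y_i)\}_{i=1}^N. \]

The original discriminant function has the form

\[ \delta_k(x) = x^T\hat\Sigma^{-1}\hat\mu_k - \tfrac{1}{2}\hat\mu_k^T\hat\Sigma^{-1}\hat\mu_k + \ln\pi_k, \qquad (10) \]

where $\pi_k = N_k/N$. Show the new discriminant function $\delta'_k$ has the form

\[ \delta'_k(\hat{y}) = \hat{y}\,\hat\Sigma'^{-1}\hat\mu'_k - \tfrac{1}{2}\hat\mu'^T_k\hat\Sigma'^{-1}\hat\mu'_k + \ln\pi_k, \]

where

\[ \hat\mu'_k = \frac{1}{N_k}\sum_{j:\,g(j)=k}\hat{y}_j^T = \frac{1}{N_k}\sum_{j:\,g(j)=k}\hat{B}^T\tilde{x}_j = \hat{B}^T\frac{1}{N_k}\sum_{j:\,g(j)=k}\tilde{x}_j = \hat{B}^T\tilde\mu_k, \]

\[ \hat\Sigma' = \frac{1}{N-K}\sum_{j=1}^{N}\left(\hat{y}_j^T - \hat\mu'_{g(j)}\right)\left(\hat{y}_j^T - \hat\mu'_{g(j)}\right)^T, \]

where, as usual, this estimator represents the pooled estimate of the variance of the vectors of interest based on their individual groups, but with $x$ replaced by $\hat{y}$. Here $g(j)$ represents the group (out of the $K$ total) of the $j$-th sample.

You wish to show that the modified discriminant function makes the same decisions as the original one, i.e., that whenever $\hat{y} = \tilde{x}^T\hat{B}$,

\[ \delta'_k(\hat{y}) > \delta'_l(\hat{y}) \quad\text{iff}\quad \delta_k(x) > \delta_l(x). \]

But note that $\delta'_k(\hat{y})$ has the form (10), with $\hat\mu'_k = (\tilde\mu_k^T\hat{B})^T = \hat{B}^T\tilde\mu_k$, and show

\[ \hat\Sigma' = \frac{1}{N-K}\sum_{i=1}^{N}\left(\hat{y}_i^T - \hat\mu'_{g(i)}\right)\left(\hat{y}_i^T - \hat\mu'_{g(i)}\right)^T = \frac{1}{N-K}\sum_{i=1}^{N}\hat{B}^T\left(\tilde{x}_i - \tilde\mu_{g(i)}\right)\left(\tilde{x}_i - \tilde\mu_{g(i)}\right)^T\hat{B} = \hat{B}^T\tilde\Sigma\hat{B}, \]

where, because our vectors are augmented to have a 1 in the first position (and thus are of length $p+1$), we must also augment the covariance matrix $\hat\Sigma$ in order to be $(p+1)\times(p+1)$, by adding a first row and first column of 0's. That is, we define

\[ \tilde\Sigma = \begin{pmatrix} 0 & 0_p^T \\ 0_p & \hat\Sigma \end{pmatrix}, \]

where $0_p$ is a column vector of length $p$ with all zeroes, and the upper left corner is 0. Of course $\hat\Sigma$ is the estimator of $\Sigma$. Thus show we can write

\[ \delta'_k(\hat{y}) = \hat{y}\,\hat\Sigma'^{-1}\hat\mu'_k - \tfrac{1}{2}\hat\mu'^T_k\hat\Sigma'^{-1}\hat\mu'_k + \ln\pi_k = \tilde{x}^T\hat{B}(\hat{B}^T\tilde\Sigma\hat{B})^{-1}\hat{B}^T\tilde\mu_k - \tfrac{1}{2}\tilde\mu_k^T\hat{B}(\hat{B}^T\tilde\Sigma\hat{B})^{-1}\hat{B}^T\tilde\mu_k + \ln\pi_k. \qquad (10a) \]

Now to define the square root of a matrix. For any $p \times p$ square symmetric invertible matrix $A$, assume that $\{\lambda_i, a_i\}_{i=1}^p$ are its eigenvalues and corresponding eigenvectors. For a function $f(x)$ define $f(A)$ to be the matrix with the same eigenvectors $a_i$, but eigenvalues $f(\lambda_i)$. Thus $A^{1/2}$ would have $\{\lambda_i^{1/2}, a_i\}_{i=1}^p$ as its eigenvalue-eigenvector pairs.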
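The definition of $f(A)$ through the eigen-decomposition translates almost directly into code. The sketch below is an illustration only: the helper name `matrix_function` is hypothetical, and the test matrix is taken to be symmetric positive definite (as a covariance estimate would be) so that the square roots are real.

```python
import numpy as np

def matrix_function(A, f):
    """Apply f to the eigenvalues of a symmetric matrix A, keeping its eigenvectors."""
    lam, vecs = np.linalg.eigh(A)        # columns of vecs are the eigenvectors a_i
    return (vecs * f(lam)) @ vecs.T      # sum_i f(lambda_i) a_i a_i^T

rng = np.random.default_rng(2)           # arbitrary seed (assumption)
G = rng.normal(size=(4, 4))
A = G @ G.T + 4.0 * np.eye(4)            # symmetric positive definite example

A_half = matrix_function(A, np.sqrt)                       # A^{1/2}
A_inv_half = matrix_function(A, lambda lam: lam ** -0.5)   # A^{-1/2}
```

With these definitions `A_half @ A_half` reproduces `A`, and `A_inv_half @ A @ A_inv_half` is the identity, which is the property used when the data are rescaled by $\hat\Sigma^{-1/2}$.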

Now we replace our dataset $x_i \to \hat\Sigma^{-1/2}x_i$. This leads to the replacement $X \to X\hat\Sigma^{-1/2}$. Clearly $\hat{Y}$ in (7) does not change under this transformation (why?). However, now with the transformed values, show we have

\[ V(x) = V(\Sigma^{-1/2}x) = \Sigma^{-1/2}\Sigma\Sigma^{-1/2} = I. \]

This transformed dataset thus leads to the same function $\hat{Y}$, but gives a linear discriminant which has changed to (now $\Sigma = I$)

\[ \delta_k(x) = x^T\mu_k - \tfrac{1}{2}\mu_k^T\mu_k + \ln\pi_k. \qquad (11) \]

Show this is actually identical to that before; remember that the new $x$ equals the old $x$ times $\Sigma^{-1/2}$ and we have also computed the $\mu_k$ from the new dataset. Thus the classes obtained from using the discriminant in (11) (using the new transformed data set) will be identical to what the predicted classes were before. Also show the transformed discriminant function (now changing these $x$ into the unchanged $\hat{y}$ and forming the resulting discriminant) must be exactly the same as when we replaced $x$ by $\hat{y}$ before, since the dataset $\{\hat{y}_i, y_i\}_{i=1}^N$ is identical (see (9a)). Thus we need only show the result of this problem for the new (transformed) dataset $\{x_i, y_i\}$, where the new $x_i$ are defined as above.

Show, using the same argument, that we can again replace our dataset so that each current datapoint $x_i$ will be replaced by $x_i - \mu$, with $\mu = \frac{1}{N}\sum_{i=1}^{N}x_i$, i.e. the overall current mean of all $x_i$ (regardless of class). Show this does not change the covariance $\Sigma$, and in terms of the new dataset the identical discriminant function will now be

\[ \delta_k(x) = (x + \mu)^T(\mu_k + \mu) - \tfrac{1}{2}(\mu_k + \mu)^T(\mu_k + \mu) + \ln\pi_k \qquad (13) \]

(again with $x$ and $\mu_k$ obtained from the new mean-subtracted dataset). Show that translating all data by the same amount $\mu$ will not change the relative size of the discriminant functions, and so if we replace the old discriminant function (13) by

\[ \delta_k(x) = x^T\mu_k - \tfrac{1}{2}\mu_k^T\mu_k + \ln\pi_k \qquad (15) \]

then clearly this will not affect whether $\delta_k(x) > \delta_l(x)$ or not. Furthermore, since $\hat{Y}$ is a standard regression estimator (just with multiple columns), show a translation of the dataset will not affect the prediction, so that with this new dataset the $\hat{Y}$ we obtain is identical to the previous one.
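The two invariance claims above (rescaling by $\hat\Sigma^{-1/2}$ and then subtracting the overall mean leaves $\hat{Y}$ unchanged, while the pooled covariance becomes $I$) can be checked numerically. This is a minimal sketch on simulated data; the mixing matrix, sizes, seed, and the helper names `fitted` and `pooled_cov` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)                 # arbitrary seed (assumption)
N, p, K = 90, 3, 3
labels = np.repeat(np.arange(K), N // K)       # zero-based classes, 30 each
mix = np.array([[2.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 1.5]])              # makes the coordinates correlated
X = rng.normal(size=(N, p)) @ mix + labels[:, None]
Y = np.eye(K)[labels]                          # indicator response matrix

def fitted(X, Y):
    """Least-squares fitted values Y_hat for inputs X with an intercept column."""
    Xt = np.column_stack([np.ones(len(X)), X])
    return Xt @ np.linalg.lstsq(Xt, Y, rcond=None)[0]

def pooled_cov(X, labels, K):
    """Pooled within-class covariance with divisor N - K."""
    means = np.vstack([X[labels == k].mean(axis=0) for k in range(K)])
    R = X - means[labels]
    return R.T @ R / (len(X) - K)

# Sigma^{-1/2} from the eigen-decomposition of the pooled covariance.
lam, vecs = np.linalg.eigh(pooled_cov(X, labels, K))
S_inv_half = (vecs * lam ** -0.5) @ vecs.T

Z = X @ S_inv_half          # rescale: x_i -> Sigma^{-1/2} x_i
Z = Z - Z.mean(axis=0)      # translate: subtract the overall mean
```

Comparing `fitted(X, Y)` with `fitted(Z, Y)` shows the fitted values agree up to rounding, and `pooled_cov(Z, labels, K)` is numerically the identity matrix.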

Thus at this point you have reduced the problem to having a dataset with empirical mean 0 and standard deviation 1 for each coordinate, and we still have the same discriminant function (15) and estimator $\hat{Y}$, derived in the same way from the new data $X$ and the outcome matrix $Y$. This means we are using the discriminant function (11) above; show equation (10a) becomes

\[ \delta'_k(\hat{y}) = x^T B(B^T B)^{-1}B^T\mu_k - \tfrac{1}{2}\mu_k^T B(B^T B)^{-1}B^T\mu_k + \ln\pi_k. \qquad (17) \]

But show that $H = B(B^T B)^{-1}B^T$ is just the projection onto the column space of $B$ (see the discussion on p. 46 of the hat function, which projects $y$ onto the column space of $X$, giving $\hat{y}$). Now show that $\mu_k$ is in the column space of $B$. Note that from our assumption of covariance $I$ and mean 0 for the $x_i$, it follows that $X^T X = I$, so $B = (X^T X)^{-1}X^T Y = X^T Y$. Thus the $k$-th column $b_k$ of $B$ is just $b_k = X^T y^{(k)}$, where $y^{(k)}$ denotes the $k$-th column (not the $k$-th row) of $Y$. But from the definition of $X^T = [\,x_1\ x_2\ \cdots\ x_N\,]$ and $y^{(k)}$ (whose $i$-th entry is 0 unless $x_i$ is in class $k$), the column $b_k$ must be just

\[ b_k = \sum_{i:\,g(i)=k} x_i = N_k\mu_k, \]

i.e., a multiple of $\mu_k$. Thus, clearly all $\mu_k$ are in the column space of $B$, and hence $H\mu_k = \mu_k$ for all $k$. Thus by (17) show

\[ \delta'_k(\hat{y}) = x^T H\mu_k - \tfrac{1}{2}\mu_k^T H\mu_k + \ln\pi_k = x^T\mu_k - \tfrac{1}{2}\mu_k^T\mu_k + \ln\pi_k, \]

that is, the $\hat{Y}$-based discriminant function gives identical values to the discriminant (15), which we have shown gives identical choices to the discriminant based on the original dataset, as desired.
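The conclusion of Problem 4.3 can also be checked empirically. The sketch below is an illustration under stated assumptions, not the argument given above: it simulates Gaussian classes, runs LDA on the original inputs and on the regression outputs $\hat{y}$, and compares the predicted classes. The helper names `lda_fit`, `lda_predict` and `to_yhat`, the sizes, and the seed are all hypothetical. One practical choice made here that the text does not discuss: because the columns of $\hat{Y}$ sum to 1, the covariance of the full $K$-vector $\hat{y}$ is singular, so the sketch keeps only the first $K-1$ columns of $\hat{y}$.

```python
import numpy as np

def lda_fit(Z, labels, K):
    """Class means, pooled covariance (divisor N - K) and priors N_k / N."""
    N = len(Z)
    means = np.vstack([Z[labels == k].mean(axis=0) for k in range(K)])
    R = Z - means[labels]
    Sigma = R.T @ R / (N - K)
    priors = np.bincount(labels, minlength=K) / N
    return means, Sigma, priors

def lda_predict(Z, means, Sigma, priors):
    """Return argmax_k of z^T S^{-1} mu_k - 0.5 mu_k^T S^{-1} mu_k + ln pi_k."""
    A = np.linalg.solve(Sigma, means.T)              # columns: Sigma^{-1} mu_k
    scores = Z @ A - 0.5 * np.sum(means.T * A, axis=0) + np.log(priors)
    return scores.argmax(axis=1)

rng = np.random.default_rng(4)                       # arbitrary seed (assumption)
N_per, p, K = 40, 5, 3
labels = np.repeat(np.arange(K), N_per)              # zero-based classes
centers = rng.normal(scale=1.5, size=(K, p))
X = centers[labels] + rng.normal(size=(K * N_per, p))
X_new = centers[rng.integers(0, K, 200)] + rng.normal(size=(200, p))

# Indicator regression: B_hat = (X~^T X~)^{-1} X~^T Y.
Y = np.eye(K)[labels]
Xt = np.column_stack([np.ones(len(X)), X])
B_hat = np.linalg.lstsq(Xt, Y, rcond=None)[0]

def to_yhat(X):
    """Map inputs to y_hat = x~^T B_hat, keeping K - 1 columns (see lead-in)."""
    return (np.column_stack([np.ones(len(X)), X]) @ B_hat)[:, :K - 1]

pred_x = lda_predict(X_new, *lda_fit(X, labels, K))
pred_y = lda_predict(to_yhat(X_new), *lda_fit(to_yhat(X), labels, K))
```

On this simulated data `pred_x` and `pred_y` should coincide for every test point, in line with the claim that LDA on $\hat{y}$ makes the same class choices as LDA on the original $x$.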