Pairwise-Covariance Linear Discriminant Analysis


Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence

Pairwise-Covariance Linear Discriminant Analysis

Deguang Kong and Chris Ding
Department of Computer Science & Engineering, University of Texas, Arlington, 500 UTA Blvd, TX
doogkog@gmail.com; chqding@uta.edu

Abstract

In machine learning, linear discriminant analysis (LDA) is a popular dimension reduction method. In this paper, we first provide a new perspective of LDA from an information theory perspective. From this new perspective, we propose a new formulation of LDA, which uses the pairwise averaged class covariance instead of the globally averaged class covariance used in standard LDA. This pairwise (averaged) covariance describes the data distribution more accurately. The new perspective also provides a natural way to properly weight the different pairwise distances, emphasizing the pairs of classes with small distances; this leads to the proposed pairwise-covariance properly weighted LDA (pcLDA). The kernel version of pcLDA is presented to handle nonlinear projections. Efficient algorithms are presented to compute the proposed models.

Introduction

In the big data era, a large amount of high-dimensional data (e.g., DNA microarrays, social blogs, image scenes, etc.) is available for data analysis in different applications. Linear Discriminant Analysis (LDA) (Hastie, Tibshirani, and Friedman 2001) is one of the most popular methods for dimension reduction and has shown state-of-the-art performance. The key idea of LDA is to find an optimal linear transformation which projects the data into a low-dimensional space where the data achieves maximum inter-class separability. The optimal solution to LDA is generally obtained by solving an eigenvalue problem.

Despite the popularity and effectiveness of LDA, the standard LDA model does not emphasize the pairwise-class distances; it simply takes an average of the metrics computed over different pairs (i.e., in the computation of the between-class scatter matrix $S_b$ or the within-class scatter matrix $S_w$). Thus some pairwise class distances are depressed, especially for those pairs whose original class distances are relatively large. To overcome this issue, in this paper we present a new formulation of pairwise linear discriminant analysis. To obtain a discriminant projection, the proposed method considers all the pairwise between-class and within-class distances. We call it pairwise-covariance LDA (pcLDA). The pcLDA problem is then cast as an optimization problem which maximizes the class separability computed from the pairwise distances. An efficient algorithm is proposed to solve the resulting problem, and experimental results indicate the good performance of the proposed method.

Copyright (c) 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

A new perspective of LDA

Standard linear discriminant analysis (LDA) seeks a projection $G = (\mathbf{g}_1, \cdots, \mathbf{g}_{K-1}) \in \Re^{p \times (K-1)}$ which maximizes the class separability by solving
$$\max_G \mathrm{Tr}\,\frac{G^T S_b G}{G^T S_w G} = \max_G \mathrm{Tr}\big[(G^T S_b G)(G^T S_w G)^{-1}\big], \qquad (1)$$
where $S_w$ is the within-class scatter matrix and $S_b$ is the between-class scatter matrix, given by
$$S_b = \frac{1}{n}\sum_{k=1}^{K} n_k (\mu_k - \mu)(\mu_k - \mu)^T, \quad
S_w = \frac{1}{n}\sum_{k=1}^{K} n_k \Sigma_k, \quad
\Sigma_k = \frac{1}{n_k}\sum_{\mathbf{x}_i \in C_k} (\mathbf{x}_i - \mu_k)(\mathbf{x}_i - \mu_k)^T, \qquad (2)$$
where $n_k$ is the number of data points in class $C_k$, $\mu_k \in \Re^{p\times 1}$ is the mean of the data from class $C_k$, and $\mu$ is the global mean of all the data. In the history of LDA (Hastie, Tibshirani, and Friedman 2001), the objective function of LDA evolved from Fisher's initial 2-class LDA:
$$\max_{\mathbf{g}} \frac{\mathbf{g}^T S_b \mathbf{g}}{\mathbf{g}^T S_w \mathbf{g}}.$$
For multi-class LDA, this can be generalized to either the trace-of-ratio objective of Eq.(1), or the following ratio-of-traces objective:
$$\max_G \frac{\mathrm{Tr}(G^T S_b G)}{\mathrm{Tr}(G^T S_w G)}. \qquad (3)$$
Mathematically, both generalizations are natural; there is no clear difference between them in terms of machine learning. The trace-of-ratio objective of Eq.(1) is the most widely used one. However, the ratio-of-traces objective of Eq.(3) has been used by many researchers, e.g., (Wang et al. 2007), (Kong and Ding 2012), etc. To our knowledge, there exists no clear explanation of the differences between these two LDA objectives. In this paper, we bridge this gap by providing theoretical support for the LDA objective of Eq.(1) from a KL-divergence perspective, which is described in Theorem 1 below.
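To make Eqs.(1)-(3) concrete, here is a minimal numpy sketch (ours, not from the paper) of standard multi-class LDA: it accumulates $S_w$ and $S_b$ as in Eq.(2) and solves the trace-of-ratio objective of Eq.(1) through the generalized eigenvalue problem $S_b\mathbf{g} = \lambda S_w\mathbf{g}$; the names X, y, n_components and the small ridge term are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def lda_fit(X, y, n_components=None):
    """Standard LDA: X is an n x p data matrix, y holds class labels; returns G of size p x (K-1)."""
    classes = np.unique(y)
    n, p = X.shape
    K = len(classes)
    mu = X.mean(axis=0)                              # global mean
    S_w = np.zeros((p, p))
    S_b = np.zeros((p, p))
    for k in classes:
        Xk = X[y == k]
        n_k, mu_k = Xk.shape[0], Xk.mean(axis=0)
        S_w += (Xk - mu_k).T @ (Xk - mu_k) / n       # n_k * Sigma_k / n, Eq.(2)
        d = (mu_k - mu).reshape(-1, 1)
        S_b += n_k * (d @ d.T) / n                   # between-class scatter, Eq.(2)
    # Trace-of-ratio objective of Eq.(1): generalized eigenproblem S_b g = lambda S_w g.
    # A small ridge keeps S_w invertible on under-sampled data (an assumption of this sketch).
    evals, evecs = eigh(S_b, S_w + 1e-8 * np.eye(p))
    order = np.argsort(evals)[::-1]
    r = n_components if n_components is not None else K - 1
    return evecs[:, order[:r]]
```

Projecting with Y = X @ G then gives the (K-1)-dimensional representation that the rest of the paper works with.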

Figure 1: A synthetic data set of 150 data points, 50 data points per class. (a) Data distribution; (b) 1-dimensional projections of LDA and pcLDA (both subspaces (lines) pass through (0,0); we shift them to avoid clutter); (c) Enlarged 1-dim LDA and pcLDA.

From the KL-divergence to classic LDA

LDA assumes that the data points of each class k follow a Gaussian distribution. The covariance matrix of class k is called the within-class scatter matrix $S_w^k$. In this paper, we use "covariance" or "averaged covariance" instead of the usual within-class scatter matrix $S_w$ to emphasize the new perspective. The within-class scatter matrix defined in Eq.(2) is the globally averaged (i.e., averaged over all K classes) covariance matrix. Furthermore, we propose the pairwise averaged covariance as a better formulation, which is used in pcLDA.

We start with the KL-divergence between two Gaussian distributions $N_k(\mu_k, \Sigma_k)$ and $N_l(\mu_l, \Sigma_l)$ with the same covariances, $\Sigma_k = \Sigma_l = \Sigma_{kl}$. The KL-divergence of $N_k$ and $N_l$ is
$$D_{KL}(N_k \,\|\, N_l) = \tfrac{1}{2}(\mu_k - \mu_l)^T \Sigma_{kl}^{-1} (\mu_k - \mu_l). \qquad (4)$$
The KL-divergence is used as a measure of the distance between two classes. When the data are transformed using the projection G, i.e., we project $\mathbf{x}_i$ to the subspace $\mathbf{y}_i = G^T\mathbf{x}_i$, or $Y = G^TX$, the KL-divergence in Y-space is
$$D^Y_{KL}(N_k \,\|\, N_l) = \tfrac{1}{2}(\mu_k - \mu_l)^T G (G^T \Sigma_{kl} G)^{-1} G^T (\mu_k - \mu_l). \qquad (5)$$
We have the following result.

Theorem 1. When the covariances of all K classes are identical, i.e., $\Sigma_k = \Sigma$, $k = 1,\cdots,K$, the sum of all pairwise KL-divergences
$$J^Y_0 = \sum_{k<l} n_k n_l\, D^Y_{KL}(N_k \,\|\, N_l) \qquad (6)$$
is identical to the objective function of standard LDA of Eq.(1), where $\sum_{k<l} = \sum_{k=1}^{K}\sum_{l=k+1}^{K}$.

Proof: Note that
$$(\mu_k - \mu_l)^T \Sigma^{-1} (\mu_k - \mu_l)
= \mathrm{Tr}\big[(\mu_k - \mu_l)(\mu_k - \mu_l)^T \Sigma^{-1}\big]
= \mathrm{Tr}\big[(\mu_k\mu_k^T + \mu_l\mu_l^T - \mu_k\mu_l^T - \mu_l\mu_k^T)\Sigma^{-1}\big].$$
We therefore have
$$J^X_0 = \frac{1}{4}\sum_{k=1}^{K}\sum_{l=1}^{K} n_k n_l\, \mathrm{Tr}\big[(\mu_k\mu_k^T + \mu_l\mu_l^T - \mu_k\mu_l^T - \mu_l\mu_k^T)\Sigma^{-1}\big]
= \frac{n}{2}\,\mathrm{Tr}\Big[\sum_{k=1}^{K} n_k (\mu_k-\mu)(\mu_k-\mu)^T\, \Sigma^{-1}\Big]
= \frac{n^2}{2}\,\mathrm{Tr}\big[S_b \Sigma^{-1}\big].$$
Now we project $\mathbf{x}_i$ to the subspace $\mathbf{y}_i = G^T\mathbf{x}_i$. The covariance in Y-space is $\Sigma^Y = G^T\Sigma G$, and the between-class scatter matrix becomes $S^Y_b = G^TS_bG$. Thus
$$J_0(G) = \frac{n^2}{2}\,\mathrm{Tr}\big[(G^T S_b G)(G^T \Sigma G)^{-1}\big] \qquad (7)$$
is identical to the LDA objective function of Eq.(1) aside from the unimportant constant factor. $\square$

Figure 2: Results on the Iris dataset with 3 classes, each with 50 data points. The original 4-dimensional data are projected into 2 dimensions. (a) Results of standard LDA; (b) Results of pairwise-covariance LDA.
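As a small, self-contained illustration of Eqs.(4)-(5) (a sketch of ours, not code from the paper), the function below evaluates the projected KL-divergence between two Gaussian classes for a given projection G and pair covariance $\Sigma_{kl}$; the class-pair distance used later in the paper is simply $d_{kl}(G) = 2D^Y_{KL}$.

```python
import numpy as np

def projected_kl(mu_k, mu_l, Sigma_kl, G):
    """D^Y_KL of Eq.(5): 0.5 (mu_k - mu_l)^T G (G^T Sigma_kl G)^{-1} G^T (mu_k - mu_l)."""
    diff = G.T @ (mu_k - mu_l)            # mean difference in the projected Y-space
    M = G.T @ Sigma_kl @ G                # small (K-1) x (K-1) matrix, cheap to invert
    return 0.5 * diff @ np.linalg.solve(M, diff)
```

With a common covariance $\Sigma_{kl} = \Sigma$ for every pair, summing n_k * n_l * projected_kl(...) over all pairs reproduces the standard LDA objective up to a constant, which is exactly the content of Theorem 1.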

Pairwise-covariance LDA

Motivation. In standard LDA, the covariances $\Sigma_k$ of all K classes are assumed to be exactly identical. This results in the standard LDA of Eq.(1), as we can see from Theorem 1. In practice, the data covariance of each class is often different. For the 2-class problem, when $\Sigma_1 \ne \Sigma_2$, quadratic discriminant analysis (QDA) (Hastie, Tibshirani, and Friedman 2001) can be used. However, in QDA the boundary between different classes is a quadratic surface, and the discriminant space cannot be represented explicitly by $G^TX$. For multiple classes, one can directly solve it using the Gaussian mixture density function with Bayes rules. In this paper, we seek a discriminant subspace that can be obtained by the linear transformation $G^TX$, which has not been studied before.

Illustrative example. In most datasets the data variance of each class is generally different, yet standard LDA uses the pooled (i.e., globally averaged) within-class scatter matrix of all classes. However, the globally averaged covariance $S_w$ can differ significantly from each individual covariance. A simple example is shown in Fig.1, where 2-dimensional data from three classes are shown in Fig.1(a). Each class has 50 data points, and the individual class covariances $\Sigma_1$, $\Sigma_2$, $\Sigma_3$ are very different. In standard LDA, we average over all classes and obtain $S_w = \Sigma_{123}$. In this paper, we propose a formulation of LDA that uses pairwise classes: the three pairwise averaged class covariances $\Sigma_{12}$, $\Sigma_{13}$ and $\Sigma_{23}$ are much closer to the two individual covariances of each pair than the global average is.

Formulation. For simplicity, we define the distance $d_{kl}(G)$ between two classes k, l as $d_{kl}(G) = 2D^Y_{KL}(N_k, N_l)$, where $D^Y_{KL}(N_k, N_l)$ is defined in Eq.(5) and $\Sigma_{kl}$ is a pairwise covariance matrix (the average over the pair of classes) defined as
$$\Sigma_{kl} = \beta\,\frac{n_k\Sigma_k + n_l\Sigma_l}{n_k + n_l} + (1-\beta)\,\Sigma. \qquad (8)$$
Here we use the globally averaged covariance $\Sigma = S_w$ as a regularization. The parameter $0 \le \beta \le 1$ controls the balance between the global covariance matrix $\Sigma$ and the local pairwise covariance matrices $\Sigma_k, \Sigma_l$. The pairwise-covariance LDA is defined in the same way as in Theorem 1:
$$\max_G\; J_1(G) = \sum_{k<l} n_k n_l\, d_{kl}(G), \qquad (9)$$
where $G \in \Re^{p\times(K-1)}$ is the projection. The objective in Eq.(9) is similar to standard LDA, except that we use the pairwise covariance instead of the globally averaged covariance.

The proposed new model. Returning to the form of Eq.(9), it is easy to see that we can define a better objective. In maximizing $J_1$, all pairwise distances are treated equally. However, in classification we wish the pairs of classes with smaller distances to be given more weight, i.e., after projecting to the $Y = G^TX$ subspace, they become more separated (compared to other pairs of classes). On the other hand, if two classes are already well separated, i.e., their distance is large, they can have less weight in the objective function. Therefore, we propose the following pairwise-covariance properly weighted objective function:
$$\min_G\; J_2(G) = \sum_{k<l} \frac{n_k n_l}{[d_{kl}(G)]^q}, \quad \text{s.t. } G^TG = I, \qquad (10)$$
where $q \ge 1$ is a hyper-parameter. In this objective function, the pairs of classes with smaller distances contribute more than the pairs of classes with larger distances. The parameter q controls how strongly the pairs of classes with smaller distances are weighted: the larger q is, the more strongly those pairs are weighted. In practice, we found that q = {1, 2} are good choices. This model is our final proposed model. For simplicity, we call it pairwise-covariance LDA (pcLDA), with the proper weighting implicit. As defined in Eq.(10), the objective is invariant under any non-singular transformation $A \in \Re^{(K-1)\times(K-1)}$, i.e., $J_2(GA) = J_2(G)$. To fix this uncertainty, we require $G^TG = I$.

Figure 3: Data: 45 data points (images) from 3 classes of the MNIST dataset. The original 784-dimensional data are projected into 2 dimensions. (a) Results of standard LDA; (b) Results of pcLDA; (c) Convergence of the algorithm on MNIST (objective function vs. iterations).
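A minimal sketch (ours, not the authors' released code) of the pairwise covariance of Eq.(8) and the properly weighted objective of Eq.(10); it reuses projected_kl from the sketch above, and assumes the per-class statistics (counts, means, covariances) and the global covariance Sigma_global = S_w have been precomputed.

```python
import numpy as np
from itertools import combinations

def pairwise_cov(Sigma_k, Sigma_l, n_k, n_l, Sigma_global, beta):
    """Sigma_kl of Eq.(8): beta-weighted mix of the local pair average and the global covariance."""
    local = (n_k * Sigma_k + n_l * Sigma_l) / (n_k + n_l)
    return beta * local + (1.0 - beta) * Sigma_global

def pclda_objective(G, means, covs, counts, Sigma_global, beta=1.0, q=1):
    """J_2(G) of Eq.(10): sum over class pairs of n_k n_l / d_kl(G)^q."""
    J2 = 0.0
    for k, l in combinations(range(len(means)), 2):
        Sigma_kl = pairwise_cov(covs[k], covs[l], counts[k], counts[l],
                                Sigma_global, beta)
        d_kl = 2.0 * projected_kl(means[k], means[l], Sigma_kl, G)   # d_kl = 2 D^Y_KL, Eq.(5)
        J2 += counts[k] * counts[l] / d_kl ** q
    return J2
```

Setting beta close to 0 recovers the globally averaged covariance of standard LDA, while beta = 1 uses purely local pair covariances, matching the role of β discussed in the experiments.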
Illustrations of pcLDA. We illustrate pcLDA on synthetic and real data. In Fig.1, LDA and pcLDA results on a synthetic 2D dataset of 150 data points (50 per class) are shown: the data distribution and the 1-dimensional projection results of LDA and pcLDA. The point here is that the globally averaged covariance $S_w$ is a poor representation of the individual covariances, whereas the pairwise-covariance approach gives a better representation, such that a single pcLDA dimension can clearly separate the 3 classes, while standard LDA needs 2 dimensions to separate the data from different classes (results not shown).

In Fig.2, we show the results on the widely used Iris data. Iris has 150 data points with K=3 classes, so LDA projects to K-1=2 dimensions. Fig.2 indicates that pcLDA gives a clear discrimination between classes 2 and 3, while standard LDA has strong mixing between classes 2 and 3.

In Fig.3, we show results on 45 images (from K=3 classes) from the MNIST handwritten digit image dataset. LDA projections to 2 dimensions are shown. The pcLDA results show that

the 3 classes contract strongly and become more separated compared to the LDA results. These results demonstrate the benefits of the pairwise-covariance properly weighted LDA. More experiments and comparisons with related methods are reported in the experiment section.

Algorithm to solve Pairwise-covariance LDA

The key idea of our approach is to use a gradient descent algorithm to solve the pcLDA problem of Eq.(10). The gradient of $J_2(G)$ is
$$\nabla J_2(G) = \frac{\partial J_2(G)}{\partial G} = -\sum_{k<l} \frac{q\, n_k n_l}{[d_{kl}(G)]^{q+1}}\,\frac{\partial d_{kl}(G)}{\partial G}. \qquad (11)$$
For notational simplicity, we write
$$B_{kl} = (\mu_k-\mu_l)(\mu_k-\mu_l)^T, \qquad d_{kl}(G) = \mathrm{Tr}\big[(G^TB_{kl}G)(G^T\Sigma_{kl}G)^{-1}\big]. \qquad (12)$$
Using Eq.(12), the derivative of $d_{kl}(G)$ is
$$\frac{\partial d_{kl}(G)}{\partial G} = 2\big[B_{kl}G(G^T\Sigma_{kl}G)^{-1} - \Sigma_{kl}G(G^T\Sigma_{kl}G)^{-1}(G^TB_{kl}G)(G^T\Sigma_{kl}G)^{-1}\big]. \qquad (13)$$
Note that $(G^T\Sigma_{kl}G)^{-1}$ is the inverse of a small (K-1)-by-(K-1) matrix. $\nabla J_2$ can be efficiently computed using Algorithm 1.

Algorithm 1: Computation of $\nabla J_2(G)$ (i.e., Eq.(11)) or $\nabla J_2(A)$ (i.e., the gradient of Eq.(21)).
Input: $G$, $\{\Sigma_k, \mu_k\}$, $q$. Output: $\nabla J_2$.
1: $F = 0$
2: for $l = 1$ to $K$ do
3:   for $k = l+1$ to $K$ do
4:     Compute $\mu_{kl} = \mu_k - \mu_l$.
5:     Compute $\mathbf{b} = G^T\mu_{kl}$.
6:     Compute $\Sigma_{kl}$ according to Eq.(8). % $\bar\Sigma_{kl}$ according to Eq.(23) in the kernel case
7:     Compute $B = \Sigma_{kl}G$.
8:     Compute $\mathbf{b} = (G^TB)^{-1}\mathbf{b}$.
9:     Compute $\mathbf{a} = n_k n_l\,(\mu_{kl} - B\mathbf{b})/(\mu_{kl}^TG\mathbf{b})^{q+1}$.
10:    Compute $F = F + \mathbf{a}\mathbf{b}^T$. % outer product of vectors a, b
11:  end for
12: end for
13: $\nabla J_2 = -2qF$.
14: Output: $\nabla J_2$.

The constraint $G^TG = I$ restricts G to the Stiefel manifold. Variation of G on this manifold is a parallel transport, which imposes a restriction on the gradient; this has been worked out in (Edelman, Arias, and Smith 1998). The gradient that preserves the manifold structure is
$$\nabla J_2 - G[\nabla J_2]^TG. \qquad (14)$$
Thus the algorithm updates G as
$$G \leftarrow G - \alpha\,(\nabla J_2 - G[\nabla J_2]^TG), \qquad (15)$$
where the step size $\alpha$ is usually chosen as
$$\alpha = \eta\,\frac{\|G\|_1}{\|\nabla J_2 - G(\nabla J_2)^TG\|_1}, \qquad (16)$$
where $\|A\|_1 = \sum_{ij}|A_{ij}|$ and $\eta$ is a small constant. Occasionally, due to loss of numerical accuracy, we perform the projection $G \leftarrow G(G^TG)^{-1/2}$ to restore $G^TG = I$. Starting from the standard LDA solution for G, this algorithm is iterated until it converges to a local optimal solution. Fig.3(c) shows the convergence of the algorithm on the MNIST dataset.
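The following is a compact sketch (ours, under an assumed step-size constant eta = 0.01) of the closed-form gradient of Algorithm 1 and one projected-gradient step of Eqs.(14)-(16); it reuses pairwise_cov from the earlier sketch.

```python
import numpy as np
from itertools import combinations

def grad_J2(G, means, covs, counts, Sigma_global, beta=1.0, q=1):
    """Gradient of Eq.(11) via the closed form of Algorithm 1: nabla J2 = -2q F."""
    F = np.zeros_like(G)
    for k, l in combinations(range(len(means)), 2):
        mu_kl = means[k] - means[l]
        Sigma_kl = pairwise_cov(covs[k], covs[l], counts[k], counts[l],
                                Sigma_global, beta)
        B = Sigma_kl @ G
        b = np.linalg.solve(G.T @ B, G.T @ mu_kl)    # (G^T Sigma_kl G)^{-1} G^T mu_kl
        d_kl = mu_kl @ (G @ b)                       # = Tr[(G^T B_kl G)(G^T Sigma_kl G)^{-1}]
        a = counts[k] * counts[l] * (mu_kl - B @ b) / d_kl ** (q + 1)
        F += np.outer(a, b)
    return -2.0 * q * F

def pclda_step(G, grad, eta=0.01):
    """One step of Eqs.(14)-(16) on the Stiefel manifold (eta is an assumed constant)."""
    R = grad - G @ grad.T @ G                        # manifold-preserving gradient, Eq.(14)
    alpha = eta * np.abs(G).sum() / np.abs(R).sum()  # step size in the spirit of Eq.(16)
    G = G - alpha * R                                # update, Eq.(15)
    U, _, Vt = np.linalg.svd(G, full_matrices=False) # re-orthonormalize: G (G^T G)^{-1/2}
    return U @ Vt
```

In use, one would start from the standard LDA solution and alternate grad_J2 / pclda_step until the objective of Eq.(10) stops decreasing. (The paper re-orthonormalizes only occasionally; doing it every step, as here, is a simplification.)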

Pairwise-covariance Kernel LDA

Kernel LDA (Mika et al. 1999; Tao et al. 2004) is a nonlinear generalization of LDA. We can derive the kernel version of pcLDA. Let $\mathbf{x}_i \to \phi(\mathbf{x}_i)$, or $X \to \phi(X) = (\phi(\mathbf{x}_1), \cdots, \phi(\mathbf{x}_n))$. For 2-class LDA, the projection vector is $\mathbf{g} = \sum_{i=1}^n \alpha_i\,\phi(\mathbf{x}_i) = \phi(X)\boldsymbol{\alpha}$, where $\boldsymbol{\alpha} = (\alpha_1, \cdots, \alpha_n)^T$. For K-class LDA, the projection vectors are $\mathbf{g}_k = \sum_{i=1}^n \alpha_{ik}\,\phi(\mathbf{x}_i) = \phi(X)\boldsymbol{\alpha}_k$; thus $G = (\mathbf{g}_1 \cdots \mathbf{g}_{K-1}) = \phi(X)A$, where $A = (\boldsymbol{\alpha}_1 \cdots \boldsymbol{\alpha}_{K-1})$. Under the transformation $X\to\phi(X)$, $G\to\phi(X)A$, it is easy to see that the LDA objective of Eq.(1) transforms into
$$\mathrm{Tr}\big[(G^TS_bG)(G^TS_wG)^{-1}\big] \;\to\; \mathrm{Tr}\big[(A^T\bar S_bA)(A^T\bar S_wA)^{-1}\big], \qquad (17)$$
where the kernel within-class scatter matrix is
$$(\bar\Sigma_k)_{ij} = \phi(\mathbf{x}_i)^T\Big[\frac{1}{n_k}\sum_{s\in C_k}\phi(\mathbf{x}_s)\phi(\mathbf{x}_s)^T\Big]\phi(\mathbf{x}_j) = \frac{1}{n_k}\sum_{s\in C_k}K_{is}K_{sj}, \qquad
\bar S_w = \frac{1}{n}\sum_{k=1}^{K} n_k\,\bar\Sigma_k = \frac{1}{n}K^2, \qquad (18)$$
and the kernel between-class scatter matrix is
$$(\bar S_b)_{ij} = \phi(\mathbf{x}_i)^T\Big[\frac{1}{n}\sum_{k=1}^{K} n_k(\bar\phi_k-\bar\phi)(\bar\phi_k-\bar\phi)^T\Big]\phi(\mathbf{x}_j) = \frac{1}{n}\sum_{k=1}^{K} n_k\,(\bar K_{ik}-\bar K_i)(\bar K_{kj}-\bar K_j), \qquad (19)$$
where we use the shorthand notations
$$\bar\phi = \frac{1}{n}\sum_{s=1}^{n}\phi(\mathbf{x}_s),\quad \bar\phi_k = \frac{1}{n_k}\sum_{s\in C_k}\phi(\mathbf{x}_s),\quad
\bar K_i = \frac{1}{n}\sum_{s=1}^{n} K_{is},\quad \bar K_{ki} = \bar K_{ik} = \frac{1}{n_k}\sum_{s\in C_k}K_{is}. \qquad (20)$$
The solution of kernel LDA is given by the eigenvectors with the largest eigenvalues of the eigen-equation $\bar S_b\mathbf{v} = \lambda\bar S_w\mathbf{v}$. When K = 2, this reduces to the familiar 2-class kernel LDA (Tao et al. 2004). The efficient computation of $\bar S_b$ is given at the end of the next subsection.

We are now ready to present the pairwise-covariance kernel LDA. We apply the same transformation to the pairwise-covariance LDA, and obtain

Theorem 2. Under the transformation $X\to\phi(X)$, $G\to\phi(X)A$, the pairwise-covariance LDA objective $J_2(G)$ becomes $J_2(A)$:
$$\min_A\; \sum_{k<l}\frac{n_kn_l}{\big[\mathrm{Tr}\,(A^T\bar B_{kl}A)(A^T\bar\Sigma_{kl}A)^{-1}\big]^q},\quad \text{s.t. } A^TA=I, \qquad (21)$$
where
$$(\bar B_{kl})_{ij} = \phi(\mathbf{x}_i)^T(\bar\phi_k-\bar\phi_l)(\bar\phi_k-\bar\phi_l)^T\phi(\mathbf{x}_j) = (\bar K_{ik}-\bar K_{il})(\bar K_{kj}-\bar K_{lj}), \qquad (22)$$
the shorthand notations are defined in Eq.(20), $\bar\Sigma_k$ is defined in Eq.(18), and
$$\bar\Sigma_{kl} = \beta\,\frac{n_k\bar\Sigma_k+n_l\bar\Sigma_l}{n_k+n_l} + (1-\beta)\,\bar\Sigma. \qquad (23)$$

Algorithm for Kernel PC-LDA. We solve $J_2(A)$ of Eq.(21) using the same algorithm as for pcLDA with $J_2(G)$ of Eq.(10). The derivative is the same as in Eqs.(11), (13), except that $B_{kl}$ is replaced by $\bar B_{kl}$, $\Sigma_{kl}$ by $\bar\Sigma_{kl}$, and $G$ by $A$. The constraint $A^TA = I$ is handled in the same way as $G^TG = I$ in Eqs.(14), (15), and the step size is given in Eq.(16). The remaining part is the efficient computation of the gradient $\nabla J_2(A)$. First, we note that $\{\bar B_{kl}\}$ and $\{\bar\Sigma_k\}$ of Eqs.(22), (18) can be computed efficiently. Let $V_k$ be the n-by-$n_k$ matrix consisting of the $n_k$ columns of K belonging to class k. It is easy to see that
$$\bar\Sigma_k = \frac{1}{n_k}V_kV_k^T,\qquad \mathbf{u}_k = \frac{1}{n_k}V_k\mathbf{e}, \qquad (24)$$
where $\mathbf{e} = (1,\cdots,1)^T$. Here, for clarity, we use $\mathbf{u}_k$ to denote the vector $(\bar K_{ik})_{i=1}^n$. Clearly, $\bar B_{kl} = (\mathbf{u}_k-\mathbf{u}_l)(\mathbf{u}_k-\mathbf{u}_l)^T$. Now $\nabla J_2(A)$ is computed using Algorithm 1 with the replacements
$$\mu_k \to \mathbf{u}_k,\qquad \Sigma_k \to \bar\Sigma_k. \qquad (25)$$
$\bar S_b$ can be efficiently computed as $\bar S_b = \frac{1}{n}\sum_k n_k(\mathbf{u}_k-\mathbf{v})(\mathbf{u}_k-\mathbf{v})^T$ with $\mathbf{v} = \frac{1}{n}K\mathbf{e}$.
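As a brief sketch (ours; it assumes a precomputed kernel matrix K and a label array y), the per-class kernel quantities of Eqs.(18), (24) and the between-class matrix $\bar S_b$ can be formed directly from column blocks of K; Algorithm 1 then runs unchanged with the substitutions of Eq.(25).

```python
import numpy as np

def kernel_class_stats(K, y):
    """Per-class kernel quantities of Eq.(24): u_k = V_k e / n_k, Sigma_bar_k = V_k V_k^T / n_k."""
    n = K.shape[0]
    u, Sigma_bar, counts = [], [], []
    for k in np.unique(y):
        V_k = K[:, y == k]                    # n x n_k block of kernel columns for class k
        n_k = V_k.shape[1]
        u.append(V_k.mean(axis=1))            # u_k: averaged kernel values against class k
        Sigma_bar.append(V_k @ V_k.T / n_k)   # kernel class covariance, Eqs.(18)/(24)
        counts.append(n_k)
    v = K.mean(axis=1)                        # v = K e / n, the global counterpart of u_k
    S_b_bar = sum(n_k * np.outer(u_k - v, u_k - v)
                  for n_k, u_k in zip(counts, u)) / n
    return u, Sigma_bar, counts, S_b_bar
```

With these in hand, $\bar B_{kl} = (\mathbf{u}_k-\mathbf{u}_l)(\mathbf{u}_k-\mathbf{u}_l)^T$, and the same projected-gradient loop used for pcLDA optimizes A under the constraint $A^TA = I$.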
Related Work

A detailed survey of recent LDA work can be found in (Ye and Ji 2008).

Other LDA formulations. There exist earlier works (Li, Jiang, and Zhang 2003), (Yan et al. 2004) which maximize the difference of traces, a.k.a. the maximum margin criterion (MMC). Several LDA formulations with different constraints and overfitting analysis are given in (Luo, Ding, and Huang 2011), (Yan et al. 2004). To solve the well-known singularity or under-sampled problem, many extensions of LDA have been proposed, such as regularized LDA (RLDA) (Hastie, Tibshirani, and Friedman 2001), uncorrelated LDA (ULDA) (Ye 2005b), orthogonal LDA (OLDA) (Ye 2005a) and the orthogonal centroid method (OCM) (Park, Jeon, and Rosen 2003). Among these, ULDA extracts feature vectors which are mutually uncorrelated in the low-dimensional space.

Connection with metric learning. Alipanahi et al. (Alipanahi, Biggs, and Ghodsi 2008) showed a strong relationship between distance metric learning methods and Fisher discriminant analysis. Our pairwise-covariance LDA formulation of Eq.(10) and the kernel pcLDA of Eq.(21) can serve the purpose of distance metric learning, which is useful in many applications (e.g., (Kong and Yan 2013), (Kong et al. 2012), etc.).

Table 1: Characteristics of the datasets (number of data points, dimension, and number of classes) for MSRCv1, Umist, Mnist, and Bialpha.

There are also works on local discriminative Gaussian (LDG) dimensionality reduction (Parrish and Gupta 2012) and local Fisher discriminant analysis (Sugiyama 2006). Sparsity in the LDA solution (Clemmensen et al. 2011), (Zhang and Chu 2013) is also desirable for interpretation purposes, because it is robust to noise and leads to efficient computation in prediction. However, to our knowledge, none of the above works considers the pairwise covariance by computing the distance of the projection in a pairwise way, which is the focus of this paper.

Experimental results

Datasets. We evaluate the proposed pairwise-covariance LDA on four datasets (see Table 1) for multi-class classification experiments, including one face dataset (Umist), two digit datasets (Mnist (Lecun et al. 1998) and Bialpha), and one image scene dataset (MSRCv1 (Lee and Grauman 2009)). Due to space limits, we omit further details of the datasets; Table 1 summarizes them.

Methods & Parameter Settings. In our experiments, we use 5-round 5-fold cross validation to evaluate classification performance. Each dataset is evenly partitioned into 5 parts; one part is used for testing and the other 4 parts are used for training. We report the average results over the 5 rounds.

Next, we give an overview of the dimension reduction and classification methods used in our experiments. The compared methods can be divided into several groups.

(1) LDA, MMC (Li, Jiang, and Zhang 2003; Yan et al. 2004), and kernel LDA (KLDA). For LDA, the maximum margin criterion (MMC) (Li, Jiang, and Zhang 2003; Yan et al. 2004), and the kernel LDA method of Eq.(17), we project the original data into the LDA subspace, and a k-nearest-neighbor classifier (k = 3) is used for classification. For kernel LDA, we use an RBF kernel to construct the pairwise similarity $W_{ij} = e^{-\|\mathbf{x}_i - \mathbf{x}_j\|^2/\gamma}$, where the bandwidth $\gamma$ is searched over the grid $\{10^{-4}, 10^{-3}, \cdots, 10^{3}, 10^{4}\}$.

(2) Regularized LDA (RLDA) (Hastie, Tibshirani, and Friedman 2001), uncorrelated LDA (ULDA) (Ye 2005b), orthogonal LDA (OLDA) (Ye 2005a) and the orthogonal centroid method (OCM) (Park, Jeon, and Rosen 2003). We compare our method against these four generalized LDA methods. It has been shown (Ye and Ji 2008) that these four LDA extensions can be described in a unified framework for generalized LDA; however, there still exist subtle differences among them. The parameter µ in regularized LDA is determined by cross validation.

(3) The proposed pairwise-covariance LDA model of Eq.(10) (pcLDA) and the kernel pairwise-covariance LDA model (pcKLDA) of Eq.(21). We set q = 1 for Eq.(10) and Eq.(21) in our experiments. The parameter β is set to {0.1, 0.5, 1}. To make a fair comparison, we project all original data to K-1 dimensions, and a k-nearest-neighbor classifier (k = 3) is used for classification.

Table 2: Multi-class classification accuracy on the 4 datasets (MSRC, Bialpha, Mnist, Umist) using 9 different dimension reduction methods: LDA, kernel LDA (KLDA), pcLDA (β=1), kernel pcLDA (pcKLDA, β=1), and 5 other methods: MMC, RLDA, ULDA, OLDA, OCM.

Figure 4: Comparison of classification results on the 4 datasets, including our methods pcLDA and pcKLDA at β = {0.1, 0.5, 1} and seven other methods: LDA, KLDA, MMC, RLDA (RegLDA), ULDA, OLDA, OCM. (a) Classification results on MSRC and Bialpha; (b) classification results on Mnist and Umist. The vertical axis is the average classification accuracy.

Figure 5: Classification accuracy w.r.t. the parameter β for our model of Eq.(10) on the MSRC, Mnist and Umist datasets. The red line gives the LDA results, and the blue line gives the pcLDA results at β = {0, 0.1, ..., 0.9, 1.0}. (a) pcLDA result on MSRC; (b) pcLDA result on Mnist; (c) pcLDA result on Umist.

Classification Performance Analysis. Table 2 and Fig.4 present the classification performance of the different dimension reduction methods. We make several important observations from the experimental results.

(1) Compared to standard LDA, MMC and the other dimension reduction methods, pcLDA consistently provides better classification performance at different β values (e.g., β = {0.1, 0.5, 1}). For example, there is nearly a 5% performance improvement on the Bialpha dataset compared with the standard LDA method. Note that the Bialpha dataset is composed of data from K=36 classes; this indicates that the proposed pairwise-covariance LDA method gives a large performance improvement when the number of classes is large.

(2) In kernel space, the kernel versions of LDA and pcLDA do not improve the classification performance by much (sometimes they are even worse). However, pcKLDA still outperforms standard KLDA in kernel space.

(3) β controls the complexity of our model: as β approaches 1, pcLDA uses the local pairwise covariance matrices, and as β approaches 0, pcLDA uses the global covariance matrix, which is equivalent to standard LDA. Fig.5 shows the classification results on three datasets: MSRC, Mnist and Umist. The experimental results suggest that, generally, we tend to get better classification results for larger values of β. This further confirms our intuition that the pairwise covariance helps to capture the data distribution better than the globally averaged covariance, and thus the projection and classification results are improved. Moreover, rather than maximizing the sum of inter-class distances, we minimize the sum of inverse inter-class distances. This choice makes classes that are close together have more influence on the LDA fit than classes that are well separated.

Conclusion

We presented a pairwise-covariance model for linear discriminant analysis. The proposed model computes the projection by utilizing pairwise class information. An efficient algorithm is presented to solve the proposed model, and the method can easily be extended to kernel space. Experimental results indicate the good performance of the proposed method.

Acknowledgement. This research is partially supported by NSF-CCF and NSF-DMS grants.

References

Alipanahi, B.; Biggs, M.; and Ghodsi, A. 2008. Distance metric learning vs. Fisher discriminant analysis. In AAAI.
Clemmensen, L.; Hastie, T.; Witten, D.; and Ersboll, B. 2011. Sparse discriminant analysis. Technometrics.
Edelman, A.; Arias, T. A.; and Smith, S. T. 1998. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2).
Hastie, T.; Tibshirani, R.; and Friedman, J. 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
Hoi, S. C. H.; Liu, W.; Lyu, M. R.; and Ma, W.-Y. 2006. Learning distance metrics with contextual constraints for image retrieval. In CVPR.
Kong, D., and Ding, C. H. Q. 2012. A semi-definite positive linear discriminant analysis and its applications. In ICDM.
Kong, D., and Yan, G. 2013. Discriminant malware distance learning on structural information for automated malware classification. In KDD.
Kong, D.; Ding, C. H. Q.; Huang, H.; and Zhao, H. 2012. Multi-label ReliefF and F-statistic feature selections for image annotation. In CVPR.
Lecun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. In Proceedings of the IEEE.
Lee, Y. J., and Grauman, K. 2009. Foreground focus: Unsupervised learning from partially matching images. International Journal of Computer Vision 85(2).
Li, H.; Jiang, T.; and Zhang, K. 2003. Efficient and robust feature extraction by maximum margin criterion. In Proceedings of Advances in Neural Information Processing Systems (NIPS 2003).
Luo, D.; Ding, C.; and Huang, H. 2011. Linear discriminant analysis: New formulations and overfit analysis. In AAAI 2011.
Mika, S.; Ratsch, G.; Weston, J.; Scholkopf, B.; and Muller, K. 1999. Fisher discriminant analysis with kernels.
Park, H.; Jeon, M.; and Rosen, J. B. 2003. Lower dimensional representation of text data based on centroids and least squares. BIT 43.
Parrish, N., and Gupta, M. 2012. Dimensionality reduction by local discriminative Gaussians. In ICML.
Sugiyama, M. 2006. Local Fisher discriminant analysis for supervised dimensionality reduction. In ICML.
Tao, X.; Ye, J.; Li, Q.; Janardan, R.; and Cherkassky, V. 2004. Efficient kernel discriminant analysis via QR decomposition. In The Eighteenth Annual Conference on Neural Information Processing Systems (NIPS 2004).
Wang, H.; Yan, S.; Xu, D.; Tang, X.; and Huang, T. 2007. Trace ratio vs. ratio trace for dimensionality reduction. In CVPR.
Xiang, S.; Nie, F.; and Zhang, C. 2008. Learning a Mahalanobis distance metric for data clustering and classification.
Yan, J.; Zhang, B.; Yan, S.; Yang, Q.; and Li, H. 2004. IMMC: Incremental maximum margin criterion. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Ye, J., and Ji, S. 2008. Discriminant analysis for dimensionality reduction: An overview of recent developments. In Biometrics: Theory, Methods, and Applications. IEEE/Wiley.
Ye, J. 2005a. Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. Journal of Machine Learning Research 6.
Ye, J. 2005b. Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. Journal of Machine Learning Research 6.
Zhang, X., and Chu, D. 2013. Sparse uncorrelated linear discriminant analysis. In ICML.
