Non-linear Feature Extraction by the Coordination of Mixture Models


J.J. Verbeek    N. Vlassis    B.J.A. Kröse

Intelligent Autonomous Systems Group, Informatics Institute, Faculty of Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands

Keywords: Canonical Correlation Analysis, Principal Component Analysis, Self-organizing Maps.

Abstract

We present a method for non-linear data projection that offers non-linear versions of Principal Component Analysis and Canonical Correlation Analysis. The data is accessed through a probabilistic mixture model only; therefore any mixture model for any type of data can be plugged in. Gaussian mixtures are one example, but mixtures of Bernoullis to model discrete data might be used as well. The algorithm minimizes an objective function that exhibits one global optimum, which can be found by computing the eigenvectors of some matrix. Experimental results on toy data and real data are provided.

1 Introduction

Much research has been done in the field of unsupervised non-linear feature extraction to obtain a low dimensional description of high dimensional data that preserves important properties of the data. In this paper we are interested in data that lies on or close to a low dimensional manifold embedded (possibly non-linearly) in a Euclidean space of much higher dimension. A typical example of such data is a set of images of an object under different conditions (e.g. pose and lighting) obtained with a high resolution camera. A low dimensional representation may be desirable for several reasons, like compression for storage and communication, visualization of high dimensional data, and preprocessing for further data analysis or prediction tasks. An example is given in Fig. 1: we have data in $\mathbb{R}^3$ which lies on a two dimensional manifold (top plot). We want to recover the structure of the data manifold, so that we can unroll the manifold and work with the data expressed in the latent coordinates, i.e. coordinates on the manifold (second plot).

Here, we consider a method to integrate several local feature extractors into a single global representation. The idea is to learn a mixture model on the data where each mixture component acts as a local feature extractor; a mixture of Probabilistic Principal Component Analyzers [1] is a typical example. After this mixture has been learned, the local feature extractors are coordinated by finding for each of them a suitable linear map into a single global low-dimensional coordinate system. The collective of the local feature extractors, together with the linear maps into the global latent space, provides a global non-linear projection from the data space into the latent space.

This differs from other non-linear feature extraction methods like the Generative Topographic Map (GTM) [2], Locally Linear GTM [3], Kohonen's Self-Organizing Map [4] and a Variational Expectation-Maximization algorithm similar to SOM [5]. These algorithms are actually quite opposite in nature to the method presented here: they start with a given configuration of mixture components in the latent space and adapt the mixture configuration in the data space to optimize some objective function. Here we start with a given and fixed mixture in the data space and find an optimal configuration in the latent space. The advantage of the latter method is that it is not prone to local optima in the objective function (of course, fitting the mixture model on the data might itself be susceptible to local optima). Moreover, the solution for a d-dimensional latent space can be found by computing the eigenvectors of a matrix corresponding to the 2nd smallest up to the (d + 1)st smallest eigenvalues.

The size of this matrix is given by the total number of locally extracted features plus the number of mixture components, i.e. if all k components extract d features then the size is k(d + 1).

The work presented in this paper can be categorized together with other recent algorithms for unsupervised non-linear feature extraction like IsoMap [6], Locally Linear Embedding [7], Kernel PCA [8], Kernel CCA [9] and Laplacian Eigenmaps [10]. All these algorithms perform non-linear feature extraction by minimizing an objective function that exhibits a limited number of stable points, which can be characterized as eigenvectors of some matrix. These algorithms are generally simple to implement, since one only needs to construct some matrix; the eigenvectors are then found by standard methods.

In this paper we build upon recent work of Brand [11]. In the next section we review the work of Brand, derive the objective function by a slight modification of the free energy used in [11], and discuss how we can find its stable points. In Section 3 we present a method to obtain better results by using several mixtures for the data in parallel. Next, we show in Section 4 that finding the locally valid linear maps between the data space and the latent space can be regarded as a weighted form of Canonical Correlation Analysis (CCA), and we show how we can generalize the work of Brand to perform non-linear CCA. A general discussion and conclusions can be found in Section 5.

2 Linear Embedding of Local Feature Extractors

In this section we describe the work of Brand [11], in a notation that allows us to draw links with CCA in Section 4. Consider a given data set $X = \{x_1, \ldots, x_N\}$ and a collection of k local feature extractors of the data, $f_s(x)$ for $s = 1, \ldots, k$. Thus, $f_s(x)$ is a vector containing the features extracted from x by feature extractor s; each feature extractor gives zero or more (linear) features of the data. With each feature extractor, we associate a mixture component in a probabilistic k-component mixture model. We write the density on high dimensional data items x as:

$p(x) = \sum_{s=1}^{k} p(s)\, p(x \mid s)$,    (1)

where s is a variable ranging over the mixture components and $p(s)$ is the a priori probability of component s (sometimes also termed the mixing weight of component s).

Next, we consider the relationship between the given representation of the data (denoted by x) and the representation of the data in the latent space, which we have to find. Throughout, we will use g to denote latent coordinates for data items; the g is for "global" latent coordinate, as opposed to locally extracted features, which we will denote by z. For the unobserved latent coordinate g corresponding to a data point $x_n$ and conditioned on s, we assume the density:

$p(g \mid x_n, s) = \mathcal{N}(g;\ \kappa_s + A_s f_s(x_n),\ \sigma^2 I)$,    (2)

where $\mathcal{N}(g; \mu, \Sigma)$ is a Gaussian distribution on g with mean µ and covariance Σ. The mean, $g_n^s$, of $p(g \mid x_n, s)$ is the sum of the component mean $\kappa_s$ in the latent space and a linear transformation, implemented by $A_s$, of $f_s(x_n)$. From now on we will use homogeneous coordinates and write $L_s = [A_s\ \ \kappa_s]$ and $z_n^s = [f_s(x_n)^\top\ \ 1]^\top$, and thus $g_n^s = L_s z_n^s$. Consider the marginal (over s) distribution on latent coordinates given data:

$p(g \mid x) = \sum_s p(s, g \mid x) = \sum_s p(s \mid x)\, p(g \mid x, s)$.    (3)

Given a fixed set of local feature extractors and a corresponding mixture model, we are interested in finding linear maps $L_s$ that give rise to "nice" projections of the data in the latent space. By nice, we mean that components that have a high posterior probability of having generated $x_n$ (i.e. $p(s \mid x_n)$ is large) should give similar projections of $x_n$ in the latent space.
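To make these ingredients concrete, the following is a minimal numpy/scipy sketch computing the posteriors $q_{ns} = p(s \mid x_n)$ and the per-component latent predictions $g_n^s$ for a Gaussian mixture whose component s carries local PCA directions. All names (V, A, kappa) and the assumption $f_s(x) = V_s^\top(x - \mu_s)$ are our own illustrative choices, not code from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def posteriors_and_projections(X, pi, mu, cov, V, A, kappa):
    """Compute q_ns = p(s | x_n) and g_n^s = kappa_s + A_s f_s(x_n), eq. (2),
    with local PCA features f_s(x) = V[s]^T (x - mu[s]).
    Shapes: X (N,D); pi (k,); mu (k,D); cov (k,D,D); V[s] (D,d_s);
    A[s] (d,d_s); kappa (k,d)."""
    k = len(pi)
    # responsibilities q_ns, computed in the log domain for stability
    log_p = np.column_stack([np.log(pi[s]) + multivariate_normal.logpdf(X, mu[s], cov[s])
                             for s in range(k)])
    q = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    q /= q.sum(axis=1, keepdims=True)
    # per-component latent means g_n^s; result has shape (N, k, d)
    g = np.stack([kappa[s] + (X - mu[s]) @ V[s] @ A[s].T for s in range(k)], axis=1)
    return q, g
```

The maps A[s] and offsets kappa[s] are exactly the unknowns $L_s = [A_s\ \kappa_s]$ that the remainder of this section solves for; the sketch only shows where they act.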
So we would like the $p(g \mid x_n, s)$ to look similar for the components with large posterior. Since the marginal $p(g \mid x_n)$ is a mixture of Gaussians, we can measure the similarity between the predictions by looking at how much the mixture resembles a single Gaussian: if the predictions are all the same, the mixture is a single Gaussian. We are now ready to formally define the objective function $\Phi(\{L_1, \ldots, L_k\})$, which sums the data log-likelihood and a Kullback-Leibler divergence D that measures how unimodal the distribution $p(g \mid x_n) = \sum_s p(s \mid x_n)\, p(g \mid x_n, s)$ is:

$\Phi = \sum_{n=1}^{N} \left[ \log p(x_n) - D\big(Q_n(g, s)\ \|\ p(g, s \mid x_n)\big) \right]$,    (4)

where $Q_n$ distributes independently over g and s:

$Q_n(g, s) = q_{ns}\, Q_n(g) = q_{ns}\, \mathcal{N}(g;\ g_n, \Sigma_n)$.    (5)

This objective function is similar to the one used in [12, 13]; the difference here is that the covariance matrix of $p(g \mid x, s)$ does not depend on the linear maps $L_s$. If we fix the data space model, turning the data log-likelihood into a constant, and set $q_{ns} = p(s \mid x_n)$, then the objective sums for each data point $x_n$ the Kullback-Leibler divergence between $Q_n(g)$ and $p(g \mid x_n, s)$, weighted by the posterior $p(s \mid x_n)$:

$\Phi = \sum_{n,s} q_{ns}\, D\big(Q_n(g)\ \|\ p(g \mid x_n, s)\big)$.    (6)

Minimizing the objective Φ with respect to $g_n$ and $\Sigma_n$, it is easy to derive that we get:

$g_n = \sum_s q_{ns}\, g_n^s$  and  $\Sigma_n = \sigma^2 I$,    (7)

where I denotes the identity matrix. Skipping some additive and multiplicative constants with respect to the linear maps $L_s$, the objective Φ then simplifies to:

$\Phi = \sum_{n,s} q_{ns}\, \| g_n - g_n^s \|^2$,    (8)

with $g_n^s = L_s z_n^s$ as before. Note that we can rewrite the objective as:

$\Phi = \tfrac{1}{2} \sum_{n,s,t} q_{ns}\, q_{nt}\, \| g_n^t - g_n^s \|^2$.    (9)

The objective is a quadratic function of the linear maps $L_s$ [11]. At the expense of some extra notation, we obtain a clearer form of the objective as a function of the linear maps. Let:

$u_n^\top = [\,q_{n1} z_n^{1\top} \ \cdots\ q_{nk} z_n^{k\top}\,]$,    (10)

$U = [\,u_1 \ \cdots\ u_N\,]^\top$,    (11)

$L = [\,L_1 \ \cdots\ L_k\,]^\top$.    (12)

Note that from (7), (10) and (12) we have $g_n^\top = u_n^\top L$. The expected projection coordinates can thus be computed as:

$G = [\,g_1 \ \cdots\ g_N\,]^\top = U L$.    (13)

We define the block-diagonal matrix D as:

$D = \mathrm{blockdiag}(D_1, \ldots, D_k)$,    (14)

where $D_s = \sum_n q_{ns}\, z_n^s z_n^{s\top}$. The objective then can be written as:

$\Phi = \mathrm{Tr}\{ L^\top (D - U^\top U) L \}$.    (15)

The objective function is invariant to translation and rotation of the global latent space, as can easily be seen from (8); furthermore, re-scaling the latent space changes the objective monotonically. To make solutions unique with respect to translation, rotation and scaling, we impose two constraints:

(trans.):  $\bar g = \frac{1}{N} \sum_n g_n = 0$,    (16)

(rot. + sc.):  $\Sigma_g = \frac{1}{N} \sum_n (g_n - \bar g)(g_n - \bar g)^\top = \frac{1}{N}\, L^\top U^\top (I - \mathbf{1}/N)\, U L = I$,    (17)

where we used $\mathbf{1}$ to denote the matrix with ones everywhere. Solutions are found by differentiation using Lagrange multipliers, and the columns v of L are found to be generalized eigenvectors:

$(D - U^\top U)\, v = \lambda\, U^\top (I - \mathbf{1}/N)\, U v$.    (18)

Since the solutions are zero mean, it follows that $\mathbf{1} U v = 0$, and hence that:

$(D - U^\top U)\, v = \lambda\, U^\top U v$    (19)

$\Leftrightarrow\ \ D v = (\lambda + 1)\, U^\top U v$.    (20)

The value of the objective function is given by:

$\Phi = \sum_i v_i^\top (D - U^\top U)\, v_i$    (21)

$= \sum_i \lambda_i\, v_i^\top U^\top U v_i$    (22)

$= (N - 1) \sum_i \lambda_i\, \mathrm{var}(g_i)$,    (23)

where $\mathrm{var}(g_i)$ denotes the variance in the i-th coordinate of the projected points. So rescaling to get unit variance makes the objective equal to $(N - 1) \sum_i \lambda_i$. The smallest eigenvalue is always zero, corresponding to a projection that collapses all projections into the point κ; this projection has $A_s = 0$ and $\kappa_s = \kappa$ for $s = 1, \ldots, k$. Such an embedding tells us nothing about the data; therefore we need the eigenvectors corresponding to the second up to the (d + 1)st smallest eigenvalues to obtain the best embedding in d dimensions. Note that there is no problem if different feature extractors extract different numbers of features.

In Fig. 1 we give an illustration of the above. The top two images show the original data presented to the algorithm and the 2-dimensional representation found by the algorithm. The blue and yellow sticks indicate the local feature extractors, which are centered on the red dots. The two lower scatter plots compare the coordinates on the data generating manifold with the discovered latent coordinates; for both coordinates there is high correlation with the true latent coordinates.
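The entire coordination step of this section reduces to a few lines of linear algebra. Below is a hedged numpy/scipy sketch that builds U and D from the posteriors and homogeneous local coordinates, solves the generalized eigenproblem (20), and applies the constraints (16) and (17); the small ridge term and the per-coordinate rescaling are our own numerical choices, not prescribed by the paper.

```python
import numpy as np
from scipy.linalg import block_diag, eigh

def coordinate_charts(q, Z, d):
    """Align local charts into a d-dimensional latent space.
    q: (N, k) posteriors q_ns; Z: list of k arrays (N, d_s + 1) holding the
    homogeneous coordinates z_n^s = [f_s(x_n); 1]; d: latent dimension."""
    k = q.shape[1]
    U = np.hstack([q[:, [s]] * Z[s] for s in range(k)])               # rows u_n, eqs. (10)-(11)
    D = block_diag(*[(q[:, [s]] * Z[s]).T @ Z[s] for s in range(k)])  # eq. (14)
    B = U.T @ U
    # solve D v = (lambda + 1) U^T U v, eq. (20); eigenvalues come out ascending.
    # The tiny ridge keeps B positive definite for the solver.
    lam, V = eigh(D, B + 1e-9 * np.eye(len(B)))
    L = V[:, 1:d + 1]        # drop the trivial lambda = 0 (collapsed) solution
    G = U @ L                # latent coordinates, eq. (13)
    G -= G.mean(axis=0)      # translation constraint (16)
    G /= G.std(axis=0)       # rescale each coordinate to unit variance
    return G, L
```

Called with the q and per-component features of the previous sketch (each stacked with a column of ones), this would produce embeddings like the one shown in Fig. 1.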

3 Incorporating more points in the objective function

In this section we discuss how using multiple mixtures for the data can help to obtain more stable results; the next section then discusses how this method can be exploited to define a non-linear form of Canonical Correlation Analysis.

Sometimes the method of the previous section collapses several linear feature extractors in some part of the underlying data manifold onto a point or a line, whereas the Automatic Alignment (AA) method [14] was less sensitive to such phenomena. AA is based on the Locally Linear Embedding algorithm [7] and finds linear maps to integrate a collection of locally valid feature extractors in a manner similar to what we described in the previous section. In AA the objective function is:

$\Phi_{AA} = \sum_n \big\| g_n - \sum_m W_{nm}\, g_m \big\|^2$,    (24)

where again $g_n = \sum_s q_{ns}\, g_n^s$. The weights, collected in the weight matrix W, are found by first determining for every data point $x_n$ its nearest neighbors $\mathcal{N}(n)$ in the data space, and then using the W that minimizes:

$\sum_n \big\| x_n - \sum_{m \in \mathcal{N}(n)} W_{nm}\, x_m \big\|^2$,    (25)

under the constraint that $\sum_m W_{nm} = 1$ and that only nearest neighbors can have non-zero weights, i.e. $W_{nm} = 0$ for $m \notin \mathcal{N}(n)$.

Figure 1: From top to bottom: (1) Data in $\mathbb{R}^3$ with charts indicated by the blue and yellow sticks. (2) Data representation in $\mathbb{R}^2$. (3) First original coordinate on manifold against first found latent coordinate. (4) Second original coordinate on manifold against second found latent coordinate.

The objective function (8) of the previous section is in fact only based on points for which the posterior $q_{ns}$ assigns non-negligible mass to at least two mixture components. This is seen directly from (9), since if one of the $q_{ns}$ takes all mass, all products of the posteriors are zero except for one term, in which the squared distance is zero. In other words, the objective only focuses on data in the overlap of at least two mixture components. So only points for which the entropy of the posterior is reasonably high contribute to the objective function; the other data, for which the entropy of the posterior is low, is completely neglected in the objective function and thus in the coordination of the local feature extractors. In practice, often the bulk of the data has a low entropy posterior and hardly plays a role in the objective function.

To make more use of the available data we should have more entropy in the posteriors. One way to do that is to smooth the posteriors by taking the posterior closest, in KL-divergence sense, to the old posterior that has a certain entropy, yielding $q_{ns} \propto q_{ns}^{\alpha}$.
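A small sketch of this smoothing follows; the bisection search for α is our own choice, since the paper does not specify how the exponent is found, and it assumes the target entropy does not exceed that of the uniform distribution.

```python
import numpy as np

def smooth_posterior(q, target_entropy, tol=1e-8):
    """Flatten a posterior q (nonnegative, summing to one) to q**alpha,
    renormalized, with alpha in [0, 1] chosen by bisection so that the
    entropy (in nats) reaches target_entropy."""
    entropy = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    if entropy(q) >= target_entropy:
        return q                  # already flat enough
    lo, hi = 0.0, 1.0             # alpha = 0: uniform; alpha = 1: unchanged
    while hi - lo > tol:
        alpha = 0.5 * (lo + hi)
        p = q ** alpha
        p /= p.sum()
        if entropy(p) > target_entropy:
            lo = alpha            # entropy still above target: alpha may grow
        else:
            hi = alpha
    p = q ** lo                   # largest alpha found with enough entropy
    return p / p.sum()
```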

Another possibility is to smooth such that the average entropy has a certain value. However, determining the right amount of entropy in the posteriors seems a rather problematic question.

The problem can also be circumvented by using several mixtures, say two, in parallel. The mixtures are trained independently, and the final posterior for a specific mixture component from one of the two mixtures is defined as half of the posterior within the mixture from which this component originates. So the sum of the posteriors on the components originating from the first mixture equals 1/2, and similarly for the second mixture. In this manner we always have an entropy of at least one bit in the posteriors, and hence all points make a significant contribution to the objective function. Note that we are using just a heuristic here; more principled ways to learn a mixture with high entropy in the posteriors could be devised, but do not seem essential. Only in the case that the two independently learned mixtures are the same have we gained nothing. This, however, is not likely to be the case for mixture models fitted by EM algorithms, which find local optima of the log-likelihood function: we initialize the EM mixture learning algorithm at different values, and in this manner we (probably) find mixtures that are different local optima of the data log-likelihood function. We can think of this method as covering the data manifold not with one set of tiles, but with two coverings of the manifold; in this way every tile overlaps with several other tiles over its complete extent.

We found in experiments that using two mixtures instead of one gives great improvement in the obtained results. Moreover, we avoid the computation of the weight matrix W needed by AA: the time needed to compute the weight matrix is in principle quadratic in the number of data points, and it can be problematic in non-Euclidean spaces. By using several mixtures in parallel we avoid the quadratic cost and have a method that is based purely on mixture models and that can be readily applied to any type of mixture model.

In Fig. 2 we give some examples of data embeddings obtained on a data set of 1965 pictures of a face (these images can be obtained from the web page of Sam Roweis). First the images were projected from the original 28 × 20 pixel coordinates to 20 dimensions with PCA. The projection from 20 to 2 dimensions was done by fitting a 20 component mixture of 2-dimensional probabilistic PCA [1] and aligning the components with the method described in the previous section. We show the latent coordinates assigned to the pictures as dots; some pictures are shown next to their latent coordinates. The two bottom figures, where we used two parallel mixtures, show a nice separation of the smiling faces and the angry faces; furthermore, the variations in gaze direction are nicely identified. The two top images, where we used single mixtures with respectively ten and twenty mixture components, do not show the separation between the two moods.

4 Non-linear Canonical Correlation Analysis

In Canonical Correlation Analysis (CCA) [15, 16] two zero-mean sets of points are given: $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^p$ and $Y = \{y_1, \ldots, y_N\} \subset \mathbb{R}^q$. The aim is to find linear maps a and b that map members of X and Y, respectively, onto the real line, such that the correlation between the linearly transformed variables is maximized. This is easily shown to be equivalent to minimizing:

$E = \sum_n [\, a x_n - b y_n \,]^2$    (26)

under the constraint that $a \left[\sum_n x_n x_n^\top\right] a^\top + b \left[\sum_n y_n y_n^\top\right] b^\top = 1$. Several generalizations of CCA are possible.
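For reference, this classical linear case has a closed-form solution via whitening and a singular value decomposition; the following is a textbook numpy sketch (our own implementation, not code from the paper), already written for d canonical pairs.

```python
import numpy as np

def linear_cca(X, Y, d, reg=1e-8):
    """Classical CCA: maps A, B such that corr(X A, Y B) is maximized,
    with unit-covariance canonical variates. reg is a small ridge for
    numerical stability (our addition)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Cxx = X.T @ X / len(X) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / len(Y) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / len(X)
    # whiten both views, then take the SVD of the whitened cross-covariance
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    A = Wx.T @ U[:, :d]      # columns map X to the canonical variates
    B = Wy.T @ Vt[:d].T      # columns map Y to the canonical variates
    return A, B, s[:d]       # s[:d] holds the canonical correlations
```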
Generalizing the above such that the sets do not need to be zero mean, and allowing a translation as well, gives the problem of minimizing:

$E = \sum_n [\,(a x_n + t_a) - (b y_n + t_b)\,]^2$,    (27)

again under the constraint that the sum of the variances of the projections of X and Y equals some fixed constant. We can also generalize by mapping to $\mathbb{R}^d$ instead of the real line, and then requiring the covariance matrix of the projections to be the identity. (If the linear maps are constrained to be orthonormal matrices, i.e. rotations plus reflections, this is known as the Procrustes rotation [17].) CCA can also be readily extended to take into account more than two point sets, cf. [9].
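The Procrustes rotation mentioned above has a well-known closed-form solution via the SVD; for completeness, a small sketch (standard textbook material, not from the paper), assuming the two point sets are already centered.

```python
import numpy as np

def procrustes_rotation(Gx, Gy):
    """Orthogonal Procrustes: the orthonormal R minimizing ||Gx @ R - Gy||_F
    for centered (N, d) point sets Gx and Gy."""
    U, _, Vt = np.linalg.svd(Gx.T @ Gy)
    return U @ Vt
```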

In the generalized CCA with multiple point sets, allowing translations and mapping to $\mathbb{R}^d$, we minimize the squared distance between all pairs of projections, under the constraint that each set of projected points has unit variance. We denote the projection of the n-th point in the s-th point set as $g_n^s$, and thus minimize the error function:

$\Phi_{CCA} = \frac{1}{2k^2} \sum_{s=1}^{k} \sum_{t=1}^{k} \sum_n \| g_n^s - g_n^t \|^2$    (28)

$= \frac{1}{k} \sum_{s=1}^{k} \sum_n \| g_n^s - g_n \|^2$,    (29)

where $g_n = \frac{1}{k} \sum_s g_n^s$, and the constraint is that the projections $g_n$ have unit covariance. Note that the objective Φ in equation (8) coincides with $\Phi_{CCA}$ in (29) if we set $q_{ns} = 1/k$. The different constraints imposed upon the optimization by CCA and by our objective of the previous sections turn out to be equivalent, since the stable points under both constraints coincide. We can regard the features extracted by each local feature extractor as a point set; hence, we can interpret the method of Section 2 as a weighted form of CCA on multiple point sets that allows translations.

Figure 2: From top to bottom: (1) One mixture of 10 components. (2) One mixture of 20 components. (3) Two parallel mixtures of 10 components each.

Viewing the coordination of local feature extractors as a form of CCA suggests using the coordination technique for non-linear CCA. This is achieved quite easily, without modifying the objective function (8). We consider different point sets, each having a mixture of locally valid linear projections into the global (in the sense of shared by all mixture components and point sets) latent space. We minimize the weighted sum of the squared distances between all pairs of projections, i.e. we have pairs of projections due to the same point set and also pairs that combine projections from different point sets. As with the use of parallel mixtures, the weight $q_{ns}$ associated with the s-th projection of observation n is given by the posterior of the mixture component in the corresponding mixture (e.g. the posterior $p(s \mid x_n)$ if projection s is due to a component in the mixture for X, and $p(s \mid y_n)$ if s refers to a component in the mixture fitted to Y), divided by the number of point sets.

It is now clear that the use of parallel mixtures in the previous section was merely a form of non-linear CCA where both point sets are the same. We again define $g_n = \sum_s q_{ns}\, g_n^s$, where s now ranges over all mixture components in all mixtures fitted on the C different point sets. We use $q_{nr}^c$ to denote the posterior on component r for observation n in point set c, where the $q_{nr}^c$ are rescaled with a factor C such that all $q_{nr}^c$ for one point set c sum to one. Furthermore, we define $g_n^c = \sum_r q_{nr}^c\, g_n^{cr}$ as the average projection due to point set c. The objective can then be rewritten in different forms:

$\Phi = \frac{1}{2} \sum_n \sum_{s,t} q_{ns}\, q_{nt}\, \| g_n^s - g_n^t \|^2$    (30)

$= \sum_n \sum_s q_{ns}\, \| g_n^s - g_n \|^2$    (31)

$= \frac{1}{C} \sum_c \sum_{n,s} q_{ns}^c\, \| g_n - g_n^{cs} \|^2$    (32)

$= \frac{1}{C} \sum_c \sum_n \| g_n - g_n^c \|^2 \ +\ \frac{1}{C} \sum_c \sum_{n,s} q_{ns}^c\, \| g_n^c - g_n^{cs} \|^2$.    (33)

Observe how in (33) the objective sums both a between-point-set consistency of the projections (first summand) and a within-point-set consistency of the projections (second summand).
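The between/within decomposition in (33) is easy to verify numerically; below is a small self-contained check on random projections and per-set posteriors of our own construction (shapes and names are illustrative only).

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, k, d = 5, 2, 3, 2                       # points, point sets, components, dims
g = rng.normal(size=(N, C, k, d))             # projections g_n^{cs}
q = rng.random((N, C, k))
q /= q.sum(axis=2, keepdims=True)             # q_ns^c sums to one within a set

gc = np.einsum('ncs,ncsd->ncd', q, g)         # per-set averages g_n^c
gn = gc.mean(axis=1)                          # overall averages g_n
lhs = np.einsum('ncs,ncsd->', q, (g - gn[:, None, None]) ** 2) / C       # eq. (32)
between = ((gn[:, None] - gc) ** 2).sum() / C                            # eq. (33), 1st term
within = np.einsum('ncs,ncsd->', q, (g - gc[:, :, None]) ** 2) / C       # eq. (33), 2nd term
assert np.isclose(lhs, between + within)
```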

As an illustrative toy application of the non-linear CCA we took two point sets in $\mathbb{R}^2$ of 600 points each. The first point set was generated on an S-shaped curve, the second point set along a circle segment; to both point sets we added Gaussian noise. We learned a k = 10 component mixture model on both point sets, illustrated in the top plots of Fig. 3. In the bottom plot the discovered latent space representation (vertical axis) is plotted against the coordinate on the generating curve (horizontal axis); the relationship is monotonic and roughly linear.

Figure 3: (Top) Two point sets and the fitted mixture models. Local charts are indicated by the blue sticks. (Bottom) Scatter plot of the found latent coordinates (vertical) against the coordinate along the generating curve (horizontal).
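To reproduce such an experiment, the inputs of the alignment eigenproblem of Section 2 only need to be assembled from the per-set posteriors and local features. A sketch under our own naming, reusing the coordinate_charts solver sketched in Section 2:

```python
import numpy as np

def nonlinear_cca_inputs(posteriors, features):
    """Assemble alignment inputs for non-linear CCA over C point sets.
    posteriors: list of C arrays (N, k_c), rows summing to one;
    features: list of C lists with one (N, d_s) local-feature array per
    component. Weights are divided by C, as in the text."""
    C = len(posteriors)
    q = np.hstack(posteriors) / C                 # q_ns over all sets' components
    Z = [np.hstack([f, np.ones((len(f), 1))])     # homogeneous z_n^s = [f; 1]
         for fs in features for f in fs]
    return q, Z

# e.g., with posteriors qx, qy and per-component features fx_list, fy_list of
# the two fitted mixtures (hypothetical variables):
# q, Z = nonlinear_cca_inputs([qx, qy], [fx_list, fy_list])
# G, L = coordinate_charts(q, Z, d=1)   # shared 1-dimensional latent space
```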

5 Discussion and Conclusions

We have shown how we can globally maximize a free-energy objective similar to that of [12, 13] with respect to the linear maps $L_s$ that map homogeneous coordinates of the local feature extractors into a global latent space. This was achieved by making the variance of $p(g \mid s, x)$ independent of the linear maps. The linear maps are found by computing the d + 1 smallest eigenvectors of a matrix. The edge size of the eigenproblem is given by k plus the total number of extracted features; typically, if all feature extractors give the same number d of features, this will be k(d + 1). However, it is not necessary to let every mixture component extract an equal number of features.

The way in which we find the latent representation of the data is very similar to the k-segments algorithm [18]. In the latter we first fitted a mixture of PCAs and used an (only locally optimal) combinatorial optimization algorithm to discover how the different PCAs (segments) could be aligned. Here too, first a mixture model is fitted to the data and then an optimization problem is solved to coordinate the mixture components. However, in the current work the optimization of the alignment is globally optimal, and it can be applied to PCAs of any dimension and latent spaces of any dimension. Compared to [14] we do not have to compute neighbors in the data space, which is costly and, moreover, can be hard when processing data with mixed discrete and continuous features; we only need a mixture model to produce posteriors. In [11] the same objective and solution are used as here. We showed how using multiple mixtures in parallel can boost performance significantly, and how the same technique can be used for non-linear CCA. We illustrated the viability of the presented algorithms on synthetic and real data. In future work, we want to exploit the link with CCA further by investigating its applicability in classification and regression problems.

Acknowledgment

This research is supported as project AIF 4997 by the Technology Foundation STW, applied science division of NWO, and the technology program of the Ministry of Economic Affairs.

References

[1] M.E. Tipping and C.M. Bishop. Mixtures of probabilistic principal component analysers. Neural Computation, 11(2):443-482, 1999.

[2] C.M. Bishop, M. Svensén, and C.K.I. Williams. GTM: the generative topographic mapping. Neural Computation, 10:215-234, 1998.

[3] J.J. Verbeek, N. Vlassis, and B. Kröse. Locally linear generative topographic mapping. In M. Wiering, editor, Proc. of 12th Belgian-Dutch conf. on Machine Learning, 2002.

[4] T. Kohonen. Self-Organizing Maps. Springer, 2001.

[5] J.J. Verbeek, N. Vlassis, and B.J.A. Kröse. Self-organization by optimizing free-energy. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks. D-side, Evere, Belgium, 2003. To appear.

[6] J.B. Tenenbaum, V. de Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319-2323, December 2000.

[7] S.T. Roweis and L.K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323-2326, December 2000.

[8] B. Schölkopf, A.J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299-1319, 1998.

[9] F.R. Bach and M.I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1-48, 2002.

[10] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In T.G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.

[11] M. Brand. Charting a manifold. In Advances in Neural Information Processing Systems 15, Cambridge, MA, 2003. MIT Press.

[12] S.T. Roweis, L.K. Saul, and G.E. Hinton. Global coordination of local linear models. In T.G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.

[13] J.J. Verbeek, N. Vlassis, and B. Kröse. Coordinating principal component analyzers. In J.R. Dorronsoro, editor, Proceedings of Int. Conf. on Artificial Neural Networks, pages 914-919, Madrid, Spain, 2002. Springer.

[14] Y.W. Teh and S.T. Roweis. Automatic alignment of local representations. In Advances in Neural Information Processing Systems 15, Cambridge, MA, 2003. MIT Press.

[15] H. Hotelling. Relations between two sets of variates. Biometrika, 28:321-377, 1936.

[16] K.V. Mardia, J.T. Kent, and J.M. Bibby. Multivariate Analysis. Academic Press, 1994.

[17] T.F. Cox and M.A.A. Cox. Multidimensional Scaling. Number 59 in Monographs on Statistics and Applied Probability. Chapman & Hall, 1994.

[18] J.J. Verbeek, N. Vlassis, and B. Kröse. A k-segments algorithm for finding principal curves. Pattern Recognition Letters, 23(8):1009-1017, 2002.
