Efficient Splice Site Prediction with Context-Sensitive Distance Kernels


Bernard Manderick, Feng Liu and Bram Vanschoenwinkel

This paper presents a comparison between different context-sensitive kernel functions for doing splice site prediction with a support vector machine. Four types of kernel functions are used: linear, polynomial, radial basis function and negative distance-based kernels. Domain knowledge can be incorporated into the kernels by incorporating statistical measures or by directly plugging in distance functions defined on the splice site instances. The experimental results show that the radial basis function-based kernels achieve the best accuracies. However, because classification speed is of crucial importance to a splice site prediction system, this kernel is computationally too expensive. Nevertheless, in general, incorporating domain knowledge not only improves classification accuracy but also reduces model complexity, which in turn increases classification speed.

1. Introduction

An important task in bioinformatics is the analysis of genome sequences for the location and structure of their genes, often referred to as gene finding. In general, a gene can be defined as a region in a DNA sequence that is used in the production of a specific protein. In many genes, the DNA sequences coding for proteins, called exons, may be interrupted by stretches of non-coding DNA, called introns. A gene starts with an exon, is then interrupted by an intron, followed by another exon, intron and so on, until it ends in an exon. Splicing is the process by which the introns are removed from between the exons. Hence we can make a distinction between two different splice sites: i) the exon-intron boundary, referred to as the donor site, and ii) the intron-exon boundary, referred to as the acceptor site. Splice site prediction, an important subtask in gene finding systems, is the automatic identification of those regions in the DNA sequence that are either donor sites or acceptor sites [?].
Because splice site prediction instances can be represented by a context of a number of nucleotides before and after the candidate splice site, it is called a context-dependent classification task. In this paper we do splice site prediction with support vector machines (SVMs) using kernel functions that take into account the information available at different positions in the contexts. In this sense the kernel functions are called context-sensitive. This is explained in Sections ?? and ??.

Advances in Systems Modelling and ICT Applications

More precisely, in a support vector machine, the data is first mapped non-linearly from the original input space X to a high-dimensional Hilbert space called the feature space F and then separated by a maximum-margin hyperplane, i.e. linearly, in that space F. By making use of the kernel trick, the mapping $\phi : X \to F$ remains implicit, and as a result we avoid working in the high-dimensional feature space F. Moreover, because the mapping is non-linear, the decision boundary, which is linear in F, corresponds to a non-linear decision boundary in the input space X. One of the most important design decisions in SVM learning is the choice of the kernel function $K : X \times X \to \mathbb{R}$, because the maximum-margin hyperplane is defined completely by inner products of vectors in the Hilbert feature space F, computed through K. The kernel K takes elements x and y from the input space X and calculates the inner product of $\phi(x)$ and $\phi(y)$ in F without having to represent, or even to know, the exact form of $\phi(x)$ and $\phi(y)$. As a consequence the mapping $\phi$ remains implicit and we have a computational benefit [?]. In the light of the above it is not hard to see that the computational efficiency of K is crucial for the success of the classification process. We refer to Section ?? for more theoretical background concerning the SVM.

Moreover, the learning process can benefit a lot from the use of special-purpose similarity or dissimilarity measures in the calculation of K [?,?,?,?]. However, incorporating such knowledge in a kernel function is non-trivial, since a kernel function K has to satisfy a number of properties that result directly from the definition of an inner product. In this paper we will consider two types of kernel functions that can make direct use of distance functions defined on the contexts themselves: i) negative distance kernels and ii) radial basis function kernels. This is explained in Section ??.
Furthermore, because classification speed is of crucial importance to a splice site prediction system, the kernel functions used should be computationally very efficient. For that reason, in related work on splice site prediction with an SVM, a linear kernel is chosen in favor of computational efficiency but at the cost of some accuracy [?]. In this light most of the kernels presented here will probably be too expensive. Therefore we also show results for context-sensitive linear kernels; these results show that classification speed can be further increased while precision and accuracy of the predictions are a little higher. This is discussed in Section ??.

2. Context-Dependent Classification

In this paper we consider classification tasks where the purpose is to classify a focus symbol in a sequence of symbols, based on a number of symbols before and after the focus symbol. The focus symbol, together with the symbols before and after it, is called a context, and applications that rely on such contexts will be called context-dependent. Splice site prediction is a typical example of a context-dependent classification task. Here, each symbol is one of the four nucleotides {A, C, G, T}.

Key Note 1: Efficient Splice Site Prediction with Context-Sensitive Distance Kernels

2.1 Contexts

We start with a definition of a context, followed by an illustration in the framework of splice site prediction.

Definition 1. A context $s_p^q$ is a sequence of symbols $s_i \in D$ with p symbols before and q symbols after a focus symbol $s_p$ at position p, as follows:

$$ s_p^q = (s_0, \ldots, s_p, \ldots, s_{p+q}) \tag{1} $$

with (p + q) + 1 the length of the context, D the dictionary of all symbols with |D| = m, p the left context size and q the right context size.

Example 1. Recall from the introduction that in splice site prediction the purpose is to automatically identify those regions in a DNA sequence that are donor sites or acceptor sites. Essentially, DNA is a sequence of nucleotides represented by a four-character alphabet or dictionary D = {A, C, G, T}. Moreover, an acceptor site always contains the AG dinucleotide and a donor site always contains the GT dinucleotide. In this light, splice site prediction instances can be represented by a context of a number of nucleotides before and after the AG/GT dinucleotides. More precisely, given a fragment of a DNA sequence, ...CCATTGGTGGCAGCCAG..., the candidate donor site given by the dinucleotide GT can be represented by a context in terms of Definition ?? as

$$ s_p^q = (\underbrace{A, T, T, G}_{s_0, \ldots, s_{p-1}}, \underbrace{GT}_{s_p}, \underbrace{G, G, C}_{s_{p+1}, \ldots, s_{p+q}}) $$

with p = 4 the left context size, q = 3 the right context size and (p + q) + 1 = 8 the total length of the context. Furthermore, for splice site prediction there is no need to represent the AG/GT dinucleotides, because two separate classifiers are trained, one for donor sites and one for acceptor sites. In this light the only possible symbols occurring in the contexts are given by the dictionary D. Note that, for reasons of computational efficiency, in practice the symbols in the contexts are represented by integers: every symbol that occurs in the training set is assigned a unique index, and that index is used in the context instead of the symbol itself.
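As a minimal sketch of the context representation and integer indexing described above, the following Python code extracts the context around the GT dinucleotide of the example fragment and maps its symbols to integers. The function names `extract_context`, `build_index` and `encode` are illustrative assumptions, not names from the original work, and the dinucleotide itself is dropped as the text suggests.

```python
def extract_context(sequence, site, left, right):
    """Return the `left` symbols before and `right` symbols after the
    dinucleotide starting at index `site`; the dinucleotide itself is dropped."""
    start = site - left
    end = site + 2 + right
    if start < 0 or end > len(sequence):
        raise ValueError("context window falls outside the sequence")
    return sequence[start:site] + sequence[site + 2:end]


def build_index(contexts):
    """Give every symbol seen in the training contexts a unique integer index."""
    index = {}
    for context in contexts:
        for symbol in context:
            index.setdefault(symbol, len(index))
    return index


def encode(context, index):
    """Replace each symbol by its integer index."""
    return [index[symbol] for symbol in context]


fragment = "CCATTGGTGGCAGCCAG"
site = fragment.index("GT")  # first candidate donor site in the fragment
context = extract_context(fragment, site, left=4, right=3)
index = build_index([context])
print(context, encode(context, index))  # ATTGGGC [0, 1, 1, 2, 2, 2, 3]
```

The index is built from the training contexts only, so a symbol that appears only at test time would raise a `KeyError` in this sketch; how the original system handles that case is not stated.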
2.2 The Overlap Metric

The most basic distance function defined on contexts is called the overlap metric; it simply counts the number of mismatching symbols at corresponding positions in two contexts.
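The counting described above can be sketched in a few lines of Python. The function name `overlap_distance` and its optional `weights` argument are illustrative assumptions; without weights every mismatch counts as 1, and with weights each mismatching position contributes its own non-negative weight.

```python
def overlap_distance(s, t, weights=None):
    """Sum the weights of the positions where s and t differ.

    With weights=None every mismatch counts as 1 (plain mismatch count).
    """
    if len(s) != len(t):
        raise ValueError("contexts must have the same length")
    if weights is None:
        weights = [1.0] * len(s)
    return sum(w for a, b, w in zip(s, t, weights) if a != b)


s, t = "ATTGGGC", "ATCGGGA"
print(overlap_distance(s, t))  # two mismatching positions
print(overlap_distance(s, t, [0.1, 0.1, 0.5, 1.0, 1.0, 0.5, 0.2]))
```

The weight values in the second call are made up for the example; in practice they would come from a statistic computed on the training data.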

Definition 2. Let X be a set of contexts $s_p^q$ and $t_p^q$ with n = (p + q) + 1 the length of the contexts, with symbols $s_i, t_i \in D$, the dictionary of all distinct symbols with |D| = m, and let w be a context weight vector. Then the overlap metric $d_{OM} : X \times X \to \mathbb{R}^+$ is defined as

$$ d_{OM}(s, t) = \sum_{i=0}^{n-1} \delta(s_i, t_i) \tag{2} $$

with $\delta : D \times D \to \mathbb{R}^+$ defined as

$$ \delta(s_i, t_i) = \begin{cases} w_i & \text{if } s_i \neq t_i \\ 0 & \text{else} \end{cases} \tag{3} $$

with $w_i \geq 0$ a context weight for the symbol at position i.

Next, we make a distinction between two cases: i) if all $w_i = 1$, no weighting takes place and the metric is referred to as the simple overlap metric $d_{SOM}$, and ii) otherwise a position-dependent weighting does take place and the metric is referred to as the weighted overlap metric $d_{WOM}$. A question that now naturally arises is: what measures can be used to weigh the different context positions? Information theory provides many useful tools for measuring how informative each context position is. In this work we made use of three measures known as i) information gain [?], ii) gain ratio [?] and iii) shared variance [?]. For more details the reader is referred to the related literature.

2.3 The Modified Value Difference Metric

The Modified Value Difference Metric (MVDM) [?] is a powerful method for measuring the distance between sequences of symbols like the contexts considered here. The MVDM is based on the Stanfill-Waltz Value Difference Metric introduced in 1986 [?]. The MVDM determines the similarity of all the possible symbols at a particular context position by looking at the co-occurrence of the symbols with the target class. Consider the following definition.

Definition 3. Let X be a set of contexts $s_p^q$ and $t_p^q$ with n = (p + q) + 1 the length of the contexts as before, with components $s_i, t_i \in D$, the dictionary of all distinct symbols with |D| = m. Then the modified value difference metric $d_{MVDM} : X \times X \to \mathbb{R}^+$ is defined as

$$ d_{MVDM}(s, t) = \sum_{i=0}^{n-1} \delta(s_i, t_i)^r \tag{4} $$

with r a constant often equal to 1 or 2 and with $\delta : D \times D \to \mathbb{R}^+$ the difference of the conditional distributions of the classes, as follows:

$$ \delta(s_i, t_i)^r = \sum_{j=1}^{M} \left| P(y_j \mid s_i) - P(y_j \mid t_i) \right|^r \tag{5} $$

with $y_j$ the class labels and M the number of classes in the classification problem under consideration.

3. Context-Sensitive Kernel Functions

In the following section we introduce a number of kernel functions that make direct use of the distance functions $d_{SOM}$, $d_{WOM}$ and $d_{MVDM}$ defined in the previous section. In the case of $d_{WOM}$ and $d_{MVDM}$ the kernels are called context-sensitive, as they take into account the amount of information that is present at different context positions, as discussed above.

3.1 Theoretical Background

Recall that in the SVM framework classification is done by considering a kernel-induced feature mapping $\phi$ that maps the data from the input space X to a high-dimensional Hilbert space F, and classification is done by means of a maximum-margin hyperplane in that space F. This is done by making use of a special function called a kernel.

Definition 4. A kernel $K : X \times X \to \mathbb{R}$ is a symmetric function so that for all x and x' in X, $K(x, x') = \langle \phi(x), \phi(x') \rangle$, where $\phi$ is a (non-linear) mapping from the input space X into the Hilbert space F provided with the inner product $\langle \cdot, \cdot \rangle$.

However, not all symmetric functions over $X \times X$ are kernels that can be used in an SVM, because a kernel function needs to satisfy a number of conditions imposed by the fact that it calculates an inner product in F. More precisely, in the SVM framework we distinguish two classes of kernel functions: i) positive semidefinite (PSD) kernels and ii) conditionally positive definite (CPD) kernels. Whereas a PSD kernel can be considered as a generalization of one of the simplest similarity measures, i.e. the inner product, CPD kernels can be considered as generalizations of the simplest dissimilarity measure, i.e. the distance $\|x - x'\|$ [?,?,?]. One type of CPD kernel that is of particular interest to us is given in [?], from which we quote the following two theorems.

Theorem 1. Let X be the input space, then the function $K_{nd} : X \times X \to \mathbb{R}$,

$$ K_{nd}(x, x') = -\|x - x'\|^{\beta} \quad \text{with } 0 < \beta \leq 2 \tag{6} $$

is CPD. The kernel $K_{nd}$ defined in this way is referred to as the negative distance kernel.

Another result that is of particular interest to us relates a CPD kernel K to a PSD kernel by plugging K into the exponent of the standard radial basis function kernel; this is expressed in the following theorem [?]:

Theorem 2. Let X be the input space and let $K : X \times X \to \mathbb{R}$ be a kernel, then K is CPD if and only if

$$ K_{rbf}(x, x') = \exp(\gamma K(x, x')) \tag{7} $$

is PSD for all $\gamma > 0$. The kernel $K_{rbf}$ defined in this way is referred to as the radial basis function kernel.

For Theorem ?? to work, it is assumed that $X \subseteq \mathbb{R}^n$, where $\mathbb{R}^n$ is a normed vector space. But for contexts in particular, and sequences of symbols in general, one cannot define a norm like in the right-hand side of Equation ??. More precisely, given the results above, if we want to use an arbitrary distance $d_X$ defined on the input space X in a kernel K, we should be able to express it as $d_X(x, x') = \|x - x'\|$, from which it then automatically follows that $-d_X$ is CPD by application of Theorem ??. In our case however, since the input space X is the set of all contexts of length n, the distances $d_{SOM}$, $d_{WOM}$ and $d_{MVDM}$ we would like to use cannot be expressed in terms of Theorem ??. Nevertheless, in previous work it has been shown that $-d_{SOM}$, $-d_{WOM}$ and $-d_{MVDM}$ are CPD [?,?,?,?]; this will be briefly explained next. For more details the reader is referred to the literature.

For the overlap metric defined on the contexts, it can be shown that it corresponds to an orthonormal vector encoding of those contexts [?,?,?]. In the orthonormal vector encoding every symbol in the dictionary D is represented by a unique unit vector and complete contexts are formed by concatenating these unit vectors. Notice that this is actually the standard approach to context-dependent classification with SVMs [?,?]; in this light the non-sensitive linear, polynomial, radial basis function and negative distance kernels employing the simple overlap metric (i.e. the unweighted case) presented next are actually equivalent to the standard linear, polynomial, radial basis function and negative distance kernels applied to the orthonormal vector encoding of the contexts. Finally, for the MVDM with r = 2 it can be shown that it corresponds to the Euclidean distance in a transformed space, based on a probabilistic reformulation of the MVDM presented in [?,?].
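The correspondence between the simple overlap metric and the orthonormal vector encoding can be checked numerically on a small example. The sketch below is illustrative only: the helper names `one_hot`, `dot` and `simple_overlap` are assumptions, and the alphabet is fixed to the four nucleotides. It shows that the inner product of two encoded contexts equals the number of matching positions, and that their squared Euclidean distance equals twice the number of mismatches.

```python
def one_hot(context, alphabet="ACGT"):
    """Concatenate one unit vector per symbol (orthonormal encoding)."""
    vector = []
    for symbol in context:
        vector.extend(1 if symbol == letter else 0 for letter in alphabet)
    return vector


def dot(u, v):
    """Standard inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def simple_overlap(s, t):
    """Number of mismatching positions (unweighted overlap metric)."""
    return sum(1 for a, b in zip(s, t) if a != b)


s, t = "ATTGGGC", "ATCGGGA"
u, v = one_hot(s), one_hot(t)
matches = dot(u, v)
squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))

print(matches, len(s) - simple_overlap(s, t))    # both count matching positions
print(squared_distance, 2 * simple_overlap(s, t))  # each mismatch contributes 2
```

Working on the symbols directly avoids building vectors that are |D| times longer than the contexts, which is one practical reason to define the kernels on contexts rather than on the encoded vectors.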
3.2 A Weighted Polynomial Kernel

The first kernel we will define here is based on Equation ?? of the definition of the overlap metric from Definition ??. In the same way as before, we make a distinction between the unweighted, non-sensitive case and the weighted, context-sensitive case; for more details the reader is referred to [?,?,?].

Definition 5. Let X be the input space with contexts $s_p^q$ and $t_p^q$ with n = (p + q) + 1 the length of the contexts and $s_i, t_i \in D$ the symbols at position i in the contexts as before, and let w be a context weight vector, then we define the simple overlap kernel $K_{SOK} : X \times X \to \mathbb{R}$ as

$$ K_{SOK}(s, t) = \left( \sum_{i=0}^{n-1} \delta(s_i, t_i) + c \right)^{d} \tag{8} $$

with $c \geq 0$, $d > 0$ and w = 1. The weighted overlap kernel $K_{WOK} : X \times X \to \mathbb{R}$ is defined in the same way but with a context weight vector $w \neq 1$. For this kernel to agree with the standard polynomial kernel on the orthonormal vector encoding, the sum has to run over matching positions, i.e. position i contributes $w_i$ when $s_i = t_i$ and 0 otherwise.

3.3 Negative Distance Kernels

Next, we give the definitions of three negative distance kernels employing the distances $d_{SOM}$, $d_{WOM}$ and $d_{MVDM}$; for more details we refer to [?,?,?]. We start with the definition of two negative distance kernels using the overlap metric from Definition ??. In the same way as before, we make a distinction between the unweighted, non-sensitive case $d_{SOM}$ and the weighted, context-sensitive case $d_{WOM}$.

Definition 6. Let X be the input space with contexts $s_p^q$ and $t_p^q$ with n = (p + q) + 1 the length of the contexts and $s_i, t_i \in D$ the symbols at position i in the contexts as before, and let w be a context weight vector, then we define the negative overlap distance kernel $K_{NODK} : X \times X \to \mathbb{R}$ as

$$ K_{NODK}(s, t) = -d_{SOM}(s, t)^{\beta} \tag{9} $$

with $0 < \beta \leq 2$ and w = 1 as before. The negative weighted distance kernel $K_{NWDK} : X \times X \to \mathbb{R}$ is defined in the same way but substituting $d_{WOM}$ for $d_{SOM}$ in the right-hand side of Equation ??, i.e. with a context weight vector $w \neq 1$.

Similarly, for the MVDM from Definition ?? we can define a negative distance type kernel as follows.

Definition 7. Let X be the input space with contexts $s_p^q$ and $t_p^q$ with n = (p + q) + 1 the length of the contexts and $s_i, t_i \in D$ the symbols at position i in the contexts as before, then we define the negative modified distance kernel $K_{NMDK} : X \times X \to \mathbb{R}$ as

$$ K_{NMDK}(s, t) = -d_{MVDM}(s, t)^{\beta} \tag{10} $$

with $0 < \beta \leq 2$ as before.

However, it should be noted that for r = 1 in the definition of the MVDM, $-d_{MVDM}$ is not CPD and thus for r = 1 the kernel $K_{NMDK}$ will also not be CPD. Nevertheless, given the good empirical results we will use $K_{NMDK}$ with $d_{MVDM}$ and r = 1 anyway.

3.4 Radial Basis Function Kernels

Next, we give the definitions of three radial basis function kernels employing the distances $d_{SOM}$, $d_{WOM}$ and $d_{MVDM}$; for more details we refer to [?,?,?]. We start with the definition of two radial basis function kernels employing the overlap metric from Definition ??. In the same way as before, we make a distinction between the unweighted, non-sensitive case $d_{SOM}$ and the weighted, context-sensitive case $d_{WOM}$.

Definition 8. Let X be the input space with contexts $s_p^q$ and $t_p^q$ with n = (p + q) + 1 the length of the contexts and $s_i, t_i \in D$ the symbols at position i in the contexts as before, and let w be a context weight vector, then we define the overlap radial basis function kernel $K_{ORBF} : X \times X \to \mathbb{R}$ as

$$ K_{ORBF}(s, t) = \exp\left( -\gamma \, d_{SOM}(s, t) \right) \tag{11} $$

with $\gamma > 0$ as before and w = 1. The weighted radial basis function kernel $K_{WRBF} : X \times X \to \mathbb{R}$ is defined in the same way but substituting $d_{WOM}$ for $d_{SOM}$ in the right-hand side of Equation ??, i.e. with a context weight vector $w \neq 1$.

Similarly, for the MVDM from Definition ?? we can define a radial basis function type kernel as follows.

Definition 9. Let X be the input space with contexts $s_p^q$ and $t_p^q$ with n = (p + q) + 1 the length of the contexts and $s_i, t_i \in D$ the symbols at position i in the contexts as before, then we define the modified radial basis function kernel $K_{MRBF} : X \times X \to \mathbb{R}$ as

$$ K_{MRBF}(s, t) = \exp\left( -\gamma \, d_{MVDM}(s, t) \right) \tag{12} $$

with $\gamma > 0$ as before.

It should however be noted that, in line with the discussion above, for r = 1 the negated distance $-d_{MVDM}$ and the corresponding kernel $K_{NMDK}$ are not CPD, and therefore for r = 1 the kernel $K_{MRBF}$ will not be PSD. Nevertheless, given the good empirical results we used it anyway.
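To make the distance-based kernels of this section concrete, here is a minimal Python sketch under stated assumptions. The conditional class probabilities for the MVDM are estimated per context position by simple counting on a toy training set, which is an assumption about the estimation procedure; the negative distance kernel is written in the form $-d^{\beta}$ and the radial basis function kernel as $\exp(-\gamma d)$. All function names are hypothetical, and symbols unseen at a position during training would raise a `KeyError` here.

```python
import math
from collections import Counter, defaultdict


def overlap_distance(s, t, weights=None):
    """Sum the weights of the mismatching positions (all weights 1 if omitted)."""
    if weights is None:
        weights = [1.0] * len(s)
    return sum(w for a, b, w in zip(s, t, weights) if a != b)


def class_conditionals(contexts, labels):
    """Estimate P(class | symbol) per context position by counting."""
    classes = sorted(set(labels))
    counts = [defaultdict(Counter) for _ in contexts[0]]
    for context, label in zip(contexts, labels):
        for i, symbol in enumerate(context):
            counts[i][symbol][label] += 1
    tables = []
    for position in counts:
        table = {}
        for symbol, per_class in position.items():
            total = sum(per_class.values())
            table[symbol] = [per_class[c] / total for c in classes]
        tables.append(table)
    return tables


def mvdm_distance(s, t, tables, r=1):
    """Sum over positions i of sum_j |P(y_j | s_i) - P(y_j | t_i)|^r."""
    total = 0.0
    for i, (a, b) in enumerate(zip(s, t)):
        total += sum(abs(pa - pb) ** r
                     for pa, pb in zip(tables[i][a], tables[i][b]))
    return total


def negative_distance_kernel(distance, beta=1.0):
    """Negative distance kernel value for a precomputed distance."""
    return -(distance ** beta)


def rbf_kernel(distance, gamma=1.0):
    """Radial basis function kernel value for a precomputed distance."""
    return math.exp(-gamma * distance)


def gram_matrix(contexts, distance, kernel):
    """Kernel matrix over all pairs of contexts."""
    return [[kernel(distance(a, b)) for b in contexts] for a in contexts]


# Toy data: the first position is informative for the class, the second is not.
train = ["AC", "AG", "TC", "TG"]
labels = [1, 1, 0, 0]
tables = class_conditionals(train, labels)
K = gram_matrix(train,
                lambda a, b: mvdm_distance(a, b, tables, r=1),
                lambda d: rbf_kernel(d, gamma=0.5))
```

A kernel matrix built this way is one possible route into SVM libraries that accept precomputed kernels (LIBSVM has such an option); whether the original experiments used that route or a modified library is not stated.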

4. Experiments

We have done a number of experiments. First of all, we wanted to validate the feasibility of our approach and compare our kernel functions that operate on contexts directly, to see whether they do at least as well as, and hopefully better than, the traditional kernels. In these experiments the left and right context length was set to 50. Second, we set up some experiments to find the optimal left and right context length for each classifier. Third, we also looked at di- and trinucleotides to find out whether this gave better performance than the single nucleotide case. In the next sections, we describe the software and the data sets used in our experiments, we discuss how we have set the different parameters for the SVM, we present and discuss the results obtained and finally we give an overview of related work.

4.1 Software and Data

We did the experiments with LIBSVM [?], a Java/C++ library for SVM learning. The dataset we use in the experiments is a set of human genes, which is referred to as HumGS [?]. Each instance is represented by a fixed context size of 50 nucleotides before and 50 nucleotides after the candidate splice site, based on the initial design strategy in [?]. Since we train one classifier to predict donor sites and another classifier to predict acceptor sites, separate training and test sets are constructed for donor and acceptor sites. For the purpose of training the classifiers, we constructed balanced training sets. For testing however we want a reflection of the real situation and keep the same ratio as given in the original set HumGS. This is shown in Table ??.

Table 1. Overview of the data sets that have been used for the splice site prediction experiments.
Columns: data set, genes, GT+, GT-, AG+, AG-. Rows: HumGS training / testing.

4.2 Parameter Selection and Accuracy

Parameter selection is done by 5-fold cross validation on the training set. For the ORBF, WRBF and MRBF, there are two free parameters that need to be optimized: the SVM cost parameter C (which is a trade-off between model complexity and model accuracy) and the radial basis function parameter $\gamma$. We performed a fine grid search for values of C and $\gamma$ between $2^{-16}$ and $2^{5}$. For the NODK, NWDK and NMDK only the cost parameter C has to be optimized because we choose $\beta$ fixed to 1, as this gives very good results; more precisely, for $\beta = 2$ results are not good at all, and other values have not been tried. Again, values for C between $2^{-16}$ and $2^{5}$ have been considered. The LINEAR kernel in the results below refers to the overlap kernel from Definition ?? with d = 1 and c = 0. For the SOK and the WOK we take d = 2 and c = 0, as previous work pointed out that higher values for d actually lead to bad results, while taking values for c > 0 does not have a significant impact on the results. As a weighting scheme for the weighted kernels, we used three different weights: Information Gain (IG), Gain Ratio (GR) and Shared Variance (SV). For more details the reader is referred to [?].

Splice site prediction systems are often evaluated by means of the percentage of false positive (FP) classifications at a particular recall rate. This measure is referred to as FP% [?] and is calculated as follows:

$$ FP\% = \frac{\#\text{false positives}}{\#\text{false positives} + \#\text{true negatives}} \times 100 $$

We used this evaluation measure for a recall rate of 95%; in this case the measure is referred to as FP95%, i.e. the FP95% measure gives the percentage of non-splice-site candidates that are falsely classified as actual splice sites at a level where the system has found 95% of all actual splice sites in the test set. Note that the purpose is to have FP95% as low as possible.

4.3 Results

Table ?? gives an overview of the final FP95% results and model complexity in terms of the number of support vectors of the different kernels on the splice site prediction task. Note that the confidence intervals have been obtained by bootstrap resampling, at a confidence level $\alpha = 0.05$ [?]. An FP95% rate outside of these intervals is assumed to be significantly different from the related FP95% rate at a confidence level of $\alpha = 0.05$. In addition to the final FP95% results we also give, as an illustration, two FP% plots for donor sites, comparing the context-sensitive kernels with those kernels that are not context-sensitive. Figure ?? does this for the negative distance kernel making use of the MVDM and Figure ?? does this for the radial basis function kernel making use of the WOM with GR, IG and SV weights.
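The FP% measure at a fixed recall rate can be computed from classifier scores as sketched below. This is a minimal illustration, not the evaluation code used for the experiments: the function name `fp_at_recall`, the convention that label 1 marks an actual splice site, and the treatment of ties (a candidate scoring exactly at the threshold counts as predicted positive) are all assumptions.

```python
import math


def fp_at_recall(scores, labels, recall=0.95):
    """FP% = 100 * FP / (FP + TN) at the highest threshold that still
    recovers at least `recall` of the actual splice sites (label 1)."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1),
                       reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y != 1]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    # Number of positives that must score at or above the threshold;
    # the small epsilon guards against floating-point round-up.
    needed = max(1, math.ceil(recall * len(positives) - 1e-9))
    threshold = positives[needed - 1]
    false_positives = sum(1 for s in negatives if s >= threshold)
    true_negatives = len(negatives) - false_positives
    return 100.0 * false_positives / (false_positives + true_negatives)


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
print(fp_at_recall(scores, labels, recall=0.95))
print(fp_at_recall(scores, labels, recall=0.75))
```

With only four positives in the toy data, a recall of 95% forces the threshold down to the lowest-scoring positive, so two of the six negatives are accepted; at 75% recall only one is.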

Table 2. Splice site prediction, results for all kernels, for donor sites and for acceptor sites. The original columns are FP95% (with confidence interval) and #SVs, each for donor sites and for acceptor sites; only the donor-site FP95% point estimates are legible.

Kernel and weights    Donor sites FP95%
LINEAR                8.18
LINEAR/GR             7.86
LINEAR/IG             7.92
LINEAR/SV             7.88
SOK                   7.19
WOK/GR                6.51
WOK/IG                6.38
WOK/SV                6.43
NODK                  7.97
NWDK/GR               6.43
NWDK/IG               6.40
NWDK/SV               6.38
NMDK (r = 1)          6.26
NMDK (r = 2)          6.38
ORBF                  7.46
WRBF/GR               6.25
WRBF/IG               6.21
WRBF/SV               6.27
MRBF (r = 1)          5.81
MRBF (r = 2)          6.40

From the results it can easily be seen that, with the one exception noted below, the context-sensitive kernels making use of the WOM with IG, GR and SV weights and of the MVDM outperform their simple non-sensitive counterparts both in accuracy and in model complexity. Moreover, in almost all cases the difference is significant. The exception is the MRBF with r = 2, which for acceptor sites performs worse than its non-sensitive counterpart. Another overall observation is that the difference in results between the different context weights is not significant at all. Finally, it can be seen that the best result for donor sites is obtained by the MRBF with r = 1, but for acceptor sites this is the WRBF with IG weights. Therefore it is clear that the success of the metric and the weights used depends to a great extent on the properties of the data under consideration, so that it is worthwhile trying different metrics and different context weights to see which one gives the best result.

Finally, if one would like to use the LINEAR kernel in favour of classification speed but at the cost of some accuracy, it can be seen from the results that the weighted LINEAR kernel outperforms its unweighted counterpart, although the difference is not significant at a confidence level $\alpha = 0.05$. Nevertheless, the number of support vectors is significantly lower than for the unweighted LINEAR kernel and this will result in faster classification, because classification of a new instance happens by comparing it with every support vector in the model through the kernel function K.

Next, we look at the experiments to find the optimal left and right context length for each classifier. Then, we look at di- and trinucleotides to find out whether this gives better performance than the single nucleotide case. For these experiments we used the WRBF/GR kernel; this choice was based on the fact that WRBF performs second best for donor sites and best for acceptor sites. Moreover, since IG (information gain), GR (gain ratio) and SV (shared variance) were not significantly different in the experiments described above, we used GR as the weighting scheme. The results are shown in Table ??.

Table 3. Splice site prediction, for the WRBF/GR kernel, for donor sites and for acceptor sites. The original columns are FP95%, left context and right context, each for donor sites and for acceptor sites; only the donor-site FP95% point estimates are legible.

Nr. nucleotides       Donor sites FP95%
Single nucleotide     6.54
Dinucleotide          5.52
Trinucleotide         5.87

4.4 Related Work

The number of papers on splice site prediction and the related problem of gene finding is enormous and hence it is impossible to give an exhaustive overview. We will give some popular references (according to CiteSeer) and discuss some recent work. The problem of recognizing signals in genomic sequences by computer analysis was pioneered by Staden [?] and the recognition of splice sites using neural networks was first addressed by Brunak et al. [?]. They trained a backpropagation feedforward neural network with one layer of hidden units to recognize donor and acceptor sites, respectively. The input consists of a sliding window centered on the nucleotide for which a prediction is to be made. The window is encoded as a numerical vector. The best results were obtained by combining a neural network to recognize the consensus signal at the splice site with another one that predicted coding regions based on the statistical properties of the codon usage and preference. This tool is available online.

Kulp et al. [?] and Reese et al. [?] build upon the work of Brunak by explicitly taking into account the correlations between neighboring nucleotides around a splice site, using dinucleotides as input features instead of single nucleotides. This tool is available online. GeneSplicer [?] uses a combination of a hidden Markov model and a decision tree; it obtained good performance compared to other leading splice site detectors at that time. Rätsch and Sonnenburg [?] use an SVM with a special kernel to classify nucleotides as either donor or acceptor sites. There is one SVM for donor sites and one for acceptor sites. The system predicts the correct splice form for more than 92% of these genes. This approach is quite similar to ours but the kernel is different. Finally, a list of online tools for splice site prediction and gene finding is available online.

5. Conclusions

In this article it was shown how different statistical measures and distance functions can be included into kernel functions for SVM learning in context-dependent classification tasks. The purpose of this approach is to make the kernels sensitive to the amount of information that is present in the contexts. More precisely, the case of splice site prediction has been discussed and from the experimental results it became clear that the sensitivity information has a positive effect on the results. So far, this was shown on only one data set because the SVM is computationally very expensive, but we have shown that kernel functions that operate on contexts directly give additional benefits. At the moment, we are running experiments on a number of other data sets to show that the increased performance is not due to a bias towards this data set. Apart from that, we are running experiments with more complex features based on the improved design strategy in [?], where an FP95% rate of 2.2% for donor and 2.9% for acceptor sites is obtained. In this light it remains to be seen whether the positive effect of the sensitivity information will still be significant in a system that already performs at very high precision without such information. Finally, we plan to compare our results with the ones obtained by other classifiers on the same data sets.


More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled 1 Lecture : Area Area ad distace traveled Approximatig area by rectagles Summatio The area uder a parabola 1.1 Area ad distace Suppose we have the followig iformatio about the velocity of a particle, how

More information

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + 62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of

More information
