FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately

Inbal Budowski-Tal (a), Yuval Nov (b), and Rachel Kolodny (a)

(a) Department of Computer Science and (b) Department of Statistics, University of Haifa, Mount Carmel, Haifa 31905, Israel

Communicated by Michael Levitt, Stanford University School of Medicine, Stanford, CA, December 2009 (received for review November 3, 2009)

Fast identification of protein structures that are similar to a specified query structure in the entire Protein Data Bank (PDB) is fundamental in structure and function prediction. We present FragBag: An ultrafast and accurate method for comparing protein structures. We describe a protein structure by the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments. Then, we succinctly represent the protein as a "bag-of-fragments," a vector that counts the number of occurrences of each fragment, and measure the similarity between two structures by the similarity between their vectors. Our representation has two additional benefits: (i) it can be used to construct an inverted index, for implementing a fast structural search engine of the entire PDB, and (ii) one can specify a structure as a collection of substructures, without combining them into a single structure; this is valuable for structure prediction, when there are reliable predictions only of parts of the protein. We use receiver operating characteristic curve analysis to quantify the success of FragBag in identifying neighbor candidate sets in a dataset of over 2,900 structures. The gold standard is the set of neighbors found by six state of the art structural aligners. Our best FragBag library finds more accurate candidate sets than the three other filter methods: The SGM, PRIDE, and a method by Zotenko et al. More interestingly, FragBag performs on par with the computationally expensive, yet highly trusted structural aligners STRUCTAL and CE.
Keywords: evaluation of structure search; fast structural search of Protein Data Bank; filter and refine; protein backbone fragments; protein structure search

Finding structural neighbors of a protein, i.e., proteins that share with it a sizable substructure, is an important, yet persistently difficult, task. Such structural neighbors may hint at a protein's function or evolutionary origin even without detectable sequence similarity, as structure is more conserved than sequence. Indeed, many methods for protein function (1, 2) and structure (3) prediction rely on finding such neighbors. Structural classifications such as SCOP (4) and CATH (5) identify some neighbors; however, there are many other neighbors, which although classified differently, are actually structurally similar and important for function and structure prediction (1, 6). Devising fast, accurate, yet comprehensive, structural search tools for the rapidly growing Protein Data Bank (PDB) remains an important challenge.

Structural alignment quantifies the similarity between two protein structures, by identifying two equally sized, geometrically similar substructures. Many structural alignment methods have been proposed over the past twenty years [e.g., STRUCTAL (7), CE (8), and SSM (9)]. Regardless of the way they quantify similarity and their search heuristics, one can define a common similarity score to assess the resulting alignments, and create a best-of-all method that gives the best alignment under that score (10). Unfortunately, aligning two structures is an expensive computation (10); thus, many structural alignment servers consider only a representative subset of the PDB [e.g., FATCAT (11) and CE]. However, by using such sequence-nonredundant representative sets we risk excluding interesting structural variability (12). In any case, naively structurally aligning a query against the entire PDB, or structurally aligning all PDB structures against one another, is prohibitively expensive, as it requires $O(n)$ or $O(n^2)$ computationally intensive comparisons. To search the entire PDB efficiently, researchers devised the filter and refine paradigm (13).
A filter method quickly sifts through a large set of structures, and selects a small candidate set to be structurally aligned by a more accurate, but computationally expensive, method. Filter methods gain their speed by representing structures abstractly, typically as vectors, and comparing these representations quickly. Such vector representations allow constructing an inverted index, a data structure that enables fast retrieval of neighbors, even in huge datasets [e.g., (14)]. PRIDE represents a structure by the histograms of diagonals in its internal distance matrix, and measures similarity between two structures by the similarity between their histograms (15). Choi et al. (16) represent a structure by a vector of frequencies of local features in its internal distance matrix, and measure similarity between two structures by the distance between their corresponding vectors. Inspired by knot theory, Rögen and Fain devised the Scaled Gauss Metric (SGM) method, which represents a structure by a vector of 30 global topological measures of its backbone (17). Zotenko et al. (18) represent a protein structure by a vector of the frequencies of patterns of secondary structure element (SSE) triplets. Several methods [e.g., (19), (20)] represent a structure as an ordered string of structural fragments, and sequence-align these strings to measure structural similarity; such representations are less suitable for constructing an inverted index.

To search in very large datasets, computer scientists often represent objects as "bag-of-words" (BOW): unordered collections of local features. Web search engines use an inverted index of the BOW representation of the web. Each document and query is represented by the number of occurrences of its words (21). In Computer Vision, texture images are represented as BOW of local image features for texture recognition, object classification, and image and video retrieval [e.g., (22, 23)]. In protein sequence analysis, Leslie and coworkers describe protein sequences as BOW of their k-mers for detecting remote homology (24).
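The combination of a BOW representation and an inverted index can be illustrated with a small sketch. It is not taken from any of the cited systems: the function names, the toy documents, and the choice to score candidates by the number of distinct shared features are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_inverted_index(bags):
    """Map each feature to the set of object ids whose bag contains it."""
    index = defaultdict(set)
    for obj_id, bag in bags.items():
        for feature, count in bag.items():
            if count > 0:
                index[feature].add(obj_id)
    return index


def shared_feature_counts(index, query_bag):
    """For each indexed object, count the distinct query features it contains."""
    hits = Counter()
    for feature, count in query_bag.items():
        if count > 0:
            for obj_id in index.get(feature, ()):
                hits[obj_id] += 1
    return hits


# Toy "documents" represented as bags of words (hypothetical data).
bags = {
    "doc1": Counter("the cat sat on the mat".split()),
    "doc2": Counter("the dog sat".split()),
    "doc3": Counter("a bird flew".split()),
}
index = build_inverted_index(bags)
query = Counter("cat mat".split())
hits = shared_feature_counts(index, query)  # only doc1 shares words with the query
```

Only objects that share at least one feature with the query are ever touched, which is why such an index scales to very large collections; the surviving candidates can then be compared with a more careful measure.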
In FragBag we represent a protein structure as a BOW of backbone fragments, and use this representation to identify quickly good candidate sets of structural neighbors. Specifically, we represent a structure by a vector whose entries count the number of times each fragment approximates a segment in the protein backbone (Fig. 1), and measure similarity between two structures by the similarity between their corresponding vectors. In testing our approach, we consider 24 libraries of various sizes and fragment lengths, and use three BOW similarity measures (Euclidean, Cosine, and Histogram Intersection). We study how well different measures identify structural neighbors, relative to a stringent gold standard: The structural neighbors found by a best-of-six structural alignment method (10). We also test statistically whether BOW representations of structures within CATH categories are indeed similar to each other. We then compare the performance of our measure with that of other filter methods [SGM, Zotenko et al. (18), and PRIDE], to BLAST sequence alignment (25), and to the structural alignment methods STRUCTAL, CE, and SSM. Our best filter method outperforms other filter methods and BLAST. More importantly, it performs on par with the computationally expensive structural aligners STRUCTAL and CE. In our tests, the ranking of the methods is insensitive to the threshold value defining structural neighbors. Comparing FragBag vectors is orders of magnitude faster than structural alignment; it also is well suited for using in inverted indices. Thus, our method can be used to quickly identify good candidate sets of structural neighbors in the entire PDB.

Author contributions: I.B.-T., Y.N., and R.K. performed research and analyzed data; and Y.N. and R.K. designed research and wrote the paper.
The authors declare no conflict of interest.
To whom correspondence should be addressed. E-mail: trachel@cs.haifa.ac.il.
This article contains supporting information online at /DCSupplemental.
BIOPHYSICS AND COMPUTATIONAL BIOLOGY. PNAS, February 23, 2010, vol. 107.

Results

In FragBag, the bag-of-fragments that represents a protein structure is succinctly described by a vector of length N, the size of the fragment library. Fig. 1 illustrates how this vector is calculated from the α-carbon coordinates of a protein. For each contiguous (and overlapping) k-residue segment along the protein backbone, we identify the library fragment of length k that fits it best in terms of rmsd after optimal superposition. Then, we count the number of times each library fragment was used, and describe the protein by a vector of these counts.

Comparing Filter Methods via Receiver Operating Characteristic (ROC) Curve Analysis. We measure the accuracy of structural retrieval methods by how well they identify candidate sets of structural neighbors in a database, given a query structure. We consider a database of 2,928 sequence-nonredundant structures, and query it by each of its structures.
Fig. 1. FragBag representation of protein structure. (A) For illustration, we consider a library of 6 fragments. (B) Each (overlapping) contiguous segment in the backbone is associated with its most similar library fragment. (C) All fragments are collected to an unordered bag. (D) The structure is represented by a vector v, whose entries count the number of times each library fragment appeared in the bag. In this example, v = (4, 0, 0, 5, 1, 3), implying fragment order (a, b, c, d, e, f).

The gold-standard answer includes neighbors found by a best-of-six structural aligner [using SSAP (26), STRUCTAL, DALI (27), LSQMAN (28), CE, and SSM]; this expensive computation was done previously (10). Structural neighbors of the query are ones aligned to it with a structural alignment score [SAS; see Methods for definition (7)] below a threshold T (for T = 2 Å, 3.5 Å, and 5 Å). We use the area under the curve (AUC) of the ROC curves to measure how well each method identifies the structural neighbors of a query, and average the AUCs over all 2,928 queries. A higher AUC is better: A perfect imitator of the gold standard, which ranks the structural neighbors before all other proteins, will have an AUC of 1, whereas a random measure will have an AUC of 0.5.

Fig. 2 (Left) shows the average AUCs with respect to the three gold standards defined above (corresponding to the three values of T) for 24 libraries (with fragments of length 5–12 residues). We consider three BOW similarity measures: Cosine, Histogram Intersection, and Euclidean distances. For comparison, the right panel shows the average AUCs of other methods: (i) a sequence-based method using BLAST's E-value (25), (ii) the filter methods PRIDE, SGM, and that of Zotenko et al. (18), and (iii) the structure aligners STRUCTAL, CE, and SSM.
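The per-query AUC and its average over queries can be sketched as follows. This is a minimal illustration, not the evaluation code used in the study: it relies on the standard equivalence between the AUC and the probability that a randomly chosen neighbor is ranked before a randomly chosen non-neighbor (ties counted as one half), and the function names and toy numbers are assumptions.

```python
def roc_auc(distances, is_neighbor):
    """AUC of ranking database entries by increasing distance to the query.

    Computed as the fraction of (neighbor, non-neighbor) pairs in which the
    neighbor has the smaller distance; ties contribute one half.
    """
    pos = [d for d, flag in zip(distances, is_neighbor) if flag]
    neg = [d for d, flag in zip(distances, is_neighbor) if not flag]
    if not pos or not neg:
        raise ValueError("need at least one neighbor and one non-neighbor")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p < q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_auc(queries):
    """Average the per-query AUCs; `queries` holds (distances, is_neighbor) pairs."""
    aucs = [roc_auc(distances, flags) for distances, flags in queries]
    return sum(aucs) / len(aucs)


# Hypothetical toy queries: gold-standard neighbors are flagged True.
perfect = ([0.1, 0.2, 0.8, 0.9], [True, True, False, False])
mixed = ([0.1, 0.5, 0.3, 0.9], [True, True, False, False])
mean_auc = average_auc([perfect, mixed])
```

A filter that ranks every gold-standard neighbor ahead of every other structure scores 1 on that query, and a filter that orders the database at random scores about 0.5, matching the two reference values given in the text.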
We sort the alignments by their SAS scores, and for STRUCTAL and CE, also by their native scores. The average AUCs plotted in Fig. 2 are also available as supplementary material (Table S1).

Fig. 2. The average AUC of ROC curves of identifying structural neighbors. Panels: SAS threshold = 5 Å, 3.5 Å, and 2 Å. Left panels: average AUC of ROC curve vs. library size, by fragment length (number of residues), for the cosine, histogram intersection, and Euclidean distances. Right panels: SSM, STRUCTAL, CE, FragBag, SGM, Zotenko et al., PRIDE, and sequence alignment. The AUC measures how well a ranking of structures imitates the ranking according to a gold standard; larger values correspond to more successful imitators, ranging from 0.5 (a random ranker) to 1 (a perfect imitator). We consider three definitions of structural neighbors, using SAS thresholds of 2 Å, 3.5 Å, and 5 Å. The left panels show the performance of libraries with fragments of 6–12 residues, and different number of fragments (value along the x-axis), and using the cosine (plus sign), Euclidean (circles), and Histogram Intersection (diamonds) distances. On the right we compare the best FragBag result [400(11) library and the cosine distance] to other methods: Sequence-based similarity measure in finely dashed black, filter methods in dashed black, and structure alignment methods in solid black. We see that FragBag performs similarly to CE and STRUCTAL, two computationally expensive and well-trusted structural aligners.

The ranking of the performance of different methods is generally independent of T, the threshold defining structural neighbors. Under the strictest definition, the neighbors of a protein are only structures that were aligned with an SAS score <2 Å; under the most lax definition, the neighbors are the structures that were aligned with an SAS score <5 Å. As expected, all methods perform better (i.e., achieve higher average AUCs) when the definition of structural neighbors is more strict, and less well when the definition includes more distant structures. Because structures with SAS score <5 Å are often the most meaningful neighbors, which are difficult to detect accurately, we are most interested in identifying methods that excel under this stringent test (see Discussion).

FragBag's best result is with the 400(11) library (400 fragments of length 11) and the cosine distance; the average AUCs are 0.89 (T = 2 Å), 0.77 (T = 3.5 Å), and 0.75 (T = 5 Å). The cosine distance performs best. When using cosine distance or the histogram intersection distance, larger libraries (which describe local structure variability in higher detail) typically perform better. From comparing libraries of fixed size (100, 200, or 400 fragments), we learn that when using cosine distance, libraries of longer fragments perform better; when using the histogram intersection or the Euclidean distances, the length of the fragment does not influence the results.

The ranking of the filter methods from most to least successful is (i) FragBag (using the 400(11) library and cosine distance), (ii) SGM, (iii) Zotenko et al.'s method (18), and (iv) PRIDE, which performs similarly to the sequence-based method. Among the structural aligners, the most successful is SSM, followed by STRUCTAL and CE. Fig. 2 and Table 1 show that the best filter method, i.e., the FragBag representation mentioned above, performs on par with CE and STRUCTAL, two computationally expensive and highly trusted structural aligners.
For example, using the T = 5 Å threshold, FragBag has an average AUC of 0.75, which is similar to CE's 0.74 using the native score, and 0.75 using SAS score; it is lower than STRUCTAL's average AUC of 0.83 using the native score and 0.84 using SAS score. Using the T = 3.5 Å threshold, FragBag has an average AUC of 0.77, which is similar to STRUCTAL's 0.77 using its native score and to CE's 0.72 using SAS score. SSM outperforms FragBag at all three thresholds.

Table 1. AUCs of ROC curves using the best-of-six gold standard.

Categories of CATH Proteins Have BOW Descriptions that Are Different from Each Other in a Statistically Significant Way. Next, we test statistically whether FragBag agrees with the CATH classification, at the Class Architecture (CA) and the Class, Architecture, and Topology (CAT) levels. In broad strokes, we expect FragBag representations of proteins in the same category to be more similar than those of proteins in different categories, and wish to verify that such similarity, if found, is not due to mere chance.

When running many statistical tests, as we do here (one for each pair of categories), the chances of getting at least one significant result is much higher than the chances of getting a significant result in a single test, even when there is no true underlying difference (the "multiple comparisons problem"). We control for this problem through the Bonferroni correction (29), and the false discovery rate (FDR) approach (30).

When using the Bonferroni correction, we adjust the required significance level of each test, so that the probability of falsely declaring at least one test as significant is the standard α = 0.05. Table 2 summarizes the results under the Bonferroni correction, across the 24 libraries. For example, there are 12 mainly-α CATH categories at the CAT level, and thus $\binom{12}{2} = 66$ category pairs; out of the 66 corresponding tests, 61 were significant at the adjusted significance level across all 24 libraries, hence the fraction 61/66 at the table's first cell in the second row. The parenthesized figures in the table are the fraction of significant tests for the 400(11) library.
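The two multiple-comparison controls named above can be sketched in a few lines. This is an illustration under stated assumptions rather than the analysis code of the study: the Bonferroni rule compares each p-value with α divided by the number of tests, the FDR rule shown is the Benjamini-Hochberg step-up procedure of ref. 30, and the function names and p-values are hypothetical.

```python
def n_pairs(q):
    """Number of unordered pairs among q categories."""
    return q * (q - 1) // 2


def bonferroni_significant(p_values, alpha=0.05):
    """Declare a test significant only if its p-value is below alpha / K."""
    k = len(p_values)
    return [p < alpha / k for p in p_values]


def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level `fdr`."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    cutoff = 0  # largest rank r whose p-value satisfies p_(r) <= r * fdr / K
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / k:
            cutoff = rank
    significant = [False] * k
    for i in order[:cutoff]:
        significant[i] = True
    return significant


# Hypothetical p-values for five pairwise tests.
p_values = [0.001, 0.012, 0.025, 0.2, 0.5]
by_bonferroni = bonferroni_significant(p_values)
by_fdr = benjamini_hochberg(p_values)
```

With 12 categories, `n_pairs(12)` gives the 66 pairwise tests mentioned in the text, so the Bonferroni-adjusted level would be 0.05/66 per test. On the toy p-values the FDR procedure declares more tests significant than Bonferroni does, which reflects its less conservative guarantee.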
The individual test results are available in Datasets S1 and S2.

Using the FDR approach, we find which tests can be declared significant, while controlling the average fraction of the wrongly declared tests at some prechosen level; for details, see ref. 30. Table 2 lists the fraction of tests declared significant, averaged across the 24 libraries and under an FDR of 0.05; the parenthesized figures are the fraction of the tests declared significant for the 400(11) library. The fractions reported in Table 2, all being very close to 1, strongly support the conclusion that the FragBag representation agrees with the CATH classification, both at the CA and the CAT level.

Comparison of FragBag Similarity Measure to rmsd on a Dataset of Structure Pairs Within Nuclear Magnetic Resonance (NMR) Ensembles. Next, we study how well FragBag identifies similarity between structures that are only locally similar, i.e., have highly similar substructures that are connected differently. The ability to identify such local similarity may help in detecting similarity to a partially characterized structure, as needed in structure prediction. We consider pairs of structures within an NMR ensemble, a collection of structures that are consistent with the experimental constraints; these typically differ only at several flexible points along the backbone, and are thus locally similar. Our similarity criterion is defined by the threshold 0.35, the average FragBag cosine distance between structures with the same CAT classification (see Table 3). We use throughout this section the 400(11) library.

Table 1 columns: method; AUC at each SAS similarity threshold (Å); average value; rank; speed*. Methods, in table order:
SSM using SAS score [2]
STRUCTAL using SAS score [2]
STRUCTAL using native score [2]
CE using native score [2]
FragBag cosine distance (400,11), Fast
CE using SAS score [2]
FragBag histogram intersection (600,11), Fast
SGM, Fast
FragBag Euclidean distance (40,6), Fast
Zotenko et al. (18), Fast
Sequence matching by BLAST E-value, Fast
PRIDE, Fast
*Average CPU minutes per query.
Essentially instantaneous after preprocessing (<0.1 s).

Table 2. Statistical analysis summary
Analysis using Bonferroni correction*:
  CA: mainly α 6/6 (6/6); mainly β 31/36 (35/36); mixed α+β 21/21 (21/21)
  CAT: mainly α 61/66 (65/66); mainly β (78/78); mixed α+β (225/231)
Analysis using FDR†:
  CA: mainly α 6/6 (6/6); mainly β (36/36); mixed α+β 21/21 (21/21)
  CAT: mainly α (65/66); mainly β (78/78); mixed α+β (231/231)
*Nonparenthesized values are the number of category pairs found different (at the Bonferroni-adjusted significance level) in all 24 libraries, divided by the total number of pairs; parenthesized values are the fraction of significant comparisons for the library of 400 fragments of length 11. In all cases, values close to 1 are desirable.
†Nonparenthesized values are the fraction of comparisons declared significant in the FDR analysis, averaged across the 24 libraries; parenthesized figures are the fraction of the comparisons declared significant for the library of 400 fragments of length 11.

We use the dataset of 230 NMR ensembles that was constructed in the PRIDE study (15). The set includes 54,465 pairs, 43,246 of which have rmsd ≤ 4 Å (bqv, bmy, e0, and lx were replaced by their newer versions). Fig. 3 shows the FragBag cosine distance vs. rmsd, and the marginal distributions of the two distances. The vast majority of pairs are identified as very similar by FragBag: 91% have cosine distance below 0.35, showing that FragBag indeed identifies similarity between locally similar structures. We see similar results using Histogram Intersection and Euclidean distance (Fig. S1).

For comparison, Table 3 lists the means and standard deviations of the FragBag distances between structure pairs at different levels of structural similarity. The most similar pairs are those within NMR ensembles: We consider all pairs and only those with rmsd ≤ 4 Å. We also consider pairs in the set of 2,928 CATH domains that have the same CATH, CAT, CA, and C classification, as well as all pairs. As expected, the average distance grows as the sets become more structurally diverse, under all three distances.

Discussion

A Fast Filter that Performs On Par with Structural Alignment Methods.
FragBag can quickly identify structural neighbor candidates for a structure query, and performs as well as some highly trusted, computationally expensive structural alignment methods. Our results with FragBag's 400(11) library are as accurate as CE's, and almost as accurate as STRUCTAL's. This is impressive, as CE and STRUCTAL are among the most accurate structural alignment methods (10). As expected, structural neighbors are generally identified best by structural aligners, less well by filters, and least well by sequence alignment. Our results are robust: we see a similar ranking of methods using different definitions for structural neighbors of a protein. An attractive feature of abstract representations of protein structure such as FragBag (and other filter methods) is that one can store the vectors representing all PDB proteins in an inverted index, a data structure designed for fast retrieval of neighbors.

Evaluation Protocol. Retrieving structural neighbors of a query protein from the entire PDB is a challenge. We cannot deduce the structural neighbors solely from SCOP or CATH, because crossfold similarities (proteins that are geometrically similar, yet classified differently) are very common (10, 31). Kihara and Skolnick (31) noted that crossfold similarities among small proteins (<100 residues) are abundant even at CATH's C level. Crossfold similarities are particularly important to identify when predicting structure and function (1, 6). Recent Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments show that the most successful structure prediction methods construct their predictions from substructures that are not, in general, from proteins classified in the same fold [e.g., (3, 32, 33)]. Friedberg and Godzik (34) showed that crossfold similarities correlate well with the functional similarity of proteins populating the folds, and Petrey and Honig (6) showed examples of functional similarity among differently classified proteins. Nevertheless, proteins classified in the same CATH category (at the CA or CAT levels) are truly similar, and we expect their FragBag representations to be similar as well.
Because there are many CA and CAT categories, each populated with many proteins, we use statistical theory to test and confirm this.

Any gold standard must meet the challenge: Because a filter method needs to identify structural neighbors of a query, the gold standard must be all identified neighbors. Here, we find these neighbors using the expensive computation of a best-of-six structural aligner. Namely, we identify a structure as a neighbor if any of the six methods finds in both a sizable substructure that can be superimposed with a low rmsd. Such a neighbor is selected regardless of its CATH classification, and could well belong to a category other than that of the query protein. Had we relied on a classification, similar structures would have been marked as nonneighbors, and the ROC curve analysis would have effectively penalized filter methods that correctly identify them.

Because structural similarity that is due to sequence similarity is easy to identify, we use datasets of nonredundant sequences. This ensures that we have eliminated trivial pairs in our evaluation protocol. Note that when the structural neighbors are defined with the T = 2 Å, 3.5 Å thresholds, sequence alignment does better than random (AUC > 0.5); indeed, when there are only few structural neighbors (e.g., only the query), even a trivial sequence alignment method will perform well, because it ranks the query as most similar to itself. This phenomenon will have greater impact on the average AUC when the number of structural neighbors is small (lower T). Thus, the average AUC of the sequence alignment method acts as a lower bound, indicating the difficulty of the task.

Although not the main focus of this study, our results provide another evaluation of the performance of STRUCTAL, CE, and SSM, and show that SSM is the top performer. SSM compares structures in two stages. (i) A fast estimation of structural similarity by matching the SSE graphs of the structures. This step calculates two estimates of the percent SSE match, and the user can specify their allowed maximal values; the SSM server does not provide a combined value of these estimates that can be used to rank structure pairs. (ii) An expensive and accurate alignment of the Cαs of the two structures. Here, we use the SAS of the alignments found by SSM after the second stage.

Searching with Partially Defined Queries for Protein Structure Prediction. Importantly, FragBag can search the entire PDB for neighbors of a query structure that is only partially characterized. Such a search is useful for structure prediction, as prediction methods often predict only the structure of parts of a protein, and finding a composition of these parts in the PDB may hint at how these parts should be combined into a complete structure. In FragBag, the missing information has minor impact. The union of the bags-of-fragments of the parts differs from the true bag-of-fragments only by the few fragments in the connecting regions. Similarly, two structures that are flexible variants (i.e., differ only at a hinge point) will have similar FragBag representations.

Future Directions. We hope to improve FragBag using a similarity measure that weights the fragments differently, and possibly ignores some; BOW weighting schemes were successfully used in other areas of Computer Science. Currently, each representation is based on one library, and all its fragments are weighted equally. Once we have a weighting scheme, we can combine libraries and trust the weights to select all significant fragments. We also plan to construct a FragBag-based inverted index for the entire PDB; as noted above, this index will allow quickly identifying small candidate sets of structural neighbors of a protein. The candidate sets will also include structures that are flexible variants of the query, potentially revealing new connections in protein structure space. Finally, this index may be used to answer partially characterized queries, to the benefit of the structure prediction community.

Table 3. Average FragBag distances in datasets of varying structural similarity. Columns: dataset; histogram intersection distance*; Euclidean ($L_2$ norm) distance*; cosine distance*. Datasets: within NMR ensemble (rmsd ≤ 4 Å); within NMR ensemble; same CATH classification; same CAT classification; same CA classification; same C classification; different C classification.
*Using the library of 400 fragments of length 11.

Fig. 3. Cosine distance vs. rmsd in structure pairs within NMR ensembles. The dataset has 230 NMR ensembles with 43,246 pairs having rmsd ≤ 4 Å (15). FragBag identifies the vast majority (91%) of the pairs in this set as very similar (cosine distance below 0.35, the average distance in pairs of the same CAT classification, marked in shaded pink).

Methods

We consider 24 libraries with fragments of 5–12 residues, constructed in a previous study (35). There, we clustered the fragments of the Cα traces of 200 accurately determined structures, and formed a library by taking a representative from each cluster.

ROC Curve Analysis with Structural Alignments Gold Standard. We use a set of 2,928 sequence-diverse CATH v.2.4 domains and their all-against-all structural alignments; the set was constructed for a previous comparison study of structural aligners (10). Two 7-residue long structures (pspa, pspb) were removed because they are shorter than some fragments. All structures were structurally aligned to all others by six alignment methods (SSAP, STRUCTAL, DALI, LSQMAN, CE, and SSM) and their SAS scores were recorded, where $\mathrm{SAS} = 100 \times \mathrm{rmsd} / (\text{alignment length})$. Our gold standard is based on the best alignment found by these six methods in terms of the SAS score. The sequences of every pair of structures in this set differ significantly (FASTA E-value $> 10^{-4}$).

A FragBag description of a protein is a row vector; its length, N, is the size of the library used. The vector describing the ith protein is $b_i = (b_i(1), b_i(2), \ldots, b_i(N))$, where $b_i(j)$ is the number of times fragment j is the best local approximation of a segment in the ith protein.
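The counting that defines the vector $b_i$ can be sketched as follows. This is an illustrative sketch, not the authors' implementation. To stay dependency-free it scores the fit of a fragment to a segment with the RMSD of internal distances (dRMSD), a named substitution for the rmsd after optimal superposition used in the paper; dRMSD is likewise invariant to rotation and translation but, unlike superposition rmsd, does not distinguish mirror images. The function names, the two-fragment library, and the coordinates are hypothetical.

```python
import math
from itertools import combinations


def pair_distances(points):
    """All pairwise distances between the points of one segment."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def drmsd(seg_a, seg_b):
    """RMSD between the internal distance sets of two equal-length segments."""
    da, db = pair_distances(seg_a), pair_distances(seg_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def fragbag_vector(ca_coords, library):
    """Count, per library fragment, the backbone segments it approximates best."""
    k = len(library[0])            # fragment length in residues
    counts = [0] * len(library)    # one entry per library fragment
    for start in range(len(ca_coords) - k + 1):   # overlapping k-residue windows
        segment = ca_coords[start:start + k]
        best = min(range(len(library)), key=lambda j: drmsd(segment, library[j]))
        counts[best] += 1
    return counts


# Hypothetical library of two 3-residue fragments: one straight, one bent.
library = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (1, 1, 0)],
]
# Hypothetical 5-residue chain: straight, with a right-angle turn at the end.
chain = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (3, 1, 0)]
vector = fragbag_vector(chain, library)
```

A chain of n residues contributes n − k + 1 overlapping segments, so the entries of the vector always sum to that number; in the toy case the three windows split into two straight segments and one bent segment.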
We consider three distance metrics between two vectors, $b_i$ and $b_k$: (i) cosine distance, $1 - \frac{b_i^T b_k}{\|b_i\| \, \|b_k\|}$, (ii) histogram intersection distance, $1 - \frac{\sum_{j=1}^{N} \min\{b_i(j), b_k(j)\}}{\min\{s_i, s_k\}}$, where $s_i = \sum_{j=1}^{N} b_i(j)$, and (iii) Euclidean ($L_2$) distance, $\|b_i - b_k\|$.

Statistical Analysis. We now describe what data were used in the statistical analysis, how these data are summarized in matrix form, and the details of the statistical analysis.

Raw Data. We use the 8,87 domains in the S35 family level in CATH. Because the classification at the C level is based on secondary structure, we focus on the CA and CAT levels, and run the tests separately on different C classes. To improve the statistical power of the tests, we use only categories with at least 30 structures. When partitioning the dataset to categories at the CA level, there are 4 categories (with at least 30 structures) in the mainly-α class (totaling 2,077 structures out of 2,078); 9 in the mainly-β class (1,968 out of 2,062); and 7 in the mixed α+β class (4,507 out of 4,558). There was only one category in the few-secondary-structure class, so this class was omitted. When partitioning at the CAT level, there are 12 categories in the mainly-α class (totaling ,03 structures); 13 in the mainly-β class (1,396 structures); and 22 in the mixed α+β class (2,68 structures).

Data in Matrix Form. Consider a library of N fragments, and, say, the CA-level classification of the M = 1,968 mainly-β proteins into Q = 9 categories. The FragBag representations of these M proteins are initially summarized in an $M \times N$ matrix B, whose (i,j)-th entry is $b_i(j)$. The matrix B is partitioned rowwise into Q blocks corresponding to the Q categories, and we denote by $M_q$ the number of rows of the qth block. When considering CATH's CAT level, mainly-α proteins, or mixed α+β proteins, M, Q, and B, as well as the partition of B, change accordingly.

Omnibus Test.
Following the usual statistical practice, we first run a single omnibus test (29) to check whether there is at least one pair of categories whose proteins' FragBag representations are different from each other in a statistically significant way; only after such a difference is found, we compare all possible pairs of categories (post hoc analysis). The data is multivariate, as each FragBag vector consists of N observations, yet it certainly cannot be assumed to be normally distributed. Thus, we use a nonparametric permutation test, adapted from (36). We now construct a statistic w that captures the overall dissimilarity between vectors belonging to different categories; large values of w support rejecting the null hypothesis, according to which the partition into blocks carries no information with respect to the classification. We first standardize B's columns by dividing each column by its standard deviation (36). Let $B_q$ be the $M_q \times N$ submatrix of (the standardized) B, constituting the qth block, and let $\bar{B}_q$ be the N-vector whose entries are the means of the columns of $B_q$. For two distinct blocks, q and r, we define $\Delta_{qr} = \max |\bar{B}_q - \bar{B}_r|$, where the maximum is taken over the N differences (in absolute values) between the entries of the two vectors. The omnibus test statistic is $w = \max_{q \neq r} \Delta_{qr}$.
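The statistic w, and a Monte Carlo estimate of the p-value of the permutation test, can be sketched as follows. This is a minimal illustration, not the code used in the study. Several choices are assumptions: the sample standard deviation is used for standardization, constant columns are left unscaled, category labels are shuffled instead of rows (which is equivalent for a fixed block partition), permuted statistics are compared with w using "at least as large", and the number of permutations and the seed are free parameters. All names and the toy data are hypothetical.

```python
import random
import statistics


def standardize_columns(rows):
    """Divide each column by its standard deviation (constant columns are kept)."""
    sds = [statistics.stdev(col) or 1.0 for col in zip(*rows)]
    return [[x / s for x, s in zip(row, sds)] for row in rows]


def block_means(rows, labels):
    """Column means of the rows belonging to each category."""
    sums, sizes = {}, {}
    for row, lab in zip(rows, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, x in enumerate(row):
            acc[j] += x
        sizes[lab] = sizes.get(lab, 0) + 1
    return {lab: [s / sizes[lab] for s in acc] for lab, acc in sums.items()}


def omnibus_statistic(rows, labels):
    """w: the largest absolute difference between entries of two block-mean vectors."""
    means = list(block_means(rows, labels).values())
    return max(
        max(abs(a - b) for a, b in zip(means[q], means[r]))
        for q in range(len(means))
        for r in range(q + 1, len(means))
    )


def permutation_pvalue(rows, labels, n_perm=1000, seed=0):
    """Estimate P(W >= w) by randomly reassigning rows to blocks."""
    rows = standardize_columns(rows)
    w = omnibus_statistic(rows, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if omnibus_statistic(rows, shuffled) >= w:
            hits += 1
    return w, hits / n_perm


# Hypothetical count vectors for two well-separated categories.
rows = [[5, 0], [6, 0], [5, 1], [6, 1], [0, 5], [0, 6], [1, 5], [1, 6]]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
w, p_value = permutation_pvalue(rows, labels, n_perm=200, seed=1)
```

Because standardization uses the standard deviation of each full column, it is unaffected by permuting rows, so the sketch standardizes once and reuses the result for every permutation.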

Being a permutation test, the omnibus test's p-value is $P(W \geq w)$, where W is a similarly computed score under a random permutation of B's rows. Because the number of permutations is too large to enumerate, we resort to estimating the p-value in a Monte Carlo fashion, by drawing 1000 random permutations of B's rows and observing the proportion of the permutations achieving a statistic higher than w. The omnibus test results were all significant, for comparisons both at the CA and CAT levels, for all 24 libraries, and for each of the three CATH classes (p < 0.001 in all cases).

Post Hoc Analysis, Bonferroni Correction, and FDR. Once the omnibus test results were found significant, we test the data for a more stringent alternative hypothesis, according to which any two blocks are different from each other (rather than testing for the existence of at least one pair of different blocks, as the omnibus test does). To do so, we run the above test separately for each of the $K = Q(Q-1)/2$ pairs of blocks. When comparing blocks q and r, the matrix B is of dimension $(M_q + M_r) \times N$, and as only two blocks are considered, this comparison's statistic reduces to $w = \Delta_{qr}$. The result is a collection of K p-values, corresponding to the K pairwise comparisons. When using the Bonferroni correction, we declare as significant only the comparisons in which the p-value is below $0.05/K$. When using the FDR approach, we follow (30).

ACKNOWLEDGMENTS. We thank our anonymous reviewers for their helpful comments. This research was supported by the Marie Curie IRG Grant.

1. Kolodny R, Petrey D, Honig B (2006) Protein structure comparison: Implications for the nature of fold space, and structure and function prediction. Curr Opin Struct Biol 16(3).
2. Watson JD, Laskowski RA, Thornton JM (2005) Predicting protein function from sequence and structural data. Curr Opin Struct Biol 15(3).
3. Zhang Y (2008) Progress and challenges in protein structure prediction. Curr Opin Struct Biol 18(3).
4. Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: A structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247(4).
5. Pearl FM, et al. (2003) The CATH database: An extended protein family resource for structural and functional genomics. Nucleic Acids Res 31(1).
6. Petrey D, Honig B (2009) Is protein classification necessary? Toward alternative approaches to function annotation. Curr Opin Struct Biol 19(3).
7. Subbiah S, Laurents DV, Levitt M (1993) Structural similarity of DNA-binding domains of bacteriophage repressors and the globin core. Curr Biol 3(3).
8. Shindyalov IN, Bourne PE (1998) Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng 11(9).
9. Krissinel E, Henrick K (2004) Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D 60.
10. Kolodny R, Koehl P, Levitt M (2005) Comprehensive evaluation of protein structure alignment methods: Scoring by geometric measures. J Mol Biol 346(4).
11. Li Z, Ye Y, Godzik A (2006) Flexible structural neighborhood, a database of protein structural similarities and alignments. Nucleic Acids Res 34:D277.
12. Kosloff M, Kolodny R (2008) Sequence-similar, structure-dissimilar protein pairs in the PDB. Proteins 71(2).
13. Aung Z, Tan KL (2007) Rapid retrieval of protein structures from databases. Drug Discov Today 12(17-18).
14. Aung Z, Tan KL (2004) Rapid 3D protein structure database searching using information retrieval techniques. Bioinformatics 20(7).
15. Carugo O, Pongor S (2002) Protein fold similarity estimated by a probabilistic approach based on C(alpha)-C(alpha) distance comparison. J Mol Biol 315(4).
16. Choi IG, Kwon J, Kim SH (2004) Local feature frequency profile: A method to measure structural similarity in proteins. Proc Natl Acad Sci USA 101(11).
17. Rögen P, Fain B (2003) Automatic classification of protein structure by using Gauss integrals. Proc Natl Acad Sci USA 100(1).
18. Zotenko E, O'Leary DP, Przytycka TM (2006) Secondary structure spatial conformation footprint: A novel method for fast protein structure comparison and classification. BMC Struct Biol 6:12.
19. Friedberg I, et al. (2007) Using an alignment of fragment strings for comparing protein structures. Bioinformatics 23(2).
20. Tung CH, Huang JW, Yang JM (2007) Kappa-alpha plot derived structural alphabet and BLOSUM-like substitution matrix for rapid search of protein structure database. Genome Biol 8(3):R31.
21. Manning CD, Raghavan P, Schütze H (2008) Introduction to Information Retrieval (Cambridge Univ Press, Cambridge).
22. Puzicha J, Buhmann JM, Rubner Y, Tomasi C (1999) Empirical evaluation of dissimilarity measures for color and texture. The Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999.
23. Fei-Fei L, Pietro P (2005) A Bayesian hierarchical model for learning natural scene categories. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 05), Volume 2 (IEEE Computer Society).
24. Melvin I, et al. (2007) SVM-Fold: A tool for discriminative multi-class protein fold and superfamily recognition. BMC Bioinformatics 8(Suppl 4).
25. Tatusova TA, Madden TL (1999) BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett 174(2).
26. Taylor WR, Orengo CA (1989) Protein structure alignment. J Mol Biol 208(1).
27. Holm L, Sander C (1993) Protein structure comparison by alignment of distance matrices. J Mol Biol 233(1).
28. Kleywegt GJ (1996) Use of non-crystallographic symmetry in protein structure refinement. Acta Crystallogr, Sect D: Biol Crystallogr 52.
29. Miller RGJ (1981) Simultaneous Statistical Inference (Springer, New York), 2nd edition.
30. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Statist Soc B 57(1).
31. Kihara D, Skolnick J (2003) The PDB is a covering set of small protein structures. J Mol Biol 334(4).
32. Moult J, et al. (2007) Critical assessment of methods of protein structure prediction, Round VII. Proteins 69(Suppl 8).
33. Zhang Y, Arakaki AK, Skolnick J (2005) TASSER: An automated method for the prediction of protein tertiary structures in CASP6. Proteins: Structure, Function, and Bioinformatics 9999(9999):NA.
34. Friedberg I, Godzik A (2005) Fragnostic: Walking through protein structure space. Nucleic Acids Res 33(suppl 2).
35. Kolodny R, Koehl P, Guibas L, Levitt M (2002) Small libraries of protein fragments model native protein structures accurately. J Mol Biol 323(2).
36. Good P (2000) Permutation Tests (Springer, New York), 2nd Ed.


COUNTS OF FAILURE STRINGS IN CERTAIN BERNOULLI SEQUENCES COUNTS OF FAILURE STRINGS IN CERTAIN BERNOULLI SEQUENCES LARS HOLST Deprtment of Mthemtics, Royl Institute of Technology SE 100 44 Stockholm, Sween E-mil: lholst@mth.kth.se October 6, 2006 Abstrct In sequence

More information

Math 131. Numerical Integration Larson Section 4.6

Math 131. Numerical Integration Larson Section 4.6 Mth. Numericl Integrtion Lrson Section. This section looks t couple of methods for pproimting definite integrls numericlly. The gol is to get good pproimtion of the definite integrl in problems where n

More information

A New Grey-rough Set Model Based on Interval-Valued Grey Sets

A New Grey-rough Set Model Based on Interval-Valued Grey Sets Proceedings of the 009 IEEE Interntionl Conference on Systems Mn nd Cybernetics Sn ntonio TX US - October 009 New Grey-rough Set Model sed on Intervl-Vlued Grey Sets Wu Shunxing Deprtment of utomtion Ximen

More information

Lecture 14: Quadrature

Lecture 14: Quadrature Lecture 14: Qudrture This lecture is concerned with the evlution of integrls fx)dx 1) over finite intervl [, b] The integrnd fx) is ssumed to be rel-vlues nd smooth The pproximtion of n integrl by numericl

More information

Math 142: Final Exam Formulas to Know

Math 142: Final Exam Formulas to Know Mth 4: Finl Exm Formuls to Know This ocument tells you every formul/strtegy tht you shoul know in orer to o well on your finl. Stuy it well! The helpful rules/formuls from the vrious review sheets my be

More information

Lecture 3 Gaussian Probability Distribution

Lecture 3 Gaussian Probability Distribution Introduction Lecture 3 Gussin Probbility Distribution Gussin probbility distribution is perhps the most used distribution in ll of science. lso clled bell shped curve or norml distribution Unlike the binomil

More information

a a a a a a a a a a a a a a a a a a a a a a a a In this section, we introduce a general formula for computing determinants.

a a a a a a a a a a a a a a a a a a a a a a a a In this section, we introduce a general formula for computing determinants. Section 9 The Lplce Expnsion In the lst section, we defined the determinnt of (3 3) mtrix A 12 to be 22 12 21 22 2231 22 12 21. In this section, we introduce generl formul for computing determinnts. Rewriting

More information

Math 61CM - Solutions to homework 9

Math 61CM - Solutions to homework 9 Mth 61CM - Solutions to homework 9 Cédric De Groote November 30 th, 2018 Problem 1: Recll tht the left limit of function f t point c is defined s follows: lim f(x) = l x c if for ny > 0 there exists δ

More information

Unit #9 : Definite Integral Properties; Fundamental Theorem of Calculus

Unit #9 : Definite Integral Properties; Fundamental Theorem of Calculus Unit #9 : Definite Integrl Properties; Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl

More information

5.7 Improper Integrals

5.7 Improper Integrals 458 pplictions of definite integrls 5.7 Improper Integrls In Section 5.4, we computed the work required to lift pylod of mss m from the surfce of moon of mss nd rdius R to height H bove the surfce of the

More information

Week 10: Line Integrals

Week 10: Line Integrals Week 10: Line Integrls Introduction In this finl week we return to prmetrised curves nd consider integrtion long such curves. We lredy sw this in Week 2 when we integrted long curve to find its length.

More information

A recursive construction of efficiently decodable list-disjunct matrices

A recursive construction of efficiently decodable list-disjunct matrices CSE 709: Compressed Sensing nd Group Testing. Prt I Lecturers: Hung Q. Ngo nd Atri Rudr SUNY t Bufflo, Fll 2011 Lst updte: October 13, 2011 A recursive construction of efficiently decodble list-disjunct

More information

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4 Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one

More information

Materials Analysis MATSCI 162/172 Laboratory Exercise No. 1 Crystal Structure Determination Pattern Indexing

Materials Analysis MATSCI 162/172 Laboratory Exercise No. 1 Crystal Structure Determination Pattern Indexing Mterils Anlysis MATSCI 16/17 Lbortory Exercise No. 1 Crystl Structure Determintion Pttern Inexing Objectives: To inex the x-ry iffrction pttern, ientify the Brvis lttice, n clculte the precise lttice prmeters.

More information

4.4 Areas, Integrals and Antiderivatives

4.4 Areas, Integrals and Antiderivatives . res, integrls nd ntiderivtives 333. Ares, Integrls nd Antiderivtives This section explores properties of functions defined s res nd exmines some connections mong res, integrls nd ntiderivtives. In order

More information

Mass Creation from Extra Dimensions

Mass Creation from Extra Dimensions Journl of oern Physics, 04, 5, 477-48 Publishe Online April 04 in SciRes. http://www.scirp.org/journl/jmp http://x.oi.org/0.436/jmp.04.56058 ss Cretion from Extr Dimensions Do Vong Duc, Nguyen ong Gio

More information

Homework Assignment 5 Solution Set

Homework Assignment 5 Solution Set Homework Assignment 5 Solution Set PHYCS 44 3 Februry, 4 Problem Griffiths 3.8 The first imge chrge gurntees potentil of zero on the surfce. The secon imge chrge won t chnge the contribution to the potentil

More information

( x) ( ) takes at the right end of each interval to approximate its value on that

( x) ( ) takes at the right end of each interval to approximate its value on that III. INTEGRATION Economists seem much more intereste in mrginl effects n ifferentition thn in integrtion. Integrtion is importnt for fining the expecte vlue n vrince of rnom vriles, which is use in econometrics

More information