FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately

 Sharyl Underwood
 5 months ago
 Views:
Transcription
1 FrgBg, n ccurte representtion of protein structure, retrieves structurl neighbors from the entire PDB quickly n ccurtely Inbl BuowskiTl, Yuvl Nov b, n Rchel Kolony, Deprtment of Computer Science n b Deprtment of Sttistics, University of Hif, Mount Crmel, Hif 3905, Isrel Communicte by Michel Levitt, Stnfor University School of Meicine, Stnfor, CA, December, 2009 (receive for review November 3, 2009) Fst ientifiction of protein structures tht re similr to specifie query structure in the entire Protein Dt Bnk (PDB) is funmentl in structure n function preiction. We present FrgBg: An ultrfst n ccurte metho for compring protein structures. We escribe protein structure by the collection of its overlpping short contiguous bckbone segments, n iscretize this set using librry of frgments. Then, we succinctly represent the protein s bgsoffrgments vector tht counts the number of occurrences of ech frgment n mesure the similrity between two structures by the similrity between their vectors. Our representtion hs two itionl benefits: (i) it cn be use to construct n inverte inex, for implementing fst structurl serch engine of the entire PDB, n (ii) one cn specify structure s collection of substructures, without combining them into single structure; this is vluble for structure preiction, when there re relible preictions only of prts of the protein. We use receiver operting chrcteristic curve nlysis to quntify the success of FrgBg in ientifying neighbor cnite sets in tset of over 2,900 structures. The gol stnr is the set of neighbors foun by six stte of the rt structurl ligners. Our best FrgBg librry fins more ccurte cnite sets thn the three other filter methos: The SGM, PRIDE, n metho by Zotenko et l. More interestingly, FrgBg performs on pr with the computtionlly expensive, yet highly truste structurl ligners STRUCTAL n CE. evlution of structure serch fst structurl serch of Protein Dt Bnk filter n refine protein bckbone frgments protein structure serch Fining structurl neighbors of protein, i.e., proteins tht shre with it sizble substructure, is n importnt yet persistently ifficult, tsk. Such structurl neighbors my hint t protein s function or evolutionry origin even without etectble sequence similrity, s structure is more conserve thn sequence. Inee, mny methos for protein function (, 2) n structure (3) preiction rely on fining such neighbors. Structurl clssifictions such s SCOP (4) n CATH (5) ientify some neighbors; however, there re mny other neighbors, which lthough clssifie ifferently, re ctully structurlly similr n importnt for function n structure preiction (, 6). Devising fst, ccurte, yet comprehensive, structurl serch tools for the rpily growing Protein Dt Bnk (PDB) remins n importnt chllenge. Structurl lignment quntifies the similrity between two protein structures, by ientifying two eqully size, geometriclly similr substructures. Mny structurl lignment methos hve been propose over the pst twenty yers [e.g., STRUCTAL (7), CE (8), n SSM (9)]. Regrless of the wy they quntify similrity n their serch heuristics, one cn efine common similrity score to ssess the resulting lignments, n crete bestofll metho tht gives the best lignment uner tht score (0). Unfortuntely, ligning two structures is n expensive computtion (0); thus, mny structurl lignment servers consier only representtive subset of the Protein Dt Bnk (PDB) (e.g. FATCAT () n CE). However, by using such sequencenonreunnt representtive sets we risk excluing interesting structurl vribility (2). In ny cse, nively structurlly ligning query ginst the entire PDB, or structurlly ligning ll PDB structures ginst one nother, is prohibitively expensive, s it requires O(n) or Oðn 2 Þ computtionlly intensive comprisons. To serch the entire PDB efficiently, reserchers evise the filter n refine prigm (3). A filter metho quickly sifts through lrge set of structures, n selects smll cnite set to be structurlly ligne by more ccurte, but computtionlly expensive, metho. Filter methos gin their spee by representing structures bstrctly typiclly s vectors n compring these representtions quickly. Such vector representtions llow constructing n inverte inex t structure tht enbles fst retrievl of neighbors, even in huge tsets [e.g. (4)]. PRIDE represents structure by the histogrms of igonls in its internl istnce mtrix, n mesures similrity between two structures by the similrity between their histogrms (5). Choi et l. (6) represent structure by vector of frequencies of locl fetures in its internl istnce mtrix, n mesure similrity between two structures by the istnce between their corresponing vectors. Inspire by knot theory, Rögen n Fin evise the Scle Guss Metric (SGM) metho, which represents structure by vector of 30 globl topologicl mesures of its bckbone (7). Zotenko et l. (8) represent protein structure by vector of the frequencies of ptterns of seconry structure element (SSE) triplets. Severl methos [e.g., (9), (20)] represent structure s n orere string of structurl frgments, n sequencelign these strings to mesure structurl similrity; such representtions re less suitble for constructing n inverte inex. To serch in very lrge tsets, computer scientists often represent objects s bgofwors (BOW) unorere collections of locl fetures. Web serch engines use n inverte inex of the BOW representtion of the web. Ech ocument n query is represente by the number of occurrences of its wors (2). In Computer Vision, texture imges re represente s BOW of locl imge fetures for texture recognition, object clssifiction, n imge n vieo retrievl [e.g., (22 23)]. In protein sequence nlysis, Leslie n coworkers escribe protein sequences s BOW of their kmers for etecting remote homology (24). In FrgBg we represent protein structure s BOW of bckbone frgments, n use this representtion to ientify quickly goo cnite sets of structurl neighbors. Specificlly, we represent structure by vector whose entries count the number of times ech frgment pproximtes segment in the protein bckbone (Fig. ), n mesure similrity between two structures by the similrity between their corresponing vectors. In testing our pproch, we consier 24 librries of vrious sizes n Author contributions: I.B.T., Y.N., n R.K. performe reserch n nlyze t; n Y.N. n R.K. esigne reserch n wrote the pper. The uthors eclre no conflict of interest. To whom corresponence shoul be resse. Emil: This rticle contins supporting informtion online t /DCSupplementl. BIOPHYSICS AND COMPUTATIONAL BIOLOGY PNAS Februry 23, 200 vol. 07 no
2 frgment lengths, n use three BOW similrity mesures (Euclien, Cosine, n Histogrm Intersection). We stuy how well ifferent mesures ientify structurl neighbors, reltive to stringent gol stnr: The structurl neighbors foun by bestofsix structurl lignment metho (0). We lso test sttisticlly whether BOW representtions of structures within CATH ctegories re inee similr to ech other. We then compre the performnce of our mesure with tht of other filter methos SGM, Zotenko et l. (8), n PRIDE, to BLAST sequence lignment (25), n to the structurl lignment methos STRUC TAL, CE, n SSM. Our best filter metho outperforms other filter methos n BLAST. More importntly, it performs on pr with the computtionlly expensive structurl ligners STRUCTAL n CE. In our tests, the rnking of the methos is insensitive to the threshol vlue efining structurl neighbors. Compring FrgBg vectors is orers of mgnitues fster thn structurl lignment; it lso is well suite for using in inverte inices. Thus, our metho cn be use to quickly ientify goo cnite sets of structurl neighbors in the entire PDB. Results In FrgBg, the bgoffrgments tht represents protein structure is succinctly escribe by vector of length N, the size of the frgment librry. Fig. illustrtes how this vector is clculte from the αcrbon coorintes of protein. For ech contiguous (n overlpping) kresiue segment long the protein bckbone, we ientify the librry frgment of length k tht fits it best in terms of rms fter optiml superposition. Then, we count the number of times ech librry frgment ws use, n escribe the protein by vector of these counts. Compring Filter Methos vi Receiver Operting Chrcteristic (ROC) Curve Anlysis. We mesure the ccurcy of structurl retrievl methos by how well they ientify cnite sets of structurl neighbors in tbse, given query structure. We consier tbse of 2,928 sequencenonreunnt structures, n query it by ech of its structures. The golstnr nswer inclues neighbors foun by bestofsix structurl ligner (using SSAP (26), STRUCTAL, DALI (27), LSQMAN (28), CE, n SSM); this expensive computtion ws one previously (0). Structurl neighbors of the query re ones ligne to it with structurl lignment score [SAS; see Methos for efinition (7)] below threshol T (for T ¼ 2 Å, 3.5 Å, n 5 Å). We use the re uner the curve (AUC) of the ROC curves to mesure how well ech metho ientifies the structurl neighbors of query, n verge the AUCs over ll 2,928 queries. A higher AUC is better: A perfect imittor of the gol stnr, which rnks the structurl neighbors before ll other proteins, will hve n AUC of, wheres rnom mesure will hve n AUC of 0.5. Fig. 2 (Left) shows the verge AUCs with respect to the three gol stnrs efine bove (corresponing to the three vlues of T) for 24 librries (with frgments of length 5 2 I b c e Librry of Frgments f II f e III f f f e IV Number of Occurnces b c e f Frgment ID Fig.. FrgBg representtion of protein structure. (A) For illustrtion, we consier librry of 6 frgments. (B) Ech (overlpping) contiguous segment in the bckbone is ssocite with its most similr librry frgment. (C) All frgments re collecte to n unorere bg. (D) The structure is represente by vector v, whose entries count the number of times ech librry frgment ppere in the bg. In this exmple, v ¼ð4; 0; 0; 5; ; 3Þ, implying frgment orer (, b, c,, e, f). resiues). We consier three BOW similrity mesures: Cosine, Histogrm Intersection, n Euclien istnces. For comprison, the right pnel shows the verge AUCs of other methos: (i) sequencebse metho using BLAST s Evlue (25), (ii) the filter methos PRIDE, SGM, n tht of Zotenko et l. (8), n (iii) the structure ligners STRUCTAL, CE, n SSM. We sort the lignments by their SAS scores, n for STRUCTAL n Averge AUC of ROC Curve Averge AUC of ROC Curve Averge AUC of ROC Curve Frgment length (number of resiues) Frgment length (number of resiues) Frgment length (number of resiues) SAS Threshol = 5A cosine istnce histogrm Intersection istnce Eucliin istnce histogrm Intersection istnce cosine istnce histogrm Intersection istnce SAS Threshol = 3.5A cosine istnce Eucliin istnce SAS Threshol = 2A Eucliin istnce Librry Size SSM STRUCTAL FrgBg CE SGM Zutenko et l. PRIDE sequence lignment SSM STRUCTAL CE (N) STRUCTAL (N) FrgBg CE SGM Zutenko et l. sequence lignment PRIDE SSM CE(N), STRUCTAL FrgBg STRUCTAL (N) SGM CE Zutenko et l. sequence lignment PRIDE Fig. 2. The verge AUC of ROC curves of ientifying structurl neighbors. The AUC mesures how well rnking of structures imittes the rnking ccoring to gol stnr; lrger vlues correspon to more successful imittors, rnging from 0.5 ( rnom rnker) to ( perfect imittor). We consier three efinitions of structurl neighbors, using SAS threshols of 2 Å, 3.5 Å, n 5 Å. The left pnels show the performnce of librries with frgments of 6 2 resiues, n ifferent number of frgments (vlue long the xxis), n using the cosine (plus sign), Euclien (circles), n Histogrm Intersection (imons) istnces. On the right we compre the best FrgBg result (400() librry n the cosine istnce) to other methos: Sequencebse similrity mesure in finely she blck, filter methos in she blck, n structure lignment methos in soli blck. We see tht the FrgBg performs similrly to CE n STRUCTAL two computtionlly expensive n welltruste structurl ligners BuowskiTl et l.
3 CE, lso by their ntive scores. The verge AUCs plotte in Fig. 2 re lso vilble s supplementry mteril (Tble S). The rnking of the performnce of ifferent methos is generlly inepenent of T, the threshol efining structurl neighbors. Uner the strictest efinition, the neighbors of protein re only structures tht were ligne with n SAS score <2 Å; uner the most lx efinition, the neighbors re the structures tht were ligne with n SAS score <5 Å. As expecte, ll methos perform better (i.e., chieve higher verge AUCs) when the efinition of structurl neighbors is more strict, n less well when the efinition inclues more istnt structures. Becuse structures with SAS score <5 Å re often the most meningful neighbors, which re ifficult to etect ccurtely, we re most intereste in ientifying methos tht excel uner this stringent test (see Discussion). FrgBg s best result is with the 400() librry (400 frgments of length ) n the cosine istnce; the verge AUCs re 0.89 (T ¼ 2 Å), 0.77 (T ¼ 3.5 Å), n 0.75 (T ¼ 5 Å). The cosine istnce performs best. When using cosine istnce or the histogrm intersection istnce, lrger librries (which escribe locl structure vribility in higher etil), typiclly perform better. From compring librries of fixe size (00, 200, or 400 frgments), we lern tht when using cosine istnce, librries of longer frgments perform better; when using the histogrm intersection or the Euclien istnces, the length of the frgment oes not influence the results. The rnking of the filter methos from most to lest successful is (i) FrgBg (using the 400() librry n cosine istnce), (ii) SGM, (iii) Zotenko et l. s metho (8), n (iv) PRIDE, which performs similrly to the sequencebse metho. Among the structurl ligners, the most successful is SSM, followe by STRUCTAL n CE. Fig. 2 n Tble show tht the best filter metho, i.e., the FrgBg representtion mentione bove, performs on pr with CE n STRUCTAL, two computtionlly expensive n highly truste structurl ligners. For exmple, using the T ¼ 5 Å threshol, FrgBg hs n verge AUC of 0.75, which is similr to CE s 0.74 using the ntive score, n 0.75 using SAS score; it is lower thn STRUCTAL s verge AUC of 0.83 using the ntive score n 0.84 using SAS score. Using the T ¼ 3.5 Å threshol, FrgBg hs n verge AUC of 0.77, which is similr to STRUC TAL s 0.77 using its ntive score n to CE s 0.72 using SAS score. SSM outperforms FrgBg t ll three threshols. Ctegories of CATH Proteins hve BOW Descriptions tht Are Different from Ech Other In Sttisticlly Significnt Wy. Next, we test sttisticlly whether FrgBg grees with the CATH clssifiction, t the Clss Architecture (CA) n the Clss, Architecture, n Topology (CAT) levels. In bro strokes, we expect FrgBg Tble. AUCs of ROC Curves Using BestofSix Gol Stnr representtions of proteins in the sme ctegory to be more similr thn those of proteins in ifferent ctegories, n wish to verify tht such similrity, if foun, is not ue to mere chnce. When running mny sttisticl tests, s we o here (one for ech pir of ctegories), the chnces of getting t lest one significnt result is much higher thn the chnces of getting significnt result in single test, even when there is no true unerlying ifference (the multiple comprisons problem ). We control for this problem through the Bonferroni correction (29), n the flse iscovery rte (FDR) pproch (30). When using the Bonferroni correction, we just the require significnce level of ech test, so tht the probbility of flsely eclring t lest one test s significnt is the stnr α ¼ Tble 2 summrizes the results uner the Bonferroni correction, cross the 24 librries. For exmple, there re 2 minlyα CATH ctegories t the CAT level, n thus 2 2 ¼ 66 ctegory pirs; out of the 66 corresponing tests, 6 were significnt t the juste significnce level cross ll 24 librries, hence the frction 6 66 t the tble s first cell in the secon row. The prenthesize figures in the tble re the frction of significnt tests for the 400() librry. The iniviul test results re vilble in Dtsets S n S2. Using the FDR pproch, we fin which tests cn be eclre significnt, while controlling the verge frction of the wrongly eclre tests t some prechosen level; for etils, see ref. 30. Tble 2 lists the frction of tests eclre significnt, verge cross the 24 librries n uner n FDR of 0.05; the prenthesize figures re the frction of the tests eclre significnt for the 400() librry. The frctions reporte in Tble 2, ll being very close to, strongly support the conclusion tht the FrgBg representtion grees with the CATH clssifiction, both t the CA n the CAT level. Comprison of FrgBg Similrity Mesure to rms on Dtset of Structure Pirs Within Nucler Mgnetic Resonnce (NMR) Ensembles. Next, we stuy how well FrgBg ientifies similrity between structures tht re only loclly similr, i.e., hve highly similr substructures tht re connecte ifferently. The bility to ientify such locl similrity my help in etecting similrity to prtilly chrcterize structure, s neee in structure preiction. We consier pirs of structures within n NMR ensemble collection of structures tht re consistent with the experimentl constrints; these typiclly iffer only t severl flexible points long the bckbone, n re thus loclly similr. Our similrity criterion is efine by the threshol 0.35 the verge FrgBg cosine istnce between structures with the sme CAT clssifiction (see Tble 3). We use throughout this section the 400() librry. SAS Similrity Threshol (Å) Metho Averge vlue Rnk Spee * SSM using SAS score [2] Structl using SAS score [2] Structl using ntive score [2] CE using ntive score [2] FrgBg Cos istnce (400,) Fst CE using SAS score [2] FrgBg histogrm intersection (600,) Fst SGM Fst FrgBg Euclien istnce (40,6) Fst Zotenko et l. (8) Fst Sequence mtching by BLAST evlue Fst PRIDE Fst *Averge CPU minutes per query. Essentilly instntneous fter preprocessing (<0. s). BIOPHYSICS AND COMPUTATIONAL BIOLOGY BuowskiTl et l. PNAS Februry 23, 200 vol. 07 no
4 Tble 2. Sttisticl nlysis summry Anlysis using Bonferroni correction *, Anlysis using FDR, Minly α Minly β Mixe α þ β Minly α Minly β Mixe α þ β CA 6 6 (6 6) 3 36 (35 36) 2 2 (2 2) 6 6 (6 6) (36 36) 2 2 (2 2) CAT 6 66 (65 66) (78 78) (225 23) (65 66) (78 78) (23 23) *Nonprenthesize vlues re the number of ctegory pirs foun ifferent (t the Bonferronijuste significnt level) in ll 24 librries, ivie by the totl number of pirs; prenthesize vlues re the frction of significnt comprisons for the librry of 400 frgments of length. In ll cses, vlues close to re esirble. Nonprenthesize vlues re the frction of comprisons eclre significnt in the FDR nlysis, verge cross the 24 librries; prenthesize figures re the frction of the comprisons eclre significnt for the librry of 400 frgments of length. We use the tset of 230 NMR ensembles tht ws constructe in the PRIDE stuy (5). The set inclues 54,465 pirs, 43,246 of which hve rms 4 Å (bqv, bmy, e0, n lx were replce by their newer versions). Fig. 3 shows the FrgBg cosine istnce vs. rms, n the mrginl istributions of the two istnces. The vst mjority of pirs re ientifie s very similr by FrgBg: 9% hve cosine istnce below 0.35, showing tht FrgBg inee ientifies similrity between loclly similr structures. We see similr results using Histogrm intersection n Euclien istnce (Fig. S). For comprison, Tble 3 lists the mens n stnr evitions of the FrgBg istnces between structure pirs t ifferent levels of structurl similrity. The most similr pirs re those within NMR ensembles: We consier ll pirs n only those with rms 4 Å. We lso consier pirs in the set of 2,928 CATH omins tht hve the sme CATH, CAT, CA, n C clssifiction, s well s ll pirs. As expecte, the verge istnce grows s the sets become more structurlly iverse, uner ll three istnces. Discussion A Fst Filter tht Performs On Pr with Structurl Alignment Methos. FrgBg cn quickly ientify structurl neighbor cnites for structure query, n performs s well s some highly truste, computtionlly expensive structurl lignment methos. Our results with FrgBg s 400() librry re s ccurte s CE s, n lmost s ccurte s STRUCTAL s. This is impressive, s CE n STRUCTAL re mong the most ccurte structurl lignment methos (0). As expecte, structurl neighbors re generlly ientifie best by structurl ligners, less well by filters, n lest well by sequence lignment. Our results re robust we see similr rnking of methos using ifferent efinitions for structurl neighbors of protein. An ttrctive feture of bstrct representtions of protein structure such s FrgBg (n other filter methos) is tht one cn store the vectors representing ll PDB proteins in n inverte inex t structure esigne for fst retrievl of neighbors. Evlution Protocol. Retrieving structurl neighbors of query protein from the entire PDB is chllenge. We cnnot euce the structurl neighbors solely from SCOP or CATH, becuse crossfol similrities proteins tht re geometriclly similr, yet clssifie ifferently re very common (0, 3). Kihr n Skolnick (3) note tht crossfol similrities mong smll proteins (<00 resiues), re bunnt even t CATH s C level. Crossfol similrities re prticulrly importnt to ientify, when preicting structure n function (, 6). Recent Criticl Assessment of Techniques for Protein Structure Preiction (CASP) experiments show tht the most successful structure preiction methos construct their preictions from substructures tht re not, in generl, from proteins clssifie in the sme fol [e.g. (3, 32, 33)]. Frieberg n Gozik (34) showe tht crossfol similrities correlte well with the functionl similrity of proteins populting the fols, n Petrey n Honig (6) showe exmples of functionl similrity mong ifferently clssifie proteins. Nevertheless, proteins clssifie in the sme CATH ctegory (t the CA or CAT levels) re truly similr, n we expect their FrgBg representtions to be similr s well. Becuse there re mny CA n CAT ctegories, ech populte with mny proteins, we use sttisticl theory to test n confirm this. Any gol stnr must meet the chllenge: Becuse filter metho nees to ientify structurl neighbors of query, the gol stnr must be ll ientifie neighbors. Here, we fin these neighbors using the expensive computtion of bestofsix structurl ligner. Nmely, we ientify structure s neighbor if ny of the six methos fins in both sizble substructure tht cn be superimpose with low rms. Such neighbor is selecte regrless of its CATH clssifiction, n coul well belong to ctegory other thn tht of the query protein. H we relie on clssifiction, similr structures woul hve been mrke s nonneighbors, n the ROC curve nlysis woul hve effectively penlize filter methos tht correctly ientify them. Becuse structurl similrity tht is ue to sequence similrity is esy to ientify, we use tsets of nonreunnt sequences. This ensures tht we hve eliminte trivil pirs in our evlution protocol. Note tht when the structurl neighbors re efine with the T ¼ 2 Å, 3.5 Å threshols, sequence lignment oes better thn rnom (AUC > 0.5); inee, when there re only few structurl neighbors (e.g., only the query), even trivil sequence lignment metho will perform well, becuse it rnks the query s most similr to itself. This phenomenon will hve greter impct on the verge AUC when the number of structurl neighbors is smll (lower T). Thus, the verge AUC of the Tble 3. Averge FrgBg istnces in tsets of vrying structurl similrity Dtset Histogrm intersection istnce* Euclien (L 2 norm) istnce* Cosine istnce* Within NMR ensemble (rms 4 Å) Within NMR ensemble Sme CATH clssifiction Sme CAT clssifiction Sme CA clssifiction Sme C clssifiction Different C clssifiction *Using the librry of 400 frgments of length BuowskiTl et l.
5 in protein structure spce. Finlly, this inex my be use to nswer prtilly chrcterize queries, to the benefit of the structure preiction community. Methos We consier 24 librries with frgments of 5 2 resiues, constructe in previous stuy (35). There, we clustere the frgments of the Cα trces of 200 ccurtely etermine structures, n forme librry by tking representtive from ech cluster. Fig. 3. Cosine intersection istnce vs. rms in structure pirs within NMR ensembles. The tset hs 230 NMR ensembles with 43,246 pirs hving rms 4 Å (5). FrgBg ientifies the vst mjority (9%) of the pirs in this set s very similr (cosine istnce below 0.35 the verge istnce in pirs of the sme CAT clssifiction, mrke in she pink). sequence lignment metho cts s lower boun, inicting the ifficulty of the tsk. Although not the min focus of this stuy, our results provie nother evlution of the performnce of STRUCTAL, CE, n SSM, n show tht SSM is the top performer. SSM compres structures in two stges. (i) A fst estimtion of structurl similrity by mtching the SSE grphs of the structures. This step clcultes two estimtes of the percent SSE mtch, n the user cn specify their llowe mximl vlues; the SSM server oes not provie combine vlue of these estimtes tht cn be use to rnk structure pirs. (ii) An expensive n ccurte lignment of the Cαs of the two structures. Here, we use the SAS of the lignments foun by SSM fter the secon stge. Serching with Prtilly Define Queries for Protein Structure Preiction. Importntly, FrgBg cn serch the entire PDB for neighbors of query structure tht is only prtilly chrcterize. Such serch is useful for structure preiction, s preiction methos often preict only the structure of prts of protein, n fining composition of these prts in the PDB my hint t how these prts shoul be combine into complete structure. In FrgBg, the missing informtion hs minor impct. The union of the bgsoffrgments of the prts iffers from the true bgoffrgments only by the few frgments in the connecting regions. Similrly, two structures tht re flexible vrints (i.e., iffer only t hinge point) will hve similr FrgBg representtions. Future Directions. We hope to improve FrgBg using similrity mesure tht weights the frgments ifferently, n possibly ignores some; BOWs weighting schemes were successfully use in other res of Computer Science. Currently, ech representtion is bse on one librry, n ll its frgments re weighte eqully. Once we hve weighting scheme, we cn combine librries n trust the weights to select ll significnt frgments. We lso pln to construct FrgBgbse inverte inex for the entire PDB; s note bove, this inex will llow quickly ientifying smll cnite sets of structurl neighbors of protein. The cnite sets will lso inclue structures tht re flexible vrints of the query, potentilly reveling new connections ROC Curve Anlysis with Structurl Alignments Gol Stnr. We use set of 2,928 sequenceiverse CATH v.2.4 omins n their llginstll structurl lignments; the set ws constructe for previous comprison stuy of structurl ligners (0). Two 7resiue long structures (pspa, pspb) were remove becuse they re shorter thn some frgments. All structures were structurlly ligne to ll others by six lignment methos (SSAP, STRUCTAL, DALI, LSQMAN, CE, n SSM) n their SAS scores were recore, where SAS ¼ 00 rms ðlignment lengthþ. Our gol stnr is bse on the best lignment foun by these six methos in terms of the SAS score. The sequences of every pir of structures in this set iffer significntly (FASTA Evlue >0 4 ). A FrgBg escription of protein is row vector; its length, N, isthe size of the librry use. The vector escribing the ith protein is b i ¼ðb i ðþ; b i ð2þ; ;b i ðnþþ, where b i ðjþ is the number of times frgment j is the best locl pproximtion of segment in the ith protein. We consier three istnce metrics between two vectors, b i n b k : (i) cosine istnce, b T i b k b i b k,(ii) histogrm intersection istnce, N j¼ minfb iðjþ;b k ðjþg minfs i ;s k g, where s i ¼ N j¼ b iðjþ, n (iii) Euclien (L 2 ) istnce, b i b k. Sttisticl Anlysis. We now escribe wht t were use in the sttisticl nlysis, how these t re summrize in mtrix form, n the etils of the sttisticl nlysis. Rw Dt. We use the 8,87 omins in the S35 fmily level in CATH Becuse the clssifiction t the C level is bse on seconry structure, we focus on the CA n CAT levels, n run the tests seprtely on ifferent C clsses. To improve the sttisticl power of the tests, we use only ctegories with t lest 30 structures. When prtitioning the tset to ctegories t the CA level, there re 4 ctegories (with t lest 30 structures) in the minlyα clss (totling 2,077 structures out of 2,078); 9 in the minlyβ clss (,968 out of 2,062); n 7 in the mixe α þ β clss (4,507 out of 4,558). There ws only one ctegory in the fewseconrystructure clss, so this clss ws omitte. When prtitioning t the CAT level, there re 2 ctegories in the minlyα clss (totling,03 structures); 3 in the minlyβ clss (,396 structures); n 22 in the mixe α þ β clss (2,68 structures). Dt in Mtrix Form. Consier librry of N frgments, n, sy, the CAlevel clssifiction of the M ¼ 968 minlyβ proteins into Q ¼ 9 ctegories. The FrgBg representtions of thesem proteins is initilly summrize in n M N mtrix B, whose ði;jþth entry is b i ðjþ. The mtrix B is prtitione rowwise into Q blocks corresponing to the Q ctegories, n we enote by M q the number of rows of the qth block. When consiering CATH s CAT level, minlyα proteins, or mixe α þ β proteins, M, Q, n B, s well s the prtition of B, chnge ccoringly. Omnibus Test. Following the usul sttisticl prctice, we first run single omnibus test (29) to check whether there is t lest one pir of ctegories whose proteins FrgBg representtions re ifferent from ech other in sttisticlly significnt wy; only fter such ifference is foun, we compre ll possible pirs of ctegories (post hoc nlysis). The t is multivrite, s ech FrgBg vector consists of N observtions, yet it certinly cnnot be ssume to be normlly istribute. Thus, we use nonprmetric permuttion test, pte from (36). We now construct sttistic w tht cptures the overll issimilrity between vectors belonging to ifferent ctegories; lrge vlues of w support rejecting the null hypothesis, ccoring to which the prtition into blocks crries no informtion with respect to the clssifiction. We first stnrize B s columns by iviing ech column by its stnr evition (36). Let B q be the M q N submtrix of (the stnrize) B, constituting the qth block, n let B q be the Nvector whose entries re the mens of the columns of B q.for two istinct blocks, q n r, we efine Δ qr ¼ mx j B q B r j, where the mximum is tken over the N ifferences (in bsolute vlues) between the entries of the two vectors. The omnibus test sttistic is w ¼ mx q r Δ qr. BIOPHYSICS AND COMPUTATIONAL BIOLOGY BuowskiTl et l. PNAS Februry 23, 200 vol. 07 no
6 Being permuttion test, the omnibus test s pvlue is PðW wþ, where W is similrly compute score uner rnom permuttion of B s rows. Becuse the number of permuttions is too lrge to enumerte, we resort to estimting the pvlue in Monte Crlo fshion, by rwing 000 rnom permuttions of B s rows, n observing the proportion of the permuttions chieving sttistic higher thn w. The omnibus test results were ll significnt, for comprisons both t the CA n CAT levels, for ll 24 librries, n for ech of the three CATH clsses (p < 0.00 in ll cses). Post Hoc Anlysis, Bonferroni Correction, n FDR. Once the omnibus test results were foun significnt, we test the t for more stringent lterntive hypothesis, ccoring to which ny two blocks re ifferent from ech other (rther thn testing for the existence of t lest one pir of ifferent blocks, s the omnibus test oes). To o so, we run the bove test seprtely for ech of the K ¼ QðQ Þ 2 pirs of blocks. When compring blocks q n r, the mtrix B is of imension ðm q þ M r Þ N, n s only two blocks re consiere, this comprison s sttistic reuces to w ¼ Δ qr. The result is collection of Kpvlues, corresponing to the K pirwise comprisons. When using the Bonferroni correction, we eclre s significnt only the comprisons in which the pvlue is below 0.05 K. When using the FDR pproch, we follow (30). ACKNOWLEDGMENTS. We thnk our nonymous reviewers for their helpful comments. This reserch ws supporte by the Mrie Curie IRG Grnt Kolony R, Petrey D, Honig B (2006) Protein structure comprison: Implictions for the nture of fol spce, n structure n function preiction. Curr Opin Struct Biol 6(3): Wtson JD, Lskowski RA, Thornton JM (2005) Preicting protein function from sequence n structurl t. Curr Opin Struct Biol 5(3): Zhng Y (2008) Progress n chllenges in protein structure preiction. Curr Opin Struct Biol 8(3): Murzin AG, Brenner SE, Hubbr T, Chothi C (995) SCOP: A structurl clssifiction of proteins tbse for the investigtion of sequences n structures. J Mol Biol 247(4): Perl FM, et l. (2003) The CATH tbse: An extene protein fmily resource for structurl n functionl genomics. Nucleic Acis Res 3(): Petrey D, Honig B (2009) Is protein clssifiction necessry? Towr lterntive pproches to function nnottion. Curr Opin Struct Biol 9(3): Subbih S, Lurents DV, Levitt M (993) Structurl similrity of DNAbining omins of bcteriophge repressors n the globin core. Curr Biol 3(3): Shinylov IN, Bourne PE (998) Protein structure lignment by incrementl combintoril extension (CE) of the optiml pth. Protein Eng (9): Krissinel E, Henrick K (2004) Seconrystructure mtching (SSM), new tool for fst protein structure lignment in three imensions. Act Crystllogr D 60: Kolony R, Koehl P, Levitt M (2005) Comprehensive Evlution of Protein Structure Alignment Methos: Scoring by Geometric Mesures. J Mol Biol 346(4): Li Z, Ye Y, Gozik A (2006) Flexible structurl neighborhoo tbse of protein structurl similrities n lignments. Nucleic Acis Res 34:D277 D Kosloff M, Kolony R (2008) Sequencesimilr, structureissimilr protein pirs in the PDB. Proteins 7(2): Aung Z, Tn KL (2007) Rpi retrievl of protein structures from tbses. Drug Discov Toy 2(7 8): Aung Z, Tn KL (2004) Rpi 3D protein structure tbse serching using informtion retrievl techniques. Bioinformtics 20(7): Crugo O, Pongor S (2002) Protein fol similrity estimte by probbilistic pproch bse on C(lph)C(lph) istnce comprison. J Mol Biol 35(4): Choi IG, Kwon J, Kim SH (2004) Locl feture frequency profile: A metho to mesure structurl similrity in proteins. Proc Ntl Ac Sci USA 0(): Rögen P, Fin B (2003) Automtic clssifiction of protein structure by using Guss integrls. Proc Ntl Ac Sci USA 00(): Zotenko E, O Lery DP, Przytyck TM (2006) Seconry structure sptil conformtion footprint: A novel metho for fst protein structure comprison n clssifiction. BMC Struct Biol 6:2. 9. Frieberg I, et l. (2007) Using n lignment of frgment strings for compring protein structures. Bioinformtics 23(2):e Tung CH, Hung JW, Yng JM (2007) Kpplph plot erive structurl lphbet n BLOSUMlike substitution mtrix for rpi serch of protein structure tbse. Genome Biol 8(3):R3. 2. Mnning CD, Rghvn P, Schütze H (2008) Introuction to Informtion Retrievl (Cmbrige Univ Press, Cmbrige). 22. Puzich J, Buhmnn JM, Rubner Y, Tomsi C (999) Empiricl evlution of issimilrity mesures for color n texture. The Proceeings of the Seventh IEEE Interntionl Conference on Computer Vision, 999 p FeiFei L, Pietro P (2005) A Byesin Hierrchicl Moel for Lerning Nturl Scene Ctegories. Proceeings of the 2005 IEEE Computer Society Conference on Computer Vision n Pttern Recognition (CVPR 05)  Volume 2  Volume 02 (IEEE Computer Society). 24. Melvin I, et l. (2007) SVMFol: A tool for iscrimintive multiclss protein fol n superfmily recognition. BMC Bioinformtics 8(Suppl 4):S Ttusov TA, Men TL (999) BLAST 2 Sequences, new tool for compring protein n nucleotie sequences. FEMS Microbiol Lett 74(2): Tylor WR, Orengo CA (989) Protein structure lignment. J Mol Biol 208(): Holm L, Sner C (993) Protein structure comprison by lignment of istnce mtrices. J Mol Biol 233(): Kleywegt GJ (996) Use of noncrystllogrphic symmetry in protein structure refinement. Act Crystllogr., Sect. D: Biol. Crystllogr. 52: Miller RGJ (98) Simultneous Sttisticl Inference (Springer, New York), 2n eition. 30. Benjmini Y, Hochberg Y (995) Controlling the flse iscovery rte: A prcticl n powerful pproch to multiple testing. J R Sttist Soc B 57(): Kihr D, Skolnick J (2003) The PDB is Covering Set of Smll Protein Structures. J Mol Biol 334(4): Moult J, et l. (2007) Criticl ssessment of methos of protein structure preiction Roun VII. Proteins 69(Suppl 8): Zhng Y, Arkki AK, Skolnick J (2005) TASSER: An utomte metho for the preiction of protein tertiry structures in CASP6. Proteins: Structure, Function, n Bioinformtics 9999(9999):NA. 34. Frieberg I, Gozik A (2005) Frgnostic: Wlking through protein structure spce. Nucleic Acis Res 33(suppl 2):W Kolony R, Koehl P, Guibs L, Levitt M (2002) Smll librries of protein frgments moel ntive protein structures ccurtely. J Mol Biol 323(2): Goo P (2000) Permuttion Tests (Springer, New York), 2n e BuowskiTl et l.
Tests for the Ratio of Two Poisson Rates
Chpter 437 Tests for the Rtio of Two Poisson Rtes Introduction The Poisson probbility lw gives the probbility distribution of the number of events occurring in specified intervl of time or spce. The Poisson
More information1 Online Learning and Regret Minimization
2.997 DecisionMking in LrgeScle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in
More informationA SignalLevel Fusion Model for ImageBased Change Detection in DARPA's Dynamic Database System
SPIE Aerosense 001 Conference on Signl Processing, Sensor Fusion, nd Trget Recognition X, April 160, Orlndo FL. (Minor errors in published version corrected.) A SignlLevel Fusion Model for ImgeBsed
More informationAcceptance Sampling by Attributes
Introduction Acceptnce Smpling by Attributes Acceptnce smpling is concerned with inspection nd decision mking regrding products. Three spects of smpling re importnt: o Involves rndom smpling of n entire
More informationChapter 9: Inferences based on Two samples: Confidence intervals and tests of hypotheses
Chpter 9: Inferences bsed on Two smples: Confidence intervls nd tests of hypotheses 9.1 The trget prmeter : difference between two popultion mens : difference between two popultion proportions : rtio of
More informationComputing the Optimal Global Alignment Value. B = n. Score of = 1 Score of = a a c g a c g a. A = n. Classical Dynamic Programming: O(n )
Alignment Grph Alignment Mtrix Computing the Optiml Globl Alignment Vlue An Introduction to Bioinformtics Algorithms A = n c t 2 3 c c 4 g 5 g 6 7 8 9 B = n 0 c g c g 2 3 4 5 6 7 8 t 9 0 2 3 4 5 6 7 8
More informationNUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by.
NUMERICAL INTEGRATION 1 Introduction The inverse process to differentition in clculus is integrtion. Mthemticlly, integrtion is represented by f(x) dx which stnds for the integrl of the function f(x) with
More informationClassification Part 4. Model Evaluation
Clssifiction Prt 4 Dr. Snjy Rnk Professor Computer nd Informtion Science nd Engineering University of Florid, Ginesville Model Evlution Metrics for Performnce Evlution How to evlute the performnce of model
More informationTesting categorized bivariate normality with twostage. polychoric correlation estimates
Testing ctegorized bivrite normlity with twostge polychoric correltion estimtes Albert MydeuOlivres Dept. of Psychology University of Brcelon Address correspondence to: Albert MydeuOlivres. Fculty of
More informationCzechoslovak Mathematical Journal, 55 (130) (2005), , Abbotsford. 1. Introduction
Czechoslovk Mthemticl Journl, 55 (130) (2005), 933 940 ESTIMATES OF THE REMAINDER IN TAYLOR S THEOREM USING THE HENSTOCKKURZWEIL INTEGRAL, Abbotsford (Received Jnury 22, 2003) Abstrct. When relvlued
More informationMatrix & Vector Basic Linear Algebra & Calculus
Mtrix & Vector Bsic Liner lgebr & lculus Wht is mtrix? rectngulr rry of numbers (we will concentrte on rel numbers). nxm mtrix hs n rows n m columns M x4 M M M M M M M M M M M M 4 4 4 First row Secon row
More informationUSA Mathematical Talent Search Round 1 Solutions Year 21 Academic Year
1/1/21. Fill in the circles in the picture t right with the digits 18, one digit in ech circle with no digit repeted, so tht no two circles tht re connected by line segment contin consecutive digits.
More informationMath 211A Homework. Edward Burkard. = tan (2x + z)
Mth A Homework Ewr Burkr Eercises 5C Eercise 8 Show tht the utonomous system: 5 Plne Autonomous Systems = e sin 3y + sin cos + e z, y = sin ( + 3y, z = tn ( + z hs n unstble criticl point t = y = z =
More informationLecture 14: Quadrature
Lecture 14: Qudrture This lecture is concerned with the evlution of integrls fx)dx 1) over finite intervl [, b] The integrnd fx) is ssumed to be relvlues nd smooth The pproximtion of n integrl by numericl
More informationa a a a a a a a a a a a a a a a a a a a a a a a In this section, we introduce a general formula for computing determinants.
Section 9 The Lplce Expnsion In the lst section, we defined the determinnt of (3 3) mtrix A 12 to be 22 12 21 22 2231 22 12 21. In this section, we introduce generl formul for computing determinnts. Rewriting
More informationMaterials Analysis MATSCI 162/172 Laboratory Exercise No. 1 Crystal Structure Determination Pattern Indexing
Mterils Anlysis MATSCI 16/17 Lbortory Exercise No. 1 Crystl Structure Determintion Pttern Inexing Objectives: To inex the xry iffrction pttern, ientify the Brvis lttice, n clculte the precise lttice prmeters.
More informationMath 8 Winter 2015 Applications of Integration
Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl
More informationLine Integrals. Partitioning the Curve. Estimating the Mass
Line Integrls Suppose we hve curve in the xy plne nd ssocite density δ(p ) = δ(x, y) t ech point on the curve. urves, of course, do not hve density or mss, but it my sometimes be convenient or useful to
More informationNumerical Integration
Chpter 1 Numericl Integrtion Numericl differentition methods compute pproximtions to the derivtive of function from known vlues of the function. Numericl integrtion uses the sme informtion to compute numericl
More informationParticle Lifetime. Subatomic Physics: Particle Physics Lecture 3. Measuring Decays, Scatterings and Collisions. N(t) = N 0 exp( t/τ) = N 0 exp( Γt/)
Sutomic Physics: Prticle Physics Lecture 3 Mesuring Decys, Sctterings n Collisions Prticle lifetime n with Prticle ecy moes Prticle ecy kinemtics Scttering cross sections Collision centre of mss energy
More information20 MATHEMATICS POLYNOMIALS
0 MATHEMATICS POLYNOMIALS.1 Introduction In Clss IX, you hve studied polynomils in one vrible nd their degrees. Recll tht if p(x) is polynomil in x, the highest power of x in p(x) is clled the degree of
More informationNonLinear & Logistic Regression
NonLiner & Logistic Regression If the sttistics re boring, then you've got the wrong numbers. Edwrd R. Tufte (Sttistics Professor, Yle University) Regression Anlyses When do we use these? PART 1: find
More informationEntropy and Ergodic Theory Notes 10: Large Deviations I
Entropy nd Ergodic Theory Notes 10: Lrge Devitions I 1 A chnge of convention This is our first lecture on pplictions of entropy in probbility theory. In probbility theory, the convention is tht ll logrithms
More informationConstruction and Selection of Single Sampling Quick Switching Variables System for given Control Limits Involving Minimum Sum of Risks
Construction nd Selection of Single Smpling Quick Switching Vribles System for given Control Limits Involving Minimum Sum of Risks Dr. D. SENHILKUMAR *1 R. GANESAN B. ESHA RAFFIE 1 Associte Professor,
More informationMath 360: A primitive integral and elementary functions
Mth 360: A primitive integrl nd elementry functions D. DeTurck University of Pennsylvni October 16, 2017 D. DeTurck Mth 360 001 2017C: Integrl/functions 1 / 32 Setup for the integrl prtitions Definition:
More information63. Representation of functions as power series Consider a power series. ( 1) n x 2n for all 1 < x < 1
3 9. SEQUENCES AND SERIES 63. Representtion of functions s power series Consider power series x 2 + x 4 x 6 + x 8 + = ( ) n x 2n It is geometric series with q = x 2 nd therefore it converges for ll q =
More informationData Assimilation. Alan O Neill Data Assimilation Research Centre University of Reading
Dt Assimiltion Aln O Neill Dt Assimiltion Reserch Centre University of Reding Contents Motivtion Univrite sclr dt ssimiltion Multivrite vector dt ssimiltion Optiml Interpoltion BLUE 3dVritionl Method
More informationSTRAND B: NUMBER THEORY
Mthemtics SKE, Strnd B UNIT B Indices nd Fctors: Tet STRAND B: NUMBER THEORY B Indices nd Fctors Tet Contents Section B. Squres, Cubes, Squre Roots nd Cube Roots B. Inde Nottion B. Fctors B. Prime Fctors,
More informationWhere did dynamic programming come from?
Where did dynmic progrmming come from? String lgorithms Dvid Kuchk cs302 Spring 2012 Richrd ellmn On the irth of Dynmic Progrmming Sturt Dreyfus http://www.eng.tu.c.il/~mi/cd/ or50/15265463200250010048.pdf
More information31.2. Numerical Integration. Introduction. Prerequisites. Learning Outcomes
Numericl Integrtion 3. Introduction In this Section we will present some methods tht cn be used to pproximte integrls. Attention will be pid to how we ensure tht such pproximtions cn be gurnteed to be
More informationChapter 4 Contravariance, Covariance, and Spacetime Diagrams
Chpter 4 Contrvrince, Covrince, nd Spcetime Digrms 4. The Components of Vector in Skewed Coordintes We hve seen in Chpter 3; figure 3.9, tht in order to show inertil motion tht is consistent with the Lorentz
More informationx ) dx dx x sec x over the interval (, ).
Curve on 6 For , () Evlute the integrl, n (b) check your nswer by ifferentiting. ( ). ( ). ( ).. 6. sin cos 7. sec csccot 8. sec (sec tn ) 9. sin csc. Evlute the integrl sin by multiplying the numertor
More informationBest Approximation. Chapter The General Case
Chpter 4 Best Approximtion 4.1 The Generl Cse In the previous chpter, we hve seen how n interpolting polynomil cn be used s n pproximtion to given function. We now wnt to find the best pproximtion to given
More informationB Veitch. Calculus I Study Guide
Clculus I Stuy Guie This stuy guie is in no wy exhustive. As stte in clss, ny type of question from clss, quizzes, exms, n homeworks re fir gme. There s no informtion here bout the wor problems. 1. Some
More informationNormal Distribution. Lecture 6: More Binomial Distribution. Properties of the Unit Normal Distribution. Unit Normal Distribution
Norml Distribution Lecture 6: More Binomil Distribution If X is rndom vrible with norml distribution with men µ nd vrince σ 2, X N (µ, σ 2, then P(X = x = f (x = 1 e 1 (x µ 2 2 σ 2 σ Sttistics 104 Colin
More informationCHAPTER 9 BASIC CONCEPTS OF DIFFERENTIAL AND INTEGRAL CALCULUS
CHAPTER 9 BASIC CONCEPTS OF DIFFERENTIAL AND INTEGRAL CALCULUS BASIC CONCEPTS OF DIFFERENTIAL AND INTEGRAL CALCULUS LEARNING OBJECTIVES After stuying this chpter, you will be ble to: Unerstn the bsics
More informationSynoptic Meteorology I: Finite Differences September Partial Derivatives (or, Why Do We Care About Finite Differences?
Synoptic Meteorology I: Finite Differences 1618 September 2014 Prtil Derivtives (or, Why Do We Cre About Finite Differences?) With the exception of the idel gs lw, the equtions tht govern the evolution
More information5: The Definite Integral
5: The Definite Integrl 5.: Estimting with Finite Sums Consider moving oject its velocity (meters per second) t ny time (seconds) is given y v t = t+. Cn we use this informtion to determine the distnce
More informationPolynomial Approximations for the Natural Logarithm and Arctangent Functions. Math 230
Polynomil Approimtions for the Nturl Logrithm nd Arctngent Functions Mth 23 You recll from first semester clculus how one cn use the derivtive to find n eqution for the tngent line to function t given
More informationNumerical integration
2 Numericl integrtion This is pge i Printer: Opque this 2. Introduction Numericl integrtion is problem tht is prt of mny problems in the economics nd econometrics literture. The orgniztion of this chpter
More informationLecture Notes: Orthogonal Polynomials, Gaussian Quadrature, and Integral Equations
18330 Lecture Notes: Orthogonl Polynomils, Gussin Qudrture, nd Integrl Equtions Homer Reid My 1, 2014 In the previous set of notes we rrived t the definition of Chebyshev polynomils T n (x) vi the following
More informationLECTURE NOTE #12 PROF. ALAN YUILLE
LECTURE NOTE #12 PROF. ALAN YUILLE 1. Clustering, Kmens, nd EM Tsk: set of unlbeled dt D = {x 1,..., x n } Decompose into clsses w 1,..., w M where M is unknown. Lern clss models p(x w)) Discovery of
More informationNumbers and indices. 1.1 Fractions. GCSE C Example 1. Handy hint. Key point
GCSE C Emple 7 Work out 9 Give your nswer in its simplest form Numers n inies Reiprote mens invert or turn upsie own The reiprol of is 9 9 Mke sure you only invert the frtion you re iviing y 7 You multiply
More informationStrokeBased Performance Metrics for Handwritten Mathematical Expressions
2011 Interntionl Conference on ocument Anlysis n Recognition StrokeBse Performnce Metrics for Hnwritten Mthemticl Expressions Richr Znii rlz@cs.rit.eu Amit Pilly p2731@rit.eu eprtment of Computer Science
More informationMultiple Testing in a TwoStage Adaptive Design with Combination Tests Controlling FDR
Multiple Testing in TwoStge Adptive Design with Combintion Tests Controlling FDR Snt K. Srkr, Jingjing Chen nd Wenge Guo Mrch 13, 2012 Snt K. Srkr is Cyrus H. K. Curtis Professor, Deprtment of Sttistics,
More informationMATHS NOTES. SUBJECT: Maths LEVEL: Higher TEACHER: Aidan Roantree. The Institute of Education Topics Covered: Powers and Logs
MATHS NOTES The Institute of Eduction 06 SUBJECT: Mths LEVEL: Higher TEACHER: Aidn Rontree Topics Covered: Powers nd Logs About Aidn: Aidn is our senior Mths techer t the Institute, where he hs been teching
More informationAP Calculus BC Review Applications of Integration (Chapter 6) noting that one common instance of a force is weight
AP Clculus BC Review Applictions of Integrtion (Chpter Things to Know n Be Able to Do Fin the re between two curves by integrting with respect to x or y Fin volumes by pproximtions with cross sections:
More informationHow to simulate Turing machines by invertible onedimensional cellular automata
How to simulte Turing mchines by invertible onedimensionl cellulr utomt JenChristophe Dubcq Déprtement de Mthémtiques et d Informtique, École Normle Supérieure de Lyon, 46, llée d Itlie, 69364 Lyon Cedex
More informationApplied. Grade 9 Assessment of Mathematics. Released assessment Questions
Applie Gre 9 Assessment of Mthemtics 21 Relese ssessment Questions Recor your nswers to the multiplechoice questions on the Stuent Answer Sheet (21, Applie). Plese note: The formt of this booklet is ifferent
More informationECO 317 Economics of Uncertainty Fall Term 2007 Notes for lectures 4. Stochastic Dominance
Generl structure ECO 37 Economics of Uncertinty Fll Term 007 Notes for lectures 4. Stochstic Dominnce Here we suppose tht the consequences re welth mounts denoted by W, which cn tke on ny vlue between
More informationDan G. Cacuci. Department of Mechanical Engineering, University of South Carolina
SECONDORDER ADJOINT SENSITIVITY ANALYSIS PROCEDURE (SOASAP) FOR COMPUTING EXACTLY AND EFFICIENTLY FIRST AND SECONDORDER SENSITIVITIES IN LARGESCALE LINEAR SYSTEMS: II. ILLUSTRATIVE APPLICATION TO
More informationPrecalculus Spring 2017
Preclculus Spring 2017 Exm 3 Summry (Section 4.1 through 5.2, nd 9.4) Section P.5 Find domins of lgebric expressions Simplify rtionl expressions Add, subtrct, multiply, & divide rtionl expressions Simplify
More informationContinuous Random Variables Class 5, Jeremy Orloff and Jonathan Bloom
Lerning Gols Continuous Rndom Vriles Clss 5, 8.05 Jeremy Orloff nd Jonthn Bloom. Know the definition of continuous rndom vrile. 2. Know the definition of the proility density function (pdf) nd cumultive
More informationMA123, Chapter 10: Formulas for integrals: integrals, antiderivatives, and the Fundamental Theorem of Calculus (pp.
MA123, Chpter 1: Formuls for integrls: integrls, ntiderivtives, nd the Fundmentl Theorem of Clculus (pp. 27233, Gootmn) Chpter Gols: Assignments: Understnd the sttement of the Fundmentl Theorem of Clculus.
More informationIntuitionistic Fuzzy Lattices and Intuitionistic Fuzzy Boolean Algebras
Intuitionistic Fuzzy Lttices nd Intuitionistic Fuzzy oolen Algebrs.K. Tripthy #1, M.K. Stpthy *2 nd P.K.Choudhury ##3 # School of Computing Science nd Engineering VIT University Vellore632014, TN, Indi
More informationElectric Potential. Concepts and Principles. An Alternative Approach. A Gravitational Analogy
. Electric Potentil Concepts nd Principles An Alterntive Approch The electric field surrounding electric chrges nd the mgnetic field surrounding moving electric chrges cn both be conceptulized s informtion
More informationLine and Surface Integrals: An Intuitive Understanding
Line nd Surfce Integrls: An Intuitive Understnding Joseph Breen Introduction Multivrible clculus is ll bout bstrcting the ides of differentition nd integrtion from the fmilir single vrible cse to tht of
More informationArithmetic & Algebra. NCTM National Conference, 2017
NCTM Ntionl Conference, 2017 Arithmetic & Algebr Hether Dlls, UCLA Mthemtics & The Curtis Center Roger Howe, Yle Mthemtics & Texs A & M School of Eduction Relted Common Core Stndrds First instnce of vrible
More informationDistance And Velocity
Unit #8  The Integrl Some problems nd solutions selected or dpted from HughesHllett Clculus. Distnce And Velocity. The grph below shows the velocity, v, of n object (in meters/sec). Estimte the totl
More information1 From NFA to regular expression
Note 1: How to convert DFA/NFA to regulr expression Version: 1.0 S/EE 374, Fll 2017 Septemer 11, 2017 In this note, we show tht ny DFA cn e converted into regulr expression. Our construction would work
More informationLecture 3. Limits of Functions and Continuity
Lecture 3 Limits of Functions nd Continuity Audrey Terrs April 26, 21 1 Limits of Functions Notes I m skipping the lst section of Chpter 6 of Lng; the section bout open nd closed sets We cn probbly live
More informationProblem Set 9. Figure 1: Diagram. This picture is a rough sketch of the 4 parabolas that give us the area that we need to find. The equations are:
(x + y ) = y + (x + y ) = x + Problem Set 9 Discussion: Nov., Nov. 8, Nov. (on probbility nd binomil coefficients) The nme fter the problem is the designted writer of the solution of tht problem. (No one
More informationDiscrete Leastsquares Approximations
Discrete Lestsqures Approximtions Given set of dt points (x, y ), (x, y ),, (x m, y m ), norml nd useful prctice in mny pplictions in sttistics, engineering nd other pplied sciences is to construct curve
More informationUsing integration tables
Using integrtion tbles Integrtion tbles re inclue in most mth tetbooks, n vilble on the Internet. Using them is nother wy to evlute integrls. Sometimes the use is strightforwr; sometimes it tkes severl
More informationThe final exam will take place on Friday May 11th from 8am 11am in Evans room 60.
Mth 104: finl informtion The finl exm will tke plce on Fridy My 11th from 8m 11m in Evns room 60. The exm will cover ll prts of the course with equl weighting. It will cover Chpters 1 5, 7 15, 17 21, 23
More informationTHERMAL EXPANSION COEFFICIENT OF WATER FOR VOLUMETRIC CALIBRATION
XX IMEKO World Congress Metrology for Green Growth September 9,, Busn, Republic of Kore THERMAL EXPANSION COEFFICIENT OF WATER FOR OLUMETRIC CALIBRATION Nieves Medin Hed of Mss Division, CEM, Spin, mnmedin@mityc.es
More informationA. Limits  L Hopital s Rule ( ) How to find it: Try and find limits by traditional methods (plugging in). If you get 0 0 or!!, apply C.! 1 6 C.
A. Limits  L Hopitl s Rule Wht you re finding: L Hopitl s Rule is used to find limits of the form f ( x) lim where lim f x x! c g x ( ) = or lim f ( x) = limg( x) = ". ( ) x! c limg( x) = 0 x! c x! c
More information#6A&B Magnetic Field Mapping
#6A& Mgnetic Field Mpping Gol y performing this lb experiment, you will: 1. use mgnetic field mesurement technique bsed on Frdy s Lw (see the previous experiment),. study the mgnetic fields generted by
More informationFarey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University
U.U.D.M. Project Report 07:4 Frey Frctions Rickrd Fernström Exmensrete i mtemtik, 5 hp Hledre: Andres Strömergsson Exmintor: Jörgen Östensson Juni 07 Deprtment of Mthemtics Uppsl University Frey Frctions
More informationWeek 10: Riemann integral and its properties
Clculus nd Liner Algebr for Biomedicl Engineering Week 10: Riemnn integrl nd its properties H. Führ, Lehrstuhl A für Mthemtik, RWTH Achen, WS 07 Motivtion: Computing flow from flow rtes 1 We observe the
More informationThe International Association for the Properties of Water and Steam. Release on the Ionization Constant of H 2 O
IAPWS R7 The Interntionl Assocition for the Properties of Wter nd Stem Lucerne, Sitzerlnd August 7 Relese on the Ioniztion Constnt of H O 7 The Interntionl Assocition for the Properties of Wter nd Stem
More informationS. S. Dragomir. 1. Introduction. In [1], Guessab and Schmeisser have proved among others, the following companion of Ostrowski s inequality:
FACTA UNIVERSITATIS NIŠ) Ser Mth Inform 9 00) 6 SOME COMPANIONS OF OSTROWSKI S INEQUALITY FOR ABSOLUTELY CONTINUOUS FUNCTIONS AND APPLICATIONS S S Drgomir Dedicted to Prof G Mstroinni for his 65th birthdy
More informationMATH STUDENT BOOK. 10th Grade Unit 5
MATH STUDENT BOOK 10th Grde Unit 5 Unit 5 Similr Polygons MATH 1005 Similr Polygons INTRODUCTION 3 1. PRINCIPLES OF ALGEBRA 5 RATIOS AND PROPORTIONS 5 PROPERTIES OF PROPORTIONS 11 SELF TEST 1 16 2. SIMILARITY
More informationSPECIALIST MATHEMATICS
Victorin Certificte of Euction 07 SUPERVISOR TO ATTACH PROCESSING LABEL HERE Letter STUDENT NUMBER SPECIALIST MATHEMATICS Written emintion Thursy 8 June 07 Reing time:.00 pm to.5 pm (5 minutes) Writing
More information4. GREEDY ALGORITHMS I
4. GREEDY ALGORITHMS I coin chnging intervl scheduling scheduling to minimize lteness optiml cching Lecture slides by Kevin Wyne Copyright 2005 PersonAddison Wesley http://www.cs.princeton.edu/~wyne/kleinbergtrdos
More informationA Convergence Theorem for the Improper Riemann Integral of Banach Spacevalued Functions
Interntionl Journl of Mthemticl Anlysis Vol. 8, 2014, no. 50, 24512460 HIKARI Ltd, www.mhikri.com http://dx.doi.org/10.12988/ijm.2014.49294 A Convergence Theorem for the Improper Riemnn Integrl of Bnch
More informationPhysics 9 Fall 2011 Homework 2  Solutions Friday September 2, 2011
Physics 9 Fll 0 Homework  s Fridy September, 0 Mke sure your nme is on your homework, nd plese box your finl nswer. Becuse we will be giving prtil credit, be sure to ttempt ll the problems, even if you
More informationhttp:dx.doi.org1.21611qirt.1994.17 Infrred polriztion thermometry using n imging rdiometer by BALFOUR l. S. * *EORD, Technion Reserch & Development Foundtion Ltd, Hif 32, Isrel. Abstrct This pper describes
More information11.1 Exponential Functions
. Eponentil Functions In this chpter we wnt to look t specific type of function tht hs mny very useful pplictions, the eponentil function. Definition: Eponentil Function An eponentil function is function
More informationClassification: Rules. Prof. Pier Luca Lanzi Laurea in Ingegneria Informatica Politecnico di Milano Polo regionale di Como
Metodologie per Sistemi Intelligenti Clssifiction: Prof. Pier Luc Lnzi Lure in Ingegneri Informtic Politecnico di Milno Polo regionle di Como Rules Lecture outline Why rules? Wht re clssifiction rules?
More informationarxiv: v1 [math.rt] 1 Jul 2015
QUIVER GRASSMANNIANS OF TYPE D n PART 1: SCHUBERT SYSTEMS AND DECOMPOSITIONS INTO AFFINE SPACES rxiv:1507.00392v1 [mth.rt] 1 Jul 2015 OLIVER LORSCHEID AND THORSTEN WEIST ABSTRACT. Let Q be quiver of extene
More information3 Regular expressions
3 Regulr expressions Given n lphet Σ lnguge is set of words L Σ. So fr we were le to descrie lnguges either y using set theory (i.e. enumertion or comprehension) or y n utomton. In this section we shll
More informationMACsolutions of the nonexistent solutions of mathematical physics
Proceedings of the 4th WSEAS Interntionl Conference on Finite Differences  Finite Elements  Finite Volumes  Boundry Elements MACsolutions of the nonexistent solutions of mthemticl physics IGO NEYGEBAUE
More informationOrthogonal Polynomials and LeastSquares Approximations to Functions
Chpter Orthogonl Polynomils nd LestSqures Approximtions to Functions **4/5/3 ET. Discrete LestSqures Approximtions Given set of dt points (x,y ), (x,y ),..., (x m,y m ), norml nd useful prctice in mny
More informationCSCI FOUNDATIONS OF COMPUTER SCIENCE
1 CSCI 2200 FOUNDATIONS OF COMPUTER SCIENCE Spring 2015 My 7, 2015 2 Announcements Homework 9 is due now. Some finl exm review problems will be posted on the web site tody. These re prcqce problems not
More informationComparisonSorting and Selecting in Totally Monotone Matrices
Chpter 1 ComprisonSorting nd Selecting in Totlly Monotone Mtrices Nog Alon Yossi Azr Abstrct An m n mtrix A is clled totlly monotone if for ll i 1 < i 2 nd j 1 < j 2, A[i 1, j 1] > A[i 1, j 2] implies
More informationLecture 1. Functional series. Pointwise and uniform convergence.
1 Introduction. Lecture 1. Functionl series. Pointwise nd uniform convergence. In this course we study mongst other things Fourier series. The Fourier series for periodic function f(x) with period 2π is
More informationMore precisely, given the collection fx g, with Eucliden distnces between pirs (; b) of ptterns: = p (x? x b ) ; one hs to nd mp, ' : R n distnceerro
Improved Multidimensionl Scling Anlysis Using Neurl Networks with DistnceError Bckpropgtion Llus Grrido (), Sergio Gomez () nd Jume Roc () () Deprtment d'estructur i Constituents de l Mteri/IFAE Universitt
More informationStrategy: Use the Gibbs phase rule (Equation 5.3). How many components are present?
University Chemistry Quiz 4 2014/12/11 1. (5%) Wht is the dimensionlity of the threephse coexistence region in mixture of Al, Ni, nd Cu? Wht type of geometricl region dose this define? Strtegy: Use the
More informationProblem Set 3
14.102 Problem Set 3 Due Tuesdy, October 18, in clss 1. Lecture Notes Exercise 208: Find R b log(t)dt,where0
More informationdx dt dy = G(t, x, y), dt where the functions are defined on I Ω, and are locally Lipschitz w.r.t. variable (x, y) Ω.
Chpter 8 Stility theory We discuss properties of solutions of first order two dimensionl system, nd stility theory for specil clss of liner systems. We denote the independent vrile y t in plce of x, nd
More information1.3 Regular Expressions
56 1.3 Regulr xpressions These hve n importnt role in describing ptterns in serching for strings in mny pplictions (e.g. wk, grep, Perl,...) All regulr expressions of lphbet re 1.Ønd re regulr expressions,
More informationMapping the delta function and other Radon measures
Mpping the delt function nd other Rdon mesures Notes for Mth583A, Fll 2008 November 25, 2008 Rdon mesures Consider continuous function f on the rel line with sclr vlues. It is sid to hve bounded support
More informationUniversitaireWiskundeCompetitie. Problem 2005/4A We have k=1. Show that for every q Q satisfying 0 < q < 1, there exists a finite subset K N so that
Problemen/UWC NAW 5/7 nr juni 006 47 Problemen/UWC UniversitireWiskundeCompetitie Edition 005/4 For Session 005/4 we received submissions from Peter Vndendriessche, Vldislv Frnk, Arne Smeets, Jn vn de
More informationThe Trapezoidal Rule
_.qd // : PM Pge 9 SECTION. Numericl Integrtion 9 f Section. The re of the region cn e pproimted using four trpezoids. Figure. = f( ) f( ) n The re of the first trpezoid is f f n. Figure. = Numericl Integrtion
More information1 Error Analysis of Simple Rules for Numerical Integration
cs41: introduction to numericl nlysis 11/16/10 Lecture 19: Numericl Integrtion II Instructor: Professor Amos Ron Scries: Mrk Cowlishw, Nthnel Fillmore 1 Error Anlysis of Simple Rules for Numericl Integrtion
More informationHarmonic Mean Derivative  Based Closed Newton Cotes Quadrature
IOSR Journl of Mthemtics (IOSRJM) eissn:  pissn: 9X. Volume Issue Ver. IV (My.  Jun. 0) PP  www.iosrjournls.org Hrmonic Men Derivtive  Bsed Closed Newton Cotes Qudrture T. Rmchndrn D.Udykumr nd
More informationSection 6.1 Definite Integral
Section 6.1 Definite Integrl Suppose we wnt to find the re of region tht is not so nicely shped. For exmple, consider the function shown elow. The re elow the curve nd ove the x xis cnnot e determined
More information1 Nondeterministic Finite Automata
1 Nondeterministic Finite Automt Suppose in life, whenever you hd choice, you could try oth possiilities nd live your life. At the end, you would go ck nd choose the one tht worked out the est. Then you
More informationContextspecific infinite mixtures for clustering gene expression profiles across diverse microarray dataset
Gene Expression Contextspecific infinite mixtures for clustering gene expression profiles cross diverse microrry dtset Liu, X. 1,, Sivgnesn, S. 3, Yeung, K.Y. 4, Guo, J. 1, Bumgrner, R.E. 4, MedvedovicMrio
More information