SVM and a Novel POOL Method Coupled with THEMATICS for Protein Active Site Prediction


A DISSERTATION SUBMITTED TO THE COLLEGE OF COMPUTER AND INFORMATION SCIENCE OF NORTHEASTERN UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

By Wenxu Tong

April 2008

Wenxu Tong, 2008. ALL RIGHTS RESERVED

Acknowledgements

I have a lot of people to thank. I am most indebted to my advisor, Dr. Ron Williams. He generously allowed me to work on a problem with the idea he developed decades ago and guided me through the research to turn it from a mere idea into a useful system that solves important problems. Without his kindness, wisdom and persistence, this dissertation would not exist.

Another person I am fortunate to have met and worked with is Dr. Mary Jo Ondrechen, my co-advisor. It was she who developed the THEMATICS method, on which this dissertation builds. Her guidance and help during my research were critical to me.

I am grateful to all the committee members for reading and commenting on my dissertation. I am especially thankful to Dr. Jay Aslam, who provided a lot of advice for my research, and to Dr. Bob Futrelle, who brought me into the field and provided much help in the writing of this dissertation. I also thank Dr. Budil for the time he spent serving as my committee member, especially given the difficulties and inconvenience he unfortunately experienced during that time.

I am fortunate to have worked in the THEMATICS group with Dr. Leo Murga, Dr. Ying Wei and my fellow graduate student, soon to be Dr., Heather Brodkin. I also thank Dr. Jun Gong, Dr. Emine Yilmaz and, soon to be Dr., Virgiliu Pavlu; without their generous help, my journey through the tunnel towards my degree would have been much darker and harder.

I would not be what I am without the love and support of my parents, Yunkun Tong and Xazhu Wang. Thanks to their raising us and giving us the best education possible through all the hardship they endured, both my sister, Weny Tong, and I have received Ph.D. degrees, the highest degree one can expect. I am so grateful to them and proud of them.

Last but definitely not least, I thank Ying Yang, my beloved wife. Without her patience and confidence in me, I could not imagine having done what I have done, and I humbly dedicate this dissertation to her.

Table of Contents

Abstract
Introduction
  THEMATICS and protein active site prediction
  Machine Learning
Background and Related Work
  Protein Active Site Prediction
  Machine Learning
  Commonly used supervised learning methods
  Probability based approach
  Performance measure for classification problems
  THEMATICS
  The THEMATICS method and its features
  Statistical analysis with THEMATICS
  Challenges of the site prediction problem using THEMATICS data
Applying SVM to THEMATICS
  Introduction
  THEMATICS curve features used in the SVM
  Training
  Results
  Success in site prediction
  Success in catalytic residue prediction
  Incorporation of non-ionizable residues
  3.4.4 Comparison with other methods
  Discussion
  Cluster number and size
  Failure analysis
  Analysis of high filtration ratio cases
  Some specific examples
  Conclusions
  Next step
New Method: Partial Order Optimal Likelihood (POOL)
  Ways to estimate class probabilities
  Simple joint probability table look-up
  Naïve Bayes method
  The K-nearest-neighbor method
  POOL method
  Combining CPE's
  POOL method in detail
  Maximum likelihood problem with monotonicity assumption
  Convex optimization and K.K.T. conditions
  Finding Minimum Sum of Squared Error (SSE)
  POOL algorithm
  Proof that the POOL algorithm finds the minimum SSE
  Maximum likelihood vs. minimum SSE
  Additional computational steps
  Preprocessing
  Interpolation
5. Applying the POOL Method with THEMATICS in Protein Active Site Prediction
  Introduction
  THEMATICS curves and other features used in the POOL method
  Performance measurement
  Computational procedure
  Results
  Ionizable residues using only THEMATICS features
  Ionizable residues using THEMATICS plus cleft information
  All residues using THEMATICS plus cleft information
  All residues using THEMATICS, cleft information and sequence conservation, if applicable
  Recall-filtration ratio curves
  Comparison with other methods
  Rank of the first positive
  Discussion
Summary and Conclusions
  Contributions
  Future research
Appendices
  Appendix A. The training set used in THEMATICS-SVM
  Appendix B. The test set used in THEMATICS-SVM
  Appendix C. The 64 protein testing set used in THEMATICS-POOL
  Appendix D. The 160 protein testing set used in THEMATICS-POOL
Bibliography

List of Figures

Figure 2.1 Titration curves
Figure 3.1 The success rate for site prediction on a per-protein basis
Figure 3.2 Distribution of the 64 proteins across different values for the filtration ratio
Figure 3.3 Recall-false positive rate plot (ROC curves) of SVM versus other methods
Figure 3.4 SVM prediction for protein 1qfe
Figure 3.5 The SVM prediction for 2PLC
Figure 4.1 Three cases of G in relation to the convex cone of constraints
Figure 5.1 Averaged ROC curve comparing POOL(T4), Wei's statistical analysis and Tong's SVM using THEMATICS features
Figure 5.2 Averaged ROC curves comparing different methods of predicting ionizable active site residues using a combination of THEMATICS and geometric features of ionizable residues only
Figure 5.3 Averaged ROC curve comparing POOL methods applied to ionizable residues only CHAIN(TION, G) and to all residues CHAIN(TALL, G)
Figure 5.4 Averaged ROC curves comparing different methods of combining THEMATICS, geometric and sequence conservation features of all residues
Figure 5.5 Averaged RFR curve for CHAIN(T, G, C) on the 160 protein test set
Figure 5.6 ROC curves comparing CHAIN(T, G), CHAIN(T, G, C) and Petrova's method
Figure 5.7 Histogram of the first annotated active site residue

List of Tables

Table 2.1 Confusion matrix of classification labeling
Table 3.1 Performance of the SVM predictions alone versus the SVM regional predictions that include all residues within a 6Å sphere of each SVM-predicted residue
Table 3.2 Comparison of THEMATICS-SVM and other methods
Table 5.1 Wilcoxon signed-rank tests between methods shown in figure
Table 5.2 Wilcoxon signed-rank tests between methods shown in figure
Table 5.3 Comparison of sensitivity, precision, and AUC of CHAIN(T, G, C) with Youn's reported results for proteins in the same family, super family, and fold
Table 5.4 Comparison of CHAIN(T, G) and CHAIN(T, G, C) with Petrova's method
Table 5.5 Comparison of CHAIN(T, G) and CHAIN(T, G, C) with Xie's method

List of Abbreviations

ANN                 Artificial Neural Network
ASA                 Area of Solvent Accessibility
AUC                 Area Under the Curve
AveS                Averaged Specificity
CPE                 Class Probability Estimator
CSA                 Catalytic Site Atlas
E.C. number         Enzyme Commission number
H-H equation        Henderson-Hasselbalch equation
K.K.T. conditions   Karush-Kuhn-Tucker conditions
k-NN                k-nearest Neighbor method
MAP                 Maximum a posteriori
MCC                 Matthews Correlation Coefficient
MAS                 Mean Average Specificity
ML                  Maximum Likelihood
PDB                 Protein Data Bank
POOL                Partial Order Optimal Likelihood
RFR curve           Recall-Filtration Ratio curve
ROC curve           Receiver Operating Characteristic curve
SSE                 Sum of Squared Errors
SVM                 Support Vector Machine
THEMATICS           Theoretical Microscopic Titration Curves
VC dimension        Vapnik-Chervonenkis dimension

Abstract

Protein active site prediction is a very important problem in bioinformatics. THEMATICS is a simple and effective method based on the special electrostatic properties of ionizable residues to predict such sites from protein three-dimensional structure alone. The process involves distinguishing computed titration curves with perturbed shape from normal ones; the differences are subtle in many cases. In this dissertation, I develop and apply special machine learning techniques to automate the process and achieve higher sensitivity than results from other methods while maintaining high specificity. I first present the application of support vector machines (SVM) to automate active site prediction using THEMATICS; at the time this work was developed, it achieved better performance than any other 3D structure based method. I then present the more recently developed Partial Order Optimal Likelihood (POOL) method, which estimates the probabilities of residues being active under certain natural monotonicity assumptions. The dissertation shows that applying the POOL method just on THEMATICS features outperforms the SVM results. Furthermore, since the overall approach is based on estimating certain probabilities from labeled training data, it provides a principled way to combine the use of THEMATICS features with other non-electrostatic features proposed by others. In particular, I consider the use of geometric features as well, and the resulting classifiers are the best structure-only predictors yet found. Finally, I show that adding in sequence-based conservation scores where applicable yields a method that outperforms all existing methods while using only whatever combination of structure-based or sequence-based features is available.

Chapter 1 Introduction

This dissertation employs both standard and novel machine learning techniques to automate one aspect of the problem of protein function prediction from the three-dimensional structure. In addition to applying established techniques, particularly the support vector machine (SVM), I introduce a novel method, called partial order optimal likelihood (POOL), to perform the task of selection of functionally important sites in protein structures. In my approach to the protein function prediction problem, I start with just the 3D structure of proteins and use THEMATICS, one of the most effective methods, which focuses on electrostatic features of residues. Later, I also add some geometric features, and then the conservation of residues among homologous sequences, when available, into our system to achieve better results.

1.1 THEMATICS and protein active site prediction.

Function prediction (predicting protein function from protein structure) is an important and challenging task in genomics and proteomics [1]. More and more protein structures have been deposited to the PDB (Protein Data Bank) database, many with unknown functions. As of this writing, there are over 3600 protein structures in the PDB of unknown or uncertain function. The recent development of generating structures from proteins expressed from gene sequences using high throughput methods [2-6] only makes effective and efficient function prediction even more important, as most of these Structural Genomics proteins are of unknown function.

Determination of active sites, including enzyme catalytic sites, ligand binding sites, recognition epitopes, and other functionally important sites, is one of the keys to protein function prediction. In addition, the importance of site prediction goes beyond predicting active sites for proteins with unknown function. Even for a protein with known function, it is not necessarily true that the active site of that protein is fully or partially characterized. Correctly finding the active site of a protein is always a prerequisite to understanding the protein's catalytic mechanism. It also opens the door to the design of ligands to inhibit, activate, or otherwise modify the protein's function. Protein engineering applications to design a protein of particular functions [7-9] also require knowledge of the proper features needed to create a functioning active site.

Because of its importance in genomics and proteomics, many different methods have been developed to predict the active site of a protein. We will survey some of these methods in a later section. But among them, there is one particular method, namely THEMATICS (Theoretical Microscopic Titration Curves), which is powerful, accurate and precise [16]. Based on protein 3D structure alone, it can predict correct active sites that are highly localized in small regions of the proteins' structures. The details of the THEMATICS method will be given later, but the key point of this method is that it takes advantage of the special chemical and electrostatic properties of active site residues: since active site residues tend to have anomalous titration behavior, THEMATICS generates the titration curves of ionizable residues of a protein. In its original formulation the presence of two or more residues with perturbed titration curves in physical proximity is considered a reliable predictor of the active sites for proteins.

For the THEMATICS method to work well, one needs a criterion to distinguish the perturbed titration curves from normal, unperturbed ones, which is not a trivial task. My work uses machine-learning technology to automate this process. With SVM, I tried to solve this problem in the form of classification by predicting each residue as either an active site residue or not. Later, I developed the POOL method to solve this problem by rank-ordering the residues in a protein according to the probability of being in an active site, based on how perturbed the titration curves are in addition to some other 3D-structure-based information. Later still, sequence conservation information, if available, was added as well.

1.2 Machine Learning.

Machine learning is a well-developed field in computer science. There are many types of tasks, ranging from reinforcement learning [25] to more passive forms of learning, like supervised learning and unsupervised learning [26]. A typical supervised learning task is to create a function or classifier from a set of training data. If the output of the function is a label from some finite set of classes, it is called a classification problem. If the output is a continuous value, it is called a regression problem. A training set consists of a set of training examples, i.e. pairs of input-output vectors. The machine-learning problem is typically an optimization problem: generate a function that gives an output for any valid input and that generalizes from the seen training data in a reasonable way. The quantity that gets optimized is typically the generalization error, which is the error a trained machine will make on unseen data drawn from the same distribution as the population.

For an unsupervised learning task, there is no labeled training data. Usually, the task in unsupervised learning is to cluster the observed data according to some criteria, or to fit a model to represent the observed data. I will focus on supervised learning; here my learning task is essentially a classification task. As will be described below, in one part of this work the goal will be to estimate actual class probabilities, which in some respects is like a regression problem.

Chapter 2 Background and Related Work

2.1 Protein Active Site Prediction.

Since the main focus of this dissertation is in Computer Science, I will just briefly survey some of the methods used for this application, to serve as a background for the method comparison later in my dissertation.

There are two major classes of methodology used to predict protein active sites. Almost all current methods in active site prediction use one of them or a mix of both approaches.

The first methodology is based on sequence comparison, or evolutionary information derived from sequence alignments. The rationale is that active sites of a protein are important regions in the proteins, and that the amino acids, termed residues, in active sites therefore should be more conserved throughout evolution than some other regions of the protein. If we can find highly conserved regions among sequences in similar proteins from different sources (species/tissues), or even in different proteins but with similar functions, most likely active sites should consist of subsets within these regions. This is a valid assumption and indeed, many methods have been developed based on this approach, such as ConSurf [27], Rate4Site [28], and others. However, there are two drawbacks to this approach.

First, in order to use this method, there have to be at least 10, and preferably 50, different protein sequences with certain degrees of similarity in order to get reliable results. The method does not work well if the similarities between sequences are either too high or too low. There are studies showing that sequence-based methods can reliably transfer the extracted functional information only when applied to sequences with at least 40% sequence identity [33, 34]. This drawback makes the method unsuitable for many proteins, in particular Structural Genomics proteins, since they often do not have enough similar sequences with a suitable range of similarities.

Second, although most active site residues tend to be conserved through evolution, it is certainly not true that all conserved regions of a protein are active sites. Residues in protein sequences can be conserved for a variety of reasons, not just because of involvement in active sites. One well-known counterexample is the set of residues that stabilize the structure of the protein; they are so important to the protein that once mutated, the protein will not have the proper structure to perform its function. These residues will be conserved among different protein homologues, even if they are not active sites. Therefore sites predicted by sequence-based methods are typically non-local, spanning a much larger area than the true active site. Another difficulty arises for cases where an active site region in a protein is less conserved than other regions of the protein, especially when the function and/or substrate of the proteins in the class are somewhat versatile.

The second methodology is structure-based active site prediction. There are different properties that have been studied and used in different methods, such as electrostatic properties as in THEMATICS [16], residue interaction as in the graph theoretic method SARIG [35], van der Waals binding energy of a probe molecule as in Q-site Finder [20], geometric cleft location as in SURFNET [36] and CASTp [37], and a geometric shape descriptor termed geometric potential [38].

There are also studies that combine results from different methods, employing either statistical or machine-learning techniques. Among all such studies, I list a few examples that use either residue properties or machine learning methods similar to those in my earlier and current studies. P-cats [21] uses a k-nearest neighbor method to smooth the joint probability lookup table; a study by Gutteridge uses a neural network and spatial clustering to predict the location of active sites [18]; Petrova's work uses a support vector machine (SVM) to predict catalytic residues [39]; and Youn's work uses a support vector machine to predict catalytic residues in proteins [40]. All these methods use both sequence conservation and 3D structural information. Depending on the properties that these methods are based on, the computational cost and accuracy vary.

Among all these methods, THEMATICS is the most accurate to date. The computational cost is acceptable. To analyze a typical protein using THEMATICS takes less than an hour on a desktop PC, although actual CPU times depend on protein size. The details of this method will be explained in a later section. Although THEMATICS is the most effective and accurate method among these when used on its own, it is natural to consider whether the predictions can be improved by using additional information. I examine this using both geometric and conservation information, and find that this is indeed the case.

2.2 Machine Learning.

Machine learning is a very broad subfield of artificial intelligence. It is almost impossible to survey the whole area in this dissertation. Here, I focus on just supervised learning, mostly classification. Even this area is still too broad, and I will briefly introduce the framework and some of the most commonly used methods with the basic principles.

2.2.1 Commonly used supervised learning methods.

The first method is called artificial neural network (ANN) or just neural network (NN) [26, 41]. It is based on a computational model of a group of interconnected nodes (neurons) akin to the nervous system in humans. Each neuron has a certain number of inputs and typically one output. The input to a neuron can be either the features of input data, or the outputs of other neurons. The output of one neuron can serve as input to multiple neurons. Typically, there is a weight associated with each input of each neuron. At processing time (classifying a query instance), each neuron in the network takes its inputs, computes their weighted sum using the associated weights, and generates its output by applying some nonlinear function f. There are different flavors of ANN structure, such as feedforward versus recurrent networks. During training time, a cost function is defined to estimate the accuracy of the ANN with respect to the data, or essentially to measure how much error the current ANN makes on the training data. The learning process is to find the optimal setting of the structure and/or weights of the ANN to minimize the cost function on the training data. ANN in general is a very powerful method, and it has been used in numerous applications including protein active site prediction [18]. One drawback is that ANN is a somewhat black-box method, meaning that although one may find a very good classifier, the structure of the network and the weights associated with each input may not reveal much useful information on why it works.

Another popular and intuitively appealing method is the nearest neighbor method, or its more general form, the k-nearest-neighbor method (k-NN) [42]. The principal idea is to classify the query instance based on its k nearest neighbors among the training set, which represent the most similar training instances. The success of this method relies on several choices; k and the distance function that defines what "similar" means are among the most important ones. The training or learning process of this method is somewhat different from most other machine learning techniques. In most cases, instead of solving an optimization problem, it uses cross validation directly to select the best k and best distance function. However, this method and the naïve Bayes method, which will be discussed later, are both susceptible to the presence of correlated features. The method is also susceptible to the presence of noisy and irrelevant features.

The support vector machine (SVM) [43, 44] is a relatively newly developed method. A one-sentence description of this method is that it uses the kernel trick to find a best linear separator (hyperplane) in kernel space to separate instances which are not linearly separable in feature space. There are two major advantages of SVM. First, unlike ANN or some other classification techniques, the linear separator SVM finds is not only one that successfully separates the two classes (in the hard margin case) or that makes the fewest errors (in the soft margin case), but is the one that is the best among all of those separators.
The reason that it finds the best among all possible good separators is that it maximizes the margin, which measures how far this separator can move without making more mistakes on the training data, and there is a rigorous proof that a classifier giving the maximum margin tends to make the fewest errors on the testing data. Another advantage is that the kernel trick maps instances in the original feature space to instances in the kernel space. In certain cases instances that are linearly inseparable in feature space become linearly separable in kernel space, the kernel transform is easy to compute, and the explicit form of the mapping function between instances is not required to be known. To successfully use this method, one needs to select the right kernel. There are commonly used kernels, but taking full advantage of using and developing kernels is not a trivial task. I have applied this method to protein active site prediction with success [45].

Last, I will mention boosting, another method developed quite recently which has achieved a lot of success [46, 47]. Boosting is a meta-learning algorithm. Boosting occurs in stages, by incrementally adding to the current learned function. At every stage, a weak learner (i.e., one that has accuracy greater than chance) is trained with the data. The output of the weak learner is then added to the learned function, with some strength (proportional to how accurate the weak learner is). Then, the data is re-weighted: examples that the current learned function gets wrong are "boosted" in importance, so that future weak learners will attempt to fix the errors. If every weak learner is guaranteed to perform better than random guessing, the boosting method can very quickly find a learned function whose error on the training data falls below any pre-set threshold. It is a very powerful method to combine different learners into one super system. There is also a well-developed mathematical theory showing that boosting can lower generalization error.

2.2.2 Probability based approach.

A lot of machine learning work overlaps with statistics, where probability is used to classify instances directly. The first idea introduced here is Bayesian inference, which is based on Bayes' theorem [26]. Bayes' theorem may be written as:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

The probability of an event A conditional on another event B is generally different from the probability of B conditional on A. However, there is a definite relationship between the two, and Bayes' theorem is the statement of that relationship. $P(A)$ and $P(B)$ are called prior probabilities, and $P(A \mid B)$ is called the conditional probability of A given B.

Bayes' theorem is important to a number of applications, including several different places in the present problem. One way to formulate our problem is to generate the hypothesis H that gives the probability that a residue with certain features is in the active site, based on the observed data (D), or training examples, i.e., $P(H \mid D)$. Take H here as a look-up table for the probability of being positive at different feature values x. To use such an H at query time, just go to the entry that has the same reading as x and read the corresponding probability. Usually the number of ways to fill out the look-up table H is infinite, so how should one choose? One simple answer is to choose the H that gives the largest $P(D \mid H)$; this is called the maximum likelihood (ML) hypothesis. Taking $P(H)$, the prior probability of H, into consideration, one could pick the H that gives the largest $P(D \mid H)\,P(H)$; this gives the maximum a posteriori (MAP) hypothesis. Notice that ML is equivalent to MAP with a flat prior, that is, when the prior probabilities of all hypotheses under consideration are the same. The POOL method I introduce below for active site prediction is MAP with all hypotheses satisfying the monotonicity constraints having a flat prior, while all other hypotheses have a prior probability of 0.

Both the ML and MAP methods pick out the most favorable hypothesis H out of all possible ones based on the data, and use that to predict the most probable class of new query data. That is, given query data q, the probability that q is in class c is determined by $P(C = c \mid q, H)$, and the objective is to find the particular c that gives the largest $P(C = c \mid q, H)$. There is another way to do prediction at query time, namely to consider the prediction from all possible hypotheses and sum the results with the probability of each hypothesis as weights:

$$P(C = c \mid q, D) = \sum_{H} P(C = c \mid q, H, D)\,P(H \mid D)$$

This is called the Bayes classifier, which gives an optimal result, but is often difficult to compute in practice.

2.2.3 Performance measure for classification problems.

In the context of binary classification tasks, the terms true positives, true negatives, false positives and false negatives are used to compare the given classification of an item (the class label assigned to the item by a classifier) with the desired correct classification (the class the item actually belongs to). This is illustrated by the confusion matrix below:

                                 Predicted classification
                                 Positive               Negative
Actual            Positive       True Positive (TP)     False Negative (FN)
classification    Negative       False Positive (FP)    True Negative (TN)

Table 2.1. Confusion matrix of classification labeling.
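The four counts of a confusion matrix such as Table 2.1 are usually condensed into a handful of summary measures. The following is a minimal sketch in Python of how the standard ones (recall, specificity, precision, false positive rate, accuracy, MCC) and the filtration ratio used in this dissertation, i.e. the fraction of all instances predicted positive, could be computed. The class name `ConfusionMatrix` and its method names are illustrative assumptions, not code from the dissertation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts of a binary confusion matrix (hypothetical helper class)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def recall(self) -> float:            # also called sensitivity
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.fp + self.tn)

    def precision(self) -> float:         # positive predictive value
        return self.tp / (self.tp + self.fp)

    def false_positive_rate(self) -> float:   # equals 1 - specificity
        return self.fp / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def filtration_ratio(self) -> float:
        # Fraction of instances predicted positive; needs no true labels.
        return (self.tp + self.fp) / self.total

    def mcc(self) -> float:
        denom = math.sqrt((self.tp + self.fp) * (self.tp + self.fn)
                          * (self.tn + self.fp) * (self.tn + self.fn))
        if denom == 0:
            denom = 1.0   # convention: a zero denominator is set to one
        return (self.tp * self.tn - self.fp * self.fn) / denom


# Made-up counts for illustration: 10 true positives among 100 instances.
cm = ConfusionMatrix(tp=8, fp=12, fn=2, tn=78)
print(cm.recall(), cm.precision(), cm.filtration_ratio(), cm.mcc())
```

Note that `filtration_ratio` depends only on how many instances were predicted positive, so it can be evaluated even when the true labels are unknown or incomplete.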

In the confusion matrix above, TP, FP, FN and TN are the numbers of true positive, false positive, false negative and true negative instances respectively. Although all the information about the classifier's performance is included in the confusion matrix, people tend to use other measurements derived from it to compare the performance of classifiers. We list some commonly used ones:

$$\text{recall} = \text{sensitivity} = \frac{TP}{TP + FN}$$

$$\text{specificity} = \frac{TN}{FP + TN}$$

$$\text{precision} = \text{positive predictive value} = \frac{TP}{TP + FP}$$

$$\text{negative predictive value} = \frac{TN}{TN + FN}$$

$$\text{false positive rate} = 1 - \text{specificity} = \frac{FP}{TN + FP}$$

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{error} = 1 - \text{accuracy} = \frac{FP + FN}{TP + TN + FP + FN}$$

$$\text{filtration ratio} = \frac{TP + FP}{TP + TN + FP + FN}$$

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

Most of the measurements above are very straightforward and can be treated as mere definitions with no need for further explanation, with the exception of the filtration ratio and the Matthews correlation coefficient (MCC).

I invented the term filtration ratio, to be used in place of precision and false positive rate in the present problem. One of the advantages of the filtration ratio is that it is the only measurement listed above that can be determined with information from the predicted classification alone, without the requirement of knowing the actual classification, which is always unknown in practice. Another reason I introduce and use the filtration ratio is that in the present problem, information available in the literature about actual positives is incomplete, even in our training and testing datasets. Thus a certain fraction of the nominal false positives probably are not false. In situations like ours where one expects true positives to represent a fairly small proportion of all the instances, and the measured false positive rate is suspect, using the filtration ratio is more appropriate than some other measurements, such as precision. Both of these issues will be discussed in more detail later in the dissertation.

On the other hand, MCC is widely used in machine learning as a measure of the quality of binary classifications. It takes into account both sensitivity and selectivity and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes. It returns a value between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 an average random prediction and -1 the worst possible prediction. While there is no perfect way of describing the confusion matrix of true and false positives and negatives by a single number, the MCC is generally regarded as being one of the best such measures. In the denominator of MCC, if any of the four sums is zero, the denominator can be arbitrarily set to one; this results in an MCC of zero, which can be shown to be the correct limiting value.

2.3 THEMATICS

I am going to discuss the THEMATICS method in more detail because this is the basis for most of the input data in the work of this dissertation.

2.3.1 The THEMATICS method and its features.

In the application of THEMATICS, one begins with the 3D structure of the query protein, solves the Poisson-Boltzmann (P-B) equations using well-established methods [49-52], and then performs a Monte Carlo procedure to compute the proton occupations of each ionizable amino acid as a function of the pH. Each such function is called a titration curve, as shown in figure 2.1 (a). From the theoretical titration curves computed from the 3D structure of a query protein, THEMATICS identifies residues (amino acids) that exhibit significant deviation from Henderson-Hasselbalch (H-H) behavior, which I now describe.

A typical ionizable residue in a protein obeys the H-H equation, which may be expressed as a proton occupation O as a function of pH as:

$$O(\mathrm{pH}) = \left(10^{\mathrm{pH} - \mathrm{p}K_a} + 1\right)^{-1} \qquad (1)$$

For the residues that form a cation upon protonation (Arg, His, Lys, and the N-terminus), the mean net charge C on a particular residue is equal to O, whereas for the residues that form an anion upon deprotonation (Asp, Cys, Glu, Tyr, and the C-terminus), the mean net charge is given by (O - 1):

$$C(\mathrm{pH}) = O(\mathrm{pH}) \quad \text{(cationic)} \qquad (2)$$

$$C(\mathrm{pH}) = O(\mathrm{pH}) - 1 \quad \text{(anionic)} \qquad (3)$$

Note that C represents the average net charge on a particular residue for a large ensemble of protein molecules. Equations (1) - (3) have the sigmoid shape that is typical of a weak acid or base that obeys the H-H equation. Thus, as pH increases, the predicted average charge falls sharply in a pH range close to the $\mathrm{p}K_a$, which is defined as the pH at which that residue is protonated in exactly half of the protein molecules in the ensemble. Underlying the THEMATICS approach is the observation that the computed titration curves tend to deviate more from this H-H shape for ionizable residues belonging to active sites than for ionizable residues not belonging to such sites. The key step in the application of the THEMATICS approach is thus recognizing significant deviation from H-H behavior in the shape of these predicted titration curves.
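As a concrete illustration of equations (1) - (3), the sketch below evaluates the ideal H-H proton occupation and the corresponding mean net charge in Python. The function names, the residue labels, and the pKa value of 4.0 used in the example are assumptions made for illustration only; a real THEMATICS curve comes from the electrostatics calculation described above, not from this closed form.

```python
# Residues that carry +1 when protonated vs. -1 when deprotonated,
# following the grouping given in the text (labels are hypothetical).
CATIONIC = {"ARG", "HIS", "LYS", "N-TERM"}
ANIONIC = {"ASP", "CYS", "GLU", "TYR", "C-TERM"}


def proton_occupation(ph: float, pka: float) -> float:
    """Equation (1): ideal Henderson-Hasselbalch proton occupation."""
    return 1.0 / (10.0 ** (ph - pka) + 1.0)


def mean_net_charge(ph: float, pka: float, residue: str) -> float:
    """Equations (2) and (3): mean net charge of an ideal residue."""
    occupation = proton_occupation(ph, pka)
    if residue in CATIONIC:
        return occupation          # O for cation-forming residues
    if residue in ANIONIC:
        return occupation - 1.0    # O - 1 for anion-forming residues
    raise ValueError(f"not an ionizable residue label: {residue}")


# Illustrative pKa of 4.0 (an assumed value, not taken from the text).
for ph in (2.0, 4.0, 6.0):
    print(ph, proton_occupation(ph, 4.0), mean_net_charge(ph, 4.0, "GLU"))
```

At pH equal to the pKa the occupation is exactly one half, and it falls from nearly 1 to nearly 0 within about two pH units on either side, which is the sigmoid shape the text refers to.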

When THEMATICS was first developed, visual inspection of the computed curves was used to identify THEMATICS positive residues. Although simple, it is inefficient, vulnerable to bias, and in some cases ineffective, since some deviations of the curves are subtle and not easily recognized visually, as indicated by figure 2.1(b).

Figure 2.1. Titration curves. (a) A standard H-H curve (black), a typical perturbed curve (red), and a typical unperturbed curve from residues not in the active site (blue). (b) Titration curves from active site residues (red) versus non-active-site residues (green) from a set of 20 proteins (Appendix A); only glutamate residues are shown.

Before introducing the methods for automating the classification of residues using THEMATICS, I will first present the feature extraction process. In order to perform any machine-learning or statistical analysis on titration curves, one needs to find features that are easy to compute and that effectively distinguish positive from negative instances. I have defined features that may be used to measure the deviation of a particular titration curve from H-H behavior. In particular, four features extracted from the titration curves are most useful in separating THEMATICS positive residues from the others. These four features are based on the first four moments of the derivatives of the titration curves, as I now describe briefly. A more detailed description can be found in Ko's study [24].

Define the variable $x$ to be the offset of the pH from the $\mathrm{p}K_a$:

$x = \mathrm{pH} - \mathrm{p}K_a$ (4)

Then equation (1) for Henderson-Hasselbalch titration curves becomes:

$O(x) = (10^{x} + 1)^{-1}$ (5)

The key observation on which the moment analysis is based is that, for any titration curve $O(x)$, whether of Henderson-Hasselbalch form or not, the corresponding derivative

$f(\mathrm{pH}) = -\,dO/dx = -\,dO/d(\mathrm{pH})$ (6)

is effectively a probability density function (ignoring those rare cases when the titration curve fails to be a non-increasing function of $x$, in which case this derivative function takes on negative values) [53]. The $n$th moment of $f$ is defined as

$M_n = \int_{-\infty}^{+\infty} (\mathrm{pH})^{n} f(\mathrm{pH})\, d(\mathrm{pH})$ (7)

and the corresponding $n$th central moment $\mu_n$ is

$\mu_n = \int_{-\infty}^{+\infty} (\mathrm{pH} - M_1)^{n} f(\mathrm{pH})\, d(\mathrm{pH})$ (8)

where these integrals are over all space ($-\infty$ to $+\infty$). The features I use are based on the first moment $M_1$ and the second, third, and fourth central moments $\mu_2$, $\mu_3$, and $\mu_4$, respectively, of the derivatives $f$. For a pure H-H titration curve these moments are

$M_1 = \mathrm{p}K_a,\quad \mu_2 = 0.620,\quad \mu_3 = 0,\quad \mu_4 \approx 1.62$ (9)

It is interesting to note that, for an arbitrary probability density function, $M_1$ is its mean and $\mu_2$ is its variance, while $\mu_3$ and $\mu_4$ are related to the skewness and kurtosis, respectively, standard quantities used in statistics to measure departure from normality. When applied to a general titration curve of an ionizable residue in a protein, the $\mathrm{p}K_a$ shift is closely related to how much $M_1$ differs from the free-solution $\mathrm{p}K_a$. Those residues that interact strongly with other ionizing residues in such a way that the predicted titration functions $O(\mathrm{pH})$ are elongated will have broader first derivative functions $f$ and thus generally higher values for $\mu_2$ and especially $\mu_4$. The moment $\mu_3$ measures the asymmetry of the function $f$ and has a nonzero value for any residue that interacts with other ionizing groups in such a way that the strength of this interaction in the range $\mathrm{pH} < \mathrm{p}K_a$ is different from that in the range $\mathrm{pH} > \mathrm{p}K_a$. Thus the first moment and the second, third, and fourth central moments are useful measures for determining deviation from H-H behavior. The methods introduced below all use some of the four features described above, with additional features in some specific methods.

Statistical analysis with THEMATICS.

One automated analysis was proposed and studied by Ko [24]. Ko introduced simple statistical metrics to automatically evaluate the degree of perturbation of a titration curve from H-H behavior. The method is simple, looking at just two of the above features, namely $\mu_3$ and $\mu_4$. A statistical Z-score was computed on these features; i.e., for every curve analyzed, the deviation of the $\mu_3$ and $\mu_4$ values from the mean of all curves from the same protein, expressed in units of the standard deviation, was computed. Any residue whose titration curve had either the absolute value of the Z-score of $\mu_3$ or the Z-score of $\mu_4$ greater than 1.0 was classified as a THEMATICS positive residue, and any such residue with at least one other THEMATICS positive residue located within 9 Å was reported as an active site candidate. Good results were obtained for the identification of active sites in a set of 44 proteins with Ko's method. Although this method has excellent recall of catalytic sites, identifying the correct catalytic site for 90% of enzymes, the recall rate is lower (about 50%) for the identification of catalytic residues. It is desirable to improve the catalytic residue recall rate and also to expand the method to include predictions of non-ionizable residues.

Wei studied Ko's analysis and modified it [54], introducing a new fractional parameter, $\alpha$, which typically runs between 0.95 and 1.0. In this method, the mean and the standard deviation of $\mu_3$ and $\mu_4$ are calculated from the titration curves of a portion of the residues in a protein, in contrast to the whole population in Ko's method. The residues excluded from the sample are those whose titration curves have $\mu_3$ and $\mu_4$ values in the highest $(1-\alpha)$ fraction. This modification did yield better recall of annotated catalytic residues than Ko's analysis, but the optimal $\alpha$ is different for different proteins and was finally fixed at 0.99, the value that yields the best overall performance when averaged over the set of annotated proteins. The purpose of $\alpha$ is to keep residues whose titration curves have the most extreme $\mu_3$ and $\mu_4$ values from influencing the mean and standard deviation of the population too much, thus yielding better statistics and slightly more reliable predictions.

Meanwhile, Yang developed another rule-based statistical analysis identifying THEMATICS positive residues [55].
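The Z-score rule and the trimmed variant described above can be sketched as follows. This is a minimal illustration, not the published implementation. Several details are assumptions made only for the example: the trimming is applied per feature by ranking on absolute value, the population standard deviation is used, each residue is represented by a single hypothetical coordinate, and the function names are invented here.

```python
import math
from statistics import mean, pstdev


def z_scores(values, alpha=1.0):
    """Z-score every value. The mean and standard deviation are estimated
    from the alpha fraction of values smallest in magnitude (assumed reading
    of the trimming rule); alpha=1.0 uses the whole population."""
    keep = max(2, round(alpha * len(values)))
    kept = sorted(values, key=abs)[:keep]
    m, s = mean(kept), pstdev(kept)  # s must be nonzero for the division below
    return [(v - m) / s for v in values]


def thematics_positive(mu3, mu4, alpha=1.0, cutoff=1.0):
    """Flag a residue when |Z(mu3)| or Z(mu4) exceeds the cutoff."""
    z3, z4 = z_scores(mu3, alpha), z_scores(mu4, alpha)
    return [abs(a) > cutoff or b > cutoff for a, b in zip(z3, z4)]


def site_candidates(positive, coords, radius=9.0):
    """Keep flagged residues that have another flagged residue within radius."""
    idx = [i for i, p in enumerate(positive) if p]
    return [i for i in idx
            if any(j != i and math.dist(coords[i], coords[j]) <= radius
                   for j in idx)]


# Toy data for ten residues (made up for illustration only).
mu3 = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 5.0, 0.1]
mu4 = [1.6, 1.7, 1.6, 1.7, 1.6, 1.7, 1.6, 1.7, 1.6, 6.0]
coords = [(20.0 * i, 0.0, 0.0) for i in range(8)] + [(0.0, 5.0, 0.0), (0.0, 5.0, 8.0)]

positive = thematics_positive(mu3, mu4)
candidates = site_candidates(positive, coords)
```

In this toy set only the last two residues are flagged, and they are reported because they lie 8 Å apart. Trimming with `alpha=0.9` removes the extreme value from the statistics, which makes the outlier's Z-score much larger and, as a side effect, pushes some ordinary residues past the cutoff.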
In addition to the four features described earlier, Yang's method uses an additional feature, a value called the buffer range R, which measures the width of the pH range over which the residue is partially ionized. Also, outliers are selected within each amino acid type, when possible, instead of within the entire set of ionizable residues. The performance of this method is a little better than that of Ko's method.

The three statistics-based analyses listed above all employ handcrafted cutoff values to differentiate the positive from the negative instances. The study described in this dissertation begins with the hypothesis that a machine learning method can utilize similar sets of features, define a threshold in a systematic way, and achieve better performance in practice.

Challenges of the site prediction problem using THEMATICS data.

One of the challenges of the task of classifying THEMATICS results arises from special characteristics of the training data set. First, the vast majority of the residues in the training data are negative examples. Literature-confirmed active site residues typically constitute less than 3% of total residues. At the same time, the negative examples, which comprise most of the data in the training set, share some common property, while the relatively few positive examples behave abnormally in varied ways. This is one of the key reasons that a simple outlier detection process like Ko's analysis is quite successful in solving this problem. But it is not clear how this method can incorporate additional non-THEMATICS features to possibly improve the active-site prediction.

Secondly, the nature of this problem limits the quality of the training data. The ultimate goal of the project is to predict the active sites of proteins using THEMATICS data; however, the absolute criterion for labeling a residue in a protein as active is that someone has done the experiment in the lab and published a result supporting the claim. There are databases collecting such annotations, and by no means are these annotations complete. There is another subtlety: although THEMATICS positive residues have been shown to be a very reliable indicator of active sites, THEMATICS sometimes predicts additional nearby residues that are not annotated as active, including second shell residues. There is some evidence to support the hypothesis that these second shell residues may be important. Alternatively, they may be affected by the special electrostatic field created by the nearby active site residues. The THEMATICS positive residues in the second case may not be shown experimentally to be active site residues. Because residue activity is often measured in a kinetics experiment and a number of factors can sometimes cause large errors in these experiments, the training set inevitably contains some positive instances that are misclassified in the first place, or some instances that cannot be correctly distinguished by the model. In particular, there are most probably instances of true positives that are improperly annotated as negatives, simply because no experiments have been tried on the vast majority of residues.

In order to overcome these two obstacles, in my earlier work with neural network machine learning and the SVM method, I cleaned the training set. Instead of using just literature-confirmed positive instances, I also labeled as positive the apparent THEMATICS positives located near a known active site, although they had not been experimentally identified as active site residues. I also removed some of the isolated THEMATICS positive instances from the training set. Although this data cleaning did improve the results, it is ad hoc and lacks a systematic justification.

For any machine learning problem, if there is some prior belief, or bias, which turns out to be true, applying it should always help the performance. After studying THEMATICS and its application in protein active site prediction, it would be fair to adopt the following THEMATICS principles as prior belief:

THEMATICS Principle 1: The more perturbed the titration curve is (relative to other titration curves in the same protein), the greater the probability that the residue is in the active site.

THEMATICS Principle 2: The more perturbed the neighboring titration curves are (relative to other titration curves in the same protein), the greater the probability that the residue is in the active site.

The ad hoc method used before implicitly cleaned the data based on the THEMATICS principles. In addition to THEMATICS features, to which we can apply the THEMATICS principles, there are some non-THEMATICS features having either positive or negative correlation with the probability that a residue is located in the active site. Those features may not be reliable indicators by themselves, but combined with THEMATICS methods, they may improve the overall prediction accuracy. While there may be ways to enforce inductive bias in classifiers like neural networks and SVMs, I believe the most straightforward approach is instead to try to estimate P(class | attributes) nonparametrically, while enforcing these principles as constraints, as explained in Chapters 4 and 5.

Chapter 3 Applying SVM to THEMATICS

3.1 Introduction.

As discussed in earlier chapters, THEMATICS is a technique for the prediction of local interaction sites in a protein from its three-dimensional structure alone. Various approaches have been taken to automate and standardize the process, with varying sensitivity and specificity. Here, I will present my first work on this project, using a support vector machine with four features extracted from THEMATICS alone to predict the active sites of enzymes. In this chapter it is shown how support vector machines (SVMs) may be combined with THEMATICS to achieve a substantially higher recall rate for catalytic residues with only a small sacrifice in specificity when compared to Ko's statistical analysis of THEMATICS [24]. It is argued that clusters predicted by THEMATICS-SVM are small, local networks of ionizable residues with strong coupling between the protonation events; these characteristics appear to be very common, perhaps nearly universal, in enzyme active sites. Performance of THEMATICS-SVM in active site prediction is compared with other 3D-structure-based methods, including THEMATICS combined with previous analyses, and is shown to return equal or better recall with generally higher specificity and lower filtration ratio. The high specificity and low filtration ratio translate to better quality, more localized predictions. This work builds on the prior work of Ko, using variants of some of the same features that were found to be successful there, plus some additional features. Results of our method are presented for 60 different proteins. In this chapter, I also present a way to extend the method's capabilities to the prediction of non-ionizable residues.

3.2 THEMATICS curve features used in the SVM

To use an SVM to classify residues as either likely or not likely to be in the active site, I represent the computed titration curves as points in a four-dimensional space. These four features are based on the first four moments of these curves, as described earlier in the discussion of feature extraction.
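As a numerical check on the moment features, the moments of a pure H-H curve can be computed directly. The sketch below is illustrative only. It takes $f$ as the negative slope of the curve $O(x) = 1/(10^{x} + 1)$ with $x = \mathrm{pH} - \mathrm{p}K_a$ (a sign convention assumed here so that $f$ is non-negative) and integrates with a simple midpoint rule. The function names, the integration window, and the example $\mathrm{p}K_a$ of 4.0 are choices made for this example.

```python
import math


def hh_density(ph, pka):
    """f(pH) = -dO/d(pH) for O(x) = 1 / (10**x + 1), with x = pH - pKa."""
    t = 10.0 ** (ph - pka)
    return math.log(10.0) * t / (1.0 + t) ** 2


def moments(density, lo, hi, steps=40000):
    """Return (M1, mu2, mu3, mu4) of a density using the midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    grid = [lo + (k + 0.5) * h for k in range(steps)]
    weights = [density(p) * h for p in grid]
    m1 = sum(p * w for p, w in zip(grid, weights))
    central = [sum((p - m1) ** n * w for p, w in zip(grid, weights))
               for n in (2, 3, 4)]
    return (m1, *central)


pka = 4.0  # arbitrary example value
m1, mu2, mu3, mu4 = moments(lambda p: hh_density(p, pka), pka - 20.0, pka + 20.0)
```

Because this $f$ is a logistic density with scale $1/\ln 10$, the closed forms are $\mu_2 = \pi^2 / (3 \ln^2 10) \approx 0.620$ and $\mu_4 = 7\pi^4 / (15 \ln^4 10) \approx 1.62$, which the numerical values reproduce. The first moment equals the chosen $\mathrm{p}K_a$ and the third central moment vanishes by symmetry.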

The four features, namely $M_1$, $\mu_2$, $\mu_3$ and $\mu_4$, are conceptually similar to those described in Ko's analysis [24], except that I slightly modified the normalization process to prevent both the sample mean and the sample standard deviation from being too strongly influenced by extreme values. A more robust estimator than the standard Z-score is used to distinguish typical from atypical titration curves within a single protein. In my normalization, each of the four moments was normalized to its corresponding robust Z-score $Z$, which is defined as its deviation from the median, divided by the normalized interquartile distance. The interquartile distance is the difference between the 75th percentile and 25th percentile values of the corresponding feature across all ionizable residues in that protein. The normalization factor of approximately 1.35 comes from the normal distribution with a mean of zero and a standard deviation of one, whose interquartile distance has that value. Thus for a given feature $\Phi$, I define $Z$ as:

$Z(\Phi) = \dfrac{\Phi - \mathrm{MEDIAN}\{\Phi\}}{\left[\mathrm{PERCENTILE}(\Phi, 0.75) - \mathrm{PERCENTILE}(\Phi, 0.25)\right] / 1.35}$ (10)

where the median and corresponding percentiles are based on the value of that feature for all ionizable residues in a given protein. Thus this method achieves the same effect as Wei's method [54] without introducing an extra parameter to be fine-tuned. For the even-numbered moments, $Z_n$, the robust Z-score for the $n$th central moment, is defined as:

$Z_n = Z(\mu_n)$ ($n$ even) (11)

The only even-numbered moments used in the present study are the second and fourth, so the corresponding robust Z-scores are $Z_2$ and $Z_4$. Likewise the only odd-numbered moments are the first and third. The corresponding robust Z-scores are the deviations of the absolute values from the median. In particular, we define $Z_3$ as

$Z_3 = Z(|\mu_3|)$ (12)
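The robust Z-score can be sketched in a few lines. This is an illustrative reading rather than the code used in the study. The percentile interpolation rule (`method="inclusive"`), the function name, and the toy data are assumptions, and the normalization constant is computed from the standard normal distribution instead of being hard-coded.

```python
from statistics import NormalDist, mean, median, pstdev, quantiles

# Interquartile distance of a standard normal distribution (about 1.349).
NORMAL_IQR = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)


def robust_z(values):
    """Deviation from the median in units of the normalized interquartile distance."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    scale = (q3 - q1) / NORMAL_IQR  # must be nonzero
    med = median(values)
    return [(v - med) / scale for v in values]


# Toy feature values for nine residues of one protein, with one extreme value.
phi = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
z_robust = robust_z(phi)

# Ordinary Z-score of the extreme value, for comparison.
z_ordinary = (phi[-1] - mean(phi)) / pstdev(phi)

# For the odd moments, the score is taken on absolute values, e.g. Z_3:
mu3 = [0.02, -0.01, 0.03, -0.9, 0.01]
z3 = robust_z([abs(m) for m in mu3])
```

The extreme value inflates the ordinary mean and standard deviation, so its ordinary Z-score stays modest, while the median and interquartile distance are barely affected and the robust score separates it clearly from the rest.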

The population over which the median and percentiles are computed includes residues of different types with different free $\mathrm{p}K_a$ values. In order to compare the computed first moments across all residue types, the offset first moment for a given residue is defined as:

$M_1^{\mathrm{offset}} = M_1 - \mathrm{p}K_a(\mathrm{free})$ (13)

where $\mathrm{p}K_a(\mathrm{free})$ is the $\mathrm{p}K_a$ for that residue in free solution. Note that by equation (9), an H-H residue has $M_1^{\mathrm{offset}} = 0$. This offset may be compared across all residues in the protein. Thus $Z_1$ is defined as:

$Z_1 = Z(|M_1^{\mathrm{offset}}|)$ (14)

Note that only the first moment requires this modification to make all residue types in the protein comparable, since the H-H equation has only one free parameter, the residue-type-dependent translation parameter $\mathrm{p}K_a$. To summarize, the result of all these computations is to create, for each ionizable residue in any given protein, a 4-tuple of descriptors $(Z_1, Z_2, Z_3, Z_4)$ of the theoretical titration curve. $Z_2$, $Z_3$, and $Z_4$ describe the shape of the curve and $Z_1$ measures its displacement along the horizontal axis.

3.3 Training.

A set of 20 proteins was used as the training set. The protein names, the E.C. numbers and the PDB ID for each of the 20 proteins in the training set are listed in Appendix A. The labeling of the titration curves for training purposes was performed as follows. All residues listed in CatRes/CSA as active were labeled positive. Also labeled positive were ionizable residues located near such annotated active residues whose titration curves appeared perturbed on visual inspection. All other residues were labeled negative, with the exception of a few residues with visually perturbed titration curves and no literature annotation that are not near any other perturbed residues; they were removed from the training data set entirely. (Note that such residues with perturbed titration curves that are not in spatial proximity to other perturbed residues are not considered predictive in THEMATICS.) From the 1575 ionizable residues in the 20-protein training set, I removed 46 isolated residues with perturbed titration curves. This leaves a training set of 1529 ionizable residues, among which 140 are labeled as positive training examples.

For each ionizable residue in the training set, the four moment-based features and the corresponding labels were fed into the SVM using SVMLight [56]. For both training and classification, the quadratic kernel $K(x, z) = (1 + \langle x, z \rangle)^2$ was used. The relative cost of misclassification of positive and negative training examples was set such that false negatives were penalized 10 times as much as false positives. This was done because there are many more negative examples than positive examples in the training set, because of the aim to increase the residue recall rate, and because I have much more confidence in the positive labels (whose misclassification yields false negatives) than in the negative labels (whose misclassification yields false positives); see the earlier discussion of the challenges of the site prediction problem and section 3.4. In addition, a linear kernel and several other choices of parameters were tried, but these resulted in either similar or slightly more training errors.

3.4 Results

Typical criteria used to measure classifier performance are recall (also called sensitivity), the number of correctly predicted positives divided by the number of true positives, and precision (related to specificity), the number of correctly predicted positives divided by the total number of predicted positives. Ideally, both measures are 100%, which means all and only the true positives are identified as such by the classifier. In the present case, one can be more confident of the true positive data, because for every labeled active residue there is experimental evidence supporting that labeling. On the other hand, true negative data are not as reliable because the experiments are incomplete; some important residues may not have been tested experimentally. Furthermore, some of the experimental literature has not been included in the CatRes/CSA database, because of the difficulty of exhaustive literature searching. A better indicator of the selectivity of the method for present purposes is the filtration ratio, the fraction of total residues that are reported as positive. The goal of the system is therefore to achieve a high recall with a low filtration ratio.

A set of 64 test proteins was selected randomly from the CatRes database [57]. There is no overlap between this test set of 64 proteins and the set of 20 proteins used to train the SVM. The trained SVM was applied to the test set to measure the overall accuracy of the method, assuming that the CatRes annotations define the true positive residues. Results are summarized here, while a detailed list of all proteins studied with all predicted residues and clusters can be found in Appendix B.

Success in site prediction.

First I examine the degree of matching between our predictions and the CatRes list for each protein. Overall, the SVM identified an average of 2.7 clusters per subunit. Based on the overlap of the predicted active site and the CatRes listed set, the prediction for a protein is assigned to one of three categories. If 50% or more of the CatRes listed active residues were found by the system, we consider this a correct site prediction. If some, but fewer than half, of the CatRes listed active residues were found by our system, we consider it partially correct. If none of the CatRes listed active residues were found by our system, we consider the site prediction incorrect. This type of categorization has been used previously [18]. Measuring this degree of overlap of predicted clusters with just the ionizable CatRes listed active-site residues, the percentages of proteins for which the predictions are correct, partially correct, and incorrect are 86%, 5% and 9% respectively, as shown in figure 2(a).

Success in catalytic residue prediction.

Out of the 9303 ionizable residues from the 64 proteins, 1338 were predicted as active site candidates by the SVM, forming 244 clusters. There are 233 ionizable residues labeled as active site residues in the CatRes database and 182 of them were found by our SVM, corresponding to a global residue recall of 78%. The average residue recall rate, averaged over all 64 proteins, is 76%.
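The evaluation measures and the three-way site categorization described above reduce to a few set operations. The sketch below is a minimal illustration. The function names and the residue labels in the toy example are hypothetical, and only the counts 182 and 233 are taken from the text.

```python
def recall(predicted, annotated):
    """Fraction of annotated active residues that were predicted."""
    return len(predicted & annotated) / len(annotated)


def precision(predicted, annotated):
    """Fraction of predicted residues that are annotated as active."""
    return len(predicted & annotated) / len(predicted)


def filtration_ratio(predicted, n_total_residues):
    """Fraction of all residues in the protein that are reported as positive."""
    return len(predicted) / n_total_residues


def site_category(predicted, annotated):
    """Correct if at least half of the annotated residues are found,
    partially correct if some but fewer than half, incorrect if none."""
    r = recall(predicted, annotated)
    if r >= 0.5:
        return "correct"
    if r > 0.0:
        return "partially correct"
    return "incorrect"


# Hypothetical protein with 200 residues (labels invented for the example).
annotated = {"HIS57", "ASP102", "SER195"}
predicted = {"HIS57", "ASP102", "GLU70", "LYS60"}

example = (recall(predicted, annotated),
           precision(predicted, annotated),
           filtration_ratio(predicted, 200),
           site_category(predicted, annotated))

# Global residue recall from the counts reported in the text.
global_recall = 182 / 233
```

The global recall pools all residues before dividing, whereas the average recall is the mean of the per-protein values, which is why the two reported figures (78% and 76%) differ.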

For these 64 proteins, the average filtration ratio, defined as the number of residues predicted divided by the total number of residues (both ionizable and non-ionizable), is only 3.9%. This ratio is less than 8% for each of the 64 proteins. The average precision, or fraction of predicted residues that are known true positives, is 20% over the 64-protein set, using only the CatRes/CSA annotations to define the true positives.

Incorporation of non-ionizable residues.

Since not all active site residues are ionizable, it is also of interest to see how well the SVM-reported residues serve as predictors of activity in the spatial vicinity. Therefore I also define a THEMATICS positive region to be the spatial region within 6 Å of any residue that belongs to a THEMATICS positive cluster. This may allow the method to find some catalytically important residues that do not have a perturbed titration curve (including non-ionizable residues). The total number of residues found by this criterion across the 64 test proteins is 4795, counted over all residues rather than ionizable residues only. Among the 366 residues that are labeled as active site residues in CatRes, 263 were found by the system, corresponding to a global recall of 72%, while the average recall per protein is 81%. The average precision, or fraction of predicted residues that are known true positives, is 21% over the 64-protein set, using only the CatRes/CSA annotations to define the true positives. Table 3.1 compares the performance of the straight SVM predictions versus the SVM+Region predictions. While the expansion to include the neighborhood surrounding the predicted residues leads to a somewhat higher recall rate, there is considerable sacrifice in the precision and an increase in the filtration ratio.
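The region expansion can be sketched as a simple distance query. The text does not specify how the distance between two residues is measured, so this example assumes the minimum atom-to-atom distance. The data layout, function name, and coordinates are likewise hypothetical.

```python
import math


def positive_region(cluster, residues, radius=6.0):
    """Return the names of all residues having at least one atom within
    `radius` angstroms of an atom of any residue in `cluster`.

    residues: dict mapping residue name -> list of (x, y, z) atom coordinates
    cluster:  set of residue names forming a THEMATICS positive cluster
    """
    region = set()
    for name, atoms in residues.items():
        for member in cluster:
            if any(math.dist(a, b) <= radius
                   for a in atoms for b in residues[member]):
                region.add(name)
                break
    return region


# Toy structure laid out along one axis (coordinates invented for the example).
residues = {
    "A": [(0.0, 0.0, 0.0)],
    "B": [(3.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    "C": [(7.0, 0.0, 0.0)],   # non-cluster residue within 6 angstroms of B
    "D": [(20.0, 0.0, 0.0)],  # too far from every cluster atom
}
region = positive_region({"A", "B"}, residues)
```

Cluster members are always part of their own region, and residue C is picked up through its proximity to B even though it is more than 6 Å from A. A larger radius raises recall at the cost of a higher filtration ratio, which mirrors the trade-off reported for the SVM+Region predictions.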


Robust Feature Induction for Support Vector Machines

Robust Feature Induction for Support Vector Machines Robust Featue Inducton fo Suppot Vecto Machnes Rong Jn Depatment of Compute Scence and Engneeng, Mchgan State Unvesty, East Lansng, MI4884 ROGJI@CSE.MSU.EDU Huan Lu Depatment of Compute Scence and Engneeng,

More information

Contact, information, consultations

Contact, information, consultations ontact, nfomaton, consultatons hemsty A Bldg; oom 07 phone: 058-347-769 cellula: 664 66 97 E-mal: wojtek_c@pg.gda.pl Offce hous: Fday, 9-0 a.m. A quote of the week (o camel of the week): hee s no expedence

More information

N = N t ; t 0. N is the number of claims paid by the

N = N t ; t 0. N is the number of claims paid by the Iulan MICEA, Ph Mhaela COVIG, Ph Canddate epatment of Mathematcs The Buchaest Academy of Economc Studes an CECHIN-CISTA Uncedt Tac Bank, Lugoj SOME APPOXIMATIONS USE IN THE ISK POCESS OF INSUANCE COMPANY

More information

Vibration Input Identification using Dynamic Strain Measurement

Vibration Input Identification using Dynamic Strain Measurement Vbaton Input Identfcaton usng Dynamc Stan Measuement Takum ITOFUJI 1 ;TakuyaYOSHIMURA ; 1, Tokyo Metopoltan Unvesty, Japan ABSTRACT Tansfe Path Analyss (TPA) has been conducted n ode to mpove the nose

More information

Bayesian Assessment of Availabilities and Unavailabilities of Multistate Monotone Systems

Bayesian Assessment of Availabilities and Unavailabilities of Multistate Monotone Systems Dept. of Math. Unvesty of Oslo Statstcal Reseach Repot No 3 ISSN 0806 3842 June 2010 Bayesan Assessment of Avalabltes and Unavalabltes of Multstate Monotone Systems Bent Natvg Jøund Gåsemy Tond Retan June

More information

9/12/2013. Microelectronics Circuit Analysis and Design. Modes of Operation. Cross Section of Integrated Circuit npn Transistor

9/12/2013. Microelectronics Circuit Analysis and Design. Modes of Operation. Cross Section of Integrated Circuit npn Transistor Mcoelectoncs Ccut Analyss and Desgn Donald A. Neamen Chapte 5 The pola Juncton Tanssto In ths chapte, we wll: Dscuss the physcal stuctue and opeaton of the bpola juncton tanssto. Undestand the dc analyss

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

PHYS Week 5. Reading Journals today from tables. WebAssign due Wed nite

PHYS Week 5. Reading Journals today from tables. WebAssign due Wed nite PHYS 015 -- Week 5 Readng Jounals today fom tables WebAssgn due Wed nte Fo exclusve use n PHYS 015. Not fo e-dstbuton. Some mateals Copyght Unvesty of Coloado, Cengage,, Peason J. Maps. Fundamental Tools

More information

CSCE 478/878 Lecture 4: Experimental Design and Analysis. Stephen Scott. 3 Building a tree on the training set Introduction. Outline.

CSCE 478/878 Lecture 4: Experimental Design and Analysis. Stephen Scott. 3 Building a tree on the training set Introduction. Outline. In Homewok, you ae (supposedly) Choosing a data set 2 Extacting a test set of size > 3 3 Building a tee on the taining set 4 Testing on the test set 5 Repoting the accuacy (Adapted fom Ethem Alpaydin and

More information

THE REGRESSION MODEL OF TRANSMISSION LINE ICING BASED ON NEURAL NETWORKS

THE REGRESSION MODEL OF TRANSMISSION LINE ICING BASED ON NEURAL NETWORKS The 4th Intenatonal Wokshop on Atmosphec Icng of Stuctues, Chongqng, Chna, May 8 - May 3, 20 THE REGRESSION MODEL OF TRANSMISSION LINE ICING BASED ON NEURAL NETWORKS Sun Muxa, Da Dong*, Hao Yanpeng, Huang

More information

GENERALIZATION OF AN IDENTITY INVOLVING THE GENERALIZED FIBONACCI NUMBERS AND ITS APPLICATIONS

GENERALIZATION OF AN IDENTITY INVOLVING THE GENERALIZED FIBONACCI NUMBERS AND ITS APPLICATIONS #A39 INTEGERS 9 (009), 497-513 GENERALIZATION OF AN IDENTITY INVOLVING THE GENERALIZED FIBONACCI NUMBERS AND ITS APPLICATIONS Mohaad Faokh D. G. Depatent of Matheatcs, Fedows Unvesty of Mashhad, Mashhad,

More information

Test 1 phy What mass of a material with density ρ is required to make a hollow spherical shell having inner radius r i and outer radius r o?

Test 1 phy What mass of a material with density ρ is required to make a hollow spherical shell having inner radius r i and outer radius r o? Test 1 phy 0 1. a) What s the pupose of measuement? b) Wte all fou condtons, whch must be satsfed by a scala poduct. (Use dffeent symbols to dstngush opeatons on ectos fom opeatons on numbes.) c) What

More information

CSJM University Class: B.Sc.-II Sub:Physics Paper-II Title: Electromagnetics Unit-1: Electrostatics Lecture: 1 to 4

CSJM University Class: B.Sc.-II Sub:Physics Paper-II Title: Electromagnetics Unit-1: Electrostatics Lecture: 1 to 4 CSJM Unvesty Class: B.Sc.-II Sub:Physcs Pape-II Ttle: Electomagnetcs Unt-: Electostatcs Lectue: to 4 Electostatcs: It deals the study of behavo of statc o statonay Chages. Electc Chage: It s popety by

More information

4.4 Continuum Thermomechanics

4.4 Continuum Thermomechanics 4.4 Contnuum Themomechancs The classcal themodynamcs s now extended to the themomechancs of a contnuum. The state aables ae allowed to ay thoughout a mateal and pocesses ae allowed to be eesble and moe

More information

Pattern Analyses (EOF Analysis) Introduction Definition of EOFs Estimation of EOFs Inference Rotated EOFs

Pattern Analyses (EOF Analysis) Introduction Definition of EOFs Estimation of EOFs Inference Rotated EOFs Patten Analyses (EOF Analyss) Intoducton Defnton of EOFs Estmaton of EOFs Infeence Rotated EOFs . Patten Analyses Intoducton: What s t about? Patten analyses ae technques used to dentfy pattens of the

More information

Optimal System for Warm Standby Components in the Presence of Standby Switching Failures, Two Types of Failures and General Repair Time

Optimal System for Warm Standby Components in the Presence of Standby Switching Failures, Two Types of Failures and General Repair Time Intenatonal Jounal of ompute Applcatons (5 ) Volume 44 No, Apl Optmal System fo Wam Standby omponents n the esence of Standby Swtchng Falues, Two Types of Falues and Geneal Repa Tme Mohamed Salah EL-Shebeny

More information

A. P. Sakis Meliopoulos Power System Modeling, Analysis and Control. Chapter 7 3 Operating State Estimation 3

A. P. Sakis Meliopoulos Power System Modeling, Analysis and Control. Chapter 7 3 Operating State Estimation 3 DRAF and INCOMPLEE able of Contents fom A. P. Saks Melopoulos Powe System Modelng, Analyss and Contol Chapte 7 3 Opeatng State Estmaton 3 7. Intoducton 3 7. SCADA System 4 7.3 System Netwok Confguato 7

More information

(8) Gain Stage and Simple Output Stage

(8) Gain Stage and Simple Output Stage EEEB23 Electoncs Analyss & Desgn (8) Gan Stage and Smple Output Stage Leanng Outcome Able to: Analyze an example of a gan stage and output stage of a multstage amplfe. efeence: Neamen, Chapte 11 8.0) ntoducton

More information

PARAMETER ESTIMATION FOR TWO WEIBULL POPULATIONS UNDER JOINT TYPE II CENSORED SCHEME

PARAMETER ESTIMATION FOR TWO WEIBULL POPULATIONS UNDER JOINT TYPE II CENSORED SCHEME Sept 04 Vol 5 No 04 Intenatonal Jounal of Engneeng Appled Scences 0-04 EAAS & ARF All ghts eseed wwweaas-ounalog ISSN305-869 PARAMETER ESTIMATION FOR TWO WEIBULL POPULATIONS UNDER JOINT TYPE II CENSORED

More information

Engineering Mechanics. Force resultants, Torques, Scalar Products, Equivalent Force systems

Engineering Mechanics. Force resultants, Torques, Scalar Products, Equivalent Force systems Engneeng echancs oce esultants, Toques, Scala oducts, Equvalent oce sstems Tata cgaw-hll Companes, 008 Resultant of Two oces foce: acton of one bod on anothe; chaacteed b ts pont of applcaton, magntude,

More information

Efficiency of the principal component Liu-type estimator in logistic

Efficiency of the principal component Liu-type estimator in logistic Effcency of the pncpal component Lu-type estmato n logstc egesson model Jbo Wu and Yasn Asa 2 School of Mathematcs and Fnance, Chongqng Unvesty of Ats and Scences, Chongqng, Chna 2 Depatment of Mathematcs-Compute

More information

2/24/2014. The point mass. Impulse for a single collision The impulse of a force is a vector. The Center of Mass. System of particles

2/24/2014. The point mass. Impulse for a single collision The impulse of a force is a vector. The Center of Mass. System of particles /4/04 Chapte 7 Lnea oentu Lnea oentu of a Sngle Patcle Lnea oentu: p υ It s a easue of the patcle s oton It s a vecto, sla to the veloct p υ p υ p υ z z p It also depends on the ass of the object, sla

More information

SOME NEW SELF-DUAL [96, 48, 16] CODES WITH AN AUTOMORPHISM OF ORDER 15. KEYWORDS: automorphisms, construction, self-dual codes

SOME NEW SELF-DUAL [96, 48, 16] CODES WITH AN AUTOMORPHISM OF ORDER 15. KEYWORDS: automorphisms, construction, self-dual codes Факултет по математика и информатика, том ХVІ С, 014 SOME NEW SELF-DUAL [96, 48, 16] CODES WITH AN AUTOMORPHISM OF ORDER 15 NIKOLAY I. YANKOV ABSTRACT: A new method fo constuctng bnay self-dual codes wth

More information

2 dependence in the electrostatic force means that it is also

2 dependence in the electrostatic force means that it is also lectc Potental negy an lectc Potental A scala el, nvolvng magntues only, s oten ease to wo wth when compae to a vecto el. Fo electc els not havng to begn wth vecto ssues woul be nce. To aange ths a scala

More information

PHY126 Summer Session I, 2008

PHY126 Summer Session I, 2008 PHY6 Summe Sesson I, 8 Most of nfomaton s avalable at: http://nngoup.phscs.sunsb.edu/~chak/phy6-8 ncludng the sllabus and lectue sldes. Read sllabus and watch fo mpotant announcements. Homewok assgnment

More information

Transport Coefficients For A GaAs Hydro dynamic Model Extracted From Inhomogeneous Monte Carlo Calculations

Transport Coefficients For A GaAs Hydro dynamic Model Extracted From Inhomogeneous Monte Carlo Calculations Tanspot Coeffcents Fo A GaAs Hydo dynamc Model Extacted Fom Inhomogeneous Monte Calo Calculatons MeKe Ieong and Tngwe Tang Depatment of Electcal and Compute Engneeng Unvesty of Massachusetts, Amhest MA

More information

CSU ATS601 Fall Other reading: Vallis 2.1, 2.2; Marshall and Plumb Ch. 6; Holton Ch. 2; Schubert Ch r or v i = v r + r (3.

CSU ATS601 Fall Other reading: Vallis 2.1, 2.2; Marshall and Plumb Ch. 6; Holton Ch. 2; Schubert Ch r or v i = v r + r (3. 3 Eath s Rotaton 3.1 Rotatng Famewok Othe eadng: Valls 2.1, 2.2; Mashall and Plumb Ch. 6; Holton Ch. 2; Schubet Ch. 3 Consde the poston vecto (the same as C n the fgue above) otatng at angula velocty.

More information

Approximate Abundance Histograms and Their Use for Genome Size Estimation

Approximate Abundance Histograms and Their Use for Genome Size Estimation J. Hlaváčová (Ed.): ITAT 2017 Poceedngs, pp. 27 34 CEUR Wokshop Poceedngs Vol. 1885, ISSN 1613-0073, c 2017 M. Lpovský, T. Vnař, B. Bejová Appoxmate Abundance Hstogams and The Use fo Genome Sze Estmaton

More information

4 SingularValue Decomposition (SVD)

4 SingularValue Decomposition (SVD) /6/00 Z:\ jeh\self\boo Kannan\Jan-5-00\4 SVD 4 SngulaValue Decomposton (SVD) Chapte 4 Pat SVD he sngula value decomposton of a matx s the factozaton of nto the poduct of thee matces = UDV whee the columns

More information

Constraint Score: A New Filter Method for Feature Selection with Pairwise Constraints

Constraint Score: A New Filter Method for Feature Selection with Pairwise Constraints onstant Scoe: A New Flte ethod fo Featue Selecton wth Pawse onstants Daoqang Zhang, Songcan hen and Zh-Hua Zhou Depatment of ompute Scence and Engneeng Nanjng Unvesty of Aeonautcs and Astonautcs, Nanjng

More information

CHAPTER 7. Multivariate effect sizes indices

CHAPTER 7. Multivariate effect sizes indices CHAPTE 7 Multvaate effect szes ndces Seldom does one fnd that thee s only a sngle dependent vaable nvolved n a study. In Chapte 3 s Example A we have the vaables BDI, POMS_S and POMS_B, n Example E thee

More information

Professor Wei Zhu. 1. Sampling from the Normal Population

Professor Wei Zhu. 1. Sampling from the Normal Population AMS570 Pofesso We Zhu. Samplg fom the Nomal Populato *Example: We wsh to estmate the dstbuto of heghts of adult US male. It s beleved that the heght of adult US male follows a omal dstbuto N(, ) Def. Smple

More information

Re-Ranking Retrieval Model Based on Two-Level Similarity Relation Matrices

Re-Ranking Retrieval Model Based on Two-Level Similarity Relation Matrices Intenatonal Jounal of Softwae Engneeng and Its Applcatons, pp. 349-360 http://dx.do.og/10.1457/sea.015.9.1.31 Re-Rankng Reteval Model Based on Two-Level Smlaty Relaton Matces Hee-Ju Eun Depatment of Compute

More information

Minimal Detectable Biases of GPS observations for a weighted ionosphere

Minimal Detectable Biases of GPS observations for a weighted ionosphere LETTER Eath Planets Space, 52, 857 862, 2000 Mnmal Detectable Bases of GPS obsevatons fo a weghted onosphee K. de Jong and P. J. G. Teunssen Depatment of Mathematcal Geodesy and Postonng, Delft Unvesty

More information

gravity r2,1 r2 r1 by m 2,1

gravity r2,1 r2 r1 by m 2,1 Gavtaton Many of the foundatons of classcal echancs wee fst dscoveed when phlosophes (ealy scentsts and atheatcans) ted to explan the oton of planets and stas. Newton s ost faous fo unfyng the oton of

More information

Physics 202, Lecture 2. Announcements

Physics 202, Lecture 2. Announcements Physcs 202, Lectue 2 Today s Topcs Announcements Electc Felds Moe on the Electc Foce (Coulomb s Law The Electc Feld Moton of Chaged Patcles n an Electc Feld Announcements Homewok Assgnment #1: WebAssgn

More information

Backward Haplotype Transmission Association (BHTA) Algorithm. Tian Zheng Department of Statistics Columbia University. February 5 th, 2002

Backward Haplotype Transmission Association (BHTA) Algorithm. Tian Zheng Department of Statistics Columbia University. February 5 th, 2002 Backwad Haplotype Tansmsson Assocaton (BHTA) Algothm A Fast ult-pont Sceenng ethod fo Complex Tats Tan Zheng Depatment of Statstcs Columba Unvesty Febuay 5 th, 2002 Ths s a jont wok wth Pofesso Shaw-Hwa

More information

Some Approximate Analytical Steady-State Solutions for Cylindrical Fin

Some Approximate Analytical Steady-State Solutions for Cylindrical Fin Some Appoxmate Analytcal Steady-State Solutons fo Cylndcal Fn ANITA BRUVERE ANDRIS BUIIS Insttute of Mathematcs and Compute Scence Unvesty of Latva Rana ulv 9 Rga LV459 LATVIA Astact: - In ths pape we

More information

Density Functional Theory I

Density Functional Theory I Densty Functonal Theoy I cholas M. Hason Depatment of Chemsty Impeal College Lonon & Computatonal Mateals Scence Daesbuy Laboatoy ncholas.hason@c.ac.uk Densty Functonal Theoy I The Many Electon Schönge

More information

Summer Workshop on the Reaction Theory Exercise sheet 8. Classwork

Summer Workshop on the Reaction Theory Exercise sheet 8. Classwork Joned Physcs Analyss Cente Summe Wokshop on the Reacton Theoy Execse sheet 8 Vncent Matheu Contact: http://www.ndana.edu/~sst/ndex.html June June To be dscussed on Tuesday of Week-II. Classwok. Deve all

More information

Theo K. Dijkstra. Faculty of Economics and Business, University of Groningen, Nettelbosje 2, 9747 AE Groningen THE NETHERLANDS

Theo K. Dijkstra. Faculty of Economics and Business, University of Groningen, Nettelbosje 2, 9747 AE Groningen THE NETHERLANDS RESEARCH ESSAY COSISE PARIAL LEAS SQUARES PAH MODELIG heo K. Djksta Faculty of Economcs and Busness, Unvesty of Gonngen, ettelbosje, 9747 AE Gonngen HE EHERLADS {t.k.djksta@ug.nl} Jög Hensele Faculty of

More information

Optimization Algorithms for System Integration

Optimization Algorithms for System Integration Optmzaton Algothms fo System Integaton Costas Papadmtou 1, a and Evaggelos totsos 1,b 1 Unvesty of hessaly, Depatment of Mechancal and Industal Engneeng, Volos 38334, Geece a costasp@uth.g, b entotso@uth.g

More information

Advanced Robust PDC Fuzzy Control of Nonlinear Systems

Advanced Robust PDC Fuzzy Control of Nonlinear Systems Advanced obust PDC Fuzzy Contol of Nonlnea Systems M Polanský Abstact hs pape ntoduces a new method called APDC (Advanced obust Paallel Dstbuted Compensaton) fo automatc contol of nonlnea systems hs method

More information

KEYWORDS: survey sampling; prediction; estimation; imputation; variance estimation; ratios of totals

KEYWORDS: survey sampling; prediction; estimation; imputation; variance estimation; ratios of totals Usng Pedcton-Oented Softwae fo Suvey Estmaton - Pat II: Ratos of Totals James R. Knaub, J. US Dept. of Enegy, Enegy Infomaton dmnstaton, EI-53.1 STRCT: Ths atcle s an extenson of Knaub (1999), Usng Pedcton-Oented

More information

Monte Carlo comparison of back-propagation, conjugate-gradient, and finite-difference training algorithms for multilayer perceptrons

Monte Carlo comparison of back-propagation, conjugate-gradient, and finite-difference training algorithms for multilayer perceptrons Rocheste Insttute of Technology RIT Schola Woks Theses Thess/Dssetaton Collectons 20 Monte Calo compason of back-popagaton, conugate-gadent, and fnte-dffeence tanng algothms fo multlaye peceptons Stephen

More information

MACHINE LEARNING. Mistake and Loss Bound Models of Learning

MACHINE LEARNING. Mistake and Loss Bound Models of Learning Iowa State Unvesty MACHINE LEARNING Vasant Honava Bonfomatcs and Computatonal Bology Pogam Cente fo Computatonal Intellgence, Leanng, & Dscovey Iowa State Unvesty honava@cs.astate.edu www.cs.astate.edu/~honava/

More information

Wave Equations. Michael Fowler, University of Virginia

Wave Equations. Michael Fowler, University of Virginia Wave Equatons Mcael Fowle, Unvesty of Vgna Potons and Electons We ave seen tat electons and potons beave n a vey smla fason bot exbt dffacton effects, as n te double slt expement, bot ave patcle lke o

More information

Units, Physical Quantities and Vectors

Units, Physical Quantities and Vectors What s Phscs? Unts, Phscal Quanttes and Vectos Natual Phlosoph scence of matte and eneg fundamental pncples of engneeng and technolog an epemental scence: theo epement smplfed (dealed) models ange of valdt

More information

Location-Aware Cross-Tier Coordinated Multipoint Transmission in Two-Tier Cellular Networks

Location-Aware Cross-Tier Coordinated Multipoint Transmission in Two-Tier Cellular Networks Locaton-Awae Coss-Te Coodnated Multpont Tansmsson n Two-Te Cellula Netwoks Ahmed Hamd Sak and Ekam Hossan axv:45.876v cs.ni] 8 Sep 4 Abstact Mult-te cellula netwoks ae consdeed as an effectve soluton to

More information

Detection and Estimation Theory

Detection and Estimation Theory ESE 54 Detecton and Etmaton Theoy Joeph A. O Sullvan Samuel C. Sach Pofeo Electonc Sytem and Sgnal Reeach Laboatoy Electcal and Sytem Engneeng Wahngton Unvety 411 Jolley Hall 314-935-4173 (Lnda anwe) jao@wutl.edu

More information

A Study about One-Dimensional Steady State. Heat Transfer in Cylindrical and. Spherical Coordinates

A Study about One-Dimensional Steady State. Heat Transfer in Cylindrical and. Spherical Coordinates Appled Mathematcal Scences, Vol. 7, 03, no. 5, 67-633 HIKARI Ltd, www.m-hka.com http://dx.do.og/0.988/ams.03.38448 A Study about One-Dmensonal Steady State Heat ansfe n ylndcal and Sphecal oodnates Lesson

More information