IMA Preprint Series # 2103


STATISTICAL CHARACTERIZATION OF PROTEIN ENSEMBLES

By Diego Rother, Guillermo Sapiro, and Vijay Pande

IMA Preprint Series # 2103 (March 2006)

INSTITUTE FOR MATHEMATICS AND ITS APPLICATIONS, UNIVERSITY OF MINNESOTA, 400 Lind Hall, 207 Church Street S.E., Minneapolis, Minnesota. Phone: 6/ Fax: 6/ URL: http://

Statistical Characterization of Protein Ensembles

Diego Rother (1, 2), Guillermo Sapiro (1), and Vijay Pande (3)

Abstract

When accounting for structural fluctuations or measurement errors, a single rigid structure may not be sufficient to represent a protein. One approach to solve this problem is to represent the possible conformations as a discrete set of observed conformations, an ensemble. In this work, we follow a different, richer approach: we introduce a framework for estimating probability density functions in very high dimensions, and then apply it to represent ensembles of folded proteins. The proposed approach combines techniques such as kernel density estimation, maximum likelihood, cross-validation, and bootstrapping. We present the underlying theoretical and computational framework, apply it to artificial data and to protein ensembles obtained from molecular dynamics simulations, and compare the results with those obtained experimentally, illustrating the potential and advantages of this representation.

1 Introduction and motivation

A single structure is often used to represent a protein. This can be considered natural when the structure is assumed to be rigid, but protein structure is known to fluctuate under physiological conditions. This fact, overlooked in many situations in favor of the simplicity of a single structure, may be advantageously accounted for when high resolution techniques for structure determination are available. Even if the true structure were fixed and unique, the uncertainty in its determination by the (imperfect) measurement of some property (e.g., diffraction, magnetic resonance, etc.) also produces variability, since those methods generally optimize a model to fit the observations, a process prone to finding multiple local minima. A similar situation arises when simulations are used for structure determination: the modeled energy landscape is populated with multiple local minima. As a result of this and other intrinsic simulation characteristics (e.g., randomness), multiple structure representatives for the same protein are possible. In applications where the fluctuations cannot be ignored and, moreover, are to be favorably exploited, how should they be represented and incorporated into the calculations?
One way is to represent the protein structure not as a single conformation but as a finite set of conformations, corresponding to different observations of its state. In this work we propose a different, richer approach, consisting of estimating a probability density function (pdf) from the available observations of the state, and using this pdf to represent the ensemble (a finite set of conformations is just a particular case of this, with the pdf being delta functions placed at the observation points). This rich representation is starting to gain interest in the protein research community, e.g., [1, 2] and Lindorff-Larsen (personal communication). For example, this type of representation has been recently pursued to rank the space of conformations in agreement with NMR observations [].

Footnote 1: Department of Electrical and Computer Engineering, University of Minnesota, 200 Union St. SE, Minneapolis, MN 55455, USA. rot,gulle@ece.umn.edu

Footnote 2: To whom correspondence should be addressed.

Footnote 3: Chemistry Department, Stanford University, Stanford, CA, USA. pande@stanford.edu

This representation may allow one to see aspects previously hidden when the ensemble was regarded as a set of discrete conformations (e.g., the modes), and also provides a natural framework to perform certain operations, e.g., compare ensembles of the same protein that were obtained by different methods, or determine the probability that a particular conformation belongs to the ensemble [3]. New conformations that combine properties of the ensemble can be obtained from the pdf as well. A possible further application of the framework comes from the observation that multiple local minima lie close to the global minimum, as postulated to explain the robustness of the native state and suggested as a way to predict it for certain protein classes [4]. This observation, if general, can be translated into the requirement that the native structure (or ensemble) must reside where the density of local minima is high. Further motivation for the need to understand the conformational space and its probability distribution is suggested by recent work toward high-resolution de novo structure prediction [5]; see in particular the authors' remark that conformational sampling remains the primary stumbling block toward this challenging goal. It is imperative then to have a good description of the ensemble, and ideally, such a description is given by a probability density function.

Important by-products of the approach introduced here include an idea of the completeness of the sample to represent the space of conformations that the protein adopts; an estimate of the conformational entropy (and its error), which may have important thermodynamic consequences (Lindorff-Larsen, personal communication); and a measurement of the dependency between variables. Current efforts in protein research thus clearly support the need for a good understanding of the protein conformational space and, in particular, of its probability density function (pdf). It is the primary goal of this paper to present a theoretical and computational framework to compute such a probability density function. Protein ensembles consist of conformations usually having hundreds or even thousands of degrees of freedom.
How can inferences be made from sample sizes that are roughly of the same order? This challenging question is addressed in this article. We derive, under clear optimality criteria from information theory, the best possible pdf from the available data. To achieve this, we decompose the global density as a product of lower dimensional factors, conditional probabilities themselves, chosen by a genetic algorithm to maximize the global likelihood. The approach exploits the fact that each degree of freedom (or coordinate) does not strongly depend on every other coordinate, but only on a few, which are automatically found by our approach. Then, we proceed to estimate each factor using classical density estimation techniques, that is, Kernel Density Estimation and Maximum Likelihood. In addition to computing the probability distribution of the ensemble, we explicitly and automatically obtain the critical dependences between the variables, such as torsion angles.

The main data used in this work come from simulations of protein ensembles obtained by means of molecular dynamics [6]. The framework described here can be used to characterize other protein ensembles, computed either via molecular dynamics or using another structural determination method (e.g., rotameric libraries with applications in high-resolution protein folding [7], or even direct multiple physical measurements). The framework can also be used to include protein flexibility in protein docking [8]. More on this will be presented in the discussion section.

The remainder of this paper is organized as follows. In Section 2 we give a description of

the mathematical and computational method proposed to compute the desired probability density function, given a finite set of conformations. As a proof of concept and for pedagogic reasons, we first use the developed framework on an artificial dataset. This is presented in Section 3.1. In Section 3.2 we use the framework on real data. In addition to computing the pdf, we explicitly derive the critical inner dependences of the torsion angles, and produce novel conformations sampled from the computed pdf. Their relationship with experimental data is studied as well. Concluding remarks and discussions are provided in Section 4.

2 Methods

An ensemble (see footnote 4) is a set of conformations of the same protein. Each conformation corresponds to a particular arrangement of the protein's constitutive atoms in three dimensional (3D) space. This arrangement can be described (or partially described) by different sets of features depending on the application at hand. In this work, we consider the backbone of the protein, which can be completely described by the usual 2(M-1) torsion angles ((M-1) φ's and (M-1) ψ's), where M is the number of residues in the protein [9]. Our goal is to develop a technique to estimate the density of the unknown process that generates the set of conformations, the ensemble. This density is to be estimated from its available samples (a finite set of conformations represented by vectors of length 2(M-1)). To address this we consider that a coordinate of the sample conformation is related to just a few other coordinates, without knowing in advance which ones. In other words, we set out to infer the relationships between the coordinates (torsion angles in our example), and use this information to estimate the density of the process more efficiently. While this is a natural assumption based on the chemical nature of proteins, it is also fundamental for reducing the dimensionality of the problem, which is needed because only a finite and relatively small number of observations exists.

The proposed computational framework involves a number of components from the fields of statistics, information theory, artificial neural networks, and computer science.
It is philosophically related to Hinton's Products of Experts (PoE) [10], in the sense that several low dimensional probability densities ("experts" in Hinton's terminology), each one able only to explain local features of the dataset, are composed (multiplied) to explain global features. These approaches are best at explaining global features from local details, but cannot in general handle the effects of global features on local details. Our approach differs from Hinton's in that the experts are independent by (automatic) construction, thus avoiding the need to renormalize the product. An additional difference between the approaches is that Hinton uses parametric models for the experts, while we estimate their shape directly from the data. Further restrictions on the experts in our approach also guarantee the experts' heterogeneity, and ensure that all the local features are considered in the construction of the global density.

Footnote 4: To avoid the confusion derived from using the word "sample" to denote a set of conformations and also a single conformation from that set, we reserve this word for the first meaning ("a set of conformations"). We also use "ensemble" for the same concept. We use the words "observation" and "point" for the second meaning ("a single conformation").

Our approach is also related to Akaike's Information Criterion (AIC) [11, 12], in that the choice of the order of the model selected is based on the Kullback-Leibler distance to the true unknown probability density. Contrary to AIC, in our approach the number of parameters does not explicitly appear in the criterion, but only through the degree of smoothing applied. This has

the advantage that the parameters are not all weighted equally; each is weighted according to the particular role it plays in the model.

In this section we present the proposed density estimation framework, and the rationale behind the selection of the particular methods to fulfill each task. For that purpose, and for easy reference and completeness, a brief review of each relevant method is included. In Section 2.1, a procedure for estimating a density is introduced. In Section 2.2, we analyze the errors and limits of this estimation procedure. Finally, in Section 2.3, we extend the technique to the kind of dataset of interest (conformation ensembles). To avoid obscuring the main concepts, the non-critical implementation details are omitted from this article. They can be obtained, together with the code, from the authors by request.

2.1 Density estimation

2.1.1 Maximum likelihood principle

In the search for the best density estimate, the best way to start is to define what is "best", or at least, what is "better". The field of statistics provides a tool for that purpose: the maximum likelihood principle [13]. The rationale behind the maximum likelihood principle can be stated quite simply: from all the possible models (or densities) that could have generated the ensemble, select the most probable one.

More formally, let $S = \{x_1, x_2, \ldots, x_N\}$ be an ensemble of $N$ independent and identically distributed observations in $\mathbb{R}^{2(M-1)}$, generated by one of the models in $\mathcal{M}$, the set of all possible models (formally defined in Section 2.1.2). Without going into further details at this point, let us mention that both the data points $x_i$ and the models belong to a continuous space. This is the reason to use densities instead of probabilities throughout these explanations. Let $M_h$ be a model in the family $\mathcal{M}$ parameterized by $h$, a parameter to be detailed in Section 2.1.2. Then, using Bayes' law, the probability density that the model generated the ensemble can be expressed as:

$$p(M_h \mid S) = p(S \mid M_h)\,\frac{p(M_h)}{p(S)},$$

where: $p(S \mid M_h)$ is the probability density that the model $M_h$ generates the ensemble $S$. We will refer to it as the ensemble (or sample) likelihood and we will denote it by $L_h(S)$.
Since the observations in $S$ are assumed to be independent, this term can be easily computed (now the model is assumed to be known),

$$L_h(S) = P(S \mid M_h) = \prod_{i=1}^{N} P(x_i \mid M_h) = \prod_{i=1}^{N} f_h(x_i),$$

where $f_h$ is the density corresponding to the model $M_h$; $p(S)$ is the unconditioned probability density of the ensemble. It is of little interest here, since it is equal for all the models and therefore, as is commonly done, we will ignore it; and $p(M_h)$ is the a priori probability density of the model, knowledge that should bias the choice of one model over another. When such information is not available, or if it is not reliable, the common practice is to assume that all models are equally probable. For the datasets dealt with in this article, a priori information dictated by the physics of the process is available (see [14]). Nevertheless, we choose for simplicity not to include this information in the model at this stage,

keeping in mind that the results can be improved by doing otherwise. Note that this a priori probability is defined over the space of models.

Consequently, the factor $p(M_h)/p(S)$ is the same for all models (when all models are equally possible), and the most probable model is simply the one that makes the data most probable, that is, the one that maximizes the sample likelihood. Simplification can be obtained by maximizing the logarithm of this quantity instead,

$$\log(L_h(S)) = \log P(S \mid M_h) = \sum_{i=1}^{N} \log f_h(x_i).$$

Since the logarithm function is monotonically increasing, it attains the maximum at the same model but will have a much simpler derivative. It will be useful for later developments to note the close relationship between the log-likelihood and the empirical entropy or finite sample average of entropy (see footnote 5), defined as [15]:

$$H_S(f_h) = -\frac{1}{N}\sum_{i=1}^{N} \log(f_h(x_i)). \qquad (1)$$

Its relation to the log-likelihood is readily apparent:

$$H_S(f_h) = -\frac{1}{N}\log(L_h(S)).$$

Then, maximizing the log-likelihood is equivalent to minimizing the empirical entropy, and since we find the notation simpler and the concepts richer, we choose to work with the latter.

An identical result can be obtained through a completely different (at first sight) approach using the relative entropy, also known in the literature as the Kullback-Leibler divergence, cross entropy, or asymmetric divergence [15]. It is defined, for two densities $f(x)$ and $g(x)$, as

$$D(f \,\|\, g) = \int f(x)\,\log\frac{f(x)}{g(x)}\,dx.$$

The relative entropy is a measure of the distance between two distributions (see footnote 6). Consequently it seems natural to define the best density estimate ($\hat f(x)$) as the one that minimizes the distance to the true unknown density ($f(x)$). The new score to minimize is [13]:

$$D(f \,\|\, \hat f) = \int f(x)\,\log\frac{f(x)}{\hat f(x)}\,dx = \int f(x)\log(f(x))\,dx - \int f(x)\log(\hat f(x))\,dx = -H(f) - E_f\big(\log \hat f(x)\big) \approx -H(f) - E_S\big(\log \hat f(x)\big) = -H(f) + H_S(\hat f).$$

Since the first term in the last expression is constant, the expression to minimize is identical to the one that was already found in Equation (1). The maximum likelihood model is the one that is closest to the true density, as measured by the relative entropy distance.
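The equivalence between maximizing the likelihood and minimizing the empirical entropy, and the link to the relative entropy, can be checked numerically. The sketch below is only an illustration under assumed settings that are not taken from the paper: a standard normal plays the role of the true density, three zero-mean Gaussians with hypothetical standard deviations form the candidate models, and all function names are made up for the example.

```python
import math
import random


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(sample, pdf):
    """log L(S): sum of log-densities over independent observations."""
    return sum(math.log(pdf(x)) for x in sample)


def empirical_entropy(sample, pdf):
    """H_S(f) = -(1/N) * sum_i log f(x_i), as in Equation (1)."""
    return -log_likelihood(sample, pdf) / len(sample)


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # assumed true process: N(0, 1)

candidate_sigmas = [0.5, 1.0, 2.0]  # hypothetical model family
loglik = {}
entropy = {}
for sigma in candidate_sigmas:
    pdf = lambda x, s=sigma: gaussian_pdf(x, s)
    loglik[sigma] = log_likelihood(sample, pdf)
    entropy[sigma] = empirical_entropy(sample, pdf)

best_by_likelihood = max(loglik, key=loglik.get)
best_by_entropy = min(entropy, key=entropy.get)

# Finite-sample estimate of D(f || f_sigma) = -H(f) + H_S(f_sigma),
# where H(f) is the known entropy of N(0, 1).
true_entropy = 0.5 * math.log(2.0 * math.pi * math.e)
kl_estimate = {sigma: entropy[sigma] - true_entropy for sigma in candidate_sigmas}

print(best_by_likelihood, best_by_entropy)
print(kl_estimate)
```

Both criteria select the same candidate by construction, since one score is a negative multiple of the other. The estimate of the relative entropy for the correct model is close to zero but, being computed from a finite sample, it need not be exactly zero or even non-negative.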
Footnote 5: Since the empirical entropy converges to the entropy $H(f) = -\int f(x)\log(f(x))\,dx$ as the sample size grows [34], we may abuse language and use the terms as synonyms.

Footnote 6: Strictly speaking it is not a distance since it is not symmetric and does not satisfy the triangle inequality. Nonetheless, it is always non-negative and zero if and only if f = g, and for this reason it is often useful to think of it as a distance between distributions [15].

The approximately

equal symbol in the last derivation entails that, when a finite sample is used, the score obtained is only an approximation to the true score for the model. The implications of this fact are discussed in Section 2.2.

Different metrics could have been chosen to measure the discrepancy between the estimated and true densities, and each choice would have resulted in a different (optimal) estimate. Our choice, based on classical information theory and statistics, tries to capture the order of the true density (reflected by a function of the quotient of the two densities being integrated) rather than its absolute value (as would be the case for the $L^p$ norm, where a function of the difference between the densities is integrated). An additional advantage of this choice is that it leads to more tractable calculations.

As Viola points out [13], there are three main reasons why maximum likelihood may fail to find an accurate model:

a) There are no sufficiently accurate models in the considered set of possible models. For this reason it is important to impose only the weakest assumptions on the density, in our case smoothness (Section 2.1.2);

b) The search for the best model may fail to discover the model that globally minimizes the entropy (because it gets stuck in a local minimum), even though it belongs to the considered set of possible models, hence the importance of a good optimization algorithm; and

c) Unlikely observations drawn from a model are only improbable, not impossible. If an unlikely sample is drawn from a model, it could well be assigned to another model that makes it more likely. This risk becomes smaller as the sample size grows.

2.1.2 The hypothesis space

Having provided a way to compare models, and from this to select the best one, the next step is to define the set of possible models, or hypothesis space, from which the best model should be chosen. Since we do not want to lose generality at this point by restricting our attention to a particular kind of density, we adopt the sample-based approach, where the sample itself defines the model. In particular, we are interested in the method of Kernel Density Estimation, also known as Parzen Window Density Estimation [16]. According to this method, in one dimension the density estimates are given by

$$\hat f_h(x, S) = \frac{1}{N h}\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right), \qquad (2)$$

where $K(x)$ (see footnote 7) is a probability density function known as the kernel function and $h$, apart from indexing the space, is the window width, also known as the bandwidth or the smoothing parameter. An alternative but otherwise equivalent expression can be given in terms of the convolution of the sample with the kernel,

$$\hat f_h(x, S) = \frac{1}{N h}\,K\!\left(\frac{x}{h}\right) * \sum_{i=1}^{N} \delta(x - x_i).$$

The role of the kernel is to spread the mass of each observation around its original position. Usually, $K$ is a unimodal even function, falling off quickly to zero. Bell shaped functions in general, and Gaussians in particular, are frequently used kernels. In this work we use the Von Mises kernel, which plays the role of the Gaussian density for angular data [17].

Footnote 7: To simplify the exposition we assume that the kernel is stretched by the bandwidth. For periodic kernels (in $[0, 2\pi)$ in our case) clearly this approach does not work, and the shape of the kernel must change as well when the bandwidth changes. See [17] for a detailed discussion.
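A kernel estimate for a single angular coordinate can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the sample of angles is synthetic, the concentration parameter `kappa` is a hypothetical stand-in for the bandwidth (for large values it behaves roughly like $1/h^2$), and the Bessel normalizer is computed with a plain power series that is only adequate for moderate `kappa`.

```python
import math
import random


def bessel_i0(kappa, terms=60):
    """Modified Bessel function I0 via its power series (moderate kappa only)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / (k + 1) ** 2
    return total


def circular_kde(angle, sample, kappa):
    """Average of von Mises kernels centred on each observed angle."""
    norm = 2.0 * math.pi * bessel_i0(kappa) * len(sample)
    return sum(math.exp(kappa * math.cos(angle - s)) for s in sample) / norm


rng = random.Random(1)
# Hypothetical sample of one torsion angle (radians), clustered near pi / 3.
angles = [rng.vonmisesvariate(math.pi / 3.0, 8.0) for _ in range(200)]
kappa = 20.0  # assumed concentration; larger values mean less smoothing

# Evaluate the estimate on a regular grid over [0, 2*pi) and check its mass.
grid = [2.0 * math.pi * k / 360 for k in range(360)]
density = [circular_kde(g, angles, kappa) for g in grid]
total_mass = sum(density) * (2.0 * math.pi / 360)
print(total_mass)
```

Because the kernel depends on the angle only through a cosine, the estimate is automatically periodic, which is why the kernel cannot simply be stretched by the bandwidth as in the non-periodic case.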

What was just presented in one dimension is easily generalized to higher dimensions. In $d$ dimensions, the equation for the kernel approximation (analogous to Equation (2)) is

$$\hat f_h(x, S) = \frac{1}{N h^d}\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right). \qquad (3)$$

2.1.3 Cross-validation

It was stated in Section 2.1.1 that one of the reasons that might lead the maximum likelihood criterion to perform poorly is the absence of adequate models in the hypothesis space. This could tempt the naïve user to include as many models as possible in that set. In particular, for the definition of the set of functions that we gave in the previous section, this means that no constraints are imposed on the bandwidth. This is not recommendable; let us see why. When the same sample is used both to approximate the density function and to estimate the entropy (likelihood), the expression for the entropy (from equations (1) and (3)) becomes:

$$H_S(\hat f_h) = -\frac{1}{N}\sum_{i=1}^{N} \log\big(\hat f_h(x_i, S)\big) = -\frac{1}{N}\sum_{i=1}^{N} \log\!\left[\frac{1}{N h^d}\sum_{j=1}^{N} K\!\left(\frac{x_i - x_j}{h}\right)\right]. \qquad (4)$$

Remember that Equation (4) assigns a score to each model/density $\hat f_h$. The lower the score, the better the model. The problem of density estimation can then be stated as finding the model, or in other words finding the bandwidth $h$, that minimizes Equation (4). Unfortunately, this expression has no minimum for $h \in (0, \infty)$: it tends to minus infinity as $h \to 0$, and along the way the corresponding selected model gets further and further away from the desired model. Since the mass of the kernel sits around the origin and falls off quickly as we move away from it,

$$\sum_{j=1}^{N} K\!\left(\frac{x_i - x_j}{h}\right) \xrightarrow[h \to 0]{} K\!\left(\frac{x_i - x_i}{h}\right) = K(0), \quad \text{and} \quad H_S(\hat f_h) \xrightarrow[h \to 0]{} -\log\!\left(\frac{K(0)}{N h^d}\right) \to -\infty.$$

As a result, the density estimate tends to the function that has one delta at each one of the sample points. The solution was over-trained to fit the data, disregarding any previous knowledge about the true density we may have had. This stresses the importance of carefully choosing the hypothesis space. What is known about the functions should be included in the definition of the hypothesis space to avoid overfitting, but without overdoing it, which would risk excluding the correct density from the hypothesis space. A possible escape from this situation is to set a lower limit for the bandwidth. But how to select this limit in a sensible way is far from trivial. Furthermore, this limit should depend on the sample size and on the unknown density itself.
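The divergence of Equation (4) is easy to observe numerically. The following sketch assumes a one-dimensional Gaussian kernel and a small synthetic sample (both are choices made for the illustration, not taken from the paper); it scores the same points that build the density and compares the score with the limit $-\log(K(0)/(N h))$ discussed above.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def resubstitution_entropy(sample, h):
    """Equation (4) with d = 1: the same points build the density and score it."""
    n = len(sample)
    total = 0.0
    for xi in sample:
        density = sum(gaussian_kernel((xi - xj) / h) for xj in sample) / (n * h)
        total += math.log(density)
    return -total / n


rng = random.Random(2)
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]  # hypothetical sample

bandwidths = [1.0, 1e-1, 1e-2, 1e-3, 1e-4]
scores = [resubstitution_entropy(sample, h) for h in bandwidths]
# Limit driven by the j = i term alone: -log(K(0) / (N * h)).
limits = [-math.log(gaussian_kernel(0.0) / (len(sample) * h)) for h in bandwidths]

for h, score, limit in zip(bandwidths, scores, limits):
    print(f"h={h:g}  H_S={score:.3f}  limit={limit:.3f}")
```

The score can never exceed the limit, because the term with j = i alone already contributes $K(0)$ to every inner sum; as the bandwidth shrinks the two become nearly equal and both decrease without bound.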
We rather use another approach instead, known as cross-validation or leave-one-out [13]. Alternative approaches for estimating the bandwidth can be found in [16, 18]. Since the problem originated from using the same sample twice (both to construct the density function and to estimate the entropy by evaluating the density at the sample points), the cross-validation technique splits the sample in two (or uses two different samples if possible), and uses one part to construct the density and the other to estimate the entropy. It may seem at first a waste of data to use some sample points to compute the entropy, instead of using them to estimate the density, which is ultimately what we want to do. Cross-validation cleverly solves this problem by spending only one of the points of the sample: one part of the set contains all the points but one, and is used to construct the density, while the other part has a single point that is used to evaluate the density (i.e., the density is computed at this point).

This process is repeated $N$ times, leaving out each point once, and obtaining a corresponding density estimate each time. The contributions of all the points to the entropy are then added together to get a final estimate of the entropy. The new expression for the entropy that has to be minimized is:

$$H_S(\hat f_h) = -\frac{1}{N}\sum_{i=1}^{N} \log\!\left[\frac{1}{(N-1)\,h^d}\sum_{\substack{j=1 \\ j \neq i}}^{N} K\!\left(\frac{x_i - x_j}{h}\right)\right]. \qquad (5)$$

Although the phrase "repeated N times" may suggest the opposite, the comparison of equations (4) and (5) shows that the method with cross-validation takes almost the same amount of time.

2.1.4 Putting things together: A simple example

Having described a criterion to compare models (cross-validation maximum likelihood, Sections 2.1.1 and 2.1.3), and a set of models from which to choose (a family of kernel approximated functions, Section 2.1.2), we reach a point where it is helpful to present an example showing how density estimation actually works.

For the sake of the example, suppose that a density f(x), shown with a black line in Figure 1a, has to be estimated from a sample consisting of 00 points, shown in the same figure as vertical red lines. Note that f(x) is discontinuous and thus does not belong to the family of functions that can be constructed by placing smooth kernels on finite sample points (it is therefore not in our hypothesis space), but it can be approximated. The sample is used to define a family of density functions (via the kernel method explained in Section 2.1.2), where each member is characterized by a bandwidth value. Using Equation (5), a score (or entropy value H) is computed for each member of the family. Figure 1b shows the correspondence between those values (h, the bandwidth, and H, the entropy). As explained in Section 2.1.1, the function having the lowest entropy (marked with a red circle in Figure 1b) is found (using classic gradient descent optimization [19]), and selected as the one that best estimates the true density. This function is depicted in red in Figure 1c, along with two other members of the same family of functions: one (in blue) having a smaller bandwidth (blue circle in Figure 1b), which was over-trained/over-fitted to the data, showing excessive oscillations, and the other (in green) having a larger bandwidth (green circle in Figure 1b), which was over-smoothed, losing part of the structure of the true density.
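The leave-one-out score of Equation (5) and the bandwidth selection of this example can be sketched in a few lines. Several choices below are assumptions made for the illustration rather than details from the paper: the sample is a synthetic bimodal mixture instead of the density of Figure 1a, the kernel is Gaussian in one dimension, and the minimum is located by a plain grid search over hypothetical bandwidth values rather than by gradient descent.

```python
import math
import random


def loo_entropy(sample, h):
    """Equation (5) with d = 1 and a Gaussian kernel (log-sum-exp for stability)."""
    n = len(sample)
    log_norm = math.log((n - 1) * h * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, xi in enumerate(sample):
        # Kernel exponents for every point except the one left out.
        exponents = [-0.5 * ((xi - xj) / h) ** 2
                     for j, xj in enumerate(sample) if j != i]
        peak = max(exponents)
        log_sum = peak + math.log(sum(math.exp(e - peak) for e in exponents))
        total += log_sum - log_norm
    return -total / n


rng = random.Random(3)
# Hypothetical bimodal sample standing in for the example density.
sample = [rng.gauss(-2.0, 0.5) if rng.random() < 0.5 else rng.gauss(2.0, 1.0)
          for _ in range(100)]

bandwidths = [0.02, 0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.2, 2.0, 4.0]
scores = [loo_entropy(sample, h) for h in bandwidths]
best_index = min(range(len(bandwidths)), key=scores.__getitem__)
best_h = bandwidths[best_index]
print(best_h, scores[best_index])
```

Unlike the resubstitution score, the leave-one-out score penalizes very small bandwidths, because a left-out point that has no close neighbour receives almost no density. The selected bandwidth therefore falls strictly inside the grid instead of collapsing toward zero.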
The corresponding kernels used for the approximation are shown in Figure 1d.

2.2 Quality of the estimates

In the previous section, two qualitatively different but interrelated estimates were obtained: density estimates and entropy estimates. It is our interest in this section to study the quality of both of them. In Section 2.2.1, the discrepancy between the density estimate and the true density is analyzed. Since our goal is to estimate a density, we find it particularly important to get a feeling for the kind of errors we should expect. In Section 2.2.2 the error in the score (entropy) is studied. Since this score is used to compare and choose models, the error in the score will clearly affect the chances of selecting a good model.

Figure 1: The density estimation process. a) The true density and the sample. b) The entropy landscape. c) Three estimates corresponding to three different bandwidths: the red, green and blue are the best, over-smoothed and under-smoothed estimators respectively. d) The kernels used for each of the estimates in c).

2.2.1 The density estimate

An estimate for the density has been obtained in the previous section. But how is it related to the true density? Informally, we could say that the estimate will be a smoothed version of the true density, plus random noise [16]. This can be seen by considering the expectation of the estimate at a single point in the scalar case. By definition, this is not an unbiased estimator of the density (see footnote 8). The bias and variance of the density estimate at a point (both are pointwise measures) can be easily computed [16]:

$$\mathrm{bias}_h(x) = \frac{h^2 \sigma_K^2}{2}\, f''(x) + \text{higher-order terms in } h, \qquad \sigma_K^2 = \int t^2 K(t)\,dt,$$

$$\mathrm{var}_h(x) = \mathrm{var}\big(\hat f_h(x, S)\big) \approx \frac{f(x)}{N h}\int K(t)^2\,dt.$$

Let us illustrate these concepts by continuing with the example in Section 2.1.4. To see the variability inherent to the process of drawing ensembles, we repeated 30 times the process carried out in the example above. Each time we drew an ensemble S, estimated the best density and plotted the result in Figure 2a. For reference, the true density was plotted in black as before. Notice that each estimate is quite different from the others, but all surround the experimental average (or expected) density, which is plotted in red. As was mentioned before, this density is a smoothed version of the true density. Figure 2b shows a band of width two standard deviations around the average density.

Footnote 8: In theory, the estimation could be unbiased when $h \to 0$ and the kernel tends to the delta function, or when f(x) has bounded frequency content and the kernel is an ideal low pass filter in the support of F(w), the Fourier transform of f(x).
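The statement that the expected estimate is a smoothed version of the true density, and the leading bias term of Section 2.2.1, follow from a short calculation. The derivation below is a sketch under standard assumptions that are not spelled out in the text: the kernel K is symmetric with unit mass and finite second moment $\sigma_K^2$, and f is twice differentiable at x.

```latex
\begin{aligned}
E\big[\hat f_h(x, S)\big]
  &= \frac{1}{N}\sum_{i=1}^{N} E\!\left[\frac{1}{h} K\!\left(\frac{x - x_i}{h}\right)\right]
   = \int \frac{1}{h} K\!\left(\frac{x - y}{h}\right) f(y)\,dy
   && \text{(convolution of } f \text{ with the scaled kernel)} \\
  &= \int K(t)\, f(x - h t)\,dt
   && \text{(substituting } y = x - h t\text{)} \\
  &= f(x) - h f'(x)\int t\,K(t)\,dt + \frac{h^2}{2} f''(x)\int t^2 K(t)\,dt + o(h^2) \\
  &= f(x) + \frac{h^2 \sigma_K^2}{2} f''(x) + o(h^2)
   && \text{(the odd moment vanishes by symmetry).}
\end{aligned}
```

The second line shows the smoothing explicitly, and the last line shows why the bias is largest where the curvature of f is large, such as at sharp rises and decays of the density.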

The important point to remember from this section is that the (pointwise) bias increases as the bandwidth increases, while the opposite is true for the variance. Cross-validation maximum likelihood is the judge that finds the compromise between the two, keeping the bias as low as possible without increasing the variance too much. To avoid excessive variance, some bias must be introduced in the form of smoothing. This smoothing produces greater deterioration on the fast rising and decaying parts of the density, since those are the ones containing higher frequency components (Figure 2b).

2.2.2 The entropy estimate

As we saw in Section 2.1.1, the sample S has a corresponding entropy value for each model in the hypothesis space. This value can be used, through the maximum likelihood principle, to find the best model in the space. As expected from a score used for comparing models, it is a measure of the global fit of each model (recall the discussion about relative entropy at the end of Section 2.1.1). Since we only have limited information about the process (contained in the finite sample available), we do not expect this score to be infallible in discriminating between models. It is only an estimate of the goodness of fit between the model and the process. And as an estimate, it is perturbed by noise.

To illustrate this we return to the example. To create Figure 2a, multiple ensembles were drawn from the process and corresponding density estimates were computed. Together with each density estimate came a score (the lowest entropy) estimate. These are plotted as blue circles in Figure 3. Using the fact that the distribution of the entropy estimate is asymptotically normal [20], a Gaussian was fitted to them. For comparison, the true entropy value of the true density was also plotted as a black vertical line. Although all the samples were drawn from the same process, the computed entropies differ from the true entropy. Moreover, they do not even surround the true entropy. This bias has its origin in the loss of score (entropy) due to the smoothing applied, and it could be compensated (in theory) using techniques similar to those that we will later develop to assess the variance. But should it be compensated?
We find that it should not, since it corresponds to a real deterioration of the density estimate that should be taken into account when comparing models. Moreover, compensating the score only affects the result of the comparison but does not improve the density estimate.

Figure 2: From each sample a different estimate of the density is obtained. a) Multiple density estimates (in blue) from multiple samples S. The average estimate is shown in red. b) A band (in cyan) of two standard deviations around the pointwise average of the estimates (in red).

We also note a curiosity in Figure 3: there are some density estimates that explain the data better than the true density. This is because these estimates are only trying to explain the points in the observed ensemble. The red line in the figure corresponds to the entropy of the average

density.

Figure 3: Each density estimate has a corresponding entropy estimate. Density of the entropy estimates in blue, true entropy in black and entropy of the average density in red.

How, then, can a judgment between models be attributed to an intrinsic difference between the models and not to random effects produced by the noise? We first need an estimate for the variance of the entropy estimate. Joe [21] derived explicit expressions for the bias and variance of this estimate. Unfortunately the true density is required in the computation, making it of little practical use for us, except for providing an idea of the order of convergence of the errors in N (the sample size), h (the bandwidth) and d (the dimension of the domain). A practical alternative is explored in the next section.

2.2.3 The bootstrap estimate

Bootstrap [22] is a useful tool to compute error measures for density estimate functionals. The idea is similar to what was done in Figure 3. In order to compute the variability (or even the distribution) of the estimate, we take many samples from the process and compute the function of interest for each one (Figure 4a). The problem with this approach is that usually we only have one sample S from the process to be studied, and it is not possible to get more. The solution proposed by the bootstrap technique is to generate (directly or indirectly) new samples from the original sample S, and use these to assess the variability of the functional (Figures 4b and 4c). The classic bootstrap method works as follows (see Figure 4b):

1. A sample of size #S is drawn, with replacement, from the original sample S.

2. This new sample is used to compute the desired functional (in our case, the sample is first used to estimate a density that is then plugged into the entropy functional).

3. Steps 1 and 2 are repeated to obtain many estimates, from which an estimate for the variance of the desired functional (the entropy in our case) is obtained.

As just presented, this method cannot be used in our case, since repeated sample points will force the cross-validation scheme to choose the solution with null bandwidth and deltas at the data points (as explained in Section 2.1.3).
Instead, we use a variation of the classical bootstrap technique, known as smoothed bootstrap [22, 23]. In this approach, we draw the new sample from the estimate of the density (that we have to find anyway), and not from the original sample itself (see Figure 4c):

1. A density estimate is constructed from the sample, as explained in Section 2.1.

2. A new sample is drawn from this density and used to compute the desired functional; the form of the density estimate makes drawing samples from it particularly easy.

3. As in the classic bootstrap, step 2 is repeated to obtain many estimates that will be used to

calculate the variance of the desired functional. Step 1 is performed only once and the obtained density is used many times in step 2 to draw further samples.

Figure 4: The Bootstrap and the Smooth Bootstrap. [Diagram. Panel a): true density (known) $f(x)$, samples (as many as desired) $S_1, \ldots, S_D$, density estimates $\hat f_1(x), \ldots, \hat f_D(x)$, entropy estimates $\hat H_1, \ldots, \hat H_D$. Panel b): true density (unknown) $f(x)$, sample (only one) $S$, bootstrap resamples $S^*_1, \ldots, S^*_B$, bootstrap density estimates $\hat f^*_1(x), \ldots, \hat f^*_B(x)$, entropy estimates $\hat H^*_1, \ldots, \hat H^*_B$. Panel c): true density (unknown) $f(x)$, sample (only one) $S$, density estimate (only one) $\hat f_{h_{SB}}(x)$ with entropy $\hat H$, smooth bootstrap resamples $S^*_1, \ldots, S^*_B$, bootstrap density estimates $\hat f^*_1(x), \ldots, \hat f^*_B(x)$, entropy estimates $\hat H^*_1, \ldots, \hat H^*_B$.] a) Situation in the example of Figure 3. Since the true density is known, it is possible to draw multiple samples $\{S_1, \ldots, S_n\}$, and from each one of them estimate a new density and corresponding entropy. b) Classic bootstrap approach: only one sample S is available and the new samples are generated from it. c) Smoothed bootstrap technique: a density estimate $\hat f_{h_{SB}}(x)$ is first constructed and then used to draw new samples.

The rationale behind the idea of the bootstrap (assess the variability of $\hat H$ from that of $\hat H^*_1, \ldots, \hat H^*_B$) is supported by the fact that $(\hat H - H)$ and $(\hat H^* - \hat H)$ have the same limit distribution [24] (see Figure 4 for the exact definition of these variables).

Let us use the example started in Section 2.1.4 to exemplify these concepts. Taking advantage of the fact that the true density function is known (in contrast to what usually happens in a real scenario), we can proceed as in Figure 2a to get many density estimates from corresponding ensembles drawn from the process. Then, each estimate (e.g., the i-th) is used in two ways (see Figure 5). First, to compute an estimate for the entropy ($\hat H_i$) and second to compute a collection of ensembles ($S_i^1, \ldots, S_i^B$) that in turn is used to produce new density

($\hat f_i^1(x), \ldots, \hat f_i^B(x)$) and corresponding entropy ($\hat H_i^1, \ldots, \hat H_i^B$) estimates. The idea is to examine the viability of using the variability of $\hat H_i^1, \ldots, \hat H_i^B$ to estimate the variability of $\hat H_i$. In general we will only have one estimate of the score ($\hat H$ in Figure 4c) instead of the many estimates ($\hat H_i$ in Figure 5) artificially generated in this case (which was possible because we know the true density). Thus, we will have to resort to bootstrap resamples for an estimate of the variability of the score.

Figure 6a shows the densities for the different scores appearing in Figure 5 for our toy example. The true entropy, which is the entropy computed from the true density, is shown in black. The density of the entropy estimates $\hat H_i$, computed from the density estimates $\hat f_i(x)$, is shown in blue. The densities of the entropy estimates $\hat H_i^j$, computed from the bootstrap density estimates ($\hat f_i^j(x)$), are shown in red. Figure 6b compares the estimate for the variance of the score ($\sigma^2(\hat H_i^1, \ldots, \hat H_i^B)$) to its real value ($\sigma^2(\hat H_1, \ldots, \hat H_D)$). This estimate for the variance produced values above the true value but of the expected order.

Setting aside the problem produced by the repeated points in the classic bootstrap resample (which might lead to estimating zero bandwidth kernels), it is not clear whether using the smoothed bootstrap will improve the performance of the estimator over the classic bootstrap in this case, and if it does, how the smoothing bandwidth ($h_{SB}$ in Figure 4c) should be found [25, 26]. It is known that the density that is used to create the resamples in the bootstrap world should be based on a larger bandwidth $h_{SB}$ than the original one (obtained using cross-validation maximum likelihood) [24], but there are no specific rules to choose it. Under these conditions, we decided to use the original bandwidth, aware of the fact that this matter requires further research. In Section 2.3.3 below we will use the smoothed bootstrap to find an estimate of the error in the scores of the models to be compared, and we will take these errors into account to perform the comparison.
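The smoothed bootstrap steps can be sketched as follows. This is a simplified illustration under assumptions that are not from the paper: a one-dimensional Gaussian kernel, a synthetic sample, a fixed hypothetical bandwidth that is reused for every resample instead of being re-selected by cross-validation, and an arbitrary number of resamples. Drawing from a kernel estimate amounts to picking an observation at random and adding noise distributed as the scaled kernel.

```python
import math
import random
import statistics


def loo_entropy(sample, h):
    """Leave-one-out entropy (Equation (5), d = 1, Gaussian kernel)."""
    n = len(sample)
    log_norm = math.log((n - 1) * h * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, xi in enumerate(sample):
        exponents = [-0.5 * ((xi - xj) / h) ** 2
                     for j, xj in enumerate(sample) if j != i]
        peak = max(exponents)
        total += peak + math.log(sum(math.exp(e - peak) for e in exponents)) - log_norm
    return -total / n


def draw_from_kde(sample, h, size, rng):
    """Smoothed bootstrap resample: pick an observation, then add kernel noise."""
    return [rng.choice(sample) + h * rng.gauss(0.0, 1.0) for _ in range(size)]


rng = random.Random(4)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]  # hypothetical single ensemble S
h = 0.4             # assumed bandwidth, e.g. the one chosen by cross-validation
n_resamples = 30    # B, chosen arbitrarily for the sketch

entropy_hat = loo_entropy(sample, h)
bootstrap_entropies = []
for _ in range(n_resamples):
    resample = draw_from_kde(sample, h, len(sample), rng)
    # Simplification: the bandwidth is reused, not re-selected per resample.
    bootstrap_entropies.append(loo_entropy(resample, h))

variance_estimate = statistics.variance(bootstrap_entropies)
print(entropy_hat, math.sqrt(variance_estimate))
```

Because the kernel noise is continuous, the resamples contain no repeated points, which avoids the degenerate zero-bandwidth solution that repeated points can cause in the classic bootstrap.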
Figure 5: Simulating the Bootstrap. Relationship between samples and estimates used in Figure 6. See text for details.

2.3 Curse of dimensionality

If we had a sample containing a very large number of observations (relative to the dimension of the space), the problem would be solved by following the framework described above. Unfortunately, this is not usually the case. Even for very small proteins, the number of degrees of freedom is in the tens or hundreds, and therefore the number of structures needed to decently estimate the density is prohibitively large. This problem is very well known in statistics and has been dubbed the curse of dimensionality. In this section we suggest a possible solution, which is applicable when the data has a certain property that will allow us to make, for a given error, the sample size virtually independent of the number of residues (degrees of freedom) in the protein. This is one of the main contributions of this article.

Figure 6: Densities for the different scores defined in Figure 5 for the toy example. a) True entropy in black, density of the estimate in blue, and estimates of the density estimate in red. b) True variance in blue and density of the variance estimate in red.

So far, any reference to the particular source of the information was avoided; what was described in previous sections is valid for any dataset. However, at this point, significant simplifications can be obtained by exploiting an intrinsic characteristic of the ensembles, for example, of folded protein conformations. In the following, we restrict our attention to this kind of dataset, although the same applies to any dataset with the property we introduce in the next paragraph.

An ensemble of folded protein conformations, characterized for example by the backbone torsion angles, has the desirable property that each angle is mainly related to a small subset of the other angles and, more importantly, is virtually independent of the rest. What happens to one angle of the conformation is mainly affected by the previous and following angles along the chain and perhaps by those angles in its spatial proximity. This is so because a relation between angles that are far away will be hard to explain without the inclusion of an angle lying in between (the influence has to be transmitted in some way) and because the number of residues surrounding a given residue is bounded (due to packing considerations). We now formalize these concepts.

2.3.1 Divide and conquer

We set out to estimate the density $p(x_1, x_2, \ldots, x_d)$ (footnote 9) of a $d$-dimensional random variable (recall from the beginning of Section 2 that $d = 2(M-1)$, where $M$ is the number of residues in the protein).

Footnote 9: In previous sections the density to be estimated was called f(.) in agreement with the literature on density estimation. However, at this point, where properties of probability densities are being invoked, the use of p(.) seems more natural.

This density can be written as the product of two independent factors,

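$$p(x_1, x_2, \ldots, x_d) = p(x_{i_1} \mid x_{i_2}, \ldots, x_{i_d}) \cdot p(x_{i_2}, x_{i_3}, \ldots, x_{i_d}).$$

The $i$-subscripts in the right hand side were introduced to account for the fact that any coordinate (arbitrarily chosen) can be factored out independently of its position in the original vector. Applying this step inductively, the density can be written as the product of $d$ independent factors:

$$p(x_1, x_2, \ldots, x_d) = p(x_{i_1} \mid x_{i_2}, \ldots, x_{i_d}) \cdot p(x_{i_2} \mid x_{i_3}, \ldots, x_{i_d}) \cdots p(x_{i_{d-1}} \mid x_{i_d}) \cdot p(x_{i_d}) \tag{6}$$

By construction, and in contrast to Hinton's PoE [0], the factors (experts) in (6) are independent. Since, as stated above, only a few torsion angles (strongly) influence a specific angle, the rest can be discarded from the conditioning set for that particular angle. Assume that we know that any angle is (strongly) influenced by no more than $n$ others (with $n$ much smaller than $d$); then (see footnote 10)

$$p(x_1, x_2, \ldots, x_d) \approx p(x_{i_1} \mid x_{i_1^1}, \ldots, x_{i_1^n}) \cdot p(x_{i_2} \mid x_{i_2^1}, \ldots, x_{i_2^n}) \cdots p(x_{i_{d-1}} \mid x_{i_d}) \cdot p(x_{i_d}).$$

The original problem of estimating a $d$-dimensional density was reduced to that of estimating $d$ independent densities in dimension $(n+1)$ or lower. Using properties of the protein conformations, we have then significantly reduced the problem dimensionality (assuming of course that $n$ is significantly lower than $d$).

For our purposes, it is more practical, and equally valid, to stop the factorization earlier than before and consider the following factorization in (almost) uniform dimension factors:

$$p(x_1, x_2, \ldots, x_d) \approx p(x_{i_1} \mid x_{i_1^1}, \ldots, x_{i_1^n}) \cdot p(x_{i_2} \mid x_{i_2^1}, \ldots, x_{i_2^n}) \cdots p(x_{i_{d-n}} \mid x_{i_{d-n}^1}, \ldots, x_{i_{d-n}^n}) \cdot p(x_{i_{d-n+1}}, \ldots, x_{i_d}) \tag{7}$$

Would it have been better to factor out more than one variable at a time? That is, could the following be a better factorization than (7)?

$$p(x_1, x_2, \ldots, x_d) \approx p(x_{i_1}, x_{i_2} \mid x_{i^1}, \ldots, x_{i^n}) \cdots p(x_{i_{d-n+1}}, \ldots, x_{i_d}) \tag{8}$$

It is easy to see that the answer is no, and thereby (7) is optimal in this sense. The entropy corresponding to the density in (7) is (see Section .. and [5] for the basic properties of the entropy that are used):

$$H(x_1, x_2, \ldots, x_d) = H(x_{i_1} \mid x_{i_1^1}, \ldots, x_{i_1^n}) + H(x_{i_2} \mid x_{i_2^1}, \ldots, x_{i_2^n}) + \cdots + H(x_{i_{d-n}} \mid x_{i_{d-n}^1}, \ldots, x_{i_{d-n}^n}) + H(x_{i_{d-n+1}}, \ldots, x_{i_d}) \tag{9}$$

From the considerations made in Section .., it follows that this is the expression that has to be minimized. In Section . we showed how to compute each of the summands in (9) (see footnote 11).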
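The conditional summands in (9) can be obtained from joint entropies through the chain rule, $H(x \mid C) = H(x, C) - H(C)$. A minimal sketch follows, assuming a product Gaussian kernel with a single fixed bandwidth `h` and a leave-one-out estimate; both are simplifications for illustration (the kernels and the cross-validated bandwidth of this work are not reproduced), and the names `joint_entropy` and `conditional_entropy` are hypothetical.

```python
import numpy as np

def joint_entropy(data, cols, h=0.3):
    """Leave-one-out KDE estimate of the joint entropy of the selected columns."""
    x = np.asarray(data, dtype=float)[:, list(cols)]
    n, k = x.shape
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2) / h**2
    kern = np.exp(-0.5 * sq) / ((h * np.sqrt(2.0 * np.pi)) ** k)
    np.fill_diagonal(kern, 0.0)            # leave each observation out of its own estimate
    return -np.mean(np.log(kern.sum(axis=1) / (n - 1)))

def conditional_entropy(data, target, cond, h=0.3):
    """H(x_target | x_cond) = H(x_target, x_cond) - H(x_cond)."""
    cond = list(cond)
    if not cond:
        return joint_entropy(data, [target], h)
    return joint_entropy(data, [target, *cond], h) - joint_entropy(data, cond, h)
```

With such a function, every summand of (9) can be evaluated once and cached, since the same conditional entropies are requested many times while candidate factorizations are compared.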
In the next sections we explain how to find the new parameters introduced in this section, namely the $i$-indexes (the conditioned and conditioning variables) and the number $n$ of conditioning variables. In this search it will be necessary to access the values of the entropy summands repeatedly; for that reason it makes more sense to compute these summands in advance, store them in a database and only then start the search. It can be fairly objected that since the number of sets of $n$ variables grows very fast as $n$ increases, this database can be impractically big. Our experiments show that, at least for the tested datasets, the interesting subsets of $n$ variables can be found incrementally by adding variables to the interesting subsets of $(n-1)$ variables. Figure 7a presents an example of this fact, extracted from the dataset used in Section 3.3. This fact has been extensively observed in this dataset and others, and is documented in Figure 7b.

Footnote 10: For simplicity we assume the number of conditioning variables ($n$) to be equal in every factor, but this is not strictly necessary.

Footnote 11: Actually, we only showed how to compute entropies of the form $H(x_1, \ldots, x_n)$. Entropies of the form $H(x_1 \mid x_2, \ldots, x_n)$ can be computed simply using the relationship $H(x_1 \mid x_2, \ldots, x_n) = H(x_1, \ldots, x_n) - H(x_2, \ldots, x_n)$.

Figure 7: Progressive discovery. a) One example: the best set of two conditioning variables for variable x is a superset of the best set of one conditioning variable. In this graph the red stars show the entropy value for the corresponding sets (a selection of them is also named in red). The blue lines connect each conditioning set with its best subset having one less variable. b) 2D-Histogram of points of the form $(I_c, I_p)$ corresponding to the blue lines in Figure 9, where $I_c$ is the index (normalized to be between 0 and 1) of the entropy of the child set in the sorted set of entropies of children and $I_p$ is the index of the entropy of the parent set in the sorted set of entropies of parents. For example, the line connecting H( 37 38) and H( 37) in a) corresponds to one observation of coordinates (0,0) in this graph, since both sets have the lowest entropy in their respective levels.

2.3.2 Discovering the dependences

Let us first fix $n$, and find the product of densities of the form in Equation (7) that best explains the data (has the smaller entropy) for that $n$. Let us first introduce some notation to simplify the exposition. Let $I_{1,d}$ be a permutation of the integers from 1 to $d$ (e.g., $I_{1,5} = (3,5,1,4,2)$), $I_{j,d}$ be the indices of that permutation from the position $j$ to $d$ (e.g., $I_{3,5} = (1,4,2)$), and $I^{j}_{1,d}$ the $j$-th index of that permutation (e.g., $I^{3}_{1,5} = 1$ for the example above). Also let $C_j^n = \{x_{i_j^1}, \ldots, x_{i_j^n}\}$ be the set of $n$ coordinates conditioning the coordinate $x_{i_j}$ in equations (7) and (9). What is left to do then is to find the $i$-indexes subject to the conditions:

1. $(i_1, \ldots, i_d) = I_{1,d}$: the $i$-indexes must define a permutation of the coordinates (since every coordinate is eventually factored out, but the order is arbitrary); and
2. $C_j^n \subseteq I_{j+1,d}$: the set of conditioning variables for $i_j$ is a subset of the variables that follow $i_j$ in the permutation (since only those variables not yet factored out can be used to condition). As stated at the beginning of this section, it must contain $n$ variables.

Once the permutation $I_{1,d}$ is found (we will explain how we do this below), and assuming that all the terms of the form $H(x_{i_j} \mid C_j^n)$ have been previously computed and stored in a database, the conditioning coordinates $C_j^n$ for each conditioned variable $i_j$ can be straightforwardly discovered with the following simple rule: just consider from the set $I_{j+1,d}$ the $n$ indexes $\{i_j^1, \ldots, i_j^n\}$ that minimize $H(x_{i_j} \mid C_j^n)$. It can be shown that for a given permutation, no other conditioning set can do better. Having established this, the notation can be extended further to rewrite Equation (9) as

$$H^n(I_{1,d}) = H(I^{1}_{1,d} \mid C_1^n) + H(I^{2}_{1,d} \mid C_2^n) + \cdots + H(I^{d-n}_{1,d} \mid C_{d-n}^n) + H(I_{d-n+1,d}) \tag{10}$$

where the $C^n$'s are calculated as just explained. This notation stresses the fact that only the permutation has to be found; the other sets follow using the simple rule explained above. Incidentally, we see that $H^n(I_{1,d})$ is not affected by permutations of the last $n$ positions of $I_{1,d}$.

To select a good permutation $I_{1,d}$, a simple genetic algorithm based on the ideas of mutation and selection was used. It basically works as follows:

1. Select a random initial permutation $I_{1,d}$ (e.g., $I_{1,5} = (3,5,1,4,2)$, assuming $d = 5$ for the sake of the example).
2. Evaluate its entropy, $H = H^n(I_{1,d})$, using the pre-stored values.
3. Choose two positions at random (e.g., 2 and 4), and swap those coordinates (to obtain $I'_{1,5} = (3,4,1,5,2)$).
4. Evaluate the entropy for the new permutation, $H' = H^n(I'_{1,d})$.
5. If the new permutation is better than the previous one ($H' < H$), keep it ($I_{1,d} \leftarrow I'_{1,d}$).
6. While the entropy $H$ is decreasing and the maximum number of iterations has not been reached, return to step 3.

Two modifications to this algorithm were found to improve its performance. First, only adjacent positions were swapped in Step 3, allowing for a more efficient computation of $H'$ in Step 4; and second, the swaps are accepted or rejected with a certain probability that depends exponentially on $\Delta H = H' - H$.

By its nature, this algorithm is prone to find local minima. To avoid or reduce this problem the algorithm is run many times for each value of $n$, and in each case the best model is kept for that particular $n$.

Intuitively, for small $n$'s, each factor in Equation (7) will be well estimated (since the density being estimated is low dimensional), but the dependences between variables might be lost. Conversely, for large $n$'s, the dependences will be captured, but the quality of each factor will be deteriorated. Clearly a compromise must be made. This is detailed next.
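The permutation search described above, with adjacent swaps and an exponential acceptance probability, can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: joint entropies come from a toy Gaussian with known covariance instead of a pre-computed database, and the acceptance scale `temp`, the iteration count and all function names are hypothetical choices.

```python
import itertools
import math
import random

import numpy as np

def gaussian_entropy(cov, cols):
    """Exact joint entropy of the selected coordinates of a Gaussian (toy stand-in for the database)."""
    cols = list(cols)
    if not cols:
        return 0.0
    _, logdet = np.linalg.slogdet(cov[np.ix_(cols, cols)])
    return 0.5 * (len(cols) * math.log(2.0 * math.pi * math.e) + logdet)

def model_entropy(perm, n, joint_h):
    """Entropy of the factorized model for a permutation, as in Equation (10)."""
    d = len(perm)
    total = joint_h(perm[d - n:])          # joint entropy of the last n coordinates
    for j in range(d - n):
        tail = perm[j + 1:]
        # Best conditioning set: the n later variables minimizing H(x_j | C).
        total += min(joint_h((perm[j], *c)) - joint_h(c)
                     for c in itertools.combinations(tail, n))
    return total

def search_permutation(d, n, joint_h, iters=2000, temp=0.05, seed=0):
    """Adjacent-swap search; worse swaps are accepted with probability exp(-dH / temp)."""
    rng = random.Random(seed)
    perm = list(range(d))
    rng.shuffle(perm)
    h = model_entropy(perm, n, joint_h)
    best_perm, best_h = perm[:], h
    for _ in range(iters):
        i = rng.randrange(d - 1)
        cand = perm[:]
        cand[i], cand[i + 1] = cand[i + 1], cand[i]
        h_new = model_entropy(cand, n, joint_h)
        dh = h_new - h
        if dh < 0 or rng.random() < math.exp(-dh / temp):
            perm, h = cand, h_new
            if h < best_h:
                best_perm, best_h = perm[:], h
    return best_perm, best_h
```

In practice the search would be restarted from several random permutations for each $n$, keeping the lowest-entropy result, as the text recommends for reducing the impact of local minima.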

2.3.3 Selecting the order n

In Section ..3 we explained how to use smoothed bootstrap to compute the variance of entropy estimates like the summands $H(I^{j}_{1,d} \mid C_j^n)$ in Equation (10). We first extend this result to compute the variance for the entropy of the permutations, $H^n(I_{1,d})$. The summands in the right hand side of Equation (10) are independent of each other, and hence the variance of this sum is the sum of the variances of each summand:

$$\sigma^2\left(H^n(I_{1,d})\right) = \sum_{j=1}^{d-n} \sigma^2\left(H(I^{j}_{1,d} \mid C_j^n)\right) + \sigma^2\left(H(I_{d-n+1,d})\right).$$

Also, recall that each summand in Equation (10) is approximately normally distributed, and then its sum also is.

At this point of the algorithm, several models have been found, one for each $n$. Each model has a corresponding score (entropy) as determined by the maximum likelihood principle (see Section ..), and each score has a corresponding variance, as computed using smooth bootstrap (see Section ..3). To sum up, the results obtained so far look like this:

$$\begin{array}{clcc}
n & \text{Model} & \text{Entropy} & \text{Variance} \\
0 & p(x_{i_1}) \cdot p(x_{i_2}) \cdots p(x_{i_d}) & H^0(I_{1,d}) & \sigma^2\left(H^0(I_{1,d})\right) \\
1 & p(x_{i_1} \mid x_{i_1^1}) \cdot p(x_{i_2} \mid x_{i_2^1}) \cdots p(x_{i_d}) & H^1(I_{1,d}) & \sigma^2\left(H^1(I_{1,d})\right) \\
\vdots & \vdots & \vdots & \vdots \\
m & p(x_{i_1} \mid x_{i_1^1}, \ldots, x_{i_1^m}) \cdot p(x_{i_2} \mid x_{i_2^1}, \ldots, x_{i_2^m}) \cdots p(x_{i_{d-m+1}}, \ldots, x_{i_d}) & H^m(I_{1,d}) & \sigma^2\left(H^m(I_{1,d})\right)
\end{array}$$

Which one of the $m$ models is best? According to the maximum likelihood principle, the one that has the lowest entropy. But we assigned to the models, via the smoothed bootstrap, not just a score, but a probability density of scores. We have then to define a criterion to choose between those models. Assume we have two models A and B with $n_A$ and $n_B$ conditioning variables respectively, $n_A > n_B$. Assume also that the density of the entropy estimate for each model is as shown in Figure 8. Which model is better? Intuitively we should choose model A, since the probability that it performs better than model B (again according to maximum likelihood) is higher than the probability of the reverse case. Obviously if the opposite were true, we should choose model B. If neither one of the possibilities is true, it makes sense to choose the computationally less expensive option, meaning the model with the smallest number of conditioning variables.
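The comparison between two models can be sketched numerically. Assuming, as stated above, that each entropy score is approximately Gaussian with the mean and variance from the table, the probability that model A scores lower than model B can be estimated by Monte Carlo. In the sketch below the cost parameter `k` plays the role it has in the selection rule that follows in the text, where A is preferred when $P(H_A < H_B) > k/(1+k)$; the function names, the number of draws and the seed are hypothetical choices.

```python
import numpy as np

def prob_a_better(mean_a, var_a, mean_b, var_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(H_A < H_B) for independent Gaussian scores."""
    rng = np.random.default_rng(seed)
    h_a = rng.normal(mean_a, np.sqrt(var_a), draws)
    h_b = rng.normal(mean_b, np.sqrt(var_b), draws)
    return float(np.mean(h_a < h_b))

def prefer_a(mean_a, var_a, mean_b, var_b, k=1.0):
    """Prefer the more expensive model A only if P(H_A < H_B) > k / (1 + k)."""
    return prob_a_better(mean_a, var_a, mean_b, var_b) > k / (1.0 + k)
```

Larger values of `k` demand stronger evidence before the model with more conditioning variables is chosen; with `k = 1` the rule reduces to picking whichever model is more likely to have the lower entropy.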
Formalizing this, we end up with a selection rule (for $n_A > n_B$): choose model A if

$$P(H_A < H_B) > k \, P(H_A \geq H_B) \iff P(H_A < H_B) > \frac{k}{1+k},$$

where $k$ is a parameter introduced to account for the fact that model A is computationally more expensive, and should be conveniently adjusted. Knowing that the densities of the entropy estimates are (approximately) Gaussian with mean and variance given by the two last columns of the previous table respectively, the probability can be computed (for instance using Monte Carlo), and the condition evaluated for the models, resulting in a unique model being selected as the best representative for the data. This model may not be sufficient, but it is the best that can be obtained with the available data according to our optimality criterion. This is an important concept: our proposed framework finds the best possible model (with respect to the selected optimality criterion), and if this one is not good enough, more data will have to be collected, but the data was used efficiently. Figure 9 summarizes the procedure explained in this section. This concludes the derivation of the computational framework. We proceed now to its validation and application to real data.

Figure 8: Choosing the model's order (n). Density of the entropy estimates for two hypothetical models A and B. Which one should be chosen?

3 Results and applications

In order to test our method, we start with a set of artificially constructed examples. In Section 3.1 we present these results. In Section 3.2 we apply the method to two real datasets. First an ensemble of conformations of the 12 residues long β-hairpin tryptophan zipper [7] is analyzed (Section 3.2.1). Then in Section 3.2.2, the villin headpiece [8], a significantly more complex peptide having 36 residues, is studied.

3.1 Validation via artificial examples

Four artificial datasets with different dependency levels were constructed to validate the proposed method. Those are schematically represented on the left of Figure 10. In all these datasets the sample size (N) was 500, and the dimension of the data points (d) was 6. The method explained in Section 2 was applied to each dataset. In all cases the dependences were correctly discovered. The right side of Figure 10 shows the evolution of the entropy as more conditioning variables are allowed in the model for each dataset. In each case, the correct number of conditioning variables (n) was found. It is worth noting that contrary to what is expected for an infinite dataset, adding unnecessary conditioning variables deteriorates the model.
For finite sample sizes, the price to pay for considering more dependences is more smoothing, which in turn deteriorates the model. As explained before, a compromise must be made, and this is done here automatically, by selecting the optimum n using the rule described in Section 2.3.3 and the information in the graphs of Figure 10.

Figure 9: The complete procedure explained in Section 2.3. The blue numbers in the corner of each box correspond to the sections in which the particular computational box is explained. (ML: Maximum Likelihood; GA: Genetic Algorithm; SB: Smoothed Bootstrap.)

Figure 10: Validation with four artificially generated datasets. Left: the relationship between the variables is schematically represented. Middle: a formal description of the relationships. Right: evolution of the entropy as more conditioning variables are included. The cyan band represents the 99% confidence interval.

3.2 Protein examples

3.2.1 β-hairpin tryptophan zipper

Our first real ensemble consists of 48 conformations of the β-hairpin tryptophan zipper [3, 6, 7], a peptide having 12 residues. The backbone of these peptides can be described by 22 torsion angles (φ's and ψ's), and consequently we need to estimate a 22-dimensional probability distribution function (pdf). In Figure 12a, the evolution (with respect to n, the number of conditioning variables) of the entropy is displayed. Note that including more than two conditioning variables in the density factors does not significantly improve the estimate (compare to three), and can even deteriorate it (compare to four). As explained in Section .., the magnitude of the improvement/deterioration should be judged relative to the variance corresponding to the score of each model (represented by the blue band in Figure 12a). Applying the algorithm of Section 2.3.3, we select the model with two conditioning variables as the one that best represents the ensemble. We do not claim that the true dimensionality of the process is two, but only that for the current available sample, the benefit obtained in computing a pdf using the additional dependences accounted for when including more than two conditioning variables is smaller than the harm done by the additional smoothing required.

It is interesting to observe the dependences between variables found by the algorithm (Figure 11). Notice that, as expected, the angles are often conditioned on the adjacent (along the chain) angles of the same kind (φ or ψ), or on the corresponding angle of the opposite kind. This is further evidence that the algorithm is doing what it should. An interesting fact to note is that we find more frequent conditioning on the ψ's; this can be explained by the asymmetric roles of φ and ψ in the Ramachandran map. It is appealing that this effect emerges from the combination of the formalism and the simulation data, rather than something which must be included by hand.

Figure 11: Dependences discovered between the variables. Dependency diagram when two (in blue) and three (in red) conditioning variables are allowed. The conditioning variables for a given variable appear as dots in the corresponding row. For instance, when two conditioning variables are allowed for $\varphi_2$ (2nd row in the topmost squares), $\varphi_3$ (3rd column in the leftmost squares) and $\psi_2$ (2nd column in the rightmost squares) are selected. If one more variable is allowed, $\varphi_6$ is also included.

We can gain insight into the structure of the ensemble by explicitly computing the density value in the available observations using the model selected by our framework. Figure 12b presents a plot of these values sorted from the least likely to the most. It can be seen in this figure that a few conformations are much more likely than the rest (note that the log-density is plotted) and a few conformations are much more unlikely than the rest. The experimentally determined structures (PDB [9] entry 1LE0), also shown in the figure, can be seen to have similar probability densities located between these two extremes. Are those conformations similar to each other, or more precisely, are there many modes in the density or just one? To consider this question, which needs explicit pdf estimation as done here, we plotted in Figure 12c the distances (footnote 12) between all the conformations. Since the conformations are sorted (as explained above), the distances between the most probable conformations lie in the upper right corner. The pattern in this area agrees with a unimodal density. This suggests a way to choose a representative for the ensemble, if one needs to be selected, simply as the most probable conformation. Zooming in on the top ten percent of the structures, however, shows that these structures cluster around two distinct modes (Figure 12d) that lie relatively close to each other. The three most likely conformations and the three least likely conformations from the available ensemble are shown in Figure 13. As expected, the most likely conformations have more hydrogen bonds and hence are more stable.

Footnote 12: The distance was computed using the angles formula $d(x, y) = \sum_{i=1}^{d} \{1 - \cos(x_i - y_i)\}$, where x and y are two different d-dimensional conformations as described by their torsion angles.

Figure 12: Analysis of the estimated density. a) Evolution of the entropy as more conditioning variables are included. As before, the band represents the 99% confidence interval. b) Logarithm of the density for each structure (observation). The structures are sorted from the least probable (on the left) to the most probable (on the right). The experimentally determined structures (PDB entry 1LE0), which were not part of the set used for deriving the pdf, are shown in green. c) Distances between all the structures sorted as in b. d) Distances between the 10% most probable structures sorted to show clusters. Use the color bar provided to translate into the corresponding numeric values.

Figure 13: The most likely and unlikely conformations. The three most likely (top) and less likely (bottom) conformations according to the density computed by our proposed framework.

3.2.2 Villin headpiece

The second real ensemble we analyzed consists of 543 conformations of the villin headpiece molecule [3, 6], a peptide having 36 residues (70 torsion angles). The same tests described in the previous section were performed on this dataset and similar conclusions can be drawn. Figure 14 shows the corresponding results. In this case, since the sample is more than three times bigger than in the previous example (543 versus 48), the system is able to capture three conditioning variables instead of two (see Figure 14b). As before, few conformations are much more likely than the rest (Figure 14a) and those are situated in what appears to be the mode of a unimodal density (Figure 14c). However, when this mode is carefully examined, it splits into several modes (Figure 14d).

Figure 14: Analysis of the estimated density for the second dataset. a) Logarithm of the density for each simulated (black line) and the experimental (green circles) structure. The structures are sorted from the least probable (on the left) to the most probable (on the right). The density of the average native is indicated by a blue square. b) Dependency of the entropy on the number of conditioning variables. c) Distances between all the structures sorted as in a. d) Distances between the 10% most probable structures sorted to show clusters. Use the color bar provided to translate into the numeric values.

For this molecule, an ensemble of structures determined using NMR techniques can be found in [30], together with its minimized average ("native") (PDB entry 1VII). We can estimate the probability density of these structures and compare it with the probability density obtained for the other structures in the ensemble. By doing this we found that the experimental ensemble (circles and square in Figure 14a) belongs to the group of most unlikely structures. This might indicate that the ensemble of simulated structures is not yet correctly capturing the whole information about the native state (footnote 13) and also raises the question of how much confidence should be given to a single (experimental) structure. The most and least probable conformations are shown in Figure 15.

Footnote 13: Possibly due to an inaccurate force field used in the simulation.

Figure 15: Selected conformations from the ensemble. The most probable conformations of the three (most probable) clusters of Figure 14d are in the first three rows, one cluster per row. The two least probable conformations of the whole ensemble and the native appear in the fourth row.

To further study this phenomenon of low probability associated with the native experimental structures (extracted from the work of McKnight et al. [30]), we plotted the value of each computed factor (from Equation (7)) in its corresponding spatial location (Figure 16). In the same figure we include a graph of the factors where the native structure appears to differ the most from the simulated structures, resulting in low factor values, and thereby small probabilities. For the sake of visualization, and since according to Figure 14b not much explicative power is obtained when using more than one conditioning variable, we only include one conditioning variable in the graphs of the factors.

The technique presented in this work can not only be used to assess the probability of an existing structure, but also to generate novel structures having (presumably) high probability. To obtain these structures we start by choosing a point in the space of conformations and follow the direction of the gradient of the computed pdf, until a local maximum is reached. In Figure 17, left, the change of the probability is shown for three different groups: those that started at the most unlikely conformations of the original ensemble (in red), those that started at the most likely conformations of the original ensemble (in blue) and those that started at conformations of the original ensemble having intermediate probability (in green). It can be seen in the figure that, in every case, but especially for the red (most unlikely) structures, there was a marked improvement in the probability of the structures. Why is the probability increasing? One explanation is that the optimization is assembling together popular parts to create the new structures, finding the consensus among the observed structures for each region of the protein. These new conformations automatically obtained from the computed pdf can be used, for example, as novel initial conditions for molecular dynamics or for producing new candidates for high resolution protein design.

It is also of interest to assess the nativeness of the generated structures. As we repeatedly mentioned in this work (see Introduction), using a single structure to characterize the native state may not be the best approach. In this case, the distance from the new structures to the experimental ensemble (Figure 17, center) is of the same order as the internal variability of the experimental native observations themselves (Figure 18). Lacking the necessary observations of the experimental structure to follow the approach introduced in this article, we have to resort instead to the distance (11) below to the native ensemble as a measurement of nativeness (the distance to the ensemble is given by the minimal distance to all the elements in it). These results are plotted in Figure 17, center. Note that the distance for the majority of the most likely (blue and green) structures is improved by the optimization. In Figure 17, right, where we plotted the log-probability of the new structures versus their distance to the native ensemble, it can be seen that there is a tendency for the closest structures to the native ensemble to be the most likely. Thereby, the pdf is finding, from all the molecular dynamics results, the best ones according to this distance. These best ones then lead to new conformations, new samples of the conformation space, via the pdf gradient ascent technique mentioned above (Figure 17, left).
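The gradient ascent used to generate new high-probability conformations can be sketched on a simplified density. The example below assumes a plain (unfactorized) kernel density estimate over torsion angles with a product of von Mises kernels sharing one concentration `kappa`; this is a simplification of the factorized pdf of this work, and `kappa`, the step size, the stopping tolerance and the function names are hypothetical choices.

```python
import numpy as np

def log_density_and_grad(x, data, kappa):
    """Log of a von Mises product-kernel density estimate at x, and its gradient."""
    diff = x[None, :] - data                     # (N, d) angular differences
    log_k = kappa * np.cos(diff).sum(axis=1)     # unnormalized log-kernel per observation
    m = log_k.max()                              # subtract the max for numerical stability
    w = np.exp(log_k - m)
    n, d = data.shape
    log_f = m + np.log(w.sum()) - np.log(n) - d * np.log(2.0 * np.pi * np.i0(kappa))
    w /= w.sum()                                 # responsibility of each observation
    grad = -(kappa * np.sin(diff) * w[:, None]).sum(axis=0)
    return log_f, grad

def ascend(x0, data, kappa=4.0, step=0.05, iters=500, tol=1e-6):
    """Follow the gradient of the log-density until a local maximum is (nearly) reached."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        _, g = log_density_and_grad(x, data, kappa)
        if np.linalg.norm(g) < tol:
            break
        x = np.mod(x + step * g + np.pi, 2.0 * np.pi) - np.pi   # keep angles in [-pi, pi)
    return x
```

Starting `ascend` from each conformation of an ensemble and comparing the log-density before and after gives curves analogous to those described for Figure 17, left, under the simplifying assumptions above.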

Figure 16: Contribution of each factor to the overall density of the average native. Top: The density of each factor (φ's on the right, ψ's on the left) is represented by the color of the corresponding ball. The four lowest factors are labeled and a graph is included at the bottom. Bottom: Graph of the log-density for selected factors. Black dots represent the molecular dynamics samples used to compute the density; white circles represent the experimental structures; the gray square is the native (experimental average) structure; white x and * represent the least and most likely structures of the molecular dynamics ensemble respectively. Note that the native has some factors not located at the top of the density, thereby explaining why the overall probability of this single structure is low.

Figure 17: New structures. Those were obtained by gradient ascent starting at the most unlikely conformations of the original ensemble (in red), the most likely conformations of the original ensemble (in blue), and original conformations of intermediate likelihood.

4 Conclusions and discussion

A method to estimate a probability density function in the space of folded protein conformations was developed. This method does not have any free parameters to fix other than the shape of the kernel used (Section ..), and relies on fundamental results from estimation and information theory and on the assumption that only a few angles strongly influence a specific torsion angle (or in general, a few variables strongly affect another specific variable). This is needed in order to reduce the dimensionality of the problem. With our framework, we not only obtain the best possible pdf (modulo our optimality criteria), but also learn the order of the model (n) and explicitly find the torsion angles (variables) dependences.

Better estimates might be obtained if constraints between the angles (e.g., the allowed regions of the Ramachandran plot) and/or energy priors (i.e. Boltzmann weighting) are included in the model. Obviously this can be straightforwardly done and improvements are expected. Other acknowledged places for improvement include the optimization procedure used to choose the bandwidth for the kernels (which may be substituted by more efficient methods); the Fast Gauss Transform or Dual Tree methods can be used to speed up the computation of the entropies [3, 3]; and the genetic algorithm explained in Section 2.3.2 to discover the dependences. Also, more research is required for selecting the bandwidth used in the smooth bootstrap step (Section ..3), unless this problem is altogether avoided by using other methods (e.g., the jackknife []) which do not produce resamples having repeated observations.

Figure 18: Distance matrix between experimental conformations. Angular distance between the experimental structures (from [3]) computed using (11).

Having a method to statistically characterize the folded ensemble, it could be tempting to apply the same method to other ensembles as well. Recall that in order to apply this proposed method, the ensemble must satisfy the basic assumption about the dependency between the torsion angles (or the variables used for the structure representation in general). In particular, for the ensemble of partially folded conformations, this assumption is less valid due to the diversity of long range interactions that can take place. The number of dependences n to be taken into account in this case is quite large, and therefore the dimensionality reduction is not as significant as the one obtained in the folded case. Again, this method can be easily extended to other descriptions of proteins, such as those including the side chains, or even to completely different datasets, if those satisfy the basic assumption stated above. Furthermore, we expect better results if more complete descriptions are supplied (e.g. the side chain angles/pair-wise distances are included), since those can also be included as conditioning variables, if they turn out to be the most informative.

Our results suggest that for a given accuracy, the number of sample points in the ensemble does not need to grow exponentially with the number of residues in the peptide, but only with the number of those actually affecting each other. In other words, the true dimension of the dataset is much smaller than the total number of torsion angles, being defined only by the interactions in small neighborhoods and not across the whole protein.

When comparing structures in order to validate our technique (Section 3.2), we used the angles distance (which, by the way, is intimately related to the Von Mises kernels):

$$d(x, y) = \sum_{i=1}^{d} \{1 - \cos(x_i - y_i)\} \tag{11}$$

The choice of this metric is somewhat arbitrary, undermining the results derived using it. This choice of metric is not completely compatible with the density estimate proposed in this work. If we were to believe that Equation (11) is the right metric to use for this space, we should have chosen symmetrical kernels in d dimensions. Conversely, since, as explained before, this type of kernel is a very bad choice, we think that using the metric of (11) is not optimal.
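As a small illustration of the angular distance discussed above, the sketch below assumes it has the form $\sum_i \{1 - \cos(x_i - y_i)\}$ with angles in radians, and uses the minimal distance to all members as the distance to an ensemble, as described in Section 3.2.2. The function names are hypothetical.

```python
import numpy as np

def angular_distance(x, y):
    """Sum over torsion angles of 1 - cos(x_i - y_i); angles in radians."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(1.0 - np.cos(x - y)))

def distance_to_ensemble(x, ensemble):
    """Distance from one conformation to an ensemble: the minimum over its members."""
    return min(angular_distance(x, y) for y in ensemble)
```

The cosine makes the measure insensitive to shifts of any angle by a full turn, so no explicit wrapping of angle differences is required.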
In Section .., while presenting the kernel density estimation technique, a distance was implicitly used to assess the contribution that each observation has on the point where the density is being estimated. Now that a density has been found, it may be interesting to use this relationship between distances and densities in the other direction, to derive a distance from the density. This natural distance takes into account the structure of the ensemble, and this will be further studied elsewhere.

Since it is commonplace in this field to use the $C_\alpha$-RMSD to compare conformations, for completeness, we conclude by including in Figure 19 the equivalent of Figures 17b-c and 18 computed using this metric instead of (11). This approach presents two main difficulties. First, it is not trivial to compute the 3D coordinates of the $C_\alpha$'s from the torsion angles (we used the standard Engh and Huber [33] angles for the reconstruction). Second, small variations in the torsion angles can produce large variations in the 3D coordinates. As before, we computed the RMSD distance to the whole experimental native ensemble, not just to an average structure. Note once again the automatic clustering of the most probable conformations, as computed with our pdf, and how it gets closer to the experimental ensemble. Note also the large inner variability (left figure) among the experimental conformations themselves, of the same order as the variability of the new conformations created by our algorithm when starting from the most probable ones. Of course, if this kind of comparison was intended, a different set of features should have been chosen in the first place (not the torsion angles).

Figure 19: Graphs corresponding to Figures 17b-c and 18 but computed using $C_\alpha$-RMSD.

To conclude, based on statistics and information theory, we have presented a framework to compute the distribution of protein conformations, with possible applications from protein comparisons to conformation space sampling to high resolution protein design. These applications, as well as the exploitation of the pdf to define new distance functions (to be reported elsewhere), may promote a shift from the current emphasis on single structures to the consideration of whole ensembles, allowing all the available information to play a role.

Acknowledgements: We thank Bojan Zagrovic for the careful reading of the paper, his suggestions (especially those resulting in the inclusion of Figures 17 and 19) and for pointing out relevant references. We also thank David Baker, Alexander Grossberg and Birgit Grund for helpful discussions, and ONR, NSF, NGA and DARPA for the funding. This work was carried out in part using computing resources at the University of Minnesota Supercomputing Institute.

5 References

1. Rieping W, Habeck M, Nilges M. (2005) Inferential structure determination. Science 309.
2. Rother D, Sapiro G, Pande V. (2005) Statistical characterization of protein ensembles. RECOMB 2005 Poster Abstracts.
3. Zagrovic B, Snow CD, Khaliq S, Shirts MR, Pande VS. (2002) Native-like mean structure in the unfolded ensemble of small proteins. Journal of Molecular Biology 323.
4. Shortle D, Simons KT, Baker D. (1998) Clustering of low-energy conformations near the native structures of small proteins. Proceedings of the National Academy of Sciences, USA 95.
5. Bradley P, Misura KMS, Baker D. (2005) Toward high-resolution de novo structure prediction for small proteins. Science 309.
6. Pande VS, Stanford University. (2005) Folding@home distributed computing. Available: http://folding.stanford.edu/ via the Internet.
7. Baker D. (2003) The Baker laboratory. Available: http:// via the Internet.


More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

x i1 =1 for all i (the constant ).

x i1 =1 for all i (the constant ). Chapter 5 The Multple Regresson Model Consder an economc model where the dependent varable s a functon of K explanatory varables. The economc model has the form: y = f ( x,x,..., ) xk Approxmate ths by

More information

1 Generating functions, continued

1 Generating functions, continued Generatng functons, contnued. Generatng functons and parttons We can make use of generatng functons to answer some questons a bt more restrctve than we ve done so far: Queston : Fnd a generatng functon

More information

Chapter 24 Work and Energy

Chapter 24 Work and Energy Chapter 4 or an Energ 4 or an Energ You have one qute a bt of problem solvng usng energ concepts. ac n chapter we efne energ as a transferable phscal quantt that an obect can be sa to have an we sa that

More information