Extending boosting for large scale spoken language understanding

Mach Learn (2007) 69

Extending boosting for large scale spoken language understanding

Gokhan Tur

Received: 19 August 2005 / Revised: 10 June 2007 / Accepted: 28 August 2007 / Published online: 25 September 2007
© Springer Science+Business Media, LLC 2007

Abstract We propose three methods for extending the Boosting family of classifiers motivated by the real-life problems we have encountered. First, we propose a semisupervised learning method for exploiting the unlabeled data in Boosting. We then present a novel classification model adaptation method. The goal of adaptation is optimizing an existing model for a new target application, which is similar to the previous one but may have different classes or class distributions. Finally, we present an efficient and effective cost-sensitive classification method that extends Boosting to allow for weighted classes. We evaluated these methods for call classification in the AT&T VoiceTone spoken language understanding system. Our results indicate that it is possible to obtain the same classification performance by using 30% less labeled data when the unlabeled data is utilized through semisupervised learning. Using model adaptation we can achieve the same classification accuracy using less than half of the labeled data from the new application. Finally, we present significant improvements in the important (i.e., higher weighted) classes without a significant loss in overall performance using the proposed cost-sensitive classification method.

Keywords Boosting · Semisupervised learning · Model adaptation · Cost-sensitive classification · Call classification · Spoken dialog systems

Editor: Dan Roth.
This work was done when the author was with AT&T Labs-Research, Florham Park, NJ.
G. Tur, SRI International Speech Technology and Research Lab, Menlo Park, CA 94025, USA. e-mail: gokhan@speech.sri.com

1 Introduction

Statistical classification algorithms have long been studied in the machine learning community. Typically, classification models are trained using large amounts of task data that are usually labeled by humans. By labeling, we mean assigning one or more of the predefined classes to each example. Building better classification systems in a shorter time frame is a

need for most real-world applications. For instance, consider a natural language call routing system where the aim is to route the incoming calls in a customer care call center. In such a system the aim is to identify the customer's intent (call-type), which is basically framed as a call classification problem. Consider the utterance "I would like to know my account balance." Assuming that the utterance is recognized correctly by an automatic speech recognizer, the corresponding call-type would be Tellme(Balance) and the action would be prompting the balance to the user or routing this call to the billing department. In cases where numerous such systems for different call centers from different domains or companies need to be built, preparing labeled data for each system is a very expensive, time consuming, and laborious process. Assuming that the bottleneck is not the collection of data but instead labeling it, we propose using semisupervised learning.

Another problem with most classification algorithms is that they make the assumption that the input channel is stationary, that is, the distribution of the incoming (training and test) examples is constant. Although keeping the training and test sets fixed may be a good idea in comparing different classification approaches, for applications like those described above, the stationary distribution assumption may not always be true. For example, if the corresponding company introduces a new service, one may expect callers to start inquiring about that service, which may be unseen or very infrequent in the training data. Continuous human labeling of new data and model retraining alleviates this problem. However, adaptation to time-varying statistics by incrementing the effect of a small set of newly labeled examples might be a more cost-effective and faster solution. Likewise, in some cases there may be a new application very similar to an existing one, such as product solutions for two companies from the same sector. Therefore, the old models can be adapted to the new application. In this paper, we propose using model adaptation techniques for this problem.
Furthermore, typically in the proposed classification algorithms all classes have the same importance or cost for multiclass classification. For most real-world applications, this is not the case; not all classes have the same weight. The accuracy of a particular class can be more critical than others, or high precision may be needed for some classes. Especially while dealing with more than a few classes, this is unavoidable. For the call routing example, misclassifying an utterance asking for an account balance as a request for cancellation is more costly than the other way around, as it may result in a lost customer. In this paper, we propose a cost-sensitive classification approach.

Our proposed methods depend on a particular classification algorithm: namely Boosting. Boosting is an iterative algorithm; on each iteration, a weak classifier is trained on a weighted training set, and at the end, the weak classifiers are combined into a single, combined classifier (Freund and Schapire 1997). The Boosting algorithm has been used successfully for many classification tasks, such as text categorization (Schapire and Singer 2000). We propose the following methods for extending the Boosting family of classifiers:

Semisupervised Learning: The goal is to exploit the unlabeled data in Boosting. We propose a method for augmenting the classification model trained using the human-labeled examples with the machine-labeled examples in a weighted manner. The machine-labeled examples are automatically constructed using the labels output by the model trained with human-labeled examples. The resulting model is actually a superset of weak learners of the initial model. The new weak learners are learned once the initial ones are applied to all the human- and machine-labeled data. We also compare this approach with a task-independent baseline approach where human-labeled data is simply concatenated to the machine-labeled data.

Model Adaptation: The goal is to adapt an existing model to a new target application, which is similar but may have different classes or class distributions. This may also include continuous adaptation of an existing model to time-varying statistics or exploiting out-of-domain data for training the target model. The idea is similar to the semisupervised learning technique. Boosting applies the existing model to the new data, and then new weak learners are added to this existing model using the new data in a weighted manner.

Cost-Sensitive Classification: The goal is to extend Boosting to allow weighted classes. Our main idea is to change the error function considering the weights of the classes, thus ensuring optimal accuracy at each iteration for the important classes. In other words, we have changed the criterion to choose the weak learner according to the associated costs of the classes.

In the following section we summarize the previous related work on all three of these areas. We then briefly describe Boosting in Sect. 3 since all extensions require modifications to the original algorithm. Sections 4, 5, and 6 present our proposed methods for semisupervised learning, model adaptation, and cost-sensitive classification, respectively. In Sect. 7 we present our results using a call classification task from the AT&T VoiceTone spoken dialog system.¹

2 Related work

The following describes the related work for each topic of our research.

2.1 Boosting for call classification and text categorization

Boosting, the machine learning algorithm we focus on in this paper, has been used for a number of language processing tasks, such as call classification and text categorization. Schapire and Singer have presented empirical results comparing Boosting with other state-of-the-art classification methods (Schapire and Singer 2000). For example, for text categorization Boosting outperformed other methods such as Rocchio, Naive Bayes, k-nearest-neighbor (KNN), RIPPER, and Sleeping Experts on the newswire data, especially when numerous examples are available for training. They have also provided the first experiments on using Boosting for a simple call routing task with only six classes. A later Boosting algorithm was used for the BBN call routing system (Zitouni et al. 2001).
In this 23-way classification task, an additional 10% improvement has been observed compared to the previously best classification method, named Beta classification. AT&T has successfully used Boosting in large-scale call classification systems, first with a Helpdesk application (Di Fabbrizio et al. 2002) and later for its enterprise customers as part of the AT&T VoiceTone spoken dialog system (Gupta et al. 2006). The algorithm was then extended to handle manually written call classification rules to augment the labeled data in special cases (Schapire et al. 2005). This paper has emerged from the apparent needs of extensions to the basic Boosting algorithm during the real-life application developments for VoiceTone.

¹ The VoiceTone system is provided by AT&T for customer care centers.

2.2 Semisupervised learning

Semisupervised learning algorithms that use both labeled and unlabeled data have been used for classification in order to reduce the need for labeled training data. Blum and Mitchell (1998) proposed a semisupervised learning approach called co-training. For co-training, the features in the problem domain should naturally divide into two sets. Then the examples classified with high confidence scores with one view can be used as the training data of other views. For example, for web page classification, one view can be the text in the web pages and another view can be the text in the hyperlinks pointing to those web pages. For the same task, Nigam et al. (2000) used an algorithm for learning from labeled and unlabeled documents based on the combination of the Expectation Maximization (EM) algorithm and a Naive Bayes classifier. Nigam and Ghani (2000) then combined co-training and EM algorithms, coming up with the Co-EM algorithm, which is the probabilistic version of co-training. Ghani (2002) later combined the Co-EM algorithm with error-correcting output coding (ECOC) to exploit the unlabeled data, in addition to the labeled data. For spoken language understanding, we have presented semisupervised learning approaches for exploiting unlabeled data (Tur and Hakkani-Tür 2003). Note that our focus is different from training call routing systems with automatic speech recognizer (ASR) output instead of manual transcriptions such as (Iyer et al. 2002; Alshawi 2003). In this paper we present how Boosting can be extended to augment an existing model using unlabeled data.

2.3 Model adaptation

Although statistical model adaptation has been well studied in some specific areas such as speech recognition for acoustic and language modeling (Riccardi and Gorin 2000; Bacchiani et al. 2004; Digalakis et al. 1995, among others), there is comparably less work done on machine learning and natural language processing. We believe this is the first study presenting model adaptation for language understanding.
One recent study is on the adaptation of natural language understanding using a common adaptation method, maximum a posteriori (MAP) adaptation (He and Young 2004), which adapts the hidden vector state model built for an airline travel information application (ATIS) to another (DARPA Communicator). Another study is about supervised and unsupervised adaptation of probabilistic context-free grammars to a new domain, again using MAP adaptation (Roark and Bacchiani 2003). For spoken language understanding, we have proposed a model adaptation approach using Boosting (Tur 2005). In the machine learning literature, model adaptation has been studied under the titles of meta-learning (Prodromidis and Stolfo 1998) and multitask learning (Caruana 1997). Meta-learning aims at learning useful information from large and inherently distributed sources (in our case, applications). Given multiple classification models trained using local data, the goal is to train a meta-level classifier combining all these baseline models using a meta-level training set. Multitask learning aims at training tasks (in our case, applications) in parallel while using a shared representation. What is learned for each task can help other tasks be learned better.

2.4 Cost-sensitive classification

In the machine learning literature, the vast majority of classifiers, including Boosting, do not handle weighted classes automatically, a lack that we address in this work. There are a number of studies explicitly attacking this problem, and they fall into three categories:

– Making existing classifiers cost-sensitive (Tur 2004; Drummond and Holte 2000; Fan et al. 1999). For example, Fan et al. (1999) have proposed the AdaCost algorithm, which extends the AdaBoost algorithm giving weights to individual training examples instead of classes.
– Using Bayes risk theory to assign each example to its lowest risk class (Domingos 1999; Margineantu 2002; Zadrozny and Elkan 2001).
– Changing the class distributions in the training set such that the cost-insensitive classifier learned will perform equally to a cost-sensitive classifier learned from the original set (Zadrozny et al. 2003).

While the first approach is classifier dependent, it may be more effective for some classification algorithms. The latter two approaches are classifier independent. The third approach, while practical, requires either loss of existing training data (for down-sampling) or replication of it (up-sampling) and hence may be suboptimal for some of the classification algorithms. The second approach requires the classifier to output a probability for each of the classes. In the cases using Bayes risk theory, a cost matrix C is usually used, where the entry (i, j) is the cost of predicting class i when the true class is j. Then the Bayes optimal prediction for a sample x is the class i that minimizes the expectation of the cost, or the conditional risk (Duda and Hart 1973):

    R(i|x) = Σ_j P(j|x) C(i, j)    (1)

where P(j|x) is the probability of sample x to be class j. Using this formula of the conditional risk demands good estimation of the class probabilities. For classifiers that output scores for classes, calibration can be used to estimate the probabilities, so well-developed cost-insensitive classifiers, such as SVM, Boosting, and so on, can be used.

3 Boosting

We propose methods for using Boosting for semisupervised learning, model adaptation, and cost-sensitive classification. Before presenting these methods, we need to present the Boosting algorithm in detail.
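As a concrete illustration of the conditional risk in (1), the minimum-risk prediction can be computed directly from a class-posterior vector and a cost matrix. The sketch below is ours, not from the paper, and the cost values in the usage example are hypothetical:

```python
def conditional_risk(posteriors, cost, i):
    # R(i|x) = sum_j P(j|x) * C(i, j): expected cost of predicting class i for sample x
    return sum(p_j * cost[i][j] for j, p_j in enumerate(posteriors))

def bayes_optimal_class(posteriors, cost):
    # Bayes optimal prediction: the class minimizing the conditional risk
    return min(range(len(posteriors)), key=lambda i: conditional_risk(posteriors, cost, i))
```

With asymmetric costs, the minimum-risk class can differ from the maximum-posterior class; e.g., if predicting class 0 when the truth is class 1 is ten times costlier than the reverse, a sample with posteriors (0.55, 0.45) is assigned to class 1.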
Boosting is an iterative algorithm; on each iteration, t, a weak classifier, h_t, is trained on a weighted training set, and at the end, the weak classifiers are combined into a single, combined classifier (Freund and Schapire 1997). For example, for text categorization, one can use word n-grams as features, and each weak classifier (e.g., a decision stump, which is a single node decision tree) can check the absence or presence of an n-gram. The algorithm generalized for multiclass and multilabel classification is as follows: Let X = {x_1, ..., x_m} denote the domain of possible training examples and Y be a finite set of classes of size |Y| = k. For Y_i ⊆ Y, let

    Y_i[l] = +1 if l ∈ Y_i, −1 otherwise

for each class l ∈ Y and each example. The algorithm begins by initializing a uniform weight distribution D_1(i, l) over training examples and labels. D_t(i, l) indicates the weight of training sample i for the class l. After each round this weight distribution is updated so that the example-class combinations that are easier to classify get lower weights and vice versa. The intended effect

is to force the weak learning algorithm to concentrate on the examples and labels that will be the most beneficial to the overall goal of finding a highly accurate classification rule. More formally, the algorithm is as follows:

– Given training data from the instance space S = {(x_1, Y_1), ..., (x_m, Y_m)} where x_i ∈ X and Y_i ⊆ Y.
– Initialize the distribution D_1(i, l) = 1/(mk).
– For each iteration t = 1, ..., T do: train a base learner h_t : X × Y → R using distribution D_t, then update

      D_{t+1}(i, l) = D_t(i, l) e^{−α_t Y_i[l] h_t(x_i, l)} / Z_t

  where Z_t is a normalization factor

      Z_t = Σ_{i=1}^m Σ_{l∈Y} D_t(i, l) e^{−α_t Y_i[l] h_t(x_i, l)}

  and α_t is the weight of the base learner.
– The output of the final classifier is then defined as

      f(x, l) = Σ_{t=1}^T α_t h_t(x, l).

The Boosting algorithm is independent of the weak classifiers employed. It is assumed that at each iteration a weak classifier h_t is trained using samples associated with classes. Each sample may have a different weight, hence the distribution D_t. The final score for each class, f(x, l), is then simply defined as the weighted summation of the individual weak learners' scores, h_t. α_t is the weight of each weak learner. Although there may be many ways to compute this weight, one of them can be using the error rate, ε_t, of each weak learner:

    α_t = (1/2) ln((1 − ε_t)/ε_t).

Schapire and Singer (1999) have proved a bound on the empirical Hamming loss (HL) of H in the Boosting algorithm. Hamming loss is defined as the fraction of examples, i, and labels, l, for which the sign of f(x_i, l), H(x_i, l) ∈ {−1, 1}, is different from Y_i[l].

Theorem 1

    HL(H) ≤ Π_{t=1}^T Z_t    (2)

where

    HL(H) = (1/(mk)) |{(i, l) : H(x_i, l) ≠ Y_i[l]}| = (1/(mk)) Σ_{i,l} δ(x_i, l)    (3)

where

    δ(x_i, l) = 1 if H(x_i, l) ≠ Y_i[l], 0 otherwise.
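The training loop above can be sketched in code. The following is a toy illustration of AdaBoost.MH with presence/absence decision stumps, not the actual implementation used in the paper; the per-block voting scheme, the error clamping, and the data layout are our own simplifying assumptions:

```python
import math

def train_adaboost_mh(X, Y, features, T):
    """Toy AdaBoost.MH. X: list of feature sets; Y: list of label vectors
    in {+1, -1}^k; features: candidate stump features; T: rounds."""
    m, k = len(X), len(Y[0])
    D = [[1.0 / (m * k)] * k for _ in range(m)]      # D_1(i, l) = 1/(mk)
    ensemble = []
    for _ in range(T):
        best = None
        for f in features:
            # two blocks per stump: feature absent (0) or present (1);
            # each block votes +1/-1 per class to minimize weighted error
            votes, err = [[0] * k for _ in range(2)], 0.0
            for b in range(2):
                for l in range(k):
                    wp = sum(D[i][l] for i in range(m)
                             if (f in X[i]) == bool(b) and Y[i][l] == 1)
                    wm = sum(D[i][l] for i in range(m)
                             if (f in X[i]) == bool(b) and Y[i][l] == -1)
                    votes[b][l] = 1 if wp >= wm else -1
                    err += min(wp, wm)
            if best is None or err < best[0]:
                best = (err, f, votes)
        eps, f, votes = best
        eps = min(max(eps, 1e-10), 1 - 1e-10)        # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)      # alpha_t = 1/2 ln((1-eps)/eps)
        ensemble.append((alpha, f, votes))
        Z = 0.0
        for i in range(m):
            b = 1 if f in X[i] else 0
            for l in range(k):
                # D_{t+1}(i, l) = D_t(i, l) exp(-alpha_t Y_i[l] h_t(x_i, l)) / Z_t
                D[i][l] *= math.exp(-alpha * Y[i][l] * votes[b][l])
                Z += D[i][l]
        for i in range(m):
            for l in range(k):
                D[i][l] /= Z
    return ensemble

def f_score(ensemble, x, l):
    # f(x, l) = sum_t alpha_t * h_t(x, l)
    return sum(a * v[1 if f in x else 0][l] for a, f, v in ensemble)
```

On a two-utterance, two-class toy set where "balance" signals class 0 and "cancel" signals class 1, a few rounds suffice to give f(x, l) the correct sign for every pair.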

Proof By unraveling the update rule, we have that

    D_{T+1}(i, l) = e^{−Y_i[l] f(x_i, l)} / (mk Π_t Z_t).    (4)

Moreover, if H(x_i, l) ≠ Y_i[l] then Y_i[l] f(x_i, l) ≤ 0, implying that e^{−Y_i[l] f(x_i, l)} ≥ 1. Thus,

    δ(x_i, l) ≤ e^{−Y_i[l] f(x_i, l)}.    (5)

Combining (3), (4), and (5), we get

    (1/(mk)) Σ_{i,l} δ(x_i, l) ≤ (1/(mk)) Σ_{i,l} e^{−Y_i[l] f(x_i, l)} = Σ_{i,l} (Π_{t=1}^T Z_t) D_{T+1}(i, l) = Π_{t=1}^T Z_t.    (6)

Theorem 1 is important since it contains the loss function Boosting tries to minimize and the criterion weak learners need to minimize. For semisupervised learning and model adaptation the goal is to change the loss function to minimize, and for cost-sensitive classification the goal is to change the weak learner selection criterion. As seen from (6), this algorithm can be seen as a procedure for finding a linear combination of base classifiers that attempts to minimize an exponential loss function (Schapire and Singer 1999), which in this case is:

    Σ_{i,l} e^{−Y_i[l] f(x_i, l)}.    (7)

An alternative would be to minimize a logistic loss function as suggested by Friedman et al. (2000), namely

    Σ_{i,l} ln(1 + e^{−Y_i[l] f(x_i, l)}).    (8)

The confidence of a class, l, for example x_i is then computed with a logistic function, ρ(), as

    P(Y_i[l] = +1 | x_i) = ρ(f(x_i, l)) = 1 / (1 + e^{−K f(x_i, l)})    (9)

where K is 1 for logistic loss, and 2 for exponential loss (Collins et al. 2002). A more detailed explanation and analysis of this algorithm can be found in (Schapire 2001).

4 Semisupervised learning

The aim of semisupervised learning is to exploit the unlabeled examples for a statistical classification task. We assume that there is some amount of training data available for training an initial classifier. The basic idea is to use this classifier to label the unlabeled data automatically, and improve the classifier performance using the machine-labeled examples, thus reducing the amount of human-labeling effort necessary to come up with better statistical systems.
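For quick reference, the losses (7)–(8) and the confidence mapping (9) can be stated directly in code; a minimal sketch of ours over precomputed margins Y_i[l]·f(x_i, l):

```python
import math

def confidence(f_xl, K=2):
    # eq (9): rho(f(x, l)) = 1 / (1 + exp(-K f(x, l)));
    # K = 1 for logistic loss, K = 2 for exponential loss (Collins et al. 2002)
    return 1.0 / (1.0 + math.exp(-K * f_xl))

def exp_loss(margins):
    # eq (7): sum over (i, l) of exp(-Y_i[l] f(x_i, l))
    return sum(math.exp(-m) for m in margins)

def logistic_loss(margins):
    # eq (8): sum over (i, l) of ln(1 + exp(-Y_i[l] f(x_i, l)))
    return sum(math.log(1.0 + math.exp(-m)) for m in margins)
```

The exponential loss dominates the logistic loss for every margin, which is one reason the logistic variant is less sensitive to confidently misclassified outliers.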

Fig. 1 Semisupervised learning framework

The simplest method for semisupervised learning is augmenting the human-labeled data with machine-labeled data. In this method, first an initial model is trained using the human-labeled data, which is then used to classify the unlabeled data. Then we add the unlabeled examples directly to the training data, by using the machine-labeled classes. In order to reduce the noise added because of classifier errors, we add only those examples that are classified with a confidence higher than some threshold. This threshold can be set using a separate held-out set. Then the whole data including both human- and machine-labeled examples is used for training the classifier again. We will evaluate and compare this method with our proposed method in Sect. 7. Figure 1 depicts the process proposed for semisupervised learning.

This method is similar to incorporating prior knowledge into Boosting (Schapire et al. 2005). In that work, a model fitting both the human-labeled training data and the task knowledge is trained. In our case, the aim is to train a model that fits both the human-labeled and machine-labeled data. In that sense, this is actually an application of that work for semisupervised learning. Fitting a model and a data set is implemented as follows: We first train an initial model using the human-labeled data. Then, the Boosting algorithm measures the fit to the machine-labeled data and the fit to the initial model. To measure the fit to the machine-labeled data, the algorithm uses the logistic loss as given by (8). The fit to the initial model is measured using the Kullback-Leibler (KL) divergence (or binary relative entropy). More formally, the algorithm now aims to minimize the following loss function, which is actually an extension of the logistic loss:

    Σ_{i,l} [ln(1 + e^{−Y_i[l] f(x_i, l)}) + η KL(P(Y_i[l] = 1 | x_i) ‖ ρ(f(x_i, l)))]    (10)

where

    KL(p ‖ q) = p ln(p/q) + (1 − p) ln((1 − p)/(1 − q))

is the KL divergence between two distributions p and q. In our case, these two distributions correspond to the class confidences from the initial model, P(Y_i[l] = 1 | x_i), and to the distribution from the constructed model, ρ(f(x_i, l)), as defined by (9).
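A sketch of the augmented objective (10), assuming the per-pair margins and the two models' confidences have been precomputed; the function names are illustrative, not from the paper:

```python
import math

def kl_bernoulli(p, q):
    # KL(p || q) = p ln(p/q) + (1-p) ln((1-p)/(1-q)) for two Bernoulli parameters
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def ssl_objective(margins, prior_conf, model_conf, eta):
    # eq (10): logistic loss on the machine-labeled data plus an eta-weighted
    # KL term tying the new model's confidences to the initial model's
    fit_to_data = sum(math.log(1.0 + math.exp(-m)) for m in margins)
    fit_to_prior = sum(kl_bernoulli(p, q) for p, q in zip(prior_conf, model_conf))
    return fit_to_data + eta * fit_to_prior
```

When the new model's confidences match the initial model's, the KL term vanishes and the objective reduces to the plain logistic loss (8).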
This term is basically the distance from the initial model built by human-labeled data and the new model built

with machine-labeled data.² η is used to control the relative importance of these two terms. This weight may be determined empirically on a held-out set. In addition to that, in order to reduce the noise added because of classifier errors, we can exploit only those examples that are classified with a confidence higher than some threshold. This threshold is also optimized using the held-out set. The modification of the AdaBoost algorithm to incorporate this new loss function is explained in detail in (Schapire et al. 2005). Equation (10) can be rewritten as

    C + Σ_{i,l} [ln(1 + e^{−Y_i[l] f(x_i, l)}) + η P(Y_i[l] = 1 | x_i) ln(1 + e^{−f(x_i, l)}) + η (1 − P(Y_i[l] = 1 | x_i)) ln(1 + e^{f(x_i, l)})].

This function can be minimized by adding two additional examples for each given example with weights η P(Y_i[l] = 1 | x_i) and η (1 − P(Y_i[l] = 1 | x_i)) for Y_i[l] = 1 and Y_i[l] = −1, respectively. The AdaBoost algorithm then considers these weights of individual examples instead of starting with a uniform distribution.

5 Model adaptation

The aim of model adaptation is to exploit the existing labeled data and models for improving the performance of new similar applications using a supervised adaptation method. The basic assumption is that there is an existing model trained with data similar to the target application. Then the idea is to adapt this classification model using the small amount of already-labeled data from the target application, thus reducing the amount of human labeling necessary to come up with more accurate statistical classification systems. The very same adaptation technique can be employed for improving the existing model for nonstationary data, where the data characteristics of an application change over time. There are at least two other ways of exploiting the existing labeled data from a similar application. We will evaluate and compare these methods to adaptation in Sect. 7.

– Simple Data Concatenation (simple): where the new classification model is trained using the data from the previous application concatenated to the data labeled for the target application.
– Tagged Data Concatenation (tagged): where the new classification model is trained using both data sets, but each set is tagged with the source application.
That is, in addition to the examples, we use the source of each example as an additional feature during classification. In our approach for adaptation, we begin with an existing classification model. Then using the labeled data from the target application we build an augmented model based on this existing model. Note that this model is not trained from scratch (hence the adaptation). The learning starts from an existing model and then augments that model. In other words, we add more iterations (hence more weak learners) to the existing model (which is nothing but a set of weak learners). This method is similar to exploiting unlabeled examples as

² Note that, although P and ρ do not give probabilities, but rather some confidence scores between 0 and 1, this function provides an estimate of distance.

presented in the previous section, where a model that fits both the manually-labeled training data and machine-labeled data is trained. In this case, the aim is to train a model that fits both a small amount of application-specific labeled data and the existing model from the similar application. More formally, the Boosting algorithm tries to minimize the following loss function:

    Σ_{i,l} [ln(1 + e^{−Y_i[l] f(x_i, l)}) + η KL(P(Y_i[l] = 1 | x_i) ‖ ρ(f(x_i, l)))]    (11)

where P(Y_i[l] = 1 | x_i) is the distribution from the initial existing model and ρ(f(x_i, l)) is the distribution from the newly constructed model as given by (9). This term is basically the distance from the existing model to the new model built with newly labeled in-domain data. Here, again, η is used to control the relative importance of these two terms and may be determined empirically on a held-out set.

6 Cost-sensitive classification

The aim of cost-sensitive classification is to extend Boosting to allow weighted classes. Our main idea is to change the criterion to choose the best weak learner at each round of Boosting. The bound given by (2) implies that in order to minimize the training error, a reasonable approach would be minimizing Z_t on each round of Boosting. This leads to a criterion for finding weak hypotheses, h_t(x, l), for a given iteration, t (Schapire and Singer 1999). Assume that the weak learners, h, make their predictions based on a partitioning of the domain X into disjoint blocks X_j. For example, if the weak learner is a decision stump, checking the absence or presence of an n-gram, there are two blocks for each class. Let c_{jl} = h(x, l) for x ∈ X_j. Define

    W_b^{jl} = Σ_{i: x_i ∈ X_j, Y_i[l] = b} D(i, l)

where b ∈ {−, +} depending on the value of Y_i[l] ∈ {−1, 1}. In other words, W_+^{jl} (W_−^{jl}) is the total weight of samples in partition j (not) labeled as l. Using this terminology:

    Z_t = Σ_j Σ_l Σ_{i: x_i ∈ X_j} D_t(i, l) e^{−Y_i[l] c_{jl}} = Σ_j Σ_l (W_+^{jl} e^{−c_{jl}} + W_−^{jl} e^{c_{jl}}).

It is then straightforward to see that the optimal c_{jl}, minimizing Z_t, is

    c_{jl} = (1/2) ln(W_+^{jl} / W_−^{jl}).

Putting this in place results in:

    Z_t = Σ_j Σ_l 2 √(W_+^{jl} W_−^{jl}).

Then it is enough to choose the weak learner that minimizes this value.
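The closed-form stump outputs and the resulting Z_t can be computed directly from the block weights. A minimal sketch, assuming the (W_+, W_−) pairs are given; the smoothing constant is our addition to guard against empty blocks:

```python
import math

def stump_outputs_and_z(blocks, smooth=1e-10):
    """blocks: list of (W_plus, W_minus) pairs, one per (partition j, class l).
    Returns the optimal confidences c_jl = 0.5 * ln(W+/W-) and the resulting Z_t."""
    c = [0.5 * math.log((wp + smooth) / (wm + smooth)) for wp, wm in blocks]
    z = sum(2.0 * math.sqrt(wp * wm) for wp, wm in blocks)
    return c, z
```

A candidate stump whose blocks separate the weight cleanly (one of W_+, W_− near zero in each block) drives Z_t toward zero, so comparing the returned z across candidate features implements the selection criterion above.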

Now assume that not all labels are equally important, that is, there is an associated cost or weight, w_l, for a given label, l. In such a case we need to define a weighted Hamming loss (WHL):

    WHL(H) = (1/(mk)) Σ_{i,l} w_l δ(x_i, l).    (12)

It is easy to see that when w_l > 1, minimizing Z_t may not be the best criterion for a weak learner to optimize the weighted Hamming loss, because the inequality

    WHL(H) ≤ Π_{t=1}^T Z_t

may not hold.

Theorem 2 With the weights of the classes being w_l, the following bound holds on the weighted Hamming loss of H:

    WHL(H) ≤ Π_{t=1}^T Y_t

if w_l ≥ 1, where

    Y_t = Σ_{l∈Y} ŵ_l Σ_{i=1}^m D_t(i, l) e^{−α_t Y_i[l] h_t(x_i, l)}

and ŵ_l is a function of w_l.

Proof By unraveling the update rule, we have that

    D_{T+1}(i, l) = e^{−Y_i[l] f(x_i, l)} / (mk Π_t Z_t).    (13)

Moreover, if H(x_i, l) ≠ Y_i[l] then Y_i[l] f(x_i, l) ≤ 0, implying that e^{−Y_i[l] f(x_i, l)} ≥ 1. Thus,

    δ(x_i, l) ≤ e^{−Y_i[l] f(x_i, l)}.    (14)

Combining (12), (13), and (14), we get

    WHL(H) = (1/(mk)) Σ_{i,l} w_l δ(x_i, l)
           ≤ (1/(mk)) Σ_{i,l} w_l e^{−Y_i[l] f(x_i, l)}
           = Σ_{i,l} w_l (Π_{t=1}^T Z_t) D_{T+1}(i, l)
           ≤ (Π_{t=1}^T Z_t) Σ_l w_l

since Σ_i D_{T+1}(i, l) ≤ 1.

Define the constant W = Σ_l w_l. Then

    WHL(H) ≤ Π_{t=1}^T (W^{1/T} Z_t) = Π_{t=1}^T Y_t

when ŵ_l = w_l W^{(1/T)−1}. Following similar steps, this leads us to a new criterion to select the weak hypotheses: Choose the weak learner that minimizes the following:

    Y_t = Σ_l ŵ_l Σ_j 2 √(W_+^{jl} W_−^{jl}).

Note that this is in contrast to the unweighted case, where the weak learner optimizes

    Z_t = Σ_j Σ_l 2 √(W_+^{jl} W_−^{jl}).

In other words, this corresponds to changing the selection criterion for the weak learner, h_t, in the AdaBoost algorithm. No change is required in the algorithm presented in Sect. 3.

7 Experiments and results

We evaluated the proposed methods using the utterances from the database of the AT&T VoiceTone spoken dialog system (Gupta et al. 2006). This is a call routing system where the aim is to route the input calls in a customer care call center. In this natural language spoken dialog system, callers are greeted by the open-ended prompt "How May I Help You?" Users then ask questions about their phone bills, calling plans, and so on. The system tries to identify the customer's intent (call-type), which is basically framed as a call classification problem similar to the literature (Chu-Carroll and Carpenter 1999; Gorin et al. 1997; Natarajan et al. 2002). If the system is unable to understand the caller with high enough confidence, then the conversation will proceed with either a clarification or a confirmation prompt. In our experiments all the utterances are transcribed. We performed our tests using the BoosTexter tool (Schapire and Singer 2000). For all experiments, we used word n-grams as features and decision stumps as weak classifiers.

7.1 Evaluation metrics

While evaluating classification performance, we used mainly two metrics, both using micro-averaging allowing multiple call-types. The first one is the top class error rate (TCER), which is the fraction of utterances in which the call-type with maximum probability was not one of the true call-types. Inspired by the information retrieval community, the second metric we used is the F-Measure, which is the harmonic mean of recall and precision.
Recall is defined as the proportion of all the true call-types that are correctly deduced by the classifier. It is obtained by dividing the number of true positives by the sum of true positives and false negatives. Precision is defined as the proportion of all the accepted call-types that are also

true. It is obtained by dividing true positives by the sum of true positives and false positives. True (false) positives are the number of call-types for an utterance for which the deduced call-type has got a confidence above a given threshold, hence accepted, and is (not) among the correct call-types. False negatives are the number of call-types for an utterance for which the deduced call-type has got a confidence less than a threshold, hence rejected, but is among the true call-types. More formally, let

    a = #{true label = +, predicted = +},
    b = #{true label = +, predicted = −},
    c = #{true label = −, predicted = +},
    d = #{true label = −, predicted = −}.

Then

    recall = a / (a + b),    precision = a / (a + c),

    F-Measure = 2 × recall × precision / (recall + precision).

One difference between these two evaluation metrics is that the top class error rate evaluates only the top-scoring call-type for an utterance, whereas the F-Measure evaluates all the call-types exceeding the given threshold. For lower thresholds, the precision is lower but recall is higher, and vice versa for higher thresholds. To optimize the F-Measure, we check its value for all thresholds between 0 and 1, and use the best one as the F-Measure of that system, since it is always possible to change the operational threshold of the system.

7.2 Semisupervised learning experiments

To evaluate the proposed semisupervised learning method, we selected a call classification application. The data characteristics are given in Table 1. In this first set of experiments, we kept the number of iterations fixed to 500. First, we selected the optimal threshold of top-scoring call-type confidences using TCER on the held-out set. Obviously, there is a trade-off in selecting the threshold. If it is set to a lower value, that means a larger amount of noisy data, and if it is set to a higher value, that means a lesser amount of useful or informative data. Figure 2 proves this behavior for the held-out set. We trained initial models using 2,000, 4,000, and 8,000 human-labeled utterances and then augmented these as described in the simple method, with the remaining data in the training set (using only machine-labeled call-types).
On the x axis, we have different thresholds to select from the unlabeled data that the classifier uses, and on the y axis we have the classification error rate (TCER) if that data is also exploited. A threshold of 0 means using all the machine-labeled data and 1 means using none. As seen, there is consistently a 1–1.5% difference in classification error

Table 1 Data characteristics used in the semisupervised learning experiments

    Training Data Size      57,829 utterances
    Test Data Size          3,513 utterances
    Held-out Data Size      3,500 utterances
    Number of Call-types    49

Fig. 2 Trade-off for choosing the threshold to select among the machine-labeled data on the held-out set

rates using various thresholds for each data size, and the lowest error rates are achieved with thresholds around 0.5.

Figure 3 depicts the performance using the proposed semisupervised learning method, by plotting the learning curves for various initial labeled data set sizes. In the figure, the x axis is the amount of human-labeled training utterances, and the y axis is the classification error rate of the corresponding model on the test set. The baseline is the top curve, with the highest error rate, where no machine-labeled data is used. To compare our proposed approach, we also employed the simple data augmentation method, where we concatenate the human-labeled and machine-labeled data. In these experiments, we selected 0.5 as the threshold for selecting machine-labeled data, and for each data size, we optimized the weight η in the proposed method using the held-out set. As in the case of the held-out set, we consistently obtained 1–1.5% classifier error rate reductions on the test set using both approaches when the labeled training data size is less than 15,000 utterances.³ The reduction in the need for human-labeled data to achieve the same classification performance is around 30%. For example, we get the same performance when we exploit machine-labeled data with 8,000 human-labeled utterances instead of 12,000 utterances. The proposed method performed consistently better (though not significantly) when there are fewer human-labeled examples. The simple semisupervised learning method outperformed the proposed method after 5,000 human-labeled utterances.

³ For this test set, 1.3% is statistically significant according to a Z-test for a 95% confidence interval.

Fig. 3 Results using semisupervised learning. The top-most learning curve is obtained using just human-labeled data as a baseline. Below that lie the learning curves using the first (concatenation of human- and machine-labeled data) and second (Boosting adaptation) methods

7.3 Model adaptation experiments

To evaluate the proposed model adaptation method, we selected two applications, T1 and T2, both from the telecommunications domain, where users have requests about their phone bills, calling plans, and so on. The first application is concierge-like and has all the call-types the second application covers. The second application is used only for a specific subset of call-types. The data properties are shown in Table 2. We computed the perplexity of the call-type probability distribution as

    Perplexity = 2^{−Σ_{c∈C} p(c) log_2 p(c)}

where p(c) is the prior probability of a call-type c ∈ C. As seen, the perplexity of the second application is significantly lower while the utterances are longer. As shown in Fig. 4 the class distributions for these two applications are significantly different. We have about 9 times more data for the first application. To avoid dealing with finding the optimal iteration numbers in Boosting, we iterated many times, got the error rate after each iteration, and used the best error rate in all the results below. In this experiment, the goal is adapting the classification model for T1 using T2 so that the resulting model for T2 would perform better. Table 3 presents the baseline results using training and test data combinations. The rows indicate the training sets, and columns indicate the test sets. The values are the classification error rates, which are the ratios of the utterances

Fig. 4  Class distributions for applications T1 (above) and T2 (below). The classes for the two applications are aligned

Table 2  Data characteristics used in the model adaptation experiments

                               T1                  T2
    Training Data Size         53,022 utterances   5,866 utterances
    Test Data Size             5,529 utterances    614 utterances
    Number of Call-Types
    Call-Type Perplexity
    Average Utterance Length   8.06 words          words

for which the classifier's top scoring class is not one of the correct call-types. The third row is simply the concatenation of both training sets (indicated by "simple"). The fourth row (indicated by "tagged") is obtained by training the classifier with an extra feature indicating the source of that utterance, either T1 or T2. The performance of the adaptation is shown in the last three rows (indicated by "adapt"). As seen, although the two applications are very similar, when the training set does not match the test set, the performance drops drastically. Adding T1 training data to T2 does not help, and actually hurts significantly.[4] This negative effect disappears when we denote the source of the training data, but no improvement has been observed on the performance of the classification model for T2. Adaptation experiments

[4] For this test set, 3% is statistically significant according to a Z-test at the 95% confidence level.

Table 3  Adaptation results for the experiments. "simple" indicates simple concatenation, "tagged" indicates using an extra feature denoting the source of training data, "adapt" indicates adaptation with different η values

    Training Set               Test Set T1    Test Set T2
    T1                                        26.87%
    T2                                        13.36%
    T1 + T2 simple             14.15%         16.78%
    T1 + T2 tagged             14.05%         13.36%
    T1 + T2 adapt (η = 0.1)    19.01%         12.54%
    T1 + T2 adapt (η = 0.5)    16.13%         14.01%
    T1 + T2 adapt (η = 0.9)    15.27%         15.96%

Fig. 5  Results using call-type classification model adaptation. The x-axis is the amount of labeled data from application T2. The top learning curve is obtained using just T2 data as a baseline. Below that lies the learning curve using the adaptation

using different η values indicate interesting results. We see that using a value of 0.1, it is actually possible to outperform the model performance trained using only T2 training data. Since we expect the proposed adaptation method to work better with less application-specific training data, we draw the learning curves as presented in Fig. 5 using 0.1 as the η value. The top curve is the baseline, obtained using random selection of only T2 training data. When we adapt the T1 model with only 1,106 utterances from T2 we see 2.5% absolute

improvement, which means a 56% reduction (from about 2,500 utterances to 1,106 utterances for an error rate of 16.77%) in the amount of data needed to achieve that performance. As the number of human-labeled utterances from application T2 increases, the difference between the random and adaptation curves gets smaller, but the adaptation curve reaches the performance obtained with using all the random data with 40% less data. Throughout the curves we see that adaptation consistently performs better than the baseline, though not significantly.

7.4 Cost-sensitive classification experiments

To evaluate the proposed cost-sensitive classification method, we again selected a call classification application. Table 4 summarizes the characteristics of our application, including the amount of training and test data, the total number of call-types, the average utterance length, and the call-type perplexity. Similar to the semisupervised learning experiments, we kept the number of Boosting iterations fixed at 500.

Table 4  Data characteristics used in the cost-sensitive classification experiments

    Training Data Size         9,094 utterances
    Test Data Size             5,171 utterances
    Number of Call-Types       84
    Call-Type Perplexity
    Average Utterance Length   words

As a first experiment, we chose one moderately frequent call-type, namely Tellme(Balance), occurring 189 times, and increased its weight while keeping all other weights at 1. Table 5 shows the change in the performance of that call-type and the overall performance. We tried various weights for Tellme(Balance), as shown in the first column of the table. As seen, the F-Measure of that call-type increased significantly, by 6.3% absolute, without a significant change in the overall (or individual class) performance(s) when the weight is set to a moderate value. Note that when we continue increasing the weight, performance begins to deteriorate because of the decrease in precision.

Table 5  Performance change when one call-type is weighted more than the others

    Weight    Overall                           Tellme(Balance)
              Recall  Precision  F-Measure     Recall  Precision  F-Measure
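The call-type perplexity reported in Tables 2 and 4 follows the entropy-based formula defined in Sect. 7.3 and can be transcribed directly; the example prior distributions below are invented for illustration.

```python
# Call-type perplexity as defined above: 2 raised to the entropy of the
# prior class distribution. The example priors are made up for illustration.
import math

def calltype_perplexity(priors):
    """Perplexity = 2 ** (-sum_c p(c) * log2 p(c)), over nonzero priors."""
    return 2 ** (-sum(p * math.log2(p) for p in priors if p > 0))

# A uniform distribution over 4 call-types has perplexity exactly 4,
# while a skewed distribution (like T2's in Table 2) is lower:
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
```

Perplexity is maximized by a uniform prior, which is why the more specialized application T2 shows a significantly lower value than the concierge-like T1.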
The vast majority of the weak learners selected for the model trained with a weight of 10,000 are related to that higher-weighted call-type, making the other call-types harder to win, and even resulting in worse performance for that call-type because of overtraining. To see whether the same behavior holds for a set of important call-types, not just one, we randomly selected 12 call-types, occurring 766 times, and gave them higher weights. Table 6 presents our results. The F-Measure for these call-types increased by 3.5% absolute without a significant loss in overall performance with a weight of 100, but note the steady decrease in overall performance with increasing weights, although the performance of the important classes continues to improve.[5]

[5] For this test set, 1.1% is statistically significant according to a Z-test at the 95% confidence level.
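One simple way class weights can enter Boosting, in the spirit of the experiments above, is by scaling the initial example distribution so that errors on important classes cost more from the first iteration on; the exact BoosTexter modification may differ in detail, and the 1-D decision stumps and toy data below are invented for illustration.

```python
# Minimal class-weighted AdaBoost with 1-D decision stumps. Class weights
# scale the initial example distribution; this is a sketch in the spirit of
# the cost-sensitive extension, not the actual BoosTexter implementation.
import math

def stumps(xs):
    """Candidate threshold stumps: sign(x - t) and its negation."""
    cands = []
    for t in sorted(set(xs)):
        cands.append(lambda x, t=t: 1 if x > t else -1)
        cands.append(lambda x, t=t: -1 if x > t else 1)
    return cands

def adaboost(xs, ys, class_weight, rounds=20):
    """ys in {-1, +1}; class_weight maps label -> importance weight."""
    d = [class_weight[y] for y in ys]      # weighted initial distribution
    d = [w / sum(d) for w in d]
    ensemble = []                          # (alpha, hypothesis) pairs
    for _ in range(rounds):
        best, best_err = None, None
        for h in stumps(xs):               # pick the lowest weighted error
            err = sum(w for x, y, w in zip(xs, ys, d) if h(x) != y)
            if best is None or err < best_err:
                best, best_err = h, err
        best_err = min(max(best_err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        d = [w * math.exp(-alpha * y * best(x))   # reweight examples
             for x, y, w in zip(xs, ys, d)]
        d = [w / sum(d) for w in d]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

# Toy usage: a separable 1-D problem, with and without an important class.
xs, ys = [0, 1, 2, 3], [1, 1, -1, -1]
clf = adaboost(xs, ys, class_weight={1: 1.0, -1: 1.0})
clf_weighted = adaboost(xs, ys, class_weight={1: 5.0, -1: 1.0})
```

On overlapping data the weighted initialization pushes the ensemble toward higher recall for the important class, which mirrors the recall gains (and the eventual precision collapse at extreme weights) reported in Tables 5 and 6.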

Table 6  Performance change when one set of call-types is weighted more than the others

    Weight    Overall                           Important Call-Types
              Recall  Precision  F-Measure     Recall  Precision  F-Measure

8 Conclusions

We have proposed three methods for extending the Boosting family of classifiers, namely, semisupervised learning, model adaptation, and cost-sensitive classification. We have shown the effectiveness of these methods in a real-life application, the AT&T VoiceTone call routing system. Note that although these methods are classifier (namely, Boosting) dependent, the ideas are more general and can be applied to other classifiers. For example, in a Naive Bayes classifier, adaptation can be implemented as linear model interpolation, or a Bayesian adaptation (like MAP) can be employed. It is also possible to apply the idea to other classification tasks that may need adaptation, such as topic classification or named entity extraction.

Our future work includes unsupervised adaptation of classification models. This will enable us to bootstrap new models without labeling any application-specific data. For cost-sensitive classification, we would like to extend this formulation so as to enable costs for pairs of classes instead of single classes.

Acknowledgements  We thank Robert E. Schapire for providing us with the BoosTexter classifier and his suggestions. We thank Dilek Hakkani-Tür, Murat Saraclar, Patrick Haffner, and Mazin Gilbert for many helpful discussions.


More information

CS286r Assign One. Answer Key

CS286r Assign One. Answer Key CS286r Assgn One Answer Key 1 Game theory 1.1 1.1.1 Let off-equlbrum strateges also be that people contnue to play n Nash equlbrum. Devatng from any Nash equlbrum s a weakly domnated strategy. That s,

More information

3. Stress-strain relationships of a composite layer

3. Stress-strain relationships of a composite layer OM PO I O U P U N I V I Y O F W N ompostes ourse 8-9 Unversty of wente ng. &ech... tress-stran reatonshps of a composte ayer - Laurent Warnet & emo Aerman.. tress-stran reatonshps of a composte ayer Introducton

More information

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton

More information

Lecture 3: Shannon s Theorem

Lecture 3: Shannon s Theorem CSE 533: Error-Correctng Codes (Autumn 006 Lecture 3: Shannon s Theorem October 9, 006 Lecturer: Venkatesan Guruswam Scrbe: Wdad Machmouch 1 Communcaton Model The communcaton model we are usng conssts

More information

Space of ML Problems. CSE 473: Artificial Intelligence. Parameter Estimation and Bayesian Networks. Learning Topics

Space of ML Problems. CSE 473: Artificial Intelligence. Parameter Estimation and Bayesian Networks. Learning Topics /7/7 CSE 73: Artfcal Intellgence Bayesan - Learnng Deter Fox Sldes adapted from Dan Weld, Jack Breese, Dan Klen, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer What s Beng Learned? Space

More information

On the correction of the h-index for career length

On the correction of the h-index for career length 1 On the correcton of the h-ndex for career length by L. Egghe Unverstet Hasselt (UHasselt), Campus Depenbeek, Agoralaan, B-3590 Depenbeek, Belgum 1 and Unverstet Antwerpen (UA), IBW, Stadscampus, Venusstraat

More information

EM and Structure Learning

EM and Structure Learning EM and Structure Learnng Le Song Machne Learnng II: Advanced Topcs CSE 8803ML, Sprng 2012 Partally observed graphcal models Mxture Models N(μ 1, Σ 1 ) Z X N N(μ 2, Σ 2 ) 2 Gaussan mxture model Consder

More information

Uncertainty Specification and Propagation for Loss Estimation Using FOSM Methods

Uncertainty Specification and Propagation for Loss Estimation Using FOSM Methods Uncertanty Specfcaton and Propagaton for Loss Estmaton Usng FOSM Methods J.W. Baer and C.A. Corne Dept. of Cv and Envronmenta Engneerng, Stanford Unversty, Stanford, CA 94305-400 Keywords: Sesmc, oss estmaton,

More information

Correspondence. Performance Evaluation for MAP State Estimate Fusion I. INTRODUCTION

Correspondence. Performance Evaluation for MAP State Estimate Fusion I. INTRODUCTION Correspondence Performance Evauaton for MAP State Estmate Fuson Ths paper presents a quanttatve performance evauaton method for the maxmum a posteror (MAP) state estmate fuson agorthm. Under dea condtons

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

Application of support vector machine in health monitoring of plate structures

Application of support vector machine in health monitoring of plate structures Appcaton of support vector machne n heath montorng of pate structures *Satsh Satpa 1), Yogesh Khandare ), Sauvk Banerjee 3) and Anrban Guha 4) 1), ), 4) Department of Mechanca Engneerng, Indan Insttute

More information

Gaussian Mixture Models

Gaussian Mixture Models Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous

More information

Pulse Coded Modulation

Pulse Coded Modulation Pulse Coded Modulaton PCM (Pulse Coded Modulaton) s a voce codng technque defned by the ITU-T G.711 standard and t s used n dgtal telephony to encode the voce sgnal. The frst step n the analog to dgtal

More information

JAB Chain. Long-tail claims development. ASTIN - September 2005 B.Verdier A. Klinger

JAB Chain. Long-tail claims development. ASTIN - September 2005 B.Verdier A. Klinger JAB Chan Long-tal clams development ASTIN - September 2005 B.Verder A. Klnger Outlne Chan Ladder : comments A frst soluton: Munch Chan Ladder JAB Chan Chan Ladder: Comments Black lne: average pad to ncurred

More information

Semi-supervised Classification with Active Query Selection

Semi-supervised Classification with Active Query Selection Sem-supervsed Classfcaton wth Actve Query Selecton Jao Wang and Swe Luo School of Computer and Informaton Technology, Beng Jaotong Unversty, Beng 00044, Chna Wangjao088@63.com Abstract. Labeled samples

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Bayesian Learning. Smart Home Health Analytics Spring Nirmalya Roy Department of Information Systems University of Maryland Baltimore County

Bayesian Learning. Smart Home Health Analytics Spring Nirmalya Roy Department of Information Systems University of Maryland Baltimore County Smart Home Health Analytcs Sprng 2018 Bayesan Learnng Nrmalya Roy Department of Informaton Systems Unversty of Maryland Baltmore ounty www.umbc.edu Bayesan Learnng ombnes pror knowledge wth evdence to

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Relevance Vector Machines Explained

Relevance Vector Machines Explained October 19, 2010 Relevance Vector Machnes Explaned Trstan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introducton Ths document has been wrtten n an attempt to make Tppng s [1] Relevance Vector Machnes

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information

ON AUTOMATIC CONTINUITY OF DERIVATIONS FOR BANACH ALGEBRAS WITH INVOLUTION

ON AUTOMATIC CONTINUITY OF DERIVATIONS FOR BANACH ALGEBRAS WITH INVOLUTION European Journa of Mathematcs and Computer Scence Vo. No. 1, 2017 ON AUTOMATC CONTNUTY OF DERVATONS FOR BANACH ALGEBRAS WTH NVOLUTON Mohamed BELAM & Youssef T DL MATC Laboratory Hassan Unversty MORO CCO

More information

Predicting Model of Traffic Volume Based on Grey-Markov

Predicting Model of Traffic Volume Based on Grey-Markov Vo. No. Modern Apped Scence Predctng Mode of Traffc Voume Based on Grey-Marov Ynpeng Zhang Zhengzhou Muncpa Engneerng Desgn & Research Insttute Zhengzhou 5005 Chna Abstract Grey-marov forecastng mode of

More information

Expectation Maximization Mixture Models HMMs

Expectation Maximization Mixture Models HMMs -755 Machne Learnng for Sgnal Processng Mture Models HMMs Class 9. 2 Sep 200 Learnng Dstrbutons for Data Problem: Gven a collecton of eamples from some data, estmate ts dstrbuton Basc deas of Mamum Lelhood

More information

DISTRIBUTED PROCESSING OVER ADAPTIVE NETWORKS. Cassio G. Lopes and Ali H. Sayed

DISTRIBUTED PROCESSING OVER ADAPTIVE NETWORKS. Cassio G. Lopes and Ali H. Sayed DISTRIBUTED PROCESSIG OVER ADAPTIVE ETWORKS Casso G Lopes and A H Sayed Department of Eectrca Engneerng Unversty of Caforna Los Angees, CA, 995 Ema: {casso, sayed@eeucaedu ABSTRACT Dstrbuted adaptve agorthms

More information

Polite Water-filling for Weighted Sum-rate Maximization in MIMO B-MAC Networks under. Multiple Linear Constraints

Polite Water-filling for Weighted Sum-rate Maximization in MIMO B-MAC Networks under. Multiple Linear Constraints 2011 IEEE Internatona Symposum on Informaton Theory Proceedngs Pote Water-fng for Weghted Sum-rate Maxmzaton n MIMO B-MAC Networks under Mutpe near Constrants An u 1, Youjan u 2, Vncent K. N. au 3, Hage

More information

Correlation and Regression. Correlation 9.1. Correlation. Chapter 9

Correlation and Regression. Correlation 9.1. Correlation. Chapter 9 Chapter 9 Correlaton and Regresson 9. Correlaton Correlaton A correlaton s a relatonshp between two varables. The data can be represented b the ordered pars (, ) where s the ndependent (or eplanator) varable,

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

Evaluation for sets of classes

Evaluation for sets of classes Evaluaton for Tet Categorzaton Classfcaton accuracy: usual n ML, the proporton of correct decsons, Not approprate f the populaton rate of the class s low Precson, Recall and F 1 Better measures 21 Evaluaton

More information

VQ widely used in coding speech, image, and video

VQ widely used in coding speech, image, and video at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng

More information

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN The MIT Press, 2014 Lecture Sldes for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydn@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/2ml3e CHAPTER 3: BAYESIAN DECISION THEORY Probablty

More information

Distributed Moving Horizon State Estimation of Nonlinear Systems. Jing Zhang

Distributed Moving Horizon State Estimation of Nonlinear Systems. Jing Zhang Dstrbuted Movng Horzon State Estmaton of Nonnear Systems by Jng Zhang A thess submtted n parta fufment of the requrements for the degree of Master of Scence n Chemca Engneerng Department of Chemca and

More information

Achieving Optimal Throughput Utility and Low Delay with CSMA-like Algorithms: A Virtual Multi-Channel Approach

Achieving Optimal Throughput Utility and Low Delay with CSMA-like Algorithms: A Virtual Multi-Channel Approach Achevng Optma Throughput Utty and Low Deay wth SMA-ke Agorthms: A Vrtua Mut-hanne Approach Po-Ka Huang, Student Member, IEEE, and Xaojun Ln, Senor Member, IEEE Abstract SMA agorthms have recenty receved

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

A new P system with hybrid MDE- k -means algorithm for data. clustering. 1 Introduction

A new P system with hybrid MDE- k -means algorithm for data. clustering. 1 Introduction Wesun, Lasheng Xang, Xyu Lu A new P system wth hybrd MDE- agorthm for data custerng WEISUN, LAISHENG XIANG, XIYU LIU Schoo of Management Scence and Engneerng Shandong Norma Unversty Jnan, Shandong CHINA

More information

10-701/ Machine Learning, Fall 2005 Homework 3

10-701/ Machine Learning, Fall 2005 Homework 3 10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40

More information

NONLINEAR SYSTEM IDENTIFICATION BASE ON FW-LSSVM

NONLINEAR SYSTEM IDENTIFICATION BASE ON FW-LSSVM Journa of heoretca and Apped Informaton echnoogy th February 3. Vo. 48 No. 5-3 JAI & LLS. A rghts reserved. ISSN: 99-8645 www.jatt.org E-ISSN: 87-395 NONLINEAR SYSEM IDENIFICAION BASE ON FW-LSSVM, XIANFANG

More information

SDMML HT MSc Problem Sheet 4

SDMML HT MSc Problem Sheet 4 SDMML HT 06 - MSc Problem Sheet 4. The recever operatng characterstc ROC curve plots the senstvty aganst the specfcty of a bnary classfer as the threshold for dscrmnaton s vared. Let the data space be

More information