Unified Utility Maximization Framework for Resource Selection

Luo Si
Language Technology Inst., School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
lsi@cs.cmu.edu

Jamie Callan
Language Technology Inst., School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
callan@cs.cmu.edu

ABSTRACT
This paper presents a unified utility framework for resource selection of distributed text information retrieval. This new framework shows an efficient and effective way to infer the probabilities of relevance of all the documents across the text databases. With the estimated relevance information, resource selection can be made by explicitly optimizing the goals of different applications. Specifically, when used for database recommendation, the selection is optimized for the goal of high-recall (include as many relevant documents as possible in the selected databases); when used for distributed document retrieval, the selection targets the high-precision goal (high precision in the final merged list of documents). This new model provides a more solid framework for distributed information retrieval. Empirical studies show that it is at least as effective as other state-of-the-art algorithms.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]:

General Terms
Algorithms

Keywords
Distributed information retrieval, resource selection

1. INTRODUCTION
Conventional search engines such as Google or AltaVista use an ad-hoc information retrieval solution by assuming all the searchable documents can be copied into a single centralized database for the purpose of indexing. Distributed information retrieval, also known as federated search [,4,7,,4,22], is different from ad-hoc information retrieval as it addresses the cases when documents cannot be acquired and stored in a single database. For example, Hidden Web contents (also called invisible or deep Web contents) are information on the Web that cannot be accessed by the conventional search engines.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CIKM'04, November 8-13, 2004, Washington, DC, USA. Copyright 2004 ACM //04/00 $5.00.

Hidden web contents have been estimated to be 2-50 [9] times larger than the contents that can be searched by conventional search engines. Therefore, it is very important to search this type of valuable information.

The architecture of a distributed search solution is highly influenced by different environmental characteristics. In a small local area network such as small company environments, the information providers may cooperate to provide corpus statistics or use the same type of search engines. Early distributed information retrieval research focused on this type of cooperative environments [,8]. On the other side, in a wide area network such as very large corporate environments or on the Web there are many types of search engines and it is difficult to assume that all the information providers can cooperate as they are required. Even if they are willing to cooperate in these environments, it may be hard to enforce a single solution for all the information providers or to detect whether information sources provide the correct information as they are required. Many applications fall into the latter type of uncooperative environments such as the Mind project [6] which integrates non-cooperating digital libraries or the QProber system [9] which supports browsing and searching of uncooperative hidden Web databases. In this paper, we focus mainly on uncooperative environments that contain multiple types of independent search engines.

There are three important sub-problems in distributed information retrieval. First, information about the contents of each individual database must be acquired (resource representation) [,8,2]. Second, given a query, a set of resources must be selected to do the search (resource selection) [5,7,2]. Third, the results retrieved from all the selected resources have to be merged into a single final list before it can be presented to the end user (retrieval and results merging) [,5,20,22].

Many types of solutions exist for distributed information retrieval.
Invisible-web.net provides guided browsing of hidden Web databases by collecting the resource descriptions of these databases and building hierarchies of classes that group them by similar topics. A database recommendation system goes a step further than a browsing system like Invisible-web.net by recommending the most relevant information sources for users' queries. It is composed of the resource description and the resource selection components. This solution is useful when the users want to browse the selected databases by themselves instead of asking the system to retrieve relevant documents automatically. Distributed document retrieval is a more sophisticated task. It selects relevant information sources for users' queries as the database recommendation system does. Furthermore, users' queries are forwarded to the corresponding selected databases and the returned individual ranked lists are merged into a single list to present to the users.

The goal of a database recommendation system is to select a small set of resources that contain as many relevant documents as possible, which we call a high-recall goal. On the other side, the effectiveness of distributed document retrieval is often measured by the Precision of the final merged document result list, which we call a high-precision goal. Prior research indicated that these two goals are related but not identical [4,2]. However, most previous solutions simply use an effective resource selection algorithm of a database recommendation system for a distributed document retrieval system or solve the inconsistency with heuristic methods [,4,2].

This paper presents a unified utility maximization framework to integrate the resource selection problem of both database recommendation and distributed document retrieval together by treating them as different optimization goals.

First, a centralized sample database is built by randomly sampling a small amount of documents from each database with query-based sampling []; database size statistics are also estimated [2]. A logistic transformation model is learned off line with a small amount of training queries to map the centralized document scores in the centralized sample database to the corresponding probabilities of relevance.

Second, after a new query is submitted, the query can be used to search the centralized sample database, which produces a score for each sampled document. The probability of relevance for each document in the centralized sample database can be estimated by applying the logistic model to each document's score.
Then, the probabilities of relevance of all the (mostly unseen) documents among the available databases can be estimated using the probabilities of relevance of the documents in the centralized sample database and the database size estimates.

For the task of resource selection for a database recommendation system, the databases can be ranked by the expected number of relevant documents to meet the high-recall goal. For resource selection for a distributed document retrieval system, databases containing a small number of documents with large probabilities of relevance are favored over databases containing many documents with small probabilities of relevance. This selection criterion meets the high-precision goal of the distributed document retrieval application. Furthermore, the Semi-supervised learning (SSL) [20,22] algorithm is applied to merge the returned documents into a final ranked list.

The unified utility framework makes very few assumptions and works in uncooperative environments. Two key features make it a more solid model for distributed information retrieval: i) It formalizes the resource selection problems of different applications as various utility functions, and optimizes the utility functions to achieve the optimal results accordingly; and ii) It shows an effective and efficient way to estimate the probabilities of relevance of all documents across databases. Specifically, the framework builds logistic models on the centralized sample database to transform centralized retrieval scores to the corresponding probabilities of relevance and uses the centralized sample database as the bridge between individual databases and the logistic model. The human effort (relevance judgment) required to train the single centralized logistic model does not scale with the number of databases. This is a large advantage over previous research, which required the amount of human effort to be linear with the number of databases [7,5].

The unified utility framework is not only more theoretically solid but also very effective. Empirical studies show the new model to be at least as accurate as the state-of-the-art algorithms in a variety of configurations.

The next section discusses related work. Section 3 describes the new unified utility maximization model.
Section 4 explains our experimental methodology. Sections 5 and 6 present our experimental results for resource selection and document retrieval. Section 7 concludes.

2. PRIOR RESEARCH
There has been considerable research on all the sub-problems of distributed information retrieval. We survey the most related work in this section.

The first problem of distributed information retrieval is resource representation. The STARTS protocol is one solution for acquiring resource descriptions in cooperative environments [8]. However, in uncooperative environments, even if the databases are willing to share their information, it is not easy to judge whether the information they provide is accurate or not. Furthermore, it is not easy to coordinate the databases to provide resource representations that are compatible with each other. Thus, in uncooperative environments, one common choice is query-based sampling, which randomly generates and sends queries to individual search engines and retrieves some documents to build the descriptions. As the sampled documents are selected by random queries, query-based sampling is not easily fooled by an adversarial spammer that is interested in attracting more traffic. Experiments have shown that rather accurate resource descriptions can be built by sending about 80 queries and downloading about 300 documents [].

Many resource selection algorithms such as gGlOSS/vGlOSS [8] and CORI [] have been proposed in the last decade. The CORI algorithm represents each database by its terms, the document frequencies and a small number of corpus statistics (details in []). As prior research on different datasets has shown the CORI algorithm to be the most stable and effective of the three algorithms [,7,8], we use it as a baseline algorithm in this work. The relevant document distribution estimation (ReDDE [2]) resource selection algorithm is a recent algorithm that tries to estimate the distribution of relevant documents across the available databases and ranks the databases accordingly. Although the ReDDE algorithm has been shown to be effective, it relies on heuristic constants that are set empirically [2].
The last step of the document retrieval sub-problem is results merging, which is the process of transforming database-specific document scores into comparable database-independent document scores. The semi-supervised learning (SSL) [20,22] result merging algorithm uses the documents acquired by query-based sampling as training data and linear regression to learn the database-specific, query-specific merging models. These linear models are used to convert the database-specific document scores into the approximated centralized document scores. The SSL algorithm has been shown to be effective [22]. It serves as an important component of our unified utility maximization framework (Section 3).

In order to achieve accurate document retrieval results, many previous methods simply use resource selection algorithms that are effective for database recommendation systems. But as pointed out above, a good resource selection algorithm optimized for high-recall may not work well for document retrieval, which targets the high-precision goal. This type of inconsistency has been observed in previous research [4,2]. The research in [2] tried to solve the problem with a heuristic method.

The research most similar to what we propose here is the decision-theoretic framework (DTF) [7,5]. This framework computes a selection that minimizes the overall costs (e.g., retrieval quality, time) of a document retrieval system, and several methods [5] have been proposed to estimate the retrieval quality. However, two points distinguish our research from the DTF model. First, the DTF is a framework designed specifically for document retrieval, but our new model integrates two distinct applications with different requirements (database recommendation and distributed document retrieval) into the same unified framework. Second, the DTF builds a model for each database to calculate the probabilities of relevance. This requires human relevance judgments for the results retrieved from each database. In contrast, our approach only builds one logistic model for the centralized sample database. The centralized sample database can serve as a bridge to connect the individual databases with the centralized logistic model, thus the probabilities of relevance of documents in different databases can be estimated.
This strategy can save a large amount of human judgment effort and is a big advantage of the unified utility maximization framework over the DTF, especially when there are a large number of databases.

3. UNIFIED UTILITY MAXIMIZATION FRAMEWORK
The Unified Utility Maximization (UUM) framework is based on estimating the probabilities of relevance of the (mostly unseen) documents available in the distributed search environment. In this section we describe how the probabilities of relevance are estimated and how they are used by the Unified Utility Maximization model. We also describe how the model can be optimized for the high-recall goal of a database recommendation system and the high-precision goal of a distributed document retrieval system.

3.1 Estimating Probabilities of Relevance
As pointed out above, the purpose of resource selection is high-recall and the purpose of document retrieval is high-precision. In order to meet these diverse goals, the key issue is to estimate the probabilities of relevance of the documents in various databases. This is a difficult problem because we can only observe a sample of the contents of each database using query-based sampling. Our strategy is to make full use of all the available information to calculate the probability estimates.

3.1.1 Learning Probabilities of Relevance
In the resource description step, the centralized sample database is built by query-based sampling and the database sizes are estimated using the sample-resample method [2]. At the same time, an effective retrieval algorithm (Inquery [2]) is applied on the centralized sample database with a small number (e.g., 50) of training queries. For each training query, the CORI resource selection algorithm [] is applied to select some number (e.g., 10) of databases and retrieve 50 document ids from each database. The SSL results merging algorithm [20,22] is used to merge the results. Then, we can download the top 50 documents in the final merged list and calculate their corresponding centralized scores using Inquery and the corpus statistics of the centralized sample database.
The centralized scores are further normalized (divided by the maximum centralized score for each query), as this method has been suggested to improve estimation accuracy in previous research [5]. Human judgment is acquired for those documents and a logistic model is built to transform the normalized centralized document scores to probabilities of relevance as follows:

$$R(d) = P(rel \mid d) = \frac{\exp(a_c + b_c \bar{S}_c(d))}{1 + \exp(a_c + b_c \bar{S}_c(d))} \tag{1}$$

where $\bar{S}_c(d)$ is the normalized centralized document score and $a_c$ and $b_c$ are the two parameters of the logistic model. These two parameters are estimated by maximizing the probabilities of relevance of the training queries. The logistic model provides us the tool to calculate the probabilities of relevance from centralized document scores.

3.1.2 Estimating Centralized Document Scores
When the user submits a new query, the centralized document scores of the documents in the centralized sample database are calculated. However, in order to calculate the probabilities of relevance, we need to estimate centralized document scores for all documents across the databases instead of only the sampled documents. This goal is accomplished using: the centralized scores of the documents in the centralized sample database, and the database size statistics.

We define the database scale factor for the i-th database as the ratio of the estimated database size and the number of documents sampled from this database as follows:

$$SF_{db_i} = \frac{\hat{N}_{db_i}}{N_{db_i\_samp}} \tag{2}$$

where $\hat{N}_{db_i}$ is the estimated database size and $N_{db_i\_samp}$ is the number of documents from the i-th database in the centralized sample database. The intuition behind the database scale factor is that, for a database whose scale factor is 50, if one document from this database in the centralized sample database has a centralized document score of 0.5, we may guess that there are about 50 documents in that database which have scores of about 0.5. Actually, we can apply a finer non-parametric linear interpolation method to estimate the centralized document score curve for each database. Formally, we rank all the sampled documents from the i-th database by their centralized document scores to get the sampled centralized document score list $\{S_c(ds_{i1}), S_c(ds_{i2}), S_c(ds_{i3}), \ldots\}$ for the i-th database; we assume that if we could calculate the centralized document scores for all the documents in this database and get the complete centralized document score list, the top document in the sampled list would have rank $SF_{db_i}/2$, the second document in the sampled list would have rank $SF_{db_i} \cdot 3/2$, and so on. Therefore, the data points of sampled documents in the complete list are: $\{(SF_{db_i}/2, S_c(ds_{i1})), (SF_{db_i} \cdot 3/2, S_c(ds_{i2})), (SF_{db_i} \cdot 5/2, S_c(ds_{i3})), \ldots\}$. Piecewise linear interpolation is applied to estimate the centralized document score curve, as illustrated in Figure 1. The complete centralized document score list can be estimated by calculating the values of different ranks on the centralized document curve as: $S_c(d_{ij}),\ j \in [1, \hat{N}_{db_i}]$.

It can be seen from Figure 1 that more sample data points produce more accurate estimates of the centralized document score curves. However, for databases with large database scale ratios, this kind of linear interpolation may be rather inaccurate, especially for the top ranked (e.g., $[1, SF_{db_i}/2]$) documents. Therefore, an alternative solution is proposed to estimate the centralized document scores of the top ranked documents for databases with large scale ratios (e.g., larger than 100). Specifically, a logistic model is built for each of these databases. The logistic model is used to estimate the centralized document score of the top 1 document in the corresponding database by using the two sampled documents from that database with highest centralized scores.

$$S_c(d_{i1}) = \frac{\exp(\alpha_{i0} + \alpha_{i1} S_c(ds_{i1}) + \alpha_{i2} S_c(ds_{i2}))}{1 + \exp(\alpha_{i0} + \alpha_{i1} S_c(ds_{i1}) + \alpha_{i2} S_c(ds_{i2}))} \tag{3}$$

$\alpha_{i0}$, $\alpha_{i1}$ and $\alpha_{i2}$ are the parameters of the logistic model. For each training query, the top 1 retrieved document of each database is downloaded and the corresponding centralized document score is calculated. Together with the scores of the top two sampled documents, these parameters can be estimated.
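The rank assignment and piecewise linear interpolation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and clamping the curve outside the sampled range (ranks before the first data point and after the last one) is an assumption, since the text only specifies the behavior between data points.

```python
from bisect import bisect_left


def scale_factor(estimated_db_size, num_sampled):
    """Equation 2: estimated database size over number of sampled documents."""
    return estimated_db_size / num_sampled


def interpolate_score_curve(sample_scores, sf, db_size):
    """Estimate centralized scores for ranks 1..db_size.

    The k-th best sampled score (k = 1, 2, ...) is placed at rank
    sf * (2k - 1) / 2, and scores in between are linearly interpolated.
    Assumes sample_scores is non-empty.
    """
    scores = sorted(sample_scores, reverse=True)
    ranks = [sf * (2 * k + 1) / 2 for k in range(len(scores))]
    curve = []
    for j in range(1, db_size + 1):
        if j <= ranks[0]:
            # Assumption: clamp ranks above the first data point.
            curve.append(scores[0])
        elif j >= ranks[-1]:
            # Assumption: clamp ranks below the last data point.
            curve.append(scores[-1])
        else:
            hi = bisect_left(ranks, j)   # first data point at or after rank j
            lo = hi - 1
            t = (j - ranks[lo]) / (ranks[hi] - ranks[lo])
            curve.append(scores[lo] + t * (scores[hi] - scores[lo]))
    return curve
```

With three sampled documents and an estimated size of 150, the scale factor is 50 and the data points sit at ranks 25, 75 and 125, which matches the scale-factor-50 intuition given in the text.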
After the centralized score of the top document is estimated, an exponential function is fitted for the top part ($[1, SF_{db_i}/2]$) of the centralized document score curve as:

$$S_c(d_{ij}) = \exp(\beta_{i0} + \beta_{i1} \cdot j), \quad j \in [1, SF_{db_i}/2] \tag{4}$$

$$\beta_{i0} = \log(S_c(d_{i1})) - \beta_{i1} \tag{5}$$

$$\beta_{i1} = \frac{\log(S_c(ds_{i1})) - \log(S_c(d_{i1}))}{SF_{db_i}/2 - 1} \tag{6}$$

The two parameters $\beta_{i0}$ and $\beta_{i1}$ are fitted to make sure the exponential function passes through the two points $(1, S_c(d_{i1}))$ and $(SF_{db_i}/2, S_c(ds_{i1}))$. The exponential function is only used to adjust the top part of the centralized document score curve and the lower part of the curve is still fitted with the linear interpolation method described above. The adjustment by fitting an exponential function to the top ranked documents has been shown empirically to produce more accurate results.

Figure 1. Linear interpolation construction of the complete centralized document score list (database scale factor is 50).

From the centralized document score curves, we can estimate the complete centralized document score lists accordingly for all the available databases. After the estimated centralized document scores are normalized, the complete lists of probabilities of relevance can be constructed out of the complete centralized document score lists by Equation 1. Formally for the i-th database, the complete list of probabilities of relevance is: $R(d_{ij}),\ j \in [1, \hat{N}_{db_i}]$.

3.2 The Unified Utility Maximization Model
In this section, we formally define the new unified utility maximization model, which optimizes the resource selection problems for two goals of high-recall (database recommendation) and high-precision (distributed document retrieval) in the same framework.

In the task of database recommendation, the system needs to decide how to rank databases. In the task of document retrieval, the system not only needs to select the databases but also needs to decide how many documents to retrieve from each selected database. We generalize the database recommendation selection process, which implicitly recommends all documents in every selected database, as a special case of the selection decision for the document retrieval task. Formally, we denote $d_i$ as the number of documents we would like to retrieve from the i-th database and $\vec{d} = \{d_1, d_2, \ldots\}$ as a selection action for all the databases.
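Returning to the score-curve adjustment of Equations 4 to 6 and the score-to-probability mapping of Equation 1, the following Python sketch shows one way to compute both. The function names and the example parameter values are hypothetical; the sketch assumes positive scores and a scale factor larger than 2 so that the denominator of Equation 6 is nonzero.

```python
import math


def fit_top_exponential(top_score, first_sample_score, sf):
    """Equations 5 and 6: fit S(j) = exp(b0 + b1 * j) through the points
    (1, top_score) and (sf / 2, first_sample_score)."""
    b1 = (math.log(first_sample_score) - math.log(top_score)) / (sf / 2 - 1)
    b0 = math.log(top_score) - b1
    return b0, b1


def adjust_top_of_curve(curve, top_score, first_sample_score, sf):
    """Equation 4: replace ranks 1..floor(sf / 2) of an interpolated curve
    with the fitted exponential; the lower part is left unchanged."""
    b0, b1 = fit_top_exponential(top_score, first_sample_score, sf)
    cut = min(int(sf // 2), len(curve))
    top = [math.exp(b0 + b1 * j) for j in range(1, cut + 1)]
    return top + curve[cut:]


def prob_relevance(norm_score, a, b):
    """Equation 1: logistic map from a normalized centralized score to
    P(rel | d). a and b are the learned parameters; the values used in any
    example call are illustrative only."""
    z = a + b * norm_score
    return 1.0 / (1.0 + math.exp(-z))   # same as exp(z) / (1 + exp(z))
```

By construction the fitted curve equals the estimated top score at rank 1 and the best sampled score at rank SF/2, so the adjusted top part joins the interpolated lower part without a jump.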
The database selection decision is made based on the complete lists of probabilities of relevance for all the databases. The complete lists of probabilities of relevance are inferred from all the available information, specifically $\vec{R}_s$, which stands for the resource descriptions acquired by query-based sampling and the database size estimates acquired by sample-resample, and $\vec{S}_c$, which stands for the centralized document scores of the documents in the centralized sample database.

If the method of estimating centralized document scores and probabilities of relevance in Section 3.1 is acceptable, then the most probable complete lists of probabilities of relevance can be derived and we denote them as $\theta^* = \{(R(d_{1j}),\ j \in [1, \hat{N}_{db_1}]),\ (R(d_{2j}),\ j \in [1, \hat{N}_{db_2}]),\ \ldots\}$. Random vector $\theta$ denotes an arbitrary set of complete lists of probabilities of relevance and $P(\theta \mid \vec{R}_s, \vec{S}_c)$ the probability of generating this set of lists. Finally, to each selection action $\vec{d}$ and a set of complete lists of probabilities of relevance $\theta$, we associate a utility function $U(\vec{d}, \theta)$, which indicates the benefit from making the $\vec{d}$ selection when the true complete lists of probabilities of relevance are $\theta$.

Therefore, the selection decision defined by the Bayesian framework is:

$$\vec{d}^* = \arg\max_{\vec{d}} \int_{\theta} U(\vec{d}, \theta)\, P(\theta \mid \vec{R}_s, \vec{S}_c)\, d\theta \tag{7}$$

One common approach to simplify the computation in the Bayesian framework is to only calculate the utility function at the most probable parameter values instead of calculating the whole expectation. In other words, we only need to calculate $U(\vec{d}, \theta^*)$, and Equation 7 is simplified as follows:

$$\vec{d}^* = \arg\max_{\vec{d}} U(\vec{d}, \theta^*) \tag{8}$$

This equation serves as the basic model for both the database recommendation system and the document retrieval system.

3.3 Resource Selection for High-Recall
High-recall is the goal of the resource selection algorithm in federated search tasks such as database recommendation. The goal is to select a small set of resources (e.g., less than $N_{sdb}$ databases) that contain as many relevant documents as possible, which can be formally defined as:

$$U(\vec{d}, \theta^*) = \sum_{i} I(d_i) \sum_{j=1}^{\hat{N}_{db_i}} R(d_{ij}) \tag{9}$$

$I(d_i)$ is the indicator function, which is 1 when the i-th database is selected and 0 otherwise. Plug this equation into the basic model in Equation 8 and associate the selected database number constraint to obtain the following:

$$\vec{d}^* = \arg\max_{\vec{d}} \sum_{i} I(d_i) \sum_{j=1}^{\hat{N}_{db_i}} R(d_{ij}) \quad \text{subject to: } \sum_{i} I(d_i) = N_{sdb} \tag{10}$$

The solution of this optimization problem is very simple. We can calculate the expected number of relevant documents for each database as follows:

$$\hat{N}_{Rd_i} = \sum_{j=1}^{\hat{N}_{db_i}} R(d_{ij}) \tag{11}$$

The $N_{sdb}$ databases with the largest expected number of relevant documents can be selected to meet the high-recall goal. We call this the UUM/HR algorithm (Unified Utility Maximization for High-Recall).

3.4 Resource Selection for High-Precision
High-Precision is the goal of the resource selection algorithm in federated search tasks such as distributed document retrieval. It is measured by the Precision at the top part of the final merged document list. This high-precision criterion is realized by the following utility function, which measures the Precision of retrieved documents from the selected databases.
$$U(\vec{d}, \theta^*) = \sum_{i} I(d_i) \sum_{j=1}^{d_i} R(d_{ij}) \tag{12}$$

Note that the key difference between Equation 12 and Equation 9 is that Equation 9 sums up the probabilities of relevance of all the documents in a database, while Equation 12 only considers a much smaller part of the ranking. Specifically, we can calculate the optimal selection decision by:

$$\vec{d}^* = \arg\max_{\vec{d}} \sum_{i} I(d_i) \sum_{j=1}^{d_i} R(d_{ij}) \tag{13}$$

Different kinds of constraints caused by different characteristics of the document retrieval tasks can be associated with the above optimization problem. The most common one is to select a fixed number ($N_{sdb}$) of databases and retrieve a fixed number ($N_{rdoc}$) of documents from each selected database, formally defined as:

$$\vec{d}^* = \arg\max_{\vec{d}} \sum_{i} I(d_i) \sum_{j=1}^{d_i} R(d_{ij}) \quad \text{subject to: } \sum_{i} I(d_i) = N_{sdb};\ \ d_i = N_{rdoc} \text{ if } d_i \neq 0 \tag{14}$$

This optimization problem can be solved easily by calculating the number of expected relevant documents in the top part of each database's complete list of probabilities of relevance:

$$\hat{N}_{Top\_Rd_i} = \sum_{j=1}^{N_{rdoc}} R(d_{ij}) \tag{15}$$

Then the databases can be ranked by these values and selected. We call this the UUM/HP-FL algorithm (Unified Utility Maximization for High-Precision with Fixed Length document rankings from each selected database).

A more complex situation is to vary the number of retrieved documents from each selected database. More specifically, we allow different selected databases to return different numbers of documents. For simplification, the result list lengths are required to be multiples of a baseline number 10. (This value can also be varied, but for simplification it is set to 10 in this paper.) This restriction is set to simulate the behavior of commercial search engines on the Web. (Search engines such as Google and AltaVista return only 10 or 20 document ids for every result page.) This procedure saves the computation time of calculating optimal database selection by allowing the step of dynamic programming to be 10 instead of 1 (more detail is discussed later). For further simplification, we restrict to select at most 100 documents from each database ($d_i \le 100$). Then, the selection optimization problem is formalized as follows:

$$\vec{d}^* = \arg\max_{\vec{d}} \sum_{i} I(d_i) \sum_{j=1}^{d_i} R(d_{ij}) \quad \text{subject to: } \sum_{i} I(d_i) = N_{sdb};\ \ \sum_{i} d_i = N_{Total\_rdoc};\ \ d_i = 10 \cdot k,\ k \in [0, 1, 2, \ldots, 10] \tag{16}$$

$N_{Total\_rdoc}$ is the total number of documents to be retrieved.
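The three selection rules can be illustrated with a short Python sketch. The closed-form rankings follow Equations 11 and 15 directly. For the variable-length problem of Equation 16, the sketch uses a dynamic program over databases, page budget and number of selected databases; it keeps only a rolling two-dimensional table rather than a full three-dimensional array, which is an implementation choice of this sketch and not necessarily how the authors coded it. All function and parameter names are hypothetical, and each probability list is assumed to be sorted by rank.

```python
def uum_hr(prob_lists, n_sdb):
    """UUM/HR: rank databases by expected number of relevant documents
    (Equation 11) and keep the top n_sdb database indices."""
    expected = [sum(p) for p in prob_lists]
    order = sorted(range(len(prob_lists)), key=lambda i: expected[i], reverse=True)
    return order[:n_sdb]


def uum_hp_fl(prob_lists, n_sdb, n_rdoc):
    """UUM/HP-FL: rank databases by expected relevant documents among their
    top n_rdoc documents (Equation 15)."""
    top = [sum(p[:n_rdoc]) for p in prob_lists]
    order = sorted(range(len(prob_lists)), key=lambda i: top[i], reverse=True)
    return order[:n_sdb]


def uum_hp_vl(prob_lists, n_sdb, n_total, step=10, max_steps=10):
    """UUM/HP-VL: dynamic program for Equation 16.

    Returns (utility, {database index: documents to retrieve}), or
    (-inf, None) if the constraints cannot be met. n_total is rounded
    down to a multiple of step.
    """
    pages = n_total // step
    neg = float("-inf")
    # best[y][z]: best (utility, allocation) that retrieves y * step documents
    # from exactly z of the databases considered so far.
    best = [[(neg, None)] * (n_sdb + 1) for _ in range(pages + 1)]
    best[0][0] = (0.0, {})
    for i, probs in enumerate(prob_lists):
        # gain[k]: expected relevant documents in the top k * step of database i
        gain = [sum(probs[:k * step]) for k in range(max_steps + 1)]
        new = [row[:] for row in best]   # default: database i is not selected
        for y in range(1, pages + 1):
            for z in range(1, n_sdb + 1):
                for k in range(1, min(y, max_steps) + 1):
                    prev_u, prev_alloc = best[y - k][z - 1]
                    if prev_alloc is None:
                        continue
                    u = prev_u + gain[k]
                    if u > new[y][z][0]:
                        new[y][z] = (u, {**prev_alloc, i: k * step})
        best = new
    return best[pages][n_sdb]
```

A small example shows how the goals diverge: a database with many moderately relevant documents can win under the high-recall rule while a database with a few highly relevant documents wins under the fixed-length high-precision rule, and the variable-length rule splits the retrieval budget unevenly between them.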
Unfortunately, there is no simple solution for this optimization problem as there are for Equations 10 and 14. However, a dynamic programming algorithm can be applied to calculate the optimal solution. The basic steps of this dynamic programming method are described in Figure 2. As this algorithm allows retrieving result lists of varying lengths from each selected database, it is called the UUM/HP-VL algorithm.

Input: Complete lists of probabilities of relevance for all the $|DB|$ databases.
Output: Optimal selection solution for Equation 16.
i) Create the three-dimensional array $Sel(1..|DB|,\ 1..N_{Total\_rdoc}/10,\ 1..N_{sdb})$. Each $Sel(x, y, z)$ is associated with a selection decision $\vec{d}_{xyz}$, which represents the best selection decision in the condition: only databases from number 1 to number $x$ are considered for selection; totally $y \cdot 10$ documents will be retrieved; only $z$ databases are selected out of the $x$ database candidates. And $Sel(x, y, z)$ is the corresponding utility value by choosing the best selection.
ii) Initialize $Sel(1,\ 1..N_{Total\_rdoc}/10,\ 1..N_{sdb})$ with only the estimated relevance information of the 1st database.
iii) Iterate the current database candidate $i$ from 2 to $|DB|$. For each entry $Sel(i, y, z)$, find $k^*$ such that:
$$k^* = \arg\max_{k} \Big( Sel(i-1,\ y-k,\ z-1) + \sum_{j \le 10 \cdot k} R(d_{ij}) \Big) \quad \text{subject to: } 1 \le k \le \min(y, 10)$$
If $Sel(i-1,\ y-k^*,\ z-1) + \sum_{j \le 10 \cdot k^*} R(d_{ij}) > Sel(i-1, y, z)$, this means that we should retrieve $10 \cdot k^*$ documents from the i-th database; otherwise we should not select this database and the previous best solution $Sel(i-1, y, z)$ should be kept. Then set the value of $\vec{d}_{iyz}$ and $Sel(i, y, z)$ accordingly.
iv) The best selection solution is given by $\vec{d}_{|DB|,\ N_{Total\_rdoc}/10,\ N_{sdb}}$ and the corresponding utility value is $Sel(|DB|,\ N_{Total\_rdoc}/10,\ N_{sdb})$.

Figure 2. The dynamic programming optimization procedure for Equation 16.

After the selection decisions are made, the selected databases are searched and the corresponding document ids are retrieved from each database. The final step of document retrieval is to merge the returned results into a single ranked list with the semi-supervised learning algorithm. It was pointed out before that the SSL algorithm maps the database-specific scores into the centralized document scores and builds the final ranked list accordingly, which is consistent with all our selection procedures where documents with higher probabilities of relevance (thus higher centralized document scores) are selected.

4. EXPERIMENTAL METHODOLOGY
4.1 Testbeds
It is desirable to evaluate distributed information retrieval algorithms with testbeds that closely simulate the real world applications.

The TREC Web collections WT2g or WT10g [4,3] provide a way to partition documents by different Web servers. In this way, a large number (O(1000)) of databases with rather diverse contents could be created, which may make this testbed a good candidate to simulate operational environments such as the open domain hidden Web. However, two weaknesses of this testbed are: i) Each database contains only a small amount of documents (259 documents by average for WT2g) [4]; and ii) The contents of WT2g or WT10g are arbitrarily crawled from the Web. It is not likely for a hidden Web database to provide personal homepages or web pages indicating that the pages are under construction and there is no useful information at all. These types of web pages are contained in the WT2g/WT10g datasets. Therefore, the noisy Web data is not similar to high-quality hidden Web database contents, which are usually organized by domain experts.

Another choice is the TREC news/government data [,5,7, 8,2]. TREC news/government data is concentrated on relatively narrow topics. Compared with TREC Web data: i) The news/government documents are much more similar to the contents provided by a topic-oriented database than an arbitrary web page; ii) A database in this testbed is larger than that of TREC Web data. By average a database contains thousands of documents, which is more realistic than a database of TREC Web data with about 250 documents.

Table 1. Testbed statistics.
Testbed    Size (GB)    Number of documents       Size (MB)
                        Min     Avg     Max       Min     Avg     Max
Trec123    -            -       -       -         -       -       -

Table 2. Query set statistics.
Name       TREC Topic Set    TREC Topic Field    Average Length (Words)
Trec123    -                 Title               3.
As the contents and sizes of the databases in the TREC news/government testbed are more similar to those of a topic-oriented database, it is a good candidate to simulate the distributed information retrieval environments of large organizations (companies) or domain-specific hidden Web sites, such as West that provides access to legal, financial and news text databases [3]. As most current distributed information retrieval systems are developed for the environments of large organizations (companies) or domain-specific hidden Web other than open domain hidden Web, the TREC news/government testbed was chosen in this work.

The Trec123-100col-bysource testbed is one of the most used TREC news/government testbeds [,5,7,2]. It was chosen in this work. Three testbeds in [2] with skewed database size distributions and different types of relevant document distributions were also used to give a more thorough simulation of real environments.

Trec123-100col-bysource: 100 databases were created from TREC CDs 1, 2 and 3. They were organized by source and publication date []. The sizes of the databases are not skewed. Details are in Table 1.

Three testbeds built in [2] were based on the trec123-100col-bysource testbed. Each testbed contains many small databases and two large databases created by merging about 10-20 small databases together.

Trec123-2ldb-60col ("representative"): The databases in the trec123-100col-bysource were sorted in alphabetical order. Two large databases were created by merging 20 small databases with the round-robin method. Thus, the two large databases have more relevant documents due to their large sizes, even though the densities of relevant documents are roughly the same as the small databases.

Trec123-AP-WSJ-60col ("relevant"): The 24 Associated Press collections and the 16 Wall Street Journal collections in the trec123-100col-bysource testbed were collapsed into two large databases APall and WSJall. The other 60 collections were left unchanged. The APall and WSJall databases have higher densities of documents relevant to TREC queries than the small databases. Thus, the two large databases have many more relevant documents than the small databases.

Trec123-FR-DOE-81col ("nonrelevant"): The 13 Federal Register collections and the 6 Department of Energy collections in the trec123-100col-bysource testbed were collapsed into two large databases FRall and DOEall. The other 80 collections were left unchanged. The FRall and DOEall databases have lower densities of documents relevant to TREC queries than the small databases, even though they are much larger.

100 queries were created from the title fields of TREC topics. The queries 101-150 were used as training queries and the queries 51-100 were used as test queries (details in Table 2).

4.2 Search Engines
In the uncooperative distributed information retrieval environments of large organizations (companies) or domain-specific hidden Web, different databases may use different types of search engine. To simulate the multiple type-engine environment, three different types of search engines were used in the experiments: INQUERY [2], a unigram statistical language model with linear smoothing [2,20] and a TFIDF retrieval algorithm with ltc weight [2,20]. All these algorithms were implemented with the Lemur toolkit [2]. These three kinds of search engines were assigned to the databases among the four testbeds in a round-robin manner.

5. RESULTS: RESOURCE SELECTION OF DATABASE RECOMMENDATION
All four testbeds described in Section 4 were used in the experiments to evaluate the resource selection effectiveness of the database recommendation system.

The resource descriptions were created using query-based sampling. About 80 queries were sent to each database to download 300 unique documents. The database size statistics were estimated by the sample-resample method [2]. Fifty queries (101-150) were used as training queries to build the relevant logistic model and to fit the exponential functions of the centralized document score curves for large ratio databases (details in Section 3.1). Another 50 queries (51-100) were used as test data.

Resource selection algorithms of database recommendation systems are typically compared using the recall metric $R_n$ [,7,8,2]. Let B denote a baseline ranking, which is often the RBR (relevance based ranking), and E a ranking provided by a resource selection algorithm. And let $B_i$ and $E_i$ denote the number of relevant documents in the i-th ranked database of B or E. Then $R_n$ is defined as follows:

$$R_n = \frac{\sum_{i=1}^{n} E_i}{\sum_{i=1}^{n} B_i} \tag{17}$$

Usually the goal is to search only a few databases, so our figures only show results for selecting up to 20 databases.

Figure 3. Resource selection experiments on the four testbeds. (Panels: Trec123-100Col testbed, Representative testbed, Relevant testbed, Nonrelevant testbed; horizontal axis: collections selected.)

The experiments summarized in Figure 3 compare the effectiveness of the three resource selection algorithms, namely CORI, ReDDE and UUM/HR. The UUM/HR algorithm is described in Section 3.3. It can be seen from Figure 3 that the ReDDE and UUM/HR algorithms are more effective than (on the representative, relevant and nonrelevant testbeds) or as good as (on the Trec123-100Col testbed) the CORI resource selection algorithm. The UUM/HR algorithm is more effective than the ReDDE algorithm on the representative and relevant testbeds and is about the same as the ReDDE algorithm on the Trec123-100Col and the nonrelevant testbeds. This suggests that the UUM/HR algorithm is more robust than the ReDDE algorithm.
It can be noted that when selecting only a few databases on the Trec123-100Col or the nonrelevant testbeds, the ReDDE algorithm has a small advantage over the UUM/HR algorithm. We attribute this to two causes: i) the ReDDE algorithm was tuned on the Trec123-100Col testbed; and ii) although the difference is small, this may suggest that our logistic model of estimating probabilities of relevance is not accurate enough. More training data or a more sophisticated model may help to solve this minor puzzle.
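The logistic model mentioned above maps a centralized retrieval score to a probability of relevance. The sketch below assumes the common two-parameter form P(rel | s) = 1 / (1 + exp(-(a + b s))) and fits it with plain gradient ascent on toy training pairs. The function names, the learning rate and the data are assumptions for illustration; the exact parameterization and fitting procedure belong to Section 3 and are not reproduced here.

```python
import math
from typing import List, Tuple


def prob_relevance(score: float, a: float, b: float) -> float:
    """Logistic mapping from a centralized retrieval score to P(relevant)."""
    return 1.0 / (1.0 + math.exp(-(a + b * score)))


def fit_logistic(pairs: List[Tuple[float, int]], steps: int = 5000,
                 lr: float = 0.1) -> Tuple[float, float]:
    """Fit (a, b) by gradient ascent on the average log-likelihood."""
    a, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for score, label in pairs:
            # Gradient of the log-likelihood: (label - prediction) * input.
            err = label - prob_relevance(score, a, b)
            grad_a += err
            grad_b += err * score
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


# Toy (score, relevance judgment) pairs; higher scores are more often relevant.
training = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 1), (0.5, 0),
            (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
a_hat, b_hat = fit_logistic(training)
p_low = prob_relevance(0.2, a_hat, b_hat)
p_high = prob_relevance(0.8, a_hat, b_hat)
```

A single model of this kind needs relevance judgments only for documents in the centralized sample database, which is what keeps the amount of training data small.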

Table 3. Precision on the trec123-100col-bysource testbed when 3 databases were selected. (The first baseline is CORI; the second baseline for UUM/HP methods is UUM/HR.)

Precision at Doc Rank | CORI | ReDDE | UUM/HR | UUM/HP-FL | UUM/HP-VL
5 docs | | (-4.4%) | (+8.8%) | (+28.6%)(+8.%) | (+27.5%)(+7.2%)
10 docs | | (-4.8%) | (+4.8%) | (+26.2%)(+20.5%) | (+25.6%)(+9.9%)
15 docs | | (-2.0%) | (+2.9%) | (+22.2%)(+5.7%) | (+20.5%)(+7.%)
20 docs | | (-5.%) | (+4.%) | (+8.5%)(+3.8%) | (+7.8%)(+3.2%)
30 docs | | (-4.3%) | (+6.9%) | (+22.8%)(+4.8%) | (+22.3%)(+4.4%)

Table 4. Precision on the trec123-100col-bysource testbed when 5 databases were selected. (The first baseline is CORI; the second baseline for UUM/HP methods is UUM/HR.)

Precision at Doc Rank | CORI | ReDDE | UUM/HR | UUM/HP-FL | UUM/HP-VL
5 docs | | (-2.0%) | (+7.0%) | (+7.0%)(+9.4%) | (+5.0%)(+7.5%)
10 docs | | (-.%) | (+0.0%) | (+0.0%)(+0.0%) | (+3.7%)(+3.7%)
15 docs | | (+0.0%) | (+4.5%) | (+0.%)(+5.4%) | (+4.6%)(+9.7%)
20 docs | | (-.2%) | (+3.5%) | (+8.2%)(+4.5%) | (+.7%)(+7.9%)
30 docs | | (-3.%) | (+2.3%) | (+8.0%)(+5.6%) | (+7.6%)(+5.3%)

Table 5. Precision on the representative testbed when 3 databases were selected. (The first baseline is CORI; the second baseline for UUM/HP methods is UUM/HR.)

Precision at Doc Rank | CORI | ReDDE | UUM/HR | UUM/HP-FL | UUM/HP-VL
5 docs | | (+9.7%) | (+24.7%) | (+23.7%)(-0.9%) | (+34.4%)(+7.8%)
10 docs | | (+9.4%) | (+35.3%) | (+33.5%)(-.3%) | (+36.5%)(+0.9%)
15 docs | | (+24.4%) | (+38.5%) | (+35.9%)(-.9%) | (+4.4%)(
20 docs | | (+25.0%) | (+36.0%) | (+34.7%)(-.0%) | (+4.3%)(+4.0%)
30 docs | | (+35.8%) | (+5.9%) | (+47.9%)(-2.6%) | (+53.5%)(+.0%)

Table 6. Precision on the representative testbed when 5 databases were selected. (The first baseline is CORI; the second baseline for UUM/HP methods is UUM/HR.)

Precision at Doc Rank | CORI | ReDDE | UUM/HR | UUM/HP-FL | UUM/HP-VL
5 docs | | (+3.0%) | (+5.2%) | (+8.%)(-6.%) | (+4.%)(-0.9%)
10 docs | | (+4.6%) | (+0.3%) | (+5.0%)(+4.2%) | (+7.5%)(+6.5%)
15 docs | | (+2.9%) | (+9.6%) | (+25.7%)(+5.0%) | (+26.0%)(+5.4%)
20 docs | | (+8.9%) | (+24.3%) | (+28.8%)(+3.6%) | (+30.6%)(+5.%)
30 docs | | (+26.%) | (+35.3%) | (+34.4%)(-0.7%) | (+36.8%)(+.2%)

6. RESULTS: DOCUMENT RETRIEVAL EFFECTIVENESS

For document retrieval, the selected databases are searched and the returned results are merged into a single final list.
In all of the experiments discussed in this section the results retrieved from individual databases were combined by the semi-supervised learning (SSL) results merging algorithm. This version of the SSL algorithm [22] is allowed to download a small number of returned document texts "on the fly" to create additional training data in the process of learning the linear models which map database-specific document scores into estimated centralized document scores. It has been shown to be very effective in environments where only short result lists are retrieved from each selected database [22]. This is a common scenario in operational environments and was the case for our experiments.

Document retrieval effectiveness was measured by Precision at the top part of the final document list. The experiments in this section were conducted to study the document retrieval effectiveness of five selection algorithms, namely the CORI, ReDDE, UUM/HR, UUM/HP-FL and UUM/HP-VL algorithms. The last three algorithms were proposed in Section 3. The first four algorithms selected 3 or 5 databases, and 50 documents were retrieved from each selected database. The UUM/HP-VL algorithm also selected 3 or 5 databases, but it was allowed to adjust the number of documents to retrieve from each selected database; the number retrieved was constrained to be from 10 to 100, and a multiple of 10.

The Trec123-100Col and representative testbeds were selected for document retrieval as they represent two extreme cases of resource selection effectiveness; in one case the CORI algorithm is as good as the other algorithms and in the other case it is quite a lot worse than the other algorithms. Tables 3 and 4 show the results on the Trec123-100Col testbed, and Tables 5 and 6 show the results on the representative testbed.

On the Trec123-100Col testbed, the document retrieval effectiveness of the CORI selection algorithm is roughly the same as or a little better than that of the ReDDE algorithm, but both of them are worse than the other three algorithms (Tables 3 and 4). The UUM/HR algorithm has a small advantage over the CORI and ReDDE algorithms. One main difference between the UUM/HR algorithm and the ReDDE algorithm was pointed out before: UUM/HR uses training data and linear interpolation to estimate the centralized document score curves, while the ReDDE algorithm [21] uses a heuristic method, assumes the centralized document score curves are step functions and makes no distinction among the top part of the curves. This difference makes UUM/HR better than the ReDDE algorithm at distinguishing documents with high probabilities of relevance from those with low probabilities of relevance. Therefore, UUM/HR reflects the high-precision retrieval goal better than the ReDDE algorithm and thus is more effective for document retrieval.

The UUM/HR algorithm does not explicitly optimize the selection decision with respect to the high-precision goal as the UUM/HP-FL and UUM/HP-VL algorithms are designed to do. It can be seen that on this testbed, the UUM/HP-FL and UUM/HP-VL algorithms are much more effective than all the other algorithms. This indicates that their power comes from explicitly optimizing the high-precision goal of document retrieval in Equations 4 and 6.

Table 7. Precision on the trec123-100col-bysource testbed when 3 databases were selected. (The first baseline is CORI; the second baseline for UUM/HP methods is UUM/HR.) (Search engines do not return document scores.)

Precision at Doc Rank | CORI | ReDDE | UUM/HR | UUM/HP-FL | UUM/HP-VL
5 docs | | (-8.0%) | (+4.6%) | (+28.4%)(+22.8%) | (+28.4%)(
10 docs | | (-5.4%) | (+0.6%) | (+24.%)(+23.4%) | (+2.%)(+20.4%)
15 docs | | (-7.4%) | (+.6%) | (+2.5%)(+9.5%) | (+5.7%)(+3.8%)
20 docs | | (-5.6%) | (+3.3%) | (+2.2%)(+7.3%) | (+8.5%)(+4.7%)
30 docs | | (-3.2%) | (+6.3%) | (+20.0%)(+2.9%) | (+20.0%)(+2.9%)
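Tables 3 to 7 report precision at fixed document ranks together with relative changes against a baseline. A minimal sketch of both computations follows; the function names, document ids and relevance judgments are hypothetical toy values chosen for illustration, not data from the experiments.

```python
from typing import Sequence, Set


def precision_at_k(ranked_doc_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k documents in the merged list that are relevant."""
    if k < 1:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant) / k


def relative_change(value: float, baseline: float) -> float:
    """Relative change of value over baseline, as a percentage."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (value - baseline) / baseline


# Hypothetical merged result lists for one query (toy data).
relevant_docs = {"d1", "d3", "d4", "d7", "d9"}
baseline_list = ["d1", "d2", "d3", "d5", "d6", "d4", "d8", "d7", "d10", "d11"]
improved_list = ["d1", "d3", "d4", "d2", "d7", "d5", "d9", "d6", "d8", "d10"]

p5_base = precision_at_k(baseline_list, relevant_docs, 5)   # 2 of the top 5 are relevant
p5_new = precision_at_k(improved_list, relevant_docs, 5)    # 4 of the top 5 are relevant
gain = relative_change(p5_new, p5_base)                      # doubling, so +100%
```

In a full evaluation the precision values would be averaged over all test queries before the relative change is taken.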
On the representative testbed, CORI is much less effective than other algorithms for distributed document retrieval (Tables 5 and 6). The document retrieval results of the ReDDE algorithm are better than those of the CORI algorithm but still worse than the results of the UUM/HR algorithm. On this testbed the three UUM algorithms are about equally effective. Detailed analysis shows that the overlap of the selected databases between the UUM/HR, UUM/HP-FL and UUM/HP-VL algorithms is much larger than in the experiments on the Trec123-100Col testbed, since all of them tend to select the two large databases. This explains why they are about equally effective for document retrieval.

In real operational environments, databases may return no document scores and report only ranked lists of results. As the unified utility maximization model only utilizes retrieval scores of sampled documents with a centralized retrieval algorithm to calculate the probabilities of relevance, it makes database selection decisions without referring to the document scores from individual databases and can be easily generalized to this case of ranked lists without document scores. The only adjustment is that the SSL algorithm merges ranked lists without document scores by assigning the documents pseudo-document scores normalized for their ranks (in a ranked list of 50 documents, the first one has a score of 1, the second has a score of 0.98, etc.), which has been studied in [22]. The experiment results on the trec123-100col-bysource testbed with 3 selected databases are shown in Table 7. The experiment setting was the same as before except that the document scores were eliminated intentionally and the selected databases only return ranked lists of document ids. It can be seen from the results that UUM/HP-FL and UUM/HP-VL work well with databases returning no document scores and are still more effective than other alternatives. Other experiments with databases that return no document scores are not reported, but they show similar results that support the effectiveness of the UUM/HP-FL and UUM/HP-VL algorithms.

The above experiments suggest that it is very important to optimize the high-precision goal explicitly in document retrieval.
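The rank-normalized pseudo-document scores described above can be sketched as follows. The linear spacing is an assumption that is consistent with the stated example (a list of 50 documents gives 1 for the first document and 0.98 for the second); the function name `pseudo_scores` is hypothetical, and the exact normalization used in [22] is not given here.

```python
from typing import List


def pseudo_scores(num_results: int) -> List[float]:
    """Assign rank-normalized pseudo scores to a ranked list without scores.

    Rank 1 gets 1.0 and each following rank drops by 1 / num_results, so a
    list of 50 documents yields 1.0, 0.98, 0.96, ...
    """
    if num_results < 1:
        raise ValueError("num_results must be positive")
    return [1.0 - rank / num_results for rank in range(num_results)]


scores = pseudo_scores(50)
first, second, last = scores[0], scores[1], scores[-1]
```

These pseudo scores then play the role of the database-specific document scores that the merging step would otherwise receive from each search engine.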
The new algorithms based on this principle achieve better or at least as good results as the prior state-of-the-art algorithms in several environments.

7. CONCLUSION

Distributed information retrieval solves the problem of finding information that is scattered among many text databases on local area networks and Internets. Most previous research uses a resource selection algorithm that is effective for database recommendation in the distributed document retrieval application as well. We argue that the high-recall resource selection goal of database recommendation and the high-precision goal of document retrieval are related but not identical. This kind of inconsistency has also been observed in previous work, but the prior solutions either used heuristic methods or assumed cooperation by individual databases (e.g., all the databases used the same kind of search engines), which is frequently not true in the uncooperative environment. In this work we propose a unified utility maximization model to integrate the resource selection of database recommendation and document retrieval tasks into a single unified framework. In this framework, the selection decisions are obtained by optimizing different objective functions. As far as we know, this is the first work that tries to view and theoretically model the distributed information retrieval task in an integrated manner.

The new framework continues a recent research trend studying the use of query-based sampling and a centralized sample database. A single logistic model was trained on the centralized sample database to estimate the probabilities of relevance of documents by their centralized retrieval scores, while the centralized sample database serves as a bridge to connect the individual databases with the centralized logistic model. Therefore, the probabilities of relevance for all the documents across the databases can be estimated with a very small amount of human relevance judgment, which is much more efficient than previous methods that build a separate model for each database. This framework is not only more theoretically solid but also very effective. One algorithm for resource selection (UUM/HR) and two algorithms for document retrieval (UUM/HP-FL and UUM/HP-VL) are derived from this framework. Empirical studies have been conducted on testbeds to simulate the distributed search solutions of large organizations (companies) or domain-specific hidden Web. Furthermore, the UUM/HP-FL and UUM/HP-VL resource selection algorithms are extended with a variant of the SSL results merging algorithm to address the distributed document retrieval task when selected databases do not return document scores. Experiments have shown that these algorithms achieve results that are at least as good as the prior state-of-the-art, and sometimes considerably better. Detailed analysis indicates that the advantage of these algorithms comes from explicitly optimizing the goals of the specific tasks.

The unified utility maximization framework is open to different extensions. When cost is associated with searching the online databases, the utility framework can be adjusted to automatically estimate the best number of databases to search so that a large number of relevant documents can be retrieved with relatively small costs. Another extension of the framework is to consider the retrieval effectiveness of the online databases, which is an important issue in operational environments. All of these are directions of future research.

ACKNOWLEDGEMENT

This research was supported by NSF grants EIA and IIS. Any opinions, findings, conclusions, or recommendations expressed in this paper are the authors', and do not necessarily reflect those of the sponsor.

REFERENCES

[1] J. Callan. (2000). Distributed information retrieval. In W.B. Croft, editor, Advances in Information Retrieval. Kluwer Academic Publishers.
[2] J. Callan, W.B. Croft, and J. Broglio. (1995). TREC and TIPSTER experiments with INQUERY. Information Processing and Management, 31(3).
[3] J. G. Conrad, X. S. Guo, P. Jackson and M. Meziou. (2002). Database selection using actual physical and acquired logical collection resources in a massive domain-specific operational environment. In Proceedings of the 28th International Conference on Very Large Databases (VLDB).
[4] N. Craswell. (2000). Methods for distributed information retrieval. Ph.D. thesis, The Australian National University.
[5] N. Craswell, D. Hawking, and P. Thistlewaite. (1999). Merging results from isolated search engines. In Proceedings of the 10th Australasian Database Conference.
[6] D. D'Souza, J. Thom, and J. Zobel. (2000). A comparison of techniques for selecting text collections. In Proceedings of the 11th Australasian Database Conference.
[7] N. Fuhr. (1999). A decision-theoretic approach to database selection in networked IR. ACM Transactions on Information Systems, 17(3).
[8] L. Gravano, C. Chang, H. Garcia-Molina, and A. Paepcke. (1997). STARTS: Stanford proposal for internet metasearching. In Proceedings of the 20th ACM-SIGMOD International Conference on Management of Data.
[9] L. Gravano, P. Ipeirotis and M. Sahami. (2003). QProber: A system for automatic classification of hidden-Web databases. ACM Transactions on Information Systems, 21(1).
[10] P. Ipeirotis and L. Gravano. (2002). Distributed search over the hidden web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB).
[11] InvisibleWeb.com.
[12] The Lemur toolkit.
[13] J. Lu and J. Callan. (2003). Content-based information retrieval in peer-to-peer networks. In Proceedings of the 12th International Conference on Information and Knowledge Management.
[14] W. Meng, C.T. Yu and K.L. Liu. (2002). Building efficient and effective metasearch engines. ACM Comput. Surv. 34(1).
[15] H. Nottelmann and N. Fuhr. (2003). Evaluating different methods of estimating retrieval quality for resource selection. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[16] H. Nottelmann and N. Fuhr. (2003). The MIND architecture for heterogeneous multimedia federated digital libraries. ACM SIGIR 2003 Workshop on Distributed Information Retrieval.
[17] A.L. Powell, J.C. French, J. Callan, M. Connell, and C.L. Viles. (2000). The impact of database selection on distributed searching. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[18] A.L. Powell and J.C. French. (2003). Comparing the performance of database selection algorithms. ACM Transactions on Information Systems, 21(4).
[19] C. Sherman. (2001). Search for the invisible web. Guardian Unlimited.
[20] L. Si and J. Callan. (2002). Using sampled data and regression to merge search engine results. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[21] L. Si and J. Callan. (2003). Relevant document distribution estimation method for resource selection. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[22] L. Si and J. Callan. (2003). A semi-supervised learning method to merge search engine results. ACM Transactions on Information Systems, 21(4).


More information

Queueing Networks II Network Performance

Queueing Networks II Network Performance Queueng Networks II Network Performance Davd Tpper Assocate Professor Graduate Telecommuncatons and Networkng Program Unversty of Pttsburgh Sldes 6 Networks of Queues Many communcaton systems must be modeled

More information

Interval Valued Neutrosophic Soft Topological Spaces

Interval Valued Neutrosophic Soft Topological Spaces 8 Interval Valued Neutrosoph Soft Topologal njan Mukherjee Mthun Datta Florentn Smarandah Department of Mathemats Trpura Unversty Suryamannagar gartala-7990 Trpura Indamal: anjan00_m@yahooon Department

More information

Voltammetry. Bulk electrolysis: relatively large electrodes (on the order of cm 2 ) Voltammetry:

Voltammetry. Bulk electrolysis: relatively large electrodes (on the order of cm 2 ) Voltammetry: Voltammetry varety of eletroanalytal methods rely on the applaton of a potental funton to an eletrode wth the measurement of the resultng urrent n the ell. In ontrast wth bul eletrolyss methods, the objetve

More information

p(z) = 1 a e z/a 1(z 0) yi a i x (1/a) exp y i a i x a i=1 n i=1 (y i a i x) inf 1 (y Ax) inf Ax y (1 ν) y if A (1 ν) = 0 otherwise

p(z) = 1 a e z/a 1(z 0) yi a i x (1/a) exp y i a i x a i=1 n i=1 (y i a i x) inf 1 (y Ax) inf Ax y (1 ν) y if A (1 ν) = 0 otherwise Dustn Lennon Math 582 Convex Optmzaton Problems from Boy, Chapter 7 Problem 7.1 Solve the MLE problem when the nose s exponentally strbute wth ensty p(z = 1 a e z/a 1(z 0 The MLE s gven by the followng:

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.

More information

A Theorem of Mass Being Derived From Electrical Standing Waves (As Applied to Jean Louis Naudin's Test)

A Theorem of Mass Being Derived From Electrical Standing Waves (As Applied to Jean Louis Naudin's Test) A Theorem of Mass Beng Derved From Eletral Standng Waves (As Appled to Jean Lous Naudn's Test) - by - Jerry E Bayles Aprl 5, 000 Ths Analyss Proposes The Neessary Changes Requred For A Workng Test Ths

More information

Fusion of Neural Classifiers for Financial Market Prediction

Fusion of Neural Classifiers for Financial Market Prediction Fuson of Neural Classfers for Fnanal Market Predton Trsh Keaton Dept. of Eletral Engneerng (136-93) Informaton Senes Laboratory (RL 69) Calforna Insttute of Tehnology HRL Laboratores, LLC Pasadena, CA

More information

I. INTRODUCTION. Keywords Web Mining, Web Usage Mining, Page Rank, Web Map

I. INTRODUCTION. Keywords Web Mining, Web Usage Mining, Page Rank, Web Map An Extended Algorm of Page Rankng Consderng Chronologal Dmenson of Searh Sandeep Gupta #, Mohd. Husan * # Computer Sene and Engneerng,NIMS Unversty, Japur, Rajasan, Inda * Dretor, AZAD IET, Luknow, UP,

More information

Optimal Resource Allocation in Satellite Networks: Certainty Equivalent Approach versus Sensitivity Estimation Algorithms

Optimal Resource Allocation in Satellite Networks: Certainty Equivalent Approach versus Sensitivity Estimation Algorithms Optmal Resoure Alloaton n Satellte Networks: Certanty Equvalent Approah versus Senstvty Estmaton Algorthms Frano Davol*, Maro Marhese, Maurzo Mongell* * DIST - Department of Communatons, Computer an Systems

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

Lecture 8: Time & Clocks. CDK: Sections TVS: Sections

Lecture 8: Time & Clocks. CDK: Sections TVS: Sections Lecture 8: Tme & Clocks CDK: Sectons 11.1 11.4 TVS: Sectons 6.1 6.2 Topcs Synchronzaton Logcal tme (Lamport) Vector clocks We assume there are benefts from havng dfferent systems n a network able to agree

More information

Some Results on the Counterfeit Coins Problem. Li An-Ping. Beijing , P.R.China Abstract

Some Results on the Counterfeit Coins Problem. Li An-Ping. Beijing , P.R.China Abstract Some Results on the Counterfet Cons Problem L An-Png Bejng 100085, P.R.Chna apl0001@sna.om Abstrat We wll present some results on the ounterfet ons problem n the ase of mult-sets. Keywords: ombnatoral

More information

Performance Modeling of Hierarchical Memories

Performance Modeling of Hierarchical Memories Performane Modelng of Herarhal Memores Marwan Sleman, Lester Lpsky, Kshor Konwar Department of omputer Sene and Engneerng Unversty of onnetut Storrs, T 0669-55 Emal: {marwan, lester, kshor}@engr.uonn.edu

More information

ELEKTRYKA 2016 Zeszyt 3-4 ( )

ELEKTRYKA 2016 Zeszyt 3-4 ( ) ELEKTRYKA 206 Zeszyt 3-4 (239-240) Rok LXII Waldemar BAUER, Jerzy BARANOWSKI, Tomasz DZIWIŃSKI, Paweł PIĄTEK, Marta ZAGÓROWSKA AGH Unversty of Sene and Tehnology, Kraków OUSTALOUP PARALLEL APPROXIMATION

More information

Semantically Enhanced Uyghur Information Retrieval Model

Semantically Enhanced Uyghur Information Retrieval Model JOURNAL OF SOFTWARE, VOL. 7, NO. 6, JUNE 202 35 Semantally Enhane Uyghur Informaton Retreval Moel Bo Ma Researh Center for Multlngual Informaton Tehnology, Xnang Tehnal Insttute of Physs an Chemstry, Chnese

More information

Using Maximum Entropy for Text Classification

Using Maximum Entropy for Text Classification Usng Maxmum Entropy for Text Classfaton Kamal Ngam kngam@s.mu.edu John Lafferty lafferty@s.mu.edu Andrew MCallum mallum@justresearh.om Shool of Computer Sene Carnege Mellon Unversty Pttsburgh, PA 15213

More information

The calculation of ternary vapor-liquid system equilibrium by using P-R equation of state

The calculation of ternary vapor-liquid system equilibrium by using P-R equation of state The alulaton of ternary vapor-lqud syste equlbru by usng P-R equaton of state Y Lu, Janzhong Yn *, Rune Lu, Wenhua Sh and We We Shool of Cheal Engneerng, Dalan Unversty of Tehnology, Dalan 11601, P.R.Chna

More information

Brander and Lewis (1986) Link the relationship between financial and product sides of a firm.

Brander and Lewis (1986) Link the relationship between financial and product sides of a firm. Brander and Lews (1986) Lnk the relatonshp between fnanal and produt sdes of a frm. The way a frm fnanes ts nvestment: (1) Debt: Borrowng from banks, n bond market, et. Debt holders have prorty over a

More information

Regularized Discriminant Analysis for Face Recognition

Regularized Discriminant Analysis for Face Recognition 1 Regularzed Dscrmnant Analyss for Face Recognton Itz Pma, Mayer Aladem Department of Electrcal and Computer Engneerng, Ben-Guron Unversty of the Negev P.O.Box 653, Beer-Sheva, 845, Israel. Abstract Ths

More information

Chapter Newton s Method

Chapter Newton s Method Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve

More information

Classification as a Regression Problem

Classification as a Regression Problem Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class

More information

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute

More information

APLSSVM: Hybrid Entropy Models for Image Retrieval

APLSSVM: Hybrid Entropy Models for Image Retrieval Internatonal Journal of Intellgent Informaton Systems 205; 4(2-2): 9-4 Publshed onlne Aprl 29, 205 (http://www.senepublshnggroup.om/j/js) do: 0.648/j.js.s.205040202.3 ISSN: 2328-7675 (Prnt); ISSN: 2328-7683

More information

ENGI9496 Lecture Notes Multiport Models in Mechanics

ENGI9496 Lecture Notes Multiport Models in Mechanics ENGI9496 Moellng an Smulaton of Dynamc Systems Mechancs an Mechansms ENGI9496 Lecture Notes Multport Moels n Mechancs (New text Secton 4..3; Secton 9.1 generalzes to 3D moton) Defntons Generalze coornates

More information

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 6 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutons to assst canddates preparng for the eamnatons n future years and for

More information

Phase Transition in Collective Motion

Phase Transition in Collective Motion Phase Transton n Colletve Moton Hefe Hu May 4, 2008 Abstrat There has been a hgh nterest n studyng the olletve behavor of organsms n reent years. When the densty of lvng systems s nreased, a phase transton

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

Unified Subspace Analysis for Face Recognition

Unified Subspace Analysis for Face Recognition Unfed Subspace Analyss for Face Recognton Xaogang Wang and Xaoou Tang Department of Informaton Engneerng The Chnese Unversty of Hong Kong Shatn, Hong Kong {xgwang, xtang}@e.cuhk.edu.hk Abstract PCA, LDA

More information

Lecture 4. Instructor: Haipeng Luo

Lecture 4. Instructor: Haipeng Luo Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would

More information

Probabilistic Information Retrieval CE-324: Modern Information Retrieval Sharif University of Technology

Probabilistic Information Retrieval CE-324: Modern Information Retrieval Sharif University of Technology Probablstc Informaton Retreval CE-324: Modern Informaton Retreval Sharf Unversty of Technology M. Soleyman Fall 2016 Most sldes have been adapted from: Profs. Mannng, Nayak & Raghavan (CS-276, Stanford)

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

Socially-aware Multiagent Learning towards Socially Optimal Outcomes

Socially-aware Multiagent Learning towards Socially Optimal Outcomes Soally-aware Multagent Learnng towards Soally Optmal Outomes Xaohong L and hengwe Zhang and Janye Hao and Karl Tuyls and Sq hen 3 Abstrat. In multagent envronments, the apablty of learnng s mportant for

More information

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010 Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton

More information

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton

More information

Maximizing Overlap of Large Primary Sampling Units in Repeated Sampling: A comparison of Ernst s Method with Ohlsson s Method

Maximizing Overlap of Large Primary Sampling Units in Repeated Sampling: A comparison of Ernst s Method with Ohlsson s Method Maxmzng Overlap of Large Prmary Samplng Unts n Repeated Samplng: A comparson of Ernst s Method wth Ohlsson s Method Red Rottach and Padrac Murphy 1 U.S. Census Bureau 4600 Slver Hll Road, Washngton DC

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Using the estimated penetrances to determine the range of the underlying genetic model in casecontrol

Using the estimated penetrances to determine the range of the underlying genetic model in casecontrol Georgetown Unversty From the SelectedWorks of Mark J Meyer 8 Usng the estmated penetrances to determne the range of the underlyng genetc model n casecontrol desgn Mark J Meyer Neal Jeffres Gang Zheng Avalable

More information

Opinion Consensus of Modified Hegselmann-Krause Models

Opinion Consensus of Modified Hegselmann-Krause Models Opnon Consensus o Moe Hegselmann-Krause Moels Yueheng Yang, Dmos V. Dmarogonas an Xaomng Hu Abstrat We onser the opnon onsensus problem usng a mult-agent settng base on the Hegselmann-Krause (H-K Moel.

More information

Bayesian Learning. Smart Home Health Analytics Spring Nirmalya Roy Department of Information Systems University of Maryland Baltimore County

Bayesian Learning. Smart Home Health Analytics Spring Nirmalya Roy Department of Information Systems University of Maryland Baltimore County Smart Home Health Analytcs Sprng 2018 Bayesan Learnng Nrmalya Roy Department of Informaton Systems Unversty of Maryland Baltmore ounty www.umbc.edu Bayesan Learnng ombnes pror knowledge wth evdence to

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

Risk Based Maintenance Strategy for Coastal Concrete Structures

Risk Based Maintenance Strategy for Coastal Concrete Structures 10DBMC Internatonal Conférene On Durablty of Bulng Materals an Components Rsk Base Mantenane Strategy for Coastal Conrete Strutures C.Q. L 1, W. Lawanwsut 1, J.J 2 1 Unversty of Dunee, Department of Cvl

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information