THE EFFECTS OF INDEXING STRATEGY-QUERY TERM COMBINATION ON RETRIEVAL EFFECTIVENESS IN A SWEDISH FULL TEXT DATABASE. Per Ahlgren

Academic dissertation which, with due permission of the Faculty of Social Sciences at Göteborg University, will be publicly defended for the doctoral degree at 13.15 on Friday, 17 December 2004, in room C 203, Högskolan i Borås, Allégatan, Borås.

Title: The effects of indexing strategy-query term combination on retrieval effectiveness in a Swedish full text database

Abstract: This thesis deals with Swedish full text retrieval and the problem of morphological variation of query terms in the document database. The study is an information retrieval experiment with a test collection. Since no Swedish test collection was available, such a collection was constructed. It consists of a document database containing 6,336 news articles, and 52 topics with four-graded (0, 1, 2, 3) relevance assessments. The effects of indexing strategy-query term combination on retrieval effectiveness were studied. Three of five tested methods involved indexing strategies that used conflation, in the form of normalization. Further, two of these three combinations used indexing strategies that employed compound splitting. Normalization and compound splitting were performed by SWETWOL, a morphological analyzer for the Swedish language. A fourth combination attempted to group related terms by right hand truncation of query terms. A search expert performed the truncation. The four combinations were compared to each other and to a baseline combination, where no attempt was made to counteract the problem of morphological variation of query terms in the document database.

Two situations were examined in the evaluation: the binary relevance situation and the multiple degree relevance situation. In the binary relevance situation, the three (positive) relevance degrees (1, 2, 3) were merged into one, and precision was used as evaluation measure. Here the four alternative combinations outperformed the baseline. The best performing combination was the combination that used truncation. This combination performed better than or equal to a median precision value for 4 of the 52 topics. One reason for the relatively good performance of the truncation combination was the capacity of its queries to retrieve different parts of speech. In the multiple degree relevance situation, the three (positive) relevance degrees were retained, and retrieval effectiveness was taken to be the accumulated gain the user receives by examining the retrieval result up to given positions. The evaluation measure used was nDCG (normalized discounted cumulated gain).
This measure credits retrieval methods that (1) rank highly relevant documents higher than less relevant ones, and (2) rank relevant (of any degree) documents high. With respect to (2), nDCG involves a discount component. A discount is applied to the relevance score of a relevant (of any degree) document, and this discount grows the further down the ranked list of retrieved documents the document is positioned. In the multiple degree relevance situation, the five combinations were evaluated under four different user scenarios, where each scenario simulated a certain user type. Again, the four alternative combinations outperformed the baseline, for each user scenario. The truncation combination had the best performance under each user scenario. This outcome agreed with the performance result in the binary relevance situation. However, there were also differences between the two relevance situations. For 25 percent of the topics, and with regard to one of the four user scenarios, the set of best performing combinations in the binary relevance situation was disjoint from the set of best performing combinations in the multiple degree relevance situation. The user scenario in question was such that almost all importance was placed on highly relevant documents, and the discount was sharp. The main conclusion of the thesis is that normalization and right hand truncation (performed by a search expert) enhanced retrieval effectiveness in comparison to the baseline, irrespective of which of the two relevance situations we consider. Further, the three indexing strategy-query term combinations based on normalization were almost as good as the combination that involves truncation. This holds for both relevance situations.

Keywords: base word form index, full text retrieval, indexing strategies, inflected word form index, morphological analysis, normalization, Swedish, SWETWOL, truncation, user scenarios
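The discount component of nDCG can be made concrete with a small sketch. It assumes the cumulated gain formulation of Järvelin and Kekäläinen. In that formulation, gains at ranks below the logarithm base b are added undiscounted, and from rank b onwards each gain is divided by log_b(rank). The relevance degrees 0 to 3 serve directly as gains here. This is an illustrative assumption: the gain values and bases of the study's four user scenarios are not reproduced.

```python
import math
from typing import List, Sequence


def dcg(gains: Sequence[float], b: float = 2.0) -> List[float]:
    """Discounted cumulated gain vector, one value per rank.

    Ranks i < b are added without discount; from rank b onwards the
    gain at rank i is divided by log_b(i). A smaller b discounts more
    sharply, i.e. it models a less patient user.
    """
    out: List[float] = []
    total = 0.0
    for i, g in enumerate(gains, start=1):
        total += g if i < b else g / math.log(i, b)
        out.append(total)
    return out


def ndcg(gains: Sequence[float], recall_base: Sequence[float], b: float = 2.0) -> List[float]:
    """Normalized DCG: the actual DCG divided rank by rank by the ideal DCG.

    `recall_base` lists the gains of all relevant documents for the topic.
    Sorted in decreasing order it gives the ideal ranking, padded with
    zero-gain documents if it is shorter than the result list.
    """
    ideal = sorted(recall_base, reverse=True)
    ideal += [0.0] * (len(gains) - len(ideal))
    actual = dcg(gains, b)
    best = dcg(ideal[: len(gains)], b)
    return [a / m if m > 0 else 0.0 for a, m in zip(actual, best)]
```

For example, `ndcg([0, 3], [3])` gives `[0.0, 1.0]`. The only relevant document is found at rank 2, and with b = 2 the divisor log_2(2) is 1, so the document is not penalized there. Under a user scenario one would vary both the gain attached to each relevance degree and b.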

Acknowledgements

I wish to thank Jaana Kekäläinen for her good ideas, sound comments and encouragement. I also wish to thank Heikki Keskustalo for his thoughtful remarks on an earlier version of the thesis, and for several fruitful discussions during the doctoral project. I am grateful to Kalervo Järvelin, not only for his good comments on earlier versions of the thesis, but also for supporting me ever since I started my master project in information retrieval. With respect to the content of the thesis, thanks also go to the following people: Eija Airio, Christian Bennet, Johan Eklund, Turid Hedlund, Lars Höglund, Fred Karlsson, Birger Larsen, Krister Lindén, Paul McNamee, Ari Pirkola, Diane Sonnenwald and Eero Sormunen. Further, thanks to Leif Grönqvist for supplying the test documents, to Anders Stenström for performing the truncation of the study, to Boel Bissmarck and Krister Johannesson for checking the English and to Christian Swalander for editing. Peter Cederlund, Lars Jonsson, Erik Norinder and Helena Vallo performed the relevance assessments, and were thereby of great assistance. Finally, I wish to thank Lars Höglund and Irene Wormell for making it possible for me to spend about eight months in 2002 at the Department of Information Studies, University of Tampere, Finland.

Contents

PART ONE FRAMEWORK
1 Introduction
2 Central concepts of the research setting
    Automatic indexing
        Lexical analysis
        Stop words
        Stemming and normalization
        Removal of additional high frequency terms
        The index
        Visualization of the outlined automatic indexing process
    Retrieval models
        Boolean model
        Vector model
        Two probabilistic models
    Evaluation of retrieval effectiveness
3 Some linguistic phenomena with relevance to IR, and conflation
    Some linguistic phenomena with relevance to IR
    Properties of Swedish related to IR
    Conflation
        Stemming
        Normalization
4 Research on conflation
    Research on a morphologically simple language
    Research on morphologically more complex languages
    Summary of the main results
PART TWO EXPERIMENT
5 Test documents and the indexing strategies used in the study
    Test documents
    Indexing of the Swedish news articles
        Lexical analysis of the Swedish news articles
        Stop words
        Indexing strategy based on inflected word forms
        Indexing strategies based on normalization
        Visualization of the indexing process
6 Variables, aim of the study and research questions
7 Data and methods
    InQuery retrieval system
    Topics, queries and pooling
        Topics
        Queries
        Pooling
    Relevance assessments
        Relevance scale
        Assessment process
    Data for pools, recall bases and for sets of irrelevant documents
    Evaluation
        Gain vectors
        Binary relevance situation
        Multiple degree relevance situation
    Significance testing
8 Findings
    Binary relevance situation
        Precision at given DCVs of the five indexing strategy-query term combinations
        Tests of significance
        Effectiveness by topics
    Multiple degree relevance situation
        US… (four subsections, one per user scenario)
        Tests of significance
        Effectiveness by topics under US…
9 Discussion
    Binary relevance situation and US2.2: detailed topic-by-topic analysis
    Binary relevance situation and US2.2: changes in relative effectiveness
    Splitting of compounds in queries
    Expansion of query base forms with derivatives
10 Conclusion
References
Appendix 1 Topics used in the study
Appendix 2 One of the used topics: English version
Appendix 3 Sample word lists and corresponding queries
Appendix 4 Examples of terms not recognized by SWETWOL
Appendix 5 Problem with queries for SSLIT-bfq and SSLIT-EL-bfq
Appendix 6 Alternative token definition
Appendix 7 Instructions for the assessors (English translation)

PART ONE FRAMEWORK

Chapter 1 Introduction

Information retrieval (IR) treats various methods for storage, structuring and retrieval of documents. The aim of an IR system is to return, as a response to a user query, documents that are relevant to the information need behind the query. Today, a huge number of textual documents is available via the Internet and other information sources. In the light of this, it is important to develop methods that facilitate text retrieval.

There is a lack of research with respect to Swedish full text retrieval, as Hedlund, Pirkola and Järvelin (2001) point out.1 There has been research in full text retrieval for other languages, especially for English, but the results for other languages cannot be automatically applied to Swedish. The reason is that Swedish has properties that, e.g., English does not have. Examples of such properties are a high frequency of compounds, the use of glue morphemes in compounds and a high proportion of homographs. Moreover, Swedish is inflectionally more complicated than, e.g., English.

In retrieval, the documents are matched to the query. Belkin and Croft (1987) distinguish between exact match retrieval techniques and best match retrieval techniques. With regard to the Boolean retrieval technique, a document either satisfies the query (a Boolean expression) or not. The Boolean technique is therefore an exact match technique. A retrieval technique which admits approximation of the query conditions is called a best match technique. A technique of this type involves a measure of the degree of similarity between a document and a query (for example, cf. the retrieval technique of the vector model). The IR system used in this study, InQuery (Version 3.1), is a probabilistic system and employs a best match retrieval technique.

The problem of morphological variation of query terms in the document collection is a well-known problem in IR research. In IR systems with a best match retrieval technique, the degree of similarity between a document and a query is basically determined by the number of matching terms, and not seldom by their frequency, i.e., terms that occur both in the document and in the query.
Possibly relevant documents that contain morphological variants of the query terms, rather than the query terms themselves, have no matching terms. Therefore, these documents will not be retrieved. IR researchers have attempted to counteract the morphological variation problem by applying different conflation methods, like stemming and normalization, in the indexing process. Stemming and normalization attempt to group morphological variants in the documents by associating them with a common form. This form acts as a representative of the variants, and can be placed in the index instead of the variants, with pointers to the documents where the variants occur. Then, if the representative form is also used in the query, documents that contain different variants are retrieved.2

1 However, Sahlgren et al. (2002) used a Swedish test collection and tested the effectiveness of automatic query expansion. McNamee and Mayfield (2004) studied the impact of n-gram character indexing of Swedish full text on retrieval effectiveness, and cross-language information retrieval problems for Swedish are discussed in (Hedlund, 2003).
2 It is not necessary to have the representative forms as index terms and then use these forms in the query. Another approach is to expand an original query term with all word forms (distinct from the query term) in the document collection that are associated, by the applied conflation method, with the same form as the query term (Harman, 1991).
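The second approach in footnote 2, expanding a query term with the collection's word forms that share its representative form, can be sketched as follows. This is a minimal illustration under stated assumptions:

- The hand-made table `BASE_FORM` is a hypothetical stand-in for a morphological analyzer such as SWETWOL.
- The Swedish forms of bil ("car") and hus ("house") are toy data.
- Unknown forms are treated as their own representative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Hypothetical stand-in for a morphological analyzer: inflected form -> base form.
# A real system would call an analyzer (e.g. SWETWOL) instead of this table.
BASE_FORM: Dict[str, str] = {
    "bil": "bil", "bilen": "bil", "bilar": "bil", "bilarna": "bil",
    "hus": "hus", "huset": "hus", "husen": "hus",
}


def conflation_classes(vocabulary: Iterable[str]) -> Dict[str, Set[str]]:
    """Group the word forms occurring in the collection by their common base form."""
    classes: Dict[str, Set[str]] = defaultdict(set)
    for form in vocabulary:
        # Forms unknown to the analyzer act as their own representative.
        classes[BASE_FORM.get(form, form)].add(form)
    return classes


def expand(query_term: str, classes: Dict[str, Set[str]]) -> Set[str]:
    """Expand a query term with all collection forms that share its base form."""
    base = BASE_FORM.get(query_term, query_term)
    return classes.get(base, set()) | {query_term}
```

With a collection vocabulary of bilen, bilarna, huset and och, `expand("bil", ...)` yields {bil, bilen, bilarna}. An inflected word form index can thus still match documents that contain only bilen or bilarna.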

With respect to English full text retrieval, several studies have investigated the impact on retrieval effectiveness of applying different conflation methods in indexing (e.g., Krovetz, 1993; Hull, 1996). However, there is not much knowledge of how different ways of indexing Swedish full text documents affect retrieval effectiveness. This study, which uses Swedish news articles as test documents, compares five different indexing strategy-query term combinations with respect to retrieval effectiveness.

The study involves four indexing strategies. The baseline strategy of the study is to place each word form that occurs in the texts of the collection in the index. In particular, all the inflected variants of a given word are placed in the index as such. The application of the baseline strategy gives rise to an inflected word form index. There are two motivations for using this strategy as the baseline. First, it is the traditional indexing strategy for a text database.3 Second, some database hosts providing Swedish full text documents presently use inflected word form indices.4

The other three indexing strategies used in the study are based on normalization. In this thesis, normalization refers to the transformation of inflected word forms to their base forms, i.e., their lexical citation forms (Karlsson, 1992). Each inflected word form that occurs in the text collection is transformed to a base form, which is then placed in the index. An index generated in this manner is called a base word form index, and such an index does not, in principle, contain inflectional word forms. The construction of a base word form index can be seen as an attempt to partly overcome the morphological variation problem mentioned above. In a base word form index, documents that differ in containing different inflectional variants of a given word are represented (pointed to) in the same location: the location that contains the base form of the word. If the base form is then used in the query, all these documents will be retrieved. Two of the three normalization strategies involve compound splitting. A compound that occurs in a document in the text collection is split into its components. Then these components (in base form) and the compound itself are placed in the index, pointing to the same address.
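The difference between an inflected word form index, a base word form index, and a base word form index with compound splitting can be sketched with a toy analyzer. Several simplifications are assumed here:

- The lookup tables are hypothetical stand-ins for SWETWOL output, chosen for illustration. In fotbollsmatch ("football match"), the glue morpheme -s- joins fotboll and match.
- The index stores only document ids. Frequencies and positions are omitted.
- The function name `build_index` is illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical analyzer output (stand-in for SWETWOL): the base form of each
# inflected form, and the components (in base form) of compound base forms.
BASE_FORM: Dict[str, str] = {
    "matchen": "match", "matcher": "match",
    "fotbollsmatchen": "fotbollsmatch",
}
COMPOUND_PARTS: Dict[str, List[str]] = {
    "fotbollsmatch": ["fotboll", "match"],  # glue morpheme -s- dropped
}


def build_index(docs: Dict[int, List[str]], normalize: bool, split: bool) -> Dict[str, Set[int]]:
    """Map index terms to the ids of the documents represented at them.

    normalize=False: inflected word form index (the baseline).
    normalize=True:  base word form index.
    split=True:      additionally index compound components, pointing to
                     the same document as the compound itself.
    """
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, words in docs.items():
        for w in words:
            term = BASE_FORM.get(w, w) if normalize else w
            index[term].add(doc_id)
            if normalize and split:
                for part in COMPOUND_PARTS.get(term, []):
                    index[part].add(doc_id)
    return dict(index)
```

With documents containing matchen, matcher and fotbollsmatchen, the query term match retrieves nothing from the inflected index. It retrieves the first two documents from the base form index, and all three once compounds are split.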
The five indexing strategy-query term combinations involved in the thesis are evaluated under two relevance situations. In one of these, binary relevance is used. The other situation uses four degrees of relevance, and this latter situation employs four different user scenarios. The idea here is to analyze the performance of the combinations in relation to assumptions made about the users of a hypothetical retrieval system.

When the study started, no Swedish test collection for IR was available. Such a collection was therefore constructed, and the construction process will be discussed in the thesis. Further, this work is among the first that study features of the Swedish language relevant to IR. We look at the problem of morphological variation of query terms in the document collection. Three of the four indexing strategies studied employ normalization. The relation between normalization and retrieval effectiveness has not been studied to any great extent in the IR community. Moreover, the author is not aware of any work on normalization and Swedish full text retrieval.

The experiment of the study was performed in a traditional laboratory environment, as opposed to an operational environment which serves real users. The traditional laboratory environment for IR is clearly not without problems (Harter and Hert, 1997). However, by working in a laboratory environment, the researcher is able to test ideas in a controlled way. Variables other than the independent variable that may affect retrieval effectiveness, such as the skill of the searcher and the content of the database, are controlled (Tague-Sutcliffe, 1992).

3 The strategy, or a variant of it where stop words are not placed in the index, is the traditional indexing strategy for a text database.
4 For an example, see (Presstext, 2004).

This thesis is divided into two parts: Framework, which comprises the first four chapters, and Experiment, which comprises Chapter 5 to Chapter 10. The part Framework gives the context in which the experiment of the study was performed. The remainder of this part is structured as follows. In Chapter 2, concepts that are central for the research setting are given. Chapter 3 puts forward some linguistic phenomena with relevance to IR, and gives some properties of Swedish related to IR. Research that is related to the subject of the thesis is reported in Chapter 4.

The part Experiment is structured in the following way. Chapter 5 gives information on the test documents used and describes the indexing strategies of the study. Chapter 6 presents the aim of the thesis and gives its research questions. In Chapter 7, the retrieval system, requests and queries of the experiment are treated. Further, relevance data and the evaluation of retrieval effectiveness are described. Chapter 8 gives the results of the experiment. In Chapter 9, a discussion is given, and conclusions are put forward in Chapter 10.

Chapter 2 Central concepts of the research setting

In this chapter, the concepts that are central for the research setting are presented. IR can be said to be that part of information science that develops and tests methods for storage, structuring and retrieval of documents. IR is a multidisciplinary area. It uses concepts from disciplines like probability theory, logic and linguistics (see, e.g., Sparck Jones, Walker and Robertson, 2000; Crestani and van Rijsbergen, 1994; Strzalkowski, 1995, respectively).

In IR, a document is defined as an object that contains data (Frakes, 1992). Usually, the data consists of text, but documents may also contain other types of data, like photos and video clips. An IR system is a computer based system for the storage and retrieval of documents. The documents stored in the database of the system may be bibliographical records, i.e., representations of, e.g., books or journal articles. A bibliographical record contains data (title, author, date of publication, and so on) about the represented object. Data of this type is referred to as metadata. However, it is usual today that the stored documents consist of, e.g., the journal articles themselves, and not only of representations of them. In that case, one may speak of a full text system. In the remainder of this chapter, we assume that documents consist of text.

A query is a formal representation, in the language of a given IR system, of an information need. The information need may be expressed by a request (or, synonymously, a topic): a formulation of an information need in natural language. In the normal case, a query contains terms, possibly in combination with various operators. According to Korfhage (1997, p. 334), a term is a word or a phrase having a distinct meaning, where a phrase is defined as a contiguous set of words within a sentence (ibid., p. 329). However, another term definition is given below. Examples of operators that are used in queries are the Boolean operators AND, OR and NOT, and distance operators. A distance operator expression states that its arguments, two or more terms, should occur in a document within a maximal distance of each other, possibly in the same order as within the operator.
Let L be the set of all letters that belong to the extended ASCII character set. Then let

$$\Sigma = L \cup \{0, 1, \ldots, 9\} \cup \{\texttt{@}, \texttt{\}}, \texttt{\{}, \texttt{|}, \texttt{]}, \texttt{[}, \texttt{\textbackslash}, \texttt{\^{}}, \texttt{\~{}}\},$$

where the elements of the rightmost operand are literal characters; in particular, the brace characters } and { are themselves members of Σ. A term t, in this study, is a non-empty string of characters such that every character σ that occurs in t satisfies σ ∈ Σ. For example, expressions like Stockholm, M~nchen and \land are terms. Neither #sum nor information retrieval is a term; the latter string contains a space between information and retrieval.5

5 The reason for using \ and ~ in the examples should be evident from Chapter 5.

An IR system matches a query to the documents stored in its database. Optimally, the system retrieves those and only those documents that are relevant to the information need on which the query is based. Normally, though, this optimal case is not realized. Instead, some relevant documents are missed, and some irrelevant documents are retrieved. In fact, the retrieval of documents involves uncertainty. One circumstance that gives rise to uncertainty is that both the stored information and the information need of the user are normally expressed in natural language. Authors of texts may use a linguistic expression for a given concept that deviates from the expression the user employs for the same concept. If so, relevant documents may be missed. Authors of texts may also use a linguistic expression in a sense that deviates from the sense the user associates with the expression. If so, irrelevant documents may be retrieved. Performance improvements in an IR system can be obtained by the application of relevance feedback (ibid., pp. 2-3). Relevance feedback is an iterative process such that (1) the user assesses the relevance of some retrieved documents, and (2) the relevance data obtained from the user is utilized by the system to modify the query.

2.1 Automatic indexing

Let $D = \{d_1, \ldots, d_N\}$ be the document database of a given IR system. In this study, we define an index for D as a set of entries, $I = \{e_1, \ldots, e_m\}$, such that for each entry $e_i$ in I, $e_i = (t_i, \{L_1, \ldots, L_k\})$, where each $L_j = (p, f_p, pos_1, \ldots, pos_{f_p})$. Here $t_i$ is a term, p is a pointer to a document in D in which $t_i$ occurs, $f_p$ is the frequency with which $t_i$ occurs in the document that p points to, and $pos_1, \ldots, pos_{f_p}$ give the positions of $t_i$ in the document that p points to.

Note that the expression "occurs in" in the above definition should be interpreted liberally. For example, $t_i$ may be a word in base form, and p may be a pointer to a document where inflectional variants of $t_i$, rather than $t_i$ itself, occur. What is the case depends on the indexing strategy employed. The terms that occur in the index for D are the index terms for D. By the vocabulary for D, $V_D$, we refer to the set $\{t_1, \ldots, t_m\}$ of index terms for D. If an entry $e_i \in I$ involves a pointer p to a document $d \in D$, we say that d is represented at $e_i$. For a given document d, the set of all index terms $t_i \in V_D$ such that d is represented at $e_i$ can be regarded as an index representation of d.

Before retrieval takes place, an index is created from D. The process of algorithmically examining documents to generate index terms is called automatic indexing (Fox, 1992, p. 102). The first step in automatic indexing is lexical analysis, which is treated in the next section.

2.1.1 Lexical analysis

A token is an occurrence of a non-empty string of characters. Lexical analysis can be defined as the process of converting an input stream of characters into a stream of tokens. The stream of tokens gives rise to the set of candidate index terms. We let C denote the set of candidate index terms. These candidate index terms may be further processed, e.g., checked against a stop list.

Before the lexical analysis is performed, one must decide which non-empty character strings count as a token with respect to the documents in the database. This is not only a question of recognizing spaces as separators. In Baeza-Yates and Ribeiro-Neto (1999), the following four cases are considered:

Digits. Sequences of digits can be considered as bad index terms (ibid., p. 166). The reason is that such a sequence poorly discriminates between relevant and irrelevant documents. If a sequence of digits is an index term and used as a query term, the query may retrieve a lot of irrelevant documents.6 One can, then, consider sequences of digits as non-tokens. In that case, the set of candidate index terms will not contain such sequences. A term with one or more occurrences of digits can nevertheless be important. For example, the alphanumerical string U2, which refers to a rock group, is proper as an index term and may therefore be considered as a token.

Hyphens. Should hyphenated words be broken up into their parts? If so, the parts may, depending on what is regarded as a token, be candidate index terms, but not the hyphenated word itself. The splitting of hyphenated words has the advantage of counteracting linguistic variation in the document database and is fruitful from the recall7 point of view. Consider, as an example, the expressions state-of-the-art and state of the art (ibid., p. 166). Assume that splitting takes place and that the four words involved are added to the index. Under this assumption, a document $d_1$ = [state-of-the-art] and another document $d_2$ = [state of the art] will both be represented in the index at all four entries. If, on the other hand, state-of-the-art is added to the index, together with state, of, the and art, the index representations for $d_1$ and for $d_2$ will be disjoint.
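The hyphen case can be illustrated with a small tokenizer. This is a minimal sketch, assuming that letters and digits are the token characters, using Python's Unicode notion of a word character minus the underscore. That choice is an assumption and differs from the character set Σ of the study. The function name and flags are hypothetical. Other punctuation acts as a separator, so, unlike the 287B.C example below, a dot is not simply deleted.

```python
import re
from typing import List

# Letters and digits form tokens; [^\W_] is a word character other than "_".
_WORD = r"[^\W_]+"
# A word optionally followed by hyphen-joined words, e.g. "dc-9" or "state-of-the-art".
_HYPHENATED = rf"{_WORD}(?:-{_WORD})*"


def tokenize(text: str, split_hyphens: bool = True, lowercase: bool = True) -> List[str]:
    """Convert a character stream into a list of tokens.

    split_hyphens=True breaks 'state-of-the-art' into its four parts;
    split_hyphens=False keeps hyphenated words such as 'DC-9' whole.
    lowercase=True folds case, so 'Smith' and 'smith' become one token.
    """
    if lowercase:
        text = text.lower()
    pattern = _WORD if split_hyphens else _HYPHENATED
    return re.findall(pattern, text)
```

With splitting, `tokenize("state-of-the-art")` and `tokenize("state of the art")` both give ["state", "of", "the", "art"], so $d_1$ and $d_2$ end up at the same four entries. With `split_hyphens=False`, DC-9 survives as the single token dc-9.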
The splitting of hyphenated words resembles compound splitting, which is an important component in two of the indexing strategies of this study and is treated later in the thesis. There are words in which the hyphen is an integral part. For example, consider DC-9, which refers to a family of aircraft. Splitting of a word like this may hurt precision. Assume that splitting takes place and that DC and 9 are added to the index. A query about DC-9 with DC and 9 as its only terms may retrieve irrelevant documents about, e.g., Washington DC.8

Punctuation marks. In the normal case, punctuation marks are removed in the lexical analysis (ibid., p. 166). For example, the dot in 287B.C (approximately the date of birth of Archimedes) may be removed, and the resulting string, 287BC, may then be regarded as a token.

6 However, the problem may be circumvented by using queries that combine the digit term with several other terms.
7 The measures recall and precision are defined in Section 2.3.
8 If sequences of digits are not allowed as tokens, only DC is left as a candidate index term. With DC as an index term, and a query about DC-9 with DC as its only term, the precision can obviously be very low.

287B.C is an example of a string that has a punctuation mark as an integral part. However, the removal of the dot does not seem to affect retrieval to any great extent. It seems unlikely that the outcome of the removal operation, 287BC, occurs in the document database with a meaning that differs from the meaning of 287B.C.

Case of letters. The case of letters is normally not important in index terms. For example, algebra and Algebra have the same meaning, and it would not make sense to create two index entries for them. The standard scenario is therefore that all text is transformed to either lower or upper case, a transformation that enhances recall. There are, however, examples where the case of letters matters. Consider Smith and smith. If all text is transformed to lower (upper) case, the two words are related in the sense that they are represented in the index by the same entry, namely the entry that contains the lower (upper) case form. A query which concerns, e.g., smiths and consists of smith may then retrieve a lot of documents where persons named Smith are referred to. If so, the precision of the search may be poor.

2.1.2 Stop words

As mentioned above, the outcome of the lexical analysis is the set C of candidate index terms. One may consider removing from C the most frequently occurring words in the language of the documents that are indexed. Such words are referred to as stop words, and a stop list is a list of stop words. In English, examples of stop words are and, of, the and to.9 These words are likely to occur in almost every document in a database of English documents. Therefore, they discriminate badly between relevant and irrelevant documents, and can be considered as poor index terms. Another reason for removing stop words has to do with efficiency: the removal reduces the size of the indexing structure considerably, since the number of index entries is reduced. If a stop list is used during automatic indexing, each candidate index term t in C is checked against the list. If t occurs in the list, t is removed from C; otherwise, t is kept. Thus, C is transformed into a (proper) subset of itself, a subset that does not contain items from the stop list.
Let $C_{no\_SW}$ denote this subset. The removal of stop words is, though, not without drawbacks. Documents that contain certain well-known phrases, e.g., to be or not to be, may be hard to find if a stop list has been used during automatic indexing. If a stop list has not been used, and if the IR system offers distance operators, documents that contain the phrase are easy to retrieve.

2.1.3 Stemming and normalization

From the reduced set of candidate index terms, $C_{no\_SW}$, or, if stop words are not removed, from C itself, a smaller set, say $C_{ST}$ or $C_{BF}$, may be produced by application of stemming or normalization. In IR, stemming usually refers to the removal of suffixes from word forms, and the outcome of the stemming process is a stem. In this thesis, normalization refers to the transformation of inflected word forms to their base forms.

9 For an example of an English stop list, see (Fox, 1992).

If stemming is applied to the terms in $C_{no\_SW}$ (C), $C_{ST}$ contains the stems of the terms in $C_{no\_SW}$ (C). Alternatively, if normalization is applied to the terms in $C_{no\_SW}$ (C), $C_{BF}$ contains the base forms of the terms in $C_{no\_SW}$ (C). For example, assume that the term computations belongs to $C_{no\_SW}$ (C). If the stemming algorithm applied is the well-known Porter algorithm,10 stemming computations yields comput. Normalization instead gives the base form computation. Stemming and normalization are discussed in more detail in the next chapter.

2.1.4 Removal of additional high frequency terms

When, possibly, stop words have been removed and, possibly, stemming or normalization has been applied, additional high frequency terms may be removed from the set $C_{ST}$ ($C_{BF}$, $C_{no\_SW}$, C). One possibility is to use the inverse document frequency function, IDF (Salton and McGill, 1983, p. 73), defined as

$$\mathrm{IDF}(t) = \frac{1}{n}, \tag{2.1}$$

where n is the number of documents in the database in which the term t occurs. One may then stipulate that if IDF(t) is less than a certain threshold value, t is removed from the set. For example, one may state that t is removed if t occurs in more than 25% of the documents. This statement is equivalent to the statement "t is removed if $\mathrm{IDF}(t) < 4/N$", where N is the number of documents in the database, since $n > N/4$ if and only if $1/n < 4/N$.

2.1.5 The index

Finally, an index is built for the remaining terms, which constitute the index terms for the document database. Typically, each index term t is associated with a set of postings.11 Each posting is a three-element list of the form $\langle d, f_{d,t}, [o_1, \ldots, o_{f_{d,t}}] \rangle$. Here d is a document identifier, $f_{d,t} \in \{1, 2, 3, \ldots\}$ is the frequency of t in d, and $o_i$ is the position of the i-th occurrence of t in d (Bahle, Williams and Zobel, 2002). The number of elements in the set of postings for an index term t is identical to the number of documents in which t occurs.

10 The Porter algorithm is described in (Porter, 1980).
11 Cf. the definition of index above.
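The two pruning steps, the stop list check of Section 2.1.2 and the removal of additional high frequency terms of Section 2.1.4, can be sketched together. This is a minimal illustration, assuming IDF(t) = 1/n and a 25% document-share cut-off as in the example above. The function name `prune_terms`, its parameters and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def prune_terms(docs: Dict[int, List[str]], stop_list: Iterable[str], max_share: float = 0.25) -> Set[str]:
    """Return the candidate terms left after stop word and high frequency removal.

    A term t occurring in n of the N documents is dropped when
    IDF(t) = 1/n falls below 1/(max_share * N), i.e. when it occurs in
    more than max_share of the documents (4/N for max_share = 0.25).
    """
    stop = set(stop_list)
    N = len(docs)
    # Document frequency n of every candidate term that is not a stop word.
    df = Counter(t for words in docs.values() for t in set(words) if t not in stop)
    threshold = 1.0 / (max_share * N)
    return {t for t, n in df.items() if 1.0 / n >= threshold}
```

In a toy database of eight documents, a term occurring in three of them (more than 25%) is removed. A term occurring in two is kept, because IDF = 1/2 equals the threshold 4/8.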

As an example, consider Figure 2-1, with two hypothetical index entries, for the English terms deep and purple.

...
deep    { ⟨5, 2, [0, 50]⟩, ⟨20, 4, [1, 50, 102, 200]⟩ }
...
purple  { ⟨20, 4, [2, 51, 103, 201]⟩ }
...

Figure 2-1. Entries for two index terms.

deep has two postings, and purple has one. Therefore, deep occurs in two documents, purple in one. In document 20, deep has the frequency 4, i.e., the number of occurrences of deep in the document is 4. Also purple occurs in document 20, with the same frequency. Moreover, for each position n of deep in document 20, purple occurs at position n + 1.

Finally, there should be a correspondence between the type of index and the queries. If neither stemming nor normalization is applied during indexing, a query may contain one or several variants of a given word. However, when stemming (normalization) is applied, a query should contain stems (base forms) as query terms. The creation of stems (base forms) from original query terms can be done automatically or manually.

2.1.6 Visualization of the outlined automatic indexing process

In Figure 2-2 below, the objects and subprocesses involved in the outlined automatic indexing process are visualized. The rectangles represent objects, the ellipses represent processes.

Documents (text) → Lexical analysis → Candidate index terms → *Check against stop list → Terms not in stop list → *Stemming → Stems, or *Normalization → Terms in base form → *Removal of additional high frequency terms → Index terms → Index

Figure 2-2. The outlined automatic indexing process. The star, *, indicates optional subprocesses.
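The position lists stored in postings, such as those of Figure 2-1, are what make phrase and distance operators possible. The following is a minimal sketch, assuming postings are held as a dictionary from document id to a list of positions. The function name `within` and its parameters are hypothetical, not InQuery's operators.

```python
from typing import Dict, List, Set

Postings = Dict[int, List[int]]  # doc id -> positions of the term in that document


def within(first: Postings, second: Postings, max_dist: int = 1, ordered: bool = True) -> Set[int]:
    """Documents in which some occurrence of `second` lies within max_dist
    positions of an occurrence of `first` (and after it, if ordered=True).

    max_dist=1 with ordered=True is an exact two-word phrase match.
    """
    hits: Set[int] = set()
    for doc in first.keys() & second.keys():  # only documents containing both terms
        positions2 = set(second[doc])
        for n in first[doc]:
            offsets = range(1, max_dist + 1)
            candidates = [n + k for k in offsets]
            if not ordered:
                candidates += [n - k for k in offsets]
            if any(c in positions2 for c in candidates):
                hits.add(doc)
                break
    return hits
```

With document 20 of Figure 2-1, where deep = {20: [1, 50, 102, 200]} and purple = {20: [2, 51, 103, 201]}, the phrase deep purple, `within(deep, purple)`, matches document 20. The reversed phrase purple deep matches only when order is ignored.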

2.2 Retrieval models

An IR model is, following Baeza-Yates and Ribeiro-Neto (1999, p. 23), a quadruple $M = (D, Q, F, R(q, d))$, where:

D is a set of representations of the documents in the database.
Q is a set of (formal) representations of information needs, i.e., of queries.
F is a framework for modeling document representations, queries, and their relationships.
R(q, d) is a ranking function, which associates a real number with a query q ∈ Q and a document representation d ∈ D.

We recall that $D = \{d_1, \ldots, d_N\}$ is the document database of a given IR system, and that $V_D = \{t_1, \ldots, t_m\}$ is the vocabulary for D. With each document $d_j \in D$, we associate an m-dimensional index term vector

$$\vec{d}_j = (w_{1,j}, \ldots, w_{m,j}),$$

where $w_{i,j} \ge 0$ is the weight of the index term $t_i$ in the document $d_j$ (ibid., p. 25). $w_{i,j}$ is supposed to reflect the importance of the term $t_i$ with respect to describing the semantic content of the document $d_j$. If $t_i$ does not occur in $d_j$, then $w_{i,j} = 0$. We also introduce, for each index term $t_i$, a function $g_i$ from the set of index vectors to the set of weights associated with $t_i$, defined as (ibid., p. 25):

$$g_i(\vec{d}_j) = w_{i,j}.$$

For a given index vector $\vec{d}_j$, $g_i$ maps $\vec{d}_j$ on the weight $t_i$ has in $d_j$.

In Sections 2.2.1 and 2.2.2, we describe two of the three classical IR models, the Boolean model and the vector model.12 Since the experiment of the study was conducted in a probabilistic retrieval environment, Section 2.2.3 gives a fairly comprehensive description of two probabilistic models: the third classical model, the binary independence model, and the inference network model. The inference network model is the model on which the IR system used in the experiment of the study is based. The exposition of the three classical models is principally based on (ibid.). All four models are described in terms of the four components of an IR model.

2.2.1 Boolean model

The Boolean model was the first IR model generated, and many of the early commercial IR systems, systems that stored bibliographical records, were based on the model. Operational systems based on the model are not unusual today, but the model is not as dominant as it used to be. The modeling framework for the Boolean model consists of set theory and propositional logic.
12 Of the IR models described in Section 2.2, the Boolean model, with its exact match retrieval technique, has the weakest connection to the retrieval environment of this study. Therefore, this model is described briefly and somewhat informally.

In the Boolean model, a document $d_j \in D$ is represented by a subset $V_{d_j} = \{t_{j_1}, \ldots, t_{j_k}\}$ of the vocabulary for D, i.e., by a subset of $V_D = \{t_1, \ldots, t_m\}$. The terms in $V_{d_j}$ are the terms of $V_D$ that occur in $d_j$, and they are implicitly combined by the operator AND. Equivalently, using the vector notation introduced above, we regard $d_j$ as represented by the index vector $\vec{d}_j = (w_{1,j}, \ldots, w_{m,j})$, where $w_{i,j} = 1$ if $t_i$ occurs in $d_j$, and $w_{i,j} = 0$ otherwise. Documents are, then, represented by presence-absence (1-0) index vectors.

A query in the Boolean model is a combination of index terms, the operators AND, OR and NOT, and parentheses. For example, $t_1$ AND ($t_2$ OR $t_3$) is a query. In the Boolean model, a document d is retrieved in relation to a query q if and only if d satisfies the Boolean conditions expressed by q. If this is the case, d receives the (similarity) value 1. Otherwise, d receives the value 0. For example, a document d is retrieved in relation to $t_1$ AND ($t_2$ OR $t_3$) if and only if $t_1$ is present in d, and $t_2$ or $t_3$ is present in d.

It should be clear from the informal discussion in the preceding paragraph that the retrieval technique of the Boolean model is an exact match technique. There is no such thing in the model as approximation of the query conditions. This is considered to be a major drawback of the model, since it is desirable that an IR system ranks the documents in accordance with their degree of similarity with the query.

2.2.2 Vector model

The vector model was presented in the early seventies (Salton, 1971). In contrast to the Boolean model, the retrieval technique of the vector model is a best match technique. The modeling framework for the vector model consists of linear algebra. The vector model uses non-binary weights for terms in documents and queries. These weights are used to calculate the degree of similarity between a document and a query. As in the Boolean model, a document $d_j$ is represented by the index vector

$$\vec{d}_j = (w_{1,j}, \ldots, w_{m,j}). \tag{2.2}$$

However, the weights are non-binary. The vector model brings in and combines two factors with respect to the weighting of terms in documents: the term frequency factor (tf factor) and the inverse document frequency factor (idf factor) (Salton and Buckley, 1988).
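The exact match behaviour of the Boolean model of Section 2.2.1 can be made concrete with a short sketch. It evaluates a query such as t1 AND (t2 OR t3) against documents represented by their sets of index terms, which is equivalent to the presence-absence vectors above. The nested-tuple query representation and the function names are illustrative assumptions.

```python
from typing import Dict, List, Set, Union

# A query is a term (str), ("AND"|"OR", q1, q2, ...) or ("NOT", q).
Query = Union[str, tuple]


def satisfies(doc_terms: Set[str], q: Query) -> bool:
    """Exact match: does a document (its set of index terms) satisfy q?"""
    if isinstance(q, str):
        return q in doc_terms            # w_{i,j} = 1 iff t_i occurs in d_j
    op, *args = q
    if op == "AND":
        return all(satisfies(doc_terms, a) for a in args)
    if op == "OR":
        return any(satisfies(doc_terms, a) for a in args)
    if op == "NOT":
        return not satisfies(doc_terms, args[0])
    raise ValueError(f"unknown operator {op!r}")


def boolean_retrieve(docs: Dict[int, Set[str]], q: Query) -> List[int]:
    """Documents with similarity value 1 (all others get 0), as a sorted list of ids."""
    return sorted(d for d, terms in docs.items() if satisfies(terms, q))
```

For the query ("AND", "t1", ("OR", "t2", "t3")), a document containing t1 and t2 and one containing t1, t3 and t4 are retrieved. A document containing only t1 is not retrieved, however close it comes to satisfying the query. This is the lack of approximation noted above.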
The idea behind the tf factor is that the number of occurrences of a term in a document says something about how well the term describes the content of the document. The idea behind the idf factor is that a term which occurs in a large proportion of the documents in the database discriminates badly between relevant and irrelevant documents. Let $\mathit{freq}_{i,j}$ be the frequency of the term $t_i$ in the document $d_j$, i.e., the number of occurrences of $t_i$ in $d_j$. The tf factor and the idf factor are combined in Equation (2.3), which defines the weight of a term $t_i$ in a document $d_j$:

$$w_{i,j} = \mathit{freq}_{i,j} \times \log\frac{N}{n_i}, \qquad (2.3)$$

where $N$ is the number of documents in the database and $n_i$ is the number of documents in which $t_i$ occurs (Baeza-Yates and Ribeiro-Neto, 1999, p. 29). The tf factor is reflected by the left factor of the product, the idf factor³ by the right. For a term $t_i$ to have a large weight in a document $d_j$, it should occur frequently in the document and infrequently among the documents of the database. Note that if $t_i$ does not occur in $d_j$, i.e., $\mathit{freq}_{i,j} = 0$, then $w_{i,j} = 0$. Term weighting methods based on the tf and idf factors are called tf-idf schemes.⁴

3 Note that the right factor of the product and the defining expression of the IDF function (Section 2.1.4) are both high when the term occurs in few documents in the database.
4 The method given by Equation (2.3) is an example of such a scheme.

A query in the vector model is a subset of the vocabulary for the document database. Formally, $q \subseteq V_D = \{t_1, \ldots, t_m\}$. Like a document, a query is represented by an m-dimensional index vector:

$$\vec{q} = (w_{1,q}, \ldots, w_{m,q}), \qquad (2.4)$$

where $w_{i,q}$ is the weight of $t_i$ in $q$. Query terms can be weighted in a similar way to document terms. For each term $t_i$ in $q$, the weight of $t_i$ in $q$ may be defined as suggested by Salton and Buckley (1988):

$$w_{i,q} = \left(0.5 + \frac{0.5\,\mathit{freq}_{i,q}}{\max_l \mathit{freq}_{l,q}}\right) \times \log\frac{N}{n_i}, \qquad (2.5)$$

where $\mathit{freq}_{i,q}$ is the frequency of $t_i$ in the text of the request on which $q$ is based, and $\max_l \mathit{freq}_{l,q}$ is the maximum frequency over the terms that occur in the text of the request. (If $t_i$ does not belong to $q$, $w_{i,q}$ is set to 0.) The tf factor is thus normalized by the maximum term frequency, and it is further normalized to lie in the interval [0.5, 1]. A request may well be short, perhaps only a sentence long. If so, each term in the request text may have the frequency 1. Under this frequency assumption, for each term $t_i$ in $q$,

$$w_{i,q} = \left(0.5 + \frac{0.5 \times 1}{1}\right) \times \log\frac{N}{n_i} = 1 \times \log\frac{N}{n_i} = \log\frac{N}{n_i}. \qquad (2.6)$$

Thus, under the assumption given, only the idf factor is taken into account.

The degree of similarity between a document $d_j$ and a query $q$ is taken to be the correlation between the corresponding vectors $\vec{d_j}$ and $\vec{q}$. This correlation can be measured by, for instance, the cosine measure, which in that case acts as the ranking function of the vector model.⁵ The cosine measure gives the cosine of the angle between the vectors $\vec{d_j}$ and $\vec{q}$, and is defined as (Baeza-Yates and Ribeiro-Neto, 1999, p. 27):

$$\mathit{sim}(d_j, q) = \frac{\sum_{i=1}^{m} w_{i,j}\, w_{i,q}}{\sqrt{\sum_{i=1}^{m} w_{i,j}^2}\;\sqrt{\sum_{i=1}^{m} w_{i,q}^2}}. \qquad (2.7)$$

The numerator gives the scalar product of the vectors $\vec{d_j}$ and $\vec{q}$, while the factors in the denominator, from left to right, give the norms of $\vec{d_j}$ and $\vec{q}$, respectively. The left factor in the denominator has a normalizing effect with respect to document length. The right factor is constant for all documents in the database. The cosine measure gives values in the interval [0, 1]. If none of the terms in $q$ occurs in $d_j$, the scalar product is 0, and then $\mathit{sim}(d_j, q) = 0$. If at least one of the query terms occurs in $d_j$, and none of the query terms occurs in all documents, $\mathit{sim}(d_j, q) > 0$. (A term occurring in all $N$ documents receives the idf weight $\log(N/N) = 0$ and thus contributes nothing to the scalar product.) When $\mathit{sim}(d_j, q)$ has been computed for each document $d_j$, the documents of the database are ranked according to descending similarity values. The document with the highest value comes first (rank 1), the document with the next highest value comes second (rank 2), and so on.

5 From this point and in the remainder of the thesis, we consider the ranking function of a model to be such that it associates a real number with a query and a document (and not a document representation).

Two probabilistic models

The binary independence model

The binary independence model (BIR) was introduced in the mid seventies (Robertson and Sparck Jones, 1976). Here we give a more comprehensive presentation of BIR than what is typically given in the literature.
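Before BIR is developed further, the vector model computations described earlier, the weights of Equations (2.3) and (2.5) and the cosine ranking of Equation (2.7), can be sketched in Python. The three-document database and the request frequencies are hypothetical, and the natural logarithm is assumed; the base scales every document and query weight by the same constant, so it does not affect the cosine values. Weights for absent terms are returned as 0 without evaluating the logarithm.

```python
import math

# Hypothetical toy database: raw frequencies freq_{i,j} of the index terms
# t1, t2, t3 (m = 3) in each document d_j.
docs = {
    "d1": [2, 0, 1],
    "d2": [0, 1, 0],
    "d3": [1, 1, 0],
}
request_freqs = [1, 0, 1]  # freq_{i,q}: t1 and t3 occur once in the request

N = len(docs)  # number of documents in the database
m = len(request_freqs)
# n_i: number of documents in which t_i occurs
n = [sum(1 for freqs in docs.values() if freqs[i] > 0) for i in range(m)]


def document_weight(freq: int, n_i: int) -> float:
    """Equation (2.3): w_{i,j} = freq_{i,j} * log(N / n_i)."""
    return freq * math.log(N / n_i) if freq > 0 else 0.0


def query_weight(freq: int, max_freq: int, n_i: int) -> float:
    """Equation (2.5); 0 for terms not in q (or not in any document)."""
    if freq == 0 or n_i == 0:
        return 0.0
    return (0.5 + 0.5 * freq / max_freq) * math.log(N / n_i)


def cosine(d: list[float], q: list[float]) -> float:
    """Equation (2.7): scalar product divided by the product of the norms."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    return dot / (norm_d * norm_q) if norm_d and norm_q else 0.0


max_freq = max(request_freqs)
q_vec = [query_weight(request_freqs[i], max_freq, n[i]) for i in range(m)]
scores = {
    name: cosine([document_weight(f, n[i]) for i, f in enumerate(freqs)], q_vec)
    for name, freqs in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)  # descending similarity
print(ranking)  # ['d1', 'd3', 'd2']
```

Because each term occurs once in this request, `query_weight` reduces to log(N/n_i), as in Equation (2.6). Document d2 contains only t2, which is not in the query, so its similarity is 0.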

According to BIR, retrieved documents should be ranked by decreasing probability of relevance. The main thought behind the model is that the probability of a document being relevant to a query can be computed by considering how index terms are distributed in relevant and irrelevant documents. The retrieval technique of BIR is, like the retrieval technique of the vector model, a best match technique. The framework of BIR is probability theory, in particular Bayes' theorem. As in the Boolean model, a document $d_j$ is represented by an m-dimensional presence-absence index vector:

$$\vec{d_j} = (w_{1,j}, \ldots, w_{m,j}),$$

where $w_{i,j} = 1$ if $t_i$ occurs in $d_j$, and $w_{i,j} = 0$ otherwise. In BIR, as in the vector model, a query is a subset of the vocabulary for $D$. Formally, $q \subseteq V_D = \{t_1, \ldots, t_m\}$. As in the vector model, $q$ is represented by an m-dimensional index vector, but this vector is a presence-absence vector:

$$\vec{q} = (w_{1,q}, \ldots, w_{m,q}),$$

where $w_{i,q} = 1$ if $t_i \in q$, and $w_{i,q} = 0$ otherwise.

BIR tries to estimate the probability that the user will consider the document $d_j$ relevant in relation to a query $q$. It is assumed in BIR that this relevance probability depends only on the query and document representations (Baeza-Yates and Ribeiro-Neto, 1999, p. 31). Another assumption of BIR is that there exists, for a given query $q$, a subset of $D$ that contains all and only the relevant documents for $q$ (ibid., p. 31). Let $R$ be this set, the ideal answer set. Relevance is viewed as a binary attribute: either a document is relevant to a query or it is not. The documents in $R$ are therefore considered to be equally relevant to the query.

BIR sees the querying process as a process in which the query, seen as a probabilistic description of $R$, iteratively becomes better and better. Initially, one has to guess what the properties of $R$ (with respect to index terms) could be, and construct a first description of $R$. This initial description is then used to retrieve documents in a first search. The top $n$ ranked documents are then examined by the user, who decides which of them are relevant and which are not. On the basis of this information, a new description of $R$ is generated, and a second search is performed. By iterating this process several times, a good approximation of the real description of $R$ is expected to be generated. (ibid., p. 31)

Let $\bar{R}$ be the set of irrelevant documents in $D$, i.e., $\bar{R} = D \setminus R$. Let $P(R \mid \vec{d_j})$ be the probability that a document with the representation $\vec{d_j}$ is relevant to the query $q$, and let $P(\bar{R} \mid \vec{d_j})$ be the probability that a document with the representation $\vec{d_j}$ is irrelevant to $q$. (Thus, $P(\bar{R} \mid \vec{d_j}) = 1 - P(R \mid \vec{d_j})$.) Note that these expressions, which stand for conditional probabilities, refer to document index vectors, and not to the documents themselves. In BIR, the similarity between $d_j$ and $q$ is defined as (ibid., p. 32):

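$$\mathit{sim}(d_j, q) = \frac{P(R \mid \vec{d_j})}{P(\bar{R} \mid \vec{d_j})}, \qquad (2.8)$$

which gives the odds of a document with the representation $\vec{d_j}$ being relevant to $q$. Bayes' theorem is applied to both the numerator and the denominator, and we obtain (ibid., p. 32)

$$\mathit{sim}(d_j, q) = \frac{P(\vec{d_j} \mid R)\, P(R) / P(\vec{d_j})}{P(\vec{d_j} \mid \bar{R})\, P(\bar{R}) / P(\vec{d_j})} = \frac{P(\vec{d_j} \mid R)\, P(R)}{P(\vec{d_j} \mid \bar{R})\, P(\bar{R})},$$

where $P(\vec{d_j} \mid R)$ ($P(\vec{d_j} \mid \bar{R})$) stands for the probability that a document has the representation $\vec{d_j}$, given that the document is relevant (irrelevant), and $P(R)$ ($P(\bar{R})$) stands for the probability that a document in $D$ is relevant (irrelevant). With respect to a given query, $P(R)$ and $P(\bar{R})$ are constant for all documents in $D$, so they do not affect the ranking of the documents. Therefore, these two factors are dropped, and the similarity is written as

$$\mathit{sim}(d_j, q) = \frac{P(\vec{d_j} \mid R)}{P(\vec{d_j} \mid \bar{R})}. \qquad (2.9)$$

An additional assumption of BIR is that the index terms are statistically independent, both within $R$ and within $\bar{R}$ (ibid., p. 32). That is, within both sets, each index term is statistically independent of all the other index terms. With this assumption, it is possible to decompose the representation of $d_j$, i.e., the presence-absence index vector $\vec{d_j}$, into its components. The decomposition makes it feasible to consider how individual index terms, instead of whole representations, are distributed in $R$ and $\bar{R}$. According to Sparck Jones, Walker and Robertson (2000), the document representations are assumed to be unique, and assigning probabilities to unique representations is hard. Writing $g_i(\vec{d_j})$ for the $i$th component $w_{i,j}$ of $\vec{d_j}$, the independence assumption yields, from Equation (2.9), the following:

$$\mathit{sim}(d_j, q) = \frac{\prod_{g_i(\vec{d_j})=1} P(t_i \mid R) \times \prod_{g_i(\vec{d_j})=0} \bigl(1 - P(t_i \mid R)\bigr)}{\prod_{g_i(\vec{d_j})=1} P(t_i \mid \bar{R}) \times \prod_{g_i(\vec{d_j})=0} \bigl(1 - P(t_i \mid \bar{R})\bigr)} = \prod_{g_i(\vec{d_j})=1} \frac{P(t_i \mid R)}{P(t_i \mid \bar{R})} \times \prod_{g_i(\vec{d_j})=0} \frac{1 - P(t_i \mid R)}{1 - P(t_i \mid \bar{R})}, \qquad (2.10)$$

where $P(t_i \mid R)$ ($P(t_i \mid \bar{R})$) stands for the probability that $t_i$ occurs in a relevant (irrelevant) document, and $1 - P(t_i \mid R)$ ($1 - P(t_i \mid \bar{R})$) stands for the probability that $t_i$ does not occur in a relevant (irrelevant) document.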
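The step from (2.8) to (2.9) drops the factor $P(R)/P(\bar{R})$, which is the same for every document. The following Python sketch, using hypothetical term probabilities and a hypothetical prior $P(R) = 0.1$, computes the odds (2.8) directly through Bayes' theorem and the product form (2.10) under the independence assumption, and checks that the two differ only by that constant factor and therefore rank the documents identically.

```python
import math

# Hypothetical estimates for a vocabulary of m = 3 index terms.
p_rel = [0.8, 0.6, 0.3]  # P(t_i | R)
p_irr = [0.3, 0.4, 0.3]  # P(t_i | R-bar); t3 is equally likely in both sets
prior_rel = 0.1          # P(R); P(R-bar) = 1 - P(R)

# Presence-absence index vectors of a toy database.
docs = {"d1": [1, 1, 0], "d2": [1, 0, 1], "d3": [0, 1, 1], "d4": [0, 0, 0]}


def p_vector_given(vec: list[int], probs: list[float]) -> float:
    """P(d_j | set) under term independence: multiply P(t_i | set) for each
    term present in d_j and 1 - P(t_i | set) for each term absent."""
    result = 1.0
    for present, p in zip(vec, probs):
        result *= p if present else 1.0 - p
    return result


def sim_2_10(vec: list[int]) -> float:
    """Equations (2.9)/(2.10): P(d_j | R) / P(d_j | R-bar)."""
    return p_vector_given(vec, p_rel) / p_vector_given(vec, p_irr)


def odds_2_8(vec: list[int]) -> float:
    """Equation (2.8) computed directly: P(R | d_j) / P(R-bar | d_j),
    with P(R | d_j) obtained from Bayes' theorem."""
    joint_rel = p_vector_given(vec, p_rel) * prior_rel
    joint_irr = p_vector_given(vec, p_irr) * (1.0 - prior_rel)
    p_rel_given_d = joint_rel / (joint_rel + joint_irr)
    return p_rel_given_d / (1.0 - p_rel_given_d)


# The two quantities differ only by the document-independent factor
# P(R) / P(R-bar), so they induce the same ranking.
constant = prior_rel / (1.0 - prior_rel)
for vec in docs.values():
    assert math.isclose(odds_2_8(vec), sim_2_10(vec) * constant)

ranking_2_8 = sorted(docs, key=lambda d: odds_2_8(docs[d]), reverse=True)
ranking_2_10 = sorted(docs, key=lambda d: sim_2_10(docs[d]), reverse=True)
print(ranking_2_10)  # ['d1', 'd2', 'd3', 'd4']
```

Since $t_3$ has the same probability in both sets, its factor in (2.10) is 1 whether or not it occurs in a document.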

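Next, we assume that $P(t_i \mid R) = P(t_i \mid \bar{R})$ for each term $t_i \notin q$ (Fuhr, 1992). The factors of (2.10) for such terms then equal 1, and we obtain

$$\mathit{sim}(d_j, q) = \prod_{t_i \in q \,\wedge\, g_i(\vec{d_j})=1} \frac{P(t_i \mid R)}{P(t_i \mid \bar{R})} \times \prod_{t_i \in q \,\wedge\, g_i(\vec{d_j})=0} \frac{1 - P(t_i \mid R)}{1 - P(t_i \mid \bar{R})}. \qquad (2.11)$$

In (2.11), for each $t_i$ that occurs in $d_j$ and belongs to $q$, we (1) extend the second product with $\frac{1 - P(t_i \mid R)}{1 - P(t_i \mid \bar{R})}$, and (2) substitute, in the first product, $\frac{P(t_i \mid R)\,(1 - P(t_i \mid \bar{R}))}{P(t_i \mid \bar{R})\,(1 - P(t_i \mid R))}$ for $\frac{P(t_i \mid R)}{P(t_i \mid \bar{R})}$. Then we obtain

$$\mathit{sim}(d_j, q) = \prod_{t_i \in q \,\wedge\, g_i(\vec{d_j})=1} \frac{P(t_i \mid R)\,(1 - P(t_i \mid \bar{R}))}{P(t_i \mid \bar{R})\,(1 - P(t_i \mid R))} \times \prod_{t_i \in q} \frac{1 - P(t_i \mid R)}{1 - P(t_i \mid \bar{R})}, \qquad (2.12)$$

and

$$\text{(2.12)} = \prod_{t_i \in q \,\wedge\, g_i(\vec{d_j})=1} \frac{P(t_i \mid R)}{P(t_i \mid \bar{R})} \times \prod_{t_i \in q \,\wedge\, g_i(\vec{d_j})=0} \frac{1 - P(t_i \mid R)}{1 - P(t_i \mid \bar{R})}. \qquad (2.13)$$

To see that this equality holds, note that, for each such $t_i$,

$$\frac{P(t_i \mid R)\,(1 - P(t_i \mid \bar{R}))}{P(t_i \mid \bar{R})\,(1 - P(t_i \mid R))} \times \frac{1 - P(t_i \mid R)}{1 - P(t_i \mid \bar{R})} = \frac{P(t_i \mid R)}{P(t_i \mid \bar{R})}.$$

For a given query, the second product of (2.12) is constant over all documents. This product is therefore dropped, and the logarithm is taken of the remaining product. We finally get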

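$$\begin{aligned}
\mathit{sim}(d_j, q) &= \log \prod_{t_i \in q \,\wedge\, g_i(\vec{d_j})=1} \frac{P(t_i \mid R)\,(1 - P(t_i \mid \bar{R}))}{P(t_i \mid \bar{R})\,(1 - P(t_i \mid R))} \\
&= \sum_{t_i \in q \,\wedge\, g_i(\vec{d_j})=1} \log \frac{P(t_i \mid R)\,(1 - P(t_i \mid \bar{R}))}{P(t_i \mid \bar{R})\,(1 - P(t_i \mid R))} \\
&= \sum_{t_i \in q \,\wedge\, g_i(\vec{d_j})=1} \left( \log \frac{P(t_i \mid R)}{1 - P(t_i \mid R)} + \log \frac{1 - P(t_i \mid \bar{R})}{P(t_i \mid \bar{R})} \right) \\
&= \sum_{i=1}^{m} w_{i,q}\, w_{i,j} \left( \log \frac{P(t_i \mid R)}{1 - P(t_i \mid R)} + \log \frac{1 - P(t_i \mid \bar{R})}{P(t_i \mid \bar{R})} \right).
\end{aligned}$$

The last step uses the fact that $w_{i,q}\, w_{i,j} = 1$ exactly when $t_i \in q$ and $t_i$ occurs in $d_j$, and 0 otherwise. The ranking function of BIR can then be defined as

$$\mathit{sim}(d_j, q) = \sum_{i=1}^{m} w_{i,q}\, w_{i,j} \left( \log \frac{P(t_i \mid R)}{1 - P(t_i \mid R)} + \log \frac{1 - P(t_i \mid \bar{R})}{P(t_i \mid \bar{R})} \right), \qquad (2.14)$$

which is the definition given by Baeza-Yates and Ribeiro-Neto (1999, p. 32). When $\mathit{sim}(d_j, q)$ has been computed for each document $d_j$, the documents are ranked according to descending similarity values.

The probabilities $P(t_i \mid R)$ and $P(t_i \mid \bar{R})$ have to be estimated for the query terms. For the first search, when no documents have been retrieved, $P(t_i \mid R)$ may be set to 0.5 and $P(t_i \mid \bar{R})$ to $n_i / N$, for each $t_i \in q$ (ibid., p. 33). Equation (2.14) is then used to rank the documents. The top $n$ ranked documents from the first search are relevance assessed by the user. The distribution of the query terms in (1) the documents that the user considers relevant and (2) the rest of the documents is used to generate new values for the two probabilities for the query terms. For example, the following two equations may be used for the second (and later) estimations of the two probabilities (ibid., p. 2):

$$P(t_i \mid R) = \frac{|D_{r,i}| + \frac{n_i}{N}}{|D_r| + 1}, \qquad (2.15)$$

and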

$$P(t_i \mid \bar{R}) = \frac{n_i - |D_{r,i}| + \frac{n_i}{N}}{N - |D_r| + 1}, \qquad (2.16)$$

where $D_r$ is the set of relevant (as considered by the user) retrieved documents, and $D_{r,i}$ the set of relevant retrieved documents in which $t_i$ occurs. The expression $n_i/N$ is introduced⁶ as an adjustment component, in order to avoid problems that arise when the values of $|D_r|$ and $|D_{r,i}|$ are small (e.g., 1 and 0, respectively). The new values for the two probabilities, obtained by (2.15) and (2.16), are expected to be better approximations of the real probabilities than the initial values. A new search is then performed: Equation (2.14) is applied, the user assesses the relevance of the retrieved documents, new values for the probabilities are obtained, and so on. Given improvements of the approximations in question, the rankings of the documents are expected to improve gradually. Note that the last factor in Equation (2.14) can be regarded as a query term weight. The query is constant over the iterations, but its terms are reweighted in each iteration by means of relevance feedback.

The inference network model

The retrieval system of this study, InQuery, is based on another probabilistic model: the inference network model. We start with a short presentation of Bayesian networks, which, together with probability theory, are the modelling framework for the inference network model. A Bayesian network (BN) has the following two components (Jensen, 1996, pp. 18-19):

(1) A universe $U = \{A_1, \ldots, A_m\}$ of variables (nodes) and a set of directed links between variables. With each variable there is associated a finite set of pairwise disjoint states. Together, the variables and the directed links form a directed acyclic graph (DAG).⁷

(2) A set of conditional probability tables. If there is a directed link from the node $A_i$ to the node $A_j$, $A_i$ is a parent of $A_j$, and $A_j$ is a child of $A_i$ ($A_i, A_j \in U$). Now, with each variable $A$ with parents $A_1, \ldots, A_n$ ($n > 0$) there is associated a conditional probability table $P(A \mid A_1, \ldots, A_n)$. This table specifies, for each possible state $a$ of $A$, the probability that $A$ is in state $a$ (has the value $a$), given each combination of states for the parents $A_1, \ldots, A_n$. If $A$ has no parents ($n = 0$), $A$ is a root of the network, and the conditional probability table for $A$ reduces to the unconditional probabilities $P(A)$. These probabilities for roots are prior probabilities.
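The two components can be sketched in Python for a hypothetical three-node network: two roots $A_1$ and $A_2$, each with a prior probability, and a child $A$ whose conditional probability table is indexed by the parents' states. All numbers are invented for illustration, and since every variable is binary only $P(A = 1)$ is stored for each parent combination. The sketch also shows how the tables determine a probability for the child, by summing over the parents' states; two root nodes are independent a priori in a Bayesian network, so their priors can simply be multiplied.

```python
from itertools import product

# Hypothetical network: roots A1 and A2, both parents of A. All variables
# are binary (states 0 and 1).
prior = {"A1": 0.5, "A2": 0.2}  # P(root = 1): the prior probabilities

# Conditional probability table P(A = 1 | A1, A2), keyed by the parents'
# states; P(A = 0 | ...) is 1 minus the stored value.
cpt_A = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}


def p_state(root: str, state: int) -> float:
    """Prior probability that a root node is in the given state."""
    return prior[root] if state == 1 else 1.0 - prior[root]


# P(A = 1) = sum over parent state combinations of
# P(A = 1 | a1, a2) * P(A1 = a1) * P(A2 = a2).
p_A_true = sum(
    cpt_A[(a1, a2)] * p_state("A1", a1) * p_state("A2", a2)
    for a1, a2 in product((0, 1), repeat=2)
)
print(round(p_A_true, 4))  # 0.42
```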
Let BN be a Bayesian network, and let $A_1, \ldots, A_n$ be the parents of $A$ in BN. The directed links from $A_1, \ldots, A_n$ to $A$ are intended to represent causal relationships. The strength of the causal influence of the variables $A_1, \ldots, A_n$ on the variable $A$ is expressed by the conditional probability table for $A$.

6 In the ratios $|D_{r,i}| / |D_r|$ and $(n_i - |D_{r,i}|) / (N - |D_r|)$.
7 A directed graph $G$ is acyclic if there exists no directed path $A_1 \rightarrow \cdots \rightarrow A_n$ in $G$ such that $A_1 = A_n$.

Bayesian networks are also called belief networks. In this usage, the probabilities associated with nodes of the network are called beliefs. When prior probabilities for the root nodes and conditional probabilities for the non-root nodes have been obtained, the initial probability associated with each node of the BN can be computed. Assume now that we come to know that a node $A$ in a BN is in one of its possible states $a$. We then set $P(A = a)$ to 1 and update the probabilities of all the other nodes in BN.⁸ The resulting probabilities are called posterior probabilities, and these reflect the new degrees of belief computed in the light of the new evidence. The process of updating the probabilities of nodes on the basis of causal relationships in the BN and knowledge about states of nodes is called inference (or model evaluation). (Kadie, Hovel and Horvitz, 2001)

The inference network model is put forward in (Turtle, 1990; Turtle and Croft, 1990; Turtle and Croft, 1991). Our description of the model is based foremost on (Turtle, 1990). We will describe a simplified version of the model, where so-called text nodes and query concept nodes are omitted. This omission will neither prevent us from describing the essence of the model nor obscure the model's connection to the IR system InQuery. Consider the directed graph in Figure 2-3:

[Figure 2-3: document nodes $d_1, d_2, \ldots, d_{N-1}, d_N$; index term nodes $t_1, t_2, t_3, \ldots, t_m$; query nodes $q_1, q_2$; information need node $I$.]
Figure 2-3. An inference network, based on Figure 4.2 in (ibid., p. 46).

The directed graph in Figure 2-3 is obviously acyclic and therefore a DAG. If we assume a conditional probability table for each node in this DAG, it is a BN, or, in the usage of (ibid.), an inference network. From this point on, we use the latter expression when referring to a BN. The root nodes of the inference network in the figure represent the documents in a given database, while the nodes on the second level represent the index terms⁹ for the database. The third level contains nodes that represent queries, and the node on the fourth level represents an information need. In the network of the figure, this need is formally expressed in two ways, by the queries $q_1$ and $q_2$.¹⁰

8 To set a node to one of its possible states is called instantiation.
9 Turtle uses the expression representation node. However, we, like Baeza-Yates and Ribeiro-Neto (1999, p. 50), use the expression index term node.

With each node in the network are associated exactly two states: false (0) and true (1). A document node corresponds to the event of observing the document represented by the node. For each document node there is a prior belief, which describes the probability of observing the document of the node. This prior belief is generally set to $1/N$, where $N$ is the number of documents in the database (Turtle, 1990, p. 42). A directed link from a node for a document $d_j$ to a node for the index term $t_i$ indicates that $t_i$ has been assigned to $d_j$. Thus, the parents of the node for an index term $t_i$ represent the documents indexed by $t_i$. Further, each index term node has a conditional probability table associated with it. The table specifies the conditional probability associated with the node, given all combinations of truth values (0 and 1) with respect to the parents (document nodes) of the node. As we will see below, this specification may involve term weights. (ibid.)

A query node represents a query, and the node corresponds to the event that the query is satisfied. A directed link from a node for an index term $t_i$ to a node for a query $q$ indicates that $q$ contains $t_i$. Thus, the parents of the node for a query $q$ represent the index terms that are contained in $q$. Each query node has a conditional probability table associated with it, which expresses the belief in the query node given all combinations of truth values with respect to the parents of the node (index term nodes). Finally, the leaf node of the inference network corresponds to the event that the information need is met. (ibid., p. 45)

With the prior probabilities ($1/N$) for the document nodes and the conditional probabilities for the non-root nodes, the initial probability associated with each node of the inference network can be computed. However, a key idea behind the inference network model is to instantiate a given document node $d_j$ to true ($d_j = 1$), i.e., asserting that the document $d_j$ has been observed, while instantiating each other document node to false ($d_k = 0$, for $k \neq j$),¹¹ i.e., asserting that each other document is unobserved.
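The procedure just outlined, instantiating one document node to true and all others to false and then reading off the belief in the query node, can be sketched in Python. Several assumptions are made for illustration only: the term beliefs given each instantiated document are hypothetical numbers standing in for the term node tables, the information need node is left out so that the query node ends the evaluation, and the query node uses the sum operator of Example 2.1. Given the instantiated document nodes, the term nodes are independent of each other, so the belief in the query node under the sum operator equals the mean of its parents' beliefs; the sketch computes it both by full enumeration and in that closed form.

```python
from itertools import product

# Hypothetical term beliefs: P(t_i = 1) when d_j is instantiated to true and
# every other document node to false. A missing entry means t_i was not
# assigned to d_j, so the belief is assumed to be 0.
term_belief_given_doc = {
    "d1": {"t1": 0.6, "t2": 0.4},
    "d2": {"t2": 0.7, "t3": 0.5},
    "d3": {"t1": 0.3, "t3": 0.9},
}
query_terms = ["t1", "t3"]  # the parents of the query node q


def parent_beliefs(doc: str) -> list[float]:
    """Beliefs P(t_i = 1) for the query's parents after instantiating doc."""
    return [term_belief_given_doc[doc].get(t, 0.0) for t in query_terms]


def belief_by_enumeration(doc: str) -> float:
    """P(q = 1) obtained by summing over all 2^n parent state combinations,
    with the sum operator M_sum[1, l] = m / n. Given the instantiated
    document nodes, the term nodes are independent of each other."""
    beliefs = parent_beliefs(doc)
    n = len(beliefs)
    total = 0.0
    for states in product((0, 1), repeat=n):
        p_states = 1.0
        for state, p in zip(states, beliefs):
            p_states *= p if state == 1 else 1.0 - p
        total += (sum(states) / n) * p_states
    return total


def belief_in_query(doc: str) -> float:
    """Closed form of the same quantity: the mean of the parents' beliefs."""
    beliefs = parent_beliefs(doc)
    return sum(beliefs) / len(beliefs)


# Instantiate each document node in turn and rank by the belief in q.
scores = {doc: belief_in_query(doc) for doc in term_belief_given_doc}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['d3', 'd1', 'd2']
```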
When this is done, posterior probabilities can be computed: a new degree of belief for each node in the network is computed, under the assumption that $d_j = 1$. As a special case, the degree of belief that the information need is met is computed. By iterating the inference (model evaluation) process, which may be regarded as an evidence transmission process, for each document node in the network, $N$ belief values are generated. The $N$ documents in the database can then be ranked in descending order on the basis of these values.

In order to use an inference network for retrieval, the belief that a non-root node has a certain truth value, given all combinations of truth values for the parents of the node, must be estimated. Assume that a node $A$ in the network has $n$ parents, $A_1, \ldots, A_n$. Then, since each node in the network is binary, we can specify the belief estimates in a $2 \times 2^n$ matrix $M$ such that (ibid.):

10 Note that the inference network model admits multiple queries for the same information need.
11 Here, we let $d_j$ and $d_k$ refer both to nodes/variables and to documents. We admit this ambiguity also for $t_i$ and $q$. However, it should be clear from the context what we are referring to by the symbols.

1. The two rows of $M$ represent $A$. The number of the first row is 0 (false) and the number of the second row is 1 (true).
2. The column numbers, 0 to $2^n - 1$, are represented in binary notation: the first column is represented by $0 \ldots 0$ ($n$ zeros), the second by $0 \ldots 01$ ($n - 1$ zeros followed by 1), the third by $0 \ldots 010$ ($n - 2$ zeros followed by 1 and 0), and so on. The leftmost bit in such a representation gives the truth value of the first parent, $A_1$, the next bit to the right gives the truth value of $A_2$, and so on.
3. The cell $M[k, l]$, where $k \in \{0, 1\}$ and $0 \le l < 2^n$, contains the estimate of the belief that $A = k$, given that the parents $A_1, \ldots, A_n$ of $A$ have the truth values indicated by the binary representation of $l$. (Note that $M$ is essentially a conditional probability table for $A$.)

The belief in $A$ is then computed by using the probabilities for the set of parents, $\{A_1, \ldots, A_n\}$, together with the conditional beliefs provided by $M$.

Example 2.1. This example is based on (ibid.). Consider the network in Figure 2-4:

[Figure 2-4: index term nodes $t_1$, $t_2$, $t_3$, each linked to the query node $q$.]
Figure 2-4. An inference network with four nodes.

For the node $q$ (with three parents) in Figure 2-4, we give a matrix that implements a sum operator. Assume that $P(t_1 = 1) = p_1$, $P(t_2 = 1) = p_2$ and $P(t_3 = 1) = p_3$. We further assume that the belief in a query node $q$ depends only on the number of parents that are true (i.e., in state 1). Let $l$ be a column number such that $m$ ($0 \le m \le n$) bits are 1 in the binary representation of $l$. Then

$$M_{\mathit{sum}}[1, l] = \frac{m}{n} \quad \text{and} \quad M_{\mathit{sum}}[0, l] = \frac{n - m}{n}.$$

The sum matrix for the network in Figure 2-4 is then:


More information

On computing differential transform of nonlinear non-autonomous functions and its applications

On computing differential transform of nonlinear non-autonomous functions and its applications On compung dfferenal ransform of nonlnear non-auonomous funcons and s applcaons Essam. R. El-Zahar, and Abdelhalm Ebad Deparmen of Mahemacs, Faculy of Scences and Humanes, Prnce Saam Bn Abdulazz Unversy,

More information

Chapter Lagrangian Interpolation

Chapter Lagrangian Interpolation Chaper 5.4 agrangan Inerpolaon Afer readng hs chaper you should be able o:. dere agrangan mehod of nerpolaon. sole problems usng agrangan mehod of nerpolaon and. use agrangan nerpolans o fnd deraes and

More information

Comb Filters. Comb Filters

Comb Filters. Comb Filters The smple flers dscussed so far are characered eher by a sngle passband and/or a sngle sopband There are applcaons where flers wh mulple passbands and sopbands are requred Thecomb fler s an example of

More information

HEAT CONDUCTION PROBLEM IN A TWO-LAYERED HOLLOW CYLINDER BY USING THE GREEN S FUNCTION METHOD

HEAT CONDUCTION PROBLEM IN A TWO-LAYERED HOLLOW CYLINDER BY USING THE GREEN S FUNCTION METHOD Journal of Appled Mahemacs and Compuaonal Mechancs 3, (), 45-5 HEAT CONDUCTION PROBLEM IN A TWO-LAYERED HOLLOW CYLINDER BY USING THE GREEN S FUNCTION METHOD Sansław Kukla, Urszula Sedlecka Insue of Mahemacs,

More information

Machine Learning 2nd Edition

Machine Learning 2nd Edition INTRODUCTION TO Lecure Sldes for Machne Learnng nd Edon ETHEM ALPAYDIN, modfed by Leonardo Bobadlla and some pars from hp://www.cs.au.ac.l/~aparzn/machnelearnng/ The MIT Press, 00 alpaydn@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/mle

More information

New M-Estimator Objective Function. in Simultaneous Equations Model. (A Comparative Study)

New M-Estimator Objective Function. in Simultaneous Equations Model. (A Comparative Study) Inernaonal Mahemacal Forum, Vol. 8, 3, no., 7 - HIKARI Ld, www.m-hkar.com hp://dx.do.org/.988/mf.3.3488 New M-Esmaor Objecve Funcon n Smulaneous Equaons Model (A Comparave Sudy) Ahmed H. Youssef Professor

More information

e-journal Reliability: Theory& Applications No 2 (Vol.2) Vyacheslav Abramov

e-journal Reliability: Theory& Applications No 2 (Vol.2) Vyacheslav Abramov June 7 e-ournal Relably: Theory& Applcaons No (Vol. CONFIDENCE INTERVALS ASSOCIATED WITH PERFORMANCE ANALYSIS OF SYMMETRIC LARGE CLOSED CLIENT/SERVER COMPUTER NETWORKS Absrac Vyacheslav Abramov School

More information

Robust and Accurate Cancer Classification with Gene Expression Profiling

Robust and Accurate Cancer Classification with Gene Expression Profiling Robus and Accurae Cancer Classfcaon wh Gene Expresson Proflng (Compuaonal ysems Bology, 2005) Auhor: Hafeng L, Keshu Zhang, ao Jang Oulne Background LDA (lnear dscrmnan analyss) and small sample sze problem

More information

WiH Wei He

WiH Wei He Sysem Idenfcaon of onlnear Sae-Space Space Baery odels WH We He wehe@calce.umd.edu Advsor: Dr. Chaochao Chen Deparmen of echancal Engneerng Unversy of aryland, College Par 1 Unversy of aryland Bacground

More information

Tight results for Next Fit and Worst Fit with resource augmentation

Tight results for Next Fit and Worst Fit with resource augmentation Tgh resuls for Nex F and Wors F wh resource augmenaon Joan Boyar Leah Epsen Asaf Levn Asrac I s well known ha he wo smple algorhms for he classc n packng prolem, NF and WF oh have an approxmaon rao of

More information

CH.3. COMPATIBILITY EQUATIONS. Continuum Mechanics Course (MMC) - ETSECCPB - UPC

CH.3. COMPATIBILITY EQUATIONS. Continuum Mechanics Course (MMC) - ETSECCPB - UPC CH.3. COMPATIBILITY EQUATIONS Connuum Mechancs Course (MMC) - ETSECCPB - UPC Overvew Compably Condons Compably Equaons of a Poenal Vecor Feld Compably Condons for Infnesmal Srans Inegraon of he Infnesmal

More information

Comparison of Differences between Power Means 1

Comparison of Differences between Power Means 1 In. Journal of Mah. Analyss, Vol. 7, 203, no., 5-55 Comparson of Dfferences beween Power Means Chang-An Tan, Guanghua Sh and Fe Zuo College of Mahemacs and Informaon Scence Henan Normal Unversy, 453007,

More information

Reactive Methods to Solve the Berth AllocationProblem with Stochastic Arrival and Handling Times

Reactive Methods to Solve the Berth AllocationProblem with Stochastic Arrival and Handling Times Reacve Mehods o Solve he Berh AllocaonProblem wh Sochasc Arrval and Handlng Tmes Nsh Umang* Mchel Berlare* * TRANSP-OR, Ecole Polyechnque Fédérale de Lausanne Frs Workshop on Large Scale Opmzaon November

More information

Performance Analysis for a Network having Standby Redundant Unit with Waiting in Repair

Performance Analysis for a Network having Standby Redundant Unit with Waiting in Repair TECHNI Inernaonal Journal of Compung Scence Communcaon Technologes VOL.5 NO. July 22 (ISSN 974-3375 erformance nalyss for a Nework havng Sby edundan Un wh ang n epar Jendra Sngh 2 abns orwal 2 Deparmen

More information

CHAPTER 5: MULTIVARIATE METHODS

CHAPTER 5: MULTIVARIATE METHODS CHAPER 5: MULIVARIAE MEHODS Mulvarae Daa 3 Mulple measuremens (sensors) npus/feaures/arbues: -varae N nsances/observaons/eamples Each row s an eample Each column represens a feaure X a b correspons o he

More information

Should Exact Index Numbers have Standard Errors? Theory and Application to Asian Growth

Should Exact Index Numbers have Standard Errors? Theory and Application to Asian Growth Should Exac Index umbers have Sandard Errors? Theory and Applcaon o Asan Growh Rober C. Feensra Marshall B. Rensdorf ovember 003 Proof of Proposon APPEDIX () Frs, we wll derve he convenonal Sao-Vara prce

More information

Introduction to Boosting

Introduction to Boosting Inroducon o Boosng Cynha Rudn PACM, Prnceon Unversy Advsors Ingrd Daubeches and Rober Schapre Say you have a daabase of news arcles, +, +, -, -, +, +, -, -, +, +, -, -, +, +, -, + where arcles are labeled

More information

Lecture 6: Learning for Control (Generalised Linear Regression)

Lecture 6: Learning for Control (Generalised Linear Regression) Lecure 6: Learnng for Conrol (Generalsed Lnear Regresson) Conens: Lnear Mehods for Regresson Leas Squares, Gauss Markov heorem Recursve Leas Squares Lecure 6: RLSC - Prof. Sehu Vjayakumar Lnear Regresson

More information

Outline. Probabilistic Model Learning. Probabilistic Model Learning. Probabilistic Model for Time-series Data: Hidden Markov Model

Outline. Probabilistic Model Learning. Probabilistic Model Learning. Probabilistic Model for Time-series Data: Hidden Markov Model Probablsc Model for Tme-seres Daa: Hdden Markov Model Hrosh Mamsuka Bonformacs Cener Kyoo Unversy Oulne Three Problems for probablsc models n machne learnng. Compung lkelhood 2. Learnng 3. Parsng (predcon

More information

Hidden Markov Models Following a lecture by Andrew W. Moore Carnegie Mellon University

Hidden Markov Models Following a lecture by Andrew W. Moore Carnegie Mellon University Hdden Markov Models Followng a lecure by Andrew W. Moore Carnege Mellon Unversy www.cs.cmu.edu/~awm/uorals A Markov Sysem Has N saes, called s, s 2.. s N s 2 There are dscree meseps, 0,, s s 3 N 3 0 Hdden

More information

Approximate Analytic Solution of (2+1) - Dimensional Zakharov-Kuznetsov(Zk) Equations Using Homotopy

Approximate Analytic Solution of (2+1) - Dimensional Zakharov-Kuznetsov(Zk) Equations Using Homotopy Arcle Inernaonal Journal of Modern Mahemacal Scences, 4, (): - Inernaonal Journal of Modern Mahemacal Scences Journal homepage: www.modernscenfcpress.com/journals/jmms.aspx ISSN: 66-86X Florda, USA Approxmae

More information

Tools for Analysis of Accelerated Life and Degradation Test Data

Tools for Analysis of Accelerated Life and Degradation Test Data Acceleraed Sress Tesng and Relably Tools for Analyss of Acceleraed Lfe and Degradaon Tes Daa Presened by: Reuel Smh Unversy of Maryland College Park smhrc@umd.edu Sepember-5-6 Sepember 28-30 206, Pensacola

More information

An introduction to Support Vector Machine

An introduction to Support Vector Machine An nroducon o Suppor Vecor Machne 報告者 : 黃立德 References: Smon Haykn, "Neural Neworks: a comprehensve foundaon, second edon, 999, Chaper 2,6 Nello Chrsann, John Shawe-Tayer, An Inroducon o Suppor Vecor Machnes,

More information

Econ107 Applied Econometrics Topic 5: Specification: Choosing Independent Variables (Studenmund, Chapter 6)

Econ107 Applied Econometrics Topic 5: Specification: Choosing Independent Variables (Studenmund, Chapter 6) Econ7 Appled Economercs Topc 5: Specfcaon: Choosng Independen Varables (Sudenmund, Chaper 6 Specfcaon errors ha we wll deal wh: wrong ndependen varable; wrong funconal form. Ths lecure deals wh wrong ndependen

More information

2.1 Constitutive Theory

2.1 Constitutive Theory Secon.. Consuve Theory.. Consuve Equaons Governng Equaons The equaons governng he behavour of maerals are (n he spaal form) dρ v & ρ + ρdv v = + ρ = Conservaon of Mass (..a) d x σ j dv dvσ + b = ρ v& +

More information

Panel Data Regression Models

Panel Data Regression Models Panel Daa Regresson Models Wha s Panel Daa? () Mulple dmensoned Dmensons, e.g., cross-secon and me node-o-node (c) Pongsa Pornchawseskul, Faculy of Economcs, Chulalongkorn Unversy (c) Pongsa Pornchawseskul,

More information

Anomaly Detection. Lecture Notes for Chapter 9. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar

Anomaly Detection. Lecture Notes for Chapter 9. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar Anomaly eecon Lecure Noes for Chaper 9 Inroducon o aa Mnng, 2 nd Edon by Tan, Senbach, Karpane, Kumar 2/14/18 Inroducon o aa Mnng, 2nd Edon 1 Anomaly/Ouler eecon Wha are anomales/oulers? The se of daa

More information

Part II CONTINUOUS TIME STOCHASTIC PROCESSES

Part II CONTINUOUS TIME STOCHASTIC PROCESSES Par II CONTINUOUS TIME STOCHASTIC PROCESSES 4 Chaper 4 For an advanced analyss of he properes of he Wener process, see: Revus D and Yor M: Connuous marngales and Brownan Moon Karazas I and Shreve S E:

More information

Existence and Uniqueness Results for Random Impulsive Integro-Differential Equation

Existence and Uniqueness Results for Random Impulsive Integro-Differential Equation Global Journal of Pure and Appled Mahemacs. ISSN 973-768 Volume 4, Number 6 (8), pp. 89-87 Research Inda Publcaons hp://www.rpublcaon.com Exsence and Unqueness Resuls for Random Impulsve Inegro-Dfferenal

More information

2/20/2013. EE 101 Midterm 2 Review

2/20/2013. EE 101 Midterm 2 Review //3 EE Mderm eew //3 Volage-mplfer Model The npu ressance s he equalen ressance see when lookng no he npu ermnals of he amplfer. o s he oupu ressance. I causes he oupu olage o decrease as he load ressance

More information

Attribute Reduction Algorithm Based on Discernibility Matrix with Algebraic Method GAO Jing1,a, Ma Hui1, Han Zhidong2,b

Attribute Reduction Algorithm Based on Discernibility Matrix with Algebraic Method GAO Jing1,a, Ma Hui1, Han Zhidong2,b Inernaonal Indusral Informacs and Compuer Engneerng Conference (IIICEC 05) Arbue educon Algorhm Based on Dscernbly Marx wh Algebrac Mehod GAO Jng,a, Ma Hu, Han Zhdong,b Informaon School, Capal Unversy

More information

Chapter 6: AC Circuits

Chapter 6: AC Circuits Chaper 6: AC Crcus Chaper 6: Oulne Phasors and he AC Seady Sae AC Crcus A sable, lnear crcu operang n he seady sae wh snusodal excaon (.e., snusodal seady sae. Complee response forced response naural response.

More information

( ) [ ] MAP Decision Rule

( ) [ ] MAP Decision Rule Announcemens Bayes Decson Theory wh Normal Dsrbuons HW0 due oday HW o be assgned soon Proec descrpon posed Bomercs CSE 90 Lecure 4 CSE90, Sprng 04 CSE90, Sprng 04 Key Probables 4 ω class label X feaure

More information

Volatility Interpolation

Volatility Interpolation Volaly Inerpolaon Prelmnary Verson March 00 Jesper Andreasen and Bran Huge Danse Mares, Copenhagen wan.daddy@danseban.com brno@danseban.com Elecronc copy avalable a: hp://ssrn.com/absrac=69497 Inro Local

More information

January Examinations 2012

January Examinations 2012 Page of 5 EC79 January Examnaons No. of Pages: 5 No. of Quesons: 8 Subjec ECONOMICS (POSTGRADUATE) Tle of Paper EC79 QUANTITATIVE METHODS FOR BUSINESS AND FINANCE Tme Allowed Two Hours ( hours) Insrucons

More information

Mechanics Physics 151

Mechanics Physics 151 Mechancs Physcs 5 Lecure 0 Canoncal Transformaons (Chaper 9) Wha We Dd Las Tme Hamlon s Prncple n he Hamlonan formalsm Dervaon was smple δi δ Addonal end-pon consrans pq H( q, p, ) d 0 δ q ( ) δq ( ) δ

More information

SOME NOISELESS CODING THEOREMS OF INACCURACY MEASURE OF ORDER α AND TYPE β

SOME NOISELESS CODING THEOREMS OF INACCURACY MEASURE OF ORDER α AND TYPE β SARAJEVO JOURNAL OF MATHEMATICS Vol.3 (15) (2007), 137 143 SOME NOISELESS CODING THEOREMS OF INACCURACY MEASURE OF ORDER α AND TYPE β M. A. K. BAIG AND RAYEES AHMAD DAR Absrac. In hs paper, we propose

More information

Discrete Markov Process. Introduction. Example: Balls and Urns. Stochastic Automaton. INTRODUCTION TO Machine Learning 3rd Edition

Discrete Markov Process. Introduction. Example: Balls and Urns. Stochastic Automaton. INTRODUCTION TO Machine Learning 3rd Edition EHEM ALPAYDI he MI Press, 04 Lecure Sldes for IRODUCIO O Machne Learnng 3rd Edon alpaydn@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/ml3e Sldes from exboo resource page. Slghly eded and wh addonal examples

More information

Notes on the stability of dynamic systems and the use of Eigen Values.

Notes on the stability of dynamic systems and the use of Eigen Values. Noes on he sabl of dnamc ssems and he use of Egen Values. Source: Macro II course noes, Dr. Davd Bessler s Tme Seres course noes, zarads (999) Ineremporal Macroeconomcs chaper 4 & Techncal ppend, and Hamlon

More information

12d Model. Civil and Surveying Software. Drainage Analysis Module Detention/Retention Basins. Owen Thornton BE (Mech), 12d Model Programmer

12d Model. Civil and Surveying Software. Drainage Analysis Module Detention/Retention Basins. Owen Thornton BE (Mech), 12d Model Programmer d Model Cvl and Surveyng Soware Dranage Analyss Module Deenon/Reenon Basns Owen Thornon BE (Mech), d Model Programmer owen.hornon@d.com 4 January 007 Revsed: 04 Aprl 007 9 February 008 (8Cp) Ths documen

More information

A NEW TECHNIQUE FOR SOLVING THE 1-D BURGERS EQUATION

A NEW TECHNIQUE FOR SOLVING THE 1-D BURGERS EQUATION S19 A NEW TECHNIQUE FOR SOLVING THE 1-D BURGERS EQUATION by Xaojun YANG a,b, Yugu YANG a*, Carlo CATTANI c, and Mngzheng ZHU b a Sae Key Laboraory for Geomechancs and Deep Underground Engneerng, Chna Unversy

More information

Advanced Machine Learning & Perception

Advanced Machine Learning & Perception Advanced Machne Learnng & Percepon Insrucor: Tony Jebara SVM Feaure & Kernel Selecon SVM Eensons Feaure Selecon (Flerng and Wrappng) SVM Feaure Selecon SVM Kernel Selecon SVM Eensons Classfcaon Feaure/Kernel

More information

Attributed Graph Matching Based Engineering Drawings Retrieval

Attributed Graph Matching Based Engineering Drawings Retrieval Arbued Graph Machng Based Engneerng Drawngs Rereval Rue Lu, Takayuk Baba, and Dak Masumoo Fusu Research and Developmen Cener Co LTD, Beng, PRChna Informaon Technology Meda Labs Fusu Laboraores LTD, Kawasak,

More information

[Link to MIT-Lab 6P.1 goes here.] After completing the lab, fill in the following blanks: Numerical. Simulation s Calculations

[Link to MIT-Lab 6P.1 goes here.] After completing the lab, fill in the following blanks: Numerical. Simulation s Calculations Chaper 6: Ordnary Leas Squares Esmaon Procedure he Properes Chaper 6 Oulne Cln s Assgnmen: Assess he Effec of Sudyng on Quz Scores Revew o Regresson Model o Ordnary Leas Squares () Esmaon Procedure o he

More information

Advanced time-series analysis (University of Lund, Economic History Department)

Advanced time-series analysis (University of Lund, Economic History Department) Advanced me-seres analss (Unvers of Lund, Economc Hsor Dearmen) 3 Jan-3 Februar and 6-3 March Lecure 4 Economerc echnues for saonar seres : Unvarae sochasc models wh Box- Jenns mehodolog, smle forecasng

More information

Graduate Macroeconomics 2 Problem set 5. - Solutions

Graduate Macroeconomics 2 Problem set 5. - Solutions Graduae Macroeconomcs 2 Problem se. - Soluons Queson 1 To answer hs queson we need he frms frs order condons and he equaon ha deermnes he number of frms n equlbrum. The frms frs order condons are: F K

More information

Online Appendix for. Strategic safety stocks in supply chains with evolving forecasts

Online Appendix for. Strategic safety stocks in supply chains with evolving forecasts Onlne Appendx for Sraegc safey socs n supply chans wh evolvng forecass Tor Schoenmeyr Sephen C. Graves Opsolar, Inc. 332 Hunwood Avenue Hayward, CA 94544 A. P. Sloan School of Managemen Massachuses Insue

More information

RELATIONSHIP BETWEEN VOLATILITY AND TRADING VOLUME: THE CASE OF HSI STOCK RETURNS DATA

RELATIONSHIP BETWEEN VOLATILITY AND TRADING VOLUME: THE CASE OF HSI STOCK RETURNS DATA RELATIONSHIP BETWEEN VOLATILITY AND TRADING VOLUME: THE CASE OF HSI STOCK RETURNS DATA Mchaela Chocholaá Unversy of Economcs Braslava, Slovaka Inroducon (1) one of he characersc feaures of sock reurns

More information

Appendix H: Rarefaction and extrapolation of Hill numbers for incidence data

Appendix H: Rarefaction and extrapolation of Hill numbers for incidence data Anne Chao Ncholas J Goell C seh lzabeh L ander K Ma Rober K Colwell and Aaron M llson 03 Rarefacon and erapolaon wh ll numbers: a framewor for samplng and esmaon n speces dversy sudes cology Monographs

More information

Lecture VI Regression

Lecture VI Regression Lecure VI Regresson (Lnear Mehods for Regresson) Conens: Lnear Mehods for Regresson Leas Squares, Gauss Markov heorem Recursve Leas Squares Lecure VI: MLSC - Dr. Sehu Vjayakumar Lnear Regresson Model M

More information

Density Matrix Description of NMR BCMB/CHEM 8190

Density Matrix Description of NMR BCMB/CHEM 8190 Densy Marx Descrpon of NMR BCMBCHEM 89 Operaors n Marx Noaon Alernae approach o second order specra: ask abou x magnezaon nsead of energes and ranson probables. If we say wh one bass se, properes vary

More information

Survival Analysis and Reliability. A Note on the Mean Residual Life Function of a Parallel System

Survival Analysis and Reliability. A Note on the Mean Residual Life Function of a Parallel System Communcaons n Sascs Theory and Mehods, 34: 475 484, 2005 Copyrgh Taylor & Francs, Inc. ISSN: 0361-0926 prn/1532-415x onlne DOI: 10.1081/STA-200047430 Survval Analyss and Relably A Noe on he Mean Resdual

More information

Analysis And Evaluation of Econometric Time Series Models: Dynamic Transfer Function Approach

Analysis And Evaluation of Econometric Time Series Models: Dynamic Transfer Function Approach 1 Appeared n Proceedng of he 62 h Annual Sesson of he SLAAS (2006) pp 96. Analyss And Evaluaon of Economerc Tme Seres Models: Dynamc Transfer Funcon Approach T.M.J.A.COORAY Deparmen of Mahemacs Unversy

More information

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window A Deermnsc Algorhm for Summarzng Asynchronous Sreams over a Sldng ndow Cosas Busch Rensselaer Polyechnc Insue Srkana Trhapura Iowa Sae Unversy Oulne of Talk Inroducon Algorhm Analyss Tme C Daa sream: 3

More information

Dual Approximate Dynamic Programming for Large Scale Hydro Valleys

Dual Approximate Dynamic Programming for Large Scale Hydro Valleys Dual Approxmae Dynamc Programmng for Large Scale Hydro Valleys Perre Carpener and Jean-Phlppe Chanceler 1 ENSTA ParsTech and ENPC ParsTech CMM Workshop, January 2016 1 Jon work wh J.-C. Alas, suppored

More information

Method of upper lower solutions for nonlinear system of fractional differential equations and applications

Method of upper lower solutions for nonlinear system of fractional differential equations and applications Malaya Journal of Maemak, Vol. 6, No. 3, 467-472, 218 hps://do.org/1.26637/mjm63/1 Mehod of upper lower soluons for nonlnear sysem of fraconal dfferenal equaons and applcaons D.B. Dhagude1 *, N.B. Jadhav2

More information