Probabilistic Structured Query Methods


Kareem Darwish and Douglas W. Oard (see footnote 1)
Institute for Advanced Computer Studies
University of Maryland, College Park, MD
{kareem,oard}@glue.umd.edu

ABSTRACT
Structured methods for query term replacement rely on separate estimates of term frequency and document frequency to compute the weight for each query term. This paper reviews prior work on structured query techniques and introduces three new variants that leverage estimates of replacement probabilities. Statistically significant improvements in retrieval effectiveness are demonstrated for cross-language retrieval and for retrieval based on optical character recognition when replacement probabilities are used to estimate both term frequency and document frequency.

KEYWORDS
Structured queries, Cross-language information retrieval, Document image retrieval

1 INTRODUCTION
There are many situations in which it is desirable to match a query term with different terms in a document. Well known examples include stemming (where any word that shares the same stem should be matched), thesaurus expansion (where terms with similar meanings should be matched), and cross-language retrieval (where terms with similar meanings in different languages should be matched). When the mappings among matching terms are known in advance, the usual approach is to conflate the alternatives during indexing. That is the typical way in which stemming is implemented, for example. Query-time implementations are necessary when appropriate matching decisions depend on the nature of the query, as might be the case with systems that provide the searcher with interactive control over thesaurus expansion. In this paper, presently known techniques for query-time replacement are reviewed, new techniques that leverage estimates of replacement probability are introduced, and experiment results that demonstrate improved retrieval effectiveness in two applications (Cross-Language Information Retrieval (CLIR) and retrieval of scanned documents based on Optical Character Recognition (OCR)) are presented.
CLIR has received more attention than any other query-time replacement problem in recent years, and several effective techniques are now known. Query translation research has developed along two broad directions, typically referred to as dictionary-based and corpus-based techniques. Broadly speaking, corpus-based techniques seek to optimize retrieval effectiveness through reliance on observed translation probabilities in aligned corpora, while dictionary-based techniques are optimized for the case where reliable estimates of translation probability are not available.

Footnote 1: College of Information Studies and Institute for Advanced Computer Studies.

A key idea in the so-called vector-space approach to information retrieval is reliance on two statistics: (1) term frequency (TF), the number of occurrences of a term in a document, and (2) document frequency (DF), the number of documents in which a term appears. TF is a measure of aboutness, which has beneficial effects on both precision and recall. DF is a measure of specificity, and its principal effect is on precision. In general, high TF and low DF are preferred, with the optimal combination of those factors typically being determined through experimentation (cf. [14]).

Pirkola appears to have been the first to try separately estimating TF and DF for query terms in a CLIR application [13], using the InQuery synonym operator to implement what he called structured queries. InQuery's synonym operator was originally designed to support monolingual thesaurus expansion, so it estimates TF and DF as follows [11]:

$$TF_j(Q_i) = \sum_{\{k \mid D_k \in T(Q_i)\}} TF_j(D_k) \qquad (1)$$

$$DF(Q_i) = \Big| \bigcup_{\{k \mid D_k \in T(Q_i)\}} \{ d \mid D_k \in d \} \Big| \qquad (2)$$

where Q_i is a query term, D_k is a document term, TF_j(Q_i) is the term frequency of Q_i in document j, DF(Q_i) is the number of documents that contain Q_i, d is a document, and T(Q_i) is the set of known replacements (in this case, translations) for the term Q_i. Essentially, these equations treat any occurrence of a replacement as an occurrence of the query term.
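As a minimal sketch of equations (1) and (2), the following Python treats each document as a list of terms and computes the structured-query TF and DF for one query term. The toy collection, the function names, and the `replacements` set standing in for T(Q_i) are illustrative assumptions, not part of the system used in the paper.

```python
from collections import Counter

# Toy collection (assumed for illustration): each document is a list of terms.
docs = [
    ["bread", "bake", "oven"],
    ["bread", "bread", "price"],
    ["market", "price"],
]

def pirkola_tf(doc_terms, replacements):
    """Equation (1): sum the TF of every known replacement in one document."""
    counts = Counter(doc_terms)
    return sum(counts[term] for term in replacements)

def pirkola_df(collection, replacements):
    """Equation (2): number of documents containing at least one replacement,
    i.e. the size of the union of the replacements' document sets."""
    return sum(1 for doc in collection if any(t in doc for t in replacements))

replacements = {"bread", "bake"}  # stands in for T(Q_i)
tf_per_doc = [pirkola_tf(doc, replacements) for doc in docs]
df = pirkola_df(docs, replacements)
print(tf_per_doc, df)
```

Counting a document once in `pirkola_df`, no matter how many replacements it contains, is what distinguishes the union in equation (2) from a plain sum of document frequencies.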
This represents a very cautious strategy in which a high DF for any replacement will result in a high joint DF (and thus a low weight) for that query term. Retrieval results are then dominated by query terms that have no unsafe (very common) replacements. For example, one Arabic query term can mean either "on" or the proper name "Ali". If "Ali" appears in few documents but "on" appears in many, equation (2) will treat the query term as if it were at least as common as "on". When there is not a large disparity in DF, equation (1) implements a kind of query expansion effect. For example, for an Arabic word that can be translated as "bread" or "bake", equation (1) would (with proper stemming) reward an occurrence of "baking bread".

Corpus-based approaches to CLIR have generally developed within a framework based on language modeling rather than vector space models, at least in part because modern statistical translation frameworks offer a natural way of integrating translation and language models [18]. In general, language modeling approaches to retrieval rely on collection frequency (CF) in place of DF (see footnote 2):

$$CF(Q_i) = \sum_{k \in C} TF_k(Q_i) \qquad (3)$$

where C represents the collection, and the other terms are as defined above. Whether DF is better than CF depends on how we model the searcher's task; when the goal is to find entire documents, DF models the concept of selectivity with higher fidelity.

Footnote 2: Hiemstra's work is a notable exception [6].

The next section introduces a set of replacement strategies that leverage observed replacement probabilities (from corpora) while retaining the vector space model's concept of DF. The effectiveness and efficiency (relative to present baselines) of this strategy are then shown in subsequent sections for two applications: CLIR, and retrieval from scanned documents using OCR. The paper then concludes with some notes on the limitations of the techniques presented here and opportunities for future work on this problem.

2 BEYOND PIRKOLA'S METHOD
Kwok was the first to introduce a variant of Pirkola's method, aiming to reduce implementation complexity by replacing the union operator with a sum [8]:

$$DF(Q_i) = \sum_{\{k \mid D_k \in T(Q_i)\}} DF(D_k) \qquad (4)$$

An alternative approach, not previously explored, would be to use the maximum document frequency of any replacement (MDF):

$$DF(Q_i) = \max_{\{k \mid D_k \in T(Q_i)\}} \big[ DF(D_k) \big] \qquad (5)$$

All three variants (Pirkola, Kwok, and MDF) lower bound the DF for a query term by the DF of its most common replacement, and the experiments reported in Sections 3 and 4 below show no statistically significant difference between the three techniques.

All three techniques treat every known replacement as equally likely. This risks a somewhat counterintuitive result: introduction of a translation dictionary with improved coverage of rare translations could actually harm retrieval effectiveness. To see this problem, consider a query term for which 99.9% of its instances should be translated as some rare term (e.g., "superfluous"), but in 0.1% of the cases a translation that happens to be a common term (e.g., "the") would actually be appropriate. In such cases, the common term leads to a high joint DF, effectively diminishing the value of the original query term. This exact situation actually arises often with dictionaries built from aligned corpora using statistical methods, since there is always some chance that any term might be observed to be used as a replacement for any other term.
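The effect of a rare but common-term replacement on equations (4) and (5) can be illustrated with a short sketch. The document frequencies below are invented for the "superfluous"/"the" example, and the function names are illustrative assumptions.

```python
# Hypothetical document frequencies for the replacements of one query term.
df_of = {"superfluous": 12, "the": 9500}

def kwok_df(replacements, df_of):
    """Equation (4): sum of the replacements' document frequencies."""
    return sum(df_of[t] for t in replacements)

def mdf_df(replacements, df_of):
    """Equation (5): document frequency of the most common replacement."""
    return max(df_of[t] for t in replacements)

rare_only = ["superfluous"]
with_common = ["superfluous", "the"]

# Adding the improbable but common replacement inflates the joint DF,
# which lowers the weight of the original query term.
print(kwok_df(rare_only, df_of), mdf_df(rare_only, df_of))
print(kwok_df(with_common, df_of), mdf_df(with_common, df_of))
```

With these assumed counts, both variants jump from a joint DF of 12 to roughly 9,500 once "the" is listed as a replacement, regardless of how unlikely that replacement is.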
One way to resolve the problem is to use a weighted variant of Kwok's method (WDF):

$$DF(Q_i) = \sum_{\{k \mid D_k \in T(Q_i)\}} \big[ DF(D_k) \times wt(D_k) \big] \qquad (6)$$

In general, any monotone function of the replacement probability could be used for wt(D_k). For the experiments reported below, the weight is simply set to the best available estimate of the replacement probability.

Improbable translations that are common terms can also cause problems in equation (1), since common terms are likely to have higher TFs as well. One way to limit this effect is to use a weighted sum in the TF computation (WTF):

$$TF_j(Q_i) = \sum_{\{k \mid D_k \in T(Q_i)\}} \big[ TF_j(D_k) \times wt(D_k) \big] \qquad (7)$$

Again, for the experiments reported below the replacement probability estimate is used as the weight. Finally, either TF formula could be combined with any way of computing DF. In the experiments reported below, the following combinations were tried:

Method     TF Formula   DF Formula
Pirkola    (1)          (2)
Kwok       (1)          (4)
MDF        (1)          (5)
WDF        (1)          (6)
WTF        (7)          (4)
WTF/DF     (7)          (6)

Another way of leveraging information about replacement probabilities is to simply ignore the least likely replacements. Such an approach offers two potential insights. First, it can reveal the extent of the adverse effect of low-probability replacements on each technique. Second, it offers a principled way of tuning the degree of comprehensiveness of the dictionary to optimize the retrieval effectiveness of each technique. Two teams (from the University of Massachusetts [9] and the University of Maryland [2]) tried variants of this approach for the TREC 2002 CLIR track. For the experiments reported below, a greedy technique was used in which replacements were retained in order of decreasing probability until a preset threshold on the cumulative probability was first exceeded. This approach guarantees that at least one replacement is retained. Mean uninterpolated average precision is reported for every threshold value between 0.1 and 1.0, in increments of 0.1.

The experiments were run using a modified version of (**removed for blind reviewing**), which is a vector space retrieval system that was developed locally using Okapi BM-25 weights.
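The greedy cumulative-probability cutoff and the weighted estimates of equations (6) and (7) can be sketched together as follows. The probabilities, counts, and function names are assumptions made for illustration; in particular, this sketch does not renormalize the weights after thresholding, which the text does not specify.

```python
def select_by_cumulative_probability(probs, threshold):
    """Keep replacements in order of decreasing probability until the
    cumulative probability first exceeds the threshold."""
    kept = {}
    cumulative = 0.0
    for term, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[term] = p          # the most probable replacement is always kept
        cumulative += p
        if cumulative > threshold:
            break
    return kept

def weighted_df(kept, df_of):
    """Equation (6): DF of each replacement scaled by its weight."""
    return sum(df_of[t] * w for t, w in kept.items())

def weighted_tf(kept, doc_counts):
    """Equation (7): TF of each replacement in one document, scaled by its weight."""
    return sum(doc_counts.get(t, 0) * w for t, w in kept.items())

# Hypothetical inputs for the "superfluous"/"the" example.
probs = {"superfluous": 0.999, "the": 0.001}
df_of = {"superfluous": 12, "the": 9500}
doc_counts = {"superfluous": 2, "the": 8}

top_only = select_by_cumulative_probability(probs, 0.5)
everything = select_by_cumulative_probability(probs, 1.0)
print(top_only, weighted_df(everything, df_of), weighted_tf(everything, doc_counts))
```

Under these assumed numbers the weighted DF stays near 21 even with "the" retained, whereas the unweighted sum of equation (4) would be 9,512.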
Reported statistical significance tests were performed using a paired two-tailed t-test, and results are reported as significant when p fell below the chosen significance level.

3 CLIR
The CLIR experiments reported in this section were performed using the TREC 2002 CLIR track collection, which contains 383,872 articles from the Agence France Press (AFP) Arabic newswire, 50 topic descriptions written in English, and associated relevance judgments [12]. Queries were formed automatically using all the words in the title field of the topic description, which is designed to be representative of the style of queries typically issued in Web search applications. The documents were stemmed using Al-Stem (a standard resource for the TREC CLIR track), diacritics were removed, and normalization was performed to convert the letters ya and alef maqsoura to ya, and all the variants of alef and hamza (namely alef, alef hamza, alef maad, hamza, waw hamza, and ya hamza) to alef. The English queries were stemmed before translation using the Porter stemmer for compatibility with the translation resources described below.

3.1 Estimating Replacement Probabilities
Five translation resources of three types were combined for the application. Combining resources is useful because (a) the coverage of the combined resources is typically better than that of any individual resource, and (b) combining resources can serve to reinforce good translations. The resources were as follows:

1. Two bilingual term lists that were constructed using two Web-based machine translation systems (Tarjim and Al-Misbar [16][17]). In each case, sets of isolated unique English words found in a 200 MB collection of Los Angeles Times news stories [10] were submitted for translation from English into Arabic. Each system returned at most one translation for each submitted word. Together, the two term lists covered about 15% of the unique Arabic stems in the TREC collection (measured by using Al-Stem on both the term list and the collection).
2. The Salmone Arabic-to-English dictionary (from Tufts University), from which we extracted only the translations. No translation preference information is indicated in this dictionary. The coverage of the resulting term list, measured in the same way, was about 7% of the unique Arabic stems in the TREC collection.
3. Two translation probability tables, one for English-to-Arabic and one for Arabic-to-English. These tables were constructed from tables provided by BBN, which were in turn constructed from a large collection of aligned English and Arabic United Nations documents using the Giza++ implementation of IBM's model 1 statistical machine translation design. The coverage of the Arabic-to-English table, measured in the same way, was 29% of the unique Arabic stems in the TREC collection.

These translation resources were combined in the following manner:

1. All resources that were originally provided as Arabic-to-English were inverted. For the translation probability table, the probabilities for each translation pair were retained and then the inverted tables were renormalized so that the values of the probabilities for each source-language term summed to one. This process likely introduced some error, since probabilities for rare events may not have been accurately estimated.
2. A uniform distribution was used to assign probabilities to the translations obtained from machine translation systems and the Salmone dictionary. Tarjim and Al-Misbar each returned at most one translation for an English word, but two English words might share a common translation. When n alternatives were known from a single source, each was assigned a probability of 1/n.
3. The resulting translation probabilities were then combined by summing the probabilities for a given Arabic translation across the sources in which it appeared and then dividing by the number of sources in which the English term had appeared. For example, if Tarjim, Al-Misbar and Salmone contained the English term, with Tarjim containing some specific translation with probability 1.0, Al-Misbar lacking that translation (i.e., assigning it a probability of 0.0), and Salmone assigning it a probability of 0.5 (because two translations were known), then the resulting combined probability would be 1/3 + 0/3 + 0.5/3 = 0.5.

The resulting translation resource contained what appeared to be reasonable estimates of translation probabilities, and covered 36% of the unique Arabic stems in the TREC collection.

3.2 Results
Figure 1 shows the mean uninterpolated average precision for each of the six structured query methods for each threshold value and Table 1 shows the same results in tabular form. As a baseline, one-best query translation (using only the most likely translation) was also run. This widely reported baseline seems appropriate in this case because any cumulative probability threshold will result in use of at least the most probable translation for each query term.

Kwok's and Pirkola's methods turned out to be essentially indistinguishable, with the MDF method performing nearly as well (statistically significantly worse only at threshold values of 0.2 and 0.3). The WTF/DF method produced results that were statistically significantly better than the one-best baseline for every threshold value except 0.1 and 1.0. Moreover, WTF/DF was the only one of the probabilistic techniques that did not exhibit a dramatic decrease in effectiveness as the threshold increased. The best WTF/DF result (at a threshold of 0.6) is statistically indistinguishable from the best result of Pirkola, Kwok, or MDF (in each case, at a threshold of 0.4), but the reduced dependence on accurate tuning of the threshold makes WTF/DF clearly the preferred method.

Table 1: CLIR: Mean average precision, title queries.
Black (gray) cells represent statistically better (worse) results, compared to the one-best translation baseline.

[Table 1 body not recoverable from the source. Columns: cumulative probability threshold. Rows: one-best baseline and the structured query methods.]

[Figure 1 plot not recoverable from the source. Axes: cumulative probability threshold versus mean average precision (title queries), one series per method plus the baseline.]

Figure 1: CLIR: Dependence of retrieval effectiveness on cumulative probability threshold, title queries.

4 OCR-BASED RETRIEVAL
Previous approaches to retrieval of OCR-degraded text have focused primarily on correcting OCR errors [7][15] or on fuzzy matching techniques that are less sensitive than exact string matching to OCR errors [1][5]. This section demonstrates the generality of the query-time replacement techniques developed above, using them to combine TF and DF evidence for a novel technique which attempts to replace each query term with possible OCR distortions of the term and to estimate the probability of the replacements.

The experiments were conducted with the Zad collection, which was obtained from the University of Maryland [3]. The collection comprises 2,730 documents extracted from Zad Al-Me'ad, a printed book for which an accurately character-coded electronic version (the "clean text") is also available [3]. Three sets of OCR outputs for the same documents were available: print resolution (300x300 dots per inch (dpi)) as originally scanned, and downsampled versions at fine fax resolution (200x200 dpi) and standard fax resolution (200x100 dpi). The test collection includes 25 written topic descriptions and associated relevance judgments. Character normalizations were performed as described above, and character 3-grams (3g) or character 4-grams (4g) were indexed. Darwish and Oard found those index terms to be among the most effective for OCR-based retrieval of Arabic [3].

4.1 Estimating Replacement Probabilities
Term replacement probabilities were estimated using a position-sensitive unigram character distortion model trained on 5,000 words of aligned clean and distorted texts from the collections being searched. The alignment was designed to simulate manual error correction of a small portion of the collection (see footnote 3). Since the appearance of Arabic characters varies by position, the standard four character positions (beginning, middle, end, isolated) were modeled. Formally, given a clean word with characters C_1..C_i..C_n and the resulting word after OCR degradation D_1..D_j..D_m, where D_j resulted from C_i, ε is the null character, L is the position of the letter in the word (beginning, middle, end, or isolated), and # is the word boundary, the three edit operations for the models would be:

$$P_{substitution}(C_i \to D_j) = \frac{count(C_i \to D_j \mid L_{C_i})}{count(C_i \mid L_{C_i})}$$

$$P_{deletion}(C_i \to \varepsilon) = \frac{count(C_i \to \varepsilon \mid L_{C_i})}{count(C_i \mid L_{C_i})}$$

$$P_{insertion}(\varepsilon \to D_j) = \frac{count(\varepsilon \to D_j)}{count(C)}$$

If the count in the numerator was zero, the computation would be repeated without conditioning on position. If the count remained zero, a value of zero was recorded. A separate model was trained for each resolution.
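A count-based estimate with the position back-off described above might look like the following sketch. The aligned character pairs are invented, the data layout (tuples of clean character, distorted character, position) and all names are assumptions, and insertions are left out for brevity.

```python
from collections import Counter

# Hypothetical character-level alignments: (clean_char, distorted_char, position).
# None as distorted_char marks a deletion. Positions follow the four classes
# named in the text: "beginning", "middle", "end", "isolated".
aligned = [
    ("b", "b", "beginning"),
    ("b", "t", "beginning"),
    ("b", "b", "middle"),
    ("a", None, "end"),
    ("a", "a", "end"),
]

pair_pos = Counter()   # count(C -> D | L)
char_pos = Counter()   # count(C | L)
pair_any = Counter()   # count(C -> D), position ignored
char_any = Counter()   # count(C)
for clean, distorted, pos in aligned:
    pair_pos[(clean, distorted, pos)] += 1
    char_pos[(clean, pos)] += 1
    pair_any[(clean, distorted)] += 1
    char_any[clean] += 1

def edit_probability(clean, distorted, pos):
    """Probability of a substitution (or a deletion when distorted is None).

    Uses the position-conditioned ratio when its numerator is non-zero,
    otherwise repeats the computation without conditioning on position,
    and records 0.0 if that count is zero as well."""
    if pair_pos[(clean, distorted, pos)] > 0:
        return pair_pos[(clean, distorted, pos)] / char_pos[(clean, pos)]
    if pair_any[(clean, distorted)] > 0:
        return pair_any[(clean, distorted)] / char_any[clean]
    return 0.0

print(edit_probability("b", "t", "beginning"))  # position-conditioned estimate
print(edit_probability("b", "t", "middle"))     # backs off to all positions
```

In this toy data the substitution of "b" by "t" was never seen in the middle position, so the second call falls back to the position-independent ratio.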
Two factors made automatic alignment of the OCR output to the clean text challenging. First, the printed and clean text versions in the Zad collection were obtained from different sources that exhibited minor differences (mostly substitution or deletion of particles such as in, from, or, and then). Second, some areas in the scanned images of the printed page exhibited image distortions that resulted in relatively long runs of OCR errors. The alignment was performed using SCLITE from the National Institute of Standards and Technology (NIST). SCLITE employs a dynamic programming string alignment algorithm, which attempts to minimize the Levenshtein distance (edit distance) between two strings. Conceptually, the algorithm uses identical matches to anchor alignment, and then uses word position with respect to those anchors to estimate an optimal alignment on the remainder of the words. SCLITE was originally developed for speech recognition applications, but in OCR applications additional character-level evidence is available. SCLITE alignments were therefore accepted only if the number of character edit operations was less than or equal to 50% of the length of the shorter of the two matched words. To align the words that were not aligned by SCLITE the following algorithm was used:

1. Using the existing alignments as anchors, given an unaligned word at position l from the preceding anchor in a clean document, sequentially compare it to the words in the corresponding degraded document between the corresponding pair of anchors, at position l' from the preceding anchor, where |l' - l| is below a preset limit.
2. When comparing two words, if the difference between their respective word lengths was less than or equal to 2 characters and the number of edit operations between the two words (using Levenshtein's edit distance) was less than a certain percentage q of the word length of the shorter one (the percentage q was the number of edit operations divided by the length of the shorter word), then the newly aligned words were used as anchors. Initially, q was set to 60%.
3. Steps 1 and 2 were iterated two more times using the new anchors with q equal to 40% and 20% to attempt to find more alignments.

This alignment technique works well for print resolution, but it is a significant source of errors for highly degraded cases (e.g., standard fax resolution). Given a pair of aligned words, they were aligned at the character level by finding the edit distance between them using the Levenshtein edit distance algorithm and then back tracing the algorithm to identify insertions, deletions, and substitutions.

The resulting model was then used to assign a probability to possible distortions of each query term as follows:

1. For each character in a clean query term, generate all substitutions or deletions that have non-zero probability (i.e., were observed at least once in the training data). The unchanged character is generated at this step as a substitution.
2. For each possible insertion point, generate all possible single insertions. Possible insertion points are before the first character, between any pair of characters, and after the last character. A null insertion is generated at each point to cover the remainder of the probability mass.
3. For each string that could result from the power set of all possible substitutions or deletions and all possible insertions, compute the probability of generating that string as the product of the associated insertion, substitution, and deletion probabilities.

A more efficient implementation would be desirable in an operational setting, but this approach suffices for the experiments reported below.

Footnote 3: Smaller and larger training sets were tried, but no improvement resulted from more than 5,000 words.

4.2 Results
Figure 2 shows the mean uninterpolated average precision at print resolution for each of the six structured query methods for each threshold value and Table 2 shows the same data in tabular form. Figure 3 and Table 3 present the corresponding results for fine fax resolution. As a baseline, the same index terms (3g or 4g) were run with the clean (undistorted) queries, since any cumulative probability threshold results in a superset of that baseline case.

No statistically significant differences were observed at any resolution or threshold value between the Pirkola, Kwok, and MDF methods, which tends to confirm the observation made in the CLIR application that the simpler implementation of Kwok's method results in no significant adverse effect on retrieval effectiveness. For print resolution, every structured query technique achieved a statistically significant improvement over the baseline when used with the better of the two indexing terms (4g). Among these, WTF/DF both achieved the greatest improvement (9.7% relative) and exhibited the greatest range of threshold values over which the improvement was statistically significant (0.6 to 1.0). Therefore, as with CLIR, WTF/DF is clearly the preferred technique in this application.

No statistically significant improvements over the baseline were observed for the fine fax resolution or the standard fax resolution (not shown). This may, however, reflect errors in the alignment of the training data rather than limitations in the replacement techniques that were tried. The same general trends are observable in Figure 3 as in Figure 2, so the use of WTF/DF is certainly not counterindicated for the fine fax condition.

[Figure 2 plots not recoverable from the source. Panels: Print - 3grams and Print - 4grams, each showing threshold versus mean average precision for the structured query methods and the baseline.]

Figure 2: Print: Dependence of retrieval effectiveness on cumulative probability threshold, title queries.

[Figure 3 plots not recoverable from the source. Panels: Fine Fax - 3grams and Fine Fax - 4grams, each showing threshold versus mean average precision for the structured query methods and the baseline.]

Figure 3: Fine fax: Dependence of retrieval effectiveness on cumulative probability threshold, title queries.

Table 2: Print: Mean average precision, title queries. Black (gray) cells represent statistically better (worse) results, compared to the clean query baseline.

[Table 2 body not recoverable from the source. Columns: cumulative probability threshold. Rows: baseline and structured query methods, once for 3g and once for 4g.]
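The three-step distortion generation procedure described in Section 4.1 can be sketched as a brute-force enumeration. The model tables below are invented for illustration, position conditioning is omitted, and summing the probability of different edit paths that yield the same string is a choice made for this sketch rather than something stated in the text.

```python
from itertools import product

# Hypothetical model tables (not taken from the paper).
# sub_del[c] lists (output, probability); "" as output is a deletion, and the
# unchanged character appears as a substitution of c by itself.
sub_del = {
    "a": [("a", 0.9), ("o", 0.1)],
    "b": [("b", 0.8), ("", 0.2)],
}
# Single insertions and their probabilities; the null insertion takes the
# remaining probability mass at each insertion point.
insertions = [("x", 0.05)]
null_insertion = ("", 1.0 - sum(p for _, p in insertions))

def distortions(term):
    """Enumerate every distortion of `term` together with its probability."""
    slots = []
    for ch in term:
        slots.append(insertions + [null_insertion])  # insertion point before ch
        slots.append(sub_del[ch])                    # keep, substitute, or delete ch
    slots.append(insertions + [null_insertion])      # insertion point after the last char
    result = {}
    for choice in product(*slots):
        text = "".join(out for out, _ in choice)
        prob = 1.0
        for _, p in choice:
            prob *= p
        # Different edit paths can produce the same string; add them up.
        result[text] = result.get(text, 0.0) + prob
    return result

candidates = distortions("ab")
print(sorted(candidates.items(), key=lambda item: item[1], reverse=True)[:3])
```

The resulting candidate list could then be passed through the same cumulative-probability cutoff used for translations. The enumeration grows exponentially with term length, which matches the remark that a more efficient implementation would be desirable in an operational setting.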

Table 3: Fine fax: Mean average precision, title queries. Black (gray) cells represent statistically better (worse) results, compared to the clean query baseline.

[Table 3 body not recoverable from the source. Columns: cumulative probability threshold. Rows: baseline and structured query methods, once for 3g and once for 4g.]

5 CONCLUSION AND FUTURE WORK
This paper has introduced a family of methods for query term replacement that exploit estimates of replacement probabilities while also incorporating the vector space model's concept of document frequency. Both Kwok's method and MDF were found to achieve retrieval effectiveness values similar to that obtained with Pirkola's structured query method, so Kwok's method seems to be a good basis from which to build probabilistic structured query methods. Coverage of rare translations was shown to be problematic for all three methods, however. Use of only the most likely translations was found to be effective and expedient, but only if an appropriate threshold on cumulative probability is used. Of the three probabilistic structured query methods introduced in this paper, WTF/DF was the clear winner, showing both the best retrieval effectiveness and the least sensitivity to the cumulative probability threshold. Finally, the novel approach of producing possible replacements for query terms that could have been generated by OCR proved to be a useful technique for improving retrieval of OCR-degraded text.

There are a number of interesting directions for future work suggested by these results:

1. Improved weighting techniques. The use of raw probability estimates as weights in the WTF/DF method seems intuitively appealing, but it is possible that using some function of the probabilities (e.g., log p) may actually outperform raw probabilities. There are also opportunities to explore better smoothing methods when estimating the probabilities.
2. Other applications. The WTF/DF method can be used in any application where replacement probabilities can be reliably estimated. Examples of potential application areas are thesaurus expansion, speech-based retrieval, statistical approximations of morphology, and perhaps gene sequence matching.
3. Structured document indexing. Query processing and document processing exhibit a strong duality, so it may be possible to leverage some of the techniques developed here at indexing time rather than query time for applications such as stemming, translation-based indexing [11], speech retrieval and OCR-based retrieval.

Variants of query term replacement are important in several information retrieval applications, and access to reliable estimates of replacement probabilities from corpus statistics is becoming increasingly common. The techniques described in this paper balance effectiveness and efficiency in ways that are likely to prove immediately useful, and they should additionally serve as a solid basis for future research on this important problem.

ACKNOWLEDGMENTS
***Removed for blind reviewing***

REFERENCES
[1] Baeza-Yates, R. and G. Navarro, A Faster Algorithm for Approximate String Matching. Proceedings of Combinatorial Pattern Matching (CPM'96), Springer-Verlag LNCS, v. 1075, pages 1-13.
[2] Darwish, K. and D. Oard, CLIR Experiments at Maryland for TREC 2002: Evidence Combination for Arabic-English Retrieval, TREC.
[3] Darwish, K. and D. Oard, Term Selection for Searching Printed Arabic, SIGIR 2002.
[4] Gey, F. and D. Oard, The TREC-2001 Cross-Language Information Retrieval Track: Searching Arabic Using English, French or Arabic Queries, TREC 2001.
[5] Harding, S., W. Croft, and C. Weir, Probabilistic Retrieval of OCR Degraded Text Using N-Grams. European Conference on Digital Libraries, 1997.
[6] Hiemstra, D., Using Language Models for Information Retrieval. Ph.D. thesis, University of Twente, Enschede.
[7] Hong, T., Degraded Text Recognition Using Visual and Linguistic Context. Ph.D. thesis, Computer Science Department, SUNY Buffalo.
[8] Kwok, K. L., Personal communication.
[9] Larkey, L., J. Allen, M. E. Connell, A. Bolivar, and C. Wade, UMass at TREC 2002: Cross Language and Novelty Tracks, TREC.
[10] NIST, Text Research Collection Volume 5, April.
[11] Oard, D. W. and F. Ertunc, Translation-Based Indexing for Cross-Language Retrieval, ECIR 2002.
[12] Oard, D. W. and F. Gey, The TREC-2002 Arabic/English CLIR Track, TREC.
[13] Pirkola, A., The Effects of Query Structure and Dictionary Setups in Dictionary-Based Cross-language Information Retrieval, Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 55-63.
[14] Robertson, S. E., S. Walker, M. Hancock-Beaulieu, A. Gull, and M. Lau, Okapi at TREC-3, In the Fourth Text REtrieval Conference (TREC-3), 1996.
[15] Taghva, K., J. Borsack, and A. Condit, An Expert System for Automatically Correcting OCR Output. Proceedings of the SPIE - Document Recognition.
[16] tarjim.ajeeb.com, Sakhr Technologies, Cairo, Egypt.
[17] ATA Software Technology Limited, North Brentford Middlesex, UK.
[18] Xu, J., Weischedel, R., and Nguyen, C., Evaluating a Probabilistic Model for Cross-lingual Information Retrieval. In Proceedings of SIGIR, 2001.


More information

Temperature. Chapter Heat Engine

Temperature. Chapter Heat Engine Chapter 3 Temperature In prevous chapters of these notes we ntroduced the Prncple of Maxmum ntropy as a technque for estmatng probablty dstrbutons consstent wth constrants. In Chapter 9 we dscussed the

More information
