Probabilistic Structured Query Methods
Kareem Darwish and Douglas W. Oard¹
Institute for Advanced Computer Studies
University of Maryland, College Park, MD
{kareem,oard}@glue.umd.edu

ABSTRACT
Structured methods for query term replacement rely on separate estimates of term frequency and document frequency to compute the weight for each query term. This paper reviews prior work on structured query techniques and introduces three new variants that leverage estimates of replacement probabilities. Statistically significant improvements in retrieval effectiveness are demonstrated for cross-language retrieval and for retrieval based on optical character recognition when replacement probabilities are used to estimate both term frequency and document frequency.

KEYWORDS
Structured queries, Cross-language information retrieval, Document image retrieval

1 INTRODUCTION
There are many situations in which it is desirable to match a query term with different terms in a document. Well-known examples include stemming (where any word that shares the same stem should be matched), thesaurus expansion (where terms with similar meanings should be matched), and cross-language retrieval (where terms with similar meanings in different languages should be matched). When the mappings among matching terms are known in advance, the usual approach is to conflate the alternatives during indexing; that is the typical way in which stemming is implemented, for example. Query-time implementations are necessary when appropriate matching decisions depend on the nature of the query, as might be the case with systems that provide the searcher with interactive control over thesaurus expansion.

In this paper, presently known techniques for query-time replacement are reviewed, new techniques that leverage estimates of replacement probability are introduced, and experimental results are presented that demonstrate improved retrieval effectiveness in two applications: Cross-Language Information Retrieval (CLIR) and retrieval of scanned documents based on Optical Character Recognition (OCR).
CLIR has received more attention than any other query-time replacement problem in recent years, and several effective techniques are now known. Query translation research has developed along two broad directions, typically referred to as dictionary-based and corpus-based techniques. Broadly speaking, corpus-based techniques seek to optimize retrieval effectiveness through reliance on observed translation probabilities in aligned corpora, while dictionary-based techniques are optimized for the case where reliable estimates of translation probability are not available.

¹ College of Information Studies and Institute for Advanced Computer Studies.

A key idea in the so-called vector space approach to information retrieval is reliance on two statistics: (1) term frequency (TF), the number of occurrences of a term in a document, and (2) document frequency (DF), the number of documents in which a term appears. TF is a measure of aboutness, which has beneficial effects on both precision and recall. DF is a measure of specificity, and its principal effect is on precision. In general, high TF and low DF are preferred, with the optimal combination of those factors typically being determined through experimentation (cf. [14]).

Pirkola appears to have been the first to try separately estimating TF and DF for query terms in a CLIR application [13], using the InQuery synonym operator to implement what he called structured queries. InQuery's synonym operator was originally designed to support monolingual thesaurus expansion, so it estimates TF and DF as follows [11]:

$$TF_j(Q) = \sum_{D_k \in T(Q)} TF_j(D_k) \tag{1}$$

$$DF(Q) = \left| \bigcup_{D_k \in T(Q)} \{\, d : D_k \in d \,\} \right| \tag{2}$$

where $Q$ is a query term, $D_k$ is a document term, $TF_j(Q)$ is the term frequency of $Q$ in document $j$, $DF(Q)$ is the number of documents that contain $Q$, $d$ is a document, and $T(Q)$ is the set of known replacements (in this case, translations) for the query term $Q$. Essentially, these equations treat any occurrence of a replacement as an occurrence of the query term.
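Equations (1) and (2) can be illustrated with a minimal Python sketch. It assumes a toy in-memory collection; the document ids, terms, and the function names `pirkola_tf` and `pirkola_df` are hypothetical and are not part of InQuery. The point it shows is that the joint DF is the size of the union of the replacements' document sets, not the sum of their individual DFs.

```python
from typing import Dict, List, Set

# Hypothetical toy collection: document id -> list of (already stemmed) terms.
Collection = Dict[str, List[str]]


def pirkola_tf(doc_terms: List[str], replacements: Set[str]) -> int:
    """Equation (1): every occurrence of any replacement counts as an occurrence of Q."""
    return sum(1 for term in doc_terms if term in replacements)


def pirkola_df(collection: Collection, replacements: Set[str]) -> int:
    """Equation (2): number of documents containing at least one replacement."""
    return sum(1 for terms in collection.values() if replacements.intersection(terms))


collection: Collection = {
    "d1": ["bake", "bread", "oven"],
    "d2": ["bread", "butter"],
    "d3": ["bake", "cake"],
    "d4": ["news", "today"],
}
translations = {"bread", "bake"}  # T(Q) for a hypothetical query term Q

print(pirkola_tf(collection["d1"], translations))  # 2
print(pirkola_df(collection, translations))        # 3: union of {d1, d2} and {d1, d3}
```

Each replacement alone has a DF of 2, yet the joint DF is 3 rather than 4, because document d1 contains both.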
This represents a very cautious strategy, in which a high DF for any replacement results in a high joint DF (and thus a low weight) for the query term. Retrieval results are then dominated by query terms that have no unsafe (i.e., very common) replacements. For example, a particular Arabic query term can mean either "on" or the proper name "Ali". If "Ali" appears in few documents but "on" appears in many, equation (2) will treat the query term as if it were at least as common as "on". When there is not a large disparity in DF, equation (1) implements a kind of query expansion effect. For example, a particular Arabic word can be translated as either "bread" or "bake", and equation (1) would (with proper stemming) reward an occurrence of "baking bread".

Corpus-based approaches to CLIR have generally developed within a framework based on language modeling rather than vector space models, at least in part because modern statistical translation frameworks offer a natural way of integrating translation and language models [18]. In general, language modeling approaches to retrieval rely on collection frequency (CF) in place of DF:²

$$CF(Q) = \sum_{j \in C} \sum_{D_k \in T(Q)} TF_j(D_k) \tag{3}$$

where $C$ represents the collection and the other terms are as defined above. Whether DF is better than CF depends on how the searcher's task is modeled; when the goal is to find entire documents, DF models the concept of selectivity with higher fidelity.

² Hiemstra's work is a notable exception [6].
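The contrast between equation (3) and equation (2) can be seen in a small Python sketch (the toy collection and the function names are hypothetical). A term repeated many times in a single document has a high CF but a low DF, while a term spread thinly across many documents has the opposite profile.

```python
from typing import Dict, List, Set

Collection = Dict[str, List[str]]


def collection_frequency(collection: Collection, replacements: Set[str]) -> int:
    """Equation (3): total occurrences of any replacement across the whole collection C."""
    return sum(
        1 for terms in collection.values() for term in terms if term in replacements
    )


def document_frequency(collection: Collection, replacements: Set[str]) -> int:
    """Equation (2): number of documents containing at least one replacement."""
    return sum(1 for terms in collection.values() if replacements.intersection(terms))


# Hypothetical data: "bursty" is repeated in one document, "spread" occurs once in five.
collection: Collection = {
    "d1": ["bursty"] * 10 + ["spread"],
    "d2": ["spread"],
    "d3": ["spread"],
    "d4": ["spread"],
    "d5": ["spread"],
}

for term in ("bursty", "spread"):
    print(term, collection_frequency(collection, {term}), document_frequency(collection, {term}))
# bursty 10 1
# spread 5 5
```

Under CF, "bursty" looks like the more common (less selective) term; under DF it is the rarer one. This is the sense in which DF better reflects selectivity when whole documents are the unit being retrieved.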
The next section introduces a set of replacement strategies that leverage observed replacement probabilities (from corpora) while retaining the vector space model's concept of DF. The effectiveness and efficiency (relative to present baselines) of these strategies are then shown in subsequent sections for two applications: CLIR, and retrieval from scanned documents using OCR. The paper concludes with some notes on the limitations of the techniques presented here and opportunities for future work on this problem.

2 BEYOND PIRKOLA'S METHOD
The first variant of Pirkola's method [8] aimed to reduce implementation complexity by replacing the union operator with a sum:

$$DF(Q) = \sum_{D_k \in T(Q)} DF(D_k) \tag{4}$$

An alternative approach, not previously explored, would be to use the maximum document frequency of any replacement:

$$DF(Q) = \max_{D_k \in T(Q)} DF(D_k) \tag{5}$$

All three variants (Pirkola's union in equation (2), the sum in equation (4), and the maximum in equation (5)) lower bound the DF of a query term by the DF of its most common replacement, and the experiments reported in Sections 3 and 4 below show no statistically significant difference between the three techniques.

All three techniques treat every known replacement as equally likely. This risks a somewhat counterintuitive result: introducing a translation dictionary with improved coverage of rare translations could actually harm retrieval effectiveness. To see this problem, consider a query term for which 99.9% of its instances should be translated as some rare term (e.g., "superfluous"), but for which, in 0.1% of the cases, a translation that happens to be a common term (e.g., "the") would actually be appropriate. In such cases, the common term leads to a high joint DF, effectively diminishing the value of the original query term. This exact situation arises often with dictionaries built from aligned corpora using statistical methods, since there is always some chance that any term might be observed to be used as a replacement for any other term.
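Equations (2), (4), and (5) differ only in how the replacements' document sets are combined. The Python sketch below (hypothetical postings and function names) reproduces the "superfluous"/"the" example in a collection of 1,000 documents: whichever variant is used, the joint DF cannot fall below the DF of "the", so the rare but usually correct translation is swamped.

```python
from typing import Dict, Set

# Hypothetical postings: term -> set of ids of the documents that contain it.
Postings = Dict[str, Set[int]]


def df_union(postings: Postings, replacements: Set[str]) -> int:
    """Equation (2), Pirkola: size of the union of the replacements' document sets."""
    docs: Set[int] = set()
    for term in replacements:
        docs |= postings.get(term, set())
    return len(docs)


def df_sum(postings: Postings, replacements: Set[str]) -> int:
    """Equation (4): sum of the replacements' DFs (a document may be counted twice)."""
    return sum(len(postings.get(term, set())) for term in replacements)


def df_max(postings: Postings, replacements: Set[str]) -> int:
    """Equation (5): DF of the most common replacement."""
    return max((len(postings.get(term, set())) for term in replacements), default=0)


postings: Postings = {
    "superfluous": set(range(5)),  # rare, correct 99.9% of the time
    "the": set(range(990)),        # common, correct 0.1% of the time
}
T_Q = {"superfluous", "the"}

print(df_union(postings, T_Q), df_sum(postings, T_Q), df_max(postings, T_Q))  # 990 995 990
print(df_union(postings, {"superfluous"}))                                    # 5
```

The union and the maximum coincide here only because every document containing "superfluous" also contains "the"; in general the sum is never smaller than the union, which is never smaller than the maximum.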
One way to resolve the problem is to use a weighted variant of the sum in equation (4):

$$DF(Q) = \sum_{D_k \in T(Q)} DF(D_k)\, wt(D_k) \tag{6}$$

In general, any monotone function of the replacement probability could be used for $wt(D_k)$. For the experiments reported below, the weight is simply set to the best available estimate of the replacement probability.

Improbable translations that are common terms can also cause problems in equation (1), since common terms are likely to have higher TFs as well. One way to limit this effect is to use a weighted sum in the TF computation:

$$TF_j(Q) = \sum_{D_k \in T(Q)} TF_j(D_k)\, wt(D_k) \tag{7}$$

Again, for the experiments reported below, the replacement probability estimate is used as the weight.

Finally, either TF formula could be combined with any way of computing DF. In the experiments reported below, the following combinations were tried:

| Method | TF Formula | DF Formula |
|---|---|---|
| Pirkola | (1) | (2) |
| … | (1) | (4) |
| … | (1) | (5) |
| … | (1) | (6) |
| … | (7) | (4) |
| …/DF | (7) | (6) |

Another way of leveraging information about replacement probabilities is to simply ignore the least likely replacements. Such an approach offers two potential insights. First, it can reveal the extent of the adverse effect of low-probability replacements on each technique. Second, it offers a principled way of tuning the comprehensiveness of the dictionary to optimize the retrieval effectiveness of each technique. Two teams (from the University of Massachusetts [9] and the University of Maryland [2]) tried variants of this approach in the TREC 2002 CLIR track. For the experiments reported below, a greedy technique was used in which replacements were retained in order of decreasing probability until a preset threshold on the cumulative probability was first exceeded. This approach guarantees that at least one replacement is retained. Mean uninterpolated average precision is reported for every threshold value between 0.1 and 1.0, in increments of 0.1. The experiments were run using a modified version of (removed for blind reviewing), a vector space retrieval system that was developed locally using Okapi BM-25 weights.
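Equations (6) and (7) and the greedy cumulative-probability pruning can be sketched together in Python. The function names and toy numbers are hypothetical; the weights are raw replacement probabilities, as in the experiments. The text does not say whether the retained probabilities were renormalized after pruning, so this sketch leaves them unchanged.

```python
from typing import Dict

# Hypothetical input: replacement -> estimated replacement probability, used as wt(D_k).
Weights = Dict[str, float]


def weighted_df(df: Dict[str, int], weights: Weights) -> float:
    """Equation (6): sum over replacements of DF(D_k) * wt(D_k)."""
    return sum(df.get(term, 0) * wt for term, wt in weights.items())


def weighted_tf(tf_in_doc: Dict[str, int], weights: Weights) -> float:
    """Equation (7): sum over replacements of TF_j(D_k) * wt(D_k) for one document j."""
    return sum(tf_in_doc.get(term, 0) * wt for term, wt in weights.items())


def prune(weights: Weights, threshold: float) -> Weights:
    """Keep replacements in decreasing order of probability until the running total
    first exceeds the threshold; the most probable replacement is always kept."""
    kept: Weights = {}
    cumulative = 0.0
    for term, p in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        kept[term] = p
        cumulative += p
        if cumulative > threshold:
            break
    return kept


weights = {"superfluous": 0.999, "the": 0.001}
df = {"superfluous": 5, "the": 990}
print(weighted_df(df, weights))                             # about 5.985 (equation (4) gives 995)
print(weighted_tf({"superfluous": 1, "the": 30}, weights))  # about 1.029
print(prune({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}, 0.6))  # {'a': 0.5, 'b': 0.25}
```

Weighting keeps the joint DF close to that of the dominant translation, which is the behavior the "superfluous"/"the" example calls for.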
Statistical significance tests were performed using a paired two-tailed t-test, and results are reported as significant for values of p < ….

3 CLIR
The CLIR experiments reported in this section were performed using the TREC 2002 CLIR track collection, which contains 383,872 articles from the Agence France Presse (AFP) Arabic newswire, 50 topic descriptions written in English, and associated relevance judgments [12]. Queries were formed automatically using all the words in the title field of the topic description, which is designed to be representative of the style of queries typically issued in Web search applications. The documents were stemmed using Al-Stem (a standard resource for the TREC CLIR track), diacritics were removed, and normalization was performed to convert the letters ya and alef maqsoura to ya, and to convert all the variants of alef and hamza (namely alef, alef hamza, alef maad, hamza, waw hamza, and ya hamza) to alef. The English queries were stemmed before translation using the Porter stemmer for compatibility with the translation resources described below.

3.1 Estimating Replacement Probabilities
Five translation resources of three types were combined for this application. Combining resources is useful because (a) the coverage of the combined resources is typically better than that of any individual resource, and (b) combining resources can serve to reinforce good translations. The resources were as follows:

1. Two bilingual term lists that were constructed using two Web-based machine translation systems (Tarjim and Al-Misbar [16][17]). In each case, sets of isolated unique English words found in a 200 MB collection of Los Angeles Times news stories [10] were submitted for translation from English into Arabic. Each system returned at most one translation for each submitted word. Together, the two term lists covered about 15% of the unique Arabic stems in the TREC collection (measured by using Al-Stem on both the term list and the collection).
2. The Salmone Arabic-to-English dictionary (from Tufts University), from which only the translations were extracted. No translation preference information is indicated in this dictionary. The coverage of the resulting term list, measured in the same way, was about 7% of the unique Arabic stems in the TREC collection.
3. Two translation probability tables, one for English-to-Arabic and one for Arabic-to-English. These tables were constructed from tables provided by BBN, which were in turn constructed from a large collection of aligned English and Arabic United Nations documents using the GIZA++ implementation of IBM's Model 1 statistical machine translation design. The coverage of the Arabic-to-English table, measured in the same way, was 29% of the unique Arabic stems in the TREC collection.

These translation resources were combined in the following manner:

1. All resources that were originally provided as Arabic-to-English were inverted. For the translation probability table, the probabilities for each translation pair were retained, and the inverted tables were then renormalized so that the probabilities for each source-language term summed to one. This process likely introduced some error, since probabilities for rare events may not have been accurately estimated.
2. A uniform distribution was used to assign probabilities to the translations obtained from the machine translation systems and the Salmone dictionary. Tarjim and Al-Misbar each returned at most one translation for an English word, but two English words might share a common translation. When n alternatives were known from a single source, each was assigned a probability of 1/n.
3. The resulting translation probabilities were then combined by summing the probabilities for a given Arabic translation across the sources in which it appeared and then dividing by the number of sources in which the English term had appeared. For example, if Tarjim, Al-Misbar, and Salmone contained the English term, with Tarjim containing some specific translation with probability 1.0, Al-Misbar lacking that translation (i.e., assigning it a probability of 0.0), and Salmone assigning it a probability of 0.5 (because two translations were known), then the resulting combined probability would be 1.0/3 + 0.0/3 + 0.5/3 = 0.5.

The resulting translation resource contained what appeared to be reasonable estimates of translation probabilities, and covered 36% of the unique Arabic stems in the TREC collection.

3.2 Results
Figure 1 shows the mean uninterpolated average precision for each of the six structured query methods at each threshold value, and Table 1 shows the same results in tabular form. As a baseline, one-best query translation (using only the most likely translation) was also run. This widely reported baseline seems appropriate in this case because any cumulative probability threshold will result in use of at least the most probable translation for each query term.

The equation (4) variant [8] and Pirkola's method turned out to be essentially indistinguishable, with the equation (5) variant performing nearly as well (statistically significantly worse only at threshold values of 0.2 and 0.3). The …/DF method (weighted TF and weighted DF, equations (7) and (6)) produced results that were statistically significantly better than the one-best baseline for every threshold value except 0.1 and 1.0. Moreover, …/DF was the only one of the probabilistic techniques that did not exhibit a dramatic decrease in effectiveness as the threshold increased. The best …/DF result (at a threshold of 0.6) is statistically indistinguishable from the best result of Pirkola's method or of the equation (4) or (5) variants (in each case, at a threshold of 0.4), but its reduced dependence on accurate tuning of the threshold makes …/DF clearly the preferred method.

Table 1: CLIR: Mean average precision, title queries.
Black (gray) cells represent statistically better (worse) results, compared to the one-best translation baseline.
[Table 1 values not recoverable: columns are cumulative probability thresholds; rows are the one-best baseline and each structured query method.]

Figure 1: CLIR: Dependence of retrieval effectiveness on cumulative probability threshold, title queries. [Plot not reproduced: mean average precision vs. threshold for each method and the baseline.]

4 OCR-BASED RETRIEVAL
Previous approaches to retrieval of OCR-degraded text have focused primarily on correcting OCR errors [7][15] or on fuzzy matching techniques that are less sensitive than exact string matching to OCR errors [1][5]. This section demonstrates the generality of the query-time replacement techniques developed above, using them to combine TF and DF evidence for a novel technique that attempts to replace
each query term with possible OCR distortions of the term and to estimate the probability of each replacement. The experiments were conducted with the Zad collection, which was obtained from the University of Maryland [3]. The collection comprises 2,730 documents extracted from Zad Al-Me'ad, a printed book for which an accurately character-coded electronic version (the "clean text") is also available [3]. Three sets of OCR outputs for the same documents were available: print resolution (300x300 dots per inch (dpi)) as originally scanned, and down-sampled versions at fine fax resolution (200x200 dpi) and standard fax resolution (200x100 dpi). The test collection includes 25 written topic descriptions and associated relevance judgments. Character normalizations were performed as described above, and character 3-grams (3g) or character 4-grams (4g) were indexed; Darwish and Oard found those index terms to be among the most effective for OCR-based retrieval of Arabic [3].

4.1 Estimating Replacement Probabilities
Term replacement probabilities were estimated using a position-sensitive unigram character distortion model trained on 5,000 words of aligned clean and distorted text from the collections being searched. The alignment was designed to simulate manual error correction of a small portion of the collection.³ Since the appearance of Arabic characters varies by position, the standard four character positions (beginning, middle, end, isolated) were modeled. Formally, given a clean word with characters $C_1 \ldots C_i \ldots C_n$ and the resulting word after OCR degradation $D_1 \ldots D_j \ldots D_m$, where $D_j$ resulted from $C_i$, $\varepsilon$ is the null character, $L$ is the position of the letter in the word (beginning, middle, end, or isolated), and # is the word boundary, the three edit operations for the model would be:

$$P_{\text{substitution}}(C_i \rightarrow D_j \mid L) = \frac{\text{count}(C_i \rightarrow D_j \mid L)}{\text{count}(C_i \mid L)}$$

$$P_{\text{deletion}}(C_i \rightarrow \varepsilon \mid L) = \frac{\text{count}(C_i \rightarrow \varepsilon \mid L)}{\text{count}(C_i \mid L)}$$

$$P_{\text{insertion}}(\varepsilon \rightarrow D_j \mid L) = \frac{\text{count}(\varepsilon \rightarrow D_j \mid L)}{\text{count}(C \mid L)}$$

where count(·) denotes a frequency in the aligned training data and count(C | L) counts all clean characters in position L. If the count in the numerator was zero, the computation was repeated without conditioning on position. If the count remained zero, a value of zero was recorded. A separate model was trained for each resolution.
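A minimal Python sketch of this distortion model follows. It assumes that word pairs have already been aligned (for example by the SCLITE-based procedure described next), aligns the characters of each pair with the Levenshtein algorithm plus a backtrace, and estimates the three edit probabilities with the positional back-off described above. The names `position`, `align_chars`, and `DistortionModel` are hypothetical. Positions are assigned by index only, and insertions are conditioned on the position of the following clean character; both are simplifying assumptions, not details given in the text.

```python
from collections import Counter
from typing import Iterable, List, Tuple

EPS = ""  # the null character (epsilon)


def position(i: int, n: int) -> str:
    """B(eginning), M(iddle), E(nd), or I(solated) for character i of an n-letter word.
    Simplifying assumption: based on the index only, not on Arabic letter joining."""
    if n == 1:
        return "I"
    if i == 0:
        return "B"
    return "E" if i == n - 1 else "M"


def align_chars(clean: str, ocr: str) -> List[Tuple[str, str, int]]:
    """Levenshtein alignment with backtrace, as (clean_char, ocr_char, index) triples.
    EPS on the OCR side marks a deletion, EPS on the clean side an insertion; for an
    insertion, index is that of the following clean character (len(clean) at the end)."""
    n, m = len(clean), len(ocr)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if clean[i - 1] == ocr[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    ops: List[Tuple[str, str, int]] = []
    i, j = n, m
    while i > 0 or j > 0:  # backtrace from the bottom-right cell
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (clean[i - 1] != ocr[j - 1]):
            ops.append((clean[i - 1], ocr[j - 1], i - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append((clean[i - 1], EPS, i - 1))
            i -= 1
        else:
            ops.append((EPS, ocr[j - 1], i))
            j -= 1
    ops.reverse()
    return ops


class DistortionModel:
    """Position-sensitive unigram character distortion model with back-off."""

    def __init__(self, word_pairs: Iterable[Tuple[str, str]]):
        self.ops: Counter = Counter()        # (clean, ocr, position) -> count
        self.ops_any: Counter = Counter()    # (clean, ocr) -> count, position ignored
        self.chars: Counter = Counter()      # (clean, position) -> count
        self.chars_any: Counter = Counter()  # clean -> count
        for clean, ocr in word_pairs:
            n = len(clean)
            if n == 0:
                continue
            for i, ch in enumerate(clean):
                pos = position(i, n)
                # Each clean character counts toward its own denominator and, as an
                # assumption of this sketch, toward the insertion denominator (key EPS).
                for key in (ch, EPS):
                    self.chars[(key, pos)] += 1
                    self.chars_any[key] += 1
            for c, d, idx in align_chars(clean, ocr):
                pos = position(min(idx, n - 1), n)
                self.ops[(c, d, pos)] += 1
                self.ops_any[(c, d)] += 1

    def prob(self, c: str, d: str, pos: str) -> float:
        """P(c -> d | pos): c == EPS is an insertion, d == EPS a deletion, c == d unchanged.
        Backs off to position-independent counts if the positional count is zero."""
        num, den = self.ops[(c, d, pos)], self.chars[(c, pos)]
        if num == 0:
            num, den = self.ops_any[(c, d)], self.chars_any[c]
        return num / den if num > 0 else 0.0


model = DistortionModel([("abc", "abc"), ("abc", "adc"), ("abc", "ac")])
print(model.prob("b", "d", "M"))  # one third: middle "b" read as "d" once in three
print(model.prob("b", EPS, "M"))  # one third: middle "b" deleted once in three
print(model.prob("b", "d", "B"))  # one third, via back-off (no "b" at the beginning)
```

Because every clean character receives exactly one operation (unchanged, substituted, or deleted), the substitution and deletion probabilities for a given character and position sum to one.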
Two factors made automatic alignment of the OCR output to the clean text challenging. First, the printed and clean text versions in the Zad collection were obtained from different sources that exhibited minor differences (mostly substitution or deletion of particles such as "in", "from", "or", and "then"). Second, some areas in the scanned images of the printed page exhibited image distortions that resulted in relatively long runs of OCR errors. The alignment was performed using SCLITE from the National Institute of Standards and Technology (NIST). SCLITE employs a dynamic programming string alignment algorithm, which attempts to minimize the Levenshtein distance (edit distance) between two strings. Conceptually, the algorithm uses identical matches to anchor the alignment, and then uses word position with respect to those anchors to estimate an optimal alignment of the remaining words. SCLITE was originally developed for speech recognition applications, but in OCR applications additional character-level evidence is available. SCLITE alignments were therefore accepted only if the number of character edit operations was less than or equal to 50% of the length of the shorter of the two matched words. To align the words that were not aligned by SCLITE, the following algorithm was used:

1. Using the existing alignments as anchors, given an unaligned word at position l from the preceding anchor in a clean document, sequentially compare it to the words in the corresponding degraded document, between the corresponding pair of anchors, at position l' from the preceding anchor, where |l' - l| < ….
2. When comparing two words, if the difference between their lengths was less than or equal to 2 characters, and the number of edit operations between the two words (using Levenshtein edit distance) divided by the length of the shorter word was less than a threshold q, then the newly aligned words were used as anchors. Initially, q was set to 60%.
3. Steps 1 and 2 were iterated two more times using the new anchors, with q equal to 40% and then 20%, to attempt to find more alignments.

³ Smaller and larger training sets were tried, but no improvement resulted from more than 5,000 words.

This alignment technique works well for print resolution, but it is a significant source of errors for highly degraded cases (e.g., standard fax resolution). Given a pair of aligned words, their characters were aligned by computing the Levenshtein edit distance between them and then backtracing through that computation to identify insertions, deletions, and substitutions. The resulting model was then used to assign a probability to possible distortions of each query term as follows:

1. For each character in a clean query term, generate all substitutions or deletions that have non-zero probability (i.e., that were observed at least once in the training data). The unchanged character is generated at this step as a substitution.
2. For each possible insertion point, generate all possible single insertions. Possible insertion points are before the first character, between any pair of characters, and after the last character. A null insertion is generated at each point to cover the remainder of the probability mass.
3. For each string that could result from combining the possible substitutions or deletions with the possible insertions, compute the probability of generating that string as the product of the associated insertion, substitution, and deletion probabilities.

A more efficient implementation would be desirable in an operational setting, but this approach suffices for the experiments reported below.

4.2 Results
Figure 2 shows the mean uninterpolated average precision at print resolution for each of the six structured query methods at each threshold value, and Table 2 shows the same data in
tabular form. Figure 3 and Table 3 present the corresponding results for fine fax resolution. As a baseline, the same index terms (3g or 4g) were run with the clean (undistorted) queries, since any cumulative probability threshold results in a superset of that baseline case.

No statistically significant differences were observed at any resolution or threshold value between Pirkola's method and the equation (4) and (5) variants, which tends to confirm the observation made in the CLIR application that the simpler implementation of equation (4) [8] has no significant adverse effect on retrieval effectiveness. For print resolution, every structured query technique achieved a statistically significant improvement over the baseline when used with the better of the two index terms (4g). Among these, …/DF both achieved the greatest improvement (9.7% relative) and exhibited the greatest range of threshold values over which the improvement was statistically significant (0.6 to 1.0). Therefore, as with CLIR, …/DF is clearly the preferred technique in this application. No statistically significant improvements over the baseline were observed for fine fax resolution or for standard fax resolution (not shown). This may, however, reflect errors in the alignment of the training data rather than limitations in the replacement techniques that were tried. The same general trends are observable in Figure 3 as in Figure 2, so the use of …/DF is certainly not contraindicated for the fine fax condition.

Figure 2: Print: Dependence of retrieval effectiveness on cumulative probability threshold, title queries. [Plots not reproduced: panels "Print - 3grams" and "Print - 4grams", threshold vs. mean average precision for each method and the baseline.]

Figure 3: Fine fax: Dependence of retrieval effectiveness on cumulative probability threshold, title queries. [Plots not reproduced: panels "Fine Fax - 3grams" and "Fine Fax - 4grams", threshold vs. mean average precision for each method and the baseline.]

Table 2: Print: Mean average precision, title queries. Black (gray) cells represent statistically better (worse) results, compared to the clean query baseline.
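[Table 2 values not recoverable: columns are cumulative probability thresholds; rows are the clean-query baseline and each structured query method, for 3-gram and 4-gram indexing.]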
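The three-step procedure of Section 4.1 for enumerating distortions of a query term can be sketched in Python as follows. The probability tables are hypothetical toy values rather than a trained model, and the function name `distortions` is illustrative. Three choices are assumptions of the sketch rather than details from the text: insertions are position-independent, characters missing from the table pass through unchanged, and identical strings reached along different paths have their probabilities summed. `itertools.product` enumerates every combination, which grows exponentially with term length, consistent with the remark that a more efficient implementation would be needed operationally.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical toy model: (clean_char, position) -> {output: probability}, where the
# output "" is a deletion and the character itself means "left unchanged".
SubDel = Dict[Tuple[str, str], Dict[str, float]]


def position(i: int, n: int) -> str:
    """B(eginning), M(iddle), E(nd), or I(solated), by index only (a simplification)."""
    if n == 1:
        return "I"
    if i == 0:
        return "B"
    return "E" if i == n - 1 else "M"


def distortions(term: str, sub_del: SubDel, insertions: Dict[str, float]) -> Dict[str, float]:
    """Enumerate possible OCR distortions of `term` with their probabilities."""
    n = len(term)
    # Step 1: non-zero substitutions (including "unchanged") and deletions per character.
    # Assumption: a character missing from the table passes through unchanged.
    char_slots = [
        [(out, p) for out, p in sub_del.get((ch, position(i, n)), {ch: 1.0}).items() if p > 0]
        for i, ch in enumerate(term)
    ]
    # Step 2: every single insertion at each of the n + 1 insertion points, plus a
    # null insertion carrying the remaining probability mass.
    insert_slot = [("", 1.0 - sum(insertions.values()))]
    insert_slot += [(ch, p) for ch, p in insertions.items() if p > 0]
    slots: List[List[Tuple[str, float]]] = [insert_slot]
    for options in char_slots:
        slots += [options, insert_slot]
    # Step 3: each combination of choices yields a string whose probability is the product
    # of the chosen probabilities; strings reached along several paths are summed.
    result: Dict[str, float] = {}
    for combo in product(*slots):
        string = "".join(out for out, _ in combo)
        p = 1.0
        for _, q in combo:
            p *= q
        result[string] = result.get(string, 0.0) + p
    return result


sub_del: SubDel = {("a", "B"): {"a": 0.9, "o": 0.1}, ("b", "E"): {"b": 0.8, "": 0.2}}
cands = distortions("ab", sub_del, {".": 0.05})
for s, p in sorted(cands.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(repr(s), round(p, 4))  # most probable first: 'ab', then 'a', then 'ob'
```

The resulting candidates and probabilities play the same role as a translation table in Section 3: they can be pruned with the cumulative-probability threshold and used as the weights $wt(D_k)$ in equations (6) and (7).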
Table 3: Fine fax: Mean average precision, title queries. Black (gray) cells represent statistically better (worse) results, compared to the clean query baseline.
[Table 3 values not recoverable: same layout as Table 2.]

CONCLUSION AND FUTURE WORK
This paper has introduced a family of methods for query term replacement that exploit estimates of replacement probabilities while also incorporating the vector space model's concept of document frequency. Both the equation (4) variant [8] and the equation (5) variant were found to achieve retrieval effectiveness similar to that obtained with Pirkola's structured query method, so the equation (4) variant seems to be a good basis from which to build probabilistic structured query methods. Coverage of rare translations was shown to be problematic for all three methods, however. Using only the most likely translations was found to be effective and expedient, but only if an appropriate threshold on cumulative probability is used. Of the three probabilistic structured query methods introduced in this paper, …/DF was the clear winner, showing both the best retrieval effectiveness and the least sensitivity to the cumulative probability threshold. Finally, the novel approach of producing possible replacements for query terms that could have been generated by OCR proved to be a useful technique for improving retrieval of OCR-degraded text.

These results suggest a number of interesting directions for future work:

1. Improved weighting techniques. The use of raw probability estimates as weights in the …/DF method seems intuitively appealing, but it is possible that some function of the probabilities (e.g., log p) may actually outperform raw probabilities. There are also opportunities to explore better smoothing methods when estimating the probabilities.
2. Other applications. The …/DF method can be used in any application where replacement probabilities can be reliably estimated. Examples of potential application areas are thesaurus expansion, speech-based retrieval, statistical approximations of morphology, and perhaps gene sequence matching.
3. Structured document indexing. Query processing and document processing exhibit a strong duality, so it may be possible to leverage some of the techniques developed here at indexing time rather than query time for applications such as stemming, translation-based indexing [11], speech retrieval, and OCR-based retrieval.

Variants of query term replacement are important in several information retrieval applications, and access to reliable estimates of replacement probabilities from corpus statistics is becoming increasingly common. The techniques described in this paper balance effectiveness and efficiency in ways that are likely to prove immediately useful, and they should also serve as a solid basis for future research on this important problem.

ACKNOWLEDGMENTS
Removed for blind reviewing.

REFERENCES
[1] Baeza-Yates, R. and G. Navarro, A Faster Algorithm for Approximate String Matching. Proceedings of Combinatorial Pattern Matching (CPM'96), Springer-Verlag LNCS, v. 1075, pages 1-13.
[2] Darwish, K. and D. Oard, CLIR Experiments at Maryland for TREC 2002: Evidence Combination for Arabic-English Retrieval, TREC.
[3] Darwish, K. and D. Oard, Term Selection for Searching Printed Arabic, SIGIR 2002.
[4] Gey, F. and D. Oard, The TREC-2001 Cross-Language Information Retrieval Track: Searching Arabic Using English, French or Arabic Queries, TREC 2001.
[5] Harding, S., W. Croft, and C. Weir, Probabilistic Retrieval of OCR Degraded Text Using N-Grams. European Conference on Digital Libraries, 1997.
[6] Hiemstra, D., Using Language Models for Information Retrieval. Ph.D. Thesis, University of Twente, Enschede.
[7] Hong, T., Degraded Text Recognition Using Visual and Linguistic Context. Ph.D. Thesis, Computer Science Department, SUNY Buffalo.
[8] …, K. L., Personal communication.
[9] Larkey, L., J. Allen, M. E. Connell, A. Bolivar, and C. Wade, UMass at TREC 2002: Cross Language and Novelty Tracks, TREC.
[10] NIST, Text Research Collection Volume 5, April.
[11] Oard, D. W. and F. Ertunc, Translation-Based Indexing for Cross-Language Retrieval, ECIR 2002.
[12] Oard, D. W. and F. Gey, The TREC-2002 Arabic/English CLIR Track, TREC.
[13] Pirkola, A., The Effects of Query Structure and Dictionary Setups in Dictionary-Based Cross-Language Information Retrieval, Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 55-63.
[14] Robertson, S. E., S. Walker, M. Hancock-Beaulieu, A. Gull, and M. Lau, Okapi at TREC-3, In the Fourth Text REtrieval Conference (TREC-3), 1996.
[15] Taghva, K., J. Borsack, and A. Condit, An Expert System for Automatically Correcting OCR Output. Proceedings of the SPIE - Document Recognition.
[16] tarjim.ajeeb.com, Sakhr Technologies, Cairo, Egypt.
[17] ATA Software Technology Limited, North Brentford, Middlesex, UK.
[18] Xu, J., R. Weischedel, and C. Nguyen, Evaluating a Probabilistic Model for Cross-lingual Information Retrieval. In Proceedings of SIGIR, 2001.
Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable
More informationCathy Walker March 5, 2010
Cathy Walker March 5, 010 Part : Problem Set 1. What s the level of measurement for the followng varables? a) SAT scores b) Number of tests or quzzes n statstcal course c) Acres of land devoted to corn
More informationWorkshop: Approximating energies and wave functions Quantum aspects of physical chemistry
Workshop: Approxmatng energes and wave functons Quantum aspects of physcal chemstry http://quantum.bu.edu/pltl/6/6.pdf Last updated Thursday, November 7, 25 7:9:5-5: Copyrght 25 Dan Dll (dan@bu.edu) Department
More informationTuring Machines (intro)
CHAPTER 3 The Church-Turng Thess Contents Turng Machnes defntons, examples, Turng-recognzable and Turng-decdable languages Varants of Turng Machne Multtape Turng machnes, non-determnstc Turng Machnes,
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationLecture 4. Instructor: Haipeng Luo
Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would
More informationBoostrapaggregating (Bagging)
Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod
More informationChapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems
Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons
More informationChapter 13: Multiple Regression
Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to
More informationProblem Set 9 Solutions
Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem
More informationCHAPTER IV RESEARCH FINDING AND DISCUSSIONS
CHAPTER IV RESEARCH FINDING AND DISCUSSIONS A. Descrpton of Research Fndng. The Implementaton of Learnng Havng ganed the whole needed data, the researcher then dd analyss whch refers to the statstcal data
More informationCHAPTER IV RESEARCH FINDING AND ANALYSIS
CHAPTER IV REEARCH FINDING AND ANALYI A. Descrpton of Research Fndngs To fnd out the dfference between the students who were taught by usng Mme Game and the students who were not taught by usng Mme Game
More informationLOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin
Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence
More informationInternational Journal of Mathematical Archive-3(3), 2012, Page: Available online through ISSN
Internatonal Journal of Mathematcal Archve-3(3), 2012, Page: 1136-1140 Avalable onlne through www.ma.nfo ISSN 2229 5046 ARITHMETIC OPERATIONS OF FOCAL ELEMENTS AND THEIR CORRESPONDING BASIC PROBABILITY
More informationAppendix B: Resampling Algorithms
407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles
More informationRegularized Discriminant Analysis for Face Recognition
1 Regularzed Dscrmnant Analyss for Face Recognton Itz Pma, Mayer Aladem Department of Electrcal and Computer Engneerng, Ben-Guron Unversty of the Negev P.O.Box 653, Beer-Sheva, 845, Israel. Abstract Ths
More informationEvaluation for sets of classes
Evaluaton for Tet Categorzaton Classfcaton accuracy: usual n ML, the proporton of correct decsons, Not approprate f the populaton rate of the class s low Precson, Recall and F 1 Better measures 21 Evaluaton
More informationNegative Binomial Regression
STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...
More informationCS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016
CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng
More informationGrover s Algorithm + Quantum Zeno Effect + Vaidman
Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the
More informationResource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud
Resource Allocaton wth a Budget Constrant for Computng Independent Tasks n the Cloud Wemng Sh and Bo Hong School of Electrcal and Computer Engneerng Georga Insttute of Technology, USA 2nd IEEE Internatonal
More informationUncertainty in measurements of power and energy on power networks
Uncertanty n measurements of power and energy on power networks E. Manov, N. Kolev Department of Measurement and Instrumentaton, Techncal Unversty Sofa, bul. Klment Ohrdsk No8, bl., 000 Sofa, Bulgara Tel./fax:
More informationDifference Equations
Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1
More informationCSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography
CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve
More informationSearch sequence databases 2 10/25/2016
Search sequence databases 2 10/25/2016 The BLAST algorthms Ø BLAST fnds local matches between two sequences, called hgh scorng segment pars (HSPs). Step 1: Break down the query sequence and the database
More informationGeneralized Linear Methods
Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More informationx = , so that calculated
Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to
More informationChapter 6. Supplemental Text Material
Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.
More informationQuantitative Discrimination of Effective Porosity Using Digital Image Analysis - Implications for Porosity-Permeability Transforms
2004, 66th EAGE Conference, Pars Quanttatve Dscrmnaton of Effectve Porosty Usng Dgtal Image Analyss - Implcatons for Porosty-Permeablty Transforms Gregor P. Eberl 1, Gregor T. Baechle 1, Ralf Weger 1,
More informationReport on Image warping
Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.
More informationCONTRAST ENHANCEMENT FOR MIMIMUM MEAN BRIGHTNESS ERROR FROM HISTOGRAM PARTITIONING INTRODUCTION
CONTRAST ENHANCEMENT FOR MIMIMUM MEAN BRIGHTNESS ERROR FROM HISTOGRAM PARTITIONING N. Phanthuna 1,2, F. Cheevasuvt 2 and S. Chtwong 2 1 Department of Electrcal Engneerng, Faculty of Engneerng Rajamangala
More informationWinter 2008 CS567 Stochastic Linear/Integer Programming Guest Lecturer: Xu, Huan
Wnter 2008 CS567 Stochastc Lnear/Integer Programmng Guest Lecturer: Xu, Huan Class 2: More Modelng Examples 1 Capacty Expanson Capacty expanson models optmal choces of the tmng and levels of nvestments
More informationCopyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor
Taylor Enterprses, Inc. Control Lmts for P Charts Copyrght 2017 by Taylor Enterprses, Inc., All Rghts Reserved. Control Lmts for P Charts Dr. Wayne A. Taylor Abstract: P charts are used for count data
More informationRetrieval Models: Language models
CS-590I Informaton Retreval Retreval Models: Language models Luo S Department of Computer Scence Purdue Unversty Introducton to language model Ungram language model Document language model estmaton Maxmum
More informationCredit Card Pricing and Impact of Adverse Selection
Credt Card Prcng and Impact of Adverse Selecton Bo Huang and Lyn C. Thomas Unversty of Southampton Contents Background Aucton model of credt card solctaton - Errors n probablty of beng Good - Errors n
More informationMAXIMUM A POSTERIORI TRANSDUCTION
MAXIMUM A POSTERIORI TRANSDUCTION LI-WEI WANG, JU-FU FENG School of Mathematcal Scences, Peng Unversty, Bejng, 0087, Chna Center for Informaton Scences, Peng Unversty, Bejng, 0087, Chna E-MIAL: {wanglw,
More informationMotion Perception Under Uncertainty. Hongjing Lu Department of Psychology University of Hong Kong
Moton Percepton Under Uncertanty Hongjng Lu Department of Psychology Unversty of Hong Kong Outlne Uncertanty n moton stmulus Correspondence problem Qualtatve fttng usng deal observer models Based on sgnal
More informationBayesian predictive Configural Frequency Analysis
Psychologcal Test and Assessment Modelng, Volume 54, 2012 (3), 285-292 Bayesan predctve Confgural Frequency Analyss Eduardo Gutérrez-Peña 1 Abstract Confgural Frequency Analyss s a method for cell-wse
More informationCS 468 Lecture 16: Isometry Invariance and Spectral Techniques
CS 468 Lecture 16: Isometry Invarance and Spectral Technques Justn Solomon Scrbe: Evan Gawlk Introducton. In geometry processng, t s often desrable to characterze the shape of an object n a manner that
More informationMessage modification, neutral bits and boomerangs
Message modfcaton, neutral bts and boomerangs From whch round should we start countng n SHA? Antone Joux DGA and Unversty of Versalles St-Quentn-en-Yvelnes France Jont work wth Thomas Peyrn 1 Dfferental
More informationChapter 5 Multilevel Models
Chapter 5 Multlevel Models 5.1 Cross-sectonal multlevel models 5.1.1 Two-level models 5.1.2 Multple level models 5.1.3 Multple level modelng n other felds 5.2 Longtudnal multlevel models 5.2.1 Two-level
More informationOutline. Communication. Bellman Ford Algorithm. Bellman Ford Example. Bellman Ford Shortest Path [1]
DYNAMIC SHORTEST PATH SEARCH AND SYNCHRONIZED TASK SWITCHING Jay Wagenpfel, Adran Trachte 2 Outlne Shortest Communcaton Path Searchng Bellmann Ford algorthm Algorthm for dynamc case Modfcatons to our algorthm
More informationMarkov Chain Monte Carlo Lecture 6
where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways
More informationCopyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for U Charts. Dr. Wayne A. Taylor
Taylor Enterprses, Inc. Adjusted Control Lmts for U Charts Copyrght 207 by Taylor Enterprses, Inc., All Rghts Reserved. Adjusted Control Lmts for U Charts Dr. Wayne A. Taylor Abstract: U charts are used
More informationIntroduction to Information Theory, Data Compression,
Introducton to Informaton Theory, Data Compresson, Codng Mehd Ibm Brahm, Laura Mnkova Aprl 5, 208 Ths s the augmented transcrpt of a lecture gven by Luc Devroye on the 3th of March 208 for a Data Structures
More informationUsing the estimated penetrances to determine the range of the underlying genetic model in casecontrol
Georgetown Unversty From the SelectedWorks of Mark J Meyer 8 Usng the estmated penetrances to determne the range of the underlyng genetc model n casecontrol desgn Mark J Meyer Neal Jeffres Gang Zheng Avalable
More information1 Derivation of Point-to-Plane Minimization
1 Dervaton of Pont-to-Plane Mnmzaton Consder the Chen-Medon (pont-to-plane) framework for ICP. Assume we have a collecton of ponts (p, q ) wth normals n. We want to determne the optmal rotaton and translaton
More informationDETERMINATION OF UNCERTAINTY ASSOCIATED WITH QUANTIZATION ERRORS USING THE BAYESIAN APPROACH
Proceedngs, XVII IMEKO World Congress, June 7, 3, Dubrovn, Croata Proceedngs, XVII IMEKO World Congress, June 7, 3, Dubrovn, Croata TC XVII IMEKO World Congress Metrology n the 3rd Mllennum June 7, 3,
More informationA Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach
A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland
More informationPop-Click Noise Detection Using Inter-Frame Correlation for Improved Portable Auditory Sensing
Advanced Scence and Technology Letters, pp.164-168 http://dx.do.org/10.14257/astl.2013 Pop-Clc Nose Detecton Usng Inter-Frame Correlaton for Improved Portable Audtory Sensng Dong Yun Lee, Kwang Myung Jeon,
More informationErrors for Linear Systems
Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch
More informationPolynomial Regression Models
LINEAR REGRESSION ANALYSIS MODULE XII Lecture - 6 Polynomal Regresson Models Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Test of sgnfcance To test the sgnfcance
More informationNatural Language Processing and Information Retrieval
Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal: moschtt@ds.untn.t Summary Support
More informationA LINEAR PROGRAM TO COMPARE MULTIPLE GROSS CREDIT LOSS FORECASTS. Dr. Derald E. Wentzien, Wesley College, (302) ,
A LINEAR PROGRAM TO COMPARE MULTIPLE GROSS CREDIT LOSS FORECASTS Dr. Derald E. Wentzen, Wesley College, (302) 736-2574, wentzde@wesley.edu ABSTRACT A lnear programmng model s developed and used to compare
More informationDepartment of Electrical & Electronic Engineeing Imperial College London. E4.20 Digital IC Design. Median Filter Project Specification
Desgn Project Specfcaton Medan Flter Department of Electrcal & Electronc Engneeng Imperal College London E4.20 Dgtal IC Desgn Medan Flter Project Specfcaton A medan flter s used to remove nose from a sampled
More informationComputation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models
Computaton of Hgher Order Moments from Two Multnomal Overdsperson Lkelhood Models BY J. T. NEWCOMER, N. K. NEERCHAL Department of Mathematcs and Statstcs, Unversty of Maryland, Baltmore County, Baltmore,
More informationA PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS
HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,
More informationExtracting Pronunciation-translated Names from Chinese Texts using Bootstrapping Approach
Extractng Pronuncaton-translated Names from Chnese Texts usng Bootstrappng Approach Jng Xao School of Computng, Natonal Unversty of Sngapore xaojng@comp.nus.edu.sg Jmn Lu School of Computng, Natonal Unversty
More informationComputational Biology Lecture 8: Substitution matrices Saad Mneimneh
Computatonal Bology Lecture 8: Substtuton matrces Saad Mnemneh As we have ntroduced last tme, smple scorng schemes lke + or a match, - or a msmatch and -2 or a gap are not justable bologcally, especally
More informationUncertainty as the Overlap of Alternate Conditional Distributions
Uncertanty as the Overlap of Alternate Condtonal Dstrbutons Olena Babak and Clayton V. Deutsch Centre for Computatonal Geostatstcs Department of Cvl & Envronmental Engneerng Unversty of Alberta An mportant
More information3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X
Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number
More informationAnnexes. EC.1. Cycle-base move illustration. EC.2. Problem Instances
ec Annexes Ths Annex frst llustrates a cycle-based move n the dynamc-block generaton tabu search. It then dsplays the characterstcs of the nstance sets, followed by detaled results of the parametercalbraton
More informationThe Study of Teaching-learning-based Optimization Algorithm
Advanced Scence and Technology Letters Vol. (AST 06), pp.05- http://dx.do.org/0.57/astl.06. The Study of Teachng-learnng-based Optmzaton Algorthm u Sun, Yan fu, Lele Kong, Haolang Q,, Helongang Insttute
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationSemi-supervised Classification with Active Query Selection
Sem-supervsed Classfcaton wth Actve Query Selecton Jao Wang and Swe Luo School of Computer and Informaton Technology, Beng Jaotong Unversty, Beng 00044, Chna Wangjao088@63.com Abstract. Labeled samples
More informationSimulation and Probability Distribution
CHAPTER Probablty, Statstcs, and Relablty for Engneers and Scentsts Second Edton PROBABILIT DISTRIBUTION FOR CONTINUOUS RANDOM VARIABLES A. J. Clark School of Engneerng Department of Cvl and Envronmental
More informationUsing Immune Genetic Algorithm to Optimize BP Neural Network and Its Application Peng-fei LIU1,Qun-tai SHEN1 and Jun ZHI2,*
Advances n Computer Scence Research (ACRS), volume 54 Internatonal Conference on Computer Networks and Communcaton Technology (CNCT206) Usng Immune Genetc Algorthm to Optmze BP Neural Network and Its Applcaton
More informationComparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method
Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method
More informationDepartment of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6
Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.
More informationSpeeding up Computation of Scalar Multiplication in Elliptic Curve Cryptosystem
H.K. Pathak et. al. / (IJCSE) Internatonal Journal on Computer Scence and Engneerng Speedng up Computaton of Scalar Multplcaton n Ellptc Curve Cryptosystem H. K. Pathak Manju Sangh S.o.S n Computer scence
More informationCHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE
CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng
More informationOn the Repeating Group Finding Problem
The 9th Workshop on Combnatoral Mathematcs and Computaton Theory On the Repeatng Group Fndng Problem Bo-Ren Kung, Wen-Hsen Chen, R.C.T Lee Graduate Insttute of Informaton Technology and Management Takmng
More informationEEE 241: Linear Systems
EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they
More informationCollege of Computer & Information Science Fall 2009 Northeastern University 20 October 2009
College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:
More informationLecture 10 Support Vector Machines II
Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed
More informationThis column is a continuation of our previous column
Comparson of Goodness of Ft Statstcs for Lnear Regresson, Part II The authors contnue ther dscusson of the correlaton coeffcent n developng a calbraton for quanttatve analyss. Jerome Workman Jr. and Howard
More informationNEW ASTERISKS IN VERSION 2.0 OF ACTIVEPI
NEW ASTERISKS IN VERSION 2.0 OF ACTIVEPI ASTERISK ADDED ON LESSON PAGE 3-1 after the second sentence under Clncal Trals Effcacy versus Effectveness versus Effcency The apprasal of a new or exstng healthcare
More information