On the Effectiveness of Relevance Profiling

David J Harper
Smart Web Technologies Centre
The Robert Gordon University
Aberdeen AB25 1HG UK
d.harper@rgu.ac.uk

David Lee
Smart Web Technologies Centre
The Robert Gordon University
Aberdeen AB25 1HG UK
david.lee@smartweb.rgu.ac.uk

Proceedings of the 9th Australasian Document Computing Symposium, Melbourne, Australia, December 13. Copyright for this article remains with the authors.

Abstract

Relevance profiling is a general process for within-document retrieval. Given a query, a profile of retrieval status values is computed by sliding a fixed-size window across a document. In this paper, we report a series of bench experiments on relevance profiling, using an existing electronic book and its associated book index. The book index is the source of queries and relevance judgements for the experiments. Three weighting functions based on a language modelling approach are investigated, and we demonstrate that the well-known query generation model outperforms one based on the Kullback-Leibler divergence, and one based on simple term frequency. The relevance profiling process proved highly effective in retrieving relevant pages within the electronic book, and exhibits stable performance over a range of sliding window sizes. The experimental study provides evidence for the effectiveness of relevance profiling for within-document retrieval, with the caveat that the experiment was conducted with a particular electronic book.

Keywords

relevance profiling; within-document retrieval; language modelling; information retrieval experimentation.

1 Background

Increasingly, long electronic documents are being published and delivered using web and other technologies, and users need to find information within these long documents. Various approaches have been proposed for within-document retrieval, including passage retrieval [1], and user interfaces supporting content-based browsing of documents [2].

We have developed a tool, ProfileSkim, for within-document retrieval based on the concept of relevance profiling [3], and we reported on a comprehensive user-centred evaluation of this tool [4] [5]. ProfileSkim integrates passage retrieval and content-based document browsing. The key concept underpinning the tool is relevance profiling, in which a profile of retrieval status values is computed across a document in response to a query. Within the user interface, an interactive bar graph provides an overview of this profile, and through interaction with the graph the user can select and browse in situ potentially relevant passages within the document. The reader is referred to [3] for a description of, and detailed rationale for, relevance profiling and the ProfileSkim user interface.

In the user evaluation study [4] [5], we compared the performance of ProfileSkim against a tool based on the sequential Find command delivered with most browsers and word processing packages. We concluded that:

- Relevance profiling, as implemented and presented by ProfileSkim, was effective in enabling users to identify relevant passages of a document;
- Using ProfileSkim, users were able to select and browse relevant passages more efficiently, because only the best matching passages needed to be explored; and
- Users found the tool satisfying to use for within-document retrieval, because of the overview provided by the relevance profile.

That study was based on a simulated work task, namely recreating part of the (back of book) index for an electronic book. On the basis of the results, we tentatively concluded that relevance profiling, in combination with a set of pre-defined index entries, might prove to be an effective replacement for the typical intellectually derived book index.

In this paper, we wish to investigate the underlying relevance profiling mechanism through controlled bench experiments. Specifically, we want to investigate different forms of the weighting function used for relevance profiling, and different settings for parameters of the process.
Additionally, we wanted to investigate whether relevance profiling could provide an alternative to the intellectually generated book index. Consequently, we based our experiments on an electronic book, with an existing book index, that formed the basis of the experimental corpus.

2 Relevance Profiling and Weighting Functions

Relevance profiling is a process whereby a profile of retrieval status values is computed across a document in response to a query. This profile is presented to the user in the form of a bar graph, and by interacting with this bar graph the user can identify, and navigate to, relevant sections of a document. In this paper each section, and hence each bar in the graph, corresponds to a page of the document (electronic book).

The process is sketched in object-oriented pseudo-code in Figure 1. Effectively, a retrieval status value (RSV) is computed for each word position in the document. The relevance profiling procedure operates by sliding a window of fixed size across the document (lines 7-18, Figure 1), and at each word position a window weighting function is applied (line 12). The weighting function is a function of the set of words in the document, the query and the window, starting at the given position and of the given size. The window weights are aggregated for each page of the document, and the maximum weight achieved by a window starting in the page is returned (lines 13-16). Note that a filter is applied so that only pages containing at least one term from the query are retrieved, i.e. assigned a score (line 10).

     1 process relevanceprofiling ( Document, Query, weightingFunction, WindowSize, scoresarray )
     2 begin
     3   for Page = 1 to Document.countPages()
     4   begin
     5     scoresarray[ Page ] = minimumwindowscore
     6   end
     7   for windowposition = 1 to Document.length()
     8   begin
     9     PageInDoc = Document.pageOf( windowposition )
    10     if ( Document.containsTermOf( Query, PageInDoc ) )
    11     begin
    12       windowweight = weightingFunction( Document, Query, windowposition, WindowSize )
    13       if ( windowweight > scoresarray[ PageInDoc ] )
    14       begin
    15         scoresarray[ PageInDoc ] = windowweight
    16       end
    17     end
    18   end
    19 end

Figure 1: Pseudo-code describing the relevance profiling process

The main purpose of the experimental study is to investigate the behaviour of the relevance profiling process, and specifically the effect on retrieval effectiveness of the choice of window size and weighting function.
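
The procedure of Figure 1 can be sketched in Python as follows. This is a minimal illustration rather than the ProfileSkim implementation: the names `relevance_profile` and `term_count` are hypothetical, the document is assumed to be supplied as a list of pages of already tokenised words, the weighting function is assumed to receive the window's words directly (instead of a position and a size), and windows near the end of the document are simply truncated.

```python
from typing import Callable, Sequence


def relevance_profile(
    pages: Sequence[Sequence[str]],
    query: Sequence[str],
    weighting_function: Callable[[Sequence[str], Sequence[str], Sequence[str]], float],
    window_size: int,
    minimum_window_score: float = float("-inf"),
) -> list[float]:
    """Return one score per page: the best weight of any window starting in it."""
    # Flatten the pages into one word sequence, remembering each word's page.
    document = [word for page in pages for word in page]
    page_of = [p for p, page in enumerate(pages) for _ in page]
    query_terms = set(query)
    # Filter (line 10 of Figure 1): only pages holding a query term are scored.
    has_query_term = [any(w in query_terms for w in page) for page in pages]

    scores = [minimum_window_score] * len(pages)
    for position in range(len(document)):
        page = page_of[position]
        if not has_query_term[page]:
            continue
        # A window may run past the end of its starting page (assumption).
        window = document[position:position + window_size]
        weight = weighting_function(document, query, window)
        if weight > scores[page]:
            scores[page] = weight
    return scores


def term_count(document, query, window):
    """Toy weighting function: number of query-term occurrences in the window."""
    terms = set(query)
    return float(sum(1 for w in window if w in terms))


pages = [
    "the cat sat on the mat".split(),
    "dogs chase cats".split(),
    "a cat and another cat".split(),
]
profile = relevance_profile(pages, ["cat"], term_count, window_size=4)
```

In this toy run the second page contains no query term (no stemming is applied, so "cats" does not match "cat") and therefore keeps the minimum score, which corresponds to not being retrieved.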

Before describing the experimental study, we introduce the various weighting functions that are investigated. The weighting functions are based on a language modelling approach in which we model a document as a probability distribution over the terms of the vocabulary, and similarly for each (sliding) window. We define the document model to be the probability distribution P(t|D), where for every term t in the vocabulary (in our case, of the document), the model gives the probability that we would observe term t if we randomly sampled from the document D. Similarly, we define the window model P(t|W) for window W. We will define three weighting functions (line 12, Figure 1), which we refer to as the query generation model (GEN), the Kullback-Leibler model (KL), and a simple term frequency model (FREQ).

2.1 Query Generation Model (GEN)

We adapt the language modelling approach proposed for document retrieval in [6] [7] for relevance profiling. Language modelling is used to construct a statistical model for a text window, and based on this model we compute the window RSV as the probability of generating a query (denoted Q). We model the distribution of terms (actually stemmed words) over a text window as a mixture of the text window and document term distributions, as follows:

\[ P(Q \mid W) = \prod_{t \in Q} p_{mix}(t \mid W) \tag{1} \]

where:

\[ p_{mix}(t \mid W) = \lambda \, p(t \mid W) + (1 - \lambda) \, p(t \mid D) \]

The estimates are smoothed by the document word statistics using the mixing parameter λ. (We have investigated the best value for the mixing parameter empirically, and similar results were obtained over a range of values beginning at 0.8. In the experiments reported, we use 0.8.) The individual word probabilities are estimated in the obvious way using maximum likelihood estimators:

\[ p(t \mid W) = \frac{n_{tW}}{n_W} \qquad p(t \mid D) = \frac{n_{tD}}{n_D} \tag{2} \]

where n_{tW} (n_{tD}) and n_W (n_D) are the number of occurrences of term t in the window (document), and the total number of term occurrences in the window (document), respectively. Equivalently, we can rank windows by taking the log of both sides of (1), yielding:

\[ \log P(Q \mid W) = \sum_{t \in Q} \log p_{mix}(t \mid W) \tag{3} \]
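
Equations 1 to 3 can be sketched in Python as below. This is an illustrative reading of the formulas, not the authors' code: the function name `gen_weight` and its argument layout are assumptions, words are assumed to be stemmed already, and a query term that occurs in neither the window nor the document is assumed to give a score of minus infinity.

```python
import math
from collections import Counter


def gen_weight(document, query, window, lam=0.8):
    """log P(Q|W) under a mixture of window and document term models (Eqs. 1-3)."""
    doc_counts, win_counts = Counter(document), Counter(window)
    n_d, n_w = len(document), len(window)
    log_prob = 0.0
    for term in query:
        # Maximum likelihood estimates of Equation 2.
        p_w = win_counts[term] / n_w if n_w else 0.0
        p_d = doc_counts[term] / n_d if n_d else 0.0
        # Mixture of Equation 1, with lam playing the role of lambda.
        p_mix = lam * p_w + (1.0 - lam) * p_d
        if p_mix == 0.0:
            return float("-inf")  # term unseen in both window and document
        log_prob += math.log(p_mix)
    return log_prob


document = "a b a c a b d d".split()
window = document[:4]  # the window "a b a c"
score = gen_weight(document, ["a", "b"], window)
```

Working in log space (Equation 3) avoids numerical underflow for longer queries and leaves the ranking of windows unchanged.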

2.2 Kullback-Leibler Model (KL)

The Kullback-Leibler divergence has been used extensively in applications of language modelling in information retrieval [8]. In this approach, we compute the Kullback-Leibler divergence between the window model and the document model, and we restrict the computation to the query terms, as follows:

\[ KL(W \,\|\, D) = \sum_{t \in Q} p(t \mid W) \, \log \frac{p(t \mid W)}{p(t \mid D)} \tag{4} \]

We use the strength of the divergence as a measure of window relevance with respect to a query. Intuitively, the larger the divergence of the window model from the (common) document model, the more likely the window is relevant to the query. Unlike the GEN model, the KL term weights have both a representation component (outside the log) and a discriminative component (the log term). We are unable to use maximum likelihood estimates for the component probabilities, as we must avoid estimating zero for p(t|W). We use the estimation rule attributed to Jeffreys [9, p. 127] to estimate the parameters as follows:

\[ p(t \mid W) = \frac{n_{tW} + 0.5}{n_W + 1.0} \qquad p(t \mid D) = \frac{n_{tD} + 0.5}{n_D + 1.0} \tag{5} \]

2.3 Term Frequency Weighting Model (FREQ)

In this approach, we simply weight the window by summing the total occurrences of all query terms within the window. This approach can be given a language modelling interpretation, which will be useful in our later analysis. Using the query generation approach, we compute the window RSV as the probability of generating any term of the query as follows:

\[ P(\text{any } Q \text{ term} \mid W) = \sum_{t \in Q} p(t \mid W) \tag{6} \]

This is equivalent to summing all the term occurrences in a window over the query terms.

3 Experimental Study

3.1 Research Questions

In this study we investigate three main research questions, namely:

- Is there an optimal window size for the sliding window, and is this optimal size dependent on the particular weighting function used?
- What is the relative effectiveness of the three weighting functions we have described?
- Could relevance profiling be used as an aid in generating a book index, through intellectual effort by a human, or indeed as a complete replacement for the book index?
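
For reference when comparing the weighting functions, the KL and FREQ models of Sections 2.2 and 2.3 (Equations 4 to 6) can be sketched in Python as follows. The function names are hypothetical and the code is only one plausible reading of the equations. In particular, `freq_weight` returns the raw count of query-term occurrences, which ranks windows of equal length in the same order as Equation 6.

```python
import math
from collections import Counter


def jeffreys(count, total):
    """Estimation rule of Equation 5: never returns zero."""
    return (count + 0.5) / (total + 1.0)


def kl_weight(document, query, window):
    """Kullback-Leibler divergence restricted to the query terms (Equation 4)."""
    doc_counts, win_counts = Counter(document), Counter(window)
    n_d, n_w = len(document), len(window)
    total = 0.0
    for term in query:
        p_w = jeffreys(win_counts[term], n_w)
        p_d = jeffreys(doc_counts[term], n_d)
        total += p_w * math.log(p_w / p_d)
    return total


def freq_weight(document, query, window):
    """Total occurrences of all query terms in the window (Section 2.3)."""
    win_counts = Counter(window)
    return float(sum(win_counts[term] for term in query))


document = "a b a c a b d d".split()
window = document[:4]  # the window "a b a c"
kl_score = kl_weight(document, ["a", "b"], window)
freq_score = freq_weight(document, ["a", "b"], window)
```

Both functions take the same arguments as a GEN-style weighting function would, so any of the three could be passed to a profiling loop unchanged.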

In respect of window size, we conjecture that smaller window sizes will tend to be precision-enhancing, as they are less likely to discover unrelated term occurrences, and that large window sizes will tend to be recall-enhancing. We investigate a range of window sizes from a few sentences in length (50 words) up to 2-3 paragraphs or half a page (250 words).

For the weighting functions, we conjecture that the simple term frequency weighting (FREQ) will perform worst, given that it exploits less of the available information than the other functions. Both the query generation approach and the Kullback-Leibler approach have proved effective in conventional document retrieval. However, in this application, we believe that the comparatively small size of the sample text (namely the samples of window text) is likely to be a major limiting factor, specifically in estimating probabilities. The simpler query generation model may prove more effective due to the smoothing provided by the mixture approach.

3.2 Corpus

We wish to investigate relevance profiling for within-document retrieval, and especially for long documents. In order to measure relative effectiveness, we need to establish a so-called ground truth for the experiments. We used van Rijsbergen's classic textbook, Information Retrieval [9], which is arguably long (191 pages). It is available in an electronic format, and it possesses a comprehensive book index. We had used this corpus successfully in previous end-user experiments [4] [5]. Here, we use the corpus in a similar way to a conventional test collection: the pages are the units of retrieval, the queries are provided by the book index entries, and the relevance assessments are the relevant pages associated with each index entry. We note that the book index was created by the book's author, and it is possible that the indexing is not exhaustive. Consequently, the absolute performance of the techniques we investigate is likely to be higher than we report.
However, the main purpose of the experiments is to compare techniques, and in this respect all are evaluated against the same ground truth. Table 1 summarises the main characteristics of the corpus. In the main, we conducted the experiments using the set of multi-term queries, as all the weighting functions are monotonic with respect to single term queries. We note that a high proportion of the multi-word queries are in fact phrasal queries rather than sets of words.

Table 1: Details of corpus: Information Retrieval textbook by van Rijsbergen [9]. Number of pages: 191. [Columns: index entry type (single word, multi-word); number of entries (queries); total relevant pages; average number of relevant pages per entry. The number of words and the cell values are not preserved. A number of entries were not used in the experiments, namely the "see also" entries.]

3.3 Experiment Design

The experiments were conducted as follows. Each query is used to generate a relevance profile for the document. Then, we simply rank the pages according to the retrieval status value, and evaluate the ranking based on the relevance judgements provided by the book index. In fact, this is not an ideal way of evaluating relevance profiling, which is designed to identify relevant groups (or clumps) of pages, and not simply individual pages. It would have been better to assess the effectiveness in identifying the same groups of pages identified in the book index. Given the availability of measures and tools for evaluating ranked output, we opted to do page ranking.

3.4 Measures

We used a range of single-valued measures averaged over the set of queries to assess effectiveness, namely: non-interpolated precision, R-precision (R-prec.), and three variations of the F-measure. R-precision is the precision achieved at rank R, where R is the number of relevant documents for a given query. This measure combines elements of both precision (the proportion of the R retrieved documents that are relevant) and recall (the proportion of the R relevant documents that are retrieved). The F-measure is simply the complement to the E-measure [9], and is given by:

\[ F = \frac{R \, P}{\alpha R + (1 - \alpha) P} \tag{7} \]

where R is recall (this is a different use of R, and we trust the use is clear from the context), P is precision, and α enables one to bias the measure in favour of precision (α -> 1) or recall (α -> 0). We use three variants with α set to 0.8 (precision-oriented), 0.5 (balanced harmonic mean of R and P), and 0.2 (recall-oriented).
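
The two less common measures, the F-measure of Equation 7 and R-precision, can be sketched in Python as follows. The function names are hypothetical, and returning 0.0 when the denominator is zero or when a query has no relevant pages is an assumption made here for convenience.

```python
def f_measure(precision, recall, alpha):
    """F = R*P / (alpha*R + (1 - alpha)*P), as in Equation 7."""
    denominator = alpha * recall + (1.0 - alpha) * precision
    return 0.0 if denominator == 0.0 else (recall * precision) / denominator


def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant pages."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for page in ranking[:r] if page in relevant) / r


balanced = f_measure(precision=0.5, recall=1.0, alpha=0.5)
precision_only = f_measure(precision=0.5, recall=1.0, alpha=1.0)
recall_only = f_measure(precision=0.5, recall=1.0, alpha=0.0)
r_prec = r_precision(ranking=[3, 7, 1, 9], relevant={7, 9})
```

The limiting cases make the role of α concrete: with α = 1 the measure reduces to precision, with α = 0 it reduces to recall, and with α = 0.5 it is the harmonic mean of the two.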
To compute F-values for each query, we compute precision and recall at each different score in the ranking (note, not at each rank position), and then compute the optimal F-values for the different settings of α. Statistical significance testing was performed using the sign test, and p-values less than 0.05 were considered significant.

4 Experiments

We investigate each research question in turn, and present and discuss the results. Given the possible interaction between window size and weighting function, we decided to explore the window size initially with the query generation model (GEN), and we later confirmed the validity of the findings for the other two weighting functions.

4.1 Window Size Experiments

In the first set of experiments, we investigated the effect of the size of the sliding window on the performance achievable with the query generation model (GEN). The size of the window was varied from 50 words through 200 words in steps of 25, and a window of 250 words was also tested. The experiment was run with two sets of queries: those containing more than one word (MULTI), and those comprising a single word (SINGLE). In the context of document skimming, we think it likely that multi-word queries will be more usual than single word queries. However, we were also interested in the performance achievable with single word queries, as they form a significant proportion of entries in our book index. If ProfileSkim were used as a replacement for the book index, then it would have to provide good performance for this type of query.

We present the window size results in Tables 2 and 3, where the bold figures are the best values achieved for each measure, and the italicised figures are values close to the best. The results for the multi-term queries (Table 2) show that for F-measures, the results are broadly comparable over a wide range of window sizes from 50 through 250. Interestingly, the recall-oriented (α=0.2) results are better than the other results, indicating that relevance profiling may be a recall-oriented device.
For the (averaged) non-interpolated precision and for R-Precision, the results are better for the smaller window sizes (50, 75), and significantly so compared with the results for window sizes 100 and up.
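
The per-query evaluation described in Section 3.4 (an optimal F-value taken over cut-offs at each distinct score, followed by a sign test across queries) can be sketched in Python as below. The names are hypothetical. Two assumptions are made that the text does not settle: `scored_pages` holds only the pages that passed the query-term filter, and the sign test is the two-sided exact version with tied pairs dropped.

```python
from math import comb


def optimal_f(scored_pages, relevant, alpha):
    """Best F over cut-offs placed at each distinct score (ties kept together)."""
    if not relevant:
        return 0.0
    best = 0.0
    for threshold in sorted({score for _, score in scored_pages}, reverse=True):
        retrieved = {page for page, score in scored_pages if score >= threshold}
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved)
        recall = hits / len(relevant)
        denominator = alpha * recall + (1.0 - alpha) * precision
        if denominator > 0.0:
            best = max(best, recall * precision / denominator)
    return best


def sign_test_p_value(values_a, values_b):
    """Two-sided exact sign test on paired per-query values; ties are dropped."""
    wins_a = sum(1 for a, b in zip(values_a, values_b) if a > b)
    wins_b = sum(1 for a, b in zip(values_a, values_b) if a < b)
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


scored = [(1, 0.9), (2, 0.9), (3, 0.5), (4, 0.1)]  # (page, RSV) pairs
best_f = optimal_f(scored, relevant={1, 3}, alpha=0.5)
p_value = sign_test_p_value([0.6, 0.7, 0.8, 0.9, 0.5, 0.4],
                            [0.5, 0.6, 0.7, 0.8, 0.4, 0.4])
```

Cutting at distinct scores rather than at rank positions means that pages with tied scores are always retrieved together, so the result does not depend on an arbitrary ordering of ties.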

Table 2: Averaged effectiveness of query generation weighting (GEN) for 232 multi-term queries for different window sizes. [Columns: window size; Prec.; R-prec.; F (α=0.8); F (α=0.5); F (α=0.2). Cell values not preserved.]

The performance for the single term queries (Table 3) is reduced by 5-10% compared with the multi-term queries, except for the R-Precision measure. Interestingly, for all measures, the maximum effectiveness is achieved at a window size of 200.

Table 3: Averaged effectiveness of query generation weighting (GEN) for 46 single term queries for different window sizes. [Columns: window size; Prec.; R-prec.; F (α=0.8); F (α=0.5); F (α=0.2). Cell values not preserved.]

The results of the window size experiments demonstrate the robust nature of the relevance profiling process, in that high and broadly comparable levels of performance are achieved across a range of window sizes. However, statistically speaking, small window sizes (50, 75) yield significantly better results for non-interpolated precision and R-precision, for multi-term queries. Nevertheless, it is clear that relevance profiling works well for windows of a few sentences in length up to half a page. For single term queries, the best effectiveness is achieved for larger window sizes (200, 250). A possible explanation for this is that, in order to establish relevance for a short query, we may require a larger sample of text.

Contrary to expectations, small window sizes did not seem to be precision-enhancing, nor large window sizes recall-enhancing. If one looks in detail at the relevance profiling process (Figure 1), one can observe both precision- and recall-enhancing devices. By insisting that pages can only achieve a score if at least one query term is actually present in the page, we are effectively boosting precision. On the other hand, the fact that any window beginning in a page can contribute an RSV to that page is a recall-enhancing device. We would argue that these devices operate to mitigate the effects of window size, and particularly the possible deleterious effect of large window sizes on precision.

4.2 Weighting Function Experiments

In this second set of experiments, we investigate the comparative effectiveness of the three weighting functions described in Section 2. In Table 4, we report the best levels of effectiveness achieved for each weighting function, where the window size is chosen optimally for each function (and in some cases measure).

Table 4: Averaged effectiveness of three weighting functions for 232 multi-term queries (using optimal window setting for each function). [Standard results. Columns: GEN; KL; FREQ. Rows: window size; average precision; average R-precision; F (α=0.8); F (α=0.5); F (α=0.2). Cell values not preserved.]

Table 5: Averaged effectiveness of three weighting functions with co-ordination level filtering for 232 multi-term queries (using optimal window setting for each function). [Results with co-ordination filtering. Columns: GEN; KL; FREQ. Rows: window size (unless stated otherwise); average precision; average R-precision; F (α=0.8); F (α=0.5); F (α=0.2). Cell values not preserved.]

Let us consider the standard results first. Clearly, the performance of the query generation model (GEN) exceeds that of the other two weighting functions by a large and statistically significant margin. It is approximately 10%-15% better than the Kullback-Leibler model, and above 20% better than the term frequency weighting model. Although KL weighting outperforms FREQ, it is not significantly better.

It would appear that the ostensibly simple query generation model is better than the arguably more information-rich KL model and, as we surmised earlier, the problem of estimating parameters from small samples may be important in relevance profiling. If we look at the individual term weights in the KL model (eqn 4), it is clear that within the log term, p(t|D) << p(t|W), and thus the contribution of p(t|D) dominates the expression. Given the nature of a book index, we believe that many terms in the index will be of similar frequency, and thus P(t|D) will be both small and relatively constant for query terms. Consequently, the term outside the log, namely p(t|W), will provide the overall shape of the weighting function, and this is the same basis as for the term frequency weighting (eqn 6). Hence, the performance of KL and FREQ should be, and indeed is, broadly similar.

How then do we explain why the performance of GEN is so much better than that of the other two weighting functions? We think it is the treatment of the non-occurrence of query terms in a window that is the distinguishing hallmark of GEN. In GEN, if a query term does not appear in the window, then the weighting function penalises this harshly. Consider that in the probability mixture (eqn 1), the contribution from the document model will always be very small. In the other two functions, by contrast, the weighting is more or less linear in the number of term occurrences.

To test this conjecture, we ran a third set of experiments, in which we added an additional filter, which we dubbed the coordination level filter.
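
The contrast between GEN and term frequency weighting on a missing query term can be made concrete with a small Python illustration. All numbers here are hypothetical (a 100-word window, a two-term query, and document probabilities of 0.001 for each term), and the helper names are not from the paper. One window contains each query term once, the other contains one term twice and the second term not at all.

```python
import math


def gen_log_score(window_counts, window_len, doc_probs, lam=0.8):
    """Sum over query terms of log(lam * p(t|W) + (1 - lam) * p(t|D))."""
    return sum(
        math.log(lam * window_counts.get(t, 0) / window_len + (1.0 - lam) * p_d)
        for t, p_d in doc_probs.items()
    )


def freq_score(window_counts, doc_probs):
    """Total occurrences of the query terms in the window."""
    return sum(window_counts.get(t, 0) for t in doc_probs)


doc_probs = {"relevance": 0.001, "feedback": 0.001}  # hypothetical p(t|D)
both_terms = {"relevance": 1, "feedback": 1}         # each query term once
one_term = {"relevance": 2}                          # second term missing
window_len = 100

gen_both = gen_log_score(both_terms, window_len, doc_probs)
gen_one = gen_log_score(one_term, window_len, doc_probs)
freq_both = freq_score(both_terms, doc_probs)
freq_one = freq_score(one_term, doc_probs)
```

Under these assumed numbers the term frequency scores are identical (two occurrences each), whereas GEN rates the window containing both terms as roughly twenty times more likely to generate the query, because the missing term is backed only by the small document probability.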

In these experiments we insist that all query terms are present in a window (or coordinated, as it used to be known) before a retrieval status value is calculated for that window. The results of this third set of experiments are presented in Table 5. In some cases, there was a different optimal window size for some measures, and this is included in brackets with the measure value.

We expected that co-ordination filtering would reduce the number of pages retrieved, and hence would be likely to depress recall and possibly boost precision. For GEN, performance was reduced by about 5-10% for all measures. However, the co-ordination filtering boosted the performance of both KL and FREQ to levels comparable to the filtered GEN performance. This provides strong evidence that it is the treatment of the non-occurrence of query terms which is the key to the performance of the unfiltered (or standard) query generation model. It may be that the performance of the KL model could be improved by experimenting with the parameter estimation rules. Interestingly, incorporating co-ordination filtering results in generally larger optimal window sizes for GEN and, to some extent, KL.

4.3 Relevance Profiling for Book Indexing

Here, we draw together the evidence that relevance profiling might be used as an aid in generating a book index, namely through intellectual effort by a human, or indeed as a complete replacement for the book index. First, the absolute effectiveness achieved when using the query generation model was very high, for both the multi-term and single term sets of queries. For the multi-term queries (Table 2), it achieved an optimal F-value (α=0.5) at which, on average, both Precision and Recall reached levels of around 70%. Moreover, the R-precision value of 0.58 is quite high, and means that if a user inspects down to the rank corresponding to the exact number of relevant pages for each query, then on average 58% of these relevant pages will be retrieved. High recall is clearly important in the context of book indexing.
We also computed the number of queries that achieved recall of 100% at a cutoff of 20 retrieved pages. For GEN (window size = 75), 175 of the 232 multi-term queries achieved maximum recall, and only 9 of the 232 queries failed to retrieve any relevant documents. Further, GEN (window size = 75) retrieves 587 of the possible 766 relevant pages for the multi-term queries, and 100 of the possible 144 relevant pages for the single term queries.

It seems reasonable to conclude that a tool such as ProfileSkim, which implements relevance profiling, would prove a useful aid to a human indexer. Indeed, given a set of index entries, relevance profiling might even replace the traditional book index. It is worth noting that careful inspection of the existing book index showed conclusively that the index is not completely exhaustive. In particular, there is inconsistent treatment of page ranges, where frequently not all relevant pages in a range are included in the index. Relevance profiling was able to reliably identify many of these unidentified relevant pages.

5 Conclusions and Future Work

In this paper we have reported a series of bench experiments on relevance profiling. The experiments were conducted with an electronic book and its associated book index. We investigated three alternative weighting functions, and various parameters of the relevance profiling process, most notably window size. The major findings of the experiments are:

- relevance profiling is a robust process yielding acceptably high levels of performance over a range of window sizes from 50 to 250 words in length;
- the query generation model (GEN) is significantly more effective than the Kullback-Leibler (KL) and the term frequency (FREQ) models;
- a key factor in the effectiveness of the query generation model is its treatment of non-occurrences of query terms in the sliding window, which we attribute to the smoothing provided by the probability mixture model; and
- the absolute performance of relevance profiling indicates that it might be usefully employed as an aid to human book indexing, and indeed may provide a viable alternative to the book index (providing that a good set of index entries can be generated).

We would emphasise that these experiments have been conducted with a single book, and associated index, and it would be highly desirable to repeat the experiments with a number of such books. Nevertheless, we believe that the general conclusions would likely be confirmed by such experiments, although the absolute levels of effectiveness may differ, depending on the exhaustivity of the (human) indexing.

The results reported here suggest a number of possible avenues for future work. First, given the evident, and indeed widely recognised, importance of parameter estimation and smoothing in language modelling applications, and given the specialised nature of relevance profiling, further experiments should be conducted on both GEN and KL using different parameter estimation rules or smoothing approaches. Second, all the weighting functions we investigated assumed that query terms are distributed independently. Given that, in general, this is unlikely to be the case, and especially for phrasal queries, it would be desirable to investigate models based on term dependence. However, we would suggest that only first-order models be investigated, given the paucity of sample data.
Finally, it would be interesting to explore how to generate useful queries automatically, where by useful we mean the query identifies coherent chunks of relevant text. We note that such queries would not necessarily be just phrasal. This would pave the way for both automating the process of book indexing and, perhaps more tantalizingly, enabling query-less retrieval within documents.

Acknowledgements

This work was done within the Smart Web Technologies Centre that was established through a Scottish Higher Education Funding Council (SHEFC) Research Development Grant (No. HR01007). David Lee was supported by a Summer Studentship provided by The Robert Gordon University. We thank the following colleagues for their feedback on this work: Ian Pirie, Ivan Koychev, Bicheng Liu, Ralf Bierig and Murat Yakici. I also thank Margery Murray for her invaluable assistance in preparing the manuscript. Finally, we would like to thank the anonymous referees for their insightful comments on the submitted paper.

6 References

[1] Kaszkiel, M. and Zobel, J.: Passage Retrieval Revisited. In: Proceedings of the Twentieth International ACM-SIGIR Conference on Research and Development in Information Retrieval, Philadelphia, July. ACM Press.
[2] Hearst, M. A.: TileBars: visualization of term distribution information in full text information access. Proc. CHI'95, 56-66.
[3] Harper, D. J., Coulthard, S., and Sun, Y.: A language modelling approach to relevance profiling for document browsing. In: Proceedings of the Joint Conference on Digital Libraries, pages 76-83, Oregon, USA, July.
[4] Harper, D. J., Koychev, I. and Sun, Y.: Query-based document skimming: A user-centred evaluation of relevance profiling. In: Proceedings of 25th European Conference on Information Retrieval. Lecture Notes in Computer Science, Springer-Verlag, Berlin.
[5] Harper, D. J., Koychev, I., Sun, Y. and Pirie, I.: Within-Document Retrieval: A User-Centred Evaluation of Relevance Profiling. In: Information Retrieval, Kluwer Academic Publishers, The Netherlands, 7.
[6] Ponte, J. and Croft, W. B.: A language modeling approach to information retrieval. In: Proceedings of the 1998 ACM SIGIR Conference on Research and Development in Information Retrieval.
[7] Song, F. and Croft, W. B.: A general language model for information retrieval. In: Proceedings of the 1999 ACM SIGIR Conference on Research and Development in Information Retrieval, 1999.
[8] Croft, W. B. and Lafferty, J. (Eds.): Language modeling for information retrieval. Kluwer Academic Publishers, The Netherlands.
[9] Van Rijsbergen, K.: Information Retrieval. Butterworths, London, 1979.


More information

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method

More information

Chapter 6. Supplemental Text Material

Chapter 6. Supplemental Text Material Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.

More information

Statistics II Final Exam 26/6/18

Statistics II Final Exam 26/6/18 Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the

More information

Appendix B: Resampling Algorithms

Appendix B: Resampling Algorithms 407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles

More information

/ n ) are compared. The logic is: if the two

/ n ) are compared. The logic is: if the two STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence

More information

Winter 2008 CS567 Stochastic Linear/Integer Programming Guest Lecturer: Xu, Huan

Winter 2008 CS567 Stochastic Linear/Integer Programming Guest Lecturer: Xu, Huan Wnter 2008 CS567 Stochastc Lnear/Integer Programmng Guest Lecturer: Xu, Huan Class 2: More Modelng Examples 1 Capacty Expanson Capacty expanson models optmal choces of the tmng and levels of nvestments

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

On the correction of the h-index for career length

On the correction of the h-index for career length 1 On the correcton of the h-ndex for career length by L. Egghe Unverstet Hasselt (UHasselt), Campus Depenbeek, Agoralaan, B-3590 Depenbeek, Belgum 1 and Unverstet Antwerpen (UA), IBW, Stadscampus, Venusstraat

More information

Topic 23 - Randomized Complete Block Designs (RCBD)

Topic 23 - Randomized Complete Block Designs (RCBD) Topc 3 ANOVA (III) 3-1 Topc 3 - Randomzed Complete Block Desgns (RCBD) Defn: A Randomzed Complete Block Desgn s a varant of the completely randomzed desgn (CRD) that we recently learned. In ths desgn,

More information

Boostrapaggregating (Bagging)

Boostrapaggregating (Bagging) Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

Thermodynamics and statistical mechanics in materials modelling II

Thermodynamics and statistical mechanics in materials modelling II Course MP3 Lecture 8/11/006 (JAE) Course MP3 Lecture 8/11/006 Thermodynamcs and statstcal mechancs n materals modellng II A bref résumé of the physcal concepts used n materals modellng Dr James Ellott.1

More information

Computing MLE Bias Empirically

Computing MLE Bias Empirically Computng MLE Bas Emprcally Kar Wa Lm Australan atonal Unversty January 3, 27 Abstract Ths note studes the bas arses from the MLE estmate of the rate parameter and the mean parameter of an exponental dstrbuton.

More information

Chapter 5 Multilevel Models

Chapter 5 Multilevel Models Chapter 5 Multlevel Models 5.1 Cross-sectonal multlevel models 5.1.1 Two-level models 5.1.2 Multple level models 5.1.3 Multple level modelng n other felds 5.2 Longtudnal multlevel models 5.2.1 Two-level

More information

Basically, if you have a dummy dependent variable you will be estimating a probability.

Basically, if you have a dummy dependent variable you will be estimating a probability. ECON 497: Lecture Notes 13 Page 1 of 1 Metropoltan State Unversty ECON 497: Research and Forecastng Lecture Notes 13 Dummy Dependent Varable Technques Studenmund Chapter 13 Bascally, f you have a dummy

More information

Retrieval Models: Language models

Retrieval Models: Language models CS-590I Informaton Retreval Retreval Models: Language models Luo S Department of Computer Scence Purdue Unversty Introducton to language model Ungram language model Document language model estmaton Maxmum

More information

Limited Dependent Variables

Limited Dependent Variables Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

This column is a continuation of our previous column

This column is a continuation of our previous column Comparson of Goodness of Ft Statstcs for Lnear Regresson, Part II The authors contnue ther dscusson of the correlaton coeffcent n developng a calbraton for quanttatve analyss. Jerome Workman Jr. and Howard

More information

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6 Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.

More information

x i1 =1 for all i (the constant ).

x i1 =1 for all i (the constant ). Chapter 5 The Multple Regresson Model Consder an economc model where the dependent varable s a functon of K explanatory varables. The economc model has the form: y = f ( x,x,..., ) xk Approxmate ths by

More information

Evaluation for sets of classes

Evaluation for sets of classes Evaluaton for Tet Categorzaton Classfcaton accuracy: usual n ML, the proporton of correct decsons, Not approprate f the populaton rate of the class s low Precson, Recall and F 1 Better measures 21 Evaluaton

More information

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016 CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng

More information

The Second Anti-Mathima on Game Theory

The Second Anti-Mathima on Game Theory The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Credit Card Pricing and Impact of Adverse Selection

Credit Card Pricing and Impact of Adverse Selection Credt Card Prcng and Impact of Adverse Selecton Bo Huang and Lyn C. Thomas Unversty of Southampton Contents Background Aucton model of credt card solctaton - Errors n probablty of beng Good - Errors n

More information

Lecture 6: Introduction to Linear Regression

Lecture 6: Introduction to Linear Regression Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

THE SUMMATION NOTATION Ʃ

THE SUMMATION NOTATION Ʃ Sngle Subscrpt otaton THE SUMMATIO OTATIO Ʃ Most of the calculatons we perform n statstcs are repettve operatons on lsts of numbers. For example, we compute the sum of a set of numbers, or the sum of the

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

January Examinations 2015

January Examinations 2015 24/5 Canddates Only January Examnatons 25 DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR STUDENT CANDIDATE NO.. Department Module Code Module Ttle Exam Duraton (n words)

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Statistics for Economics & Business

Statistics for Economics & Business Statstcs for Economcs & Busness Smple Lnear Regresson Learnng Objectves In ths chapter, you learn: How to use regresson analyss to predct the value of a dependent varable based on an ndependent varable

More information

Quantitative Discrimination of Effective Porosity Using Digital Image Analysis - Implications for Porosity-Permeability Transforms

Quantitative Discrimination of Effective Porosity Using Digital Image Analysis - Implications for Porosity-Permeability Transforms 2004, 66th EAGE Conference, Pars Quanttatve Dscrmnaton of Effectve Porosty Usng Dgtal Image Analyss - Implcatons for Porosty-Permeablty Transforms Gregor P. Eberl 1, Gregor T. Baechle 1, Ralf Weger 1,

More information

Uncertainty as the Overlap of Alternate Conditional Distributions

Uncertainty as the Overlap of Alternate Conditional Distributions Uncertanty as the Overlap of Alternate Condtonal Dstrbutons Olena Babak and Clayton V. Deutsch Centre for Computatonal Geostatstcs Department of Cvl & Envronmental Engneerng Unversty of Alberta An mportant

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor Taylor Enterprses, Inc. Control Lmts for P Charts Copyrght 2017 by Taylor Enterprses, Inc., All Rghts Reserved. Control Lmts for P Charts Dr. Wayne A. Taylor Abstract: P charts are used for count data

More information

= z 20 z n. (k 20) + 4 z k = 4

= z 20 z n. (k 20) + 4 z k = 4 Problem Set #7 solutons 7.2.. (a Fnd the coeffcent of z k n (z + z 5 + z 6 + z 7 + 5, k 20. We use the known seres expanson ( n+l ( z l l z n below: (z + z 5 + z 6 + z 7 + 5 (z 5 ( + z + z 2 + z + 5 5

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2017 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2017 Instructor: Victor Aguirregabiria ECOOMETRICS II ECO 40S Unversty of Toronto Department of Economcs Wnter 07 Instructor: Vctor Agurregabra SOLUTIO TO FIAL EXAM Tuesday, Aprl 8, 07 From :00pm-5:00pm 3 hours ISTRUCTIOS: - Ths s a closed-book

More information

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper Games of Threats Elon Kohlberg Abraham Neyman Workng Paper 18-023 Games of Threats Elon Kohlberg Harvard Busness School Abraham Neyman The Hebrew Unversty of Jerusalem Workng Paper 18-023 Copyrght 2017

More information

A Hybrid Variational Iteration Method for Blasius Equation

A Hybrid Variational Iteration Method for Blasius Equation Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 1 (June 2015), pp. 223-229 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) A Hybrd Varatonal Iteraton Method

More information

Cathy Walker March 5, 2010

Cathy Walker March 5, 2010 Cathy Walker March 5, 010 Part : Problem Set 1. What s the level of measurement for the followng varables? a) SAT scores b) Number of tests or quzzes n statstcal course c) Acres of land devoted to corn

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

STATS 306B: Unsupervised Learning Spring Lecture 10 April 30

STATS 306B: Unsupervised Learning Spring Lecture 10 April 30 STATS 306B: Unsupervsed Learnng Sprng 2014 Lecture 10 Aprl 30 Lecturer: Lester Mackey Scrbe: Joey Arthur, Rakesh Achanta 10.1 Factor Analyss 10.1.1 Recap Recall the factor analyss (FA) model for lnear

More information

Temperature. Chapter Heat Engine

Temperature. Chapter Heat Engine Chapter 3 Temperature In prevous chapters of these notes we ntroduced the Prncple of Maxmum ntropy as a technque for estmatng probablty dstrbutons consstent wth constrants. In Chapter 9 we dscussed the

More information

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for U Charts. Dr. Wayne A. Taylor

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for U Charts. Dr. Wayne A. Taylor Taylor Enterprses, Inc. Adjusted Control Lmts for U Charts Copyrght 207 by Taylor Enterprses, Inc., All Rghts Reserved. Adjusted Control Lmts for U Charts Dr. Wayne A. Taylor Abstract: U charts are used

More information

UNR Joint Economics Working Paper Series Working Paper No Further Analysis of the Zipf Law: Does the Rank-Size Rule Really Exist?

UNR Joint Economics Working Paper Series Working Paper No Further Analysis of the Zipf Law: Does the Rank-Size Rule Really Exist? UNR Jont Economcs Workng Paper Seres Workng Paper No. 08-005 Further Analyss of the Zpf Law: Does the Rank-Sze Rule Really Exst? Fungsa Nota and Shunfeng Song Department of Economcs /030 Unversty of Nevada,

More information

Tracking with Kalman Filter

Tracking with Kalman Filter Trackng wth Kalman Flter Scott T. Acton Vrgna Image and Vdeo Analyss (VIVA), Charles L. Brown Department of Electrcal and Computer Engneerng Department of Bomedcal Engneerng Unversty of Vrgna, Charlottesvlle,

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

Density matrix. c α (t)φ α (q)

Density matrix. c α (t)φ α (q) Densty matrx Note: ths s supplementary materal. I strongly recommend that you read t for your own nterest. I beleve t wll help wth understandng the quantum ensembles, but t s not necessary to know t n

More information

Inductance Calculation for Conductors of Arbitrary Shape

Inductance Calculation for Conductors of Arbitrary Shape CRYO/02/028 Aprl 5, 2002 Inductance Calculaton for Conductors of Arbtrary Shape L. Bottura Dstrbuton: Internal Summary In ths note we descrbe a method for the numercal calculaton of nductances among conductors

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

Semi-supervised Classification with Active Query Selection

Semi-supervised Classification with Active Query Selection Sem-supervsed Classfcaton wth Actve Query Selecton Jao Wang and Swe Luo School of Computer and Informaton Technology, Beng Jaotong Unversty, Beng 00044, Chna Wangjao088@63.com Abstract. Labeled samples

More information

Bayesian predictive Configural Frequency Analysis

Bayesian predictive Configural Frequency Analysis Psychologcal Test and Assessment Modelng, Volume 54, 2012 (3), 285-292 Bayesan predctve Confgural Frequency Analyss Eduardo Gutérrez-Peña 1 Abstract Confgural Frequency Analyss s a method for cell-wse

More information

Gaussian Mixture Models

Gaussian Mixture Models Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

Explaining the Stein Paradox

Explaining the Stein Paradox Explanng the Sten Paradox Kwong Hu Yung 1999/06/10 Abstract Ths report offers several ratonale for the Sten paradox. Sectons 1 and defnes the multvarate normal mean estmaton problem and ntroduces Sten

More information

Pop-Click Noise Detection Using Inter-Frame Correlation for Improved Portable Auditory Sensing

Pop-Click Noise Detection Using Inter-Frame Correlation for Improved Portable Auditory Sensing Advanced Scence and Technology Letters, pp.164-168 http://dx.do.org/10.14257/astl.2013 Pop-Clc Nose Detecton Usng Inter-Frame Correlaton for Improved Portable Audtory Sensng Dong Yun Lee, Kwang Myung Jeon,

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours

UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours UNIVERSITY OF TORONTO Faculty of Arts and Scence December 005 Examnatons STA47HF/STA005HF Duraton - hours AIDS ALLOWED: (to be suppled by the student) Non-programmable calculator One handwrtten 8.5'' x

More information

Conjugacy and the Exponential Family

Conjugacy and the Exponential Family CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the

More information

DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR. Introductory Econometrics 1 hour 30 minutes

DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR. Introductory Econometrics 1 hour 30 minutes 25/6 Canddates Only January Examnatons 26 Student Number: Desk Number:...... DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR Department Module Code Module Ttle Exam Duraton

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming EEL 6266 Power System Operaton and Control Chapter 3 Economc Dspatch Usng Dynamc Programmng Pecewse Lnear Cost Functons Common practce many utltes prefer to represent ther generator cost functons as sngle-

More information

Annexes. EC.1. Cycle-base move illustration. EC.2. Problem Instances

Annexes. EC.1. Cycle-base move illustration. EC.2. Problem Instances ec Annexes Ths Annex frst llustrates a cycle-based move n the dynamc-block generaton tabu search. It then dsplays the characterstcs of the nstance sets, followed by detaled results of the parametercalbraton

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 6 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutons to assst canddates preparng for the eamnatons n future years and for

More information

Bayesian Planning of Hit-Miss Inspection Tests

Bayesian Planning of Hit-Miss Inspection Tests Bayesan Plannng of Ht-Mss Inspecton Tests Yew-Meng Koh a and Wllam Q Meeker a a Center for Nondestructve Evaluaton, Department of Statstcs, Iowa State Unversty, Ames, Iowa 5000 Abstract Although some useful

More information

Economics 130. Lecture 4 Simple Linear Regression Continued

Economics 130. Lecture 4 Simple Linear Regression Continued Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do

More information

3.1 ML and Empirical Distribution

3.1 ML and Empirical Distribution 67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

CSC 411 / CSC D11 / CSC C11

CSC 411 / CSC D11 / CSC C11 18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t

More information

DUE: WEDS FEB 21ST 2018

DUE: WEDS FEB 21ST 2018 HOMEWORK # 1: FINITE DIFFERENCES IN ONE DIMENSION DUE: WEDS FEB 21ST 2018 1. Theory Beam bendng s a classcal engneerng analyss. The tradtonal soluton technque makes smplfyng assumptons such as a constant

More information

MDL-Based Unsupervised Attribute Ranking

MDL-Based Unsupervised Attribute Ranking MDL-Based Unsupervsed Attrbute Rankng Zdravko Markov Computer Scence Department Central Connectcut State Unversty New Brtan, CT 06050, USA http://www.cs.ccsu.edu/~markov/ markovz@ccsu.edu MDL-Based Unsupervsed

More information

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence

More information

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced,

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced, FREQUENCY DISTRIBUTIONS Page 1 of 6 I. Introducton 1. The dea of a frequency dstrbuton for sets of observatons wll be ntroduced, together wth some of the mechancs for constructng dstrbutons of data. Then

More information

Least squares cubic splines without B-splines S.K. Lucas

Least squares cubic splines without B-splines S.K. Lucas Least squares cubc splnes wthout B-splnes S.K. Lucas School of Mathematcs and Statstcs, Unversty of South Australa, Mawson Lakes SA 595 e-mal: stephen.lucas@unsa.edu.au Submtted to the Gazette of the Australan

More information