On the Effectiveness of Relevance Profiling
David J Harper
Smart Web Technologies Centre, The Robert Gordon University, Aberdeen AB25 1HG, UK
d.harper@rgu.ac.uk

David Lee
Smart Web Technologies Centre, The Robert Gordon University, Aberdeen AB25 1HG, UK
david.lee@smartweb.rgu.ac.uk

Abstract

Relevance profiling is a general process for within-document retrieval. Given a query, a profile of retrieval status values is computed by sliding a fixed-size window across a document. In this paper, we report a series of bench experiments on relevance profiling, using an existing electronic book and its associated book index. The book index is the source of queries and relevance judgements for the experiments. Three weighting functions based on a language modelling approach are investigated, and we demonstrate that the well-known query generation model outperforms one based on the Kullback-Leibler divergence, and one based on simple term frequency. The relevance profiling process proved highly effective in retrieving relevant pages within the electronic book, and exhibits stable performance over a range of sliding window sizes. The experimental study provides evidence for the effectiveness of relevance profiling for within-document retrieval, with the caveat that the experiment was conducted with a particular electronic book.

Keywords: relevance profiling; within-document retrieval; language modelling; information retrieval experimentation.

Proceedings of the 9th Australasian Document Computing Symposium, Melbourne, Australia, December 13. Copyright for this article remains with the authors.

1 Background

Increasingly, long electronic documents are being published and delivered using web and other technologies, and users need to find information within these long documents. Various approaches have been proposed for within-document retrieval, including passage retrieval [1], and user interfaces supporting content-based browsing of documents [2].
We have developed a tool, ProfileSkim, for within-document retrieval based on the concept of relevance profiling [3], and we reported on a comprehensive user-centred evaluation of this tool [4] [5]. ProfileSkim integrates passage retrieval and content-based document browsing. The key concept underpinning the tool is relevance profiling, in which a profile of retrieval status values is computed across a document in response to a query. Within the user interface, an interactive bar graph provides an overview of this profile, and through interaction with the graph the user can select and browse in situ potentially relevant passages within the document. The reader is referred to [3] for a description of, and detailed rationale for, relevance profiling and the ProfileSkim user interface.

In the user evaluation study [4] [5], we compared the performance of ProfileSkim against a tool based on the sequential Find command delivered with most browsers and word processing packages. We concluded that:
- Relevance profiling, as implemented and presented by ProfileSkim, was effective in enabling users to identify relevant passages of a document;
- Using ProfileSkim, users were able to select and browse relevant passages more efficiently, because only the best matching passages needed to be explored; and
- Users found the tool satisfying to use for within-document retrieval, because of the overview provided by the relevance profile.

That study was based on a simulated work task, namely recreating part of the (back of book) index for an electronic book. On the basis of the results, we tentatively concluded that relevance profiling, in combination with a set of pre-defined index entries, might prove to be an effective replacement for the typical intellectually derived book index.

In this paper, we wish to investigate the underlying relevance profiling mechanism through controlled bench experiments. Specifically, we want to investigate different forms of the weighting function used for relevance profiling, and different settings for parameters of the process.
Additionally, we wanted to investigate whether relevance profiling could provide an alternative to the intellectually generated book index. Consequently, we based our experiments on an electronic book with an existing book index, which formed the basis of the experimental corpus.
2 Relevance Profiling and Weighting Functions

Relevance profiling is a process whereby a profile of retrieval status values is computed across a document in response to a query. This profile is presented to the user in the form of a bar graph, and by interacting with this bar graph the user can identify, and navigate to, relevant sections of a document. In this paper each section, and hence each bar in the graph, corresponds to a page of the document (electronic book).

The process is sketched in object-oriented pseudo-code in Figure 1. Effectively, a retrieval status value (RSV) is computed for each word position in the document. The relevance profiling procedure operates by sliding a window of fixed size across the document (lines 7-18, Figure 1), and at each word position a window weighting function is applied (line 12). The weighting function is a function of the set of words in the document, the query, and the window starting at the given position and of the given size. The window weights are aggregated for each page of the document, and the maximum weight achieved by a window starting in the page is returned (lines 13-16). Note that a filter is applied so that only pages containing at least one term from the query are retrieved, i.e. assigned a score (line 10).

     1  process relevanceprofiling ( Document, Query, weightingFunction, WindowSize, scoresarray )
     2  begin
     3    for Page = 1 to Document.countPages()
     4    begin
     5      scoresarray[ Page ] = minimumwindowscore
     6    end
     7    for windowposition = 1 to Document.length()
     8    begin
     9      PageInDoc = Document.pageOf( windowposition )
    10      if ( Document.containsTermOf( Query, PageInDoc ) )
    11      begin
    12        windowweight = weightingFunction( Document, Query, windowposition, WindowSize )
    13        if ( windowweight > scoresarray[ PageInDoc ] )
    14        begin
    15          scoresarray[ PageInDoc ] = windowweight
    16        end
    17      end
    18    end
    19  end

Figure 1: Pseudo-code describing the relevance profiling process

The main purpose of the experimental study is to investigate the behaviour of the relevance profiling process, and specifically the effect on retrieval effectiveness of the choice of window size and weighting function.
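The process in Figure 1 can also be read as a short Python sketch. This is an illustration only, not the authors' implementation: the names `relevance_profile`, `page_of` and `count_query_terms` are hypothetical, pages and word positions are zero-indexed here (Figure 1 counts from 1), and the document is assumed to be a flat list of already stemmed tokens with a parallel list giving the page of each token.

```python
from typing import Callable, List, Sequence, Set

# Assumed shape of a window weighting function:
# (document tokens, query terms, window start, window size) -> weight.
WeightFn = Callable[[Sequence[str], Set[str], int, int], float]


def relevance_profile(
    tokens: Sequence[str],
    page_of: Sequence[int],
    n_pages: int,
    query: Set[str],
    weight_fn: WeightFn,
    window_size: int,
    min_score: float = float("-inf"),
) -> List[float]:
    """Return one score per page: the best weight of any window starting in it."""
    # Lines 3-6 of Figure 1: initialise every page to the minimum score.
    scores = [min_score] * n_pages
    # Line 10 of Figure 1: only pages containing a query term may receive a score.
    pages_with_term = {page_of[i] for i, tok in enumerate(tokens) if tok in query}
    # Lines 7-18 of Figure 1: slide the window over every word position.
    for pos in range(len(tokens)):
        page = page_of[pos]
        if page not in pages_with_term:
            continue
        weight = weight_fn(tokens, query, pos, window_size)
        if weight > scores[page]:
            scores[page] = weight
    return scores


def count_query_terms(tokens, query, start, size):
    """Toy weighting function: number of query-term occurrences in the window."""
    return float(sum(1 for tok in tokens[start:start + size] if tok in query))


tokens = ["x", "q", "x", "x", "x", "x", "q", "q"]
page_of = [0, 0, 0, 1, 1, 1, 2, 2]
profile = relevance_profile(tokens, page_of, 3, {"q"}, count_query_terms, 2)
```

In this toy run the middle page contains no query term and therefore keeps the minimum score, while a window may extend past the end of the page in which it starts, as in the pseudo-code.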
Before describing the experimental study, we introduce the various weighting functions that are investigated. The weighting functions are based on a language modelling approach in which we model a document as a probability distribution over the terms of the vocabulary, and similarly for each (sliding) window. We define the document model to be the probability distribution P(t|D), where for every term t in the vocabulary (in our case, of the document), the model gives the probability that we would observe term t if we randomly sampled from the document D. Similarly, we define the window model, P(t|W), for window W. We will define three weighting functions (line 12, Figure 1), which we refer to as the query generation model (GEN), the Kullback-Leibler model (KL), and a simple term frequency model (FREQ).

2.1 Query Generation Model (GEN)

We adapt the language modelling approach proposed for document retrieval in [6] [7] for relevance profiling. Language modelling is used to construct a statistical model for a text window, and based on this model, we compute the window RSV as the probability of generating a query (denoted Q). We model the distribution of terms (actually stemmed words) over a text window as a mixture of the text window and document term distributions as follows:

$$P(Q \mid W) = \prod_{t \in Q} p_{mix}(t \mid W) \tag{1}$$

where:

$$p_{mix}(t \mid W) = \lambda \, p(t \mid W) + (1 - \lambda) \, p(t \mid D)$$

The estimates are smoothed by the document word statistics using the mixing parameter, λ (see footnote a). The individual word probabilities are estimated in the obvious way using maximum likelihood estimators:

$$p(t \mid W) = \frac{n_{tW}}{n_W}, \qquad p(t \mid D) = \frac{n_{tD}}{n_D} \tag{2}$$

where $n_{tW}$ ($n_{tD}$) and $n_W$ ($n_D$) are the number of occurrences of term t in the window (document), and the total number of term occurrences in the window (document), respectively. Equivalently, we can rank windows by taking the log of both sides of (1), yielding:

$$\log P(Q \mid W) = \sum_{t \in Q} \log p_{mix}(t \mid W) \tag{3}$$

Footnote a: We have investigated the best value for the mixing parameter empirically, and similar results were obtained over a range of values from 0.8 upwards. In the experiments reported, we use 0.8.
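Equations (1) to (3) translate fairly directly into code. The following is a minimal sketch under stated assumptions rather than the authors' implementation: the function name `gen_weight` and its argument layout are hypothetical, document counts are passed in precomputed purely for efficiency, and the default λ of 0.8 is the value the paper reports using.

```python
import math
from collections import Counter
from typing import Sequence


def gen_weight(
    window: Sequence[str],
    doc_counts: Counter,
    doc_len: int,
    query: Sequence[str],
    lam: float = 0.8,
) -> float:
    """Log query-generation score of a window, following equation (3)."""
    if not window:
        return float("-inf")
    win_counts = Counter(window)
    n_w = len(window)
    log_p = 0.0
    for t in query:
        p_w = win_counts[t] / n_w       # maximum likelihood p(t|W), equation (2)
        p_d = doc_counts[t] / doc_len   # maximum likelihood p(t|D), equation (2)
        p_mix = lam * p_w + (1.0 - lam) * p_d
        if p_mix == 0.0:
            # The term occurs nowhere in the document, so the query cannot be generated.
            return float("-inf")
        log_p += math.log(p_mix)
    return log_p


doc = ["a", "b", "a", "c"]
doc_counts = Counter(doc)
score_present = gen_weight(["a", "b"], doc_counts, len(doc), ["a"])
score_absent = gen_weight(["a", "b"], doc_counts, len(doc), ["c"])
```

A query term that is missing from the window contributes only the small document-model share (1 − λ)·p(t|D), so its log is strongly negative; `score_absent` is accordingly much lower than `score_present`.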
2.2 Kullback-Leibler Model (KL)

The Kullback-Leibler divergence has been used extensively in applications of language modelling in information retrieval [8]. In this approach, we compute the Kullback-Leibler divergence between the window model and the document model, and we restrict the computation to the query terms, as follows:

$$KL(W \,\|\, D) = \sum_{t \in Q} p(t \mid W) \, \log \frac{p(t \mid W)}{p(t \mid D)} \tag{4}$$

We use the strength of the divergence as a measure of window relevance with respect to a query. Intuitively, the larger the divergence of the window model from the (common) document model, the more likely the window is relevant to the query. Unlike the GEN model, the KL term weights have both a representation component (outside the log), and a discriminative component (the log term). We are unable to use maximum likelihood estimates for the component probabilities, as we must avoid estimating zero for p(t|W). We use the estimation rule attributed to Jeffreys [9, p. 127] to estimate the parameters as follows:

$$p(t \mid W) = \frac{n_{tW} + 0.5}{n_W + 1.0}, \qquad p(t \mid D) = \frac{n_{tD} + 0.5}{n_D + 1.0} \tag{5}$$

2.3 Term Frequency Weighting Model (FREQ)

In this approach, we simply weight the window by summing the total occurrences of all query terms within the window. This approach can be given a language modelling interpretation, which will be useful in our later analysis. Using the query generation approach, we compute the window RSV as the probability of generating any term of the query as follows:

$$P(\text{any } Q \text{ term} \mid W) = \sum_{t \in Q} p(t \mid W) \tag{6}$$

This is equivalent to summing all the term occurrences in a window over the query terms.

3 Experimental Study

3.1 Research Questions

In this study we investigate three main research questions, namely:
- Is there an optimal window size for the sliding window, and is this optimal size dependent on the particular weighting function used?
- What is the relative effectiveness of the three weighting functions we have described?
- Could relevance profiling be used as an aid in generating a book index through intellectual effort by a human, or indeed as a complete replacement for the book index?
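Returning to the weighting functions of Sections 2.2 and 2.3, equations (4) to (6) can be sketched in the same style. The names `jeffreys`, `kl_weight` and `freq_weight` are hypothetical, and `freq_weight` returns raw occurrence counts rather than probabilities, relying on the paper's remark that, for a fixed window size, the two are equivalent.

```python
import math
from collections import Counter
from typing import Sequence


def jeffreys(count: int, total: int) -> float:
    """Estimation rule of equation (5); never returns zero."""
    return (count + 0.5) / (total + 1.0)


def kl_weight(
    window: Sequence[str],
    doc_counts: Counter,
    doc_len: int,
    query: Sequence[str],
) -> float:
    """Kullback-Leibler divergence restricted to the query terms, equation (4)."""
    win_counts = Counter(window)
    n_w = len(window)
    total = 0.0
    for t in query:
        p_w = jeffreys(win_counts[t], n_w)
        p_d = jeffreys(doc_counts[t], doc_len)
        total += p_w * math.log(p_w / p_d)
    return total


def freq_weight(window: Sequence[str], query: Sequence[str]) -> float:
    """Total occurrences of the query terms in the window (Section 2.3)."""
    win_counts = Counter(window)
    return float(sum(win_counts[t] for t in query))


doc = ["a", "b", "a", "c", "d", "d", "d", "d"]
doc_counts = Counter(doc)
window = ["a", "b", "a"]
kl_score = kl_weight(window, doc_counts, len(doc), ["a"])
freq_score = freq_weight(window, ["a", "c"])
```

Unlike the query generation sketch, a query term absent from the window changes these scores only mildly: it adds nothing to `freq_weight` and a small term to `kl_weight`.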
In respect of window size, we conjecture that smaller window sizes will tend to be precision-enhancing as they are less likely to discover unrelated term occurrences, and that large window sizes will tend to be recall-enhancing. We investigate a range of window sizes from a few sentences in length (50 words) up to 2-3 paragraphs or half a page (250 words).

For the weighting functions, we conjecture that the simple term frequency weighting (FREQ) will perform worst, given that it exploits less of the available information than the other functions. Both the query generation approach and the Kullback-Leibler approach have proved effective in conventional document retrieval. However, in this application, we believe that the comparatively small size of the sample text (namely the samples of window text) is likely to be a major limiting factor, specifically in estimating probabilities. The simpler query generation model may prove more effective due to the smoothing provided by the mixture approach.

3.2 Corpus

We wish to investigate relevance profiling for within-document retrieval, and especially for long documents. In order to measure relative effectiveness, we need to establish a so-called ground truth for the experiments. We used van Rijsbergen's classic textbook, Information Retrieval [9], which is arguably long (191 pages). It is available in an electronic format, and it possesses a comprehensive book index. We had used this corpus successfully in previous end-user experiments [4] [5]. Here, we use the corpus in a similar way to a conventional test collection: the pages are the units of retrieval, the queries are provided by the book index entries, and the relevance assessments are the relevant pages associated with each index entry. We note that the book index was created by the book's author, and it is possible that the indexing is not exhaustive. Consequently, the absolute performance of the techniques we investigate is likely to be higher than we report.
However, the main purpose of the experiments is to compare techniques, and in this respect all are evaluated against the same ground truth. Table 1 summarises the main characteristics of the corpus. In the main, we conducted the experiments using the set of multi-term queries, as all the weighting functions are monotonic with respect to single term queries. We note that a high proportion of the multi-word queries are in fact phrasal queries rather than sets of words.

Table 1: Details of corpus: Information Retrieval textbook by van Rijsbergen [9]
    Number of pages: 191
    Number of words:
    Columns: Index entry type; No. of entries (queries) (see footnote b); Total relevant pages; Average #rels/entry
    Rows: single word; multi-word

3.3 Experiment Design

The experiments were conducted as follows. Each query is used to generate a relevance profile for the document. Then, we simply rank the pages according to the retrieval status value, and evaluate the ranking based on the relevance judgements provided by the book index. In fact, this is not an ideal way of evaluating relevance profiling, which is designed to identify relevant groups (or clumps) of pages, and not simply individual pages. It would have been better to assess the effectiveness in identifying the same groups of pages identified in the book index. Given the availability of measures and tools for evaluating ranked output, we opted to do page ranking.

3.4 Measures

We used a range of single-valued measures averaged over the set of queries to assess effectiveness, namely: non-interpolated precision, R-precision (R-prec.), and three variations of the F-measure. R-precision is the precision achieved at rank R, where R is the number of relevant documents for a given query. This measure combines elements of both precision (the proportion of the R retrieved documents that are relevant) and recall (the proportion of the R relevant documents that are retrieved). The F-measure is simply the complement of the E-measure [9], and is given by:

$$F = \frac{R \, P}{\alpha R + (1 - \alpha) P} \tag{7}$$

where R is recall (see footnote c), P is precision, and α enables one to bias the measure in favour of precision (α -> 1) or recall (α -> 0). We use three variants with α set to 0.8 (precision-oriented), 0.5 (balanced harmonic mean of R and P), and 0.2 (recall-oriented).

Footnote b: There were a number of entries that were not used in the experiments, namely the "see also" entries.
Footnote c: This is a different use of R, and we trust the use is clear from the context.
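As a worked illustration of these measures, the sketch below computes the F-measure of equation (7) and R-precision for a single query. The function names and the convention of returning 0.0 when the denominator vanishes are assumptions made for this example, not part of the paper.

```python
from typing import Sequence, Set


def f_measure(precision: float, recall: float, alpha: float) -> float:
    """F = R*P / (alpha*R + (1 - alpha)*P), as in equation (7)."""
    denom = alpha * recall + (1.0 - alpha) * precision
    if denom == 0.0:
        return 0.0  # assumed convention when both precision and recall are zero
    return (recall * precision) / denom


def r_precision(ranked_pages: Sequence[int], relevant: Set[int]) -> float:
    """Precision at rank R, where R is the number of relevant pages for the query."""
    r = len(relevant)
    if r == 0:
        return 0.0
    hits = sum(1 for page in ranked_pages[:r] if page in relevant)
    return hits / r


balanced = f_measure(0.5, 0.5, 0.5)
precision_only = f_measure(0.8, 0.4, 1.0)
recall_only = f_measure(0.8, 0.4, 0.0)
rprec = r_precision([3, 1, 7, 2], {1, 2})
```

With α = 1 the measure reduces to precision and with α = 0 to recall, which matches the precision-oriented and recall-oriented readings given for the α settings.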
To compute F-values for each query, we compute precision and recall at each different score in the ranking (note, not at each rank position), and then compute the optimal F-values for the different settings. Statistical significance testing was performed using the sign test, and p-values less than 0.05 were considered significant.

4 Experiments

We investigate each research question in turn, and present and discuss the results. Given the possible interaction between window size and weighting function, we decided to explore the window size initially with the query generation model (GEN), and we later confirmed the validity of the findings for the other two weighting functions.

4.1 Window Size Experiments

In the first set of experiments, we investigated the effect of the size of the sliding window on the performance achievable with the query generation model (GEN). The size of the window was varied from 50 words through 200 words in steps of 25, and for 250 words. The experiment was run with two sets of queries, those for which the queries contained more than one word (MULTI), and those comprising a single word (SINGLE). In the context of document skimming, we think it likely that multi-word queries will be more usual than single word queries. However, we were also interested in the performance achievable with single word queries, as they form a significant proportion of entries in our book index. If ProfileSkim were used as a replacement for the book index then it would have to provide good performance for this type of query.

We present the window size results in Tables 2 and 3, where the bold figures are the best values achieved for each measure, and the italicised figures are values close to the best. The results for the multi-term queries (Table 2) show that for the F-measures, the results are broadly comparable over a wide range of window sizes from 50 through 250. Interestingly, the recall-oriented (α=0.2) results are better than the other results, indicating that relevance profiling may be a recall-oriented device.
For the (averaged) non-interpolated precision and for R-precision, the results are better for the smaller window sizes (50, 75), and significantly so compared with the results for window sizes 100 and up.
Table 2: Averaged effectiveness of query generation weighting (GEN) for 232 multi-term queries for different window sizes
    Columns: Window size; Prec.; R-prec.; F (α=0.8); F (α=0.5); F (α=0.2)

The performance for the single term queries (Table 3) is reduced by 5-10% compared with the multi-term queries, except for the R-precision measure. Interestingly, for all measures, the maximum effectiveness is achieved at a window size of 200.

Table 3: Averaged effectiveness of query generation weighting (GEN) for 46 single term queries for different window sizes
    Columns: Window size; Prec.; R-prec.; F (α=0.8); F (α=0.5); F (α=0.2)

The results of the window size experiments demonstrate the robust nature of the relevance profiling process, in that high and broadly comparable levels of performance are achieved across a range of window sizes. However, statistically speaking, small window sizes (50, 75) yield significantly better results for non-interpolated precision and R-precision, for multi-term queries. Nevertheless, it is clear that relevance profiling works well for windows of a few sentences in length up to half a page. For single term queries, the best effectiveness is achieved for larger window sizes (200, 250). A possible explanation for this is that, in order to establish relevance for a short query, we may require a larger sample of text.

Contrary to expectations, small window sizes did not seem to be precision-enhancing, nor large window sizes recall-enhancing. If one looks in detail at the relevance profiling process (Figure 1), one can observe both precision- and recall-enhancing devices. By insisting that pages can only achieve a score if at least one query term is actually present in the page, we are effectively boosting precision. On the other hand, the fact that any window beginning in a page can contribute an RSV to that page is a recall-enhancing device. We would argue that these devices operate to mitigate the effects of window size, and particularly the possible deleterious effect of large window sizes on precision.
4.2 Weighting Function Experiments

In this second set of experiments, we investigate the comparative effectiveness of the three weighting functions described in Section 2. In Table 4, we report the best levels of effectiveness achieved for each weighting function, where the window size is chosen optimally for each function (and in some cases measure).

Table 4: Averaged effectiveness of three weighting functions for 232 multi-term queries (using optimal window setting for each function)
    Standard
    Columns: Function; GEN; KL; FREQ
    Rows: Window Size; Average Precision; Average R-precision; F (α=0.8); F (α=0.5); F (α=0.2)

Table 5: Averaged effectiveness of three weighting functions with co-ordination level filtering for 232 multi-term queries (using optimal window setting for each function)
    With coordination filtering
    Columns: Function; GEN; KL; FREQ
    Rows: Window Size (unless stated otherwise); Average Precision; Average R-precision (75); F (α=0.8) (100); F (α=0.5) (100); F (α=0.2) (200) (200)
Let us consider the standard results first. Clearly, the performance of the query generation model (GEN) exceeds that of the other two weighting functions by a large and statistically significant margin. It is approximately 10%-15% better than the Kullback-Leibler model, and above 20% better than the term frequency weighting model. Although KL weighting outperforms FREQ, it is not significantly better.

It would appear that the ostensibly simple query generation model is better than the arguably more information-rich KL model and, as we surmised earlier, the problem of estimating parameters from small samples may be important in relevance profiling. If we look at the individual term weights in the KL model (eqn 4), it is clear that within the log term, p(t|D) << p(t|W), and thus the contribution of p(t|D) dominates the expression. Given the nature of a book index, we believe that many terms in the index will be of similar frequency, and thus p(t|D) will be both small and relatively constant for query terms. Consequently, the term outside the log, namely p(t|W), will provide the overall shape of the weighting function, and this is the same basis as for the term frequency weighting (eqn 6). Hence, the performance of KL and FREQ should be, and indeed is, broadly similar.

How then do we explain why the performance of GEN is so much better than the other two weighting functions? We think it is the treatment of the non-occurrence of query terms in a window that is the distinguishing hallmark of GEN. In GEN, if a query term does not appear in the window, then the weighting function penalises this harshly: in the probability mixture (eqn 1), the contribution from the document model will always be very small. In the other two functions, by contrast, the weighting is more or less linear in the number of term occurrences. To test this conjecture, we ran a third set of experiments, in which we added an additional filter, which we dubbed the coordination level filter.
In these experiments we insist that all query terms are present in a window (or coordinated, as it used to be known) prior to a retrieval status value being calculated for that window. The results of this third set of experiments are presented in Table 5. In some cases, there was a different optimal window size for some measures, and this is included in brackets with the measure value.

We expected that co-ordination filtering would reduce the number of pages retrieved, and hence would be likely to depress recall and possibly boost precision. For GEN, performance was reduced by about 5-10% for all measures. However, the co-ordination filtering boosted the performance of both KL and FREQ to levels comparable with the filtered GEN performance. This provides strong evidence that it is the treatment of the non-occurrence of query terms which is the key to the performance of the unfiltered (or standard) query generation model. It may be that the performance of the KL model could be improved by experimenting with the parameter estimation rules. Interestingly, incorporating co-ordination filtering results in generally larger optimal window sizes for GEN and, to some extent, KL.

4.3 Relevance Profiling for Book Indexing

Here, we draw together the evidence that relevance profiling might be used as an aid in generating a book index through intellectual effort by a human, or indeed as a complete replacement for the book index. First, the absolute effectiveness achieved when using the query generation model was very high, for both the multi-term and single term sets of queries. For the multi-term queries (Table 2), the optimal F-value for the balanced setting (F, α=0.5) effectively means that at this optimum point, on average, both precision and recall achieved levels around 70%. Moreover, the R-precision value of 0.58 is quite high, and means that if a user inspects down to the rank corresponding to the exact number of relevant pages for each query, then on average 58% of these relevant pages will be retrieved. High recall is clearly important in the context of book indexing.
We also computed the number of queries that achieved recall of 100% at a cutoff of 20 retrieved pages. For GEN (window size = 75), 175 of the 232 multi-term queries achieved maximum recall, and only 9 of the 232 queries failed to retrieve any relevant documents. Further, GEN (window size = 75) retrieves 587 of the possible 766 relevant pages for the multi-term queries, and 100 of the possible 144 relevant pages for the single term queries. It seems reasonable to conclude that a tool such as ProfileSkim, which implements relevance profiling, would prove a useful aid to a human indexer. Indeed, given a set of index entries, relevance profiling might even replace the traditional book index.

It is worth noting that careful inspection of the existing book index showed conclusively that the index is not completely exhaustive. In particular, there is inconsistent treatment of page ranges, where frequently not all relevant pages in a range are included in the index. Relevance profiling was able to reliably identify many of these unidentified relevant pages.

5 Conclusions and Future Work

In this paper we have reported a series of bench experiments on relevance profiling. The experiments were conducted with an electronic book and its associated book index. We investigated three alternative
weighting functions, and various parameters of the relevance profiling process, most notably window size. The major findings of the experiments are:
- relevance profiling is a robust process yielding acceptably high levels of performance over a range of window sizes from 50 to 250 words in length;
- the query generation model (GEN) is significantly more effective than the Kullback-Leibler (KL) and the term frequency (FREQ) models;
- a key factor in the effectiveness of the query generation model is its treatment of non-occurrences of query terms in the sliding window, which we attribute to the smoothing provided by the probability mixture model; and
- the absolute performance of relevance profiling indicates that it might be usefully employed as an aid to human book indexing, and indeed may provide a viable alternative to the book index (providing that a good set of index entries can be generated).

We would emphasise that these experiments have been conducted with a single book and its associated index, and it would be highly desirable to repeat the experiments with a number of such books. Nevertheless, we believe that the general conclusions would likely be confirmed by such experiments, although the absolute levels of effectiveness may differ, depending on the exhaustivity of the (human) indexing.

The results reported here suggest a number of possible avenues for future work. First, given the evident, and indeed widely recognised, importance of parameter estimation and smoothing in language modelling applications, and given the specialised nature of relevance profiling, further experiments should be conducted on both GEN and KL using different parameter estimation rules or smoothing approaches. Second, all the weighting functions we investigated assumed that query terms are distributed independently. Given that, in general, this is unlikely to be the case, especially for phrasal queries, it would be desirable to investigate models based on term dependence. However, we would suggest that only first-order models be investigated, given the paucity of sample data.
Finally, it would be interesting to explore how to generate useful queries automatically, where by useful we mean that the query identifies coherent chunks of relevant text. We note that such queries would not necessarily be just phrasal. This would pave the way for both automating the process of book indexing and, perhaps more tantalizingly, enabling query-less retrieval within documents.

Acknowledgements

This work was done within the Smart Web Technologies Centre that was established through a Scottish Higher Education Funding Council (SHEFC) Research Development Grant (No. HR01007). David Lee was supported by a Summer Studentship provided by The Robert Gordon University. We thank the following colleagues for their feedback on this work: Ian Pirie, Ivan Koychev, Bicheng Liu, Ralf Bierig and Murat Yakici. I also thank Margery Murray for her invaluable assistance in preparing the manuscript. Finally, we would like to thank the anonymous referees for their insightful comments on the submitted paper.

6 References

[1] Kaszkiel, M. and Zobel, J.: Passage Retrieval Revisited. In: Proceedings of the Twentieth International ACM-SIGIR Conference on Research and Development in Information Retrieval, Philadelphia, July. ACM Press.
[2] Hearst, M. A.: TileBars: visualization of term distribution information in full text information access. In: Proc. CHI'95, 56-66.
[3] Harper, D. J., Coulthard, S. and Sun, Y.: A language modelling approach to relevance profiling for document browsing. In: Proceedings of the Joint Conference on Digital Libraries, pages 76-83, Oregon, USA, July.
[4] Harper, D. J., Koychev, I. and Sun, Y.: Query-based document skimming: A user-centred evaluation of relevance profiling. In: Proceedings of 25th European Conference on Information Retrieval. Lecture Notes in Computer Science, Springer-Verlag, Berlin.
[5] Harper, D. J., Koychev, I., Sun, Y. and Pirie, I.: Within-Document Retrieval: A User-Centred Evaluation of Relevance Profiling. In: Information Retrieval, Kluwer Academic Publishers, The Netherlands, 7.
[6] Ponte, J. and Croft, W. B.: A language modeling approach to information retrieval. In: Proceedings of the 1998 ACM SIGIR Conference on Research and Development in Information Retrieval.
[7] Song, F. and Croft, W. B.: A general language model for information retrieval. In: Proceedings of the 1999 ACM SIGIR Conference on Research and Development in Information Retrieval, 1999.
[8] Croft, W. B. and Lafferty, J. (Eds.): Language modeling for information retrieval. Kluwer Academic Publishers, The Netherlands.
[9] Van Rijsbergen, K.: Information Retrieval. Butterworths, London, 1979.
More informationChapter 6. Supplemental Text Material
Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.
More informationStatistics II Final Exam 26/6/18
Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the
More informationAppendix B: Resampling Algorithms
407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles
More information/ n ) are compared. The logic is: if the two
STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence
More informationWinter 2008 CS567 Stochastic Linear/Integer Programming Guest Lecturer: Xu, Huan
Wnter 2008 CS567 Stochastc Lnear/Integer Programmng Guest Lecturer: Xu, Huan Class 2: More Modelng Examples 1 Capacty Expanson Capacty expanson models optmal choces of the tmng and levels of nvestments
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More informationNegative Binomial Regression
STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...
More informationOn the correction of the h-index for career length
1 On the correcton of the h-ndex for career length by L. Egghe Unverstet Hasselt (UHasselt), Campus Depenbeek, Agoralaan, B-3590 Depenbeek, Belgum 1 and Unverstet Antwerpen (UA), IBW, Stadscampus, Venusstraat
More informationTopic 23 - Randomized Complete Block Designs (RCBD)
Topc 3 ANOVA (III) 3-1 Topc 3 - Randomzed Complete Block Desgns (RCBD) Defn: A Randomzed Complete Block Desgn s a varant of the completely randomzed desgn (CRD) that we recently learned. In ths desgn,
More informationBoostrapaggregating (Bagging)
Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod
More informationLINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity
LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have
More informationThermodynamics and statistical mechanics in materials modelling II
Course MP3 Lecture 8/11/006 (JAE) Course MP3 Lecture 8/11/006 Thermodynamcs and statstcal mechancs n materals modellng II A bref résumé of the physcal concepts used n materals modellng Dr James Ellott.1
More informationComputing MLE Bias Empirically
Computng MLE Bas Emprcally Kar Wa Lm Australan atonal Unversty January 3, 27 Abstract Ths note studes the bas arses from the MLE estmate of the rate parameter and the mean parameter of an exponental dstrbuton.
More informationChapter 5 Multilevel Models
Chapter 5 Multlevel Models 5.1 Cross-sectonal multlevel models 5.1.1 Two-level models 5.1.2 Multple level models 5.1.3 Multple level modelng n other felds 5.2 Longtudnal multlevel models 5.2.1 Two-level
More informationBasically, if you have a dummy dependent variable you will be estimating a probability.
ECON 497: Lecture Notes 13 Page 1 of 1 Metropoltan State Unversty ECON 497: Research and Forecastng Lecture Notes 13 Dummy Dependent Varable Technques Studenmund Chapter 13 Bascally, f you have a dummy
More informationRetrieval Models: Language models
CS-590I Informaton Retreval Retreval Models: Language models Luo S Department of Computer Scence Purdue Unversty Introducton to language model Ungram language model Document language model estmaton Maxmum
More informationLimited Dependent Variables
Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages
More informationEcon107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)
I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes
More informationThis column is a continuation of our previous column
Comparson of Goodness of Ft Statstcs for Lnear Regresson, Part II The authors contnue ther dscusson of the correlaton coeffcent n developng a calbraton for quanttatve analyss. Jerome Workman Jr. and Howard
More informationCOMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS
Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS
More informationThe Order Relation and Trace Inequalities for. Hermitian Operators
Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence
More informationDepartment of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6
Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.
More informationx i1 =1 for all i (the constant ).
Chapter 5 The Multple Regresson Model Consder an economc model where the dependent varable s a functon of K explanatory varables. The economc model has the form: y = f ( x,x,..., ) xk Approxmate ths by
More informationEvaluation for sets of classes
Evaluaton for Tet Categorzaton Classfcaton accuracy: usual n ML, the proporton of correct decsons, Not approprate f the populaton rate of the class s low Precson, Recall and F 1 Better measures 21 Evaluaton
More informationCS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016
CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng
More informationThe Second Anti-Mathima on Game Theory
The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player
More informationNUMERICAL DIFFERENTIATION
NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the
More informationCredit Card Pricing and Impact of Adverse Selection
Credt Card Prcng and Impact of Adverse Selecton Bo Huang and Lyn C. Thomas Unversty of Southampton Contents Background Aucton model of credt card solctaton - Errors n probablty of beng Good - Errors n
More informationLecture 6: Introduction to Linear Regression
Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6
More informationMore metrics on cartesian products
More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of
More informationTHE SUMMATION NOTATION Ʃ
Sngle Subscrpt otaton THE SUMMATIO OTATIO Ʃ Most of the calculatons we perform n statstcs are repettve operatons on lsts of numbers. For example, we compute the sum of a set of numbers, or the sum of the
More informationDr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur
Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed
More informationJanuary Examinations 2015
24/5 Canddates Only January Examnatons 25 DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR STUDENT CANDIDATE NO.. Department Module Code Module Ttle Exam Duraton (n words)
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationStatistics for Economics & Business
Statstcs for Economcs & Busness Smple Lnear Regresson Learnng Objectves In ths chapter, you learn: How to use regresson analyss to predct the value of a dependent varable based on an ndependent varable
More informationQuantitative Discrimination of Effective Porosity Using Digital Image Analysis - Implications for Porosity-Permeability Transforms
2004, 66th EAGE Conference, Pars Quanttatve Dscrmnaton of Effectve Porosty Usng Dgtal Image Analyss - Implcatons for Porosty-Permeablty Transforms Gregor P. Eberl 1, Gregor T. Baechle 1, Ralf Weger 1,
More informationUncertainty as the Overlap of Alternate Conditional Distributions
Uncertanty as the Overlap of Alternate Condtonal Dstrbutons Olena Babak and Clayton V. Deutsch Centre for Computatonal Geostatstcs Department of Cvl & Envronmental Engneerng Unversty of Alberta An mportant
More informationChapter 11: Simple Linear Regression and Correlation
Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests
More informationCopyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor
Taylor Enterprses, Inc. Control Lmts for P Charts Copyrght 2017 by Taylor Enterprses, Inc., All Rghts Reserved. Control Lmts for P Charts Dr. Wayne A. Taylor Abstract: P charts are used for count data
More information= z 20 z n. (k 20) + 4 z k = 4
Problem Set #7 solutons 7.2.. (a Fnd the coeffcent of z k n (z + z 5 + z 6 + z 7 + 5, k 20. We use the known seres expanson ( n+l ( z l l z n below: (z + z 5 + z 6 + z 7 + 5 (z 5 ( + z + z 2 + z + 5 5
More informationECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2017 Instructor: Victor Aguirregabiria
ECOOMETRICS II ECO 40S Unversty of Toronto Department of Economcs Wnter 07 Instructor: Vctor Agurregabra SOLUTIO TO FIAL EXAM Tuesday, Aprl 8, 07 From :00pm-5:00pm 3 hours ISTRUCTIOS: - Ths s a closed-book
More informationGames of Threats. Elon Kohlberg Abraham Neyman. Working Paper
Games of Threats Elon Kohlberg Abraham Neyman Workng Paper 18-023 Games of Threats Elon Kohlberg Harvard Busness School Abraham Neyman The Hebrew Unversty of Jerusalem Workng Paper 18-023 Copyrght 2017
More informationA Hybrid Variational Iteration Method for Blasius Equation
Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 1 (June 2015), pp. 223-229 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) A Hybrd Varatonal Iteraton Method
More informationCathy Walker March 5, 2010
Cathy Walker March 5, 010 Part : Problem Set 1. What s the level of measurement for the followng varables? a) SAT scores b) Number of tests or quzzes n statstcal course c) Acres of land devoted to corn
More informationComparison of Regression Lines
STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence
More informationSTATS 306B: Unsupervised Learning Spring Lecture 10 April 30
STATS 306B: Unsupervsed Learnng Sprng 2014 Lecture 10 Aprl 30 Lecturer: Lester Mackey Scrbe: Joey Arthur, Rakesh Achanta 10.1 Factor Analyss 10.1.1 Recap Recall the factor analyss (FA) model for lnear
More informationTemperature. Chapter Heat Engine
Chapter 3 Temperature In prevous chapters of these notes we ntroduced the Prncple of Maxmum ntropy as a technque for estmatng probablty dstrbutons consstent wth constrants. In Chapter 9 we dscussed the
More informationCopyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for U Charts. Dr. Wayne A. Taylor
Taylor Enterprses, Inc. Adjusted Control Lmts for U Charts Copyrght 207 by Taylor Enterprses, Inc., All Rghts Reserved. Adjusted Control Lmts for U Charts Dr. Wayne A. Taylor Abstract: U charts are used
More informationUNR Joint Economics Working Paper Series Working Paper No Further Analysis of the Zipf Law: Does the Rank-Size Rule Really Exist?
UNR Jont Economcs Workng Paper Seres Workng Paper No. 08-005 Further Analyss of the Zpf Law: Does the Rank-Sze Rule Really Exst? Fungsa Nota and Shunfeng Song Department of Economcs /030 Unversty of Nevada,
More informationTracking with Kalman Filter
Trackng wth Kalman Flter Scott T. Acton Vrgna Image and Vdeo Analyss (VIVA), Charles L. Brown Department of Electrcal and Computer Engneerng Department of Bomedcal Engneerng Unversty of Vrgna, Charlottesvlle,
More informationStructure and Drive Paul A. Jensen Copyright July 20, 2003
Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.
More informationDensity matrix. c α (t)φ α (q)
Densty matrx Note: ths s supplementary materal. I strongly recommend that you read t for your own nterest. I beleve t wll help wth understandng the quantum ensembles, but t s not necessary to know t n
More informationInductance Calculation for Conductors of Arbitrary Shape
CRYO/02/028 Aprl 5, 2002 Inductance Calculaton for Conductors of Arbtrary Shape L. Bottura Dstrbuton: Internal Summary In ths note we descrbe a method for the numercal calculaton of nductances among conductors
More informationChapter 8 Indicator Variables
Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n
More informationSemi-supervised Classification with Active Query Selection
Sem-supervsed Classfcaton wth Actve Query Selecton Jao Wang and Swe Luo School of Computer and Informaton Technology, Beng Jaotong Unversty, Beng 00044, Chna Wangjao088@63.com Abstract. Labeled samples
More informationBayesian predictive Configural Frequency Analysis
Psychologcal Test and Assessment Modelng, Volume 54, 2012 (3), 285-292 Bayesan predctve Confgural Frequency Analyss Eduardo Gutérrez-Peña 1 Abstract Confgural Frequency Analyss s a method for cell-wse
More informationGaussian Mixture Models
Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous
More informationThe Geometry of Logit and Probit
The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.
More informationExplaining the Stein Paradox
Explanng the Sten Paradox Kwong Hu Yung 1999/06/10 Abstract Ths report offers several ratonale for the Sten paradox. Sectons 1 and defnes the multvarate normal mean estmaton problem and ntroduces Sten
More informationPop-Click Noise Detection Using Inter-Frame Correlation for Improved Portable Auditory Sensing
Advanced Scence and Technology Letters, pp.164-168 http://dx.do.org/10.14257/astl.2013 Pop-Clc Nose Detecton Usng Inter-Frame Correlaton for Improved Portable Audtory Sensng Dong Yun Lee, Kwang Myung Jeon,
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours
UNIVERSITY OF TORONTO Faculty of Arts and Scence December 005 Examnatons STA47HF/STA005HF Duraton - hours AIDS ALLOWED: (to be suppled by the student) Non-programmable calculator One handwrtten 8.5'' x
More informationConjugacy and the Exponential Family
CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the
More informationDO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR. Introductory Econometrics 1 hour 30 minutes
25/6 Canddates Only January Examnatons 26 Student Number: Desk Number:...... DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR Department Module Code Module Ttle Exam Duraton
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationEEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming
EEL 6266 Power System Operaton and Control Chapter 3 Economc Dspatch Usng Dynamc Programmng Pecewse Lnear Cost Functons Common practce many utltes prefer to represent ther generator cost functons as sngle-
More informationAnnexes. EC.1. Cycle-base move illustration. EC.2. Problem Instances
ec Annexes Ths Annex frst llustrates a cycle-based move n the dynamc-block generaton tabu search. It then dsplays the characterstcs of the nstance sets, followed by detaled results of the parametercalbraton
More informationCOS 521: Advanced Algorithms Game Theory and Linear Programming
COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton
More informationA Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach
A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland
More informationMarkov Chain Monte Carlo Lecture 6
where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways
More informationLecture Notes on Linear Regression
Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume
More informationTHE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE
THE ROYAL STATISTICAL SOCIETY 6 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutons to assst canddates preparng for the eamnatons n future years and for
More informationBayesian Planning of Hit-Miss Inspection Tests
Bayesan Plannng of Ht-Mss Inspecton Tests Yew-Meng Koh a and Wllam Q Meeker a a Center for Nondestructve Evaluaton, Department of Statstcs, Iowa State Unversty, Ames, Iowa 5000 Abstract Although some useful
More informationEconomics 130. Lecture 4 Simple Linear Regression Continued
Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do
More information3.1 ML and Empirical Distribution
67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationCSC 411 / CSC D11 / CSC C11
18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t
More informationDUE: WEDS FEB 21ST 2018
HOMEWORK # 1: FINITE DIFFERENCES IN ONE DIMENSION DUE: WEDS FEB 21ST 2018 1. Theory Beam bendng s a classcal engneerng analyss. The tradtonal soluton technque makes smplfyng assumptons such as a constant
More informationMDL-Based Unsupervised Attribute Ranking
MDL-Based Unsupervsed Attrbute Rankng Zdravko Markov Computer Scence Department Central Connectcut State Unversty New Brtan, CT 06050, USA http://www.cs.ccsu.edu/~markov/ markovz@ccsu.edu MDL-Based Unsupervsed
More informationLOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin
Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence
More informationFREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced,
FREQUENCY DISTRIBUTIONS Page 1 of 6 I. Introducton 1. The dea of a frequency dstrbuton for sets of observatons wll be ntroduced, together wth some of the mechancs for constructng dstrbutons of data. Then
More informationLeast squares cubic splines without B-splines S.K. Lucas
Least squares cubc splnes wthout B-splnes S.K. Lucas School of Mathematcs and Statstcs, Unversty of South Australa, Mawson Lakes SA 595 e-mal: stephen.lucas@unsa.edu.au Submtted to the Gazette of the Australan
More information