Exploiting association rules and ontology for semantic document indexing

Size: px
Start display at page:

Download "Exploiting association rules and ontology for semantic document indexing"

Transcription

1 Explotng assocaton rules and ontology for antc document ndexng Fatha Boubekeur IRIT-SIG, Paul Sabater Unversty of Toulouse, CEDEX 9, France Department of Computer Scences, Mouloud Mammer Unversty of Tz-Ouzou, 15000, Algera Mohand Boughanem IRIT-SIG, Paul Sabater Unversty of Toulouse, CEDEX 9, France Lynda Tamne-Lechan IRIT-SIG, Paul Sabater Unversty of Toulouse, CEDEX 9, France Abstract Ths paper descrbes a novel approach for document ndexng based on the dscovery of contextual antc relatons between concepts. The concepts are frst extracted from WordNet ontology. Then we propose to extend and to use the assocaton rules technque n order to dscover condtonal relatons between concepts. Fnally, concepts and related contextual relatons are organzed nto a condtonal graph. Keywords Informaton retreval, Conceptual ndexng, Assocaton rules, WordNet. 1 Introducton Informaton retreval (IR) s concerned wth selectng, from a collecton of documents, those that are lkely to be relevant to a user nformaton need expressed usng a query. Three basc functons are carred out n an nformaton retreval system (IRS): document and nformaton needs representaton and matchng of these representatons [6]. Document representaton s usually called ndexng. The man obectve of ndexng s to assgn to each document a descrptor represented wth a set of features, usually keywords, derved from the document content. Representng the user s nformaton need nvolves a one step or multstep query formulaton by means of pror terms expressed by the user and/or addtve nformaton drven by teratve query mprovements lke relevance feedback [17]. The man goal of document-query matchng, called also query evaluaton, s to estmate a relevance score used as a crteron to rank the lst of documents returned to the user as an answer to hs query. Most of IR models handle durng ths step, an approxmate matchng process usng the frequency dstrbuton of query terms over the documents. Numerous factors affect the IRS performances. Frst, nformaton sources mght contan varous topcs addressed wth a very wde vocabulary that lead to dfferent antc contexts. Ths consequently ncrease the dffculty of dentfyng the accurate antc context addressed by the user s query. Second, users often do not express ther queres n an optmal form consderng ther needs. Thrd, n classcal IR models, documents and queres are bascally represented as bags of weghted terms. A key characterstc of such models s that the degree of document query matchng depends on the number of the shared terms. Ths leads to a lexcal focused relevance estmaton whch s less effectve than a conceptual focused one [14]. Ths paper addresses these lmtatons by proposng a conceptual ndexng approach based on the ont use of ontology and assocaton L. Magdalena, M. Oeda-Acego, J.L. Verdegay (eds): Proceedngs of IPMU 08, pp Torremolnos (Málaga), June 22 27, 2008

2 rules. We thus expect to take advantages from (1) conceptual ndexng of documents and (2) from the contextual dependences between concepts dscovered by means of assocaton rules. The paper s structured as follows: Secton 2 ntroduces the problems we am to tackle, namely the term msmatch and ambguty n IR, then reports some related works and presents our motvatons. The proposed antc ndexng approach s detaled n secton 3. Secton 4 concludes the paper. 2 Related works and motvatons Most of classcal IR models are based on the well known technque of bag of words expressng the fact that both documents and queres are represented usng basc weghted keywords. The performances of such models suffer from the so-called keyword barrer [20] that has serous drawbacks especally at document representaton and query formulaton levels that lead to bad performances. Indeed, n such IRS, a relevant document wll not be retreved n response to a query f the document and query representatons do not share at least one word. Ths mples on one hand, that relevant documents are not retreved f they do not share terms wth the query; on the other hand, rrelevant documents that have common words wth the query are retreved even f these words are not antcally equvalent. Varous approaches and technques attempted to tackle these problems by enhancng the document representaton or query formulaton. Attempts n document representaton mprovements are related to the use of antcs n the ndexng process. In ths context, two man ssues could be dstngushed [3]: antc ndexng and conceptual ndexng. Semantc ndexng s based on technques for contextual word sense dsambguaton (WSD) [10], [13], [18], [21]. Indexng conssts n assocatng the extracted words of document or query, to words of ther own context [21]. Other dsambguaton approaches [20] use herarchcal representatons drven from ontologes to compute the antc dstance or antc smlarty [11], [12], [16] between words to be compared. Conceptual ndexng s based on usng concepts extracted from ontologes and taxonomes n order to ndex documents nstead of usng smple lsts of words [3], [7], [8], [9], [19]. The ndexng process runs manly n two steps: frst dentfyng terms n the document text by matchng document phrases wth concepts n the ontology, then, dsambguatng the dentfed terms. In [15] the FIS-CRM model focusses on the nterrelatons (synonymy and generalty ones) among the document terms n order to buld the related conceptual ndex. A concept s represented by the related document term, the set of ts fuzzy synonymous (extracted from a fuzzy dctonary), and ts more general concepts (extracted from a fuzzy ontology). A weght readustment process allows the added terms to have fuzzy weghts even f they do not occur n the document. Document clusterng s then nvolved usng concept coocurrence measures. We am to tackle the problems due to usng basc keyword based evdence sources n IR, provdng a soluton at the ndexng level. The proposed approach reles on the ont use of concepts and contextual relatons between them n order to ndex the document. Both concepts and related dependences are fnally organzed nto a graphcal representaton resultng n a graphcal representaton of the document ndex. Our approach allows a rcher and more accurate representaton of documents when supportng both contextual and antcal relatons between concepts. Our man propostons presented n ths paper concern a novel conceptual document ndexng approach based on concepts and contextual dependences between them. Ths approach s based on the use of the WordNet general ontology as a source for extractng representatve document concepts, and on assocaton rules based technques n order to dscover the latent concept contextual relatons. More precsely, our conceptual ndexng approach s supported by three man steps: (1) Identfyng the representatve concepts n a document, (2) Dscoverng contextdependent relatons between antcal enttes namely the concepts (rather than between lexcal enttes that are terms) usng a proposed varant of assocaton rules namely antc assocaton Proceedngs of IPMU

3 rules, and (3) combnng both concepts and related antc assocatons nto a compact graphcal representaton. 3 A conceptual document ndexng approach Ths secton detals our conceptual document ndexng approach based on (1) the explotaton of WordNet for dentfyng concepts, (2) assocaton rules for dscoverng contextual relatons between concepts. Both concepts and related relatons are fnally organzed nto a compact condtonal graph. 3.1 Outlne We propose to use WordNet ontology and assocaton rules n order to buld the document conceptual ndex. The document ndexng process s handled through three man steps: (1) Identfyng the representatve concepts of a document: ndex concepts are dentfed n the document usng WordNet. (2) Mnng assocaton rules: contextual relatons between selected concepts are dscovered usng assocaton rules. (3) Buldng a graphcal conceptual ndex: both concepts and related relatons are then organzed nto a graph. Nodes n the graph are concepts and edges traduce relatons between concepts. The above steps are detaled n the followng. 3.2 Index concepts dentfcaton Overvew The whole process of dentfyng the representatve concepts of the document has been well descrbed n [4], we summarze t n the followng. The process reles on the followng steps: (1) Term dentfcaton: the goal of ths step s to dentfy sgnfcatve terms n a document. These terms correspond to entres n the ontology. (2) Term weghtng: n ths step, an alternatve of tf*df weghtng scheme s proposed. The underlyng goal s to elmnate the less frequent (less mportant) terms n a document n order to select only the most representatve ones (ndex terms). (3) Dsambguaton: ndex terms are assocated wth ther correspondng concepts n the ontology. As each extracted term could have many senses (related to dfferent concepts), we dsambguate t usng smlarty measures. A score s thus assgned to each concept based on ts antc dstance to other concepts n the document. The selected concepts are those havng the best scores Basc Notons Terms are represented as lsts of word strngs (a word strng s a character strng that represents a word). The length of a term t, noted t, s then the number of words n t. A mono-word term conssts n a one word lst. A mult-word term conssts n a word lst of more than one word. Let t be a term represented as a lst of words, t [w 1, w 2,,w,., w l ]. Elements n t could be dentcal, representng dfferent occurrences of a same word n t. We note w the th word n t. We recursvely defne the poston of the word w n the lst t by the followng: pos pos t t ( w1 ) 1 ( w ) pos ( w ) t 1 + 1, 1.. l Let t 1 [ w w,..., w m ], t [ y y,..., y l ] 1, 2 gven terms. 2 1, 2 be two Defnton 1 t 2 s a sub-term of t 1 f the whole sequence of words n t 2 occurs n t 1. t 1 s then called sur-term of t Identfcaton of ndex terms Before any document processng, n partcular before prunng stop words, an mportant process for the next steps conssts n extractng monoword and mult-word terms from texts that correspond to entres n WordNet. The technque we propose performs a word by word analyss of the document. It s descrbed n the followng: 466 Proceedngs of IPMU 08

4 Let w be the next word (assumed not to be a stop word), to analyze n the document d. We extract from WordNet, the set S of terms contanng w : let S {C 1, C 2,... C m}, S s composed of mult-word and mono-word terms. S s ranked n decreasng order of term length as follows: S {C (1), C (2), C (m)} where () (1)..(m) s an ndex permutaton such as C (1) C (2) C (m). Terms wth dentcal szes are ndfferently placed one besde another. For each element C () n S, we note Pos C () (w ) the poston of w n the C () lst of words. There are pos C w ( ) ( ) 1 words on the left of w n C (). Let Pos d (w ) be the poston of w n d lst of words. Defnton 2 The relatve context of w occurrence n document d gven the term C (), s the word lst CH defned by: CH sub(d, p,l) where: ( ) ( ) p posd w pos w 1 and l C C( ) (). We so extract the relatve context of w n d, namely CH sub(d, p,l) (c.f. Fg. 1), then we compare word strng lsts CH and C ().. Fgure 1: Identfyng word context n d If CH C (), the C ( +1 ) term of S s analyzed, otherwse, term t k C () s dentfed. The next word to analyze n d s w such that pos w p. ( ) l d + Durng the dentfcaton process, three cases can happen as shown n Fg. 2: Case a) Current dentfed term t k s completely dsont from t k-1. It could be dentcal but we do not treat denttes at ths level. It s thus a new term whch wll be retaned n document descrpton. Case b) Term t k covers partally term t k-1. The two terms are thus dfferent and both dentfed even f they have common words. Case c) Term t k covers completely one or more precedng adacent terms t k-1 t, k-1. In ths case, to allow an effectve dsambguaton, we retan the longest term, namely t k, and elmnate the adacent terms t covers (t k-1... t, k-1). case a) case b) case c) Fgure 2: Term dentfcaton By ths step, we wll have dentfed the set T(d) of all the mult-word or mono-word terms that compose d: T(d) {t 1, t 2,.., t n }. We fnally compute each term frequency n d and elmnate redundant terms, whch leads to: T (d){(t 1,Occ 1 ), (t 2,Occ 2 ),, (t m, Occ m ) / t d, Occ Occ(t ) s the occurrence frequency of t n d, 1 m and m n } Term weghtng Term weghtng assgnes to each term a weght that reflects ts mportance n a document. In the case of mono-word terms, varants of the tf*df formula are used, expressed as follows: W t,d tf(t)*df(t), tf s term frequency, df s the nverse N document frequency such as df () t log df () t, N s the number of documents n the corpus and df(t) the number of documents n the corpus that contan the term t. In the case of mult-word terms, term weghtng approaches use n general statstcal and/or syntactcal analyss. Roughly, they add sngle word frequences or multply the number of term occurrences by the number of sngle words belongng to ths term. We propose a new weghtng formula as varant of tf*df defned as follows: let W t,d be the weght assocated wth the term t n the document d, T (d) the term set of d, Sub (t) T (d) a sub-term of t, Sur (t ) T (d) a sur-term of t, S(t) the set of synsets C assocated wth term t. We defne the probablty that t s a possble sense of Sub (t) as follows: Proceedngs of IPMU

5 ( S( Sub () t ) P t { C S ( Sub () t )/ t C} S Sub () t ( ) We then defne W t, d tf (t) * df(t) as follows: W t, d Occ( t) + Occ( Sur ( t)) [ P( t S( Sub ( t) ) Occ( Sub ( t) )] N * ln df ( t) Where: N s the total number of documents n the corpus, df(t) (document frequency) s the number of documents n the corpus that contan the term t. The underlyng dea s that the global frequency of a term n a document s quantfed on the bass of both ts own frequency and ts frequency wthn each of ts sur-terms as well as ts probable frequency wthn ts sub-terms. Document ndex, Index(d), s then bult by keepng only those terms whose weghts are greater than a fxed threshold Term dsambguaton Each term t n Index(d) may have a number of related senses (.e. WordNet synsets). Let S {C 1,, C,..., C n } be the set of all synsets assocated wth term t. Thus, t has S n senses. We beleve that each ndex term contrbutes to the antc content representaton of d wth only one sense (even f that s somewhat erroneous, snce a term can have dfferent senses n the same document, but we consder here the only domnatng sense). From where, we must choose, for each term belongng to the document ndex (t Index(d)), ts best sense n d. Ths s term dsambguaton. Our dsambguaton approach reles on the computaton of a score (C_Score) for every concept (sense) assocated wth an ndex term. That s, for a term t, the score of ts th sense, noted C, s computed as: Score ( C ) l ( C C ) l WC, d WC, d Dst, k l [ 1,.., m] 1 k nl l k Where m s the number of concepts from Index(d), n l represents the number of WordNet senses whch s proper to each term t l, l Dst C, C s a antc relatedness measure ( ) k between concepts and as defned n [11], [12], [16] and s the weght assocated wth concept C C W C, d l C k, formally defned as follows: C S, W W C, d t, d The concept-sense whch maxmzes the score s then retaned as the best sense of term t. The underlyng dea s that the best sense for a term t n the document d may be the one strongly correlated wth senses assocated wth other mportant terms n d. The set N ( d ) of the selected senses represents the antc core of the document d. 3.3 Dscoverng assocatons between concepts Assocaton rules ntroduced n [1] am to generate all the sgnfcant assocatons between tems n a transactonal database. In our context, assocaton rules are used n order to dscover sgnfcant assocatons between concepts. The formal model s defned n the followng: Let ( d ) { C, C2,..., C,...} N be the antc core 1 of the document d. Each concept C s represented by the term t refers to n the document. The set of ts synonymous defnes ts Dom C C C, C,.... Each value doman ( ) { } value n Dom(C ), s a concept 1, 2 3 that C and C are synonymous. Thus, let η ( d ) {( C, Dom( C ))} ( d ) {( X, Dom( X ))} C N( d ) such, we smply note η the antc core of document d. We propose to use assocaton rules mnng to dscover latent (hdden) contextual relatons between the ndex concepts. Concepts are antc enttes. As assocaton rules allow dscoverng relatons between lexcal enttes, namely terms, we propose to extend them n order to support antc entty assocatons (namely concept assocatons). The proposed extenson s a revsed verson of the one defned n [4]. 468 Proceedngs of IPMU 08

6 Defnton 3 A antc assocaton rule between X and Y η( d ), we note X Y, s defned as follows: X X Dom Y X Y ( X ), Y Dom( Y ) Such that X Y s an assocaton between terms (frequent 1-temsets) X and Y. The ntutve meanng of X Y rule s that f a document s about a concept X, t tends to also be about the concept Y. The aboutness of a document expresses the topc focus of ts content. Ths nterpretaton also apples to the rule X Y. Thus, the rule R : X Y expresses the probablty that the document s about Y knowng that t s about X. The confdence assocated wth R thus rely on the mportance degree of Y n a document d, knowng the mportance degree of X n d. It s formally defned n the followng. Defnton 4 The confdence of the rule R : X Y s formally gven by: Confdence ( R) Support Support mn ( X and Y ) ( X ) ( W W ) X, d, W X, d Y, d Defnton 5 The confdence of a antc assocaton rule R : X Y s defned as: Confdence ( X Y ) max Confdence, X R : X X ( ) ( ) Dom X, Y Dom Y Remark 1. Confdence ( X Y ) always equal 1. In our context, the support of a antc assocaton rule X Y reles on the amount of ndvdual assocaton rules X Y ( X Dom( X ) and Y Dom( Y ), havng confdence greater than the fxed mnmum confdence threshold mnconf1. The support s formally defned n the followng. Defnton 6 The support of the rule R : X Y s formally gven by: / Support ( R) { X Y / Confdence( X Y ) mn conf } { X Y, ( X, Y ) Dom( X ) Dom( Y )} We propose to dscover relatons between concepts n η(d) by means of antc assocaton rules. Semantc assocaton rules are based, n our context, on the followng prncples: (1) a transacton s a document, (2) tems are values from concept domans, (3) an temset s a concept, (4) a antc assocaton rule X Y defnes an mplcaton (.e. a condtonal relaton) between concepts X and Y. By usng assocaton rules, we am to buld a condtonal herarchcal structure of the topc focus of the document content. That s to say, we am at structurng concepts descrbng the document, n a condtonal herarchy whch s supported by the antcs of extracted assocaton rules. The problem of dscoverng assocaton rules between concepts s devded nto two steps, followng the prncple by A-pror algorthm.. Frst, dentfyng all the frequent 1-temsets, correspondng to ndvdual concepts. A frequent concept s n our context, a concept that have a weght greater than a fxed mnmum threshold. Second, mnng assocaton rules between the frequent temsets. The obectve of mnng antc assocaton rules between concepts s to retan the only ones that have a support and a confdence greater than a fxed mnmum threshold mnsup and mnconf respectvely. Some problems can occur when dscoverng assocaton rules, such as redundances and cycles. Redundant rules come generally from transtve propertes: X Y, Y Z and X Z. In order to elmnate redundancy we propose to buld the mnmal covers of the set of extracted rules (that s the mnmal set of non transtve rules). The exstence of cycles n the graph would be due to the smultaneous dscovery of assocaton rules X Y and Y X, or of assocaton rules such as X Y, Y Z and Z X. To solve ths problem, we elmnate the weakest rule Proceedngs of IPMU

7 havng the lowest support, among that leadng to the cycle. If all the rule supports are equal, we randomly elmnate one rule n the cycle. Fnally, concepts and related assocaton rules are structured nto a network leadng to a graphcal ndex. The process of buldng the ndex graph s based on the followng prncples: Nodes n the graph are are concepts from the antc core of the document d, η ( d ) {( C, Dom( C ))}. Thus, nodes n the graph are represented by varables attached to the concepts C, nstantated n the value doman Dom C C C, C,.... ( ) { } 1, 2 3 Node relatons are condtonal dependences dscovered between concepts, by means of antc assocaton rules. Each node X n the graph s annotated by a uncondtonal table named T(X) such that: ( X ), T ( X ) WX d X Dom, (1) In a prevous work [4], we used partcularly CP- Nets [5] as the graphcal model supportng ths conceptual ndex. 3.4 Illustraton The document ndexng process presented above s llustrated through the followng example. Let d((u.c 1.,0.5),(Metropols,0.9), (Impovershment, 0.1),(People,0.4),(Poorness, 0.7), ) a document descrbed by the gven weghted concepts. Metropols, and U.C are synonymes of Cty, thus U.C and Metropols pertan to Cty concept node doman. Smlarly, both of Impovershment and Poorness pertan to Poverty concept node doman, whereas People s assocated wth populaton concept node. That s to say η(d) {(Cty, Dom (Cty)), (Poverty, Dom(Poverty)), (Populaton, Dom(Populaton)) Dom(Cty) {Metropols, U.C}, Dom(Poverty) {Poorness,Impovershment}, Dom(Populaton) {People}. 1 U.C. Urban Center We am to dscover assocatons between Cty, Populaton and Poverty nodes. Applyng Apror algorthm [2] leads: (1) to extract frequent temsets, then (2) to generate assocaton rules between frequent 1- temsets. Relatons of nterest are between ndvdual concepts n the document (rather than between sets of concepts), thus we only have to compute the k-temsets, k1, 2. Assumng a mnmum support threshold mnsup 0.1, the extracted frequent k-temsets (k 1, 2) are gven n Table 1. Support(Impovershment) < msup : the 1- temset Impovershment s not frequent, t s then pruned. The extracted assocaton rules are gven n Table 2. Table 1: Generatng frequent k-temsets 1-Itemsets Itemset Support Metropols 0.9 U.C 0.5 Impovershment 0.1 Poorness 0.7 Frequent 2- temsets Table 2: R 1 : Metropols People R 3 : Metropols Poorness R 5 : U.C People R 7 : U.C Poorness R 9 : People Poorness People 0.4 Metropols, People 0.4 Metropols, Poorness 0.7 U.C, People 0.4 U.C, Poorness 0.5 People, Poorness 0.4 Generated assocaton rules R 2 : People Metropols R 4 : Poorness Metropols R 6 : People U.C R 8 : Poorness U.C R 10 : Poorness People By applyng the formula gven n defnton 4, generated rule confdences are computed leadng to the results gven n Table 3. If we suppose a confdence mnmum threshold mnconf 1, we retan only rules whose confdences are equal or greater than mnconf. The selected rules are shown n Table Proceedngs of IPMU 08

8 Table 3: Confdence rules R R 1 R 3 R 5 R 7 R 9 Confdence(R ) assocated wth concept nodes Populaton, Cty and Poverty respectvely, usng formula 1, whch leads to the graphcal document ndex gven n Fg. 3. R 2 R 4 R 6 R 8 R These rules are frst used to buld antcal assocaton rules that n fact correspond to the ndex graph concept- node relatons. Thus, we deduce: (1) From R 2 : People Metropols and R 6 : People U.C: Populaton Cty (2) From R 4 : Poorness Metropols : Poverty Cty (3) From R 7 : U.C Poorness : Ct y Poverty (4) From R 9 : People Poorness : Populaton Poverty Table 4: Selected assocaton rules R 2 : People Metropols R 4 : Poorness Metropols R 6 : People U.C R 7 : U.C Poorness R 9 : People Poorness We then compute the support of each antc assocaton rule. The results are gven n Table 5. Table 5: Semantc assocaton rules supports Populaton Cty 1 Poverty Cty 0.5 Cty Poverty 0.5 Populaton Poverty 1 Fgure 3: A graphcal document ndex 4 Concluson We descrbed n ths paper a novel approach for conceptual document ndexng. The approach s founded on the ont use of both ontology for dentfyng, weghtng and dsambguatng terms, and assocaton rules to derve context dependent relatons (the context here refers to the document content) between terms leadng to a more expressve document representaton. The approach foundaton s not new but we proposed new technques for dentfyng, weghtng and dsambguatng terms and for dscoverng relatons between related concepts by means of the proposed antc assocaton rules. Semantc assocaton rules allow to derve context dependent relatons between concepts leadng to a more expressve document representaton. The work s stll n progress and results from a comparatve study wth other exstng conceptual ndexng approaches wll be soon avalable. We obvously retan these rules that have a support equal to 1. Two assocatons exst between concepts Cty and Poverty, wth the same support. We thus randomly keep one of them. Suppose Poverty Cty s retaned. The set of selected antc assocaton rules s hghlghted n Table 5. Clearly, retanng the three rules wll lead to a cycle n the ndex graph. To avod ths, we prune the weakest rule (that s the one wth lowest support) Poverty Cty. Fnally, the only selected antc rules are the followng: Populaton Cty and Populaton Poverty. CPT s are then References [1] R.Agrawal, T. Imelnsk, and A. Swam (1993). Mnng assocaton rules between sets of tems n large databases. In Proceedngs of the ACMSIGMOD Conference on Management of data, pages , Washngton, USA, [2] R. Agrawal and R. Srkant (1994). Fast algorthms for mnng assocaton rules. In Proceedngs of the 20th VLdB Conference, pages Proceedngs of IPMU

9 [3] M. Bazz, M.Boughanem, N.Aussenac- Glles N. and C. Chrsment (2005). Semantc Cores for Representng document n IR. In SAC' th ACM Symposum on Appled Computng, pages Santa Fe, New Mexco, USA, [4] F. Boubekeur, M. Boughanem and L. Tamne-Lechan (2007). Semantc Informaton Retreval Based on CP-Nets, n IEEE Internatonal Conference on Fuzzy Systems, London, U.K, uly [5] C. Boutler, R. Brafman, H. Hoos and D. Poole (1999). Reasonng wth Condtonal Ceters Parbus Preference Statements. In Proceedngs of UAI-1999, pages [6] W.B. Croft (1993). Knowledge-based and statstcal approaches to text retreval. In IEEE Expert, 8(2), pages [7] N. Guarno, C. Masolo and G. Vetere (1999). OntoSeek: Usng Large Lngustc Ontologes for Accessng On-Lne Yellow Pages and Product Catalogs. Natonal Research Councl, LADSEBCNR, Padavo, Italy, [8] J. Gonzalo, F. Verdeo, I. Chugur, J. Cgarran. (1998). Indexng wth WordNet synsets can mprove retreval. In proceedngs of COLING/ACL Work, [9] L.R Khan (2000). Ontology-based Informaton Selecton. Phd Thess, Faculty of the Graduate School, Unversty of Southern Calforna. [10] R. Krovetz and W.B. Croft (1992). Lexcal Ambguty and Informaton Retreval. In ACM Transactons on Informaton Systems, 10(1). [11] C. Leacock, G.A. Mller and M. Chodorow (1998). Usng corpus statstcs and WordNet relatons for sense dentfcaton. Computatonal Lngustcs 24(1,) pages [12] D. Ln (1998). An nformaton-theoretc defnton of smlarty. In Proceedngs of 15th Internatonal Conference On Machne Learnng. [13] R. Mhalcea, and D. Moldovan (2000). Semantc ndexng usng WordNet senses. In Proceedngs of ACL Workshop on IR and NLP, HongKong. [14] M. Mauldn, J. Carbonell and R. Thomason (1987). Beyond the keyword barer: knowledge-based nformaton retreval. Informaton servces and use 7(4-5), pages [15] J.A. Olvas, P.J. Garcés and F.P. Romero (2003). An applcaton of the FIS-CRM model to the FISS metasearcher: Usng fuzzy synonymy and fuzzy generalty for representng concepts n documents. Internatonal Journal of Approxmate Reasonng, Volume 34, Issues 2-3, pages , November [16] P. Resnk (1999). Semantc Smlarty n a Taxonomy: An Informaton-Based Measure and ts Applcaton to Problems of Ambguty n Natural Language. Journal of Artfcal Intellgence Research (JAIR), 11. pages [17] J.J. Roccho (1971). Relevance feedback n nformaton retreval. In The SMART Retreval System-experments n Automatc Document Processng, G.Salton edtor, Prentce-Hall, Englewood Clffs, NJ, pages [18] M. Sanderson (1994). Word sense dsambguaton and nformaton retreval. In Proceedngs of the 17th Annual Internatonal ACM-SIGIR Conference on Research and development n Informaton Retreval, pages [19] M.A. Starmand and J.B. Wllam (1996). Conceptual and Contextual Indexng of documents usng WordNet-derved Lexcal Chans. In Proceedngs of 18th BCS-IRSG Annual Colloquum on Informaton Retreval Research. [20] E.M. Voorhees (1993). Usng WordNet to dsambguate Word Senses for Text Retreval. In Proceedngs of the 16thAnnual Conference on Research and development n Informaton Retreval, SIGIR'93, Pttsburgh, PA. [21] D. Yarowsky (1995). Unsupervsed word sense dsambguaton rvalng supervsed methods. In 33rd Annual Meetng, Assocaton for Computatonal Lngustcs, pages , Cambrdge, Massachusetts, USA. 472 Proceedngs of IPMU 08

CS47300: Web Information Search and Management

CS47300: Web Information Search and Management CS47300: Web Informaton Search and Management Probablstc Retreval Models Prof. Chrs Clfton 7 September 2018 Materal adapted from course created by Dr. Luo S, now leadng Albaba research group 14 Why probabltes

More information

Fuzzy Boundaries of Sample Selection Model

Fuzzy Boundaries of Sample Selection Model Proceedngs of the 9th WSES Internatonal Conference on ppled Mathematcs, Istanbul, Turkey, May 7-9, 006 (pp309-34) Fuzzy Boundares of Sample Selecton Model L. MUHMD SFIIH, NTON BDULBSH KMIL, M. T. BU OSMN

More information

On the correction of the h-index for career length

On the correction of the h-index for career length 1 On the correcton of the h-ndex for career length by L. Egghe Unverstet Hasselt (UHasselt), Campus Depenbeek, Agoralaan, B-3590 Depenbeek, Belgum 1 and Unverstet Antwerpen (UA), IBW, Stadscampus, Venusstraat

More information

A New Evolutionary Computation Based Approach for Learning Bayesian Network

A New Evolutionary Computation Based Approach for Learning Bayesian Network Avalable onlne at www.scencedrect.com Proceda Engneerng 15 (2011) 4026 4030 Advanced n Control Engneerng and Informaton Scence A New Evolutonary Computaton Based Approach for Learnng Bayesan Network Yungang

More information

Corpora and Statistical Methods Lecture 6. Semantic similarity, vector space models and wordsense disambiguation

Corpora and Statistical Methods Lecture 6. Semantic similarity, vector space models and wordsense disambiguation Corpora and Statstcal Methods Lecture 6 Semantc smlarty, vector space models and wordsense dsambguaton Part 1 Semantc smlarty Synonymy Dfferent phonologcal/orthographc words hghly related meanngs: sofa

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method

More information

Semi-supervised Classification with Active Query Selection

Semi-supervised Classification with Active Query Selection Sem-supervsed Classfcaton wth Actve Query Selecton Jao Wang and Swe Luo School of Computer and Informaton Technology, Beng Jaotong Unversty, Beng 00044, Chna Wangjao088@63.com Abstract. Labeled samples

More information

Note on EM-training of IBM-model 1

Note on EM-training of IBM-model 1 Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are

More information

Statistics II Final Exam 26/6/18

Statistics II Final Exam 26/6/18 Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

THE SUMMATION NOTATION Ʃ

THE SUMMATION NOTATION Ʃ Sngle Subscrpt otaton THE SUMMATIO OTATIO Ʃ Most of the calculatons we perform n statstcs are repettve operatons on lsts of numbers. For example, we compute the sum of a set of numbers, or the sum of the

More information

A Weighted Utility Framework for Mining Association Rules

A Weighted Utility Framework for Mining Association Rules A Weghted Utlty Framework for Mnng Assocaton Rules M Sulaman Khan 1, 2, Maybn Muyeba 1, Frans Coenen 2 1 School of Computng, Lverpool Hope Unversty, UK 2 Department of Computer Scence, Unversty of Lverpool,

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

Bayesian Learning. Smart Home Health Analytics Spring Nirmalya Roy Department of Information Systems University of Maryland Baltimore County

Bayesian Learning. Smart Home Health Analytics Spring Nirmalya Roy Department of Information Systems University of Maryland Baltimore County Smart Home Health Analytcs Sprng 2018 Bayesan Learnng Nrmalya Roy Department of Informaton Systems Unversty of Maryland Baltmore ounty www.umbc.edu Bayesan Learnng ombnes pror knowledge wth evdence to

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Probabilistic Information Retrieval CE-324: Modern Information Retrieval Sharif University of Technology

Probabilistic Information Retrieval CE-324: Modern Information Retrieval Sharif University of Technology Probablstc Informaton Retreval CE-324: Modern Informaton Retreval Sharf Unversty of Technology M. Soleyman Fall 2016 Most sldes have been adapted from: Profs. Mannng, Nayak & Raghavan (CS-276, Stanford)

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS M. Krshna Reddy, B. Naveen Kumar and Y. Ramu Department of Statstcs, Osmana Unversty, Hyderabad -500 007, Inda. nanbyrozu@gmal.com, ramu0@gmal.com

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

Power law and dimension of the maximum value for belief distribution with the max Deng entropy

Power law and dimension of the maximum value for belief distribution with the max Deng entropy Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng

More information

Retrieval Models: Language models

Retrieval Models: Language models CS-590I Informaton Retreval Retreval Models: Language models Luo S Department of Computer Scence Purdue Unversty Introducton to language model Ungram language model Document language model estmaton Maxmum

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Extracting Pronunciation-translated Names from Chinese Texts using Bootstrapping Approach

Extracting Pronunciation-translated Names from Chinese Texts using Bootstrapping Approach Extractng Pronuncaton-translated Names from Chnese Texts usng Bootstrappng Approach Jng Xao School of Computng, Natonal Unversty of Sngapore xaojng@comp.nus.edu.sg Jmn Lu School of Computng, Natonal Unversty

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

CHAPTER-5 INFORMATION MEASURE OF FUZZY MATRIX AND FUZZY BINARY RELATION

CHAPTER-5 INFORMATION MEASURE OF FUZZY MATRIX AND FUZZY BINARY RELATION CAPTER- INFORMATION MEASURE OF FUZZY MATRI AN FUZZY BINARY RELATION Introducton The basc concept of the fuzz matr theor s ver smple and can be appled to socal and natural stuatons A branch of fuzz matr

More information

A new construction of 3-separable matrices via an improved decoding of Macula s construction

A new construction of 3-separable matrices via an improved decoding of Macula s construction Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula

More information

Appendix B: Resampling Algorithms

Appendix B: Resampling Algorithms 407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles

More information

Spectral Clustering. Shannon Quinn

Spectral Clustering. Shannon Quinn Spectral Clusterng Shannon Qunn (wth thanks to Wllam Cohen of Carnege Mellon Unverst, and J. Leskovec, A. Raaraman, and J. Ullman of Stanford Unverst) Graph Parttonng Undrected graph B- parttonng task:

More information

Question Classification Using Language Modeling

Question Classification Using Language Modeling Queston Classfcaton Usng Language Modelng We L Center for Intellgent Informaton Retreval Department of Computer Scence Unversty of Massachusetts, Amherst, MA 01003 ABSTRACT Queston classfcaton assgns a

More information

Aggregation of Social Networks by Divisive Clustering Method

Aggregation of Social Networks by Divisive Clustering Method ggregaton of Socal Networks by Dvsve Clusterng Method mne Louat and Yves Lechaveller INRI Pars-Rocquencourt Rocquencourt, France {lzennyr.da_slva, Yves.Lechevaller, Fabrce.Ross}@nra.fr HCSD Beng October

More information

Chapter - 2. Distribution System Power Flow Analysis

Chapter - 2. Distribution System Power Flow Analysis Chapter - 2 Dstrbuton System Power Flow Analyss CHAPTER - 2 Radal Dstrbuton System Load Flow 2.1 Introducton Load flow s an mportant tool [66] for analyzng electrcal power system network performance. Load

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

Evaluation for sets of classes

Evaluation for sets of classes Evaluaton for Tet Categorzaton Classfcaton accuracy: usual n ML, the proporton of correct decsons, Not approprate f the populaton rate of the class s low Precson, Recall and F 1 Better measures 21 Evaluaton

More information

= z 20 z n. (k 20) + 4 z k = 4

= z 20 z n. (k 20) + 4 z k = 4 Problem Set #7 solutons 7.2.. (a Fnd the coeffcent of z k n (z + z 5 + z 6 + z 7 + 5, k 20. We use the known seres expanson ( n+l ( z l l z n below: (z + z 5 + z 6 + z 7 + 5 (z 5 ( + z + z 2 + z + 5 5

More information

Handling Uncertain Spatial Data: Comparisons between Indexing Structures. Bir Bhanu, Rui Li, Chinya Ravishankar and Jinfeng Ni

Handling Uncertain Spatial Data: Comparisons between Indexing Structures. Bir Bhanu, Rui Li, Chinya Ravishankar and Jinfeng Ni Handlng Uncertan Spatal Data: Comparsons between Indexng Structures Br Bhanu, Ru L, Chnya Ravshankar and Jnfeng N Abstract Managng and manpulatng uncertanty n spatal databases are mportant problems for

More information

Week 2. This week, we covered operations on sets and cardinality.

Week 2. This week, we covered operations on sets and cardinality. Week 2 Ths week, we covered operatons on sets and cardnalty. Defnton 0.1 (Correspondence). A correspondence between two sets A and B s a set S contaned n A B = {(a, b) a A, b B}. A correspondence from

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

A METHOD FOR DETECTING OUTLIERS IN FUZZY REGRESSION

A METHOD FOR DETECTING OUTLIERS IN FUZZY REGRESSION OPERATIONS RESEARCH AND DECISIONS No. 2 21 Barbara GŁADYSZ* A METHOD FOR DETECTING OUTLIERS IN FUZZY REGRESSION In ths artcle we propose a method for dentfyng outlers n fuzzy regresson. Outlers n a sample

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

The Study of Teaching-learning-based Optimization Algorithm

The Study of Teaching-learning-based Optimization Algorithm Advanced Scence and Technology Letters Vol. (AST 06), pp.05- http://dx.do.org/0.57/astl.06. The Study of Teachng-learnng-based Optmzaton Algorthm u Sun, Yan fu, Lele Kong, Haolang Q,, Helongang Insttute

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Redesigning Decision Matrix Method with an indeterminacy-based inference process

Redesigning Decision Matrix Method with an indeterminacy-based inference process Redesgnng Decson Matrx Method wth an ndetermnacy-based nference process Jose L. Salmeron a* and Florentn Smarandache b a Pablo de Olavde Unversty at Sevlle (Span) b Unversty of New Mexco, Gallup (USA)

More information

Joint Statistical Meetings - Biopharmaceutical Section

Joint Statistical Meetings - Biopharmaceutical Section Iteratve Ch-Square Test for Equvalence of Multple Treatment Groups Te-Hua Ng*, U.S. Food and Drug Admnstraton 1401 Rockvlle Pke, #200S, HFM-217, Rockvlle, MD 20852-1448 Key Words: Equvalence Testng; Actve

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Uncertainty as the Overlap of Alternate Conditional Distributions

Uncertainty as the Overlap of Alternate Conditional Distributions Uncertanty as the Overlap of Alternate Condtonal Dstrbutons Olena Babak and Clayton V. Deutsch Centre for Computatonal Geostatstcs Department of Cvl & Envronmental Engneerng Unversty of Alberta An mportant

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

Similar Sentence Retrieval for Machine Translation Based on Word-Aligned Bilingual Corpus

Similar Sentence Retrieval for Machine Translation Based on Word-Aligned Bilingual Corpus Smlar Sentence Retreval for Machne Translaton Based on Word-Algned Blngual Corpus Wen-Han Chao and Zhou-Jun L School of Computer Scence, Natonal Unversty of Defense Technology, Chna, 40073 cwhk@63.com

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

International Journal of Mathematical Archive-3(3), 2012, Page: Available online through ISSN

International Journal of Mathematical Archive-3(3), 2012, Page: Available online through   ISSN Internatonal Journal of Mathematcal Archve-3(3), 2012, Page: 1136-1140 Avalable onlne through www.ma.nfo ISSN 2229 5046 ARITHMETIC OPERATIONS OF FOCAL ELEMENTS AND THEIR CORRESPONDING BASIC PROBABILITY

More information

The Minimum Universal Cost Flow in an Infeasible Flow Network

The Minimum Universal Cost Flow in an Infeasible Flow Network Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran

More information

ENTROPIC QUESTIONING

ENTROPIC QUESTIONING ENTROPIC QUESTIONING NACHUM. Introucton Goal. Pck the queston that contrbutes most to fnng a sutable prouct. Iea. Use an nformaton-theoretc measure. Bascs. Entropy (a non-negatve real number) measures

More information

Graph Reconstruction by Permutations

Graph Reconstruction by Permutations Graph Reconstructon by Permutatons Perre Ille and Wllam Kocay* Insttut de Mathémathques de Lumny CNRS UMR 6206 163 avenue de Lumny, Case 907 13288 Marselle Cedex 9, France e-mal: lle@ml.unv-mrs.fr Computer

More information

A Network Intrusion Detection Method Based on Improved K-means Algorithm

A Network Intrusion Detection Method Based on Improved K-means Algorithm Advanced Scence and Technology Letters, pp.429-433 http://dx.do.org/10.14257/astl.2014.53.89 A Network Intruson Detecton Method Based on Improved K-means Algorthm Meng Gao 1,1, Nhong Wang 1, 1 Informaton

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

On the Repeating Group Finding Problem

On the Repeating Group Finding Problem The 9th Workshop on Combnatoral Mathematcs and Computaton Theory On the Repeatng Group Fndng Problem Bo-Ren Kung, Wen-Hsen Chen, R.C.T Lee Graduate Insttute of Informaton Technology and Management Takmng

More information

MAXIMUM A POSTERIORI TRANSDUCTION

MAXIMUM A POSTERIORI TRANSDUCTION MAXIMUM A POSTERIORI TRANSDUCTION LI-WEI WANG, JU-FU FENG School of Mathematcal Scences, Peng Unversty, Bejng, 0087, Chna Center for Informaton Scences, Peng Unversty, Bejng, 0087, Chna E-MIAL: {wanglw,

More information

on the improved Partial Least Squares regression

on the improved Partial Least Squares regression Internatonal Conference on Manufacturng Scence and Engneerng (ICMSE 05) Identfcaton of the multvarable outlers usng T eclpse chart based on the mproved Partal Least Squares regresson Lu Yunlan,a X Yanhu,b

More information

Complement of Type-2 Fuzzy Shortest Path Using Possibility Measure

Complement of Type-2 Fuzzy Shortest Path Using Possibility Measure Intern. J. Fuzzy Mathematcal rchve Vol. 5, No., 04, 9-7 ISSN: 30 34 (P, 30 350 (onlne Publshed on 5 November 04 www.researchmathsc.org Internatonal Journal of Complement of Type- Fuzzy Shortest Path Usng

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals Smultaneous Optmzaton of Berth Allocaton, Quay Crane Assgnment and Quay Crane Schedulng Problems n Contaner Termnals Necat Aras, Yavuz Türkoğulları, Z. Caner Taşkın, Kuban Altınel Abstract In ths work,

More information

VQ widely used in coding speech, image, and video

VQ widely used in coding speech, image, and video at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

Computing MLE Bias Empirically

Computing MLE Bias Empirically Computng MLE Bas Emprcally Kar Wa Lm Australan atonal Unversty January 3, 27 Abstract Ths note studes the bas arses from the MLE estmate of the rate parameter and the mean parameter of an exponental dstrbuton.

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Recall: man dea of lnear regresson Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 8 Lnear regresson can be used to study an

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 008 Recall: man dea of lnear regresson Lnear regresson can be used to study

More information

Feature Selection in Multi-instance Learning

Feature Selection in Multi-instance Learning The Nnth Internatonal Symposum on Operatons Research and Its Applcatons (ISORA 10) Chengdu-Juzhagou, Chna, August 19 23, 2010 Copyrght 2010 ORSC & APORC, pp. 462 469 Feature Selecton n Mult-nstance Learnng

More information

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng

More information

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests Smulated of the Cramér-von Mses Goodness-of-Ft Tests Steele, M., Chaselng, J. and 3 Hurst, C. School of Mathematcal and Physcal Scences, James Cook Unversty, Australan School of Envronmental Studes, Grffth

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Fundamental loop-current method using virtual voltage sources technique for special cases

Fundamental loop-current method using virtual voltage sources technique for special cases Fundamental loop-current method usng vrtual voltage sources technque for specal cases George E. Chatzaraks, 1 Marna D. Tortorel 1 and Anastasos D. Tzolas 1 Electrcal and Electroncs Engneerng Departments,

More information

MDL-Based Unsupervised Attribute Ranking

MDL-Based Unsupervised Attribute Ranking MDL-Based Unsupervsed Attrbute Rankng Zdravko Markov Computer Scence Department Central Connectcut State Unversty New Brtan, CT 06050, USA http://www.cs.ccsu.edu/~markov/ markovz@ccsu.edu MDL-Based Unsupervsed

More information

arxiv:cs.cv/ Jun 2000

arxiv:cs.cv/ Jun 2000 Correlaton over Decomposed Sgnals: A Non-Lnear Approach to Fast and Effectve Sequences Comparson Lucano da Fontoura Costa arxv:cs.cv/0006040 28 Jun 2000 Cybernetc Vson Research Group IFSC Unversty of São

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Bayesian predictive Configural Frequency Analysis

Bayesian predictive Configural Frequency Analysis Psychologcal Test and Assessment Modelng, Volume 54, 2012 (3), 285-292 Bayesan predctve Confgural Frequency Analyss Eduardo Gutérrez-Peña 1 Abstract Confgural Frequency Analyss s a method for cell-wse

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

Probabilistic Metrics for Soft-Clustering and Topic Model Validation

Probabilistic Metrics for Soft-Clustering and Topic Model Validation 010 IEEE/WIC/ACM Internatonal Conference on Web Intellgence and Intellgent Agent Technology Probablstc Metrcs for Soft-Clusterng and Topc Model Valdaton Eduardo H. Ramrez, Ramon Brena Center for Intellgent

More information

Split alignment. Martin C. Frith April 13, 2012

Split alignment. Martin C. Frith April 13, 2012 Splt algnment Martn C. Frth Aprl 13, 2012 1 Introducton Ths document s about algnng a query sequence to a genome, allowng dfferent parts of the query to match dfferent parts of the genome. Here are some

More information

Online Appendix to: Axiomatization and measurement of Quasi-hyperbolic Discounting

Online Appendix to: Axiomatization and measurement of Quasi-hyperbolic Discounting Onlne Appendx to: Axomatzaton and measurement of Quas-hyperbolc Dscountng José Lus Montel Olea Tomasz Strzaleck 1 Sample Selecton As dscussed before our ntal sample conssts of two groups of subjects. Group

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

CHAPTER IV RESEARCH FINDING AND DISCUSSIONS

CHAPTER IV RESEARCH FINDING AND DISCUSSIONS CHAPTER IV RESEARCH FINDING AND DISCUSSIONS A. Descrpton of Research Fndng. The Implementaton of Learnng Havng ganed the whole needed data, the researcher then dd analyss whch refers to the statstcal data

More information

Interval Estimation in the Classical Normal Linear Regression Model. 1. Introduction

Interval Estimation in the Classical Normal Linear Regression Model. 1. Introduction ECONOMICS 35* -- NOTE 7 ECON 35* -- NOTE 7 Interval Estmaton n the Classcal Normal Lnear Regresson Model Ths note outlnes the basc elements of nterval estmaton n the Classcal Normal Lnear Regresson Model

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information

Solving Nonlinear Differential Equations by a Neural Network Method

Solving Nonlinear Differential Equations by a Neural Network Method Solvng Nonlnear Dfferental Equatons by a Neural Network Method Luce P. Aarts and Peter Van der Veer Delft Unversty of Technology, Faculty of Cvlengneerng and Geoscences, Secton of Cvlengneerng Informatcs,

More information

Lecture 13 APPROXIMATION OF SECOMD ORDER DERIVATIVES

Lecture 13 APPROXIMATION OF SECOMD ORDER DERIVATIVES COMPUTATIONAL FLUID DYNAMICS: FDM: Appromaton of Second Order Dervatves Lecture APPROXIMATION OF SECOMD ORDER DERIVATIVES. APPROXIMATION OF SECOND ORDER DERIVATIVES Second order dervatves appear n dffusve

More information

829. An adaptive method for inertia force identification in cantilever under moving mass

829. An adaptive method for inertia force identification in cantilever under moving mass 89. An adaptve method for nerta force dentfcaton n cantlever under movng mass Qang Chen 1, Mnzhuo Wang, Hao Yan 3, Haonan Ye 4, Guola Yang 5 1,, 3, 4 Department of Control and System Engneerng, Nanng Unversty,

More information

An Improved multiple fractal algorithm

An Improved multiple fractal algorithm Advanced Scence and Technology Letters Vol.31 (MulGraB 213), pp.184-188 http://dx.do.org/1.1427/astl.213.31.41 An Improved multple fractal algorthm Yun Ln, Xaochu Xu, Jnfeng Pang College of Informaton

More information

This column is a continuation of our previous column

This column is a continuation of our previous column Comparson of Goodness of Ft Statstcs for Lnear Regresson, Part II The authors contnue ther dscusson of the correlaton coeffcent n developng a calbraton for quanttatve analyss. Jerome Workman Jr. and Howard

More information