6 Probabilistic Retrieval Models


Overview:
- Notations
- Binary Independence Retrieval model
- Probability Ranking Principle

Notations

[Figure: conceptual model: queries $q \in Q$ and documents $d \in D$ are mapped to representations ($\alpha_Q$, $\alpha_D$) and further to descriptions ($\beta_Q$, $\beta_D$); the retrieval function $\varrho$ compares descriptions, while relevance judgements relate query-document pairs to the relevance scale $\mathcal{R}$]

- $q \in Q$: query, $d \in D$: document
- $q_k$: query representation, $d_m$: document representation
- $q_k^D \in Q^D$: query description, $d_m^D \in D^D$: document description
- $\mathcal{R}$: relevance scale
- $\varrho$: retrieval function

Binary independence retrieval model

Retrieval functions for binary indexing represent queries and documents as sets of terms:
- $T = \{t_1, \ldots, t_n\}$: set of terms in the collection
- $q_k$: query representation, $d_m$: document representation
- $q_k^T \subseteq T$: set of query terms, $d_m^T \subseteq T$: set of document terms

Simple retrieval function, the coordination level match:
$\varrho_{coord}(q_k, d_m) = |q_k^T \cap d_m^T|$

Binary independence retrieval (BIR) model: assign weights to the query terms
$\varrho_{BIR}(q_k, d_m) = \sum_{t_i \in q_k^T \cap d_m^T} c_{ik}$
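A minimal sketch of these two scoring functions (not part of the original slides): query and document term sets are plain Python sets, and the BIR query term weights $c_{ik}$ are assumed to be given as a dict.

```python
def coord_match(query_terms: set, doc_terms: set) -> int:
    """Coordination level match: number of terms shared by query and document."""
    return len(query_terms & doc_terms)


def bir_score(query_terms: set, doc_terms: set, c: dict) -> float:
    """BIR retrieval function: sum of the query term weights c_ik over the
    terms occurring in both the query and the document."""
    return sum(c[t] for t in query_terms & doc_terms if t in c)


# Illustrative usage (terms and weights are made up):
q = {"probabilistic", "retrieval", "model"}
d = {"a", "probabilistic", "model", "of", "indexing"}
weights = {"probabilistic": 1.2, "retrieval": 0.8, "model": 0.4}
print(coord_match(q, d))         # 2
print(bir_score(q, d, weights))  # 1.2 + 0.4 = 1.6
```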

Probabilistic foundation of the BIR model

Basic techniques for the derivation of probabilistic models:
1. application of Bayes' theorem:
$P(a|b) = \frac{P(a, b)}{P(b)} = \frac{P(b|a)\,P(a)}{P(b)}$
2. use of odds instead of probabilities, where
$O(y) = \frac{P(y)}{P(\bar{y})} = \frac{P(y)}{1 - P(y)}$

Derivation of the BIR model

Estimate $O(R|q_k, d_m^T)$, the odds that a document with term set $d_m^T$ will be relevant to $q_k$.

Represent the document $d_m$ as a binary vector $\vec{x} = (x_1, \ldots, x_n)$ with $x_i = 1$ if $t_i \in d_m^T$, and $x_i = 0$ otherwise.

Apply Bayes' theorem:
$O(R|q_k, \vec{x}) = \frac{P(R|q_k, \vec{x})}{P(\bar{R}|q_k, \vec{x})} = \frac{P(R|q_k)}{P(\bar{R}|q_k)} \cdot \frac{P(\vec{x}|R, q_k)}{P(\vec{x}|\bar{R}, q_k)}$

- $P(R|q_k)$: probability that an arbitrary document will be relevant to $q_k$ (generality of $q_k$)
- $P(\vec{x}|R, q_k)$: probability that an arbitrary relevant document will have term vector $\vec{x}$
- $P(\vec{x}|\bar{R}, q_k)$: probability that an arbitrary nonrelevant document will have term vector $\vec{x}$

Linked dependence assumption:
$\frac{P(\vec{x}|R, q_k)}{P(\vec{x}|\bar{R}, q_k)} = \prod_{i=1}^{n} \frac{P(x_i|R, q_k)}{P(x_i|\bar{R}, q_k)}$

$O(R|q_k, \vec{x}) = O(R|q_k) \prod_{i=1}^{n} \frac{P(x_i|R, q_k)}{P(x_i|\bar{R}, q_k)}$

Split according to the presence/absence of terms in the current document:
$O(R|q_k, \vec{x}) = O(R|q_k) \prod_{x_i=1} \frac{P(x_i{=}1|R, q_k)}{P(x_i{=}1|\bar{R}, q_k)} \cdot \prod_{x_i=0} \frac{P(x_i{=}0|R, q_k)}{P(x_i{=}0|\bar{R}, q_k)}$

- $p_{ik} = P(x_i{=}1|R, q_k)$: probability that $t_i$ occurs in an arbitrary relevant document
- $q_{ik} = P(x_i{=}1|\bar{R}, q_k)$: probability that $t_i$ occurs in an arbitrary nonrelevant document

Assume that $p_{ik} = q_{ik}$ for all $t_i \notin q_k^T$:
$O(R|q_k, d_m^T) = O(R|q_k) \prod_{t_i \in d_m^T \cap q_k^T} \frac{p_{ik}}{q_{ik}} \cdot \prod_{t_i \in q_k^T \setminus d_m^T} \frac{1 - p_{ik}}{1 - q_{ik}}$
$= O(R|q_k) \prod_{t_i \in d_m^T \cap q_k^T} \frac{p_{ik}(1 - q_{ik})}{q_{ik}(1 - p_{ik})} \cdot \prod_{t_i \in q_k^T} \frac{1 - p_{ik}}{1 - q_{ik}}$

Only the first product varies for different documents with respect to the same request $q_k$, so only this product needs to be regarded for ranking.

Use the logarithm:
$c_{ik} = \log \frac{p_{ik}(1 - q_{ik})}{q_{ik}(1 - p_{ik})}$

Retrieval function:
$\varrho_{BIR}(q_k, d_m) = \sum_{t_i \in d_m^T \cap q_k^T} c_{ik}$

Application of the BIR model

Parameter estimation for $q_{ik}$

$q_{ik} = P(x_i{=}1|\bar{R}, q_k)$: probability that $t_i$ occurs in an arbitrary nonrelevant document.

Assume that the number of nonrelevant documents is approximately the collection size:
- $N$: collection size
- $n_i$: number of documents containing term $t_i$

$q_{ik} \approx \frac{n_i}{N}$

Parameter estimation for $p_{ik}$

$p_{ik} = P(x_i{=}1|R, q_k)$: probability that $t_i$ occurs in an arbitrary relevant document.

1. Assume a global value $p$ for all $p_{ik}$ → term weighting by inverse document frequency (IDF):
$c_{ik} = \log \frac{p}{1 - p} + \log \frac{1 - q_{ik}}{q_{ik}} = c_p + \log \frac{N - n_i}{n_i}$
$\varrho_{IDF}(q_k, d_m) = \sum_{t_i \in q_k^T \cap d_m^T} \left( c_p + \log \frac{N - n_i}{n_i} \right)$

Often used: $p = 0.5$, hence $c_p = 0$.

2. Relevance feedback:
- initial ranking with the IDF formula
- present the top-ranking documents to the user
- the user gives binary relevance judgements: relevant / nonrelevant
- $r$: number of documents judged relevant for request $q_k$
- $r_i$: number of relevant documents containing term $t_i$

$p_{ik} = P(t_i|R, q_k) \approx \frac{r_i}{r}$

Improved estimates (see parameter estimation methods), e.g.
$p_{ik} \approx \frac{r_i + 0.5}{r + 1}$
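A small sketch of the term-weight estimation described above: the collection statistics ($N$, $n_i$) and the feedback counts ($r$, $r_i$) are assumed to be given; without feedback the weight falls back to the IDF form with a global $p$.

```python
import math


def bir_weight(N, n_i, r=0, r_i=0, p_default=0.5):
    """BIR term weight c_ik = log( p_ik (1 - q_ik) / (q_ik (1 - p_ik)) ).

    q_ik is estimated from the collection as n_i / N; p_ik either takes a
    global default value p (no feedback) or the smoothed feedback estimate
    (r_i + 0.5) / (r + 1)."""
    q_ik = n_i / N
    p_ik = p_default if r == 0 else (r_i + 0.5) / (r + 1)
    return math.log(p_ik * (1 - q_ik) / (q_ik * (1 - p_ik)))


# With p = 0.5 and no feedback this reduces to log((N - n_i) / n_i):
print(bir_weight(N=10000, n_i=100))               # ~ log(99)
print(bir_weight(N=10000, n_i=100, r=10, r_i=6))  # feedback-based weight
```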

BIR example

dm    r(dm)   x1   x2   P(R|x)   BIR estimate
d1    R       1    1    0.80     0.76
d2    R       1    1    0.80     0.76
d3    R       1    1    0.80     0.76
d4    R       1    1    0.80     0.76
d5    N       1    1    0.80     0.76
d6    R       1    0    0.67     0.69
d7    R       1    0    0.67     0.69
d8    R       1    0    0.67     0.69
d9    R       1    0    0.67     0.69
d10   N       1    0    0.67     0.69
d11   N       1    0    0.67     0.69
d12   R       0    1    0.50     0.48
d13   R       0    1    0.50     0.48
d14   R       0    1    0.50     0.48
d15   N       0    1    0.50     0.48
d16   N       0    1    0.50     0.48
d17   N       0    1    0.50     0.48
d18   R       0    0    0.33     0.40
d19   N       0    0    0.33     0.40
d20   N       0    0    0.33     0.40
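A worked check of the example, recomputing the two probability columns from the judgements in the table: there are 12 relevant and 8 nonrelevant documents, so

$$p_{1k} = \tfrac{8}{12}, \quad p_{2k} = \tfrac{7}{12}, \quad q_{1k} = \tfrac{3}{8}, \quad q_{2k} = \tfrac{4}{8}, \quad O(R|q_k) = \tfrac{12}{8}.$$

For $\vec{x} = (1,1)$ the BIR model with linked dependence gives

$$O(R|q_k, \vec{x}) = \tfrac{12}{8} \cdot \tfrac{8/12}{3/8} \cdot \tfrac{7/12}{4/8} \approx 3.11, \qquad P(R|q_k, \vec{x}) = \tfrac{3.11}{1 + 3.11} \approx 0.76,$$

compared with the direct estimate $P(R|\vec{x}) = 4/5 = 0.80$; analogously $0.69$ vs. $0.67$ for $(1,0)$, $0.48$ vs. $0.50$ for $(0,1)$, and $0.40$ vs. $0.33$ for $(0,0)$.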

The Probability Ranking Principle (PRP)

Perfect retrieval: rank all relevant documents ahead of any nonrelevant one
- relates to the objects themselves; only possible with complete relevance information

Optimum retrieval: relates to representations (as any IR system does)

The Probability Ranking Principle (PRP) defines optimum retrieval for probabilistic models: rank documents according to decreasing probability of relevance.

Decision-theoretic justification of the PRP

- $\bar{C}$: cost for the retrieval of a nonrelevant document
- $C$: cost for the retrieval of a relevant document

Expected cost for the retrieval of a document $d_j$:
$EC(q, d_j) = C \cdot P(R|q, d_j) + \bar{C} \cdot (1 - P(R|q, d_j))$

Total cost for retrieval (assuming the user looks at the first $l$ documents, where $l$ is not known in advance); $r(i)$ is the ranking function determining the index of the document placed at rank $i$:
$EC(q, l) = EC(q, d_{r(1)}, d_{r(2)}, \ldots, d_{r(l)}) = \sum_{i=1}^{l} EC(q, d_{r(i)})$

Minimum total cost: minimize $\sum_{i=1}^{l} EC(q, d_{r(i)})$, i.e. $r(i)$ should order the documents by ascending cost.

Decision-theoretic rule:
$EC(q, d_{r(i)}) \le EC(q, d_{r(i+1)})$
$\iff C \cdot P(R|q, d_{r(i)}) + \bar{C}\,(1 - P(R|q, d_{r(i)})) \le C \cdot P(R|q, d_{r(i+1)}) + \bar{C}\,(1 - P(R|q, d_{r(i+1)}))$
$\iff$ (since $C < \bar{C}$) $P(R|q, d_{r(i)}) \ge P(R|q, d_{r(i+1)})$

→ rank documents according to their decreasing probability of relevance!
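A minimal sketch of this ranking rule, assuming the relevance probabilities $P(R|q, d)$ have already been estimated and are passed in as a dict; the cost values are illustrative, only the relation $C < \bar{C}$ matters.

```python
def rank_by_expected_cost(prob_rel: dict, cost_rel=0.0, cost_nonrel=1.0):
    """Rank documents by ascending expected cost
    EC(q, d) = C * P(R|q, d) + C_bar * (1 - P(R|q, d)).

    With cost_rel < cost_nonrel this is the same ordering as ranking by
    decreasing probability of relevance (the PRP)."""
    ec = {d: cost_rel * p + cost_nonrel * (1 - p) for d, p in prob_rel.items()}
    return sorted(ec, key=ec.get)


print(rank_by_expected_cost({"d1": 0.2, "d2": 0.9, "d3": 0.5}))
# ['d2', 'd3', 'd1'] -- identical to sorting by decreasing P(R|q, d)
```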

Justification based on effectiveness measures

For any two events $a, b$, Bayes' theorem yields the following monotonic transformations of $P(a|b)$ (see the derivation of the BIR model):

$O(a|b) = \frac{P(b|a)\,P(a)}{P(b|\bar{a})\,P(\bar{a})}$

$\log O(a|b) = \log \frac{P(b|a)}{P(b|\bar{a})} + \log O(a)$

$\operatorname{logit} P(a|b) = \log \frac{P(b|a)}{P(b|\bar{a})} + \operatorname{logit} P(a)$

with $\operatorname{logit} P(x) = \log O(x)$.

- $\rho = P(\text{doc. retrieved} \mid \text{doc. relevant})$ (recall)
- $\varphi = P(\text{doc. retrieved} \mid \text{doc. nonrelevant})$ (fallout)
- $\pi = P(\text{doc. relevant} \mid \text{doc. retrieved})$ (precision)
- $\gamma = P(\text{doc. relevant})$ (generality)
- $\rho(d_i) = P(\text{doc. is } d_i \mid \text{doc. relevant})$
- $\varphi(d_i) = P(\text{doc. is } d_i \mid \text{doc. nonrelevant})$
- $\pi(d_i) = P(\text{doc. relevant} \mid \text{doc. is } d_i)$ (probability of relevance)
- $S$: set of retrieved documents

$\rho = \sum_{d_i \in S} \rho(d_i)$, $\quad \varphi = \sum_{d_i \in S} \varphi(d_i)$

$\operatorname{logit} \pi(d_i) = \log \frac{\rho(d_i)}{\varphi(d_i)} + \operatorname{logit} \gamma$

$\rho(d_i) = x_i \cdot \varphi(d_i)$ with $x_i = \exp(\operatorname{logit} \pi(d_i) - \operatorname{logit} \gamma)$

1. Cutoff defined by $\varphi$ (fallout):
$\varphi = \sum_{d_i \in S} \varphi(d_i)$, $\quad \rho = \sum_{d_i \in S} \rho(d_i) = \sum_{d_i \in S} \varphi(d_i)\,\exp(\operatorname{logit} \pi(d_i) - \operatorname{logit} \gamma)$
→ maximize $\rho$ (recall) by including the documents with the highest values of $\pi(d_i)$, i.e. rank according to the probability of relevance.

2. Given number of documents retrieved: maximize expected recall, minimize expected fallout.

3. Cutoff defined by $\rho$ (recall): minimize fallout.

$\operatorname{logit} \pi = \log(\rho / \varphi) + \operatorname{logit} \gamma$

4. Expected precision is maximized for a given recall / fallout / number of documents retrieved.

PRP for multivalued relevance scales

$n$ relevance values $R_1 < R_2 < \ldots < R_n$ with corresponding costs $C_1, C_2, \ldots, C_n$ for the retrieval of a document.

Rank documents according to their expected costs:
$EC(q, d_m) = \sum_{l=1}^{n} C_l \cdot P(R_l|q, d_m)$

Comparison with the binary case:
- a nonbinary scale is more appropriate for the user
- $n - 1$ estimates $P(R_l|q, d_m)$ are required
- the cost factors $C_l$ must be known
- contradicting experimental evidence so far

Combination of probabilistic and fuzzy retrieval

Fuzzy retrieval uses a degree of relevance instead of a binary scale; the system aims at computing the degree of relevance for a query-document pair.

Combination:
- continuous relevance scale $r \in [0, 1]$
- replace the probability distribution $P(R_l|q, d_m)$ by a density function $p(r|q, d_m)$
- replace the cost factors $C_l$ by a cost function $c(r)$

General concepts of probabilistic IR models

Conceptual model

[Figure: conceptual model as in the Notations section: queries and documents, their representations ($\alpha_Q$, $\alpha_D$) and descriptions ($\beta_Q$, $\beta_D$), the retrieval function $\varrho$, the relevance scale $\mathcal{R}$ and the relevance judgements]

Representations and descriptions of the BIR model

- query representation $q_k = (q_k^T, q_k^J)$: set of query terms $q_k^T$ plus set of relevance judgements $q_k^J = \{(d_m, r(d_m, q_k))\}$
- query description $q_k^D = \{(t_i, c_{ik})\}$: set of query terms with associated weights
- document representation $d_m = d_m^T$: set of terms
- document description $d_m^D$ = document representation $d_m^T$

Directions in the development of probabilistic IR models:
1. optimization of retrieval quality for a fixed representation (e.g. different dependence assumptions than in the BIR model)
2. models for more detailed representations (e.g. documents as bags of terms, phrases in addition to words)

Parameter learning in IR

[Figure: learning approaches in IR over the dimensions queries, documents and terms: query-related learning, document-related learning and description-related learning, each with its own learning and application areas]

Event space

Event space: $Q \times D$
- single element: query-document pair $(q_k, d_m)$
- all elements are equiprobable
- relevance judgement: $(q_k, d_m) \in r$ (the relevance relation)
- relevance judgements for different documents w.r.t. the same query are independent of each other
- probability of relevance $P(R|q_k, d_m)$: probability that an element (query-document pair) with representations $(q_k, d_m)$ is relevant
- regard collections as samples of possibly infinite sets
- poor representation of retrieval objects: a single representation may stand for a number of different objects

[Figure: event space of relevance models: queries in $Q$ and documents in $D$; several queries $q$ share the representation $q_k$, several documents $d$ share the representation $d_m$]

Relevance models: optimum polynomial retrieval functions

Basic concepts:
- regard retrieval as a (probabilistic) classification task (classify objects into one of $n$ classes)
- objects: query-document pairs $(q_k, d_m)$
- classes: relevance values $R_l \in \mathcal{R} = \{R_1, \ldots, R_n\}$
- description-oriented approach: a learning strategy abstracting from specific queries, documents and terms

[Figure: a query-document pair $(q, d)$ is mapped in the description step to a relevance description $\vec{x}(q, d)$; the decision step yields $P(R|\vec{x}(q, d))$ as an estimate of $P(R|q, d)$]

1. Description step: map query-document pairs onto a feature vector $\vec{x} = \vec{x}(q_k, d_m)$.
2. Decision step: apply classification functions $e_l(\vec{x})$ for estimating $P(R_l|\vec{x}(q_k, d_m))$, $l = 1, \ldots, n$; the classification functions are derived from a learning sample with relevance judgements.

Description step: example of a relevance description
- $x_1$: # descriptors common to query and document
- $x_2$: log(# descriptors common to query and document)
- $x_3$: highest indexing weight of a common descriptor
- $x_4$: lowest indexing weight of a common descriptor
- $x_5$: # common descriptors with weight ≥ 0.15
- $x_6$: # non-common descriptors with weight ≥ 0.15
- $x_7$: # descriptors in the document with weight ≥ 0.15
- $x_8$: $\log \prod$ (indexing weights of common descriptors)
- $x_9$: log(# descriptors in the query)
- $x_{10}$: log(min(size of output set, 100))
- $x_{11}$: = 1, if size of output set > 100
- $x_{12}$: = 1, if request about nuclear physics

Decision step

Represent the relevance judgement $R_l = r(q_k, d_m)$ by a vector $\vec{y} = (y_1, \ldots, y_n)$ with $y_i = 1$ if $i = l$, and $y_i = 0$ otherwise.

Seek a regression function $e_{opt}(\vec{x})$ which yields optimum approximations $\hat{\vec{y}}$ of the class vectors $\vec{y}$.

Optimizing criterion: minimum squared error
$E(\|\vec{y} - e_{opt}(\vec{x})\|^2) \overset{!}{=} \min$

$e_{opt}(\vec{x})$ yields probabilistic estimates $P(R_l|\vec{x})$ in the components of $\hat{\vec{y}}$.

The variation problem cannot be solved in its general form → restrict the search to a predefined class of functions → the general variation problem becomes a parameter optimization task.

The resulting functions yield least-squares approximations of $e_{opt}$: approximation with respect to the expression $E(\|\vec{y} - \hat{\vec{y}}\|^2) \overset{!}{=} \min$ yields the same result as an optimization fulfilling the condition $E(\|E(\vec{y}|\vec{x}) - \hat{\vec{y}}\|^2) \overset{!}{=} \min$.

→ Parameter optimization yields least-squares approximations of $P(R_l|\vec{x}(q_k, d_m))$.

Least squares polynomials (LSP) approach: polynomials with a predefined structure as function classes.

Define the polynomial structure $\vec{v}(\vec{x}) = (v_1, \ldots, v_L)$ with
$\vec{v}(\vec{x}) = (1, x_1, x_2, \ldots, x_N, x_1^2, x_1 x_2, \ldots)$

$N$ = number of dimensions of $\vec{x}$. The class of polynomials is given by the components $x_i^l x_j^m x_k^n \cdots$ ($i, j, k, \ldots \in [1, N]$; $l, m, n, \ldots \ge 0$); mostly linear and quadratic polynomials are regarded.

Regression function: $e(\vec{x}) = A^T \vec{v}(\vec{x})$, where $A = (a_{il})$ with $i = 1, \ldots, L$; $l = 1, \ldots, n$ is the coefficient matrix.

$P(R_l|\vec{x})$ is approximated by the polynomial
$e_l(\vec{x}) = a_{1l} + a_{2l}\,x_1 + a_{3l}\,x_2 + \ldots + a_{N+1,l}\,x_N + a_{N+2,l}\,x_1^2 + a_{N+3,l}\,x_1 x_2 + \ldots$

The coefficient matrix is computed by solving the equation system
$E(\vec{v}\,\vec{v}^T) \cdot A = E(\vec{v}\,\vec{y}^T)$.   (1)

Development of an LSP function:

1. Statistical evaluation of a learning sample:
- use a representative sample of request-document relationships together with relevance judgements
- derive pairs $(\vec{x}, \vec{y})$
- compute the empirical momental matrix $M = \left( \vec{v}\,\vec{v}^T \;\middle|\; \vec{v}\,\vec{y}^T \right)$ ($M$ contains both sides of the equation system)

2. Computation of the coefficient matrix by means of the Gauss-Jordan algorithm:
- choose the coefficient which maximizes the reduction of the overall error $s^2$
- matrix $M$ before the $i$-th elimination step: $M^{(i)} = (m^{(i)}_{lj})$ with $l = 1, \ldots, L$; $j = 1, \ldots, L + n$
- reduction $d^{(i)}_j$ achieved by choosing component $j$:
$d^{(i)}_j = \frac{1}{m^{(i)\,2}_{jj}} \sum_{l=L+1}^{L+n} m^{(i)\,2}_{jl}$   (2)
- the procedure yields a preliminary solution $e^{(i)}(\vec{x})$ after each iteration $i$ (with $i$ coefficients, the result of a limited optimization process)
- limited optimization is feasible for small learning samples (avoids over-adaptation)

Example

x       r_k   y       P(R_1|x)   e^(1)_1(x)   e^(2)_1(x)   e^(3)_1(x)
(1,1)   R1    (1,0)   2/3        0.60         0.60         0.67
(1,1)   R1    (1,0)   2/3        0.60         0.60         0.67
(1,1)   R2    (0,1)   2/3        0.60         0.60         0.67
(1,0)   R1    (1,0)   1/2        0.60         0.60         0.50
(1,0)   R2    (0,1)   1/2        0.60         0.60         0.50
(0,1)   R1    (1,0)   1/3        0.00         0.33         0.33
(0,1)   R2    (0,1)   1/3        0.00         0.33         0.33
(0,1)   R2    (0,1)   1/3        0.00         0.33         0.33

Define $\vec{v}(\vec{x}) = (1, x_1, x_2)$.

$M = \frac{1}{8} \begin{pmatrix} 8 & 5 & 6 & 4 & 4 \\ 5 & 5 & 3 & 3 & 2 \\ 6 & 3 & 6 & 3 & 3 \end{pmatrix}$

$d^{(1)}_1 = 0.50$, $d^{(1)}_2 = 0.52$, $d^{(1)}_3 = 0.50$ → eliminate component 2 ($x_1$):
$e^{(1)}_1(\vec{x}) = \frac{3}{5} x_1$, $\quad e^{(1)}_2(\vec{x}) = \frac{2}{5} x_1$

$M^{(2)} = \begin{pmatrix} 0.375 & 0 & 0.375 & 0.125 & 0.25 \\ 1 & 1 & 0.6 & 0.6 & 0.4 \\ 0.375 & 0 & 0.525 & 0.15 & 0.225 \end{pmatrix}$

$d^{(2)}_1 = 0.56$, $d^{(2)}_3 = 0.27$ → eliminate component 1 (the constant):
$e^{(2)}_1(\vec{x}) = 0.33 + 0.27\,x_1$, $\quad e^{(2)}_2(\vec{x}) = 0.67 - 0.27\,x_1$

$M^{(3)} = \begin{pmatrix} 1 & 0 & 1 & 0.33 & 0.67 \\ 0 & 1 & -0.4 & 0.27 & -0.27 \\ 0 & 0 & 0.15 & 0.025 & -0.025 \end{pmatrix}$

$e^{(3)}_1(\vec{x}) = 0.17 + 0.33\,x_1 + 0.17\,x_2$, $\quad e^{(3)}_2(\vec{x}) = 0.83 - 0.33\,x_1 - 0.17\,x_2$

Extending the learning sample by an observation with $\vec{x} = (0, 0)$:

x       r_k   y       P(R_1|x)
(1,1)   R1    (1,0)   2/3
(1,1)   R1    (1,0)   2/3
(1,1)   R2    (0,1)   2/3
(1,0)   R1    (1,0)   1/2
(1,0)   R2    (0,1)   1/2
(0,1)   R1    (1,0)   1/3
(0,1)   R2    (0,1)   1/3
(0,1)   R2    (0,1)   1/3
(0,0)   R2    (0,1)   0

$M = \frac{1}{9} \begin{pmatrix} 9 & 5 & 6 & 4 & 5 \\ 5 & 5 & 3 & 3 & 2 \\ 6 & 3 & 6 & 3 & 3 \end{pmatrix}$

$e^{(3)}_1(\vec{x}) = 0.08 + 0.38\,x_1 + 0.23\,x_2$
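A sketch reproducing this computation with NumPy for the first (eight-element) learning sample: it builds the empirical moments and solves equation system (1) directly, i.e. it yields the full least-squares solution obtained after all elimination steps.

```python
import numpy as np

# learning sample (x, y) from the example above
samples = [((1, 1), (1, 0)), ((1, 1), (1, 0)), ((1, 1), (0, 1)),
           ((1, 0), (1, 0)), ((1, 0), (0, 1)),
           ((0, 1), (1, 0)), ((0, 1), (0, 1)), ((0, 1), (0, 1))]

V = np.array([[1, x1, x2] for (x1, x2), _ in samples], dtype=float)  # v(x) = (1, x1, x2)
Y = np.array([y for _, y in samples], dtype=float)

EVV = V.T @ V / len(samples)   # E(v v^T)
EVY = V.T @ Y / len(samples)   # E(v y^T); M = (EVV | EVY)

A = np.linalg.solve(EVV, EVY)  # coefficient matrix solving E(v v^T) A = E(v y^T)
print(np.round(A, 2))
# first column  -> e_1(x) ~ 0.17 + 0.33*x1 + 0.17*x2
# second column -> e_2(x) = 1 - e_1(x)
```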

Experimental results

Effectiveness measure: normalized recall $R_{norm}$ (for multivalued relevance scales or a preference relation):
- $S^+$: # document pairs in correct order
- $S^-$: # document pairs in wrong order
- $S^+_{max}$: maximum # document pairs in correct order

$R_{norm} = \frac{1}{2} \left( 1 + \frac{S^+ - S^-}{S^+_{max}} \right)$

A random ordering of the documents will have $R_{norm} = 0.5$ on average.
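A small sketch computing $R_{norm}$ from a system ranking and graded relevance judgements by counting correctly and wrongly ordered document pairs; the document ids and grades are illustrative.

```python
def r_norm(ranking, rel):
    """Normalized recall R_norm = 0.5 * (1 + (S+ - S-) / S+_max).

    ranking: list of document ids in system order.
    rel: dict mapping document id -> relevance grade (higher = more relevant)."""
    s_plus = s_minus = 0
    for i in range(len(ranking)):
        for j in range(i + 1, len(ranking)):
            if rel[ranking[i]] > rel[ranking[j]]:
                s_plus += 1      # pair in correct order
            elif rel[ranking[i]] < rel[ranking[j]]:
                s_minus += 1     # pair in wrong order
    # maximum number of correctly ordered pairs: all pairs with different grades
    grades = [rel[d] for d in ranking]
    s_max = sum(1 for i in range(len(grades)) for j in range(i + 1, len(grades))
                if grades[i] != grades[j])
    return 0.5 * (1 + (s_plus - s_minus) / s_max) if s_max else 0.5


print(r_norm(["d1", "d2", "d3", "d4"], {"d1": 2, "d2": 0, "d3": 1, "d4": 0}))  # 0.8
```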

[Table: normalized recall ($R^\mu_{norm}$ and $R^M_{norm}$) for several LSP retrieval functions ($e_1$, ...) and the cosine baseline on the samples A, B and C]

Retrieval functions based on binary (R2) and multi-valued (R5) relevance scales; $R^\mu_{norm}$ and $R^M_{norm}$ were measured for the following combinations:

retr. fct.   relevance scale   learning sample   test sample
e1           R2                B                 C
e1           R5                B                 C
e1           R2                B                 A
e1           R5                B                 A
e2           R2                B                 C
e2           R5                B                 C
e2           R5                B                 A
e2           R2                B                 A

Retrieval functions based on binary (R2) and multi-valued (R5) relevance scales, adapted on a small learning sample (B/10 = every 10th request-document pair of B); $R^\mu_{norm}$ and $R^M_{norm}$ were measured for the following combinations:

retr. fct.   relevance scale   learning sample   test sample
e1           R2                B/10              C
e1           R5                B/10              C
e1           R2                B/10              A
e1           R5                B/10              A
e2           R2                B/10              C
e2           R5                B/10              C
e2           R5                B/10              A
e2           R2                B/10              A

Model-oriented vs. description-oriented approaches

Model-oriented approaches:
- refer to a specific representation
- based on certain explicit assumptions
- strict theoretical model
- quality depends on the validity of the assumptions

Description-oriented approaches:
- flexible w.r.t. the representation
- most assumptions implicit
- heuristic definition of the relevance description
- better adaptation to the application data

The BII model (binary independence indexing)

The BII model regards one document in relation to a number of queries.
- query representation: set of terms $q_k^T \subseteq T$, or equivalently the binary vector $\vec{z}_k = (z_{k1}, \ldots, z_{kn})$ with $z_{ki} = 1$ if $t_i \in q_k^T$, and $z_{ki} = 0$ otherwise
- document representation $d_m$: not further specified
- document description: terms with weights w.r.t. the document

$P(R|q_k, d_m) = P(R|\vec{z}_k, d_m)$: probability that a document with representation $d_m$ will be judged relevant w.r.t. a query with representation $q_k = q_k^T$.

Apply Bayes' theorem:

$P(R|\vec{z}_k, d_m) = P(R|d_m) \cdot \frac{P(\vec{z}_k|R, d_m)}{P(\vec{z}_k|d_m)}$   (3)

- $P(R|d_m)$: probability that $d_m$ will be judged relevant to an arbitrary request
- $P(\vec{z}_k|d_m)$: probability of a query with representation $\vec{z}_k$

Independence assumption: independent distribution of the terms in all queries to which a document with representation $d_m$ is relevant:
$P(\vec{z}_k|R, d_m) = \prod_{i=1}^{n} P(z_{ki}|R, d_m)$

$P(R|\vec{z}_k, d_m) = \frac{P(R|d_m)}{P(\vec{z}_k|d_m)} \prod_{i=1}^{n} P(z_{ki}|R, d_m) = \frac{P(R|d_m)}{P(\vec{z}_k|d_m)} \prod_{i=1}^{n} \frac{P(R|z_{ki}, d_m)\,P(z_{ki}|d_m)}{P(R|d_m)}$

$P(\vec{z}_k|d_m)$ and $P(z_{ki}|d_m)$ are independent of a specific document (since we always regard all documents w.r.t. a query):

$P(R|\vec{z}_k, d_m) = \frac{\prod_{i=1}^{n} P(z_{ki})}{P(\vec{z}_k)} \cdot P(R|d_m) \cdot \prod_{z_{ki}=0} \frac{P(R|z_{ki}{=}0, d_m)}{P(R|d_m)} \cdot \prod_{z_{ki}=1} \frac{P(R|z_{ki}{=}1, d_m)}{P(R|d_m)}$   (4)

Additional simplifying assumption: the relevance of a document $d_m$ with respect to a query $q_k$ depends only on the terms from $q_k^T$, and not on other terms:
$\prod_{z_{ki}=0} \frac{P(R|z_{ki}{=}0, d_m)}{P(R|d_m)} = 1$

Constant factor $c_k$ for a given query $q_k$, not required for ranking:
$\frac{\prod_{i=1}^{n} P(z_{ki})}{P(\vec{z}_k)} = c_k$

$P(R|z_{ki}{=}1, d_m) = P(R|t_i, d_m)$: probabilistic index term weight of $t_i$ w.r.t. $d_m$ = probability that document $d_m$ will be judged relevant to an arbitrary query containing $t_i$.

$d_m^T$ should contain at least those terms from $T$ for which $P(R|t_i, d_m) \ne P(R|d_m)$.

Assume that $P(R|t_i, d_m) = P(R|d_m)$ for all $t_i \notin d_m^T$:

$P(R|q_k, d_m) = c_k \cdot P(R|d_m) \cdot \prod_{t_i \in q_k^T \cap d_m^T} \frac{P(R|t_i, d_m)}{P(R|d_m)}$   (5)

Application of the BII model in this form is nearly impossible because of the lack of relevance information for specific term-document pairs.
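A minimal sketch of ranking with formula (5); the probabilistic index term weights $P(R|t_i, d_m)$ and the document prior $P(R|d_m)$ are assumed to be given, and the query constant $c_k$ is dropped since it does not change the ranking.

```python
def bii_score(query_terms, doc_terms, p_rel_term, p_rel_doc):
    """Rank value of formula (5) without the query constant c_k:
    P(R|d_m) * product over common terms of P(R|t_i, d_m) / P(R|d_m).

    p_rel_term: dict term -> P(R|t_i, d_m) for this document,
    p_rel_doc:  P(R|d_m) for this document."""
    score = p_rel_doc
    for t in query_terms & doc_terms:
        score *= p_rel_term.get(t, p_rel_doc) / p_rel_doc
    return score


print(bii_score({"t1", "t2"}, {"t1", "t2", "t3"},
                {"t1": 0.4, "t2": 0.25, "t3": 0.1}, p_rel_doc=0.2))
# 0.2 * (0.4 / 0.2) * (0.25 / 0.2) = 0.5
```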

A description-oriented indexing approach: Darmstadt Indexing Approach

[Figure: a term-document pair $(t, d)$ is mapped in the description step to a relevance description $\vec{x}(t, d)$; the decision step yields $P(C|\vec{x}(t, d))$ as an estimate of $P(C|t, d)$]

Regard features of terms in documents instead of the term-document pairs themselves (description-related learning strategy).

Description step

The relevance description $\vec{x}(t_i, d_m)$ contains attribute values of
- the term $t_i$,
- the document $d_m$,
- the term-document relationship.

The approach makes no additional assumptions about the choice of the attributes or the structure of $\vec{x}$; the actual definition of relevance descriptions can be adapted to the specific application context (representation of documents, amount of learning data available).

Decision step

Instead of $P(R|t_i, d_m)$, estimate $P(R|\vec{x}(t_i, d_m))$:
- $P(R|t_i, d_m)$: regard a single document $d_m$ with respect to all queries containing $t_i$
- $P(R|\vec{x}(t_i, d_m))$: regard the set of all term-document pairs with the same relevance description $\vec{x}$

Learning sample $L \subseteq Q \times D \times \mathcal{R}$ (query-document pairs with relevance judgements):
$L = \{(q_k, d_m, r_{km})\}$

Form relevance descriptions for the terms common to query and document: a bag of relevance descriptions with relevance judgements
$L_x = [\,(\vec{x}(t_i, d_m), r_{km}) \mid t_i \in q_k^T \cap d_m^T \wedge (q_k, d_m, r_{km}) \in L\,]$

Estimate the parameters $P(R|\vec{x}(t_i, d_m))$ by applying probabilistic classification procedures (e.g. LSP) → indexing function $e(\vec{x}(t_i, d_m))$.
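A minimal sketch of these two steps under simple assumptions: the learning sample is a list of (query terms, document terms, relevance) triples, the relevance description is supplied as a caller-defined feature function, and $P(R|\vec{x})$ is estimated as a relative frequency over the bag $L_x$ (event space $E_X$). Feeding the (x, judgement) pairs of the example table below into `estimate_p_r_x` reproduces its $E_X$ column.

```python
from collections import defaultdict


def build_lx(learning_sample, describe):
    """Build the bag L_x of (relevance description, judgement) pairs.

    learning_sample: list of (query_terms, doc_terms, relevant) triples.
    describe(t, doc_terms): relevance description x(t, d) of a common term t."""
    lx = []
    for q_terms, d_terms, relevant in learning_sample:
        for t in q_terms & d_terms:
            lx.append((describe(t, d_terms), relevant))
    return lx


def estimate_p_r_x(lx):
    """Relative-frequency estimates of P(R|x) over equiprobable descriptions (E_X)."""
    counts, rel = defaultdict(int), defaultdict(int)
    for x, relevant in lx:
        counts[x] += 1
        rel[x] += int(relevant)
    return {x: rel[x] / counts[x] for x in counts}
```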

Example for the description-oriented approach:

query   doc.   judg.   term   x
q1      d1     R       t1     (1, 1)
                       t2     (0, 1)
                       t3     (1, 2)
q1      d2     R̄       t1     (0, 2)
                       t3     (1, 1)
                       t4     (0, 1)
q2      d1     R       t2     (0, 2)
                       t5     (0, 2)
                       t6     (1, 1)
                       t7     (1, 2)
q2      d3     R̄       t5     (0, 1)
                       t7     (0, 1)

Resulting estimates $P(R|\vec{x})$:

x        E_X   E_BII
(0, 1)   1/4   1/3
(0, 2)   2/3   1/2
(1, 1)   2/3   2/3
(1, 2)   1     1

Two different event spaces:
- $E_{BII}$: equiprobable query-document pairs
- $E_X$: equiprobable relevance descriptions

Indexing function

Data used (same as in SMART):
- $tf_{mi}$: within-document frequency (wdf) of $t_i$ in $d_m$
- $\max tf_m$: maximum wdf of all terms $t_i \in d_m^T$
- $n_i$: number of documents in which $t_i$ occurs
- $|D|$: number of documents in the collection
- $|d_m^T|$: number of different terms in $d_m$
- $ta_{mi}$: = 1 if $t_i$ occurs in the title of $d_m$, and 0 otherwise

Relevance description elements:
- $x_1 = tf_{mi}$
- $x_2 = 1 / \max tf_m$
- $x_3 = \log(n_i / |D|)$
- $x_4 = \log |d_m^T|$
- $x_5 = ta_{mi}$

Indexing functions:
$e_l = a_0 + a_1\,tf_{mi} + a_2 / \max tf_m + a_3 \log(n_i / |D|) + a_4 \log |d_m^T|$
$e_{ta} = a_0 + a_1\,tf_{mi} + a_2 / \max tf_m + a_3 \log(n_i / |D|) + a_4 \log |d_m^T| + a_5\,ta_{mi}$

Retrieval functions

- $q_k^T$: set of query terms, $d_m^T$: set of document terms
- $u_{mi}$: indexing weight $e(\vec{x}(t_i, d_m))$
- $c_{ki}$: query term weight (see below)

Utility-theoretic retrieval function [Wong & Yao 89]:
$\varrho(q_k, d_m) = \sum_{t_i \in q_k^T \cap d_m^T} c_{ki} \cdot u_{mi}$

- $\varrho_{bin}$: binary query term weights ($c_{ki} = 1$ for all $t_i \in q_k^T$)
- $\varrho_{tf}$: $c_{ki}$ = within-query frequency of $t_i$ in $q_k$
- $\varrho_{tf \cdot idf}$: $c_{ki}$ = tf·idf with within-query frequencies
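A sketch of the utility-theoretic retrieval function with the three query-weighting variants; the indexing weights $u_{mi}$, the within-query frequencies and the idf values are assumed inputs.

```python
def utility_score(query_tf, doc_weights, idf=None, variant="tf"):
    """rho(q_k, d_m) = sum over common terms of c_ki * u_mi.

    query_tf: dict term -> within-query frequency,
    doc_weights: dict term -> indexing weight u_mi,
    idf: dict term -> idf value (only needed for the tf.idf variant),
    variant: 'bin', 'tf' or 'tfidf' selects the query term weight c_ki."""
    score = 0.0
    for t, qtf in query_tf.items():
        if t not in doc_weights:
            continue
        if variant == "bin":
            c = 1.0
        elif variant == "tf":
            c = qtf
        else:  # "tfidf"
            c = qtf * idf[t]
        score += c * doc_weights[t]
    return score


q = {"model": 2, "retrieval": 1}
d = {"model": 0.7, "indexing": 0.4, "retrieval": 0.5}
print(utility_score(q, d, variant="tf"))  # 2*0.7 + 1*0.5 = 1.9
```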

Experimental results

[Table: retrieval quality of the indexing functions $e_l$ and $e_{ta}$ combined with the retrieval functions $\varrho_{tf \cdot idf}$, $\varrho_{bin}$ and $\varrho_{tf}$, relative to a tf·idf baseline, on the CACM, CISI, CRAN, INSPEC and NPL collections]

Comparison of different retrieval functions (test sample, top-ranked documents, event space $E_X$):
- average precision at three recall points (0.25, 0.50, 0.75)
- probabilistic indexing with $\varrho_{tf}$ better than the SMART approach
- the SMART approach works without relevance information
- probabilistic indexing can be extended easily to more complex representations

Retrieval with probabilistic indexing

A probabilistic model for the ranking of documents with probabilistic indexing weights.

Probabilistic indexing: assume a fixed number of binary indexings per document.

Extended event space $Q \times D \times I$:
- $Q$: set of queries, $D$: set of documents, $I$: set of indexers
- single event: a query-document pair with
  - a relevance judgement,
  - a specific (binary) indexing,
  - relevance descriptions for the terms w.r.t. the document

Representations and descriptions:
- query representation $q_k = (q_k^T, q_k^J)$ (as in the BIR model): set of query terms $q_k^T$ plus set of relevance judgements $q_k^J = \{(d_m, r(d_m, q_k))\}$
- query description $q_k^D$: set of query terms with associated weights
- document representation $d_m = (\vec{d}_m, \vec{c}_m)$ with
  - $\vec{d}_m = (d_{m1}, \ldots, d_{mn})$, where $d_{mi}$ is the relevance description of $t_i$ w.r.t. $d_m$
  - $\vec{c}_m = (c_{m1}, \ldots, c_{mn})$ with $c_{mi} = C_i$ if $t_i$ has been assigned to $d_m$, and $c_{mi} = \bar{C}_i$ otherwise
- document description $d_m^D$: set of terms with indexing weights

$P(R|q_k, \vec{x})$: probability that a document with relevance descriptions $\vec{x}$ is relevant w.r.t. $q_k$.

Apply Bayes' theorem:
$O(R|q_k, \vec{x}) = O(R|q_k) \cdot \frac{P(\vec{x}|R, q_k)}{P(\vec{x}|\bar{R}, q_k)}$

Linked dependence assumption:
$\frac{P(\vec{x}|R, q_k)}{P(\vec{x}|\bar{R}, q_k)} = \prod_{i=1}^{n} \frac{P(x_i|R, q_k)}{P(x_i|\bar{R}, q_k)}$

yields
$O(R|q_k, \vec{x}) = O(R|q_k) \prod_{i=1}^{n} \frac{P(x_i|R, q_k)}{P(x_i|\bar{R}, q_k)}$

Assumptions:
- the relevance description $x_i$ of a term $t_i$ depends only on the correctness of $t_i$; it is independent of the correctness of other terms and of relevance
- the correctness of a term (w.r.t. a document) depends only on relevance; it is independent of the correctness of other terms

$O(R|q_k, \vec{x}) = O(R|q_k) \prod_{i=1}^{n} \frac{P(x_i|C_i)\,P(C_i|R, q_k) + P(x_i|\bar{C}_i)\,P(\bar{C}_i|R, q_k)}{P(x_i|C_i)\,P(C_i|\bar{R}, q_k) + P(x_i|\bar{C}_i)\,P(\bar{C}_i|\bar{R}, q_k)}$
$= O(R|q_k) \prod_{i=1}^{n} \frac{\frac{P(C_i|x_i)}{P(C_i)}\,P(C_i|R, q_k) + \frac{P(\bar{C}_i|x_i)}{P(\bar{C}_i)}\,P(\bar{C}_i|R, q_k)}{\frac{P(C_i|x_i)}{P(C_i)}\,P(C_i|\bar{R}, q_k) + \frac{P(\bar{C}_i|x_i)}{P(\bar{C}_i)}\,P(\bar{C}_i|\bar{R}, q_k)}$

- $u_{mi} = P(C_i|x_i{=}d_{mi})$: probabilistic indexing weight of $t_i$ w.r.t. $d_m$ = probability that an arbitrary indexer assigned $t_i$ to $d_m$
- $q_i = P(C_i)$: probability that an arbitrary indexer assigned $t_i$ to an arbitrary document = average indexing weight of $t_i$ in the collection
- $p_{ik} = P(C_i|R, q_k)$: probability that an arbitrary indexer assigned $t_i$ to an arbitrary document relevant to $q_k$ = average indexing weight of $t_i$ in the relevant documents
- $r_{ik} = P(C_i|\bar{R}, q_k)$: probability that an arbitrary indexer assigned $t_i$ to an arbitrary document nonrelevant to $q_k$ = average indexing weight of $t_i$ in the nonrelevant documents

Assume that $P(C_i|R, q_k) = P(C_i|\bar{R}, q_k)$ for all $t_i \notin q_k^T$:

$O(R|q_k, \vec{x}{=}\vec{d}_m) = O(R|q_k) \prod_{t_i \in q_k^T} \frac{\frac{u_{mi}}{q_i}\,p_{ik} + \frac{1 - u_{mi}}{1 - q_i}\,(1 - p_{ik})}{\frac{u_{mi}}{q_i}\,r_{ik} + \frac{1 - u_{mi}}{1 - q_i}\,(1 - r_{ik})}$

Approximation for $P(C_i|\bar{R}, q_k) \approx P(C_i)$:

$O(R|q_k, \vec{x}{=}\vec{d}_m) \approx O(R|q_k) \prod_{t_i \in q_k^T} \left( \frac{u_{mi}}{q_i}\,p_{ik} + \frac{1 - u_{mi}}{1 - q_i}\,(1 - p_{ik}) \right)$
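A sketch of this approximate ranking formula; the document indexing weights $u_{mi}$, the collection averages $q_i$ and the feedback-based averages $p_{ik}$ are assumed to be given.

```python
def rpi_odds(query_terms, u_m, q_avg, p_rel, prior_odds=1.0):
    """Approximate O(R|q_k, d_m) for retrieval with probabilistic indexing:
    O(R|q_k) * prod over query terms of
        ( u_mi/q_i * p_ik + (1 - u_mi)/(1 - q_i) * (1 - p_ik) ).

    u_m:   dict term -> indexing weight u_mi of the document (0.0 if absent),
    q_avg: dict term -> average indexing weight q_i in the collection,
    p_rel: dict term -> average indexing weight p_ik in the relevant documents."""
    odds = prior_odds
    for t in query_terms:
        u, q, p = u_m.get(t, 0.0), q_avg[t], p_rel[t]
        odds *= u / q * p + (1 - u) / (1 - q) * (1 - p)
    return odds


print(rpi_odds({"t1", "t2"},
               u_m={"t1": 0.8, "t2": 0.3},
               q_avg={"t1": 0.2, "t2": 0.1},
               p_rel={"t1": 0.5, "t2": 0.2}))
```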

Parameter estimation

$D_k^R$: set of documents judged relevant w.r.t. $q_k$

$q_i = \frac{1}{|D|} \sum_{d_j \in D} u_{ji}$, $\quad p_{ik} = \frac{1}{|D_k^R|} \sum_{d_j \in D_k^R} u_{ji}$

Parameter estimation

Parameter estimation affects retrieval quality, as observed in experiments. Two problems:

1. Selection of the estimation sample (random sample vs. top-ranking documents):
- estimate the parameters for nonrelevant documents from all documents not known to be relevant
- description-oriented approaches mostly assume a sample representative of the documents to be ranked (and not representative of the whole collection)

2. Estimation method (data → parameters)

Estimation methods

Collection of documents with features $e_i$; estimation of $P(e_i|e_j)$.

For the BIR model: $e_j$ = relevance / non-relevance w.r.t. $q_k$, $e_i$ = presence / absence of a term $t_i$.

- $g$: number of documents in the (random) sample, of which
- $f$: documents with feature $e_j$, and
- $h$: documents with $e_i$ and $e_j$

Task: compute an estimate $p(e_i|e_j, (h, f, g))$ for $P(e_i|e_j)$, given $(h, f, g)$.

Maximum likelihood estimate: $p(e_i|e_j, (h, f, g)) = h / f$
- not defined for $f = h = 0$
- biased estimate

Bayesian probability estimation

- $Q$: parameter to be estimated (continuous random variable)
- $f(q)$: prior distribution of the parameter $Q$
- $X$: discrete random variable with values $x_1, x_2, \ldots$
- $P(X{=}x_k|Q{=}q)$: probability that $X$ takes the value $x_k$, given that $Q$ has the value $q$

Posterior distribution of $q$:
$g(q|x_k) = \frac{f(q)\,P(X{=}x_k|Q{=}q)}{\int f(q)\,P(X{=}x_k|Q{=}q)\,dq}$

Deriving a point estimate for $q$ requires the application of further methods, e.g. a cost function.

Estimates for a beta prior

Assume a beta distribution as the prior distribution:
$f(p) = \frac{1}{B(a, b)}\,p^{a-1}(1 - p)^{b-1}$ with $B(a, b) = \Gamma(a)\,\Gamma(b) / \Gamma(a + b)$ and parameters $a, b > 0$ to be chosen.

Application with $(f, h)$ observed:
$g(p \mid h, f) = \frac{p^{a-1}(1 - p)^{b-1} \binom{f}{h} p^{h}(1 - p)^{f-h}}{\int_0^1 p^{a-1}(1 - p)^{b-1} \binom{f}{h} p^{h}(1 - p)^{f-h}\,dp}$

With $B(a, b) = \int_0^1 p^{a-1}(1 - p)^{b-1}\,dp$ this yields the posterior distribution
$g(p \mid h, f) = \frac{p^{h+a-1}(1 - p)^{f-h+b-1}}{B(h + a,\, f - h + b)}$

Apply the loss function $L(\hat{p}, p) = (\hat{p} - p)^2$ and seek the estimate $p_l$ minimizing the expectation of the loss function:
$\frac{d}{dp_l} \int_0^1 L(p, p_l)\,g(p)\,dp \overset{!}{=} 0$

yields
$p_l = \frac{h + a}{f + a + b}$
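A one-line sketch of this estimator, evaluated for the prior parameters and observations used in the figures below.

```python
def beta_estimate(h, f, a=0.5, b=0.5):
    """Bayes estimate p_l = (h + a) / (f + a + b) under a Beta(a, b) prior
    after observing h successes in f trials (squared-error loss)."""
    return (h + a) / (f + a + b)


print(beta_estimate(h=2, f=3))            # a = b = 0.5, f = 3, h = 2   -> 0.625
print(beta_estimate(h=10, f=15))          # a = b = 0.5, f = 15, h = 10 -> 0.656
print(beta_estimate(h=2, f=3, a=2, b=4))  # a = 2, b = 4, f = 3, h = 2  -> 0.444
```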

[Figure: prior (upper curve) and posterior distribution for a = b = 0.5, f = 3, h = 2]

[Figure: posterior distributions for a = b = 0.5 with f = 3, h = 2 and with f = 15, h = 10]

[Figure: prior (left curve) and posterior distribution for a = 2, b = 4, f = 3, h = 2]

[Figure: posterior distributions for a = 2, b = 4 with f = 3, h = 2 and with f = 15, h = 10]

Experimental results

[Table: parameters $a$, $b$ of the beta distribution as a function of $f$ and $Z_g(h, f)$, derived from a sample of PHYS documents; $Z_g$: # pairs $(e_i, e_j)$ in the learning sample]

[Table: different kinds of estimates ($p_{ml}$, $p_l$, $p_{opt}$) and their average quadratic errors $s^2$ for $h \le 4$ and for $h/f > 0.4$, on the samples X ($Z_g$ = 2852) and Z ($Z_g$ = 2709); $Z_g$: # pairs $(e_i, e_j)$, $s^2$: average quadratic error]
