6 Probabilistic Retrieval Models


Overview:
- Notations
- Binary Independence Retrieval model
- Probability Ranking Principle

Notations

[Figure: conceptual model: queries $q \in Q$ and documents $d \in D$ are mapped to representations ($\alpha_Q$, $\alpha_D$) and further to descriptions ($\beta_Q$, $\beta_D$); the retrieval function $\varrho$ compares descriptions, while relevance judgements relate query-document pairs to the relevance scale $\mathcal{R}$]

- $q \in Q$: query, $d \in D$: document
- $q_k$: query representation, $d_m$: document representation
- $q_k^D \in Q^D$: query description, $d_m^D \in D^D$: document description
- $\mathcal{R}$: relevance scale
- $\varrho$: retrieval function

Binary independence retrieval model

Retrieval functions for binary indexing represent queries and documents as sets of terms:
- $T = \{t_1, \ldots, t_n\}$: set of terms in the collection
- $q_k$: query representation, $d_m$: document representation
- $q_k^T \subseteq T$: set of query terms, $d_m^T \subseteq T$: set of document terms

Simple retrieval function, the coordination level match:
$\varrho_{coord}(q_k, d_m) = |q_k^T \cap d_m^T|$

Binary independence retrieval (BIR) model: assign weights to the query terms
$\varrho_{BIR}(q_k, d_m) = \sum_{t_i \in q_k^T \cap d_m^T} c_{ik}$
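A minimal sketch of these two scoring functions (not part of the original slides): query and document term sets are plain Python sets, and the BIR query term weights $c_{ik}$ are assumed to be given as a dict.

```python
def coord_match(query_terms: set, doc_terms: set) -> int:
    """Coordination level match: number of terms shared by query and document."""
    return len(query_terms & doc_terms)


def bir_score(query_terms: set, doc_terms: set, c: dict) -> float:
    """BIR retrieval function: sum of the query term weights c_ik over the
    terms occurring in both the query and the document."""
    return sum(c[t] for t in query_terms & doc_terms if t in c)


# Illustrative usage (terms and weights are made up):
q = {"probabilistic", "retrieval", "model"}
d = {"a", "probabilistic", "model", "of", "indexing"}
weights = {"probabilistic": 1.2, "retrieval": 0.8, "model": 0.4}
print(coord_match(q, d))         # 2
print(bir_score(q, d, weights))  # 1.2 + 0.4 = 1.6
```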

Probabilistic foundation of the BIR model

Basic techniques for the derivation of probabilistic models:
1. application of Bayes' theorem:
$P(a|b) = \frac{P(a, b)}{P(b)} = \frac{P(b|a)\,P(a)}{P(b)}$
2. use of odds instead of probabilities, where
$O(y) = \frac{P(y)}{P(\bar{y})} = \frac{P(y)}{1 - P(y)}$

Derivation of the BIR model

Estimate $O(R|q_k, d_m^T)$, the odds that a document with term set $d_m^T$ will be relevant to $q_k$.

Represent the document $d_m$ as a binary vector $\vec{x} = (x_1, \ldots, x_n)$ with $x_i = 1$ if $t_i \in d_m^T$, and $x_i = 0$ otherwise.

Apply Bayes' theorem:
$O(R|q_k, \vec{x}) = \frac{P(R|q_k, \vec{x})}{P(\bar{R}|q_k, \vec{x})} = \frac{P(R|q_k)}{P(\bar{R}|q_k)} \cdot \frac{P(\vec{x}|R, q_k)}{P(\vec{x}|\bar{R}, q_k)}$

- $P(R|q_k)$: probability that an arbitrary document will be relevant to $q_k$ (generality of $q_k$)
- $P(\vec{x}|R, q_k)$: probability that an arbitrary relevant document will have term vector $\vec{x}$
- $P(\vec{x}|\bar{R}, q_k)$: probability that an arbitrary nonrelevant document will have term vector $\vec{x}$

Linked dependence assumption:
$\frac{P(\vec{x}|R, q_k)}{P(\vec{x}|\bar{R}, q_k)} = \prod_{i=1}^{n} \frac{P(x_i|R, q_k)}{P(x_i|\bar{R}, q_k)}$

$O(R|q_k, \vec{x}) = O(R|q_k) \prod_{i=1}^{n} \frac{P(x_i|R, q_k)}{P(x_i|\bar{R}, q_k)}$

Split according to the presence/absence of terms in the current document:
$O(R|q_k, \vec{x}) = O(R|q_k) \prod_{x_i=1} \frac{P(x_i{=}1|R, q_k)}{P(x_i{=}1|\bar{R}, q_k)} \cdot \prod_{x_i=0} \frac{P(x_i{=}0|R, q_k)}{P(x_i{=}0|\bar{R}, q_k)}$

- $p_{ik} = P(x_i{=}1|R, q_k)$: probability that $t_i$ occurs in an arbitrary relevant document
- $q_{ik} = P(x_i{=}1|\bar{R}, q_k)$: probability that $t_i$ occurs in an arbitrary nonrelevant document

Assume that $p_{ik} = q_{ik}$ for all $t_i \notin q_k^T$:
$O(R|q_k, d_m^T) = O(R|q_k) \prod_{t_i \in d_m^T \cap q_k^T} \frac{p_{ik}}{q_{ik}} \cdot \prod_{t_i \in q_k^T \setminus d_m^T} \frac{1 - p_{ik}}{1 - q_{ik}}$
$= O(R|q_k) \prod_{t_i \in d_m^T \cap q_k^T} \frac{p_{ik}(1 - q_{ik})}{q_{ik}(1 - p_{ik})} \cdot \prod_{t_i \in q_k^T} \frac{1 - p_{ik}}{1 - q_{ik}}$

Only the first product varies for different documents with respect to the same request $q_k$, so only this product needs to be regarded for ranking.

Use the logarithm:
$c_{ik} = \log \frac{p_{ik}(1 - q_{ik})}{q_{ik}(1 - p_{ik})}$

Retrieval function:
$\varrho_{BIR}(q_k, d_m) = \sum_{t_i \in d_m^T \cap q_k^T} c_{ik}$

Application of the BIR model

Parameter estimation for $q_{ik}$

$q_{ik} = P(x_i{=}1|\bar{R}, q_k)$: probability that $t_i$ occurs in an arbitrary nonrelevant document.

Assume that the number of nonrelevant documents is approximately the collection size:
- $N$: collection size
- $n_i$: number of documents containing term $t_i$

$q_{ik} \approx \frac{n_i}{N}$

Parameter estimation for $p_{ik}$

$p_{ik} = P(x_i{=}1|R, q_k)$: probability that $t_i$ occurs in an arbitrary relevant document.

1. Assume a global value $p$ for all $p_{ik}$ → term weighting by inverse document frequency (IDF):
$c_{ik} = \log \frac{p}{1 - p} + \log \frac{1 - q_{ik}}{q_{ik}} = c_p + \log \frac{N - n_i}{n_i}$
$\varrho_{IDF}(q_k, d_m) = \sum_{t_i \in q_k^T \cap d_m^T} \left( c_p + \log \frac{N - n_i}{n_i} \right)$

Often used: $p = 0.5$, hence $c_p = 0$.

2. Relevance feedback:
- initial ranking with the IDF formula
- present the top-ranking documents to the user
- the user gives binary relevance judgements: relevant / nonrelevant
- $r$: number of documents judged relevant for request $q_k$
- $r_i$: number of relevant documents containing term $t_i$

$p_{ik} = P(t_i|R, q_k) \approx \frac{r_i}{r}$

Improved estimates (see parameter estimation methods), e.g.
$p_{ik} \approx \frac{r_i + 0.5}{r + 1}$
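A small sketch of the term-weight estimation described above: the collection statistics ($N$, $n_i$) and the feedback counts ($r$, $r_i$) are assumed to be given; without feedback the weight falls back to the IDF form with a global $p$.

```python
import math


def bir_weight(N, n_i, r=0, r_i=0, p_default=0.5):
    """BIR term weight c_ik = log( p_ik (1 - q_ik) / (q_ik (1 - p_ik)) ).

    q_ik is estimated from the collection as n_i / N; p_ik either takes a
    global default value p (no feedback) or the smoothed feedback estimate
    (r_i + 0.5) / (r + 1)."""
    q_ik = n_i / N
    p_ik = p_default if r == 0 else (r_i + 0.5) / (r + 1)
    return math.log(p_ik * (1 - q_ik) / (q_ik * (1 - p_ik)))


# With p = 0.5 and no feedback this reduces to log((N - n_i) / n_i):
print(bir_weight(N=10000, n_i=100))               # ~ log(99)
print(bir_weight(N=10000, n_i=100, r=10, r_i=6))  # feedback-based weight
```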

BIR example

dm    r(dm)   x1   x2   P(R|x)   BIR estimate
d1    R       1    1    0.80     0.76
d2    R       1    1    0.80     0.76
d3    R       1    1    0.80     0.76
d4    R       1    1    0.80     0.76
d5    N       1    1    0.80     0.76
d6    R       1    0    0.67     0.69
d7    R       1    0    0.67     0.69
d8    R       1    0    0.67     0.69
d9    R       1    0    0.67     0.69
d10   N       1    0    0.67     0.69
d11   N       1    0    0.67     0.69
d12   R       0    1    0.50     0.48
d13   R       0    1    0.50     0.48
d14   R       0    1    0.50     0.48
d15   N       0    1    0.50     0.48
d16   N       0    1    0.50     0.48
d17   N       0    1    0.50     0.48
d18   R       0    0    0.33     0.40
d19   N       0    0    0.33     0.40
d20   N       0    0    0.33     0.40
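A worked check of the example, recomputing the two probability columns from the judgements in the table: there are 12 relevant and 8 nonrelevant documents, so

$$p_{1k} = \tfrac{8}{12}, \quad p_{2k} = \tfrac{7}{12}, \quad q_{1k} = \tfrac{3}{8}, \quad q_{2k} = \tfrac{4}{8}, \quad O(R|q_k) = \tfrac{12}{8}.$$

For $\vec{x} = (1,1)$ the BIR model with linked dependence gives

$$O(R|q_k, \vec{x}) = \tfrac{12}{8} \cdot \tfrac{8/12}{3/8} \cdot \tfrac{7/12}{4/8} \approx 3.11, \qquad P(R|q_k, \vec{x}) = \tfrac{3.11}{1 + 3.11} \approx 0.76,$$

compared with the direct estimate $P(R|\vec{x}) = 4/5 = 0.80$; analogously $0.69$ vs. $0.67$ for $(1,0)$, $0.48$ vs. $0.50$ for $(0,1)$, and $0.40$ vs. $0.33$ for $(0,0)$.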

The Probability Ranking Principle (PRP)

Perfect retrieval: rank all relevant documents ahead of any nonrelevant one
- relates to the objects themselves; only possible with complete relevance information

Optimum retrieval: relates to representations (as any IR system does)

The Probability Ranking Principle (PRP) defines optimum retrieval for probabilistic models: rank documents according to decreasing probability of relevance.

Decision-theoretic justification of the PRP

- $\bar{C}$: cost for the retrieval of a nonrelevant document
- $C$: cost for the retrieval of a relevant document

Expected cost for the retrieval of a document $d_j$:
$EC(q, d_j) = C \cdot P(R|q, d_j) + \bar{C} \cdot (1 - P(R|q, d_j))$

Total cost for retrieval (assuming the user looks at the first $l$ documents, where $l$ is not known in advance); $r(i)$ is the ranking function determining the index of the document placed at rank $i$:
$EC(q, l) = EC(q, d_{r(1)}, d_{r(2)}, \ldots, d_{r(l)}) = \sum_{i=1}^{l} EC(q, d_{r(i)})$

Minimum total cost: minimize $\sum_{i=1}^{l} EC(q, d_{r(i)})$, i.e. $r(i)$ should order the documents by ascending cost.

Decision-theoretic rule:
$EC(q, d_{r(i)}) \le EC(q, d_{r(i+1)})$
$\iff C \cdot P(R|q, d_{r(i)}) + \bar{C}\,(1 - P(R|q, d_{r(i)})) \le C \cdot P(R|q, d_{r(i+1)}) + \bar{C}\,(1 - P(R|q, d_{r(i+1)}))$
$\iff$ (since $C < \bar{C}$) $P(R|q, d_{r(i)}) \ge P(R|q, d_{r(i+1)})$

→ rank documents according to their decreasing probability of relevance!
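A minimal sketch of this ranking rule, assuming the relevance probabilities $P(R|q, d)$ have already been estimated and are passed in as a dict; the cost values are illustrative, only the relation $C < \bar{C}$ matters.

```python
def rank_by_expected_cost(prob_rel: dict, cost_rel=0.0, cost_nonrel=1.0):
    """Rank documents by ascending expected cost
    EC(q, d) = C * P(R|q, d) + C_bar * (1 - P(R|q, d)).

    With cost_rel < cost_nonrel this is the same ordering as ranking by
    decreasing probability of relevance (the PRP)."""
    ec = {d: cost_rel * p + cost_nonrel * (1 - p) for d, p in prob_rel.items()}
    return sorted(ec, key=ec.get)


print(rank_by_expected_cost({"d1": 0.2, "d2": 0.9, "d3": 0.5}))
# ['d2', 'd3', 'd1'] -- identical to sorting by decreasing P(R|q, d)
```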

Justification based on effectiveness measures

For any two events $a, b$, Bayes' theorem yields the following monotonic transformations of $P(a|b)$ (see the derivation of the BIR model):

$O(a|b) = \frac{P(b|a)\,P(a)}{P(b|\bar{a})\,P(\bar{a})}$

$\log O(a|b) = \log \frac{P(b|a)}{P(b|\bar{a})} + \log O(a)$

$\operatorname{logit} P(a|b) = \log \frac{P(b|a)}{P(b|\bar{a})} + \operatorname{logit} P(a)$

with $\operatorname{logit} P(x) = \log O(x)$.

- $\rho = P(\text{doc. retrieved} \mid \text{doc. relevant})$ (recall)
- $\varphi = P(\text{doc. retrieved} \mid \text{doc. nonrelevant})$ (fallout)
- $\pi = P(\text{doc. relevant} \mid \text{doc. retrieved})$ (precision)
- $\gamma = P(\text{doc. relevant})$ (generality)
- $\rho(d_i) = P(\text{doc. is } d_i \mid \text{doc. relevant})$
- $\varphi(d_i) = P(\text{doc. is } d_i \mid \text{doc. nonrelevant})$
- $\pi(d_i) = P(\text{doc. relevant} \mid \text{doc. is } d_i)$ (probability of relevance)
- $S$: set of retrieved documents

$\rho = \sum_{d_i \in S} \rho(d_i)$, $\quad \varphi = \sum_{d_i \in S} \varphi(d_i)$

$\operatorname{logit} \pi(d_i) = \log \frac{\rho(d_i)}{\varphi(d_i)} + \operatorname{logit} \gamma$

$\rho(d_i) = x_i \cdot \varphi(d_i)$ with $x_i = \exp(\operatorname{logit} \pi(d_i) - \operatorname{logit} \gamma)$

1. Cutoff defined by $\varphi$ (fallout):
$\varphi = \sum_{d_i \in S} \varphi(d_i)$, $\quad \rho = \sum_{d_i \in S} \rho(d_i) = \sum_{d_i \in S} \varphi(d_i)\,\exp(\operatorname{logit} \pi(d_i) - \operatorname{logit} \gamma)$
→ maximize $\rho$ (recall) by including the documents with the highest values of $\pi(d_i)$, i.e. rank according to the probability of relevance.

2. Given number of documents retrieved: maximize expected recall, minimize expected fallout.

3. Cutoff defined by $\rho$ (recall): minimize fallout.

$\operatorname{logit} \pi = \log(\rho / \varphi) + \operatorname{logit} \gamma$

4. Expected precision is maximized for a given recall / fallout / number of documents retrieved.

PRP for multivalued relevance scales

$n$ relevance values $R_1 < R_2 < \ldots < R_n$ with corresponding costs $C_1, C_2, \ldots, C_n$ for the retrieval of a document.

Rank documents according to their expected costs:
$EC(q, d_m) = \sum_{l=1}^{n} C_l \cdot P(R_l|q, d_m)$

Comparison with the binary case:
- a nonbinary scale is more appropriate for the user
- $n - 1$ estimates $P(R_l|q, d_m)$ are required
- the cost factors $C_l$ must be known
- contradicting experimental evidence so far

Combination of probabilistic and fuzzy retrieval

Fuzzy retrieval uses a degree of relevance instead of a binary scale; the system aims at computing the degree of relevance for a query-document pair.

Combination:
- continuous relevance scale $r \in [0, 1]$
- replace the probability distribution $P(R_l|q, d_m)$ by a density function $p(r|q, d_m)$
- replace the cost factors $C_l$ by a cost function $c(r)$

General concepts of probabilistic IR models

Conceptual model

[Figure: conceptual model as in the Notations section: queries and documents, their representations ($\alpha_Q$, $\alpha_D$) and descriptions ($\beta_Q$, $\beta_D$), the retrieval function $\varrho$, the relevance scale $\mathcal{R}$ and the relevance judgements]

Representations and descriptions of the BIR model

- query representation $q_k = (q_k^T, q_k^J)$: set of query terms $q_k^T$ plus set of relevance judgements $q_k^J = \{(d_m, r(d_m, q_k))\}$
- query description $q_k^D = \{(t_i, c_{ik})\}$: set of query terms with associated weights
- document representation $d_m = d_m^T$: set of terms
- document description $d_m^D$ = document representation $d_m^T$

Directions in the development of probabilistic IR models:
1. optimization of retrieval quality for a fixed representation (e.g. different dependence assumptions than in the BIR model)
2. models for more detailed representations (e.g. documents as bags of terms, phrases in addition to words)

Parameter learning in IR

[Figure: learning approaches in IR over the dimensions queries, documents and terms: query-related learning, document-related learning and description-related learning, each with its own learning and application areas]

Event space

Event space: $Q \times D$
- single element: query-document pair $(q_k, d_m)$
- all elements are equiprobable
- relevance judgement: $(q_k, d_m) \in r$ (the relevance relation)
- relevance judgements for different documents w.r.t. the same query are independent of each other
- probability of relevance $P(R|q_k, d_m)$: probability that an element (query-document pair) with representations $(q_k, d_m)$ is relevant
- regard collections as samples of possibly infinite sets
- poor representation of retrieval objects: a single representation may stand for a number of different objects

[Figure: event space of relevance models: queries in $Q$ and documents in $D$; several queries $q$ share the representation $q_k$, several documents $d$ share the representation $d_m$]

Relevance models: optimum polynomial retrieval functions

Basic concepts:
- regard retrieval as a (probabilistic) classification task (classify objects into one of $n$ classes)
- objects: query-document pairs $(q_k, d_m)$
- classes: relevance values $R_l \in \mathcal{R} = \{R_1, \ldots, R_n\}$
- description-oriented approach: a learning strategy abstracting from specific queries, documents and terms

[Figure: a query-document pair $(q, d)$ is mapped in the description step to a relevance description $\vec{x}(q, d)$; the decision step yields $P(R|\vec{x}(q, d))$ as an estimate of $P(R|q, d)$]

1. Description step: map query-document pairs onto a feature vector $\vec{x} = \vec{x}(q_k, d_m)$.
2. Decision step: apply classification functions $e_l(\vec{x})$ for estimating $P(R_l|\vec{x}(q_k, d_m))$, $l = 1, \ldots, n$; the classification functions are derived from a learning sample with relevance judgements.

Description step: example of a relevance description
- $x_1$: # descriptors common to query and document
- $x_2$: log(# descriptors common to query and document)
- $x_3$: highest indexing weight of a common descriptor
- $x_4$: lowest indexing weight of a common descriptor
- $x_5$: # common descriptors with weight ≥ 0.15
- $x_6$: # non-common descriptors with weight ≥ 0.15
- $x_7$: # descriptors in the document with weight ≥ 0.15
- $x_8$: $\log \prod$ (indexing weights of common descriptors)
- $x_9$: log(# descriptors in the query)
- $x_{10}$: log(min(size of output set, 100))
- $x_{11}$: = 1, if size of output set > 100
- $x_{12}$: = 1, if request about nuclear physics

Decision step

Represent the relevance judgement $R_l = r(q_k, d_m)$ by a vector $\vec{y} = (y_1, \ldots, y_n)$ with $y_i = 1$ if $i = l$, and $y_i = 0$ otherwise.

Seek a regression function $e_{opt}(\vec{x})$ which yields optimum approximations $\hat{\vec{y}}$ of the class vectors $\vec{y}$.

Optimizing criterion: minimum squared error
$E(\|\vec{y} - e_{opt}(\vec{x})\|^2) \overset{!}{=} \min$

$e_{opt}(\vec{x})$ yields probabilistic estimates $P(R_l|\vec{x})$ in the components of $\hat{\vec{y}}$.

The variation problem cannot be solved in its general form → restrict the search to a predefined class of functions → the general variation problem becomes a parameter optimization task.

The resulting functions yield least-squares approximations of $e_{opt}$: approximation with respect to the expression $E(\|\vec{y} - \hat{\vec{y}}\|^2) \overset{!}{=} \min$ yields the same result as an optimization fulfilling the condition $E(\|E(\vec{y}|\vec{x}) - \hat{\vec{y}}\|^2) \overset{!}{=} \min$.

→ Parameter optimization yields least-squares approximations of $P(R_l|\vec{x}(q_k, d_m))$.

Least squares polynomials (LSP) approach: polynomials with a predefined structure as function classes.

Define the polynomial structure $\vec{v}(\vec{x}) = (v_1, \ldots, v_L)$ with
$\vec{v}(\vec{x}) = (1, x_1, x_2, \ldots, x_N, x_1^2, x_1 x_2, \ldots)$

$N$ = number of dimensions of $\vec{x}$. The class of polynomials is given by the components $x_i^l x_j^m x_k^n \cdots$ ($i, j, k, \ldots \in [1, N]$; $l, m, n, \ldots \ge 0$); mostly linear and quadratic polynomials are regarded.

Regression function: $e(\vec{x}) = A^T \vec{v}(\vec{x})$, where $A = (a_{il})$ with $i = 1, \ldots, L$; $l = 1, \ldots, n$ is the coefficient matrix.

$P(R_l|\vec{x})$ is approximated by the polynomial
$e_l(\vec{x}) = a_{1l} + a_{2l}\,x_1 + a_{3l}\,x_2 + \ldots + a_{N+1,l}\,x_N + a_{N+2,l}\,x_1^2 + a_{N+3,l}\,x_1 x_2 + \ldots$

The coefficient matrix is computed by solving the equation system
$E(\vec{v}\,\vec{v}^T) \cdot A = E(\vec{v}\,\vec{y}^T)$.   (1)

Development of an LSP function:

1. Statistical evaluation of a learning sample:
- use a representative sample of request-document relationships together with relevance judgements
- derive pairs $(\vec{x}, \vec{y})$
- compute the empirical momental matrix $M = \left( \vec{v}\,\vec{v}^T \;\middle|\; \vec{v}\,\vec{y}^T \right)$ ($M$ contains both sides of the equation system)

2. Computation of the coefficient matrix by means of the Gauss-Jordan algorithm:
- choose the coefficient which maximizes the reduction of the overall error $s^2$
- matrix $M$ before the $i$-th elimination step: $M^{(i)} = (m^{(i)}_{lj})$ with $l = 1, \ldots, L$; $j = 1, \ldots, L + n$
- reduction $d^{(i)}_j$ achieved by choosing component $j$:
$d^{(i)}_j = \frac{1}{m^{(i)\,2}_{jj}} \sum_{l=L+1}^{L+n} m^{(i)\,2}_{jl}$   (2)
- the procedure yields a preliminary solution $e^{(i)}(\vec{x})$ after each iteration $i$ (with $i$ coefficients, the result of a limited optimization process)
- limited optimization is feasible for small learning samples (avoids over-adaptation)

Example

x       r_k   y       P(R_1|x)   e^(1)_1(x)   e^(2)_1(x)   e^(3)_1(x)
(1,1)   R1    (1,0)   2/3        0.60         0.60         0.67
(1,1)   R1    (1,0)   2/3        0.60         0.60         0.67
(1,1)   R2    (0,1)   2/3        0.60         0.60         0.67
(1,0)   R1    (1,0)   1/2        0.60         0.60         0.50
(1,0)   R2    (0,1)   1/2        0.60         0.60         0.50
(0,1)   R1    (1,0)   1/3        0.00         0.33         0.33
(0,1)   R2    (0,1)   1/3        0.00         0.33         0.33
(0,1)   R2    (0,1)   1/3        0.00         0.33         0.33

Define $\vec{v}(\vec{x}) = (1, x_1, x_2)$.

$M = \frac{1}{8} \begin{pmatrix} 8 & 5 & 6 & 4 & 4 \\ 5 & 5 & 3 & 3 & 2 \\ 6 & 3 & 6 & 3 & 3 \end{pmatrix}$

$d^{(1)}_1 = 0.50$, $d^{(1)}_2 = 0.52$, $d^{(1)}_3 = 0.50$ → eliminate component 2 ($x_1$):
$e^{(1)}_1(\vec{x}) = \frac{3}{5} x_1$, $\quad e^{(1)}_2(\vec{x}) = \frac{2}{5} x_1$

$M^{(2)} = \begin{pmatrix} 0.375 & 0 & 0.375 & 0.125 & 0.25 \\ 1 & 1 & 0.6 & 0.6 & 0.4 \\ 0.375 & 0 & 0.525 & 0.15 & 0.225 \end{pmatrix}$

$d^{(2)}_1 = 0.56$, $d^{(2)}_3 = 0.27$ → eliminate component 1 (the constant):
$e^{(2)}_1(\vec{x}) = 0.33 + 0.27\,x_1$, $\quad e^{(2)}_2(\vec{x}) = 0.67 - 0.27\,x_1$

$M^{(3)} = \begin{pmatrix} 1 & 0 & 1 & 0.33 & 0.67 \\ 0 & 1 & -0.4 & 0.27 & -0.27 \\ 0 & 0 & 0.15 & 0.025 & -0.025 \end{pmatrix}$

$e^{(3)}_1(\vec{x}) = 0.17 + 0.33\,x_1 + 0.17\,x_2$, $\quad e^{(3)}_2(\vec{x}) = 0.83 - 0.33\,x_1 - 0.17\,x_2$

Extending the learning sample by an observation with $\vec{x} = (0, 0)$:

x       r_k   y       P(R_1|x)
(1,1)   R1    (1,0)   2/3
(1,1)   R1    (1,0)   2/3
(1,1)   R2    (0,1)   2/3
(1,0)   R1    (1,0)   1/2
(1,0)   R2    (0,1)   1/2
(0,1)   R1    (1,0)   1/3
(0,1)   R2    (0,1)   1/3
(0,1)   R2    (0,1)   1/3
(0,0)   R2    (0,1)   0

$M = \frac{1}{9} \begin{pmatrix} 9 & 5 & 6 & 4 & 5 \\ 5 & 5 & 3 & 3 & 2 \\ 6 & 3 & 6 & 3 & 3 \end{pmatrix}$

$e^{(3)}_1(\vec{x}) = 0.08 + 0.38\,x_1 + 0.23\,x_2$
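A sketch reproducing this computation with NumPy for the first (eight-element) learning sample: it builds the empirical moments and solves equation system (1) directly, i.e. it yields the full least-squares solution obtained after all elimination steps.

```python
import numpy as np

# learning sample (x, y) from the example above
samples = [((1, 1), (1, 0)), ((1, 1), (1, 0)), ((1, 1), (0, 1)),
           ((1, 0), (1, 0)), ((1, 0), (0, 1)),
           ((0, 1), (1, 0)), ((0, 1), (0, 1)), ((0, 1), (0, 1))]

V = np.array([[1, x1, x2] for (x1, x2), _ in samples], dtype=float)  # v(x) = (1, x1, x2)
Y = np.array([y for _, y in samples], dtype=float)

EVV = V.T @ V / len(samples)   # E(v v^T)
EVY = V.T @ Y / len(samples)   # E(v y^T); M = (EVV | EVY)

A = np.linalg.solve(EVV, EVY)  # coefficient matrix solving E(v v^T) A = E(v y^T)
print(np.round(A, 2))
# first column  -> e_1(x) ~ 0.17 + 0.33*x1 + 0.17*x2
# second column -> e_2(x) = 1 - e_1(x)
```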

Experimental results

Effectiveness measure: normalized recall $R_{norm}$ (for multivalued relevance scales or a preference relation):
- $S^+$: # document pairs in correct order
- $S^-$: # document pairs in wrong order
- $S^+_{max}$: maximum # document pairs in correct order

$R_{norm} = \frac{1}{2} \left( 1 + \frac{S^+ - S^-}{S^+_{max}} \right)$

A random ordering of the documents will have $R_{norm} = 0.5$ on average.
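A small sketch computing $R_{norm}$ from a system ranking and graded relevance judgements by counting correctly and wrongly ordered document pairs; the document ids and grades are illustrative.

```python
def r_norm(ranking, rel):
    """Normalized recall R_norm = 0.5 * (1 + (S+ - S-) / S+_max).

    ranking: list of document ids in system order.
    rel: dict mapping document id -> relevance grade (higher = more relevant)."""
    s_plus = s_minus = 0
    for i in range(len(ranking)):
        for j in range(i + 1, len(ranking)):
            if rel[ranking[i]] > rel[ranking[j]]:
                s_plus += 1      # pair in correct order
            elif rel[ranking[i]] < rel[ranking[j]]:
                s_minus += 1     # pair in wrong order
    # maximum number of correctly ordered pairs: all pairs with different grades
    grades = [rel[d] for d in ranking]
    s_max = sum(1 for i in range(len(grades)) for j in range(i + 1, len(grades))
                if grades[i] != grades[j])
    return 0.5 * (1 + (s_plus - s_minus) / s_max) if s_max else 0.5


print(r_norm(["d1", "d2", "d3", "d4"], {"d1": 2, "d2": 0, "d3": 1, "d4": 0}))  # 0.8
```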

[Table: normalized recall ($R^\mu_{norm}$ and $R^M_{norm}$) for several LSP retrieval functions ($e_1$, ...) and the cosine baseline on the samples A, B and C]

Retrieval functions based on binary (R2) and multi-valued (R5) relevance scales; $R^\mu_{norm}$ and $R^M_{norm}$ were measured for the following combinations:

retr. fct.   relevance scale   learning sample   test sample
e1           R2                B                 C
e1           R5                B                 C
e1           R2                B                 A
e1           R5                B                 A
e2           R2                B                 C
e2           R5                B                 C
e2           R5                B                 A
e2           R2                B                 A

Retrieval functions based on binary (R2) and multi-valued (R5) relevance scales, adapted on a small learning sample (B/10 = every 10th request-document pair of B); $R^\mu_{norm}$ and $R^M_{norm}$ were measured for the following combinations:

retr. fct.   relevance scale   learning sample   test sample
e1           R2                B/10              C
e1           R5                B/10              C
e1           R2                B/10              A
e1           R5                B/10              A
e2           R2                B/10              C
e2           R5                B/10              C
e2           R5                B/10              A
e2           R2                B/10              A

Model-oriented vs. description-oriented approaches

Model-oriented approaches:
- refer to a specific representation
- based on certain explicit assumptions
- strict theoretical model
- quality depends on the validity of the assumptions

Description-oriented approaches:
- flexible w.r.t. the representation
- most assumptions implicit
- heuristic definition of the relevance description
- better adaptation to the application data

The BII model (binary independence indexing)

The BII model regards one document in relation to a number of queries.
- query representation: set of terms $q_k^T \subseteq T$, or equivalently the binary vector $\vec{z}_k = (z_{k1}, \ldots, z_{kn})$ with $z_{ki} = 1$ if $t_i \in q_k^T$, and $z_{ki} = 0$ otherwise
- document representation $d_m$: not further specified
- document description: terms with weights w.r.t. the document

$P(R|q_k, d_m) = P(R|\vec{z}_k, d_m)$: probability that a document with representation $d_m$ will be judged relevant w.r.t. a query with representation $q_k = q_k^T$.

Apply Bayes' theorem:

$P(R|\vec{z}_k, d_m) = P(R|d_m) \cdot \frac{P(\vec{z}_k|R, d_m)}{P(\vec{z}_k|d_m)}$   (3)

- $P(R|d_m)$: probability that $d_m$ will be judged relevant to an arbitrary request
- $P(\vec{z}_k|d_m)$: probability of a query with representation $\vec{z}_k$

Independence assumption: independent distribution of the terms in all queries to which a document with representation $d_m$ is relevant:
$P(\vec{z}_k|R, d_m) = \prod_{i=1}^{n} P(z_{ki}|R, d_m)$

$P(R|\vec{z}_k, d_m) = \frac{P(R|d_m)}{P(\vec{z}_k|d_m)} \prod_{i=1}^{n} P(z_{ki}|R, d_m) = \frac{P(R|d_m)}{P(\vec{z}_k|d_m)} \prod_{i=1}^{n} \frac{P(R|z_{ki}, d_m)\,P(z_{ki}|d_m)}{P(R|d_m)}$

$P(\vec{z}_k|d_m)$ and $P(z_{ki}|d_m)$ are independent of a specific document (since we always regard all documents w.r.t. a query):

$P(R|\vec{z}_k, d_m) = \frac{\prod_{i=1}^{n} P(z_{ki})}{P(\vec{z}_k)} \cdot P(R|d_m) \cdot \prod_{z_{ki}=0} \frac{P(R|z_{ki}{=}0, d_m)}{P(R|d_m)} \cdot \prod_{z_{ki}=1} \frac{P(R|z_{ki}{=}1, d_m)}{P(R|d_m)}$   (4)

Additional simplifying assumption: the relevance of a document $d_m$ with respect to a query $q_k$ depends only on the terms from $q_k^T$, and not on other terms:
$\prod_{z_{ki}=0} \frac{P(R|z_{ki}{=}0, d_m)}{P(R|d_m)} = 1$

Constant factor $c_k$ for a given query $q_k$, not required for ranking:
$\frac{\prod_{i=1}^{n} P(z_{ki})}{P(\vec{z}_k)} = c_k$

$P(R|z_{ki}{=}1, d_m) = P(R|t_i, d_m)$: probabilistic index term weight of $t_i$ w.r.t. $d_m$ = probability that document $d_m$ will be judged relevant to an arbitrary query containing $t_i$.

$d_m^T$ should contain at least those terms from $T$ for which $P(R|t_i, d_m) \ne P(R|d_m)$.

Assume that $P(R|t_i, d_m) = P(R|d_m)$ for all $t_i \notin d_m^T$:

$P(R|q_k, d_m) = c_k \cdot P(R|d_m) \cdot \prod_{t_i \in q_k^T \cap d_m^T} \frac{P(R|t_i, d_m)}{P(R|d_m)}$   (5)

Application of the BII model in this form is nearly impossible because of the lack of relevance information for specific term-document pairs.
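A minimal sketch of ranking with formula (5); the probabilistic index term weights $P(R|t_i, d_m)$ and the document prior $P(R|d_m)$ are assumed to be given, and the query constant $c_k$ is dropped since it does not change the ranking.

```python
def bii_score(query_terms, doc_terms, p_rel_term, p_rel_doc):
    """Rank value of formula (5) without the query constant c_k:
    P(R|d_m) * product over common terms of P(R|t_i, d_m) / P(R|d_m).

    p_rel_term: dict term -> P(R|t_i, d_m) for this document,
    p_rel_doc:  P(R|d_m) for this document."""
    score = p_rel_doc
    for t in query_terms & doc_terms:
        score *= p_rel_term.get(t, p_rel_doc) / p_rel_doc
    return score


print(bii_score({"t1", "t2"}, {"t1", "t2", "t3"},
                {"t1": 0.4, "t2": 0.25, "t3": 0.1}, p_rel_doc=0.2))
# 0.2 * (0.4 / 0.2) * (0.25 / 0.2) = 0.5
```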

A description-oriented indexing approach: Darmstadt Indexing Approach

[Figure: a term-document pair $(t, d)$ is mapped in the description step to a relevance description $\vec{x}(t, d)$; the decision step yields $P(C|\vec{x}(t, d))$ as an estimate of $P(C|t, d)$]

Regard features of terms in documents instead of the term-document pairs themselves (description-related learning strategy).

Description step

The relevance description $\vec{x}(t_i, d_m)$ contains attribute values of
- the term $t_i$,
- the document $d_m$,
- the term-document relationship.

The approach makes no additional assumptions about the choice of the attributes or the structure of $\vec{x}$; the actual definition of relevance descriptions can be adapted to the specific application context (representation of documents, amount of learning data available).

Decision step

Instead of $P(R|t_i, d_m)$, estimate $P(R|\vec{x}(t_i, d_m))$:
- $P(R|t_i, d_m)$: regard a single document $d_m$ with respect to all queries containing $t_i$
- $P(R|\vec{x}(t_i, d_m))$: regard the set of all term-document pairs with the same relevance description $\vec{x}$

Learning sample $L \subseteq Q \times D \times \mathcal{R}$ (query-document pairs with relevance judgements):
$L = \{(q_k, d_m, r_{km})\}$

Form relevance descriptions for the terms common to query and document: a bag of relevance descriptions with relevance judgements
$L_x = [\,(\vec{x}(t_i, d_m), r_{km}) \mid t_i \in q_k^T \cap d_m^T \wedge (q_k, d_m, r_{km}) \in L\,]$

Estimate the parameters $P(R|\vec{x}(t_i, d_m))$ by applying probabilistic classification procedures (e.g. LSP) → indexing function $e(\vec{x}(t_i, d_m))$.
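A minimal sketch of these two steps under simple assumptions: the learning sample is a list of (query terms, document terms, relevance) triples, the relevance description is supplied as a caller-defined feature function, and $P(R|\vec{x})$ is estimated as a relative frequency over the bag $L_x$ (event space $E_X$). Feeding the (x, judgement) pairs of the example table below into `estimate_p_r_x` reproduces its $E_X$ column.

```python
from collections import defaultdict


def build_lx(learning_sample, describe):
    """Build the bag L_x of (relevance description, judgement) pairs.

    learning_sample: list of (query_terms, doc_terms, relevant) triples.
    describe(t, doc_terms): relevance description x(t, d) of a common term t."""
    lx = []
    for q_terms, d_terms, relevant in learning_sample:
        for t in q_terms & d_terms:
            lx.append((describe(t, d_terms), relevant))
    return lx


def estimate_p_r_x(lx):
    """Relative-frequency estimates of P(R|x) over equiprobable descriptions (E_X)."""
    counts, rel = defaultdict(int), defaultdict(int)
    for x, relevant in lx:
        counts[x] += 1
        rel[x] += int(relevant)
    return {x: rel[x] / counts[x] for x in counts}
```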

Example for the description-oriented approach:

query   doc.   judg.   term   x
q1      d1     R       t1     (1, 1)
                       t2     (0, 1)
                       t3     (1, 2)
q1      d2     R̄       t1     (0, 2)
                       t3     (1, 1)
                       t4     (0, 1)
q2      d1     R       t2     (0, 2)
                       t5     (0, 2)
                       t6     (1, 1)
                       t7     (1, 2)
q2      d3     R̄       t5     (0, 1)
                       t7     (0, 1)

Resulting estimates $P(R|\vec{x})$:

x        E_X   E_BII
(0, 1)   1/4   1/3
(0, 2)   2/3   1/2
(1, 1)   2/3   2/3
(1, 2)   1     1

Two different event spaces:
- $E_{BII}$: equiprobable query-document pairs
- $E_X$: equiprobable relevance descriptions

Indexing function

Data used (same as in SMART):
- $tf_{mi}$: within-document frequency (wdf) of $t_i$ in $d_m$
- $\max tf_m$: maximum wdf of all terms $t_i \in d_m^T$
- $n_i$: number of documents in which $t_i$ occurs
- $|D|$: number of documents in the collection
- $|d_m^T|$: number of different terms in $d_m$
- $ta_{mi}$: = 1 if $t_i$ occurs in the title of $d_m$, and 0 otherwise

Relevance description elements:
- $x_1 = tf_{mi}$
- $x_2 = 1 / \max tf_m$
- $x_3 = \log(n_i / |D|)$
- $x_4 = \log |d_m^T|$
- $x_5 = ta_{mi}$

Indexing functions:
$e_l = a_0 + a_1\,tf_{mi} + a_2 / \max tf_m + a_3 \log(n_i / |D|) + a_4 \log |d_m^T|$
$e_{ta} = a_0 + a_1\,tf_{mi} + a_2 / \max tf_m + a_3 \log(n_i / |D|) + a_4 \log |d_m^T| + a_5\,ta_{mi}$

Retrieval functions

- $q_k^T$: set of query terms, $d_m^T$: set of document terms
- $u_{mi}$: indexing weight $e(\vec{x}(t_i, d_m))$
- $c_{ki}$: query term weight (see below)

Utility-theoretic retrieval function [Wong & Yao 89]:
$\varrho(q_k, d_m) = \sum_{t_i \in q_k^T \cap d_m^T} c_{ki} \cdot u_{mi}$

- $\varrho_{bin}$: binary query term weights ($c_{ki} = 1$ for all $t_i \in q_k^T$)
- $\varrho_{tf}$: $c_{ki}$ = within-query frequency of $t_i$ in $q_k$
- $\varrho_{tf \cdot idf}$: $c_{ki}$ = tf·idf with within-query frequencies
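A sketch of the utility-theoretic retrieval function with the three query-weighting variants; the indexing weights $u_{mi}$, the within-query frequencies and the idf values are assumed inputs.

```python
def utility_score(query_tf, doc_weights, idf=None, variant="tf"):
    """rho(q_k, d_m) = sum over common terms of c_ki * u_mi.

    query_tf: dict term -> within-query frequency,
    doc_weights: dict term -> indexing weight u_mi,
    idf: dict term -> idf value (only needed for the tf.idf variant),
    variant: 'bin', 'tf' or 'tfidf' selects the query term weight c_ki."""
    score = 0.0
    for t, qtf in query_tf.items():
        if t not in doc_weights:
            continue
        if variant == "bin":
            c = 1.0
        elif variant == "tf":
            c = qtf
        else:  # "tfidf"
            c = qtf * idf[t]
        score += c * doc_weights[t]
    return score


q = {"model": 2, "retrieval": 1}
d = {"model": 0.7, "indexing": 0.4, "retrieval": 0.5}
print(utility_score(q, d, variant="tf"))  # 2*0.7 + 1*0.5 = 1.9
```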

Experimental results

[Table: retrieval quality of the indexing functions $e_l$ and $e_{ta}$ combined with the retrieval functions $\varrho_{tf \cdot idf}$, $\varrho_{bin}$ and $\varrho_{tf}$, relative to a tf·idf baseline, on the CACM, CISI, CRAN, INSPEC and NPL collections]

Comparison of different retrieval functions (test sample, top-ranked documents, event space $E_X$):
- average precision at three recall points (0.25, 0.50, 0.75)
- probabilistic indexing with $\varrho_{tf}$ better than the SMART approach
- the SMART approach works without relevance information
- probabilistic indexing can be extended easily to more complex representations

Retrieval with probabilistic indexing

A probabilistic model for the ranking of documents with probabilistic indexing weights.

Probabilistic indexing: assume a fixed number of binary indexings per document.

Extended event space $Q \times D \times I$:
- $Q$: set of queries, $D$: set of documents, $I$: set of indexers
- single event: a query-document pair with
  - a relevance judgement,
  - a specific (binary) indexing,
  - relevance descriptions for the terms w.r.t. the document

Representations and descriptions:
- query representation $q_k = (q_k^T, q_k^J)$ (as in the BIR model): set of query terms $q_k^T$ plus set of relevance judgements $q_k^J = \{(d_m, r(d_m, q_k))\}$
- query description $q_k^D$: set of query terms with associated weights
- document representation $d_m = (\vec{d}_m, \vec{c}_m)$ with
  - $\vec{d}_m = (d_{m1}, \ldots, d_{mn})$, where $d_{mi}$ is the relevance description of $t_i$ w.r.t. $d_m$
  - $\vec{c}_m = (c_{m1}, \ldots, c_{mn})$ with $c_{mi} = C_i$ if $t_i$ has been assigned to $d_m$, and $c_{mi} = \bar{C}_i$ otherwise
- document description $d_m^D$: set of terms with indexing weights

$P(R|q_k, \vec{x})$: probability that a document with relevance descriptions $\vec{x}$ is relevant w.r.t. $q_k$.

Apply Bayes' theorem:
$O(R|q_k, \vec{x}) = O(R|q_k) \cdot \frac{P(\vec{x}|R, q_k)}{P(\vec{x}|\bar{R}, q_k)}$

Linked dependence assumption:
$\frac{P(\vec{x}|R, q_k)}{P(\vec{x}|\bar{R}, q_k)} = \prod_{i=1}^{n} \frac{P(x_i|R, q_k)}{P(x_i|\bar{R}, q_k)}$

yields
$O(R|q_k, \vec{x}) = O(R|q_k) \prod_{i=1}^{n} \frac{P(x_i|R, q_k)}{P(x_i|\bar{R}, q_k)}$

Assumptions:
- the relevance description $x_i$ of a term $t_i$ depends only on the correctness of $t_i$; it is independent of the correctness of other terms and of relevance
- the correctness of a term (w.r.t. a document) depends only on relevance; it is independent of the correctness of other terms

$O(R|q_k, \vec{x}) = O(R|q_k) \prod_{i=1}^{n} \frac{P(x_i|C_i)\,P(C_i|R, q_k) + P(x_i|\bar{C}_i)\,P(\bar{C}_i|R, q_k)}{P(x_i|C_i)\,P(C_i|\bar{R}, q_k) + P(x_i|\bar{C}_i)\,P(\bar{C}_i|\bar{R}, q_k)}$
$= O(R|q_k) \prod_{i=1}^{n} \frac{\frac{P(C_i|x_i)}{P(C_i)}\,P(C_i|R, q_k) + \frac{P(\bar{C}_i|x_i)}{P(\bar{C}_i)}\,P(\bar{C}_i|R, q_k)}{\frac{P(C_i|x_i)}{P(C_i)}\,P(C_i|\bar{R}, q_k) + \frac{P(\bar{C}_i|x_i)}{P(\bar{C}_i)}\,P(\bar{C}_i|\bar{R}, q_k)}$

- $u_{mi} = P(C_i|x_i{=}d_{mi})$: probabilistic indexing weight of $t_i$ w.r.t. $d_m$ = probability that an arbitrary indexer assigned $t_i$ to $d_m$
- $q_i = P(C_i)$: probability that an arbitrary indexer assigned $t_i$ to an arbitrary document = average indexing weight of $t_i$ in the collection
- $p_{ik} = P(C_i|R, q_k)$: probability that an arbitrary indexer assigned $t_i$ to an arbitrary document relevant to $q_k$ = average indexing weight of $t_i$ in the relevant documents
- $r_{ik} = P(C_i|\bar{R}, q_k)$: probability that an arbitrary indexer assigned $t_i$ to an arbitrary document nonrelevant to $q_k$ = average indexing weight of $t_i$ in the nonrelevant documents

Assume that $P(C_i|R, q_k) = P(C_i|\bar{R}, q_k)$ for all $t_i \notin q_k^T$:

$O(R|q_k, \vec{x}{=}\vec{d}_m) = O(R|q_k) \prod_{t_i \in q_k^T} \frac{\frac{u_{mi}}{q_i}\,p_{ik} + \frac{1 - u_{mi}}{1 - q_i}\,(1 - p_{ik})}{\frac{u_{mi}}{q_i}\,r_{ik} + \frac{1 - u_{mi}}{1 - q_i}\,(1 - r_{ik})}$

Approximation for $P(C_i|\bar{R}, q_k) \approx P(C_i)$:

$O(R|q_k, \vec{x}{=}\vec{d}_m) \approx O(R|q_k) \prod_{t_i \in q_k^T} \left( \frac{u_{mi}}{q_i}\,p_{ik} + \frac{1 - u_{mi}}{1 - q_i}\,(1 - p_{ik}) \right)$
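A sketch of this approximate ranking formula; the document indexing weights $u_{mi}$, the collection averages $q_i$ and the feedback-based averages $p_{ik}$ are assumed to be given.

```python
def rpi_odds(query_terms, u_m, q_avg, p_rel, prior_odds=1.0):
    """Approximate O(R|q_k, d_m) for retrieval with probabilistic indexing:
    O(R|q_k) * prod over query terms of
        ( u_mi/q_i * p_ik + (1 - u_mi)/(1 - q_i) * (1 - p_ik) ).

    u_m:   dict term -> indexing weight u_mi of the document (0.0 if absent),
    q_avg: dict term -> average indexing weight q_i in the collection,
    p_rel: dict term -> average indexing weight p_ik in the relevant documents."""
    odds = prior_odds
    for t in query_terms:
        u, q, p = u_m.get(t, 0.0), q_avg[t], p_rel[t]
        odds *= u / q * p + (1 - u) / (1 - q) * (1 - p)
    return odds


print(rpi_odds({"t1", "t2"},
               u_m={"t1": 0.8, "t2": 0.3},
               q_avg={"t1": 0.2, "t2": 0.1},
               p_rel={"t1": 0.5, "t2": 0.2}))
```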

Parameter estimation

$D_k^R$: set of documents judged relevant w.r.t. $q_k$

$q_i = \frac{1}{|D|} \sum_{d_j \in D} u_{ji}$, $\quad p_{ik} = \frac{1}{|D_k^R|} \sum_{d_j \in D_k^R} u_{ji}$

Parameter estimation

Parameter estimation affects retrieval quality, as observed in experiments. Two problems:

1. Selection of the estimation sample (random sample vs. top-ranking documents):
- estimate the parameters for nonrelevant documents from all documents not known to be relevant
- description-oriented approaches mostly assume a sample representative of the documents to be ranked (and not representative of the whole collection)

2. Estimation method (data → parameters)

Estimation methods

Collection of documents with features $e_i$; estimation of $P(e_i|e_j)$.

For the BIR model: $e_j$ = relevance / non-relevance w.r.t. $q_k$, $e_i$ = presence / absence of a term $t_i$.

- $g$: number of documents in the (random) sample, of which
- $f$: documents with feature $e_j$, and
- $h$: documents with $e_i$ and $e_j$

Task: compute an estimate $p(e_i|e_j, (h, f, g))$ for $P(e_i|e_j)$, given $(h, f, g)$.

Maximum likelihood estimate: $p(e_i|e_j, (h, f, g)) = h / f$
- not defined for $f = h = 0$
- biased estimate

Bayesian probability estimation

- $Q$: parameter to be estimated (continuous random variable)
- $f(q)$: prior distribution of the parameter $Q$
- $X$: discrete random variable with values $x_1, x_2, \ldots$
- $P(X{=}x_k|Q{=}q)$: probability that $X$ takes the value $x_k$, given that $Q$ has the value $q$

Posterior distribution of $q$:
$g(q|x_k) = \frac{f(q)\,P(X{=}x_k|Q{=}q)}{\int f(q)\,P(X{=}x_k|Q{=}q)\,dq}$

Deriving a point estimate for $q$ requires the application of further methods, e.g. a cost function.

Estimates for a beta prior

Assume a beta distribution as the prior distribution:
$f(p) = \frac{1}{B(a, b)}\,p^{a-1}(1 - p)^{b-1}$ with $B(a, b) = \Gamma(a)\,\Gamma(b) / \Gamma(a + b)$ and parameters $a, b > 0$ to be chosen.

Application with $(f, h)$ observed:
$g(p \mid h, f) = \frac{p^{a-1}(1 - p)^{b-1} \binom{f}{h} p^{h}(1 - p)^{f-h}}{\int_0^1 p^{a-1}(1 - p)^{b-1} \binom{f}{h} p^{h}(1 - p)^{f-h}\,dp}$

With $B(a, b) = \int_0^1 p^{a-1}(1 - p)^{b-1}\,dp$ this yields the posterior distribution
$g(p \mid h, f) = \frac{p^{h+a-1}(1 - p)^{f-h+b-1}}{B(h + a,\, f - h + b)}$

Apply the loss function $L(\hat{p}, p) = (\hat{p} - p)^2$ and seek the estimate $p_l$ minimizing the expectation of the loss function:
$\frac{d}{dp_l} \int_0^1 L(p, p_l)\,g(p)\,dp \overset{!}{=} 0$

yields
$p_l = \frac{h + a}{f + a + b}$
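A one-line sketch of this estimator, evaluated for the prior parameters and observations used in the figures below.

```python
def beta_estimate(h, f, a=0.5, b=0.5):
    """Bayes estimate p_l = (h + a) / (f + a + b) under a Beta(a, b) prior
    after observing h successes in f trials (squared-error loss)."""
    return (h + a) / (f + a + b)


print(beta_estimate(h=2, f=3))            # a = b = 0.5, f = 3, h = 2   -> 0.625
print(beta_estimate(h=10, f=15))          # a = b = 0.5, f = 15, h = 10 -> 0.656
print(beta_estimate(h=2, f=3, a=2, b=4))  # a = 2, b = 4, f = 3, h = 2  -> 0.444
```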

[Figure: prior (upper curve) and posterior distribution for a = b = 0.5, f = 3, h = 2]

[Figure: posterior distributions for a = b = 0.5 with f = 3, h = 2 and with f = 15, h = 10]

[Figure: prior (left curve) and posterior distribution for a = 2, b = 4, f = 3, h = 2]

[Figure: posterior distributions for a = 2, b = 4 with f = 3, h = 2 and with f = 15, h = 10]

Experimental results

[Table: parameters $a$, $b$ of the beta distribution as a function of $f$ and $Z_g(h, f)$, derived from a sample of PHYS documents; $Z_g$: # pairs $(e_i, e_j)$ in the learning sample]

[Table: different kinds of estimates ($p_{ml}$, $p_l$, $p_{opt}$) and their average quadratic errors $s^2$ for $h \le 4$ and for $h/f > 0.4$, on the samples X ($Z_g$ = 2852) and Z ($Z_g$ = 2709); $Z_g$: # pairs $(e_i, e_j)$, $s^2$: average quadratic error]
