Probabilistic Information Retrieval
Sumit Bhatia
July 16, 2009
Overview
1. Information Retrieval: IR Models, Probability Basics
2. Document Ranking Problem: Probability Ranking Principle
3. Binary Independence Model
4. Weighting Scheme (BM25)
5. What Next?
Information Retrieval (IR) Process
1. User has some information need.
2. Information Need → Query, via a Query Representation.
3. Documents → Document Representation.
4. The IR system matches the two representations to determine the documents that satisfy the user's information need.
Boolean Retrieval Model
- Query = Boolean expression of terms, e.g., "Mitra AND Giles"
- Document = term-document incidence matrix: A_ij = 1 iff term i is present in document j
- Bag-of-words representation
- No ranking
Vector Space Model
- Query = free-text query, e.g., "Mitra Giles"
- Query and documents are vectors in term space
- Cosine similarity between the query and document vectors indicates relevance
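As an illustration of the vector space model, here is a minimal cosine-similarity sketch. It uses raw term frequencies rather than the tf-idf weights a real system would use, and the function name and whitespace tokenization are ours:

```python
import math
from collections import Counter

def cosine_similarity(query, document):
    """Cosine of the angle between the query and document term vectors
    (raw term frequencies; a real system would use tf-idf weights)."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[t] * d[t] for t in q)          # dot product over shared terms
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_q * norm_d)
```

A document identical to the query scores 1, and one sharing no terms scores 0; ranking by this score is the vector space model's answer to the ranking problem.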
Information Retrieval (IR) Process - Revisited
1. User has some information need.
2. Information Need → Query, via a Query Representation.
3. Documents → Document Representation.
4. The IR system matches the two representations to determine the documents that satisfy the user's information need.

Problem! Both the query and document representations are uncertain.
Probability Basics
- Chain Rule: P(A,B) = P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
- Partition Rule: P(B) = P(A,B) + P(Ā,B)
- Bayes' Rule: P(A|B) = P(B|A)P(A) / P(B) = [ P(B|A) / Σ_{X∈{A,Ā}} P(B|X)P(X) ] · P(A)
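As a quick numeric sanity check of Bayes' rule with the partition rule expanding the denominator (a throwaway helper of our own, not part of any IR system):

```python
def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) via Bayes' rule, expanding P(B) with the partition rule."""
    # P(B) = P(B,A) + P(B,Ā) = P(B|A)P(A) + P(B|Ā)P(Ā)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b
```

For example, with P(B|A) = 0.9, P(A) = 0.01, and P(B|Ā) = 0.1, the posterior P(A|B) is only 0.009/0.108 ≈ 0.083: a strong likelihood does not overcome a small prior.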
Document Ranking Problem

Problem Statement: Given a set of documents D = {d_1, d_2, ..., d_n} and a query q, in what order should the subset of relevant documents D_r = {d_r1, d_r2, ..., d_rm} be returned to the user?

Hint: We want the best document at rank 1, the second best at rank 2, and so on.

Solution: Rank by the probability of relevance of each document with respect to the information need (query), i.e., by P(R = 1|d, q).
Probability Ranking Principle

Probability Ranking Principle (van Rijsbergen, 1979): "If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data."

Observation: the PRP maximizes the mean probability of relevance at rank k.
Probability Ranking Principle

Case 1: 1/0 loss, i.e., no selection/retrieval costs.
- Bayes Optimal Decision Rule: d is relevant iff P(R = 1|d, q) > P(R = 0|d, q).
- Theorem 1: the PRP is optimal, in the sense that it minimizes the expected loss (Bayes risk) under 1/0 loss.

Case 2: PRP with differential retrieval costs. Let C_1 be the cost of retrieving a relevant document and C_0 the cost of retrieving a non-relevant one. Retrieve document d ahead of document d' iff

C_1 · P(R = 1|d, q) + C_0 · P(R = 0|d, q) ≤ C_1 · P(R = 1|d', q) + C_0 · P(R = 0|d', q)
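The cost-sensitive rule can be sketched as ordering documents by expected retrieval cost. The helper `rank_by_expected_cost` and its cost convention are our own illustration; note that with the default costs the ordering reduces to plain PRP ranking by decreasing probability of relevance:

```python
def rank_by_expected_cost(probs, c1=0.0, c0=1.0):
    """Order documents by expected retrieval cost C1*P(R=1|d,q) + C0*P(R=0|d,q).

    probs maps doc_id -> P(R=1|d, q); c1 and c0 are the costs of retrieving
    a relevant / non-relevant document. With c1=0, c0=1 this is exactly PRP
    ranking by decreasing probability of relevance.
    """
    def expected_cost(doc_id):
        p = probs[doc_id]
        return c1 * p + c0 * (1.0 - p)
    return sorted(probs, key=expected_cost)  # cheapest (most relevant) first
```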
Binary Independence Model (BIM)

Assumptions:
1. Binary: documents are represented as binary incidence vectors of terms: d = {d_1, d_2, ..., d_n}, where d_i = 1 iff term i is present in d, else 0.
2. Independence: terms occur in documents independently of each other.
3. The relevance of a document is independent of the relevance of other documents. (This is the assumption underlying the PRP in general.)

Implications:
1. Many documents can have the same representation.
2. No association between terms is considered.
Binary Independence Model (BIM)

We wish to compute P(R|d, q); under the binary assumption we work with the term incidence vectors of d and q. Using Bayes' Rule:

P(R = 1|d, q) = P(d|R = 1, q) · P(R = 1|q) / P(d|q)   (1)
P(R = 0|d, q) = P(d|R = 0, q) · P(R = 0|q) / P(d|q)   (2)

Here P(R = 1|q) and P(R = 0|q) are the prior probabilities of relevance.
Binary Independence Model

Computing the odds ratio, we get:

O(R|d, q) = [P(R = 1|q) / P(R = 0|q)] · [P(d|R = 1, q) / P(d|R = 0, q)]   (3)

The first term is document independent: it is constant for a given query. For the second term, apply the Naive Bayes (conditional independence) assumption:

O(R|d, q) = O(R|q) · ∏_{t=1}^{m} P(d_t|R = 1, q) / P(d_t|R = 0, q)   (4)
Binary Independence Model

Observation 1: a term is either present in a document or not, so the product splits:

O(R|d, q) = O(R|q) · ∏_{t:d_t=1} [P(d_t = 1|R = 1, q) / P(d_t = 1|R = 0, q)] · ∏_{t:d_t=0} [P(d_t = 0|R = 1, q) / P(d_t = 0|R = 0, q)]   (5)

Writing p_t = P(d_t = 1|R = 1, q) and u_t = P(d_t = 1|R = 0, q):

                        R = 1    R = 0
Term present (d_t = 1)  p_t      u_t
Term absent  (d_t = 0)  1 - p_t  1 - u_t
Binary Independence Model

Assumption: a term not in the query is equally likely to occur in relevant and non-relevant documents (p_t = u_t whenever q_t = 0), so only query terms matter:

O(R|d, q) = O(R|q) · ∏_{t:d_t=q_t=1} p_t/u_t · ∏_{t:d_t=0,q_t=1} (1 - p_t)/(1 - u_t)   (6)

Manipulating (extending the right-hand product over all query terms and compensating in the left-hand product):

O(R|d, q) = O(R|q) · ∏_{t:d_t=q_t=1} [p_t(1 - u_t) / (u_t(1 - p_t))] · ∏_{t:q_t=1} (1 - p_t)/(1 - u_t)   (7)

The last product is constant for a given query, so it can be ignored for ranking.
Binary Independence Model

The only quantity left to estimate for ranking is the Retrieval Status Value (RSV):

RSV_d = log ∏_{t:d_t=q_t=1} p_t(1 - u_t) / (u_t(1 - p_t))   (8)
      = Σ_{t:d_t=q_t=1} log [p_t(1 - u_t) / (u_t(1 - p_t))]   (9)

Estimate p_t and u_t from a contingency table: of the N documents, S are relevant; term t occurs in n documents, s of them relevant.

          R=1    R=0              Total
d_t = 1   s      n-s              n
d_t = 0   S-s    (N-n)-(S-s)      N-n
Total     S      N-S              N

Substituting (with 1/2 added to each count for smoothing), we get:

RSV_d = Σ_{t:d_t=q_t=1} log { [(s + 1/2) / (S - s + 1/2)] / [(n - s + 1/2) / (N - n - S + s + 1/2)] }   (10)
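Equation (10) translates directly into code. This sketch computes the smoothed weight of a single query term from the contingency counts (the function name and argument order are our own):

```python
import math

def rsv_term_weight(s, S, n, N):
    """Smoothed BIM term weight of equation (10): of N documents, S are
    relevant; the term occurs in n documents, s of them relevant."""
    rel_odds = (s + 0.5) / (S - s + 0.5)                  # odds of term in relevant docs
    nonrel_odds = (n - s + 0.5) / (N - n - S + s + 0.5)   # odds of term in non-relevant docs
    return math.log(rel_odds / nonrel_odds)
```

A term concentrated in the relevant documents gets a positive weight; a term that avoids them gets a negative one; RSV_d is the sum of these weights over the query terms present in d.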
Observations
- The probabilities for non-relevant documents can be approximated by collection statistics: log[(1 - u_t)/u_t] ≈ log[(N - n)/n] ≈ log(N/n) = IDF!
- It is not so simple for relevant documents:
  - estimate p_t from known relevant documents (not always available), or
  - assume p_t constant, which is equivalent to IDF weighting alone.
- The difficulty of estimating these probabilities and the drastic assumptions make good retrieval performance hard to achieve.
Weighting Scheme
- The BIM considers neither term frequencies nor document length.
- The BM25 weighting scheme (Okapi weighting) was developed to build a probabilistic model sensitive to these quantities.
- BM25 is widely used today and has shown good performance in a number of practical systems.
Weighting Scheme

RSV_d = Σ_{t∈q} log(N/df_t) · [(k_1 + 1) · tf_td] / [k_1 · ((1 - b) + b · (l_d/l_av)) + tf_td] · [(k_3 + 1) · tf_tq] / (k_3 + tf_tq)

where:
- N is the total number of documents,
- df_t is the document frequency of term t, i.e., the number of documents that contain t,
- tf_td is the frequency of term t in document d,
- tf_tq is the frequency of term t in query q,
- l_d is the length of document d,
- l_av is the average document length,
- k_1, k_3, and b are constants, generally set to 2, 2, and 0.75 respectively.
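A minimal sketch of the BM25 formula above, with the constants at the stated defaults. The helper and its argument names are our own, and a production system would use an inverted index rather than per-document dictionaries:

```python
import math

def bm25_score(query_tf, doc_tf, doc_len, avg_doc_len, N, df,
               k1=2.0, k3=2.0, b=0.75):
    """BM25 RSV of one document for one query.

    query_tf: term -> frequency in the query
    doc_tf:   term -> frequency in the document
    df:       term -> document frequency over the N-document collection
    """
    score = 0.0
    for t, tf_tq in query_tf.items():
        tf_td = doc_tf.get(t, 0)
        if tf_td == 0 or df.get(t, 0) == 0:
            continue  # term absent from document or collection contributes nothing
        idf = math.log(N / df[t])
        # document-side term-frequency saturation with length normalization
        doc_part = ((k1 + 1) * tf_td) / (k1 * ((1 - b) + b * doc_len / avg_doc_len) + tf_td)
        # query-side term-frequency saturation
        query_part = ((k3 + 1) * tf_tq) / (k3 + tf_tq)
        score += idf * doc_part * query_part
    return score
```

Because of the saturating tf factors, repeating a term in a document increases the score with diminishing returns, unlike raw term-frequency weighting.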
What Next?
- Similarity between terms and documents: is this sufficient?
- "JAVA": coffee, computer language, or place?
- Time and location of the user?
- Different users might want different documents for the same query.
What Next?
- Maximum Marginal Relevance [CG98]: rank documents so as to minimize the similarity between returned documents.
- Result Diversification [Wan09]: rank documents so as to maximize mean relevance, given a variance level; the variance determines the risk the user is willing to take.
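The greedy MMR reranking of [CG98] can be sketched as follows. This is a simplified illustration with our own names; the query and inter-document similarities are assumed precomputed, and lam trades off relevance (lam = 1) against diversity (lam = 0):

```python
def mmr_rank(query_sim, doc_sim, lam=0.7, k=3):
    """Greedy Maximal Marginal Relevance reranking.

    query_sim: doc_id -> similarity to the query
    doc_sim:   (doc_id, doc_id) -> similarity between two documents
    Returns up to k document ids, each chosen to maximize
    lam*sim(d, q) - (1-lam)*max_{s selected} sim(d, s).
    """
    selected = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def mmr(d):
            # redundancy = similarity to the closest already-selected document
            redundancy = max((doc_sim.get((d, s), doc_sim.get((s, d), 0.0))
                              for s in selected), default=0.0)
            return lam * query_sim[d] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

In the test below, document "b" is the second most relevant but nearly a duplicate of "a", so MMR prefers the less relevant but novel "c".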
References

[CG98] Jaime Carbonell and Jade Goldstein, "The use of MMR, diversity-based reranking for reordering documents and producing summaries," SIGIR 1998, pp. 335-336.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

[Wan09] Jun Wang, "Mean-variance analysis: A new document ranking theory in information retrieval," Advances in Information Retrieval (ECIR 2009), pp. 4-16.
QUESTIONS???