Probabilistic Information Retrieval


Sumit Bhatia
July 16, 2009

Overview

1 Information Retrieval: IR Models, Probability Basics
2 Document Ranking Problem: Probability Ranking Principle
3 Binary Independence Model
4 BM25 Weighting Scheme
5 What Next?

Information Retrieval (IR) Process

1 User has some information need.
2 Information Need → Query, using a query representation.
3 Documents → Document representation.
4 The IR system matches the two representations to determine the documents that satisfy the user's information need.

Boolean Retrieval Model

Query = Boolean expression of terms, e.g., Mitra AND Giles
Document = term-document incidence matrix, where A_ij = 1 iff the i-th term is present in the j-th document.
Bag of words. No ranking.
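To make the Boolean model concrete, here is a minimal sketch in Python; the toy corpus and the query are illustrative assumptions, not part of the slides:

```python
# A minimal sketch of Boolean retrieval over a toy corpus (the corpus
# and the query terms are illustrative assumptions).

docs = {
    "d1": "mitra giles search engines",
    "d2": "giles citation indexing",
    "d3": "mitra giles focused crawling",
}

# Term-document incidence: A[term][doc] = 1 iff term occurs in doc.
terms = sorted({t for text in docs.values() for t in text.split()})
A = {t: {d: int(t in text.split()) for d, text in docs.items()} for t in terms}

# Boolean AND: keep the documents in which every query term is present.
def boolean_and(*query_terms):
    return [d for d in docs if all(A[t][d] for t in query_terms)]

print(boolean_and("mitra", "giles"))  # ['d1', 'd3'] -- an unranked set
```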

Vector Space Model

Query = free-text query, e.g., Mitra Giles
Query and documents are vectors in term space.
Cosine similarity between the query vector and a document vector indicates their similarity.
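As a small sketch of cosine scoring (the raw term-frequency weights and the toy vectors are assumptions for illustration; real systems typically use tf-idf weights):

```python
import math

# Cosine similarity between sparse term-frequency vectors,
# represented as {term: weight} dictionaries.
def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

query = {"mitra": 1, "giles": 1}
doc = {"mitra": 2, "giles": 1, "crawling": 3}
print(round(cosine(query, doc), 3))  # higher = more similar
```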

Information Retrieval (IR) Process, Revisited

1 User has some information need.
2 Information Need → Query, using a query representation.
3 Documents → Document representation.
4 The IR system matches the two representations to determine the documents that satisfy the user's information need.

Problem! Both the query and the document representations are uncertain.

Probability Basics

Chain Rule: P(A, B) = P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)

Partition Rule: P(B) = P(A, B) + P(Ā, B)

Bayes Rule: P(A|B) = P(B|A) P(A) / P(B) = [ P(B|A) / Σ_{X ∈ {A, Ā}} P(B|X) P(X) ] · P(A)
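As a quick numeric check of these rules (the probabilities below are made-up assumptions, chosen so the arithmetic is easy to follow):

```python
p_A = 0.3                 # P(A)
p_B_given_A = 0.8         # P(B | A)
p_B_given_notA = 0.2      # P(B | not A)

# Partition rule: P(B) = P(B|A)P(A) + P(B|Ā)P(Ā)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes rule: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 4))  # 0.24 / 0.38 ≈ 0.6316
```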

Document Ranking Problem

Problem Statement: Given a set of documents D = {d_1, d_2, ..., d_n} and a query q, in what order should the subset of relevant documents D_r = {d_r1, d_r2, ..., d_rm} be returned to the user?

Hint: We want the best document at rank 1, the second best at rank 2, and so on.

Solution: Rank by the probability of relevance of each document with respect to the information need (query), i.e., by P(R = 1 | d, q).

Probability Ranking Principle

Probability Ranking Principle (van Rijsbergen, 1979): "If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data."

Observation 1: The PRP maximizes the mean probability of relevance at rank k.

Probability Ranking Principle (contd.)

Case 1: 1/0 loss, i.e., no selection/retrieval costs.
Bayes Optimal Decision Rule: d is relevant iff P(R = 1 | d, q) > P(R = 0 | d, q).

Theorem 1: The PRP is optimal, in the sense that it minimizes the expected loss (Bayes risk) under 1/0 loss.

Case 2: PRP with differential retrieval costs. Let C_1 be the cost of retrieving a relevant document and C_0 the cost of retrieving a non-relevant one. Retrieve d ahead of every not-yet-retrieved document d' iff

C_1 · P(R = 1 | d, q) + C_0 · P(R = 0 | d, q) ≤ C_1 · P(R = 1 | d', q) + C_0 · P(R = 0 | d', q)
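A short sketch of this cost-sensitive ordering (the cost values and the per-document relevance probabilities are illustrative assumptions):

```python
# Rank documents by increasing expected retrieval cost. With C1=0, C0=1
# this reduces to ranking by decreasing P(R=1|d,q), i.e., the plain PRP.

C1, C0 = 0.0, 1.0   # cost of retrieving a relevant / non-relevant document

def expected_cost(p_rel):
    """Expected cost of retrieving a document with P(R=1|d,q) = p_rel."""
    return C1 * p_rel + C0 * (1.0 - p_rel)

docs = {"d1": 0.9, "d2": 0.3, "d3": 0.6}  # assumed P(R=1|d,q) per document
ranking = sorted(docs, key=lambda d: expected_cost(docs[d]))
print(ranking)  # ['d1', 'd3', 'd2']
```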

Binary Independence Model (BIM)

Assumptions:
1 Binary: documents are represented as binary incidence vectors of terms, d = (d_1, d_2, ..., d_n), where d_i = 1 iff term i is present in d, else 0.
2 Independence: terms occur in documents independently of one another.
3 The relevance of a document is independent of the relevance of other documents. (This is the assumption for the PRP in general.)

Implications:
1 Many documents have the same representation.
2 No association between terms is considered.

Binary Independence Model (BIM)

We wish to compute P(R | d, q). We do it in terms of the term incidence vectors, using Bayes' rule:

P(R = 1 | d, q) = P(d | R = 1, q) P(R = 1 | q) / P(d | q)   (1)
P(R = 0 | d, q) = P(d | R = 0, q) P(R = 0 | q) / P(d | q)   (2)

Here P(R = 1 | q) is the prior probability of relevance.

Binary Independence Model

Computing the odds ratio, we get:

O(R | d, q) = [P(R = 1 | q) / P(R = 0 | q)] · [P(d | R = 1, q) / P(d | R = 0, q)]   (3)

The first factor is document independent! What about the second factor? Apply the Naive Bayes assumption (terms occur independently given relevance):

O(R | d, q) ∝ Π_{t=1..m} P(d_t | R = 1, q) / P(d_t | R = 0, q)   (4)

Binary Independence Model

Observation 1: a term is either present in a document or not, so the product splits:

O(R | d, q) ∝ Π_{t: d_t = 1} [P(d_t = 1 | R = 1, q) / P(d_t = 1 | R = 0, q)] · Π_{t: d_t = 0} [P(d_t = 0 | R = 1, q) / P(d_t = 0 | R = 0, q)]   (5)

Let p_t = P(d_t = 1 | R = 1, q) and u_t = P(d_t = 1 | R = 0, q):

                        R = 1     R = 0
Term present (d_t = 1)  p_t       u_t
Term absent  (d_t = 0)  1 - p_t   1 - u_t

Binary Independence Model

Assumption: a term not in the query is equally likely to occur in relevant and non-relevant documents. Then only query terms matter:

O(R | d, q) ∝ Π_{t: d_t = q_t = 1} (p_t / u_t) · Π_{t: d_t = 0, q_t = 1} [(1 - p_t) / (1 - u_t)]   (6)

Manipulating (so that the second product runs over all query terms):

O(R | d, q) ∝ Π_{t: d_t = q_t = 1} [p_t (1 - u_t) / (u_t (1 - p_t))] · Π_{t: q_t = 1} [(1 - p_t) / (1 - u_t)]   (7)

The second product is constant for a given query!

Binary Independence Model

The Retrieval Status Value (RSV) is the log of the query-dependent part:

RSV_d = log Π_{t: d_t = q_t = 1} [p_t (1 - u_t) / (u_t (1 - p_t))]   (8)
      = Σ_{t: d_t = q_t = 1} log [p_t (1 - u_t) / (u_t (1 - p_t))]   (9)

Estimating p_t and u_t from a contingency table of term occurrence against relevance:

Docs       R = 1   R = 0               Total
d_t = 1    s       n - s               n
d_t = 0    S - s   (N - n) - (S - s)   N - n
Total      S       N - S               N

Substituting (with 1/2 added to each count for smoothing), we get:

RSV_d = Σ_{t: d_t = q_t = 1} log [ ((s + 1/2) / (S - s + 1/2)) / ((n - s + 1/2) / (N - n - S + s + 1/2)) ]   (10)
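A sketch of the smoothed per-term weight from equation (10); the counts N, S, n, s below are made-up assumptions for a single query term:

```python
import math

def rsv_weight(N, S, n, s):
    """log of [(s+1/2)/(S-s+1/2)] / [(n-s+1/2)/(N-n-S+s+1/2)]."""
    p_odds = (s + 0.5) / (S - s + 0.5)               # odds of term given R=1
    u_odds = (n - s + 0.5) / (N - n - S + s + 0.5)   # odds of term given R=0
    return math.log(p_odds / u_odds)

# N=1000 docs, S=20 relevant; the term occurs in n=50 docs, s=15 relevant.
print(round(rsv_weight(N=1000, S=20, n=50, s=15), 3))
# RSV_d for a document is the sum of these weights over matching query terms.
```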

Observations

Probabilities for non-relevant documents can be approximated by collection statistics:

log [(1 - u_t) / u_t] = log [(N - n) / n] ≈ log (N / n) = IDF!

It is not so simple for relevant documents:
- Estimating from known relevant documents (which are not always known).
- Assuming p_t = constant is equivalent to IDF weighting only.

The difficulty of estimating these probabilities, together with the drastic assumptions, makes good performance hard to achieve.
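As a quick illustration (the counts are assumptions): with N = 1,000,000 documents and a term occurring in n = 1,000 of them, log((N - n)/n) = log 999 and log(N/n) = log 1000 agree to two decimal places (≈ 6.91 with natural logs), so the approximation is essentially exact for all but the most frequent terms.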

Weighting Scheme

The BIM considers neither term frequencies nor document length. The BM25 weighting scheme (Okapi weighting), developed by Robertson et al., builds a probabilistic model sensitive to these quantities. BM25 is widely used today and has shown good performance in a number of practical systems.

Weighting Scheme

RSV_d = Σ_{t ∈ q} log(N / df_t) · [(k_1 + 1) tf_td / (k_1 ((1 - b) + b (l_d / l_av)) + tf_td)] · [(k_3 + 1) tf_tq / (k_3 + tf_tq)]

where:
- N is the total number of documents,
- df_t is the document frequency, i.e., the number of documents that contain term t,
- tf_td is the frequency of term t in document d,
- tf_tq is the frequency of term t in query q,
- l_d is the length of document d,
- l_av is the average length of documents,
- k_1, k_3 and b are constants, generally set to 2, 2 and 0.75 respectively.
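A compact sketch of this scoring formula in Python; the toy collection statistics in the example call are illustrative assumptions:

```python
import math

def bm25_score(query_tf, doc_tf, N, df, doc_len, avg_len,
               k1=2.0, k3=2.0, b=0.75):
    """Sum over query terms of idf * document-tf part * query-tf part."""
    score = 0.0
    for t, tf_tq in query_tf.items():
        tf_td = doc_tf.get(t, 0)
        if tf_td == 0 or t not in df:
            continue
        idf = math.log(N / df[t])
        doc_part = (k1 + 1) * tf_td / (k1 * ((1 - b) + b * doc_len / avg_len) + tf_td)
        query_part = (k3 + 1) * tf_tq / (k3 + tf_tq)
        score += idf * doc_part * query_part
    return score

# Toy example: a 1000-document collection and one candidate document.
print(round(bm25_score({"mitra": 1, "giles": 1},
                       {"mitra": 3, "giles": 1, "web": 5},
                       N=1000, df={"mitra": 20, "giles": 50},
                       doc_len=9, avg_len=12), 3))
```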

What Next?

Is similarity between terms and documents sufficient?
- JAVA: coffee, computer language, or place?
- Time and location of the user?
- Different users might want different documents for the same query.

What Next?

Maximum Marginal Relevance [CG98]: rank documents so as to minimize the similarity between returned documents (a minimal sketch follows below).

Result Diversification [Wan09]: rank documents so as to maximize mean relevance, given a variance level; the variance determines the risk the user is willing to take.
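A minimal sketch of greedy MMR re-ranking in the spirit of [CG98]; the relevance scores, pairwise similarities, and the lambda trade-off value are illustrative assumptions:

```python
def mmr(relevance, sim, lam=0.7, k=3):
    """Greedily pick k docs, trading relevance against similarity to picks."""
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def mmr_score(d):
            max_sim = max((sim.get((d, s), sim.get((s, d), 0.0))
                           for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * max_sim
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.5}
sim = {("d1", "d2"): 0.95, ("d1", "d3"): 0.1, ("d2", "d3"): 0.2}
print(mmr(relevance, sim))  # ['d1', 'd3', 'd2']: near-duplicate d2 is demoted
```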

References

[CG98] Jaime Carbonell and Jade Goldstein. "The use of MMR, diversity-based reranking for reordering documents and producing summaries." SIGIR, 1998, pp. 335-336.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

[Wan09] Jun Wang. "Mean-variance analysis: A new document ranking theory in information retrieval." Advances in Information Retrieval, 2009, pp. 4-16.

Questions?