Chapter 4: Advanced IR Models

4.1 Probabilistic IR
  4.1.1 Principles
  4.1.2 Probabilistic IR with Term Independence
  4.1.3 Probabilistic IR with 2-Poisson Model (Okapi BM25)

4.1.1 Probabilistic Retrieval: Principles (Robertson and Sparck Jones 1976)

Goal: ranking based on sim(doc d, query q) = P[d is relevant for query q | d has term vector X1, ..., Xm].

Assumptions:
- Relevant and irrelevant documents differ in their terms.
- Binary Independence Retrieval (BIR) Model:
  - Probabilities for term occurrence are pairwise independent for different terms.
  - Term weights are binary {0,1}.
- For terms that do not occur in query q, the probability of such a term occurring is the same for relevant and irrelevant documents.

4.1.2 Probabilistic IR with Term Independence: Ranking Proportional to Relevance Odds

$$\mathrm{sim}(d,q) = O(R \mid d) = \frac{P[R \mid d]}{P[\neg R \mid d]}
= \frac{P[d \mid R]\,P[R]}{P[d \mid \neg R]\,P[\neg R]} \qquad \text{(Bayes' theorem; odds for relevance)}$$

$$\sim \prod_i \frac{P[X_i = x_i \mid R]}{P[X_i = x_i \mid \neg R]} \qquad \text{(independence or linked dependence)}$$

$$\mathrm{sim}(d,q) = \sum_{i \in q} \log P[X_i = x_i \mid R] \;-\; \sum_{i \in q} \log P[X_i = x_i \mid \neg R]$$

($x_i = 1$ if d includes the i-th term, 0 otherwise)

Probabilistic Retrieval: Ranking Proportional to Relevance Odds (cont.)

With binary features and the estimators $p_i = P[X_i = 1 \mid R]$ and $q_i = P[X_i = 1 \mid \neg R]$:

$$\mathrm{sim}(d,q) \sim \sum_{i \in q} \log\big(p_i^{x_i}(1-p_i)^{1-x_i}\big) \;-\; \sum_{i \in q} \log\big(q_i^{x_i}(1-q_i)^{1-x_i}\big)$$

$$= \sum_{i \in q \cap d} \log\frac{p_i}{1-p_i} + \sum_{i \in q} \log(1-p_i) \;-\; \sum_{i \in q \cap d} \log\frac{q_i}{1-q_i} - \sum_{i \in q} \log(1-q_i)$$

The sums over all query terms are the same for every document, so they can be dropped for ranking purposes, leaving

$$\mathrm{sim}(d,q)'' := \sum_{i \in q \cap d} \log\frac{p_i}{1-p_i} + \sum_{i \in q \cap d} \log\frac{1-q_i}{q_i}$$

Probabilistic Retrieval: Robertson / Sparck Jones Formula

Estimate $p_i$ and $q_i$ based on a training sample (query q on a small sample of the corpus) or based on intellectual assessment of the first round's result (relevance feedback):

Let N be the number of docs in the sample, R the number of relevant docs in the sample,
$n_i$ the number of docs in the sample that contain term i,
$r_i$ the number of relevant docs in the sample that contain term i.

Estimate: $p_i = \frac{r_i}{R}$, $q_i = \frac{n_i - r_i}{N - R}$

or, with Lidstone smoothing ($\lambda = 0.5$): $p_i = \frac{r_i + 0.5}{R + 1}$, $q_i = \frac{n_i - r_i + 0.5}{N - R + 1}$

$$\mathrm{sim}(d,q)'' = \sum_{i \in q \cap d} \log\frac{r_i + 0.5}{R - r_i + 0.5} + \sum_{i \in q \cap d} \log\frac{N - R - n_i + r_i + 0.5}{n_i - r_i + 0.5}$$

Weight of term i in doc d: $\log\dfrac{(r_i + 0.5)(N - R - n_i + r_i + 0.5)}{(R - r_i + 0.5)(n_i - r_i + 0.5)}$
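The scoring rule above translates directly into code. The following is a minimal Python sketch (the names lidstone_estimates, rsj_weight, and sim_rsj are mine, not from the slides); it assumes documents and queries are represented as sets of terms and uses natural logarithms:

```python
import math

def lidstone_estimates(r_i, n_i, R, N, lam=0.5):
    """Lidstone-smoothed estimators from the slide: p_i = (r_i+lam)/(R+2*lam),
    q_i = (n_i-r_i+lam)/(N-R+2*lam); lam=0.5 reproduces (r_i+0.5)/(R+1)."""
    p_i = (r_i + lam) / (R + 2 * lam)
    q_i = (n_i - r_i + lam) / (N - R + 2 * lam)
    return p_i, q_i

def rsj_weight(p_i, q_i):
    """Per-term contribution log(p_i/(1-p_i)) + log((1-q_i)/q_i)."""
    return math.log(p_i / (1 - p_i)) + math.log((1 - q_i) / q_i)

def sim_rsj(doc_terms, query_terms, r, n, R, N, lam=0.5):
    """sim(d,q)'': sum of term weights over the terms present in both d and q.

    doc_terms, query_terms: sets of term ids; r, n: dicts term -> r_i, n_i."""
    score = 0.0
    for i in query_terms & doc_terms:
        p_i, q_i = lidstone_estimates(r[i], n[i], R, N, lam)
        score += rsj_weight(p_i, q_i)
    return score
```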

Probabilistic Retrieval: tf*idf Formula

Assumptions (without training sample or relevance feedback):
- $p_i$ is the same for all i.
- Most documents are irrelevant.
- Each individual term i is infrequent.

This implies:
$\log\frac{p_i}{1-p_i} = c$ with a constant c, and $q_i \approx \frac{df_i}{N}$, hence $\frac{1-q_i}{q_i} = \frac{N - df_i}{df_i} \approx \frac{N}{df_i}$

$$\mathrm{sim}(d,q)'' = \sum_{i \in q} tf_i \log\frac{p_i}{1-p_i} + \sum_{i \in q} tf_i \log\frac{1-q_i}{q_i} \;\approx\; \sum_{i \in q} c \cdot tf_i \cdot idf_i$$

i.e. a scalar product over the product of tf and dampened (logarithmic) idf values for the query terms.
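As a small illustration of the simplified formula, here is a sketch (the name tfidf_score is mine) that computes the resulting scalar product of term frequencies and log-dampened idf values, assuming df counts are available per term:

```python
import math

def tfidf_score(query_terms, doc_tf, df, N):
    """Scalar product of tf and dampened idf over the query terms (sketch).

    doc_tf: dict term -> term frequency in the doc; df: dict term -> document frequency."""
    return sum(doc_tf.get(t, 0) * math.log(N / df[t])
               for t in query_terms
               if df.get(t, 0) > 0)
```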

Example for Probabilistic Retrieval

Documents with relevance feedback (query q: t1 t2 t3 t4 t5 t6; R = 2, N = 4):

        t1   t2   t3   t4   t5   t6   relevant?
  d1     1    0    1    1    0    0       1
  d2     1    1    0    1    1    0       1
  d3     0    0    0    1    1    0       0
  d4     0    0    1    0    0    0       0
  n_i    2    1    2    3    2    0
  r_i    2    1    1    2    1    0
  p_i   5/6  1/2  1/2  5/6  1/2  1/6
  q_i   1/6  1/6  1/2  1/2  1/2  1/6

Score of a new document d5 (with Lidstone smoothing, $\lambda = 0.5$):
d5 restricted to the query terms: <1 1 0 0 0 1>
sim(d5, q) = log 5 + log 1 + log 0.2 + log 5 + log 5 + log 5

using $\mathrm{sim}(d,q)'' = \sum_{i \in q \cap d} \log\frac{p_i}{1-p_i} + \sum_{i \in q \cap d} \log\frac{1-q_i}{q_i}$
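The worked example can be reproduced with the sim_rsj sketch given after the Robertson/Sparck Jones formula above. The numeric total is my own addition (the slide only lists the log terms) and assumes natural logarithms:

```python
# Feedback statistics from the example: R = 2 relevant out of N = 4 docs.
n = {"t1": 2, "t2": 1, "t3": 2, "t4": 3, "t5": 2, "t6": 0}  # docs containing term i
r = {"t1": 2, "t2": 1, "t3": 1, "t4": 2, "t5": 1, "t6": 0}  # relevant docs containing term i
query = {"t1", "t2", "t3", "t4", "t5", "t6"}
d5 = {"t1", "t2", "t6"}                                      # term vector <1 1 0 0 0 1>

# sim_rsj as defined in the earlier sketch
score = sim_rsj(d5, query, r, n, R=2, N=4)
# = log 5 + log 1 + log 0.2 + log 5 + log 5 + log 5 ~= 4.83 with natural logs
print(round(score, 2))
```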

Laplace Smoothing (with Uniform Prior)

The probabilities $p_i$ and $q_i$ for term i are estimated by MLE for a binomial distribution (repeated coin tosses for relevant docs, showing term i with probability $p_i$; repeated coin tosses for irrelevant docs, showing term i with probability $q_i$).

To avoid overfitting to the feedback/training data, the estimates should be smoothed (e.g. with a uniform prior):

Instead of estimating $p_i = k/n$, estimate (Laplace's law of succession):
$p_i = \frac{k+1}{n+2}$
or, with a heuristic generalization (Lidstone's law of succession):
$p_i = \frac{k+\lambda}{n+2\lambda}$ with $\lambda > 0$ (e.g. $\lambda = 0.5$)

And for a multinomial distribution (n throws of a w-sided die) estimate:
$p_i = \frac{k_i + 1}{n + w}$
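A minimal sketch of the three estimators just listed (function names are mine, chosen for illustration):

```python
def laplace(k, n):
    """Laplace's law of succession for a binomial: (k+1)/(n+2)."""
    return (k + 1) / (n + 2)

def lidstone(k, n, lam=0.5):
    """Lidstone's generalization: (k+lam)/(n+2*lam), with lam > 0."""
    return (k + lam) / (n + 2 * lam)

def laplace_multinomial(k_i, n, w):
    """Multinomial case (n throws of a w-sided die): (k_i+1)/(n+w)."""
    return (k_i + 1) / (n + w)
```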

BM25: Motivations

- Estimates for probabilistic term weights based on assumptions about the tf distribution
- Estimates of the relevance of a term based on the notion of eliteness of terms
- Assumptions about the relationship between eliteness and document length

Okapi BM25

Approximation of the Poisson model by a similarly-shaped function:

$$w := \frac{tf}{k_1 + tf}\cdot\log\frac{p(1-q)}{q(1-p)}$$

finally leads to Okapi BM25 (which achieved the best TREC results):

$$w_j(d) := \frac{(k_1 + 1)\, tf_j}{k_1\left((1-b) + b\,\frac{length(d)}{avgdoclength}\right) + tf_j} \cdot \log\frac{N - df_j + 0.5}{df_j + 0.5}$$

or in the most comprehensive, tunable form:

$$score(d,q) := \sum_{j=1..|q|} \log\!\left(\frac{N - df_j + 0.5}{df_j + 0.5}\right) \cdot \frac{(k_1+1)\, tf_j}{k_1\left((1-b) + b\,\frac{len(d)}{\Delta}\right) + tf_j} \cdot \frac{(k_3+1)\, qtf_j}{k_3 + qtf_j} \;+\; k_2 \cdot |q| \cdot \frac{\Delta - len(d)}{\Delta + len(d)}$$

with $\Delta$ = avgdoclength, tuning parameters $k_1, k_2, k_3, b$, a non-linear influence of tf, and consideration of document length.
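A minimal sketch of the tunable form above (the function name bm25_score and the parameter defaults are mine; k1 = 1.2, b = 0.75 are common choices, and k2 = 0 disables the document-length correction term):

```python
import math

def bm25_score(query_tf, doc_tf, doc_len, avgdoclen, df, N,
               k1=1.2, b=0.75, k2=0.0, k3=1000.0):
    """Okapi BM25 as on the slide: query_tf/doc_tf map terms to frequencies,
    df maps terms to document frequencies, N is the collection size."""
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        if tf == 0 or term not in df:
            continue
        idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5))
        tf_part = ((k1 + 1) * tf) / (k1 * ((1 - b) + b * doc_len / avgdoclen) + tf)
        qtf_part = ((k3 + 1) * qtf) / (k3 + qtf)
        score += idf * tf_part * qtf_part
    # document-length correction: |q| read here as the total query length
    q_len = sum(query_tf.values())
    score += k2 * q_len * (avgdoclen - doc_len) / (avgdoclen + doc_len)
    return score
```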

Eliteness in BM25 [figure slides]

Poisson Mixtures for Capturing tf Distribution: Katz's K-mixture, distribution of tf values for the term "said" (Source: Church/Gale 1995) [figure]

Averaging Eliteness according to document length [figure slide]



4.1.3 Probabilistic IR with Poisson Model (Okapi BM25)

Generalize the term weight $w = \log\frac{p(1-q)}{q(1-p)}$ into

$$w = \log\frac{p_{tf}\; q_0}{q_{tf}\; p_0}$$

with $p_{tf}$, $q_{tf}$ denoting the probability that the term occurs tf times in a relevant / irrelevant doc.

Postulate Poisson (or Poisson-mixture) distributions:

$$p_{tf} = e^{-\mu}\,\frac{\mu^{tf}}{tf!} \qquad q_{tf} = e^{-\lambda}\,\frac{\lambda^{tf}}{tf!}$$
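A small sketch of this generalized weight under the single-Poisson assumption (the names and the means mu and lam are mine; the original slide's symbols for the two Poisson means are not recoverable from the transcription). Note that with plain Poissons the weight simplifies to tf * log(mu/lam), i.e. it grows linearly in tf, which motivates the saturating tf/(k1+tf) approximation used by BM25:

```python
import math

def poisson_pmf(tf, mean):
    """P[term occurs tf times] under a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** tf / math.factorial(tf)

def two_poisson_weight(tf, mu, lam):
    """w = log((p_tf * q_0) / (q_tf * p_0)) with Poisson-distributed tf in
    relevant (mean mu) and irrelevant (mean lam) documents."""
    p_tf, p_0 = poisson_pmf(tf, mu), poisson_pmf(0, mu)
    q_tf, q_0 = poisson_pmf(tf, lam), poisson_pmf(0, lam)
    return math.log((p_tf * q_0) / (q_tf * p_0))
```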

Additional Literature

Probabilistic IR:
- Grossman/Frieder, Sections 2.2 and 2.4
- S.E. Robertson, K. Sparck Jones: Relevance Weighting of Search Terms, JASIS 27(3), 1976
- S.E. Robertson, S. Walker: Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval, SIGIR 1994
- K.W. Church, W.A. Gale: Poisson Mixtures, Natural Language Engineering 1(2), 1995
- C.T. Yu, W. Meng: Principles of Database Query Processing for Advanced Applications, Morgan Kaufmann, 1997, Chapter 9
- D. Heckerman: A Tutorial on Learning with Bayesian Networks, Technical Report MSR-TR-95-06, Microsoft Research, 1995