CS47300: Web Information Search and Management


CS47300: Web Information Search and Management
Probabilistic Retrieval Models
Prof. Chris Clifton
7 September 2018
Material adapted from a course created by Dr. Luo Si, now leading the Alibaba research group.

Why probabilities in IR?
On one side, a user information need becomes a query representation: our understanding of the user's need is uncertain. On the other, documents become document representations: an uncertain guess of whether a document has relevant content. How do we match the two? In traditional IR systems, matching between each document and query is attempted in a semantically imprecise space of index terms. Probabilities provide a principled foundation for reasoning under uncertainty. Can we use probabilities to quantify our uncertainties?

Probabilistic IR topics
- Classical probabilistic retrieval model: the probability ranking principle, etc.
- Binary independence model (≈ Naïve Bayes)
- (Okapi) BM25
- Bayesian networks for text retrieval
- Language model approach to IR: an important emphasis in recent work

Probabilistic methods have been a recurring theme in Information Retrieval. Traditionally: neat ideas, but they didn't win on performance.

The document ranking problem
We have a collection of documents, and a user issues a query; a list of documents needs to be returned. The ranking method is the core of an IR system: in what order do we present documents to the user? We want the best document first, the second best second, etc. Idea: rank by the probability of relevance of the document w.r.t. the information need, $P(R=1 \mid \text{document}, \text{query})$, as in the sketch below.
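A minimal sketch of this idea in Python. `prob_relevant` is a hypothetical placeholder (a toy query-term overlap score, not part of the lecture); the rest of the lecture is about estimating such probabilities in a principled way.

```python
# PRP in a nutshell: ranking is just sorting by estimated P(R=1 | doc, query).

def prob_relevant(doc: str, query: str) -> float:
    # Toy stand-in: fraction of query terms the document contains.
    query_terms = set(query.lower().split())
    doc_terms = set(doc.lower().split())
    return len(query_terms & doc_terms) / len(query_terms) if query_terms else 0.0

def rank(docs: list[str], query: str) -> list[str]:
    # Present documents in order of decreasing estimated probability of relevance.
    return sorted(docs, key=lambda d: prob_relevant(d, query), reverse=True)

docs = ["probabilistic retrieval models", "web search engines", "cooking recipes"]
print(rank(docs, "probabilistic retrieval"))
```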

Recall a few probability basics
For events $A$ and $B$:

$$p(A, B) = p(A \cap B) = p(A \mid B)\,p(B) = p(B \mid A)\,p(A)$$

Bayes' Rule (posterior on the left, prior $p(A)$ on the right):

$$p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)} = \frac{p(B \mid A)\,p(A)}{\sum_{X \in \{A, \bar{A}\}} p(B \mid X)\,p(X)}$$

Odds:

$$O(A) = \frac{p(A)}{p(\bar{A})} = \frac{p(A)}{1 - p(A)}$$

The Probability Ranking Principle (PRP)
"If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data." [1960s/1970s] S. Robertson, W.S. Cooper, M.E. Maron; van Rijsbergen (1979: 113); Manning & Schütze (1999: 538)
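A quick numeric check of these identities (the probabilities are made up for illustration):

```python
# Worked example of Bayes' Rule and odds with illustrative numbers.
p_A = 0.3             # prior p(A)
p_B_given_A = 0.8     # likelihood p(B|A)
p_B_given_notA = 0.2  # likelihood p(B|~A)

# Total probability: p(B) = sum over X in {A, ~A} of p(B|X) p(X)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)   # 0.38

# Bayes' Rule: posterior p(A|B) = p(B|A) p(A) / p(B)
p_A_given_B = p_B_given_A * p_A / p_B                  # 0.24 / 0.38 ~ 0.632
print(f"posterior p(A|B)      = {p_A_given_B:.3f}")

# Odds: O(A) = p(A) / (1 - p(A))
print(f"prior odds O(A)       = {p_A / (1 - p_A):.3f}")  # ~0.429
print(f"posterior odds O(A|B) = {p_A_given_B / (1 - p_A_given_B):.3f}")
```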

Probability Ranking Principle (PRP)
Let $x$ represent a document in the collection, and let $R$ represent relevance of a document w.r.t. a given (fixed) query, with $R=1$ meaning relevant and $R=0$ not relevant. We need to find $p(R=1 \mid x)$, the probability that a document $x$ is relevant:

$$p(R=1 \mid x) = \frac{p(x \mid R=1)\,p(R=1)}{p(x)}, \qquad p(R=0 \mid x) = \frac{p(x \mid R=0)\,p(R=0)}{p(x)}, \qquad p(R=0 \mid x) + p(R=1 \mid x) = 1$$

Here $p(R=1)$ and $p(R=0)$ are the prior probabilities of retrieving a relevant or non-relevant document, and $p(x \mid R=1)$, $p(x \mid R=0)$ are the probabilities that, if a relevant (respectively non-relevant) document is retrieved, it is $x$.

Probability Ranking Principle (PRP)
Simple case: no selection costs or other utility concerns that would differentially weight errors. PRP in action: rank all documents by $p(R=1 \mid x)$.
Theorem: using the PRP is optimal, in that it minimizes the loss (Bayes risk) under 1/0 loss. This is provable if all the probabilities are correct, etc. [e.g., Ripley 1996]
How do we compute all those probabilities? We do not know the exact probabilities and have to use estimates.
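A small sketch of "PRP in action", assuming we already know the per-document likelihoods and the prior (all the numbers below are illustrative; in practice they must be estimated):

```python
# Rank documents by p(R=1|x), computed via Bayes' Rule from assumed quantities.
p_R1 = 0.1  # prior p(R=1); p(R=0) = 1 - p_R1

# Assumed likelihoods (p(x|R=1), p(x|R=0)) for three hypothetical documents.
likelihoods = {
    "d1": (0.020, 0.001),
    "d2": (0.010, 0.005),
    "d3": (0.001, 0.010),
}

def posterior(p_x_r1: float, p_x_r0: float) -> float:
    # p(R=1|x) = p(x|R=1) p(R=1) / [ p(x|R=1) p(R=1) + p(x|R=0) p(R=0) ]
    num = p_x_r1 * p_R1
    return num / (num + p_x_r0 * (1 - p_R1))

ranked = sorted(likelihoods, key=lambda d: posterior(*likelihoods[d]), reverse=True)
print(ranked)  # ['d1', 'd2', 'd3']: a higher likelihood ratio gives a higher posterior
```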

Probabilistic Ranking
Basic concept: "For a given query, if we know some documents that are relevant, terms that occur in those documents should be given greater weighting in searching for other relevant documents. By making assumptions about the distribution of terms and applying Bayes' Theorem, it is possible to derive weights theoretically." — van Rijsbergen

Binary Independence Model
Traditionally used in conjunction with the PRP. "Binary" = Boolean: documents are represented as binary incidence vectors of terms, $\vec{x} = (x_1, \ldots, x_n)$, where $x_i = 1$ iff term $i$ is present in document $x$. "Independence": terms occur in documents independently. Note that different documents can be modeled as the same vector.
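A minimal sketch of the binary representation, using a small made-up vocabulary:

```python
# Binary incidence vectors: x_i = 1 iff vocabulary term i occurs in the document.
# Term frequency and word order are discarded, so different documents can map
# to the same vector.
vocab = ["web", "search", "probabilistic", "retrieval"]

def incidence_vector(doc: str) -> tuple[int, ...]:
    terms = set(doc.lower().split())
    return tuple(1 if t in terms else 0 for t in vocab)

print(incidence_vector("probabilistic retrieval models"))     # (0, 0, 1, 1)
print(incidence_vector("retrieval probabilistic retrieval"))  # also (0, 0, 1, 1)
```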

Binary Independence Model
Queries are also binary term incidence vectors. Given a query $q$, for each document $d$ we need to compute $p(R \mid q, d)$; we replace this with computing $p(R \mid q, \vec{x})$, where $\vec{x}$ is the binary term incidence vector representing $d$. Since we are interested only in ranking, we can use odds and Bayes' Rule (the $p(\vec{x} \mid q)$ factors cancel):

$$O(R \mid q, \vec{x}) = \frac{p(R=1 \mid q, \vec{x})}{p(R=0 \mid q, \vec{x})} = \frac{p(R=1 \mid q)\,p(\vec{x} \mid R=1, q)\,/\,p(\vec{x} \mid q)}{p(R=0 \mid q)\,p(\vec{x} \mid R=0, q)\,/\,p(\vec{x} \mid q)} = \frac{p(R=1 \mid q)}{p(R=0 \mid q)} \cdot \frac{p(\vec{x} \mid R=1, q)}{p(\vec{x} \mid R=0, q)}$$

The first factor is constant for a given query; the second needs estimation. Using the independence assumption,

$$\frac{p(\vec{x} \mid R=1, q)}{p(\vec{x} \mid R=0, q)} = \prod_{i=1}^{n} \frac{p(x_i \mid R=1, q)}{p(x_i \mid R=0, q)}, \qquad \text{so} \qquad O(R \mid q, \vec{x}) = O(R \mid q) \prod_{i=1}^{n} \frac{p(x_i \mid R=1, q)}{p(x_i \mid R=0, q)}$$
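A sketch of this odds computation under the independence assumption; the per-term probabilities and the query-dependent constant are assumed for illustration:

```python
# O(R|q,x) = O(R|q) * prod_i p(x_i|R=1,q) / p(x_i|R=0,q), with x_i in {0, 1}.
O_R_given_q = 0.25  # assumed query-dependent constant p(R=1|q) / p(R=0|q)

# For each term i: (p_i, r_i) = (p(x_i=1|R=1,q), p(x_i=1|R=0,q)), assumed values.
params = [(0.6, 0.1), (0.3, 0.2), (0.05, 0.05)]
x = [1, 0, 1]  # binary incidence vector of the document

odds = O_R_given_q
for x_i, (p_i, r_i) in zip(x, params):
    if x_i == 1:
        odds *= p_i / r_i              # term present: p(1|R=1) / p(1|R=0)
    else:
        odds *= (1 - p_i) / (1 - r_i)  # term absent:  p(0|R=1) / p(0|R=0)
print(f"O(R|q,x) = {odds:.4f}")        # 0.25 * 6 * 0.875 * 1 = 1.3125
```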

Binary Independence Model
Since each $x_i$ is either 0 or 1, we can split the product:

$$O(R \mid q, \vec{x}) = O(R \mid q) \prod_{x_i=1} \frac{p(x_i=1 \mid R=1, q)}{p(x_i=1 \mid R=0, q)} \prod_{x_i=0} \frac{p(x_i=0 \mid R=1, q)}{p(x_i=0 \mid R=0, q)}$$

Let $p_i = p(x_i=1 \mid R=1, q)$ and $r_i = p(x_i=1 \mid R=0, q)$, and assume that for all terms not occurring in the query ($q_i = 0$) we have $p_i = r_i$. Then:

$$O(R \mid q, \vec{x}) = O(R \mid q) \prod_{x_i=1,\,q_i=1} \frac{p_i}{r_i} \prod_{x_i=0,\,q_i=1} \frac{1-p_i}{1-r_i}$$

Model Parameters

                              relevant (R=1)    not relevant (R=0)
    term present (x_i = 1)    p_i               r_i
    term absent  (x_i = 0)    1 - p_i           1 - r_i

Binary Independence Model
Starting from the product over matching terms times the product over non-matching query terms above, multiply and divide the matching-term product by $(1-p_i)/(1-r_i)$ so that the second product runs over all query terms:

$$O(R \mid q, \vec{x}) = O(R \mid q) \underbrace{\prod_{x_i=q_i=1} \frac{p_i\,(1-r_i)}{r_i\,(1-p_i)}}_{\text{all matching terms}} \; \underbrace{\prod_{q_i=1} \frac{1-p_i}{1-r_i}}_{\text{all query terms}}$$

The second product is constant for each query, so only the first matters for ranking. Retrieval Status Value:

$$RSV = \log \prod_{x_i=q_i=1} \frac{p_i\,(1-r_i)}{r_i\,(1-p_i)} = \sum_{x_i=q_i=1} \log \frac{p_i\,(1-r_i)}{r_i\,(1-p_i)}$$

This is the only quantity that needs to be estimated for ranking.
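A sketch of computing the RSV as a sum of per-term log odds ratios over the matching terms; the $(p_i, r_i)$ values are assumed for illustration:

```python
import math

# RSV = sum over matching terms (x_i = q_i = 1) of log[p_i (1-r_i) / (r_i (1-p_i))].
params = {  # assumed (p_i, r_i) per term
    "probabilistic": (0.6, 0.1),
    "retrieval": (0.5, 0.3),
    "web": (0.2, 0.2),
}

def rsv(doc_terms: set, query_terms: set) -> float:
    score = 0.0
    for t in doc_terms & query_terms:  # matching terms only
        if t not in params:
            continue
        p, r = params[t]
        score += math.log(p * (1 - r) / (r * (1 - p)))
    return score

print(rsv({"probabilistic", "retrieval"}, {"probabilistic", "retrieval", "web"}))
# ~2.60 ("probabilistic") + ~0.85 ("retrieval") ~ 3.45
```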

Binary Independence Model
It all boils down to computing the RSV:

$$RSV = \sum_{x_i=q_i=1} \log \frac{p_i\,(1-r_i)}{r_i\,(1-p_i)} = \sum_{x_i=q_i=1} c_i, \qquad c_i = \log \frac{p_i\,(1-r_i)}{r_i\,(1-p_i)}$$

The $c_i$ are log odds ratios; they function as the term weights in this model. So, how do we compute the $c_i$'s from our data?

Binary Independence Model
Estimating RSV coefficients in theory: for each term $i$, look at this table of document counts:

    Documents    Relevant    Non-Relevant     Total
    x_i = 1      s           n - s            n
    x_i = 0      S - s       N - n - S + s    N - n
    Total        S           N - S            N

Estimates: $p_i \approx \dfrac{s}{S}$, $r_i \approx \dfrac{n-s}{N-S}$, so

$$c_i \approx K(N, n, S, s) = \log \frac{s\,(N - n - S + s)}{(n - s)(S - s)}$$

For now, assume no zero terms.
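A sketch of this estimate as code. The slide assumes no zero cells; the `smooth` parameter is an addition beyond the slide (setting it to 0.5 is a common way to avoid zeros, but that variant is not shown here):

```python
import math

def c_i(s: int, n: int, S: int, N: int, smooth: float = 0.0) -> float:
    """K(N, n, S, s): log odds ratio term weight estimated from document counts.

    s: relevant docs containing the term, n: docs containing the term,
    S: relevant docs in total, N: docs in total.
    """
    a = s + smooth                  # relevant, term present
    b = (n - s) + smooth            # non-relevant, term present
    c = (S - s) + smooth            # relevant, term absent
    d = (N - n - S + s) + smooth    # non-relevant, term absent
    return math.log((a * d) / (b * c))

# N=1000 docs, S=10 relevant; the term occurs in n=100 docs, s=8 of them relevant.
print(c_i(s=8, n=100, S=10, N=1000))  # log(7200/184) ~ 3.67
```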

Estimation: key challenge
If non-relevant documents are approximated by the whole collection, then $r_i$ (the probability of occurrence in non-relevant documents for the query) is $n/N$, and

$$\log \frac{1-r_i}{r_i} = \log \frac{N - n - S + s}{n - s} \approx \log \frac{N-n}{n} \approx \log \frac{N}{n} = IDF!$$

Estimation: key challenge
$p_i$ (the probability of occurrence in relevant documents) cannot be approximated as easily. It can be estimated in various ways:
- from relevant documents, if we know some; relevance weighting can be used in a feedback loop
- as a constant (Croft and Harper combination match); then we just get df weighting of terms: with $p_i = 0.5$,
  $$RSV = \sum_{x_i=q_i=1} \log \frac{N}{n_i}$$
  (see the sketch after this list)
- as proportional to the probability of occurrence in the collection; Greiff (SIGIR 1998) argues for $p_i = 1/3 + 2/3 \cdot df_i/N$
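A sketch of the Croft and Harper simplification: with $p_i = 0.5$ the $p_i$ factors cancel in $c_i$, and with $r_i \approx n_i/N$ the term weight collapses to (roughly) IDF. The collection size and document frequencies below are assumed:

```python
import math

# Croft & Harper: p_i = 0.5 makes c_i = log((1-r_i)/r_i); with r_i ~ n_i/N this
# is log((N-n_i)/n_i) ~ log(N/n_i), i.e. IDF. The RSV is then a sum of IDF
# weights over the terms shared by document and query.
N = 100_000                                                     # assumed collection size
df = {"probabilistic": 500, "retrieval": 2_000, "the": 95_000}  # assumed doc freqs

def rsv_idf(doc_terms: set, query_terms: set) -> float:
    return sum(math.log(N / df[t]) for t in doc_terms & query_terms if t in df)

print(rsv_idf({"probabilistic", "retrieval", "the"}, {"probabilistic", "retrieval"}))
# log(200) + log(50) ~ 9.21; rare terms contribute far more than common ones
```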