Chapter 4: Advanced IR Models

1 Chapter 4: Advanced IR Models
4.1 Probabilistic IR
4.1.1 Principles
4.1.2 Probabilistic IR with Term Independence
4.1.3 Probabilistic IR with 2-Poisson Model (Okapi BM25)
IRDM WS

2 4.1.1 Probabilistic Retrieval: Principles (Robertson and Sparck Jones 1976)
Goal: Ranking based on
sim(doc d, query q) = $P[R \mid d]$ = P[doc d is relevant for query q | d has term vector $X_1, \ldots, X_m$]
Assumptions:
- Relevant and irrelevant documents differ in their terms.
- Binary Independence Retrieval (BIR) Model: Probabilities for term occurrence are pairwise independent for different terms.
- Term weights are binary, $\in \{0,1\}$.
- For terms that do not occur in query q, the probabilities for such a term occurring are the same for relevant and irrelevant documents.

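3 Probabilistic IR with Term Independence: Ranking Proportional to Relevance Odds
\[ sim(d,q) = O(R \mid d) = \frac{P[R \mid d]}{P[\neg R \mid d]} \quad \text{(odds for relevance)} \]
\[ = \frac{P[d \mid R]\, P[R]}{P[d \mid \neg R]\, P[\neg R]} \quad \text{(Bayes' theorem)} \]
\[ \sim \frac{P[d \mid R]}{P[d \mid \neg R]} = \prod_{i \in q} \frac{P[X_i \mid R]}{P[X_i \mid \neg R]} \quad \text{(independence or linked dependence)} \]
\[ sim(d,q)' = \sum_{i \in q} \log P[X_i \mid R] - \sum_{i \in q} \log P[X_i \mid \neg R] \]
($X_i = 1$ if d includes the i-th term, 0 otherwise)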

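4 Probabilistic Retrieval: Ranking Proportional to Relevance Odds (cont.)
\[ sim(d,q)' = \sum_{i \in q} \log\left(p_i^{X_i}(1-p_i)^{1-X_i}\right) - \sum_{i \in q} \log\left(q_i^{X_i}(1-q_i)^{1-X_i}\right) \quad \text{(binary features)} \]
with estimators $p_i = P[X_i = 1 \mid R]$ and $q_i = P[X_i = 1 \mid \neg R]$
\[ = \sum_{i \in q} X_i \log\frac{p_i}{1-p_i} + \sum_{i \in q} X_i \log\frac{1-q_i}{q_i} + \sum_{i \in q} \log\frac{1-p_i}{1-q_i} \]
\[ \sim \sum_{i \in q} X_i \log\frac{p_i}{1-p_i} + \sum_{i \in q} X_i \log\frac{1-q_i}{q_i} =: sim(d,q)'' \]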
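
As a rough illustration of the final expression sim(d, q)'', the sketch below sums the two log-odds terms over the query terms present in a document. The function name `bir_score` and the example probabilities are hypothetical, and natural logarithms are an assumption, since the slides leave the base open.

```python
import math

def bir_score(x, p, q):
    """sim(d, q)'' for binary term indicators x and per-term estimates p, q.

    x[i] is 1 if the document contains query term i, else 0;
    p[i] ~ P[X_i = 1 | R], q[i] ~ P[X_i = 1 | not R], both strictly in (0, 1).
    """
    score = 0.0
    for x_i, p_i, q_i in zip(x, p, q):
        if x_i:  # absent terms contribute nothing to sim''
            score += math.log(p_i / (1.0 - p_i)) + math.log((1.0 - q_i) / q_i)
    return score

# Hypothetical estimates for a three-term query.
p = [0.8, 0.5, 0.3]
q = [0.2, 0.5, 0.1]
print(bir_score([1, 0, 1], p, q))
```

The document-independent sum over log((1 - p_i) / (1 - q_i)) is left out on purpose: it shifts every document's score by the same amount and so does not change the ranking.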

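5 Probabilistic Retrieval: Robertson / Sparck Jones Formula
Estimate $p_i$ and $q_i$ based on a training sample (query q on a small sample of the corpus) or based on intellectual assessment of the first round's results (relevance feedback):
Let N be the # docs in the sample, R the # relevant docs in the sample, $n_i$ the # docs in the sample that contain term i, and $r_i$ the # relevant docs in the sample that contain term i.
Estimate:
\[ p_i = \frac{r_i}{R}, \qquad q_i = \frac{n_i - r_i}{N - R} \]
or (Lidstone smoothing with $\lambda = 0.5$):
\[ p_i = \frac{r_i + 0.5}{R + 1}, \qquad q_i = \frac{n_i - r_i + 0.5}{N - R + 1} \]
\[ sim(d,q)'' = \sum_{i \in q} X_i \log\frac{r_i + 0.5}{R - r_i + 0.5} + \sum_{i \in q} X_i \log\frac{N - n_i - R + r_i + 0.5}{n_i - r_i + 0.5} \]
Weight of term i in doc d:
\[ \log\frac{(r_i + 0.5)\,(N - n_i - R + r_i + 0.5)}{(R - r_i + 0.5)\,(n_i - r_i + 0.5)} \]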
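
The term weight above can be sketched as a small function of the four sample counts. The name `rsj_weight` is hypothetical, natural logarithms are assumed, and the function does not validate that the counts are mutually consistent.

```python
import math

def rsj_weight(r_i, n_i, R, N):
    """Robertson/Sparck Jones weight of term i with 0.5 smoothing.

    r_i: relevant sample docs containing term i
    n_i: sample docs containing term i
    R:   relevant docs in the sample
    N:   docs in the sample
    """
    numerator = (r_i + 0.5) * (N - n_i - R + r_i + 0.5)
    denominator = (R - r_i + 0.5) * (n_i - r_i + 0.5)
    return math.log(numerator / denominator)

# Hypothetical counts: the term occurs in 8 of 10 relevant docs
# and in 20 of 100 docs overall.
print(rsj_weight(r_i=8, n_i=20, R=10, N=100))
```

The 0.5 terms keep the logarithm finite when a count is zero, for example when a term occurs in no relevant document of the sample.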

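6 Probabilistic Retrieval: tf*idf Formula
Assumptions (without training sample or relevance feedback):
- $p_i$ is the same for all i.
- Most documents are irrelevant.
- Each individual term i is infrequent.
This implies:
\[ \sum_{i \in q} X_i \log\frac{p_i}{1-p_i} = c \sum_{i \in q} X_i \quad \text{with constant } c \]
\[ q_i = P[X_i = 1 \mid \neg R] \approx \frac{df_i}{N} \]
\[ \frac{1-q_i}{q_i} = \frac{N - df_i}{df_i} \approx \frac{N}{df_i} \]
\[ sim(d,q)'' = \sum_{i \in q} X_i \log\frac{p_i}{1-p_i} + \sum_{i \in q} X_i \log\frac{1-q_i}{q_i} \approx c \sum_{i \in q} X_i + \sum_{i \in q} X_i \cdot idf_i \]
Scalar product over the product of tf and dampened idf values for query terms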
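
To see how mild the approximation $(N - df_i)/df_i \approx N/df_i$ is for an infrequent term, consider the hypothetical values $N = 10^6$ and $df_i = 10^3$ (natural logarithms assumed):

```latex
\[
\log\frac{N - df_i}{df_i} = \log 999 \approx 6.9068,
\qquad
\log\frac{N}{df_i} = \log 1000 \approx 6.9078
\]
```

The two values differ by about 0.001 here. The gap widens as $df_i$ approaches N, which is why the assumption that each individual term is infrequent matters.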

7 Example for Probabilistic Retrieval
Documents with relevance feedback:

      t1    t2    t3    t4    t5    t6
d1    …
d2    …
d3    …
d4    …
n_i   2     1     2     3     2     0
r_i   2     1     1     2     1     0
p_i   5/6   1/2   1/2   5/6   1/2   1/6
q_i   1/6   1/6   1/2   1/2   1/2   1/6

q: t1 t2 t3 t4 t5 t6
R = 2, N = 4
Score of new document d5 (with Lidstone smoothing with $\lambda = 0.5$):
d5 ∩ q: <1 1 0 0 0 1>
sim(d5, q) = log 5 + log 1 + log 0.2 + log 5 + log 5 + log 5
using
\[ sim(d,q)'' = \sum_{i \in q} X_i \log\frac{p_i}{1-p_i} + \sum_{i \in q} X_i \log\frac{1-q_i}{q_i} \]
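
The example can be re-checked numerically with a short sketch. Two inputs are assumptions rather than values quoted from the slide: the counts `n` and `r` are the ones that reproduce the listed $p_i$ and $q_i$ under the smoothed estimators with R = 2 and N = 4, and the indicator vector `x` assumes that d5 contains t1, t2 and t6, which is the choice consistent with the six logarithm terms in the score. Natural logarithms are assumed.

```python
import math
from fractions import Fraction

R, N = 2, 4                      # relevant docs and total docs in the sample
n = [2, 1, 2, 3, 2, 0]           # assumed: docs containing t1..t6
r = [2, 1, 1, 2, 1, 0]           # assumed: relevant docs containing t1..t6
half = Fraction(1, 2)

# Smoothed estimators: p_i = (r_i + 0.5) / (R + 1), q_i = (n_i - r_i + 0.5) / (N - R + 1)
p = [(r_i + half) / (R + 1) for r_i in r]
q = [(n_i - r_i + half) / (N - R + 1) for n_i, r_i in zip(n, r)]

x = [1, 1, 0, 0, 0, 1]           # assumed: d5 restricted to the query terms

score = sum(
    math.log(p_i / (1 - p_i)) + math.log((1 - q_i) / q_i)
    for x_i, p_i, q_i in zip(x, p, q)
    if x_i
)
print([str(v) for v in p])       # estimates as exact fractions
print([str(v) for v in q])
print(score, 3 * math.log(5))    # log 1 vanishes and log 0.2 cancels one log 5
```

Using exact fractions for the estimates makes it easy to compare them with the table before any logarithm is taken.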

8 Laplace Smoothing (with Uniform Prior)
Probabilities $p_i$ and $q_i$ for term i are estimated by MLE for a binomial distribution (repeated coin tosses for relevant docs, showing term i with probability $p_i$; repeated coin tosses for irrelevant docs, showing term i with probability $q_i$).
To avoid overfitting to feedback/training, the estimates should be smoothed (e.g. with a uniform prior):
Instead of estimating $p_i = k/n$, estimate (Laplace's law of succession): $p_i = (k+1)/(n+2)$
or with heuristic generalization (Lidstone's law of succession): $p_i = (k+\lambda)/(n+2\lambda)$ with $\lambda > 0$ (e.g. $\lambda = 0.5$)
And for a multinomial distribution (n rolls of a w-faceted die) estimate: $p_i = (k_i + 1)/(n + w)$
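
The three estimators can be folded into one hypothetical helper, sketched below. Treating the multinomial case as (k + lam) / (n + w * lam) for arbitrary lam is a common generalization and an assumption here; the slide itself only states the multinomial case with pseudo-count 1.

```python
def lidstone(k, n, lam=0.5, w=2):
    """Smoothed estimate (k + lam) / (n + w * lam).

    k:   observed successes (e.g. relevant docs containing the term)
    n:   number of trials (e.g. relevant docs)
    lam: pseudo-count; lam=1 gives Laplace's law of succession
    w:   number of outcomes; 2 for the binomial case, more for a w-faceted die
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return (k + lam) / (n + w * lam)

print(lidstone(0, 2))                # Lidstone with lam = 0.5
print(lidstone(2, 2, lam=1))         # Laplace, binomial case
print(lidstone(3, 10, lam=1, w=6))   # Laplace, six-sided die
```

Unlike the unsmoothed estimate k/n, the result never reaches 0 or 1, so log-odds computed from it stay finite.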

9 BM25: Motivations
- Estimates for term probabilistic weights based on assumptions on the tf
- Estimates about the relevance of a term based on the notion of eliteness of terms
- Assumptions about the relationships between eliteness and document length

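10 Okapi BM25
Approximation of the Poisson model by a similarly-shaped function:
\[ w := \log\frac{p(1-q)}{q(1-p)} \cdot \frac{tf}{k_1 + tf} \]
finally leads to Okapi BM25 (which achieved best TREC results):
\[ w_j(d) := \frac{(k_1 + 1)\, tf_j}{k_1\left((1-b) + b\,\frac{length(d)}{avgdoclength}\right) + tf_j} \cdot \log\frac{N - df_j + 0.5}{df_j + 0.5} \]
or in the most comprehensive, tunable form:
\[ score(d,q) := \sum_{j=1}^{|q|} \log\frac{N - df_j + 0.5}{df_j + 0.5} \cdot \frac{(k_1 + 1)\, tf_j}{k_1\left((1-b) + b\,\frac{len(d)}{\Delta}\right) + tf_j} \cdot \frac{(k_3 + 1)\, qtf_j}{k_3 + qtf_j} + k_2\, |q|\, \frac{\Delta - len(d)}{\Delta + len(d)} \]
with $\Delta$ = avgdoclength and tuning parameters $k_1$, $k_2$, $k_3$, b, and non-linear influence of tf and consideration of doc length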
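
A minimal sketch of the simpler BM25 form may make the roles of the parameters concrete. The function names, the dictionary-based inputs and the defaults k1 = 1.2 and b = 0.75 are assumptions (the defaults are commonly used values, not taken from the slides); the k2 and k3 components of the full form are omitted, and natural logarithms are assumed.

```python
import math

def bm25_term_weight(tf, df, N, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Weight of one term in one document (simple Okapi BM25 form)."""
    idf = math.log((N - df + 0.5) / (df + 0.5))  # negative when df > N / 2
    norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return idf * (k1 + 1.0) * tf / (norm + tf)

def bm25_score(query_terms, doc_tf, df, N, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Sum of term weights over the query terms that occur in the document."""
    return sum(
        bm25_term_weight(doc_tf[t], df[t], N, doc_len, avg_doc_len, k1, b)
        for t in query_terms
        if doc_tf.get(t, 0) > 0
    )

# Hypothetical collection statistics and document.
df = {"okapi": 10, "retrieval": 200}
doc_tf = {"okapi": 3, "retrieval": 1}
print(bm25_score(["okapi", "retrieval", "poisson"], doc_tf, df,
                 N=1000, doc_len=120, avg_doc_len=100))
```

The factor (k1 + 1) * tf / (norm + tf) grows with tf but stays below k1 + 1, which is the non-linear influence of tf mentioned above; b controls how strongly documents longer than average are penalized.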

11 Eliteness in BM25

17 Poisson Mixtures for Capturing tf Distribution
Katz's K-mixture: distribution of tf values for term "said"
Source: Church/Gale 1995

18 Averaging Eliteness according to document length info

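22 4.1.3 Probabilistic IR with Poisson Model (Okapi BM25)
Generalize term weight
\[ w = \log\frac{p(1-q)}{q(1-p)} \]
into
\[ w = \log\frac{p_{tf}\, q_0}{q_{tf}\, p_0} \]
with $p_j$, $q_j$ denoting the prob. that the term occurs j times in a rel./irrel. doc
Postulate Poisson (or Poisson-mixture) distributions:
\[ p_{tf} = e^{-\lambda}\,\frac{\lambda^{tf}}{tf!}, \qquad q_{tf} = e^{-\mu}\,\frac{\mu^{tf}}{tf!} \]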
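
A short worked step may help here. Take a single Poisson distribution per class, with $p_{tf}$ and $q_{tf}$ the probabilities that the term occurs tf times in a relevant and an irrelevant document, and write $\lambda$ and $\mu$ for the two Poisson means (the symbol names are a notational assumption). The generalized weight then collapses to a function that is linear in tf:

```latex
\[
w = \log\frac{p_{tf}\, q_0}{q_{tf}\, p_0}
  = \log\frac{\left(e^{-\lambda}\lambda^{tf}/tf!\right) e^{-\mu}}
             {\left(e^{-\mu}\mu^{tf}/tf!\right) e^{-\lambda}}
  = \log\left(\frac{\lambda}{\mu}\right)^{tf}
  = tf \cdot \log\frac{\lambda}{\mu}
\]
```

A weight that grows without bound in tf is arguably one reason to move to Poisson mixtures, and to a saturating factor such as tf / (k_1 + tf) in BM25.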

23 Additional Literature
Probabilistic IR:
- Grossman/Frieder, Sections 2.2 and 2.4
- S.E. Robertson, K. Sparck Jones: Relevance Weighting of Search Terms, JASIS 27(3), 1976
- S.E. Robertson, S. Walker: Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval, SIGIR 1994
- K.W. Church, W.A. Gale: Poisson Mixtures, Natural Language Engineering 1(2), 1995
- C.T. Yu, W. Meng: Principles of Database Query Processing for Advanced Applications, Morgan Kaufmann, 1997, Chapter 9
- D. Heckerman: A Tutorial on Learning with Bayesian Networks, Technical Report MSR-TR-95-06, Microsoft Research, 1995


More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search Christopher Manning and Prabhakar Raghavan Lecture 6: Scoring, Term Weighting and the Vector Space Model This lecture;

More information

Modeling the Score Distributions of Relevant and Non-relevant Documents

Modeling the Score Distributions of Relevant and Non-relevant Documents Modeling the Score Distributions of Relevant and Non-relevant Documents Evangelos Kanoulas, Virgil Pavlu, Keshi Dai, and Javed A. Aslam College of Computer and Information Science Northeastern University,

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Naïve Bayes. Vibhav Gogate The University of Texas at Dallas

Naïve Bayes. Vibhav Gogate The University of Texas at Dallas Naïve Bayes Vibhav Gogate The University of Texas at Dallas Supervised Learning of Classifiers Find f Given: Training set {(x i, y i ) i = 1 n} Find: A good approximation to f : X Y Examples: what are

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 89 Part II

More information

Variable Latent Semantic Indexing

Variable Latent Semantic Indexing Variable Latent Semantic Indexing Prabhakar Raghavan Yahoo! Research Sunnyvale, CA November 2005 Joint work with A. Dasgupta, R. Kumar, A. Tomkins. Yahoo! Research. Outline 1 Introduction 2 Background

More information

Lecture 8: December 17, 2003

Lecture 8: December 17, 2003 Computational Genomics Fall Semester, 2003 Lecture 8: December 17, 2003 Lecturer: Irit Gat-Viks Scribe: Tal Peled and David Burstein 8.1 Exploiting Independence Property 8.1.1 Introduction Our goal is

More information

CS 646 (Fall 2016) Homework 3

CS 646 (Fall 2016) Homework 3 CS 646 (Fall 2016) Homework 3 Deadline: 11:59pm, Oct 31st, 2016 (EST) Access the following resources before you start working on HW3: Download and uncompress the index file and other data from Moodle.

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Computational Cognitive Science

Computational Cognitive Science Computational Cognitive Science Lecture 9: A Bayesian model of concept learning Chris Lucas School of Informatics University of Edinburgh October 16, 218 Reading Rules and Similarity in Concept Learning

More information

Protein Complex Identification by Supervised Graph Clustering

Protein Complex Identification by Supervised Graph Clustering Protein Complex Identification by Supervised Graph Clustering Yanjun Qi 1, Fernanda Balem 2, Christos Faloutsos 1, Judith Klein- Seetharaman 1,2, Ziv Bar-Joseph 1 1 School of Computer Science, Carnegie

More information

CSE 473: Artificial Intelligence Autumn Topics

CSE 473: Artificial Intelligence Autumn Topics CSE 473: Artificial Intelligence Autumn 2014 Bayesian Networks Learning II Dan Weld Slides adapted from Jack Breese, Dan Klein, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 473 Topics

More information

Lecture 2: IR Models. Johan Bollen Old Dominion University Department of Computer Science

Lecture 2: IR Models. Johan Bollen Old Dominion University Department of Computer Science Lecture 2: IR Models. Johan Bollen Old Dominion University Department of Computer Science http://www.cs.odu.edu/ jbollen January 30, 2003 Page 1 Structure 1. IR formal characterization (a) Mathematical

More information

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

Statistical Theory 1

Statistical Theory 1 Statistical Theory 1 Set Theory and Probability Paolo Bautista September 12, 2017 Set Theory We start by defining terms in Set Theory which will be used in the following sections. Definition 1 A set is

More information