# Ranked Retrieval (2)


## Lecture Overview

Text Technologies for Data Science (INFR11145), Ranked Retrieval (2). Instructor: Walid Magdy, 31-Oct-2017.

Lecture objectives:

- Learn about probabilistic models: BM25
- Learn about language models (LM) for IR

## Recall: VSM & TF-IDF Term Weighting

TF-IDF combines term frequency and inverse document frequency to weight terms:

$$w_{t,d} = (1 + \log_{10} tf_{t,d}) \cdot \log_{10}\left(\frac{N}{df_t}\right)$$

For a query $q$ and document $d$, the retrieval score $f(q,d)$ is:

$$Score(q, d) = \sum_{t \in q \cap d} w_{t,d}$$

TF-IDF observations:

- A term appearing more often in a document gets a higher weight (TF).
- The first occurrence matters most (the log dampens repeats).
- Rare terms are more important (IDF).
- The score is biased towards longer documents.

## IR Models

VSM is very heuristic in nature:

- It has no explicit notion of relevance (yet it still works well).
- Any weighting scheme or similarity measure can be used.
- Its components are not interpretable, so there is no guide for what to try next.
- It is more engineering than theory: tweak, run, observe, tweak.
- It is very popular, hard to beat, and a strong baseline; good ideas from other models are easy to adapt.

The probabilistic model of retrieval, in contrast:

- Gives a mathematical formulation for the relevant / irrelevant sets.
- Explicitly defines random variables ($R$, $Q$, $D$) and is specific about what their values are.
- States the assumptions behind each step, watching out for contradictions.
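The TF-IDF weighting and score above can be sketched in a few lines of Python (a minimal illustration with my own function names, not the lecture's reference implementation):

```python
import math

def tfidf_weight(tf, df, n_docs):
    """w_{t,d} = (1 + log10 tf_{t,d}) * log10(N / df_t); 0 if the term is absent."""
    if tf == 0 or df == 0:
        return 0.0
    return (1 + math.log10(tf)) * math.log10(n_docs / df)

def score(query_terms, doc_tf, df, n_docs):
    """Score(q, d): sum of w_{t,d} over terms occurring in both query and document."""
    return sum(tfidf_weight(doc_tf.get(t, 0), df.get(t, 0), n_docs)
               for t in query_terms if t in doc_tf)
```

For example, with $N = 1000$ documents, a term occurring once in the document and in 10 documents overall gets weight $(1 + \log_{10} 1) \cdot \log_{10} 100 = 2$.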

## Probabilistic Models

Concept: uncertainty is an inherent part of the IR process, and probability theory is a strong foundation for representing and manipulating uncertainty.

## Probability Ranking Principle (1977)

> "If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data."

The PRP is the basis for most probabilistic approaches to IR.

## Formulation of the PRP

Rank documents by probability of relevance:

$$P(R \mid D_{r_1}) > P(R \mid D_{r_2}) > P(R \mid D_{r_3}) > P(R \mid D_{r_4}) > \dots$$

- Estimate the probability as accurately as possible: $P_{est}(R \mid D) \approx P_{true}(R \mid D)$
- Estimate with all possibly available data: $P_{est}(R \mid doc, session, context, user\ profile, \dots)$
- The best possible accuracy achievable with that data would be the perfect IR system. Is it really doable? How do we estimate the probability of relevance?

## PRP Concept

Imagine IR as a classification problem:

$$P(R \mid D) + P(NR \mid D) = 1$$

Document $D$ is relevant if $P(R \mid D) > P(NR \mid D)$.

## Probability of Relevance

What is $P_{true}(rel \mid doc, query, session, user, \dots)$?

- Isn't relevance just the user's opinion? The user decides what is relevant, so where does probability come in?
- A search algorithm cannot look into your head (yet!); relevance depends on factors the algorithm cannot observe. (See the SIGIR 2016 best paper award winner: "Understanding Information Need: an fMRI Study".)
- Different users may disagree on the relevance of the same document — even similar users, doing the same task, in the same context.

$P_{true}(rel \mid Q, D)$ is the proportion of all unseen users / contexts / tasks for which $D$ would be judged relevant to $Q$ — similar to $P(\text{die}=6 \mid \text{even and not square})$.

## Okapi BM25 Model

- Based on the probabilistic model: a document $D$ is relevant if $P(R{=}1 \mid D) > P(R{=}0 \mid D)$.
- An extension of the binary independence model:
  - Binary features: a document is represented by a vector of binary features indicating term occurrence.
  - Terms are assumed independent (the Naïve Bayes assumption) — the BOW trick.
- In 1995, Stephen Robertson and his group came up with the BM25 formula as part of the Okapi project. It outperformed all other systems in TREC.
- BM25 remains a popular and effective ranking algorithm.

## Okapi BM25 Ranking Function

Let $L_d$ be the number of terms in document $d$, and $\bar{L}$ the average number of terms per document:

$$w_{t,d} = \frac{tf_{t,d}}{k \cdot \frac{L_d}{\bar{L}} + tf_{t,d}} \cdot \log_{10}\left(\frac{N - df_t}{df_t}\right)$$

Best practice: $k = 1.5$, giving

$$w_{t,d} = \frac{tf_{t,d}}{1.5 \cdot \frac{L_d}{\bar{L}} + tf_{t,d}} \cdot \log_{10}\left(\frac{N - df_t}{df_t}\right)$$

The first factor is the TF component: Okapi TF, length-normalised by $L_d/\bar{L}$ rather than the raw TF. The second factor is the IDF component: classic Okapi IDF rather than the raw DF.
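The per-term BM25 weight as given on the slide can be sketched directly (Python, my own function name; note that widely used BM25 variants add further parameters and +0.5 corrections to the IDF, which this lecture's simplified form omits):

```python
import math

def bm25_weight(tf, df, doc_len, avg_len, n_docs, k=1.5):
    """Simplified Okapi BM25 term weight from the slide:
    w = tf / (k * (L_d / L_avg) + tf) * log10((N - df) / df)."""
    if tf == 0 or df == 0:
        return 0.0
    tf_part = tf / (k * (doc_len / avg_len) + tf)   # saturating, length-normalised TF
    idf_part = math.log10((n_docs - df) / df)       # classic Okapi IDF
    return tf_part * idf_part
```

Unlike raw TF, the TF component saturates: doubling a term's frequency less than doubles its weight, and longer-than-average documents are penalised.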

## Probabilistic Models in IR

- They focus on the probability of relevance of documents, and can be derived mathematically.
- There are different ways to apply the idea; BM25 is the most common formula.
- What other models could still be used in IR?

## Noisy-Channel Model of IR

A user has an information need and writes down a query. The machine's task: given the query, guess which document in the collection $d_1, d_2, \dots, d_n$ matches it.

## IR Based on Language Models (LM)

The user thinks of a relevant document and writes down query words likely to appear in that document. Each document $d_i$ in the collection induces a model $M_{d_i}$ that generates the query with some probability $P(q \mid M_{d_i})$. The LM approach directly exploits this idea: a document is a good match to a query if the document's model is likely to generate the query.

## Concept

How do users come up with good queries? They think of words that would likely appear in a relevant document and use those words as the query. The language modeling approach to IR directly models that idea: a document is a good match to a query if the document model is likely to generate the query, which happens if the document contains the query words often.

- Build a probabilistic language model $M_d$ from each document $d$.
- Rank documents by the probability of the model generating the query: $P(q \mid M_d)$.

## Language Model (LM)

A language model is a probability distribution over strings drawn from some vocabulary. A topic in a document or query can be represented as a language model: words that tend to occur often when discussing the topic will have high probabilities in the corresponding model.

## Unigram LM

Terms are randomly drawn from a document, with replacement. For example, drawing four query terms that occur 4, 2, 4 and 3 times in a 9-term document:

$$P(q \mid M_d) = \frac{4}{9} \cdot \frac{2}{9} \cdot \frac{4}{9} \cdot \frac{3}{9}$$
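The drawing-with-replacement view amounts to a maximum-likelihood unigram model. A minimal sketch (Python, hypothetical function names and document tokens), reproducing the 9-term example with term counts 4, 2 and 3:

```python
from collections import Counter

def unigram_lm(doc_tokens):
    """MLE unigram model: P(t | M_d) = tf_{t,d} / |d|."""
    total = len(doc_tokens)
    return {t: c / total for t, c in Counter(doc_tokens).items()}

def p_query(query_tokens, lm):
    """Probability of drawing the query terms from the model, with replacement."""
    p = 1.0
    for t in query_tokens:
        p *= lm.get(t, 0.0)   # unseen terms get probability 0 (see smoothing later)
    return p

# 9-term document: three distinct terms occurring 4, 2 and 3 times
lm = unigram_lm(["frog"] * 4 + ["toad"] * 2 + ["likes"] * 3)
p = p_query(["frog", "toad", "frog", "likes"], lm)  # (4/9)*(2/9)*(4/9)*(3/9)
```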

## Example

A unigram language model is a one-state probabilistic finite-state automaton. For the string $S$ = "frog said that toad likes frog STOP", $P(S)$ is the product of the probabilities of each emitted word, including the STOP symbol.

## Comparing LMs

Let $M_{d_1}$ be the LM generated from document 1 and $M_{d_2}$ the LM generated from document 2, each assigning probabilities to words such as *the* (0.2 under both models), *yon*, *class*, *maiden*, *sayst* and *pleaseth*. To compare them, try to generate the sentence $s$ = "the class pleaseth yon maiden" from each model by multiplying the per-word probabilities. Here $P(s \mid M_{d_2}) > P(s \mid M_{d_1})$, so $d_2$ is ranked above $d_1$.

## Stochastic Language Models

A statistical model $M$ for generating text defines a probability distribution over strings in a given language: the probability of a string is the product of the probabilities of its words under $M$.

## Unigram and Higher-Order LMs

- Unigram LM: $P(w_1 w_2 w_3 w_4) = P(w_1)\,P(w_2)\,P(w_3)\,P(w_4)$
- Bigram (generally, n-gram) LM: $P(w_1 w_2 w_3 w_4) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_2)\,P(w_4 \mid w_3)$
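A bigram model conditions each word on its predecessor. Sketched with MLE counts (Python, my own names and toy corpus):

```python
from collections import Counter

def bigram_prob(sentence, corpus):
    """P(w1..wn) = P(w1) * prod_i P(w_i | w_{i-1}), MLE-estimated from the corpus."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    p = unigrams[sentence[0]] / len(corpus)
    for prev, cur in zip(sentence, sentence[1:]):
        if unigrams[prev] == 0:
            return 0.0                      # unseen history
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

corpus = "the frog likes the toad".split()
p = bigram_prob(["the", "frog"], corpus)   # P(the) * P(frog | the) = (2/5) * (1/2)
```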

## LM in IR

Each document is treated as the basis for a language model. Given a query $q$, rank documents by $P(d \mid q)$:

$$P(d \mid q) = \frac{P(q \mid d)\,P(d)}{P(q)}$$

- $P(q)$ is the same for all documents, so it can be ignored.
- $P(d)$ is the prior, often treated as the same for all $d$. We could, however, give a higher prior to high-quality documents, e.g. those with high PageRank (to be discussed later).
- $P(q \mid d)$ is the probability of $q$ given $d$.

So, to rank documents according to relevance to $q$, ranking by $P(q \mid d)$ and by $P(d \mid q)$ is equivalent.

## LM in IR: Basic Idea

We attempt to model the query-generation process. We then rank documents by the probability that the query would be observed as a random sample from the respective document model. That is, we rank according to $P(q \mid d)$.

## Query Likelihood Model: $P(q \mid d)$

We make the conditional independence assumption, with $|q|$ the length of $q$ and $t_k$ the token occurring at position $k$ in $q$:

$$P(q \mid M_d) = P(t_1, \dots, t_{|q|} \mid M_d) = \prod_{1 \le k \le |q|} P(t_k \mid M_d)$$

This is equivalent to the multinomial model (omitting the constant factor), where $tf_{t,q}$ is the term frequency (number of occurrences) of $t$ in $q$:

$$P(q \mid M_d) = \prod_{t \in q} P(t \mid M_d)^{tf_{t,q}}$$

## Parameter Estimation

The probability of a term $t$ in the model $M_d$ using Maximum Likelihood Estimation (MLE), with $|d|$ the length of $d$ and $tf_{t,d}$ the number of occurrences of $t$ in $d$:

$$P(t \mid M_d) = \frac{tf_{t,d}}{|d|}$$

So the probability of observing query $q$ under $M_d$ is:

$$P(q \mid M_d) = \prod_{t \in q} \left(\frac{tf_{t,d}}{|d|}\right)^{tf_{t,q}}$$

## Example: The Zero-Frequency Problem

For a 9-term document, a query whose terms occur 4, 2 and 3 times (the first of them appearing twice in the query) scores

$$P(q \mid M_d) = \left(\frac{4}{9}\right)^2 \cdot \frac{2}{9} \cdot \frac{3}{9}$$

but a query containing one term that does not occur in the document scores

$$P(q \mid M_d) = \frac{4}{9} \cdot \frac{2}{9} \cdot \frac{0}{9} \cdot \frac{3}{9} = 0$$

Is that fair?

- In VSM, $Score(q, d)$ was a summation, which works more like OR in Boolean search: missing one term only reduces the score.
- In the language model, $Score(q, d)$ is $P(q \mid d)$, a multiplication of probabilities: missing one term makes the score 0.

Is there a better way to handle unseen terms?

## Smoothing

Problem: zero frequency. Solution: smooth the term probabilities $P(t)$. The Maximum Likelihood Estimate

$$p_{ML}(t) = \frac{\text{count of } t}{\text{count of all words}}$$

is replaced by a smoothed probability distribution that reserves some mass for unseen terms.
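The zero-frequency behaviour is easy to reproduce (Python sketch with hypothetical names and a toy document):

```python
from collections import Counter

def mle_query_likelihood(query, doc):
    """P(q | M_d) with unsmoothed MLE estimates tf_{t,d} / |d|."""
    counts = Counter(doc)
    p = 1.0
    for t in query:
        p *= counts[t] / len(doc)   # a term absent from doc contributes 0
    return p

doc = "frog said that toad likes frog".split()
p_seen = mle_query_likelihood(["frog", "toad"], doc)   # (2/6) * (1/6)
p_unseen = mle_query_likelihood(["frog", "fly"], doc)  # one unseen term -> 0.0
```

A single unseen query term zeroes the whole product, however well the document matches the rest of the query — the motivation for smoothing.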

## Smoothing (cont.)

Document texts are a sample from the language model, so missing words should not have zero probability of occurring: a missing term is possible (even though it didn't occur), but no more likely than would be expected by chance in the collection. Smoothing is a technique for estimating probabilities for missing (or unseen) words, and it overcomes the data-sparsity problem:

- Lower (discount) the probability estimates for words that are seen in the document text.
- Assign that left-over probability to the estimates for words not seen in the text (and adjust the seen ones as well).

## Mixture Model

$$P(t \mid d) = \lambda P(t \mid M_d) + (1 - \lambda) P(t \mid M_c)$$

This mixes the probability from the document with the general collection frequency of the word:

- The estimate for unseen words is $(1-\lambda)\,P(t \mid M_c)$, based on the collection language model (background LM); $P(t \mid M_c)$ is the probability of the query word in the language model of collection $C$ (the background probability).
- $\lambda$ is a parameter controlling the probability mass given to unseen words.
- The estimate for observed words is $\lambda P(t \mid M_d) + (1-\lambda) P(t \mid M_c)$.

## Jelinek-Mercer Smoothing

$$P(t \mid d) = \lambda P(t \mid M_d) + (1 - \lambda) P(t \mid M_c)$$

- A high value of $\lambda$ makes the search conjunctive-like: it tends to retrieve documents containing all query words.
- A low value of $\lambda$ is more disjunctive, suitable for long queries.
- Correctly setting $\lambda$ is important for good performance.

Final ranking function:

$$P(q \mid M_d) \propto \prod_{1 \le k \le |q|} \left[\lambda\, P(t_k \mid M_d) + (1 - \lambda)\, P(t_k \mid M_c)\right]$$

## Example

Collection: $d_1$ and $d_2$.

- $d_1$: "Jackson was one of the most talented entertainers of all time"
- $d_2$: "Michael Jackson anointed himself King of Pop"
- Query $q$: "Michael Jackson"

Using the mixture model with $\lambda = 1/2$:

$$P(q \mid d_1) = \left[\frac{0/11 + 1/18}{2}\right] \cdot \left[\frac{1/11 + 2/18}{2}\right]$$

$$P(q \mid d_2) = \left[\frac{1/7 + 1/18}{2}\right] \cdot \left[\frac{1/7 + 2/18}{2}\right]$$

Ranking: $d_2 > d_1$.
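The mixture model and the Michael Jackson example can be checked in a few lines (Python sketch; tokens are lower-cased and the function name is my own):

```python
from collections import Counter

def jm_score(query, doc, collection, lam=0.5):
    """Query likelihood with Jelinek-Mercer smoothing:
    P(q|d) = prod over query terms of lam*P(t|M_d) + (1-lam)*P(t|M_c)."""
    d, c = Counter(doc), Counter(collection)
    p = 1.0
    for t in query:
        p *= lam * d[t] / len(doc) + (1 - lam) * c[t] / len(collection)
    return p

d1 = "jackson was one of the most talented entertainers of all time".split()
d2 = "michael jackson anointed himself king of pop".split()
collection = d1 + d2              # background model over all 18 tokens
q = ["michael", "jackson"]
# d1 never scores zero for "michael": the collection model supplies its mass
```

Because $d_2$ contains both query terms while $d_1$ gets "michael" only from the background model, $d_2$ outranks $d_1$, matching the slide's fractions.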

## Notes on the Query Likelihood Model

- It has effectiveness similar to BM25.
- With more sophisticated techniques (e.g. topic models), it outperforms BM25.
- There are several alternative smoothing techniques; Jelinek-Mercer was just one example.

## Summary

Three ways to model IR:

- **VSM**: how well does the query vector align with the document vector?
- **Probabilistic model**: what is the probability of relevance of document $D$ given query $Q$?
- **LM**: how likely is it to observe/generate the sequence of terms $Q$ from a language model of document $D$?

## Resources

- Text book 1: Introduction to IR, Chapter 12
- Text book 2: IR in Practice, Chapters 7.2, 7.3
- Readings:
  - Robertson, Stephen E., et al. "Okapi at TREC-3." NIST Special Publication SP 109 (1995): 109.
  - J. Ponte and W. B. Croft. "A language modeling approach to information retrieval." In Proceedings of the 21st Annual International ACM SIGIR Conference, 1998.


### Machine Learning, Fall 2012 Homework 2

0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0

### Modeling User Rating Profiles For Collaborative Filtering

Modeling User Rating Profiles For Collaborative Filtering Benjamin Marlin Department of Computer Science University of Toronto Toronto, ON, M5S 3H5, CANADA marlin@cs.toronto.edu Abstract In this paper

### Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data

Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data Nikita Zhiltsov 1,2 Alexander Kotov 3 Fedor Nikolaev 3 1 Kazan Federal University 2 Textocat 3 Textual Data Analytics

### Probability Theory for Machine Learning. Chris Cremer September 2015

Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares

### Evaluation Metrics. Jaime Arguello INLS 509: Information Retrieval March 25, Monday, March 25, 13

Evaluation Metrics Jaime Arguello INLS 509: Information Retrieval jarguell@email.unc.edu March 25, 2013 1 Batch Evaluation evaluation metrics At this point, we have a set of queries, with identified relevant

### Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata

Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata Wei Wu a, Hang Li b, Jun Xu b a Department of Probability and Statistics, Peking University b Microsoft Research

### Matrices, Vector Spaces, and Information Retrieval

Matrices, Vector Spaces, and Information Authors: M. W. Berry and Z. Drmac and E. R. Jessup SIAM 1999: Society for Industrial and Applied Mathematics Speaker: Mattia Parigiani 1 Introduction Large volumes

### Bayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction

15-0: Learning vs. Deduction Artificial Intelligence Programming Bayesian Learning Chris Brooks Department of Computer Science University of San Francisco So far, we ve seen two types of reasoning: Deductive

### The classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know

The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is

### Behavioral Data Mining. Lecture 7 Linear and Logistic Regression

Behavioral Data Mining Lecture 7 Linear and Logistic Regression Outline Linear Regression Regularization Logistic Regression Stochastic Gradient Fast Stochastic Methods Performance tips Linear Regression

### 5/21/17. Machine learning for IR ranking? Machine learning for IR ranking. Machine learning for IR ranking. Introduction to Information Retrieval

Sec. 15.4 Machine learning for I ranking? Introduction to Information etrieval CS276: Information etrieval and Web Search Christopher Manning and Pandu ayak Lecture 14: Learning to ank We ve looked at

### PageRank algorithm Hubs and Authorities. Data mining. Web Data Mining PageRank, Hubs and Authorities. University of Szeged.

Web Data Mining PageRank, University of Szeged Why ranking web pages is useful? We are starving for knowledge It earns Google a bunch of money. How? How does the Web looks like? Big strongly connected

### Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

### 2 The Information Retrieval data model

2 The Information Retrieval data model 2.1 Introduction: Data models 2.2 Text in BioSciences 2.3 Information Retrieval Models 2.3.1. Boolean Model 2.3.2 Vector space Model 2.3.3 Efficient implementation