Retrieval models II. IN4325 Information Retrieval


1 Retrieval models II IN4325 Information Retrieval 1

2 Assignment 1 Deadline Wednesday means: as long as it is Wednesday, you are still on time. In practice, any time before I make it to the office on Thursday (~8am) is okay. Amazon: #instances is still limited, so please keep to 2-3 instances per run with m1.xlarge (15GB memory, 8 EC2 compute units, the same as 8x m1.small). Between 2-6 jobs were running yesterday at any time. Hadoop is difficult to learn, but well worth it: on LinkedIn, do a job search on Hadoop or Big Data. 2

3 Assignment I

    public class CategoryPerPageMapper extends Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context) throws Exception {
            // these objects are re-created on every single call to map()
            SAXBuilder builder = new SAXBuilder();
            Pattern catPattern = Pattern.compile("\\[\\[[cC]ategory:.*?]]");
            Text emptyText = new Text("");
            try {
                String xml = value.toString();
                Reader in = new StringReader(xml);
                String content = "";
                Text title = new Text();
                Document doc = builder.build(in);
                // ... rest of the method truncated on the slide
            } catch (Exception e) {
                // catch block omitted on the slide
            }
        }
    }
3

4 Assignment I

    public class CategoryPerPageMapper extends Mapper<Object, Text, Text, Text> {
        // these objects are created only once, when the mapper is instantiated
        SAXBuilder builder = new SAXBuilder();
        Pattern catPattern = Pattern.compile("\\[\\[[cC]ategory:.*?]]");
        Text emptyText = new Text("");

        public void map(Object key, Text value, Context context) throws Exception {
            try {
                String xml = value.toString();
                Reader in = new StringReader(xml);
                String content = "";
                Text title = new Text();
                Document doc = builder.build(in);
                // ... rest of the method truncated on the slide
            } catch (Exception e) {
                // catch block omitted on the slide
            }
        }
    }
4

5 Assignment I Data-Intensive Text Processing with MapReduce by Jimmy Lin et al. Highly recommended reading: ~170 pages full of useful knowledge, with a lot of information about design patterns in MapReduce. Not a cookbook (no source code). 5

6 Last time Term weighting strategies Vector space model Pivoted document length normalization 6

7 Today A look at relevance feedback The cluster hypothesis A first look at probabilistic models Language modeling 7

8 Vector space model In one slide A classic information retrieval model ("A vector space model for automatic indexing", a classic paper by G. Salton et al. from 1975). Corpus: a set of vectors in a vector space with one dimension for each term. Term weights: tf-idf_{t,d} = tf_{t,d} x idf_t. S(Q,D): cosine similarity between query and document vector. Example: D_1 = {cat, eats}, D_2 = {dog}, D_3 = {dog, eats, cat}, D_4 = {cat, cat}. (Figure: the four documents as vectors in the space spanned by the axes cat, dog, eats.) 8
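
To make the scoring concrete, here is a minimal sketch of tf-idf weighting and cosine scoring over sparse vectors; the class, method names, and toy counts are illustrative assumptions, not code from the lecture.

    import java.util.HashMap;
    import java.util.Map;

    public class CosineScoring {

        // tf-idf weight: raw term frequency times log inverse document frequency
        static double tfIdf(int tf, int df, int numDocs) {
            if (tf == 0 || df == 0) return 0.0;
            return tf * Math.log((double) numDocs / df);
        }

        // cosine similarity between two sparse term-weight vectors
        static double cosine(Map<String, Double> q, Map<String, Double> d) {
            double dot = 0.0, qNorm = 0.0, dNorm = 0.0;
            for (Map.Entry<String, Double> e : q.entrySet()) {
                Double w = d.get(e.getKey());
                if (w != null) dot += e.getValue() * w;
                qNorm += e.getValue() * e.getValue();
            }
            for (double w : d.values()) dNorm += w * w;
            return dot == 0.0 ? 0.0 : dot / (Math.sqrt(qNorm) * Math.sqrt(dNorm));
        }

        public static void main(String[] args) {
            // toy corpus from the slide: N=4, df(cat)=3, df(eats)=2, df(dog)=2
            Map<String, Double> q = new HashMap<>();
            q.put("cat", tfIdf(1, 3, 4));
            q.put("eats", tfIdf(1, 2, 4));
            Map<String, Double> d3 = new HashMap<>();   // D_3 = {dog, eats, cat}
            d3.put("dog", tfIdf(1, 2, 4));
            d3.put("eats", tfIdf(1, 2, 4));
            d3.put("cat", tfIdf(1, 3, 4));
            System.out.println(cosine(q, d3));          // S(Q, D_3)
        }
    }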

9 The problem of synonymy Synonymy: different terms with the same meaning (a synset in WordNet). Users try to combat these situations by adding terms they see in the retrieved documents to the query. 9

10 Automatic query reformulation independent of the user/results Word sense disambiguation. Query expansion: additional query terms derived from a thesaurus or WordNet, e.g. King 1968 → (King 1968), (Martin Luther King), (Martin Luther King jr.). Query expansion through automatic thesaurus generation. Query reformulation by spelling correction: Martin Luter King → Martin Luther King. 10

11 Automatic query reformulation wrt. the user/results Given an initial set of results, relevance feedback can be Obtained explicitly/interactively (ask the user) +/- voting on the top results Inferred (observe the user) How long does the user spend on viewing document X? What is his eye movement on the result page? Assumed (observe the top results) Also called blind feedback, pseudo-relevance feedback The system takes the top returned results and assumes all of them to be relevant 11

12 Basic relevance feedback loop (Figure: the user sends a query to the retrieval engine, gives +/- feedback on the top-retrieved results, and the query is iteratively refined to serve evolving information needs.) Idea: users do not know the corpus (it is difficult to come up with all possible query reformulations), but it is easy for the user to decide whether a given document is relevant. 12

13 Rocchio algorithm An algorithm to take advantage of relevance feedback; it incorporates relevance feedback into the vector space model. (Figure: the query space with relevant vs. non-relevant documents and the optimal query.) 13

14 Rocchio algorithm Theory Goal: a query vector with maximum similarity to the relevant documents and minimum similarity to the non-relevant documents. Cosine-similarity based:

q_opt = argmax_q [ sim(q, C_r) - sim(q, C_nr) ]

which, for cosine similarity, yields

q_opt = (1/|C_r|) Σ_{d_j ∈ C_r} d_j - (1/|C_nr|) Σ_{d_j ∈ C_nr} d_j

The optimal query is the vector difference between the centroids of the relevant and the non-relevant documents. 14

15 Rocchio algorithm Developed in 1971. Given: a query and some knowledge about relevant/non-relevant documents:

q_m = α q_0 + β (1/|D_r|) Σ_{d_j ∈ D_r} d_j - γ (1/|D_nr|) Σ_{d_j ∈ D_nr} d_j

where q_0 is the original query and D_r / D_nr are the known relevant/non-relevant documents. Model parameters: α, β, γ. 15

16 Rocchio algorithm Q: what are good weights for "Find similar pages"? Given: a query and some knowledge about relevant/non-relevant documents:

q_m = α q_0 + β (1/|D_r|) Σ_{d_j ∈ D_r} d_j - γ (1/|D_nr|) Σ_{d_j ∈ D_nr} d_j

Higher α: larger influence of the original query. Higher β: larger influence of the relevant documents. Higher γ: move further away from the non-relevant documents. Q: should the weights change over time? X terms are added to the expanded query. 16
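
A minimal sketch of the Rocchio update over sparse term-weight maps (negative weights are dropped, as the next slide notes); the vector representation and method names are assumptions for illustration.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class Rocchio {

        // q_m = alpha*q0 + beta*centroid(relevant) - gamma*centroid(nonRelevant)
        static Map<String, Double> update(Map<String, Double> q0,
                                          List<Map<String, Double>> relevant,
                                          List<Map<String, Double>> nonRelevant,
                                          double alpha, double beta, double gamma) {
            Map<String, Double> qm = new HashMap<>();
            q0.forEach((t, w) -> qm.merge(t, alpha * w, Double::sum));
            for (Map<String, Double> d : relevant)
                d.forEach((t, w) -> qm.merge(t, beta * w / relevant.size(), Double::sum));
            for (Map<String, Double> d : nonRelevant)
                d.forEach((t, w) -> qm.merge(t, -gamma * w / nonRelevant.size(), Double::sum));
            // the subtraction can produce negative term weights; ignore those
            qm.values().removeIf(w -> w <= 0.0);
            return qm;
        }
    }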

17 Rocchio algorithm Graphically (Figure: the initial query q_0 among the relevant vs. non-relevant documents, and the revised query q_m.) Due to the subtraction we may end up with negative term weights; ignore those. Use the revised query for retrieval. 17

18 Effects of relevance feedback Improves precision and recall: precision = #rel. retrieved / #retrieved, recall = #rel. retrieved / #relevant in corpus. Most important for recall. Positive feedback is more valuable than negative feedback (why?), hence γ < β; common values: α = 1.0, β = 0.75, γ = 0.15. Often only positive feedback is used: γ = 0. (Figure: P@10 and recall results across feedback iterations.) 18

19 Rocchio algorithm Empirical evaluation by Buckley et al. [7]: what influence does Rocchio have on the effectiveness of a system? TREC routing task evaluation: one long-standing information need, new documents appear continuously, hundreds of positive qrels; the query is updated depending on user feedback. Studied the effects of the number of known relevant documents and the number of terms added to the query: the more known relevant documents the better, and the more terms added to the query the better. 19

20 Relevance feedback Issues It is unclear where to move to in the query space: if the initial query performs poorly, an expanded version is likely to perform poorly too, even with relevance feedback. Misspellings. Cross-language retrieval (document vectors in other languages are not nearby). Total mismatch of the searcher's vocabulary and the collection vocabulary. Relevance feedback can only work if the cluster hypothesis holds. 20

21 Cluster hypothesis Keith van Rijsbergen [2]: "Closely associated documents tend to be relevant to the same requests." Basic assumption of IR systems: relevant documents are more similar to each other than they are to non-relevant documents. (Figure: associations between all document pairs, (R-R) and (R-NR), among relevant docs (R) and non-relevant docs (NR).) 21

22 Cluster hypothesis Keith van Rijsbergen [2] Plot the relative frequency against the strength of association for R-R and R-NR pairs. (Figure: two such plots; when the R-R associations are clearly stronger than the R-NR ones, the separation is good; when the two distributions overlap, the separation is poor.) 22

23 Cluster hypothesis Keith van Rijsbergen [2] Clustering methods should: produce a stable clustering (no sudden changes when items are added), be tolerant to errors (small errors should lead to small changes in the clustering), and be independent of the initial item ordering. Clustering fails when: subsets of documents have very different important terms (Burma vs. Myanmar, TREC vs. CLEF), queries are inherently disjunctive (there is no document that contains all terms), or polysemy occurs; a semantic approach can help here, e.g. linking both terms to the same concept in Wikipedia. 23

24 Relevance feedback & users Not popular with users (4% on the Excite search engine, 2000). Explicit feedback is difficult to get (it makes interaction with the search interface cumbersome). Users get annoyed if after a feedback round the results do not (seemingly) get better. Efficiency issues: the longer the query, the longer it takes to process it. Use only the top weighted query terms, though according to Buckley [7] the more terms the better (up to 4000 terms per query). 24

25 Pseudo-relevance feedback Also called blind relevance feedback: no user involved. Proxy for relevance judgments: assume the top k documents to be relevant; the computation is the same otherwise. Works well in situations where many relevant documents appear in the top k: if the query performs well, pseudo-relevance feedback improves it further; if the initial results are poor, they remain poor. 25
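
A possible skeleton of one blind-feedback round, reusing the Rocchio sketch from above with positive-only feedback (γ = 0); the Engine interface and the parameter values are illustrative assumptions.

    import java.util.List;
    import java.util.Map;

    public class PseudoRelevanceFeedback {

        // a minimal search-engine abstraction; any VSM implementation would do
        interface Engine {
            List<Map<String, Double>> topK(Map<String, Double> query, int k);
        }

        // one round of blind feedback: assume the top k results are relevant,
        // apply a positive-only Rocchio update, and re-run the expanded query
        static List<Map<String, Double>> search(Engine engine, Map<String, Double> q0, int k) {
            List<Map<String, Double>> top = engine.topK(q0, k);
            Map<String, Double> qm = Rocchio.update(q0, top, List.of(), 1.0, 0.75, 0.0);
            return engine.topK(qm, 10);
        }
    }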

26 Query expansion Users explicitly provide extra terms (instead of feedback on documents). Thesaurus-based expansion: no user input required; e.g. PubMed automatically expands the query for the user. 26

27 Query expansion Thesauri Can be built manually (time, money, people). Can be built automatically (co-occurrence statistics, large-scale corpora, processing power). Query reformulations based on log data exploit manual data (*your* reformulation ideas). What about long-tail queries? 27

28 Hearst patterns Hearst [8] Use simple patterns to decide on taxonomic (IS-A) relations. (Figure: a taxonomy fragment placing pluto under asteroid, planetoid, celestial body, natural object; "Pluto must be a planet".) 28

29 Hearst patterns Hearst [8]
NP_0 such as {NP_1, NP_2, ..., (and|or)} NP_n - "American cars such as Chevrolet, Pontiac"
Such NP as {NP,}* {(or|and)} NP - "such colors as red or orange"
NP {, NP}* {,} or other NP
NP {, NP}* {,} and other NP
NP {,} including {NP,}* {or|and} NP
NP {,} especially {NP,}* {or|and} NP
Bootstrapping the discovery of new patterns: 1. Use the standard patterns to find entity pairs. 2. Where else do these pairs co-occur, and with what terms? 29
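
As a rough illustration, the first pattern can be approximated with a regular expression over raw text (a real implementation matches parsed noun phrases, not surface words); the pattern string below is a simplifying assumption.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class HearstSuchAs {
        public static void main(String[] args) {
            // crude surface approximation of: NP_0 such as NP_1 {, NP_2 ...} {(and|or) NP_n}
            Pattern p = Pattern.compile(
                "([A-Z]?[a-z]+ [a-z]+) such as ((?:[A-Z][a-z]+)(?:, [A-Z][a-z]+)*(?:,? (?:and|or) [A-Z][a-z]+)?)");
            Matcher m = p.matcher("American cars such as Chevrolet, Pontiac were exported.");
            while (m.find()) {
                String hypernym = m.group(1);               // e.g. "American cars"
                for (String hyponym : m.group(2).split(",? (?:and|or) |, "))
                    System.out.println(hyponym + " IS-A " + hypernym);
            }
        }
    }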

30 A new type of retrieval model 30

31 Probabilistic models Binary independence model 2-Poisson model Language modeling * 31

32 Basic probabilities
P(A, B) = P(A|B) P(B) = P(B|A) P(A)   (joint probability, conditional probability, chain rule)
P(A|B) = P(B|A) P(A) / P(B)   (Bayes' Theorem)
P(B) = P(A, B) + P(¬A, B)   (partition rule)
32
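
As a quick worked example with made-up numbers (P(B|A) = 0.8, P(A) = 0.3, P(B|¬A) = 0.1), expanding P(B) via the partition rule as P(B) = P(B|A) P(A) + P(B|¬A) P(¬A):

P(A|B) = P(B|A) P(A) / P(B) = (0.8 x 0.3) / (0.8 x 0.3 + 0.1 x 0.7) = 0.24 / 0.31 ≈ 0.77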

33 Probabilistic models in IR (Figure: an information need is turned, with uncertain understanding, into a query and a query representation; documents are turned into document representations; matching the two yields an uncertain guess of relevance.) P(D|Q) = P(Q|D) P(D) / P(Q) is the central equation of language modeling. 33

34 Probabilistic models in IR Probabilistic models have a sounder theoretical foundation (reasoning under uncertainty) than Boolean or vector space models, and they usually perform better. Documents are ranked according to their probability of being relevant, i.e. they are ranked according to P(D|Q). Many models exist; we start with language modeling. 34

35 Language modeling Ponte & Croft, 1998

P(D|Q) = P(Q|D) P(D) / P(Q)

Idea: rank the documents by the likelihood of the query according to each document's language model. Example query: CSKA. If we throw all of a document's terms into a bag and randomly draw terms, for which of the two documents below is the probability of drawing CSKA greater?

Document 1: Pontus Wernbloom's last-minute volley earned a draw to keep CSKA Moscow in the hunt in their last-16 Champions League match against Real Madrid. Real had looked set to return to Spain with a lead at the tie's halfway stage, but paid for their wastefulness. Cristiano Ronaldo fired under Sergei Chepchugov to open the scoring but fluffed a fine 84th-minute chance. CSKA had rarely threatened but Wernbloom crashed home with virtually the game's final kick to snatch a draw.

Document 2: Greece has avoided a nightmare scenario by agreeing to a 130bn euros (£110bn; $170bn) bailout deal, Finance Minister Evangelos Venizelos has said. He said the deal was probably the most important in Greece's post-war history. The cabinet was meeting to discuss how to pass the reforms stipulated by international lenders, which include huge spending cuts and beefed-up monitoring by eurozone officials. Trade unions have called strikes and protests for Wednesday. 35

36 Language modeling Query Q is generated by a probabilistic model based on document D. Pontus Wernbloom's last-minute volley earned a draw to keep CSKA Moscow in the hunt in their last-16 Champions League match against Real Madrid. Real had looked set to return to Spain with a lead at the tie's halfway stage, but paid for their wastefulness. Cristiano Ronaldo fired under Sergei Chepchugov to open the scoring but fluffed a fine 84th-minute chance. CSKA had rarely threatened but Wernbloom crashed home with virtually the game's final kick to snatch a draw.

P(D|Q) ∝ P(Q|D) x P(D)

posterior P(D|Q): our goal, hard to compute directly; likelihood P(Q|D): relatively easy to estimate; prior P(D): ignore for now. P(Q) is the same for every document, so it has no use for ranking. (Figure: maximum likelihood estimates of term probabilities in the passage, for terms such as to, the, a, but, in, their, with, wernbloom, real, had, draw, cska, moscow.) 36

37 Language modeling P(D|Q) ∝ P(Q|D) x P(D), with a uniform prior P(D). Unigram (multinomial) language model:

P(Q|D) = Π_i P(q_i|D)

Assuming term independence makes the problem more tractable: the problem is reduced to estimating P(q_i|D) from term frequencies. What about the IDF component? (there is none in LM) 37
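
A minimal sketch of unigram query likelihood with maximum-likelihood estimates; it makes the zero-probability problem of the next slide visible. The tiny document and the tokenization are illustrative assumptions.

    import java.util.Arrays;
    import java.util.List;

    public class QueryLikelihood {

        // maximum likelihood estimate: P_ml(w|D) = c(w;D) / |D|
        static double pMl(String w, List<String> doc) {
            long c = doc.stream().filter(w::equals).count();
            return (double) c / doc.size();
        }

        public static void main(String[] args) {
            List<String> doc = Arrays.asList("cska", "moscow", "draw", "cska", "real");
            double p = 1.0;
            for (String q : new String[]{"cska", "moscow", "greece"})
                p *= pMl(q, doc);   // "greece" never occurs, so the whole product is 0
            System.out.println(p);  // prints 0.0 -> this is why we need smoothing
        }
    }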

38 Language modeling Smoothing

P(Q|D) = Π_i P(q_i|D)

Q = {CSKA, Moscow, Greece}: P(Q|D) = P(CSKA|D) P(Moscow|D) P(Greece|D). If one of the query terms does not occur in D, its maximum likelihood probability is 0, and then the whole product P(Q|D) = 0. Smoothing methods smooth the document's language model (the maximum likelihood probability distribution) to avoid terms with zero probability. 38

39 Language modeling Smoothing (Figure: P(term|D) plotted over term IDs for the maximum likelihood and the smoothed (JM) distribution.) Smoothing removes some probability mass from the original ML distribution and gives it to the zero-probability terms, so every term ends up with non-zero probability. 39

40 Language modeling Smoothing Smoothing methods consider 2 distributions: a model P_seen(t|D) for terms occurring in the document and a model P_unseen(t|D) for terms not in the document. With c(w;D) the count of word w in D:

log P(Q|D) = Σ_i log P(q_i|D)
           = Σ_{i: c(q_i;D)>0} log P_seen(q_i|D) + Σ_{i: c(q_i;D)=0} log P_unseen(q_i|D)
           = Σ_{i: c(q_i;D)>0} log [ P_seen(q_i|D) / P_unseen(q_i|D) ] + Σ_i log P_unseen(q_i|D)
40

41 Language modeling Smoothing P_unseen(w|D) is typically proportional to the general frequency of w:

P_unseen(q_i|D) = α_d P(q_i|C)

with α_d a document-dependent constant and P(q_i|C) the collection language model (previous slide). Plugging this in:

log P(Q|D) = Σ_{i: c(q_i;D)>0} log [ P_seen(q_i|D) / (α_d P(q_i|C)) ] + n log α_d + Σ_i log P(q_i|C)

i.e. a term weight, plus the query length n times a document-dependent constant, plus a document-independent part; α_d is the probability mass given to unseen words. 41

42 Language modeling Smoothing

log P(Q|D) ∝ Σ_{i: c(q_i;D)>0} log [ P_seen(q_i|D) / (α_d P(q_i|C)) ] + n log α_d

What happens to P(q_i|C) if the query term is a stopword? It is large, which lowers the term's weight: similar to IDF. n log α_d is the product of a document-dependent constant and the query length: longer documents should receive less smoothing (a greater penalty if a query term is missing), similar to document length normalization. 42

43 Language modeling Smoothing How to smooth P(w|D) in practice? We look at 3 ideas (there are more). The maximum likelihood estimate:

P_ml(w|D) = c(w;D) / Σ_w c(w;D)

General idea: discount the probabilities of seen words and assign the extra probability mass to unseen words via a fallback model (the collection language model):

P(w|D) = P_smoothed(w|D) if word w is seen, α_d P(w|C) otherwise

where all probabilities must sum to one. Laplace smoothing, the simplest approach: add one count to every word:

P_laplace(w|D) = (c(w;D) + 1) / Σ_w (c(w;D) + 1)
43

44 Language modeling Smoothing Jelinek-Mercer (JM) smoothing: linear interpolation between the ML estimate and the collection LM:

P_λ(w|D) = (1 - λ) P_ml(w|D) + λ P(w|C), λ ∈ (0,1)

The parameter λ controls the amount of smoothing. Term-dependent Jelinek-Mercer smoothing smooths different terms to different degrees:

P_{λ_w}(w|D) = (1 - λ_w) P_ml(w|D) + λ_w P(w|C)
44

45 Language modeling Smoothing Dirichlet smoothing:

P_μ(w|D) = (c(w;D) + μ P(w|C)) / (Σ_w c(w;D) + μ), usually μ > 100

What is the effect of the denominator when |D| = 100 and |D| = 1000? The longer the document, the less smoothing is applied. Absolute discounting: subtract a constant from the counts:

P_δ(w|D) = max(c(w;D) - δ, 0) / Σ_w c(w;D) + σ P(w|C), with δ ∈ (0,1) and σ = δ |D|_unique / |D|

where |D|_unique is the number of unique terms in D and |D| the total number of terms. 45
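
A minimal sketch of JM and Dirichlet smoothing, and of query scoring in log space, assuming precomputed document term frequencies and a collection language model; all names and the OOV floor are illustrative assumptions.

    import java.util.List;
    import java.util.Map;

    public class SmoothedLm {

        // Jelinek-Mercer: (1 - lambda) * P_ml(w|D) + lambda * P(w|C)
        static double pJm(long cwd, long docLen, double pwc, double lambda) {
            return (1 - lambda) * ((double) cwd / docLen) + lambda * pwc;
        }

        // Dirichlet: (c(w;D) + mu * P(w|C)) / (|D| + mu)
        static double pDirichlet(long cwd, long docLen, double pwc, double mu) {
            return (cwd + mu * pwc) / (docLen + mu);
        }

        // score a query by summing log probabilities (avoids underflow of the product)
        static double logPQuery(List<String> query, Map<String, Long> docTf, long docLen,
                                Map<String, Double> collectionLm, double mu) {
            double logP = 0.0;
            for (String q : query) {
                long cwd = docTf.getOrDefault(q, 0L);
                double pwc = collectionLm.getOrDefault(q, 1e-9); // tiny floor for OOV terms
                logP += Math.log(pDirichlet(cwd, docLen, pwc, mu));
            }
            return logP;
        }
    }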

46 Assignment II Retrieval experiments You will get a number of queries to run on the Wikipedia corpus (training data). You have to implement the vector space model + 1 modification (term normalization, language modeling, your own ideas). I will make sure the output format and everything else is clear. We will run a benchmark between the groups. 46

47 Sources
[1] Introduction to Information Retrieval. Manning et al., 2008
[2] Information Retrieval. Keith van Rijsbergen, 1979
[3] Managing Gigabytes. Witten et al., 1999
[4] The probability ranking principle in IR. S.E. Robertson, 1977
[5] A study of smoothing methods for language models applied to ad hoc information retrieval. Zhai & Lafferty, 2001
[6] The importance of prior probabilities for entry page search. Kraaij et al., 2002
[7] The effect of adding relevance information in a relevance feedback environment. Buckley et al., 1994
[8] Automatic acquisition of hyponyms from large text corpora. M.A. Hearst, 1992
