Social Data Mining Trainer: Enrico De Santis, PhD




Outline
- Vector semantics: from plain text to mathematical representations
- Linear algebra in pills
- Assessing the semantic content of text: the term-context matrix and the term-document matrix
- Similarity and dissimilarity computation
- The cosine similarity family
- Applications

Vector Semantics
One of the biggest obstacles to making full use of the power of computers is that they currently understand very little of the meaning of human language. The term semantics is used here in a general sense, as the meaning of a word, a phrase, a sentence, or any text in human language, and the study of such meaning. We are not concerned with narrower senses of semantics, such as the semantic web or approaches to semantics based on formal logic.

Vector Semantics
What is vector semantics? To understand vector semantics and to apply the related techniques to social network analysis and text analysis, we need to be ferried from the world of the humanities to the fantastic world of math...
[Image: painting by Joachim Patinir, Museo Nacional del Prado, public domain]

Vector Semantics
For any type of text analysis, it is of paramount importance to represent the documents of a corpus in a mathematical space. We will see how to embed a set of documents or words in a vector space, so that all of its algebraic properties can be applied:
- to measure the similarity between words and grasp their semantics;
- to measure the similarity between documents;
- to extract a set of features and build a logical structure for applying advanced analysis techniques such as supervised and unsupervised learning (machine learning).

Before we start
Words and documents will be suitably embedded in a vector space. Before we start, it is useful to recall some notions that are used, sometimes explicitly and sometimes implicitly, in all the analyses that follow. We will recall:
- the notion of vector space and vectors;
- the basic manipulations of vectors;
- the notion of matrix and its basic manipulations;
- how to calculate the distance between two vectors.

Background: space, vectors, matrices
A vector space (also called a linear space) is a collection of objects called vectors, which may be added together and multiplied ("scaled") by numbers, called scalars in this context.
Graphically, vectors are represented as arrows, but they are regarded as abstract mathematical objects with particular properties. Geometrically, vectors represent points in a given space.
Figure: vector addition and scalar multiplication. A vector $\mathbf{v}$ (blue) is added to another vector $\mathbf{w}$ (red); $\mathbf{w}$ is then stretched by a factor of 2, yielding the sum $\mathbf{v} + 2\mathbf{w}$.

Background: space, vectors, matrices
Mathematically, the two-dimensional surface of a table and the three-dimensional space in which our bodies move are examples of vector spaces.
A vector $\mathbf{p}$ is expressed by its components, i.e. its coordinates in a Cartesian space:
$$\mathbf{p} = [a_1, a_2, a_3]$$
Here the dimension is 3 (the space is $\mathbb{R}^3$).
In an $n$-dimensional space, given $n$ linearly independent ("non-aligned") vectors, every other vector can be expressed as a linear combination of these $n$ vectors.

Background: linear combination
If $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$ are vectors and $a_1, \dots, a_n$ are scalars, then the linear combination of those vectors with those scalars as coefficients is:
$$\mathbf{w} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \dots + a_n\mathbf{v}_n$$
Example: consider the vectors $\mathbf{e}_1 = (1,0,0)$, $\mathbf{e}_2 = (0,1,0)$ and $\mathbf{e}_3 = (0,0,1)$. Then any vector in $\mathbb{R}^3$ is a linear combination of $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{e}_3$. To see this, take an arbitrary vector $\mathbf{x} = [a_1, a_2, a_3]$ in $\mathbb{R}^3$ and write:
$$\mathbf{x} = [a_1, a_2, a_3] = [a_1, 0, 0] + [0, a_2, 0] + [0, 0, a_3] = a_1[1,0,0] + a_2[0,1,0] + a_3[0,0,1] = a_1\mathbf{e}_1 + a_2\mathbf{e}_2 + a_3\mathbf{e}_3$$
We use only vector addition and multiplication by a scalar!
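The decomposition above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `scale`, `add`, and `linear_combination` are hypothetical and do not come from the slides, and the coefficients are arbitrary.

```python
def scale(a, v):
    """Multiply vector v by the scalar a."""
    return [a * x for x in v]


def add(v, w):
    """Add two vectors of the same dimension component by component."""
    return [x + y for x, y in zip(v, w)]


def linear_combination(coeffs, vectors):
    """Return a_1*v_1 + a_2*v_2 + ... + a_n*v_n."""
    result = [0.0] * len(vectors[0])
    for a, v in zip(coeffs, vectors):
        result = add(result, scale(a, v))
    return result


# Canonical basis of R^3 (e_1, e_2, e_3 in the text).
e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]

# Arbitrary coefficients chosen for illustration.
x = linear_combination([2.0, -1.0, 3.0], [e1, e2, e3])
print(x)
```

With the canonical basis, the coefficients are exactly the components of the resulting vector, which is the point of the example in the text.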

Background: linear dependence
A set of vectors is said to be linearly dependent if one of the vectors in the set can be written as a linear combination of the others; if no vector in the set can be written in this way, then the vectors are said to be linearly independent.

Background: scalar product between two vectors
Given two vectors $\mathbf{v}$ and $\mathbf{w}$ (in any dimension), the scalar product is defined as:
$$\mathbf{v} \cdot \mathbf{w} = v_1 w_1 + v_2 w_2 + \dots + v_n w_n$$
The result is a scalar value (a number). If $\mathbf{v}$ is orthogonal to $\mathbf{w}$, then $\mathbf{v} \cdot \mathbf{w} = 0$ (the vectors form an angle of 90 degrees).
Through the scalar product it is possible to define the length of a vector $\mathbf{w}$, also called the (Euclidean) vector norm $\|\mathbf{w}\|$:
$$\|\mathbf{w}\| = \sqrt{\mathbf{w} \cdot \mathbf{w}} = \sqrt{w_1 w_1 + w_2 w_2 + \dots + w_n w_n}$$
The scalar product also helps to define the angle between two vectors:
$$\cos\big(\mathrm{angle}(\mathbf{v}, \mathbf{w})\big) = \frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\| \, \|\mathbf{w}\|}$$
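As a minimal sketch of the three definitions above (scalar product, norm, cosine of the angle), assuming vectors are plain Python lists; the function names `dot`, `norm`, and `cosine` and the sample vectors are illustrative choices, not part of the slides:

```python
import math


def dot(v, w):
    """Scalar product: v_1*w_1 + ... + v_n*w_n."""
    return sum(a * b for a, b in zip(v, w))


def norm(v):
    """Euclidean norm (length) of v, defined through the scalar product."""
    return math.sqrt(dot(v, v))


def cosine(v, w):
    """Cosine of the angle between v and w."""
    return dot(v, w) / (norm(v) * norm(w))


v = [3.0, 4.0]
w = [-4.0, 3.0]  # v rotated by 90 degrees

print(dot(v, w))     # orthogonal vectors have scalar product 0
print(norm(v))       # length of v
print(cosine(v, v))  # a vector forms an angle of 0 with itself
```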

Background: Euclidean distance
Given two vectors $\mathbf{v}$ and $\mathbf{w}$ (in any dimension), the Euclidean distance $d(\mathbf{w}, \mathbf{v})$ is defined as:
$$d(\mathbf{w}, \mathbf{v}) = \|\mathbf{w} - \mathbf{v}\| = \sqrt{(\mathbf{w} - \mathbf{v}) \cdot (\mathbf{w} - \mathbf{v})} = \sqrt{(v_1 - w_1)^2 + (v_2 - w_2)^2 + \dots + (v_n - w_n)^2} = \sqrt{\sum_{i=1}^{n} (v_i - w_i)^2}$$
There are many possible definitions of distance in machine learning. One example is the weighted Euclidean distance: given a set of weights $a_1, a_2, \dots, a_n$ we have:
$$d(\mathbf{w}, \mathbf{v}; \mathbf{a}) = \sqrt{\sum_{i=1}^{n} a_i (v_i - w_i)^2}$$
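The two distances can be sketched as follows. The function names and the sample points are assumptions made for illustration; the weights are assumed to be non-negative so that the square root is defined.

```python
import math


def euclidean(v, w):
    """Euclidean distance between v and w."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))


def weighted_euclidean(v, w, weights):
    """Weighted Euclidean distance; weights are assumed non-negative."""
    return math.sqrt(sum(a * (x - y) ** 2 for a, x, y in zip(weights, v, w)))


v = [0.0, 0.0]
w = [3.0, 4.0]

print(euclidean(v, w))                       # plain distance
print(weighted_euclidean(v, w, [1.0, 1.0]))  # unit weights give the same value
print(weighted_euclidean(v, w, [4.0, 0.0]))  # second coordinate ignored, first emphasized
```

Setting a weight to zero removes that coordinate from the comparison, which is one simple way to express that some features matter more than others.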

Background: Matrices
A matrix $A = [a_{ij}]$ is a table of values, or equivalently a collection of row vectors or of column vectors.
[Figure: row space, column space, and matrix multiplication]

Vector Space Model and information retrieval
The VSM was developed for the SMART information retrieval system (Salton, 1971) by Gerard Salton and his colleagues (Salton, Wong, & Yang, 1975).
The idea of the VSM is to represent each document in a collection as a point in a space (a vector in a vector space). Points that are close together in this space are semantically similar, and points that are far apart are semantically distant.
The user's query is represented as a point in the same space as the documents (the query is a pseudo-document).

Distributional models of meaning = vector-space models of meaning = vector semantics
Intuitions:
- Zellig Harris (1954): oculist and eye-doctor occur in almost the same environments. If A and B have almost identical environments we say that they are synonyms.
- Firth (1957): "You shall know a word by the company it keeps!"

Vector semantics
Nida example:
- A bottle of tesgüino is on the table.
- Everybody likes tesgüino.
- Tesgüino makes you drunk.
- We make tesgüino out of corn.
From the context words, humans can guess that tesgüino means an alcoholic beverage like beer.
Intuition for an algorithm: two words are similar if they have similar word contexts.

Therefore, a working hypothesis
Statistical semantics hypothesis: statistical patterns of human word usage can be used to figure out what people mean (George Furnas, University of Michigan).
- Similarity of words: the Word-Context matrix.
- Similarity of documents: the Term-Document matrix.

The Word-Context Matrix
Wittgenstein was primarily interested in the physical activities that form the context of word usage (e.g., the word brick, spoken in the context of the physical activity of building a house).
The distributional hypothesis in linguistics is that words that occur in similar contexts tend to have similar meanings (Harris, 1954).
A word may be represented by a vector whose elements are derived from the occurrences of the word in various contexts, such as windows of words (Lund & Burgess, 1996). Richer contexts can also be used, such as grammatical dependencies (Lin, 1998; Pado & Lapata, 2007) or dependency graphs between words.

The Word-Context Matrix
The Word-Context matrix, also known as the Term-Term matrix, is a matrix in which both the rows and the columns are labeled by words.
Denoting by $V$ the number of unique words (i.e. types) in a corpus or document, the matrix has dimension $V \times V$. Each cell records the number of times the row (target) word and the column (context) word co-occur in some context in some training corpus.
Usually the context is a window around the word, for example 7 words to the left and 7 words to the right, in which case the cell holds the number of times (in some training corpus) the column word occurs within such a window of 7 words on either side of the row word.
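The window-based counting described above can be sketched as follows. This is an illustrative sketch, not the procedure used to build the slides' figures: the function name `cooccurrence_counts`, the whitespace tokenization, the toy sentence, and the small window of 2 words per side are all assumptions. The counts are stored sparsely (only non-zero cells are kept).

```python
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window=2):
    """Count, for each target word, the words seen within `window`
    positions to its left and right (the target itself is skipped)."""
    counts = defaultdict(Counter)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][tokens[j]] += 1
    return counts


# Toy corpus, assumed for illustration only.
tokens = "we make tesgüino out of corn and we make beer out of barley".split()
counts = cooccurrence_counts(tokens, window=2)

print(counts["tesgüino"])  # context counts for the row word "tesgüino"
print(counts["beer"])      # "beer" shares the contexts make, out, of, we
```

Even on this tiny corpus, tesgüino and beer end up with the same context counts, which is the intuition behind using rows of the matrix as word vectors.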

The Word-Context Matrix
Example from the Brown Corpus, with a context window of 7 words on each side of the target word.
[Figure: sample of the Word-Context matrix (the real matrix is much larger), built from the raw co-occurrence frequencies of pairs of words]
[Figure: spatial visualization of the word vectors for digital and information, showing just two of the dimensions, corresponding to the words data and result]
Note that $V$, the length of the vector, is generally the size of the vocabulary, usually between 10,000 and 50,000 words. Most of the entries are zero, hence the matrix is called sparse.

The Word-Context Matrix
The size of the window used to collect counts can vary based on the goals of the representation, but is generally between 1 and 8 words on each side of the target word (for a total context of 3-17 words).
In general, the shorter the window, the more syntactic the representations, since the information comes from immediately nearby words; the longer the window, the more semantic the relations.
Two words have first-order co-occurrence (sometimes called syntagmatic association) if they are typically near each other. Thus wrote is a first-order associate of book or poem.
Two words have second-order co-occurrence (sometimes called paradigmatic association) if they have similar neighbors. Thus wrote is a second-order associate of words like said or remarked.

The W-C Matrix: Measuring the association between words
It turns out, however, that a simple frequency count isn't the best measure of association between words.
If we want to know what kinds of contexts are shared by apricot and pineapple but not by digital and information, we're not going to get good discrimination from words like the, it, or they, which occur frequently with all sorts of words and aren't informative about any particular word.
Instead we'd like context words that are particularly informative about the target word. The best weighting or measure of association between words should tell us how much more often than chance the two words co-occur.

The Mutual Information and Pointwise Mutual Information (PMI)
Pointwise mutual information is just such a measure (Church and Hanks, 1989, 1990).
The mutual information between two random variables $X$ and $Y$ is:
$$I(X, Y) = \sum_{x} \sum_{y} P(x, y) \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$
The pointwise mutual information (Fano, 1961) is a measure of how often two events $x$ and $y$ occur together, compared with what we would expect if they were independent:
$$I(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$

The Positive Pointwise Mutual Information (PPMI)
We can apply this intuition to co-occurrence vectors by defining the pointwise mutual information association between a target word $w$ and a context word $c$ as:
$$\mathrm{PMI}(w, c) = \log_2 \frac{P(w, c)}{P(w)\,P(c)}$$
The numerator tells us how often we observed the two words together. The denominator tells us how often we would expect the two words to co-occur assuming they each occurred independently (so their probabilities could just be multiplied).
Thus, the ratio gives us an estimate of how much more the target and context words co-occur than we would expect by chance.

The Positive Pointwise Mutual Information (PPMI)
PMI values range from negative to positive infinity. Negative PMI values (which imply that things are co-occurring less often than we would expect by chance) tend to be unreliable unless our corpora are enormous.
Furthermore, it's not clear whether it's even possible to evaluate such scores of unrelatedness with human judgments.
It is common to use Positive PMI (called PPMI), which replaces all negative PMI values with zero:
$$\mathrm{PPMI}(w, c) = \max\left(\log_2 \frac{P(w, c)}{P(w)\,P(c)},\; 0\right)$$

PPMI, an example
Let's assume we have a co-occurrence matrix $F$ with $W$ rows (words) and $C$ columns (contexts), where $f_{ij}$ gives the number of times word $w_i$ occurs in context $c_j$.
[Figure: the matrix $F$, with $W$ rows (words) and $C$ columns (contexts)]

PPMI, an example
Thus, for example, we could compute $\mathrm{PPMI}(w = \text{information}, c = \text{data})$, assuming we pretended that the figure below encompassed all the relevant word contexts/dimensions.
[Figure: context counts, joint probabilities and marginals, and PPMI values]
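A PPMI computation of this kind can be sketched as below. The sketch assumes the usual maximum-likelihood estimates $P(w_i, c_j) = f_{ij} / \sum_{i,j} f_{ij}$, with the marginals obtained from the row and column sums. The count matrix is a toy table assumed for illustration, the function name `ppmi_matrix` is hypothetical, and cells with a zero count are set to 0 because their PMI would be negative infinity.

```python
import math


def ppmi_matrix(F):
    """PPMI for every cell of a count matrix F (rows: words, columns: contexts)."""
    total = sum(sum(row) for row in F)
    row_sums = [sum(row) for row in F]
    col_sums = [sum(col) for col in zip(*F)]
    result = []
    for i, row in enumerate(F):
        out = []
        for j, f in enumerate(row):
            if f == 0:
                out.append(0.0)  # PMI would be -infinity; PPMI clips it to 0
            else:
                # P(w,c) / (P(w) P(c)) simplifies to f * total / (row_sum * col_sum)
                pmi = math.log2(f * total / (row_sums[i] * col_sums[j]))
                out.append(max(pmi, 0.0))
        result.append(out)
    return result


# Toy counts, assumed for illustration only.
words = ["apricot", "pineapple", "digital", "information"]
contexts = ["computer", "data", "pinch", "result", "sugar"]
F = [
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [2, 1, 0, 1, 0],
    [1, 6, 0, 4, 0],
]

ppmi = ppmi_matrix(F)
i, j = words.index("information"), contexts.index("data")
print(round(ppmi[i][j], 3))
```

With these toy counts, information and computer co-occur once, which is less than chance would predict, so that cell is clipped to zero.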

The Term-Document Matrix
In a Term-Document matrix (or Word-Document matrix), each row represents a word in the vocabulary and each column represents a document from some collection.
Each cell in this matrix holds the number of times a particular word (defined by the row) occurs in a particular document (defined by the column).
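A minimal sketch of how such a matrix could be built from raw text, assuming lowercase whitespace tokenization and a tiny made-up collection; the function name `term_document_matrix` and the three documents are illustrative assumptions:

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] is the number of times
    term vocab[i] occurs in document docs[j]."""
    counts = [Counter(text.lower().split()) for text in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


# Toy collection, assumed for illustration only.
docs = [
    "the fool and the clown",
    "the soldier in battle",
    "battle battle soldier",
]

vocab, matrix = term_document_matrix(docs)
for term, row in zip(vocab, matrix):
    print(f"{term:8s} {row}")
```

Each printed row is a term vector (one count per document); reading the matrix by columns gives the document vectors instead.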

The Term-Document Matrix
- Used for document indexing (e.g. in search engines).
- Retrieval is based on the similarity between documents.
- Similarity is based on the occurrence frequencies of keywords in the query and in the document.
- Base hypothesis: bag of words, i.e. word order does not matter.
- Documents and queries are both vectors.
Each term $i$ in a document or query $j$ is given a real-valued weight $w_{ij}$. Both documents and queries are expressed as $V$-dimensional vectors:
$$\mathbf{d}_j = (w_{1j}, w_{2j}, \dots, w_{Vj})$$

The Term-Document Matrix
Each cell holds the count of term $t$ in document $d$, written $tf_{t,d}$. Each document is a count vector in $\mathbb{N}^{V}$: a column of the table below. We can think of the vector for a document as identifying a point in $V$-dimensional space.
[Table: counts of the terms battle, soldier, fool, and clown (rows) in the documents As You Like It, Twelfth Night, Julius Caesar, and Henry V (columns)]

The Term-Document Matrix
$V$ is the dimension (size) of the vocabulary.
This is the general form of a term-document matrix: instead of the raw frequency of the words in each document, we may also use functions of this frequency, such as the log frequency or other weighting schemes.

Term-Document Matrix - weighting
- More frequent terms in a document are more important, i.e. more indicative of the topic. $f_{ij}$ is the frequency of term $i$ in document $j$.
- We may want to normalize the term frequency (tf) by dividing by the frequency of the most common term in the document:
$$tf_{ij} = \frac{f_{ij}}{\max_{k} f_{kj}}$$
- Terms that appear in many different documents are less indicative of the overall topic. $df_i$ is the document frequency of term $i$ (the number of documents containing term $i$), and the inverse document frequency of term $i$ is:
$$idf_i = \log_2 \frac{N}{df_i}$$
where $N$ is the total number of documents.
- A typical combined term importance indicator is tf-idf weighting:
$$w_{ij} = tf_{ij} \cdot idf_i = tf_{ij} \log_2 \frac{N}{df_i}$$
A term occurring frequently in the document but rarely in the rest of the collection is given a high weight (other approaches can be used).
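The weighting scheme above can be sketched as follows, starting from a term-document count matrix. The function name `tf_idf` and the toy counts are assumptions made for illustration; terms that occur in no document and empty documents are given weight 0 to avoid division by zero.

```python
import math


def tf_idf(F):
    """tf-idf weights for a term-document count matrix F
    (rows: terms, columns: documents)."""
    n_docs = len(F[0])
    # Frequency of the most common term in each document (column maximum).
    max_per_doc = [max(col) for col in zip(*F)]
    weights = []
    for row in F:
        df = sum(1 for f in row if f > 0)  # document frequency of the term
        idf = math.log2(n_docs / df) if df else 0.0
        weights.append([
            (f / max_per_doc[j]) * idf if max_per_doc[j] else 0.0
            for j, f in enumerate(row)
        ])
    return weights


# Toy counts, assumed for illustration only.
terms = ["the", "battle", "fool"]
F = [
    [4, 4, 4],  # "the" occurs in every document
    [0, 2, 4],  # "battle" occurs in two documents
    [2, 0, 0],  # "fool" occurs in one document only
]

W = tf_idf(F)
for term, row in zip(terms, W):
    print(term, [round(x, 3) for x in row])
```

A term that appears in every document gets idf = log2(1) = 0, so its weight vanishes regardless of how frequent it is, while the rare term "fool" receives the highest weight in the only document that contains it.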

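T-D Matrix - Measuring the Similarity
Dual representation: (1) documents in term space, or (2) terms in document space. With $w_{ij}$ the weight of term $i$ in document $j$, $n$ terms, and $N$ documents:
$$\text{(1)} \quad \mathbf{d}_j = w_{1j}\,\mathbf{t}_1 + w_{2j}\,\mathbf{t}_2 + \dots + w_{nj}\,\mathbf{t}_n$$
$$\text{(2)} \quad \mathbf{t}_i = w_{i1}\,\mathbf{d}_1 + w_{i2}\,\mathbf{d}_2 + \dots + w_{iN}\,\mathbf{d}_N$$
Representation (1) corresponds to the columns of the term-document matrix $X$, representation (2) to the columns of its transpose $X^T$.
[Figure: documents $\mathbf{d}_1, \mathbf{d}_2, \mathbf{d}_3$ as points in the space with axes $\mathbf{t}_1, \dots, \mathbf{t}_n$, and terms as points in the space with axes $\mathbf{d}_1, \dots, \mathbf{d}_i$]
Cosine similarity between two documents:
$$\mathrm{sim}(\mathbf{d}_j, \mathbf{d}_k) = \frac{\mathbf{d}_j \cdot \mathbf{d}_k}{\|\mathbf{d}_j\| \, \|\mathbf{d}_k\|} = \frac{\sum_{i=1}^{n} w_{i,j}\, w_{i,k}}{\sqrt{\sum_{i=1}^{n} w_{i,j}^2}\; \sqrt{\sum_{i=1}^{n} w_{i,k}^2}}$$
Representation (2) is useful for measuring the semantic similarity of terms given a corpus.
A query document $\mathbf{q}$ can be compared with each column of the term-document matrix $X$, and the results can be used for ranking (document indexing).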

Dot Product and cosine similarity
Most metrics for similarity between vectors are based on the dot product. The dot product acts as a similarity metric because it will tend to be high just when the two vectors have large values in the same dimensions:
$$\mathbf{v} \cdot \mathbf{w} = \sum_{i=1}^{n} v_i w_i = v_1 w_1 + v_2 w_2 + \dots + v_n w_n$$
Recall the vector length:
$$\|\mathbf{v}\| = \sqrt{\sum_{i} v_i^2}$$
The dot product is higher if a vector is longer, with higher values in each dimension. More frequent words have longer vectors, since they tend to co-occur with more words and have higher co-occurrence values with each of them. The raw dot product will thus be higher for frequent words, which is a problem.
Solution: the simplest way to normalize the dot product for vector length is to divide it by the lengths of the two vectors, or equivalently to pre-normalize each vector to unit length:
$$\mathrm{cosine}(\mathbf{v}, \mathbf{w}) = \frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\| \, \|\mathbf{w}\|}$$

How about the correlation metric?
Hence the cosine similarity coincides with the dot product when the vectors involved are normalized to unit length. Both similarity measures are based on the dot product.
How about correlation? Cosine similarity is not invariant to shifts: if $x$ were shifted to $x + 1$, the cosine similarity would change. What is invariant to shifts, though, is the Pearson correlation. If $\bar{w}$ and $\bar{v}$ are the respective mean values of $\mathbf{w}$ and $\mathbf{v}$:
$$\mathrm{Corr}(\mathbf{w}, \mathbf{v}) = \frac{\sum_{i} (v_i - \bar{v})(w_i - \bar{w})}{\sqrt{\sum_{i} (v_i - \bar{v})^2}\; \sqrt{\sum_{i} (w_i - \bar{w})^2}} = \frac{(\mathbf{v} - \bar{v}) \cdot (\mathbf{w} - \bar{w})}{\|\mathbf{v} - \bar{v}\| \, \|\mathbf{w} - \bar{w}\|} = \mathrm{cosine}(\mathbf{w} - \bar{w}, \mathbf{v} - \bar{v})$$
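The identity between Pearson correlation and the cosine of the mean-centered vectors, and the shift invariance it brings, can be checked numerically with a short sketch. The function names and the sample vectors are illustrative assumptions.

```python
import math


def cosine(v, w):
    """Cosine similarity: dot product divided by the two vector lengths."""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_v * norm_w)


def center(v):
    """Subtract the mean of v from each of its components."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def pearson(v, w):
    """Pearson correlation, computed as the cosine of the centered vectors."""
    return cosine(center(v), center(w))


# Sample vectors, assumed for illustration only.
v = [1.0, 2.0, 3.0, 5.0]
w = [2.0, 1.0, 4.0, 6.0]
shifted = [x + 1.0 for x in v]  # v shifted by a constant

print(cosine(v, w), cosine(shifted, w))    # the cosine changes under the shift
print(pearson(v, w), pearson(shifted, w))  # the correlation does not
```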

Correlation metrics: an application
Hierarchical clustering: clustering of vectors as a way to visualize which words are most similar to others (Rohde et al., 2006).
[Figure: hierarchical clustering used to visualize 4 noun classes from the embeddings produced by Rohde et al. (2006). These embeddings use a window size of 4 and 14,000 dimensions, with 157 closed-class words removed. The clustering uses correlation as the similarity function.]

Cosine similarity: an example
Depending on the application, the cosine similarity can be used on the Term-Document matrix or on the Term-Term matrix (it can be used in any vector space).
Let's see how the cosine computes which of the words apricot or digital is closer in meaning to information, using just raw counts from the following simplified table:
[Table: raw co-occurrence counts for apricot, digital, and information]
The model decides that information is closer to digital than it is to apricot, a result that seems sensible.
Recall: the smaller the angle between two vectors, the higher the cosine, and the more similar the vectors.
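A comparison of this kind can be sketched as below. The counts are illustrative values assumed for this sketch (three hypothetical context words: large, data, computer); they are not claimed to be the numbers in the slide's table.

```python
import math


def cosine(v, w):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_v * norm_w)


# Illustrative raw counts over the contexts (large, data, computer).
vectors = {
    "apricot":     [2, 0, 0],
    "digital":     [0, 1, 2],
    "information": [1, 6, 1],
}

cos_apricot = cosine(vectors["apricot"], vectors["information"])
cos_digital = cosine(vectors["digital"], vectors["information"])

print(round(cos_apricot, 2))  # apricot vs information
print(round(cos_digital, 2))  # digital vs information
```

With these assumed counts, information shares its large values with digital (the data and computer dimensions) rather than with apricot, so the cosine with digital is the higher of the two.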

Applications
The Term-Document matrix is arranged with word types as rows and documents as columns. It can be useful in assessing the meaning of words.
Depending on the application, it can be useful to consider the transpose of this matrix, i.e. the Document-Term matrix, which has the documents as rows and the word types as columns.
The choice depends on the task we would like to accomplish. For example, in machine learning (ML) and data analysis we may want to classify documents (e.g. in sentiment analysis).
Treating the Document-Term matrix as a dataset, word types are considered as variables (or features, in ML jargon) and documents as measurements (or patterns, in ML and pattern recognition jargon).

Conclusions
Text mining applications are heavily based on vector semantics.
Vector semantics works with vector space models, which transform plain text into a suitable mathematical space (a vector space), making it possible to use the power of linear algebra and to measure the similarity or dissimilarity of objects.
As we will see, in machine learning the availability of vector data is of paramount importance, because it gives us features describing the objects, on which a suitable learning procedure can be run by a computer program.
For example, the term-document matrix is an important building block in social data mining, specifically in text analysis, and enables applications such as thematic analysis of content, text summarization, sentiment analysis, opinion mining, and so on.
However, we can imagine a similar matrix structure built from features other than words and documents. Features can be any information of interest that we want to use to represent objects extracted from social media platforms (age, gender, geolocation, hardware used, ...).


Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

Geometric View of Machine Learning Nearest Neighbor Classification. Slides adapted from Prof. Carpuat

Geometric View of Machine Learning Nearest Neighbor Classification. Slides adapted from Prof. Carpuat Geometric View of Machine Learning Nearest Neighbor Classification Slides adapted from Prof. Carpuat What we know so far Decision Trees What is a decision tree, and how to induce it from data Fundamental

More information

1 Information retrieval fundamentals

1 Information retrieval fundamentals CS 630 Lecture 1: 01/26/2006 Lecturer: Lillian Lee Scribes: Asif-ul Haque, Benyah Shaparenko This lecture focuses on the following topics Information retrieval fundamentals Vector Space Model (VSM) Deriving

More information

Systems of Linear Equations

Systems of Linear Equations Systems of Linear Equations As stated in Section G, Definition., a linear equation in two variables is an equation of the form AAAA + BBBB = CC, where AA and BB are not both zero. Such an equation has

More information

Mathematical Methods 2019 v1.2

Mathematical Methods 2019 v1.2 Examination This sample has been compiled by the QCAA to model one possible approach to allocating marks in an examination. It matches the examination mark allocations as specified in the syllabus (~ 60%

More information

CSE 494/598 Lecture-4: Correlation Analysis. **Content adapted from last year s slides

CSE 494/598 Lecture-4: Correlation Analysis. **Content adapted from last year s slides CSE 494/598 Lecture-4: Correlation Analysis LYDIA MANIKONDA HT TP://WWW.PUBLIC.ASU.EDU/~LMANIKON / **Content adapted from last year s slides Announcements Project-1 Due: February 12 th 2016 Analysis report:

More information

Lecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher

Lecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher Lecture 3 STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher Previous lectures What is machine learning? Objectives of machine learning Supervised and

More information

