Knowledge Discovery and Data Mining 1 (VO) (707.003)


1 Knowledge Discovery and Data Mining 1 (VO) (707.003)
Probabilistic Latent Semantic Analysis
Denis Helic, KTI, TU Graz
Jan 16, 2014

2 Big picture: KDDM
[Course overview diagram: mathematical tools (Probability Theory, Linear Algebra, Information Theory, Statistical Inference) and infrastructure (Map-Reduce) supporting the steps of the Knowledge Discovery Process (Preprocessing, Transformation, Data Mining).]

3 Outline
1. Introduction and Recap
2. Probabilistic Generative Models
3. Topic Models
4. Probabilistic Latent Semantic Analysis

4 Introduction and Recap / Short recap: SVD and LSA
Singular Value Decomposition: Let M ∈ R^(m×n) be a matrix and let r be the rank of M (the rank of a matrix is the largest number of linearly independent rows or columns). Then we can find matrices U, V, and Σ with the following properties:
- U ∈ R^(m×r) is a column-orthonormal matrix
- V ∈ R^(n×r) is a column-orthonormal matrix
- Σ ∈ R^(r×r) is a diagonal matrix
The matrix M can then be written as M = U Σ V^T.
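
The decomposition can be computed with any standard linear-algebra library. Below is a minimal sketch using numpy's np.linalg.svd on a small, hypothetical utility matrix (the matrix values are made up, not the lecture's example).

    import numpy as np

    # Hypothetical small utility matrix: rows are people, columns are movies
    M = np.array([[5, 5, 0, 0],
                  [4, 5, 0, 0],
                  [0, 0, 4, 5],
                  [0, 1, 5, 4]], dtype=float)

    # full_matrices=False gives the "thin" SVD with min(m, n) columns
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    Sigma = np.diag(s)

    # Verify the reconstruction M ~= U Sigma V^T
    print(np.allclose(M, U @ Sigma @ Vt))
    print("singular values:", s)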

5 Introduction and Recap / Short recap: SVD and LSA
[Figure from Mining Massive Datasets (Figure 11.5): the form of a singular-value decomposition, M = U Σ V^T, where V is an n×r column-orthonormal matrix (we always use V in its transposed form, so it is the rows of V^T that are orthonormal) and Σ is a diagonal matrix whose elements are called the singular values of M.]

6 Introduction and Recap / Short recap: SVD and LSA
Let M be a utility matrix with people's ratings of movies:
- The rows of M are people, the columns of M are movies.
- The rows of U are people, the columns of U are concepts: U connects people to concepts.
- The rows of V^T are concepts, the columns of V^T are movies: V connects movies to concepts.
- Σ represents the importance of the concepts.

7 Introduction and Recap / Short recap: SVD and LSA
Let M be a term-document matrix with term occurrences in the documents:
- The rows of M are terms, the columns of M are documents.
- The rows of U are terms, the columns of U are concepts: U connects terms to concepts.
- The rows of V^T are concepts, the columns of V^T are documents: V connects documents to concepts.
- Σ represents the importance of the concepts.

8 Introduction and Recap / Short recap: SVD and LSA
- Vector Space Model: documents are represented as term vectors; cosine similarity is used to compute scores.
- The Vector Space Model cannot cope with two classic problems arising in natural languages:
  - Synonymy: two words having the same meaning
  - Polysemy: one word having multiple meanings

9 Introduction and Recap / Short recap: SVD and LSA
- In latent semantic analysis (LSA), or latent semantic indexing (LSI), we use the SVD to create a low-rank approximation of the term-document matrix.
- We select the k largest singular values and build a rank-k approximation M_k of the original matrix.
- We thus map each term and each document to a k-dimensional space of concepts.
- These concepts are hidden (latent) in the collection; they represent the semantics of the terms and documents, e.g. their topics.
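
As a rough illustration, the rank-k approximation keeps only the k largest singular values. The sketch below uses numpy on a random, hypothetical term-document matrix (not the lecture's data).

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.random((500, 100))       # hypothetical: rows are terms, columns are documents

    k = 20
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation

    # Each document is now described by a k-dimensional concept vector
    doc_concepts = Vt[:k, :].T       # shape (n_documents, k)
    term_concepts = U[:, :k]         # shape (n_terms, k)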

10 Introduction and Recap / Short recap: SVD and LSA
- By computing a low-rank approximation of the original term-document matrix, the SVD brings together terms with similar co-occurrences.
- Retrieval quality may actually be improved by the approximation!
- As we reduce k, recall improves; a value of k in the low hundreds tends to increase precision as well (this suggests that a suitable k addresses some of the challenges of synonymy).
- Retrieval works by folding the query into the low-rank space (see the sketch below).
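
A minimal sketch of query folding, assuming the standard LSI folding formula q_k = Σ_k^-1 U_k^T q; the matrix, the query terms and the helper functions are hypothetical, not taken from the slides.

    import numpy as np

    def fold_in_query(q, U_k, s_k):
        """Map a term-space query vector q into the k-dimensional concept space."""
        return np.diag(1.0 / s_k) @ U_k.T @ q

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    rng = np.random.default_rng(1)
    M = rng.random((1000, 200))      # hypothetical term-document matrix
    k = 50
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

    q = np.zeros(1000)
    q[[3, 17, 42]] = 1.0             # hypothetical query terms
    q_k = fold_in_query(q, U_k, s_k)
    scores = [cosine(q_k, Vt_k[:, j]) for j in range(Vt_k.shape[1])]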

11 Introduction and Recap / Disadvantages of LSA
- A statistical foundation is missing: the SVD assumes normally distributed data, but term occurrences are not normally distributed.
- Still, it often works remarkably well! Why?

12 Introduction and Recap / Disadvantages of LSA
- A statistical foundation is missing: the SVD assumes normally distributed data, but term occurrences are not normally distributed.
- Still, it often works remarkably well! Why?
- The matrix entries are weighted (e.g. with tf-idf), and those weighted entries may be approximately normally distributed.

13 Probabilistic Generative Models / Recap: Model-based methods
- Statistical inference is based on fitting a probabilistic model to the data.
- The idea rests on a probabilistic, or generative, model: such models assign a probability to observing specific data examples, e.g. observing the words in a text document.
- Generative models are a powerful method to encode specific assumptions about how unknown parameters interact to create the data.

14 Probabilistic Generative Models / Recap: Generative models
- How does a generative model work? It defines a conditional probability distribution over the data given a hypothesis, P(D | h).
- Given h, we generate data from the conditional distribution P(D | h).
- Generative models have many advantages; the main disadvantage is that fitting such models can be more complicated than an algorithmic approach.
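
A minimal sketch of the generate-from-P(D | h) idea, with a made-up vocabulary and made-up parameters h (not an example from the slides).

    import numpy as np

    rng = np.random.default_rng(0)

    vocabulary = ["school", "students", "tax", "finance", "university"]  # hypothetical
    h = np.array([0.3, 0.3, 0.1, 0.1, 0.2])   # hypothetical parameters of the hypothesis

    def generate_document(h, length=10):
        """Draw `length` words i.i.d. from P(w | h)."""
        return list(rng.choice(vocabulary, size=length, p=h))

    D = [generate_document(h) for _ in range(3)]   # a small generated corpus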

15 Probabilistic Generative Models / Recap: Inference
- (Statistical) inference is the reverse of the generation process.
- We are given some data D, e.g. a collection of documents.
- We want to estimate the model, or more precisely the parameters of the hypothesis h, that is most likely to have generated the data.
[Diagram: generation goes from h to D via P(D | h); inference goes back from the observed D to h.]

16 Probabilistic Generative Models / Recap: Naive Bayes document models
- We discussed generative models in connection with Naive Bayes classification.
- We introduced the multinomial generative model and the Bernoulli generative model.
- In the multinomial model we assume that the documents are generated from a multinomial distribution, i.e. the numbers of occurrences of terms in a document form a multinomial random variable.
- In the Bernoulli model we assume that the documents are generated from a multivariate Bernoulli distribution.
- The distributions are conditioned on the document class (see the sketch below).
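
A minimal sketch contrasting the two class-conditional document models for a single class; the vocabulary size and parameters are hypothetical.

    import numpy as np

    rng = np.random.default_rng(1)

    V = 6                                           # hypothetical vocabulary size
    p_w_given_class = rng.dirichlet(np.ones(V))     # multinomial parameters for one class
    p_present_given_class = rng.uniform(size=V)     # Bernoulli parameters for one class

    # Multinomial model: the term counts of a document of length 20 are multinomial
    doc_counts = rng.multinomial(n=20, pvals=p_w_given_class)

    # Multivariate Bernoulli model: each term is either present (1) or absent (0)
    doc_presence = rng.binomial(n=1, p=p_present_given_class)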

17 Topic Models / Topic models
- The document class is something that we observe in our data (at least in the training data).
- Other observable entities: documents and words.
- However, there are some entities which are present but not observable, i.e. they are hidden, or latent, e.g. the concepts in LSA.
- Let us call those entities topics.

18 Topic Models / Topic models
- A topic model is a probabilistic generative model that we can use to generate the observable data, i.e. documents and words.
- In the other direction: inference. When we observe a specific data instance we can infer the model.
- Probabilistic model: we will have joint probability distributions; typically we will work with conditional probability distributions.

19 Topic Models / Probabilistic topic models
- Each document is a probability distribution over topics; this distribution over topics represents the essence, the body, or the gist of a given document.
- Each topic is a probability distribution over words.
- Topic "Education": School, Students, Education, University, ...
- Topic "Budget": Million, Finance, Tax, Program, ...

20 Topic Models / Document generation process
1. For each document d choose a mixture of topics z.
2. For every word slot, draw a topic from the mixture with probability p(z | d).
3. Then draw a word from that topic with probability p(w | z).
A small sketch of this process follows below.
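
A minimal sketch of this generation process with made-up topics and a made-up topic mixture for one document (the parameters are illustrative, not from the lecture).

    import numpy as np

    rng = np.random.default_rng(2)

    topics = {
        "Education": {"school": 0.4, "students": 0.3, "university": 0.3},
        "Budget":    {"million": 0.4, "finance": 0.3, "tax": 0.3},
    }
    p_z_given_d = {"Education": 0.7, "Budget": 0.3}   # topic mixture of one document d

    def generate_word():
        # Draw a topic z from p(z | d), then a word w from p(w | z)
        z = rng.choice(list(p_z_given_d), p=list(p_z_given_d.values()))
        words, probs = zip(*topics[z].items())
        return rng.choice(words, p=probs)

    document = [generate_word() for _ in range(15)]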

21 Topic Models / Document generation process
[Figure: illustration of the document generation process, from slides by Thomas Hofmann]

22 Probabilistic Latent Semantic Analysis / Document generation process
[Figure from the LDA paper by Blei et al. (Figure 3): graphical model representations of different models of discrete data, (b) the mixture of unigrams model and (c) the pLSI/aspect model, with plates over the N words and the M documents.]

23 Probabilistic Latent Semantic Analysis / Distributions
- We are interested in the joint probability of the observable variables: p(d, w).
- However, the model gives the joint probability of the observable and latent variables, p(d, w, z).
- Thus, we have to marginalize over z to obtain p(d, w):
  p(d, w) = Σ_z p(d, w, z) = Σ_z p(d, w | z) p(z)

24 Probabilistic Latent Semantic Analysis / Recap: Conditional independence
Definition: Suppose P(C) > 0. Events A and B are conditionally independent given C if:
  P(A ∩ B | C) = P(A | C) P(B | C)

25 Probabilistic Latent Semantic Analysis / Distributions
- We made the same assumption in Naive Bayes classification.
- Documents and words are conditionally independent given the topic:
  p(d, w | z) = p(d | z) p(w | z)
  p(d, w) = Σ_z p(d | z) p(w | z) p(z)

26 Probabilistic Latent Semantic Analysis / Distributions
  p(d, w) = Σ_z p(d | z) p(w | z) p(z)
- This is the symmetric formulation of pLSA.
- We select a topic z, then with probability p(d | z) a document d, and then with probability p(w | z) the words for that document.
- We repeat the process for all documents (see the sketch below).
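
A minimal sketch, with random hypothetical parameters, of the symmetric formulation written as a matrix product (analogous to U Σ V^T).

    import numpy as np

    rng = np.random.default_rng(3)
    n_docs, n_words, n_topics = 5, 8, 3

    p_z = rng.dirichlet(np.ones(n_topics))                     # p(z)
    p_d_given_z = rng.dirichlet(np.ones(n_docs), n_topics).T   # shape (n_docs, n_topics)
    p_w_given_z = rng.dirichlet(np.ones(n_words), n_topics).T  # shape (n_words, n_topics)

    # Joint distribution over all (d, w) pairs
    p_dw = p_d_given_z @ np.diag(p_z) @ p_w_given_z.T
    print(np.isclose(p_dw.sum(), 1.0))   # a proper joint distribution sums to 1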

27 Probabilistic Latent Semantic Analysis / Distributions
- We can reformulate the last equation.
- Let us see what p(d, z) is, again using the assumption that d and w are independent given z:
  p(d, z) = p(z) p(d | z) = p(d) p(z | d)

28 Probabilistic Latent Semantic Analysis / Distributions
We can now substitute into the symmetric equation:
  p(d, w) = Σ_z p(d | z) p(w | z) p(z)
          = Σ_z p(z | d) p(w | z) p(d)
          = p(d) Σ_z p(z | d) p(w | z)

29 Probabilistic Latent Semantic Analysis / Distributions
- This is the asymmetric formulation.
- Thus, we first pick a document with p(d) and then select all words for that document from p(w | d), given by:
  p(d, w) = p(w | d) p(d), where p(w | d) = Σ_z p(w | z) p(z | d)
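
A minimal sketch showing that the asymmetric formulation, obtained via Bayes' rule p(z | d) = p(z) p(d | z) / p(d), yields the same joint distribution as the symmetric one; the parameters are the same hypothetical random ones as in the previous sketch.

    import numpy as np

    rng = np.random.default_rng(3)
    n_docs, n_words, n_topics = 5, 8, 3
    p_z = rng.dirichlet(np.ones(n_topics))
    p_d_given_z = rng.dirichlet(np.ones(n_docs), n_topics).T
    p_w_given_z = rng.dirichlet(np.ones(n_words), n_topics).T

    p_d = p_d_given_z @ p_z                                # marginal p(d)
    p_z_given_d = (p_d_given_z * p_z) / p_d[:, None]       # Bayes' rule, rows sum to 1
    p_w_given_d = p_z_given_d @ p_w_given_z.T              # p(w | d) = sum_z p(z|d) p(w|z)

    p_dw_asym = p_d[:, None] * p_w_given_d
    p_dw_sym = p_d_given_z @ np.diag(p_z) @ p_w_given_z.T
    print(np.allclose(p_dw_asym, p_dw_sym))                # the two formulations agree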

30 Probabilistic Latent Semantic Analysis / pLSA decomposition
  p(w_i | d_j) = Σ_{k=1}^{K} p(w_i | z_k) p(z_k | d_j)
[Figure: matrix view of the pLSA decomposition, from slides by Josef Sivic]

31 Probabilistic Latent Semantic Analysis / pLSA comparison with SVD
  p(d, w) = Σ_z p(w | z) p(z) p(d | z)
[Figure from Mining Massive Datasets (Figure 11.5): the form of a singular-value decomposition, M = U Σ V^T, shown for comparison with the pLSA decomposition.]

32 Probabilistic Latent Semantic Analysis / pLSA comparison with SVD
- Word probabilities given topics, p(w | z): matrix U
- Document probabilities given topics, p(d | z): matrix V
- Topic probabilities, p(z): matrix Σ
- Difference: the values in all matrices are normalized and non-negative; they are probabilities.

33 Probabilistic Latent Semantic Analysis / Parameter inference
- We will infer the parameters using the Maximum Likelihood Estimator (MLE).
- First, we need to write down the likelihood function.
- Let n(w_i, d_j) be the number of occurrences of word w_i in document d_j.
- p(w_i, d_j) is the probability of observing a single occurrence of word w_i in document d_j.
- Then the probability of observing n(w_i, d_j) occurrences of word w_i in document d_j is given by: p(w_i, d_j)^n(w_i, d_j)

34 Probabilistic Latent Semantic Analysis / Parameter inference
- The probability of observing the complete document collection is given by the product of the probabilities of observing every single word in every document, each raised to the corresponding number of occurrences.
- That is the likelihood:
  L = Π_{i=1}^{m} Π_{j=1}^{n} p(w_i, d_j)^n(w_i, d_j)
- Taking the logarithm gives the log-likelihood:
  log L = Σ_{i=1}^{m} Σ_{j=1}^{n} n(w_i, d_j) log p(w_i, d_j)
        = Σ_{i=1}^{m} Σ_{j=1}^{n} n(w_i, d_j) log ( Σ_{l=1}^{K} p(w_i | z_l) p(z_l) p(d_j | z_l) )
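
A minimal sketch of evaluating this log-likelihood for hypothetical random counts and parameters (not the lecture's data).

    import numpy as np

    rng = np.random.default_rng(4)
    n_words, n_docs, n_topics = 8, 5, 3

    n_wd = rng.integers(0, 5, size=(n_words, n_docs))            # word-document counts
    p_z = rng.dirichlet(np.ones(n_topics))
    p_w_given_z = rng.dirichlet(np.ones(n_words), n_topics).T    # (n_words, n_topics)
    p_d_given_z = rng.dirichlet(np.ones(n_docs), n_topics).T     # (n_docs, n_topics)

    p_wd = p_w_given_z @ np.diag(p_z) @ p_d_given_z.T            # p(w_i, d_j)
    log_likelihood = np.sum(n_wd * np.log(p_wd))
    print(log_likelihood)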

35 Probabilistic Latent Semantic Analysis / EM algorithm
- We cannot maximize the likelihood analytically because of the logarithm of the sum.
- A standard procedure is to use the Expectation-Maximization (EM) algorithm.
- This is an iterative method to estimate the parameters of models with latent variables.
- Each iteration consists of two steps: an expectation step (E) and a maximization step (M).

36 Probabilistic Latent Semantic Analysis / EM algorithm
- In the E step we create a function for the expectation of the log-likelihood using the current parameter estimates.
- In the M step we compute the parameters which maximize this expected log-likelihood.
- These parameter estimates are then used to determine the distribution of the latent variables in the next E step.
- Let us illustrate the EM algorithm in the general case.

37 Probabilistic Latent Semantic Analysis / EM algorithm
- We observe some data D generated by a probabilistic model with parameters θ and some latent variables z.
- We are interested in the likelihood of the data D given the parameters θ: p(D | θ).
- However, the model specifies a joint probability distribution of the data D and the latent variables z: p(D, z | θ).
- Thus, to obtain p(D | θ) we have to marginalize out z:
  p(D | θ) = Σ_z p(D | z, θ) p(z | θ)

38 Probabilistic Latent Semantic Analysis / EM algorithm
- We are now interested in maximizing this likelihood, which is equivalent to maximizing the log-likelihood:
  log p(D | θ) = log ( Σ_z p(D | z, θ) p(z | θ) )
- Jensen's inequality for concave functions such as log gives us: E[f(x)] ≤ f(E[x])

39 Probabilistic Latent Semantic Analysis / EM algorithm
Introduce an arbitrary distribution q(z) over the latent variables:
  log ( Σ_z p(D | z, θ) p(z | θ) )
    = log ( Σ_z q(z) · p(D | z, θ) p(z | θ) / q(z) )
    = log ( E_q[ p(D | z, θ) p(z | θ) / q(z) ] )
By Jensen's inequality this is greater than or equal to:
  log ( E_q[ p(D | z, θ) p(z | θ) / q(z) ] ) ≥ E_q[ log ( p(D | z, θ) p(z | θ) / q(z) ) ]

40 Probabilistic Latent Semantic Analysis / EM algorithm
- E_q[ log ( p(D | z, θ) p(z | θ) / q(z) ) ] is therefore a lower bound on the log-likelihood.
- Thus, we can maximize this lower bound.
- The EM algorithm maximizes exactly this lower bound.

41 Probabilistic Latent Semantic Analysis / EM algorithm
  E_q[ log ( p(D | z, θ) p(z | θ) / q(z) ) ]
    = Σ_z q(z) log ( p(D | z, θ) p(z | θ) / q(z) )
    = Σ_z q(z) log ( p(z | D, θ) p(D | θ) / q(z) )
    = Σ_z q(z) log p(D | θ) + Σ_z q(z) log ( p(z | D, θ) / q(z) )
    = log p(D | θ) - Σ_z q(z) log ( q(z) / p(z | D, θ) )
- This has its maximum when Σ_z q(z) log ( q(z) / p(z | D, θ) ) = 0.
- This is the case when q(z) = p(z | D, θ).

42 Probabilistic Latent Semantic Analysis / EM algorithm
- p(z | D, θ) is the posterior of z.
- Σ_z q(z) log ( q(z) / p(z | D, θ) ) is the Kullback-Leibler (KL) divergence between q(z) and that posterior.
- Thus, in the E step we use the current values of the parameters to calculate the posterior of z.
- The M step is then problem dependent.

43 Probabilistic Latent Semantic Analysis / EM algorithm for pLSA
E step (posterior of the topics):
  p(z | w, d) = p(z, w, d) / p(w, d)
              = p(d) p(z | d) p(w | z) / Σ_z' p(d) p(z' | d) p(w | z')
              = p(z | d) p(w | z) / Σ_z' p(z' | d) p(w | z')
M step (re-estimation of the parameters from the expected counts):
  p(w | z) ∝ Σ_d n(d, w) p(z | d, w)
  p(z | d) ∝ Σ_w n(d, w) p(z | d, w)
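
A minimal sketch of the full EM loop for pLSA in the asymmetric parameterization p(w | z), p(z | d). This is not the lecture's notebook code; the toy count matrix and the function name are made up.

    import numpy as np

    def plsa_em(n_wd, n_topics, n_iter=50, seed=0):
        """EM for pLSA on a word-document count matrix n_wd (words x documents)."""
        rng = np.random.default_rng(seed)
        n_words, n_docs = n_wd.shape
        p_w_given_z = rng.dirichlet(np.ones(n_words), n_topics).T   # (n_words, n_topics)
        p_z_given_d = rng.dirichlet(np.ones(n_topics), n_docs).T    # (n_topics, n_docs)

        for _ in range(n_iter):
            # E step: posterior p(z | w, d) proportional to p(w | z) p(z | d)
            post = p_w_given_z[:, :, None] * p_z_given_d[None, :, :]   # (w, z, d)
            post /= post.sum(axis=1, keepdims=True) + 1e-12

            # M step: re-estimate the parameters from the expected counts
            expected = n_wd[:, None, :] * post                         # n(d, w) p(z | d, w)
            p_w_given_z = expected.sum(axis=2)                         # sum over documents
            p_w_given_z /= p_w_given_z.sum(axis=0, keepdims=True)
            p_z_given_d = expected.sum(axis=0)                         # sum over words
            p_z_given_d /= p_z_given_d.sum(axis=0, keepdims=True)

        return p_w_given_z, p_z_given_d

    # Hypothetical toy corpus: 6 words x 4 documents
    n_wd = np.array([[3, 2, 0, 0],
                     [2, 3, 0, 1],
                     [0, 0, 4, 3],
                     [0, 1, 3, 4],
                     [1, 1, 1, 1],
                     [2, 0, 0, 2]], dtype=float)
    p_w_given_z, p_z_given_d = plsa_em(n_wd, n_topics=2)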

44 Probabilistic Latent Semantic Analysis / Example
IPython Notebook examples. Slightly modified code from: http://kti.tugraz.at/staff/denis/courses/kddm1/plsa.ipynb
Command line:
    ipython notebook --pylab=inline plsa.ipynb

45 Probabilistic Latent Semantic Analysis / Example: User-Movie Matrix
[Table: a user-movie rating matrix with the users Joe, Jim, John, Jack, Jill, Jenny and Jane as rows and the movies Alien, Star Wars, Casablanca and Titanic as columns; the rating values are not preserved in this transcription.]

46 Probabilistic Latent Semantic Analysis / Example: User-Movie Matrix
[Table: the same user-movie rating matrix, repeated.]

47 Probabilistic Latent Semantic Analysis / Example
[Figure from Hofmann, 2000: eight selected factors ("segment 1", "segment 2", "matrix 1", "matrix 2", "line 1", "line 2", "power 1", "power 2") from a 128-factor decomposition. The displayed word stems are the 10 most probable words in the class-conditional distribution P(w | z), from top to bottom in descending order. Two example documents show the posterior P(z_k | d, w = "segment") concentrating on different "segment" factors (0.951 for a medical image segmentation document vs. 0.867 for the other), illustrating word sense disambiguation.]

48 Probabilistic Latent Semantic Analysis / Performance
In IR, the performance of a retrieval system based on this model (PLSI) was typically found superior to both the vector space model (cosine similarity) and the non-probabilistic latent semantic indexing (LSI) method. (We skip the details here.) From Th. Hofmann, 2000.
[Figure: retrieval performance comparison, from Hofmann, 2000]
