Knowledge Discovery and Data Mining 1 (VO) (707.003)
1 Knowledge Discovery and Data Mining 1 (VO) (707.003). Probabilistic Latent Semantic Analysis. Denis Helic, KTI, TU Graz, Jan 16, 2014
2 Big picture: KDDM. Figure: the knowledge discovery process (preprocessing, transformation, data mining) builds on mathematical tools (probability theory, linear algebra, information theory, statistical inference) and on infrastructure (Map-Reduce).
3 Outline: 1. Introduction and Recap, 2. Probabilistic Generative Models, 3. Topic Models, 4. Probabilistic Latent Semantic Analysis.
4 Introduction and Recap. Short recap: SVD and LSA. Singular Value Decomposition: let $M \in \mathbb{R}^{m \times n}$ be a matrix and let r be the rank of M (the rank of a matrix is the largest number of linearly independent rows or columns). Then we can find matrices U, V, and Σ with the following properties: $U \in \mathbb{R}^{m \times r}$ is a column-orthonormal matrix, $V \in \mathbb{R}^{n \times r}$ is a column-orthonormal matrix, and $\Sigma \in \mathbb{R}^{r \times r}$ is a diagonal matrix. The matrix M can then be written as $M = U \Sigma V^T$.
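A minimal NumPy sketch of such a decomposition (the matrix values are made up for illustration):

```python
import numpy as np

# Small illustrative matrix (values are made up)
M = np.array([[1.0, 1.0, 0.0],
              [3.0, 3.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 4.0]])

# Reduced ("economy") SVD: with full_matrices=False the shared dimension is
# min(m, n); keeping only the nonzero singular values gives the rank-r form.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
Sigma = np.diag(s)

print(np.allclose(U.T @ U, np.eye(U.shape[1])))  # U is column-orthonormal
print(np.allclose(M, U @ Sigma @ Vt))            # M = U Sigma V^T
```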
5 Introduction and Recap. Short recap: SVD and LSA. Note that we always use V in its transposed form, so it is the rows of V^T that are orthonormal. Σ is diagonal (all elements not on the main diagonal are 0), and its elements are called the singular values of M. Figure: the form of a singular-value decomposition, $M = U \Sigma V^T$ (figure from Mining Massive Datasets).
6 Introduction and Recap. Short recap: SVD and LSA. Let M be a utility matrix with people's ratings for movies. The rows of M are people, the columns of M are movies. The rows of U are people, the columns of U are concepts: U connects people to concepts. The rows of V^T are concepts, the columns of V^T are movies: V connects movies to concepts. Σ represents the importance of the concepts.
7 Introduction and Recap. Short recap: SVD and LSA. Let M be a term-document matrix with term occurrences in the documents. The rows of M are terms, the columns of M are documents. The rows of U are terms, the columns of U are concepts: U connects terms to concepts. The rows of V^T are concepts, the columns of V^T are documents: V connects documents to concepts. Σ represents the importance of the concepts.
8 Introduction and Recap. Short recap: SVD and LSA. Vector Space Model: documents are represented as term vectors and cosine similarity is used to compute scores. The Vector Space Model cannot cope with two classic problems arising in natural languages: synonymy (two words having the same meaning) and polysemy (one word having multiple meanings).
9 Introduction and Recap. Short recap: SVD and LSA. In latent semantic analysis (LSA), or latent semantic indexing (LSI), we use SVD to create a low-rank approximation of the term-document matrix. We select the k largest singular values and create an approximation M_k to the original matrix. We thus map each term/document to a k-dimensional space of concepts. These concepts are hidden (latent) in the collection; they represent the semantics of the terms and documents, e.g. the topics of terms and documents.
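A short sketch of forming the rank-k approximation by keeping only the k largest singular values (the toy term-document counts are made up):

```python
import numpy as np

def low_rank_approx(M, k):
    """Rank-k approximation M_k: keep only the k largest singular values of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Toy term-document matrix: rows are terms, columns are documents
M = np.array([[2.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 3.0],
              [0.0, 1.0, 0.0, 2.0]])

M_2 = low_rank_approx(M, k=2)
print(M_2.round(2))  # terms and documents now live in a 2-dimensional concept space
```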
10 Introduction and Recap. Short recap: SVD and LSA. By computing a low-rank approximation of the original term-document matrix, the SVD brings together terms with similar co-occurrences. Retrieval quality may actually be improved by the approximation! As we reduce k, recall improves. A value of k in the low hundreds tends to increase precision as well (this suggests that a suitable k addresses some of the challenges of synonymy). Retrieval is done by folding the query into the low-rank space.
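Folding can be sketched as follows: the query term vector q is mapped into the concept space as $\hat{q} = \Sigma_k^{-1} U_k^T q$ and compared with the documents there (toy data, assuming the usual LSI folding formula):

```python
import numpy as np

# Toy term-document matrix: rows are terms, columns are documents (counts are made up)
M = np.array([[2.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 3.0],
              [0.0, 1.0, 0.0, 2.0]])

k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Fold the query into the k-dimensional concept space: q_hat = Sigma_k^{-1} U_k^T q
q = np.array([1.0, 1.0, 0.0, 0.0])   # hypothetical query using the first two terms
q_hat = np.diag(1.0 / s_k) @ U_k.T @ q

# Rank documents by cosine similarity in the concept space
docs = Vt_k.T                        # one row per document
cos = docs @ q_hat / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat))
print(cos.round(3))
```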
11 Introduction and Recap. Disadvantages of LSA. A statistical foundation is missing: SVD assumes normally distributed data, but term occurrences are not normally distributed. Still, it often works remarkably well! Why?
12 Introduction and Recap. Disadvantages of LSA. A statistical foundation is missing: SVD assumes normally distributed data, but term occurrences are not normally distributed. Still, it often works remarkably well! Why? Matrix entries are weighted (e.g. tf-idf) and those weighted entries may be approximately normally distributed.
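As an aside, one common variant of the tf-idf weighting mentioned above looks like this (toy counts, and just one of several tf-idf definitions):

```python
import numpy as np

# Toy term-document count matrix: rows are terms, columns are documents
counts = np.array([[2.0, 0.0, 1.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 3.0, 1.0]])

n_docs = counts.shape[1]
df = (counts > 0).sum(axis=1)                     # document frequency of each term
idf = np.log(n_docs / df)                         # inverse document frequency
tf = counts / counts.sum(axis=0, keepdims=True)   # term frequency, normalized per document

tfidf = tf * idf[:, None]                         # weighted entries replace the raw counts
print(tfidf.round(3))
```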
13 Probabilistic Generative Models. Recap: Model-based methods. Statistical inference is based on fitting a probabilistic model to the data. The idea is based on a probabilistic, or generative, model. Such models assign a probability to observing specific data examples, e.g. observing words in a text document. Generative models are a powerful method to encode specific assumptions about how unknown parameters interact to create data.
14 Probabilistic Generative Models. Recap: Generative models. How does a generative model work? It defines a conditional probability distribution over the data given a hypothesis, P(D|h). Given h, we generate data from the conditional distribution P(D|h). Generative models have many advantages; the main disadvantage is that fitting the models can be more complicated than an algorithmic approach.
15 Probabilistic Generative Models. Recap: Inference. (Statistical) inference is the reverse of the generation process. We are given some data D, e.g. a collection of documents, and we want to estimate the model, or more precisely the parameters of the hypothesis h, that are most likely to have generated the data. Figure: generation runs from h to D via P(D|h); inference runs in the opposite direction, from D back to h.
16 Probabilistic Generative Models. Recap: Naive Bayes document models. We discussed generative models in connection with Naive Bayes classification. We introduced the multinomial generative model and the Bernoulli generative model. In the multinomial model we assume that documents are generated from a multinomial distribution, i.e. the number of occurrences of terms in a document is a multinomial random variable. In the Bernoulli model we assume that documents are generated from a multivariate Bernoulli distribution. The distributions were conditioned on the document class.
17 Topic Models. Topic models. The document class is something that we observe in our data (at least in the training data). Other observable entities: documents and words. However, there are some entities which are present but not observable, i.e. they are hidden; they are latent, e.g. the concepts in LSA. Let us call those entities topics.
18 Topic Models. Topic models. A topic model is a probabilistic generative model that we can use to generate the observable data, i.e. documents and words. In the other direction we have inference: when we observe a specific data instance we can infer the model. Probabilistic model: we will have joint probability distributions, and typically we will work with conditional probability distributions.
19 Topic Models. Probabilistic topic models. Each document is a probability distribution over topics; the distribution over topics represents the essence, the body, or the gist of a given document. Each topic is a probability distribution over words. Topic "Education": school, students, education, university, ... Topic "Budget": million, finance, tax, program, ...
20 Topic Models. Document generation process. 1. For each document d choose a mixture of topics z. 2. For every word slot draw a topic from the mixture with probability p(z|d). 3. Then draw a word from that topic with probability p(w|z).
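A minimal sketch of this generative process (the topics, vocabulary, and probabilities are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["school", "students", "university", "million", "finance", "tax"]
# p(w|z): one word distribution per topic (rows sum to 1)
p_w_given_z = np.array([[0.40, 0.30, 0.25, 0.02, 0.02, 0.01],   # "Education" topic
                        [0.02, 0.02, 0.01, 0.35, 0.30, 0.30]])  # "Budget" topic
# p(z|d): the topic mixture of one hypothetical document
p_z_given_d = np.array([0.7, 0.3])

doc = []
for _ in range(10):                      # 10 word slots
    z = rng.choice(2, p=p_z_given_d)     # draw a topic from the document's mixture
    w = rng.choice(6, p=p_w_given_z[z])  # draw a word from that topic
    doc.append(vocab[w])
print(doc)
```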
21 Topic Models. Document generation process. Figure: figure from slides by Thomas Hofmann.
22 Probabilistic Latent Semantic Analysis. Document generation process. Figure: graphical model representations of models of discrete data, (b) the mixture of unigrams model and (c) the pLSI/aspect model, in plate notation with document d, topic variable z, word w, and plate sizes N and M; figure from the LDA paper by Blei et al.
23 Probabilistic Latent Semantic Analysis. Distributions. We are interested in the joint probability of the observable variables, p(d, w). However, we have a joint probability of the observable and the latent variables, p(d, w, z). Thus, we have to marginalize over z to obtain p(d, w):
$p(d, w) = \sum_z p(d, w, z) = \sum_z p(d, w \mid z)\, p(z)$
24 Probabilistic Latent Semantic Analysis. Recap: Conditional independence. Definition: suppose P(C) > 0. Events A and B are conditionally independent given C if:
$P(A \cap B \mid C) = P(A \mid C)\, P(B \mid C)$
25 Probabilistic Latent Semantic Analysis. Distributions. We made the same assumption in Naive Bayes classification. Documents and words are conditionally independent given the topic:
$p(d, w \mid z) = p(d \mid z)\, p(w \mid z)$
$p(d, w) = \sum_z p(d \mid z)\, p(w \mid z)\, p(z)$
26 Probabilistic Latent Semantic Analysis. Distributions.
$p(d, w) = \sum_z p(d \mid z)\, p(w \mid z)\, p(z)$
This is the symmetric formulation of pLSA: we select a topic z, then with probability p(d|z) a document d, and then with probability p(w|z) the words for that document. We repeat the process for all documents.
27 Probabilistic Latent Semantic Analysis. Distributions. We can reformulate the last equation. Let us see what p(d, z) is, writing the joint probability in both orders:
$p(d, z) = p(z)\, p(d \mid z) = p(d)\, p(z \mid d)$
28 Probabilistic Latent Semantic Analysis. Distributions. We can now substitute this in the symmetric equation:
$p(d, w) = \sum_z p(d \mid z)\, p(w \mid z)\, p(z) = \sum_z p(z \mid d)\, p(w \mid z)\, p(d) = p(d) \sum_z p(z \mid d)\, p(w \mid z)$
29 Probabilistic Latent Semantic Analysis. Distributions. This is the asymmetric formulation. Thus, we first pick a document with p(d) and then select all words for that document from p(w|d), given by:
$p(d, w) = p(w \mid d)\, p(d), \qquad p(w \mid d) = \sum_z p(w \mid z)\, p(z \mid d)$
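A small numeric check that the symmetric and asymmetric formulations describe the same joint distribution (all parameter values are made up):

```python
import numpy as np

# Made-up parameters for 3 documents, 2 topics, 4 words
p_d = np.array([0.5, 0.3, 0.2])                   # p(d)
p_z_given_d = np.array([[0.9, 0.1],               # p(z|d), one row per document
                        [0.4, 0.6],
                        [0.2, 0.8]])
p_w_given_z = np.array([[0.5, 0.3, 0.1, 0.1],     # p(w|z), one row per topic
                        [0.1, 0.1, 0.4, 0.4]])

# Asymmetric formulation: p(d,w) = p(d) * sum_z p(z|d) p(w|z)
p_dw_asym = p_d[:, None] * (p_z_given_d @ p_w_given_z)

# Symmetric formulation: p(d,w) = sum_z p(z) p(d|z) p(w|z), with p(z) and p(d|z) via Bayes' rule
p_z = p_d @ p_z_given_d                           # p(z) = sum_d p(d) p(z|d)
p_d_given_z = (p_z_given_d * p_d[:, None]) / p_z  # p(d|z) = p(z|d) p(d) / p(z)
p_dw_sym = (p_d_given_z * p_z) @ p_w_given_z

print(np.allclose(p_dw_asym, p_dw_sym))           # True: both give the same p(d,w)
print(round(p_dw_asym.sum(), 6))                  # 1.0: a proper joint distribution
```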
30 Probabilistic Latent Semantic Analysis. pLSA decomposition.
$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$
Figure: figure from slides by Josef Sivic.
31 Probabilistic Latent Semantic Analysis. pLSA comparison with SVD.
$p(d, w) = \sum_z p(w \mid z)\, p(z)\, p(d \mid z)$
Figure: the form of a singular-value decomposition, $M = U \Sigma V^T$ (figure from Mining Massive Datasets).
32 Probabilistic Latent Semantic Analysis. pLSA comparison with SVD. Word probabilities given topics, p(w|z): matrix U. Document probabilities given topics, p(d|z): matrix V. Topic probabilities, p(z): matrix Σ. Difference: in pLSA the values in all matrices are normalized and non-negative; they are probabilities.
33 Probabilistic Latent Semantic Analysis. Parameter inference. We will infer the parameters using the Maximum Likelihood Estimator (MLE). First, we need to write down the likelihood function. Let n(w_i, d_j) be the number of occurrences of word w_i in document d_j, and let p(w_i, d_j) be the probability of observing a single occurrence of word w_i in document d_j. Then, the probability of observing n(w_i, d_j) occurrences of word w_i in document d_j is given by:
$p(w_i, d_j)^{n(w_i, d_j)}$
34 Probabilistic Latent Semantic Analysis. Parameter inference. The probability of observing the complete document collection is then given by the product of the probabilities of observing every single word in every document with the corresponding number of occurrences. That is the likelihood:
$L = \prod_{i=1}^{m} \prod_{j=1}^{n} p(w_i, d_j)^{n(w_i, d_j)}$
Taking the logarithm gives the log-likelihood:
$\ell = \sum_{i=1}^{m} \sum_{j=1}^{n} n(w_i, d_j) \log p(w_i, d_j) = \sum_{i=1}^{m} \sum_{j=1}^{n} n(w_i, d_j) \log \Big( \sum_{l=1}^{K} p(w_i \mid z_l)\, p(z_l)\, p(d_j \mid z_l) \Big)$
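A small sketch of evaluating this log-likelihood for made-up counts and parameters:

```python
import numpy as np

# n[i, j] = number of occurrences of word w_i in document d_j (made-up counts)
n = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, 2.0],
              [1.0, 1.0, 0.0]])

# Made-up pLSA parameters: 3 words, 3 documents, 2 topics
p_z = np.array([0.6, 0.4])              # p(z_l)
p_w_given_z = np.array([[0.6, 0.1],     # p(w_i|z_l), one column per topic (columns sum to 1)
                        [0.3, 0.2],
                        [0.1, 0.7]])
p_d_given_z = np.array([[0.5, 0.2],     # p(d_j|z_l), one column per topic (columns sum to 1)
                        [0.3, 0.3],
                        [0.2, 0.5]])

# p(w_i, d_j) = sum_l p(w_i|z_l) p(z_l) p(d_j|z_l)
p_wd = (p_w_given_z * p_z) @ p_d_given_z.T

log_lik = (n * np.log(p_wd)).sum()      # sum_ij n(w_i, d_j) log p(w_i, d_j)
print(round(log_lik, 3))
```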
35 Probabilistic Latent Semantic Analysis. EM algorithm. We cannot maximize this likelihood analytically because of the logarithm of the sum. A standard procedure is to use an algorithm called Expectation-Maximization (EM). This is an iterative method to estimate the parameters of models with latent variables. Each iteration consists of two steps: an expectation step (E) and a maximization step (M).
36 Probabilistic Latent Semantic Analysis. EM algorithm. In the E step we create a function for the expectation of the log-likelihood using the current parameter estimates. In the M step we compute the parameters which maximize this expected log-likelihood. These parameter estimates are then used to determine the distribution of the latent variables in the next E step. Let us illustrate the EM algorithm in the general case.
37 Probabilistic Latent Semantic Analysis. EM algorithm. We observe some data D generated by a probabilistic model with parameters θ and some latent variables z. We are interested in the likelihood of the data D given the parameters θ: p(D|θ). However, there exists a joint probability distribution of the data D and the latent variables z: p(D, z|θ). Thus, to obtain p(D|θ) we have to marginalize out z:
$p(D \mid \theta) = \sum_z p(D \mid z, \theta)\, p(z \mid \theta)$
38 Probabilistic Latent Semantic Analysis. EM algorithm. We are now interested in maximizing this likelihood, which is equivalent to maximizing the log-likelihood:
$\log p(D \mid \theta) = \log \Big( \sum_z p(D \mid z, \theta)\, p(z \mid \theta) \Big)$
Jensen's inequality for concave functions such as log gives us $E[f(x)] \le f(E[x])$.
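A tiny numeric illustration of Jensen's inequality for the concave log function, E[log X] ≤ log E[X] (values are made up):

```python
import numpy as np

x = np.array([0.5, 2.0, 4.0])    # values of a positive random variable X
p = np.array([0.2, 0.5, 0.3])    # their probabilities (sum to 1)

lhs = (p * np.log(x)).sum()      # E[log X]
rhs = np.log((p * x).sum())      # log E[X]
print(round(lhs, 3), "<=", round(rhs, 3), lhs <= rhs)
```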
39 Probabilistic Latent Semantic Analysis. EM algorithm. Introducing an arbitrary distribution q(z) over the latent variables:
$\log \Big( \sum_z p(D \mid z, \theta)\, p(z \mid \theta) \Big) = \log \Big( \sum_z q(z) \frac{p(D \mid z, \theta)\, p(z \mid \theta)}{q(z)} \Big) = \log \Big( E_q\Big[ \frac{p(D \mid z, \theta)\, p(z \mid \theta)}{q(z)} \Big] \Big)$
By Jensen's inequality this is greater than or equal to:
$\log \Big( E_q\Big[ \frac{p(D \mid z, \theta)\, p(z \mid \theta)}{q(z)} \Big] \Big) \ge E_q\Big[ \log \frac{p(D \mid z, \theta)\, p(z \mid \theta)}{q(z)} \Big]$
40 Probabilistic Latent Semantic Analysis. EM algorithm. $E_q\big[ \log \frac{p(D \mid z, \theta)\, p(z \mid \theta)}{q(z)} \big]$ is then a lower bound on the log-likelihood. Thus, we can maximize this lower bound; the EM algorithm maximizes exactly this lower bound.
41 Probabilistic Latent Semantic Analysis. EM algorithm.
$E_q\Big[ \log \frac{p(D \mid z, \theta)\, p(z \mid \theta)}{q(z)} \Big] = \sum_z q(z) \log \frac{p(D \mid z, \theta)\, p(z \mid \theta)}{q(z)} = \sum_z q(z) \log \frac{p(z \mid D, \theta)\, p(D \mid \theta)}{q(z)}$
$= \sum_z q(z) \log p(D \mid \theta) + \sum_z q(z) \log \frac{p(z \mid D, \theta)}{q(z)} = \log p(D \mid \theta) - \sum_z q(z) \log \frac{q(z)}{p(z \mid D, \theta)}$
This lower bound attains its maximum when $\sum_z q(z) \log \frac{q(z)}{p(z \mid D, \theta)} = 0$, which is the case when $q(z) = p(z \mid D, \theta)$.
42 Probabilistic Latent Semantic Analysis. EM algorithm. p(z|D, θ) is the posterior of z, and $\sum_z q(z) \log \frac{q(z)}{p(z \mid D, \theta)}$ is the Kullback-Leibler (KL) divergence between q(z) and that posterior. Thus, in the E step we use the current values of the parameters to calculate the posterior of z. The M step is then problem dependent.
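A short sketch of that KL divergence term, showing that it vanishes exactly when q(z) equals the posterior (the distributions are made up):

```python
import numpy as np

def kl(q, p):
    """KL(q || p) = sum_z q(z) log(q(z) / p(z)); non-negative, zero iff q == p."""
    return float((q * np.log(q / p)).sum())

posterior = np.array([0.5, 0.3, 0.2])      # a hypothetical p(z|D, theta)
q = np.array([0.7, 0.2, 0.1])              # some other candidate q(z)

print(round(kl(q, posterior), 4))          # > 0: the bound is not tight
print(round(kl(posterior, posterior), 4))  # 0.0: choosing q(z) = p(z|D, theta) makes it tight
```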
43 Probabilistic Latent Semantic Analysis. EM algorithm for pLSA. E step (posterior of the topic for each document-word pair):
$p(z \mid w, d) = \frac{p(z, w, d)}{p(w, d)} = \frac{p(d)\, p(z \mid d)\, p(w \mid z)}{\sum_{z'} p(d)\, p(z' \mid d)\, p(w \mid z')} = \frac{p(z \mid d)\, p(w \mid z)}{\sum_{z'} p(z' \mid d)\, p(w \mid z')}$
M step (re-estimate the parameters from the expected counts):
$p(w \mid z) \propto \sum_d n(d, w)\, p(z \mid d, w), \qquad p(z \mid d) \propto \sum_w n(d, w)\, p(z \mid d, w)$
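A compact, self-contained EM sketch for pLSA along these lines (illustrative code written for this note, not the lecture notebook referenced on the next slide; the toy counts are made up):

```python
import numpy as np

def plsa_em(n, K, iters=100, seed=0):
    """Minimal pLSA EM.

    n: (D, W) array with n[d, w] = count of word w in document d.
    Returns p(z|d) of shape (D, K) and p(w|z) of shape (K, W).
    """
    rng = np.random.default_rng(seed)
    D, W = n.shape
    p_z_given_d = rng.random((D, K)); p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)
    p_w_given_z = rng.random((K, W)); p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)

    for _ in range(iters):
        # E step: posterior p(z|d,w) proportional to p(z|d) p(w|z)
        post = p_z_given_d[:, :, None] * p_w_given_z[None, :, :]      # shape (D, K, W)
        post /= post.sum(axis=1, keepdims=True) + 1e-12

        # M step: re-estimate parameters from the expected counts n(d,w) p(z|d,w)
        weighted = n[:, None, :] * post                               # shape (D, K, W)
        p_w_given_z = weighted.sum(axis=0)
        p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_given_d = weighted.sum(axis=2)
        p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True) + 1e-12

    return p_z_given_d, p_w_given_z

# Toy corpus: 4 documents over a 6-word vocabulary (counts are made up)
n = np.array([[4, 3, 2, 0, 0, 0],
              [3, 4, 1, 0, 1, 0],
              [0, 0, 1, 3, 4, 2],
              [0, 1, 0, 4, 3, 3]], dtype=float)

p_z_given_d, p_w_given_z = plsa_em(n, K=2, iters=200)
print(p_z_given_d.round(2))  # each document's topic mixture p(z|d)
print(p_w_given_z.round(2))  # each topic's word distribution p(w|z)
```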
44 Probabilistic Latent Semantic Analysis. Example. IPython Notebook examples; slightly modified code from: http://kti.tugraz.at/staff/denis/courses/kddm1/plsa.ipynb. Command line: ipython notebook --pylab=inline plsa.ipynb
45 Probabilistic Latent Semantic Analysis. Example. Table: user-movie rating matrix with the users Joe, Jim, John, Jack, Jill, Jenny, and Jane as rows and the movies Alien, Star Wars, Casablanca, and Titanic as columns.
46 Probabilistic Latent Semantic Analysis. Example. Table: the same user-movie rating matrix (users Joe, Jim, John, Jack, Jill, Jenny, Jane; movies Alien, Star Wars, Casablanca, Titanic).
47 Probabilistic Latent Semantic Analysis. Example. Figure: eight selected factors ("segment 1", "segment 2", "matrix 1", "matrix 2", "line 1", "line 2", "power 1", "power 2") from a 128-factor decomposition; the displayed word stems are the 10 most probable words in the class-conditional distribution P(w|z), from top to bottom in descending order. For two example documents containing the word "segment" the posteriors differ: P(z_k | d_1, w = "segment") = (0.951, 0.0001, ...) and P(z_k | d_2, w = "segment") = (0.025, 0.867, ...). From Hofmann, 2000.
48 Probabilistic Latent Semantic Analysis. Performance. In IR, the performance of a retrieval system based on this model (PLSI) was typically found superior to both the vector space model (cosine similarity) and non-probabilistic latent semantic indexing (LSI). (We skip the details here.) From Th. Hofmann, 2000. Figure: from Hofmann, 2000.