Lecture 19, November 19, 2012
|
|
- Mitchell Carroll
- 5 years ago
- Views:
Transcription
1 Machine Learning 0-70/5-78, Fall 0 Latent Space Analysis SVD and Topic Models Eric Xing Lecture 9, November 9, 0 Reading: Tutorial on Topic ACL Eric CMU, 006-0
2 We are inundated with data (from imagesgooglecn) Humans cannot afford to deal with (eg, search, browse, or measure similarity) a huge number of text and media documents We need computers to help out Eric CMU, 006-0
3 A task: Say, we want to have a mapping, so that Compare similarity Classify contents Cluster/group/categorize docs Distill semantics and perspectives Eric CMU,
4 Representation: Data: Bag of Words Representation As for the Arabian and Palestinean voices that are against the current negotiations and the so-called peace process, they are not against peace per se, but rather for their well-founded predictions that Israel would NOT give an inch of the West bank (and most probably the same for Golan Heights) back to the Arabs An 8 months of "negotiations" in Madrid, and Washington proved these predictions Now many will jump on me saying why are you blaming israelis for no-result negotiations I would say why would the Arabs stall the negotiations, what do they have to loose? Arabian negotiations against peace Israel Arabs blaming Each document is a vector in the word space Ignore the order of words in a document Only count matters! A high-dimensional and sparse representation Not efficient text processing tasks, eg, search, document classification, or similarity measure Not effective for browsing Eric CMU,
5 Subspace analysis Document Term = * * X (m x n) T Λ D T (m x k) (k x k) (k x n) cluster/topic/bas A priori weights is Distributions s (subspace) Clustering: (0,) matrix LSI/NMF: arbitrary matrices Topic Models: stochastic matrix Sparse coding: arbitrary sparse matrices Memberships (coordinates) Eric CMU,
6 An example: Eric CMU,
7 Principal Component Analysis The new variables/dimensions Are linear combinations of the original ones Are uncorrelated with one another Orthogonal in original dimension space Capture as much of the original variance in the data as possible l Variable B Origina PC PC Are called Principal Components Orthogonal directions of greatest variance in data Projections along PC discriminate the data most along any one axis Original Variable A First principal component is the direction of greatest variability (covariance) in the data Second is the next orthogonal (uncorrelated) direction of greatest variability So first remove all the variability along the first component, and then find the next direction of greatest variability And so on Eric CMU,
8 Computing the Components Projection of vector x onto an axis (dimension) u is u T x Direction of greatest variability is that in which the average square of the projection is greatest: Maximize u T XX T u st u T u = Construct Langrangian u T XX T u λu T u Vector of partial derivatives set to zero xx T u λu = (xx T λi) u = 0 As u 0 then u must be an eigenvector of XX T with eigenvalue λ λ is the principal eigenvalue of the correlation matrix C= XX T The eigenvalue denotes the amount of variability captured along that dimension Eric CMU,
9 Computing the Components Similarly for the next axis, etc So, the new axes are the eigenvectors of the matrix of correlations of the original variables, which captures the similarities of the original variables based on how data samples project to them Geometrically: centering followed by rotation Linear transformation Eric CMU,
10 Eigenvalues & Eigenvectors For symmetric matrices, eigenvectors for distinct eigenvalues are orthogonal Sv {, } = λ {, } v{, and λ λ v v = }, All eigenvalues of a real symmetric matrix are real 0 if S λi = 0 and S = T S λ R All eigenvalues of a positive semidefinite matrix are nonnegative w R n T, w Sw 0, then if Sv = λ v λ 0 Eric CMU,
11 Eigen/diagonal Decomposition Let be a square matrix with m linearly independent eigenvectors (a non-defective matrix) Theorem: Exists an eigen decomposition diagonal Unique for distinc t eigenvalues (cf matrix diagonalization theorem) Columns of U are eigenvectors of S Diagonal elements of are eigenvalues of Eric CMU, 006-0
12 PCs, Variance and Least-Squares The first PC retains the greatest amount of variation in the sample The k th PC retains the kth greatest t fraction of the variation in the sample The k th largest eigenvalue of the correlation matrix C is the variance in the sample along the k th PC The least-squares view: PCs are a series of linear least squares fits to a sample, each orthogonal ogo to all previous ones Eric CMU, 006-0
13 The Corpora Matrix Doc Doc Doc 3 Word X = Word 0 8 Word Word Word Eric CMU,
14 Singular Value Decomposition For an m n matrix A of rank r there exists a factorization (Singular Value Decomposition = SVD) as follows: A = U Σ V m m m n V is n n T The columns of U are orthogonal eigenvectors of AA T The columns of V are orthogonal eigenvectors of A T A Eigenvalues λ λ r of AA T are the eigenvalues of A T A Σ = σ = λ i λ i ( σ ) diag σ r Eric CMU, Singular values 4
15 SVD and PCA The first root is called the prinicipal eigenvalue which has an associated orthonormal (u T u = ) eigenvector u Subsequent roots are ordered such that λ > λ > > λ M with rank(d) non-zero values Eigenvectors form an orthonormal basis ie u it u j = δ ij The eigenvalue decomposition of XX T = UΣU T where U = [u, u,, u M ] and Σ = diag[λ, λ,, λ M ] Similarly the eigenvalue decomposition of X T X = VΣV T The SVD is closely related to the above X=U Σ / V T The left eigenvectors U, right eigenvectors V, singular values = square root of eigenvalues Eric CMU,
16 HowManyPCs? For n original dimensions, sample covariance matrix is nxn, and has up to n eigenvectors So n PCs Where does dimensionality reduction come from? Can ignore the components of lesser significance 5 0 (%) Variance PC PC PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC0 You do lose some information, but if the eigenvalues are small, you don t lose much n dimensions in original data calculate n eigenvectors and eigenvalues choose only the first p eigenvectors, based on their eigenvalues final data set has only p dimensions Eric CMU,
17 K is the number of singular values used Eric CMU, Within 40 threshold 7
18 Summary: Latent Semantic Indexing(Deerwester et al, 990) Document Term = * * X (m x n) T Λ D T (m x k) (k x k) (k x n) r w= K k= r d k λ T LSI does not define a properly p normalized probability distribution of observed and latent entities Does not support probabilistic reasoning under uncertainty and data fusion k k Eric CMU,
19 Connecting Probability Models to Data (Generative Model) P(Data Parameters) Probabilistic Model Real World Data P(Parameters Data) (Inference) Eric CMU,
20 Latent Semantic Structure in GM Latent Structure l Distribution over words P( w) = P( w,l) l Inferring latent structure P(w l) P( l) Words w P( l w) = P(w ) Eric CMU,
21 How to Model Semantics? Q: What is it about? A: Mainly MT, with syntax, some learning AdMixing Proportion MT Syntax Learning Source Target SMT Alignment Score BLEU Parse Tree Noun Phrase Grammar CFG likelihood EM Hidden Parameters Estimation argmax Top pics A Hierarchical Phrase-Based Model for Statistical ti ti lmachine Translation We present a statistical phrase-based Translation model that uses hierarchical phrases phrases that contain sub-phrases The model is formally a synchronous context-free grammar but is learned from a bitext t without t any syntactic ti information Thus it can be seen as a shift to the formal machinery of syntax based translation systems without any linguistic commitment In our experiments using BLEU as a metric, the hierarchical Phrase based model achieves a relative Improvement of 75% over Pharaoh, a state-of-the-art phrase-based system Unigram over vocabulary Topic Models Eric CMU, 006-0
22 WhythisisUseful? Q: What is it about? A: Mainly MT, with syntax, some learning AdMixing for Statistical Machine Translation Proportion MT Syntax Learning Q: give me similar document? Structured way of browsing the collection A Hierarchical Phrase-Based Model f St ti ti lm hi T l ti We present a statistical phrase-based Translation model that uses hierarchical phrases phrases that contain sub-phrases The model is formally a synchronous context-free grammar but is learned from a bitext t without t any syntactic ti information Thus it can be seen as a shift to the formal machinery of syntax based translation systems without any linguistic commitment In our experiments using BLEU as a metric, the hierarchical Phrase based model achieves a relative Other tasks Improvement of 75% over Pharaoh, a state-of-the-art phrase-based system Other tasks Dimensionality reduction TF-IDF vs topic mixing proportion Classification, clustering, and more Eric CMU, 006-0
23 Words in Contexts It was a nice shot Eric CMU,
24 Words in Contexts (con'd) the opposition Labor Party fared even worse, with a predicted 35 seats, seven less than last election Eric CMU,
25 A possible generative process of a document TOPIC 3 8 DOCUMENT : money bank bank loan river stream bank money river bank money bank loan money stream bank money bank bank loan river stream bank money river bank money bank loan bank money stream DOCUMENT : river stream bank stream bank money loan river stream loan bank river bank bank stream river loan bank stream bank money loan river stream bank stream bank money river 7 stream loan bank river bank money bank stream river bank stream bank money TOPIC Mixture Components (distributions over elements) admixing weight vector θ (represents all components contributions) vector θ Bayesian approach: use priors Admixture weights ~ Dirichlet( α ) Mixture components ~ Dirichlet( Γ ) Eric CMU,
26 Probabilistic LSI Hoffman (999) β k θ d z n w n N M Eric CMU,
27 Probabilistic LSI A "generative" model Models each word in a document as a sample from a mixture model Each word is generated from a single topic, different words in the document may be generated from different topics A topic is characterized by a distribution over words Each document is represented as a list of admixing proportions for the components (ie topic vector θ ) β k θ d z n w n N M Eric CMU,
28 Latent Dirichlet Allocation Blei, Ng and Jordan (003) Essentially a Bayesian plsi: η β k K α θ z n w n N M Eric CMU,
29 LDA Generative model Models each word in a document as a sample from a mixture model Each word is generated from a single topic, different words in the document may be generated from different topics A topic is characterized by a distribution over words Each document is represented as a list of admixing proportions for the components (ie topic vector) The topic vectors and the word rates each follows a Dirichlet prior --- essentially a Bayesian plsi η β k K α θ z n w n Eric CMU, N M 9
30 Topic Models = Mixed Membership Models = Admixture Generating a document Draw θ from the prior For each word n - Draw z n - Draw w n from multinomia l z n, ( θ ) { β } from multinomia l( β ) : k zn Prior θ z β K w N d Which prior to use? N Eric CMU,
31 Choices of Priors Dirichlet (LDA) (Blei et al 003) Conjugate prior means efficient inference Can only capture variations in each topic s intensity independently Logistic Normal (CTM=LoNTAM) (Blei & Lafferty 005, Ahmed & Xing 006) Capture the intuition that some topics are highly correlated and can rise up in intensity together Not a conjugate prior implies hard inference Nested CRP (Blei et al 005) Defines hierarchy on topics Eric CMU,
32 Generative Semantic of LoNTAM Generating a document µ Draw θ from the For each word γ ~ N - Draw z n n from prior - Draw wn zn, : θ ~ LN K multinomia l ( θ ) { β } from multinomia l( β ) k zn ( µ, Σ) ( µ, Σ) γ 0 K K = K = exp log + θi γ i i= C ( γ ) K = log + i= e γ i e γ i Eric CMU, β K - Log Partition Function - Normalization Constant Σ γ z w N d 3 N
33 Outcomes from a topic model The topics β in a corpus: There is no name for each topic, you need to name it! There is no objective measure of good/bad The shown topics are the good ones, there are many many trivial ones, meaningless ones, redundant ones, you need to manually prune the results How many topics? Eric CMU,
34 Outcomes from a topic model The topic vector θ of each doc Create an embedding of docs in a topic space Their no ground truth of θ to measure quality of inference But on θ it is possible to define an objective measure of goodness, such as classification error, retrieval of similar docs, clustering, etc, of documents But there is no consensus on whether these tasks bear the true value of topic models Eric CMU,
35 Outcomes from a topic model The per-word topic indicator z: Not very useful under the bag of word representation, because of loss of ordering But it is possible to define simple probabilistic linguistic constraints (eg, bi-grams) over z and get potentially interesting results [Griffiths, Steyvers, Blei, & Tenenbaum, 004] Eric CMU,
36 Outcomes from a topic model The topic graph S (when using CTM): Kind of interesting for understanding/visualizing large corpora Eric CMU, [David Blei, MLSS09] 36
37 Outcomes from a topic model Topic change trends [David Blei, MLSS09] Eric CMU,
38 The Big Picture Unstructured Collection Structured Topic Network Topic Discovery w n w T x x x x x w Dimensionality T x k x x T Word Simplex Reduction Eric CMU, Topic Space (eg, a Simplex) 38
39 Computation on LDA Inference Given a Document D Posterior: P(Θ µ,σ, β,d) Evaluation: P(D µ,σ, Σ β ) β i θ n Learning Given a collection of documents {D i } Parameter estimation arg max ( µ, Σ, β ) log ( P( µ, Σ, β )) D i Eric CMU,
40 Exact Bayesian inference on LDA is intractable A possible query: is intractable p q y? ) (? ) (, = = D z p D p m n n πθ n Close form solution? ) ( ), ( ) ( D p D p D p n n = π π, θ n θ n ) ( ) ( ) ( ) ( ) ( } {,,, D p d d G p p z p x p m n n z i n n m n m n z m n = φ π φ α π π φ θ n θ -n θ n β β = } {,,, ) ( ) ( ) ( ) ( ) ( m n n z N n n m n m n z m n d d d G p p z p x p D p φ π π φ α π π φ L L θ n θ n θ θ β β β Sum in the denominator over T n terms, and integrate over n k-dimensional topic vectors 40 Eric CMU, 006-0
41 Approximate Inference Variational Inference Mean field approximation (Blei et al) Expectation ti propagation (Minka et al) Variational nd -order Taylor approximation (Ahmed and Xing) Markov Chain Monte Carlo Gibbs sampling (Griffiths et al) Eric CMU,
42 Collapsed Gibbs sampling (Tom Griffiths & Mark Steyvers) Collapsed Gibbs sampling Integrate out θ For variables z = z, z,, z n Draw z (t+) i from P(z i z -i, w) β i z =z (t+) (t+) (t+) (t) (t) -i, z,, z i-, z i+,, z n θ n Eric CMU,
43 Gibbs sampling Need full conditional distributions for variables Since we only sample z we need θ n β i G G number of times word w assigned to topic j number of times topic j used in document d Eric CMU,
44 Gibbs sampling i w i d i z i MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK SCIENTIFIC KNOWLEDGE JOY 5 iteration Eric CMU,
45 Gibbs sampling iteration i w i d i z i z i MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK SCIENTIFIC KNOWLEDGE JOY 5? Eric CMU,
46 Gibbs sampling iteration i w i d i z i z i MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK SCIENTIFIC KNOWLEDGE JOY 5? G G Eric CMU,
47 Gibbs sampling iteration i w i d i z i z i MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK SCIENTIFIC KNOWLEDGE JOY 5? G G Eric CMU,
48 Gibbs sampling iteration i w i d i z i z i MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK SCIENTIFIC KNOWLEDGE JOY 5? G G Eric CMU,
49 Gibbs sampling iteration i w i d i z i z i MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK SCIENTIFIC KNOWLEDGE JOY 5? G G Eric CMU,
50 Gibbs sampling iteration i w i d i z i z i MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK SCIENTIFIC KNOWLEDGE JOY 5? G G Eric CMU,
51 Gibbs sampling iteration i w i d i z i z i MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK SCIENTIFIC KNOWLEDGE JOY 5? G G Eric CMU,
52 Gibbs sampling iteration 000 i w i d i z i z i z i MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK SCIENTIFIC KNOWLEDGE JOY 5 G G Eric CMU,
53 Learning a TM Maximum likelihood estimation: Need statistics on topic-specific word assignment (due to z), topic vector distribution (due to θ), etc Eg,, this is the formula for topic k: These are hidden variables, therefore need an EM algorithm (also known as data augmentation, or DA, in Monte Carlo paradigm) This is a reduce step in parallel implementation Eric CMU,
54 Conclusion GM-based topic models are cool Flexible Modular Interactive There are many ways of implementing topic models unsupervised supervised Efficient Inference/learning algorithms GMF, with Laplace approx for non-conjugate dist MCMC Many applications Word-sense disambiguation Image understanding Network inference Eric CMU,
Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationWhat is Principal Component Analysis?
What is Principal Component Analysis? Principal component analysis (PCA) Reduce the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables Retains most
More informationInformation retrieval LSI, plsi and LDA. Jian-Yun Nie
Information retrieval LSI, plsi and LDA Jian-Yun Nie Basics: Eigenvector, Eigenvalue Ref: http://en.wikipedia.org/wiki/eigenvector For a square matrix A: Ax = λx where x is a vector (eigenvector), and
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationLatent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) A review of topic modeling and customer interactions application 3/11/2015 1 Agenda Agenda Items 1 What is topic modeling? Intro Text Mining & Pre-Processing Natural Language
More informationTopic Modelling and Latent Dirichlet Allocation
Topic Modelling and Latent Dirichlet Allocation Stephen Clark (with thanks to Mark Gales for some of the slides) Lent 2013 Machine Learning for Language Processing: Lecture 7 MPhil in Advanced Computer
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationLinear Algebra Background
CS76A Text Retrieval and Mining Lecture 5 Recap: Clustering Hierarchical clustering Agglomerative clustering techniques Evaluation Term vs. document space clustering Multi-lingual docs Feature selection
More informationApplying hlda to Practical Topic Modeling
Joseph Heng lengerfulluse@gmail.com CIST Lab of BUPT March 17, 2013 Outline 1 HLDA Discussion 2 the nested CRP GEM Distribution Dirichlet Distribution Posterior Inference Outline 1 HLDA Discussion 2 the
More informationDocument and Topic Models: plsa and LDA
Document and Topic Models: plsa and LDA Andrew Levandoski and Jonathan Lobo CS 3750 Advanced Topics in Machine Learning 2 October 2018 Outline Topic Models plsa LSA Model Fitting via EM phits: link analysis
More informationMachine learning for pervasive systems Classification in high-dimensional spaces
Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationPrincipal Component Analysis
Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand
More informationPROBABILISTIC LATENT SEMANTIC ANALYSIS
PROBABILISTIC LATENT SEMANTIC ANALYSIS Lingjia Deng Revised from slides of Shuguang Wang Outline Review of previous notes PCA/SVD HITS Latent Semantic Analysis Probabilistic Latent Semantic Analysis Applications
More informationDATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD
DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary
More informationTopic Models and Applications to Short Documents
Topic Models and Applications to Short Documents Dieu-Thu Le Email: dieuthu.le@unitn.it Trento University April 6, 2011 1 / 43 Outline Introduction Latent Dirichlet Allocation Gibbs Sampling Short Text
More informationNotes on Latent Semantic Analysis
Notes on Latent Semantic Analysis Costas Boulis 1 Introduction One of the most fundamental problems of information retrieval (IR) is to find all documents (and nothing but those) that are semantically
More informationMachine Learning. Data visualization and dimensionality reduction. Eric Xing. Lecture 7, August 13, Eric Xing Eric CMU,
Eric Xing Eric Xing @ CMU, 2006-2010 1 Machine Learning Data visualization and dimensionality reduction Eric Xing Lecture 7, August 13, 2010 Eric Xing Eric Xing @ CMU, 2006-2010 2 Text document retrieval/labelling
More informationInformation Retrieval
Introduction to Information CS276: Information and Web Search Christopher Manning and Pandu Nayak Lecture 13: Latent Semantic Indexing Ch. 18 Today s topic Latent Semantic Indexing Term-document matrices
More informationPCA, Kernel PCA, ICA
PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per
More informationAN INTRODUCTION TO TOPIC MODELS
AN INTRODUCTION TO TOPIC MODELS Michael Paul December 4, 2013 600.465 Natural Language Processing Johns Hopkins University Prof. Jason Eisner Making sense of text Suppose you want to learn something about
More informationStudy Notes on the Latent Dirichlet Allocation
Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection
More informationTopic Models. Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW. Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1
Topic Models Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1 Low-Dimensional Space for Documents Last time: embedding space
More informationLatent Semantic Models. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze
Latent Semantic Models Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze 1 Vector Space Model: Pros Automatic selection of index terms Partial matching of queries
More information16 : Approximate Inference: Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution
More informationLanguage Information Processing, Advanced. Topic Models
Language Information Processing, Advanced Topic Models mcuturi@i.kyoto-u.ac.jp Kyoto University - LIP, Adv. - 2011 1 Today s talk Continue exploring the representation of text as histogram of words. Objective:
More informationCS Lecture 18. Topic Models and LDA
CS 6347 Lecture 18 Topic Models and LDA (some slides by David Blei) Generative vs. Discriminative Models Recall that, in Bayesian networks, there could be many different, but equivalent models of the same
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationLatent Dirichlet Allocation Introduction/Overview
Latent Dirichlet Allocation Introduction/Overview David Meyer 03.10.2016 David Meyer http://www.1-4-5.net/~dmm/ml/lda_intro.pdf 03.10.2016 Agenda What is Topic Modeling? Parametric vs. Non-Parametric Models
More informationPachinko Allocation: DAG-Structured Mixture Models of Topic Correlations
: DAG-Structured Mixture Models of Topic Correlations Wei Li and Andrew McCallum University of Massachusetts, Dept. of Computer Science {weili,mccallum}@cs.umass.edu Abstract Latent Dirichlet allocation
More informationLecture 22 Exploratory Text Analysis & Topic Models
Lecture 22 Exploratory Text Analysis & Topic Models Intro to NLP, CS585, Fall 2014 http://people.cs.umass.edu/~brenocon/inlp2014/ Brendan O Connor [Some slides borrowed from Michael Paul] 1 Text Corpus
More informationMachine Learning Techniques for Computer Vision
Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM
More informationBayesian Nonparametrics for Speech and Signal Processing
Bayesian Nonparametrics for Speech and Signal Processing Michael I. Jordan University of California, Berkeley June 28, 2011 Acknowledgments: Emily Fox, Erik Sudderth, Yee Whye Teh, and Romain Thibaux Computer
More informationLatent Dirichlet Allocation
Outlines Advanced Artificial Intelligence October 1, 2009 Outlines Part I: Theoretical Background Part II: Application and Results 1 Motive Previous Research Exchangeability 2 Notation and Terminology
More informationSemantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing
Semantics with Dense Vectors Reference: D. Jurafsky and J. Martin, Speech and Language Processing 1 Semantics with Dense Vectors We saw how to represent a word as a sparse vector with dimensions corresponding
More informationtopic modeling hanna m. wallach
university of massachusetts amherst wallach@cs.umass.edu Ramona Blei-Gantz Helen Moss (Dave's Grandma) The Next 30 Minutes Motivations and a brief history: Latent semantic analysis Probabilistic latent
More informationMeasuring Topic Quality in Latent Dirichlet Allocation
Measuring Topic Quality in Sergei Koltsov Olessia Koltsova Steklov Institute of Mathematics at St. Petersburg Laboratory for Internet Studies, National Research University Higher School of Economics, St.
More informationCS 572: Information Retrieval
CS 572: Information Retrieval Lecture 11: Topic Models Acknowledgments: Some slides were adapted from Chris Manning, and from Thomas Hoffman 1 Plan for next few weeks Project 1: done (submit by Friday).
More informationMachine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.
Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning
More informationMachine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationDimension Reduction (PCA, ICA, CCA, FLD,
Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous 10-701 lectures 1 Outline Dimension reduction
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationText Mining for Economics and Finance Latent Dirichlet Allocation
Text Mining for Economics and Finance Latent Dirichlet Allocation Stephen Hansen Text Mining Lecture 5 1 / 45 Introduction Recall we are interested in mixed-membership modeling, but that the plsi model
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING Text Data: Topic Model Instructor: Yizhou Sun yzsun@cs.ucla.edu December 4, 2017 Methods to be Learnt Vector Data Set Data Sequence Data Text Data Classification Clustering
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 8 Continuous Latent Variable
More informationDeep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang
Deep Learning Basics Lecture 7: Factor Analysis Princeton University COS 495 Instructor: Yingyu Liang Supervised v.s. Unsupervised Math formulation for supervised learning Given training data x i, y i
More informationLatent Dirichlet Alloca/on
Latent Dirichlet Alloca/on Blei, Ng and Jordan ( 2002 ) Presented by Deepak Santhanam What is Latent Dirichlet Alloca/on? Genera/ve Model for collec/ons of discrete data Data generated by parameters which
More informationProbabilistic Topic Models Tutorial: COMAD 2011
Probabilistic Topic Models Tutorial: COMAD 2011 Indrajit Bhattacharya Assistant Professor Dept of Computer Sc. & Automation Indian Institute Of Science, Bangalore My Background Interests Topic Models Probabilistic
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationApplying LDA topic model to a corpus of Italian Supreme Court decisions
Applying LDA topic model to a corpus of Italian Supreme Court decisions Paolo Fantini Statistical Service of the Ministry of Justice - Italy CESS Conference - Rome - November 25, 2014 Our goal finding
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationFace Recognition. Face Recognition. Subspace-Based Face Recognition Algorithms. Application of Face Recognition
ace Recognition Identify person based on the appearance of face CSED441:Introduction to Computer Vision (2017) Lecture10: Subspace Methods and ace Recognition Bohyung Han CSE, POSTECH bhhan@postech.ac.kr
More informationModeling Environment
Topic Model Modeling Environment What does it mean to understand/ your environment? Ability to predict Two approaches to ing environment of words and text Latent Semantic Analysis (LSA) Topic Model LSA
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More informationLecture 21: Spectral Learning for Graphical Models
10-708: Probabilistic Graphical Models 10-708, Spring 2016 Lecture 21: Spectral Learning for Graphical Models Lecturer: Eric P. Xing Scribes: Maruan Al-Shedivat, Wei-Cheng Chang, Frederick Liu 1 Motivation
More informationGaussian Mixture Model
Case Study : Document Retrieval MAP EM, Latent Dirichlet Allocation, Gibbs Sampling Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 5 th,
More informationDistance dependent Chinese restaurant processes
David M. Blei Department of Computer Science, Princeton University 35 Olden St., Princeton, NJ 08540 Peter Frazier Department of Operations Research and Information Engineering, Cornell University 232
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 6: Numerical Linear Algebra: Applications in Machine Learning Cho-Jui Hsieh UC Davis April 27, 2017 Principal Component Analysis Principal
More informationTopic Models. Brandon Malone. February 20, Latent Dirichlet Allocation Success Stories Wrap-up
Much of this material is adapted from Blei 2003. Many of the images were taken from the Internet February 20, 2014 Suppose we have a large number of books. Each is about several unknown topics. How can
More informationFast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine
Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Nitish Srivastava nitish@cs.toronto.edu Ruslan Salahutdinov rsalahu@cs.toronto.edu Geoffrey Hinton hinton@cs.toronto.edu
More informationRETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 12 Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang) DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Linear Dimensionality
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 13: Learning in Gaussian Graphical Models, Non-Gaussian Inference, Monte Carlo Methods Some figures
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationPCA and admixture models
PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationTopic Models. Charles Elkan November 20, 2008
Topic Models Charles Elan elan@cs.ucsd.edu November 20, 2008 Suppose that we have a collection of documents, and we want to find an organization for these, i.e. we want to do unsupervised learning. One
More informationMachine Learning Summer School
Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,
More informationUnsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent
Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:
More informationProbabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More informationTopic Modeling Using Latent Dirichlet Allocation (LDA)
Topic Modeling Using Latent Dirichlet Allocation (LDA) Porter Jenkins and Mimi Brinberg Penn State University prj3@psu.edu mjb6504@psu.edu October 23, 2017 Porter Jenkins and Mimi Brinberg (PSU) LDA October
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 9: Dimension Reduction/Word2vec Cho-Jui Hsieh UC Davis May 15, 2018 Principal Component Analysis Principal Component Analysis (PCA) Data
More informationProbabilistic & Unsupervised Learning
Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College
More informationAdvanced Machine Learning
Advanced Machine Learning Nonparametric Bayesian Models --Learning/Reasoning in Open Possible Worlds Eric Xing Lecture 7, August 4, 2009 Reading: Eric Xing Eric Xing @ CMU, 2006-2009 Clustering Eric Xing
More informationPCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationTime-Sensitive Dirichlet Process Mixture Models
Time-Sensitive Dirichlet Process Mixture Models Xiaojin Zhu Zoubin Ghahramani John Lafferty May 25 CMU-CALD-5-4 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 Abstract We introduce
More informationProbabilistic Time Series Classification
Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign
More informationScaling Neighbourhood Methods
Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)
More informationDirichlet Enhanced Latent Semantic Analysis
Dirichlet Enhanced Latent Semantic Analysis Kai Yu Siemens Corporate Technology D-81730 Munich, Germany Kai.Yu@siemens.com Shipeng Yu Institute for Computer Science University of Munich D-80538 Munich,
More informationCrouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation
Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation Taesun Moon Katrin Erk and Jason Baldridge Department of Linguistics University of Texas at Austin 1
More informationMachine Learning - MT & 14. PCA and MDS
Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)
More informationExpectation Maximization
Expectation Maximization Machine Learning CSE546 Carlos Guestrin University of Washington November 13, 2014 1 E.M.: The General Case E.M. widely used beyond mixtures of Gaussians The recipe is the same
More informationClassical Predictive Models
Laplace Max-margin Markov Networks Recent Advances in Learning SPARSE Structured I/O Models: models, algorithms, and applications Eric Xing epxing@cs.cmu.edu Machine Learning Dept./Language Technology
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationLatent Dirichlet Allocation Based Multi-Document Summarization
Latent Dirichlet Allocation Based Multi-Document Summarization Rachit Arora Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai - 600 036, India. rachitar@cse.iitm.ernet.in
More informationBayesian Nonparametric Models
Bayesian Nonparametric Models David M. Blei Columbia University December 15, 2015 Introduction We have been looking at models that posit latent structure in high dimensional data. We use the posterior
More informationLatent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization
Latent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization Rachit Arora Computer Science and Engineering Indian Institute of Technology Madras Chennai - 600 036, India.
More informationMachine Learning (Spring 2012) Principal Component Analysis
1-71 Machine Learning (Spring 1) Principal Component Analysis Yang Xu This note is partly based on Chapter 1.1 in Chris Bishop s book on PRML and the lecture slides on PCA written by Carlos Guestrin in
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationLatent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3:993-1022, January 2003. Following slides borrowed ant then heavily modified from: Jonathan Huang
More informationCollaborative Filtering: A Machine Learning Perspective
Collaborative Filtering: A Machine Learning Perspective Chapter 6: Dimensionality Reduction Benjamin Marlin Presenter: Chaitanya Desai Collaborative Filtering: A Machine Learning Perspective p.1/18 Topics
More informationEvaluation Methods for Topic Models
University of Massachusetts Amherst wallach@cs.umass.edu April 13, 2009 Joint work with Iain Murray, Ruslan Salakhutdinov and David Mimno Statistical Topic Models Useful for analyzing large, unstructured
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationcross-language retrieval (by concatenate features of different language in X and find co-shared U). TOEFL/GRE synonym in the same way.
10-708: Probabilistic Graphical Models, Spring 2015 22 : Optimization and GMs (aka. LDA, Sparse Coding, Matrix Factorization, and All That ) Lecturer: Yaoliang Yu Scribes: Yu-Xiang Wang, Su Zhou 1 Introduction
More informationClustering VS Classification
MCQ Clustering VS Classification 1. What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation Ans:
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Yuriy Sverchkov Intelligent Systems Program University of Pittsburgh October 6, 2011 Outline Latent Semantic Analysis (LSA) A quick review Probabilistic LSA (plsa)
More information