Dynamic Topic Models


David M. Blei, Computer Science Department, Princeton University, Princeton, NJ 08544, USA
John D. Lafferty, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Abstract

A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections. The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. Variational approximations based on Kalman filters and nonparametric wavelet regression are developed to carry out approximate posterior inference over the latent topics. In addition to giving quantitative, predictive models of a sequential corpus, dynamic topic models provide a qualitative window into the contents of a large document collection. The models are demonstrated by analyzing the OCR'ed archives of the journal Science from 1880 through 2000.

(Appearing in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006. Copyright 2006 by the authors/owners.)

1. Introduction

Managing the explosion of electronic document archives requires new tools for automatically organizing, searching, indexing, and browsing large collections. Recent research in machine learning and statistics has developed new techniques for finding patterns of words in document collections using hierarchical probabilistic models (Blei et al., 2003; McCallum et al., 2004; Rosen-Zvi et al., 2004; Griffiths and Steyvers, 2004; Buntine and Jakulin, 2004; Blei and Lafferty, 2006). These models are called "topic models" because the discovered patterns often reflect the underlying topics which combined to form the documents. Such hierarchical probabilistic models are easily generalized to other kinds of data; for example, topic models have been used to analyze images (Fei-Fei and Perona, 2005; Sivic et al., 2005), biological data (Pritchard et al., 2000), and survey data (Erosheva, 2002).

In an exchangeable topic model, the words of each document are assumed to be independently drawn from a mixture of multinomials. The mixing proportions are randomly drawn for each document; the mixture components, or topics, are shared by all documents. Thus, each document reflects the components with different proportions. These models are a powerful method of dimensionality reduction for large collections of unstructured documents. Moreover, posterior inference at the document level is useful for information retrieval, classification, and topic-directed browsing.

Treating words exchangeably is a simplification that is consistent with the goal of identifying the semantic themes within each document. For many collections of interest, however, the implicit assumption of exchangeable documents is inappropriate. Document collections such as scholarly journals, email, news articles, and search query logs all reflect evolving content. For example, the Science article "The Brain of Professor Laborde" may be on the same scientific path as the article "Reshaping the Cortical Motor Map by Unmasking Latent Intracortical Connections," but the study of neuroscience looked much different in 1903 than it did in 1991. The themes in a document collection evolve over time, and it is of interest to explicitly model the dynamics of the underlying topics. In this paper, we develop a dynamic topic model which captures the evolution of topics in a sequentially organized corpus of documents.
We demonstrate its applicability by analyzing over 100 years of OCR'ed articles from the journal Science, which was founded in 1880 by Thomas Edison and has been published through the present. Under this model, articles are grouped by year, and each year's articles arise from a set of topics that have evolved from the last year's topics.

In the subsequent sections, we extend classical state space models to specify a statistical model of topic evolution. We then develop efficient approximate posterior inference techniques for determining the evolving topics from a sequential collection of documents. Finally, we present qualitative results that demonstrate how dynamic topic models allow the exploration of a large document collection in new ways, and quantitative results that demonstrate greater predictive accuracy when compared with static topic models.

2. Dynamic Topic Models

While traditional time series modeling has focused on continuous data, topic models are designed for categorical data. Our approach is to use state space models on the natural parameter space of the underlying topic multinomials, as well as on the natural parameters for the logistic normal distributions used for modeling the document-specific topic proportions.

First, we review the underlying statistical assumptions of a static topic model, such as latent Dirichlet allocation (LDA) (Blei et al., 2003). Let $\beta_{1:K}$ be $K$ topics, each of which is a distribution over a fixed vocabulary. In a static topic model, each document is assumed drawn from the following generative process:

1. Choose topic proportions $\theta$ from a distribution over the $(K-1)$-simplex, such as a Dirichlet.
2. For each word:
   (a) Choose a topic assignment $Z \sim \mathrm{Mult}(\theta)$.
   (b) Choose a word $W \sim \mathrm{Mult}(\beta_z)$.

This process implicitly assumes that the documents are drawn exchangeably from the same set of topics. For many collections, however, the order of the documents reflects an evolving set of topics. In a dynamic topic model, we suppose that the data is divided by time slice, for example by year. We model the documents of each slice with a $K$-component topic model, where the topics associated with slice $t$ evolve from the topics associated with slice $t-1$.

For a $K$-component model with $V$ terms, let $\beta_{t,k}$ denote the $V$-vector of natural parameters for topic $k$ in slice $t$. The usual representation of a multinomial distribution is by its mean parameterization. If we denote the mean parameter of a $V$-dimensional multinomial by $\pi$, the $i$th component of the natural parameter is given by the mapping $\beta_i = \log(\pi_i/\pi_V)$. In typical language modeling applications, Dirichlet distributions are used to model uncertainty about the distributions over words. However, the Dirichlet is not amenable to sequential modeling. Instead, we chain the natural parameters of each topic $\beta_{t,k}$ in a state space model that evolves with Gaussian noise; the simplest version of such a model is

$$\beta_{t,k} \mid \beta_{t-1,k} \sim \mathcal{N}(\beta_{t-1,k}, \sigma^2 I). \qquad (1)$$

Our approach is thus to model sequences of compositional random variables by chaining Gaussian distributions in a dynamic model and mapping the emitted values to the simplex. This is an extension of the logistic normal distribution (Aitchison, 1982) to time-series simplex data (West and Harrison, 1997).

[Figure 1. Graphical representation of a dynamic topic model for three time slices. Each topic's natural parameters $\beta_{t,k}$ evolve over time, together with the mean parameters $\alpha_t$ of the logistic normal distribution for the topic proportions.]

In LDA, the document-specific topic proportions $\theta$ are drawn from a Dirichlet distribution. In the dynamic topic model, we use a logistic normal with mean $\alpha$ to express uncertainty over proportions. The sequential structure between models is again captured with a simple dynamic model

$$\alpha_t \mid \alpha_{t-1} \sim \mathcal{N}(\alpha_{t-1}, \delta^2 I). \qquad (2)$$

For simplicity, we do not model the dynamics of topic correlation, as was done for static models by Blei and Lafferty (2006).

By chaining together topics and topic proportion distributions, we have sequentially tied a collection of topic models. The generative process for slice $t$ of a sequential corpus is thus as follows:

1. Draw topics $\beta_t \mid \beta_{t-1} \sim \mathcal{N}(\beta_{t-1}, \sigma^2 I)$.
2. Draw $\alpha_t \mid \alpha_{t-1} \sim \mathcal{N}(\alpha_{t-1}, \delta^2 I)$.
3. For each document:
   (a) Draw $\eta \sim \mathcal{N}(\alpha_t, a^2 I)$.
   (b) For each word:
      i. Draw $Z \sim \mathrm{Mult}(\pi(\eta))$.
      ii. Draw $W_{t,d,n} \sim \mathrm{Mult}(\pi(\beta_{t,z}))$.
Note that $\pi$ maps the multinomial natural parameters to the mean parameters,

$$\pi(\beta_{k,t})_w = \frac{\exp(\beta_{k,t,w})}{\sum_{w} \exp(\beta_{k,t,w})}.$$

The graphical model for this generative process is shown in Figure 1. When the horizontal arrows are removed, breaking the time dynamics, the graphical model reduces to a set of independent topic models. With time dynamics, the $k$th topic at slice $t$ has smoothly evolved from the $k$th topic at slice $t-1$.
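To make the generative process concrete, the following sketch simulates one corpus from the model: a Gaussian random walk over the natural parameters of each topic and of the proportion means, with words drawn through the softmax mapping $\pi$. This is our own illustration, not the authors' code; the dimensions, hyperparameter values, and helper names (simulate_dtm, softmax) are assumptions chosen only for readability.

```python
import numpy as np

def softmax(beta):
    """Map natural parameters to the simplex: pi_w = exp(beta_w) / sum_w' exp(beta_w')."""
    e = np.exp(beta - beta.max())
    return e / e.sum()

def simulate_dtm(T=5, K=3, V=50, D=10, N=40, sigma=0.05, delta=0.05, a=0.5, seed=0):
    """Illustrative simulation of the DTM generative process (all sizes are made up)."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(0, 1, size=(K, V))   # initial topic natural parameters
    alpha = np.zeros(K)                    # initial mean of the logistic normal
    corpus = []
    for t in range(T):
        # Steps 1 and 2: Gaussian random walks on the natural parameters (Eqs. 1 and 2)
        beta = beta + rng.normal(0, sigma, size=(K, V))
        alpha = alpha + rng.normal(0, delta, size=K)
        docs = []
        for _ in range(D):
            eta = alpha + rng.normal(0, a, size=K)       # 3(a): document-level proportions
            theta = softmax(eta)
            words = []
            for _ in range(N):
                z = rng.choice(K, p=theta)               # 3(b)i: topic assignment
                w = rng.choice(V, p=softmax(beta[z]))    # 3(b)ii: word from topic z
                words.append(w)
            docs.append(words)
        corpus.append(docs)
    return corpus

corpus = simulate_dtm()
print(len(corpus), "time slices,", len(corpus[0]), "documents per slice")
```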

For clarity of presentation, we now focus on a model with $K$ dynamic topics evolving as in (1), and where the topic proportion model is fixed at a Dirichlet. The technical issues associated with modeling the topic proportions in a time series as in (2) are essentially the same as those for chaining the topics together.

3. Approximate Inference

Working with time series over the natural parameters enables the use of Gaussian models for the time dynamics; however, due to the nonconjugacy of the Gaussian and multinomial models, posterior inference is intractable. In this section, we present a variational method for approximate posterior inference. We use variational methods as deterministic alternatives to stochastic simulation, in order to handle the large data sets typical of text analysis. While Gibbs sampling has been effectively used for static topic models (Griffiths and Steyvers, 2004), nonconjugacy makes sampling methods more difficult for this dynamic model.

The idea behind variational methods is to optimize the free parameters of a distribution over the latent variables so that the distribution is close in Kullback-Leibler (KL) divergence to the true posterior; this distribution can then be used as a substitute for the true posterior. In the dynamic topic model, the latent variables are the topics $\beta_{t,k}$, mixture proportions $\theta_{t,d}$, and topic indicators $z_{t,d,n}$. The variational distribution reflects the group structure of the latent variables. There are variational parameters for each topic's sequence of multinomial parameters, and variational parameters for each of the document-level latent variables. The approximate variational posterior is

$$\prod_{k=1}^{K} q(\beta_{k,1},\ldots,\beta_{k,T} \mid \hat\beta_{k,1},\ldots,\hat\beta_{k,T}) \times \prod_{t=1}^{T} \left( \prod_{d=1}^{D_t} q(\theta_{t,d} \mid \gamma_{t,d}) \prod_{n=1}^{N_{t,d}} q(z_{t,d,n} \mid \phi_{t,d,n}) \right). \qquad (3)$$

In the commonly used mean-field approximation, each latent variable is considered independently of the others. In the variational distribution of $\{\beta_{k,1},\ldots,\beta_{k,T}\}$, however, we retain the sequential structure of the topic by positing a dynamic model with Gaussian variational observations $\{\hat\beta_{k,1},\ldots,\hat\beta_{k,T}\}$. These parameters are fit to minimize the KL divergence between the resulting posterior, which is Gaussian, and the true posterior, which is not Gaussian. A similar technique for Gaussian processes is described in Snelson and Ghahramani (2006).

[Figure 2. A graphical representation of the variational approximation for the time series topic model of Figure 1. The variational parameters $\hat\beta$ and $\hat\alpha$ are thought of as the outputs of a Kalman filter, or as observed data in a nonparametric regression setting.]

The variational distribution of the document-level latent variables follows the same form as in Blei et al. (2003). Each proportion vector $\theta_{t,d}$ is endowed with a free Dirichlet parameter $\gamma_{t,d}$, each topic indicator $z_{t,d,n}$ is endowed with a free multinomial parameter $\phi_{t,d,n}$, and optimization proceeds by coordinate ascent. The updates for the document-level variational parameters have a closed form; we use the conjugate gradient method to optimize the topic-level variational observations.
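The paper states that the document-level updates follow Blei et al. (2003); as a reminder of that form, here is a brief sketch of the standard LDA-style coordinate-ascent updates for $\gamma$ and $\phi$ for a single document. This is our own paraphrase of the cited method with made-up variable names; here the topic-word probabilities for one slice are treated as fixed.

```python
import numpy as np
from scipy.special import digamma

def doc_coordinate_ascent(word_ids, log_beta, alpha, n_iter=50):
    """Standard LDA-style updates (Blei et al., 2003) for one document.

    word_ids : indices of the document's words
    log_beta : K x V array of log topic-word probabilities (held fixed here)
    alpha    : length-K Dirichlet parameter for the topic proportions
    """
    K = log_beta.shape[0]
    N = len(word_ids)
    gamma = alpha + (N / K) * np.ones(K)        # initialize the free Dirichlet parameter
    for _ in range(n_iter):
        # phi_{n,k} is proportional to beta_{k, w_n} * exp(E_q[log theta_k])
        e_log_theta = digamma(gamma) - digamma(gamma.sum())
        log_phi = log_beta[:, word_ids].T + e_log_theta
        log_phi -= log_phi.max(axis=1, keepdims=True)
        phi = np.exp(log_phi)
        phi /= phi.sum(axis=1, keepdims=True)
        # gamma_k = alpha_k + sum_n phi_{n,k}
        gamma = alpha + phi.sum(axis=0)
    return gamma, phi
```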
The resulting variational approximation for the natural topic parameters $\{\beta_{k,1},\ldots,\beta_{k,T}\}$ incorporates the time dynamics; we describe one approximation based on a Kalman filter, and a second based on wavelet regression.

3.1. Variational Kalman Filtering

The view of the variational parameters as outputs is based on the symmetry properties of the Gaussian density, $f_{\mu,\Sigma}(x) = f_{x,\Sigma}(\mu)$, which enables the use of the standard forward-backward calculations for linear state space models. The graphical model and its variational approximation are shown in Figure 2. Here the triangles denote variational parameters; they can be thought of as hypothetical outputs of the Kalman filter, to facilitate calculation.

To explain the main idea behind this technique in a simpler setting, consider the model where unigram models $\beta_t$ (in the natural parameterization) evolve over time. In this model there are no topics and thus no mixing parameters. The calculations are simpler versions of those we need for the more general latent variable models, but exhibit the essential features.
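As a quick numerical illustration of the symmetry property that licenses treating the variational parameters as observations, a Gaussian density with its mean and argument exchanged gives the same value. The snippet below is our own sketch, added only to make this property tangible.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

# f_{mu,sigma}(x) == f_{x,sigma}(mu): swapping the roles of mean and argument leaves the
# density unchanged, so beta_hat_t ~ N(beta_t, nu_t^2) can be read either as an
# "observation" of beta_t or as a likelihood term in beta_t.
x, mu, sigma2 = 1.3, -0.4, 0.25
print(gaussian_pdf(x, mu, sigma2), gaussian_pdf(mu, x, sigma2))  # identical values
```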

Our state space model is

$$\beta_t \mid \beta_{t-1} \sim \mathcal{N}(\beta_{t-1}, \sigma^2 I)$$
$$w_{t,n} \mid \beta_t \sim \mathrm{Mult}(\pi(\beta_t))$$

and we form the variational state space model where

$$\hat\beta_t \mid \beta_t \sim \mathcal{N}(\beta_t, \hat\nu_t^2 I).$$

The variational parameters are $\hat\beta_t$ and $\hat\nu_t$. Using standard Kalman filter calculations (Kalman, 1960), the forward mean and variance of the variational posterior are given by

$$m_t \equiv E(\beta_t \mid \hat\beta_{1:t}) = \left(\frac{\hat\nu_t^2}{V_{t-1} + \sigma^2 + \hat\nu_t^2}\right) m_{t-1} + \left(1 - \frac{\hat\nu_t^2}{V_{t-1} + \sigma^2 + \hat\nu_t^2}\right) \hat\beta_t$$

$$V_t \equiv E\big((\beta_t - m_t)^2 \mid \hat\beta_{1:t}\big) = \left(\frac{\hat\nu_t^2}{V_{t-1} + \sigma^2 + \hat\nu_t^2}\right) (V_{t-1} + \sigma^2)$$

with initial conditions specified by fixed $m_0$ and $V_0$. The backward recursion then calculates the marginal mean and variance of $\beta_t$ given $\hat\beta_{1:T}$ as

$$\widetilde m_{t-1} \equiv E(\beta_{t-1} \mid \hat\beta_{1:T}) = \left(\frac{\sigma^2}{V_{t-1} + \sigma^2}\right) m_{t-1} + \left(1 - \frac{\sigma^2}{V_{t-1} + \sigma^2}\right) \widetilde m_t$$

$$\widetilde V_{t-1} \equiv E\big((\beta_{t-1} - \widetilde m_{t-1})^2 \mid \hat\beta_{1:T}\big) = V_{t-1} + \left(\frac{V_{t-1}}{V_{t-1} + \sigma^2}\right)^2 \left(\widetilde V_t - (V_{t-1} + \sigma^2)\right)$$

with initial conditions $\widetilde m_T = m_T$ and $\widetilde V_T = V_T$.

We approximate the posterior $p(\beta_{1:T} \mid w_{1:T})$ using the state space posterior $q(\beta_{1:T} \mid \hat\beta_{1:T})$. From Jensen's inequality, the log-likelihood is bounded from below as

$$\log p(d_{1:T}) \geq \int q(\beta_{1:T} \mid \hat\beta_{1:T}) \log \frac{p(\beta_{1:T})\, p(d_{1:T} \mid \beta_{1:T})}{q(\beta_{1:T} \mid \hat\beta_{1:T})}\, d\beta_{1:T} \qquad (4)$$
$$= E_q \log p(\beta_{1:T}) + \sum_t E_q \log p(d_t \mid \beta_t) + H(q). \qquad (5)$$

Details of optimizing this bound are given in an appendix.

3.2. Variational Wavelet Regression

The variational Kalman filter can be replaced with variational wavelet regression; for a readable introduction to standard wavelet methods, see Wasserman (2006). We rescale time so it is between 0 and 1. For 128 years of Science we take $n = 2^J$ and $J = 7$. To be consistent with our earlier notation, we assume that $\hat\beta_t = \widetilde m_t + \hat\nu \epsilon_t$ where $\epsilon_t \sim \mathcal{N}(0,1)$. Our variational wavelet regression algorithm estimates $\{\hat\beta_t\}$, which we view as observed data, just as in the Kalman filter method, as well as the noise level $\hat\nu$.

For concreteness, we illustrate the technique using the Haar wavelet basis; Daubechies wavelets are used in our actual examples. The model is then

$$\hat\beta_t = \alpha\,\phi(x_t) + \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} D_{jk}\, \psi_{jk}(x_t)$$

where $x_t = t/n$, $\phi(x) = 1$ for $0 \leq x \leq 1$,

$$\psi(x) = \begin{cases} 1 & \text{if } 0 \leq x \leq \tfrac12 \\ -1 & \text{if } \tfrac12 < x \leq 1 \end{cases}$$

and $\psi_{jk}(x) = 2^{j/2}\psi(2^j x - k)$.

Our variational estimate for the posterior mean becomes

$$\widetilde m_t = \hat\alpha\,\phi(x_t) + \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} \hat D_{jk}\, \psi_{jk}(x_t),$$

where $\hat\alpha = n^{-1}\sum_{t=1}^{n} \hat\beta_t$, and $\hat D_{jk}$ are obtained by thresholding the coefficients

$$Z_{jk} = \frac{1}{n} \sum_{t=1}^{n} \hat\beta_t\, \psi_{jk}(x_t).$$

To estimate $\hat\beta_t$ we use gradient ascent, as for the Kalman filter approximation, requiring the derivatives $\partial \widetilde m_t / \partial \hat\beta_s$. If soft thresholding is used, then we have that

$$\frac{\partial \widetilde m_t}{\partial \hat\beta_s} = \frac{\partial \hat\alpha}{\partial \hat\beta_s}\,\phi(x_t) + \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} \frac{\partial \hat D_{jk}}{\partial \hat\beta_s}\, \psi_{jk}(x_t)$$

with $\partial\hat\alpha/\partial\hat\beta_s = n^{-1}$ and

$$\frac{\partial \hat D_{jk}}{\partial \hat\beta_s} = \begin{cases} \frac{1}{n}\psi_{jk}(x_s) & \text{if } |Z_{jk}| > \lambda \\ 0 & \text{otherwise.} \end{cases}$$

Note also that $|Z_{jk}| > \lambda$ if and only if $\hat D_{jk} \neq 0$. These derivatives can be computed using off-the-shelf software for the wavelet transform in any of the standard wavelet bases.

Sample results of running this and the Kalman variational algorithm to approximate a unigram model are given in Figure 3. Both variational approximations smooth out the local fluctuations in the unigram counts while preserving the sharp peaks that may indicate a significant change of content in the journal.
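For a single component of the natural parameter, the forward and backward recursions of Section 3.1 reduce to a few lines of code. The sketch below is our own illustration of those recursions; the pseudo-observation values and variances in the usage example are made up.

```python
import numpy as np

def variational_kalman(beta_hat, nu2_hat, sigma2, m0=0.0, V0=1.0):
    """Forward-backward recursions for one component of the variational state space model.

    beta_hat : array of variational pseudo-observations beta_hat_1..T
    nu2_hat  : array of variational observation variances nu_hat_t^2
    sigma2   : random-walk variance sigma^2
    """
    T = len(beta_hat)
    m, V = np.empty(T + 1), np.empty(T + 1)
    m[0], V[0] = m0, V0
    # Forward pass: mean and variance of p(beta_t | beta_hat_{1:t})
    for t in range(1, T + 1):
        gain = nu2_hat[t - 1] / (V[t - 1] + sigma2 + nu2_hat[t - 1])
        m[t] = gain * m[t - 1] + (1.0 - gain) * beta_hat[t - 1]
        V[t] = gain * (V[t - 1] + sigma2)
    # Backward pass: mean and variance of p(beta_t | beta_hat_{1:T})
    m_s, V_s = m.copy(), V.copy()
    for t in range(T, 0, -1):
        shrink = sigma2 / (V[t - 1] + sigma2)
        m_s[t - 1] = shrink * m[t - 1] + (1.0 - shrink) * m_s[t]
        V_s[t - 1] = V[t - 1] + (V[t - 1] / (V[t - 1] + sigma2)) ** 2 * (V_s[t] - (V[t - 1] + sigma2))
    return m_s[1:], V_s[1:]   # smoothed means and variances for t = 1..T

# Toy usage with arbitrary values
beta_hat = np.array([0.1, 0.3, 0.2, 0.8, 0.7])
m_tilde, V_tilde = variational_kalman(beta_hat, nu2_hat=np.full(5, 0.5), sigma2=0.05)
print(np.round(m_tilde, 3))
```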

[Figure 3. Comparison of the Kalman filter (top) and wavelet regression (bottom) variational approximations to a unigram model, for the words "Darwin," "Einstein," and "moon." The variational approximations (red and blue curves) smooth out the local fluctuations in the unigram counts (gray curves) of the words shown, while preserving the sharp peaks that may indicate a significant change of content in the journal. The wavelet regression is able to superresolve the double spikes in the occurrence of "Einstein" in the 1920s. The spike in the occurrence of "Darwin" near 1910 may be associated with the centennial of Darwin's birth in 1809.]

While the fit is similar to that obtained using standard wavelet regression to the normalized counts, the estimates are obtained by minimizing the KL divergence as in standard variational approximations. In the dynamic topic model of Section 2, the algorithms are essentially the same as those described above. However, rather than fitting the observations from true observed counts, we fit them from expected counts under the document-level variational distributions in (3).

4. Analysis of Science

We analyzed a subset of 30,000 articles from Science, 250 from each of the 120 years between 1881 and 2000. Our data were collected by JSTOR (www.jstor.org), a not-for-profit organization that maintains an online scholarly archive obtained by running an optical character recognition (OCR) engine over the original printed journals. JSTOR indexes the resulting text and provides online access to the scanned images of the original content through keyword search.

Our corpus is made up of approximately 7.5 million words. We pruned the vocabulary by stemming each term to its root, removing function terms, and removing terms that occurred fewer than 25 times. The total vocabulary size is 15,955. To explore the corpus and its themes, we estimated a 20-component dynamic topic model. Posterior inference took approximately 4 hours on a 1.5 GHz PowerPC Macintosh laptop. Two of the resulting topics are illustrated in Figure 4, showing the top several words from those topics in each decade, according to the posterior mean number of occurrences as estimated using the Kalman filter variational approximation. Also shown are example articles which exhibit those topics through the decades. As illustrated, the model captures different scientific themes, and can be used to inspect trends of word usage within them.

To validate the dynamic topic model quantitatively, we consider the task of predicting the next year of Science given all the articles from the previous years. We compare the predictive power of three 20-topic models: the dynamic topic model estimated from all of the previous years, a static topic model estimated from all of the previous years, and a static topic model estimated from the single previous year. All the models are estimated to the same convergence criterion. The topic model estimated from all the previous data and the dynamic topic model are initialized at the same point. The dynamic topic model performs well; it always assigns higher likelihood to the next year's articles than the other two models (Figure 5). It is interesting that the predictive power of each of the models declines over the years. We can tentatively attribute this to an increase in the rate of specialization in scientific language.
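As an illustration of the vocabulary pruning described above (stemming, removing function terms, and dropping terms that occur fewer than 25 times), the following sketch shows one plausible way to build such a vocabulary. It is our own sketch: the crude stemmer, the tiny stopword list, and the helper names are stand-ins for whatever pipeline the authors actually used.

```python
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it", "was"}  # tiny stand-in list

def crude_stem(token):
    """Very rough suffix stripping; a real pipeline would use a proper stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def build_vocabulary(documents, min_count=25):
    """Stem terms, drop function words, and keep terms occurring at least min_count times."""
    counts = Counter()
    for doc in documents:
        for token in doc.lower().split():
            token = "".join(ch for ch in token if ch.isalpha())
            if token and token not in STOPWORDS:
                counts[crude_stem(token)] += 1
    return {term for term, c in counts.items() if c >= min_count}

# Usage: vocabulary = build_vocabulary(science_articles, min_count=25)
```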

[Figure 4. Examples from the posterior analysis of a 20-topic dynamic model estimated from the Science corpus. For two topics, we illustrate: (a) the top ten words from the inferred posterior distribution at ten-year lags; (b) the posterior estimate of the frequency as a function of year of several words from the same two topics; (c) example articles throughout the collection which exhibit these topics. Note that the plots are scaled to give an idea of the shape of the trajectory of the words' posterior probability (i.e., comparisons across words are not meaningful).]

5. Discussion

We have developed sequential topic models for discrete data by using Gaussian time series on the natural parameters of the multinomial topics and logistic normal topic proportion models. We derived variational inference algorithms that exploit existing techniques for sequential data; we demonstrated a novel use of Kalman filters and wavelet regression as variational approximations. Dynamic topic models can give a more accurate predictive model, and also offer new ways of browsing large, unstructured document collections.

There are many ways that the work described here can be extended. One direction is to use more sophisticated state space models. We have demonstrated the use of a simple Gaussian model, but it would be natural to include a drift term in a more sophisticated autoregressive model to explicitly capture the rise and fall in popularity of a topic, or in the use of specific terms. Another variant would allow for heteroscedastic time series.

Perhaps the most promising extension to the methods presented here is to incorporate a model of how new topics in the collection appear or disappear over time, rather than assuming a fixed number of topics. One possibility is to use a simple Galton-Watson or birth-death process for the topic population. While the analysis of birth-death or branching processes often centers on extinction probabilities, here a goal would be to find documents that may be responsible for spawning new themes in a collection.

[Figure 5. This figure illustrates the performance of using dynamic topic models and static topic models for prediction (negative log likelihood, log scale, by year). For each year between 1900 and 2000, at 5-year increments, we estimated three models on the articles through that year. We then computed the variational bound on the negative log likelihood of next year's articles under the resulting model (lower numbers are better). DTM is the dynamic topic model; LDA-prev is a static topic model estimated on just the previous year's articles; LDA-all is a static topic model estimated on all the previous articles.]

Acknowledgments

This research was supported in part by NSF grants IIS and IIS, the DARPA CALO project, and a grant from Google.

References

Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society, Series B, 44(2):139-177.

Blei, D., Ng, A., and Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022.

Blei, D. M. and Lafferty, J. D. (2006). Correlated topic models. In Weiss, Y., Schölkopf, B., and Platt, J., editors, Advances in Neural Information Processing Systems 18. MIT Press, Cambridge, MA.

Buntine, W. and Jakulin, A. (2004). Applying discrete PCA in data analysis. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. AUAI Press.

Erosheva, E. (2002). Grade of membership and latent structure models with application to disability survey data. PhD thesis, Carnegie Mellon University, Department of Statistics.

Fei-Fei, L. and Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. IEEE Computer Vision and Pattern Recognition.

Griffiths, T. and Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101:5228-5235.

Kalman, R. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME: Journal of Basic Engineering, 82:35-45.

McCallum, A., Corrada-Emmanuel, A., and Wang, X. (2004). The author-recipient-topic model for topic and role discovery in social networks: Experiments with Enron and academic email. Technical report, University of Massachusetts, Amherst.

Pritchard, J., Stephens, M., and Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics, 155:945-959.

Rosen-Zvi, M., Griffiths, T., Steyvers, M., and Smith, P. (2004). The author-topic model for authors and documents. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. AUAI Press.

Sivic, J., Russell, B., Efros, A., Zisserman, A., and Freeman, W. (2005). Discovering objects and their location in images. In International Conference on Computer Vision (ICCV 2005).

Snelson, E. and Ghahramani, Z. (2006). Sparse Gaussian processes using pseudo-inputs. In Weiss, Y., Schölkopf, B., and Platt, J., editors, Advances in Neural Information Processing Systems 18. MIT Press, Cambridge, MA.

Wasserman, L. (2006). All of Nonparametric Statistics. Springer.

West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models. Springer.

A. Derivation of Variational Algorithm

In this appendix we give some details of the variational algorithm outlined in Section 3.1, which calculates a distribution $q(\beta_{1:T} \mid \hat\beta_{1:T})$ to maximize the lower bound on $\log p(d_{1:T})$.

The first term of the right-hand side of (5) is

$$\sum_{t} E_q \log p(\beta_t \mid \beta_{t-1}) = -\frac{VT}{2}\left(\log \sigma^2 + \log 2\pi\right) - \frac{1}{2\sigma^2} \sum_{t} E_q (\beta_t - \beta_{t-1})^{\top}(\beta_t - \beta_{t-1})$$
$$= -\frac{VT}{2}\left(\log \sigma^2 + \log 2\pi\right) - \frac{1}{2\sigma^2} \sum_{t} \left\lVert \widetilde m_t - \widetilde m_{t-1} \right\rVert^2 - \frac{1}{\sigma^2} \sum_{t} \mathrm{Tr}\big(\widetilde V_t\big) + \frac{1}{2\sigma^2}\left(\mathrm{Tr}\big(\widetilde V_0\big) - \mathrm{Tr}\big(\widetilde V_T\big)\right)$$

using the Gaussian quadratic form identity

$$E_{m,V}\left[(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right] = (m-\mu)^{\top}\Sigma^{-1}(m-\mu) + \mathrm{Tr}\big(\Sigma^{-1}V\big).$$

The second term of (5) is

$$E_q \log p(d_t \mid \beta_t) = \sum_{w} n_{t,w}\, E_q[\beta_{t,w}] - n_t\, E_q \log \sum_{w} \exp(\beta_{t,w})$$
$$\geq \sum_{w} n_{t,w}\, \widetilde m_{t,w} - n_t\, \hat\zeta_t^{-1} \sum_{w} \exp\!\left(\widetilde m_{t,w} + \widetilde V_{t,w}/2\right) + n_t - n_t \log \hat\zeta_t,$$

where $n_t = \sum_w n_{t,w}$, introducing additional variational parameters $\hat\zeta_{1:T}$. The third term of (5) is the entropy

$$H(q) = \sum_t \left( \frac{1}{2} \sum_w \log \widetilde V_{t,w} + \frac{V}{2}\log 2\pi \right).$$

To maximize the lower bound as a function of the variational parameters we use a conjugate gradient algorithm. First, we maximize with respect to $\hat\zeta$; the derivative is

$$\frac{\partial \ell}{\partial \hat\zeta_t} = \frac{n_t}{\hat\zeta_t^2} \sum_w \exp\!\left(\widetilde m_{t,w} + \widetilde V_{t,w}/2\right) - \frac{n_t}{\hat\zeta_t}.$$

Setting to zero and solving for $\hat\zeta_t$ gives

$$\hat\zeta_t = \sum_w \exp\!\left(\widetilde m_{t,w} + \widetilde V_{t,w}/2\right).$$

Next, we maximize with respect to $\hat\beta_s$:

$$\frac{\partial \ell(\hat\beta, \hat\nu)}{\partial \hat\beta_{s,w}} = -\frac{1}{\sigma^2}\sum_{t=1}^{T} \left(\widetilde m_{t,w} - \widetilde m_{t-1,w}\right)\left(\frac{\partial \widetilde m_{t,w}}{\partial \hat\beta_{s,w}} - \frac{\partial \widetilde m_{t-1,w}}{\partial \hat\beta_{s,w}}\right) + \sum_{t=1}^{T}\left(n_{t,w} - n_t\, \hat\zeta_t^{-1} \exp\!\left(\widetilde m_{t,w} + \widetilde V_{t,w}/2\right)\right)\frac{\partial \widetilde m_{t,w}}{\partial \hat\beta_{s,w}}.$$

The forward-backward equations for $\widetilde m_t$ can be used to derive a recurrence for $\partial \widetilde m_t / \partial \hat\beta_s$. The forward recurrence is

$$\frac{\partial m_t}{\partial \hat\beta_s} = \left(\frac{\hat\nu_t^2}{V_{t-1}+\sigma^2+\hat\nu_t^2}\right)\frac{\partial m_{t-1}}{\partial \hat\beta_s} + \left(1 - \frac{\hat\nu_t^2}{V_{t-1}+\sigma^2+\hat\nu_t^2}\right)\delta_{s,t},$$

with the initial condition $\partial m_0 / \partial \hat\beta_s = 0$. The backward recurrence is then

$$\frac{\partial \widetilde m_{t-1}}{\partial \hat\beta_s} = \left(\frac{\sigma^2}{V_{t-1}+\sigma^2}\right)\frac{\partial m_{t-1}}{\partial \hat\beta_s} + \left(1 - \frac{\sigma^2}{V_{t-1}+\sigma^2}\right)\frac{\partial \widetilde m_t}{\partial \hat\beta_s},$$

with the initial condition $\partial \widetilde m_T / \partial \hat\beta_s = \partial m_T / \partial \hat\beta_s$.
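As a small self-check of the closed-form $\hat\zeta_t$ update, the derivative above vanishes exactly at $\hat\zeta_t = \sum_w \exp(\widetilde m_{t,w} + \widetilde V_{t,w}/2)$. The snippet below is our own numerical illustration with arbitrary values.

```python
import numpy as np

rng = np.random.default_rng(1)
m_tilde = rng.normal(size=8)          # arbitrary smoothed means for one time slice
V_tilde = rng.uniform(0.1, 0.5, 8)    # arbitrary smoothed variances
n_t = 100.0                           # total word count in the slice

def dl_dzeta(zeta):
    """Derivative of the bound with respect to zeta_hat_t."""
    s = np.exp(m_tilde + V_tilde / 2).sum()
    return n_t * s / zeta**2 - n_t / zeta

zeta_star = np.exp(m_tilde + V_tilde / 2).sum()   # closed-form update
print(dl_dzeta(zeta_star))   # ~0: the update sets the derivative to zero
```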
