Online Multiscale Dynamic Topic Models

Online Multiscale Dynamic Topic Models

Tomoharu Iwata, Takeshi Yamada, Yasushi Sakurai, Naonori Ueda
NTT Communication Science Laboratories
2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan

ABSTRACT

We propose an online topic model for sequentially analyzing the time evolution of topics in document collections. Topics naturally evolve with multiple timescales. For example, some words may be used consistently over one hundred years, while other words emerge and disappear over periods of a few days. Thus, in the proposed model, the current topic-specific distributions over words are assumed to be generated based on the multiscale word distributions of the previous epoch. Considering the long-timescale dependency as well as the short-timescale dependency yields a more robust model. We derive efficient online inference procedures based on a stochastic EM algorithm, in which the model is sequentially updated using newly obtained data; this means that past data are not required to make the inference. We demonstrate the effectiveness of the proposed method in terms of predictive performance and computational efficiency by examining collections of real documents with timestamps.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications - Data Mining; I.2.6 [Artificial Intelligence]: Learning; I.5.1 [Pattern Recognition]: Models - Statistical

General Terms: Algorithms

Keywords: Topic model, Time-series analysis, Online learning

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD'10, July 25-28, 2010, Washington, DC, USA. Copyright 2010 ACM.

1. INTRODUCTION

Great interest is being shown in developing topic models that can analyze and summarize the dynamics of document collections, such as scientific papers, news articles, and blogs [1, 5, 7, 11, 14, 20, 21, 22]. A topic model is a hierarchical probabilistic model in which a document is modeled as a mixture of topics, and a topic is modeled as a probability distribution over words. Topic models have been used successfully in a wide variety of applications including information retrieval [6], collaborative filtering [10], and visualization [12], as well as the analysis of dynamics.

In this paper, we propose a topic model that permits the sequential analysis of the dynamics of topics with multiple timescales, which we call the Multiscale Dynamic Topic Model (MDTM), together with efficient online inference procedures. Topics naturally evolve with multiple timescales. Consider the topic "politics" in a news article collection as an example. Some words, such as "constitution", "congress", and "president", appear frequently over many years. Other words, such as the names of members of Congress, may appear frequently over periods of tens of years, and still others, such as the names of bills under discussion, may appear for only a few days. Thus, in MDTM, the current topic-specific distributions over words are assumed to be generated based on the estimates of multiple timescale word distributions at the previous epoch.
Using these multiscale priors improves the predictive performance of the model, because information loss is reduced by considering the long-timescale dependency as well as the short-timescale dependency. The online inference and parameter estimation can be achieved efficiently with a stochastic EM algorithm, in which the model is sequentially updated using newly obtained data; past data need not be stored and processed to make new inferences. Some topics may exhibit strong long-timescale dependence, while others may exhibit strong short-timescale dependence, and the dependence may also change over time. Therefore, we infer these dependencies for each timescale, for each topic, and for each epoch. By inferring the dependencies from the observed data, MDTM can flexibly adapt to topic dynamics. A disadvantage of online inference is that it can be more unstable than batch inference. With MDTM, the stability can be improved by smoothing using multiple estimates with different timescales.

The remainder of this paper is organized as follows. In Section 2, we formulate a topic model for multiscale dynamics and describe its online inference procedures. In Section 3, we briefly review related work. In Section 4, we demonstrate the effectiveness of the proposed method by analyzing the dynamics of real document collections. Finally, we present concluding remarks and a discussion of future work in Section 5.

Table 1: Notation
  D_t            number of documents at epoch t
  N_{t,d}        number of words in the dth document at epoch t
  W              number of unique words
  w_{t,d,n}      nth word in the dth document at epoch t, w_{t,d,n} ∈ {1, ..., W}
  Z              number of topics
  z_{t,d,n}      topic of the nth word in the dth document at epoch t, z_{t,d,n} ∈ {1, ..., Z}
  S              number of scales
  θ_{t,d}        multinomial distribution over topics for the dth document at epoch t,
                 θ_{t,d} = {θ_{t,d,z}}_{z=1}^{Z}, θ_{t,d,z} ≥ 0, Σ_z θ_{t,d,z} = 1
  φ_{t,z}        multinomial distribution over words for the zth topic at epoch t,
                 φ_{t,z} = {φ_{t,z,w}}_{w=1}^{W}, φ_{t,z,w} ≥ 0, Σ_w φ_{t,z,w} = 1
  φ^{(s)}_{t,z}  multinomial distribution over words for the zth topic with scale s at epoch t,
                 φ^{(s)}_{t,z} = {φ^{(s)}_{t,z,w}}_{w=1}^{W}, φ^{(s)}_{t,z,w} ≥ 0, Σ_w φ^{(s)}_{t,z,w} = 1

2. PROPOSED METHOD

2.1 Preliminaries

In the proposed model, documents are assumed to be generated sequentially at each epoch. Suppose we have a set of D_t documents at the current epoch t, and each document is represented by w_{t,d} = {w_{t,d,n}}_{n=1}^{N_{t,d}}, i.e. the set of words in the document. Our notation is summarized in Table 1. We assume that epoch t is a discrete variable, and we can set the time period of an epoch arbitrarily at, for example, one day or one year.

Before introducing the proposed model, we review latent Dirichlet allocation (LDA) [6, 8], which forms the basis of the proposed model. In LDA, each document has topic proportions θ_{t,d}. For each of the N_{t,d} words in the document, topic z_{t,d,n} is chosen from the topic proportions, and then word w_{t,d,n} is generated from a topic-specific multinomial distribution over words φ_{z_{t,d,n}}. Topic proportions θ_{t,d} and word distributions φ_z are assumed to be generated according to symmetric Dirichlet distributions. Figure 1 (a) shows a graphical model representation of LDA, where shaded and unshaded nodes indicate observed and latent variables, respectively.

2.2 Model

We consider a set of multiple timescale distributions over words for each topic in order to incorporate multiple timescale properties. To account for the influence of the past at different timescales on the current epoch, we assume that the current topic-specific word distributions φ_{t,z} are generated according to the multiscale word distributions at the previous epoch, {φ^{(s)}_{t-1,z}}_{s=1}^{S}. Here, φ^{(s)}_{t-1,z} = {φ^{(s)}_{t-1,z,w}}_{w=1}^{W} represents a distribution over words of topic z with scale s at epoch t-1.

[Figure 2: Illustration of multiscale word distributions at epoch t with S = 4. Each histogram shows φ^{(s)}_{t-1,z}, a multinomial distribution over words with timescale s.]

In particular, we use the following asymmetric Dirichlet distribution as the prior of the current word distribution φ_{t,z}, in which the Dirichlet parameters are defined so that the mean becomes proportional to the weighted sum of the multiscale word distributions at the previous epoch,

  φ_{t,z} ∼ Dirichlet( Σ_{s=0}^{S} λ_{t,z,s} φ^{(s)}_{t-1,z} ),   (1)

where λ_{t,z,s} is the weight of scale s in topic z at epoch t, and λ_{t,z,s} > 0. By estimating the weights {λ_{t,z,s}}_{s=0}^{S} for each epoch, for each topic, and for each timescale using the current data, as described in Section 2.3, MDTM can flexibly respond to the influence of the previous short- and long-timescale distributions on the current distribution. The estimated multiscale word distributions {φ^{(s)}_{t-1,z}}_{s=1}^{S} at the previous epoch are treated as hyperparameters in the current epoch. Their estimation will be explained in Section 2.4.
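As a concrete illustration of the prior in (1), the following numpy sketch builds the Dirichlet parameter as the weighted sum of the previous epoch's multiscale word distributions and draws a current word distribution from it. The sizes, the random stand-ins for φ^{(s)}_{t-1,z}, and the weights are made-up values for illustration, not settings from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: S = 4 scales plus the uniform scale s = 0, vocabulary of W words.
W, S = 1000, 4
phi_prev = rng.dirichlet(np.ones(W), size=S + 1)  # stand-ins for phi^(s)_{t-1,z}
phi_prev[0] = 1.0 / W                             # s = 0 is the uniform distribution
lam = rng.gamma(1.0, 1.0, size=S + 1)             # positive weights lambda_{t,z,s}

# Equation (1): the Dirichlet parameter is the weighted sum of the previous
# epoch's multiscale word distributions, so the prior mean of phi_{t,z} is
# proportional to that weighted sum.
dirichlet_param = lam @ phi_prev                  # shape (W,)
phi_t = rng.dirichlet(dirichlet_param)            # one draw of phi_{t,z}
prior_mean = dirichlet_param / dirichlet_param.sum()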
There are many different ways of setting the scales, but for simplicity of explanation we set them so that φ^{(s)}_{t,z} represents the word distribution from epoch t-2^{s-1}+1 to epoch t, where a larger s corresponds to a longer timescale, and φ^{(1)}_{t,z} is equivalent to the estimate of the unit-time word distribution φ_{t,z}. We use the uniform word distribution φ^{(0)}_{t,z,w} = W^{-1} for scale s = 0; this uniform distribution is used to avoid the zero probability problem. Figure 2 illustrates multiscale word distributions under this setting. Word distributions tend to be smoother as the timescale becomes longer, and more peaked as the timescale becomes shorter. By using the information at these various timescales as the prior of the current distribution, with appropriate weights, we can infer the current distribution more robustly.

Instead of using 2^{s-1} epochs for scale s, we can use any number of epochs. For example, if we know that the given data exhibit periodicity, e.g. of one week and one month, we can use a scale of one week for s = 1 and one month for s = 2. In such a case, we can still estimate the parameters in a similar way with the algorithm described in Section 2.4. Typically, however, we do not know the periodicity of the given data in advance; we therefore consider the simple scale setting in this paper.

In LDA, topic proportions θ_{t,d} are sampled from a Dirichlet distribution. In order to capture the dynamics of topic proportions with MDTM, we assume that the Dirichlet parameters α_t = {α_{t,z}}_{z=1}^{Z} depend on the previous parameters. In particular, we use the following Gamma prior for the Dirichlet parameter of topic z at epoch t,

  α_{t,z} ∼ Gamma(γ α_{t-1,z}, γ),   (2)

where the mean is α_{t-1,z} and the variance is α_{t-1,z}/γ. With this prior, the mean stays the same as at the previous epoch unless otherwise indicated by the new data. Parameter γ controls the temporal consistency of the topic proportion prior.
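The mean and variance stated after (2) can be checked with a few lines of numpy; the values of α_{t-1,z} and γ below are arbitrary and only illustrate how a larger γ ties α_{t,z} more tightly to the previous epoch.

import numpy as np

rng = np.random.default_rng(0)

alpha_prev = 0.4    # alpha_{t-1,z} (illustrative value)
gamma = 10.0        # temporal consistency parameter (illustrative value)

# numpy's Gamma is parameterized by shape and scale, so Gamma(gamma*alpha_prev, gamma)
# with rate gamma corresponds to scale = 1/gamma.
samples = rng.gamma(shape=gamma * alpha_prev, scale=1.0 / gamma, size=100_000)
print(samples.mean())  # close to alpha_prev
print(samples.var())   # close to alpha_prev / gamma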

[Figure 1: Graphical models of (a) latent Dirichlet allocation, (b) the multiscale dynamic topic model, and (c) its online inference version.]

Assuming that we have already calculated the multiscale parameters at epoch t-1, Ξ_{t-1} = {{φ^{(s)}_{t-1,z}}_{s=0}^{S}}_{z=1}^{Z} and α_{t-1} = {α_{t-1,z}}_{z=1}^{Z}, MDTM is characterized by the following generative process for the set of documents W_t = {w_{t,d}}_{d=1}^{D_t} at epoch t (see also the code sketch below):

1. For each topic z = 1, ..., Z:
   (a) Draw the topic proportion prior α_{t,z} ∼ Gamma(γ α_{t-1,z}, γ),
   (b) Draw the word distribution φ_{t,z} ∼ Dirichlet( Σ_s λ_{t,z,s} φ^{(s)}_{t-1,z} ).
2. For each document d = 1, ..., D_t:
   (a) Draw topic proportions θ_{t,d} ∼ Dirichlet(α_t),
   (b) For each word n = 1, ..., N_{t,d}:
       i. Draw topic z_{t,d,n} ∼ Multinomial(θ_{t,d}),
       ii. Draw word w_{t,d,n} ∼ Multinomial(φ_{t,z_{t,d,n}}).

Figure 1 (b) shows a graphical model representation of MDTM.

2.3 Online inference

We present an online inference algorithm for MDTM that sequentially updates the model at each epoch using the newly obtained document set and the multiscale model of the previous epoch. The information in the data up to and including the previous epoch is aggregated into the previous multiscale model. The online inference and parameter estimation can be achieved efficiently by a stochastic EM algorithm [2, 3], in which collapsed Gibbs sampling of the latent topics [8] and maximum likelihood estimation of the hyperparameters are performed alternately [19].

We assume that the set of documents W_t at the current epoch t and the parameter estimates from the previous epoch, α_{t-1} and Ξ_{t-1}, are given. The joint distribution of the set of documents, the set of topics, and the topic proportion priors given the parameters is defined as follows,

  P(W_t, Z_t, α_t | α_{t-1}, γ, Ξ_{t-1}, Λ_t) = P(α_t | α_{t-1}, γ) P(Z_t | α_t) P(W_t | Z_t, Ξ_{t-1}, Λ_t),   (3)

where Z_t = {{z_{t,d,n}}_{n=1}^{N_{t,d}}}_{d=1}^{D_t} represents the set of topics, and Λ_t = {{λ_{t,z,s}}_{s=0}^{S}}_{z=1}^{Z} represents the set of weights. Using (2), the first term on the right hand side of (3) is

  P(α_t | α_{t-1}, γ) = Π_z ( γ^{γ α_{t-1,z}} / Γ(γ α_{t-1,z}) ) α_{t,z}^{γ α_{t-1,z} - 1} exp(-γ α_{t,z}),   (4)

where Γ(·) is the gamma function. We can integrate out the multinomial distribution parameters of MDTM, {θ_{t,d}}_{d=1}^{D_t} and {φ_{t,z}}_{z=1}^{Z}, by taking advantage of Dirichlet-multinomial conjugacy. The second term is calculated by P(Z_t | α_t) = Π_{d=1}^{D_t} ∫ P(z_{t,d} | θ_{t,d}) P(θ_{t,d} | α_t) dθ_{t,d}, and integrating out {θ_{t,d}}_{d=1}^{D_t} gives

  P(Z_t | α_t) = ( Γ(Σ_z α_{t,z}) / Π_z Γ(α_{t,z}) )^{D_t} Π_d ( Π_z Γ(N_{t,d,z} + α_{t,z}) / Γ(N_{t,d} + Σ_z α_{t,z}) ),   (5)

where N_{t,d,z} is the number of words in the dth document assigned to topic z at epoch t, and N_{t,d} = Σ_z N_{t,d,z}. Similarly, by integrating out {φ_{t,z}}_{z=1}^{Z}, the third term is given as follows,

  P(W_t | Z_t, Ξ_{t-1}, Λ_t) = Π_z ( Γ(Σ_s λ_{t,z,s}) / Π_w Γ(Σ_s λ_{t,z,s} φ^{(s)}_{t-1,z,w}) ) ( Π_w Γ(N_{t,z,w} + Σ_s λ_{t,z,s} φ^{(s)}_{t-1,z,w}) / Γ(N_{t,z} + Σ_s λ_{t,z,s}) ),   (6)

where N_{t,z,w} is the number of times word w is assigned to topic z at epoch t, and N_{t,z} = Σ_w N_{t,z,w}.
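The generative process above can be written out directly; the following is a minimal, illustrative numpy sketch. The sizes, the previous-epoch parameters, and the document lengths are invented for the example (document length is not part of the model).

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch of MDTM's generative process for one epoch t.
Z, W, S, D_t = 5, 200, 3, 20
gamma = 10.0
alpha_prev = np.full(Z, 0.1)                           # alpha_{t-1,z}
phi_prev = rng.dirichlet(np.ones(W), size=(Z, S + 1))  # phi^(s)_{t-1,z}, s = 0..S
phi_prev[:, 0, :] = 1.0 / W                            # scale 0 is the uniform distribution
lam = rng.gamma(1.0, 1.0, size=(Z, S + 1))             # weights lambda_{t,z,s}

# Step 1: per-topic draws, equations (2) and (1).
alpha_t = rng.gamma(gamma * alpha_prev, 1.0 / gamma)
phi_t = np.array([rng.dirichlet(lam[z] @ phi_prev[z]) for z in range(Z)])

# Step 2: per-document draws of topic proportions, topics, and words.
docs = []
for d in range(D_t):
    theta = rng.dirichlet(alpha_t)
    n_words = rng.poisson(50) + 1          # document length, not part of the model
    z_dn = rng.choice(Z, size=n_words, p=theta)
    w_dn = np.array([rng.choice(W, p=phi_t[z]) for z in z_dn])
    docs.append((z_dn, w_dn))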

The latent topics Z_t can be inferred efficiently with collapsed Gibbs sampling [8]. Let j = (t, d, n) for notational convenience, and let z_j be the latent topic assigned to the nth word in the dth document at epoch t. Then, given the current state of all variables but z_j, a new value for z_j is sampled from the following probability,

  P(z_j = k | W_t, Z_t\j, α_t, Ξ_{t-1}, Λ_t) ∝ ( (N_{t,d,k\j} + α_{t,k}) / (N_{t,d\j} + Σ_z α_{t,z}) ) × ( (N_{t,k,w_j\j} + Σ_s λ_{t,k,s} φ^{(s)}_{t-1,k,w_j}) / (N_{t,k\j} + Σ_s λ_{t,k,s}) ),   (7)

where \j denotes a count computed excluding the nth word of the dth document.

The parameters α_t and Λ_t are estimated by maximizing the joint distribution (3). The fixed-point iteration method described in [13] can be used for this maximization as follows,

  α_{t,z} ← ( γ α_{t-1,z} - 1 + α_{t,z} Σ_d ( Ψ(N_{t,d,z} + α_{t,z}) - Ψ(α_{t,z}) ) ) / ( γ + Σ_d ( Ψ(N_{t,d} + Σ_{z'} α_{t,z'}) - Ψ(Σ_{z'} α_{t,z'}) ) ),   (8)

where Ψ(·) is the digamma function, Ψ(x) = ∂ log Γ(x) / ∂x, and

  λ_{t,z,s} ← λ_{t,z,s} ( Σ_w φ^{(s)}_{t-1,z,w} A_{t,z,w} ) / B_{t,z},   (9)

where

  A_{t,z,w} = Ψ( N_{t,z,w} + Σ_{s'} λ_{t,z,s'} φ^{(s')}_{t-1,z,w} ) - Ψ( Σ_{s'} λ_{t,z,s'} φ^{(s')}_{t-1,z,w} ),   (10)

  B_{t,z} = Ψ( N_{t,z} + Σ_{s'} λ_{t,z,s'} ) - Ψ( Σ_{s'} λ_{t,z,s'} ).   (11)

By iterating Gibbs sampling with (7) and maximum likelihood estimation with (8) and (9), we can infer the latent topics while optimizing the parameters. Since MDTM uses the past distributions as the current prior, the label switching problem [17] is unlikely to occur when the estimated λ_{t,z,s} is high, which implies that the current topics depend strongly on the previous distributions. Label switching can occur when the estimated λ_{t,z,s} is low. By allowing low λ_{t,z,s}, which is estimated from the given data for each epoch and each topic, MDTM can adapt flexibly to changes even if existing topics disappear and new topics appear in midstream.

2.4 Efficient estimation of multiscale word distributions

Using the topic assignments obtained after iterating the stochastic EM algorithm, we can estimate the multiscale word distributions. Since φ^{(s)}_{t,z,w} represents the probability of word w in topic z from epoch t-2^{s-1}+1 to epoch t, the estimate is

  φ^{(s)}_{t,z,w} = N̂^{(s)}_{t,z,w} / Σ_{w'} N̂^{(s)}_{t,z,w'},   where   N̂^{(s)}_{t,z,w} = Σ_{t'=t-2^{s-1}+1}^{t} N̂_{t',z,w},   (12)

N̂^{(s)}_{t,z,w} is the expected number of times word w was assigned to topic z from epoch t-2^{s-1}+1 to epoch t, and N̂_{t,z,w} is the expected number at epoch t. The expected number is calculated by N̂_{t,z,w} = N_{t,z} φ̂_{t,z,w}, where φ̂_{t,z,w} is a point estimate of the probability of word w in topic z at epoch t. Although we integrated out φ_{t,z,w}, we can recover its point estimate as follows,

  φ̂_{t,z,w} = ( N_{t,z,w} + Σ_s λ_{t,z,s} φ^{(s)}_{t-1,z,w} ) / ( N_{t,z} + Σ_s λ_{t,z,s} ).   (13)

While it would be simpler to use the actual counts N_{t,z,w} instead of the expected counts N̂_{t,z,w} in (12), we use the latter in order to constrain the estimate of φ^{(1)}_{t,z,w} to coincide with the point estimate φ̂_{t,z,w},

  φ^{(1)}_{t,z,w} = N̂_{t,z,w} / Σ_{w'} N̂_{t,z,w'} = φ̂_{t,z,w}.   (14)

Note that N̂^{(s)}_{t,z,w} can be updated sequentially from its previous value as follows,

  N̂^{(s)}_{t,z,w} = N̂^{(s)}_{t-1,z,w} + N̂_{t,z,w} - N̂_{t-2^{s-1},z,w}.   (15)

Therefore, N̂^{(s)}_{t,z,w} can be updated with just two additions instead of 2^{s-1} additions. However, to perform this update we still need to store the values N̂_{t',z,w} from epoch t-2^{S-1} to epoch t-1, which means that O(2^{S-1} Z W) memory is required in total for updating the multiscale word distributions. Since this memory requirement increases exponentially with the number of scales, it prevents us from modeling long-timescale dynamics. We therefore approximate the update by decreasing the update frequency of the long-timescale distributions, as shown in Figure 3; this reduces the memory requirement to O(SZW), which is linear in the number of scales.

Figure 3: Algorithm for the approximate update of N̂^{(s)}_{t,z,w}.
  1: N̂^{(1)}_{t,z,w} ← N̂_{t,z,w}
  2: for s = 2, ..., S do
  3:   if t mod 2^{s-1} = 0 then
  4:     N̂^{(s)}_{t,z,w} ← N̂^{(s-1)}_{t,z,w} + N̂^{(s-1)}_{t-1,z,w}
  5:   else
  6:     N̂^{(s)}_{t,z,w} ← N̂^{(s)}_{t-1,z,w}
  7:   end if
  8: end for
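Putting the pieces of Sections 2.3 and 2.4 together, one epoch of the update can be sketched as below. This is an illustrative implementation under several assumptions (the array layouts, the toy initialization, and the positivity safeguard on α are ours, not the paper's), but the ingredients are the ones described above: the collapsed Gibbs step (7), the fixed-point updates (8)-(11), and the point estimate (13).

import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)

def epoch_inference(docs, phi_prev, lam, alpha, alpha_prev, gamma, n_sweeps=50):
    """One epoch of stochastic EM for MDTM (illustrative sketch).
    docs: list of 1-D integer arrays of word ids observed at epoch t.
    phi_prev: (Z, S+1, W) multiscale word distributions of epoch t-1 (s = 0 is uniform).
    lam: (Z, S+1) scale weights, alpha / alpha_prev: (Z,) Dirichlet parameters."""
    Z, _, W = phi_prev.shape
    prior = np.einsum('zs,zsw->zw', lam, phi_prev)   # sum_s lambda_{t,z,s} phi^(s)_{t-1,z,w}
    assign = [rng.integers(Z, size=len(doc)) for doc in docs]
    n_dz = np.zeros((len(docs), Z)); n_zw = np.zeros((Z, W)); n_z = np.zeros(Z)
    for d, doc in enumerate(docs):
        for w, z in zip(doc, assign[d]):
            n_dz[d, z] += 1; n_zw[z, w] += 1; n_z[z] += 1

    for _ in range(n_sweeps):
        # E-step: one collapsed Gibbs sweep, equation (7).  The factor
        # N_{t,d\j} + sum_z alpha_{t,z} is constant in k and is dropped.
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                z_old = assign[d][n]
                n_dz[d, z_old] -= 1; n_zw[z_old, w] -= 1; n_z[z_old] -= 1
                p = (n_dz[d] + alpha) * (n_zw[:, w] + prior[:, w]) / (n_z + lam.sum(axis=1))
                z_new = rng.choice(Z, p=p / p.sum())
                assign[d][n] = z_new
                n_dz[d, z_new] += 1; n_zw[z_new, w] += 1; n_z[z_new] += 1

        # M-step: fixed-point updates, equations (8) and (9)-(11).
        num = gamma * alpha_prev - 1 + alpha * (digamma(n_dz + alpha) - digamma(alpha)).sum(axis=0)
        den = gamma + (digamma(n_dz.sum(axis=1) + alpha.sum()) - digamma(alpha.sum())).sum()
        alpha = np.maximum(num / den, 1e-10)          # positivity safeguard added for this sketch
        A = digamma(n_zw + prior) - digamma(prior)                        # (10)
        B = digamma(n_z + lam.sum(axis=1)) - digamma(lam.sum(axis=1))     # (11)
        lam = lam * np.einsum('zsw,zw->zs', phi_prev, A) / np.maximum(B, 1e-10)[:, None]  # (9)
        prior = np.einsum('zs,zsw->zw', lam, phi_prev)

    # Point estimate of the current word distributions, equation (13).
    phi_hat = (n_zw + prior) / (n_z + lam.sum(axis=1))[:, None]
    return assign, alpha, lam, phi_hat

# Toy usage with made-up sizes.
Z, W, S = 3, 100, 3
phi_prev = rng.dirichlet(np.ones(W), size=(Z, S + 1)); phi_prev[:, 0, :] = 1.0 / W
docs = [rng.integers(W, size=40) for _ in range(20)]
result = epoch_inference(docs, phi_prev, np.ones((Z, S + 1)), np.full(Z, 0.1),
                         np.full(Z, 0.1), gamma=1.0, n_sweeps=10)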
[Figure 4: Illustration of the approximate update from t = 4 to t = 8 with S = 3.]

Figure 4 illustrates the approximate update of N̂^{(s)}_{t,z,w} with S = 3 from t = 4 to t = 8. Each rectangle represents N̂_{t',z,w}, where the number indicates the epoch t'. Each row at each epoch represents N̂^{(s)}_{t,z,w}, and shaded rectangles indicate values that differ from the previous values. N̂^{(s)}_{t,z,w} is updated only at every 2^{s-1}th epoch. Since the dynamics of a word distribution at a long timescale can be considered slower than those at a short timescale, this approximation, which decreases the update frequency of the long-timescale distributions, is reasonable. With this approximation, updating N̂^{(s)}_{t,z,w} requires storing only the previous values N̂^{(s)}_{t-1,z,w} for s = 1, ..., S, and so the memory requirement is O(SZW). Figure 1 (c) shows a graphical model representation of online inference in MDTM.

For the Dirichlet prior parameter of the word distribution, we use the weighted sum of the multiscale word distributions as in (1). This parameter can be rewritten as a weighted sum of the word distributions of the individual past epochs as follows,

  Σ_{s=1}^{S} λ_{t,z,s} φ^{(s)}_{t-1,z,w} = Σ_{t'=t-2^{S-1}}^{t-1} λ'_{t,z,t'} φ̂_{t',z,w},   (16)

where

  λ'_{t,z,t'} = Σ_{s: t' ≥ t-2^{s-1}} λ_{t,z,s} N̂_{t',z} / N̂^{(s)}_{t-1,z},   (17)

is its weight, N̂_{t',z} = Σ_w N̂_{t',z,w}, and N̂^{(s)}_{t-1,z} = Σ_{t''=t-2^{s-1}}^{t-1} N̂_{t'',z}. See the Appendix for the derivation. Therefore, the multiscale dynamic topic model can be seen as an approximation of a model that depends on the word distributions of each of the previous epochs. By considering multiscale word distributions, the number of weight parameters Λ_t is decreased from O(2^{S-1} Z) to O(SZ), which leads to more robust inference. Furthermore, as described above, the use of multiple scales also decreases the memory requirement from O(2^{S-1} Z W) to O(SZW).

3. RELATED WORK

A number of methods for analyzing the evolution of topics in document collections have been proposed, such as the dynamic topic model [5], topics over time [21], online latent Dirichlet allocation [1], and the topic tracking model [11]. However, none of these methods takes account of multiscale dynamics. For example, the dynamic topic model (DTM) [5] depends only on the distribution of the previous epoch, whereas MDTM depends on multiple distributions with different timescales. Therefore, with MDTM, we can model the multiple timescale dependency and so infer the current model more robustly. Moreover, while DTM uses a Gaussian distribution to account for the dynamics, the proposed model uses conjugate priors, so inference in MDTM is relatively simple compared with that in DTM.

The multiscale topic tomography model (MTTM) [14] can analyze the evolution of topics at various timescale resolutions by assuming non-homogeneous Poisson processes. In contrast, MDTM models the topic evolution within the Dirichlet-multinomial framework, as do most topic models including latent Dirichlet allocation [6]. Another advantage of MDTM over MTTM is that it can make inferences in an online fashion. Therefore, MDTM can greatly reduce the computational cost as well as the memory requirements, because past data need not be stored. Online inference is essential for modeling the dynamics of document collections in which large numbers of documents continue to accumulate at every moment, such as news articles and blogs, because it is necessary to adapt to new data immediately for topic tracking, and it is impractical to prepare sufficient memory capacity to store all past data. Online inference algorithms for topic models have been proposed in [1, 4, 7, 11]. Singular value decomposition (SVD), as well as topic models, has been used for analyzing multiscale patterns in streaming data [15]. However, since SVD assumes Gaussian noise, it is inappropriate for discrete data such as document collections [9].

4. EXPERIMENTS

4.1 Setting

We evaluated the multiscale dynamic topic model with online inference (MDTM) using four real document collections with timestamps: NIPS, PNAS, Digg, and Addresses. The NIPS data consist of papers from the NIPS (Neural Information Processing Systems) conference from 1987 to 1999. There were 1,740 documents, and the vocabulary size was 14,036. The unit epoch was set to one year, so there were 13 epochs. The PNAS data consist of the titles of papers that appeared in the Proceedings of the National Academy of Sciences from 1915 to 2005. There were 79,477 documents, and the vocabulary size was 20,534. The unit epoch was set at one year, so there were 91 epochs. The Digg data consist of blog posts that appeared on the social news website Digg from January 29th to February 20th, 2009. There were 18,356 documents, and the vocabulary size was 23,494.
The unit epoch was set at one day, so there were 23 epochs. The Addresses data consist of the State of the Union addresses from 1790 to 2002. We increased the number of documents by splitting each transcript into three-paragraph documents, as done in [21], and we omitted words that occurred in fewer than 10 documents. There were 6,413 documents, and the vocabulary size was 6,759. The unit epoch was set at one year, and excluding the years for which data were missing there were 205 epochs. We omitted stop words from all data sets.

We compared MDTM with DTM, LDAall, LDAone, and LDAonline. DTM is a dynamic topic model with online inference that does not take multiscale distributions into consideration; it corresponds to MDTM with S = 1. Note that the DTM used here models the dynamics with Dirichlet priors, while the original DTM uses Gaussian priors. LDAall, LDAone, and LDAonline are based on LDA and so do not model the dynamics. LDAall is an LDA that uses all past data for inference. LDAone is an LDA that uses just the current data for inference. LDAonline is an online learning extension of LDA in which the parameters are estimated using those of the previous epoch and the new data [4]. For a fair comparison, the hyperparameters of these LDAs were optimized using stochastic EM as described by Wallach [19]. We set the number of latent topics at Z = 50 for all models. In MDTM, we used γ = 1, and we estimated the Dirichlet prior for topic proportions subject to α_{t,z} ≥ 10^{-2} in order to avoid overfitting. We set the number of scales so that one of the multiscale distributions covered the entire period, i.e. S = ⌈log_2 T⌉ + 1, where T is the number of epochs. We did not compare with the multiscale topic tomography model (MTTM), because the perplexity of MTTM was worse than that of LDA in [14], and MDTM has the clear advantage over MTTM of making inferences in an online fashion.

We evaluated the predictive performance of each model using the perplexity of held-out words,

  Perplexity = exp( - ( Σ_d Σ_{n=1}^{N^{test}_{t,d}} log P(w^{test}_{t,d,n} | w_{t,d}, D_t) ) / ( Σ_d N^{test}_{t,d} ) ),   (18)

where N^{test}_{t,d} is the number of held-out words in the dth document at epoch t, w^{test}_{t,d,n} is the nth held-out word in that document, and D_t represents the training samples up to and including epoch t. A lower perplexity represents higher predictive performance.
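A direct way to compute (18) from point estimates is sketched below. The plug-in predictive probability p(w | d) = Σ_z θ_{t,d,z} φ_{t,z,w} is an assumption about how the per-word probability is evaluated, made only for this illustration; the variable names are likewise hypothetical.

import numpy as np

def heldout_perplexity(test_docs, theta, phi):
    """test_docs: list of (d, word_id_array) pairs of held-out words at epoch t.
    theta: (D, Z) document-topic proportions, phi: (Z, W) topic-word distributions."""
    log_prob, n_words = 0.0, 0
    for d, words in test_docs:
        p_w = theta[d] @ phi                   # predictive distribution over the vocabulary
        log_prob += np.log(p_w[words]).sum()
        n_words += len(words)
    return np.exp(-log_prob / n_words)         # equation (18)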

We used half of the words in 10% of the documents as held-out words for each epoch, and used the remaining words as training samples. We created ten sets of training and test data by random sampling, and evaluated the average perplexity over the ten data sets.

4.2 Results

The average perplexities over the epochs are shown in Table 2, and the perplexities for each epoch are shown in Figure 5. For all data sets, MDTM achieved the lowest perplexity, which implies that MDTM can appropriately model the dynamics of various types of data sets through its use of multiscale properties. DTM had a higher perplexity than MDTM because it could not model the long-timescale dependencies. The high perplexities of LDAall and LDAonline arise because they do not consider the dynamics, and the perplexity achieved by LDAone is high because it uses only the current data and ignores the past information.

The average perplexities over epochs with different numbers of topics are shown in Figure 6. For a given number of topics, MDTM achieved the lowest perplexities in all cases except when Z = 150 and 200 in the NIPS data. Even when the number of topics of the other models was increased, their perplexities did not become better than that of our model with fewer topics on the PNAS, Digg, and Addresses data. This result indicates that the larger number of parameters in our model is not a major reason for its lower perplexity.

The average perplexities over epochs with different numbers of scales in MDTM are shown in Figure 7. Note that S = 0 uses the uniform distribution only, while S = 1 uses the uniform distribution and the previous epoch's distribution. The perplexities decreased as the number of scales increased. This result indicates the importance of considering multiscale distributions.

Figure 8 shows the average computational time per epoch on a computer with a Xeon CPU. The computational time of MDTM is roughly linear in the number of scales. Even though MDTM considers multiple timescale distributions, its computational time is much smaller than that of LDAall, which considers a single timescale distribution; this is because MDTM uses only the current samples for inference, whereas LDAall uses all samples.

Figure 9 shows the estimated λ_{t,z,s} for the different scales s in MDTM, where the values are normalized to sum to one for each epoch and each topic. The weights decrease as the timescale lengthens. This result implies that recent distributions are more informative for estimating the current distributions, which is intuitively reasonable.

Figure 10 shows two examples of the multiscale topic evolution in the NIPS data analyzed by MDTM; words that already appear at a longer timescale are omitted from the shorter-timescale lists. At the longest timescale, basic words of the research field are appropriately extracted, such as "speech", "recognition", and "speaker" in the speech recognition topic, and "control", "action", "policy", and "reinforcement" in the reinforcement learning topic. At the shorter timescales, we can see the evolution of research trends. For example, in speech recognition research, phoneme classification is a popular task until 1995, and probabilistic approaches such as hidden Markov models (HMMs) are used frequently from 1996.
5. CONCLUSION

In this paper, we have proposed a topic model with multiscale dynamics and efficient online inference procedures. We have confirmed experimentally that the proposed method can appropriately model the dynamics in document data by considering multiscale properties, and that it is computationally efficient. In future work, we could determine the unit time interval and the lengths of the scales automatically from the given data. We assumed that the number of topics was known and fixed over time; the number of topics could be inferred automatically by extending the model to a nonparametric Bayesian model such as the Dirichlet process mixture model [16, 18]. Since the proposed method is applicable to various kinds of discrete data with timestamps, such as web access logs, blogs, and e-mail, we will evaluate the proposed method further by applying it to other data sets.

6. REFERENCES

[1] L. AlSumait, D. Barbara, and C. Domeniconi. On-line LDA: Adaptive topic models for mining text streams with applications to topic detection and tracking. In ICDM '08, pages 3-12, 2008.
[2] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1):5-43, 2003.
[3] A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh. On smoothing and inference for topic models. In UAI '09, pages 27-34, 2009.
[4] A. Banerjee and S. Basu. Topic models over text streams: A study of batch and online unsupervised learning. In SDM '07, 2007.
[5] D. M. Blei and J. D. Lafferty. Dynamic topic models. In ICML '06, 2006.
[6] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[7] K. R. Canini, L. Shi, and T. L. Griffiths. Online inference of topics with latent Dirichlet allocation. In AISTATS '09, volume 5, pages 65-72, 2009.
[8] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101 Suppl 1:5228-5235, 2004.
[9] T. Hofmann. Probabilistic latent semantic analysis. In UAI '99, 1999.
[10] T. Hofmann. Collaborative filtering via Gaussian probabilistic latent semantic analysis. In SIGIR '03, 2003.
[11] T. Iwata, S. Watanabe, T. Yamada, and N. Ueda. Topic tracking model for analyzing consumer purchase behavior. In IJCAI '09, 2009.
[12] T. Iwata, T. Yamada, and N. Ueda. Probabilistic latent semantic visualization: topic model for visualizing documents. In KDD '08, 2008.
[13] T. Minka. Estimating a Dirichlet distribution. Technical report, M.I.T., 2000.
[14] R. Nallapati, W. Cohen, S. Ditmore, J. Lafferty, and K. Ung. Multiscale topic tomography. In KDD '07, 2007.
[15] S. Papadimitriou, J. Sun, and C. Faloutsos. Streaming pattern discovery in multiple time-series. In VLDB '05, 2005.
[16] L. Ren, D. B. Dunson, and L. Carin. The dynamic hierarchical Dirichlet process. In ICML '08, 2008.
[17] M. Stephens. Dealing with label switching in mixture models. Journal of the Royal Statistical Society B, 62:795-809, 2000.
[18] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581, 2006.
[19] H. M. Wallach. Topic modeling: beyond bag-of-words. In ICML '06, 2006.
[20] C. Wang, D. M. Blei, and D. Heckerman. Continuous time dynamic topic models. In UAI '08, 2008.
[21] X. Wang and A. McCallum. Topics over time: a non-Markov continuous-time model of topical trends. In KDD '06, 2006.
[22] X. Wei, J. Sun, and X. Wang. Dynamic mixture models for multiple time-series. In IJCAI '07, 2007.

[Table 2: Average perplexities over epochs for MDTM, DTM, LDAall, LDAone, and LDAonline on the NIPS, PNAS, Digg, and Addresses data. The value in parentheses represents the standard deviation over data sets.]

[Figure 5: Perplexities for each epoch on (a) NIPS, (b) PNAS, (c) Digg, and (d) Addresses.]

[Figure 6: Average perplexities with different numbers of topics on (a) NIPS, (b) PNAS, (c) Digg, and (d) Addresses.]

[Figure 7: Average perplexity of MDTM with different numbers of scales on (a) NIPS, (b) PNAS, (c) Digg, and (d) Addresses.]

[Figure 8: Average computational time (sec) per epoch of MDTM with different numbers of scales, and of DTM, LDAall, LDAone, and LDAonline, on (a) NIPS, (b) PNAS, (c) Digg, and (d) Addresses.]

[Figure 9: Average normalized weight λ for different scales estimated by MDTM on (a) NIPS, (b) PNAS, (c) Digg, and (d) Addresses.]

APPENDIX

In this appendix, we give the derivation of (16). Let N̂^{(s)}_{t-1,z} = Σ_w Σ_{t'=t-2^{s-1}}^{t-1} N̂_{t',z,w} and N̂_{t',z} = Σ_w N̂_{t',z,w}. Using (12), the Dirichlet prior parameter of the word distribution can be rewritten as a weighted sum of the word distributions of the individual past epochs as follows,

  Σ_{s=1}^{S} λ_{t,z,s} φ^{(s)}_{t-1,z,w}
    = Σ_{s=1}^{S} λ_{t,z,s} ( Σ_{t'=t-2^{s-1}}^{t-1} N̂_{t',z,w} ) / N̂^{(s)}_{t-1,z}
    = Σ_{s=1}^{S} Σ_{t'=t-2^{s-1}}^{t-1} λ_{t,z,s} ( N̂_{t',z} / N̂^{(s)}_{t-1,z} ) ( N̂_{t',z,w} / N̂_{t',z} )
    = Σ_{t'=t-2^{S-1}}^{t-1} ( Σ_{s: t' ≥ t-2^{s-1}} λ_{t,z,s} N̂_{t',z} / N̂^{(s)}_{t-1,z} ) φ̂_{t',z,w}
    = Σ_{t'=t-2^{S-1}}^{t-1} λ'_{t,z,t'} φ̂_{t',z,w},   (19)

where the third equality exchanges the order of summation, collecting for each epoch t' the scales s whose window contains it.
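The identity (16)/(19) is easy to verify numerically; the following sketch uses random counts for a single topic (all values are made up) and checks that the two sides agree.

import numpy as np

rng = np.random.default_rng(0)

# Numerical check of identity (16)/(19) for a single topic, using random counts.
W, S = 50, 4
n_hist = 2 ** (S - 1)                                # epochs t-2^{S-1}, ..., t-1
N_hat = rng.poisson(5.0, size=(n_hist, W)) + 1e-12   # expected counts N-hat_{t',z,w}
lam = rng.gamma(1.0, 1.0, size=S + 1)                # lambda_{t,z,s}; the s = 0 (uniform) term is not part of (16)

# Multiscale word distributions phi^(s)_{t-1,z} as in (12): scale s aggregates
# the last 2^{s-1} epochs before t.
phi_scale = np.stack([N_hat[-(2 ** (s - 1)):].sum(axis=0) for s in range(1, S + 1)])
phi_scale /= phi_scale.sum(axis=1, keepdims=True)
lhs = sum(lam[s] * phi_scale[s - 1] for s in range(1, S + 1))

# Right-hand side of (16): per-epoch point estimates weighted by lambda' of (17).
phi_hat = N_hat / N_hat.sum(axis=1, keepdims=True)
rhs = np.zeros(W)
for i in range(n_hist):                              # row i is epoch t' = t - n_hist + i
    age = n_hist - i                                 # t - t'
    lam_prime = sum(lam[s] * N_hat[i].sum() / N_hat[-(2 ** (s - 1)):].sum()
                    for s in range(1, S + 1) if 2 ** (s - 1) >= age)
    rhs += lam_prime * phi_hat[i]

print(np.allclose(lhs, rhs))                         # True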

[Figure 10: Two examples of the multiscale topic evolution in the NIPS data analyzed by MDTM: (a) a speech recognition topic and (b) a reinforcement learning topic. The ten most probable words for each epoch, timescale, and topic are shown.]


More information

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures 17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Time-Sensitive Dirichlet Process Mixture Models

Time-Sensitive Dirichlet Process Mixture Models Time-Sensitive Dirichlet Process Mixture Models Xiaojin Zhu Zoubin Ghahramani John Lafferty May 25 CMU-CALD-5-4 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 Abstract We introduce

More information

Large-scale Ordinal Collaborative Filtering

Large-scale Ordinal Collaborative Filtering Large-scale Ordinal Collaborative Filtering Ulrich Paquet, Blaise Thomson, and Ole Winther Microsoft Research Cambridge, University of Cambridge, Technical University of Denmark ulripa@microsoft.com,brmt2@cam.ac.uk,owi@imm.dtu.dk

More information

AN INTRODUCTION TO TOPIC MODELS

AN INTRODUCTION TO TOPIC MODELS AN INTRODUCTION TO TOPIC MODELS Michael Paul December 4, 2013 600.465 Natural Language Processing Johns Hopkins University Prof. Jason Eisner Making sense of text Suppose you want to learn something about

More information

Evaluation Methods for Topic Models

Evaluation Methods for Topic Models University of Massachusetts Amherst wallach@cs.umass.edu April 13, 2009 Joint work with Iain Murray, Ruslan Salakhutdinov and David Mimno Statistical Topic Models Useful for analyzing large, unstructured

More information

CSEP 573: Artificial Intelligence

CSEP 573: Artificial Intelligence CSEP 573: Artificial Intelligence Hidden Markov Models Luke Zettlemoyer Many slides over the course adapted from either Dan Klein, Stuart Russell, Andrew Moore, Ali Farhadi, or Dan Weld 1 Outline Probabilistic

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures

More information

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

More information

Truncation-free Stochastic Variational Inference for Bayesian Nonparametric Models

Truncation-free Stochastic Variational Inference for Bayesian Nonparametric Models Truncation-free Stochastic Variational Inference for Bayesian Nonparametric Models Chong Wang Machine Learning Department Carnegie Mellon University chongw@cs.cmu.edu David M. Blei Computer Science Department

More information

The Role of Semantic History on Online Generative Topic Modeling

The Role of Semantic History on Online Generative Topic Modeling The Role of Semantic History on Online Generative Topic Modeling Loulwah AlSumait, Daniel Barbará, Carlotta Domeniconi Department of Computer Science George Mason University Fairfax - VA, USA lalsumai@gmu.edu,

More information

A reinforcement learning scheme for a multi-agent card game with Monte Carlo state estimation

A reinforcement learning scheme for a multi-agent card game with Monte Carlo state estimation A reinforcement learning scheme for a multi-agent card game with Monte Carlo state estimation Hajime Fujita and Shin Ishii, Nara Institute of Science and Technology 8916 5 Takayama, Ikoma, 630 0192 JAPAN

More information

Augmented Statistical Models for Speech Recognition

Augmented Statistical Models for Speech Recognition Augmented Statistical Models for Speech Recognition Mark Gales & Martin Layton 31 August 2005 Trajectory Models For Speech Processing Workshop Overview Dependency Modelling in Speech Recognition: latent

More information

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg Temporal Reasoning Kai Arras, University of Freiburg 1 Temporal Reasoning Contents Introduction Temporal Reasoning Hidden Markov Models Linear Dynamical Systems (LDS) Kalman Filter 2 Temporal Reasoning

More information

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Topic Models. Charles Elkan November 20, 2008

Topic Models. Charles Elkan November 20, 2008 Topic Models Charles Elan elan@cs.ucsd.edu November 20, 2008 Suppose that we have a collection of documents, and we want to find an organization for these, i.e. we want to do unsupervised learning. One

More information

Query-document Relevance Topic Models

Query-document Relevance Topic Models Query-document Relevance Topic Models Meng-Sung Wu, Chia-Ping Chen and Hsin-Min Wang Industrial Technology Research Institute, Hsinchu, Taiwan National Sun Yat-Sen University, Kaohsiung, Taiwan Institute

More information

Latent Variable Models Probabilistic Models in the Study of Language Day 4

Latent Variable Models Probabilistic Models in the Study of Language Day 4 Latent Variable Models Probabilistic Models in the Study of Language Day 4 Roger Levy UC San Diego Department of Linguistics Preamble: plate notation for graphical models Here is the kind of hierarchical

More information

A Probabilistic Model for Online Document Clustering with Application to Novelty Detection

A Probabilistic Model for Online Document Clustering with Application to Novelty Detection A Probabilistic Model for Online Document Clustering with Application to Novelty Detection Jian Zhang School of Computer Science Cargenie Mellon University Pittsburgh, PA 15213 jian.zhang@cs.cmu.edu Zoubin

More information

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF) Case Study 4: Collaborative Filtering Review: Probabilistic Matrix Factorization Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 2 th, 214 Emily Fox 214 1 Probabilistic

More information

A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation

A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation Yee Whye Teh Gatsby Computational Neuroscience Unit University College London 17 Queen Square, London WC1N 3AR, UK ywteh@gatsby.ucl.ac.uk

More information

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check

More information