Probit Normal Correlated Topic Model

Open Journal of Statistics, 2014, 4. Published Online December 2014 in SciRes.

Probit Normal Correlated Topic Model

Xingchen Yu, Ernest Fokoué
Center for Quality and Applied Statistics, Rochester Institute of Technology, Rochester, NY, USA

Received 3 October 2014; revised 28 October 2014; accepted 15 November 2014

Copyright 2014 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY).

Abstract

The logistic normal distribution has recently been adapted via the transformation of multivariate Gaussian variables to model the topical distribution of documents in the presence of correlations among topics. In this paper, we propose a probit normal alternative approach to modelling correlated topical structures. Our use of the probit model in the context of topic discovery is novel, as many authors have so far concentrated solely on the logistic model, partly due to the formidable inefficiency of the multinomial probit model even in the case of very small topical spaces. We herein circumvent the inefficiency of multinomial probit estimation by using an adaptation of the diagonal orthant multinomial probit in the topic models context, resulting in the ability of our topic modeling scheme to handle corpuses with a large number of latent topics. An additional and very important benefit of our method lies in the fact that, unlike the logistic normal model whose non-conjugacy leads to the need for sophisticated sampling schemes, our approach exploits the natural conjugacy inherent in the auxiliary formulation of the probit model to achieve greater simplicity. The application of our proposed scheme to a well-known Associated Press corpus not only helps discover a large number of meaningful topics but also reveals the capturing of compellingly intuitive correlations among certain topics.
Besides, our proposed approach lends itself to even further scalability thanks to various existing high performance algorithms and architectures capable of handling millions of documents.

Keywords

Topic Model, Bayesian, Gibbs Sampler, Cumulative Distribution Function, Probit, Logit, Diagonal Orthant, Efficient Sampling, Auxiliary Variable, Correlation Structure, Topic, Vocabulary, Conjugate, Dirichlet, Gaussian

How to cite this paper: Yu, X.C. and Fokoué, E. (2014) Probit Normal Correlated Topic Model. Open Journal of Statistics, 4.

1. Introduction

The task of recovering the latent topics underlying a given corpus of D documents has been in the forefront of active research in statistical machine learning for more than a decade, and continues to receive the dedicated

contributions from many researchers from around the world. Since the introduction of Latent Dirichlet Allocation (LDA) [1] and then the extension to correlated topic models (CTM) [2], a series of excellent contributions have been made to this exciting field, ranging from slight extensions in the modelling structure to the development of scalable topic modeling algorithms capable of handling extremely large collections of documents, as well as selecting an optimal model among a collection of competing models or using the output of topic modeling as entry points (inputs) to other machine learning or data mining tasks such as image analysis and sentiment extraction, just to name a few. As far as correlated topic models are concerned, virtually all the contributors to the field have so far concentrated solely on the use of the logistic normal topic model. The seminal paper on correlated topic models [2] adopts a variational approximation approach to model fitting, while subsequent authors like [3] propose a Gibbs sampling scheme with data augmentation based on uniform random variables. More recently, [4] presented an exact and scalable Gibbs sampling algorithm with Polya-Gamma distributed auxiliary variables, a recent development in efficient sampling for logistic models. Despite the inseparable relationship between the logistic and probit models in statistical modelling, the probit model has not yet been proposed, probably due to its computational inefficiency for multiclass classification problems and the high posterior dependence between auxiliary variables and parameters. As for practical applications where topic models are commonly employed, having multiple topics is extremely prevalent. In some cases, more than 1000 topics will be fitted to large datasets such as the Wikipedia and Pubmed data. Therefore, using the MCMC probit model in topic modeling applications would be impractical and inconceivable due to its computational inefficiency.
Nonetheless, a recent work on the diagonal orthant probit model [5] substantially improved the sampling efficiency while maintaining the predictive performance, which motivated us to build an alternative correlated topic model with a probit normal topic distribution. On the other hand, probit models inherently capture a better dependency structure between topics and the co-occurrence of words within a topic, as they do not assume the IIA (independence of irrelevant alternatives) restriction of logistic models. The rest of this paper is organized as follows: in Section 2, we present a conventional formulation of topic modeling along with our general notation and the correlated topic models extension. Section 3 introduces our adaptation of the diagonal orthant probit model to topic discovery in the presence of correlations among topics, along with the corresponding auxiliary variable sampling scheme for updating the probit model parameters and the remainder of all the posterior distributions of the parameters of the model. Unlike the logistic normal formulation, where the non-conjugacy leads to the need for sophisticated sampling schemes, in this section we clearly reveal the simplicity of our proposed method resulting from the natural conjugacy inherent in the auxiliary formulation of the updating of the parameters. We also show compelling computational demonstrations of the efficiency of the diagonal orthant approach compared to the traditional multinomial probit, for both the auxiliary variable sampling and the estimation of the topic distribution. Section 4 presents the performance of our proposed approach on the Associated Press data set, featuring the intuitively appealing topics discovered, along with the correlation structure among topics and the loglikelihood as a function of topical space dimension. Section 5 deals with our conclusion, discussion and elements of our future work.

2. General Aspects of Topic Models

In a given corpus, one could imagine that each document deals with one or more topics.
For instance, one of the collections considered in this paper is provided by the Associated Press and covers topics as varied as aviation, education, weather, broadcasting, air force, navy, national security, international treaties, investing, international trade, war, courts, entertainment industry, politics, etc. From a statistical perspective, a topic is often modeled as a probability distribution over words, and as a result a given document is treated as a mixture of probabilistic topics [1]. We consider a setting where we have a total of V unique words in the reference vocabulary and K topics underlying the D documents provided. Let w_dn denote the n-th word in the d-th document, and let z_dn refer to the label of the topic assigned to the n-th word of that d-th document. Then the probability of w_dn is given by

Pr(w_dn) = Σ_{k=1}^{K} Pr(w_dn | z_dn = k) Pr(z_dn = k),   (1)

where Pr(z_dn = k) is the probability that the n-th word in the d-th document is assigned to topic k. This quantity plays an important role in the analysis of correlated topic models. In the seminal article on correlated topic models [2], Pr(z_dn = k) is modeled for each document as a function of a K-dimensional vector η_d

of parameters. Specifically, the logistic-normal defines η_d = (η_d1, η_d2, …, η_dK), where the last element η_dK is typically set to zero for identifiability, and assumes η_d ~ MVN(μ, Σ) with

θ_dk = Pr(z_dn = k | η_d) = f_k(η_d) = e^{η_dk} / ( Σ_{j=1}^{K} e^{η_dj} ),  k = 1, 2, …, K.   (2)

Also, for n ∈ {1, 2, …, N_d}, z_dn ~ Mult(1, θ_d) and w_dn ~ Mult(1, β_{z_dn}). With all these model components defined, the estimation task in correlated topic modeling from a Bayesian perspective can be summarized in the following posterior:

p(η, Z | W, μ, Σ) ∝ p(W | Z) Π_{d=1}^{D} [ Π_{n=1}^{N_d} p(z_dn | θ_d) ] p(η_d | μ, Σ)
                 = Π_{k=1}^{K} [ δ(C_k + β) / δ(β) ] Π_{d=1}^{D} [ Π_{n=1}^{N_d} θ_{d, z_dn} ] N(η_d | μ, Σ),   (3)

where δ(·) is defined through the Gamma function, so that for a K-dimensional vector u, δ(u) = ( Π_{k=1}^{K} Γ(u_k) ) / Γ( Σ_{k=1}^{K} u_k ).

Equation (3) provides the ingredients for estimating the parameter vectors η_d that help capture the correlations among topics, and the matrix Z that contains the topical assignments. Under the logistic normal model, sampling from the full posterior of η_d derived from the joint posterior in (3) requires the use of sophisticated sampling schemes like the one used in [4]. Although these authors managed to achieve great performances on large corpuses of documents, we thought it useful to contribute to correlated topic modeling by way of the multinomial probit. Clearly, as indicated earlier, most authors concentrate on the logistic-normal despite its non-conjugacy, and the lack of probit topic modeling can be easily attributed to the inefficiency of the corresponding sampling scheme. In the rawest formulation of the multinomial probit, one that intends to capture the full extent of all the correlations among the topics, the topic assignment probability is defined by

Pr(z_dn = k) = θ_dk = ∫ φ_K(u; η_d, R) du,

where the integral is taken over the region in which the k-th component of u is the largest. The practical evaluation of this expression involves a complicated high-dimensional integral which is typically computationally intractable when the number of categories is greater than 4.
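As a quick contrast with the probit integral just described, the logistic-normal map in (2) is trivial to evaluate. The sketch below is our own illustration (the helper name and the K = 5 example are not from the paper); it computes θ_d from η_d with the usual max-subtraction trick for numerical stability:

```python
import numpy as np

def logistic_normal_theta(eta):
    """Softmax map of Equation (2): eta holds the K-1 free scores;
    the K-th component is fixed at 0 for identifiability."""
    eta = np.append(eta, 0.0)          # identifiability constraint
    e = np.exp(eta - eta.max())        # subtract max for stability
    return e / e.sum()

rng = np.random.default_rng(0)
theta = logistic_normal_theta(rng.normal(size=4))   # K = 5 topics
```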
A relaxed version of this formulation, one that still captures more correlation than the logit and that is also very commonly used in practice, defines θ_dk as

θ_dk = ∫_{−∞}^{+∞} φ(v) Π_{j=1, j≠k}^{K} Φ(v + η_dk − η_dj) dv = E_v[ Π_{j=1, j≠k}^{K} Φ(v + η_dk − η_dj) ],   (4)

where φ(v) = (1/√(2π)) e^{−v²/2} is the standard normal density, and Φ(v) = ∫_{−∞}^{v} φ(u) du is the standard normal distribution function. Despite this relaxation, the multinomial probit in this formulation still has major drawbacks, namely:
1) Even when one is given the vector η_d, the calculation of θ_dk remains computationally prohibitive even for moderate values of K. In practice, one may consider using a Monte Carlo approximation to the integral in (4). However, such an approach, in the context of a large corpus with many underlying latent topics, renders the probit formulation almost unusable.
2) As far as the estimation of η_d is concerned, a natural approach to sampling from the posterior of η_d in this context would be to use a Metropolis-Hastings updating scheme, since the full posterior in this case is not available in closed form. Unfortunately, the Metropolis sampler in this case is excruciatingly slow, with poor mixing rates and high sensitivity to the proposal distribution. It turns out that an apparently appealing solution in this case could come from the auxiliary variable formulation as described in [6].
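For concreteness, the one-dimensional integral in (4) can be approximated by plain Monte Carlo over v ~ N(0, 1). The helper below is an illustrative sketch (its name, defaults, and the example η_d are ours, not the paper's); note that a separate (K−1)-fold product of normal CDFs must be averaged for every topic of every document, which is exactly the cost discussed above:

```python
import numpy as np
from scipy.stats import norm

def probit_theta_mc(eta, k, n_draws=20000, rng=None):
    """Monte Carlo estimate of the relaxed probit probability in (4):
    theta_k = E_v[ prod_{j != k} Phi(v + eta_k - eta_j) ], v ~ N(0, 1)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    v = rng.standard_normal(n_draws)[:, None]        # (n_draws, 1)
    diffs = eta[k] - np.delete(eta, k)[None, :]      # (1, K-1)
    return norm.cdf(v + diffs).prod(axis=1).mean()

eta = np.array([0.5, -0.2, 0.1, -0.4])               # K = 4 example
theta = np.array([probit_theta_mc(eta, k) for k in range(eta.size)])
```

The K probabilities sum to 1 only up to Monte Carlo error, and the per-document cost grows with both K and the number of draws.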

Unfortunately, even this promising formulation fails catastrophically for moderate values of K, as we will demonstrate in the subsequent section, due to the high dependency structure between auxiliary variables and parameters. Essentially, the need for Metropolis is avoided by defining an auxiliary vector of dimension K. For n = 1, …, N_d, we consider the vector z_dn containing the current topic allocation and we repeatedly sample u_dn from a K-dimensional multivariate Gaussian until the component of u_dn that corresponds to the non-zero index of z_dn is the largest of all the components of u_dn, i.e.

u_{dn, z_dn} = max_{k = 1, …, K} { u_{dn, k} }.   (5)

The condition in (5) typically fails to be fulfilled even when K is moderately large. In fact, we demonstrate later that in some cases, it becomes impossible to find a vector u_dn satisfying that condition. Besides, the dependency of u_dn on the current value of η_d further complicates the sampling scheme, especially in the case of a large topical space. In the next section, we remedy these inefficiencies by proposing and developing our adaptation of the diagonal orthant multinomial probit.

3. Diagonal Orthant Probit for Correlated Topic Models

In a recent work, [5] developed the diagonal orthant probit approach to multicategorical classification. Their approach circumvents the bottlenecks mentioned earlier and substantially improves the sampling efficiency while maintaining the predictive performance. Essentially, the diagonal orthant probit approach successfully makes the most of the benefits of binary classification, thereby substantially reducing the high dependency that made the condition (5) computationally unattainable. Indeed, with the diagonal orthant multinomial model, we achieve three main benefits:
- A more tractable and easily computable definition of the topic distribution θ_dk = Pr(z_dn = k | η_d);
- A clear, very straightforward and adaptable auxiliary variable sampling scheme;
- The capacity to handle a very large number of topics due to the efficiency and low dependency.
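The fragility of the acceptance condition in (5) is easy to reproduce. The sketch below is our own illustration (names and the test values are not from the paper): it counts how many draws u ~ N(η_d, I) are needed before the component indexed by the current topic label is the largest. For small K the condition is met almost immediately, while for a larger topical space with one dominant score the acceptance rate collapses:

```python
import numpy as np

def draws_until_condition(eta, k_star, max_tries=200000, rng=None):
    """Count the Gaussian draws needed before component k_star of
    u ~ N(eta, I) is the largest, i.e. the condition in (5)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    for t in range(1, max_tries + 1):
        u = rng.normal(loc=eta)
        if int(np.argmax(u)) == k_star:
            return t
    return None                 # condition never met within max_tries

# K = 3 with flat scores: the condition is met quickly
few = draws_until_condition(np.zeros(3), 0)

# K = 50 with one dominant score: acceptance becomes very rare
eta = np.zeros(50); eta[0] = 3.0
many = draws_until_condition(eta, 49)
```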
Under the diagonal orthant probit model, we have

θ_dk = Φ(η_dk) Π_{j≠k} (1 − Φ(η_dj)) / [ Σ_{m=1}^{K} Φ(η_dm) Π_{j≠m} (1 − Φ(η_dj)) ].   (6)

The generative process of our probit normal topic model is essentially identical to that of logistic topic models, except that the topic distribution for each document is now obtained by a probit transformation of a multivariate Gaussian variable as in (6). As such, the generative process of a document of length N_d is as follows:
1) Draw η_d ~ MVN(μ, Σ) and transform η_d into the topic distribution θ_d, where each element of θ_d is computed according to (6).
2) For each word position n ∈ (1, …, N_d):
   a) Draw a topic assignment z_dn ~ Mult(1, θ_d);
   b) Draw a word w_dn ~ Mult(1, β_{z_dn}).

Throughout this paper, we use φ_K(·) to denote the K-dimensional multivariate Gaussian density function, while Φ(·) represents the cumulative distribution function of the standard normal. We specify a Gaussian prior for η_d, namely

φ_K(η_d; μ, Σ) = (2π)^{−K/2} |Σ|^{−1/2} exp( −(1/2) (η_d − μ)^T Σ^{−1} (η_d − μ) ).   (7)
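Equation (6) involves only K univariate normal CDF evaluations, so the diagonal orthant topic distribution can be computed in closed form at O(K) cost. A minimal sketch (our own helper, not code from the paper), using a leave-one-out product on the log scale for numerical stability:

```python
import numpy as np
from scipy.stats import norm

def do_probit_theta(eta):
    """Diagonal orthant topic distribution, Equation (6):
    theta_k ∝ Phi(eta_k) * prod_{j != k} (1 - Phi(eta_j))."""
    p = norm.cdf(eta)                    # Phi(eta_j), j = 1..K
    log1m = np.log1p(-p)                 # log(1 - Phi(eta_j))
    w = p * np.exp(log1m.sum() - log1m)  # leave-one-out product
    return w / w.sum()

theta = do_probit_theta(np.array([1.2, -0.3, 0.0, -1.0]))
```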

To complete the Bayesian analysis of our probit normal topic model, we need to sample from the joint posterior

p(η, Z | W, μ, Σ) ∝ p(η | μ, Σ) p(Z | η) p(W | Z).   (8)

As noted earlier, the second benefit of the diagonal orthant probit model lies in its clear, simple, straightforward yet powerful auxiliary variable sampling scheme. We take advantage of that diagonal orthant property when dealing with the full posterior for η_d, given by

p(η_d | W, Z, μ, Σ) ∝ p(η_d | μ, Σ) p(Z_d | η_d).   (9)

While sampling directly from (9) is impractical, defining a collection of auxiliary variables allows a scheme that samples from the joint posterior p(η, A, Z | W, μ, Σ) as follows. For each document d, the N_d × K matrix A_d contains all the values of the auxiliary variables. Each row a_dn = (a_dn1, a_dn2, …, a_dnK)^T of A_d has K components, and the diagonal orthant updates them readily using the following straightforward sampling scheme. Let k be the current topic allocation for the n-th word:
- For the component of a_dn whose index corresponds to the label of the current topic assignment of word n, sample from a normal distribution with variance 1 truncated to positive outcomes: a_dnk | (z_dnk = 1, η_dk) ~ N_+(η_dk, 1).
- For all components of a_dn whose indices do not correspond to the label of the current topic assignment of word n, sample from a normal distribution with variance 1 truncated to negative outcomes: a_dnj | (z_dnj = 0, η_dj) ~ N_−(η_dj, 1).

Once the matrix A_d is obtained, the sampling scheme updates the parameter vector η_d by conveniently drawing

η_d | A_d, μ, Σ ~ MVN(μ_{η_d}, Σ_η),

where

μ_{η_d} = Σ_η ( Σ^{−1} μ + X^T vec(A_d) )  and  Σ_η = ( Σ^{−1} + X^T X )^{−1},

with X = 1_{N_d} ⊗ I_K and vec(A_d) representing the row-wise vectorization of the matrix A_d, so that X^T X = N_d I_K and X^T vec(A_d) = Σ_{n=1}^{N_d} a_dn.

Adopting the fully Bayesian treatment of our probit normal correlated topic model, we add an extra layer to the hierarchy in order to capture the variation in the mean vector and the variance-covariance matrix of the parameter vector η_d.
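The two-stage update just described (truncated-normal draws for the auxiliary variables, then a conjugate Gaussian draw for η_d) can be sketched in a few lines. This is an illustrative implementation under the simplification X^T X = N_d I_K noted above; the function and variable names are ours, not the paper's:

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_step_eta(eta, z, mu, Sigma, rng):
    """One document-level update from Section 3: draw the auxiliary
    matrix A_d (positive truncation at the assigned topic, negative
    elsewhere), then eta_d | A_d from its Gaussian full conditional.
    z holds the N_d current topic labels."""
    N, K = z.size, eta.size
    assigned = z[:, None] == np.arange(K)[None, :]      # (N, K) mask
    lo = np.where(assigned, 0.0, -np.inf)
    hi = np.where(assigned, np.inf, 0.0)
    # truncnorm takes standardized bounds (scale = 1 here)
    A = truncnorm.rvs(lo - eta, hi - eta, loc=eta, scale=1.0,
                      size=(N, K), random_state=rng)
    # conjugate update: Sigma_eta = (Sigma^-1 + N I)^-1,
    # mu_eta = Sigma_eta (Sigma^-1 mu + sum_n a_dn)
    Sinv = np.linalg.inv(Sigma)
    Sigma_eta = np.linalg.inv(Sinv + N * np.eye(K))
    mu_eta = Sigma_eta @ (Sinv @ mu + A.sum(axis=0))
    return rng.multivariate_normal(mu_eta, Sigma_eta), A

rng = np.random.default_rng(0)
z_d = np.array([0, 2, 1])                               # N_d = 3 words
eta_d, A_d = gibbs_step_eta(np.zeros(4), z_d, np.zeros(4), np.eye(4), rng)
```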
Taking advantage of conjugacy, we specify a normal-inverse-Wishart prior for (μ, Σ), namely p(μ, Σ) = NIW(μ_0, κ_0, Ψ_0, ν_0), meaning that Σ | Ψ_0, ν_0 ~ IW(Ψ_0, ν_0) and μ | μ_0, Σ, κ_0 ~ MVN(μ_0, Σ / κ_0). The corresponding posterior is also normal-inverse-Wishart, so that we can write

p(μ, Σ | W, Z, η) = NIW(μ_D, κ_D, Ψ_D, ν_D),

where κ_D = κ_0 + D, ν_D = ν_0 + D,

μ_D = (D / (D + κ_0)) η̄ + (κ_0 / (D + κ_0)) μ_0,

Ψ_D = Ψ_0 + Q + (κ_0 D / (κ_0 + D)) (η̄ − μ_0)(η̄ − μ_0)^T,

with η̄ = (1/D) Σ_{d=1}^{D} η_d and Q = Σ_{d=1}^{D} (η_d − η̄)(η_d − η̄)^T.
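The normal-inverse-Wishart update above reduces to a few matrix operations. A sketch (our own helper, not the paper's code) mapping the prior hyperparameters and the stacked η_d vectors to the posterior hyperparameters:

```python
import numpy as np

def niw_posterior(etas, mu0, kappa0, Psi0, nu0):
    """NIW posterior hyperparameters for (mu, Sigma) given the D
    document vectors eta_d stacked as the rows of `etas`."""
    D = etas.shape[0]
    eta_bar = etas.mean(axis=0)
    Q = (etas - eta_bar).T @ (etas - eta_bar)   # scatter about eta_bar
    kappa_D = kappa0 + D
    nu_D = nu0 + D
    mu_D = (D * eta_bar + kappa0 * mu0) / (D + kappa0)
    dev = (eta_bar - mu0)[:, None]
    Psi_D = Psi0 + Q + (kappa0 * D / (kappa0 + D)) * (dev @ dev.T)
    return mu_D, kappa_D, Psi_D, nu_D

mu_D, kappa_D, Psi_D, nu_D = niw_posterior(
    np.array([[1.0, 0.0], [3.0, 0.0]]), np.zeros(2), 1.0, np.eye(2), 4.0)
```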

As far as sampling from the full posterior distribution of z_dn is concerned, we use the expression

Pr(z_dn = k | Z_{−n}, W, θ_d) ∝ θ_dk (C_{k, w_dn, −n} + β_{w_dn}) / ( Σ_{v=1}^{V} C_{k, v, −n} + Σ_{v=1}^{V} β_v ),

where the subscript −n indicates that the n-th word is not included in the topic or document counts under consideration.

4. Computational Results on the Associated Press Data

In this section, we use the well-known Associated Press data set from [7] in R to uncover the word-topic distributions and the correlation structure between various topics, as well as to select optimal models. The Associated Press corpus consists of 2244 documents and 10,473 words. After preprocessing the corpus by picking frequent and common terms, we reduced the vocabulary from 10,473 to 2643 words for efficient sampling. In our first experiment, we built a correlated topic modeling structure based on the traditional multinomial probit and then tested the computational speed of key sampling tasks. The high posterior dependency structure between auxiliary variables and parameters made the multinomial probit essentially unscalable, to the point where it is impossible for the sampler to yield a random variate of the auxiliary variable corresponding to the current topic allocation label that is also the maximum, as required by (5). For a random initialization of topic assignments, the sampling of auxiliary variables cannot even complete one single iteration. In the case of a good initialization of the topical prior η, which leads to smooth sampling of auxiliary variables, the computational efficiency is still undesirable, and we observed that for larger topical spaces such as K = 40, the auxiliary variable sampler stumbles again after some number of iterations, indicating that even a good initialization will not ease the troublesome dependency relationship between the auxiliary variables and parameters in larger topical spaces.
Unlike the traditional probit model, for which the computation of θ_d is virtually impractical for large K, the diagonal orthant approach makes this computation substantially faster even for large K. The comparison of the computational speed of the auxiliary variable sampling between the multinomial probit (MNP) model and the diagonal orthant (DO) probit model is shown below in Table 1.

Table 1. All the numbers in this table represent the processing time (in seconds), and are computed in R on a PC using a parallel algorithm acting on 4 CPU cores. NA represents situations where it is impossible for the sampler to yield a random variate of the auxiliary variable corresponding to the current topic allocation label that is also the maximum.

Sampling task: auxiliary variable     MNP           DO Probit
K = 10                                108 to NA     3.09
K = 20                                334 to NA     3.39
K = 30                                528 to NA     3.49
K = 40                                1785 to NA

In addition to the drastic improvement in the overall sampling efficiency, we notice that the computational complexity of sampling the auxiliary variables and the topic distribution is close to O(1) and O(K) respectively, suggesting that the probit normal topic model now becomes an attainable and feasible version of the traditional correlated topic model. Central to topic modeling is the need to determine, for a given corpus, the optimal number of latent topics. As is the case for most latent variable models, this task can be formidable at times, and there is no consensus among machine learning researchers as to which of the existing methods is best. Figure 1 shows the loglikelihood as a function of the number of topics in the model. Apart from the loglikelihood, many other techniques are commonly used, such as perplexity, the harmonic mean method, and so on. As we can see, the optimal number of topics in this case is 30. In Table 2, we show a subset of the 30 topics uncovered, where each topic is represented by its 10 most frequent words. It can be seen that our probit normal topic model is able to capture the co-occurrence of words within topics successfully. In Figure 2, we also show the correlation structure between various topics, which is the essential purpose of employing the correlated topic model. Evidently, the correlations captured intuitively reflect the natural relationships between similar topics.

5. Conclusion and Discussion

By adapting the diagonal orthant probit model, we have proposed a probit alternative to the logit approach to topic modeling, in a context where many other researchers seem to have avoided the probit.
Compared to the multinomial probit model we constructed, our topic discovery scheme using the diagonal orthant probit model enjoys several desirable properties. First, we gained efficiency in computing the topic distribution θ_d. Second, we achieved a clear, very straightforward and adaptable auxiliary variable sampling scheme that substantially reduces the strength of the dependence structure between auxiliary variables and model parameters, which is responsible for absorbing states in the Markov chain. Thirdly, as a consequence of good mixing, our approach makes the probit model a viable and competitive alternative to its logistic counterpart. In addition to all these benefits, our proposed method offers a straightforward and inherent conjugacy, which helps avoid the complicated sampling schemes employed in the logistic normal topic model. In the Associated Press example explored in the previous section, not only does our method produce a better likelihood than the logistic normal topic model fitted with variational EM, but it also discovers meaningful topics along with the underlying correlation structure between topics. Overall, the method we developed in this paper offers another feasible alternative in the context of correlated topic modeling that we hope will be further explored and extended by many other researchers.

Figure 1. Log likelihood as a function of the number of topics.

Table 2. Representation of topics discovered by our method.

          Topic 25    Topic 18    Topic 23   Topic 11    Topic 1      Topic 24       Topic 27
Word 1    court       company     bush       students    tax          fire           air
Word 2    trial       billion     senate     school      budget       water          plane
Word 3    judge       inc         vote       meese       billion      rain           flight
Word 4    prison      corp        dukakis    student     bill         northern       airlines
Word 5    convicted   percent     percent    schools     percent      southern       pilots
Word 6    jury        stock       bill       teachers    senate       inches         aircraft
Word 7    drug        workers     kennedy    board       income       fair           planes
Word 8    guilty      contract    sales      education   legislation  degrees        airline
Word 9    fbi         companies   bentsen    teacher     taxes        snow           eastern
Word 10   sentence    offer       ticket     tax         bush         temperatures   airport

          Topic 6     Topic 12    Topic 20   Topic 2     Topic 22     Topic 16       Topic 15
Word 1    percent     space       military   soviet      aid          police         dollar
Word 2    stock       shuttle     china      gorbachev   rebels       arrested       yen
Word 3    index       soviet      chinese    bush        contras      shot           rates
Word 4    billion     nasa        soldiers   reagan      nicaragua    shooting       bid
Word 5    prices      launch      troops     moscow      contra       injured        prices
Word 6    rose        mission     saudi      summit      sandinista   car            price
Word 7    stocks      earth       trade      soviets     military     officers       london
Word 8    average     north       rebels     treaty      ortega       bus            gold
Word 9    points      korean      hong       europe      sandinistas  killing        percent
Word 10   shares      south       army       germany     rebel        arrest         trading

          Topic 19    Topic 14     Topic 7        Topic 4      Topic 30   Topic 8     Topic 17
Word 1    iraq        trade        israel         navy         percent    south       film
Word 2    kuwait      percent      israeli        ship         oil        africa      movie
Word 3    iraqi       farmers      jewish         coast        prices     african     music
Word 4    german      farm         palestinian    island       price      black       theater
Word 5    gulf        billion      arab           boat         cents      church      actor
Word 6    germany     japan        palestinians   ships        gasoline   pope        actress
Word 7    saudi       agriculture  army           earthquake   average    mandela     award
Word 8    iran        japanese     occupied       sea          offers     blacks      band
Word 9    bush        tons         students       scale        gold       apartheid   book
Word 10   military    drought      gaza           guard        crude      catholic    films

Based on the promising results we have seen in this paper, the probit normal topic model opens the door for various future works.
For instance, [8] proposed a multi-field correlated topic model by relaxing the assumption of a common set of topics shared globally among all documents, an idea which can also be applied to the probit model to enrich the comprehensiveness of structural relationships between topics. Another potential direction would be to

Figure 2. Graphical representation of the correlation among topics.

enhance the scalability of the model. Currently we use a simple distributed algorithm proposed by [9] and [10] for efficient Gibbs sampling. The architecture for topic models presented by [11] can be further utilized to reduce the computational complexity substantially while delivering comparable performance. Furthermore, a novel sampling method involving the Gibbs Max-Margin Topic model [12] will further improve the computational efficiency.

Acknowledgements

We want to express our sincere gratitude to our reviewer for comprehensive and constructive advice. We also wish to express our heartfelt gratitude and infinite thanks to Our Lady of Perpetual Help for Her ever-present support and guidance, especially for the uninterrupted flow of inspiration received through Her most powerful intercession.

References

[1] Blei, D.M., Ng, A.Y., Jordan, M.I. and Lafferty, J. (2003) Latent Dirichlet Allocation. Journal of Machine Learning Research, 3.
[2] Blei, D.M. and Lafferty, J.D. (2006) Correlated Topic Models. Proceedings of the 23rd International Conference on Machine Learning, MIT Press, Cambridge, Massachusetts.
[3] Mimno, D., Wallach, H.M. and McCallum, A. (2008) Gibbs Sampling for Logistic Normal Topic Models with Graph-Based Priors. Proceedings of the NIPS Workshop on Analyzing Graphs.
[4] Chen, J.F., Zhu, J., Wang, Z., Zheng, X. and Zhang, B. (2013) Scalable Inference for Logistic-Normal Topic Models. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z. and Weinberger, K.Q., Eds., Advances in Neural Information Processing Systems 26, Curran Associates, Inc.
[5] Johndrow, J., Lum, K. and Dunson, D.B. (2013) Diagonal Orthant Multinomial Probit Models. JMLR Proceedings, Volume 31 of AISTATS.
[6] Albert, J.H. and Chib, S. (1993) Bayesian Analysis of Binary and Polychotomous Response Data. Journal of the American Statistical Association, 88.
[7] Grün, B. and Hornik, K. (2011) topicmodels: An R Package for Fitting Topic Models.
Journal of Statistical Software, 40.

[8] Salomatin, K., Yang, Y.M. and Lad, A. (2009) Multi-Field Correlated Topic Modeling. Proceedings of the SIAM International Conference on Data Mining, SDM 2009, April 30-May 2, 2009, Sparks.
[9] Yao, L.M., Mimno, D. and McCallum, A. (2009) Efficient Methods for Topic Model Inference on Streaming Document Collections. KDD 2009: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[10] Newman, D., Asuncion, A., Smyth, P. and Welling, M. (2009) Distributed Algorithms for Topic Models. Journal of Machine Learning Research, 10.
[11] Smola, A. and Narayanamurthy, S. (2010) An Architecture for Parallel Topic Models. Proceedings of the VLDB Endowment, 3.
[12] Zhu, J., Chen, N., Perkins, H. and Zhang, B. (2013) Gibbs Max-Margin Topic Models with Data Augmentation. CoRR.


More information

IN the evolution of the Internet, there have been

IN the evolution of the Internet, there have been 1 Tag-Weighte Topic Moel For Large-scale Semi-Structure Documents Shuangyin Li, Jiefei Li, Guan Huang, Ruiyang Tan, an Rong Pan arxiv:1507.08396v1 [cs.cl] 30 Jul 2015 Abstract To ate, there have been massive

More information

Linear First-Order Equations

Linear First-Order Equations 5 Linear First-Orer Equations Linear first-orer ifferential equations make up another important class of ifferential equations that commonly arise in applications an are relatively easy to solve (in theory)

More information

Introduction to Machine Learning

Introduction to Machine Learning How o you estimate p(y x)? Outline Contents Introuction to Machine Learning Logistic Regression Varun Chanola April 9, 207 Generative vs. Discriminative Classifiers 2 Logistic Regression 2 3 Logistic Regression

More information

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k A Proof of Lemma 2 B Proof of Lemma 3 Proof: Since the support of LL istributions is R, two such istributions are equivalent absolutely continuous with respect to each other an the ivergence is well-efine

More information

Expected Value of Partial Perfect Information

Expected Value of Partial Perfect Information Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo

More information

WEIGHTING A RESAMPLED PARTICLE IN SEQUENTIAL MONTE CARLO. L. Martino, V. Elvira, F. Louzada

WEIGHTING A RESAMPLED PARTICLE IN SEQUENTIAL MONTE CARLO. L. Martino, V. Elvira, F. Louzada WEIGHTIG A RESAMPLED PARTICLE I SEQUETIAL MOTE CARLO L. Martino, V. Elvira, F. Louzaa Dep. of Signal Theory an Communic., Universia Carlos III e Mari, Leganés (Spain). Institute of Mathematical Sciences

More information

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments 2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor

More information

Introduction. A Dirichlet Form approach to MCMC Optimal Scaling. MCMC idea

Introduction. A Dirichlet Form approach to MCMC Optimal Scaling. MCMC idea Introuction A Dirichlet Form approach to MCMC Optimal Scaling Markov chain Monte Carlo (MCMC quotes: Metropolis et al. (1953, running coe on the Los Alamos MANIAC: a feasible approach to statistical mechanics

More information

KNN Particle Filters for Dynamic Hybrid Bayesian Networks

KNN Particle Filters for Dynamic Hybrid Bayesian Networks KNN Particle Filters for Dynamic Hybri Bayesian Networs H. D. Chen an K. C. Chang Dept. of Systems Engineering an Operations Research George Mason University MS 4A6, 4400 University Dr. Fairfax, VA 22030

More information

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency Transmission Line Matrix (TLM network analogues of reversible trapping processes Part B: scaling an consistency Donar e Cogan * ANC Eucation, 308-310.A. De Mel Mawatha, Colombo 3, Sri Lanka * onarecogan@gmail.com

More information

Optimization of Geometries by Energy Minimization

Optimization of Geometries by Energy Minimization Optimization of Geometries by Energy Minimization by Tracy P. Hamilton Department of Chemistry University of Alabama at Birmingham Birmingham, AL 3594-140 hamilton@uab.eu Copyright Tracy P. Hamilton, 1997.

More information

Conservation Laws. Chapter Conservation of Energy

Conservation Laws. Chapter Conservation of Energy 20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action

More information

Chapter 6: Energy-Momentum Tensors

Chapter 6: Energy-Momentum Tensors 49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.

More information

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems Construction of the Electronic Raial Wave Functions an Probability Distributions of Hyrogen-like Systems Thomas S. Kuntzleman, Department of Chemistry Spring Arbor University, Spring Arbor MI 498 tkuntzle@arbor.eu

More information

A Course in Machine Learning

A Course in Machine Learning A Course in Machine Learning Hal Daumé III 12 EFFICIENT LEARNING So far, our focus has been on moels of learning an basic algorithms for those moels. We have not place much emphasis on how to learn quickly.

More information

Quantum mechanical approaches to the virial

Quantum mechanical approaches to the virial Quantum mechanical approaches to the virial S.LeBohec Department of Physics an Astronomy, University of Utah, Salt Lae City, UT 84112, USA Date: June 30 th 2015 In this note, we approach the virial from

More information

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1 Lecture 5 Some ifferentiation rules Trigonometric functions (Relevant section from Stewart, Seventh Eition: Section 3.3) You all know that sin = cos cos = sin. () But have you ever seen a erivation of

More information

Gaussian processes with monotonicity information

Gaussian processes with monotonicity information Gaussian processes with monotonicity information Anonymous Author Anonymous Author Unknown Institution Unknown Institution Abstract A metho for using monotonicity information in multivariate Gaussian process

More information

Inter-domain Gaussian Processes for Sparse Inference using Inducing Features

Inter-domain Gaussian Processes for Sparse Inference using Inducing Features Inter-omain Gaussian Processes for Sparse Inference using Inucing Features Miguel Lázaro-Greilla an Aníbal R. Figueiras-Vial Dep. Signal Processing & Communications Universia Carlos III e Mari, SPAIN {miguel,arfv}@tsc.uc3m.es

More information

The Exact Form and General Integrating Factors

The Exact Form and General Integrating Factors 7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily

More information

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+

More information

TMA 4195 Matematisk modellering Exam Tuesday December 16, :00 13:00 Problems and solution with additional comments

TMA 4195 Matematisk modellering Exam Tuesday December 16, :00 13:00 Problems and solution with additional comments Problem F U L W D g m 3 2 s 2 0 0 0 0 2 kg 0 0 0 0 0 0 Table : Dimension matrix TMA 495 Matematisk moellering Exam Tuesay December 6, 2008 09:00 3:00 Problems an solution with aitional comments The necessary

More information

A. Exclusive KL View of the MLE

A. Exclusive KL View of the MLE A. Exclusive KL View of the MLE Lets assume a change-of-variable moel p Z z on the ranom variable Z R m, such as the one use in Dinh et al. 2017: z 0 p 0 z 0 an z = ψz 0, where ψ is an invertible function

More information

Capacity Analysis of MIMO Systems with Unknown Channel State Information

Capacity Analysis of MIMO Systems with Unknown Channel State Information Capacity Analysis of MIMO Systems with Unknown Channel State Information Jun Zheng an Bhaskar D. Rao Dept. of Electrical an Computer Engineering University of California at San Diego e-mail: juzheng@ucs.eu,

More information

Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine

Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Nitish Srivastava nitish@cs.toronto.edu Ruslan Salahutdinov rsalahu@cs.toronto.edu Geoffrey Hinton hinton@cs.toronto.edu

More information

The Principle of Least Action

The Principle of Least Action Chapter 7. The Principle of Least Action 7.1 Force Methos vs. Energy Methos We have so far stuie two istinct ways of analyzing physics problems: force methos, basically consisting of the application of

More information

The Role of Models in Model-Assisted and Model- Dependent Estimation for Domains and Small Areas

The Role of Models in Model-Assisted and Model- Dependent Estimation for Domains and Small Areas The Role of Moels in Moel-Assiste an Moel- Depenent Estimation for Domains an Small Areas Risto Lehtonen University of Helsini Mio Myrsylä University of Pennsylvania Carl-Eri Särnal University of Montreal

More information

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y Ph195a lecture notes, 1/3/01 Density operators for spin- 1 ensembles So far in our iscussion of spin- 1 systems, we have restricte our attention to the case of pure states an Hamiltonian evolution. Toay

More information

Introduction to Markov Processes

Introduction to Markov Processes Introuction to Markov Processes Connexions moule m44014 Zzis law Gustav) Meglicki, Jr Office of the VP for Information Technology Iniana University RCS: Section-2.tex,v 1.24 2012/12/21 18:03:08 gustav

More information

Robust Low Rank Kernel Embeddings of Multivariate Distributions

Robust Low Rank Kernel Embeddings of Multivariate Distributions Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms

More information

Agmon Kolmogorov Inequalities on l 2 (Z d )

Agmon Kolmogorov Inequalities on l 2 (Z d ) Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,

More information

Homework 2 EM, Mixture Models, PCA, Dualitys

Homework 2 EM, Mixture Models, PCA, Dualitys Homework 2 EM, Mixture Moels, PCA, Dualitys CMU 10-715: Machine Learning (Fall 2015) http://www.cs.cmu.eu/~bapoczos/classes/ml10715_2015fall/ OUT: Oct 5, 2015 DUE: Oct 19, 2015, 10:20 AM Guielines The

More information

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation JOURNAL OF MATERIALS SCIENCE 34 (999)5497 5503 Thermal conuctivity of grae composites: Numerical simulations an an effective meium approximation P. M. HUI Department of Physics, The Chinese University

More information

ELEC3114 Control Systems 1

ELEC3114 Control Systems 1 ELEC34 Control Systems Linear Systems - Moelling - Some Issues Session 2, 2007 Introuction Linear systems may be represente in a number of ifferent ways. Figure shows the relationship between various representations.

More information

The total derivative. Chapter Lagrangian and Eulerian approaches

The total derivative. Chapter Lagrangian and Eulerian approaches Chapter 5 The total erivative 51 Lagrangian an Eulerian approaches The representation of a flui through scalar or vector fiels means that each physical quantity uner consieration is escribe as a function

More information

Survey-weighted Unit-Level Small Area Estimation

Survey-weighted Unit-Level Small Area Estimation Survey-weighte Unit-Level Small Area Estimation Jan Pablo Burgar an Patricia Dörr Abstract For evience-base regional policy making, geographically ifferentiate estimates of socio-economic inicators are

More information

Estimating Causal Direction and Confounding Of Two Discrete Variables

Estimating Causal Direction and Confounding Of Two Discrete Variables Estimating Causal Direction an Confouning Of Two Discrete Variables This inspire further work on the so calle aitive noise moels. Hoyer et al. (2009) extene Shimizu s ientifiaarxiv:1611.01504v1 [stat.ml]

More information

Nonlinear Adaptive Ship Course Tracking Control Based on Backstepping and Nussbaum Gain

Nonlinear Adaptive Ship Course Tracking Control Based on Backstepping and Nussbaum Gain Nonlinear Aaptive Ship Course Tracking Control Base on Backstepping an Nussbaum Gain Jialu Du, Chen Guo Abstract A nonlinear aaptive controller combining aaptive Backstepping algorithm with Nussbaum gain

More information

Lower bounds on Locality Sensitive Hashing

Lower bounds on Locality Sensitive Hashing Lower bouns on Locality Sensitive Hashing Rajeev Motwani Assaf Naor Rina Panigrahy Abstract Given a metric space (X, X ), c 1, r > 0, an p, q [0, 1], a istribution over mappings H : X N is calle a (r,

More information

Harmonic Modelling of Thyristor Bridges using a Simplified Time Domain Method

Harmonic Modelling of Thyristor Bridges using a Simplified Time Domain Method 1 Harmonic Moelling of Thyristor Briges using a Simplifie Time Domain Metho P. W. Lehn, Senior Member IEEE, an G. Ebner Abstract The paper presents time omain methos for harmonic analysis of a 6-pulse

More information

Problem Sheet 2: Eigenvalues and eigenvectors and their use in solving linear ODEs

Problem Sheet 2: Eigenvalues and eigenvectors and their use in solving linear ODEs Problem Sheet 2: Eigenvalues an eigenvectors an their use in solving linear ODEs If you fin any typos/errors in this problem sheet please email jk28@icacuk The material in this problem sheet is not examinable

More information

arxiv: v1 [math.co] 29 May 2009

arxiv: v1 [math.co] 29 May 2009 arxiv:0905.4913v1 [math.co] 29 May 2009 simple Havel-Hakimi type algorithm to realize graphical egree sequences of irecte graphs Péter L. Erős an István Miklós. Rényi Institute of Mathematics, Hungarian

More information

Topic 7: Convergence of Random Variables

Topic 7: Convergence of Random Variables Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information

More information

Integrated Data Reconciliation with Generic Model Control for the Steel Pickling Process

Integrated Data Reconciliation with Generic Model Control for the Steel Pickling Process Korean J. Chem. Eng., (6), 985-99 (3) Integrate Data Reconciliation with Generic Moel Control for the Steel Picling Process Paisan Kittisupaorn an Pornsiri Kaewprait Department of Chemical Engineering,

More information

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing

Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing Course Project for CDS 05 - Geometric Mechanics John M. Carson III California Institute of Technology June

More information

3.7 Implicit Differentiation -- A Brief Introduction -- Student Notes

3.7 Implicit Differentiation -- A Brief Introduction -- Student Notes Fin these erivatives of these functions: y.7 Implicit Differentiation -- A Brief Introuction -- Stuent Notes tan y sin tan = sin y e = e = Write the inverses of these functions: y tan y sin How woul we

More information

arxiv: v1 [physics.flu-dyn] 8 May 2014

arxiv: v1 [physics.flu-dyn] 8 May 2014 Energetics of a flui uner the Boussinesq approximation arxiv:1405.1921v1 [physics.flu-yn] 8 May 2014 Kiyoshi Maruyama Department of Earth an Ocean Sciences, National Defense Acaemy, Yokosuka, Kanagawa

More information

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische

More information

A simple model for the small-strain behaviour of soils

A simple model for the small-strain behaviour of soils A simple moel for the small-strain behaviour of soils José Jorge Naer Department of Structural an Geotechnical ngineering, Polytechnic School, University of São Paulo 05508-900, São Paulo, Brazil, e-mail:

More information

Cascaded redundancy reduction

Cascaded redundancy reduction Network: Comput. Neural Syst. 9 (1998) 73 84. Printe in the UK PII: S0954-898X(98)88342-5 Cascae reunancy reuction Virginia R e Sa an Geoffrey E Hinton Department of Computer Science, University of Toronto,

More information

On Topic Evolution. Eric P. Xing School of Computer Science Carnegie Mellon University Technical Report: CMU-CALD

On Topic Evolution. Eric P. Xing School of Computer Science Carnegie Mellon University Technical Report: CMU-CALD On Topic Evolution Eric P. Xing School of Computer Science Carnegie Mellon University epxing@cs.cmu.eu Technical Report: CMU-CALD-05-5 December 005 Abstract I introuce topic evolution moels for longituinal

More information

Some Examples. Uniform motion. Poisson processes on the real line

Some Examples. Uniform motion. Poisson processes on the real line Some Examples Our immeiate goal is to see some examples of Lévy processes, an/or infinitely-ivisible laws on. Uniform motion Choose an fix a nonranom an efine X := for all (1) Then, {X } is a [nonranom]

More information

Scalable Inference for Logistic-Normal Topic Models

Scalable Inference for Logistic-Normal Topic Models Scalable Inference for Logistic-Normal Topic Moels Jianfei Chen, Jun Zhu, Zi Wang, Xun Zheng an Bo Zhang State ey Lab of Intelligent Tech. & Systems; Tsinghua National TNList Lab; Department of Computer

More information

A simple tranformation of copulas

A simple tranformation of copulas A simple tranformation of copulas V. Durrleman, A. Nikeghbali & T. Roncalli Groupe e Recherche Opérationnelle Créit Lyonnais France July 31, 2000 Abstract We stuy how copulas properties are moifie after

More information

Calculus in the AP Physics C Course The Derivative

Calculus in the AP Physics C Course The Derivative Limits an Derivatives Calculus in the AP Physics C Course The Derivative In physics, the ieas of the rate change of a quantity (along with the slope of a tangent line) an the area uner a curve are essential.

More information

Code_Aster. Detection of the singularities and computation of a card of size of elements

Code_Aster. Detection of the singularities and computation of a card of size of elements Titre : Détection es singularités et calcul une carte [...] Date : 0/0/0 Page : /6 Responsable : Josselin DLMAS Clé : R4.0.04 Révision : 9755 Detection of the singularities an computation of a car of size

More information

Code_Aster. Detection of the singularities and calculation of a map of size of elements

Code_Aster. Detection of the singularities and calculation of a map of size of elements Titre : Détection es singularités et calcul une carte [...] Date : 0/0/0 Page : /6 Responsable : DLMAS Josselin Clé : R4.0.04 Révision : Detection of the singularities an calculation of a map of size of

More information

More from Lesson 6 The Limit Definition of the Derivative and Rules for Finding Derivatives.

More from Lesson 6 The Limit Definition of the Derivative and Rules for Finding Derivatives. Math 1314 ONLINE More from Lesson 6 The Limit Definition of the Derivative an Rules for Fining Derivatives Eample 4: Use the Four-Step Process for fining the erivative of the function Then fin f (1) f(

More information

A NONLINEAR SOURCE SEPARATION APPROACH FOR THE NICOLSKY-EISENMAN MODEL

A NONLINEAR SOURCE SEPARATION APPROACH FOR THE NICOLSKY-EISENMAN MODEL 6th European Signal Processing Conference EUSIPCO 28, Lausanne, Switzerlan, August 25-29, 28, copyright by EURASIP A NONLINEAR SOURCE SEPARATION APPROACH FOR THE NICOLSKY-EISENMAN MODEL Leonaro Tomazeli

More information

Some properties of random staircase tableaux

Some properties of random staircase tableaux Some properties of ranom staircase tableaux Sanrine Dasse Hartaut Pawe l Hitczenko Downloae /4/7 to 744940 Reistribution subject to SIAM license or copyright; see http://wwwsiamorg/journals/ojsaphp Abstract

More information

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes Leaving Ranomness to Nature: -Dimensional Prouct Coes through the lens of Generalize-LDPC coes Tavor Baharav, Kannan Ramchanran Dept. of Electrical Engineering an Computer Sciences, U.C. Berkeley {tavorb,

More information

Polynomial Inclusion Functions

Polynomial Inclusion Functions Polynomial Inclusion Functions E. e Weert, E. van Kampen, Q. P. Chu, an J. A. Muler Delft University of Technology, Faculty of Aerospace Engineering, Control an Simulation Division E.eWeert@TUDelft.nl

More information

IMAGE classification is a topic of significant interest within

IMAGE classification is a topic of significant interest within IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 35, NO. 11, NOVEMBER 2013 2665 Latent Dirichlet Allocation Moels for Image Classification Nihil Rasiwasia, Member, IEEE, an Nuno Vasconcelos,

More information

MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS ABSTRACT KEYWORDS

MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS ABSTRACT KEYWORDS MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS BY BENJAMIN AVANZI, LUKE C. CASSAR AND BERNARD WONG ABSTRACT In this paper we investigate the potential of Lévy copulas as a tool for

More information

Generalizing Kronecker Graphs in order to Model Searchable Networks

Generalizing Kronecker Graphs in order to Model Searchable Networks Generalizing Kronecker Graphs in orer to Moel Searchable Networks Elizabeth Boine, Babak Hassibi, Aam Wierman California Institute of Technology Pasaena, CA 925 Email: {eaboine, hassibi, aamw}@caltecheu

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix

More information

Local Linear ICA for Mutual Information Estimation in Feature Selection

Local Linear ICA for Mutual Information Estimation in Feature Selection Local Linear ICA for Mutual Information Estimation in Feature Selection Tian Lan, Deniz Erogmus Department of Biomeical Engineering, OGI, Oregon Health & Science University, Portlan, Oregon, USA E-mail:

More information

STATISTICAL LIKELIHOOD REPRESENTATIONS OF PRIOR KNOWLEDGE IN MACHINE LEARNING

STATISTICAL LIKELIHOOD REPRESENTATIONS OF PRIOR KNOWLEDGE IN MACHINE LEARNING STATISTICAL LIKELIHOOD REPRESENTATIONS OF PRIOR KNOWLEDGE IN MACHINE LEARNING Mark A. Kon Department of Mathematics an Statistics Boston University Boston, MA 02215 email: mkon@bu.eu Anrzej Przybyszewski

More information

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions Working Paper 2013:5 Department of Statistics Computing Exact Confience Coefficients of Simultaneous Confience Intervals for Multinomial Proportions an their Functions Shaobo Jin Working Paper 2013:5

More information

Similarity Measures for Categorical Data A Comparative Study. Technical Report

Similarity Measures for Categorical Data A Comparative Study. Technical Report Similarity Measures for Categorical Data A Comparative Stuy Technical Report Department of Computer Science an Engineering University of Minnesota 4-92 EECS Builing 200 Union Street SE Minneapolis, MN

More information

Influence of weight initialization on multilayer perceptron performance

Influence of weight initialization on multilayer perceptron performance Influence of weight initialization on multilayer perceptron performance M. Karouia (1,2) T. Denœux (1) R. Lengellé (1) (1) Université e Compiègne U.R.A. CNRS 817 Heuiasyc BP 649 - F-66 Compiègne ceex -

More information

u!i = a T u = 0. Then S satisfies

u!i = a T u = 0. Then S satisfies Deterministic Conitions for Subspace Ientifiability from Incomplete Sampling Daniel L Pimentel-Alarcón, Nigel Boston, Robert D Nowak University of Wisconsin-Maison Abstract Consier an r-imensional subspace

More information

Lecture 6: Calculus. In Song Kim. September 7, 2011

Lecture 6: Calculus. In Song Kim. September 7, 2011 Lecture 6: Calculus In Song Kim September 7, 20 Introuction to Differential Calculus In our previous lecture we came up with several ways to analyze functions. We saw previously that the slope of a linear

More information

MULTIFRACTAL NETWORK GENERATORS

MULTIFRACTAL NETWORK GENERATORS MULTIFRACTAL NETWORK GENERATORS AUSTIN R. BENSON, CARLOS RIQUELME, SVEN P. SCHMIT (0) Abstract. Generating ranom graphs to moel networks has a rich history. In this paper, we explore a recent generative

More information

Modelling and simulation of dependence structures in nonlife insurance with Bernstein copulas

Modelling and simulation of dependence structures in nonlife insurance with Bernstein copulas Moelling an simulation of epenence structures in nonlife insurance with Bernstein copulas Prof. Dr. Dietmar Pfeifer Dept. of Mathematics, University of Olenburg an AON Benfiel, Hamburg Dr. Doreen Straßburger

More information

Part I: Web Structure Mining Chapter 1: Information Retrieval and Web Search

Part I: Web Structure Mining Chapter 1: Information Retrieval and Web Search Part I: Web Structure Mining Chapter : Information Retrieval an Web Search The Web Challenges Crawling the Web Inexing an Keywor Search Evaluating Search Quality Similarity Search The Web Challenges Tim

More information