Prior-Based Dual Additive Latent Dirichlet Allocation for User-Item Connected Documents


Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

Prior-Based Dual Additive Latent Dirichlet Allocation for User-Item Connected Documents

Wei Zhang, Jianyong Wang
Tsinghua National Laboratory for Information Science and Technology (TNList)
Department of Computer Science and Technology, Tsinghua University, Beijing, China

Abstract

User-item connected documents, such as customer reviews for specific items in online shopping websites and user tips in location-based social networks, have become more and more prevalent recently. Inferring the topic distributions of user-item connected documents is beneficial for many applications, including document classification and summarization of users and items. While many different topic models have been proposed for modeling such text, most of them cannot account for the dual role of user-item connected documents (each document is related to one user and one item simultaneously) in the topic distribution generation process. In this paper, we propose a novel probabilistic topic model called Prior-based Dual Additive Latent Dirichlet Allocation (PDA-LDA). It addresses the dual role of each document by associating its Dirichlet prior for the topic distribution with user and item topic factors, which leads to a document-level asymmetric Dirichlet prior. In the experiments, we evaluate PDA-LDA on several real datasets, and the results demonstrate that our model is effective in comparison to several other models, both in held-out perplexity on modeling text and in a document classification application.

1 Introduction

User-item connected documents, which are written by users for specific items, occur in many scenarios. For example, in online shopping websites (e.g., Amazon), users prefer to write reviews to express their attitudes towards products that they have bought. These reviews have become a major reference for candidate buyers to make decisions.
Another example is that in location-based social networks (e.g., Foursquare), users often write tips for points-of-interest (POIs), which are beneficial for other users to gain knowledge about POIs they have not visited before. In summary, user-item connected documents have dual roles, with each document associated with one user and one item simultaneously. They not only reflect the core concerns of users, but also indicate the characteristics of items. Without specific explanation, we also use documents to denote user-item connected documents for simplicity in the rest of this paper.

Characterizing the content of documents is an important problem and has many applications in different fields, e.g., natural language processing and information retrieval. Topic models [Hofmann, 1999; Blei et al., 2003b] are elegant ways to solve this problem by assuming each document has a topic distribution and each topic is represented as a distribution over words. In this way, documents are summarized at a high level by their associated topic distributions. For user-item connected documents, not only are the learned topic distributions useful as usual, but if the topic factors of users and items can be differentiated, they can also be used for creating self-introduction profiles. These profiles are beneficial for other users to gain knowledge about items and for retailers to understand what their consumers care about. Further, current popular recommender systems can utilize them to make explainable recommendations through content matching.

While many topic models have been proposed in the last decade, almost all of them are designed for specific tasks and do not emphasize the dual role phenomenon of user-item connected documents, let alone automatically infer topic factors of users and items simultaneously. For one user-item connected document, its associated topic distribution should be influenced by its corresponding user and item together; and for a user or an item, its topics are spread across the multiple documents related to it.
Therefore, the main challenge is to connect user, item, and reviews in a unified topic model while ensuring that different combinations of user and item pairs tend to generate distinct topic distributions. Besides, it is better for the topic model to simultaneously infer user and item topic factors. Existing approaches such as the author-topic model [Rosen-Zvi et al., 2004] can only capture a user's topic distribution but ignore an item's topics.

To address the above issues, we propose a novel generative probabilistic topic model called Prior-based Dual Additive Latent Dirichlet Allocation (PDA-LDA) to model user-item connected documents. This model is partly inspired by [Wallach et al., 2009], which points out that asymmetric Dirichlet priors over topic distributions can lead to additional benefit for topic models compared with symmetric priors. In PDA-LDA, we account for the dual role phenomenon by associating the Dirichlet prior of each document with its corresponding user and item. More specifically, PDA-LDA first assumes each user or item is represented by a topic factor. When a component of a topic factor takes a larger value, it means the corresponding user or item concentrates more on that topic. Then an exponential function is utilized to combine a pair of user and item topic factors for each document, through its additive property of parameters, to form a new Dirichlet prior, and thus every component of the prior takes a positive value. As a result, each document has a different Dirichlet prior. The comparison between the priors of Latent Dirichlet Allocation (LDA) [Blei et al., 2003b], asymmetric Latent Dirichlet Allocation (AsLDA) [Wallach et al., 2009], and PDA-LDA is shown in Table 1.

Table 1: Difference of Dirichlet hyper-priors (α, α' ∈ R^K).
  LDA      Dir(α),    α_i = α_j   (∀ i, j ∈ {1, ..., K})
  AsLDA    Dir(α'),   α'_i ≠ α'_j (∃ i, j ∈ {1, ..., K})
  PDA-LDA  Dir(α^d)   (d ∈ {1, ..., |D|})

In short, LDA is assigned a corpus-level symmetric prior, AsLDA has a corpus-level asymmetric prior, while PDA-LDA owns document-level asymmetric priors. The main advantages of PDA-LDA lie in three aspects. First of all, it models text better by accounting for the dual role in topic generation. Second, it can automatically summarize users and items by the learned topic factors. And last, it constructs document-level priors with pairs of user and item topic factors. This enables prediction on test documents, since after the model learning procedure the user and item topic factors are known for these documents.

Contributions. In all, the main contributions of this paper are summarized as follows:

- To the best of our knowledge, we are the first to address the dual role phenomenon for topic generation in topic models for user-item connected documents.
- We propose a new generative model called PDA-LDA which accounts for the dual role phenomenon by connecting user and item topic factors to Dirichlet priors through an exponential additive transformation.
- We evaluate the proposed model on several real data sets to verify the effectiveness of PDA-LDA.
The experimental results demonstrate that our model not only achieves superior held-out perplexity on test data, but also generates better topics, as shown by its good performance in the document classification application.

In what follows, the detailed model specification is introduced in Section 2. Then we provide experimental results and some analysis in Section 3. Related work is briefly discussed in Section 4. In the last section, we draw a conclusion for this work.

2 Approach

This section is divided into three parts. The first part is mainly about the description of PDA-LDA, including the generative process of the model. Then we derive the Gibbs EM based learning algorithm for the model. Finally, a concise introduction to prediction on test data is given. Before we proceed, the main mathematical symbols we will use later are shown for reference in Table 2.

Figure 1: Graphical model of PDA-LDA.

2.1 Model Description

Standard topic models such as LDA always assume each document relates to a distribution over K latent components called topics, and each topic can be explained by a distribution over all words in the vocabulary W. Both the topic and word distributions are sampled from Dirichlet distributions Dir(α), known as priors in Bayesian learning. Our model follows the basic topic generation process of LDA. The key difference lies in how we set the parameter α of the Dirichlet prior for each user-item connected document. The hyper-parameter α has been demonstrated to influence the result of text modeling [Wallach et al., 2009]. As we discussed, each user or item has its own characteristics. We assume each user is associated with a topic factor φ_u ∈ R^K, and the same for each item, i.e., φ_v ∈ R^K. In topic factors, components with larger values denote that users and items concentrate more on the corresponding topics. Topic factors should influence the generation of topic distributions for documents.
To bridge the gap between them, one natural idea is to associate topic factors with the Dirichlet hyper-parameter α, since in the Bayesian learning framework the prior distribution reveals prior knowledge about how data will be generated before it really comes into action. This is consistent with the intrinsics of the generation of user-item connected documents. Imagine that when a user plans to write something about an item, the first thing he will do is to choose some themes from what he often concentrates on, which also conform to the context of the item. Obviously, the themes are relevant to both the user and the item. These themes can be regarded as the latent topics studied in topic models. For a document d_{u,v}, we associate the user and item topic factors with the Dirichlet prior parameter of its topic distribution as follows,

    α^{d_{u,v}} = exp(φ_u + φ_v)    (1)

where the exponential transformation ensures each component of α^{d_{u,v}} is positive, which satisfies the requirements of Dirichlet parameters. So far, the topic factors of user and item can

influence the generation of the topic distribution of document d_{u,v} through α^{d_{u,v}}.

Table 2: Notations of main adopted symbols.
  D               User-item connected document set
  W               Vocabulary set
  u               A user in the user set U
  v               An item in the item set V
  K               Number of topics
  λ_U             Precision of Gaussian prior for user factor φ_u
  λ_V             Precision of Gaussian prior for item factor φ_v
  λ_B             Precision of Gaussian prior for background factor φ_b
  φ_u             Topic factor of user u
  φ_v             Topic factor of item v
  φ_b             Topic factor of background
  α^{d_{u,v}}     Dirichlet prior parameter for topic distribution θ
  θ_d             Topic distribution of document d
  η               Dirichlet prior parameter for word distribution β
  β_k             Multinomial distribution over words of topic k
  z_j             Latent topic assignment for word w_j
  w_j             The j-th word in document d
  d_{u,v}         The document written by user u for item v
  N_{d_{u,v}}     Number of words in document d_{u,v}
  N_{d_{u,v},k}   Number of words with topic k in document d_{u,v}
  M^D_w           Occurrence of word w in corpus
  M^D_{w,k}       Occurrence of word w with topic k in corpus

One natural extension is to additionally consider a background topic factor φ_b. It is necessary since some words, such as conjunctions, frequently occur in documents. The topics related to these words are not specially owned by users or items. Therefore, the complete transformation between topic factors and the Dirichlet parameter is defined to be

    α^{d_{u,v}} = exp(φ_u + φ_v + φ_b)    (2)

Based on the above analysis, we can summarize the generative story of PDA-LDA for user-item connected documents as below:

1. Draw background topic factor φ_b ~ N(0, λ_B).
2. For each user u, draw user topic factor φ_u ~ N(0, λ_U).
3. For each item v, draw item topic factor φ_v ~ N(0, λ_V).
4. For each topic k, draw word distribution β_k ~ Dirichlet(η).
5. For each user-item connected document d_{u,v} in the document collection D:
   (a) Draw topic distribution θ_{d_{u,v}} ~ Dirichlet(α^{d_{u,v}}), where α^{d_{u,v}} is calculated through Equation (2).
   (b) For each position j in the document d_{u,v}:
       (b.1) Sample topic z_j ~ Mult(θ_{d_{u,v}}).
       (b.2) Draw word w_j ~ Mult(β_{z_j}).

The complete graphical model representation of our proposed model is provided in Figure 1.
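The generative story above can be simulated directly. The following sketch draws one user-item connected document under assumed placeholder values for K, |W|, the Gaussian precisions, and η; the variable names are illustrative, not taken from the authors' implementation.

```python
import numpy as np

# Illustrative simulation of the PDA-LDA generative story for a single
# user-item connected document. All sizes and hyper-parameter values are
# assumed placeholders.
rng = np.random.default_rng(0)
K, W, n_words = 4, 50, 30              # topics, vocabulary size, doc length
lam_U = lam_V = lam_B = 1.0            # Gaussian precisions
eta = 0.1                              # Dirichlet prior for word dists

phi_b = rng.normal(0.0, 1.0 / np.sqrt(lam_B), size=K)  # background factor
phi_u = rng.normal(0.0, 1.0 / np.sqrt(lam_U), size=K)  # user topic factor
phi_v = rng.normal(0.0, 1.0 / np.sqrt(lam_V), size=K)  # item topic factor
beta = rng.dirichlet(np.full(W, eta), size=K)          # topic-word dists

alpha = np.exp(phi_u + phi_v + phi_b)  # Equation (2): document-level prior
theta = rng.dirichlet(alpha)           # topic distribution of d_{u,v}
z = rng.choice(K, size=n_words, p=theta)                 # topic assignments
words = np.array([rng.choice(W, p=beta[k]) for k in z])  # observed words
```

Because the exponential transformation keeps every component of alpha strictly positive, the Dirichlet draw is always well defined, whatever values the Gaussian factors take.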
The joint probability of a whole corpus D containing user-item connected documents and the topic factors Θ_t = {φ_U, φ_V, φ_b} is defined as follows,

    P(D, Θ_t) = N(φ_b; 0, λ_B) ∏_u N(φ_u; 0, λ_U) ∏_v N(φ_v; 0, λ_V)
                · ∫∫ ∏_{k=1}^{K} Dir(β_k; η) ∏_{d_{u,v} ∈ D} Dir(θ_{d_{u,v}}; α^{d_{u,v}})
                  ∏_{j=1}^{N_{d_{u,v}}} Σ_{z_j} P(z_j | θ_{d_{u,v}}) P(w_j | β_{z_j}) dθ_{d_{u,v}} dβ_k    (3)

2.2 Learning

In this work, our goal is to learn the optimal topic factors Θ_t and the key latent variables θ and β by maximizing the log-likelihood of the joint probability shown in Equation (3),

    L = log(P(D, Θ_t))    (4)

However, it is intractable to directly optimize the above function and compute the posterior distributions of θ and β, due to the summation over the latent topics z_j in discrete space and the coupling of θ and β. Luckily, if the assignments of latent topics can be inferred for all words in reviews, then not only can θ and β be calculated, but Θ_t is also tractable to optimize. This intuition leads to the idea of the Gibbs EM algorithm. The Gibbs EM learning algorithm is widely adopted in (partially) Bayesian latent factor models [Wallach, 2006; Liu et al., 2013]; it alternates between sampling the values of latent variables (Expectation step) and optimizing some model parameters (Maximization step). More specifically, we adopt collapsed Gibbs sampling to determine all the latent topics in the E-step and optimize Θ_t through gradient ascent in the M-step.

E-step. Given a word position j in document d_{u,v}, the key inferential problem in collapsed Gibbs sampling is to derive the posterior distribution of its associated topic, i.e., P(z_{d_{u,v},j} = k | z_{d_{u,v},-j}, w_{d_{u,v}}), where z_{d_{u,v},-j} denotes that all topic assignments of words in the document are known except for the word in position j. As Θ_t is fixed in the E-step, α^{d_{u,v}} can be calculated through Equation (2).
By applying the Bayesian formula and the conditional dependence property, it is easy to derive the following form of the posterior distribution according to the results of [Griffiths and Steyvers, 2004],

    P(z_{d_{u,v},j} = k | z_{d_{u,v},-j}, w_{d_{u,v}}) ∝
        (N_{d_{u,v},k} + α^{d_{u,v}}_k) / (N_{d_{u,v}} + Σ_{k'} α^{d_{u,v}}_{k'})
        · (M^D_{w_j,k} + η) / (Σ_w M^D_{w,k} + |W|η)    (5)

where none of the counts include the current word. The above equation is very similar to that of the standard topic model except for the Dirichlet prior α^{d_{u,v}}_k.

M-step. After getting the latent topic assignments for all words in reviews, the objective log-likelihood of the joint probability becomes P(D, Z, Θ_t), and thus the summation term in Equation (3) vanishes. Due to the conjugacy of the Dirichlet and multinomial distributions [Heinrich, 2004; Griffiths and Steyvers, 2004], θ and β can be integrated out, and P(D, Z, Θ_t) is now reformulated as below,

    P(D, Z, Θ_t) = N(φ_b; 0, λ_B) ∏_u N(φ_u; 0, λ_U) ∏_v N(φ_v; 0, λ_V)
        · ∏_{d_{u,v}} [ Γ(Σ_{k=1}^{K} α^{d_{u,v}}_k) / ∏_{k=1}^{K} Γ(α^{d_{u,v}}_k)
          · ∏_{k=1}^{K} Γ(N_{d_{u,v},k} + α^{d_{u,v}}_k) / Γ(N_{d_{u,v}} + Σ_{k=1}^{K} α^{d_{u,v}}_k) ]
        · ∏_{k=1}^{K} [ Γ(|W|η) / ∏_w Γ(η)
          · ∏_w Γ(M^D_{w,k} + η) / Γ(Σ_w M^D_{w,k} + |W|η) ]    (6)

where Γ is the Gamma function, with the form Γ(x) = (x−1)! when x is a positive integer. By maximizing the log-likelihood of the above joint probability with the gradient ascent algorithm, we can get the optimal Θ_t. Gradient ascent consists of two steps. The first step is to deduce the gradients of the parameters. For example, the gradient of φ_{u,k} is

    ∂L/∂φ_{u,k} = −λ_U φ_{u,k} + Σ_{d_{u,v} ∈ D_u} α^{d_{u,v}}_k [ Ψ(Σ_{k'} α^{d_{u,v}}_{k'}) − Ψ(α^{d_{u,v}}_k)
                  + Ψ(N_{d_{u,v},k} + α^{d_{u,v}}_k) − Ψ(N_{d_{u,v}} + Σ_{k'} α^{d_{u,v}}_{k'}) ]    (7)

where D_u denotes the documents related to user u and Ψ is the digamma function, which is equal to the logarithmic derivative of the Gamma function. The gradients of φ_{v,k} and φ_{b,k} are analogous to that of φ_{u,k}, except for the summation condition and the regularization term. We should emphasize that although φ_{u,k} and φ_{v,k} play similar roles for a single review, different reviews connect different users and items, and thus the updates for users and items are distinct from a whole perspective. Based on the obtained gradients, the second step of gradient ascent is to update the model parameters through

    Θ^{t+1} = Θ^t + ω ∂L/∂Θ^t    (8)

where ω is the learning rate of the algorithm and the superscript t denotes the number of finished iterations. Besides, after the iterative learning process converges, we can calculate β and θ by collecting enough subsequent samples through a burn-in process as in [Blei et al., 2003a],

    β_{k,w} = (1/T) Σ_t (M^{D,t}_{w,k} + η) / (Σ_{w'} M^{D,t}_{w',k} + |W|η)    (9)

    θ_{d,k} = (1/T) Σ_t (N^t_{d_{u,v},k} + α^{d_{u,v}}_k) / (N_{d_{u,v}} + Σ_{k'} α^{d_{u,v}}_{k'})    (10)

where N^t_{d_{u,v},k} denotes the count from the sample collected in the t-th iteration and T is the number of collected samples. In summary, the whole learning algorithm for PDA-LDA is given in Algorithm 1.

2.3 Prediction

When the PDA-LDA model is utilized for document modeling on a test dataset, Θ_t and β should be fixed to the optimal values learned from the training set. The only important step is to infer latent topic assignments over the test documents by sampling from a formula resembling Equation (5), which additionally incorporates the relevant count variables of the test documents.

3 Experiments

In this section, we first introduce the datasets we use and the preprocessing steps for them. Then we discuss the adopted comparison models and the hyper-parameter setting of PDA-LDA.
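The two halves of a Gibbs EM iteration, the resampling step of Equation (5) and the gradient of Equation (7), can be sketched as below. This is an illustrative sketch, not the authors' code: the function signatures and count-array layout are assumptions, and the digamma function is approximated numerically via central differences of the log-Gamma function.

```python
import numpy as np
from math import lgamma

def sample_topic(j, doc_words, z, alpha_d, M_wk, M_k, eta, rng):
    """E-step (Equation (5)): resample the topic of word position j.
    doc_words: word ids of one document; z: its current topic assignments;
    alpha_d: document-level prior from Equation (2); M_wk[w, k]: corpus
    count of word w with topic k; M_k[k] = M_wk[:, k].sum(). The
    document-side denominator is constant in k and cancels on
    normalization, so it is omitted."""
    K, W = alpha_d.shape[0], M_wk.shape[0]
    w = doc_words[j]
    M_wk[w, z[j]] -= 1                      # remove the current word
    M_k[z[j]] -= 1
    N_dk = np.bincount(np.delete(z, j), minlength=K).astype(float)
    p = (N_dk + alpha_d) * (M_wk[w] + eta) / (M_k + W * eta)
    p /= p.sum()
    z[j] = rng.choice(K, p=p)
    M_wk[w, z[j]] += 1                      # add it back with the new topic
    M_k[z[j]] += 1
    return z[j]

def digamma(x, h=1e-5):
    """Numerical digamma via central differences of log-Gamma (sketch)."""
    return (lgamma(x + h) - lgamma(x - h)) / (2.0 * h)

def grad_phi_u(phi_u, docs, lam_U):
    """M-step (Equation (7)): gradient w.r.t. one user factor.
    docs: list of (alpha_d, N_dk, N_d) triples for this user's documents,
    with alpha_d = exp(phi_u + phi_v + phi_b) precomputed."""
    psi = np.vectorize(digamma)
    g = -lam_U * phi_u
    for alpha_d, N_dk, N_d in docs:
        S = alpha_d.sum()
        g += alpha_d * (digamma(S) - psi(alpha_d)
                        + psi(N_dk + alpha_d) - digamma(N_d + S))
    return g
```

Equation (8) then amounts to `phi_u += omega * grad_phi_u(phi_u, docs, lam_U)`; the factor alpha_d in the gradient comes from the chain rule through the exponential transformation of Equation (2).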
Finally, we analyze the experimental results to demonstrate the effectiveness of our method.

Algorithm 1: The Gibbs EM algorithm for PDA-LDA.
  Input: User-item connected document collection D.
  Output: Optimal user, item, and background topic factors Θ_t; document-topic distribution θ; topic-word distribution β.
  1  Initialize hyper-parameters and Θ_t
  2  Randomly draw latent topic assignments for words in documents
  3  tt = 0
  4  while not converged and tt ≤ Iter_max do
  5      E-step:
  6      (a) Sample all words' latent topic assignments
  7          through Equation (5)
  8      M-step:
  9      (a) Calculate the gradients through Equation (7)
  10     (b) Update Θ_t through Equation (8)
  11     tt ← tt + 1
  12 Calculate θ and β through Equations (9) and (10) under a burn-in process

3.1 Dataset

We adopt three real data collections from Yelp and [McAuley and Leskovec, 2013]. Based on their origins, we denominate the three data sets as Yelp, AmazonFood, and AmazonSport, respectively. To clean the text data, we adopt the following four processing steps: (1) converting all letters into lowercase, (2) filtering stop words and punctuation, (3) removing reviews which are too short, and (4) saving frequent words to form a vocabulary. After cleaning the text data, we further remove users and items with fewer than 5 reviews to ensure that users and items have enough related documents. Finally, we obtain the experimental data sets whose basic statistics are shown in Table 3. We randomly divide the three collections into training, validation, and test sets with the ratio 7 : 1 : 2 for testing held-out perplexity and the further binary document classification task.

Table 3: Introduction of experimental datasets.
  Data          Users   Items   Reviews   Length
  Yelp          —       —       —         — (words)
  AmazonFood    —       —       —         — (words)
  AmazonSport   —       —       —         — (words)

3.2 Comparison Models

To verify the effectiveness of our model, we first introduce the comparison models adopted in the experiments.
Latent Dirichlet Allocation (LDA): Although LDA [Blei et al., 2003b] can neither capture author topics nor obtain item topics, it is still necessary to analyze its results to verify the effectiveness of AsLDA and PDA-LDA in exploring asymmetric Dirichlet priors, as we mentioned in Section 1.

Figure 2: Results in terms of perplexity with the change of topic components: (a) Yelp, (b) AmazonFood, (c) AmazonSport.

Author-Topic Model (ATM): We adapt ATM [Rosen-Zvi et al., 2004] to our problem by regarding users as the authors in the model. ATM somewhat resembles PDA-LDA, as both of them can obtain users' topics. However, ATM cannot handle the dual role phenomenon in user-item connected documents. Intuitively, it will perform worse than the new model.

Gibbs Sampler for Dirichlet Multinomial (GSDMM): GSDMM [Yin and Wang, 2014] is a standard topic model for handling short text, such as tweets in social media, by assuming all words in a document share the same topic. It is used as a comparison here due to the short length of the documents shown in Table 3.

Asymmetric Latent Dirichlet Allocation (AsLDA): AsLDA [Wallach et al., 2009] is a strong comparison as it addresses the popularity bias of each topic by incorporating an asymmetric Dirichlet prior. It can be regarded as the method most similar to PDA-LDA, considering only the background topic factor in the topic generation process.

3.3 Parameter Setting

All the hyper-parameters are determined based on their performance on the validation datasets. For all the comparison methods and PDA-LDA, we assign 0.1 to η. For the comparisons except AsLDA, we choose α to be 0.1 as well for its good performance. The concentration parameter α in AsLDA is tuned to be 0.1. Apart from η, the precisions λ_U, λ_V, and λ_B are uniformly set to 1 for PDA-LDA.

3.4 Results

Results on Text Modeling. We first compare all the adopted models in terms of perplexity on the test datasets. Perplexity is widely used for checking the quality of probabilistic models. It is directly related to the log-likelihood and is normally defined as below,

    Perplexity(D_test) = exp( − Σ_{d ∈ D_test} log P(w_d) / Σ_{d ∈ D_test} N_d )    (11)

The detailed results are shown in Table 4.
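Equation (11) is straightforward to compute once θ and β are learned. The sketch below only encodes the formula; in practice the distributions would come from the trained model, so the random placeholders and the function name are assumptions.

```python
import numpy as np

# Sketch of Equation (11): held-out perplexity from document-topic (theta)
# and topic-word (beta) distributions.
def perplexity(docs, theta, beta):
    """docs: list of word-id arrays; theta: (D, K); beta: (K, |W|)."""
    log_lik, n_words = 0.0, 0
    for d, words in enumerate(docs):
        p_w = theta[d] @ beta[:, words]   # P(w) = sum_k theta_dk * beta_kw
        log_lik += np.log(p_w).sum()
        n_words += len(words)
    return float(np.exp(-log_lik / n_words))
```

A useful sanity check: with uniform θ and β, every word has probability 1/|W| and the perplexity equals the vocabulary size.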
We notice that GSDMM does not perform very well in user-item connected document modeling, although the average length of the documents is short. One reason is that user-item connected documents are still longer than tweets in social media, and very short text is more likely to contain only one topic.

Table 4: Comparison of different models in terms of perplexity on test data with K = 40.
  Method    Yelp   AmazonFood   AmazonSport
  LDA       —      —            —
  ATM       —      —            —
  GSDMM     —      —            —
  AsLDA     —      —            —
  PDA-LDA   —      —            —

ATM does not show promising results, although it models the role of users in topic generation. This may be explained by the fact that ATM is originally designed for scenarios where each document has multiple authors (e.g., research papers). As each user-item connected document is uniquely owned by one user, ATM may not handle it very well. AsLDA outperforms LDA consistently on the three datasets. This makes sense because the asymmetric Dirichlet prior captures the popularity bias of topics in the generation process at the corpus level, while LDA ignores this bias. It demonstrates that the idea of modifying the hyper-parameters of the Dirichlet prior for topic distributions is promising, and it also provides evidence for PDA-LDA to further differentiate the prior parameters. It is clear that PDA-LDA achieves superior performance among all the adopted models. In particular, PDA-LDA behaves significantly better than the strong comparison AsLDA. This indicates that an asymmetric Dirichlet prior which captures topic concentration bias at the more fine-grained document level is better than one at the corpus level. Moreover, we discover that AsLDA gains only a minor improvement over LDA on the last dataset, while the decrease in perplexity of PDA-LDA is prominent, which indicates that our model has wider applicability.

Dimension Influence on Text Modeling. We analyze the performance evolution with the change of the number of latent components from 10 to 100 with a step size of 10. We only compare LDA, AsLDA, and PDA-LDA, as their results are closer than those of the other two models. As Figure 2 shows, the perplexity of all three models decreases steadily with the increase of topic components.
We choose K = 40 for the other experiments in this work by considering a trade-off between efficiency and effectiveness, as a larger number of components costs more time to learn the models. Besides, the perplexity differences between these models are relatively evident around K = 40, and thus the comparisons in the experiments will not be influenced. In general, PDA-LDA outperforms LDA and AsLDA regardless of how the number of topic components varies.

Application to Document Classification. As perplexity does not always correlate with human judgments, it may be better to test topic models by using their generated topics for other tasks [Chang et al., 2009]. In this work, we apply LDA, AsLDA, and PDA-LDA to a binary document classification task by utilizing the topic distributions learned from them as features for supervised classification models. We should emphasize that our goal here is to test the quality of the learned topics, not to compare with state-of-the-art document classification methods. On the other hand, features from topic models can be regarded as complementary to other supervised methods. Based on the rating scores associated with the three datasets, we divide all the documents of the training, validation, and test sets into two classes. The critical value between the two classes for all datasets is specified to be 3.5 (ratings lie in [1-5]). The numbers of documents in each class are shown in Table 6.

Table 6: Statistics of the classification datasets.
  Data          1st-class   2nd-class
  Yelp          —           —
  AmazonFood    —           —
  AmazonSport   —           —

Table 5: Comparison of different methods in terms of accuracy, precision, and F1 metric on the classification task with K = 40.
  Method    Yelp                      AmazonFood                AmazonSport
            Accuracy  Precision  F1   Accuracy  Precision  F1   Accuracy  Precision  F1
  LDA       —         —          —    —         —          —    —         —          —
  AsLDA     —         —          —    —         —          —    —         —          —
  PDA-LDA   —         —          —    —         —          —    —         —          —

We compare LDA, AsLDA, and PDA-LDA in terms of average accuracy, precision, and F1 metrics. Among them, precision and F1 are first computed for each class, and then a weighted average is taken over the two classes. We adopt a standard supervised classifier, i.e., the Support Vector Machine (SVM), in the experiments. As the results in Table 5 show, PDA-LDA achieves the best performance across all the datasets. Hence, the quality of the topics generated by PDA-LDA turns out to be better than that of the other models.
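The classification setup above, topic distributions as features for a binary SVM, can be sketched as follows. The paper simply uses a standard SVM, so this minimal hinge-loss subgradient trainer and the synthetic topic-feature data are stand-in assumptions, not the experimental pipeline.

```python
import numpy as np

# Sketch: document topic distributions (theta) as features for a binary
# linear SVM, trained with plain hinge-loss subgradient steps.
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100, seed=0):
    """X: (n, K) topic features; y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, K = X.shape
    w, b = np.zeros(K), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (X[i] @ w + b) < 1:      # margin violated
                w -= lr * (lam * w - y[i] * X[i])
                b += lr * y[i]
            else:                               # only regularization
                w -= lr * lam * w
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)
```

Because topic distributions are low-dimensional and sum to one, they make compact, well-scaled feature vectors; an off-the-shelf SVM implementation would be used in practice.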
Besides, although AsLDA gains lower perplexity than LDA on the Yelp dataset, it does not perform better in terms of the classification metrics. This reveals that obtaining lower perplexity does not definitely ensure better discriminative power of the features for classification, which also highlights the robustness of PDA-LDA.

Case Study. To give an intuitive interpretation of PDA-LDA, we randomly sample two users and two items to reveal their topic factors in Figure 3, and show their representative topics (the topic index corresponding to the largest value of each user and item topic factor) learned from the AmazonFood data in Table 7. We find that the sampled users and items focus on only a few topics, as many components take values less than 0. Thus, they can be characterized by a few critical topics which have clear indications. For example, user-2 likes chips very much.

Figure 3: Sampled user and item topic factors (weight versus topic index for USER-1, USER-2, ITEM-1, and ITEM-2).

Table 7: Selected topics from the AmazonFood data set.
  T3: amazon, price, store, buy, local, order, product, find, bought, shipping
  T9: food, cat, cats, chicken, eat, tuna, canned, fish, cans, flavors
  T25: chips, flavor, salt, potato, sweet, bag, taste, chip, popchips, good
  T27: bars, bar, fat, calories, snack, fiber, grams, protein, healthy, sugar

4 Related Work

Topic models are popularly utilized for modeling text. Different topic models concentrate on different aspects of text. For example, [Blei and McAuliffe, 2007] exploited the labels corresponding to documents to construct a supervised topic model. The authors of [Mei et al., 2008] considered the link relations between documents and assumed that linked documents have similar topic distributions. Recently, many topic models have been proposed for modeling review text [Titov and McDonald, 2008; Zhao et al., 2010; Sauper et al., 2011; Moghaddam and Ester, 2011; Mukherjee et al., 2014]. Many of them are designed for sentiment analysis, including aspect mining and rating prediction.
In contrast, our model is not designed specially for review text but for a wider range of text, i.e., user-item connected documents, and we concentrate on accounting for the dual role in the topic generation process of such text, which is ignored in these models. The collaborative topic model (CTM) [Wang and Blei, 2011] considered user and item information simultaneously for predicting users' ratings for items. Nevertheless, its topic generation process is similar to that of LDA, which does not take the dual role into consideration. The author-topic model [Rosen-Zvi et al., 2004] is somewhat similar to our model, as it can obtain the topic

distribution of users. However, its topic generation is only influenced by users, and it still cannot model the dual role phenomenon. Many papers deal with dual roles in different research fields, such as the dual role of hashtags in social networks [Yang et al., 2012] and of users in question answering [Xu et al., 2012]. However, few previous works emphasize the dual role phenomenon in the topic generation process of topic models.

5 Conclusion

In this paper, we propose a new probabilistic topic model called PDA-LDA to account for the dual role phenomenon of user-item connected documents. PDA-LDA models the topic generation process through the joint effect of user, item, and background topic factors on the Dirichlet prior of the topic distribution. A Gibbs EM based learning algorithm is derived for the new model to learn the optimal topic factors. The experimental results on real data collections have shown that PDA-LDA achieves better held-out perplexity and binary classification accuracy than several other models.

Acknowledgments

This work was supported in part by the National Basic Research Program of China (973 Program) under Grant No. 2011CB302206, the National Natural Science Foundation of China, and the Tsinghua University Initiative Scientific Research Program.

References

[Blei and McAuliffe, 2007] David M. Blei and Jon D. McAuliffe. Supervised topic models. In Proc. of NIPS'07, Vancouver, British Columbia, Canada, December 3-6, 2007.
[Blei et al., 2003a] David M. Blei, Thomas L. Griffiths, Michael I. Jordan, and Joshua B. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In Proc. of NIPS'03, Vancouver and Whistler, British Columbia, Canada, December 8-13, 2003, pages 17-24.
[Blei et al., 2003b] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 2003.
[Chang et al., 2009] Jonathan Chang, Jordan L. Boyd-Graber, Sean Gerrish, Chong Wang, and David M. Blei. Reading tea leaves: How humans interpret topic models. In Proc.
of NIPS'09, Vancouver, British Columbia, Canada, December 7-10, 2009.
[Griffiths and Steyvers, 2004] Thomas L. Griffiths and Mark Steyvers. Finding scientific topics. PNAS, 101(suppl. 1), 2004.
[Heinrich, 2004] Gregor Heinrich. Parameter estimation for text analysis. Technical report, 2004.
[Hofmann, 1999] Thomas Hofmann. Probabilistic latent semantic indexing. In Proc. of SIGIR'99, Berkeley, CA, USA, August 15-19, 1999, pages 50-57.
[Liu et al., 2013] Bin Liu, Yanjie Fu, Zijun Yao, and Hui Xiong. Learning geographical preferences for point-of-interest recommendation. In Proc. of KDD'13, New York, NY, USA, 2013.
[McAuley and Leskovec, 2013] Julian J. McAuley and Jure Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proc. of RecSys'13, Hong Kong, China, October 12-16, 2013.
[Mei et al., 2008] Qiaozhu Mei, Deng Cai, Duo Zhang, and ChengXiang Zhai. Topic modeling with network regularization. In Proc. of WWW'08, Beijing, China, April 21-25, 2008.
[Moghaddam and Ester, 2011] Samaneh Moghaddam and Martin Ester. ILDA: interdependent LDA model for learning latent aspects and their ratings from online product reviews. In Proc. of SIGIR'11, Beijing, China, July 25-29, 2011.
[Mukherjee et al., 2014] Subhabrata Mukherjee, Gaurab Basu, and Sachindra Joshi. Joint author sentiment topic model. In Proc. of SDM'14, Philadelphia, Pennsylvania, USA, April 24-26, 2014.
[Rosen-Zvi et al., 2004] Michal Rosen-Zvi, Thomas L. Griffiths, Mark Steyvers, and Padhraic Smyth. The author-topic model for authors and documents. In Proc. of UAI'04, Banff, Canada, July 7-11, 2004.
[Sauper et al., 2011] Christina Sauper, Aria Haghighi, and Regina Barzilay. Content models with attitude. In Proc. of ACL'11, Portland, Oregon, USA, June 2011.
[Titov and McDonald, 2008] Ivan Titov and Ryan T. McDonald. Modeling online reviews with multi-grain topic models. In Proc.
of WWW'08, Beijing, China, April 21-25, 2008.
[Wallach et al., 2009] Hanna M. Wallach, David M. Mimno, and Andrew McCallum. Rethinking LDA: why priors matter. In Proc. of NIPS'09, Vancouver, British Columbia, Canada, December 7-10, 2009.
[Wallach, 2006] Hanna M. Wallach. Topic modeling: beyond bag-of-words. In Proc. of ICML'06, New York, NY, USA, 2006.
[Wang and Blei, 2011] Chong Wang and David M. Blei. Collaborative topic modeling for recommending scientific articles. In Proc. of KDD'11, San Diego, CA, USA, August 21-24, 2011.
[Xu et al., 2012] Fei Xu, Zongcheng Ji, and Bin Wang. Dual role model for question recommendation in community question answering. In Proc. of SIGIR'12, Portland, OR, USA, August 12-16, 2012.
[Yang et al., 2012] Lei Yang, Tao Sun, Ming Zhang, and Qiaozhu Mei. We know what @you #tag: does the dual role affect hashtag adoption? In Proc. of WWW'12, Lyon, France, April 16-20, 2012.
[Yin and Wang, 2014] Jianhua Yin and Jianyong Wang. A Dirichlet multinomial mixture model-based approach for short text clustering. In Proc. of KDD'14, New York, NY, USA, August 24-27, 2014.
[Zhao et al., 2010] Wayne Xin Zhao, Jing Jiang, Hongfei Yan, and Xiaoming Li. Jointly modeling aspects and opinions with a MaxEnt-LDA hybrid. In Proc. of EMNLP'10, MIT Stata Center, Massachusetts, USA, October 9-11, 2010, pages 56-65.


More information

Gaussian processes with monotonicity information

Gaussian processes with monotonicity information Gaussian processes with monotonicity information Anonymous Author Anonymous Author Unknown Institution Unknown Institution Abstract A metho for using monotonicity information in multivariate Gaussian process

More information

Generalizing Kronecker Graphs in order to Model Searchable Networks

Generalizing Kronecker Graphs in order to Model Searchable Networks Generalizing Kronecker Graphs in orer to Moel Searchable Networks Elizabeth Boine, Babak Hassibi, Aam Wierman California Institute of Technology Pasaena, CA 925 Email: {eaboine, hassibi, aamw}@caltecheu

More information

arxiv: v4 [cs.ds] 7 Mar 2014

arxiv: v4 [cs.ds] 7 Mar 2014 Analysis of Agglomerative Clustering Marcel R. Ackermann Johannes Blömer Daniel Kuntze Christian Sohler arxiv:101.697v [cs.ds] 7 Mar 01 Abstract The iameter k-clustering problem is the problem of partitioning

More information

A Simple Model for the Calculation of Plasma Impedance in Atmospheric Radio Frequency Discharges

A Simple Model for the Calculation of Plasma Impedance in Atmospheric Radio Frequency Discharges Plasma Science an Technology, Vol.16, No.1, Oct. 214 A Simple Moel for the Calculation of Plasma Impeance in Atmospheric Raio Frequency Discharges GE Lei ( ) an ZHANG Yuantao ( ) Shanong Provincial Key

More information

Math 1B, lecture 8: Integration by parts

Math 1B, lecture 8: Integration by parts Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores

More information