A generic approach to topic models and its application to virtual communities

1 A generic approach to topic models and its application to virtual communities Gregor Heinrich PhD presentation (English translation, 45min) Faculty of Mathematics and Computer Science University of Leipzig 28 November 2012 Version 2.9 Gregor Heinrich A generic approach to topic models 1 / 39

3 Overview Introduction Generic topic models Inference methods Application to virtual communities Conclusions and outlook Gregor Heinrich A generic approach to topic models 2 / 39

4 Motivation: Virtual communities [Figure: documents (shown as text thumbnails beginning "Retrieval performance. The most prominent retrieval measures are precision and recall...") connected by authorship, citation, annotation, recommendation, cooperation and similarity relations] Virtual communities = groups of persons who exchange information and knowledge electronically. Examples: organisations, digital libraries, Web 2.0 applications incl. social networks. Data are multimodal: text content; authorship, citations, annotations and recommendations; cooperation and other social relations. Typical case: discrete data with high dynamics and large volumes. Gregor Heinrich A generic approach to topic models 3 / 39
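
The figure's example document defines the two standard retrieval measures used again in the evaluation later in the talk. As a minimal illustration (not part of the original deck), both can be computed over item sets in Java:

    import java.util.Set;

    /** Minimal sketch: precision and recall over retrieved/relevant item sets. */
    final class RetrievalMetrics {
        /** precision = |retrieved and relevant| / |retrieved| */
        static double precision(Set<String> retrieved, Set<String> relevant) {
            if (retrieved.isEmpty()) return 0.0;
            long hits = retrieved.stream().filter(relevant::contains).count();
            return (double) hits / retrieved.size();
        }

        /** recall = |retrieved and relevant| / |relevant| */
        static double recall(Set<String> retrieved, Set<String> relevant) {
            if (relevant.isEmpty()) return 0.0;
            long hits = retrieved.stream().filter(relevant::contains).count();
            return (double) hits / relevant.size();
        }
    }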

5 Motivation: Unsupervised mining of discrete data. Identification of relationships in large data volumes; only data (and possibly a model) required (information retrieval, network analysis, clustering, NLP methods). Density problem: features too sparse for analysis in a high-dimensional feature space. Vocabulary problem: semantic similarity ≠ lexical similarity (polysemy, synonymy, etc.). [Figure: WordNet-style sense graph for ambiguous terms such as "bank" and "bar" (teller, atm, restaurant, pressure unit, stick, counter, court, ...), illustrating polysemy and synonymy] Gregor Heinrich A generic approach to topic models 4 / 39

7 Topic models as approach (slides 7-13 build up one figure) [Figure: words grouped into latent topics, e.g. {rhythm, drum, bar} and {restaurant, bar, wine}, which mix into documents such as "Playing Drums for Beginners", "Rhythm & Spice Jamaican Grill" and "Leipzig's Bars and Restaurants"] Probabilistic representations of grouped discrete data; illustrative for text: words grouped in documents. Latent topics = probability distributions over the vocabulary; the dominant terms of a topic are semantically similar. Language = mixture of topics (latent semantic structure). Reduces the vocabulary problem: finds semantic relations. Reduces the density problem: dimensionality reduction. Gregor Heinrich A generic approach to topic models 5 / 39

14 Language models: Unigram model [Figure: a single topic variable z generates all words w_{1,1}, w_{1,2}, ..., w_{m,n} across documents, each from p(w | z)] One distribution for all data. Gregor Heinrich A generic approach to topic models 6 / 39

15 Language models: Unigram mixture model [Figure: one topic variable z_m per document m; all words of document m are drawn from p(w | z_m)] One distribution per document. Gregor Heinrich A generic approach to topic models 6 / 39

16 Language models: Unigram admixture model [Figure: one topic variable z_{m,n} per word; each word w_{m,n} is drawn from p(w | z_{m,n})] One distribution per word → the basic topic model. Gregor Heinrich A generic approach to topic models 6 / 39
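
To make the contrast concrete, the three corpus likelihoods can be written side by side (a reconstruction from the slide captions, not taken from the deck; the per-document mixing proportions p(z | d_m) correspond to the ϑ_m introduced with LDA below):

    % Unigram model: one term distribution for all data
    p(W) = \prod_{m=1}^{M} \prod_{n=1}^{N_m} p(w_{m,n})

    % Unigram mixture model: one topic z_m per document
    p(W) = \prod_{m=1}^{M} \sum_{z_m} p(z_m) \prod_{n=1}^{N_m} p(w_{m,n} \mid z_m)

    % Unigram admixture model (basic topic model): one topic per word
    p(W) = \prod_{m=1}^{M} \prod_{n=1}^{N_m} \sum_{z_{m,n}} p(z_{m,n} \mid d_m) \, p(w_{m,n} \mid z_{m,n})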

18 Bayesian topic models: The Dirichlet distribution, p(w⃗ | z) with ϑ⃗ ∼ Dir(α⃗) [Figure: Dirichlet densities over the probability simplex with corners ϑ_1 = 1, ϑ_2 = 1, ϑ_3 = 1, e.g. for α⃗ = (4, 4, 2)] Bayesian methodology: distributions are generated from prior distributions. For language and other discrete data the Dirichlet distribution is an important prior: defined on the simplex, the surface containing all discrete distributions; the parameter α⃗ controls its behaviour. Bayesian topic model: Latent Dirichlet Allocation (LDA) (Blei et al. 2003). Gregor Heinrich A generic approach to topic models 7 / 39
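
In code, a Dirichlet draw is typically generated from Gamma variates and normalised onto the simplex. A minimal sketch (not from the deck; uses the Marsaglia-Tsang Gamma sampler) in Java:

    import java.util.Random;

    /** Minimal sketch: draw theta ~ Dir(alpha) by normalising Gamma(alpha_k, 1) variates. */
    final class DirichletSampler {
        private final Random rng = new Random();

        double[] sample(double[] alpha) {
            double[] theta = new double[alpha.length];
            double sum = 0.0;
            for (int k = 0; k < alpha.length; k++) {
                theta[k] = gamma(alpha[k]);                          // Gamma(alpha_k, 1)
                sum += theta[k];
            }
            for (int k = 0; k < alpha.length; k++) theta[k] /= sum;  // project onto simplex
            return theta;
        }

        /** Marsaglia-Tsang Gamma(shape, 1) sampler; shapes < 1 are boosted recursively. */
        private double gamma(double shape) {
            if (shape < 1.0) return gamma(shape + 1.0) * Math.pow(rng.nextDouble(), 1.0 / shape);
            double d = shape - 1.0 / 3.0, c = 1.0 / Math.sqrt(9.0 * d);
            while (true) {
                double x = rng.nextGaussian(), v = 1.0 + c * x;
                if (v <= 0.0) continue;
                v = v * v * v;
                double u = rng.nextDouble();
                if (u < 1.0 - 0.0331 * x * x * x * x) return d * v;
                if (Math.log(u) < 0.5 * x * x + d * (1.0 - v + Math.log(v))) return d * v;
            }
        }
    }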

19 Latent Dirichlet Allocation [Plate diagram: α → ϑ_m ∼ Dir(α); ϑ_m → z_{m,n} → w_{m,n}; β → ϕ_k ∼ Dir(β) → w_{m,n}; plates over topic k, word n, document m] Latent Dirichlet Allocation (Blei et al. 2003) Gregor Heinrich A generic approach to topic models 8 / 39

20 Latent Dirichlet Allocation: generative example (slides 20-24) Example document: "Concert tonight at Rhythm and Spice Restaurant..." The build-up walks through the generative process: (1) generate the word distributions ϕ_k ∼ Dir(β) for all topics, e.g. topic 1: restaurant, bar, food, grill; topic 2: concert, music, rhythm, bar; (2) generate the topic distribution ϑ_1 ∼ Dir(α) for document 1; (3) sample the topic index for the first word, z = 2; (4) sample a word from the term distribution of topic 2: "concert". Gregor Heinrich A generic approach to topic models 8 / 39
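
The four steps translate directly into code. A minimal sketch (toy sizes, not the thesis code; reuses the DirichletSampler class from the sketch above) in Java:

    import java.util.Arrays;
    import java.util.Random;

    /** Minimal sketch of LDA's generative process for a single toy document. */
    final class LdaGenerator {
        public static void main(String[] args) {
            int K = 2, V = 8, N = 6;                  // topics, vocabulary size, words
            double[] alpha = {1.0, 1.0};              // symmetric document prior
            double[] beta = new double[V];
            Arrays.fill(beta, 0.1);                   // symmetric topic prior
            DirichletSampler dir = new DirichletSampler();
            Random rng = new Random();

            double[][] phi = new double[K][];         // (1) term distribution per topic
            for (int k = 0; k < K; k++) phi[k] = dir.sample(beta);
            double[] theta = dir.sample(alpha);       // (2) topic distribution for the document
            for (int n = 0; n < N; n++) {
                int z = draw(theta, rng);             // (3) topic index for word n
                int w = draw(phi[z], rng);            // (4) word from topic z's term distribution
                System.out.println("word " + n + ": topic " + z + ", term " + w);
            }
        }

        /** Draw an index from a discrete distribution by inverting the CDF. */
        static int draw(double[] p, Random rng) {
            double u = rng.nextDouble(), cum = 0.0;
            for (int i = 0; i < p.length; i++) { cum += p[i]; if (u < cum) return i; }
            return p.length - 1;
        }
    }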

25 State of the art Large number of published models that extend LDA: authors (Rosen-Zvi et al. 2004), citations (Dietz et al. 2007), hierarchy (Li and McCallum 2006; Li et al. 2007), image features and captions (Barnard et al. 2003), etc. Search results for "topic model" (title + abstract) since 2012 alone: ACM >400, Google Scholar >1300 → an expanding research area with practical relevance. But: no existing analysis as a generic model class; derivations are partly tedious, especially for inference algorithms. Conjecture: important properties are generic across models → simplifications for the derivation of concrete model properties, inference algorithms and design methods. Gregor Heinrich A generic approach to topic models 9 / 39

29 Research questions How can topic models be described in a generic way in order to use their properties across particular applications? Can generic topic models be implemented generically and, if so, can repeated structures be exploited for optimisations? How can generic models be applied to data in virtual communities? Gregor Heinrich A generic approach to topic models 10 / 39

30 Overview Introduction Generic topic models Inference methods Application to virtual communities Conclusions and outlook Gregor Heinrich A generic approach to topic models 11 / 39

31 How can topic models be described in a generic way in order to use their properties across particular applications? Gregor Heinrich A generic approach to topic models 12 / 39

32 Topic models: Example structures [Plate diagrams: (a) Latent Dirichlet allocation (LDA); (b) Author topic model (ATM); (c) Pachinko allocation model (PAM4); (d) Hierarchical PAM (hPAM)] (Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007) Gregor Heinrich A generic approach to topic models 13 / 39

33 Generic topic models: NoMMs (slides 33-37 build up the figure) [Figure: mixture levels θ_k ∼ Dir(α⃗) chained by discrete variables x; incoming values select a component via an index mapping k = f(x_in), and samples propagate along outgoing edges, including branching (x_out1, x_out2) and merged indices k = f(x_in1, x_in2)] Generic characteristics of topic models: levels with discrete components θ_k, generated from Dirichlet distributions; coupling via the values of discrete variables x. Network of mixed membership (NoMM): a compact representation for topic models; a directed acyclic graph. Node: sample from a mixture component, selected via incoming edges; terminal node: observation. Edge: propagation of discrete values to child nodes. Gregor Heinrich A generic approach to topic models 14 / 39

38 Generic topic models: NoMMs, LDA example (slides 38-43) [Figure: LDA as a NoMM chain m → ϑ_m | α → z_{m,n} → ϕ_k | β → w_{m,n}, with ϑ_m ∼ Dir(α⃗) and ϕ_k ∼ Dir(β⃗); example walk for document m = 1: topic z_{1,1} = 3 is drawn from ϑ_1, then word w_{1,1} = 2 is drawn from ϕ_3] Gregor Heinrich A generic approach to topic models 14 / 39

44 Topic models as NoMMs [NoMM diagrams: (a) Latent Dirichlet allocation, LDA: m [M] → ϑ_m | α → z_{m,n} = k [K] → ϕ_k | β → w_{m,n} = t [V]; (b) Author topic model, ATM: a⃗_m → x_{m,n} = x [A] → ϑ_x | α → z_{m,n} = k [K] → ϕ_k | β → w_{m,n} = t [V]; (c) Pachinko allocation model, PAM: m [M] → ϑ_m^r | α_r → z²_{m,n} = x → ϑ_{m,x} | α_x → z³_{m,n} = y → ϕ_y | β → w_{m,n} = t [V]; (d) Hierarchical pachinko allocation model, hPAM: m [M] → ϑ_{m,0} | α_0 → z^T_{m,n} = T → ϑ_{m,T} | α_T → z^t_{m,n} = t, with a level selector l_{m,n} ∼ ζ_{T,t} | γ [3] choosing among the |T| + |t| + 1 term distributions ϕ_{l,T,t} | β [V]] (Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007) Gregor Heinrich A generic approach to topic models 15 / 39

45 Overview Introduction Generic topic models Inference methods Application to virtual communities Conclusions and outlook Gregor Heinrich A generic approach to topic models 16 / 39

46 Can generic topic models be implemented generically...? Gregor Heinrich A generic approach to topic models 17 / 39

47 Bayesian inference problem and Gibbs sampler (slides 47-51) Bayesian inference as "inversion of the generative process": find the distributions over parameters Θ and latent variables/topics H, given observations V and Dirichlet parameters A ⇒ determine the posterior distribution p(H, Θ | V, A). Intractability → approximate approaches. Gibbs sampling: a variant of Markov chain Monte Carlo (MCMC). In topic models: marginalise the parameters Θ ("collapsed" Gibbs sampler) and sample the topics H_i for each data point i in turn: H_i ∼ p(H_i | H_{¬i}, V, A). [Figure: NoMM chain m → ϑ_m^r | α_r → ϑ_{m,x} | α → ϕ_y | β → w with variable groups Θ1, H1, Θ2, H2, Θ3, V; the build-up replaces the joint posterior p(Θ1, H1, Θ2, H2, Θ3 | V, A) by sampling only the hidden variables: H_i^1, H_i^2 ∼ p(H_i^1, H_i^2 | H_{¬i}, V, A)] Gregor Heinrich A generic approach to topic models 18 / 39
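
For plain LDA the collapsed update takes a particularly simple form. A minimal sketch (the standard LDA sampler, not the generic meta-sampler of the thesis) in Java:

    import java.util.Random;

    /** Minimal sketch: collapsed Gibbs sampling for LDA.
     *  Counts: nmk[m][k] topics per document, nkw[k][v] terms per topic, nk[k] totals. */
    final class CollapsedGibbsLda {
        final int K, V;
        final double alpha, beta;
        final int[][] docs;                           // docs[m][n] = term index
        final int[][] z;                              // current topic assignments
        final int[][] nmk, nkw;
        final int[] nk;
        final Random rng = new Random();

        CollapsedGibbsLda(int[][] docs, int K, int V, double alpha, double beta) {
            this.docs = docs; this.K = K; this.V = V; this.alpha = alpha; this.beta = beta;
            nmk = new int[docs.length][K]; nkw = new int[K][V]; nk = new int[K];
            z = new int[docs.length][];
            for (int m = 0; m < docs.length; m++) {   // random initialisation
                z[m] = new int[docs[m].length];
                for (int n = 0; n < docs[m].length; n++) {
                    int k = rng.nextInt(K); z[m][n] = k;
                    nmk[m][k]++; nkw[k][docs[m][n]]++; nk[k]++;
                }
            }
        }

        /** One full sweep: resample z_i ~ p(z_i | z_{-i}, w) for every token i. */
        void sweep() {
            double[] p = new double[K];
            for (int m = 0; m < docs.length; m++)
                for (int n = 0; n < docs[m].length; n++) {
                    int w = docs[m][n], old = z[m][n];
                    nmk[m][old]--; nkw[old][w]--; nk[old]--;   // remove current assignment
                    double sum = 0.0;
                    for (int k = 0; k < K; k++) {              // one q-factor per level
                        p[k] = (nmk[m][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta);
                        sum += p[k];
                    }
                    double u = rng.nextDouble() * sum;         // inverse-CDF draw
                    int k = 0;
                    for (double cum = p[0]; u >= cum && k < K - 1; cum += p[++k]) { }
                    z[m][n] = k;
                    nmk[m][k]++; nkw[k][w]++; nk[k]++;         // add new assignment
                }
        }
    }

This is exactly the product form of the next slide: the document level contributes (n_{m,k} + α) and the topic-term level contributes (n_{k,w} + β) / (n_k + Vβ).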

52 Sampling distribution for NoMMs [Figure: NoMM chain m → ϑ_m^r | α_r → ϑ_{m,x} | α → ϕ_y | β → w; each level contributes one factor: q(m, x), q((m, x), y), q(y, w)] The Gibbs sampler can be derived generically (Heinrich 2009). Typical case: a product of count quotients, one factor per level l:

    p(H_i | H_{¬i}, V, A) ∝ ∏_l [ (n^{¬i}_{k,t} + α_t) / ∑_t (n^{¬i}_{k,t} + α_t) ]_[l] = ∏_l [ q(k, t) ]_[l]

where n_{k,t} is the count of co-occurrences between the input and output values of a level (components and samples), and ¬i excludes the current data point. More complex variants are covered by

    q(k, t) ≜ Beta({n_{k,t}}^T_{t=1} + α⃗) / Beta({n^{¬i}_{k,t}}^T_{t=1} + α⃗)

Gregor Heinrich A generic approach to topic models 19 / 39
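
The per-level factor is just a smoothed count ratio. A minimal sketch (assumed count layout, not the thesis code) in Java; the full conditional for a data point is then the product of this factor over the model's levels, exactly the three factors q(m, x), q((m, x), y) and q(y, w) in the figure:

    /** Minimal sketch: per-level Gibbs factor
     *  q(k, t) = (n[k][t] + alpha[t]) / sum_s (n[k][s] + alpha[s]),
     *  where n[k][t] counts co-occurrences of a level's input value k (component)
     *  and output value t (sample), with the current data point excluded. */
    final class LevelFactor {
        static double q(int[][] n, double[] alpha, int k, int t) {
            double den = 0.0;
            for (int s = 0; s < alpha.length; s++) den += n[k][s] + alpha[s];
            return (n[k][t] + alpha[t]) / den;
        }
    }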

55 Typology of NoMM substructures [Figure: library of substructures with their q-functions. N1. Dirichlet multinomial: q(a, z) q(z, b); N2. Observed parameters: ϑ⃗_a given, q(z, b); E2. Autonomous edges: q(a, x ∨ y) q(x, b) q(y, c); E3. Coupled edges: q(a, z) q(z, b) q(z, c); C2. Combined indices: q(a, x) q(b, y) q(k, c) with k = f(x, y); C3. Interleaved indices: q(a, z_1) q(b, z_2) q(z_1, c | c⃗) q(z_2, c | c⃗)] NoMM substructures: nodes, edges/branches, component indices/merging of edges; representation via q-functions and the likelihood. Multiple samples per data point: q(a, x ∨ y) for the respective level. The library includes additional structures: alternative distributions, regression, aggregation etc. → q-functions plus other factors. Gregor Heinrich A generic approach to topic models 20 / 39

56 Implementation: Gibbs meta-sampler [Figure: workflow; a topic model specification plus code templates feed the NoMM code generator, which produces a Java model instance (validated and run as a prototype on the Java VM) and a generated, optimised C/Java code module (deployed on the native platform)] Code generator for topic models in Java and C. Separation of knowledge domains: topic model applications vs. machine learning vs. computing architecture. Gregor Heinrich A generic approach to topic models 21 / 39

58 Example NoMM script and generated kernel: hPAM2 [Figure: NoMM diagram m [M] → ϑ_m | α → x_{mn} = x [X] → ϑ_{m,x} | α_x → y_{mn} = y [Y] → ϕ_k | β → w_{mn} [V], with component index k = 0 for x = 0 (root); k = 1 + x for x ≠ 0, y = 0; k = 1 + X + y for x, y ≠ 0]

NoMM script:

    model = HPAM2
    description: Hierarchical PAM model 2 (HPAM2)
    sequences: # variables sampled for each (m,n)
        w, x, y : m, n
    network: # each line one NoMM node
        m >> theta alpha >> x
        m,x >> thetax alphax[x] >> y
        x,y >> phi[k] >> w
        # java code to assign k
        k : { if (x==0) { k = 0; } else if (y==0) k = 1 + x; else k = 1 + X + y; }

Generated sampling kernel (Java):

    for (hx = 0; hx < X; hx++) {            // hidden edge x
        for (hy = 0; hy < Y; hy++) {        // hidden edge y
            mxsel = X * m + hx;
            mxjsel = hx;
            if (hx == 0) ksel = 0;          // component index k = f(x, y)
            else if (hy == 0) ksel = 1 + hx;
            else ksel = 1 + X + hy;
            pp[hx][hy] = (nmx[m][hx] + alpha[hx])
                * (nmxy[mxsel][hy] + alphax[mxjsel][hy]) / (nmxysum[mxsel] + alphaxsum[mxjsel])
                * (nkw[ksel][w[m][n]] + beta) / (nkwsum[ksel] + betasum);
            psum += pp[hx][hy];
        }
    }

Gregor Heinrich A generic approach to topic models 22 / 39

60 Document-topic distribution in the Gibbs sampler (slides 60-75) [Animation: the document-topic matrix ϑ (200 documents, 50 topics) shown at iterations 1, 5, 10, 15, 20, 30, 40, 50, 60, 80, 100, 120, 150, 200, 300 and 500; at iteration 500 the sampler has converged to a stationary state] Gregor Heinrich A generic approach to topic models 23 / 39

76 Fast sampling: Hybrid scaling methods [Figure: perplexity over up to 5000 iterations for PAM4 with dependent vs. independent samplers; table of model dimensions and serial/parallel/independent speedups for LDA and PAM variants] Serial and parallel scaling methods: results for LDA generalised to generic NoMMs, specifically (Porteous et al. 2008; Newman et al. 2009), plus a novel approach. Problem: the sampling space for statistically dependent variables is K · L · .... Independence assumption: separate samplers with dimensions K + L + ... ≪ K · L · .... Empirical result: more iterations, but comparable topic quality. Hybrid approaches with independent samplers are highly effective. Implementation: the complexity is covered by the meta-sampler. Gregor Heinrich A generic approach to topic models 24 / 39
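
The evaluation measure in the plot, perplexity, follows directly from the estimated distributions. A minimal sketch (assumed array shapes theta[m][k] and phi[k][v]; not the thesis code) in Java:

    /** Minimal sketch: corpus perplexity exp(-log-likelihood / word count). */
    final class Perplexity {
        static double of(int[][] docs, double[][] theta, double[][] phi) {
            double logLik = 0.0;
            long nWords = 0;
            for (int m = 0; m < docs.length; m++)
                for (int w : docs[m]) {
                    double pw = 0.0;               // p(w) = sum_k theta[m][k] * phi[k][w]
                    for (int k = 0; k < phi.length; k++) pw += theta[m][k] * phi[k][w];
                    logLik += Math.log(pw);
                    nWords++;
                }
            return Math.exp(-logLik / nWords);     // lower = better fit
        }
    }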

77 Overview Introduction Generic topic models Inference methods Application to virtual communities Conclusions and outlook Gregor Heinrich A generic approach to topic models 25 / 39

78 How can generic models be applied to data in virtual communities? Gregor Heinrich A generic approach to topic models 26 / 39

79 NoMM design process (slides 79-80) [Figure: the typology/library of NoMM substructures from slide 55, plus a process chain, roughly: 1 define modelling task and terminals, 2 formulate metrics, 3 create model assumptions, 4 define evidence, 5 compose model and predict properties, 6 generate Gibbs sampler, 7 evaluate based on target metric, 8 write and adapt test corpus, 9 implement NoMM script, 10 optimise and integrate for the target platform] Idea: construct models from simple substructures that connect terminal nodes: terminal nodes ↔ multimodal data (virtual communities, ...); substructures ↔ relationships in the data, latent semantics. Process: assumptions on dependencies in the data; iterative association to structures in the model (using the typology); the Gibbs distribution is known → model behaviour: q(x, y) = "rich get richer". Implementation and test with the Gibbs meta-sampler; possibly iteration. Gregor Heinrich A generic approach to topic models 27 / 39

81 Application: Expert finding with tag annotations (slides 81-84) [Figure: a document m with text w⃗_m, authors a⃗_m (e.g. AB, TH) connected by authorship relations, and tags c⃗_m connected by annotation relations] Scenario: expert finding via documents with tag annotations; authors of relevant documents ↔ experts. Documents frequently carry additional annotations, here: tags. Goal: enable tag queries and improve the quality of text queries. Problem: tags are often incomplete and partly wrong → connect tags and experts via topics. (1) Data: for each document m: text w⃗_m, authors a⃗_m, tags c⃗_m. (2) Goal: tag query c⃗: p(c⃗ | a) = max; word query w⃗: p(w⃗ | a) = max. (3) Terminal nodes: authors in the input, words and tags in the output. Gregor Heinrich A generic approach to topic models 28 / 39
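
Goal (2) suggests ranking authors by the probability of the query under their topic profile. A minimal sketch (an illustrative assumption, not the thesis' retrieval method: score(a) = sum over query tags of log sum_k p(k | a) p(c | k), with assumed arrays thetaA[a][k] and psi[k][c]) in Java:

    /** Minimal sketch: rank authors for a tag query via their topic profiles.
     *  thetaA[a][k] = p(topic k | author a), psi[k][c] = p(tag c | topic k);
     *  both are assumed to be smoothed (non-zero) estimates. */
    final class ExpertRanking {
        static double[] score(int[] tagQuery, double[][] thetaA, double[][] psi) {
            double[] s = new double[thetaA.length];
            for (int a = 0; a < thetaA.length; a++) {
                double logp = 0.0;
                for (int c : tagQuery) {
                    double pc = 0.0;
                    for (int k = 0; k < psi.length; k++) pc += thetaA[a][k] * psi[k][c];
                    logp += Math.log(pc);
                }
                s[a] = logp;                       // higher = better expert for the query
            }
            return s;
        }
    }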

85 Model assumptions [Figure as on the previous slide] (4) Model assumptions: (a) the expertise of an author is weighted with their share of the authorship; (b) the semantics of expertise is expressed by topics z, and each author has a single field of expertise (topic distribution); (c) the semantics of tags is expressed by topics y. Gregor Heinrich A generic approach to topic models 29 / 39

88 Model construction (slides 88-94) (5) Model construction: (a) start with the terminal nodes (from step 3): authors a⃗_m, words w_{m,n}, tags c⃗_m; target distribution p(... | a⃗, w⃗, c⃗). (b) Authorship a⃗_m is given as an observed distribution; a node samples the author x_{m,n} of each word: factor a_{m,x}. (c) Each author has only a single field of expertise (topic distribution): q(x, z) associates (word-)topics with the sampled authors x (cf. ATM). (d) Topic distributions over terms: connect z and w via q(z, w). (e) Introduce tag topics y_{m,j} for the tags c_{m,j} as distributions over tags: q(y, c); q(x, z ∨ y) overlays the values of z and y. Resulting sampling distribution:

    p(x, z, y | ...) ∝ a_{m,x} q(x, z ∨ y) q(z, w) q(y, c)

Gregor Heinrich A generic approach to topic models 30 / 39

95 Model construction: Expert tag topic model (ETT) (Heinrich 2011), contrasted with the ordinary approach [Plate diagram: authors a⃗_m select x_{m,j} and x_{m,n}; author expertise ϑ_x | α for x ∈ [1, A]; word topics z_{m,n} emit words w_{m,n} via ϕ_k | β; tag topics y_{m,j} emit tags c_{m,j} via ψ_k | γ, k ∈ [1, K]; plates over j ∈ [1, J_m], n ∈ [1, N_m], m ∈ [1, M]] Gregor Heinrich A generic approach to topic models 30 / 39

96 Expert tag topic model: Evaluation [Figure: average precision for word queries (ATM vs. ETT) and tag queries (ETT); topic coherence scores for ATM vs. ETT] NIPS corpus: 2.3 million words, 2037 authors, 165 tags. Retrieval (average precision): term queries: ETT > ATM; tag queries: similarly good AP values. Topic coherence (Mimno et al. 2011): ETT > ATM. Semi-supervised learning: tag queries also retrieve items without tags. Gregor Heinrich A generic approach to topic models 31 / 39
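
The coherence score of Mimno et al. (2011) used here rates a topic's top terms by their co-document frequencies. A minimal sketch (assumed inputs, not the thesis code) in Java:

    /** Minimal sketch: topic coherence of Mimno et al. (2011) for top terms v_1..v_M:
     *  C = sum_{m=2..M} sum_{l<m} log((coDocFreq(v_m, v_l) + 1) / docFreq(v_l)).
     *  df[v] = documents containing term v; codf[v][v'] = documents containing both. */
    final class TopicCoherence {
        static double score(int[] topTerms, int[] df, int[][] codf) {
            double c = 0.0;
            for (int m = 1; m < topTerms.length; m++)
                for (int l = 0; l < m; l++)
                    c += Math.log((codf[topTerms[m]][topTerms[l]] + 1.0) / df[topTerms[l]]);
            return c;   // closer to 0 (less negative) = more coherent topic
        }
    }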

97 Overview Introduction Generic topic models Inference methods Application to virtual communities Conclusions and outlook Gregor Heinrich A generic approach to topic models 32 / 39

98 Conclusions: Research contributions (slides 98-102 add the items marked "+") Networks of Mixed Membership: generic model and domain-specific compact representation of topic models. Inference algorithms: generic Gibbs sampler; fast sampling methods (serial, parallel, independent); implementation in the Gibbs meta-sampler; + variational inference for NoMMs (Heinrich and Goesele 2009). Design process based on the typology of NoMM substructures; + AMQ model: meta-model for virtual communities as a formal basis for scenario modelling (Heinrich 2010). Application to virtual communities: Expert tag topic model for expert finding with annotated documents; + models ETT2 and ETT3 incl. a novel NoMM structure and retrieval approaches (Heinrich 2011b); + graph-based expert search using ETT models: integration of explorative search/browsing with distributions from topic models. Contribution to facilitated model-based construction of topic models, specifically for virtual communities and other multimodal scenarios. Gregor Heinrich A generic approach to topic models 33 / 39

103 Conclusions: Research contributions (continued) [Figure: ETT1: Expert search in the community browser; a graph of NIPS papers on blind source separation and independent component analysis, linked by "authors" and "cites" edges to researchers such as Cichocki_A, Hyvarinen_A, Oja_E, Bell_A, Lee_T and Parra_L] Gregor Heinrich A generic approach to topic models 33 / 39


More information

Distributed ML for DOSNs: giving power back to users

Distributed ML for DOSNs: giving power back to users Distributed ML for DOSNs: giving power back to users Amira Soliman KTH isocial Marie Curie Initial Training Networks Part1 Agenda DOSNs and Machine Learning DIVa: Decentralized Identity Validation for

More information

Hybrid Models for Text and Graphs. 10/23/2012 Analysis of Social Media

Hybrid Models for Text and Graphs. 10/23/2012 Analysis of Social Media Hybrid Models for Text and Graphs 10/23/2012 Analysis of Social Media Newswire Text Formal Primary purpose: Inform typical reader about recent events Broad audience: Explicitly establish shared context

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Gaussian Mixture Model

Gaussian Mixture Model Case Study : Document Retrieval MAP EM, Latent Dirichlet Allocation, Gibbs Sampling Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 5 th,

More information

Scaling Neighbourhood Methods

Scaling Neighbourhood Methods Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)

More information

Bayesian Nonparametrics

Bayesian Nonparametrics Bayesian Nonparametrics Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent

More information

Lecture 6: Graphical Models: Learning

Lecture 6: Graphical Models: Learning Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)

More information

Introduction to Bayesian inference

Introduction to Bayesian inference Introduction to Bayesian inference Thomas Alexander Brouwer University of Cambridge tab43@cam.ac.uk 17 November 2015 Probabilistic models Describe how data was generated using probability distributions

More information

Probabilistic Graphical Networks: Definitions and Basic Results

Probabilistic Graphical Networks: Definitions and Basic Results This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical

More information

Latent Dirichlet Allocation

Latent Dirichlet Allocation Outlines Advanced Artificial Intelligence October 1, 2009 Outlines Part I: Theoretical Background Part II: Application and Results 1 Motive Previous Research Exchangeability 2 Notation and Terminology

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

Present and Future of Text Modeling

Present and Future of Text Modeling Present and Future of Text Modeling Daichi Mochihashi NTT Communication Science Laboratories daichi@cslab.kecl.ntt.co.jp T-FaNT2, University of Tokyo 2008-2-13 (Wed) Present and Future of Text Modeling

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Collaborative Topic Modeling for Recommending Scientific Articles

Collaborative Topic Modeling for Recommending Scientific Articles Collaborative Topic Modeling for Recommending Scientific Articles Chong Wang and David M. Blei Best student paper award at KDD 2011 Computer Science Department, Princeton University Presented by Tian Cao

More information

Machine Learning Summer School

Machine Learning Summer School Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,

More information

Latent variable models for discrete data

Latent variable models for discrete data Latent variable models for discrete data Jianfei Chen Department of Computer Science and Technology Tsinghua University, Beijing 100084 chris.jianfei.chen@gmail.com Janurary 13, 2014 Murphy, Kevin P. Machine

More information

Bayesian Nonparametrics for Speech and Signal Processing

Bayesian Nonparametrics for Speech and Signal Processing Bayesian Nonparametrics for Speech and Signal Processing Michael I. Jordan University of California, Berkeley June 28, 2011 Acknowledgments: Emily Fox, Erik Sudderth, Yee Whye Teh, and Romain Thibaux Computer

More information

Probabilistic Topic Models Tutorial: COMAD 2011

Probabilistic Topic Models Tutorial: COMAD 2011 Probabilistic Topic Models Tutorial: COMAD 2011 Indrajit Bhattacharya Assistant Professor Dept of Computer Sc. & Automation Indian Institute Of Science, Bangalore My Background Interests Topic Models Probabilistic

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu October 19, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network

More information

Replicated Softmax: an Undirected Topic Model. Stephen Turner

Replicated Softmax: an Undirected Topic Model. Stephen Turner Replicated Softmax: an Undirected Topic Model Stephen Turner 1. Introduction 2. Replicated Softmax: A Generative Model of Word Counts 3. Evaluating Replicated Softmax as a Generative Model 4. Experimental

More information

28 : Approximate Inference - Distributed MCMC

28 : Approximate Inference - Distributed MCMC 10-708: Probabilistic Graphical Models, Spring 2015 28 : Approximate Inference - Distributed MCMC Lecturer: Avinava Dubey Scribes: Hakim Sidahmed, Aman Gupta 1 Introduction For many interesting problems,

More information

Chapter 8 PROBABILISTIC MODELS FOR TEXT MINING. Yizhou Sun Department of Computer Science University of Illinois at Urbana-Champaign

Chapter 8 PROBABILISTIC MODELS FOR TEXT MINING. Yizhou Sun Department of Computer Science University of Illinois at Urbana-Champaign Chapter 8 PROBABILISTIC MODELS FOR TEXT MINING Yizhou Sun Department of Computer Science University of Illinois at Urbana-Champaign sun22@illinois.edu Hongbo Deng Department of Computer Science University

More information

PROBABILISTIC LATENT SEMANTIC ANALYSIS

PROBABILISTIC LATENT SEMANTIC ANALYSIS PROBABILISTIC LATENT SEMANTIC ANALYSIS Lingjia Deng Revised from slides of Shuguang Wang Outline Review of previous notes PCA/SVD HITS Latent Semantic Analysis Probabilistic Latent Semantic Analysis Applications

More information

Distributed Gibbs Sampling of Latent Topic Models: The Gritty Details THIS IS AN EARLY DRAFT. YOUR FEEDBACKS ARE HIGHLY APPRECIATED.

Distributed Gibbs Sampling of Latent Topic Models: The Gritty Details THIS IS AN EARLY DRAFT. YOUR FEEDBACKS ARE HIGHLY APPRECIATED. Distributed Gibbs Sampling of Latent Topic Models: The Gritty Details THIS IS AN EARLY DRAFT. YOUR FEEDBACKS ARE HIGHLY APPRECIATED. Yi Wang yi.wang.2005@gmail.com August 2008 Contents Preface 2 2 Latent

More information

Latent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization

Latent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization Latent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization Rachit Arora Computer Science and Engineering Indian Institute of Technology Madras Chennai - 600 036, India.

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Parameter estimation for text analysis

Parameter estimation for text analysis Parameter estimation for text analysis Gregor Heinrich gregor@arbylon.net Abstract. This primer presents parameter estimation methods common in Bayesian statistics and apply them to discrete probability

More information

Evaluation Methods for Topic Models

Evaluation Methods for Topic Models University of Massachusetts Amherst wallach@cs.umass.edu April 13, 2009 Joint work with Iain Murray, Ruslan Salakhutdinov and David Mimno Statistical Topic Models Useful for analyzing large, unstructured

More information

More on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013

More on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013 More on HMMs and other sequence models Intro to NLP - ETHZ - 18/03/2013 Summary Parts of speech tagging HMMs: Unsupervised parameter estimation Forward Backward algorithm Bayesian variants Discriminative

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine

Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Nitish Srivastava nitish@cs.toronto.edu Ruslan Salahutdinov rsalahu@cs.toronto.edu Geoffrey Hinton hinton@cs.toronto.edu

More information

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic

More information

Topic Models. Charles Elkan November 20, 2008

Topic Models. Charles Elkan November 20, 2008 Topic Models Charles Elan elan@cs.ucsd.edu November 20, 2008 Suppose that we have a collection of documents, and we want to find an organization for these, i.e. we want to do unsupervised learning. One

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Measuring Topic Quality in Latent Dirichlet Allocation

Measuring Topic Quality in Latent Dirichlet Allocation Measuring Topic Quality in Sergei Koltsov Olessia Koltsova Steklov Institute of Mathematics at St. Petersburg Laboratory for Internet Studies, National Research University Higher School of Economics, St.

More information

Computer Vision Group Prof. Daniel Cremers. 14. Clustering

Computer Vision Group Prof. Daniel Cremers. 14. Clustering Group Prof. Daniel Cremers 14. Clustering Motivation Supervised learning is good for interaction with humans, but labels from a supervisor are hard to obtain Clustering is unsupervised learning, i.e. it

More information

Modeling User Rating Profiles For Collaborative Filtering

Modeling User Rating Profiles For Collaborative Filtering Modeling User Rating Profiles For Collaborative Filtering Benjamin Marlin Department of Computer Science University of Toronto Toronto, ON, M5S 3H5, CANADA marlin@cs.toronto.edu Abstract In this paper

More information

Latent Dirichlet Alloca/on

Latent Dirichlet Alloca/on Latent Dirichlet Alloca/on Blei, Ng and Jordan ( 2002 ) Presented by Deepak Santhanam What is Latent Dirichlet Alloca/on? Genera/ve Model for collec/ons of discrete data Data generated by parameters which

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

topic modeling hanna m. wallach

topic modeling hanna m. wallach university of massachusetts amherst wallach@cs.umass.edu Ramona Blei-Gantz Helen Moss (Dave's Grandma) The Next 30 Minutes Motivations and a brief history: Latent semantic analysis Probabilistic latent

More information

Probabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov

Probabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute

More information

DEKDIV: A Linked-Data-Driven Web Portal for Learning Analytics Data Enrichment, Interactive Visualization, and Knowledge Discovery

DEKDIV: A Linked-Data-Driven Web Portal for Learning Analytics Data Enrichment, Interactive Visualization, and Knowledge Discovery DEKDIV: A Linked-Data-Driven Web Portal for Learning Analytics Data Enrichment, Interactive Visualization, and Knowledge Discovery Yingjie Hu, Grant McKenzie, Jiue-An Yang, Song Gao, Amin Abdalla, and

More information

Lecture 19, November 19, 2012

Lecture 19, November 19, 2012 Machine Learning 0-70/5-78, Fall 0 Latent Space Analysis SVD and Topic Models Eric Xing Lecture 9, November 9, 0 Reading: Tutorial on Topic Model @ ACL Eric Xing @ CMU, 006-0 We are inundated with data

More information

Clustering using Mixture Models

Clustering using Mixture Models Clustering using Mixture Models The full posterior of the Gaussian Mixture Model is p(x, Z, µ,, ) =p(x Z, µ, )p(z )p( )p(µ, ) data likelihood (Gaussian) correspondence prob. (Multinomial) mixture prior

More information

Time-Sensitive Dirichlet Process Mixture Models

Time-Sensitive Dirichlet Process Mixture Models Time-Sensitive Dirichlet Process Mixture Models Xiaojin Zhu Zoubin Ghahramani John Lafferty May 25 CMU-CALD-5-4 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 Abstract We introduce

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

MCMC and Gibbs Sampling. Kayhan Batmanghelich

MCMC and Gibbs Sampling. Kayhan Batmanghelich MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

Topic Models. Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW. Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1

Topic Models. Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW. Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1 Topic Models Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1 Low-Dimensional Space for Documents Last time: embedding space

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Sequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007

Sequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007 Sequential Monte Carlo and Particle Filtering Frank Wood Gatsby, November 2007 Importance Sampling Recall: Let s say that we want to compute some expectation (integral) E p [f] = p(x)f(x)dx and we remember

More information

Classical Predictive Models

Classical Predictive Models Laplace Max-margin Markov Networks Recent Advances in Learning SPARSE Structured I/O Models: models, algorithms, and applications Eric Xing epxing@cs.cmu.edu Machine Learning Dept./Language Technology

More information

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Interpretable Latent Variable Models

Interpretable Latent Variable Models Interpretable Latent Variable Models Fernando Perez-Cruz Bell Labs (Nokia) Department of Signal Theory and Communications, University Carlos III in Madrid 1 / 24 Outline 1 Introduction to Machine Learning

More information

TUTORIAL PART 1 Unsupervised Learning

TUTORIAL PART 1 Unsupervised Learning TUTORIAL PART 1 Unsupervised Learning Marc'Aurelio Ranzato Department of Computer Science Univ. of Toronto ranzato@cs.toronto.edu Co-organizers: Honglak Lee, Yoshua Bengio, Geoff Hinton, Yann LeCun, Andrew

More information

Sequential Supervised Learning

Sequential Supervised Learning Sequential Supervised Learning Many Application Problems Require Sequential Learning Part-of of-speech Tagging Information Extraction from the Web Text-to to-speech Mapping Part-of of-speech Tagging Given

More information

PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY. Arto Klami

PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY. Arto Klami PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY Arto Klami 1 PROBABILISTIC PROGRAMMING Probabilistic programming is to probabilistic modelling as deep learning is to neural networks (Antti Honkela,

More information

Applying Latent Dirichlet Allocation to Group Discovery in Large Graphs

Applying Latent Dirichlet Allocation to Group Discovery in Large Graphs Lawrence Livermore National Laboratory Applying Latent Dirichlet Allocation to Group Discovery in Large Graphs Keith Henderson and Tina Eliassi-Rad keith@llnl.gov and eliassi@llnl.gov This work was performed

More information

Gibbs Sampling. Héctor Corrada Bravo. University of Maryland, College Park, USA CMSC 644:

Gibbs Sampling. Héctor Corrada Bravo. University of Maryland, College Park, USA CMSC 644: Gibbs Sampling Héctor Corrada Bravo University of Maryland, College Park, USA CMSC 644: 2019 03 27 Latent semantic analysis Documents as mixtures of topics (Hoffman 1999) 1 / 60 Latent semantic analysis

More information

Efficient Methods for Topic Model Inference on Streaming Document Collections

Efficient Methods for Topic Model Inference on Streaming Document Collections Efficient Methods for Topic Model Inference on Streaming Document Collections Limin Yao, David Mimno, and Andrew McCallum Department of Computer Science University of Massachusetts, Amherst {lmyao, mimno,

More information

AN INTRODUCTION TO TOPIC MODELS

AN INTRODUCTION TO TOPIC MODELS AN INTRODUCTION TO TOPIC MODELS Michael Paul December 4, 2013 600.465 Natural Language Processing Johns Hopkins University Prof. Jason Eisner Making sense of text Suppose you want to learn something about

More information

Sequential Monte Carlo Methods for Bayesian Computation

Sequential Monte Carlo Methods for Bayesian Computation Sequential Monte Carlo Methods for Bayesian Computation A. Doucet Kyoto Sept. 2012 A. Doucet (MLSS Sept. 2012) Sept. 2012 1 / 136 Motivating Example 1: Generic Bayesian Model Let X be a vector parameter

More information

Bayesian Inference for Dirichlet-Multinomials

Bayesian Inference for Dirichlet-Multinomials Bayesian Inference for Dirichlet-Multinomials Mark Johnson Macquarie University Sydney, Australia MLSS Summer School 1 / 50 Random variables and distributed according to notation A probability distribution

More information

Spatial Bayesian Nonparametrics for Natural Image Segmentation

Spatial Bayesian Nonparametrics for Natural Image Segmentation Spatial Bayesian Nonparametrics for Natural Image Segmentation Erik Sudderth Brown University Joint work with Michael Jordan University of California Soumya Ghosh Brown University Parsing Visual Scenes

More information

PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY

PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY Arto Klami Adapted from my talk in AIHelsinki seminar Dec 15, 2016 1 MOTIVATING INTRODUCTION Most of the artificial intelligence success stories

More information

Information retrieval LSI, plsi and LDA. Jian-Yun Nie

Information retrieval LSI, plsi and LDA. Jian-Yun Nie Information retrieval LSI, plsi and LDA Jian-Yun Nie Basics: Eigenvector, Eigenvalue Ref: http://en.wikipedia.org/wiki/eigenvector For a square matrix A: Ax = λx where x is a vector (eigenvector), and

More information

Kernel Density Topic Models: Visual Topics Without Visual Words

Kernel Density Topic Models: Visual Topics Without Visual Words Kernel Density Topic Models: Visual Topics Without Visual Words Konstantinos Rematas K.U. Leuven ESAT-iMinds krematas@esat.kuleuven.be Mario Fritz Max Planck Institute for Informatics mfrtiz@mpi-inf.mpg.de

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

The Doubly Correlated Nonparametric Topic Model

The Doubly Correlated Nonparametric Topic Model The Doubly Correlated Nonparametric Topic Model Dae Il Kim and Erik B. Sudderth Department of Computer Science Brown University, Providence, RI 0906 daeil@cs.brown.edu, sudderth@cs.brown.edu Abstract Topic

More information