Machine Learning Summer School, Austin, TX January 08, 2015


1 Parametric and nonparametric Bayesian count modeling Mingyuan Zhou Department of Information, Risk, and Operations Management Department of Statistics and Data Sciences The University of Texas at Austin Machine Learning Summer School, Austin, TX January 08, 2015 1 / 45

2 Outline: Poisson, gamma, and negative binomial distributions; Bayesian inference for the negative binomial distribution; regression models for counts; latent variable models for discrete data (latent Dirichlet allocation and Poisson factor analysis). [Figure: a words-by-documents count matrix X (P × N) is factorized as X = ΦΘ, with a words-by-topics matrix Φ (P × K) and a topics-by-documents matrix Θ (K × N), in analogy to factorizing images into a dictionary times sparse codes.] 2 / 45

3 Count data is common Nonnegative and discrete: Number of auto insurance claims / highway accidents / crimes Consumer behavior, labor mobility, marketing, voting Photon counting Species sampling Text analysis Infectious diseases, Google Flu Trends Next generation sequencing (statistical genomics) Mixture modeling can be viewed as a count-modeling problem: Number of points in a cluster (in a mixture model, we are modeling a count vector) Number of words assigned to topic k in document j (in a topic model/mixed-membership model, we are modeling a K × J latent count matrix) 3 / 45

5 Siméon-Denis Poisson (21 June 1781 – 25 April 1840) "Life is good for only two things: doing mathematics and teaching it." Poisson 4 / 45

7 Poisson distribution x ∼ Pois(λ) Probability mass function: P(x | λ) = λ^x e^{−λ} / x!, x ∈ {0, 1, ...} The mean and variance are the same: E[x] = Var[x] = λ. It is restrictive for modeling over-dispersed counts (variance greater than the mean), which are commonly observed in practice. It is a basic building block to construct more flexible count distributions. Overdispersed counts are commonly observed due to Heterogeneity: differences between individuals Contagion: dependence between the occurrences of events 5 / 45

8 Mixed Poisson distribution x ∼ Pois(λ), λ ∼ f_Λ(λ) Mixing the Poisson rate parameter with a positive distribution leads to a mixed Poisson distribution. A mixed Poisson distribution is always over-dispersed. Law of total expectation: E[x] = E[E[x | λ]] = E[λ]. Law of total variance: Var[x] = Var[E[x | λ]] + E[Var[x | λ]] = Var[λ] + E[λ]. Thus Var[x] > E[x] unless λ is a constant. The gamma distribution is a popular choice for the mixing distribution as it is conjugate to the Poisson. 6 / 45

9 Mixing the gamma distribution with the Poisson as x ∼ Pois(λ), λ ∼ Gamma(r, p/(1 − p)), where p/(1 − p) is the gamma scale parameter, leads to the negative binomial distribution x ∼ NB(r, p) with probability mass function P(x | r, p) = [Γ(x + r) / (x! Γ(r))] p^x (1 − p)^r, x ∈ {0, 1, ...} 7 / 45
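A minimal simulation sketch of the gamma-Poisson mixture above (using numpy; the values r = 2, p = 0.4 and the sample size are illustrative choices, not from the slides). It checks that drawing λ from the gamma and then x from Pois(λ) matches direct negative binomial draws in mean and variance.

```python
import numpy as np

rng = np.random.default_rng(0)
r, p, n = 2.0, 0.4, 200_000

# Gamma-Poisson mixture: lambda ~ Gamma(shape=r, scale=p/(1-p)), then x | lambda ~ Pois(lambda)
lam = rng.gamma(shape=r, scale=p / (1 - p), size=n)
x_mix = rng.poisson(lam)

# Direct NB draws; numpy's negative_binomial(n, p) counts failures before the n-th success,
# so its second argument is 1 - p in the slide's parameterization.
x_nb = rng.negative_binomial(r, 1 - p, size=n)

print("mixture mean/var:", x_mix.mean(), x_mix.var())
print("NB      mean/var:", x_nb.mean(), x_nb.var())
print("theory  mean/var:", r * p / (1 - p), r * p / (1 - p) ** 2)
```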

10 Compound Poisson A compound Poisson random variable is the sum of a Poisson-distributed number of i.i.d. random variables: if x = Σ_{i=1}^n y_i, where n ∼ Pois(λ) and the y_i are i.i.d. random variables, then x is a compound Poisson random variable. The negative binomial random variable x ∼ NB(r, p) can also be generated as a compound Poisson random variable as x = Σ_{i=1}^l u_i, l ∼ Pois[−r ln(1 − p)], u_i ∼ Log(p), where u ∼ Log(p) is the logarithmic distribution with probability mass function P(u | p) = −p^u / [u ln(1 − p)], u ∈ {1, 2, ...}. 8 / 45

11 Negative binomial distribution m ∼ NB(r, p) r is the dispersion parameter p is the probability parameter Probability mass function f_M(m | r, p) = [Γ(r + m) / (m! Γ(r))] p^m (1 − p)^r = (−1)^m (−r choose m) p^m (1 − p)^r It is a gamma-Poisson mixture It is a compound Poisson distribution Its mean is rp/(1 − p), and its variance rp/(1 − p)^2 is greater than its mean: Var[m] = E[m] + (E[m])^2 / r 9 / 45

12 The conjugate prior for the negative binomial probability parameter p is the beta distribution: if m_i ∼ NB(r, p), p ∼ Beta(a_0, b_0), then (p | −) ∼ Beta(a_0 + Σ_{i=1}^n m_i, b_0 + nr). The conjugate prior for the negative binomial dispersion parameter r is unknown, but we have a simple data augmentation technique to derive closed-form Gibbs sampling update equations for r. 10 / 45

13 If we assign m customers to tables using a Chinese restaurant process with concentration parameter r, then the random number of occupied tables l follows the Chinese restaurant table (CRT) distribution f_L(l | m, r) = [Γ(r) / Γ(m + r)] |s(m, l)| r^l, l = 0, 1, ..., m, where the |s(m, l)| are unsigned Stirling numbers of the first kind. The joint distribution of the customer count m ∼ NB(r, p) and the table count l is the Poisson-logarithmic bivariate count distribution f_{M,L}(m, l | r, p) = [|s(m, l)| r^l / m!] (1 − p)^r p^m. 11 / 45
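A small sketch of drawing from the CRT distribution (numpy/scipy; the values m = 20, r = 1.5 are illustrative). It uses the Bernoulli representation l = Σ_{t=1}^m b_t with b_t ∼ Bernoulli(r/(r + t − 1)), which also appears on the Gibbs sampling slide below, and checks the Monte Carlo mean against E[l] = r[ψ(r + m) − ψ(r)].

```python
import numpy as np
from scipy.special import digamma

def sample_crt(m, r, rng):
    """Draw l ~ CRT(m, r): the number of tables occupied when m customers are
    seated by a Chinese restaurant process with concentration parameter r."""
    if m == 0:
        return 0
    t = np.arange(1, m + 1)
    return int(rng.binomial(1, r / (r + t - 1)).sum())

rng = np.random.default_rng(1)
m, r = 20, 1.5
draws = np.array([sample_crt(m, r, rng) for _ in range(50_000)])
print("Monte Carlo E[l]:", draws.mean())
print("analytic    E[l]:", r * (digamma(r + m) - digamma(r)))
```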

14 Poisson-logarithmic bivariate count distribution Probability mass function: f_{M,L}(m, l; r, p) = [|s(m, l)| r^l / m!] (1 − p)^r p^m. It is clear that the gamma distribution is a conjugate prior for r under this bivariate count distribution. The following two ways of generating the customer count and table count are equivalent in joint distribution: Draw m ∼ NegBino(r, p) customers and assign them to tables using a Chinese restaurant process with concentration parameter r; or Draw l ∼ Poisson(−r ln(1 − p)) tables and draw Logarithmic(p) customers at each table. 12 / 45
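A simulation sketch of the equivalence just stated (numpy, plus scipy.stats.logser for the logarithmic distribution; r = 1.5, p = 0.6 and the sample size are illustrative). Both schemes should give matching joint moments of the customer count m and the table count l.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
r, p, n = 1.5, 0.6, 50_000

# Scheme 1: m ~ NegBino(r, p) customers, then seat them with a CRP of concentration r.
m1 = rng.negative_binomial(r, 1 - p, size=n)  # numpy's second argument is 1 - p here
l1 = np.array([rng.binomial(1, r / (r + np.arange(mi))).sum() if mi > 0 else 0 for mi in m1])

# Scheme 2: l ~ Pois(-r ln(1 - p)) tables, then Logarithmic(p) customers at each table.
l2 = rng.poisson(-r * np.log(1 - p), size=n)
u = stats.logser.rvs(p, size=int(l2.sum()), random_state=rng)
ends = np.cumsum(l2)
m2 = np.array([u[e - k:e].sum() for k, e in zip(l2, ends)])

for name, (m, l) in [("NB + CRP   ", (m1, l1)), ("Pois + Log ", (m2, l2))]:
    print(name, "E[m]=%.3f E[l]=%.3f corr=%.3f" % (m.mean(), l.mean(), np.corrcoef(m, l)[0, 1]))
```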

15 Bayesian inference for the negative binomial count distribution: m_i ∼ NegBino(r, p), p ∼ Beta(a_0, b_0), r ∼ Gamma(e_0, 1/f_0). Gibbs sampling via data augmentation: (p | −) ∼ Beta(a_0 + Σ_{i=1}^n m_i, b_0 + nr); (l_i | −) = Σ_{t=1}^{m_i} b_t, b_t ∼ Bernoulli(r/(t + r − 1)); (r | −) ∼ Gamma(e_0 + Σ_{i=1}^n l_i, 1/[f_0 − n ln(1 − p)]). Expectation-Maximization Variational Bayes 13 / 45
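A minimal Gibbs sampler sketch implementing the three conditionals above (numpy; the hyperparameters a_0 = b_0 = e_0 = f_0 = 1, the initialization, and the synthetic data are illustrative choices, not from the slides).

```python
import numpy as np

def nb_gibbs(m, n_iter=2000, a0=1.0, b0=1.0, e0=1.0, f0=1.0, seed=0):
    """Gibbs sampler for m_i ~ NB(r, p), p ~ Beta(a0, b0), r ~ Gamma(e0, 1/f0),
    using the CRT data augmentation from the slide."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m)
    n = len(m)
    r = 1.0
    samples = []
    for _ in range(n_iter):
        # (p | -) ~ Beta(a0 + sum_i m_i, b0 + n r)
        p = rng.beta(a0 + m.sum(), b0 + n * r)
        # (l_i | -) = sum_{t=1}^{m_i} b_t, b_t ~ Bernoulli(r / (t + r - 1))
        l = np.array([rng.binomial(1, r / (np.arange(1, mi + 1) + r - 1)).sum() if mi > 0 else 0
                      for mi in m])
        # (r | -) ~ Gamma(e0 + sum_i l_i, 1 / (f0 - n ln(1 - p)))
        r = rng.gamma(e0 + l.sum(), 1.0 / (f0 - n * np.log(1 - p)))
        samples.append((r, p))
    return np.array(samples)

# Example: simulate NB(r=2, p=0.4) data and recover (r, p)
rng = np.random.default_rng(1)
data = rng.negative_binomial(2.0, 1 - 0.4, size=500)
post = nb_gibbs(data)
print("posterior means of (r, p):", post[1000:].mean(axis=0))
```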

17 Gibbs sampling: E[r] = 1.076, E[p] = Expectation-Maximization: r : 1.025, p : Variational Bayes: E[r] = 0.999, E[p] = For this example, variational Bayes inference correctly identifies the modes but underestimates the posterior variances of model parameters. 14 / 45

20 A gamma chain: NegBino-Gamma-Gamma-... --(Augmentation)--> (CRT, NegBino)-Gamma-Gamma-... --(Equivalence)--> (Log, Poisson)-Gamma-Gamma-... --(Marginalization)--> NegBino-Gamma-... 15 / 45

25 Poisson and multinomial Suppose that x_1, ..., x_K are independent Poisson random variables with x_k ∼ Pois(λ_k) and x = Σ_{k=1}^K x_k. Set λ = Σ_{k=1}^K λ_k; let (y, y_1, ..., y_K) be random variables such that y ∼ Pois(λ), (y_1, ..., y_K) | y ∼ Mult(y; λ_1/λ, ..., λ_K/λ). Then the distribution of x = (x, x_1, ..., x_K) is the same as the distribution of y = (y, y_1, ..., y_K). 16 / 45
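A quick numerical sketch of this Poisson-multinomial equivalence (numpy; the rates λ = (1, 2.5, 0.5) and the sample size are illustrative). The two constructions should agree in their component means and variances.

```python
import numpy as np

rng = np.random.default_rng(3)
lam = np.array([1.0, 2.5, 0.5])
n = 50_000

# Construction 1: independent Poissons x_k ~ Pois(lambda_k)
x = rng.poisson(lam, size=(n, len(lam)))

# Construction 2: Poisson total y ~ Pois(sum_k lambda_k), then multinomial thinning
y_tot = rng.poisson(lam.sum(), size=n)
y = np.array([rng.multinomial(t, lam / lam.sum()) for t in y_tot])

print("means (independent Poisson):  ", x.mean(axis=0))
print("means (Poisson + multinomial):", y.mean(axis=0))
print("vars  (independent Poisson):  ", x.var(axis=0))
print("vars  (Poisson + multinomial):", y.var(axis=0))
```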

26 Multinomial and Dirichlet Model: (x_i1, ..., x_ik) ∼ Multinomial(n_i, p_1, ..., p_k), (p_1, ..., p_k) ∼ Dirichlet(α_1, ..., α_k), with density [Γ(Σ_{j=1}^k α_j) / Π_{j=1}^k Γ(α_j)] Π_{j=1}^k p_j^{α_j − 1}. The conditional posterior of (p_1, ..., p_k) is Dirichlet distributed as (p_1, ..., p_k) | − ∼ Dirichlet(α_1 + Σ_i x_i1, ..., α_k + Σ_i x_ik) 17 / 45

27 Gamma and Dirichlet Suppose that the random variables y and (y_1, ..., y_K) are independent with y ∼ Gamma(γ, 1/c), (y_1, ..., y_K) ∼ Dir(γp_1, ..., γp_K), where Σ_{k=1}^K p_k = 1. Let x_k = y y_k; then the {x_k}_{1,K} are independent gamma random variables with x_k ∼ Gamma(γp_k, 1/c). The proof can be found in arXiv: v1 18 / 45
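A simulation sketch of this gamma-Dirichlet decomposition (numpy; γ = 3, c = 2, p = (0.2, 0.5, 0.3), and the sample size are illustrative). Both constructions should give the same marginal moments, and the components of construction 1 should be uncorrelated, as independence requires.

```python
import numpy as np

rng = np.random.default_rng(4)
gam, c, n = 3.0, 2.0, 200_000
p = np.array([0.2, 0.5, 0.3])

# Construction 1: x_k = y * y_k, with y ~ Gamma(gamma, 1/c) and (y_1,...,y_K) ~ Dir(gamma * p)
y = rng.gamma(gam, 1.0 / c, size=n)
yk = rng.dirichlet(gam * p, size=n)
x1 = y[:, None] * yk

# Construction 2: independent x_k ~ Gamma(gamma * p_k, 1/c)
x2 = rng.gamma(gam * p, 1.0 / c, size=(n, len(p)))

print("means:", x1.mean(axis=0), x2.mean(axis=0))
print("vars: ", x1.var(axis=0), x2.var(axis=0))
print("corr(x_1, x_2) in construction 1:", np.corrcoef(x1[:, 0], x1[:, 1])[0, 1])
```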

28 [Diagram: distributions used for count modeling and mixture modeling and their relationships, including the Gaussian, logit, logarithmic, Polya-Gamma, binomial, Chinese restaurant, beta, Bernoulli, latent Gaussian, Poisson, multinomial, gamma, and Dirichlet distributions.] 19 / 45

29 Poisson regression Model: y_i ∼ Pois(λ_i), λ_i = exp(x_i^T β) Model assumption: Var[y_i | x_i] = E[y_i | x_i] = exp(x_i^T β). Poisson regression does not model over-dispersion. 20 / 45
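A minimal maximum likelihood fit of the Poisson regression model above (numpy and scipy.optimize; the simulated design, true coefficients, and sample size are illustrative assumptions, not from the slides).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, d = 1000, 3
X = rng.normal(size=(n, d))
beta_true = np.array([0.5, -0.3, 0.8])
y = rng.poisson(np.exp(X @ beta_true))

def neg_loglik(beta):
    # Poisson log-likelihood up to a constant: sum_i [y_i x_i'beta - exp(x_i'beta)]
    eta = X @ beta
    return -(y @ eta - np.exp(eta).sum())

beta_hat = minimize(neg_loglik, np.zeros(d), method="BFGS").x
print("true beta:", beta_true)
print("MLE      :", beta_hat)
```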

30 Poisson regression with multiplicative random effects Model: y_i ∼ Pois(λ_i), λ_i = exp(x_i^T β) ε_i Model property: Var[y_i | x_i] = E[y_i | x_i] + (Var[ε_i] / E^2[ε_i]) E^2[y_i | x_i]. Negative binomial regression (gamma random effect): ε_i ∼ Gamma(r, 1/r), with density [r^r / Γ(r)] ε_i^{r−1} e^{−r ε_i}. Lognormal-Poisson regression (lognormal random effect): ε_i ∼ lnN(0, σ^2). 21 / 45

33 Lognormal and gamma mixed negative binomial regression Model (Zhou et al., ICML 2012): y_i ∼ NegBino(r, p_i), r ∼ Gamma(a_0, 1/h), ψ_i = log[p_i / (1 − p_i)] = x_i^T β + ln ε_i, ε_i ∼ lnN(0, σ^2) Bayesian inference with the Polya-Gamma distribution. Model properties: Var[y_i | x_i] = E[y_i | x_i] + [e^{σ^2}(1 + r^{−1}) − 1] E^2[y_i | x_i]. Special cases: Negative binomial regression: σ^2 = 0; Lognormal-Poisson regression: r → ∞; Poisson regression: σ^2 = 0 and r → ∞. 22 / 45

38 Example on the NASCAR dataset: [Table: estimates of σ^2, r, and the regression coefficients β_1 (Laps), β_2 (Drivers), and β_3 (TrkLen) under the Poisson (MLE), NB (MLE), LGNB (VB), and LGNB (Gibbs) models.] Using variational Bayes inference, we can calculate the correlation matrix for (β_1, β_2, β_3)^T. 23 / 45

39 Latent Dirichlet allocation and Poisson factor analysis Latent Dirichlet allocation (Blei et al., 2003) Hierarchical model: x_ji ∼ Mult(φ_{z_ji}), z_ji ∼ Mult(θ_j), φ_k ∼ Dir(η, ..., η), θ_j ∼ Dir(α/K, ..., α/K) There are K topics {φ_k}_{1,K}, each of which is a distribution over the V words in the vocabulary. There are N documents in the corpus and θ_j represents the proportions of the K topics in the jth document. x_ji is the ith word in the jth document. z_ji is the index of the topic selected by x_ji. 24 / 45
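A small generative sketch of this hierarchical model (numpy; the corpus size, vocabulary size, number of topics, and the fixed document length are illustrative assumptions, not part of the slide).

```python
import numpy as np

def generate_lda_corpus(N=100, K=5, V=50, doc_len=80, alpha=1.0, eta=0.1, seed=0):
    """Sample a synthetic corpus from the LDA generative model on the slide.
    Documents are fixed to doc_len words for simplicity."""
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(np.full(V, eta), size=K)           # topics: K distributions over V words
    theta = rng.dirichlet(np.full(K, alpha / K), size=N)   # per-document topic proportions
    docs, z = [], []
    for j in range(N):
        z_j = rng.choice(K, size=doc_len, p=theta[j])                 # z_ji ~ Mult(theta_j)
        x_j = np.array([rng.choice(V, p=phi[k]) for k in z_j])        # x_ji ~ Mult(phi_{z_ji})
        docs.append(x_j)
        z.append(z_j)
    return docs, z, phi, theta

docs, z, phi, theta = generate_lda_corpus()
```

The collapsed Gibbs sampler sketched after the Griffiths and Steyvers slide below consumes a corpus in this list-of-word-index-arrays format.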

40 Latent Dirichlet allocation and Poisson factor analysis Denote n_vjk = Σ_i δ(x_ji = v) δ(z_ji = k), n_{v·k} = Σ_j n_vjk, n_{jk} = Σ_v n_vjk, and n_k = Σ_j n_jk. Blocked Gibbs sampling: P(z_ji = k | −) ∝ φ_{x_ji,k} θ_jk, k ∈ {1, ..., K} (φ_k | −) ∼ Dir(η + n_{1·k}, ..., η + n_{V·k}) (θ_j | −) ∼ Dir(α/K + n_{j1}, ..., α/K + n_{jK}) Variational Bayes inference (Blei et al., 2003). 25 / 45

41 Latent Dirichlet allocation and Poisson factor analysis Collapsed Gibbs sampling (Griffiths and Steyvers, 2004): marginalize out both the topics {φ_k}_{1,K} and the topic proportions {θ_j}_{1,N}, and sample z_ji conditioning on all the other topic assignment indices z^{−ji}: P(z_ji = k | z^{−ji}) ∝ [(η + n^{−ji}_{x_ji·k}) / (Vη + n^{−ji}_k)] (n^{−ji}_{jk} + α/K), k ∈ {1, ..., K}. This is easy to understand as P(z_ji = k | φ_k, θ_j) ∝ φ_{x_ji,k} θ_jk and P(z_ji = k | z^{−ji}) = ∫ P(z_ji = k | φ_k, θ_j) P(φ_k, θ_j | z^{−ji}) dφ_k dθ_j, with P(φ_k | z^{−ji}) = Dir(η + n^{−ji}_{1·k}, ..., η + n^{−ji}_{V·k}), P(θ_j | z^{−ji}) = Dir(α/K + n^{−ji}_{j1}, ..., α/K + n^{−ji}_{jK}), P(φ_k, θ_j | z^{−ji}) = P(φ_k | z^{−ji}) P(θ_j | z^{−ji}). 26 / 45
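A compact collapsed Gibbs sampler sketch following the update above (numpy; the priors, iteration count, and the point estimates returned at the end are illustrative implementation choices).

```python
import numpy as np

def lda_collapsed_gibbs(docs, K, V, alpha=1.0, eta=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (Griffiths and Steyvers, 2004).
    docs: list of arrays of word indices; K topics; V vocabulary size."""
    rng = np.random.default_rng(seed)
    n_vk = np.zeros((V, K))           # word-topic counts
    n_jk = np.zeros((len(docs), K))   # document-topic counts
    n_k = np.zeros(K)                 # topic totals
    z = [rng.integers(K, size=len(d)) for d in docs]
    for j, d in enumerate(docs):      # initialize counts from the random assignments
        for i, v in enumerate(d):
            k = z[j][i]
            n_vk[v, k] += 1; n_jk[j, k] += 1; n_k[k] += 1
    for _ in range(n_iter):
        for j, d in enumerate(docs):
            for i, v in enumerate(d):
                k = z[j][i]           # remove the current assignment
                n_vk[v, k] -= 1; n_jk[j, k] -= 1; n_k[k] -= 1
                # P(z_ji = k | z^{-ji}) from the slide
                probs = (eta + n_vk[v]) / (V * eta + n_k) * (n_jk[j] + alpha / K)
                probs /= probs.sum()
                k = rng.choice(K, p=probs)
                z[j][i] = k           # add the new assignment back
                n_vk[v, k] += 1; n_jk[j, k] += 1; n_k[k] += 1
    phi = (n_vk + eta) / (n_vk + eta).sum(axis=0)
    theta = (n_jk + alpha / K) / (n_jk + alpha / K).sum(axis=1, keepdims=True)
    return phi, theta, z
```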

42 Latent Dirichlet allocation and Poisson factor analysis In latent Dirichlet allocation, the words in a document are assumed to be exchangeable (bag-of-words assumption). Below we will relate latent Dirichlet allocation to Poisson factor analysis and show that it essentially tries to factorize the term-document word count matrix under the Poisson likelihood. [Figure: the words-by-documents count matrix X (P × N) is factorized as X = ΦΘ, with a words-by-topics matrix Φ (P × K) and a topics-by-documents matrix Θ (K × N), in analogy to factorizing images into a dictionary times sparse codes.] 27 / 45

43 Latent Dirichlet allocation and Poisson factor analysis Poisson factor analysis Factorize the term-document word count matrix M ∈ Z_+^{V×N} under the Poisson likelihood as M ∼ Pois(ΦΘ), where Z_+ = {0, 1, ...} and R_+ = {x : x > 0}. m_vj is the number of times that term v appears in document j. Factor loading matrix: Φ = (φ_1, ..., φ_K) ∈ R_+^{V×K}. Factor score matrix: Θ = (θ_1, ..., θ_N) ∈ R_+^{K×N}. A large number of discrete latent variable models can be united under the Poisson factor analysis framework, with the main differences lying in how the priors for φ_k and θ_j are constructed. 28 / 45

44 Latent Dirichlet allocation and Poisson factor analysis Two equivalent augmentations of Poisson factor analysis m_vj ∼ Pois(Σ_{k=1}^K φ_vk θ_jk) Augmentation 1: m_vj = Σ_{k=1}^K n_vjk, n_vjk ∼ Pois(φ_vk θ_jk) Augmentation 2: m_vj ∼ Pois(Σ_{k=1}^K φ_vk θ_jk), ζ_vjk = φ_vk θ_jk / Σ_{k=1}^K φ_vk θ_jk, [n_vj1, ..., n_vjK] ∼ Mult(m_vj; ζ_vj1, ..., ζ_vjK) 29 / 45

45 Latent Dirichlet allocation and Poisson factor analysis Nonnegative matrix factorization and gamma-Poisson factor analysis Gamma priors on Φ and Θ: m_vj = Pois(Σ_{k=1}^K φ_vk θ_jk), φ_vk ∼ Gamma(a_φ, 1/b_φ), θ_jk ∼ Gamma(a_θ, g_k/a_θ). Expectation-Maximization (EM) algorithm: φ_vk ← [a_φ − 1 + φ_vk Σ_{j=1}^N m_vj θ_jk / (Σ_{k=1}^K φ_vk θ_jk)] / (b_φ + Σ_j θ_jk), θ_jk ← [a_θ − 1 + θ_jk Σ_{v=1}^V m_vj φ_vk / (Σ_{k=1}^K φ_vk θ_jk)] / (a_θ/g_k + Σ_v φ_vk). If we set b_φ = 0, a_φ = a_θ = 1 and g_k → ∞, then the EM algorithm is the same as that of non-negative matrix factorization (Lee and Seung, 2000) with an objective function of minimizing the KL divergence D_KL(M ‖ ΦΘ). 30 / 45
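A sketch of the Lee and Seung special case mentioned at the bottom of the slide, i.e. multiplicative updates minimizing D_KL(M ‖ ΦΘ) (numpy; the initialization, iteration count, small eps guard, and synthetic example are implementation choices, not from the slides).

```python
import numpy as np

def nmf_kl(M, K, n_iter=500, seed=0, eps=1e-10):
    """Multiplicative updates for KL-divergence NMF (Lee and Seung, 2000), the
    b_phi = 0, a_phi = a_theta = 1, g_k -> infinity special case of the EM above."""
    rng = np.random.default_rng(seed)
    V, N = M.shape
    Phi = rng.random((V, K)) + eps
    Theta = rng.random((K, N)) + eps
    for _ in range(n_iter):
        R = M / (Phi @ Theta + eps)                               # m_vj / sum_k phi_vk theta_jk
        Phi *= (R @ Theta.T) / (Theta.sum(axis=1) + eps)          # update phi_vk
        R = M / (Phi @ Theta + eps)
        Theta *= (Phi.T @ R) / (Phi.sum(axis=0)[:, None] + eps)   # update theta_jk
    return Phi, Theta

# Example: factorize a small synthetic count matrix
rng = np.random.default_rng(1)
M = rng.poisson(rng.gamma(1.0, 1.0, (30, 4)) @ rng.gamma(1.0, 1.0, (4, 20)))
Phi, Theta = nmf_kl(M, K=4)
```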

46 Latent Dirichlet allocation and Dirichlet-Poisson factor analysis Dirichlet priors on Φ and Θ: m_vj = Pois(Σ_{k=1}^K φ_vk θ_jk), φ_k ∼ Dir(η, ..., η), θ_j ∼ Dir(α/K, ..., α/K). One may show that both the blocked Gibbs sampling inference and the variational Bayes inference of the Dirichlet-Poisson factor analysis model are the same as those of latent Dirichlet allocation. 31 / 45

47 Latent Dirichlet allocation and Poisson factor analysis Beta-gamma-Poisson factor analysis Hierarchical model (Zhou et al., 2012; Zhou and Carin, 2014): m_vj = Σ_{k=1}^K n_vjk, n_vjk ∼ Pois(φ_vk θ_jk), φ_k ∼ Dir(η, ..., η), θ_jk ∼ Gamma[r_j, p_k/(1 − p_k)], r_j ∼ Gamma(e_0, 1/f_0), p_k ∼ Beta[c/K, c(1 − 1/K)]. n_jk = Σ_{v=1}^V n_vjk ∼ NB(r_j, p_k) This parametric model becomes a nonparametric model governed by the beta-negative binomial process as K → ∞. 32 / 45

48 Latent Dirichlet allocation and Poisson factor analysis Gamma-gamma-Poisson factor analysis Hierarchical model (Zhou and Carin, 2014): m_vj = Σ_{k=1}^K n_vjk, n_vjk ∼ Pois(φ_vk θ_jk), φ_k ∼ Dir(η, ..., η), θ_jk ∼ Gamma[r_k, p_j/(1 − p_j)], p_j ∼ Beta(a_0, b_0), r_k ∼ Gamma(γ_0/K, 1/c). n_jk ∼ NB(r_k, p_j) This parametric model becomes a nonparametric model governed by the gamma-negative binomial process as K → ∞. 33 / 45

49 Latent Dirichlet allocation and Poisson factor analysis Poisson factor analysis and mixed-membership modeling We may represent the Poisson factor model m_vj = Σ_{k=1}^K n_vjk, n_vjk ∼ Pois(φ_vk θ_jk) in terms of a mixed-membership model whose group sizes are randomized, as x_ji ∼ Mult(φ_{z_ji}), z_ji ∼ Σ_k [θ_jk / Σ_k θ_jk] δ_k, m_j ∼ Pois(Σ_{k=1}^K θ_jk), where i = 1, ..., m_j in the jth document, and n_vjk = Σ_{i=1}^{m_j} δ(x_ji = v) δ(z_ji = k). The likelihoods of the two representations differ only by a multinomial coefficient (Zhou, 2014). 34 / 45

50 Latent Dirichlet allocation and Poisson factor analysis Connections to previous approaches: Nonnegative matrix factorization with KL divergence (NMF); Latent Dirichlet allocation (LDA); GaP: the gamma-Poisson factor model (Canny, 2004); Hierarchical Dirichlet process LDA (HDP-LDA) (Teh et al., 2006). [Table: for each choice of priors on θ_jk (gamma, Dirichlet, beta-gamma, gamma-gamma), whether the Poisson factor model infers (p_k, r_j) or (p_j, r_k), whether it supports K → ∞, and the related algorithm (NMF, LDA, GaP, and HDP-LDA, respectively).] 35 / 45

51 Latent Dirichlet allocation and Poisson factor analysis Blocked Gibbs sampling Sample z_ji from its multinomial conditional; n_vjk = Σ_{i=1}^{m_j} δ(x_ji = v) δ(z_ji = k). Sample φ_k from its Dirichlet conditional. For the beta-negative binomial model (beta-gamma-Poisson factor analysis): Sample l_jk from CRT(n_jk, r_j); Sample r_j from its gamma conditional; Sample p_k from its beta conditional; Sample θ_jk from Gamma(r_j + n_jk, p_k). For the gamma-negative binomial model (gamma-gamma-Poisson factor analysis): Sample l_jk from CRT(n_jk, r_k); Sample r_k from its gamma conditional; Sample p_j from its beta conditional; Sample θ_jk from Gamma(r_k + n_jk, p_j). Collapsed Gibbs sampling for the beta-negative binomial model can be found in (Zhou, 2014). 36 / 45

52 Latent Dirichlet allocation and Poisson factor analysis Example application Example topics of United Nations General Assembly resolutions inferred by the gamma-gamma-Poisson factor model: Topic 1: trade, world, conference, organization, negotiations. Topic 2: rights, human, united, nations, commission. Topic 3: environment, management, protection, affairs, appropriate. Topic 4: women, gender, equality, including, system. Topic 5: economic, summits, outcomes, conferences, major. The gamma-negative binomial and beta-negative binomial models have distinct mechanisms for controlling the number of inferred factors. They produce state-of-the-art perplexity results when used for topic modeling of a document corpus (Zhou et al., 2012; Zhou and Carin, 2014; Zhou, 2014). 37 / 45

53 Stochastic blockmodel A relational network (graph) is commonly used to describe the relationships between nodes, where a node could represent a person, a movie, a protein, etc. Two nodes are connected if there is an edge (link) between them. An undirected, unweighted relational network with N nodes can be equivalently represented with a symmetric binary affinity matrix B ∈ {0, 1}^{N×N}, where b_ij = b_ji = 1 if an edge exists between nodes i and j and b_ij = b_ji = 0 otherwise. 38 / 45

54 Stochastic blockmodel Each node is assigned to a cluster. The probability for an edge to exist between two nodes is solely decided by the clusters that they are assigned to. Hierarchical model: b_ij ∼ Bernoulli(p_{z_i z_j}) for j > i, p_{k_1 k_2} ∼ Beta(a_0, b_0), z_i ∼ Mult(π_1, ..., π_K), (π_1, ..., π_K) ∼ Dir(α/K, ..., α/K) Blocked Gibbs sampling: P(z_i = k | −) ∝ π_k Π_{j≠i} p_{k z_j}^{b_ij} (1 − p_{k z_j})^{1 − b_ij} 39 / 45
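A blocked Gibbs sampler sketch for this finite stochastic blockmodel (numpy; the hyperparameters, initialization, and the log-space computation of the assignment probabilities are implementation choices, not from the slide).

```python
import numpy as np

def sbm_gibbs(B, K, a0=1.0, b0=1.0, alpha=1.0, n_iter=200, seed=0):
    """Blocked Gibbs sampling for the finite stochastic blockmodel on a
    symmetric binary adjacency matrix B with no self-edges."""
    rng = np.random.default_rng(seed)
    N = B.shape[0]
    z = rng.integers(K, size=N)
    P = np.full((K, K), 0.5)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # Sample cluster assignments z_i from P(z_i = k | -) on the slide
        for i in range(N):
            mask = np.arange(N) != i
            Pz = P[:, z[mask]]                     # K x (N-1) edge probabilities per candidate cluster
            loglik = (B[i, mask] * np.log(Pz) + (1 - B[i, mask]) * np.log(1 - Pz)).sum(axis=1)
            logp = np.log(pi) + loglik
            probs = np.exp(logp - logp.max()); probs /= probs.sum()
            z[i] = rng.choice(K, p=probs)
        # Sample block probabilities p_{k1 k2} from their beta conditionals
        for k1 in range(K):
            for k2 in range(k1, K):
                idx1, idx2 = np.where(z == k1)[0], np.where(z == k2)[0]
                sub = B[np.ix_(idx1, idx2)]
                if k1 == k2:
                    edges = np.triu(sub, 1).sum(); pairs = len(idx1) * (len(idx1) - 1) / 2
                else:
                    edges = sub.sum(); pairs = sub.size
                P[k1, k2] = P[k2, k1] = rng.beta(a0 + edges, b0 + pairs - edges)
        # Sample mixing proportions from their Dirichlet conditional
        counts = np.bincount(z, minlength=K)
        pi = rng.dirichlet(alpha / K + counts)
    return z, P
```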

55 Stochastic blockmodel Infinite relational model (Kemp et al., 2006) As K → ∞, the stochastic blockmodel becomes a nonparametric model governed by the Chinese restaurant process (CRP) with concentration parameter α: b_ij ∼ Bernoulli(p_{z_i z_j}) for i > j, p_{k_1 k_2} ∼ Beta(a_0, b_0), (z_1, ..., z_N) ∼ CRP(α) Collapsed Gibbs sampling can be derived by marginalizing out p_{k_1 k_2} and using the prediction rule of the Chinese restaurant process. 40 / 45

56 Stochastic blockmodel [Figure: the coauthor network of the top 234 NIPS authors.] 41 / 45

57 Stochastic blockmodel [Figure: the coauthor network reordered using the stochastic blockmodel.] 42 / 45

58 Stochastic blockmodel [Figure: the estimated link probabilities within and between blocks.] 43 / 45

59 D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 2003. T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, 2004. C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In AAAI, 2006. D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, 2000. Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. JASA, 2006. M. Zhou, L. Hannah, D. Dunson, and L. Carin. Beta-negative binomial process and Poisson factor analysis. In AISTATS, 2012. M. Zhou, L. Li, D. Dunson, and L. Carin. Lognormal and gamma mixed negative binomial regression. In ICML, 2012. 44 / 45

60 M. Zhou and L. Carin. Augment-and-conquer negative binomial processes. In NIPS, 2012. M. Zhou and L. Carin. Negative binomial process count and mixture modeling. IEEE TPAMI, 2015. M. Zhou. Beta-negative binomial process and exchangeable random partitions for mixed-membership modeling. In NIPS, 2014. 45 / 45
