Priors for Random Count Matrices with Random or Fixed Row Sums
1 Priors for Random Count Matrices with Random or Fixed Row Sums

Mingyuan Zhou, joint work with Oscar Madrid and James Scott. IROM Department, McCombs School of Business, and Department of Statistics and Data Sciences, The University of Texas at Austin. 10th Conference on Bayesian Nonparametrics, Raleigh, NC, June 2015.
2 Table of Contents

- Motivations
- How to construct an infinite random count matrix?
- Priors for random count matrices
- Infinite vocabulary naive Bayes classifiers
- Random count matrices and mixed-membership modeling
- Conclusions
3 Motivations: Where do random count matrices appear?

Directly observable random count matrices:
- Text analysis: document-word count matrix
- DNA sequencing: sample-gene count matrix
- Social network analysis: user-venue check-in count matrix
- Consumer behavior: consumer-product count matrix

Latent random count matrices:
- Topic models [Blei et al., 2003]: document-topic count matrix (the sum of each row is the length of the corresponding document)
- Hidden Markov models: state-state transition count matrix
4 Motivations to study random count matrices

- Lack of priors to describe random count matrices with a potentially infinite number of rows/columns.
- A naive Bayes classifier often requires a predetermined vocabulary shared across all categories, and has to ignore previously unseen features/terms. How do we calculate the predictive distribution of a new count vector that brings previously unseen terms?
- Interesting combinatorial structures unique to infinite random count matrices.
- Priors for random count matrices can be used to construct priors for mixed-membership modeling.
5 Motivations: Representation of a count vector under a count matrix

[Figure: term-frequency bar plots for example documents from the Mac.Hardware and Politics.Guns newsgroups, with frequency plotted against term index for each document; each new document introduces previously unseen terms ("new term") beyond the current vocabulary.]
6 Motivations: Infinite random count matrices to be studied

- No natural upper bound on the number of rows or columns
- Conditionally independent rows, i.i.d. columns
- Parallel column-wise construction
- Sequential row-wise constructions
- Predictive distribution of a new row count vector that brings new features
- Random count matrices with fixed row sums for mixed-membership modeling
7 How to construct an infinite random count matrix? Related prior distributions

Prior distributions for counts:
- Poisson, logarithmic, and digamma distributions
- Negative binomial, beta-negative binomial, and gamma-negative binomial distributions
- Poisson-logarithmic bivariate distribution [Zhou & Carin, 2012]

Generating a random count vector:
- Chinese restaurant process, Pitman-Yor process
- Normalized random measures with independent increments [Regazzini, Lijoi & Prünster, 2003; James, Lijoi & Prünster, 2009]
- Exchangeable partition probability functions (EPPFs) [Pitman, 1995]; size-dependent EPPFs [Zhou & Walker, 2014]

Generating an infinite random binary matrix:
- Indian buffet process [Griffiths & Ghahramani, 2005]; beta-Bernoulli process [Thibaux & Jordan, 2007]

Generating an infinite random count matrix: How?
8 Steps to construct an infinite random count matrix

- Choose a completely random measure G, a draw from which consists of countably infinite atoms: $G = \sum_{k=1}^{\infty} r_k \delta_{\omega_k}$.
- For $X_j := \sum_{k=1}^{\infty} n_{jk}\delta_{\omega_k}$, draw counts $n_{jk} \sim f(r_k, \theta_j)$, where $f$ denotes a count distribution parameterized by $r_k$ and $\theta_j$.
- Denote $n_{:k} = (n_{1k}, \ldots, n_{Jk})^T$ and $n_{\cdot k} = \sum_{j=1}^J n_{jk}$. The count matrix $N_J$ is constructed by organizing all the nonzero column count vectors, $\{n_{:k}\}_{k:\, n_{\cdot k} > 0}$, in an arbitrary order into a random count matrix.
- In practice, we cannot instantiate all the atoms of G. Thus we have to marginalize G out from $\{X_j\}_{j=1,\ldots,J}$ to construct $N_J$.
9 Example: gamma-Poisson or negative binomial process

Gamma-Poisson process [Titsias, 2008; Zhou & Carin, 2012; Zhou et al., 2015]:
$$X_j \sim \text{PP}(G), \quad G \sim \Gamma\text{P}(G_0, 1/c)$$

Conditional likelihood:
$$p(\{X_j\}_{j=1,\ldots,J} \mid G) = \prod_{k=1}^{\infty} \frac{r_k^{n_{\cdot k}}}{\prod_{j=1}^J n_{jk}!} e^{-J r_k} = e^{-J G(\Omega\backslash\mathcal{D})} \prod_{k=1}^{K_J} \frac{r_k^{n_{\cdot k}} e^{-J r_k}}{\prod_{j=1}^J n_{jk}!}$$

To marginalize G out, one may separate $\Omega$ into the absolutely continuous space and the points of discontinuity $\mathcal{D}$, and then apply the characteristic functional to $G(\Omega\backslash\mathcal{D})$ and the Lévy measure of G to each point of discontinuity.

The mapping from $\{X_j\}_{j=1,\ldots,J}$ to $N_J$ is one-to-$(K_J!)$, thus
$$f(N_J \mid \gamma, c) = \frac{\mathbb{E}_G\left[p(\{X_j\}_{j=1,\ldots,J} \mid G)\right]}{K_J!}$$
A truncation-based simulation sketch follows.
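To make the construction concrete before any marginalization, here is a minimal simulation sketch for the gamma-Poisson case (my own illustration, not from the talk): the gamma process draw is approximated by a finite truncation with K_max i.i.d. Gamma(γ/K_max, 1/c) weights, the counts are Poisson, and only the nonzero columns are kept. The function name and truncation level are illustrative assumptions.

```python
import numpy as np

def sample_count_matrix_truncated(J, gamma, c, K_max=2000, seed=0):
    """Finite-truncation sketch of the gamma-Poisson construction:
    r_k ~ Gamma(gamma/K_max, 1/c) approximates a draw G of the gamma
    process as K_max -> infinity, and n_jk ~ Poisson(r_k)."""
    rng = np.random.default_rng(seed)
    r = rng.gamma(shape=gamma / K_max, scale=1.0 / c, size=K_max)  # atom weights r_k
    N = rng.poisson(lam=r, size=(J, K_max))                        # J x K_max counts n_jk
    return N[:, N.sum(axis=0) > 0]                                 # keep nonzero columns only

N_J = sample_count_matrix_truncated(J=5, gamma=2.0, c=1.0)
print(N_J.shape[1])  # K_J, the random number of nonzero columns
```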
10 Example: gamma-Poisson or negative binomial process: exchangeable rows and i.i.d. columns

Distribution for the count matrix:
$$f(N_J \mid \gamma, c) = \frac{\gamma^{K_J} \exp\left[-\gamma \ln\left(\frac{J+c}{c}\right)\right]}{K_J!} \prod_{k=1}^{K_J} \frac{\Gamma(n_{\cdot k})}{(J+c)^{n_{\cdot k}} \prod_{j=1}^J n_{jk}!}$$

Row exchangeable, column i.i.d.: for $k = 1, \ldots, K_J$,
$$n_{:k} \sim \text{Multinomial}(n_{\cdot k}, 1/J, \ldots, 1/J), \quad n_{\cdot k} \sim \text{Log}[J/(J+c)], \quad K_J \sim \text{Pois}\{\gamma [\ln(J+c) - \ln(c)]\}.$$

Closed-form Gibbs sampling update equations for model parameters. A simulation sketch of the column-wise construction follows.
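The three-step column-wise description translates directly into code. Below is a hedged sketch (names are mine, not the paper's) that uses scipy's logser for the logarithmic distribution; it assumes scipy accepts a numpy Generator as random_state, which holds for recent versions.

```python
import numpy as np
from scipy.stats import logser

def sample_nbp_matrix(J, gamma, c, seed=0):
    """Column-i.i.d. sketch of f(N_J | gamma, c): K_J ~ Poisson, each
    column total n_k ~ Logarithmic(J/(J+c)), split uniformly over rows."""
    rng = np.random.default_rng(seed)
    K_J = rng.poisson(gamma * (np.log(J + c) - np.log(c)))
    cols = [rng.multinomial(logser.rvs(J / (J + c), random_state=rng),
                            np.full(J, 1.0 / J))
            for _ in range(K_J)]
    return np.column_stack(cols) if cols else np.zeros((J, 0), dtype=int)
```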
12 Example: gamma-Poisson or negative binomial process: sequential row-wise construction

$$p(N^+_{J+1} \mid N_J, \theta) = \frac{f(N_{J+1} \mid \theta)}{f(N_J \mid \theta)} = \frac{K_J!\, K^+_{J+1}!}{K_{J+1}!} \prod_{k=1}^{K_J} \text{NB}\left(n_{(J+1)k};\, n_{\cdot k}, \tfrac{1}{J+c+1}\right) \prod_{k=K_J+1}^{K_{J+1}} \text{Log}\left(n_{(J+1)k};\, \tfrac{1}{J+c+1}\right) \text{Pois}\left\{K^+_{J+1};\, \gamma [\ln(J+c+1) - \ln(J+c)]\right\}.$$

To add a new row to $N_J \in \mathbb{Z}^{J \times K_J}$, with $p_{J+1} = 1/(J+c+1)$ (see the sketch below):
- First, draw a count $\sim \text{NB}(n_{\cdot k}, p_{J+1})$ at each existing column.
- Second, draw $K^+_{J+1} \sim \text{Pois}\{\gamma [\ln(J+c+1) - \ln(J+c)]\}$ new columns.
- Third, draw a $\text{Log}(p_{J+1})$ random count at each new column.

The combinatorial coefficient arises because the newly added columns are inserted into the original ones at random locations, with their relative orders preserved.
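A sketch of the three row-wise steps (illustrative only; note that numpy's negative_binomial is parameterized by the success probability, the complement of the p used here):

```python
import numpy as np
from scipy.stats import logser

def add_row_nbp(N, gamma, c, rng):
    """Append row J+1 to an NBP count matrix N (shape J x K_J) by drawing
    from the predictive distribution above."""
    J, K_J = N.shape
    p = 1.0 / (J + c + 1.0)                                  # p_{J+1}
    row_old = rng.negative_binomial(N.sum(axis=0), 1.0 - p)  # NB(n_.k, p) at old columns
    K_plus = rng.poisson(gamma * np.log((J + c + 1.0) / (J + c)))
    row_new = np.array([logser.rvs(p, random_state=rng) for _ in range(K_plus)], dtype=int)
    N_padded = np.hstack([N, np.zeros((J, K_plus), dtype=int)])
    return np.vstack([N_padded, np.concatenate([row_old, row_new])[None, :]])
```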
13 [Figure: A sequentially constructed negative binomial process random count matrix $N_J \sim \text{NBPM}(\gamma, c)$.]
14 Example: gamma-negative binomial process

Gamma-negative binomial process [Zhou & Carin, 2012; Zhou et al., 2015]:
$$X_j \sim \text{NBP}(G, p_j), \quad G \sim \Gamma\text{P}(G_0, 1/c)$$

Conditional likelihood:
$$p(\{X_j\}_{j=1,\ldots,J} \mid G, p) = \prod_{k=1}^{\infty} \prod_{j=1}^J \frac{\Gamma(n_{jk} + r_k)}{n_{jk}!\,\Gamma(r_k)} p_j^{n_{jk}} (1-p_j)^{r_k}$$

Augmented likelihood:
$$p(\{X_j, L_j\}_{j=1,\ldots,J} \mid G, p) = e^{-q_\cdot G(\Omega\backslash\mathcal{D})} \prod_{k=1}^{K_J} r_k^{l_{\cdot k}} e^{-q_\cdot r_k} \left(\prod_{j=1}^J \frac{|s(n_{jk}, l_{jk})|\, p_j^{n_{jk}}}{n_{jk}!}\right),$$
where $q_j = -\ln(1-p_j)$ and $q_\cdot = \sum_{j=1}^J q_j$.
15 Example: gamma-negative binomial process

Distribution for the (augmented) count matrix:
$$f(N_J, L_J \mid \theta) = \frac{\gamma^{K_J} \exp\left[-\gamma \ln\left(\frac{c+q_\cdot}{c}\right)\right]}{K_J!} \prod_{k=1}^{K_J} \frac{\Gamma(l_{\cdot k})}{(c+q_\cdot)^{l_{\cdot k}}} \prod_{j=1}^J \frac{|s(n_{jk}, l_{jk})|\, p_j^{n_{jk}}}{n_{jk}!}$$

Row heterogeneity, column i.i.d. (see the simulation sketch below):
$$n_{jk} = \sum_{t=1}^{l_{jk}} n_{jkt}, \quad n_{jkt} \sim \text{Log}(p_j), \quad (l_{1k}, \ldots, l_{Jk}) \sim \text{Mult}(l_{\cdot k}, q_1/q_\cdot, \ldots, q_J/q_\cdot), \quad l_{\cdot k} \sim \text{Log}[q_\cdot/(c+q_\cdot)], \quad K_J \sim \text{Pois}\{\gamma [\ln(c+q_\cdot) - \ln(c)]\}.$$

Closed-form Gibbs sampling update equations for model parameters.
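A column-wise simulation sketch under the same conventions as before (function and variable names are illustrative assumptions):

```python
import numpy as np
from scipy.stats import logser

def sample_gnbp_matrix(gamma, c, p, seed=0):
    """Column-i.i.d. sketch of the GNBP matrix prior: draw K_J, then per
    column a table total l_k ~ Log, a multinomial split of tables over
    rows with weights q_j/q., and Log(p_j) customers at every table."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    J = len(p)
    q = -np.log1p(-p)                    # q_j = -ln(1 - p_j)
    q_dot = q.sum()
    K_J = rng.poisson(gamma * np.log((c + q_dot) / c))
    N = np.zeros((J, K_J), dtype=int)
    for k in range(K_J):
        l_k = logser.rvs(q_dot / (c + q_dot), random_state=rng)  # total tables
        l_jk = rng.multinomial(l_k, q / q_dot)                   # tables per row
        for j in np.nonzero(l_jk)[0]:
            N[j, k] = logser.rvs(p[j], size=l_jk[j], random_state=rng).sum()
    return N
```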
16 Example: gamma-negative binomial process

Predictive distribution of a new row:
$$p(N^+_{J+1}, L^+_{J+1} \mid N_J, L_J, \theta) = \frac{K_J!\, K^+_{J+1}!}{K_{J+1}!} \prod_{k=1}^{K_J} \text{NB}\left(l_{(J+1)k};\, l_{\cdot k}, \tfrac{q_{J+1}}{c+q_\cdot+q_{J+1}}\right) \prod_{k=K_J+1}^{K_{J+1}} \text{Log}\left(l_{(J+1)k};\, \tfrac{q_{J+1}}{c+q_\cdot+q_{J+1}}\right) \prod_{k=1}^{K_{J+1}} \text{SumLog}\left(n_{(J+1)k};\, l_{(J+1)k}, p_{J+1}\right) \text{Pois}\left\{K^+_{J+1};\, \gamma [\ln(c+q_\cdot+q_{J+1}) - \ln(c+q_\cdot)]\right\}.$$

To add a new row (see the sketch below):
- Draw $\text{NB}\left(l_{\cdot k}, \frac{q_{J+1}}{c+q_\cdot+q_{J+1}}\right)$ tables at the existing columns (dishes).
- Draw $K^+_{J+1} \sim \text{Pois}\{\gamma [\ln(c+q_\cdot+q_{J+1}) - \ln(c+q_\cdot)]\}$ new dishes.
- Draw a $\text{Log}\left(\frac{q_{J+1}}{c+q_\cdot+q_{J+1}}\right)$ number of tables at each new dish.
- Draw a $\text{Log}(p_{J+1})$ count of customers at each table and aggregate the counts across the tables of the same dish as $n_{(J+1)k} = \sum_{t=1}^{l_{(J+1)k}} n_{(J+1)kt}$.
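The four steps can be sketched as follows, assuming the latent table totals l_dot[k] = l_{·k} of the current matrix are tracked alongside N_J (all names are illustrative):

```python
import numpy as np
from scipy.stats import logser

def add_row_gnbp(l_dot, gamma, c, q_dot, p_new, rng):
    """Draw the counts of row J+1 from the GNBP predictive: NB tables at
    existing dishes, Poisson-many new dishes with Log tables, then
    Log(p_{J+1}) customers per table, aggregated per dish."""
    q_new = -np.log1p(-p_new)                     # q_{J+1}
    pt = q_new / (c + q_dot + q_new)              # table-count probability
    l_old = rng.negative_binomial(l_dot, 1.0 - pt)
    K_plus = rng.poisson(gamma * np.log((c + q_dot + q_new) / (c + q_dot)))
    l_new = [logser.rvs(pt, random_state=rng) for _ in range(K_plus)]
    row = [logser.rvs(p_new, size=l, random_state=rng).sum() if l > 0 else 0
           for l in np.concatenate([l_old, np.array(l_new, dtype=int)])]
    return np.array(row, dtype=int), K_plus       # counts at old then new dishes
```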
17 [Figure: A sequentially constructed gamma-negative binomial process random count matrix $N_J \sim \text{GNBPM}(\gamma, c, p_1, \ldots, p_J)$.]
18 Example: beta-negative binomial process

Beta-negative binomial process [Zhou et al., 2012; Broderick et al., 2015; Zhou & Carin, 2012; Heaukulani & Roy, 2013; Zhou et al., 2015]:
$$X_j \sim \text{NBP}(r_j, B), \quad B \sim \text{BP}(c, B_0)$$

Conditional likelihood:
$$p(\{X_j\}_{j=1,\ldots,J} \mid B, r) = e^{-p^* r_\cdot} \prod_{k=1}^{K_J} p_k^{n_{\cdot k}} (1-p_k)^{r_\cdot} \prod_{j=1}^J \frac{\Gamma(n_{jk} + r_j)}{n_{jk}!\, \Gamma(r_j)},$$
where $p^* = -\sum_{k=K_J+1}^{\infty} \ln(1-p_k)$.
19 Example: beta-negative binomial process

Distribution for the count matrix:
$$f(N_J \mid \gamma, c, r) = \frac{\gamma^{K_J} e^{-\gamma [\psi(c+r_\cdot) - \psi(c)]}}{K_J!} \prod_{k=1}^{K_J} \frac{\Gamma(n_{\cdot k})\,\Gamma(c+r_\cdot)}{\Gamma(c+n_{\cdot k}+r_\cdot)} \prod_{j=1}^J \frac{\Gamma(n_{jk} + r_j)}{n_{jk}!\, \Gamma(r_j)}$$

Row heterogeneity, column i.i.d.:
$$n_{:k} \sim \text{DirMult}(n_{\cdot k}, r_1, \ldots, r_J), \quad n_{\cdot k} \sim \text{Digam}(r_\cdot, c), \quad K_J \sim \text{Pois}\left\{\gamma [\psi(c+r_\cdot) - \psi(c)]\right\},$$
where $\text{Digam}(n; r, c) = \frac{1}{\psi(c+r) - \psi(c)} \frac{\Gamma(r+n)\,\Gamma(c+r)}{n\,\Gamma(c+n+r)\,\Gamma(r)}$.

Closed-form Gibbs sampling update equations for model parameters.
20 Example: beta-negative binomial process: Ice cream buffet process (a.k.a. the multi-scoop IBP [Zhou et al., 2012] and the negative binomial IBP [Heaukulani & Roy, 2013])

Sequential row-wise construction:
$$p(N^+_{J+1} \mid N_J) = \frac{K_J!\, K^+_{J+1}!}{K_{J+1}!} \prod_{k=1}^{K_J} \text{BNB}(n_{(J+1)k};\, r_{J+1}, n_{\cdot k}, c+r_\cdot) \prod_{k=K_J+1}^{K_{J+1}} \text{Digam}(n_{(J+1)k};\, r_{J+1}, c+r_\cdot)\, \text{Pois}\left\{K^+_{J+1};\, \gamma [\psi(c+r_\cdot+r_{J+1}) - \psi(c+r_\cdot)]\right\}.$$

To add a new row (see the sketch below):
- Customer J+1 takes $n_{(J+1)k} \sim \text{BNB}(r_{J+1}, n_{\cdot k}, c+r_\cdot)$ scoops of each existing ice cream (column).
- The customer further selects $K^+_{J+1} \sim \text{Pois}\{\gamma [\psi(c+r_\cdot+r_{J+1}) - \psi(c+r_\cdot)]\}$ new ice creams out of the buffet line.
- The customer takes $n_{(J+1)k} \sim \text{Digam}(r_{J+1}, c+r_\cdot)$ scoops of each new ice cream.
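A sketch of the buffet steps. The BNB draws use the exact compositional representation $p_k \sim \text{Beta}(n_{\cdot k}, c+r_\cdot)$, $n \sim \text{NB}(r_{J+1}, p_k)$; the digamma draws use a truncated inverse-CDF helper, which is my own illustrative workaround rather than anything from the talk.

```python
import numpy as np
from scipy.special import gammaln, psi

def sample_digam(r, c, rng, n_max=10000):
    """Truncated inverse-CDF draw from Digam(n; r, c), whose pmf is
    proportional to Gamma(r+n) / (n Gamma(c+r+n)) for n = 1, 2, ..."""
    n = np.arange(1, n_max + 1)
    logw = gammaln(r + n) - np.log(n) - gammaln(c + r + n)
    w = np.exp(logw - logw.max())
    return int(rng.choice(n, p=w / w.sum()))

def add_row_bnbp(N, r_new, r_dot, gamma, c, rng):
    """Customer J+1: BNB scoops at existing columns, Poisson-many new
    columns, digamma scoops at each new column."""
    p_k = rng.beta(N.sum(axis=0), c + r_dot)               # beta mixing for BNB
    scoops_old = rng.negative_binomial(r_new, 1.0 - p_k)   # NB(r_{J+1}, p_k)
    K_plus = rng.poisson(gamma * (psi(c + r_dot + r_new) - psi(c + r_dot)))
    scoops_new = [sample_digam(r_new, c + r_dot, rng) for _ in range(K_plus)]
    return scoops_old, np.array(scoops_new, dtype=int)
```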
21 [Figure: A sequentially constructed beta-negative binomial process random count matrix $N_J \sim \text{BNBPM}(\gamma, c, r_1, \ldots, r_J)$.]
22 Comparison of different priors

Variance-mean relationships for the predictive count $n_{(J+1)k}$ at an existing column, which follow from the predictive distributions above:
- NBP: $\text{Var}[n_{(J+1)k}] = E[n_{(J+1)k}] + \frac{E^2[n_{(J+1)k}]}{n_{\cdot k}}$
- GNBP: $\text{Var}[n_{(J+1)k}] = \frac{E[n_{(J+1)k}]}{1-p_{J+1}} + \frac{E^2[n_{(J+1)k}]}{l_{\cdot k}}$
- BNBP: $\text{Var}[n_{(J+1)k}] = \frac{n_{\cdot k}+c+r_\cdot-1}{c+r_\cdot-2}\left(E[n_{(J+1)k}] + \frac{E^2[n_{(J+1)k}]}{n_{\cdot k}}\right)$
23 [Figure: three simulated random count matrices each from the NBP, GNBP, and BNBP (axes: rows by columns).]
24 Training and posterior predictive checking

[Figure: (a) the observed count matrix; (b) a simulated NBP random count matrix; (c) a simulated GNBP random count matrix; (d) a simulated BNBP random count matrix. Axes: documents by words.]
25 Infinite vocabulary naive Bayes classifiers: predictive distribution of a new row vector

The predictive distribution of a row vector $n_{J+1}$ is
$$p(n_{J+1} \mid N_J, \theta) = \frac{p(N^+_{J+1} \mid N_J, \theta)}{K^+_{J+1}!} \qquad (1)$$
$$= \frac{K_J!}{K_{J+1}!} \frac{f(N_{J+1} \mid \theta)}{f(N_J \mid \theta)}. \qquad (2)$$

The normalizing constant $1/K^+_{J+1}!$ in (1) arises because the mapping from a realization of $N^+_{J+1}$ to $n_{J+1}$ is one-to-many, with $K^+_{J+1}!$ distinct orderings of these new columns. The normalizing constant $K_J!/K_{J+1}!$ in (2) arises because there are $\prod_{i=1}^{K^+_{J+1}} (K_J + i) = K_{J+1}!/K_J!$ ways to insert the $K^+_{J+1}$ new columns into the original ordered $K_J$ columns, which is again a one-to-many mapping; the snippet below checks this.
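The second coefficient is a purely combinatorial fact that can be verified by brute force; this illustrative snippet enumerates all orderings of $K_J$ old plus $K^+_{J+1}$ new columns and counts those preserving the relative order of the old ones.

```python
from itertools import permutations
from math import factorial

def count_interleavings(K_J, K_plus):
    """Number of orderings of K_J old + K_plus new columns in which the
    old columns keep their relative order; equals K_{J+1}!/K_J!."""
    old = [('old', i) for i in range(K_J)]
    new = [('new', i) for i in range(K_plus)]
    return sum([x for x in perm if x[0] == 'old'] == old
               for perm in permutations(old + new))

assert count_interleavings(3, 2) == factorial(5) // factorial(3)  # 20 ways
```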
26 Infinite vocabulary naive Bayes classifiers

- Each category is summarized as a random count matrix $N_J$; columns with all zeros are excluded.
- Gibbs sampling is used to infer the parameters $\theta$ that generate $N_J$; to represent the posterior of $\theta$, S MCMC samples $\{\theta^{[s]}\}_{s=1,\ldots,S}$ are collected.
- For a test row count vector $n_{J+1}$, its predictive likelihood given $N_J$ is calculated via Monte Carlo integration (sketched below), using
$$p(n_{J+1} \mid N_J) = \frac{1}{S} \sum_{s=1}^S \frac{p(N^+_{J+1} \mid N_J, \theta^{[s]})}{K^+_{J+1}!}$$
for both the NBP and BNBP, and using
$$p(n_{J+1} \mid N_J) = \frac{1}{S} \sum_{s=1}^S \frac{p(N^+_{J+1} \mid N_J, L_J^{[s]}, \theta^{[s]})}{K^+_{J+1}!}$$
for the GNBP.
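In code, the classification step is a log-mean-exp over posterior samples per category. The sketch below assumes a user-supplied log_pred_lik(n_new, theta) implementing the process-specific term $\log[p(N^+_{J+1} \mid N_J, \theta)/K^+_{J+1}!]$; that interface is a hypothetical of mine, not the authors'.

```python
import numpy as np

def predict_category(n_new, category_samples, log_pred_lik):
    """Infinite vocabulary naive Bayes decision: average the predictive
    likelihood over S MCMC samples per category, then pick the argmax."""
    scores = []
    for theta_samples in category_samples:   # one list of S samples per category
        logs = np.array([log_pred_lik(n_new, th) for th in theta_samples])
        m = logs.max()
        scores.append(m + np.log(np.exp(logs - m).mean()))  # log of the MC average
    return int(np.argmax(scores))
```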
27 Infinite vocabulary naive Bayes classifiers

[Figure: the Mac.Hardware and Politics.Guns term-frequency bar plots from slide 5, revisited: a new test document may bring terms unseen in any training category.]
28 Infinite vocabulary naive Bayes classifiers

Figure: Document categorization results on the 20 Newsgroups dataset with (a) an unconstrained vocabulary that can grow to infinity, and (b) a predetermined finite vocabulary of fixed size V, using the negative binomial process (NBP), gamma-negative binomial process (GNBP), and beta-negative binomial process (BNBP). Accuracy is plotted against the ratio of training documents. The results of the multinomial naive Bayes classifier using Laplace smoothing are included for comparison.
29 Infinite vocabulary naive Bayes classifiers

Figure: Plots analogous to those in the previous figure, for the TDT dataset with a predetermined finite vocabulary of fixed size V. Accuracy is plotted against the ratio of training documents.
30 Infinite vocabulary naive Bayes classifiers

Figure: (a) The predicted probabilities of the test documents under different categories for the CNAE-9 dataset, using the GNBP nonparametric Bayesian naive Bayes classifier with a fixed percentage of the documents of each of the nine categories used for training. (b) Boxplots of the categorization accuracies, each computed with a different number S of MCMC samples.
31 Random count matrices and mixed-membership modeling: BNBP mixed-membership modeling

Construct EPPFs for mixture modeling using priors for random count vectors [Zhou & Walker, 2014].

One way to generate a random count vector $(n_1, \ldots, n_l)$: draw $l$, the length of the vector, and then draw independent positive random counts $\{n_k\}_{k=1,\ldots,l}$.

Another way to generate such a random count vector: draw a total count $n$, and partition it using an EPPF, resulting in a set of exchangeable categorical variables $z = (z_1, \ldots, z_n)$; then map $z$ to a random positive count vector $(n_1, \ldots, n_l)$, where $n_k := \sum_{i=1}^n \delta(z_i = k) > 0$.

Both ways lead to the same distribution of $(n_1, \ldots, n_l)$ if and only if
$$P(n_1, \ldots, n_l, n) = \frac{n!}{l! \prod_{k=1}^l n_k!} P(z, n).$$

(Sample size dependent) EPPF for mixture modeling:
$$P(z \mid n) = \frac{P(z, n)}{P(n)} = \frac{l! \prod_{k=1}^l n_k!}{n!} \frac{P(n_1, \ldots, n_l, n)}{P(n)}.$$
32 Random count matrices and mixed-membership modeling: BNBP mixed-membership modeling

Construct EPPFs for mixed-membership modeling using priors for random count matrices [Zhou, 2014].

BNBP random count matrix prior:
$$f(N_J \mid r, \gamma, c) = \frac{\gamma^{K_J} e^{-\gamma [\psi(c+r_\cdot) - \psi(c)]}}{K_J!} \prod_{k=1}^{K_J} \frac{\Gamma(n_{\cdot k})\,\Gamma(c+r_\cdot)}{\Gamma(c+n_{\cdot k}+r_\cdot)} \prod_{j=1}^J \frac{\Gamma(n_{jk}+r_j)}{n_{jk}!\,\Gamma(r_j)}$$

With $z = (z_{11}, \ldots, z_{Jm_J})$ and $n_{jk} = \sum_{i=1}^{m_j} \delta(z_{ji} = k)$, the joint distribution of a column count vector $m = (m_1, \ldots, m_J)^T$ and its partition into a column-exchangeable latent random count matrix with $K_J$ nonempty columns can be expressed as
$$f(z, m \mid r, \gamma, c) = \frac{K_J! \prod_{j=1}^J \prod_{k=1}^{K_J} n_{jk}!}{\prod_{j=1}^J m_j!}\, f(N_J \mid r, \gamma, c) = \frac{\gamma^{K_J} e^{-\gamma [\psi(c+r_\cdot) - \psi(c)]}}{\prod_{j=1}^J m_j!} \prod_{k=1}^{K_J} \frac{\Gamma(n_{\cdot k})\,\Gamma(c+r_\cdot)}{\Gamma(c+n_{\cdot k}+r_\cdot)} \prod_{j=1}^J \frac{\Gamma(n_{jk}+r_j)}{\Gamma(r_j)}.$$
33 Random count matrices and mixed-membership modeling: BNBP mixed-membership modeling

The BNBP's EPPF for mixed-membership modeling:
$$f(z \mid m, r, \gamma, c) = \frac{f(z, m \mid r, \gamma, c)}{f(m \mid r, \gamma, c)} = \frac{K_J! \prod_{j=1}^J \prod_{k=1}^{K_J} n_{jk}!}{\prod_{j=1}^J m_j!} \frac{f(N_J \mid r, \gamma, c)}{f(m \mid r, \gamma, c)}.$$

The prediction rule is simple (see the sketch below):
$$P(z_{ji} = k \mid z^{-ji}, m, r, \gamma, c) = \frac{f(z_{ji} = k, z^{-ji}, m \mid r, \gamma, c)}{\sum_{k'=1}^{K_J^{-ji}+1} f(z_{ji} = k', z^{-ji}, m \mid r, \gamma, c)} \propto \begin{cases} \dfrac{n^{-ji}_{\cdot k}}{c + n^{-ji}_{\cdot k} + r_\cdot} \left(n^{-ji}_{jk} + r_j\right), & \text{for } k = 1, \ldots, K^{-ji}_J; \\[6pt] \dfrac{\gamma\, r_j}{c + r_\cdot}, & \text{if } k = K^{-ji}_J + 1. \end{cases}$$
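A compact (and deliberately unoptimized) sketch of one collapsed Gibbs sweep driven by this prediction rule; the data structures and names are illustrative assumptions.

```python
import numpy as np

def gibbs_sweep_bnbp(z, r, gamma, c, rng):
    """z[j][i] is the cluster label of data point i in group j; each point
    is removed and resampled from the BNBP prediction rule above."""
    r = np.asarray(r, dtype=float)
    r_dot = r.sum()
    for j in range(len(z)):
        for i in range(len(z[j])):
            z[j][i] = -1                                     # hold out point ji
            labels = sorted({k for zj in z for k in zj if k >= 0})
            n_dot = [sum(zj.count(k) for zj in z) for k in labels]
            n_jk = [z[j].count(k) for k in labels]
            w = [nd / (c + nd + r_dot) * (njk + r[j])        # existing clusters
                 for nd, njk in zip(n_dot, n_jk)]
            w.append(gamma * r[j] / (c + r_dot))             # new cluster
            pick = rng.choice(len(w), p=np.array(w) / np.sum(w))
            z[j][i] = labels[pick] if pick < len(labels) else max(labels, default=-1) + 1
    return z
```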
34 Random count matrices and mixed-membership modeling: random count matrices with fixed row sums

Figure: Random draws from the EPPF that governs the BNBP's exchangeable random partitions of groups (rows) under three settings of $r_i$ (panels a-c), where each group has the same number $m_j$ of data points. The jth row of each matrix, which sums to $m_j$, represents the partition of the $m_j$ data points of the jth group over a random number of exchangeable clusters. The kth column of each matrix represents the kth nonempty cluster in order of appearance in Gibbs sampling (the empty clusters are deleted).
35 Random count matrices and mixed-membership modeling: GNBP mixed-membership modeling

GNBP random count matrix prior:
$$f(N_J, L_J \mid \gamma, c, p) = \frac{\gamma^{K_J} \exp\left[-\gamma \ln\left(\frac{c+q_\cdot}{c}\right)\right]}{K_J!} \prod_{k=1}^{K_J} \frac{\Gamma(l_{\cdot k})}{(c+q_\cdot)^{l_{\cdot k}}} \prod_{j=1}^J \frac{|s(n_{jk}, l_{jk})|\, p_j^{n_{jk}}}{n_{jk}!}$$

With $z = (z_{11}, \ldots, z_{Jm_J})$, $b = (b_{11}, \ldots, b_{Jm_J})$, and $n_{jkt} = \sum_{i=1}^{m_j} \delta(z_{ji} = k, b_{ji} = t)$, the joint distribution of a column count vector $m = (m_1, \ldots, m_J)^T$, its partition into a column-exchangeable latent random count matrix with $K_J$ nonempty columns, and an auxiliary categorical random vector can be expressed as
$$f(b, z, m \mid \gamma, c, p) = \gamma^{K_J} e^{-\gamma \ln\left(\frac{c+q_\cdot}{c}\right)} \left(\prod_{j=1}^J \frac{p_j^{m_j}}{m_j!}\right) \prod_{k=1}^{K_J} \left[\frac{\Gamma(l_{\cdot k})}{(c+q_\cdot)^{l_{\cdot k}}} \prod_{j=1}^J \prod_{t=1}^{l_{jk}} \Gamma(n_{jkt})\right].$$
36 Random count matrices and mixed-membership modeling: GNBP mixed-membership modeling

The GNBP's EPPF for mixed-membership modeling:
$$f(z, b \mid m, \gamma, c, p) = \frac{f(z, b, m \mid \gamma, c, p)}{f(m \mid \gamma, c, p)}.$$

The prediction rule is simple (see the sketch below):
$$P(z_{ji} = k, b_{ji} = t \mid b^{-ji}, z^{-ji}, m, p, c) = \frac{f(z_{ji} = k, b_{ji} = t, b^{-ji}, z^{-ji}, m \mid p, c)}{\sum_{z_{ji}, b_{ji}} f(z_{ji}, b_{ji}, b^{-ji}, z^{-ji}, m \mid p, c)} \propto \begin{cases} n^{-ji}_{jkt}, & \text{if } k \le K^{-ji}_J,\ t \le l^{-ji}_{jk}; \\ l^{-ji}_{\cdot k}/(c+q_\cdot), & \text{if } k \le K^{-ji}_J,\ t = l^{-ji}_{jk} + 1; \\ \gamma/(c+q_\cdot), & \text{if } k = K^{-ji}_J + 1,\ t = 1. \end{cases}$$

If we let $z_{ji}$ be the dish index and $b_{ji}$ be the table index for customer i in restaurant j, then the collapsed Gibbs sampler can be related to the Chinese restaurant franchise sampler of the hierarchical Dirichlet process [Teh et al., 2006].
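For one data point, the rule enumerates (dish, table) options exactly as in a Chinese restaurant franchise. A hedged sketch follows; the bookkeeping structures are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sample_dish_table(tables_j, l_dot, gamma, c, q_dot, rng):
    """Draw (k, t) for one customer in group j: tables_j[k][t] holds the
    other customers at table t of dish k in this group, l_dot[k] the total
    tables serving dish k across all groups."""
    K = len(tables_j)
    options, w = [], []
    for k in range(K):
        for t, cnt in enumerate(tables_j[k]):
            options.append((k, t)); w.append(cnt)                    # join an occupied table
        options.append((k, len(tables_j[k]))); w.append(l_dot[k] / (c + q_dot))  # new table
    options.append((K, 0)); w.append(gamma / (c + q_dot))            # new dish, first table
    w = np.asarray(w, dtype=float)
    return options[rng.choice(len(options), p=w / w.sum())]
```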
37 Conclusions

- A family of probability mass functions for random count matrices.
- The proposed random count matrices have a random number of i.i.d. columns and can also be constructed by adding one row at a time.
- Their parameters can be inferred with closed-form Gibbs sampling update equations.
- Infinite vocabulary naive Bayes classifiers.
- Priors for random count matrices can be used to construct (group size dependent) EPPFs for mixed-membership modeling, with simple prediction rules for collapsed Gibbs sampling.
38 Main References

- M. Zhou, O. H. M. Padilla, and J. G. Scott. Priors for random count matrices derived from a family of negative binomial processes. arXiv preprint, 2014.
- M. Zhou. Beta-negative binomial process and exchangeable random partitions for mixed-membership modeling. In NIPS, 2014.
- M. Zhou and S. G. Walker. Sample size dependent species models. arXiv preprint, 2014.
- C. Heaukulani and D. M. Roy. The combinatorial structure of beta negative binomial processes. arXiv preprint, 2013.
- T. Broderick, L. Mackey, J. Paisley, and M. I. Jordan. Combinatorial clustering and the beta negative binomial process. IEEE Trans. Pattern Analysis and Machine Intelligence, 2015.
- M. Zhou and L. Carin. Negative binomial process count and mixture modeling. IEEE Trans. Pattern Analysis and Machine Intelligence, 2015.
- M. Zhou and L. Carin. Augment-and-conquer negative binomial processes. In NIPS, 2012.
- M. Zhou, L. Hannah, D. Dunson, and L. Carin. Beta-negative binomial process and Poisson factor analysis. In AISTATS, 2012.