Priors for Random Count Matrices with Random or Fixed Row Sums


Priors for Random Count Matrices with Random or Fixed Row Sums
Mingyuan Zhou
Joint work with Oscar Madrid and James Scott
IROM Department, McCombs School of Business, and Department of Statistics and Data Sciences, The University of Texas at Austin
Conference on Bayesian Nonparametrics, Raleigh, NC, June

Table of Contents
- Motivations
- How to construct an infinite random count matrix?
- Priors for random count matrices
- Infinite vocabulary naive Bayes classifiers
- Random count matrices and mixed-membership modeling
- Conclusions

Motivations — Where do random count matrices appear?
Directly observable random count matrices:
- Text analysis: document-word count matrix
- DNA sequencing: sample-gene count matrix
- Social network analysis: user-venue check-in count matrix
- Consumer behavior: consumer-product count matrix
Latent random count matrices:
- Topic models [Blei et al.]: document-topic count matrix (the sum of each row is the length of the corresponding document)
- Hidden Markov models: state-state transition count matrix

Motivations — Motivations to Study Random Count Matrices
- Lack of priors to describe random count matrices with a potentially infinite number of rows/columns.
- A naive Bayes classifier often requires a predetermined vocabulary shared across all categories, and has to ignore previously unseen features/terms. How to calculate the predictive distribution of a new count vector that brings previously unseen terms?
- Interesting combinatorial structures unique to infinite random count matrices.
- Priors for random count matrices can be used to construct priors for mixed-membership modeling.

Motivations — Representation of a count vector under a count matrix
[Figure: term-frequency bar plots of example Mac.Hardware and Politics.Guns documents over each category's observed terms, with previously unseen ("new") terms shown at the end.]

Motivations — Infinite random count matrices to be studied
- No natural upper bound on the number of rows or columns
- Conditionally independent rows, i.i.d. columns
- Parallel column-wise construction
- Sequential row-wise constructions
- Predictive distribution of a new row count vector that brings new features
- Random count matrices with fixed row sums for mixed-membership modeling

How to construct an infinite random count matrix? — Related prior distributions
Prior distributions for counts:
- Poisson, logarithmic, and digamma distributions
- Negative binomial, beta-negative binomial, and gamma-negative binomial distributions
- Poisson-logarithmic bivariate distribution [Zhou & Carin]
Generating a random count vector:
- Chinese restaurant process, Pitman-Yor process
- Normalized random measures with independent increments [Regazzini, Lijoi & Prünster; James, Lijoi & Prünster, 2009]
- Exchangeable partition probability functions (EPPFs) [Pitman]; size-dependent EPPFs [Zhou & Walker]
Generating an infinite random binary matrix:
- Indian buffet process [Griffiths & Ghahramani]; beta-Bernoulli process [Thibaux & Jordan, 2007]
Generating an infinite random count matrix: how?

How to construct an infinite random count matrix? — Steps
- Choose a completely random measure G, a draw from which consists of countably infinite atoms: G = \sum_{k=1}^{∞} r_k δ_{ω_k}.
- For X_j := \sum_{k=1}^{∞} n_{jk} δ_{ω_k}, draw counts n_{jk} ~ f(r_k, θ_j), where f denotes a count distribution parameterized by r_k and θ_j.
- Denote n_{:k} = (n_{1k}, ..., n_{Jk})^T and n_{·k} = \sum_{j=1}^{J} n_{jk}. The count matrix N_J is constructed by organizing all the nonzero column count vectors, {n_{:k}}_{k: n_{·k} > 0}, in an arbitrary order into a random count matrix.
- In practice, we cannot instantiate all the atoms of G. Thus we have to marginalize G out from {X_j}_{1,J} to construct N_J.
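
To make the recipe concrete, here is a minimal Python sketch (all function and variable names are mine, not from the talk) that approximates the gamma-Poisson instance with a finite truncation of G: a draw with K_trunc i.i.d. Gamma(γ/K_trunc, 1/c) weights approaches a gamma process draw as K_trunc grows, so the nonzero columns of the resulting Poisson counts approximate a draw of N_J.

```python
import numpy as np

def truncated_gamma_poisson_matrix(J=5, gamma=3.0, c=1.0, K_trunc=5000, seed=0):
    """Finite-truncation sketch of the gamma-Poisson construction:
    draw atom weights r_k from an approximate gamma process draw,
    draw Poisson counts n_jk ~ Pois(r_k), and keep the nonzero columns."""
    rng = np.random.default_rng(seed)
    # Approximate G = sum_k r_k delta_{omega_k} by K_trunc i.i.d. gamma weights.
    r = rng.gamma(shape=gamma / K_trunc, scale=1.0 / c, size=K_trunc)
    # Counts n_jk ~ Pois(r_k), drawn independently for each of the J rows.
    X = rng.poisson(lam=r, size=(J, K_trunc))
    # N_J keeps only the columns whose total count is nonzero.
    return X[:, X.sum(axis=0) > 0]

N_J = truncated_gamma_poisson_matrix()
print(N_J.shape)  # (J, K_J) with K_J random
```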

Priors for random count matrices — Example: gamma-Poisson or negative binomial process
Gamma-Poisson process [Titsias; Zhou & Carin; Zhou et al.]:
X_j ~ PP(G),  G ~ ΓP(G_0, 1/c)
Conditional likelihood:
p({X_j}_{1,J} | G) = \prod_{k=1}^{∞} \frac{r_k^{n_{·k}}}{\prod_{j=1}^{J} n_{jk}!} e^{−J r_k} = e^{−J G(Ω\D)} \prod_{k=1}^{K_J} \frac{r_k^{n_{·k}} e^{−J r_k}}{\prod_{j=1}^{J} n_{jk}!}
To marginalize G out, one may separate Ω into the absolutely continuous space and the points of discontinuity, and then apply the characteristic function to G(Ω\D) and the Lévy measure of G to each point of discontinuity.
The mapping from {X_j}_{1,J} to N_J is one-to-(K_J!), thus
f(N_J | γ, c) = E_G[p({X_j}_{1,J} | G)] / K_J!

Priors for random count matrices — Example: gamma-Poisson or negative binomial process
Exchangeable rows and i.i.d. columns
Distribution for the count matrix:
f(N_J | γ, c) = \frac{γ^{K_J} \exp[−γ \ln(\frac{J+c}{c})]}{K_J!} \prod_{k=1}^{K_J} \frac{Γ(n_{·k})}{(J+c)^{n_{·k}} \prod_{j=1}^{J} n_{jk}!}
Row exchangeable, column i.i.d.:
n_{:k} ~ Multinomial(n_{·k}, 1/J, ..., 1/J),  n_{·k} ~ Log[J/(J+c)],  K_J ~ Pois{γ [ln(J+c) − ln(c)]}
Closed-form Gibbs sampling update equations for the model parameters.
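
The three distributions above fully specify a direct column-wise simulator. A minimal sketch under this slide's parameterization (function and variable names are mine); Log(p) is the logarithmic (log-series) distribution, which numpy provides as logseries:

```python
import numpy as np

def draw_nbp_count_matrix(J, gamma, c, rng=None):
    """Column-wise draw of an NBP random count matrix:
    K_J ~ Pois(gamma*[ln(J+c)-ln(c)]), each column total n_k ~ Log(J/(J+c)),
    and n_:k ~ Multinomial(n_k, 1/J, ..., 1/J)."""
    rng = np.random.default_rng() if rng is None else rng
    K_J = rng.poisson(gamma * (np.log(J + c) - np.log(c)))
    N = np.zeros((J, K_J), dtype=int)
    for k in range(K_J):
        n_k = rng.logseries(J / (J + c))                  # column total, >= 1
        N[:, k] = rng.multinomial(n_k, np.full(J, 1.0 / J))
    return N

N_J = draw_nbp_count_matrix(J=4, gamma=5.0, c=1.0, rng=np.random.default_rng(1))
print(N_J)
```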


Priors for random count matrices — Example: gamma-Poisson or negative binomial process
Sequential row-wise construction:
p(n^+_{J+1} | N_J, θ) = \frac{f(N_{J+1} | θ)}{f(N_J | θ)} = \frac{K_J! K^+_{J+1}!}{K_{J+1}!} \prod_{k=1}^{K_J} NB(n_{(J+1)k}; n_{·k}, \frac{1}{J+c+1}) \prod_{k=K_J+1}^{K_{J+1}} Log(n_{(J+1)k}; \frac{1}{J+c+1}) \, Pois(K^+_{J+1}; γ [\ln(J+c+1) − \ln(J+c)])
To add a new row to N_J ∈ Z_{≥0}^{J×K_J}:
- First, draw a count NB(n_{·k}, p_{J+1}) at each existing column, where p_{J+1} = 1/(J+c+1).
- Second, draw K^+_{J+1} ~ Pois{γ [ln(J+c+1) − ln(J+c)]} new columns.
- Third, draw a Log(p_{J+1}) random count at each new column.
The combinatorial coefficient arises because the newly added columns are inserted into the original ones at random locations, with their relative orders preserved.
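
A hedged sketch of this row-wise construction (names are mine; note that numpy's negative_binomial counts failures, so the slide's NB(n_k, p) corresponds to passing 1 − p as numpy's success probability):

```python
import numpy as np

def add_nbp_row(N, gamma, c, rng):
    """Append row J+1 to an NBP count matrix N of shape (J, K_J)."""
    J, K_J = N.shape
    p_next = 1.0 / (J + c + 1.0)                        # p_{J+1}
    n_dot = N.sum(axis=0)
    # Counts at existing columns: NB(n_k, p_next), mean n_k * p_next / (1 - p_next).
    existing = rng.negative_binomial(n_dot, 1.0 - p_next)
    # Number of brand-new columns, then a logarithmic count at each of them.
    K_plus = rng.poisson(gamma * (np.log(J + c + 1) - np.log(J + c)))
    new = rng.logseries(p_next, size=K_plus)
    grown = np.zeros((J + 1, K_J + K_plus), dtype=int)
    grown[:J, :K_J] = N
    grown[J, :] = np.concatenate([existing, new])
    return grown

rng = np.random.default_rng(3)
N = np.array([[2, 0, 1], [0, 3, 1]])                    # a toy 2 x 3 count matrix
print(add_nbp_row(N, gamma=5.0, c=1.0, rng=rng))
```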

Priors for random count matrices — Example: gamma-Poisson or negative binomial process
Figure: A sequentially constructed negative binomial process random count matrix N_J ~ NBPM(γ, c). [Plot of the matrix, rows versus columns.]

Priors for random count matrices — Example: gamma-negative binomial process
Gamma-negative binomial process [Zhou & Carin; Zhou et al.]:
X_j ~ NBP(G, p_j),  G ~ ΓP(G_0, 1/c)
Conditional likelihood:
p({X_j}_{1,J} | G, p) = \prod_{k=1}^{∞} \prod_{j=1}^{J} \frac{Γ(n_{jk} + r_k)}{n_{jk}! Γ(r_k)} p_j^{n_{jk}} (1 − p_j)^{r_k}
Augmented likelihood:
p({X_j, L_j}_{1,J} | G, p) = e^{−q_· G(Ω\D)} \prod_{k=1}^{K_J} r_k^{l_{·k}} e^{−q_· r_k} \left( \prod_{j=1}^{J} \frac{|s(n_{jk}, l_{jk})| \, p_j^{n_{jk}}}{n_{jk}!} \right)
where q_j = −ln(1 − p_j), q_· = \sum_{j=1}^{J} q_j, and |s(n, l)| denotes an unsigned Stirling number of the first kind.

Priors for random count matrices — Example: gamma-negative binomial process
Distribution for the (augmented) count matrix:
f(N_J, L_J | θ) = \frac{γ^{K_J} \exp[−γ \ln(\frac{c+q_·}{c})]}{K_J!} \prod_{k=1}^{K_J} \frac{Γ(l_{·k})}{(c+q_·)^{l_{·k}}} \left( \prod_{j=1}^{J} \frac{|s(n_{jk}, l_{jk})| \, p_j^{n_{jk}}}{n_{jk}!} \right)
Row heterogeneity, column i.i.d.:
n_{jk} = \sum_{t=1}^{l_{jk}} n_{jkt},  n_{jkt} ~ Log(p_j),
(l_{1k}, ..., l_{Jk}) ~ Mult(l_{·k}, q_1/q_·, ..., q_J/q_·),  l_{·k} ~ Log[q_·/(c+q_·)],  K_J ~ Pois{γ [ln(c+q_·) − ln(c)]}
Closed-form Gibbs sampling update equations for the model parameters.
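
A minimal column-wise GNBP simulator based on the table augmentation above (names are mine): draw the number of columns, each column's table total, split the tables across rows, and seat a logarithmic count at every table.

```python
import numpy as np

def draw_gnbp_count_matrix(gamma, c, p, rng):
    """Column-wise draw of a GNBP random count matrix via its table augmentation.
    p is the length-J vector of row-specific negative binomial probabilities."""
    p = np.asarray(p, dtype=float)
    J = p.size
    q = -np.log1p(-p)                                   # q_j = -ln(1 - p_j)
    q_dot = q.sum()
    K_J = rng.poisson(gamma * (np.log(c + q_dot) - np.log(c)))
    N = np.zeros((J, K_J), dtype=int)
    for k in range(K_J):
        l_k = rng.logseries(q_dot / (c + q_dot))        # total tables at column k
        l_jk = rng.multinomial(l_k, q / q_dot)          # tables per row
        for j in range(J):
            # each table seats a Log(p_j) count; sum them to obtain n_jk
            N[j, k] = rng.logseries(p[j], size=l_jk[j]).sum()
    return N

rng = np.random.default_rng(2)
print(draw_gnbp_count_matrix(gamma=4.0, c=1.0, p=[0.3, 0.5, 0.7], rng=rng))
```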

Priors for random count matrices — Example: gamma-negative binomial process
Predictive distribution of a new row:
p(n^+_{J+1}, L^+_{J+1} | N_J, L_J, θ) = \frac{K_J! K^+_{J+1}!}{K_{J+1}!} \prod_{k=1}^{K_J} NB(l_{(J+1)k}; l_{·k}, \frac{q_{J+1}}{c+q_·+q_{J+1}}) \prod_{k=K_J+1}^{K_{J+1}} Log(l_{(J+1)k}; \frac{q_{J+1}}{c+q_·+q_{J+1}}) \prod_{k=1}^{K_{J+1}} SumLog(n_{(J+1)k}; l_{(J+1)k}, p_{J+1}) \, Pois(K^+_{J+1}; γ [\ln(c+q_·+q_{J+1}) − \ln(c+q_·)])
To add a new row:
- Draw NB(l_{·k}, q_{J+1}/(c+q_·+q_{J+1})) tables at each existing column (dish).
- Draw K^+_{J+1} ~ Pois{γ [ln(c+q_·+q_{J+1}) − ln(c+q_·)]} new dishes.
- Draw a Log(q_{J+1}/(c+q_·+q_{J+1})) number of tables at each new dish.
- Draw a Log(p_{J+1}) count of customers at each table and aggregate the counts across the tables of the same dish: n_{(J+1)k} = \sum_{t=1}^{l_{(J+1)k}} n_{(J+1)kt}.
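
A sketch of these four steps (names are mine); it works with the latent table-count matrix L_J rather than N_J, since the table totals l_{·k} drive the draws at the existing dishes:

```python
import numpy as np

def add_gnbp_row(L, gamma, c, p, p_new, rng):
    """Append a row to a GNBP matrix, given the latent table counts L (J x K_J),
    the existing rows' probabilities p, and the new row's probability p_new.
    Returns (table counts of the new row, aggregated counts of the new row)."""
    J, K_J = L.shape
    q = -np.log1p(-np.asarray(p, dtype=float))
    q_dot, q_new = q.sum(), -np.log1p(-p_new)
    prob = q_new / (c + q_dot + q_new)
    # Tables at existing dishes: NB(l_k, prob); numpy's NB counts failures.
    l_existing = rng.negative_binomial(L.sum(axis=0), 1.0 - prob)
    # New dishes and a logarithmic number of tables at each of them.
    K_plus = rng.poisson(gamma * (np.log(c + q_dot + q_new) - np.log(c + q_dot)))
    l_new = rng.logseries(prob, size=K_plus)
    l_row = np.concatenate([l_existing, l_new])
    # Each table seats a Log(p_new) count; aggregate per dish.
    n_row = np.array([rng.logseries(p_new, size=l).sum() for l in l_row], dtype=int)
    return l_row, n_row

rng = np.random.default_rng(5)
L = np.array([[1, 0, 2], [1, 1, 0]])                    # toy latent table counts
print(add_gnbp_row(L, gamma=3.0, c=1.0, p=[0.4, 0.6], p_new=0.5, rng=rng))
```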

Priors for random count matrices — Example: gamma-negative binomial process
Figure: A sequentially constructed gamma-negative binomial process random count matrix N_J ~ GNBPM(γ, c, p_1, ..., p_J). [Plot of the matrix, rows versus columns.]

Priors for random count matrices — Example: beta-negative binomial process
Beta-negative binomial process [Zhou et al.; Broderick et al.; Zhou & Carin; Heaukulani & Roy]:
X_j ~ NBP(r_j, B),  B ~ BP(c, B_0)
Conditional likelihood:
p({X_j}_{1,J} | B, r) = e^{−p^* r_·} \prod_{k=1}^{K_J} p_k^{n_{·k}} (1 − p_k)^{r_·} \prod_{j=1}^{J} \frac{Γ(n_{jk} + r_j)}{n_{jk}! Γ(r_j)}
where p^* = −\sum_{k=K_J+1}^{∞} \ln(1 − p_k) and r_· = \sum_{j=1}^{J} r_j.

Priors for random count matrices — Example: beta-negative binomial process
Distribution for the count matrix:
f(N_J | γ, c, r) = \frac{γ^{K_J} e^{−γ [ψ(c+r_·) − ψ(c)]}}{K_J!} \prod_{k=1}^{K_J} \frac{Γ(n_{·k}) Γ(c+r_·)}{Γ(c+n_{·k}+r_·)} \prod_{j=1}^{J} \frac{Γ(n_{jk}+r_j)}{n_{jk}! Γ(r_j)}
Row heterogeneity, column i.i.d.:
n_{:k} ~ DirMult(n_{·k}, r_1, ..., r_J),  n_{·k} ~ Digam(r_·, c),  K_J ~ Pois{γ [ψ(c+r_·) − ψ(c)]}
where the digamma distribution has probability mass function
Digam(n; r, c) = \frac{1}{ψ(c+r) − ψ(c)} \cdot \frac{Γ(r+n) Γ(c+r)}{n \, Γ(c+n+r) Γ(r)},  n = 1, 2, ...
Closed-form Gibbs sampling update equations for the model parameters.

Priors for random count matrices — Example: beta-negative binomial process
Ice cream buffet process (a.k.a. the multi-scoop IBP [Zhou et al.] and the negative binomial IBP [Heaukulani & Roy])
Sequential row-wise construction:
p(n^+_{J+1} | N_J) = \frac{K_J! K^+_{J+1}!}{K_{J+1}!} \prod_{k=1}^{K_J} BNB(n_{(J+1)k}; r_{J+1}, n_{·k}, c+r_·) \prod_{k=K_J+1}^{K_{J+1}} Digam(n_{(J+1)k}; r_{J+1}, c+r_·) \, Pois(K^+_{J+1}; γ [ψ(c+r_·+r_{J+1}) − ψ(c+r_·)])
To add a new row:
- Customer J+1 takes n_{(J+1)k} ~ BNB(r_{J+1}, n_{·k}, c+r_·) scoops of each existing ice cream (column).
- The customer further selects K^+_{J+1} ~ Pois{γ [ψ(c+r_·+r_{J+1}) − ψ(c+r_·)]} new ice creams from the buffet line.
- The customer takes n_{(J+1)k} ~ Digam(r_{J+1}, c+r_·) scoops of each new ice cream.
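
A sketch of one customer's pass through the buffet (all names are mine): BNB draws are obtained by compounding a beta draw with a negative binomial draw, and the digamma distribution is sampled here by normalizing its PMF over a large truncated support, which is an approximation I introduce for illustration.

```python
import numpy as np
from scipy.special import gammaln, psi

def sample_digamma(r, c, rng, n_max=10000):
    """Draw from Digam(r, c) by normalizing its PMF over {1, ..., n_max};
    the truncated tail mass is negligible for moderate parameter values."""
    n = np.arange(1, n_max + 1)
    logpmf = (gammaln(r + n) + gammaln(c + r) - np.log(n)
              - gammaln(c + n + r) - gammaln(r) - np.log(psi(c + r) - psi(c)))
    pmf = np.exp(logpmf)
    return rng.choice(n, p=pmf / pmf.sum())

def icecream_buffet_new_row(N, r, r_new, gamma, c, rng):
    """Counts taken by customer J+1: scoops at existing and at new ice creams."""
    n_dot = N.sum(axis=0)
    r_dot = float(np.sum(r))
    # Scoops at existing ice creams: BNB(r_new, n_dot_k, c + r_dot),
    # drawn as p_k ~ Beta(n_dot_k, c + r_dot), then NB(r_new, p_k).
    p_k = rng.beta(n_dot, c + r_dot)
    existing = rng.negative_binomial(r_new, 1.0 - p_k)   # numpy counts failures
    # New ice creams chosen from the buffet line, each with a Digam count.
    K_plus = rng.poisson(gamma * (psi(c + r_dot + r_new) - psi(c + r_dot)))
    new = np.array([sample_digamma(r_new, c + r_dot, rng) for _ in range(K_plus)],
                   dtype=int)
    return np.concatenate([existing, new])

rng = np.random.default_rng(4)
N = np.array([[2, 0, 1], [0, 3, 1]])                     # toy training matrix
print(icecream_buffet_new_row(N, r=[1.0, 1.0], r_new=1.0, gamma=3.0, c=2.0, rng=rng))
```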

Priors for random count matrices — Example: beta-negative binomial process
Figure: A sequentially constructed beta-negative binomial process random count matrix N_J ~ BNBPM(γ, c, r_1, ..., r_J). [Plot of the matrix, rows versus columns.]

Priors for random count matrices — Comparison of the different priors
Predictive variance of a new row's count at an existing column k:
- NBP: Var[n_{(J+1)k}] = E[n_{(J+1)k}] + E^2[n_{(J+1)k}] / n_{·k}
- GNBP: Var[n_{(J+1)k}] = E[n_{(J+1)k}] / (1 − p_{J+1}) + E^2[n_{(J+1)k}] / l_{·k}
- BNBP: Var[n_{(J+1)k}] = \frac{n_{·k}+c+r_·−1}{c+r_·−2} E[n_{(J+1)k}] + \frac{n_{·k}+c+r_·−1}{n_{·k}(c+r_·−2)} E^2[n_{(J+1)k}]

Priors for random count matrices — Example: beta-negative binomial process
[Figure: simulated NBP, GNBP, and BNBP random count matrices, three panels for each prior, plotted as rows versus columns.]

Priors for random count matrices — Training and posterior predictive checking
[Figure: (a) the observed count matrix; (b) a simulated NBP random count matrix; (c) a simulated GNBP random count matrix; (d) a simulated BNBP random count matrix; documents versus words.]

Infinite vocabulary naive Bayes classifiers — Predictive distribution of a new row vector
The predictive distribution of a row vector n_{J+1} is
p(n_{J+1} | N_J, θ) = \frac{p(n^+_{J+1} | N_J, θ)}{K^+_{J+1}!}   (1)
= \frac{K_J!}{K_{J+1}!} \cdot \frac{K_{J+1}!}{K_J! K^+_{J+1}!} \cdot \frac{f(N_{J+1} | θ)}{f(N_J | θ)}   (2)
The normalizing constant 1/K^+_{J+1}! in (1) arises because mapping a realization of n^+_{J+1} to n_{J+1} is one-to-many, with K^+_{J+1}! distinct orderings of the new columns. The normalizing constant K_J!/K_{J+1}! in (2) arises because there are \prod_{i=1}^{K^+_{J+1}} (K_J + i) = K_{J+1}!/K_J! ways to insert the K^+_{J+1} new columns into the original ordered K_J columns, which is again a one-to-many mapping.

Infinite vocabulary naive Bayes classifiers
- Each category is summarized by a random count matrix N_J; columns with all zeros are excluded.
- Gibbs sampling is used to infer the parameters θ that generate N_J; to represent the posterior of θ, S MCMC samples {θ^{[s]}}_{1,S} are collected.
- For a test row count vector n_{J+1}, its predictive likelihood given N_J is calculated via Monte Carlo integration using
p(n_{J+1} | N_J) = \frac{1}{S} \sum_{s=1}^{S} \frac{p(n^+_{J+1} | N_J, θ^{[s]})}{K^+_{J+1}!}
for both the NBP and the BNBP, and using
p(n_{J+1} | N_J) = \frac{1}{S} \sum_{s=1}^{S} \frac{p(n^+_{J+1} | N_J, L_J^{[s]}, θ^{[s]})}{K^+_{J+1}!}
for the GNBP.
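
A sketch of this computation for the NBP case (the function names and the split of the test vector into y_existing/y_new are mine), using scipy's log-pmfs; it follows Eqs. (1)-(2) as reconstructed above and assumes that S posterior samples of (γ, c) are available from the Gibbs sampler.

```python
import numpy as np
from scipy.stats import nbinom, logser, poisson
from scipy.special import gammaln

def nbp_test_loglik(y_existing, y_new, N_train, gamma, c):
    """Log predictive likelihood of a test count vector under the NBP prior:
    y_existing are its counts at the K_J training columns and y_new are its
    (positive) counts at previously unseen terms."""
    J, K_J = N_train.shape
    n_dot = N_train.sum(axis=0)
    p_next = 1.0 / (J + c + 1.0)
    K_plus = len(y_new)
    ll = poisson.logpmf(K_plus, gamma * (np.log(J + c + 1) - np.log(J + c)))
    # scipy's nbinom(n, p) counts failures with success probability p,
    # so the slide's NB(n_k, p_next) corresponds to nbinom(n_dot, 1 - p_next).
    ll += nbinom.logpmf(y_existing, n_dot, 1.0 - p_next).sum()
    ll += logser.logpmf(y_new, p_next).sum()
    # ordering constant K_J! / K_{J+1}! from Eq. (2)
    ll += gammaln(K_J + 1) - gammaln(K_J + K_plus + 1)
    return ll

def predictive_loglik(y_existing, y_new, N_train, theta_samples):
    """Monte Carlo average over S posterior samples theta_samples = [(gamma, c), ...]."""
    lls = [nbp_test_loglik(y_existing, y_new, N_train, g, c) for g, c in theta_samples]
    return np.logaddexp.reduce(lls) - np.log(len(lls))
```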

Infinite vocabulary naive Bayes classifiers
[Figure: term-frequency bar plots of example Mac.Hardware and Politics.Guns documents over each category's observed terms, with previously unseen ("new") terms shown at the end.]

Infinite vocabulary naive Bayes classifiers
Figure: Document categorization accuracy on the 20 Newsgroups dataset as a function of the ratio of training documents, with (a) an unconstrained vocabulary that can grow to infinity and (b) a predetermined finite vocabulary, using the negative binomial process (NBP), gamma-negative binomial process (GNBP), and beta-negative binomial process (BNBP). The results of the multinomial naive Bayes classifier with Laplace smoothing are included for comparison.

Infinite vocabulary naive Bayes classifiers
Figure: Plots analogous to those in the previous figure, for the TDT dataset with a predetermined finite vocabulary.

Infinite vocabulary naive Bayes classifiers
Figure: (a) The predicted probabilities of the test documents under different categories for the CNAE-9 dataset, using the GNBP nonparametric Bayesian naive Bayes classifier with a fixed fraction of the documents of each of the nine categories used for training. (b) Boxplots of the categorization accuracies, where each accuracy is computed with a different number S of collected MCMC samples.

Random count matrices and mixed-membership modeling — Beta-negative binomial process (BNBP) mixed-membership modeling
Construct EPPFs for mixture modeling using priors for random count vectors [Zhou & Walker].
One way to generate a random count vector (n_1, ..., n_l): draw l, the length of the vector, and then draw independent positive random counts {n_k}_{1,l}.
Another way to generate such a random count vector: draw a total count n, partition it using an EPPF, resulting in a set of exchangeable categorical variables z = (z_1, ..., z_n), and map z to a random positive count vector (n_1, ..., n_l), where n_k := \sum_{i=1}^{n} δ(z_i = k) > 0.
Both ways lead to the same distribution for (n_1, ..., n_l) if and only if
P(n_1, ..., n_l, n) = \frac{n!}{l! \prod_{k=1}^{l} n_k!} P(z, n)
(Sample size dependent) EPPF for mixture modeling:
P(z | n) = \frac{P(z, n)}{P(n)} = \frac{l! \prod_{k=1}^{l} n_k!}{n!} \cdot \frac{P(n_1, ..., n_l, n)}{P(n)}

Random count matrices and mixed-membership modeling — BNBP mixed-membership modeling
Construct EPPFs for mixed-membership modeling using priors for random count matrices [Zhou].
BNBP random count matrix prior:
f(N_J | r, γ, c) = \frac{γ^{K_J} e^{−γ [ψ(c+r_·) − ψ(c)]}}{K_J!} \prod_{k=1}^{K_J} \frac{Γ(n_{·k}) Γ(c+r_·)}{Γ(c+n_{·k}+r_·)} \prod_{j=1}^{J} \frac{Γ(n_{jk}+r_j)}{n_{jk}! Γ(r_j)}
With z = (z_{11}, ..., z_{J m_J}) and n_{jk} = \sum_{i=1}^{m_j} δ(z_{ji} = k), the joint distribution of a column count vector m = (m_1, ..., m_J)^T and its partition into a column-exchangeable latent random count matrix with K_J nonempty columns can be expressed as
f(z, m | r, γ, c) = \frac{K_J! \prod_{j=1}^{J} \prod_{k=1}^{K_J} n_{jk}!}{\prod_{j=1}^{J} m_j!} f(N_J | r, γ, c)
= \frac{γ^{K_J} e^{−γ [ψ(c+r_·) − ψ(c)]}}{\prod_{j=1}^{J} m_j!} \prod_{k=1}^{K_J} \frac{Γ(n_{·k}) Γ(c+r_·)}{Γ(c+n_{·k}+r_·)} \prod_{j=1}^{J} \frac{Γ(n_{jk}+r_j)}{Γ(r_j)}

Random count matrices and mixed-membership modeling — BNBP mixed-membership modeling
The BNBP's EPPF for mixed-membership modeling:
f(z | m, r, γ, c) = \frac{f(z, m | r, γ, c)}{f(m | r, γ, c)} = \frac{K_J! \prod_{j=1}^{J} \prod_{k=1}^{K_J} n_{jk}!}{\prod_{j=1}^{J} m_j!} \cdot \frac{f(N_J | r, γ, c)}{f(m | r, γ, c)}
The prediction rule is simple:
P(z_{ji} = k | z^{-ji}, m, r, γ, c) = \frac{f(z_{ji} = k, z^{-ji}, m | r, γ, c)}{\sum_{k'=1}^{K_J^{-ji}+1} f(z_{ji} = k', z^{-ji}, m | r, γ, c)}
∝ \frac{n_{·k}^{-ji}}{c + n_{·k}^{-ji} + r_·} (n_{jk}^{-ji} + r_j)   for k = 1, ..., K_J^{-ji};
∝ \frac{γ}{c + r_·} r_j   if k = K_J^{-ji} + 1.
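
A sketch of one collapsed Gibbs step implementing this prediction rule (the data structures and names are mine); a cluster whose count drops to zero simply receives zero probability and can be pruned by the caller.

```python
import numpy as np

def resample_token(j, i, z, counts_jk, counts_k, r, gamma, c, rng):
    """Collapsed Gibbs update of one token's cluster label z[j][i] under the
    BNBP EPPF prediction rule.  counts_jk[j][k] and counts_k[k] hold the current
    per-group and total cluster counts."""
    r_dot = float(np.sum(r))
    k_old = z[j][i]
    counts_jk[j][k_old] -= 1                        # remove the token
    counts_k[k_old] -= 1
    K = len(counts_k)
    probs = np.empty(K + 1)
    for k in range(K):                              # existing clusters
        probs[k] = counts_k[k] / (c + counts_k[k] + r_dot) * (counts_jk[j][k] + r[j])
    probs[K] = gamma / (c + r_dot) * r[j]           # open a new cluster
    k_new = rng.choice(K + 1, p=probs / probs.sum())
    if k_new == K:                                  # instantiate the new cluster
        for group in counts_jk:
            group.append(0)
        counts_k.append(0)
    counts_jk[j][k_new] += 1
    counts_k[k_new] += 1
    z[j][i] = k_new
```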

Random count matrices and mixed-membership modeling — BNBP mixed-membership modeling
Random count matrices with fixed row sums
Figure: Random draws from the EPPF that governs the BNBP's exchangeable random partitions of groups (rows), each of which has the same number of data points, shown for three settings of r_i (panels (a)-(c)). The jth row of each matrix, which sums to m_j, represents the partition of the m_j data points of the jth group over a random number of exchangeable clusters. The kth column of each matrix represents the kth nonempty cluster in order of appearance in Gibbs sampling (the empty clusters are deleted).

Random count matrices and mixed-membership modeling — Gamma-negative binomial process (GNBP) mixed-membership modeling
The GNBP's EPPF for mixed-membership modeling.
GNBP random count matrix prior:
f(N_J, L_J | γ, c, p) = \frac{γ^{K_J} \exp[−γ \ln(\frac{c+q_·}{c})]}{K_J!} \prod_{k=1}^{K_J} \frac{Γ(l_{·k})}{(c+q_·)^{l_{·k}}} \prod_{j=1}^{J} \frac{|s(n_{jk}, l_{jk})| \, p_j^{n_{jk}}}{n_{jk}!}
With z = (z_{11}, ..., z_{J m_J}), b = (b_{11}, ..., b_{J m_J}), and n_{jkt} = \sum_{i=1}^{m_j} δ(z_{ji} = k, b_{ji} = t), the joint distribution of a column count vector m = (m_1, ..., m_J)^T, its partition into a column-exchangeable latent random count matrix with K_J nonempty columns, and an auxiliary categorical random vector can be expressed as
f(b, z, m | γ, c, p) = γ^{K_J} e^{−γ \ln(\frac{c+q_·}{c})} \prod_{j=1}^{J} \frac{p_j^{m_j}}{m_j!} \prod_{k=1}^{K_J} \frac{Γ(l_{·k})}{(c+q_·)^{l_{·k}}} \prod_{j=1}^{J} \prod_{t=1}^{l_{jk}} Γ(n_{jkt})

Random count matrices and mixed-membership modeling — GNBP mixed-membership modeling
The GNBP's EPPF for mixed-membership modeling:
f(z, b | m, γ, c, p) = \frac{f(z, b, m | γ, c, p)}{f(m | γ, c, p)}
The prediction rule is simple:
P(z_{ji} = k, b_{ji} = t | b^{-ji}, z^{-ji}, m, p, c) = \frac{f(z_{ji} = k, b_{ji} = t, b^{-ji}, z^{-ji}, m | p, c)}{\sum_{z_{ji}, b_{ji}} f(z_{ji}, b_{ji}, b^{-ji}, z^{-ji}, m | p, c)}
∝ n_{jkt}^{-ji}   if k ≤ K_J^{-ji}, t ≤ l_{jk}^{-ji};
∝ l_{·k}^{-ji} / (c + q_·)   if k ≤ K_J^{-ji}, t = l_{jk}^{-ji} + 1;
∝ γ / (c + q_·)   if k = K_J^{-ji} + 1, t = 1.
If we let z_{ji} be the dish index and b_{ji} be the table index for customer i in restaurant j, then this collapsed Gibbs sampler can be related to the Chinese restaurant franchise sampler of the hierarchical Dirichlet process (Teh et al.).

Conclusions
- A family of probability mass functions for random count matrices.
- The proposed random count matrices have a random number of i.i.d. columns and can also be constructed by adding one row at a time. Their parameters can be inferred with closed-form Gibbs sampling update equations.
- Infinite vocabulary naive Bayes classifiers.
- Priors for random count matrices can be used to construct (group size dependent) EPPFs for mixed-membership modeling, with simple prediction rules for collapsed Gibbs sampling.

Main References
- M. Zhou, O. H. M. Padilla, and J. G. Scott. Priors for random count matrices derived from a family of negative binomial processes. arXiv preprint.
- M. Zhou. Beta-negative binomial process and exchangeable random partitions for mixed-membership modeling. In NIPS.
- M. Zhou and S. G. Walker. Sample size dependent species models. arXiv preprint.
- C. Heaukulani and D. M. Roy. The combinatorial structure of beta negative binomial processes. arXiv preprint.
- T. Broderick, L. Mackey, J. Paisley, and M. I. Jordan. Combinatorial clustering and the beta negative binomial process. IEEE Trans. Pattern Analysis and Machine Intelligence.
- M. Zhou and L. Carin. Negative binomial process count and mixture modeling. IEEE Trans. Pattern Analysis and Machine Intelligence.
- M. Zhou and L. Carin. Augment-and-conquer negative binomial processes. In NIPS.
- M. Zhou, L. Hannah, D. Dunson, and L. Carin. Beta-negative binomial process and Poisson factor analysis. In AISTATS.