Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes: Supplementary Material
A The Negative Binomial Process: Details

A.1 Negative binomial process random count matrix

To generate a random count matrix we construct a gamma-Poisson process as

  X_j | G ~ PP(G),  G ~ ΓP(G_0, 1/c).   (A.1)

Zhou and Carin (2015) derive the marginal distribution of X = Σ_{j=1}^J X_j and call it the negative binomial process (NBP), a draw from which is represented as an exchangeable random count vector. We do not consider that simplification in this paper, and consequently our definition of the NBP, a draw from which is represented as a row-column exchangeable random count matrix, differs from the one in Zhou and Carin (2015).

The conditional likelihood in (4) can be rewritten as

  p({X_j}_{j=1}^J | G) = e^{-J G(Ω)} ∏_{k=1}^{K_J} [ r_k^{n_{·k}} / ∏_{j=1}^J n_{jk}! ].

Applying the Palm formula (Daley and Vere-Jones, 1988; James, 2002; Bertoin, 2006; Caron et al., 2014) to the expectation E_G[p({X_j}_{j=1}^J | G)], we have

  E_G[p({X_j}_{j=1}^J | G)] = { ∏_{k=1}^{K_J} ∫_{R_+ × Ω} [ r_k^{n_{·k}} / ∏_{j=1}^J n_{jk}! ] e^{-J r_k} ν(dr_k dω_k) } · E_G[ e^{-J G(Ω\D_J)} ].

Direct calculation with ∫_{R_+ × Ω} r^n e^{-Jr} ν(dr dω) = γ_0 Γ(n)/(J+c)^n and E_G[e^{-J G(Ω\D_J)}] = (1 + J/c)^{-γ_0} leads to

  p({X_j}_{j=1}^J | γ_0, c) = E_G[p({X_j}_{j=1}^J | G)] = γ_0^{K_J} e^{-γ_0 ln((J+c)/c)} ∏_{k=1}^{K_J} [ Γ(n_{·k}) / ( (J+c)^{n_{·k}} ∏_{j=1}^J n_{jk}! ) ].

B Gamma-Negative Binomial Process: Details

B.1 GNBP random count matrix

Given the gamma process G ~ ΓP(G_0, 1/c), we define X | G ~ NBP(G, p) as a negative binomial process such that X(A) ~ NB(G(A), p) for each A ⊂ Ω. Replacing the Poisson processes in (A.1) with the negative binomial processes defined in this way yields a gamma-negative binomial process (GNBP):

  X_j | G ~ NBP(G, p_j),  G ~ ΓP(G_0, 1/c).

With a draw from the gamma process G ~ ΓP(G_0, 1/c) expressed as G = Σ_k r_k δ_{ω_k}, a draw from X_j | G ~ NBP(G, p_j) can be expressed as X_j = Σ_k n_{jk} δ_{ω_k}, n_{jk} ~ NB(r_k, p_j). The GNBP employs row-specific probability parameters p_j to model row heterogeneity, and hence the X_j are conditionally independent but not identically distributed if the p_j at different rows are set differently. Note that the GNBP was previously proposed in Zhou and Carin (2015), which focuses on finding the conditional posterior of G without considering the marginalization of G.

The GNBP hierarchical construction is conceptually simple, but to obtain a random count matrix we have to marginalize out the gamma process G ~ ΓP(G_0, 1/c). As it is difficult to directly marginalize G out of the conditional likelihood of the observed J rows,

  p({X_j}_{j=1}^J | G, p) = ∏_{j=1}^J ∏_k [ Γ(n_{jk} + r_k) / (n_{jk}! Γ(r_k)) ] p_j^{n_{jk}} (1 - p_j)^{r_k},

where p := (p_1, ..., p_J), we first augment each n_{jk} ~ NB(r_k, p_j) under its compound Poisson representation as

  n_{jk} ~ SumLog(l_{jk}, p_j),  l_{jk} ~ Pois(r_k q_j),

where q_j := -ln(1 - p_j).
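The compound Poisson representation above can be checked by simulation. The sketch below (Python; all function names and the naive inversion/Knuth samplers are our own choices, adequate only for small parameters) draws n by compounding and compares its empirical mean with the negative binomial mean r p/(1 - p):

```python
import math
import random

rng = random.Random(7)

def sample_log(p):
    # Logarithmic(p) draw by inversion: PMF p^n / (-n ln(1-p)), n = 1, 2, ...
    u, n, cdf = rng.random(), 1, 0.0
    norm = -math.log1p(-p)
    while True:
        cdf += p ** n / (n * norm)
        if u <= cdf:
            return n
        n += 1

def sample_poisson(lam):
    # Knuth's method; fine for the small means used here
    L, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= L:
            return k
        k += 1

def nb_via_compound_poisson(r, p):
    # n ~ SumLog(l, p) with l ~ Pois(r * q), q = -ln(1 - p);
    # marginally n ~ NB(r, p)
    l = sample_poisson(r * (-math.log1p(-p)))
    return sum(sample_log(p) for _ in range(l))

# The empirical mean should approach E[NB(r, p)] = r p / (1 - p) = 2 here
draws = [nb_via_compound_poisson(2.0, 0.5) for _ in range(20000)]
print(sum(draws) / len(draws))
```

The check works because a Pois(r q) number of Log(p) increments has exactly the NB(r, p) marginal, which is the augmentation exploited throughout this section.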
Define X ~ SumLogP(L, p) as a sum-logarithmic process such that X(A) ~ SumLog(L(A), p) for each A ⊂ Ω. With X_j | G ~ NBP(G, p_j) augmented as X_j ~ SumLogP(L_j, p_j), L_j | G ~ PP(q_j G), we may express the joint likelihood of X_j and L_j as

  p({X_j, L_j}_{j=1}^J | G, p) = ∏_{j=1}^J ∏_k [ |s(n_{jk}, l_{jk})| r_k^{l_{jk}} p_j^{n_{jk}} (1 - p_j)^{r_k} / n_{jk}! ].

With l_{·k} := Σ_{j=1}^J l_{jk} and q_· := Σ_{j=1}^J q_j, similar to the analysis in Section A, we can re-express the likelihood as

  p({X_j, L_j}_{j=1}^J | G, p) = e^{-q_· G(Ω\D_J)} ∏_{k=1}^{K_J} [ r_k^{l_{·k}} e^{-q_· r_k} ∏_{j=1}^J |s(n_{jk}, l_{jk})| p_j^{n_{jk}} / n_{jk}! ].   (B.1)

Similar to the analysis in Section A.1, with G marginalized out as p({X_j, L_j}_{j=1}^J | γ_0, c, p) = E_G[p({X_j, L_j}_{j=1}^J | G, p)], we obtain the GNBP random matrix prior in (10) using

  f(N_J, L_J | γ_0, c, p) = p({X_j, L_j}_{j=1}^J | γ_0, c, p) / K_J!.   (B.2)

Although not obvious, one may verify that (10) defines the PMF of a compound random count matrix, which can be generated via

  n_{jk} ~ SumLog(l_{jk}, p_j),
  (l_{1k}, ..., l_{Jk}) ~ Mult(l_{·k}; q_1/q_·, ..., q_J/q_·),
  l_{·k} ~ Log( q_·/(c + q_·) ),
  K_J ~ Pois{ γ_0 [ln(c + q_·) - ln c] }.   (B.3)

Let (σ(1), ..., σ(J)) denote a random permutation of the row indices. If the p_j are set differently for different rows, then Mult(l_{·k}; q_{σ(1)}/q_·, ..., q_{σ(J)}/q_·) is not equal in distribution to Mult(l_{·k}; q_1/q_·, ..., q_J/q_·), and hence the introduced random count matrix no longer maintains row exchangeability.

Comparing (B.3) with (6), one may identify several key differences between the GNBP and NBP random count matrices. First, one may increase p_j to encourage the jth row to have larger counts than the others. Second, both n_{jk} and the column sum n_{·k} are generated from compound distributions. In fact, if we let p_j = 1 - e^{-1} (so that q_j = 1) for all j, then the matrix {l_{jk}}_{j,k} in (B.3) is exactly an NBP random count matrix, and the GNBP builds its random matrix using n_{jk} ~ SumLog(l_{jk}, p_j).

The sequential construction of a GNBP random count matrix can be intuitively explained as drawing dishes, drawing tables at each dish, and then drawing customers at each table. Similar to the definition of N^+_{J+1}, we let L^+_{J+1} represent the new row and columns added to L_J. Using (10), following the analysis in Section 2.1, one may show with direct calculation that

  p(N^+_{J+1}, L^+_{J+1} | N_J, L_J, θ) = [K_J! K^+_{J+1}! / K_{J+1}!] ∏_{k=1}^{K_{J+1}} SumLog(n_{J+1,k}; l_{J+1,k}, p_{J+1}) ∏_{k=1}^{K_J} NB( l_{J+1,k}; l_{·k}, q_{J+1}/(c + q_· + q_{J+1}) ) ∏_{k=K_J+1}^{K_{J+1}} Log( l_{J+1,k}; q_{J+1}/(c + q_· + q_{J+1}) ) Pois{ K^+_{J+1}; γ_0 [ln(c + q_· + q_{J+1}) - ln(c + q_·)] }.   (B.4)

Thus to add a new row, we first draw NB[l_{·k}, q_{J+1}/(c + q_· + q_{J+1})] tables at existing columns (dishes); we then draw K^+_{J+1} ~ Pois{γ_0 [ln(c + q_· + q_{J+1}) - ln(c + q_·)]} new dishes, each of which is associated with Log[q_{J+1}/(c + q_· + q_{J+1})] tables; we further draw Log(p_{J+1}) customers at each table and aggregate the counts across the tables of the same dish as n_{J+1,k} = Σ_{t=1}^{l_{J+1,k}} n_{J+1,k,t}; and in the final step we insert the K^+_{J+1} new columns into the K_J original columns without reordering, which again is a one-to-[K_{J+1}!/(K_J! K^+_{J+1}!)] mapping. We emphasize that the number of tables (customers) for a new dish, which follows a logarithmic (sum-logarithmic) distribution, must be at least one; the implication is that there are infinitely many dishes that have not yet been ordered by any of the tables seated by existing customers. The sequential construction provides a convenient way to construct a GNBP random count matrix one row at a time.

With the latent counts l_{J+1,k} marginalized out, one may show that the predictive distribution for N^+_{J+1} given N_J and L_J can be expressed in terms of the Poisson, LogLog, and GNB distributions as

  p(N^+_{J+1} | N_J, L_J, θ) = [K_J! K^+_{J+1}! / K_{J+1}!] ∏_{k=1}^{K_J} GNB( n_{J+1,k}; l_{·k}, c + q_·, p_{J+1} ) ∏_{k=K_J+1}^{K_{J+1}} LogLog( n_{J+1,k}; c + q_·, p_{J+1} ) Pois{ K^+_{J+1}; γ_0 [ln(c + q_· + q_{J+1}) - ln(c + q_·)] },   (B.5)

where n ~ LogLog(c, p) represents a logarithmic mixed sum-logarithmic distribution defined on the positive integers and n ~ GNB(l, c, p) represents a gamma mixed negative binomial distribution defined on the nonnegative integers, whose PMFs are shown in Appendix D.

B.2 Inference for parameters

Both the GNB and LogLog distributions have complicated PMFs involving Stirling numbers of the first kind, and it seems difficult to infer their parameters directly. Fortunately, using the likelihoods (B.1) and (10) and the data augmentation techniques developed for the negative binomial distribution (Zhou and Carin, 2015), we are able to derive closed-form conditional posteriors for the GNBP. To complete the model, we let γ_0 ~ Gamma(e_0, 1/f_0), p_j ~ Beta(a_0, b_0), and c ~ Gamma(c_0, 1/d_0). We sample the model parameters as

  (γ_0 | -) ~ Gamma( e_0 + K_J, 1/(f_0 + ln((c + q_·)/c)) ),
  (l_{jk} | -) = Σ_{t=1}^{n_{jk}} u_t,  u_t ~ Bernoulli( r_k/(r_k + t - 1) ),
  (r_k | -) ~ Gamma( l_{·k}, 1/(c + q_·) ),
  G(Ω\D_J) | - ~ Gamma( γ_0, 1/(c + q_·) ),
  (p_j | -) ~ Beta( a_0 + n_{j·}, b_0 + G(Ω) ),
  (c | -) ~ Gamma( c_0 + γ_0, 1/(d_0 + G(Ω)) ).   (B.6)
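The l_{jk} update in (B.6) is the Chinese restaurant table (CRT) draw used throughout the negative binomial augmentation literature. A minimal sketch (Python; the function name is our own):

```python
import random

def sample_crt(n, r, rng=random):
    # Draw l = sum_{t=1}^{n} u_t with u_t ~ Bernoulli(r / (r + t - 1)):
    # the conditional posterior of the latent table count l_jk given n_jk and r_k
    return sum(rng.random() < r / (r + t - 1.0) for t in range(1, n + 1))
```

Note that l = 0 if and only if n = 0, and l = 1 is forced when n = 1, matching the constraint that every dish with at least one customer has at least one table.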
C Beta-Negative Binomial Process: Details

C.1 BNBP random count matrix

The GNBP generalizes the NBP by replacing the Poisson process in (A.1) with a negative binomial process, and shares the negative binomial dispersion parameters across rows. Exploiting an alternative strategy that shares the negative binomial probability parameters across rows, we construct a BNBP as

  X_j | B ~ NBP(r_j, B),  B ~ BP(c, B_0),

where p_k = B(ω_k) is the weight of the atom ω_k of the beta process B ~ BP(c, B_0), and X_j | B ~ NBP(r_j, B) is a negative binomial process such that X_j(A) = Σ_{k: ω_k ∈ A} n_{jk}, n_{jk} ~ NB(r_j, p_k), for each A ⊂ Ω. With r := (r_1, ..., r_J), similar to the analysis in Appendix B, the likelihood of the BNBP can be expressed as

  p({X_j}_{j=1}^J | B, r) = e^{-r_· p^*} ∏_{k=1}^{K_J} [ p_k^{n_{·k}} (1 - p_k)^{r_·} ∏_{j=1}^J Γ(n_{jk} + r_j) / (n_{jk}! Γ(r_j)) ],   (C.1)

where p^* denotes the sum over all the atoms in the absolutely continuous space Ω\D_J, as p^* := -Σ_{k: n_{·k}=0} ln(1 - p_k), and r_· := Σ_{j=1}^J r_j. Using the Lévy-Khintchine theorem, the Laplace transform of p^* can be expressed as

  E[e^{-s p^*}] = exp{ -∫_{[0,1]×Ω} [1 - (1 - p)^s] ν(dp dω) } = exp{ -γ_0 Σ_{i=0}^∞ [ 1/(c + i) - 1/(c + i + s) ] } = exp{ -γ_0 [ψ(c + s) - ψ(c)] },

where ψ(x) = Γ'(x)/Γ(x) is the digamma function; we define such a random variable as the logbeta random variable, p^* ~ logbeta(γ_0, c), whose mean and variance are E[p^*] = γ_0 ψ_1(c) and Var[p^*] = -γ_0 ψ_2(c), respectively, where ψ_n(x) = d^n ψ(x)/dx^n.

As before, one may verify with direct calculation that the marginal likelihood defines the PMF of a column-i.i.d. random count matrix N_J ∈ Z^{J × K_J}, which can be generated via

  n_{:k} ~ DirMult( n_{·k}, r_1, ..., r_J ),
  n_{·k} ~ Digam( r_·, c ),
  K_J ~ Pois{ γ_0 [ψ(c + r_·) - ψ(c)] },   (C.2)

where the PMFs of both the Dirichlet-multinomial (DirMult) and digamma distributions are shown in the Appendix. Note that if the r_j are set differently for different rows, then DirMult(n_{·k}; r_{σ(1)}, ..., r_{σ(J)}) is not equal in distribution to DirMult(n_{·k}; r_1, ..., r_J), and hence the corresponding random count matrix no longer maintains row exchangeability.

The sequential construction of a BNBP random count matrix can be intuitively understood as an ice cream buffet process (ICBP). Similar to the analysis in Section 2.1, we have

  p(N^+_{J+1} | N_J) = [K_J! K^+_{J+1}! / K_{J+1}!] ∏_{k=1}^{K_J} BNB( n_{J+1,k}; r_{J+1}, n_{·k}, c + r_· ) ∏_{k=K_J+1}^{K_{J+1}} Digam( n_{J+1,k}; r_{J+1}, c + r_· ) Pois{ K^+_{J+1}; γ_0 [ψ(c + r_· + r_{J+1}) - ψ(c + r_·)] },   (C.3)

where the PMF of the beta-negative binomial (BNB) distribution is shown in Appendix D. Thus to add a row to N_J ∈ Z^{J × K_J}, customer J+1 takes n_{J+1,k} ~ BNB(r_{J+1}, n_{·k}, c + r_·) scoops at each existing ice cream (column); the customer further selects K^+_{J+1} ~ Pois{γ_0 [ψ(c + r_· + r_{J+1}) - ψ(c + r_·)]} new ice creams out of the buffet line and takes n_{J+1,k} ~ Digam(r_{J+1}, c + r_·) scoops at each new ice cream. Thus the ICBP can also be considered as a multiple-scoop Indian buffet process, an analogy used in Zhou et al. (2012). Note that when r_j ≡ 1 we have K^+_{J+1} ~ Pois[γ_0/(c + J)], confirming the derivation about the number of new dishes (ice creams) in Section 3.2 of Zhou et al. (2012), which, however, provides no description of the distributions of the number of scoops at existing and new ice creams. We emphasize that the number of scoops at a new ice cream, which follows a digamma distribution, must be at least one; the implication is that there are infinitely many ice creams in the buffet line that have not yet been scooped by any of the existing customers. Similar to the GNBP random count matrix, the BNBP random count matrix is column exchangeable but not row exchangeable if the row-specific dispersion parameters r_j are fixed at different values.

A related marked BNBP of Zhou et al. (2012) and Zhou and Carin (2012) attaches an independent negative binomial dispersion parameter r_k to each atom of the beta process and infers its value under a finite approximation of the beta process; another related BNBP of Broderick et al. (2015) uses a single dispersion parameter r and sets its value empirically. None of these papers, however, marginalize out the beta process to define a prior on column-i.i.d. random count matrices, a challenge tackled in this paper. Independently of our work, Heaukulani and Roy (2013) also describe the marginalization of the beta process from the negative binomial process, where the obtained BNBP is called the negative binomial Indian buffet process. Although the idea of marginalizing out the beta process is shared by both papers, the techniques and combinatorial arguments used are quite different. Their paper focuses on a special case of the BNBP where a single dispersion parameter r is used for all the X_j's. Our model allows row-specific dispersion parameters r_j, develops an efficient inference scheme for all model parameters, derives the predictive distribution of a new row count vector under a BNBP random count matrix, and also situates the BNBP in the larger family of count-matrix priors derived from negative binomial processes. Due to a different parameterization of the Lévy measure, the beta process mass parameter γ_0 in this paper can be considered as γ_0·c in Thibaux and Jordan (2007) and Zhou et al. (2012).
C.2 Inference for parameters

For all the atoms in the absolutely continuous part of the space Ω\D_J, we have that ν(dp dω) = p^{-1}(1 - p)^{c + r_· - 1} dp B_0(dω). Thus the Laplace transform of p^* can be expressed as E[e^{-s p^*}] = exp{-γ_0 [ψ(c + r_· + s) - ψ(c + r_·)]}, and hence we have p^* ~ logbeta(γ_0, c + r_·). With its Laplace transform, we sample p^* using the method proposed in Ridout (2009). To complete the model, we let γ_0 ~ Gamma(e_0, 1/f_0), r_j ~ Gamma(a_0, 1/b_0), and c ~ Gamma(c_0, 1/d_0). Using both the conditional likelihood (C.1) and the marginal likelihood, and the data augmentation techniques developed in Zhou and Carin (2015), we sample the model parameters as

  (γ_0 | -) ~ Gamma( e_0 + K_J, 1/(f_0 + ψ(c + r_·) - ψ(c)) ),
  (p_k | -) ~ Beta( n_{·k}, c + r_· ),
  (p^* | -) ~ logbeta( γ_0, c + r_· ),
  (l_{jk} | -) = Σ_{t=1}^{n_{jk}} u_t,  u_t ~ Bernoulli( r_j/(r_j + t - 1) ),
  (r_j | -) ~ Gamma( a_0 + l_{j·}, 1/(b_0 + p^* - Σ_{k=1}^{K_J} ln(1 - p_k)) ).   (C.4)

The only parameter that does not have an analytic conditional posterior is the concentration parameter c. Since, using Campbell's theorem (Kingman, 1993), we have E[Σ_k p_k] = ∫_{[0,1]×Ω} p ν(dp dω) = γ_0/c, to sample c we use

  Q(c) = Gamma( c_0 + γ_0, 1/(d_0 + p^* + Σ_{k=1}^{K_J} p_k) )   (C.5)

as the proposal distribution in an independence-chain Metropolis-Hastings sampling step. One may also sample c using a griddy-Gibbs sampler (Ritter and Tanner, 1992).
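As a cross-check on the logbeta draws (and as an alternative to Ridout's Laplace-transform method), one can sample p* approximately from the compound-Poisson representation in (D.2), truncating the infinite sum; the truncation level and function name below are our own choices:

```python
import math
import random

def sample_logbeta(gamma0, c, rng, trunc=200):
    # p* ~ sum over i of compound Poisson terms (D.2), truncated at i < trunc:
    # u_i ~ Pois(gamma0/(c+i)), lambda_it ~ Gamma(1, 1/(c+i)), i.e. Exp(rate c+i)
    total = 0.0
    for i in range(trunc):
        rate = c + i
        # Knuth Poisson draw; the means gamma0/(c+i) are assumed small
        L, u, prod = math.exp(-gamma0 / rate), 0, 1.0
        while True:
            prod *= rng.random()
            if prod <= L:
                break
            u += 1
        total += sum(rng.expovariate(rate) for _ in range(u))
    return total

# E[p*] = gamma0 * psi_1(c); for gamma0 = 2, c = 2 this is 2 (pi^2/6 - 1) ~ 1.29
rng = random.Random(3)
draws = [sample_logbeta(2.0, 2.0, rng) for _ in range(3000)]
print(sum(draws) / len(draws))
```

The empirical mean matches γ_0 ψ_1(c) up to the small tail mass lost to truncation, which shrinks like 1/trunc.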
D Some useful distributions

Direct calculation shows that the logarithmic mixed sum-logarithmic (LogLog) distribution, expressed as n ~ SumLog(l, p), l ~ Log( -ln(1-p) / (c - ln(1-p)) ), has PMF

  f_N(n | c, p) = [ p^n / ( n! { ln[c - ln(1-p)] - ln c } ) ] Σ_{l=1}^n |s(n, l)| Γ(l) / [c - ln(1-p)]^l

for n ∈ {1, 2, ...}; and the negative binomial mixed sum-logarithmic distribution, expressed as n ~ SumLog(l, p), l ~ NB( e, -ln(1-p)/(c - ln(1-p)) ), has PMF

  f_N(n | e, c, p) = [ c^e p^n / (Γ(e) n!) ] Σ_{l=0}^n |s(n, l)| Γ(e + l) / [c - ln(1-p)]^{e+l}

for n ∈ {0, 1, ...}. The iterative calculation of |s(n, l)|/n! on the logarithmic scale is described in Appendix E. Using (12), one may show that the negative binomial mixed sum-logarithmic distribution shown above is equivalent to a gamma mixed negative binomial (GNB) distribution generated by n ~ NB(r, p), r ~ Gamma(e, 1/c). Note that n ~ LogLog(c, p) is the limit of n ~ GNB(e, c, p) as e → 0, conditioning on n > 0; thus it can be considered as a truncated GNB distribution.

The Dirichlet-multinomial (DirMult) distribution (Mosimann, 1962; Madsen et al., 2005) is a Dirichlet mixed multinomial distribution with PMF

  DirMult(n_{:k} | n_{·k}, r) = [ n_{·k}! / ∏_{j=1}^J n_{jk}! ] · [ Γ(r_·) / Γ(n_{·k} + r_·) ] ∏_{j=1}^J Γ(n_{jk} + r_j) / Γ(r_j),

and the digamma distribution (Sibuya, 1979) has PMF

  Digam(n | r, c) = [ 1 / (ψ(c + r) - ψ(c)) ] · Γ(r + n) Γ(c + r) / ( n Γ(c + n + r) Γ(r) ),   (D.1)

where n = 1, 2, .... Since the beta-negative binomial (BNB) distribution has PMF

  f_N(n | r, e, c) = ∫_0^1 NB(n; r, p) Beta(p; e, c) dp = [ Γ(r + n) / (n! Γ(r)) ] · Γ(c + r) Γ(e + n) Γ(e + c) / ( Γ(e + c + r + n) Γ(e) Γ(c) ),

one may show that, conditioning on n > 0, n ~ BNB(r, e, c) becomes n ~ Digam(r, c) as e → 0. Thus the digamma distribution can be considered as a truncated BNB distribution. Since the Laplace transform of the logbeta random variable p^* ~ logbeta(γ_0, c) can be re-expressed as

  E[e^{-s p^*}] = exp{ -Σ_{i=0}^∞ [γ_0/(c + i)] [ 1 - (1 + s/(c + i))^{-1} ] },

we can generate p^* ~ logbeta(γ_0, c) as an infinite sum of independent compound Poisson random variables as

  p^* = Σ_{i=0}^∞ λ_i,  λ_i = Σ_{t=1}^{u_i} λ_{it},  u_i ~ Pois( γ_0/(c + i) ),  λ_{it} ~ Gamma( 1, 1/(c + i) ).   (D.2)

E Calculating Stirling Numbers of the First Kind

The unsigned Stirling numbers of the first kind |s(n, l)| appear in the predictive distribution for the GNBP. It is numerically unstable to recursively calculate |s(n, l)| based on |s(n+1, l)| = n|s(n, l)| + |s(n, l-1)|, as |s(n, l)| would rapidly reach the maximum value allowed by a finite-precision machine as n increases. Denoting g(n, l) = ln|s(n, l)| - ln(n!), starting from g(1, 1) = 0, we iteratively calculate g(n, l) with

  g(n+1, 1) = ln n - ln(n+1) + g(n, 1),
  g(n+1, n+1) = g(n, n) - ln(n+1),

and

  g(n+1, l) = ln( n/(n+1) ) + g(n, l) + ln{ 1 + exp[ g(n, l-1) - g(n, l) - ln n ] }

for 2 ≤ l ≤ n. This approach is found to be numerically stable.

References

J. Bertoin. Random fragmentation and coagulation processes, volume 102. Cambridge University Press, 2006.
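The recursions of Appendix E translate directly into code; this sketch (Python; the function name is our own) tabulates g(n, l) = ln|s(n, l)| - ln(n!) and recovers small Stirling numbers exactly:

```python
import math

def log_stirling1_table(nmax):
    # g[n][l] = ln|s(n,l)| - ln(n!), built with the log-scale recursions above
    g = [[None] * (nmax + 1) for _ in range(nmax + 1)]
    g[1][1] = 0.0                      # |s(1,1)| = 1, so ln 1 - ln 1! = 0
    for n in range(1, nmax):
        g[n + 1][1] = math.log(n) - math.log(n + 1) + g[n][1]
        g[n + 1][n + 1] = g[n][n] - math.log(n + 1)
        for l in range(2, n + 1):
            g[n + 1][l] = (math.log(n / (n + 1)) + g[n][l]
                           + math.log1p(math.exp(g[n][l - 1] - g[n][l] - math.log(n))))
    return g

g = log_stirling1_table(8)
# |s(4,2)| = 11: recover it as exp(g[4][2] + ln 4!)
print(round(math.exp(g[4][2] + math.log(math.factorial(4)))))  # 11
```

Because only ratios |s(n, l)|/n! are ever stored, the table stays within floating-point range for n far beyond the point where the raw integer recursion overflows.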
T. Broderick, L. Mackey, J. Paisley, and M. I. Jordan. Combinatorial clustering and the beta negative binomial process. IEEE Trans. Pattern Analysis and Machine Intelligence, 2015.

F. Caron, Y. W. Teh, and B. T. Murphy. Bayesian nonparametric Plackett-Luce models for the analysis of clustered ranked data. Annals of Applied Statistics, 2014.

D. J. Daley and D. Vere-Jones. An introduction to the theory of point processes, volume 2. Springer, 1988.

C. Heaukulani and D. M. Roy. The combinatorial structure of beta negative binomial processes. arXiv preprint, 2013.

L. F. James. Poisson process partition calculus with applications to exchangeable models and Bayesian nonparametrics. arXiv preprint, 2002.

J. F. C. Kingman. Poisson Processes. Oxford University Press, 1993.

R. E. Madsen, D. Kauchak, and C. Elkan. Modeling word burstiness using the Dirichlet distribution. In ICML, 2005.

J. E. Mosimann. On the compound multinomial distribution, the multivariate β-distribution, and correlations among proportions. Biometrika, 1962.

M. S. Ridout. Generating random numbers from a distribution specified by its Laplace transform. Statistics and Computing, 2009.

C. Ritter and M. A. Tanner. Facilitating the Gibbs sampler: the Gibbs stopper and the griddy-Gibbs sampler. Journal of the American Statistical Association, 1992.

M. Sibuya. Generalized hypergeometric, digamma and trigamma distributions. Annals of the Institute of Statistical Mathematics, 1979.

R. Thibaux and M. I. Jordan. Hierarchical beta processes and the Indian buffet process. In AISTATS, 2007.

M. Zhou and L. Carin. Augment-and-conquer negative binomial processes. In NIPS, 2012.

M. Zhou and L. Carin. Negative binomial process count and mixture modeling. IEEE Trans. Pattern Analysis and Machine Intelligence, 2015.

M. Zhou, L. Hannah, D. Dunson, and L. Carin. Beta-negative binomial process and Poisson factor analysis. In AISTATS, 2012.
Stic-Breaing Beta Processes and the Poisson Process John Paisley David M. Blei 3 Michael I. Jordan,2 Department of EECS, 2 Department of Statistics, UC Bereley 3 Computer Science Department, Princeton
More informationContents. Part I: Fundamentals of Bayesian Inference 1
Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian
More informationBayesian Nonparametric Models on Decomposable Graphs
Bayesian Nonparametric Models on Decomposable Graphs François Caron INRIA Bordeaux Sud Ouest Institut de Mathématiques de Bordeaux University of Bordeaux, France francois.caron@inria.fr Arnaud Doucet Departments
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationLecture 3a: Dirichlet processes
Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics
More informationThe Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations
The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables
More informationNonparametric Bayes tensor factorizations for big data
Nonparametric Bayes tensor factorizations for big data David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & DARPA N66001-09-C-2082 Motivation Conditional
More informationAdvanced Machine Learning
Advanced Machine Learning Nonparametric Bayesian Models --Learning/Reasoning in Open Possible Worlds Eric Xing Lecture 7, August 4, 2009 Reading: Eric Xing Eric Xing @ CMU, 2006-2009 Clustering Eric Xing
More informationExponential Families
Exponential Families David M. Blei 1 Introduction We discuss the exponential family, a very flexible family of distributions. Most distributions that you have heard of are in the exponential family. Bernoulli,
More informationExchangeable random hypergraphs
Exchangeable random hypergraphs By Danna Zhang and Peter McCullagh Department of Statistics, University of Chicago Abstract: A hypergraph is a generalization of a graph in which an edge may contain more
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationDirichlet Process. Yee Whye Teh, University College London
Dirichlet Process Yee Whye Teh, University College London Related keywords: Bayesian nonparametrics, stochastic processes, clustering, infinite mixture model, Blackwell-MacQueen urn scheme, Chinese restaurant
More informationStochastic Variational Inference for the HDP-HMM
Stochastic Variational Inference for the HDP-HMM Aonan Zhang San Gultekin John Paisley Department of Electrical Engineering & Data Science Institute Columbia University, New York, NY Abstract We derive
More informationNumerical Analysis for Statisticians
Kenneth Lange Numerical Analysis for Statisticians Springer Contents Preface v 1 Recurrence Relations 1 1.1 Introduction 1 1.2 Binomial CoefRcients 1 1.3 Number of Partitions of a Set 2 1.4 Horner's Method
More informationarxiv: v1 [stat.ml] 20 Nov 2012
A survey of non-exchangeable priors for Bayesian nonparametric models arxiv:1211.4798v1 [stat.ml] 20 Nov 2012 Nicholas J. Foti 1 and Sinead Williamson 2 1 Department of Computer Science, Dartmouth College
More information27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling
10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel
More information39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017
Permuted and IROM Department, McCombs School of Business The University of Texas at Austin 39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017 1 / 36 Joint work
More informationStudy Notes on the Latent Dirichlet Allocation
Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationStat 542: Item Response Theory Modeling Using The Extended Rank Likelihood
Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal
More informationCS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I
X i Ν CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr 2004 Dirichlet Process I Lecturer: Prof. Michael Jordan Scribe: Daniel Schonberg dschonbe@eecs.berkeley.edu 22.1 Dirichlet
More informationBeta processes, stick-breaking, and power laws
Beta processes, stick-breaking, and power laws Tamara Broderick Michael Jordan Jim Pitman Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-211-125
More informationCurve Fitting Re-visited, Bishop1.2.5
Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the
More informationInfinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix
Infinite-State Markov-switching for Dynamic Volatility Models : Web Appendix Arnaud Dufays 1 Centre de Recherche en Economie et Statistique March 19, 2014 1 Comparison of the two MS-GARCH approximations
More informationThe Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.
Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface
More informationA permutation-augmented sampler for DP mixture models
Percy Liang University of California, Berkeley Michael Jordan University of California, Berkeley Ben Taskar University of Pennsylvania Abstract We introduce a new inference algorithm for Dirichlet process
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationBeta processes, stick-breaking, and power laws
Beta processes, stick-breaking, and power laws T. Broderick, M. Jordan, J. Pitman Presented by Jixiong Wang & J. Li November 17, 2011 DP vs. BP Dirichlet Process Beta Process DP vs. BP Dirichlet Process
More informationOn the Fisher Bingham Distribution
On the Fisher Bingham Distribution BY A. Kume and S.G Walker Institute of Mathematics, Statistics and Actuarial Science, University of Kent Canterbury, CT2 7NF,UK A.Kume@kent.ac.uk and S.G.Walker@kent.ac.uk
More informationarxiv: v1 [stat.ml] 30 Mar 2015
Infinite Author Topic Model based on Mixed Gamma-Negative Binomial Process arxiv:53.8535v [stat.ml] 3 Mar 25 ABSTRACT Junyu Xuan Jie Lu University of Technology University of Technology Sydney Sydney 5
More informationTruncation error of a superposed gamma process in a decreasing order representation
Truncation error of a superposed gamma process in a decreasing order representation Julyan Arbel Inria Grenoble, Université Grenoble Alpes julyan.arbel@inria.fr Igor Prünster Bocconi University, Milan
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More informationNonparametric Bayesian Matrix Factorization for Assortative Networks
Nonparametric Bayesian Matrix Factorization for Assortative Networks Mingyuan Zhou IROM Department, McCombs School of Business Department of Statistics and Data Sciences The University of Texas at Austin
More informationCompound Random Measures
Compound Random Measures Jim Griffin (joint work with Fabrizio Leisen) University of Kent Introduction: Two clinical studies 3 CALGB8881 3 CALGB916 2 2 β 1 1 β 1 1 1 5 5 β 1 5 5 β Infinite mixture models
More informationInfering the Number of State Clusters in Hidden Markov Model and its Extension
Infering the Number of State Clusters in Hidden Markov Model and its Extension Xugang Ye Department of Applied Mathematics and Statistics, Johns Hopkins University Elements of a Hidden Markov Model (HMM)
More informationBayesian Nonparametric Regression for Diabetes Deaths
Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,
More informationBayesian nonparametric models of sparse and exchangeable random graphs
Bayesian nonparametric models of sparse and exchangeable random graphs F. Caron & E. Fox Technical Report Discussion led by Esther Salazar Duke University May 16, 2014 (Reading group) May 16, 2014 1 /
More informationOn prediction and density estimation Peter McCullagh University of Chicago December 2004
On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating
More informationarxiv: v1 [stat.ml] 8 Jan 2012
A Split-Merge MCMC Algorithm for the Hierarchical Dirichlet Process Chong Wang David M. Blei arxiv:1201.1657v1 [stat.ml] 8 Jan 2012 Received: date / Accepted: date Abstract The hierarchical Dirichlet process
More information