Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes: Supplementary Material
A The Negative Binomial Process: Details

A.1 Negative binomial process random count matrix

To generate a random count matrix we construct a gamma-Poisson process as

  X_j | G ~ PP(G),  G ~ ΓP(G_0, 1/c).   (A.1)

Zhou and Carin (2015) derive the marginal distribution of X = Σ_{j=1}^J X_j and call it the negative binomial process (NBP), a draw from which is represented as an exchangeable random count vector. We do not consider that simplification in this paper, and consequently our definition of the NBP, a draw from which is represented as a row-column exchangeable random count matrix, differs from the one in Zhou and Carin (2015).

The conditional likelihood in (4) can be rewritten as

  p({X_j}_{j=1}^J | G) = e^{-J G(Ω)} ∏_{k=1}^{K_J} [ r_k^{n_{·k}} / ∏_{j=1}^J n_{jk}! ].

Applying the Palm formula (Daley and Vere-Jones, 1988; James, 2002; Bertoin, 2006; Caron et al., 2014) to the expectation E_G[p({X_j}_{j=1}^J | G)], we have

  E_G[p({X_j}_{j=1}^J | G)] = { ∏_{k=1}^{K_J} ∫_{R_+ × Ω} [ r_k^{n_{·k}} / ∏_{j=1}^J n_{jk}! ] e^{-J r_k} ν(dr_k dω_k) } · E_G[ e^{-J G(Ω\D_J)} ].

Direct calculation with ∫_{R_+ × Ω} r^n e^{-Jr} ν(dr dω) = γ_0 Γ(n)/(J+c)^n and E_G[e^{-J G(Ω\D_J)}] = (1 + J/c)^{-γ_0} leads to

  p({X_j}_{j=1}^J | γ_0, c) = E_G[p({X_j}_{j=1}^J | G)] = γ_0^{K_J} e^{-γ_0 ln((J+c)/c)} ∏_{k=1}^{K_J} [ Γ(n_{·k}) / ( (J+c)^{n_{·k}} ∏_{j=1}^J n_{jk}! ) ].

B Gamma-Negative Binomial Process: Details

B.1 GNBP random count matrix

Given the gamma process G ~ ΓP(G_0, 1/c), we define X | G ~ NBP(G, p) as a negative binomial process such that X(A) ~ NB(G(A), p) for each A ⊂ Ω. Replacing the Poisson processes in (A.1) with the negative binomial processes defined in this way yields a gamma-negative binomial process (GNBP):

  X_j | G ~ NBP(G, p_j),  G ~ ΓP(G_0, 1/c).

With a draw from the gamma process G ~ ΓP(G_0, 1/c) expressed as G = Σ_k r_k δ_{ω_k}, a draw from X_j | G ~ NBP(G, p_j) can be expressed as X_j = Σ_k n_{jk} δ_{ω_k}, n_{jk} ~ NB(r_k, p_j). The GNBP employs row-specific probability parameters p_j to model row heterogeneity, and hence the X_j are conditionally independent but not identically distributed if the p_j at different rows are set differently. Note that the GNBP was previously proposed in Zhou and Carin (2015), which focuses on finding the conditional posterior of G without considering the marginalization of G.

The GNBP hierarchical construction is conceptually simple, but to obtain a random count matrix we have to marginalize out the gamma process G ~ ΓP(G_0, 1/c). As it is difficult to directly marginalize G out of the conditional likelihood of the observed J rows,

  p({X_j}_{j=1}^J | G, p) = ∏_{j=1}^J ∏_k [ Γ(n_{jk} + r_k) / (n_{jk}! Γ(r_k)) ] p_j^{n_{jk}} (1 - p_j)^{r_k},

where p := (p_1, ..., p_J), we first augment each n_{jk} ~ NB(r_k, p_j) under its compound Poisson representation as

  n_{jk} ~ SumLog(l_{jk}, p_j),  l_{jk} ~ Pois(r_k q_j),

where q_j := -ln(1 - p_j).
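The compound Poisson representation above can be checked by simulation. The sketch below (Python; all function names and the naive inversion/Knuth samplers are our own choices, adequate only for small parameters) draws n by compounding and compares its empirical mean with the negative binomial mean r p/(1 - p):

```python
import math
import random

rng = random.Random(7)

def sample_log(p):
    # Logarithmic(p) draw by inversion: PMF p^n / (-n ln(1-p)), n = 1, 2, ...
    u, n, cdf = rng.random(), 1, 0.0
    norm = -math.log1p(-p)
    while True:
        cdf += p ** n / (n * norm)
        if u <= cdf:
            return n
        n += 1

def sample_poisson(lam):
    # Knuth's method; fine for the small means used here
    L, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= L:
            return k
        k += 1

def nb_via_compound_poisson(r, p):
    # n ~ SumLog(l, p) with l ~ Pois(r * q), q = -ln(1 - p);
    # marginally n ~ NB(r, p)
    l = sample_poisson(r * (-math.log1p(-p)))
    return sum(sample_log(p) for _ in range(l))

# The empirical mean should approach E[NB(r, p)] = r p / (1 - p) = 2 here
draws = [nb_via_compound_poisson(2.0, 0.5) for _ in range(20000)]
print(sum(draws) / len(draws))
```

The check works because a Pois(r q) number of Log(p) increments has exactly the NB(r, p) marginal, which is the augmentation exploited throughout this section.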
Define X ~ SumLogP(L, p) as a sum-logarithmic process such that X(A) ~ SumLog(L(A), p) for each A ⊂ Ω. With X_j | G ~ NBP(G, p_j) augmented as X_j ~ SumLogP(L_j, p_j), L_j | G ~ PP(q_j G), we may express the joint likelihood of X_j and L_j as

  p({X_j, L_j}_{j=1}^J | G, p) = ∏_{j=1}^J ∏_k [ |s(n_{jk}, l_{jk})| r_k^{l_{jk}} p_j^{n_{jk}} (1 - p_j)^{r_k} / n_{jk}! ].

With l_{·k} := Σ_{j=1}^J l_{jk} and q_· := Σ_{j=1}^J q_j, similar to the analysis in Section A, we can re-express the likelihood as

  p({X_j, L_j}_{j=1}^J | G, p) = e^{-q_· G(Ω\D_J)} ∏_{k=1}^{K_J} [ r_k^{l_{·k}} e^{-q_· r_k} ∏_{j=1}^J |s(n_{jk}, l_{jk})| p_j^{n_{jk}} / n_{jk}! ].   (B.1)

Similar to the analysis in Section A.1, with G marginalized out as p({X_j, L_j}_{j=1}^J | γ_0, c, p) = E_G[p({X_j, L_j}_{j=1}^J | G, p)], we obtain the GNBP random matrix prior in (10) using

  f(N_J, L_J | γ_0, c, p) = p({X_j, L_j}_{j=1}^J | γ_0, c, p) / K_J!.   (B.2)

Although not obvious, one may verify that (10) defines the PMF of a compound random count matrix, which can be generated via

  n_{jk} ~ SumLog(l_{jk}, p_j),
  (l_{1k}, ..., l_{Jk}) ~ Mult(l_{·k}; q_1/q_·, ..., q_J/q_·),
  l_{·k} ~ Log( q_·/(c + q_·) ),
  K_J ~ Pois{ γ_0 [ln(c + q_·) - ln c] }.   (B.3)

Let (σ(1), ..., σ(J)) denote a random permutation of the row indices. If the p_j are set differently for different rows, then Mult(l_{·k}; q_{σ(1)}/q_·, ..., q_{σ(J)}/q_·) is not equal in distribution to Mult(l_{·k}; q_1/q_·, ..., q_J/q_·), and hence the introduced random count matrix no longer maintains row exchangeability.

Comparing (B.3) with (6), one may identify several key differences between the GNBP and NBP random count matrices. First, one may increase p_j to encourage the jth row to have larger counts than the others. Second, both n_{jk} and the column sum n_{·k} are generated from compound distributions. In fact, if we let p_j = 1 - e^{-1} (so that q_j = 1) for all j, then the matrix {l_{jk}}_{j,k} in (B.3) is exactly an NBP random count matrix, and the GNBP builds its random matrix using n_{jk} ~ SumLog(l_{jk}, p_j).

The sequential construction of a GNBP random count matrix can be intuitively explained as drawing dishes, drawing tables at each dish, and then drawing customers at each table. Similar to the definition of N^+_{J+1}, we let L^+_{J+1} represent the new row and columns added to L_J. Using (10), following the analysis in Section 2.1, one may show with direct calculation that

  p(N^+_{J+1}, L^+_{J+1} | N_J, L_J, θ) = [K_J! K^+_{J+1}! / K_{J+1}!] ∏_{k=1}^{K_{J+1}} SumLog(n_{J+1,k}; l_{J+1,k}, p_{J+1}) ∏_{k=1}^{K_J} NB( l_{J+1,k}; l_{·k}, q_{J+1}/(c + q_· + q_{J+1}) ) ∏_{k=K_J+1}^{K_{J+1}} Log( l_{J+1,k}; q_{J+1}/(c + q_· + q_{J+1}) ) Pois{ K^+_{J+1}; γ_0 [ln(c + q_· + q_{J+1}) - ln(c + q_·)] }.   (B.4)

Thus to add a new row, we first draw NB[l_{·k}, q_{J+1}/(c + q_· + q_{J+1})] tables at existing columns (dishes); we then draw K^+_{J+1} ~ Pois{γ_0 [ln(c + q_· + q_{J+1}) - ln(c + q_·)]} new dishes, each of which is associated with Log[q_{J+1}/(c + q_· + q_{J+1})] tables; we further draw Log(p_{J+1}) customers at each table and aggregate the counts across the tables of the same dish as n_{J+1,k} = Σ_{t=1}^{l_{J+1,k}} n_{J+1,k,t}; and in the final step we insert the K^+_{J+1} new columns into the K_J original columns without reordering, which again is a one-to-[K_{J+1}!/(K_J! K^+_{J+1}!)] mapping. We emphasize that the number of tables (customers) for a new dish, which follows a logarithmic (sum-logarithmic) distribution, must be at least one; the implication is that there are infinitely many dishes that have not yet been ordered by any of the tables seated by existing customers. The sequential construction provides a convenient way to construct a GNBP random count matrix one row at a time.

With the latent counts l_{J+1,k} marginalized out, one may show that the predictive distribution for N^+_{J+1} given N_J and L_J can be expressed in terms of the Poisson, LogLog, and GNB distributions as

  p(N^+_{J+1} | N_J, L_J, θ) = [K_J! K^+_{J+1}! / K_{J+1}!] ∏_{k=1}^{K_J} GNB( n_{J+1,k}; l_{·k}, c + q_·, p_{J+1} ) ∏_{k=K_J+1}^{K_{J+1}} LogLog( n_{J+1,k}; c + q_·, p_{J+1} ) Pois{ K^+_{J+1}; γ_0 [ln(c + q_· + q_{J+1}) - ln(c + q_·)] },   (B.5)

where n ~ LogLog(c, p) represents a logarithmic mixed sum-logarithmic distribution defined on the positive integers and n ~ GNB(l, c, p) represents a gamma mixed negative binomial distribution defined on the nonnegative integers, whose PMFs are shown in Appendix D.

B.2 Inference for parameters

Both the GNB and LogLog distributions have complicated PMFs involving Stirling numbers of the first kind, and it seems difficult to infer their parameters directly. Fortunately, using the likelihoods (B.1) and (10) and the data augmentation techniques developed for the negative binomial distribution (Zhou and Carin, 2015), we are able to derive closed-form conditional posteriors for the GNBP. To complete the model, we let γ_0 ~ Gamma(e_0, 1/f_0), p_j ~ Beta(a_0, b_0), and c ~ Gamma(c_0, 1/d_0). We sample the model parameters as

  (γ_0 | -) ~ Gamma( e_0 + K_J, 1/(f_0 + ln((c + q_·)/c)) ),
  (l_{jk} | -) = Σ_{t=1}^{n_{jk}} u_t,  u_t ~ Bernoulli( r_k/(r_k + t - 1) ),
  (r_k | -) ~ Gamma( l_{·k}, 1/(c + q_·) ),
  G(Ω\D_J) | - ~ Gamma( γ_0, 1/(c + q_·) ),
  (p_j | -) ~ Beta( a_0 + n_{j·}, b_0 + G(Ω) ),
  (c | -) ~ Gamma( c_0 + γ_0, 1/(d_0 + G(Ω)) ).   (B.6)
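The l_{jk} update in (B.6) is the Chinese restaurant table (CRT) draw used throughout the negative binomial augmentation literature. A minimal sketch (Python; the function name is our own):

```python
import random

def sample_crt(n, r, rng=random):
    # Draw l = sum_{t=1}^{n} u_t with u_t ~ Bernoulli(r / (r + t - 1)):
    # the conditional posterior of the latent table count l_jk given n_jk and r_k
    return sum(rng.random() < r / (r + t - 1.0) for t in range(1, n + 1))
```

Note that l = 0 if and only if n = 0, and l = 1 is forced when n = 1, matching the constraint that every dish with at least one customer has at least one table.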
C Beta-Negative Binomial Process: Details

C.1 BNBP random count matrix

The GNBP generalizes the NBP by replacing the Poisson process in (A.1) with a negative binomial process, and shares the negative binomial dispersion parameters across rows. Exploiting an alternative strategy that shares the negative binomial probability parameters across rows, we construct a BNBP as

  X_j | B ~ NBP(r_j, B),  B ~ BP(c, B_0),

where p_k = B(ω_k) is the weight of the atom ω_k of the beta process B ~ BP(c, B_0), and X_j | B ~ NBP(r_j, B) is a negative binomial process such that X_j(A) = Σ_{k: ω_k ∈ A} n_{jk}, n_{jk} ~ NB(r_j, p_k), for each A ⊂ Ω. With r := (r_1, ..., r_J), similar to the analysis in Appendix B, the likelihood of the BNBP can be expressed as

  p({X_j}_{j=1}^J | B, r) = e^{-r_· p^*} ∏_{k=1}^{K_J} [ p_k^{n_{·k}} (1 - p_k)^{r_·} ∏_{j=1}^J Γ(n_{jk} + r_j) / (n_{jk}! Γ(r_j)) ],   (C.1)

where p^* denotes the sum over all the atoms in the absolutely continuous space Ω\D_J, as p^* := -Σ_{k: n_{·k}=0} ln(1 - p_k), and r_· := Σ_{j=1}^J r_j. Using the Lévy-Khintchine theorem, the Laplace transform of p^* can be expressed as

  E[e^{-s p^*}] = exp{ -∫_{[0,1]×Ω} [1 - (1 - p)^s] ν(dp dω) } = exp{ -γ_0 Σ_{i=0}^∞ [ 1/(c + i) - 1/(c + i + s) ] } = exp{ -γ_0 [ψ(c + s) - ψ(c)] },

where ψ(x) = Γ'(x)/Γ(x) is the digamma function; we define such a random variable as the logbeta random variable, p^* ~ logbeta(γ_0, c), whose mean and variance are E[p^*] = γ_0 ψ_1(c) and Var[p^*] = -γ_0 ψ_2(c), respectively, where ψ_n(x) = d^n ψ(x)/dx^n.

As before, one may verify with direct calculation that the marginal likelihood defines the PMF of a column-i.i.d. random count matrix N_J ∈ Z^{J × K_J}, which can be generated via

  n_{:k} ~ DirMult( n_{·k}, r_1, ..., r_J ),
  n_{·k} ~ Digam( r_·, c ),
  K_J ~ Pois{ γ_0 [ψ(c + r_·) - ψ(c)] },   (C.2)

where the PMFs of both the Dirichlet-multinomial (DirMult) and digamma distributions are shown in the Appendix. Note that if the r_j are set differently for different rows, then DirMult(n_{·k}; r_{σ(1)}, ..., r_{σ(J)}) is not equal in distribution to DirMult(n_{·k}; r_1, ..., r_J), and hence the corresponding random count matrix no longer maintains row exchangeability.

The sequential construction of a BNBP random count matrix can be intuitively understood as an ice cream buffet process (ICBP). Similar to the analysis in Section 2.1, we have

  p(N^+_{J+1} | N_J) = [K_J! K^+_{J+1}! / K_{J+1}!] ∏_{k=1}^{K_J} BNB( n_{J+1,k}; r_{J+1}, n_{·k}, c + r_· ) ∏_{k=K_J+1}^{K_{J+1}} Digam( n_{J+1,k}; r_{J+1}, c + r_· ) Pois{ K^+_{J+1}; γ_0 [ψ(c + r_· + r_{J+1}) - ψ(c + r_·)] },   (C.3)

where the PMF of the beta-negative binomial (BNB) distribution is shown in Appendix D. Thus to add a row to N_J ∈ Z^{J × K_J}, customer J+1 takes n_{J+1,k} ~ BNB(r_{J+1}, n_{·k}, c + r_·) scoops at each existing ice cream (column); the customer further selects K^+_{J+1} ~ Pois{γ_0 [ψ(c + r_· + r_{J+1}) - ψ(c + r_·)]} new ice creams out of the buffet line and takes n_{J+1,k} ~ Digam(r_{J+1}, c + r_·) scoops at each new ice cream. Thus the ICBP can also be considered as a multiple-scoop Indian buffet process, an analogy used in Zhou et al. (2012). Note that when r_j ≡ 1 we have K^+_{J+1} ~ Pois[γ_0/(c + J)], confirming the derivation about the number of new dishes (ice creams) in Section 3.2 of Zhou et al. (2012), which, however, provides no description of the distributions of the number of scoops at existing and new ice creams. We emphasize that the number of scoops at a new ice cream, which follows a digamma distribution, must be at least one; the implication is that there are infinitely many ice creams in the buffet line that have not yet been scooped by any of the existing customers. Similar to the GNBP random count matrix, the BNBP random count matrix is column exchangeable but not row exchangeable if the row-specific dispersion parameters r_j are fixed at different values.

A related marked BNBP of Zhou et al. (2012) and Zhou and Carin (2012) attaches an independent negative binomial dispersion parameter r_k to each atom of the beta process and infers its value under a finite approximation of the beta process; another related BNBP of Broderick et al. (2015) uses a single dispersion parameter r and sets its value empirically. None of these papers, however, marginalize out the beta process to define a prior on column-i.i.d. random count matrices, a challenge tackled in this paper. Independently of our work, Heaukulani and Roy (2013) also describe the marginalization of the beta process from the negative binomial process, where the obtained BNBP is called the negative binomial Indian buffet process. Although the idea of marginalizing out the beta process is shared by both papers, the techniques and combinatorial arguments used are quite different. Their paper focuses on a special case of the BNBP where a single dispersion parameter r is used for all the X_j's. Our model allows row-specific dispersion parameters r_j, develops an efficient inference scheme for all model parameters, derives the predictive distribution of a new row count vector under a BNBP random count matrix, and also situates the BNBP in the larger family of count-matrix priors derived from negative binomial processes. Due to a different parameterization of the Lévy measure, the beta process mass parameter γ_0 in this paper can be considered as γ_0·c in Thibaux and Jordan (2007) and Zhou et al. (2012).
C.2 Inference for parameters

For all the atoms in the absolutely continuous part of the space Ω\D_J, we have that ν(dp dω) = p^{-1}(1 - p)^{c + r_· - 1} dp B_0(dω). Thus the Laplace transform of p^* can be expressed as E[e^{-s p^*}] = exp{-γ_0 [ψ(c + r_· + s) - ψ(c + r_·)]}, and hence we have p^* ~ logbeta(γ_0, c + r_·). With its Laplace transform, we sample p^* using the method proposed in Ridout (2009). To complete the model, we let γ_0 ~ Gamma(e_0, 1/f_0), r_j ~ Gamma(a_0, 1/b_0), and c ~ Gamma(c_0, 1/d_0). Using both the conditional likelihood (C.1) and the marginal likelihood, and the data augmentation techniques developed in Zhou and Carin (2015), we sample the model parameters as

  (γ_0 | -) ~ Gamma( e_0 + K_J, 1/(f_0 + ψ(c + r_·) - ψ(c)) ),
  (p_k | -) ~ Beta( n_{·k}, c + r_· ),
  (p^* | -) ~ logbeta( γ_0, c + r_· ),
  (l_{jk} | -) = Σ_{t=1}^{n_{jk}} u_t,  u_t ~ Bernoulli( r_j/(r_j + t - 1) ),
  (r_j | -) ~ Gamma( a_0 + l_{j·}, 1/(b_0 + p^* - Σ_{k=1}^{K_J} ln(1 - p_k)) ).   (C.4)

The only parameter that does not have an analytic conditional posterior is the concentration parameter c. Since, using Campbell's theorem (Kingman, 1993), we have E[Σ_k p_k] = ∫_{[0,1]×Ω} p ν(dp dω) = γ_0/c, to sample c we use

  Q(c) = Gamma( c_0 + γ_0, 1/(d_0 + p^* + Σ_{k=1}^{K_J} p_k) )   (C.5)

as the proposal distribution in an independence-chain Metropolis-Hastings sampling step. One may also sample c using a griddy-Gibbs sampler (Ritter and Tanner, 1992).
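As a cross-check on the logbeta draws (and as an alternative to Ridout's Laplace-transform method), one can sample p* approximately from the compound-Poisson representation in (D.2), truncating the infinite sum; the truncation level and function name below are our own choices:

```python
import math
import random

def sample_logbeta(gamma0, c, rng, trunc=200):
    # p* ~ sum over i of compound Poisson terms (D.2), truncated at i < trunc:
    # u_i ~ Pois(gamma0/(c+i)), lambda_it ~ Gamma(1, 1/(c+i)), i.e. Exp(rate c+i)
    total = 0.0
    for i in range(trunc):
        rate = c + i
        # Knuth Poisson draw; the means gamma0/(c+i) are assumed small
        L, u, prod = math.exp(-gamma0 / rate), 0, 1.0
        while True:
            prod *= rng.random()
            if prod <= L:
                break
            u += 1
        total += sum(rng.expovariate(rate) for _ in range(u))
    return total

# E[p*] = gamma0 * psi_1(c); for gamma0 = 2, c = 2 this is 2 (pi^2/6 - 1) ~ 1.29
rng = random.Random(3)
draws = [sample_logbeta(2.0, 2.0, rng) for _ in range(3000)]
print(sum(draws) / len(draws))
```

The empirical mean matches γ_0 ψ_1(c) up to the small tail mass lost to truncation, which shrinks like 1/trunc.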
D Some useful distributions

Direct calculation shows that the logarithmic mixed sum-logarithmic (LogLog) distribution, expressed as n ~ SumLog(l, p), l ~ Log( -ln(1-p) / (c - ln(1-p)) ), has PMF

  f_N(n | c, p) = [ p^n / ( n! { ln[c - ln(1-p)] - ln c } ) ] Σ_{l=1}^n |s(n, l)| Γ(l) / [c - ln(1-p)]^l

for n ∈ {1, 2, ...}; and the negative binomial mixed sum-logarithmic distribution, expressed as n ~ SumLog(l, p), l ~ NB( e, -ln(1-p)/(c - ln(1-p)) ), has PMF

  f_N(n | e, c, p) = [ c^e p^n / (Γ(e) n!) ] Σ_{l=0}^n |s(n, l)| Γ(e + l) / [c - ln(1-p)]^{e+l}

for n ∈ {0, 1, ...}. The iterative calculation of |s(n, l)|/n! on the logarithmic scale is described in Appendix E. Using (12), one may show that the negative binomial mixed sum-logarithmic distribution shown above is equivalent to a gamma mixed negative binomial (GNB) distribution generated by n ~ NB(r, p), r ~ Gamma(e, 1/c). Note that n ~ LogLog(c, p) is the limit of n ~ GNB(e, c, p) as e → 0, conditioning on n > 0; thus it can be considered as a truncated GNB distribution.

The Dirichlet-multinomial (DirMult) distribution (Mosimann, 1962; Madsen et al., 2005) is a Dirichlet mixed multinomial distribution with PMF

  DirMult(n_{:k} | n_{·k}, r) = [ n_{·k}! / ∏_{j=1}^J n_{jk}! ] · [ Γ(r_·) / Γ(n_{·k} + r_·) ] ∏_{j=1}^J Γ(n_{jk} + r_j) / Γ(r_j),

and the digamma distribution (Sibuya, 1979) has PMF

  Digam(n | r, c) = [ 1 / (ψ(c + r) - ψ(c)) ] · Γ(r + n) Γ(c + r) / ( n Γ(c + n + r) Γ(r) ),   (D.1)

where n = 1, 2, .... Since the beta-negative binomial (BNB) distribution has PMF

  f_N(n | r, e, c) = ∫_0^1 NB(n; r, p) Beta(p; e, c) dp = [ Γ(r + n) / (n! Γ(r)) ] · Γ(c + r) Γ(e + n) Γ(e + c) / ( Γ(e + c + r + n) Γ(e) Γ(c) ),

one may show that, conditioning on n > 0, n ~ BNB(r, e, c) becomes n ~ Digam(r, c) as e → 0. Thus the digamma distribution can be considered as a truncated BNB distribution. Since the Laplace transform of the logbeta random variable p^* ~ logbeta(γ_0, c) can be re-expressed as

  E[e^{-s p^*}] = exp{ -Σ_{i=0}^∞ [γ_0/(c + i)] [ 1 - (1 + s/(c + i))^{-1} ] },

we can generate p^* ~ logbeta(γ_0, c) as an infinite sum of independent compound Poisson random variables as

  p^* = Σ_{i=0}^∞ λ_i,  λ_i = Σ_{t=1}^{u_i} λ_{it},  u_i ~ Pois( γ_0/(c + i) ),  λ_{it} ~ Gamma( 1, 1/(c + i) ).   (D.2)

E Calculating Stirling Numbers of the First Kind

The unsigned Stirling numbers of the first kind |s(n, l)| appear in the predictive distribution for the GNBP. It is numerically unstable to recursively calculate |s(n, l)| based on |s(n+1, l)| = n|s(n, l)| + |s(n, l-1)|, as |s(n, l)| would rapidly reach the maximum value allowed by a finite-precision machine as n increases. Denoting g(n, l) = ln|s(n, l)| - ln(n!), starting from g(1, 1) = 0, we iteratively calculate g(n, l) with

  g(n+1, 1) = ln n - ln(n+1) + g(n, 1),
  g(n+1, n+1) = g(n, n) - ln(n+1),

and

  g(n+1, l) = ln( n/(n+1) ) + g(n, l) + ln{ 1 + exp[ g(n, l-1) - g(n, l) - ln n ] }

for 2 ≤ l ≤ n. This approach is found to be numerically stable.

References

J. Bertoin. Random fragmentation and coagulation processes, volume 102. Cambridge University Press, 2006.
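The recursions of Appendix E translate directly into code; this sketch (Python; the function name is our own) tabulates g(n, l) = ln|s(n, l)| - ln(n!) and recovers small Stirling numbers exactly:

```python
import math

def log_stirling1_table(nmax):
    # g[n][l] = ln|s(n,l)| - ln(n!), built with the log-scale recursions above
    g = [[None] * (nmax + 1) for _ in range(nmax + 1)]
    g[1][1] = 0.0                      # |s(1,1)| = 1, so ln 1 - ln 1! = 0
    for n in range(1, nmax):
        g[n + 1][1] = math.log(n) - math.log(n + 1) + g[n][1]
        g[n + 1][n + 1] = g[n][n] - math.log(n + 1)
        for l in range(2, n + 1):
            g[n + 1][l] = (math.log(n / (n + 1)) + g[n][l]
                           + math.log1p(math.exp(g[n][l - 1] - g[n][l] - math.log(n))))
    return g

g = log_stirling1_table(8)
# |s(4,2)| = 11: recover it as exp(g[4][2] + ln 4!)
print(round(math.exp(g[4][2] + math.log(math.factorial(4)))))  # 11
```

Because only ratios |s(n, l)|/n! are ever stored, the table stays within floating-point range for n far beyond the point where the raw integer recursion overflows.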
T. Broderick, L. Mackey, J. Paisley, and M. I. Jordan. Combinatorial clustering and the beta negative binomial process. IEEE Trans. Pattern Analysis and Machine Intelligence, 2015.

F. Caron, Y. W. Teh, and B. T. Murphy. Bayesian nonparametric Plackett-Luce models for the analysis of clustered ranked data. Annals of Applied Statistics, 2014.

D. J. Daley and D. Vere-Jones. An introduction to the theory of point processes, volume 2. Springer, 1988.

C. Heaukulani and D. M. Roy. The combinatorial structure of beta negative binomial processes. arXiv preprint, 2013.

L. F. James. Poisson process partition calculus with applications to exchangeable models and Bayesian nonparametrics. arXiv preprint, 2002.

J. F. C. Kingman. Poisson Processes. Oxford University Press, 1993.

R. E. Madsen, D. Kauchak, and C. Elkan. Modeling word burstiness using the Dirichlet distribution. In ICML, 2005.

J. E. Mosimann. On the compound multinomial distribution, the multivariate β-distribution, and correlations among proportions. Biometrika, 1962.

M. S. Ridout. Generating random numbers from a distribution specified by its Laplace transform. Statistics and Computing, 2009.

C. Ritter and M. A. Tanner. Facilitating the Gibbs sampler: the Gibbs stopper and the griddy-Gibbs sampler. Journal of the American Statistical Association, 1992.

M. Sibuya. Generalized hypergeometric, digamma and trigamma distributions. Annals of the Institute of Statistical Mathematics, 1979.

R. Thibaux and M. I. Jordan. Hierarchical beta processes and the Indian buffet process. In AISTATS, 2007.

M. Zhou and L. Carin. Augment-and-conquer negative binomial processes. In NIPS, 2012.

M. Zhou and L. Carin. Negative binomial process count and mixture modeling. IEEE Trans. Pattern Analysis and Machine Intelligence, 2015.

M. Zhou, L. Hannah, D. Dunson, and L. Carin. Beta-negative binomial process and Poisson factor analysis. In AISTATS, 2012.
Stic-Breaing Beta Processes and the Poisson Process John Paisley David M. Blei 3 Michael I. Jordan,2 Department of EECS, 2 Department of Statistics, UC Bereley 3 Computer Science Department, Princeton
More informationContents. Part I: Fundamentals of Bayesian Inference 1
Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian
More informationBayesian Nonparametric Models on Decomposable Graphs
Bayesian Nonparametric Models on Decomposable Graphs François Caron INRIA Bordeaux Sud Ouest Institut de Mathématiques de Bordeaux University of Bordeaux, France francois.caron@inria.fr Arnaud Doucet Departments
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationLecture 3a: Dirichlet processes
Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics
More informationThe Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations
The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables
More informationNonparametric Bayes tensor factorizations for big data
Nonparametric Bayes tensor factorizations for big data David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & DARPA N66001-09-C-2082 Motivation Conditional
More informationAdvanced Machine Learning
Advanced Machine Learning Nonparametric Bayesian Models --Learning/Reasoning in Open Possible Worlds Eric Xing Lecture 7, August 4, 2009 Reading: Eric Xing Eric Xing @ CMU, 2006-2009 Clustering Eric Xing
More informationExponential Families
Exponential Families David M. Blei 1 Introduction We discuss the exponential family, a very flexible family of distributions. Most distributions that you have heard of are in the exponential family. Bernoulli,
More informationExchangeable random hypergraphs
Exchangeable random hypergraphs By Danna Zhang and Peter McCullagh Department of Statistics, University of Chicago Abstract: A hypergraph is a generalization of a graph in which an edge may contain more
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationDirichlet Process. Yee Whye Teh, University College London
Dirichlet Process Yee Whye Teh, University College London Related keywords: Bayesian nonparametrics, stochastic processes, clustering, infinite mixture model, Blackwell-MacQueen urn scheme, Chinese restaurant
More informationStochastic Variational Inference for the HDP-HMM
Stochastic Variational Inference for the HDP-HMM Aonan Zhang San Gultekin John Paisley Department of Electrical Engineering & Data Science Institute Columbia University, New York, NY Abstract We derive
More informationNumerical Analysis for Statisticians
Kenneth Lange Numerical Analysis for Statisticians Springer Contents Preface v 1 Recurrence Relations 1 1.1 Introduction 1 1.2 Binomial CoefRcients 1 1.3 Number of Partitions of a Set 2 1.4 Horner's Method
More informationarxiv: v1 [stat.ml] 20 Nov 2012
A survey of non-exchangeable priors for Bayesian nonparametric models arxiv:1211.4798v1 [stat.ml] 20 Nov 2012 Nicholas J. Foti 1 and Sinead Williamson 2 1 Department of Computer Science, Dartmouth College
More information27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling
10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel
More information39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017
Permuted and IROM Department, McCombs School of Business The University of Texas at Austin 39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017 1 / 36 Joint work
More informationStudy Notes on the Latent Dirichlet Allocation
Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationStat 542: Item Response Theory Modeling Using The Extended Rank Likelihood
Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal
More informationCS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I
X i Ν CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr 2004 Dirichlet Process I Lecturer: Prof. Michael Jordan Scribe: Daniel Schonberg dschonbe@eecs.berkeley.edu 22.1 Dirichlet
More informationBeta processes, stick-breaking, and power laws
Beta processes, stick-breaking, and power laws Tamara Broderick Michael Jordan Jim Pitman Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-211-125
More informationCurve Fitting Re-visited, Bishop1.2.5
Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the
More informationInfinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix
Infinite-State Markov-switching for Dynamic Volatility Models : Web Appendix Arnaud Dufays 1 Centre de Recherche en Economie et Statistique March 19, 2014 1 Comparison of the two MS-GARCH approximations
More informationThe Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.
Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface
More informationA permutation-augmented sampler for DP mixture models
Percy Liang University of California, Berkeley Michael Jordan University of California, Berkeley Ben Taskar University of Pennsylvania Abstract We introduce a new inference algorithm for Dirichlet process
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationBeta processes, stick-breaking, and power laws
Beta processes, stick-breaking, and power laws T. Broderick, M. Jordan, J. Pitman Presented by Jixiong Wang & J. Li November 17, 2011 DP vs. BP Dirichlet Process Beta Process DP vs. BP Dirichlet Process
More informationOn the Fisher Bingham Distribution
On the Fisher Bingham Distribution BY A. Kume and S.G Walker Institute of Mathematics, Statistics and Actuarial Science, University of Kent Canterbury, CT2 7NF,UK A.Kume@kent.ac.uk and S.G.Walker@kent.ac.uk
More informationarxiv: v1 [stat.ml] 30 Mar 2015
Infinite Author Topic Model based on Mixed Gamma-Negative Binomial Process arxiv:53.8535v [stat.ml] 3 Mar 25 ABSTRACT Junyu Xuan Jie Lu University of Technology University of Technology Sydney Sydney 5
More informationTruncation error of a superposed gamma process in a decreasing order representation
Truncation error of a superposed gamma process in a decreasing order representation Julyan Arbel Inria Grenoble, Université Grenoble Alpes julyan.arbel@inria.fr Igor Prünster Bocconi University, Milan
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More informationNonparametric Bayesian Matrix Factorization for Assortative Networks
Nonparametric Bayesian Matrix Factorization for Assortative Networks Mingyuan Zhou IROM Department, McCombs School of Business Department of Statistics and Data Sciences The University of Texas at Austin
More informationCompound Random Measures
Compound Random Measures Jim Griffin (joint work with Fabrizio Leisen) University of Kent Introduction: Two clinical studies 3 CALGB8881 3 CALGB916 2 2 β 1 1 β 1 1 1 5 5 β 1 5 5 β Infinite mixture models
More informationInfering the Number of State Clusters in Hidden Markov Model and its Extension
Infering the Number of State Clusters in Hidden Markov Model and its Extension Xugang Ye Department of Applied Mathematics and Statistics, Johns Hopkins University Elements of a Hidden Markov Model (HMM)
More informationBayesian Nonparametric Regression for Diabetes Deaths
Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,
More informationBayesian nonparametric models of sparse and exchangeable random graphs
Bayesian nonparametric models of sparse and exchangeable random graphs F. Caron & E. Fox Technical Report Discussion led by Esther Salazar Duke University May 16, 2014 (Reading group) May 16, 2014 1 /
More informationOn prediction and density estimation Peter McCullagh University of Chicago December 2004
On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating
More informationarxiv: v1 [stat.ml] 8 Jan 2012
A Split-Merge MCMC Algorithm for the Hierarchical Dirichlet Process Chong Wang David M. Blei arxiv:1201.1657v1 [stat.ml] 8 Jan 2012 Received: date / Accepted: date Abstract The hierarchical Dirichlet process
More information