On the posterior structure of NRMI

On the posterior structure of NRMI
Igor Prünster, University of Turin, Collegio Carlo Alberto and ICER
Joint work with L.F. James and A. Lijoi
Isaac Newton Institute, BNR Programme, 8th August 2007

Outline
- CRM and NRMI: completely random measures (CRM); NRMIs; relation to other random probability measures
- Posterior structure: conjugacy; posterior characterization; predictive distributions; generalized Pólya urn scheme; the two-parameter Poisson-Dirichlet process
- Hierarchical mixture models: NRMI mixture model; the posterior distribution of the mixture model
- Some concluding remarks

Completely random measures

DEFINITION (Kingman, 1967). $\mu$ is a completely random measure (CRM) on $(\mathbb{X}, \mathscr{X})$ if
(i) $\mu(\emptyset) = 0$;
(ii) for any collection of disjoint sets $B_1, B_2, \ldots$ in $\mathscr{X}$, the random variables $\mu(B_1), \mu(B_2), \ldots$ are mutually independent and $\mu\big(\cup_{j \ge 1} B_j\big) = \sum_{j \ge 1} \mu(B_j)$.

Let $\mathcal{G}_\nu = \{g : \int_{\mathbb{X}} g(x)\, \mu(dx) < \infty\}$. Then $\mu$ is characterized by its Laplace functional
$$\mathbb{E}\Big[ e^{-\int_{\mathbb{X}} g(x)\,\mu(dx)} \Big] = \exp\Big\{ -\int_{\mathbb{R}^+ \times \mathbb{X}} \big[1 - e^{-v\,g(x)}\big]\,\nu(dv, dx) \Big\}$$
for any $g \in \mathcal{G}_\nu$. In the following, $\psi(\cdot)$ denotes the Laplace exponent, $\psi(\lambda) = \int_{\mathbb{R}^+ \times \mathbb{X}} \big[1 - e^{-\lambda v}\big]\,\nu(dv, dx)$.

$\Longrightarrow$ $\mu$ is identified by the intensity $\nu$ (which represents the intensity of the underlying Poisson random measure).
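
As a quick numerical illustration of the Laplace functional, the sketch below (my own addition, assuming a gamma CRM with intensity $\nu(dv,dx) = v^{-1} e^{-v}\, dv\, \alpha(dx)$, which is used later for the Dirichlet process) checks the formula for $g = \lambda\, \mathbf{1}_B$: the right-hand side reduces to $(1+\lambda)^{-\alpha(B)}$, consistent with $\mu(B) \sim \mathrm{Gamma}(\alpha(B), 1)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo check of the Laplace functional -- a sketch under the assumption of a
# gamma CRM, nu(dv, dx) = v^{-1} e^{-v} dv alpha(dx).  With g = lam * 1_B the formula
# gives E[exp(-lam * mu(B))] = exp(-alpha(B) * log(1 + lam)) = (1 + lam)^(-alpha(B)),
# consistent with mu(B) ~ Gamma(shape = alpha(B), rate = 1).

alpha_B, lam = 2.0, 0.7
mu_B = rng.gamma(shape=alpha_B, scale=1.0, size=200_000)     # draws of mu(B)
print("Monte Carlo estimate :", np.mean(np.exp(-lam * mu_B)))
print("Laplace functional   :", (1.0 + lam) ** (-alpha_B))
```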

Let $\alpha$ be a non-atomic and $\sigma$-finite measure on $\mathbb{X}$. According to the decomposition of $\nu$ we distinguish two classes of CRM:
(a) if $\nu(dv, dx) = \rho(dv)\,\alpha(dx)$, we say that $\mu$ is homogeneous;
(b) if $\nu(dv, dx) = \rho(dv \mid x)\,\alpha(dx)$, we say that $\mu$ is non-homogeneous.

Necessary assumptions for the normalization to be well defined:
(A) $\mu$ is almost surely finite, i.e. $\int_{\mathbb{R}^+ \times \mathbb{X}} \big[1 - e^{-\lambda v}\big]\,\nu(dv, dx) < \infty$ for every $\lambda > 0$; if $\mu$ is a homogeneous CRM, this amounts to $\alpha$ being a finite measure;
(B) $\mu$ is almost surely strictly positive, i.e. $\nu(\mathbb{R}^+ \times \mathbb{X}) = \infty$ (infinite activity of $\mu$).

Normalized random measures with independent increments (NRMIs)

DEFINITION. Let $\mu$ be a CRM on $(\mathbb{X}, \mathscr{X})$ satisfying (A) and (B). Then the random probability measure on $(\mathbb{X}, \mathscr{X})$ given by
$$\tilde P(\cdot) = \frac{\mu(\cdot)}{\mu(\mathbb{X})}$$
is well defined and is termed a normalized random measure with independent increments (NRMI).

An NRMI is uniquely characterized by the intensity $\nu$ of the corresponding CRM $\mu$: according to the structure of $\nu$ we distinguish homogeneous and non-homogeneous NRMIs.

Special cases of NRMI

1. Dirichlet process: let $\mu$ be a gamma CRM with $\alpha$ a finite measure on $\mathbb{X}$ and
$$\nu(dv, dx) = \frac{e^{-v}}{v}\, dv\, \alpha(dx)$$
$\Longrightarrow$ the NRMI is a Dirichlet process with parameter measure $\alpha$.

2. Normalized generalized gamma (GG) process: let $\mu$ be a GG CRM (Brix, 1999) with $\alpha$ a finite measure on $\mathbb{X}$, $\sigma \in (0,1)$, $\tau \ge 0$ and
$$\nu(dv, dx) = \frac{\sigma}{\Gamma(1-\sigma)}\, v^{-1-\sigma}\, e^{-\tau v}\, dv\, \alpha(dx)$$
$\Longrightarrow$ the NRMI is a normalized GG process with parameter measure $\alpha$.

3. Normalized extended gamma process: let $\mu$ be an extended gamma CRM (Dykstra and Laud, 1981) with
$$\nu(dv, dx) = \frac{e^{-b(x)\,v}}{v}\, dv\, \alpha(dx),$$
with $b$ a strictly positive function and $\alpha$ such that $\mu(\mathbb{X}) < \infty$ a.s.
$\Longrightarrow$ the NRMI is a normalized extended gamma process with parameters $\alpha$ and $b$.
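
As a sanity check on cases 1 and 2, the sketch below (my own illustration, assuming a homogeneous intensity with total mass $a = \alpha(\mathbb{X})$ and $\tau = 1$ for the GG case) integrates the Laplace exponent $\psi(u)$ numerically and compares it with the standard closed forms $a\log(1+u)$ and $a\big[(1+u)^\sigma - 1\big]$, which are reused in the later sketches.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn

# Numerical check of the Laplace exponents of the gamma and GG CRMs -- a sketch,
# assuming a homogeneous intensity with total mass a = alpha(X) and tau = 1.
# Standard closed forms compared against:
#   gamma CRM : psi(u) = a * log(1 + u)
#   GG CRM    : psi(u) = a * ((1 + u)^sigma - 1)

a, sigma, u = 2.0, 0.3, 1.5

psi_gamma, _ = quad(lambda v: a * (1 - np.exp(-u * v)) * np.exp(-v) / v, 0, np.inf)
psi_gg, _ = quad(lambda v: a * (1 - np.exp(-u * v)) * sigma / gamma_fn(1 - sigma)
                 * v ** (-1 - sigma) * np.exp(-v), 0, np.inf)

print(psi_gamma, a * np.log1p(u))              # should agree
print(psi_gg, a * ((1 + u) ** sigma - 1))      # should agree
```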

Relation to other random probability measures

Homogeneous NRMIs are members of the following families of random probability measures.

(i) Species sampling models (Pitman, 1996), defined as
$$\tilde P(\cdot) = \sum_{i \ge 1} P_i\, \delta_{X_i}(\cdot) + \Big(1 - \sum_{i \ge 1} P_i\Big) H(\cdot),$$
where $0 < P_i < 1$ are random weights such that $\sum_{i \ge 1} P_i \le 1$, independent of the locations $X_i$, which are i.i.d. from some non-atomic distribution $H$.
Problem: concrete assignment of the random weights $P_i$; stick-breaking procedure (Ishwaran and James, 2001).
Remark: a non-homogeneous NRMI is not a species sampling model, since its weights and locations are not independent.

(ii) Poisson-Kingman models (Pitman, 2003): more tractable than general species sampling models, but it is still difficult to derive expressions for posterior quantities.

Characterization of the Dirichlet process

$(X_n)_{n \ge 1}$ is a sequence of exchangeable observations with values in $\mathbb{X}$ governed by an NRMI. A sample $X^{(n)} = (X_1, \ldots, X_n)$ will contain:
- $X_1^*, \ldots, X_k^*$, the $k$ distinct observations in $X^{(n)}$;
- $n_j > 0$, the number of observations equal to $X_j^*$ ($j = 1, \ldots, k$).

Let $\mathscr{P}$ be the set of all NRMIs and let $\tilde P \in \mathscr{P}$. The posterior distribution of $\tilde P$, given $X^{(n)}$, is still in $\mathscr{P}$ if and only if $\tilde P$ is a Dirichlet process.

$\Longrightarrow$ CONJUGACY is a distinctive feature of the Dirichlet process.

Nonetheless, conditionally on a latent variable $U$ and the data $X^{(n)}$, the (posterior) NRMI $\tilde P \mid X^{(n)}, U$ is still an NRMI.

The latent variable U

$U$ is not a mere auxiliary variable: it has a precise meaning, summarizing the normalization procedure and the distribution of $\mu(\mathbb{X})$. Define
$$\tau_m(u \mid x) = \int_{\mathbb{R}^+} s^m\, e^{-us}\, \rho(ds \mid x) \qquad \text{for any } m \ge 1 \text{ and } x \in \mathbb{X}.$$

$U_0$ is a positive random variable with density
$$f_0(u) \propto e^{-\psi(u)} \int_{\mathbb{X}} \tau_1(u \mid x)\, \alpha(dx).$$

$U_n$ is a positive random variable whose density, conditional on the data $X^{(n)}$, is, for any $n \ge 1$,
$$f(u \mid X^{(n)}) \propto u^{n-1}\, e^{-\psi(u)} \prod_{j=1}^{k} \tau_{n_j}(u \mid X_j^*).$$

Remark: the distribution of $(U \mid X^{(n)})$ is a mixture of gamma distributions with mixing measure the posterior total mass $(\mu(\mathbb{X}) \mid X^{(n)})$:
$$f_{U \mid X^{(n)}}(u) = \int_{(0, +\infty)} \frac{y^n}{\Gamma(n)}\, u^{n-1}\, e^{-yu}\, Q\big(dy \mid X^{(n)}\big),$$
where $Q(\cdot \mid X^{(n)})$ denotes the posterior distribution of $\mu(\mathbb{X})$.
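
To make $f(u \mid X^{(n)})$ concrete, here is a small sketch (my own illustration, assuming the gamma CRM / Dirichlet-process case with total mass $a = \alpha(\mathbb{X})$, where $\psi(u) = a\log(1+u)$ and $\tau_m(u \mid x) = \Gamma(m)(1+u)^{-m}$, so that $f(u \mid X^{(n)}) \propto u^{n-1}(1+u)^{-(a+n)}$). It normalizes the density on a grid and checks, by a change of variables, that $U/(1+U) \mid X^{(n)} \sim \mathrm{Beta}(n, a)$.

```python
import numpy as np
from scipy.special import gammaln

# Sketch: f(u | X^(n)) in the gamma CRM / Dirichlet case with total mass a, where
# psi(u) = a*log(1+u) and tau_m(u|x) = Gamma(m)/(1+u)^m, hence
# f(u | X^(n)) ∝ u^(n-1) (1+u)^(-(a+n)).

def log_f_u(u, counts, a):
    """Unnormalised log of f(u | X^(n)) ∝ u^(n-1) e^(-psi(u)) prod_j tau_{n_j}(u|X_j*)."""
    counts = np.asarray(counts)
    n = counts.sum()
    return ((n - 1) * np.log(u)                 # u^(n-1)
            - a * np.log1p(u)                   # e^(-psi(u))
            + np.sum(gammaln(counts))           # prod_j Gamma(n_j) ...
            - n * np.log1p(u))                  # ... * (1+u)^(-n)

counts, a = [4, 2, 1], 1.5                      # k = 3 clusters, n = 7 observations
u = np.linspace(1e-4, 500, 500_000)
du = u[1] - u[0]
f = np.exp(log_f_u(u, counts, a))
f /= f.sum() * du                               # normalise on the grid

# change of variables: V = U/(1+U) given X^(n) is Beta(n, a)
mean_V = np.sum(u / (1 + u) * f) * du
print(mean_V, sum(counts) / (sum(counts) + a))  # should be close
```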

The posterior distribution of the CRM $\mu$

The posterior distribution of $\mu$, given $X^{(n)}$, is a mixture with respect to $f(u \mid X^{(n)})$. Given $U_n = u$ and $X^{(n)}$,
$$\mu \overset{d}{=} \mu_u + \sum_{i=1}^{k} J_i^{(u)}\, \delta_{X_i^*},$$
where
(i) the jump $J_i^{(u)}$ at $X_i^*$ has density $f_i(s) \propto s^{n_i}\, e^{-us}\, \rho(ds \mid X_i^*)$;
(ii) $\mu_u$ is a CRM with intensity $\nu^{(u)}(ds, dx) = e^{-us}\, \rho(ds \mid x)\, \alpha(dx)$;
(iii) $\mu_u$ and the $J_i^{(u)}$ ($i = 1, \ldots, k$) are independent.

The posterior distribution of the NRMI $\tilde P$

It now follows easily that, given $U_n$ and $X^{(n)}$, the (posterior) distribution of $\tilde P$ is again an NRMI:
$$\tilde P \mid U_n, X^{(n)} \;\overset{d}{=}\; \frac{\mu_u + \sum_{i=1}^{k} J_i^{(u)}\, \delta_{X_i^*}}{\mu_u(\mathbb{X}) + \sum_{i=1}^{k} J_i^{(u)}} \;\overset{d}{=}\; w\, \frac{\mu_u}{\mu_u(\mathbb{X})} + (1 - w) \sum_{i=1}^{k} \frac{J_i^{(u)}}{\sum_{r=1}^{k} J_r^{(u)}}\, \delta_{X_i^*},$$
with $w = \mu_u(\mathbb{X})\, \big(\mu_u(\mathbb{X}) + \sum_{i=1}^{k} J_i^{(u)}\big)^{-1}$.

The posterior distribution of the normalized GG process

Let $\tilde P$ be a normalized GG process (with $\tau = 1$). Then, given $U_n$ and $X^{(n)}$, the posterior distribution of $\mu$ can be represented as
$$\mu_u + \sum_{i=1}^{k} J_i^{(u)}\, \delta_{X_i^*},$$
where
(i) $\mu_u$ is a GG CRM with intensity measure
$$\nu^{(u)}(ds, dx) = \frac{\sigma}{\Gamma(1-\sigma)}\, s^{-1-\sigma}\, e^{-(u+1)s}\, ds\, \alpha(dx);$$
(ii) the fixed points of discontinuity coincide with the distinct observations $X_i^*$, with jumps $J_i^{(u)} \sim \mathrm{Gamma}(u+1,\, n_i - \sigma)$ for $i = 1, \ldots, k$;
(iii) $\mu_u$ and the $J_i^{(u)}$ ($i = 1, \ldots, k$) are independent.

Moreover, the distribution of $U$, conditional on $X^{(n)}$, is
$$f(u \mid X^{(n)}) \propto u^{n-1}\, e^{-\alpha(\mathbb{X})(1+u)^{\sigma}}\, (u+1)^{k\sigma - n}.$$
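
The result above translates directly into a posterior simulation step. Below is a minimal sketch (my addition, assuming a homogeneous normalized GG process with $\tau = 1$, total base mass $a = \alpha(\mathbb{X})$, and reading $\mathrm{Gamma}(u+1, n_i - \sigma)$ as rate $u+1$ and shape $n_i - \sigma$): it draws $U_n \mid X^{(n)}$ by inverse CDF on a grid and then the fixed jumps $J_i^{(u)}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Posterior simulation sketch for the normalized GG process (assumptions: tau = 1,
# total base mass a = alpha(X), homogeneous case; Gamma(u+1, n_i - sigma) read as
# rate u+1 and shape n_i - sigma).

def sample_U(counts, a, sigma, grid_max=500.0, grid_size=100_000):
    """Draw U_n | X^(n) by inverse CDF on a grid, from
    f(u | X^(n)) ∝ u^(n-1) (1+u)^(k*sigma - n) exp(-a (1+u)^sigma)."""
    counts = np.asarray(counts)
    n, k = counts.sum(), len(counts)
    u = np.linspace(1e-6, grid_max, grid_size)
    logf = (n - 1) * np.log(u) + (k * sigma - n) * np.log1p(u) - a * (1 + u) ** sigma
    cdf = np.cumsum(np.exp(logf - logf.max()))
    return np.interp(rng.uniform(), cdf / cdf[-1], u)

def sample_fixed_jumps(counts, u, sigma):
    """Jumps J_i^(u) at the distinct observations, given U_n = u."""
    return rng.gamma(shape=np.asarray(counts) - sigma, scale=1.0 / (u + 1.0))

counts, a, sigma = [4, 2, 1], 1.0, 0.5
u = sample_U(counts, a, sigma)
print("U_n =", round(u, 3), " fixed jumps:", sample_fixed_jumps(counts, u, sigma).round(3))
```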

Predictive distributions

The (predictive) distribution of $X_{n+1}$, given $X^{(n)}$, coincides with
$$\mathbb{P}\big[X_{n+1} \in dx_{n+1} \mid X_1, \ldots, X_n\big] = w^{(n)}\, \alpha(dx_{n+1}) + \frac{1}{n} \sum_{j=1}^{k} w_j^{(n)}\, \delta_{X_j^*}(dx_{n+1}),$$
where
$$w^{(n)} = \frac{1}{n} \int_0^{+\infty} u\, \tau_1(u \mid x_{n+1})\, f(u \mid X^{(n)})\, du, \qquad w_j^{(n)} = \int_0^{+\infty} u\, \frac{\tau_{n_j+1}(u \mid X_j^*)}{\tau_{n_j}(u \mid X_j^*)}\, f(u \mid X^{(n)})\, du.$$

For the homogeneous case one obtains the predictive distributions of Pitman (2003).
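
As a check that these weights behave as expected, the sketch below (my own illustration, again in the gamma CRM / Dirichlet case with total mass $a$, where $\tau_1(u \mid x) = (1+u)^{-1}$, $\tau_{m+1}/\tau_m = m/(1+u)$ and $f(u \mid X^{(n)}) \propto u^{n-1}(1+u)^{-(a+n)}$) evaluates $w^{(n)}$ and $w_j^{(n)}$ numerically and recovers the Blackwell-MacQueen urn masses $a/(a+n)$ and $n_j/(a+n)$.

```python
import numpy as np

# Numerical evaluation of the predictive weights -- a sketch in the gamma CRM /
# Dirichlet case with total mass a, where tau_1(u|x) = 1/(1+u),
# tau_{m+1}(u|x)/tau_m(u|x) = m/(1+u) and f(u | X^(n)) ∝ u^(n-1) (1+u)^(-(a+n)).

a, counts = 1.5, np.array([4, 2, 1])
n = counts.sum()

u = np.linspace(1e-6, 2_000, 2_000_000)
du = u[1] - u[0]
f = u ** (n - 1) * (1 + u) ** (-(a + n))
f /= f.sum() * du                                       # normalise on the grid

E = np.sum(u / (1 + u) * f) * du                        # integral of u/(1+u) * f(u | X^(n))
w_new = (1.0 / n) * E                                   # w^(n)
w_ties = counts * E                                     # w_j^(n), j = 1, ..., k

print("mass on a new value:", w_new * a, " Blackwell-MacQueen:", a / (a + n))
print("mass on the ties   :", w_ties / n, " Blackwell-MacQueen:", counts / (a + n))
```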

Sampling from the marginal distribution of the $X_i$'s

Note that, conditionally on $U_n = u$, the predictive distribution is
$$\mathbb{P}\big[X_{n+1} \in dx_{n+1} \mid X^{(n)}, U_n = u\big] \propto \tau_1(u \mid x_{n+1})\, \alpha(dx_{n+1}) + \sum_{j=1}^{k} \frac{\tau_{n_j+1}(u \mid X_j^*)}{\tau_{n_j}(u \mid X_j^*)}\, \delta_{X_j^*}(dx_{n+1}),$$
where $\kappa_1(u) = \int_{\mathbb{X}} \tau_1(u \mid x)\, \alpha(dx)$ is the total mass of the non-atomic component.

From this one can implement an analogue of the Pólya urn scheme in order to draw a sample $X^{(n)}$ from $\tilde P$. Let
$$m(dx \mid u) \propto \tau_1(u \mid x)\, \alpha(dx)$$
and, for any $i \ge 2$, set
$$m(dx_i \mid x^{(i-1)}, u) = \mathbb{P}\big[X_i \in dx_i \mid X^{(i-1)}, U_{i-1} = u\big].$$

Generalization of a Pólya urn scheme

1) Sample $U_0$ from $f_0(u)$.
2) Sample $X_1$ from $m(dx \mid U_0)$.
3) At step $i$:
   - sample $U_{i-1}$ from $f(u \mid X^{(i-1)})$;
   - generate $\xi_i$ from $m(d\xi \mid U_{i-1})$ and draw $X_i$ from $m(dx \mid X_1, \ldots, X_{i-1}, U_{i-1})$, i.e.
$$X_i = \begin{cases} \xi_i & \text{with prob.} \;\propto\; \kappa_1(U_{i-1}) \\[2pt] X_{j,i-1}^* & \text{with prob.} \;\propto\; \tau_{n_{j,i-1}+1}(U_{i-1} \mid X_{j,i-1}^*) \,/\, \tau_{n_{j,i-1}}(U_{i-1} \mid X_{j,i-1}^*), \end{cases}$$
where $X_{j,i-1}^*$ is the $j$-th distinct value among $X_1, \ldots, X_{i-1}$ and $n_{j,i-1} = \mathrm{card}\{X_s : X_s = X_{j,i-1}^*,\ s = 1, \ldots, i-1\}$.
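
Here is a runnable sketch of this urn scheme for the normalized GG case (my own illustration, under the assumptions: $\tau = 1$, stability index $\sigma$, base measure $\alpha = a \cdot N(0,1)$; the closed forms $\kappa_1(u) = a\sigma(1+u)^{\sigma-1}$, $\tau_{m+1}(u)/\tau_m(u) = (m-\sigma)/(1+u)$ and $f(u \mid X^{(i)}) \propto u^{i-1}(1+u)^{k\sigma - i} e^{-a(1+u)^{\sigma}}$ follow from the GG intensity). The latent $U$ draws use a simple grid-based inverse CDF.

```python
import numpy as np

rng = np.random.default_rng(2)

# Generalized Pólya urn sketch for a normalized GG process (assumptions: tau = 1,
# stability sigma, base measure alpha = a * P0 with P0 = N(0,1)).  Closed forms
# implied by the GG intensity:
#   kappa_1(u)               = a * sigma * (1+u)^(sigma-1)
#   tau_{m+1}(u)/tau_m(u)    = (m - sigma)/(1+u)
#   f(u | X^(i)) ∝ u^(i-1) (1+u)^(k*sigma - i) exp(-a (1+u)^sigma)
#   f_0(u)       ∝ exp(-a (1+u)^sigma) (1+u)^(sigma-1)

a, sigma, n = 1.0, 0.5, 20

def sample_u(logf, grid_max=500.0, grid_size=100_000):
    """Inverse-CDF draw from an unnormalised log-density on a grid."""
    u = np.linspace(1e-6, grid_max, grid_size)
    w = np.exp(logf(u) - logf(u).max())
    cdf = np.cumsum(w)
    return np.interp(rng.uniform(), cdf / cdf[-1], u)

# 1) U_0 from f_0;  2) X_1 from m(dx | U_0) = P0 in the homogeneous case
u = sample_u(lambda v: -a * (1 + v) ** sigma + (sigma - 1) * np.log1p(v))
values, counts = [rng.normal()], [1]

# 3) iterate: resample U, then draw X_i from the conditional predictive
for i in range(2, n + 1):
    k, cnt = len(values), np.array(counts)
    u = sample_u(lambda v: (i - 2) * np.log(v) + (k * sigma - (i - 1)) * np.log1p(v)
                 - a * (1 + v) ** sigma)                       # f(u | X^(i-1))
    w = np.append(cnt - sigma, a * sigma * (1 + u) ** sigma)   # common factor 1/(1+u) dropped
    j = rng.choice(k + 1, p=w / w.sum())
    if j == k:                                                 # new value xi_i ~ P0
        values.append(rng.normal())
        counts.append(1)
    else:                                                      # tie with the j-th distinct value
        counts[j] += 1

print("distinct values among", n, "draws:", len(values), " cluster sizes:", counts)
```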

Sampling the posterior random measure

Recall that, given $U_n = u$ and $X^{(n)}$,
$$\mu \overset{d}{=} \mu_u + \sum_{i=1}^{k} J_i^{(u)}\, \delta_{X_i^*}.$$

Algorithm:
(1) Sample $U_n$ from $f(u \mid X^{(n)})$.
(2) Sample $J_i^{(U_n)}$ from the density $f_i(s)\, ds \propto s^{n_i}\, e^{-U_n s}\, \rho(ds \mid X_i^*)$.
(3) Simulate a realization of the completely random measure $\mu_{U_n}$ with intensity measure $\nu^{(U_n)}(ds, dx) = e^{-U_n s}\, \rho(ds \mid x)\, \alpha(dx)$ via the Ferguson and Klass algorithm.
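
Step (3) can be sketched as follows for the normalized GG case (again my own illustration, assuming $\tau = 1$, a fixed latent value $u$, and base measure $\alpha = a \cdot N(0,1)$). The Ferguson and Klass algorithm generates the jumps in decreasing order by inverting the Lévy tail $N(x) = \nu^{(u)}\big((x,\infty) \times \mathbb{X}\big)$ at the arrival times of a unit-rate Poisson process; here the tail is evaluated through the incomplete-gamma identity $\Gamma(-\sigma, z) = \big(z^{-\sigma} e^{-z} - \Gamma(1-\sigma, z)\big)/\sigma$.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma as gamma_fn, gammaincc

rng = np.random.default_rng(3)

# Ferguson & Klass sketch for the CRM part mu_u of the posterior, normalized GG case
# (assumptions: tau = 1, latent value u fixed, base measure alpha = a * N(0,1)).
# Posterior intensity: nu^(u)(ds, dx) = sigma/Gamma(1-sigma) s^(-1-sigma) e^(-(1+u)s) ds a P0(dx)

a, sigma, u = 1.0, 0.5, 2.0

def levy_tail(x):
    """N(x) = nu^(u)((x, inf) x X): expected number of jumps exceeding x.
    Uses Gamma(-sigma, z) = (z^(-sigma) e^(-z) - Gamma(1-sigma, z)) / sigma."""
    z = (1.0 + u) * x
    return a * (1.0 + u) ** sigma * (z ** (-sigma) * np.exp(-z) / gamma_fn(1.0 - sigma)
                                     - gammaincc(1.0 - sigma, z))

def ferguson_klass(n_jumps):
    """Jumps in decreasing order: J_i solves N(J_i) = xi_i, with xi_i the arrival
    times of a unit-rate Poisson process on (0, inf)."""
    xi = np.cumsum(rng.exponential(size=n_jumps))
    return np.array([brentq(lambda x: levy_tail(x) - t, 1e-12, 50.0) for t in xi])

jumps = ferguson_klass(50)
locations = rng.normal(size=jumps.size)          # i.i.d. from P0 (homogeneous case)
print("five largest jumps:", jumps[:5].round(4), " truncated total mass:", jumps.sum().round(4))
```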

The two-parameter Poisson-Dirichlet process

The PD$(\sigma, \theta)$ process can be represented (Pitman, 1996) as a species sampling model $\sum_{i \ge 1} p_i\, \delta_{X_i}$ with stick-breaking weights
$$p_i = V_i \prod_{j=1}^{i-1} (1 - V_j), \qquad V_i \overset{ind}{\sim} \mathrm{Beta}(1 - \sigma,\, \theta + i\sigma), \qquad X_i \overset{iid}{\sim} H.$$

Using this representation, Pitman (1996) shows that
$$\tilde P \mid X^{(n)} \;\overset{d}{=}\; \Big(1 - \sum_{i=1}^{k} p_i^*\Big)\, \tilde P^{(k)} + \sum_{j=1}^{k} p_j^*\, \delta_{X_j^*},$$
where $\tilde P^{(k)} \sim \mathrm{PD}(\sigma, \theta + k\sigma)$ and $(p_1^*, \ldots, p_k^*) \sim \mathrm{Dir}(n_1 - \sigma, \ldots, n_k - \sigma, \theta + k\sigma)$.

The PD$(\sigma, \theta)$ process is also representable as a normalized measure
$$\tilde P(\cdot) = \frac{\phi(\cdot)}{\phi(\mathbb{X})},$$
but $\phi$ does not have independent increments (Pitman and Yor, 1997). Indeed, the Laplace functional of $\phi$ is of the form
$$\mathbb{E}\Big[ e^{-\int f(x)\, \phi(dx)} \Big] = \frac{1}{\Gamma(\theta)} \int_0^{\infty} u^{\theta - 1}\, e^{-\int (u + f(x))^{\sigma}\, P_0(dx)}\, du.$$
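
A minimal simulation sketch of the stick-breaking representation (my addition, with the assumptions: base distribution $H = N(0,1)$ and a fixed truncation at 1000 atoms): it builds the weights $p_i$, reports how much mass the truncation captures, and draws a small exchangeable sample from the truncated PD$(\sigma, \theta)$ random probability measure.

```python
import numpy as np

rng = np.random.default_rng(4)

# Truncated stick-breaking sketch of PD(sigma, theta) (assumptions: H = N(0,1),
# truncation at a fixed number of atoms).  V_i ~ Beta(1 - sigma, theta + i*sigma).

def pd_stick_breaking(sigma, theta, n_atoms=1_000):
    i = np.arange(1, n_atoms + 1)
    V = rng.beta(1.0 - sigma, theta + i * sigma)
    p = V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))   # stick-breaking weights
    x = rng.normal(size=n_atoms)                                # atom locations ~ H
    return p, x

p, x = pd_stick_breaking(sigma=0.5, theta=1.0)
print("mass captured by the truncation:", round(p.sum(), 4))

obs = rng.choice(x, size=50, p=p / p.sum())     # an exchangeable sample from the truncated P
print("distinct values among 50 draws:", np.unique(obs).size)
```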

Identify a latent variable $U_n$ such that $U_n \mid X^{(n)}$ has density
$$f(u \mid X^{(n)}) = \frac{\sigma}{\Gamma(k + \theta/\sigma)}\, u^{\theta + k\sigma - 1}\, e^{-u^{\sigma}}.$$
Then, given $U_n$ and $X^{(n)}$, the (posterior) distribution of $\phi$ coincides with the distribution of the random measure
$$\mu_u + \sum_{i=1}^{k} J_i^{(u)}\, \delta_{X_i^*},$$
where $\mu_u$ is a GG CRM with intensity
$$\nu^{(u)}(s) = \frac{\sigma}{\Gamma(1-\sigma)}\, s^{-1-\sigma}\, e^{-us},$$
and the jumps satisfy $J_i^{(u)} \sim \mathrm{Gamma}(u,\, n_i - \sigma)$. Finally, the jumps $J_i^{(u)}$ ($i = 1, \ldots, k$) and $\mu_u$ are, conditional on $U_n$, independent.

Hierarchical mixture models
$$Y_i \mid X_i \overset{ind}{\sim} f(\cdot \mid X_i), \qquad X_i \mid \tilde P \overset{iid}{\sim} \tilde P, \qquad \tilde P \sim \mathrm{NRMI}.$$
Equivalently, $Y^{(n)} = (Y_1, \ldots, Y_n)$ are exchangeable draws from the random density
$$\tilde f(\cdot) = \int_{\mathbb{X}} f(\cdot \mid x)\, \tilde P(dx).$$
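
For intuition, the sketch below (my own illustration, in the Dirichlet-process special case with total mass $a$, base measure $N(0, 2^2)$ and a Gaussian kernel with standard deviation 0.3) generates data from this model via a truncated stick-breaking representation of $\tilde P$.

```python
import numpy as np

rng = np.random.default_rng(5)

# Data-generating sketch for the mixture model -- Dirichlet-process special case
# (assumptions: total mass a, base measure P0 = N(0, 2^2), kernel f(.|x) = N(x, 0.3^2)),
# using a truncated stick-breaking representation of P.

a, n_atoms, n_obs = 2.0, 500, 200
V = rng.beta(1.0, a, size=n_atoms)                            # DP stick-breaking: V_i ~ Beta(1, a)
p = V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))
atoms = rng.normal(0.0, 2.0, size=n_atoms)                    # atom locations ~ P0

X = rng.choice(atoms, size=n_obs, p=p / p.sum())              # X_i | P ~ P (i.i.d.)
Y = rng.normal(X, 0.3)                                        # Y_i | X_i ~ f(. | X_i)
print("distinct mixture components used by the sample:", np.unique(X).size)
```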

The posterior distribution of the mixture model

The posterior density of $\tilde f$, given the observations $Y^{(n)}$, is
$$\int_{\mathbb{X}} f(\cdot \mid x)\, \tilde P\big(dx \mid Y^{(n)}\big),$$
where $\tilde P(dx \mid Y^{(n)})$ is the (posterior) random probability measure whose distribution is
$$P\big(d\tilde p \mid Y^{(n)}\big) = \int P\big(d\tilde p \mid X^{(n)}\big)\, P\big(dX^{(n)} \mid Y^{(n)}\big),$$
with:
- $P(d\tilde p \mid X^{(n)})$, the (posterior) distribution of the NRMI $\tilde P$, given $X^{(n)}$;
- $P(dX^{(n)} \mid Y^{(n)})$, determined via Bayes theorem as
$$\frac{\big\{\prod_{i=1}^{n} f(Y_i \mid X_i)\big\}\, m\big(dX^{(n)}\big)}{\int \big\{\prod_{i=1}^{n} f(Y_i \mid X_i)\big\}\, m\big(dX^{(n)}\big)},$$
where $m(dX^{(n)})$ is the marginal distribution of the latent variables.

Remark: in any mixture model, the crucial point is the determination of a tractable expression for $P(d\tilde p \mid X^{(n)})$: once available, by following Escobar and West (1995) the derivation of a simulation algorithm is trivial.
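
To illustrate the last remark, here is a compact marginal Gibbs sampler in the Dirichlet-process special case, in the spirit of Escobar and West (1995) (a sketch under my own assumptions: Gaussian kernel with known variance and conjugate Gaussian base measure; it is not the general NRMI algorithm, which would additionally update the latent variable $U$). Each latent $X_i$ is resampled from its conditional predictive given $X_{-i}$ and $Y_i$.

```python
import numpy as np

rng = np.random.default_rng(6)

# Marginal Gibbs sketch, Dirichlet-process special case (Escobar & West style).
# Assumptions (mine): kernel f(y|x) = N(y; x, s2) with s2 known, base measure
# alpha = a * N(0, B2).  Not the general NRMI sampler, which would also update U.

def gibbs_dp_mixture(Y, a=1.0, s2=0.25, B2=4.0, n_iter=500):
    n = len(Y)
    X = Y.copy()                                         # start with one cluster per point
    for _ in range(n_iter):
        for i in range(n):
            vals, cnts = np.unique(np.delete(X, i), return_counts=True)
            # existing values: n_{j,-i} * N(Y_i; X_j*, s2);  new value: a * N(Y_i; 0, s2+B2)
            w_old = cnts * np.exp(-0.5 * (Y[i] - vals) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
            w_new = a * np.exp(-0.5 * Y[i] ** 2 / (s2 + B2)) / np.sqrt(2 * np.pi * (s2 + B2))
            w = np.append(w_old, w_new)
            j = rng.choice(w.size, p=w / w.sum())
            if j < vals.size:
                X[i] = vals[j]
            else:                                        # conjugate draw of a brand-new location
                var = 1.0 / (1.0 / B2 + 1.0 / s2)
                X[i] = rng.normal(var * Y[i] / s2, np.sqrt(var))
    return X

Y = np.concatenate([rng.normal(-2, 0.5, 40), rng.normal(2, 0.5, 60)])
X = gibbs_dp_mixture(Y)
print("distinct latent values after the Gibbs sweeps:", np.unique(X).size)
```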

Some concluding remarks

Question 1: Is it preferable to specify a GG prior as mixing measure (which includes the Dirichlet process as a special case), or to stick with the Dirichlet process and enrich it with hyperpriors? What about parsimony in model specification?

Question 2: Do we need applied statistical motivations for the introduction of new classes of priors? E.g., the beta process (Hjort, 1990) was introduced for survival analysis, but turned out to also be the de Finetti measure of the Indian Buffet Process. Random probability measures are objects of interest in their own right, well beyond what we may think: e.g., the distribution of a mean functional of the two-parameter PD process is relevant for the study of phylogenetic trees.

Some concluding remarks

The mixture model is not the only use one can make of discrete nonparametric priors: if the data come from a discrete distribution, then it is reasonable to model the data with a discrete nonparametric prior (see Ramses' talk). This is a simpler context, and there one gets a real feeling for the limitations of the Dirichlet process: prediction is not monotone in the number of observed species.