Dependent Random Measures and Prediction


1 Dependent Random Measures and Prediction
Igor Prünster
University of Torino & Collegio Carlo Alberto
10th Bayesian Nonparametric Conference, Raleigh, June 26, 2015
Joint work with: Federico Camerlenghi, Antonio Lijoi, Bernardo Nipoti and Peter Orbanz

2 Outline
- Exchangeability & Partial Exchangeability
- Dependence and its extreme cases
- Dependent Random Measures with Additive Structure
- Dependent Random Measures with Nested Structure
- Dependent Random Measures with Hierarchical Structure

3 Exchangeability & Partial Exchangeability: Exchangeability

Exchangeability:
- Probabilistic statement concerning homogeneity (symmetry, analogy) between past and future; justifies induction
  ⇒ inferences are not affected by the order of the observations.
- Suitable mathematical framework for inference, alternative to the classical approach with a fixed and unknown probability distribution P.

A sequence of observations (X_n)_{n≥1} is exchangeable if
  (X_1, X_2, ..., X_n) =_d (X_{π(1)}, X_{π(2)}, ..., X_{π(n)})
for any n ≥ 1 and any permutation π of (1, ..., n).

de Finetti's representation theorem: (X_n)_{n≥1} is exchangeable if and only if
  P[X_1 ∈ A_1, ..., X_n ∈ A_n] = \int_P \prod_{i=1}^n P(A_i) Q(dP),
where P is the space of probability measures on X.
⇒ Q is the de Finetti measure of (X_n)_{n≥1} and acts as a prior distribution for Bayesian inference, being the law of a random probability measure P̃.
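A minimal simulation sketch of this setup, assuming one concrete choice of Q, namely a Dirichlet process prior with total mass θ: the resulting exchangeable sequence can then be generated marginally through the Blackwell-MacQueen Pólya urn. Function and parameter names are illustrative.

```python
import numpy as np

def polya_urn(n, theta, base_sampler, rng):
    # Blackwell-MacQueen urn: X_{i+1} is a fresh draw from P0 with
    # probability theta / (theta + i), otherwise a uniformly chosen past value.
    X = []
    for i in range(n):
        if rng.random() < theta / (theta + i):
            X.append(base_sampler(rng))      # new atom drawn from P0
        else:
            X.append(X[rng.integers(i)])     # tie with a past observation
    return np.array(X)

rng = np.random.default_rng(0)
x = polya_urn(100, theta=2.0, base_sampler=lambda r: r.normal(), rng=rng)
print(np.unique(x).size, "distinct values among", x.size, "draws")
```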

4 Exchangeability & Partial Exchangeability: Beyond Exchangeability

In many applications with real data, dependence structures more general than exchangeability are required. Indeed, in de Finetti (1938):

"But the case of exchangeability can only be considered as a limiting case: the case in which this analogy is, in a certain sense, absolute for all events under consideration. [..] To get from the case of exchangeability to other cases which are more general but still tractable, we must take up the case where we still encounter analogies among the events under consideration, but without attaining the limiting case of exchangeability."

- nonparametric regression (covariate-indexed observations)
- time-dependent data
- change-point problems
- data from different related studies or experiments
- ...

5 Exchangeability & Partial Exchangeability: Partial exchangeability

Partial exchangeability: a sequence (X, Y) = (X_i, Y_j)_{i,j≥1} is partially exchangeable if
  (X_1, ..., X_{n_1}, Y_1, ..., Y_{n_2}) =_d (X_{π(1)}, ..., X_{π(n_1)}, Y_{φ(1)}, ..., Y_{φ(n_2)})
for any n_1, n_2 ≥ 1 and any permutations π and φ of (1, ..., n_1) and (1, ..., n_2).

de Finetti's representation theorem: (X, Y) = (X_i, Y_j)_{i,j≥1} is partially exchangeable if and only if
  P[X_1 ∈ A_1, ..., X_{n_1} ∈ A_{n_1}, Y_1 ∈ B_1, ..., Y_{n_2} ∈ B_{n_2}] = \int_{P^2} \prod_{i=1}^{n_1} P_1(A_i) \prod_{j=1}^{n_2} P_2(B_j) Q(dP_1, dP_2).

⇒ This is the same as saying that
  (X_i, Y_j) | (P̃_1, P̃_2) iid∼ P̃_1 ⊗ P̃_2,  i, j ≥ 1,
  (P̃_1, P̃_2) ∼ Q,
with (P̃_1, P̃_2) a vector of dependent random probabilities and Q the prior.

6 Exchangeability & Partial Exchangeability: Discrete partially exchangeable priors

Induced partition structure. Two samples X^{(n_1)} = (X_1, ..., X_{n_1}) and Y^{(n_2)} = (Y_1, ..., Y_{n_2}) from partially exchangeable sequences (X_j)_{j≥1} and (Y_i)_{i≥1}.
The prior Q selects almost surely discrete random probabilities (P̃_1, P̃_2) ⇒ ties within each sample and possibly also across different samples.

Partition of [N] = {1, ..., n_1 + n_2} induced by X^{(n_1)} and Y^{(n_2)} into:
- k_1 distinct values specific to X^{(n_1)}, with frequencies n^X_j (j = 1, ..., k_1);
- k_2 distinct values specific to Y^{(n_2)}, with frequencies n^Y_j (j = 1, ..., k_2);
- k_0 distinct values shared by the two samples, with frequencies q^X_r and q^Y_r (r = 1, ..., k_0).

partially Exchangeable Partition Probability Function (peppf):
  \Pi^{(N)}(n^X, n^Y, q^X, q^Y) = E \int \prod_{j=1}^{k_1} P̃_1^{n^X_j}(dx^*_j) \prod_{l=1}^{k_2} P̃_2^{n^Y_l}(dy^*_l) \prod_{r=1}^{k_0} P̃_1^{q^X_r}(dz^*_r) P̃_2^{q^Y_r}(dz^*_r),
where N = n_1 + n_2 and k = k_0 + k_1 + k_2.

The expression of \Pi^{(n_1+n_2)} depends on the choice of Q and is key for prediction.

7 Exchangeability & Partial Exchangeability: Prediction over additional samples

A structural predictive property. Goal: conditional on (X^{(n_1)}, Y^{(n_2)}), predict features of an additional sample X^{(m|n_1)} = (X_{n_1+1}, ..., X_{n_1+m}).

Interest in the probability of sampling values appearing in Y^{(n_2)} but not in X^{(n_1)}:
  A_{new,old} = Y^{(n_2)} \setminus X^{(n_1)},
  #(A_{new,old} ∩ X^{(m|n_1)}) = number of observations in X^{(m|n_1)} belonging to A_{new,old}.
The corresponding estimate is linear in the additional sample size m:
  E[#(A_{new,old} ∩ X^{(m|n_1)}) | X^{(n_1)}, Y^{(n_2)}] = m P[X_{n_1+1} ∈ A_{new,old} | X^{(n_1)}, Y^{(n_2)}].

Remarks:
- Such a property holds true whatever the prior Q on P^2.
- The same property holds true for the second sample, and even when considering more than two samples.
- Linearity is due to the identity in distribution of the future observations, conditional on the observed samples (X^{(n_1)}, Y^{(n_2)}) ⇒ structural property of partial exchangeability.

8 Exchangeability & Partial Exchangeability: Prediction over additional samples

The same conclusions hold true when estimating
- #(A_{old,new} ∩ X^{(m|n_1)}) with A_{old,new} = X^{(n_1)} \setminus Y^{(n_2)},
- #(A_{old,old} ∩ X^{(m|n_1)}) with A_{old,old} = X^{(n_1)} ∩ Y^{(n_2)},
- #(A_{new,new} ∩ X^{(m|n_1)}) with A_{new,new} = {X^{(n_1)} ∪ Y^{(n_2)}}^c,
leading to
  E[#(A_{•,•} ∩ X^{(m|n_1)}) | X^{(n_1)}, Y^{(n_2)}] = m P[X_{n_1+1} ∈ A_{•,•} | X^{(n_1)}, Y^{(n_2)}]
over all 4 sets that X_{n_1+i}, i = 1, ..., m, can belong to.

⇒ The expected proportion of observations from an additional sample that coincide with either new, shared, or sample-specific distinct values is the same for any m, and equals P[X_{n_1+1} ∈ A_{•,•} | X^{(n_1)}, Y^{(n_2)}].

9 Dependence and its extreme cases: Measuring dependence

Extreme cases of dependence induced by the prior Q:
- Maximal dependence = full exchangeability, i.e. Q is degenerate on the diagonal {(P_1, P_2) ∈ P^2 : P_1 = P_2}, namely P̃_1 = P̃_2 (a.s.).
- Independence: P̃_1 and P̃_2 are (unconditionally) independent with respect to Q ⇒ corresponds to maximal heterogeneity: inference on each sample is not influenced by the observations from the other sample.

Example: correlation is a popular measure of dependence. Since Corr(P̃_1(A), P̃_2(A)) typically does not depend on A, it is taken as a measure of overall dependence. With respect to the extreme cases:
  P̃_1 = P̃_2 (a.s.) ⇒ Corr(P̃_1(A), P̃_2(A)) = 1,
  P̃_1 ⊥ P̃_2 ⇒ Corr(P̃_1(A), P̃_2(A)) = 0,
but still it is not an ideal measure of dependence for (P̃_1, P̃_2).

10 Dependence and its extreme cases: Dependence via stick-breaking

Strategies for constructing dependent nonparametric priors. The most popular approach to the definition of dependent random probability measures is via dependent stick-breaking constructions, introduced by MacEachern (1999, 2000): let
  P̃_1 = \sum_{i≥1} p_{i,1} δ_{Z_{i,1}},  P̃_2 = \sum_{j≥1} p_{j,2} δ_{Z_{j,2}},
with
  p_{i,1} = W_{i,1} \prod_{l=1}^{i-1} (1 - W_{l,1}),  p_{j,2} = W_{j,2} \prod_{l=1}^{j-1} (1 - W_{l,2}),
and dependence between P̃_1 and P̃_2 is induced by specifying
- dependent stick-breaking weights (W_{i,1}, W_{j,2}),
- dependent locations Z_{j,1} and Z_{j,2}.

Pros: quite straightforward implementation of conditional simulation algorithms such as, e.g., the slice sampler (Walker, 2007) or the retrospective sampler (Papaspiliopoulos & Roberts, 2008).
Cons: it is almost impossible to derive analytic expressions for quantities of interest, both marginal and conditional.
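A toy truncated sketch of the stick-breaking idea (a simple coupling device for illustration, not MacEachern's exact construction): each Beta(1, θ) stick of the second measure copies the corresponding stick of the first with probability ρ and is redrawn independently otherwise, so both marginals remain stick-breaking while the weights become dependent.

```python
import numpy as np

def dependent_sticks(K, theta, rho, rng):
    # Two truncated stick-breaking weight vectors with Beta(1, theta) sticks;
    # each stick of measure 2 equals the matching stick of measure 1 with
    # probability rho, otherwise it is an independent fresh draw.
    V1 = rng.beta(1.0, theta, K)
    V2 = np.where(rng.random(K) < rho, V1, rng.beta(1.0, theta, K))
    to_weights = lambda V: V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))
    return to_weights(V1), to_weights(V2)

rng = np.random.default_rng(1)
p1, p2 = dependent_sticks(K=50, theta=1.0, rho=0.7, rng=rng)
Z = rng.normal(size=50)   # shared locations: p1, p2 on the atoms Z give dependent measures
```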

11 Dependence and its extreme cases: Dependence via random measures

Dependent priors via transformed random measures. An alternative approach is based on general random measures, with 3 possible strategies (and combinations thereof) for creating dependence:

1. Additive structures. Let M̃_i be random measures (i = 0, 1, 2) and define
   P̃_1 = T(M̃_1 + M̃_0),  P̃_2 = T(M̃_2 + M̃_0),
   where T is a suitable transformation (e.g. normalization).

2. Hierarchical structures. Let P̃_0 = T(M̃_0) and P̃_i = T(M̃_i) be such that
   P̃_i | P̃_0 ∼ Q_i(P̃_0) with E[P̃_i | P̃_0] = P̃_0, for i = 1, 2,
   P̃_0 ∼ Q_0(P_0) with P_0 = E[P̃_0].

3. Nested structures. Let R̃ = T(M̃) be a random probability measure on P and
   (P̃_1, P̃_2) | R̃ iid∼ R̃,  R̃ ∼ L(Q_0),
   with Q_0 = E[R̃] the law of a random probability P̃_0 = T(M̃_0) on X.

⇒ Strategies already used in, respectively, Müller, Quintana & Rosner (2004), Teh, Jordan, Beal & Blei (2006), and Rodríguez, Dunson & Gelfand (2008), relying on stick-breaking constructions instead of the general theory of random measures.

12 Dependence and its extreme cases: Completely random measures

Let M be the space of boundedly finite measures on X.

Completely random measures (Kingman, 1967): a random element µ̃ taking values in M such that µ̃(A_1), µ̃(A_2), ..., µ̃(A_d) are mutually independent, for any d ≥ 1 and for any collection of pairwise disjoint sets A_1, ..., A_d, is said to be a completely random measure (CRM).

The realizations of a CRM are a.s. discrete and, assuming there are no fixed points of discontinuity, a CRM can be represented as
  µ̃(·) = \sum_{i=1}^\infty J_i δ_{Z_i}(·),
with both the locations Z_i and the positive jumps J_i random.

13 Dependence and its extreme cases: Completely random measures

A CRM µ̃ is uniquely characterized by its Laplace functional
  E[e^{-\int_X g(x) µ̃(dx)}] = e^{-\int_{R^+ \times X} [1 - e^{-v g(x)}] ν(dv, dx)},
with ν indicating the intensity measure, which characterizes the CRM µ̃.

Two cases:
- Homogeneous CRM µ̃: jumps (J_i)_{i≥1} & locations (Z_i)_{i≥1} independent: ν(ds, dx) = α(dx) ρ(s) ds.
- Non-homogeneous CRM µ̃: (J_i)_{i≥1} & (Z_i)_{i≥1} dependent: ν(ds, dx) = α(dx) ρ_x(s) ds.

Noteworthy examples:
1. gamma CRM: ν(dv, dx) = \frac{e^{-τv}}{v} dv α(dx)
2. σ-stable CRM: ν(dv, dx) = \frac{σ}{Γ(1-σ)} \frac{dv}{v^{1+σ}} α(dx)
3. generalized gamma CRM: ν(dv, dx) = \frac{σ e^{-τv}}{Γ(1-σ)} \frac{dv}{v^{1+σ}} α(dx)
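A sketch of how such a CRM can be simulated, assuming the gamma intensity above with α = θP_0: the Ferguson-Klass device inverts the Lévy tail integral N(x) = θ E1(τx) at the arrival times of a unit-rate Poisson process, producing the jumps in decreasing order. Names and truncation level are illustrative.

```python
import numpy as np
from scipy.special import exp1
from scipy.optimize import brentq

def gamma_crm_jumps(theta, tau, n_jumps, rng):
    # Ferguson-Klass: the decreasing jumps J_1 > J_2 > ... of a gamma CRM with
    # intensity nu(dv, dx) = theta exp(-tau v)/v dv P0(dx) solve N(J_i) = G_i,
    # where N(x) = theta * E1(tau * x) is the Levy tail integral and the G_i
    # are the arrival times of a unit-rate Poisson process.
    arrivals = np.cumsum(rng.exponential(size=n_jumps))
    return np.array([brentq(lambda x: theta * exp1(tau * x) - g, 1e-300, 1e3)
                     for g in arrivals])

rng = np.random.default_rng(2)
J = gamma_crm_jumps(theta=1.0, tau=1.0, n_jumps=200, rng=rng)
Z = rng.normal(size=J.size)   # iid locations from P0 (homogeneous case)
# sum_i J_i * delta_{Z_i} approximates the gamma CRM; J / J.sum() gives the
# weights of the corresponding normalized measure (a Dirichlet process here).
```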

14 Dependence and its extreme cases: Transformations of CRMs

CRM-based priors.

Normalized completely random measures (Regazzini, Lijoi & P., 2003): let µ̃ be a CRM on X such that 0 < µ̃(X) < ∞ a.s. Then
  P̃(·) = µ̃(·) / µ̃(X)
is a normalized completely random measure (NRMI = Normalized Random Measure with Independent increments).
⇒ Characterized by the Lévy intensity ν associated to µ̃.

Hazard rate mixture models (Dykstra & Laud, 1981; James, 2005): given a CRM µ̃ on Y and a kernel k(t, y) on R^+ × Y,
  h̃(t) = \int_Y k(t, y) µ̃(dy)  ⇒  survival function P̃((t, ∞)) = \exp\{-\int_0^t h̃(s) ds\}.

Other examples (see Lijoi & P., 2010): NTR priors, Pitman-Yor process, cumulative hazard priors, ...

15 Dependence and its extreme cases: EPPF

Partition structure in the exchangeable case. With CRM-based priors, i.e. P̃ = T(µ̃), in the exchangeable case one has:
- marginal and predictive distributions of the data;
- characterization of the posterior of P̃ | X_1, ..., X_n;
- MCMC sampling schemes;
- applications to density estimation, clustering, species sampling problems, inference on functionals of P̃, ...

Most practical implementations with discrete P̃ rely on the exchangeable partition probability function (EPPF) (Pitman, 1995) induced by P̃:
  \Pi^{(n)}(n_1, ..., n_k) = E \int_{X^k} \prod_{j=1}^k P̃^{n_j}(dx^*_j).

If P̃ is an NRMI, one has an explicit EPPF in terms of the intensity ν of µ̃. E.g., if P̃ is a Dirichlet process (Ferguson, 1973) with base measure θP_0, then
  \Pi^{(n)}(n_1, ..., n_k) = \frac{\theta^k}{(\theta)_n} \prod_{j=1}^k \Gamma(n_j),
which corresponds to the celebrated Ewens sampling formula (Ewens, 1972).
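The Dirichlet-process EPPF is simple enough to evaluate directly; a small sketch of the Ewens sampling formula above (illustrative code):

```python
import math

def dirichlet_eppf(theta, freqs):
    # Ewens sampling formula: theta^k / (theta)_n * prod_j (n_j - 1)!,
    # with (theta)_n = theta (theta + 1) ... (theta + n - 1) the rising factorial.
    n, k = sum(freqs), len(freqs)
    rising = math.prod(theta + i for i in range(n))
    return theta**k * math.prod(math.factorial(nj - 1) for nj in freqs) / rising

print(dirichlet_eppf(1.0, (3, 2, 1)))   # one partition of n = 6 into blocks of sizes 3, 2, 1
```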

16 Dependent Random Measures with Additive Structure: GM-dependent CRMs

Additive structure: construct dependent (homogeneous and finite) CRMs (µ̃_1, µ̃_2) by means of an additive structure with a random element in common:
  µ̃_1 = µ_1 + µ_0,  µ̃_2 = µ_2 + µ_0,
where µ_0, µ_1, µ_2 are mutually independent CRMs with random Lévy intensities
  ν(dv, dx) = θ Z P_0(dx) ρ(v) dv  for µ_1, µ_2,
  ν_0(dv, dx) = θ (1 - Z) P_0(dx) ρ(v) dv  for µ_0,
and Z is some [0, 1]-valued random variable.
⇒ Griffiths-Milne dependent CRMs: (µ̃_1, µ̃_2) ∼ GM-CRM.

Remarks:
- Marginally, µ̃_1 and µ̃_2 are CRMs with ν(dv, dx) = θ P_0(dx) ρ(v) dv.
- If Z = 1 a.s., then µ̃_1 ⊥ µ̃_2.
- If Z = 0 a.s., then µ̃_1 = µ̃_2 a.s.

17 Dependent Random Measures with Additive Structure: GM-dependent NRMIs

GM-dependent NRMI (Lijoi, Nipoti & P., 2014): if (µ̃_1, µ̃_2) ∼ GM-CRM with ρ(R^+) = ∞, then
  P̃_1 = µ̃_1 / µ̃_1(X) = (µ_1 + µ_0) / (µ_1(X) + µ_0(X)),  P̃_2 = µ̃_2 / µ̃_2(X) = (µ_2 + µ_0) / (µ_2(X) + µ_0(X))
define a vector of GM-dependent NRMIs, (P̃_1, P̃_2) ∼ GM-NRMI.

- The distribution Q of (P̃_1, P̃_2) attains the two extreme cases:
  Z = 1 ⇒ P̃_1 ⊥ P̃_2 (unconditional independence),
  Z = 0 ⇒ P̃_1 = P̃_2 (a.s.) (full exchangeability).
- Non-negative correlation between P̃_1(A) and P̃_2(A), for any A, i.e.
  Corr_z(P̃_1(A), P̃_2(A)) = (1 - z) I(θ, z),
  where I(θ, z) > 0 is expressed in terms of ν.
- Discreteness of (P̃_1, P̃_2) ⇒ ties within and across the two samples:
  P[X_i = Y_l] = θ \int_0^\infty u e^{-θψ(u)} \tau_2(u) du > 0,
  with \tau_q(u) = \int_0^\infty s^q e^{-us} ρ(s) ds and ψ(u) = \int_0^\infty [1 - e^{-us}] ρ(s) ds.
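A toy numerical illustration of the GM construction, assuming the gamma case with Z fixed at z and a crude finite-dimensional Gamma approximation standing in for the three CRMs (in the actual model the CRMs have almost surely distinct atoms; a common grid of atoms is used here only to visualize how the shared component drives the dependence):

```python
import numpy as np

def gamma_crm_approx(total_mass, K, rng):
    # Finite-dimensional stand-in for a gamma CRM of total mass `total_mass`:
    # independent Gamma(total_mass / K, 1) jumps on K fixed atoms.
    return rng.gamma(total_mass / K, 1.0, size=K)

rng = np.random.default_rng(3)
theta, z, K = 2.0, 0.4, 2000
mu1 = gamma_crm_approx(theta * z, K, rng)        # idiosyncratic component of sample 1
mu2 = gamma_crm_approx(theta * z, K, rng)        # idiosyncratic component of sample 2
mu0 = gamma_crm_approx(theta * (1 - z), K, rng)  # shared component
p1 = (mu1 + mu0) / (mu1 + mu0).sum()             # normalized GM-dependent measures
p2 = (mu2 + mu0) / (mu2 + mu0).sum()
# empirical correlation of the weight vectors: a rough proxy for the dependence,
# shrinking to 0 as z -> 1 and growing as z -> 0
print(np.corrcoef(p1, p2)[0, 1])
```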

18 Dependent Random Measures with Additive Structure: GM-dependent NRMIs

peppf induced by X^{(n_1)} and Y^{(n_2)}:
  \Pi^{(n_1+n_2)}(n^X, n^Y, q^X, q^Y | θ, z) = \frac{\theta^k}{\Gamma(n_1)\Gamma(n_2)} \sum_{(i,l)} (1-z)^{k_0+|i|+|l|} z^{k_1+k_2-|i|-|l|} \int_0^\infty \int_0^\infty u^{n_1-1} v^{n_2-1} e^{-θz[ψ(u)+ψ(v)] - θ(1-z)ψ(u+v)} \prod_{j=1}^{k_1} \tau_{n^X_j}(u + i_j v) \prod_{j=1}^{k_2} \tau_{n^Y_j}(l_j u + v) \prod_{r=1}^{k_0} \tau_{q^X_r+q^Y_r}(u+v) du dv,
where the sum runs over the set of all vectors i = (i_1, ..., i_{k_1}) ∈ {0,1}^{k_1} and l = (l_1, ..., l_{k_2}) ∈ {0,1}^{k_2}, with |i| = \sum_{j=1}^{k_1} i_j, |l| = \sum_{j=1}^{k_2} l_j and k = k_0 + k_1 + k_2.

Degenerate cases:
- z = 1 ⇒ k_0 = 0 ⇒ product of marginal EPPFs:
  \Pi^{(n_1+n_2)}(n^X, n^Y, q^X, q^Y | θ, 1) = \Pi_1^{(n_1)}(n^X_1, ..., n^X_{k_1}) \Pi_2^{(n_2)}(n^Y_1, ..., n^Y_{k_2});
- z = 0 ⇒ |i| = k_1 & |l| = k_2 ⇒ global EPPF:
  \Pi^{(n_1+n_2)}(n^X, n^Y, q^X, q^Y | θ, 0) = \frac{\theta^k}{\Gamma(n_1+n_2)} \int_0^\infty s^{n_1+n_2-1} e^{-θψ(s)} \prod_{j=1}^{k_1} \tau_{n^X_j}(s) \prod_{i=1}^{k_2} \tau_{n^Y_i}(s) \prod_{r=1}^{k_0} \tau_{q^X_r+q^Y_r}(s) ds.

19 Dependent Random Measures with Additive Structure: GM-dependent NRMIs

Invariance condition: if λ_i is any permutation of the vector of integers (1, ..., k_i), with i = 0, 1, 2, and λ_i w = (w_{λ_i(1)}, ..., w_{λ_i(k_i)}), then
  \Pi^{(n_1+n_2)}(n^X, n^Y, q) = \Pi^{(n_1+n_2)}(λ_1 n^X, λ_2 n^Y, λ_0 q)
⇒ exchangeability within three separate groups of clusters: the clusters with non-shared values specific to each of the two samples, and those shared by the two samples.

- The peppf for GM-dependent Dirichlet processes is expressed in terms of hypergeometric functions ₃F₂, with some numerical issues.
- The peppf for GM-dependent σ-stable processes is simpler and easier to evaluate.
- For MCMC sampling: augmentation with suitable auxiliary random variables.
- Application to dependent mixtures f̃_l(x) = \int_Θ k(x; θ) p̃_l(dθ), l = 1, 2, for density estimation and clustering.

20 Dependent Random Measures with Nested Structure: The Model

Nested models. The nested Dirichlet process introduced in Rodríguez, Dunson and Gelfand (2008) can be generalized to nested homogeneous NRMIs.

Nested NRMI:
  (X_i, Y_j) | (P̃_1, P̃_2) ind∼ P̃_1 ⊗ P̃_2,
  (P̃_1, P̃_2) | R̃ iid∼ R̃,
  R̃ ∼ L(Q_0),  Q_0(·) = P[P̃_0 ∈ ·],  P̃_0 ∼ NRMI.

In other terms, R̃ is a nested random probability measure such that
  R̃ = \sum_{j≥1} ω_j δ_{G_j},  G_j iid∼ Q_0,  P̃_0 = \sum_{i≥1} γ_i δ_{θ_i}.

Since R̃ is discrete,
  π_1 = P[P̃_1 = P̃_2] = θ \int_0^\infty u e^{-θψ(u)} \tau_2(u) du > 0.
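A quick numerical check of π_1, assuming the gamma intensity ρ(s) = e^{-s}/s: then ψ(u) = log(1+u) and τ_2(u) = (1+u)^{-2}, so the integral reduces to the closed form 1/(1+θ), the familiar value for the nested Dirichlet process.

```python
import numpy as np
from scipy.integrate import quad

def pi1_nested_gamma(theta):
    # pi_1 = theta * int_0^inf u exp(-theta psi(u)) tau_2(u) du with, in the
    # gamma case, exp(-theta psi(u)) * tau_2(u) = (1 + u)^(-theta - 2).
    val, _ = quad(lambda u: u * (1.0 + u) ** (-theta - 2.0), 0.0, np.inf)
    return theta * val

for theta in (0.5, 1.0, 5.0):
    print(theta, pi1_nested_gamma(theta), 1.0 / (1.0 + theta))   # should agree
```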

21 Dependent Random Measures with Nested Structure: Partition structure

Nested NRMI: the peppf associated to a nested NRMI is
  \Pi^{(N)}(n^X, n^Y, q^X, q^Y) = π_1 \Pi^{(n_1+n_2)}(n^X, n^Y, q^X + q^Y) + (1 - π_1) \Pi_1^{(n_1)}(n^X) \Pi_2^{(n_2)}(n^Y) I_{\{0\}}(k_0),
with the first term accounting for joint exchangeability and the second for (unconditional) independence.
⇒ Linear combination of exchangeability and (unconditional) independence.

EPPF with exchangeability of all N = n_1 + n_2 observations in both samples:
  \Pi^{(N)}(n^X, n^Y, q^X + q^Y) = \frac{\theta_0^k}{\Gamma(N)} \int_0^\infty u^{N-1} e^{-θ_0 ψ_0(u)} \prod_{j=1}^{k_1} \tau^{(0)}_{n^X_j}(u) \prod_{i=1}^{k_2} \tau^{(0)}_{n^Y_i}(u) \prod_{r=1}^{k_0} \tau^{(0)}_{q^X_r+q^Y_r}(u) du.

Marginal EPPF in the independence case:
  \Pi^{(n)}(n) = \frac{\theta_0^k}{\Gamma(n)} \int_0^\infty u^{n-1} e^{-θ_0 ψ_0(u)} \prod_{j=1}^k \tau^{(0)}_{n_j}(u) du,
with ψ_0(u) = \int_0^\infty [1 - e^{-us}] ρ_0(s) ds and \tau^{(0)}_m(u) = \int_0^\infty s^m e^{-us} ρ_0(s) ds.

22 Dependent Random Measures with Nested Structure: Partition structure

Recall the peppf decomposition
  \Pi^{(N)}(n^X, n^Y, q^X, q^Y) = π_1 \Pi^{(n_1+n_2)}(n^X, n^Y, q^X + q^Y) + (1 - π_1) \Pi_1^{(n_1)}(n^X) \Pi_2^{(n_2)}(n^Y) I_{\{0\}}(k_0),
whose terms account for joint exchangeability and for (unconditional) independence. The extreme cases of full exchangeability and (unconditional) independence are thus attained.

If X^{(n_1)} ∩ Y^{(n_2)} ≠ ∅, i.e. the two samples share at least one distinct value, then
  P[P̃_1 = P̃_2 | X^{(n_1)}, Y^{(n_2)}] = 1.
A single distinct value in common between the two samples, i.e. k_0 ≥ 1, implies that, conditionally on the data, P̃_1 = P̃_2 with probability 1, and one has full exchangeability (no heterogeneity). Restrictive!

Need a model that still accommodates heterogeneity despite the presence of shared observations across samples.
⇒ Additive structure: let both P̃_1 and P̃_2 depend on an idiosyncratic component and a common random probability measure P̃_0, which explains ties with positive probability and avoids the degeneracy.

23 Dependent Random Measures with Nested Structure: Alternative Model

Nested NRMI with additive structure. Nested NRMI with additive homogeneous component:
  (X_i, Y_j) | (P̃_1, P̃_2) iid∼ P̃_1 ⊗ P̃_2,
  P̃_l =_d ξ p̃_l + (1 - ξ) p̃_0,  l = 1, 2,
where ξ is a [0, 1]-valued random variable and
  (p̃_1, p̃_2, p̃_0) | (R̃, R̃*) ∼ R̃^2 ⊗ R̃*,
with R̃ a nested random probability measure, R̃* independent of R̃ with R̃* = δ_{P̃_0}, so that p̃_0 =_d P̃_0.

Conditionally on latent allocation variables ζ, the peppf is
  \Pi^{(N)}(n^X, n^Y, q^X, q^Y; ζ) = \Pi^{(N-v)} \{ π_1 \Pi^{(v)} + (1 - π_1) \Pi_1^{(v_1)} \Pi_2^{(v_2)} \}.
⇒ Even with shared observations, the component of the peppf that accounts for heterogeneity does not vanish: \Pi_1^{(v_1)} \Pi_2^{(v_2)} > 0.

24 Dependent Random Measures with Nested Structure: Illustration

Illustrative example: density estimation.
  (X_i, Y_j) | (θ_{1,i}, θ_{2,j}) ind∼ h(·; θ_{1,i}) ⊗ h(·; θ_{2,j}),
  (θ_{1,i}, θ_{2,j}) | (P̃_1, P̃_2) iid∼ P̃_1 ⊗ P̃_2,
with Gaussian kernel h(·; (M, V)) and (M, V) ∈ R × R^+.

Nested prior with additive structure based on the stable NRMI:
  P̃_1 = ξ p̃_1 + (1 - ξ) p̃_0,  P̃_2 = ξ p̃_2 + (1 - ξ) p̃_0,
with standard choices of (conjugate) base measure and hyperpriors.

⇒ Comparison: standard nested vs. nested with additive structure, i.e. ξ = 1 (a.s.) vs ξ ∼ Unif(0, 1).

Simulate n_1 = n_2 = 100 iid values from
  X ∼ (1/2) N(5, 0.6) + (1/2) N(10, 0.6),  Y ∼ (1/2) N(0, 0.6) + (1/2) N(5, 0.6),
and estimate the marginal densities f_1 and f_2 (clustering is also possible).
A marginal MCMC algorithm is readily derived from the peppf.

25 Dependent Random Measures with Nested Structure: Illustration

True and estimated densities.
Figure: model with ξ = 1.
Figure: model with ξ ∼ Unif(0, 1).

26 Dependent Random Measures with Hierarchical Structure: Hierarchical NRMI models

Hierarchical processes. Hierarchical (homogeneous) NRMI for partially exchangeable data:
  (X_{1,j_1}, X_{2,j_2}) | (P̃_1, P̃_2) iid∼ P̃_1 ⊗ P̃_2,  (j_1, j_2) ∈ N^2,
  P̃_i | P̃_0 ∼ NRMI(P̃_0),  i = 1, 2,
  P̃_0 ∼ NRMI(P_0),
with P_0 a non-atomic measure on X.
⇒ If the NRMI coincides with the Dirichlet process, one obtains the hierarchical Dirichlet process (HDP) introduced in Teh, Jordan, Beal and Blei (2006). See also Teh & Jordan (2010).

Data generated by the model: X^{(n_i)} = (X_{i,1}, ..., X_{i,n_i}) for i = 1, 2. The data will contain ties with positive probability, within and across samples, leading to X^{(n_i)} having k_i distinct values specific to sample i with frequencies (n_{i,1}, ..., n_{i,k_i}), for i = 1, 2, and k_0 values shared by the two samples with frequencies (q_{i,1}, ..., q_{i,k_0}).

Goal: derive the distribution of the partition structure.

27 Dependent Random Measures with Hierarchical Structure: Chinese restaurant franchise metaphor

Observable level: customers and dishes.
- There are d = 2 restaurants sharing the same menu.
- X_{i,j}: label of the dish served at restaurant i to customer j.
- Each table is served the same dish, chosen by the first customer seated at that table.
- The same dish can be served at different tables within the same restaurant and across different restaurants.
- The sample information is best recorded by: k = number of distinct dishes served in either restaurant to the N = n_1 + n_2 customers, with corresponding frequency vectors n_i = (n_{i,1}, ..., n_{i,k}), where n_{i,j} ≥ 0, and n_1 and n_2 will contain k_2 and k_1 zeros, respectively.

Latent level: tables (governed by P̃_0).
- l_{i,j}: number of tables in restaurant i serving dish j, whose range is {1, ..., n_{i,j}} if dish j is served at restaurant i, and 0 otherwise.
- l̄_j = \sum_{i=1}^2 l_{i,j}: total number of tables serving dish j, for j = 1, ..., k.
- |l|: total number of tables in the two restaurants.
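A forward-sampling sketch of the franchise metaphor for d = 2 restaurants, assuming Dirichlet-process levels with masses θ (restaurants) and θ_0 (franchise); names and parameters are illustrative.

```python
import numpy as np

def crf_sample(n_per_rest, theta, theta0, base_sampler, rng):
    # Franchise-level state: dish values and the number of tables serving each dish.
    global_dishes, global_counts = [], []
    data = []
    for n in n_per_rest:
        tables, table_dish, obs = [], [], []   # per-restaurant state
        for _ in range(n):
            # join an existing table w.p. prop. to its occupancy,
            # or open a new table w.p. prop. to theta
            probs = np.array(tables + [theta], dtype=float)
            j = rng.choice(len(probs), p=probs / probs.sum())
            if j == len(tables):
                # new table: its dish comes from the franchise-level CRP(theta0)
                g = np.array(global_counts + [theta0], dtype=float)
                k = rng.choice(len(g), p=g / g.sum())
                if k == len(global_counts):    # brand-new dish drawn from P0
                    global_dishes.append(base_sampler(rng))
                    global_counts.append(0)
                global_counts[k] += 1
                tables.append(0)
                table_dish.append(k)
            tables[j] += 1
            obs.append(global_dishes[table_dish[j]])
        data.append(np.array(obs))
    return data

rng = np.random.default_rng(4)
X, Y = crf_sample([100, 100], theta=3.0, theta0=1.0,
                  base_sampler=lambda r: r.normal(), rng=rng)
print(len(np.intersect1d(X, Y)), "dishes shared by the two restaurants")
```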

28 Dependent Random Measures with Hierarchical Structure: Augmented partition structure

A suitable augmentation provides a more refined partition structure involving also:
- q_{i,j,t}: frequency of customers at restaurant i eating dish j and sitting at table t;
- q_{i,j} = (q_{i,j,1}, ..., q_{i,j,l_{i,j}}): frequency vector of customers in restaurant i eating dish j at each of the l_{i,j} tables.
By marginalizing over the tables one recovers the observed frequencies: n_{i,j} = |q_{i,j}| = \sum_{t=1}^{l_{i,j}} q_{i,j,t}.

First target: characterize the joint distribution of the partitions of
- the N = n_1 + n_2 customers into the k distinct dishes being served in the 2 restaurants,
- the q_{i,j} customers among the l_{i,j} tables in restaurant i serving dish j, for i = 1, 2 and j = 1, ..., k:
  \Phi_{k,0}^{(|l|)}(l̄_1, ..., l̄_k) \prod_{i=1}^2 \Phi_{l_i,i}^{(n_i)}(q_{i,1}, ..., q_{i,k}),
i.e. an EPPF on the number of tables times a multivariate partition distribution, where \Phi_{k,0}^{(n)} is the EPPF associated to P̃_0 and \Phi_{k,i}^{(n)} is deduced from (P̃_1, P̃_2).

29 Dependent Random Measures with Hierarchical Structure: peppf for the general case

Partially exchangeable partition probability function. peppf of a hierarchical NRMI:
  \Pi^{(n)}(n_1, ..., n_k) = \sum_l \sum_q \Phi_{k,0}^{(|l|)}(l̄_1, ..., l̄_k) \prod_{i=1}^2 \Phi_{l_i,i}^{(n_i)}(q_{i,1}, ..., q_{i,k}),
where \sum_q is a sum over all partitions q and \sum_l is a sum over all compatible table configurations, i.e. over l_{i,j} ∈ {1, ..., n_{i,j}} with l_{i,j} = 0 if n_{i,j} = 0.

Partition probability function with the constraint |q_{i,j}| = n_{i,j}:
  \Phi_{l_i,i}^{(n_i)}(q_{i,1}, ..., q_{i,k}) = \frac{\theta^{|l_i|}}{\Gamma(n_i)} \int_0^\infty u^{n_i-1} e^{-θψ(u)} \prod_{j=1}^k \prod_{t=1}^{l_{i,j}} \tau_{q_{i,j,t}}(u) du.

\Phi_{k,0}^{(|l|)}(l̄_1, ..., l̄_k) is the EPPF associated to P̃_0 ∼ NRMI(P_0).

⇒ Unlike GM-dependent and nested processes, hierarchical processes do not achieve the limiting cases of joint exchangeability or (unconditional) independence.

30 Dependent Random Measures with Hierarchical Structure: peppf for the HDP & hierarchical σ-stable process

peppf of the HDP:
  \Pi^{(n)}(n_1, ..., n_k) = \frac{\theta_0^k}{\prod_{i=1}^2 (\theta)_{n_i}} \sum_l \frac{\theta^{|l|}}{(\theta_0)_{|l|}} \prod_{j=1}^k (l̄_j - 1)! \prod_{i=1}^2 \prod_{j=1}^k |s(n_{i,j}, l_{i,j})|,
where |s(r, s)| is the absolute value of the Stirling number of the first kind.

Probabilistic interpretation:
- \Phi_{k,0}^{(|l|)}(l̄_1, ..., l̄_k): EPPF of a Dirichlet process with base measure θ_0 P_0;
- K_{n_{i,j}}: number of distinct tables at which the n_{i,j} customers are seated, according to a Dirichlet process with base measure θP_0:
  \Pi^{(n)}(n_1, ..., n_k) = \prod_{i=1}^2 \frac{\prod_{j=1}^k (\theta)_{n_{i,j}}}{(\theta)_{n_i}} \sum_l \Phi_{k,0}^{(|l|)}(l̄_1, ..., l̄_k) \prod_{i=1}^2 \prod_{j=1}^k P[K_{n_{i,j}} = l_{i,j}].

peppf of the hierarchical normalized σ-stable process:
  \Pi^{(n)}(n_1, ..., n_k) = \frac{\sigma_0^{k-1} \Gamma(k)}{\prod_{i=1}^2 \Gamma(n_i)} \sum_l \sigma^{|l|-2} \frac{\prod_{i=1}^2 \Gamma(|l_i|)}{\Gamma(|l|)} \prod_{j=1}^k (1-\sigma_0)_{l̄_j-1} \prod_{i=1}^2 \prod_{j=1}^k \frac{C(n_{i,j}, l_{i,j}; \sigma)}{\sigma^{l_{i,j}}},
where C(n, l; σ) is the generalized factorial coefficient.
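The HDP peppf only requires the unsigned Stirling numbers |s(n, l)|, which are cheap to tabulate via the standard recurrence; a minimal sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling1_abs(n, l):
    # Unsigned Stirling numbers of the first kind, via the recurrence
    # |s(n, l)| = (n - 1) |s(n - 1, l)| + |s(n - 1, l - 1)|.
    if n == l == 0:
        return 1
    if n == 0 or l == 0 or l > n:
        return 0
    return (n - 1) * stirling1_abs(n - 1, l) + stirling1_abs(n - 1, l - 1)

print([stirling1_abs(5, l) for l in range(6)])   # [0, 24, 50, 35, 10, 1]
```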

31 Dependent Random Measures with Hierarchical Structure: peppf for the hierarchical PY model

With σ, σ_0 ∈ (0, 1) and θ, θ_0 > 0, the hierarchy of random probabilities is
  (P̃_1, P̃_2) | P̃_0 iid∼ PY(σ, θ; P̃_0),  P̃_0 ∼ PY(σ_0, θ_0; P_0).
Possible construction of the Pitman-Yor (PY) process: randomization of the total mass c of a generalized gamma NRMI (Pitman & Yor, 1997).

peppf of HPY processes:
  \Pi^{(n)}(n_1, ..., n_k) = \sum_l \frac{\prod_{r=1}^{k-1}(\theta_0 + r\sigma_0)}{(\theta_0 + 1)_{|l|-1}} \prod_{j=1}^k (1-\sigma_0)_{l̄_j-1} \prod_{i=1}^2 \frac{\prod_{r=1}^{|l_i|-1}(\theta + r\sigma)}{(\theta + 1)_{n_i-1}} \prod_{j=1}^k \frac{C(n_{i,j}, l_{i,j}; \sigma)}{\sigma^{l_{i,j}}}.

- The n_{i,j} customers eating dish j in restaurant i are allocated to K_{n_{i,j}} tables generated by a PY(σ, θ; P_0) process.
- The global table configuration is weighted by the EPPF of the higher-level PY(σ_0, θ_0; P_0) process.
- The peppf is obtained by marginalizing with respect to the table configuration.
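Similarly, the hierarchical stable and HPY peppfs require the generalized factorial coefficients C(n, l; σ); a minimal sketch using the standard recurrence C(n, l; σ) = σ C(n-1, l-1; σ) + (n - 1 - lσ) C(n-1, l; σ):

```python
from functools import lru_cache

def gen_factorial_coeff(sigma):
    # C(n, l; sigma) is defined by (sigma * t)_n = sum_l C(n, l; sigma) (t)_l;
    # it is tabulated here via the triangular recurrence above.
    @lru_cache(maxsize=None)
    def C(n, l):
        if n == l == 0:
            return 1.0
        if l == 0 or l > n:
            return 0.0
        return sigma * C(n - 1, l - 1) + (n - 1 - l * sigma) * C(n - 1, l)
    return C

C = gen_factorial_coeff(0.5)
print([C(4, l) for l in range(5)])   # row n = 4 of the coefficient triangle
```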

32 Dependent Random Measures with Hierarchical Structure: Exchangeable case

EPPF of a hierarchical NRMI prior (d = 1). The case d = 1 is the exchangeable case with de Finetti measure a hierarchical process:
  \Pi^{(n)}(n_1, ..., n_k) = \sum_l \sum_q \Phi_{k,0}^{(|l|)}(l_1, ..., l_k) \Phi_{l,k}^{(n)}(q_1, ..., q_k).
⇒ A similar formula can be given for hierarchical Pitman-Yor processes.

From this result one recovers two coagulation properties (Pitman, 1999):
- If P̃ is a hierarchical stable NRMI,
  \Pi^{(n)}(n_1, ..., n_k) = \frac{(\sigma\sigma_0)^{k-1} \Gamma(k)}{\Gamma(n)} \prod_{j=1}^k (1-\sigma\sigma_0)_{n_j-1},
  i.e. P̃ is itself a stable NRMI with parameter σσ_0 ∈ (0, 1).
- If P̃ is a HPY process with θ = θ_0 σ, then P̃ ∼ PY(σσ_0, θ; P_0).

33 Dependent Random Measures with Hierarchical Structure: Prediction in Genomics

Prediction with hierarchical processes. Conditional on the observed samples X^{(n_1)} and Y^{(n_2)}, interest is in predicting features of additional future samples
  X_{n_1+1}, ..., X_{n_1+m_1}  and  Y_{n_2+1}, ..., Y_{n_2+m_2}.

Species sampling: the X_i's and Y_j's are species labels, with species shared within and between samples, and the goal is to estimate, e.g.:
- the number of new distinct species that will be observed;
- the probability that the (n_i + m_i + 1)-th observation will be a new species.

Exchangeable case: closed-form estimators for Gibbs-type priors (reviewed in De Blasi et al., 2015).
Partially exchangeable case: inferences cannot be evaluated exactly ⇒ simulation algorithm based on the peppf.

Illustration: EST analyses. Expressed sequence tags (ESTs) are a useful tool for gene identification in organisms. ESTs are generated by randomly sequencing genes from a cDNA library, which consists of a large and unknown number of differentially expressed genes (typically millions) ⇒ potentially infinite. Only a sample corresponding to a small portion of the library is typically available.

34 Dependent Random Measures with Hierarchical Structure: Prediction in Genomics

Citrus clementina libraries. Samples from two EST libraries of Citrus clementina fruits:
- FRUIT1 ("FlavFr1"): n_1 = 1593 ESTs with k_1 = 806 distinct genes (k_1/n_1 ≈ 0.51);
- FRUIT2 ("RindPdig24"): n_2 = 900 ESTs with k_2 = 687 distinct genes (k_2/n_2 ≈ 0.76).
The two samples share 183 genes, with total frequency 520 in the FRUIT1 sample and 317 in the FRUIT2 sample (about 1/3 of each sample).

Table: numbers of ESTs n and of distinct genes K_n by expression level, for FRUIT1, FRUIT2 and the combined FRUITS data.

35 Dependent Random Measures with Hierarchical Structure: Prediction in Genomics

Models: P̃_1 ⊥ P̃_2 independent PY processes vs (P̃_1, P̃_2) hierarchical PY process.

Discovery probability decay: probability that the (n_i + m_i + 1)-th observation is new, as the size m_i of the additional sample varies.

Figure: estimated discovery probabilities; (a) (separately) exchangeable, (b) partially exchangeable.

⇒ Borrowing of information and narrower HPD bands.

36 Dependent Random Measures with Hierarchical Structure: Prediction in Genomics

Other quantities of interest:
- number of new distinct genes identified by additional sequencing: K^X_{m|n_1} & K^Y_{m|n_2};
- number of additionally sequenced genes coinciding with new ones: L^X_{m|n_1} & L^Y_{m|n_2}.

Table: estimates K̂^X_{m|n_1}, L̂^X_{m|n_1} (FRUIT1) and K̂^Y_{m|n_2}, L̂^Y_{m|n_2} (FRUIT2) for increasing m, under independent PY processes P̃_1 ⊥ P̃_2 and under the HPY process (P̃_1, P̃_2).

In the dependent case, finer prediction is possible. For instance, considering additional sequencing for FRUIT1:
- for m = 600, K̂^X_{m|n_1} = 224.5, which is the sum of the predicted 37.6 genes new to FRUIT1 but already observed in FRUIT2 and 186.9 genes new overall;
- for m = 2000, it is predicted that 96.6 of the 504 genes originally observed only in FRUIT2 will be detected also in the FRUIT1 library.

37 Dependent Random Measures with Hierarchical Structure: Hierarchical random mixture hazards

Hierarchical hazard rate mixtures & dependent mixture hazards.

Hierarchical CRMs on Y:
  (µ̃_1, µ̃_2) | µ̃_0 =_d CRM(ρ_1, µ̃_0) ⊗ CRM(ρ_2, µ̃_0),
  µ̃_0 =_d CRM(ρ_0, c_0 P_0).
Construct dependent hazard rate mixture models:
  h̃_l(t) = \int_Y k(t, y) µ̃_l(dy),  l = 1, 2.
⇒ The corresponding partially exchangeable sequences (X_{i,1})_{i≥1} and (X_{i,2})_{i≥1} imply survival-time distributions of the form
  P[X_{i,1} > t_1, X_{j,2} > t_2 | (µ̃_1, µ̃_2)] = \prod_{l=1}^2 \exp\{-\int_0^{t_l} h̃_l(s) ds\}.
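A small single-sample sketch of the hazard-rate mixture mechanics, assuming a Dykstra-Laud-type kernel k(t, y) = 1{y ≤ t} and i.i.d. gamma jumps as a crude stand-in for CRM jumps; the survival function S(t) = exp{-∫_0^t h(s) ds} is approximated on a grid.

```python
import numpy as np

def survival_from_jumps(t_grid, jumps, locs, kern):
    # h(t) = sum_j J_j * k(t, Y_j); S(t) = exp(- cumulative integral of h).
    h = np.array([np.sum(jumps * kern(t, locs)) for t in t_grid])
    dt = np.diff(t_grid, prepend=0.0)
    return np.exp(-np.cumsum(h * dt))

kern = lambda t, y: (y <= t).astype(float)    # Dykstra-Laud kernel: increasing hazard
rng = np.random.default_rng(5)
jumps = rng.gamma(0.1, 1.0, size=50)          # crude i.i.d. stand-in for CRM jumps
locs = rng.uniform(0.0, 10.0, size=50)
t_grid = np.linspace(0.0, 10.0, 201)
S = survival_from_jumps(t_grid, jumps, locs, kern)
print(S[0], S[-1])                            # S decreases from 1 towards 0
```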

38 Dependent Random Measures with Hierarchical Structure: Posterior characterization

Likelihood:
  L(µ_1, µ_2; X) = \prod_{l=1}^2 \exp\{-\int_Y K_l(y) µ_l(dy)\} \prod_{i=1}^{n_l} \int_Y k(X_{i,l}; y) µ_l(dy),
with K_l(y) = \sum_{i=1}^{n_l} \int_0^{X_{i,l}} k(t, y) dt.

Augmented likelihood:
  L(µ_1, µ_2; X, Y) = \prod_{l=1}^2 \exp\{-\int_Y K_l(y) µ_l(dy)\} \prod_{i=1}^{n_l} k(X_{i,l}; Y_{i,l}) µ_l(dY_{i,l}),
and the latent variables Y_{i,l} are generated by a discrete random probability.

Posterior characterization: the posterior of (µ̃_1, µ̃_2), given the data X and the latents Y, equals the distribution of the random measure vector
  (µ_1^*, µ_2^*) + (\sum_{j=1}^{k_1} J_{j,1} δ_{Y^*_{j,1}}, \sum_{j=1}^{k_2} J_{j,2} δ_{Y^*_{j,2}}),
where
- (µ_1^*, µ_2^*) is a hierarchical CRM with updated marginal Lévy intensities;
- the jumps J_{j,l} are independent and with known density.
⇒ The model can be modified so as to include censored observations and also other covariates, i.e. a semiparametric Cox proportional hazards type of specification.

39 Dependent Random Measures with Hierarchical Structure: Illustration

Illustration: estimation of the survival functions S_1 & S_2.

Simulation study: 2 samples of size n_1 = n_2 = 100 generated from Weibull distributions with different parameters, chosen so that the survival functions cross and do not satisfy the proportional hazards assumption.
⇒ The dependent hierarchical model is able to distinguish the two survival functions.

Figure: estimated and true survival functions.

40 Concluding remarks

- Inference with non-exchangeable data is analytically challenging.
- CRM-based dependent priors look promising: conditionally on a suitable latent structure, they typically display distributional properties reminiscent of those available in the exchangeable case.
- The technique for deriving moment measures (and sometimes even complete posteriors) is general and needs only some adaptation depending on the specific transformations of the CRMs.
- In the exchangeable case (d = 1), a hierarchical random probability yields a new class of nonparametric priors: for all of them, except the two cases where coagulation is achieved, one has
  P[X_{n+1} = "new" | X^{(n)}] = f(n, K_n, (N_1, ..., N_{K_n})).
  Hence, they are not of Gibbs type, though one can still identify the associated prediction rule.

41 Some References

Camerlenghi, Lijoi, Orbanz & Prünster (2015). Hierarchical processes and prediction. Submitted.
Camerlenghi, Lijoi & Prünster (2015). Hierarchical structures in survival analysis. In preparation.
Camerlenghi, Dunson, Lijoi, Prünster & Rodríguez (2015). Generalized nested nonparametric priors for modeling heterogeneous data. In preparation.
De Blasi, Favaro, Lijoi, Mena, Prünster & Ruggiero (2015). Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE TPAMI 37.
Dykstra & Laud (1981). A Bayesian nonparametric approach to reliability. Ann. Statist. 9.
Ewens (1972). The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3.
Ferguson (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1.
de Finetti (1938). Sur la condition d'équivalence partielle. Act. Sci. Ind. 739.
Griffiths & Milne (1978). A class of bivariate Poisson processes. J. Mult. Anal. 8.
James (2005). Bayesian Poisson process partition calculus with an application to Bayesian Lévy moving averages. Ann. Statist. 33.
Kingman (1967). Completely random measures. Pacific J. Math. 21.
Lijoi, Nipoti & Prünster (2014). Bayesian inference with dependent normalized completely random measures. Bernoulli 20.

42 Some References (continued)

Lijoi, Nipoti & Prünster (2014). Dependent mixture models: clustering and borrowing information. Comput. Statist. & Data Anal. 71.
Lijoi & Prünster (2010). Models beyond the Dirichlet process. In Bayesian Nonparametrics, Cambridge University Press.
MacEachern (1999). Dependent nonparametric processes. In ASA Proceedings.
MacEachern (2000). Dependent Dirichlet processes. OSU Tech. Report.
Müller, Quintana & Rosner (2004). A method for combining inference across related nonparametric Bayesian models. J. Roy. Statist. Soc. Ser. B 66.
Papaspiliopoulos & Roberts (2008). Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika 95.
Pitman (1995). Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102.
Pitman (1999). Coalescents with multiple collisions. Ann. Probab. 27.
Pitman & Yor (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25.
Regazzini, Lijoi & Prünster (2003). Distributional results for means of random measures with independent increments. Ann. Statist. 31.
Rodríguez, Dunson & Gelfand (2008). The nested Dirichlet process. J. Amer. Statist. Assoc. 103.
Teh & Jordan (2010). Hierarchical Bayesian nonparametric models with applications. In Bayesian Nonparametrics, Cambridge University Press.
Teh, Jordan, Beal & Blei (2006). Hierarchical Dirichlet processes. J. Amer. Statist. Assoc. 101.
Walker (2007). Sampling the Dirichlet mixture model with slices. Comm. Statist. Simul. Computat. 36.


More information

Bayesian nonparametric models of sparse and exchangeable random graphs

Bayesian nonparametric models of sparse and exchangeable random graphs Bayesian nonparametric models of sparse and exchangeable random graphs F. Caron & E. Fox Technical Report Discussion led by Esther Salazar Duke University May 16, 2014 (Reading group) May 16, 2014 1 /

More information

Applied Nonparametric Bayes

Applied Nonparametric Bayes Applied Nonparametric Bayes Michael I. Jordan Department of Electrical Engineering and Computer Science Department of Statistics University of California, Berkeley http://www.cs.berkeley.edu/ jordan Acknowledgments:

More information

Nonparametric Bayesian Methods - Lecture I

Nonparametric Bayesian Methods - Lecture I Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics

More information

Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation Dahua Lin Toyota Technological Institute at Chicago dhlin@ttic.edu Abstract Reliance on computationally expensive

More information

Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation

Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation Ian Porteous, Alex Ihler, Padhraic Smyth, Max Welling Department of Computer Science UC Irvine, Irvine CA 92697-3425

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

Clustering using Mixture Models

Clustering using Mixture Models Clustering using Mixture Models The full posterior of the Gaussian Mixture Model is p(x, Z, µ,, ) =p(x Z, µ, )p(z )p( )p(µ, ) data likelihood (Gaussian) correspondence prob. (Multinomial) mixture prior

More information

On the Fisher Bingham Distribution

On the Fisher Bingham Distribution On the Fisher Bingham Distribution BY A. Kume and S.G Walker Institute of Mathematics, Statistics and Actuarial Science, University of Kent Canterbury, CT2 7NF,UK A.Kume@kent.ac.uk and S.G.Walker@kent.ac.uk

More information

Bayesian Nonparametric Modelling of Genetic Variations using Fragmentation-Coagulation Processes

Bayesian Nonparametric Modelling of Genetic Variations using Fragmentation-Coagulation Processes Journal of Machine Learning Research 1 (2) 1-48 Submitted 4/; Published 1/ Bayesian Nonparametric Modelling of Genetic Variations using Fragmentation-Coagulation Processes Yee Whye Teh y.w.teh@stats.ox.ac.uk

More information

Some Developments of the Normalized Random Measures with Independent Increments

Some Developments of the Normalized Random Measures with Independent Increments Sankhyā : The Indian Journal of Statistics 2006, Volume 68, Part 3, pp. 46-487 c 2006, Indian Statistical Institute Some Developments of the Normalized Random Measures with Independent Increments Laura

More information

Formulas for probability theory and linear models SF2941

Formulas for probability theory and linear models SF2941 Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I

CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I X i Ν CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr 2004 Dirichlet Process I Lecturer: Prof. Michael Jordan Scribe: Daniel Schonberg dschonbe@eecs.berkeley.edu 22.1 Dirichlet

More information

Hierarchical Modeling for Univariate Spatial Data

Hierarchical Modeling for Univariate Spatial Data Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This

More information

1 EM algorithm: updating the mixing proportions {π k } ik are the posterior probabilities at the qth iteration of EM.

1 EM algorithm: updating the mixing proportions {π k } ik are the posterior probabilities at the qth iteration of EM. Université du Sud Toulon - Var Master Informatique Probabilistic Learning and Data Analysis TD: Model-based clustering by Faicel CHAMROUKHI Solution The aim of this practical wor is to show how the Classification

More information

A Fully Nonparametric Modeling Approach to. BNP Binary Regression

A Fully Nonparametric Modeling Approach to. BNP Binary Regression A Fully Nonparametric Modeling Approach to Binary Regression Maria Department of Applied Mathematics and Statistics University of California, Santa Cruz SBIES, April 27-28, 2012 Outline 1 2 3 Simulation

More information

Spatial Bayesian Nonparametrics for Natural Image Segmentation

Spatial Bayesian Nonparametrics for Natural Image Segmentation Spatial Bayesian Nonparametrics for Natural Image Segmentation Erik Sudderth Brown University Joint work with Michael Jordan University of California Soumya Ghosh Brown University Parsing Visual Scenes

More information

19 : Bayesian Nonparametrics: The Indian Buffet Process. 1 Latent Variable Models and the Indian Buffet Process

19 : Bayesian Nonparametrics: The Indian Buffet Process. 1 Latent Variable Models and the Indian Buffet Process 10-708: Probabilistic Graphical Models, Spring 2015 19 : Bayesian Nonparametrics: The Indian Buffet Process Lecturer: Avinava Dubey Scribes: Rishav Das, Adam Brodie, and Hemank Lamba 1 Latent Variable

More information