Dependent Random Measures and Prediction
1 Dependent Random Measures and Prediction Igor Prünster University of Torino & Collegio Carlo Alberto 10th Bayesian Nonparametric Conference Raleigh, June 26, 2015 Joint work with: Federico Camerlenghi, Antonio Lijoi, Bernardo Nipoti and Peter Orbanz 1 / 42
2 Outline
- Exchangeability & Partial Exchangeability
- Dependence and its extreme cases
- Dependent Random Measures with Additive Structure
- Dependent Random Measures with Nested Structure
- Dependent Random Measures with Hierarchical Structure
2 / 42
3 Exchangeability & Partial Exchangeability
Exchangeability
Probabilistic statement concerning homogeneity (symmetry, analogy) between past and future; justifies induction
⇒ Inferences are not affected by the order of the observations
Suitable mathematical framework for inference, alternative to the classical approach with a fixed and unknown probability distribution P
⇒ A sequence of observations (X_n)_{n≥1} is exchangeable if
(X_1, X_2, ..., X_n) =d (X_π(1), X_π(2), ..., X_π(n))
for any n ≥ 1 and any permutation π of (1, ..., n).
de Finetti's representation theorem
(X_n)_{n≥1} is exchangeable if and only if
P[X_1 ∈ A_1, ..., X_n ∈ A_n] = ∫_P ∏_{i=1}^n P(A_i) Q(dP)
where P is the space of probability measures on X.
⇒ Q is the de Finetti measure of (X_n)_{n≥1} and acts as a prior distribution for Bayesian inference, being the law of a random probability measure P̃.
3 / 42
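A concrete instance of an exchangeable sequence directed by a random P̃ is the Dirichlet process, simulated through the Blackwell-MacQueen urn. A minimal Python sketch (the mass parameter `theta = 1.0` and the uniform base measure are illustrative choices, not taken from the slides):

```python
import random

def polya_urn_sample(n, theta, base_sampler, rng):
    """Exchangeable sequence from a Dirichlet-process prior via the
    Blackwell-MacQueen urn: X_{i+1} is a fresh draw from the base
    measure P0 with probability theta/(theta + i), otherwise it
    copies a uniformly chosen past observation (producing ties)."""
    xs = []
    for i in range(n):
        if rng.random() < theta / (theta + i):
            xs.append(base_sampler(rng))  # new value from P0
        else:
            xs.append(rng.choice(xs))     # tie with a past value
    return xs

rng = random.Random(0)
sample = polya_urn_sample(200, 1.0, lambda r: r.random(), rng)
```

Exchangeability here is a property of the urn scheme itself: any permutation of the resulting sequence has the same joint law, matching the definition above.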
4 Exchangeability & Partial Exchangeability
Beyond Exchangeability
In many applications with real data, dependence structures more general than exchangeability are required. Indeed, in de Finetti (1938): "But the case of exchangeability can only be considered as a limiting case: the case in which this analogy is, in a certain sense, absolute for all events under consideration. [..] To get from the case of exchangeability to other cases which are more general but still tractable, we must take up the case where we still encounter analogies among the events under consideration, but without attaining the limiting case of exchangeability."
- nonparametric regression (covariate indexed observations)
- time dependent data
- change point problems
- data from different related studies or experiments
- ...
4 / 42
5 Exchangeability & Partial Exchangeability
Partial exchangeability
A sequence (X, Y) = (X_i, Y_j)_{i,j≥1} is partially exchangeable if
(X_1, ..., X_{n_1}, Y_1, ..., Y_{n_2}) =d (X_π(1), ..., X_π(n_1), Y_φ(1), ..., Y_φ(n_2))
for any n_1, n_2 ≥ 1 and any permutations π and φ of (1, ..., n_1) and (1, ..., n_2).
de Finetti's representation theorem
(X, Y) = (X_i, Y_j)_{i,j≥1} is partially exchangeable if and only if
P[X_1 ∈ A_1, ..., X_{n_1} ∈ A_{n_1}, Y_1 ∈ B_1, ..., Y_{n_2} ∈ B_{n_2}] = ∫_{P^2} ∏_{i=1}^{n_1} P_1(A_i) ∏_{j=1}^{n_2} P_2(B_j) Q(dP_1, dP_2).
⇒ This is the same as saying that
(X_i, Y_j) | (P̃_1, P̃_2) iid~ P̃_1 × P̃_2, i, j ≥ 1, with (P̃_1, P̃_2) ~ Q,
where (P̃_1, P̃_2) is a vector of dependent random probabilities and Q is the prior.
5 / 42
6 Exchangeability & Partial Exchangeability
Discrete partially exchangeable priors: induced partition structure
Two samples X^(n_1) = {X_1, ..., X_{n_1}} and Y^(n_2) = {Y_1, ..., Y_{n_2}} from partially exchangeable sequences (X_j)_{j≥1} and (Y_i)_{i≥1}.
The prior Q selects almost surely discrete random probabilities (P̃_1, P̃_2)
⇒ ties within each sample and possibly also across different samples.
Partition of [N] = {1, ..., n_1 + n_2} induced by X^(n_1) and Y^(n_2) into:
- k_1 distinct values specific to X^(n_1) with frequencies n^X_j (j = 1, ..., k_1)
- k_2 distinct values specific to Y^(n_2) with frequencies n^Y_j (j = 1, ..., k_2)
- k_0 distinct values shared by the two samples with frequencies q^X_r and q^Y_r (r = 1, ..., k_0)
partially Exchangeable Partition Probability Function (peppf)
Π^(N)(n^X, n^Y, q^X, q^Y) = E[ ∫_{X^k} ∏_{j=1}^{k_1} P̃_1^{n^X_j}(dx_j) ∏_{l=1}^{k_2} P̃_2^{n^Y_l}(dy_l) ∏_{r=1}^{k_0} P̃_1^{q^X_r}(dz_r) P̃_2^{q^Y_r}(dz_r) ]
where N = n_1 + n_2 and k = k_1 + k_2 + k_0.
The expression of Π^(n_1+n_2) depends on the choice of Q and is key for prediction.
6 / 42
7 Exchangeability & Partial Exchangeability
Prediction over additional samples: a structural predictive property
Goal: conditional on (X^(n_1), Y^(n_2)), predict features of an additional sample X^(m|n_1) = X_{n_1+1}, ..., X_{n_1+m}.
Interest in the probability of sampling values in Y^(n_2) but not in X^(n_1):
A_{new,old} = Y^(n_2) \ X^(n_1)
#(A_{new,old}; X^(m|n_1)) = # observations in X^(m|n_1) belonging to A_{new,old}
The corresponding estimate is linear in the additional sample size m:
E[#(A_{new,old}; X^(m|n_1)) | X^(n_1), Y^(n_2)] = m P[X_{n_1+1} ∈ A_{new,old} | X^(n_1), Y^(n_2)]
Remarks:
- Such a property holds true whatever the prior Q on P^2
- The same property holds true for the second sample and even when considering more than two samples
- Linearity is due to the identity in distribution, conditional on the observed samples (X^(n_1), Y^(n_2))
⇒ structural property of partial exchangeability
7 / 42
8 Exchangeability & Partial Exchangeability
Prediction over additional samples
The same conclusions hold true when estimating
- #(A_{old,new}; X^(m|n_1)) with A_{old,new} = X^(n_1) \ Y^(n_2)
- #(A_{old,old}; X^(m|n_1)) with A_{old,old} = X^(n_1) ∩ Y^(n_2)
- #(A_{new,new}; X^(m|n_1)) with A_{new,new} = {X^(n_1) ∪ Y^(n_2)}^c
leading to
E[#(A_{•,•}; X^(m|n_1)) | X^(n_1), Y^(n_2)] = m P[X_{n_1+1} ∈ A_{•,•} | X^(n_1), Y^(n_2)]
over all 4 sets to which X_{n_1+i}, i = 1, ..., m, will belong.
⇒ The expected proportion of observations, from an additional sample, that coincide with either new, shared or sample-specific distinct values is the same, for any m, and equals P[X_{n_1+1} ∈ A_{•,•} | X^(n_1), Y^(n_2)].
8 / 42
9 Dependence and its extreme cases
Measuring dependence: extreme cases of dependence induced by the prior Q
- Maximal dependence = full exchangeability, i.e. Q is degenerate on the diagonal {(P_1, P_2) ∈ P^2 : P_1 = P_2}, namely P̃_1 = P̃_2 (a.s.)
- Independence: P̃_1 and P̃_2 are (unconditionally) independent with respect to Q
⇒ corresponds to maximal heterogeneity: inference on each sample is not influenced by the observations from the other sample.
Example: correlation is a popular measure of dependence. Since Corr(P̃_1(A), P̃_2(A)) typically does not depend on A, it is taken as a measure of overall dependence. With respect to the extreme cases:
P̃_1 = P̃_2 (a.s.) ⇒ Corr(P̃_1(A), P̃_2(A)) = 1
P̃_1 ⊥ P̃_2 ⇒ Corr(P̃_1(A), P̃_2(A)) = 0
but still it is not an ideal measure of dependence for (P̃_1, P̃_2).
9 / 42
10 Dependence and its extreme cases
Dependence via stick breaking: strategies for constructing dependent nonparametric priors
The most popular approach to the definition of dependent random probability measures is via dependent stick-breaking constructions, introduced by MacEachern (1999, 2000). Let
P̃_1 = Σ_{i≥1} p_{i,1} δ_{Z_{i,1}}    P̃_2 = Σ_{j≥1} p_{j,2} δ_{Z_{j,2}}
with
p_{i,1} = W_{i,1} ∏_{l=1}^{i-1} (1 - W_{l,1})    p_{j,2} = W_{j,2} ∏_{l=1}^{j-1} (1 - W_{l,2})
and dependence between P̃_1 and P̃_2 induced by specifying
- dependent stick-breaking weights (W_{i,1}, W_{j,2})
- dependent locations Z_{j,1} and Z_{j,2}
Pros: quite straightforward implementation of conditional simulation algorithms such as, e.g., the slice sampler (Walker, 2007) or the retrospective sampler (Papaspiliopoulos & Roberts, 2008)
Cons: it is almost impossible to derive analytic expressions for quantities of interest, both marginal and conditional
10 / 42
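A sketch of one simple special case of this construction: the two measures share the stick-breaking weights but have their own atom locations (the truncation level, Beta(1, α) sticks and independent N(0, 1) locations are illustrative assumptions, not the slides' specification, which allows general dependent weights and locations):

```python
import random

def stick_breaking_weights(n_atoms, alpha, rng):
    """Truncated stick-breaking: p_i = W_i * prod_{l<i} (1 - W_l)
    with W_i ~ Beta(1, alpha); the weights sum to nearly 1 once
    n_atoms is large."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        w = rng.betavariate(1.0, alpha)
        weights.append(w * remaining)
        remaining *= 1.0 - w
    return weights

def shared_weights_ddp(n_atoms, alpha, rng):
    """Two dependent random probability measures sharing the weights
    p_i but with separate (here independent Gaussian) locations."""
    p = stick_breaking_weights(n_atoms, alpha, rng)
    atoms_1 = [rng.gauss(0.0, 1.0) for _ in p]
    atoms_2 = [rng.gauss(0.0, 1.0) for _ in p]
    return list(zip(p, atoms_1)), list(zip(p, atoms_2))
```

Sampling from such a pair is immediate, which is the "pro" mentioned above; the "con" is that integrating the weights out analytically is essentially intractable.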
11 Dependence and its extreme cases
Dependent priors via transformed random measures
An alternative approach is based on general random measures, with 3 possible strategies (and combinations thereof) for creating dependence:
1. Additive structures. Let M̃_i be random measures (i = 0, 1, 2) and define
P̃_1 = T(M̃_1 + M̃_0)    P̃_2 = T(M̃_2 + M̃_0)
where T is a suitable transformation (e.g. normalization).
2. Hierarchical structures. Let P̃_0 = T(M̃_0) and P̃_i = T(M̃_i) be such that
P̃_i | P̃_0 ~ Q_i(P̃_0) with E[P̃_i | P̃_0] = P̃_0, for i = 1, 2
P̃_0 ~ Q_0(P_0) with P_0 = E[P̃_0]
3. Nested structures. Let R̃ = T(M̃) be a random probability measure on P and
(P̃_1, P̃_2) | R̃ iid~ R̃    R̃ ~ L(Q_0)
with Q_0 = E[R̃] the law of a random probability P̃_0 = T(M̃_0) on X.
⇒ These strategies were already used in, respectively, Müller, Quintana & Rosner (2004), Teh, Jordan, Beal & Blei (2006) and Rodriguez, Dunson & Gelfand (2008), relying on stick-breaking constructions instead of the general theory of random measures.
11 / 42
12 Dependence and its extreme cases
Completely random measures
Let M be the space of boundedly finite measures on X.
Completely random measures (Kingman, 1967): a random element µ̃ taking values in M such that µ̃(A_1), µ̃(A_2), ..., µ̃(A_d) are mutually independent, for any d ≥ 1 and any collection of pairwise disjoint sets A_1, ..., A_d, is said to be a completely random measure (CRM).
The realizations of a CRM are a.s. discrete and, assuming there are no fixed points of discontinuity, a CRM can be represented as
µ̃(·) = Σ_{i≥1} J_i δ_{Z_i}(·)
with both the locations Z_i's and the positive jumps J_i's random.
12 / 42
13 Dependence and its extreme cases
A CRM µ̃ is uniquely characterized by its Laplace functional
E[e^{-∫_X g(x) µ̃(dx)}] = e^{-∫_{R_+ × X} [1 - e^{-v g(x)}] ν(dv, dx)}
with ν the intensity measure, which characterizes the CRM µ̃.
Two cases:
- Homogeneous CRM µ̃: jumps (J_i)_{i≥1} & locations (Z_i)_{i≥1} independent, ν(ds, dx) = α(dx) ρ(s) ds
- Non-homogeneous CRM µ̃: (J_i)_{i≥1} & (Z_i)_{i≥1} dependent, ν(ds, dx) = α(dx) ρ_x(s) ds
Noteworthy examples:
1. gamma CRM: ν(dv, dx) = (e^{-τv} / v) dv α(dx)
2. σ-stable CRM: ν(dv, dx) = (σ / Γ(1-σ)) v^{-1-σ} dv α(dx)
3. generalized gamma CRM: ν(dv, dx) = (σ / Γ(1-σ)) e^{-τv} v^{-1-σ} dv α(dx)
13 / 42
14 Dependence and its extreme cases
CRM-based priors
Normalized completely random measures (Regazzini, Lijoi and P., 2003): let µ̃ be a CRM on X such that 0 < µ̃(X) < ∞ a.s. Then
P̃(·) = µ̃(·) / µ̃(X)
is a normalized completely random measure (NRMI = Normalized Random Measure with Independent increments)
⇒ characterized by the Lévy intensity ν associated to µ̃.
Hazard rate mixture models (Dykstra & Laud, 1981; James, 2005): given a CRM µ̃ on Y and a kernel k(t, y) on R_+ × Y,
h̃(t) = ∫_Y k(t, y) µ̃(dy) ⇒ survival function P((t, ∞)) = exp{-∫_0^t h̃(s) ds}
Other examples (see Lijoi & P., 2010): NTR priors, Pitman-Yor process, cumulative hazard priors, ...
14 / 42
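The normalization step is easy to make concrete with the gamma CRM: by complete randomness its increments over a finite partition are independent Gamma variables, and normalizing them yields a Dirichlet-distributed vector, i.e. the finite-dimensional law of the Dirichlet process. A minimal sketch (the 3-set partition with base probabilities 0.2/0.3/0.5 and total mass θ = 10 are illustrative choices):

```python
import random

def gamma_crm_increments(base_masses, theta, rng):
    """Gamma CRM evaluated on a finite partition A_1,...,A_d: the
    mu(A_i) are independent Gamma(theta * P0(A_i), 1) variables;
    independence over disjoint sets is complete randomness."""
    return [rng.gammavariate(theta * m, 1.0) for m in base_masses]

def normalize(mu_values):
    """NRMI step: P(A_i) = mu(A_i) / mu(X)."""
    total = sum(mu_values)
    return [v / total for v in mu_values]

rng = random.Random(0)
mu = gamma_crm_increments([0.2, 0.3, 0.5], 10.0, rng)
p = normalize(mu)  # one draw of (P(A_1), P(A_2), P(A_3)) ~ Dirichlet(2, 3, 5)
```

For other Lévy intensities (σ-stable, generalized gamma) the increments are no longer Gamma and have to be simulated from the intensity itself, but the normalization step is identical.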
15 Dependence and its extreme cases
EPPF: partition structure in the exchangeable case
With CRM-based priors, i.e. P̃ = T(µ̃), in the exchangeable case one has:
- marginal and predictive distributions of the data
- characterization of the posterior of P̃ given X_1, ..., X_n
- MCMC sampling schemes
- applications to density estimation, clustering, species sampling problems, inference on functionals of P̃, ...
Most practical implementations with discrete P̃ rely on the exchangeable partition probability function (EPPF) (Pitman, 1995) induced by P̃:
Π^(n)_k(n_1, ..., n_k) = E[ ∫_{X^k} ∏_{j=1}^k P̃^{n_j}(dx_j) ]
If P̃ is a NRMI, one has an explicit EPPF in terms of the intensity ν of µ̃. E.g., if P̃ is a Dirichlet process (Ferguson, 1973) with base measure θP_0, then
Π^(n)_k(n_1, ..., n_k) = (θ^k / (θ)_n) ∏_{j=1}^k Γ(n_j)
which corresponds to the celebrated Ewens sampling formula (Ewens, 1972).
15 / 42
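The Dirichlet-process EPPF above is simple enough to evaluate directly; a short sketch, with a sanity check that summing it over all set partitions of a small n gives 1:

```python
from math import gamma as gamma_fn

def dp_eppf(counts, theta):
    """Ewens sampling formula / EPPF of a Dirichlet process with total
    mass theta: theta^k * prod_j Gamma(n_j) / (theta)_n, where
    (theta)_n = Gamma(theta + n) / Gamma(theta) is the rising factorial."""
    n, k = sum(counts), len(counts)
    rising = gamma_fn(theta + n) / gamma_fn(theta)
    prod = 1.0
    for n_j in counts:
        prod *= gamma_fn(n_j)
    return theta**k * prod / rising

# For n = 3 there are 5 set partitions: one of block sizes [3], three
# of block sizes [2, 1] and one of block sizes [1, 1, 1]; their EPPF
# values must sum to 1.
total = dp_eppf([3], 1.0) + 3 * dp_eppf([2, 1], 1.0) + dp_eppf([1, 1, 1], 1.0)
```

The same check can be run for any theta, since the EPPF is a probability distribution on set partitions of [n].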
16 Dependent Random Measures with Additive Structure
Additive structure: GM dependent CRMs
Construct dependent (homogeneous and finite) CRMs (µ̃_1, µ̃_2) by means of an additive structure with a random element in common:
µ̃_1 = µ_1 + µ_0    µ̃_2 = µ_2 + µ_0
where µ_0, µ_1, µ_2 are mutually independent CRMs with random Lévy intensities
ν(dv, dx) = θ Z P_0(dx) ρ(v) dv for µ_1, µ_2
ν_0(dv, dx) = θ (1 - Z) P_0(dx) ρ(v) dv for µ_0
and Z is some [0, 1]-valued random variable
⇒ Griffiths-Milne dependent CRMs: (µ̃_1, µ̃_2) ~ GM-CRM
Remarks:
- Marginally µ̃_1 and µ̃_2 are CRMs with ν(dv, dx) = θ P_0(dx) ρ(v) dv
- If Z = 1 a.s., then µ̃_1 ⊥ µ̃_2
- If Z = 0 a.s., then µ̃_1 = µ̃_2 a.s.
16 / 42
17 Dependent Random Measures with Additive Structure
Additive structure: GM dependent NRMIs
GM dependent NRMI (Lijoi, Nipoti & P., 2014): if (µ̃_1, µ̃_2) ~ GM-CRM with ρ(R_+) = ∞, then
P̃_1 = µ̃_1 / µ̃_1(X) = (µ_1 + µ_0) / (µ_1(X) + µ_0(X)),    P̃_2 = µ̃_2 / µ̃_2(X) = (µ_2 + µ_0) / (µ_2(X) + µ_0(X))
define a vector of GM dependent NRMIs, (P̃_1, P̃_2) ~ GM-NRMI.
- The distribution Q of (P̃_1, P̃_2) attains the two extreme cases:
  Z = 1 ⇒ P̃_1 ⊥ P̃_2 (unconditional independence)
  Z = 0 ⇒ P̃_1 = P̃_2 (a.s.) (full exchangeability)
- Non-negative correlation between P̃_1(A) and P̃_2(A), for any A, i.e.
  Corr_z(P̃_1(A), P̃_2(A)) = (1 - z) I(θ, z)
  where I(θ, z) > 0 is expressed in terms of ν
- Discreteness of (P̃_1, P̃_2) ⇒ ties within and across the two samples:
  P[X_i = Y_l] = ∫_0^∞ u e^{-θψ(u)} τ_2(u) du > 0
  with τ_q(u) = ∫_0^∞ s^q e^{-us} ρ(s) ds and ψ(u) = ∫_0^∞ [1 - e^{-us}] ρ(s) ds
17 / 42
18 Dependent Random Measures with Additive Structure
peppf induced by X^(n_1) and Y^(n_2):
Π^(n_1+n_2)(n^X, n^Y, q^X, q^Y | θ, z) = (θ^k / (Γ(n_1) Γ(n_2))) Σ_{(i,l)} (1-z)^{k_0 + |i| + |l|} z^{k_1 + k_2 - |i| - |l|} ∫_0^∞ ∫_0^∞ u^{n_1 - 1} v^{n_2 - 1} e^{-θz[ψ(u) + ψ(v)] - θ(1-z)ψ(u+v)} ∏_{j=1}^{k_1} τ_{n^X_j}(u + i_j v) ∏_{j=1}^{k_2} τ_{n^Y_j}(l_j u + v) ∏_{r=1}^{k_0} τ_{q^X_r + q^Y_r}(u + v) du dv
where the sum runs over the set of all vectors i = (i_1, ..., i_{k_1}) ∈ {0,1}^{k_1} and l = (l_1, ..., l_{k_2}) ∈ {0,1}^{k_2}, with |i| = Σ_{j=1}^{k_1} i_j and |l| = Σ_{j=1}^{k_2} l_j, and:
- z = 1 ⇒ k_0 = 0 ⇒ product of marginal EPPFs:
  Π^(n_1+n_2)(n^X, n^Y, q^X, q^Y | 1) = Π_1^(n_1)(n^X_1, ..., n^X_{k_1}) Π_2^(n_2)(n^Y_1, ..., n^Y_{k_2})
- z = 0 ⇒ |i| = k_1 & |l| = k_2 ⇒ global EPPF:
  Π^(n_1+n_2)(n^X, n^Y, q^X, q^Y | θ, 0) = (θ^k / Γ(n_1 + n_2)) ∫_0^∞ s^{n_1+n_2-1} e^{-θψ(s)} ∏_{j=1}^{k_1} τ_{n^X_j}(s) ∏_{i=1}^{k_2} τ_{n^Y_i}(s) ∏_{r=1}^{k_0} τ_{q^X_r + q^Y_r}(s) ds
18 / 42
19 Dependent Random Measures with Additive Structure
Invariance condition: if λ_i is any permutation of the vector of integers (1, ..., k_i), with i = 0, 1, 2, and λ_i w = (w_{λ_i(1)}, ..., w_{λ_i(k_i)}), then
Π^(n_1+n_2)(n^X, n^Y, q) = Π^(n_1+n_2)(λ_1 n^X, λ_2 n^Y, λ_0 q)
⇒ exchangeability within three separate groups of clusters:
- those with non-shared values for the two samples
- those shared by the two samples
Remarks:
- The peppf for GM dependent Dirichlet processes is in terms of hypergeometric functions 3F2, with some numerical issues
- The peppf for GM dependent σ-stable processes is simpler and easier to evaluate
- For MCMC sampling: augmentation with suitable auxiliary random variables
- Application to dependent mixtures f_l(x) = ∫_Θ k(x; θ) p̃_l(dθ), l = 1, 2, for density estimation and clustering
19 / 42
20 Dependent Random Measures with Nested Structure
Nested models
The nested Dirichlet process introduced in Rodriguez, Dunson and Gelfand (2008) can be generalized to nested homogeneous NRMIs.
Nested NRMI:
(X_i, Y_j) | (P̃_1, P̃_2) ind~ P̃_1 × P̃_2
(P̃_1, P̃_2) | R̃ iid~ R̃
R̃ ~ L(Q_0) with Q_0(·) = P[P̃_0 ∈ ·] and P̃_0 ~ NRMI
In other terms, R̃ is a nested random probability measure such that
R̃ = Σ_{j≥1} ω_j δ_{G_j}, G_j iid~ Q_0, P̃_0 = Σ_{i≥1} γ_i δ_{θ_i}
Since R̃ is discrete,
π_1 = P[P̃_1 = P̃_2] = θ ∫_0^∞ u e^{-θψ(u)} τ_2(u) du > 0
20 / 42
21 Dependent Random Measures with Nested Structure
Partition structure
The peppf associated to a nested NRMI is
Π^(N)(n^X, n^Y, q^X, q^Y) = π_1 Π^(n_1+n_2)(n^X, n^Y, q^X + q^Y) + (1 - π_1) Π_1^(n_1)(n^X) Π_2^(n_2)(n^Y) 1{k_0 = 0}
[first term: joint exchangeability; second term: (unconditional) independence]
⇒ linear combination of exchangeability and (unconditional) independence
EPPF with exchangeability of all N = n_1 + n_2 observations in both samples:
Π^(N)(n^X, n^Y, q^X + q^Y) = (θ_0^k / Γ(N)) ∫_0^∞ u^{N-1} e^{-θ_0 ψ_0(u)} ∏_{j=1}^{k_1} τ^(0)_{n^X_j}(u) ∏_{i=1}^{k_2} τ^(0)_{n^Y_i}(u) ∏_{r=1}^{k_0} τ^(0)_{q^X_r + q^Y_r}(u) du
Marginal EPPF in the independence case:
Π^(n)(n) = (θ_0^k / Γ(n)) ∫_0^∞ u^{n-1} e^{-θ_0 ψ_0(u)} ∏_{j=1}^k τ^(0)_{n_j}(u) du
with ψ_0(u) = ∫_0^∞ [1 - e^{-us}] ρ_0(s) ds and τ^(0)_m(u) = ∫_0^∞ s^m e^{-us} ρ_0(s) ds
21 / 42
22 Dependent Random Measures with Nested Structure
Partition structure
The peppf associated to a nested NRMI is
Π^(N)(n^X, n^Y, q^X, q^Y) = π_1 Π^(n_1+n_2)(n^X, n^Y, q^X + q^Y) + (1 - π_1) Π_1^(n_1)(n^X) Π_2^(n_2)(n^Y) 1{k_0 = 0}
[joint exchangeability + (unconditional) independence]
Extreme cases of full exchangeability and (uncond.) independence are obtained.
If X^(n_1) ∩ Y^(n_2) ≠ ∅, i.e. the two samples share at least one distinct value:
P[P̃_1 = P̃_2 | X^(n_1), Y^(n_2)] = 1
A single distinct value in common between the two samples, i.e. k_0 ≥ 1, implies that, conditionally on the data, P̃_1 = P̃_2 with probability 1 and one has full exchangeability (no heterogeneity). Restrictive!
Need a model that still accommodates heterogeneity, despite the presence of shared observations across samples
⇒ additive structure: let both P̃_1 and P̃_2 depend on an idiosyncratic component and a common random probability measure P̃_0, which explains ties with positive probability and avoids the degeneracy.
22 / 42
23 Dependent Random Measures with Nested Structure
Alternative model: nested NRMI with additive structure
Nested NRMI with additive homogeneous component:
(X_i, Y_j) | (P̃_1, P̃_2) iid~ P̃_1 × P̃_2
P̃_l =d ξ P_l + (1 - ξ) P̃_0, l = 1, 2
(P_1, P_2, P̃_0) | (R̃, R̃*) ~ R̃^2 × R̃*
where ξ is a [0, 1]-valued random variable, R̃ is a nested random probability measure independent of R̃*, and R̃* = δ_{P*_0} with P̃_0 =d P*_0.
Conditionally on latent allocation variables ζ, the peppf is
Π^(N)(n^X, n^Y, q^X, q^Y; ζ) = Π^(N-v) { π_1 Π^(v) + (1 - π_1) Π_1^(v_1) Π_2^(v_2) }
⇒ Even with shared observations, the component of the peppf that accounts for heterogeneity does not vanish: Π_1^(v_1) Π_2^(v_2) > 0
23 / 42
24 Dependent Random Measures with Nested Structure
Illustrative example: density estimation
(X_i, Y_j) | (θ_{1,i}, θ_{2,j}) ind~ h(·; θ_{1,i}) × h(·; θ_{2,j})
(θ_{1,i}, θ_{2,j}) | (P̃_1, P̃_2) iid~ P̃_1 × P̃_2
with Gaussian kernel h(·; (M, V)), (M, V) ∈ R × R_+, and a nested prior with additive structure based on the stable NRMI:
P̃_1 = ξ P_1 + (1 - ξ) P_0    P̃_2 = ξ P_2 + (1 - ξ) P_0
with standard choices of (conjugate) base measure and hyperpriors.
⇒ Comparison: standard nested vs. nested with additive structure, i.e. ξ = 1 (a.s.) vs ξ ~ Unif(0, 1)
Simulate n_1 = n_2 = 100 iid values from
X ~ (1/2) N(5, 0.6) + (1/2) N(10, 0.6)    Y ~ (1/2) N(0, 0.6) + (1/2) N(5, 0.6)
and estimate the marginal densities f_1 and f_2 (also clustering is possible).
A marginal MCMC algorithm is readily derived from the peppf.
24 / 42
25 Dependent Random Measures with Nested Structure
Illustration: true and estimated densities
Figure: model with ξ = 1. Figure: model with ξ ~ Unif(0, 1).
25 / 42
26 Dependent Random Measures with Hierarchical Structure
Hierarchical processes
Hierarchical (homogeneous) NRMI for partially exchangeable data:
(X_{1,i}, X_{2,j}) | (P̃_1, P̃_2) iid~ P̃_1 × P̃_2, i, j ≥ 1
P̃_i | P̃_0 ~ NRMI(P̃_0), i = 1, 2
P̃_0 ~ NRMI(P_0)
with P_0 a non-atomic measure on X.
⇒ If the NRMI coincides with the Dirichlet process, one obtains the hierarchical Dirichlet process (HDP) introduced in Teh, Jordan, Beal and Blei (2006). See also Teh & Jordan (2010).
Data generated by the model: X^(n_i) = (X_{i,1}, ..., X_{i,n_i}) for i = 1, 2. The data will contain ties with positive probability, within and across samples, leading to X^(n_i) having k_i distinct values specific to sample i with frequencies (n_{i,1}, ..., n_{i,k_i}) for i = 1, 2, and k_0 values shared by the two samples with frequencies (q_{i,1}, ..., q_{i,k_0}).
Goal: derive the distribution of the partition structure.
26 / 42
27 Dependent Random Measures with Hierarchical Structure
Chinese restaurant franchise metaphor
Observable level: customers and dishes
- There are d = 2 restaurants sharing the same menu.
- X_{i,j}: label of the dish served at restaurant i to customer j
- Each table is served the same dish, chosen by the 1st customer seated at that table
- The same dish can be served at different tables within the same restaurant and across different restaurants.
The sample information is best recorded by:
- k distinct dishes served in either restaurant to N = n_1 + n_2 customers
- the corresponding frequency vectors n_i = (n_{i,1}, ..., n_{i,k}), where n_{i,j} ≥ 0; n_1 and n_2 will contain k_2 and k_1 zeros, respectively.
Latent level: tables (governed by P̃_0)
- l_{i,j}: number of tables in restaurant i serving dish j, whose range is {1, ..., n_{i,j}} if dish j is served at restaurant i and 0 otherwise
- l_{·,j} = Σ_{i=1}^2 l_{i,j}: total number of tables serving dish j, for j = 1, ..., k
- |l|: total number of tables in the two restaurants
27 / 42
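The metaphor translates directly into a forward sampler; a sketch for the HDP case with d = 2 restaurants (the concentration parameters `theta` at the restaurant level and `theta0` at the franchise level are illustrative; for other NRMIs the table- and dish-allocation probabilities change):

```python
import random

def chinese_restaurant_franchise(n_per_rest, theta, theta0, rng):
    """Forward-simulate dish labels for an HDP via the Chinese
    restaurant franchise: within a restaurant, a customer joins table
    t with prob. proportional to its occupancy and opens a new table
    with prob. proportional to theta; a new table draws its dish from
    the franchise-level CRP (existing dish j with prob. proportional
    to its table count l_j, new dish with prob. prop. to theta0)."""
    menu_tables = []        # dish index of every table across restaurants
    next_dish = 0
    all_labels = []
    for n in n_per_rest:
        table_dishes, table_counts, labels = [], [], []
        for i in range(n):
            r = rng.random() * (i + theta)
            acc, seated = 0.0, False
            for t, c in enumerate(table_counts):
                acc += c
                if r < acc:
                    table_counts[t] += 1
                    labels.append(table_dishes[t])
                    seated = True
                    break
            if not seated:  # open a new table; pick its dish from the menu CRP
                r2 = rng.random() * (len(menu_tables) + theta0)
                if r2 < len(menu_tables):
                    dish = menu_tables[int(r2)]  # uniform over existing tables
                else:
                    dish = next_dish
                    next_dish += 1
                menu_tables.append(dish)
                table_dishes.append(dish)
                table_counts.append(1)
                labels.append(dish)
        all_labels.append(labels)
    return all_labels
```

Dishes appearing in both returned label lists are precisely the k_0 shared distinct values of the partition structure above, and the length of `menu_tables` is the total number of tables |l|.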
28 Dependent Random Measures with Hierarchical Structure
Augmented partition structure
A suitable augmentation provides a more refined partition structure involving also:
- q_{i,j,t}: frequency of customers at restaurant i eating dish j and sitting at table t
- q_{i,j} = (q_{i,j,1}, ..., q_{i,j,l_{i,j}}): frequency vector of customers in restaurant i eating dish j at each of the l_{i,j} tables.
By marginalizing over the tables one re-obtains the observed frequencies n_{i,j} = |q_{i,j}| = Σ_{t=1}^{l_{i,j}} q_{i,j,t}.
First target: characterize the joint distribution of the partitions of
- the N = n_1 + n_2 customers into the k distinct dishes being served in the 2 restaurants
- the q_{i,j} customers among the l_{i,j} tables in restaurant i serving dish j, for i = 1, 2 and j = 1, ..., k
i.e.
Φ^(|l|)_{k,0}(l_{·,1}, ..., l_{·,k}) ∏_{i=1}^2 Φ^(n_i)_{l_i,i}(q_{i,1}, ..., q_{i,k})
[EPPF on the number of tables × multivariate partition distribution]
where Φ^(|l|)_{k,0} is the EPPF associated to P̃_0 and Φ^(n_i)_{l_i,i} is deduced from (P̃_1, P̃_2).
28 / 42
29 Dependent Random Measures with Hierarchical Structure
Partially exchangeable partition probability function
peppf of a hierarchical NRMI:
Π^(n)_k(n_1, ..., n_k) = Σ_l Σ_q Φ^(|l|)_{k,0}(l_{·,1}, ..., l_{·,k}) ∏_{i=1}^2 Φ^(n_i)_{l_i,i}(q_{i,1}, ..., q_{i,k})
- Σ_l is a sum over all compatible table configurations, i.e. over l_{i,j} ∈ {1, ..., n_{i,j}} with l_{i,j} = 0 if n_{i,j} = 0
- Σ_q is a sum over all partitions q with the constraint |q_{i,j}| = n_{i,j}
- Φ^(n_i)_{l_i,i}(q_{i,1}, ..., q_{i,k}) = (θ^{|l_i|} / Γ(n_i)) ∫_0^∞ u^{n_i-1} e^{-θψ(u)} ∏_{j=1}^k ∏_{t=1}^{l_{i,j}} τ_{q_{i,j,t}}(u) du
- Φ^(|l|)_{k,0}(l_{·,1}, ..., l_{·,k}) is the EPPF associated to P̃_0 ~ NRMI(P_0)
⇒ Unlike GM dependent and nested processes, hierarchical processes do not achieve the limiting cases of joint exchangeability or (unconditional) independence.
29 / 42
30 Dependent Random Measures with Hierarchical Structure
peppf for the HDP & hierarchical σ-stable process
peppf HDP:
Π^(n)_k(n_1, ..., n_k) = (θ_0^k / ∏_{i=1}^2 (θ)_{n_i}) Σ_l (θ^{|l|} / (θ_0)_{|l|}) ∏_{j=1}^k (l_{·,j} - 1)! ∏_{i=1}^2 ∏_{j=1}^k |s(n_{i,j}, l_{i,j})|
where |s(n, l)| is the absolute value of the Stirling number of the first kind.
Probabilistic interpretation:
- Φ^(|l|)_{k,0}(l_{·,1}, ..., l_{·,k}) is the EPPF of a Dirichlet process with base measure θ_0 P_0
- K_{n_{i,j}} is the number of distinct tables at which the n_{i,j} customers are seated, according to a Dirichlet process with base measure θ P_0:
Π^(n)_k(n_1, ..., n_k) = (∏_{i=1}^2 ∏_{j=1}^k (θ)_{n_{i,j}} / (θ)_{n_i}) Σ_l Φ^(|l|)_{k,0}(l_{·,1}, ..., l_{·,k}) ∏_{i=1}^2 ∏_{j=1}^k P[K_{n_{i,j}} = l_{i,j}]
peppf hierarchical normalized σ-stable:
Π^(n)_k(n_1, ..., n_k) = (σ_0^{k-1} Γ(k) / ∏_{i=1}^2 Γ(n_i)) Σ_l (σ^{|l|-2} ∏_{i=1}^2 Γ(|l_i|) / Γ(|l|)) ∏_{j=1}^k (1 - σ_0)_{l_{·,j} - 1} ∏_{i=1}^2 ∏_{j=1}^k C(n_{i,j}, l_{i,j}; σ) σ^{-l_{i,j}}
where C(n, l; σ) is the generalized factorial coefficient.
30 / 42
31 Dependent Random Measures with Hierarchical Structure
peppf for the hierarchical Pitman-Yor model
With σ, σ_0 ∈ (0, 1) and θ, θ_0 > 0, the hierarchy of random probabilities is
(P̃_1, P̃_2) | P̃_0 iid~ PY(σ, θ; P̃_0),    P̃_0 ~ PY(σ_0, θ_0; P_0)
Possible constructions of the Pitman-Yor (PY) process: randomization of the total mass c of a generalized gamma NRMI (Pitman & Yor, 1997).
peppf of HPY processes:
Π^(n)_k(n_1, ..., n_k) = Σ_l [∏_{r=1}^{k-1} (θ_0 + r σ_0) / (θ_0 + 1)_{|l|-1}] ∏_{j=1}^k (1 - σ_0)_{l_{·,j} - 1} ∏_{i=1}^2 [∏_{r=1}^{|l_i|-1} (θ + r σ) / (θ + 1)_{n_i - 1}] ∏_{j=1}^k C(n_{i,j}, l_{i,j}; σ) σ^{-l_{i,j}}
- the number n_{i,j} of customers eating dish j in restaurant i are allocated to K_{n_{i,j}} tables generated by a PY(σ, θ; P̃_0) process
- the global table configuration is weighted by the EPPF of the higher-level PY(σ_0, θ_0; P_0) process
- the peppf is obtained by marginalizing with respect to the table configuration.
31 / 42
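The table-generating mechanism in the first bullet is the two-parameter (Pitman-Yor) generalization of the Chinese restaurant process; a sketch of a single-level sampler (parameter values and seed in the check below are illustrative):

```python
import random

def pitman_yor_crp(n, sigma, theta, rng):
    """Pitman-Yor predictive scheme: given i observations in k clusters
    of sizes n_1..n_k, the next observation joins cluster j with
    probability (n_j - sigma)/(theta + i) and starts a new cluster with
    probability (theta + k*sigma)/(theta + i)."""
    counts, labels = [], []
    for i in range(n):
        r = rng.random() * (theta + i)
        acc, placed = 0.0, False
        for j, c in enumerate(counts):
            acc += c - sigma
            if r < acc:
                counts[j] += 1
                labels.append(j)
                placed = True
                break
        if not placed:            # new cluster, labelled k = len(counts)
            labels.append(len(counts))
            counts.append(1)
    return labels
```

For σ ∈ (0, 1) the number of clusters grows like n^σ, versus θ log n in the Dirichlet (σ = 0) case; this heavier-tailed growth is what makes PY-based hierarchies attractive for the species-sampling applications that follow.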
32 Dependent Random Measures with Hierarchical Structure
EPPF of a hierarchical NRMI prior (d = 1)
d = 1 ⇒ exchangeable case with de Finetti measure a hierarchical process:
Π^(n)_k(n_1, ..., n_k) = Σ_l Σ_q Φ^(|l|)_{k,0}(l_1, ..., l_k) Φ^(n)_{|l|}(q_1, ..., q_k)
⇒ A similar formula can be given for hierarchical Pitman-Yor processes.
From this result one recovers two coagulation properties (Pitman, 1999):
- If P̃ is a hierarchical stable NRMI,
Π^(n)_k(n_1, ..., n_k) = ((σσ_0)^{k-1} Γ(k) / Γ(n)) ∏_{j=1}^k (1 - σσ_0)_{n_j - 1}
i.e. P̃ is itself a stable NRMI with parameter σσ_0 ∈ (0, 1).
- If P̃ is a HPY process with θ = θ_0 σ, then P̃ ~ PY(σσ_0, θ; P_0)
32 / 42
33 Dependent Random Measures with Hierarchical Structure
Prediction with hierarchical processes
Conditional on observed samples X^(n_1) and Y^(n_2), interest lies in predicting features related to additional future samples
X_{n_1+1}, ..., X_{n_1+m_1}    Y_{n_2+1}, ..., Y_{n_2+m_2}
Species sampling: the X_i's and Y_j's are species labels, with species shared within and between samples, and the goal is to estimate e.g.:
- the number of new distinct species that will be observed
- the probability that the (n_i + m_i + 1)-th observation will be a new species
Exchangeable case: closed form estimators for Gibbs-type priors (reviewed in De Blasi et al., 2015)
Partially exchangeable case: it is not possible to evaluate inferences exactly ⇒ simulation algorithm based on the peppf
Illustration: ESTs analyses
- Useful tool for gene identification in organisms
- ESTs are generated by randomly sequencing genes from a cDNA library, which consists of a large and unknown number of differentially expressed genes (typically millions) ⇒ potentially infinite
- Only a sample corresponding to a small portion of the library is typically available
33 / 42
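In the exchangeable case, the closed-form Gibbs-type discovery-probability estimator under a PY(σ, θ) prior is elementary; a sketch evaluated at the FRUIT1 sample sizes reported on the next slide (the parameter values σ = 0.5, θ = 1 are illustrative placeholders, not the fitted values):

```python
def py_discovery_prob(n, k, sigma, theta):
    """P[X_{n+1} = 'new' | n observations with k distinct species]
    under a Pitman-Yor prior: (theta + k*sigma) / (theta + n)."""
    return (theta + k * sigma) / (theta + n)

# FRUIT1 library: n1 = 1593 ESTs, k1 = 806 distinct genes
p_new = py_discovery_prob(1593, 806, sigma=0.5, theta=1.0)
```

In the partially exchangeable case no such closed form is available, which is why the hierarchical analyses below resort to peppf-based simulation.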
34 Dependent Random Measures with Hierarchical Structure
Citrus clementina libraries
Samples from two libraries of citrus clementina fruits:
- FRUIT1 ("FlavFr1"): n_1 = 1593 ESTs with k_1 = 806 distinct genes (k_1/n_1 ≈ .51)
- FRUIT2 ("RindPdig24"): n_2 = 900 ESTs with k_2 = 687 distinct genes (k_2/n_2 ≈ .76)
- The two samples share 183 genes, with frequency 520 in the FRUIT1 sample and 317 in the FRUIT2 sample (about 1/3)
[Table: expression levels with sample sizes n and distinct-gene counts K_n for FRUIT1, FRUIT2 and the pooled FRUITS data; numerical entries lost in transcription.]
34 / 42
35 Dependent Random Measures with Hierarchical Structure
Models: P̃_1, P̃_2 independent PY processes vs. (P̃_1, P̃_2) hierarchical PY process
Discovery probability decay: probability that the (n_i + m_i + 1)-th observation is new, as the size m_i of the additional sample varies.
Figure: (a) (separately) exchangeable; (b) partially exchangeable.
⇒ Borrowing of information and narrower HPD bands
35 / 42
36 Dependent Random Measures with Hierarchical Structure
Other quantities of interest:
- # new distinct genes identified by additional sequencing: K^X_{m|n_1} & K^Y_{m|n_2}
- # additionally sequenced genes coinciding with new ones: L^X_{m|n_1} & L^Y_{m|n_2}
[Table: estimates of K̂^X_{m|n_1}, L̂^X_{m|n_1}, K̂^Y_{m|n_2}, L̂^Y_{m|n_2} for FRUIT1 and FRUIT2 at various m, under independent PY processes vs. the HPY process; numerical entries lost in transcription.]
In the dependent case, finer prediction is possible. For instance, considering additional sequencing for FRUIT1:
- for m = 600, K̂^X_{m|n_1} = 224.5, which is the sum of the predicted 37.6 genes new to FRUIT1 but already observed in FRUIT2 and the overall new genes;
- for m = 2000, it is predicted that 96.6 of the 504 genes originally observed only for FRUIT2 will be detected also in the FRUIT1 library.
36 / 42
37 Dependent Random Measures with Hierarchical Structure
Hierarchical hazard rate mixtures & dependent mixture hazards
Hierarchical CRMs on Y: define
(µ̃_1, µ̃_2) | µ̃_0 =d CRM(ρ_1, µ̃_0) × CRM(ρ_2, µ̃_0),    µ̃_0 =d CRM(ρ_0, c_0 P_0)
and construct dependent hazard rate mixture models
h̃_l(t) = ∫_Y k(t, y) µ̃_l(dy),    l = 1, 2
⇒ the corresponding partially exchangeable sequences (X_{i,1})_{i≥1} and (X_{i,2})_{i≥1} imply survival-time distributions of the form
P[X_{i,1} > t_1, X_{j,2} > t_2 | (µ̃_1, µ̃_2)] = ∏_{l=1}^2 exp{-∫_0^{t_l} h̃_l(s) ds}
37 / 42
38 Dependent Random Measures with Hierarchical Structure
Posterior characterization
Likelihood:
L(µ_1, µ_2; X) = ∏_{l=1}^2 exp{-∫_Y K_l(y) µ_l(dy)} ∏_{i=1}^{n_l} ∫_Y k(X_{i,l}; y) µ_l(dy)
with K_l(y) = Σ_{i=1}^{n_l} ∫_0^{X_{i,l}} k(t, y) dt.
Augmented likelihood:
L(µ_1, µ_2; X, Y) = ∏_{l=1}^2 exp{-∫_Y K_l(y) µ_l(dy)} ∏_{i=1}^{n_l} k(X_{i,l}; Y_{i,l}) µ_l(dY_{i,l})
where the latent variables Y_{i,l} are generated by a discrete random probability.
Posterior characterization: the posterior of (µ̃_1, µ̃_2), given the data X and the latents Y, equals the distribution of the random measure vector
(µ*_1, µ*_2) + (Σ_{j=1}^{k_1} J_{j,1} δ_{Y*_{j,1}}, Σ_{j=1}^{k_2} J_{j,2} δ_{Y*_{j,2}})
- (µ*_1, µ*_2) is a hierarchical CRM with updated marginal Lévy intensities
- the jumps J_{j,l} are independent and with known density
⇒ The model can be modified so as to include censored observations and also other covariates, i.e. a semiparametric Cox proportional hazards type of specification
38 / 42
39 Dependent Random Measures with Hierarchical Structure
Illustration: estimation of the survival functions S_1 & S_2
Simulation study: 2 samples of size n_1 = n_2 = 100 generated from Weibull distributions with different parameters, chosen such that the survival functions cross and do not satisfy the assumption of proportional hazards
⇒ the dependent hierarchical model is able to distinguish the two survival functions
Figure: estimated and true survival functions
39 / 42
40 Concluding remarks
- Inference with non-exchangeable data is analytically challenging
- CRM-based dependent priors look promising: conditionally on a suitable latent structure, they typically display distributional properties reminiscent of those available in the exchangeable case
- The technique for deriving moment measures (and sometimes even complete posteriors) is general and needs only some adaptation depending on the specific transformations of the CRMs
- In the exchangeable case (d = 1) a hierarchical random probability yields a new class of nonparametric priors: for all of them, but the two cases where coagulation is achieved, one has
P[X_{n+1} = "new" | X^(n)] = f(n, K_n, (N_1, ..., N_{K_n}))
Hence, they are not of Gibbs type, though one can still identify the associated prediction rule.
40 / 42
41 Some References
Camerlenghi, Lijoi, Orbanz & Prünster (2015). Hierarchical processes and prediction. Submitted.
Camerlenghi, Lijoi & Prünster (2015). Hierarchical structures in survival analysis. In preparation.
Camerlenghi, Dunson, Lijoi, Prünster & Rodríguez (2015). Generalized nested nonparametric priors for modeling heterogeneous data. In preparation.
De Blasi, Favaro, Lijoi, Mena, Prünster & Ruggiero (2015). Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE TPAMI 37.
Dykstra & Laud (1981). A Bayesian nonparametric approach to reliability. Ann. Statist. 9.
Ewens (1972). The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3.
Ferguson (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1.
de Finetti (1938). Sur la condition d'équivalence partielle. Act. sci. industr. 739.
Griffiths & Milne (1978). A class of bivariate Poisson processes. J. Mult. Anal. 8.
James (2005). Bayesian Poisson process partition calculus with an application to Bayesian Lévy moving averages. Ann. Statist. 33.
Kingman (1967). Completely random measures. Pacific J. Math. 21.
Lijoi, Nipoti & Prünster (2014). Bayesian inference with dependent normalized completely random measures. Bernoulli 20.
41 / 42
42 Some References
Lijoi, Nipoti & Prünster (2014). Dependent mixture models: clustering and borrowing information. Comput. Statist. & Data Anal. 71.
Lijoi & Prünster (2010). Models beyond the Dirichlet process. In Bayesian Nonparametrics, Cambridge University Press.
MacEachern (1999). Dependent nonparametric processes. In ASA Proceedings.
MacEachern (2000). Dependent Dirichlet processes. OSU Tech. Report.
Müller, Quintana & Rosner (2004). A method for combining inference across related nonparametric Bayesian models. J. Roy. Statist. Soc. Ser. B 66.
Papaspiliopoulos & Roberts (2008). Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika 95.
Pitman (1995). Exchangeable and partially exchangeable random partitions. Prob. Th. and Rel. Fields 102.
Pitman (1999). Coalescents with multiple collisions. Ann. Probab. 27.
Pitman & Yor (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25.
Regazzini, Lijoi & Prünster (2003). Distributional results for means of random measures with independent increments. Ann. Statist. 31.
Rodríguez, Dunson & Gelfand (2008). The nested Dirichlet process. J. Amer. Statist. Assoc. 103.
Teh & Jordan (2010). Hierarchical Bayesian nonparametric models with applications. In Bayesian Nonparametrics, Cambridge University Press.
Teh, Jordan, Beal & Blei (2006). Hierarchical Dirichlet processes. J. Amer. Statist. Assoc. 101.
Walker (2007). Sampling the Dirichlet mixture model with slices. Comm. Statist. Simul. Computat. 36.
42 / 42