Compound Random Measures

1 Compound Random Measures Jim Griffin (joint work with Fabrizio Leisen) University of Kent

2 Introduction: Two clinical studies

[Figure: two panels, one per CALGB study, with axes labelled β.]

3 Infinite mixture models

These data could be analysed using two infinite mixture models:

CALGB8881: $f_1(y) = \sum_{j=1}^{\infty} w_j^{(1)} k(y \mid \theta_j)$

CALGB9160: $f_2(y) = \sum_{j=1}^{\infty} w_j^{(2)} k(y \mid \theta_j)$

where $w_j^{(k)} > 0$ for $j = 1, 2, \dots$ and $\sum_{j=1}^{\infty} w_j^{(k)} = 1$ for $k = 1, 2$. Here $k(y \mid \theta)$ is a p.d.f. for $y$ with parameter $\theta$.

We need to put a prior on $w^{(1)}$, $w^{(2)}$ and $\theta$ (a random probability measure).

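4 Some dependent random probability measures: stick-breaking

The $\theta_j$ are i.i.d. and
$$w_j^{(k)} = V_j^{(k)} \prod_{i<j} \left(1 - V_i^{(k)}\right).$$

Hierarchical Dirichlet Process (Teh et al, 2006):
$$V_j^{(k)} \sim \mathrm{Be}\left(\alpha \beta_j,\ \alpha \left(1 - \sum_{l=1}^{j} \beta_l\right)\right), \qquad \beta_j = \beta'_j \prod_{l=1}^{j-1} \left(1 - \beta'_l\right), \qquad \beta'_j \sim \mathrm{Be}(1, \gamma).$$

Probit stick-breaking processes, etc.: $\left(V_j^{(1)}, V_j^{(2)}\right)$ are correlated and independent of $\left(V_i^{(1)}, V_i^{(2)}\right)$ for $i \neq j$.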
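
The stick-breaking map from fractions to weights can be sketched in a few lines. This is only an illustration of $w_j = V_j \prod_{i<j}(1 - V_i)$ for a single measure, assuming $V_j \sim \mathrm{Be}(1, \gamma)$; the value `gamma_ = 2.0` and the truncation at 50 sticks are arbitrary choices, and the hierarchical and probit constructions above are not implemented.

```python
import random

def stick_breaking_weights(v):
    """Map stick-breaking fractions V_1, V_2, ... to weights
    w_j = V_j * prod_{i<j} (1 - V_i); also return the unassigned stick length."""
    weights = []
    remaining = 1.0  # equals prod_{i<j} (1 - V_i) at step j
    for v_j in v:
        weights.append(v_j * remaining)
        remaining *= 1.0 - v_j
    return weights, remaining

rng = random.Random(0)
gamma_ = 2.0  # hypothetical concentration parameter
v = [rng.betavariate(1.0, gamma_) for _ in range(50)]  # truncated at 50 sticks
w, leftover = stick_breaking_weights(v)
```

The weights and the leftover stick length always sum to one, so the truncation error of the finite approximation is exactly `leftover`.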

5 Completely random measures

$\mu$ is a completely random measure (CRM) on $\Theta$ if, for any disjoint subsets $A_1, \dots, A_n$, the random variables $\mu(A_1), \dots, \mu(A_n)$ are mutually independent.

We concentrate on CRMs which can be represented in terms of jump sizes $J_i$ and jump locations $\theta_i$ as
$$\mu = \sum_{i=1}^{\infty} J_i \delta_{\theta_i},$$
where $\delta$ is Dirac's delta function, and which have Lévy-Khintchine representation
$$E\left[e^{-\int f(\theta)\, \mu(d\theta)}\right] = \exp\left\{-\int \left[1 - e^{-s f(\theta)}\right] \alpha(d\theta)\, \rho(ds)\right\},$$
where $\alpha$ and $\rho$ are measures for which $\int \alpha(d\theta) < \infty$.

6 Completely random measures

[Figure: jump sizes $J$.]

Poisson process with intensity $\alpha(d\theta)\rho(ds)$.

7 Examples of CRMs

Many processes that we use in Bayesian nonparametrics are CRMs:

Gamma process - $\rho(ds) = s^{-1} \exp\{-s\}\, ds$.

Beta process - $\rho(ds) = \beta s^{-1} (1 - s)^{\beta - 1}\, ds$.

or can be derived from CRMs:

Normalizing a gamma process, i.e. taking $p = \mu / \mu(\Theta)$, leads to a Dirichlet process.

A beta process prior for $p_1, p_2, \dots$ can be used to define an Indian buffet process.
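
The link between the gamma process and the Dirichlet process can be illustrated on a finite partition. The sketch below relies on the standard fact that a gamma process with the intensity above assigns independent $\mathrm{Ga}(\alpha(A_i), 1)$ masses to disjoint sets $A_i$, so the normalized masses are Dirichlet distributed. The partition masses in `alpha_masses` are hypothetical.

```python
import random

def normalized_gamma_increments(alpha_masses, rng):
    """Draw mu(A_1), ..., mu(A_n) as independent Ga(alpha(A_i), 1) variables
    and normalize them by their sum mu(Theta)."""
    mu = [rng.gammavariate(a, 1.0) for a in alpha_masses]
    total = sum(mu)
    return [m / total for m in mu]

rng = random.Random(1)
alpha_masses = [0.5, 1.5, 3.0]  # hypothetical alpha(A_1), alpha(A_2), alpha(A_3)
draws = [normalized_gamma_increments(alpha_masses, rng) for _ in range(20000)]

# Monte Carlo mean of each normalized mass, and the Dirichlet mean alpha(A_i) / alpha(Theta)
mean_p = [sum(d[i] for d in draws) / len(draws) for i in range(len(alpha_masses))]
expected = [a / sum(alpha_masses) for a in alpha_masses]
```

Comparing `mean_p` with `expected` gives a quick sanity check that the normalized vector behaves like a Dirichlet draw.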

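8 Vectors of CRMs

It is useful to define $d$ related CRMs. Suppose that $\mu_1, \dots, \mu_d$ are CRMs on $\Theta$ with marginal Lévy intensities
$$\bar{\nu}_j(ds, d\theta) = \nu_j(ds)\, \alpha(d\theta).$$

Then $\mu_1, \dots, \mu_d$ are a vector of CRMs if there is a Lévy-Khintchine representation of the form
$$E\left[e^{-\mu_1(f_1) - \cdots - \mu_d(f_d)}\right] = e^{-\psi_{\rho,d}(f_1, \dots, f_d)}$$
where
$$\psi_{\rho,d}(f_1, \dots, f_d) = \int_{(\mathbb{R}^+)^d \times \Theta} \left[1 - e^{-s_1 f_1(\theta) - \cdots - s_d f_d(\theta)}\right] \alpha(d\theta)\, \rho_d(ds_1, \dots, ds_d)$$
and
$$\nu_j(ds) = \int_{(\mathbb{R}^+)^{d-1}} \rho_d(ds_1, \dots, ds_{j-1}, ds, ds_{j+1}, \dots, ds_d).$$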

9 Compound Random Measures: Definition

A compound random measure (CoRM) is a vector of CRMs with intensity
$$\rho_d(ds_1, \dots, ds_d) = \int z^{-d}\, h\left(\frac{s_1}{z}, \dots, \frac{s_d}{z}\right) ds_1 \cdots ds_d\ \nu^\star(dz)$$
where

$s_1, \dots, s_d$ are called scores.

$H$ is a score distribution with density $h$.

$\nu^\star$ is the Lévy intensity of a directing Lévy process,

which satisfies the condition
$$\int\!\!\int \min(1, \|s\|)\, z^{-d}\, h\left(\frac{s_1}{z}, \dots, \frac{s_d}{z}\right) ds\ \nu^\star(dz) < \infty$$
where $\|s\|$ is the Euclidean norm of the vector $s = (s_1, \dots, s_d)$.

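10 A representation of a CoRM

Realizations of a CoRM can be expressed as
$$\mu_j = \sum_{i=1}^{\infty} m_{j,i}\, J_i\, \delta_{\theta_i}$$
where

$(m_{1,i}, \dots, m_{d,i}) \overset{\text{i.i.d.}}{\sim} H$,

$\eta = \sum_{i=1}^{\infty} J_i \delta_{\theta_i}$ is a CRM with Lévy intensity $\nu^\star(ds)\, \alpha(d\theta)$.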
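
The representation suggests a simple truncated simulation: draw the jumps of the directing CRM once, then rescale them by independent scores in each dimension. The sketch below assumes independent $\mathrm{Ga}(\phi, 1)$ scores. The jumps come from a finite-dimensional gamma stand-in, which is an illustrative assumption and not an exact draw from any particular $\nu^\star$; `N`, `M`, the Gaussian atom locations and the function names are likewise hypothetical.

```python
import random

def corm_from_jumps(jumps, d, phi, rng):
    """Return, for each of the d measures, the weights m_{j,i} * J_i
    built from shared jumps J_i and i.i.d. Ga(phi, 1) scores m_{j,i}."""
    return [[rng.gammavariate(phi, 1.0) * J for J in jumps] for _ in range(d)]

def normalize(weights):
    """Divide the weights of one measure by their total mass."""
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(2)
N, M = 100, 5.0  # truncation level and total mass (assumptions)
jumps = [rng.gammavariate(M / N, 1.0) for _ in range(N)]  # stand-in for the jumps of eta
atoms = [rng.gauss(0.0, 1.0) for _ in range(N)]           # shared locations theta_i
mu = corm_from_jumps(jumps, d=2, phi=1.0, rng=rng)        # weights of mu_1 and mu_2
p = [normalize(mu_j) for mu_j in mu]                      # normalized weights per dimension
```

All dimensions share the atoms in `atoms`; only the weights differ, which is where the dependence between the measures comes from.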

11 CoRMs with independent gamma scores

We will concentrate on the class of CoRMs for which
$$h\left(\frac{s_1}{z}, \dots, \frac{s_d}{z}\right) = \prod_{j=1}^{d} f\left(\frac{s_j}{z}\right)$$
where $f$ is the p.d.f. of a gamma distribution with shape $\phi$,
$$f(x) = \frac{1}{\Gamma(\phi)}\, x^{\phi - 1} \exp\{-x\}.$$

12 Properties of CoRMs with independent score distributions

The Lévy copula can be expressed as a univariate integral. Let
$$M_z^f(t) = \int e^{ts}\, z^{-1} f(s/z)\, ds$$
be the moment generating function of $z^{-1} f(s/z)$; then
$$\psi_{\rho,d}(\lambda_1, \dots, \lambda_d) = \int_{(\mathbb{R}^+)^d} \left[1 - e^{-s_1 \lambda_1 - \cdots - s_d \lambda_d}\right] \rho_d(ds_1, \dots, ds_d) = \int \left[1 - \prod_{j=1}^{d} M_z^f(-\lambda_j)\right] \nu^\star(z)\, dz.$$

This expression can be used to calculate quantities such as $\mathrm{Corr}(\mu_k(A), \mu_m(A))$.

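13 CoRMs with gamma distributed scores

Consider a CoRM process with independent $\mathrm{Ga}(\phi, 1)$ distributed scores. If the CoRM process has gamma process marginals then
$$\rho_d(s_1, \dots, s_d) = \frac{\left(\prod_{j=1}^{d} s_j\right)^{\phi - 1}}{[\Gamma(\phi)]^{d-1}}\, |s|^{-\frac{d\phi + 1}{2}}\, e^{-\frac{|s|}{2}}\, W_{\frac{(d-2)\phi + 1}{2},\, -\frac{d\phi}{2}}(|s|), \qquad (1)$$
where $|s| = s_1 + \cdots + s_d$ and $W$ is the Whittaker function.

If the CoRM process has $\sigma$-stable process marginals then
$$\rho_d(s_1, \dots, s_d) = \frac{\left(\prod_{j=1}^{d} s_j\right)^{\phi - 1}}{[\Gamma(\phi)]^{d-1}}\, \frac{\sigma\, \Gamma(\sigma + d\phi)}{\Gamma(\sigma + \phi)\, \Gamma(1 - \sigma)}\, |s|^{-\sigma - d\phi}. \qquad (2)$$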

14 CoRMs with exponentially distributed scores

Consider a CoRM process with independent exponentially distributed scores. If the CoRM has gamma process marginals we recover the multivariate Lévy intensity of Leisen et al (2013),
$$\rho_d(s_1, \dots, s_d) = \sum_{j=0}^{d-1} \frac{(d-1)!}{(d-1-j)!}\, |s|^{-j-1}\, e^{-|s|}.$$

Otherwise, if $\sigma$-stable marginals are considered then we recover the multivariate vector introduced in Leisen and Lijoi (2011) and Zhu and Leisen (2014),
$$\rho_d(s_1, \dots, s_d) = \frac{(\sigma)_d}{\Gamma(1 - \sigma)}\, |s|^{-\sigma - d}.$$

15 CoRMs with independent gamma scores: specific marginals

The Lévy intensity of $\mu_j$ is
$$\nu_j(ds) = \int z^{-1} f(s/z)\, ds\ \nu^\star(dz) = \nu(ds).$$

If we have independent gamma scores, the directing Lévy intensity $\nu^\star$ is linked to the marginal Lévy intensity $\nu$ by
$$\nu^\star\left(\frac{1}{t}\right) = t^{2 - \phi}\, \mathcal{L}^{-1}\left(\frac{\Gamma(\phi)}{s^{\phi - 1}}\, \nu(s)\right)(t)$$
where $\mathcal{L}^{-1}$ is the inverse Laplace transform.
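
The link between the directing and marginal intensities follows from a change of variables. The lines below are a sketch of that step, assuming both intensities have densities with respect to Lebesgue measure and writing $\mathcal{L}$ for the Laplace transform.

```latex
\begin{align*}
\nu(s) &= \int_0^\infty z^{-1}\, \frac{1}{\Gamma(\phi)} \left(\frac{s}{z}\right)^{\phi-1} e^{-s/z}\, \nu^\star(z)\, dz \\
       &= \frac{s^{\phi-1}}{\Gamma(\phi)} \int_0^\infty t^{\phi-2}\, \nu^\star(1/t)\, e^{-st}\, dt
          \qquad (t = 1/z) \\
\Longrightarrow \quad
\frac{\Gamma(\phi)}{s^{\phi-1}}\, \nu(s) &= \mathcal{L}\left[\, t^{\phi-2}\, \nu^\star(1/t) \,\right](s).
\end{align*}
```

Inverting the transform and multiplying by $t^{2-\phi}$ recovers $\nu^\star(1/t)$. As a check, for $\nu(s) = s^{-1} e^{-s}$ the left-hand side is $\Gamma(\phi)\, s^{-\phi} e^{-s}$, which is the Laplace transform of $(t-1)^{\phi-1}$ on $t > 1$, so $\nu^\star(1/t) = t^{2-\phi}(t-1)^{\phi-1}$, that is, $\nu^\star(z) = z^{-1}(1-z)^{\phi-1}$ on $0 < z < 1$.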

16 CoRMs with independent gamma scores: specific marginals

The directing Lévy intensity
$$\nu^\star(z) = z^{-1} (1 - z)^{\phi - 1}, \qquad 0 < z < 1,$$
leads to a marginal gamma process for which
$$\nu(s) = s^{-1} \exp\{-s\}, \qquad s > 0.$$

Remarks

$\nu^\star$ is the Lévy intensity of a beta process.

If $\nu^\star$ is the Lévy intensity of a Stable-Beta process (Teh and Görür, 2009), the marginal process is a generalized gamma process.
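
The claim that this directing intensity yields gamma process marginals can be checked numerically. The sketch below approximates $\nu(s) = \int_0^1 z^{-1} f(s/z)\, \nu^\star(z)\, dz$ with a plain midpoint rule and compares it with $s^{-1} e^{-s}$. The quadrature rule, the grid size and the test values of $\phi$ and $s$ are illustrative choices.

```python
import math

def gamma_score_pdf(x, phi):
    """Density of Ga(phi, 1): x^(phi - 1) exp(-x) / Gamma(phi)."""
    return x ** (phi - 1.0) * math.exp(-x) / math.gamma(phi)

def directing_intensity(z, phi):
    """nu*(z) = z^(-1) (1 - z)^(phi - 1) for 0 < z < 1."""
    return (1.0 - z) ** (phi - 1.0) / z

def marginal_intensity(s, phi, n_grid=20000):
    """Midpoint-rule approximation of int_0^1 z^(-1) f(s / z) nu*(z) dz."""
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        z = (i + 0.5) * h
        total += gamma_score_pdf(s / z, phi) * directing_intensity(z, phi) / z
    return total * h

# (phi, s, numerical value, closed form s^(-1) exp(-s))
checks = [
    (phi, s, marginal_intensity(s, phi), math.exp(-s) / s)
    for phi in (1.0, 2.0)
    for s in (0.5, 1.0, 2.0)
]
```

Each tuple in `checks` pairs the numerical integral with the closed form, so the two can be compared directly for any $\phi \geq 1$; smaller $\phi$ would need a rule that handles the endpoint singularity at $z = 1$.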

17 NCoRM: Gamma marginal, φ = 1

[Figure: panels DLP, Dim. 1, Dim. 2, ...]

18 NCoRM: Gamma marginal, φ = 1.1

[Figure: panels DLP, Dim. 1, Dim. 2, ...]

19 NCoRM: Gamma marginal, φ = 5

[Figure: panels DLP, Dim. 1, Dim. 2, ...]

20 Marginal beta process

A CoRM with beta process marginals ($\nu(s) = \beta s^{-1} (1 - s)^{\beta - 1}$) can be constructed using

A beta score distribution with parameters $\alpha$ and 1.

A directing Lévy intensity
$$\nu^\star(z) = \beta z^{-1} (1 - z)^{\beta - 1} + \frac{\beta (\beta - 1)}{\alpha}\, (1 - z)^{\beta - 2},$$
i.e. a superposition of a beta process and a compound Poisson process with beta jump distribution.

21 Links to other processes

Other processes can be expressed as CoRMs:

Superpositions/Thinning: e.g. Griffin et al (2013), Chen et al (2014), Lijoi and Nipoti (2014), Lijoi et al (2014a, b), using mixture score distributions $h(s) = \pi\, \delta_{s=0} + (1 - \pi)\, h^\star(s)$.

Lévy copulae: e.g. Leisen and Lijoi (2011), Leisen et al (2013), Zhu and Leisen (2014).

22 Normalized Compound Random Measures (NCoRM)

A vector of random probability measures can be defined by normalizing each dimension of the CoRM so that
$$p_k = \frac{\mu_k}{\mu_k(\Theta)} = \sum_{j=1}^{\infty} w_j^{(k)} \delta_{\theta_j}.$$

23 CoRMs on more general spaces

For a more general space $X$, we define $\mu(\cdot\,; x)$ to be a completely random measure for each $x \in X$. The collection $\{\mu(\cdot\,; x) \mid x \in X\}$ can be given a CoRM prior with
$$\mu(\cdot\,; x) = \sum_{j=1}^{\infty} m_j(x)\, J_j\, \delta_{\theta_j}$$
where $m_j(x)$ is a realisation of a random process on $X$.

Example: $X = \mathbb{R}^p$, $m_j(x) = \exp\{r_j(x)\}$ where $r_j(x)$ is given a zero-mean Gaussian process prior (see Ranganath and Blei, 2015).

24 Inference with NCoRMs

We assume that the data are $(x_1, y_1), \dots, (x_n, y_n)$ and are modelled as
$$y_i \mid \zeta_i \overset{\text{ind.}}{\sim} k(y_i \mid \zeta_i), \qquad \zeta_i \sim p(\cdot\,; x_i) = \frac{\mu(\cdot\,; x_i)}{\mu(\Theta; x_i)}, \qquad i = 1, 2, \dots, n$$
where $k(y \mid \theta)$ is a probability density function for $y$ with parameter $\theta$ and $\{p(\cdot\,; x) \mid x \in X\}$ is given an NCoRM prior.

25 MCMC inference for infinite mixture models

Introducing allocation variables $c_1, \dots, c_n$, the posterior is proportional to
$$p(y, c \mid m, J, \theta) = \prod_{i=1}^{n} \left[ k(y_i \mid \theta_{c_i})\, \frac{J_{c_i}\, m_{c_i}(x_i)}{\sum_{l=1}^{\infty} J_l\, m_l(x_i)} \right].$$

This form is not tractable due to the infinite sum in the denominator of each term. This can be addressed using the identity
$$\frac{1}{\sum_{l=1}^{\infty} J_l\, m_l(x_i)} = \int_0^{\infty} \exp\left\{-v_i \sum_{l=1}^{\infty} J_l\, m_l(x_i)\right\} dv_i.$$

26 MCMC inference for infinite mixture models

Introducing latent variables $v_i$ leads to a suitable form of augmented posterior for MCMC:
$$p(y, c, v \mid m, J, \theta) = \prod_{i=1}^{n} \left[ k(y_i \mid \theta_{c_i})\, J_{c_i}\, m_{c_i}(x_i) \exp\left\{-v_i \sum_{l=1}^{\infty} J_l\, m_l(x_i)\right\} \right]$$
$$= \prod_{j=1}^{K} \left[ J_j^{a_j} \prod_{\{i \mid c_i = j\}} k(y_i \mid \theta_j)\, m_j(x_i) \right] \exp\left\{-\sum_{l=1}^{\infty} J_l \sum_{i=1}^{n} v_i\, m_l(x_i)\right\}$$
where there are $K$ distinct values of $c_i$ and $a_j = \sum_{i=1}^{n} I(c_i = j)$.

27 MCMC inference for infinite mixture models: Finite X, independent scores

In this case, we can define a marginal sampler (e.g. Favaro and Teh, 2013) by integrating over $J$ and $m$.

$\int J^{a}\, \nu^\star(J)\, dJ$ is typical for marginal samplers of normalized random measure mixtures.

Integrals of $\prod_{\{i \mid c_i = j\}} m_j(x_i)$ will be a product of moments of the score distribution.

$E\left[\exp\left\{-\sum_{l=1}^{\infty} J_l \sum_{i=1}^{n} v_i\, m_l(x_i)\right\}\right]$ can be evaluated either exactly or as a univariate integral.

28 MCMC inference for infinite mixture models: General X

Pseudo-marginal methods (Andrieu and Roberts, 2009) are useful for a target density of the form
$$\pi(\theta) \propto f(\theta)\, g(\theta)$$
where $g(\theta)$ cannot be directly evaluated. Samples from the target density
$$\hat{\pi}(\theta) \propto f(\theta)\, \hat{g}(\theta)$$
where $E[\hat{g}(\theta)] = g(\theta)$ will have the distribution $\pi$.

In our target, the problem is evaluating
$$E\left[\exp\left\{-\sum_{l=1}^{\infty} J_l \sum_{i=1}^{n} v_i\, m_l(x_i)\right\}\right] = \exp\{-\psi(v)\}.$$
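
The pseudo-marginal idea can be illustrated on a toy target. The sketch below is a generic random-walk Metropolis-Hastings sampler in which $g$ is replaced by a positive unbiased estimate, and the estimate attached to the current state is kept until a proposal is accepted. The toy target (a standard normal with $g \equiv 1$), the log-normal noise and all function names are assumptions made for illustration; this is not the sampler used for NCoRM mixtures.

```python
import math
import random

def pseudo_marginal_mh(log_f, g_hat, theta0, n_iter, step, rng):
    """Random-walk MH for pi(theta) proportional to f(theta) g(theta),
    with g replaced by a positive unbiased estimate g_hat(theta, rng)."""
    theta, g_cur = theta0, g_hat(theta0, rng)
    samples = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        g_prop = g_hat(prop, rng)  # a fresh estimate is drawn at the proposal only
        ratio = math.exp(log_f(prop) - log_f(theta)) * g_prop / g_cur
        if rng.random() < ratio:
            theta, g_cur = prop, g_prop  # the accepted estimate is stored and reused
        samples.append(theta)
    return samples

def log_f(theta):
    """Toy choice: f is an unnormalized standard normal density."""
    return -0.5 * theta * theta

def noisy_g(theta, rng, sigma=0.3):
    """Toy unbiased estimate of g(theta) = 1: log-normal noise with mean one."""
    return math.exp(sigma * rng.gauss(0.0, 1.0) - 0.5 * sigma ** 2)

rng = random.Random(3)
draws = pseudo_marginal_mh(log_f, noisy_g, theta0=0.0, n_iter=30000, step=2.0, rng=rng)
kept = draws[2000:]  # discard an initial burn-in
post_mean = sum(kept) / len(kept)
post_var = sum((t - post_mean) ** 2 for t in kept) / len(kept)
```

Reusing the stored estimate `g_cur`, rather than re-estimating it at every iteration, is what keeps $\pi$ as the invariant distribution; noisier estimates make the chain stick for longer.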

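29 Unbiased estimation of the Laplace transform

The Poisson estimator (see Papaspiliopoulos, 2011) of $L_\phi = \exp\left\{-\int_D \phi(x)\, dx\right\}$ is
$$\hat{L}_\phi = \prod_{i=1}^{K} \left(1 - \frac{\phi(x_i)}{a\, C\, \kappa(x_i)}\right)$$
where $\kappa$ is a p.d.f. on $D$, $C > \phi(x)/\kappa(x)$ for $x \in D$, $a > 1$, $K \sim \mathrm{Pn}(a\, C)$ and $x_i \overset{\text{i.i.d.}}{\sim} \kappa$. Then
$$E[\hat{L}_\phi] = \exp\left\{-\int_D \phi(x)\, dx\right\}$$
and
$$V[\hat{L}_\phi] = L_\phi^2 \left( \exp\left\{\frac{1}{a\, C} \int_D \frac{\phi(x)^2}{\kappa(x)}\, dx\right\} - 1 \right) < \infty.$$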
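
A small simulation makes the unbiasedness concrete. The sketch below implements the estimator for a toy problem chosen for illustration: $D = (0, 1)$, $\phi(x) = x$ (so the target is $e^{-1/2}$), uniform $\kappa$, $a = 2$ and $C = 1.5$. Poisson variates are generated by counting unit-rate exponential arrivals, since the Python standard library has no Poisson sampler.

```python
import math
import random

def sample_poisson(rate, rng):
    """Count the arrivals of a unit-rate Poisson process in [0, rate]."""
    k, t = 0, rng.expovariate(1.0)
    while t < rate:
        k += 1
        t += rng.expovariate(1.0)
    return k

def poisson_estimator(phi, kappa_pdf, kappa_sample, a, C, rng):
    """One unbiased draw of exp(-int_D phi(x) dx)."""
    K = sample_poisson(a * C, rng)
    est = 1.0
    for _ in range(K):
        x = kappa_sample(rng)
        est *= 1.0 - phi(x) / (a * C * kappa_pdf(x))
    return est

def phi(x):
    """Toy integrand on D = (0, 1); its integral over D is 1/2."""
    return x

def kappa_pdf(x):
    """Uniform density on D."""
    return 1.0

def kappa_sample(rng):
    """Draw from the uniform density on D."""
    return rng.random()

rng = random.Random(4)
a, C = 2.0, 1.5  # a > 1 and C > sup phi / kappa = 1
draws = [poisson_estimator(phi, kappa_pdf, kappa_sample, a, C, rng) for _ in range(50000)]
estimate = sum(draws) / len(draws)
target = math.exp(-0.5)
```

Because $C > \phi/\kappa$ and $a > 1$, every factor lies in $(0, 1]$, so each draw is positive. That positivity is what allows the estimator to be used inside a pseudo-marginal acceptance ratio.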

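30 Unbiased estimation of exp{−ψ(v)}

Assuming that $x_1, x_2, \dots, x_n$ are distinct, $m^\star_i = m(x_i)$ and $m^\star = (m^\star_1, \dots, m^\star_n)$, $\exp\{-\psi_{\rho,d}(v)\}$ can be re-expressed as
$$\exp\left\{-\int_{(\mathbb{R}^+)^n} \int_0^{\infty} \left(1 - \exp\left\{-z \sum_{i=1}^{n} v_i\, m^\star_i\right\}\right) h(m^\star)\, \nu^\star(z)\, dz\, dm^\star \right\} = \prod_{k=1}^{n} L_k$$
where
$$L_k = \exp\left\{-\int_{(\mathbb{R}^+)^n} \int_0^{\infty} v_k\, m^\star_k\, h(m^\star) \exp\left\{-t \sum_{i=1}^{n} v_i\, m^\star_i\right\} T_{\nu^\star}(t)\, dt\, dm^\star \right\}$$
and $T_{\nu^\star}(t) = \int_t^{\infty} \nu^\star(z)\, dz$ (tail mass function).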

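31 Unbiased estimation of the Laplace transform

$L_k$ can be estimated using the Poisson estimator with $x = (z, m^\star)$, $D = (0, \infty) \times (\mathbb{R}^+)^n$ and
$$\phi(z, m^\star) = v_k\, m^\star_k\, h(m^\star) \exp\left\{-z \sum_{i=1}^{n} v_i\, m^\star_i\right\} T_{\nu^\star}(z) < \infty.$$

A suitable approximating density is
$$\kappa(z, m^\star) = \kappa_\nu(z)\, \frac{m^\star_k\, h(m^\star)}{E[m^\star_k]}$$
where $\kappa_\nu(z) > T_{\nu^\star}(z)$ for all $z \in \mathbb{R}^+$.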

32 A sampler for more general processes

A pseudo-marginal sampler is used with $\exp\{-\psi_{\rho,d}(v)\}$ estimated by the Poisson estimator.

The jumps are not integrated out and values for empty clusters are proposed from $h(m, J) \propto h(m_1/z, \dots, m_K/z)\, z \exp\{-vz\}\, \nu^\star(z)$.

An interweaving scheme for $m$ and $z$ (Yu and Meng, 2011) is used.

33 Two clinical studies

[Figure: two panels, one per CALGB study, with axes labelled β.]

34 Two clinical studies: Posterior mean densities

Results using a CoRM with independent gamma scores.

[Figure: posterior mean densities, one panel per CALGB study (CALGB8881 and the second CALGB study), with axes labelled β.]

35 Example: Nonparametric regression

We consider the classic motorcycle data, which record head acceleration at different times after impact. The model is
$$f(y \mid x) = \sum_{j=1}^{\infty} w_j(x)\, N(y \mid \mu_j, \sigma_j^2)$$
where
$$w_j(x) = \frac{\exp\{r_j(x)\}\, J_j}{\sum_{m=1}^{\infty} \exp\{r_m(x)\}\, J_m}.$$

The $r_m(x)$ are given independent Gaussian process priors with squared exponential covariance function.

$J_1, J_2, \dots$ follow a gamma process with Lévy intensity $M x^{-1} \exp\{-x\}$.
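
The covariate-dependent weights are a softmax-like reweighting of the shared jumps. The sketch below evaluates them, and the resulting normal mixture density, at a single covariate value, using a finite truncation. The jumps, latent process values, component means and standard deviations are hypothetical numbers, not fitted quantities.

```python
import math

def ncorm_weights(r_values, jumps):
    """w_j(x) = exp{r_j(x)} J_j / sum_m exp{r_m(x)} J_m at one covariate value x."""
    unnorm = [math.exp(r) * J for r, J in zip(r_values, jumps)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def mixture_density(y, weights, means, sds):
    """f(y | x) = sum_j w_j(x) N(y | mu_j, sigma_j^2), truncated to finitely many terms."""
    return sum(
        w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, sds)
    )

# Hypothetical values for three retained components.
jumps = [0.6, 0.3, 0.1]    # J_j
r_at_x = [0.0, 1.0, -1.0]  # r_j(x) at one x
w = ncorm_weights(r_at_x, jumps)
density = mixture_density(0.0, w, means=[-1.0, 0.0, 2.0], sds=[1.0, 0.5, 1.0])
```

When all $r_j(x)$ are equal the weights reduce to the normalized jumps, so the Gaussian processes only act by tilting the weights as $x$ varies.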

38 Example: Nonparametric variable selection

The classic Boston housing data record the median value of owner-occupied homes in 506 areas of Boston and the values of 14 attributes that are thought to affect house prices.

The covariance function is
$$k(x, x') = \exp\left\{-\sum_{j=1}^{p} w_j\, (x_j - x'_j)^2\right\}$$
and $p(w_j) \propto (1 + w_j)^{-1}$.

39 [Figure: posterior median and 95% credible intervals for $w_j$, one per covariate: CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT.]

40 Summary

CoRM processes are a unifying framework for a wide range of proposed vectors of CRMs.

CoRM processes are vectors of CRMs which are constructed in terms of a (univariate) CRM and a distribution (which defines the dependence).

Several MCMC methods for NCoRM mixture models are developed. These include methods which depend on the availability of analytical forms for some integrals with respect to the score distribution and methods which do not.

Modelling dependence through distributions allows a wide range of dependent nonparametric models to be developed (e.g. regression, time series, etc.).

41 References

C. Andrieu and G. O. Roberts (2009). The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Statist., 37.

S. Favaro and Y. W. Teh (2013). MCMC for Normalized Random Measure Mixture Models. Statistical Science, 28.

J. E. Griffin, M. Kolossiatis and M. F. J. Steel (2013). Comparing Distributions By Using Dependent Normalized Random-Measure Mixtures. Journal of the Royal Statistical Society, Series B, 75.

F. Leisen and A. Lijoi (2011). Vectors of Poisson-Dirichlet processes. J. Multivariate Anal., 102.

F. Leisen, A. Lijoi and D. Spano (2013). A Vector of Dirichlet processes. Electronic Journal of Statistics, 7.

A. Lijoi and B. Nipoti (2014). A class of hazard rate mixtures for combining survival data from different experiments. Journal of the American Statistical Association, 109.

A. Lijoi, B. Nipoti and I. Prünster (2014a). Bayesian inference with dependent normalized completely random measures. Bernoulli, 20.

A. Lijoi, B. Nipoti and I. Prünster (2014b). Dependent mixture models: clustering and borrowing information. Computational Statistics and Data Analysis, 71.

42 References

O. Papaspiliopoulos (2011). A methodological framework for Monte Carlo probabilistic inference for diffusion processes. In Bayesian Time Series Models (D. Barber, A. Taylan Cemgil and S. Chiappa, Eds), Cambridge University Press.

R. Ranganath and D. M. Blei (2015). Correlated Random Measures. arXiv preprint.

Y. W. Teh and D. Görür (2009). Indian Buffet Processes with Power-law Behavior. In Advances in Neural Information Processing Systems 22 (Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams and A. Culotta, Eds.).

Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101.

Y. Yu and X.-L. Meng (2011). To Center or Not to Center: That is Not the Question. An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency. Journal of Computational and Graphical Statistics, 20.

W. Zhu and F. Leisen (2014). A multivariate extension of a vector of Poisson-Dirichlet processes. To appear in the Journal of Nonparametric Statistics.
