Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and Igor Prünster (University of Turin)
Outline Life-testing model with mixture hazard rate Mixture hazard rate Kernels and completely random measures Asymptotic issues Posterior consistency Weak consistency Sufficient condition for Kullback-Leibler support Prior probability of uniform balls of the true hazard rate CLTs for functionals of the posterior hazard rate Posterior distribution Limit theorems for shifted measures Applications
Mixture hazard rate Let Y be a positive absolutely continuous random variable (lifetime) with random hazard rate of the form h(t) = k(t, x) µ(dx), X k is a kernel, jointly measurable application R + X R +, µ is a random mixing measure on the space X, modeled as a completely random measure. The cumulative hazard is given by H(t) = t h(s)ds and is required to 0 satisfy H(t) for t a.s. Then f (t) = h(t) exp( H(t)) defines a random density function to be used in Bayesian nonparametric inference. See Dykstra and Laud (1981) and Lo and Weng (1989).
Kernels As examples,we consider the class of mixture hazard rates that arise under the following choice of kernels: Dykstra-Laud (DL) kernel (monotone increasing hazard rates) k(t, x) = I (0 x t) rectangular (rect) kernel with bandwidth τ > 0 k(t, x) = I ( t x τ) Ornstein-Uhlenbeck (OU) kernel with κ > 0 k(t, x) = 2κ e κ(t x) I (0 x t) exponential (exp) kernel (monotone decreasing hazard rates) k(t, x) = x 1 e t x.
Completely random measures Let M be the space of boundedly finite measures on X. Definition. A random element µ taking values in M such that (i) µ( ) = 0 (ii) for any disjoint sets (A i ) i 1 : µ(a 1 ), µ(a 2 ),... are mutually independent µ( j 1 A j ) = j 1 µ(a j) is said to be a completely random measure (CRM) on X. Remark. A CRM can always be represented as a linear functional of a Poisson random measure. In particular, µ is uniquely characterized by its Laplace functional [ E e R X g(x) µ(dx)] R = e R + X [1 e v g(x) ]ν(dv,dx) where ν stands for the intensity of the Poisson random measure underlying µ.
CRMs select almost surely discrete measures: hence, µ can always be represented as i 1 J iδ Xi. Note that if ν(dv, dx) = ρ(dv)λ(dx) the law of the J i s and the X i s are independent. In this case µ is termed homogeneous. A non-homogeneous CRM has ν(dv, dx) = ρ(dv x)λ(dx). Non-homogeneous CRMs arise in the posterior. If ν(r +, dx) = for any x, then the CRM jumps infinitely often on any bounded set A. However, recall that µ(a) < a.s. We consider CRMs with Poisson intensity of the following form: ν(dv, dx) = 1 Γ(1 σ) e γ(x)v dv λ(dx), σ [0, 1), γ( ) > 0. v 1+σ γ constant generalized gamma measure (Brix, 1999) σ = 0 weighted gamma measure (Lo & Weng, 1989).
Asymptotic issues Posterior consistency. First generate independent data from a true fixed density f 0, then check whether the sequence of posterior distributions of f accumulates in some suitable neighborhood of f 0. Which conditions on f 0 and µ are needed under the four kernels considered in order to achieve consistency? CLTs for functionals of the posterior hazard. For a fixed T > 0, we consider random objects like (i) H(T ), (ii) T 1 T 0 h(t) 2 dt (path-second moment) and (iii) T 1 T 0 [ h(t) H(T )/T ] 2 dt (path-variance). Let Y = (Y 1,..., Y n ) be a set of observations. We are interested in establishing Central Limit Theorems of the type η(t ) [ H(T ) τ(t )] N(0, σ 2 ) as T + Y law for appropriate positive functions τ(t ) and η(t ) and variance σ 2. How are τ( ), η( ) and σ 2 influenced by the observed data?
Weak consistency Denote by P 0 the probability distribution associated with f 0 and by P 0 the infinite product measure. Also, let Π be the prior distribution of the random density function f and Π n the posterior distribution. Definition. If, for any ɛ > 0 and as n, Π n (A ɛ (f 0 )) 1 a.s. P 0, where A ɛ (f 0 ) is a ɛ neighborhood of f 0 in the weak topology, then Π is said to be weak consistent at f 0. Remark. A sufficient condition for weak consistency requires a prior Π to assign positive probability to Kullback Leibler (K-L) neighborhoods of f 0 ( K-L condition): Π ( f F : log(f 0 /f )f 0 < ɛ ) > 0 for any ɛ > 0, where log(f 0 /f )f 0 denotes the K-L divergence between f 0 and f, whereas F is the space of density functions defined on R +.
General consistency result Assume X = R + and postulate the existence of a boundedly finite measure µ 0 on R + such that the true density f 0 has hazard h 0 (t) = R + k(t, x)µ 0 (dx) where k is the kernel used for constructing f via the mixture hazard h. We translate the K-L condition w.r.t. f 0 into a condition of positive prior probability assigned to uniform neighborhoods of h 0 on [0, T ]. THEOREM Weak consistency Assume (i) h 0 (t) > 0 for any t 0, (ii) R + g(t) f 0 (t)dt <, where g(t) = max{e[ H(t)], t}. Then, a sufficient condition for Π to be weakly consistent at f 0 is that Π {h : sup 0<t T h(t) h 0 (t) } < δ > 0 for any finite T and positive δ. Note that condition (ii) is related to the asymptotic characterization of functionals of the random hazard rate (see later).
Relaxing the condition h 0 (0) > 0 For (DL) and (OU) kernels one may actually have h 0 (0) = 0. Problems then arise in establish the K-L condition since log(h 0 / h) (hence log(f 0 / f )) can become arbitrarily large around 0. In order to cover also the case h 0 (0) = 0, we resort to the short time behaviour of µ, which determines the way h vanishes in 0 for the (DL) and (OU) cases. PROPOSITION 1 Short time behaviour Weak consistency holds for (DL) and (OU) kernels also with h 0 (0) = 0 provided there exist α, r > 0 such that: (a) lim t 0 h 0 (t)/t α = 0; (b) lim inf t 0 µ((0, t])/t r = a.s. In particular (b) holds if µ is a CRM belonging to the generalized gamma family with σ (0, 1) and λ(dx) = dx. Note that the powers α and r do not need to satisfy any relation.
Uniform balls of the true hazard rate In order to derive explicit consistency results for the mixture hazard models based on the four kernels considered, we need to establish: { Π h : sup 0<t T h(t) h 0 (t) } < δ > 0. To this aim, the following result is essential. Denote by G the space of nondecreasing càdlàg functions on R + such that G(0) = 0 for any G G. Note that any CRM induces a distribution on G and that, for G 0 (x) = µ 0 ( [0, x] ), G0 G. LEMMA 1 Support of CRMs Let µ be a CRM on R +, satisfying ν(r +, dx) = for any x X, and denote by Q the distribution induced on G. Then, for any G 0 G, any finite M and η > 0, { } Q G G : sup x M G(x) G 0 (x) < η > 0. Different techniques are required for the (exp) kernel w.r.t. the other three kernels (different support of k(t, x) seen as a function of x).
PROPOSITION 2 Weak consistency for (DL), (rect) and (OU) Let µ be a generalized gamma CRM with σ (0, 1) and λ(dx) = dx. Then weak consistency holds for the (DL), (rect) and (OU) kernels provided that: (i) h 0 (t) > 0 for any t > 0; (ii) t i f 0 (t)dt < with i = 1 for (rect) and (OU), i = 2 for (DL). Note that Condition (i) is automatically satisfied if µ 0 is absolutely continuous. PROPOSITION 3 Weak consistency for (exp) Weak consistency holds for the (exp) kernel provided that: (i) h(0) < a.s. and h 0 (0) < ; (ii) t f 0 (t)dt <. The only relevant assumption consists in asking for both h and h 0 not to explode in 0. Note that h 0 (t) > 0 is automatically satisfied.
CLTs for linear and quadratic functionals CLTs provide a synthetic picture of the random hazard, like information about the trend, oscillation and overall variance for large time horizons. In Peccati and Pruenster (2006) functionals of the prior hazard rate are considered. Such results can serve as a guide for prior specification. Here we investigate the asymptotic behaviour of functionals of the posterior random hazard rate given a fixed number of observations. We found out that, in all the considered special cases, the CLTs associated with the posterior hazard rate are the same as for the prior ones, and this for any number of observations. Although consistency implies that a given model can be asymptotically directed towards any deterministic target, the overall structure of a posterior hazard rate is systematically determined by the prior choice.
Posterior distribution We exploit an explicit posterior representation of the random mixture hazard model of James (2005). Let P be the random probability measure associated with f (t) and let (Y n ) n 1 be a sequence of exchangeable observations from P. Set Y = (Y 1,..., Y n ). Define a random probability measure on X R + as P(dx, dt) = e R t R 0 X k(s,y) µ(dy) ds k(t, x) µ(dx) dt. By introducing the latent variables X = (X 1,..., X n ), we describe µ Y in terms of µ Y, X mixed over X Y. Since the X i s may feature ties, we denote by X = (X1,..., X k ) the k n distinct latent variables. Let n j be the frequency of Xj and C j = {r : x r = xj }. The likelihood function is then given by L (µ; y, x) = e R P n R yi X i=1 0 k(t,x)dtµ(dx) k j=1 µ(dx j ) n j i C j k(y i, xj )
Given X and Y, the conditional distribution of µ Y, X coincides with the distribution of the random measure µ n d = µ n + k i=1 J i δ X j = µ n + n µ n is a CRM with intensity measure ν n (dv, dx) := e v P R n yi i=1 0 k(t,x)dt ρ(dv x) λ(dx) n := k i=1 J i δ X j with, for i = 1,..., k, X i a fixed points of discontinuity with corresponding jump J i which are, conditionally on X and Y, independent of µ n. The conditional distribution X Y, say P X Y, can be expressed as the conditional distributions of X p, Y and p Y, where p is a partitions of the integers {1,..., n}. We derive CLTs for the posterior random hazard given X and Y : h n(t) = X k(t, x) µn (dx) + k i=1 J ik(t, X i ), and, then, average over the distribution of X Y.
Limit theorems for shifted measures Peccati and Pruenster (2006) provide sufficient conditions to have that linear and quadratic functionals associated with a random mixture hazard without fixed atoms verify a CLT. Linear functional. For given X and Y, define H n(t ) = T 0 h n(t)dt = H n (T ) + k i=1 J i T 0 k(t, X i )dt Assume there exists a (deterministic) function T C n (k, T ) and σn(k) 2 > 0 such that, as T +, Cn(k, 2 T ) [ s ] T 2 R + X 0 k(t, x)dt ν n (ds, dx) σn(k), 2 Cn(k, 3 T ) [ s ] T 3 R + X 0 k(t, x)dt ν n (ds, dx) 0, Then C n (k, T ) [ Hn (T ) E[ H ] Y n law (T )] N(0, σn(k)), 2 where σ 2 n(k) may depend on Y.
Assume moreover that lim C n(k, T ) k j=1 J j T + T 0 k(t, X j )dt = m n ( n, k), (1) for some quantity m n ( n, k) 0 that may depend on X. Then C n (k, T ) [ H n(t ) E[ H ] n Y law (T )], X N ( m n ( n, k), σn(k) 2 ). If (1) holds for X = (x 1,..., x n ) X n in a set of P X Y probability 1, then we can state the result as η(t ) [ ] H(T law ) τ(t ) Y Z X,Y d P X Y, X where n η(t ) = C n (k, T ); τ(t ) = E[ H n (T ) Y ]; Z X,Y is a Gaussian random variable with mean m n ( n, k) and variance σ n (k). The mixture in X n Z X,Y d P X Y is over the quantity m n ( n, k) that may change according with the realization of X.
Applications We derive CLTs for functionals of posterior hazards based on the four kernels combined with generalized gamma CRMs. In general m n ( n, k) = 0, not depending on Y, X, that is the part of the posterior with fixed points of discontinuity affects the asymptotic behaviour just in the limit. Then, the convergence is simply expressed in terms of a Gaussian random variable. LINEAR CLT 1 (OU) kernel [ ] T 1 2 H(T ) 2 κ γ (1 σ) law T Y N(0, 2γ (1 σ) ) LINEAR CLT 2 (DL) kernel [ T 3 2 H(T ) (2γ (1 σ) ) 1 T 2] ( ) law Y 1 σ N 0, 3γ 2 σ Most important: C n (k, T ) = C 0 (k, T ), E[ H(T )] E[ H(T ) Y ] and σ n (k) = σ 0 (k), that is we get the same trend, oscillation around the trend and overall variance we get with the prior cumulative hazard.
One might think that the lack of dependence on the data of posterior functionals may be due to the boundedness of the support of the (DL), (OU) and (rect) kernels, as it happens that ν n = ν in R + [Y (n), ), where Y (n) is the largest observed lifetime. Consider then the (exp) kernel. In particular, we set λ(dx)=x 1 2 e 1 x (2 π) 1 in the Poisson intensity ν(dv, dx) of generalized gamma CRM, which gives us h(0) < a.s. and a prior mean centered on a quasi Weibull hazard, E[ h(t)] = γ 1+σ / t + 1. LINEAR CLT 3 (exp) kernel [ T 1 2 H(T ) (γ 1+σ ) 1 T 1/2] ( ) law Y N 0, (2 2)(1 σ) γ 2 σ Again, the oscillation rate, the trend and the overall variance coincides with the prior ones, that is they do not depend on the observation Y.
Concluding remarks We provide a comprehensive investigation of posterior weak consistency of the mixture hazard model. L 1 consistency would be the natural successive target. The investigation of CLTs for functionals represents a completely new line of research and has interest in its own. The coincidence of asymptotic behaviour of the posterior and the prior hazard means that the data do not influence the behaviour of the model for times larger than the largest observed lifetime Y (n) ; and, this, despite the fact that these models are consistent. The fact that the overall variance is not influenced by the data is somehow counterintuitive: since the contribution of the CRM vanishes in the limit, one would expect the variance to become smaller and smaller as more data come in. Since this does not happen, our findings show a clear evidence that the choice of the CRM really matters whatever the size of the dataset and provide a guideline in incorporating prior knowledge appropriately into the model.
Appendix For Further Reading Brix, A. (1999) Generalized gamma measures and shot-noise Cox processes. Adv. Appl. Prob. 31, 929 953. Dykstra, R.L. and Laud, P. (1981) A Bayesian nonparametric approach to reliability. Ann. Statist. 9, 356 367. James, L.F. (2005) Bayesian Poisson process partition calculus with an application to Bayesian Lévy moving averages. Ann. Statist. 33, 1771 1799. Lo, A.Y. and Weng, C.S. (1989) On a class of Bayesian nonparametric estimates. II. Hazard rate estimates. Ann. Inst. Statist. Math. 41, 227 245. Peccati, G. and Prünster, I. (2006) Linear and quadratic functionals of random hazard rates: an asymptotic analysis. Preprint.