Asymptotics for posterior hazards

Similar documents
Asymptotics for posterior hazards

On the posterior structure of NRMI

Truncation error of a superposed gamma process in a decreasing order representation

Bayesian semiparametric analysis of short- and long- term hazard ratios with covariates

Truncation error of a superposed gamma process in a decreasing order representation

Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals

Bayesian Nonparametrics: some contributions to construction and properties of prior distributions

On the Truncation Error of a Superposed Gamma Process

Dependent hierarchical processes for multi armed bandits

Poisson random measure: motivation

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

Bayesian estimation of the discrepancy with misspecified parametric models

Covariance function estimation in Gaussian process regression

arxiv: v2 [math.st] 27 May 2014

Normalized kernel-weighted random measures

STAT Sample Problem: General Asymptotic Results

Bayesian nonparametric models for bipartite graphs

Math 341: Probability Seventeenth Lecture (11/10/09)

Construction of an Informative Hierarchical Prior Distribution: Application to Electricity Load Forecasting

Filtrations, Markov Processes and Martingales. Lectures on Lévy Processes and Stochastic Calculus, Braunschweig, Lecture 3: The Lévy-Itô Decomposition

Step-Stress Models and Associated Inference

Full Bayesian inference with hazard mixture models

Multiple Random Variables

Introduction to self-similar growth-fragmentations

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t )

NEW FUNCTIONAL INEQUALITIES

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On pathwise stochastic integration

Regular Variation and Extreme Events for Stochastic Processes

Lecture Characterization of Infinitely Divisible Distributions

Bayesian Modeling of Conditional Distributions

Convergence of Feller Processes

Densities for the Navier Stokes equations with noise

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Asymptotic properties of maximum likelihood estimator for the growth rate for a jump-type CIR process

Statistics: Learning models from data

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

Two viewpoints on measure valued processes

Statistical inference on Lévy processes

Appendix A. Sequences and series. A.1 Sequences. Definition A.1 A sequence is a function N R.

Probability and Measure

Gaussian processes for inference in stochastic differential equations

S1. Proofs of the Key Properties of CIATA-Ph

New Dirichlet Mean Identities

Foundations of Nonparametric Bayesian Methods

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Translation Invariant Experiments with Independent Increments

The information complexity of sequential resource allocation

Consistency of the maximum likelihood estimator for general hidden Markov models

CS 540: Machine Learning Lecture 2: Review of Probability & Statistics

GARCH processes continuous counterparts (Part 2)

Order book resilience, price manipulation, and the positive portfolio problem

1. Stochastic Processes and filtrations

MATH 5640: Fourier Series

Math 430/530: Advanced Calculus I Chapter 1: Sequences

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Random Process Lecture 1. Fundamentals of Probability

Nonlinear representation, backward SDEs, and application to the Principal-Agent problem

Dependent mixture models: clustering and borrowing information

Frontier estimation based on extreme risk measures

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

Nonparametric regression with martingale increment errors

Evgeny Spodarev WIAS, Berlin. Limit theorems for excursion sets of stationary random fields

Discussion of On simulation and properties of the stable law by L. Devroye and L. James

The strictly 1/2-stable example

Harnack Inequalities and Applications for Stochastic Equations

Estimation of the long Memory parameter using an Infinite Source Poisson model applied to transmission rate measurements

Slice sampling σ stable Poisson Kingman mixture models

Computer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression

Weak quenched limiting distributions of a one-dimensional random walk in a random environment

The parabolic Anderson model on Z d with time-dependent potential: Frank s works

ON ADDITIVE TIME-CHANGES OF FELLER PROCESSES. 1. Introduction

Mathematical Institute, University of Utrecht. The problem of estimating the mean of an observed Gaussian innite-dimensional vector

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression

Cumulants of a convolution and applications to monotone probability theory

Temporal point processes: the conditional intensity function

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

University of Toronto Department of Statistics

On some distributional properties of Gibbs-type priors

Statistics (1): Estimation

On oscillating systems of interacting neurons

In terms of measures: Exercise 1. Existence of a Gaussian process: Theorem 2. Remark 3.

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Bayesian Nonparametrics

P-adic Functions - Part 1

Information geometry for bivariate distribution control

Nonparametric estimation of log-concave densities

The information complexity of best-arm identification

Understanding Regressions with Observations Collected at High Frequency over Long Span

STOCHASTIC GEOMETRY BIOIMAGING

LAN property for ergodic jump-diffusion processes with discrete observations

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( )

Log Gaussian Cox Processes. Chi Group Meeting February 23, 2016

Large Deviations for Small-Noise Stochastic Differential Equations

The small ball property in Banach spaces (quantitative results)

Malliavin calculus and central limit theorems

Large Deviations for Small-Noise Stochastic Differential Equations

Advanced Calculus Math 127B, Winter 2005 Solutions: Final. nx2 1 + n 2 x, g n(x) = n2 x

Maximum likelihood estimation of a log-concave density based on censored data

Transcription:

Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and Igor Prünster (University of Turin)

Outline Life-testing model with mixture hazard rate Mixture hazard rate Kernels and completely random measures Asymptotic issues Posterior consistency Weak consistency Sufficient condition for Kullback-Leibler support Prior probability of uniform balls of the true hazard rate CLTs for functionals of the posterior hazard rate Posterior distribution Limit theorems for shifted measures Applications

Mixture hazard rate Let Y be a positive absolutely continuous random variable (lifetime) with random hazard rate of the form h(t) = k(t, x) µ(dx), X k is a kernel, jointly measurable application R + X R +, µ is a random mixing measure on the space X, modeled as a completely random measure. The cumulative hazard is given by H(t) = t h(s)ds and is required to 0 satisfy H(t) for t a.s. Then f (t) = h(t) exp( H(t)) defines a random density function to be used in Bayesian nonparametric inference. See Dykstra and Laud (1981) and Lo and Weng (1989).

Kernels As examples,we consider the class of mixture hazard rates that arise under the following choice of kernels: Dykstra-Laud (DL) kernel (monotone increasing hazard rates) k(t, x) = I (0 x t) rectangular (rect) kernel with bandwidth τ > 0 k(t, x) = I ( t x τ) Ornstein-Uhlenbeck (OU) kernel with κ > 0 k(t, x) = 2κ e κ(t x) I (0 x t) exponential (exp) kernel (monotone decreasing hazard rates) k(t, x) = x 1 e t x.

Completely random measures Let M be the space of boundedly finite measures on X. Definition. A random element µ taking values in M such that (i) µ( ) = 0 (ii) for any disjoint sets (A i ) i 1 : µ(a 1 ), µ(a 2 ),... are mutually independent µ( j 1 A j ) = j 1 µ(a j) is said to be a completely random measure (CRM) on X. Remark. A CRM can always be represented as a linear functional of a Poisson random measure. In particular, µ is uniquely characterized by its Laplace functional [ E e R X g(x) µ(dx)] R = e R + X [1 e v g(x) ]ν(dv,dx) where ν stands for the intensity of the Poisson random measure underlying µ.

CRMs select almost surely discrete measures: hence, µ can always be represented as i 1 J iδ Xi. Note that if ν(dv, dx) = ρ(dv)λ(dx) the law of the J i s and the X i s are independent. In this case µ is termed homogeneous. A non-homogeneous CRM has ν(dv, dx) = ρ(dv x)λ(dx). Non-homogeneous CRMs arise in the posterior. If ν(r +, dx) = for any x, then the CRM jumps infinitely often on any bounded set A. However, recall that µ(a) < a.s. We consider CRMs with Poisson intensity of the following form: ν(dv, dx) = 1 Γ(1 σ) e γ(x)v dv λ(dx), σ [0, 1), γ( ) > 0. v 1+σ γ constant generalized gamma measure (Brix, 1999) σ = 0 weighted gamma measure (Lo & Weng, 1989).

Asymptotic issues Posterior consistency. First generate independent data from a true fixed density f 0, then check whether the sequence of posterior distributions of f accumulates in some suitable neighborhood of f 0. Which conditions on f 0 and µ are needed under the four kernels considered in order to achieve consistency? CLTs for functionals of the posterior hazard. For a fixed T > 0, we consider random objects like (i) H(T ), (ii) T 1 T 0 h(t) 2 dt (path-second moment) and (iii) T 1 T 0 [ h(t) H(T )/T ] 2 dt (path-variance). Let Y = (Y 1,..., Y n ) be a set of observations. We are interested in establishing Central Limit Theorems of the type η(t ) [ H(T ) τ(t )] N(0, σ 2 ) as T + Y law for appropriate positive functions τ(t ) and η(t ) and variance σ 2. How are τ( ), η( ) and σ 2 influenced by the observed data?

Weak consistency Denote by P 0 the probability distribution associated with f 0 and by P 0 the infinite product measure. Also, let Π be the prior distribution of the random density function f and Π n the posterior distribution. Definition. If, for any ɛ > 0 and as n, Π n (A ɛ (f 0 )) 1 a.s. P 0, where A ɛ (f 0 ) is a ɛ neighborhood of f 0 in the weak topology, then Π is said to be weak consistent at f 0. Remark. A sufficient condition for weak consistency requires a prior Π to assign positive probability to Kullback Leibler (K-L) neighborhoods of f 0 ( K-L condition): Π ( f F : log(f 0 /f )f 0 < ɛ ) > 0 for any ɛ > 0, where log(f 0 /f )f 0 denotes the K-L divergence between f 0 and f, whereas F is the space of density functions defined on R +.

General consistency result Assume X = R + and postulate the existence of a boundedly finite measure µ 0 on R + such that the true density f 0 has hazard h 0 (t) = R + k(t, x)µ 0 (dx) where k is the kernel used for constructing f via the mixture hazard h. We translate the K-L condition w.r.t. f 0 into a condition of positive prior probability assigned to uniform neighborhoods of h 0 on [0, T ]. THEOREM Weak consistency Assume (i) h 0 (t) > 0 for any t 0, (ii) R + g(t) f 0 (t)dt <, where g(t) = max{e[ H(t)], t}. Then, a sufficient condition for Π to be weakly consistent at f 0 is that Π {h : sup 0<t T h(t) h 0 (t) } < δ > 0 for any finite T and positive δ. Note that condition (ii) is related to the asymptotic characterization of functionals of the random hazard rate (see later).

Relaxing the condition h 0 (0) > 0 For (DL) and (OU) kernels one may actually have h 0 (0) = 0. Problems then arise in establish the K-L condition since log(h 0 / h) (hence log(f 0 / f )) can become arbitrarily large around 0. In order to cover also the case h 0 (0) = 0, we resort to the short time behaviour of µ, which determines the way h vanishes in 0 for the (DL) and (OU) cases. PROPOSITION 1 Short time behaviour Weak consistency holds for (DL) and (OU) kernels also with h 0 (0) = 0 provided there exist α, r > 0 such that: (a) lim t 0 h 0 (t)/t α = 0; (b) lim inf t 0 µ((0, t])/t r = a.s. In particular (b) holds if µ is a CRM belonging to the generalized gamma family with σ (0, 1) and λ(dx) = dx. Note that the powers α and r do not need to satisfy any relation.

Uniform balls of the true hazard rate In order to derive explicit consistency results for the mixture hazard models based on the four kernels considered, we need to establish: { Π h : sup 0<t T h(t) h 0 (t) } < δ > 0. To this aim, the following result is essential. Denote by G the space of nondecreasing càdlàg functions on R + such that G(0) = 0 for any G G. Note that any CRM induces a distribution on G and that, for G 0 (x) = µ 0 ( [0, x] ), G0 G. LEMMA 1 Support of CRMs Let µ be a CRM on R +, satisfying ν(r +, dx) = for any x X, and denote by Q the distribution induced on G. Then, for any G 0 G, any finite M and η > 0, { } Q G G : sup x M G(x) G 0 (x) < η > 0. Different techniques are required for the (exp) kernel w.r.t. the other three kernels (different support of k(t, x) seen as a function of x).

PROPOSITION 2 Weak consistency for (DL), (rect) and (OU) Let µ be a generalized gamma CRM with σ (0, 1) and λ(dx) = dx. Then weak consistency holds for the (DL), (rect) and (OU) kernels provided that: (i) h 0 (t) > 0 for any t > 0; (ii) t i f 0 (t)dt < with i = 1 for (rect) and (OU), i = 2 for (DL). Note that Condition (i) is automatically satisfied if µ 0 is absolutely continuous. PROPOSITION 3 Weak consistency for (exp) Weak consistency holds for the (exp) kernel provided that: (i) h(0) < a.s. and h 0 (0) < ; (ii) t f 0 (t)dt <. The only relevant assumption consists in asking for both h and h 0 not to explode in 0. Note that h 0 (t) > 0 is automatically satisfied.

CLTs for linear and quadratic functionals CLTs provide a synthetic picture of the random hazard, like information about the trend, oscillation and overall variance for large time horizons. In Peccati and Pruenster (2006) functionals of the prior hazard rate are considered. Such results can serve as a guide for prior specification. Here we investigate the asymptotic behaviour of functionals of the posterior random hazard rate given a fixed number of observations. We found out that, in all the considered special cases, the CLTs associated with the posterior hazard rate are the same as for the prior ones, and this for any number of observations. Although consistency implies that a given model can be asymptotically directed towards any deterministic target, the overall structure of a posterior hazard rate is systematically determined by the prior choice.

Posterior distribution We exploit an explicit posterior representation of the random mixture hazard model of James (2005). Let P be the random probability measure associated with f (t) and let (Y n ) n 1 be a sequence of exchangeable observations from P. Set Y = (Y 1,..., Y n ). Define a random probability measure on X R + as P(dx, dt) = e R t R 0 X k(s,y) µ(dy) ds k(t, x) µ(dx) dt. By introducing the latent variables X = (X 1,..., X n ), we describe µ Y in terms of µ Y, X mixed over X Y. Since the X i s may feature ties, we denote by X = (X1,..., X k ) the k n distinct latent variables. Let n j be the frequency of Xj and C j = {r : x r = xj }. The likelihood function is then given by L (µ; y, x) = e R P n R yi X i=1 0 k(t,x)dtµ(dx) k j=1 µ(dx j ) n j i C j k(y i, xj )

Given X and Y, the conditional distribution of µ Y, X coincides with the distribution of the random measure µ n d = µ n + k i=1 J i δ X j = µ n + n µ n is a CRM with intensity measure ν n (dv, dx) := e v P R n yi i=1 0 k(t,x)dt ρ(dv x) λ(dx) n := k i=1 J i δ X j with, for i = 1,..., k, X i a fixed points of discontinuity with corresponding jump J i which are, conditionally on X and Y, independent of µ n. The conditional distribution X Y, say P X Y, can be expressed as the conditional distributions of X p, Y and p Y, where p is a partitions of the integers {1,..., n}. We derive CLTs for the posterior random hazard given X and Y : h n(t) = X k(t, x) µn (dx) + k i=1 J ik(t, X i ), and, then, average over the distribution of X Y.

Limit theorems for shifted measures Peccati and Pruenster (2006) provide sufficient conditions to have that linear and quadratic functionals associated with a random mixture hazard without fixed atoms verify a CLT. Linear functional. For given X and Y, define H n(t ) = T 0 h n(t)dt = H n (T ) + k i=1 J i T 0 k(t, X i )dt Assume there exists a (deterministic) function T C n (k, T ) and σn(k) 2 > 0 such that, as T +, Cn(k, 2 T ) [ s ] T 2 R + X 0 k(t, x)dt ν n (ds, dx) σn(k), 2 Cn(k, 3 T ) [ s ] T 3 R + X 0 k(t, x)dt ν n (ds, dx) 0, Then C n (k, T ) [ Hn (T ) E[ H ] Y n law (T )] N(0, σn(k)), 2 where σ 2 n(k) may depend on Y.

Assume moreover that lim C n(k, T ) k j=1 J j T + T 0 k(t, X j )dt = m n ( n, k), (1) for some quantity m n ( n, k) 0 that may depend on X. Then C n (k, T ) [ H n(t ) E[ H ] n Y law (T )], X N ( m n ( n, k), σn(k) 2 ). If (1) holds for X = (x 1,..., x n ) X n in a set of P X Y probability 1, then we can state the result as η(t ) [ ] H(T law ) τ(t ) Y Z X,Y d P X Y, X where n η(t ) = C n (k, T ); τ(t ) = E[ H n (T ) Y ]; Z X,Y is a Gaussian random variable with mean m n ( n, k) and variance σ n (k). The mixture in X n Z X,Y d P X Y is over the quantity m n ( n, k) that may change according with the realization of X.

Applications We derive CLTs for functionals of posterior hazards based on the four kernels combined with generalized gamma CRMs. In general m n ( n, k) = 0, not depending on Y, X, that is the part of the posterior with fixed points of discontinuity affects the asymptotic behaviour just in the limit. Then, the convergence is simply expressed in terms of a Gaussian random variable. LINEAR CLT 1 (OU) kernel [ ] T 1 2 H(T ) 2 κ γ (1 σ) law T Y N(0, 2γ (1 σ) ) LINEAR CLT 2 (DL) kernel [ T 3 2 H(T ) (2γ (1 σ) ) 1 T 2] ( ) law Y 1 σ N 0, 3γ 2 σ Most important: C n (k, T ) = C 0 (k, T ), E[ H(T )] E[ H(T ) Y ] and σ n (k) = σ 0 (k), that is we get the same trend, oscillation around the trend and overall variance we get with the prior cumulative hazard.

One might think that the lack of dependence on the data of posterior functionals may be due to the boundedness of the support of the (DL), (OU) and (rect) kernels, as it happens that ν n = ν in R + [Y (n), ), where Y (n) is the largest observed lifetime. Consider then the (exp) kernel. In particular, we set λ(dx)=x 1 2 e 1 x (2 π) 1 in the Poisson intensity ν(dv, dx) of generalized gamma CRM, which gives us h(0) < a.s. and a prior mean centered on a quasi Weibull hazard, E[ h(t)] = γ 1+σ / t + 1. LINEAR CLT 3 (exp) kernel [ T 1 2 H(T ) (γ 1+σ ) 1 T 1/2] ( ) law Y N 0, (2 2)(1 σ) γ 2 σ Again, the oscillation rate, the trend and the overall variance coincides with the prior ones, that is they do not depend on the observation Y.

Concluding remarks We provide a comprehensive investigation of posterior weak consistency of the mixture hazard model. L 1 consistency would be the natural successive target. The investigation of CLTs for functionals represents a completely new line of research and has interest in its own. The coincidence of asymptotic behaviour of the posterior and the prior hazard means that the data do not influence the behaviour of the model for times larger than the largest observed lifetime Y (n) ; and, this, despite the fact that these models are consistent. The fact that the overall variance is not influenced by the data is somehow counterintuitive: since the contribution of the CRM vanishes in the limit, one would expect the variance to become smaller and smaller as more data come in. Since this does not happen, our findings show a clear evidence that the choice of the CRM really matters whatever the size of the dataset and provide a guideline in incorporating prior knowledge appropriately into the model.

Appendix For Further Reading Brix, A. (1999) Generalized gamma measures and shot-noise Cox processes. Adv. Appl. Prob. 31, 929 953. Dykstra, R.L. and Laud, P. (1981) A Bayesian nonparametric approach to reliability. Ann. Statist. 9, 356 367. James, L.F. (2005) Bayesian Poisson process partition calculus with an application to Bayesian Lévy moving averages. Ann. Statist. 33, 1771 1799. Lo, A.Y. and Weng, C.S. (1989) On a class of Bayesian nonparametric estimates. II. Hazard rate estimates. Ann. Inst. Statist. Math. 41, 227 245. Peccati, G. and Prünster, I. (2006) Linear and quadratic functionals of random hazard rates: an asymptotic analysis. Preprint.