STAT331. Example of Martingale CLT with Cox's Model


In this unit we illustrate the Martingale Central Limit Theorem by applying it to the partial likelihood score function from Cox's model. For simplicity of presentation we assume that we have a scalar covariate Z; however, the same methods apply for vector Z.

Suppose that Z denotes a scalar covariate and assume that the hazard function for someone with covariate value Z is

    h(t | Z) = λ_0(t) e^{βZ}.

The observations consist of triplets (U_i, δ_i, Z_i), for i = 1, 2, ..., n, arising from i.i.d. (T_i, C_i, Z_i). As usual, we assume that censoring is noninformative (T_i ⊥ C_i | Z_i). Let

    U_n = n^{-1/2} ∂ ll_n(β) / ∂β,

where ll_n = log L_n and L_n is Cox's partial likelihood. Recall that if L_n were a real likelihood function based on n i.i.d. observations, then under some regularity conditions we would have that

    n^{-1/2} ∂ ll_n(β) / ∂β = n^{1/2} · ( n^{-1} ∂ ll_n / ∂β ) →^D N(0, v),

since the bracketed term is an average of n i.i.d. zero-mean random variables (say, with some variance v). This is why we standardize ∂ ll_n(β) / ∂β by n^{-1/2}.

Returning to Cox's partial likelihood, we indicated in Unit 2 that we can express U_n as

    U_n = ΣU_n(t) |_{t=∞},

where

    ΣU_n(t) = n^{-1/2} Σ_{i=1}^n ∫_0^t ( Z_i − [Σ_{l=1}^n Z_l e^{βZ_l} Y_l(s)] / [Σ_{l=1}^n e^{βZ_l} Y_l(s)] ) dM_i(s),

and where N_i(t) = 1[U_i ≤ t, δ_i = 1], Y_i(t) = 1(U_i ≥ t), M_i(t) = N_i(t) − A_i(t), and

    A_i(t) = ∫_0^t λ_0(s) e^{βZ_i} Y_i(s) ds.

Letting

    H_i(s) def= n^{-1/2} ( Z_i − [Σ_{l=1}^n Z_l e^{βZ_l} Y_l(s)] / [Σ_{l=1}^n e^{βZ_l} Y_l(s)] ),

we can thus write

    ΣU_n(t) = Σ_{i=1}^n ∫_0^t H_i(s) dM_i(s).

Setting this equal to zero is the same as setting Σ_{i=1}^n ∫ H_i(s) dM_i(s) to zero.

Note that A_i(·) is continuous because T_i is continuous, so that λ_0 is continuous. To apply the martingale CLT, we need H_i(·) to be predictable and locally bounded. It is clear that H_i(·) is predictable (it is left continuous and adapted). One way to ensure that it is locally bounded is to assume that Z is bounded (Exercise 8 of Unit 2). Also, because the survival times T_1, T_2, ..., T_n for the n subjects are assumed to be independent and continuously distributed, they cannot jump at the same time, and so the M_i(·) are orthogonal. Thus, the setting of the martingale central limit theorem, (3.3)-(3.5), has been established.
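To make the score concrete, here is a minimal simulation sketch (not part of the original notes) that computes the partial likelihood score Σ_i ∫ H_i dM_i directly from data: at each observed event time, it adds Z_i minus the risk-set weighted covariate average. The choices λ_0 ≡ 1 (exponential baseline) and exponential censoring are our simulation assumptions, made only so the data can be generated.

```python
import numpy as np

rng = np.random.default_rng(0)

def cox_score(U, delta, Z, beta):
    """Partial-likelihood score in beta for scalar Z: the sum over observed
    events i of Z_i minus the weighted average of Z over the risk set at U_i,
    with weights e^{beta Z_l} Y_l(U_i) and Y_l(t) = 1(U_l >= t)."""
    w = np.exp(beta * Z)
    score = 0.0
    for i in np.flatnonzero(delta):
        at_risk = U >= U[i]                      # Y_l(U_i)
        score += Z[i] - (Z[at_risk] * w[at_risk]).sum() / w[at_risk].sum()
    return score

# Simulated data from h(t | Z) = lambda_0(t) e^{beta Z} with lambda_0 = 1 (our choice).
n, beta = 2000, 0.5
Z = rng.binomial(1, 0.5, size=n).astype(float)
T = rng.exponential(1.0 / np.exp(beta * Z))      # event times with hazard e^{beta Z}
C = rng.exponential(2.0, size=n)                 # independent censoring (our choice)
U = np.minimum(T, C)
delta = (T <= C).astype(int)

U_n = cox_score(U, delta, Z, beta) / np.sqrt(n)  # standardized score at the true beta
print(U_n)
```

At the true β the standardized score should be an O(1) zero-mean quantity; the score is also strictly decreasing in β, reflecting the concavity of the log partial likelihood.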

We now evaluate conditions (a) and (b) of the Martingale CLT.

(a) Does <ΣU_n, ΣU_n>(t) →^P α(t) (some deterministic function) as n → ∞?

Because A_i(·) is continuous and M_i and M_j are orthogonal for i ≠ j,

    <ΣU_n, ΣU_n>(t) = Σ_{i=1}^n Σ_{j=1}^n ∫_0^t H_i(s) H_j(s) d<M_i, M_j>(s)
                    = Σ_{i=1}^n ∫_0^t H_i^2(s) dA_i(s)
                    = n^{-1} Σ_{i=1}^n ∫_0^t (Z_i − Q_n(s))^2 λ_0(s) e^{βZ_i} Y_i(s) ds
                    = ∫_0^t λ_0(s) ( n^{-1} Σ_{i=1}^n (Z_i − Q_n(s))^2 e^{βZ_i} Y_i(s) ) ds,

where

    Q_n(s) = [Σ_{l=1}^n Z_l e^{βZ_l} Y_l(s)] / [Σ_{l=1}^n e^{βZ_l} Y_l(s)].

Consider the term in large brackets for fixed s. This has the same probability limit as

    n^{-1} Σ_{i=1}^n (Z_i − µ(s))^2 e^{βZ_i} Y_i(s),    (4.1)

where

    µ(s) = E( Z_l e^{βZ_l} Y_l(s) ) / E( e^{βZ_l} Y_l(s) ).

To see this, note that their difference equals

    n^{-1} Σ_{i=1}^n (Z_i − Q_n(s))^2 e^{βZ_i} Y_i(s) − n^{-1} Σ_{i=1}^n (Z_i − µ(s))^2 e^{βZ_i} Y_i(s)
        = n^{-1} Σ_{i=1}^n Y_i(s) e^{βZ_i} ( Q_n^2(s) − µ^2(s) + 2 Z_i (µ(s) − Q_n(s)) )

        = (Q_n^2(s) − µ^2(s)) n^{-1} Σ_{i=1}^n Y_i(s) e^{βZ_i} − 2 (Q_n(s) − µ(s)) n^{-1} Σ_{i=1}^n Y_i(s) e^{βZ_i} Z_i.    (4.2)

By the Law of Large Numbers,

    n^{-1} Σ_{i=1}^n Y_i(s) e^{βZ_i} →^P E( Y_i(s) e^{βZ_i} )

and

    n^{-1} Σ_{i=1}^n Y_i(s) e^{βZ_i} Z_i →^P E( Y_i(s) e^{βZ_i} Z_i ).

Since Q_n(s) converges in probability to µ(s) (Law of Large Numbers and Slutsky), Q_n^2(s) − µ^2(s) converges in probability to zero (Continuous Mapping Theorem), and hence (4.2) converges to zero (repeatedly apply Slutsky's lemma).

Now let's consider (4.1) for any arbitrary s. Since it is an average of n i.i.d. random variables, it converges in probability to

    m(s) def= E( (Z_i − µ(s))^2 e^{βZ_i} Y_i(s) ).

It follows from this that

    <ΣU_n, ΣU_n>(t) = ∫_0^t λ_0(s) ( n^{-1} Σ_{i=1}^n (Z_i − Q_n(s))^2 e^{βZ_i} Y_i(s) ) ds →^P ∫_0^t λ_0(s) m(s) ds.    (4.3)

That is, if the limit of the integral is the integral of the limit (we will show this more rigorously in Unit 5). Thus, in the notation of the Martingale CLT,

    α(t) = ∫_0^t λ_0(s) m(s) ds.
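The convergence Q_n(s) →^P µ(s) is easy to see empirically. The following sketch (our illustration, not part of the notes) draws samples of increasing size and evaluates the risk-set weighted covariate average at a fixed time s; the baseline hazard λ_0 ≡ 1, the normal covariate, and the exponential censoring are all our simulation assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, s = 0.7, 0.5                                 # evaluate Q_n at a fixed time s

def Q_n(n):
    """Q_n(s) = sum_l Z_l e^{beta Z_l} Y_l(s) / sum_l e^{beta Z_l} Y_l(s)."""
    Z = rng.normal(0.0, 1.0, n)
    T = rng.exponential(1.0 / np.exp(beta * Z))    # h(t|Z) = e^{beta Z}, lambda_0 = 1 (our choice)
    C = rng.exponential(1.0, n)                    # independent censoring (our choice)
    U = np.minimum(T, C)
    w = np.exp(beta * Z) * (U >= s)                # e^{beta Z_l} Y_l(s)
    return (Z * w).sum() / w.sum()

for n in (100, 10_000, 1_000_000):
    print(n, Q_n(n))                               # fluctuations shrink as n grows

q1, q2 = Q_n(1_000_000), Q_n(1_000_000)            # two independent large-n draws
```

Two independent draws at n = 10^6 should nearly coincide, since both are close to the deterministic limit µ(s).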

Now consider condition (b) of the Martingale Central Limit Theorem: Does <ΣU_{n,ϵ}, ΣU_{n,ϵ}>(t) →^P 0 for all ϵ > 0? We have

    <ΣU_{n,ϵ}, ΣU_{n,ϵ}>(t) = Σ_{i=1}^n ∫_0^t H_i^2(s) 1[ |H_i(s)| ≥ ϵ ] dA_i(s)
        = ∫_0^t λ_0(s) { n^{-1} Σ_{i=1}^n ( Z_i − [Σ_{j=1}^n Z_j e^{βZ_j} Y_j(s)] / [Σ_{j=1}^n e^{βZ_j} Y_j(s)] )^2
              · 1[ | Z_i − [Σ_{j=1}^n Z_j e^{βZ_j} Y_j(s)] / [Σ_{j=1}^n e^{βZ_j} Y_j(s)] | ≥ n^{1/2} ϵ ] e^{βZ_i} Y_i(s) } ds.

Note that the limit, in probability, of

    1[ | Z_i − [Σ_{j=1}^n Z_j e^{βZ_j} Y_j(s)] / [Σ_{j=1}^n e^{βZ_j} Y_j(s)] | ≥ n^{1/2} ϵ ]

is zero because the term in absolute value is bounded (still assuming Z is bounded) whereas n^{1/2} ϵ → ∞ (Exercise 1). It follows that the entire integrand in the above integral converges to zero in probability, and hence that <ΣU_{n,ϵ}, ΣU_{n,ϵ}>(t) →^P 0 for every t and ϵ as n → ∞ (implicit in this statement is that the limit of the integral equals the integral of the limit; this requires some assumptions that we will return to in Unit 5). Thus, the conditions of the Martingale CLT hold, so that

    ΣU_n(·) →^w Q(·), as n → ∞,

where Q(·) is a zero-mean Gaussian process with independent increments and var(Q(t)) = α(t). By taking t = ∞, it follows that

    U_n = ΣU_n(∞) →^D N(0, σ^2) as n → ∞,

where σ^2 = α(∞) = ∫_0^∞ λ_0(s) m(s) ds.

Note: it is not clear why we can evaluate these processes at infinity. Usually, one assumes that the support of U does not extend beyond some time τ, so that the process evaluated at time τ is the same as at infinity.

The expression (4.3) for α(t) cannot be simplified much more without being more specific about Z and the censoring distribution. For example, consider the 2-sample problem, where

    Z_i = 1 for group 1,  Z_i = 0 for group 0,

and suppose that P(Z_i = 1) = 1/2, and that the distribution of C_i does not depend on (T_i, Z_i) and has c.d.f. G(·). Then it is not difficult to show that under H_0: β = 0,

    E( Z_l e^{βZ_l} Y_l(s) ) = (1/2) (1 − F(s)) (1 − G(s))

and

    E( e^{βZ_l} Y_l(s) ) = (1 − F(s)) (1 − G(s)),

where

    F(t) = 1 − e^{−∫_0^t λ_0(s) ds}.

Thus, µ(s) = 1/2. It can also easily be shown that, under H_0,

    m(s) = E( (Z_i − 1/2)^2 Y_i(s) ) = (1/4) (1 − F(s)) (1 − G(s)).

Thus

    α(t) = (1/4) ∫_0^t λ_0(s) (1 − F(s)) (1 − G(s)) ds = (1/4) ∫_0^t f(s) (1 − G(s)) ds,

where f(s) = λ_0(s) (1 − F(s)). When t = ∞,

    σ^2 = α(∞) = (1/4) ∫_0^∞ f(s) (1 − G(s)) ds = p/4,

where p = Pr(δ_i = 1).
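The identity σ^2 = p/4 in the 2-sample problem can be checked by simulation. The sketch below (ours, for illustration) repeatedly draws 2-sample data under H_0: β = 0 with Exp(1) event and censoring times (our choices, which make p = P(T ≤ C) = 1/2), computes the standardized score U_n for each replicate, and compares its empirical variance to p/4.

```python
import numpy as np

rng = np.random.default_rng(2)

def score_beta0(U, delta, Z):
    """Partial-likelihood score at beta = 0: sum over observed events of
    Z_i minus the plain average of Z over the risk set at U_i."""
    s = 0.0
    for i in np.flatnonzero(delta):
        s += Z[i] - Z[U >= U[i]].mean()
    return s

n, reps = 400, 200
vals, event_frac = [], []
for _ in range(reps):
    Z = rng.binomial(1, 0.5, n).astype(float)      # two groups, P(Z_i = 1) = 1/2
    T = rng.exponential(1.0, n)                    # beta = 0: same Exp(1) hazard in both groups
    C = rng.exponential(1.0, n)                    # censoring c.d.f. G, here Exp(1) (our choice)
    U, d = np.minimum(T, C), (T <= C)
    vals.append(score_beta0(U, d, Z) / np.sqrt(n)) # U_n for this replicate
    event_frac.append(d.mean())

p = float(np.mean(event_frac))                     # estimate of Pr(delta_i = 1); here p = 1/2
emp_var = float(np.var(vals))
mean_val = float(np.mean(vals))
print(mean_val, emp_var, p / 4)                    # mean ~ 0; empirical Var(U_n) vs sigma^2 = p/4
```

With these choices σ^2 = p/4 = 1/8, so the empirical variance of U_n should land near 0.125.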

Hence we have shown that under H_0: β = 0, U_n = n^{-1/2} ∂ ll_n(β)/∂β converges in distribution to a normal distribution with mean zero and variance σ^2. Therefore, the partial likelihood score test of H_0, obtained by evaluating this at β = 0 and standardizing it by σ, is asymptotically N(0, 1) under H_0. We later use very similar techniques to find the non-null distribution of Cox's partial likelihood score test when β ≠ 0.

Note: this result shows that the logrank test is asymptotically N(0, 1) under the null: recall that it can be viewed as arising from a Cox proportional hazards model as a score test!

Brief Review of Some Key Martingale Theory Results

X(·) is a martingale if it is adapted to {F_t} and:
- X(·) is right continuous with left-hand limits,
- E|X(t)| < ∞ for all t,
- E[X(t + s) | F_t] a.s.= X(t) for all s, t ≥ 0.

X(·) is a sub-martingale if the "=" is replaced by "≥". If X(·) is a martingale, then E(X(t)) is constant in t; without loss of generality we can take E(X(t)) = 0.

N(·) is a counting process if
- N(0) = 0 and N(t) < ∞,
- N(·) is right-continuous,
- N(·) is a step function with jumps of size +1.

H(·) is a predictable process if its value at t is determined by F_{t−} (e.g., left continuous and adapted).

Doob-Meyer decomposition: If X(·) is a non-negative sub-martingale, there exists a right-continuous, non-decreasing, predictable process A(·) s.t. E(A(t)) < ∞ for all t, and M(·) = X(·) − A(·) is a martingale. A(·) is called the compensator for X(·). If A(0) a.s.= 0, A(·) is almost surely unique.

Any counting process N(·) such that E(N(t)) < ∞ is a sub-martingale, so there exists A(·) such that N(·) − A(·) is a martingale.

If M(·) is any martingale and E M^2(t) < ∞, M^2(·) is a sub-martingale, so there exists a predictable process <M, M>(·) such that M^2(·) − <M, M>(·) is a martingale. The process <M, M>(·) is called the predictable quadratic variation. Note: If E(M(t)) = 0, var(M(t)) = E(M^2(t)) = E(<M, M>(t)).

If the zero-mean martingale M = N − A satisfies E(M^2(t)) < ∞ and A(·) is continuous, then <M, M>(·) a.s.= A(·). Thus, Var(M(t)) = E(A(t)).
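The last fact, Var(M(t)) = E(A(t)) for M = N − A with continuous compensator, can be seen in the simplest case: for a rate-λ Poisson process the compensator is A(t) = λt. A minimal simulation sketch (ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# N(t) for a rate-lam Poisson process has continuous compensator A(t) = lam * t,
# so M(t) = N(t) - lam * t is a zero-mean martingale with Var(M(t)) = E(A(t)) = lam * t.
lam, t, reps = 2.0, 3.0, 100_000
N_t = rng.poisson(lam * t, size=reps)              # N(t) ~ Poisson(lam * t)
M_t = N_t - lam * t                                # M(t) = N(t) - A(t)
m_mean, m_var = float(M_t.mean()), float(M_t.var())
print(m_mean, m_var)                               # ~0 and ~lam * t = 6
```

The empirical mean of M(t) should be near 0 and its empirical variance near λt = 6.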

Next suppose that M_1(·), M_2(·) are zero-mean martingales defined on the same filtration. Then a M_1(·) + b M_2(·) is a martingale (for any constants a, b). If E(M_j^2(t)) < ∞ for j = 1, 2, there exists a right-continuous predictable process <M_1, M_2>(·) such that <M_1, M_2>(0) = 0, E|<M_1, M_2>(t)| < ∞, and M_1(·) M_2(·) − <M_1, M_2>(·) is a martingale. If <M_1, M_2> a.s.= 0, M_1(·) and M_2(·) are called orthogonal. If M_j = N_j − A_j, where N_j is a counting process and A_j is its continuous compensator, then if N_i and N_j cannot jump at the same time, <M_i, M_j>(·) a.s.= 0.

Next suppose that N(·) is a bounded counting process, that A(·) is the compensator for N(·), and E(N − A)^2(t) < ∞ for every t. Moreover, H(·) is a bounded, predictable process. Define Q(·) by

    Q(t) = ∫_0^t H(s) dM(s), where M(·) = N(·) − A(·).

Then
- Q(·) is a zero-mean martingale, and Var(Q(t)) = E(Q^2(t)).
- Q^2(·) is a sub-martingale, and thus there exists a predictable <Q, Q>(·) such that Q^2(·) − <Q, Q>(·) is a martingale. Hence, var(Q(t)) = E(<Q, Q>(t)).
- If A(·) is continuous, <Q, Q>(t) a.s.= ∫_0^t H^2(s) dA(s), and thus Var(Q(t)) = E( ∫_0^t H^2(s) dA(s) ).

If M_1, M_2 are 2 martingales defined similarly to M(·), H_1 and H_2 are bounded predictable processes, and Q_1(·) and Q_2(·) are defined in the same way as Q(·), then the predictable quadratic covariation process <Q_1, Q_2> satisfies

    <Q_1, Q_2>(t) = ∫_0^t H_1(s) H_2(s) d<M_1, M_2>(s).
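The variance identity Var(Q(t)) = E(∫_0^t H^2(s) dA(s)) can also be checked numerically. In the sketch below (ours, for illustration) N is a rate-λ Poisson process on [0, t], so A(s) = λs, and we take the deterministic (hence predictable) integrand H(s) = s; then ∫_0^t H^2(s) dA(s) = λ t^3 / 3, which should match the Monte Carlo variance of Q(t).

```python
import numpy as np

rng = np.random.default_rng(4)

# Q(t) = int_0^t H(s) dM(s) for a rate-lam Poisson process N and H(s) = s.
# Then Var(Q(t)) = E(int_0^t H(s)^2 dA(s)) = lam * t**3 / 3.
lam, t, reps = 2.0, 1.0, 50_000
counts = rng.poisson(lam * t, size=reps)
q = np.empty(reps)
for k in range(reps):
    jumps = rng.uniform(0.0, t, counts[k])   # given N(t), event times are i.i.d. Uniform(0, t)
    q[k] = jumps.sum() - lam * t**2 / 2      # sum of H at the jumps minus int_0^t H(s) dA(s)
q_mean, q_var = float(q.mean()), float(q.var())
print(q_mean, q_var)                         # ~0 and ~lam * t**3 / 3 = 2/3
```

The empirical mean of Q(t) should be near 0 and its empirical variance near 2/3.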

Martingale CLT: For i = 1, 2, ..., n, let

    N_i(·) = counting processes that can't jump at the same time,
    A_i(·) = continuous compensators,
    H_i(·) = locally bounded predictable processes,
    M_i(·) = N_i(·) − A_i(·),
    U_i(t) = ∫_0^t H_i(s) dM_i(s),
    H_{i,ϵ}(s) = H_i(s) 1[ |H_i(s)| ≥ ϵ ],
    U_{i,ϵ}(t) = ∫_0^t H_{i,ϵ}(s) dM_i(s),
    ΣU_n(·) = Σ_i U_i(·), and ΣU_{n,ϵ}(·) = Σ_i U_{i,ϵ}(·).

Suppose that

(a) <ΣU_n, ΣU_n>(t) →^P α(t) for all t, for some function α(·), and

(b) <ΣU_{n,ϵ}, ΣU_{n,ϵ}>(t) →^P 0 for all t and all ϵ > 0.

Then ΣU_n(·) →^D U(·), where U(·) is a zero-mean Gaussian process with independent increments and variance function Var(U(t)) = α(t).

Note: By the definitions of N_i(·), A_i(·), and H_i(·):

    <ΣU_n, ΣU_n>(t) = Σ_i ∫_0^t H_i^2(s) dA_i(s)
    <ΣU_{n,ϵ}, ΣU_{n,ϵ}>(t) = Σ_i ∫_0^t H_{i,ϵ}^2(s) dA_i(s),

where H_{i,ϵ}(s) = H_i(s) 1[ |H_i(s)| ≥ ϵ ].

Intuition: While the proof of the Martingale CLT is rather detailed, the result is not surprising. One forms a sum, ΣU_n(·) = Σ_i U_i(·), of n orthogonal martingales, where each has zero mean, and where the variance of the sum converges in probability to some constant function α(·). In the classical CLT we are used to seeing a sum of n i.i.d. random variables multiplied by n^{-1/2}. Thus, the variance of the standardized sum is (and converges to) n σ^2 / n = σ^2, where σ^2 is the variance of each term in the sum. In the way we have set up the Martingale CLT, the multiplier n^{-1/2} is buried in the integrands H_i(·) of the martingales Q_i, so the analogy is really the same as in the classical CLT.

If we consider the asymptotic distribution of ΣU_n(t) for some fixed t, it is not surprising that the result is a normal distribution with mean 0. From this one would expect the finite-dimensional distributions of ΣU_n(·) to converge to zero-mean multivariate normal distributions. Thus, if tightness holds, we would expect ΣU_n(·) to converge to a zero-mean Gaussian process. The independent increments of this limiting process are assured from the uncorrelated increment property of the individual martingales U_i(·).

Exercises

1. Prove that Q_{n,ϵ}(s) →^P 0 as n → ∞ (this was used in the verification of condition (b) above), where

    Q_{n,ϵ}(s) = 1{ | Z_i − [Σ_{j=1}^n Z_j e^{βZ_j} Y_j(s)] / [Σ_{j=1}^n e^{βZ_j} Y_j(s)] | ≥ n^{1/2} ϵ }.

2. Prove the expressions given for µ, m and σ^2 in the 2-sample example. Do not take any of those expressions for granted.