Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models
|
|
- Charlotte O’Neal’
- 6 years ago
- Views:
Transcription
1 Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research University of North Carolina-Chapel Hill 1
2 Score Functions and Estimating Equations A parameter ψ(p ) of particular interest is the parametric component θ of a semiparametric model {P θ,η : θ Θ, η H}, where Θ is an open subset of R k and H is an arbitrary set that may be infinite dimensional. Tangent sets can be used to develop an efficient estimator for ψ(p θ,η ) = θ through the formation of an efficient score function. 2
3 In this setting, we consider submodels of the form {P θ+ta,ηt, t N ɛ } that are differentiable in quadratic mean with score function where a R k, log dp θ+ta,ηt t = a lθ,η + g, t=0 l θ,η : X R k is the ordinary score for θ when η is fixed, and where g : X R is an element of a tangent set Ṗ (η) P θ,η for the submodel (holding θ fixed). P θ = {P θ,η : η H} 3
4 This tangent set is the tangent set for η and should be rich enough to reflect all parametric submodels of P θ. The tangent set for the full model is Ṗ Pθ,η = { } a lθ,η + g : a R k, g Ṗ(η) P θ,η. 4
5 While ψ(p θ+ta,ηt ) = θ + ta is clearly differentiable with respect to t, we also require that there there exists a function ψ θ,η : X R k such that ψ(p θ+ta,ηt ) [ ( t = a = P ψθ,η l θ,η a + g)] t=0 for all a R k and all g Ṗ(η) P θ,η., (1) After setting a = 0, we see that such a function must be uncorrelated with all of the elements of Ṗ (η) P θ,η. 5
6 Define Π θ,η to be the orthogonal projection onto the closed linear span of Ṗ (η) P θ,η in L 0 2 (P θ,η). The efficient score function for θ is l θ,η = l θ,η Π θ,η lθ,η, while the efficient information matrix for θ is Ĩ θ,η = P ] [ lθ,η l θ,η. 6
7 Provided that Ĩ θ,η is nonsingular, the function ψ θ,η = Ĩ 1 θ,η l θ,η satisfies (1) for all a R k and all g Ṗ (η) P θ,η. Thus the functional (parameter) ψ(p θ,η ) = θ is differentiable at P θ,η relative to the tangent set Ṗ Pθ,η, with efficient influence function ψ θ,η. 7
8 Recall that the search for an efficient estimator of θ is over if one can find an estimator T n satisfying n(tn θ) = np n ψθ,η + o P (1). Note that Ĩ θ,η = I θ,η P ) ] [Π θ,η lθ,η (Π θ,η lθ,η, where I θ,η = P [ ] lθ,η l θ,η. 8
9 An intuitive justification for the form of the efficient score is that some information for estimating θ is lost due to a lack of knowledge about η. The amount subtracted off of the efficient score, Π θ,η lθ,η, is the minimum possible amount for regular estimators when η is unknown. 9
10 Consider again the semiparametric regression model: where Y = β Z + e, E[e Z] = 0 and E[e 2 Z] K < almost surely, and where we observe (Y, Z), with the joint density η of (e, Z) satisfying R almost surely. eη(e, Z)de = 0 10
11 Assume η has partial derivative with respect to the first argument, η 1, satisfying and hence η 1 η L 2(P β,η ), η 1 η L0 2(P β,η ), where P β,η is the joint distribution of (Y, Z). The Euclidean parameter of interest in this semiparametric model is θ = β. 11
12 The score for β, assuming η is known, is l β,η = Z( η 1 /η)(y β Z, Z), where we use the shorthand (f/g)(u, v) = f(u, v)/g(u, v) for ratios of functions. One can show that the tangent set Ṗ (η) P β,η for η is the subset of L 0 2 (P β,η) which consists of all functions g(e, Z) L 0 2 (P β,η) which satisfy R eg(e, Z)η(e, Z)de E[eg(e, Z) Z] = R η(e, Z)de = 0, almost surely. 12
13 One can also show that this set is the orthocomplement in L 0 2 (P β,η) of all functions of the form ef(z), where f satisfies P β,η f 2 (Z) <. This means that l β,η = (I Π β,η ) l β,η is the projection in L 0 2 (P β,η) of Z( η 1 /η)(e, Z) onto {ef(z) : P β,η f 2 (Z) < }, where I is the identity. 13
14 Thus where l β,η (Y, Z) = Ze R η 1(e, Z)ede P β,η [e 2 Z] = Ze( 1) P β,η [e 2 Z] = Z(Y β Z) P β,η [e 2, Z] the second-to-last step follows from the identity R η 1 (e, Z)ede = R and the last step follows since e = Y β Z. η(te, Z)de t, t=1 14
15 When the function z P β,η [e 2 Z = z] is non-constant in z, l β,η (Y, Z) is not proportional to Z(Y β Z), and the usual least-squares estimator ˆβ will not be efficient. This is discussed in greater detail in Chapter 4. 15
16 Two very useful tools for computing efficient scores are score and information operators. Returning to the generic semiparametric model {P θ,η : θ Θ, η H}, sometimes it is easier to represent an element g in the tangent set for η, Ṗ (η) P θ,η, as B θ,η b, where b is an element of another set H η and B θ,η is an operator satisfying Ṗ (η) P θ,η = {B θ,η b : b H η }. 16
17 Such an operator is a score operator. The information operator is: B θ,η B θ,η : H η lin H η. If B θ,η B θ,η has an inverse, then it can be shown that the efficient score for θ has the form l θ,η = ( I B θ,η [ B θ,η B θ,η ] 1 B θ,η ) lθ,η. 17
18 To illustrate these methods, consider again the Cox model for right-censored data. Recall in this setting that we observe a sample of n realizations of X = (V, d, Z), where V = T C, d = 1{V = T }, Z R k is a covariate vector, T is a failure time, and C is a censoring time. 18
19 We also assume that T and C are independent given Z, that T given Z has integrated hazard function e β Z Λ(t), for β in an open subset B R k and Λ is continuous and monotone increasing with Λ(0) = 0, and that the censoring distribution does not depend on β or Λ (i.e., censoring is uninformative). 19
20 Recall also the counting and at-risk processes N(t) = 1{V t}d and Y (t) = 1{V t}, and let M(t) = N(t) t 0 Y (s)e β Z dλ(s). For some 0 < τ < with P {C τ} > 0, let H be the set of all Λ s satisfying our criteria with Λ(τ) <. Now the set of models P is indexed by β B and Λ H. 20
21 We let P β,λ be the distribution of (V, d, Z) corresponding to the given parameters. The likelihood for a single observation is thus proportional to [ ] d [ ] p β,λ (X) = e β Z λ(v ) exp e β Z Λ(V ), where λ is the derivative of Λ. 21
22 Now let L 2 (Λ) be the set of measurable functions b : [0, τ] R with τ 0 b 2 (s)dλ(s) <. Note that if b L 2 (Λ) is bounded, then Λ t (s) = s 0 e tb(u) dλ(u) H for all t. 22
23 The score function is thus τ log p β+ta,λt (X) t t=0 [ a Z + b(s) ] dm(s), 0 for any a R k. 23
24 The score function for β is therefore l β,λ (X) = ZM(τ), while the score function for Λ is τ 0 b(s)dm(s). In fact, one can show that there exists one-dimensional submodels Λ t such that log p β+ta,λt is differentiable with score a lβ,λ (X) + for any b L 2 (Λ) and a R k. τ 0 b(s)dm(s), 24
25 The operator B β,λ : L 2 (Λ) L 0 2(P β,λ ), given by B β,λ (b) = τ 0 b(s)dm(s), is the score operator which generates the tangent set for Λ, Ṗ (Λ) P β,λ {B β,λ b : b L 2 (Λ)}. It can be shown that this tangent space spans all square-integrable score functions for Λ generated by parametric submodels. 25
26 The adjoint operator can be shown to be B β,λ : L 2(P β,λ ) L 2 (Λ), where B β,λ (g)(t) = P β,λ[g(x)dm(t)] dλ(t). The information operator B β,λ B β,λ : L 2 (Λ) L 2 (Λ) 26
27 is thus B β,λ B β,λ(b)(t) = P β,λ [ τ 0 b(s)dm(s)dm(u)] dλ(u) = P β,λ [Y (t)e β Z ] b(t), using martingale methods. 27
28 Since B β,λ ( lβ,λ ) (t) = P β,λ [ ZY (t)e β Z ], we have that the efficient score for β is l β,λ = = ( [ ] ) I B β,λ B 1 β,λ B β,λ B β,λ lβ,λ (2) [ ] τ P β,λ ZY (t)e β Z Z P β,λ [Y (t)e β Z ] dm(t). 0 28
29 When Ĩ β,λ P β,λ [ lβ,λ l β,λ ] is positive definite, the resulting efficient influence function is ψ β,λ Ĩ 1 β,λ l β,λ. 29
30 Since the estimator ˆβ n obtained from maximizing the partial likelihood L n (β) = can be shown to satisfy ( ) di n e β Z i n j=1 1{V (3) j V i }e β Z j i=1 n( ˆβn β) = np n ψβ,λ + o P (1), this estimator is efficient. 30
31 Returning to our discussion of score and information operators, these operators are also useful for generating scores for the entire model, not just for the nuisance component. With semiparametric models having score functions of the form a lθ,η + B θ,η b, for a R k and b H η, we can define a new operator A β,η : {(a, b) : a R k, b lin H η } L 0 2(P θ,η ) where A β,η (a, b) = a lθ,η + B θ,η b. 31
32 More generally, we can define the score operator A η : lin H η L 2 (P η ) for the model {P η : η H}, where H indexes the entire model and may include both parametric and nonparametric components, and where lin H η indexes directions in H. Let the parameter of interest be ψ(p η ) = χ(η) R k. 32
33 We assume there exists a linear operator χ : lin H η R k such that, for every b lin H η, there exists a one-dimensional submodel {P ηt : η t H, t N ɛ } satisfying [ (dp ηt ) 1/2 (dp η ) 1/2 t 1 2 A ηb(dp η ) 1/2 ] 2 0, as t 0, and χ(η t ) t = χ(b). t=0 33
34 We require H η to be a Hilbert space with inner product, η. The efficient influence function is the solution ψ Pη R(A η ) L 0 2(P η ) of A ψ η Pη = χ η, (4) where χ η H η satisfies χ η, b η = χ η (b) for all b H η. 34
35 When A ηa η is invertible, then the solution to (4) can be written ψ Pη = A η ( A η A η ) 1 χη. In Chapter 4, we utilized this approach to derive efficient estimators for all parameters of the Cox model. 35
36 Returning to the semiparametric model setting, where P = {P θ,η : θ Θ, η H}, Θ is an open subset of R k, and H is a set, the efficient score can be used to derive estimating equations for computing efficient estimators of θ. 36
37 Recall that an estimating equation is a data dependent function Ψ n : Θ R k for which an approximate zero yields a Z-estimator for θ. When Ψ n ( θ) has the form P nˆl θ,n, where ˆl θ,n (X X 1,..., X n ) is a function for the generic observation X which depends on the value of θ and the sample data X 1,..., X n, we have the following estimating equation result: 37
38 THEOREM 1. Suppose that the model {P θ,η : θ Θ}, where Θ R k, is differentiable in quadratic mean with respect to θ at (θ, η) and let the efficient information matrix Ĩθ,η be nonsingular. Let ˆθ n satisfy np nˆlˆθn,n = o P (1) and be consistent for θ. Also assume that ˆlˆθn,n is contained in a P θ,η-donsker class with probability tending to 1 and that the following conditions hold: Pˆθn,η ˆlˆθn,n = o P (n 1/2 + ˆθ n θ ), (5) P θ,η ˆlˆθn,n l 2 P θ,η 0, Pˆθn,η ˆlˆθn,n 2 = O P (1). (6) 38
39 Then ˆθ n is asymptotically efficient at (θ, η). 39
40 Returning to the Cox model example, the profile likelihood score is the partial likelihood score Ψ n ( β) = P nˆl β,n, where ˆl β,n (X = (V, d, Z) X 1,..., X n ) [ ] τ P n ZY (t)e β Z = 0 Z ] dm P n [Y (t)e β Z β(t), and M β(t) = N(t) t 0 Y (u)e β Z dλ(u). We showed in Chapter 4 that all the conditions of Theorem 1 are satisfied for the root of Ψ n ( β) = 0, ˆβ n, and thus the partial likelihood yields efficient estimation of β. 40
41 Maximum Likelihood Estimation The most common approach to efficient estimation is based on modifications of maximum likelihood estimation that lead to efficient estimates. These modifications, which we will call likelihoods, are generally not really likelihoods (products of densities) because of complications resulting from the presence of an infinite dimensional nuisance parameter. Recall the setting of estimation of an unknown real density f(x) from an i.i.d. sample X 1,..., X n. 41
42 The likelihood is n i=1 f(x i), and the maximizer over all densities has arbitrarily high peaks at the observations, with zero at the other values, and is therefore not a density. This problem can be fixed by using an empirical likelihood n i=1 p i, where p 1,..., p n are the masses assigned to the observations indexed by i = 1,..., n and are constrained to satisfy n i=1 p i = 1. This leads to the empirical distribution function estimator, which is known to be fully efficient. 42
43 Consider again the Cox model for right-censored data explored in the previous section. The density for a single observation X = (V, d, Z) is proportional to [ ] d [ ] e β Z λ(v ) exp e β Z Λ(V ). Maximizing the likelihood based on this density will result in the same problem raised in the previous paragraph. 43
44 A likelihood that works is the following, which assigns mass only at observed failure times: L n (β, Λ) = n i=1 where Λ(t) is the jump size of Λ at t. [ ] di [ ] e β Z i Λ(V i ) exp e β Z i Λ(V i ), (7) For each value of β, one can maximize or profile L n (β, Λ) over the nuisance parameter Λ to obtain the profile likelihood pl n (β), which for the Cox model is exp [ n i=1 d i] times the partial likelihood (3). 44
45 Let ˆβ be the maximizer of pl n (β). Then the maximizer ˆΛ of L n ( ˆβ, Λ) is the Breslow estimator ˆΛ(t) = t 0 P n dn(s) ]. P n [Y (s)e ˆβ Z We showed in Chapter 4 that ˆβ and ˆΛ are both efficient. 45
46 Another useful class of likelihood variants are penalized likelihoods. Penalized likelihoods add a penalty term (or terms) in order to maintain an appropriate level of smoothness for one or more of the nuisance parameters. This method was used in the partly linear logistic regression model described in Chapter 1. Other methods of generating likelihood variants that work are possible. 46
47 The basic idea is that using the likelihood principle to guide estimation of semiparametric models often leads to efficient estimators for the model components which are n consistent. Because of the richness of this approach to estimation, one needs to verify for each new situation that a likelihood-inspired estimator is consistent, efficient and well-behaved for moderate sample sizes. Verifying efficiency usually entails demonstrating that the estimator satisfies the efficient score equation described in the previous section. 47
48 Approximately Least-Favorable Submodels One of the key challenges in this setting is to ensure that the efficient score is a derivative of the chosen log likelihood along some submodel. Something helpful in this setting are approximately least-favorable submodels. 48
49 The basic idea is to find a function η t (θ, η) such that η 0 (θ, η) = η, for all θ Θ and η H, where η t (θ, η) H for all t small enough, and such that κ θ0,η 0 = l θ0,η 0, where κ θ,η (x) = l θ+t,η t (θ,η)(x) t, t=0 l θ,η (x) is the log-likelihood for the observed value x at the parameters (θ, η), and where (θ 0, η 0 ) are the true parameter values. Note that we require κ θ,η = l θ,η only when (θ, η) = (θ 0, η 0 ). 49
50 If (ˆθ n, ˆη n ) is the maximum likelihood estimator, i.e., the maximizer of P n l θ,η, then the function t P n lˆθn +t,η t (ˆθ n,ˆη n ) is maximal at t = 0, and thus (ˆθ n, ˆη n ) is a zero of P n κ θ, η. Now if ˆθ n and ˆl θ,n = κ θ,ˆηn satisfy the conditions of Theorem 1 at (θ, η) = (θ 0, η 0 ), then the maximum likelihood estimator ˆθ n is asymptotically efficient at (θ 0, η 0 ). 50
51 Consider now the maximum likelihood estimator ˆθ n based on maximizing the joint empirical log-likelihood L n (θ, η) np n l(θ, η), where l(, ) is the log-likelihood for a single observation. For now, η will be regarded as a nuisance parameter, and thus we can restrict our attention to the profile log-likelihood θ pl n (θ) sup η L n (θ, η). Note that L n is a sum and not an average, since we multiplied the empirical measure by n. 51
52 While the solution of an efficient score equation need not be a maximum likelihood estimator, it is also possible that the maximum likelihood estimator in a semiparametric model may not be expressible as the zero of an efficient score equation. This possibility occurs because the efficient score is a projection, and, as such, there is no assurance that this projection is the derivative of the log-likelihood along a submodel. This is the main issue that motivates approximately least-favorable submodels. 52
53 An approximately least-favorable submodel approximates the true least-favorable submodel to a useful level of accuracy that facilitates analysis of semiparametric estimators. We will now describe this process in generality: the specifics will depend on the situation. As mentioned previously, we first need a general map from the neighborhood of θ into the parameter set for η, which map we will denote by t η t (θ, η), for t R k. 53
54 We require that η t (θ, η) Ĥ, for all t θ small enough, and (8) η θ (θ, η) = η for any (θ, η) Θ Ĥ, where Ĥ is a suitable enlargement of H that includes all estimators that satisfy the constraints of the estimation process. Now define the map l(t, θ, η) l(t, η t (θ, η)). 54
55 We will require several things of l(,, ), at various point in our discussion, that will result in further restrictions on η t (θ, η). Define and let l(t, θ, η) l(t, θ, η), t) ˆl θ,n l(θ, θ, ˆη n ). Clearly, P nˆlˆθn,n = 0, and thus ˆθ n is efficient for θ 0, provided ˆl θ,n satisfies the conditions of Theorem 1. 55
56 The reason it is necessary to check this even for maximum likelihood estimators is that, as mentioned previously, ˆη n is often on the boundary (or even a little bit outside of) the parameter space. Recall again the Cox model setting for right censored data. 56
57 In this case, η is the baseline integrated hazard function which is usually assumed to be continuous. However, ˆη n is the Breslow estimator, which is a right-continuous step function with jumps at observed failure times and is therefore not in the parameter space. Thus direct differentiation of the log-likelihood at the maximum likelihood estimator will not yield an efficient score equation. 57
58 We will also require that l(θ 0, θ 0, η 0 ) = l θ0,η 0. (9) Note that we are only insisting that this identity holds at the true parameter values. This approximately least-favorable submodel structure is very useful for developing methods of inference for θ. 58
Efficiency of Profile/Partial Likelihood in the Cox Model
Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows
More informationEstimation for two-phase designs: semiparametric models and Z theorems
Estimation for two-phase designs:semiparametric models and Z theorems p. 1/27 Estimation for two-phase designs: semiparametric models and Z theorems Jon A. Wellner University of Washington Estimation for
More informationarxiv:math/ v2 [math.st] 19 Aug 2008
The Annals of Statistics 2008, Vol. 36, No. 4, 1819 1853 DOI: 10.1214/07-AOS536 c Institute of Mathematical Statistics, 2008 arxiv:math/0612191v2 [math.st] 19 Aug 2008 GENERAL FREQUENTIST PROPERTIES OF
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationSTAT331. Cox s Proportional Hazards Model
STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations
More informationOther Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model
Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview
Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 22: Preliminaries for Semiparametric Inference
Introduction to Empirical Processes and Semiparametric Inference Lecture 22: Preliminaries for Semiparametric Inference Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results
Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations
Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationM- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010
M- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010 Z-theorems: Notation and Context Suppose that Θ R k, and that Ψ n : Θ R k, random maps Ψ : Θ R k, deterministic
More informationSurvival Analysis Math 434 Fall 2011
Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup
More information1 Glivenko-Cantelli type theorems
STA79 Lecture Spring Semester Glivenko-Cantelli type theorems Given i.i.d. observations X,..., X n with unknown distribution function F (t, consider the empirical (sample CDF ˆF n (t = I [Xi t]. n Then
More informationPower and Sample Size Calculations with the Additive Hazards Model
Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine
More informationEfficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models
NIH Talk, September 03 Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models Eric Slud, Math Dept, Univ of Maryland Ongoing joint project with Ilia
More informationWeighted likelihood estimation under two-phase sampling
Weighted likelihood estimation under two-phase sampling Takumi Saegusa Department of Biostatistics University of Washington Seattle, WA 98195-7232 e-mail: tsaegusa@uw.edu and Jon A. Wellner Department
More information11 Survival Analysis and Empirical Likelihood
11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with
More informationEMPIRICAL ENVELOPE MLE AND LR TESTS. Mai Zhou University of Kentucky
EMPIRICAL ENVELOPE MLE AND LR TESTS Mai Zhou University of Kentucky Summary We study in this paper some nonparametric inference problems where the nonparametric maximum likelihood estimator (NPMLE) are
More informationEnsemble estimation and variable selection with semiparametric regression models
Ensemble estimation and variable selection with semiparametric regression models Sunyoung Shin Department of Mathematical Sciences University of Texas at Dallas Joint work with Jason Fine, Yufeng Liu,
More informationModel Selection and Geometry
Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model
More informationFULL LIKELIHOOD INFERENCES IN THE COX MODEL
October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach
More informationSTAT331. Combining Martingales, Stochastic Integrals, and Applications to Logrank Test & Cox s Model
STAT331 Combining Martingales, Stochastic Integrals, and Applications to Logrank Test & Cox s Model Because of Theorem 2.5.1 in Fleming and Harrington, see Unit 11: For counting process martingales with
More informationA note on L convergence of Neumann series approximation in missing data problems
A note on L convergence of Neumann series approximation in missing data problems Hua Yun Chen Division of Epidemiology & Biostatistics School of Public Health University of Illinois at Chicago 1603 West
More informationRegularization in Cox Frailty Models
Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationPENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA
PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University
More informationModel comparison and selection
BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan
More informationSEMIPARAMETRIC LIKELIHOOD RATIO INFERENCE. By S. A. Murphy 1 and A. W. van der Vaart Pennsylvania State University and Free University Amsterdam
The Annals of Statistics 1997, Vol. 25, No. 4, 1471 159 SEMIPARAMETRIC LIKELIHOOD RATIO INFERENCE By S. A. Murphy 1 and A. W. van der Vaart Pennsylvania State University and Free University Amsterdam Likelihood
More informationMore on nuisance parameters
BS2 Statistical Inference, Lecture 3, Hilary Term 2009 January 30, 2009 Suppose that there is a minimal sufficient statistic T = t(x ) partitioned as T = (S, C) = (s(x ), c(x )) where: C1: the distribution
More informationSTAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis
STAT 6350 Analysis of Lifetime Data Failure-time Regression Analysis Explanatory Variables for Failure Times Usually explanatory variables explain/predict why some units fail quickly and some units survive
More informationOn the Breslow estimator
Lifetime Data Anal (27) 13:471 48 DOI 1.17/s1985-7-948-y On the Breslow estimator D. Y. Lin Received: 5 April 27 / Accepted: 16 July 27 / Published online: 2 September 27 Springer Science+Business Media,
More informationHypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations
Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate
More informationLecture 2: Martingale theory for univariate survival analysis
Lecture 2: Martingale theory for univariate survival analysis In this lecture T is assumed to be a continuous failure time. A core question in this lecture is how to develop asymptotic properties when
More informationAn exponential family of distributions is a parametric statistical model having densities with respect to some positive measure λ of the form.
Stat 8112 Lecture Notes Asymptotics of Exponential Families Charles J. Geyer January 23, 2013 1 Exponential Families An exponential family of distributions is a parametric statistical model having densities
More informationApplication of Time-to-Event Methods in the Assessment of Safety in Clinical Trials
Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Progress, Updates, Problems William Jen Hoe Koh May 9, 2013 Overview Marginal vs Conditional What is TMLE? Key Estimation
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationLecture 5 Models and methods for recurrent event data
Lecture 5 Models and methods for recurrent event data Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.
More informationModelling geoadditive survival data
Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model
More informationA semiparametric generalized proportional hazards model for right-censored data
Ann Inst Stat Math (216) 68:353 384 DOI 1.17/s1463-14-496-3 A semiparametric generalized proportional hazards model for right-censored data M. L. Avendaño M. C. Pardo Received: 28 March 213 / Revised:
More informationEmpirical Processes & Survival Analysis. The Functional Delta Method
STAT/BMI 741 University of Wisconsin-Madison Empirical Processes & Survival Analysis Lecture 3 The Functional Delta Method Lu Mao lmao@biostat.wisc.edu 3-1 Objectives By the end of this lecture, you will
More informationExercises. (a) Prove that m(t) =
Exercises 1. Lack of memory. Verify that the exponential distribution has the lack of memory property, that is, if T is exponentially distributed with parameter λ > then so is T t given that T > t for
More informationAn Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data
An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)
More informationEfficient Estimation in Convex Single Index Models 1
1/28 Efficient Estimation in Convex Single Index Models 1 Rohit Patra University of Florida http://arxiv.org/abs/1708.00145 1 Joint work with Arun K. Kuchibhotla (UPenn) and Bodhisattva Sen (Columbia)
More informationMS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari
MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and
More informationStatistics 262: Intermediate Biostatistics Non-parametric Survival Analysis
Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Overview of today s class Kaplan-Meier Curve
More informationSemiparametric posterior limits
Statistics Department, Seoul National University, Korea, 2012 Semiparametric posterior limits for regular and some irregular problems Bas Kleijn, KdV Institute, University of Amsterdam Based on collaborations
More informationlog T = β T Z + ɛ Zi Z(u; β) } dn i (ue βzi ) = 0,
Accelerated failure time model: log T = β T Z + ɛ β estimation: solve where S n ( β) = n i=1 { Zi Z(u; β) } dn i (ue βzi ) = 0, Z(u; β) = j Z j Y j (ue βz j) j Y j (ue βz j) How do we show the asymptotics
More informationLarge sample theory for merged data from multiple sources
Large sample theory for merged data from multiple sources Takumi Saegusa University of Maryland Division of Statistics August 22 2018 Section 1 Introduction Problem: Data Integration Massive data are collected
More informationThe International Journal of Biostatistics
The International Journal of Biostatistics Volume 1, Issue 1 2005 Article 3 Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee Jon A. Wellner
More informationSemiparametric Regression
Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under
More informationVariable Selection and Model Choice in Survival Models with Time-Varying Effects
Variable Selection and Model Choice in Survival Models with Time-Varying Effects Boosting Survival Models Benjamin Hofner 1 Department of Medical Informatics, Biometry and Epidemiology (IMBE) Friedrich-Alexander-Universität
More informationNow consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.
Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)
More informationESTIMATION IN A SEMI-PARAMETRIC TWO-STAGE RENEWAL REGRESSION MODEL
Statistica Sinica 19(29): Supplement S1-S1 ESTIMATION IN A SEMI-PARAMETRIC TWO-STAGE RENEWAL REGRESSION MODEL Dorota M. Dabrowska University of California, Los Angeles Supplementary Material This supplement
More informationStatistics 3858 : Maximum Likelihood Estimators
Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,
More informationFREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE
FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE Donald A. Pierce Oregon State Univ (Emeritus), RERF Hiroshima (Retired), Oregon Health Sciences Univ (Adjunct) Ruggero Bellio Univ of Udine For Perugia
More informationAnalysis of Time-to-Event Data: Chapter 6 - Regression diagnostics
Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/25 Residuals for the
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2014 Paper 327 Entering the Era of Data Science: Targeted Learning and the Integration of Statistics
More informationLecture 22 Survival Analysis: An Introduction
University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 22 Survival Analysis: An Introduction There is considerable interest among economists in models of durations, which
More informationModel Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao
Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley
More informationSTAT 331. Martingale Central Limit Theorem and Related Results
STAT 331 Martingale Central Limit Theorem and Related Results In this unit we discuss a version of the martingale central limit theorem, which states that under certain conditions, a sum of orthogonal
More informationLecture 7 October 13
STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We
More informationASYMPTOTIC PROPERTIES AND EMPIRICAL EVALUATION OF THE NPMLE IN THE PROPORTIONAL HAZARDS MIXED-EFFECTS MODEL
Statistica Sinica 19 (2009), 997-1011 ASYMPTOTIC PROPERTIES AND EMPIRICAL EVALUATION OF THE NPMLE IN THE PROPORTIONAL HAZARDS MIXED-EFFECTS MODEL Anthony Gamst, Michael Donohue and Ronghui Xu University
More informationTheoretical Statistics. Lecture 22.
Theoretical Statistics. Lecture 22. Peter Bartlett 1. Recall: Asymptotic testing. 2. Quadratic mean differentiability. 3. Local asymptotic normality. [vdv7] 1 Recall: Asymptotic testing Consider the asymptotics
More information(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ )
Setting RHS to be zero, 0= (θ )+ 2 L(θ ) (θ θ ), θ θ = 2 L(θ ) 1 (θ )= H θθ (θ ) 1 d θ (θ ) O =0 θ 1 θ 3 θ 2 θ Figure 1: The Newton-Raphson Algorithm where H is the Hessian matrix, d θ is the derivative
More informationCox s proportional hazards model and Cox s partial likelihood
Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.
More informationf(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain
0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher
More informationStatistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation
Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider
More informationarxiv:submit/ [math.st] 6 May 2011
A Continuous Mapping Theorem for the Smallest Argmax Functional arxiv:submit/0243372 [math.st] 6 May 2011 Emilio Seijo and Bodhisattva Sen Columbia University Abstract This paper introduces a version of
More informationCURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University
CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite
More informationVarious types of likelihood
Various types of likelihood 1. likelihood, marginal likelihood, conditional likelihood, profile likelihood, adjusted profile likelihood 2. semi-parametric likelihood, partial likelihood 3. empirical likelihood,
More informationGeneral Regression Model
Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical
More informationAppendix. Proof of Theorem 1. Define. [ ˆΛ 0(D) ˆΛ 0(t) ˆΛ (t) ˆΛ. (0) t. X 0 n(t) = D t. and. 0(t) ˆΛ 0(0) g(t(d t)), 0 < t < D, t.
Appendix Proof of Theorem. Define [ ˆΛ (D) X n (t) = ˆΛ (t) D t ˆΛ (t) ˆΛ () g(t(d t)), t < t < D X n(t) = [ ˆΛ (D) ˆΛ (t) D t ˆΛ (t) ˆΛ () g(t(d t)), < t < D, t where ˆΛ (t) = log[exp( ˆΛ(t)) + ˆp/ˆp,
More informationStatistics 581, Problem Set 8 Solutions Wellner; 11/22/2018
Statistics 581, Problem Set 8 Solutions Wellner; 11//018 1. (a) Show that if θ n = cn 1/ and T n is the Hodges super-efficient estimator discussed in class, then the sequence n(t n θ n )} is uniformly
More informationSTAT 331. Accelerated Failure Time Models. Previously, we have focused on multiplicative intensity models, where
STAT 331 Accelerated Failure Time Models Previously, we have focused on multiplicative intensity models, where h t z) = h 0 t) g z). These can also be expressed as H t z) = H 0 t) g z) or S t z) = e Ht
More informationST495: Survival Analysis: Maximum likelihood
ST495: Survival Analysis: Maximum likelihood Eric B. Laber Department of Statistics, North Carolina State University February 11, 2014 Everything is deception: seeking the minimum of illusion, keeping
More informationTied survival times; estimation of survival probabilities
Tied survival times; estimation of survival probabilities Patrick Breheny November 5 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/22 Introduction Tied survival times Introduction Breslow approximation
More informationTwo Likelihood-Based Semiparametric Estimation Methods for Panel Count Data with Covariates
Two Likelihood-Based Semiparametric Estimation Methods for Panel Count Data with Covariates Jon A. Wellner 1 and Ying Zhang 2 December 1, 2006 Abstract We consider estimation in a particular semiparametric
More informationImproving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates
Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/
More informationStatistical Properties of Numerical Derivatives
Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective
More informationA COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky
A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),
More informationNonconcave Penalized Likelihood with A Diverging Number of Parameters
Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized
More informationChapter 2 Inference on Mean Residual Life-Overview
Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate
More informationQuantifying Stochastic Model Errors via Robust Optimization
Quantifying Stochastic Model Errors via Robust Optimization IPAM Workshop on Uncertainty Quantification for Multiscale Stochastic Systems and Applications Jan 19, 2016 Henry Lam Industrial & Operations
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationFair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths
Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths for New Developments in Nonparametric and Semiparametric Statistics, Joint Statistical Meetings; Vancouver, BC,
More informationAnalysing geoadditive regression data: a mixed model approach
Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression
More informationPairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion
Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial
More information1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models
Draft: February 17, 1998 1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models P.J. Bickel 1 and Y. Ritov 2 1.1 Introduction Le Cam and Yang (1988) addressed broadly the following
More informationLikelihood based Statistical Inference. Dottorato in Economia e Finanza Dipartimento di Scienze Economiche Univ. di Verona
Likelihood based Statistical Inference Dottorato in Economia e Finanza Dipartimento di Scienze Economiche Univ. di Verona L. Pace, A. Salvan, N. Sartori Udine, April 2008 Likelihood: observed quantities,
More informationBrownian Motion. 1 Definition Brownian Motion Wiener measure... 3
Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................
More information(Part 1) High-dimensional statistics May / 41
Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2
More informationA Least-Squares Approach to Consistent Information Estimation in Semiparametric Models
A Least-Squares Approach to Consistent Information Estimation in Semiparametric Models Jian Huang Department of Statistics University of Iowa Ying Zhang and Lei Hua Department of Biostatistics University
More informationIntegrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University
Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence
Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationAn Extended BIC for Model Selection
An Extended BIC for Model Selection at the JSM meeting 2007 - Salt Lake City Surajit Ray Boston University (Dept of Mathematics and Statistics) Joint work with James Berger, Duke University; Susie Bayarri,
More informationFULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH
FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH Jian-Jian Ren 1 and Mai Zhou 2 University of Central Florida and University of Kentucky Abstract: For the regression parameter
More informationTime-to-Event Modeling and Statistical Analysis
PeñaTalkGT) p. An Appetizer The important thing is not to stop questioning. Curiosity has its own reason for existing. One cannot help but be in awe when he contemplates the mysteries of eternity, of life,
More information