A PARADOXICAL EFFECT OF NUISANCE PARAMETERS ON EFFICIENCY OF ESTIMATORS


J. Japan Statist. Soc. Vol. 34

A PARADOXICAL EFFECT OF NUISANCE PARAMETERS ON EFFICIENCY OF ESTIMATORS

Masayuki Henmi*

This paper is concerned with parameter estimation in the presence of nuisance parameters. Usually, an estimator with known nuisance parameters is better than one with unknown nuisance parameters with respect to the asymptotic variance. However, it has been noted that the opposite can occur in some situations. In this paper we elucidate when and how this phenomenon occurs, using the orthogonal decomposition of estimating functions. Most examples of this phenomenon are found in semiparametric models, but it can also occur in parametric models. As an example, we consider the estimation of the dispersion parameter in a generalized linear model.

Key words and phrases: Asymptotic variance, estimating function, nuisance parameter, optimality, orthogonal decomposition, semiparametric model.

Received November 3. Revised February 22. Accepted March 24.
*Department of Statistical Science, the Graduate University for Advanced Studies, Minami-azabu, Minato-ku, Tokyo, Japan.

1. Introduction

In a statistical model with a number of parameters, often only a portion of the parameters are of interest; the rest are nuisance parameters. Let $M = \{p(x; \beta, \alpha)\}$ be a parametric model whose elements are specified by a vector of parameters of interest $\beta$ and a vector of nuisance parameters $\alpha$. Then it is well known that under some regularity conditions the following inequality holds:

(1.1) $\mathrm{Var}_A(\tilde{\beta}) \preceq \mathrm{Var}_A(\hat{\beta})$,

where $\tilde{\beta}$ and $\hat{\beta}$ are the maximum likelihood estimators of $\beta$ with known and unknown $\alpha$, respectively, and $\mathrm{Var}_A$ denotes the asymptotic covariance matrix of an estimator. For two symmetric matrices $A$ and $B$, $A \preceq B$ indicates that $B - A$ is a positive semi-definite matrix. However, inequality (1.1) is not always observed if we do not use the maximum likelihood method.
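For orientation, inequality (1.1) follows from the standard partitioned-information argument for maximum likelihood, sketched below in the notation above; the inverse phenomenon studied in this paper arises precisely where this argument is unavailable.

```latex
% Partition the Fisher information for theta = (beta, alpha) as
\[
  I(\theta) =
  \begin{pmatrix} I_{\beta\beta} & I_{\beta\alpha} \\
                  I_{\alpha\beta} & I_{\alpha\alpha} \end{pmatrix}.
\]
% With alpha known, the MLE of beta attains
\[
  \mathrm{Var}_A(\tilde{\beta}) = I_{\beta\beta}^{-1},
\]
% while with alpha unknown, block inversion of I(theta) gives
\[
  \mathrm{Var}_A(\hat{\beta})
  = \bigl( I_{\beta\beta} - I_{\beta\alpha} I_{\alpha\alpha}^{-1} I_{\alpha\beta} \bigr)^{-1}
  \succeq I_{\beta\beta}^{-1},
\]
% since I_{beta alpha} I_{alpha alpha}^{-1} I_{alpha beta} is positive semi-definite.
```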

Let $M = \{p(x; \beta, \alpha, k)\}$ be a semiparametric model with an infinite-dimensional nuisance parameter $k$ as well as a vector of parameters of interest $\beta$ and a vector of nuisance parameters $\alpha$. Then, in certain special cases, we can observe the inequality opposite to (1.1),

$\mathrm{Var}_A(\hat{\beta}) \preceq \mathrm{Var}_A(\tilde{\beta})$,

when $\beta$ is estimated by an estimating function depending on $\alpha$. Here, $\tilde{\beta}$ and $\hat{\beta}$ are the estimators of $\beta$ when $\alpha$ is known and when $\alpha$ is unknown and estimated, respectively. In other words, the estimator with unknown nuisance parameters is better than that with known ones with respect to the asymptotic variance. We call this unusual phenomenon the inverse phenomenon of asymptotic variances. For example, Robins et al. (1992) proposed a semiparametric model for causal inference and pointed out that this phenomenon can occur in their model. Phenomena of this kind have also been noted in some other situations (Robins et al. (1994), Lawless et al. (1999)). See also Fourdrinier and Strawderman (1996) for shrinkage estimation.

The aim of this paper is to explore the structure of the inverse phenomenon of asymptotic variances systematically by examining estimating functions. Specifically, we focus on the orthogonal decomposition of estimating functions. This decomposition splits an estimating function into a component in the space of optimal estimating functions and a component in its orthogonal complement. Here, optimal estimating functions are those whose estimators attain the minimum asymptotic variance among all estimating functions. The inverse phenomenon of asymptotic variances can occur when the estimating function for the parameters of interest with known nuisance parameters is not optimal. The orthogonal decomposition of estimating functions helps us elucidate how estimating nuisance parameters improves the asymptotic variance of estimators for parameters of interest.

This paper is organized as follows. In Section 2 we introduce the semiparametric model proposed by Robins et al. (1992) as an illustrative example. In Section 3 we describe the orthogonal decomposition of estimating functions for semiparametric models. Section 4 examines the structure of the inverse phenomenon using the orthogonal decomposition. In Section 5 the parametric case is considered: the inverse phenomenon of asymptotic variances can also occur in parametric models if the maximum likelihood estimation method is not used, and its structure is essentially the same as in the semiparametric case. As an example we consider estimation of the dispersion parameter in a generalized linear model. Finally, in Section 6 we give some concluding remarks.

2. Illustrative example

In this section, we give an illustrative example of the inverse phenomenon of asymptotic variances. Suppose we would like to estimate the causal effect of an exposure or treatment on an outcome of interest. As is widely known, if we ignore the effect of confounding factors that both covary with the exposure or treatment and are independent predictors of the outcome, the estimate of the causal effect is biased. Let $Y$, $S$ and $X = (X_2, \ldots, X_K)$ be, respectively, a continuous outcome variable of interest, an indicator of exposure which takes the value 1 when the subject is exposed and 0 otherwise, and a vector of variables of confounding factors. The following model, proposed by Robins et al. (1992), is a semiparametric regression model that estimates the causal effect by adjusting for confounding factors:

(2.1) $Y = \beta S + h(X) + \epsilon, \qquad \mathrm{E}[\epsilon \mid S, X] = 0$.

The exposure is modeled by the logistic regression

(2.2) $P(S = 1 \mid X) = \dfrac{\exp\left(\alpha_1 + \sum_{k=2}^{K} \alpha_k X_k\right)}{1 + \exp\left(\alpha_1 + \sum_{k=2}^{K} \alpha_k X_k\right)}$,

where $h(X)$ is an unknown real-valued function of $X$ and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_K)$ is an unknown vector of nuisance parameters. The parameter $\beta$ represents the average causal effect of an exposure or treatment on the outcome when a certain condition is satisfied. This condition has nothing to do with the estimation of $\beta$, so we omit it here (see Robins et al. (1992)).

Next, let $\{(Y_i, S_i, X_i)\}_{i=1}^{n}$ be a random sample, that is, a set of independent and identically distributed random vectors under the above model. Robins et al. (1992) also proposed an estimating equation for $\beta$ as follows:

(2.3) $\sum_{i=1}^{n} U(Y_i, S_i, X_i, \beta, \hat{\alpha}) = 0$,

where $\hat{\alpha}$ is the maximum likelihood estimator of $\alpha$ from the logistic regression model (2.2) and

$U(y, s, x, \beta, \alpha) = \{s - r(x; \alpha)\}(y - \beta s), \qquad r(x; \alpha) = \dfrac{\exp\left(\alpha_1 + \sum_{k=2}^{K} \alpha_k x_k\right)}{1 + \exp\left(\alpha_1 + \sum_{k=2}^{K} \alpha_k x_k\right)}$.

When the model is correct, the estimator $\hat{\beta}$ of $\beta$, which is the solution of the estimating equation (2.3), is consistent and asymptotically normal under some regularity conditions. Its asymptotic variance is calculated as

(2.4) $\mathrm{Var}_A(\hat{\beta}) = \mathrm{Var}_A(\tilde{\beta}) - (Q^{-1}P)\, J^{-1} (Q^{-1}P)^T$,

where $\tilde{\beta}$ is the estimator of $\beta$ with the true value $\alpha_0$ of $\alpha$ treated as known, that is, the solution of (2.3) when $\hat{\alpha}$ is replaced with $\alpha_0$, and

$P = \mathrm{E}\!\left[\dfrac{\partial U}{\partial \alpha}(Y, S, X, \beta, \alpha)\right], \qquad Q = \mathrm{E}\!\left[\dfrac{\partial U}{\partial \beta}(Y, S, X, \beta, \alpha)\right],$

$J = \mathrm{E}\!\left[M(S, X, \alpha)\, M(S, X, \alpha)^T\right], \qquad M(s, x, \alpha) = \dfrac{\partial}{\partial \alpha} \log\!\left[ r(x; \alpha)^s \{1 - r(x; \alpha)\}^{1-s} \right]$.

For a matrix $A$, $A^T$ denotes its transpose. Then we find that the following inequality holds:

(2.5) $\mathrm{Var}_A(\hat{\beta}) \preceq \mathrm{Var}_A(\tilde{\beta})$,

since $J$ is a positive definite matrix in equation (2.4). The equality holds if and only if $P = 0$, that is, $\mathrm{E}[h(X)\, r_\alpha(X; \alpha)] = 0$, where $r_\alpha$ denotes the gradient of $r$ with respect to $\alpha$.
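To make the inequality concrete, the following Monte Carlo sketch simulates the model (2.1)-(2.2) and compares the sampling variances of the two estimators of $\beta$. The choices $h(x) = \sin 2x$, the normal covariate, the parameter values and the Newton-Raphson fitter are illustrative assumptions for this sketch, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def r(x, a):
    # exposure probability, model (2.2) with a single confounder (K = 2)
    return 1.0 / (1.0 + np.exp(-(a[0] + a[1] * x)))

def simulate(n, beta=1.0, alpha=(0.3, 0.8)):
    x = rng.normal(size=n)
    s = (rng.uniform(size=n) < r(x, np.asarray(alpha))).astype(float)
    y = beta * s + np.sin(2.0 * x) + rng.normal(size=n)  # h(x) = sin 2x (nonzero)
    return y, s, x

def beta_hat(y, s, x, a):
    # solve sum {s_i - r(x_i; a)}(y_i - beta s_i) = 0, equation (2.3)
    w = s - r(x, a)
    return np.sum(w * y) / np.sum(w * s)

def fit_alpha(s, x, iters=25):
    # maximum likelihood for the logistic model (2.2) by Newton-Raphson
    X = np.column_stack([np.ones_like(x), x])
    a = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ a)))
        grad = X.T @ (s - p)
        hess = -(X * (p * (1 - p))[:, None]).T @ X
        a -= np.linalg.solve(hess, grad)
    return a

alpha0 = np.array([0.3, 0.8])
known, estimated = [], []
for _ in range(2000):
    y, s, x = simulate(500)
    known.append(beta_hat(y, s, x, alpha0))               # beta-tilde: alpha known
    estimated.append(beta_hat(y, s, x, fit_alpha(s, x)))  # beta-hat: alpha estimated
print("variance, alpha known:    ", np.var(known))
print("variance, alpha estimated:", np.var(estimated))
```

If $h$ is replaced by the zero function, the two variances should agree asymptotically, matching the equality condition $P = 0$ above.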

One might feel that this is strange. Inequality (2.5) implies that a more precise estimate of $\beta$ may be obtained by estimating the nuisance parameter $\alpha$ than by using the true value of $\alpha$, even if the latter were known. This phenomenon was pointed out by Robins et al. (1992). They emphasized that this result depends on the fact that $\hat{\alpha}$ is an efficient estimator of $\alpha$. In the following sections, we examine the structure of the inverse phenomenon of asymptotic variances using the orthogonal decomposition of estimating functions. It will also be made clear what role the efficiency of $\hat{\alpha}$ plays in the inverse phenomenon.

3. The orthogonal decomposition of estimating functions

In this section we describe the orthogonal decomposition of estimating functions for semiparametric models, which is the key notion for understanding the structure of the phenomenon mentioned above from our point of view. Let $M = \{p(x; \theta, k)\}$ be a semiparametric statistical model, that is, a family of probability density functions with respect to a common dominating measure $\mu(dx)$, whose elements are specified by a finite-dimensional parameter $\theta = (\theta_1, \ldots, \theta_m)^T$ and an infinite-dimensional parameter $k$, typically lying in a space of functions. Here, $\theta$ contains a parameter of interest and $k$ is a nuisance parameter. Let $u(x, \theta) = (u_1(x, \theta), \ldots, u_m(x, \theta))^T$ be a vector-valued smooth function of $\theta$, not depending on $k$, and of the same dimension as $\theta$. This function is called an estimating function for $\theta$ when it satisfies the following conditions (Godambe (1991, p. 13)):

(3.1) $\mathrm{E}_{\theta,k}[u(x, \theta)] = 0$,

(3.2) $\mathrm{E}_{\theta,k}\!\left[\lVert u(x, \theta) \rVert^2\right] < \infty, \qquad \det \mathrm{E}_{\theta,k}\!\left[\dfrac{\partial u}{\partial \theta}(x, \theta)\right] \neq 0$

for all $\theta$ and $k$, where $\mathrm{E}_{\theta,k}$ denotes the expectation with respect to the distribution $p(x; \theta, k)$, $\det$ denotes the determinant of a matrix, and $\lVert \cdot \rVert$ is the Euclidean norm of a vector. Moreover, we assume that $\int u(x, \theta)\, p(x; \theta, k)\, \mu(dx)$ is differentiable with respect to $\theta$ and that differentiation and integration are interchangeable. When an estimating function $u(x, \theta)$ exists, we obtain an estimator $\hat{\theta}$ of $\theta$ as the solution of the following estimating equation:

(3.3) $\sum_{i=1}^{n} u(x_i, \theta) = 0$,

where $x_1, \ldots, x_n$ are $n$ independent and identically distributed observations. The estimator $\hat{\theta}$ is often called an M-estimator. Under some regularity conditions, it is consistent and asymptotically normally distributed with asymptotic covariance matrix

(3.4) $\mathrm{Var}_A(\hat{\theta}) = W^{-1} V W^{-T}$,

where $V = \mathrm{E}_{\theta,k}[u(x, \theta)\, u(x, \theta)^T]$ and $W = \mathrm{E}_{\theta,k}[(\partial u / \partial \theta)(x, \theta)]$.
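As a concrete illustration of (3.3)-(3.4) in the scalar case, the sketch below computes the sandwich variance empirically for a Huber-type estimating function; the choice of $\psi$-function, the $t$-distributed sample and the use of SciPy's root finder are assumptions made for this example only.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
x = rng.standard_t(df=3, size=10_000)    # heavy-tailed i.i.d. sample

c = 1.5
u = lambda t: np.clip(x - t, -c, c)                # Huber-type estimating function
du = lambda t: -(np.abs(x - t) < c).astype(float)  # its derivative in theta

theta_hat = brentq(lambda t: u(t).mean(), -5.0, 5.0)  # root of equation (3.3)

V = np.mean(u(theta_hat) ** 2)    # V = E[u^2]
W = np.mean(du(theta_hat))        # W = E[du/dtheta]
var_A = V / W**2                  # scalar case of the sandwich formula (3.4)
print(theta_hat, var_A / x.size)  # approximate variance of theta-hat itself
```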

Now, under the above setting, we describe the orthogonal decomposition of estimating functions. Consider the set of random variables

(3.5) $H_{\theta,k} = \left\{ a(x) \mid \mathrm{E}_{\theta,k}[a(x)] = 0,\ \mathrm{E}_{\theta,k}[a(x)^2] < \infty \right\}$.

This is a Hilbert space with the inner product $\langle a(x), b(x) \rangle_{\theta,k} = \mathrm{E}_{\theta,k}[a(x)\, b(x)]$ for any two random variables $a(x), b(x) \in H_{\theta,k}$. Then condition (3.1) for estimating functions can be represented as

(3.6) $u_i(x, \theta) \in H_\theta$ for all $i$ and $\theta$,

where $H_\theta$ denotes the intersection of $H_{\theta,k}$ over all $k$. We assume that all components of the score function $s(x, \theta, k)$ for $\theta$ belong to $H_{\theta,k}$, and let $s^I(x, \theta, k)$ be the vector comprised of the orthogonal projections of all components of $s(x, \theta, k)$ onto $\bar{H}_\theta$, the closure of $H_\theta$ with respect to the topology of $H_{\theta,k}$. Then the space $\bar{H}_\theta$ can be decomposed as

(3.7) $\bar{H}_\theta = F^I_{\theta,k} \oplus F^A_{\theta,k}$,

where $F^I_{\theta,k}$ denotes the linear space spanned by all components of $s^I(x, \theta, k)$ and $F^A_{\theta,k}$ denotes the orthogonal complement of $F^I_{\theta,k}$ in $\bar{H}_\theta$. We call the vector-valued function $s^I(x, \theta, k)$ the information score function for $\theta$ and assume that all components of $s^I(x, \theta, k)$ are linearly independent. According to (3.6) and (3.7), any estimating function $u(x, \theta)$ is represented in the following form for all $k$:

(3.8) $u(x, \theta) = T(\theta, k)\, s^I(x, \theta, k) + a(x, \theta, k)$,

where $T(\theta, k)$ is an $m \times m$ matrix and $a(x, \theta, k)$ is a vector-valued function whose components belong to $F^A_{\theta,k}$. Moreover, by condition (3.2), the orthogonal projections of all components of $u(x, \theta)$ onto $F^I_{\theta,k}$ are linearly independent, and therefore $T(\theta, k)$ is non-singular. Representation (3.8) is what we call the orthogonal decomposition of estimating functions for semiparametric models in this paper. This kind of decomposition has often been treated in the literature on estimating functions. In particular, Amari and Kawanabe (1997) characterize the orthogonal decomposition (3.7) from an information-geometrical point of view; the terminology "information score function" is due to them.

In the decomposition (3.8), the parameter $k$ is fixed at an arbitrary possible value, and different expressions of an estimating function $u(x, \theta)$ are obtained for different values of $k$. When in particular we set $k = k_0$, the value of $k$ corresponding to the unknown underlying distribution in $M$ that generates the data, the asymptotic covariance matrix of the estimator $\hat{\theta}$ solving the estimating equation (3.3) is calculated as follows (Amari and Kawanabe (1997)):

(3.9) $\mathrm{Var}_A(\hat{\theta}) = \left( G^I \right)^{-1} + \left( T G^I \right)^{-1} G^A \left( T G^I \right)^{-T}$,

where $G^I = \mathrm{E}_{\theta_0,k_0}[s^I(x, \theta_0, k_0)\, s^I(x, \theta_0, k_0)^T]$, $G^A = \mathrm{E}_{\theta_0,k_0}[a(x, \theta_0, k_0)\, a(x, \theta_0, k_0)^T]$, $T = T(\theta_0, k_0)$, and $\theta_0$ denotes the true value of $\theta$.
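The decomposition (3.8) and formula (3.9) can be checked numerically in a stylized scalar setting. In the sketch below, the draws standing in for $s^I$ and $a$ are synthetic, and $T = 0.7$ is an arbitrary illustrative value; nothing here is specific to any particular model.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Synthetic draws standing in for the information score s^I and a residual
# component orthogonal to it; u is built as u = T s^I + a with T = 0.7.
sI = rng.normal(size=n)
a_true = rng.normal(size=n)
u = 0.7 * sI + a_true

T = np.mean(u * sI) / np.mean(sI * sI)  # L2 projection coefficient: T = E[u s^I] E[(s^I)^2]^{-1}
a = u - T * sI                          # recovered non-optimal part
GI = np.mean(sI * sI)
GA = np.mean(a * a)
var_A = 1.0 / GI + GA / (T * GI) ** 2   # scalar case of (3.9)
print(T, var_A, "optimal bound:", 1.0 / GI)
```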

In equation (3.9), $G^A$ is a positive semi-definite matrix. Hence $\mathrm{Var}_A(\hat{\theta}) \succeq (G^I)^{-1}$, with equality only when $G^A = 0$. This implies that if $s^I(x, \theta, k_0)$ satisfies the conditions to be an estimating function, it is an optimal estimating function in the sense that the asymptotic covariance matrix of its estimator is minimal among all estimating functions. However, it should be noted that $s^I(x, \theta, k_0)$ generally cannot be used, since it usually depends on the unknown true value $k_0$ of $k$. In light of the above discussion, the orthogonal decomposition shows exactly how an estimating function falls short of optimality. We therefore call the first and second terms on the right-hand side of the orthogonal decomposition (3.8) the optimal and non-optimal parts of $u(x, \theta)$, respectively.

4. The inverse phenomenon of asymptotic variances

In this section we examine the structure of the inverse phenomenon of asymptotic variances. The model in the example given in Section 2 is a semiparametric model with both finite- and infinite-dimensional nuisance parameters. In fact, under (2.1) and (2.2), the joint probability density function of the observed variables $Y$, $S$ and $X$ can be written as follows:

(4.1) $p_{YSX}(y, s, x; \beta, \alpha, h, g, f) = g(y - \beta s - h(x) \mid s, x)\, p_{S|X}(s \mid x; \alpha)\, f(x)$,

where $g(\epsilon \mid s, x)$ denotes the conditional density function of the error $\epsilon$ given $S = s$ and $X = x$; $p_{S|X}(s \mid x; \alpha)$ denotes the conditional probability function of $S$ given $X = x$, written as $\{r(x; \alpha)\}^s \{1 - r(x; \alpha)\}^{1-s}$ from (2.2); and $f(x)$ denotes the marginal density function of $X$. While the parameter $\beta$ is of interest, $\alpha$ is a finite-dimensional nuisance parameter, and the functions $h$, $g$ and $f$ play the role of infinite-dimensional nuisance parameters. The inverse phenomenon of asymptotic variances is the phenomenon in which the asymptotic variance of the estimator of $\beta$ with unknown $\alpha$ is less than that with known $\alpha$. As shown in Section 2, this phenomenon can occur under the above model when $\alpha$ is estimated by the maximum likelihood method and $\beta$ is estimated by the estimating function $U(y, s, x, \beta, \alpha)$, which depends on $\alpha$. This suggests that, in general, the inverse phenomenon of asymptotic variances occurs under some special conditions.

Let $M = \{p(x; \theta, k)\}$ be a semiparametric model with a finite-dimensional parameter $\theta = (\beta^T, \alpha^T)^T$ and an infinite-dimensional nuisance parameter $k$. Here, $\beta$ and $\alpha$ are the parameters of interest and of nuisance, respectively. Let $u(x, \theta) = (u_\beta(x, \theta)^T, u_\alpha(x, \theta)^T)^T$ be an estimating function for $\theta$. The two components $u_\beta(x, \theta)$ and $u_\alpha(x, \theta)$ are marginal estimating functions for $\beta$ and $\alpha$, that is, estimating functions for $\beta$ with $\alpha$ fixed and for $\alpha$ with $\beta$ fixed, respectively. The following theorem gives a sufficient condition for the inverse phenomenon to occur.

Theorem 1. Assume that the semiparametric model $M = \{p(x; \theta, k)\}$ and the estimating function $u(x, \theta) = (u_\beta(x, \theta)^T, u_\alpha(x, \theta)^T)^T$ satisfy the conditions

(4.2) $\mathrm{E}_{\theta,k}\!\left[s_\beta(x, \theta, k)\, s_\alpha(x, \theta, k)^T\right] = 0$,

(4.3) $s_\alpha(x, \theta, k)$ does not depend on $k$,

(4.4) $u_\alpha(x, \theta) = s_\alpha(x, \theta)$,

for all $\theta$ and $k$, where $s_\beta(x, \theta, k)$ and $s_\alpha(x, \theta, k) = s_\alpha(x, \theta)$ are the score functions for $\beta$ and $\alpha$, respectively. Then the following inequality holds:

(4.5) $\mathrm{Var}_A(\hat{\beta}) \preceq \mathrm{Var}_A(\tilde{\beta})$,

where $\hat{\beta}$ is the estimator of $\beta$ in the joint estimation of $\beta$ and $\alpha$ by $u(x, \theta)$, while $\tilde{\beta}$ is that in the single estimation of $\beta$ by $u_\beta(x, \theta)$ with known $\alpha$. The equality holds if and only if $\mathrm{E}_{\theta,k}[u_\beta(x, \theta)\, s_\alpha(x, \theta)^T] = 0$.

The example in Section 2 fits this theorem as a special case, in which the score function for $\alpha$ depends neither on the infinite-dimensional parameters $h$, $g$ and $f$ nor on the parameter of interest $\beta$. The theorem can be proved by direct calculation of the asymptotic covariance matrices of the estimators $\hat{\beta}$ and $\tilde{\beta}$. However, direct calculation does not sufficiently explain the reason for the inverse phenomenon of asymptotic variances. We therefore use the orthogonal decomposition of estimating functions described in Section 3, which leads to a clear understanding of the structure of the inverse phenomenon. The following is a proof of the above theorem using the orthogonal decomposition.

Firstly, we note that under conditions (4.2) and (4.3) the following equations hold:

(4.6) $\mathrm{E}_{\theta,k}\!\left[s^I_\beta(x, \theta, k)\, s^I_\alpha(x, \theta, k)^T\right] = 0 \quad (\forall \theta, k)$,

(4.7) $s^I_\alpha(x, \theta, k) = s_\alpha(x, \theta)$,

where $s^I_\beta(x, \theta, k)$ and $s^I_\alpha(x, \theta, k)$ are the information score functions for $\beta$ and $\alpha$, respectively. This is because the information score functions for $\beta$ and $\alpha$ are the orthogonal projections of the corresponding score functions onto the space $\bar{H}_\theta$ defined in Section 3, and because the score function for $\alpha$ belongs to $H_\theta$ by (4.3). By equations (4.6) and (4.7), the orthogonal decomposition of the marginal estimating function $u_\beta(x, \theta)$ for $\beta$ can be represented as follows:

(4.8) $u_\beta(x, \theta) = T_\beta(\theta, k_0)\, s^I_\beta(x, \theta, k_0) + T_\alpha(\theta, k_0)\, s_\alpha(x, \theta) + a(x, \theta, k_0)$,

where $k_0$ denotes the true value of $k$. The first term on the right side of (4.8) is the optimal part of $u_\beta(x, \theta)$, while the second and third terms compose the non-optimal part. Here, $s_\alpha(x, \theta)$ and $a(x, \theta, k_0)$ are orthogonal. Now, we consider the estimation of $\theta = (\beta^T, \alpha^T)^T$ by the estimating function $u(x, \theta) = (u_\beta(x, \theta)^T, s_\alpha(x, \theta)^T)^T$. It should be noted that the score-function term for $\alpha$ in the decomposition (4.8) is redundant, due to the presence of $s_\alpha(x, \theta)$ as a marginal estimating function for $\alpha$.

In other words, $u(x, \theta)$ is equivalent to the estimating function $u^*(x, \theta) = (u^*_\beta(x, \theta)^T, s_\alpha(x, \theta)^T)^T$, where

(4.9) $u^*_\beta(x, \theta) = T_\beta(\theta, k_0)\, s^I_\beta(x, \theta, k_0) + a(x, \theta, k_0)$,

in the sense that $u(x, \theta)$ and $u^*(x, \theta)$ give the same estimator. Here, $u^*_\beta(x, \theta)$ is a marginal estimating function for $\beta$ that usually depends on the unknown $k_0$ and hence cannot be used in practice; it is hypothetical, but can be considered theoretically, just like an information score function evaluated at the unknown $k_0$. By the orthogonality of $s_\alpha(x, \theta)$ and $a(x, \theta, k_0)$, together with equations (4.6) and (4.7), $u^*_\beta(x, \theta)$ is orthogonal to $s_\alpha(x, \theta)$. Then, by the following theorem, the asymptotic covariance matrix of the estimator of $\beta$ in the joint estimation of $\beta$ and $\alpha$ by $u^*(x, \theta)$ coincides with that in the single estimation of $\beta$ by $u^*_\beta(x, \theta)$ with known $\alpha$.

Theorem 2 (Insensitivity Theorem). Let $M = \{p(x; \theta, k)\}$ be an arbitrary semiparametric model with a finite-dimensional parameter $\theta = (\beta^T, \alpha^T)^T$ and an infinite-dimensional nuisance parameter $k$. Let $w(x, \theta)$ be an arbitrary estimating function for $\theta$ composed of marginal estimating functions $w_\beta(x, \theta)$ for $\beta$ and $w_\alpha(x, \theta)$ for $\alpha$. If $w_\beta(x, \theta)$ is orthogonal to the score function $s_\alpha(x, \theta, k)$ for $\alpha$, that is,

$\mathrm{E}_{\theta,k}\!\left[w_\beta(x, \theta)\, s_\alpha(x, \theta, k)^T\right] = 0 \quad (\forall \theta, k)$,

then the asymptotic covariance matrix of the estimator of $\beta$ in the joint estimation of $\beta$ and $\alpha$ by $w(x, \theta)$ coincides with that in the single estimation of $\beta$ by $w_\beta(x, \theta)$ with known $\alpha$.

This theorem was shown by Knudsen (1999) in the case of parametric models, but it also holds in the case of semiparametric models. From the above discussion, the asymptotic covariance matrix of $\hat{\beta}$, the estimator of $\beta$ in the joint estimation of $\beta$ and $\alpha$ by $u(x, \theta)$, coincides with that in the single estimation of $\beta$ by $u^*_\beta(x, \theta)$ with known $\alpha$. Hence, by equation (3.9) and the orthogonal decomposition (4.9), the asymptotic covariance matrix of $\hat{\beta}$ is represented as

(4.10) $\mathrm{Var}_A(\hat{\beta}) = \left( G^I_\beta \right)^{-1} + \left( T_\beta G^I_\beta \right)^{-1} G^A \left( T_\beta G^I_\beta \right)^{-T}$,

where $G^I_\beta = \mathrm{E}_{\theta_0,k_0}[s^I_\beta(x, \theta_0, k_0)\, s^I_\beta(x, \theta_0, k_0)^T]$, $G^A = \mathrm{E}_{\theta_0,k_0}[a(x, \theta_0, k_0)\, a(x, \theta_0, k_0)^T]$, $T_\beta = T_\beta(\theta_0, k_0)$, and $\theta_0$ is the true value of $\theta$. On the other hand, according to the decomposition (4.8), the asymptotic covariance matrix of the estimator $\tilde{\beta}$ with known $\alpha$ is represented as

(4.11) $\mathrm{Var}_A(\tilde{\beta}) = \left( G^I_\beta \right)^{-1} + \left( T_\beta G^I_\beta \right)^{-1} \left( T_\alpha G_\alpha T_\alpha^T + G^A \right) \left( T_\beta G^I_\beta \right)^{-T}$,

where $G_\alpha = \mathrm{E}_{\theta_0,k_0}[s_\alpha(x, \theta_0)\, s_\alpha(x, \theta_0)^T]$ and $T_\alpha = T_\alpha(\theta_0, k_0)$. By comparing (4.10) and (4.11), we find that the following inequality holds:

(4.12) $\mathrm{Var}_A(\hat{\beta}) \preceq \mathrm{Var}_A(\tilde{\beta})$.

The equality holds only when $T_\alpha = 0$, because of the positive definiteness of the matrix $G_\alpha$; this is equivalent to the condition $\mathrm{E}_{\theta,k}[u_\beta(x, \theta)\, s_\alpha(x, \theta)^T] = 0$. Thus, Theorem 1 has been proved.
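For intuition on Theorem 2, here is a sketch of the mechanism (assuming the regularity needed to differentiate under the integral sign): differentiating the unbiasedness condition $\mathrm{E}_{\theta,k}[w_\beta(x, \theta)] = 0$ with respect to $\alpha$ links the orthogonality assumption to the block structure of the sandwich formula (3.4).

```latex
% Differentiating E_{theta,k}[w_beta] = 0 in alpha gives
\[
  \mathrm{E}_{\theta,k}\!\left[\frac{\partial w_\beta}{\partial \alpha}\right]
  = -\,\mathrm{E}_{\theta,k}\!\left[w_\beta\, s_\alpha^T\right] = 0
  \quad\text{under the orthogonality assumption,}
\]
% so the matrix W in (3.4) is block triangular,
%   W = ( W_{beta beta}   0            )
%       ( W_{alpha beta}  W_{alpha alpha} ),
% whose inverse has beta-row (W_{beta beta}^{-1}, 0).  Hence
\[
  \left(W^{-1} V W^{-T}\right)_{\beta\beta}
  = W_{\beta\beta}^{-1}\, V_{\beta\beta}\, W_{\beta\beta}^{-T},
\]
% exactly the sandwich obtained by estimating beta alone with alpha known.
```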

In the above discussion, the key point is the orthogonal decomposition (4.8) of $u_\beta(x, \theta)$. Because of the orthogonality of the information score function $s^I_\beta(x, \theta, k)$ for $\beta$ and the score function $s_\alpha(x, \theta)$ for $\alpha$, the non-optimal part of $u_\beta(x, \theta)$ has a component of $s_\alpha(x, \theta)$ unless $u_\beta(x, \theta)$ and $s_\alpha(x, \theta)$ are orthogonal. Therefore, in the single estimation of $\beta$ by $u_\beta(x, \theta)$ with known $\alpha$, there is a loss of asymptotic efficiency which comes from the component of $s_\alpha(x, \theta)$. However, by estimating $\beta$ and $\alpha$ simultaneously with $u(x, \theta) = (u_\beta(x, \theta)^T, s_\alpha(x, \theta)^T)^T$, the component of $s_\alpha(x, \theta)$ vanishes and the asymptotic efficiency improves. It should be noted that the Insensitivity Theorem plays an important role here: it converts the asymptotic efficiency in the joint estimation into that in the single estimation. It should also be noted that if the marginal estimating function for $\beta$ is optimal, the inverse phenomenon does not occur. This holds whether or not the model $M$ and the estimating function $u(x, \theta)$ satisfy conditions (4.2), (4.3) and (4.4), because the lower bound of the asymptotic covariance matrix of $\hat{\beta}$ is not less than that of $\tilde{\beta}$. The inverse phenomenon of asymptotic variances thus indicates that, under some special conditions, the asymptotic efficiency of an estimator can be improved by estimating nuisance parameters in situations where the optimal estimating function cannot be used.

5. Parametric case

In the preceding sections we considered the semiparametric case, but the inverse phenomenon of asymptotic variances can also occur in the parametric case, with essentially the same structure. The discussion in Section 4 applies to the parametric case after a small modification: remove the infinite-dimensional parameter $k$ and replace information score functions with ordinary score functions. For parametric models, the inverse phenomenon of asymptotic variances occurs in the following case. Let $M = \{p(x; \theta)\}$ be a parametric model with a vector of parameters $\theta$ composed of two vectors of parameters: $\beta$ of interest and $\alpha$ of nuisance. We assume that $\beta$ and $\alpha$ are orthogonal, that is,

(5.1) $\mathrm{E}_{\theta}\!\left[s_\beta(x, \theta)\, s_\alpha(x, \theta)^T\right] = 0 \quad (\forall \theta)$,

where $s_\beta(x, \theta)$ and $s_\alpha(x, \theta)$ are the score functions for $\beta$ and $\alpha$, respectively. In this situation we consider the estimation of $\theta = (\beta^T, \alpha^T)^T$ by an estimating function $u(x, \theta) = (u_\beta(x, \theta)^T, s_\alpha(x, \theta)^T)^T$, where $u_\beta(x, \theta)$ is an arbitrary marginal estimating function for $\beta$. Then the following inequality for asymptotic covariance matrices holds:

(5.2) $\mathrm{Var}_A(\hat{\beta}) \preceq \mathrm{Var}_A(\tilde{\beta})$,

where $\hat{\beta}$ is the estimator of $\beta$ in the joint estimation of $\beta$ and $\alpha$ by $u(x, \theta)$ and $\tilde{\beta}$ is that given by $u_\beta(x, \theta)$ when the true value of $\alpha$ is known. The equality holds if and only if the marginal estimating function $u_\beta(x, \theta)$ and the score function $s_\alpha(x, \theta)$ are orthogonal.

When condition (5.1) holds, it is well known that equality holds in (5.2) if $u_\beta(x, \theta)$ coincides with $s_\beta(x, \theta)$. If it does not, however, the asymptotic covariance matrix of the estimator of $\beta$ when $\alpha$ is estimated can be less than when the true value of $\alpha$ is used.

Now we give an example of the inverse phenomenon of asymptotic variances in the parametric case. Let $Y$ and $X$ be a response variable of interest and a vector of covariates, respectively; we assume that both are random variables. A generalized linear model for the conditional distribution of $Y$ given $X = x$ is written as follows:

(5.3) $p(y \mid x; \beta, \phi) = \exp\left\{ \dfrac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right\}$,

(5.4) $g(\mu) = x^T \beta$,

where $\theta$, $\mu$ and $\phi$ denote the natural, mean and dispersion parameters, respectively, and $g$ is a link function. The vector of regression parameters $\beta$ is usually the object of inference. Here, however, we assume that the dispersion parameter $\phi$ is of interest and treat $\beta$ as a vector of nuisance parameters. When we estimate $\beta$ from an observed random sample, the maximum likelihood method is usually used. Estimation of the dispersion parameter $\phi$, however, is not always carried out in the same way; for example, the moment method is often used for various reasons (see, for instance, McCullagh and Nelder (1989, p. 295)). When the maximum likelihood method is applied for $\beta$ and the moment method for $\phi$, the corresponding estimating function is as follows:

(5.5) $u(y, x, \phi, \beta) = \left( u_\phi(y, x, \phi, \beta),\ s_\beta(y, x, \phi, \beta)^T \right)^T = \left( \dfrac{(y - \mu)^2}{V(\mu)} - \phi,\ \ \dfrac{y - \mu}{\phi V(\mu)\, g'(\mu)}\, x^T \right)^T$,

where $V(\mu)$ denotes the variance function. The function $s_\beta(y, x, \phi, \beta)$ is the score function for $\beta$, and the two parameters $\phi$ and $\beta$ are orthogonal. Hence this is a situation in which the inverse phenomenon of asymptotic variances can occur; that is, we observe

(5.6) $\mathrm{Var}_A(\hat{\phi}) \preceq \mathrm{Var}_A(\tilde{\phi})$,

where $\hat{\phi}$ is the estimator of $\phi$ when $\beta$ is estimated and $\tilde{\phi}$ is that when the true value of $\beta$ is used. The condition for equality to hold is

(5.7) $\mathrm{E}\!\left[ \dfrac{V'(\mu)}{V(\mu)\, g'(\mu)}\, X^T \right] = 0$.

If the model (5.3) is a normal distribution, this condition is satisfied because $V(\mu) \equiv 1$ and hence $V'(\mu) = 0$. However, for a gamma distribution, for instance, this condition is not always satisfied, and the inverse phenomenon of asymptotic variances can occur.
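A Monte Carlo sketch of this gamma example: $\phi$ is estimated by the sample version of $u_\phi$ in (5.5) with $V(\mu) = \mu^2$, once at the true $\beta$ and once at the gamma-GLM maximum likelihood estimate. The log link, the covariate design, the parameter values and the Newton fitter are illustrative assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(n, b=(0.5, 1.0), phi=0.5):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    mu = np.exp(X @ np.asarray(b))                  # log link: g(mu) = log mu
    y = rng.gamma(shape=1.0 / phi, scale=phi * mu)  # E[y] = mu, Var[y] = phi mu^2
    return X, y

def fit_beta(X, y, iters=30):
    # gamma-GLM maximum likelihood with log link: solves X^T (y/mu - 1) = 0
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ b)
        grad = X.T @ (y / mu - 1.0)
        hess = -(X * (y / mu)[:, None]).T @ X
        b -= np.linalg.solve(hess, grad)
    return b

def phi_moment(X, y, b):
    # sample version of u_phi in (5.5) with V(mu) = mu^2 for the gamma family
    mu = np.exp(X @ b)
    return np.mean((y - mu) ** 2 / mu ** 2)

b0 = np.array([0.5, 1.0])
known, estimated = [], []
for _ in range(2000):
    X, y = simulate(400)
    known.append(phi_moment(X, y, b0))                  # phi-tilde: beta known
    estimated.append(phi_moment(X, y, fit_beta(X, y)))  # phi-hat: beta estimated
print("variance, beta known:    ", np.var(known))
print("variance, beta estimated:", np.var(estimated))
```

With the log link and gamma variance function, $V'(\mu)/(V(\mu) g'(\mu)) = 2$, so condition (5.7) fails whenever the design contains an intercept, and the second variance should come out smaller.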

Since the dispersion parameter $\phi$ is usually a nuisance parameter, the efficiency of its estimator might be of little concern in practice. Nevertheless, the fact that inequality (5.6) holds for the estimation of $\phi$ is of interest.

6. Concluding remarks

In this paper we have examined the structure of the inverse phenomenon of asymptotic variances using the orthogonal decomposition of estimating functions. If an optimal estimating function can be used as the marginal estimating function for a parameter of interest $\beta$, the asymptotic variance of the estimator with unknown nuisance parameter $\alpha$ cannot be less than that with known $\alpha$. This is reasonable and compatible with our intuition. However, it is no longer guaranteed when the marginal estimating function for $\beta$ is not optimal. In fact, as discussed in Section 4, when the marginal estimating function for $\beta$ has a component of the score function for $\alpha$ in the non-optimal part of its orthogonal decomposition, the asymptotic variance of the estimator of $\beta$ decreases when $\alpha$ is estimated with its score function rather than set to its true value. The inverse phenomenon of asymptotic variances seems strange, at least intuitively, but the discussion of the orthogonal decomposition of estimating functions makes clear how it occurs.

It should be noted that the inverse phenomenon of asymptotic variances can occur, in principle, in both parametric and semiparametric models; it comes from the structure of estimating functions. However, this phenomenon seems to have fewer opportunities to occur in the parametric case than in the semiparametric case. This is because, in the parametric case, the optimal estimating function can be used under moderate regularity conditions, and other estimation methods are not used without some special reason. In the semiparametric case, by contrast, the optimal estimating function generally depends on the unknown true value of the infinite-dimensional nuisance parameters, so it usually cannot be used even when its functional form can be obtained explicitly. In the example of Section 2, if we assume that the error $\epsilon$ and $(S, X)$ are independent, the information score function for the parameter of interest $\beta$ can be calculated explicitly; it depends on the unknown regression function $h$ and the unknown marginal density function $g$ of $\epsilon$. The marginal estimating function for $\beta$ used there is obtained from the information score function by substituting the zero function for $h$ and the normal density with mean zero and constant variance for $g$. Therefore, if $h$ and $g$ actually coincide with these functions, the inverse phenomenon of asymptotic variances never occurs; if not, and especially if $h$ is not the zero function, the inverse phenomenon does occur. Of course, even in the semiparametric case, the results can change if we estimate the infinite-dimensional nuisance parameters by some nonparametric approach. For the model in Section 2, no estimation of the regression function $h$ is intended because it is difficult for epidemiological reasons; instead of estimating $h$, model (2.2) is considered (see Robins et al. (1992)).

The inverse phenomenon of asymptotic variances does not seem to occur in many situations, since the conditions in Theorem 1 are rather restrictive. However, this phenomenon arises naturally in several settings besides the example of Robins et al. (1992): for instance, in problems of missing data (Robins et al. (1994), Lawless et al. (1999)), measurement error (Carroll et al. (1995)) and survey sampling (Rosenbaum (1987)). The inverse phenomenon of asymptotic variances is striking to the statistical community since it defies the common sense of statistical inference. We believe that our viewpoint helps in comprehending this phenomenon.

Acknowledgements

The author is grateful to Professor Eguchi of the Institute of Statistical Mathematics for his valuable advice and encouragement. The author would also like to thank the referees for their helpful comments, which led to an improved manuscript.

References

Amari, S. and Kawanabe, M. (1997). Information geometry of estimating functions in semiparametric statistical models, Bernoulli, 3.
Carroll, R. J., Ruppert, D. and Stefanski, L. A. (1995). Measurement Error in Nonlinear Models, Chapman and Hall, London.
Fourdrinier, D. and Strawderman, W. E. (1996). A paradox concerning shrinkage estimators: should a known scale parameter be replaced by an estimated value in the shrinkage factor? J. Multivar. Anal., 59.
Godambe, V. P. (ed.) (1991). Estimating Functions, Oxford University Press, New York.
Knudsen, S. J. (1999). Estimating Functions and Separate Inference, Monographs Vol. 1, Dept. of Statistics and Demography, University of Southern Denmark.
Lawless, J. F., Kalbfleisch, J. D. and Wild, C. J. (1999). Semiparametric methods for response-selective and missing data problems in regression, J. R. Statist. Soc. B, 61.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, Chapman and Hall, London.
Robins, J. M., Mark, S. D. and Newey, W. K. (1992). Estimating exposure effects by modelling the expectation of exposure conditional on confounders, Biometrics, 48.
Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed, J. Am. Statist. Ass., 89.
Rosenbaum, P. R. (1987). Model-based direct adjustment, J. Am. Statist. Ass., 82.


Graduate Econometrics I: Maximum Likelihood II Graduate Econometrics I: Maximum Likelihood II Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Mathematical statistics

Mathematical statistics October 4 th, 2018 Lecture 12: Information Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Accounting for Baseline Observations in Randomized Clinical Trials

Accounting for Baseline Observations in Randomized Clinical Trials Accounting for Baseline Observations in Randomized Clinical Trials Scott S Emerson, MD, PhD Department of Biostatistics, University of Washington, Seattle, WA 9895, USA October 6, 0 Abstract In clinical

More information

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics

More information

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Johns Hopkins University, Dept. of Biostatistics Working Papers 3-3-2011 SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Michael Rosenblum Johns Hopkins Bloomberg

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Minimax Estimation of a nonlinear functional on a structured high-dimensional model

Minimax Estimation of a nonlinear functional on a structured high-dimensional model Minimax Estimation of a nonlinear functional on a structured high-dimensional model Eric Tchetgen Tchetgen Professor of Biostatistics and Epidemiologic Methods, Harvard U. (Minimax ) 1 / 38 Outline Heuristics

More information

DS-GA 1002 Lecture notes 12 Fall Linear regression

DS-GA 1002 Lecture notes 12 Fall Linear regression DS-GA Lecture notes 1 Fall 16 1 Linear models Linear regression In statistics, regression consists of learning a function relating a certain quantity of interest y, the response or dependent variable,

More information

1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ).

1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Estimation February 3, 206 Debdeep Pati General problem Model: {P θ : θ Θ}. Observe X P θ, θ Θ unknown. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Examples: θ = (µ,

More information

Regression I: Mean Squared Error and Measuring Quality of Fit

Regression I: Mean Squared Error and Measuring Quality of Fit Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving

More information

TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random

TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random Paul J. Rathouz University of Chicago Abstract. We consider the problem of attrition under a logistic

More information

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

In English, this means that if we travel on a straight line between any two points in C, then we never leave C. Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from

More information

Approximating the Best Linear Unbiased Estimator of Non-Gaussian Signals with Gaussian Noise

Approximating the Best Linear Unbiased Estimator of Non-Gaussian Signals with Gaussian Noise IEICE Transactions on Information and Systems, vol.e91-d, no.5, pp.1577-1580, 2008. 1 Approximating the Best Linear Unbiased Estimator of Non-Gaussian Signals with Gaussian Noise Masashi Sugiyama (sugi@cs.titech.ac.jp)

More information

Questions and Answers on Unit Roots, Cointegration, VARs and VECMs

Questions and Answers on Unit Roots, Cointegration, VARs and VECMs Questions and Answers on Unit Roots, Cointegration, VARs and VECMs L. Magee Winter, 2012 1. Let ɛ t, t = 1,..., T be a series of independent draws from a N[0,1] distribution. Let w t, t = 1,..., T, be

More information

Chapter 3 Transformations

Chapter 3 Transformations Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

Quick Review on Linear Multiple Regression

Quick Review on Linear Multiple Regression Quick Review on Linear Multiple Regression Mei-Yuan Chen Department of Finance National Chung Hsing University March 6, 2007 Introduction for Conditional Mean Modeling Suppose random variables Y, X 1,

More information

arxiv: v1 [stat.me] 15 May 2011

arxiv: v1 [stat.me] 15 May 2011 Working Paper Propensity Score Analysis with Matching Weights Liang Li, Ph.D. arxiv:1105.2917v1 [stat.me] 15 May 2011 Associate Staff of Biostatistics Department of Quantitative Health Sciences, Cleveland

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information