A PARADOXICAL EFFECT OF NUISANCE PARAMETERS ON EFFICIENCY OF ESTIMATORS
J. Japan Statist. Soc. Vol. 34 No.

A PARADOXICAL EFFECT OF NUISANCE PARAMETERS ON EFFICIENCY OF ESTIMATORS

Masayuki Henmi*

This paper is concerned with parameter estimation in the presence of nuisance parameters. Usually, an estimator with known nuisance parameters is better than one with unknown nuisance parameters in terms of the asymptotic variance. However, it has been noted that the opposite can occur in some situations. In this paper we elucidate when and how this phenomenon occurs, using the orthogonal decomposition of estimating functions. Most examples of this phenomenon are found in semiparametric models, but it can also occur in parametric models. As an example, we consider the estimation of the dispersion parameter in a generalized linear model.

Key words and phrases: Asymptotic variance, estimating function, nuisance parameter, optimality, orthogonal decomposition, semiparametric model.

1. Introduction

In a statistical model with a number of parameters, often only a portion of the parameters are of interest; the rest are nuisance parameters. Let M = {p(x; β, α)} be a parametric model whose elements are specified by a vector of parameters of interest β and a vector of nuisance parameters α. Then it is well known that under some regularity conditions the following inequality holds,

(1.1)  Var_A(β̃) ⪯ Var_A(β̂),

where β̃ and β̂ are the maximum likelihood estimators of β with known and unknown α, respectively, and Var_A denotes the asymptotic covariance matrix of an estimator. For two symmetric matrices A and B, A ⪯ B indicates that B − A is a positive semi-definite matrix. However, inequality (1.1) is not always observed if we do not use the maximum likelihood method. Let M = {p(x; β, α, k)} be a semiparametric model with an infinite-dimensional nuisance parameter k as well as a vector of parameters of interest β and a vector of nuisance parameters α.
Then, in a certain special case, we can observe the inequality opposite to (1.1),

Var_A(β̂) ⪯ Var_A(β̃),

when β is estimated by an estimating function depending on α. Here, β̃ and β̂ are the estimators of β when α is known and when α is unknown and estimated, respectively. In other words, the estimator with unknown nuisance parameters is better

Received November 3. Revised February 22. Accepted March 24.
*Department of Statistical Science, the Graduate University for Advanced Studies, Minami-azabu, Minato-ku, Tokyo, Japan.
than that with known ones with respect to the asymptotic variance. We call this unusual phenomenon the inverse phenomenon of asymptotic variances. For example, Robins et al. (1992) proposed a semiparametric model for causal inference and pointed out that this phenomenon can occur in their model. These kinds of phenomena have also been noted in some other situations (Robins et al. (1994), Lawless et al. (1999)); see also Fourdrinier and Strawderman (1996) for shrinkage estimation.

The aim of this paper is to explore the structure of the inverse phenomenon of asymptotic variances systematically by examining estimating functions. Specifically, we focus on the orthogonal decomposition of estimating functions, obtained by decomposing an estimating function into a component in the space of optimal estimating functions and a component in its orthogonal complement. Here, an optimal estimating function is one whose estimator attains the minimum asymptotic variance among all estimating functions. The inverse phenomenon of asymptotic variances can occur when the estimating function for the parameters of interest with known nuisance parameters is not optimal. Considering the orthogonal decomposition of estimating functions helps us elucidate how estimating nuisance parameters improves the asymptotic variance of estimators of the parameters of interest.

This paper is organized as follows. In Section 2 we introduce the semiparametric model proposed by Robins et al. (1992) as an illustrative example. In Section 3 we describe the orthogonal decomposition of estimating functions for semiparametric models. Section 4 examines the structure of the inverse phenomenon using this orthogonal decomposition. In Section 5 the parametric case is considered; the inverse phenomenon of asymptotic variances can also occur in parametric models if the maximum likelihood estimation method is not used.
Its structure is essentially the same as in the semiparametric case. As an example we consider estimation of the dispersion parameter in a generalized linear model. Finally, in Section 6 we give some concluding remarks.

2. Illustrative example

In this section, we give an illustrative example of the inverse phenomenon of asymptotic variances. Suppose we would like to estimate the causal effect of an exposure or treatment on an outcome of interest. As is widely known, if we ignore the effect of confounding factors that both covary with the exposure or treatment and are independent predictors of the outcome, the estimate of the causal effect is biased. Let Y, S and X = (X_2, ..., X_K) be, respectively, a continuous outcome variable of interest, an indicator of exposure that takes the value 1 when the subject is exposed and 0 otherwise, and a vector of confounding variables. The following model, proposed by Robins et al. (1992), is a semiparametric regression model for estimating the causal effect by adjusting for confounding factors,

(2.1)  Y = βS + h(X) + ɛ,  E[ɛ | S, X] = 0,
(2.2)  P(S = 1 | X) = exp(α_1 + Σ_{k=2}^K α_k X_k) / {1 + exp(α_1 + Σ_{k=2}^K α_k X_k)},

where h(x) is an unknown real-valued function of X, and α = (α_1, α_2, ..., α_K) is an unknown vector of nuisance parameters. The parameter β represents the average causal effect of the exposure or treatment on the outcome when a certain condition is satisfied; this condition has nothing to do with the estimation of β, so we omit it here (see Robins et al. (1992)). Next, we let {(Y_i, S_i, X_i)}_{i=1}^n be a random sample, that is, a set of independent and identically distributed random vectors under the above model. Robins et al. (1992) also proposed an estimating equation for β as follows,

(2.3)  Σ_{i=1}^n U(Y_i, S_i, X_i, β, α̂) = 0,

where α̂ is the maximum likelihood estimator of α from the logistic regression model (2.2) and

U(y, s, x, β, α) = {s − r(x; α)}(y − βs),
r(x; α) = exp(α_1 + Σ_{k=2}^K α_k x_k) / {1 + exp(α_1 + Σ_{k=2}^K α_k x_k)}.

When the model is correct, the estimator β̂ of β, which is the solution of the estimating equation (2.3), is consistent and asymptotically normal under some regularity conditions. In addition, its asymptotic variance is calculated as

(2.4)  Var_A(β̂) = Var_A(β̃) − (Q⁻¹P) J⁻¹ (Q⁻¹P)ᵀ,

where β̃ is the estimator of β with the true value α_0 of α treated as known, that is, the solution of (2.3) with α̂ replaced by α_0, and

P = E[(∂U/∂α)(Y, S, X, β, α)],  Q = E[(∂U/∂β)(Y, S, X, β, α)],
J = E[M(S, X, α) M(S, X, α)ᵀ],  M(s, x, α) = (∂/∂α) log[r(x; α)^s {1 − r(x; α)}^{1−s}].

For a matrix A, Aᵀ denotes the transpose of A. Then we find that the following inequality holds,

(2.5)  Var_A(β̂) ⪯ Var_A(β̃),
since J is a positive definite matrix in equation (2.4). The equality holds if, and only if, P = 0, that is, E[h(X) r_α(X; α)] = 0, where r_α = ∂r/∂α. One might feel that this is strange: inequality (2.5) implies that a more precise estimate of β may be obtained by estimating the nuisance parameter α than by using the true value of α, even if the latter were known. This phenomenon was pointed out by Robins et al. (1992), who emphasized that the result depends on the fact that α̂ is an efficient estimator of α. In the following sections, we examine the structure of the inverse phenomenon of asymptotic variances using the orthogonal decomposition of estimating functions. It will also be made clear what role the efficiency of α̂ plays in the inverse phenomenon.

3. The orthogonal decomposition of estimating functions

In this section we describe the orthogonal decomposition of estimating functions for semiparametric models, which is the key notion for understanding the structure of the phenomenon mentioned above from our point of view. Let M = {p(x; θ, k)} be a semiparametric statistical model, that is, a family of probability density functions with respect to a common dominating measure µ(dx), whose elements are specified by a finite-dimensional parameter θ = (θ_1, ..., θ_m)ᵀ and an infinite-dimensional parameter k, typically lying in a space of functions. Here, θ contains a parameter of interest and k is a nuisance parameter. Let u(x, θ) = (u_1(x, θ), ..., u_m(x, θ))ᵀ be a vector-valued smooth function of θ, not depending on k, and of the same dimension as θ. This function is called an estimating function for θ when it satisfies the following conditions (Godambe (1991, p. 13)),

(3.1)  E_{θ,k}[u(x, θ)] = 0,  E_{θ,k}[‖u(x, θ)‖²] < ∞,
(3.2)  det E_{θ,k}[(∂u/∂θ)(x, θ)] ≠ 0

for all θ and k, where E_{θ,k} denotes the expectation with respect to the distribution p(x; θ, k), det denotes the determinant of a matrix, and ‖·‖ is the Euclidean norm of a vector.
Moreover, we assume that ∫ u(x, θ) p(x; θ, k) µ(dx) is differentiable with respect to θ and that differentiation and integration are interchangeable. When an estimating function u(x, θ) exists, we obtain an estimator θ̂ of θ as the solution of the following estimating equation:

(3.3)  Σ_{i=1}^n u(x_i, θ) = 0,

where x_1, ..., x_n are n independent and identically distributed observations. The estimator θ̂ is often called an M-estimator. Under some regularity conditions, it is consistent and asymptotically normally distributed with asymptotic covariance matrix

(3.4)  Var_A(θ̂) = W⁻¹ V W⁻ᵀ,
where V = E_{θ,k}[u(x, θ) u(x, θ)ᵀ] and W = E_{θ,k}[(∂u/∂θ)(x, θ)].

Now, under the above setting, we describe the orthogonal decomposition of estimating functions. Consider the set of random variables defined by

(3.5)  H_{θ,k} = { a(x) : E_{θ,k}[a(x)] = 0, E_{θ,k}[a(x)²] < ∞ }.

This is a Hilbert space with the inner product ⟨a(x), b(x)⟩_{θ,k} = E_{θ,k}[a(x) b(x)] for any two random variables a(x), b(x) ∈ H_{θ,k}. Then, condition (3.1) for estimating functions can be represented as

(3.6)  u_i(x, θ) ∈ H_θ  for all i and θ,

where H_θ denotes the intersection of H_{θ,k} over all k. We assume that all components of the score function s(x, θ, k) for θ belong to H_{θ,k}, and we let s^I(x, θ, k) be the vector comprised of the orthogonal projections of the components of s(x, θ, k) onto H̄_θ, the closure of H_θ with respect to the topology of H_{θ,k}. Then the space H̄_θ can be decomposed as

(3.7)  H̄_θ = F^I_{θ,k} ⊕ F^A_{θ,k},

where F^I_{θ,k} denotes the linear space spanned by the components of s^I(x, θ, k) and F^A_{θ,k} denotes the orthogonal complement of F^I_{θ,k} in H̄_θ. We call the vector-valued function s^I(x, θ, k) the information score function for θ and assume that its components are linearly independent. According to (3.6) and (3.7), any estimating function u(x, θ) can be represented in the following form for all k:

(3.8)  u(x, θ) = T(θ, k) s^I(x, θ, k) + a(x, θ, k),

where T(θ, k) is an m × m matrix and a(x, θ, k) is a vector-valued function whose components belong to F^A_{θ,k}. Moreover, by condition (3.2) the orthogonal projections of the components of u(x, θ) onto F^I_{θ,k} are linearly independent, and therefore T(θ, k) is non-singular. Representation (3.8) is what we call the orthogonal decomposition of estimating functions for semiparametric models in this paper. This kind of decomposition has often been treated in the literature on estimating functions.
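In practice the sandwich formula (3.4) is estimated by replacing V and W with sample averages at the fitted θ̂. The following sketch (not from the paper; the function names and the toy mean-estimation check are illustrative assumptions) shows the computation for a generic M-estimator:

```python
import numpy as np

def sandwich_avar(u, du, theta_hat, data):
    """Empirical sandwich estimate of the asymptotic covariance in (3.4):
    Var_A = W^{-1} V W^{-T}, with V = E[u u^T] and W = E[du/dtheta]."""
    U = np.array([u(x, theta_hat) for x in data])           # n x m matrix of scores
    V = U.T @ U / len(data)                                 # sample analogue of V
    W = np.mean([du(x, theta_hat) for x in data], axis=0)   # sample analogue of W
    W_inv = np.linalg.inv(W)
    return W_inv @ V @ W_inv.T

# Toy check: u(x, theta) = x - theta estimates a mean, and the sandwich
# then reduces to the population variance of x (here 3^2 = 9).
rng = np.random.default_rng(1)
x = rng.normal(2.0, 3.0, size=5000)
u = lambda xi, th: np.array([xi - th[0]])
du = lambda xi, th: np.array([[-1.0]])
avar = sandwich_avar(u, du, np.array([x.mean()]), x)
print(float(avar[0, 0]))   # close to 9
```

Note that (3.4) is the per-observation asymptotic covariance, i.e. the covariance of √n(θ̂ − θ); divide by n to approximate the covariance of θ̂ itself.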
In particular, Amari and Kawanabe (1997) consider the characterization of the orthogonal decomposition (3.7) from an information-geometric point of view; the term "information score function" is due to them. In the decomposition (3.8), the parameter k is fixed at an arbitrary possible value, and different expressions of an estimating function u(x, θ) are obtained for different values of k. When, in particular, we set k = k_0, the value of k corresponding to the unknown underlying distribution in M that generates the data, the asymptotic covariance matrix of the estimator θ̂ solving the estimating equation (3.3) is calculated as follows (Amari and Kawanabe (1997)),

(3.9)  Var_A(θ̂) = (G^I)⁻¹ + (T G^I)⁻¹ G^A (T G^I)⁻ᵀ,
where G^I = E_{θ_0,k_0}[s^I(x, θ_0, k_0) s^I(x, θ_0, k_0)ᵀ], G^A = E_{θ_0,k_0}[a(x, θ_0, k_0) a(x, θ_0, k_0)ᵀ], T = T(θ_0, k_0), and θ_0 denotes the true value of θ. In equation (3.9), G^A is a positive semi-definite matrix. Hence Var_A(θ̂) ⪰ (G^I)⁻¹, and the equality holds only when G^A = 0. This implies that if s^I(x, θ, k_0) satisfies the conditions for an estimating function, it is an optimal estimating function in the sense that the asymptotic covariance matrix of its estimator is minimal among all estimating functions. Note, however, that s^I(x, θ, k_0) generally cannot be used, since it usually depends on the unknown true value k_0 of k. According to the above discussion, the orthogonal decomposition of estimating functions represents how an estimating function falls short of optimality. We therefore call the first and second terms on the right side of the orthogonal decomposition (3.8) the optimal and non-optimal parts of u(x, θ), respectively.

4. The inverse phenomenon of asymptotic variances

In this section we examine the structure of the inverse phenomenon of asymptotic variances. The model in the example of Section 2 is a semiparametric model with both finite- and infinite-dimensional nuisance parameters. In fact, under (2.1) and (2.2), the joint probability density function of the observed variables Y, S and X can be written as follows:

(4.1)  p_{YSX}(y, s, x; β, α, h, g, f) = g(y − βs − h(x) | s, x) p_{S|X}(s | x; α) f(x),

where g(ɛ | s, x) denotes the conditional density function of the error ɛ given S = s and X = x, p_{S|X}(s | x; α) denotes the conditional probability function of S given X = x, written as {r(x; α)}^s {1 − r(x; α)}^{1−s} from (2.2), and f(x) denotes the marginal density function of X. While the parameter β is of interest, α is a finite-dimensional nuisance parameter, and the functions h, g and f play the role of infinite-dimensional nuisance parameters.
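The inverse phenomenon in the model (2.1)–(2.2) can also be checked by simulation. The sketch below (not from the paper) uses a single standard normal confounder X, h(x) = 2x, standard normal errors, and β = 1, α = (0, 1); these concrete choices, and all function names, are illustrative assumptions. Since h is not zero, P ≠ 0 in (2.4) and the estimator using the fitted α̂ should have the smaller variance:

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_logistic(S, X, iters=15):
    # Newton-Raphson MLE for the logistic model P(S=1|X) = expit(a1 + a2*X)
    Z = np.column_stack([np.ones_like(X), X])
    a = np.zeros(2)
    for _ in range(iters):
        p = expit(Z @ a)
        grad = Z.T @ (S - p)
        hess = (Z * (p * (1 - p))[:, None]).T @ Z
        a = a + np.linalg.solve(hess, grad)
    return a

def solve_beta(Y, S, X, alpha):
    # solve sum_i {s_i - r(x_i; alpha)}(y_i - beta * s_i) = 0 for beta, as in (2.3)
    r = expit(alpha[0] + alpha[1] * X)
    return np.sum((S - r) * Y) / np.sum((S - r) * S)

rng = np.random.default_rng(0)
beta0, alpha0 = 1.0, np.array([0.0, 1.0])
n, reps = 300, 1500
known, est = [], []
for _ in range(reps):
    X = rng.standard_normal(n)
    S = (rng.random(n) < expit(alpha0[0] + alpha0[1] * X)).astype(float)
    Y = beta0 * S + 2.0 * X + rng.standard_normal(n)     # h(x) = 2x, so P != 0
    known.append(solve_beta(Y, S, X, alpha0))            # beta-tilde: alpha known
    est.append(solve_beta(Y, S, X, fit_logistic(S, X)))  # beta-hat: alpha estimated
print(np.var(known), np.var(est))   # variance with estimated alpha is smaller
```

Both estimators are centered near the true β; the Monte Carlo variance of β̂ (estimated α) is markedly below that of β̃ (known α), as inequality (2.5) predicts.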
The inverse phenomenon of asymptotic variances is the phenomenon in which the asymptotic variance of the estimator of β with unknown α is less than that with known α. As shown in Section 2, this phenomenon can occur under the above model when α is estimated by the maximum likelihood method and β is estimated by the estimating function U(y, s, x, β, α), which depends on α. This suggests that, in general, the inverse phenomenon of asymptotic variances occurs under some special conditions. Let M = {p(x; θ, k)} be a semiparametric model with a finite-dimensional parameter θ = (βᵀ, αᵀ)ᵀ and an infinite-dimensional nuisance parameter k. Here, β and α are the parameters of interest and of nuisance, respectively. Let u(x, θ) = (u_β(x, θ)ᵀ, u_α(x, θ)ᵀ)ᵀ be an estimating function for θ. The two components u_β(x, θ) and u_α(x, θ) are marginal estimating functions for β and α, that is, estimating functions for β with α fixed and for α with β fixed, respectively. The following theorem gives one sufficient condition for the inverse phenomenon to occur.

Theorem 1. Assume that the semiparametric model M = {p(x; θ, k)} and
the estimating function u(x, θ) = (u_β(x, θ)ᵀ, u_α(x, θ)ᵀ)ᵀ satisfy the conditions

(4.2)  E_{θ,k}[s_β(x, θ, k) s_α(x, θ, k)ᵀ] = 0,
(4.3)  s_α(x, θ, k) does not depend on k,
(4.4)  u_α(x, θ) = s_α(x, θ)

for all (θ, k), where s_β(x, θ, k) and s_α(x, θ, k) = s_α(x, θ) are the score functions for β and α, respectively. Then the following inequality holds:

(4.5)  Var_A(β̂) ⪯ Var_A(β̃),

where β̂ is the estimator of β in the joint estimation of β and α by u(x, θ), while β̃ is that in the single estimation of β by u_β(x, θ) with known α. The equality holds if, and only if, E_{θ,k}[u_β(x, θ) s_α(x, θ)ᵀ] = 0.

The example in Section 2 fits this theorem as a special case, in which the score function for α depends neither on the infinite-dimensional parameters h, g and f nor on the parameter of interest β. The theorem can be proved by direct calculation of the asymptotic covariance matrices of the estimators β̂ and β̃. However, such a direct calculation does not sufficiently explain the reason for the inverse phenomenon of asymptotic variances. We therefore consider the orthogonal decomposition of estimating functions described in Section 3, which leads to a clear understanding of the structure of the inverse phenomenon. The following is a proof of the theorem using the orthogonal decomposition.

Firstly, we note that under conditions (4.2) and (4.3) the following equations hold:

(4.6)  E_{θ,k}[s^I_β(x, θ, k) s^I_α(x, θ, k)ᵀ] = 0,
(4.7)  s^I_α(x, θ, k) = s_α(x, θ)

for all (θ, k), where s^I_β(x, θ, k) and s^I_α(x, θ, k) are the information score functions for β and α, respectively. This is because the information score functions for β and α are the orthogonal projections of the corresponding score functions onto the space H̄_θ defined in Section 3, and because the score function for α belongs to H_θ due to (4.3).
By equations (4.6) and (4.7), the orthogonal decomposition of the marginal estimating function u_β(x, θ) for β can be represented as follows:

(4.8)  u_β(x, θ) = T_β(θ, k_0) s^I_β(x, θ, k_0) + T_α(θ, k_0) s_α(x, θ) + a(x, θ, k_0),

where k_0 denotes the true value of k. The first term on the right side of (4.8) is the optimal part of u_β(x, θ), while the second and third terms compose the non-optimal part. Here, s_α(x, θ) and a(x, θ, k_0) are orthogonal. Now, we consider the estimation of θ = (βᵀ, αᵀ)ᵀ by the estimating function u(x, θ) = (u_β(x, θ)ᵀ, s_α(x, θ)ᵀ)ᵀ. It should be noted that the score-function term for α in the decomposition (4.8) is redundant, owing to the presence of s_α(x, θ) as
a marginal estimating function for α. In other words, u(x, θ) is equivalent to the estimating function u*(x, θ) = (u*_β(x, θ)ᵀ, s_α(x, θ)ᵀ)ᵀ, where

(4.9)  u*_β(x, θ) = T_β(θ, k_0) s^I_β(x, θ, k_0) + a(x, θ, k_0),

in the sense that u(x, θ) and u*(x, θ) give the same estimator. Here, u*_β(x, θ) is a marginal estimating function for β that usually depends on the unknown k_0 and cannot be used in practice. It is hypothetical, but can be considered theoretically, just like an information score function evaluated at the unknown k_0. By the orthogonality of s_α(x, θ) and a(x, θ, k_0), together with equations (4.6) and (4.7), u*_β(x, θ) is orthogonal to s_α(x, θ). Then, by the following theorem, the asymptotic covariance matrix of the estimator of β in the joint estimation of β and α by u*(x, θ) coincides with that in the single estimation of β by u*_β(x, θ) with known α.

Theorem 2 (Insensitivity Theorem). Let M = {p(x; θ, k)} be an arbitrary semiparametric model with a finite-dimensional parameter θ = (βᵀ, αᵀ)ᵀ and an infinite-dimensional nuisance parameter k. Let w(x, θ) be an arbitrary estimating function for θ composed of marginal estimating functions w_β(x, θ) for β and w_α(x, θ) for α. If w_β(x, θ) is orthogonal to the score function s_α(x, θ, k) for α, that is,

E_{θ,k}[w_β(x, θ) s_α(x, θ, k)ᵀ] = 0  for all (θ, k),

then the asymptotic covariance matrix of the estimator of β in the joint estimation of β and α by w(x, θ) coincides with that in the single estimation of β by w_β(x, θ) with known α.

This theorem was shown by Knudsen (1999) for parametric models, but it also holds for semiparametric models. From the above discussion, the asymptotic covariance matrix of β̂, the estimator of β in the joint estimation of β and α by u(x, θ), coincides with that in the single estimation of β by u*_β(x, θ) with known α.
Hence, by equation (3.9) and the orthogonal decomposition (4.9), the asymptotic covariance matrix of β̂ is represented as

(4.10)  Var_A(β̂) = (G^I_β)⁻¹ + (T_β G^I_β)⁻¹ G^A (T_β G^I_β)⁻ᵀ,

where G^I_β = E_{θ_0,k_0}[s^I_β(x, θ_0, k_0) s^I_β(x, θ_0, k_0)ᵀ], G^A = E_{θ_0,k_0}[a(x, θ_0, k_0) a(x, θ_0, k_0)ᵀ], T_β = T_β(θ_0, k_0), and θ_0 is the true value of θ. On the other hand, according to the decomposition (4.8), the asymptotic covariance matrix of the estimator β̃ with known α is represented as

(4.11)  Var_A(β̃) = (G^I_β)⁻¹ + (T_β G^I_β)⁻¹ (T_α G_α T_αᵀ + G^A) (T_β G^I_β)⁻ᵀ,

where G_α = E_{θ_0,k_0}[s_α(x, θ_0) s_α(x, θ_0)ᵀ] and T_α = T_α(θ_0, k_0). By comparing (4.10) and (4.11), we find that the following inequality holds:

(4.12)  Var_A(β̂) ⪯ Var_A(β̃).
The equality holds only when T_α = 0, because the matrix G_α is positive definite; this is equivalent to the condition E_{θ,k}[u_β(x, θ) s_α(x, θ)ᵀ] = 0. Thus, Theorem 1 has been proved.

In the above discussion, the key point is the orthogonal decomposition (4.8) of u_β(x, θ). Because the information score function s^I_β(x, θ, k) for β and the score function s_α(x, θ) for α are orthogonal, the non-optimal part of u_β(x, θ) has a component along s_α(x, θ) unless u_β(x, θ) and s_α(x, θ) are orthogonal. Therefore, in the single estimation of β by u_β(x, θ) with known α, there is a loss of asymptotic efficiency that comes from the component along s_α(x, θ). By estimating β and α simultaneously with u(x, θ) = (u_β(x, θ)ᵀ, s_α(x, θ)ᵀ)ᵀ, however, this component vanishes and the asymptotic efficiency is improved. The Insensitivity Theorem plays an important role here: it converts the asymptotic efficiency in the joint estimation into that in the single estimation. It should also be noted that if the marginal estimating function for β is optimal, the inverse phenomenon does not occur. This holds whether or not the model M and the estimating function u(x, θ) satisfy conditions (4.2), (4.3) and (4.4), because the asymptotic covariance matrix of β̂ cannot be less than the lower bound (G^I_β)⁻¹, which β̃ attains when u_β(x, θ) is optimal. The inverse phenomenon of asymptotic variances thus indicates that, in situations where the optimal estimating function cannot be used, the asymptotic efficiency of the estimator can be improved by estimating nuisance parameters under some special conditions.

5. Parametric case

In the preceding sections we considered the semiparametric case, but the inverse phenomenon of asymptotic variances can also occur in the parametric case. The structure is essentially the same as in the semiparametric case.
The discussion in Section 4 applies to the parametric case with a small modification: remove the infinite-dimensional parameter k and replace information score functions with ordinary score functions. For parametric models, the inverse phenomenon of asymptotic variances occurs in the following case. Let M = {p(x; θ)} be a parametric model with a vector of parameters θ composed of two sub-vectors, β of interest and α of nuisance. We assume that β and α are orthogonal, that is,

(5.1)  E_θ[s_β(x, θ) s_α(x, θ)ᵀ] = 0  for all θ,

where s_β(x, θ) and s_α(x, θ) are the score functions for β and α, respectively. In this situation we consider the estimation of θ = (βᵀ, αᵀ)ᵀ by an estimating function u(x, θ) = (u_β(x, θ)ᵀ, s_α(x, θ)ᵀ)ᵀ, where u_β(x, θ) is an arbitrary marginal estimating function for β. Then, the following inequality for asymptotic covariance matrices holds:

(5.2)  Var_A(β̂) ⪯ Var_A(β̃),

where β̂ is the estimator of β in the joint estimation of β and α by u(x, θ) and β̃ is that given by u_β(x, θ) when the true value of α is known. The equality holds
if, and only if, the marginal estimating function u_β(x, θ) and the score function s_α(x, θ) are orthogonal. When condition (5.1) holds, it is well known that the equality in (5.2) holds if u_β(x, θ) coincides with s_β(x, θ). If it does not, however, the asymptotic covariance matrix of the estimator of β when α is estimated can be less than when the true value of α is used.

Now, we give one example of the inverse phenomenon of asymptotic variances in the parametric case. Let Y and X be a response variable of interest and a vector of covariates, respectively, both assumed to be random. A generalized linear model for the conditional distribution of Y given X = x is written as follows:

(5.3)  p(y | x; β, φ) = exp{ (yθ − b(θ))/φ + c(y, φ) },
(5.4)  g(µ) = xᵀβ,

where θ, µ and φ denote the natural, mean and dispersion parameters, respectively, and g is a link function. The vector of regression parameters β is usually the object of inference. Here, however, we assume that the dispersion parameter φ is of interest and treat β as a vector of nuisance parameters. When we estimate β from an observed random sample, the maximum likelihood method is usually used, but estimation of the dispersion parameter φ is not always treated the same way; for example, the moment method is often used for various reasons (see, for instance, McCullagh and Nelder (1989, p. 295)). When the maximum likelihood method is applied to β and the moment method to φ, the corresponding estimating function is as follows:

(5.5)  u(y, x, φ, β) = ( u_φ(y, x, φ, β), s_β(y, x, φ, β)ᵀ )ᵀ = ( (y − µ)²/V(µ) − φ,  {(y − µ)/(φ V(µ) g′(µ))} xᵀ )ᵀ,

where V(µ) denotes the variance function. The function s_β(y, x, φ, β) is the score function for β and, in addition, the two parameters φ and β are orthogonal.
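The behaviour of the moment estimator of φ in (5.5) can be examined by simulation. The sketch below (not from the paper; the gamma family with log link, the parameter values, and all function names are illustrative assumptions) compares the moment estimator of φ when β is known with that when β is fitted by maximum likelihood via IRLS:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, phi0 = 200, 1500, 0.5
beta0 = np.array([0.0, 0.5])

def fit_gamma_loglink(y, Z, iters=30):
    # IRLS (Fisher scoring) for a gamma GLM with log link; the working
    # weights (dmu/deta)^2 / V(mu) = mu^2 / mu^2 equal 1, so each step
    # is an ordinary least-squares fit of the working response on Z.
    b = np.zeros(Z.shape[1])
    for _ in range(iters):
        eta = np.clip(Z @ b, -20.0, 20.0)
        mu = np.exp(eta)
        z = eta + (y - mu) / mu              # working response
        b, *_ = np.linalg.lstsq(Z, z, rcond=None)
    return b

known, est = [], []
for _ in range(reps):
    x = rng.standard_normal(n)
    Z = np.column_stack([np.ones(n), x])
    mu = np.exp(Z @ beta0)
    y = rng.gamma(shape=1.0 / phi0, scale=phi0 * mu)      # Var(y) = phi * mu^2
    known.append(np.mean((y - mu) ** 2 / mu ** 2))        # phi-tilde: beta known
    mu_hat = np.exp(Z @ fit_gamma_loglink(y, Z))
    est.append(np.mean((y - mu_hat) ** 2 / mu_hat ** 2))  # phi-hat: beta estimated
print(np.var(known), np.var(est))
```

For the gamma family with log link, V(µ) = µ² and g′(µ) = 1/µ, so V′(µ)/(V(µ)g′(µ)) = 2 and the intercept component of the equality condition fails; the Monte Carlo variance with β estimated accordingly comes out smaller than with β known.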
Hence, this is a situation in which the inverse phenomenon of asymptotic variances can occur; that is, we observe

(5.6)  Var_A(φ̂) ⪯ Var_A(φ̃),

where φ̂ is the estimator of φ when β is estimated and φ̃ is that when the true value of β is used. The condition for the equality to hold is as follows:

(5.7)  E[ {V′(µ)/(V(µ) g′(µ))} Xᵀ ] = 0.

If the model (5.3) is a normal distribution, this condition is satisfied because V(µ) = 1 and hence V′(µ) = 0. However, in the case of a gamma distribution, for instance, this condition is not always satisfied and the inverse phenomenon of asymptotic variances
can occur. Since the dispersion φ is usually a nuisance parameter, the efficiency of its estimator might be of little concern in practice. Nevertheless, the fact that inequality (5.6) holds for the estimation of φ is of interest.

6. Concluding remarks

In this paper we have examined the structure of the inverse phenomenon of asymptotic variances using the orthogonal decomposition of estimating functions. If an optimal estimating function can be used as a marginal estimating function for a parameter of interest β, the asymptotic variance of the estimator with unknown nuisance parameter α cannot be less than that with known α; this is reasonable and compatible with our intuition. However, it need not hold unless the marginal estimating function for β is optimal. In fact, as discussed in Section 4, when the marginal estimating function for β has a component of the score function for α in the non-optimal part of its orthogonal decomposition, the asymptotic variance of the estimator of β decreases when α is estimated by its score function rather than set to its true value. The inverse phenomenon of asymptotic variances seems strange, at least intuitively, but the discussion of the orthogonal decomposition of estimating functions makes clear how it occurs.

It should be noted that the inverse phenomenon of asymptotic variances can in principle occur in both parametric and semiparametric models, since it comes from the structure of estimating functions. It seems, however, that this phenomenon has fewer opportunities to occur in the parametric case than in the semiparametric case. This is because, in the parametric case, the optimal estimating function can be used under moderate regularity conditions, and other estimation methods are not used unless some special reason exists.
On the other hand, in the semiparametric case, since the optimal estimating function generally depends on the unknown true values of the infinite-dimensional nuisance parameters, it usually cannot be used even when its functional form can be obtained explicitly. In the example of Section 2, if we assume that the error ɛ and (S, X) are independent, the information score function for the parameter of interest β can be calculated explicitly; it depends on the unknown regression function h and the unknown marginal density g of ɛ. The marginal estimating function for β used there is obtained from the information score function by substituting the zero function for h and a normal density with mean zero and constant variance for g. Therefore, if h and g actually coincide with these functions, the inverse phenomenon of asymptotic variances never occurs; if they do not, and especially if h is not the zero function, the inverse phenomenon does occur. Of course, even in the semiparametric case, the results can change if we estimate the infinite-dimensional nuisance parameters by some nonparametric approach. For the model in Section 2, no estimation of the regression function h is intended because it is difficult for epidemiological reasons; instead of estimating h, model (2.2) is considered (see Robins et al. (1992)).
The inverse phenomenon of asymptotic variances does not seem to occur in many situations, since the conditions in Theorem 1 are rather restrictive. However, it arises naturally in some situations besides the example of Robins et al. (1992), for instance in problems of missing data (Robins et al. (1994), Lawless et al. (1999)), measurement error (Carroll et al. (1995)) and survey sampling (Rosenbaum (1987)). The inverse phenomenon of asymptotic variances is striking to the statistical community, since it defies the common sense of statistical inference. We believe that our viewpoint helps in comprehending this phenomenon.

Acknowledgements

The author is grateful to Professor Eguchi of the Institute of Statistical Mathematics for his valuable advice and encouragement. The author would also like to thank the referees for their helpful comments, which led to an improved manuscript.

References

Amari, S. and Kawanabe, M. (1997). Information geometry of estimating functions in semiparametric statistical models, Bernoulli, 3.
Carroll, R. J., Ruppert, D. and Stefanski, L. A. (1995). Measurement Error in Nonlinear Models, Chapman and Hall, London.
Fourdrinier, D. and Strawderman, W. E. (1996). A paradox concerning shrinkage estimators: should a known scale parameter be replaced by an estimated value in the shrinkage factor? J. Multivar. Anal., 59.
Godambe, V. P. (ed.) (1991). Estimating Functions, Oxford University Press, New York.
Knudsen, S. J. (1999). Estimating Functions and Separate Inference, Monographs Vol. 1, Dept. of Statistics and Demography, University of Southern Denmark.
Lawless, J. F., Kalbfleisch, J. D. and Wild, C. J. (1999). Semiparametric methods for response-selective and missing data problems in regression, J. R. Statist. Soc. B, 61.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, Chapman and Hall, London.
Robins, J. M., Mark, S. D. and Newey, W. K. (1992).
Estimating exposure effects by modelling the expectation of exposure conditional on confounders, Biometrics, 48.
Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed, J. Am. Statist. Ass., 89.
Rosenbaum, P. R. (1987). Model-based direct adjustment, J. Am. Statist. Ass., 82.
Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency
More informationMore Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction
Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order
More informationarxiv: v2 [stat.me] 8 Jun 2016
Orthogonality of the Mean and Error Distribution in Generalized Linear Models 1 BY ALAN HUANG 2 and PAUL J. RATHOUZ 3 University of Technology Sydney and University of Wisconsin Madison 4th August, 2013
More informationStatistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation
Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider
More information3. For a given dataset and linear model, what do you think is true about least squares estimates? Is Ŷ always unique? Yes. Is ˆβ always unique? No.
7. LEAST SQUARES ESTIMATION 1 EXERCISE: Least-Squares Estimation and Uniqueness of Estimates 1. For n real numbers a 1,...,a n, what value of a minimizes the sum of squared distances from a to each of
More informationMatrix Factorizations
1 Stat 540, Matrix Factorizations Matrix Factorizations LU Factorization Definition... Given a square k k matrix S, the LU factorization (or decomposition) represents S as the product of two triangular
More informationUniversity of Oxford. Statistical Methods Autocorrelation. Identification and Estimation
University of Oxford Statistical Methods Autocorrelation Identification and Estimation Dr. Órlaith Burke Michaelmas Term, 2011 Department of Statistics, 1 South Parks Road, Oxford OX1 3TG Contents 1 Model
More informationPENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA
PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University
More information1 Bayesian Linear Regression (BLR)
Statistical Techniques in Robotics (STR, S15) Lecture#10 (Wednesday, February 11) Lecturer: Byron Boots Gaussian Properties, Bayesian Linear Regression 1 Bayesian Linear Regression (BLR) In linear regression,
More informationSpecification Errors, Measurement Errors, Confounding
Specification Errors, Measurement Errors, Confounding Kerby Shedden Department of Statistics, University of Michigan October 10, 2018 1 / 32 An unobserved covariate Suppose we have a data generating model
More informationCovariate Balancing Propensity Score for General Treatment Regimes
Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationChapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.
Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationi=1 h n (ˆθ n ) = 0. (2)
Stat 8112 Lecture Notes Unbiased Estimating Equations Charles J. Geyer April 29, 2012 1 Introduction In this handout we generalize the notion of maximum likelihood estimation to solution of unbiased estimating
More informationStatement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.
MATHEMATICAL STATISTICS Take-home final examination February 1 st -February 8 th, 019 Instructions You do not need to edit the solutions Just make sure the handwriting is legible The final solutions should
More informationGeneralized linear models
Generalized linear models Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark October 29, 202 Contents Densities for generalized linear models. Mean and variance...............................
More informationSTAT331. Cox s Proportional Hazards Model
STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationPrimal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing
Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal
More informationThe International Journal of Biostatistics
The International Journal of Biostatistics Volume 2, Issue 1 2006 Article 2 Statistical Inference for Variable Importance Mark J. van der Laan, Division of Biostatistics, School of Public Health, University
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2
MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is
More informationCalibration Estimation for Semiparametric Copula Models under Missing Data
Calibration Estimation for Semiparametric Copula Models under Missing Data Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Economics and Economic Growth Centre
More informationIntroduction to Estimation Methods for Time Series models Lecture 2
Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:
More informationFor more information about how to cite these materials visit
Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/
More informationLinear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52
Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two
More informationSome Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model
Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;
More informationECON 3150/4150, Spring term Lecture 7
ECON 3150/4150, Spring term 2014. Lecture 7 The multivariate regression model (I) Ragnar Nymoen University of Oslo 4 February 2014 1 / 23 References to Lecture 7 and 8 SW Ch. 6 BN Kap 7.1-7.8 2 / 23 Omitted
More informationProblem Selected Scores
Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected
More informationDS-GA 1002 Lecture notes 10 November 23, Linear models
DS-GA 2 Lecture notes November 23, 2 Linear functions Linear models A linear model encodes the assumption that two quantities are linearly related. Mathematically, this is characterized using linear functions.
More informationModification and Improvement of Empirical Likelihood for Missing Response Problem
UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu
More informationEstimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004
Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure
More informationSurvival Analysis for Case-Cohort Studies
Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz
More informationStructural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall
1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work
More informationIntroduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University
Introduction to the Mathematical and Statistical Foundations of Econometrics 1 Herman J. Bierens Pennsylvania State University November 13, 2003 Revised: March 15, 2004 2 Contents Preface Chapter 1: Probability
More informationWhen Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?
When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University Joint
More informationDiscussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon
Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationIntegrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University
Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y
More informationWeighting in survey analysis under informative sampling
Jae Kwang Kim and Chris J. Skinner Weighting in survey analysis under informative sampling Article (Accepted version) (Refereed) Original citation: Kim, Jae Kwang and Skinner, Chris J. (2013) Weighting
More informationEcon 2120: Section 2
Econ 2120: Section 2 Part I - Linear Predictor Loose Ends Ashesh Rambachan Fall 2018 Outline Big Picture Matrix Version of the Linear Predictor and Least Squares Fit Linear Predictor Least Squares Omitted
More informationOctober 25, 2013 INNER PRODUCT SPACES
October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal
More informationTo Estimate or Not to Estimate?
To Estimate or Not to Estimate? Benjamin Kedem and Shihua Wen In linear regression there are examples where some of the coefficients are known but are estimated anyway for various reasons not least of
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models
Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More information2 Metric Spaces Definitions Exotic Examples... 3
Contents 1 Vector Spaces and Norms 1 2 Metric Spaces 2 2.1 Definitions.......................................... 2 2.2 Exotic Examples...................................... 3 3 Topologies 4 3.1 Open Sets..........................................
More information[POLS 8500] Review of Linear Algebra, Probability and Information Theory
[POLS 8500] Review of Linear Algebra, Probability and Information Theory Professor Jason Anastasopoulos ljanastas@uga.edu January 12, 2017 For today... Basic linear algebra. Basic probability. Programming
More informationRegression and Statistical Inference
Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF
More informationEstimation theory and information geometry based on denoising
Estimation theory and information geometry based on denoising Aapo Hyvärinen Dept of Computer Science & HIIT Dept of Mathematics and Statistics University of Helsinki Finland 1 Abstract What is the best
More informationBayesian Inference. Chapter 9. Linear models and regression
Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationAdvanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1
Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1 Gary King GaryKing.org April 13, 2014 1 c Copyright 2014 Gary King, All Rights Reserved. Gary King ()
More informationECE 275A Homework 7 Solutions
ECE 275A Homework 7 Solutions Solutions 1. For the same specification as in Homework Problem 6.11 we want to determine an estimator for θ using the Method of Moments (MOM). In general, the MOM estimator
More informationMarch Algebra 2 Question 1. March Algebra 2 Question 1
March Algebra 2 Question 1 If the statement is always true for the domain, assign that part a 3. If it is sometimes true, assign it a 2. If it is never true, assign it a 1. Your answer for this question
More informationPQL Estimation Biases in Generalized Linear Mixed Models
PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized
More informationVarious types of likelihood
Various types of likelihood 1. likelihood, marginal likelihood, conditional likelihood, profile likelihood, adjusted profile likelihood 2. semi-parametric likelihood, partial likelihood 3. empirical likelihood,
More informationBirkbeck Working Papers in Economics & Finance
ISSN 1745-8587 Birkbeck Working Papers in Economics & Finance Department of Economics, Mathematics and Statistics BWPEF 1809 A Note on Specification Testing in Some Structural Regression Models Walter
More informationDouble Robustness. Bang and Robins (2005) Kang and Schafer (2007)
Double Robustness Bang and Robins (2005) Kang and Schafer (2007) Set-Up Assume throughout that treatment assignment is ignorable given covariates (similar to assumption that data are missing at random
More informationCausal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions
Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census
More informationEstimation for two-phase designs: semiparametric models and Z theorems
Estimation for two-phase designs:semiparametric models and Z theorems p. 1/27 Estimation for two-phase designs: semiparametric models and Z theorems Jon A. Wellner University of Washington Estimation for
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationSample size calculations for logistic and Poisson regression models
Biometrika (2), 88, 4, pp. 93 99 2 Biometrika Trust Printed in Great Britain Sample size calculations for logistic and Poisson regression models BY GWOWEN SHIEH Department of Management Science, National
More informationMissing Covariate Data in Matched Case-Control Studies
Missing Covariate Data in Matched Case-Control Studies Department of Statistics North Carolina State University Paul Rathouz Dept. of Health Studies U. of Chicago prathouz@health.bsd.uchicago.edu with
More informationRegularization in Cox Frailty Models
Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University
More informationLikelihood based Statistical Inference. Dottorato in Economia e Finanza Dipartimento di Scienze Economiche Univ. di Verona
Likelihood based Statistical Inference Dottorato in Economia e Finanza Dipartimento di Scienze Economiche Univ. di Verona L. Pace, A. Salvan, N. Sartori Udine, April 2008 Likelihood: observed quantities,
More information1. Addition: To every pair of vectors x, y X corresponds an element x + y X such that the commutative and associative properties hold
Appendix B Y Mathematical Refresher This appendix presents mathematical concepts we use in developing our main arguments in the text of this book. This appendix can be read in the order in which it appears,
More informationLecture 6: Geometry of OLS Estimation of Linear Regession
Lecture 6: Geometry of OLS Estimation of Linear Regession Xuexin Wang WISE Oct 2013 1 / 22 Matrix Algebra An n m matrix A is a rectangular array that consists of nm elements arranged in n rows and m columns
More informationStatement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.
MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss
More informationANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW
SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved
More informationGraduate Econometrics I: Unbiased Estimation
Graduate Econometrics I: Unbiased Estimation Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Unbiased Estimation
More information6 Pattern Mixture Models
6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data
More informationProfessors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th
DISCUSSION OF THE PAPER BY LIN AND YING Xihong Lin and Raymond J. Carroll Λ July 21, 2000 Λ Xihong Lin (xlin@sph.umich.edu) is Associate Professor, Department ofbiostatistics, University of Michigan, Ann
More informationGraduate Econometrics I: Maximum Likelihood II
Graduate Econometrics I: Maximum Likelihood II Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood
More informationPart IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015
Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)
More informationMathematical statistics
October 4 th, 2018 Lecture 12: Information Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationAccounting for Baseline Observations in Randomized Clinical Trials
Accounting for Baseline Observations in Randomized Clinical Trials Scott S Emerson, MD, PhD Department of Biostatistics, University of Washington, Seattle, WA 9895, USA October 6, 0 Abstract In clinical
More informationChap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University
Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics
More informationSIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION
Johns Hopkins University, Dept. of Biostatistics Working Papers 3-3-2011 SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Michael Rosenblum Johns Hopkins Bloomberg
More informationCombining multiple observational data sources to estimate causal eects
Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,
More informationMinimax Estimation of a nonlinear functional on a structured high-dimensional model
Minimax Estimation of a nonlinear functional on a structured high-dimensional model Eric Tchetgen Tchetgen Professor of Biostatistics and Epidemiologic Methods, Harvard U. (Minimax ) 1 / 38 Outline Heuristics
More informationDS-GA 1002 Lecture notes 12 Fall Linear regression
DS-GA Lecture notes 1 Fall 16 1 Linear models Linear regression In statistics, regression consists of learning a function relating a certain quantity of interest y, the response or dependent variable,
More information1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ).
Estimation February 3, 206 Debdeep Pati General problem Model: {P θ : θ Θ}. Observe X P θ, θ Θ unknown. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Examples: θ = (µ,
More informationRegression I: Mean Squared Error and Measuring Quality of Fit
Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving
More informationTECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random
TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random Paul J. Rathouz University of Chicago Abstract. We consider the problem of attrition under a logistic
More informationIn English, this means that if we travel on a straight line between any two points in C, then we never leave C.
Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from
More informationApproximating the Best Linear Unbiased Estimator of Non-Gaussian Signals with Gaussian Noise
IEICE Transactions on Information and Systems, vol.e91-d, no.5, pp.1577-1580, 2008. 1 Approximating the Best Linear Unbiased Estimator of Non-Gaussian Signals with Gaussian Noise Masashi Sugiyama (sugi@cs.titech.ac.jp)
More informationQuestions and Answers on Unit Roots, Cointegration, VARs and VECMs
Questions and Answers on Unit Roots, Cointegration, VARs and VECMs L. Magee Winter, 2012 1. Let ɛ t, t = 1,..., T be a series of independent draws from a N[0,1] distribution. Let w t, t = 1,..., T, be
More informationChapter 3 Transformations
Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases
More informationUnbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.
Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it
More informationQuick Review on Linear Multiple Regression
Quick Review on Linear Multiple Regression Mei-Yuan Chen Department of Finance National Chung Hsing University March 6, 2007 Introduction for Conditional Mean Modeling Suppose random variables Y, X 1,
More informationarxiv: v1 [stat.me] 15 May 2011
Working Paper Propensity Score Analysis with Matching Weights Liang Li, Ph.D. arxiv:1105.2917v1 [stat.me] 15 May 2011 Associate Staff of Biostatistics Department of Quantitative Health Sciences, Cleveland
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More information