Locally Ancillary Quasiscore Models for Errors-in-Covariates

Paul J. RATHOUZ and Kung-Yee LIANG

Paul J. Rathouz is Assistant Professor, Department of Health Studies, University of Chicago, Chicago, IL 60637 (e-mail: prathouz@health.bsd.uchicago.edu), and Kung-Yee Liang is Professor, Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205. The authors thank Theodore G. Karrison, an Associate Editor, and two referees for very constructive discussions, comments, and suggestions; Wanli Min for computing assistance; and Rolf Loeber for permission to use the cortisol data.

Corresponding address: Paul Rathouz, Dept. of Health Studies, 5841 S. Maryland Ave., MC 2007, Chicago, IL 60637; e-mail: prathouz@health.bsd.uchicago.edu

February 2001

Revised for Journal of the American Statistical Association

ABSTRACT

We use the notion of locally ancillary estimating functions to develop a quasiscore method for fitting regression models containing measurement error in the covariates. Suppose interest is in the model E(Y | u, w) for response Y, the observed data are (y, x, w), and X is a mismeasured surrogate for u. We take a functional modelling approach, treating the u as a fixed nuisance parameter. Beginning with quasiscores for the regression parameter and the unknown u, a bias-corrected quasiscore for the regression parameter is derived that is second-order locally ancillary for the nuisance u. The method used to accomplish this requires only the correct specification of the mean and variance functions for Y and X in terms of u, w, and the regression parameter. When an estimator for u is plugged into the corrected quasiscore, local approximations show that the bias is small. Simulations verifying this result and an example from child psychiatry are presented, both using log-linear regression models.

KEY WORDS: ancillarity, measurement error, nuisance parameter, quasilikelihood, semiparametric model.

1 Introduction

Let (u_i, w_i) be a sequence of covariates with arbitrary joint empirical distribution function G(·), and let (y_i, x_i, w_i) be a sequence of observations such that the random variables (Y_i | u_i, w_i) are independent conditional on the vector of (u_i, w_i)'s. Assume that y_i, u_i, and x_i are scalars and that w_i is a vector of dimension q. Interest is in the regression model

E(Y_i | u_i, w_i) = μ_y(β; u_i, w_i) = μ_{yi},   var(Y_i | u_i, w_i) = φ_y ṽ_y(β; u_i, w_i) = v_{yi},   (1)

where β is a p-dimensional regression parameter and φ_y is a dispersion parameter. Often, p = q + 1. Let x_i be a mismeasured version of u_i such that

E(X_i | u_i, w_i) = μ_x(α; u_i, w_i) = μ_{xi},   var(X_i | u_i, w_i) = φ_x ṽ_x(α; u_i, w_i) = v_{xi}.   (2)

Again, φ_x is a dispersion parameter and α is a measurement error parameter. We make the common surrogacy assumption that X_i is independent of the response Y_i, conditional on the covariates (u_i, w_i).

In this paper, we propose a new method for inference in the regression model (1), subject to stochastic measurement error in the covariates following model (2). The method extends in an approximate way the functional modelling approach of Stefanski and Carroll (1987), in which the mismeasured covariate u_i is viewed as a fixed nuisance parameter. Under their generalized linear models (McCullagh and Nelder 1989) framework, the conditional score function for β (Lindsay 1982) is unbiased even when the covariate u_i is not known exactly, but rather is estimated. This elegant method is semiparametric efficient in the sense that the conditional score is optimal for β in the absence of knowledge of u_i or of the distribution of (u_i | w_i) (Lindsay 1982, 1985).

However, the class of applicable models for the distributions of (Y_i | u_i, w_i) and (X_i | u_i, w_i) is limited to the canonical exponential family.

In separate research, Waterman and Lindsay (1996a,b) proposed a projected score method that approximates the conditional score when it exists and emulates it in terms of robustness to nuisance parameters when it does not. Robustness is operationalized in terms of local ancillarity (Small and McLeish 1994), which we define in the next section. While the Waterman-Lindsay method generates estimating functions that are locally ancillary to an arbitrary order, their work and that of Small and McLeish (1989) has also shown that second-order local ancillarity is a particularly important special case. Recently, Rathouz and Liang (1999) have extended the Waterman-Lindsay projected score method to a quasilikelihood setting, thereby obtaining a second-order locally ancillary quasiscore (SOLAQS).

The measurement error method proposed here is motivated by making three observations, which synthesize these prior works: (i) the Stefanski and Carroll (1987) method for measurement error problems corresponds to the conditional score method for general nuisance parameter problems; (ii) recent work shows that second-order locally ancillary estimating functions provide very good approximations to the behavior of the conditional score; and (iii) the general method of obtaining second-order locally ancillary estimating functions from quasilikelihood models, of which (1) and (2) are one example, can be exploited to develop a new method for inference in functional measurement error models. Such development is the object of this paper.
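As a concrete reference point for models (1) and (2), the following sketch simulates data from one simple special case: a log-linear mean for Y with unit dispersion (Poisson) and classical additive measurement error for X. The parameter values and the Poisson choice are illustrative assumptions, not prescriptions from the paper.

```python
# Minimal simulation sketch of models (1) and (2): Poisson log-linear response
# (so phi_y = 1) and classical additive error for the surrogate x.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n=200, beta0=0.0, beta1=np.log(1.5), phi_x=0.5):
    w = np.ones(n)                                 # intercept-only design, w_i = 1
    u = rng.normal(0.0, 1.0, n)                    # true covariate, later treated as fixed
    mu_y = np.exp(beta0 * w + beta1 * u)           # model (1): E(Y | u, w)
    y = rng.poisson(mu_y)                          # var(Y | u, w) = mu_y here
    x = u + rng.normal(0.0, np.sqrt(phi_x), n)     # model (2): classical error, variance phi_x
    return y, x, u, w

y, x, u, w = simulate()
print(y[:5], x[:5])
```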

Model (1), which is the inferential target, includes linear, logistic, log-linear, and polynomial regression models as special cases. Model (2) encompasses the classical measurement error model E(X_i | u_i, w_i) = u_i, var(X_i | u_i, w_i) = φ_x, as well as the error calibration model E(X_i | u_i, w_i) = α_0 + α_1 u_i + α_2 w_i, var(X_i | u_i, w_i) = φ_x (Carroll, Ruppert and Stefanski 1995), as special cases. Additionally, in model (2), the mean and variance of x_i do not have to be specified on the same scale in which u_i appears in model (1). We could, for example, allow a multiplicative error model E{log(X_i) | u_i, w_i} = α_0 + α_1 log(u_i) + α_2 w_i and var{log(X_i) | u_i, w_i} = φ_x. Whatever the measurement error model, we assume throughout that the required internal or external replication or validation data (Carroll and Stefanski 1990) are available to provide consistent estimators of the measurement error parameters (α, φ_x).
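When external validation data are available (observations with u, w, and x all recorded), the calibration-model parameters can be estimated by a routine regression of x on (1, u, w). The sketch below is a minimal illustration under that assumption; the variable names and the ordinary-least-squares fit are conveniences of this sketch, not requirements of the method.

```python
# Sketch: estimating (alpha, phi_x) from external validation data under the linear
# calibration model E(X | u, w) = alpha0 + alpha1*u + alpha2*w with constant variance.
import numpy as np

def fit_calibration(x_val, u_val, w_val):
    Z = np.column_stack([np.ones_like(u_val), u_val, w_val])   # design (1, u, w)
    alpha, *_ = np.linalg.lstsq(Z, x_val, rcond=None)          # (alpha0, alpha1, alpha2)
    resid = x_val - Z @ alpha
    phi_x = resid @ resid / (len(x_val) - Z.shape[1])          # residual variance estimate
    return alpha, phi_x
```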

Besides the aforementioned conditional score method, other related methods include those of small measurement error asymptotics, which provide first-order bias corrections to the naive estimator (e.g., Stefanski 1985), and the approximate quasilikelihood-variance function (QVF) method of Carroll and Stefanski (1990), in which parametric functions are assumed only for the first two moments of (Y_i | u_i, w_i) and (X_i | u_i, w_i). Our approach is in the spirit of the conditional score, while making the weaker assumptions equivalent to the QVF models. We obtain a bias-corrected quasiscore function for which, using the method of Stefanski (1985), the resulting estimator would have zero first-order bias correction. Our approach is therefore semiparametric in two senses. First, it is a functional model in that it does not require a specification of the marginal distribution of (u_i | w_i), as the u_i's are treated as fixed nuisance parameters. Second, it only requires specification of the first two moments of the distributions of (Y_i | u_i, w_i) and (X_i | u_i, w_i). For a unified presentation of methods for errors-in-covariates, see Carroll, et al. (1995).

This paper has the following organization. In Section 2, we review second-order locally ancillary estimating functions and show that when the ancillarity applies to the mismeasured covariate u_i, the bias correction of Stefanski (1985) obtains "automatically". Section 3 contains the main development of the SOLAQS for measurement error problems. Theoretical and practical considerations for use of the SOLAQS for inferences on β are presented in Section 4. These include variance estimation and the use of small measurement error asymptotics to examine the behavior of the resulting SOLAQS with respect to bias. In Section 5, we study the log-linear regression model in more detail in order to illustrate some advantages of the SOLAQS over other methods. We include simulation results and a small example data analysis from child psychiatry. We close with a brief discussion in Section 6.

2 Locally ancillary estimating functions in measurement error problems

Suppose that the regression parameter β is a vector of dimension p and that the (p × 1) estimating function

g(β) = Σ_{i=1}^n g_i(β; y_i, u_i, w_i) = Σ_i g_i,   (3)

such that E{g_i(β; Y_i, u_i, w_i); u_i, w_i} = 0, is available for inferences on β. Now, treating u_i as an unknown nuisance parameter and operating only on the ith

summand in (3), define the (p × 1) functional operators

b^i_k(g_i) = [∂^k/∂u*^k E{g_i(β; Y_i, u_i, w_i); u*, w_i}]_{u* = u_i},   (4)

for k = 1, 2, and the (p × 2) concatenated functional operator b^i_{(2)}(g_i) = {b^i_1(g_i), b^i_2(g_i)}. The idea in (4) is that the expectation is taken conditionally on (u*, β), where u* ≠ u_i. If b^i_1(g_i) = 0, then g_i is said to be "first-order locally ancillary" for u_i, while if b^i_{(2)}(g_i) = 0, g_i is "second-order locally ancillary" for u_i (Small and McLeish 1994). One interpretation of kth-order local ancillarity is that, under regularity, it is equivalent to E{g_i(u_i); u*} = o{(u* − u_i)^k}. In addition, under standard regularity conditions for estimating functions (Godambe 1960; Godambe and Thompson 1974),

E(∂g_i/∂u_i) = −b^i_1(g_i)   and   E(∂²g_i/∂u_i²) = b^i_2(g_i) − 2 ∂b^i_1(g_i)/∂u_i   (5)

(Rathouz and Liang 1999). Higher orders of local ancillarity are defined similarly, the order providing a measure of the degree of robustness of g_i to u_i. The second order is most important in practice, however, because it provides a large degree of the bias correction obtained through second and successively higher orders (Waterman and Lindsay 1996a,b; Small and McLeish 1989).

This bias correction phenomenon arises in the measurement error literature, as follows. Let u_i be measured with error by x_i, where E(X_i | u_i, w_i) = u_i and var(X_i | u_i, w_i) = φ_x ṽ_x(u_i, w_i). Consider the plug-in estimating function ĝ_i(β) = g_i(β; y_i, x_i, w_i), and let β̂ be the solution to Σ_i ĝ_i = 0. Then for fixed φ_x > 0, β̂ converges in probability to β + O(φ_x) as n → ∞. The remainder O(φ_x) refers to the measurement error bias in the limiting value of β̂ as φ_x → 0. The order of operations for this argument is that n → ∞ first, then φ_x → 0.
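To make the operators in (4) concrete, the sketch below evaluates b^i_1 numerically for one observation in the Poisson log-linear case, where the conditional mean E{g_i(β; Y_i, u_i, w_i); u*, w_i} is available in closed form. The naive quasiscore used here is purely an illustration; its nonzero b^i_1 is exactly the failure of local ancillarity that the construction in Section 3 is designed to remove.

```python
# Finite-difference check of b_1 for the naive Poisson quasiscore
# g_i = (w_i, u_i)^T {y_i - exp(beta0*w_i + beta1*u_i)} at one observation.
import numpy as np

def cond_mean_g(beta, u, w, u_star):
    """E{ g(beta; Y, u, w) ; u* } when Y | u*, w ~ Poisson(exp(beta0*w + beta1*u*))."""
    b0, b1 = beta
    ey = np.exp(b0 * w + b1 * u_star)     # mean of Y under the 'wrong' value u*
    mu = np.exp(b0 * w + b1 * u)          # mean appearing inside g, at the working u
    return np.array([w, u]) * (ey - mu)

def b1(beta, u, w, h=1e-5):
    """First derivative in u* of the conditional mean of g, evaluated at u* = u."""
    return (cond_mean_g(beta, u, w, u + h) - cond_mean_g(beta, u, w, u - h)) / (2 * h)

print(b1(beta=(0.0, np.log(1.5)), u=0.3, w=1.0))   # nonzero: not first-order locally ancillary
```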

A first-order bias-corrected estimator of β using such small measurement error asymptotics would be

β̂_c = β̂ + (1/2) φ_x { lim_{n→∞} Σ_i ∂g_i/∂β^T }^{-1} [ lim_{n→∞} Σ_i (∂²g_i/∂u_i²) ṽ_x(u_i, w_i) ],   (6)

where for fixed φ_x > 0, β̂_c converges to β + o(φ_x) as n → ∞ (Stefanski 1985). If g_i is second-order locally ancillary for u_i, then by (5), E{(∂²g_i/∂u_i²) ṽ_x(u_i, w_i)} = 0, and the bias correction given by (6) obtains automatically. Therefore, estimation with a second-order locally ancillary estimating function produces an estimator that is approximately consistent to order o(φ_x). In the next section, we propose a quasiscore method for obtaining second-order locally ancillary estimating functions for β in the functional measurement error modelling problem, eliminating the need for any correction term such as that in (6).

3 SOLAQS for Measurement Error Models

Rathouz and Liang (1999) recently proposed a method for constructing second-order locally ancillary estimating functions. The method differs from those previously proposed in that it does not rely on projection, and thereby avoids the need to specify a likelihood function. In this section, their idea is applied to the problem of inference in functional measurement error models.

Assume that models (1) and (2) hold, that μ_{yi}, v_{yi}, μ_{xi}, v_{xi} are finite and admit continuous first and second derivatives with respect to (β, α, u_i), and that v_{yi} > 0 and v_{xi} > 0 for all (β, α, u_i). Additionally, assume that μ_x(u_i, w_i) is strictly monotone (in u_i) in a neighborhood of the true u_i. These assumptions are not restrictive; for many models in practical usage, moments and their derivatives to several orders exist.

The monotonicity assumption is quite reasonable, given that x_i is a mismeasured surrogate for the true u_i.

For the ith observation, we now construct estimating functions for β, φ_y, and u_i which will act as building blocks in the development that follows. First, if u_i were known without error, one might consider the quasiscore

S = Σ_i S_i = Σ_i (∂μ_{yi}/∂β)^T (y_i − μ_{yi}) / v_{yi}

for consistent inferences on β (Wedderburn 1974; McCullagh 1983). In the traditional generalized linear models case (McCullagh and Nelder 1989), h_y(·) is a link function, η_{yi} is the linear predictor, and V(·) is the variance function. Then h_y(μ_{yi}) = η_{yi} = β_0^T w_i + β_1 u_i, and S_i takes the well-known form

S_i = (w_i^T, u_i)^T {h′_y(μ_{yi})}^{-1} (y_i − μ_{yi}) / {φ_y V(μ_{yi})}.

Additionally, if φ_y is not known, the estimating function

R = Σ_i R_i = Σ_i (φ_y v_{yi})^{-1} {(y_i − μ_{yi})² − φ_y ṽ_{yi}}

would yield consistent inferences on φ_y. R = 0 can be solved after β has been estimated via S = 0, since S is ancillary for φ_y. Note that (S^T, R)^T is the quasilikelihood analogue to the score equations given in Stefanski and Carroll (1987), from which the conditional score function was constructed.

Now, similarly to S_i, using the data y_i and x_i, define the u_i-quasiscore

T_{i1} = (∂μ_{yi}/∂u_i)(y_i − μ_{yi})/v_{yi} + (∂μ_{xi}/∂u_i)(x_i − μ_{xi})/v_{xi} = T_{i1y} + T_{i1x}.   (7)

In the aforementioned case where h_y(μ_{yi}) = η_{yi}, and where μ_{xi} = u_i,

T_{i1} = β_1 {h′_y(μ_{yi})}^{-1} (y_i − μ_{yi}) / {φ_y V(μ_{yi})} + (x_i − u_i)/v_{xi}.

The quasiscore T_{i1} will be used to estimate the mismeasured covariate u_i. Further, it will be used as a basis for correcting the bias in S_i due to measurement error. To that end, note that T_{i1} is optimal for u_i in the class of linear estimating functions and is thereby information unbiased (Crowder 1987). In particular, T_{i2} = ∂T_{i1}/∂u_i + T_{i1}² is a second unbiased estimating function for u_i. Indeed, T_{i2} is the quasiscore analogue of the second Bhattacharyya score for the nuisance u_i (Rathouz and Liang 1999). Letting prime (′) denote differentiation with respect to u_i, T_{i2} can be re-expressed as

T_{i2} = T′_{i1} + T_{i1}² = T_{i2y} + T_{i2x} + 2 T_{i1y} T_{i1x},

where T_{i2y} = T′_{i1y} + T²_{i1y} and similarly for T_{i2x}. From (7), T_{i2y} takes the form

T_{i2y} = (∂²μ_{yi}/∂u_i²)(y_i − μ_{yi})/v_{yi} + (∂μ_{yi}/∂u_i)² v_{yi}^{-2} {(y_i − μ_{yi})² − v_{yi}} − (∂μ_{yi}/∂u_i)(∂v_{yi}/∂u_i)(y_i − μ_{yi}) v_{yi}^{-2},

with an analogous form for T_{i2x}.

We now obtain a second-order locally ancillary quasiscore, S*_i, as a linear combination of S_i, T_{i1}, and T_{i2}. Operating on the ith observation (y_i, x_i, w_i), define the matrix

D_i = [ E{−∂(S_i^T, T_{i1}, T_{i2})^T/∂β},  b^i_{(2)}{(S_i^T, T_{i1}, T_{i2})^T} ]
    = [ D_{i0}   D_{i1}   D_{i2} ;  D_{i10}  D_{i11}  D_{i12} ;  D_{i20}  D_{i21}  D_{i22} ],

where the second and third block columns of D_i are the maps of (S_i^T, T_{i1}, T_{i2})^T via b^i_1(·) and b^i_2(·), respectively. Interestingly, D_i is symmetric. Then define

S*_i = S_i − a_i (T_{i1}, T_{i2})^T,

where a_i is the (p × 2) matrix given by

a_i = b^i_{(2)}(S_i) [ b^i_{(2)}{(T_{i1}, T_{i2})^T} ]^{-1} = (D_{i1}, D_{i2}) [ D_{i11}  D_{i12} ; D_{i21}  D_{i22} ]^{-1}.
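The correction step above is a small linear-algebra operation once the blocks of D_i are available. The sketch below assembles a_i and the corrected summand S*_i from given blocks; the numerical values are placeholders, and computing the blocks themselves (Appendix A) is model-specific.

```python
# Assemble a_i and S*_i = S_i - a_i (T_i1, T_i2)^T from precomputed blocks of D_i.
import numpy as np

def solaqs_summand(S_i, T_i1, T_i2, D_i1, D_i2, D_i11, D_i12, D_i21, D_i22):
    B_S = np.column_stack([D_i1, D_i2])                 # (b1(S_i), b2(S_i)), a p x 2 block
    B_T = np.array([[D_i11, D_i12], [D_i21, D_i22]])    # b-maps of (T_i1, T_i2), 2 x 2
    a_i = B_S @ np.linalg.inv(B_T)
    return S_i - a_i @ np.array([T_i1, T_i2])

# toy numbers (p = 2), purely to show the shapes involved
print(solaqs_summand(S_i=np.array([0.3, -0.1]), T_i1=0.4, T_i2=-0.2,
                     D_i1=np.array([1.0, 0.5]), D_i2=np.array([0.2, 0.1]),
                     D_i11=2.0, D_i12=0.3, D_i21=0.3, D_i22=1.5))
```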

In order to compute S*_i, we must evaluate the functionals b^i_{(2)}(S_i), b^i_{(2)}(T_{i1}), and b^i_{(2)}(T_{i2}). This is easily accomplished via the derivatives of the mean and variance models (1) and (2) (see Appendix A).

We claim that S*_i is second-order locally ancillary for u_i. To see this, first note that b^i_{(2)}(·) is a linear operator in the sense that b^i_{(2)}(a_1 g_1 + a_2 g_2) = a_1 b^i_{(2)}(g_1) + a_2 b^i_{(2)}(g_2), where a_k = a_k(β, u_i, w_i) does not contain the data (y_i, x_i), k = 1, 2. Then write

b^i_{(2)}(S*_i) = b^i_{(2)}(S_i) − a_i b^i_{(2)}{(T_{i1}, T_{i2})^T}
             = (D_{i1}, D_{i2}) − (D_{i1}, D_{i2}) [ D_{i11}  D_{i12} ; D_{i21}  D_{i22} ]^{-1} [ D_{i11}  D_{i12} ; D_{i21}  D_{i22} ] = 0.

We refer to S*_i as a second-order locally ancillary quasiscore (SOLAQS). It is a robust version of S_i, compensating for the bias introduced by the measurement error in x_i. In contrast to previous methods (Waterman and Lindsay 1996a), S*_i is obtained without the use of projection, and consequently depends only upon correct specification of models (1) and (2).

Summing over all observations, inferences on β can be based on the SOLAQS

S* = Σ_i S*_i = Σ_i { S_i − a_i (T_{i1}, T_{i2})^T },

which is an unbiased estimating function. However, while S* is second-order locally ancillary for the vector (u_1, …, u_n)^T, the u_i's still appear in S*_i and therefore must be estimated. This is accomplished for each i by solving T_{i1}(β, u_i) = 0 in u_i for û_{iβ}, giving rise to the plug-in quasiscore

Ŝ*(β) = Σ_i Ŝ*_i(β) = Σ_i S*_i(β, û_{iβ}),

which is used for inferences on β.
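A minimal sketch of this plug-in step, in the Poisson log-linear and classical-error special case used earlier: for fixed β, each û_{iβ} solves the scalar equation T_{i1}(β, u_i) = 0, which a bracketing root finder handles directly. The bracket width and parameter values below are illustrative assumptions.

```python
# Estimate u_i for fixed beta by solving T_i1(beta, u_i) = 0 with a bracketing root finder.
import numpy as np
from scipy.optimize import brentq

def T_i1(u, y, x, w, beta, phi_y=1.0, phi_x=0.5):
    mu_y = np.exp(beta[0] * w + beta[1] * u)
    # y-part: beta1*(y - mu_y)/phi_y for the canonical log link; x-part: (x - u)/phi_x
    return beta[1] * (y - mu_y) / phi_y + (x - u) / phi_x

def estimate_u(y, x, w, beta, span=10.0):
    # bracket the root around the observed x, a natural starting guess for u
    return brentq(T_i1, x - span, x + span, args=(y, x, w, beta))

print(estimate_u(y=3, x=0.8, w=1.0, beta=(0.0, np.log(1.5))))
```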

In Section 4, we examine theoretical implications of and practical considerations for using Ŝ* for inferences on β.

In the case where φ_y is unknown, an analogous procedure to that for deriving S*_i is implemented. Substituting R_i for S_i and φ_y for β, a linear combination R*_i of (R_i, T_{i1}, T_{i2}) that is a second-order locally u_i-ancillary estimating function for φ_y-inferences is obtained. Estimation of (β^T, φ_y)^T is accomplished via solution of Ŝ* = 0 and R̂* = Σ_i R̂*_i = 0. Simultaneous solution of (Ŝ*^T, R̂*)^T = 0 is required, however, as S* is not ancillary for φ_y.

4 Inferences with the plug-in SOLAQS Ŝ*

4.1 Introduction

In this section, we further develop the use of Ŝ* for inferences on the regression parameter β. In the following subsection, we consider the bias in Ŝ*_i as the measurement error variance φ_x → 0. Having shown that the bias is small, in Section 4.3 we consider the asymptotic distribution of β̂ as n → ∞ for fixed φ_x, where β̂ is the solution to Ŝ* = 0. Finally, we address some computational issues in solving Ŝ* = 0. The Fisher scoring computational technique for solving Ŝ* = 0 is sketched in Appendix C.

4.2 Small measurement error asymptotic bias in Ŝ*

Here, we study the behavior of Ŝ*_i under small measurement error asymptotics (Carroll and Stefanski 1990). Formally, we hold n fixed and consider a series of experiments in which the measurement error dispersion φ_x → 0. That φ_x → 0 need not reflect a true limiting operation in practice. Rather, since exact bias

analysis can be quite difficult, it serves as an analytic tool, yielding order-of-magnitude approximations that provide some insight into the performance of the method with respect to bias correction. We establish that the asymptotic bias in Ŝ*_i is of smaller order than that of the naive plug-in score Ŝ_i wherein x_i replaces u_i. We suppress the subscript i and operate on one observation at a time. A proof is in Appendix B.

Letting prime denote differentiation with respect to u, we have

Ŝ* − S* = (û_β − u) S*′ + (1/2)(û_β − u)² S*″ + (1/6)(û_β − u)³ S*‴(u*_β),   (8)

where |u*_β − u| ≤ |û_β − u|. We show in Appendix B that (û_β − u) = D_{11}^{-1} T_{1x} + O_p(φ_x), so that the first term of (8) can be written

(û_β − u) S*′ = S*′ T_{1x} D_{11}^{-1} + S*′ {(û_β − u) − D_{11}^{-1} T_{1x}}.   (9)

Regarding (8) and (9), we note here two interesting facts on which the proof of the following result is founded, and which are direct results of the construction of S*. First, E(S*′) = E(S*″) = 0, due to second-order local ancillarity. Second, due to the joint optimality of the quasiscores (S, T_1) for (β, u), E(S*′ T_{1x}) ≈ 0; that is, S*′ and T_{1x} are approximately orthogonal.

Theorem 1. Let û_β be the solution of T_1 = 0 for fixed β. Then for the true β, Ŝ* = S*(û_β) = S* + Z + O_p(φ_x), where Z = O_p(φ_x^{1/2}) and is unbiased. Furthermore, under uniform integrability, E(Ŝ* − S*) = O(φ_x^{3/2}).

We note as a point of comparison that, using x to estimate u, (Ŝ − S) is also O_p(φ_x^{1/2}), but with bias of order O(φ_x).
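For intuition about the O(φ_x) naive bias that Theorem 1 improves on, the following Monte Carlo sketch fits a Poisson log-linear model with u replaced by x and shows the familiar attenuation of the slope. The simple Newton solver and all parameter values are illustrative assumptions.

```python
# Monte Carlo sketch of naive plug-in bias: the estimator that substitutes x for u
# is visibly attenuated relative to the estimator that uses the true u.
import numpy as np

rng = np.random.default_rng(1)

def fit_poisson(y, Z, n_iter=25):
    """Plain Newton-Raphson for a Poisson log-linear model with design Z."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = np.exp(Z @ beta)
        score = Z.T @ (y - mu)
        info = Z.T @ (Z * mu[:, None])
        beta = beta + np.linalg.solve(info, score)
    return beta

n, beta1, phi_x = 5000, np.log(2.0), 0.5
u = rng.normal(size=n)
y = rng.poisson(np.exp(beta1 * u))
x = u + rng.normal(scale=np.sqrt(phi_x), size=n)

b_true = fit_poisson(y, np.column_stack([np.ones(n), u]))
b_naive = fit_poisson(y, np.column_stack([np.ones(n), x]))
print(b_true[1], b_naive[1])   # the naive slope is attenuated toward zero
```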

4.3 Asymptotic distribution of β̂

We now examine the asymptotic behavior of β̂ for fixed measurement error variance φ_x. The following discussion applies equally to the parameter (β^T, φ_y)^T when φ_y is being estimated simultaneously with β; simply replace Ŝ*_i throughout with the vector (Ŝ*_i^T, R̂*_i)^T. By the standard theory of estimating functions (e.g., Carroll, et al. 1995, Appendix A.3), there exists β̂, a sequence of solutions to Ŝ* = 0, such that β̂ → β* in probability, where β* is the solution to the limiting equation lim_{n→∞} (1/n) Ŝ* = 0 that is closest to the true β. By Theorem 1, β* is close to the true β; this result is similar to those of other methods (e.g., Stefanski 1985; Carroll and Stefanski 1990) in that the bias is not completely eliminated, except in very specialized cases. Of course, there is no guarantee that there is a unique solution to Ŝ* = 0, even as n → ∞, so β̂ must be carefully defined in practice. Our approach to this problem is given in the next section, but a general solution may not exist without further assumptions.

By estimating function theory, √n(β̂ − β*) →d N{0, A^{-1} B (A^{-1})^T}, where

A = lim_{n→∞} n^{-1} Σ_{i=1}^n E(−∂Ŝ*_i/∂β)   and   B = lim_{n→∞} n^{-1} Σ_{i=1}^n E(Ŝ*_i Ŝ*_i^T),

and all quantities are evaluated at β*. The variance factor B can be consistently estimated by replacing β* with β̂ and using the empirical expected value of Ŝ*_i Ŝ*_i^T. To estimate A, we employ a numerical derivative matrix, as follows. At the estimated β̂, the kth column of (∂Ŝ*/∂β) is estimated by

{Ŝ*(β̂ + D_k) − Ŝ*(β̂ − D_k)} / (2 d_k),

where d_k is a perturbation and D_k = (0, …, 0, d_k, 0, …, 0)^T is a p-vector of zeros, with d_k in the kth position.
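The following sketch implements the sandwich calculation described above: A estimated by central differences of the summed estimating function, B by the empirical outer product of per-observation contributions. Here `S_hat_i` is a hypothetical callable returning the n × p matrix of summands at a given β; it stands in for model-specific code and is an assumption of this sketch.

```python
# Sandwich variance estimate with a numerical (central-difference) derivative for A.
import numpy as np

def sandwich(beta_hat, S_hat_i, d=0.1):
    n, p = S_hat_i(beta_hat).shape
    A = np.zeros((p, p))
    for k in range(p):
        e = np.zeros(p); e[k] = d
        A[:, k] = (S_hat_i(beta_hat + e).sum(0) - S_hat_i(beta_hat - e).sum(0)) / (2 * d)
    A /= n                                 # sign convention is immaterial: A enters twice
    Si = S_hat_i(beta_hat)
    B = Si.T @ Si / n
    Ainv = np.linalg.inv(A)
    return Ainv @ B @ Ainv.T / n           # estimated covariance matrix of beta_hat
```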

Wald-type confidence intervals for β* can then be constructed from Â^{-1} B̂ (Â^{-1})^T in the standard fashion.

In the case where the measurement error parameters γ = (α^T, φ_x)^T are estimated as well by internal replication and/or validation data (Carroll, et al. 1995), a modified standard error estimator applies. Assume that γ is estimated via solution to some estimating equation U = Σ_i U_i = 0. Assume further that U_i does not depend on β. This is not unreasonable, since in most settings U_i will be a function of (x_i, u_i, w_i), but not y_i. Define the quantities

C = lim_{n→∞} n^{-1} Σ_{i=1}^n E(−∂Ŝ*_i/∂γ)   and   D = lim_{n→∞} n^{-1} Σ_{i=1}^n E(−∂U_i/∂γ).

Then, we show in Appendix D that √n(β̂ − β*) →d N{0, A^{-1} B*(A^{-1})^T}, where

B* = lim_{n→∞} n^{-1} Σ_{i=1}^n E{(Ŝ*_i − C D^{-1} U_i)(Ŝ*_i − C D^{-1} U_i)^T},

and all quantities are evaluated at (β*^T, φ_y, γ^T)^T. As with A, C can be estimated using a numerical derivative, D can be estimated in the usual way from U, and B* can be estimated using the empirical variance of Ŝ*_i − C D^{-1} U_i.

When γ is estimated using external replication and/or validation data (Carroll, et al. 1995), an augmented data set consisting of the concatenation of the primary data and the external data is employed. By setting U_i = 0 for the primary data and S*_i = 0 for the external data, the augmented data can be analyzed as for internal replication and/or validation data.

4.4 Computational issues

Our experience thus far with the proposed method has suggested three computational techniques that provide for numerical stabilization of the estimation of β.

First, before solving Ŝ* = 0, we transform the design matrix containing the vectors w_i to form an orthonormal basis. Also, using φ_x and the sample variance of x, we transform x such that the empirical distribution function of u has mean zero and variance one. These transformations have the additional advantage of permitting the perturbations d_k to be set to a fixed constant for all k and sample sizes. We use d_k = 0.1. Second, for smaller sample sizes, there is some instability in the simultaneous solution of S* = 0, T_{11} = 0, …, T_{n1} = 0. This can be largely alleviated by estimating u_i as the solution to T*_{i1,n} = 0 for each i, where T*_{i1,n} is the same as T_{i1}, replacing φ_x by a smaller quantity, φ*_n, thereby weighting the estimate û_{iβ} towards x_i. We used φ*_n = (1 − 10p/n)φ_x in our simulation work. Third, the uniform integrability assumption made at several points in the proof of Theorem 1 has implications for the estimation of u_i. In some settings, it may be necessary to bound the permissible values of û_{iβ} by quantities that are scientifically reasonable for the application at hand. As such bounds can be relatively wide, we see no way in which the need to specify them would restrict the applicability of the proposed method.

Additionally, two issues arise when φ_x is large: lack of convergence and multiple solutions. To choose among possible multiple solutions, we take the naive estimator of (β^T, φ_y)^T as a starting value for the Fisher scoring procedure (Appendix C). Lack of convergence can occur if the algorithm diverges to a point with one or more singular matrices. Alternatively, a solution may not be reached after the maximum number of iterations (we use 100). Interestingly, in our experience with simulated data, these problems are more frequent when φ_y is known rather than estimated simultaneously with β.
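A minimal sketch of the two stabilizing transformations described above, assuming a classical additive error model: the w-design is replaced by an orthonormal basis via QR, and x is rescaled so that the implied u has roughly mean zero and variance one. The names and the small variance floor are conveniences of this sketch.

```python
# Stabilizing transformations: orthonormalize the design and standardize x to the u scale.
import numpy as np

def stabilize(W, x, phi_x):
    Q, _ = np.linalg.qr(W)                        # orthonormal basis for the columns of W
    var_u = max(x.var(ddof=1) - phi_x, 1e-8)      # implied var(u) when var(x) = var(u) + phi_x
    x_std = (x - x.mean()) / np.sqrt(var_u)       # standardized surrogate on the u scale
    phi_x_std = phi_x / var_u                     # error variance on the new scale
    return Q, x_std, phi_x_std
```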

However, when φ_y is being estimated and φ_x is large (e.g., greater than var(u_i)), multiple solutions to Ŝ* = 0 appear to exist, and the one obtained by starting from the naive estimator may not be consistent for β*.

5 Example: Log-linear regression

We now illustrate the applicability and performance of the SOLAQS by comparing it to other methods in the context of a special case of models (1) and (2), the log-linear regression model (McCullagh and Nelder 1989, Ch. 6)

log(μ_{yi}) = β_0^T w_i + β_1 u_i   and   v_{yi} = φ_y μ_{yi},   (10)

with additive measurement error

x_i = u_i + δ_i,   where E(δ_i | u_i) = 0 and var(δ_i | u_i) = φ_x.   (11)

We consider assumptions underlying other approaches and present simulation results comparing the SOLAQS to these competitors.

5.1 Other approaches

Were (Y_i | w_i, u_i) a true Poisson random variable and δ_i ~ N(0, φ_x), the conditional score method of Stefanski and Carroll (1987) would apply and give rise to the semiparametric efficient estimator for β in the presence of unknown distribution for (U_i | w_i). However, implementation is hindered by two concerns. First, the conditional score does not take a closed form; rather, iterative computations are required to compute it. Second, it is not known to what degree the conditional score is robust to misspecifications of the distributional form

of (Y_i, X_i | u_i, w_i) (Carroll and Wand 1991). We note that the first problem can be addressed approximately via projection (Waterman and Lindsay 1996a,b). Indeed, due to the equivalence of quasiscores and likelihood scores in exponential family distributions, S* with φ_y = 1 is the second-order projected score for the Poisson-Gaussian model. Furthermore, in many problems, (Y_i | u_i, w_i) is overdispersed relative to a true Poisson random variable, rendering a likelihood difficult to specify.

Alternatives not requiring a likelihood include regression calibration (RC; Carroll, et al. 1995, Ch. 3) and the SIMEX estimator (Cook and Stefanski 1994). Treating u_i as a random variable, the RC method replaces u_i with an estimate of E(U_i | w_i, x_i), thereby exploiting a distributional assumption on (U_i | w_i), which may or may not be valid. However, if the distribution of (U_i | w_i, x_i) is Gaussian, then in the log-linear model the RC method is particularly applicable and relatively statistically efficient for β̂_1. The fully functional SIMEX estimator is easy to implement and quite general. It does, however, require a distributional form for (X_i | u_i, w_i). In contrast, the SOLAQS method only requires the mean and variance of (X_i | u_i, w_i).

5.2 Simulation study

We now compare the empirical performance of SOLAQS to that of RC in a simulation study of models (10) and (11). In each simulation, we compare the naive estimator (the solution to Ŝ = 0, with u_i replaced by x_i), the RC estimator with the linear approximation calibration estimator as described in Carroll, et al. (1995, Section 3.4.2), and the SOLAQS estimator using (Ŝ*^T, R̂*)^T to estimate β and φ_y.

19 mate fi and ffi y. We include the naive estimator to provide an indication of the degree of bias correction needed. The RC method, in cases where its assumptions are satisfied, will permit an assessment of the efficiency loss in SOLAQS by not eploiting the distribution of (U i jw i ). Computational procedures in Section 4.3, 4.4 and Appendi C were used for estimation and confidence interval construction with ( ^S T ; ^R ) T. The error variance ffi, which in practice is easily-estimated with replication data, was assumed known for both the RC and the SOLAQS methods. The RC model parameters of the distribution of (u i ;w i ), which are not needed for SOLAQS, were estimated for each replicate. We concentrate on the coefficient fi 1 of u i. Overdispersed count data Y i were generated as a (3 : 7) Bernoulli miture of two Poisson random variables such that the mean and variance were μ yi and ffi y μ yi, respectively. The u i 's were standardized to have mean zero and variance one and the set of (w i ;u i )'s was fied over all replicates. The errors ffi i were mean-zero Gaussian with variance ffi. We considered relative rate values ep(fi 1 ) = (1:5; 3:), measurement error variance ffi = (:3; :7), and overdispersion ffi y =(1:5; 3:), Each simulation contains 5 replicates. Model 1. Let w i = 1 so that fi is the intercept. Set fi =. Let the u i 's be a random Gaussian sample. Set sample size n =. Results are in Table 1. Model is identical to Model 1, ecept that the u i 's are uniformly distributed, then standardized. Results are also in Table 1. 18

Model 3. Let w*_{i1} be Bernoulli with probability 0.3. To generate the u_i's, let ε_i be uniformly distributed on (0, 1). Then let u_i be the standardized version of (1 + c w*_{i1}) ε_i, c > 0. Let w_i = (1, w_{i1})^T, where w_{i1} is the standardized version of w*_{i1}. Fix β_0 = (0, log(1.5))^T and n = 250. Setting c = (0.5, 1.5) allows for different values of ρ_{wu} = corr(w_{i1}, u_i). Results are in Table 2.

Results. Of sixteen thousand replicates across the three models, in all but one replicate the SOLAQS converged in fewer than 100 iterations (result included), while 10 others took more than 50 iterations. These instances all fell under the last case of Model 2 (Table 1). The bias in the naive estimator reflects a substantial degree of measurement error in all cases.

For Model 1, the assumptions of RC are met, and therefore, by exploiting the distribution of (U_i | w_i), the RC method surpassed the SOLAQS in terms of bias and efficiency. With the exception of one case, however, use of the SOLAQS resulted in less than 20 percent precision loss, as measured by the MSE. For Models 2 and 3, where the RC assumptions are violated, SOLAQS yielded lower bias than the RC method in 19 of 24 cases. While the RC was notably biased in some cases, in no case was the bias in SOLAQS more than ten percent, and it was usually less than five percent. Compared to RC, the SOLAQS never resulted in more than 30 percent loss in precision. By contrast, the gains in precision in SOLAQS over RC were at times substantial.

SOLAQS coverage probabilities of Wald-type confidence intervals using the sandwich estimator were satisfactory for β_1 = log(1.5), but were anti-conservative at β_1 = log(3.0). This suggests that, in practice, for larger values of β_1, another method of variance estimation, such as the BCa bootstrap (Efron and Tibshirani 1993, Section 14.3) as suggested by Carroll, et al. (1995, Sections A.6.5-A.6.6), would be more appropriate.

5.3 Cortisol data

We illustrate our method with a data set from a study examining the relationship between salivary cortisol and symptoms of conduct disorder (CD) (McBurnett, Lahey, Rathouz and Loeber 2000). One hypothesis about the psychopathology of CD is that symptomatic behaviors occur with increased frequency due to subjects' suppressed fear response to threatening stimuli, such as punishment for disruptive behaviors. Because fear response is reflected in cortisol levels, we expect symptoms to be inversely related to cortisol. This hypothesis was examined in a clinic-referred sample of n = 38 boys with CD. Responses Y_i are the cumulative counts over four years of reported symptoms of aggressive CD and of covert CD. We treat the two sets of symptoms separately. Symptom counts each ranged from 0 to 13, with a median of 2. The 75th percentiles were 4 aggressive symptoms and 5 covert symptoms.

The covariate u_i of interest is the logarithm of salivary cortisol in Year 2 of the study. Since cortisol was measured in Years 2 and 4, we let X_{ij} = log(measured cortisol), where j denotes Year. Note that corr(x_{i2}, x_{i4}) = 0.20, suggesting substantial within-subject variation, in this case reflecting laboratory error and temporal fluctuations. Age in the first study year was also obtained, presumably without error. Let w_i = (1, age_i)^T.

We model the error in the log-cortisol measurements as E(X_{ij} | u_i) = u_i + α·1(j = 4) and var(X_{ij} | u_i) = φ̃_x. Consequently, define X̄_i = (X_{i2} + X_{i4} − α)/2, so that E(X̄_i | u_i) = u_i and var(X̄_i | u_i) = φ̃_x/2 = φ_x. Rescaling X̄_i so that sd(u_i) ≈ 1, we first estimated the error model as α̂ = x̄_4 − x̄_2 = 0.84 and φ̂_x = 0.25 times the sample variance of (x_{i2} − x_{i4} + α̂), which gives φ̂_x = 1.99.
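A sketch of the replicate-based error-model estimates for this setup, under the stated model with a year-4 shift α and equal error variances in the two years; the exact scaling conventions below are reconstructed assumptions rather than a transcription of the authors' code.

```python
# Moment estimates of (alpha, phi_x) from two replicate log-cortisol measurements.
import numpy as np

def error_model_from_replicates(x2, x4):
    alpha_hat = x4.mean() - x2.mean()            # estimated year-4 shift
    xbar = (x2 + x4 - alpha_hat) / 2.0           # averaged surrogate with E(xbar | u) = u
    # var(x2 - x4 + alpha) = 2 * var(error) and var(xbar | u) = var(error) / 2,
    # so the error variance of xbar is one quarter of the adjusted-difference variance
    phi_xbar = 0.25 * np.var(x2 - x4 + alpha_hat, ddof=1)
    return alpha_hat, xbar, phi_xbar
```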

22 ffi(j = 4) and var(x ij j u i ) = ffi ~. Consequently, define X i = (X i + X i4 ff)=, so that E(X i j u i ) = u i and var(x i j u i ) = ffi ~ = = ffi. Rescaling X i so that sd(u i ) ß 1, we first estimated the error model as ^ff = μ 4 μ = :84 and ^ffi = :5 times the sample variance of ( i i4 + ^ff) = 1:99. We then estimated fi and ffi y simultaneously using the naive and SOLAQS methods (Table 3); the correction ffi Λ n (Section 4.4) was not used. Due to the small sample size and to account for variability in estimation of ff and ffi, bootstrap BCa confidence intervals were generated instead of Wald-type confidence intervals. As epected, correcting for the measurement error makes moderate difference in the parameter estimates for the age coefficients, while use of ^S provides a correction for attenuation of about 7 percentin^fi 1 for each outcome. Nevertheless, the effect is not significant for the Covert CD outcome. Confidence intervals are considerably wider for ^S, reflecting the variability induced in correcting for the bias due to measurement error. Furthermore, the results suggest that, due to the error in u i, the dispersion ffi y is substantially overestimated in the naive analysis. 6 Concluding Remarks We have outlined a quasilikelihood-based method for obtaining second order locally ancillary estimating functions for regression problems subject to errors in covariates. Prior to Rathouz and Liang (1999), local ancillarity had been achieved using the method of L projection (Waterman and Lindsay 1996a), generally requiring a likelihood specification. By replacing the projection operator proposed by Waterman and Lindsay with the solution to a simple linear 1

system, our method achieves local ancillarity assuming only that the first two moments of the response and the surrogate covariate are correctly specified. Furthermore, through a functional modeling approach, we avoid any assumptions on the baseline distribution of u. Simulations in the context of log-linear regression show that the SOLAQS estimator is generally less biased than the RC estimator when the RC assumptions were violated, and often results in a substantial increase in precision. Corresponding results in larger sample sizes are expected to be more dramatic, due to the more important role of bias. In the following, we briefly remark on a few additional aspects of the method.

Asymptotic approximations focusing on the limiting behavior of Ŝ*_i as φ_x → 0 were discussed in Section 4.2. Of more direct interest is the estimator β̂ that solves Ŝ* = 0; such approximations are considerably more difficult to study. Nevertheless, the work of Stefanski (1985) described in Section 2 (equation 6) and its relationship to second-order local ancillarity suggests the following: For fixed φ_x, as n → ∞, β̂ converges to β*, which differs from β by a quantity of order o(φ_x). Also, as with some existing estimating functions for measurement error (e.g., Stefanski and Carroll 1987), Ŝ* may not admit a unique root even in its limit as n → ∞. This may be especially true for larger values of φ_x. Given that Ŝ*_i → S_i as φ_x → 0, this is not expected to pose practical problems for small measurement error variance.

There are several reasons to believe that Ŝ* is reasonably efficient. First, it is based on quasiscores for β and u, which are the efficient estimating functions that are linear in the data (Crowder 1987). Second, the conditional score function gives rise to the semiparametric efficient score for β when the

distribution of (u | w) is unspecified. Third, the second-order locally ancillary projected score of Waterman and Lindsay (1996a) emulates the conditional score in terms of bias and efficiency when it exists. Finally, our method is the quasiscore analogue of the projected score. In-depth efficiency studies are a subject for further research.

Finally, extension of the SOLAQS to multiple mismeasured covariates is straightforward in the case where the variance-covariance matrix of (X_i | u_i, w_i) is reliably estimable. If the measurement errors for the components of u_i are independent, or if the errors are additive, this will generally not pose a problem.

APPENDIX A: Components of matrix D

Straightforward calculations using derivatives, expected values, and the definition of the operators b_k(·), k = 1, 2, lead to the following expressions for the components of D_i for the ith observation. For details, see Rathouz and Liang (1999) and the technical report referred to there. With derivatives taken with respect to u_i,

D_{i0} = (∂μ_{yi}/∂β)^T (∂μ_{yi}/∂β) / v_{yi},
D_{i1} = (∂μ_{yi}/∂β)^T (∂μ_{yi}/∂u_i) / v_{yi},
D_{i2} = (∂μ_{yi}/∂β)^T (∂²μ_{yi}/∂u_i²) / v_{yi},
D_{i11} = (∂μ_{yi}/∂u_i)² / v_{yi} + (∂μ_{xi}/∂u_i)² / v_{xi} = D_{i11y} + D_{i11x},
D_{i12} = (∂μ_{yi}/∂u_i)(∂²μ_{yi}/∂u_i²) / v_{yi} + (∂μ_{xi}/∂u_i)(∂²μ_{xi}/∂u_i²) / v_{xi} = D_{i21}.

The remaining component D_{i22} = b^i_2(T_{i2}) decomposes analogously into y- and x-parts, D_{i22y} and D_{i22x}, plus a cross term in D_{i11y} and D_{i11x}; these involve the first and second u_i-derivatives of (μ_{yi}, v_{yi}) and of (μ_{xi}, v_{xi}).

In the case where φ_y is estimated via R̂* simultaneously with β, additional components of D_i corresponding to R_i are required. They follow from the same calculations applied to R_i and involve (φ_y v_{yi})^{-1} together with the derivatives of μ_{yi} and v_{yi} with respect to β and u_i.

APPENDIX B: Sketch Proof of Theorem 1

To prove Theorem 1, we study (8) and (9) in four steps. In Step 1 (Lemmas 2 and 3), we examine the distribution of û_β via decomposition into terms of different orders. Step 2 involves deriving expressions for the first term in (9), from which the stochastic order and bias are determined. To accomplish this, we show S*′ = S̃′ + O_p(φ_x) (Lemma 5). Then, via important information equalities (Lemma 6), Corollary 7 establishes that S̃′ is orthogonal to T_{1x}. In Steps 3 and 4, we derive expressions for the second term in (9) (Lemma 8) and the last two terms of (8) (Lemma 9), respectively, from which stochastic order and bias are determined. Throughout, we hold β and w fixed at the true

values and take the model assumptions in Section 3 as given. More detailed proofs are in a technical report available from the first author.

Lemma 2. û_β →p u as φ_x → 0, and (û − u) = O_p(φ_x^{1/2}).

Proof. Let u_0 be the true value of u. For u ∈ R, φ_x T_1(u) = φ_x T_{1y}(u) + φ_x T_{1x}(u) = φ_x O_p(1) + (∂μ_x/∂u)(u, w) ṽ_x(u, w)^{-1} {x − μ_x(u, w)}. Since var{X − μ_x(u_0, w)} = φ_x ṽ_x(u_0, w) → 0, X →L μ_x(u_0, w), which implies that X →p μ_x(u_0, w). Also, φ_x O_p(1) →p 0, so φ_x T_1(u) →p T*_1(u), where T*_1(u) = (∂μ_x/∂u)(u, w) ṽ_x(u, w)^{-1} {μ_x(u_0, w) − μ_x(u, w)}. By monotonicity of μ_x(u), T*_1(u) > 0 if u < u_0, and T*_1(u) < 0 if u > u_0. Following arguments in Serfling (1980, Section 7.2.1), pr{|û_β − u_0| < ε} → 1 as φ_x → 0, completing the consistency proof. The order O_p(φ_x^{1/2}) is shown via Taylor series expansion.

For the remainder of the proofs, let u be the true value, let prime (′) denote differentiation with respect to u, and assume all functions S and T and their derivatives S′, T′, etc., are evaluated at the true u. Let û = û_β.

Lemma 3. Under the conditions of Lemma 2 and mild smoothness conditions on T_1(u),

(û_β − u) = D_{11}^{-1} T_{1x} + Z_1 + Z_2 = D_{11}^{-1} T_1 + Z_3 = O_p(φ_x^{1/2}),

where D_{11}^{-1} T_{1x} does not depend on y, Z_1 = Z_1(x) = O_p(φ_x), Z_2 = Z_2(y, x) = O_p(φ_x^{3/2}), and Z_3 = Z_3(y, x) = O_p(φ_x).

Proof. Recall that D_{11} = E(−T′_1), and define D_{11y} = E(−T′_{1y}) and D_{11x} = E(−T′_{1x}). Straightforward inspection provides the following orders of stochastic or fixed magnitude as φ_x → 0: T_{1y} = O_p(1), T_{1x} = O_p(φ_x^{-1/2}), D_{11y} =

O(1), D_{11x} = O(φ_x^{-1}), T′_{1y} = O_p(1), T′_{1x} = O_p(φ_x^{-1}), T′_1 = O_p(φ_x^{-1}), T′_1 + D_{11} = O_p(φ_x^{-1/2}), T′_{1x} + D_{11x} = O_p(φ_x^{-1/2}), T″_{1x} = O_p(φ_x^{-1}), T″_1 = O_p(φ_x^{-1}), T″_1 − E(T″_1) = O_p(φ_x^{-1/2}), T‴_{1y} = O_p(1), T‴_{1x} = O_p(φ_x^{-1}). Also, T″_1(u*) and T‴_1(u*) are both O_p(φ_x^{-1}) for u* in a neighborhood of u, by smoothness of T_1. Finally, (û − u) = o_p(1), by Lemma 2. The remainder of the proof involves third-order Taylor series expansions of T_1 in u, setting T_1 = T_{1x} + T_{1y}.

Lemma 4. Let the matrix a = (a_1, a_2), where a_k is p × 1, k = 1, 2. Then a_1 = ã_1 + O(φ_x²) = O(φ_x), where ã_1 = D_1 D_{11}^{-1}. And, a_2 = ã_2 + O(φ_x³) = O(φ_x²), where ã_2 = (D_2 D_{11} − D_1 D_{21}) / (2 D³_{11}). Further, a′_1 = ã′_1 + O(φ_x²) = O(φ_x), a″_1 = O(φ_x), a′_2 = O(φ_x²), a″_2 = O(φ_x²), and ã′_1 = (D′_1 D_{11} − D_1 D′_{11}) / D²_{11}.

Proof. By the expressions in Appendix A, D_1 = O(1), D_2 = O(1), D_{11} = O(φ_x^{-1}), D_{21} = O(φ_x^{-1}), and D_{22} = 2D²_{11} + O(φ_x^{-1}) = O(φ_x^{-2}). Taylor-series expansions and order-of-magnitude bookkeeping complete the proof.

Lemma 5. The u-derivative S*′ of S* is S*′ = S̃′ + O_p(φ_x), where

S̃′ = (S′ + D_1) − ã_1 (T′_1 + D_{11}) − (ã′_1 − 2 ã_2 D_{11}) T_1;

(S′ + D_1) = O_p(1) and the other two terms are O_p(φ_x^{1/2}).

Proof. Recall that D_1 = E(−S′), D_{11} = E(−T′_1), and D_{21} = E(−T′_2). In addition to the orders of magnitude in the proof of Lemma 3, S′ + D_1 = O_p(1), T_1 = O_p(φ_x^{-1/2}), T′_1 + D_{11} = O_p(φ_x^{-1/2}), T_2 = O_p(φ_x^{-1}), and T′_2 + D_{21} = O_p(φ_x^{-3/2}). Write S*′ = S′ − a_1 T′_1 − a′_1 T_1 − a_2 T′_2 − a′_2 T_2. By second-order local ancillarity and equation (5), S*′ is unbiased, so we may write

S*′ = (S′ + D_1) − a_1 (T′_1 + D_{11}) − a′_1 (T_1) − a_2 (T′_2 + D_{21}) − a′_2 (T_2),

for which each term in parentheses is unbiased. Working term-by-term, the first is O_p(1). Using Lemma 4, the second is a_1(T′_1 + D_{11}) = ã_1(T′_1 + D_{11}) + O_p(φ_x^{3/2}) = O_p(φ_x^{1/2}). Similarly, the third term is a′_1 T_1 = ã′_1 T_1 + O_p(φ_x^{3/2}) = O_p(φ_x^{1/2}). For the fourth term, use Lemma 6 and equation (5) to decompose T′_2 + D_{21} into centered components, and thereby show that a_2(T′_2 + D_{21}) = −2 ã_2 T_1 D_{11} + O_p(φ_x) = O_p(φ_x^{1/2}). The last term is a′_2 T_2 = O_p(φ_x), completing the proof.

Lemma 6. The unbiased estimating functions (S′ + D_1) and (T′_1 + D_{11}) are information unbiased with respect to T_1. That is, E{−(S′ + D_1)′} = E{(S′ + D_1) T_1} and E{−(T′_1 + D_{11})′} = E{(T′_1 + D_{11}) T_1}.

Proof. The results are shown through application of equation (5), manipulations of the expressions in Appendix A, and the surrogacy assumption.

Corollary 7. S*′ T_{1x} D_{11}^{-1} = S̃′ T_{1x} D_{11}^{-1} + O_p(φ_x^{3/2}) = O_p(φ_x^{1/2}), where E(S̃′ T_{1x}) = 0. So, under uniform integrability, E(S*′ T_{1x} D_{11}^{-1}) = O(φ_x^{3/2}).

Proof. T_{1x} D_{11}^{-1} = O_p(φ_x^{1/2}), so S*′ T_{1x} D_{11}^{-1} = S̃′ T_{1x} D_{11}^{-1} + O_p(φ_x^{3/2}) = O_p(φ_x^{1/2}). E(S̃′ T_{1x}) = 0 is shown using the expressions in Lemmas 4, 5, and 6.

Lemma 8. The quantity S*′ {(û_β − u) − D_{11}^{-1} T_{1x}} is of order O_p(φ_x), but its bias under uniform integrability is of order O(φ_x^{3/2}).

Proof. By Lemmas 3 and 5, S*′ {(û_β − u) − D_{11}^{-1} T_{1x}} = O_p(φ_x). By the same lemmas, we may write

S*′ {(û_β − u) − D_{11}^{-1} T_{1x}} = (S′ + D_1) Z_1(x) + O_p(φ_x^{3/2}) + O_p(φ_x^{3/2}).

The expectation of the first term in this expression is 0, by independence.

Lemma 9. The quantity (û_β − u)² S*″ is of order O_p(φ_x), but its bias under uniform integrability is of order O(φ_x^{3/2}). The quantity (û_β − u)³ S*‴(u*_β) is of order O_p(φ_x^{3/2}), and its bias under uniform integrability is of order O(φ_x^{3/2}).

Proof. First,

S*″ = S″ − a_1 T″_1 − 2 a′_1 T′_1 − a″_1 T_1 − a_2 T″_2 − 2 a′_2 T′_2 − a″_2 T_2.

Then, since S*″ is unbiased, we may replace each term with its centered version, i.e., S″ − E(S″), a_1{T″_1 − E(T″_1)}, and so on. Then S″ − E(S″) = O_p(1), T^{(k)}_1 − E(T^{(k)}_1) = O_p(φ_x^{-1/2}), and T^{(k)}_2 − E(T^{(k)}_2) = O_p(φ_x^{-3/2}), k = 1, 2. Write S*″ = {S″ − E(S″)} + {S*″ − S″ + E(S″)}. Order-of-magnitude bookkeeping shows that S*″ − S″ + E(S″) = O_p(φ_x^{1/2}). Therefore, (û_β − u)² S*″ = O_p(φ_x). Applying Lemma 3, write

(û_β − u)² S*″ = D_{11}^{-2} T²_{1x} {S″ − E(S″)} + O_p(φ_x^{3/2}).

The expected value of the first term in the foregoing expression is 0, by independence. For the second result, straightforward computations show S*‴ = O_p(1). Also, (û_β − u)³ = O_p(φ_x^{3/2}); assuming S*‴(u) is sufficiently smooth in u such that S*‴(u*_β) = O_p(1), then (û_β − u)³ S*‴(u*_β) = O_p(φ_x^{3/2}), completing the proof.

Proof of Theorem 1. Let Z = D_{11}^{-1} S̃′ T_{1x}. Using expansions (8) and (9), and applying Corollary 7 and Lemmas 8 and 9, the proof is immediate.

APPENDIX C: Algorithm for solving Ŝ* = 0

The equation Ŝ*(β) = 0 can be solved by iterating two steps: (1) For fixed û_i's, take one step in the solution of Σ_i S*_i(β, û_i) = 0 for β. (2) For fixed β̂ and for each i, take one step in the solution of T*_{i1}(β̂, u_i) = 0 for u_i. The first step is implemented using a Fisher-scoring algorithm, exploiting the components of the matrix D_i. The precision matrix for β-inferences is built from the first set of columns, D_{i+} = (D_{i0}^T, D_{i10}^T, D_{i20}^T)^T, of D_i. Defining the matrix L_i = (I_p, −a_i), we have S*_i = L_i (S_i^T, T_{i1}, T_{i2})^T, and hence E{−(∂S*_i/∂β)} = L_i D_{i+}. Estimating this matrix by plugging in β̂^{(o)} and û_i^{(o)}, β̂^{(o)} is updated to β̂^{(n)} with

β̂^{(n)} = β̂^{(o)} + { Σ_i L_i D_{i+}(β̂^{(o)}) }^{-1} Ŝ*(β̂^{(o)}).

In the second step, we use a Newton-Raphson scheme to update û_i^{(o)} to û_i^{(n)}, using the observed rather than the expected derivatives of T*_{i1} with respect to u_i. We set the maximum number of iterations between (1) and (2) at 100.
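A skeleton of the two-step iteration in Appendix C, written at the level of the text: alternate one Fisher-scoring step in β (holding the û_i's fixed) with one Newton-Raphson step in each u_i (holding β fixed). The callables `S_step` and `u_step` stand in for model-specific code and are assumptions of this sketch.

```python
# Alternating Fisher-scoring (in beta) and Newton (in each u_i) iteration skeleton.
import numpy as np

def solve_solaqs(beta0, u0, S_step, u_step, max_iter=100, tol=1e-8):
    beta, u = np.asarray(beta0, float), np.asarray(u0, float)
    for _ in range(max_iter):
        beta_new = S_step(beta, u)      # one Fisher-scoring update of beta given the u_i's
        u_new = u_step(beta_new, u)     # one Newton update of each u_i given beta
        if np.max(np.abs(beta_new - beta)) < tol and np.max(np.abs(u_new - u)) < tol:
            return beta_new, u_new
        beta, u = beta_new, u_new
    return beta, u                      # return the last iterate if not converged
```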

APPENDIX D: Proof of Section 4.3 result

By standard estimating function theory, (γ̂ − γ) = −(∂U/∂γ)^{-1} U + o_p(n^{-1/2}). By Taylor-series expansion,

Ŝ*(β, γ̂) = Ŝ*(β, γ) + (∂Ŝ*/∂γ)(γ̂ − γ) + o_p(√n) = Ŝ*(β, γ) − C D^{-1} U + o_p(√n).

Since E(U) = 0 and (∂U/∂β) = 0, it is immediate that E{−(∂Ŝ*(β, γ̂)/∂β)} = E{−(∂Ŝ*(β, γ)/∂β)} + o(n), whose limit (scaled by n^{-1}) is A. Also,

E{Ŝ*(β, γ̂) Ŝ*(β, γ̂)^T} = E[{Ŝ*(β, γ) − C D^{-1} U}{Ŝ*(β, γ) − C D^{-1} U}^T] + o(n),

whose limit (scaled by n^{-1}) is B*. The result that √n(β̂ − β*) →d N{0, A^{-1} B*(A^{-1})^T} then follows from standard estimating function theory.

References

Carroll, R.J., Ruppert, D., and Stefanski, L.A. (1995), Measurement Error in Nonlinear Models, London: Chapman and Hall.

Carroll, R.J., and Stefanski, L.A. (1990), "Approximate Quasi-likelihood Estimation in Models with Surrogate Predictors," Journal of the American Statistical Association, 85.

Carroll, R.J., and Wand, M.P. (1991), "Semiparametric Estimation in Logistic Measurement Error Models," Journal of the Royal Statistical Society, Ser. B, 53.

Cook, J., and Stefanski, L.A. (1994), "A Simulation Extrapolation Method for Parametric Measurement Error Models," Journal of the American Statistical Association, 89.

Crowder, M. (1987), "On Linear and Quadratic Estimating Functions," Biometrika, 74.

Efron, B., and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, New York: Chapman and Hall.

Godambe, V.P. (1960), "An Optimum Property of Regular Maximum Likelihood Estimation," Annals of Mathematical Statistics, 31.

Godambe, V.P., and Thompson, M.E. (1974), "Estimating Equations in the Presence of a Nuisance Parameter," Annals of Statistics, 2.

Lindsay, B. (1982), "Conditional Score Functions: Some Optimality Results," Biometrika, 69.

Lindsay, B.G. (1985), "Using Empirical Partially Bayes Inference for Increased Efficiency," The Annals of Statistics, 13.

McBurnett, K., Lahey, B.B., Rathouz, P.J., and Loeber, R. (2000), "Low Salivary Cortisol and Persistent Aggression in Boys Referred for Disruptive Behavior," Archives of General Psychiatry, 57.

McCullagh, P. (1983), "Quasi-likelihood Functions," Annals of Statistics, 11.

McCullagh, P., and Nelder, J.A. (1989), Generalized Linear Models (2nd ed.), London: Chapman and Hall.

Rathouz, P.J., and Liang, K-Y. (1999), "Reducing Sensitivity to Nuisance Parameters in Semiparametric Models: A Quasiscore Method," Biometrika, 86.

Small, C.G., and McLeish, D.L. (1989), "Projection as a Method for Increasing Sensitivity and Eliminating Nuisance Parameters," Biometrika, 76.

Small, C.G., and McLeish, D.L. (1994), Hilbert Space Methods in Probability and Statistical Inference, New York: John Wiley and Sons.

Stefanski, L.A. (1985), "The Effects of Measurement Error on Parameter Estimation," Biometrika, 72.

Stefanski, L.A., and Carroll, R.J. (1987), "Conditional Scores and Optimal Scores for Generalized Linear Measurement Error Models," Biometrika, 74.

Waterman, R.P., and Lindsay, B.G. (1996a), "Projected Score Methods for Approximating Conditional Scores," Biometrika, 83.

Waterman, R.P., and Lindsay, B.G. (1996b), "A Simple and Accurate Method for Approximate Conditional Inference Applied to Exponential Family Models," Journal of the Royal Statistical Society, Ser. B, 58.

Wedderburn, R.W.M. (1974), "Quasi-likelihood Functions, Generalized Linear Models and the Gauss-Newton Method," Biometrika, 61.

Table 1. Simulation Study of Overdispersed Poisson Model with Intercept and Mismeasured Covariate. 500 Replicates.

[Table body not reproduced in this transcription. Columns: exp(β_1), φ_y, φ_x; % bias in β̂_1 (Ŝ*, RC); % CV²(β̂_1) (Ŝ*, RC); MSE Ratio; Coverage %; with separate panels for u_i ~ normal and u_i ~ uniform.]

NOTE: The full model is log(μ_y) = β_0 + β_1 u, with β_0 = 0. Variable u has standard deviation one; its distribution is given in the text. CV²(β̂_1) is the squared coefficient of variation, relative to the true β_1. MSE Ratio is the mean squared error of the RC estimator relative to the Ŝ* estimator. Coverage percent is for nominal 95% Wald-type confidence intervals for β_1 using Ŝ* = 0 and the variance estimator in Section 4.3.


More information

GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University

GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR Raymond J. Carroll: Texas A&M University Naisyin Wang: Xihong Lin: Roberto Gutierrez: Texas A&M University University of Michigan Southern Methodist

More information

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation Biometrika Advance Access published October 24, 202 Biometrika (202), pp. 8 C 202 Biometrika rust Printed in Great Britain doi: 0.093/biomet/ass056 Nuisance parameter elimination for proportional likelihood

More information

for function values or parameters, which are neighbours in the domain of a metrical covariate or a time scale, or in space. These concepts have been u

for function values or parameters, which are neighbours in the domain of a metrical covariate or a time scale, or in space. These concepts have been u Bayesian generalized additive mied models. study A simulation Stefan Lang and Ludwig Fahrmeir University of Munich, Ludwigstr. 33, 8539 Munich email:lang@stat.uni-muenchen.de and fahrmeir@stat.uni-muenchen.de

More information

Simulating Uniform- and Triangular- Based Double Power Method Distributions

Simulating Uniform- and Triangular- Based Double Power Method Distributions Journal of Statistical and Econometric Methods, vol.6, no.1, 2017, 1-44 ISSN: 1792-6602 (print), 1792-6939 (online) Scienpress Ltd, 2017 Simulating Uniform- and Triangular- Based Double Power Method Distributions

More information

Applications of Basu's TheorelTI. Dennis D. Boos and Jacqueline M. Hughes-Oliver I Department of Statistics, North Car-;'lina State University

Applications of Basu's TheorelTI. Dennis D. Boos and Jacqueline M. Hughes-Oliver I Department of Statistics, North Car-;'lina State University i Applications of Basu's TheorelTI by '. Dennis D. Boos and Jacqueline M. Hughes-Oliver I Department of Statistics, North Car-;'lina State University January 1997 Institute of Statistics ii-limeo Series

More information

6. Vector Random Variables

6. Vector Random Variables 6. Vector Random Variables In the previous chapter we presented methods for dealing with two random variables. In this chapter we etend these methods to the case of n random variables in the following

More information

Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes?

Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes? Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes? Song-Hee Kim and Ward Whitt Industrial Engineering and Operations Research Columbia University

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study Science Journal of Applied Mathematics and Statistics 2014; 2(1): 20-25 Published online February 20, 2014 (http://www.sciencepublishinggroup.com/j/sjams) doi: 10.11648/j.sjams.20140201.13 Robust covariance

More information

Measurement error as missing data: the case of epidemiologic assays. Roderick J. Little

Measurement error as missing data: the case of epidemiologic assays. Roderick J. Little Measurement error as missing data: the case of epidemiologic assays Roderick J. Little Outline Discuss two related calibration topics where classical methods are deficient (A) Limit of quantification methods

More information

Partitioning variation in multilevel models.

Partitioning variation in multilevel models. Partitioning variation in multilevel models. by Harvey Goldstein, William Browne and Jon Rasbash Institute of Education, London, UK. Summary. In multilevel modelling, the residual variation in a response

More information

Conditional Estimation for Generalized Linear Models When Covariates Are Subject-specific Parameters in a Mixed Model for Longitudinal Measurements

Conditional Estimation for Generalized Linear Models When Covariates Are Subject-specific Parameters in a Mixed Model for Longitudinal Measurements Conditional Estimation for Generalized Linear Models When Covariates Are Subject-specific Parameters in a Mixed Model for Longitudinal Measurements Erning Li, Daowen Zhang, and Marie Davidian Department

More information

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Journal of Modern Applied Statistical Methods Volume 4 Issue Article 8 --5 Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Sudhir R. Paul University of

More information

SYLLABUS FOR ENTRANCE EXAMINATION NANYANG TECHNOLOGICAL UNIVERSITY FOR INTERNATIONAL STUDENTS A-LEVEL MATHEMATICS

SYLLABUS FOR ENTRANCE EXAMINATION NANYANG TECHNOLOGICAL UNIVERSITY FOR INTERNATIONAL STUDENTS A-LEVEL MATHEMATICS SYLLABUS FOR ENTRANCE EXAMINATION NANYANG TECHNOLOGICAL UNIVERSITY FOR INTERNATIONAL STUDENTS A-LEVEL MATHEMATICS STRUCTURE OF EXAMINATION PAPER. There will be one -hour paper consisting of 4 questions..

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Working Paper No Maximum score type estimators

Working Paper No Maximum score type estimators Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

Short course A vademecum of statistical pattern recognition techniques with applications to image and video analysis. Agenda

Short course A vademecum of statistical pattern recognition techniques with applications to image and video analysis. Agenda Short course A vademecum of statistical pattern recognition techniques with applications to image and video analysis Lecture Recalls of probability theory Massimo Piccardi University of Technology, Sydney,

More information

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at Biometrika Trust Some Remarks on Overdispersion Author(s): D. R. Cox Source: Biometrika, Vol. 70, No. 1 (Apr., 1983), pp. 269-274 Published by: Oxford University Press on behalf of Biometrika Trust Stable

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random

TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random Paul J. Rathouz University of Chicago Abstract. We consider the problem of attrition under a logistic

More information

Exact and Approximate Numbers:

Exact and Approximate Numbers: Eact and Approimate Numbers: The numbers that arise in technical applications are better described as eact numbers because there is not the sort of uncertainty in their values that was described above.

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Economics 205 Exercises

Economics 205 Exercises Economics 05 Eercises Prof. Watson, Fall 006 (Includes eaminations through Fall 003) Part 1: Basic Analysis 1. Using ε and δ, write in formal terms the meaning of lim a f() = c, where f : R R.. Write the

More information

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use Modeling Longitudinal Count Data with Excess Zeros and : Application to Drug Use University of Northern Colorado November 17, 2014 Presentation Outline I and Data Issues II Correlated Count Regression

More information

An Introduction to GAMs based on penalized regression splines. Simon Wood Mathematical Sciences, University of Bath, U.K.

An Introduction to GAMs based on penalized regression splines. Simon Wood Mathematical Sciences, University of Bath, U.K. An Introduction to GAMs based on penalied regression splines Simon Wood Mathematical Sciences, University of Bath, U.K. Generalied Additive Models (GAM) A GAM has a form something like: g{e(y i )} = η

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Efficient Estimation of Population Quantiles in General Semiparametric Regression Models

Efficient Estimation of Population Quantiles in General Semiparametric Regression Models Efficient Estimation of Population Quantiles in General Semiparametric Regression Models Arnab Maity 1 Department of Statistics, Texas A&M University, College Station TX 77843-3143, U.S.A. amaity@stat.tamu.edu

More information

Numerical Methods. Root Finding

Numerical Methods. Root Finding Numerical Methods Solving Non Linear 1-Dimensional Equations Root Finding Given a real valued function f of one variable (say ), the idea is to find an such that: f() 0 1 Root Finding Eamples Find real

More information

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring A. Ganguly, S. Mitra, D. Samanta, D. Kundu,2 Abstract Epstein [9] introduced the Type-I hybrid censoring scheme

More information

14.30 Introduction to Statistical Methods in Economics Spring 2009

14.30 Introduction to Statistical Methods in Economics Spring 2009 MIT OpenCourseWare http://ocw.mit.edu 4.0 Introduction to Statistical Methods in Economics Spring 009 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of Index* The Statistical Analysis of Time Series by T. W. Anderson Copyright 1971 John Wiley & Sons, Inc. Aliasing, 387-388 Autoregressive {continued) Amplitude, 4, 94 case of first-order, 174 Associated

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1 MA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1 1 The General Bootstrap This is a computer-intensive resampling algorithm for estimating the empirical

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information

Estimators as Random Variables

Estimators as Random Variables Estimation Theory Overview Properties Bias, Variance, and Mean Square Error Cramér-Rao lower bound Maimum likelihood Consistency Confidence intervals Properties of the mean estimator Introduction Up until

More information

McGill University. Department of Epidemiology and Biostatistics. Bayesian Analysis for the Health Sciences. Course EPIB-675.

McGill University. Department of Epidemiology and Biostatistics. Bayesian Analysis for the Health Sciences. Course EPIB-675. McGill University Department of Epidemiology and Biostatistics Bayesian Analysis for the Health Sciences Course EPIB-675 Lawrence Joseph Bayesian Analysis for the Health Sciences EPIB-675 3 credits Instructor:

More information

A review of some semiparametric regression models with application to scoring

A review of some semiparametric regression models with application to scoring A review of some semiparametric regression models with application to scoring Jean-Loïc Berthet 1 and Valentin Patilea 2 1 ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France

More information

M.S. Project Report. Efficient Failure Rate Prediction for SRAM Cells via Gibbs Sampling. Yamei Feng 12/15/2011

M.S. Project Report. Efficient Failure Rate Prediction for SRAM Cells via Gibbs Sampling. Yamei Feng 12/15/2011 .S. Project Report Efficient Failure Rate Prediction for SRA Cells via Gibbs Sampling Yamei Feng /5/ Committee embers: Prof. Xin Li Prof. Ken ai Table of Contents CHAPTER INTRODUCTION...3 CHAPTER BACKGROUND...5

More information

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved

More information

A New Method for Dealing With Measurement Error in Explanatory Variables of Regression Models

A New Method for Dealing With Measurement Error in Explanatory Variables of Regression Models A New Method for Dealing With Measurement Error in Explanatory Variables of Regression Models Laurence S. Freedman 1,, Vitaly Fainberg 1, Victor Kipnis 2, Douglas Midthune 2, and Raymond J. Carroll 3 1

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Martin J. Wolfsegger Department of Biostatistics, Baxter AG, Vienna, Austria Thomas Jaki Department of Statistics, University of South Carolina,

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

POLYNOMIAL REGRESSION AND ESTIMATING FUNCTIONS IN THE PRESENCE OF MULTIPLICATIVE MEASUREMENT ERROR

POLYNOMIAL REGRESSION AND ESTIMATING FUNCTIONS IN THE PRESENCE OF MULTIPLICATIVE MEASUREMENT ERROR POLYNOMIAL REGRESSION AND ESTIMATING FUNCTIONS IN THE PRESENCE OF MULTIPLICATIVE MEASUREMENT ERROR Stephen J. Iturria and Raymond J. Carroll 1 Texas A&M University, USA David Firth University of Oxford,

More information

arxiv: v1 [stat.co] 26 May 2009

arxiv: v1 [stat.co] 26 May 2009 MAXIMUM LIKELIHOOD ESTIMATION FOR MARKOV CHAINS arxiv:0905.4131v1 [stat.co] 6 May 009 IULIANA TEODORESCU Abstract. A new approach for optimal estimation of Markov chains with sparse transition matrices

More information

Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION

Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION INTRODUCTION Statistical disclosure control part of preparations for disseminating microdata. Data perturbation techniques: Methods assuring

More information

CSCI-6971 Lecture Notes: Probability theory

CSCI-6971 Lecture Notes: Probability theory CSCI-6971 Lecture Notes: Probability theory Kristopher R. Beevers Department of Computer Science Rensselaer Polytechnic Institute beevek@cs.rpi.edu January 31, 2006 1 Properties of probabilities Let, A,

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

Modified Simes Critical Values Under Positive Dependence

Modified Simes Critical Values Under Positive Dependence Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models Supporting Information for Estimating restricted mean treatment effects with stacked survival models Andrew Wey, David Vock, John Connett, and Kyle Rudser Section 1 presents several extensions to the simulation

More information

Approximate Median Regression via the Box-Cox Transformation

Approximate Median Regression via the Box-Cox Transformation Approximate Median Regression via the Box-Cox Transformation Garrett M. Fitzmaurice,StuartR.Lipsitz, and Michael Parzen Median regression is used increasingly in many different areas of applications. The

More information

SIMEX and TLS: An equivalence result

SIMEX and TLS: An equivalence result SIMEX and TLS: An equivalence result Polzehl, Jörg Weierstrass Institute for Applied Analysis and Stochastics, Mohrenstr. 39, 10117 Berlin, Germany polzehl@wias-berlin.de Zwanzig, Silvelyn Uppsala University,

More information

6 Pattern Mixture Models

6 Pattern Mixture Models 6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data

More information

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations Chapter 5 Statistical Models in Simulations 5.1 Contents Basic Probability Theory Concepts Discrete Distributions Continuous Distributions Poisson Process Empirical Distributions Useful Statistical Models

More information

Tolerance limits for a ratio of normal random variables

Tolerance limits for a ratio of normal random variables Tolerance limits for a ratio of normal random variables Lanju Zhang 1, Thomas Mathew 2, Harry Yang 1, K. Krishnamoorthy 3 and Iksung Cho 1 1 Department of Biostatistics MedImmune, Inc. One MedImmune Way,

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

Generalized, Linear, and Mixed Models

Generalized, Linear, and Mixed Models Generalized, Linear, and Mixed Models CHARLES E. McCULLOCH SHAYLER.SEARLE Departments of Statistical Science and Biometrics Cornell University A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS, INC. New

More information

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls under the restrictions of the copyright, in particular

More information

Introduction to Probability Theory for Graduate Economics Fall 2008

Introduction to Probability Theory for Graduate Economics Fall 2008 Introduction to Probability Theory for Graduate Economics Fall 008 Yiğit Sağlam October 10, 008 CHAPTER - RANDOM VARIABLES AND EXPECTATION 1 1 Random Variables A random variable (RV) is a real-valued function

More information

NON-NEGATIVE MATRIX FACTORIZATION FOR PARAMETER ESTIMATION IN HIDDEN MARKOV MODELS. Balaji Lakshminarayanan and Raviv Raich

NON-NEGATIVE MATRIX FACTORIZATION FOR PARAMETER ESTIMATION IN HIDDEN MARKOV MODELS. Balaji Lakshminarayanan and Raviv Raich NON-NEGATIVE MATRIX FACTORIZATION FOR PARAMETER ESTIMATION IN HIDDEN MARKOV MODELS Balaji Lakshminarayanan and Raviv Raich School of EECS, Oregon State University, Corvallis, OR 97331-551 {lakshmba,raich@eecs.oregonstate.edu}

More information

Projected partial likelihood and its application to longitudinal data SUSAN MURPHY AND BING LI Department of Statistics, Pennsylvania State University

Projected partial likelihood and its application to longitudinal data SUSAN MURPHY AND BING LI Department of Statistics, Pennsylvania State University Projected partial likelihood and its application to longitudinal data SUSAN MURPHY AND BING LI Department of Statistics, Pennsylvania State University, 326 Classroom Building, University Park, PA 16802,

More information

Web-based Supplementary Materials for A Robust Method for Estimating. Optimal Treatment Regimes

Web-based Supplementary Materials for A Robust Method for Estimating. Optimal Treatment Regimes Biometrics 000, 000 000 DOI: 000 000 0000 Web-based Supplementary Materials for A Robust Method for Estimating Optimal Treatment Regimes Baqun Zhang, Anastasios A. Tsiatis, Eric B. Laber, and Marie Davidian

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information