GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University

1 GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR

Raymond J. Carroll, Texas A&M University
Naisyin Wang, Texas A&M University
Xihong Lin, University of Michigan
Roberto Gutierrez, Southern Methodist University

Advertisement: Measurement Error in Nonlinear Models, by R. Carroll, D. Ruppert and L. Stefanski (Chapman & Hall, 1995). SAS and Splus programs for common GLIMs are at

2 OUTLINE

The overheads for this talk are available from the web if you have Adobe Acrobat 3.0 or higher. The main talk is at:
newmexico.losalamos.talks.directory/mixedmodels/nantuc01.pdf
The plots are called:
- nantucket.framingham.plot01.ps
- nantucket.simex.plot02.ps
- nantucket.framingham.plot02.ps
- nantucket.simex.plot03.ps
- nantucket.plot01.ps
- nantucket.simex.plot04.ps
- nantucket.plot02.ps
- nantucket.plot03.ps
- nantucket.simex.plot01.ps

3 OUTLINE

- Generalized Linear Mixed Models (GLMM)
- Generalized Linear Mixed Measurement Error Models (GLMMeM) are GLMMs with the wrong mean and variance structure
- Bias analysis:
  - Ordinary regression model
  - Globally independent covariates
  - Correlated covariates within a cluster
  - Surprising effects of cluster size on biases
- Functional vs. structural models:
  - Functional: SIMEX and regression calibration
  - Structural: MLE
- Functional tests for variance components using SIMEX
- Example

4 THE DATA

In what follows, there will be $i = 1, \ldots, m$ clusters, and cluster $i$ contains $n_i$ observations. Within a cluster the data are structured as:
- $Y = (Y_1, \ldots, Y_n)$: responses
- $X = (X_1, \ldots, X_n)$: error-prone predictors
- $Z = (Z_1, \ldots, Z_n)$: exactly measured predictors
- $W = (W_1, \ldots, W_n)$: measured version of $X$

When I talk about asymptotics, it will be as the number of clusters gets large, for a fixed number of observations within a cluster.

5 THE MODELS

We consider a generalized linear mixed model (GLMM) whose linear predictor within a cluster is
$g(\mu) = \beta_0 + X\beta_x + Z\beta_z + Cb$, with $b \sim \mathrm{Normal}\{0, D(\theta)\}$.
A GLMMeM is a GLMM in which the fixed-effect covariate $X$ is unobservable; instead we observe
$W = X + U$, with $U \sim \mathrm{Normal}(0, \Sigma_{uu})$ and $U$ independent of $(Z, Y, C, b, X)$.
The estimation and inference methods are not restricted to additive errors; the bias analysis is more detailed and does use the additive structure.
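To make the setup concrete, here is a minimal simulation sketch of a GLMMeM. It assumes no $Z$, a scalar $X$ per observation, a random intercept only ($C$ a vector of ones) and a logit link; the helper name `simulate_glmmem` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_glmmem(m, n, beta0, beta_x, theta, sigma_x, sigma_u, rng):
    """Simulate m clusters of size n from a random-intercept logistic GLMMeM.

    Hypothetical helper: no Z, scalar X per observation, C = vector of ones.
    Returns (Y, X, W), each an (m, n) array.
    """
    b = rng.normal(0.0, np.sqrt(theta), size=(m, 1))   # random intercepts, variance theta
    X = rng.normal(0.0, sigma_x, size=(m, n))           # true covariate, never observed
    U = rng.normal(0.0, sigma_u, size=(m, n))           # additive error, independent of b and X
    W = X + U                                           # the observed, error-prone covariate
    eta = beta0 + beta_x * X + b                        # g(mu) with a logit link
    Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))     # Bernoulli responses
    return Y, X, W

rng = np.random.default_rng(0)
Y, X, W = simulate_glmmem(m=2000, n=4, beta0=-1.0, beta_x=1.0,
                          theta=1.0, sigma_x=1.0, sigma_u=0.5, rng=rng)
print(Y.shape, round(float(np.var(W - X)), 3))
```

A naive analysis would fit the GLMM to $(Y, W)$; $X$ is returned only so that the simulation itself can be checked.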

6 THE MODELS

Suppose that $[X \mid Z]$ follows a normal linear model. Then, by the usual calculations,
$X = \Gamma_0 + \Gamma_z Z + \Gamma_w W + e$, with $e \sim \mathrm{Normal}(0, \Sigma_{x|zw})$.
The original model is $g(\mu) = \beta_0 + X\beta_x + Z\beta_z + Cb$.
The observed data also follow a GLMM, but with a more complex mean and variance structure:
$g(\mu) = (\beta_0 + \Gamma_0\beta_x) + (\Gamma_w W)\beta_x + (\Gamma_z Z\beta_x + Z\beta_z) + C^{*}b^{*}$, where $C^{*} = (C, I)$ and $b^{*} = \{b^{T}, (e\beta_x)^{T}\}^{T}$.
Ignoring measurement error means you may misspecify the structure of both the fixed and the random effects.
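The reduced form follows by substituting the linear model for $X$ into the original linear predictor. Because $b$ is independent of $(X, U, Z)$, it is independent of $e$, so the stacked vector $b^{*}$ is again normal, with $\mathrm{cov}(e\beta_x) = \beta_x^2\Sigma_{x|zw}$ for scalar $\beta_x$:

```latex
\begin{aligned}
g(\mu) &= \beta_0 + X\beta_x + Z\beta_z + Cb \\
       &= \beta_0 + (\Gamma_0 + \Gamma_z Z + \Gamma_w W + e)\beta_x + Z\beta_z + Cb \\
       &= (\beta_0 + \Gamma_0\beta_x) + (\Gamma_w W)\beta_x + (\Gamma_z Z\beta_x + Z\beta_z)
          + \underbrace{(C,\; I)}_{C^{*}}
            \underbrace{\begin{pmatrix} b \\ e\beta_x \end{pmatrix}}_{b^{*}}
\end{aligned}
```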

7 EXAMPLES: Ordinary GLIMs

With a single observation per cluster and no variance component, we have the usual GLIM. Loosely speaking, for estimating the slope in $X$, $\beta_x$, the effect of ignoring measurement error is the same in linear, logistic and Poisson regression, namely attenuation. Effectively, with no $Z$, one estimates
$\{\mathrm{var}(X) / [\mathrm{var}(X) + \mathrm{var}(\text{measurement error})]\}\,\beta_x$.
In GLMMs there are four additional factors:
- the variance component, which has to be estimated;
- cluster size (surprisingly important);
- the covariance structure of $X$: are the $X$s correlated within a cluster?
- the covariance structure of the errors $U$: are they correlated within a cluster?
We will address the first three points.
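Attenuation is easy to see by simulation. The sketch below assumes ordinary linear regression with no $Z$ and no clustering, and illustrative values $\beta_x = 2$ and $\sigma_x^2 = \sigma_u^2 = 1$, so that the reliability is 0.5 and the naive slope should be close to 1.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
beta_x, sigma_x, sigma_u = 2.0, 1.0, 1.0

X = rng.normal(0.0, sigma_x, N)
W = X + rng.normal(0.0, sigma_u, N)               # error-prone version of X
Y = 1.0 + beta_x * X + rng.normal(0.0, 1.0, N)

def ols_slope(w, y):
    """Least-squares slope of y on w (model with an intercept)."""
    wc = w - w.mean()
    return np.sum(wc * (y - y.mean())) / np.sum(wc * wc)

lam = sigma_x**2 / (sigma_x**2 + sigma_u**2)      # reliability, here 0.5
naive = ols_slope(W, Y)                           # ignores measurement error
print(f"naive slope {naive:.3f} vs lambda * beta_x = {lam * beta_x:.3f}")
```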

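8 EXAMPLES

Suppose that there is no $Z$ (i.e., no covariates measured exactly), and no cluster effects in $X$ and $W$:
$X \sim \mathrm{Normal}(0, \sigma_x^2 I)$, $U \sim \mathrm{Normal}(0, \sigma_u^2 I)$.
Note that the $X$s are independent even within a cluster, hence fully exchangeable. We call this the homogeneous case. Within a cluster, $b \sim \mathrm{Normal}(0, \theta)$, and the model is
$g(\mu) = \beta_0 + X\beta_x + bj$, where $j$ is a vector of ones.
The observed data follow
$g(\mu) = \beta_0 + W(\lambda\beta_x) + bj + e\beta_x$, with $e \sim \mathrm{Normal}(0, \lambda\sigma_u^2 I)$,
where $\lambda = \sigma_x^2/(\sigma_x^2 + \sigma_u^2)$ is the reliability. Note the change in the error structure: $e\beta_x$ acts as an extra, observation-level random effect.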
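The decomposition $X = \lambda W + e$ can be checked numerically. This sketch assumes illustrative values $\sigma_x^2 = 1$ and $\sigma_u^2 = 0.5625$ (so $\lambda = 0.64$ and $\lambda\sigma_u^2 = 0.36$); it confirms that the slope of $X$ on $W$ is $\lambda$, that $e$ has variance $\lambda\sigma_u^2$, and that $e$ is uncorrelated within a cluster.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50_000, 4
sigma_x, sigma_u = 1.0, 0.75
lam = sigma_x**2 / (sigma_x**2 + sigma_u**2)      # reliability, 0.64 here

# Homogeneous case: X independent within and across clusters, mean zero.
X = rng.normal(0.0, sigma_x, size=(m, n))
W = X + rng.normal(0.0, sigma_u, size=(m, n))

slope = np.sum(X * W) / np.sum(W * W)             # regression of X on W through the origin
e = X - lam * W                                   # the leftover term in X = lam * W + e
within_corr = np.corrcoef(e[:, 0], e[:, 1])[0, 1] # correlation of e within a cluster
print(slope, e.var(), lam * sigma_u**2, within_corr)
```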

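9 EXAMPLES

In the homogeneous case, we obtain the following results if one ignores measurement error.
- Linear regression: $\hat\beta_x$ estimates $\lambda\beta_x$, where $\lambda$ is the reliability; $\theta$ is consistently estimated.
- Logistic regression (using the probit approximation): $\hat\beta_x$ estimates $\lambda\beta_x/\tau$ and $\hat\theta$ estimates $\theta/\tau^2$, where $\tau = \sqrt{1 + \lambda\sigma_u^2\beta_x^2/2.9}$.
- Poisson regression: $\hat\beta_x$ estimates $\lambda\beta_x$; for $\theta$, a detailed and nontrivial analysis is required. Ignoring error with clusters of size $n$ estimates
  $\theta + \log\left\{\frac{(n-1) + \exp(\beta_x^2\sigma_x^2)}{(n-1) + \exp(\lambda\beta_x^2\sigma_x^2)}\right\}$.
Note that in Poisson regression the bias depends on the cluster size, and, because $\lambda < 1$ makes the logarithm positive, $\theta$ is overestimated.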
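The limits above are closed-form, so they can be tabulated directly. This sketch simply codes the slide's formulas (the constant 2.9 is the probit approximation, roughly $1.7^2$); the function names and the parameter values in the loop are hypothetical.

```python
import math

def reliability(sigma_x2, sigma_u2):
    return sigma_x2 / (sigma_x2 + sigma_u2)

def naive_limits_logistic(beta_x, theta, sigma_x2, sigma_u2):
    """Probit-approximation limits (beta_x, theta) of the error-ignoring logistic fit."""
    lam = reliability(sigma_x2, sigma_u2)
    tau = math.sqrt(1.0 + lam * sigma_u2 * beta_x**2 / 2.9)
    return lam * beta_x / tau, theta / tau**2

def naive_theta_poisson(beta_x, theta, sigma_x2, sigma_u2, n):
    """Limit of the error-ignoring estimate of theta, Poisson model, clusters of size n."""
    lam = reliability(sigma_x2, sigma_u2)
    num = (n - 1) + math.exp(beta_x**2 * sigma_x2)
    den = (n - 1) + math.exp(lam * beta_x**2 * sigma_x2)
    return theta + math.log(num / den)

# The Poisson bias shrinks as the cluster size grows, but stays positive.
for n in (2, 4, 10, 50):
    print(n, round(naive_theta_poisson(1.0, 0.5, 1.0, 1.0, n), 4))
```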

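10 EXAMPLES WITH CLUSTER CORRELATIONS

Suppose that there is no $Z$ (i.e., no covariates measured exactly), and that there are cluster effects in $X$ and $W$. Thus the clusters have a random mean with variance $\sigma_{x\mu}^2$, and within each cluster the $X$s have variance $\sigma_x^2$:
$X \sim \mathrm{Normal}(0, \sigma_x^2 I + \sigma_{x\mu}^2 J)$, where $J = jj^{T}$ is a matrix of ones.
Now, for clusters of size $n$ with within-cluster mean $\bar W$,
$E(X \mid W) = \lambda W + (1-\lambda)(1-\lambda_n)\bar W j$,
$\mathrm{cov}(X \mid W) = \lambda\sigma_u^2 I + (\sigma_u^2/n)(1-\lambda)(1-\lambda_n)J$,
where $\lambda = \sigma_x^2/(\sigma_x^2 + \sigma_u^2)$ as before and
$\lambda_n = (\sigma_x^2 + \sigma_u^2)/(\sigma_x^2 + \sigma_u^2 + n\sigma_{x\mu}^2)$.
This is a difficult structure. Note the dependence on $n$.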
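The closed forms can be checked against direct joint-normal calculations. In this sketch $\lambda_n = (\sigma_x^2 + \sigma_u^2)/(\sigma_x^2 + \sigma_u^2 + n\sigma_{x\mu}^2)$, and the term $\bar W j$ is written as $(J/n)W$; the variance values are arbitrary test inputs.

```python
import numpy as np

def conditional_moments_exact(n, s2x, s2u, s2xmu):
    """E(X|W) = A @ W and cov(X|W) = V, computed directly from the joint normal."""
    I, J = np.eye(n), np.ones((n, n))
    cov_x = s2x * I + s2xmu * J            # cov(X)
    cov_w = cov_x + s2u * I                # cov(W) = cov(X) + cov(U)
    A = cov_x @ np.linalg.inv(cov_w)       # cov(X, W) cov(W)^{-1}
    V = cov_x - A @ cov_x                  # cov(X | W)
    return A, V

def conditional_moments_formula(n, s2x, s2u, s2xmu):
    """The slide's closed forms, written as matrices acting on W."""
    I, J = np.eye(n), np.ones((n, n))
    lam = s2x / (s2x + s2u)
    lam_n = (s2x + s2u) / (s2x + s2u + n * s2xmu)
    A = lam * I + (1 - lam) * (1 - lam_n) * J / n   # lam*W + (1-lam)(1-lam_n)*Wbar*j
    V = lam * s2u * I + (s2u / n) * (1 - lam) * (1 - lam_n) * J
    return A, V

for n in (2, 4, 10):
    A1, V1 = conditional_moments_exact(n, 1.0, 0.5, 0.8)
    A2, V2 = conditional_moments_formula(n, 1.0, 0.5, 0.8)
    print(n, np.allclose(A1, A2), np.allclose(V1, V2))
```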

11 EXAMPLES WITH CLUSTER CORRELATIONS

As $n \to \infty$, one can show that one gets the same results as in the homogeneous model if one replaces $\theta$ there by $\theta + (1-\lambda)^2\beta_x^2\sigma_{x\mu}^2$.
- The variance component is overestimated in the linear and Poisson cases, and typically also in the logistic model.
- For fixed cluster size $n$, a detailed analysis yields exact formulae for the bias in the linear case.
- There are no exact formulae in the probit or logistic cases; we used both numerical and Monte Carlo integration.
- Cluster size has surprisingly strong effects, both on $\beta_x$ and on $\theta$.
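A small sketch of the large-$n$ replacement, assuming the same notation: $\lambda_n = (\sigma_x^2 + \sigma_u^2)/(\sigma_x^2 + \sigma_u^2 + n\sigma_{x\mu}^2)$ goes to 0 as $n$ grows, and $\theta$ in the homogeneous-case results is replaced by $\theta + (1-\lambda)^2\beta_x^2\sigma_{x\mu}^2$. For logistic regression the sketch divides that quantity by $\tau^2$ from the probit approximation; the function names and numbers are illustrative only.

```python
def lambda_n(n, s2x, s2u, s2xmu):
    """Cluster-size-dependent factor from the clustered-X model; tends to 0 as n grows."""
    return (s2x + s2u) / (s2x + s2u + n * s2xmu)

def theta_effective(theta, beta_x, s2x, s2u, s2xmu):
    """Large-n replacement for theta in the homogeneous-case results."""
    lam = s2x / (s2x + s2u)
    return theta + (1.0 - lam) ** 2 * beta_x ** 2 * s2xmu

def naive_theta_logistic_large_n(theta, beta_x, s2x, s2u, s2xmu):
    """Probit-approximation limit theta_eff / tau^2 for the logistic model."""
    lam = s2x / (s2x + s2u)
    tau2 = 1.0 + lam * s2u * beta_x ** 2 / 2.9
    return theta_effective(theta, beta_x, s2x, s2u, s2xmu) / tau2

for n in (2, 4, 20, 100):
    print(n, round(lambda_n(n, 1.0, 1.0, 0.5), 4))
print(theta_effective(1.0, 1.0, 1.0, 1.0, 0.5))   # linear model: naive theta tends to 1.125
```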

12 FUNCTIONAL AND STRUCTURAL INFERENCE

Functional and structural approaches differ in what they assume about the $X$s.
A structural approach would typically assume that within a cluster the $X$s are independent and normal, with the cluster means themselves normally distributed. Current attempts in the GLIM literature try to weaken the normality assumption, e.g., via hierarchical models or mixtures of normals.
A functional approach tries to make no assumptions about the distribution of the $X$s. Model robustness is gained at the potential cost of efficiency. Newer versions include the NPMLE (nonparametric MLE).
We have implemented one functional and one structural method.

13 FUNCTIONAL METHODS

The most common functional method is regression calibration. The idea is to replace $X$ within a cluster by its best linear prediction based on $(W, Z)$. It works reasonably well for estimating fixed effects, but does not work for estimating random-effect variances.
A computationally intensive alternative is SIMEX, due to Cook & Stefanski (1994); the theory and standard errors are described in the book. The idea is to add increasing but known amounts of error onto $W$ via SIMulation, fit the data using a method that ignores measurement error, trace out the fits, fit a function to the trace, and then EXtrapolate back to the no-error case. The method is defined via graphs in the accompanying plots.
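Regression calibration in its simplest form can be sketched as follows. This assumes ordinary linear regression with no $Z$ and no clustering, and a known error variance $\sigma_u^2$; in the clustered case the replacement for $X$ would instead be the within-cluster best linear prediction given $(W, Z)$. The sketch only corrects the fixed-effect slope.

```python
import numpy as np

rng = np.random.default_rng(3)
N, beta_x, sigma_x, sigma_u = 100_000, 2.0, 1.0, 0.5
X = rng.normal(1.0, sigma_x, N)
W = X + rng.normal(0.0, sigma_u, N)
Y = 0.5 + beta_x * X + rng.normal(0.0, 1.0, N)

def ols_slope(w, y):
    """Least-squares slope of y on w (model with an intercept)."""
    wc = w - w.mean()
    return np.sum(wc * (y - y.mean())) / np.sum(wc * wc)

# Best linear prediction of X from W, with sigma_u^2 assumed known:
# E(X | W) is approximately mean(W) + lam_hat * (W - mean(W)).
lam_hat = (W.var() - sigma_u**2) / W.var()
X_rc = W.mean() + lam_hat * (W - W.mean())

naive = ols_slope(W, Y)       # attenuated, close to 0.8 * 2 = 1.6
rc = ols_slope(X_rc, Y)       # regression calibration, close to 2
print(naive, rc)
```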

14 SIMEX

To cut down on simulation variability, instead of adding on error once, add it on many (e.g., 100) times and use the average or median for the given amount of error variance.
There is no single correct function to fit and then extrapolate; the safe default is a quadratic. There is a theory of exact extrapolants, but it is wildly difficult to implement in this context.
In general, the SIMEX estimates are not consistent, but they are approximately so: essentially exact, to order $O(\sigma_u^6)$, for small error.
Because we have a good, fast algorithm for it, we used CPQL of Breslow & Lin (which ignores measurement error) as the basic estimation method.
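The SIMEX steps can be sketched for the simplest case. This assumes ordinary least squares in place of CPQL, no clustering, a known $\sigma_u$, the grid $\zeta = 0, 0.5, 1, 1.5, 2$ for the added error variance $\zeta\sigma_u^2$, $B = 100$ replicates averaged at each level, and a quadratic extrapolant evaluated at $\zeta = -1$. The names `simex_slope` and `ols_slope` are hypothetical.

```python
import numpy as np

def ols_slope(w, y):
    """Least-squares slope of y on w (model with an intercept)."""
    wc = w - w.mean()
    return np.sum(wc * (y - y.mean())) / np.sum(wc * wc)

def simex_slope(w, y, sigma_u, zetas=(0.0, 0.5, 1.0, 1.5, 2.0), B=100, rng=None):
    """SIMEX for the OLS slope with a quadratic extrapolant (sigma_u assumed known)."""
    rng = np.random.default_rng() if rng is None else rng
    trace = []
    for zeta in zetas:
        # Add extra error with variance zeta * sigma_u^2, B times, and average the naive fits.
        fits = [ols_slope(w + np.sqrt(zeta) * sigma_u * rng.standard_normal(w.size), y)
                for _ in range(B)]
        trace.append(np.mean(fits))
    coef = np.polyfit(zetas, trace, deg=2)    # quadratic in zeta through the trace
    return np.polyval(coef, -1.0), trace      # extrapolate back to zeta = -1 (no error)

rng = np.random.default_rng(4)
N, beta_x, sigma_x, sigma_u = 20_000, 2.0, 1.0, 0.5
X = rng.normal(0.0, sigma_x, N)
W = X + rng.normal(0.0, sigma_u, N)
Y = beta_x * X + rng.normal(0.0, 1.0, N)

est, trace = simex_slope(W, Y, sigma_u, rng=rng)
print("naive:", trace[0], "SIMEX:", est)
```

In this linear model the trace is $\beta_x\sigma_x^2/\{\sigma_x^2 + (1+\zeta)\sigma_u^2\}$, which is not a quadratic, so the quadratic extrapolant removes most but not all of the attenuation: approximate, not exact, consistency.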

15 SIMEX ASYMPTOTIC THEORY

The book derives a general asymptotic theory and computable standard errors under two general conditions:
- The true parameters are required to be in the interior of the parameter space.
- The estimator used when ignoring measurement error must be an M-estimator (synonym: the solution to an estimating equation); this is the case for the MLE, CPQL, etc.
Very simple estimates are available if the error variance is known, and QVF has a bootstrap with a very fast implementation.
We showed that the SIMEX estimates are themselves solutions to computable estimating equations (not obvious, but rather nice!). Thus, in the interior of the parameter space, we have computable asymptotic standard errors as the number of clusters gets large.

16 SIMEX INFERENCE

On the boundary of the parameter space, special techniques are required, e.g., for testing whether a variance component equals zero. There are a number of score tests for variance components, recently reviewed by Xihong Lin in a technical report. We focus on the global hypothesis that no variance components exist. The score test statistic is of the form
$S = U^{T} I^{-1} U$,
where $U$ is an average of independent random variables with estimated parameters and $I$ is the estimated covariance of $U$ under the hypothesis. For the random intercept model, $U$ is an overdispersion statistic, based on averages across clusters of squares of weighted within-cluster residuals.
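A simplified version of such an overdispersion score test for a logistic random intercept can be sketched as below. Several things here are assumptions rather than the talk's exact statistic: the per-cluster contribution is taken as $\frac12\{(\sum_j r_{ij})^2 - \sum_j v_{ij}\}$ with Bernoulli variances $v_{ij}$, the null means are treated as known (in practice $\beta$ is estimated and $I$ must account for that), and $I$ is estimated empirically from the cluster contributions.

```python
import numpy as np

def overdispersion_score_test(Y, mu):
    """Score-type statistic S = U^T I^{-1} U for H0: no random intercept.

    Y, mu: (m, n) arrays; mu are the means under H0, treated here as known.
    """
    r = Y - mu                                  # within-cluster residuals
    v = mu * (1.0 - mu)                         # Bernoulli variances under H0
    # Per-cluster contribution; its expectation is zero under H0.
    U_i = 0.5 * (r.sum(axis=1) ** 2 - v.sum(axis=1))
    m = U_i.size
    U = U_i.mean()                              # average of independent r.v.'s
    I = U_i.var(ddof=1) / m                     # estimated variance of U
    return U * U / I                            # scalar form of U^T I^{-1} U

rng = np.random.default_rng(5)
m, n = 1000, 4
mu0 = np.full((m, n), 0.5)                      # beta0 = 0, so P(Y = 1) = 0.5 under H0
Y0 = rng.binomial(1, mu0)                       # data with no random intercept
b = rng.normal(0.0, np.sqrt(2.0), size=(m, 1))  # theta = 2
Y1 = rng.binomial(1, np.repeat(1.0 / (1.0 + np.exp(-b)), n, axis=1))
S0 = overdispersion_score_test(Y0, mu0)
S1 = overdispersion_score_test(Y1, mu0)
print(S0, S1)
```

With $\beta_0 = 0$ the marginal mean of `Y1` is still 0.5 by symmetry, so only the within-cluster correlation separates `S1` from `S0`.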

17 SIMEX SCORE TESTS

$U$ is an average of independent random variables with estimated parameters. The trick is simple. Our general theory is based on statistics that are equivalent to averages of independent random variables with estimated parameters, and $U$ is just such a statistic! Thus we can use SIMEX to estimate $U_{simex}$, what $U$ would be if there were no measurement error. We merely need an estimate of the variance of $U_{simex}$, which our asymptotic theory provides anyway; call this $I_{simex}$. The SIMEX score test is
$S_{simex} = U_{simex}^{T} I_{simex}^{-1} U_{simex}$.
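Because SIMEX only needs a statistic computed from $W$, the same machinery applies to any average-type statistic such as $U$. The generic sketch below is a hypothetical helper (same grid, $B$ and quadratic extrapolant as in an ordinary SIMEX fit); it does not compute $I_{simex}$, which comes from the asymptotic theory. As a check it uses the sample variance of $W$, whose trace $\sigma_W^2 + \zeta\sigma_u^2$ is linear in $\zeta$, so the extrapolation should land near $\mathrm{var}(X) = 1.5$ in this simulated setup.

```python
import numpy as np

def simex_statistic(stat, w, sigma_u, zetas=(0.0, 0.5, 1.0, 1.5, 2.0), B=100, rng=None):
    """SIMEX for a (vector) statistic stat(w); quadratic extrapolation to zeta = -1."""
    rng = np.random.default_rng() if rng is None else rng
    trace = []
    for zeta in zetas:
        sims = [np.atleast_1d(stat(w + np.sqrt(zeta) * sigma_u * rng.standard_normal(w.shape)))
                for _ in range(B)]
        trace.append(np.mean(sims, axis=0))     # average over the B remeasured data sets
    trace = np.array(trace)                     # shape (len(zetas), dim of statistic)
    coef = np.polyfit(zetas, trace, deg=2)      # one quadratic per component
    return coef[0] - coef[1] + coef[2]          # each quadratic evaluated at zeta = -1

def stat(w):
    """Example average-type statistics: sample variance and mean of W."""
    return np.array([w.var(), w.mean()])

rng = np.random.default_rng(6)
m, n, sigma_u = 10_000, 4, 0.7
# Clustered X: cluster means with variance 0.5, within-cluster variance 1.
X = rng.normal(0.0, np.sqrt(0.5), size=(m, 1)) + rng.normal(0.0, 1.0, size=(m, n))
W = X + rng.normal(0.0, sigma_u, size=(m, n))
res = simex_statistic(stat, W, sigma_u, rng=rng)
print("naive var(W):", W.var(), "SIMEX var:", res[0])
```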

18 SIMEX SCORE TESTS

We simulated data closely related to the example discussed below, and computed the actual level of nominal 5% tests. The actual level when ignoring error was over 10%. The actual level of our score test was very nearly 5%.

19 MAXIMUM LIKELIHOOD

If there is no measurement error, there is a wide variety of possible MLE algorithms. With measurement error, we wanted to use the EM algorithm. The missing data are the random effects and the $X$s. The E step requires computing $E$(log likelihood of complete data | observed data), i.e., expectations of functions of the random effects and the $X$s given the observed data and the current parameter estimates. This E step is not available in closed form, and would in general require numerical integration.

20 MAXIMUM LIKELIHOOD

The missing data are the random effects and the $X$s, and the E step requires $E$(log likelihood of complete data | observed data). We repeatedly generated observations from the appropriate conditional distributions using the Metropolis-Hastings algorithm, which repeatedly gives draws of the unknown $X$s and the random effects. This is a generalization of an idea due to C. McCulloch in the no-error case. He observed in the no-error case (and it generalizes to our case) that the random effects and $X$s generated in the E step automatically lead to simple solutions to the M step. In the no-error case, the method reliably reproduces the MLE as judged by EGRET, but it is very slow in this first implementation.
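The Monte Carlo E step can be sketched for a single cluster. This is not the authors' code: it assumes a logistic random-intercept model with no $Z$, a normal structural model $X \sim \mathrm{Normal}(\mu_x, \sigma_x^2)$ with current parameter values supplied, and a single-site random-walk Metropolis-Hastings update with a hypothetical step size of 0.5. With $\beta_x = 0$ the draws of $X_j$ should average about $\lambda W_j + (1-\lambda)\mu_x$, which gives a simple check.

```python
import numpy as np

def log_target(b, x, y, w, beta0, beta_x, theta, mu_x, s2x, s2u):
    """Log joint density of (b, X) and the cluster's data, up to a constant."""
    eta = beta0 + beta_x * x + b
    loglik_y = np.sum(y * eta - np.logaddexp(0.0, eta))    # logistic log-likelihood
    return (loglik_y
            - b * b / (2.0 * theta)                          # b ~ Normal(0, theta)
            - np.sum((x - mu_x) ** 2) / (2.0 * s2x)          # assumed normal model for X
            - np.sum((w - x) ** 2) / (2.0 * s2u))            # W = X + U, U ~ Normal(0, s2u)

def mh_sample_cluster(y, w, params, n_iter, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings draws of (b, X) given (Y, W) for one cluster."""
    rng = np.random.default_rng() if rng is None else rng
    b, x = 0.0, w.astype(float).copy()                       # start X at its observed value
    cur = log_target(b, x, y, w, **params)
    draws_b, draws_x = np.empty(n_iter), np.empty((n_iter, x.size))
    for t in range(n_iter):
        prop = b + step * rng.standard_normal()              # update the random intercept
        new = log_target(prop, x, y, w, **params)
        if np.log(rng.uniform()) < new - cur:
            b, cur = prop, new
        for j in range(x.size):                              # update each X_j in turn
            x_prop = x.copy()
            x_prop[j] += step * rng.standard_normal()
            new = log_target(b, x_prop, y, w, **params)
            if np.log(rng.uniform()) < new - cur:
                x, cur = x_prop, new
        draws_b[t], draws_x[t] = b, x
    return draws_b, draws_x

params = dict(beta0=0.0, beta_x=0.0, theta=1.0, mu_x=0.0, s2x=1.0, s2u=1.0)
y, w = np.array([1, 0, 1, 1]), np.array([1.0, -2.0, 0.5, 3.0])
rng = np.random.default_rng(8)
draws_b, draws_x = mh_sample_cluster(y, w, params, n_iter=10_000, rng=rng)
burn = 1000
print(draws_x[burn:].mean(axis=0))        # compare with 0.5 * w, since lambda = 0.5 here
theta_update = np.mean(draws_b[burn:] ** 2)
```

The last line shows why the M step becomes simple: for $b \sim \mathrm{Normal}(0, \theta)$ the update for $\theta$ is the average of the sampled $b^2$, pooled over draws and, in a full implementation, over clusters.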

21 EXAMPLE

We considered data from the Framingham Heart Study. There were $m = 75$ clusters (individuals), most having $n = 4$ observations, each taken 2 years apart. The variables were:
- $Y$ = evidence of LVH (left ventricular hypertrophy) diagnosed by ECG in patients who developed coronary heart disease before or during the study period
- $W$ = log(SBP - 50)
- $Z$ = age, exam number, smoking status, body mass index
- $X$ = average log(SBP - 50) reading over many measurements within (say) 6 months of each exam
Since blood pressures are only taken every two years, there is no direct evidence of how $W$ differs from $X$, and hence no direct way to estimate $\Sigma_{uu}$, the measurement error covariance.

22 EXAMPLE

It is known that, besides simple variation in the measurement process, SBP varies with time of day, day of week, stress, etc. The data do not allow us to get at this without assumptions. It is possible that the errors of $W$ as measures of $X$ are correlated, although with a 2-year lag in a fairly broad population one would not expect this correlation to be terribly large. This design issue is not restricted to SBP: nutrition experiments with long time lags face exactly the same problem. Thus, to illustrate the methods, we assumed independent measurement errors, i.e., $\Sigma_{uu} = \sigma_u^2 I$. The GLMM is logistic regression with an individual-level random intercept.

23 EXAMPLE

The residuals from the regression of $W$ on $Z$ show strong cluster (individual) effects: about 1/3 of the observed variability in these residuals is within-individual variance. Thus we varied $\sigma_u^2$ from one extreme to the other:
- $\sigma_u^2 = 0$: no measurement error; within-individual variation is due entirely to changes in SBP;
- $\sigma_u^2 = (1/3) \times$ total variation: within-individual variation is due entirely to measurement error.
To estimate $\sigma_u^2$ we would need additional measurements of SBP on days relatively close to, but not the same as, the major exam date. Next we show how SIMEX performed for CPQL.
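One way to get the within-individual share and the resulting range for $\sigma_u^2$ is a one-way variance split of the residuals. This sketch assumes balanced clusters (the Framingham data are not fully balanced) and uses simulated residuals with between-cluster variance 2 and within-cluster variance 1, not the study data; the helper name is hypothetical.

```python
import numpy as np

def variance_split(resid):
    """Pooled within-cluster variance and total variance of an (m, n) residual array."""
    m, n = resid.shape
    centered = resid - resid.mean(axis=1, keepdims=True)
    within = np.sum(centered ** 2) / (m * (n - 1))
    total = resid.var(ddof=1)
    return within, total

rng = np.random.default_rng(7)
m, n = 2000, 4
resid = rng.normal(0.0, np.sqrt(2.0), size=(m, 1)) + rng.normal(0.0, 1.0, size=(m, n))
within, total = variance_split(resid)
print("within share:", within / total)

# From "no measurement error" up to "all within-individual variation is error".
sigma_u2_grid = np.linspace(0.0, within, 5)
```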

24 EXAMPLE

As expected, the score test for the variance component is highly significant; its p-value increases from to as the measurement error variance increases. Ranges of the estimates as the error variance increases:
- CPQL $\theta$: decreases from 2.05 to 1.85
- CPQL $\beta_x$: increases from 2.80 to 3.90
- MLE $\theta$: decreases from 2.65 to 2.20
That $\hat\theta_{mle} > \hat\theta_{cpql}$ is expected, as are the directions given above.

25 DISCUSSION

- We have shown some of the effects of measurement error on the biases of parameter estimates. The major new observation is the effect of cluster size.
- GLMMeMs are GLMMs with a different fixed-effect and random-effect structure.
- The SIMEX method is one simple functional method for approximately consistent estimation.
- We used our previous results to find a score test for global variance components.
- The MLE can be computed via EM with Metropolis-Hastings. While our implementation is slow, it does allow for non-normal $X$s.
- The example illustrated a design problem: without supplemental reliability studies, it will be impossible to estimate the measurement error structure. Identifiability, etc., is still an open question.

26 BAYES ESTIMATION VIA GIBBS SAMPLING

It is easy enough to specify various priors and write down expressions for the complete conditionals in a Gibbs sampling implementation. There are many Metropolis-Hastings steps; generating the $X$s and the random effects actually uses the same code as the EM algorithm. We have had various problems, though:
- sensitivity of the answer to the prior;
- convergence difficulties, even with proper priors.
This is work still in progress; the aim is natural tests for, and shrinkage of, the variance components.
