The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

Size: px

Start display at page:

Download "The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models"

Pamela Cummings
5 years ago
Views:

1 The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San Francisco

2 Outline Motivation/Background 1 Motivation/Background 2 3

3 Antibotic Rx Study in acute care (Gonzales et al, Academic Emergency Medicine 2006) Study objectives: 1) assess intervention to reduce inappropriate antibiotic prescription rates - Rx for antibiotic-nonresponsive conditions (e.g. acute bronchitis); 2) identify poorly performing providers Design: multiple responses from 720 providers (clusters) Large between-provider variability in RX rates & cluster sizes - borrow strength between providers Popular approach: Use estimated regression coefficients & predicted random effects from GLMMs

4 Generalized Linear Mixed Models Y ij b i, X ij GLM E(Y ij X ij, b i ) = h 1 (β 0 + b i + β 1 X ij ) h( ) link function, e.g. identity, logit, probit i=1,..., m : subjects; j=1,..., n i : repeats per cluster When b G T, Y i1,..., Y ni cond ly indep given b i & b i indep of X m Likelihood: L(β, G T ) = n i i=1 j=1 f (Y ij b, X ij ) dg T (b)

5 Misspecified random effects distributions G T typically unknown & we might assume b G F G T. Often G F = N(0, σb 2 ) (e.g. Stata default) Two general forms of misspecified G T : Incorrect distributional shape Incorrectly assuming b indep of X 1 E(b X) = µ M (X), Neuhaus & McCulloch (2006) 2 var(b X) = µ V (X), Heagerty & Kurland (2001)

6 Worry about correctly specifying the shape of G T? Some say yes - Motivation for more flexible G F, e.g. mixture of normals (Chen et al 2002), nonparametric G F (Lesperance & Kalbfleisch 1992) & specification tests (Tchetgen & Coull 2006) Others show little bias in slopes", ˆβ 1 (Neuhaus et al 1992) Little work on random effects prediction under misspecification

7 Bias with misspecified shape of G? Linear mixed effects model: Y ij b i = β 0 + σ b b i + β 1 X ij + e ij b e, b G T, E(b) = 0, var(b) = 1, var(e) = σ 2 e cov(y i1,..., Y ini ) = V = σ 2 ei + σ 2 b J, indep of G T E{ ˆβ GLS } = E{(X T V 1 X) 1 X T V 1 Y } = β 1, indep of G F Unbiased estimators of slopes, β 1, with misspecified shape

8 Assessing consequences of random effects misspecification Follow theory on inference with misspecified models (e.g. White 1994) - let ξ = (β, θ) True : f T (Y i X i ; ξ) = Fitted : f F (Y i X i ; ξ ) = n i f (Y ij b, X ij ; β) dg T (b; θ) j=1 n i f (Y ij b, X ij ; β ) dg F (b; θ ) j=1 MLE" ˆξ ξ minimizes E X E Y X log {f T (y X; ξ)/f F (y X; ξ )} Kullback-Leibler Divergence

9 Misspecified models Minimizing K-L Divergence wrt ξ = (ξ 1,..., ξ q) yields E X y λ(y X, ξ ) ξ { k n i f (Y ij b, X ij ; β ) dg F (b; θ )}dy = 0 j=1 where λ(y X, ξ ) = f T (Y = y X, ξ)/f F (Y = y X, ξ ) Note: ξ that yield λ(y X, ξ ) = 1 X solve system Can solve for ξ analytically in some simple cases, i.e. match fitted & true densities at all points X

10 Matched pairs, solutions under misspecification For some link functions can find analytic solution with β 1 = β 1, but β 0 β 0 & σ σ Special cases: ˆβ 1 consistent, but ˆβ 0 & ˆσ inconsistent when G F G T

11 Logistic link Motivation/Background Binary matched pairs - when G F generates a wide range of pr(y 1, y 2 ) (e.g. G F = Normal) ˆβ 1 is consistent (& ˆβ 1 ˆβ CML ) (Neuhaus et al 1994) G F unspecified, ˆβ 1 ˆβ CML (Lindsay et al 1991) General X - when true β 1 = 0, β 1 = 0 solves Kullback-Leibler minimizing equations ˆβ 1 consistent Typically cannot solve Kullback-Leibler system analytically

12 Numerical solution of Kullback-Leibler equations Numerically find ξ that minimizes E X E Y X log {f T (y ξ, X)/f F (y ξ, X)} using routines in R Numerically evaluate integrals Numerically minimize Kullback-Leibler divergence Allows evaluation of bias over wide range of parameter values, random effect & covariate distributions

13 Numerical assessment of bias True: logit {pr(y ij = 1 X ij, b i )} = β 0 + σb i + β 1 X ij True G T : b exp(1), shifted to E(b) = 0 Fitted G F : b N(0, 1) X varied within clusters X = (0,.25,.5,.75, 1), n i = 5 Range of parameter values: β 0 : (-3,-1) by 0.1 β 1 : (0.5,2) by 0.1 σ: (0.2,2.5) by 0.1 Numerically solved for β 0, β 1, σ that minimize Kullback-Leibler divergence

14 10 Bias in beta1, True = exponential, Fitted = Normal beta0: 3 beta0: 2 beta0: 1 % Bias sigma: 0.2 sigma: 1 sigma: 2.5 beta True beta1

15 50 40 Bias in sigma, true = exponential, fitted = Normal beta0: 3 beta0: 2 beta0: 1 % Bias beta1: 0.5 beta1: 1 beta1: 1.9 beta True sigma

16 % Bias Bias in beta0, true = exponential, fitted = Normal sigma: 0.2 sigma: 1 sigma: True beta0 beta1: 0.5 beta1: 1 beta1: 1.9 sigma

17 Logistic link - further results Find approximate solution of Kullback-Leibler equations using Taylor series methods (Neuhaus et al 1992) Approximate solution & simulations indicate E( ˆβ 1 ) β 1 when G T misspecified, but biased ˆβ 0 & ˆσ b

18 Antibiotic Rx of acute respiratory infections Study objective: assess intervention to reduce inappropriate antibiotic prescription rates - Rx for antibiotic-nonresponsive conditions (e.g. acute bronchitis) Cluster: visits to a provider for antibiotic-nonresponsive conditions - number of visits varied from 1 to 71 Design: Baseline Y intervention Post-Y Binary outcome: prescribed antibiotics for nonresponsive conditions (yes/no) - measured at each visit Covariates: intervention, time, time*intervention, provider type, illness duration prior to visit

19 Antibiotic Rx study - Fitted models logit pr(y ij = 1 b, X ij ) = β 0 + exp(log σ b ) b + β 1 X ij where b G with E(b) = 0, Var(b) = 1 G 1 = N(0,1) G 2 = Exponential (1) standardized to E(b) = 0 G 3 = Nonparametric, 4 support points (GLLAMM)

20 Antibiotic Rx study - Results Estimated parameters from mixed-effects logistic models with different random effects distributions ˆβ, (SE) G β 0 TREAT TIME TRT*TIME log σ b Normal (0.53) (0.32) (0.32) (0.20) (0.08) Exponential (0.52) (0.31) (0.31) (0.19) (0.09) NP (4) (0.51) (0.32) (0.31) (0.19)

21 for fixed effects estimation 1 Misspecifying shape of G T yields accurate slopes" ˆβ 1 over wide range of β 0, σ 2 b 2 Misspecifying shape of G T can yield biased estimates of β 0 & σb 2 %bias in ˆβ 0 with σb 2 & can be substantial % bias in ˆσ can also be large 3 When interest focuses on slope parameters, misspecifying shape has little effect on bias

22 of random effects Useful method: Predict b by b that minimize E[ b b] 2 = ( b b) 2 f (b, y) dy db, where f (b, y) is the joint density of b & y 1,..., y n Can show: b i = E(b i y i1,..., y ini ) Depends on f (b, y), hence on G Misspecifying G may produce inaccurate b

23 Linear Mixed-Effects Model Y ij = β 0 + b i + β 1 X ij + ɛ ij, b i N(0, σ 2 b ), ɛ ij N(0, σ 2 ɛ ) i = 1,..., m; j = 1,..., n i Estimated BLUP: Estimated b = ˆb i = ˆD i Z T i ˆV 1 i (y i X i ˆβ) ˆD i = Cov(b ˆ i ) ˆV i = var(y ˆ i ) Expression depends on joint normality of b & y Misspecification of distribution of b may produce inaccurate b

24 Theory for Linear Mixed-Effects Model (simple case) Y ij = µ + b i + ɛ ij, i = 1,..., m; j = 1,..., n i b i N(0, σ 2 b ), ɛ ij N(0, σ 2 ɛ ), ɛ ij b i, µ, σ 2 b, σ2 ɛ known Best Linear Unbiased Predictor (BLUP) is b i that minimizes E[( b i b i ) 2 ] Simple calculations show that b i = E[b i y] = σ 2 b σ 2 b + σ2 ɛ /n i (Ȳi µ) = σ 2 b σ 2 b + σ2 ɛ /n i (b i + ɛ i )

25 Theory for LME (cont.) Conditional on b i, the Y ij are independent N(µ + b i, σ 2 ɛ ) σ b i b i N µ b b 2 = σb 2 + b i, σ2 ɛ /n i ( ) 2 σb 2 σ 2 ɛ σb 2 + σ2 ɛ /n i n i Thus, b i is conditionally biased However, since calculations are b i, result does not depend on distribution of b i i.e. conditional bias does not depend on distribution of b i

26 Features of b i As n i, b i = σ 2 b σ 2 b + σ2 ɛ /n i (b i + ɛ i ) σ2 b σ 2 b (b i + 0) = b i b i converges to true value as n i Such asymptotics typically not of interest, rather as m

27 Density of b i What does the density of b i look like? Also, what if b i misspecified to be normal? If n i large, then each b i b i density is approximately correct, irrespective of assumed density What if n i not large, usual case of interest? Then density of b i is convolution of true density of b i with the conditional density of b i given b i

28 Density of b i What does the density of b i look like? Also, what if b i misspecified to be normal? If n i large, then each b i b i density is approximately correct, irrespective of assumed density What if n i not large, usual case of interest? Then density of b i is convolution of true density of b i with the conditional density of b i given b i

29 Density of b i What does the density of b i look like? Also, what if b i misspecified to be normal? If n i large, then each b i b i density is approximately correct, irrespective of assumed density What if n i not large, usual case of interest? Then density of b i is convolution of true density of b i with the conditional density of b i given b i

30 Density of b i What does the density of b i look like? Also, what if b i misspecified to be normal? If n i large, then each b i b i density is approximately correct, irrespective of assumed density What if n i not large, usual case of interest? Then density of b i is convolution of true density of b i with the conditional density of b i given b i

31 Density of b i (cont) Suppose b i Exponential(1), shifted so that E(b i ) = 0 Density of b i (under normal assumption) is 0 exp{ ( b µ b) 2 n i /(2σ 2 ɛ )} exp( b 1) d b Straightforward to evaluate numerically

32 Plot of BLUP Densities for Cluster Size 2 Density Assumed Normal Exponential BLUP

33 Plot of BLUP Densities for Cluster Size 4 Density Assumed Normal Exponential BLUP

34 Plot of BLUP Densities for Cluster Size 6 Density Assumed Normal Exponential BLUP

35 Plot of BLUP Densities for Cluster Size 8 Density Assumed Normal Exponential BLUP

36 Plot of BLUP Densities for Cluster Size 10 Density Assumed Normal Exponential BLUP

37 Plot of BLUP Densities for Cluster Size 16 Density Assumed Normal Exponential BLUP

38 Plot of BLUP Densities for Cluster Size 20 Density Assumed Normal Exponential BLUP

39 BLUP density plot findings Density of BLUPs inherits much of its shape from assumed density Doesn t reflect shape of true density of the random effects until cluster sizes get large

40 Best Predicted values Under LME, with e ij N(0, σ 2 e), & true density f bi (b i ), b i = E[b i Y i ] = b i exp{ n i 2σ 2 ɛ exp{ n i 2σ 2 ɛ (ν i b i ) 2 }f bi (b i )db i (ν i b i ) 2 }f bi (b i )db i where ν i = (Ȳi x i β) Given ν i, compute b i for various true f bi (b i ): Normal, T 3, Exponential (1), Mixture of Normals

41 BP plots Motivation/Background Best Within Cluster Deviation Normal Mixture T w/ 3 d.f. Exponential

42 Order preservation for LME Can show b i ν i > 0 for any f bi (b i ) transformation ν i b i monotone Given n i, BP s under any assumed f ordered based on ν i Let f 1 (b i ), f 2 (b i ) denote assumed random effects densities b i1 (ν i ) & b i2 (ν i ), predictions from each value ν i Order the pairs [ b i1 (ν i ), b i2 (ν i )] by ν i Since b i ν i > 0, pairs also ordered by b i1 (ν i ) & b i2 (ν i ). Thus, all pairs concordant & Kendall s τ between b i1 (ν i ) and b i2 (ν i ) is 1

43 Mean square error of prediction Given σ 2 b, σ2 e, we calculated b assuming b N(0,1) b Exponential b Mixture of two N(0,1) Using expressions for b we calculated MSE = E[( b b) 2 ] for various true f b (b) using numerical integration

44 True Gaussian Random Effects True Exponential Random Effects True Mixture Random Effects MSEP Random Effects SD 0.5 MSEP Random Effects SD 0.5 MSEP Random Effects SD Sample Size Per Cluster Sample Size Per Cluster Sample Size Per Cluster MSEP Random Effects SD 1 MSEP Random Effects SD 1 MSEP Random Effects SD Sample Size Per Cluster Sample Size Per Cluster Sample Size Per Cluster MSEP Random Effects SD 2 MSEP Random Effects SD 2 MSEP Random Effects SD Sample Size Per Cluster Sample Size Per Cluster Sample Size Per Cluster Assumed distributions: Solid line/square=gaussian, dotted line/circle=exponential, dashed line/x=mixture

45 Simulations Motivation/Background Evaluate performance of BP s when all parameters estimated Two true & assumed f (b), i.e. 4 true/assumed settings b N(0, 1) b Tukey(g = 0.5, h = 0.1) Linear mixed effects & mixed effects logistic models Linear predictor: η ij = 2 + b i + x between + x within σb 2 = 1, σ2 e = 1 for LME Calculated MSE ˆ = (1/mK ) K m k=1 i=1 ( b ki b ki ) 2 K = 1000, m = 100, cluster sizes=2, 4, 6, 10, 20, 40

46 Percentiles of N(0,1) & standardized Tukey(0.5,0.1) Percentile N(0,1) Standardized Tukey(0.5,0,1).1% % % % % % % % % % %

47 Continuous Outcome Binary Outcome MSEP True Gaussian MSEP True Gaussian Cluster size Cluster size MSEP True Tukey MSEP True Tukey Cluster size Cluster size Assumed distributions: Solid line/square=gaussian, dotted line/circle=tukey

48 Antibiotic Rx study - Fitted models Objective: identify poorly performing providers (i.e. large predicted random effects) logit pr(y ij = 1 b, X ij ) = β 0 + exp(log σ b ) b + β 1 X ij where b G with E(b) = 0, Var(b) = 1 G 1 = N(0,1) G 2 = Exponential (1) standardized to E(b) = 0 Compute b that maximizes log{f (y i1,..., y ini X i1,..., X ini, β, b i ) g(b i θ)} Fit models using PROC NLMIXED in SAS

49 b.norm b.exp Predicted random effects from normal & exponential Spearman corr=0.99

50 of effects on prediction 1 Predicted values of random effects show modest sensitivity to assumed distributional shape. 2 Distribution shape of BLUPs often not reflective of true random effects distribution. 3 Ranking of predicted values is little affected. 4 Misspecified shape can produce modest increases in MSE of prediction.

51 Motivation/Background Misspecifying the shape of the random effects produces little bias in estimated covariate effects slight deterioration in random effects prediction. Stata default of b N(0, σb 2 ) yields accurate estimates of covariate effects & reasonably precise predicted random effects

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject