Estimating prediction error in mixed models

Benjamin Saefken, Thomas Kneib (Georg-August University Goettingen)
Sonja Greven (Ludwig-Maximilians-University Munich)

GLMM

- Generalized linear mixed models: $g(\mu_i) = x_i^\top \beta + z_i^\top u$.
- Conditional responses come from an exponential family distribution $f(y_i \mid \beta, u)$.
- A prior distribution is imposed on the random effects: $u \sim N\!\left(0, G(\tau^2)\right)$.
- Structured additive regression models may be represented as (generalized) mixed models. This includes (generalized) additive models, smoothing-spline models and geoadditive models.
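
As a concrete illustration of this model class, the following minimal sketch (not from the slides; sample sizes and parameter values are illustrative assumptions) simulates one data set from a random-intercept logistic GLMM, $\mathrm{logit}(\mu_{ij}) = \beta_0 + \beta_1 x_{ij} + u_j$ with $u_j \sim N(0, \tau^2)$.

```python
# Minimal sketch: simulate from a random-intercept logistic GLMM.
# All names and parameter values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_groups, n_per_group = 20, 10
beta0, beta1, tau2 = -0.5, 1.0, 0.8

group = np.repeat(np.arange(n_groups), n_per_group)
x = rng.normal(size=n_groups * n_per_group)
u = rng.normal(scale=np.sqrt(tau2), size=n_groups)  # random effects u ~ N(0, tau^2 I)

eta = beta0 + beta1 * x + u[group]                  # linear predictor x'beta + z'u
mu = 1.0 / (1.0 + np.exp(-eta))                     # inverse logit link
y = rng.binomial(1, mu)                             # conditional Bernoulli responses
```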

Marginal & Conditional perspective

- Marginal log-likelihood: $\log \int f(y_i \mid \beta, u)\, p(u)\, du$. The random effects model the correlation between responses.
- Conditional log-likelihood: $\log f(y_i \mid \beta, u)$. The random effects act as ordinary fixed parameters whose estimation is regularized by a penalty term induced by the covariance structure of the random effects. For example, in penalized regression the random effects are used as a tool to model penalized parameters.
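
The two perspectives can be made concrete for the random-intercept logistic model simulated above. The sketch below (function names are my own; the arguments are assumed to match the arrays x, y, group and the parameters from the previous sketch) evaluates the conditional log-likelihood at given random effects and approximates the marginal log-likelihood by integrating the random intercepts out with Gauss-Hermite quadrature.

```python
# Sketch: conditional vs. marginal log-likelihood for a random-intercept
# logistic model; arguments are assumed to match the simulation sketch above.
import numpy as np
from numpy.polynomial.hermite import hermgauss

def cond_loglik(y, eta):
    """Conditional log-likelihood log f(y | beta, u): random effects plugged into eta."""
    return np.sum(y * eta - np.log1p(np.exp(eta)))

def marg_loglik(y, x, group, beta0, beta1, tau2, n_quad=30):
    """Marginal log-likelihood: integrate the random intercept out, group by group."""
    nodes, weights = hermgauss(n_quad)       # quadrature against exp(-z^2)
    u_vals = np.sqrt(2.0 * tau2) * nodes     # change of variables u = sqrt(2) * tau * z
    total = 0.0
    for j in np.unique(group):
        idx = group == j
        eta = beta0 + beta1 * x[idx, None] + u_vals[None, :]
        log_f = np.sum(y[idx, None] * eta - np.log1p(np.exp(eta)), axis=0)
        total += np.log(np.sum(weights * np.exp(log_f)) / np.sqrt(np.pi))
    return total
```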

Deviance prediction error

- Deviance error for regression models:
  $\mathrm{err} = -2 \log f\big(y \mid \hat\beta(y)\big) + 2C,$
  where $C$ is the log-likelihood of the saturated model.
- $C$ can be omitted if the focus is on model selection.
- err is too optimistic as a measure of how well the model predicts future values $y^*$. The quantity of interest is the expected deviance prediction error
  $\mathrm{Err} = 2\, \mathrm{E}_{y^*}\!\left( C - \log f\big(y^* \mid \hat\beta(y)\big) \right).$
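
The optimism of err can be checked directly by simulation. The following sketch (an illustration under assumed values, using a Gaussian linear model with error variance fixed at one, so that up to constants the deviance error is the residual sum of squares) shows that the error on fresh responses exceeds the in-sample error on average, here by roughly 2p.

```python
# Monte Carlo sketch: in-sample deviance error err vs. expected deviance
# prediction error Err for a Gaussian linear model with sigma^2 = 1.
import numpy as np

rng = np.random.default_rng(2)
n, p, n_rep = 50, 5, 2000
X = rng.normal(size=(n, p))
mu = X @ rng.normal(size=p)

err_vals, Err_vals = [], []
for _ in range(n_rep):
    y = mu + rng.normal(size=n)              # observed responses
    y_new = mu + rng.normal(size=n)          # future responses with the same mean
    fit = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    err_vals.append(np.sum((y - fit) ** 2))      # -2 log f(y | beta_hat), up to constants
    Err_vals.append(np.sum((y_new - fit) ** 2))  # the same loss on fresh data
print(np.mean(err_vals), np.mean(Err_vals))      # the second exceeds the first by about 2p
```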

Covariance penalties

- For exponential families with corresponding natural parameter $\theta$:
  $\mathrm{E}(\mathrm{Err}) = \mathrm{E}\!\left[ \mathrm{err} + 2 \sum_i \mathrm{Cov}(\hat\theta_i, y_i) \right].$
- In GLMs, the approximation $\sum_i \mathrm{Cov}(\hat\theta_i, y_i) \approx p$ is used.
- The resulting criterion is Akaike's information criterion.
- For mixed effects models, prediction may be based either on the conditional distribution $y \mid u$ or on the marginal distribution of $y$.
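
The approximation behind AIC can be checked numerically. The sketch below (an illustration with assumed values; fit_poisson is my own small IRLS routine, not from the slides) simulates many replications from a Poisson GLM with log link and estimates $\sum_i \mathrm{Cov}(\hat\theta_i, y_i)$ empirically; the sum comes out close to the number of regression coefficients p.

```python
# Monte Carlo sketch: the covariance penalty of a Poisson GLM is close to p.
import numpy as np

rng = np.random.default_rng(3)
n, p, n_rep = 80, 4, 4000
X = rng.normal(size=(n, p))
X[:, 0] = 1.0
mu_true = np.exp(X @ np.array([0.5, 0.3, -0.2, 0.1]))

def fit_poisson(X, y, n_iter=25):
    """Poisson log-link GLM via iteratively reweighted least squares."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ b)
        z = X @ b + (y - mu) / mu                     # working response
        b = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * z))
    return X @ b                                      # fitted natural parameters theta_hat

theta_hats = np.empty((n_rep, n))
ys = np.empty((n_rep, n))
for r in range(n_rep):
    ys[r] = rng.poisson(mu_true)
    theta_hats[r] = fit_poisson(X, ys[r])
cov_sum = sum(np.cov(theta_hats[:, i], ys[:, i])[0, 1] for i in range(n))
print(cov_sum)    # close to p = 4
```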

Marginal prediction error

- Appropriate if the focus is on the fixed effects $\beta$ and the predictions $y^*$ have new random effects $u^*$.
- It is tempting to use the marginal log-likelihood together with $\sum_i \mathrm{Cov}(\hat\theta_i, y_i) \approx q$, where $q = \dim(\beta) + \dim(\tau^2)$, i.e. the marginal AIC.
- The marginal responses are not necessarily from an exponential family distribution, so
  $\mathrm{E}(\mathrm{Err}) = \mathrm{E}\!\left[ \mathrm{err} + 2 \sum_i \mathrm{Cov}(\hat\theta_i, y_i) \right]$
  might not hold.
- The marginal AIC therefore does not choose the model with the lowest expected deviance prediction error.

Conditional prediction error

- Appropriate if the predictions share the same random effects as the observed data.
- The conditional responses are from an exponential family distribution, but the covariance penalty
  $\mathrm{Cov}(\hat\theta, y) = \mathrm{E}\!\left( (y - \mu)\,\hat\theta \right)$
  is not an observable quantity.
- For Gaussian models, $\hat\theta = \hat y$, so the Stein formula can be used:
  $\mathrm{Cov}(\hat\theta, y) = \sigma^2\, \mathrm{E}\!\left( \frac{\partial \hat y}{\partial y} \right).$
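
In the Gaussian case, Stein's formula can be verified numerically for any linear smoother $\hat y = Hy$, where $\partial \hat y / \partial y = H$ and the covariance penalty is $\sigma^2 \operatorname{tr}(H)$. The sketch below (an illustration with an assumed ridge smoother and $\sigma^2 = 1$) compares a Monte Carlo estimate of $\sum_i \mathrm{Cov}(\hat y_i, y_i)$ with $\operatorname{tr}(H)$.

```python
# Monte Carlo sketch: Stein's formula for a linear (ridge) smoother, sigma^2 = 1.
import numpy as np

rng = np.random.default_rng(4)
n, p, lam, n_rep = 60, 10, 2.0, 5000
X = rng.normal(size=(n, p))
mu = X @ rng.normal(size=p)
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)   # ridge hat matrix

fits = np.empty((n_rep, n))
ys = np.empty((n_rep, n))
for r in range(n_rep):
    ys[r] = mu + rng.normal(size=n)
    fits[r] = H @ ys[r]
cov_sum = sum(np.cov(fits[:, i], ys[:, i])[0, 1] for i in range(n))
print(cov_sum, np.trace(H))    # the two numbers should roughly agree
```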

Conditional prediction error

- For a linear mixed model, $\hat y = H y = X\hat\beta + Z\hat u$, and the covariance penalty reduces to
  $\operatorname{tr}\!\left(\frac{\partial \hat y}{\partial y}\right) = \operatorname{tr}(H) = \operatorname{tr}\!\left[ \begin{pmatrix} X^\top X & X^\top Z \\ Z^\top X & Z^\top Z + G(\hat\tau^2)^{-1} \end{pmatrix}^{-1} \begin{pmatrix} X^\top X & X^\top Z \\ Z^\top X & Z^\top Z \end{pmatrix} \right].$
- $\hat\tau^2$ depends on $y$; ignoring this dependence induces a bias.
- A corrected criterion can be derived by implicit differentiation:
  $\operatorname{tr}\!\left(\frac{\partial \hat y}{\partial y}\right) = \operatorname{tr}(H) + \sum_j \left(\frac{\partial H y}{\partial \hat\tau^2_j}\right)^{\!\top} \frac{\partial \hat\tau^2_j}{\partial y}.$
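
The block-matrix form of tr(H) is easy to evaluate directly. The sketch below (an illustration with an assumed random-intercept design, $\sigma^2 = 1$ and a fixed value plugged in for $\hat\tau^2$) computes the naive covariance penalty; the correction term for the dependence of $\hat\tau^2$ on $y$ is not included.

```python
# Sketch: naive covariance penalty tr(H) for a random-intercept LMM.
import numpy as np

rng = np.random.default_rng(5)
n_groups, n_per_group, tau2_hat = 15, 8, 0.6
n = n_groups * n_per_group
group = np.repeat(np.arange(n_groups), n_per_group)

X = np.column_stack([np.ones(n), rng.normal(size=n)])      # fixed-effects design
Z = np.zeros((n, n_groups))
Z[np.arange(n), group] = 1.0                               # random-intercept design
C = np.hstack([X, Z])

penalty = np.zeros((C.shape[1], C.shape[1]))
penalty[X.shape[1]:, X.shape[1]:] = np.eye(n_groups) / tau2_hat   # G(tau^2)^{-1} block
trH = np.trace(np.linalg.solve(C.T @ C + penalty, C.T @ C))
print(trH)   # effective degrees of freedom, between p = 2 and p + n_groups
```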

Poisson & exponential

- If the response is Poisson distributed, the Chen-Stein formula can be used:
  $\mathrm{Cov}(\hat\theta, y) = \mathrm{E}\!\left( y \big(\hat\theta(y) - \hat\theta(y - 1)\big) \right).$
- The expected deviance error can then be estimated by
  $\mathrm{err} + 2 \sum_i y_i \big( \hat\theta_i(y_i) - \hat\theta_i(y_i - 1) \big).$
- For exponentially distributed responses, the covariance penalty is
  $\mathrm{Cov}(\hat\theta, y) = \mathrm{E}\!\left( y\,\hat\theta(y) - \int_0^y \hat\theta(x)\, dx \right).$
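
As a concrete illustration of the Poisson case, the sketch below (an assumed intercept-only Poisson model, where the maximum likelihood fit has the closed form $\hat\theta = \log \bar y$, so that refitting with one count lowered by one is trivial) evaluates the Chen-Stein estimate of the expected deviance error.

```python
# Sketch: Chen-Stein covariance penalty for an intercept-only Poisson model.
import numpy as np

rng = np.random.default_rng(6)
n, mu_true = 40, 3.0
y = rng.poisson(mu_true, size=n)

def theta_hat(y):
    """Fitted natural parameter of the intercept-only Poisson model."""
    return np.log(np.mean(y))

# In-sample deviance error err = -2 log f(y | theta_hat) + 2C (the Poisson deviance).
mu_hat = np.exp(theta_hat(y))
err = 2.0 * np.sum(mu_hat - y) + 2.0 * np.sum(y[y > 0] * np.log(y[y > 0] / mu_hat))

# Covariance penalty: refit with the i-th count lowered by one.
penalty = 0.0
for i in range(n):
    if y[i] > 0:                        # a zero count contributes nothing to the sum
        y_minus = y.copy()
        y_minus[i] -= 1
        penalty += y[i] * (theta_hat(y) - theta_hat(y_minus))
print(err + 2.0 * penalty)              # estimate of the expected deviance prediction error
```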

Centralized Steinian

- For Bernoulli responses, i.e. binary data, the covariance penalty may be rewritten as
  $\mathrm{Cov}(\hat\theta, y) = \mathrm{E}\!\left( \mu(1 - \mu)\big(\hat\theta(1) - \hat\theta(0)\big) \right).$
- Since $\mu$ is not available, it can be replaced by a consistent estimator $\hat\mu$:
  $\mathrm{err} + 2 \sum_i \hat\mu_i (1 - \hat\mu_i) \big( \hat\theta_i(1) - \hat\theta_i(0) \big).$
- Similarly, for continuous exponential family distributions the expected conditional deviance error can be approximated by
  $\mathrm{err} + 2 \sum_i \frac{\partial \hat\mu_i}{\partial y_i}.$
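
The Bernoulli estimate only requires refitting the model with the i-th response set to 1 and to 0. The sketch below (an illustration with a plain logistic regression rather than a mixed model, fitted by my own small Newton routine) evaluates err plus the centralized Steinian penalty.

```python
# Sketch: centralized Steinian estimate for Bernoulli responses (plain logistic fit).
import numpy as np

rng = np.random.default_rng(7)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([-0.3, 1.0])))))

def fit_logit(X, y, n_iter=30):
    """Logistic regression by Newton-Raphson; returns fitted natural parameters theta_hat."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ b)))
        W = mu * (1.0 - mu)
        b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
    return X @ b

theta = fit_logit(X, y)
mu_hat = 1.0 / (1.0 + np.exp(-theta))
err = -2.0 * np.sum(y * theta - np.log1p(np.exp(theta)))   # C = 0 for binary data

penalty = 0.0
for i in range(n):
    y1, y0 = y.copy(), y.copy()
    y1[i], y0[i] = 1, 0                                    # refit with y_i set to 1 and to 0
    penalty += mu_hat[i] * (1.0 - mu_hat[i]) * (fit_logit(X, y1)[i] - fit_logit(X, y0)[i])
print(err + 2.0 * penalty)   # estimate of the expected deviance prediction error
```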

Model selection

- (random intercept) model 1:
  $\log\!\left(\frac{\mu_{ij}}{1 - \mu_{ij}}\right) = \beta_0 + \beta_1 x_i + u_j, \qquad u \sim N(0, \tau^2 I)$
- (linear) model 2:
  $\log\!\left(\frac{\mu_i}{1 - \mu_i}\right) = \beta_0 + \beta_1 x_i$
- Choose the model with the lowest expected deviance.

[Figure: selection frequencies of model 1 as a function of $\tau^2$ (0 to 1.5), in panels for n = 25 and n = 100, comparing the criteria "proposed", "tr(H)", "marginal" and "true".]

Summary

- Two prediction perspectives: marginal & conditional.
- Choose the model with the lowest expected conditional deviance error.
- Unbiased estimates for Gaussian, Poisson & exponential responses.
- Asymptotic estimates for further exponential family distributions.