On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

Similar documents
On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behaviour of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

Estimating prediction error in mixed models

Modfiied Conditional AIC in Linear Mixed Models

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Regression, Ridge Regression, Lasso

Model comparison and selection

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

Statistics 203: Introduction to Regression and Analysis of Variance Course review

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models

Data Mining Stat 588

Bootstrap Variants of the Akaike Information Criterion for Mixed Model Selection

Nonparametric Small Area Estimation Using Penalized Spline Regression

High-dimensional regression

UNIVERSITÄT POTSDAM Institut für Mathematik

Akaike criterion: Kullback-Leibler discrepancy

Regression I: Mean Squared Error and Measuring Quality of Fit

Discrepancy-Based Model Selection Criteria Using Cross Validation

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

Introduction to Statistical modeling: handout for Math 489/583

Geographically Weighted Regression as a Statistical Model

Weighted Least Squares

MS-C1620 Statistical inference

Econometric Methods. Prediction / Violation of A-Assumptions. Burcu Erdogan. Universität Trier WS 2011/2012

Machine Learning for OR & FE

5. Erroneous Selection of Exogenous Variables (Violation of Assumption #A1)

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Likelihood ratio testing for zero variance components in linear mixed models

Efficient Estimation for the Partially Linear Models with Random Effects

A Variant of AIC Using Bayesian Marginal Likelihood

Penalized Splines, Mixed Models, and Recent Large-Sample Results

Empirical Economic Research, Part II

Hierarchical Modelling for Univariate Spatial Data

Model Selection and Geometry

arxiv: v2 [stat.co] 17 Mar 2018

Quick Review on Linear Multiple Regression

mboost - Componentwise Boosting for Generalised Regression Models

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori

Stat 579: Generalized Linear Models and Extensions

Generalized Linear Models

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

MS&E 226: Small Data

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models

Weighted Least Squares

1 Mixed effect models and longitudinal data analysis

ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

ESTIMATION OF TREATMENT EFFECTS VIA MATCHING

13. October p. 1

How the mean changes depends on the other variable. Plots can show what s happening...

Nonparametric Regression. Badr Missaoui

variability of the model, represented by σ 2 and not accounted for by Xβ

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Covariance function estimation in Gaussian process regression

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Lecture 3: Statistical Decision Theory (Part II)

Applied Econometrics (QEM)

Linear Regression Model. Badr Missaoui

Sparse Linear Models (10/7/13)

RLRsim: Testing for Random Effects or Nonparametric Regression Functions in Additive Mixed Models

Generalized Additive Models

Variable Selection for Generalized Additive Mixed Models by Likelihood-based Boosting

Variable Selection and Model Choice in Survival Models with Time-Varying Effects

Parametric Bootstrap Methods for Bias Correction in Linear Mixed Models

Regularized PCA to denoise and visualise data

2.2 Classical Regression in the Time Series Context

Hierarchical Modeling for Univariate Spatial Data

arxiv: v2 [math.st] 9 Feb 2017

Introductory Econometrics

Modelling geoadditive survival data

Restricted Likelihood Ratio Tests in Nonparametric Longitudinal Models

Hierarchical Modelling for Univariate Spatial Data

MASM22/FMSN30: Linear and Logistic Regression, 7.5 hp FMSN40:... with Data Gathering, 9 hp

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Model selection using penalty function criteria

Zellner s Seemingly Unrelated Regressions Model. James L. Powell Department of Economics University of California, Berkeley

A Study on the Bias-Correction Effect of the AIC for Selecting Variables in Normal Multivariate Linear Regression Models under Model Misspecification

F9 F10: Autocorrelation

Topic 4: Model Specifications

Modelling using ARMA processes

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Introduction to Within-Person Analysis and RM ANOVA

Regression #3: Properties of OLS Estimator

One-stage dose-response meta-analysis

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Simple Linear Regression

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Andreas Blöchl: Trend Estimation with Penalized Splines as Mixed Models for Series with Structural Breaks

Chapter 4: Factor Analysis

Generalized Elastic Net Regression

Two Applications of Nonparametric Regression in Survey Estimation

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Transformations The bias-variance tradeoff Model selection criteria Remarks. Model selection I. Patrick Breheny. February 17

Ken-ichi Kamo Department of Liberal Arts and Sciences, Sapporo Medical University, Hokkaido, Japan

Accounting for Complex Sample Designs via Mixture Models

Transcription:

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of Biostatistics Johns Hopkins University Dortmund, 23.3.2010

Akaike Information Criterion Akaike Information Criterion Most commonly used model choice criterion for comparing parametric models. Definition: AIC = 2l( ˆψ) + 2k. where l( ˆψ) is the log-likelihood evaluated at the maximum likelihood estimate ˆψ for the unknown parameter vector ψ and k = dim(ψ) is the number of parameters. Properties: Compromise between model fit and model complexity. Allows to compare non-nested models. Selects rather too many than too few variables in variable selection problems. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 1

Akaike Information Criterion Data y generated from a true underlying model described in terms of density g( ). Approximate the true model by a parametric class of models f ψ ( ) = f( ; ψ). Measure the discrepancy between a model f ψ ( ) and the truth g( ) by the Kullback- Leibler distance K(f ψ, g) = [log(g(z)) log(f ψ (z))] g(z)dz = E z [log(g(z)) log(f ψ (z))]. where z is an independent replicate following the same distribution as y. Decision rule: Out of a sequence of models, choose the one that minimises K(f ψ, g). On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 2

Akaike Information Criterion In practice, the parameter ψ will have to be estimated as ˆψ(y) for the different models. To focus on average properties not depending on a specific data realisation, minimise the expected Kullback-Leibler distance E y [K(f ˆψ(y), g)] = E y[e z [ log(g(z)) log(f ˆψ(y) (z)) ]] Since g( ) does not depend on the data, this is equivalent to minimising 2 E y [E z [log(f ˆψ(y) (z))]] (1) (the expected relative Kullback-Leibler distance). On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 3

Akaike Information Criterion The best available estimate for (1) is given by 2 log(f ˆψ(y) (y)). While (1) is a predictive quantity depending on both the data y and an independent replication z, the density and the parameter estimate are evaluated for the same data. Introduce a correction term. Consider the regularity conditions ψ is a k-dimensional parameter with parameter space Ψ = R k (possibly achieved by a change of coordinates). y consists of independent and identically distributed replications y 1,..., y n. In this case, an (asymptotically) unbiased estimate for (1) is given by AIC = 2 log(f ˆψ(y) (y)) + 2k. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 4

Linear Mixed Models Linear Mixed Models Mixed models form a very useful class of regression models with general form y = Xβ + Zb + ε where β are usual regression coefficients while b are random effects with distributional assumption [ ] ([ [ ]) ε 0 σ N, b 0] 2 I 0. 0 D In the following, we will concentrate on mixed models with only one variance component where b N(0, τ 2 I) or b N(0, τ 2 Σ) with Σ known. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 5

Special case I: Random intercept model for longitudinal data Linear Mixed Models y ij = x ijβ + b i + ε ij, j = 1,..., J i, i = 1,..., I, where i indexes individuals while j indexes repeated observations on the same individual. The random intercept b i accounts for shifts in the individual level of response trajectories and therefore also for intra-subject correlations. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 6

Linear Mixed Models Special case II: Penalised spline smoothing for nonparametric function estimation y i = m(x i ) + ε i, i = 1,..., n, where m(x) is a smooth, unspecified function. Approximating m(x) in terms of a spline basis of degree d leads (for example) to the truncated power series representation m(x) = d β j x j + j=0 K b j (x κ j ) d + j=1 where κ 1,..., κ K denotes a sequence of knots. Assume random effects distribution b N(0, τ 2 I) for the basis coefficients of truncated polynomials to enforce smoothness. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 7

Marginal perspective on a mixed model: Linear Mixed Models y N(Xβ, V ) where V = σ 2 I + ZDZ Interpretation: The random effects induce a correlation structure and therefore enable a proper statistical analysis of correlated data. Conditional perspective on a mixed model: y b N(Xβ + Zb, σ 2 I). Interpretation: Random effects are additional regression coefficients (for example subject-specific effects in longitudinal data) that are estimated subject to a regularisation penalty. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 8

Interest in the following is on the selection of random effects: Compare Linear Mixed Models M 1 : y = Xβ + Zb + ε, b N(0, τ 2 Σ) and M 2 : y = Xβ + ε. Equivalent: Compare model with random effects (τ 2 > 0) and without random effects (τ 2 = 0). Random Intercept: τ 2 > 0 versus τ 2 = 0 corresponds to the inclusion and exclusion of the random intercept and therefore to the presence or absence of intra-individual correlations. Penalised splines: τ 2 > 0 versus τ 2 = 0 differentiates between a spline model and a simple polynomial model. In particular, we can compare linear versus nonlinear models. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 9

Akaike Information Criteria in Linear Mixed Models Akaike Information Criteria in Linear Mixed Models In linear mixed models, two variants of AIC are conceivable based on either the marginal or the conditional distribution. The marginal AIC relies on the marginal model and is defined as y N(Xβ, V ) where the marginal likelihood is given by maic = 2l(y ˆβ, ˆτ 2, ˆσ 2 ) + 2(p + 2), l(y ˆβ, ˆτ 2, ˆσ 2 ) = 1 2 log( ˆV ) 1 2 (y X ˆβ) ˆV 1 (y X ˆβ) and p = dim(β). On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 10

The conditional AIC relies on the conditional model Akaike Information Criteria in Linear Mixed Models y b N(Xβ + Zb, σ 2 I) and is defined as where caic = 2l(y ˆβ, ˆb, ˆτ 2, σ 2 ) + 2(ρ + 1), l(y ˆβ, ˆb, ˆτ 2, σ 2 ) = n 2 log(ˆσ2 ) 1 2ˆσ 2(y X ˆβ Zˆb) (y X ˆβ Zˆb) is the conditional likelihood and (( X ρ = tr X X ) 1 ( )) Z X X X Z Z X Z Z + σ 2 /τ 2 Σ Z X Z Z are the effective degrees of freedom (trace of the hat matrix). On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 11

Marginal AIC Marginal AIC Model M 1 (τ 2 > 0) is preferred over M 2 (τ 2 = 0) when maic 1 < maic 2 2l(y ˆβ 1, ˆτ 2, ˆσ 2 1) + 2(p + 2) < 2l(y ˆβ 2, 0, ˆσ 2 2) + 2(p + 1) 2l(y ˆβ 1, ˆτ 2, ˆσ 2 1) 2l(y ˆβ 2, 0, ˆσ 2 2) > 2. The left hand side is simply the test statistic for a likelihood ratio test on τ 2 = 0 versus τ 2 > 0. Under standard asymptotics, we would have 2l(y ˆβ 1, ˆτ 2, ˆσ 1) 2 2l(y ˆβ 2, 0, ˆσ 2) 2 a,h 0 χ 2 1 and the marginal AIC would have a type 1 error of P (χ 2 1 > 2) 0.1572992 Common interpretation: AIC selects rather too many than too few effects. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 12

Marginal AIC In contrast to the regularity conditions for likelihood ratio tests, τ 2 is on the boundary of the parameter space for model M 2. The classical assumptions underlying the derivation of AIC are also not fulfilled. Consequences: The marginal AIC is positively biased for twice the expected relative Kullback- Leibler-Distance. The bias is dependent on the true unknown parameters in the random effects covariance matrix and this dependence does not vanish asymptotically. Compared to an unbiased criterion, the marginal AIC favors smaller models excluding random effects. This contradicts the usual intuition that the AIC picks rather too many than too few effects. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 13

Conditional AIC Conditional AIC Vaida & Blanchard (2005) have shown that the conditional AIC from above is asymptotically unbiased for the expected relative Kullback Leibler distance for given random effects covariance matrix. Intuition: Result should carry over when using a consistent estimate. Surprisingly, this is not the case: The complex model including the random effect is chosen whenever ˆτ 2 > 0: ˆτ 2 > 0 caic(ˆτ 2 ) < caic(0) ˆτ 2 = 0 caic(ˆτ 2 ) = caic(0). Principal difficulty: The degrees of freedom in the caic are estimated from the same data as the model parameters. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 14

Conditional AIC Liang et al. (2008) propose a corrected conditional AIC, where the degrees of freedom ρ are replaced by the estimate ˆρ = n i=1 ŷ i y i = tr ( ) ŷ. y The resulting corrected conditional AIC shows satisfactory theoretical properties. However, it is computationally cumbersome: Liang et al. suggested to approximate the derivatives numerically (by adding small perturbations to the data). Numerical approximations require n and 2n model fits. In an application with 1,600 Observations and 64 candidate models, computing the corrected conditional AICs would take about 110 days. We have developed a closed form representation of ˆρ that is available almost instantaneously. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 15

Summary Summary The marginal AIC suffers from the same theoretical difficulties as likelihood ratio tests on the boundary of the parameter space. The marginal AIC is biased towards simpler models excluding random effects. The conventional conditional AIC tends to select too many variables. Whenever a random effects variance is estimated to be positive, the corresponding effect will be included. The corrected conditional AIC rectifies this difficulty and is now available in closed form. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 16

References: Summary Greven, S. & Kneib, T. (2009): On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models. Under Revision for Biometrika. Liang, H., Wu, H. & Zou, G. (2008): A note on conditional AIC for linear mixed-effects models. Biometrika 95, 773 778. Vaida, F. & Blanchard, S. (2005): Conditional Akaike information for mixed-effects models. Biometrika 92, 351 370. A place called home: http://www.staff.uni-oldenburg.de/thomas.kneib On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 17