On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

Similar documents
On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behaviour of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

Estimating prediction error in mixed models

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models

Modfiied Conditional AIC in Linear Mixed Models

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Regression, Ridge Regression, Lasso

Likelihood ratio testing for zero variance components in linear mixed models

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Penalized Splines, Mixed Models, and Recent Large-Sample Results

mboost - Componentwise Boosting for Generalised Regression Models

Modelling geoadditive survival data

Statistics 203: Introduction to Regression and Analysis of Variance Course review

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models

Analysing geoadditive regression data: a mixed model approach

Model comparison and selection

Sparse Linear Models (10/7/13)

Nonparametric Small Area Estimation Using Penalized Spline Regression

Stat 579: Generalized Linear Models and Extensions

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Regression I: Mean Squared Error and Measuring Quality of Fit

Andreas Blöchl: Trend Estimation with Penalized Splines as Mixed Models for Series with Structural Breaks

Variable Selection and Model Choice in Survival Models with Time-Varying Effects

1 Mixed effect models and longitudinal data analysis

Optimization Problems

Data Mining Stat 588

Lecture 3: Statistical Decision Theory (Part II)

Model Selection and Geometry

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

ECON 4160, Autumn term Lecture 1

Likelihood-Based Methods

Generalized Additive Models

Combining Non-probability and Probability Survey Samples Through Mass Imputation

A general mixed model approach for spatio-temporal regression data

WU Weiterbildung. Linear Mixed Models

Model selection using penalty function criteria

Linear Regression Model. Badr Missaoui

Introductory Econometrics

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

arxiv: v2 [stat.co] 17 Mar 2018

ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS

Discrepancy-Based Model Selection Criteria Using Cross Validation

Conditional Transformation Models

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

The Hodrick-Prescott Filter

A Modern Look at Classical Multivariate Techniques

Variable Selection for Generalized Additive Mixed Models by Likelihood-based Boosting

Random Intercept Models

Data Mining Stat 588

13. October p. 1

Recap. HW due Thursday by 5 pm Next HW coming on Thursday Logistic regression: Pr(G = k X) linear on the logit scale Linear discriminant analysis:

Ultra High Dimensional Variable Selection with Endogenous Variables

Geographically Weighted Regression as a Statistical Model

RLRsim: Testing for Random Effects or Nonparametric Regression Functions in Additive Mixed Models

Introduction to Within-Person Analysis and RM ANOVA

Bootstrap Variants of the Akaike Information Criterion for Mixed Model Selection

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Bayesian Estimation of Prediction Error and Variable Selection in Linear Regression

One-stage dose-response meta-analysis

Restricted Likelihood Ratio Tests in Nonparametric Longitudinal Models

Quick Review on Linear Multiple Regression

Generalized Linear Models

Some properties of Likelihood Ratio Tests in Linear Mixed Models

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

Accounting for Complex Sample Designs via Mixture Models

Machine Learning for OR & FE

Econometric Methods. Prediction / Violation of A-Assumptions. Burcu Erdogan. Universität Trier WS 2011/2012

Composite Likelihood Estimation

UNIVERSITÄT POTSDAM Institut für Mathematik

Modeling Real Estate Data using Quantile Regression

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

Applied Econometrics (QEM)

Graduate Econometrics I: Maximum Likelihood II

Econ 5150: Applied Econometrics Dynamic Demand Model Model Selection. Sung Y. Park CUHK

ESTIMATION OF TREATMENT EFFECTS VIA MATCHING

Chapter 4: Constrained estimators and tests in the multiple linear regression model (Part III)

STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă

Covariance function estimation in Gaussian process regression

Statistics and econometrics

Introduction to Statistical modeling: handout for Math 489/583

Nonparametric Quantile Regression

Beyond Mean Regression

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Analysis Methods for Supersaturated Design: Some Comparisons

Two Applications of Nonparametric Regression in Survey Estimation

Mixed effects models

ECON 5350 Class Notes Functional Form and Structural Change

Empirical Economic Research, Part II

Regression #3: Properties of OLS Estimator

Akaike criterion: Kullback-Leibler discrepancy

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as

Transformations The bias-variance tradeoff Model selection criteria Remarks. Model selection I. Patrick Breheny. February 17

Lecture 6: Discrete Choice: Qualitative Response

Transcription:

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics Ludwig-Maximilians-University Munich joint work with Sonja Greven Department of Biostatistics, Johns Hopkins University 10.2.2009

Overview Overview Linear and additive mixed models. Akaikes information criterion (AIC). Marginal AIC Conditional AIC Application: Childhood malnutrition in Zambia On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 1

Linear and Additive Mixed Models Linear and Additive Mixed Models Mixed models form a very useful class of regression models with general form y = Xβ + Zb + ε where β are usual regression coefficients while b are random effects with distributional assumption [ ] ([ [ ]) ε 0 σ N, b 0] 2 I 0. 0 D Denote the vector of all unknown variance parameters as θ. In the following, we will concentrate on mixed models with only one variance component where b N(0, τ 2 I) or b N(0, τ 2 Σ) with Σ known and therefore θ = (σ 2, τ 2 ). On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 2

Special case I: Random intercept model for longitudinal data Linear and Additive Mixed Models y ij = x ijβ + b i + ε ij, j = 1,..., J i, i = 1,..., I, where i indexes individuals while j indexes repeated observations on the same individual. The random intercept b i accounts for shifts in the individual level of response trajectories and therefore also for intra-subject correlations. Extended models include further random (covariate) effects, leading to random slopes. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 3

Linear and Additive Mixed Models Special case II: Penalised spline smoothing for nonparametric function estimation y i = m(x i ) + ε i, i = 1,..., n, where m(x) is a smooth, unspecified function. Approximating m(x) in terms of a spline basis of degree d leads (for example) to the truncated power series representation m(x) = d β j x j + j=0 K b j (x κ j ) d + j=1 where κ 1,..., κ K denotes a sequence of knots. The spline approximation leads to a piecewise polynomial fit of degree d on the intervals defined by the knots under appropriate smoothness restrictions. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 4

Penalised estimation to avoid overly wiggly function estimates: Linear and Additive Mixed Models (y Xβ Zb) (y Xβ Zb) + λb b min β,b where X and Z correspond to design matrices obtained from the truncated power series representation. The smoothness of the curve is determined by the smoothing parameter λ. Equivalent to assuming the random effect distribution b N(0, τ 2 I) and setting the smoothing parameter to λ = σ2 τ 2. Works also for other basis choices (e.g. B-splines) and other types of flexible modelling components (varying coefficients, surfaces, spatial effects, etc.). On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 5

Linear and Additive Mixed Models Additive mixed models consist of a combination of random effects and flexible modelling components such as penalised splines. Example: Childhood malnutrition in Zambia. Determine the nutritional status of a child in terms of a Z-score. We consider chronic malnutrition measured in terms of insufficient height for age (stunting), i.e. zscore i = cheight i med, s where med and s are the median and standard deviation of (age-stratified) height in a reference population. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 6

Additive mixed model for stunting: Linear and Additive Mixed Models zscore i = x iβ + m 1 (cage i ) + m 2 (cfeed i ) + m 3 (mage i ) + m 4 (mbmi i ) +m 5 (mheight i ) + b si + ε i, with covariates csex gender of the child (1 = male, 0 = female) cfeed duration of breastfeeding (in months) cage age of the child (in months) mage age of the mother (at birth, in years) mheight height of the mother (in cm) mbmi body mass index of the mother medu education of the mother (1 = no education, 2 = primary school, 3 = elementary school, 4 = higher) mwork employment status of the mother (1 = employed, 0 = unemployed) s residential district (54 districts in total) The random effect b si varying covariates. captures spatial variability induced by unobserved spatially On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 7

Marginal perspective on a mixed model: Linear and Additive Mixed Models y N(Xβ, V ) where V = σ 2 I + ZDZ Interpretation: The random effects induce a correlation structure and therefore enable a proper statistical analysis of correlated data. Conditional perspective on a mixed model: y b N(Xβ + Zb, σ 2 I). Interpretation: Random effects are additional regression coefficients (for example subject-specific effects in longitudinal data) that are estimated subject to a regularisation penalty. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 8

Linear and Additive Mixed Models Best linear unbiased estimates / predictions in the linear mixed model: ˆβ = ( X V 1 X ) 1 X V 1 y, ˆb = DZ V 1 (y X ˆβ). Unknown variance parameters θ are estimated using maximum likelihood (ML) or restricted maximum likelihood (REML). Interest in the following is on model choice in linear mixed models with the special form D = blockdiag(τ 2 1 Σ 1,..., τ 2 q Σ q ) (q independent random effects) for known correlation matrices Σ 1,..., Σ q and in particular in models with only one variance component such as D = τ 2 I. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 9

Without loss of generality, we consider the comparison of Linear and Additive Mixed Models M 1 : D = blockdiag(τ 2 1 Σ 1,..., τ 2 q Σ q ) and M 2 : D = blockdiag(τ 2 1 Σ 1,..., τ q 1 Σ q 1 ). The two models are nested since M 1 reduces to M 2 when τ 2 q = 0. Random Intercept: τ 2 q > 0 versus τ 2 q = 0 corresponds to the inclusion and exclusion of the random intercept and therefore to the presence or absence of intra-individual correlations. Penalised splines: τq 2 > 0 versus τq 2 = 0 differentiates between a spline model and a simple polynomial model. In particular, we can compare linear versus nonlinear models. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 10

Akaikes Information Criterion Akaikes Information Criterion Data y generated from a true underlying model described in terms of density g( ). Approximate the true model by a parametric class of models f ψ ( ) = f( ; ψ). Measure the discrepancy between a model f ψ ( ) and the truth g( ) by the Kullback- Leibler distance K(f ψ, g) = [log(g(z)) log(f ψ (z))] g(z)dz = E z [log(g(z)) log(f ψ (z))]. where z is an independent replicate following the same distribution as y. Note that K(f ψ, g) 0 and K(f ψ, g) = 0 iff f ψ = g almost everywhere. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 11

Akaikes Information Criterion Decision rule: Out of a sequence of models, choose the one that minimises K(f ψ, g). In practice, the parameter ψ will have to be estimated as ˆψ(y) for the different models. To focus on average properties not depending on a specific data realisation, minimise the expected Kullback-Leibler distance E y [K(f ˆψ(y), g)] = E y[e z [ log(g(z)) log(f ˆψ(y) (z)) ]] Since g( ) does not depend on the data, this is equivalent to minimising 2 E y [E z [log(f ˆψ(y) (z))]] (1) (the expected relative Kullback-Leibler distance). On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 12

Akaikes Information Criterion The best available estimate for (1) is given by 2 log(f ˆψ(y) (y)). While (1) is a predictive quantity depending on both the data y and an independent replication z, the density and the parameter estimate are evaluated for the same data y. Introduce a correction term. Let ψ denote the parameter vector minimising the Kullback-Leibler distance. Then is unbiased for (1). AIC = 2 log(f ˆψ(y) (y)) + 2 E y[log(f ˆψ(y) (y)) log(f ψ (y))] +2 E y [E z [log(f ψ (z)) log(f ˆψ(y) (z))]] On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 13

Consider the regularity conditions Akaikes Information Criterion ψ is a k-dimensional parameter with parameter space Ψ = R k (possibly achieved by a change of coordinates). y consists of independent and identically distributed replications y 1,..., y n. In this case, the AIC simplifies since ] a 2 E z [log(f ψ (z)) log(f ˆψ(y) (z)) χ 2 k, [ ] a 2 log(f ˆψ(y) (y)) log(f ψ (y)) χ 2 k and therefore an (asymptotically) unbiased estimate for (1) is given by AIC = 2 log(f ˆψ(y) (y)) + 2k. In linear mixed models, two variants of AIC are conceivable based on either the marginal or the conditional distribution. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 14

The marginal AIC relies on the marginal model Akaikes Information Criterion y N(Xβ, V ) and is defined as where the marginal likelihood is given by maic = 2l(y ˆβ, ˆθ) + 2(p + q), l(y ˆβ, ˆθ) = 1 2 log( ˆV ) 1 2 (y X ˆβ) ˆV 1 (y X ˆβ) and p = dim(β), q = dim(θ). On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 15

The conditional AIC relies on the conditional model Akaikes Information Criterion y b N(Xβ + Zb, σ 2 I) and is defined as where caic = 2l(y ˆβ, ˆb, ˆθ) + 2(ρ + 1), l(y ˆβ, ˆb, ˆθ) = n 2 log(ˆσ2 ) 1 2ˆσ 2(y X ˆβ Zˆb) (y X ˆβ Zˆb) is the conditional likelihood and (( X ρ = trace X X ) 1 ( )) Z X X X Z Z X Z Z + σ 2 D Z X Z Z are the effective degrees of freedom (trace of the hat matrix). On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 16

Akaikes Information Criterion The conditional AIC seems to be recommended when the model shall be used for predictions with the same set of random effects (for example in penalised spline smoothing). The marginal AIC is more plausible when observations with new random effects shall be predicted (e.g. new individuals in longitudinal studies). Still, both variants have been considered in both situations and seem to work reasonably well (see for example Wager, Vaida & Kauermann, 2007). On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 17

Marginal AIC Marginal AIC Consider the special case of comparing M 1 : y = Xβ + Zb + ε, b N(0, τ 2 I) versus M 2 : y = Xβ + ε i.e. decide on the inclusion of a random effect. Corresponds to the decision τ 2 > 0 (M 1 ) versus τ 2 = 0 (M 2 ). On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 18

Model M 1 is preferred over M 2 when Marginal AIC maic 1 < maic 2 2l(y ˆβ 1, ˆτ 2, ˆσ 2 1) + 2(p + 2) < 2l(y ˆβ 2, 0, ˆσ 2 2) + 2(p + 1) 2l(y ˆβ 1, ˆτ 2, ˆσ 2 1) 2l(y ˆβ 2, 0, ˆσ 2 2) > 2. The left hand side is simply the test statistic for a likelihood ratio test on τ 2 = 0 versus τ 2 > 0. Under standard asymptotics, we would have 2l(y ˆβ 1, ˆτ 2, ˆσ 1) 2 2l(y ˆβ 2, 0, ˆσ 2) 2 a,h 0 χ 2 1 and the marginal AIC would have a type 1 error of P (χ 2 1 > 2) 0.1572992 Common interpretation: AIC selects rather too many than too few effects. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 19

Marginal AIC In contrast to the regularity conditions for likelihood ratio tests, we are testing on the boundary of the parameter space! The likelihood ratio test statistic is no longer χ 2 -distributed but (approximately) follows a mixture of a point mass in zero and a scaled χ 2 1 variable. The point mass in zero corresponds to the probability that is typically larger than 50%. P (ˆτ 2 = 0) Similar difficulties appear in more complex models with several variance components when deciding on zero variances. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 20

The classical assumptions underlying the derivation of AIC are also not fulfilled. Marginal AIC The high probability of estimating a zero variance yields a bias towards simpler models: The marginal AIC is positively biased for twice the expected relative Kullback- Leibler-Distance. The bias is dependent on the true unknown parameters in the random effects covariance matrix D and this dependence does not vanish asymptotically. Compared to an unbiased criterion, the marginal AIC favors smaller models excluding random effects. This contradicts the usual intuition that the AIC picks rather too many than too few effects. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 21

Marginal AIC Simulated example: y i = m(x) + ε where m(x) = 1 + x + 2d(0.3 x) 2. The parameter d determines the amount of nonlinearity. m(x) 1 2 3 4 5 0.0 0.2 0.4 0.6 0.8 1.0 x On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 22

Marginal AIC ML REML selection frequency of the larger model 0.0 0.2 0.4 0.6 0.8 1.0 selection frequency of the larger model 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 nonlinearity parameter d 0.0 0.5 1.0 1.5 2.0 2.5 3.0 nonlinearity parameter d On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 23

Conditional AIC Conditional AIC Vaida & Blanchard (2005) have shown that the conditional AIC is asymptotically unbiased for the expected relative Kullback Leibler distance for given random effects covariance matrix D. If D is estimated consistently, one would hope that their result carries over to the case of estimated ˆD. Simulation results seem to indicate that this is not the case. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 24

Conditional AIC ML REML selection frequency of the larger model 0.0 0.2 0.4 0.6 0.8 1.0 maic caic selection frequency of the larger model 0.0 0.2 0.4 0.6 0.8 1.0 maic caic 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 nonlinearity parameter d nonlinearity parameter d On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 25

Conditional AIC Surprising result of the simulation study: The complex model including the random effect is chosen whenever ˆτ 2 > 0. If ˆτ 2 = 0, the conditional AICs of the simple and the complex model coincide (despite the additional parameters included in the complex model). The observed phenomenon could be shown to be a general property of the conditional AIC: ˆτ 2 > 0 caic(ˆτ 2 ) < caic(0) ˆτ 2 = 0 caic(ˆτ 2 ) = caic(0). Principal difficulty: The degrees of freedom in the caic are estimated from the same data as the model parameters. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 26

Conditional AIC Liang et al. (2008) propose a corrected conditional AIC, where the degrees of freedom ρ are replaced by n ( ) ŷ i ŷ Φ 0 = = trace y i y if σ 2 is known. i=1 For unknown σ 2, they propose to replace ρ + 1 by Φ 1 = σ2 ˆσ 2 trace ( ŷ y ) + σ 2 (ŷ y) ˆσ 2 y + 1 ( 2ˆσ 2 ) 2 σ4 trace y y, where σ 2 is an estimate for the true error variance. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 27

Conditional AIC ML REML selection frequency of the larger model 0.0 0.2 0.4 0.6 0.8 1.0 maic caic corrected caic with df Φ 0 corrected caic with df Φ 1 selection frequency of the larger model 0.0 0.2 0.4 0.6 0.8 1.0 maic caic corrected caic with df Φ 0 corrected caic with df Φ 1 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 nonlinearity parameter d nonlinearity parameter d On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 28

The corrected conditional AIC shows satisfactory theoretical properties. Conditional AIC However, it is computationally cumbersome: The first and second derivative are not available in closed form and must be approximated numerically (by adding small perturbations to the data). Numerical approximations require n and 2n model fits. In our example, computing the corrected conditional AICs would take about 110 days. In addition, the numerical derivatives were found to be instable in some situations (for example the random intercept model with small cluster sizes). On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 29

Application: Childhood Malnutrition in Zambia Application: Childhood Malnutrition in Zambia Model equation: zscore i = x iβ + m 1 (cage i ) + m 2 (cfeed i ) + m 3 (mage i ) + m 4 (mbmi i ) +m 5 (mheight i ) + b si + ε i. Parametric effects are not subject to model selection. 2 6 = 64 models to consider in the model comparison. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 30

Application: Childhood Malnutrition in Zambia 0.5 0.0 0.5 1.0 REML ML linear 1.0 0.5 0.0 0.5 0 10 20 30 40 50 60 age of the child 0 10 20 30 40 duration of breastfeeding 0.3 0.2 0.1 0.0 0.1 0.2 0.3 1.0 0.5 0.0 0.5 1.0 15 20 25 30 35 40 45 age of the mother 140 150 160 170 180 height of the mother On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 31

Application: Childhood Malnutrition in Zambia -0.21 0 0.21 On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 32

Application: Childhood Malnutrition in Zambia The six best fitting models: ML REML cfeed cage mage mheight mbmi district caic maic caic maic 14 + + + 4125.78 4151.10 4125.78 4173.72 34 + + + + 4125.78 4153.10 4125.78 4175.72 36 + + + + 4125.78 4153.10 4125.78 4175.72 38 + + + + 4125.78 4153.10 4125.78 4175.72 54 + + + + + 4125.78 4155.10 4125.78 4177.72 56 + + + + + 4125.78 4155.10 4125.78 4177.72 58 + + + + + 4125.78 4155.10 4125.78 4177.72 64 + + + + + + 4125.78 4157.10 4125.78 4179.72 On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 33

Summary Summary The marginal AIC suffers from the same theoretical difficulties as likelihood ratio tests on the boundary of the parameter space. The marginal AIC is biased towards simpler models excluding random effects. The conventional conditional AIC tends to select too many variables. Whenever a random effects variance is estimated to be positive, the corresponding effect will be included. The corrected conditional AIC rectifies this difficulty but comes at a high computational price. On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 34

Open questions: Summary Is there a computationally advantageous version / representation of the corrected conditional AIC? Can the marginal AIC be corrected? Is there a working likelihood ratio test based on the corrected conditional AIC? On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 35

References References Greven, S. & Kneib, T. (2009): On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models. Technical Report. Liang, H., Wu, H. & Zou, G. (2008): A note on conditional AIC for linear mixedeffects models. Biometrika 95, 773 778. Vaida, F. & Blanchard, S. (2005): Conditional Akaike information for mixed-effects models. Biometrika 92, 351 370. Wager, C., Vaida, F. & Kauermann, G. (2007): Model selection for penalized spline smoothing using Akaike information criteria. Australian and New Zealand Journal of Statistics 49, 173 190. A place called home: http://www.stat.uni-muenchen.de/~kneib On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models 36