Marginal Conceptual Predictive Statistic for Mixed Model Selection


Open Journal of Statistics, 2016, 6, 239-253
Published Online April 2016 in SciRes. http://www.scirp.org/journal/ojs
http://dx.doi.org/10.4236/ojs.2016.62021

Cheng Wenren, Junfeng Shang, Juming Pan
Process Modeling Analytics Department, Bristol-Myers Squibb, New York, NY, USA
Bowling Green State University, Bowling Green, OH, USA

Received 9 March 2016; accepted 3 April 2016; published 6 April 2016

Copyright © 2016 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/

How to cite this paper: Wenren, C., Shang, J.F. and Pan, J.M. (2016) Marginal Conceptual Predictive Statistic for Mixed Model Selection. Open Journal of Statistics, 6, 239-253. http://dx.doi.org/10.4236/ojs.2016.62021

Abstract

We focus on the development of model selection criteria in linear mixed models. In particular, we propose model selection criteria following the Mallows' Conceptual Predictive Statistic ($C_p$) [1] [2] in linear mixed models. When correlation exists between the observations in data, the normal Gauss discrepancy in the univariate case is not appropriate to measure the distance between the true model and a candidate model. Instead, we define a marginal Gauss discrepancy which takes the correlation into account in the mixed models. The model selection criterion, marginal $C_p$, called $MC_p$, serves as an asymptotically unbiased estimator of the expected marginal Gauss discrepancy. An improvement of $MC_p$, called $IMC_p$, is then derived and proved to be a more accurate estimator of the expected marginal Gauss discrepancy than $MC_p$. The performance of the proposed criteria is investigated in a simulation study. The simulation results show that in small samples, the proposed criteria outperform the Akaike Information Criterion (AIC) [3] [4] and the Bayesian Information Criterion (BIC) [5] in selecting the correct model; in large samples, their performance is competitive. Further, the proposed criteria perform significantly better for highly correlated response data than for weakly correlated data.

Keywords

Mixed Model Selection, Marginal $C_p$, Improved Marginal $C_p$, Marginal Gauss Discrepancy, Linear Mixed Model

1. Introduction

With the development of data science over the past decades, people have become more aware of the complexity of data in real life. Univariate linear regression models with independent identically distributed (i.i.d.) Gaussian errors cannot achieve a good fit for some types of data, especially for data with correlated observations. For instance, in longitudinal data, observations are usually recorded from the same individual over time. It is reasonable to assume that correlation exists among the observations from the same individual, and linear mixed models are therefore appropriate for modeling such data.

Since linear mixed models are extensively used, mixed model selection plays an important role in the statistical literature. The aim of mixed model selection is to choose the most appropriate model from a candidate pool in the mixed model setting. To facilitate this task, a variety of model selection criteria are employed to implement the selection process.

In linear mixed models, a number of criteria have been developed for model selection. The most widely used criteria are the information criteria such as the AIC [3] [4] and the BIC [5]. Sugiura [6] proposed a marginal AIC (mAIC) which includes the number of random-effects parameters in the penalty term. Shang and Cavanaugh [7] employed the bootstrap method to estimate the penalty term of mAIC and proposed two variants of AIC. For longitudinal data, a special case of linear mixed models, Azari, Li and Tsai [8] proposed a corrected Akaike Information Criterion (AICc). In the justification of AICc, the paper mainly handled the challenge posed by the correlation matrix under certain conditions for the mixed models. Vaida and Blanchard [9] redefined the Akaike information based on the best linear unbiased predictor (BLUP) [10]-[12] for the random effects in the mixed models, and proposed a conditional AIC (cAIC). Dimova et al. [13] derived a series of variants of the Akaike Information Criterion in small samples for linear mixed models.

Another information criterion, BIC, can be considered as a Bayesian alternative to AIC. In linear mixed models, BIC is obtained from the marginal AIC by replacing the constant 2 in the penalty by $\log N$, where $N$ is the sample size (mBIC) [14]. Jones [15] proposed a measure of the effective sample size to replace the sample size in the penalty term of BIC, leading to a new criterion $BIC_J$.

We note that the BIC-type information criteria are derived using Bayesian approaches. In contrast, the AIC-type information criteria are justified from the frequentist perspective and based upon the information discrepancy. However, little research has relied on other discrepancies to propose criteria, including Mallows' $C_p$ [1] [2], in linear mixed models. In fact, because of their dissimilar derivations, each selection criterion has its own advantages, and no single selection criterion can cover all the benefits for model selection. To further develop the selection criteria in the mixed modeling setting, we aim to justify the $C_p$-type criteria relying on the Gauss discrepancy.

Mallows' $C_p$ [1] [2] in linear regression models aims to estimate the Gauss discrepancy between the true model and a candidate model. It serves as an asymptotically unbiased estimator of the expected Gauss discrepancy. Fujikoshi and Satoh [16] identified $C_p$ in multivariate linear regression. Davies et al. [17] presented the estimation optimality of $C_p$ in linear regression models. Cavanaugh et al. [18] provided an alternate version of $C_p$.

The Gauss discrepancy is an $L_2$ norm measuring the distance between the true model and a candidate model in linear models. To select the most appropriate model among competing fitted models, the candidate model leading to the smallest value of $C_p$ is chosen. However, since the covariance matrix of linear mixed models poses a challenge for the justification of selection criteria, the $C_p$ statistic in linear mixed models has not been identified.

This paper extends the justification of $C_p$ from linear models to linear mixed models. We first define a marginal Gauss discrepancy reflecting the correlation for measuring the distance between the true model and a candidate model. We utilize the assumption that, under certain conditions, the estimator of the correlation matrix for the candidate model is consistent for the true correlation matrix. We then propose the marginal $C_p$, abbreviated as $MC_p$, which serves as an asymptotically unbiased estimator of the expected marginal Gauss discrepancy between the true model and a candidate model. An improvement of $MC_p$, abbreviated as $IMC_p$, is also proposed, and we justify $IMC_p$ as a more accurate asymptotically unbiased estimator of the expected marginal Gauss discrepancy. We examine the performance of the proposed criteria in a simulation study where we utilize various correlation structures and different sample sizes.

The paper is organized as follows: Section 2 presents the notation and defines the marginal Gauss discrepancy in the setting of linear mixed models. In Section 3, we provide the derivations of the model selection criteria $MC_p$ and $IMC_p$. Section 4 presents a simulation study to demonstrate the effectiveness of the proposed criteria. Section 5 concludes.

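2. Marginal Gauss Discrepancy

In this section, we introduce the true model, also called the generating model, and the candidate model in the setting of linear mixed models, and then define the marginal Gauss discrepancy.

Suppose that the generating model for the data is given by

$$y = X_o\beta_o + Zb_o + \varepsilon_o, \tag{2.1}$$

where $y$ denotes an $N \times 1$ response vector, $X_o$ is an $N \times p_o$ design matrix of full column rank, and $\beta_o$ is a $p_o \times 1$ unknown vector of fixed effects. $Z$ is an $N \times mr$ known matrix of full column rank and $b_o$ is an $mr \times 1$ unknown vector of random effects, where $m$ is the number of cases (the sample size) and $r$ is the dimension of the random effects for each case. Here, $b_o \sim N(0, \sigma_o^2 G_o)$, $\varepsilon_o \sim N(0, \sigma_o^2 I_N)$, $b_o$ and $\varepsilon_o$ are mutually independent, $G_o$ is a positive definite matrix and $\sigma_o^2$ is a scalar.

We fit the data with a candidate model of the form

$$y = X\beta + Zb + \varepsilon, \tag{2.2}$$

where $X$ is an $N \times p$ design matrix of full column rank, $\beta$ is a $p \times 1$ unknown vector, $b \sim N(0, \sigma^2 G)$, $\varepsilon \sim N(0, \sigma^2 I_N)$, and $b$ and $\varepsilon$ are mutually independent. The design matrix of the random effects $Z$ and the dimension of the random effects $b$ are the same as those in the generating model. The matrix $G$ is a positive definite matrix with $q$ unknown parameters in it.

Since the random part of the model (i.e. $Zb$) is not subject to selection, it is easier to use the marginal form [19] of linear mixed models. Let $\zeta_o = Zb_o + \varepsilon_o$; then the generating model can be written as

$$y = X_o\beta_o + \zeta_o, \qquad \zeta_o \sim N(0, \sigma_o^2\Sigma_o), \tag{2.3}$$

where the scaled variance is $\Sigma_o = ZG_oZ^T + I_N$. For the candidate model, let $\zeta = Zb + \varepsilon$; we have

$$y = X\beta + \zeta, \qquad \zeta \sim N(0, \sigma^2\Sigma), \tag{2.4}$$

where the scaled variance is $\Sigma = ZGZ^T + I_N$. Therefore, $\Sigma$ is a nonsingular positive definite matrix.

In models (2.3) and (2.4), the terms $\zeta_o$ and $\zeta$ combine the random effects and the errors of the respective model. Since both are assumed to have mean zero, the scaled variances $\Sigma_o$ and $\Sigma$ contain all the information on the random effects and errors, including the correlation structures.

We measure the distance between the true model and a candidate model by defining the marginal Gauss discrepancy based on the marginal forms of models (2.3) and (2.4). The true model is assumed to be included in the pool of candidate models. Let $\theta_o$ and $\theta$ denote the vectors of parameters $(\beta_o, \sigma_o^2, \Sigma_o)$ and $(\beta, \sigma^2, \Sigma)$, respectively. The marginal Gauss discrepancy between the true model and a candidate model is defined as

$$d_G(\theta, \theta_o) = E_o\{(y - X\beta)^T\Sigma^{-1}(y - X\beta)\},$$

where $E_o$ denotes the expectation with respect to the true model. Note that the marginal Gauss discrepancy weights the $L_2$ norm by the inverse scaled variance $\Sigma^{-1}$. Therefore, the correlation between observations is taken into account when we use the marginal Gauss discrepancy to measure the distance between the true model and a candidate model.

Now let $\hat\theta = (\hat\beta, \hat\sigma^2, \hat\Sigma)$ denote an estimate of $\theta$. For instance, $\hat\theta$ could be the maximum likelihood estimator (MLE) or the restricted maximum likelihood estimator (REML). In this paper, the MLE is utilized. The marginal Gauss discrepancy between the true model and the fitted candidate model is defined as

$$d_G(\hat\theta, \theta_o) = d_G(\theta, \theta_o)\big|_{\theta = \hat\theta},$$

which can therefore be expressed as

$$\begin{aligned}
d_G(\hat\theta, \theta_o) &= E_o\{(y - X\beta)^T\Sigma^{-1}(y - X\beta)\}\big|_{\theta = \hat\theta} \\
&= E_o\{(y - X_o\beta_o + X_o\beta_o - X\beta)^T\Sigma^{-1}(y - X_o\beta_o + X_o\beta_o - X\beta)\}\big|_{\theta = \hat\theta} \\
&= E_o\{(y - X_o\beta_o)^T\Sigma^{-1}(y - X_o\beta_o)\}\big|_{\theta = \hat\theta} + E_o\{(X_o\beta_o - X\beta)^T\Sigma^{-1}(X_o\beta_o - X\beta)\}\big|_{\theta = \hat\theta} \\
&= \sigma_o^2\,\mathrm{tr}(\hat\Sigma^{-1}\Sigma_o) + (X_o\beta_o - X\hat\beta)^T\hat\Sigma^{-1}(X_o\beta_o - X\hat\beta).
\end{aligned} \tag{2.5}$$

We define a transformed marginal Gauss discrepancy between the true generating model and the fitted candidate model as a linear function of the marginal Gauss discrepancy (2.5):

$$\tilde d_G(\hat\theta, \theta_o) = \frac{d_G(\hat\theta, \theta_o)}{\sigma_o^2} - N. \tag{2.6}$$

Taking the expectation of the transformed marginal Gauss discrepancy (2.6), we obtain the expected transformed marginal Gauss discrepancy as

$$\Delta_{C_p}(\theta_o) = E_o\{\tilde d_G(\hat\theta, \theta_o)\} = E_o\{\mathrm{tr}(\hat\Sigma^{-1}\Sigma_o)\} + \frac{E_o\{(X_o\beta_o - X\hat\beta)^T\hat\Sigma^{-1}(X_o\beta_o - X\hat\beta)\}}{\sigma_o^2} - N. \tag{2.7}$$

To serve as a model selection criterion based on the expected transformed marginal Gauss discrepancy in Equation (2.7), an unbiased estimator or an asymptotically unbiased estimator will be proposed. To simplify the procedure, we first simplify the discrepancy in Equation (2.7).

From expression (2.7), the expectation in the numerator can be written as

$$E_o\{(X_o\beta_o - \hat Hy)^T\hat\Sigma^{-1}(X_o\beta_o - \hat Hy)\}, \tag{2.8}$$

where $\hat H = X(X^T\hat\Sigma^{-1}X)^{-1}X^T\hat\Sigma^{-1}$ is a projection matrix such that $X\hat\beta = \hat Hy$. To explore expression (2.8) further, we need the properties of $\hat H$.

Theorem 1. For every $\hat\Sigma$, the matrix $\hat H = X(X^T\hat\Sigma^{-1}X)^{-1}X^T\hat\Sigma^{-1}$ satisfies the following properties:
1) $\hat H$ is idempotent;
2) $\mathrm{tr}(\hat H) = p$ and $\mathrm{tr}(I_N - \hat H) = N - p$.

The proof is given in the Appendix.

Corollary 1. Following Theorem 1, we have:
1) $\hat H^T\hat\Sigma^{-1}\hat H = \hat\Sigma^{-1}\hat H$;
2) $\hat H^T\hat\Sigma^{-1}(\hat H - I_N) = 0$;
3) $(\hat H - I_N)^T\hat\Sigma^{-1}(\hat H - I_N) = (I_N - \hat H)^T\hat\Sigma^{-1}(I_N - \hat H) = \hat\Sigma^{-1}(I_N - \hat H)$.

The proof of Corollary 1 follows directly from Theorem 1.

By Corollary 1, expression (2.8) can be written as

$$\begin{aligned}
&E_o\{(\hat Hy - X_o\beta_o)^T\hat\Sigma^{-1}(\hat Hy - X_o\beta_o)\} \\
&\quad= E_o\{(\hat Hy - \hat HX_o\beta_o + \hat HX_o\beta_o - X_o\beta_o)^T\hat\Sigma^{-1}(\hat Hy - \hat HX_o\beta_o + \hat HX_o\beta_o - X_o\beta_o)\} \\
&\quad= E_o\{(y - X_o\beta_o)^T\hat\Sigma^{-1}\hat H(y - X_o\beta_o)\} + E_o\{(X_o\beta_o)^T(\hat H - I_N)^T\hat\Sigma^{-1}(\hat H - I_N)(X_o\beta_o)\} \\
&\quad= E_o\{\zeta_o^T\hat\Sigma^{-1}\hat H\zeta_o\} + E_o\{(X_o\beta_o)^T\hat\Sigma^{-1}(I_N - \hat H)(X_o\beta_o)\},
\end{aligned} \tag{2.9}$$

where the cross term vanishes by part 2) of Corollary 1.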
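The properties in Theorem 1 and Corollary 1 are easy to check numerically. The following is a minimal sketch, not part of the original paper: the helper name `projection_matrix`, the random design matrix, and the arbitrary positive definite matrix standing in for $\hat\Sigma$ are all illustrative assumptions.

```python
import numpy as np

def projection_matrix(X, Sigma):
    """H = X (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} (hypothetical helper name)."""
    Sigma_inv = np.linalg.inv(Sigma)
    XtSi = X.T @ Sigma_inv
    return X @ np.linalg.solve(XtSi @ X, XtSi)

rng = np.random.default_rng(0)
N, p = 12, 3

# Illustrative design matrix: intercept plus uniform covariates.
X = np.column_stack([np.ones(N), rng.uniform(size=(N, p - 1))])

# Arbitrary symmetric positive definite matrix standing in for Sigma-hat.
A = rng.normal(size=(N, N))
Sigma = A @ A.T + N * np.eye(N)

H = projection_matrix(X, Sigma)
Sigma_inv = np.linalg.inv(Sigma)
I = np.eye(N)

checks = {
    # Theorem 1
    "idempotent": np.allclose(H @ H, H),
    "trace_H": np.isclose(np.trace(H), p),
    "trace_I_minus_H": np.isclose(np.trace(I - H), N - p),
    # Corollary 1, parts 1) to 3)
    "cor1_1": np.allclose(H.T @ Sigma_inv @ H, Sigma_inv @ H),
    "cor1_2": np.allclose(H.T @ Sigma_inv @ (H - I), 0),
    "cor1_3": np.allclose((I - H).T @ Sigma_inv @ (I - H), Sigma_inv @ (I - H)),
}
print(checks)
```

Note that $\hat H$ is an oblique projection: it is idempotent but in general not symmetric, which is why the corollary is stated in terms of $\hat\Sigma^{-1}\hat H$ rather than $\hat H$ alone.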

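Note that the scaled variance $\Sigma$ is a function of the $q \times 1$ unknown parameter vector of variance components $\gamma$, i.e., $\Sigma = \Sigma(\gamma)$. Azari, Li and Tsai [8] noted that under the assumption that the set of candidate models includes the true model, it is reasonable to assume that the MLE $\hat\gamma$ is a consistent estimator of $\gamma_o$. Therefore, we can approximate $\Sigma_o$ by $\hat\Sigma$, i.e., $\hat\Sigma = \Sigma_o + o(1)$. In what follows, we will make use of this approximation.

First, since $E_o\{\zeta_o\} = 0$ and $\mathrm{var}\{\zeta_o\} = \sigma_o^2\Sigma_o$, using the approximation $\hat\Sigma = \Sigma_o + o(1)$ and Theorem 1, the first term of (2.9) becomes

$$E_o\{\zeta_o^T\hat\Sigma^{-1}\hat H\zeta_o\} \approx \sigma_o^2\,\mathrm{tr}(\hat\Sigma^{-1}\hat H\Sigma_o) \approx \sigma_o^2\,\mathrm{tr}(\hat H) = \sigma_o^2\,p. \tag{2.10}$$

Second, using the approximation $\hat\Sigma = \Sigma_o + o(1)$ again, the first term of Equation (2.7) can be simplified as

$$E_o\{\mathrm{tr}(\hat\Sigma^{-1}\Sigma_o)\} \approx N. \tag{2.11}$$

Using expressions (2.9), (2.10), and (2.11), $\Delta_{C_p}(\theta_o)$ in (2.7) can therefore be approximated as

$$\Delta_{C_p}(\theta_o) \approx p + \frac{E_o\{(X_o\beta_o)^T\hat\Sigma^{-1}(I_N - \hat H)(X_o\beta_o)\}}{\sigma_o^2}. \tag{2.12}$$

Following Mallows' interpretation, $\Delta_{C_p}(\theta_o)$ in (2.12) can be expressed as $V_p + B_p/\sigma_o^2$, where $V_p$ and $B_p$ are respectively the variance and bias contributions given by

$$V_p = p$$

and

$$B_p = E_o\{(X_o\beta_o)^T\hat\Sigma^{-1}(I_N - \hat H)(X_o\beta_o)\}.$$

Increasing the number $p$ of fixed-effects parameters decreases the bias $B_p$ of the fitted model, yet increases the variance $V_p$ at the same time. The marginal Gauss discrepancy can therefore be viewed as a bias-variance trade-off. Since a smaller value of the discrepancy indicates a smaller distance between the true model and a candidate model, the size of the Gauss discrepancy reflects how close a fitted model is to the true model.

3. Derivations of Marginal $C_p$ and Improved Marginal $C_p$

3.1. Marginal $C_p$

In this section, model selection criteria based on $\Delta_{C_p}(\theta_o)$ are developed by finding a statistic whose expectation equals, or asymptotically equals, the expected transformed marginal Gauss discrepancy.

We start with the expectation of the sum of squared errors $SS$ from a candidate model. In linear mixed models, the sum of squared errors can be written as

$$SS = (y - X\hat\beta)^T\hat\Sigma^{-1}(y - X\hat\beta).$$

By Theorem 1 and Corollary 1, the expectation of the scaled sum of squared errors can be expressed as

$$E_o\left\{\frac{SS}{\sigma_o^2}\right\} = \frac{E_o\{(y - X\hat\beta)^T\hat\Sigma^{-1}(y - X\hat\beta)\}}{\sigma_o^2} = \frac{E_o\{(y - \hat Hy)^T\hat\Sigma^{-1}(y - \hat Hy)\}}{\sigma_o^2} = \frac{E_o\{y^T\hat\Sigma^{-1}(I_N - \hat H)y\}}{\sigma_o^2},$$

and then we have

$$\begin{aligned}
E_o\left\{\frac{SS}{\sigma_o^2}\right\} &= \frac{E_o\{(y - X_o\beta_o + X_o\beta_o)^T\hat\Sigma^{-1}(I_N - \hat H)(y - X_o\beta_o + X_o\beta_o)\}}{\sigma_o^2} \\
&= \frac{E_o\{(y - X_o\beta_o)^T\hat\Sigma^{-1}(I_N - \hat H)(y - X_o\beta_o)\}}{\sigma_o^2} + \frac{E_o\{(X_o\beta_o)^T\hat\Sigma^{-1}(I_N - \hat H)(X_o\beta_o)\}}{\sigma_o^2} \\
&= \frac{E_o\{\zeta_o^T\hat\Sigma^{-1}(I_N - \hat H)\zeta_o\}}{\sigma_o^2} + \frac{E_o\{(X_o\beta_o)^T\hat\Sigma^{-1}(I_N - \hat H)(X_o\beta_o)\}}{\sigma_o^2}.
\end{aligned} \tag{3.1}$$

As in the treatment of the first term of (2.9), the numerator of the first term of Equation (3.1) is expressed as

$$E_o\{\zeta_o^T\hat\Sigma^{-1}(I_N - \hat H)\zeta_o\} \approx \sigma_o^2\,\mathrm{tr}(\hat\Sigma^{-1}(I_N - \hat H)\Sigma_o) \approx \sigma_o^2\,\mathrm{tr}(I_N - \hat H) = \sigma_o^2(N - p). \tag{3.2}$$

Then, by Equations (3.1) and (3.2), it is straightforward to construct a function $T = SS/\sigma_o^2 + 2p - N$, which is a linear function of $SS/\sigma_o^2$. The function $T$ has the expectation

$$\begin{aligned}
E_o\{T\} &= E_o\left\{\frac{SS}{\sigma_o^2} + 2p - N\right\} = \frac{E_o\{SS\}}{\sigma_o^2} + 2p - N \\
&\approx (N - p) + \frac{E_o\{(X_o\beta_o)^T\hat\Sigma^{-1}(I_N - \hat H)(X_o\beta_o)\}}{\sigma_o^2} + 2p - N \\
&= p + \frac{E_o\{(X_o\beta_o)^T\hat\Sigma^{-1}(I_N - \hat H)(X_o\beta_o)\}}{\sigma_o^2} = V_p + \frac{B_p}{\sigma_o^2} \approx \Delta_{C_p}(\theta_o).
\end{aligned}$$

Note that the function $T$ is not a statistic since the parameter $\sigma_o^2$ is unknown. Here, we use an estimator $\hat\sigma_*^2$ to replace $\sigma_o^2$ in the function $T$.

Let $X_*$ denote the design matrix for the largest model in the candidate pool, with $\mathrm{rank}(X_*) = p_*$. We assume that $C(X) \subseteq C(X_*)$, where $C(\cdot)$ denotes the column space. Let $SS_*$ represent the sum of squared errors for the corresponding fitted model, written as

$$SS_* = (y - X_*\hat\beta_*)^T\hat\Sigma_*^{-1}(y - X_*\hat\beta_*),$$

where $\hat\beta_*$ and $\hat\Sigma_*$ are the MLEs of the parameters $\beta_*$ and $\Sigma_*$ in the largest candidate model, respectively. The estimator $\hat\Sigma_*$ cannot be expressed in closed form and is calculated by an iterative computational algorithm.

For the estimator of $\sigma_o^2$, we use the mean squared error of the largest candidate model,

$$\hat\sigma_*^2 = \frac{SS_*}{N - p_*}, \tag{3.3}$$

which is an asymptotically unbiased estimator of $\sigma_o^2$, yet it is biased in finite samples. To justify this estimator, using the approximation $\hat\Sigma_* = \Sigma_o + o(1)$, we can represent $\hat\beta_*$ in terms of $\Sigma_o$; the expected value of $SS_*$ is then easily calculated as $\sigma_o^2(N - p_*)$, i.e., asymptotically we have $E_o\{\hat\sigma_*^2\} = \sigma_o^2$. Serving as an asymptotically unbiased estimator of $\sigma_o^2$, the $\hat\sigma_*^2$ in Equation (3.3) for the largest candidate model is preferred for estimating $\sigma_o^2$. $MC_p$ is then obtained as

$$MC_p = \frac{SS}{\hat\sigma_*^2} + 2p - N = (N - p_*)\frac{SS}{SS_*} + 2p - N. \tag{3.4}$$

Note that $MC_p$ is biased for $\Delta_{C_p}(\theta_o)$. However, under the assumption that the true model is included in the pool of candidate models, $MC_p$ serves as an asymptotically unbiased estimator of the discrepancy in expression (2.7). The proof is nontrivial, yet simulations (not presented here) show that as the sample size increases, the curves of the average values of $MC_p$ and of the discrepancy $\Delta_{C_p}(\theta_o)$, along with that of $IMC_p$, which is introduced in the following subsection, merge, indicating that $MC_p$ and $IMC_p$ are both asymptotically unbiased estimators of the discrepancy $\Delta_{C_p}(\theta_o)$.

3.2. Improved Marginal $C_p$

To improve the performance of the $MC_p$ statistic in linear mixed models, we propose an improved marginal $C_p$, called $IMC_p$, which is expected to be a more accurate, or less biased, estimator of the expected transformed marginal Gauss discrepancy than $MC_p$.

$IMC_p$ is proposed as

$$IMC_p = (N - p_* - 2)\frac{SS}{SS_*} + 2p - N + 2, \tag{3.5}$$

where $SS$ and $SS_*$ are the sums of squared errors from the candidate fitted model and the largest fitted model, respectively. $IMC_p$ provides an asymptotically unbiased estimator of $\Delta_{C_p}(\theta_o)$, i.e., $E_o\{IMC_p\} \approx \Delta_{C_p}(\theta_o)$, as shown in what follows.

To evaluate the expectation of $IMC_p$, we first need the ratio $SS/SS_*$ of the sums of squared errors between the candidate model and the largest candidate model in the pool. By Corollary 1, we have

$$\frac{SS}{SS_*} = \frac{(y - X\hat\beta)^T\hat\Sigma^{-1}(y - X\hat\beta)}{(y - X_*\hat\beta_*)^T\hat\Sigma_*^{-1}(y - X_*\hat\beta_*)} = \frac{(y - \hat Hy)^T\hat\Sigma^{-1}(y - \hat Hy)}{(y - \hat H_*y)^T\hat\Sigma_*^{-1}(y - \hat H_*y)} = \frac{y^T(I_N - \hat H)^T\hat\Sigma^{-1}(I_N - \hat H)y}{y^T(I_N - \hat H_*)^T\hat\Sigma_*^{-1}(I_N - \hat H_*)y} = \frac{y^T\hat\Sigma^{-1}(I_N - \hat H)y}{y^T\hat\Sigma_*^{-1}(I_N - \hat H_*)y}.$$

By using the approximation $\hat\Sigma = \Sigma_o + o(1)$ for all $\hat\Sigma$, we approximate $\hat H$ and $\hat H_*$ by $H$ and $H_*$, respectively, where $H = X(X^T\Sigma_o^{-1}X)^{-1}X^T\Sigma_o^{-1}$ and $H_* = X_*(X_*^T\Sigma_o^{-1}X_*)^{-1}X_*^T\Sigma_o^{-1}$. Then, the ratio $SS/SS_*$ can be written as

$$\frac{SS}{SS_*} \approx \frac{y^T\Sigma_o^{-1}(I_N - H)y}{y^T\Sigma_o^{-1}(I_N - H_*)y} = \frac{y^T\Sigma_o^{-1}(I_N - H_* + H_* - H)y}{y^T\Sigma_o^{-1}(I_N - H_*)y} = 1 + \frac{y^T\Sigma_o^{-1}(H_* - H)y}{y^T\Sigma_o^{-1}(I_N - H_*)y}. \tag{3.6}$$

To continue the proof, we will use the following theorem and corollary.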

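Theorem 2. If $C(X) \subseteq C(X_*)$, then for any $N \times N$ matrix $K$, we have $C(KX) \subseteq C(KX_*)$.

The proof of Theorem 2 is presented in the Appendix.

Corollary 2. Following Theorem 2, we obtain the following results:
1) $\Sigma_o^{-1}H_*H = \Sigma_o^{-1}HH_* = \Sigma_o^{-1}H$;
2) $\Sigma_o^{-1}H_*X = \Sigma_o^{-1}X$.

The proof of Corollary 2 is included in the Appendix.

By Theorem 2 and Corollary 2, we have

$$\Sigma_o^{-1}(H_* - H)\,\Sigma_o\,\Sigma_o^{-1}(I_N - H_*) = 0,$$

so that the quadratic forms $y^T\Sigma_o^{-1}(H_* - H)y$ and $y^T\Sigma_o^{-1}(I_N - H_*)y$ are independent. It follows that the expectation of $SS/SS_*$ in (3.6) can be written as

$$E_o\left\{\frac{SS}{SS_*}\right\} \approx 1 + E_o\left\{\frac{y^T\Sigma_o^{-1}(H_* - H)y}{y^T\Sigma_o^{-1}(I_N - H_*)y}\right\} = 1 + E_o\{y^T\Sigma_o^{-1}(H_* - H)y\}\,E_o\left\{\frac{1}{y^T\Sigma_o^{-1}(I_N - H_*)y}\right\}. \tag{3.7}$$

For the term $E_o\{y^T\Sigma_o^{-1}(H_* - H)y\}$ in (3.7), since $y \sim N(X_o\beta_o, \sigma_o^2\Sigma_o)$, we have

$$\begin{aligned}
E_o\{y^T\Sigma_o^{-1}(H_* - H)y\} &= \sigma_o^2\,\mathrm{tr}(\Sigma_o^{-1}(H_* - H)\Sigma_o) + (X_o\beta_o)^T\Sigma_o^{-1}(H_* - H)(X_o\beta_o) \\
&= \sigma_o^2\,\mathrm{tr}(H_* - H) + (X_o\beta_o)^T\Sigma_o^{-1}(H_* - H)(X_o\beta_o) \\
&= \sigma_o^2(p_* - p) + (X_o\beta_o)^T\Sigma_o^{-1}(H_* - H)(X_o\beta_o) \\
&= \sigma_o^2(p_* - p) + (X_o\beta_o)^T\Sigma_o^{-1}(I_N - H)(X_o\beta_o),
\end{aligned} \tag{3.8}$$

where the last step uses $H_*X_o\beta_o = X_o\beta_o$, which holds because the true model is contained in the largest candidate model.

For the term $E_o\{1/(y^T\Sigma_o^{-1}(I_N - H_*)y)\}$ in (3.7), we can prove that

$$\frac{y^T\Sigma_o^{-1}(I_N - H_*)y}{\sigma_o^2} \sim \chi^2(\mathrm{rank}(I_N - H_*)).$$

Note that $\mathrm{rank}(I_N - H_*) = N - p_*$. To justify this distribution, write

$$\frac{y^T\Sigma_o^{-1}(I_N - H_*)y}{\sigma_o^2} = y^TAy,$$

where $A = \Sigma_o^{-1}(I_N - H_*)/\sigma_o^2$. For the distribution of $y$, we know that $y \sim N(X_o\beta_o, \sigma_o^2\Sigma_o)$. We calculate that $\sigma_o^2\Sigma_oA = I_N - H_*$, and by Theorem 1, the matrix $I_N - H_*$ is idempotent. Therefore, we have $y^TAy \sim \chi^2(\nu, \lambda)$, where $\nu = \mathrm{rank}(I_N - H_*) = N - p_*$, and by Corollary 2, we can calculate the noncentrality parameter $\lambda$ as

$$\lambda = (X_o\beta_o)^TA(X_o\beta_o) = \frac{(X_o\beta_o)^T\Sigma_o^{-1}(I_N - H_*)(X_o\beta_o)}{\sigma_o^2} = 0.$$

Now, the inverse $\sigma_o^2/(y^T\Sigma_o^{-1}(I_N - H_*)y)$ follows an inverse chi-square distribution with $\mathrm{rank}(I_N - H_*)$ degrees of freedom, with expectation

$$E_o\left\{\frac{1}{y^T\Sigma_o^{-1}(I_N - H_*)y}\right\} = \frac{1}{\sigma_o^2(N - p_* - 2)}. \tag{3.9}$$

Using the results of (3.8) and (3.9), the expectation of $SS/SS_*$ in (3.7) is

$$\begin{aligned}
E_o\left\{\frac{SS}{SS_*}\right\} &\approx 1 + E_o\{y^T\Sigma_o^{-1}(H_* - H)y\}\,E_o\left\{\frac{1}{y^T\Sigma_o^{-1}(I_N - H_*)y}\right\} \\
&= 1 + \frac{\sigma_o^2(p_* - p) + (X_o\beta_o)^T\Sigma_o^{-1}(I_N - H)(X_o\beta_o)}{\sigma_o^2(N - p_* - 2)} \\
&= \frac{N - p - 2}{N - p_* - 2} + \frac{(X_o\beta_o)^T\Sigma_o^{-1}(I_N - H)(X_o\beta_o)}{\sigma_o^2(N - p_* - 2)}.
\end{aligned} \tag{3.10}$$

We recall that the criterion $IMC_p$ in (3.5) is defined as

$$IMC_p = (N - p_* - 2)\frac{SS}{SS_*} + 2p - N + 2.$$

By the result of (3.10) and the approximation $\hat\Sigma = \Sigma_o + o(1)$ again, the expectation of $IMC_p$ is

$$\begin{aligned}
E_o\{IMC_p\} &= (N - p_* - 2)\,E_o\left\{\frac{SS}{SS_*}\right\} + 2p - N + 2 \\
&\approx (N - p - 2) + \frac{(X_o\beta_o)^T\Sigma_o^{-1}(I_N - H)(X_o\beta_o)}{\sigma_o^2} + 2p - N + 2 \\
&\approx p + \frac{E_o\{(X_o\beta_o)^T\hat\Sigma^{-1}(I_N - \hat H)(X_o\beta_o)\}}{\sigma_o^2} \approx \Delta_{C_p}(\theta_o).
\end{aligned}$$

Hence, $IMC_p$ is an asymptotically unbiased estimator of the expected overall transformed Gauss discrepancy $\Delta_{C_p}(\theta_o)$ in Equation (2.7). Compared with the derivation of $MC_p$, the advantage of $IMC_p$ is that it avoids the bias incurred by using $\hat\sigma_*^2$ to estimate $\sigma_o^2$.

The proposed $MC_p$ and $IMC_p$ are justified under the assumption that the true model is contained in the candidate models. Hence, the $MC_p$ and $IMC_p$ values can be calculated for correctly specified and overfitted candidate models. The proposed criteria can also be computed for underspecified models, although the values will then be quite large and will not behave well.

4. Simulation Study

In this simulation study, we investigate the ability of $MC_p$ in (3.4) and $IMC_p$ in (3.5) to determine the correct set of fixed effects for the simulated data in different models.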
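To make formulas (3.4) and (3.5) concrete, the sketch below turns two weighted sums of squares into criterion values. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, and `weighted_ss` takes the scaled covariance matrix as an input instead of estimating it by maximum likelihood as the paper does.

```python
import numpy as np

def weighted_ss(y, X, Sigma):
    """SS = (y - X b)' Sigma^{-1} (y - X b), with b the GLS estimate for the given Sigma.

    Assumption: Sigma is supplied by the caller (for example, an estimate
    obtained elsewhere); this sketch does not perform ML estimation.
    """
    Sigma_inv = np.linalg.inv(Sigma)
    XtSi = X.T @ Sigma_inv
    beta_hat = np.linalg.solve(XtSi @ X, XtSi @ y)
    resid = y - X @ beta_hat
    return float(resid @ Sigma_inv @ resid)

def mcp(ss, ss_full, N, p, p_full):
    """Marginal C_p as in (3.4); ss_full and p_full refer to the largest model."""
    return (N - p_full) * ss / ss_full + 2 * p - N

def imcp(ss, ss_full, N, p, p_full):
    """Improved marginal C_p as in (3.5)."""
    return (N - p_full - 2) * ss / ss_full + 2 * p - N + 2

# Made-up sums of squares, for illustration only.
print(mcp(12.0, 10.0, N=50, p=3, p_full=5))
print(imcp(12.0, 10.0, N=50, p=3, p_full=5))
```

For the largest model itself ($SS = SS_*$, $p = p_*$), both criteria reduce to $p_*$, which mirrors the behaviour of the classical Mallows' $C_p$ for the full model. The candidate with the smallest criterion value is selected.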

4.1. Presentation of Simulations

Consider a setting in which data are generated by a model of the following form:

$$y_{ij} = X_{ij}^T\beta_o + b_i + \varepsilon_{ij}, \qquad i = 1, \ldots, m, \quad j = 1, \ldots, n,$$

where the random effects $b_1, \ldots, b_m$ are uncorrelated with mean 0 and variance $\tau_o^2$, and the errors $\varepsilon_{ij}$ are independent of each other with mean 0 and variance $\sigma_o^2$. It follows that the correlation between any two observations from the same case is $\tau_o^2/(\tau_o^2 + \sigma_o^2)$, whereas observations from different cases are uncorrelated. Let $\varphi$ denote the ratio between the variance of the random effects and the variance of the errors, i.e. $\varphi = \tau_o^2/\sigma_o^2$. The correlation between observations from the same case then equals $\varphi/(1 + \varphi)$, which is an increasing function of $\varphi$. Therefore, a higher $\varphi$ implies a higher correlation between the observations in the same case.

For convenience, the generating model can also be expressed as

$$y = X_o\beta_o + Zb_o + \varepsilon_o,$$

where $\beta_o$ are the unknown coefficients of the fixed effects. It is assumed that the random effects satisfy $b_o \sim N(0, \sigma_o^2G_o)$ with $G_o = \varphi I_m$, and $r = 1$. We set $Z_i = j_{n_i}$ for $i = 1, \ldots, m$, where $j_{n_i}$ is an $n_i$-vector of ones, and $n_1 = \cdots = n_m = n = N/m$. We also assume that the error term satisfies $\varepsilon_o \sim N(0, \sigma_o^2I_N)$ and is independent of the random effects $b_o$.

Since the random part of the model (i.e. $Zb$) is not subject to selection, we express the model in its marginal form. Let $\zeta_{ij} = b_i + \varepsilon_{ij}$; we have

$$y_{ij} = X_{ij}^T\beta_o + \zeta_{ij}$$

for $i = 1, \ldots, m$ and $j = 1, \ldots, n$, which can also be expressed in the general form

$$y = X_o\beta_o + \zeta_o, \qquad \zeta_o \sim N(0, \sigma_o^2\Sigma_o), \tag{4.1}$$

where $\zeta_o = Zb_o + \varepsilon_o$ and $\Sigma_o = (\tau_o^2/\sigma_o^2)ZZ^T + I_N$ is a scaled covariance matrix. Equivalently, within each case the term $\zeta$ has the following exchangeable correlation structure:

$$\mathrm{Var}(\zeta_i) = \sigma_o^2(\varphi + 1)\left[\left(1 - \frac{\varphi}{1 + \varphi}\right)I + \frac{\varphi}{1 + \varphi}J\right],$$

where $\varphi = \tau_o^2/\sigma_o^2$, $I$ is the identity matrix and $J$ is the matrix of 1's.

In this simulation study, we generate the design matrix $X_*$ with $\mathrm{rank}(X_*) = 5$. The first column of $X_*$ is a column of 1's and the other four columns are generated randomly from uniform distributions but are fixed throughout the simulations. Therefore, the number of fixed effects including the intercept in the largest model is $p_* = 5$. The columns of $X$ are selected from the candidate covariate vectors, so there are $2^{p_* - 1} = 16$ candidate models in the candidate pool.

Here, we illustrate the behavior of the model selection criteria by choosing three generating models:

1) Model 1: $y_{ij} = \beta_0 + \beta_3x_{ij3} + b_i + \varepsilon_{ij}$, with $\beta_0 = 1$, $\beta_3 = 3$;
2) Model 2: $y_{ij} = \beta_0 + \beta_3x_{ij3} + \beta_4x_{ij4} + b_i + \varepsilon_{ij}$, with $\beta_0 = 1$, $\beta_3 = 3$, $\beta_4 = 4$;
3) Model 3: $y_{ij} = \beta_0 + \beta_2x_{ij2} + \beta_3x_{ij3} + \beta_4x_{ij4} + b_i + \varepsilon_{ij}$, with $\beta_0 = 1$, $\beta_2 = 2$, $\beta_3 = 3$, $\beta_4 = 4$.

These three models correspond to the three coefficient vectors $\beta_o = (1, 0, 0, 3, 0)$, $(1, 0, 0, 3, 4)$ and $(1, 0, 2, 3, 4)$ in model (4.1), with the number of fixed effects $p_o$ equal to 2, 3 and 4, respectively. Again, the MLEs are used for estimation in the simulations.

Furthermore, we consider the case where the correlated errors have varying degrees of exchangeable structure. The variance component of the error term $\sigma_o^2$ is taken to be 1, and three values of $\tau_o^2$ in increasing order are considered: 3, 6, 9, corresponding to three values of $\varphi$: 3, 6, 9, respectively. We take the number of clusters $m$ to be 5, 10 and 20, with the number of repetitions in a cluster fixed at $n = 5$. We employ a total of 100 realizations for each model.
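The simulation design described above can be sketched as follows. This is an illustrative reading of the setup, not the authors' code: the function names, the seed, and the choice of a Uniform(0, 1) distribution for the covariates are assumptions, and normal draws are assumed for the random effects and errors.

```python
import numpy as np
from itertools import combinations

def simulate_random_intercept(X, beta, m, n, phi, sigma2=1.0, rng=None):
    """Draw y_ij = x_ij' beta + b_i + e_ij with Var(b_i) = phi * sigma2 (assumed normal)."""
    rng = np.random.default_rng() if rng is None else rng
    tau2 = phi * sigma2
    b = rng.normal(0.0, np.sqrt(tau2), size=m)          # one random intercept per case
    eps = rng.normal(0.0, np.sqrt(sigma2), size=m * n)  # independent errors
    return X @ beta + np.repeat(b, n) + eps

def scaled_covariance(m, n, phi):
    """Sigma = phi * Z Z' + I_N, where Z stacks one column of ones per case."""
    Z = np.kron(np.eye(m), np.ones((n, 1)))
    return phi * Z @ Z.T + np.eye(m * n)

def candidate_column_sets(p_full):
    """All column subsets that keep the intercept (column 0): 2**(p_full - 1) sets."""
    others = range(1, p_full)
    return [(0,) + c for k in range(p_full) for c in combinations(others, k)]

rng = np.random.default_rng(1)
m, n, phi = 5, 5, 3.0
N = m * n
X_full = np.column_stack([np.ones(N), rng.uniform(size=(N, 4))])
beta = np.array([1.0, 0.0, 0.0, 3.0, 0.0])  # Model 1 coefficients

y = simulate_random_intercept(X_full, beta, m, n, phi, rng=rng)
Sigma = scaled_covariance(m, n, phi)
models = candidate_column_sets(5)

# Within-case correlation implied by Sigma; equals phi / (1 + phi).
within_corr = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])
print(len(models), within_corr)
```

Each tuple in `models` indexes the columns of `X_full` that form one candidate design matrix, so a selection run would loop over `models`, fit each candidate, and keep the one with the smallest criterion value.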

4.2. Results

4.2.1. Model 1: $\beta_o = (1, 0, 0, 3, 0)$

Table 1 presents the performance of the two versions of marginal $C_p$ ($MC_p$ and $IMC_p$), mAIC and mBIC under model 1, with the true fixed-effects parameter $\beta_o = (1, 0, 0, 3, 0)$, corresponding to $p_o = 2$. The correct model selection rate for each criterion is listed. We observe that for each $\varphi$, $IMC_p$ outperforms $MC_p$, and both outperform mAIC and mBIC in selecting the correct model for small samples. As the ratio $\varphi$ increases, the proposed criteria select the correct model more often.

4.2.2. Model 2: $\beta_o = (1, 0, 0, 3, 4)$

We evaluate the proposed criteria for model 2 in the same manner as for model 1. Table 2 presents the performance of $MC_p$, $IMC_p$, mAIC and mBIC under model 2, where the true fixed-effects parameter is $\beta_o = (1, 0, 0, 3, 4)$ and $p_o = 3$. The only change from model 1 to model 2 is that we add one more fixed-effect variable, $x_4$, and set its coefficient to $\beta_4 = 4$.

In Table 2, the simulation results for model 2 are similar to those for model 1. As the ratio $\varphi$ increases, the proposed criteria $MC_p$ and $IMC_p$ perform better, indicating that they can effectively carry out model selection in the mixed models. We also observe that $IMC_p$ improves on the performance of $MC_p$ for model selection in small samples. As $m$ increases, the performance of $IMC_p$ and $MC_p$ becomes closer. Compared with the correct selection rates in model 1, all model selection criteria behave better in model 2.

4.2.3. Model 3: $\beta_o = (1, 0, 2, 3, 4)$

As in the first two models, we evaluate the performance of the model selection criteria by the rates of correctly selecting the true model. The results are presented in Table 3. Model 3 is identical to model 2 with the exception that we add one more significant fixed-effect variable, $x_2$, with the coefficient $\beta_2 = 2$.

The simulation results for model 3 are similar to those for models 1-2. Considering the rates of choosing the correct model, we find a dramatic improvement of all criteria in model 3 over those in models 1 and 2, implying that the proposed $MC_p$ and $IMC_p$ implement model selection effectively when the fixed effects are significant. In moderately large sample sizes ($m = 20$), $MC_p$ and $IMC_p$ perform comparably to mAIC and mBIC in selecting the correct model.

Table 1. Correct selection rate in model 1.

Sample size   Criterion   φ = 3   φ = 6   φ = 9
m = 5         MC_p        0.78    0.86    0.85
              IMC_p       0.85    0.9?    0.88
              mAIC        0.55    0.48    0.53
              mBIC        0.8?    0.75    0.69
m = 10        MC_p        0.76    0.88    0.89
              IMC_p       0.77    0.89    0.90
              mAIC        0.6?    0.5?    0.53
              mBIC        0.86    0.8?    0.80
m = 20        MC_p        0.8?    0.88    0.93
              IMC_p       0.8?    0.89    0.93
              mAIC        0.57    0.6?    0.59
              mBIC        0.86    0.93    0.9?

A "?" marks a digit that is not legible in the source.

Table 2. Correct selection rate in Model 2

                            Correlation parameter
Sample size   Criterion     φ = 3    φ = 6    φ = 9
m = 5         MCp           0.8      0.90     0.93
              IMCp          0.8      0.9      0.93
              mAIC          0.63     0.66     0.6
              mBIC          0.76     0.83     0.74
m = 10        MCp           0.8      0.88     0.94
              IMCp          0.83     0.88     0.94
              mAIC          0.6      0.65     0.69
              mBIC          0.85     0.85     0.83
m = 20        MCp           0.87     0.94     0.9
              IMCp          0.88     0.94     0.9
              mAIC          0.74     0.70     0.63
              mBIC          0.9      0.93     0.88

Table 3. Correct selection rate in Model 3

                            Correlation parameter
Sample size   Criterion     φ = 3    φ = 6    φ = 9
m = 5         MCp           0.9      0.87     0.9
              IMCp          0.93     0.89     0.9
              mAIC          0.8      0.7      0.77
              mBIC          0.93     0.85     0.87
m = 10        MCp           0.93     0.93     0.96
              IMCp          0.93     0.96     0.96
              mAIC          0.83     0.83     0.87
              mBIC          0.93     0.94     0.94
m = 20        MCp           0.9      0.96     0.97
              IMCp          0.93     0.96     0.98
              mAIC          0.84     0.85     0.77
              mBIC          0.97     0.96     0.96

5. Concluding Remarks

The simulation results illustrate that the proposed criteria MCp and IMCp outperform mAIC and mBIC when the observations are highly correlated in small samples. The results also show that MCp and IMCp perform better as the ratio φ between the variance of the random effects and that of the errors increases. Since a larger φ implies a higher correlation between the observations, we conclude that the proposed criteria perform better as the correlation between observations increases. Since a model with a small φ (close to 0) is similar to a linear regression model with independent errors, the proposed criteria offer no advantage in that case.
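To make the link between φ and correlation concrete, consider as an illustrative assumption (not necessarily the design used in the simulations) a random-intercept model with random-effect variance $\sigma_b^2$ and error variance $\sigma^2$, so that $\phi = \sigma_b^2/\sigma^2$. The symbols $\sigma_b^2$, $\sigma^2$ and $\rho$ are introduced here only for this sketch. Two observations in the same cluster then have correlation

$$\rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2} = \frac{\phi}{1 + \phi},$$

which is increasing in $\phi$ and tends to 0 as $\phi \to 0$. Under this assumption, the ratios φ = 3, 6, 9 correspond to ρ = 0.75, 6/7 ≈ 0.857 and 0.90, respectively.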

The simulation results show that the proposed criteria MCp and IMCp significantly outperform mAIC and mBIC when the sample size is small. As the sample size increases, the performance of the proposed criteria becomes comparable to that of mAIC and mBIC. Therefore, MCp and IMCp are highly recommended in small samples in the setting of linear mixed models.

Our research (not shown in this paper) also shows that both proposed criteria behave best when maximum likelihood estimation (MLE) is employed, compared with restricted maximum likelihood (REML) estimation or least squares estimation. MCp and IMCp under REML estimation need to be developed further in future research.

In the simulation study, comparing Models 1, 2 and 3, we see that when the true model includes more significant fixed effect covariates, the proposed criteria perform better in selecting the correct model. This indicates that models with more significant variables (larger βs) are more identifiable by the proposed criteria than models whose variables are not very significant.

Comparing MCp and IMCp, we find that when the sample size is small, IMCp obtains a higher correct selection rate than MCp, which demonstrates that IMCp improves on MCp in selecting the most appropriate model. However, when the sample size becomes larger, the performance of MCp and IMCp is nearly identical.

A model selection criterion is consistent if, as the sample size increases, it selects the true model with probability approaching 1. Note that MCp, IMCp, and mAIC are not consistent, whereas mBIC is consistent, as expected, since its penalty term log N prevents overfitting in large samples. As the simulation study demonstrates, the proposed criteria MCp and IMCp show their advantages in small samples, although they are originally justified by large-sample approximations, as are quite a few other model selection criteria. For details on the consistency of model selection criteria in linear mixed models, see Jiang and Rao [20].

References

[1] Mallows, C.L. (1973) Some Comments on Cp. Technometrics, 15, 661-675.
[2] Mallows, C.L. (1995) More Comments on Cp. Technometrics, 37, 362-372.
[3] Akaike, H. (1973) Information Theory and an Extension of the Maximum Likelihood Principle. In: Petrov, B.N. and Csaki, F., Eds., International Symposium on Information Theory, 267-281.
[4] Akaike, H. (1974) A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, 19, 716-723. http://dx.doi.org/10.1109/TAC.1974.1100705
[5] Schwarz, G. (1978) Estimating the Dimension of a Model. Annals of Statistics, 6, 461-464. http://dx.doi.org/10.1214/aos/1176344136
[6] Sugiura, N. (1978) Further Analysis of the Data by Akaike's Information Criterion and the Finite Corrections. Communications in Statistics - Theory and Methods A, 7, 13-26. http://dx.doi.org/10.1080/03610927808827599
[7] Shang, J. and Cavanaugh, J.E. (2008) Bootstrap Variants of the Akaike Information Criterion for Mixed Model Selection. Computational Statistics & Data Analysis, 52, 2004-2021. http://dx.doi.org/10.1016/j.csda.2007.06.019
[8] Azari, R., Li, L. and Tsai, C. (2006) Longitudinal Data Model Selection. Computational Statistics & Data Analysis, 50, 3053-3066. http://dx.doi.org/10.1016/j.csda.2005.05.009
[9] Vaida, F. and Blanchard, S. (2005) Conditional Akaike Information for Mixed-Effects Models. Biometrika, 92, 351-370. http://dx.doi.org/10.1093/biomet/92.2.351
[10] Henderson, C.R. (1950) Estimation of Genetic Parameters. Annals of Mathematical Statistics, 21, 309-310.
[11] Harville, D.A. (1990) BLUP (Best Linear Unbiased Prediction) and Beyond. In: Gianola, D. and Hammond, K., Eds., Advances in Statistical Methods for Genetic Improvement of Livestock, Springer, New York, 239-276. http://dx.doi.org/10.1007/978-3-642-74487-7_12
[12] Robinson, G.K. (1991) That BLUP Is a Good Thing: The Estimation of Random Effects. Statistical Science, 6, 15-32. http://dx.doi.org/10.1214/ss/1177011926
[13] Dimova, R.B., Markatou, M. and Talal, A.H. (2011) Information Methods for Model Selection in Linear Mixed Effects Models with Application to HCV Data. Computational Statistics & Data Analysis, 55, 2677-2697. http://dx.doi.org/10.1016/j.csda.2010.10.031
[14] Müller, S., Scealy, J.L. and Welsh, A.H. (2013) Model Selection in Linear Mixed Models. Statistical Science, 28, 135-167. http://dx.doi.org/10.1214/12-STS410

[15] Jones, R.H. (2011) Bayesian Information Criterion for Longitudinal and Clustered Data. Statistics in Medicine, 30, 3050-3056. http://dx.doi.org/10.1002/sim.4323
[16] Fujikoshi, Y. and Satoh, K. (1997) Modified AIC and Cp in Multivariate Linear Regression. Biometrika, 84, 707-716. http://dx.doi.org/10.1093/biomet/84.3.707
[17] Davies, S.L., Neath, A.A. and Cavanaugh, J.E. (2006) Estimation Optimality of Corrected AIC and Modified Cp in Linear Regression. International Statistical Review, 74, 161-168. http://dx.doi.org/10.1111/j.1751-5823.2006.tb00167.x
[18] Cavanaugh, J., Neath, A.A. and Davies, S.L. (2010) An Alternate Version of the Conceptual Predictive Statistic Based on a Symmetrized Discrepancy Measure. Journal of Statistical Planning and Inference, 140, 3389-3398. http://dx.doi.org/10.1016/j.jspi.2010.05.002
[19] Jiang, J. (2007) Linear and Generalized Linear Mixed Models and Their Applications. Springer, New York.
[20] Jiang, J. and Rao, J.S. (2003) Consistent Procedures for Mixed Linear Model Selection. Sankhya, 65, 23-42.

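Appendix

Proof of Theorem 1

To prove that $\hat{H}$ is idempotent, we calculate

$$\hat{H}\hat{H} = X(X'\hat{\Sigma}^{-1}X)^{-1}X'\hat{\Sigma}^{-1}X(X'\hat{\Sigma}^{-1}X)^{-1}X'\hat{\Sigma}^{-1} = X(X'\hat{\Sigma}^{-1}X)^{-1}X'\hat{\Sigma}^{-1} = \hat{H}.$$

Thus, $\hat{H}$ is idempotent. By the properties of the trace, we have

$$\operatorname{tr}(\hat{H}) = \operatorname{tr}\{X(X'\hat{\Sigma}^{-1}X)^{-1}X'\hat{\Sigma}^{-1}\} = \operatorname{tr}\{X'\hat{\Sigma}^{-1}X(X'\hat{\Sigma}^{-1}X)^{-1}\} = \operatorname{tr}(I_p) = p.$$

Therefore, we have

$$\operatorname{tr}(I_N - \hat{H}) = \operatorname{tr}(I_N) - \operatorname{tr}(\hat{H}) = N - p.$$

Thus, Theorem 1 is proved.

Proof of Theorem 2

Write $X_1$ and $X_2$ for the two design matrices, with $C(X_1) \subset C(X_2)$. Let $y \in C(K'X_1)$. We need to show that $y \in C(K'X_2)$. Since $y \in C(K'X_1)$, there exists a $p_1 \times 1$ vector $\beta_1$ such that $y = K'X_1\beta_1$. By $C(X_1) \subset C(X_2)$, there also exists a $p_2 \times 1$ vector $\beta_2$ such that $X_1\beta_1 = X_2\beta_2$, which makes $y = K'X_1\beta_1 = K'X_2\beta_2$. So we have $y \in C(K'X_2)$.

Proof of Corollary 1

Since $\Sigma$ is positive definite (of full rank $N$), there exists a nonsingular $N \times N$ matrix $V$ with $\Sigma = VV'$. It follows that

$$\Sigma^{-1} = (VV')^{-1} = (V')^{-1}V^{-1} = (V^{-1})'V^{-1}.$$

Letting $K = (V^{-1})'$, we have $\Sigma^{-1} = KK'$. Then, with $H_j = X_j(X_j'\Sigma^{-1}X_j)^{-1}X_j'\Sigma^{-1}$ for $j = 1, 2$, we arrive at

$$\Sigma^{-1}H_2H_1 = \Sigma^{-1}X_2(X_2'\Sigma^{-1}X_2)^{-1}X_2'\Sigma^{-1}X_1(X_1'\Sigma^{-1}X_1)^{-1}X_1'\Sigma^{-1}$$
$$= KK'X_2(X_2'KK'X_2)^{-1}X_2'KK'X_1(X_1'KK'X_1)^{-1}X_1'KK'$$
$$= K\left[K'X_2\{(K'X_2)'K'X_2\}^{-1}(K'X_2)'\right]\left[K'X_1\{(K'X_1)'K'X_1\}^{-1}(K'X_1)'\right]K'.$$

Now, let

$$\tilde{H}_2 = K'X_2\{(K'X_2)'K'X_2\}^{-1}(K'X_2)' \quad\text{and}\quad \tilde{H}_1 = K'X_1\{(K'X_1)'K'X_1\}^{-1}(K'X_1)'.$$

Since $C(X_1) \subset C(X_2)$, by Theorem 2 we have $C(K'X_1) \subset C(K'X_2)$, which leads to

$$\tilde{H}_2\tilde{H}_1 = \tilde{H}_1,$$

so that

$$\Sigma^{-1}H_2H_1 = K\tilde{H}_2\tilde{H}_1K' = K\tilde{H}_1K' = KK'X_1(X_1'KK'X_1)^{-1}X_1'KK' = \Sigma^{-1}X_1(X_1'\Sigma^{-1}X_1)^{-1}X_1'\Sigma^{-1} = \Sigma^{-1}H_1.$$

The first part of Corollary 1 is therefore proved.

Following the proof of the first part, since $C(K'X_1) \subset C(K'X_2)$, we have

$$\tilde{H}_2K'X_1 = K'X_1.$$

Then, we can conclude that

$$\Sigma^{-1}H_2X_1 = K\left[K'X_2\{(K'X_2)'K'X_2\}^{-1}(K'X_2)'\right]K'X_1 = K\tilde{H}_2K'X_1 = KK'X_1 = \Sigma^{-1}X_1.$$

Therefore, the proof of the second part of Corollary 1 is completed.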
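The identities proved above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes NumPy, draws an arbitrary positive definite Σ and random design matrices, and takes X1 to be the first columns of X2 so that C(X1) ⊂ C(X2). The names `gls_hat`, `X1`, `X2`, and the chosen dimensions are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p1, p2 = 12, 2, 4  # illustrative sizes (assumed, not from the paper)

# Nested design matrices: the columns of X1 are a subset of those of X2,
# so the column space of X1 is contained in that of X2.
X2 = rng.normal(size=(N, p2))
X1 = X2[:, :p1]

# An arbitrary symmetric positive definite covariance matrix.
A = rng.normal(size=(N, N))
Sigma = A @ A.T + N * np.eye(N)
Sigma_inv = np.linalg.inv(Sigma)


def gls_hat(X, Sigma_inv):
    """Return H = X (X' Sigma^{-1} X)^{-1} X' Sigma^{-1}."""
    return X @ np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv)


H2 = gls_hat(X2, Sigma_inv)
H1 = gls_hat(X1, Sigma_inv)

# Theorem 1: H is idempotent, tr(H) = p and tr(I - H) = N - p.
idempotent = np.allclose(H2 @ H2, H2)
trace_H = np.trace(H2)
trace_resid = np.trace(np.eye(N) - H2)

# Corollary: Sigma^{-1} H2 H1 = Sigma^{-1} H1 and Sigma^{-1} H2 X1 = Sigma^{-1} X1.
part1 = np.allclose(Sigma_inv @ H2 @ H1, Sigma_inv @ H1)
part2 = np.allclose(Sigma_inv @ H2 @ X1, Sigma_inv @ X1)

print(idempotent, round(trace_H, 6), round(trace_resid, 6), part1, part2)
```

The check uses `np.linalg.solve` rather than an explicit inverse of X'Σ⁻¹X, which is the usual choice for numerical stability. A single random draw does not prove the results, but it is a quick safeguard against sign or ordering mistakes when transcribing the matrix products.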