Likelihood Ratio Tests that Certain Variance Components Are Zero. Ciprian M. Crainiceanu, Department of Statistical Science.

1 Likelihood Ratio Tests that Certain Variance Components Are Zero
Ciprian M. Crainiceanu, Department of Statistical Science
www.people.cornell.edu/pages/cmc59
Work done jointly with David Ruppert, School of ORIE, Cornell University

2 Outline
A. Examples of problems (Ruppert, Wand and Carroll, 2003)
B. Penalized splines as a particular case of LMMs
C. Null hypotheses including zero random effects variance in LMMs
D. LRTs and F-type tests for this type of hypotheses
E. Null distributions and power properties of LRT and RLRT
F. Other applications: additive, nonlinear models, goodness-of-fit
G. Conclusions

3 Mother age vs. child birthweight
[Figure: scatterplot of child birthweight (grams) against maternal age (years), with ML and GCV penalized spline fits]

4 Janka hardness
[Figure: log(Janka hardness) against density, with linear and penalized spline fits]

5 Nonparametric framework
Consider the regression equation
    y_i = m(x_i) + ɛ_i,
where the ɛ_i are i.i.d. N(0, σ_ɛ^2).
Null hypothesis: m(·) is a polynomial of degree p − q,
    m(x, β) = β_0 + β_1 x + ... + β_{p−q} x^{p−q}
Alternative hypothesis: m(·) is a regression spline,
    m(x, θ) = β_0 + β_1 x + ... + β_p x^p + Σ_{k=1}^K b_k (x − κ_k)_+^p
Idea: use likelihood ratio tests

6 P-splines as LMMs
Penalized sum of squares estimation criterion (avoid overfitting):
    Σ_{i=1}^n {y_i − m(x_i; θ)}^2 + (1/λ) θ^T D θ
The penalized spline criterion is equivalent to
    (1/σ_ɛ^2) ||Y − Xβ − Zb||^2 + (1/(λ σ_ɛ^2)) b^T Σ^{−1} b
If σ_b^2 = λ σ_ɛ^2, then minimizing this criterion ⇔ ML for the LMM
    Y = Xβ + Zb + ɛ,   Cov(b, ɛ) = [ σ_b^2 Σ   O_{K×n} ; O_{n×K}   σ_ɛ^2 I_n ]
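
To make the spline-to-LMM correspondence concrete, here is a minimal sketch (Python/NumPy, not the authors' code) that builds the polynomial design X and the truncated power basis Z, taking Σ = I_K; knots are placed at the k/(K + 1) sample quantiles of x, as in the examples later in the talk. Function name and defaults are illustrative assumptions.

    import numpy as np

    def spline_design(x, K, degree=1):
        """Polynomial design X and truncated power basis Z with K quantile knots.

        The penalized spline fit of slide 6 is then the BLUP in the LMM
        Y = X beta + Z b + eps with b ~ N(0, sigma_b^2 I_K)."""
        x = np.asarray(x, dtype=float)
        knots = np.quantile(x, np.arange(1, K + 1) / (K + 1))   # k/(K+1) quantiles
        X = np.vander(x, degree + 1, increasing=True)           # 1, x, ..., x^p
        Z = np.maximum(x[:, None] - knots[None, :], 0.0) ** degree
        return X, Z, knots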

7 Smoothing parameter estimation
    ML(Θ) = n log(σ_ɛ^2) + log{det(V_λ)} + (y − Xβ)^T V_λ^{−1} (y − Xβ) / σ_ɛ^2
    REML(Θ) = ML(Θ) − (p + 1) log(σ_ɛ^2) + log{det(X^T V_λ^{−1} X)}
where Θ = (β, σ_ɛ^2, λ), Cov(Y) = σ_ɛ^2 V_λ, and V_λ = I_n + λ Z Σ Z^T.
Generalized Cross Validation (GCV) can be used to estimate λ (not suited for LRT testing).
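
These criteria can be evaluated on a grid of λ after profiling out β and σ_ɛ^2; the sketch below (Python/NumPy, Σ = I_K assumed, names illustrative, not the authors' code) does exactly that. For q = 0 the test statistics of slide 9 are then the criterion at λ = 0 minus its minimum over a λ ≥ 0 grid.

    import numpy as np

    def profiled_criterion(lam, y, X, Z, reml=False):
        """(RE)ML criterion of slide 7 at a given lambda, with beta (GLS) and
        sigma_eps^2 profiled out.  Illustrative sketch only; Sigma = I assumed."""
        n, p1 = X.shape                         # p1 = p + 1 fixed effects
        V = np.eye(n) + lam * Z @ Z.T           # V_lambda
        Vinv = np.linalg.inv(V)
        XtVinv = X.T @ Vinv
        beta = np.linalg.solve(XtVinv @ X, XtVinv @ y)
        r = y - X @ beta
        quad = r @ Vinv @ r
        s2 = quad / (n - p1 if reml else n)     # profiled sigma_eps^2
        crit = n * np.log(s2) + np.linalg.slogdet(V)[1] + quad / s2
        if reml:
            crit += -p1 * np.log(s2) + np.linalg.slogdet(XtVinv @ X)[1]
        return crit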

8 Hypotheses
    H_0: β_{p−q+1} = ... = β_p = 0 and λ = 0 (σ_b^2 = 0)
    H_A: β_{p−q+1} ≠ 0 or ... or β_p ≠ 0 or λ > 0 (σ_b^2 > 0)
Why is the problem hard?
Non-standard hypothesis under the null (boundary problem)
Correlated observations (at least under the alternative)
Technicalities (e.g. fixed number of knots, not o(n)!)

9 Likelihood Ratio Tests
Likelihood and Restricted Likelihood Ratio Test statistics:
    LRT_n = inf_{H_0} ML(Θ) − inf_{H_A} ML(Θ)
    RLRT_n = inf_{H_0} REML(Θ) − inf_{H_A} REML(Θ)
Note: RLRT makes sense only when the fixed effects are the same under H_0 and H_A.
Generality acknowledgement: the same considerations hold for any LMM with one variance component!

10 Asymptotic Theory Trivialities
An asymptotic distribution is useful iff:
it provides a good approximation of the finite sample distribution(s);
it is much simpler than the finite sample distribution(s);
asymptotic reasoning makes sense.
Key fact: asymptotics is just an approximation of the finite sample.

11 Boundary problem asymptotics
Test for one parameter on the boundary (q = 0). For independent observations under the alternative (Chernoff, 1954; Moran, 1971; Chant, 1974; Self & Liang, 1987, 1995):
    LRT_n → 0.5 χ^2_0 + 0.5 χ^2_1
For the longitudinal mixed effects model Y_i = X_i β + Z_i b_i + ɛ_i, the same holds as the number of independent subjects K → ∞ (Stram and Lee, 1994).

12 Pinheiro and Bates (2000), simulations
    RLRT_n: 0.5 χ^2_0 + 0.5 χ^2_1,   LRT_n: 0.65 χ^2_0 + 0.35 χ^2_1
Shephard and Harvey (1990) found p_0 ≈ 0.95 in a related model.

13 Probability mass at zero
Test for zero random effects variance in an LMM with one variance component (Crainiceanu, Ruppert, Vogelsang, 2002).
First order conditions for a local minimum at λ = 0:
    ∂ML(Θ)/∂β = 0,   ∂ML(Θ)/∂σ_ɛ^2 = 0,   ∂ML(Θ)/∂λ ≥ 0
Null finite sample LRT_n mass at zero:
    P( (u^T P_0 Z Σ Z^T P_0 u) / (u^T P_0 u) ≤ (1/n) tr(Z Σ Z^T) )
where u ~ N(0, I_n) and P_0 = I_n − X(X^T X)^{−1} X^T.

14 One-way ANOVA: best case scenario
Model:
    Y_ik = β_0 + b_k + ɛ_ik,   i = 1,...,I;  k = 1,...,K
    b ~ N(0, σ_b^2 I_K),   ɛ ~ N(0, σ_ɛ^2 I_n)
    H_0: σ_b^2 = 0  vs.  H_A: σ_b^2 > 0
Asymptotic (I → ∞, K constant) probability mass at zero:
    p_ML(K) = P(χ^2_{K−1} < K),   p_REML(K) = P(χ^2_{K−1} < K − 1)
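
These are ordinary chi-square probabilities, so they are easy to check numerically; a quick sketch (Python/SciPy assumed, not from the talk):

    from scipy.stats import chi2

    def anova_mass_at_zero(K):
        # p_ML(K) = P(chi2_{K-1} < K),  p_REML(K) = P(chi2_{K-1} < K - 1)
        return chi2.cdf(K, df=K - 1), chi2.cdf(K - 1, df=K - 1)

    for K in (3, 10, 50):
        print(K, anova_mass_at_zero(K))   # both stay above 0.5, as in the next figure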

15 ANOVA: Asymptotic mass at zero
[Figure: asymptotic probability mass at zero of the LRT and RLRT plotted against the number of levels K (0 to 100); both curves lie above 0.5]

16 Non-parametric testing example
Constant mean (p = 0) vs. piecewise constant spline:
    Y_i = β_0 + Σ_{k=1}^K b_k I{x_i > κ_k} + ɛ_i,   i = 1,...,n;  k = 1,...,K
    b ~ N(0, σ_b^2 I_K),   ɛ ~ N(0, σ_ɛ^2 I_n)
    H_0: σ_b^2 = 0  vs.  H_A: σ_b^2 > 0
Asymptotics: n → ∞, K constant.
Example: the x_i are equally spaced and κ_k is the sample quantile corresponding to k/(K + 1).
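
For this example the random-effects design is just a matrix of indicators. A small sketch (Python/NumPy, illustrative, not the authors' code) that builds it with quantile knots; together with X equal to a single column of ones it can be passed to the simulation sketch shown after slide 20 below.

    import numpy as np

    def indicator_basis(x, K):
        """Piecewise constant spline basis I{x_i > kappa_k} with K quantile knots."""
        x = np.asarray(x, dtype=float)
        knots = np.quantile(x, np.arange(1, K + 1) / (K + 1))
        return (x[:, None] > knots[None, :]).astype(float)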

17 Asymptotic mass at zero
[Figure: asymptotic probability mass at zero under ML and REML plotted against the number of knots (0 to 100)]

18 Cracking the nut
The 0.5 : 0.5 chi-square mixture approximation is useless in this case.
Calculation of the probability mass at zero (finite sample and asymptotic) is simple: the 0.5 : 0.5 mixture is improved by a p_n : 1 − p_n approximation.
Maybe the non-zero part of the distribution is not χ^2_1.
What happens when q > 0?
The number of knots (levels) is fixed, not o(n)!

19 Distribution of LRT_n (I)
    LRT_n = n log( 1 + Σ_{s=1}^q u_s^2 / Σ_{s=1}^{n−p−1} w_s^2 ) + sup_{λ ≥ 0} h(λ, w, µ, ξ)
    h(λ, w, µ, ξ) = n log{ 1 + N_n(λ, w, µ) / D_n(λ, w, µ) } − Σ_{s=1}^K log(1 + λ ξ_{s,n})
    N_n(λ, w, µ) = Σ_{s=1}^K [λ µ_{s,n} / (1 + λ µ_{s,n})] w_s^2
    D_n(λ, w, µ) = Σ_{s=1}^K w_s^2 / (1 + λ µ_{s,n}) + Σ_{s=K+1}^{n−p−1} w_s^2
where u_s, w_s are i.i.d. N(0, 1), and µ_{s,n}, ξ_{s,n} are the K eigenvalues of Σ^{1/2} Z^T P_0 Z Σ^{1/2} and Σ^{1/2} Z^T Z Σ^{1/2}, respectively.

20 Distribution of LRT_n (II)
Simulation from the finite sample distribution is very simple: for K = 20, about 5,000 simulations/sec (2.66 GHz CPU). Simulations are feasible as long as we can diagonalize the matrices.
Finite sample probability mass at zero for LRT_n (q = 0):
    P( Σ_{s=1}^K µ_{s,n} w_s^2 / Σ_{s=1}^{n−p−1} w_s^2 ≤ (1/n) Σ_{s=1}^K ξ_{s,n} )
with ξ_{s,n} replaced by µ_{s,n} for RLRT_n (one eigenvalue dominates the others).
Wilks phenomenon in finite sample.
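
A minimal simulation sketch of this representation (Python/NumPy, not the authors' code; it assumes q = 0 and Σ = I_K, and the sup over λ is approximated on a fixed grid, so all names and defaults are illustrative):

    import numpy as np

    def simulate_lrt_null(X, Z, nsim=10_000, seed=1):
        """Draw from the finite-sample null distribution of LRT_n using the
        spectral representation of slide 19 (q = 0, Sigma = I_K assumed)."""
        rng = np.random.default_rng(seed)
        n, p1 = X.shape                                  # p1 = p + 1
        K = Z.shape[1]
        P0 = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
        mu = np.linalg.eigvalsh(Z.T @ P0 @ Z)            # mu_{s,n}
        xi = np.linalg.eigvalsh(Z.T @ Z)                 # xi_{s,n}
        lam = np.logspace(-8, 8, 200)[:, None]           # grid approximating the sup
        lrt = np.empty(nsim)
        for i in range(nsim):
            w2 = rng.standard_normal(n - p1) ** 2
            wK2, rest = w2[:K], w2[K:].sum()
            Nn = (lam * mu / (1 + lam * mu) * wK2).sum(axis=1)
            Dn = (wK2 / (1 + lam * mu)).sum(axis=1) + rest
            f = n * np.log1p(Nn / Dn) - np.log1p(lam * xi).sum(axis=1)
            lrt[i] = max(0.0, f.max())                   # f(0) = 0, hence the floor
        return lrt

The empirical mass at zero, np.mean(lrt == 0), can then be compared with the closed-form probability above.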

21 What kind of asymptotics?
Number of observations n → ∞; number of knots K fixed.
Conditions on Σ^{1/2} Z^T P_0 Z Σ^{1/2} and Σ^{1/2} Z^T Z Σ^{1/2}: for some α,
    n^{−α} µ_{s,n} → µ_s  and  n^{−α} ξ_{s,n} → ξ_s
In words: many observations, a mean function with at most 1 + p + K degrees of freedom, and asymptotic structure of the K × K design matrices.

22 LRT_n asymptotic distribution
    LRT_n → Σ_{s=1}^q u_s^2 + sup_{d ≥ 0} [ Σ_{s=1}^K (d µ_s / (1 + d µ_s)) w_s^2 − Σ_{s=1}^K log(1 + d ξ_s) ]
u_s, w_s are i.i.d. N(0, 1); the first part corresponds to the q fixed effects, the second part to the zero variance component, and the two are independent.
Asymptotic probability mass at zero (q = 0):
    LRT:  P( Σ_{s=1}^K µ_s w_s^2 ≤ Σ_{s=1}^K ξ_s )
    RLRT: P( Σ_{s=1}^K µ_s w_s^2 ≤ Σ_{s=1}^K µ_s )
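
Both probabilities are one-dimensional Monte Carlo calculations. A hedged sketch (Python/NumPy, illustrative names): because the common n^α scaling cancels on both sides of each inequality, the finite-sample eigenvalues mu and xi from the earlier simulation sketch can be plugged in directly.

    import numpy as np

    def asymptotic_mass_at_zero(mu, xi, nsim=200_000, seed=0):
        """Monte Carlo evaluation of the two probabilities above; mu and xi are
        the eigenvalue arrays (a common scaling of both arrays cancels)."""
        rng = np.random.default_rng(seed)
        w2 = rng.standard_normal((nsim, len(mu))) ** 2
        s = (w2 * mu).sum(axis=1)
        return np.mean(s <= np.sum(xi)), np.mean(s <= np.sum(mu))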

23 Proof details
Establish convergence in C[0, ∞) of the profile LRT (finite dimensional convergence + tightness).
Show that a Continuous Mapping Theorem type result holds.
Catch: sup is not continuous on C[0, ∞). For example,
    x_n(t) = (n^2 t + n − n^3) I{n − 1/n < t ≤ n} + n I{t > n} → 0 in C[0, ∞),
yet sup_{t ≥ 0} x_n(t) = n does not converge to 0.
Crainiceanu and Ruppert (2002)

24 One-way ANOVA: asymptotics
    LRT_n → K { X_K − 1 − log(X_K) } I{ X_K > 1 },        X_K = χ^2_{K−1} / K
    RLRT_n → (K − 1) { X̄_K − 1 − log(X̄_K) } I{ X̄_K > 1 },   X̄_K = χ^2_{K−1} / (K − 1)
When K → ∞ the two distributions converge to 0.5 χ^2_0 + 0.5 χ^2_1.
The probability at zero is always > 0.5. Is the non-zero part χ^2_1?
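
A short sketch (Python/NumPy, illustrative, not from the talk) for drawing from these two limiting distributions, e.g. to reproduce a QQ plot like the one on the next slide:

    import numpy as np

    def anova_asymptotic_draws(K, nsim=100_000, seed=0):
        """Draws from the limiting LRT and RLRT distributions for one-way ANOVA."""
        rng = np.random.default_rng(seed)
        Xk = rng.chisquare(K - 1, nsim) / K            # X_K (LRT)
        Xb = rng.chisquare(K - 1, nsim) / (K - 1)      # X-bar_K (RLRT)
        lrt = np.where(Xk > 1, K * (Xk - 1 - np.log(Xk)), 0.0)
        rlrt = np.where(Xb > 1, (K - 1) * (Xb - 1 - np.log(Xb)), 0.0)
        return lrt, rlrt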

25 QQ plot for LRT | LRT > 0, one-way ANOVA: K = 3, 5, 20 levels
[Figure: quantiles of the non-zero part of the LRT distribution plotted against χ^2_1 quantiles; the χ^2_1 0.99-quantile is Q_0.99 = 6.63]

26 (R)LRT of linearity
Linear mean (p = 1) vs. linear spline (q = 0):
    Y_i = β_0 + β_1 x_i + Σ_{k=1}^K b_k (x_i − κ_k)_+ + ɛ_i,   i = 1,...,n;  k = 1,...,K
    b ~ N(0, σ_b^2 I_K),   ɛ ~ N(0, σ_ɛ^2 I_n)
    H_0: σ_b^2 = 0  vs.  H_A: σ_b^2 > 0
Asymptotics: n → ∞, K constant.
Example: the x_i are equally spaced, κ_k is the sample quantile for k/(K + 1).
The finite sample and asymptotic LRT_n distributions are close to χ^2_0!

27 Testing linear regression vs. penalized linear spline (K = 20 knots)
[Figure: REML quantiles (q_0.66, q_0.95, q_0.99, q_0.995) of the finite sample distributions for n = 50 and n = 100 and of the 0.5:0.5 mixture, plotted against the quantiles of the asymptotic distribution (n = ∞)]

28 F and F-type tests
Hastie and Tibshirani (1990), Cantoni and Hastie (2002).
    H_0: λ = λ_0  vs.  H_A: λ = λ_1
    F_{γ_0, γ_1} = [(RSS_0 − RSS_1) / (γ_0 − γ_1)] / (RSS_1 / γ_1)
    R_{λ_0, λ_1} = Y^T (S_{λ_1} − S_{λ_0}) Y / Y^T (I_n − S_{λ_1}) Y
    γ_λ = tr{(I_n − S_λ)^2} = # d.f. of residuals,   S_λ = smoother matrix
For H_0: λ = 0 vs. H_A: λ > 0 (λ = λ̂):
The null distribution of R_{0, λ_1} has no mass at zero.
The null distribution of R_{0, λ̂} has >> 0.5 mass at zero.
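
For concreteness, a hedged sketch (Python/NumPy; my parameterization of the smoother, not necessarily the one used in the talk) of the R statistic when the smoother comes from the penalized spline criterion of slide 6 with Σ = I_K; lam may be a fixed λ_1 or a (RE)ML/GCV estimate λ̂.

    import numpy as np

    def R_statistic(y, X, Z, lam):
        """R_{0,lambda}: null (polynomial) smoother S_0 vs. penalized spline
        smoother S_lambda obtained from the criterion of slide 6 (sketch)."""
        n = len(y)
        C = np.hstack([X, Z])
        D = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(Z.shape[1])])
        S0 = X @ np.linalg.pinv(X.T @ X) @ X.T                  # lambda = 0 fit
        Slam = C @ np.linalg.inv(C.T @ C + D / lam) @ C.T       # smoother matrix
        return (y @ (Slam - S0) @ y) / (y @ (np.eye(n) - Slam) @ y)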

29 Power properties
Test constant mean vs. a general alternative, modeled by either a piecewise constant spline (RLRT) or a linear spline (LRT).
Types of alternatives considered: increasing, concave, periodic.
Crainiceanu, Ruppert, Claeskens, Wand, 2002

30 Notations
R: test from Cantoni and Hastie (2002)
F: test from Hastie and Tibshirani (1990)
C: alternative is modeled by a piecewise constant spline
L: alternative is modeled by a linear spline
1: estimate under the alternative has DF one greater than p
O: the design matrix is orthogonalized
ML: smoothing parameter is estimated using (RE)ML
GCV: smoothing parameter is estimated using GCV

31 Power comparison
Test        Average  Maximum  Minimum
RLRT-C      0.89     0.97     0.82
R-GCV-L     0.87     0.99     0.72
R-ML-C      0.86     0.99     0.70
F-ML-L      0.85     0.88     0.83
R-ML-L      0.85     0.88     0.83
F-ML-C      0.84     0.99     0.67
F-GCV-L     0.84     0.99     0.66
LRT-L       0.76     0.85     0.68
LRT-L-O     0.69     0.86     0.43
F-1-L       0.68     0.94     0.30
R-1-L       0.62     0.91     0.15
R-GCV-C     0.61     0.92     0.34
RLRT-C-O    0.61     0.93     0.32

32 Power results
There is no most powerful test for the three alternatives considered.
RLRT-C has good power compared to competing tests.
LRT-L is worse (probability mass at zero).
Other good tests exist, but their null distributions have to be simulated (5,000 simulations, n = 100, K = 20): R-GCV-L takes 30 min vs. 1 sec for RLRT-C.
Our approach: the R and F tests include the variability in λ̂.

33 Mother age/child birthweight - no effect
            K = 10             K = 20
Test        value   p-value    value   p-value
RLRT-C      0       0.35       0.04    0.29
F-ML-C      0.90    0.35       1.30    0.26
F-GCV-C     1.58    0.21       1.58    0.19
LRT-L       2.46    0.12       2.43    0.11
F-ML-L      2.50    0.13       2.50    0.12
F-GCV-L     4.12    0.05       4.59    0.03

34 Janka hardness - linearity
            K = 5                 K = 10
Test        value   p-value       value   p-value
RLRT-L      16.90   1 × 10^-5     17.27   1 × 10^-5
F-ML        11.59   5 × 10^-6     12.13   9 × 10^-6

35 Null: more than one covariate
    y_i = m_1(x_{1i}) + m_2(x_{2i}) + ɛ_i
Both m_1(·) and m_2(·) can be modeled as splines. The model is
    Y = X_1 β_1 + X_2 β_2 + Z_1 b_1 + Z_2 b_2 + ɛ
    b_1 ~ N(0, σ_1^2),  b_2 ~ N(0, σ_2^2),  ɛ ~ N(0, σ_ɛ^2),  independent
Tests of σ_1^2 = 0 and/or σ_2^2 = 0 are tests against a nonparametric H_A.
Spectral decomposition of (R)LRT_n for LMMs with more than one variance component (Crainiceanu, Ruppert, Claeskens, Wand, 2002).
Warning: theoretical result, practical limitations.

36 Null: nonlinear regression
Linearity in parameters is the only special property of polynomials.
A parametric regression function can be embedded in a larger space (e.g. using p-splines):
    Y = f(X, β_1) + X β_2 + Z b + ɛ
    b ~ N(0, σ_b^2 Σ),  ɛ ~ N(0, σ_ɛ^2 I_n),  independent
    H_0: β_2 = 0, σ_b^2 = 0  vs.  H_A: β_2 ≠ 0 or σ_b^2 > 0

37 LRT for non-linear regression
    H_0: Y_i = a + b exp(c x_i) + ɛ_i
    H_A: Y_i = a + b exp(c x_i) + β x_i + Σ_{k=1}^K b_k (x_i − κ_k)_+ + ɛ_i
n = 100, x_i equally spaced in [0, 2], K = 20 equally spaced knots.
5,000 simulations / 20 minutes.
Distribution of LRT_n for the 2 parameters (β = 0, σ_b^2 = 0): χ^2_1.
WARNING: NO RLRT!

38 Model linearization
Let β̂_1 be the MLE under the null Y = f(X, β_1) + ɛ.
Approximate f(X, β_1) by a first order Taylor expansion:
    f(X, β_1) ≈ f(X, β̂_1) + {∂f(X, β̂_1)/∂β_1}^T (β_1 − β̂_1)
Test for the null of linear regression against a general alternative.
This problem is solved! RLRT can be used!
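
A brief sketch (Python/NumPy, hypothetical helper, not the authors' code) of the linearization step for the exponential model of slide 37: the Jacobian columns evaluated at the null MLE play the role of the fixed-effects design X of the working linear model, which is then augmented with the spline basis Z and tested with the RLRT for σ_b^2 = 0.

    import numpy as np

    def linearized_design(x, beta_hat):
        """Jacobian of f(x, beta) = a + b*exp(c*x) at the null MLE beta_hat = (a, b, c)."""
        a, b, c = beta_hat
        x = np.asarray(x, dtype=float)
        return np.column_stack([np.ones_like(x),         # df/da
                                np.exp(c * x),           # df/db
                                b * x * np.exp(c * x)])  # df/dc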

39 Goodness-of-fit
Testing a null (constant mean, linear mean, etc.) against a general alternative is testing goodness-of-fit.
Classical goodness-of-fit tests use standardized residuals: if the null model is correct, the standardized residuals under the null should behave like white noise.
We can therefore apply the (R)LRT for no effect to the estimated residuals, r_i = µ + e_i.
Empirical approach: Ruppert, Wand, Carroll, 2002

40 (R)LRT bootstrap
(R)LRT distribution for LMMs with more than one variance component (feasible for L = 2, 3 components).
Exact linear, non-linear, GLM null distributions.
Low order smoothers make simulations tractable.
Hard to implement in standard software.
Recommended when unsure about theoretical results.
This was our initial approach to inference (that's how we got p_0 >> 0.5).

41 Conclusions
Testing a null that includes a zero random effects variance in LMMs.
Unified testing theory (penalized likelihood models).
(R)LRT null finite sample and asymptotic distributions, plus power.
Standard asymptotics for i.i.d. data does not apply.
Feasible extensions: additive, nonlinear models, GLMs.
P-splines are a powerful and flexible tool for (R)LRT testing of parametric versus nonparametric models.