Likelihood Ratio Tests that Certain Variance Components Are Zero
Ciprian M. Crainiceanu, Department of Statistical Science

1. Likelihood Ratio Tests that Certain Variance Components Are Zero
Ciprian M. Crainiceanu, Department of Statistical Science
Work done jointly with David Ruppert, School of ORIE, Cornell University

2. Outline
A. Examples of problems (Ruppert, Wand and Carroll, 2003)
B. Penalized splines as a particular case of LMMs
C. Null hypotheses including a zero random-effects variance in LMMs
D. LRTs and F-type tests for this type of hypothesis
E. Null distributions and power properties of the LRT and RLRT
F. Other applications: additive models, nonlinear models, goodness-of-fit
G. Conclusions

3. Mother age vs. child birthweight data
[Figure: birthweight (grams) vs. maternal age (years), with ML and GCV fits]

4. Janka hardness data
[Figure: log(Janka hardness) vs. density, with a linear fit and a p-spline fit]

5. Nonparametric framework
Consider the regression equation $y_i = m(x_i) + \epsilon_i$, where the $\epsilon_i$ are i.i.d. $N(0, \sigma_\epsilon^2)$.
Null hypothesis: $m(\cdot)$ is a polynomial of degree $p - q$, $m(x, \beta) = \beta_0 + \beta_1 x + \dots + \beta_{p-q} x^{p-q}$.
Alternative hypothesis: $m(\cdot)$ is a regression spline, $m(x, \theta) = \beta_0 + \beta_1 x + \dots + \beta_p x^p + \sum_{k=1}^{K} b_k (x - \kappa_k)_+^p$.
Idea: use likelihood ratio tests.
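
The split of the spline into a polynomial part (fixed effects) and truncated power functions (random effects) can be sketched in Python as below. This is a minimal illustration, not the authors' code: it assumes the truncated power basis $(x - \kappa_k)_+^p$, uses $I\{x > \kappa_k\}$ for $p = 0$, and takes the knots as given; the name `truncated_power_basis` is hypothetical.

```python
def truncated_power_basis(x, knots, p):
    """Return (X, Z) for a degree-p spline with the given knots.

    X holds the polynomial columns 1, x, ..., x^p (fixed effects);
    Z holds the truncated power columns (x - kappa_k)_+^p (random effects),
    with the convention I{x > kappa_k} when p = 0.
    """
    X = [[xi ** j for j in range(p + 1)] for xi in x]
    if p == 0:
        Z = [[1.0 if xi > k else 0.0 for k in knots] for xi in x]
    else:
        Z = [[max(xi - k, 0.0) ** p for k in knots] for xi in x]
    return X, Z
```

The columns of `X` carry $\beta_0, \dots, \beta_p$ and the columns of `Z` carry $b_1, \dots, b_K$; testing the polynomial null then means testing $\beta_{p-q+1} = \dots = \beta_p = 0$ together with $\sigma_b^2 = 0$.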

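6. P-splines as LMMs
Penalized sum of squares estimation criterion (to avoid overfitting):
$\sum_{i=1}^{n} \{y_i - m(x_i; \theta)\}^2 + \lambda\, \theta^T D \theta$
The penalized spline criterion is equivalent to
$\frac{1}{\sigma_\epsilon^2} \|Y - X\beta - Zb\|^2 + \frac{1}{\lambda \sigma_\epsilon^2}\, b^T \Sigma^{-1} b$.
If $\sigma_b^2 = \lambda \sigma_\epsilon^2$, then minimizing this criterion is equivalent to ML for the LMM
$Y = X\beta + Zb + \epsilon$, $\mathrm{Cov}\begin{pmatrix} b \\ \epsilon \end{pmatrix} = \begin{pmatrix} \sigma_b^2 \Sigma & O_{K \times n} \\ O_{n \times K} & \sigma_\epsilon^2 I_n \end{pmatrix}$.
(With $\lambda = \sigma_b^2 / \sigma_\epsilon^2$ in the LMM, the penalty weight in the first criterion plays the role of $1/\lambda$.)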

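7. Smoothing parameter estimation
ML and REML criteria (minus twice the log-likelihood and restricted log-likelihood, up to additive constants):
$\mathrm{ML}(\Theta) = n \log(\sigma_\epsilon^2) + \log\{\det(V_\lambda)\} + \frac{(Y - X\beta)^T V_\lambda^{-1} (Y - X\beta)}{\sigma_\epsilon^2}$
$\mathrm{REML}(\Theta) = \mathrm{ML}(\Theta) - (p + 1)\log(\sigma_\epsilon^2) + \log\{\det(X^T V_\lambda^{-1} X)\}$,
where $\Theta = (\beta, \sigma_\epsilon^2, \lambda)$, $\mathrm{Cov}(Y) = \sigma_\epsilon^2 V_\lambda$ and $V_\lambda = I_n + \lambda Z \Sigma Z^T$.
Generalized cross-validation (GCV) can also be used to estimate $\lambda$, but it is not suited to LRT testing.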
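
To make the two criteria concrete, here is a small pure-Python sketch that evaluates $\mathrm{ML}(\Theta)$ and $\mathrm{REML}(\Theta)$ at a given $\Theta$. It assumes $\Sigma = I_K$ and uses a hand-written Cholesky factorization instead of a linear-algebra library; the helper names are hypothetical, and an actual analysis would minimize these criteria over $\Theta$ rather than evaluate them at a single point.

```python
import math

def cholesky(a):
    """Return lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def ml_reml(y, X, Z, beta, sigma2, lam):
    """ML(Theta) and REML(Theta), assuming Sigma = I_K so V = I_n + lam * Z Z^T."""
    n, K, p1 = len(y), len(Z[0]), len(X[0])   # p1 = p + 1 fixed effects
    V = [[(1.0 if i == j else 0.0) + lam * sum(Z[i][k] * Z[j][k] for k in range(K))
          for j in range(n)] for i in range(n)]
    L = cholesky(V)
    logdet_V = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    r = [y[i] - sum(X[i][j] * beta[j] for j in range(p1)) for i in range(n)]
    quad = sum(ri * si for ri, si in zip(r, chol_solve(L, r)))
    ml = n * math.log(sigma2) + logdet_V + quad / sigma2
    # X^T V^{-1} X, built from V^{-1} applied to each column of X
    VinvX = [chol_solve(L, [X[i][j] for i in range(n)]) for j in range(p1)]
    XtVX = [[sum(X[i][a] * VinvX[b][i] for i in range(n)) for b in range(p1)]
            for a in range(p1)]
    LX = cholesky(XtVX)
    logdet_XtVX = 2.0 * sum(math.log(LX[a][a]) for a in range(p1))
    reml = ml - p1 * math.log(sigma2) + logdet_XtVX
    return ml, reml
```

At $\lambda = 0$ the matrix $V_\lambda$ is the identity, so `ml` reduces to $n \log \sigma_\epsilon^2 + \|Y - X\beta\|^2 / \sigma_\epsilon^2$, which is a quick sanity check.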

8. Hypotheses
$H_0: \beta_{p-q+1} = \dots = \beta_p = 0$ and $\lambda = 0$ ($\sigma_b^2 = 0$)
$H_A: \beta_{p-q+1} \neq 0$ or $\dots$ or $\beta_p \neq 0$ or $\lambda > 0$ ($\sigma_b^2 > 0$)
Why is the problem hard?
Nonstandard hypothesis: the null value lies on the boundary of the parameter space (boundary problem).
Correlated observations (at least under the alternative).
Technicalities (e.g., the number of knots is fixed, not o(n)!).

9. Likelihood Ratio Tests
Likelihood and restricted likelihood ratio tests:
$\mathrm{LRT}_n = \inf_{H_0} \mathrm{ML}(\Theta) - \inf_{H_A} \mathrm{ML}(\Theta)$
$\mathrm{RLRT}_n = \inf_{H_0} \mathrm{REML}(\Theta) - \inf_{H_A} \mathrm{REML}(\Theta)$
Note: the RLRT makes sense only when the fixed effects are the same under $H_0$ and $H_A$.
Generality: the same considerations hold for any LMM with one variance component!

10. Asymptotic theory: trivialities
An asymptotic distribution is useful if and only if
it provides a good approximation of the finite-sample distribution(s),
it is much simpler than the finite-sample distribution(s), and
asymptotic reasoning makes sense for the problem.
Key fact: asymptotics is just an approximation of the finite-sample distribution.

11. Boundary problem asymptotics
Test for one parameter on the boundary ($q = 0$).
For independent observations under the alternative (Chernoff, 1954; Moran, 1971; Chant, 1974; Self and Liang, 1987, 1995):
$\mathrm{LRT}_n \Rightarrow 0.5\chi^2_0 + 0.5\chi^2_1$
Longitudinal mixed-effects model $Y_i = X_i \beta + Z_i b_i + \epsilon_i$: the same is true if $K \to \infty$ (Stram and Lee, 1994).
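
The $0.5 : 0.5$ mixture puts half its mass at zero, so for a positive statistic its p-value is half the $\chi^2_1$ tail probability. A minimal sketch follows; it allows a general weight $p_0$ on the point mass (so the same code covers mixtures such as $0.65\chi^2_0 + 0.35\chi^2_1$), and the function names are illustrative only.

```python
import math

def chi2_1_sf(x):
    """P(chi^2_1 > x) = erfc(sqrt(x / 2)) for x >= 0."""
    return math.erfc(math.sqrt(x / 2.0))

def mixture_pvalue(stat, p0=0.5):
    """p-value under p0 * chi^2_0 (point mass at 0) + (1 - p0) * chi^2_1."""
    if stat <= 0:
        return 1.0
    return (1.0 - p0) * chi2_1_sf(stat)

def mixture_critical_value(alpha, p0=0.5, lo=0.0, hi=100.0):
    """c with P(stat > c) = alpha, found by bisection (needs alpha < 1 - p0)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mixture_pvalue(mid, p0) > alpha:
            lo = mid   # tail probability still too large: move right
        else:
            hi = mid
    return hi
```

With $p_0 = 0.5$ and $\alpha = 0.05$ the critical value is the 0.90 quantile of $\chi^2_1$, about 2.71, rather than the usual 3.84.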

12. Pinheiro and Bates (2000), simulations: $\mathrm{RLRT}_n \approx 0.5\chi^2_0 + 0.5\chi^2_1$, $\mathrm{LRT}_n \approx 0.65\chi^2_0 + 0.35\chi^2_1$.
Shephard and Harvey (1990) found p … in a related model.

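13. Probability mass at zero
Test for zero random-effects variance in an LMM with one variance component (Crainiceanu, Ruppert and Vogelsang, 2002).
First-order conditions for a local minimum at $\lambda = 0$:
$\partial\, \mathrm{ML}(\Theta) / \partial \beta = 0$, $\partial\, \mathrm{ML}(\Theta) / \partial \sigma_\epsilon^2 = 0$, $\partial\, \mathrm{ML}(\Theta) / \partial \lambda \geq 0$.
Null finite-sample $\mathrm{LRT}_n$ mass at zero:
$P\left( \frac{u^T P_0 Z \Sigma Z^T P_0 u}{u^T P_0 u} \leq \frac{1}{n} \mathrm{tr}(Z \Sigma Z^T) \right)$,
where $u \sim N(0, I_n)$ and $P_0 = I_n - X (X^T X)^{-1} X^T$.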
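
The finite-sample mass at zero can be approximated by Monte Carlo directly from this formula. The sketch below assumes the simplest null, a constant mean ($X = 1_n$, so $P_0 u = u - \bar{u}$), and $\Sigma = I_K$; `Z` is any $n \times K$ design given as a list of rows, and the function name is hypothetical.

```python
import random

def mass_at_zero_constant_mean(Z, n_sim=2000, seed=0):
    """Monte Carlo estimate of P(u'P0 Z Z' P0 u / u'P0 u <= tr(Z Z') / n),
    assuming X = 1_n (P0 u = u - mean(u)) and Sigma = I_K."""
    n, K = len(Z), len(Z[0])
    # Right-hand side: tr(Z Z^T) / n = (sum of squared entries of Z) / n
    rhs = sum(z * z for row in Z for z in row) / n
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ubar = sum(u) / n
        pu = [ui - ubar for ui in u]                                       # P0 u
        zpu = [sum(Z[i][k] * pu[i] for i in range(n)) for k in range(K)]  # Z^T P0 u
        ratio = sum(v * v for v in zpu) / sum(v * v for v in pu)
        hits += ratio <= rhs
    return hits / n_sim
```

For a one-way ANOVA design (each row of `Z` an indicator of its group, equal group sizes) the right-hand side equals 1, and the estimate can be compared with the ANOVA mass-at-zero results that follow.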

14. One-way ANOVA: best-case scenario
Model: $Y_{ik} = \beta_0 + b_k + \epsilon_{ik}$, $i = 1, \dots, I$; $k = 1, \dots, K$; $b \sim N(0, \sigma_b^2 I_K)$; $\epsilon \sim N(0, \sigma_\epsilon^2 I_n)$.
$H_0: \sigma_b^2 = 0$ vs. $H_A: \sigma_b^2 > 0$
Asymptotic ($I \to \infty$, $K$ constant) probability mass at zero:
$p_{ML}(K) = P(\chi^2_{K-1} < K)$, $p_{REML}(K) = P(\chi^2_{K-1} < K - 1)$
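
Both masses are plain chi-square probabilities, so they can be computed exactly. A stdlib-only sketch, assuming the power-series expansion of the regularized lower incomplete gamma function for the chi-square CDF (function names are illustrative):

```python
import math

def chi2_cdf(x, df):
    """P(chi^2_df <= x) via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a           # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def anova_mass_at_zero(K):
    """Asymptotic (I -> infinity, K fixed) mass at zero of the LRT and RLRT."""
    p_ml = chi2_cdf(K, K - 1)
    p_reml = chi2_cdf(K - 1, K - 1)
    return p_ml, p_reml
```

For $K = 2$ this gives $p_{ML} = P(\chi^2_1 < 2) \approx 0.84$ and $p_{REML} = P(\chi^2_1 < 1) \approx 0.68$, both well above 0.5.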

15. ANOVA: asymptotic mass at zero
[Figure: asymptotic probability mass at zero of the LRT and RLRT vs. number of levels (K)]

16. Nonparametric testing example
Constant mean ($p = 0$) vs. piecewise-constant spline:
$Y_i = \beta_0 + \sum_{k=1}^{K} b_k I\{x_i > \kappa_k\} + \epsilon_i$, $i = 1, \dots, n$; $k = 1, \dots, K$; $b \sim N(0, \sigma_b^2 I_K)$; $\epsilon \sim N(0, \sigma_\epsilon^2 I_n)$.
$H_0: \sigma_b^2 = 0$ vs. $H_A: \sigma_b^2 > 0$
Asymptotics: $n \to \infty$, $K$ constant.
Example: the $x_i$ are equally spaced and $\kappa_k$ is the sample quantile corresponding to $k/(K + 1)$.
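
The knot placement and the resulting design can be sketched with the standard library. The slide does not say which sample-quantile definition is used, so this sketch assumes the default ("exclusive") method of Python's `statistics.quantiles`, which returns exactly the $K$ cut points at probabilities $k/(K+1)$; the helper names are hypothetical.

```python
import statistics

def quantile_knots(x, K):
    """K knots at the sample quantiles k/(K+1), k = 1..K ('exclusive' method)."""
    return statistics.quantiles(x, n=K + 1)

def piecewise_constant_design(x, knots):
    """Rows [1, I{x_i > kappa_1}, ..., I{x_i > kappa_K}]: intercept plus Z."""
    return [[1.0] + [1.0 if xi > k else 0.0 for k in knots] for xi in x]
```

For example, `quantile_knots(list(range(1, 10)), 3)` places the knots at 2.5, 5.0 and 7.5.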

17. Asymptotic mass at zero
[Figure: asymptotic probability mass at zero for ML and REML vs. number of knots]

18. Cracking the nut
The 0.5 : 0.5 chi-square mixture approximation is useless in this case.
Calculating the probability mass at zero (finite-sample and asymptotic) is simple.
The 0.5 : 0.5 approximation can be improved by a $p_n : 1 - p_n$ approximation.
But maybe the nonzero part of the distribution is not $\chi^2_1$.
What happens when $q > 0$?
The number of knots (levels) is fixed, not o(n)!

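19. Distribution of $\mathrm{LRT}_n$ (I)
$\mathrm{LRT}_n \overset{\mathcal{D}}{=} n \log\left(1 + \frac{\sum_{s=1}^{q} u_s^2}{\sum_{s=1}^{n-p-1} w_s^2}\right) + \sup_{\lambda \geq 0} h(\lambda, w, \mu, \xi)$,
$h(\lambda, w, \mu, \xi) = n \log\left\{1 + \frac{N_n(\lambda, w, \mu)}{D_n(\lambda, w, \mu)}\right\} - \sum_{s=1}^{K} \log(1 + \lambda \xi_{s,n})$,
$N_n(\lambda, w, \mu) = \sum_{s=1}^{K} \frac{\lambda \mu_{s,n}}{1 + \lambda \mu_{s,n}} w_s^2$, $D_n(\lambda, w, \mu) = \sum_{s=1}^{K} \frac{w_s^2}{1 + \lambda \mu_{s,n}} + \sum_{s=K+1}^{n-p-1} w_s^2$,
where $u_s$, $w_s$ are i.i.d. $N(0, 1)$, and $\mu_{s,n}$, $\xi_{s,n}$ are the $K$ eigenvalues of $\Sigma^{1/2} Z^T P_0 Z \Sigma^{1/2}$ and $\Sigma^{1/2} Z^T Z \Sigma^{1/2}$, respectively.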

20. Distribution of $\mathrm{LRT}_n$ (II)
Simulating the finite-sample distribution is very simple: for $K = 20$, about 5,000 simulations per second (2.66 GHz CPU).
Simulations are feasible as long as we can diagonalize the matrices.
Finite-sample probability mass at zero for $\mathrm{LRT}_n$ ($q = 0$):
$P\left( \frac{\sum_{s=1}^{K} \mu_{s,n} w_s^2}{\sum_{s=1}^{n-p-1} w_s^2} \leq \frac{1}{n} \sum_{s=1}^{K} \xi_{s,n} \right)$
For $\mathrm{RLRT}_n$, $\xi_{s,n}$ is replaced by $\mu_{s,n}$ (one eigenvalue dominates the others).
Wilks phenomenon in finite samples.
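
A minimal sketch of such a simulation, working from the spectral representation of $\mathrm{LRT}_n$ (slide 19) with the eigenvalues $\mu_{s,n}$, $\xi_{s,n}$ supplied by the caller. Two choices here are assumptions of this sketch, not of the talk: the supremum over $\lambda \geq 0$ is approximated on a fixed log-spaced grid (plus $\lambda = 0$), and the $n - p - 1 - K$ squared normals with $s > K$ are drawn at once as a single $\chi^2$ variable. The function name is hypothetical.

```python
import math
import random

def simulate_lrt(mu, xi, n, p, q=0, n_sim=1000, seed=0):
    """Draws from the spectral representation of LRT_n.

    mu, xi: the K eigenvalues mu_{s,n} and xi_{s,n} (computed beforehand).
    n, p, q: sample size, polynomial degree, number of fixed effects tested.
    """
    K = len(mu)
    # Grid for the sup over lambda >= 0: 0 plus 200 log-spaced points in [1e-4, 1e4].
    grid = [0.0] + [10.0 ** (-4 + 8 * j / 199) for j in range(200)]
    rng = random.Random(seed)
    df_tail = n - p - 1 - K                  # number of w_s with s > K
    draws = []
    for _ in range(n_sim):
        w2 = [rng.gauss(0.0, 1.0) ** 2 for _ in range(K)]
        # Sum of the remaining squared normals: chi-square(df_tail) = Gamma(df_tail/2, 2).
        tail = rng.gammavariate(df_tail / 2.0, 2.0) if df_tail > 0 else 0.0
        u2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(q))
        fixed_part = n * math.log(1.0 + u2 / (sum(w2) + tail))
        best = 0.0                           # h(0) = 0
        for lam in grid:
            num = sum(lam * m / (1.0 + lam * m) * w for m, w in zip(mu, w2))
            den = sum(w / (1.0 + lam * m) for m, w in zip(mu, w2)) + tail
            h = n * math.log(1.0 + num / den) - sum(math.log(1.0 + lam * x) for x in xi)
            best = max(best, h)
        draws.append(fixed_part + best)
    return draws
```

For example, in one-way ANOVA with $K$ groups of $I$ observations and $\Sigma = I_K$, $\mu_{s,n}$ equals $I$ ($K - 1$ times) and 0 (once), and every $\xi_{s,n}$ equals $I$; with $q = 0$ the fraction of draws equal to 0.0 estimates the mass at zero.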

21. What kind of asymptotics?
Number of observations $n \to \infty$; number of knots $K$ fixed.
Conditions on $\Sigma^{1/2} Z^T P_0 Z \Sigma^{1/2}$ and $\Sigma^{1/2} Z^T Z \Sigma^{1/2}$: $n^{-\alpha} \mu_{s,n} \to \mu_s$ and $n^{-\alpha} \xi_{s,n} \to \xi_s$ for some $\alpha > 0$.
Many observations, a mean function with at most $1 + p + K$ degrees of freedom, and an asymptotic structure for the $K \times K$ design matrices.

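22. $\mathrm{LRT}_n$ asymptotic distribution
$\mathrm{LRT}_n \Rightarrow \sum_{s=1}^{q} u_s^2 + \sup_{d \geq 0} \left\{ \sum_{s=1}^{K} \frac{d \mu_s}{1 + d \mu_s} w_s^2 - \sum_{s=1}^{K} \log(1 + d \xi_s) \right\}$,
where $u_s$, $w_s$ are i.i.d. $N(0, 1)$; the first part corresponds to the $q$ fixed effects, the second part to the zero variance, and the two parts are independent.
Asymptotic probability mass at zero ($q = 0$):
LRT: $P\left( \sum_{s=1}^{K} \mu_s w_s^2 \leq \sum_{s=1}^{K} \xi_s \right)$; RLRT: $P\left( \sum_{s=1}^{K} \mu_s w_s^2 \leq \sum_{s=1}^{K} \mu_s \right)$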
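
Both asymptotic masses at zero compare one weighted sum of $\chi^2_1$ variables with two different thresholds, so a single Monte Carlo loop estimates both. A minimal sketch with caller-supplied limiting eigenvalues $\mu_s$, $\xi_s$ (the function name is hypothetical):

```python
import random

def asymptotic_mass_at_zero(mu, xi, n_sim=20000, seed=0):
    """Monte Carlo estimates of the asymptotic (q = 0) mass at zero:
    LRT:  P(sum mu_s w_s^2 <= sum xi_s);  RLRT: P(sum mu_s w_s^2 <= sum mu_s)."""
    rng = random.Random(seed)
    t_lrt, t_rlrt = sum(xi), sum(mu)
    hits_lrt = hits_rlrt = 0
    for _ in range(n_sim):
        stat = sum(m * rng.gauss(0.0, 1.0) ** 2 for m in mu)
        hits_lrt += stat <= t_lrt
        hits_rlrt += stat <= t_rlrt
    return hits_lrt / n_sim, hits_rlrt / n_sim
```

As a check, the one-way ANOVA limit (after a common rescaling, $K - 1$ values $\mu_s = 1$, one $\mu_s = 0$, and all $\xi_s = 1$) should reproduce $P(\chi^2_{K-1} \leq K)$ and $P(\chi^2_{K-1} \leq K - 1)$.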

23. Proof details
Establish convergence in $C[0, \infty)$ of the profile LRT (finite-dimensional convergence + tightness).
Show that a continuous-mapping-theorem-type result holds.
Catch: sup is not continuous on $C[0, \infty)$. For example, $x_n(t) = (n^2 t + n - n^3) I\{n - 1/n < t \leq n\} + n I\{t > n\} \to 0$, but $\sup_{t \geq 0} x_n(t) = n \to \infty$ (Crainiceanu and Ruppert, 2002).

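24. One-way ANOVA: asymptotics
$\mathrm{LRT}_n \Rightarrow K \{X_K - 1 - \log(X_K)\} I\{X_K > 1\}$, with $X_K \sim \chi^2_{K-1} / K$
$\mathrm{RLRT}_n \Rightarrow (K - 1) \{X_K - 1 - \log(X_K)\} I\{X_K > 1\}$, with $X_K \sim \chi^2_{K-1} / (K - 1)$
When $K \to \infty$, both distributions converge to $0.5\chi^2_0 + 0.5\chi^2_1$.
The probability at zero is always > 0.5. Is the nonzero part $\chi^2_1$?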
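
These limits involve a single $\chi^2_{K-1}$ variable, so their quantiles are easy to simulate and compare with the $0.5 : 0.5$ mixture. A minimal sketch; the function names and the simple empirical-quantile rule are illustrative assumptions:

```python
import math
import random

def anova_limit_draws(K, restricted=False, n_sim=20000, seed=0):
    """Draws from the one-way ANOVA limit of LRT_n (or RLRT_n if restricted)."""
    rng = random.Random(seed)
    scale = K - 1 if restricted else K
    draws = []
    for _ in range(n_sim):
        x = rng.gammavariate((K - 1) / 2.0, 2.0) / scale   # X_K = chi^2_{K-1} / scale
        draws.append(scale * (x - 1.0 - math.log(x)) if x > 1.0 else 0.0)
    return draws

def empirical_quantile(draws, prob):
    """Simple empirical quantile: the order statistic at position floor(prob * N)."""
    s = sorted(draws)
    return s[min(len(s) - 1, int(prob * len(s)))]
```

Comparing `empirical_quantile(anova_limit_draws(K), 0.95)` with 2.71, the 0.95 quantile of the $0.5\chi^2_0 + 0.5\chi^2_1$ mixture, shows how far the mixture approximation is from the limit for a given $K$.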

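25. QQ plot for LRT | LRT > 0
[Figure: QQ plot of the positive part of the LRT against $\chi^2_1$, one-way ANOVA with K = 3, 5, 20 levels; the $\chi^2_1$ 0.99 quantile, 6.63, is marked]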

26. (R)LRT of linearity
Linear mean ($p = 1$) vs. linear spline ($q = 0$):
$Y_i = \beta_0 + \beta_1 x_i + \sum_{k=1}^{K} b_k (x_i - \kappa_k)_+ + \epsilon_i$, $i = 1, \dots, n$; $k = 1, \dots, K$; $b \sim N(0, \sigma_b^2 I_K)$; $\epsilon \sim N(0, \sigma_\epsilon^2 I_n)$.
$H_0: \sigma_b^2 = 0$ vs. $H_A: \sigma_b^2 > 0$
Asymptotics: $n \to \infty$, $K$ constant.
Example: the $x_i$ are equally spaced and $\kappa_k$ is the sample quantile for $k/(K + 1)$.
Finite-sample and asymptotic $\mathrm{LRT}_n$ distributions: $\chi^2_0$!

27. Testing linear regression vs. penalized linear spline (K = 20 knots)
[Table: quantiles $q_{0.66}$, $q_{0.95}$, $q_{0.99}$ of the REML (RLRT) distribution for n = 50, n = 100 and n = ∞ (asymptotic distribution), and of the 0.5:0.5 mixture]

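28. F and F-type tests
Hastie and Tibshirani (1990), Cantoni and Hastie (2002)
$H_0: \lambda = \lambda_0$ vs. $H_A: \lambda = \lambda_1$
$F_{\gamma_0, \gamma_1} = \frac{(RSS_0 - RSS_1) / (\gamma_0 - \gamma_1)}{RSS_1 / \gamma_1}$, $R_{\lambda_0, \lambda_1} = \frac{Y^T (S_{\lambda_1} - S_{\lambda_0}) Y}{Y^T (I_n - S_{\lambda_1}) Y}$,
$\gamma_\lambda = \mathrm{tr}\{(I_n - S_\lambda)^2\}$ = number of residual degrees of freedom, $S_\lambda$ = smoother matrix.
$H_0: \lambda = 0$ vs. $H_A: \lambda > 0$ ($\lambda = \hat{\lambda}$).
The null distribution of $R_{0, \lambda_1}$ has no mass at zero.
The null distribution of $R_{0, \hat{\lambda}}$ has ≫ 0.5 mass at zero.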
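
The two statistics are simple functions of the smoother matrices. A stdlib-only sketch, assuming symmetric smoothers supplied as lists of rows and $RSS_\lambda = \|(I_n - S_\lambda) Y\|^2$; the helper smoothers `mean_smoother` and `linear_smoother` (projections onto a constant and onto a straight line) are hypothetical examples used only to exercise the formulas.

```python
def identity_minus(S):
    n = len(S)
    return [[(1.0 if i == j else 0.0) - S[i][j] for j in range(n)] for i in range(n)]

def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def residual_df(S):
    """gamma_lambda = tr{(I_n - S)^2}."""
    M = identity_minus(S)
    n = len(M)
    return sum(M[i][k] * M[k][i] for i in range(n) for k in range(n))

def f_and_r(y, S0, S1):
    """F_{gamma0, gamma1} and R_{lambda0, lambda1} for smoothers S0 (null) and S1."""
    r0 = mat_vec(identity_minus(S0), y)          # residuals under lambda_0
    r1 = mat_vec(identity_minus(S1), y)          # residuals under lambda_1
    rss0, rss1 = sum(v * v for v in r0), sum(v * v for v in r1)
    g0, g1 = residual_df(S0), residual_df(S1)
    F = ((rss0 - rss1) / (g0 - g1)) / (rss1 / g1)
    diff = [[a - b for a, b in zip(row1, row0)] for row1, row0 in zip(S1, S0)]
    num = sum(yi * v for yi, v in zip(y, mat_vec(diff, y)))   # Y^T (S1 - S0) Y
    den = sum(yi * v for yi, v in zip(y, r1))                 # Y^T (I - S1) Y
    return F, num / den

def mean_smoother(n):
    return [[1.0 / n] * n for _ in range(n)]

def linear_smoother(x):
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [[1.0 / n + (xi - xbar) * (xj - xbar) / sxx for xj in x] for xi in x]
```

For projection smoothers $\gamma_\lambda$ is just $n$ minus the rank, so `residual_df(mean_smoother(n))` is $n - 1$; for genuine penalized smoothers $\gamma_\lambda$ is not an integer.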

29. Power properties
Test a constant mean vs. a general alternative modeled by a piecewise-constant spline (RLRT) or by a linear spline (LRT).
Types of alternatives considered: increasing, concave, periodic (Crainiceanu, Ruppert, Claeskens and Wand, 2002).

30. Notation
R: test from Cantoni and Hastie (2002)
F: test from Hastie and Tibshirani (1990)
C: the alternative is modeled by a piecewise-constant spline
L: the alternative is modeled by a linear spline
1: the estimate under the alternative has one more degree of freedom than p
O: the design matrix is orthogonalized
ML: the smoothing parameter is estimated using (RE)ML
GCV: the smoothing parameter is estimated using GCV

31. [Table: average, maximum and minimum power of the tests RLRT-C, R-GCV-L, R-ML-C, F-ML-L, R-ML-L, F-ML-C, F-GCV-L, LRT-L, LRT-L-O, F-1-L, R-1-L, R-GCV-C, RLRT-C-O]

32. Power results
No test is most powerful for all three alternatives considered.
RLRT-C has good power compared with competing tests.
LRT-L is worse (because of its probability mass at zero).
Other good tests exist, but their null distributions have to be simulated (5,000 simulations, n = 100, K = 20): 30 min for R-GCV-L vs. 1 sec for RLRT-C.
In our approach, the R and F tests include the variability in $\hat{\lambda}$.

33. Mother age/child birthweight: no effect
[Table: test statistic values and p-values for K = 10 and K = 20 knots, for RLRT-C, F-ML-C, F-GCV-C, LRT-L, F-ML-L, F-GCV-L]

34. Janka hardness: linearity
[Table: test statistic values and p-values for K = 5 and K = 10 knots, for RLRT-L and F-ML]

35. Null: more than one covariate
$y_i = m_1(x_{1i}) + m_2(x_{2i}) + \epsilon_i$
Both $m_1(\cdot)$ and $m_2(\cdot)$ can be modeled as splines. The model is
$Y = X_1 \beta_1 + X_2 \beta_2 + Z_1 b_1 + Z_2 b_2 + \epsilon$, $b_1 \sim N(0, \sigma_1^2)$; $b_2 \sim N(0, \sigma_2^2)$; $\epsilon \sim N(0, \sigma_\epsilon^2)$; independent.
Tests: $\sigma_1^2 = 0$ and/or $\sigma_2^2 = 0$, or tests vs. a nonparametric $H_A$.
Spectral decomposition of $\mathrm{(R)LRT}_n$ for LMMs with more than one variance component (Crainiceanu, Ruppert, Claeskens and Wand, 2002).
Warning: a theoretical result, with practical limitations.

36. Null: nonlinear regression
Linearity in the parameters is the only special property of polynomials.
A parametric regression function can be embedded in a larger space (e.g., using p-splines):
$Y = f(x, \beta_1) + X \beta_2 + Zb + \epsilon$, $b \sim N(0, \sigma_b^2 \Sigma)$, $\epsilon \sim N(0, \sigma_\epsilon^2 I_n)$, independent.
$H_0: \beta_2 = 0, \sigma_b^2 = 0$ vs. $H_A: \beta_2 \neq 0$ or $\sigma_b^2 > 0$

37. LRT for nonlinear regression
$H_0: Y_i = a + b \exp(c x_i) + \epsilon_i$
$H_A: Y_i = a + b \exp(c x_i) + \beta x_i + \sum_{k=1}^{K} b_k (x_i - \kappa_k)_+ + \epsilon_i$
n = 100, $x_i$ equally spaced in [0, 2], K = 20 equally spaced knots; 5,000 simulations take 20 minutes.
Distribution of $\mathrm{LRT}_n$ for 2 parameters ($\beta = 0$, $\sigma_b^2 = 0$): $\chi^2_1$
WARNING: NO RLRT!

38. Model linearization
Let $\hat{\beta}_1$ be the MLE under the null $Y = f(x, \beta_1) + \epsilon$.
Approximate $f(x, \beta_1)$ by a first-order Taylor expansion:
$f(x, \beta_1) \approx f(x, \hat{\beta}_1) + \left[ \frac{\partial f(x, \hat{\beta}_1)}{\partial \beta_1} \right]^T (\beta_1 - \hat{\beta}_1)$
This gives a test of the null of linear regression against a general alternative.
This problem is solved: the RLRT can be used!
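
For the exponential null $f(x; a, b, c) = a + b \exp(c x)$ used in the previous example, the linearization replaces $f$ by its gradient design at $\hat{\beta}_1 = (\hat{a}, \hat{b}, \hat{c})$. A minimal sketch, assuming the null estimates have already been obtained by some nonlinear least-squares fit (not shown); the function names are hypothetical.

```python
import math

def null_mean(x, a, b, c):
    """f(x; a, b, c) = a + b * exp(c * x)."""
    return a + b * math.exp(c * x)

def linearized_design(x, a_hat, b_hat, c_hat):
    """Rows [df/da, df/db, df/dc] evaluated at the null estimate."""
    return [[1.0, math.exp(c_hat * xi), b_hat * xi * math.exp(c_hat * xi)] for xi in x]

def null_residuals(y, x, a_hat, b_hat, c_hat):
    """Y - f(x, beta_hat); under the linearized null this is G (beta - beta_hat) + error."""
    return [yi - null_mean(xi, a_hat, b_hat, c_hat) for yi, xi in zip(y, x)]
```

The residuals, regressed on the three gradient columns, form a linear model, so the spline terms can be added to it and tested with the RLRT as in the linear case.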

39. Goodness-of-fit
Testing a constant mean, linear mean, etc. against a general alternative is testing goodness-of-fit.
Classical goodness-of-fit tests use standardized residuals: if the null model is correct, the standardized residuals should behave like white noise.
We can use the (R)LRT for no effect on the estimated residuals, $r_i = \mu + e_i$.
Empirical approach: Ruppert, Wand and Carroll (2002).

40. (R)LRT bootstrap
(R)LRT distribution for LMMs with more than one variance component (feasible: L = 2, 3).
Exact null distributions for linear, nonlinear and GLM nulls.
Low-order smoothers make simulations tractable.
Hard to implement in standard software.
Recommended when unsure about the theoretical results.
Our initial approach to inference (that's how we got $p_0 \gg 0.5$).

41. Conclusions
Testing for a null that includes a zero random-effects variance in LMMs.
Unified testing theory (penalized likelihood models).
(R)LRT null finite-sample and asymptotic distributions, plus power.
Standard asymptotics for i.i.d. data does not apply.
Feasible extensions: additive models, nonlinear models, GLMs.
P-splines are a powerful and flexible tool for (R)LRT testing of parametric versus nonparametric models.
