Some properties of Likelihood Ratio Tests in Linear Mixed Models


Some properties of Likelihood Ratio Tests in Linear Mixed Models Ciprian M. Crainiceanu, David Ruppert, Timothy J. Vogelsang. September 19, 2003. Abstract: We calculate the finite-sample probability mass at zero and the probability of underestimating the true ratio between the random effects variance and the error variance in a LMM with one variance component. The calculations are expedited by simple matrix diagonalization techniques. One possible application is to compute the probability that the likelihood ratio test statistic (LRT), or residual likelihood ratio test statistic (RLRT), is zero. The large sample chi-square mixture approximation to the distribution of the log-likelihood ratio, using the usual asymptotic theory for when a parameter is on the boundary, has been shown to be poor in simulation studies. A large part of the problem is that the finite-sample probability that the LRT or RLRT statistic is zero is larger than 0.5, its value under the chi-square mixture approximation. Our calculations explain these empirical results. Another application is to show why standard asymptotic results can fail even when the parameter under the null is in the interior of the parameter space. This paper focuses on LMMs with one variance component because we have developed a very rapid algorithm for simulating finite-sample distributions of the LRT and RLRT statistics for this case. This allows us to compare finite-sample distributions with asymptotic approximations. The main result is that the asymptotic approximations are often poor, and this suggests that asymptotics be used with caution, or avoided altogether, for any LMM regardless of whether it has one variance component or more. For computing the distribution of the test statistics we recommend our algorithm for the case of one variance component and the bootstrap in other cases. Short title: Properties of (R)LRT. Keywords: Effects of dependence, Penalized splines, Testing polynomial regression.
Department of Statistical Science, Cornell University, Malott Hall, NY 14853, USA. E-mail: cmc59@cornell.edu. School of Operational Research and Industrial Engineering, Cornell University, Rhodes Hall, NY 14853, USA. E-mail: ruppert@orie.cornell.edu. Departments of Economics and Statistics, Cornell University, Uris Hall, NY, USA. E-mail: tjv2@cornell.edu.

1 INTRODUCTION

This work was motivated by our research in testing parametric regression models versus nonparametric alternatives. It is becoming more widely appreciated that penalized splines and other penalized likelihood models can be viewed as LMMs and the fitted curves as BLUPs (e.g., Brumback, Ruppert, and Wand, 1999). In this framework the smoothing parameter is a ratio of variance components and can be estimated by ML or REML. REML is often called generalized maximum likelihood (GML) in the smoothing spline literature. Within the random effects framework, it is natural to consider likelihood ratio tests and residual likelihood ratio tests, (R)LRTs, about the smoothing parameter. In particular, testing whether the smoothing parameter is zero is equivalent to testing for polynomial regression versus a general alternative modeled by penalized splines. These null hypotheses are also equivalent to the hypothesis that a variance component is zero. LRTs for null variance components are non-standard for two reasons. First, the null value of the parameter is on the boundary of the parameter space. Second, the data are dependent, at least under the alternative hypothesis. The focus of our research is on the finite-sample distributions of (R)LRT statistics, and the asymptotic distributions are derived as limits of the finite-sample distributions in order to compare the accuracy of various types of asymptotics. For example, for balanced one-way ANOVA we compare asymptotics with a fixed number of samples and the number of observations per sample going to infinity with the opposite case of the number of samples tending to infinity with the number of observations per sample fixed. Our major results are:

a. The usual asymptotic theory for standard or boundary problems provides accurate approximations to finite-sample distributions only when the response vector can be partitioned into a large number of independent sub-vectors, for all values of the parameters.

b. The asymptotic approximations can be very poor when the number of independent sub-vectors is small or moderate.

c. Penalized spline models do not satisfy the condition described in point a., and standard asymptotic results fail rather dramatically.

The usual asymptotic distribution for testing that a single parameter is at the boundary of its range is a 50 : 50 mixture of point mass at zero (called $\chi^2_0$) and a $\chi^2_1$ distribution. A major reason why

the asymptotics fail to produce accurate finite-sample approximations is that the finite-sample probability mass at $\chi^2_0$ is substantially greater than 0.5, especially for the LRT but even for the RLRT. This paper studies the amount of probability mass at 0 for these test statistics. However, our methods are sufficiently powerful that the finite-sample distribution of the LRT and RLRT statistics, conditional on not being zero, can also be derived. These distributions are studied in a later paper (Crainiceanu and Ruppert, 2003). Our work is applicable to most LMMs with a single random effects variance component, not only to penalized likelihood models. We study LMMs with only a single variance component for tractability. When there is only one variance component, the distributions of the LRT and RLRT statistics can be simplified in such a way that simulation of these distributions is extremely rapid. Although our results do not apply to LMMs with more than one variance component, they strongly suggest that asymptotic approximations be used with great caution for such models. Since asymptotic approximations are poor for LMMs with one variance component, it is unlikely that they are satisfactory in general when there is more than one variance component. Consider the following LMM
$$
Y = X\beta + Zb + \epsilon, \quad
E\begin{bmatrix} b \\ \epsilon \end{bmatrix} = \begin{bmatrix} 0_K \\ 0_n \end{bmatrix}, \quad
\mathrm{Cov}\begin{bmatrix} b \\ \epsilon \end{bmatrix} = \begin{bmatrix} \sigma^2_b \Sigma & 0 \\ 0 & \sigma^2_\epsilon I_n \end{bmatrix}, \qquad (1)
$$
where $Y$ is an $n$-dimensional response vector, $0_K$ is a $K$-dimensional column of zeros, $\Sigma$ is a known $K \times K$ matrix, $\beta$ is a $p$-dimensional vector of parameters corresponding to fixed effects, $b$ is a $K$-dimensional vector of exchangeable random effects, and $(b, \epsilon)$ is a normally distributed random vector. Under these conditions it follows that $E(Y) = X\beta$ and $\mathrm{Cov}(Y) = \sigma^2_\epsilon V_\lambda$, where $\lambda = \sigma^2_b/\sigma^2_\epsilon$ is the ratio between the variance of the random effects $b$ and the variance of the error variables $\epsilon$, $V_\lambda = I_n + \lambda Z \Sigma Z^T$, and $n$ is the length of the response vector $Y$.
Note that $\sigma^2_b = 0$ if and only if $\lambda = 0$, and the parameter space for $\lambda$ is $[0, \infty)$. The LMM described by equation (1) contains standard regression fixed effects $X\beta$ specifying the conditional response mean and random effects $Zb$ that account for correlation. We are interested in testing
$$
H_0: \lambda = \lambda_0 \quad \text{vs.} \quad H_A: \lambda \in [0, \infty) \setminus \{\lambda_0\}, \qquad (2)
$$
where $\lambda_0 \in [0, \infty)$. Consider the case $\lambda_0 = 0$ ($\sigma^2_b = 0$), when the parameter is on the boundary of the parameter space under the null. Using non-standard asymptotic theory developed by Self and Liang (1987)

for independent data, one may be tempted to conclude that the finite-sample distribution of the (R)LRT could be approximated by a $0.5\chi^2_0 + 0.5\chi^2_1$ mixture. Here $\chi^2_k$ is the chi-square distribution with $k$ degrees of freedom and $\chi^2_0$ means point probability mass at 0. However, the results of Self and Liang (1987) require independence for all values of the parameter. Because the response variable $Y$ in model (1) is not a vector of independent random variables, this theory does not apply. Stram and Lee (1994) showed that the Self and Liang result can still be applied to testing for zero variance of random effects in LMMs in which the response variable $Y$ can be partitioned into independent sub-vectors and the number of independent sub-vectors tends to infinity. In a simulation study for a related model, Pinheiro and Bates (2000) found that a $0.5\chi^2_0 + 0.5\chi^2_1$ mixture distribution approximates well the finite-sample distribution of the RLRT, but that a $0.65\chi^2_0 + 0.35\chi^2_1$ mixture better approximates the finite-sample distribution of the LRT. A case where it has been shown that the asymptotic mixture probabilities differ from $0.5 : 0.5$ is regression with a stochastic trend, analyzed by Shephard and Harvey (1990) and Shephard (1993). They consider the particular case of model (1) where the random effects $b$ are modeled as a random walk and show that the asymptotic mass at zero can be as large as 0.96 for the LRT and 0.65 for the RLRT. For the case when $\lambda_0 > 0$ we show that the distribution of the (RE)ML estimator of $\lambda$ has mass at zero. Therefore, even when the parameter is in the interior of the parameter space under the null, the asymptotic distributions of the (R)LRT statistics are not $\chi^2_1$. We also calculate the probability of underestimating the true parameter $\lambda_0$ and show that in penalized spline models this probability is larger than 0.5, showing that the (RE)ML criteria tend to oversmooth the data. This effect is more severe for ML than for REML.
Section 6.2 of Khuri, Mathew, and Sinha (1998) studies mixed models with one variance component. (They consider the error variance as a variance component, so they call this the case of two variance components.) In their Theorem 6.2.2, they derive the LBI (locally best invariant) test of $H_0: \lambda = 0$, which rejects for large values of
$$
F = \frac{e^T Z \Sigma Z^T e}{e^T e}, \qquad (3)
$$
where $e = \{I - X(X^T X)^{-1} X^T\} Y$ is the residual vector when fitting the null model. A test is LBI if among all invariant tests it maximizes power in some neighborhood of the null hypothesis. Notice that the denominator of (3) is, except for a scale factor, the estimator of $\sigma^2$ under the null hypothesis and will be inflated by deviations from the null. This suggests that the test might have

low power at alternatives far from the null. Khuri (1994) studies the probability in a LMM that a linear combination of independent mean squares is negative. For certain balanced models, there is an estimator of $\sigma^2_b$ of this form such that this estimator is negative if and only if the (R)LRT statistic is zero; see, for example, Sections 3.7 and 3.8 of Searle, Casella, and McCulloch (1992). However, in general Khuri's results do not apply to our problem.

2 SPECTRAL DECOMPOSITION OF $\mathrm{(R)LRT}_n$

Consider maximum likelihood estimation (MLE) for model (1). Twice the log-likelihood of $Y$ given the parameters $\beta$, $\sigma^2_\epsilon$, and $\lambda$ is, up to a constant that does not depend on the parameters,
$$
L(\beta, \sigma^2_\epsilon, \lambda) = -n \log \sigma^2_\epsilon - \log |V_\lambda| - \frac{(Y - X\beta)^T V_\lambda^{-1} (Y - X\beta)}{\sigma^2_\epsilon}. \qquad (4)
$$
Residual or restricted maximum likelihood (REML) was introduced by Patterson and Thompson (1971) to take into account the loss in degrees of freedom due to estimation of the $\beta$ parameters and thereby to obtain unbiased variance component estimators. REML consists of maximizing the likelihood function associated with $n - p$ linearly independent error contrasts. It makes no difference which $n - p$ contrasts are used because the likelihood functions for any two such sets differ by no more than an additive constant (Harville, 1977). For the LMM described in equation (1), twice the residual log-likelihood was derived by Harville (1974) and is
$$
\mathrm{REL}(\sigma^2_\epsilon, \lambda) = -(n - p) \log \sigma^2_\epsilon - \log |V_\lambda| - \log |X^T V_\lambda^{-1} X| - \frac{(Y - X\widehat{\beta}_\lambda)^T V_\lambda^{-1} (Y - X\widehat{\beta}_\lambda)}{\sigma^2_\epsilon}, \qquad (5)
$$
where $\widehat{\beta}_\lambda = (X^T V_\lambda^{-1} X)^{-1} X^T V_\lambda^{-1} Y$ maximizes the likelihood as a function of $\beta$ for a fixed value of $\lambda$. The (R)LRT statistics for testing the hypotheses described in (2) are
$$
\mathrm{LRT}_n = \sup_{H_A \cup H_0} L(\beta, \sigma^2_\epsilon, \lambda) - \sup_{H_0} L(\beta, \sigma^2_\epsilon, \lambda), \quad
\mathrm{RLRT}_n = \sup_{H_A \cup H_0} \mathrm{REL}(\sigma^2_\epsilon, \lambda) - \sup_{H_0} \mathrm{REL}(\sigma^2_\epsilon, \lambda). \qquad (6)
$$
Denote by $\mu_{s,n}$ and $\xi_{s,n}$ the $K$ eigenvalues of the $K \times K$ matrices $\Sigma^{1/2} Z^T P_0 Z \Sigma^{1/2}$ and $\Sigma^{1/2} Z^T Z \Sigma^{1/2}$, respectively, where $P_0 = I_n - X(X^T X)^{-1} X^T$.
Crainiceanu and Ruppert (2003) showed that if $\lambda_0$ is the true value of the parameter then
$$
\mathrm{LRT}_n \stackrel{D}{=} \sup_{\lambda \in [0, \infty)} \left[ n \log\left\{ 1 + \frac{N_n(\lambda, \lambda_0)}{D_n(\lambda, \lambda_0)} \right\} - \sum_{s=1}^K \log\left( \frac{1 + \lambda \xi_{s,n}}{1 + \lambda_0 \xi_{s,n}} \right) \right], \qquad (7)
$$

$$
\mathrm{RLRT}_n \stackrel{D}{=} \sup_{\lambda \in [0, \infty)} \left[ (n - p) \log\left\{ 1 + \frac{N_n(\lambda, \lambda_0)}{D_n(\lambda, \lambda_0)} \right\} - \sum_{s=1}^K \log\left( \frac{1 + \lambda \mu_{s,n}}{1 + \lambda_0 \mu_{s,n}} \right) \right], \qquad (8)
$$
where $\stackrel{D}{=}$ denotes equality in distribution,
$$
N_n(\lambda, \lambda_0) = \sum_{s=1}^K \frac{(\lambda - \lambda_0)\mu_{s,n}}{1 + \lambda \mu_{s,n}} w_s^2, \quad
D_n(\lambda, \lambda_0) = \sum_{s=1}^K \frac{1 + \lambda_0 \mu_{s,n}}{1 + \lambda \mu_{s,n}} w_s^2 + \sum_{s=K+1}^{n-p} w_s^2,
$$
and $w_s$, for $s = 1, \ldots, n - p$, are independent $N(0, 1)$. These null finite-sample distributions are easy to simulate (Crainiceanu and Ruppert, 2003).

3 PROBABILITY MASS AT ZERO OF (R)LRT

Denote by $f(\cdot)$ and $g(\cdot)$ the functions to be maximized in equations (7) and (8), respectively. Note that the probability mass at zero of $\mathrm{LRT}_n$ or $\mathrm{RLRT}_n$ equals the probability that the function $f(\cdot)$ or $g(\cdot)$ has a global maximum at $\lambda = 0$. For a given sample size we compute the exact probability of having a local maximum of $f(\cdot)$ or $g(\cdot)$ at $\lambda = 0$. This probability is an upper bound for the probability of having a global maximum at zero but, as we will show using simulations, it provides an excellent approximation. The first order condition for having a local maximum of $f(\cdot)$ at $\lambda = 0$ is $f'(0) \le 0$, where the derivative is taken from the right. The finite-sample probability of a local maximum at $\lambda = 0$ for ML, when $\lambda_0$ is the true value of the parameter, is
$$
P\left\{ \frac{\sum_{s=1}^K (1 + \lambda_0 \mu_{s,n}) \mu_{s,n} w_s^2}{\sum_{s=1}^K (1 + \lambda_0 \mu_{s,n}) w_s^2 + \sum_{s=K+1}^{n-p} w_s^2} \le \frac{1}{n} \sum_{s=1}^K \xi_{s,n} \right\}, \qquad (9)
$$
where $\mu_{s,n}$ and $\xi_{s,n}$ are the eigenvalues of the $K \times K$ matrices $\Sigma^{1/2} Z^T P_0 Z \Sigma^{1/2}$ and $\Sigma^{1/2} Z^T Z \Sigma^{1/2}$, respectively, and the $w_s$ are i.i.d. $N(0,1)$ random variables. If $\lambda_0 = 0$ then the probability of a local maximum at $\lambda = 0$ is
$$
P\left\{ \frac{\sum_{s=1}^K \mu_{s,n} w_s^2}{\sum_{s=1}^{n-p} w_s^2} \le \frac{1}{n} \sum_{s=1}^K \xi_{s,n} \right\}. \qquad (10)
$$
Using similar derivations for REML, the probability of a local maximum at $\lambda = 0$ when the true value is $\lambda_0$ is
$$
P\left\{ \frac{\sum_{s=1}^K (1 + \lambda_0 \mu_{s,n}) \mu_{s,n} w_s^2}{\sum_{s=1}^K (1 + \lambda_0 \mu_{s,n}) w_s^2 + \sum_{s=K+1}^{n-p} w_s^2} \le \frac{1}{n - p} \sum_{s=1}^K \mu_{s,n} \right\}, \qquad (11)
$$
and, in the particular case when $\lambda_0 = 0$, the probability of a local maximum at $\lambda = 0$ is
$$
P\left\{ \frac{\sum_{s=1}^K \mu_{s,n} w_s^2}{\sum_{s=1}^{n-p} w_s^2} \le \frac{1}{n - p} \sum_{s=1}^K \mu_{s,n} \right\}. \qquad (12)
$$
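Because the spectral form (8) involves only the $K$ eigenvalues $\mu_{s,n}$ and $n - p$ independent chi-squared draws, the null distribution of $\mathrm{RLRT}_n$ can be simulated directly. The sketch below is ours, not the paper's code: the function name and defaults are invented, and maximization over $[0, \infty)$ is approximated by a fixed logarithmic grid, which is an assumption rather than an exact method.

```python
import numpy as np

def sim_rlrt_null(mu, n, p, nsim=4000, lam0=0.0, seed=0):
    """Monte Carlo draws from the finite-sample distribution of RLRT_n
    via the spectral form (8).  `mu` holds the K eigenvalues mu_{s,n};
    the sup over lambda in [0, inf) is approximated on a crude grid."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    K = mu.size
    grid = np.concatenate(([0.0], np.logspace(-4, 4, 250)))  # grid stands in for [0, inf)
    wK = rng.chisquare(1, size=(nsim, K))        # w_s^2 for s = 1, ..., K
    wrest = rng.chisquare(n - p - K, size=nsim)  # sum of w_s^2 for s = K+1, ..., n-p
    best = np.full(nsim, -np.inf)
    for lam in grid:
        Nn = wK @ ((lam - lam0) * mu / (1.0 + lam * mu))             # N_n(lam, lam0)
        Dn = wK @ ((1.0 + lam0 * mu) / (1.0 + lam * mu)) + wrest     # D_n(lam, lam0)
        val = (n - p) * np.log1p(Nn / Dn) \
            - np.sum(np.log1p(lam * mu) - np.log1p(lam0 * mu))
        best = np.maximum(best, val)
    return best
```

For the balanced one-way ANOVA of Section 5 with $K = 5$ and $J = 10$ (four eigenvalues equal to $J$ and one equal to zero), the simulated mass at zero under $\lambda_0 = 0$ should fall near $P\{F_{4,45} \le 1\} \approx 0.57$, well above the 0.5 of the chi-square mixture approximation.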

Once the eigenvalues $\mu_{s,n}$ and $\xi_{s,n}$ are computed, explicitly or numerically, these probabilities can be simulated. Algorithms for computing the distribution of a linear combination of $\chi^2_1$ random variables, developed by Davies (1980) and Farebrother (1990), could also be used, but we used simulations because they are simple, accurate, and easier to program. For K = 20 we obtained 1 million simulations in 1 minute (2.66GHz, 1Mb RAM). The probabilities in equations (9) and (11) are the probabilities that $\lambda = 0$ is a local maximum and provide approximations to the probabilities that $\lambda = 0$ is a global maximum. The latter is equal to the finite-sample probability mass at zero of the (R)LRT and of the (RE)ML estimator of $\lambda$ when the true value of the parameter is $\lambda_0$. For every value $\lambda_0$ we can compute the probability of a local maximum at $\lambda = 0$ for (RE)ML using the corresponding equation (9) or (11). However, there is no closed form for the probability of a global maximum at $\lambda = 0$, and we use simulation of the finite-sample distributions of the $\mathrm{(R)LRT}_n$ statistics described in equations (7) or (8). In Sections 5 and 6 we show that there is close agreement between the probability of a local and of a global maximum at $\lambda = 0$ for two examples: balanced one-way ANOVA and penalized spline models.

4 PROBABILITY OF UNDERESTIMATING THE SIGNAL-TO-NOISE PARAMETER

Denote by $\widehat{\lambda}_{\mathrm{ML}}$ and $\widehat{\lambda}^1_{\mathrm{ML}}$ the global and the first local maximum of $f(\cdot)$, respectively. Define $\widehat{\lambda}_{\mathrm{REML}}$ and $\widehat{\lambda}^1_{\mathrm{REML}}$ similarly using $g(\cdot)$. When $\lambda_0$ is the true value of the signal-to-noise parameter,
$$
P\left( \widehat{\lambda}^1_{\mathrm{ML}} < \lambda_0 \right) \ge P\left\{ \left. \frac{\partial f(\lambda)}{\partial \lambda} \right|_{\lambda = \lambda_0} < 0 \right\} = P\left\{ \sum_{s=1}^K c_{s,n}(\lambda_0) w_s^2 < \frac{1}{n} \sum_{s=1}^{n-p} w_s^2 \right\}, \qquad (13)
$$
where
$$
c_{s,n}(\lambda_0) = \frac{\mu_{s,n}}{1 + \lambda_0 \mu_{s,n}} \Big/ \sum_{s=1}^K \frac{\xi_{s,n}}{1 + \lambda_0 \xi_{s,n}}.
$$
Similarly, for REML we obtain
$$
P\left( \widehat{\lambda}^1_{\mathrm{REML}} < \lambda_0 \right) \ge P\left\{ \left. \frac{\partial g(\lambda)}{\partial \lambda} \right|_{\lambda = \lambda_0} < 0 \right\} = P\left\{ \sum_{s=1}^K d_{s,n}(\lambda_0) w_s^2 < \frac{1}{n - p} \sum_{s=1}^{n-p} w_s^2 \right\}, \qquad (14)
$$
where
$$
d_{s,n}(\lambda_0) = \frac{\mu_{s,n}}{1 + \lambda_0 \mu_{s,n}} \Big/ \sum_{s=1}^K \frac{\mu_{s,n}}{1 + \lambda_0 \mu_{s,n}}.
$$
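The right-hand sides of (13) and (14) are again probabilities about linear combinations of $\chi^2_1$ variables, so they too can be simulated. A sketch (the function name is ours; scipy is imported only so the result can be checked against the balanced-ANOVA closed form of Section 5):

```python
import numpy as np
from scipy import stats

def underest_prob(mu, xi, n, p, lam0, nsim=100_000, seed=1):
    """Monte Carlo evaluation of the right-hand sides of (13) and (14):
    lower bounds on the probability that the first local maximizer of the
    (restricted) likelihood falls below the true lambda_0."""
    rng = np.random.default_rng(seed)
    mu, xi = np.asarray(mu, float), np.asarray(xi, float)
    c = (mu / (1 + lam0 * mu)) / np.sum(xi / (1 + lam0 * xi))  # c_{s,n}(lam0), ML
    d = (mu / (1 + lam0 * mu)) / np.sum(mu / (1 + lam0 * mu))  # d_{s,n}(lam0), REML
    wK = rng.chisquare(1, size=(nsim, mu.size))                # first K of the w_s^2
    wrest = rng.chisquare(n - p - mu.size, size=nsim)          # remaining w_s^2, summed
    tot = wK.sum(axis=1) + wrest                               # sum over s = 1, ..., n-p
    p_ml = float(np.mean(wK @ c < tot / n))
    p_reml = float(np.mean(wK @ d < tot / (n - p)))
    return p_ml, p_reml
```

For the balanced one-way ANOVA with $K = 5$ and $J = 10$, both values should match the F-distribution expressions of equation (19) up to Monte Carlo error.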

Denote by $p_{\mathrm{ML}}(\lambda_0)$ and $p_{\mathrm{REML}}(\lambda_0)$ the probabilities appearing on the right-hand sides of equations (13) and (14). Our hope is that $P(\widehat{\lambda}_{\mathrm{(RE)ML}} < \lambda_0)$ is well approximated by $P(\widehat{\lambda}^1_{\mathrm{(RE)ML}} < \lambda_0)$, which, in turn, is well approximated by $p_{\mathrm{(RE)ML}}(\lambda_0)$. While no general proof is available for these results, in Sections 5.2 and 6.3 we show that these approximations are very good, at least for balanced one-way ANOVA and penalized spline models. We now develop large-$\lambda_0$ asymptotic approximations to $p_{\mathrm{ML}}(\lambda_0)$ and $p_{\mathrm{REML}}(\lambda_0)$ that will be used in Section 6.3. If $\mu_{s,n} = 0$ then $c_{s,n} = d_{s,n} = 0$ for all values of $\lambda_0$. If $\mu_{s,n} > 0$ then
$$
\lim_{\lambda_0 \to \infty} c_{s,n}(\lambda_0) = 1/K_\xi \quad \text{and} \quad \lim_{\lambda_0 \to \infty} d_{s,n}(\lambda_0) = 1/K_\mu,
$$
where $K_\xi$ and $K_\mu$ are the numbers of non-zero eigenvalues $\xi_{s,n}$ and $\mu_{s,n}$, respectively. Therefore
$$
\lim_{\lambda_0 \to \infty} p_{\mathrm{ML}}(\lambda_0) = P\left( F_{K_\xi,\, n-p-K_\xi} < \frac{n - p - K_\xi}{n - K_\xi} \right) \quad \text{and} \quad
\lim_{\lambda_0 \to \infty} p_{\mathrm{REML}}(\lambda_0) = P\left( F_{K_\mu,\, n-p-K_\mu} < 1 \right), \qquad (15)
$$
where $F_{r,s}$ denotes an F-distributed random variable with $(r, s)$ degrees of freedom.

5 ONE-WAY ANOVA

Consider the balanced one-way ANOVA model with $K$ levels and $J$ observations per level,
$$
Y_{ij} = \mu + b_i + \epsilon_{ij}, \quad i = 1, \ldots, K, \quad j = 1, \ldots, J, \qquad (16)
$$
where the $\epsilon_{ij}$ are i.i.d. $N(0, \sigma^2_\epsilon)$ random variables, the $b_i$ are i.i.d. random effects distributed $N(0, \sigma^2_b)$ independent of the $\epsilon_{ij}$, $\mu$ is a fixed unknown intercept, and, as before, $\lambda = \sigma^2_b/\sigma^2_\epsilon$. The matrix $X$ for fixed effects is simply a $JK \times 1$ column of ones, and the matrix $Z$ is a $JK \times K$ matrix in which every column contains only zeros except for a $J$-dimensional vector of 1's corresponding to the level parameter. For this model $\Sigma = I_K$, $p = 1$, and $n = JK$ is the total number of observations. An important characteristic of this model is that one can explicitly calculate the eigenvalues of the matrices $Z^T P_0 Z$ and $Z^T Z$. Direct calculation shows that one eigenvalue of $Z^T P_0 Z$ is equal to zero and the remaining $K - 1$ eigenvalues are $\mu_{s,n} = J$. Also, all $K$ eigenvalues of $Z^T Z$ are equal, $\xi_{s,n} = J$.
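These eigenvalue claims are easy to check numerically. A small sketch (the values $K = 5$, $J = 7$ are arbitrary illustrative choices, not from the paper):

```python
import numpy as np

# Balanced one-way ANOVA design matrices and the eigenvalues of Z'P0Z and Z'Z.
K, J = 5, 7
n = K * J
X = np.ones((n, 1))                        # intercept only, so p = 1
Z = np.kron(np.eye(K), np.ones((J, 1)))    # level-indicator columns
P0 = np.eye(n) - X @ np.linalg.pinv(X)     # I - X(X'X)^{-1}X'
mu = np.linalg.eigvalsh(Z.T @ P0 @ Z)      # expect one zero and K-1 values equal to J
xi = np.linalg.eigvalsh(Z.T @ Z)           # expect all K values equal to J
```

Running this reproduces the stated pattern: one zero eigenvalue for $Z^T P_0 Z$, with all remaining eigenvalues equal to $J$.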

5.1 PROBABILITY MASS AT ZERO OF LRT AND RLRT

For the balanced one-way design, using equation (9) one obtains the probability of a local maximum at $\lambda = 0$ for the ML estimator of $\lambda$, when $\lambda_0$ is the true value,
$$
P\left\{ F_{K-1,\, n-K} \le \frac{K}{K - 1} \cdot \frac{1}{1 + \lambda_0 J} \right\}. \qquad (17)
$$
Similarly, for REML we obtain the probability of a local maximum at $\lambda = 0$,
$$
P\left\{ F_{K-1,\, n-K} \le \frac{1}{1 + \lambda_0 J} \right\}. \qquad (18)
$$
These results are known; see equations (119) and (147) of Searle, Casella, and McCulloch (1992). Table 1 shows the finite-sample probabilities of a global and of a local maximum at $\lambda = 0$ for ML and REML. The probability of a global maximum is reported within parentheses. It represents the frequency of estimating $\lambda = 0$ for different true values $\lambda_0$ in 1 million simulations of the distributions described in equations (7) or (8) for $K = 5$ levels and different numbers of observations $J$ per level. The probability of a local maximum is calculated using equations (17) or (18). There is very close agreement between the probabilities of a global and of a local maximum at $\lambda = 0$ for both criteria and for all values of the true parameter considered. Suppose that we want to test for no level effect, that is, $H_0: \lambda = 0$ vs. $H_A: \lambda > 0$. The probability mass at zero of the $\mathrm{(R)LRT}_n$ is equal to the probability of a global maximum at $\lambda = 0$. The probability mass at zero under the alternative ($\lambda_0 > 0$) is larger for $\mathrm{LRT}_n$ than for $\mathrm{RLRT}_n$, suggesting that $\mathrm{RLRT}_n$ may have better power properties than $\mathrm{LRT}_n$. We focus now on the properties of the null asymptotic distributions of $\mathrm{(R)LRT}_n$ for testing the null hypothesis of zero random effects variance. Because the response variable $Y$ can be partitioned into $K$ $J$-dimensional i.i.d. sub-vectors corresponding to the levels, when the number of levels $K$ increases to infinity the asymptotic distribution is $0.5\chi^2_0 + 0.5\chi^2_1$. However, in applications both the number of levels $K$ and the number of observations per level $J$ are fixed.
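Since (17) and (18) are ordinary F-distribution probabilities, they can be evaluated directly rather than simulated. A sketch using scipy (the function name is ours):

```python
from scipy import stats

def anova_mass_at_zero(K, J, lam0):
    """Equations (17) (ML) and (18) (REML): probability of a local maximum
    of the (restricted) likelihood at lambda = 0 for the balanced one-way
    ANOVA with K levels, J observations per level, and true value lam0."""
    n = K * J
    p_ml = stats.f.cdf(K / ((K - 1.0) * (1.0 + lam0 * J)), K - 1, n - K)
    p_reml = stats.f.cdf(1.0 / (1.0 + lam0 * J), K - 1, n - K)
    return p_ml, p_reml
```

At $\lambda_0 = 0$ and large $J$ these reduce to $P\{\chi^2_{K-1} < K\}$ and $P\{\chi^2_{K-1} < K-1\}$, the large-$J$ limits discussed below, and both decrease as $\lambda_0$ grows.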
If $K$ is small or moderate ($K < 100$) then the 0.5 approximation of the probability mass at zero is far from the true value. To make the comparison simple we consider the case $J \to \infty$ in equations (17) and (18) and obtain the null asymptotic probability mass at zero
$$
\mathrm{ML}: \ P_{\mathrm{ML}}(K) = P\{X_{K-1} < K\} \quad \text{and} \quad \mathrm{REML}: \ P_{\mathrm{REML}}(K) = P(X_{K-1} < K - 1),
$$
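These chi-square probabilities are immediate to compute; a minimal sketch (function names are ours):

```python
from scipy.stats import chi2

# Large-J null probability mass at zero for the balanced one-way ANOVA,
# as given by the displayed formulas: P{chi2_{K-1} < K} and P{chi2_{K-1} < K-1}.
def p_ml(K):
    return chi2.cdf(K, K - 1)

def p_reml(K):
    return chi2.cdf(K - 1, K - 1)
```

Both functions stay well above 0.5 for moderate $K$ and approach 0.5 only slowly as $K$ grows, which is the point of the comparison with the chi-square mixture approximation.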

Table 1: Probability of having a local (global) maximum at $\lambda = 0$ for ML and REML. The number of levels is $K = 5$; rows of the original table are indexed by $J$ and columns by $\lambda_0$.

ML: (0.678) (0.696) (0.704) (0.709) (0.655) (0.648) (0.609) (0.529) (0.480) (0.353) (0.204) (0.090) (0.069) (0.023) (0.007) (0.002)
REML: (0.570) (0.582) (0.588) (0.591) (0.545) (0.532) (0.492) (0.417) (0.377) (0.264) (0.145) (0.062) (0.047) (0.015) (0.004) (0.001)

Notes: The finite-sample probability of having a global maximum (the probability mass at zero of $\mathrm{LRT}_n$ and $\mathrm{RLRT}_n$, respectively) is reported within parentheses. It represents the frequency of estimating $\lambda = 0$ for different true values $\lambda_0$ in 1 million simulations of the distributions described in equations (7) or (8) for $K = 5$ levels and different numbers of observations $J$ per level. The standard deviation of each of these estimated probabilities is at most 0.0005.

Here $X_r$ denotes a random variable with a $\chi^2$ distribution with $r$ degrees of freedom. Figure 1 shows $P_{\mathrm{ML}}(K)$ and $P_{\mathrm{REML}}(K)$ versus $K$. By the central limit theorem both $P_{\mathrm{ML}}(K)$ and $P_{\mathrm{REML}}(K)$ tend to 0.5, but for $K < 100$ these probabilities are much larger than 0.5. Indeed, $P_{\mathrm{ML}}(5) = 0.713$, $P_{\mathrm{ML}}(10) = 0.650$, $P_{\mathrm{ML}}(20) = 0.606$, and $P_{\mathrm{ML}}(100) = 0.547$.

5.2 PROBABILITY OF UNDERESTIMATING THE SMOOTHING PARAMETER

We now investigate the probability of underestimating $\lambda_0$ using ML and REML when $\lambda_0$ is the true value and the design is balanced. It is easy to see that $c_{s,n}(\lambda_0) = 1/K$ in equation (13) and $d_{s,n}(\lambda_0) = 1/(K - 1)$ in equation (14), for $s = 1, \ldots, K - 1$, and $c_{K,n}(\lambda_0) = d_{K,n}(\lambda_0) = 0$. Therefore
$$
p_{\mathrm{ML}}(\lambda_0) = P\{F_{K-1,\, n-K} < K/(K-1)\} \quad \text{and} \quad p_{\mathrm{REML}}(\lambda_0) = P(F_{K-1,\, n-K} < 1), \qquad (19)
$$
which are the probabilities obtained from the first order conditions and do not depend on $\lambda_0$. Table 2 displays these probabilities for $K = 5$ levels and several values of $J$ and compares them with the exact probability of underestimating $\lambda_0$, calculated using 1 million simulations of the distributions described in equations (7) or (8). The latter is reported within parentheses.
We used $\lambda_0 = 1$, but similar results were obtained for other values of $\lambda_0$. There is close agreement between these probabilities, and ML underestimates $\lambda_0$ much more frequently than REML.

Calculations for the balanced one-way ANOVA model can be done analytically because the eigenvalues $\mu_{s,n}$ and $\xi_{s,n}$ can be calculated explicitly and have a particularly simple form. Standard asymptotic theory for a parameter on the boundary holds when $K$ and $J$ are large but fails when $K$ is moderate and $J$ is large. Crainiceanu and Ruppert (2003) suggest using the finite-sample distributions described in equations (7) and (8), which are very easy to simulate.

Table 2: Probability of underestimating the true value of the signal-to-noise ratio parameter $\lambda_0$ for ML and REML. The number of levels is $K = 5$; rows are indexed by $J$.

ML: (0.677) (0.695) (0.704) (0.708)
REML: (0.568) (0.581) (0.587) (0.590)

Notes: The finite-sample probability of underestimating $\lambda_0$ is reported within parentheses. It represents the frequency of estimating $\widehat{\lambda} < \lambda_0$ in 1 million simulations of the distributions described in equations (7) or (8) for $K = 5$ levels and different numbers of observations $J$ per level. We used $\lambda_0 = 1$, but other values give similar results. The standard deviation of each of these estimated probabilities is at most 0.0005.

5.3 THE UNBALANCED ONE-WAY DESIGN

For unbalanced data, Searle, Casella, and McCulloch (1992, p. 88) state that the probability mass at zero cannot be easily specified. Apparently, this quantity cannot be expressed simply using the F-distribution. However, it is a simple case of (9).

5.4 OTHER TESTS

Khuri, Mathew, and Sinha (1998) discuss several approaches to testing in LMMs besides (R)LRTs. Wald's variance component test is simply the F-test assuming that all parameters are fixed effects. Under the null hypothesis the test statistic has an exact F-distribution even for unbalanced data. For the balanced one-way design, Wald's test is UMPS (uniformly most powerful similar) and therefore UMPU (uniformly most powerful unbiased), and it is also UMPI (uniformly most powerful invariant). In the case of the unbalanced one-way design, there are no UMPS, UMPU, or UMPI tests.
However, there is an LBI (locally best invariant) test, which was derived by Das and Sinha

(1987). The test statistic, which is given by equation (3) or by (4.2.8) of Khuri, Mathew, and Sinha (1998), is a ratio of quadratic forms in $Y$, so percentiles of its distribution can be found by Davies's (1980) algorithm.

6 TESTING POLYNOMIAL REGRESSION VERSUS A NONPARAMETRIC ALTERNATIVE

In this section we show that nonparametric regression using P-splines is equivalent to a particular LMM. In this context, the smoothing parameter is the ratio between the random effects and error variances, and testing assumptions about the shape of the regression function is equivalent to testing hypotheses about the smoothing parameter. We first focus on testing for a polynomial regression versus a general alternative modeled by penalized splines, which is equivalent to testing for a zero smoothing parameter (or zero random effects variance). For this hypothesis, we study the probability mass at zero of the $\mathrm{(R)LRT}_n$ statistics under the null and alternative hypotheses. In particular, we show that the null probability mass at zero is much larger than 0.5. Because in penalized spline models the data vector cannot be partitioned into more than one i.i.d. sub-vector, the Self and Liang assumptions do not hold, but it was an open question as to whether the results themselves held or not. Our results show that the $0.5 : 0.5$ mixture of $\chi^2$ distributions cannot be extended to approximate the finite-sample distributions of $\mathrm{(R)LRT}_n$, regardless of the number of knots used. We also investigate the probability of underestimating the true smoothing parameter, which in the context of penalized spline smoothing is the probability of oversmoothing. We show that the first order results described in Section 4 provide excellent approximations of the exact probability of oversmoothing and that the probability of oversmoothing with (RE)ML is generally larger than 0.5.

6.1 P-SPLINES REGRESSION AND LINEAR MIXED MODELS

Consider the following regression equation
$$
y_i = m(x_i) + \epsilon_i, \qquad (20)
$$
where the $\epsilon_i$ are i.i.d.
$N(0, \sigma^2_\epsilon)$ and $m(\cdot)$ is the unknown mean function. Suppose that we are interested in testing whether $m(\cdot)$ is a $p$-th degree polynomial: $H_0: m(x) = \beta_0 + \beta_1 x + \cdots + \beta_p x^p$.

To define an alternative that is flexible enough to describe a large class of functions, we consider the class of regression splines
$$
H_A: m(x) = m(x, \Theta) = \beta_0 + \beta_1 x + \cdots + \beta_p x^p + \sum_{k=1}^K b_k (x - \kappa_k)_+^p, \qquad (21)
$$
where $\Theta = (\beta_0, \ldots, \beta_p, b_1, \ldots, b_K)^T$ is the vector of regression coefficients, $\beta = (\beta_0, \ldots, \beta_p)^T$ is the vector of polynomial parameters, $b = (b_1, \ldots, b_K)^T$ is the vector of spline coefficients, and $\kappa_1 < \kappa_2 < \ldots < \kappa_K$ are fixed knots. Following Gray (1994) and Ruppert (2002), we consider a number of knots that is large enough (e.g., 20) to ensure the desired flexibility. The knots are taken to be sample quantiles of the $x$'s, such that $\kappa_k$ corresponds to probability $k/(K + 1)$. To avoid overfitting, the criterion to be minimized is a penalized sum of squares
$$
\sum_{i=1}^n \{y_i - m(x_i; \Theta)\}^2 + \frac{1}{\lambda} \Theta^T W \Theta, \qquad (22)
$$
where $\lambda \ge 0$ is the smoothing parameter and $W$ is a positive semi-definite matrix. Denote $Y = (y_1, y_2, \ldots, y_n)^T$, by $X$ the matrix having $i$-th row $X_i = (1, x_i, \ldots, x_i^p)$, by $Z$ the matrix having $i$-th row $Z_i = \{(x_i - \kappa_1)_+^p, (x_i - \kappa_2)_+^p, \ldots, (x_i - \kappa_K)_+^p\}$, and by $[X \; Z]$ the combined design matrix. In this paper we focus on matrices $W$ of the form
$$
W = \begin{bmatrix} 0_{(p+1) \times (p+1)} & 0_{(p+1) \times K} \\ 0_{K \times (p+1)} & \Sigma^{-1} \end{bmatrix},
$$
where $\Sigma$ is a positive definite matrix and $0_{m \times l}$ is an $m \times l$ matrix of zeros. This type of matrix $W$ penalizes the coefficients of the spline basis functions $(x - \kappa_k)_+^p$ only and will be used in the remainder of the paper. A standard choice is $\Sigma = I_K$, but other matrices can be used according to the specific application. If criterion (22) is divided by $\sigma^2_\epsilon$ one obtains
$$
\frac{1}{\sigma^2_\epsilon} \| Y - X\beta - Zb \|^2 + \frac{1}{\lambda \sigma^2_\epsilon} b^T \Sigma^{-1} b. \qquad (23)
$$
Define $\sigma^2_b = \lambda \sigma^2_\epsilon$, consider the vector $\beta$ as unknown fixed parameters, and treat the vector $b$ as a set of random parameters with $E(b) = 0$ and $\mathrm{Cov}(b) = \sigma^2_b \Sigma$.
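The design matrices $X$ and $Z$ of the truncated power basis are straightforward to construct. A minimal sketch (the function name is ours; it assumes $p \ge 1$, since the piecewise constant case $p = 0$ uses indicator functions rather than powers):

```python
import numpy as np

def pspline_design(x, p=1, K=20):
    """Truncated power basis design matrices for model (21): X holds the
    polynomial part 1, x, ..., x^p and Z the spline part (x - kappa_k)_+^p,
    with knots at the k/(K+1) sample quantiles of x.  Assumes p >= 1."""
    x = np.asarray(x, dtype=float)
    kappa = np.quantile(x, np.arange(1, K + 1) / (K + 1.0))    # interior knots
    X = np.vander(x, p + 1, increasing=True)                   # columns 1, x, ..., x^p
    Z = np.clip(x[:, None] - kappa[None, :], 0.0, None) ** p   # (x - kappa_k)_+^p
    return X, Z, kappa
```

With $\Sigma = I_K$, the penalty matrix $W$ of (22) is then block-diagonal: zero on the $(p+1)$-dimensional polynomial block and the identity on the spline block, so only the coefficients $b$ are penalized.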
If (bt, ɛ T ) T is a normal random vector and b and ɛ are independent then one obtains an equivalent model representation of the penalized spline in the form of a LMM (Brumback, Ruppert, and Wand 1999; Ruppert, Wand, and Carroll, 2003): ( b Y = Xβ + Zb + ɛ, cov ɛ ) [ σ 2 = b Σ 0 0 σɛ 2 I n ]. (24) 12

More specifically, the P-spline model is equivalent to the LMM in the following sense. Given a fixed value of $\lambda = \sigma^2_b/\sigma^2_\epsilon$, the P-spline fit is equal to the BLUP of the regression function in the LMM. The P-spline model and the LMM may differ in how $\lambda$ is estimated. In the LMM it would be estimated by ML or REML. In the P-spline model, $\lambda$ could be determined by cross-validation, generalized cross-validation, or some other method for selecting a smoothing parameter. However, using ML or REML to select the smoothing parameter is an effective method and we will use it. There is naturally some concern about modeling a regression function by assuming that $(b^T, \epsilon^T)^T$ is a normal random vector and $\mathrm{Cov}(b) = \sigma^2_b \Sigma$. However, under the null hypotheses of interest, $b = 0$, so this assumption does hold with $\sigma^2_b = 0$. Therefore, this concern is not relevant to the problem of testing the null hypothesis of a parametric model. One can also view the LMM interpretation of a P-spline model as a hierarchical Bayesian model and the assumption about $(b^T, \epsilon^T)^T$ as part of the prior. This is analogous to the Bayesian interpretation of smoothing splines pioneered by Wahba (1978, 1990). In this context, testing for a polynomial fit against a general alternative described by a P-spline is equivalent to testing
$$
H_0: \lambda = 0 \ (\sigma^2_b = 0) \quad \text{vs.} \quad H_A: \lambda > 0 \ (\sigma^2_b > 0).
$$
Given the LMM representation of a P-spline model we can define $\mathrm{LRT}_n$ and $\mathrm{RLRT}_n$ for testing these hypotheses as described in Section 2. Because the $b_i$'s have mean zero, $\sigma^2_b = 0$ in $H_0$ is equivalent to the condition that all coefficients $b_i$ of the truncated power functions are identically zero. These coefficients account for departures from a polynomial. For the P-spline model, Wald's variance component test mentioned in Section 5.4 would be the F-test for testing polynomial regression versus a regression spline viewed as a fixed effects model. The fit under the alternative would be ordinary least squares.
Because there would be no smoothing, it seems unlikely that this test would be satisfactory and, perhaps for this reason, it has not been studied, at least as far as we are aware.

6.2 PROBABILITY MASS AT ZERO OF (R)LRT

In this section we compute the probability that the $\mathrm{(R)LRT}_n$ is 0 when testing for a polynomial regression versus a general alternative modeled by penalized splines. We consider testing for a constant mean, $p = 0$, versus the alternative of a piecewise constant spline, and a linear polynomial,

$p = 1$, versus the alternative of a linear spline. For illustration we analyze the case when the $x$'s are equally spaced on $[0, 1]$ and $K = 20$ knots are used, but the same procedure can be applied in the more general case. Once the eigenvalues $\mu_{s,n}$ and $\xi_{s,n}$ of the matrices $Z^T P_0 Z$ and $Z^T Z$ are calculated numerically, the probability of having a local maximum at zero for $\mathrm{(R)LRT}_n$ is computed using equations (9) or (11). Results are reported in Tables 3 and 4. We also report, within parentheses, the estimated probabilities of having a global maximum at zero. As for one-way ANOVA, we used 1 million simulations from the spectral form of the distributions described in equations (7) or (8). For $\mathrm{RLRT}_n$ there is close agreement between the probability of a local and of a global maximum at zero for every value of the parameter $\lambda_0$. For $\mathrm{LRT}_n$ the two probabilities are very close when $\lambda_0 = 0$, but when $\lambda_0 > 0$ the probability of a local maximum at zero is much larger than the probability of a global maximum. This happens because the likelihood function can be decreasing in a neighborhood of zero but have a global maximum in the interior of the parameter space. The restricted likelihood function can exhibit the same behavior but does so less often. An important observation is that the $\mathrm{LRT}_n$ has almost all its mass at zero: 0.92 for $p = 0$ and 0.99 for $p = 1$. This makes the construction of a LRT very difficult, if not impossible, especially when we test for linearity against a general alternative. Estimating $\lambda$ to be zero with high probability when the true value is $\lambda_0 = 0$ is a desirable property of the likelihood function. However, continuing to estimate $\lambda$ to be zero with high probability when the true value is $\lambda_0 > 0$ (e.g., for $n = 100$, $\lambda_0 = 1$, and $p = 1$) suggests that the power of the $\mathrm{LRT}_n$ can be poor. The $\mathrm{RLRT}_n$ has less mass at zero, 0.65 for $p = 0$ and 0.67 for $p = 1$, thus allowing the construction of tests.
Also, the probability of estimating a zero smoothing parameter when the true parameter λ_0 > 0 is much smaller (note the different scales in Tables 3 and 4), indicating that the RLRT is probably more powerful than the LRT. In a simulation study, Crainiceanu, Ruppert, Claeskens and Wand (2003) showed that this is indeed the case. The columns corresponding to λ_0 = 0 in Tables 3 and 4 show that the 0.5 approximation of the probability mass at zero for (R)LRT_n is very poor for K = 20 knots, regardless of the number of observations. By analogy with the balanced one-way ANOVA case, one may be tempted to believe that increasing the number of knots K will improve the 0.5 approximation. To address this question we calculate the asymptotic probability mass at zero when the number of observations n tends to infinity and the number of knots K is fixed.

Table 3: Probability of having a local and a global maximum at λ = 0 for LRT_n, for p = 0 and p = 1 and several values of n and λ_0 (global-maximum probabilities in parentheses). Notes: [Numerical entries, including the reported bound on the standard deviations of the estimated probabilities, were garbled in the transcription and are omitted.]

Table 4: Probability of having a local and a global maximum at λ = 0 for RLRT_n, for p = 0 and p = 1 and several values of n and λ_0. Notes: The finite sample probability of having a global maximum (the probability mass at zero of the LRT and RLRT, respectively) is reported in parentheses; it is the frequency of estimating λ = 0 for different true values of λ in 1 million simulations from the spectral decomposition of RLRT_n in equation (8). [Numerical entries, including the reported bound on the standard deviations of the estimated probabilities, were garbled in the transcription and are omitted.]

Consider the example of testing for a constant mean versus a general alternative modeled by a piecewise constant spline with equally spaced observations and K knots. In Appendix A1 we show that µ_{s,n}/n → µ_s and ξ_{s,n}/n → ξ_s, where µ_s and ξ_s are the eigenvalues of two K × K matrices. Using these results in equations (10) and (12), it follows that the asymptotic probability masses at zero for LRT_n and RLRT_n are

P( ∑_{s=1}^{K} µ_s w_s² ≤ ∑_{s=1}^{K} ξ_s )   and   P( ∑_{s=1}^{K} µ_s w_s² ≤ ∑_{s=1}^{K} µ_s ),   (25)

respectively, where the w_s are independent N(0, 1) random variables. Figure 2 shows these probabilities, calculated using 1 million simulations in equation (25), for numbers of knots 1 ≤ K ≤ 100. Over a wide range of numbers of knots the probabilities are practically constant: 0.95 for ML and 0.65 for REML. Approximating by 0.5 the null probability of estimating λ as 0 is therefore very inaccurate, since 0.5 is not even correct asymptotically.

6.3 PROBABILITY OF UNDERESTIMATING THE SMOOTHING PARAMETER

We now investigate the probability of underestimating the true value of the smoothing parameter λ_0 using the ML or REML criterion. In the penalized spline context this is the probability of undersmoothing. The REML bias towards oversmoothing of regression function estimates has been discussed before in the smoothing spline literature (e.g., Efron, 2001; and Kauermann). Based on first order conditions, we provide a simple and accurate approximation of the finite sample probability of undersmoothing for penalized splines. The exact finite sample probability of undersmoothing can be obtained by recording the argmax at each simulation step of the distributions described in equations (7) and (8). As an illustration, we use the same examples as in Section 6.2: a piecewise constant spline, p = 0, and a linear spline, p = 1, with equally spaced observations and K = 20 knots, even though the results hold more generally for unequally spaced observations and any number of knots. Table 5 shows the approximate probabilities p_ML(λ_0) and p_REML(λ_0) of underestimating λ_0 obtained using first order conditions, as described in Section 4. These values are obtained using 1 million simulations from the expressions on the right hand side of equations (13) and (14), where µ_{s,n} and ξ_{s,n} are the eigenvalues of the matrices Z^T P_0 Z and Z^T Z, respectively. The exact probability of underestimating λ_0, reported in parentheses, is obtained using 1 million simulations from the distributions described in (7) and (8), respectively. Results are reported for n = 100 and n = 400 observations.
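The probabilities in equation (25) are straightforward to estimate by direct Monte Carlo once the limiting eigenvalues are available. The sketch below (our illustration in Python with numpy, not the authors' code) takes the eigenvalue vectors `mu` and `xi` as placeholder inputs standing in for the eigenvalues of the two K × K limit matrices:

```python
import numpy as np

def asymptotic_mass_at_zero(mu, xi, n_sim=200_000, seed=0):
    """Monte Carlo estimate of equation (25): the asymptotic probability
    mass at zero, P(sum_s mu_s w_s^2 <= sum_s xi_s) for the LRT and
    P(sum_s mu_s w_s^2 <= sum_s mu_s) for the RLRT, with w_s iid N(0, 1)."""
    mu = np.asarray(mu, dtype=float)
    xi = np.asarray(xi, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of w2 is one draw of (w_1^2, ..., w_K^2).
    w2 = rng.standard_normal((n_sim, mu.size)) ** 2
    stat = w2 @ mu  # sum_s mu_s * w_s^2, one value per draw
    return np.mean(stat <= xi.sum()), np.mean(stat <= mu.sum())
```

With the actual limiting spectra of Z^T P_0 Z / n and Z^T Z / n as inputs, this would give the quantities plotted against K in Figure 2; when all eigenvalues coincide the two estimates are identical.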
Table 5 shows very close agreement between the approximate and exact probabilities of underestimating λ_0, for all values of λ_0 and for both the ML and REML criteria. For both criteria the probabilities decrease as the true value of the smoothing parameter increases, but they remain larger than 0.5 in all cases considered. Differences are more severe for small to moderate values of λ_0.

Table 5: Approximate and exact probability of underestimating the smoothing parameter λ_0, for p = 0 and p = 1 (exact probabilities in parentheses). Notes: Probabilities are reported for penalized splines with n equally spaced observations in [0, 1] and K = 20 equally spaced knots. Here ML = 1 corresponds to using the ML criterion and ML = 0 to using the REML criterion for estimating the smoothing parameter. The exact probability of underestimating λ_0, reported in parentheses, is obtained using 1 million simulations from the distributions described in (7) and (8), respectively. The approximate probability of underestimating λ_0 is obtained using equations (13) and (14). [Numerical entries, including the reported bound on the standard deviations of the estimated probabilities, were garbled in the transcription and are omitted.]

The approximate probabilities depend essentially on the weights c_{s,n}(λ_0) and d_{s,n}(λ_0), and converge to the probabilities described in equation (15). For example, for n = 100 and K = 20 we have K_µ = K_ξ = 20 and we obtain

lim_{λ_0→∞} p_ML(λ_0) = F_{20,79}(79/80)   and   lim_{λ_0→∞} p_REML(λ_0) = F_{20,79}(1),

where F_{20,79} denotes the F distribution function with (20, 79) degrees of freedom. These limits agree with the simulation results presented in Table 5 for the largest values of λ_0.

7 LONGITUDINAL MODELS

Modeling longitudinal data is one of the most common application areas of LMMs. LRTs for null hypotheses that include zero variance components are routinely used in this context. We focus our discussion on (R)LRTs for a zero random effects variance. Asymptotic theory for longitudinal data models was developed by Stram and Lee (1994) under the assumption that the data can be partitioned into a large number of independent subvectors.
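The λ_0 → ∞ limits in Section 6.3 can be evaluated directly from the F distribution function. A quick check (our sketch, assuming scipy is available; the limiting numerical values themselves were not preserved in this transcription, so only qualitative properties are asserted):

```python
from scipy.stats import f

# Limits in Section 6.3 for n = 100 observations and K = 20 knots:
#   lim p_ML(lambda_0)   = F_{20,79}(79/80)
#   lim p_REML(lambda_0) = F_{20,79}(1)
p_ml = f.cdf(79 / 80, dfn=20, dfd=79)
p_reml = f.cdf(1.0, dfn=20, dfd=79)

# Both limits exceed 1/2 (the median of F_{20,79} lies below 79/80),
# consistent with the persistent underestimation reported in Table 5.
assert 0.5 < p_ml < p_reml < 1.0
```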
One might argue, as one referee did, that for longitudinal data it is generally possible to partition the response vector into independent clusters corresponding to subjects, and that the appropriate asymptotics is obtained by allowing the number of subjects, rather than the number of observations per subject, to increase. However, in our view, this argument has several drawbacks. The object of interest is the finite sample distribution of the test statistic. If this is available, then it should be used. If an asymptotic distribution is used instead, then the type of asymptotics should be the one that best approximates the finite sample distribution. In Section 7.1, for the random intercept longitudinal model, we show that if the number of subjects is less than 100, then the 50 : 50 mixture of chi-squared distributions is a poor approximation to the finite sample distribution of the (R)LRT, regardless of the number of observations per subject. For this example we would use the finite sample distribution of the (R)LRT statistic, as derived by Crainiceanu and Ruppert (2003). Moreover, for many commonly used semiparametric models there are random effects in the submodel for the population mean, so the data cannot be divided into independent subvectors and the results of Stram and Lee (1994) do not apply to models of this type. For example, suppose that subjects in M groups are observed and their response curves are recorded over time. The mean response of a subject can be decomposed as the corresponding group mean plus the deviation of the subject mean from the group mean. The group mean can further be decomposed as the overall mean plus the deviation of the group mean from the overall mean. In Section 7.2 we show that if the overall mean is modeled nonparametrically in this way, then the response vector cannot be partitioned into more than one independent subvector. Moreover, if the overall mean is modeled parametrically and the group deviations are modeled nonparametrically, then the response vector cannot be partitioned into more than M independent subvectors, where M rarely exceeds 5 in applications.
7.1 RANDOM INTERCEPT LONGITUDINAL MODEL

Consider the random intercept longitudinal model with K subjects and J(k) observations for subject k,

Y_kj = β_0 + ∑_{s=1}^{S} β_s x_kj^(s) + b_k + ɛ_kj,   k = 1,...,K,  j = 1,...,J(k),   (26)

where the ɛ_kj are i.i.d. N(0, σ_ɛ²) random variables, the b_k are i.i.d. random intercepts distributed N(0, σ_b²) independently of the ɛ_kj, and we denote λ = σ_b²/σ_ɛ². In model (26), β_0 and β_s, s = 1,...,S, are fixed unknown parameters and x_kj^(s) is the j-th observation on the k-th subject for the s-th covariate.
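For concreteness, the design matrices of model (26) can be assembled directly in the balanced case with time as the single covariate. The following numpy sketch (our illustration, not code from the paper) takes P_0 to be the projection onto the orthogonal complement of the fixed-effects design, consistent with its use in the earlier sections, and confirms the simple eigenvalue structure of this example numerically:

```python
import numpy as np

# Balanced version of model (26): K subjects, J observations each, and a
# single covariate x_kj = j (time).
K, J = 8, 5
n = K * J
X = np.column_stack([np.ones(n), np.tile(np.arange(1, J + 1), K)])  # [1, time]
Z = np.kron(np.eye(K), np.ones((J, 1)))  # one block of J ones per subject

# P0: projection off the column space of X.
P0 = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
mu = np.sort(np.linalg.eigvalsh(Z.T @ P0 @ Z))  # ascending order
xi = np.linalg.eigvalsh(Z.T @ Z)

# Z'Z = J * I_K exactly, and Z'P0Z has eigenvalues J (K-1 times) and 0,
# matching the balanced one-way ANOVA structure.
assert np.allclose(xi, J)
assert abs(mu[0]) < 1e-8 and np.allclose(mu[1:], J)
```

The zero eigenvalue arises because Z applied to the vector of ones reproduces the intercept column of X, which P_0 annihilates.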

The matrix X of fixed effects is a JK × (S + 1) matrix whose first column contains only 1's and whose last S columns contain the corresponding values of the x's. The matrix Z is a JK × K matrix in which every column contains only zeros, except for a J-dimensional vector of 1's corresponding to the random intercept of each subject. For this model Σ = I_K, p = S + 1, and n = ∑_{k=1}^{K} J(k). The eigenvalues of the matrices Z^T P_0 Z and Z^T Z can be computed numerically, and the finite sample distributions of the (R)LRT_n statistics can be simulated using the results in Section 2. In particular, properties such as the probability mass at zero and the probability of underestimating the true value of λ_0 can be obtained in seconds. In some cases these eigenvalues can be calculated explicitly. Consider the case of an equal number of observations per subject, J(k) = J for k = 1,...,K, one covariate, S = 1, and x_kj^(1) = x_kj = j. This example is routinely found in practice when K subjects are observed at equally spaced times and time itself is used as a covariate. It is easy to show that in this example µ_{s,n} = J for s = 1,...,K − 1, µ_{K,n} = 0, and ξ_{s,n} = J for s = 1,...,K. These eigenvalues are identical to the corresponding eigenvalues for the balanced one-way ANOVA model, and all distributional results discussed in Section 5 apply to this example. If the covariates are not equally spaced and/or the numbers of observations per subject are unequal, then the eigenvalues of Z^T P_0 Z cannot be computed explicitly. In particular, the F distribution cannot be used to compute the probability mass at zero or the probability of underestimating the true signal-to-noise ratio, but our methodology still can.

7.2 NONPARAMETRIC LONGITUDINAL MODELS

Nested families of curves in longitudinal data analysis can be modeled nonparametrically using penalized splines. An important advantage of low-rank smoothers, such as penalized splines, is the reduction in the dimensionality of the problem.
Moreover, smoothing can be done using standard mixed model software because penalized splines can be viewed as BLUPs in mixed models. In this context, likelihood and restricted likelihood ratio statistics can be used to test for effects of interest. For example, the hypothesis that a curve is linear is equivalent to a variance component being zero. While computing the test statistics is straightforward using standard software, the null distribution theory is complex. Modified asymptotic theory for testing whether a parameter is on the boundary of the parameter space does not apply because the data are correlated under the alternative (usually the full model).

Brumback and Rice (1998) study an important class of models for longitudinal data. In this paper we consider a subclass of those models. In particular, we assume that repeated observations are taken on each of I subjects divided into M groups. Suppose that y_ij is the j-th observation on the i-th subject, recorded at time t_ij, where 1 ≤ i ≤ I, 1 ≤ j ≤ J(i), and n = ∑_{i=1}^{I} J(i) is the total number of observations. Consider the nonparametric model

y_ij = f(t_ij) + f_g(i)(t_ij) + f_i(t_ij) + ɛ_ij,   (27)

where the ɛ_ij are independent N(0, σ_ɛ²) errors and g(i) denotes the group of the i-th subject. The population curve f(·), the deviation of group g(i) from the population curve, f_g(i)(·), and the deviation of the i-th subject's curve from its group curve, f_i(·), are modeled nonparametrically. Models similar to (27) have been studied by many other authors, e.g., Wang (1998). We model the population, group, and subject curves as penalized splines with relatively small numbers of knots:

f(t) = ∑_{s=0}^{p} β_s t^s + ∑_{k=1}^{K_1} b_k (t − κ_{1,k})_+^p,
f_g(t) = ∑_{s=0}^{p} β_gs t^s + ∑_{k=1}^{K_2} u_gk (t − κ_{2,k})_+^p,
f_i(t) = ∑_{s=0}^{p} a_is t^s + ∑_{k=1}^{K_3} v_ik (t − κ_{3,k})_+^p.

For the population curve, β_0,...,β_p will be treated as fixed effects and b_1,...,b_{K_1} will be treated as independent random coefficients distributed N(0, σ_b²). In LMMs, parameters are usually treated as random effects because they are subject-specific and the subjects have been sampled randomly. Here, b_1,...,b_{K_1} are treated as random effects for an entirely different reason: modeling them as random specifies a Bayesian prior and allows for shrinkage that ensures smoothness of the fit. For the group curves, β_gs, g = 1,...,M, s = 1,...,p, will be treated as fixed effects and u_gk, k = 1,...,K_2, will be treated as random coefficients distributed N(0, σ_g²). To make the model identifiable we impose restrictions on the β_gs: either ∑_{g=1}^{M} β_gs = 0 or β_Ms = 0 for s = 1,...,p. This model assumes that the groups are fixed, e.g., are determined by fixed treatments. If the groups were chosen randomly, then the model would change somewhat: the parameters β_gs,
g = 1,...,M, s = 1,...,p, would be treated as random effects, and the σ_g² might be assumed to be equal or to be i.i.d. from some distribution such as an inverse gamma. For the subject curves, the a_is, 1 ≤ i ≤ I, will be treated as independent random coefficients distributed N(0, σ_{a,s}²), a typical random effects assumption since the subjects are sampled randomly. Moreover, the v_ik will be treated as independent random coefficients distributed N(0, σ_v²). With the additional assumption that the b_k, a_is, v_ik and ɛ_ij are mutually independent, the nonparametric model (27) can be rewritten as a LMM with p + M + 3 variance components,

Y = Xβ + X_G β_G + Z_b b + Z_G u_G + Z_a a + Z_v v + ɛ,   (28)

where β = (β_0,...,β_p)^T, β_g = (β_g0,...,β_gp)^T, β_G = (β_1^T,...,β_M^T)^T, u_g = (u_g1,...,u_gK_2)^T, u_G = (u_1^T,...,u_M^T)^T, a_s = (a_1s,...,a_Is)^T, a = (a_0^T,...,a_p^T)^T, and v = (v_11,...,v_IK_3)^T. The X and Z matrices are defined accordingly. Inspecting the form (28) of model (27) reveals some interesting features. If σ_b² > 0, then the vector Y cannot be partitioned into more than one independent subvector, because of the term Z_b b. If σ_b² = 0 but at least one σ_g² > 0, then Y cannot be partitioned into more than M independent subvectors, because of the term Z_G u_G. In other words, if the overall mean or one of the group means is modeled nonparametrically, then the assumptions needed to obtain results of the Self and Liang type fail to hold. When longitudinal data are modeled semiparametrically in this way and there is more than one variance component, deriving the finite sample distribution of the (R)LRT for testing that certain variance components are zero is not straightforward. The distribution of the LRT and RLRT statistics will depend on those variance components, if any, that are not completely specified by the null hypothesis. Therefore, exact tests are generally not possible.
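When no exact distribution is available, the null distribution of the test statistic can be approximated by a parametric bootstrap. A minimal sketch follows; `loglik_null`, `loglik_full`, and `simulate_null` are hypothetical stand-ins for whatever mixed-model fitting software is used, and are not part of the paper:

```python
import numpy as np

def bootstrap_rlrt(y, loglik_null, loglik_full, simulate_null, B=500, seed=0):
    """Parametric bootstrap approximation to the null distribution of the
    (R)LRT statistic.  loglik_null / loglik_full return the maximized
    (restricted) log-likelihood under the null and full models;
    simulate_null draws a response vector from the fitted null model."""
    rng = np.random.default_rng(seed)
    observed = max(2.0 * (loglik_full(y) - loglik_null(y)), 0.0)
    boot = np.empty(B)
    for b in range(B):
        yb = simulate_null(rng)
        boot[b] = max(2.0 * (loglik_full(yb) - loglik_null(yb)), 0.0)
    # Upper-tail bootstrap p-value with the usual +1 correction.
    p_value = (1 + np.sum(boot >= observed)) / (B + 1)
    return observed, boot, p_value
```

The statistic is clipped at zero because, as discussed throughout, the (R)LRT places positive probability mass there under the null.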
We recommend that the parametric bootstrap be used to approximate the finite sample distributions of the test statistics. However, the bootstrap is much more computationally intensive than the simulation algorithm we have developed.

8 DISCUSSION

We considered estimation of the ratio between the random effects variance and the error variance in a LMM with one variance component, and we examined the probability mass at zero of the (RE)ML estimator, as well as the probability of underestimating the true parameter λ_0. We provided simple formulae for


More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

Stat 710: Mathematical Statistics Lecture 31

Stat 710: Mathematical Statistics Lecture 31 Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

Chapter 7, continued: MANOVA

Chapter 7, continued: MANOVA Chapter 7, continued: MANOVA The Multivariate Analysis of Variance (MANOVA) technique extends Hotelling T 2 test that compares two mean vectors to the setting in which there are m 2 groups. We wish to

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Averaging Estimators for Regressions with a Possible Structural Break

Averaging Estimators for Regressions with a Possible Structural Break Averaging Estimators for Regressions with a Possible Structural Break Bruce E. Hansen University of Wisconsin y www.ssc.wisc.edu/~bhansen September 2007 Preliminary Abstract This paper investigates selection

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

STAT 730 Chapter 5: Hypothesis Testing

STAT 730 Chapter 5: Hypothesis Testing STAT 730 Chapter 5: Hypothesis Testing Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 28 Likelihood ratio test def n: Data X depend on θ. The

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

ML and REML Variance Component Estimation. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 58

ML and REML Variance Component Estimation. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 58 ML and REML Variance Component Estimation Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 58 Suppose y = Xβ + ε, where ε N(0, Σ) for some positive definite, symmetric matrix Σ.

More information

Statistics for analyzing and modeling precipitation isotope ratios in IsoMAP

Statistics for analyzing and modeling precipitation isotope ratios in IsoMAP Statistics for analyzing and modeling precipitation isotope ratios in IsoMAP The IsoMAP uses the multiple linear regression and geostatistical methods to analyze isotope data Suppose the response variable

More information

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Junfeng Shang Bowling Green State University, USA Abstract In the mixed modeling framework, Monte Carlo simulation

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Final Exam. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given.

Final Exam. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given. (a) If X and Y are independent, Corr(X, Y ) = 0. (b) (c) (d) (e) A consistent estimator must be asymptotically

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC

Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC A Simple Approach to Inference in Random Coefficient Models March 8, 1988 Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC 27695-8203 Key Words

More information

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks. Linear Mixed Models One-way layout Y = Xβ + Zb + ɛ where X and Z are specified design matrices, β is a vector of fixed effect coefficients, b and ɛ are random, mean zero, Gaussian if needed. Usually think

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Functional Latent Feature Models. With Single-Index Interaction

Functional Latent Feature Models. With Single-Index Interaction Generalized With Single-Index Interaction Department of Statistics Center for Statistical Bioinformatics Institute for Applied Mathematics and Computational Science Texas A&M University Naisyin Wang and

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

ECON 5350 Class Notes Functional Form and Structural Change

ECON 5350 Class Notes Functional Form and Structural Change ECON 5350 Class Notes Functional Form and Structural Change 1 Introduction Although OLS is considered a linear estimator, it does not mean that the relationship between Y and X needs to be linear. In this

More information

Composite Hypotheses and Generalized Likelihood Ratio Tests

Composite Hypotheses and Generalized Likelihood Ratio Tests Composite Hypotheses and Generalized Likelihood Ratio Tests Rebecca Willett, 06 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve

More information

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 Lecture 3: Linear Models Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector of observed

More information

Penalized Balanced Sampling. Jay Breidt

Penalized Balanced Sampling. Jay Breidt Penalized Balanced Sampling Jay Breidt Colorado State University Joint work with Guillaume Chauvet (ENSAI) February 4, 2010 1 / 44 Linear Mixed Models Let U = {1, 2,...,N}. Consider linear mixed models

More information

Exercises Chapter 4 Statistical Hypothesis Testing

Exercises Chapter 4 Statistical Hypothesis Testing Exercises Chapter 4 Statistical Hypothesis Testing Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans December 5, 013 Christophe Hurlin (University of Orléans) Advanced Econometrics

More information

Nonparametric Small Area Estimation Using Penalized Spline Regression

Nonparametric Small Area Estimation Using Penalized Spline Regression Nonparametric Small Area Estimation Using Penalized Spline Regression J. D. Opsomer Iowa State University G. Claeskens Katholieke Universiteit Leuven M. G. Ranalli Universita degli Studi di Perugia G.

More information

COMPUTATION OF THE EMPIRICAL LIKELIHOOD RATIO FROM CENSORED DATA. Kun Chen and Mai Zhou 1 Bayer Pharmaceuticals and University of Kentucky

COMPUTATION OF THE EMPIRICAL LIKELIHOOD RATIO FROM CENSORED DATA. Kun Chen and Mai Zhou 1 Bayer Pharmaceuticals and University of Kentucky COMPUTATION OF THE EMPIRICAL LIKELIHOOD RATIO FROM CENSORED DATA Kun Chen and Mai Zhou 1 Bayer Pharmaceuticals and University of Kentucky Summary The empirical likelihood ratio method is a general nonparametric

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

Efficient Estimation for the Partially Linear Models with Random Effects

Efficient Estimation for the Partially Linear Models with Random Effects A^VÇÚO 1 33 ò 1 5 Ï 2017 c 10 Chinese Journal of Applied Probability and Statistics Oct., 2017, Vol. 33, No. 5, pp. 529-537 doi: 10.3969/j.issn.1001-4268.2017.05.009 Efficient Estimation for the Partially

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

A brief introduction to mixed models

A brief introduction to mixed models A brief introduction to mixed models University of Gothenburg Gothenburg April 6, 2017 Outline An introduction to mixed models based on a few examples: Definition of standard mixed models. Parameter estimation.

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

Andreas Blöchl: Trend Estimation with Penalized Splines as Mixed Models for Series with Structural Breaks

Andreas Blöchl: Trend Estimation with Penalized Splines as Mixed Models for Series with Structural Breaks Andreas Blöchl: Trend Estimation with Penalized Splines as Mixed Models for Series with Structural Breaks Munich Discussion Paper No. 2014-2 Department of Economics University of Munich Volkswirtschaftliche

More information

Discussion of Maximization by Parts in Likelihood Inference

Discussion of Maximization by Parts in Likelihood Inference Discussion of Maximization by Parts in Likelihood Inference David Ruppert School of Operations Research & Industrial Engineering, 225 Rhodes Hall, Cornell University, Ithaca, NY 4853 email: dr24@cornell.edu

More information

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 Rejoinder Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 1 School of Statistics, University of Minnesota 2 LPMC and Department of Statistics, Nankai University, China We thank the editor Professor David

More information

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function

More information

Semi-Nonparametric Inferences for Massive Data

Semi-Nonparametric Inferences for Massive Data Semi-Nonparametric Inferences for Massive Data Guang Cheng 1 Department of Statistics Purdue University Statistics Seminar at NCSU October, 2015 1 Acknowledge NSF, Simons Foundation and ONR. A Joint Work

More information

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department

More information

Empirical Power of Four Statistical Tests in One Way Layout

Empirical Power of Four Statistical Tests in One Way Layout International Mathematical Forum, Vol. 9, 2014, no. 28, 1347-1356 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/imf.2014.47128 Empirical Power of Four Statistical Tests in One Way Layout Lorenzo

More information

Chapter 12 REML and ML Estimation

Chapter 12 REML and ML Estimation Chapter 12 REML and ML Estimation C. R. Henderson 1984 - Guelph 1 Iterative MIVQUE The restricted maximum likelihood estimator (REML) of Patterson and Thompson (1971) can be obtained by iterating on MIVQUE,

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

Sample Size and Power Considerations for Longitudinal Studies

Sample Size and Power Considerations for Longitudinal Studies Sample Size and Power Considerations for Longitudinal Studies Outline Quantities required to determine the sample size in longitudinal studies Review of type I error, type II error, and power For continuous

More information

2.2 Classical Regression in the Time Series Context

2.2 Classical Regression in the Time Series Context 48 2 Time Series Regression and Exploratory Data Analysis context, and therefore we include some material on transformations and other techniques useful in exploratory data analysis. 2.2 Classical Regression

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

A Dimension Reduction Technique for Estimation in Linear Mixed Models

A Dimension Reduction Technique for Estimation in Linear Mixed Models A Dimension Reduction Technique for Estimation in Linear Mixed Models M. de Carvalho, M. Fonseca, M. Oliveira, J.T. Mexia Abstract This paper proposes a dimension reduction technique for estimation in

More information

Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap

Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap Dale J. Poirier University of California, Irvine September 1, 2008 Abstract This paper

More information

CHAPTER 3 Further properties of splines and B-splines

CHAPTER 3 Further properties of splines and B-splines CHAPTER 3 Further properties of splines and B-splines In Chapter 2 we established some of the most elementary properties of B-splines. In this chapter our focus is on the question What kind of functions

More information

A Tolerance Interval Approach for Assessment of Agreement in Method Comparison Studies with Repeated Measurements

A Tolerance Interval Approach for Assessment of Agreement in Method Comparison Studies with Repeated Measurements A Tolerance Interval Approach for Assessment of Agreement in Method Comparison Studies with Repeated Measurements Pankaj K. Choudhary 1 Department of Mathematical Sciences, University of Texas at Dallas

More information