A New Approach to Modeling Covariate Effects and Individualization in Population Pharmacokinetics-Pharmacodynamics


Journal of Pharmacokinetics and Pharmacodynamics, Vol. 33, No. 1, February 2006 (© 2006 Springer Science+Business Media, Inc.)

Tze Leung Lai,1,* Mei-Chiung Shih,2 and Samuel P. Wong3

Received September 7, 2004; final October 7, 2005; published online January 10, 2006.

By combining Laplace's approximation and Monte Carlo methods to evaluate multiple integrals, this paper develops a new approach to estimation in nonlinear mixed effects models that are widely used in population pharmacokinetics and pharmacodynamics. Estimation here involves not only estimating the model parameters from Phase I and II studies but also using the fitted model to estimate the concentration versus time curve or the drug effects of a subject who has covariate information but sparse measurements. Because of its computational tractability, the proposed approach can model the covariate effects nonparametrically by using (i) regression splines or neural networks as basis functions and (ii) AIC or BIC for model selection. Its computational and statistical advantages are illustrated in simulation studies and in Phase I trials.

KEY WORDS: mixed effects model; hybrid estimator; neural network; regression splines.

1 Department of Statistics, Stanford University, Stanford, CA 94305, USA.
2 Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.
3 Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China.
* To whom correspondence should be addressed. E-mail: lait@stat.stanford.edu

INTRODUCTION

A widely used model in population pharmacokinetics (PK) and pharmacodynamics (PD) is the nonlinear mixed effects model of the form

$$y_{ij} = f_i(t_{ij}, \theta_i) + \varepsilon_{ij}, \qquad \theta_i = g(x_i, \beta) + b_i \qquad (1 \le j \le n_i,\ 1 \le i \le I), \tag{1}$$

where θ_i is a 1 × r vector of the ith subject's parameters whose regression function on the subject's observed covariate x_i is g(x_i, β) with a 1 × s parameter vector β, which is the fixed effect to be estimated. The random effects b_i in Eq. (1) are assumed to be independent and identically distributed (i.i.d.), and their nonzero components have a common distribution G with mean 0. The ith subject's response y_ij at t_ij has mean f_i(t_ij, θ_i), in which f_i is a known function. Given θ_i, the random errors ε_ij are assumed to be independent normal random variables with mean 0 and standard deviation σω_ij(θ_i), in which ω_ij is a given function and σ is an unknown parameter. In PK, the concentrations at times t_ij after the administration of a single oral dose D_i are often modeled by the one-compartment model

$$y_{ij} = \frac{D_i k_{ai}}{V_i (k_{ai} - k_{ei})}\left(e^{-k_{ei} t_{ij}} - e^{-k_{ai} t_{ij}}\right) + \varepsilon_{ij}, \qquad 1 \le j \le n_i. \tag{2}$$

Here V_i, k_ai, k_ei are the ith subject's volume of distribution, absorption rate, and elimination rate, respectively, and their logarithms constitute the vector θ_i in Eq. (1). The regression function g relates θ_i to the ith subject's physiologic characteristics that constitute the covariate vector x_i. The population distribution G is usually assumed to be normal with unknown parameters which, together with β and σ, can be estimated by maximum likelihood.

Unlike linear mixed effects models, in which the normality assumption on G yields closed-form expressions for the likelihood, the normality of G in nonlinear mixed effects models leads to computationally intensive likelihoods that involve I multiple integrals. A commonly used approach, as adopted in the software package NONMEM (1) and the nlme function in S-Plus due to Lindstrom and Bates (2) and Pinheiro and Bates (3), is to develop iterative schemes based on first-order approximations of f_i(t_ij, g(x_i, β) + b_i) in Eq. (1), so that the normality assumption on G can be used to reduce the problem to that of a linear Gaussian mixed effects model at each iterative step. A basic issue with this approximation is that when some of the subjects have sparse data, there are considerable errors in approximating the likelihood function via these first-order approximations, as noted by Yafune et al. (4), who propose to use Monte Carlo integration to evaluate the I multiple integrals in the likelihood function for Phase I studies but point out that the computational time (already taking 22 hours in their particular Phase I trial) may be too long for Phase II (or later) trials to be of practical interest. Another issue is that the actual population distribution may be highly nonnormal. Since there is no computational advantage in using normal G when first-order approximations that reduce the problem to linear Gaussian mixed effects models are not used, it may be more appropriate to try more flexible parametric families for G. Refs. (5)-(7) propose certain parametric families that incorporate skewness and multimodality, but they are too computationally intensive for routine use.
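To make Eqs. (1)-(2) concrete, the following minimal sketch simulates noisy concentrations for one subject from the one-compartment model. All numerical values (dose, rates, sampling times) are hypothetical illustrations rather than values from the paper, and the error standard deviation is taken proportional to the mean, i.e., ω_ij = f, the form used later in the paper's simulation study.

```python
import numpy as np

rng = np.random.default_rng(0)

def conc(t, D, V, ka, ke):
    """Mean concentration of the one-compartment model with first-order
    absorption, Eq. (2): D*ka/(V*(ka-ke)) * (exp(-ke*t) - exp(-ka*t))."""
    return D * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

# Hypothetical subject: theta = (log V, log ka, log ke) = g(x, beta) + b
t = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 12.0])   # sampling times (h)
D, V, ka, ke, sigma = 500.0, 30.0, 1.2, 0.15, 0.1
mean = conc(t, D, V, ka, ke)
y = mean * (1.0 + sigma * rng.standard_normal(t.size))  # error sd proportional to mean
```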

Refs. (8)-(10) point out difficulties in likelihood inference due to inaccuracies of the first-order approximations in nonlinear mixed effects models.

In this paper we make use of a hybrid approach, developed recently by Lai and Shih (11), that uses first-order approximations based on Laplace's method to evaluate the likelihood when the subject has sufficient data, in combination with Monte Carlo approximations of the likelihood involving relatively few simulation runs when the subject has sparse data. Details of this approach and its underlying rationale are given in the next section, which also discusses why this approach can lead to consistent estimators of the fixed effects even when the population distribution G is misspecified (e.g., as normal). In this connection, a review of recent work by Lai and Shih (12) and by previous authors on nonparametric modeling of G is also given in (11), where it is shown that such nonparametric modeling typically does not yield better estimates of the fixed effects than the hybrid approach that assumes G to be normal. Moreover, an improved hybrid method that uses importance sampling instead of direct Monte Carlo is developed.

Another issue with NONMEM that we address herein is the choice of the functional form of g. Whereas the choice of f_i in Eq. (1) is usually based on scientific theory, like Eq. (2) for PK, regression of the random effects θ_i on the covariates x_i is statistical modeling of the "black box" type, and one often chooses linear regression functions for convenience. In recent years there have been attempts to apply modern nonparametric regression techniques, such as generalized additive models (13), to the residuals after fitting a linear regression model or even the simpler population model without covariate effects. Because the θ_i in Eq. (1) are unobservable, this raises the question of how residuals should be defined. We review previous definitions in the literature and propose a new definition by making use of the concept of generalized residuals introduced by Cox and Snell (14). These generalized residuals, computed by the hybrid method, enable us to perform diagnostic checks of the assumed regression model (see "Regression Diagnostics"). Instead of linear regression, we assume more flexible regression models that use regression splines or neural networks as basis functions, and fit them via the likelihood function, which has relatively low computational complexity when we use the improved hybrid method. This computational tractability also enables us to address likelihood-based inference and develop model selection schemes (see "Flexible Regression Modeling and Likelihood Inference").

The individualization problem is of fundamental interest in population PK. Since the efficacy and toxicity of a drug are directly related to drug concentrations at a target site, which are generally not available but for which blood concentrations are often good surrogates, criteria for designing the dosing regimen for a specific subject often involve functions of the subject's concentrations, or equivalently in Eq. (1), functions of the subject's parameter vector θ = g(x, β) + b.

The subject's blood samples are often too sparse to provide an adequate estimate of θ. The empirical Bayes approach borrows information from healthy volunteers in Phase I studies who have undergone intensive blood sampling, and also from clinical patients for whom intensive blood sampling is not feasible. Combining an individual patient's characteristics (as measured by x) and sparse concentration data with a large database from other subjects is one of the main motivations for building population PK models. Making use of the improved hybrid method, we show how empirical Bayes estimates of h(θ) can be computed from (a) the patient's data and (b) the population PK model fitted from other subjects' data (see "Individualization").

A HYBRID METHOD FOR MAXIMUM LIKELIHOOD ESTIMATION

Suppose the distribution G is normal with mean 0 and covariance matrix Σ. For given values of β, σ and Σ, the integral for the ith subject in the likelihood function can be written as an expectation E ψ_i(b), which can be computed by Monte Carlo simulations of the random vector b with the normal density function φ_Σ having mean 0 and covariance matrix Σ. Alternatively, letting e^{l_i(b)} = ψ_i(b) φ_Σ(b), we can use Laplace's method to approximate the integral:

$$\int e^{l_i(b)}\, db^{(1)} \cdots db^{(r)} \approx (2\pi)^{r/2}\, \bigl|{-\ddot l_i(\hat b_i)}\bigr|^{-1/2}\, e^{l_i(\hat b_i)}, \tag{3}$$

where l̈_i is the Hessian matrix of second partial derivatives of l_i with respect to the components b^(j) of b, and b̂_i is the maximizer of l_i(b). Laplace's approximation basically approximates l_i by a quadratic function in a neighborhood of the maximizer b̂_i; its adequacy can be gauged by λ_min(−l̈_i(b̂_i)), where λ_min(·) denotes the minimum eigenvalue of a symmetric matrix. If the observations (y_ij, t_ij), 1 ≤ j ≤ n_i, are sufficiently informative about the ith subject's parameter vector θ_i = g(x_i, β_0) + b_i, then for (β, σ) near the true value (β_0, σ_0), l_i(b) becomes peaked around b̂_i and can be well approximated by the quadratic function l_i(b̂_i) + (b − b̂_i) l̈_i(b̂_i)(b − b̂_i)^T/2. Laplace's approximation is also applicable when λ_min(Σ^{-1}) is large, which occurs when the distribution of b is concentrated around 0. When λ_min(Σ^{-1}) is not sufficiently large and the ith subject has sparse data, Laplace's method may give a poor approximation to the left-hand side of Eq. (3), which will be denoted by L_i(β, σ, Σ).
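A minimal sketch of Laplace's approximation (3) for a generic log-integrand l(b), using numerical optimization and a finite-difference Hessian; the function name and step size are our own choices, and the returned minimum eigenvalue of V_i = −l̈_i(b̂_i) is the quantity used by the hybrid method's switching rule below.

```python
import numpy as np
from scipy import optimize

def laplace_integral(l, b0, h=1e-5):
    """Approximate int exp(l(b)) db by (2*pi)^{r/2} |-Hess l(b_hat)|^{-1/2} exp(l(b_hat)),
    Eq. (3), where b_hat maximizes l; also return lambda_min(-Hess l(b_hat))."""
    r = len(b0)
    res = optimize.minimize(lambda b: -l(np.asarray(b)), b0)
    b_hat = res.x
    H = np.zeros((r, r))                      # central-difference Hessian of l at b_hat
    for i in range(r):
        for j in range(r):
            ei, ej = h * np.eye(r)[i], h * np.eye(r)[j]
            H[i, j] = (l(b_hat + ei + ej) - l(b_hat + ei - ej)
                       - l(b_hat - ei + ej) + l(b_hat - ei - ej)) / (4 * h * h)
    V = -H                                    # V_i = -l''(b_hat) in the text
    lam_min = np.linalg.eigvalsh(V).min()
    value = (2 * np.pi) ** (r / 2) * np.exp(l(b_hat)) / np.sqrt(np.linalg.det(V))
    return value, lam_min

# Sanity check: for l(b) = log N(b; 0, 1) the integral is exactly 1.
val, lam = laplace_integral(lambda b: -0.5 * b[0] ** 2 - 0.5 * np.log(2 * np.pi), [0.5])
```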

These considerations led Lai and Shih (11) to introduce the following hybrid method for evaluating L_i(β, σ, Σ), which combines Laplace's approximation with the Monte Carlo approximation. Choose a threshold c and let V_i = −l̈_i(b̂_i).

(i) If λ_min(V_i) < c, evaluate L_i(β, σ, Σ) by the Monte Carlo approximation B^{-1} Σ_{j=1}^B ψ_i(Σ^{1/2} z_j), where z_j, j = 1, ..., B, are independent random vectors from the standard normal distribution. Note that Σ^{1/2} z_j is normal with mean 0 and covariance matrix Σ.

(ii) If λ_min(V_i) ≥ c, evaluate L_i(β, σ, Σ) by its Laplace approximation (2π)^{r/2} |V_i|^{-1/2} e^{l_i(b̂_i)}.

Following Lindstrom and Bates (2), the iterative procedure used to maximize the logarithm of ∏_{i=1}^I L_i(β, σ, Σ) first maximizes over β for fixed η = (σ, Σ) and then maximizes over η for fixed β, repeating until convergence or until a prespecified maximum number of iterative steps is reached. To avoid numerical instability in differentiating log L_i with respect to β, care should be taken when L_i computed by Monte Carlo approximation is small, in which case we can circumvent the difficulty by simply replacing L_i by its Laplace approximation, whose logarithm is convenient for differentiation. Details on the choice of the threshold c and starting values for β, σ, Σ can be found in Section 3.2 of (11). In particular, for typical population PK studies that involve both healthy volunteers, from whom intensive blood samples are taken, and clinical patients, who only have sparse blood samples, one can first single out potentially "good" studies and check their λ_min(V_i) values. It is usually adequate to choose a threshold c as low as 10 for λ_min(V_i) to determine if these potentially good studies indeed qualify for using Laplace's approximation to L_i(β, σ, Σ); see "Simulation Study" for further discussion of the choice of c. Moreover, for such experimental designs, good starting values can be obtained by using only those studies that have sufficient data, so that their θ_i can be well estimated by the nonlinear least squares estimate based on (y_ij, t_ij), 1 ≤ j ≤ n_i.

By performing simple diagnostics on the appropriateness of using Laplace's approximation to evaluate the integral in Eq. (3) for the ith subject, the hybrid approach preserves the computational simplicity of Laplace's method when it can be used and switches to the Monte Carlo method when Laplace's method fails. If the ith subject has enough data so that l_i(b) is peaked around b̂_i for (β, σ) near (β_0, σ_0), the Monte Carlo approach becomes unreliable unless B is very large or importance sampling is used to generate the B samples from a distribution that is peaked around b̂_i, so Laplace's method gives a better approximation to L_i(β, σ, Σ) in this case.
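The switching rule (i)-(ii) can then be sketched as follows, reusing laplace_integral from the sketch above; ψ_i, the log-integrand l, Σ, and the starting point b0 are placeholders to be supplied by the surrounding fitting code.

```python
import numpy as np

def hybrid_L(psi, l, Sigma, b0, c=10.0, B=50, rng=np.random.default_rng(1)):
    """Hybrid evaluation of L_i = E psi(b), b ~ N(0, Sigma): Laplace's
    approximation when lambda_min(V_i) >= c (step (ii)), otherwise plain
    Monte Carlo with B draws Sigma^{1/2} z_j (step (i))."""
    value, lam_min = laplace_integral(l, b0)   # from the sketch above
    if lam_min >= c:
        return value                           # step (ii): Laplace
    A = np.linalg.cholesky(Sigma)              # Sigma^{1/2}
    z = rng.standard_normal((B, len(b0)))
    return float(np.mean([psi(A @ zj) for zj in z]))
```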

On the other hand, if the ith subject has sparse data and l_i(b) is relatively flat in b, then applying the Monte Carlo approach is tantamount to choosing a random distribution G_i, which is the empirical distribution of a sample of B random vectors Σ^{1/2} z_j with standard normal z_j, to approximate G. As there is no need for high resolution in the random distribution used to approximate the actual G (which may not even be normal), using 50 ≤ B ≤ 200 samples in the Monte Carlo method should be able to provide enough statistical detail while maintaining a low computational cost, comparable to that of the first-order method that can be derived from Laplace's approximation (2).

We can improve the Monte Carlo method in (i) above by using importance sampling instead of sampling directly from φ_Σ. Specifically, we evaluate L_i(β, σ, Σ) by the importance sampling estimate

$$\sum_{j=1}^{B} \psi_i(\zeta_j)\, w_j \Big/ \sum_{j=1}^{B} w_j, \tag{4}$$

where P{ζ_j = Σ^{1/2} z_j} = p = 1 − P{ζ_j = b̂_i + (V_i + εI)^{-1/2} z_j} with standard normal z_j, which corresponds to sampling ζ_j from a mixture of the prior normal distribution with density φ_Σ and the posterior normal distribution with mean b̂_i and covariance matrix (V_i + εI)^{-1}, choosing some small ε > 0 to ensure that the covariance matrix is invertible. Denoting the density function of this mixture distribution by λ, note that λ(x) = p φ_Σ(x) + (1 − p) φ_{(V_i + εI)^{-1}}(x − b̂_i). The w_j in Eq. (4) are the importance weights given by w_j = φ_Σ(ζ_j)/λ(ζ_j). Note that the special case p = 1 reduces to the direct Monte Carlo method in (i) above, whereas the case p = 0 corresponds to a Monte Carlo implementation of Laplace's method. We recommend choosing p in the range 0.2 ≤ p ≤ 0.5. This choice of the importance distribution has the advantage of further incorporating the essence of Laplace's approximation in the simulation step, making the method less dependent than the direct Monte Carlo used in (11) on the choice of the threshold c. For further discussion of this importance sampling approach, see "Individualization".

In the case where the I studies contain many good ones (in the preceding sense), Lai and Shih (12) developed nonparametric maximum likelihood estimates of G, β and σ. Previous work in this direction by Mallet (15,16) and Mentré and Mallet (17) assumes that the x_i are i.i.d., so that β can be estimated via the joint distribution of (x_i, b_i), which can be estimated by using the algorithms developed in (15). By using the good studies to initialize the nonparametric maximum likelihood estimate of (G, β, σ), Lai and Shih (12) were able to dispense with the restrictive assumption that the x_i be i.i.d. and to estimate the finite-dimensional parameter β directly, without going through the much more difficult infinite-dimensional problem of estimating the joint distribution of (x_i, b_i).
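A sketch of the importance-sampling estimate (4) under the mixture proposal; ψ_i, Σ, b̂_i and V_i are placeholders for quantities produced by the fitting code, and the defaults for p and B follow the recommendations in the text.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def is_L(psi, Sigma, b_hat, V, p=0.3, B=50, eps=1e-6,
         rng=np.random.default_rng(2)):
    """Importance-sampling estimate (4): zeta_j drawn from the mixture
    p*N(0, Sigma) + (1-p)*N(b_hat, (V + eps*I)^{-1}), with weights
    w_j = phi_Sigma(zeta_j) / lambda(zeta_j)."""
    r = len(b_hat)
    post_cov = np.linalg.inv(V + eps * np.eye(r))
    from_prior = rng.random(B) < p
    zeta = np.where(from_prior[:, None],
                    rng.multivariate_normal(np.zeros(r), Sigma, size=B),
                    rng.multivariate_normal(b_hat, post_cov, size=B))
    lam = (p * mvn.pdf(zeta, mean=np.zeros(r), cov=Sigma)
           + (1 - p) * mvn.pdf(zeta, mean=b_hat, cov=post_cov))
    w = mvn.pdf(zeta, mean=np.zeros(r), cov=Sigma) / lam
    vals = np.array([psi(z) for z in zeta])
    return float(np.sum(vals * w) / np.sum(w))
```

Setting p = 1 recovers the direct Monte Carlo method in (i), and p = 0 samples only from the Laplace-style posterior approximation, mirroring the two special cases noted above.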

They found, however, that even when G is highly non-normal (e.g., has a bimodal distribution), the parametric estimates of β and σ that assume normal G compare favorably with the nonparametric estimates. An asymptotic theory explaining this is given in (11). Since the nonparametric maximum likelihood estimate Ĝ has relatively low resolution (with a very slow rate of convergence to G as the total sample size n_1 + ... + n_I becomes infinite), approximating the population distribution G by a normal distribution (with covariance matrix Σ to be estimated from the data), or by the random distribution G_i when l_i(b) is relatively flat in the hybrid method, is usually an innocuous assumption in population PK/PD models.

FLEXIBLE REGRESSION MODELING AND LIKELIHOOD INFERENCE

The function g in Eq. (1) is often assumed to be linear in β and x_i because of simplicity and ease of interpretation, but this may be overly restrictive in practice. Mandema et al. (18) introduced a three-step procedure to relax this linearity assumption. The first step consists of fitting a basic PK/PD model without covariates, from which empirical Bayes estimates θ̂_i = (θ̂_i1, ..., θ̂_ir) of θ_i are derived. In the second step, θ̂_im (m = 1, ..., r) is regressed on the covariates x_i = (x_i1, ..., x_ip) by using a generalized additive model of the form

$$\hat\theta_{im} = a_m + \sum_{l=1}^{p} g_{ml}(x_{il}), \qquad m = 1, \ldots, r, \tag{5}$$

in which the constants a_m and the functions g_ml are estimated from (θ̂_i, x_i), i = 1, ..., I, using splines of degree k to approximate the functional form of g_ml. A stepwise addition/deletion method is used to decide which covariates should be included in the model by using Akaike's information criterion (AIC), which is also used to choose the degree k of the splines. In the third step, NONMEM is used to estimate the parameters of the model chosen in the second step.

The additivity assumption in generalized additive models precludes interactions among the covariates. Moreover, the empirical Bayes estimate θ̂_i in Eq. (5), derived from the PK/PD model without covariates, may differ considerably from the actual θ_i, which is not observable and which in fact depends on covariates. Since the hybrid method enables us to carry out full likelihood computations, we propose to apply a likelihood-based model selection procedure that consists of stepwise forward addition followed by stepwise backward elimination, similar to that introduced by Kooperberg et al. (19) for generalized linear models (without random effects).

In addition, we propose to use, instead of additive regression models, the following regression functions, which do not require the additivity assumption and which are widely used in nonparametric multiple regression because of their attractive statistical and computational properties.

(i) Regression splines: For univariate x_i, a regression spline of degree k is a piecewise polynomial, for which the regions that define the pieces are separated by knots and the polynomials join smoothly at the knots. It can be expressed as a linear combination of the basis functions 1, x_i, ..., x_i^k and (x_i − ξ_j)_+^k, where the ξ_j are the knots and t_+ = max(0, t). An alternative piecewise polynomial basis that has computational advantages consists of the B-splines (20). For multivariate x_i, one can define regression splines by adding tensor products of the univariate regression splines, i.e., choosing g in Eq. (1) to be

$$g(x_i, \beta) = \beta_0 + \sum_{m=1}^{M} \beta_m B_m(x_i), \tag{6}$$

in which B_m(x_i) is a product of terms of the form x_{ij}^l or (x_{ij} − ξ_{mj})_+^k, for some 1 ≤ l ≤ k. Note that estimating the coefficients β_m of regression splines once the knots are determined involves the same procedure as that of the traditional mixed effects model with linear regression of the random effects on the covariates. Moreover, linear regression corresponds to the special case in which there are no knots. It is convenient to choose the knots at certain quantiles of the predictor variables. A much more computationally intensive adaptive knot selection scheme has been developed by Friedman (21) for the case k = 1 (i.e., linear splines) in his MARS (multivariate adaptive regression splines) procedure. In population PK/PD studies, because the number I of subjects is typically not larger than a few hundred and because many of these subjects have sparse data, using quadratic splines with a few knots at some quantiles of the predictor variables typically suffices. Stone (22) has established certain asymptotic optimality properties of these spline approximations to regression functions.
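For the univariate case, a minimal sketch of the truncated-power spline basis with knots at quantiles of the predictor, as suggested above; the function name and the quadratic default are our own choices.

```python
import numpy as np

def spline_basis(x, knots, degree=2):
    """Truncated-power basis of a regression spline of the given degree:
    columns 1, x, ..., x^k and (x - xi)_+^k for each knot xi, so that
    g(x, beta) = basis @ beta is linear in beta once the knots are fixed,
    as in Eq. (6)."""
    cols = [x ** d for d in range(degree + 1)]
    cols += [np.clip(x - xi, 0.0, None) ** degree for xi in knots]
    return np.column_stack(cols)

age = np.linspace(2, 75, 200)
knots = np.quantile(age, [0.25, 0.50, 0.75])   # knots at the quartiles
X = spline_basis(age, knots)                   # 200 x 6 design matrix
```

For multivariate covariates, the tensor-product terms B_m(x_i) of Eq. (6) are columnwise products of such univariate basis columns.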

(ii) Neural networks: The term "neural network" refers to a multi-layer regression function that represents the output in each unit of a layer as a nonlinear function of linear combinations of the inputs, which are the outputs from the units in the previous layer. A popular choice of the nonlinear function is the sigmoid φ(z) = 1/(1 + exp(−z)), and a simple class of neural networks that suffices for PK/PD applications is the feedforward neural network model (NN_k) with a single layer of k sigmoidal hidden units: g(x) = γ_0 + Σ_{j=1}^k γ_j φ(a_j + α_j^T x). Barron (23) has proved that a large class of smooth functions can be approximated by sums of single-layer sigmoidal functions, with an integrated squared error of order O(k^{-1}) that does not depend on the dimension p of x. This is often called the "universal approximation property" of feedforward neural networks for multivariate function approximation, which circumvents the curse of dimensionality in p. To include the commonly used linear regression function as a special case, we propose to augment NN_k to a more general model of the form

$$g(x_i, \beta) = \gamma_0 + \alpha_0^T x_i + \sum_{j=1}^{k} \gamma_j \varphi(a_j + \alpha_j^T x_i), \tag{7}$$

with parameter vector β = (γ_0, γ_1, ..., γ_k, a_1, ..., a_k, α_0^T, α_1^T, ..., α_k^T).
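A sketch of evaluating the augmented network (7); the parameter-packing convention is our own, and k = 0 recovers ordinary linear regression, as intended above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_k(x, beta, k):
    """Evaluate Eq. (7): g(x) = gamma_0 + alpha_0'x + sum_{j=1}^k gamma_j *
    sigmoid(a_j + alpha_j'x). beta packs (gamma_0..gamma_k, a_1..a_k,
    alpha_0, alpha_1, ..., alpha_k); x has shape (n, p)."""
    n, p = x.shape
    gamma = beta[: k + 1]
    a = beta[k + 1 : 2 * k + 1]
    alpha = beta[2 * k + 1 :].reshape(k + 1, p)
    out = gamma[0] + x @ alpha[0]
    for j in range(1, k + 1):
        out = out + gamma[j] * sigmoid(a[j - 1] + x @ alpha[j])
    return out

# Example with k = 1 hidden unit and p = 2 covariates (7 parameters in all).
x = np.random.default_rng(3).standard_normal((5, 2))
g = nn_k(x, beta=np.array([0.1, 1.0, -0.5, 0.3, 0.2, -0.7, 0.4]), k=1)
```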

Likelihood Inference

Let R_i = log L_i, and use Ṙ_i to denote the gradient vector of partial derivatives and R̈_i to denote the Hessian matrix of second derivatives of R_i with respect to σ and the components of β and Σ. A consistent estimator of the asymptotic covariance matrix of (β̂, σ̂, Σ̂) is V̂^{-1}, where V̂ is the observed information matrix −Σ_{i=1}^I R̈_i(β̂, σ̂, Σ̂), which can be computed by taking numerical derivatives and using the hybrid method to evaluate L_i; see (11). By Theorem 2 of (11), V̂^{1/2}(β̂ − β, σ̂ − σ, (Σ̂_ij − Σ_ij)_{1 ≤ i ≤ j ≤ r}) has a limiting standard multivariate normal distribution as I → ∞ under certain regularity conditions.

To test a null hypothesis H that some components of ω = (β, σ, Σ) satisfy certain equality constraints (e.g., that a specified subset of the components of β are 0), we can use the hybrid method to compute the generalized likelihood ratio statistic

$$\mathrm{GLR} = 2\left\{ \sum_{i=1}^{I} R_i(\hat\beta, \hat\sigma, \hat\Sigma) - \sum_{i=1}^{I} R_i(\hat\omega_H) \right\}, \tag{8}$$

where ω̂_H is the maximum likelihood estimate of ω under H. Because we have a smooth parametric family here, Eq. (8) has a limiting χ² distribution (as I → ∞), with degrees of freedom equal to the difference between the dimensionalities of H and the unconstrained parameter space.

Simulation studies in (8)-(10) have shown the significance levels of GLR tests using the χ² approximation to be anti-conservative. A possible explanation for this is that the first-order approximation to the likelihood function used in these simulation studies may yield nominal GLR values that differ substantially from the actual values. Therefore, using a more accurate method such as the hybrid method to compute the left-hand side of Eq. (3) can improve the approximation. On the other hand, the sample size (or, more precisely, the information content of the experimental design) may not be large enough for the applicability of the limiting χ² distribution of (exact) GLR statistics. A more reliable approach is to compute the sampling distribution of the approximate GLR statistic by bootstrap simulations. The bootstrap test involves computing the sampling distribution of the GLR statistic by Monte Carlo simulations assuming the unknown parameters to take the value ω̂_H, and its theoretical justification is that the GLR statistic is an approximate pivot under the null hypothesis; see (24). For similar reasons, it is more reliable to use the bootstrap method (24, Sections 12.5 and 21.5) to construct likelihood confidence regions for ω than to apply the asymptotic normality of V̂^{1/2}(β̂ − β, σ̂ − σ, (Σ̂_ij − Σ_ij)_{1 ≤ i ≤ j ≤ r}), which results in ellipsoidal confidence regions; see Sections 4.2 and 4.4 of (25) for further details in a related application.
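A sketch of the parametric-bootstrap calibration just described; loglik_full and loglik_null (maximized log-likelihoods) and simulate_null (a data generator under ω̂_H) are hypothetical callbacks standing in for the hybrid-method fitting routines.

```python
import numpy as np

def bootstrap_glr_pvalue(loglik_full, loglik_null, simulate_null, data,
                         n_boot=2000, rng=np.random.default_rng(4)):
    """Parametric-bootstrap calibration of the GLR statistic of Eq. (8):
    datasets are simulated with the unknown parameters set to the null
    estimate omega_hat_H, and the observed GLR is referred to the
    resulting Monte Carlo null distribution."""
    glr_obs = 2.0 * (loglik_full(data) - loglik_null(data))
    glr_boot = np.empty(n_boot)
    for b in range(n_boot):
        data_b = simulate_null(rng)          # new dataset under omega_hat_H
        glr_boot[b] = 2.0 * (loglik_full(data_b) - loglik_null(data_b))
    return (1.0 + np.sum(glr_boot >= glr_obs)) / (n_boot + 1.0)
```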

Model Selection

Model selection consists of choosing the number of knots in regression splines, or the number of hidden units in single-layer neural networks, and which of a set of covariates should be included in the model. Two commonly used likelihood-based criteria are Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), where

$$\mathrm{BIC} = -2 \sum_{i=1}^{I} \log L_i(\hat\beta, \hat\sigma, \hat\Sigma) + q \log N, \tag{9}$$

with q log N in Eq. (9) replaced by 2q for AIC, in which q is the number of unknown parameters in the model and N is the total number of observations. To circumvent the computational complexity of minimizing the information criterion over a potentially large set of models, we use a model selection procedure that consists of stepwise forward addition followed by stepwise backward elimination, similar to that introduced in (19) for polychotomous regression. To fix the ideas, we consider the case of regression splines, for which the spline basis in Eq. (6) also includes the choice of covariates. For the forward addition of basis functions, we follow the guidelines of Kooperberg et al. (19): (a) The constant function 1 is included in Eq. (6). (b) If (x_ik − ξ_mk)_+^r (x_ih − ξ_mh)_+^r is included in Eq. (6), then so are (x_ik − ξ_mk)_+^r and (x_ih − ξ_mh)_+^r; thus, main effects are included before interactions. (c) If (x_ik − ξ_mk)_+^r is included in Eq. (6), then so are x_ik^l, l = 1, ..., r; thus, polynomials of degree r are included before knots are incorporated. One reason for these guidelines is that adding main effects before incorporating knots and then adding interactions yields models that are easier to interpret. Another reason is that Stone's (22) theory on the optimal rate of convergence in nonparametric function estimation assumes such a hierarchical structure.

As in (19), a forward addition step aims at adding to the current model the most significant among all possible candidates. Specifically, suppose M − 1 spline basis functions, with coefficients β̂_1, ..., β̂_{M−1}, together with σ̂ and Σ̂, have been fitted to the model and an additional spline basis function is to be chosen from k candidates. For the jth candidate, we consider the Mth component s_M^(j) of the score vector Σ_{i=1}^I Ṙ_i((β̂_1, ..., β̂_{M−1}, 0), σ̂, Σ̂) for the model that includes the M − 1 basis functions and the jth candidate. Let ṽ_M^(j) be the Mth diagonal element of −Σ_{i=1}^I R̈_i((β̂_1, ..., β̂_{M−1}, 0), σ̂, Σ̂). We choose the candidate that maximizes |s_M^(j)|/√ṽ_M^(j) for forward addition. The rationale for using score statistics instead of the GLR statistics in Eq. (8) for this pseudo-test of H: β_M^(j) = 0 is their computational simplicity in ranking the significance of the k candidates. Stepwise forward addition terminates when the information criterion for model selection does not decrease with the addition of a basis function, or when there is no candidate basis function left to be included.

Stepwise backward elimination begins upon the termination of stepwise forward addition, and proceeds until the information criterion for model selection does not improve with the deletion of a basis function. As in (19), a backward elimination step aims at excluding from the current model the least significant basis function, where significance is ranked by Wald statistics. Specifically, suppose M spline basis functions, with coefficients β̂_1, ..., β̂_M, together with σ̂ and Σ̂, have been fitted to the model. The Wald statistic for testing β_j = 0 is β̂_j/ŝe_j, where ŝe_j is the square root of the jth diagonal element of {−Σ_{i=1}^I R̈_i(β̂, σ̂, Σ̂)}^{-1}. If elimination of the basis function with the smallest Wald statistic leads to a new model with a smaller BIC (or AIC), then the new model is preferred.

Note that in the preceding model selection procedure, the score statistics and Wald statistics are only used to rank the significance of the candidate basis functions for inclusion or deletion, and selection is based on an information criterion rather than on significance testing. Therefore, the issue of anti-conservative significance tests in model selection raised in (8)-(10) is not relevant to our model selection procedure. Following (19), we use BIC (instead of AIC) in the subsequent simulation and experimental studies.
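The forward-addition loop can be sketched generically as follows; score_stat(model, j), returning |s_M^(j)|/√ṽ_M^(j), and bic(model) are hypothetical callbacks wrapping the likelihood computations described above.

```python
def forward_addition(candidates, score_stat, bic, max_terms=20):
    """Stepwise forward addition: repeatedly add the candidate basis function
    with the largest score statistic, stopping when BIC no longer decreases
    or no candidates remain; backward elimination (ranked by Wald statistics)
    would then prune the returned model."""
    model, best = [], bic([])
    pool = list(candidates)
    while pool and len(model) < max_terms:
        j_star = max(pool, key=lambda j: score_stat(model, j))
        trial = model + [j_star]
        trial_bic = bic(trial)
        if trial_bic >= best:
            break
        model, best = trial, trial_bic
        pool.remove(j_star)
    return model, best
```

As in the text, the score statistic only ranks candidates within a step; the accept/reject decision rests on the information criterion.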

For the neural network basis, we can again proceed by stepwise forward addition and backward elimination, with the number k of hidden units varying between 0 and some small upper bound K. The forward procedure chooses sequentially, for each k, the variables to be included in NN_k. It enters one variable at a time, choosing from the set of variables not already entered the one with the largest score statistic. Also, it starts with k = 0 (the linear regression case), then proceeds to k = 1, and so on. The forward selection terminates with a k and a set of variables associated with NN_k. Backward elimination then proceeds to delete variables sequentially from NN_k.

INDIVIDUALIZATION AND REGRESSION DIAGNOSTICS

Individualization

An important application of a nonlinear mixed effects model is the individualization problem of estimating a function h(θ) of a subject's unobservable parameter θ, given the subject's covariate x and some (or even no) measurements taken from the subject. To fix the ideas, assume that all the f_i in Eq. (1) are equal to f, and that the standard deviations ω_ij(θ_i) of ε_ij/σ are of the form ω(t_ij, θ_i). If β, σ and Σ are known, then a natural estimate of h(θ) is the posterior mean of h(θ) given the subject's data. Without assuming β, σ and Σ to be known, the empirical Bayes approach replaces them in the posterior mean by their estimates β̂, σ̂, Σ̂, so that h(θ) is estimated by

$$\hat h = E_{\hat\beta,\hat\sigma,\hat\Sigma}\{h(\theta) \mid \text{subject's data}\}. \tag{10}$$

The expectation in Eq. (10) can be evaluated by the hybrid method that we have used for likelihood calculation. First note that Laplace's approximation in Eq. (3) is based on the Taylor expansion

$$l_i(b) \doteq l_i(\hat b_i) + (b - \hat b_i)\, \ddot l_i(\hat b_i)\, (b - \hat b_i)^T / 2. \tag{11}$$

Since the density function of b_i given the subject's data is proportional to e^{l_i(b)}, it follows from Eq. (11) that the conditional distribution of b_i given the ith subject's data is approximately normal with mean b̂_i and covariance matrix V_i^{-1}, where V_i = −l̈_i(b̂_i).

Hence for a new subject with informative data (i.e., satisfying λ_min(V) ≥ c), for whom we drop the subscript i in l_i, V_i and b̂_i, we can use the above normal density to evaluate E_{β̂,σ̂,Σ̂}{h(θ) | subject's data} approximately via

$$h(\theta) = h(g(x, \hat\beta) + b) \doteq h\bigl(g(x, \hat\beta) + \hat b + V^{-1/2} Z\bigr), \tag{12}$$

where Z is standard normal. The expectation with respect to Z can be evaluated either by the Taylor expansion

$$h(\theta) \doteq h(g(x,\hat\beta) + \hat b) + \dot h(g(x,\hat\beta) + \hat b)\, V^{-1/2} Z + Z^T V^{-1/2}\, \ddot h(g(x,\hat\beta) + \hat b)\, V^{-1/2} Z / 2,$$

whose expectation yields

$$\hat h \doteq h(g(x,\hat\beta) + \hat b) + \tfrac{1}{2}\, \mathrm{tr}\bigl\{ V^{-1/2}\, \ddot h(g(x,\hat\beta) + \hat b)\, V^{-1/2} \bigr\}, \tag{13a}$$

or by Gaussian quadrature applied to h̃(z) = h(g(x, β̂) + b̂ + V^{-1/2} z), yielding

$$\hat h \doteq (2\pi)^{-r/2} \int \tilde h(z) \exp\{-z^T z/2\}\, dz \doteq \sum_{j_1=1}^{J} \cdots \sum_{j_r=1}^{J} \Bigl( \prod_{i=1}^{r} a_{j_i} \Bigr) \tilde h(\sqrt{2}\, s_j), \tag{13b}$$

with a small J, where, for the multi-index j = (j_1, ..., j_r), s_j = (s_{j_1}, ..., s_{j_r})^T, and {s_j}_{j=1}^J, {a_j}_{j=1}^J are predetermined sequences such that the approximation is exact when r = 1 and h̃(·) is a polynomial of degree less than 2J.

When the eigenvalue criterion in the hybrid method fails (i.e., when λ_min(V) < c), Eq. (10) can be evaluated by using importance sampling for the Monte Carlo approximation

$$\hat h \doteq \sum_{j=1}^{J} h\bigl(g(x, \hat\beta) + b^{(j)}\bigr) w_j \Big/ \sum_{j=1}^{J} w_j, \tag{14}$$

where {b^(1), ..., b^(J)} are independent samples from some density λ(·) and the importance weights are w_j = e^{l(b^(j))}/λ(b^(j)), noting that the posterior density function of the subject's random effect b given the subject's data is proportional to e^{l(b)}. The density λ is typically chosen so that it is easy to sample from, so that it has a simple formula for the weights w_j, and so that the coefficient of variation of the w_j is not too large. One good choice of λ is a mixture of the prior normal distribution N(0, Σ̂) and the posterior normal distribution N(b̂, (−l̈(b̂) + εI)^{-1}), where ε is a positive constant that ensures the covariance matrix is positive definite. We recommend choosing 0.2 ≤ p ≤ 0.5 in the mixing proportion p : (1 − p) for the prior N(0, Σ̂) versus the posterior normal distribution in the mixture.
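For the quadrature route in Eq. (13b), a minimal sketch using Gauss-Hermite nodes; after the change of variables z = √2 x, the rescaled weights play the role of the a_j, and the function and parameter names are our own.

```python
import numpy as np

def gh_mean(h_tilde, r, J=5):
    """Evaluate (2*pi)^{-r/2} * int h_tilde(z) exp(-z'z/2) dz over R^r by a
    tensor product of J-point Gauss-Hermite rules, as in Eq. (13b)."""
    s, a = np.polynomial.hermite.hermgauss(J)     # nodes/weights for e^{-x^2}
    a = a / np.sqrt(np.pi)                        # rescaled weights sum to 1
    grids = np.meshgrid(*([s] * r), indexing="ij")
    z = np.sqrt(2.0) * np.stack([g.ravel() for g in grids], axis=-1)
    w = np.ones(J ** r)
    for gw in np.meshgrid(*([a] * r), indexing="ij"):
        w *= gw.ravel()
    return float(np.sum(w * np.apply_along_axis(h_tilde, 1, z)))

# Sanity check: E[Z1^2] = 1 for Z ~ N(0, I_r).
print(gh_mean(lambda z: z[0] ** 2, r=2, J=5))     # ~1.0
```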

Regression Diagnostics

The empirical Bayes idea can also be used to provide diagnostics for the regression function g. If the individual parameters θ_i were observed, the residuals r_i = θ_i − g(x_i, β̂) would provide approximations for the i.i.d. random variables b_i that are not observable. Therefore, substantial deviation of these residuals from an i.i.d. pattern would suggest inadequacies and possible improvements of the assumed regression model. Since the θ_i are not observed, we propose to replace them by the empirical Bayes estimate E_{(β̂,σ̂²,Σ̂)}(θ_i | y_i, t_i, x_i), leading to the following generalized residuals in the sense of Cox and Snell (14):

$$\hat r_i = E_{(\hat\beta,\hat\sigma^2,\hat\Sigma)}(\theta_i \mid y_i, t_i, x_i) - g(x_i, \hat\beta) = E_{(\hat\beta,\hat\sigma^2,\hat\Sigma)}(b_i \mid y_i, t_i, x_i), \qquad i = 1, \ldots, I. \tag{15}$$

Noting that the first expectation in Eq. (15) is a special case of Eq. (10) with h(θ) = θ, the calculation of r̂_i can be carried out by the hybrid method. These Cox-Snell-type generalized residuals r̂_i provide better approximations to the unobservable residuals r_i = θ_i − g(x_i, β̂), particularly when the ith subject has sparse data, than the computationally more convenient b̂_i, which Maitre et al. (26) proposed to use as residuals.

SIMULATION STUDY

Consider a one-compartment open model with first-order absorption given by Eq. (1), in which θ_i = (log V_i, log k_ai, log k_ei), where V_i, k_ai and k_ei denote the ith subject's volume of distribution, absorption rate and elimination rate, and f_i = f with

$$f(t_{ij}, \theta_i) = \frac{500\, k_{ai}}{V_i (k_{ai} - k_{ei})}\left(e^{-k_{ei} t_{ij}} - e^{-k_{ai} t_{ij}}\right), \qquad \theta_i = g(x_i, \beta) + b_i = \bigl(g_1(x_i) + b_{i1},\ \beta_2 + b_{i2},\ \beta_3 + b_{i3}\bigr), \tag{16}$$

in which x_i is the subject's age. Assume that ε_ij is normal with standard deviation σ f(t_ij, θ_i), and that b_i1, b_i2 and b_i3 are independent normal with mean 0 and var(b_ik) = τ_k², k = 1, 2, 3. We take g_1(x_i) = 3 + exp{−(x_i − 20)(x_i − 25)(x_i − 40)}, β_2 = 1, β_3 = −1, τ_1 = 0.1, τ_2 = 0.5, τ_3 = 0.2, σ = 0.1. One hundred datasets are generated from this model, each consisting of I_1 = 30 subjects with 15 measurements each, taken at times t_ij = 0.17, 0.33, 0.5, 0.75, 1, 1.5, 2, 2.5, 3, 4, 5, 6, 8, 10, 12, and I_2 = 100 subjects with sparse (2 to 4) measurements taken at timepoints that are randomly selected from the above 15 timepoints. For the 30 subjects with 15 measurements each, their ages x_i are randomly sampled from the uniform distribution on {19, ..., 65}. The x_i for the 100 subjects with sparse data are randomly chosen from seven strata: 2-6, 7-12, 13-18, 19-24, 25-40, 41-65, 66-75, with stratum sizes proportional to 1:1:1:2:2:2:1.
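A sketch of the data-generating mechanism of Eq. (16). The printed form of g_1 appears to have lost a scaling factor in extraction (as printed it overflows numerically), so the code substitutes a clearly labeled hypothetical stand-in with a similar qualitative shape (a bump near x = 10, as in Figure 1); the sign of β_3 is likewise our reading of the garbled source, and the stratified ages of the sparse subjects are simplified to uniform sampling.

```python
import numpy as np

rng = np.random.default_rng(5)
times = np.array([0.17, 0.33, 0.5, 0.75, 1, 1.5, 2, 2.5, 3, 4, 5, 6, 8, 10, 12])
beta2, beta3 = 1.0, -1.0                # log k_a and log k_e (sign of beta3 assumed)
tau = np.array([0.1, 0.5, 0.2])         # sd of b_i1, b_i2, b_i3
sigma = 0.1

def g1(x):
    # Hypothetical stand-in for the paper's g1: a smooth bump near x = 10
    # on a baseline of 3 (the printed g1 lost its scaling in extraction).
    return 3.0 + 0.5 * np.exp(-((x - 10.0) / 5.0) ** 2)

def simulate_subject(age, t):
    b = tau * rng.standard_normal(3)
    V, ka, ke = np.exp([g1(age) + b[0], beta2 + b[1], beta3 + b[2]])
    f = 500 * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))
    return f * (1.0 + sigma * rng.standard_normal(t.size))  # sd proportional to mean

# 30 intensively sampled subjects with ages uniform on {19, ..., 65}
dense = [simulate_subject(rng.integers(19, 66), times) for _ in range(30)]
# 100 sparse subjects with 2-4 measurements (stratified ages simplified here)
sparse = [simulate_subject(age, rng.choice(times, size=rng.integers(2, 5), replace=False))
          for age in rng.integers(2, 76, size=100)]
```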

We applied nonparametric modeling of g_1 using regression splines or neural networks,

$$g_1(x_i) = \alpha_0 + \alpha_1 x_i + \cdots + \alpha_m x_i^m + \sum_{k=1}^{K} \gamma_k (x_i - \xi_k)_+^m \quad \text{or} \quad g_1(x_i) = \gamma_0 + \alpha_0 x_i + \sum_{k=1}^{K} \gamma_k \psi(a_k + \alpha_k x_i),$$

in conjunction with the hybrid method with c = 10 and B = 50, to estimate g_1, β_2, β_3, σ, τ_1, τ_2 and τ_3.

A Typical Dataset

In this simulated dataset, there are 31 subjects with 2 observations each, 32 subjects with 3 observations, and 37 subjects with 4 observations. The five-number summaries (minimum, 1st quartile, median, 3rd quartile, maximum) of the age variable x_i are (19, 22.25, 32.5, 43.5, 60) for the 30 subjects with 15 measurements each, and (2, 17, 24.5, 44.5, 74) for the 100 subjects who have sparse measurements. Figure 1 shows that the fitted ĝ_1 by neural networks with 1 hidden unit (NN1, dot-and-dash curve) and suitably chosen regression splines (long dashed curve) approximate the true regression function (solid curve) quite well, while the linear model (dotted line) does not catch the nonlinear pattern around x = 10. Table I gives the estimates of β_2, β_3, σ, τ_1, τ_2, and τ_3 together with the BIC values for different models. Here the selected neural network model has one hidden unit (NN1); the selected spline model has order 2. Table I shows that for this particular dataset the spline model has the smallest BIC. The estimates of β_3 and σ are close to the true values, but the estimates of β_2 and of the variances of the random effects are larger than the true values. Note that there are many fewer observations in the absorption phase than in the elimination phase, which may account for the less accurate estimate of the absorption rate.

To examine the sensitivity of the parameter estimates to the choice of c in the hybrid estimation method, we refitted the linear model using c = 5, 10, 15, 30, 60. Table II shows that the parameter estimates are quite similar for c ranging from 5 to 60.

Fig. 1. Fitted regression function ĝ_1 for a typical simulated PK study. The solid curve represents the true function. The dotted, dot-and-dash, and long dashed curves are, respectively, the linear, NN1, and splines models fitted by the hybrid method. The short dashed curve is the linear model fitted by nlme.

Table I. Estimates of Regression Parameters β_2, β_3, Measurement Error Standard Deviation σ, the Standard Deviations τ_1, τ_2, τ_3 of the Random Effects, and BIC for a Typical Dataset (columns: β_2, β_3, σ, τ_1, τ_2, τ_3, BIC; rows: True, Linear (nlme), Linear, NN, Spline).

Also given in Table II is the CPU time (in seconds) needed per iteration. For comparison, the CPU time required per iteration by nlme to fit the same linear model is shorter than that of the hybrid method, because nlme uses Laplace's approximation for all individuals and does not need to calculate the eigenvalues of V_i.

Table II. Estimates of Regression Parameters β_2, β_3, Measurement Error Standard Deviation σ, the Standard Deviations τ_1, τ_2, τ_3 of the Random Effects, and CPU Time (s) per Iteration for a Typical Dataset, under the Linear Model and a Range of Values for the Threshold c (columns: c, β_2, β_3, σ, τ_1, τ_2, τ_3, CPU time (s) per iteration).

As expected, the CPU time needed per iteration for the hybrid method increases with c in Table II, because for larger c more individuals are classified as inadequate for the application of Laplace's approximation in the likelihood evaluation. In particular, c = 0 corresponds to using Laplace's approximation for all individuals, and c = ∞ corresponds to using importance sampling, which requires longer computing time, for all individuals. Based on the tradeoff between the computational effort and the approximation error, we chose c = 10 for the numerical examples in this paper.

We applied the fitted model to estimate the concentration versus time curve of a new subject at age x = 30 with no measurements or with only a few concentration measurements. We used the Monte Carlo estimate given in Eq. (14) with J = 50 importance samples and p = 0.2, with a small ε > 0 to form the mixture importance density. The estimated concentration versus time curves are plotted in Figure 2, which shows that, by making use of the data from the 130 subjects, the fitted regression models provide good empirical Bayes estimates of the concentration versus time curve when the new subject has only three observations.

Bias and Standard Error Based on 100 Simulated Datasets

Table III shows the mean and standard error for the estimates of β_2, β_3, σ, τ_1, τ_2, τ_3 under various models fitted by the hybrid method. Also given for comparison are the corresponding values for the linear model fitted by the S-function nlme. It shows that the estimates are generally close to the true values except for the upward bias in τ_1, which reflects the approximation error ĝ_1 − g_1 for volume of distribution. As expected, the bias in τ_1 for the splines or NN1 model is smaller than that of the linear model because the former approximates the nonlinear function g_1 better. In addition, the standard errors of all hybrid estimates are in general smaller than those of nlme, indicating that the hybrid method gives more precise estimates than nlme.

Fig. 2. Estimates of the concentration curve for a new subject with covariate x = 30, based on a fitted population PK model: (a) when no observation is available, (b) when two observations are available, (c) when three observations are available. The solid, dotted, dot-and-dash and dashed curves denote, respectively, the subject's true concentration curve and the Bayes estimates of f(t, θ) based on the fitted linear, NN1 and spline models.

Since the unknown regression function g_1(x) is nonlinear and the random effects b_i1, b_i2, b_i3 are unobservable, we evaluated the combined mean absolute error of (ĝ_1, β̂_2, β̂_3, τ̂_1, τ̂_2, τ̂_3) by the relative mean absolute error

$$\mathrm{rmae} = \sum_{i=1}^{I} \sum_{j=1}^{n_i} \bigl| \hat f_{ij} / \tilde f_{ij} - 1 \bigr| \Big/ \sum_{i=1}^{I} n_i \tag{17}$$

for each simulated dataset, where

$$\hat f_{ij} = M^{-1} \sum_{m=1}^{M} f\bigl(t_{ij};\ \hat g_1(x_i) + \hat\tau_1 z^{(m)}_{i1},\ \hat\beta_2 + \hat\tau_2 z^{(m)}_{i2},\ \hat\beta_3 + \hat\tau_3 z^{(m)}_{i3}\bigr),$$
$$\tilde f_{ij} = M^{-1} \sum_{m=1}^{M} f\bigl(t_{ij};\ g_1(x_i) + \tau_1 z^{(m)}_{i1},\ \beta_2 + \tau_2 z^{(m)}_{i2},\ \beta_3 + \tau_3 z^{(m)}_{i3}\bigr), \tag{18}$$

and {z^(1)_i1, z^(1)_i2, z^(1)_i3, ..., z^(M)_i1, z^(M)_i2, z^(M)_i3} is a set of independent standard normal random variables.
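A sketch of the rmae computation in Eqs. (17)-(18); f_hat and f_tilde are supplied per subject as arrays of the Monte Carlo averages over that subject's timepoints.

```python
import numpy as np

def rmae(f_hat, f_tilde):
    """Relative mean absolute error, Eq. (17): the average over all
    observations of |f_hat_ij / f_tilde_ij - 1|, where each list entry
    holds one subject's Monte Carlo averages from Eq. (18)."""
    num = sum(np.sum(np.abs(fh / ft - 1.0)) for fh, ft in zip(f_hat, f_tilde))
    n_total = sum(ft.size for ft in f_tilde)
    return num / n_total
```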

Table III. Mean and Standard Error (in parentheses) for Estimates of β_2, β_3, Measurement Error Standard Deviation σ, and the Standard Deviations τ_1, τ_2, τ_3 of the Random Effects for the Simulation Study, Based on 100 Simulated Datasets

Model | β_2 | β_3 | σ | τ_1 | τ_2 | τ_3
True | | | | | |
Linear (nlme) | (0.285) | (0.277) | (0.066) | (0.091) | (0.174) | (0.064)
Linear | (0.042) | (0.009) | (0.004) | (0.052) | (0.076) | (0.023)
NN1 | (0.042) | (0.009) | (0.004) | (0.051) | (0.056) | (0.023)
Spline | (0.102) | (0.022) | (0.004) | (0.066) | (0.058) | (0.027)

The average over the M random vectors (z^(m)_i1, z^(m)_i2, z^(m)_i3), 1 ≤ m ≤ M, in Eq. (18) is basically a Monte Carlo estimate of the expected concentration

$$\bar f_{ij} \doteq \int f\bigl(t_{ij};\ g_1(x_i) + b_{i1},\ \beta_2 + b_{i2},\ \beta_3 + b_{i3}\bigr)\, \varphi(b_i)\, db_i,$$

where φ denotes the N(0, diag(τ_1², τ_2², τ_3²)) density, with (g_1, β_2, β_3, τ_1, τ_2, τ_3) replaced by (ĝ_1, β̂_2, β̂_3, τ̂_1, τ̂_2, τ̂_3) for f̂_ij. Here we choose to work with the relative error in Eq. (17) because the standard deviation of ε_ij is proportional to the mean, so normalization by the mean is needed. Also, we choose absolute rather than squared errors because the former is more stable and robust. Figure 3 shows the boxplots of the 100 replicates of rmae, using M = 500 to compute f̂_ij and f̃_ij in Eq. (18). The NN1 and splines models perform much better than the linear models.

EXPERIMENTAL STUDY

An orally administered cancer drug, temozolomide, was given to 65 adult patients with advanced cancer in four Phase I trials sponsored by the Schering-Plough Research Institute (Jen et al. (27)); see also a smaller pilot study by Newlands et al. (28). A total of 756 concentration measurements were collected. Each subject had concentration measurements from 10 min to 16 h after a single dose. These concentrations are modeled by the one-compartment open model in Eq. (16) that is used in the above simulation study, and the objective is to identify the influence of patient characteristics on the pharmacokinetics. Patient covariates included in the analysis are body surface area, gender, age and creatinine clearance, forming the vector x_i in Eq. (16).

Test for Gender Difference

An important question in the study was whether there was a gender difference in volume of distribution, which we addressed by using the GLR test described in Eq. (8). The GLR test statistic computed by the hybrid method gives a p-value, using the χ² approximation, that is considerably smaller than that given by nlme. To check the validity of the χ² approximation, which shows the volume of distribution to be significantly different between males and females, Figure 4 plots the quantiles of the GLR test statistic, computed by the hybrid method (left panel) and by nlme (right panel), from 2000 bootstrap replicates against the quantiles of a χ² distribution with 1 degree of freedom.

Fig. 3. Goodness of fit (rmae) for NN1, splines and linear models fitted by the hybrid method and for the linear model fitted by nlme.

Fig. 4. QQ (quantile versus quantile) plots of the generalized likelihood ratio statistic for gender difference versus the χ² distribution with 1 degree of freedom, based on the hybrid method (left panel) or nlme (right panel).

These QQ plots show the χ² approximation to be adequate for the present study.

Model Selection and Regression Diagnostics

We considered four patient covariates for the population model g_1: body surface area, gender, age, and creatinine clearance. We first examined the generalized residuals of the null model, which assumes no covariates in g_1, against each of the four covariates, and found that the residuals for volume of distribution tend to increase with body surface area. We applied the automatic model selection procedure with respect to these four covariates using three models: linear, neural networks, and splines. All three models selected body surface area for modeling volume of distribution and no covariates for the absorption or elimination rates. Table IV presents the fitted parameters of the linear model, the neural network with 1 hidden unit, and the spline model with the 3 quartiles as the knots for a continuous covariate, treating gender as a dichotomous covariate and using the same fitting procedure as that in the simulation study. The model with the smallest BIC value in Table IV is a spline model that has no knots: a quadratic function of body surface area x without an intercept term, with linear coefficient 2.917. The linear model in Table IV is linear in body surface area x, and its BIC value is only slightly larger than that of the quadratic (spline) model. Because of ease of interpretation, we chose the linear model instead of the quadratic model without an intercept term.

To check the goodness of fit, the generalized residual plots for the linear model are given in Figure 5. No specific trends are observed with respect to any covariate, indicating that after adjusting for body surface area, volume of distribution is no longer significantly different between males and females.

Table IV. Estimates of Regression Coefficients β's, Measurement Error Standard Deviation σ, and the Standard Deviations τ_1, τ_2, τ_3 of the Random Effects for the Experimental Study (columns: β_0, β_1, β_2, β_3, σ, τ_1, τ_2, τ_3, BIC; rows: Linear (nlme), Linear, NN, Spline).

Fig. 5. Generalized residuals from the linear model fitted by the hybrid method: residuals of log(V), log(k_a) and log(k_e) plotted against body surface area, gender, age and creatinine clearance. The residuals are marked by points, except in the panels for gender where box plots are used.

Analogous to the rmae introduced in Eq. (17), we measure the goodness of fit by

$$\mathrm{rmae} = \sum_{i=1}^{I} \sum_{j=1}^{n_i} \bigl| \hat y_{ij} / y_{ij} - 1 \bigr| \Big/ \sum_{i=1}^{I} n_i, \tag{19}$$

where ŷ_ij is the model-based estimate ∫ f_i(t_ij, g(x_i, β̂) + b) φ_Σ̂(b) db of y_ij. For the linear model fitted by the hybrid method, rmae = 0.306, which is comparable to σ̂. Since y_ij/f(t_ij, θ_i) has mean 1 and standard deviation σ, this suggests that the fitted linear model estimates f(t_ij, θ_i) reasonably well. The rmae values of the NN1 and splines models are comparable to that of the linear model, and all of these are better than the rmae of nlme's fitted linear model.

Empirical Bayes Estimates for an Individual's Concentrations

To illustrate the usefulness of the fitted linear model for estimating the concentration versus time curve of an adult patient after a single


Accounting for Complex Sample Designs via Mixture Models Accounting for Complex Sample Designs via Finite Normal Mixture Models 1 1 University of Michigan School of Public Health August 2009 Talk Outline 1 2 Accommodating Sampling Weights in Mixture Models 3

More information

ON THE CONSEQUENCES OF MISSPECIFING ASSUMPTIONS CONCERNING RESIDUALS DISTRIBUTION IN A REPEATED MEASURES AND NONLINEAR MIXED MODELLING CONTEXT

ON THE CONSEQUENCES OF MISSPECIFING ASSUMPTIONS CONCERNING RESIDUALS DISTRIBUTION IN A REPEATED MEASURES AND NONLINEAR MIXED MODELLING CONTEXT ON THE CONSEQUENCES OF MISSPECIFING ASSUMPTIONS CONCERNING RESIDUALS DISTRIBUTION IN A REPEATED MEASURES AND NONLINEAR MIXED MODELLING CONTEXT Rachid el Halimi and Jordi Ocaña Departament d Estadística

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I. Vector Autoregressive Model Vector Autoregressions II Empirical Macroeconomics - Lect 2 Dr. Ana Beatriz Galvao Queen Mary University of London January 2012 A VAR(p) model of the m 1 vector of time series

More information

Threshold Autoregressions and NonLinear Autoregressions

Threshold Autoregressions and NonLinear Autoregressions Threshold Autoregressions and NonLinear Autoregressions Original Presentation: Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Threshold Regression 1 / 47 Threshold Models

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

CMSC858P Supervised Learning Methods

CMSC858P Supervised Learning Methods CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Accurate Maximum Likelihood Estimation for Parametric Population Analysis. Bob Leary UCSD/SDSC and LAPK, USC School of Medicine

Accurate Maximum Likelihood Estimation for Parametric Population Analysis. Bob Leary UCSD/SDSC and LAPK, USC School of Medicine Accurate Maximum Likelihood Estimation for Parametric Population Analysis Bob Leary UCSD/SDSC and LAPK, USC School of Medicine Why use parametric maximum likelihood estimators? Consistency: θˆ θ as N ML

More information

Transformations The bias-variance tradeoff Model selection criteria Remarks. Model selection I. Patrick Breheny. February 17

Transformations The bias-variance tradeoff Model selection criteria Remarks. Model selection I. Patrick Breheny. February 17 Model selection I February 17 Remedial measures Suppose one of your diagnostic plots indicates a problem with the model s fit or assumptions; what options are available to you? Generally speaking, you

More information

Bayesian Estimation of Prediction Error and Variable Selection in Linear Regression

Bayesian Estimation of Prediction Error and Variable Selection in Linear Regression Bayesian Estimation of Prediction Error and Variable Selection in Linear Regression Andrew A. Neath Department of Mathematics and Statistics; Southern Illinois University Edwardsville; Edwardsville, IL,

More information

A brief introduction to mixed models

A brief introduction to mixed models A brief introduction to mixed models University of Gothenburg Gothenburg April 6, 2017 Outline An introduction to mixed models based on a few examples: Definition of standard mixed models. Parameter estimation.

More information

Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation

Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation COMPSTAT 2010 Revised version; August 13, 2010 Michael G.B. Blum 1 Laboratoire TIMC-IMAG, CNRS, UJF Grenoble

More information

Bayesian spatial quantile regression

Bayesian spatial quantile regression Brian J. Reich and Montserrat Fuentes North Carolina State University and David B. Dunson Duke University E-mail:reich@stat.ncsu.edu Tropospheric ozone Tropospheric ozone has been linked with several adverse

More information

Confidence Estimation Methods for Neural Networks: A Practical Comparison

Confidence Estimation Methods for Neural Networks: A Practical Comparison , 6-8 000, Confidence Estimation Methods for : A Practical Comparison G. Papadopoulos, P.J. Edwards, A.F. Murray Department of Electronics and Electrical Engineering, University of Edinburgh Abstract.

More information

Statistical Practice

Statistical Practice Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed

More information

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks (9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate

More information

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model Checking/Diagnostics Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics The session is a continuation of a version of Section 11.3 of MMD&S. It concerns

More information

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model Checking/Diagnostics Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics The session is a continuation of a version of Section 11.3 of MMD&S. It concerns

More information

Data Uncertainty, MCML and Sampling Density

Data Uncertainty, MCML and Sampling Density Data Uncertainty, MCML and Sampling Density Graham Byrnes International Agency for Research on Cancer 27 October 2015 Outline... Correlated Measurement Error Maximal Marginal Likelihood Monte Carlo Maximum

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

Open Problems in Mixed Models

Open Problems in Mixed Models xxiii Determining how to deal with a not positive definite covariance matrix of random effects, D during maximum likelihood estimation algorithms. Several strategies are discussed in Section 2.15. For

More information

4. Nonlinear regression functions

4. Nonlinear regression functions 4. Nonlinear regression functions Up to now: Population regression function was assumed to be linear The slope(s) of the population regression function is (are) constant The effect on Y of a unit-change

More information

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines) Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Heteroskedasticity. Part VII. Heteroskedasticity

Heteroskedasticity. Part VII. Heteroskedasticity Part VII Heteroskedasticity As of Oct 15, 2015 1 Heteroskedasticity Consequences Heteroskedasticity-robust inference Testing for Heteroskedasticity Weighted Least Squares (WLS) Feasible generalized Least

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30 MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

Implementation and Evaluation of Nonparametric Regression Procedures for Sensitivity Analysis of Computationally Demanding Models

Implementation and Evaluation of Nonparametric Regression Procedures for Sensitivity Analysis of Computationally Demanding Models Implementation and Evaluation of Nonparametric Regression Procedures for Sensitivity Analysis of Computationally Demanding Models Curtis B. Storlie a, Laura P. Swiler b, Jon C. Helton b and Cedric J. Sallaberry

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Impact of serial correlation structures on random effect misspecification with the linear mixed model.

Impact of serial correlation structures on random effect misspecification with the linear mixed model. Impact of serial correlation structures on random effect misspecification with the linear mixed model. Brandon LeBeau University of Iowa file:///c:/users/bleb/onedrive%20 %20University%20of%20Iowa%201/JournalArticlesInProgress/Diss/Study2/Pres/pres.html#(2)

More information

Regression: Lecture 2

Regression: Lecture 2 Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015

More information

Approaches to Modeling Menstrual Cycle Function

Approaches to Modeling Menstrual Cycle Function Approaches to Modeling Menstrual Cycle Function Paul S. Albert (albertp@mail.nih.gov) Biostatistics & Bioinformatics Branch Division of Epidemiology, Statistics, and Prevention Research NICHD SPER Student

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

Lecture 6: Discrete Choice: Qualitative Response

Lecture 6: Discrete Choice: Qualitative Response Lecture 6: Instructor: Department of Economics Stanford University 2011 Types of Discrete Choice Models Univariate Models Binary: Linear; Probit; Logit; Arctan, etc. Multinomial: Logit; Nested Logit; GEV;

More information

Linear model selection and regularization

Linear model selection and regularization Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

A short introduction to INLA and R-INLA

A short introduction to INLA and R-INLA A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

Akaike Information Criterion

Akaike Information Criterion Akaike Information Criterion Shuhua Hu Center for Research in Scientific Computation North Carolina State University Raleigh, NC February 7, 2012-1- background Background Model statistical model: Y j =

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Obnoxious lateness humor

Obnoxious lateness humor Obnoxious lateness humor 1 Using Bayesian Model Averaging For Addressing Model Uncertainty in Environmental Risk Assessment Louise Ryan and Melissa Whitney Department of Biostatistics Harvard School of

More information

Multivariate Linear Regression Models

Multivariate Linear Regression Models Multivariate Linear Regression Models Regression analysis is used to predict the value of one or more responses from a set of predictors. It can also be used to estimate the linear association between

More information

Diagnostics and Remedial Measures

Diagnostics and Remedial Measures Diagnostics and Remedial Measures Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Diagnostics and Remedial Measures 1 / 72 Remedial Measures How do we know that the regression

More information

Reduced-rank hazard regression

Reduced-rank hazard regression Chapter 2 Reduced-rank hazard regression Abstract The Cox proportional hazards model is the most common method to analyze survival data. However, the proportional hazards assumption might not hold. The

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October

More information

Regression tree-based diagnostics for linear multilevel models

Regression tree-based diagnostics for linear multilevel models Regression tree-based diagnostics for linear multilevel models Jeffrey S. Simonoff New York University May 11, 2011 Longitudinal and clustered data Panel or longitudinal data, in which we observe many

More information