Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1
1 Estimation and Model Selection in Mixed Effects Models, Part I
Adeline Samson (University Paris Descartes)
Summer school, Lipari, Italy. These slides are based on Marc Lavielle's slides.
2 Outline
1. Introduction
2. Some pharmacokinetics-pharmacodynamics examples
3. Estimation in classical regression models: linear regression model, non linear regression model, examples
4. The mixed effects model: linear mixed effects model, non linear mixed effects model, examples
3 Some examples of data: growth of children (figure: height (cm) vs age (years))
4 Some examples of data: pharmacokinetics of Indomethacin (figure: indomethacin concentration (mcg/ml) vs time since drug administration (hr))
5 Some examples of data: pharmacokinetics of Theophylline (figure: theophylline concentration in serum (mg/l) vs time since drug administration (hr))
6 The classical regression model
For one subject: y_j = f(x_j, β) + ε_j, 1 ≤ j ≤ n
- y_j ∈ R is the jth observation of the subject; n is the number of observations.
- The regression variables, or design variables, (x_j) are known.
- The vector of parameters β is unknown.
- The measurement error variables (ε_j) are unknown.
10 The classical regression model
y_j = f(x_j, β) + ε_j, 1 ≤ j ≤ n
The (ε_j) are modeled as a sequence of random variables. The goal of the modeler is to develop two kinds of models simultaneously:
(1) the structural model f
(2) the statistical model
11 The classical regression model
y_j = f(x_j, β) + ε_j, 1 ≤ j ≤ n
(1) The structural model f: we are not interested in a purely descriptive model that nicely fits the data, but rather in a mechanistic model that has some biological meaning and is a function of physiological parameters. Examples: compartmental PK models, viral dynamics models, ...
12 The classical regression model
y_j = f(x_j, β) + ε_j, 1 ≤ j ≤ n
(2) The statistical model aims to explain the variability observed in the data:
- the residual error model: distribution of (ε_j)
- the model of covariates
13 The classical regression model
y_j = f(x_j, β) + ε_j, 1 ≤ j ≤ n
Some statistical issues:
- Estimation: estimate the parameters of the model.
- Model selection and model assessment: select and assess the best structural model f; select and assess the best statistical model.
- Optimization of the design: find the optimal design (x_j).
14 Outline
1. Introduction
2. Some pharmacokinetics-pharmacodynamics examples
3. Estimation in classical regression models: linear regression model, non linear regression model, examples
4. The mixed effects model: linear mixed effects model, non linear mixed effects model, examples
15 Pharmacokinetics and Pharmacodynamics (PK/PD)
dose → (PK) → concentration → (PD) → response
Pharmacokinetics (PK): what the body does to the drug.
Pharmacodynamics (PD): what the drug does to the body.
16 One compartment PK model intravenous administration
17 intravenous administration and first-order elimination
dose D at t=0; drug amount Q(t); elimination (rate k_e)
dQ(t)/dt = -k_e Q(t), Q(0) = D  ⇒  Q(t) = D e^{-k_e t}
C(t) = Q(t)/V = (D/V) e^{-k_e t}
C(t): concentration of the drug, V: volume of the compartment.
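The closed-form solution above can be checked numerically. The sketch below, with illustrative dose/volume/rate values not taken from the slides, integrates dQ/dt = -k_e Q by forward Euler and compares to (D/V) e^{-k_e t}:

```python
import numpy as np

def concentration_iv_bolus(t, dose, V, ke):
    """C(t) = (dose / V) * exp(-ke * t): solution of dQ/dt = -ke*Q, Q(0) = dose."""
    return (dose / V) * np.exp(-ke * np.asarray(t, dtype=float))

# Crude forward-Euler integration of the ODE agrees with the analytic solution.
dose, V, ke, dt, T = 100.0, 10.0, 0.3, 1e-4, 5.0
Q = dose
for _ in range(int(round(T / dt))):
    Q += dt * (-ke * Q)
assert abs(Q / V - concentration_iv_bolus(T, dose, V, ke)) < 1e-3
```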
18 intravenous administration and nonlinear elimination
dose D at t=0; drug amount Q(t); nonlinear elimination
dQ(t)/dt = - V_m Q(t) / (V K_m + Q(t))
C(t) = Q(t)/V
(V_m, K_m): Michaelis-Menten elimination parameters, V: volume of the compartment.
19 oral administration, first-order absorption and elimination
dose D at time t=0; absorption (rate k_a); drug amount Q(t); elimination (rate k_e)
dQ_a(t)/dt = -k_a Q_a(t), Q_a(0) = D
dQ(t)/dt = k_a Q_a(t) - k_e Q(t), Q(0) = 0
Q_a(t): amount at absorption site.
C(t) = Q(t)/V = D k_a (e^{-k_e t} - e^{-k_a t}) / (V (k_a - k_e))
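The two-ODE absorption/elimination system and its closed-form concentration can also be cross-checked numerically. A minimal sketch, with illustrative parameter values (the dose, V, k_a, k_e below are not from the slides):

```python
import numpy as np

def concentration_oral(t, dose, V, ka, ke):
    """C(t) = dose*ka/(V*(ka-ke)) * (exp(-ke*t) - exp(-ka*t)), valid for ka != ke."""
    t = np.asarray(t, dtype=float)
    return dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

# Forward Euler on dQa/dt = -ka*Qa, dQ/dt = ka*Qa - ke*Q reproduces C(T) = Q(T)/V.
dose, V, ka, ke, dt, T = 320.0, 30.0, 1.5, 0.08, 1e-4, 2.0
Qa, Q = dose, 0.0
for _ in range(int(round(T / dt))):
    Qa, Q = Qa + dt * (-ka * Qa), Q + dt * (ka * Qa - ke * Q)
assert abs(Q / V - concentration_oral(T, dose, V, ka, ke)) < 1e-2
```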
20 Two compartments PK model intravenous administration
21 Two compartments PK model
dQ_a(t)/dt = -k_a Q_a(t)
dQ_c(t)/dt = k_a Q_a(t) - k_e Q_c(t) - k_12 Q_c(t) + k_21 Q_p(t)
dQ_p(t)/dt = k_12 Q_c(t) - k_21 Q_p(t)
Q_a(t): amount at absorption site, Q_a(0) = D.
Q_c(t): amount in the central compartment, Q_c(0) = 0.
Q_p(t): amount in the peripheral compartment, Q_p(0) = 0.
22 Outline
1. Introduction
2. Some pharmacokinetics-pharmacodynamics examples
3. Estimation in classical regression models: linear regression model, non linear regression model, examples
4. The mixed effects model: linear mixed effects model, non linear mixed effects model, examples
23 The regression model
y_j = f(x_j, β) + ε_j, 1 ≤ j ≤ n
- n is the number of observations.
- The regression variables, or design variables, (x_j) are known.
- The vector of parameters β is unknown.
- Linear model: f is a linear function of the parameters β.
- Non linear model: f is a non linear function of the parameters β.
25 The regression model: the statistical model
y_j = f(x_j, β) + ε_j
The simplest statistical model assumes that the (ε_j) are independent and identically distributed (i.i.d.) Gaussian random variables: ε_j i.i.d. ~ N(0, σ²).
Problem: estimate the parameters of the model θ = (β, σ²).
26 Maximum Likelihood Estimation
Maximum likelihood estimation (MLE) is a popular statistical method for fitting a statistical model to data and providing estimates of the model's parameters. For a fixed set of data and underlying probability model, maximum likelihood picks the values of the model parameters that make the data "more likely" than any other parameter values would.
27 Maximum Likelihood Estimation
Consider a family of continuous probability distributions parameterized by an unknown parameter θ, with known probability density function p_θ. Draw a vector y = (y_1, y_2, ..., y_n) from this distribution, and then use p_θ to compute the probability density associated with the observed data: p_θ(y) = p_θ(y_1, y_2, ..., y_n).
As a function of θ with y_1, y_2, ..., y_n fixed, this is the likelihood function: L(θ; y) = p_θ(y).
28 Maximum Likelihood Estimation
Let θ* be the true value of θ. The method of maximum likelihood estimates θ* by finding the value of θ that maximizes L(θ; y). This is the maximum likelihood estimator (MLE) of θ:
θ̂ = arg max_θ L(θ; y)
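As a concrete illustration of the arg-max definition, the sketch below (a toy example not from the slides) uses the i.i.d. Gaussian model, whose MLE is available in closed form (sample mean and non-corrected sample variance), and verifies that the log-likelihood is indeed largest at that point:

```python
import numpy as np

def gaussian_loglik(theta, y):
    """log L(theta; y) for i.i.d. N(mu, sigma^2) observations, theta = (mu, sigma^2)."""
    mu, s2 = theta
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * s2) - np.sum((y - mu) ** 2) / (2 * s2)

rng = np.random.default_rng(0)
y = rng.normal(5.0, 2.0, size=500)
mle = (y.mean(), y.var())   # closed-form MLE: sample mean, sample variance (ddof=0)
# Any other parameter value gives a smaller (or equal) likelihood:
assert gaussian_loglik(mle, y) >= gaussian_loglik((mle[0] + 0.1, mle[1]), y)
assert gaussian_loglik(mle, y) >= gaussian_loglik((mle[0], mle[1] * 1.1), y)
```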
29 Maximum Likelihood Estimation: some properties of the MLE
Under certain (fairly weak) regularity conditions, the MLE is "asymptotically optimal":
- The MLE is asymptotically unbiased: E(θ̂) → θ* as n → ∞.
- The MLE is a consistent estimate of θ* (LLN): θ̂ → θ* as n → ∞.
- The MLE is asymptotically normal (CLT): √n (θ̂ - θ*) → N(0, I(θ*)^{-1}) as n → ∞, where I(θ*) = -E[∂²_θ log L(θ*; y)]/n is the Fisher information matrix.
- The MLE is asymptotically efficient (Cramér-Rao): no asymptotically unbiased estimator has lower asymptotic mean squared error than the MLE.
33 The regression model: maximum likelihood estimation
y_j = f(x_j, β) + ε_j, 1 ≤ j ≤ n, with ε_j i.i.d. ~ N(0, σ²)  ⇒  y_j ~ N(f(x_j, β), σ²)
L(θ; y) = ∏_{j=1}^n p_θ(y_j) = ∏_{j=1}^n (2πσ²)^{-1/2} exp( -(y_j - f(x_j, β))² / (2σ²) ) = (2πσ²)^{-n/2} exp( -(1/(2σ²)) Σ_{j=1}^n (y_j - f(x_j, β))² )
34 The regression model: maximum likelihood estimation
β̂ = arg max_β L(β; y) = arg max_β (2πσ²)^{-n/2} exp( -(1/(2σ²)) Σ_{j=1}^n (y_j - f(x_j, β))² ) = arg min_β Σ_{j=1}^n (y_j - f(x_j, β))²
(Maximum likelihood estimate of β = least-squares estimate of β)
35 The linear regression model
y_1 = x_11 β_1 + x_12 β_2 + ... + x_1p β_p + ε_1
y_2 = x_21 β_1 + x_22 β_2 + ... + x_2p β_p + ε_2
...
y_n = x_n1 β_1 + x_n2 β_2 + ... + x_np β_p + ε_n
In matrix form: Y = Xβ + ε
36 The linear regression model: maximum likelihood estimation
y = Xβ + ε, ε_j i.i.d. ~ N(0, σ²), θ = (β, σ²)  ⇒  y ~ N(Xβ, σ² I_n)
L(θ; y) = (2πσ²)^{-n/2} exp( -‖y - Xβ‖² / (2σ²) )
β̂ = arg max_β L(β; y) = arg min_β ‖y - Xβ‖²
37 The linear regression model: maximum likelihood estimation
y = Xβ + ε, β̂ = arg min_β ‖y - Xβ‖²
β̂ = (X'X)^{-1} X'y = (X'X)^{-1} X'(Xβ + ε) = β + (X'X)^{-1} X'ε  ⇒  E(β̂) = β
Var(β̂) = σ² (X'X)^{-1}
log L(θ; y) = -(n/2) log(2πσ²) - ‖y - Xβ‖² / (2σ²)
I(β) = -(1/n) E[∂²_β log L(β, σ²; y)] = (1/(nσ²)) X'X
39 The linear regression model: maximum likelihood estimation
y = Xβ + ε. Let V = Var(β̂) = σ² (X'X)^{-1} be the variance-covariance matrix of β̂.
The diagonal elements of V are the variances of the components of β̂: V_kk is the variance of β̂_k, and √V_kk is the standard error (s.e.) of β̂_k.
90% confidence interval for β_k: [β̂_k - 1.645 √V_kk ; β̂_k + 1.645 √V_kk]
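The closed-form estimator, its standard errors, and the 90% interval can be sketched in a few lines. The simulated data below are illustrative (the true values 97.97 and 3.59 simply echo the growth-of-children example later in the deck), and σ² is estimated by its MLE ‖y - Xβ̂‖²/n:

```python
import numpy as np

def ols_with_ci(X, y, z=1.645):
    """beta_hat = (X'X)^{-1} X'y, Var(beta_hat) = sigma^2 (X'X)^{-1},
    and a 90% confidence interval beta_hat +/- 1.645 * s.e."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / n                  # ML estimate of sigma^2
    se = np.sqrt(s2 * np.diag(XtX_inv))     # standard errors of beta_hat
    return beta, beta - z * se, beta + z * se

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = X @ np.array([97.97, 3.59]) + rng.normal(0, 1.0, n)
beta, lo, hi = ols_with_ci(X, y)
assert np.all(lo < beta) and np.all(beta < hi)
```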
40 The non linear regression model
y_1 = f(x_1, β) + ε_1
y_2 = f(x_2, β) + ε_2
...
y_n = f(x_n, β) + ε_n
with x_j = (x_j1, ..., x_jp) and β = (β_1, ..., β_p). In vector form: Y = f(X, β) + ε
41 The non linear regression model: maximum likelihood estimation
y = f(X, β) + ε, ε_j i.i.d. ~ N(0, σ²)  ⇒  y ~ N(f(X, β), σ² I_n)
L(θ; y) = (2πσ²)^{-n/2} exp( -‖y - f(X, β)‖² / (2σ²) )
β̂ = arg max_β L(β; y) = arg min_β ‖y - f(X, β)‖²
No explicit expression for β̂: use an optimization algorithm (e.g. Newton-Raphson).
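To make the "no explicit expression, use an optimization algorithm" point concrete, here is a minimal Gauss-Newton iteration (one standard choice among the Newton-type algorithms the slide alludes to) for the one-exponential model f(t, β) = β_1 e^{-β_2 t}; the data and starting point are illustrative and noise-free so the fit can be checked exactly:

```python
import numpy as np

def gauss_newton(t, y, beta0, n_iter=50):
    """Least-squares fit of f(t, beta) = beta1 * exp(-beta2 * t)
    by Gauss-Newton iterations: solve (J'J) step = J'r at each step."""
    b1, b2 = beta0
    for _ in range(n_iter):
        e = np.exp(-b2 * t)
        r = y - b1 * e                          # residuals
        J = np.column_stack([e, -b1 * t * e])   # Jacobian of f wrt (b1, b2)
        step = np.linalg.solve(J.T @ J, J.T @ r)
        b1, b2 = b1 + step[0], b2 + step[1]
    return b1, b2

t = np.linspace(0.25, 8.0, 40)
y = 2.8 * np.exp(-1.1 * t)                      # noise-free data for the check
b1, b2 = gauss_newton(t, y, beta0=(2.5, 1.0))
assert abs(b1 - 2.8) < 1e-6 and abs(b2 - 1.1) < 1e-6
```

In practice a damped or trust-region variant is used, since plain Gauss-Newton can diverge from a poor starting point.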
42 Model selection
1. Compute the likelihood of the different models. Let θ̂_M be the maximum likelihood estimate of θ for model M: θ̂_M = arg max_θ L_M(θ; y), and let L_M = L_M(θ̂_M; y) be the likelihood of model M. Selecting the most likely model by comparing likelihoods favors models of high dimension (with many parameters)!
2. Penalize the models of high dimension. Select the model M̂ that minimizes the penalized criterion -2 log L_M + pen(M):
- Bayesian Information Criterion (BIC): pen(M) = log(n) dim(M).
- Akaike Information Criterion (AIC): pen(M) = 2 dim(M).
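The penalized criteria are a one-liner. The toy comparison below (numbers invented for illustration) shows the intended behavior: a larger model with a barely better fitted likelihood still loses once the penalty is added:

```python
import numpy as np

def aic_bic(loglik, dim_M, n):
    """Penalized criteria from the slide: -2 log L_M + pen(M)."""
    return -2 * loglik + 2 * dim_M, -2 * loglik + np.log(n) * dim_M

# A 2-parameter and a 5-parameter model with almost the same maximized
# log-likelihood; the penalty favors the smaller model under both criteria.
n = 100
aic_small, bic_small = aic_bic(-120.0, 2, n)
aic_big, bic_big = aic_bic(-119.5, 5, n)
assert aic_small < aic_big and bic_small < bic_big
```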
43 Linear regression model: growth of children
(Figure: height vs age with fitted line.)
y_j = β_1 + β_2 t_j + ε_j, with estimates β̂_1 = 97.97, β̂_2 = 3.59.
44 Non linear regression model: pharmacokinetics of Indomethacin
(Figure: concentration vs time for subjects 1 and 2.)
y_j = β_1 e^{-β_2 t_j} + β_3 e^{-β_4 t_j} + ε_j, fitted separately for each subject (table of per-subject estimates β̂_1, ..., β̂_4).
45 Outline
1. Introduction
2. Some pharmacokinetics-pharmacodynamics examples
3. Estimation in classical regression models: linear regression model, non linear regression model, examples
4. The mixed effects model: linear mixed effects model, non linear mixed effects model, examples
46 Some examples of data: pharmacokinetics of Theophylline
(Figure: individual concentration curves vs time.)
Each individual curve is described by the same parametric model, with its own individual parameters.
47 Analysis of data from several subjects
Classical regression model for one subject: y_j = f(x_j, ψ) + ε_j, 1 ≤ j ≤ n
Classical regression model for several subjects: y_ij = f(x_ij, ψ) + ε_ij, 1 ≤ i ≤ N, 1 ≤ j ≤ n_i
- y_ij ∈ R is the jth observation of subject i; N is the number of subjects; n_i is the number of observations of subject i.
- x_ij are the regression variables, or design variables.
53 The mixed effects model [Pinheiro and Bates, 2002]
y_ij = f(x_ij, ψ_i) + ε_ij, 1 ≤ i ≤ N, 1 ≤ j ≤ n_i
- y_ij ∈ R is the jth observation of subject i; N is the number of subjects; n_i is the number of observations of subject i.
- The regression variables, or design variables, (x_ij) are known.
- The individual parameters (ψ_i) are unknown.
54 The mixed effects model
y_ij = f(x_ij, ψ_i) + ε_ij, 1 ≤ i ≤ N, 1 ≤ j ≤ n_i
The (ψ_i) and the (ε_ij) are modeled as sequences of random variables. The goal of the modeler is to develop two kinds of models simultaneously:
(1) the structural model f
(2) the statistical model
55 The mixed effects model
y_ij = f(x_ij, ψ_i) + ε_ij, 1 ≤ i ≤ N, 1 ≤ j ≤ n_i
(2) The statistical model aims to explain the variability observed in the data:
- the residual error model: distribution of (ε_ij)
- the model of the individual parameters: distribution of (ψ_i), written ψ_i = h(C_i, β, η_i), where C_i is a vector of covariates, β is a vector of fixed effects, and η_i is a vector of random effects.
56 The individual parameters: examples of ψ_i = h(C_i, β, η_i)
- additive random effects (C_i = 1): ψ_i = β + η_i
- effect of a covariate (β = [β_1, β_2], C_i = [1, sex_i]): ψ_i = β_1 + β_2 sex_i + η_i
- multiplicative random effects: ψ_i = β e^{η_i}
- "partial" vector of random effects: ψ_i = (ψ_i1, ψ_i2) = (β_1 + η_i1, β_2)
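These parameter models are easy to simulate from. A short sketch (all numeric values are illustrative, not from the slides) draws random effects and builds ψ_i under three of the models above; note the multiplicative form keeps ψ_i positive, which is why it is common for PK parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
beta1, beta2, omega = 1.5, 0.4, 0.3       # illustrative fixed effects and sd
N = 5
sex = rng.integers(0, 2, N)               # binary covariate sex_i
eta = rng.normal(0.0, omega, N)           # random effects eta_i ~ N(0, omega^2)

psi_additive = beta1 + eta                       # psi_i = beta + eta_i
psi_with_covariate = beta1 + beta2 * sex + eta   # psi_i = beta1 + beta2*sex_i + eta_i
psi_multiplicative = beta1 * np.exp(eta)         # psi_i = beta * exp(eta_i) > 0
assert np.all(psi_multiplicative > 0)
```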
61 The mixed effects model
y_ij = f(x_ij, ψ_i) + ε_ij, 1 ≤ i ≤ N, 1 ≤ j ≤ n_i, with ψ_i = h(C_i, β, η_i)
- C_i is a known vector of covariates
- β is an unknown p-vector of fixed effects
- η_i is an unknown q-vector of random effects
- ε_ij ~ N(0, σ²), η_i ~ N(0, Ω)
- Ω is the q × q variance-covariance matrix of the random effects.
(Hyper or population) parameters: θ = (β, Ω, σ²)
62 The mixed effects model: example
Classical regression model: y_j = β_1 + β_2 t_j + ε_j
Mixed effects model: y_ij = ψ_i1 + ψ_i2 t_ij + ε_ij = (β_1 + η_i1) + (β_2 + η_i2) t_ij + ε_ij
where η_i = (η_i1, η_i2) ~ N(0, Ω), i.e.
(η_i1, η_i2)' ~ N( (0, 0)' , [[ω_1², ω_12], [ω_12, ω_2²]] )
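The random intercept-and-slope example above can be simulated directly; the sketch below uses illustrative values for β, Ω, and σ (not estimates from the slides) and draws one trajectory per subject:

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([97.97, 3.59])               # fixed effects (intercept, slope)
Omega = np.array([[4.0, 0.5], [0.5, 0.25]])  # illustrative covariance of eta_i
sigma = 1.0
N, n_i = 8, 6
t = np.linspace(0.0, 10.0, n_i)

y = np.empty((N, n_i))
for i in range(N):
    eta_i = rng.multivariate_normal(np.zeros(2), Omega)
    psi_i = beta + eta_i                     # individual intercept and slope
    y[i] = psi_i[0] + psi_i[1] * t + rng.normal(0.0, sigma, n_i)
```

Each row of `y` is one subject's data: the same linear structural model, but with subject-specific parameters drawn around the population values.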
63 The mixed effects model: some statistical issues
- Estimation: estimate the population parameters θ of the model, estimate the individual parameters, compute confidence intervals.
- Model selection and model assessment: determine whether a parameter varies in the population, select the best combination of covariates, compare several treatments.
- Optimization of the design: determine the design (the measurement times) that yields the most accurate estimation of the model.
64 The mixed effects model: estimation of the population parameters
The maximum likelihood estimator of θ = (β, Ω, σ²) maximizes
L(θ; y) = ∏_{i=1}^N L_i(θ; y_i), with L_i(θ; y_i) = ∫ p(y_i, η_i; θ) dη_i = ∫ p(y_i | η_i; θ) p(η_i; θ) dη_i
We know that y_i | η_i ~ N( f(x_i, h(C_i, β, η_i)), σ² I_{n_i} ) and η_i ~ N(0, Ω).
66 The mixed effects model: estimation of the population parameters
Thus
L_i(θ; y_i) = ∫ (2πσ²)^{-n_i/2} exp( -‖y_i - f(x_i, h(C_i, β, η_i))‖² / (2σ²) ) (2π)^{-q/2} |Ω|^{-1/2} exp( -η_i' Ω^{-1} η_i / 2 ) dη_i
Example: ψ_i = β + η_i gives
L_i(θ; y_i) = C σ^{-n_i} |Ω|^{-1/2} ∫ exp( -‖y_i - f(x_i, ψ_i)‖² / (2σ²) - (ψ_i - β)' Ω^{-1} (ψ_i - β) / 2 ) dψ_i
68 The mixed effects model: estimation of the individual parameters
Assume that θ = (β, Ω, σ²) is known (or previously estimated). ψ̂_i maximizes the conditional distribution p(ψ_i | y_i; θ):
p(ψ_i | y_i; θ) = p(ψ_i, y_i; θ) / p(y_i; θ) = p(y_i | ψ_i; θ) p(ψ_i; θ) / p(y_i; θ) ∝ p(y_i | ψ_i; θ) p(ψ_i; θ)
Example: ψ_i = β + η_i gives
p(ψ_i | y_i; θ) = C exp( -‖y_i - f(x_i, ψ_i)‖² / (2σ²) - (ψ_i - β)' Ω^{-1} (ψ_i - β) / 2 )
so ψ̂_i minimizes a penalized least-squares criterion:
ψ̂_i = arg min_ψ ( ‖y_i - f(x_i, ψ)‖² / σ² + (ψ - β)' Ω^{-1} (ψ - β) )
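When f is linear in ψ_i, this penalized criterion has a closed-form minimizer, which makes the idea easy to check. A sketch under that linear assumption (data and parameter values below are illustrative; for nonlinear f one would call a numerical optimizer instead):

```python
import numpy as np

def map_individual(y_i, X_i, beta, Omega, sigma2):
    """Closed-form minimizer of ||y_i - X_i psi||^2/sigma2 + (psi-beta)' Omega^{-1} (psi-beta):
    psi_hat = (X'X/sigma2 + Omega^{-1})^{-1} (X'y/sigma2 + Omega^{-1} beta)."""
    Oinv = np.linalg.inv(Omega)
    A = X_i.T @ X_i / sigma2 + Oinv
    b = X_i.T @ y_i / sigma2 + Oinv @ beta
    return np.linalg.solve(A, b)

def criterion(psi, y_i, X_i, beta, Omega, sigma2):
    r = y_i - X_i @ psi
    d = psi - beta
    return r @ r / sigma2 + d @ np.linalg.inv(Omega) @ d

rng = np.random.default_rng(4)
X_i = np.column_stack([np.ones(5), np.arange(5.0)])
y_i = X_i @ np.array([2.0, 1.0]) + rng.normal(0, 0.2, 5)
beta, Omega, s2 = np.array([2.0, 1.0]), np.eye(2), 0.04
psi_hat = map_individual(y_i, X_i, beta, Omega, s2)
# Perturbing psi_hat can only increase the criterion:
for d in (np.array([0.01, 0.0]), np.array([0.0, -0.01])):
    assert criterion(psi_hat, y_i, X_i, beta, Omega, s2) <= criterion(psi_hat + d, y_i, X_i, beta, Omega, s2)
```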
70 The linear mixed effects model: maximum likelihood estimate
y_i = X_i ψ_i + ε_i, ε_i ~ N(0, σ² I_{n_i}), ψ_i ~ N(β, Ω)
We have y_i = X_i β + X_i η_i + ε_i, thus by linearity y_i ~ N( X_i β, X_i Ω X_i' + σ² I_{n_i} ).
Set V_i = X_i Ω X_i' / σ² + I_{n_i}; the likelihood is then explicit:
L(θ; y) = ∏_{i=1}^N |2π σ² V_i|^{-1/2} exp( -(y_i - X_i β)' V_i^{-1} (y_i - X_i β) / (2σ²) )
Computation of the MLE θ̂ via an optimization routine (Newton-Raphson iterations or EM algorithm).
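The marginal likelihood above can be written either with the raw covariance X_i Ω X_i' + σ² I or with the V_i factorization; the sketch below (toy numbers, one subject) computes both and checks they agree, which is a useful sanity test when implementing this model by hand:

```python
import numpy as np

def lmm_loglik_direct(y_i, X_i, beta, Omega, sigma2):
    """log of the N(X_i beta, X_i Omega X_i' + sigma2 I) density at y_i."""
    n_i = len(y_i)
    S = X_i @ Omega @ X_i.T + sigma2 * np.eye(n_i)
    r = y_i - X_i @ beta
    return -0.5 * (n_i * np.log(2 * np.pi) + np.linalg.slogdet(S)[1]
                   + r @ np.linalg.solve(S, r))

def lmm_loglik_Vi(y_i, X_i, beta, Omega, sigma2):
    """Same quantity via the slide's factorization V_i = X_i Omega X_i'/sigma2 + I."""
    n_i = len(y_i)
    V_i = X_i @ Omega @ X_i.T / sigma2 + np.eye(n_i)
    r = y_i - X_i @ beta
    return -0.5 * (n_i * np.log(2 * np.pi * sigma2) + np.linalg.slogdet(V_i)[1]
                   + r @ np.linalg.solve(V_i, r) / sigma2)

X_i = np.column_stack([np.ones(4), np.arange(4.0)])
y_i = np.array([1.0, 2.2, 2.9, 4.1])
beta, Omega, s2 = np.array([1.0, 1.0]), np.array([[0.5, 0.1], [0.1, 0.2]]), 0.25
assert abs(lmm_loglik_direct(y_i, X_i, beta, Omega, s2)
           - lmm_loglik_Vi(y_i, X_i, beta, Omega, s2)) < 1e-10
```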
76 The linear mixed effects model: profiled likelihood
Optimization is much simpler using the concentrated or profiled likelihood, i.e. the likelihood as a function of Ω alone. From y_i ~ N(X_i β, σ² V_i) one can deduce
β̂(Ω) = ( Σ_{i=1}^N X_i' V_i^{-1} X_i )^{-1} Σ_{i=1}^N X_i' V_i^{-1} y_i
σ̂²(Ω) = Σ_{i=1}^N (y_i - X_i β̂(Ω))' V_i^{-1} (y_i - X_i β̂(Ω)) / Σ_{i=1}^N n_i
78 The linear mixed effects model: profiled likelihood
Using these expressions, derive the profiled log-likelihood as a function of Ω: L(Ω; y) = L(β̂(Ω), Ω, σ̂²(Ω); y).
The estimator of Ω is obtained by maximizing L(Ω; y), then the estimators of β and σ² follow by plug-in:
Ω̂ = arg max_Ω L(Ω; y), β̂ = β̂(Ω̂), σ̂² = σ̂²(Ω̂)
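The two plug-in formulas are a short function. In this sketch I parameterize by the relative covariance G = Ω/σ² (an assumption I make so that V_i = X_i G X_i' + I does not itself depend on σ²); the simulated data are illustrative, with true β = (2, 1) and σ = 1:

```python
import numpy as np

def profiled_beta_sigma2(ys, Xs, G):
    """beta_hat and sigma2_hat from the slide, with G = Omega/sigma2 so that
    V_i = X_i G X_i' + I.  GLS for beta, then a V_i-weighted residual sum for sigma2."""
    A, b = 0.0, 0.0
    for y_i, X_i in zip(ys, Xs):
        V_i = X_i @ G @ X_i.T + np.eye(len(y_i))
        W = np.linalg.solve(V_i, X_i)        # V_i^{-1} X_i
        A = A + X_i.T @ W                    # sum of X_i' V_i^{-1} X_i
        b = b + W.T @ y_i                    # sum of X_i' V_i^{-1} y_i
    beta = np.linalg.solve(A, b)
    num, n_tot = 0.0, 0
    for y_i, X_i in zip(ys, Xs):
        V_i = X_i @ G @ X_i.T + np.eye(len(y_i))
        r = y_i - X_i @ beta
        num += r @ np.linalg.solve(V_i, r)
        n_tot += len(y_i)
    return beta, num / n_tot

rng = np.random.default_rng(5)
t = np.linspace(0.0, 10.0, 6)
Xs = [np.column_stack([np.ones(6), t]) for _ in range(30)]
G = np.diag([1.0, 0.25])
ys = [X @ (np.array([2.0, 1.0]) + rng.multivariate_normal(np.zeros(2), G))
      + rng.normal(0.0, 1.0, 6) for X in Xs]
beta_hat, s2_hat = profiled_beta_sigma2(ys, Xs, G)
assert s2_hat > 0
```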
80 The linear mixed effects model: restricted likelihood estimation
The MLEs Ω̂ and σ̂² underestimate the parameters Ω and σ². [Patterson and Thompson, 1971] propose the restricted maximum likelihood (REML) estimates, obtained by maximizing
L_R(Ω, σ²; y) = ∫ L(β, Ω, σ²; y) dβ
This is equivalent, in a Bayesian framework, to assuming a uniform prior distribution for the fixed effects β.
81 The linear mixed effects model: estimation of the individual parameters
L(θ; y) = ∏_{i=1}^N ∫ p(y_i, η_i; θ) dη_i = ∏_{i=1}^N ∫ p(y_i | η_i; θ) p(η_i; θ) dη_i
= ∏_{i=1}^N ∫ (2πσ²)^{-n_i/2} exp( -(y_i - X_i β - X_i η_i)'(y_i - X_i β - X_i η_i) / (2σ²) ) (2π)^{-q/2} |Ω|^{-1/2} exp( -η_i' Ω^{-1} η_i / 2 ) dη_i
82 The linear mixed effects model: estimation of the individual parameters
Introduce Δ such that Δ'Δ = σ² Ω^{-1}, and the augmented data
ỹ_i = [y_i; 0], X̃_i = [X_i; 0], Z̃_i = [X_i; Δ]
Then the exponent above equals -(ỹ_i - X̃_i β - Z̃_i η_i)'(ỹ_i - X̃_i β - Z̃_i η_i) / (2σ²), so
L(θ; y) = ∏_{i=1}^N C ∫ exp( -(ỹ_i - X̃_i β - Z̃_i η_i)'(ỹ_i - X̃_i β - Z̃_i η_i) / (2σ²) ) dη_i
and by linearity of the model η̂_i = (Z̃_i' Z̃_i)^{-1} Z̃_i' (ỹ_i - X̃_i β̂).
84 The non-linear mixed effects model
y_i = f(X_i, ψ_i) + ε_i = f(X_i, β + η_i) + ε_i
L(θ; y_i) = ∫ (2πσ²)^{-n_i/2} (2π)^{-q/2} |Ω|^{-1/2} exp( -(y_i - f(X_i, β + η_i))'(y_i - f(X_i, β + η_i)) / (2σ²) - η_i' Ω^{-1} η_i / 2 ) dη_i
The likelihood has no explicit form because of the non linearity of the regression function f with respect to η_i. Existing methods are based on approximations or numerical computation of the likelihood.
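One brute-force way to see that the integral is still computable numerically is plain Monte Carlo over the random effect: average p(y_i | η; θ) over draws η ~ N(0, Ω). The sketch below does this for a scalar random effect entering an exponential model nonlinearly (all values illustrative; real software uses Laplace or quadrature approximations instead, as the next slides discuss):

```python
import numpy as np

def mc_log_likelihood(y_i, t_i, beta, omega2, sigma2, n_draws=5000, seed=0):
    """Monte Carlo approximation of log L_i = log E_eta[p(y_i | eta; theta)]
    for a scalar eta ~ N(0, omega2) in the model f(t, beta + eta) = exp(-(beta+eta)*t)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, np.sqrt(omega2), n_draws)
    preds = np.exp(-np.outer(beta + eta, t_i))        # n_draws x n_i predictions
    r2 = ((y_i - preds) ** 2).sum(axis=1)
    log_p = -0.5 * (len(y_i) * np.log(2 * np.pi * sigma2) + r2 / sigma2)
    m = log_p.max()
    return m + np.log(np.exp(log_p - m).mean())       # stable log-mean-exp

t_i = np.array([0.5, 1.0, 2.0, 4.0])
y_i = np.exp(-0.4 * t_i)                              # noise-free toy data
ll = mc_log_likelihood(y_i, t_i, beta=0.4, omega2=0.05, sigma2=0.01)
assert np.isfinite(ll)
```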
85 The non-linear mixed effects model: linearization methods
Principle: linearize f to reduce the problem to a linear mixed effects model.
- First order methods (FO) [Beal and Sheiner, 1982]: linearization of f around β (NONMEM software).
- First order conditional methods (FOCE) [Lindstrom and Bates, 1990]: linearization of f around ψ̂_i (NONMEM, SAS proc NLMIXED, Splus/R function nlme).
88 The non-linear mixed effects model: first order method
Linearization of f around β:
f(x_ij, ψ_i) = f(x_ij, β + η_i) = f(x_ij, β) + (∂f/∂ψ)(x_ij, β)' η_i + o(‖η_i‖)
An (approximate) model is deduced: y_ij = f(x_ij, β) + (∂f/∂ψ)(x_ij, β)' η_i + ε_ij
A linear mixed effects model is obtained by plugging in a previously estimated value of β:
y_ij = f(x_ij, β̂) + (∂f/∂ψ)(x_ij, β̂)' η_i + ε_ij
91 The non-linear mixed effects model: first order method, iterative algorithm
1. Penalized nonlinear least squares (PNLS) step: with current estimates Ω̂ and σ̂², the conditional modes of β and η_i are obtained by minimizing
Σ_{i=1}^N { (y_i - f(X_i, β + η_i))'(y_i - f(X_i, β + η_i)) + σ̂² η_i' Ω̂^{-1} η_i }
2. Linear mixed effects (LME) step: first order Taylor expansion of f around β̂:
y_i ≈ f(X_i, β̂ + η̂_i) + (∂f/∂ψ_i)(X_i, β̂) η_i + ε_i
then MLE estimates of Ω and σ².
92 The non-linear mixed effects model: first order conditional estimate method, iterative algorithm
1. Penalized nonlinear least squares (PNLS) step: with current estimates Ω̂ and σ̂², the conditional modes of β and η_i are obtained by minimizing
Σ_{i=1}^N { (y_i - f(X_i, β + η_i))'(y_i - f(X_i, β + η_i)) + σ̂² η_i' Ω̂^{-1} η_i }
2. Linear mixed effects (LME) step: first order Taylor expansion of f around ψ̂_i = β̂ + η̂_i:
y_i ≈ f(X_i, ψ̂_i) + (∂f/∂ψ_i)(X_i, ψ̂_i)' (ψ_i - ψ̂_i) + ε_i
then MLE estimates of Ω and σ².
93 The non-linear mixed effects model: first order conditional estimate method, drawbacks
- Theoretical drawbacks: the statistical properties of the algorithm are not well established.
- Practical drawbacks: very sensitive to the initial guess, does not always converge, poor estimation of some parameters.
94 The non-linear mixed effects model: other classical methods
Methods based on numerical approximations of the likelihood:
- Laplace method [Wolfinger, 1993]
- Gaussian quadrature method [Davidian and Gallant, 1993] (SAS proc NLMIXED)
Properties: theoretically, true maximum likelihood estimation is performed; in practice, limited to a few random effects.
95 Model selection
Several steps:
- Choice of the regression function f
- Choice of the covariate model C_i
- Choice of the random effects model Ω: diagonal matrix, block-diagonal matrix, plain matrix, etc.
99 Model selection
1. Compute the likelihood of the different models. Let θ̂_M be the maximum likelihood estimate of θ for model M: θ̂_M = arg max_θ L_M(θ; y), and let L_M = L_M(θ̂_M; y) be the likelihood of model M. Selecting the most likely model by comparing likelihoods favors models of high dimension (with many parameters)!
2. Penalize the models of high dimension. Select the model M̂ that minimizes the penalized criterion -2 log L_M + pen(M):
- Bayesian Information Criterion (BIC): pen(M) = log(n) dim(M).
- Akaike Information Criterion (AIC): pen(M) = 2 dim(M).
100 Model checking. Computation of: population predictions $f(x_{ij}, \hat\beta)$; individual predictions $f(x_{ij}, \hat\psi_i)$; population residuals $y_{ij} - f(x_{ij}, \hat\beta)$; individual residuals $y_{ij} - f(x_{ij}, \hat\psi_i)$. Plots: population/individual predictions vs observations; population/individual residuals vs population/individual predictions; normality of the residuals.
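These four quantities are direct to compute once a model is fitted. A minimal sketch for one subject; the regression function, data, and parameter values are all made up for illustration.

```python
import numpy as np

t = np.array([0.5, 1.0, 2.0, 4.0])           # design points x_ij
y = np.array([1.9, 1.4, 0.8, 0.3])           # observations y_ij

def f(t, b):
    """Hypothetical regression function f(x, psi)."""
    return 2.5 * np.exp(-b * t)

beta_hat = 0.6                                # estimated population parameter
psi_i_hat = 0.75                              # estimated individual parameter

pop_pred = f(t, beta_hat)                     # population predictions f(x_ij, beta_hat)
ind_pred = f(t, psi_i_hat)                    # individual predictions f(x_ij, psi_i_hat)
pop_resid = y - pop_pred                      # population residuals
ind_resid = y - ind_pred                      # individual residuals
print(ind_resid)
```

Plotting `pop_pred`/`ind_pred` against `y`, and the residuals against the predictions, gives exactly the diagnostic panels listed on the slide.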
101 Pharmacokinetics of Indomethacin. [Figure: indomethacin concentration (mcg/ml) vs time since drug administration (hr)]
$$y_{ij} = \psi_{i1} e^{-\psi_{i2} t_{ij}} + \psi_{i3} e^{-\psi_{i4} t_{ij}} + \varepsilon_{ij} = (\beta_1 + \eta_{i1}) e^{-(\beta_2 + \eta_{i2}) t_{ij}} + (\beta_3 + \eta_{i3}) e^{-(\beta_4 + \eta_{i4}) t_{ij}} + \varepsilon_{ij}$$
⇒ Choice of the covariance matrix $\Omega$
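A sketch of this biexponential model in code. The population values are the diagonal-$\Omega$ estimates reported on the next slide; the random-effect values for the illustrative subject are made up.

```python
import numpy as np

def biexp(t, psi):
    """Biexponential model: psi1*exp(-psi2*t) + psi3*exp(-psi4*t)."""
    a1, r1, a2, r2 = psi
    return a1 * np.exp(-r1 * t) + a2 * np.exp(-r2 * t)

beta = np.array([2.83, 2.10, 0.41, 0.24])   # population estimates beta_1..beta_4
eta_i = np.array([0.2, -0.1, 0.0, 0.0])     # one subject's random effects (made up)

t = np.linspace(0.0, 8.0, 9)
pop_curve = biexp(t, beta)                   # population prediction
ind_curve = biexp(t, beta + eta_i)           # individual prediction
print(pop_curve[0])                          # beta_1 + beta_3 at t = 0
```

The fast term ($\beta_2 \approx 2.1$) dominates the early decline and the slow term ($\beta_4 \approx 0.24$) the tail, which is the usual two-phase shape of the indomethacin data.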
102 Pharmacokinetics of Indomethacin. Diagonal covariance matrix $\Omega$. [Figure: fixed-effects and subject-specific fits, with the model's AIC; indomethacin concentration (mcg/ml) vs time since drug administration (hr)]
$\hat\beta_1 = 2.83$, $\hat\beta_2 = 2.10$, $\hat\beta_3 = 0.41$, $\hat\beta_4 = 0.24$; $\hat\omega_{\beta_1} = 0.56$, $\hat\omega_{\beta_2} = 0.34$, $\hat\omega_{\beta_3} = 0.10$, $\hat\omega_{\beta_4} = 10^{-6}$
103 Pharmacokinetics of Indomethacin. Diagonal covariance matrix $\Omega$. [Figure: population and individual predictions vs observations; population and individual residuals vs the corresponding predictions]
104 Pharmacokinetics of Indomethacin. Plain covariance matrix $\Omega$. [Figure: fixed-effects and subject-specific fits, with the model's AIC; indomethacin concentration (mcg/ml) vs time since drug administration (hr)]
$\hat\beta_1 = 2.84$, $\hat\beta_2 = 2.46$, $\hat\beta_3 = 0.63$, $\hat\beta_4 = 0.28$; $\hat\omega_{\beta_1} = 0.74$, $\hat\omega_{\beta_2} = 0.61$, $\hat\omega_{\beta_3} = 0.40$, $\hat\omega_{\beta_4} = 0.17$
105 Pharmacokinetics of Indomethacin. Plain covariance matrix $\Omega$. [Figure: population and individual predictions vs observations; population and individual residuals vs the corresponding predictions]
106 Pharmacokinetics of Theophylline. [Figure: theophylline concentration in serum (mg/L) vs time since drug administration (hr)]
$$y_{ij} = \frac{Dose \cdot k_{a_i}}{V_i (k_{a_i} - k_{e_i})} \left(e^{-k_{e_i} t_{ij}} - e^{-k_{a_i} t_{ij}}\right) + \varepsilon_{ij}$$
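A sketch of this one-compartment, first-order-absorption model. The parameter values $k_e = 0.08$, $k_a = 1.53$, $V = 0.48$ are the estimates reported two slides later; the dose is a made-up value.

```python
import numpy as np

def one_cpt(t, dose, ka, ke, V):
    """One-compartment model with first-order absorption:
    Dose*ka / (V*(ka-ke)) * (exp(-ke*t) - exp(-ka*t))."""
    return dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

t = np.linspace(0.0, 24.0, 97)
conc = one_cpt(t, dose=4.0, ka=1.53, ke=0.08, V=0.48)

# The curve starts at 0, rises during absorption, then declines;
# the time of the peak is log(ka/ke) / (ka - ke).
t_max = np.log(1.53 / 0.08) / (1.53 - 0.08)
print(t_max)
```

The rise-then-fall shape produced by the difference of the two exponentials is what makes this the standard model for the oral-dose theophylline data.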
107 Non linear mixed model. Pharmacokinetics of Theophylline. [Figure: fixed-effects and subject-specific fits; theophylline concentration in serum (mg/L) vs time since drug administration (hr)]
$\hat k_e = 0.08$, $\hat k_a = 1.53$, $\hat V = 0.48$; $\hat\omega_{k_e} = 0.02$, $\hat\omega_V = 0.08$
108 Non linear mixed model. Pharmacokinetics of Theophylline. [Figure: population and individual predictions vs observations; population and individual residuals vs the corresponding predictions]
109 Pharmacokinetics of Theophylline. Log parametrisation: with $lk_{e_i} = \log k_{e_i}$, $lk_{a_i} = \log k_{a_i}$ and $lV_i = \log V_i$,
$$y_{ij} = \frac{Dose \cdot k_{a_i}}{V_i (k_{a_i} - k_{e_i})} \left(e^{-k_{e_i} t_{ij}} - e^{-k_{a_i} t_{ij}}\right) + \varepsilon_{ij} = \frac{Dose \cdot e^{lk_{a_i}}}{e^{lV_i} \left(e^{lk_{a_i}} - e^{lk_{e_i}}\right)} \left(e^{-e^{lk_{e_i}} t_{ij}} - e^{-e^{lk_{a_i}} t_{ij}}\right) + \varepsilon_{ij}$$
⇒ Final model: random effects on $lk_e$, $lk_a$ and $V$; covariate effect (weight) on $lk_a$
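The point of the log parametrisation is that the PK parameters stay positive for any real (e.g. Gaussian) value of the individual parameters, while the predicted curve is unchanged. A sketch checking that the two parametrisations coincide; the dose and parameter values are illustrative.

```python
import numpy as np

def one_cpt(t, dose, ka, ke, V):
    """Natural parametrisation."""
    return dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

def one_cpt_log(t, dose, lka, lke, lV):
    """Log parametrisation: same model with ka = exp(lka), ke = exp(lke), V = exp(lV)."""
    return dose * np.exp(lka) / (np.exp(lV) * (np.exp(lka) - np.exp(lke))) * (
        np.exp(-np.exp(lke) * t) - np.exp(-np.exp(lka) * t))

t = np.linspace(0.0, 24.0, 25)
a = one_cpt(t, 4.0, 1.53, 0.08, 0.48)
b = one_cpt_log(t, 4.0, np.log(1.53), np.log(0.08), np.log(0.48))
print(np.max(np.abs(a - b)))  # numerically zero
```

A Gaussian random effect on $lk_a$ corresponds to a log-normal $k_a$, which is the usual way of keeping rate constants and volumes positive in population PK models.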
110 Non linear mixed model. Pharmacokinetics of Theophylline. [Figure: fixed-effects and subject-specific fits of the final model; theophylline concentration in serum (mg/L) vs time since drug administration (hr)]
111 Non linear mixed model. Pharmacokinetics of Theophylline. [Figure: population and individual predictions vs observations; population and individual residuals vs the corresponding predictions]