Approximation of the Fisher information and design in nonlinear mixed effects models


1 Approximation of the Fisher information and design in nonlinear mixed effects models. Tobias Mielke. Optimum Design for Mixed Effects Non-Linear and Generalized Linear Models, PODE.

2 Outline

3 Mixed Effects: Similar functions for different individuals. Every individual has its own individual parameters. The vectors of individual parameters are realizations of random vectors.

4 Mixed Effects. Two-stage model: 1st stage (intra-individual variation): Y_{ij} = η(β_i, x_{ij}) + ε_{ij}, j = 1, ..., m_i, with ε_{ij} ~ N(0, σ^2). 2nd stage (inter-individual variation): β_i = β + b_i, i = 1, ..., N, with b_i ~ N_p(0, σ^2 D). b_i and ε_{ij} are assumed to be independent; the variance parameters σ^2 and D are assumed to be known. Aim: estimation of β.
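As a concrete illustration, the two-stage model can be simulated directly. The sketch below uses a hypothetical scalar response η(β_i, x) = exp(β_i) x and illustrative values for β, σ^2 and D; none of these choices come from the slides.

```python
# Simulation sketch of the two-stage model (hypothetical response and values).
import numpy as np

rng = np.random.default_rng(0)
beta, sigma2, D = 1.0, 0.1, np.array([[0.05]])  # population mean, variances
N, x = 3, np.array([1.0, 2.0, 4.0])             # individuals, sampling times

def eta(b_i, x):
    return np.exp(b_i) * x                      # stage-1 mean response (made up)

Y = np.empty((N, x.size))
for i in range(N):
    b_i = beta + rng.multivariate_normal(np.zeros(1), sigma2 * D)   # 2nd stage
    Y[i] = eta(b_i[0], x) + rng.normal(0.0, np.sqrt(sigma2), x.size)  # 1st stage

print(Y.shape)
```

Each row of `Y` is one individual's observation vector; the individual parameter is drawn first, then the residual noise is added around the individual response curve.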

5 Individual observation vector: Y_i = η(β_i, ξ_i) + ε_i, where m_i is the number of observations, ξ_i = (x_{i1}, ..., x_{im_i}) are the experimental settings, and η(β_i, ξ_i) := (η(β_i, x_{i1}), ..., η(β_i, x_{im_i}))^T. Jacobian matrix: F_β := ∂η(β_i, ξ_i)/∂β_i^T, evaluated at β_i = β.

6 Optimal designs for estimating β: how does the design influence cov(β̂)? For Y_i with density f_{Y_i}(y_i) and l(β; y_i) := log(f_{Y_i}(y_i)): maximization of the Fisher information M_β = E(∂l(β; Y_i)/∂β ∂l(β; Y_i)/∂β^T). Often applied: linearization.

7 For η nonlinear in β_i, linearization of the model (1): Y_i = η(β_i, ξ_i) + ε_i ≈ η(β_0, ξ_i) + F_{β_0}(β − β_0) + F_{β_0}(β_i − β) + ε_i, with a guess β_0 of β (or β_i). The distribution assumptions yield Y_i ~ N(η(β_0, ξ_i) + F_{β_0}(β − β_0), σ^2 V_{β_0}) with V_{β_0} := I_{m_i} + F_{β_0} D F_{β_0}^T: a linear mixed effects model.

8 For η nonlinear in β_i, linearization of the model (2): Y_i = η(β_i, ξ_i) + ε_i ≈ η(β, ξ_i) + F_β(β_i − β) + ε_i, with the true β. The distribution assumptions yield Y_i ~ N(η(β, ξ_i), σ^2 V_β) with V_β := I_{m_i} + F_β D F_β^T: a heteroscedastic nonlinear normal model.

9 Calculation of the Fisher information, assuming the approximate model is true. Linear mixed model [Retout et al. (2001)]: M_{1,β}(ξ) = (1/σ^2) F_{β_0}^T V_{β_0}^{-1} F_{β_0}. Nonlinear heteroscedastic model [Retout et al. (2003)]: M_{2,β}(ξ) = (1/σ^2) F_β^T V_β^{-1} F_β + S, where S ≥ 0 with S_{j,k} = (1/2) Tr[V_β^{-1} ∂V_β/∂β_j V_β^{-1} ∂V_β/∂β_k], j, k = 1, ..., p.

10 Example: observations Y_i = exp(β_i) + ε_i with β_i ~ N(β, d) and ε_i ~ N(0, 1) yield M_{1,β} = exp(2β)/(1 + d exp(2β)) and M_{2,β} = exp(2β)/(1 + d exp(2β)) + 2d^2 exp(4β)/(1 + d exp(2β))^2.
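These two closed-form expressions are easy to evaluate numerically; a minimal sketch (σ^2 = 1 as on the slide, β and d chosen purely for illustration):

```python
import numpy as np

def M1(beta, d):
    # linear-mixed-model approximation: F^2 / V, with F = e^beta, V = 1 + d e^{2 beta}
    return np.exp(2 * beta) / (1 + d * np.exp(2 * beta))

def M2(beta, d):
    # adds the variance term, which here evaluates to 2 d^2 e^{4 beta} / V^2
    V = 1 + d * np.exp(2 * beta)
    return M1(beta, d) + 2 * d**2 * np.exp(4 * beta) / V**2

print(M1(0.5, 0.3), M2(0.5, 0.3))
```

For d = 0 (no random effect) the extra term vanishes and the two approximations coincide; for d > 0, M_{2,β} is strictly larger because it also counts the information carried by the response variance.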

11 Experimental settings ξ_i for the individuals: x_{ij} ∈ X, ξ_i = (x_{i1}, ..., x_{im_i}) ∈ X^{m_i}. (Alternative: approximate individual designs; not considered here!) Population design ζ = ((ξ_1, ..., ξ_k); (ω_1, ..., ω_k)) with Σ_{j=1}^k ω_j = 1, where ω_j is the weight of the individual design ξ_j in the population. A population design corresponds to an approximate design on X^m.

12 Individual information: M_β(ξ_i). Population information: M_{β;pop}(ζ) = Σ_{i=1}^k ω_i M_β(ξ_i). A population design ζ* is D-optimal if and only if Tr(M_{β;pop}(ζ*)^{-1} M_β(ξ)) ≤ p for all ξ ∈ X^m. Efficiency: δ(ζ) := (det(M_{β;pop}(ζ)) / det(M_{β;pop}(ζ*)))^{1/p}.
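The population information and the D-efficiency are direct to compute once the individual information matrices are available. The sketch below uses two made-up 2x2 individual information matrices and compares two weightings of the same designs; the numbers are purely illustrative.

```python
import numpy as np

def pop_info(Ms, weights):
    # M_pop(zeta) = sum_i w_i * M(xi_i)
    return sum(w * M for w, M in zip(weights, Ms))

def d_efficiency(M_zeta, M_ref):
    # relative D-efficiency of one population design against a reference design
    p = M_zeta.shape[0]
    return (np.linalg.det(M_zeta) / np.linalg.det(M_ref)) ** (1.0 / p)

# hypothetical individual information matrices for two candidate designs
Ms = [np.array([[2.0, 0.3], [0.3, 1.0]]),
      np.array([[0.5, 0.1], [0.1, 2.5]])]
M_a = pop_info(Ms, [0.5, 0.5])   # balanced weighting
M_b = pop_info(Ms, [0.9, 0.1])   # weighting concentrated on the first design

print(d_efficiency(M_b, M_a))
```

An efficiency below 1 means the second weighting would need proportionally more individuals to match the determinant criterion of the first.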

13 Example [Schmelter (2007)]: Y_i = η(β_i, x_i) exp(ε_i), with ε_i ~ N(0, σ^2), η(β_i, x_i) := (β_{1,i} β_{3,i})/(β_{1,i} − β_{2,i}) [exp(−β_{2,i} β_{3,i} x_i) − exp(−β_{1,i} x_i)], and β_{i,k} = β_k exp(b_{ik}) with b_{ik} ~ N(0, σ^2 d_k), k = 1, 2, 3.

14 Example: the approximation matters. M_{1,β}-optimal: ζ_1 = ((0.10), (4.18), (24.00)), δ(ζ_1) = 0.66. M_{2,β}-optimal: ζ_2 = ((3.10), (5.18), (24.00)), δ(ζ_2) = 0.55.

15 Remember: Y_i = η(β_i, ξ_i) + ε_i, β_i ~ N(β, σ^2 D), ε_i ~ N(0, σ^2 I_{m_i}). Log-likelihood function: l(β; y_i) := log(f_{Y_i}(y_i)), with f_{Y_i}(y_i) := (1/c_1) ∫_{R^p} exp(−l(β_i, β; y_i)/(2σ^2)) dβ_i and l(β_i, β; y_i) := (y_i − η(β_i, ξ_i))^T (y_i − η(β_i, ξ_i)) + (β_i − β)^T D^{-1} (β_i − β). We search M_β = E(∂l(β; Y_i)/∂β ∂l(β; Y_i)/∂β^T).

16 Alternative representation. With E_{y_i}(β_i) := E(β_i | Y_i = y_i): ∂l(β; y_i)/∂β = (1/σ^2) D^{-1} (E_{y_i}(β_i) − β). With Var_{y_i}(β_i) := Var(β_i | Y_i = y_i) it follows that M_β = (1/σ^2) D^{-1} Var(E_{Y_i}(β_i)) D^{-1} (1/σ^2) = (1/σ^2) D^{-1} − (1/σ^2) D^{-1} E(Var_{Y_i}(β_i)) D^{-1} (1/σ^2). ⇒ Calculation of conditional moments.
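For the scalar example used later (Y_i = exp(β_i) + ε_i, σ^2 = 1, D = d), this conditional-moment representation can be checked by brute force: compute E(β_i | y) by numerical integration, simulate Y, and form Var(E(β_i | Y))/d^2, which is the Fisher information since the score is (E(β_i | y) − β)/d. The values of β and d below are made up; for a small d the result should be close to the linearized value M_{1,β}.

```python
# Monte-Carlo check of M_beta = Var(E(b | Y)) / d^2 for Y = exp(b) + eps,
# b ~ N(beta, d), eps ~ N(0, 1)  (sigma^2 = 1, D = d; illustrative beta, d).
import numpy as np

beta, d = 0.0, 0.01
rng = np.random.default_rng(1)
grid = np.linspace(beta - 6 * np.sqrt(d), beta + 6 * np.sqrt(d), 801)
prior = np.exp(-(grid - beta) ** 2 / (2 * d))          # unnormalised prior

def cond_mean(y):
    w = prior * np.exp(-(y - np.exp(grid)) ** 2 / 2)   # unnormalised posterior
    return np.sum(grid * w) / np.sum(w)                # grid spacing cancels

b = rng.normal(beta, np.sqrt(d), 4000)
y = np.exp(b) + rng.normal(0.0, 1.0, 4000)
M_hat = np.var([cond_mean(yi) for yi in y]) / d**2

M1 = np.exp(2 * beta) / (1 + d * np.exp(2 * beta))     # linearised reference
print(M_hat, M1)
```

With d this small the model is nearly linear in β_i, so the simulated information and the linearization agree; the later slides show how this agreement breaks down as d grows.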

17 Remark: more general inter-individual model. For a differentiable function g: R^{p_1} → R^p with β_i = g(β) + b_i, β ∈ R^{p_1} and b_i ~ N_p(0, σ^2 D), and with the (p × p_1) Jacobian matrix G(β), it follows that M_β = G(β)^T (1/σ^2) D^{-1} Var(E_{Y_i}(β_i)) D^{-1} (1/σ^2) G(β) = G(β)^T ((1/σ^2) D^{-1} − (1/σ^2) D^{-1} E(Var_{Y_i}(β_i)) D^{-1} (1/σ^2)) G(β).

18 Calculation of conditional moments: Quadrature rules: dependence on the experimental settings? Simulations: computationally intensive. Laplace approximation: accuracy?

19 Laplace approximation. Problem in nonlinear mixed models: E(E_{Y_i}(β_i) E_{Y_i}(β_i)^T). Representation of E_{y_i}(β_i) given y_i: E_{y_i}(β_i) = ∫ β_i exp(−l(β_i, β; y_i)/(2σ^2)) dβ_i / ∫ exp(−l(β_i, β; y_i)/(2σ^2)) dβ_i, with l(β_i, β; y_i) := (y_i − η(β_i, ξ_i))^T (y_i − η(β_i, ξ_i)) + (β_i − β)^T D^{-1} (β_i − β). Laplace approximation of both integrals [Tierney and Kadane (1986)].

20 Laplace approximation. Idea: approximation of f_{Y_i}(y_i) = (1/c_1) ∫ exp(−l(β_i, β; y_i)/(2σ^2)) dβ_i around the point where l(β_i, β; y_i) is minimal. Second-order Taylor expansion around β_i* := argmin_{β_i ∈ R^p} l(β_i, β; y_i); the linear part vanishes, leaving a normal density: f_{Y_i}(y_i) ≈ (1/c_2(y_i, β)) exp(−l(β_i*, β; y_i)/(2σ^2)). Problem: dependence of β_i* on y_i.
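The idea can be sketched for the scalar model Y = exp(b) + ε, b ~ N(β, d), ε ~ N(0, 1): minimize the exponent, expand it to second order at the minimizer, and compare the resulting Gaussian integral with brute-force numerical integration. All numbers below are illustrative.

```python
# Laplace approximation of f_Y(y) versus numerical integration (sketch).
import numpy as np

beta, d, y = 0.0, 0.1, 1.2

def h(b):  # negative log integrand, up to the constant 1 / (2 pi sqrt(d))
    return 0.5 * (y - np.exp(b)) ** 2 + 0.5 * (b - beta) ** 2 / d

grid = np.linspace(beta - 8 * np.sqrt(d), beta + 8 * np.sqrt(d), 20001)
dx = grid[1] - grid[0]
const = 1.0 / (2 * np.pi * np.sqrt(d))

f_numeric = const * np.sum(np.exp(-h(grid))) * dx      # brute-force integral

b_star = grid[np.argmin(h(grid))]                      # minimizer of h
# analytic second derivative of h at b_star
h2 = np.exp(2 * b_star) - np.exp(b_star) * (y - np.exp(b_star)) + 1.0 / d
f_laplace = const * np.exp(-h(b_star)) * np.sqrt(2 * np.pi / h2)

print(f_numeric, f_laplace)
```

The linear term of the expansion vanishes at the minimizer, so the remaining integrand is a Gaussian in b whose normalizing constant gives the closed-form approximation; for these values the two numbers agree closely.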

21 For approximating f_{β_i | Y_i = y_i} = φ_{Y_i | β_i}(y_i) φ_{β_i}(β_i) / f_{Y_i}(y_i), apply a similar approximation to the numerator: β_i | Y_i = y_i approximately ~ N(β_i*, σ^2 M_{β_i*}^{-1}), with M_{β_i*} := (1/2) ∂^2 l(β_i, β; y_i)/∂β_i ∂β_i^T, evaluated at β_i = β_i*. Problems: dependence of β_i* on y_i; dependence of M_{β_i*}^{-1} on y_i.

22 Alternative: approximate only η(β_i, ξ_i) in l(β_i, β; y_i): η(β_i, ξ_i) ≈ η(β̂_i, ξ_i) + F_{β̂_i}(β_i − β̂_i), with some β̂_i ∈ R^p. The same approximation of the numerator yields β_i | Y_i = y_i approximately ~ N(μ(y_i, β̂_i, β), σ^2 M_{β̂_i}^{-1}), with M_{β̂_i} := F_{β̂_i}^T F_{β̂_i} + D^{-1} and μ(y_i, β̂_i, β) := M_{β̂_i}^{-1} (F_{β̂_i}^T (y_i − η(β̂_i, ξ_i)) + F_{β̂_i}^T F_{β̂_i} β̂_i + D^{-1} β). Appropriate choice of β̂_i?

23 For β̂_i = β_i* minimizing l(β_i, β; y_i): β_i | Y_i = y_i approximately ~ N(β_i*, σ^2 M_{β_i*}^{-1}). Problem: dependence on y_i. For β̂_i = β: β_i | Y_i = y_i approximately ~ N(β + M_β^{-1} F_β^T (y_i − η(β, ξ_i)), σ^2 M_β^{-1}). For β̂_i = β_i: β_i | Y_i = y_i approximately ~ N(μ(y_i, β_i, β), σ^2 M_{β_i}^{-1}).

24 Fisher information approximations in β̂_i = β. With V_β := I_{m_i} + F_β D F_β^T: via the conditional variance, M_β = (1/σ^2) D^{-1} − (1/σ^2) D^{-1} E(Var_{Y_i}(β_i)) D^{-1} (1/σ^2) ≈ (1/σ^2) F_β^T V_β^{-1} F_β =: M_{1,β}; via the conditional expectation, M_β = (1/σ^2) D^{-1} Var(E_{Y_i}(β_i)) D^{-1} (1/σ^2) ≈ (1/σ^4) F_β^T V_β^{-1} Var(Y_i) V_β^{-1} F_β =: M_{3,β}.

25 Fisher information approximation in β̂_i = β_i. With V_{β_i} := I_{m_i} + F_{β_i} D F_{β_i}^T, the conditional variance yields M_{4,β_i} := (1/σ^2) F_{β_i}^T V_{β_i}^{-1} F_{β_i}; we consider here M_{4,β} := (1/σ^2) E(F_{β_i}^T V_{β_i}^{-1} F_{β_i}). Some other possible approximations: nonlinear heteroscedastic normal model: M_{2,β} := (1/σ^2) F_β^T V_β^{-1} F_β + S. Quasi-likelihood: M_{5,β} := F_{β,QL}^T Var(Y_i)^{-1} F_{β,QL}.

26 Summary. With V_{β_i} := I_{m_i} + F_{β_i} D F_{β_i}^T: M_{1,β} := (1/σ^2) F_β^T V_β^{-1} F_β; M_{2,β} := (1/σ^2) F_β^T V_β^{-1} F_β + S; M_{3,β} := (1/σ^4) F_β^T V_β^{-1} Var(Y_i) V_β^{-1} F_β; M_{4,β} := (1/σ^2) E(F_{β_i}^T V_{β_i}^{-1} F_{β_i}); M_{5,β} := F_{β,QL}^T Var(Y_i)^{-1} F_{β,QL}.

27 Example: log-concentration model Y_i = β_{2,i} x_i exp(β_{1,i} β_{2,i}) + ε_i, i = 1, ..., N, with ε_i ~ N(0, σ^2), ξ_i = (x_i) ∈ [t_min, t_max], β_i = (β_{1,i}, β_{2,i})^T ~ N(β, σ^2 D), β = (β_1, β_2)^T, D = diag(d_1, d_2).

28 [Figure; legend: M_{1,β}: red, M_{2,β}: blue, M_{3,β}: yellow, M_{5,β}: green.]

29 For the information matrices M_{1,β}(β) or M_{5,β}(β): for every pair of experimental settings x_1, x_2 ∈ [t_min, t_max] with α(x_1, x_2) := σ^2 + cov(Y(x_1), Y(x_2)) = 0, the population design ζ_1 = ((x_1), (x_2)) is D-optimal in the case of one allowed observation per individual.

30 How to see this. Equivalence theorem for D-optimality: ζ* optimal ⟺ g_{ζ*}(ξ) := Tr(M_{β;pop}(ζ*)^{-1} M_β(ξ)) ≤ p for all ξ ∈ X^m. With a general nonsingular M(ζ) = ((a, b), (b, c)) and ξ = x: g_ζ(x) := F_β(x) ((c, −b), (−b, a)) F_β(x)^T / ((ac − b^2) V_β(x)) ≤ 2, for 2 support points with weight ω = 0.5. For ζ = ((x_1), (x_2)): g_ζ(x) − 2 = α(x_1, x_2)/(V(x_1) V(x_2)) (x − x_2)(x − x_1), so α(x_1, x_2) = 0 makes the bound hold everywhere.

31 M_{1,β}-optimal: ζ_1 = ((1.13), (11.98)), δ*(ζ_1) = ; M_{2,β}-optimal: ζ_2 = ((0.96), (6.00)), δ*(ζ_2) = ; M_{3,β}-optimal: ζ_3 = ((1.70), (24.00)), δ*(ζ_3) = ; M_{5,β}-optimal: ζ_5 = ((0.93), (11.99)), δ*(ζ_5) =

32 Example: observations Y_i = exp(β_i) + ε_i with β_i ~ N(β, d) and ε_i ~ N(0, 1) yield M_{1,β} = exp(2β)/(1 + d exp(2β)), M_{2,β} = exp(2β)/(1 + d exp(2β)) + 2d^2 exp(4β)/(1 + d exp(2β))^2, M_{3,β} = exp(2β)(exp(2β + d)(exp(d) − 1) + 1)/(1 + d exp(2β))^2 and M_{5,β} = exp(2β + d)/(exp(2β + d)(exp(d) − 1) + 1).
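The four closed forms can be tabulated to see how the approximations drift apart as the inter-individual variance d grows; a small sketch with an illustrative β = 0:

```python
import numpy as np

def approximations(beta, d):
    # closed-form M1, M2, M3, M5 for Y = exp(b) + eps, b ~ N(beta, d), eps ~ N(0, 1)
    e2, V = np.exp(2 * beta), 1 + d * np.exp(2 * beta)
    M1 = e2 / V
    M2 = M1 + 2 * d**2 * e2**2 / V**2
    M3 = e2 * (np.exp(2 * beta + d) * (np.exp(d) - 1) + 1) / V**2
    M5 = np.exp(2 * beta + d) / (np.exp(2 * beta + d) * (np.exp(d) - 1) + 1)
    return M1, M2, M3, M5

for d in (0.01, 0.5, 2.0):
    print(d, approximations(0.0, d))
```

For d near 0 all four values coincide, which matches the intuition that every approximation reduces to the same (nearly linear) model; for a large d they differ by orders of magnitude, which is the point of the comparison plots that follow.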

33 [Figure] Y_i = exp(β_i) + ε_i; β_i ~ N(−2, d), d = ρ/(1 − ρ); ε_i ~ N(0, 1). Red: simulated Fisher information; blue: M_{4,β}. M_{1,β}: solid, M_{2,β}: dashed, M_{3,β}: dot-dash, M_{5,β}: dotted.

34 [Figure] Y_i = exp(β_i) + ε_i; β_i ~ N(−1, d), d = ρ/(1 − ρ); ε_i ~ N(0, 1). Red: simulated Fisher information; blue: M_{4,β}. M_{1,β}: solid, M_{2,β}: dashed, M_{3,β}: dot-dash, M_{5,β}: dotted.

35 [Figure] Y_i = exp(β_i) + ε_i; β_i ~ N(0, d), d = ρ/(1 − ρ); ε_i ~ N(0, 1). Red: simulated Fisher information; blue: M_{4,β}. M_{1,β}: solid, M_{2,β}: dashed, M_{3,β}: dot-dash, M_{5,β}: dotted.

36 [Figure] Y_i = exp(β_i) + ε_i; β_i ~ N(1, d), d = ρ/(1 − ρ); ε_i ~ N(0, 1). Red: simulated Fisher information; blue: M_{4,β}. M_{1,β}: solid, M_{2,β}: dashed, M_{3,β}: dot-dash, M_{5,β}: dotted.

37 [Figure] Y_i = exp(β_i) + ε_i; β_i ~ N(2, d), d = ρ/(1 − ρ); ε_i ~ N(0, 1). Red: simulated Fisher information; blue: M_{4,β}. M_{1,β}: solid, M_{2,β}: dashed, M_{3,β}: dot-dash, M_{5,β}: dotted.

38 Remember: Var(β_i) = E(Var_{Y_i}(β_i)) + Var(E_{Y_i}(β_i)), so M_β = (1/σ^2) D^{-1} Var(E_{Y_i}(β_i)) D^{-1} (1/σ^2) = (1/σ^2) D^{-1} − (1/σ^2) D^{-1} E(Var_{Y_i}(β_i)) D^{-1} (1/σ^2). Some explanation: d → ∞ yields M_β → 0; d → 0 yields the nonlinear regression case, M_β → (1/σ^2) F_β^T F_β.
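Both limiting cases can be verified numerically for the linearized information M_{1,β} of the scalar example (σ^2 = 1; the value of β is illustrative):

```python
import numpy as np

def M1(beta, d):
    # (1/sigma^2) F^T V^{-1} F for the scalar model: sigma^2 = 1, F = e^beta
    return np.exp(2 * beta) / (1 + d * np.exp(2 * beta))

beta = 0.7
print(M1(beta, 1e-12))   # d -> 0: approaches e^{2 beta}, the nonlinear regression value
print(M1(beta, 1e12))    # d -> infinity: approaches 0, no information about beta remains
```

Intuitively, with no inter-individual variability every observation informs directly about β, while with unbounded variability the population mean is drowned out by the random effects.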

39 Conclusions/Outlook. Conclusions: - Motivation of the model approximation. - Different information approximations yield different designs. - Big influence of the inter-individual variance on the accuracy of the approximations. Outlook: - More insight needed on the appropriateness of the approximations and on the approximation in β_i* or β_i. - Inclusion of the variance parameters.

40 Thank you for your attention! Further information needed (M_{6,β}, ...)?


More information

X t = a t + r t, (7.1)

X t = a t + r t, (7.1) Chapter 7 State Space Models 71 Introduction State Space models, developed over the past 10 20 years, are alternative models for time series They include both the ARIMA models of Chapters 3 6 and the Classical

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Chapter 8: Estimation 1

Chapter 8: Estimation 1 Chapter 8: Estimation 1 Jae-Kwang Kim Iowa State University Fall, 2014 Kim (ISU) Ch. 8: Estimation 1 Fall, 2014 1 / 33 Introduction 1 Introduction 2 Ratio estimation 3 Regression estimator Kim (ISU) Ch.

More information

Lecture 13: Simple Linear Regression in Matrix Format

Lecture 13: Simple Linear Regression in Matrix Format See updates and corrections at http://www.stat.cmu.edu/~cshalizi/mreg/ Lecture 13: Simple Linear Regression in Matrix Format 36-401, Section B, Fall 2015 13 October 2015 Contents 1 Least Squares in Matrix

More information

Instrumental Variables and Two-Stage Least Squares

Instrumental Variables and Two-Stage Least Squares Instrumental Variables and Two-Stage Least Squares Generalised Least Squares Professor Menelaos Karanasos December 2011 Generalised Least Squares: Assume that the postulated model is y = Xb + e, (1) where

More information

GARCH Models Estimation and Inference

GARCH Models Estimation and Inference GARCH Models Estimation and Inference Eduardo Rossi University of Pavia December 013 Rossi GARCH Financial Econometrics - 013 1 / 1 Likelihood function The procedure most often used in estimating θ 0 in

More information

Biostat Methods STAT 5820/6910 Handout #5a: Misc. Issues in Logistic Regression

Biostat Methods STAT 5820/6910 Handout #5a: Misc. Issues in Logistic Regression Biostat Methods STAT 5820/6910 Handout #5a: Misc. Issues in Logistic Regression Recall general χ 2 test setu: Y 0 1 Trt 0 a b Trt 1 c d I. Basic logistic regression Previously (Handout 4a): χ 2 test of

More information

High-dimensional regression modeling

High-dimensional regression modeling High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making

More information

From last time... The equations

From last time... The equations This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,

More information

Robust Testing and Variable Selection for High-Dimensional Time Series

Robust Testing and Variable Selection for High-Dimensional Time Series Robust Testing and Variable Selection for High-Dimensional Time Series Ruey S. Tsay Booth School of Business, University of Chicago May, 2017 Ruey S. Tsay HTS 1 / 36 Outline 1 Focus on high-dimensional

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions.

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions. Large Sample Theory Study approximate behaviour of ˆθ by studying the function U. Notice U is sum of independent random variables. Theorem: If Y 1, Y 2,... are iid with mean µ then Yi n µ Called law of

More information

LIST OF FORMULAS FOR STK1100 AND STK1110

LIST OF FORMULAS FOR STK1100 AND STK1110 LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function

More information

Machine Learning. Regression Case Part 1. Linear Regression. Machine Learning: Regression Case.

Machine Learning. Regression Case Part 1. Linear Regression. Machine Learning: Regression Case. .. Cal Poly CSC 566 Advanced Data Mining Alexander Dekhtyar.. Machine Learning. Regression Case Part 1. Linear Regression Machine Learning: Regression Case. Dataset. Consider a collection of features X

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Inference in non-linear time series

Inference in non-linear time series Intro LS MLE Other Erik Lindström Centre for Mathematical Sciences Lund University LU/LTH & DTU Intro LS MLE Other General Properties Popular estimatiors Overview Introduction General Properties Estimators

More information

Robust design in model-based analysis of longitudinal clinical data

Robust design in model-based analysis of longitudinal clinical data Robust design in model-based analysis of longitudinal clinical data Giulia Lestini, Sebastian Ueckert, France Mentré IAME UMR 1137, INSERM, University Paris Diderot, France PODE, June 0 016 Context Optimal

More information

Day 4: Shrinkage Estimators

Day 4: Shrinkage Estimators Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have

More information

Regression Shrinkage and Selection via the Lasso

Regression Shrinkage and Selection via the Lasso Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,

More information

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

Sparsity Regularization

Sparsity Regularization Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Topics in Probability and Statistics

Topics in Probability and Statistics Topics in Probability and tatistics A Fundamental Construction uppose {, P } is a sample space (with probability P), and suppose X : R is a random variable. The distribution of X is the probability P X

More information

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:

More information

Is the cholesterol concentration in blood related to the body mass index (bmi)?

Is the cholesterol concentration in blood related to the body mass index (bmi)? Regression problems The fundamental (statistical) problems of regression are to decide if an explanatory variable affects the response variable and estimate the magnitude of the effect Major question:

More information

2. Regression Review

2. Regression Review 2. Regression Review 2.1 The Regression Model The general form of the regression model y t = f(x t, β) + ε t where x t = (x t1,, x tp ), β = (β 1,..., β m ). ε t is a random variable, Eε t = 0, Var(ε t

More information

Unsupervised dimensionality reduction

Unsupervised dimensionality reduction Unsupervised dimensionality reduction Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 Guillaume Obozinski Unsupervised dimensionality reduction 1/30 Outline 1 PCA 2 Kernel PCA 3 Multidimensional

More information