Accurate Maximum Likelihood Estimation for Parametric Population Analysis Bob Leary UCSD/SDSC and LAPK, USC School of Medicine
Why use parametric maximum likelihood estimators?
- Consistency: θ̂_ML → θ_TRUE as N → ∞
- Asymptotic efficiency: var(θ̂_ML) / var(θ̂_OTHER) ≤ 1 as N → ∞
- Well-developed asymptotic theory allows hypothesis testing
But most current parametric methods maximize only approximate likelihoods (MAL): FO, FOCE, Laplace
MAL estimation can cause serious degradation of statistical performance
- Not consistent: θ̂_MAL → θ_TRUE + bias as N → ∞
- Not asymptotically efficient: var(θ̂_MAL) / var(θ̂_ML) > 1 as N → ∞
- No asymptotic theory: using ML asymptotic theory for the MAL case can be VERY misleading!
Statistical efficiency definition
- Relative statistical efficiency: efficiency(θ̂_MAL) = var(θ̂_ML) / var(θ̂_MAL)
- Statistical efficiencies can be evaluated by Monte Carlo simulations
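The Monte Carlo evaluation mentioned above can be sketched with a pair of estimators whose relative efficiency is known in closed form (the sample mean vs. the sample median of normal data, with asymptotic efficiency 2/π ≈ 0.64, rather than the MAL/ML pair from the talk; the comparison works the same way with the population estimators):

```python
import numpy as np

# Monte Carlo estimate of relative statistical efficiency:
# efficiency(other) = var(theta_ML) / var(theta_other).
# For normal data the sample mean is the ML estimator of the location,
# and the median's efficiency is close to 2/pi ~ 0.64.
rng = np.random.default_rng(0)
n_reps, n = 20_000, 101
samples = rng.normal(loc=0.0, scale=1.0, size=(n_reps, n))

var_mean = samples.mean(axis=1).var()          # var of the ML estimator
var_median = np.median(samples, axis=1).var()  # var of the competing estimator

efficiency = var_mean / var_median
print(f"estimated efficiency of the median: {efficiency:.2f}")
```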
Simulation conditions
- Simple 1-compartment IV bolus model with two random effects, V and K: C(t) = (1/V) e^(-Kt)
- (V, K) ~ N(µ, Σ), 25% inter-individual relative standard deviation in each of V and K
- Y_i = C(t_i)(1 + e_i), e_i ~ N(0, σ²), intra-individual relative σ = 0.10
- 2 observations/subject (sparse data)
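One replicate of a sparse-data set under these conditions can be generated as follows (a minimal sketch; the number of subjects, the unit dose, the population means, and the two sampling times are illustrative assumptions, not taken from the talk):

```python
import numpy as np

# Sketch of the simulation conditions: 1-compartment IV bolus with unit dose,
# C(t) = (1/V) * exp(-K*t), proportional residual error, 2 samples/subject.
rng = np.random.default_rng(1)
n_subjects = 25                                # assumed subject count
times = np.array([1.0, 6.0])                   # assumed sparse sampling times

mu = np.array([20.0, 0.1])                     # assumed population means of (V, K)
cv = 0.25                                      # 25% inter-individual relative SD
Sigma = np.diag((cv * mu) ** 2)
sigma = 0.10                                   # 10% intra-individual relative SD

VK = rng.multivariate_normal(mu, Sigma, size=n_subjects)
V, K = VK[:, 0:1], VK[:, 1:2]
C = (1.0 / V) * np.exp(-K * times)             # model predictions, shape (25, 2)
Y = C * (1.0 + sigma * rng.standard_normal(C.shape))  # observed concentrations
print(Y.shape)
```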
Approximate likelihoods can destroy statistical efficiency
[Figure: overlaid histograms of the PEM estimators (white) and the NONMEM FO estimators (blue).]
FOCE does better, but still has <40% efficiency relative to ML
[Figure: histograms of the NPAG estimators (red) and the NONMEM-FOCE estimators (blue).]
Comparative statistical efficiencies, 2 obs/subject, 10% observational error
Approximations add random error
- θ̂_MAL = θ̂_ML + (θ̂_MAL − θ̂_ML)
- var(θ̂_FO − θ̂_ML) = 80 var(θ̂_ML)
- var(θ̂_FOCE − θ̂_ML) = 1.2 var(θ̂_ML)
PEM Estimation
- Simplest case: just random effects (no fixed effects related to covariates)
- Population distribution of the random effects vector η at the lowest level is N(µ, Σ), so the parameters to be estimated are θ = (µ, Σ)
- We are given likelihood functions l_i(y_i | η, σ²)
PEM algorithm
- For the current population distribution iterate N(µ_j, Σ_j), compute the posterior distributions f_i^post = l_i(y_i | η) f(η | µ_j, Σ_j) / Q_i
- Compute (µ_{j+1}, Σ_{j+1}) as the mean and covariance of the posterior mixture distribution f^post = (1/N) Σ_{i=1}^N f_i^post
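One PEM update step can be sketched as follows, assuming the integrals over η are approximated on a common set of M support points η_k sampled from the current N(µ_j, Σ_j); the function name and the toy subject likelihoods are illustrative, not from the talk:

```python
import numpy as np

def pem_step(lik, eta):
    """One PEM update. lik[i, k] = l_i(y_i | eta_k) for support points eta_k
    drawn from the current N(mu_j, Sigma_j); the prior factor f(eta|mu_j,Sigma_j)
    cancels in the self-normalized posterior weights because the points were
    sampled from it."""
    w = lik / lik.sum(axis=1, keepdims=True)       # posterior weights, rows sum to 1
    w_mix = w.mean(axis=0)                         # mixture weights over the N subjects
    mu_next = w_mix @ eta                          # mean of the posterior mixture
    diff = eta - mu_next
    Sigma_next = (w_mix[:, None] * diff).T @ diff  # covariance of the posterior mixture
    return mu_next, Sigma_next

# Toy usage: 50 "subjects", each contributing a Gaussian likelihood in eta.
rng = np.random.default_rng(2)
mu_j, Sigma_j = np.zeros(2), np.eye(2)
eta = rng.multivariate_normal(mu_j, Sigma_j, size=2000)           # support points
centers = rng.multivariate_normal([0.5, -0.3], 0.2 * np.eye(2), size=50)
lik = np.exp(-0.5 * ((eta[None, :, :] - centers[:, None, :]) ** 2).sum(-1) / 0.25)
mu_next, Sigma_next = pem_step(lik, eta)
print(mu_next, Sigma_next)
```

The update pulls (µ, Σ) toward the posterior mixture, which is the monotone-likelihood EM step described on the next slide.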
Likelihoods
- The log likelihood of the j-th iterate is log L(µ_j, Σ_j) = Σ_{i=1}^N log Q_i
- Schumitzky (1993) showed log L(µ_{j+1}, Σ_{j+1}) ≥ log L(µ_j, Σ_j)
Likelihood convergence FOCE vs PEM
Numerical integration methods
- Q_i = ∫ l_i(y_i | η) f(η | µ_j, Σ_j) dη ≈ (1/M) Σ_{k=1}^M l_i(y_i | η_k) for M samples η_k drawn from N(µ_j, Σ_j)
- The η_k can be Monte Carlo (pseudorandom) samples, Gauss-Hermite grid points, or "quasi-random" samples (low-discrepancy sequences)
1000 pseudo- vs quasi-random samples from a bivariate N(0, I)
[Figure: scatter plots of the two sample sets.]
- Pseudorandom: sample covariance Σ_pseudo = [1.065 −0.064; −0.064 0.983], integration error ~ 1/M^(1/2)
- Quasi-random: sample covariance Σ_quasi = [1.003 0.004; 0.004 0.998], integration error ~ 1/M
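The 1/M^(1/2) vs 1/M error behavior is easy to reproduce; a sketch using SciPy's quasi-Monte Carlo module (scrambled Sobol points mapped to N(0, I) through the normal inverse CDF; the smooth test integrand is arbitrary, chosen only because its exact mean is known):

```python
import numpy as np
from scipy.stats import norm, qmc

# Estimate E[f(eta)] for eta ~ N(0, I) in 2D. For this integrand the exact
# value is 1/2, so each rule's integration error can be read off directly.
f = lambda x: np.exp(-0.5 * (x ** 2).sum(axis=1))
exact = 0.5
M = 4096                                              # power of 2 for Sobol

rng = np.random.default_rng(3)
eta_pseudo = rng.standard_normal((M, 2))              # pseudorandom N(0, I) samples
u = qmc.Sobol(d=2, scramble=True, seed=3).random(M)   # low-discrepancy points in [0,1)^2
eta_quasi = norm.ppf(u)                               # mapped to N(0, I)

err_pseudo = abs(f(eta_pseudo).mean() - exact)
err_quasi = abs(f(eta_quasi).mean() - exact)
print(f"pseudo: {err_pseudo:.2e}  quasi: {err_quasi:.2e}")
```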
Recent blind trial Pop PK method comparison (Girard et al., 2004)
- Model: response = E_0 e^{η1} + (E_max e^{η2} · dose) / (ED50 · f(sex) · e^{η3} + dose), with f(male) = 1, f(female) = θ
- Estimate all 8 parameters with standard errors and a p-value for the null hypothesis θ = 1
Profile likelihood method
Conclusions
- Likelihood approximations such as FO and FOCE significantly degrade statistical results, particularly in the sparse data case
- Accurate likelihood methods such as PEM perform much better in the sparse data case than approximate methods
- Accurate likelihood methods such as PEM are feasible with current computational technology