Accurate Maximum Likelihood Estimation for Parametric Population Analysis. Bob Leary UCSD/SDSC and LAPK, USC School of Medicine


Why use parametric maximum likelihood estimators?
- Consistency: $\hat{\theta}_{ML} \to \theta_{TRUE}$ as $N \to \infty$
- Asymptotic efficiency: $\mathrm{var}(\hat{\theta}_{ML}) / \mathrm{var}(\hat{\theta}_{OTHER}) \le 1$ as $N \to \infty$
- A well-developed asymptotic theory allows hypothesis testing

But most current parametric methods maximize only approximate likelihoods (MAL): FO, FOCE, Laplace

MAL estimation can cause serious degradation of statistical performance
- Not consistent: $\hat{\theta}_{MAL} \to \theta_{TRUE} + \text{bias}$ as $N \to \infty$
- Not asymptotically efficient: $\mathrm{var}(\hat{\theta}_{MAL}) / \mathrm{var}(\hat{\theta}_{ML}) > 1$ as $N \to \infty$
- No asymptotic theory; using ML asymptotic theory in the MAL case can be VERY misleading!

Statistical efficiency definition
Relative statistical efficiency: $\text{efficiency}(\hat{\theta}_{MAL}) = \mathrm{var}(\hat{\theta}_{ML}) / \mathrm{var}(\hat{\theta}_{MAL})$
Statistical efficiencies can be evaluated by Monte Carlo simulation
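A minimal sketch of such a Monte Carlo evaluation in Python, assuming hypothetical `simulate`, `fit_ml`, and `fit_mal` callables (placeholders for a data simulator and the two fitting routines, not part of any named package):

```python
import numpy as np

def relative_efficiency(fit_ml, fit_mal, simulate, n_reps=500):
    """Monte Carlo estimate of relative statistical efficiency:
    var(theta_hat_ML) / var(theta_hat_MAL), over repeated simulated
    data sets. simulate(), fit_ml(), and fit_mal() are user-supplied."""
    est_ml, est_mal = [], []
    for _ in range(n_reps):
        data = simulate()              # draw one synthetic population data set
        est_ml.append(fit_ml(data))    # exact ML estimate of the parameter
        est_mal.append(fit_mal(data))  # approximate-likelihood (e.g. FO) estimate
    return np.var(est_ml, ddof=1) / np.var(est_mal, ddof=1)
```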

Simulation conditions
- Simple 1-compartment IV bolus model, two random effects V and K: $C(t) = (1/V)\,e^{-Kt}$
- $(V, K) \sim N(\mu, \Sigma)$, 25% inter-individual relative standard deviation in each of V, K
- $Y_i = C(t_i)(1 + e_i)$, $e_i \sim N(0, \sigma^2)$, intra-individual relative $\sigma = 0.10$, 2 observations/subject (sparse data)
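A minimal Python sketch of one simulated subject under these conditions; the population means and the two sampling times are illustrative assumptions (the slide does not specify them):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_subject(mu_V=1.0, mu_K=0.1, cv=0.25, sigma=0.10,
                     times=(1.0, 4.0)):
    """One subject from the slide's model: C(t) = (1/V) exp(-K t),
    (V, K) normal with 25% relative SD, proportional error sigma = 0.10.
    mu_V, mu_K, and times are illustrative, not from the slides."""
    V = rng.normal(mu_V, cv * mu_V)          # inter-individual variability in V
    K = rng.normal(mu_K, cv * mu_K)          # inter-individual variability in K
    t = np.asarray(times)
    C = (1.0 / V) * np.exp(-K * t)           # model-predicted concentrations
    e = rng.normal(0.0, sigma, size=t.size)  # intra-individual relative error
    return C * (1.0 + e)                     # observed Y_i = C(t_i)(1 + e_i)
```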

Approximate likelihoods can destroy statistical efficiency
[Figure: histograms of parameter estimates from repeated simulations — white: PEM estimators; blue: NONMEM FO estimators; horizontal axis roughly 0.04 to 0.18.]

FOCE does better, but still has <40% efficiency relative to ML
[Figure: histograms of parameter estimates — red: NPAG; blue: NONMEM-FOCE; horizontal axis roughly 0.05 to 0.08.]

Comparative statistical efficiencies, 2 obs/subject, 10% observational error

Approximations add random error
$$\hat{\theta}_{MAL} = \hat{\theta}_{ML} + (\hat{\theta}_{MAL} - \hat{\theta}_{ML})$$
$$\mathrm{var}(\hat{\theta}_{FO} - \hat{\theta}_{ML}) = 80\,\mathrm{var}(\hat{\theta}_{ML})$$
$$\mathrm{var}(\hat{\theta}_{FOCE} - \hat{\theta}_{ML}) = 1.2\,\mathrm{var}(\hat{\theta}_{ML})$$
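A rough implication, under the added assumption (not stated on the slide) that the approximation error is uncorrelated with $\hat{\theta}_{ML}$, so the variances add:
$$\mathrm{var}(\hat{\theta}_{FO}) \approx (1 + 80)\,\mathrm{var}(\hat{\theta}_{ML}) \;\Rightarrow\; \text{efficiency} \approx 1/81 \approx 1.2\%$$
$$\mathrm{var}(\hat{\theta}_{FOCE}) \approx (1 + 1.2)\,\mathrm{var}(\hat{\theta}_{ML}) \;\Rightarrow\; \text{efficiency} \approx 1/2.2 \approx 45\%$$
The latter is in the same ballpark as the <40% FOCE efficiency shown earlier; the gap reflects the correlation terms this rough calculation ignores.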

PEM Estimation
- Simplest case: just random effects (no fixed effects related to covariates)
- The population distribution of the random effects vector $\eta$ at the lowest level is $N(\mu, \Sigma)$, so the parameters to be estimated are $\theta = (\mu, \Sigma)$
- We are given likelihood functions $l_i(y_i \mid \eta, \sigma^2)$

PEM algorithm
For the current population distribution iterate $N(\mu_j, \Sigma_j)$, compute the posterior distributions
$$f_i^{post}(\eta) = \frac{l_i(y_i \mid \eta)\, f(\eta \mid \mu_j, \Sigma_j)}{Q_i}$$
Compute $(\mu_{j+1}, \Sigma_{j+1})$ as the mean and covariance of the posterior mixture distribution
$$f^{post} = \frac{1}{N} \sum_{i=1}^{N} f_i^{post}$$
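A sketch of one such iteration in Python, using Monte Carlo samples from the current $N(\mu_j, \Sigma_j)$ to approximate the $Q_i$; the `likelihoods` argument (one callable per subject) is an assumed interface for illustration, not an established API:

```python
import numpy as np

def pem_iteration(mu, Sigma, likelihoods, M=2000, rng=None):
    """One PEM update: sample eta_k ~ N(mu_j, Sigma_j), weight the samples
    by each subject's likelihood l_i(y_i | eta_k) to form the posteriors,
    then take the mean and covariance of the posterior mixture.
    likelihoods[i] maps an (M, d) array of eta samples to an (M,) array
    of likelihood values for subject i (an assumed interface)."""
    rng = rng or np.random.default_rng()
    eta = rng.multivariate_normal(mu, Sigma, size=M)  # samples from current iterate
    N = len(likelihoods)
    mix_w = np.zeros(M)  # weights defining the posterior mixture f_post
    loglik = 0.0
    for l_i in likelihoods:
        w = l_i(eta)                # l_i(y_i | eta_k) at each sample
        Q_i = w.mean()              # Monte Carlo estimate of Q_i
        loglik += np.log(Q_i)       # accumulates log L = sum_i log Q_i
        mix_w += (w / w.sum()) / N  # subject i's normalized posterior weights
    mu_new = eta.T @ mix_w          # mean of the posterior mixture
    d = eta - mu_new
    Sigma_new = (d * mix_w[:, None]).T @ d  # covariance of the posterior mixture
    return mu_new, Sigma_new, loglik
```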

Likelihoods
The likelihood of the $(j{+}1)$-th iterate is
$$\log L(\mu_{j+1}, \Sigma_{j+1}) = \sum_{i=1}^{N} \log Q_i$$
Schumitzky (1993) showed $\log L(\mu_{j+1}, \Sigma_{j+1}) \ge \log L(\mu_j, \Sigma_j)$

Likelihood convergence FOCE vs PEM

Numerical integration methods
$$Q_i = \int l_i(y_i \mid \eta)\, f(\eta \mid \mu_j, \Sigma_j)\, d\eta \approx \frac{1}{M} \sum_{k=1}^{M} l_i(y_i \mid \eta_k) \quad \text{for } M \text{ samples } \eta_k$$
The $\eta_k$ can be Monte Carlo (pseudorandom) samples, Gauss-Hermite grid points, or "quasi-random" samples (low-discrepancy sequences)

1000 pseudo- vs quasi-random samples from a bivariate $N(0, I)$
[Figure: two scatter plots of the samples — left: pseudorandom; right: quasi-random.]
$$\Sigma_{pseudo} = \begin{pmatrix} 1.065 & -0.064 \\ -0.064 & 0.983 \end{pmatrix}, \quad \text{integration error} \sim 1/M^{1/2}$$
$$\Sigma_{quasi} = \begin{pmatrix} 1.003 & 0.004 \\ 0.004 & 0.998 \end{pmatrix}, \quad \text{integration error} \sim 1/M$$
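This contrast is easy to reproduce with SciPy's quasi-Monte Carlo module; the choice of a scrambled Sobol sequence here is mine, since the slide does not say which low-discrepancy sequence was used:

```python
import numpy as np
from scipy.stats import norm, qmc

M = 1024  # near the slide's 1000; Sobol sequences balance best at powers of 2
rng = np.random.default_rng(0)

# Pseudorandom draws from a bivariate N(0, I)
pseudo = rng.standard_normal((M, 2))

# Quasi-random (scrambled Sobol) points in (0,1)^2, mapped to N(0, I)
# through the normal inverse CDF
quasi = norm.ppf(qmc.Sobol(d=2, scramble=True, seed=0).random(M))

print(np.cov(pseudo.T))  # sample covariance; deviates from I like 1/sqrt(M)
print(np.cov(quasi.T))   # much closer to I; integration error ~ 1/M
```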

Recent blind trial Pop PK method comparison (Girard et al., 2004)
Model:
$$\text{response} = E_0\, e^{\eta_1} + \frac{E_{max}\, e^{\eta_2}\cdot \text{dose}}{ED_{50}\, f(\text{sex})\, e^{\eta_3} + \text{dose}}, \qquad f(\text{male}) = 1,\; f(\text{female}) = \theta$$
Estimate all 8 parameters with standard errors and a p-value for the null hypothesis $\theta = 1$
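For concreteness, the reconstructed model as a Python function; the default parameter values are placeholders for illustration, not values from the trial:

```python
import numpy as np

def response(dose, sex, eta, E0=1.0, Emax=10.0, ED50=100.0, theta=1.0):
    """Emax model from the blind-trial slide, as reconstructed above.
    sex: 0 = male (f = 1), 1 = female (f = theta).
    eta: the three subject-level random effects (eta1, eta2, eta3)."""
    f_sex = 1.0 if sex == 0 else theta
    e1, e2, e3 = eta
    return (E0 * np.exp(e1)
            + Emax * np.exp(e2) * dose / (ED50 * f_sex * np.exp(e3) + dose))
```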

Profile likelihood method
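The slide gives no details, but the standard profile-likelihood construction for this test fixes $\theta = 1$, re-maximizes the remaining parameters, and compares $-2\log L$ values against a chi-square(1) reference; a minimal sketch under that generic construction:

```python
from scipy.stats import chi2

def profile_pvalue(neg2ll_full, neg2ll_theta1):
    """Likelihood-ratio p-value for H0: theta = 1. Inputs are the
    minimized -2 log-likelihoods of the full model and of the model
    refit with theta fixed at 1 (how the slides obtained these is
    not shown; this is only the generic LRT step)."""
    lrt = neg2ll_theta1 - neg2ll_full  # LRT statistic, chi-square(1) under H0
    return chi2.sf(lrt, df=1)
```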

Conclusions
- Likelihood approximations such as FO and FOCE significantly degrade statistical results, particularly in the sparse-data case
- Accurate likelihood methods such as PEM perform much better in the sparse-data case than approximate methods
- Accurate likelihood methods such as PEM are feasible with current computational technology