University of California, Berkeley

Similar documents
University of California, Berkeley

Using Estimating Equations for Spatially Correlated A

Modification and Improvement of Empirical Likelihood for Missing Response Problem

The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference

University of Michigan School of Public Health

The International Journal of Biostatistics

Statistical Inference for Data Adaptive Target Parameters

Integrated approaches for analysis of cluster randomised trials

University of California, Berkeley

Empirical Bayes Moderation of Asymptotically Linear Parameters

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION

University of California, Berkeley

University of California, Berkeley

University of California, Berkeley

Targeted Maximum Likelihood Estimation for Adaptive Designs: Adaptive Randomization in Community Randomized Trial

University of California, Berkeley

University of California, Berkeley

Targeted Maximum Likelihood Estimation in Safety Analysis

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

AFT Models and Empirical Likelihood

GMM Logistic Regression with Time-Dependent Covariates and Feedback Processes in SAS TM

Empirical Bayes Moderation of Asymptotically Linear Parameters

,..., θ(2),..., θ(n)

University of California, Berkeley

University of California, Berkeley

University of California, Berkeley

Hypothesis Testing For Multilayer Network Data

Harvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai

University of California, Berkeley

5 Methods Based on Inverse Probability Weighting Under MAR

Stat 579: Generalized Linear Models and Extensions

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials

Harvard University. Harvard University Biostatistics Working Paper Series

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study

Longitudinal Modeling with Logistic Regression

Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths

Package drgee. November 8, 2016

University of California, Berkeley

University of California, Berkeley

Propensity Score Weighting with Multilevel Data

Stat 5101 Lecture Notes

Repeated ordinal measurements: a generalised estimating equation approach

University of California, Berkeley

Independent Increments in Group Sequential Tests: A Review

Why Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

Covariate Balancing Propensity Score for General Treatment Regimes

Figure 36: Respiratory infection versus time for the first 49 children.

Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling

PQL Estimation Biases in Generalized Linear Mixed Models

A note on L convergence of Neumann series approximation in missing data problems

The combined model: A tool for simulating correlated counts with overdispersion. George KALEMA Samuel IDDI Geert MOLENBERGHS

Pricing and Risk Analysis of a Long-Term Care Insurance Contract in a non-markov Multi-State Model

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling

Propensity Score Analysis with Hierarchical Data

IMPROVING POWER IN GROUP SEQUENTIAL, RANDOMIZED TRIALS BY ADJUSTING FOR PROGNOSTIC BASELINE VARIABLES AND SHORT-TERM OUTCOMES

Robust Estimation of Inverse Probability Weights for Marginal Structural Models

Harvard University. Harvard University Biostatistics Working Paper Series

Causal Inference Basics

A Sampling of IMPACT Research:

Efficient and Robust Scale Estimation

Targeted Maximum Likelihood Estimation of the Parameter of a Marginal Structural Model

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori

Columbia University. Columbia University Biostatistics Technical Report Series. Handling Missing Data by Deleting Completely Observed Records

GEE for Longitudinal Data - Chapter 8

University of California, Berkeley

Zellner s Seemingly Unrelated Regressions Model. James L. Powell Department of Economics University of California, Berkeley

Generalized Estimating Equations (gee) for glm type data

Estimating the Marginal Odds Ratio in Observational Studies

Longitudinal analysis of ordinal data

Targeted Learning for High-Dimensional Variable Importance

Chapter 14 Simple Linear Regression (A)

Longitudinal Data Analysis Using SAS Paul D. Allison, Ph.D. Upcoming Seminar: October 13-14, 2017, Boston, Massachusetts

Design of experiments for generalized linear models with random block e ects

Even Simpler Standard Errors for Two-Stage Optimization Estimators: Mata Implementation via the DERIV Command

Module 17: Bayesian Statistics for Genetics Lecture 4: Linear regression

STAT 526 Advanced Statistical Methodology

University of Texas, MD Anderson Cancer Center

Higher-Order von Mises Expansions, Bagging and Assumption-Lean Inference

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Robustness to Parametric Assumptions in Missing Data Models

Data Adaptive Estimation of the Treatment Specific Mean in Causal Inference R-package cvdsa

Power for linear models of longitudinal data with applications to Alzheimer s Disease Phase II study design

Product-limit estimators of the gap time distribution of a renewal process under different sampling patterns

University of California, Berkeley

diluted treatment effect estimation for trigger analysis in online controlled experiments

ECON 3150/4150, Spring term Lecture 6

Trends in Human Development Index of European Union

Covariance function estimation in Gaussian process regression

Causal Effect Models for Realistic Individualized Treatment and Intention to Treat Rules

University of California, Berkeley

Additive Isotonic Regression

Generalized Method of Moments: I. Chapter 9, R. Davidson and J.G. MacKinnon, Econometric Theory and Methods, 2004, Oxford.

ICES REPORT Model Misspecification and Plausibility

Machine Learning for Large-Scale Data Analysis and Decision Making A. Week #1

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007)

GOODNESS-OF-FIT FOR GEE: AN EXAMPLE WITH MENTAL HEALTH SERVICE UTILIZATION

University of California, Berkeley

Transcription:

University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 251 Nonparametric population average models: deriving the form of approximate population average models estimated using generalized estimating equations Alan E. Hubbard Mark J. van der Laan Division of Biostatistics, UC Berkeley, hubbard@berkeley.edu University of California - Berkeley, laan@berkeley.edu This working paper is hosted by The Berkeley Electronic Press (bepress) and may not be commercially reproduced without the permission of the copyright holder. http://biostats.bepress.com/ucbbiostat/paper251 Copyright c 2009 by the authors.

Nonparametric population average models: deriving the form of approximate population average models estimated using generalized estimating equations Alan E. Hubbard and Mark J. van der Laan Abstract For estimating regressions for repeated measures outcome data, a popular choice is the population average models estimated by generalized estimating equations (GEE). We review in this report the derivation of the robust inference (sandwichtype estimator of the standard error). In addition, we present formally how the approximation of a misspecified working population average model relates to the true model and in turn how to interpret the results of such a misspecified model.

1 Introduction First, introduce a simple repeated measures data structure. Let Y ij be the outcome for sub-unit i within unit j, and associated covariates X ij, For this paper, the parameter of interest I the so-called population average model or E(Y ij X ij ), which is modeled as: E(Y ij X ij ) = m(x ij β) = g[µ(x ij β)] (1) g is the link function depends on the regression (e.g., linear: g -1 (u)=u, log:g -1 (u)=log(u), logistic: g -1 (u)=log[u/(1-u)]), β are the regression coefficients u is a linear function of basis functions constructed from the vector of covariates, X ij. Typically, a generalized estimation equation approach is used to estimated the β (Liang and Zeger, 1986). A reasonably parameter one could estimate is an informative approximation of the true population average model, g[µ(x ij β)]. With this approach, one could avoid any model specification (i.e., the model is nonparametric) and the parameter of interest would be some projection of E[Y ij X ij ] onto the proposed class of estimating models (e.g., linear with only main effects); one can define explicitly the parameter of interest as a function of the distribution of the observed data (X,Y) Below, we first review the generalized estimating equation approach and somewhat informally discuss the derivation of the robust inference. Then, we derive parameter of interest as some projection of the working model onto the true population average model, where the projection is a function of the so-called working correlation model. Hosted by The Berkeley Electronic Press

2 Generalized Estimating Equations The general form of this estimating function for regression problems is (see van der Laan and Robins, 2003): D V (X. j,y. j β) = m(x. j β) V 1 { ε. j (β)}ε. j (β) T β (2) where X. j represents the design matrix for all observations in neighborhood j, Y. j represents the vector of outcomes observations corresponding to the rows of X. j, m(x. j β)={m(x ij β), i=1,...,n j } a vector of regression functions corresponding again to each row of X. j, ε. j (β)=y. j - m(x. j β) = {Y ij -m(x ij β), i=1,...,n j }, β a px1 vector of coefficients and V{ε. j (β)} is the working n j by n j variance-covariance of the residuals, consisting of the entries, cov(ε ij, ε i j ), based on the working correlation model. One can show that this is a consistent estimating function, that is, it has mean 0 if the true β s are entered. The estimator of β is defined by setting the average of these estimating functions for each unit (e.g., neighborhood) to 0 and solving for β: m 0 = D (X,Y V ˆ. j. j β). (3) i=1 Instead, robust or sandwich inference is typically provided (Liang and Zeger, 1986). Without going into the technical details, one can show that these estimators are asymptotically linear, which means that: β m β = 1 m m IC(X,Y 1. j. j β) + o p m (4) j=1 http://biostats.bepress.com/ucbbiostat/paper251

where β n is the estimate based on m observations (neighborhoods); IC is the so-called influence curve, is the same dimension as β and is derived from the estimating function (2); and the last term, o p (1/ m), is a second order term that becomes negligible as m gets large. Thus, one can derive the variance estimate of β n and the resulting standard errors by simply looking at the empirical variance-covariance of the components of the IC. In this case, the IC, a standardized version of the estimating function, is: where h(x. j ) = m β (X. T j β) pxn j IC(X. j,y. j β) = E h(x. j ) m(x. j β) β T V 1 { ε. j (β)} n. j x n j 1 h(x. j )ε. j (β) T 3 Definition of Projections Related to GEE regression models The motivation for defining the parameter of interest as some an approximation of a population average model borrows extensively from Neugebauer and van der Laan, 2007). The true population average conditional mean is µ*=e(y ij X ij ). Define µ : R p Μ, where µ(x β) Μ is the working model (informative approximation) and denote Μ as the set of possible working models (µ is the proposed approximation of µ*). Note that µ* is not necessarily an element of Μ (the class of working models does not necessarily contain the true model); we will refer to µ(x β) as the model approximation. The parameter of interest in the context of misspecification of µ* is: β(p X,Y A), where P X,Y represents the distribution of the data and is a function of some weight matrix, A =V - 1 in which the variance is derived from the mean according to some guessed distribution (e.g., var(y ij X ij )=E(Y ij X ij ) if model assumed is log-linear, Poisson model; see Liang and Hosted by The Berkeley Electronic Press

Zeger, 1986 for details). The estimation function (2) can be derived from solving the objective function: β A = β(p X,Y A) argmin E X [ ε T (β)aε(β)], ε. j (β) = Y. j µ(x. j β). β R k This can be shown to be equivalent to: β A = argmin E X [ ε * T (β)aε * (β)], ε *. j (β) = u * (X. j ) µ(x. j β), (5) β R k or the β A that minimizes the weighted Euclidean distance between the model and the true mean. This can be seen by differentiation product rule, so estimation function derived from (5) is (dropping the vector j subscript): d dβ E X ε * T (β)aε * (β) µ(x β) [ ] T = 0 E X β A[µ * (X) µ(x β)] = 0. But, by the law of iterated expectation, one can exchange Y for µ*(x) this is the same as solving: E X E X µ(x β) T β µ(x β) T β A[Y µ(x β)] = E µ(x β)t X E β A[µ * (X) µ(x β)]. A[Y µ(x β)] X = Thus, using the GEE estimating equations to estimate the coefficients, β A, as part of a misspecified model, µ(x β), is asymptotically equivalent to the β A that minimize the weighted Euclidean distance of the model to the true mean, where different weight matrices A (working model for the inverse of the variance-covariance matrix of the http://biostats.bepress.com/ucbbiostat/paper251

vector of repeated measures, Y) can result in different coefficients. Thus, the approximation model varies as a function of the working correlation structure chosen by the user (e.g., independence, auto-correlation, unstructured, etc), so that the estimate will represent estimates of different parameters for different working correlation structures Under misspecification of the population average model, the typical GEE algorithm will provide estimation and proper robust inference for β A. Thus, in this case, we have defined the coefficient parameters that GEE estimates as explicit nonparametric functions of the distribution of the data, P X,Y. References Liang K-Y, Zeger SL. 1986. Longitudinal data analysis using generalized linear models. Biometrika 73(1):13-22. Neugebauer R, van der Laan MJ. 2007. Nonparametric Causal Effects based on marginal structural models, Journal of Statistical Planning and Inference Journal of Statistical Planning and Inference 137:419 434. van der Laan, MJ, Robins, JM. 2003. Unified Methods for Censored Longitudinal Data and Causality. Springer-Verlag, New York. Hosted by The Berkeley Electronic Press