Using Estimating Equations for Spatially Correlated Areal Data
December 8, 2009

Outline
- Introduction
- GEEs
- Spatial Estimating Equations
- Implementation
- Simulation
- Conclusion

Typical Problem
- Assess the relationship between a spatially varying covariate and areal epidemiologic data; prediction is not the focus
- Frequently the response is non-normal
- Should we put in a spatial random effect to smooth?
- Reich et al. (2006) showed that using the CAR model can lead to large changes in the posterior mean and variance of the fixed effects when spatially varying covariates are collinear with the spatial random effects

The Real Issues
- If the model for the covariance of the response (conditional on covariates) is wrong:
  - Often the estimators for the fixed effects are still consistent
  - The estimators will be inefficient
  - But the standard error estimates will be incorrect
  - This leads to erroneous inference
- It is unlikely that one would know the form of the covariance of the conditional response a priori
- Want a method that is robust to misspecification of the covariance
- Only want to assume the mean model is correct
- GOAL: posit a model for the covariance and hopefully gain efficiency

Liang and Zeger (1986) seminal paper: Generalized Estimating Equations
Some Notation Used
- Observe independent pairs $(Y_i, x_i)$ for $i = 1, \ldots, m$, where $Y_i$ is $(n_i \times 1)$
- Let
  $E(Y_i \mid x_i) = f_i(x_i, \beta)$
  $\mathrm{var}(Y_i \mid x_i) = V_i(\beta, \xi, x_i) = T_i^{1/2}(\beta, \theta, x_i)\, \Gamma_i(\alpha, x_i)\, T_i^{1/2}(\beta, \theta, x_i)$
- $T_i$ is a diagonal matrix with elements $\mathrm{var}(Y_{ij}) = \sigma^2 g(\beta, \theta, x_{ij})$; $\Gamma_i$ is a correlation matrix
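The decomposition $V_i = T_i^{1/2} \Gamma_i T_i^{1/2}$ can be illustrated numerically. The following is a minimal sketch, not code from the talk: the function names `working_covariance` and `exchangeable`, the exchangeable form of $\Gamma_i$, and the toy variances are all assumptions chosen for illustration.

```python
import numpy as np

def working_covariance(variances, corr):
    """Return V = T^{1/2} Gamma T^{1/2} for one cluster.

    variances: marginal variances (the diagonal of T)
    corr:      working correlation matrix Gamma
    """
    variances = np.asarray(variances, dtype=float)
    corr = np.asarray(corr, dtype=float)
    t_half = np.diag(np.sqrt(variances))
    return t_half @ corr @ t_half

def exchangeable(n, alpha):
    """Exchangeable correlation: 1 on the diagonal, alpha elsewhere (illustrative choice)."""
    return (1.0 - alpha) * np.eye(n) + alpha * np.ones((n, n))

# Toy cluster with three observations (hypothetical numbers)
variances = np.array([1.0, 4.0, 9.0])
V = working_covariance(variances, exchangeable(3, 0.5))
print(V)
```

The diagonal of `V` reproduces the marginal variances, and each off-diagonal entry is the correlation scaled by the two standard deviations.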

Liang and Zeger (1986) seminal paper: Generalized Estimating Equations
Main Idea
- Posit a working correlation matrix for the multivariate response within subject
- Solve $\sum_{i=1}^{m} X_i^T(\beta)\, V_i^{-1} \{Y_i - f_i(x_i, \beta)\} = 0$ to estimate $\beta$
- Solve quadratic estimating equations to obtain estimates for $\xi = (\theta, \alpha)$
- Obtain standard error estimates using sandwich variance estimators
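One common way to solve the estimating equation for $\beta$ is Fisher scoring. The sketch below is illustrative only and makes several assumptions not stated on the slide: a log-linear mean $f_i = \exp(X_i \beta)$, a Poisson-type variance function, a fixed (known) working correlation, and $X_i(\beta)$ read as the gradient matrix $\partial f_i / \partial \beta^T$. The function name `solve_gee_loglinear` and the toy data are hypothetical.

```python
import numpy as np

def solve_gee_loglinear(X_list, y_list, R_list, n_iter=100, tol=1e-10):
    """Fisher scoring for sum_i D_i^T V_i^{-1} (y_i - mu_i) = 0 with mu_i = exp(X_i beta).

    D_i = d mu_i / d beta^T and V_i = A_i^{1/2} R_i A_i^{1/2}, A_i = diag(mu_i)
    (a Poisson-type working variance; this is an assumption of the sketch).
    """
    p = X_list[0].shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        score = np.zeros(p)
        info = np.zeros((p, p))
        for X, y, R in zip(X_list, y_list, R_list):
            mu = np.exp(X @ beta)
            D = mu[:, None] * X                  # gradient of the mean
            s = np.sqrt(mu)
            V = s[:, None] * R * s[None, :]      # working covariance
            Vinv_D = np.linalg.solve(V, D)
            score += Vinv_D.T @ (y - mu)         # D^T V^{-1} (y - mu), V symmetric
            info += D.T @ Vinv_D                 # D^T V^{-1} D
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Noise-free toy data: 4 clusters of 5 observations, so the root is beta_true
beta_true = np.array([0.5, 0.4])
x = np.linspace(-1.0, 1.0, 20)
X_all = np.column_stack([np.ones_like(x), x])
X_list = np.split(X_all, 4)
y_list = [np.exp(X @ beta_true) for X in X_list]
R = 0.7 * np.eye(5) + 0.3 * np.ones((5, 5))      # exchangeable working correlation
beta_hat = solve_gee_loglinear(X_list, y_list, [R] * 4)
print(beta_hat)
```

In practice the correlation parameters would be re-estimated between scoring steps; they are held fixed here to keep the sketch short.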

Properties of Generalized Estimating Equations
- Consistent, asymptotically normal estimator of $\beta$ even if $V_i$ is misspecified
- Sandwich variance estimators are consistent, which gives correct inference
- If $V_i$ were correctly specified, this is the most efficient RAL (regular asymptotically linear) semiparametric estimator (Tsiatis, 2006)
- Presumably, if we posit something close to the truth, we gain efficiency
- Asymptotic here refers to $m \to \infty$
- Albert and McShane (1995) (brain imaging) considered repeated measures where there was spatial variability
- Most spatial epidemiology problems have $m = 1$

Extending GEEs to Spatial Epidemiology
- Note on notation: since we will only consider $m = 1$ for the rest of the talk, we will drop the subscript $i$
- Gotway and Stroup (1997) discuss a quasi-likelihood approach that would lead to the same form of the estimating equations above with $m = 1$
- But $V$ must satisfy various assumptions that are not easily verified
- Perhaps we can borrow ideas from time series: Zeger (1988)

Development of McShane et al. (1997)
- Developed a model for spatially correlated count data
- Main idea: conceive of some non-negative, weakly stationary latent (spatial) process $\epsilon = \{\epsilon(s_1), \ldots, \epsilon(s_n)\}$, where $s_1, \ldots, s_n$ are the spatial locations, and then assume
  $E\{y(s_j) \mid \epsilon(s_j)\} = \exp(x_j^T \beta)\, \epsilon(s_j)$
  $\mathrm{var}\{y(s_j) \mid \epsilon(s_j)\} = E\{y(s_j) \mid \epsilon(s_j)\}$ (like a Poisson process)
  $\mathrm{cov}\{\epsilon(s_j), \epsilon(s_k)\} = \sigma^2 \rho_\epsilon(h, \xi)$, where $h$ is the distance between $s_j$ and $s_k$
  $E\{\epsilon(s)\} = 1$
- This induces a model for the unconditional mean and covariance matrix of $y(s)$, which can be derived using the laws of iterated expectations and covariances
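The induced marginal moments can be written out explicitly. The derivation below is a sketch under two assumptions that follow Zeger (1988) and are not spelled out on the slide: the latent process has covariance $\mathrm{cov}\{\epsilon(s_j), \epsilon(s_k)\} = \sigma^2 \rho_\epsilon(h_{jk}, \xi)$ with $\rho_\epsilon(0, \xi) = 1$, and the responses are conditionally independent given $\epsilon$. The shorthand $\mu_j = \exp(x_j^T \beta)$ is introduced here for convenience.

```latex
\begin{align*}
E\{y(s_j)\}
  &= E\bigl[E\{y(s_j) \mid \epsilon(s_j)\}\bigr]
   = \mu_j\, E\{\epsilon(s_j)\} = \mu_j, \\
\mathrm{var}\{y(s_j)\}
  &= E\bigl[\mathrm{var}\{y(s_j) \mid \epsilon(s_j)\}\bigr]
   + \mathrm{var}\bigl[E\{y(s_j) \mid \epsilon(s_j)\}\bigr]
   = \mu_j + \sigma^2 \mu_j^2, \\
\mathrm{cov}\{y(s_j), y(s_k)\}
  &= E\bigl[\mathrm{cov}\{y(s_j), y(s_k) \mid \epsilon\}\bigr]
   + \mathrm{cov}\{\mu_j \epsilon(s_j),\, \mu_k \epsilon(s_k)\}
   = \sigma^2 \mu_j \mu_k\, \rho_\epsilon(h_{jk}, \xi), \quad j \neq k.
\end{align*}
```

The marginal variance exceeds the mean whenever $\sigma^2 > 0$, so the latent process accounts for both overdispersion and spatial correlation.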

Asymptotic Properties of McShane et al. (1997)
- Zeger showed that if $\epsilon(s)$ is a mixing process then so is $y(s)$, and the solution to the estimating equations
  $X^T(\beta)\, V^{-1}(\beta, \xi, x) \{y - f(x, \beta)\} = 0$
  will be consistent and asymptotically ($n \to \infty$) normal for $\beta$
- $\xi$ can be estimated using quadratic estimating equations
- Did not need to consider this model through a latent spatial process
- Could have just directly posited the model for the mean and covariance of $y(s)$, as long as $y(s)$ was a mixing process

Variance Estimation in McShane et al. (1997)
- If the covariance matrix $V$ is correctly specified, then asymptotically
  $\mathrm{var}(\hat\beta) = \{X^T(\beta)\, V^{-1}(\beta, \xi, x)\, X(\beta)\}^{-1}$
- If in truth the covariance is $U$, then asymptotically
  $\mathrm{var}(\hat\beta) = (X^T V^{-1} X)^{-1} (X^T V^{-1} U V^{-1} X) (X^T V^{-1} X)^{-1}$
- Zeger (1988) only considers the possibility of misspecification for computational purposes
- Note that the asymptotic standard errors do not depend on how $\xi$ was estimated
  - This suggests that the estimated standard errors are likely to be optimistic
- Unfortunately, SAS cannot easily obtain these sandwich-like variance estimators
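The two variance expressions can be compared numerically. With working covariance $V$ and true covariance $U$, the standard sandwich form is $(X^T V^{-1} X)^{-1} (X^T V^{-1} U V^{-1} X) (X^T V^{-1} X)^{-1}$, which collapses to $(X^T V^{-1} X)^{-1}$ when $V = U$. The sketch below is illustrative: the function names, the 30 sites on a line, the mean values, and the exponential "true" correlation are all assumptions, and in real data $U$ is unknown and must be replaced by an empirical estimate.

```python
import numpy as np

def model_based_cov(D, V):
    """{D^T V^{-1} D}^{-1}: valid when V is the true covariance."""
    return np.linalg.inv(D.T @ np.linalg.solve(V, D))

def sandwich_cov(D, V, U):
    """Bread-meat-bread form with working covariance V and true covariance U."""
    bread = model_based_cov(D, V)
    Vinv_D = np.linalg.solve(V, D)
    meat = Vinv_D.T @ U @ Vinv_D             # D^T V^{-1} U V^{-1} D
    return bread @ meat @ bread

# Toy setting (hypothetical): 30 sites on a line, log-linear mean
sites_km = np.linspace(0.0, 290.0, 30)
x = sites_km / 290.0 - 0.5
X = np.column_stack([np.ones(30), x])
mu = 15.0 * np.exp(X @ np.array([0.0, 0.4]))
D = mu[:, None] * X                          # gradient of the mean

dist = np.abs(sites_km[:, None] - sites_km[None, :])
s = np.sqrt(mu)
U = s[:, None] * np.exp(-dist / 45.0) * s[None, :]   # assumed "true" covariance
V_ind = np.diag(mu)                                   # independence working covariance

naive = model_based_cov(D, V_ind)
robust = sandwich_cov(D, V_ind, U)
print(np.sqrt(np.diag(naive)), np.sqrt(np.diag(robust)))
```

Comparing the two printed vectors shows how far the model-based standard errors can drift from the sandwich ones when the working covariance ignores spatial correlation.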

Implementation in SAS
- PROC GENMOD is typically used to fit GEE models
  - Does not have an option for a spatial working correlation matrix, and only uses method-of-moments estimators for parameters in the working correlation matrix
- PROC GLIMMIX is usually thought of as software for generalized linear mixed models but can also do GEE
  - Able to posit spatial working correlation matrices, and uses quadratic estimating equations to estimate parameters in the working correlation matrix

Warning on the Interpretation of Parameters
- In repeated measures analysis we often think of population-averaged vs. subject-specific models
- The estimating equation approach lends itself to the population-averaged approach
- With a log-linear mean model, $\exp(\beta_1)$ is the factor by which the mean response changes, averaged across all units, for a one-unit change in $x_1$
- Likely the perspective preferred by epidemiologists
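As a worked example of this interpretation, take $\beta_1 = 0.4$, the value used in the simulation study in this talk:

```latex
\[
\frac{E(y \mid x_1 + 1)}{E(y \mid x_1)}
  = \frac{\exp\{\beta_0 + \beta_1 (x_1 + 1)\}}{\exp(\beta_0 + \beta_1 x_1)}
  = \exp(\beta_1) = e^{0.4} \approx 1.49 .
\]
```

A one-unit increase in $x_1$ therefore multiplies the population-averaged mean response by about 1.49, an increase of roughly 49%.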

Simulation Design - Overview
- Interested in assessing the small-sample properties of these spatial estimating equations
- Assess the effect of adding a spatial random effect on the fixed effects
- Consider a spatially varying covariate $x_j$ over 30 western North Carolina counties
- 500 total disease cases, with the expected number of cases, $E_j$, per county proportional to the July 2008 population estimate
- Observed number of cases, $O_j$, drawn from a (correlated) Poisson distribution with mean $E_j \exp(\beta_0 + \beta_1 x_j)$, where $\beta_0 = 0$ and $\beta_1 = 0.4$
- 1,000 Monte Carlo runs
- Each data set analyzed using independence, spherical, and exponential working correlation matrices
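One replicate of such a design might be generated as follows. This is a rough sketch and not the authors' simulation code: the slides do not say how the correlated Poisson counts were drawn, so a latent lognormal process with mean 1 is assumed here. The populations, coordinates, covariate noise level, and the variance `tau2` are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(2009)

n_counties = 30
total_cases = 500
beta0, beta1 = 0.0, 0.4

# Synthetic stand-ins: the real county populations and centroids are not in the slides
population = rng.integers(10_000, 250_000, size=n_counties)
coords_km = rng.uniform(0.0, 200.0, size=(n_counties, 2))

# Covariate: function of the x-coordinate plus random error (scales are assumptions)
x = (coords_km[:, 0] - coords_km[:, 0].mean()) / 100.0 + rng.normal(0.0, 0.2, n_counties)

# Expected counts proportional to population, summing to 500
E = total_cases * population / population.sum()

# Assumed latent process: lognormal with E(eps) = 1 and exponential correlation on the log scale
dist = np.linalg.norm(coords_km[:, None, :] - coords_km[None, :, :], axis=-1)
tau2 = 0.1
range_km = 15.0
z = rng.multivariate_normal(np.zeros(n_counties), tau2 * np.exp(-dist / range_km))
eps = np.exp(z - tau2 / 2.0)

# Conditionally Poisson counts; marginal mean is E_j * exp(beta0 + beta1 * x_j)
mu = E * np.exp(beta0 + beta1 * x)
O = rng.poisson(mu * eps)
print(O)
```

Repeating the last block 1,000 times with fresh draws of `z` would give the Monte Carlo replicates; each replicate would then be fitted under the three working correlation structures.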

Simulation Design - Spatially Varying Covariate
- Project county centroid to an x, y grid
- Spatially varying covariate that was a function of the x-coordinate plus random error
[Figure: the covariate by county; legend classes [-1.09, -0.484), [-0.484, -0.076), [-0.076, 0.294), [0.294, 0.45), [0.45, 0.88]]

Simulation Design - Correlation Structure
- Consider three different spatial correlation structures for the response to generate the data:
  - Exponential correlation with range parameter 15 km, 30 km, and 45 km (approximately 45 km, 90 km, and 135 km for effective spatial range)
[Figure: three correlation panels, one per range parameter; legend classes in steps of 0.05 from [0, 0.05) to [0.4, 0.45), then [0.45, 1]]
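The relation between the range parameter and the effective range can be checked directly. The sketch below assumes the parameterization $\rho(h) = \exp(-h / \text{range})$, which is consistent with the "effective range is about three times the range" figures quoted above; the function name `exponential_corr` is hypothetical.

```python
import numpy as np

def exponential_corr(dist_km, range_km):
    """Exponential correlation rho(h) = exp(-h / range) (assumed parameterization)."""
    return np.exp(-np.asarray(dist_km, dtype=float) / range_km)

for range_km in (15.0, 30.0, 45.0):
    effective_km = 3.0 * range_km
    # At three times the range the correlation has decayed to exp(-3), about 0.05
    print(range_km, effective_km, float(exponential_corr(effective_km, range_km)))
```

Under this parameterization the effective range is the distance at which the correlation falls to roughly 0.05.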

Simulation Results - Range Parameter 15 km

Working Cor.   % Relative Bias   var($\hat\beta_1$)   Mean {se($\hat\beta_1$)}^2   Ratio   Coverage
Exponential    0.0118            0.0141               0.0124                       0.883   0.936
Independent    0.0099            0.0110               0.0102                       0.924   0.943
Spherical      0.0067            0.0132               0.0109                       0.822   0.928

Table: Maximum standard errors. % Relative Bias: 0.0094; var($\hat\beta_1$): 0.0006; Mean {se($\hat\beta_1$)}^2: 0.0003; Ratio: 0.045; Coverage: 0.008

Simulation Results - Range Parameter 45 km

Working Cor.   % Relative Bias   var($\hat\beta_1$)   Mean {se($\hat\beta_1$)}^2   Ratio   Coverage
Exponential    0.014             0.0219               0.0170                       0.774   0.886
Independent    0.022             0.0254               0.0103                       0.404   0.809
Spherical      0.017             0.0244               0.0221                       0.902   0.913

Table: Maximum standard errors. % Relative Bias: 0.013; var($\hat\beta_1$): 0.0011; Mean {se($\hat\beta_1$)}^2: 0.0003; Ratio: 0.045; Coverage: 0.012
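The slides do not define the Ratio column. The reported values appear to match the mean squared standard error divided by the Monte Carlo variance of $\hat\beta_1$, so a ratio below 1 would indicate standard errors that are too small on average. The sketch below checks that reading against the tabulated numbers; the reading itself is an assumption, and small discrepancies are expected because the inputs are rounded.

```python
# (var(beta1_hat), mean se^2, reported ratio) copied from the two results tables
results = {
    "15 km": {
        "Exponential": (0.0141, 0.0124, 0.883),
        "Independent": (0.0110, 0.0102, 0.924),
        "Spherical": (0.0132, 0.0109, 0.822),
    },
    "45 km": {
        "Exponential": (0.0219, 0.0170, 0.774),
        "Independent": (0.0254, 0.0103, 0.404),
        "Spherical": (0.0244, 0.0221, 0.902),
    },
}

def se2_to_var_ratio(mc_var, mean_se2):
    """Mean estimated variance relative to the Monte Carlo variance (assumed definition)."""
    return mean_se2 / mc_var

max_gap = 0.0
for scenario, rows in results.items():
    for working, (mc_var, mean_se2, reported) in rows.items():
        computed = se2_to_var_ratio(mc_var, mean_se2)
        max_gap = max(max_gap, abs(computed - reported))
        print(f"{scenario:6s} {working:12s} computed={computed:.3f} reported={reported:.3f}")
```

Under this reading, the independence working correlation at a 45 km range yields estimated variances of only about 40% of the Monte Carlo variance, in line with its low coverage of 0.809.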

Conclusion
- We have shown that linear estimating equations can be used for analyzing spatially correlated areal data, and that the resulting estimators are consistent and asymptotically normal regardless of whether the assumed covariance of the response is correct.
- We have argued, using a simulation study, that assuming a spatial covariance will often lead to improved efficiency in the parameter estimates, even with a small sample size.

References
- Albert, P. S. and McShane, L. M. (1995). A generalized estimating equations approach for spatially correlated binary data: Applications to the analysis of neuroimaging data. Biometrics 51, 627-638.
- Gotway, C. A. and Stroup, W. W. (1997). A generalized linear model approach to spatial data analysis and prediction. Journal of Agricultural, Biological, and Environmental Statistics 2, 157-178.
- Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.
- Lin, P.-S. and Clayton, M. K. (2005). Analysis of binary spatial data by quasi-likelihood estimating equations. Annals of Statistics 33, 542-555.
- McShane, L. M., Albert, P. S., and Palmatier, M. A. (1997). A latent process regression model for spatially correlated count data. Biometrics 53, 698-706.
- Reich, B. J., Hodges, J. S., and Zadnik, V. (2006). Effects of residual smoothing on the posterior of the fixed effects in disease mapping model. Biometrics 62, 1197-1206.
- Tsiatis, A. A. (2006). Semiparametric Theory and Missing Data. New York: Springer.
- Zeger, S. L. (1988). A regression model for time series of counts. Biometrika 75, 621-629.