Measurement error modeling. Department of Statistical Sciences Università degli Studi Padova


Measurement error modeling
Statistisches Beratungslabor, Institut für Statistik, Ludwig-Maximilians-Universität München
Department of Statistical Sciences, Università degli Studi di Padova, 29.4.2010

Overview

Measurement Error and Misclassification

References
Carroll, R. J., D. Ruppert, L. A. Stefanski and C. M. Crainiceanu: Measurement Error in Nonlinear Models. A Modern Perspective. Chapman & Hall, London 2006.
Buonaccorsi, J. P.: Measurement Error: Models, Methods and Applications. Chapman & Hall, Boca Raton 2010.
Kuha, J., C. Skinner and J. Palmgren: Misclassification error. In: Encyclopedia of Biostatistics, ed. by Armitage, P. and Colton, T., 2615-2621. Wiley, Chichester.

Statistical Modeling
Applied statistics: learning from data by sophisticated modeling. Several kinds of complexity arise: relationships between variables are complex, and often the relationship between the variables and the data is complex, too:
* Variables of interest are not directly ascertainable.
* Only proxy variables (surrogates) are available instead.

General setting (diagram): the main model links the true predictor X_i to the true outcome Y_i (association, effects); separate measurement models link X_i and Y_i to their measurements X*_i and Y*_i; inference has to be drawn from the measured data.

Examples
Framingham heart study, Munich blood pressure study: long-term average blood pressure vs. a single measurement.
Munich bronchitis study: average occupational dust exposure vs. single measurements and expert ratings.
Studies on effects of air pollution.
Munich radon study.
Study on caries (misclassification).

Consequences of measurement error and misclassification
Potential consequence of neglecting the difference between latent variables and their surrogates: severely biased estimation of all parameters.
Note: Measurement error and misclassification are not just a matter of sloppiness. Latent variables are eo ipso not exactly measurable.

Notation
We have to distinguish between the true (correctly measured) variable and its (possibly incorrect) measurement, i.e. between the latent variable and the corresponding surrogate.
Star notation (used here): X, Z are the correctly measured (unobservable) variables; X*, Z* the corresponding, possibly incorrect measurements; analogously Y, Y* and T, T*.
X, W, Z notation (Carroll et al., 2006).
ξ-x notation (Schneeweiß, Fuller).

Models for measurement error
Systematic vs. random
Classical vs. Berkson
Additive vs. multiplicative
Homoscedastic vs. heteroscedastic
Differential vs. non-differential

Classical additive random measurement error
X_i: true value; X*_i: measurement of X_i.
X*_i = X_i + U_i, (U_i, X_i) independent, E(U_i) = 0, Var(U_i) = σ_u², often U_i ~ N(0, σ_u²).
This model is suitable for instrument measurement error, and for a single measurement used in place of a (long-term) mean.

Additive Berkson error
X_i = X*_i + U_i, (U_i, X*_i) independent, E(U_i) = 0, Var(U_i) = σ_u², often U_i ~ N(0, σ_u²).
This model is suitable for:
Mean exposure of a region X* instead of the individual exposure X.
Working place measurements.
Dose in a controlled experiment.

Classical and Berkson error compared
In the Berkson case: E(X | X*) = X*, Var(X) = Var(X*) + Var(U), so Var(X) > Var(X*).
In the classical additive case: E(X* | X) = X, Var(X*) = Var(X) + Var(U), so Var(X*) > Var(X).
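A quick simulation illustrates the two variance orderings (a sketch; the variable names and the specific variances are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
sigma_x, sigma_u = 2.0, 1.0

# Classical error: the measurement adds noise to the truth, so Var(X*) > Var(X)
x = rng.normal(0.0, sigma_x, n)
x_star = x + rng.normal(0.0, sigma_u, n)
print(np.var(x), np.var(x_star))        # approx. 4.0 vs 5.0

# Berkson error: the truth scatters around the observed value, so Var(X) > Var(X*)
x_star_b = rng.normal(0.0, sigma_x, n)
x_b = x_star_b + rng.normal(0.0, sigma_u, n)
print(np.var(x_star_b), np.var(x_b))    # approx. 4.0 vs 5.0
```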

Multiplicative measurement error
Classical: X*_i = X_i · U_i, (U_i, X_i) independent.
Berkson: X_i = X*_i · U_i, (U_i, X*_i) independent.
E(U_i) = 1, often U_i lognormal: additive on the logarithmic scale.
Used for exposure to chemicals or radiation.

Model for misclassification (MC)
Binary case: Y true, Y* observed; sensitivity and specificity:
P(Y* = 1 | Y = 1) = π_11 (sensitivity)
P(Y* = 0 | Y = 0) = π_00 (specificity)
P(Y* = 0 | Y = 1) = 1 − π_11 = π_01
P(Y* = 1 | Y = 0) = 1 − π_00 = π_10
Classification matrix:
( π_00  π_01 )
( π_10  π_11 )

Complex measurement models
The relationship between measurements and the true value can be modeled, e.g.:
Scores
Regression (estimated from validation data)
Mixtures of Berkson and classical error
Dependence of the measurement error on further covariates

Additive measurement error in the response
Simple linear regression: Y = β_0 + β_1 X + ε with Y* = Y + U (additive) yields
Y* = β_0 + β_1 X + ε + U
New equation error: ε + U. Assumptions: U and X independent, U and ε independent.
Higher variance of the error term, but inference is still correct.
Error in the equation and measurement error in the response are not distinguishable.

Measurement error in covariates
We focus on measurement error in a covariate of a regression model.
Main model: E(Y) = f(β, X, Z). We are interested in inference on β_1; Z is a further covariate measured without error.
Error model: relationship between X and X*.
Observed model: E(Y) = f*(X*, Z, β*).
Naive estimation treats the observed model as the main model, but in most cases f* ≠ f and β* ≠ β.

Differential and non-differential error
The assumption of non-differential error relates to the response: [Y | X*, X] = [Y | X]. Given X, the surrogate X* carries no further information on Y. Then the error model and the main model can be split:
[Y, X*, X] = [Y | X] [X* | X] [X]
From the substantive point of view: the measurement process and Y are independent.
Blood pressure on a particular day is irrelevant for CHD if the long-term average is known.
Mean exposure is irrelevant if the individual exposure is known.
But people with CHD may report their nutrition behavior differently (differential error).

Simple linear regression
We assume a classical, non-differential, additive normal measurement error:
Y = β_0 + β_1 X + ε
X* = X + U, (U, X, ε) independent, U ~ N(0, σ_u²), ε ~ N(0, σ_ε²)

(Figure: classical measurement error in linear regression)

The observed model in linear regression
E(Y | X*) = β_0 + β_1 E(X | X*). Assuming X ~ N(μ_x, σ_x²), the observed model is:
E(Y | X*) = β*_0 + β*_1 X*
β*_1 = β_1 σ_x² / (σ_x² + σ_u²)
β*_0 = β_0 + β_1 (1 − σ_x² / (σ_x² + σ_u²)) μ_x
Y − β*_0 − β*_1 X* ~ N(0, σ_ε² + β_1² σ_u² σ_x² / (σ_x² + σ_u²))
The observed model is still a linear regression!
Attenuation of β_1 by the "reliability ratio" σ_x² / (σ_x² + σ_u²).
Loss of precision (larger error term).
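The attenuation by the reliability ratio is easy to check by simulation (a sketch with made-up parameter values; here the reliability ratio is 0.5, so the naive slope estimates β_1/2):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta0, beta1 = 1.0, 2.0
sigma_x, sigma_u, sigma_eps = 1.0, 1.0, 0.5

x = rng.normal(0.0, sigma_x, n)
y = beta0 + beta1 * x + rng.normal(0.0, sigma_eps, n)
x_star = x + rng.normal(0.0, sigma_u, n)   # classical additive error

# naive OLS slope of Y on X*
slope_naive = np.cov(y, x_star)[0, 1] / np.var(x_star, ddof=1)
reliability = sigma_x**2 / (sigma_x**2 + sigma_u**2)
print(slope_naive)            # approx. 1.0
print(beta1 * reliability)    # exactly 1.0
```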

Identification
Parameters: (β_0, β_1, μ_x, σ_x², σ_u², σ_ε²). The observed data [Y, X*] only identify μ_y, μ_x*, σ_y², σ_x*², σ_yx*.
(β_0, β_1, μ_x, σ_x², σ_u², σ_ε²) and (β*_0, β*_1, μ_x, σ_x² + σ_u², 0, σ*_ε²) yield identical distributions of (Y, X*).
⇒ The model parameters are not identifiable. We need extra information, e.g.:
σ_u is known or can be estimated
σ_u / σ_ε is known (orthogonal regression)
The model with a non-normal distribution for X is identifiable via higher moments.

Naive LS estimation
For the slope: β̂_1n = S_yx* / S²_x*
plim(β̂_1n) = σ_yx* / σ²_x* = σ_yx / (σ_x² + σ_u²) = β_1 σ_x² / (σ_x² + σ_u²)

Correction for attenuation
A first correction method: solve the bias equation.
β̂_1 = β̂_1n (σ_x² + σ_u²) / σ_x², estimated by β̂_1 = β̂_1n S²_x* / (S²_x* − σ_u²)
Correction by the reliability ratio.
V(β̂_1) > V(β̂_1n): a bias-variance trade-off.
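The bias-equation correction can be sketched in a few lines (assuming, as on the slide, that σ_u² is known; function and variable names are illustrative):

```python
import numpy as np

def attenuation_corrected_slope(y, x_star, sigma_u2):
    """Method-of-moments correction of the naive OLS slope, assuming the
    classical additive error variance sigma_u2 is known."""
    s_x2 = np.var(x_star, ddof=1)
    beta1_naive = np.cov(y, x_star)[0, 1] / s_x2
    return beta1_naive * s_x2 / (s_x2 - sigma_u2)

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)   # true slope: 2.0
x_star = x + rng.normal(0.0, 1.0, n)
print(attenuation_corrected_slope(y, x_star, sigma_u2=1.0))  # approx. 2.0
```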

Multiple regression
The formulae extend to the multiple case.
Note: parameter estimates of error-free covariates may also be affected.

Berkson error in simple linear regression
Y = β_0 + β_1 X + ε
X = X* + U, U independent of (X*, ε), E(U) = 0

Observed model (Berkson case)
E(Y | X*) = β_0 + β_1 X*
V(Y | X*) = σ_ε² + β_1² σ_u²
A regression model with identical β: the measurement error is ignorable for estimation of β.
Loss of precision.
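A simulation confirms both claims: the slope is unbiased, but the residual variance is inflated from σ_ε² to σ_ε² + β_1²σ_u² (a sketch with illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x_star = rng.normal(0.0, 1.0, n)              # assigned / observed value
x = x_star + rng.normal(0.0, 1.0, n)          # true value scatters around it (Berkson)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)   # true slope: 2.0

slope = np.cov(y, x_star)[0, 1] / np.var(x_star, ddof=1)
resid = y - (np.mean(y) + slope * (x_star - np.mean(x_star)))
print(slope)                      # approx. 2.0: no attenuation
print(np.var(resid, ddof=1))      # approx. 0.25 + 2.0**2 * 1.0 = 4.25: precision lost
```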

Misclassification
Model for misclassification: X is the true binary variable (gold standard), X* the observed value of X (surrogate).
P(X* = 1 | X = 1) = π_11 (sensitivity)
P(X* = 0 | X = 0) = π_00 (specificity)
P(X* = 0 | X = 1) = 1 − π_11 = π_01
P(X* = 1 | X = 0) = 1 − π_00 = π_10
Classification matrix:
Π = ( π_00  π_01 )
    ( π_10  π_11 )

Naive analysis under misclassification
Naive analysis: simply ignore the misclassification. We want to estimate P(X = 1) and use (1/n) Σ X*_i, which estimates
P(X* = 1) = π_11 P(X = 1) + π_10 P(X = 0)
Bias: P(X* = 1) − P(X = 1) = π_10 P(X = 0) − π_01 P(X = 1)

Correction
Idea: solve the bias equation. Note that X* is still binomial, and P(X* = 1) can be consistently estimated from the observed data.
P(X* = 1) = π_11 P(X = 1) + π_10 (1 − P(X = 1))
P(X = 1) = (P(X* = 1) − π_10) / (π_11 + π_00 − 1)
Assumptions: π_11 and π_00 known, π_11 + π_00 > 1.
Variance inflation factor: (π_11 + π_00 − 1)^−2
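The solved bias equation can be sketched directly (sensitivity and specificity are assumed known, as on the slide; names and numbers are illustrative):

```python
import numpy as np

def corrected_prevalence(p_star_hat, sens, spec):
    """Solve P(X*=1) = sens*p + (1-spec)*(1-p) for p = P(X=1),
    assuming sensitivity and specificity are known and sens + spec > 1."""
    return (p_star_hat - (1.0 - spec)) / (sens + spec - 1.0)

rng = np.random.default_rng(4)
n = 200_000
p_true, sens, spec = 0.3, 0.9, 0.8
x = rng.random(n) < p_true
keep_1 = rng.random(n) < sens        # a true 1 stays 1 with prob. sens
make_1 = rng.random(n) < 1 - spec    # a true 0 flips to 1 with prob. 1-spec
x_star = np.where(x, keep_1, make_1)

print(x_star.mean())                                    # approx. 0.41: biased
print(corrected_prevalence(x_star.mean(), sens, spec))  # approx. 0.30
```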

Multinomial case
X is multinomial with categories 1, ..., k; X* is observed. The error model is given by the misclassification matrix Π = {π_ij} with π_ij = P(X* = i | X = j). For the probability vectors
P_X = (p_x1, ..., p_xk), P_X* = (p_x*1, ..., p_x*k)
we get P_X* = Π P_X.

The matrix method
The correction method is given by P̂_X := Π^−1 P̂_X*.  (1)
Properties:
The misclassification matrix has to be known or estimated.
Sometimes gives probabilities > 1 or < 0.
Variance calculation is straightforward.
Use the delta method in the case of estimated Π; Greenland (1988).
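The matrix method of equation (1) is a single linear solve (a sketch; the 3-category classification matrix is hypothetical):

```python
import numpy as np

def matrix_method(p_star_hat, Pi):
    """Correct observed category proportions via equation (1), using a
    misclassification matrix Pi with Pi[i, j] = P(X* = i | X = j)."""
    return np.linalg.solve(Pi, p_star_hat)

# hypothetical 3-category classification matrix (columns sum to 1)
Pi = np.array([[0.9, 0.1, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.1, 0.9]])
p_x = np.array([0.5, 0.3, 0.2])    # true probabilities
p_star = Pi @ p_x                  # expected observed proportions
print(matrix_method(p_star, Pi))   # recovers [0.5, 0.3, 0.2]
```

With estimated proportions the solve can of course return values outside [0, 1], which is the pathology the slide warns about.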

Example: apolipoprotein E and Alzheimer's disease (case-control)
Established relationship between apolipoprotein E ε4 and Alzheimer's disease. Data from a meta-analysis:
ApoE genotype        ε2ε2  ε2ε3  ε2ε4  ε3ε3  ε3ε4  ε4ε4
Clinical controls      27   425    81  2258   803    71
Clinical Alzheimer      7    74    41   593   620   207
PM controls             3    75    18   358   120     8
PM Alzheimer            1    20    17   249   373    97
PM: post mortem

MC correction
Assumption (from the literature): P(Y* = 0 | Y = 1) = 10%, P(Y* = 1 | Y = 0) = 1%.
In our sample: sensitivity = P(Y* = 1 | Y = 1) = 0.97, specificity = P(Y* = 0 | Y = 0) = 0.96.
Correction by the matrix method; odds ratios with ε3ε3 as reference:
               ε2ε4   ε3ε4   ε4ε4
naive          1.93   2.94  11.10
corrected      2.12   3.36  14.04

Overview of correction methods
Direct correction and method of moments
Orthogonal regression
Regression calibration
Simulation and extrapolation (SIMEX)
Likelihood
Quasi-likelihood
Corrected score
Bayes

Direct correction
Find plim β̂_n = β* = f(β, γ). Then β̂ := f̂^−1(β̂_n).
Correction for attenuation in the linear model is the leading example; for the general case see Küchenhoff (1997).

Method of moments
Moments of the observed data can be estimated; solve the moment equations. Simple linear regression:
μ_X* = μ_X
μ_Y = β_0 + β_1 μ_X
σ²_X* = σ²_X + σ_u²
σ²_Y = β_1² σ²_X + σ_ε²
σ_YX* = β_1 σ²_X

Orthogonal regression, total least squares
In linear regression with classical additive measurement error, assume η = σ_ε² / σ_u² is known, e.g. no equation error, so that σ_ε is pure measurement error in Y. Minimize
Σ_{i=1}^n { (Y_i − β_0 − β_1 X_i)² + η (X*_i − X_i)² }
over (β_0, β_1, X_1, X_2, ..., X_n).
Total least squares; Van Huffel (1997).
Sensible in technical, symmetric applications; problematic in epidemiology, where the assumption of no equation error is not realistic.

Regression calibration
This simple method has been widely applied. It was suggested by different authors: Rosner et al. (1989), Carroll and Stefanski (1990).
1 Find a model for E(X | X*, Z) from validation or replication data.
2 Replace the unobserved X by the estimate of E(X | X*, Z) in the main model.
3 Adjust variance estimates by bootstrap or asymptotic methods.
A good method in many practical situations; calibration data can be incorporated. Problems arise in highly nonlinear models.

Regression calibration: special cases
Berkson case: E(X | X*) = X*, so naive estimation = regression calibration.
Classical error: linear regression of X on X*:
E(X | X*) = (σ²_X / σ²_X*) X* + μ_X (1 − σ²_X / σ²_X*)
This reproduces the correction for attenuation in the linear model.
Classical error with a mixed-normal assumption for X:
E(X | X*) = P(I = 1 | X*) E(X | X*, I = 1) + P(I = 2 | X*) E(X | X*, I = 2)
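For the classical-normal case the two calibration steps can be sketched as follows (μ_X, σ²_X and σ_u² are assumed known here; in practice they come from calibration or replication data):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
beta0, beta1 = 1.0, 2.0
mu_x, sigma_x2, sigma_u2 = 1.0, 1.0, 1.0

x = rng.normal(mu_x, np.sqrt(sigma_x2), n)
y = beta0 + beta1 * x + rng.normal(0.0, 0.5, n)   # true slope: 2.0
x_star = x + rng.normal(0.0, np.sqrt(sigma_u2), n)

# Step 1-2: replace X by E(X | X*) under joint normality, then regress as usual
lam = sigma_x2 / (sigma_x2 + sigma_u2)        # reliability ratio
x_hat = lam * x_star + (1.0 - lam) * mu_x     # E(X | X*)
slope_rc = np.cov(y, x_hat)[0, 1] / np.var(x_hat, ddof=1)
print(slope_rc)   # approx. 2.0
```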

Likelihood methods
Pros:
Standard inference with standard errors and likelihood ratio tests
Efficiency
Combination of different data types possible
Sometimes more accurate than approximate corrections
Cons:
Difficult to calculate; software not generally available
A parametric model for the unobserved predictor is necessary
Sensitivity to strong parametric assumptions

The classical-error likelihood
The model has three components:
Main model [Y | X, Z, ζ]
Error model [X* | X, η]
Exposure model [X | Z, λ]
[Y, X* | Z, θ] = Π_{i=1}^n ∫ [y_i | x, z_i, ζ] [x*_i | x, η] [x | z_i, λ] dμ(x),
where θ = (ζ, η, λ).
Evaluation by numerical integration. Misclassification is a special case.

Berkson likelihood
We have the same three components, but the third contains no information:
[Y, X | Z, θ] = [Y | X, Z, ζ] [X | X*, η] [X* | Z, λ]
[Y | Z, θ] = Π_{i=1}^n ∫ [y_i | x, z_i, ζ] [x | x*_i, η] [x*_i | z_i, λ] dμ(x) = const · Π_{i=1}^n ∫ [y_i | x, z_i, ζ] [x | x*_i, η] dμ(x)
The Berkson likelihood does not depend on the exposure model.

Quasi-likelihood
If calculation of the likelihood is too complicated, use the first two conditional moments:
E(Y | X*, Z) = ∫ g(x, Z) f(x | x*) dx
V(Y | X*, Z) = ∫ v(x, Z) f(x | x*) dx + Var[g(X, Z) | X*]
This can be done in closed form, e.g. for exponential g: Poisson regression models, parametric survival models.

Bayes
Richardson and Green (2002).
Evaluation by MCMC techniques.
Conditional independence assumptions on the three model parts, as in the likelihood approach.
The latent variable X is treated as an unknown parameter.
Different data types can be combined.
Prior distributions are needed for the error model.
Flexible handling of the exposure model.
WinBUGS software.

SIMEX: basic idea
Cook and Stefanski (JASA, 1994). The effect of the measurement error on the naive estimator is analyzed in a simulation experiment.
Linear regression with additive measurement error:
Y = β_0 + β_X X + ε
X* = X + U, (U, X, ε) independent, U ~ N(0, σ_u²), ε ~ N(0, σ_ε²)
For the naive estimator (regression of Y on X*):
β* := plim β̂_naive = β_X σ²_X / (σ²_X + σ_u²)

The SIMEX algorithm
Additive measurement error with known or estimated variance σ_u²:
1 Calculate the naive estimator β̂_naive =: β̂(0).
2 Simulation: for a fixed grid of λ, e.g. λ = 0.5, 1, 1.5, 2, and for b = 1, ..., B:
Simulate new error-prone regressors X*_{i,b} = X*_i + √λ U_{i,b}, U_{i,b} ~ N(0, σ_u²).
Calculate the naive estimates β̂_{b,λ} based on [Y_i, X*_{i,b}].
Average over b to obtain β̂(λ).
3 Extrapolation: fit a regression β̂(λ) = g(λ, γ) and set β̂_simex := g(−1, γ̂).
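The three steps can be sketched for the slope in simple linear regression (a minimal sketch with quadratic extrapolation; σ_u² is assumed known and the data-generating values are illustrative):

```python
import numpy as np

def simex_slope(y, x_star, sigma_u2, lambdas=(0.5, 1.0, 1.5, 2.0), B=50, seed=0):
    """SIMEX for the slope of Y on an error-prone X* under classical additive
    error with known variance sigma_u2; quadratic extrapolation."""
    rng = np.random.default_rng(seed)
    def naive(w):
        return np.cov(y, w)[0, 1] / np.var(w, ddof=1)
    lams, betas = [0.0], [naive(x_star)]       # step 1: naive estimate
    for lam in lambdas:                        # step 2: add extra error, re-estimate
        reps = [naive(x_star + np.sqrt(lam * sigma_u2)
                      * rng.standard_normal(len(x_star))) for _ in range(B)]
        lams.append(lam)
        betas.append(np.mean(reps))
    coef = np.polyfit(lams, betas, 2)          # step 3: quadratic fit ...
    return np.polyval(coef, -1.0)              # ... extrapolated to lambda = -1

rng = np.random.default_rng(6)
n = 50_000
x = rng.normal(0.0, 2.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)   # true slope: 2.0, naive plim: 1.6
x_star = x + rng.normal(0.0, 1.0, n)
print(simex_slope(y, x_star, sigma_u2=1.0))   # close to 2.0; a small extrapolation bias remains
```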

Extrapolation functions
Linear: g(λ) = γ_0 + γ_1 λ
Quadratic: g(λ) = γ_0 + γ_1 λ + γ_2 λ²
Nonlinear: g(λ) = γ_1 + γ_2 / (γ_3 + λ)
The nonlinear function is motivated by linear regression, where it is exact; the quadratic works fine in many examples.

Misclassification SIMEX
General regression model with a discrete variable X and classification matrix Π describing the relationship between X and the observed X*:
β*(Π) := plim β̂_naive, with β*(I_{k×k}) = β.
Problem: β*(Π) is a function of a matrix. We define λ ↦ β*(Π^λ), where Π^{n+1} = Π^n Π for natural numbers, Π^0 = I_{k×k}, and in general
Π^λ := E Λ^λ E^−1
with E the matrix of eigenvectors and Λ the diagonal matrix of eigenvalues of Π.
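The fractional matrix power Π^λ can be computed exactly as stated (a sketch; the 2×2 classification matrix is hypothetical):

```python
import numpy as np

def matrix_power_lambda(Pi, lam):
    """Pi**lam via eigendecomposition: Pi^lam = E diag(eigvals**lam) E^-1."""
    vals, vecs = np.linalg.eig(Pi)
    return (vecs @ np.diag(vals ** lam) @ np.linalg.inv(vecs)).real

Pi = np.array([[0.9, 0.2],
               [0.1, 0.8]])
print(matrix_power_lambda(Pi, 0.0))   # identity matrix
print(matrix_power_lambda(Pi, 2.0))   # equals Pi @ Pi
print(matrix_power_lambda(Pi, 0.5))   # "half" of the misclassification
```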

Summary SIMEX
Simple method
Suitable for complex main models
R package simex available
Extrapolation problematic, especially for large measurement error

Example: occupational dust and chronic bronchitis
Küchenhoff/Carroll (1997) and Gössl/Küchenhoff (2001). Research question: relationship between occupational dust exposure and chronic bronchitis.
Data from N = 1246 workers:
X: log(1 + average occupational dust exposure)
Y: chronic bronchitis (CBR)
X*: measurements and expert ratings
Z_1: smoking
Z_2: duration of exposure
Model with threshold (TLV): P(Y = 1) = G(β_0 + β_1 (X − TLV)_+ + β_2 Z_1 + β_3 Z_2)

Results for the TLV
Method                    TLV−τ_0   Nom. s.e.   Boot. s.e.
Naive                       1.27      .41         .24
Pseudo-MLE                  1.76      .17         .21
Regression calibration      1.75      .12         .19
SIMEX: linear               1.37      .23         .23
SIMEX: quadratic            1.40      .23         .34
SIMEX: nonlinear            1.40      .23         .86

Findings
MLE can be much more efficient (see also Guolo and Brazzale (2008)).
SIMEX is flexible, but the choice of extrapolation model is critical.