Measurement error modeling
Statistisches Beratungslabor, Institut für Statistik, Ludwig-Maximilians-Universität München
Department of Statistical Sciences, Università degli Studi di Padova
29.4.2010
Overview
1. Measurement error and misclassification
Measurement error and misclassification
References
Carroll, R. J., D. Ruppert, L. Stefanski and C. Crainiceanu: Measurement Error in Nonlinear Models. A Modern Perspective. Chapman & Hall, London 2006.
Buonaccorsi, J. P.: Measurement Error. Models, Methods and Applications. Chapman & Hall, Boca Raton 2010.
Kuha, J., C. Skinner and J. Palmgren: Misclassification error. In: Encyclopedia of Biostatistics, ed. by Armitage, P. and Colton, T., 2615-2621. Wiley, Chichester.
Statistical Modeling
Applied statistics: learning from data by sophisticated statistical modeling.
Several kinds of complexity:
* Complex relationships between variables
* Often, the relationship between variables and data is complex, too:
  * Variables of interest are not directly ascertainable.
  * Only proxy variables (surrogates) are available instead.
General setting
[Diagram] The main model describes the association/effects between the latent outcome Y_i and the latent predictor X_i; measurement models link Y_i and X_i to their measurements Y*_i and X*_i; inference has to be based on the observed data.
Examples
* Framingham Heart Study, Munich blood pressure study: long-term average blood pressure vs. a single measurement
* Munich bronchitis study: average occupational dust exposure vs. single measurements and expert ratings
* Studies on effects of air pollution
* Munich radon study
* Study on caries (misclassification)
Consequences of measurement error and misclassification
Potential consequence of neglecting the difference between latent variables and their surrogates: severely biased estimation of all parameters.
Note: Measurement error and misclassification are not just a matter of sloppiness. Latent variables are eo ipso not exactly measurable.
Notation
We have to distinguish between the true (correctly measured) variable and its (possibly incorrect) measurement, i.e. between the latent variable and the corresponding surrogate.
* Star notation (used here): X, Z: correctly measured (possibly unobservable) variables; X*, Z*: corresponding, possibly incorrect measurements; analogously Y, Y* and T, T*.
* X, W, Z notation (Carroll et al., 2006)
* ξ-X notation (Schneeweiß, Fuller)
Models for measurement error
* Systematic vs. random
* Classical vs. Berkson
* Additive vs. multiplicative
* Homoscedastic vs. heteroscedastic
* Differential vs. non-differential
Classical additive random measurement error
X_i: true value; X*_i: measurement of X_i
X*_i = X_i + U_i, (U_i, X_i) independent
E(U_i) = 0, V(U_i) = σ²_u, e.g. U_i ~ N(0, σ²_u)
This model is suitable for:
* instrument measurement error
* a single measurement used in place of a long-term mean
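A minimal simulation sketch of the classical error model (Python; all numerical values are made up for illustration): a latent X is drawn, independent noise U is added, and the variance of the surrogate X* exceeds that of X.

```python
# Minimal sketch (assumed values): classical additive measurement error X* = X + U,
# with U independent of X and E(U) = 0.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
sigma_x, sigma_u = 1.0, 0.6           # assumed true and error standard deviations

x = rng.normal(2.0, sigma_x, n)       # latent true values X
u = rng.normal(0.0, sigma_u, n)       # classical measurement error U
x_star = x + u                        # observed surrogate X*

# Under the classical model Var(X*) = Var(X) + Var(U) > Var(X)
print(x.var(ddof=1), x_star.var(ddof=1))
```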
Additive Berkson error
X_i = X*_i + U_i, (U_i, X*_i) independent
E(U_i) = 0, V(U_i) = σ²_u, e.g. U_i ~ N(0, σ²_u)
This model is suitable for:
* mean exposure of a region X* instead of individual exposure X
* working place measurements
* dose in a controlled experiment
Classical vs. Berkson error
In the Berkson case: E(X | X*) = X*, Var(X) = Var(X*) + Var(U), so Var(X) > Var(X*).
In the classical additive case: E(X* | X) = X, Var(X*) = Var(X) + Var(U), so Var(X*) > Var(X).
Multiplicative measurement error
Classical: X*_i = X_i · U_i, (U_i, X_i) independent
Berkson: X_i = X*_i · U_i, (U_i, X*_i) independent
E(U_i) = 1, e.g. U_i lognormal
Additive on the logarithmic scale
Used for exposure to chemicals or radiation
Model for misclassification (MC)
Binary case: sensitivity and specificity
Y: true value; Y*: observed value
P(Y* = 1 | Y = 1) = π11 (sensitivity)
P(Y* = 0 | Y = 0) = π00 (specificity)
P(Y* = 0 | Y = 1) = 1 − π11 = π01
P(Y* = 1 | Y = 0) = 1 − π00 = π10
Classification matrix:
Π = ( π00  π01
      π10  π11 )
Complex measurement models
The relationship between measurements and the true value can be modeled in more complex ways, e.g.:
* scores
* regression (estimated from validation data)
* mixture of Berkson and classical error
* dependence of the measurement error on further covariates
Additive measurement error in the response
Simple linear regression: Y = β0 + β1 X + ε
Additive error in Y: Y* = Y + U
Observed model: Y* = β0 + β1 X + ε + U
New equation error: ε + U
Assumption: U and X independent, U and ε independent
* Higher variance of the equation error
* Inference still correct
* Error in the equation and measurement error are not distinguishable
Measurement error in covariates
We focus on measurement error in a covariate of a regression model.
Main model: E(Y) = f(X, Z, β); we are interested in inference on β. Z is a further covariate measured without error.
Error model: relates X* to X.
Observed model: E(Y) = f*(X*, Z, β*)
Naive estimation: treat the observed model as the main model, but in most cases f* ≠ f and β* ≠ β.
Differential and non-differential measurement error
The assumption of non-differential measurement error relates to the response:
[Y | X, X*] = [Y | X]
For Y there is no further information in the surrogate X* (or the error U) once X is known. Then the error model and the main model can be split:
[Y, X*, X] = [Y | X] [X* | X] [X]
From the substantive point of view: the measurement process and Y are independent.
* Blood pressure on a particular day is irrelevant for CHD if the long-term average is known.
* Mean exposure is irrelevant if the individual exposure is known.
* But: people with CHD may report their nutrition behavior differently (differential error).
Simple linear regression
We assume a classical, non-differential, additive normal measurement error:
Y = β0 + β1 X + ε
X* = X + U, (U, X, ε) independent
U ~ N(0, σ²_u), ε ~ N(0, σ²_ε)
Classical measurement error in linear regression [figure]
The observed model in linear regression
E(Y | X*) = β0 + β1 E(X | X*)
Assuming X ~ N(µ_x, σ²_x), the observed model is:
E(Y | X*) = β0* + β1* X*
β1* = β1 · σ²_x / (σ²_x + σ²_u)
β0* = β0 + β1 (1 − σ²_x / (σ²_x + σ²_u)) µ_x
Y − β0* − β1* X* ~ N(0, σ²_ε + β1² σ²_u σ²_x / (σ²_x + σ²_u))
The observed model is still a linear regression!
* Attenuation of β1 by the "reliability ratio" σ²_x / (σ²_x + σ²_u)
* Loss of precision (larger error term)
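A small simulation sketch (Python, assumed parameter values) illustrating the attenuation: the naive slope of Y on X* is close to β1 times the reliability ratio.

```python
# Minimal sketch (assumed values): the naive regression of Y on X* estimates the
# attenuated slope beta1 * sigma_x^2 / (sigma_x^2 + sigma_u^2).
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
beta0, beta1 = 1.0, 2.0
sigma_x, sigma_u, sigma_eps = 1.0, 0.8, 0.5   # assumed

x = rng.normal(0.0, sigma_x, n)
y = beta0 + beta1 * x + rng.normal(0.0, sigma_eps, n)
x_star = x + rng.normal(0.0, sigma_u, n)

naive_slope = np.cov(y, x_star)[0, 1] / np.var(x_star, ddof=1)
reliability = sigma_x**2 / (sigma_x**2 + sigma_u**2)
print(naive_slope, beta1 * reliability)       # both close to about 1.22
```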
Identification
Six model parameters (β0, β1, µ_x, σ²_x, σ²_u, σ²_ε), but the normal distribution of (Y, X*) has only five parameters: µ_y, µ_x*, σ²_y, σ²_x*, σ_yx*.
(β0, β1, µ_x, σ²_x, σ²_u, σ²_ε) and (β0*, β1*, µ_x, σ²_x + σ²_u, 0, σ*²_ε) yield identical distributions of (Y, X*).
⇒ The model parameters are not identifiable.
We need extra information, e.g.:
* σ²_u is known or can be estimated
* σ_u / σ_ε is known (orthogonal regression)
The model with a non-normal distribution for X is identifiable via higher moments.
Naive LS estimation
For the slope: β̂_1n = S_YX* / S²_X*
plim(β̂_1n) = σ_YX* / σ²_X* = β1 · σ²_x / (σ²_x + σ²_u)
Correction for attenuation
We have a first correction method: solve the bias equation.
β̂1 = β̂_1n · (σ²_x + σ²_u) / σ²_x
β̂1 = β̂_1n · S²_X* / (S²_X* − σ²_u)
Correction by the reliability ratio.
V(β̂1) > V(β̂_1n): bias-variance trade-off
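A minimal sketch of this correction, assuming σ²_u is known; the helper name is illustrative, not from the slides.

```python
# Minimal sketch (assuming sigma_u^2 is known): correct the naive LS slope
# by the estimated reliability ratio.
import numpy as np

def attenuation_corrected_slope(y, x_star, sigma_u2):
    """Naive slope of y on x_star, corrected for classical additive error."""
    s_x2 = np.var(x_star, ddof=1)             # estimates sigma_x^2 + sigma_u^2
    s_yx = np.cov(y, x_star)[0, 1]
    beta_naive = s_yx / s_x2
    return beta_naive * s_x2 / (s_x2 - sigma_u2)
```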
Multiple regression
The formulae can be extended to the multiple case.
Note: parameter estimates of error-free covariates may also be affected.
Berkson error in simple linear regression
Y = β0 + β1 X + ε
X = X* + U, U independent of (X*, ε), E(U) = 0
Observed model (Berkson case)
E(Y | X*) = β0 + β1 X*
V(Y | X*) = σ²_ε + β1² σ²_u
* Regression model with identical β
* Measurement error ignorable
* Loss of precision
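A companion sketch for the Berkson case (Python, assumed values): regressing Y on the assigned X* recovers the true slope, in contrast to the classical-error case above.

```python
# Minimal sketch (assumed values): under Berkson error the naive regression of
# Y on X* still targets the true slope beta1; only the residual variance grows.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
beta0, beta1, sigma_u, sigma_eps = 1.0, 2.0, 0.8, 0.5   # assumed

x_star = rng.normal(0.0, 1.0, n)                 # assigned / observed value
x = x_star + rng.normal(0.0, sigma_u, n)         # true value (Berkson: X = X* + U)
y = beta0 + beta1 * x + rng.normal(0.0, sigma_eps, n)

naive_slope = np.cov(y, x_star)[0, 1] / np.var(x_star, ddof=1)
print(naive_slope)                               # close to beta1 = 2.0
```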
Misclassification
Model for misclassification:
X: true binary variable (gold standard); X*: observed value of X (surrogate)
P(X* = 1 | X = 1) = π11 (sensitivity)
P(X* = 0 | X = 0) = π00 (specificity)
P(X* = 0 | X = 1) = 1 − π11 = π01
P(X* = 1 | X = 0) = 1 − π00 = π10
Classification matrix:
Π = ( π00  π01
      π10  π11 )
Misclassification: naive analysis
Naive analysis: simply ignore the misclassification.
We want to estimate P(X = 1); we use (1/n) Σ_i X*_i.
P(X* = 1) = π11 P(X = 1) + π10 P(X = 0)
P(X* = 1) − P(X = 1) = π10 P(X = 0) − π01 P(X = 1)
Correction
Idea: solve the bias equation.
Note that X* is still binomial and P(X* = 1) can be consistently estimated from the observed data.
P(X* = 1) = π11 P(X = 1) + π10 (1 − P(X = 1))
P(X = 1) = (P(X* = 1) − π10) / (π11 + π00 − 1)
Assumptions: π11 and π00 known, π11 + π00 > 1
Variance inflated by the factor 1/(π11 + π00 − 1)²
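A minimal sketch of this correction, assuming sensitivity and specificity are known; the function name and all numbers are illustrative.

```python
# Minimal sketch (assumed sensitivity/specificity): correct the naive prevalence
# estimate for known misclassification probabilities.
import numpy as np

def corrected_prevalence(x_star, sens, spec):
    """Estimate P(X = 1) from misclassified 0/1 data with known sens/spec."""
    p_star = np.mean(x_star)                   # naive estimate of P(X* = 1)
    return (p_star - (1.0 - spec)) / (sens + spec - 1.0)

# Example with simulated data (assumed values)
rng = np.random.default_rng(4)
p_true, sens, spec, n = 0.30, 0.90, 0.95, 50_000
x = rng.binomial(1, p_true, n)
x_star = np.where(x == 1, rng.binomial(1, sens, n), rng.binomial(1, 1 - spec, n))
print(np.mean(x_star), corrected_prevalence(x_star, sens, spec))  # ~0.305 vs ~0.30
```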
Multinomial case
X is multinomial with categories 1, ..., k; X* is observed.
The error model is given by the misclassification matrix Π = {π_ij} with π_ij = P(X* = i | X = j).
For the probability vectors P_X = (p_x1, ..., p_xk)' and P_X* = (p*_x1, ..., p*_xk)':
P_X* = Π P_X
The matrix method
The correction is given by P̂_X := Π⁻¹ P̂_X*.
Properties:
* The misclassification matrix has to be known or estimated.
* Sometimes yields probabilities > 1 or < 0.
* Variance calculation is straightforward.
* Use the delta method in the case of an estimated Π, Greenland (1988).
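A minimal sketch of the matrix method for k = 3 categories, with an assumed classification matrix whose columns give P(X* = i | X = j).

```python
# Minimal sketch (assumed classification matrix): the matrix method inverts
# Pi to recover the distribution of the true categories.
import numpy as np

def matrix_method(p_star, pi):
    """Correct observed category proportions p_star using Pi[i, j] = P(X*=i | X=j)."""
    return np.linalg.solve(pi, p_star)

# Example (assumed values): 3 categories, modest misclassification
pi = np.array([[0.90, 0.05, 0.00],
               [0.10, 0.90, 0.10],
               [0.00, 0.05, 0.90]])     # columns sum to 1
p_true = np.array([0.5, 0.3, 0.2])
p_star = pi @ p_true                    # what we would observe in expectation
print(matrix_method(p_star, pi))        # recovers [0.5, 0.3, 0.2]
```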
Example: Apolipoprotein E and Alzheimer's disease (case-control study)
Established relationship between apolipoprotein E ε4 and Alzheimer's disease. Data from a meta-analysis.

ApoE genotype         ε2ε2  ε2ε3  ε2ε4  ε3ε3  ε3ε4  ε4ε4
Clinical controls       27   425    81  2258   803    71
Clinical Alzheimer       7    74    41   593   620   207
PM controls              3    75    18   358   120     8
PM Alzheimer             1    20    17   249   373    97

PM: post mortem
MC correction
Assumption (from the literature): P(Y* = 0 | Y = 1) = 10%, P(Y* = 1 | Y = 0) = 1%
In our sample: sensitivity = P(Y* = 1 | Y = 1) = 0.97, specificity = P(Y* = 0 | Y = 0) = 0.96
Correction (matrix method), OR with ε3ε3 as reference:

              ε2ε4   ε3ε4   ε4ε4
naive         1.93   2.94  11.10
corrected     2.12   3.36  14.04
Overview of correction methods
* Direct correction and method of moments
* Orthogonal regression
* Regression calibration
* Simulation and extrapolation (SIMEX)
* Likelihood
* Quasi-likelihood
* Corrected score
* Bayes
Direct correction
Find plim β̂_naive = β* = f(β, γ).
Then β̂ := f̂⁻¹(β̂_naive).
* Correction for attenuation in the linear model
* General case: Küchenhoff (1997)
Method of moments
Moments of the observed data can be estimated; solve the moment equations.
Simple linear regression:
µ_X* = µ_X
µ_Y = β0 + β1 µ_X
σ²_X* = σ²_X + σ²_u
σ²_Y = β1² σ²_X + σ²_ε
σ_YX* = β1 σ²_X
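A minimal sketch that solves these moment equations, assuming σ²_u is known; the function name is illustrative.

```python
# Minimal sketch (assuming sigma_u^2 known): solve the moment equations of the
# simple linear model with classical additive error.
import numpy as np

def moment_estimates(y, x_star, sigma_u2):
    """Return (beta0, beta1, sigma_x2, sigma_eps2) from observed moments."""
    mu_x = np.mean(x_star)                        # = mu_X
    sigma_x2 = np.var(x_star, ddof=1) - sigma_u2  # sigma_X*^2 = sigma_X^2 + sigma_u^2
    s_yx = np.cov(y, x_star)[0, 1]
    beta1 = s_yx / sigma_x2                       # sigma_YX* = beta1 * sigma_X^2
    beta0 = np.mean(y) - beta1 * mu_x             # mu_Y = beta0 + beta1 * mu_X
    sigma_eps2 = np.var(y, ddof=1) - beta1**2 * sigma_x2
    return beta0, beta1, sigma_x2, sigma_eps2
```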
Orthogonal regression, total least squares
In linear regression with classical additive measurement error:
Assume σ²_ε / σ²_u = η is known, e.g. no equation error and σ_ε only measurement error in Y.
Minimize Σ_{i=1}^n { (Y_i − β0 − β1 X_i)² + η (X*_i − X_i)² } over (β0, β1, X_1, X_2, ..., X_n).
* Total least squares, Van Huffel (1997)
* Technical, symmetric applications
* Problematic in epidemiology: the assumption of no equation error is not realistic
Regression calibration
This simple method has been widely applied. It was suggested by different authors: Rosner et al. (1989), Carroll and Stefanski (1990).
1. Find a model for E(X | X*, Z) from validation data or replication.
2. Replace the unobserved X by the estimate of E(X | X*, Z) in the main model.
3. Adjust variance estimates by bootstrap or asymptotic methods.
* Good method in many practical situations
* Calibration data can be incorporated
* Problems in highly nonlinear models
Regression calibration
Berkson case: E(X | X*) = X*, so naive estimation = regression calibration.
Classical error: linear regression of X on X*:
E(X | X*) = (σ²_X / σ²_X*) X* + µ_X (1 − σ²_X / σ²_X*)
This yields the correction for attenuation in the linear model.
Classical error with the assumption of a normal mixture for X:
E(X | X*) = P(I = 1 | X*) E(X | X*, I = 1) + P(I = 2 | X*) E(X | X*, I = 2)
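A minimal sketch of regression calibration for the classical case, assuming σ²_u is known and X approximately normal; the function name is illustrative.

```python
# Minimal sketch (assuming sigma_u^2 known): regression calibration scores
# E(X | X*) for a classical additive error with approximately normal X.
import numpy as np

def regression_calibration_scores(x_star, sigma_u2):
    """Return E(X | X*) under a normal X with estimated mean/variance."""
    mu_x = np.mean(x_star)
    sigma_x2 = np.var(x_star, ddof=1) - sigma_u2   # Var(X) = Var(X*) - Var(U)
    lam = sigma_x2 / (sigma_x2 + sigma_u2)         # reliability ratio
    return lam * x_star + (1.0 - lam) * mu_x

# The calibrated scores replace X* in the main model, e.g. an OLS or GLM fit of Y on them.
```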
Likelihood methods
Pros:
* Standard inference with standard errors and likelihood ratio tests
* Efficiency
* Combination of different data types possible
* Sometimes more accurate than approximate methods
Cons:
* Difficult to calculate
* Software not readily available
* Parametric model for the unobserved predictor necessary
* Questionable robustness to strong parametric assumptions
The classical error likelihood
The model has three components:
Main model [Y | X, Z, ζ], error model [X* | X, η], exposure model [X | Z, λ]
[Y, X* | Z, θ] = ∏_{i=1}^n ∫ [y_i | x, z_i, ζ] [x*_i | x, η] [x | z_i, λ] dµ(x),
where θ = (ζ, η, λ).
Evaluation by numerical integration; misclassification is a special case (sum instead of integral).
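A sketch of how such a likelihood can be evaluated numerically, for an assumed logistic main model, normal error model with known σ_u, and normal exposure model, using Gauss-Hermite quadrature; all names and the model choice are illustrative, not the specific model of the slides.

```python
# Minimal sketch (assumed models): classical-error likelihood with a logistic
# main model, normal error model and normal exposure model, evaluated by
# Gauss-Hermite quadrature over the latent X.
import numpy as np
from scipy import stats

def neg_log_lik(theta, y, x_star, sigma_u, n_nodes=30):
    zeta0, zeta1, lam0, log_lam1 = theta          # main-model and exposure-model parameters
    lam1 = np.exp(log_lam1)
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    x_nodes = lam0 + np.sqrt(2.0) * lam1 * t      # quadrature points for X ~ N(lam0, lam1^2)
    p = 1.0 / (1.0 + np.exp(-(zeta0 + zeta1 * x_nodes)))      # P(Y = 1 | X = node)
    # integrand for each observation i and node k
    f_y = p[None, :] ** y[:, None] * (1 - p[None, :]) ** (1 - y[:, None])
    f_err = stats.norm.pdf(x_star[:, None], loc=x_nodes[None, :], scale=sigma_u)
    lik = (f_y * f_err) @ (w / np.sqrt(np.pi))    # quadrature over the exposure model
    return -np.sum(np.log(lik))

# scipy.optimize.minimize(neg_log_lik, x0=..., args=(y, x_star, sigma_u)) would give the MLE.
```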
Berkson likelihood
We have the three components, but the third component contains no information about ζ:
Main model [Y | X, Z, ζ], error model [X | X*, η], exposure model [X* | Z, λ]
[Y, X* | Z, θ] = ∏_{i=1}^n ∫ [y_i | x, z_i, ζ] [x | x*_i, η] [x*_i | z_i, λ] dµ(x)
= ∏_{i=1}^n ∫ [y_i | x, z_i, ζ] [x | x*_i, η] dµ(x) · const
The Berkson likelihood does not depend on the exposure model.
Quasi-likelihood
If calculation of the full likelihood is too complicated, use the first two conditional moments:
E(Y | X*, Z) = ∫ g(x, Z) f_{X|X*}(x) dx
V(Y | X*, Z) = ∫ v(x, Z) f_{X|X*}(x) dx + Var[g(X, Z) | X*]
This can be done e.g. for exponential g: Poisson regression, parametric survival models.
Bayes
Richardson and Green (2002)
* Evaluation by MCMC techniques
* Conditional independence assumptions on the three parts, as in the likelihood approach
* The latent variable X is treated as an unknown parameter
* Different data types can be combined
* Prior distributions for the error model
* Flexible handling of the exposure model
* WinBUGS software
SIMEX: basic idea
Cook and Stefanski (JASA, 1994). The effect of measurement error on the naive estimator is analyzed by a simulation experiment.
Linear regression with additive measurement error:
Y = β0 + β_X X + ε
X* = X + U, (U, X, ε) independent, U ~ N(0, σ²_u), ε ~ N(0, σ²_ε)
For the naive estimator (regression of Y on X*):
β* := plim β̂_naive = β_X · σ²_X / (σ²_X + σ²_u)
The SIMEX algorithm
Additive measurement error with known or estimated variance σ²_u:
1. Calculate the naive estimator β̂_naive =: β̂(0).
2. Simulation: for a fixed grid of λ, e.g. λ = 0.5, 1, 1.5, 2, and for b = 1, ..., B:
   simulate new error-prone regressors X*_i,b = X*_i + √λ · U_i,b with U_i,b ~ N(0, σ²_u);
   calculate the naive estimates β̂_b,λ based on [Y_i, X*_i,b];
   average over b to obtain β̂(λ).
3. Extrapolation: fit a regression β̂(λ) = g(λ, γ) and set β̂_simex := g(−1, γ̂).
Extrapolation functions
Linear: g(λ) = γ0 + γ1 λ
Quadratic: g(λ) = γ0 + γ1 λ + γ2 λ²
Nonlinear: g(λ) = γ1 + γ2 / (γ3 + λ)
The nonlinear extrapolant is motivated by the exact form in linear regression; the quadratic works fine in many examples.
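A compact SIMEX sketch for the slope of a simple linear regression, using simulated data with assumed values and quadratic extrapolation (one of several possible extrapolants).

```python
# Minimal sketch (simulated data, quadratic extrapolation): SIMEX for the slope
# of a simple linear regression with classical additive error of known variance.
import numpy as np

rng = np.random.default_rng(5)

def naive_slope(y, x):
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)

def simex_slope(y, x_star, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=200):
    lam_grid = [0.0] + list(lambdas)
    beta_lam = []
    for lam in lam_grid:
        est = ([naive_slope(y, x_star + np.sqrt(lam) * rng.normal(0, sigma_u, len(y)))
                for _ in range(B)] if lam > 0 else [naive_slope(y, x_star)])
        beta_lam.append(np.mean(est))
    coefs = np.polyfit(lam_grid, beta_lam, deg=2)   # quadratic extrapolation
    return np.polyval(coefs, -1.0)                  # evaluate at lambda = -1

# Simulated example (assumed values)
n, beta1, sigma_u = 5_000, 2.0, 0.8
x = rng.normal(0, 1, n)
y = 1.0 + beta1 * x + rng.normal(0, 0.5, n)
x_star = x + rng.normal(0, sigma_u, n)
print(naive_slope(y, x_star), simex_slope(y, x_star, sigma_u))  # ~1.2 vs closer to 2
```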
Misclassification SIMEX
General regression model with a discrete variable X and classification matrix Π describing the relationship between X and the observed X*.
β*(Π) := plim β̂_naive, with β*(I_{k×k}) = β
Problem: β*(Π) is a function of a matrix. We define λ ↦ β*(Π^λ), where Π^λ is given by Π^{n+1} = Π^n Π for natural numbers and Π^0 = I_{k×k}.
In general: Π^λ := E Λ^λ E⁻¹, with E the matrix of eigenvectors and Λ the diagonal matrix of eigenvalues of Π.
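A minimal sketch of the matrix power Π^λ via the eigendecomposition, with an assumed 2×2 classification matrix.

```python
# Minimal sketch: the matrix power Pi^lambda needed for MC-SIMEX, computed via
# the eigendecomposition Pi = E diag(eigvals) E^{-1}.
import numpy as np

def matrix_power(pi, lam):
    """Pi**lam for a diagonalizable misclassification matrix Pi."""
    eigvals, eigvecs = np.linalg.eig(pi)
    return np.real(eigvecs @ np.diag(eigvals ** lam) @ np.linalg.inv(eigvecs))

pi = np.array([[0.95, 0.10],
               [0.05, 0.90]])                          # assumed classification matrix
print(matrix_power(pi, 2.0))                           # equals pi @ pi
print(matrix_power(pi, 0.5) @ matrix_power(pi, 0.5))   # recovers pi
```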
Summary: SIMEX
* Simple method
* Suitable for complex main models
* R package simex available
* Extrapolation problematic, especially for large measurement error
Example: occupational dust and chronic bronchitis
Küchenhoff/Carroll (1997) and Gössl/Küchenhoff (2001)
Research question: relationship between occupational dust exposure and chronic bronchitis.
Data from N = 1246 workers:
X: log(1 + average occupational dust exposure)
Y: chronic bronchitis (CBR)
X*: measurements and expert ratings
Z1: smoking
Z2: duration of exposure
Model with threshold (TLV):
P(Y = 1) = G(β0 + β1 (X − TLV)₊ + β2 Z1 + β3 Z2)
Results for the TLV

Method                   TLV (τ0)   Nom. s.e.   Boot s.e.
Naive                    1.27       .41         .24
Pseudo-MLE               1.76       .17         .21
Regression calibration   1.75       .12         .19
SIMEX: linear            1.37       .23         .23
SIMEX: quadratic         1.40       .23         .34
SIMEX: nonlinear         1.40       .23         .86
Findings
* MLE can be much more efficient (see also Guolo and Brazzale (2008)).
* SIMEX is flexible, but the choice of the extrapolation model is critical.