Effects of Exposure Measurement Error When an Exposure Variable Is Constrained by a Lower Limit

American Journal of Epidemiology, Vol. 157, No. 4. Copyright © 2003 by the Johns Hopkins Bloomberg School of Public Health. All rights reserved. Printed in U.S.A. DOI: 10.1093/aje/kwf217

David B. Richardson (1) and Antonio Ciampi (2)

(1) Department of Epidemiology, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC.
(2) Department of Epidemiology and Biostatistics, McGill University, Montreal, Quebec, Canada.

Received for publication March 14, 2001; accepted for publication October 14, 2002.

Epidemiologic studies routinely suffer from bias due to exposure measurement error. In this paper, the authors examine the effect of measurement error when the exposure variable of interest is constrained by a lower limit. This is an important consideration, since often in epidemiologic studies an exposure variable is constrained by a lower limit such as zero or a nonzero detection limit. Attenuation of exposure-disease associations is first defined within the framework of a classical model of uncorrelated additive error. Then, the special case of nonlinearity due to the effect of a lower threshold is examined. A general model is developed to characterize the effect of random measurement error when there is a lower threshold for recorded values. Findings are illustrated under the assumption that the true exposure follows the lognormal and gamma distributions. The authors show that the direction and magnitude of bias in estimated exposure-response associations depend on the population distribution of the exposure, the magnitude of the recording threshold, the value assigned to below-threshold measurement results, and the variance in the measured exposure due to random measurement error.

bias (epidemiology); epidemiologic methods; measurement error; regression analysis

Epidemiologists are routinely concerned about the consequences of exposure measurement error.
Investigators may rely on inaccurate proxy measures of exposure or on information derived from imprecise exposure measurement tools. In environmental and occupational epidemiologic studies, information on exposure is often collected under poorly controlled conditions, and it may be derived from historical records that were originally compiled for purposes other than epidemiologic research. Almost any measurement process combines human error with the limitations of a measurement tool, leading to measurement error. Errors in exposure measurement may lead to biased estimates of exposure-disease associations. A number of authors have discussed the direction and magnitude of bias resulting from specified patterns of exposure measurement error and have provided models describing these associations (1–3). These models can help in assessing the bias and uncertainty that result from commonly encountered problems of exposure measurement error. Such models have been used in a range of epidemiologic investigations, including studies of nutritional factors, environmental contaminants, and occupational hazards (4–6). Taking a similar approach in this paper, we begin with a model in which exposure measurement error is assumed to be nondifferential (that is, independent of disease status) and randomly distributed. In this paper, however, we focus on how the effects of measurement error change when an exposure variable is constrained by a lower limit. It is common in epidemiologic studies for recorded exposures to be constrained by a lower limit, such as zero or a minimal detection threshold for a measurement process (7–9). For example, in studies of workers in the nuclear industry, a lower boundary for recorded radiation doses often reflects the inability of a measurement tool to accurately obtain values below a specified minimal threshold of detection (10).
In this case, measurement error conforms to a nonlinear rather than an additive model, and the assumption that these errors are randomly distributed is no longer accurate. Consequently, a constraint on the minimal recorded exposure will influence the distribution of exposure measurement error and, importantly for epidemiologists, influence the effect of measurement error on estimates of exposure-disease associations.

Correspondence to Professor David B. Richardson, Department of Epidemiology, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-8050 (e-mail: david_richardson@unc.edu). Am J Epidemiol 2003;157:355–363.

In this paper, we investigate the direction and magnitude of bias in exposure-response associations when there is random measurement error and a lower threshold for

recorded exposures. We develop a general equation for the coefficient of bias in exposure-response associations resulting from random measurement error in the presence of a recording threshold limit, and we illustrate our findings under a range of specified model conditions.

METHODS

Regression models of exposure-disease association

Risk estimates for epidemiologic associations are often derived by ordinary linear regression with the following model:

$$y = \alpha + \beta x + \varepsilon, \tag{1}$$

where y is a health response measured on a continuous scale, x is a continuous exposure (assumed to be nonnegative for our purposes), and ε is the outcome error term, which is assumed to be uncorrelated with x and to have a mean equal to zero and a constant variance. The parameters α and β denote the intercept and the average change in y with x, respectively. If the health outcome under study is described by a binary variable, a logistic regression model may be preferable. In this case, equation 1 is replaced by a model in which y is a binary random variable with parameter p = E(y | x) = Pr(y = 1 | x) and

$$\operatorname{logit}(p) = \alpha + \beta x. \tag{1$'$}$$

In the discussion that follows, we will assume that associations conform to equation 1. However, the theory remains valid even when the correct model is the logistic one (equation 1′), provided that the following assumptions are satisfied: 1) the outcome is rare; 2) β is not too large in absolute terms; and 3) the measurement error is not too severe (11). In many cases of epidemiologic interest, the above assumptions will be satisfied.

Bias due to measurement error

Typically, in an epidemiologic study, the true exposure, x, is not measurable without error. Consequently, one can consider that the study uses a surrogate exposure variable, z, which provides an imperfect measure of the true exposure, x.
A simple case considered in epidemiology is that of the classical error model:

$$z = x + \eta, \tag{2}$$

where η is a random variable with variance σ² that is uncorrelated with x. It is usually assumed that cov(ε,η) = 0. Several generalizations of this model are also often used in epidemiology (5); they usually imply a linear relation between z and x and/or a relaxation of the assumption of zero correlation between η and x, while maintaining the assumption that cov(ε,η) = 0. Suppose we fitted a linear regression model between a surrogate measure of exposure, z, and a continuous measure of disease, y:

$$y = \alpha' + \beta' z + \varepsilon', \tag{3}$$

obtaining the usual least squares estimator, where the subscript s denotes the usual sample estimates of variance and covariance:

$$\hat{\beta}' = \frac{\operatorname{cov}_s(y, z)}{\operatorname{var}_s(z)}. \tag{4}$$

Let us write z = x + η as in equation 2, but without the assumptions that the distribution of η is independent of x and that there is no correlation between x and η. However, we will continue to assume that cov(ε,η) = 0. Then, as the sample size of the study population tends towards infinity, it can be shown simply from cov(y,z) = cov(βx + ε,z) = βcov(x,z) that the estimated association between the surrogate measure of exposure and disease, β′, is equal to the association between the true exposure and disease, β, multiplied by a coefficient of bias, λ, as follows:

$$E(\hat{\beta}') \xrightarrow[n \to \infty]{} \frac{\operatorname{cov}(x, z)}{\operatorname{var}(z)}\,\beta = \lambda\beta, \tag{5}$$

where

$$\lambda = \frac{\operatorname{cov}(x, z)}{\operatorname{var}(z)}. \tag{6}$$

Therefore, provided that ε and η are uncorrelated, equation 6 is valid in general, not only in the familiar case of additive error. We will call standard the well-studied case described by equation 2 and its linear generalizations, characterized by the assumption that the measurement error is distributed with mean equal to zero and constant variance, σ².
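The relation between equations 4, 5, and 6 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a lognormal (0,1) true exposure, normal errors, and illustrative values of α, β, and σ chosen here, and the helper names `cov` and `simulate_classical` are hypothetical.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length sequences (cov(u, u) is the variance)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


def simulate_classical(n=100_000, alpha=1.0, beta=2.0, sigma=1.0, seed=1):
    """Simulate equations 1 and 2 and return (lambda, fitted slope, true beta)."""
    rng = random.Random(seed)
    x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]        # true exposure (assumed lognormal)
    z = [xi + rng.gauss(0.0, sigma) for xi in x]                # surrogate, equation 2
    y = [alpha + beta * xi + rng.gauss(0.0, 1.0) for xi in x]   # outcome, equation 1
    lam = cov(x, z) / cov(z, z)                                 # coefficient of bias, equation 6
    slope = cov(y, z) / cov(z, z)                               # least squares slope, equation 4
    return lam, slope, beta


lam, slope, beta = simulate_classical()
print(f"lambda = {lam:.3f}, fitted slope = {slope:.3f}, lambda * beta = {lam * beta:.3f}")
```

In a large sample the fitted slope from regressing y on z should sit close to λβ, which is the content of equation 5.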
In this standard case, measurement error always leads to attenuation of the exposure-disease association in addition to diminishing the goodness-of-fit of the regression model (5). The coefficient of bias, λ, described by equation 6 will always take values less than 1; and, for the additive error model, the coefficient, λ, is equal to the ratio of the variance of x to the sum of the variance of x and σ² (1). However, we are interested in the range of possible values for the coefficient of bias, λ, in the nonstandard case in which recorded values for z are constrained by a minimal threshold limit, d, and exposures below this threshold limit are set equal to a value, a. In this nonstandard case, the model of measurement error in equation 2 can be replaced by the following model:

$$z = \begin{cases} x + \eta & \text{for } x + \eta > d, \\ a & \text{for } x + \eta \le d. \end{cases} \tag{7}$$

We first examine the case, which we call the pure threshold model, in which there is no random measurement error (the variance of η, denoted by σ², is equal to zero). In this case, the only source of exposure measurement error is the inability of the measuring instrument to detect x values that are below the threshold limit, d. Then we examine the case, which we call the threshold model with error, in which there is random measurement error (a nonzero variance for η). To explore these cases, we developed a general formula for bias due to exposure

measurement error, as described in the Appendix. Under the threshold model with error, the relation between the coefficient of bias, λ, and the threshold limit, d, depends on the distributions of x and η and on the value assigned to below-threshold measurements, a. Thus, rather than attempting a general analytical study of λ as a function of d, a, and the parameters of the distributions involved, we give some specific examples using distributions that mimic reasonably well what can be expected in real-life situations.

FIGURE 1. Assumed population distribution of the true exposure, x. The solid line shows the gamma (1,1) distribution; the dashed line shows the lognormal (0,1) distribution. Values for x > 5.0 are not shown. E(x) and var(x) for the lognormal (0,1) distribution are 1.65 and 4.65, respectively. E(x) and var(x) for the gamma (1,1) distribution are 1.00 and 1.00, respectively.

To explore these examples, we generated simulation data for 1,000,000 study subjects. A true exposure value, x, was assigned to each study subject by sampling from the lognormal (0,1) or gamma (1,1) distribution (figure 1). Using equation 7, we calculated values for z for the following cases: 1) a pure threshold model (σ = 0) with x distributed according to the lognormal (0,1) distribution; 2) a pure threshold model (σ = 0) with x distributed according to the gamma (1,1) distribution; 3) a threshold model with error with x distributed according to the lognormal (0,1) distribution; and 4) a threshold model with error with x distributed according to the gamma (1,1) distribution. Using SAS PROC CORR to calculate variances and covariances of x and z, we derived Monte Carlo estimates of λ via equation 6. Results were derived for specified values of a and d, in the case of the pure threshold model, and for specified values of a, d, and σ, in the case of the threshold model with error (13).
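The simulation just described can be sketched in a few lines. This is an illustration under stated assumptions rather than a reproduction of the SAS analysis: it uses Python's standard library in place of PROC CORR, normally distributed η, a single replicate of 100,000 subjects, and hypothetical helper names (`surrogate`, `bias_coefficient`, `draw_lognormal`, `draw_gamma`).

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


def surrogate(x, eta, d, a):
    """Equation 7: record x + eta when it exceeds d, otherwise record a."""
    return x + eta if x + eta > d else a


def bias_coefficient(draw_x, d, a, sigma, n=100_000, seed=0):
    """Monte Carlo estimate of lambda = cov(x, z) / var(z) (equation 6)."""
    rng = random.Random(seed)
    x = [draw_x(rng) for _ in range(n)]
    # sigma = 0 gives the pure threshold model; sigma > 0 the threshold model with error.
    z = [surrogate(xi, rng.gauss(0.0, sigma), d, a) for xi in x]
    return cov(x, z) / cov(z, z)


def draw_lognormal(rng):
    return rng.lognormvariate(0.0, 1.0)   # lognormal (0,1)


def draw_gamma(rng):
    return rng.gammavariate(1.0, 1.0)     # gamma (1,1)


for name, draw in (("lognormal (0,1)", draw_lognormal), ("gamma (1,1)", draw_gamma)):
    for sigma in (0.0, 0.5):
        lam = bias_coefficient(draw, d=1.0, a=0.5, sigma=sigma)
        print(f"{name}, d = 1, a = d/2, sigma = {sigma}: lambda = {lam:.3f}")
```

Changing `d`, `a`, and `sigma` in the call to `bias_coefficient` covers the four cases listed above; a larger `n` or averaging over several seeds reduces Monte Carlo noise.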
Each simulation was repeated 100 times, and the resulting estimates of the coefficient of bias were averaged.

RESULTS

Pure threshold model

We begin by considering the pure threshold model in which the only source of exposure measurement error is the inability of the measuring instrument to detect x values below the threshold limit, d. In the Appendix, we present a generalized formula for the coefficient of bias, λ, under the pure threshold model as a function of the threshold limit and the value assigned to below-threshold measurements, a. Under the pure threshold model, the coefficient of bias, λ, can be either less than 1 or greater than 1, depending on the distribution of x and on d and a. By properly choosing a, it is possible to have no bias despite the presence of a threshold limit. The coefficient, λ, is equal to 1 when a is equal to the expected value of x conditional on x being below the threshold: a = E[x | x ≤ d]. If the value assigned to below-threshold measurements is less than the expected value of x conditional on x being below the threshold, attenuation will occur in estimates of association; in contrast, if the value, a, assigned to below-threshold measurements is larger, inflation will occur in estimates of association.

Let us consider some specific examples. If the value assigned to below-threshold measurements, a, is equal to 0, the coefficient of bias will always be less than or equal to 1. In contrast, if the value assigned to below-threshold measurements is equal to the threshold limit, d, the coefficient of bias will always be greater than or equal to 1. The common practice of setting a equal to d/2 may result in upward or downward bias, depending on the magnitude of d/2 with respect to E[x | x ≤ d]. Figures 2 and 3 illustrate the relation between the coefficient of bias and a, the value assigned to exposure measurements below a threshold limit, d = 1 or d = 2. Figure 2 pertains to the case in which the population distribution of exposure is lognormal.
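The sign pattern stated above for the pure threshold model can be recovered with a short derivation. This is a sketch written for illustration, not the formula from the paper's Appendix; the symbols p, m, and M are introduced here and assume 0 < p < 1 and var(z) > 0.

```latex
% Pure threshold model (sigma = 0): z = x for x > d, and z = a for x <= d.
% p, m, M are illustrative notation, not taken from the paper.
\begin{align*}
p &= \Pr(x \le d), \qquad m = E[x \mid x \le d], \qquad M = E[x \mid x > d],\\
x - z &= (x - a)\,\mathbf{1}(x \le d),\\
\operatorname{cov}(x, z) &= \operatorname{var}(z) + \operatorname{cov}(x - z,\, z),\\
\operatorname{cov}(x - z,\, z) &= E[(x - z)\,z] - E[x - z]\,E[z]\\
  &= a\,p\,(m - a) - p\,(m - a)\,\{p\,a + (1 - p)\,M\}\\
  &= -\,p\,(1 - p)\,(m - a)\,(M - a),\\
\lambda &= \frac{\operatorname{cov}(x, z)}{\operatorname{var}(z)}
  = 1 - \frac{p\,(1 - p)\,(m - a)\,(M - a)}{\operatorname{var}(z)}.
\end{align*}
```

Because M > d ≥ a whenever a does not exceed the threshold, the sign of λ − 1 is the sign of a − m: assigning a below E[x | x ≤ d] gives λ < 1, assigning a = E[x | x ≤ d] gives λ = 1, and assigning a above it gives λ > 1.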
Figure 3 pertains to the case in which the population distribution of exposure conforms to the gamma distribution.

Threshold model with error

Next, we examine the threshold model with error. We examine three examples of recording practices commonly encountered in epidemiologic studies. We first consider the situation in which below-threshold exposures are set equal to zero (a = 0). Next we consider the situation in which below-threshold exposures are set equal to one half of the threshold limit (a = d/2). Lastly, we consider the situation in which

below-threshold exposures are set equal to the threshold limit (a = d).

FIGURE 2. Estimated coefficients of bias for a linear exposure-response association under the pure threshold model, in which there is no random measurement error (σ = 0). Exposures below the threshold limit, d = 1 (triangles) and d = 2 (circles), are set equal to a, as in equation 7. The population distribution of exposure follows a lognormal (0,1) distribution. The two stars indicate the mean value for x ≤ d, which is 0.52d when d = 1 and 0.41d when d = 2.

FIGURE 3. Estimated coefficients of bias for a linear exposure-response association under the pure threshold model, in which there is no random measurement error (σ = 0). Exposures below the threshold limit, d = 1 (triangles) and d = 2 (circles), are set equal to a, as in equation 7. The population distribution of exposure follows a gamma (1,1) distribution. The two stars indicate the mean value for x ≤ d, which is 0.42d when d = 1 and 0.34d when d = 2.

We begin with the situation in which the population distribution of exposure conforms to the lognormal distribution. In the absence of random measurement error, assigning a value of zero to exposure measurements below a threshold limit attenuates exposure-response associations (see above). Figure 4 illustrates the situation in which random measurement error occurs and a value of zero is assigned to exposure measurements below a threshold limit, d = 1 or d = 2. In this situation, the degree of attenuation increases as the standard deviation of the random error, σ, increases; and the degree of attenuation is larger when the threshold limit is equal to two units than when the threshold limit is equal to one unit (figure 4). Notably, one can see that in comparison with the classical model of exposure measurement error (depicted by the solid line), the effect of random measurement error differs in the case where there is a threshold limit (depicted by the dashed lines). At small values of σ, the coefficient of bias is closer to 1 under the classical model of measurement error (in which there is not a threshold limit) than under the threshold model with error (figure 4). In contrast, at large values of σ, the coefficient of bias is closer to 1 under the threshold model with error than under the classical model of measurement error (figure 4). This is because the decline in the magnitude of the coefficient, λ, with increasing values of σ is consistently greater in the case where there is no threshold limit (the classical model of exposure measurement error) than in the nonstandard case where there is a threshold limit.

FIGURE 4. Values of the coefficient of bias, λ, according to the standard deviation of the measurement error term and specified assumptions about the magnitude of the lower threshold limit, d. Exposures below the threshold limit, d = 1 (triangles) and d = 2 (circles), are set equal to zero (see equation 7). The solid line shows values for the case in which there is no lower threshold limit. The population distribution of exposure follows a lognormal (0,1) distribution. Values greater than 1 indicate that the estimated association between the disease and the surrogate variable was greater than the association between the disease and true exposure.

Next we examine the situation in which below-threshold values are assigned a value of one half of the threshold limit. Figure 5 illustrates the coefficient of bias for exposure-response associations in relation to the standard deviation of the random error, σ, and threshold limits, d = 1 and d = 2 (assuming that the population distribution of exposure is lognormal). When the standard deviation of the random error, σ, is close to zero, assigning a value of d/2 to below-threshold measurement results leads to minimal bias in estimates of exposure-disease associations, while at larger values of σ, attenuation is observed (figure 5). However, compared with the classical model of exposure measurement error (depicted by the solid line), there tends to be less bias in estimates of association when there is a threshold limit and below-threshold measurements are assigned a value equal to d/2; this is particularly true at large values of σ (figure 5).

FIGURE 5. Values of the coefficient of bias, λ, according to the standard deviation of the measurement error term and specified assumptions about the magnitude of the lower threshold limit, d. Exposures below the threshold limit, d = 1 (triangles) and d = 2 (circles), are set equal to d/2 (see equation 7). The solid line shows values for the case in which there is no lower threshold limit. The population distribution of exposure follows a lognormal (0,1) distribution. Values greater than 1 indicate that the estimated association between the disease and the surrogate variable was greater than the association between the disease and true exposure.

FIGURE 6. Values of the coefficient of bias, λ, according to the standard deviation of the measurement error term and specified assumptions about the magnitude of the lower threshold limit, d. Exposures below the threshold limit, d = 1 (triangles) and d = 2 (circles), are set equal to the threshold limit (see equation 7). The solid line shows values for the case in which there is no lower threshold limit. The population distribution of exposure follows a lognormal (0,1) distribution. Values greater than 1 indicate that the estimated association between the disease and the surrogate variable was greater than the association between the disease and true exposure.

We also examined the situation in which the threshold limit, d, is assigned to below-threshold measurements.
When the standard deviation of the random error, σ, is close to zero, assigning a value of d to below-threshold measurement results leads to inflation of estimates of association. When there is greater random measurement error, the coefficient of bias declines in magnitude (figure 6). However, the decline in the magnitude of the attenuation coefficient with increasing values of σ is greater for the classical model of measurement error (depicted by the solid line) than in the case where there is a lower limit for recorded exposures. Again, the degree of attenuation that occurs because of random measurement error is conditional upon whether or not there is a recording threshold limit.

Figures 7–9 examine the situation in which the population distribution of exposure conforms to the gamma distribution, as opposed to the lognormal distribution assumed in figures 4–6. When a value of zero is assigned to exposure measurements below a threshold limit, exposure-response associations are attenuated, and the degree of attenuation increases

as the standard deviation of the random error, σ, increases (figure 7). Compared with the situation in which the population distribution conforms to the lognormal distribution, the degree of attenuation is greater when the population distribution conforms to the gamma distribution (figure 7). Assigning a value of one half of the threshold leads to inflation of exposure-response associations for low values of σ and to attenuation at higher values of σ (figure 8). When the threshold limit value, d, is assigned to below-threshold measurements, inflation is observed at low values of σ. However, the decline in the coefficient of bias with increasing values of σ is relatively large; consequently, while estimates of association were inflated at low values of σ, estimates were attenuated at larger values of σ (figure 9).

FIGURE 7. Values of the coefficient of bias, λ, according to the standard deviation of the measurement error term and specified assumptions about the lower threshold limit, d. Exposures below the threshold limit, d = 1 (triangles) and d = 2 (circles), are set equal to zero (see equation 7). The solid line shows values for the case in which there is no lower threshold limit. The population distribution of exposure follows a gamma (1,1) distribution. Values greater than 1 indicate that the estimated association between the disease and the surrogate variable was greater than the association between the disease and true exposure.

FIGURE 8. Values of the coefficient of bias, λ, according to the standard deviation of the measurement error term and specified assumptions about the lower threshold limit, d. Exposures below the threshold limit, d = 1 (triangles) and d = 2 (circles), are set equal to d/2 (see equation 7). The solid line shows values for the case in which there is no lower threshold limit. The population distribution of exposure follows a gamma (1,1) distribution. Values greater than 1 indicate that the estimated association between the disease and the surrogate variable was greater than the association between the disease and true exposure.

FIGURE 9. Values of the coefficient of bias, λ, according to the standard deviation of the measurement error term and specified assumptions about the lower threshold limit, d. Exposures below the threshold limit, d = 1 (triangles) and d = 2 (circles), are set equal to the threshold limit (see equation 7). The solid line shows values for the case in which there is no lower threshold limit. The population distribution of exposure follows a gamma (1,1) distribution. Values greater than 1 indicate that the estimated association between the disease and the surrogate variable was greater than the association between the disease and true exposure.

DISCUSSION

A number of papers in the epidemiologic literature discuss the consequences of nondifferential randomly distributed measurement error (2, 12, 14). These discussions are useful for understanding the potential effects of measurement error on a study's reported results. However, in many settings of concern to epidemiologists, exposure measurement data are recorded with a minimal threshold value. Under these circumstances, measurement error may lead to different patterns of bias in risk estimates than those observed in situations where exposure data are unconstrained. In this paper, we investigated the effect of measurement error when the surrogate exposure was constrained by a lower threshold value. We have shown that the direction and magnitude of bias in estimated associations may vary depending on recording practices, the variance in the surrogate exposure variable due to measurement error, and the population distribution of the exposure.
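The dependence of λ on recording practice, σ, and the exposure distribution can be tabulated with a small sweep. The sketch below is illustrative only: it assumes normal errors, d = 1, a grid of σ values chosen here, and 50,000 simulated subjects, and the names `sweep`, `draw_lognormal`, and `draw_gamma` are hypothetical. The same x and the same standardized errors are reused across rows so that columns are directly comparable.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


def sweep(draw_x, d=1.0, sigmas=(0.0, 0.5, 1.0, 2.0), n=50_000, seed=7):
    """Estimate lambda (equation 6) without a threshold and for a = 0, d/2, d."""
    rng = random.Random(seed)
    x = [draw_x(rng) for _ in range(n)]
    unit = [rng.gauss(0.0, 1.0) for _ in range(n)]      # standardized errors, rescaled below
    rows = []
    for s in sigmas:
        w = [xi + s * e for xi, e in zip(x, unit)]      # classical surrogate, equation 2
        row = {"sigma": s, "no threshold": cov(x, w) / cov(w, w)}
        for label, a in (("a=0", 0.0), ("a=d/2", d / 2.0), ("a=d", d)):
            z = [wi if wi > d else a for wi in w]       # thresholded surrogate, equation 7
            row[label] = cov(x, z) / cov(z, z)
        rows.append(row)
    return rows


def draw_lognormal(rng):
    return rng.lognormvariate(0.0, 1.0)


def draw_gamma(rng):
    return rng.gammavariate(1.0, 1.0)


for name, draw in (("lognormal (0,1)", draw_lognormal), ("gamma (1,1)", draw_gamma)):
    print(name)
    for row in sweep(draw):
        print("  " + "  ".join(f"{key}: {value:.3f}" for key, value in row.items()))
```

Each printed row corresponds to one value of σ; reading across a row compares the classical model with the three recording practices examined in the Results.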
While the assumptions underlying the simulation analyses in this paper were informed by empirical studies on the health effects of ionizing radiation, we have attempted to examine situations that are comparable to those commonly encountered by epidemiologists. In some of the examples, a large proportion of the study data had exposure values lower than the minimal threshold limit; this was particularly true when we assumed that the population distribution of the true exposure conformed to the gamma distribution. However, in studies of environmental and occupational exposures, highly skewed exposure distributions are commonly encountered; consequently, the conclusions drawn from these examples should be useful for considerations about bias in such settings (7, 15).

Measurement error was assumed to conform to a symmetrical normal distribution. In studies of ionizing radiation in which radiation doses are determined from film badge dosimetry, one source of exposure measurement error is the uncertainty that arises from laboratory processes (including film badge calibration, chemical processing of films, measurements of film optical densities, and comparison of the optical densities of badges to calibration films). In these settings, measurement error due to laboratory uncertainties may be assumed to conform to the normal distribution (16).

We focused on several situations that are illustrative of recording practices for below-threshold exposure measurements: situations in which below-threshold measurements are set equal to either zero, one half of the threshold limit, or the threshold limit. Other recording practices might be considered, such as assigning a value equal to the threshold limit divided by the square root of 2 to below-threshold measurements (17). The choice of an appropriate recording practice is often informed by knowledge or assumptions about the underlying distribution of true exposures (8). However, in regulatory settings, an upper value, such as the threshold limit, is sometimes recorded in order to ensure that exposures have not been underestimated. As figures 2 and 3 show, in the case of the pure threshold model, the closer the assigned value is to the expected value of x in the below-threshold range, the smaller is the degree of bias in the exposure-mortality association.
The special situation in which the assigned value, a, equals the expected value of x conditional on x being below the threshold limit, is an instance of Berkson error; the true exposure is distributed around the surrogate exposure with an average error equal to zero, producing no bias in estimates of exposure-response associations (14). Our primary interest in this paper, however, was in the more general case of the threshold model with error. We developed a formula for the coefficient of bias due to exposure measurement error for the case of the threshold model with error. When compared with the classical model of measurement error, the slope of the decline in the magnitude of the coefficient of bias with increasing measurement error is lower in the case where there is a threshold than in the classical case where it is assumed that there is no recording threshold. We have suggested that the threshold model with error provides a better description of the exposure measurement error encountered in many occupational and environmental studies than does the classical model of exposure measurement error. However, the threshold model with error is still a relatively simple model, and the results presented in this paper are best viewed as examples for understanding the potential effects of measurement error under simplified conditions. We emphasize that this paper explored simple patterns of measurement error, not the effects of the complex patterns of exposure measurement error that often appear in research settings. We focused on linear estimates of exposure-response associations (which are often examined in environmental and occupational epidemiology); however, patterns of measurement error that arise when there is a recording threshold may in some cases lead to departures from linearity in estimated exposure-response associations despite the presence of a true linear association. 
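As a worked illustration of the Berkson special case mentioned above, assume the gamma (1,1) exposure distribution used in the simulations (a unit exponential, f(x) = e^{-x}) and a threshold of d = 1; both are choices made for this example. The assigned value that leaves below-threshold recordings unbiased on average is then the conditional mean of x below the threshold:

```latex
E(x \mid x < d)
  = \frac{\int_0^d x\,e^{-x}\,dx}{\int_0^d e^{-x}\,dx}
  = \frac{1 - (1 + d)\,e^{-d}}{1 - e^{-d}},
\qquad
d = 1:\quad \frac{1 - 2e^{-1}}{1 - e^{-1}} \approx 0.418 .
```

Under these assumptions, the one-half rule (a = 0.5) lies closer to this value than either zero or the threshold limit itself, which is in line with the smaller bias reported for that recording practice in the pure threshold model.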
In addition, these analyses focused on the situation where there is a single exposure measurement. In settings of chronic exposure, in which a cumulative measure of exposure is derived from a series of measurements made on each individual, the problems of measurement error may be more complex. Furthermore, the assumption of nondifferential random variation in exposure estimates is often an inadequate description of the true extent of problems that measurement error entails. In epidemiologic studies, researchers may encounter complex patterns of measurement error, biased estimates of exposure, varying distributions of measurement error at different values of the true exposure, and even patterns of measurement error that are differential with regard to disease status. Patterns of measurement error that are differential with respect to disease status may occur in occupational settings, for example, if there is health-related selection of workers into jobs or areas where the exposure conditions lead to greater problems of measurement error. In addition, exposure information available for epidemiologic studies is often incomplete and may not reflect all relevant sources of exposure or periods of exposure. These are equally important considerations as sources of bias in estimates of exposure-disease associations. Investigative analyses, in which subcohorts are examined or assumptions about etiologically relevant exposures are varied, can often improve our understanding of the problems involved in measurement error. Despite these limitations, the observations in this paper contribute to a growing body of epidemiologic literature on the effects of exposure misclassification. Bias resulting from the inaccurate assignment of study subjects to exposure groups has been the subject of a great deal of discussion. Much of the early literature on this topic focused on studies in which exposure variables were categorized (18, 19). 
More recently, discussions have expanded to cover the effects of measurement error in continuous exposure variables. As this paper illustrates, conclusions about the effects of measurement error should be sensitive to data collection and recording processes that influence the distribution of recorded exposures. In epidemiologic studies, it is common to have exposure data that are constrained by a lower threshold limit. The direction and magnitude of bias resulting from measurement error will depend on the population distribution of the exposure, the variance due to measurement error, and the recording practices used for below-threshold measurements.

REFERENCES

1. Armstrong BG. Effect of measurement error on epidemiological studies of environmental and occupational exposures. Occup Environ Med 1998;55:651–6.
2. Thomas D, Stram D, Dwyer J. Exposure measurement error: influence on exposure-disease relationships and methods of correction. Annu Rev Public Health 1993;14:69–93.

3. Carroll RJ, Ruppert D, Stefanski LA. Measurement error in nonlinear models. London, United Kingdom: Chapman and Hall Ltd, 1995.
4. Willett W. An overview of issues related to the correction of non-differential exposure measurement error in epidemiologic studies. Stat Med 1989;8:1031–40.
5. Armstrong BG. The effects of measurement errors on relative risk regressions. Am J Epidemiol 1990;132:1176–84.
6. Zeger SL, Thomas D, Dominici F, et al. Exposure measurement error in time-series studies of air pollution: concepts and consequences. Environ Health Perspect 2000;108:419–26.
7. Strom DJ. Estimating individual and collective doses to groups with less than detectable doses: a method for use in epidemiologic studies. Health Phys 1986;51:437–45.
8. Helsel DR. Less than obvious: statistical treatment of data below the detection limit. Environ Sci Technol 1990;24:1767–74.
9. Nielson KK, Rogers VC. Statistical estimation of analytical data distributions and censored measurements. Anal Chem 1989;61:2719–24.
10. Watkins J, Cragle D, Frome E, et al. Collection, validation, and treatment of data for a mortality study of nuclear industry workers. Appl Occup Environ Hyg 1997;12:195–205.
11. Rosner B, Willett W, Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med 1989;8:1051–69.
12. Armstrong BK, White E, Saracci R. Principles of exposure measurement in epidemiology. Oxford, United Kingdom: Oxford University Press, 1992.
13. SAS Institute, Inc. SAS, version 8.01. Cary, NC: SAS Institute, Inc, 1999.
14. Cox DR, Darby SC, Reeves GK, et al. The effects of measurement errors with particular reference to a study of exposure to residential radon. In: Ron E, Hoffman FO, eds. Uncertainties in radiation dosimetry and their impact on dose-response analyses. Proceedings of a workshop at the National Cancer Institute, September 3–5, 1997.
Bethesda, MD: National Cancer Institute, 1999:139–51. (NIH publication no. 99-4541).
15. Waters MA, Selvin S, Rappaport SM. A measure of goodness-of-fit for the lognormal model applied to occupational exposures. Am Ind Hyg Assoc J 1991;52:493–502.
16. National Research Council, Committee on Film Badge Dosimetry in Atmospheric Nuclear Tests. Film badge dosimetry in atmospheric nuclear tests. Washington, DC: National Academy Press, 1989.
17. Hornung RW, Reed LD. Estimation of average concentration in the presence of nondetectable values. Appl Occup Environ Hyg 1990;5:46–51.
18. Dosemeci M, Wacholder S, Lubin JH. Does nondifferential misclassification of exposure always bias a true effect toward the null value? Am J Epidemiol 1990;132:746–8.
19. Weinberg CR, Umbach DM, Greenland S. When will nondifferential misclassification of an exposure preserve the direction of a trend? Am J Epidemiol 1994;140:565–71.

APPENDIX

The effect of random measurement error in the presence of a recording threshold can be examined directly by calculating values for the coefficient, λ, under specified model conditions. Equation 7 can be written

$$z = (x + \eta)\,I[x + \eta \ge d] + a\,I[x + \eta < d],$$

where I[logical expression] equals 1 if logical expression is true and 0 if it is false. In what follows, f(x) (F(x)) and g(η) (G(η)) shall denote the densities (cumulative distribution functions) of x and η, respectively.

Let us first consider the pure threshold model, var(η) = 0. Define, for k = 0, 1, 2,

$$M_{f,k}(d) = \int_0^d x^k f(x)\,dx,$$

and, for k = 1, 2,

$$\Delta_k(a,d) = a\,M_{f,k-1}(d) - M_{f,k}(d).$$

Notice in particular that $M_{f,0}(d) = F(d)$. Then a direct calculation yields

$$
\begin{aligned}
E(z) &= \mu_x + \Delta_1, \\
E(z^2) &= \mu_x^2 + \sigma_x^2 + a\Delta_1 + \Delta_2, \\
E(xz) &= \mu_x^2 + \sigma_x^2 + \Delta_2, \\
\operatorname{cov}(x,z) &= \sigma_x^2 + \Delta_2 - \mu_x\Delta_1, \\
\operatorname{var}(z) &= \sigma_x^2 + \Delta_2 + (a - 2\mu_x - \Delta_1)\,\Delta_1 .
\end{aligned}
$$

Therefore,

$$\lambda = \frac{\sigma_x^2 + \Delta_2(a,d) - \mu_x\,\Delta_1(a,d)}{\sigma_x^2 + \Delta_2(a,d) + \bigl(a - 2\mu_x - \Delta_1(a,d)\bigr)\,\Delta_1(a,d)}, \tag{A1}$$

from which it follows that λ ≥ 1 if and only if

$$\Delta_1(a,d)\,\bigl(\mu_x + \Delta_1(a,d) - a\bigr) \ge 0. \tag{A2}$$

In addition, the following proposition holds.

Proposition. For a = 0, λ < 1 and, for a = d, λ > 1. Furthermore, λ = 1 for a = E(x | x ≤ d).

We now give the formulas for the case var(η) ≠ 0. Define, for k = 0, 1, 2,

$$
\begin{aligned}
H_k(d) &= E\bigl(x^k\,G(d - x)\bigr), \\
L_k(d) &= E\bigl(x^{2-k}\,M_{g,k}(d - x)\bigr), \\
\Gamma_k(a,d) &= a\,H_{k-1}(d) - H_k(d).
\end{aligned}
$$

Then, proceeding as above, one obtains

$$
\begin{aligned}
E(z) &= \mu_x + \Gamma_1, \\
E(z^2) &= \mu_x^2 + \sigma_x^2 + \sigma_\eta^2 + a\Gamma_1 + \Gamma_2 - L_2(d) - 2L_1(d), \\
E(xz) &= \mu_x^2 + \sigma_x^2 - L_1(d) + a\Gamma_1 + \Gamma_2, \\
\operatorname{cov}(x,z) &= \sigma_x^2 - L_1(d) + \Gamma_2 + (a - \mu_x)\,\Gamma_1, \\
\operatorname{var}(z) &= \sigma_x^2 + \sigma_\eta^2 - L_2(d) - 2L_1(d) + \Gamma_2 + (a - 2\mu_x - \Gamma_1)\,\Gamma_1 .
\end{aligned}
$$

We finally obtain the expression for λ,

$$\lambda = \frac{\sigma_x^2 - L_1(d) + \Gamma_2(a,d) + (a - \mu_x)\,\Gamma_1(a,d)}{\sigma_x^2 + \sigma_\eta^2 - L_2(d) - 2L_1(d) + \Gamma_2(a,d) + \bigl(a - 2\mu_x - \Gamma_1(a,d)\bigr)\,\Gamma_1(a,d)}, \tag{A3}$$

which can be seen to be a generalization of equation A1, since, for η = 0, the L's become zero and the Γ's reduce to the Δ's. Notice also that the threshold model with error (equation A3) reduces to the formula for the familiar case of the classical model of measurement error when d = 0, since the Γ's and L's become zero.
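Equation A1 can be evaluated in closed form when the exposure density is simple. The sketch below assumes the gamma (1,1) exposure distribution used in the figures, that is, a unit exponential with mean and variance both equal to 1, for which the partial moments M_{f,k}(d) have elementary expressions. It covers only the pure threshold model (var(η) = 0), and the function names are hypothetical.

```python
import math


def partial_moments_exp(d):
    """M_{f,k}(d) = integral from 0 to d of x^k e^{-x} dx, for k = 0, 1, 2."""
    e = math.exp(-d)
    m0 = 1.0 - e
    m1 = 1.0 - (1.0 + d) * e
    m2 = 2.0 - (d * d + 2.0 * d + 2.0) * e
    return m0, m1, m2


def lambda_pure_threshold(d, a, mu_x=1.0, var_x=1.0):
    """Coefficient of bias from equation A1 for a unit-exponential exposure.

    mu_x and var_x default to the mean and variance of the unit exponential;
    they must match the density used in partial_moments_exp.
    """
    m0, m1, m2 = partial_moments_exp(d)
    delta1 = a * m0 - m1  # Delta_1(a, d)
    delta2 = a * m1 - m2  # Delta_2(a, d)
    cov_xz = var_x + delta2 - mu_x * delta1
    var_z = var_x + delta2 + (a - 2.0 * mu_x - delta1) * delta1
    return cov_xz / var_z


def conditional_mean_below(d):
    """E(x | x < d) for the unit exponential, i.e. M_{f,1}(d) / M_{f,0}(d)."""
    m0, m1, _ = partial_moments_exp(d)
    return m1 / m0


if __name__ == "__main__":
    d = 1.0
    for label, a in (("zero", 0.0), ("d/2", d / 2), ("d", d),
                     ("E(x | x < d)", conditional_mean_below(d))):
        print(f"a = {label:>12}: lambda = {lambda_pure_threshold(d, a):.4f}")
```

Under these assumptions the three cases of the proposition come out as stated for d = 1: λ is about 0.85 for a = 0, about 1.23 for a = d, and equal to 1 when a is the below-threshold conditional mean.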