Bias in the estimation of exposure effects with individual- or group-based exposure assessment


Journal of Exposure Science and Environmental Epidemiology (2011) 21, 212-221
© 2011 Nature America, Inc. All rights reserved

HYANG-MI KIM (a,e), DAVID RICHARDSON (b), DANA LOOMIS (c), MARTIE VAN TONGEREN (d) AND IGOR BURSTYN (e)

(a) Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
(b) Department of Epidemiology, University of North Carolina, Chapel Hill, USA
(c) Department of Environmental and Occupational Health, University of Nevada, Nevada, USA
(d) Institute of Occupational Medicine, Riccarton, UK
(e) Department of Medicine, University of Alberta, Edmonton, Alberta, Canada

In this paper, we develop models of bias in estimates of exposure-disease associations for epidemiological studies that use group- and individual-based exposure assessments. In a study that uses a group-based exposure assessment, individuals are grouped according to shared attributes, such as job title or work area, and assigned an exposure score, usually the mean of some concentration measurements made on samples drawn from the group. We considered bias in the estimation of exposure effects in the context of both linear and logistic regression disease models, and the classical measurement error in the exposure model. To understand group-based exposure assessment, we introduced a quasi-Berkson error structure that can be justified with a moderate number of exposure measurements from each group. In the quasi-Berkson error structure, the true value is equal to the observed one plus error, and the error is not independent of the observed value. The bias in estimates with individual-based assessment depends on all variance components in the exposure model and is smaller when the between-group and between-subject variances are large. In group-based exposure assessment, group means can be assumed to be either fixed or random effects.
Regardless of this assumption, the behavior of estimates is similar: the estimates of regression coefficients were less attenuated with a large sample size used to estimate group means, when between-subject variability was small and the spread between group means was large. However, if groups are considered to be random effects, bias is present, even with a large number of measurements from each group. This does not occur when group effects are treated as fixed. We illustrate these models in analyses of the associations between exposure to magnetic fields and cancer mortality among electric utility workers and respiratory symptoms due to carbon black.

Journal of Exposure Science and Environmental Epidemiology (2011) 21, 212-221; doi: /jes; published online 4 February 2010

Keywords: quasi-Berkson type error structure, non-differential measurement error, bias, mixed exposure model, homogeneous error.

Introduction

In epidemiological cohort studies of occupational and environmental exposures, individual exposure measurements are often not available for all members of the study, whereas health outcome measures are obtained for each individual. In such settings, a commonly employed approach is to derive exposure estimates through a group-based strategy (Loomis and Kromhout, 2004), also known as a semi-individual or semi-ecological study design. Individuals are grouped according to shared attributes, such as job title or work area, and assigned an exposure score, usually the mean of some concentration measurements made on samples drawn from the group.

1. Address all correspondence to: Dr. Hyang-Mi Kim, Department of Mathematics and Statistics, The University of Calgary, 2500 University Drive N.W., Calgary, AB, Canada, T2N 1N4.
E-mail: hmkim@ucalgary.ca
Received 4 August 2009; accepted 30 November 2009; published online 4 February 2010

Interestingly, in some settings, the use of a group-based strategy for assigning exposure scores can result in a less biased estimate of an exposure-disease association than would be achieved through individual exposure measurements. It is well known that non-differential measurement errors in individual exposure estimates may lead to bias in estimates of exposure-response associations; a group-based strategy can minimize this attenuation bias by creating an error structure that has some properties of a Berkson-type error. The Berkson error model was originally proposed for experimental situations, in which the investigator attempted to set the exposure at a target value, but because of imprecision of instrumentation, its true value was randomly distributed around the target (Berkson, 1950). If the experiment was replicated many times with the same target value, the true value would be randomly distributed with an estimated mean approaching the target value: the errors would be independent of the target value. Kim et al. (2006) showed that the group-based strategy leads to an approximate Berkson measurement error structure when data are available for a large number of subjects in each group. It is approximate in the sense that assigned group means may not be independent of error. To account for this complexity, we formally introduce a novel quasi-Berkson error model in this paper.

When members of a cohort are grouped, mixed-effects models are often fitted to exposure data, as these models allow an analyst to treat the group as either a fixed or a random effect. A question arises as to whether estimation using random group effects (RGE) produces exposure-response results that are consistent with those obtained using fixed group effect (FGE) modeling. In some settings, the rationale for treating exposure groups as fixed is clear. An occupational cohort study that makes use of a previously published job-exposure matrix implies an exposure assessment in which groups are fixed. If such an assumption is made, then conclusions can be drawn only about the exposure-response association in the occupational groups that were investigated. This may well be desirable in narrowly targeted studies of uncommon exposures (that only occur in the studied workplaces) or in investigations undertaken by one company's health and safety department (wherein the goal is simply to understand health risk to employees of a given enterprise). In contrast, in an occupational cohort study in which the investigator wishes to draw conclusions about exposure response not only among a fixed set of studied occupational groups but also in all possible occupational groups, it is more natural to assume that groups are created through a random draw of all possible groupings. This desire to generalize findings beyond, say, one occupation in a given factory to all similar jobs in a specific industry requires us to assume that the observed groups provide information about the characteristics of all possible exposure groups.
This assumption enables an investigator to estimate the variation in exposure between groups (Goldstein, 2003).

The following assumptions were made for the purposes of our study: a normal exposure distribution (a log transformation for log-normally distributed exposures was applied in the examples), known constant error variance components, no systematic error, non-differential error and no correlation among errors. We also focused on the scenario in which the disease under study was neither common nor extremely rare. Throughout this paper, we define exposure as intensity or concentration of a substance, ignoring complications that arise from time-varying exposure patterns and accumulation of dose due to long-term exposure.

Our first aim is to examine, from a theoretical perspective, the use of fixed and random group-based strategies for assigning exposure scores in an epidemiological cohort. Next, we illustrate how a researcher may obtain valid estimates of exposure-disease associations through linear or logistic regression methods, even when exposure measurements for all subjects are not available, as long as an adequate sample of measured values for each group is drawn and the between-group variability is large. The impact of different grouping schemes on parameter estimation is illustrated in two examples: (a) occupational exposure to magnetic fields among workers with any cancer (Kromhout et al., 1995; Savitz and Loomis, 1995) and (b) respiratory health of employees in the European carbon black manufacturing industry in relation to exposure to carbon black dust (van Tongeren et al., 1997). In section 2, we present the bias equations for individual- and group-based assessments for both RGE and FGE models. Findings derived from the simulation studies are described in section 3. In section 4, we provide two examples, and the findings are discussed in section 5.
Theoretical Study

Theoretical studies were carried out by assuming an additive measurement error model together with linear and logistic response models. For both individual- and group-based strategies, the conditional mean of the linear response model, given the observed exposure (Harville, 1977), was used to obtain the attenuation factor for the regression coefficient in the linear model. For the logistic model, the expressions for the conditional mean and variance of the true exposure, given the observed values, were derived first, and the expression for the attenuation in the response model was then found.

For the group-based strategy, we considered two exposure models: the RGE exposure model, in which the group is regarded as a random component, and the FGE exposure model, in which the group is a fixed component. For both exposure models, the Berkson error was induced from a classical error structure under the assumptions that (1) the number of measurements in each group is sufficiently large to estimate the true group means closely and (2) the group means are not correlated with the measurement error in the Berkson error model. As this approximation of the Berkson error depends on the sample size and the covariance between the group mean and measurement error, we call this a quasi-Berkson error model.

In the presence of fixed unknown group means, we assume that the group means are fixed with different between-group variabilities ($\omega^2$) for each distinct grouping scheme. However, in deriving the attenuation equation for logistic regression models, the assumption of normality of the exposure distribution is required, and the assumption fails when the between-group exposure variability ($\omega^2$) is large for the FGE model. In such a situation, the exposures follow a mixture of normal distributions with the number of components equal to the number of groups. Therefore, an expression for attenuation cannot be easily derived, and in this paper, we do not explore the theoretical behavior of the regression coefficient when $\omega^2$ is large.

Models

We postulate a classical exposure measurement error model. For the setting of RGE, the measurement error model is

$$W_{gi} = \mu + \nu_g + \gamma_{gi} + \eta_{gi} = X_{gi} + \eta_{gi} \qquad (1)$$

and for FGE, the measurement error model is

$$W_{gi} = \mu_g + \gamma_{gi} + \eta_{gi} = X_{gi} + \eta_{gi} \qquad (2)$$

where $W_{gi}$ represents the observed exposure on the $i$th subject from the $g$th group and $X_{gi}$ represents the true exposure of the subject; $\mu$ is the common true mean; $\mu_g$ is the fixed group mean, $g = 1, \ldots, G$; $\nu_g \sim N(0, \sigma_g^2)$ is a random effect due to group $g$, $g = 1, \ldots, G$; $\gamma_{gi} \sim N(0, \sigma_b^2)$ is a random effect due to subject $i$ in each group, $i = 1, \ldots, N$; $\eta_{gi} \sim N(0, \sigma_\eta^2)$ is a random effect due to measurement error and daily fluctuations in exposure that may arise in occupational settings from variability of daily tasks (e.g., day-to-day variability for a full-shift measurement, $W_{gi}$); and the errors are mutually independent.

For the association between exposure and response, we consider the linear and logistic regression models given, respectively, by

$$Y_{gi} = \beta_0 + \beta_1 X_{gi} + \varepsilon_{gi}$$

where $\beta_0$ and $\beta_1$ are the intercept and the slope parameters, respectively, and $\varepsilon_{gi} \sim N(0, \sigma_\varepsilon^2)$, and

$$P(Z_{gi} = 1 \mid X_{gi}) = \Lambda(\beta_0 + \beta_1 X_{gi})$$

where $Z_{gi}$ is a binary variable for the health outcome and $\Lambda(t) = 1/(1 + \exp(-t))$.

The conditional expectation of the response $Y_{gi}$ given the observed values, $W_i$ ($W_{gi}$ for individual values, $\bar{W}_g$ for the group mean), for the linear models is

$$E[Y_{gi} \mid W_i] = \beta_0 + \beta_1 E[X_{gi} \mid W_i] \qquad (3)$$

and for logistic regression models is

$$E[P(Z_{gi} = 1 \mid X_{gi}) \mid W_i] \approx E[\Phi[c(\beta_0 + \beta_1 X_{gi})] \mid W_i] = \Lambda(\beta'_0 + \beta'_1 E(X_{gi} \mid W_i)) \qquad (4)$$

where $c = 0.588$, $\beta'_0$ and $\beta'_1$ are functions of $V(X_{gi} \mid W_i)$ obtained by using the approximation to the probit regression model (Reeves et al., 1998), and $\Phi(t)$ is the cumulative distribution function of the standard normal distribution. By obtaining $E(X_{gi} \mid W_i)$ and $V(X_{gi} \mid W_i)$ as functions of $W_i$, the bias factor can be formulated (Burr, 1988; Wang et al., 1998; Carroll et al., 2006).
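The two exposure models can be made concrete with a short simulation. The following is a minimal sketch in Python (standard library only), not the authors' R code: it draws true exposures X and observed exposures W under the random-group form of Eq. (1) and the fixed-group form of Eq. (2). The function names, the seed and the parameter values in the example call are illustrative assumptions.

```python
import random
import statistics


def simulate_rge(mu, sd_g, sd_b, sd_eta, n_groups, n_subjects, rng):
    """Draw (group, X, W) rows from Eq. (1): W = mu + nu_g + gamma_gi + eta_gi."""
    rows = []
    for g in range(n_groups):
        nu_g = rng.gauss(0.0, sd_g)  # random group effect, one draw per group
        for _ in range(n_subjects):
            x = mu + nu_g + rng.gauss(0.0, sd_b)  # true exposure X_gi
            w = x + rng.gauss(0.0, sd_eta)        # observed exposure W_gi
            rows.append((g, x, w))
    return rows


def simulate_fge(group_means, sd_b, sd_eta, n_subjects, rng):
    """Draw (group, X, W) rows from Eq. (2): W = mu_g + gamma_gi + eta_gi."""
    rows = []
    for g, mu_g in enumerate(group_means):  # group means are fixed constants
        for _ in range(n_subjects):
            x = mu_g + rng.gauss(0.0, sd_b)
            w = x + rng.gauss(0.0, sd_eta)
            rows.append((g, x, w))
    return rows


# Illustrative call: five fixed groups, 500 subjects each (assumed values).
rng = random.Random(2011)
rows = simulate_fge([0.1, 0.6, 1.1, 1.6, 2.1], sd_b=0.7, sd_eta=1.0,
                    n_subjects=500, rng=rng)
errors = [w - x for _, x, w in rows]
print(len(rows), round(statistics.stdev(errors), 2))
```

In both functions the classical error is added to the true exposure, so W - X recovers the measurement error term, whose standard deviation should be close to the value passed as `sd_eta`.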
Bias

In this section, the conditional expectation and variance are calculated for both the RGE (Eq. (1)) and FGE (Eq. (2)) models and used to derive the bias factors for both linear and logistic regression models.

Individual-Based Strategy

With the RGE model, $E(X_{gi} \mid W_{gi}) = \mu(1 - \lambda^*) + \lambda^* W_{gi}$, where $\lambda^* = (\sigma_g^2 + \sigma_b^2)/(\sigma_g^2 + \sigma_b^2 + \sigma_\eta^2)$, and the conditional variance is given by $V(X_{gi} \mid W_{gi}) = V(X_{gi})(1 - \lambda^*) = \sigma_\eta^2 \lambda^*$. With the FGE model, $E(X_{gi} \mid W_{gi}) = \mu(1 - \lambda') + \lambda' W_{gi}$ and $V(X_{gi} \mid W_{gi}) = \lambda'^2 \sigma_\eta^2 + (1 - \lambda')^2 \sigma_b^2$, where $\lambda' = (\omega^2 + \sigma_b^2)/(\omega^2 + \sigma_b^2 + \sigma_\eta^2)$, the between-group variability is defined as $\omega^2 = \sum_g (\mu_g - \mu)^2 / G$ and $\mu = \sum_g \mu_g / G$.

On the basis of Eqs (3) and (4), we obtained approximate equations that describe the relationship between the true regression coefficient, $\beta_1$, and the observed regression coefficient, $\beta_1^*$, with the observed exposures, $W_i$. In the linear regression, $\beta_1^* = \lambda^* \beta_1$ and $\beta_1^* = \lambda' \beta_1$ for the RGE and FGE models, respectively. In the logistic regression context with an RGE model,

$$\beta_1^* = \frac{\lambda^* \beta_1}{\sqrt{c^2 \beta_1^2 \sigma_\eta^2 \lambda^* + 1}} \qquad (5)$$

and with the FGE model, when the between-group variability is small so that the exposures are approximately normally distributed,

$$\beta_1^* = \frac{\lambda' \beta_1}{\sqrt{c^2 \beta_1^2 [\lambda'^2 \sigma_\eta^2 + (1 - \lambda')^2 \sigma_b^2] + 1}} \qquad (6)$$

The attenuation factors, $\lambda^* = (\sigma_g^2 + \sigma_b^2)/(\sigma_g^2 + \sigma_b^2 + \sigma_\eta^2)$ and $\lambda' = (\omega^2 + \sigma_b^2)/(\omega^2 + \sigma_b^2 + \sigma_\eta^2)$, for the linear and logistic models (Eqs (5) and (6)) go to $\lambda = \sigma_b^2/(\sigma_b^2 + \sigma_\eta^2)$ as the between-group variability decreases. There is attenuation because $\lambda \le \lambda^* (\lambda') \le 1$. There is less attenuation when the between-subject variability increases for a model with fixed between-group variability: $(\sigma_g^2 (\omega^2) + \sigma_{b,1}^2)/(\sigma_g^2 (\omega^2) + \sigma_{b,1}^2 + \sigma_\eta^2) \le (\sigma_g^2 (\omega^2) + \sigma_{b,2}^2)/(\sigma_g^2 (\omega^2) + \sigma_{b,2}^2 + \sigma_\eta^2) \le 1$ if $\sigma_{b,1}^2 < \sigma_{b,2}^2$.
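To make the individual-based attenuation concrete, the sketch below evaluates the attenuation factor and the resulting observed slopes for the linear and logistic models. It assumes the forms read from this section: the observed linear slope is the attenuation factor times the true slope, and the logistic slope is further divided by sqrt(c^2 * beta1^2 * V(X|W) + 1). The function names and example values are illustrative assumptions, not part of the paper.

```python
import math

C = 0.588  # constant c of the logistic-probit approximation in Eq. (4)


def attenuation_factor(var_group, var_subject, var_error):
    """Ratio of true-exposure variance to observed-exposure variance."""
    signal = var_group + var_subject
    return signal / (signal + var_error)


def observed_slope_linear(beta1, lam):
    """Observed slope in the linear model: lambda * beta1."""
    return lam * beta1


def observed_slope_logistic(beta1, lam, cond_var, c=C):
    """Observed slope in the logistic model; cond_var stands for V(X | W)."""
    return lam * beta1 / math.sqrt(c ** 2 * beta1 ** 2 * cond_var + 1.0)


# RGE example with sigma_g = 0.3, sigma_eta = 1 and two between-subject variances.
var_eta = 1.0
for var_b in (0.49, 2.0):
    lam = attenuation_factor(0.3 ** 2, var_b, var_eta)
    cond_var = var_eta * lam  # V(X | W) under the RGE model
    print(var_b,
          round(observed_slope_linear(0.3, lam), 3),
          round(observed_slope_logistic(0.3, lam, cond_var), 3))
```

For the FGE model the same functions apply, with the between-group variance of the fixed means passed as `var_group` and the conditional variance computed as lambda'^2 * var_eta + (1 - lambda')^2 * var_b.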
In addition, when the measurement error variance increases, the attenuation increases.

Group-Based Strategy

In the group-based strategy, an average ($\bar{W}_g$) of the observed measurements for a group $g$ is taken to apply to all subjects in the group (e.g., from the same job title): $\bar{W}_g = \sum_i W_{gi} / n$, where $n$ is the number of sampled subjects from a group of total size $N$. For each subject, this group mean $\bar{W}_g$ is an approximation of his/her true exposure ($X_{gi}$), if the number of measurements is reasonably large. The conditional expectation of the true exposure given the observed group mean in this case is $E[X_{gi} \mid \bar{W}_g] = \bar{W}_g + (m - 1)(\bar{W}_g - \mu_g)$, where $m = \mathrm{cov}(X_{gi}, \bar{W}_g)/\mathrm{var}(\bar{W}_g)$. The derivation is made under the assumption of a classical measurement error model. If the number of subjects in each group is sufficient for the true mean and the estimated group mean to be close in value, that is, $\bar{W}_g \approx E[X_{gi}]$, then we have $E[X_{gi} \mid \bar{W}_g] \approx \bar{W}_g$. By showing this approximate property ($E[X_{gi} \mid \bar{W}_g] = \bar{W}_g$), we postulate a quasi-Berkson error model with the assigned group mean ($\bar{W}_g$) and true exposures ($X_{gi}$), that is,

$$X_{gi} = \bar{W}_g + e_{gi}; \quad E[e_{gi} \mid \bar{W}_g] = 0 \qquad (7)$$

This approximation may depend on the sample selected, but we need only a moderately large sample size to obtain this property, wherein the true exposure of each worker randomly varies about the group mean, $\bar{W}_g$, and this mean is approximately the true group mean. This situation is analogous to the Berkson error model, in which the true exposure, given the observed exposure, has an expected value equal to the observed exposure. However, the model is not a true Berkson error model unless the group mean ($\bar{W}_g$) is independent of the error ($e_{gi}$) (Kim et al., 2006).

With RGE, it is necessary to consider the possibility of correlation between $\bar{W}_g$ and $e_{gi}$, as the group means are random components that may be correlated with the error ($e_{gi}$), which in turn enters the model error ($\varepsilon^*_{gi}$) of the response model. The Berkson error structure will be approximated if this covariance is small. The covariance can be either positive or negative and is a function of $\sigma_g^2$: $\mathrm{cov}(\bar{W}_g, e_{gi}) = \xi(\sigma_g^2)$. With FGE, one can derive that $\mathrm{cov}(\bar{W}_g, e_{gi}) = 0$ and $\mathrm{cov}(X_{gi}, e_{gi}) = V(X_{gi} - \bar{W}_g) \ne 0$, as $\bar{W}_g$ can be considered a constant when the number of measurements for the group mean is large. The model, however, is not a true Berkson error model, because $\mathrm{cov}(\bar{W}_g, e_{gi}) = 0$ does not imply that the observed value and the error are independent, which is required for the Berkson error model.

Applying the RGE model in the linear regression model (Eq. (3)) leads from non-differential error to differential error if covariance exists between $\bar{W}_g$ and $e_{gi}$. As $X_{gi} = \bar{W}_g + e_{gi}$, the linear model is expressed as $Y_{gi} = \beta_0 + \beta_1 \bar{W}_g + \varepsilon^*_{gi}$, where $\varepsilon^*_{gi} = \beta_1 e_{gi} + \varepsilon_{gi}$. Thus, $\mathrm{cov}(\varepsilon^*_{gi}, \bar{W}_g) = \mathrm{cov}(\beta_1 e_{gi} + \varepsilon_{gi}, \bar{W}_g) \approx \beta_1 \mathrm{cov}(e_{gi}, \bar{W}_g)$, that is, the model error ($\varepsilon^*_{gi}$) is correlated with the covariate ($\bar{W}_g$). With RGE, the quasi-Berkson error structure (Eq. (7)) and non-zero covariance,

$$\beta_1^* = \beta_1 + \frac{\mathrm{cov}(\bar{W}_g, \varepsilon^*_{gi})}{\sigma_g^2} \qquad (8)$$

whereas with FGE,

$$\beta_1^* \approx \beta_1 \qquad (9)$$

In a logistic regression analysis (Eq. (4)), it is necessary to find a relationship among the variances in Eqs. (1) and (2) under the quasi-Berkson error model to derive the amount of attenuation in the response models. Using the RGE model, the error variance, $\sigma_e^2$, is obtained as

$$\sigma_e^2 = V(e_{gi}) = \sigma_b^2 - 2\,\mathrm{cov}(\bar{W}_g, e_{gi}) \ge 0$$

where $\mathrm{cov}(\bar{W}_g, e_{gi}) \le 0.5\,\sigma_b^2$. This equation implies that the bias with the RGE model depends on the between-subject variance, as well as the covariance between the group mean and the measurement error of the quasi-Berkson error structure:

$$\beta_1^* = \frac{\beta_1}{\sqrt{c^2 \beta_1^2 [\sigma_b^2 - 2\,\mathrm{cov}(e_{gi}, \bar{W}_g)] + 1}} \qquad (10)$$

Using the FGE model, and a property of the Berkson error structure when the between-group variance is small, the error variance is obtained as

$$\sigma_e^2 = V(e_{gi}) = V(X_{gi} - \bar{W}_g) = V(X_{gi}) + V(\bar{W}_g) - 2\,\mathrm{cov}(X_{gi}, \bar{W}_g) = V(X_{gi}) \approx \sigma_b^2$$

This equation implies that the bias with the FGE model depends on the between-subject variance when the number of sampled subjects ($n$) is sufficiently large:

$$\beta_1^* = \frac{\beta_1}{\sqrt{c^2 \beta_1^2 \sigma_b^2 + 1}} \qquad (11)$$

When grouping is used to assess exposure, the measurement error variances vanish as the sample size increases for the FGE model. However, for the RGE model, the correlation between the measurement error ($e_{gi}$) and the group mean leads to bias. That is, the group-based strategy reduces the effect of measurement error on the regression coefficient estimation for FGE, but not for RGE. From Eqs (8) and (10), the attenuation in both the linear and logistic models depends on the between-group and between-subject variability with the RGE model.
As the between-subject variability increases, the attenuation increases, whereas when the between-group variability increases, the bias decreases. For the FGE model (Eq. (11)), the derived expression for bias does not include the between-group variance component. One reason for this is that we consider only small group variability, so that the exposures are approximately normally distributed. However, the simulation study (below) shows that the bias decreases as the groups move further apart, just as with the RGE model.

Sample Size in Group-based Exposure Assessment

The extent to which this quasi-Berkson error model fits the data depends on (a) the variance of the group mean and (b) the covariance between the group mean and measurement error. It can be shown that the variance of $\bar{W}_g$ approaches zero as the sample size, $n$, increases. For each group, the variance of group means can be expressed based on the sample size using a binary variable, which indicates that the observation is in the sample with probability $n/N$:

$$V(\bar{W}_g) = \frac{1}{n}\left\{(\sigma_g^2 + \sigma_b^2 + \sigma_\eta^2) + \left(1 - \frac{n}{N}\right)\mu^2\right\} + \left(1 - \frac{1}{N}\right)\sigma_g^2$$

for RGE. The variance of the group mean depends on the between-group variance, regardless of the sample size. This variability together with the covariance affects the parameter estimation in the response models (Eqs (8) and (10)). Figure 1 shows how the variance changes as the sample size increases. For the FGE model, the variance of group means,

$$V(\bar{W}_g) = \frac{1}{n}\left\{(\sigma_b^2 + \sigma_\eta^2) + \left(1 - \frac{n}{N}\right)\mu_g^2\right\},$$

starts to approach zero with a relatively small number of exposure measurements drawn from each group. This condition leads to the quasi-Berkson model being a good approximation of the Berkson error model, so that the bias in the slope parameter is negligible.
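The contrast between the two variance expressions can be checked numerically. The sketch below is a minimal Python illustration that assumes the expressions take the form given in this section; the function names are hypothetical, and the measurement error variance of 1 used in the example is an assumption (the other values follow the settings listed for Figure 1).

```python
def var_group_mean_fge(n, N, var_b, var_eta, mu_g):
    """V(W-bar_g) under FGE: shrinks towards zero as n grows."""
    return ((var_b + var_eta) + (1.0 - n / N) * mu_g ** 2) / n


def var_group_mean_rge(n, N, var_g, var_b, var_eta, mu):
    """V(W-bar_g) under RGE: the term (1 - 1/N) * var_g does not shrink with n."""
    sampling_part = ((var_g + var_b + var_eta) + (1.0 - n / N) * mu ** 2) / n
    return sampling_part + (1.0 - 1.0 / N) * var_g


# N = 500, mu = mu_g = 1, sigma_b = 0.5, sigma_g = 1; sigma_eta = 1 is assumed.
N = 500
for n in (5, 20, 100, 500):
    fge = var_group_mean_fge(n, N, var_b=0.25, var_eta=1.0, mu_g=1.0)
    rge = var_group_mean_rge(n, N, var_g=1.0, var_b=0.25, var_eta=1.0, mu=1.0)
    print(n, round(fge, 4), round(rge, 4))
```

Under these assumptions the FGE variance keeps falling as more measurements are taken, while the RGE variance levels off near the between-group variance, which matches the statement that the RGE group mean retains variability regardless of sample size.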

Comparison between Individual and Group Strategies

In a linear regression analysis with the RGE model, if the measurement error gets smaller, there is negligible attenuation in using the individual-based assessment, while there is bias depending on the covariance term, $\mathrm{cov}(\bar{W}_g, e_{gi})$, and the between-group variability ($\sigma_g^2$) with group-based assessment. However, with the FGE model, the group-based exposure assessment is always superior if the sample size is moderately large, because it leads to a quasi-Berkson error structure that gives no bias. In logistic regression analysis, for a given between-group variability, as the error variance ($\sigma_\eta^2$) gets smaller, there is negligible attenuation when using the individual-based assessment, while there is still bias depending on the error variability ($\sigma_e^2$) when the group-based assessment is used on the same data. Therefore, when the measurement error is small and the error variability, $\sigma_e^2$, is large, the estimates with individual-based assessment are expected to be less biased than those with group-based assessment (Eq. (5) versus Eq. (10) and Eq. (6) versus Eq. (11)).

Figure 1. Variance of group mean in relation to sample size. [Plot of variance against sample size for the FGE model and for two RGE settings; N = 500, μ = 1, μ_g = 1, σ_b = 0.5.]

Simulation study

Simulations were performed to examine attenuation in the regression coefficient estimates in linear and logistic models with individual- and group-based exposure assessment and a disease with an expected risk of about 10% ($P \approx 0.1$) and less than 10% ($P < 0.1$). We considered a cohort with time-invariant exposure that segregates into five exposure groups. We further assumed that disease risk depended only on exposure intensity and not on its duration. The measurements of exposure for a sample of $n = 20$ (100) workers were obtainable among exposure measurements of all 500 subjects ($N$) in each group.
Each subject was measured only once, and it was assumed that all variance components were known (the between-group, between-subject and measurement error variance on each subject from the measurement error models). The means of the 1000 sets of estimates and standard errors were calculated. In addition, the empirical standard deviations of the estimates were calculated and the empirical mean squared errors (MSE) were obtained. The true regression coefficient was set to 0.3, and $-4$ (for $P < 0.1$) or $-2$ (for $P \approx 0.1$) were used as the intercept parameters for both regression models. The probability of disease, $p = P(Z_{gi} = 1 \mid X_{gi}) = \Lambda(\beta_0 + \beta_1 X_{gi})$, was calculated and used to assign binary disease status from a Bernoulli distribution.

The exposures were assumed to be normally distributed with a common mean of 0.1 and between-group standard deviations of 0.3, 0.5 and 1 for the RGE model, and with group means of 0.1(0.3)1.3 ($\omega = 0.3$), 0.1(0.5)2.1 ($\omega = 0.5$) and 0.1(1)4.1 ($\omega = 1$) from the first to the fifth group for the FGE model. To see the impact of the between-subject standard deviation, we examined values that span a plausible range (Kromhout et al., 1993), that is, a small value of 0.7 ($\sigma_b^2 = 0.49$) and a large value of 1.414 ($\sigma_b^2 = 2$). As the measurement error disappears with the grouping strategy ($\sigma_\eta^2$ divided by the sample size in each group), we considered only a measurement error standard deviation ($\sigma_\eta$) of 1 for both the RGE and FGE models. For the group-based strategy, the estimated mean exposure for each group was assigned to all workers in a given group.

The regression coefficients were estimated using the generalized linear model procedures of the R software, which implements the S language developed by John Chambers and colleagues (Chambers and Hastie, 1991). The corresponding author will make the R code used in the simulations available on request.

Individual-Based Strategy

Bias depends on all variance components for the RGE and FGE models in both linear and logistic regression models.
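The simulation design described above can be sketched in a few lines. The following Python fragment (standard library only) is an illustrative reduction, not the authors' R code: it covers only the linear response model under FGE, uses 20 replications instead of 1000, and assumes a residual standard deviation of 1 for the response, which the text does not state. All function names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def one_replicate(rng, group_means, n_subjects, n_measured,
                  sd_b, sd_eta, beta0, beta1, sd_eps):
    """Return (individual-based slope, group-based slope) for one cohort."""
    w_ind, w_grp, y = [], [], []
    for mu_g in group_means:
        x = [mu_g + rng.gauss(0.0, sd_b) for _ in range(n_subjects)]
        w = [xi + rng.gauss(0.0, sd_eta) for xi in x]
        # Subjects are exchangeable, so the first n_measured form a random sample.
        group_mean = sum(w[:n_measured]) / n_measured
        w_ind.extend(w)
        w_grp.extend([group_mean] * n_subjects)
        y.extend(beta0 + beta1 * xi + rng.gauss(0.0, sd_eps) for xi in x)
    return ols_slope(w_ind, y), ols_slope(w_grp, y)


def run(n_reps=20, seed=1):
    """Average both slopes over n_reps simulated cohorts."""
    rng = random.Random(seed)
    ind_total = grp_total = 0.0
    for _ in range(n_reps):
        ind, grp = one_replicate(rng, [0.1, 1.1, 2.1, 3.1, 4.1], 500, 100,
                                 sd_b=0.7, sd_eta=1.0,
                                 beta0=-2.0, beta1=0.3, sd_eps=1.0)
        ind_total += ind
        grp_total += grp
    return ind_total / n_reps, grp_total / n_reps


mean_ind, mean_grp = run()
print(round(mean_ind, 3), round(mean_grp, 3))
```

With widely spaced group means and 100 measurements per group, the group-based slope should sit close to the true value of 0.3, while the individual-based slope is attenuated by roughly the factor derived for the FGE model. Extending the sketch to the logistic model would require an iterative fit, which is omitted here.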
As the measurement error variance increases, the bias increases, as expected (not shown). When the measurement error variance is fixed at $\sigma_\eta^2 = 1$ (Table 1), the bias is reduced when the between-group variability and between-subject variability are large, which shows that the bias also depends on the variability of the unknown true covariate ($\sigma_g^2 + \sigma_b^2$ for RGE or $\sigma_b^2$ for FGE). With the RGE model, for example, when the between-group variability is $\sigma_g = 0.3$ and the between-subject variability increases from $\sigma_b = 0.7$ to 1.414, the bias decreases as the estimate (MSE) increases from … (0.037) to 0.0 (0.010) in the linear model and from … (0.039) to 0.0 (0.011) in the logistic model.

Group-Based Strategy

Tables 1, 2 and 3 show the results for the RGE and FGE models with a probability of disease of about 10%, the condition assumed in the preceding theoretical developments. The tables also present the results for the analogous set of simulation parameters for the case of a rare disease, occurring in less than 10% of the subjects. Under the grouping, the bias depends on the between-group and between-subject variances, because the measurement error variance vanishes as the number of measurements increases. As the between-group variability increases, the bias decreases, whereas as the between-subject variability increases, the bias increases in both response models. With the RGE model, when the sample size is 100, for example, and the between-group variability is $\sigma_g = 0.3$ and the between-subject variability increases from $\sigma_b = 0.7$ to 1.414, the bias increases as the estimate (MSE) decreases from 0.60 (0.006) to 0.35 (0.01) in the linear model and from 0.61 (0.09) to 0.4 (0.091) in the logistic model (Table 2). When the between-group variability is large ($\sigma_g = 1$) and the between-subject variability increases from $\sigma_b = 0.7$ to 1.414, there is only a negligible change in both the estimate and the MSE in the linear model, whereas the estimate decreases from 0.98 to 0.89 with a negligible change of MSE in the logistic model. With FGE, the estimates do not change much as the between-group variability ($\omega$) changes, but the MSE decreases as the between-group variability increases in the logistic model.

Table 1. Individual-based assessment with (a) $\beta_0 = -2$, $\beta_1 = 0.3$ and $\sigma_\eta = 1$ (linear model); (b) $\beta_0 = -2$, $\beta_1 = 0.3$ and $\sigma_\eta = 1$ (logistic model); (c) $\beta_0 = -4$, $\beta_1 = 0.3$ and $\sigma_\eta = 1$ (logistic model). Columns: $\hat{\beta}_1$, SD and MSE for $\sigma_b = 0.7$ and for $\sigma_b = 1.414$. $\sigma_g$ ($\omega$) is the between-group standard deviation; $\sigma_b$ is the between-subject standard deviation; and $\sigma_\eta$ is the measurement error standard deviation. SD and MSE are the empirical standard deviation and the mean squared error (1000 iterations).

Table 2. Group-based assessment with (a) $n = 100$, $\beta_0 = -2$, $\beta_1 = 0.3$ and $\sigma_\eta = 1$ (linear model); (b) $n = 100$, $\beta_0 = -2$, $\beta_1 = 0.3$ and $\sigma_\eta = 1$ (logistic model); (c) $n = 100$, $\beta_0 = -4$, $\beta_1 = 0.3$ and $\sigma_\eta = 1$ (logistic model). Columns: $\hat{\beta}_1$, SD and MSE for $\sigma_b = 0.7$ and for $\sigma_b = 1.414$.
If covariance between the group means and the model error exists in the measurement error model, the ordinary least squares estimates in linear regression models are suspected of not taking the correlation into account and therefore of underestimating the standard errors. With the RGE model, when $\sigma_g = 1$ and $\sigma_b = 0.7$, for example, the mean of the standard errors is … and the empirical standard deviation is 0.05 in the linear model, and the mean of the standard errors is 0.085, but the empirical standard deviation is … in the logistic model (Table 2).

When the number of measurements is small, the measurement error structure may contain the classical error together with the quasi-Berkson error (Table 3). As a result, all variance and covariance components affect the bias in the estimation. In linear and logistic models for both RGE and FGE, the error depends on all variance components and the covariance. With RGE, for example, when the between-group variability is $\sigma_g = 0.3$ and the between-subject variability increases from $\sigma_b = 0.7$ to 1.414, the bias increases as the estimate (MSE) decreases from 0.169 (0.08) to 0.13 (0.041) in the linear model and from 0.163 (0.084) to 0.14 (0.077) in the logistic model (Table 3). In addition, the mean of the standard errors was not consistent with the empirical standard deviation in the linear model. With RGE, when $\sigma_g = 1$ and $\sigma_b = 0.7$, for example, the mean of the standard errors is … and the empirical standard deviation is … in the linear model, and the mean of the standard errors is … and the empirical standard deviation is … in the logistic model.

The overall pattern of bias in the estimates when the disease is rare is consistent with the results obtained under the condition that the prevalence of the disease in the population is between 0.1 and 0.9, while the MSEs increase.

Table 3. Group-based assessment with (a) $n = 20$, $\beta_0 = -2$, $\beta_1 = 0.3$ and $\sigma_\eta = 1$ (linear model); (b) $n = 20$, $\beta_0 = -2$, $\beta_1 = 0.3$ and $\sigma_\eta = 1$ (logistic model); (c) $n = 20$, $\beta_0 = -4$, $\beta_1 = 0.3$ and $\sigma_\eta = 1$ (logistic model). Columns: $\hat{\beta}_1$, SD and MSE for $\sigma_b = 0.7$ and for $\sigma_b = 1.414$.

In Tables 2 and 3, $\sigma_g$ ($\omega$) is the between-group standard deviation; $\sigma_b$ is the between-subject standard deviation; and $\sigma_\eta$ is the measurement error standard deviation. SD and MSE are the empirical standard deviation and the mean squared error (1000 iterations).
Examples

Association of Cancer with Electric Magnetic Field Exposure

Data were obtained from a large historical cohort study of workers in five electric utility companies in the USA, which included a survey of occupational exposure to 60 Hz electric magnetic fields among randomly selected workers in 8 job categories (Loomis et al., 1994; Kromhout et al., 1995; Savitz and Loomis, 1995). The between- and within-group variance components were estimated, and the effect of different grouping strategies was assessed for subsequent estimation of exposure to be used in an exposure-response analysis of mortality data. Exposure measurements were log-transformed to ensure normality. Men employed full-time at any time between 1950 and 1986 who had accrued a total of at least 6 months of continuous employment were enumerated through personnel records, and a complete history of employment at the electrical utility companies was obtained. A total of 138,906 eligible men were included. Vital status as of December 1988 was determined for this cohort. The outcome of interest, cancer mortality, was defined on the basis of the underlying cause of death coded to the International Classification of Diseases (ICD); about 3.5% of workers died of cancer (ICD revisions 8 and 9, codes between 140 and 208). The exposures in each occupational category were assigned to all workers, and the regression coefficient was estimated with their cancer mortality (Savitz and Loomis, 1995).

A priori grouping schemes based on exposure level and occupational categories (OC) were compared. Using experience gained from preliminary surveys, the 8 OCs were aggregated into three ordinal levels of presumed magnetic exposure for the a priori grouping. The OC grouping method gave large between-group variability with comparably small between-worker variability. The use of OC is equivalent to the FGE model, whereas aggregation of these groups into three groups is more akin to the RGE model. Kromhout et al. (1999) showed that this makes little difference compared with the use of the usual job category.
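The estimation of between-group and within-group variance components from log-transformed measurements can be illustrated with a method-of-moments (one-way ANOVA) estimator. This is a minimal sketch for balanced data only. It is not the estimation method used in the cited studies, and the function name, the simulated data and all parameter values are assumptions introduced here.

```python
import numpy as np

def anova_variance_components(log_exposure):
    """Method-of-moments variance components for balanced one-way data.

    log_exposure: 2-D array, rows = groups, columns = workers in that group.
    Returns (between-group variance, within-group variance).
    """
    x = np.asarray(log_exposure, dtype=float)
    k, n = x.shape                      # k groups, n workers per group
    group_means = x.mean(axis=1)
    # Mean square between groups and mean square within groups.
    msb = n * np.sum((group_means - x.mean()) ** 2) / (k - 1)
    msw = np.sum((x - group_means[:, None]) ** 2) / (k * (n - 1))
    # E[MSB] = within + n * between, so solve for the between-group variance;
    # truncate at zero because the moment estimator can go negative.
    var_between = max((msb - msw) / n, 0.0)
    return var_between, msw

# Hypothetical log-scale data: 200 groups of 10 workers,
# between-group SD 1.0 and within-group SD 0.5.
rng = np.random.default_rng(1)
demo = rng.normal(0.0, 1.0, size=(200, 1)) + rng.normal(0.0, 0.5, size=(200, 10))
vb_hat, vw_hat = anova_variance_components(demo)
print(f"between-group variance {vb_hat:.3f}, within-group variance {vw_hat:.3f}")
```

A grouping scheme with a large between-group component relative to the within-group component offers more exposure contrast, which is the property the comparison of a priori and OC groupings turns on.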

Table 4. Groupings with a priori and occupational categories (OC)
Grouping | g | σ_g | σ_b | σ_Z | Crude: β̂1 ()
A priori | | | | | * (0.05)
OC | | | | | * (0.04)
g, number of groups; σ_g, estimated variance of the between-group distribution of log-transformed exposures; σ_b, estimated variance of the between-worker distribution of log-transformed exposures; β̂1, estimate of the regression coefficient in the logistic regression model; *P-value < 0.05.

Table 4 shows the estimates of the regression coefficient in a logistic regression model for each grouping strategy with a large sample size (450). The estimates conformed to expectations derived from our simulation studies: more attenuation was observed with small between-group variability and large between-worker variability under an approximated hypothesized true regression coefficient of . In keeping with theory and simulations, grouping with greater contrast resulted in more precise estimates.

Respiratory Symptoms and Exposure to Carbon Black

A number of repeated cross-sectional studies measuring the respiratory health of employees in the European carbon black manufacturing industry in relation to exposure to carbon black dust have been conducted (Gardiner et al., 1993, 2001). In the second survey, exposure to inhalable dust in 19 factories in 8 European countries was determined among 1870 workers, resulting in 390 measurements (Gardiner et al., 1996). There were 8 job categories within 19 factories, and workers from each job title in each factory were selected randomly for exposure monitoring. In addition, repeated measurements on the same worker were collected at random intervals to allow estimation of the between- and within-worker variances. Inhalable dust measurements were log-transformed to satisfy the assumption of normality. All participants completed self-administered questionnaires to determine the prevalence of respiratory symptoms: cough, sputum production, cough with sputum production and chronic bronchitis. The prevalence of chronic bronchitis was 4% (see Gardiner et al. (2001) for details).

Two exposure grouping schemes were formed for inhalable dust: factory and job category (FGE). Exposure models were fitted with both homogeneous and heterogeneous between-subject variances among groups to check the assumption of homogeneous between-subject variability for all groupings. Comparison of Bayesian information criterion values in mixed-effect modeling indicated that there was no evidence of heterogeneous between-subject variability in the data (van Tongeren et al., 2006). As the group-based assessment depends on the estimated group means that are assigned to all subjects in a group, differences in the number of subjects drawn for exposure monitoring from each group should not matter, as long as the number is sufficiently large to approximate the true group means. In this paper, we used a subset of the original data in which (a) each subject within a group had exposure measurements (allowing us to use the total number of observations to calculate the group means) and (b) health outcomes were determined for every subject with at least one exposure measurement.

Table 5 compares an individual-based method with two grouping methods. The individual-based strategy led to a non-significant estimate, whereas grouping by job category gave significant estimates in the expected direction. With the group-based strategy, the job category grouping had the largest between-group variability (σ_g = 0.36), and the estimate was (unadjusted). According to our theory and simulation studies, the latter estimate is least likely to be affected by measurement error.

Table 5. Groupings with factory and job category: all individuals used in group-based exposure assessment
Grouping | g | σ_g | σ_b | σ_Z | Crude: β̂1 ()
Individual-based (no grouping) | - | - | - | - | (0.08)
Factory | | | | | (0.6)
Job-category | | | | | * (0.17)
g, number of groups; σ_g, estimated variance of the between-group distribution of log-transformed exposures; σ_b, estimated variance of the between-worker distribution of log-transformed exposures; β̂1, estimate of the regression coefficient in the logistic regression model; *P-value < 0.05.

Results and Discussion

In this paper, we developed models of bias due to measurement error in epidemiological studies that use individual- and group-based estimates of exposure and analyses by linear or logistic regression. Exposures were assumed to be normally distributed with known variance components (the between-group and between-subject variances and the measurement error variance, σ_g, σ_b and σ_Z, respectively) and no differential misclassification. Both fixed- and random-effects models were considered for group-based exposure estimates. We used simulated and empirical data to explore the impact of measurement error structures and modeling approaches on bias and precision in estimated regression coefficients.

Error in exposure estimates derived through an individual-based approach leads to bias in estimating exposure-disease associations, usually in the form of attenuation of the association. The magnitude of the bias depends on the between-group and between-subject variability and the measurement error variance, and the attenuation becomes more severe as the between-subject variability decreases. On the basis of our simulation study, the bias decreases when the assumed true variance of exposures increases. Group-based exposure assessment is often used when measuring the exposure of all subjects in a study is not feasible.

With this approach to exposure assessment, bias in exposure-response associations depends on between-group and between-subject variability. If the sample size is sufficiently large to estimate the true group means accurately and the measurement error is independent of true exposure, a quasi-Berkson error structure is induced. This error structure depends on the covariance between the group means and the measurement error in the exposure model, which is re-postulated from a classical model. The overall behavior of the estimates is similar with RGE and FGE designs: the estimates of the regression coefficients were less attenuated when the sample size was large, the between-subject variability was small and the spread between group means was large. We showed, however, that a true Berkson model, giving minimal bias in the estimated coefficient, is approximated when the exposure groups are treated as fixed, whereas no such approximation is assured when exposure groups are treated as random factors. In a random-effects model, the group means are assigned as random components, which may be correlated with both model errors and measurement error. When the number of measurements used to calculate the group means is large, RGE in linear models therefore behaves differently from what the theoretical work of Tielemans et al. (1998) describes, in which the estimate is derived by ordinary least squares under the assumption that the errors in the models are mutually independent. Although they predicted that attenuation bias becomes negligible as the number of measured workers per group increases, we show that, regardless of sample size, bias (either positive or negative) in estimates of slope is present when groups are formed through a random process. Bias due to measurement error is negligible in both linear and logistic models when a fixed-effects structure is employed.
When a random-effects model is used, however, the degree of bias in the estimate of association increases as the between-group variance decreases in both linear and logistic regression models. With both linear and logistic regression analysis, applying RGE leads to measurement error structures with differential error that can cause either over- or underestimation of the regression coefficient. Fixed-effects exposure assignments consistently performed better than random-effects schemes, which generally produced higher variance and greater bias. Thus, fixed exposure groups, which may be based on narrowly targeted inferences (e.g., only job titles present in a particular cohort or factory), seem to be less vulnerable to bias when group-based exposure assessment is employed.

When the sample size is small, the error structure retains features of both classical and Berkson error structures. In such cases, the attenuation is more severe than with the quasi-Berkson error structure achieved with a larger sample size. Even when all measurements in each group are available, group-based assessment with the total group mean may give a more valid estimate of the regression coefficient than individual-based assessment. This also implies that identical results are obtained with equal and unequal sample sizes from each group, as long as group means are estimated with sufficient precision. If the measurement error is severe, the group-based strategy would give good estimates of the regression coefficient without resorting to a more computationally intensive measurement error adjustment method with an individual-based strategy (Wang et al., 2000).

We show that assignment of exposure based on RGE models is a mechanism by which non-differential measurement error can become differential. Another well-known situation in which a similar phenomenon arises is the dichotomization of a mismeasured continuous variable (Gustafson, 2003).
Thus, our proposed quasi-Berkson error structure emphasizes the fact that the independence of measurement error and outcome may not be retained on transformation of a mismeasured covariate. Whether mechanisms other than grouping lead to a quasi-Berkson error structure or loss of non-differential error should be further investigated. One setting in which such an investigation may be warranted is studies that assess exposures on the basis of contaminant quantification in pooled biological samples that are stratified on health status, as in a case-referent study (Weinberg and Umbach, 1999). Our findings are largely in line with the recommendations for bias reduction in the recently published book by Rappaport and Kupper (2008), but contain a more in-depth theoretical exploration of the impact of individual- and group-based strategies on bias in both linear and logistic regression in the presence of either random or fixed group means.

In conclusion, group-based exposure assignment can be an effective and versatile approach to estimating the relationship between exposure and disease when data on exposure are not available for all subjects in a study. However, loss of precision is expected, especially when the between-group variability is small. This general conclusion is in accordance with the principles that guide occupational exposure assessment, giving it greater theoretical credence, while emphasizing previously unanticipated complications with making inferences beyond the studied groups when group-based exposure assessment is employed. In such a setting, it is natural to assume that the assigned group means are random effects. The existence of this additional uncertainty and the associated bias agree with the intuition that a penalty is incurred for drawing conclusions about unobserved situations, especially when there are gaps in the observed data (e.g., exposures were not measured for every subject).
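For reference, the textbook contrast between classical and Berkson error in a linear disease model, on which the discussion above draws, can be written out as follows. The notation (X for true exposure, W for the measured or assigned value, U for the error, λ for the attenuation factor) is introduced here for illustration and is not taken from the paper. The errors U and ε are assumed to have mean zero, and ε is assumed independent of X, W and U.

```latex
\begin{align*}
\text{Disease model:}\quad & Y = \beta_0 + \beta_1 X + \varepsilon \\[4pt]
\text{Classical error:}\quad & W = X + U,\quad U \perp X
  \;\Longrightarrow\; \operatorname{plim}\hat\beta_1 = \lambda\,\beta_1,
  \qquad \lambda = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2} < 1 \\[4pt]
\text{Berkson error:}\quad & X = W + U,\quad U \perp W
  \;\Longrightarrow\; E[Y \mid W] = \beta_0 + \beta_1 W
\end{align*}
```

Under classical error the least-squares slope from regressing Y on W is attenuated by λ, whereas under pure Berkson error it is unbiased. In the quasi-Berkson structure described in this paper the true value again equals the observed value plus error, but the error is not independent of the observed value, so the last implication need not hold exactly.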
Conflict of interest

The authors declare no conflict of interest.

Acknowledgements

Hyang-Mi Kim is thankful to David Richardson and Dana Loomis for their hospitality during her stay at the University of North Carolina, Chapel Hill, USA. Drs Richardson and Loomis were supported by grant R01-CA from the National Cancer Institute, National Institutes of Health. Igor Burstyn was supported by salary awards from the Canadian Institutes for Health Research and the Alberta Heritage Foundation for Medical Research.

References

Berkson J. Are there two regressions? J Am Stat Assoc 1950: 45:
Burr D. On errors-in-variables in binary regression - Berkson case. J Am Stat Assoc Theory Methods 1988: 83(403):
Carroll R.J., Ruppert D., Stefanski L.A., and Crainiceanu C.M. Measurement Error in Nonlinear Models (A Modern Perspective). CRC, Chapman & Hall, Boca Raton, 2006.
Chambers J., and Hastie T.J. Statistical Models in S. Wadsworth & Brooks/Cole, Pacific Grove, CA, USA,
Gardiner K., Calvert I.A., van Tongeren M., and Harrington J.M. Occupational exposure to carbon black in its manufacture: data from 1987 to 1992. Ann Occup Hyg 1996: 40:
Gardiner K., Trethowan N.W., Harrington J.M., Rossiter C.E., and Calvert I.A. Respiratory health effects of carbon black. A survey of European carbon black workers. Br J Ind Med 1993: 50:
Gardiner K., van Tongeren M., and Harrington J.M. Respiratory health effects from exposure to carbon black: results of the phase II and III cross-sectional studies in the European carbon black manufacturing industry. Occup Environ Med 2001: 58:
Goldstein H. Multilevel Statistical Models. Kendall's Library of Statistics, New York, 2003: 3.
Gustafson P. Measurement Error and Misclassification in Statistics and Epidemiology. CRC, Chapman & Hall, Boca Raton, 2003, pp
Harville D.A. Maximum likelihood approaches to variance component estimation and to related problems. J Am Stat Assoc 1977: 72:
Kim H.M., Yasui Y., and Burstyn I. Attenuation in risk estimates in logistic and Cox models due to group-based exposure assessment strategy. Ann Occup Hyg 2006: 50(6):
Kromhout H., Loomis D., and Kleckner R.C. Uncertainty in the relation between exposure to magnetic fields and brain cancer due to assessment and assignment of exposure and analytical method in dose-response modeling. Ann NY Acad Sci 1999: 895:
Kromhout H., Loomis D., Mihlan G.J., Peipins L.A., Kleckner R.C., Iriye R., and Savitz D.A. Assessment and grouping of occupational magnetic field exposure in five electric utility companies. Scand J Work Environ Health 1995: 21(1):
Kromhout H., Symanski E., and Rappaport S.M. A comprehensive evaluation of within- and between-worker components of occupational exposure to chemical agents. Ann Occup Hyg 1993: 37:
Loomis D., Kromhout H., Peipins L.A., Kleckner R.C., Iriye R., and Savitz D.A. Sampling design and methods of a large randomized, multi-stage survey of occupational magnetic field exposure. Appl Occup Environ Hyg 1994: 9:
Loomis D., and Kromhout H. Exposure variability: concept and applications in occupational epidemiology. Am J Ind Med 2004: 45:
Rappaport S.M., and Kupper L.L. Quantitative Exposure Assessment. Stephen Rappaport, El Cerrito, CA, USA, 2008, pp
Reeves G.K., Cox D.R., Darby S.C., and Whitley E. Some aspects of measurement error in explanatory variables for continuous and binary regression models. Stat Med 1998: 17:
Savitz D.A., and Loomis D. Magnetic field exposure in relation to leukemia and brain cancer mortality among electric utility workers. Am J Epidemiol 1995: 141:
Tielemans E., Kupper L., Kromhout H., Heederik D., and Houba R. Individual-based and group-based occupational exposure assessment: some equations to evaluate different strategies. Ann Occup Hyg 1998: 42(2):
van Tongeren M., Burstyn I., Kromhout H., and Gardiner K. Are variance components of exposure heterogeneous between time periods and factories in the European carbon black industry? Ann Occup Hyg 2006: 50:
van Tongeren M., Gardiner K., Calvert I., and Kromhout H. Efficiency of different grouping schemes for dust exposure in the European carbon black respiratory morbidity study. Occup Environ Med 1997: 54:
Wang N., Lin X., Gutierrez R.G., and Carroll R.J. Bias analysis and SIMEX approach in generalized linear mixed measurement error models. J Am Stat Assoc Theory Methods 1998: 93(441):
Wang N., Lin X., and Gutierrez R.G. A bias correction regression calibration approach in generalized linear mixed measurement error model. Commun Stat A Theory Method 2000: 8:
Weinberg C.R., and Umbach D.M. Using pooled exposure assessment to improve efficiency in case-control studies. Biometrics 1999: 55(3):


More information

Unbiased estimation of exposure odds ratios in complete records logistic regression

Unbiased estimation of exposure odds ratios in complete records logistic regression Unbiased estimation of exposure odds ratios in complete records logistic regression Jonathan Bartlett London School of Hygiene and Tropical Medicine www.missingdata.org.uk Centre for Statistical Methodology

More information
