POLYNOMIAL REGRESSION AND ESTIMATING FUNCTIONS IN THE PRESENCE OF MULTIPLICATIVE MEASUREMENT ERROR

Stephen J. Iturria and Raymond J. Carroll (1), Texas A&M University, USA
David Firth, University of Oxford, UK

Abstract: We consider the polynomial regression model in the presence of multiplicative measurement error in the predictor. Two general methods are considered, with the methods differing in their assumptions about the distributions of the predictor and the measurement errors. Consistent parameter estimates and asymptotic standard errors are derived using estimating equation theory. Diagnostics are presented for distinguishing additive and multiplicative measurement error. Data from a nutrition study are analyzed using the methods. The results from a simulation study are presented and the performances of the methods compared.

Key Words and Phrases: Bootstrap; Errors-in-Variables; Estimating Equations; Measurement Error; Nonlinear Regression; Nutrition.

Short title: Multiplicative Measurement Error

(1) Address for correspondence: Department of Statistics, Texas A&M University, College Station, TX.

1 INTRODUCTION

Much work has been done on the estimation of regression coefficients in the presence of additive measurement error in the predictors. A detailed account of the developments for linear regression models can be found in Fuller (1987), and Carroll et al. (1995) summarize much of the recent work for nonlinear regression models. Considerably less work has been done for cases of nonadditive measurement error, however. Hwang (1986) derives a consistent estimator for the coefficients of the ordinary linear model under multiplicative measurement error by modifying the usual normal equations of least squares regression. No distributional assumptions are made about the unobserved predictor other than i.i.d., but consistent estimates of the moments of the measurement errors are required. One of the general methods we propose is a special case of Hwang's estimator. Two distributional forms for the measurement errors are considered, and we propose methods for estimating their moments. For the second general method, we model the distribution of the unobserved predictor as well; fitting this method requires estimating the distribution of the predictor conditional on its mismeasured version. We apply our methods to a nutrition data set taken from the Nurses' Health Survey, and we also present the results from a simulation study.

1.1 The Polynomial Regression Model

The polynomial regression model with multiplicative measurement error is given by

Y_i = β_0 + Σ_{k=1}^p β_k X_i^k + β_{p+1}^t Z_i + ε_i,
W_{ij} = X_i U_{ij},    i = 1, ..., n,  j = 1, ..., r_i,

where U_{ij} is the measurement error associated with the jth replicate of the error-prone predictor of X_i, namely W_{ij}, and Z_i is a vector of covariates assumed to be measured without error. Further assumptions are that all elements of (ε_i), (U_{ij}), and (X_i) are mutually independent, the (X_i) assume positive values only, the (ε_i) have mean zero, and the (U_{ij}) have either mean or median one. We consider three possible models for the distribution of the (X_i, U_{ij}). No further distributional assumptions will be made about the (Z_i) and (ε_i). One of the contributions of the paper is to indicate conditions under which it is possible to estimate moments of U without making distributional assumptions about the X's, using ratios of the W's.
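To fix ideas, the following small simulation sketch (ours, not the authors'; all parameter values, including the lognormal choices for X and U, are arbitrary illustrations) generates data from a quadratic version of the model with two replicates per subject and no covariates Z.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 200, 2                              # subjects and replicates per subject
beta = np.array([1.0, 0.5, -0.1])          # beta_0, beta_1, beta_2 (p = 2, no Z)
sigma_x, sigma_u, sigma_eps = 0.3, 0.25, 0.5

# True predictor X > 0 and lognormal measurement errors U with median one,
# so each replicate W_ij = X_i * U_ij is median unbiased for X_i.
X = np.exp(rng.normal(1.5, sigma_x, size=n))
U = np.exp(rng.normal(0.0, sigma_u, size=(n, r)))
W = X[:, None] * U

# The response depends on the true X, not on the mismeasured W.
Y = beta[0] + beta[1] * X + beta[2] * X**2 + rng.normal(0.0, sigma_eps, size=n)
```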

1.2 Nurses' Health Survey

The Nurses' Health Survey includes measurements of energy (caloric) intake and vitamin A intake for 168 individuals, calculated from four 7-day food diaries. We model Y = long-term energy intake (calories) as a quadratic function of X = long-term vitamin A intake (without supplements). No important effects were evident among the possible covariates, so we consider only the regression of Y on X. Food diaries are an imprecise method for calculating long-term nutrient intakes, so the reported vitamin A intakes are presumed to be measured with error. It is standard in the field to treat these data as replicates following a measurement error model (Rosner et al., 1989), rather than to analyze them in some sort of longitudinal fashion, because of the relatively short time period involved. If the data had been measured over many years, rather than over a single year, then one would expect trends and a longitudinal analysis would be appropriate. See Wang et al. (1998) for a discussion of this issue in the context of mixed effects models.

We regressed energy intake on vitamin A intake. If the sources of vitamin A intake are mixed homogeneously in the sample, then we would expect this relationship to be fairly linear. If, on the other hand, there is a distinct subgroup eating foods rich in vitamin A, then one would expect the relationship to have a nonlinear component. In a quadratic regression one would expect in this case a negative quadratic component, since the heterogeneous subgroup eating large amounts of vitamin A would not be increasing their energy intake. A scatter plot of the averages of the energy replicates against the averages of the vitamin A replicates is given in Figure 1. The p value for the quadratic term in the ordinary least squares (OLS) fit of the energy replicate averages as a quadratic function of the vitamin A replicate averages is .002. The p value changes if the three observations with the largest covariate values are deleted.

In the absence of measurement error one might expect a quadratic regression with additive errors. Long-term energy intake, say y, is also estimated imprecisely when using observed food diaries Y, with error that, based on our analysis, is somewhat multiplicative. However, it is easily seen that in most circumstances the error in energy intake does not materially affect the regression function. If we assume, for example, that E(Y | y) = y, then the regression of observed energy intake on long-term vitamin A intake follows the same form as the regression of long-term energy intake on long-term vitamin A intake, with model errors (ε_i) still having mean zero, although they may be heteroscedastic.
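The naive OLS fit described above is easy to reproduce on such simulated data; here is a minimal sketch (ours), where naive_ols is a hypothetical helper that fits Y on the replicate average and ignores the measurement error.

```python
import numpy as np

def naive_ols(Y, W):
    """Quadratic OLS fit of Y on the replicate average, ignoring measurement error."""
    wbar = W.mean(axis=1)
    design = np.column_stack([np.ones(len(Y)), wbar, wbar**2])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef

# Example, using Y and W from the data-generation sketch above:
# print(naive_ols(Y, W))   # typically distorted relative to the generating beta
```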

1.3 Effects of Multiplicative Measurement Error on Curvature

One question to consider is whether the curvature exhibited in the OLS fit of the Nurses' data accurately reflects the curvature in the underlying relationship between Y and the unobservable X. To see the effect that measurement error can have on curvature, consider the plots given in Figure 2. The top two plots are of Y vs. X and Y vs. W for data generated from a linear regression model with right-skewed, multiplicative measurement errors. Note the curvature exhibited in the plot of Y vs. W. Measurement errors of this type can also have the effect of dampening the curvature of the underlying model, as can be seen in the second pair of plots, which are for data generated from a quadratic regression model with β_2 < 0. The common feature of the two pairs of plots is that the measurement errors tend to stretch the data along the X axis, giving a distorted view of the true relationship between Y and X.

1.4 Diagnostics for Multiplicative Measurement Error

Measurement error models have been most fully developed for the additive error case, W = X + U, with U being either a mean- or median-zero error term that is independent of X. A convenient diagnostic for assessing additivity when X is independent of the mean-zero measurement error term is a plot of W_{ij} - W_{ik} against W_{ij} + W_{ik} for various j ≠ k, where W_{ij} is the jth replicate for individual i. In the appendix we show that under the additive model, one would expect to see no correlation in these plots. If, however, the multiplicative model, W = XU, is more appropriate, then an additive error model is appropriate when considering the logarithm of W. Plots of log(W_{ij}) - log(W_{ik}) against log(W_{ij}) + log(W_{ik}) therefore provide a ready diagnostic for multiplicative measurement error.

For our analysis of the Nurses' data we define Y_i to be the average of the four energy replicates for individual i, W_{i1} to be the average of the first two vitamin A replicates for individual i, and W_{i2} to be the average of the third and fourth vitamin A replicates for individual i. This combining of the first/second and third/fourth values is done only for purposes of presentation: our methods apply to the more general case. The diagnostics for the Nurses' data are given in Figure 3. The correlation coefficient for the plot of log(W_{i1}) - log(W_{i2}) against log(W_{i1}) + log(W_{i2}) is .02, suggesting that the measurement errors are additive in the log scale, and hence multiplicative in the untransformed scale. To see that an additive model is not appropriate for the data in the original scale, note the strength of the correlation in the plot for the untransformed data, which corresponds to a correlation coefficient of .50.
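A numerical companion to these diagnostic plots (our sketch, not the authors' code): compute the correlation of replicate differences against sums on the raw and the log scales; a near-zero log-scale correlation points toward multiplicative error.

```python
import numpy as np

def error_scale_diagnostics(W):
    """Correlations of (W1 - W2, W1 + W2) on the raw and log scales, two replicates."""
    W1, W2 = W[:, 0], W[:, 1]
    raw = np.corrcoef(W1 - W2, W1 + W2)[0, 1]
    logged = np.corrcoef(np.log(W1) - np.log(W2), np.log(W1) + np.log(W2))[0, 1]
    return raw, logged

# Example, using W from the data-generation sketch above:
# print(error_scale_diagnostics(W))   # raw correlation sizable, log-scale near zero
```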

1.5 Models for (X, U)

We consider two distributional forms for the measurement error, U. The first form is where U can be expressed as exp(V), where V is mean-zero and symmetric. The second form is a special case of the first, namely that U is lognormal(0, σ_u^2). Note that in both cases W is median unbiased for X. The assumption of median as opposed to mean unbiasedness is not really important, since there is no way to distinguish between the two cases in practice. The advantage to assuming median unbiasedness in the case of lognormal measurement error is that it simplifies the identification of parameters.

When working with the first distributional form for U, we place no distributional assumptions on X other than that X is nonnegative with finite moments. We call this the nonparametric case. For the second distributional form of U, the case of lognormal measurement error, we consider two possibilities for X. The first is where once again we assume only that X is nonnegative with finite moments, which we call the semiparametric case. The second is that X, conditional on Z, is distributed lognormal(α_0 + α_1^t Z, σ_x^2), which we call the parametric case. Note that the semiparametric model is a special case of the nonparametric model, and that the parametric model is a special case of the other two models. Also note that these names refer only to the assumptions placed on X and U. For example, the parametric model is not fully parametric in that we do not assume anything beyond independence and a zero expectation for the (ε_i). We believe this is one of the attractive features of our method. The assumption that U = exp(V) with V symmetrically distributed is crucial only for the nonparametric case, because it allows estimation of appropriate moments of U. The methods for the lognormal case are easily modified to nonsymmetric V, e.g., that U has a gamma distribution with median one.

1.6 Unbiased Estimating Functions for Polynomial Regression under Multiplicative Measurement Error

We derive consistent estimators for the coefficients of the polynomial regression model using the theory of estimating equations. An advantage to formulating estimators in terms of estimating equations is that the theory provides a general method for computing asymptotic standard errors. A brief overview of the method is provided in the appendix; a more detailed description can be found in Carroll et al. (1995). In practice, the estimating function, Ψ(·), is not formulated independently, but rather is a consequence of the estimation method being considered. For example, a maximum likelihood approach would imply taking Ψ(·) to be the derivative of the log likelihood.
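As a toy illustration of this machinery (ours; the corrected functions introduced next replace the powers of X), the sketch below takes Ψ to be the least squares score for a quadratic model, the estimating function implied by a normal likelihood, and solves the resulting equations with a generic root finder.

```python
import numpy as np
from scipy.optimize import root

def psi_ls(B, y, x):
    """Least-squares estimating function for a quadratic in x: one row per observation."""
    resid = y - B[0] - B[1] * x - B[2] * x**2
    return np.stack([resid, resid * x, resid * x**2], axis=1)

def solve_estimating_equation(psi, B0, *data):
    """Solve (1/n) sum_i psi(B; data_i) = 0 for B."""
    return root(lambda B: psi(B, *data).mean(axis=0), x0=B0).x

# Example, using the (unobservable) X from the data-generation sketch, i.e. the ideal fit:
# print(solve_estimating_equation(psi_ls, np.zeros(3), Y, X))
```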

Note that for the polynomial regression model, an unbiased estimating function for B = (β_0, β_{p+1}^t, β_1, ..., β_p)^t when the distribution of U is known is

Ψ(Y, W, Z, B) =
  [ (Y - β_0 - β_{p+1}^t Z - Σ_{k=1}^p β_k W̄^k / c_k)(1, Z^t)^t ;
    (Y - β_0 - β_{p+1}^t Z) W̄ / c_1 - Σ_{k=1}^p β_k W̄^{k+1} / c_{k+1} ;
    ⋮ ;
    (Y - β_0 - β_{p+1}^t Z) W̄^p / c_p - Σ_{k=1}^p β_k W̄^{k+p} / c_{k+p} ],

where W̄ is the average of the replicates of W and c_k is the kth moment of Ū, the corresponding average of the replicates of U. The somewhat awkward-looking form of this estimating function is an example of what Nakamura (1990) and Carroll et al. (1995, section 6.5) call a corrected estimating equation, i.e., one with the property that E{Ψ(Y, W, Z, B) | Y, Z, X} equals the usual least squares normal equations when X is known. In practice, the distribution of U will be unknown and the c_k will have to be estimated. Unbiased estimating functions for the nonparametric and semiparametric cases can be found by modifying Ψ(·) to incorporate the estimation of the c_k. We take up methods for estimating the c_k in the next section.

For the parametric case, we take an alternative approach that allows us to exploit our knowledge of the distributional form of X. Defining T_i = r_i^{-1} Σ_{j=1}^{r_i} log(W_{ij}), i = 1, ..., n, and noting that E(Y | T, Z) = β_0 + β_{p+1}^t Z + Σ_{k=1}^p β_k E(X^k | T, Z), a method for estimating B is to regress the Y_i on the Z_i and on estimates of the E(X^k | T_i, Z_i). Simple calculations show that the conditional distribution of X given (T, Z) is lognormal with parameters (σ_u^2 µ_{x|z} + 2σ_x^2 T)/(σ_u^2 + 2σ_x^2) and σ_x^2 σ_u^2/(σ_u^2 + 2σ_x^2), where µ_{x|z} = α_0 + α_1^t Z. The exact form of the unbiased estimating equation for the parametric case is given in the next section.

2 ANALYSIS OF MEASUREMENT ERROR

2.1 Error Parameter Estimation

Computing estimates of the E(Ū^k) in the nonparametric and semiparametric cases requires that we obtain estimates for the moments of U. Let m_k denote the kth moment of U. It is shown in the appendix that a consistent estimator for m_k in the nonparametric case is given by

m̂_k = [ Σ_{i=1}^n Σ_{j≠l} {n r_i(r_i - 1)}^{-1} (W_{ij}/W_{il})^k ]^{1/2}.

For the semiparametric and parametric models, in which U is lognormal(0, σ_u^2), we can take σ̂_u^2 to be the mean-square error resulting from an ANOVA on the log(W_{ij}), which is unbiased for σ_u^2. Since the kth moment of a lognormal(0, σ_u^2) distribution is exp(k^2 σ_u^2/2), a consistent estimator for m_k in the semiparametric case is given by m̂_k = exp(k^2 σ̂_u^2/2). Moments of Ū for the nonparametric and semiparametric cases can then be estimated by substituting the m̂_k into the expansions of the E(Ū^k).
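The sketch below (ours) strings the pieces just described together for the two-replicate, quadratic, no-covariate case: estimate σ_u^2 from the ANOVA on log W, convert it to the moments c_k of the replicate average Ū under the lognormal assumption, and solve the corrected estimating equation; the nonparametric route would instead plug the ratio-based m̂_k into the expansions of the c_k.

```python
import numpy as np
from math import comb
from scipy.optimize import root

def m_hat(W, k):
    """Nonparametric estimator of E(U^k): square root of the average of
    (W_ij / W_il)^k over all ordered replicate pairs j != l and all subjects."""
    n, r = W.shape
    ratios = [(W[:, j] / W[:, l]) ** k for j in range(r) for l in range(r) if j != l]
    return np.sqrt(np.mean(ratios))

def sigma_u2_hat(W):
    """ANOVA mean-square error of log(W_ij) about subject means; estimates sigma_u^2."""
    logW = np.log(W)
    n, r = W.shape
    return ((logW - logW.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (r - 1))

def c_moments(sigma_u2, kmax):
    """c_k = E(Ubar^k) for Ubar the average of two i.i.d. lognormal(0, sigma_u2) errors."""
    m = lambda j: np.exp(j ** 2 * sigma_u2 / 2.0)            # E(U^j)
    return {k: sum(comb(k, j) * m(j) * m(k - j) for j in range(k + 1)) / 2 ** k
            for k in range(1, kmax + 1)}

def psi_corrected(B, y, wbar, c):
    """Corrected estimating function for quadratic regression without covariates."""
    b0, b1, b2 = B
    rows = []
    for q in range(3):
        lead = (y - b0) * wbar ** q / (c[q] if q > 0 else 1.0)
        rows.append(lead - b1 * wbar ** (q + 1) / c[q + 1] - b2 * wbar ** (q + 2) / c[q + 2])
    return np.stack(rows, axis=1)

# Example, using Y and W from the data-generation sketch above (semiparametric route):
# c = c_moments(sigma_u2_hat(W), kmax=4)
# wbar = W.mean(axis=1)
# fit = root(lambda B: psi_corrected(B, Y, wbar, c).mean(axis=0), x0=np.zeros(3))
# print(fit.x)
```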

For the parametric model, in addition to σ_u^2, we need estimators for α_0, α_1, and σ_x^2. Estimates for α_0 and α_1 are given by the regression of the log(W_{ij}) on the Z_i. By the independence of X and U, an unbiased estimate for σ_x^2 is given by

σ̂_x^2 = -σ̂_u^2 + Σ_{i=1}^n Σ_{j=1}^{r_i} (n r_i)^{-1} {log(W_{ij}) - α̂_0 - α̂_1^t Z_i}^2.

2.2 Unbiased Estimating Equations for the Case of Two Replicates

An unbiased estimating function for the nonparametric estimator when r_i = 2, i = 1, ..., n, is given by Ψ_NP(Y, W, Z, B_NP) = {Ψ_NP,1^t(Y, W, Z, B_NP), Ψ_NP,2^t(Y, W, Z, B_NP)}^t, where

Ψ_NP,1(Y, W, Z, B_NP) =
  [ (Y - β_0 - β_{p+1}^t Z - Σ_{k=1}^p β_k W̄^k / c_k)(1, Z^t)^t ;
    (Y - β_0 - β_{p+1}^t Z) W̄ / c_1 - Σ_{k=1}^p β_k W̄^{k+1} / c_{k+1} ;
    ⋮ ;
    (Y - β_0 - β_{p+1}^t Z) W̄^p / c_p - Σ_{k=1}^p β_k W̄^{k+p} / c_{k+p} ];

Ψ_NP,2(Y, W, Z, B_NP) =
  [ 2m_1^2 - {(W_1/W_2) + (W_2/W_1)} ;
    ⋮ ;
    2m_{2p}^2 - {(W_1/W_2)^{2p} + (W_2/W_1)^{2p}} ];

with B_NP = (β_0, β_{p+1}^t, β_1, ..., β_p, m_1^2, ..., m_{2p}^2)^t and the c_k treated as functions of the m_k^2. Letting B_SP = (β_0, β_{p+1}^t, β_1, ..., β_p, σ_u^2)^t, an unbiased estimating function for the semiparametric estimator is Ψ_SP(Y, W, Z, B_SP) = {Ψ_SP,1^t(Y, W, Z, B_SP), Ψ_SP,2^t(Y, W, Z, B_SP)}^t, where Ψ_SP,1 is the same as Ψ_NP,1, except that the c_k are treated as functions of σ_u^2, and

Ψ_SP,2(Y, W, Z, B_SP) = -2σ_u^2 + {log(W_1) - log(W_2)}^2.

Finally, an unbiased estimating function in the parametric case is given by Ψ_PR(Y, W, Z, B_PR) = {Ψ_PR,1^t(Y, W, Z, B_PR), Ψ_PR,2^t(Y, W, Z, B_PR)}^t, with

Ψ_PR,1(Y, W, Z, B_PR) =
  [ (Y - β_0 - β_{p+1}^t Z - Σ_{k=1}^p β_k v_k)(1, Z^t)^t ;
    (Y - β_0 - β_{p+1}^t Z) v_1 - Σ_{k=1}^p β_k v_k v_1 ;
    ⋮ ;
    (Y - β_0 - β_{p+1}^t Z) v_p - Σ_{k=1}^p β_k v_k v_p ];

Ψ_PR,2(Y, W, Z, B_PR) =
  [ {log(W_1) + log(W_2) - 2α_0 - 2α_1^t Z}(1, Z^t)^t ;
    -2σ_x^2 - 2σ_u^2 + {log(W_1) - α_0 - α_1^t Z}^2 + {log(W_2) - α_0 - α_1^t Z}^2 ;
    -2σ_u^2 + {log(W_1) - log(W_2)}^2 ];

where we define v_k = E(X^k | T, Z) and B_PR = (β_0, β_{p+1}^t, β_1, ..., β_p, α_0, α_1, σ_x^2, σ_u^2)^t. We call the solution to the estimating equation Ψ_PR(·) = 0 the partial regression estimator, in reference to the conditioning on T and Z and the fact that it is partially parametric. We prefer this name over "parametric estimator" since the latter suggests a likelihood-based estimator; a likelihood estimator would require assuming a distributional form for ε, something we wish to avoid.

2.3 Asymptotic Variance Comparisons

Asymptotic variances for the estimators are found by taking one-term Taylor series approximations of Ψ(·) at the estimates, B̂. Variances were computed for the case of quadratic regression without covariates. (The details have been omitted due to space considerations, but are available from the first author.) The variances were calculated under the assumptions of the parametric model, with the additional assumption of finite and constant variance for the (ε_i). These formulae were used to calculate the asymptotic relative efficiencies (AREs) of the partial regression estimator relative to both the nonparametric and semiparametric estimators for various parameter values, allowing us to assess the gain in efficiency that results from choosing to model X when the parametric model holds. Plots of the AREs for β̂_2 are shown in Figure 4. The AREs were computed using the parameter estimates for the Nurses' data from the partial regression fits, except that σ_u^2 was allowed to vary, and are plotted as a function of the ratio of the coefficients of variation of U and X. This allows us to see how the efficiency of the partial regression estimator varies with changes in the relative amount of measurement error. The value of σ_ε^2 was taken to be the method-of-moments estimator derived in the appendix. The plot is consistent with our simulation studies in that, under the parametric model, the nonparametric and semiparametric methods produce virtually identical estimates for large n. More results from our simulation study are given later.

3 NUMERICAL EXAMPLE

3.1 Diagnostics for U and X for the Nurses' Data

In order to determine which of the three methods is most appropriate for the Nurses' data, we must characterize the distributions of U and X. We can assess the lognormality of U by constructing the Q-Q plot for log(W_{i1}/W_{i2}), i = 1, ..., n. If U is lognormal, this plot should look like that for normally distributed data. If the lognormality assumption for U is valid, a diagnostic for lognormality of X is the Q-Q plot for log(W_{i1}) + log(W_{i2}), i = 1, ..., n.
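Both Q-Q diagnostics are straightforward to construct; a minimal sketch (ours), using scipy's probplot on the two-replicate summaries:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def qq_diagnostics(W):
    """Normal Q-Q plots of log(W1/W2) (lognormality of U) and of
    log(W1) + log(W2) (lognormality of X, given lognormal U)."""
    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    stats.probplot(np.log(W[:, 0] / W[:, 1]), dist="norm", plot=axes[0])
    stats.probplot(np.log(W[:, 0]) + np.log(W[:, 1]), dist="norm", plot=axes[1])
    axes[0].set_title("log(W1/W2)")
    axes[1].set_title("log(W1) + log(W2)")
    return fig

# Example: qq_diagnostics(W)   # W from the data-generation sketch above
```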

For lognormal X, this plot should also look like a Q-Q plot of normally distributed data. Examination of these plots in Figure 5 suggests that the lognormality assumption is reasonable for both X and U. Taken together, the above diagnostics suggest that the partial regression estimator is reasonable for the Nurses' data.

3.2 Regression Fits for the Nurses' Data

Plots of the fitted regression functions are given in Figure 6, which shows that the SP and NP fits are similar, with PR intermediate between these two and OLS. We computed 95% confidence intervals for the NP, SP, PR, and OLS estimates of β_2 using the following bootstrap resampling procedure. One thousand with-replacement samples of the (Y_i, W_{i1}, W_{i2}) were drawn, and estimates of β_2 were computed from each sample using each of the four methods. Confidence interval endpoints were taken to be the 2.5 and 97.5 percentiles of the bootstrap estimates. Simulation results demonstrated that this method provided more reliable intervals than methods using asymptotic standard errors. The confidence intervals for the four estimators were, respectively, (-.121, -.014), (-.165, -.015), (-.051, -.014), and (-.022, -.006).

3.3 Other Fitting Methods

Our methods make no assumption about the distribution of the errors ε. However, it is interesting to compare them to fully likelihood-based methods under the additional assumption that the (ε_i) are normally distributed. We computed maximum likelihood estimates (MLE) and Gibbs-sampling-based Bayesian estimates under this additional assumption. Gibbs estimates were computed using noninformative normal priors for the mean terms and noninformative inverse gamma priors for the variance terms, with the MLEs used as starting values for the Gibbs procedure. Further details are available from the first author. The MLE and Gibbs estimates were nearly the same and were close to the partial regression estimates. Further comparisons with the MLE are taken up in the next section.

4 SIMULATION STUDY

4.1 Overview

A simulation study was carried out to assess the relative performance of the three methods under the parametric model without covariates. Generating parameter values were taken from the fit of the partial regression estimator to the Nurses' data. The parameter values used were B = (.464, .398, .029)^t, µ_x = 1.613, σ_x^2 = .094, and σ_u^2 = .076. The (ε_i) were taken to be i.i.d. N(0, σ_ε^2).

4.2 Some Descriptive Statistics

Given in Table 1 are the medians, MADs, and estimated root mean square errors of β̂_2 over 5000 simulated data sets. The sampling distributions of the nonparametric and semiparametric estimators, although asymptotically normal, were found to be highly skewed for n = 168, making it necessary to use the more robust medians and MADs to assess bias and variability. As one might expect, the OLS estimates were the least variable, but were also the most biased. We see that the partial regression estimator provided the most favorable tradeoff between bias and variance. It is important to note that the nonparametric and semiparametric models both contain the parametric model as a special case, and so are not incorrect models for the simulated data. What is evident, however, is that there may be considerable gains to be made if one is willing to model the distribution of the predictor, X.

4.3 Bootstrap Percentile Confidence Interval Widths and Coverages

The performance of 95% bootstrap percentile confidence intervals for β_2 was examined by generating 500 data sets at the Nurses' parameter estimates and computing bootstrap intervals based on 1000 with-replacement samples. Empirical coverage probabilities and median confidence interval lengths for the 500 intervals are given in Table 2. We see that only the confidence intervals for the partial regression estimator provided both accurate coverage and reasonable length. Further simulations showed that as the sample size increases, the performance of the nonparametric and semiparametric estimators approaches that of the partial regression estimator. Much of the poor performance of the nonparametric and semiparametric methods at moderate values of n appears to be due to the highly skewed sampling distributions of the estimators at those sample sizes.
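For concreteness, here is a minimal version of the percentile bootstrap used in Sections 3.2 and 4.3 (our sketch, shown for the naive OLS estimate of β_2; the same subject-level resampling applies to any of the estimators).

```python
import numpy as np

def bootstrap_percentile_ci(Y, W, stat, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for stat(Y, W), resampling whole subjects
    (Y_i, W_i1, W_i2) with replacement."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = stat(Y[idx], W[idx])
    alpha = 1.0 - level
    return tuple(np.quantile(draws, [alpha / 2, 1.0 - alpha / 2]))

# Example, using the sketches above:
# beta2_ols = lambda y, w: naive_ols(y, w)[2]
# print(bootstrap_percentile_ci(Y, W, beta2_ols))
```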

4.4 Comparisons to Maximum Likelihood Estimation

Simulations were carried out to compare the performance of the partial regression estimator to the MLE. For normally distributed ε, both methods had nearly identical means and standard deviations for all parameters except the estimates of σ_ε^2, for which the MLE had less than half the variance of the method-of-moments estimator. To explore the robustness of both methods to skewed errors, the simulation was repeated with errors generated from an exponential distribution, shifted and scaled to have mean zero and variance equal to that used in the prior simulation. Ratios of the estimated mean squared errors for both simulations are given in Table 3. The estimates from the partial regression fits were virtually unchanged, but the MLE was considerably less reliable in the presence of exponential errors.

5 GENERALIZATIONS AND CONCLUDING REMARKS

The methods and results of this paper are easily extended to general estimating functions. In the additive error case, a series of works by Stefanski (1989), Nakamura (1990), Carroll et al. (1995), and Buzas and Stefanski (1996) has established the method of corrected estimating equations. Under various guises, the basic idea is that in some cases an estimating function Ψ(Y, X, Z, B) can be expanded as a polynomial

Ψ(Y, X, Z, B) = Σ_{j=0}^∞ Ψ_j(Y, Z, B) X^j.

For the special structure of the additive model, expansions can be done either in powers of X as above, powers of exp(X), or combinations of the two. For the multiplicative model, expanding in powers of X is most convenient. Note that this is equivalent to first replacing X by its logarithm, thus obtaining an additive model, and then expanding the estimating function in terms of powers of exponentials of the logarithm. For the multiplicative model, if the moments of U are known, then under appropriate regularity conditions relating to convergence of the sum, an unbiased estimating function for B is

Ψ_UB(Y, W, Z, B) = Σ_{j=0}^∞ Ψ_j(Y, Z, B) W^j / c_j,

where c_j is the jth moment of U. For instance, it is easily seen that for the polynomial regression model, the estimating equations for the nonparametric and semiparametric estimators are of this form up to the nuisance parameters (m_1, ..., m_{2p})^t and σ_u^2 respectively, where m_k = E(U^k).

The general equivalent of the parametric approach is described briefly as follows. Suppose that we can expand both the mean and variance of Y in powers of X, so that

E(Y | X, Z, B) = Σ_{j=0}^∞ d_j(Z, B) X^j;    Var(Y | X, Z, B) = Σ_{j=0}^∞ e_j(Z, B) X^j.    (1)

Then, provided that the following sums converge, we have

E(Y | W, Z, B) = Σ_{j=0}^∞ d_j(Z, B) v_j,
Var(Y | W, Z, B) = Σ_{j=0}^∞ e_j(Z, B) v_j + Σ_{i=0}^∞ Σ_{j=0}^∞ d_i(Z, B) d_j(Z, B)(v_{i+j} - v_i v_j),    (2)

where v_j = E(X^j | W, Z). If we assume a parametric distribution for X and U, the v_j are known up to parameters and we can estimate B via ordinary quasilikelihood (generalized least squares). In our formulation of the partial regression estimator for polynomial regression, we did not specify a model for Var(Y | X, Z, B), but rather worked only with E(Y | X, Z, B). Since we are not directly specifying a variance model, for the purposes of estimation we have computed the ordinary least squares estimate of B, given estimates of the v_j. This is in effect a solution to a generalized estimating equation with a homoscedastic working variance function (Zeger et al., 1988). Modeling the variance of Y given (X, Z) as in (1) and using (2) as the observed variance function may lead to a more efficient estimator, but as seen in Figure 4, our working parametric solution is already reasonably efficient relative to the nonparametric and semiparametric estimators. We do wish to reemphasize, however, that the gains in efficiency come from correctly modeling the distribution of X.

In this paper we have considered two general approaches to fitting polynomial regression models in the presence of multiplicative measurement error in the predictor. The approaches differed in that for one we did not make any distributional assumptions for the predictor beyond the usual i.i.d. assumption, and for the other we assumed a distributional form. In our analysis we found that the latter approach, though less robust, can in some cases lead to a substantial increase in efficiency, particularly for small to moderate sample sizes. We also found that these gains in efficiency increase with the degree of the measurement error. Much of the gain in efficiency appears to be due to the slow convergence to normality of the less parametric approach.

ACKNOWLEDGMENTS

The authors wish to thank Suojin Wang for his generous and helpful comments during the preparation of this article. Carroll's research was supported by a grant from the National Cancer Institute and by the Texas A&M Center for Environmental and Rural Health via a grant from the National Institute of Environmental Health Sciences. Carroll's research was partially completed while visiting the Institut für Statistik und Ökonometrie, Sonderforschungsbereich 373, Humboldt-Universität zu Berlin, with partial support from a senior Alexander von Humboldt Foundation research award.

APPENDIX

A.1 Justification of Measurement Error Diagnostics

For the additive model, Cov(W_1 - W_2, W_1 + W_2) = E{(U_1 - U_2)(U_1 + U_2)}, which is ∫∫ (s - t)(s + t) f_{U_1}(s) f_{U_2}(t) ds dt. By a change of variable, this is ∫∫ (s + r)(s - r) f_{U_1}(s) f_{U_2}(r) ds dr, which is 0. Similarly, for the multiplicative model, Cov{log(W_1) - log(W_2), log(W_1) + log(W_2)} is 0.

A.2 Estimating Functions

A function Ψ(Y, X, B) is an unbiased estimating function for B if E{Ψ(Y, X, B)} = 0. Given such a function, Ψ(·), one possible estimator for B is the solution, B̂, of n^{-1} Σ_{i=1}^n Ψ(Y_i, X_i, B) = 0. Under a set of mild regularity conditions on Ψ, one can show that B̂ is a consistent estimator of B. The limiting distribution of B̂ can be found by taking a first-order Taylor series approximation of n^{-1} Σ_{i=1}^n Ψ(Y_i, X_i, B̂) about B, and then applying Slutsky's Theorem and the central limit theorem. One finds that asymptotically n^{1/2}(B̂ - B) has mean 0 and covariance A^{-1} B A^{-t}, where A = E{(∂/∂B^t)Ψ}, B = E{Ψ(Y, X, B)Ψ^t(Y, X, B)}, and A^{-t} = (A^{-1})^t. Further details can be found in Carroll et al. (1995).

A.3 Justification for the Nonparametric Estimator of m_k

First note that E{(W_{ij}/W_{il})^k} = E[exp{k(V_{ij} - V_{il})}] = E[exp{k(V_{ij} + V_{il})}], because V_1, V_2 independent, mean zero, and symmetric gives us that V_1 - V_2 and V_1 + V_2 are equal in distribution. The result then follows from noting that E[exp{k(V_{ij} + V_{il})}] is the square of E(U^k), and there are r_i(r_i - 1) such terms in the sum for each i.

A.4 Method-of-Moments Estimator of σ_ε^2

Taking R to be Y - β_0 - β_1 W̄/c_1 - β_2 W̄^2/c_2 = ε + β_1 X(1 - Ū/c_1) + β_2 X^2(1 - Ū^2/c_2), we have that E(R^2) = Var(R) is given by

σ_ε^2 + β_1^2 E(X^2) E{(1 - Ū/c_1)^2} + β_2^2 E(X^4) E{(1 - Ū^2/c_2)^2} + 2β_1 β_2 E(X^3) E{(1 - Ū/c_1)(1 - Ū^2/c_2)}
  = σ_ε^2 + β_1^2 µ_2 (c_2/c_1^2 - 1) + β_2^2 µ_4 (c_4/c_2^2 - 1) + 2β_1 β_2 µ_3 {c_3/(c_1 c_2) - 1}.

Replacing the β_k, µ_k, and c_k with estimated values and solving for σ_ε^2 gives us the MOM estimator.
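A sketch (ours) of the sandwich covariance A^{-1} B A^{-t} of A.2, using central differences to approximate A; it assumes psi(B, *data) returns an n x q matrix of per-observation estimating-function values, as in the earlier sketches.

```python
import numpy as np

def sandwich_cov(psi, B_hat, *data, eps=1e-6):
    """Estimate Cov(B_hat) as A^{-1} B A^{-t} / n, where A = E{dPsi/dB^t} is
    approximated by central differences and B = E{Psi Psi^t} by a sample average."""
    Psi0 = psi(B_hat, *data)                          # n x q
    n, q = Psi0.shape
    A = np.empty((q, q))
    for j in range(q):
        step = np.zeros(q)
        step[j] = eps
        A[:, j] = (psi(B_hat + step, *data).mean(axis=0)
                   - psi(B_hat - step, *data).mean(axis=0)) / (2.0 * eps)
    Bmat = (Psi0[:, :, None] * Psi0[:, None, :]).mean(axis=0)
    Ainv = np.linalg.inv(A)
    return Ainv @ Bmat @ Ainv.T / n

# Example: asymptotic standard errors for the corrected fit from the earlier sketch.
# cov = sandwich_cov(psi_corrected, fit.x, Y, wbar, c)
# print(np.sqrt(np.diag(cov)))
```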

REFERENCES

Buzas, J. S. and Stefanski, L. A. (1996). A note on corrected score estimation. Statistics & Probability Letters, 28, 1-8.
Carroll, R. J., Ruppert, D. and Stefanski, L. A. (1995). Measurement Error in Nonlinear Models. Chapman and Hall, London.
Fuller, W. A. (1987). Measurement Error Models. John Wiley and Sons, New York.
Hwang, J. T. (1986). Multiplicative errors in variables models with applications to the recent data released by the U.S. Department of Energy. Journal of the American Statistical Association, 81.
Nakamura, T. (1990). Corrected score function for errors-in-variables models: Methodology and application to generalized linear models. Biometrika, 77, 1.
Rosner, B., Willett, W. C. and Spiegelman, D. (1989). Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Statistics in Medicine, 8.
Stefanski, L. A. (1989). Unbiased estimation of a nonlinear function of a normal mean with application to measurement error models. Communications in Statistics, Series A, 18.
Wang, N., Lin, X., Gutierrez, R. G. and Carroll, R. J. (1998). Generalized linear mixed measurement error models. Journal of the American Statistical Association, 93.
Zeger, S. L., Liang, K. and Albert, P. S. (1988). Models for longitudinal data: A generalized estimating equation approach. Biometrics, 44.

Table 1: Summary statistics for β̂_2 (β_2 = .029). Rows: NP, SP, PR, OLS; columns: median, MAD, sqrt(MSE).

Table 2: Simulated bootstrap confidence interval coverages and median lengths, n = 168. Rows: coverage, median length; columns: NP, SP, PR, OLS.

Table 3: Ratios of the PR to ML estimated MSEs for all parameters (β_0, β_1, β_2, µ_x, σ_x, σ_u, σ_ε), for normal ε and for exponential ε.

Figure 1: Least squares quadratic fit of average energy against average vitamin A for the Nurses' data.

Figure 2: Plots for two simulated data sets: (a) Y vs. X for the linear model, (b) Y vs. W for the linear model, (c) Y vs. X for the quadratic model, (d) Y vs. W for the quadratic model.

Figure 3: Measurement error diagnostics for the Nurses' data (untransformed: W1 - W2 vs. W1 + W2; transformed: log(W1) - log(W2) vs. log(W1) + log(W2)).

Figure 4: ARE of the PR estimator relative to the semiparametric and nonparametric estimators, plotted against C.V.(U)/C.V.(X), for the Nurses' data.

Figure 5: Q-Q plots of log(W1/W2) and log(W1) + log(W2) against normal quantiles for the Nurses' data.

Figure 6: Nonparametric, semiparametric, partial regression, and OLS fits (energy vs. vitamin A) for the Nurses' data.


More information

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter

More information

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

More information

Chapter 17. Failure-Time Regression Analysis. William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University

Chapter 17. Failure-Time Regression Analysis. William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University Chapter 17 Failure-Time Regression Analysis William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University Copyright 1998-2008 W. Q. Meeker and L. A. Escobar. Based on the authors

More information

The Problem of Modeling Rare Events in ML-based Logistic Regression s Assessing Potential Remedies via MC Simulations

The Problem of Modeling Rare Events in ML-based Logistic Regression s Assessing Potential Remedies via MC Simulations The Problem of Modeling Rare Events in ML-based Logistic Regression s Assessing Potential Remedies via MC Simulations Heinz Leitgöb University of Linz, Austria Problem In logistic regression, MLEs are

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

The propensity score with continuous treatments

The propensity score with continuous treatments 7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.

More information

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Multicollinearity Read Section 7.5 in textbook. Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Example of multicollinear

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Covariate Balancing Propensity Score for General Treatment Regimes

Covariate Balancing Propensity Score for General Treatment Regimes Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1.

(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1. Problem 1 (21 points) An economist runs the regression y i = β 0 + x 1i β 1 + x 2i β 2 + x 3i β 3 + ε i (1) The results are summarized in the following table: Equation 1. Variable Coefficient Std. Error

More information

A noninformative Bayesian approach to domain estimation

A noninformative Bayesian approach to domain estimation A noninformative Bayesian approach to domain estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu August 2002 Revised July 2003 To appear in Journal

More information

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Previous lecture P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Interaction Outline: Definition of interaction Additive versus multiplicative

More information

Misclassification in Logistic Regression with Discrete Covariates

Misclassification in Logistic Regression with Discrete Covariates Biometrical Journal 45 (2003) 5, 541 553 Misclassification in Logistic Regression with Discrete Covariates Ori Davidov*, David Faraggi and Benjamin Reiser Department of Statistics, University of Haifa,

More information

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);

More information

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University A SURVEY OF VARIANCE COMPONENTS ESTIMATION FROM BINARY DATA by Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University BU-1211-M May 1993 ABSTRACT The basic problem of variance components

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information

Chapter 2: Resampling Maarten Jansen

Chapter 2: Resampling Maarten Jansen Chapter 2: Resampling Maarten Jansen Randomization tests Randomized experiment random assignment of sample subjects to groups Example: medical experiment with control group n 1 subjects for true medicine,

More information

Penalized Splines, Mixed Models, and Recent Large-Sample Results

Penalized Splines, Mixed Models, and Recent Large-Sample Results Penalized Splines, Mixed Models, and Recent Large-Sample Results David Ruppert Operations Research & Information Engineering, Cornell University Feb 4, 2011 Collaborators Matt Wand, University of Wollongong

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

Efficient Robbins-Monro Procedure for Binary Data

Efficient Robbins-Monro Procedure for Binary Data Efficient Robbins-Monro Procedure for Binary Data V. Roshan Joseph School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 30332-0205, USA roshan@isye.gatech.edu SUMMARY

More information

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Christopher Dougherty London School of Economics and Political Science

Christopher Dougherty London School of Economics and Political Science Introduction to Econometrics FIFTH EDITION Christopher Dougherty London School of Economics and Political Science OXFORD UNIVERSITY PRESS Contents INTRODU CTION 1 Why study econometrics? 1 Aim of this

More information

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal

More information