Nonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs

1 Nonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs
Xiao-Hua Zhou, Huazhen Lin, Eric Johnson. Journal of the Royal Statistical Society, Series B (2008)
Scott Coggeshall, June 6,

2 Background and Motivation: Health Care Cost Data
Key component of risk assessment models used in the insurance and health care industries
Requires prediction of a patient's health care cost on the original scale
Let Y be a patient's health care cost and X be a vector of patient characteristics and previous health states.
Goal: Given a patient's covariate vector x, can we accurately predict µ(x) = E[Y | X = x]?

3 Health Care Cost Data
What we'd like to have...
[Figure: histogram titled "Distribution of Health Care Costs"; y-axis: Frequency]

4 Health Care Cost Data
And what we actually have
[Figure, from Blough and Ramsey, Figure 1: Histogram of expenditures for diabetic patients in the first year following a stroke (n = 774).]

5 Health Care Cost Data
Skewed distribution
Heteroscedasticity
Estimates of µ(x) can vary widely depending on how estimators handle these aspects of the data

6 Transformation: A Common Approach
Suppose we observe a patient's health care cost Y and a vector of patient characteristics X
A common approach is to fit a linear model to a transformation of the data:
$$H(Y) = X^\top \beta + \epsilon$$
Is H(Y) actually of interest?
Does the model tell us anything about the data on the original scale?

7 Transformation Bias
Suppose we fit a linear model on the transformed scale
Bias is often introduced when retransforming. In general,
$$E[H^{-1}(H(Y)) \mid X] \neq H^{-1}\big(E[H(Y) \mid X]\big)$$
How do we get an unbiased estimate on the original scale?
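
A worked special case makes the inequality concrete. The choices below (H = log and normal errors with variance $s^2$) are assumptions made for illustration and are not taken from the slides.

```latex
% Assumed for illustration: H(y) = \log y and \epsilon \sim N(0, s^2).
\begin{align*}
\log Y &= X^\top \beta + \epsilon \\
E[Y \mid X] &= \exp(X^\top \beta)\, E[\exp(\epsilon)]
             = \exp\!\left(X^\top \beta + \tfrac{s^2}{2}\right) \\
H^{-1}\big(E[H(Y) \mid X]\big) &= \exp(X^\top \beta)
\end{align*}
% The naive retransformation is too small by the factor \exp(s^2 / 2) > 1.
```

In this case naive retransformation underestimates the mean cost by a constant factor, and the factor grows with the error variance.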

8 Duan's Smearing Estimator
Assume H is known and data are homoscedastic
Fit linear model on transformed scale to obtain parameter estimate $\hat\beta$ and residuals $\hat\epsilon_i$
Unbiased estimate on original scale is guaranteed by taking expectation with respect to the empirical distribution $\hat F$ of the residuals:
$$\hat E[Y_0 \mid X = x_0] = \int H^{-1}(x_0^\top \hat\beta + \epsilon)\, d\hat F(\epsilon) = \frac{1}{n} \sum_{i=1}^{n} H^{-1}(x_0^\top \hat\beta + \hat\epsilon_i)$$
Suppose we want:
Robustness to model misspecification?
Ability to handle heteroscedasticity?
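
The smearing formula above can be sketched in a few lines. This is a minimal illustration under assumptions that are not in the slides: H is taken to be the natural log, there is a single covariate plus an intercept, and the data are simulated with made-up coefficients. The helper names `fit_simple_ols` and `duan_smearing` are hypothetical.

```python
import math
import random


def fit_simple_ols(x, z):
    """Least-squares intercept and slope for z = b0 + b1 * x + error."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    return z_bar - b1 * x_bar, b1


def duan_smearing(x, y, x0, H=math.log, H_inv=math.exp):
    """Return (smearing estimate, naive retransformed estimate) of E[Y | X = x0]."""
    z = [H(yi) for yi in y]                      # work on the transformed scale
    b0, b1 = fit_simple_ols(x, z)
    resid = [zi - (b0 + b1 * xi) for xi, zi in zip(x, z)]
    # Average H^{-1} over the empirical distribution of the residuals.
    smeared = sum(H_inv(b0 + b1 * x0 + e) for e in resid) / len(resid)
    naive = H_inv(b0 + b1 * x0)                  # ignores the residual distribution
    return smeared, naive


# Simulated homoscedastic example (coefficients 1.0 and 0.5 are illustrative).
random.seed(0)
n = 5000
x = [random.uniform(0, 2) for _ in range(n)]
y = [math.exp(1.0 + 0.5 * xi + random.gauss(0, 1)) for xi in x]
smeared, naive = duan_smearing(x, y, x0=1.0)
true_mean = math.exp(1.0 + 0.5 * 1.0 + 0.5)      # log-normal mean at x0 = 1
print(smeared, naive, true_mean)
```

In this setting the naive estimate falls well below the true mean, while the smeared estimate lands close to it.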

9 Extending Duan's Smearing Estimator
Proposed model:
$$H(Y) = X^\top \beta + \sigma(X^\top \gamma)\,\epsilon$$
Knowns: $\sigma(\cdot)$; $E[\epsilon] = 0$, $\mathrm{Var}[\epsilon] = 1$
Unknowns: $H(\cdot)$; $\beta$, $\gamma$; CDF $F$ of $\epsilon$
Approach:
Estimate H via kernel estimation
Estimate β and γ via estimating equations

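10 Estimating β and γ
Authors propose set of estimating equations:
$$\sum_{i=1}^{n} \frac{\big(H(Y_i) - X_i^\top \beta\big)\, X_i}{\sigma^2(X_i^\top \gamma)} = 0$$
and
$$\sum_{i=1}^{n} \Big\{ \big(H(Y_i) - X_i^\top \beta\big)^2 - \sigma^2(X_i^\top \gamma) \Big\}\, X_i = 0$$
Benefits:
Closed-form solution for β (given H and γ, the first equation is weighted least squares)
Drawbacks:
No closed-form solution for γ
Newton-Raphson implementation will vary depending on form of σ
Mean-variance relationship?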
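
The closed-form step for β can be sketched as follows. This is an illustration under assumptions: the covariate vector is X_i = (1, x_i), the transformed responses h_i = H(Y_i) are treated as given, and σ is the identity (the form used in the simulation slide). The function names are hypothetical, and the second function only evaluates the left-hand side of the γ equation; it does not solve it.

```python
def solve_2x2(a11, a12, a22, b1, b2):
    """Solve the symmetric system [[a11, a12], [a12, a22]] @ beta = (b1, b2)."""
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def beta_closed_form(h, x, sigma2):
    """Solve sum_i (h_i - X_i' beta) X_i / sigma2_i = 0 with X_i = (1, x_i)."""
    w = [1.0 / s for s in sigma2]                # weights 1 / sigma^2(X_i' gamma)
    a11 = sum(w)
    a12 = sum(wi * xi for wi, xi in zip(w, x))
    a22 = sum(wi * xi * xi for wi, xi in zip(w, x))
    b1 = sum(wi * hi for wi, hi in zip(w, h))
    b2 = sum(wi * xi * hi for wi, xi, hi in zip(w, x, h))
    return solve_2x2(a11, a12, a22, b1, b2)


def gamma_equation(h, x, beta, gamma, sigma=lambda z: z):
    """Left-hand side of sum_i {(h_i - X_i' beta)^2 - sigma^2(X_i' gamma)} X_i."""
    s0 = s1 = 0.0
    for hi, xi in zip(h, x):
        gap = (hi - beta[0] - beta[1] * xi) ** 2 - sigma(gamma[0] + gamma[1] * xi) ** 2
        s0 += gap            # component for the intercept
        s1 += gap * xi       # component for the covariate
    return s0, s1


# Noise-free check: h is exactly linear in x, so beta is recovered for any weights.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
h = [2.0 + 3.0 * xi for xi in x]
sigma2 = [(0.4 + 0.35 * xi) ** 2 for xi in x]
beta = beta_closed_form(h, x, sigma2)
score = gamma_equation(h, x, beta, gamma=(0.0, 0.0))
print(beta, score)
```

A root-finder such as Newton-Raphson would then be applied to `gamma_equation`, and its Jacobian depends on the chosen σ, which is the drawback noted on the slide.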

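11 Estimating H
Note that Y depends on X through the indices $Z_1 = X^\top \beta$ and $Z_2 = X^\top \gamma$
Under the model, we have the following relationship between the conditional CDF of Y, $G(y \mid z_1, z_2)$, and the unknown CDF of the error term, F:
$$G(y \mid z_1, z_2) = F\!\left(\frac{H(y) - z_1}{\sigma(z_2)}\right)$$
Taking derivatives with respect to y and $z_1$ yields
$$p(y \mid z_1, z_2) = f\!\left(\frac{H(y) - z_1}{\sigma(z_2)}\right) \frac{H'(y)}{\sigma(z_2)}$$
and
$$g_1(y \mid z_1, z_2) = f\!\left(\frac{H(y) - z_1}{\sigma(z_2)}\right) \frac{1}{\sigma(z_2)}$$
where f is the density of F, p is the conditional density of Y, and $g_1 = -\partial G / \partial z_1$ (the sign is flipped so that $g_1$ is positive).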

12 Estimating H continued
These derivatives give us the relationship between $p(y \mid z_1, z_2)$ and $g_1(y \mid z_1, z_2)$:
$$p(y \mid z_1, z_2) = g_1(y \mid z_1, z_2)\, H'(y)$$
By replacing $z_1, z_2$ with $Z_{1i}, Z_{2i}$ and summing over all observations, we obtain
$$H'(y) = \frac{\sum_{i} p(y \mid Z_{1i}, Z_{2i})\, p(Z_{1i}, Z_{2i})}{\sum_{i} g_1(y \mid Z_{1i}, Z_{2i})\, p(Z_{1i}, Z_{2i})}$$
Integrating both sides from $y_0$ (which fixes $H(y_0) = 0$) yields an expression for H:
$$H(y) = \int_{y_0}^{y} \frac{\sum_{i} p(u \mid Z_{1i}, Z_{2i})\, p(Z_{1i}, Z_{2i})}{\sum_{i} g_1(u \mid Z_{1i}, Z_{2i})\, p(Z_{1i}, Z_{2i})}\, du$$
Estimator for H is given by replacing the unknown functions $p(y \mid z_1, z_2)$, $g_1(y \mid z_1, z_2)$, $p(z_1, z_2)$ with estimates obtained through kernel estimation

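13 Estimating H continued
$$p_n(y \mid z_1, z_2) = \frac{1}{n h_0 h_1 h_2\, p_n(z_1, z_2)} \sum_{i=1}^{n} K_0\!\left(\frac{Y_i - y}{h_0}\right) K_1\!\left(\frac{Z_{1i} - z_1}{h_1}\right) K_2\!\left(\frac{Z_{2i} - z_2}{h_2}\right)$$
$$G_n(y \mid z_1, z_2) = \frac{1}{n h_1 h_2\, p_n(z_1, z_2)} \sum_{i=1}^{n} I(Y_i \le y)\, K_1\!\left(\frac{Z_{1i} - z_1}{h_1}\right) K_2\!\left(\frac{Z_{2i} - z_2}{h_2}\right)$$
$$p_n(z_1, z_2) = \frac{1}{n h_1 h_2} \sum_{i=1}^{n} K_1\!\left(\frac{Z_{1i} - z_1}{h_1}\right) K_2\!\left(\frac{Z_{2i} - z_2}{h_2}\right)$$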
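
As a rough sketch of the conditional CDF estimator $G_n$, the code below uses a product kernel over the two indices. One simplification is an assumption of this sketch, not the slides' choice: the biweight kernel is used for both index factors, whereas the slides use higher-order kernels for K1 and K2. The function names are hypothetical. The factor $1/(n h_1 h_2)$ cancels between numerator and $p_n(z_1, z_2)$, so $G_n$ reduces to a ratio of kernel-weighted sums.

```python
def biweight(u):
    """Biweight kernel 15/16 * (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def conditional_cdf(y, z1, z2, Y, Z1, Z2, h1, h2, kernel=biweight):
    """Kernel estimate of P(Y <= y | Z1 = z1, Z2 = z2)."""
    num = 0.0   # kernel-weighted count of observations with Y_i <= y
    den = 0.0   # kernel-weighted count of all observations near (z1, z2)
    for yi, a, b in zip(Y, Z1, Z2):
        w = kernel((a - z1) / h1) * kernel((b - z2) / h2)
        den += w
        if yi <= y:
            num += w
    if den == 0.0:
        raise ValueError("no observations within the bandwidths")
    return num / den


# Tiny symmetric example: the weights mirror around z1 = 0.15.
Y = [1.0, 2.0, 3.0, 4.0]
Z1 = [0.0, 0.1, 0.2, 0.3]
Z2 = [0.0, 0.0, 0.0, 0.0]
g_mid = conditional_cdf(2.5, 0.15, 0.0, Y, Z1, Z2, h1=1.0, h2=1.0)
g_low = conditional_cdf(0.0, 0.15, 0.0, Y, Z1, Z2, h1=1.0, h2=1.0)
g_high = conditional_cdf(10.0, 0.15, 0.0, Y, Z1, Z2, h1=1.0, h2=1.0)
print(g_low, g_mid, g_high)
```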

14 Kernels
Kernels taken from Muller (1984)
$$K_0(x) = \frac{15}{16}\,\big(1 - 2x^2 + x^4\big)$$
K_1, K_2 = 315 ( … x^2 … x^4 … 396x^6 … x^8 )
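
A quick check on $K_0$, assuming (as is standard for this kernel, though the slide does not state it) that its support is $[-1, 1]$: it is the biweight kernel and integrates to one.

```latex
% Assumption: K_0 is supported on [-1, 1].
\begin{align*}
K_0(x) &= \tfrac{15}{16}\,(1 - 2x^2 + x^4) = \tfrac{15}{16}\,(1 - x^2)^2 \\
\int_{-1}^{1} (1 - 2x^2 + x^4)\, dx &= 2\left(1 - \tfrac{2}{3} + \tfrac{1}{5}\right) = \tfrac{16}{15} \\
\int_{-1}^{1} K_0(x)\, dx &= \tfrac{15}{16} \cdot \tfrac{16}{15} = 1
\end{align*}
```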

15 Kernels
[Figure: plot of kernels K0 and K1]

16 Final Algorithm
Note interdependence of $\hat H$ and $\hat\beta$, $\hat\gamma$
Iterative algorithm combines the two estimation procedures:
1. Select initial values of H and β
2. Estimate γ
3. Re-estimate H given current β and γ
4. Re-estimate β and γ given current H
5. Repeat Steps 3 and 4 until convergence
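
The control flow of Steps 3 to 5 can be sketched as a generic alternating loop. This is only a skeleton: `update_h` and `update_params` are hypothetical callbacks standing in for the kernel step and the estimating-equation step, the stopping rule (largest absolute parameter change below `tol`) is an assumption, and the toy updates at the bottom are placeholders with a known fixed point, not the estimators from the paper.

```python
def alternate(update_h, update_params, params0, tol=1e-8, max_iter=200):
    """Alternate Steps 3 and 4 until the parameters stop changing (Step 5)."""
    params = params0                      # output of Steps 1 and 2
    for iteration in range(1, max_iter + 1):
        h = update_h(params)              # Step 3: re-estimate H given current parameters
        new_params = update_params(h)     # Step 4: re-estimate parameters given current H
        change = max(abs(a - b) for a, b in zip(new_params, params))
        params = new_params
        if change < tol:
            return h, params, iteration
    raise RuntimeError("no convergence within max_iter iterations")


# Toy stand-ins whose joint fixed point is h = 2, params = (2,).
def toy_update_h(params):
    return 0.5 * params[0] + 1.0


def toy_update_params(h):
    return (0.5 * h + 1.0,)


h_hat, params_hat, n_iter = alternate(toy_update_h, toy_update_params, params0=(0.0,))
print(h_hat, params_hat, n_iter)
```

In a real implementation `h` would be a function (or a table of values of $\hat H$ at the observed costs) and `params` would hold $\hat\beta$ and $\hat\gamma$.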

17 Asymptotic Behavior
Authors show that
$\sqrt{n}\,\big(\hat H(y) - H(y)\big)$ is asymptotically normal
$\sqrt{n}\,(\hat\beta - \beta)$ is asymptotically normal
$\sqrt{n}\,(\hat\gamma - \gamma)$ is asymptotically normal
Asymptotic covariances are complicated and depend on other unknown functions
More things to estimate!

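18 Getting Estimate on Original Scale
Given final estimates $\hat H$, $\hat\beta$, and $\hat\gamma$, estimate on original scale is
$$\hat\mu(x) = \frac{1}{n} \sum_{i=1}^{n} \hat H^{-1}\!\left( x^\top \hat\beta + \sigma(x^\top \hat\gamma)\, \frac{\hat H(Y_i) - X_i^\top \hat\beta}{\sigma(X_i^\top \hat\gamma)} \right)$$
Compare with Duan's smearing estimator:
$$\hat\mu(x) = \frac{1}{n} \sum_{i=1}^{n} H^{-1}\big(x^\top \hat\beta + \hat\epsilon_i\big)$$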
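
The heteroscedastic version of the smearing step can be sketched as below. The sketch assumes the estimates are already in hand: `H` and `H_inv` are passed in as callables (the log and exp used in the example are placeholders, not the kernel estimate $\hat H$), σ defaults to the identity, and `predict_mean` is a hypothetical name.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def predict_mean(x0, X, Y, beta, gamma, H, H_inv, sigma=lambda z: z):
    """Average H^{-1}(x0'beta + sigma(x0'gamma) * eps_i) over standardized residuals."""
    eps = [(H(yi) - dot(xi, beta)) / sigma(dot(xi, gamma)) for xi, yi in zip(X, Y)]
    loc = dot(x0, beta)                   # location at the target covariates
    scale = sigma(dot(x0, gamma))         # scale at the target covariates
    return sum(H_inv(loc + scale * e) for e in eps) / len(eps)


# Constructed example: residuals are exactly -1, 0, 1, so the answer is known.
beta = (1.0, 0.5)
gamma = (0.5, 0.25)
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
eps_true = [-1.0, 0.0, 1.0]
Y = [math.exp(dot(xi, beta) + dot(xi, gamma) * e) for xi, e in zip(X, eps_true)]
mu_hat = predict_mean((1.0, 1.0), X, Y, beta, gamma, H=math.log, H_inv=math.exp)
expected = (math.exp(0.75) + math.exp(1.5) + math.exp(2.25)) / 3.0
print(mu_hat, expected)
```

Setting σ to a constant recovers Duan's estimator up to the scaling of the residuals, which is the comparison the slide draws.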

19 Simulations: Setup
Generate data according to model
$$H(Y) = X^\top \beta + (X^\top \gamma)\,\epsilon$$
where
$\beta = (-1.8, 1.4, 1.4)$, $\gamma = (0.4, 0.35, 0)$
$X_1 \sim \mathrm{Bernoulli}(0.5)$, $X_2 \sim \mathrm{Unif}(0, 2)$, $\epsilon \sim N(0, 1)$
H is related to Y via $H(y) = \Phi^{-1}(\exp(y - 10))$
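
A data generator for this setup might look like the sketch below. Several details are assumptions because the slide text is damaged: the covariate vector is taken to be X = (1, X1, X2), the first entry of β is taken to be negative, and the transformation is read as $H(y) = \Phi^{-1}(\exp(y - 10))$, which inverts to $y = 10 + \log \Phi(h)$. The function name `simulate` is hypothetical.

```python
import math
import random
from statistics import NormalDist

BETA = (-1.8, 1.4, 1.4)    # assumed sign on the first entry
GAMMA = (0.4, 0.35, 0.0)


def simulate(n, seed=0):
    """Draw n pairs (x, y) from H(Y) = X'beta + (X'gamma) * eps with eps ~ N(0, 1)."""
    rng = random.Random(seed)
    phi = NormalDist().cdf                 # standard normal CDF
    rows = []
    for _ in range(n):
        x1 = 1.0 if rng.random() < 0.5 else 0.0    # Bernoulli(0.5)
        x2 = rng.uniform(0.0, 2.0)                 # Unif(0, 2)
        x = (1.0, x1, x2)
        mean = sum(b * v for b, v in zip(BETA, x))
        scale = sum(g * v for g, v in zip(GAMMA, x))
        h = mean + scale * rng.gauss(0.0, 1.0)     # value on the transformed scale
        y = 10.0 + math.log(phi(h))                # invert the assumed H
        rows.append((x, y))
    return rows


data = simulate(1000)
print(data[0])
```

Under this reading Y is bounded above by 10, so if the intended transformation differs, only the line that computes `y` needs to change.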

20 Simulations: Consistency of β estimates
[Figure: two panels of coefficient estimate (y-axis) against sample size (x-axis, 0 to 1e+05)]

21 Simulations: Consistency of γ estimates
[Figure: coefficient estimate (y-axis) against sample size (x-axis ticks from 0 to 1e+05)]

22 Simulations: Duan's Smearing Estimator vs. Proposed Estimator
x        µ(x)    Method      Bias
(0,1)            Proposed
                 Duan        0.06
(0,2)            Proposed
                 Duan        0.03
(1,1)            Proposed
                 Duan        0.03
(1,2)            Proposed    <
                 Duan

23 Discussion and Critique
Statistical contribution: extends previous methods to address issues commonly encountered in these types of data
Scientific contribution: provides more accurate estimation of health care costs
Implementation is slow
Approach is somewhat unintuitive
Explaining to non-statistical collaborators might be difficult
Do we really need to transform data?

24 Thanks for your time!
