GB2 Regression with Insurance Claim Severities Mitchell Wills, University of New South Wales Emiliano A. Valdez, University of New South Wales Edward W. (Jed) Frees, University of Wisconsin - Madison UNSW Actuarial Research Symposium 9 November 2006 University of New South Wales Sydney, Australia Wills, Valdez & Frees (UNSW/Wisconsin) GB2 Regression UNSW Res Symp 2006 1 / 20
Summary of talk

Purpose of this paper
- introduce the flexibility of the GB2 family for modelling long-tailed claims
- show how to inject regressor variables into the distribution
- give an early discussion of the empirical work done

Approaches to introducing covariates to loss models

Construction and properties of the GB2 family of distributions
- contains 4 parameters
- some well-known distributions lie within the family

Empirical work
- preliminary
- some future direction

Introducing covariates in loss models

Sun, J., Frees, E.W. and Rosenberg, M. (2006): discussion of Klugman, S. and Rioux, J. (2006), NAAJ paper "Toward a Unified Approach to Fitting Loss Models".

Possible approaches to introducing regressor variables:
- Normal regression models: some transformation introduced (e.g. log of response)
- Generalized Linear Models: exponential dispersion distributions (e.g. Gamma, Inverse Gaussian)
- Parametric survival models: e.g. Cox's PH model
- More flexible parametric distributions: regression introduced on the parameters

Construction of the GB2 distribution

Let $X \sim \mathrm{Gamma}(\gamma_1, 1)$ and $Y \sim \mathrm{Gamma}(\gamma_2, 1)$ be independent. Then
\[ Z = \beta \left( \frac{X}{Y} \right)^{1/\alpha} \]
is a r.v. with a GB2 distribution.

Four parameters: $\alpha \neq 0$ and $\beta, \gamma_1, \gamma_2 > 0$.

Density function:
\[ f_Z(z) = \frac{|\alpha| \, z^{\alpha\gamma_1 - 1} \, \beta^{\alpha\gamma_2}}{B(\gamma_1, \gamma_2) \, (\beta^{\alpha} + z^{\alpha})^{\gamma_1 + \gamma_2}}, \quad \text{for } z \geq 0, \]
where $B(\cdot, \cdot)$ is the usual Beta function.

Distribution function:
\[ F_Z(z) = B\left( \frac{(z/\beta)^{\alpha}}{1 + (z/\beta)^{\alpha}} ; \gamma_1, \gamma_2 \right), \]
where $B(\cdot \, ; \cdot, \cdot)$ is the incomplete Beta function.

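The construction above translates directly into a simulator, and the density can be checked against it. The following is a minimal sketch using only the Python standard library; the function names `gb2_pdf` and `gb2_sample` are illustrative and not from the talk, and the two gamma variables are assumed independent.

```python
import math
import random


def gb2_pdf(z, alpha, beta, g1, g2):
    """GB2 density at z > 0 (alpha != 0; beta, g1, g2 > 0); returns 0 for z <= 0."""
    if z <= 0:
        return 0.0
    # log of the Beta function B(g1, g2)
    log_beta_fn = math.lgamma(g1) + math.lgamma(g2) - math.lgamma(g1 + g2)
    log_f = (math.log(abs(alpha))
             + (alpha * g1 - 1.0) * math.log(z)
             + alpha * g2 * math.log(beta)
             - log_beta_fn
             - (g1 + g2) * math.log(beta ** alpha + z ** alpha))
    return math.exp(log_f)


def gb2_sample(n, alpha, beta, g1, g2, seed=0):
    """Draw n GB2 variates as beta * (X / Y) ** (1 / alpha), X and Y independent gammas."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x = rng.gammavariate(g1, 1.0)  # shape g1, scale 1
        y = rng.gammavariate(g2, 1.0)  # shape g2, scale 1
        draws.append(beta * (x / y) ** (1.0 / alpha))
    return draws
```

For example, `gb2_sample(20000, 2.0, 1.0, 1.0, 1.0)` draws from the log-logistic member of the family, whose median equals β, so the sample median should be close to 1.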
MGFs and moments

Moment generating function:
\[ M_Z(t) = \sum_{k=0}^{\infty} \frac{B\left(\gamma_1 + \frac{k}{\alpha}, \gamma_2 - \frac{k}{\alpha}\right)}{B(\gamma_1, \gamma_2)} \, \frac{t^k \beta^k}{k!}. \]

Moments:
\[ E(Z^n) = \beta^n \, \frac{B\left(\gamma_1 + \frac{n}{\alpha}, \gamma_2 - \frac{n}{\alpha}\right)}{B(\gamma_1, \gamma_2)}. \]

Mean:
\[ E(Z) = \beta \, \frac{B\left(\gamma_1 + \frac{1}{\alpha}, \gamma_2 - \frac{1}{\alpha}\right)}{B(\gamma_1, \gamma_2)}. \]

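The moment formula is easy to evaluate on the log scale. The sketch below is illustrative only (the name `gb2_moment` is not from the talk). It assumes the standard existence condition that both Beta-function arguments are positive, that is $-\gamma_1 < n/\alpha < \gamma_2$, and raises an error otherwise.

```python
import math


def log_beta(a, b):
    """log B(a, b) for a, b > 0."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def gb2_moment(n, alpha, beta, g1, g2):
    """E(Z^n) = beta^n * B(g1 + n/alpha, g2 - n/alpha) / B(g1, g2)."""
    a = g1 + n / alpha
    b = g2 - n / alpha
    if a <= 0 or b <= 0:
        # The Beta function is undefined here, so the moment does not exist.
        raise ValueError("moment of this order does not exist for these parameters")
    return beta ** n * math.exp(log_beta(a, b) - log_beta(g1, g2))
```

Two closed-form cases give a quick sanity check: with α = 2 and β = γ1 = γ2 = 1 (log-logistic) the mean is π/2, and with α = γ1 = 1 (Lomax) the mean is β / (γ2 - 1).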
Figure 1: GB2 density for varying parameters. Four panels, each plotting the GB2 density f(x) against x while one parameter varies: α = -2, -1, 1, 2; β = 1, 2, 3, 4; γ1 = 0.5, 1, 5, 10; γ2 = 2, 1.5, 1, 0.5.

Figure 2: Some special cases of GB2

GB2
- γ1 = 1: Burr XII
- α = 1: beta of the second kind (labelled "Burr II" on the slide)
- γ2 = 1: Burr III

Second level
- Pareto II (Lomax): Burr XII with α = 1, or beta of the second kind with γ1 = 1
- Log-logistic: Burr XII with γ2 = 1, or Burr III with γ1 = 1
- Inverse Lomax: Burr III with α = 1, or beta of the second kind with γ2 = 1

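As a worked check of one branch of Figure 2, the Burr XII and Lomax forms can be derived from the GB2 density on the construction slide. The derivation below is a sketch that is not in the slides; it assumes α > 0 and uses $B(1, \gamma_2) = 1/\gamma_2$ together with the closed form of the incomplete Beta function when its first parameter is 1.

```latex
\begin{align*}
f_Z(z)\big|_{\gamma_1 = 1}
  &= \frac{\alpha \, z^{\alpha - 1} \, \beta^{\alpha\gamma_2}}
          {B(1, \gamma_2) \, (\beta^{\alpha} + z^{\alpha})^{1 + \gamma_2}}
   = \frac{\alpha\gamma_2}{\beta}
     \left(\frac{z}{\beta}\right)^{\alpha - 1}
     \left[1 + \left(\frac{z}{\beta}\right)^{\alpha}\right]^{-(\gamma_2 + 1)}
   && \text{(Burr XII)} \\
F_Z(z)\big|_{\gamma_1 = 1}
  &= 1 - \left[1 + \left(\frac{z}{\beta}\right)^{\alpha}\right]^{-\gamma_2} \\
f_Z(z)\big|_{\gamma_1 = 1,\ \alpha = 1}
  &= \frac{\gamma_2}{\beta}
     \left(1 + \frac{z}{\beta}\right)^{-(\gamma_2 + 1)}
   && \text{(Pareto II / Lomax)}
\end{align*}
```

Setting γ2 = 1 in the first line instead of α = 1 gives the log-logistic density in the same way.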
Some empirical work on GB2

Income or wealth distributions
- McDonald (1984)
- Butler and McDonald (1989)
- McDonald and Mantrala (1993, 1995)
- Bordley and McDonald (1993)
- McDonald and Xu (1995)

Unemployment duration
- McDonald and Butler (1987)

Insurance loss
- Cummins, Dionne, McDonald and Pritchett (1990): fire losses, published in Insurance: Mathematics & Economics

Regression models for GB2 variables

Assumption: $x$ is a vector of $m$ known covariates.

Possible approaches:
- through the scale parameter β: $\beta(x) = \exp(\theta' x)$
- through the shape parameter α: $\alpha(x) = \theta' x$
- through both the scale and shape parameters simultaneously.

Here, $\theta = (\theta_1, \ldots, \theta_m)'$ is the vector of regression coefficients.

McDonald and Butler (1990): regressors introduced in GB2 models for duration of AFDC spells.

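To make the scale-parameter approach concrete, here is a hedged sketch of the negative log-likelihood one would minimise when β(x) = exp(θ'x). The slides do not say how the estimates were obtained, so maximum likelihood is an assumption here, and `gb2_logpdf` and `neg_loglik_scale` are illustrative names. The log-density is the GB2 density rewritten in terms of (z/β)^α, which is algebraically the same but better behaved numerically.

```python
import math


def gb2_logpdf(z, alpha, beta, g1, g2):
    """Log GB2 density at z > 0, written in terms of u = (z / beta) ** alpha."""
    log_beta_fn = math.lgamma(g1) + math.lgamma(g2) - math.lgamma(g1 + g2)
    log_ratio = math.log(z) - math.log(beta)
    u = math.exp(alpha * log_ratio)
    return (math.log(abs(alpha)) - math.log(z)
            + alpha * g1 * log_ratio
            - log_beta_fn
            - (g1 + g2) * math.log1p(u))


def neg_loglik_scale(theta, alpha, g1, g2, claims, covariates):
    """Negative log-likelihood with the scale link beta(x) = exp(theta' x)."""
    total = 0.0
    for z, x in zip(claims, covariates):
        linear_predictor = sum(t * xj for t, xj in zip(theta, x))
        beta_x = math.exp(linear_predictor)
        total -= gb2_logpdf(z, alpha, beta_x, g1, g2)
    return total
```

The function can be passed to any general-purpose optimiser over (θ, α, γ1, γ2). The shape-parameter approach would replace `alpha` with θ'x and keep `beta` fixed.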
Regression through the scale parameter

We have: $Z \mid x \sim \mathrm{GB2}(\alpha, \beta(x), \gamma_1, \gamma_2)$.

Define residuals
\[ R_i = Z_i \, e^{-\theta' x_i} \]
so that
\[ \log Z_i = \theta' x_i + \log R_i, \]
where $R_i \sim \mathrm{GB2}(\alpha, 1, \gamma_1, \gamma_2)$.

QQ plot diagnostics:
\[ \left( Q\left( \frac{i - 0.5}{n} \right), \, r_{(i)} \right) \quad \text{for } i = 1, \ldots, n, \]
where $r_{(i)}$ denotes the ordered residuals with $r_{(1)} \leq \cdots \leq r_{(n)}$.

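The QQ diagnostic for the scale regression can be sketched as follows. Several assumptions go beyond the slide: Q is taken to be the quantile function of GB2(α, 1, γ1, γ2), α > 0 is assumed, and the quantile is found by bisection on the distribution function. Because the standard library has no incomplete Beta function, the sketch evaluates it with a textbook continued-fraction expansion. All function names are illustrative.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete Beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd steps of the recurrence
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete Beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def gb2_cdf(z, alpha, beta, g1, g2):
    """GB2 distribution function for alpha > 0."""
    u = (z / beta) ** alpha
    return beta_cdf(u / (1.0 + u), g1, g2)


def gb2_quantile(p, alpha, g1, g2, lo=1e-8, hi=1e8):
    """Quantile of GB2(alpha, 1, g1, g2) by bisection on the log scale."""
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if gb2_cdf(mid, alpha, 1.0, g1, g2) < p:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def scale_residuals(claims, covariates, theta):
    """R_i = Z_i * exp(-theta' x_i)."""
    return [z * math.exp(-sum(t * xj for t, xj in zip(theta, x)))
            for z, x in zip(claims, covariates)]


def qq_points(residuals, alpha, g1, g2):
    """Pairs (Q((i - 0.5) / n), r_(i)) for i = 1, ..., n."""
    r = sorted(residuals)
    n = len(r)
    return [(gb2_quantile((i - 0.5) / n, alpha, g1, g2), r[i - 1])
            for i in range(1, n + 1)]
```

If the model fits, the returned pairs should lie close to the 45-degree line. The bracketing interval `[1e-8, 1e8]` is an arbitrary choice and would need widening for extreme quantiles or rescaled residuals.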
Regression through the shape parameter

We have: $Z \mid x \sim \mathrm{GB2}(\alpha(x), \beta, \gamma_1, \gamma_2)$.

Define residuals
\[ R_i = \left( \frac{Z_i}{\beta} \right)^{\theta' x_i} \]
so that
\[ \frac{\log R_i}{\log Z_i - \log \beta} = \theta' x_i, \]
where $R_i \sim \mathrm{GB2}(1, 1, \gamma_1, \gamma_2)$.

QQ plot diagnostics:
\[ \left( Q\left( \frac{i - 0.5}{n} \right), \, r_{(i)} \right) \quad \text{for } i = 1, \ldots, n, \]
where $r_{(i)}$ denotes the ordered residuals with $r_{(1)} \leq \cdots \leq r_{(n)}$.

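The shape-regression residual undoes the power transform in the GB2 construction: if Z = β (X/Y)^{1/α(x)}, then (Z/β)^{α(x)} recovers the gamma ratio X/Y, which is GB2(1, 1, γ1, γ2). A small sketch of that round trip follows; the function name and the numbers in the usage note are hypothetical.

```python
def shape_residual(z, x, theta, beta):
    """R = (z / beta) ** (theta' x), the residual under the shape link alpha(x) = theta' x."""
    alpha_x = sum(t * xj for t, xj in zip(theta, x))
    if alpha_x == 0:
        # alpha must be nonzero for the GB2 distribution to be defined.
        raise ValueError("alpha(x) = theta' x must be nonzero")
    return (z / beta) ** alpha_x
```

For instance, with θ = (0.5, 0.25), x = (1, 2) and β = 12, a gamma ratio of 3 maps to the claim z = 12 * 3 ** (1 / 1.0) = 36, and `shape_residual(36.0, [1.0, 2.0], [0.5, 0.25], 12.0)` recovers the ratio. Note that the linear link does not by itself keep α(x) away from zero or of constant sign, which is why the guard is included.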
Data analysis

We have a portfolio of automobile insurance policies from Singapore.
- Detailed information on policies of registered cars, claims and payments settled.
- Period: 1 January 1993 until 31 December 2001 (nine years in total).

The data contain individual records of 1,090,942 registered cars, with policy and claims information over 9 years from 46 companies.
- Policy file: 26 variables, 5,667,777 records.
- Claims file: 12 variables, 786,678 records.
- Payment file: 8 variables, 4,427,605 records.

In each year, about 5% are recorded as fleet policies.

For our investigation, we selected fleet policies from one company.

Possible covariates

- The calendar year: 1993-2001, treated as a continuous variable.
- The level of gross premium for the policy in the calendar year: continuous.
- The type of vehicle: bus (B), car (C), or motorcycle (M).
- Cover type: comprehensive (C), third party fire and theft (F), or third party (T).
- The no-claims discount (NCD) applicable for the calendar year: ranging from 0% to 50% in increments of 10%.

No driver characteristics were included because only fleet policies are considered in this initial investigation.

Some summary statistics of the claims data

Statistic             Value
Count                 1,470
Mean                  3,523
Standard deviation    4,765.4
Variance              22,709,497
Minimum               3
25th percentile       950
Median                1,949
75th percentile       4,226
Maximum               53,500
Skewness              4.01
Kurtosis              25.5

Figure 3: Histogram of total claims, drawn on the density scale. Horizontal axis: total claims; vertical axis: density.

Parameter estimates

Parameter                 scale regression        shape regression
α                         1.37066 (0.24943)       -
γ1                        0.75094 (0.17442)       1.82104 (0.41116)
γ2                        1.80991 (0.65597)       6.19284 (2.11426)
β                         -                       12.17706 (5.35773)

Regression coefficients   β(x) = exp(θ'x)         α(x) = θ'x
intercept                 0.70564 (0.29956)       0.47929 (0.07573)
Year - 1992 (time)        0.01506 (0.01453)       0.00153 (0.00519)
premium (in 000's)        1.06505 (0.28226)       0.13931 (0.06957)
Cover C                   0.35377 (0.12717)       0.12748 (0.04574)
VType Car                 0.16869 (0.11352)       0.10131 (0.04536)
premium*cover C           -0.64244 (0.24697)      -
premium*vtype Car         0.03097 (0.12557)       -0.03843 (0.04555)
NCD 0                     0.33157 (0.14687)       0.11076 (0.05182)
NCD 10                    0.37900 (0.28206)       0.18841 (0.11338)
premium*ncd 0             -0.41656 (0.15844)      -0.11875 (0.06267)
premium*ncd 10            -0.66920 (0.24108)      -0.18471 (0.09244)

log-likelihood            -3,232.752              -3,228.611
# of params               14                      13
AIC                       6,493.5041              6,483.2217
BIC                       6,493.3210              6,483.0516

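The information criteria in the table can be re-derived from the reported log-likelihoods and parameter counts. The sketch below recomputes AIC = -2 log L + 2k. It also records an arithmetic observation that the slides do not state: the tabulated BIC values are not what the usual penalty k ln(n) would give, but they are matched to the printed precision by a penalty of k ln(ln(n)) with n = 1,470. Taking n to be the claim count from the summary statistics is an assumption.

```python
import math

# (log-likelihood, number of parameters) as reported in the table
fits = {"scale": (-3232.752, 14), "shape": (-3228.611, 13)}
n_claims = 1470  # assumed estimation sample size (claim count from the summary table)


def aic(loglik, k):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * k


def bic_standard(loglik, k, n):
    """Usual Bayesian information criterion with penalty k * ln(n)."""
    return -2.0 * loglik + k * math.log(n)


def loglog_penalty_criterion(loglik, k, n):
    """Criterion with penalty k * ln(ln(n)); this matches the tabulated 'BIC' row."""
    return -2.0 * loglik + k * math.log(math.log(n))


for name, (loglik, k) in fits.items():
    print(name,
          round(aic(loglik, k), 3),
          round(bic_standard(loglik, k, n_claims), 3),
          round(loglog_penalty_criterion(loglik, k, n_claims), 3))
```

On every one of these criteria the shape regression has the lower value, consistent with the ranking in the table.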
Figure 4: QQ plots of residuals. Two panels, "Beta Regression" (regression through the scale parameter) and "Alpha Regression" (regression through the shape parameter), each plotting the theoretical quantile of GB2 against the empirical quantile.

Unfinished business

- Analysis of another company's data
- Residual diagnostics: QQ plot or PP plot
- Interpretation of the work: parameters
- Comparison with other known regression models: GLM, Burr XII regression
- Predictive power: compare predictions with other models

Some references

Cummins, J.D., Dionne, G., McDonald, J.B. and Pritchett, B.M. (1990). Applications of the GB2 family of distributions in modeling insurance loss processes. Insurance: Mathematics & Economics 9: 257-272.

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences. Wiley, New Jersey.

Klugman, S. and Rioux, J. (2006). Toward a Unified Approach to Fitting Loss Models. North American Actuarial Journal 10(1): 147-153.

McDonald, J.B. (1984). Some Generalized Functions for the Size Distribution of Income. Econometrica 52: 647-663.

McDonald, J.B. and Butler, R.J. (1990). Regression Models for Positive Random Variables. Journal of Econometrics 43: 227-251.

Sun, J., Frees, E.W. and Rosenberg, M. (2006). Discussion of Klugman and Rioux's paper. North American Actuarial Journal 10(2): 63-83.

Acknowledgement

The author wishes to acknowledge the following for financial support:
- Australian Research Council through the Discovery Grant DP0345036; and
- the UNSW Actuarial Foundation of the Institute of Actuaries of Australia.