Multivariate Longitudinal of Using Copulas
joint work with Xiaoping Feng and Jean-Philippe Boucher
University of Wisconsin - Madison
Université du Québec à Montréal
Actuarial Research Conference
August 5-8, 2015
Modeling depends on the purposes of the models
Model building often relies on the features of the data
Some unique features in insurance claims data:
  1) Semi-continuous
  2) Multivariate
  3) Multilevel
Frequency-severity (two-part) vs. pure premium models
More details in Predictive Modeling Applications in Actuarial Science, edited by Frees, Derrig, and Meyers
Some unique features in insurance claims data:
  1) Semi-continuous
  2) Multivariate
     Automobile insurance: third party liability, collision, comprehensive...
     Homeowner multiperil: fire, lightning, hail, wind...
     Workers' compensation: medical care, income replacement...
     Health insurance: physician vs. non-physician, outpatient vs. hospital inpatient
  3) Multilevel
     Automobile insurance: fleet/household, vehicle, multi-period
     Homeowner multiperil: neighborhood, houses, periods
     Workers' compensation: employer, occupation, individual
     Health insurance: employer/household, individual, periods
Automobile
Personal automobile insurance in Canada
Four types of coverage:
  Accident benefit - no-fault insurance benefit to the injured insured
  Collision - damage to the policyholder's vehicle due to collision
  All risk - damage to the policyholder's vehicle due to risks other than collision
  Civil liability - bodily injury and property damage of third party
We examine a longitudinal dataset for years 2003-2006
Each policy could cover multiple vehicles

Table: Distribution of number of vehicles per policy
Vehicles per policy        1        2       3      4    Total
Number                77,352   10,058     253      7   87,670
Percentage             88.23    11.47   0.289   0.01      100
Claim Costs
[Figure: claim costs]
Predictors
The data set contains basic rating variables: policyholder characteristics, driving history, vehicle characteristics
We use binary predictors in the analysis

Table: Summary of rating variables
Variable      Description                                    Mean
young         =1 if age between 16 and 25                    0.021
senior        =1 if age more than 60                         0.175
marital       =1 if married                                  0.731
homeowner     =1 if homeowner                                0.658
experience    =1 if more than ten years of experience        0.921
conviction    =1 if positive number of convictions           0.050
newcar        =1 if new car                                  0.896
leasecar      =1 if lease car                                0.153
business      =1 if business use                             0.037
highmilage    =1 if drive more than 10,000 miles             0.704
multidriver   =1 if more than two drivers                    0.063
Multivariate
Let $Y_{ikjt}$ denote the insurance cost for
  Household (cluster) $i$ $(= 1, \ldots, M)$
  Vehicle $k$ $(= 1, \ldots, K_i)$
  Coverage type $j$ $(= 1, \ldots, J_i)$
  Period $t$ $(= 1, \ldots, T_i)$
For example, with $K_i = 2$, $J_i = 3$, $T_i = 4$, let
\[
Y_i = (Y_{i111}, \ldots, Y_{i114}, Y_{i121}, \ldots, Y_{i124}, \ldots, Y_{i231}, \ldots, Y_{i234})
\]
Use a Gaussian copula to build the joint distribution
\[
F_i(y_i) = H\bigl(F(y_{i111}), \ldots, F(y_{i114}), F(y_{i121}), \ldots, F(y_{i124}), \ldots, F(y_{i231}), \ldots, F(y_{i234}); R\bigr)
\]
Need to specify the marginals $F$
Need to specify the association matrix $R$ in the Gaussian copula
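The ordering of the stacked vector can be sketched in a few lines of Python. This is only an illustration of the index layout in the example (vehicle outermost, then coverage type, then period); the function name `cost_indices` is hypothetical and not from the slides.

```python
from itertools import product

def cost_indices(n_vehicles, n_coverages, n_periods):
    """Return (k, j, t) triples in the stacking order of the example:
    vehicle k varies slowest, period t varies fastest."""
    return list(
        product(
            range(1, n_vehicles + 1),
            range(1, n_coverages + 1),
            range(1, n_periods + 1),
        )
    )

# Example from the slide: two vehicles, three coverage types, four periods.
idx = cost_indices(2, 3, 4)
print(len(idx))                 # dimension of Y_i
print(idx[0], idx[3], idx[4], idx[-1])
```

With these sizes the copula for one household has 24 margins, which is the dimension the association matrix R has to match.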
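Multivariate
Start from $R = B \otimes (\Sigma \otimes P)$, where
\[
B = \begin{pmatrix} 1 & \delta \\ \delta & 1 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} 1 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & 1 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & 1 \end{pmatrix},
\]
and, for example,
\[
P = \begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}
\]
Fully separate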
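A minimal sketch of this starting structure, assuming the products between B, Σ, and P are Kronecker products (that reading matches the 2 x 3 x 4 = 24 margins of the earlier example). Plain Python lists keep the sketch dependency-free; all parameter values and helper names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]

def ar1(rho, n):
    """AR(1) correlation matrix with entries rho ** |s - t|."""
    return [[rho ** abs(s - t) for t in range(n)] for s in range(n)]

# Hypothetical values, chosen only for illustration.
delta = 0.2                                  # between vehicles
sigma12, sigma13, sigma23 = 0.3, 0.1, 0.2    # between coverage types
rho = 0.5                                    # serial correlation

B = [[1.0, delta], [delta, 1.0]]
S = [
    [1.0, sigma12, sigma13],
    [sigma12, 1.0, sigma23],
    [sigma13, sigma23, 1.0],
]
P = ar1(rho, 4)

R = kron(B, kron(S, P))
print(len(R), len(R[0]))
```

Under this structure every entry of R is a product of one vehicle term, one coverage term, and one temporal term. For instance, the entry linking (vehicle 1, coverage 1, period 1) with (vehicle 2, coverage 2, period 2) is delta * sigma12 * rho.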
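Multivariate Tweedie
Define $R = B \otimes P$, where
\[
B = \begin{pmatrix} 1 & \delta \\ \delta & 1 \end{pmatrix}
\quad \text{and} \quad
P = \begin{pmatrix}
P_{11} & \sigma_{12} P_{12} & \sigma_{13} P_{13} \\
\sigma_{21} P_{21} & P_{22} & \sigma_{23} P_{23} \\
\sigma_{31} P_{31} & \sigma_{32} P_{32} & P_{33}
\end{pmatrix}
\]
Further,
\[
P_{jj}^{AR} = \begin{pmatrix}
1 & \rho_j & \rho_j^2 & \rho_j^3 \\
\rho_j & 1 & \rho_j & \rho_j^2 \\
\rho_j^2 & \rho_j & 1 & \rho_j \\
\rho_j^3 & \rho_j^2 & \rho_j & 1
\end{pmatrix}
\]
Summarize dependence parameters:
  Between vehicles: $\delta$
  Between coverage types: $\sigma_{12}, \sigma_{13}, \sigma_{23}$
  Temporal: $\rho_1, \rho_2, \rho_3$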
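Composite Likelihood
Use composite likelihood to estimate model parameters
\[
cl_i(\theta; y_i) = \frac{1}{m_i - 1} \, (\text{sum of bivariate log-likelihoods})
\]
Use the inverse of the Godambe information matrix to get standard errors
\[
G_N^{-1}(\theta) = H_N^{-1}(\theta) \, J_N(\theta) \, H_N^{-1}(\theta)
\]
where
\[
H_N(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \frac{\partial^2 cl_i(\theta; y_i)}{\partial\theta \, \partial\theta'}
\quad \text{and} \quad
J_N(\theta) = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial cl_i(\theta; y_i)}{\partial\theta} \, \frac{\partial cl_i(\theta; y_i)}{\partial\theta'}
\]
Model comparison
\[
\text{CLAIC} = -2 \, cl(\theta; y) + 2 \, \mathrm{tr}\bigl(J(\theta) H(\theta)^{-1}\bigr)
\]
\[
\text{CLBIC} = -2 \, cl(\theta; y) + \log(\dim(\theta)) \, \mathrm{tr}\bigl(J(\theta) H(\theta)^{-1}\bigr)
\]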
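The sandwich computation can be sketched with NumPy as follows. This is a hedged illustration, not the authors' code: it assumes the per-household score vectors and Hessians of cl_i have already been evaluated at the estimate, that H is taken as minus the average Hessian, and that standard errors come from the diagonal of the inverse Godambe matrix divided by N. The function name `godambe_summary` and the toy inputs are hypothetical.

```python
import numpy as np

def godambe_summary(scores, hessians, cl_value):
    """Sandwich variance and CLAIC from per-cluster derivatives.

    scores   : (N, d) array, gradient of cl_i at the estimate for each cluster
    hessians : (N, d, d) array, Hessian of cl_i at the estimate for each cluster
    cl_value : total composite log-likelihood at the estimate
    """
    scores = np.asarray(scores, dtype=float)
    hessians = np.asarray(hessians, dtype=float)
    n = scores.shape[0]

    H = -hessians.mean(axis=0)          # sensitivity matrix (assumed sign convention)
    J = scores.T @ scores / n           # variability matrix
    H_inv = np.linalg.inv(H)
    G_inv = H_inv @ J @ H_inv           # inverse Godambe information

    se = np.sqrt(np.diag(G_inv) / n)    # assumed scaling for standard errors
    claic = -2.0 * cl_value + 2.0 * np.trace(J @ H_inv)
    return G_inv, se, claic

# Toy inputs, chosen so that H = J = I and the result is easy to check by hand.
scores = np.array([[1.0, 1.0], [1.0, -1.0]])
hessians = np.array([-np.eye(2), -np.eye(2)])
G_inv, se, claic = godambe_summary(scores, hessians, cl_value=-10.0)
print(G_inv, se, claic)
```

When H and J coincide, as in the toy inputs, the trace penalty equals the number of parameters and CLAIC reduces to the usual AIC form.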
Bivariate Model
Use the Gaussian copula $H(u_1, u_2) = \Phi_\rho(\Phi^{-1}(u_1), \Phi^{-1}(u_2))$
\[
f(y_1, y_2) =
\begin{cases}
H(F_1(0), F_2(0)) & \text{if } y_1 = 0 \text{ and } y_2 = 0 \\
f_1(y_1) \, h_1(F_1(y_1), F_2(0)) & \text{if } y_1 > 0 \text{ and } y_2 = 0 \\
f_2(y_2) \, h_2(F_1(0), F_2(y_2)) & \text{if } y_1 = 0 \text{ and } y_2 > 0 \\
f_1(y_1) \, f_2(y_2) \, h(F_1(y_1), F_2(y_2)) & \text{if } y_1 > 0 \text{ and } y_2 > 0
\end{cases}
\]
Here
\[
h(u_1, u_2) = \frac{\partial^2 H(u_1, u_2)}{\partial u_1 \, \partial u_2},
\qquad
h_j(u_1, u_2) = \frac{\partial H(u_1, u_2)}{\partial u_j} \quad \text{for } j = 1, 2
\]
They have closed-form expressions for the Gaussian copula
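For reference, the closed forms can be sketched with the Python standard library. With z_j the standard normal quantile of u_j, the usual Gaussian copula results are h_1(u_1, u_2) = Φ((z_2 - ρ z_1) / sqrt(1 - ρ²)), the symmetric expression for h_2, and the copula density for h. The function names below are illustrative, not from the slides.

```python
from math import exp, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def h1(u1, u2, rho):
    """Partial derivative of the Gaussian copula with respect to u1."""
    z1, z2 = _STD.inv_cdf(u1), _STD.inv_cdf(u2)
    return _STD.cdf((z2 - rho * z1) / sqrt(1.0 - rho * rho))

def h2(u1, u2, rho):
    """Partial derivative with respect to u2 (by symmetry of the copula)."""
    return h1(u2, u1, rho)

def h(u1, u2, rho):
    """Gaussian copula density, the mixed second derivative of H."""
    z1, z2 = _STD.inv_cdf(u1), _STD.inv_cdf(u2)
    q = rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2
    return exp(-q / (2.0 * (1.0 - rho * rho))) / sqrt(1.0 - rho * rho)

print(h1(0.3, 0.7, 0.5), h2(0.3, 0.7, 0.5), h(0.3, 0.7, 0.5))
```

The case with both claims equal to zero additionally needs the bivariate normal distribution function, which is not in the standard library and is left out of this sketch.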
Specification
We set $J = 2$, $K = 2$, and $T = 4$; that is, a policy covers two vehicles and provides two types of coverage for each vehicle over four periods.
We use $\text{Tweedie}(\mu_j, p_j, \phi_j)$ with the following specification for coverage type $j = 1$ and $2$:
\[
\mu_j = \exp(\beta_{j0} + \beta_{j1} X_1 + \beta_{j2} X_2)
\]
\[
\phi_j = \exp(\gamma_{j0} + \gamma_{j1} X_1 + \gamma_{j2} X_2)
\]
$X_1 \sim \text{Bernoulli}(0.5)$ and $X_2 \sim \text{Bernoulli}(0.6)$, independently.
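To see why a Tweedie margin suits semi-continuous costs, it helps to map (µ, p, φ) to the compound Poisson-gamma form. The sketch below assumes the standard representation for 1 < p < 2 (Poisson claim count, gamma claim sizes), which the slides do not spell out; function names and coefficient values are illustrative.

```python
from math import exp

def marginal_params(beta, gamma, x1, x2):
    """Mean and dispersion from the log-linear specification."""
    mu = exp(beta[0] + beta[1] * x1 + beta[2] * x2)
    phi = exp(gamma[0] + gamma[1] * x1 + gamma[2] * x2)
    return mu, phi

def tweedie_cpg_params(mu, p, phi):
    """Compound Poisson-gamma parameters of Tweedie(mu, p, phi), 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("this representation requires 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))      # Poisson rate
    shape = (2.0 - p) / (p - 1.0)                  # gamma shape
    scale = phi * (p - 1.0) * mu ** (p - 1.0)      # gamma scale
    return lam, shape, scale

# Illustrative coefficients and covariates (X1 = 1, X2 = 0).
mu, phi = marginal_params((1.0, 1.5, 0.5), (5.0, 1.0, 1.0), 1, 0)
lam, shape, scale = tweedie_cpg_params(mu, 1.2, phi)
prob_zero = exp(-lam)                              # point mass at zero
print(lam, shape, scale, prob_zero)
```

The mass at zero is exp(-lam), so a larger dispersion φ pushes more probability onto zero claims while the mean stays at µ.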
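Specification
We use the Gaussian copula with the correlation matrix specified as
\[
\Sigma = \begin{pmatrix} 1 & \delta \\ \delta & 1 \end{pmatrix} \otimes
\begin{pmatrix} P_{11} & \sigma_{12} P_{12} \\ \sigma_{12} P_{21} & P_{22} \end{pmatrix}
\]
with blocks
\[
P_{11} = \begin{pmatrix}
1 & \rho_1 & \rho_1^2 & \rho_1^3 \\
\rho_1 & 1 & \rho_1 & \rho_1^2 \\
\rho_1^2 & \rho_1 & 1 & \rho_1 \\
\rho_1^3 & \rho_1^2 & \rho_1 & 1
\end{pmatrix},
\qquad
P_{12} = \begin{pmatrix}
1 & \rho_1 & \rho_1^2 & \rho_1^3 \\
\rho_2 & 1 & \rho_1 & \rho_1^2 \\
\rho_2^2 & \rho_2 & 1 & \rho_1 \\
\rho_2^3 & \rho_2^2 & \rho_2 & 1
\end{pmatrix},
\]
\[
P_{21} = \begin{pmatrix}
1 & \rho_2 & \rho_2^2 & \rho_2^3 \\
\rho_1 & 1 & \rho_2 & \rho_2^2 \\
\rho_1^2 & \rho_1 & 1 & \rho_2 \\
\rho_1^3 & \rho_1^2 & \rho_1 & 1
\end{pmatrix},
\qquad
P_{22} = \begin{pmatrix}
1 & \rho_2 & \rho_2^2 & \rho_2^3 \\
\rho_2 & 1 & \rho_2 & \rho_2^2 \\
\rho_2^2 & \rho_2 & 1 & \rho_2 \\
\rho_2^3 & \rho_2^2 & \rho_2 & 1
\end{pmatrix}
\]
Use the parameterization
\[
\sigma_{12} = \tau_{12} \, \frac{\sqrt{1 - \rho_1^2} \, \sqrt{1 - \rho_2^2}}{1 - \rho_1 \rho_2}
\]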
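A sketch of assembling this correlation matrix in plain Python, under two assumptions: the product with the 2 x 2 vehicle matrix is a Kronecker product, and the cross-coverage block uses powers of one serial parameter above the diagonal and of the other below it. With ρ1 = ρ2, as in the values used here, the orientation of that block does not matter. The values δ = 0.6, τ = 0.3, ρ1 = ρ2 = 0.8 are the true values listed in the results table; the helper names are hypothetical.

```python
from math import sqrt

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def serial_block(rho_upper, rho_lower, n):
    """n x n block: rho_upper ** (t - s) on and above the diagonal,
    rho_lower ** (s - t) below it."""
    return [
        [rho_upper ** (t - s) if t >= s else rho_lower ** (s - t) for t in range(n)]
        for s in range(n)
    ]

def sigma12_from_tau(tau, rho1, rho2):
    """Cross-coverage association implied by tau and the serial parameters."""
    return tau * sqrt(1 - rho1 ** 2) * sqrt(1 - rho2 ** 2) / (1 - rho1 * rho2)

def simulation_corr(delta, tau, rho1, rho2, n_periods=4):
    s12 = sigma12_from_tau(tau, rho1, rho2)
    p11 = serial_block(rho1, rho1, n_periods)
    p22 = serial_block(rho2, rho2, n_periods)
    p12 = serial_block(rho1, rho2, n_periods)    # assumed orientation
    p21 = [list(col) for col in zip(*p12)]       # transpose keeps symmetry
    top = [r11 + [s12 * x for x in r12] for r11, r12 in zip(p11, p12)]
    bottom = [[s12 * x for x in r21] + r22 for r21, r22 in zip(p21, p22)]
    within_vehicle = top + bottom
    between_vehicle = [[1.0, delta], [delta, 1.0]]
    return kron(between_vehicle, within_vehicle)

corr = simulation_corr(delta=0.6, tau=0.3, rho1=0.8, rho2=0.8)
print(len(corr), corr[0][4], corr[0][8])
```

With equal serial parameters the ratio in the parameterization is one, so the cross-coverage association equals τ.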
Specification

Table: Results for sample sizes (number of policies) N = 200 and 500
              Relative Bias          SD                SE                MSE
Parameter    N=200    N=500    N=200    N=500    N=200    N=500    N=200    N=500
β10 = 1     -0.102   -0.030    0.361    0.244    0.369    0.234    0.141    0.060
β11 = 1.5    0.010    0.007    0.201    0.122    0.208    0.136    0.041    0.015
β12 = 0.5    0.130    0.020    0.290    0.176    0.295    0.187    0.088    0.031
β20 = 1     -0.011   -0.018    0.282    0.190    0.278    0.183    0.080    0.036
β21 = 0.5   -0.012    0.003    0.269    0.163    0.218    0.142    0.072    0.026
β22 = 2      0.003   -0.002    0.200    0.121    0.194    0.129    0.040    0.015
p1 = 1.2    -0.011   -0.003    0.025    0.016    0.022    0.014    0.001    0.000
p2 = 1.4    -0.002   -0.002    0.028    0.018    0.027    0.017    0.001    0.000
γ10 = 5     -0.006   -0.005    0.127    0.081    0.119    0.080    0.017    0.007
γ11 = 1      0.015    0.017    0.093    0.057    0.093    0.060    0.009    0.004
γ12 = 1     -0.028   -0.023    0.120    0.077    0.114    0.077    0.015    0.006
γ20 = 4     -0.002    0.000    0.127    0.084    0.119    0.079    0.016    0.007
γ21 = 0         NA       NA    0.110    0.073    0.108    0.071    0.012    0.005
γ22 = 1     -0.002    0.010    0.138    0.083    0.121    0.079    0.019    0.007
ρ1 = 0.8    -0.011   -0.006    0.045    0.030    0.042    0.028    0.002    0.001
ρ2 = 0.8    -0.011   -0.009    0.044    0.026    0.038    0.026    0.002    0.001
τ = 0.3     -0.041   -0.037    0.129    0.085    0.113    0.080    0.017    0.007
δ = 0.6     -0.015   -0.017    0.083    0.053    0.073    0.049    0.007    0.003
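The columns of this table can be cross-checked with a short calculation. The sketch assumes the usual definitions, which the slide does not state: relative bias is (mean estimate - true value) / true value, SD is the empirical standard deviation of the estimates, and MSE is SD squared plus bias squared. The function name is hypothetical.

```python
def implied_mse(true_value, relative_bias, sd):
    """MSE implied by the relative bias and the empirical SD
    (assumed decomposition: variance plus squared bias)."""
    bias = relative_bias * true_value
    return sd ** 2 + bias ** 2

# Rows taken from the table, N = 200 column.
print(round(implied_mse(1.0, -0.102, 0.361), 3))   # row for β10
print(round(implied_mse(0.5, 0.130, 0.290), 3))    # row for β12
print(round(implied_mse(0.6, -0.015, 0.083), 3))   # row for δ
```

For the three rows checked here, the implied values match the reported MSE column to three decimals, which supports this reading of the column definitions.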
Marginal Models
Model dispersion as well as mean
Different predictors for each type

Table: Parameter estimates for accident benefit
Mean                            Dispersion
               Est.    S.E.                     Est.    S.E.
intercept     5.732    0.13     intercept      7.107    0.05
conviction    0.807    0.14     homeowner     -0.099    0.03
homeowner    -0.225    0.07     experience     0.489    0.05
experience   -0.324    0.12     multidriver   -0.237    0.06
young        -0.911    0.22     marital        0.174    0.04
senior       -0.466    0.11     senior         0.151    0.05
highmilage   -0.578    0.08
p             1.703    0.00
Dependence
Strong association among coverage types
Small serial correlation and cluster effect

Table: Estimates of dependence parameters
        Est.     S.E.
ρ1      0.163    0.022
ρ2      0.051    0.013
ρ3      0.099    0.015
ρ4      0.101    0.011
τ12     0.436    0.010
τ13     0.049    0.018
τ14     0.641    0.009
τ23     0.013    0.012
τ24     0.351    0.007
τ34     0.021    0.010
δ       0.082    0.020
Model Comparison
The table below summarizes the goodness-of-fit statistics of alternative specifications
Smaller statistics indicate better fit

Table: Goodness-of-fit statistics
Model   Description           CLAIC      CLBIC
M0      independence          958,513    959,142
M1      no temporal           957,747    958,375
M2      no cross-sectional    958,501    959,130
M3      no cluster            957,734    958,363
M4      no dispersion         958,491    959,120
M5      full model            957,730    958,358
Individual Risk
            young  senior  marital  homeowner  experience  conviction  newcar  leasecar  business  highmilage  multidriver
Excellent       0       1        1          1           1           0       0         0         0           1            0
Very Good       0       1        1          1           1           0       1         1         1           0            0
Good            0       1        1          1           0           1       1         0         1           1            0
Fair            0       1        1          1           1           1       1         1         1           0            1
Poor            0       0        0          0           0           1       1         1         1           0            1
Portfolio Risk
[Figure: left panel, mean vs. dispersion; right panel, independence vs. copula]
We focused on the multilevel structure of claims data
The Tweedie model was considered as an example
Thank you for your kind attention.
Learn more about my research at: https://sites.google.com/a/wisc.edu/peng-shi/