Multivariate negative binomial models for insurance claim counts

Peng Shi (Northern Illinois University) and Emiliano A. Valdez (University of Connecticut)

Université de Montréal Seminar, 9 November 0, Montréal, Quebec

Background

For short-term insurance contracts, the cost of claims is generally decomposed into two major components:

cost of claims = frequency × severity

Many believe that the frequency component:
- may be the more important of the two components
- reveals, to an extent, more of the riskiness of the policyholder

Predictions of frequency can be improved by:
- the usual accounting for heterogeneity through risk classification variables
- using additional information recorded in the data, e.g. types of claim
- gaining additional insight into the association between the types of claim

The multivariate claim count data

Assume we have a portfolio of $m$ policyholders observed over a fixed time period (cross-sectional).

For each policyholder, we observe a vector of claim counts $(N_{i1}, N_{i2}, \ldots, N_{ik})$, where $N_{ij}$ is the claim count associated with type $j$, with $j = 1, 2, \ldots, k$ and $i = 1, 2, \ldots, m$.

Each policyholder also reveals a set of observable covariates $x_i$, useful for subdividing the portfolio into classes of risks with homogeneous characteristics.

We present and compare alternatives for modeling multivariate claim counts: the classical approach based on common shocks and the modern approach based on copulas.

Traditional count regression models are used for the marginal components.

Alternative multivariate models

Some literature on incorporating a dependence structure in multivariate count outcomes:
- use of a common additive error: Kocherlakota and Kocherlakota (99), Johnson et al. (997), Winkelmann (000), Karlis and Meligkotsidou (00), and Bermúdez and Karlis (0)
- Poisson, zero-inflated Poisson, and negative binomial to capture overdispersion
- mixture model with a multiplicative error: Hausmann et al. (98), Day and Chung (99)
- may be restricted to a non-negative covariance structure; alternatives here are Aitchison and Ho (989) and van Ophem (999)

Multivariate models using copulas

Copulas can be used to construct general discrete multivariate distributions that support more complex correlation structures:
- although still believed to be in its infancy, the approach has become increasingly popular
- applications extend to several disciplines:
  - Economics: Prieger (00), Cameron et al. (00), Zimmer and Trivedi (00)
  - Biostatistics: Song et al. (008), Madsen and Fang (00)
  - Actuarial science: Purcaru and Denuit (00), Shi and Valdez (0), Shi and Valdez (0)

Boucher, Denuit and Guillén (008) provide a survey of models for insurance claim counts with time dependence.

Genest and Nešlehová (007) urge caution when using copulas: non-uniqueness of the copula, vague interpretation of the nature of dependence.

Negative binomial distribution model

$N$ is said to follow a negative binomial distribution if its pmf may be expressed as

$$\Pr(N = n) = \frac{\Gamma(\eta + n)}{\Gamma(\eta)\,\Gamma(n + 1)} \left(\frac{1}{1 + \psi}\right)^{\eta} \left(\frac{\psi}{1 + \psi}\right)^{n}, \quad n = 0, 1, 2, \ldots,$$

and is denoted as $N \sim NB(\psi, \eta)$ for $\psi, \eta > 0$.

Mean $E(N) = \eta\psi$ and variance $Var(N) = \eta\psi(1 + \psi)$.

Unlike the Poisson distribution, the NB accommodates overdispersion via the parameter $\psi$.

As $\psi \to 0$ with the mean $\eta\psi$ held fixed, overdispersion vanishes and the NB converges to the Poisson distribution.
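A minimal numerical sketch of the pmf above, using only the Python standard library. The parameter values (psi = 0.5, eta = 3, and a mean of 1.5 for the Poisson limit) are hypothetical and chosen only for illustration; they are not estimates from the talk.

```python
import math

def nb_pmf(n, psi, eta):
    """Pr(N = n) for N ~ NB(psi, eta), evaluated on the log scale."""
    log_p = (math.lgamma(eta + n) - math.lgamma(eta) - math.lgamma(n + 1)
             - eta * math.log1p(psi)
             + n * (math.log(psi) - math.log1p(psi)))
    return math.exp(log_p)

def poisson_pmf(n, lam):
    """Pr(N = n) for a Poisson count with mean lam."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def moments(pmf, n_max=2000):
    """Mean and variance by direct summation over 0..n_max-1."""
    mean = sum(n * pmf(n) for n in range(n_max))
    second = sum(n * n * pmf(n) for n in range(n_max))
    return mean, second - mean ** 2

# Hypothetical parameter values, for illustration only.
psi, eta = 0.5, 3.0
mean, var = moments(lambda n: nb_pmf(n, psi, eta))
print(mean, eta * psi)              # numerical mean vs. eta * psi
print(var, eta * psi * (1 + psi))   # numerical variance vs. eta * psi * (1 + psi)

# Poisson limit: keep the mean lam = eta * psi fixed and let psi shrink.
lam, small_psi = 1.5, 1e-6
gap = max(abs(nb_pmf(n, small_psi, lam / small_psi) - poisson_pmf(n, lam))
          for n in range(20))
print(gap)
```

The ratio var / mean equals 1 + psi, which is the sense in which psi controls overdispersion.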

Negative binomial regression models

For regression, consider the mean specified in terms of covariates, $\lambda = \eta\psi = \exp(x'\beta)$, which may come in two different parameterizations:
- NB-I model: $\eta = \sigma^{-2}\exp(x'\beta)$
- NB-II model: $\eta = \sigma^{-2}$

Both assume the same mean structure but a different dispersion $\phi$, defined by $Var(N \mid x) = \phi\,E(N \mid x)$.

The NB-I model implies a constant dispersion $\phi = 1 + \sigma^2$.

The NB-II model allows for subject heterogeneity in the dispersion: $\phi = 1 + \sigma^2\exp(x'\beta)$.

See Winkelmann (008).
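The two dispersion formulas follow from $Var(N) = \eta\psi(1 + \psi)$, so that $\phi = 1 + \psi$ with $\psi = \lambda/\eta$. The sketch below checks this numerically. The function names, the value sigma2 = 0.4 and the linear predictors are hypothetical and serve only as an illustration.

```python
import math

def nb_params(xb, sigma2, model):
    """Map the linear predictor xb = x'beta and sigma2 = sigma^2 to (psi, eta)."""
    lam = math.exp(xb)                 # mean, lambda = eta * psi
    if model == "NB-I":
        eta = lam / sigma2             # eta = sigma^{-2} exp(x'beta)
    elif model == "NB-II":
        eta = 1.0 / sigma2             # eta = sigma^{-2}
    else:
        raise ValueError("model must be 'NB-I' or 'NB-II'")
    psi = lam / eta
    return psi, eta

def dispersion(xb, sigma2, model):
    """phi = Var(N | x) / E(N | x) implied by the NB(psi, eta) moments."""
    psi, eta = nb_params(xb, sigma2, model)
    mean = eta * psi
    var = eta * psi * (1.0 + psi)
    return var / mean

# Hypothetical values: three policyholders with different linear predictors.
sigma2 = 0.4
for xb in (-3.0, -2.0, -1.0):
    print(xb, dispersion(xb, sigma2, "NB-I"), dispersion(xb, sigma2, "NB-II"))
```

Under NB-I the printed dispersion is the same for every policyholder, while under NB-II it grows with the mean.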

Empirical data

For our empirical analysis, claims data are obtained from one randomly selected automobile insurance company in Singapore:
- claims experience for a single year
- focus on non-fleet policies
- only policyholders with comprehensive coverage are considered

Summary of response and explanatory variables

Variable    Description                                        Mean StdDev
Responses
$N_1$       number of claims of third-party bodily injury      0.0 0.
$N_2$       number of claims of own damage                     0.09 0.
$N_3$       number of claims of third-party property damage    0.0 0.9
Covariates
young       equals 1 if age is less than                       0.0 0.90
lowncd      equals 1 if NCD is less than 0                     0.8 0.79
vage        vehicle age                                        .98.0
private     equals 1 for private car                           0.90 0.9
vlux        equals 1 for luxury car                            0.0 0.
smallcap    equals 1 if vehicle capacity is small              0.0 0.07

Goodness of fit of the marginal models

Columns in each panel: count, Observed, then Fitted under Poisson, ZIP, NegBin-I, NegBin-II.

$N_1$
0 7 7.9 70. 70.9 7. 9 0.9 9.0 9. 9. 7.0 0. 0.7 0. 0.9 0.79.0.
χ² 8.7.7..

$N_2$
0 70 79.7 70.87 70.8 70. 9..8 9...78 8.. 7...80.8.8 0.0 0. 0.0 0.8
χ² 0.9.7.7.

$N_3$
0 70 79.88 797. 700. 700.9 7.8 7.9.7. 0.9 7.7 0.7 0.7 0. 0. 0. 0.8
χ² 7.8.8 0.78 0.8

Multivariate models considered

Multivariate models constructed from common shocks:

$$N_{i1} = U_{i12} + U_{i13} + U_{i1}$$
$$N_{i2} = U_{i12} + U_{i23} + U_{i2}$$
$$N_{i3} = U_{i13} + U_{i23} + U_{i3}$$

Multivariate models based on copulas:

$$F_i(n_1, n_2, n_3 \mid x_i) = C\big(F_1(n_1 \mid x_i), F_2(n_2 \mid x_i), F_3(n_3 \mid x_i) \mid x_i; \Theta\big)$$

with

$$f_i(n_1, n_2, n_3 \mid x_i) = \sum_{l_1=0}^{1}\sum_{l_2=0}^{1}\sum_{l_3=0}^{1} (-1)^{l_1+l_2+l_3}\, C\big(u_{i1,l_1}, u_{i2,l_2}, u_{i3,l_3} \mid x_i; \Theta\big),$$

where $u_{ij,0} = F_j(n_j \mid x_i)$ and $u_{ij,1} = F_j(n_j - 1 \mid x_i)$.
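The finite-difference formula for the joint pmf can be sketched in a few lines. The example below is an illustration under stated assumptions: it uses a trivariate Clayton copula, which is not one of the families examined in this talk, because it has a simple closed form; the NB parameters in `margins` and the value theta = 1.5 are hypothetical.

```python
import math
from functools import lru_cache
from itertools import product

def nb_pmf(n, psi, eta):
    """Pr(N = n) for N ~ NB(psi, eta)."""
    return math.exp(math.lgamma(eta + n) - math.lgamma(eta) - math.lgamma(n + 1)
                    - eta * math.log1p(psi)
                    + n * (math.log(psi) - math.log1p(psi)))

@lru_cache(maxsize=None)
def nb_cdf(n, psi, eta):
    """Pr(N <= n); equals 0 for n < 0."""
    return sum((nb_pmf(k, psi, eta) for k in range(n + 1)), 0.0)

def clayton(u, theta):
    """Clayton copula C(u_1, ..., u_d), theta > 0 (assumed for illustration)."""
    if min(u) <= 0.0:
        return 0.0
    s = sum(v ** (-theta) for v in u) - (len(u) - 1)
    return s ** (-1.0 / theta)

def independence(u, theta=None):
    """Product copula, used as a sanity check."""
    return math.prod(u)

def joint_pmf(n, margins, copula, theta=None):
    """Joint pmf as the signed sum of the copula over the 2^d cell corners."""
    total = 0.0
    for l in product((0, 1), repeat=len(n)):
        u = [nb_cdf(nj - lj, psi, eta)
             for nj, lj, (psi, eta) in zip(n, l, margins)]
        total += (-1) ** sum(l) * copula(u, theta)
    return total

# Hypothetical (psi, eta) pairs for the three claim types.
margins = [(0.2, 0.5), (0.3, 1.0), (0.25, 0.4)]
grid = list(product(range(13), repeat=3))
probs = [joint_pmf(n, margins, clayton, theta=1.5) for n in grid]
print(sum(probs), min(probs))
```

Summing the signed corner values over a grid telescopes to the copula evaluated at the upper corner, so the total is close to 1 once the grid covers the bulk of each margin.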

Families of copulas examined

The class of mixtures of max-id copulas:
- extends the construction of the Archimedean copula by mixing bivariate max-id copulas
- a multivariate distribution, say $H$, is max-id if $H^{\gamma}$ is a CDF for all $\gamma > 0$, Joe (99)
- captures global dependence together with pairwise dependence
- the copula appears complex but has a manageable explicit form:

$$C(u_1, u_2, u_3) = \varphi\left(-\sum_{j<k} \log R_{jk}\left(e^{-\nu_j \varphi^{-1}(u_j)}, e^{-\nu_k \varphi^{-1}(u_k)}\right) + \sum_{j=1}^{3} \omega_j \nu_j \varphi^{-1}(u_j)\right)$$

Joe and Hu (997)

Families of copulas examined - continued

The class of elliptical copulas:
- has been popular because the copula is straightforward to calculate while still allowing flexible correlation structures
- we considered both Gaussian and t copulas, and chose the Gaussian as the better model of the two (based on our empirical data)
- Gaussian copula:

$$C(u_1, u_2, u_3) = \Phi_{\Sigma}\big(\Phi^{-1}(u_1), \Phi^{-1}(u_2), \Phi^{-1}(u_3)\big)$$

- to circumvent the problem often encountered when using copulas for multivariate discrete data, we construct a continuous extension of the discrete margins using jitters

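Continuous extension with jitters

Define $\tilde{N}_{ij} = N_{ij} - U_{ij}$ for $j \in \{1, 2, 3\}$, where $U_{ij} \sim \text{Uniform}(0, 1)$.

The joint pdf of the jittered counts for the $i$th policyholder, $\tilde{N}_i = (\tilde{N}_{i1}, \tilde{N}_{i2}, \tilde{N}_{i3})$, may be expressed as:

$$\tilde{f}_i(\tilde{n}_{i1}, \tilde{n}_{i2}, \tilde{n}_{i3} \mid x_i) = c\big(\tilde{F}_1(\tilde{n}_{i1} \mid x_{i1}), \tilde{F}_2(\tilde{n}_{i2} \mid x_{i2}), \tilde{F}_3(\tilde{n}_{i3} \mid x_{i3}) \mid x_i; \Theta\big) \prod_{j=1}^{3} \tilde{f}_j(\tilde{n}_{ij} \mid x_{ij})$$

Retrieve the joint pmf of $(N_{i1}, N_{i2}, N_{i3})$ by averaging over the jitters:

$$f_i(n_{i1}, n_{i2}, n_{i3} \mid x_i) = E_{U_i}\left[ c\big(\tilde{F}_1(n_{i1} - U_{i1} \mid x_{i1}), \tilde{F}_2(n_{i2} - U_{i2} \mid x_{i2}), \tilde{F}_3(n_{i3} - U_{i3} \mid x_{i3}) \mid x_i; \Theta\big) \prod_{j=1}^{3} \tilde{f}_j(n_{ij} - U_{ij} \mid x_{ij}) \right]$$

Based on the relations:

$$\tilde{F}_j(\tilde{n}_{ij} \mid x_{ij}) = F_j([\tilde{n}_{ij}] \mid x_{ij}) + (\tilde{n}_{ij} - [\tilde{n}_{ij}])\, f_j([\tilde{n}_{ij} + 1] \mid x_{ij})$$

$$\tilde{f}_j(\tilde{n}_{ij} \mid x_{ij}) = f_j([\tilde{n}_{ij} + 1] \mid x_{ij})$$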
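The jitter relations can be checked numerically. The sketch below is a bivariate illustration under stated assumptions: it swaps in the Farlie-Gumbel-Morgenstern (FGM) copula, which is not used in this talk, because both its cdf and its density are available in closed form; the NB parameters and theta = 0.8 are hypothetical. It averages the copula density over a grid of jitter values and compares the result with the pmf obtained from finite differences of the copula cdf.

```python
import math
from functools import lru_cache

def nb_pmf(n, psi, eta):
    """Pr(N = n) for N ~ NB(psi, eta)."""
    return math.exp(math.lgamma(eta + n) - math.lgamma(eta) - math.lgamma(n + 1)
                    - eta * math.log1p(psi)
                    + n * (math.log(psi) - math.log1p(psi)))

@lru_cache(maxsize=None)
def nb_cdf(n, psi, eta):
    """Pr(N <= n); equals 0 for n < 0."""
    return sum((nb_pmf(k, psi, eta) for k in range(n + 1)), 0.0)

def jitter_cdf(x, psi, eta):
    """F~(x) = F([x]) + (x - [x]) f([x + 1]): linear interpolation of the cdf."""
    k = math.floor(x)
    return nb_cdf(k, psi, eta) + (x - k) * nb_pmf(k + 1, psi, eta)

def jitter_pdf(x, psi, eta):
    """f~(x) = f([x + 1])."""
    return nb_pmf(math.floor(x + 1), psi, eta)

def fgm_cdf(u, v, theta):
    """FGM copula cdf, |theta| <= 1 (assumed for illustration)."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def fgm_density(u, v, theta):
    """FGM copula density."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)

def pmf_by_jitter_average(n1, n2, m1, m2, theta, grid=8):
    """Average c(F~1, F~2) f~1 f~2 over a midpoint grid of jitters in (0, 1)^2."""
    total = 0.0
    for a in range(grid):
        for b in range(grid):
            x1 = n1 - (a + 0.5) / grid
            x2 = n2 - (b + 0.5) / grid
            total += (fgm_density(jitter_cdf(x1, *m1), jitter_cdf(x2, *m2), theta)
                      * jitter_pdf(x1, *m1) * jitter_pdf(x2, *m2))
    return total / grid ** 2

def pmf_by_differences(n1, n2, m1, m2, theta):
    """Rectangle probability from the copula cdf at the four cell corners."""
    hi1, lo1 = nb_cdf(n1, *m1), nb_cdf(n1 - 1, *m1)
    hi2, lo2 = nb_cdf(n2, *m2), nb_cdf(n2 - 1, *m2)
    return (fgm_cdf(hi1, hi2, theta) - fgm_cdf(lo1, hi2, theta)
            - fgm_cdf(hi1, lo2, theta) + fgm_cdf(lo1, lo2, theta))

# Hypothetical (psi, eta) pairs for two claim types.
m1, m2, theta = (0.2, 0.5), (0.3, 1.0), 0.8
for n1 in range(3):
    for n2 in range(3):
        print(n1, n2,
              pmf_by_jitter_average(n1, n2, m1, m2, theta),
              pmf_by_differences(n1, n2, m1, m2, theta))
```

For the FGM copula the integrand is bilinear in the two jitters, so the midpoint grid reproduces the expectation up to rounding. For a Gaussian copula the expectation would instead have to be approximated, for example by simulating the jitters.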

Multivariate NB-I with common shocks

Columns: Bodily Injury (Estimate, t-ratio), Own Damage (Estimate, t-ratio), Property Damage (Estimate, t-ratio)

intercept      -.7-8.9 -. -8.80 -. -8.8
young          0.7.9 0.. 0.87.7
lowncd         0.78.97 0.7.0 0.0.7
vage           -0.0-0. 0.0 0. -0.0-0.
vage × vage    0.00 0. -0.0 -.0 0.00.
private        -0.90 -.87 0.0.7 -.09 -.00
vlux           0.. 0.099 0. -.9 -.9
smallcap       -0.0-0.8-0.77 -. -0.0 -.9

               Estimate   9% CI
lnλ            -.79       (-., -.)
lnλ            -.709      (-.899, -.8)
lnλ            -.7        (-.7, -.08)
lnσ            -.97       (-7.87, -.007)

Goodness-of-Fit
Loglikelihood  -9.9
AIC            909.7
BIC            9.7

Multivariate NB-I using mixtures of max-id copulas

Columns: Bodily Injury (Estimate, t-ratio), Own Damage (Estimate, t-ratio), Property Damage (Estimate, t-ratio)

NB-I
intercept      -.9 -. -.87 -. -. -.70
young          0.8.9 0.7. 0.0.7
lowncd         0..8 0.8.08 0..78
vage           -0.07 -. -0.0 -.0-0.00 -.0
vage × vage    0.00. 0.00 0.79 0.00.09
private        -0. -.8 0..9-0.70 -.70
vlux           0.07 0.08 0.0 0.7-0.7 -.8
smallcap       -0. -. -0.8 -.0-0.77 -.

               Estimate   9% CI
lnσ            -.9        (-.0, -.)
lnσ            -.         (-., -.)
lnσ            -.9        (-.7, -.08)
θ              .9         (.77,.8)
θ              .7         (.89,.)
θ              .7         (.,.)

Goodness-of-Fit
Loglikelihood  -87.
AIC            8.
BIC            88.

Multivariate NB-II using mixtures of max-id copulas

Columns: Bodily Injury (Estimate, t-ratio), Own Damage (Estimate, t-ratio), Property Damage (Estimate, t-ratio)

NB-II
intercept      -.8 -.87 -.8 -.0 -.9 -.9
young          0.0. 0.. 0.8.
lowncd         0.0.0 0.00.8 0.09.
vage           -0.07 -.98-0.0 -.7-0.0-0.8
vage × vage    0.00. 0.00 0. 0.00 0.80
private        -0. -.8 0..80-0.7 -.8
vlux           -0.0-0.7-0.00-0.0-0. -.00
smallcap       -0.9 -.7-0. -. -0.89 -.

               Estimate   9% CI
lnσ            0.9        (-0.07, 0.8)
lnσ            -0.9       (-0.79, 0.)
lnσ            0.         (-0.09,.9)
θ              .          (.79,.8)
θ              .80        (.99,.)
θ              .          (.,.07)

Goodness-of-Fit
Loglikelihood  -8.
AIC            8.7
BIC            88.8

Multivariate NB-I using Gaussian copula

Columns: Bodily Injury (Estimate, t-ratio), Own Damage (Estimate, t-ratio), Property Damage (Estimate, t-ratio)

NB-I
intercept      -.0 -.7 -.97 -. -.0 -.
young          0.9.8 0.8.7 0.78.0
lowncd         0.78.70 0.. 0.8.8
vage           -0.07 -.0-0. -.00-0.09 -.
vage × vage    0.00. 0.00 0.7 0.00.0
private        -0.90 -. 0.7. -0.8 -.79
vlux           0.07 0.08 0.007 0.0-0. -.88
smallcap       -0. -.9-0.9 -.88-0.8 -.8

               Estimate   9% CI
lnσ            -.90       (-.77, -.)
lnσ            -.998      (-.80, -.7)
lnσ            -.99       (-.9, -.00)
ρ              0.798      ( 0.770, 0.8)
ρ              0.80       ( 0.79, 0.8)
ρ              0.8        ( 0., 0.)

Goodness-of-Fit
Loglikelihood  -9.8
AIC            87.
BIC            88.0

Multivariate NB-II using Gaussian copula

Columns: Bodily Injury (Estimate, t-ratio), Own Damage (Estimate, t-ratio), Property Damage (Estimate, t-ratio)

NB-II
intercept      -.9 -. -.9 -.0 -.8 -.
young          0.0.99 0.8. 0.8.
lowncd         0.8. 0.0.0 0..8
vage           -0.07 -.00-0. -.89-0.08 -.
vage × vage    0.00.9 0.00 0.9 0.00.0
private        -0. -. 0.. -0.9 -.87
vlux           -0.0-0. -0.0-0.0-0.70 -.98
smallcap       -0.0 -.7-0. -.0-0.879 -.7

               Estimate   9% CI
lnσ            -0.7       (-0.9, 0.9)
lnσ            -0.7       (-., 0.0)
lnσ            -0.0       (-.,.00)
ρ              0.799      (0.77, 0.87)
ρ              0.80       (0.770, 0.88)
ρ              0.8        (0., 0.7)

Goodness-of-Fit
Loglikelihood  -9.8
AIC            8.
BIC            88.8

Predictive applications

This application examines the financial losses from third-party bodily injury, own damage, and third-party property damage under a comprehensive coverage.

For demonstration purposes, we look at the variable

$$L_i = N_{i1} + N_{i2} + N_{i3},$$

which is the basis of the premium for risk class $i$.

Five hypothetical risk classes: ranked from low to high risk, they are Excellent, Very Good, Good, Fair, and Poor. For example, the policyholders in class Excellent are older than, have an NCD score above 0, and drive a 0-year old private small luxury car.

Hypothetical risk classification profile

Columns: young, lowncd, vage, private, vlux, smallcap

Excellent    0 0 0
Very Good    0 0
Good         0 0 0
Fair         0 0 0 0
Poor         0 0 0 0

Prediction estimates of claim frequency

       Excellent                            Very Good
       MVNB     MaxID    Gaussian Product   MVNB     MaxID    Gaussian Product
ooo    0.90     0.99     0.97     0.97      0.8      0.907    0.899    0.889
oo+    0.00     0.000    0.008    0.00      0.009    0.009    0.008    0.0
o+o    0.0      0.0      0.07     0.00      0.07     0.08     0.08     0.098
+oo    0.0      0.009    0.00     0.08      0.078    0.0      0.0      0.0
o++    0.009    0.000    0.000    0.000     0.00     0.000    0.000    0.0009
+o+    0.07     0.008    0.009    0.000     0.0      0.0070   0.007    0.0009
++o    0.07     0.0      0.0      0.00      0.0      0.00     0.0      0.00
+++    0.007    0.00     0.00     0.0000    0.00     0.0      0.07     0.0000

       Good                                 Fair
       MVNB     MaxID    Gaussian Product   MVNB     MaxID    Gaussian Product
ooo    0.8      0.87     0.88     0.7       0.778    0.7      0.799    0.7089
oo+    0.0      0.0098   0.0      0.080     0.0      0.0      0.0      0.00
o+o    0.0989   0.07     0.07     0.98      0.089    0.088    0.07     0.
+oo    0.000    0.00     0.0      0.08      0.0      0.00     0.0      0.088
o++    0.00     0.00     0.009    0.000     0.00     0.008    0.00     0.0078
+o+    0.009    0.0079   0.007    0.00      0.08     0.08     0.09     0.000
++o    0.0      0.0      0.08     0.0097    0.09     0.0      0.00     0.0
+++    0.00     0.0      0.090    0.000     0.00     0.07     0.00     0.0009

       Poor
       MVNB     MaxID    Elliptical Product
ooo    0.78     0.9      0.9        0.90
oo+    0.070    0.088    0.0        0.09
o+o    0.087    0.08     0.077      0.
+oo    0.07     0.07     0.0        0.0
o++    0.0      0.08     0.00       0.07
+o+    0.08     0.07     0.09       0.0
++o    0.0      0.0      0.078      0.09
+++    0.0070   0.0      0.078      0.000

Mean and variance of total claims by risk class

Columns: $E(L_i)$ and $Var(L_i)$ for each of Excellent, Very Good, Good, Fair, Poor

MVNB        0.0 0. 0.9 0. 0.0 0.9 0. 0.9 0. 0.90
MaxID-I     0.087 0.78 0.8 0.9 0.7 0.8 0.0.00 0..
MaxID-II    0.08 0. 0. 0.9 0.77 0.0 0.8.0 0..70
Gauss-I     0.078 0. 0. 0.9 0. 0.08 0.0 0.87 0.0.0
Gauss-II    0.07 0. 0. 0.7 0.7 0. 0.0 0.8 0..7
Indep-I     0.08 0.08 0. 0.7 0.80 0.9 0. 0.77 0. 0.7
Indep-II    0.079 0.080 0. 0.9 0.80 0.9 0. 0.80 0.0 0.0

Predictive distributions for various models

[Figure: predictive distributions by risk class (Excellent, Very Good, Good, Fair, Poor) in four panels: MaxID Copula NB-I, MaxID Copula NB-II, Elliptical Copula NB-I, Elliptical Copula NB-II. Horizontal axis: Expected Count. Vertical axis: Percent of Total.]

Concluding remarks

We considered alternative approaches to constructing multivariate count regression models based on negative binomial distributions.

Trivariate negative binomial models were proposed to accommodate the dependence among three types of claims: third-party bodily injury, own damage, and third-party property damage.

Copulas provide the flexibility to model various dependence structures and to separate the dependence from characteristics peculiar to the margins, such as thickness of tails.

In contrast, the class of multivariate negative binomial models based on common shocks relies on the additivity of the NB-I distribution and requires a common dispersion for all marginals.

The better fit in the empirical analysis supports the superiority of the copula approach.
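The additivity that the common-shock construction relies on can be illustrated numerically: a sum of independent NB counts sharing the same $\psi$ is again NB with the same $\psi$ and the $\eta$ parameters added, while a sum with unequal $\psi$ is not NB. The sketch below checks both statements by direct convolution. All parameter values are hypothetical and chosen only for illustration.

```python
import math

def nb_pmf(n, psi, eta):
    """Pr(N = n) for N ~ NB(psi, eta)."""
    return math.exp(math.lgamma(eta + n) - math.lgamma(eta) - math.lgamma(n + 1)
                    - eta * math.log1p(psi)
                    + n * (math.log(psi) - math.log1p(psi)))

K = 40  # compare pmfs on 0..K-1

def nb_vec(psi, eta):
    """pmf of NB(psi, eta) on 0..K-1."""
    return [nb_pmf(n, psi, eta) for n in range(K)]

def convolve(p, q):
    """pmf of the sum of two independent counts, truncated to 0..K-1."""
    out = [0.0] * K
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            if a + b < K:
                out[a + b] += pa * qb
    return out

def max_gap(p, q):
    """Largest pointwise difference between two pmfs."""
    return max(abs(a - b) for a, b in zip(p, q))

# Common psi: three independent shocks add up to NB(psi, 0.2 + 0.1 + 0.5).
psi = 0.3
total = convolve(convolve(nb_vec(psi, 0.2), nb_vec(psi, 0.1)), nb_vec(psi, 0.5))
gap_common = max_gap(total, nb_vec(psi, 0.8))

# Unequal psi: the sum has mean 0.8 and variance 1.2, but it differs from
# the NB(0.5, 1.6) distribution that matches these two moments.
mixed = convolve(nb_vec(0.2, 1.0), nb_vec(0.6, 1.0))
gap_mixed = max_gap(mixed, nb_vec(0.5, 1.6))

print(gap_common, gap_mixed)
```

Under NB-I, $\psi = \sigma^2$, so a shared $\psi$ across the shocks is the same requirement as a common dispersion for all marginals.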

- Thank you -