Generalized linear mixed models (GLMMs) for dependent compound risk models

Generalized linear mixed models (GLMMs) for dependent compound risk models Emiliano A. Valdez joint work with H. Jeong, J. Ahn and S. Park University of Connecticut 52nd Actuarial Research Conference Georgia State University, Atlanta, Georgia 26-29 July 2017 Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 1 / 22

Outline Introduction Data structure The frequency-severity model Exponential family Generalized linear mixed models Compound sum Model specifications Negative binomial for frequency Gamma for average severity Properties: mean and variance Preliminary investigation Data estimation Estimates for frequency Estimates for average severity Tweedie results Validation results Conclusion Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 2 / 22

Introduction Data structure Data structure Policyholder i is followed over time t = 1,..., T i years, where T i is at most 9 years. Unit of analysis it a registered vehicle insured i over time t (year) For each it, could have several claims, k = 0, 1,..., n it Have available information on: number of claims n it, amount of claim c itk, exposure e it and covariates (explanatory variables) x it covariates often include age, gender, vehicle type, driving history and so forth We will model the pair (n it, c it ) where is the observed average claim size. 1 n it c itk, n it > 0 c it = n it k=1 0, n it = 0 Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 3 / 22

The frequency-severity model The frequency-severity model Traditional to predict/estimate insurance claims distributions: Cost of Claims = Frequency Average Severity The joint density of the number of claims and the average claim size can be decomposed as f(n, C x) = f(n x) f(c N, x) joint = frequency conditional severity. This natural decomposition allows us to investigate/model each component separately and it does not preclude us from assuming N and C are independent. For purposes of notation, we will use S = NC to denote the aggregate claims. Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 4 / 22

The frequency-severity model Exponential family Exponential family We say Y comes from an exponential dispersion family if its density has the form [ ] yγ ψ(γ) f(y) = exp + c(y; τ). τ where γ and ψ are location and scale parameters, respectively, and ψ(γ) and c(y; τ) are known functions. The following well-known relations hold for these distributions: mean: µ = E[Y ] = ψ (γ) variance: V ar(y ) = τψ (γ) = τv (µ) See Nelder and Wedderburn (1972) Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 5 / 22

The frequency-severity model Exponential family Examples of members of this family Normal N(µ, σ 2 ) with γ = µ and τ = σ 2, V (µ) = 1 Gamma(α, β) with γ = β/α and τ = 1/α, V (µ) = µ 2 Inverse Gaussian(α, β) with γ = 1 2 β2 /α 2 and τ = β/α 2, V (µ) = µ 3 Poisson(µ) with γ = log µ and τ = 1, V (µ) = 1, V (µ) = µ Binomial(m, p) with γ = log [p/(1 p)] and τ = 1, V (µ) = µ(1 µ/m) Negative Binomial(r, p) with γ = log(1 p) and τ = 1, V (µ) = µ(1 + µ/r) Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 6 / 22

The frequency-severity model Exponential family The link function and linear predictors The linear predictor is a function of the covariates, or predictor variables. The link function connects the mean of the response to this linear predictor in the form η = g(µ). g(µ) is called the link function with µ = E(Y ) = linear predictor. The transformed mean follows a linear model as η = x iβ = β 0 + β 1 x i1 + + β p x ip with p predictor variables (or covariates). You can actually invert µ = g 1 (x i β) = g 1 (η). Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 7 / 22

The frequency-severity model Generalized linear mixed models Generalized linear mixed models GLMMs extend GLMs by allowing for random, or subject-specific, effects in the linear predictor which reflects the idea that there is a natural heterogeneity across subjects. For insurance applications, our subject i is usually a policyholder observed for a period of T i periods. Given the vector b with the random effects for each subject i, Y it belongs to the exponential family: ( ) yit γ it ψ(γ it ) f(y it R i, β, τ) = exp + c(y it, τ) τ with the following conditional relations: mean: µ it = Ey it R i = ψ (γ it ) variance: V ar(y it R i ) = τψ (γ it ) = τv (µ it ) Link function: g(µ it ) = x it β + z it R i Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 8 / 22

Compound sum Compound sum We can define the compound sum as S it = N it j=1 C itj = N it C it to represent the aggregate claims. Suppressing the subscripts, the unconditional mean of S can be expressed as E[S x] = E[NC x] = E[NE[C N, u] x] = E[Ng 1 C (x β + z u + θn) x] and the unconditional variance as V ar(s x) = V ar(e[nc N, x]) + E[V ar(nc N, x)] = V ar(ne[c N, x] x) + E[N 2 V ar(c N, x) x] = V ar(ng 1 C (x β + z u + θn) x) +E[N 2 τv (g 1 C (x β + z u + θn)) x] For aggregate claims for calendar year t alone, we can define M S t = N it C it, i=1 where here M is the total policies for each year. Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 9 / 22

Model specifications Model specifications We calibrated models with the following specifications: For the count of claims (frequency): negative binomial for our baseline model more suitable than the traditional Poisson model because it can handle overdispersion and individual unobserved effects For the average severity of claims: Gamma For the random effects for GLMMs: Normal with mean zero and unknown variances different variances for frequency and claim severity The Tweedie model is a special case of the GLM family and it has mass at zero (avoids modeling frequency and severity separately). for some case, its density has no explicit form, but the variance function has the form V (µ) = µ p e.g. compound Poisson Gamma with 1 < p < 2. Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 10 / 22

Model specifications Negative binomial for frequency Negative binomial for frequency Starting with the frequency component, we have the following model specification: where N b indep. NB(νe b, r) with density f N (N b) ( ) ( ) n + r 1 r r ( ) νe b n f N (n b) = n r + νe b r + νe b. Based on the log-link function, we have ν = exp(x α) where α is a p 1 vector of regression coefficients. The following relations hold: ( ) E[N b] = e x α+b and V ar(n b) = e x α+b 1 + e x α+b /r. Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 11 / 22

Model specifications Gamma for average severity Gamma for average severity For the average severity component, given N > 0, we have the following model specification: C N, u indep. Gamma(µe u, φ/n) with density f C N (c n, u) where f C N (c n, u) = ( ) 1 n n/φ ( Γ(n/φ) µe u c n/φ 1 exp nc ) φ µe u. φ Additionally, we introduce the number of claims n as a covariate with coefficient θ. With this specification, the following relations hold: E[C N, u] = e x β+θn+u and V ar(c N, u) = e 2(x β+θn+u) φ/n. Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 12 / 22

Model specifications Properties: mean and variance Mean and variance of aggregate claims It is interesting to note that in the case where we have negative binomial for frequency and gamma for average severity, we find that E[S x] = µνe[[1 (νe b /r)(e θ 1)] r 1 ]e σ2 u /2 and the unconditional variance as ( V ar(s x) = µ 2 e σ2 u φe[νe b [1 (νe b /r)(e 2θ 1)] r 1 ]e σ2 u + 1 4 E[ν2 e 2b (1 + 1/r)[1 (νe b /r)(e 2θ 1)] r 2 ]e σ2 u E[νe b [1 (νe b /r)(e θ 1)] r 1 ] 2) Note that in the case where b = 0 and u = 0, no random effects and we get Garrido, et al. (2016). In addition, if θ = 0, we have the independent GLM. Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 13 / 22

Preliminary investigation Goodness-of-fit test for the frequency component Count Observed Poisson NegBin 0 148198 147608.6 148193.5 1 12788 13895.4 12809.9 2 1109 654 1077.6 3 77 20.5 89.8 4 5 0.5 7.5 5 2 0 0.6 χ 2 104550.3 104079.2 Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 14 / 22

Preliminary investigation Q-Q plots of fitting the gamma to average severity Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 15 / 22

Data estimation Observable policy characteristics used as covariates Categorical Description Proportions variables VehType Type of insured vehicle: Car 99.27% Motorbike 0.47% Others 0.27% Gender Insured s sex: Male = 1 80.82% Female = 0 19.18% Cover Code Type of insurance cover: Comprehensive = 1 78.65% Others = 0 21.35% Continuous Minimum Mean Maximum variables VehCapa Insured vehicle s capacity in cc 10.00 1587.44 9996.00 VehAge Age of vehicle in years -1.00 6.71 48.00 IssAge The policyholder s issue age 18.00 44.46 99.00 NCD No Claim Discount in % 0.00 35.67 50.00 M = 50, 215 unique policyholders, each observed over 8 years. The aggregated total number of observations is 162,179; we observe each policyholder over 3+ years. Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 16 / 22

Data estimation Estimates for frequency Regression estimates of the negative binomial for frequency Negative binomial GLM Negative binomial GLMM Est s.e p-value Est s.e p-value (Intercept) -2.29 0.35 0.00-2.23 0.36 0.00 VTypeCar 0.23 0.20 0.25 0.23 0.20 0.25 VTypeMBike -1.81 0.49 0.00-1.79 0.49 0.00 VehCapa 0.00 0.00 0.00 0.00 0.00 0.00 VehAge -0.02 0.00 0.00-0.02 0.00 0.00 SexM 0.11 0.02 0.00 0.11 0.02 0.00 Comp 0.81 0.04 0.00 0.82 0.04 0.00 Age -0.02 0.02 0.22-0.03 0.02 0.14 Age 2 0.00 0.00 0.52 0.00 0.00 0.40 Age 3 0.00 0.00 0.94 0.00 0.00 0.83 NCD -0.01 0.00 0.00-0.01 0.00 0.00 σb 2 0.20 Loglikelihood -49486.83-49434.93 AIC 98997.65 98895.85 BIC 99087.81 99025.81 Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 17 / 22

Data estimation Estimates for average severity Regression estimates of the gamma for severity Independent GLM Dependent GLM Dependent GLMM Est s.e p-value Est s.e p-value Est s.e p-value (Intercept) 9.92 0.84 0.00 10.12 0.84 0.00 7.89 0.44 0.00 VTypeCar 0.62 0.48 0.20 0.49 0.48 0.31 0.25 0.25 0.33 VTypeMBike 3.11 1.32 0.02 2.97 1.31 0.02 2.07 0.65 0.00 VehCapa 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 VehAge -0.03 0.01 0.00-0.03 0.01 0.00-0.01 0.00 0.00 SexM 0.00 0.05 0.98 0.00 0.05 0.92-0.02 0.03 0.53 Comp 0.04 0.10 0.67 0.05 0.10 0.58 0.19 0.05 0.00 Age -0.16 0.04 0.00-0.16 0.04 0.00-0.05 0.02 0.03 Age 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 Age 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 NCD 0.00 0.00 0.00-0.01 0.00 0.00 0.00 0.00 0.00 Count -0.12 0.04 0.01 0.01 0.02 0.62 σu 2 1.02 Loglikelihood -138593.88-138575.66-133757.96 AIC 277211.75 277177.32 267543.92 BIC 277301.91 277274.99 267649.10 Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 18 / 22

Data estimation Tweedie results Regression estimates based on Tweedie Tweedie GLM Tweedie GLMM Est s.e p-value Est s.e p-value (Intercept) 7.85 1.00 0.00 7.86 4.18 0.06 VTypeCar 0.93 0.60 0.12 0.93 1.20 0.44 VTypeMBike 1.11 0.74 0.14 1.10 2.83 0.70 VehCapa 0.00 0.00 0.00 0.00 0.00 0.19 VehAge -0.05 0.01 0.00-0.05 0.02 0.04 SexM 0.11 0.06 0.06 0.11 0.42 0.79 Comp 1.02 0.10 0.00 1.02 0.23 0.00 Age -0.21 0.05 0.00-0.21 0.24 0.40 Age 2 0.00 0.00 0.00 0.00 0.00 0.42 Age 3 0.00 0.00 0.00 0.00 0.00 0.47 NCD -0.02 0.00 0.00-0.02 0.00 0.00 σr 2 1259.50 Loglikelihood -172582.32-240222.97 AIC 345188.64 480471.93 BIC 345278.79 480601.89 Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 19 / 22

Data estimation Validation results Validation measures to compare models Frequency models Average severity models Tweedie models MSE MAE Negative binomial GLM 0.28187 0.13718 Negative binomial GLMM 0.28187 0.13664 MSE MAE MAPE Independent GLM 8099.188 3517.314 4.39044 Dependent GLM 8099.480 3529.276 4.42182 Dependent GLMM 8017.858 2567.392 2.23721 MSE MAE Tweedie GLM 1984.409 479.4245 Tweedie GLMM 1983.866 479.2018 Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 20 / 22

Data estimation Validation results Comparing the Gini index values for the various models Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 21 / 22

Conclusion Concluding remarks We are still in the early stages of this work. In particular, this is an early exploration of the promise of using GLMMs to account for dependency between frequency and severity. This work hopes to extend the literature on this type of dependency: H. Jeong (2016) work on Simple Compound Risk Model with Dependent Structure Garrido, et al. (2016) Frees and Wang (2006), Frees, et al. (2011) Czado (2012) Shi, et al. (2015) Further work is needed to measure the financial impact of using GLMMs versus other models: risk classification a posteriori prediction and applications in bonus-malus systems modified insurance coverage and reinsurance applications Jeong/Ahn/Park/Valdez (U. of Connecticut) 52nd Actuarial Research Conference 26-29 July 2017 22 / 22