Generalized linear mixed models for dependent compound risk models

Similar documents
Generalized linear mixed models (GLMMs) for dependent compound risk models

Generalized linear mixed models (GLMMs) for dependent compound risk models

Ratemaking application of Bayesian LASSO with conjugate hyperprior

Multivariate negative binomial models for insurance claim counts

Counts using Jitters joint work with Peng Shi, Northern Illinois University

Generalized Linear Models Introduction

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

Ratemaking with a Copula-Based Multivariate Tweedie Model

Lecture 8. Poisson models for counts

PL-2 The Matrix Inverted: A Primer in GLM Theory

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

Chapter 5: Generalized Linear Models

RMSC 2001 Introduction to Risk Management

Generalized Linear Models (GLZ)

GB2 Regression with Insurance Claim Severities

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

RMSC 2001 Introduction to Risk Management

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

STAT5044: Regression and Anova

Generalized Linear Models 1

A Supplemental Material for Insurance Premium Prediction via Gradient Tree-Boosted Tweedie Compound Poisson Models

Linear Regression Models P8111

Advanced Ratemaking. Chapter 27 GLMs

LOGISTIC REGRESSION Joseph M. Hilbe

Solutions to the Spring 2016 CAS Exam S

Non-Life Insurance: Mathematics and Statistics

Generalized Linear Models for a Dependent Aggregate Claims Model

On the Importance of Dispersion Modeling for Claims Reserving: Application of the Double GLM Theory

Tail negative dependence and its applications for aggregate loss modeling

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Outline of GLMs. Definitions

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

GLM I An Introduction to Generalized Linear Models

Linear Regression With Special Variables

Generalized logit models for nominal multinomial responses. Local odds ratios

Creating New Distributions

SB1a Applied Statistics Lectures 9-10

Peng Shi. Actuarial Research Conference August 5-8, 2015

Some explanations about the IWLS algorithm to fit generalized linear models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

GENERALIZED LINEAR MODELS Joseph M. Hilbe

A Practitioner s Guide to Generalized Linear Models

Chapter 4: Generalized Linear Models-II

Generalized Linear Models

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Introduction to General and Generalized Linear Models

Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III)

Hierarchical Hurdle Models for Zero-In(De)flated Count Data of Complex Designs

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Semiparametric Generalized Linear Models

Analytics Software. Beyond deterministic chain ladder reserves. Neil Covington Director of Solutions Management GI

Bonus Malus Systems in Car Insurance

Introduction to Generalized Linear Models

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Single-level Models for Binary Responses

Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator

Package HGLMMM for Hierarchical Generalized Linear Models

SPRING 2007 EXAM C SOLUTIONS

Generalized Linear Models I

Non-Gaussian Response Variables

Applied Generalized Linear Mixed Models: Continuous and Discrete Data

Credibility Theory for Generalized Linear and Mixed Models

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018

Generalized Linear Mixed-Effects Models. Copyright c 2015 Dan Nettleton (Iowa State University) Statistics / 58

Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized Linear Mixed Models for Count Data

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May

Optimal Design for the Rasch Poisson-Gamma Model

Product Held at Accelerated Stability Conditions. José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013

Generalized Estimating Equations

High-Throughput Sequencing Course

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Lecture 1 Introduction to Multi-level Models

Qualifying Exam in Probability and Statistics.

Contents 1. Contents

9 Generalized Linear Models

Multivariate count data generalized linear models: Three approaches based on the Sarmanov distribution. Catalina Bolancé & Raluca Vernic

A strategy for modelling count data which may have extra zeros

MODELING COUNT DATA Joseph M. Hilbe

Reparametrization of COM-Poisson Regression Models with Applications in the Analysis of Experimental Count Data

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

Generalized Linear Mixed Models in the competitive non-life insurance market

Likelihoods for Generalized Linear Models

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago

Generalized linear models

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.

where F ( ) is the gamma function, y > 0, µ > 0, σ 2 > 0. (a) show that Y has an exponential family distribution of the form

Generalized Linear Models. Kurt Hornik

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs

Advanced Quantitative Methods: limited dependent variables

Lecture 9 STK3100/4100

Regression Model Building

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Generalized linear models

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

The combined model: A tool for simulating correlated counts with overdispersion. George KALEMA Samuel IDDI Geert MOLENBERGHS

Qualifying Exam in Probability and Statistics.

11. Generalized Linear Models: An Introduction

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Transcription:

Generalized linear mixed models for dependent compound risk models Emiliano A. Valdez joint work with H. Jeong, J. Ahn and S. Park University of Connecticut ASTIN/AFIR Colloquium 2017 Panama City, Panama 21-24 August 2017 Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 1 / 22

Outline Introduction Data structure The frequency-severity model Exponential family Generalized linear mixed models Compound sum Model specifications Negative binomial for frequency Gamma for average severity Properties: mean and variance Preliminary investigation Data estimation Estimates for frequency Estimates for average severity Tweedie results Validation results Conclusion Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 2 / 22

Introduction Data structure Data structure Policyholder i is followed over time t = 1,..., T i years, where T i is at most 9 years. Unit of analysis it a registered vehicle insured i over time t (year) For each it, could have several claims, k = 0, 1,..., n it Have available information on: number of claims n it, amount of claim c itk, exposure e it and covariates (explanatory variables) x it covariates often include age, gender, vehicle type, driving history and so forth We will model the pair (n it, c it ) where is the observed average claim size. 1 n it c itk, n it > 0 c it = n it k=1 0, n it = 0 Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 3 / 22

The frequency-severity model The frequency-severity model Traditional to predict/estimate insurance claims distributions: Cost of Claims = Frequency Average Severity The joint density of the number of claims and the average claim size can be decomposed as f(n, C x) = f(n x) f(c N, x) joint = frequency conditional severity. This natural decomposition allows us to investigate/model each component separately and it does not preclude us from assuming N and C are independent. For purposes of notation, we will use S = NC to denote the aggregate claims. Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 4 / 22

The frequency-severity model Exponential family Exponential family We say Y comes from an exponential dispersion family if its density has the form [ ] yγ ψ(γ) f(y) = exp + c(y; τ). τ where γ and ψ are location and scale parameters, respectively, and ψ(γ) and c(y; τ) are known functions. The following well-known relations hold for these distributions: mean: µ = E[Y ] = ψ (γ) variance: V ar(y ) = τψ (γ) = τv (µ) See Nelder and Wedderburn (1972) Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 5 / 22

The frequency-severity model Exponential family Examples of members of this family Normal N(µ, σ 2 ) with γ = µ and τ = σ 2, V (µ) = 1 Gamma(α, β) with γ = β/α and τ = 1/α, V (µ) = µ 2 Inverse Gaussian(α, β) with γ = 1 2 β2 /α 2 and τ = β/α 2, V (µ) = µ 3 Poisson(µ) with γ = log µ and τ = 1, V (µ) = 1, V (µ) = µ Binomial(m, p) with γ = log [p/(1 p)] and τ = 1, V (µ) = µ(1 µ/m) Negative Binomial(r, p) with γ = log(1 p) and τ = 1, V (µ) = µ(1 + µ/r) Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 6 / 22

The frequency-severity model Exponential family The link function and linear predictors The linear predictor is a function of the covariates, or predictor variables. The link function connects the mean of the response to this linear predictor in the form η = g(µ). g(µ) is called the link function with µ = E(Y ) = linear predictor. The transformed mean follows a linear model as η = x iβ = β 0 + β 1 x i1 + + β p x ip with p predictor variables (or covariates). You can actually invert µ = g 1 (x i β) = g 1 (η). Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 7 / 22

The frequency-severity model Generalized linear mixed models Generalized linear mixed models GLMMs extend GLMs by allowing for random, or subject-specific, effects in the linear predictor which reflects the idea that there is a natural heterogeneity across subjects. For insurance applications, our subject i is usually a policyholder observed for a period of T i periods. Given the vector b with the random effects for each subject i, Y it belongs to the exponential family: ( ) yit γ it ψ(γ it ) f(y it R i, β, τ) = exp + c(y it, τ) τ with the following conditional relations: mean: µ it = Ey it R i = ψ (γ it ) variance: V ar(y it R i ) = τψ (γ it ) = τv (µ it ) Link function: g(µ it ) = x it β + z it R i Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 8 / 22

Compound sum Compound sum We can define the compound sum as S it = N it j=1 C itj = N it C it to represent the aggregate claims. Suppressing the subscripts, the unconditional mean of S can be expressed as E[S x] = E[NC x] = E[NE[C N, u] x] = E[Ng 1 C (x β + z u + θn) x] and the unconditional variance as V ar(s x) = V ar(e[nc N, x]) + E[V ar(nc N, x)] = V ar(ne[c N, x] x) + E[N 2 V ar(c N, x) x] = V ar(ng 1 C (x β + z u + θn) x) +E[N 2 τv (g 1 C (x β + z u + θn)) x] For aggregate claims for calendar year t alone, we can define M S t = N it C it, i=1 where here M is the total policies for each year. Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 9 / 22

Model specifications Model specifications We calibrated models with the following specifications: For the count of claims (frequency): negative binomial for our baseline model more suitable than the traditional Poisson model because it can handle overdispersion and individual unobserved effects For the average severity of claims: Gamma For the random effects for GLMMs: Normal with mean zero and unknown variances different variances for frequency and claim severity The Tweedie model is a special case of the GLM family and it has mass at zero (avoids modeling frequency and severity separately). for some case, its density has no explicit form, but the variance function has the form V (µ) = µ p e.g. compound Poisson Gamma with 1 < p < 2. Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 10 / 22

Model specifications Negative binomial for frequency Negative binomial for frequency Starting with the frequency component, we have the following model specification: where N b indep. NB(νe b, r) with density f N (N b) ( ) ( ) n + r 1 r r ( ) νe b n f N (n b) = n r + νe b r + νe b. Based on the log-link function, we have ν = exp(x α) where α is a p 1 vector of regression coefficients. The following relations hold: ( ) E[N b] = e x α+b and V ar(n b) = e x α+b 1 + e x α+b /r. Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 11 / 22

Model specifications Gamma for average severity Gamma for average severity For the average severity component, given N > 0, we have the following model specification: C N, u indep. Gamma(µe u, φ/n) with density f C N (c n, u) where f C N (c n, u) = ( ) 1 n n/φ ( Γ(n/φ) µe u c n/φ 1 exp nc ) φ µe u. φ Additionally, we introduce the number of claims n as a covariate with coefficient θ. With this specification, the following relations hold: E[C N, u] = e x β+θn+u and V ar(c N, u) = e 2(x β+θn+u) φ/n. Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 12 / 22

Model specifications Properties: mean and variance Mean and variance of aggregate claims It is interesting to note that in the case where we have negative binomial for frequency and gamma for average severity, we find that [ E[S x] = µνe [1 (νe b /r)(e θ 1)] r 1] e σ2 u/2+θ and the unconditional variance as ( [ V ar(s x) = µ 2 e σ2 u +2θ φe νe b [1 (νe b /r)(e 2θ 1)] r 1] e σ2 u [ +E ν 2 e 2b+2θ (1 + 1/r)[1 (νe b /r)(e 2θ 1)] r 2] e σ2 u [ +E νe b [1 (νe b /r)(e 2θ 1)] r 1] e σ2 u E [νe b [1 (νe b /r)(e θ 1)] r 1] ) 2 Note that in the case where b = 0 and u = 0, no random effects and we get Garrido, et al. (2016). In addition, if θ = 0, we have the independent GLM. Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 13 / 22

Preliminary investigation Goodness-of-fit test for the frequency component Count Observed Poisson Negative Binomial 0 148198 147608.6 148193.5 1 12788 13895.4 12809.9 2 1109 654 1077.6 3 77 20.5 89.8 4 5 0.5 7.5 5 2 0 0.6 χ 2 104550.3 104079.2 Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 14 / 22

Preliminary investigation Q-Q plots of fitting the gamma to average severity Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 15 / 22

Data estimation Observable policy characteristics used as covariates Categorical Description Proportions variables VehType Type of insured vehicle: Car 99.27% Motorbike 0.47% Others 0.27% Gender Insured s sex: Male = 1 80.82% Female = 0 19.18% Cover Code Type of insurance cover: Comprehensive = 1 78.65% Others = 0 21.35% Continuous Minimum Mean Maximum variables VehCapa Insured vehicle s capacity in cc 10.00 1587.44 9996.00 VehAge Age of vehicle in years -1.00 6.71 48.00 Age The policyholder s issue age 18.00 44.46 99.00 NCD No Claim Discount in % 0.00 35.67 50.00 M = 50, 215 unique policyholders, each observed over 8 years. The aggregated total number of observations is 162,179; we observe each policyholder over 3+ years. Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 16 / 22

Data estimation Estimates for frequency Regression estimates of the negative binomial for frequency Negative binomial GLM Negative binomial GLMM Estimate s.e. p-value Estimate s.e. p-value (Intercept) -4.3325 0.4071 0.0000-4.2787 0.4157 0.0000 VTypeCar 0.1902 0.1973 0.3349 0.1903 0.2000 0.3414 VTypeMBike -1.4025 0.4943 0.0045-1.3771 0.4967 0.0056 log(vehcapa) 0.3312 0.0345 0.0000 0.3322 0.0356 0.0000 VehAge -0.0162 0.0029 0.0000-0.0149 0.0029 0.0000 SexM 0.1085 0.0217 0.0000 0.1069 0.0223 0.0000 Comp 0.8128 0.0395 0.0000 0.8203 0.0399 0.0000 Age -0.0267 0.0174 0.1264-0.0317 0.0178 0.0745 Age 2 0.0003 0.0003 0.3669 0.0004 0.0003 0.2736 Age 3 0.0000 0.0000 0.7686 0.0000 0.0000 0.6629 NCD -0.0124 0.0005 0.0000-0.0117 0.0005 0.0000 σb 2 0.1931 0.4513 0.6518 Loglikelihood -49473.7233-49422.6414 AIC 98971.4466 98871.2829 BIC 99061.6022 99001.2368 Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 17 / 22

Data estimation Estimates for average severity Regression estimates of the gamma for severity Independent GLM Dependent GLM Dependent GLMM Estimate s.e. p-value Estimate s.e. p-value Estimate s.e. p-value (Intercept) 7.4159 0.9852 0.0000 7.6057 0.9772 0.0000 5.9141 0.5052 0.0000 VTypeCar -0.1772 0.4963 0.7211-0.2935 0.4951 0.5533 0.1240 0.2528 0.6237 VTypeMBike 2.9972 1.3579 0.0273 2.8745 1.3420 0.0322 2.3229 0.6491 0.0003 log(vehcapa) 0.5261 0.0870 0.0000 0.5276 0.0860 0.0000 0.3298 0.0452 0.0000 VehAge -0.0303 0.0074 0.0000-0.0303 0.0073 0.0000-0.0129 0.0035 0.0002 SexM -0.0019 0.0537 0.9713-0.0061 0.0530 0.9091-0.0193 0.0281 0.4919 Comp 0.0387 0.1002 0.6994 0.0514 0.0989 0.6036 0.1880 0.0453 0.0000 Age -0.1629 0.0423 0.0001-0.1596 0.0418 0.0001-0.0456 0.0216 0.0344 Age 2 0.0032 0.0008 0.0001 0.0032 0.0008 0.0001 0.0010 0.0004 0.0164 Age 3 0.0000 0.0000 0.0001 0.0000 0.0000 0.0001 0.0000 0.0000 0.0135 NCD -0.0050 0.0012 0.0000-0.0052 0.0012 0.0000-0.0047 0.0005 0.0000 Count -0.1250 0.0446 0.0051 0.0104 0.0232 0.6537 σu 2 1.0250 0.5000 0.0404 Loglikelihood -138626.1189-138605.4168-133760.1180 AIC 277276.2378 277236.8337 267548.2359 BIC 277366.3933 277334.5022 267653.4174 Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 18 / 22

Data estimation Tweedie results Regression estimates based on Tweedie Tweedie GLM Tweedie GLMM Estimate s.e. p-value Estimate s.e. p-value (Intercept) 3.1566 1.1189 0.0048 3.0962 5.8970 0.5996 VTypeCar 0.3362 0.5332 0.5283 0.3606 1.1259 0.7487 VTypeMBike 1.6437 0.7020 0.0192 1.6759 3.0302 0.5802 log(vehcapa) 0.8405 0.0922 0.0000 0.8464 0.6303 0.1794 VehAge -0.0478 0.0076 0.0000-0.0478 0.0242 0.0486 SexM 0.1041 0.0589 0.0775 0.1037 0.4205 0.8053 Comp 1.0102 0.1013 0.0000 1.0088 0.2363 0.0000 Age -0.2115 0.0496 0.0000-0.2120 0.2459 0.3886 Age 2 0.0039 0.0010 0.0001 0.0039 0.0048 0.4137 Age 3 0.0000 0.0000 0.0004 0.0000 0.0000 0.4637 NCD -0.0172 0.0013 0.0000-0.0172 0.0034 0.0000 σr 2 1283.0000 89.1794 0.0000 Loglikelihood -172583.9216-240643.4451 AIC 345191.8433 481312.8902 BIC 345281.9988 481442.8442 Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 19 / 22

Data estimation Validation results Validation measures to compare models Frequency models Average severity models Tweedie models MSE MAE Negative binomial GLM 0.33161 0.16997 Negative binomial GLMM 0.33170 0.16942 MSE MAE MAPE Independent GLM 8327.977 3451.512 4.09889 Dependent GLM 8325.652 3462.561 4.12698 Dependent GLMM 8303.615 2595.855 2.12260 MSE MAE Tweedie GLM 2487.862 592.0651 Tweedie GLMM 2487.878 592.2978 Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 20 / 22

Data estimation Validation results Comparing the Gini index values for the various models Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 21 / 22

Conclusion Concluding remarks This work presents the results of our exploration of the promise of using GLMMs to account for dependency between frequency and severity. This work hopes to extend the literature on this type of dependency: H. Jeong (2016) work on Simple Compound Risk Model with Dependent Structure Garrido, et al. (2016) Frees and Wang (2006), Frees, et al. (2011) Czado (2012) Shi, et al. (2015) Further work may be necessary to measure the financial impact of using GLMMs versus other models: risk classification a posteriori prediction and applications in bonus-malus systems modified insurance coverage and reinsurance applications Jeong/Ahn/Park/Valdez (U. of Connecticut) ASTIN/AFIR Colloquium 2017 21-24 August 2017 22 / 22