Generalized linear mixed models (GLMMs) for dependent compound risk models


Generalized linear mixed models (GLMMs) for dependent compound risk models
Emiliano A. Valdez, joint work with H. Jeong, J. Ahn and S. Park
University of Connecticut
52nd Actuarial Research Conference, Georgia State University, Atlanta, Georgia
26-29 July 2017
Jeong/Ahn/Park/Valdez (U. of Connecticut), 52nd Actuarial Research Conference, 26-29 July 2017

Outline

- Introduction
  - Data structure
- The frequency-severity model
  - Exponential family
  - Generalized linear mixed models
- Compound sum
- Model specifications
  - Negative binomial for frequency
  - Gamma for average severity
  - Properties: mean and variance
- Preliminary investigation
- Data estimation
  - Estimates for frequency
  - Estimates for average severity
  - Tweedie results
  - Validation results
- Conclusion

Data structure

- Policyholder $i$ is followed over time $t = 1, \ldots, T_i$ years, where $T_i$ is at most 9 years.
- The unit of analysis is "$it$": a registered vehicle insured $i$ over time $t$ (year).
- For each $it$, there could be several claims, $k = 0, 1, \ldots, n_{it}$.
- Available information: number of claims $n_{it}$, amount of claim $c_{itk}$, exposure $e_{it}$ and covariates (explanatory variables) $x_{it}$.
  - Covariates often include age, gender, vehicle type, driving history and so forth.
- We will model the pair $(n_{it}, c_{it})$, where

$$c_{it} = \begin{cases} \dfrac{1}{n_{it}} \sum_{k=1}^{n_{it}} c_{itk}, & n_{it} > 0 \\ 0, & n_{it} = 0 \end{cases}$$

is the observed average claim size.
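A minimal sketch of how the pair $(n_{it}, c_{it})$ could be built from claim-level records. The record layout `(i, t, amount)`, the function names and the sample figures are assumptions made for illustration; they are not taken from the study's data.

```python
from collections import defaultdict

def average_severity(claims):
    """Return {(i, t): (n_it, c_it)} from claim records (i, t, amount)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for i, t, amount in claims:
        totals[(i, t)] += amount
        counts[(i, t)] += 1
    # c_it is the total claim amount divided by the number of claims.
    return {key: (counts[key], totals[key] / counts[key]) for key in counts}

def pair_for(summary, i, t):
    """Policy-years without a recorded claim get n_it = 0 and c_it = 0."""
    return summary.get((i, t), (0, 0.0))

# Made-up claim records: (policyholder, year, claim amount).
claims = [(1, 2010, 500.0), (1, 2010, 1500.0), (2, 2011, 800.0)]
summary = average_severity(claims)
print(pair_for(summary, 1, 2010))  # (2, 1000.0)
print(pair_for(summary, 1, 2011))  # (0, 0.0)
```

The zero default in `pair_for` mirrors the second branch of the definition of $c_{it}$.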

The frequency-severity model

- The traditional way to predict/estimate insurance claims distributions:
  Cost of Claims = Frequency × Average Severity
- The joint density of the number of claims and the average claim size can be decomposed as

$$f(N, C \mid x) = f(N \mid x) \times f(C \mid N, x),$$

  that is, joint = frequency × conditional severity.
- This natural decomposition allows us to investigate/model each component separately, and it does not preclude us from assuming that $N$ and $C$ are independent.
- For purposes of notation, we will use $S = NC$ to denote the aggregate claims.

Exponential family

- We say $Y$ comes from an exponential dispersion family if its density has the form

$$f(y) = \exp\left[ \frac{y\gamma - \psi(\gamma)}{\tau} + c(y; \tau) \right],$$

  where $\gamma$ and $\tau$ are location and scale parameters, respectively, and $\psi(\gamma)$ and $c(y; \tau)$ are known functions.
- The following well-known relations hold for these distributions:
  - mean: $\mu = E[Y] = \psi'(\gamma)$
  - variance: $Var(Y) = \tau \psi''(\gamma) = \tau V(\mu)$
- See Nelder and Wedderburn (1972).

Examples of members of this family

- Normal $N(\mu, \sigma^2)$ with $\gamma = \mu$ and $\tau = \sigma^2$, $V(\mu) = 1$
- Gamma$(\alpha, \beta)$ with $\gamma = -\beta/\alpha$ and $\tau = 1/\alpha$, $V(\mu) = \mu^2$
- Inverse Gaussian$(\alpha, \beta)$ with $\gamma = -\frac{1}{2}\beta^2/\alpha^2$ and $\tau = \beta/\alpha^2$, $V(\mu) = \mu^3$
- Poisson$(\mu)$ with $\gamma = \log \mu$ and $\tau = 1$, $V(\mu) = \mu$
- Binomial$(m, p)$ with $\gamma = \log[p/(1-p)]$ and $\tau = 1$, $V(\mu) = \mu(1 - \mu/m)$
- Negative Binomial$(r, p)$ with $\gamma = \log(1-p)$ and $\tau = 1$, $V(\mu) = \mu(1 + \mu/r)$
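As a check on one entry of this list, here is a short worked example for the Poisson case. It is a sketch that uses only the exponential dispersion form and the mean and variance relations stated earlier. The Poisson density can be rewritten as

$$f(y) = \frac{\mu^y e^{-\mu}}{y!} = \exp\left[ y \log\mu - \mu - \log y! \right],$$

so that $\gamma = \log\mu$, $\psi(\gamma) = e^{\gamma}$, $\tau = 1$ and $c(y; \tau) = -\log y!$. Differentiating $\psi$ gives

$$E[Y] = \psi'(\gamma) = e^{\gamma} = \mu, \qquad Var(Y) = \tau\,\psi''(\gamma) = e^{\gamma} = \mu,$$

which recovers the variance function $V(\mu) = \mu$.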

The link function and linear predictors

- The linear predictor is a function of the covariates, or predictor variables.
- The link function connects the mean of the response to this linear predictor in the form $\eta = g(\mu)$.
- $g(\mu)$ is called the link function, with $\mu = E(Y)$ the mean of the response and $\eta$ the linear predictor.
- The transformed mean follows a linear model,

$$\eta = x_i'\beta = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},$$

  with $p$ predictor variables (or covariates).
- The link can be inverted to recover the mean: $\mu = g^{-1}(x_i'\beta) = g^{-1}(\eta)$.
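The relation between the linear predictor and the mean can be sketched in a few lines for the log link, which is the link used later for both frequency and severity. The coefficient and covariate values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def linear_predictor(x, beta):
    """eta = beta_0 + beta_1 * x_1 + ... + beta_p * x_p (beta[0] is the intercept)."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))

def log_link(mu):
    """g(mu) = log(mu)."""
    return math.log(mu)

def inverse_log_link(eta):
    """g^{-1}(eta) = exp(eta)."""
    return math.exp(eta)

beta = [-2.0, 0.1, 0.8]  # hypothetical intercept and two slopes
x = [1.0, 0.0]           # hypothetical covariate values
eta = linear_predictor(x, beta)
mu = inverse_log_link(eta)
print(eta, mu)
```

Applying `log_link` to `mu` returns `eta` again, which is the inversion described in the last bullet.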

Generalized linear mixed models

- GLMMs extend GLMs by allowing for random, or subject-specific, effects in the linear predictor, which reflects the idea that there is natural heterogeneity across subjects.
- For insurance applications, our subject $i$ is usually a policyholder observed over $T_i$ periods.
- Given the vector $R_i$ of random effects for subject $i$, $Y_{it}$ belongs to the exponential family:

$$f(y_{it} \mid R_i, \beta, \tau) = \exp\left( \frac{y_{it}\gamma_{it} - \psi(\gamma_{it})}{\tau} + c(y_{it}, \tau) \right)$$

  with the following conditional relations:
  - mean: $\mu_{it} = E[y_{it} \mid R_i] = \psi'(\gamma_{it})$
  - variance: $Var(y_{it} \mid R_i) = \tau \psi''(\gamma_{it}) = \tau V(\mu_{it})$
- Link function: $g(\mu_{it}) = x_{it}'\beta + z_{it}'R_i$

Compound sum

We can define the compound sum as

$$S_{it} = \sum_{j=1}^{N_{it}} C_{itj} = N_{it} C_{it}$$

to represent the aggregate claims. Suppressing the subscripts, the unconditional mean of $S$ can be expressed as

$$E[S \mid x] = E[NC \mid x] = E\big[N\,E[C \mid N, u] \mid x\big] = E\big[N g_C^{-1}(x'\beta + z'u + \theta N) \mid x\big]$$

and the unconditional variance as

$$\begin{aligned}
Var(S \mid x) &= Var(E[NC \mid N, x]) + E[Var(NC \mid N, x)] \\
&= Var(N\,E[C \mid N, x] \mid x) + E[N^2\,Var(C \mid N, x) \mid x] \\
&= Var\big(N g_C^{-1}(x'\beta + z'u + \theta N) \mid x\big) + E\big[N^2 \tau V\big(g_C^{-1}(x'\beta + z'u + \theta N)\big) \mid x\big].
\end{aligned}$$

For aggregate claims for calendar year $t$ alone, we can define

$$S_t = \sum_{i=1}^{M} N_{it} C_{it},$$

where $M$ is the total number of policies for each year.

Model specifications

We calibrated models with the following specifications:

- For the count of claims (frequency): negative binomial for our baseline model
  - more suitable than the traditional Poisson model because it can handle overdispersion and individual unobserved effects
- For the average severity of claims: Gamma
- For the random effects for GLMMs: Normal with mean zero and unknown variances
  - different variances for frequency and claim severity
- The Tweedie model is a special case of the GLM family and it has mass at zero (avoids modeling frequency and severity separately).
  - for some cases, its density has no explicit form, but the variance function has the form $V(\mu) = \mu^p$
  - e.g. compound Poisson-Gamma with $1 < p < 2$.

Negative binomial for frequency

Starting with the frequency component, we have the following model specification:

$$N \mid b \sim \text{indep. } NB(\nu e^{b}, r)$$

with density

$$f_N(n \mid b) = \binom{n + r - 1}{n} \left( \frac{r}{r + \nu e^{b}} \right)^{r} \left( \frac{\nu e^{b}}{r + \nu e^{b}} \right)^{n}.$$

Based on the log-link function, we have $\nu = \exp(x'\alpha)$, where $\alpha$ is a $p \times 1$ vector of regression coefficients. The following relations hold:

$$E[N \mid b] = e^{x'\alpha + b} \quad \text{and} \quad Var(N \mid b) = e^{x'\alpha + b}\left(1 + e^{x'\alpha + b}/r\right).$$
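The mean and variance relations can be checked numerically by summing over the density directly. The sketch below uses only the standard library; the values of $x'\alpha$, $b$ and $r$ are hypothetical, and truncating the sum at `n_max` is an assumption that is harmless here because the tail probabilities are negligible.

```python
import math

def nb_pmf(n, mean, r):
    """P(N = n) for a negative binomial with mean `mean` and size r."""
    log_p = (math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
             + r * math.log(r / (r + mean))
             + n * math.log(mean / (r + mean)))
    return math.exp(log_p)

def nb_moments(mean, r, n_max=2000):
    """Total probability, mean and variance by direct summation."""
    probs = [nb_pmf(n, mean, r) for n in range(n_max)]
    m1 = sum(n * p for n, p in enumerate(probs))
    m2 = sum(n * n * p for n, p in enumerate(probs))
    return sum(probs), m1, m2 - m1 ** 2

x_alpha, b, r = -2.0, 0.3, 1.5     # hypothetical values
mean = math.exp(x_alpha + b)       # E[N | b] under the log link
total, m1, var = nb_moments(mean, r)
print(total, m1, var, mean * (1 + mean / r))
```

The printed variance should agree with $e^{x'\alpha+b}(1 + e^{x'\alpha+b}/r)$ up to floating-point error.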

Gamma for average severity

For the average severity component, given $N > 0$, we have the following model specification:

$$C \mid N, u \sim \text{indep. } Gamma(\mu e^{u}, \phi/n)$$

with density

$$f_{C \mid N}(c \mid n, u) = \frac{1}{\Gamma(n/\phi)} \left( \frac{n}{\phi \mu e^{u}} \right)^{n/\phi} c^{\,n/\phi - 1} \exp\left( -\frac{n c}{\phi \mu e^{u}} \right).$$

Additionally, we introduce the number of claims $n$ as a covariate with coefficient $\theta$. With this specification, the following relations hold:

$$E[C \mid N = n, u] = e^{x'\beta + \theta n + u} \quad \text{and} \quad Var(C \mid N = n, u) = e^{2(x'\beta + \theta n + u)} \phi/n.$$
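In shape/rate terms, this density is a gamma distribution with shape $n/\phi$ and rate $n/(\phi \cdot \text{mean})$, where the mean is $e^{x'\beta + \theta n + u}$. The sketch below makes that mapping explicit and recovers the stated mean and variance. All numeric inputs are hypothetical, and the function names are introduced here for illustration.

```python
import math

def gamma_shape_rate(x_beta, theta, n, u, phi):
    """Shape and rate of the gamma law for average severity given N = n > 0."""
    mean = math.exp(x_beta + theta * n + u)
    shape = n / phi
    rate = n / (phi * mean)
    return shape, rate

def gamma_mean_var(shape, rate):
    """Mean and variance of a gamma distribution in shape/rate form."""
    return shape / rate, shape / rate ** 2

# Hypothetical inputs: x'beta = 8.0, theta = -0.1, two claims, u = 0.2, phi = 0.5.
shape, rate = gamma_shape_rate(x_beta=8.0, theta=-0.1, n=2, u=0.2, phi=0.5)
mean, var = gamma_mean_var(shape, rate)
print(shape, rate, mean, var)
```

To simulate from this distribution with the standard library, note that `random.gammavariate(alpha, beta)` takes a scale parameter, so it would be called with `shape` and `1 / rate`.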

Mean and variance of aggregate claims

It is interesting to note that in the case where we have negative binomial for frequency and gamma for average severity, writing $\mu = e^{x'\beta}$ and $\nu = e^{x'\alpha}$, we find that

$$E[S \mid x] = \mu \nu e^{\theta}\, E\left[ e^{b} \left[1 - (\nu e^{b}/r)(e^{\theta} - 1)\right]^{-r-1} \right] e^{\sigma_u^2/2}$$

and the unconditional variance as

$$\begin{aligned}
Var(S \mid x) = \mu^2 e^{\sigma_u^2} \Big( &(\phi + 1)\, e^{2\theta}\, E\left[ \nu e^{b} \left[1 - (\nu e^{b}/r)(e^{2\theta} - 1)\right]^{-r-1} \right] e^{\sigma_u^2} \\
&+ e^{4\theta}\, E\left[ \nu^2 e^{2b} (1 + 1/r) \left[1 - (\nu e^{b}/r)(e^{2\theta} - 1)\right]^{-r-2} \right] e^{\sigma_u^2} \\
&- e^{2\theta}\, E\left[ \nu e^{b} \left[1 - (\nu e^{b}/r)(e^{\theta} - 1)\right]^{-r-1} \right]^2 \Big),
\end{aligned}$$

where the expectations are taken over the random effect $b$.

- Note that in the case where $b = 0$ and $u = 0$, there are no random effects and we get Garrido, et al. (2016).
- In addition, if $\theta = 0$, we have the independent GLM.
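The negative binomial factor in the mean can be checked numerically. For $N$ with mean $m = \nu e^{b}$ and size $r$, the probability generating function is $[1 - (m/r)(z - 1)]^{-r}$, and differentiating it gives $E[N e^{\theta N}] = m e^{\theta} [1 - (m/r)(e^{\theta} - 1)]^{-r-1}$ whenever the bracket is positive. The sketch below compares this closed form with a direct sum over the density. The parameter values are hypothetical and the function names are introduced here for illustration.

```python
import math

def nb_log_pmf(n, mean, r):
    """log P(N = n) for a negative binomial with mean `mean` and size r."""
    return (math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
            + r * math.log(r / (r + mean))
            + n * math.log(mean / (r + mean)))

def tilted_mean_by_sum(mean, r, theta, n_max=5000):
    """E[N exp(theta N)] by direct summation, computed on the log scale."""
    return sum(n * math.exp(theta * n + nb_log_pmf(n, mean, r))
               for n in range(1, n_max))

def tilted_mean_closed_form(mean, r, theta):
    """mean * e^theta * [1 - (mean/r)(e^theta - 1)]^(-r-1)."""
    bracket = 1.0 - (mean / r) * (math.exp(theta) - 1.0)
    if bracket <= 0.0:
        raise ValueError("the expectation does not exist for these parameters")
    return mean * math.exp(theta) * bracket ** (-r - 1.0)

m, r = 0.18, 1.5  # hypothetical conditional mean and size
for theta in (-0.12, 0.0, 0.10):
    print(theta, tilted_mean_by_sum(m, r, theta), tilted_mean_closed_form(m, r, theta))
```

With $\theta = 0$ both functions return the plain mean $m$, which corresponds to the independent case.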

Preliminary investigation: Goodness-of-fit test for the frequency component

| Count | Observed | Poisson | NegBin |
|---|---|---|---|
| 0 | 148198 | 147608.6 | 148193.5 |
| 1 | 12788 | 13895.4 | 12809.9 |
| 2 | 1109 | 654 | 1077.6 |
| 3 | 77 | 20.5 | 89.8 |
| 4 | 5 | 0.5 | 7.5 |
| 5 | 2 | 0 | 0.6 |
| χ² | | 104550.3 | 104079.2 |

Preliminary investigation: Q-Q plots of fitting the gamma to average severity

[Figure: Q-Q plots of fitting the gamma to average severity]

Data estimation: Observable policy characteristics used as covariates

| Categorical variables | Description | Proportions |
|---|---|---|
| VehType | Type of insured vehicle: Car | 99.27% |
| | Motorbike | 0.47% |
| | Others | 0.27% |
| Gender | Insured's sex: Male = 1 | 80.82% |
| | Female = 0 | 19.18% |
| Cover Code | Type of insurance cover: Comprehensive = 1 | 78.65% |
| | Others = 0 | 21.35% |

| Continuous variables | Description | Minimum | Mean | Maximum |
|---|---|---|---|---|
| VehCapa | Insured vehicle's capacity in cc | 10.00 | 1587.44 | 9996.00 |
| VehAge | Age of vehicle in years | -1.00 | 6.71 | 48.00 |
| IssAge | The policyholder's issue age | 18.00 | 44.46 | 99.00 |
| NCD | No Claim Discount in % | 0.00 | 35.67 | 50.00 |

- $M = 50{,}215$ unique policyholders, each observed over 8 years.
- The aggregated total number of observations is 162,179; we observe each policyholder over 3+ years (162,179 / 50,215 ≈ 3.2 on average).

Data estimation: Regression estimates of the negative binomial for frequency

| | NB GLM Est | NB GLM s.e. | NB GLM p-value | NB GLMM Est | NB GLMM s.e. | NB GLMM p-value |
|---|---|---|---|---|---|---|
| (Intercept) | -2.29 | 0.35 | 0.00 | -2.23 | 0.36 | 0.00 |
| VTypeCar | 0.23 | 0.20 | 0.25 | 0.23 | 0.20 | 0.25 |
| VTypeMBike | -1.81 | 0.49 | 0.00 | -1.79 | 0.49 | 0.00 |
| VehCapa | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| VehAge | -0.02 | 0.00 | 0.00 | -0.02 | 0.00 | 0.00 |
| SexM | 0.11 | 0.02 | 0.00 | 0.11 | 0.02 | 0.00 |
| Comp | 0.81 | 0.04 | 0.00 | 0.82 | 0.04 | 0.00 |
| Age | -0.02 | 0.02 | 0.22 | -0.03 | 0.02 | 0.14 |
| Age^2 | 0.00 | 0.00 | 0.52 | 0.00 | 0.00 | 0.40 |
| Age^3 | 0.00 | 0.00 | 0.94 | 0.00 | 0.00 | 0.83 |
| NCD | -0.01 | 0.00 | 0.00 | -0.01 | 0.00 | 0.00 |
| σ_b^2 | | | | 0.20 | | |
| Loglikelihood | -49486.83 | | | -49434.93 | | |
| AIC | 98997.65 | | | 98895.85 | | |
| BIC | 99087.81 | | | 99025.81 | | |
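The AIC values in this table can be reproduced from the reported log-likelihoods with the standard definition AIC = 2k - 2·loglik. The parameter counts used below are an assumption inferred from the table: 11 regression coefficients plus the size parameter $r$ for the GLM (k = 12), and one more for the random-effect variance in the GLMM (k = 13).

```python
def aic(loglik, k):
    """Akaike information criterion for log-likelihood `loglik` and k parameters."""
    return 2 * k - 2 * loglik

# Log-likelihoods as reported in the table; k values are assumed, not stated.
aic_glm = aic(-49486.83, 12)
aic_glmm = aic(-49434.93, 13)
print(round(aic_glm, 2), round(aic_glmm, 2))
```

Both results agree with the reported AIC values to within rounding of the log-likelihood, and the GLMM has the lower AIC.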

Data estimation: Regression estimates of the gamma for severity

| | Indep. GLM Est | Indep. GLM s.e. | Indep. GLM p-value | Dep. GLM Est | Dep. GLM s.e. | Dep. GLM p-value | Dep. GLMM Est | Dep. GLMM s.e. | Dep. GLMM p-value |
|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | 9.92 | 0.84 | 0.00 | 10.12 | 0.84 | 0.00 | 7.89 | 0.44 | 0.00 |
| VTypeCar | 0.62 | 0.48 | 0.20 | 0.49 | 0.48 | 0.31 | 0.25 | 0.25 | 0.33 |
| VTypeMBike | 3.11 | 1.32 | 0.02 | 2.97 | 1.31 | 0.02 | 2.07 | 0.65 | 0.00 |
| VehCapa | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| VehAge | -0.03 | 0.01 | 0.00 | -0.03 | 0.01 | 0.00 | -0.01 | 0.00 | 0.00 |
| SexM | 0.00 | 0.05 | 0.98 | 0.00 | 0.05 | 0.92 | -0.02 | 0.03 | 0.53 |
| Comp | 0.04 | 0.10 | 0.67 | 0.05 | 0.10 | 0.58 | 0.19 | 0.05 | 0.00 |
| Age | -0.16 | 0.04 | 0.00 | -0.16 | 0.04 | 0.00 | -0.05 | 0.02 | 0.03 |
| Age^2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 |
| Age^3 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
| NCD | 0.00 | 0.00 | 0.00 | -0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Count | | | | -0.12 | 0.04 | 0.01 | 0.01 | 0.02 | 0.62 |
| σ_u^2 | | | | | | | 1.02 | | |
| Loglikelihood | -138593.88 | | | -138575.66 | | | -133757.96 | | |
| AIC | 277211.75 | | | 277177.32 | | | 267543.92 | | |
| BIC | 277301.91 | | | 277274.99 | | | 267649.10 | | |

Data estimation: Regression estimates based on Tweedie

| | Tweedie GLM Est | Tweedie GLM s.e. | Tweedie GLM p-value | Tweedie GLMM Est | Tweedie GLMM s.e. | Tweedie GLMM p-value |
|---|---|---|---|---|---|---|
| (Intercept) | 7.85 | 1.00 | 0.00 | 7.86 | 4.18 | 0.06 |
| VTypeCar | 0.93 | 0.60 | 0.12 | 0.93 | 1.20 | 0.44 |
| VTypeMBike | 1.11 | 0.74 | 0.14 | 1.10 | 2.83 | 0.70 |
| VehCapa | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.19 |
| VehAge | -0.05 | 0.01 | 0.00 | -0.05 | 0.02 | 0.04 |
| SexM | 0.11 | 0.06 | 0.06 | 0.11 | 0.42 | 0.79 |
| Comp | 1.02 | 0.10 | 0.00 | 1.02 | 0.23 | 0.00 |
| Age | -0.21 | 0.05 | 0.00 | -0.21 | 0.24 | 0.40 |
| Age^2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.42 |
| Age^3 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.47 |
| NCD | -0.02 | 0.00 | 0.00 | -0.02 | 0.00 | 0.00 |
| σ_R^2 | | | | 1259.50 | | |
| Loglikelihood | -172582.32 | | | -240222.97 | | |
| AIC | 345188.64 | | | 480471.93 | | |
| BIC | 345278.79 | | | 480601.89 | | |

Data estimation: Validation measures to compare models

| Frequency models | MSE | MAE |
|---|---|---|
| Negative binomial GLM | 0.28187 | 0.13718 |
| Negative binomial GLMM | 0.28187 | 0.13664 |

| Average severity models | MSE | MAE | MAPE |
|---|---|---|---|
| Independent GLM | 8099.188 | 3517.314 | 4.39044 |
| Dependent GLM | 8099.480 | 3529.276 | 4.42182 |
| Dependent GLMM | 8017.858 | 2567.392 | 2.23721 |

| Tweedie models | MSE | MAE |
|---|---|---|
| Tweedie GLM | 1984.409 | 479.4245 |
| Tweedie GLMM | 1983.866 | 479.2018 |
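For reference, a minimal sketch of the three validation measures under their textbook definitions. The slides do not state the exact formulas or scaling used, so these definitions are an assumption, as is the choice to skip observations with a zero actual value when computing MAPE. The sample numbers are made up.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error over observations with a nonzero actual value."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return sum(ratios) / len(ratios)

actual = [100.0, 200.0, 400.0]     # made-up held-out values
predicted = [110.0, 180.0, 400.0]  # made-up model predictions
print(mse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

MAPE is only meaningful for strictly positive targets such as average severity, which is consistent with it being reported for the severity models only.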

Data estimation: Comparing the Gini index values for the various models

[Figure: Gini index values for the various models]

Concluding remarks

- We are still in the early stages of this work. In particular, this is an early exploration of the promise of using GLMMs to account for dependency between frequency and severity.
- This work hopes to extend the literature on this type of dependency:
  - H. Jeong (2016), work on Simple Compound Risk Model with Dependent Structure
  - Garrido, et al. (2016)
  - Frees and Wang (2006), Frees, et al. (2011)
  - Czado (2012)
  - Shi, et al. (2015)
- Further work is needed to measure the financial impact of using GLMMs versus other models:
  - risk classification
  - a posteriori prediction and applications in bonus-malus systems
  - modified insurance coverage and reinsurance applications