Multivariate negative binomial models for insurance claim counts

Similar documents
Counts using Jitters joint work with Peng Shi, Northern Illinois University

Generalized linear mixed models (GLMMs) for dependent compound risk models

Generalized linear mixed models for dependent compound risk models

Peng Shi. Actuarial Research Conference August 5-8, 2015

Ratemaking with a Copula-Based Multivariate Tweedie Model

Generalized linear mixed models (GLMMs) for dependent compound risk models

Multivariate count data generalized linear models: Three approaches based on the Sarmanov distribution. Catalina Bolancé & Raluca Vernic

GB2 Regression with Insurance Claim Severities

Marginal Specifications and a Gaussian Copula Estimation

Ratemaking application of Bayesian LASSO with conjugate hyperprior

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

A finite mixture of bivariate Poisson regression models with an application to insurance ratemaking

On a simple construction of bivariate probability functions with fixed marginals 1

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago

Modelling multivariate count data using copulas

arxiv: v1 [stat.ap] 16 Nov 2015

Modelling Dependent Credit Risks

DOCUMENT DE TREBALL XREAP Mixture of bivariate Poisson regression models with an application to insurance

A measure of radial asymmetry for bivariate copulas based on Sobolev norm

Tail negative dependence and its applications for aggregate loss modeling

DEEP, University of Lausanne Lectures on Econometric Analysis of Count Data Pravin K. Trivedi May 2005

A Goodness-of-fit Test for Copulas

Modelling Dependence with Copulas and Applications to Risk Management. Filip Lindskog, RiskLab, ETH Zürich

Notes for Math 324, Part 20

Simulating Realistic Ecological Count Data

The Credibility Estimators with Dependence Over Risks

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Risk Aggregation with Dependence Uncertainty

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Polynomial approximation of mutivariate aggregate claim amounts distribution

Bonus Malus Systems in Car Insurance

Simulating Exchangeable Multivariate Archimedean Copulas and its Applications. Authors: Florence Wu Emiliano A. Valdez Michael Sherris

On the Importance of Dispersion Modeling for Claims Reserving: Application of the Double GLM Theory

EXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY GRADUATE DIPLOMA, Statistical Theory and Methods I. Time Allowed: Three Hours

Non parametric estimation of Archimedean copulas and tail dependence. Paris, february 19, 2015.

Estimation of Copula Models with Discrete Margins (via Bayesian Data Augmentation) Michael S. Smith

A simple bivariate count data regression model. Abstract

Discrete Choice Modeling

Multivariate Distributions

A strategy for modelling count data which may have extra zeros

Statistical Models for Defective Count Data

Financial Econometrics and Volatility Models Copulas

Modelling Dropouts by Conditional Distribution, a Copula-Based Approach

Parameters Estimation Methods for the Negative Binomial-Crack Distribution and Its Application

Hybrid Copula Bayesian Networks

Modelling the risk process

The Instability of Correlations: Measurement and the Implications for Market Risk

Multivariate Distribution Models

Correlation: Copulas and Conditioning

Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011

Notes for Math 324, Part 19

Introduction to Dependence Modelling

Unconditional Distributions Obtained from Conditional Specification Models with Applications in Risk Theory

Modelling dependent yearly claim totals including zero-claims in private health insurance

Multivariate Non-Normally Distributed Random Variables

Varieties of Count Data

Contents 1. Contents

Statistical Model Of Road Traffic Crashes Data In Anambra State, Nigeria: A Poisson Regression Approach

Using copulas to model time dependence in stochastic frontier models

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013

CHAPTER 1 INTRODUCTION

Reducing Model Risk With Goodness-of-fit Victory Idowu London School of Economics

Default Priors and Effcient Posterior Computation in Bayesian

Construction and estimation of high dimensional copulas

Multivariate Normal-Laplace Distribution and Processes

Trivariate copulas for characterisation of droughts

High-Throughput Sequencing Course

On the existence of maximum likelihood estimators in a Poissongamma HGLM and a negative binomial regression model

CTDL-Positive Stable Frailty Model

MULTIDIMENSIONAL POVERTY MEASUREMENT: DEPENDENCE BETWEEN WELL-BEING DIMENSIONS USING COPULA FUNCTION

Lecture 2: Repetition of probability theory and statistics

Report and Opinion 2016;8(6) Analysis of bivariate correlated data under the Poisson-gamma model

Graduate Econometrics I: What is econometrics?

Vine copulas with asymmetric tail dependence and applications to financial return data 1. Abstract

Correlation & Dependency Structures

Hidden Markov models for time series of counts with excess zeros

Package CopulaRegression

The LmB Conferences on Multivariate Count Analysis

MODELING COUNT DATA Joseph M. Hilbe

2. Inference method for margins and jackknife.

Describing Contingency tables

STAT5044: Regression and Anova

VaR bounds in models with partial dependence information on subgroups

Lifetime Dependence Modelling using a Generalized Multivariate Pareto Distribution

Probability and Stochastic Processes

Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data. Fred Mannering University of South Florida

Calculation of Bayes Premium for Conditional Elliptical Risks

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Mixture distributions in Exams MLC/3L and C/4

Creating New Distributions

Point process with spatio-temporal heterogeneity

Copula modeling for discrete data

Nonparametric Estimation of the Dependence Function for a Multivariate Extreme Value Distribution

Stat 704 Data Analysis I Probability Review

Tail Dependence of Multivariate Pareto Distributions

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use

Some results on Denault s capital allocation rule

On the Systemic Nature of Weather Risk

Individual loss reserving with the Multivariate Skew Normal framework

The combined model: A tool for simulating correlated counts with overdispersion. George KALEMA Samuel IDDI Geert MOLENBERGHS

Transcription:

Multivariate negative binomial models for insurance claim counts Peng Shi (Northern Illinois University) and Emiliano A. Valdez (University of Connecticut) 9 November 0, Montréal, Quebec Université de Montréal Seminar

Université de Montréal Seminar 0 Background For short term insurance contracts, there are generally two major components to decompose: cost of claims = frequency severity Many believe that the frequency component: may be the more important of these two components reveals, to an extent, more of the riskiness of the policyholder Model improvements can be made on the prediction on frequency: usual accounting for heterogeneity - risk classification variables using additional information available in data recorded e.g. types of claim additional insight on the association between the types of claim Shi, P. and Valdez, E.A. /

Université de Montréal Seminar 0 The multivariate claim count data Assume we have a portfolio of m policyholders observed over a fixed time period (cross-sectional). For each policyholder, we observe a vector of claim counts expressed as (N i, N i,..., N ik ), where N ij is the claim count associated with type j, with j =,,...,k, and i =,,...,m. Each policyholder also reveals a set of observable covariates x i useful to sub-divide the portfolio into classes of risks with homogeneous characteristics. We present and compare various alternatives for modeling multivariate claim counts using classical methods of using common shocks and the modern methods of using copulas. Traditional count regression models are used in the modeling of the marginal component. Shi, P. and Valdez, E.A. /

Université de Montréal Seminar 0 Alternative multivariate models Some literature on incorporating dependence structure in a multivariate count outcomes: use of a common additive error: Kocherlakota and Kocherlakota (99), Johnson et al. (997), Winkelmann (000), Karlis and Meligkotsidou (00), and Bermúdez and Karlis (0) Poisson, zero-inflated Poisson, and negative binomial to capture overdispersion mixture model with a multiplicative error: Hausmann et al. (98), Day and Chung (99) may be restricted to non-negative covariance structure - alternatives here are Aitchison and Ho (989) and van Ophem (999) Shi, P. and Valdez, E.A. /

Université de Montréal Seminar 0 Multivariate models using copulas Construct general discrete multivariate distribution to support more complex correlation structures through the use of copulas: although believed still in its infancy, increasingly been popular applications extend to several disciplines: Economics: Prieger (00), Cameron et al. (00), Zimmer and Trivedi (00) Biostatistics: Song et al. (008), Madsen and Fang (00) Actuarial science: Purcaru and Denuit (00), Shi and Valdez (0), Shi and Valdez (0) Boucher, Denuit and Guillén (008) provides a survey of models for insurance claim counts with time dependence. Genest and Nešlehová (007) says be pre-cautious when using copulas: non-uniqueness of the copula, vague interpretation of the nature of dependence. Shi, P. and Valdez, E.A. /

Negative binomial distribution model Université de Montréal Seminar 0 N is said to follow a negative binomial distribution if its pmf may be expressed as ( ) Γ(η + n) η ( ) ψ n Pr(N = n) =, n = 0,,, Γ(η)Γ(n + ) + ψ + ψ and is denoted as N NB(ψ, η) for ψ, η > 0. Mean E(N) = ηψ and variance Var(N) = ηψ( + ψ) Unlike the Poisson distribution, the NB accommodates overdispersion via parameter ψ. As ψ 0, overdispersion vanishes and the NB converges to the Poisson distribution. Shi, P. and Valdez, E.A. /

Université de Montréal Seminar 0 Negative binomial regression models For regression, consider the mean specified in terms of covariates λ = ηψ = exp(x β), which may come in two different parameterizations: NB-I model: η = σ exp(x β) NB-II model: η = σ Both assume same mean structure but different dispersion φ such that Var(N x) = φe(n x). The NB-I model implies a constant dispersion φ = + σ. The NB-II model allows for subject heterogeneity in the dispersion φ = + σ exp(x β). See Winkelmann (008). Shi, P. and Valdez, E.A. 7 /

Université de Montréal Seminar 0 Empirical data For our empirical analysis, claims data are obtained from one randomly selected automobile insurance company in Singapore: claims experience for a single year focus on non-fleet policy considered only policyholders with comprehensive coverage Summary of response and explanatory variables Variable Description Mean StdDev Responses N number of claims of third-party bodiliy injury 0.0 0. N number of claims of own damage 0.09 0. N number of claims of third-party property damage 0.0 0.9 Covariates young equals if age is less than 0.0 0.90 lowncd equals if NCD is less than 0 0.8 0.79 vage vehicle age.98.0 private equals for private car 0.90 0.9 vlux equals for luxuary car 0.0 0. smallcap equals if vehicle capacity is small 0.0 0.07 Shi, P. and Valdez, E.A. 8 /

Goodness of fit of the marginal models Université de Montréal Seminar 0 N Observed Fitted Poisson ZIP NegBin-I NegBin-II 0 7 7.9 70. 70.9 7. 9 0.9 9.0 9. 9. 7.0 0. 0.7 0. 0.9 0.79.0. χ 8.7.7.. N Observed Fitted Poisson ZIP NegBin-I NegBin-II 0 70 79.7 70.87 70.8 70. 9..8 9...78 8.. 7...80.8.8 0.0 0. 0.0 0.8 χ 0.9.7.7. N Observed Fitted Poisson ZIP NegBin-I NegBin-II 0 70 79.88 797. 700. 700.9 7.8 7.9.7. 0.9 7.7 0.7 0.7 0. 0. 0. 0.8 χ 7.8.8 0.78 0.8 Shi, P. and Valdez, E.A. 9 /

Multivariate models considered Université de Montréal Seminar 0 Multivariate models constructed based on common shocks as follows: N i = U i + U i + U i N i = U i + U i + U i N i = U i + U i + U i Multivariate models based on copulas: with F i (n, n, n x i ) = C(F (n x i ), F (n x i ), F (n x i ) x i ;Θ) f i (n, n, n x i ) = l =0 l =0 l =0 ( ) l +l +l C(u i,l, u i,l, u i,l x i ;Θ) Shi, P. and Valdez, E.A. 0 /

Families of copulas examined Université de Montréal Seminar 0 The class of mixtures of max-id copulas: extends construction of the Archimedean copula by mixing bivariate max-id copulas a multivariate distribution, say H, is max-id if H γ is a CDF for all γ > 0, Joe (99) allows to capture a global together with pair-wise dependence the copula appears complex but in manageable explicit form: C(u, u, u ) = ϕ X j<k log R jk e ν jϕ (u j ), e ν kϕ (u k ) X j= «ω j ν j ϕ (u j ) Joe and Hu (997) Shi, P. and Valdez, E.A. /

Université de Montréal Seminar 0 Families of copulas examined - continued The class of elliptical copulas: has been popular because of straightforward calculation of the copula while at the same time, allowing for flexible correlation structures considered both Gaussian and t copulas, but ended up choosing Gaussian as a better model of the two (based on our empirical data) Gaussian copula: ) C(u, u, u ) = Φ Σ (Φ (u ), Φ (u ), Φ (u ) to circumvent the problem often encountered with using copulas for multivariate discrete data, we continuitize the discrete form using jitters Shi, P. and Valdez, E.A. /

Continuous extension with jitters Université de Montréal Seminar 0 Define Ñij = N ij U ij for j {,, } where U ij Uniform(0, ). The joint pdf of jittered counts for the ith policyholder Ñ i = (Ñi, Ñi, Ñi) may be expressed as: ef i (en i, en i, en i x i ) = c( e F (en i x i ), e F (en i x i ), e F (en i x i ) x i ;Θ) Y j= ef ij (en ij x ij ) Retrieve the joint pmf of (N i, N i, N i ) by averaging over the jitters: f i (n i, n i, n i x i ) =E Ui c( e F (n i U i x i ), e F (n i U i x i ), e F (n i U i x i ) x i ;Θ) Based on relations: Y j= «ef j (n ij U ij x ij ) F j (ñ ij x ij ) = F j ([ñ ij ] x ij ) + (ñ ij [ñ ij ])f j ([ñ ij + ] x ij ) f j (ñ ij x ij ) = f j ([ñ ij + ] x ij ) Shi, P. and Valdez, E.A. /

Multivariate NB-I with common shocks Université de Montréal Seminar 0 Bodily Injury Own Damage Property Damage Estimate t-ratio Estimate t-ratio Estimate t-ratio intercept -.7-8.9 -. -8.80 -. -8.8 young 0.7.9 0.. 0.87.7 lowncd 0.78.97 0.7.0 0.0.7 vage -0.0-0. 0.0 0. -0.0-0. vage vage 0.00 0. -0.0 -.0 0.00. private -0.90 -.87 0.0.7 -.09 -.00 vlux 0.. 0.099 0. -.9 -.9 smallcap -0.0-0.8-0.77 -. -0.0 -.9 Estimate 9% CI lnλ -.79 (-., -.) lnλ -.709 (-.899, -.8) lnλ -.7 (-.7, -.08) lnσ -.97 (-7.87, -.007) Goodness-of-Fit Loglikelihood -9.9 AIC 909.7 BIC 9.7 Shi, P. and Valdez, E.A. /

Multivariate NB-I using mixtures of max-id copulas Université de Montréal Seminar 0 Bodily Injury Own Damage Property Damage Estimate t-ratio Estimate t-ratio Estimate t-ratio NB-I intercept -.9 -. -.87 -. -. -.70 young 0.8.9 0.7. 0.0.7 lowncd 0..8 0.8.08 0..78 vage -0.07 -. -0.0 -.0-0.00 -.0 vage vage 0.00. 0.00 0.79 0.00.09 private -0. -.8 0..9-0.70 -.70 vlux 0.07 0.08 0.0 0.7-0.7 -.8 smallcap -0. -. -0.8 -.0-0.77 -. Estimate 9% CI lnσ -.9 (-.0, -.) lnσ -. (-., -.) lnσ -.9 (-.7, -.08) θ.9 (.77,.8) θ.7 (.89,.) θ.7 (.,.) Goodness-of-Fit Loglikelihood -87. AIC 8. BIC 88. Shi, P. and Valdez, E.A. /

Multivariate NB-II using mixtures of max-id copulas Université de Montréal Seminar 0 Bodily Injury Own Damage Property Damage Estimate t-ratio Estimate t-ratio Estimate t-ratio NB-II intercept -.8 -.87 -.8 -.0 -.9 -.9 young 0.0. 0.. 0.8. lowncd 0.0.0 0.00.8 0.09. vage -0.07 -.98-0.0 -.7-0.0-0.8 vage vage 0.00. 0.00 0. 0.00 0.80 private -0. -.8 0..80-0.7 -.8 vlux -0.0-0.7-0.00-0.0-0. -.00 smallcap -0.9 -.7-0. -. -0.89 -. Estimate 9% CI lnσ 0.9 (-0.07, 0.8) lnσ -0.9 (-0.79, 0.) lnσ 0. (-0.09,.9) θ. (.79,.8) θ.80 (.99,.) θ. (.,.07) Goodness-of-Fit Loglikelihood -8. AIC 8.7 BIC 88.8 Shi, P. and Valdez, E.A. /

Multivariate NB-I using Gaussian copula Université de Montréal Seminar 0 Bodily Injury Own Damage Property Damage Estimate t-ratio Estimate t-ratio Estimate t-ratio NB-I intercept -.0 -.7 -.97 -. -.0 -. young 0.9.8 0.8.7 0.78.0 lowncd 0.78.70 0.. 0.8.8 vage -0.07 -.0-0. -.00-0.09 -. vage vage 0.00. 0.00 0.7 0.00.0 private -0.90 -. 0.7. -0.8 -.79 vlux 0.07 0.08 0.007 0.0-0. -.88 smallcap -0. -.9-0.9 -.88-0.8 -.8 Estimate 9% CI lnσ -.90 (-.77, -.) lnσ -.998 (-.80, -.7) lnσ -.99 (-.9, -.00) ρ 0.798 ( 0.770, 0.8) ρ 0.80 ( 0.79, 0.8) ρ 0.8 ( 0., 0.) Goodness-of-Fit Loglikelihood -9.8 AIC 87. BIC 88.0 Shi, P. and Valdez, E.A. 7 /

Multivariate NB-II using Gaussian copula Université de Montréal Seminar 0 Bodily Injury Own Damage Property Damage Estimate t-ratio Estimate t-ratio Estimate t-ratio NB-II intercept -.9 -. -.9 -.0 -.8 -. young 0.0.99 0.8. 0.8. lowncd 0.8. 0.0.0 0..8 vage -0.07 -.00-0. -.89-0.08 -. vage vage 0.00.9 0.00 0.9 0.00.0 private -0. -. 0.. -0.9 -.87 vlux -0.0-0. -0.0-0.0-0.70 -.98 smallcap -0.0 -.7-0. -.0-0.879 -.7 Estimate 9% CI lnσ -0.7 (-0.9, 0.9) lnσ -0.7 (-., 0.0) lnσ -0.0 (-.,.00) ρ 0.799 (0.77, 0.87) ρ 0.80 (0.770, 0.88) ρ 0.8 (0., 0.7) Goodness-of-Fit Loglikelihood -9.8 AIC 8. BIC 88.8 Shi, P. and Valdez, E.A. 8 /

Predictive applications Université de Montréal Seminar 0 This application examines the financial losses of third party bodily injury, own damage, and third party property damage, in a comprehensive coverage. For demonstration purposes, we look into the variable L i = N i + N i + N i that is the basis of the premium for risk class i. Five hypothetical risk classes: Ranking from high to low risk levels, they are Excellent, Very Good, Good, Fair, and Poor. For example, the policyholders in class Excellent are older than, have a NCD score above 0, and drive a 0-year old private small luxury car. Hypothetical risk classification profile young lowncd vage private vlux smallcap Excellent 0 0 0 Very Good 0 0 Good 0 0 0 Fair 0 0 0 0 Poor 0 0 0 0 Shi, P. and Valdez, E.A. 9 /

Prediction estimates of claim frequency Université de Montréal Seminar 0 Excellent Very Good MVNB MaxID Gaussian Product MVNB MaxID Gaussian Product ooo 0.90 0.99 0.97 0.97 0.8 0.907 0.899 0.889 oo+ 0.00 0.000 0.008 0.00 0.009 0.009 0.008 0.0 o+o 0.0 0.0 0.07 0.00 0.07 0.08 0.08 0.098 +oo 0.0 0.009 0.00 0.08 0.078 0.0 0.0 0.0 o++ 0.009 0.000 0.000 0.000 0.00 0.000 0.000 0.0009 +o+ 0.07 0.008 0.009 0.000 0.0 0.0070 0.007 0.0009 ++o 0.07 0.0 0.0 0.00 0.0 0.00 0.0 0.00 +++ 0.007 0.00 0.00 0.0000 0.00 0.0 0.07 0.0000 Good Fair MVNB MaxID Gaussian Product MVNB MaxID Gaussian Product ooo 0.8 0.87 0.88 0.7 0.778 0.7 0.799 0.7089 oo+ 0.0 0.0098 0.0 0.080 0.0 0.0 0.0 0.00 o+o 0.0989 0.07 0.07 0.98 0.089 0.088 0.07 0. +oo 0.000 0.00 0.0 0.08 0.0 0.00 0.0 0.088 o++ 0.00 0.00 0.009 0.000 0.00 0.008 0.00 0.0078 +o+ 0.009 0.0079 0.007 0.00 0.08 0.08 0.09 0.000 ++o 0.0 0.0 0.08 0.0097 0.09 0.0 0.00 0.0 +++ 0.00 0.0 0.090 0.000 0.00 0.07 0.00 0.0009 Poor MVNB MaxID Elliptical Product ooo 0.78 0.9 0.9 0.90 oo+ 0.070 0.088 0.0 0.09 o+o 0.087 0.08 0.077 0. +oo 0.07 0.07 0.0 0.0 o++ 0.0 0.08 0.00 0.07 +o+ 0.08 0.07 0.09 0.0 ++o 0.0 0.0 0.078 0.09 +++ 0.0070 0.0 0.078 0.000 Shi, P. and Valdez, E.A. 0 /

Mean and variance of total claims by risk class Université de Montréal Seminar 0 Excellent Very Good Good Fair Poor E(L i ) Var(L i ) E(L i ) Var(L i ) E(L i ) Var(L i ) E(L i ) Var(L i ) E(L i ) Var(L i ) MVNB 0.0 0. 0.9 0. 0.0 0.9 0. 0.9 0. 0.90 MaxID-I 0.087 0.78 0.8 0.9 0.7 0.8 0.0.00 0.. MaxID-II 0.08 0. 0. 0.9 0.77 0.0 0.8.0 0..70 Gauss-I 0.078 0. 0. 0.9 0. 0.08 0.0 0.87 0.0.0 Gauss-II 0.07 0. 0. 0.7 0.7 0. 0.0 0.8 0..7 Indep-I 0.08 0.08 0. 0.7 0.80 0.9 0. 0.77 0. 0.7 Indep-II 0.079 0.080 0. 0.9 0.80 0.9 0. 0.80 0.0 0.0 Shi, P. and Valdez, E.A. /

Predictive distributions for various models Université de Montréal Seminar 0 Percent of Total Percent of Total MaxID Copula NB I Excellent Very Good Good Fair Poor 0. 0. 0. 0.8 Expected Count Elliptical Copula NB I Excellent Very Good Good Fair Percent of Total Percent of Total MaxID Copula NB II Excellent Very Good Good Fair Poor 0. 0. 0. 0.8 Expected Count Elliptical Copula NB II Excellent Very Good Good Fair Poor Poor 0. 0. 0. 0.8 0. 0. 0. 0.8 Expected Count Expected Count Shi, P. and Valdez, E.A. /

Concluding remarks Université de Montréal Seminar 0 We considered alternative approaches to construct multivariate count regression models based on the negative binomial distributions. The trivariate negative binomial models was proposed in order to accommodate the dependency among three types of claims: the third party bodily injury, own damage, and third-party property damage. Copulas provide flexibility to allow modeling various dependence structures, allowing to separate the effects of peculiar characteristics of the margins such as thickness of tails. In contrast, the class of multivariate negative binomial models based on common shocks rely on the additivity of the NB-I distribution and require a common dispersion for all marginals. We found that the superiority of the copula approach was supported by the better fit in the empirical analysis. Shi, P. and Valdez, E.A. /

Université de Montréal Seminar 0 - Merci bien - Shi, P. and Valdez, E.A. /