Ratemaking with a Copula-Based Multivariate Tweedie Model

Transcription:

Ratemaking with a Copula-Based Multivariate Tweedie Model
Joint work with Xiaoping Feng and Jean-Philippe Boucher
University of Wisconsin - Madison
CAS RPM Seminar, March 10, 2015


Insurance Claims

Two types of predictive models for insurance claims:
- Frequency-severity model: two-stage framework
- Pure premium: Tweedie model

Both belong to the GLM family and are straightforward to implement.
More details in Predictive Modeling Applications in Actuarial Science, edited by Frees, Derrig, and Meyers.

Multilevel Structure

- We often assume individual risks are independent, which is not always the case.
- We examine a specific case where dependence among risks is introduced through the multilevel structure.
- Many examples in property-casualty insurance:
  - Automobile insurance: household/fleet, multiple coverage
  - Homeowner: neighborhood, multi-peril coverage
  - Health: individual/group
  - Workers' compensation
- One can argue that dependence among risks is important for claim management.
- Our goal is to incorporate the correlation into the claims modeling framework.

Automobile Insurance

- Personal automobile insurance in Canada
- Four types of coverage:
  - Accident benefit: no-fault insurance benefit to the injured insured
  - Collision: damage to the policyholder's vehicle due to collision
  - All risk: damage to the policyholder's vehicle due to risks other than collision
  - Civil liability: bodily injury and property damage of third party
- We examine a longitudinal dataset for years 2003-2006
- Each policy could cover multiple vehicles

Table: Distribution of number of vehicles per policy

Vehicles     1       2       3       4      Total
Number       77352   10058   253     7      87670
Percentage   88.23   11.47   0.289   0.01   100

Claim Costs

(Figure not reproduced in the transcript.)

Predictors

- The data set contains basic rating variables: policyholder characteristics, driving history, vehicle characteristics
- We use binary predictors in the analysis

Table: Summary of rating variables

Variable      Description                               Mean
young         =1 if age between 16 and 25               0.021
senior        =1 if age more than 60                    0.175
marital       =1 if married                             0.731
homeowner     =1 if homeowner                           0.658
experience    =1 if more than ten years of experience   0.921
conviction    =1 if positive number of convictions      0.050
newcar        =1 if new car                             0.896
leasecar      =1 if lease car                           0.153
business      =1 if business use                        0.037
highmilage    =1 if drive more than 10,000 miles        0.704
multidriver   =1 if more than two drivers               0.063

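A Poisson sum of gamma random variables:

$$Y = (X_1 + \cdots + X_N)/\omega, \qquad N \sim \text{Poisson}(\omega\lambda), \qquad X_j \sim \text{gamma}(\alpha, \gamma) \quad (j = 1, \ldots, N)$$

The Tweedie distribution belongs to the exponential family with the reparameterizations

$$\lambda = \frac{\mu^{2-p}}{\phi(2-p)}, \qquad \alpha = \frac{2-p}{p-1}, \qquad \gamma = \phi(p-1)\mu^{p-1}$$

The density function is

$$f(y) = \exp\left[\frac{\omega}{\phi}\left(\frac{-y}{(p-1)\mu^{p-1}} - \frac{\mu^{2-p}}{2-p}\right) + S(y; \phi/\omega)\right]$$

with $E(Y) = \mu$ and $\mathrm{Var}(Y) = \frac{\phi}{\omega}\mu^{p}$.

Dispersion modeling?
- Tweedie GLM: $g_\mu(\mu) = x'\beta$
- Dispersion model: $g_\phi(\phi) = z'\eta$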
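The compound Poisson-gamma construction can be checked by simulation. The following is a minimal sketch, not the authors' code: it reads γ as the gamma scale parameter (the reading under which E(Y) = µ holds), and the function names and the parameter values µ = 2, φ = 1.5, p = 1.7 are made up for illustration.

```python
import math
import random


def tweedie_to_cpg(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to (lambda, alpha, gamma)."""
    if not 1.0 < p < 2.0:
        raise ValueError("p must lie strictly between 1 and 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson rate
    alpha = (2.0 - p) / (p - 1.0)               # gamma shape
    gamma = phi * (p - 1.0) * mu ** (p - 1.0)   # gamma scale (assumed reading)
    return lam, alpha, gamma


def poisson_draw(rng, rate):
    """Knuth's multiplication method; adequate for small rates only."""
    limit = math.exp(-rate)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulate_tweedie(rng, mu, phi, p, omega=1.0):
    """One draw of Y = (X_1 + ... + X_N) / omega with N ~ Poisson(omega * lambda)."""
    lam, alpha, gamma = tweedie_to_cpg(mu, phi, p)
    n_claims = poisson_draw(rng, omega * lam)
    total = sum(rng.gammavariate(alpha, gamma) for _ in range(n_claims))
    return total / omega


if __name__ == "__main__":
    rng = random.Random(2015)
    mu, phi, p = 2.0, 1.5, 1.7   # illustrative values, not estimates from the talk
    draws = [simulate_tweedie(rng, mu, phi, p) for _ in range(20000)]
    mean = sum(draws) / len(draws)
    var = sum((y - mean) ** 2 for y in draws) / (len(draws) - 1)
    print("sample mean:", mean, "theory:", mu)
    print("sample variance:", var, "theory:", phi * mu ** p)
    print("share of zeros:", sum(y == 0 for y in draws) / len(draws))
```

The share of exact zeros approximates exp(-ωλ), the probability of no claims, which is what gives the Tweedie its point mass at zero.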

Bivariate Tweedie

- A copula is a multivariate distribution function with uniform marginals.
- Let $U_1, \ldots, U_J$ be $J$ uniform random variables on (0,1). Their distribution function is
  $$H(u_1, \ldots, u_J) = \Pr(U_1 \le u_1, \ldots, U_J \le u_J)$$
- Consider two Tweedie marginals $Y_1$ and $Y_2$ with cdfs $F_1$ and $F_2$. Define the joint distribution using the copula $H$ such that
  $$F(y_1, y_2) = H(F_1(y_1), F_2(y_2))$$
- Use the Gaussian copula
  $$H(u_1, u_2) = \Phi_\rho(\Phi^{-1}(u_1), \Phi^{-1}(u_2))$$

The joint density is

$$f(y_1, y_2) = \begin{cases}
H(F_1(0), F_2(0)) & \text{if } y_1 = 0 \text{ and } y_2 = 0 \\
f_1(y_1)\, h_1(F_1(y_1), F_2(0)) & \text{if } y_1 > 0 \text{ and } y_2 = 0 \\
f_2(y_2)\, h_2(F_1(0), F_2(y_2)) & \text{if } y_1 = 0 \text{ and } y_2 > 0 \\
f_1(y_1)\, f_2(y_2)\, h(F_1(y_1), F_2(y_2)) & \text{if } y_1 > 0 \text{ and } y_2 > 0
\end{cases}$$

Here $h(u_1, u_2) = \partial^2 H(u_1, u_2)/\partial u_1 \partial u_2$ and $h_j(u_1, u_2) = \partial H(u_1, u_2)/\partial u_j$ for $j = 1, 2$. They have closed-form expressions for the Gaussian copula.
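The closed-form pieces for the Gaussian copula and the four-case density can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the Tweedie cdf is not evaluated here, so `make_margin` is a hypothetical stand-in margin (point mass at zero plus an exponential tail), and `H` is obtained by simple midpoint integration of `h1` rather than a dedicated bivariate normal routine.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf and inv_cdf


def h1(u1, u2, rho):
    """Partial derivative dH/du1 of the Gaussian copula."""
    z1, z2 = STD.inv_cdf(u1), STD.inv_cdf(u2)
    return STD.cdf((z2 - rho * z1) / math.sqrt(1.0 - rho * rho))


def h2(u1, u2, rho):
    """Partial derivative dH/du2, by symmetry of the Gaussian copula."""
    return h1(u2, u1, rho)


def h(u1, u2, rho):
    """Copula density d^2 H / (du1 du2)."""
    z1, z2 = STD.inv_cdf(u1), STD.inv_cdf(u2)
    det = 1.0 - rho * rho
    expo = -(rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / (2.0 * det)
    return math.exp(expo) / math.sqrt(det)


def H(u1, u2, rho, steps=2000):
    """Copula cdf, approximated by midpoint integration of h1 over (0, u1)."""
    width = u1 / steps
    return width * sum(h1((i + 0.5) * width, u2, rho) for i in range(steps))


def make_margin(zero_prob, scale):
    """Hypothetical stand-in margin (NOT Tweedie): mass at zero, exponential tail."""
    def cdf(y):
        return zero_prob + (1.0 - zero_prob) * (1.0 - math.exp(-y / scale))

    def pdf(y):
        return (1.0 - zero_prob) * math.exp(-y / scale) / scale

    return pdf, cdf


def joint_density(y1, y2, margin1, margin2, rho):
    """Four-case density for two margins with a point mass at zero."""
    f1, F1 = margin1
    f2, F2 = margin2
    if y1 == 0 and y2 == 0:
        return H(F1(0.0), F2(0.0), rho)
    if y2 == 0:
        return f1(y1) * h1(F1(y1), F2(0.0), rho)
    if y1 == 0:
        return f2(y2) * h2(F1(0.0), F2(y2), rho)
    return f1(y1) * f2(y2) * h(F1(y1), F2(y2), rho)


if __name__ == "__main__":
    m1 = make_margin(0.9, 1000.0)  # made-up zero probability and scale
    m2 = make_margin(0.8, 500.0)
    print(joint_density(0.0, 0.0, m1, m2, rho=0.4))
    print(joint_density(250.0, 0.0, m1, m2, rho=0.4))
    print(joint_density(250.0, 100.0, m1, m2, rho=0.4))
```

With ρ = 0 the pieces reduce to independence: `h` is 1, `h1(u1, u2, 0)` is `u2`, and `H(u1, u2, 0)` is `u1 * u2`.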

Let $Y_{ikjt}$ denote the insurance cost for
- Household (cluster) $i$ $(= 1, \ldots, M)$
- Vehicle $k$ $(= 1, \ldots, K_i)$
- Coverage type $j$ $(= 1, \ldots, J_i)$
- Period $t$ $(= 1, \ldots, T_i)$

For example, with $K_i = 2$, $J_i = 3$, $T_i = 4$, let

$$Y_i = (Y_{i111}, \ldots, Y_{i114}, Y_{i121}, \ldots, Y_{i124}, \ldots, Y_{i231}, \ldots, Y_{i234})$$

- Each marginal follows a Tweedie distribution
- Use the Gaussian copula to build the joint distribution
  $$F_i(y_i) = H(F(y_{i111}), \ldots, F(y_{i114}), F(y_{i121}), \ldots, F(y_{i124}), \ldots, F(y_{i231}), \ldots, F(y_{i234}))$$
- Need to specify the association matrix $R$ in the Gaussian copula

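Define $R = B \otimes P$, where

$$B = \begin{pmatrix} 1 & \delta \\ \delta & 1 \end{pmatrix} \quad \text{and} \quad P = \begin{pmatrix} P_{11} & \sigma_{12} P_{12} & \sigma_{13} P_{13} \\ \sigma_{21} P_{21} & P_{22} & \sigma_{23} P_{23} \\ \sigma_{31} P_{31} & \sigma_{32} P_{32} & P_{33} \end{pmatrix}$$

Further,

$$\sigma_{jj'} = \frac{\tau_{jj'} \sqrt{1 - \rho_j^2} \sqrt{1 - \rho_{j'}^2}}{1 - \rho_j \rho_{j'}} \quad \text{and} \quad P_{jj} = \begin{pmatrix} 1 & \rho_j & \rho_j^2 & \rho_j^3 \\ \rho_j & 1 & \rho_j & \rho_j^2 \\ \rho_j^2 & \rho_j & 1 & \rho_j \\ \rho_j^3 & \rho_j^2 & \rho_j & 1 \end{pmatrix} \quad \text{(AR(1) structure)}$$

Summary of dependence parameters:
- Between vehicles: $\delta$
- Between coverage types: $\tau_{12}, \tau_{13}, \tau_{23}$
- Temporal: $\rho_1, \rho_2, \rho_3$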
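To make the block structure concrete, here is a small sketch that builds the AR(1) block and the Kronecker product with B. It is a simplified illustration, not the full matrix from the slide: it covers a single coverage type only, so that P reduces to P_11, because the off-diagonal blocks P_jj' are not spelled out in the transcript. The helper names are hypothetical, and the values of δ and ρ are borrowed from the dependence estimates reported later purely as example inputs.

```python
def ar1_matrix(rho, periods):
    """AR(1) correlation block: entry (s, t) equals rho ** |s - t|."""
    return [[rho ** abs(s - t) for t in range(periods)] for s in range(periods)]


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [
        [a_ik * b_jl for a_ik in row_a for b_jl in row_b]
        for row_a in a
        for row_b in b
    ]


def vehicle_matrix(delta):
    """2 x 2 association matrix B between two vehicles on one policy."""
    return [[1.0, delta], [delta, 1.0]]


if __name__ == "__main__":
    delta, rho = 0.082, 0.163  # example inputs only
    P11 = ar1_matrix(rho, periods=4)
    R = kron(vehicle_matrix(delta), P11)  # 8 x 8: two vehicles, four periods
    for row in R:
        print(" ".join(f"{value:6.3f}" for value in row))
```

In the resulting matrix, the same vehicle in adjacent years has association ρ, two vehicles in the same year have association δ, and two vehicles one year apart have association δρ.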

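Composite Likelihood

- Use composite likelihood to estimate model parameters:
  $$cl_i(\theta; y) = \frac{1}{m_i - 1} \, (\text{sum of bivariate loglik})$$
- Use the inverse of the Godambe information matrix to get standard errors:
  $$G_N^{-1}(\theta) = H_N^{-1}(\theta) \, J_N(\theta) \, H_N^{-1}(\theta)$$
  where
  $$H_N(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \frac{\partial^2 cl_i(\theta; y_i)}{\partial\theta \, \partial\theta'} \quad \text{and} \quad J_N(\theta) = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial cl_i(\theta; y_i)}{\partial\theta} \frac{\partial cl_i(\theta; y_i)}{\partial\theta'}$$
- Model comparison:
  $$\text{CLAIC} = -2\,cl(\theta; y) + 2\,\mathrm{tr}(J(\theta) H(\theta)^{-1})$$
  $$\text{CLBIC} = -2\,cl(\theta; y) + \log(\dim(\theta))\,\mathrm{tr}(J(\theta) H(\theta)^{-1})$$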
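The sandwich formula and the CLAIC penalty can be illustrated with a two-parameter toy example. This is a sketch under assumptions, not the estimation code behind the talk: the per-cluster scores and Hessians below are made-up numbers, the helpers are hypothetical, and the matrix inverse is hard-coded for the 2 x 2 case to keep the example dependency-free.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Product of two small matrices given as nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def sensitivity_and_variability(scores, hessians):
    """H_N: minus the average Hessian; J_N: average outer product of scores."""
    n, dim = len(scores), len(scores[0])
    H = [[-sum(hs[r][c] for hs in hessians) / n for c in range(dim)] for r in range(dim)]
    J = [[sum(s[r] * s[c] for s in scores) / n for c in range(dim)] for r in range(dim)]
    return H, J


def godambe_inverse(H, J):
    """Sandwich H^{-1} J H^{-1} for a two-parameter model."""
    H_inv = inv2(H)
    return matmul(matmul(H_inv, J), H_inv)


def claic(cl_total, H, J):
    """CLAIC = -2 cl + 2 tr(J H^{-1}) for a two-parameter model."""
    JH = matmul(J, inv2(H))
    return -2.0 * cl_total + 2.0 * (JH[0][0] + JH[1][1])


if __name__ == "__main__":
    # Made-up per-cluster scores and Hessians, for illustration only.
    scores = [[0.4, -0.2], [-0.1, 0.3], [-0.3, -0.1]]
    hessians = [
        [[-1.0, 0.2], [0.2, -0.8]],
        [[-1.2, 0.1], [0.1, -0.9]],
        [[-0.9, 0.3], [0.3, -1.1]],
    ]
    H, J = sensitivity_and_variability(scores, hessians)
    print("Godambe inverse:", godambe_inverse(H, J))
    print("CLAIC:", claic(-120.0, H, J))
```

When J equals H, as it would for a correctly specified full likelihood, the sandwich collapses to the inverse of H and the penalty trace equals the number of parameters, so CLAIC reduces to the ordinary AIC.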

Marginal Models

- Model dispersion as well as mean
- Different predictors for each type

Table: Parameter estimates for accident benefit

Mean                            Dispersion
Variable      Est.     S.E.     Variable       Est.     S.E.
intercept     5.732    0.13     intercept      7.107    0.05
conviction    0.807    0.14     homeowner     -0.099    0.03
homeowner    -0.225    0.07     experience     0.489    0.05
experience   -0.324    0.12     multidriver   -0.237    0.06
young        -0.911    0.22     marital        0.174    0.04
senior       -0.466    0.11     senior         0.151    0.05
highmilage   -0.578    0.08
p             1.703    0.00

Dependence

- Strong association among coverage types
- Small serial correlation and cluster effect

Table: Estimates of dependence parameters

Parameter   Est.    S.E.
ρ_1         0.163   0.022
ρ_2         0.051   0.013
ρ_3         0.099   0.015
ρ_4         0.101   0.011
τ_12        0.436   0.010
τ_13        0.049   0.018
τ_14        0.641   0.009
τ_23        0.013   0.012
τ_24        0.351   0.007
τ_34        0.021   0.010
δ           0.082   0.020

Model Comparison

- The table below summarizes the goodness-of-fit statistics of alternative specifications
- Smaller statistics indicate better fit

Table: Goodness-of-fit statistics

Model   Description          CLAIC     CLBIC
M0      independence         958,513   959,142
M1      no temporal          957,747   958,375
M2      no cross-sectional   958,501   959,130
M3      no cluster           957,734   958,363
M4      no dispersion        958,491   959,120
M5      full model           957,730   958,358

Individual Risk

Profile     young  senior  marital  homeowner  experience  conviction  newcar  leasecar  business  highmilage  multidriver
Excellent   0      1       1        1          1           0           0       0         0         1           0
Very Good   0      1       1        1          1           0           1       1         1         0           0
Good        0      1       1        1          0           1           1       0         1         1           0
Fair        0      1       1        1          1           1           1       1         1         0           1
Poor        0      0       0        0          0           1           1       1         1         0           1

Portfolio Risk

(Figures not reproduced in the transcript.)
- Left: mean vs. dispersion
- Right: independence vs. copula

- We focused on the multilevel structure of claims data.
- The Tweedie model was considered as an example.

Thank you for your kind attention.
Learn more about my research at: https://sites.google.com/a/wisc.edu/peng-shi/