Peng Shi. Actuarial Research Conference August 5-8, 2015

Similar documents
Ratemaking with a Copula-Based Multivariate Tweedie Model

Multivariate negative binomial models for insurance claim counts

Counts using Jitters joint work with Peng Shi, Northern Illinois University

On the Importance of Dispersion Modeling for Claims Reserving: Application of the Double GLM Theory

Polynomial approximation of mutivariate aggregate claim amounts distribution

Generalized linear mixed models (GLMMs) for dependent compound risk models

Nonparametric Model Construction

GB2 Regression with Insurance Claim Severities

Notes for Math 324, Part 20

Notes for Math 324, Part 19

Generalized linear mixed models for dependent compound risk models

Claims Reserving under Solvency II

Generalized linear mixed models (GLMMs) for dependent compound risk models

A finite mixture of bivariate Poisson regression models with an application to insurance ratemaking

Tail negative dependence and its applications for aggregate loss modeling

RMSC 2001 Introduction to Risk Management

Bonus Malus Systems in Car Insurance

Notes for Math 324, Part 12

INSTITUTE OF ACTUARIES OF INDIA

RMSC 2001 Introduction to Risk Management

GROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Multi-level Models: Idea

Imputation Algorithm Using Copulas

Module 1 Linear Regression

Propensity Score Weighting with Multilevel Data

November 2000 Course 1. Society of Actuaries/Casualty Actuarial Society

DEEP, University of Lausanne Lectures on Econometric Analysis of Count Data Pravin K. Trivedi May 2005

Econometrics Problem Set 3

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

A Bayesian approach to parameter estimation for kernel density estimation via transformations

15 TH ANNUAL ACI RISK MANAGEMENT CONFERENCE SESSION 1A

PL-2 The Matrix Inverted: A Primer in GLM Theory

Solutions to the Spring 2015 CAS Exam ST

Reducing Model Risk With Goodness-of-fit Victory Idowu London School of Economics

Checklist: Deposing the Driver in an Auto Accident

DOCUMENT DE TREBALL XREAP Mixture of bivariate Poisson regression models with an application to insurance

Modelling Dropouts by Conditional Distribution, a Copula-Based Approach

Techniques for Dimension Reduction Variable Selection with Clustering

Discrete Choice Modeling

Correlation & Dependency Structures

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Analysis of a Bivariate Risk Model

A simple bivariate count data regression model. Abstract

Ratemaking application of Bayesian LASSO with conjugate hyperprior

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros

Content Preview. Multivariate Methods. There are several theoretical stumbling blocks to overcome to develop rating relativities

Nonparametric Estimation of the Dependence Function for a Multivariate Extreme Value Distribution

Modelling dependent yearly claim totals including zero-claims in private health insurance

The Credibility Estimators with Dependence Over Risks

Disclosures - IFFCO TOKIO General Insurance Co. Ltd. for the period 1st April, st December, 2018 S.No. Form No Description

Multivariate count data generalized linear models: Three approaches based on the Sarmanov distribution. Catalina Bolancé & Raluca Vernic

C-14 Finding the Right Synergy from GLMs and Machine Learning

Correlation: Copulas and Conditioning

A New Generalized Gumbel Copula for Multivariate Distributions

REGRESSION TREE CREDIBILITY MODEL

Introduction to Maximum Likelihood Estimation

Advanced Ratemaking. Chapter 27 GLMs

WEST CHICAGO FIRE PROTECTION DISTRICT Budget vs. Actual Summary For the 12 Month(s) Ended May 31, 2017

Qualifying Exam in Probability and Statistics.

A Joint Model with Marginal Interpretation for Longitudinal Continuous and Time-to-event Outcomes

ACTEX Seminar Exam P

Calendar Year Dependence Modeling in Run-Off Triangles

Introduction to Probability Theory

On the Estimation and Application of Max-Stable Processes

Lecture 1 Introduction to Multi-level Models

WORKERS COMPENSATION HISTORY

Econometrics Problem Set 6

SOCIO-DEMOGRAPHIC INDICATORS FOR REGIONAL POPULATION POLICIES

The Church Demographic Specialists

Neural, Parallel, and Scientific Computations 18 (2010) A STOCHASTIC APPROACH TO THE BONUS-MALUS SYSTEM 1. INTRODUCTION

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Introduction to Econometrics. Regression with Panel Data

Course 1 Solutions November 2001 Exams

Assessing Social Vulnerability to Biophysical Hazards. Dr. Jasmine Waddell

A Study of the Impact of a Bonus-Malus System in Finite and Continuous Time Ruin Probabilities in Motor Insurance

Demographic Data in ArcGIS. Harry J. Moore IV

Multiple Decrement Models

Copulas. MOU Lili. December, 2014

On the existence of maximum likelihood estimators in a Poissongamma HGLM and a negative binomial regression model

A Joint Tour-Based Model of Vehicle Type Choice and Tour Length

A Practitioner s Guide to Generalized Linear Models

NEW YORK COMPENSATION INSURANCE RATING BOARD Loss Cost Revision

Kernel density estimation for heavy-tailed distributions...

Surge Pricing and Labor Supply in the Ride- Sourcing Market

Low-Income African American Women's Perceptions of Primary Care Physician Weight Loss Counseling: A Positive Deviance Study


multilevel modeling: concepts, applications and interpretations

Semiparametric Generalized Linear Models

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Kernel density estimation for heavy-tailed distributions using the champernowne transformation

A Model of Traffic Congestion, Housing Prices and Compensating Wage Differentials

Generalized Linear Models for Non-Normal Data

Markov Chains and Hidden Markov Models

Marginal Specifications and a Gaussian Copula Estimation

Measuring community health outcomes: New approaches for public health services research

How Detailed Weather Data and Geospatial Tools Can Be Used to Increase Net Income

Spatial Organization of Data and Data Extraction from Maptitude

Joseph E. Boxhorn, Ph.D., Senior Planner Southeastern Wisconsin Regional Planning Commission #

Problem #1 #2 #3 #4 Total Points /5 /7 /8 /4 /24

Transcription:

Multivariate Longitudinal of Using Copulas joint work with Xiaoping Feng and Jean-Philippe Boucher University of Wisconsin - Madison Université du Québec à Montréal Actuarial Research Conference August 5-8, 2015 1 / 21

Outline 1 2 3 4 5 6 2 / 21

modeling depends on the purposes of the models Model building often relies on the features of the data Some unique features in insurance claims data 1) Semi-continuous 2) Multivariate 3) Multilevel Frequency-severity (two-part) v.s. pure premium models More details in Predictive s in Actuarial Science, edited by Frees, Derrig, and Meyers 3 / 21

Some unique features in insurance claims data 1) Semi-continuous 2) Multivariate Automobile insurance: third party liability, collision, comprehensive... Homeowner multiperil: fire, lighting, hail, wind... Worker s compensation: medical care, income replacement... Health insurance: physician v.s. non-physician, outpatient v.s. hospital inpatient 3) Multilevel Automobile insurance: fleet/household, vehicle, multi-period Homeowner multiperil: neighborhood, houses, periods Worker s compensation: employer, occupation, individual Health insurance: employer/household, individual, periods 4 / 21

Automobile Personal automobile insurance in Canada Four types of coverage Accident benefit - no-fault insurance benefit to the injured insured Collision - damage to the policyholder s vehicle due to collision All risk - damage to the policyholder s vehicle due to risks other than collision Civil liability - bodily injury and property damage of third party We examine a longitudinal dataset for years 2003-2006 Each policy could cover multiple vehicles Table : Distribution of number of vehicles per policy 1 2 3 4 Total Number 77,352 10,058 253 7 87,670 Percentage 88.23 11.47 0.289 0.01 100 5 / 21

Claim Costs 6 / 21

Predictors The data set contains basic rating variables policyholder characteristics, driving history, vehicle characteristics use binary predictors in the analysis Table : Summary of rating variables Variable Description Mean young =1 if age between 16 and 25 0.021 seinor =1 if age more than 60 0.175 marital =1 if married 0.731 homeowner =1 if homeowner 0.658 experience =1 if more than ten years of experience 0.921 conviction =1 if positive number of convictions 0.050 newcar =1 if new car 0.896 leasecar =1 if lease car 0.153 business =1 if business use 0.037 highmilage =1 if drive more than 10,000 miles 0.704 multidriver =1 if more than two drivers 0.063 7 / 21

Multivariate Let Y ikjt denote the insurance cost for Household (Cluster) i (= 1,,M) Vehicle k (= 1,,K i ) Coverage type j (= 1,,J i ) Period t (= 1,,T i ) For example K 2 = 2, J i = 3, T i = 4, let Y i = (Y i111,,y i114,y i121,,y i124,,y i231,,y i234 ) Using Gaussian copula to build the joint distribution F i (y i ) = H(F(y i111 ),,F(Y i114 ),F(Y i121 ),,F(Y i124 ),,F(Y i231 ),,F(Y i234 );R) Need to specify F Need to specify the association matrix R in the gaussian copula 8 / 21

Multivariate Start from R = B (Σ P) where ( ) 1 δ B =,P = δ 1 and for example P = 1 ρ ρ 2 ρ 3 ρ 1 ρ ρ 2 ρ 2 ρ 1 ρ ρ 3 ρ 2 ρ 1 1 σ 12 σ 13 σ 21 1 σ 23 σ 31 σ 32 1 Fully separate 9 / 21

Multivariate Tweedie Define R = B P where Further ( 1 δ B = δ 1 ) and P = P 11 σ 12 P 12 σ 13 P 13 σ 21 P 21 P 22 σ 23 P 23 σ 31 P 31 σ 32 P 32 P 33 1 ρ j ρ 2 j ρ3 j P AR jj = ρ j 1 ρ j ρ 2 j ρj 2 ρ j 1 ρ j ρj 3 ρj 2 ρ j 1 Summarize dependence parameters Between vehicles δ Between coverage types σ 12, σ 13, σ 23 Temporal ρ 1, ρ 2, ρ 3 10 / 21

Composite Likelihood Use composite likelihood to estimate model parameters cl i (θ;y) = 1 (sum of bivariate loglik) m i 1 Use the inverse of the Godambe information matrix to get standard error G 1 N (θ) = H 1 N (θ)j N (θ)h 1 N (θ) 2 cl i (θ;y i ) θ θ and J N (θ) = 1 N N i=1 where H N (θ) = N 1 N i=1 Model Comparison CLAIC = 2cl(θ;y) + 2tr(J(θ)H(θ) 1 ) CLBIC = 2cl(θ;y) + log(dim(θ))tr(j(θ)h(θ) 1 ) cl i (θ;y i ) cl i (θ;y i ) θ θ 11 / 21

Bivariate Model Use Gaussian copula H(u1,u2) = Φ ρ (Φ 1 (u 1 ),Φ 2 (u 2 )) H(F 1 (0),F 2 (0)) if y 1 = 0 and y 2 = 0 f f (y 1,y 2 ) = 1 (y 1 )h 1 (F 1 (y 1 ),F 2 (0)) if y 1 > 0 and y 2 = 0 f 2 (y 2 )h 2 (F 1 (0),F 2 (y 2 )) if y 1 = 0 and y 2 > 0 f 1 (y 1 )f 2 (y 2 )h(f 1 (y 1 ),F 2 (y 2 ) if y 1 > 0 and y 2 > 0 Here h(u 1,u 2 ) = H(u 1,u 2 )/ u 1 u 2 h j (u 1,u 2 ) = H(u 1,u 2 )/ u j for j = 1,2 They have close-form expressions for Gaussian copula 12 / 21

Specification We set J = 2, K = 2, and T = 4, that is, a policy covers two vehicles and provides two types of coverage for each vehicle. We use Tweedie(µ j,p j,φ j ) with the following specification for coverage type j = 1 and 2: µ j = exp(β j0 + β j1 X 1 + β j2 X 2 ) φ j = exp(γ j0 + γ j1 X 1 + γ j2 X 2 ) X 1 Bernoulli(0.5) and X 2 Bernoulli(0.6) independently. 13 / 21

Specification We use the Gaussian copula with the correlation matrix specified as: ( 1 δ Σ = δ 1 ) σ 12 1 ρ 1 ρ1 2 ρ1 3 ρ 1 1 ρ 1 ρ1 2 ρ1 2 ρ 1 1 ρ 1 ρ1 3 ρ1 2 ρ 1 1 1 ρ 1 ρ 2 1 ρ 3 1 ρ 2 1 ρ 1 ρ 2 1 ρ 2 2 ρ 2 1 ρ 1 ρ 3 2 ρ 2 2 ρ 2 1 σ 12 1 ρ 2 ρ2 2 ρ 3 ρ 1 1 ρ 2 ρ2 2 ρ1 2 ρ 1 1 ρ 2 ρ1 3 ρ1 2 ρ 1 1 1 ρ 2 ρ 2 2 ρ 3 2 ρ 2 1 ρ 2 ρ 2 2 ρ 2 2 ρ 2 1 ρ 2 ρ 3 2 ρ 2 2 ρ 2 1 Use parameterization σ 12 = τ 1 2 1 ρ1 2 1 ρ2 2/(1 ρ 1ρ 2 ). 14 / 21

Specification Table : results for sample size (number of policies) N = 200 and 500 Parameter Relative Bias SD SE MSE N = 200 500 200 500 200 500 200 500 β 10 = 1-0.102-0.030 0.361 0.244 0.369 0.234 0.141 0.060 β 11 = 1.5 0.010 0.007 0.201 0.122 0.208 0.136 0.041 0.015 β 12 = 0.5 0.130 0.020 0.290 0.176 0.295 0.187 0.088 0.031 β 20 = 1-0.011-0.018 0.282 0.190 0.278 0.183 0.080 0.036 β 21 = 0.5-0.012 0.003 0.269 0.163 0.218 0.142 0.072 0.026 β 22 = 2 0.003-0.002 0.200 0.121 0.194 0.129 0.040 0.015 p 1 = 1.2-0.011-0.003 0.025 0.016 0.022 0.014 0.001 0.000 p 2 = 1.4-0.002-0.002 0.028 0.018 0.027 0.017 0.001 0.000 γ 10 = 5-0.006-0.005 0.127 0.081 0.119 0.080 0.017 0.007 γ 11 = 1 0.015 0.017 0.093 0.057 0.093 0.060 0.009 0.004 γ 12 = 1-0.028-0.023 0.120 0.077 0.114 0.077 0.015 0.006 γ 20 = 4-0.002 0.000 0.127 0.084 0.119 0.079 0.016 0.007 γ 21 = 0 NA NA 0.110 0.073 0.108 0.071 0.012 0.005 γ 22 = 1-0.002 0.010 0.138 0.083 0.121 0.079 0.019 0.007 ρ 1 = 0.8-0.011-0.006 0.045 0.030 0.042 0.028 0.002 0.001 ρ 2 = 0.8-0.011-0.009 0.044 0.026 0.038 0.026 0.002 0.001 τ = 0.3-0.041-0.037 0.129 0.085 0.113 0.080 0.017 0.007 δ = 0.6-0.015-0.017 0.083 0.053 0.073 0.049 0.007 0.003 15 / 21

Marginal Models Model dispersion as well as mean Different predictors for each type Table : Parameter estimates for accident benefit Mean Dispersion Est. S.E. Est. S.E. intercept 5.732 0.13 intercept 7.107 0.05 conviction 0.807 0.14 homeowner -0.099 0.03 homeowner -0.225 0.07 experience 0.489 0.05 experience -0.324 0.12 multidriver -0.237 0.06 young -0.911 0.22 marrital 0.174 0.04 senior -0.466 0.11 senior 0.151 0.05 highmilage -0.578 0.08 p 1.703 0.00 16 / 21

Dependence Strong association among coverage types Small serial correlation and cluster effect Table : Estimates of dependence parameters Est. S.E. ρ 1 0.163 0.022 ρ 2 0.051 0.013 ρ 3 0.099 0.015 ρ 4 0.101 0.011 τ 12 0.436 0.010 τ 13 0.049 0.018 τ 14 0.641 0.009 τ 23 0.013 0.012 τ 24 0.351 0.007 τ 34 0.021 0.010 δ 0.082 0.020 17 / 21

Model Comparison Table below summarizes the goodness-of-fit statistics of alternative specifications Smaller statistics indicate better fit Table : Goodness-of-fit statistics Model Description CLAIC CLBIC M0 independence 958,513 959,142 M1 no temporal 957,747 958,375 M2 no cross-sectional 958,501 959,130 M3 no cluster 957,734 958,363 M4 no dispersion 958,491 959,120 M5 full model 957,730 958,358 18 / 21

Individual Risk young senior marital homeowner experience conviction newcar leasecar business highmilage multidriver Excellent 0 1 1 1 1 0 0 0 0 1 0 Very Good 0 1 1 1 1 0 1 1 1 0 0 Good 0 1 1 1 0 1 1 0 1 1 0 Fire 0 1 1 1 1 1 1 1 1 0 1 Poor 0 0 0 0 0 1 1 1 1 0 1 19 / 21

Portfolio Risk Left: mean v.s. dispersion Right: independence v.s. copula 20 / 21

We focused on the multilevel structure of claims data Tweedie model was considered as an example Thank you for your kind attention. Learn more about my research at: https://sites.google.com/a/wisc.edu/peng-shi/ 21 / 21