GB2 Regression with Insurance Claim Severities

GB2 Regression with Insurance Claim Severities

Mitchell Wills, University of New South Wales
Emiliano A. Valdez, University of New South Wales
Edward W. (Jed) Frees, University of Wisconsin - Madison

UNSW Actuarial Research Symposium, 9 November 2006
University of New South Wales, Sydney, Australia

Wills, Valdez & Frees (UNSW/Wisconsin) GB2 Regression UNSW Res Symp 2006

Summary of talk

Purpose of this paper
  - introduce the flexibility of the GB2 family to model long-tailed claims.
  - how to inject regressor variables to the distribution.
  - early discussion of the empirical work done.
Approaches to introducing covariates to loss models
Construction and properties of the GB2 family of distributions
  - contains 4 parameters
  - some well-known distributions within the family
Empirical work
  - preliminary
  - some future direction

Covariates
Introducing covariates in loss models

Sun, J., Frees, E.W. and Rosenberg, M. (2006) discussion of:
  Klugman, S. and Rioux, J. (2006) NAAJ paper on "Toward a Unified Approach to Fitting Loss Models"
Possible approaches to introducing regressor variables:
  - Normal regression models - some transformation introduced (e.g. log of response)
  - Generalized Linear Models - exponential dispersion distributions (e.g. Gamma, Inverse Gaussian)
  - Parametric survival models - e.g. Cox's PH model
  - More flexible parametric distributions - regression introduced on the parameters

GB2 distribution - construction
Construction of the GB2 distribution

Let $X \sim \text{Gamma}(\gamma_1, 1)$ and $Y \sim \text{Gamma}(\gamma_2, 1)$. Then

$$Z = \beta \left( \frac{X}{Y} \right)^{1/\alpha}$$

is a r.v. with a GB2 distribution.

Four parameters: $\alpha \neq 0$, $\beta, \gamma_1, \gamma_2 > 0$.

Density function:

$$f_Z(z) = \frac{|\alpha| \, z^{\alpha\gamma_1 - 1} \, \beta^{\alpha\gamma_2}}{B(\gamma_1, \gamma_2) \, (\beta^{\alpha} + z^{\alpha})^{\gamma_1 + \gamma_2}}, \quad \text{for } z \geq 0,$$

where $B(\cdot, \cdot)$ is the usual Beta function.

Distribution function:

$$F_Z(z) = B\left( \frac{(z/\beta)^{\alpha}}{1 + (z/\beta)^{\alpha}} ; \gamma_1, \gamma_2 \right),$$

where $B(\cdot \, ; \cdot, \cdot)$ is the incomplete Beta function.
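As a rough numerical check of the construction above, the sketch below simulates Z from two gamma draws and compares the empirical distribution with the closed-form distribution function. It assumes X and Y are independent, that alpha > 0, and that the incomplete Beta function is the regularized one (SciPy's `betainc`); the function names and parameter values are illustrative and not taken from the talk.

```python
import numpy as np
from scipy import special


def gb2_pdf(z, alpha, beta, g1, g2):
    """GB2 density as written on the slide, evaluated via logs for stability."""
    z = np.asarray(z, dtype=float)
    log_f = (np.log(abs(alpha))
             + (alpha * g1 - 1.0) * np.log(z)
             + alpha * g2 * np.log(beta)
             - special.betaln(g1, g2)
             - (g1 + g2) * np.log(beta**alpha + z**alpha))
    return np.exp(log_f)


def gb2_cdf(z, alpha, beta, g1, g2):
    """GB2 distribution function, assuming alpha > 0."""
    u = (np.asarray(z, dtype=float) / beta) ** alpha
    # betainc is the regularized incomplete beta function I_w(g1, g2).
    return special.betainc(g1, g2, u / (1.0 + u))


def gb2_rvs(n, alpha, beta, g1, g2, seed=0):
    """Simulate Z = beta * (X / Y)**(1 / alpha) with independent gammas."""
    rng = np.random.default_rng(seed)
    x = rng.gamma(shape=g1, scale=1.0, size=n)
    y = rng.gamma(shape=g2, scale=1.0, size=n)
    return beta * (x / y) ** (1.0 / alpha)


# Illustrative parameter values (assumed, not from the slides).
alpha, beta, g1, g2 = 2.0, 1.5, 1.2, 1.5
sample = gb2_rvs(200_000, alpha, beta, g1, g2)
for q in (0.5, 1.0, 2.0, 4.0):
    # Empirical proportion below q versus the closed-form F_Z(q).
    print(q, np.mean(sample <= q), gb2_cdf(q, alpha, beta, g1, g2))
```

The two printed columns should agree to about two decimal places at this sample size.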

GB2 distribution - moments
MGF's and moments

Moment generating function:

$$M_Z(t) = \sum_{k=0}^{\infty} \frac{B\left( \gamma_1 + \frac{k}{\alpha}, \gamma_2 - \frac{k}{\alpha} \right)}{B(\gamma_1, \gamma_2)} \, \frac{t^k \beta^k}{k!}.$$

Moments:

$$E(Z^n) = \beta^n \, \frac{B\left( \gamma_1 + \frac{n}{\alpha}, \gamma_2 - \frac{n}{\alpha} \right)}{B(\gamma_1, \gamma_2)}.$$

Mean:

$$E(Z) = \beta \, \frac{B\left( \gamma_1 + \frac{1}{\alpha}, \gamma_2 - \frac{1}{\alpha} \right)}{B(\gamma_1, \gamma_2)}.$$
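The moment formula can be evaluated directly with log-Beta functions. The sketch below is one possible implementation, not the authors' code; it adds the assumption that the n-th moment is finite only when -γ1 < n/α < γ2 (so that both Beta arguments are positive) and returns infinity otherwise. Function names are hypothetical.

```python
import numpy as np
from scipy import special


def gb2_moment(n, alpha, beta, g1, g2):
    """E(Z^n) = beta^n * B(g1 + n/alpha, g2 - n/alpha) / B(g1, g2)."""
    s = n / alpha
    # Assumed existence condition: both Beta arguments must stay positive.
    if not (-g1 < s < g2):
        return np.inf
    log_ratio = special.betaln(g1 + s, g2 - s) - special.betaln(g1, g2)
    return beta**n * np.exp(log_ratio)


def gb2_mean_var(alpha, beta, g1, g2):
    """Mean and variance from the first two raw moments."""
    m1 = gb2_moment(1, alpha, beta, g1, g2)
    m2 = gb2_moment(2, alpha, beta, g1, g2)
    var = m2 - m1**2 if np.isfinite(m2) else np.inf
    return m1, var


# Pareto II (Lomax) case: alpha = 1, g1 = 1, where the mean is beta / (g2 - 1).
print(gb2_mean_var(alpha=1.0, beta=2.0, g1=1.0, g2=3.0))
```

For the Lomax case shown, the formula reduces to the familiar mean β/(γ2 - 1), which gives a quick sanity check on the Beta-function ratio.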

GB2 distribution - varying the parameters
Figure 1: GB2 density for varying parameters

[Figure: four panels plotting the GB2 density f(x) against x, each varying one parameter: α = -2, -1, 1, 2; β = 1, 2, 3, 4; γ1 = 0.5, 1, 5, 10; γ2 = 2, 1.5, 1, 0.5.]

GB2 distribution - some special cases
Figure 2: Some special cases of GB2

GB2
  γ1 = 1: Burr XII
  α = 1: Burr II
  γ2 = 1: Burr III
Burr XII
  α = 1: Pareto II (Lomax)
  γ2 = 1: Log-logistic
Burr II
  γ1 = 1: Pareto II (Lomax)
  γ2 = 1: Inverse Lomax
Burr III
  γ1 = 1: Log-logistic
  α = 1: Inverse Lomax
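The nesting in Figure 2 can be checked numerically against distributions that ship with SciPy. The sketch below assumes α > 0 and maps each special case to what appears to be the matching `scipy.stats` family (`burr12`, `burr` for Burr III, `lomax`, `fisk` for the log-logistic, and `betaprime` for the α = 1 node that the slide labels Burr II); these mappings and the parameter values are assumptions made for illustration.

```python
import numpy as np
from scipy import special, stats


def gb2_cdf(z, alpha, beta, g1, g2):
    """GB2 distribution function, assuming alpha > 0."""
    u = (np.asarray(z, dtype=float) / beta) ** alpha
    return special.betainc(g1, g2, u / (1.0 + u))


z = np.linspace(0.1, 20.0, 200)
a, b, g1, g2 = 1.7, 2.5, 0.8, 2.2  # illustrative values only

# Each entry pairs the restricted GB2 cdf with the SciPy family it should equal.
checks = {
    "Burr XII (g1 = 1)": (gb2_cdf(z, a, b, 1.0, g2),
                          stats.burr12(a, g2, scale=b).cdf(z)),
    "Burr III (g2 = 1)": (gb2_cdf(z, a, b, g1, 1.0),
                          stats.burr(a, g1, scale=b).cdf(z)),
    "alpha = 1 node": (gb2_cdf(z, 1.0, b, g1, g2),
                       stats.betaprime(g1, g2, scale=b).cdf(z)),
    "Lomax (alpha = 1, g1 = 1)": (gb2_cdf(z, 1.0, b, 1.0, g2),
                                  stats.lomax(g2, scale=b).cdf(z)),
    "Log-logistic (g1 = g2 = 1)": (gb2_cdf(z, a, b, 1.0, 1.0),
                                   stats.fisk(a, scale=b).cdf(z)),
    "Inverse Lomax (alpha = 1, g2 = 1)": (gb2_cdf(z, 1.0, b, g1, 1.0),
                                          stats.burr(1.0, g1, scale=b).cdf(z)),
}
max_gap = {name: float(np.max(np.abs(p - q))) for name, (p, q) in checks.items()}
for name, gap in max_gap.items():
    print(f"{name}: max |difference| = {gap:.2e}")
```

Each printed gap should be at the level of floating-point rounding if the parameter mapping is right.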

GB2 distribution - some empirical work
Some empirical work on GB2

Income or wealth distributions
  - McDonald (1984)
  - Butler and McDonald (1989)
  - McDonald and Mantrala (1993, 1995)
  - Bordley and McDonald (1993)
  - McDonald and Xu (1995)
Unemployment duration
  - McDonald and Butler (1987)
Insurance loss
  - Cummins, Dionne, McDonald and Pritchett (1990) - fire losses, published in Insurance: Mathematics & Economics

GB2 distribution - introducing covariates
Regression models for GB2 variables

Assumption: $x$ is a vector of $m$ known covariates.
Possible approaches:
  - through the scale parameter $\beta$: $\beta(x) = \exp(\theta' x)$
  - through the shape parameter $\alpha$: $\alpha(x) = \theta' x$
  - through both the scale and shape parameters simultaneously.
Here, $\theta = (\theta_1, \ldots, \theta_m)'$ is the vector of regression coefficients.
McDonald and Butler (1990) - regressors introduced in GB2 models for duration of AFDC claims
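To make the scale-parameter approach concrete, here is a minimal maximum-likelihood sketch on simulated data. It is not the estimation code used in the talk: the log-parameterization of α, γ1, γ2, the bounded L-BFGS-B optimizer, the least-squares starting values, and all names and numbers are assumptions chosen for illustration.

```python
import numpy as np
from scipy import optimize, special


def gb2_logpdf(z, alpha, beta, g1, g2):
    """Log of the GB2 density; beta may be a vector (one scale per observation)."""
    log_z, log_b = np.log(z), np.log(beta)
    return (np.log(abs(alpha))
            + (alpha * g1 - 1.0) * log_z
            + alpha * g2 * log_b
            - special.betaln(g1, g2)
            # log(beta^alpha + z^alpha), computed stably
            - (g1 + g2) * np.logaddexp(alpha * log_b, alpha * log_z))


def negloglik_scale(params, z, X):
    """Negative log-likelihood with beta(x) = exp(theta'x).

    params = [log alpha, log g1, log g2, theta_1, ..., theta_m]
    """
    alpha, g1, g2 = np.exp(params[:3])
    beta = np.exp(X @ params[3:])
    return -np.sum(gb2_logpdf(z, alpha, beta, g1, g2))


# Simulated data with assumed "true" values.
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_true = np.array([0.5, 0.8])
alpha_true, g1_true, g2_true = 2.0, 1.5, 2.0
ratio = rng.gamma(g1_true, 1.0, n) / rng.gamma(g2_true, 1.0, n)
z = np.exp(X @ theta_true) * ratio ** (1.0 / alpha_true)

# Start theta at the least-squares fit of log z on X; shape parameters at 1.
theta0 = np.linalg.lstsq(X, np.log(z), rcond=None)[0]
start = np.concatenate([np.zeros(3), theta0])
bounds = [(-3.0, 3.0)] * 3 + [(None, None)] * X.shape[1]
fit = optimize.minimize(negloglik_scale, start, args=(z, X),
                        method="L-BFGS-B", bounds=bounds)
theta_hat = fit.x[3:]
print("theta_hat:", theta_hat, " shape estimates:", np.exp(fit.x[:3]))
```

Working on the log scale keeps α, γ1, γ2 positive without explicit constraints; a shape regression would replace the `beta` line with a fixed β and `alpha = X @ theta`.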

GB2 distribution - regression
Regression through the scale parameter

We have: $Z \mid x \sim \text{GB2}(\alpha, \beta(x), \gamma_1, \gamma_2)$.

Define residuals $R_i = Z_i e^{-\theta' x_i}$ so that

$$\log Z_i = \theta' x_i + \log R_i,$$

where $R_i \sim \text{GB2}(\alpha, 1, \gamma_1, \gamma_2)$.

QQ plot diagnostics:

$$\left( Q\left( \frac{i - 0.5}{n} \right), r_{(i)} \right) \quad \text{for } i = 1, \ldots, n,$$

where $Q$ is the theoretical GB2 quantile function of the residuals and $r_{(i)}$ denotes the ordered residuals with $r_{(1)} \leq \cdots \leq r_{(n)}$.
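The QQ-plot pairs described above could be computed along the following lines. This is a sketch under assumptions: Q is taken to be the GB2(α, 1, γ1, γ2) quantile function obtained by inverting the regularized incomplete beta function (valid for α > 0), and the data are simulated with made-up parameter values. Function names are hypothetical.

```python
import numpy as np
from scipy import special


def gb2_quantile(p, alpha, beta, g1, g2):
    """GB2 quantile function for alpha > 0, via the inverse incomplete beta."""
    w = special.betaincinv(g1, g2, np.asarray(p, dtype=float))
    return beta * (w / (1.0 - w)) ** (1.0 / alpha)


def qq_points_scale(z, X, theta, alpha, g1, g2):
    """Return (theoretical, empirical) quantile pairs for scale-regression residuals."""
    resid = np.sort(z * np.exp(-(X @ theta)))      # r_(1) <= ... <= r_(n)
    n = resid.size
    probs = (np.arange(1, n + 1) - 0.5) / n        # plotting positions (i - 0.5) / n
    return gb2_quantile(probs, alpha, 1.0, g1, g2), resid


# Simulated example with assumed parameter values.
rng = np.random.default_rng(3)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta = np.array([0.5, 0.8])
alpha, g1, g2 = 2.0, 1.5, 2.0
z = np.exp(X @ theta) * (rng.gamma(g1, 1.0, n) / rng.gamma(g2, 1.0, n)) ** (1.0 / alpha)

theo, emp = qq_points_scale(z, X, theta, alpha, g1, g2)
qq_corr = np.corrcoef(np.log(theo), np.log(emp))[0, 1]
print("log-scale QQ correlation:", qq_corr)
```

Plotting `emp` against `theo` should give points close to the 45-degree line when the model is well specified; in practice θ, α, γ1, γ2 would be replaced by their estimates.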

GB2 distribution - regression
Regression through the shape parameter

We have: $Z \mid x \sim \text{GB2}(\alpha(x), \beta, \gamma_1, \gamma_2)$.

Define residuals

$$R_i = \left( \frac{Z_i}{\beta} \right)^{\theta' x_i}$$

so that

$$\frac{\log R_i}{\log Z_i - \log \beta} = \theta' x_i,$$

where $R_i \sim \text{GB2}(1, 1, \gamma_1, \gamma_2)$.

QQ plot diagnostics:

$$\left( Q\left( \frac{i - 0.5}{n} \right), r_{(i)} \right) \quad \text{for } i = 1, \ldots, n,$$

where $Q$ is the theoretical GB2 quantile function of the residuals and $r_{(i)}$ denotes the ordered residuals with $r_{(1)} \leq \cdots \leq r_{(n)}$.
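The shape-regression residuals can be illustrated the same way. The sketch below simulates data with α(x) = θ'x, forms the residuals, and compares them with a GB2(1, 1, γ1, γ2) reference using a Kolmogorov-Smirnov statistic. It assumes that SciPy's `betaprime` family coincides with GB2(1, 1, γ1, γ2) and that θ'x stays positive for every observation; all values and names are illustrative.

```python
import numpy as np
from scipy import stats


def shape_residuals(z, X, theta, beta):
    """Residuals R_i = (Z_i / beta)**(theta'x_i) for the shape regression."""
    return (z / beta) ** (X @ theta)


# Simulated data with assumed parameter values.
rng = np.random.default_rng(2)
n = 5000
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
theta = np.array([1.0, 0.5])          # alpha(x) = theta'x lies in [1, 1.5]
beta, g1, g2 = 3.0, 1.5, 2.0
alpha_x = X @ theta
z = beta * (rng.gamma(g1, 1.0, n) / rng.gamma(g2, 1.0, n)) ** (1.0 / alpha_x)

resid = shape_residuals(z, X, theta, beta)
# Compare residuals with the assumed GB2(1, 1, g1, g2) = beta prime reference.
ks = stats.kstest(resid, stats.betaprime(g1, g2).cdf)
print("KS statistic:", ks.statistic)
```

A small KS statistic is what one would expect here, since the residuals are computed with the true parameters rather than estimates.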

Empirical analysis - data characteristics
Data analysis

We have a portfolio of automobile insurance policies from Singapore.
  - detailed information on policies of registered cars, claims and payments settled.
  - period: 1 January 1993 until 31 December 2001 (nine years in total).
Data contains individual records of 1,090,942 registered cars with policy and claims information over 9 years from 46 companies.
  - Policy file has 26 variables with 5,667,777 records;
  - claims file has 12 variables with 786,678 records;
  - payment file has 8 variables with 4,427,605 records.
In each year, about 5% are recorded as fleet policies.
For our investigation, we selected fleet policies from one company.

Empirical analysis - possible covariates
Possible covariates

  - The calendar year - 1993-2001; treated as a continuous variable.
  - The level of gross premium for the policy in the calendar year - continuous.
  - The type of vehicle: bus (B), car (C), or motorcycle (M).
  - Cover type: comprehensive (C), third party fire and theft (F), and third party (T).
  - The NCD applicable for the calendar year - ranging from 0% to 50%, in increments of 10%.
No driver characteristics were included because only fleets were considered in this initial investigation.

Empirical analysis - summary statistics
Some summary statistics of the claims data

Count                    1,470
Mean                     3,523
Standard deviation       4,765.4
Variance                 22,709,497
Minimum                  3
25th percentile          950
Median                   1,949
75th percentile          4,226
Maximum                  53,500
Skewness                 4.01
Kurtosis                 25.5

Empirical analysis - histogram of total claims
Figure 3: claims histogram/density

[Figure: Histogram of Total Claims. Horizontal axis: Total Claims, 0 to 50,000; vertical axis: Density, 0 to 5e-04.]

Empirical analysis - parameter estimates
Parameter estimates

Parameter                scale regression       shape regression
α                        1.37066 (0.24943)      -
γ1                       0.75094 (0.17442)      1.82104 (0.41116)
γ2                       1.80991 (0.65597)      6.19284 (2.11426)
β                        -                      12.17706 (5.35773)

regression coefficients: β(x) = exp(θ'x)        α(x) = θ'x
intercept                0.70564 (0.29956)      0.47929 (0.07573)
Year 1992 (time)         0.01506 (0.01453)      0.00153 (0.00519)
premium (in 000's)       1.06505 (0.28226)      0.13931 (0.06957)
Cover C                  0.35377 (0.12717)      0.12748 (0.04574)
VType Car                0.16869 (0.11352)      0.10131 (0.04536)
premium*cover C          -0.64244 (0.24697)     -
premium*vtype Car        0.03097 (0.12557)      -0.03843 (0.04555)
NCD 0                    0.33157 (0.14687)      0.11076 (0.05182)
NCD 10                   0.37900 (0.28206)      0.18841 (0.11338)
premium*ncd 0            -0.41656 (0.15844)     -0.11875 (0.06267)
premium*ncd 10           -0.66920 (0.24108)     -0.18471 (0.09244)

log-likelihood           -3,232.752             -3,228.611
# of params              14                     13
AIC                      6,493.5041             6,483.2217
BIC                      6,493.3210             6,483.0516
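The reported AIC values can be recovered from the log-likelihoods and parameter counts in the table with the usual definition AIC = 2k - 2 log L. The short sketch below does that arithmetic and also forms a Wald-type ratio for one coefficient, assuming the figures in parentheses are standard errors (the slide does not say so explicitly). BIC is left out because it also depends on the sample size used in the fit, which the table does not state.

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * loglik


def wald_z(estimate, std_error):
    """Estimate divided by its (assumed) standard error."""
    return estimate / std_error


# Values copied from the parameter-estimates table.
aic_scale = aic(-3232.752, 14)
aic_shape = aic(-3228.611, 13)
z_premium_scale = wald_z(1.06505, 0.28226)

print("AIC, scale regression:", round(aic_scale, 3))
print("AIC, shape regression:", round(aic_shape, 3))
print("premium coefficient / s.e. (scale regression):", round(z_premium_scale, 2))
```

On this criterion the shape regression has the lower AIC, with one fewer parameter and a slightly higher log-likelihood.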

Empirical analysis - QQ plots of residuals
Figure 4: QQ plots of residuals

[Figure: two panels, Beta Regression and Alpha Regression, each plotting Theoretical Quantile of GB2 against Empirical Quantile.]

Empirical analysis - unfinished business
Unfinished business

Analysis of another company's data
Residual diagnostics
  - QQ plot or PP plot
Interpretation of the work - parameters
Comparison with other known regression models
  - GLM
  - Burr XII regression
Predictive power
  - compare predictions with other models

Some references

Cummins, J.D., Dionne, G., McDonald, J.B., Pritchett, B.M. (1990) Applications of the GB2 family of distributions in modeling insurance loss processes, Insurance: Mathematics & Economics 9: 257-272.
Kleiber, C., Kotz, S. (2003) Statistical Size Distributions in Economics and Actuarial Sciences. Wiley, New Jersey.
Klugman, S., Rioux, J. (2006) Toward a Unified Approach to Fitting Loss Models, North American Actuarial Journal 10(1): 147-153.
McDonald, J.B. (1984) Some Generalized Functions for the Size Distribution of Income, Econometrica 52: 647-663.
McDonald, J.B., Butler, R.J. (1990) Regression Models for Positive Random Variables, Journal of Econometrics 43: 227-251.
Sun, J., Frees, E.W., Rosenberg, M. (2006) Discussion of Klugman and Rioux's paper, NAAJ 10(2): 63-83.

Acknowledgement

The author wishes to acknowledge the following for financial support:
  - Australian Research Council through the Discovery Grant DP0345036; and
  - the UNSW Actuarial Foundation of the Institute of Actuaries of Australia.