Joseph O. Marker, Marker Actuarial Services, LLC and University of Michigan. CLRS 2010 Meeting. J. Marker, LSMWP, CLRS


Expected vs Actual Distribution
Test distributions of:
- Number of claims (frequency)
- Size of ultimate loss (severity)
Sources of significant difference between actual and expected amounts:
- Programming or communication errors
- Not understanding how the statistical language (e.g. R) works
- Errors or misleading results in R

Display Raw Simulator Output

Claims file
Simulation No  Occurrence No  Claim No  Accident Date  Report Date  Line  Type
1              1              1         20000104       20000227     1     1
1              2              1         20000105       20000818     1     1

Transactions file
Simulation No  Occurrence No  Claim No  Date      Transaction  Case Reserve  Payment
1              1              1         20000227  REP          2000          0
1              1              1         20000413  RES          89412         0
1              1              1         20000417  CLS          91412         141531
.....

Create Ultimate Loss File for Analysis
Layout: Simulation No, Occurrence No, Claim No, Accident Date, Report Date, Line, Type, Case Reserve, Payment
Idea: another use for the testing information in this section of the paper. If an insurer can summarize its own claim data to this format, then it can use the tests we will discuss to parameterize the Simulator using its data.
We have included in this paper all the R code used in testing.

Emphasis in the Paper
- Document the R code used in performing various tests.
- Provide references for those who want to explore the modeling more deeply.
- Provide visual as well as formal tests: QQ plots, histograms, densities, etc.

Test 1: Frequency, Zero Modification, Trend
Model parameters:
- # Occurrences ~ Poisson (mean = 120 per year)
- 1,000 simulations
- One claim per occurrence
- Frequency trend 2% per year, three accident years
- Pr[Claim is Type 1] = 75%; Pr[Type 2] = 25%
- Pr[CNP ("Closed No Payment")] = 40%
- Type and Status are independent.
Status is a categorical variable for whether a claim is closed with payment (CWP) or closed with no payment (CNP).
Test the output to see if its distribution is consistent with the assumptions.

Test 1: Classical Chi-square Contingency Table

Actual Counts
          Type 1     Type 2     Margin
CNP       111,066    37,007     0.398906
CWP       167,268    55,857     0.601094
Margin    0.749826   0.250174   371,198

Expected Counts
          Type 1     Type 2     Margin
CNP       111,029.0  37,044.0   0.398906
CWP       167,305.0  55,820.0   0.601094
Margin    0.749826   0.250174   371,198

$$\chi^2 = \sum_i \sum_j \frac{(\text{Actual}_{ij} - \text{Expected}_{ij})^2}{\text{Expected}_{ij}} = 0.0819$$

$\Pr[\chi^2 > 0.0819] = 0.775$. The independence of Type and Status is supported.
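As a check on the arithmetic, the statistic can be recomputed from the four actual counts in the table. This is a minimal sketch in base R, not the code from the paper: the object names `actual` and `ct` are made up here, and `correct = FALSE` switches off the continuity correction so that the result is the plain Pearson statistic.

```r
# Actual counts from the contingency table (rows: Status, columns: Type)
actual <- matrix(c(111066, 37007,
                   167268, 55857),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Status = c("CNP", "CWP"),
                                 Type   = c("Type 1", "Type 2")))

# Pearson chi-square test of independence, no continuity correction
ct <- chisq.test(actual, correct = FALSE)

round(ct$expected, 1)   # expected counts under independence
ct$statistic            # about 0.0819
ct$p.value              # about 0.775, on 1 degree of freedom
```

The expected counts are the row total times the column total divided by the grand total, which is what independence of Type and Status implies.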

Test 1: Regression approach
The previous result can be obtained using the xtabs command in R.
The result can also be obtained using a Poisson GLM.
Full model:
    model6x <- glm(count ~ Type + Status + Type*Status, data = temp.datacc.stack, family = poisson, x = T)
Reduced model:
    model5x <- glm(count ~ Type + Status, data = temp.datacc.stack, family = poisson, x = T)
Independence holds if the interaction term Type*Status is not significant.

Test 1: Analysis of variance
    anova(model5x, model6x, test = "Chi")

Analysis of Deviance Table
Response: count
   Terms                           Resid. Df   Resid. Dev   Test           Df   Deviance       Pr(Chi)
1  Type + Status                   143997      160969.366
2  Type + Status + Type * Status   143996      160969.284   +Type:Status   1    0.0819088429   0.774727081

The result matches the previous $\chi^2$ test.
We did not show here the model coefficients, which produce the expected frequency for each combination of Type and Status.
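The same comparison can be sketched on the four cell totals alone. This is a simplification and an assumption on our part: the paper fits the models to the stacked dataset `temp.datacc.stack`, which is not reproduced here, so the residual degrees of freedom below differ from the slide. Because both models use only Type and Status as predictors, the deviance attributed to the interaction should come out the same. The names `cells`, `reduced`, `full` and `dev.table` are hypothetical.

```r
# Four-cell summary of the contingency table (counts from the Test 1 slide)
cells <- data.frame(
  count  = c(111066, 37007, 167268, 55857),
  Type   = factor(c("Type 1", "Type 2", "Type 1", "Type 2")),
  Status = factor(c("CNP", "CNP", "CWP", "CWP"))
)

# Reduced model: Type and Status act independently on the expected count
reduced <- glm(count ~ Type + Status, data = cells, family = poisson)

# Full (saturated) model: adds the Type:Status interaction
full <- glm(count ~ Type * Status, data = cells, family = poisson)

# Likelihood-ratio (deviance) test for the interaction term
dev.table <- anova(reduced, full, test = "Chisq")
dev.table

fitted(reduced)   # expected count for each Type/Status combination
```

The fitted values of the reduced model reproduce the expected counts of the classical contingency table, which is why the two approaches agree.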

Test 2: Univariate size of loss
Model parameters:
- Three lines, no correlation in frequency by line
- # Claims for each line ~ Poisson (mean = 600 per year)
- Two accident years, 100 simulations
- Size of loss distributions: Line 1 lognormal, Line 2 Pareto, Line 3 Weibull
- Zero trend in frequency and size of loss
Expected count = 600 (freq) x 100 (# sims) x 3 (lines) x 2 (years) = 360,000.
Actual # claims: 359,819.
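One way to judge whether 359,819 is close enough to 360,000 is to note that a sum of independent Poisson counts is itself Poisson, so its variance equals its mean. The sketch below is our own illustration under that assumption and does not come from the paper; the variable names are made up.

```r
expected.count <- 600 * 100 * 3 * 2   # freq x sims x lines x years
actual.count   <- 359819

# For a Poisson total the standard deviation is the square root of the mean
z <- (actual.count - expected.count) / sqrt(expected.count)
z                     # about -0.30 standard deviations
2 * pnorm(-abs(z))    # two-sided normal-approximation p-value, about 0.76
```

A shortfall of 181 claims is well within one standard deviation (600), so the total claim count is consistent with the stated frequency.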

Size of loss: testing strategy
- Person doing testing ≠ person running simulation.
- Test all three distributions on each line's output.
- Produce plots to get a feel for the distributions.
- Fit using maximum likelihood estimation.
- Produce QQ (quantile-quantile) plots.
- Run formal goodness-of-fit tests.

Size of loss: Histograms and p.d.f. [figure]

Size of loss: Histograms and p.d.f. [figure]

Size of loss
The plots above compare:
- Histogram of the empirical distribution
- Density of the theoretical distribution with m.l.e. parameters
The plots show that both Weibull and Pareto fit Lines 2 and 3 well.
QQ plots offer another perspective.
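The fit-then-overlay step described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the losses are simulated here (in thousands, with made-up Weibull parameters) because the simulator output is not available, `fitdistr()` from the MASS package is assumed as the m.l.e. routine, and `plot_fit` is a hypothetical helper.

```r
library(MASS)   # provides fitdistr()

set.seed(2010)
# Stand-in for one line's ultimate losses, in thousands.
# The shape and scale values are illustrative assumptions only.
loss <- rweibull(5000, shape = 1.5, scale = 10)

# Maximum likelihood fit of a Weibull distribution
fit.w <- fitdistr(loss, "weibull")
fit.w$estimate   # named vector: shape, scale

# Histogram of the data with the fitted density drawn on top
plot_fit <- function(loss, fit) {
  hist(loss, breaks = 50, freq = FALSE,
       main = "Empirical histogram and fitted Weibull density",
       xlab = "Size of loss")
  curve(dweibull(x, shape = fit$estimate["shape"],
                 scale = fit$estimate["scale"]),
        add = TRUE, col = "red")
}
```

Calling `plot_fit(loss, fit.w)` draws the comparison. Setting `freq = FALSE` puts the histogram on the density scale so that the fitted curve is directly comparable.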

Size of loss: QQ Plots
Example of R code to produce a QQ plot:
    # generate a random sample of the same size n2 as the empirical data
    thqua.w2 <- rweibull(n2, shape = fit.w2$estimate[1], scale = fit.w2$estimate[2])
    # ultloss2 is the empirical data, thqua.w2 is the generated sample
    qqplot(ultloss2, thqua.w2, xlab = "Sample Quantiles", ylab = "Theoretical Quantiles", main = "Line 2, Weibull")
    abline(0, 1, col = "red")
One can also replace the sample with the quantiles of the theoretical Weibull c.d.f.
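The alternative mentioned last, using quantiles of the theoretical c.d.f. instead of a random sample, might look like the sketch below. The data and the parameter values are stand-ins chosen for illustration (they play the role of `ultloss2` and `fit.w2$estimate`), and `draw_qq` is a hypothetical helper.

```r
set.seed(15)
# Stand-ins for the objects on the slide (illustrative values only):
#   ultloss2             : empirical losses
#   shape.hat, scale.hat : fitted Weibull parameters
ultloss2  <- rweibull(2000, shape = 1.5, scale = 10)
shape.hat <- 1.5
scale.hat <- 10

n2 <- length(ultloss2)

# Theoretical quantiles at the plotting positions (i - 0.5) / n2
thqua <- qweibull(ppoints(n2), shape = shape.hat, scale = scale.hat)

# Pair the sorted data with the theoretical quantiles
qq <- data.frame(sample = sort(ultloss2), theoretical = thqua)

draw_qq <- function(qq) {
  plot(qq$sample, qq$theoretical,
       xlab = "Sample Quantiles", ylab = "Theoretical Quantiles",
       main = "Line 2, Weibull (theoretical quantiles)")
  abline(0, 1, col = "red")   # points near this line indicate a good fit
}
```

Calling `draw_qq(qq)` produces the plot. Using exact quantiles removes the sampling noise that a randomly generated comparison sample adds to the picture, especially in the tail.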

Size of Loss: QQ Plot, Line 1 [figure]

Size of Loss: QQ Plot, Line 2 [figure]

Size of Loss: QQ Plot, Line 3 [figure]

Size of Loss: Fitted distributions
- From the QQ plots, it appears that lognormal fits Line 1, Pareto fits Line 2, and Weibull fits Line 3.
- Chi-square is a formal goodness-of-fit test.
- Section 6 discusses setting up the test for Pareto on Line 2.
- Appendix B contains R code for all the chi-square tests.
- The Kolmogorov-Smirnov test was also applied, but too late to include the results in this presentation.
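Since the Kolmogorov-Smirnov results are not shown, the sketch below only illustrates how such a test is typically called in base R with `ks.test()`. It uses simulated Weibull data with assumed parameters and does not reproduce the presentation's results.

```r
set.seed(19)
# Illustrative data: Weibull losses with assumed parameters
loss <- rweibull(5000, shape = 1.5, scale = 10)

# One-sample test against a fully specified Weibull c.d.f.
ks1 <- ks.test(loss, "pweibull", shape = 1.5, scale = 10)

# Two-sample variant: data versus a random sample from the hypothesized law
ref <- rweibull(5000, shape = 1.5, scale = 10)
ks2 <- ks.test(loss, ref)

c(one.sample = ks1$p.value, two.sample = ks2$p.value)
```

One caution: the standard p-value assumes the hypothesized distribution is fixed in advance. If its parameters are estimated from the same data, the reported p-value tends to be too large.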

Size of Loss: Chi-square g.o.f. test
Setting up bins and the expected and actual # claims by bin is not easy in R.
Define break points and bins:
    s = sqrt(var(ultloss2))
    ult2.cut <- cut(ultloss2.0,   ## binning data
        breaks = c(0, m-s/2, m, m+s/4, m+s/2, m+s, m+2*s, 2*max(ultloss2)))
Note: ultloss2.0 is the vector of loss sizes, m = mean.
The table of expected and observed values by bin:
    #       E.2         O.2     x.sq.2
    #[1,]   43993.890   44087   0.19705959
    #[2,]   35651.989   35680   0.02200752
    #[3,]   10493.758   10323   2.77864169
    #[4,]    7240.583    7269   0.11152721
    #[5,]    9277.383    9164   1.38570182
    #[6,]    8063.576    8176   1.56743997
    #[7,]    5289.820    5312   0.09299630
Notes: E.2 = expected number; O.2 = actual number; x.sq.2 = Chi-sq statistic.
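A self-contained version of the binning step could look like the sketch below. Everything in it that is not on the slide is an assumption for illustration: the data are simulated, the Pareto is taken in the two-parameter form F(x) = 1 - (theta / (x + theta))^alpha with made-up parameter values, and the expected counts use those known parameters rather than the m.l.e. fit used in the paper.

```r
set.seed(20)
# Two-parameter Pareto c.d.f., F(x) = 1 - (theta / (x + theta))^alpha.
# This parameterization and the values below are assumptions.
ppareto2 <- function(x, alpha, theta) 1 - (theta / (x + theta))^alpha
alpha <- 5
theta <- 4000

# Simulated losses by inversion of the c.d.f.
n    <- 20000
loss <- theta * (runif(n)^(-1 / alpha) - 1)

# Break points built from the sample mean and standard deviation
m <- mean(loss)
s <- sd(loss)
breaks <- c(0, m - s/2, m, m + s/4, m + s/2, m + s, m + 2*s, 2 * max(loss))

# Observed counts per bin
O <- as.vector(table(cut(loss, breaks = breaks)))

# Expected counts per bin: n times the probability between break points
E <- n * diff(ppareto2(breaks, alpha, theta))

x.sq <- (O - E)^2 / E
cbind(E, O, x.sq)
```

The break points must be increasing, which requires m - s/2 > 0. That holds here but can fail for very heavy-tailed data, in which case quantile-based break points are a safer choice.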

Size of Loss: Chi-square g.o.f. test
Execute the chi-square test:
    df = length(E.2)-1-2        ## degrees of freedom   Result = 4
    chi.sq.2 <- sum(x.sq.2)     ## test statistic       Result = 6.155374
    qchisq(.95, df)             ## critical value       Result = 9.487729
    1-pchisq(chi.sq.2, df)      ## p-value              Result = 0.1878414
Important: degrees of freedom = 4, not 6, because the two parameters of the expected distribution were determined by m.l.e. on the data rather than taken from a predetermined distribution.
Using the chi-squared test in R directly would produce a wrong p-value:
    chisq.test(O.2, p = E.2/n2.0)
This test uses degrees of freedom = 6.
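The effect of the degrees-of-freedom adjustment can be reproduced from the tabulated counts alone. The sketch below copies E.2 and O.2 from the table; the names `df.adj`, `p.adj` and `naive` are ours, and the probabilities passed to `chisq.test()` are rescaled by `sum(E.2)` because the rounded expected counts do not sum exactly to the sample size.

```r
# Expected and observed counts by bin, copied from the table
E.2 <- c(43993.890, 35651.989, 10493.758, 7240.583,
         9277.383, 8063.576, 5289.820)
O.2 <- c(44087, 35680, 10323, 7269, 9164, 8176, 5312)

x.sq.2   <- (O.2 - E.2)^2 / E.2
chi.sq.2 <- sum(x.sq.2)                  # about 6.155

# Bins - 1, less the 2 parameters estimated by maximum likelihood
df.adj <- length(E.2) - 1 - 2
p.adj  <- pchisq(chi.sq.2, df.adj, lower.tail = FALSE)   # about 0.188

# chisq.test() cannot know that parameters were estimated: it uses bins - 1
naive <- chisq.test(O.2, p = E.2 / sum(E.2))

c(df.adjusted = df.adj, p.adjusted = p.adj,
  df.naive = unname(naive$parameter), p.naive = naive$p.value)
```

With 6 degrees of freedom the same statistic yields a larger p-value, so the unadjusted test is less likely to reject a poorly fitting distribution.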

Correlation
The model allows correlated variables in two ways:
- Frequencies among lines
- Report lag and size of loss
We tested the correlation feature for frequency by line. To do this, first specify the parameters for the Poisson or negative binomial frequency by line. Then specify the correlation matrix and the copula that links the univariate frequency distributions to the multivariate distribution.
The correlation testing helped the programmer determine how the copula statements from R actually work in the model.

Correlation: simulation parameters
The Simulator was run 7/20/2010 with parameters:
- Three lines
- Annual frequency by line is Poisson with mean 96
- One accident year
- 1,000 simulations
- Gaussian (normal) copula
Frequency correlation matrix:
Correlation   Line 1   Line 2   Line 3
Line 1        1        0        0.99
Line 2        0        1        -0.01
Line 3        0.99     -0.01    1
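To make the role of the Gaussian copula concrete, the sketch below generates correlated Poisson frequencies with the parameters listed above, using only base R. It illustrates the general technique (correlated normals, mapped to uniforms, mapped to Poisson counts by inversion) and is not a description of how the Simulator itself is coded; all object names are ours.

```r
set.seed(23)
n.sim  <- 1000
lambda <- 96

# Frequency correlation matrix from the slide
R <- matrix(c(1,     0,     0.99,
              0,     1,    -0.01,
              0.99, -0.01,  1), nrow = 3, byrow = TRUE)

# Correlated standard normals: rows of Z have correlation matrix R
Z <- matrix(rnorm(n.sim * 3), ncol = 3) %*% chol(R)

# Gaussian copula: map to uniforms, then to Poisson counts by inversion
U <- pnorm(Z)
N <- matrix(qpois(U, lambda), ncol = 3,
            dimnames = list(NULL, c("Line.1", "Line.2", "Line.3")))

colMeans(N)        # each close to 96
round(cor(N), 3)   # sample correlations of the simulated counts
```

The matrix R is the correlation of the underlying normals, not of the counts themselves. Because the counts are discrete, their sample correlation for Lines 1 and 3 comes out slightly below 0.99.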

Correlation: data used
The annual numbers of claims were summarized by simulation and line in the file D:/LSMWP/byyear.csv.
Visualize this data:
Row (simulation)   Line 1   Line 2   Line 3
1                  114      95       117
2                  89       85       90
....
99                 103      78       101
100                96       106      99

Correlation: Fitting data
Detail of the statistical testing for correlation is in Section 6.2.3 and Appendix B of the paper.
The data was fit to a normal copula using both m.l.e. and inversion of Kendall's tau, using all 1,000 observations, and then goodness-of-fit tests were applied to each pair of lines.
[Figure: scatter plot of Line 1 and Line 3 data; axes Line.1 and Line.3, each running from 0.0 to 1.0]

Correlation: estimated correlation from data
Details of the maximum likelihood estimate of correlations:
                  Estimate       Std. Error    z value         Pr(>|z|)
Rho(line 1 & 2)   -0.002112605   0.031977597   -0.06606516     0.9473259
Rho(line 1 & 3)    0.979258746   0.000921392   1062.80366235   0.0000000
Rho(line 2 & 3)   -0.010486832   0.031974114   -0.32797880     0.7429277
Example of statements used for the first rho above:
    normal2.cop <- normalCopula(c(0), dim = 2, dispstr = "un")
    gofCopula(normal2.cop, x12, N = 100, method = "mpl")
Note: x12 is a dataset without Line 3 observations.
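The estimates above come from maximum likelihood. The other approach mentioned earlier, inversion of Kendall's tau, rests on the identity tau = (2/pi) asin(rho) for a normal copula. The base-R sketch below applies it to simulated continuous data with an assumed rho of 0.99. It is not the paper's code, and real claim counts are discrete, so ties would need some care.

```r
set.seed(26)
n   <- 1000
rho <- 0.99   # assumed true copula correlation, echoing Lines 1 and 3

# Bivariate normal sample with correlation rho (continuous, so no ties)
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# Kendall's tau depends only on ranks, hence only on the copula
tau.hat <- cor(z1, z2, method = "kendall")

# For a normal copula, tau = (2 / pi) * asin(rho); invert to estimate rho
rho.hat <- sin(pi * tau.hat / 2)
c(tau = tau.hat, rho = rho.hat)
```

Because tau is rank-based, the estimate does not depend on the marginal distributions, which makes it a convenient cross-check on the m.l.e. value.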

Correlation: goodness of fit
- The empirical copula and the hypothesized copula are compared under the null hypothesis that they are from the same copula.
- The Cramér-von Mises (CvM) statistic S_n is used.
- The goodness-of-fit test runs very slowly, so each pair of lines was compared using only the first 100 simulations.
- The two-sample Kolmogorov-Smirnov test was also performed. This compared the empirical distribution with a random sample from the hypothesized distribution.

Correlation: g.o.f. results
Line 1&2: Parameter estimate(s): 0.002100962. Cramér-von Mises statistic: 0.0203318 with p-value 0.4009901
Line 1&3: Parameter estimate(s): 0.97926. Cramér-von Mises statistic: 0.007494245 with p-value 0.3811881
Line 2&3: Parameter estimate(s): 0.01049841. Cramér-von Mises statistic: 0.01614539 with p-value 0.5891089

Final Thoughts on Testing
- Initial tests were simple because we were also checking the mechanics of the model.
- There are many more features of the model to explore and to test.
- The testing statements can also be applied to parameterize the model using an insurer's data.
- The tests described only test ultimate distributions, not the loss development patterns.