Bayesian Claim Severity Part 2: Mixed Exponentials with Trend, Censoring, and Truncation

By Benedict Escoto

This paper is produced mechanically as part of FAViR. See http://www.favir.net for more information.

Abstract

This is a continuation of the FAViR paper "Bayesian Claim Severity with Mixed Distributions." The application is the same: the actuary is trying to produce a claim severity distribution and has a Dirichlet prior over mixed exponential distributions that s/he wants to update with observed claim data. However, in this paper the data is allowed to have a flat severity trend, and the claim data may be truncated or censored. Instead of a custom Gibbs sampler, JAGS is used to compute the posterior parameters.

1 Introduction

The earlier FAViR paper "Bayesian Claim Severity with Mixed Distributions" (Escoto) derived a claim severity distribution from observed claim amounts using traditional Bayesian updating. There, each claim had a mixed exponential severity distribution, conditional on parameter risk which was represented by a Dirichlet distribution. Under these conditions, the marginal distribution of each claim was also a mixed exponential. Thus, an actuary could start with a prior mixed exponential severity distribution (from ISO or some other source) and refine it with available claim data. The resulting marginal distribution would be the correct credibility-weighted mixture of the prior distribution and the claim data, but would still be mixed exponential in form.

This paper is a continuation of "Bayesian Claim Severity with Mixed Distributions." The basic probabilistic model is the same: a mixed exponential severity with Dirichlet parameters. Unlike that paper, however, here we handle three complications often seen in practice:

1. censored data (policy limits),
2. truncated data (deductibles), and
3. severity trend.

Trend parameter risk (represented by a gamma distribution) is added to the model because trend credibility needs to be estimated simultaneously with severity. For instance, increasing average claim severity in the most recent years may indicate a high severity trend; or maybe a few huge losses happened to occur recently. Bayesian statistics properly weighs these possibilities.

To compute the posterior distribution, this paper uses Just Another Gibbs Sampler (JAGS), a cross-platform, general-purpose, open source MCMC engine, while the previous paper used a custom Gibbs sampler. As a result, this paper's method is slower and has more dependencies, but it is also easier to modify, assess for convergence, and run in parallel chains. Knowledge of MCMC theory is not necessary to use this paper.

2 Required input data

This section displays all the initial data required by this paper. The initial inputs can be grouped into four categories: the observed claim data, the prior means and weights of the mixed exponential severity distribution, the prior severity uncertainty (represented by a Dirichlet distribution), and the prior trend mean and standard deviation.

2.1 Claim Data

The claim data used is shown in figure 1. For each claim we need to know its severity, whether or not it was censored at policy limits, its deductible (truncation threshold), and the age of the claim.

       Amount  Age  Deductible  Capped?
       33,750    3           0  No
    1,000,000    1           0  Yes
       22,707    1           0  No
       54,135    1           0  No
      174,524    3           0  No
       19,661    2           0  No
      140,735    2           0  No
    1,000,000    3           0  Yes
        1,127    1           0  No
      316,483    2           0  No

Figure 1: Observed Claim Data
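For concreteness, the claim data in figure 1 could be held in R as a data frame along the following lines. This is a minimal sketch; the column names are assumptions for illustration, not this paper's actual input format.

    # Hypothetical encoding of the claims in figure 1; column names are assumed.
    claims <- data.frame(
      amount     = c(33750, 1e6, 22707, 54135, 174524,
                     19661, 140735, 1e6, 1127, 316483),
      age        = c(3, 1, 1, 1, 3, 2, 2, 3, 1, 2),
      deductible = rep(0, 10),
      capped     = c(FALSE, TRUE, FALSE, FALSE, FALSE,
                     FALSE, FALSE, TRUE, FALSE, FALSE)
    )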

2.2 Mixed Exponential

Figure 2 shows the means and weights of a mixed exponential distribution. This determines the model's marginal claim severity prior to conditionalization on the claim data (i.e., the severity distribution you'd expect for the very next claim if no detailed claim data were available).

    Weights (%)       Means
           30.0      50,000
           25.0     100,000
           25.0     500,000
           10.0   1,500,000
            7.0   5,000,000
            3.0  20,000,000
            Avg   1,265,000

Figure 2: Prior Means and Weights

2.3 Dirichlet Uncertainty

Although the means and weights of the mixed exponential distribution determine the marginal severity distribution, we also need to know how certain we are that these parameters are accurate. Are they just a rough estimate that may be far off, or are we sure that the true distribution is very similar? This uncertainty can be summarized as α_0, the sum of the Dirichlet parameters. Here we picked a value of 20. See section 4.1 for guidance on choosing this parameter.

2.4 Trend

Finally, we need a prior distribution over the trend rate. The trend is applied to each claim by age. For instance, if the trend rate is 7%, then claims of age 2.5 are expected to be 1.07^2.5 times as severe as claims of age 0. We assume trend is constant over time, but the parameter uncertainty is modeled as a gamma distribution, shifted so that the zero point indicates 100% trend (a trend factor of 1.00). Choosing a mean and standard deviation suffices to determine the gamma parameters. In this paper, the trend mean is 0.05 and the trend standard deviation is 0.01.
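As a sketch of that moment-matching step (assuming the shape/scale parameterization, with mean kθ and variance kθ²), the gamma parameters follow directly from the stated mean and standard deviation:

    # Method-of-moments gamma parameters for the trend prior (a sketch;
    # parameterization assumed: mean = shape * scale, variance = shape * scale^2).
    trend_mean <- 0.05
    trend_sd   <- 0.01
    shape <- (trend_mean / trend_sd)^2  # k = mean^2 / variance = 25
    scale <- trend_sd^2 / trend_mean    # theta = variance / mean = 0.002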

3 Results

This section presents the results of conditionalizing the Bayesian model on the observed claim data using MCMC. A total of 30,000 samples were computed. Figure 3 shows the prior and posterior marginal weights.

       Prior to Data          Posterior to Data
    Weight (%)       Mean   Weight (%)  Error (%)        Mean
          30.0     50,000         31.0       0.12      50,000
          25.0    100,000         25.7       0.12     100,000
          25.0    500,000         23.3       0.09     500,000
          10.0  1,500,000          9.7       0.06   1,500,000
           7.0  5,000,000          7.2       0.05   5,000,000
           3.0 20,000,000          3.3       0.04  20,000,000
           Avg  1,265,000          Avg             1,310,196

Figure 3: Prior vs Posterior Exponential Weights

The error column is an estimate of the standard error of the MCMC method. This can be decreased by running more simulations. Because the error is estimated using time-series methods, it takes autocorrelation into account. The other exhibits assume this error is acceptably small and can be ignored. The coda package is compatible with JAGS and includes more tools for MCMC error-testing and diagnostics; a few sample commands are given in the source code to this paper.

Figures 4, 5, and 6 show the prior and posterior expected loss in layer, trend, and ILFs. In these figures, each boxplot shows the 10th, 25th, 50th, 75th, and 90th percentiles of the corresponding distribution.

[Figure 4: Prior vs Posterior Loss in Layer. Boxplots of expected loss in layer ($000) for the layers 500x0, 250x500, 250x750, 500x1M, 500x1.5M, 1Mx2M, and 2Mx3M.]

[Figure 5: Trend Results. Boxplots of the prior and posterior trend (%).]

[Figure 6: Prior vs Posterior ILF Distribution. Boxplots of increased limit factors at limits ($000) of 500, 750, 1000, 1500, 2000, 3000, and 5000.]
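For example, a few generic coda commands of the kind referred to above might look like this. This is a sketch; `post` is an assumed name for a coda mcmc.list of posterior draws.

    library(coda)
    # 'post' is assumed to be an mcmc.list of posterior draws (e.g. from JAGS)
    summary(post)        # posterior means with time-series (autocorrelation-aware) SEs
    effectiveSize(post)  # effective sample size per monitored parameter
    gelman.diag(post)    # Gelman-Rubin convergence diagnostic across parallel chains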

4 Probabilistic Model

Here is the formal description of the Bayesian hierarchical claim severity model. It can be divided into process and parameter risk.

Process risk:

\[
x_i \mid b_i \sim g\!\left(\mathrm{Exponential}\!\left(\frac{1}{\mu_{b_i}\, d^{\,t_i}}\right)\right),
\qquad
b_i \mid w_1, \ldots, w_m \sim \mathrm{Categorical}(w_1, \ldots, w_m)
\]

Parameter risk:

\[
w_1, \ldots, w_m \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_m),
\qquad
d \sim \mathrm{Gamma}(k, \theta)
\]

The function g above represents censoring and/or truncation. The index i ranges over the number of observed claims. The means of the mixed exponential are µ_1, ..., µ_m, and t_i is the age of claim i. See "Bayesian Claim Severity with Mixed Distributions" for more information.

Mathematically, the only distinction between the parameter and process parts of the model is that all the claims are assumed to share the same parameter risk variables, while the process risk parameters vary per claim. For example, there is one trend parameter d which affects all claims, but each claim gets its own instance of x and b. Conditional on d and w_1, ..., w_m, the distribution of each claim is independent.
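A minimal JAGS rendering of this model might look as follows, held as an R string for use with runjags later. This is a sketch under assumptions, not the paper's actual model file: the data names (x, age, ded, limit, capped, mu, alpha, k, theta) are invented, right-censoring is encoded with dinterval() with x set to NA for capped claims, and the shifted gamma of section 2.4 is written out explicitly.

    # Sketch of the hierarchical model in JAGS's modelling language.
    model_string <- "model {
      for (i in 1:n) {
        b[i] ~ dcat(w)                              # mixture component (process risk)
        rate[i] <- 1 / (mu[b[i]] * pow(d, age[i]))  # exponential rate, trended by age
        capped[i] ~ dinterval(x[i], limit[i])       # 1 if claim i is censored at its limit
        x[i] ~ dexp(rate[i]) T(ded[i], )            # severity, truncated at the deductible
      }
      w ~ ddirch(alpha)              # Dirichlet parameter risk over the weights
      d <- 1 + dshift                # trend factor: gamma shifted so zero means 0% trend
      dshift ~ dgamma(k, 1 / theta)  # shape k, rate 1/theta (i.e. scale theta)
    }"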

4.1 Choosing α_0

In the input data section, the parameter α_0 was used to summarize our uncertainty around the mixed exponential prior distribution. Intuitively, this parameter can be thought of as the number of claim observations encapsulated by the prior mixed exponential. So, after we observe α_0 claims, the resulting distribution will depend equally on our prior and the observed data.

Because intuition is not of much help in selecting a particular value, we will use this result: if X is a mixed exponential distribution (with fixed means) depending on weights w_1, ..., w_m, and g(x) is a real function, then

\[
\operatorname{Var}\bigl[\mathrm{E}[g(X) \mid w_1, \ldots, w_m]\bigr]
  = \operatorname{Var}\Bigl[\sum_j w_j\, \mathrm{E}[g(X) \mid b = j]\Bigr]
  = \operatorname{Var}\Bigl[\sum_j w_j g_j\Bigr] \tag{1}
\]
\[
  = \sum_j g_j^2 \operatorname{Var}[w_j] + \sum_{j \neq k} g_j g_k \operatorname{Cov}[w_j, w_k] \tag{2}
\]
\[
  = \frac{\sum_j g_j^2\, a_j (1 - a_j) - \sum_{j \neq k} g_j g_k\, a_j a_k}{\alpha_0 + 1}, \tag{3}
\]

and therefore

\[
\alpha_0 = \sigma^{-2} \Bigl( \sum_j g_j^2\, a_j (1 - a_j) - \sum_{j \neq k} g_j g_k\, a_j a_k \Bigr) - 1,
\]

where g_j = E[g(X) | b = j], a_j = α_j / α_0, σ² is the chosen value of (1), and (3) follows from the properties of the Dirichlet distribution (see "Bayesian Claim Severity with Mixed Distributions" for more about the Dirichlet distribution).

Suppose the actuary feels that the expected value of the true claim distribution capped at $1M may be $50,000 off from the expected capped value implied by the prior mixed distribution. Then by setting g(x) = min(x, 10^6), we have g_j = µ_j (1 - e^{-10^6/µ_j}), and the above result can be used to calculate α_0. In this paper, we set α_0 = 20, which implies that the standard deviation of expected loss in the first million layer is 65,770.
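To make this concrete, here is a small R sketch of the calculation under the priors of section 2. The variable names are ours, and the $50,000 figure is the illustrative tolerance from the text above.

    # Solving for alpha_0, using the prior means and weights from figure 2.
    mu <- c(50e3, 100e3, 500e3, 1.5e6, 5e6, 20e6)
    a  <- c(0.30, 0.25, 0.25, 0.10, 0.07, 0.03)  # a_j = alpha_j / alpha_0
    g  <- mu * (1 - exp(-1e6 / mu))              # g_j = E[min(X, 1M) | b = j]

    # Numerator of (3), i.e. (alpha_0 + 1) * Var[E[g(X) | w]]; the cross term
    # sum_{j != k} g_j g_k a_j a_k equals (sum g_j a_j)^2 - sum (g_j a_j)^2.
    num    <- sum(g^2 * a * (1 - a)) - (sum(g * a)^2 - sum((g * a)^2))
    alpha0 <- num / 50000^2 - 1   # alpha_0 implied by a $50,000 standard deviation
    sqrt(num / (20 + 1))          # sd implied by alpha_0 = 20: about 65,770, as in the text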

5 Computation

The posterior distribution of the probabilistic model specified in section 4 is probably not analytically soluble. The version of the model in "Bayesian Claim Severity with Mixed Distributions" (which had no trend parameter) actually could be solved analytically, but the solution was not tractable. In these situations, Markov chain Monte Carlo (MCMC) techniques are extremely useful; they have revolutionized Bayesian statistics in the last few decades.

One MCMC algorithm is called Gibbs sampling. The earlier paper implemented its own Gibbs sampler. However, there are now dedicated software packages such as WinBUGS which allow users to specify custom Bayesian hierarchical models in an intuitive modelling language, and then solve those models using Gibbs sampling and other MCMC algorithms.

This paper uses JAGS (Just Another Gibbs Sampler), which is an improved version of WinBUGS that is open source and cross-platform. See Plummer (2003) for more information. JAGS can be used seamlessly from R through the runjags package. The JAGS model description for this paper takes about a dozen lines of code.

Compared to custom code, JAGS (and WinBUGS) are easier to use and modify, but much slower. On my computer (Intel Core 2 Duo 6600 running Linux), the model takes about 10 minutes to process 600 claims. A tuned MCMC algorithm written in a low-level language like C would probably be 10 to 50 times faster. Also, MCMC techniques are highly parallelizable, so more cores increase speed almost linearly.
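Putting the pieces together, here is a hedged sketch of driving JAGS from R via runjags, reusing the hypothetical claims data frame, the mu/a vectors, and the model_string from the earlier sketches; the policy limit vector and the chain settings are likewise assumptions.

    library(runjags)
    jags_data <- list(
      x      = ifelse(claims$capped, NA, claims$amount),  # censored severities are NA
      capped = as.numeric(claims$capped),
      limit  = rep(1e6, nrow(claims)),                    # assumed policy limits
      age    = claims$age, ded = claims$deductible, n = nrow(claims),
      mu     = mu, alpha = 20 * a,                        # alpha_j = alpha_0 * a_j
      k      = 25, theta = 0.002                          # trend gamma from section 2.4
    )
    fit  <- run.jags(model = model_string, data = jags_data,
                     monitor = c("w", "d"), n.chains = 2,
                     burnin = 5000, sample = 15000)
    post <- coda::as.mcmc.list(fit)  # hand off to coda for the diagnostics above

In practice, JAGS may also require explicit initial values for the censored x[i] (above their limits) before the chains can initialize; a production script would supply those.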

6 Conclusion

This paper credibility-weights prior beliefs about claim severity with observed claim data. The actuary starts with a prior mixed exponential severity distribution (perhaps from an external source such as ISO) and uses standard Bayesian conditionalization on the claim data to arrive at a posterior weights distribution. Complications such as trend, censoring, and truncation are handled.

This method may be practical whenever there is a shortage of claim data. If a huge number of relevant claims were available, there would be no need for Bayesian statistics; the actuary could simply use the (possibly smoothed) empirical distribution. But when the number of data points is insufficient for non-parametric statistics, actuaries frequently turn to maximum likelihood methods. If prior distributions are available, Bayesian methods such as the one presented here are superior to maximum likelihood methods because they incorporate that information.

7 Bibliography

1. Escoto, Benedict M. "Bayesian Claim Severity with Mixed Distributions." FAViR Drafts. http://www.favir.net/local-files/papers/bayesianmixed2.pdf

2. Plummer, Martyn. "JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling." Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), Vienna, Austria. http://www.ci.tuwien.ac.at/conferences/dsc-2003/proceedings/plummer.pdf

8 Legal

Copyright 2010 Benedict Escoto

This paper is part of the FAViR project. All the R source code used to produce it is freely distributable under the GNU General Public License. See http://www.favir.net for more information on FAViR or to download the source code for this paper.

Copying and distribution of this paper itself, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved. This paper is offered as-is, without any warranty. This paper is intended for educational purposes only and should not be used to violate anti-trust law. The authors and FAViR editors do not necessarily endorse the information or techniques in this paper and assume no responsibility for their accuracy.