Generalized Linear Models in Reserving Risk



Charles University in Prague
Faculty of Mathematics and Physics

MASTER THESIS

Bc. Lenka Zboňáková

Generalized Linear Models in Reserving Risk

Department of Probability and Mathematical Statistics

Supervisor of the master thesis: RNDr. Michal Pešta, Ph.D.
Study programme: Mathematics
Specialization: Probability, Mathematical Statistics and Econometrics

Prague 2014

I would like to express my sincere gratitude to the supervisor of my master thesis, RNDr. Michal Pešta, Ph.D., for his time, valuable advice and suggestions. My special thanks also go to my family for the love, care and support they have given me, not only since the beginning of my studies but throughout my whole life. Last but not least, I want to thank my friends, who encouraged and helped me whenever I needed it.

I declare that I carried out this master thesis independently, and only with the cited sources, literature and other professional sources. I understand that my work relates to the rights and obligations under the Act No. 121/2000 Coll., the Copyright Act, as amended, in particular the fact that the Charles University in Prague has the right to conclude a license agreement on the use of this work as a school work pursuant to Section 60 paragraph 1 of the Copyright Act.

In Prague, 5th December 2014
Bc. Lenka Zboňáková

Title (in Czech): Zobecněné lineární modely v upisovacím riziku
Title: Generalized Linear Models in Reserving Risk
Author: Bc. Lenka Zboňáková
Department: Department of Probability and Mathematical Statistics
Supervisor: RNDr. Michal Pešta, Ph.D., Department of Probability and Mathematical Statistics

Abstract: In the presented thesis we deal with the generalized linear models framework in the claims reserving problem. Claims reserving in non-life insurance is first described and the considered class of models is introduced. Subsequently, this branch of stochastic modelling is implemented in the reserving setup. To compute the risk associated with claims reserving, we need a predictive distribution of future liabilities in order to evaluate risk measures such as Value at Risk and Conditional Value at Risk. Since datasets in non-life insurance commonly consist of a small number of observations and estimation of predictive distributions can be complicated, we adopt a bootstrap method for this purpose. Model fitting, simulations and the subsequent measurement of the reserving risk are performed using real-life data. Based on this, an analysis and comparison of the fitted models, together with graphical outputs, is included.

Keywords: claims reserves, generalized linear models, bootstrap method, risk measures

Contents

Introduction
1 Introduction to claims reserving theory
    Notation
    Basic claims reserving methods
        Chain-ladder method
        Bornhuetter-Ferguson method
2 Generalized linear models
    Basic assumptions and terminology
    Link functions
    Model fitting
        Maximum likelihood method in GLMs
        Quasi-likelihood function
    Measuring the goodness of fit
    Testing the significance of explanatory variables
3 Generalized linear models in claims reserving
    Modelling of claims counts
        Over-dispersed Poisson model
        Over-dispersed negative binomial model
    Modelling of claims amounts
        Gamma models
        Inverse Gaussian models
        Normal models
4 Bootstrapping in development triangles
    Over-dispersed Poisson model
    Over-dispersed negative binomial model
    Gamma models
    Inverse Gaussian models
    Normal models
5 Application to real-life data
    Dataset
    Analysis of models
    Normal approximation of the ONB model
Conclusion
Bibliography
List of Figures
List of Tables
Appendix A Source code
    A.1 Data import and representation
    A.2 The NONB model

Introduction

Claims reserving is one of the most important issues in general insurance. In the past, several methods for solving this problem were proposed, and since then they have served as instruments for determining the value of the outstanding liabilities of a company. The most popular of these methods, however, are deterministic and based on estimating only the expected value of future claims. This is not sufficient when dealing with random events such as claims occurrences, where the uncertainty of an estimate must be taken into account. To address this, stochastic methods of modelling claims development have become popular among practitioners. These models make it possible to specify higher moments of the prediction of future liabilities, and therefore to measure the risk associated with claims reserving itself. One measure of the variability of claims reserves is the mean square error of prediction, which can be evaluated once the first and second moments of the claims distributions are computed. This quantity alone is not sufficient to meet regulatory requirements, and it is therefore convenient to determine the whole predictive distribution of the outstanding liabilities and to be able to compute its quantiles as well. Risk measures such as Value at Risk and Conditional Value at Risk can then be used to calculate the reserving risk, which is the quantity of actual interest. This thesis deals with the choice of an appropriate stochastic model for claims amounts, from which the predictive distribution is derived. The focus is on generalized linear models, an extension of classical linear regression. These models make it possible to model non-normally distributed data with a non-linear relationship between the response and the explanatory variables, characteristics that are exploited throughout this work.
Constructing predictive distributions of claims reserves can be complicated because the reserves are sums of random variables. In this thesis, this problem is overcome by applying a bootstrap method, which also addresses the limited number of observations typical of non-life insurance datasets. The structure of the thesis is as follows. In the first chapter, the claims reserving problem and its standard notation are introduced. We distinguish between incremental and cumulative claims amounts, which are treated as random variables, and present development triangles, the form in which datasets are usually arranged. Along with the basics of claims reserving, two widely used deterministic reserve estimation techniques, the Chain-ladder and Bornhuetter-Ferguson methods, are described there. Owing to their simplicity, these methods form the basis on which many stochastic models are built. The second chapter summarizes the principles of the theory of generalized linear models. We define their structure and present methods of parameter estimation, together with several methods of model analysis. The third chapter focuses on the application of generalized linear models in the claims reserving framework. Several models whose distributions are consistent with actuarial practice are specified, and different relations between the response and explanatory variables are used in the model definitions in order to extend the possibilities of modelling the dependence in the data.

The theory of the bootstrap procedure is described in the fourth chapter. The algorithm is adjusted for the case of generalized linear models, where residuals have to be resampled. The definition of residuals and the specific steps of the bootstrap method are given separately for each modelled distribution. Finally, in the fifth chapter, the methods are applied to real-life data. We use the R software to fit the considered models to the data and subsequently to obtain bootstrap predictive distributions. The results are presented graphically or in tables, and the fitted models are compared, together with a discussion of their performance and adequacy.

1. Introduction to claims reserving theory

This thesis deals with the problem of claims reserving in non-life insurance, which includes all types of insurance products except life insurance. Non-life and life insurance contracts differ, which also implies different modelling of the particular products. A typical non-life insurance claims history can be depicted as in Figure 1.1, taken from Wüthrich and Merz (2008).

[Figure 1.1: Typical time line of a non-life insurance claim. Events along the time axis: accident date, reporting date, claims payments, claims closing, reopening, payments, claims closing.]

Usually there is a time lag between the occurrence of a claim and its settlement: a claim can be reported with a delay, its settlement may take time because of unknown circumstances, or a closed claim can be reopened. Because of these typical features of the claims process, non-life insurance companies have to build reserves, which are used to cover their future liabilities arising from taking over the risk of the insured individuals.

As stated in Mandl and Mazurová (1999), there are two main values to be estimated in claims reserving. The first one is the amount to be paid for claims that have been reported but not settled (RBNS), and the second is the amount to be paid for claims that have been incurred but not reported yet (IBNR). The reserve also includes an estimate of the costs incurred during the settlement of claims. The reserve covering the RBNS claims is estimated by a specialist in non-life insurance, whilst the amount of the IBNR claims needs to be approximated by mathematical methods using the historical development of claims.

1.1 Notation

Historical data used to calculate reserves are usually presented in so-called claims development triangles, which arrange the data along two time axes. The horizontal axis denotes the development year of a claim, $j$, and the vertical one denotes its accident (occurrence) year, $i$. The scheme of such a claims development triangle can be seen in Figure 1.2. The last accident year is denoted by $I$ and the maximal development year observed is denoted by $J$. The symbol $X_{i,j}$ refers to all payments in development year $j$ for the claims that occurred in year $i$, i.e. it corresponds to incremental payments and is assumed to be a random variable.

[Figure 1.2: Claims development triangle. Rows: accident years $i = 0, \dots, I$; columns: development years $j = 0, \dots, J$. The entries $X_{i,j}$ with $i + j \le I$ are observations of the random variables; the entries with $i + j > I$ are to be predicted.]

The cumulative payments for the claims with accident year $i$ after $j$ development years are denoted by
$$C_{i,j} = \sum_{k=0}^{j} X_{i,k}.$$
The value $C_{i,J}$ is called the ultimate claims amount of accident year $i$. Sometimes $X_{i,j}$ indicates the number of claims that occurred in year $i$ and were reported with a time delay $j$. Then $C_{i,j}$ denotes the total number of claims with accident year $i$ reported up to time $i + j$, and $C_{i,J}$ is the total number of claims incurred in year $i$.

As can be seen in Figure 1.2, the values of $X_{i,j}$ are only known for $i + j \le I$ (upper triangle), whereas the values of the outstanding payments $X_{i,j}$ for $i + j > I$ (lower triangle) need to be predicted. The upper triangle is denoted by
$$\mathcal{D}_I^{U} = \{X_{i,j};\ i + j \le I,\ 0 \le j \le J\}$$
and the lower triangle by
$$\mathcal{D}_I^{L} = \{X_{i,j};\ i + j > I,\ 0 \le j \le J\}.$$
For cumulative claims amounts, the set of observations at time $I$, forming a cumulative claims development triangle, is denoted by
$$\mathcal{D}_I^{cu} = \{C_{i,j};\ i + j \le I,\ 0 \le j \le J\}$$
and the values to be estimated by
$$\mathcal{D}_I^{cl} = \{C_{i,j};\ i + j > I,\ 0 \le j \le J\}.$$

What is actually of interest in a claims reserving problem is the value of the outstanding loss liabilities. For accident year $i$ at time $i + j$, this value is given by
$$R_{i,j} = \sum_{k=j+1}^{J} X_{i,k} = C_{i,J} - C_{i,j},$$
where $X_{i,k}$ are incremental payments. So-called claims reserves are then the predictions of $R_{i,j}$ which, together with the known past claims amounts $C_{i,j}$, give the prediction of the ultimate claims amount $C_{i,J}$ for accident year $i$. We denote the outstanding loss liabilities and the ultimate claims amount for accident year $i$ by
$$R_i = C_{i,J} - C_{i,I-i} \qquad \text{and} \qquad C_i = C_{i,J},$$
respectively.

In this thesis we assume $I = J$ and $X_{i,j} = 0$ for all $j > J$, as in Wüthrich and Merz (2008). Under this assumption accident year $0$ is fully developed, so the outstanding liabilities have to be predicted only for accident years $i = 1, \dots, I$.
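The indexing conventions of this section can be illustrated with a short Python sketch. It is not part of the thesis (whose computations are done in R); the triangle below, with $I = J = 3$, is hypothetical, and `None` is an assumed placeholder for the unobserved cells with $i + j > I$. The hypothetical helper `cumulative` accumulates $C_{i,j} = \sum_{k=0}^{j} X_{i,k}$ on the upper triangle $\mathcal{D}_I^{U}$, and the latest diagonal $C_{i,I-i}$ is then read off.

```python
# Hypothetical incremental triangle X[i][j] with I = J = 3 (made-up numbers).
# None marks the cells of the lower triangle (i + j > I), which are unobserved.
X = [
    [100.0, 50.0, 20.0, 10.0],
    [110.0, 60.0, 25.0, None],
    [120.0, 55.0, None, None],
    [130.0, None, None, None],
]


def is_observed(i, j, I):
    """Cell (i, j) belongs to the upper triangle D_I^U iff i + j <= I."""
    return i + j <= I


def cumulative(X):
    """Return C with C[i][j] = X[i][0] + ... + X[i][j] on the upper triangle, None elsewhere."""
    I = len(X) - 1
    C = []
    for i, row in enumerate(X):
        running, c_row = 0.0, []
        for j, x in enumerate(row):
            if is_observed(i, j, I):
                running += x
                c_row.append(running)
            else:
                c_row.append(None)  # lower triangle D_I^L: to be predicted
        C.append(c_row)
    return C


C = cumulative(X)
I = len(X) - 1
latest = [C[i][I - i] for i in range(I + 1)]  # latest diagonal C_{i,I-i}
```

The list `latest` holds the values $C_{i,I-i}$ from which the outstanding liabilities $R_i = C_{i,J} - C_{i,I-i}$ are projected.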

1.2 Basic claims reserving methods

In this section we introduce two widely used estimation techniques for the claims reserving problem: the Chain-ladder (CL) method and the Bornhuetter-Ferguson (BF) method. They provide algorithms to predict future liabilities and to set IBNR claims reserves. However, the uncertainty of such predictions cannot be measured with these mechanical techniques alone; for that, they have to be based on an appropriate stochastic model.

Chain-ladder method

The CL method is one of the most popular loss reserving techniques. Its distribution-free derivation developed by Mack (1993) is one of the stochastic models that motivate this method. The distribution-free CL model is based on the cumulative claims amounts $C_{i,j}$, $0 \le i \le I$, $0 \le j \le J$, considered to be random variables whose values are known only for $i + j \le I$, as described above. The aim is to estimate the ultimate claims amounts $C_i$ and the outstanding loss liabilities $R_i$ for accident years $i = 1, \dots, I$. The model works with two basic assumptions.

Model assumptions 1.1. Cumulative claims amounts $C_{i,j}$ of different accident years $i$ are independent random variables, that is
$$\{C_{i,0}, \dots, C_{i,J}\} \text{ and } \{C_{k,0}, \dots, C_{k,J}\} \text{ are independent for } i \ne k. \tag{1.1}$$
There exist development factors (link ratios) $\lambda_0, \dots, \lambda_{J-1} > 0$ such that
$$E(C_{i,j} \mid C_{i,0}, \dots, C_{i,j-1}) = E(C_{i,j} \mid C_{i,j-1}) = \lambda_{j-1} C_{i,j-1} \tag{1.2}$$
holds for all $0 \le i \le I$, $1 \le j \le J$.

The assumption (1.2) takes into account only the first moments, but it provides everything needed to estimate the conditional expected values of the future claims amounts. To model the variability of such predictions, assumptions on higher moments have to be made; for further details of the extended definition of the Mack model see Wüthrich and Merz (2008). The Model assumptions 1.1 are sufficient to describe the CL algorithm, which consists of estimating the development factors $\lambda_j$, unknown in practice, by
$$\hat\lambda_j = \frac{\sum_{i=0}^{I-j-1} C_{i,j+1}}{\sum_{i=0}^{I-j-1} C_{i,j}}, \qquad j = 0, \dots, J-1.$$
Under the preceding assumptions and the properties of conditional expectation, we can derive
$$E(C_i \mid \mathcal{D}_I^{cu}) \overset{(1.1)}{=} E(C_i \mid C_{i,0}, \dots, C_{i,I-i}) = E\big[E(C_i \mid C_{i,0}, \dots, C_{i,I-1}) \mid C_{i,0}, \dots, C_{i,I-i}\big] \overset{(1.2)}{=} \lambda_{I-1}\, E(C_{i,I-1} \mid \mathcal{D}_I^{cu}) = \dots = \lambda_{I-1} \cdots \lambda_{I-i}\, C_{i,I-i}.$$
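A minimal Python sketch of the CL algorithm, assuming $I = J$ and a hypothetical cumulative triangle stored as a list of rows with `None` for unobserved cells (this is an illustration, not the thesis's R code). The hypothetical function `cl_factors` implements the estimator $\hat\lambda_j$ above, and `cl_ultimate` replaces the unknown factors in the product $\lambda_{I-1} \cdots \lambda_{I-i} C_{i,I-i}$ by their estimates, which gives the CL estimator (1.3) of the ultimate claims amount.

```python
def cl_factors(C):
    """Chain-ladder factors: lambda_j = sum_i C[i][j+1] / sum_i C[i][j], i = 0..I-j-1."""
    I = len(C) - 1
    J = len(C[0]) - 1
    factors = []
    for j in range(J):
        rows = range(I - j)  # accident years with both C_{i,j} and C_{i,j+1} observed
        numerator = sum(C[i][j + 1] for i in rows)
        denominator = sum(C[i][j] for i in rows)
        factors.append(numerator / denominator)
    return factors


def cl_ultimate(C, factors):
    """CL projection C_{i,I-i} * lambda_{I-i} * ... * lambda_{J-1}, assuming I = J."""
    I = len(C) - 1
    ultimates = []
    for i in range(I + 1):
        value = C[i][I - i]  # latest observed cumulative amount
        for j in range(I - i, len(factors)):
            value *= factors[j]
        ultimates.append(value)
    return ultimates


# Hypothetical cumulative triangle with I = J = 2 (None = not yet observed).
C = [
    [100.0, 150.0, 165.0],
    [200.0, 300.0, None],
    [300.0, None, None],
]
factors = cl_factors(C)                # approximately [1.5, 1.1]
ultimates = cl_ultimate(C, factors)    # approximately [165, 330, 495]
reserves = [u - C[i][len(C) - 1 - i] for i, u in enumerate(ultimates)]  # approx. [0, 30, 195]
```

The values are only approximate in floating-point arithmetic; the oldest accident year needs no projection, so its reserve is exactly zero.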

The CL estimator of the ultimate claims amount $C_i$ for accident year $i$ is then given by
$$\hat C_i^{CL} = \hat E(C_i \mid \mathcal{D}_I^{cu}) = \hat\lambda_{I-1} \cdots \hat\lambda_{I-i}\, C_{i,I-i} \tag{1.3}$$
and the estimator of the outstanding loss liabilities $R_i$ for accident year $i$ by
$$\hat R_i^{CL} = \hat E(C_i \mid \mathcal{D}_I^{cu}) - C_{i,I-i} = \big(\hat\lambda_{I-1} \cdots \hat\lambda_{I-i} - 1\big)\, C_{i,I-i}.$$
Note that we assume $I = J$. Under the Model assumptions 1.1, useful properties of the foregoing estimators can be shown. Mack (1993) proved that $\hat\lambda_j$, $j = 0, \dots, J-1$, are unbiased and uncorrelated estimators of the development factors. This property can also be used to prove the unbiasedness of the estimators $\hat C_i^{CL}$ of the ultimate claims amounts and $\hat R_i^{CL}$ of the outstanding loss liabilities $R_i$. Other properties and their proofs are listed in Wüthrich and Merz (2008).

Bornhuetter-Ferguson method

Another approach to the claims reserving problem was introduced by Bornhuetter and Ferguson (1972). This method is very robust, as it does not react to outliers in the observed data, and it can be based on a number of stochastic models. The BF model assumptions are as follows.

Model assumptions 1.2. Cumulative claims $C_{i,j}$ of different accident years $i$ are independent random variables. There exist parameters $\mu_0, \dots, \mu_I > 0$ and a claims development pattern $\beta_0, \dots, \beta_J > 0$, $\beta_J = 1$, such that
$$E(C_{i,0}) = \beta_0 \mu_i, \qquad E(C_{i,j+k} \mid C_{i,0}, \dots, C_{i,j}) = C_{i,j} + (\beta_{j+k} - \beta_j)\,\mu_i \tag{1.4}$$
holds for all $0 \le i \le I$, $0 \le j \le J-1$ and $1 \le k \le J-j$.

The assumption (1.4) implies $E(C_{i,j}) = \beta_j \mu_i$ and $E(C_i) = \mu_i$. The claims development pattern for cumulative payments $C_{i,j}$ is also called the payout pattern or cumulative cash-flow pattern; since it describes how the payments are spread over time, it can be used to set discounted reserves. The assumption (1.4) implies a weaker assumption under which, together with the independence of cumulative claims $C_{i,j}$ of different accident years $i$, the BF method can also be modelled.

Model assumptions 1.3. Cumulative claims $C_{i,j}$ of different accident years $i$ are independent random variables. There exist parameters $\mu_0, \dots, \mu_I > 0$ and a pattern $\beta_0, \dots, \beta_J > 0$, $\beta_J = 1$, such that
$$E(C_{i,j}) = \beta_j \mu_i \tag{1.5}$$
for all $0 \le i \le I$, $0 \le j \le J$.

In the BF method, the estimation of $E(C_i \mid \mathcal{D}_I^{cu})$ is of interest. Under the Model assumptions 1.2 we have
$$E(C_i \mid \mathcal{D}_I^{cu}) = E(C_i \mid C_{i,0}, \dots, C_{i,I-i}) \overset{(1.4)}{=} C_{i,I-i} + E(C_i - C_{i,I-i}) = C_{i,I-i} + (1 - \beta_{I-i})\,\mu_i. \tag{1.6}$$
However, considering the weaker assumption (1.5) instead of (1.4) leads to the formula
$$E(C_i \mid \mathcal{D}_I^{cu}) = E(C_i \mid C_{i,0}, \dots, C_{i,I-i}) = C_{i,I-i} + E(C_i - C_{i,I-i} \mid C_{i,0}, \dots, C_{i,I-i}), \tag{1.7}$$
where the last term cannot be evaluated under (1.5) alone, since (1.5) concerns only unconditional expectations. This can be avoided by assuming independence of the incremental claims amount $C_i - C_{i,I-i}$ and the claims $C_{i,0}, \dots, C_{i,I-i}$; then we obtain the same result as in (1.6). In both identities (1.6) and (1.7), the last term is unknown and is the subject of our focus. In the BF method, the ultimate claims amounts, and hence the IBNR claims reserves, are estimated for $1 \le i \le I$ by
$$\hat C_i^{BF} = \hat E(C_i \mid \mathcal{D}_I^{cu}) = C_{i,I-i} + (1 - \hat\beta_{I-i})\,\hat\mu_i, \tag{1.8}$$
where $\hat\beta_{I-i}$ is an estimator of $\beta_{I-i}$ and the estimator $\hat\mu_i$ of the expected ultimate claims amount $E(C_i)$ is given a priori. As noted in Schmidt (2006), the estimators $\{\hat\beta_j\}_{j=0}^{J}$ can also be prior estimators of the claims development pattern, obtained from internal information (contained in the claims development triangle), external information (e.g. market statistics) or a combination of both. The use of external information is characteristic of Bayesian models.

Whereas the estimators of the development factors $\lambda_j$, $j = 0, \dots, J-1$, and hence of the claims reserves were given explicitly in the CL method, prediction in the BF method is not that straightforward. One way to obtain the BF estimators is to compare the estimators of the two methods. Under the assumptions of the CL method, Model assumptions 1.1, it holds that
$$E(C_{i,j}) = E\big[E(C_{i,j} \mid C_{i,j-1})\big] = \lambda_{j-1}\, E(C_{i,j-1}) = E(C_{i,0}) \prod_{k=0}^{j-1} \lambda_k.$$
Consequently,
$$E(C_i) = E(C_{i,0}) \prod_{k=0}^{J-1} \lambda_k, \qquad E(C_{i,j}) = E(C_i) \prod_{k=j}^{J-1} \lambda_k^{-1}.$$
Using the BF method with the Model assumptions 1.3, we can write the equation
$$\prod_{k=j}^{J-1} \lambda_k^{-1} = \beta_j, \tag{1.9}$$

which is often used. However, this does not work with the assumption (1.4), since (1.4) is not implied by the assumptions of the CL method. Hence, knowing the development factors $\lambda_0, \dots, \lambda_{J-1}$ of the CL method, one can construct a development pattern $\{\beta_j\}_{j=0}^{J}$ of the BF method, or vice versa. With the identity (1.9) we can rewrite the BF estimator as
$$\hat C_i^{BF} = C_{i,I-i} + \left(1 - \frac{1}{\prod_{j=I-i}^{J-1} \hat\lambda_j}\right)\hat\mu_i, \tag{1.10}$$
and the CL estimator (1.3) can be written as
$$\begin{aligned}
\hat C_i^{CL} &= C_{i,I-i} \prod_{j=I-i}^{J-1} \hat\lambda_j
= C_{i,I-i} + C_{i,I-i}\left(\prod_{j=I-i}^{J-1} \hat\lambda_j - 1\right) \\
&= C_{i,I-i} + C_{i,I-i} \prod_{j=I-i}^{J-1} \hat\lambda_j \left(1 - \frac{1}{\prod_{j=I-i}^{J-1} \hat\lambda_j}\right)
= C_{i,I-i} + \left(1 - \frac{1}{\prod_{j=I-i}^{J-1} \hat\lambda_j}\right)\hat C_i^{CL}.
\end{aligned} \tag{1.11}$$

Comparing (1.10) and (1.11), we see that, once the development pattern $\{\beta_j\}_{j=0}^{J}$ is identified, the main difference between the CL and BF methods consists in using the estimator $\hat C_i^{CL}$, based only on observations, and the prior estimate $\hat\mu_i$, respectively. The latter is usually the value given by a business plan or the value used for premium calculations, and it should be defined by an expert before any observations are recorded. In contrast to the CL method, the BF method has proved efficient when the proportion of the ultimate claims paid during the first development years is unstable. In such a case the mechanically applied CL method does not produce satisfactory results, as stated in England and Verrall (2002).

In the following text we consider, among other concepts, stochastic methods of modelling claims reserves based on the generalized linear models framework, which are constructed to give the same predictions as the CL method, whilst the BF method, as a member of the Bayesian models, remains outside the scope of our work. For a GLM underlying the BF method see Alai, Merz and Wüthrich (2009).
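To illustrate the identity (1.9) and the comparison of (1.10) and (1.11), the following Python sketch derives the pattern $\hat\beta_j = \prod_{k=j}^{J-1} \hat\lambda_k^{-1}$ from given CL factors and evaluates both estimators. The factors, the latest diagonal $C_{i,I-i}$ and the prior estimates $\hat\mu_i$ are hypothetical numbers chosen for illustration, the function names are ours, and $I = J$ is assumed. Setting $\hat\mu_i = \hat C_i^{CL}$ in the BF formula reproduces the CL estimator, which is exactly what (1.11) expresses.

```python
from math import prod


def bf_pattern(factors):
    """beta_j = prod_{k=j}^{J-1} 1/lambda_k for j = 0..J; beta_J = 1 (identity (1.9))."""
    J = len(factors)
    return [prod(1.0 / lam for lam in factors[j:]) for j in range(J + 1)]


def cl_ultimate_from_pattern(latest, beta):
    """C_hat_i^CL = C_{i,I-i} / beta_{I-i}, equivalent to (1.3) via (1.9)."""
    I = len(latest) - 1
    return [latest[i] / beta[I - i] for i in range(I + 1)]


def bf_ultimate(latest, beta, mu_prior):
    """C_hat_i^BF = C_{i,I-i} + (1 - beta_{I-i}) * mu_hat_i, formulas (1.8) and (1.10)."""
    I = len(latest) - 1
    return [latest[i] + (1.0 - beta[I - i]) * mu_prior[i] for i in range(I + 1)]


factors = [1.5, 1.1]              # hypothetical CL factors lambda_hat_0, lambda_hat_1
latest = [165.0, 300.0, 300.0]    # hypothetical diagonal C_{i,I-i}, i = 0, 1, 2
mu_prior = [170.0, 320.0, 480.0]  # hypothetical a priori estimates mu_hat_i

beta = bf_pattern(factors)
c_cl = cl_ultimate_from_pattern(latest, beta)
c_bf = bf_ultimate(latest, beta, mu_prior)
# With mu_hat_i = C_hat_i^CL the BF estimator coincides with the CL estimator (1.11).
c_bf_with_cl_prior = bf_ultimate(latest, beta, c_cl)
```

Because $\hat\beta_J = 1$, the fully developed accident year is left unchanged by both estimators; the methods differ only in how the still-outstanding share $1 - \hat\beta_{I-i}$ is multiplied.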

2. Generalized linear models

Before introducing the generalized linear models (GLM) framework into stochastic modelling of claims reserves, we give an introduction to GLMs as a class of statistical models. As a generalization of classical linear models, this class includes, for example, logit and probit models, log-linear models, multinomial response models and models for survival data. Linear regression and analysis-of-variance models are special cases of GLMs.

2.1 Basic assumptions and terminology

Using the notation of classical linear models, we have a column vector of observations $y = (y_1, \dots, y_n)^T$, which is assumed to be a realization of a random vector $Y$. Further, for $p \le n$, we have an $n \times p$ matrix $X$ of explanatory variables and a $p$-dimensional vector of unknown parameters $\beta = (\beta_1, \dots, \beta_p)^T$. The components of $Y$, with mean vector $\mu$, are assumed to be independent. In the systematic part of classical linear models, the mean vector $\mu$ is specified through the vector of parameters $\beta$ by the identity $\mu = X\beta$. The random part of the model assumes independent, normally distributed errors with constant variance $\sigma^2$. To make the transition to GLMs, we adopt the specification of the classical linear model presented by McCullagh and Nelder (1989), which is written in three parts:

1. The random component: the components of the random vector $Y$ are independent normally distributed random variables with $E(Y) = \mu$ and $\mathrm{var}(Y) = \sigma^2 I_n$, where $I_n$ is the $n \times n$ identity matrix.
2. The systematic component: a linear predictor $\eta$ is given by
$$\eta = \sum_{j=1}^{p} x_j \beta_j,$$
where $x_j$, $j = 1, \dots, p$, are the column vectors of the matrix $X$ (covariates).
3. The link between the two foregoing components of the model is given by the identity $\eta = \mu$.

We can write the components of the linear predictor in terms of a so-called link function $g(\cdot)$,
$$\eta_i = g(\mu_i), \tag{2.1}$$
for $i = 1, \dots, n$. In GLMs, the link function can be any monotonic differentiable function whose range is the whole real line, and the normality assumption is extended to the assumption that the components of the random vector $Y$ come from an exponential family of distributions, defined below.

Definition 2.1. A distribution is said to be of the exponential type if it can be expressed as
$$dF(y) = \exp\left\{\frac{y\theta - b(\theta)}{a(\varphi)} + c(y, \varphi)\right\} d\nu(y), \qquad y \in A \subseteq \mathbb{R}, \tag{2.2}$$
where $\nu$ denotes either the Lebesgue or a counting measure, $b(\cdot)$ is a real-valued twice-differentiable function of $\theta$, and $a(\cdot)$ and $c(\cdot,\cdot)$ are real-valued functions. The parameter $\theta$ is referred to as the canonical parameter and $\varphi > 0$ is called the dispersion parameter.

The function $b(\cdot)$ from (2.2) determines the particular family the distribution comes from and also specifies the domain of $y$. The function $a(\cdot)$ is commonly of the form $a(\varphi) = \varphi/w$, where $w > 0$ is a known prior weight, often set to 1. The function $a(\varphi)$ cancels out in the maximum likelihood estimation of $\theta$, and the asymptotic distribution of the estimator of $\theta$ is the same for known and unknown dispersion parameter $\varphi$. Many well-known distributions, such as the normal, Poisson, gamma, binomial and inverse Gaussian distributions, belong to the exponential family. For the specification of the parameter $\theta$ and the functions $a(\cdot)$, $b(\cdot)$ and $c(\cdot,\cdot)$ in the form (2.2) for these distributions see McCullagh and Nelder (1989). The mean and variance of the distributions from the exponential family can be obtained via the following lemma.

Lemma 2.1. Let $Y$ be a random variable with a distribution of the exponential type, with $\theta$ in the interior of the parameter space. Then the moment-generating function $m_Y(t) = E(e^{tY})$ exists, is finite in a neighbourhood of 0 and is twice differentiable at 0. It holds that
$$E(Y) = \mu = \left.\frac{d}{dt} m_Y(t)\right|_{t=0} = b'(\theta)$$
and
$$\mathrm{var}(Y) = \left.\frac{d^2}{dt^2} m_Y(t)\right|_{t=0} - \left[\left.\frac{d}{dt} m_Y(t)\right|_{t=0}\right]^2 = b''(\theta)\, a(\varphi), \tag{2.3}$$
where differentiation with respect to $\theta$ is denoted by a prime.
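As an illustration of Definition 2.1 and Lemma 2.1 (a standard textbook example rather than material from the thesis), the Poisson distribution with mean $\mu > 0$ can be written in the form (2.2) with respect to the counting measure, and the lemma then returns its mean and variance directly:

```latex
\begin{aligned}
P(Y = y) &= \frac{e^{-\mu}\mu^{y}}{y!}
          = \exp\{\, y\log\mu - \mu - \log y! \,\}, \qquad y = 0, 1, 2, \dots, \\
\theta &= \log\mu, \quad b(\theta) = e^{\theta}, \quad a(\varphi) = \varphi = 1, \quad c(y,\varphi) = -\log y!, \\
E(Y) &= b'(\theta) = e^{\theta} = \mu, \qquad
\mathrm{var}(Y) = b''(\theta)\,a(\varphi) = e^{\theta} = \mu .
\end{aligned}
```

Hence the Poisson variance function is $V(\mu) = \mu$, and its canonical link $\eta = \theta = \log\mu$ is the log link (2.4) of Section 2.2.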

Proof. First we compute the moment-generating function $m_Y(t)$. By its definition we have
$$\begin{aligned}
m_Y(t) &= \int_A e^{ty}\, e^{\frac{y\theta - b(\theta)}{a(\varphi)}}\, e^{c(y,\varphi)}\, d\nu(y) = \int_A e^{\frac{y[t a(\varphi) + \theta] - b(\theta)}{a(\varphi)}}\, e^{c(y,\varphi)}\, d\nu(y) \\
&= e^{\frac{b[t a(\varphi) + \theta] - b(\theta)}{a(\varphi)}} \underbrace{\int_A e^{\frac{y[t a(\varphi) + \theta] - b[t a(\varphi) + \theta]}{a(\varphi)}}\, e^{c(y,\varphi)}\, d\nu(y)}_{=1}
= e^{\frac{b[t a(\varphi) + \theta] - b(\theta)}{a(\varphi)}},
\end{aligned}$$
where the underbraced integral equals 1 because it is the total mass of a distribution of the form (2.2) with canonical parameter $t a(\varphi) + \theta$. Then we obtain
$$E(Y) = \left.\frac{d}{dt} m_Y(t)\right|_{t=0} = \left. m_Y(t)\, \frac{1}{a(\varphi)}\, b'[t a(\varphi) + \theta]\, a(\varphi)\right|_{t=0} = b'(\theta)$$
and
$$\left.\frac{d^2}{dt^2} m_Y(t)\right|_{t=0} = \Big\{ m_Y(t)\, \big(b'[t a(\varphi) + \theta]\big)^2 + m_Y(t)\, b''[t a(\varphi) + \theta]\, a(\varphi) \Big\}\Big|_{t=0} = [b'(\theta)]^2 + b''(\theta)\, a(\varphi),$$
from which $\mathrm{var}(Y) = b''(\theta)\, a(\varphi)$ follows directly. ∎

From the identity (2.3) we can see that the variance of $Y$ is the product of $b''(\theta)$, which depends only on the canonical parameter and is called the variance function, and the function $a(\varphi)$, which does not depend on $\theta$. Since $\theta$ is determined by $\mu$ through $\mu = b'(\theta)$, the variance function is usually written as a function of the mean, $V(\mu)$. The only distribution in the exponential family whose variance does not depend on the mean is the normal distribution, with $V(\mu) = 1$. Variance functions and means of the other distributions mentioned above are listed in McCullagh and Nelder (1989).

2.2 Link functions

As mentioned in the previous section, the linear predictor $\eta$ in GLMs is related to the expected value $\mu$ of the random vector $Y$ by the identity (2.1), which defines the link function. Since the link function has to map the set of possible means onto the whole real line, different link functions are used for different distributions of the components of $Y$. The most common models are listed below. The first three of them consider the components of $Y$ to be Bernoulli random variables, i.e. their link functions map the interval $(0, 1)$ onto the whole real line:

1. logit model: in this case we have
$$\eta_i = g(\mu_i) = \log\left(\frac{\mu_i}{1 - \mu_i}\right);$$

2. probit model: the link function takes the form $\eta_i = g(\mu_i) = \Phi^{-1}(\mu_i)$, where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution;
3. complementary log-log model: the link function in this model is $\eta_i = g(\mu_i) = \log[-\log(1 - \mu_i)]$;
4. log-linear model: for count data, where the random variable $Y_i$ has the Poisson distribution, i.e. $\mu_i > 0$, the following link function is used:
$$\eta_i = g(\mu_i) = \log(\mu_i), \tag{2.4}$$
for $i = 1, \dots, n$.

The latter is a special case of the power family of links, which applies when the expected value $\mu_i$ of the random variable $Y_i$ is positive. These links are determined by the relation
$$\eta_i = g(\mu_i) = \frac{\mu_i^{\lambda} - 1}{\lambda},$$
where the link function (2.4) of the log-linear model is the limiting case for $\lambda \to 0$. Alternatively, the power family of links can be specified by
$$\eta_i = g(\mu_i) = \begin{cases} \mu_i^{\lambda}, & \lambda \ne 0, \\ \log(\mu_i), & \lambda = 0. \end{cases}$$
In both cases a special treatment for $\lambda = 0$ is needed.

McCullagh and Nelder (1989) also presented the so-called canonical links, defined by the identity $\eta = \theta$, together with their form for several distributions from the exponential family. These links are important in statistical modelling because, when they are used, there exists a sufficient statistic for $\beta$, namely $X^T Y$, of dimension $p$ with components $\sum_{i=1}^{n} x_{ij} Y_i$, $j = 1, \dots, p$. However, despite this statistical property, applying the canonical links to particular data is not always justified, as the systematic effects in a model are not necessarily additive on the scale given by such link functions.

2.3 Model fitting

Previously we introduced several possible link functions to be used in fitting a model to a given set of data. The choice of a link function depends on the character of the data and is to some extent arbitrary. Using the general notation $g(\mu_i)$ for the link function, we have
$$g(\mu_i) = x_i^T \beta, \qquad i = 1, \dots, n,$$

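where $\mu_i = E(Y_i)$, $x_i^T$ is the $i$-th row of the design matrix $X$ and the vector of parameters $\beta$ is of dimension $p \le n$. A model with the maximum number of parameters is called a saturated or full model, and a model with one parameter, where all the variation between responses is attributed to the random component, is called a null model. The saturated model is uninformative in practice, but it is helpful when measuring the adequacy of a fitted model, as discussed later.

Maximum likelihood method in GLMs

Once a particular model is selected, its parameters need to be estimated. In GLMs, the estimators of the parameters are obtained by the maximum likelihood method. The log-likelihood function for each component of the random vector $Y$ is
$$l_i(\theta_i, \varphi; y_i) = \frac{y_i \theta_i - b(\theta_i)}{a(\varphi)} + c(y_i, \varphi),$$
where $l_i(\theta_i, \varphi; y_i)$ is considered to be a function of $\theta_i$ and $\varphi$ with $y_i$ given. The log-likelihood for all components of $Y$ is then
$$l(\theta, \varphi; y) = \sum_{i=1}^{n} l_i(\theta_i, \varphi; y_i).$$
For the maximum likelihood estimators of the model parameters $\beta$ we have the score statistic
$$U = \frac{\partial l}{\partial \beta} = \sum_{i=1}^{n} \frac{\partial l_i}{\partial \beta} \overset{\text{chain rule}}{=} \sum_{i=1}^{n} \frac{\partial l_i}{\partial \theta_i} \frac{\partial \theta_i}{\partial \mu_i} \frac{\partial \mu_i}{\partial \eta_i} \frac{\partial \eta_i}{\partial \beta} = \frac{1}{a(\varphi)} \sum_{i=1}^{n} \left[\frac{y_i - \mu_i}{V(\mu_i)} \left(\frac{\partial \mu_i}{\partial \eta_i}\right) x_i\right],$$
since $\partial l_i/\partial \theta_i = (y_i - \mu_i)/a(\varphi)$, $\partial \theta_i/\partial \mu_i = 1/b''(\theta_i) = 1/V(\mu_i)$ by Lemma 2.1, and $\partial \eta_i/\partial \beta = x_i$. The maximum likelihood estimator $\hat\beta$ of the vector of parameters then solves the equation
$$0 = U = \frac{1}{a(\varphi)} \sum_{i=1}^{n} \left[ w(\hat\mu_i)\, (y_i - \hat\mu_i) \left(\frac{\partial \hat\eta_i}{\partial \hat\mu_i}\right) x_i \right], \tag{2.5}$$
where the function $w(\cdot)$ is of the form
$$w(\hat\mu_i) = \frac{1}{V(\hat\mu_i)} \left(\frac{\partial \hat\mu_i}{\partial \hat\eta_i}\right)^2, \qquad i = 1, \dots, n.$$
Multiplying (2.5) by $a(\varphi)$, adding the term
$$X^T \hat W X \hat\beta = \left[\sum_{i=1}^{n} w(\hat\mu_i)\, x_i x_i^T\right] \hat\beta,$$
where $\hat W = \mathrm{diag}\{w(\hat\mu_1), \dots, w(\hat\mu_n)\}$, to both sides and rearranging, we get
$$\big(X^T \hat W X\big)\, \hat\beta = X^T \hat W \hat z,$$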

hence
$$\hat\beta = \left( X^T \hat W X \right)^{-1} X^T \hat W \hat z. \tag{2.6}$$
The vector $\hat z = (\hat z_1, \ldots, \hat z_n)^T$ denotes an adjusted response, where
$$\hat z_i = x_i^T \hat\beta + (y_i - \hat\mu_i) \left( \frac{\partial \hat\eta_i}{\partial \hat\mu_i} \right), \quad i = 1, \ldots, n.$$
Equation (2.6) has the same form as the normal equations of a linear model fitted by weighted least squares (WLS). However, because $\hat W$ and $\hat z$ themselves depend on $\hat\beta$, (2.6) cannot be solved directly, and in practice the maximum likelihood estimators of $\beta$ in GLMs are obtained by an iterative weighted least squares (IWLS) procedure based on the equation
$$\hat\beta^{(m+1)} = \left( X^T \hat W^{(m)} X \right)^{-1} X^T \hat W^{(m)} \hat z^{(m)}.$$
The algorithm of the procedure has four steps:
1. start with an initial estimate $\hat\mu^{(0)}$;
2. compute an adjusted response vector $\hat z^{(0)}$ and a matrix of weights $\hat W^{(0)}$;
3. calculate $\hat\beta^{(1)}$ by the WLS method;
4. repeat steps 2 and 3 until convergence.

The justification of the IWLS method, which is derived from Fisher's method of scoring, can be found in Dobson (2002).

Quasi-likelihood function

In the previous section we described the maximum likelihood algorithm for estimating parameters in GLMs. However, in practice the distribution of the observed random variables is often unknown because the experiment provides insufficient information, so the likelihood function cannot be constructed. In such a case one can use the quasi-likelihood method to estimate the effect of covariates on the response variables. The specification of the quasi-likelihood function requires only the relation between the mean and the variance of the observations. Here we keep the assumption from the previous sections that the components of the random vector $Y$ are independent. Suppose that $\mathrm{E}(Y) = \mu$ and that the variance is a function of the mean given by $\mathrm{var}(Y) = \varphi V(\mu)$. Assume $\varphi$ to be an unknown constant independent of $\beta$ and $V(\mu)$ to be a matrix of known functions such that $V(\mu) = \mathrm{diag}\{V_1(\mu_1), \ldots, V_n(\mu_n)\}$, i.e. for $i = 1, \ldots, n$ the function $V_i(\mu_i)$ depends only on the $i$-th component of the vector $\mu$.
Given the conditions presented above, for a single component of the vector $Y$, denoted for simplicity by $Y$ without the subscript $i$, we define the function
$$U = u(\mu; Y) = \frac{Y - \mu}{\varphi V(\mu)}$$

with the properties
$$\mathrm{E}(U) = 0, \qquad \mathrm{var}(U) = \frac{1}{\varphi V(\mu)}, \qquad -\mathrm{E}\left( \frac{\partial U}{\partial \mu} \right) = \frac{1}{\varphi V(\mu)}. \tag{2.7}$$
Then the quasi-likelihood function is given by the integral
$$Q(\mu; y) = \int_{y}^{\mu} \frac{y - t}{\varphi V(t)} \, dt,$$
if it exists. Since $U$ has the properties (2.7), the function $Q(\mu; y)$ should behave like a log-likelihood function for $\mu$. The quasi-likelihood for all components of $Y$ is obtained by summing the respective elements,
$$Q(\mu; y) = \sum_{i=1}^{n} Q_i(\mu_i; y_i),$$
which holds because of the assumed independence. Denoting by $l$ a log-likelihood function, it can be proved that
$$-\mathrm{E}\left( \frac{\partial U}{\partial \mu} \right) \le -\mathrm{E}\left( \frac{\partial^2 l}{\partial \mu^2} \right), \tag{2.8}$$
which follows from the properties of $U$ given by (2.7) and the Cramér-Rao inequality, see Wedderburn (1974). Moreover, Wedderburn (1974) proved that the log-likelihood and the quasi-likelihood coincide if and only if the distribution of $Y$ belongs to a one-parameter exponential family; in that case, specifying only the mean-variance relationship is equivalent to specifying the distribution. For further details of parameter estimation using this method see McCullagh and Nelder (1989). The dispersion parameter $\varphi$ does not influence the estimation of the parameter vector $\beta$; its moment estimator $\hat\varphi$ is obtained separately by dividing either the model deviance or the Pearson $\chi^2$ statistic by the difference between the number of observations and the number of model parameters.

Measuring the goodness of fit

When the maximum likelihood estimators of the model parameters $\beta$ are obtained, the discrepancy of such a model comes into focus. In measuring the deviance of the fitted model, the saturated model is used as a baseline. The maximized log-likelihood function of the full model, with the maximum possible number of parameters, is denoted by $l(\tilde\theta, \varphi; y)$, while $l(\hat\theta, \varphi; y)$ denotes the maximized log-likelihood of the fitted model; in both we assume the dispersion parameter $\varphi$ to be fixed. The deviance of the investigated model is then given by
$$D = 2\left[ l(\tilde\theta, \varphi; y) - l(\hat\theta, \varphi; y) \right], \tag{2.9}$$

i.e. it is twice the difference between the maximum achievable log-likelihood and the maximum log-likelihood obtained by the fitted model. As derived in Dobson (2002), the sampling distribution of the deviance of the fitted model is $\chi^2(m - p, \upsilon)$, where $m - p$ is the difference between the numbers of parameters of the full and the fitted model and $\upsilon$ is the non-centrality parameter. For normally distributed response variables $Y_i$, $\forall i$, $D$ has exactly a $\chi^2$ distribution; for other distributions this holds only approximately.

Another important statistic for measuring the discrepancy of a fit is the generalized Pearson $\chi^2$ statistic, defined by
$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{V(\hat\mu_i)},$$
where $V(\hat\mu_i)$ is the estimated variance function for the distribution concerned. Again, the Pearson $\chi^2$ statistic has an exact $\chi^2$ distribution for normal linear models; however, this does not hold in general for response variables with other distributions.

The adequacy of a fitted model can also be judged by checking its residuals. In GLMs the definitions of residuals are extended so that they apply to distributions other than the normal. From the residuals listed in McCullagh and Nelder (1989) we pick, as examples, the Pearson residuals, defined as
$$r_i^{(P)} = \frac{y_i - \hat\mu_i}{\sqrt{V(\hat\mu_i)}},$$
and the deviance residuals, given by
$$r_i^{(D)} = \mathrm{sgn}(y_i - \hat\mu_i) \sqrt{d_i}, \tag{2.10}$$
where $d_i = 2\{ y_i(\tilde\theta_i - \hat\theta_i) - [b(\tilde\theta_i) - b(\hat\theta_i)] \}$, and thus $D = \frac{1}{a(\hat\varphi)} \sum_{i=1}^{n} \left( r_i^{(D)} \right)^2$. In GLMs, these tools can be used to check the choice of the link function, the variance function and the terms in the linear predictor.

Testing the significance of explanatory variables

In model fitting it is very important to test the significance of the included explanatory variables, since models give more accurate estimates when they are not overparameterized. This holds especially for small data samples, which are common in non-life insurance. In GLMs three types of tests are used.
In the following we denote by $\hat\beta$ the parameter estimator under the model with $p + k \le n$, $k \ge 1$, parameters and by $\tilde\beta$ the estimator under a sub-model with $p$ parameters, in which the restrictions $C\beta = r$ hold. In this case the matrix $C$ has $k$ rows. To simplify the notation, we write the likelihood and log-likelihood functions here as functions of $\beta$, $L(\beta)$ and $l(\beta)$ respectively.

Likelihood ratio test

The likelihood ratio (LR) test is based on a comparison of the maximized (log-)likelihood functions under the model and its respective sub-model. To perform the LR test, both $\hat\beta$ and $\tilde\beta$ are required. The test statistic is given by
$$\lambda_n = 2 \log \left[ \frac{L(\hat\beta)}{L(\tilde\beta)} \right] = 2 \left[ l(\hat\beta) - l(\tilde\beta) \right].$$
The LR test statistic is always non-negative and has the $\chi^2_k$ distribution under the null hypothesis that the restricted model is valid. The hypothesis is rejected when large values of $\lambda_n$ are observed. The distribution of the LR statistic also involves the parameter $\varphi$, which therefore has to be estimated. However, the $\chi^2$ distribution is still applicable if the estimator $\hat\varphi$ of $\varphi$ is consistent. The LR test is related to the deviance, since the LR statistic can be rewritten as a difference of deviances when the same estimator of $\varphi$ is used in both log-likelihoods.

Wald test

For the Wald test only $\hat\beta$ is needed. The test statistic takes the form
$$W_n = \frac{1}{a(\varphi)} (C\hat\beta - r)^T \left[ C \left( X^T W X \right)^{-1} C^T \right]^{-1} (C\hat\beta - r)$$
and under the null hypothesis it has the $\chi^2_k$ distribution; this holds only approximately when the weight matrix $W$ is replaced by its estimator. Large values of $W_n$ indicate rejection of the null hypothesis $C\beta = r$.

Score test

Unlike the Wald test, the score test requires the estimate $\tilde\beta$. The score test statistic uses the derivative of the log-likelihood function, i.e. the score statistic $U(\tilde\beta)$, together with its variance, and is given by
$$R_n = a(\varphi) \, U^T(\tilde\beta) \left( X^T W X \right)^{-1} U(\tilde\beta).$$
The test statistic has approximately the $\chi^2_k$ distribution under the null hypothesis, and the approximation is less accurate when estimators of $W$ and $\varphi$ are used. The null hypothesis is rejected when $R_n$ takes values from the upper tail of the $\chi^2_k$ distribution.

Note that the $\chi^2$ distributions of the foregoing test statistics hold only asymptotically, and therefore the tests are valid only in large samples.
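The three tests can be illustrated on a small Poisson log-linear model, for which $a(\varphi) = 1$ and, with the canonical log link, $w(\mu_i) = \mu_i$. The following Python sketch assumes NumPy is available; the data, the two-group design and the helper names `iwls_poisson` and `loglik` are hypothetical. It fits the full model and the sub-model by the IWLS procedure of Section 2.3 and computes $\lambda_n$, $W_n$ and $R_n$ for the restriction $C\beta = r$ with $C = (0, 1)$ and $r = 0$.

```python
import numpy as np


def iwls_poisson(X, y, tol=1e-10, max_iter=100):
    """Poisson GLM with log link fitted by iterative weighted least squares.

    For the log link d(eta)/d(mu) = 1/mu and V(mu) = mu, hence the weights are
    w_i = mu_i and the adjusted response is z_i = eta_i + (y_i - mu_i) / mu_i.
    """
    mu = y + 0.5                          # step 1: initial estimate mu^(0) > 0
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu           # step 2: adjusted response ...
        XtW = X.T * mu                    # ... and X^T W with W = diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)   # step 3: WLS step
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:                     # step 4: repeat until convergence
            break
    return beta, mu


def loglik(y, mu):
    """Poisson log-likelihood without the constant term -sum(log y_i!)."""
    return float(np.sum(y * np.log(mu) - mu))


# Hypothetical data: counts in two groups, H0: group effect beta_1 = 0
y = np.array([2, 3, 1, 4, 5, 7, 6, 8], dtype=float)
X = np.column_stack([np.ones(8), np.repeat([0.0, 1.0], 4)])
C, r = np.array([[0.0, 1.0]]), np.array([0.0])

beta_hat, mu_hat = iwls_poisson(X, y)        # full model (p + k = 2)
_, mu_tilde = iwls_poisson(X[:, :1], y)      # sub-model under C beta = r (p = 1)

# Likelihood ratio statistic lambda_n (a(phi) = 1 for the Poisson distribution)
lr = 2.0 * (loglik(y, mu_hat) - loglik(y, mu_tilde))

# Wald statistic: only beta_hat is needed; W evaluated at mu_hat
info_hat = (X.T * mu_hat) @ X
d = C @ beta_hat - r
wald = float(d @ np.linalg.solve(C @ np.linalg.inv(info_hat) @ C.T, d))

# Score statistic: U(beta_tilde) = X^T (y - mu_tilde) for the canonical link
U = X.T @ (y - mu_tilde)
info_tilde = (X.T * mu_tilde) @ X
score = float(U @ np.linalg.solve(info_tilde, U))

print(f"LR = {lr:.3f}, Wald = {wald:.3f}, score = {score:.3f}")
```

For these data the three statistics are approximately 7.37 (LR), 6.59 (Wald) and 7.11 (score), all above 3.84, the 95% quantile of $\chi^2_1$. With only eight observations, however, the asymptotic $\chi^2$ approximation noted above should be treated with caution.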

3. Generalized linear models in claims reserving

The aim of this chapter is to introduce the application of GLMs to the claims reserving problem. As previously noted, GLMs provide a wide range of models combining distributions from the exponential family with possible link functions. In the following text we implement these combinations in order to estimate the future liabilities of an insurance company using development triangles. When selecting a particular distribution, it is important to distinguish whether the dependent variables denote claims counts or claims amounts, which divides the modelling into two parts.

3.1 Modelling of claims counts

First we consider the modelling of count data. For the purpose of this section we use the symbols $X_{i,j}$ and $C_{i,j}$, $0 \le i, j \le I = J$, to denote the incremental and cumulative claims counts for accident year $i$ and development year $j$, respectively. In this case the Poisson and negative binomial distributions are commonly used, but they can also be applied when modelling claims amounts, as in England and Verrall (2002).

Over-dispersed Poisson model

The Poisson distribution is a well-known choice for modelling count data. However, in practice it is limited by the equality of its mean and variance. This can be overcome by defining the over-dispersed Poisson (ODP) model, where the variance is not equal, but proportional to the mean. This model yields the same reserve estimates as the CL method, as proved in Mack (1991). The dispersion parameter is unknown and has to be estimated from the given data. The probability function of the Poisson distribution is
$$f(y) = \frac{\mu^y}{y!} e^{-\mu}, \quad y = 0, 1, 2, \ldots, \quad \mu > 0.$$
Here we present its over-dispersed extension in the GLM context. The model for an incremental development triangle is defined by the following assumptions.

Model assumptions 3.1.
Incremental claims counts $X_{i,j}$, for $0 \le i, j \le I$, are independent over-dispersed Poisson random variables with
$$\mathrm{E}(X_{i,j}) = \mu_{i,j} \quad \text{and} \quad \mathrm{var}(X_{i,j}) = \varphi \mu_{i,j}. \tag{3.1}$$
The linear predictor $\eta_{i,j}$ is related to the mean $\mu_{i,j}$ by a logarithmic link function
$$\eta_{i,j} = \log(\mu_{i,j}) = c + \alpha_i + \gamma_j, \tag{3.2}$$

for all $i, j$, where $\alpha_0 = \gamma_0 = 0$. The parameters $\alpha_i$ and $\gamma_j$ denote the effects of the $i$-th accident year and the $j$-th development year, respectively, on the expected value (and variance) of the incremental claims counts $X_{i,j}$. The corner restrictions on $\alpha_0$ and $\gamma_0$ are necessary; otherwise the model would be over-parameterized and unique parameter estimators could not be computed. The logarithmic link function is the canonical link for the Poisson distribution, and its use is very common in practice, since the mean has a multiplicative form and for estimation purposes it is more convenient to work with a linear structure. Using equation (3.2), we can rewrite the linear predictor $\eta_{i,j}$ in terms of a design vector and a parameter vector as $\eta_{i,j} = \Gamma_{i,j}^T \beta$, where
$$\Gamma_{i,j} = (1, \delta_{1,i}, \ldots, \delta_{I,i}, \delta_{1,j}, \ldots, \delta_{I,j})^T, \tag{3.3}$$
with $\delta_{k,l}$ denoting the Kronecker delta. The parameter vector $\beta$ then has the form
$$\beta = (c, \alpha_1, \ldots, \alpha_I, \gamma_1, \ldots, \gamma_I)^T. \tag{3.4}$$
With standard mathematical software packages, the estimate of $\beta$ is easily obtained and is used to determine the estimates $\hat\mu_{i,j}$ of the means of the claims counts $X_{i,j}$. The estimated values are then used to compute an estimator of the dispersion parameter $\varphi$, by dividing either the Pearson $\chi^2$ statistic or the model deviance by the difference between the number of observations and the number of model parameters. Note that here we count as parameters only the elements of the vector $\beta$. The Pearson $\chi^2$ statistic yields the formula
$$\hat\varphi = \frac{1}{n - p} \sum_{i=0}^{I} \sum_{j=0}^{I-i} \frac{(X_{i,j} - \hat\mu_{i,j})^2}{V(\hat\mu_{i,j})}, \tag{3.5}$$
with $V(\hat\mu_{i,j}) = \hat\mu_{i,j}$ for the ODP model. The denominator $n - p$ equals $I(I-1)/2$, where $n = (I+1)(I+2)/2$ is the number of observations and $p = 2I + 1$ is the number of parameters. Once the model is fitted, one can use the model deviance to measure the goodness of fit.
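Because the ODP fit is a Poisson log-linear GLM with the design vectors (3.3), it can be reproduced in a few lines. The Python sketch below assumes NumPy is available; the triangle is hypothetical and `gamma_vec` and the other names are illustrative only. It builds $\Gamma_{i,j}$, fits $\beta$ by IWLS, estimates $\varphi$ by (3.5), evaluates the scaled ODP deviance, and checks numerically that the predicted lower triangle reproduces the CL reserve, in line with the equivalence with the CL method stated at the beginning of this section.

```python
import numpy as np

# Hypothetical incremental claims-count triangle X[i][j] for i + j <= I, I = 3
tri = [[100, 60, 30, 10],
       [110, 70, 35],
       [120, 65],
       [130]]
I = len(tri) - 1


def gamma_vec(i, j):
    """Design vector (3.3): (1, delta_{1,i},...,delta_{I,i}, delta_{1,j},...,delta_{I,j})."""
    return ([1.0] + [float(k == i) for k in range(1, I + 1)]
            + [float(k == j) for k in range(1, I + 1)])


cells = [(i, j) for i in range(I + 1) for j in range(I + 1 - i)]
X = np.array([gamma_vec(i, j) for i, j in cells])
y = np.array([tri[i][j] for i, j in cells], dtype=float)

# IWLS for the Poisson log link: W = diag(mu), z = eta + (y - mu) / mu
mu, beta = y.copy(), np.zeros(X.shape[1])
for _ in range(100):
    z = np.log(mu) + (y - mu) / mu
    XtW = X.T * mu
    beta_new = np.linalg.solve(XtW @ X, XtW @ z)
    mu = np.exp(X @ beta_new)
    converged = np.max(np.abs(beta_new - beta)) < 1e-10
    beta = beta_new
    if converged:
        break

# Dispersion estimate (3.5): Pearson chi^2 divided by n - p = I(I-1)/2
n, p = len(y), X.shape[1]
phi_hat = np.sum((y - mu) ** 2 / mu) / (n - p)

# Scaled ODP deviance (all observed counts are positive here)
deviance = 2.0 / phi_hat * np.sum(y * np.log(y / mu) - (y - mu))

# ODP reserve: sum of predicted means in the lower triangle, i + j > I
future = [(i, j) for i in range(1, I + 1) for j in range(I + 1 - i, I + 1)]
odp_reserve = sum(np.exp(np.array(gamma_vec(i, j)) @ beta) for i, j in future)

# Chain-ladder reserve from volume-weighted development factors
cum = [np.cumsum(row) for row in tri]
f = [sum(cum[i][j] for i in range(I + 1 - j)) / sum(cum[i][j - 1] for i in range(I + 1 - j))
     for j in range(1, I + 1)]
cl_reserve = sum(cum[i][-1] * (np.prod(f[I - i:]) - 1.0) for i in range(1, I + 1))

print(f"phi_hat = {phi_hat:.3f}, D = {deviance:.3f}")
print(f"ODP reserve = {odp_reserve:.4f}, CL reserve = {cl_reserve:.4f}")
```

The corner restrictions $\alpha_0 = \gamma_0 = 0$ are built into `gamma_vec` by starting the Kronecker deltas at index 1, so the design matrix has full column rank $p = 2I + 1$.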
For the ODP distribution, the deviance is defined as
$$D = \frac{2}{\hat\varphi} \sum_{i=0}^{I} \sum_{j=0}^{I-i} \left[ X_{i,j} \log\left( \frac{X_{i,j}}{\hat\mu_{i,j}} \right) - (X_{i,j} - \hat\mu_{i,j}) \right].$$
The deviance asymptotically has the $\chi^2_{I(I-1)/2}$ distribution, and for the saturated model it equals 0. Values of the deviance from the upper tail of this distribution indicate a poor fit.

Alternatively, a model of claims counts with the mean and variance structure (3.1) can be defined in terms of quasi-likelihood. It has the form
$$Q(\mu; X) = \frac{1}{\varphi} \sum_{i=0}^{I} \sum_{j=0}^{I-i} \left[ X_{i,j} \log(\mu_{i,j}) - \mu_{i,j} + \kappa \right],$$

where $\kappa$ is a constant independent of $\beta$ and thus vanishes after differentiation with respect to the parameter vector. The estimate of $\beta$ is not affected by the dispersion parameter $\varphi$, which is then obtained via relation (3.5) with the quasi-likelihood estimates plugged in. As described in de Jong and Heller (2008), the quasi-likelihood estimators are equal to the estimators obtained by the maximum likelihood method for Poisson regression if the dispersion parameter $\varphi = 1$.

Over-dispersed negative binomial model

The negative binomial distribution can be obtained from the Poisson distribution by letting its parameter $\mu$ be gamma distributed. A derivation of this relation in the claims reserving context can be found in Verrall (2000). The probability function of $NB(\mu, r)$ is given by
$$f(y) = \frac{\Gamma(y + r)}{y! \, \Gamma(r)} \left( \frac{r}{r + \mu} \right)^r \left( \frac{\mu}{r + \mu} \right)^y, \quad y = 0, 1, 2, \ldots, \quad r, \mu > 0.$$
This distribution is of the exponential family type only when the number of failures $r$ is fixed. For modelling we use the logarithmic link function here, even though it is not the canonical link for the negative binomial distribution, because the resulting estimates are easier to interpret. The canonical link for $NB(\mu, r)$ is given by $g(\mu) = \log[\mu/(\mu + r)]$. The recursive model for over-dispersed negative binomial (ONB) distributed claims counts, either incremental or cumulative, is then defined as follows.

Model assumptions 3.2.

Cumulative claims counts $C_{i,j}$, for all $i$ and $j \ge 1$, are conditionally independent over-dispersed negative binomial random variables with the mean and variance given by
$$\mathrm{E}(C_{i,j} \mid C_{i,j-1}) = \mu_{i,j} = \lambda_j C_{i,j-1} \quad \text{and} \quad \mathrm{var}(C_{i,j} \mid C_{i,j-1}) = \varphi \left( \mu_{i,j} + \frac{\mu_{i,j}^2}{r} \right). \tag{3.6}$$
The linear predictor $\eta_{i,j}$, for all $i$ and $j \ge 1$, is related to the mean $\mu_{i,j}$ through the logarithmic link function, i.e.
$$\eta_{i,j} = \log(\mu_{i,j}) = c + \gamma_{j-1} + \log(C_{i,j-1}), \tag{3.7}$$
where $\gamma_0 = 0$.

Note that the parameters of the ONB model relate only to the development years, whereas the parameters of the ODP model describe the effects of both the accident year and the development year. Here identity (3.7) for the linear predictor $\eta_{i,j}$ can be expressed as $\eta_{i,j} = \Gamma_j^T \beta + \log(C_{i,j-1})$, $j \ge 1$, where
$$\Gamma_j = (1, \delta_{1,j-1}, \ldots, \delta_{I-1,j-1})^T \tag{3.8}$$

and
$$\beta = (c, \gamma_1, \ldots, \gamma_{I-1})^T. \tag{3.9}$$
The logarithms of the observations from the development triangle are used as an offset in the recursive model. Sometimes it is more appropriate to specify the model in terms of the development factors $f_{i,j} = C_{i,j}/C_{i,j-1}$ and the respective weights $w_{i,j} = C_{i,j-1}$. In this case the mean and variance have the structure
$$\mathrm{E}(f_{i,j} \mid C_{i,j-1}) = \lambda_j \quad \text{and} \quad \mathrm{var}(f_{i,j} \mid C_{i,j-1}) = \varphi \left( \frac{\lambda_j}{w_{i,j}} + \frac{\lambda_j^2}{r} \right),$$
for all $i$ and $j \ge 1$. After obtaining the estimate of the parameter vector $\beta$, the estimator $\hat\varphi$ of the dispersion parameter is again computed via the Pearson $\chi^2$ statistic, as in the ODP model. Here the denominator in equation (3.5) is $n - p = I(I-1)/2$ and $V(\hat\mu_{i,j}) = \hat\mu_{i,j} + \hat\mu_{i,j}^2/r$. To examine the model fit, the deviance in the form
$$D = \frac{2}{\hat\varphi} \sum_{i=0}^{I-1} \sum_{j=1}^{I-i} \left[ X_{i,j} \log\left( \frac{X_{i,j}}{\hat\mu_{i,j}} \right) - (X_{i,j} + r) \log\left( \frac{X_{i,j} + r}{\hat\mu_{i,j} + r} \right) \right]$$
is used. As in the ODP model, the deviance is asymptotically $\chi^2_{I(I-1)/2}$ distributed, and large values indicate an unsuitable model. The quasi-likelihood function for response variables with the mean and variance defined by equations (3.6) is
$$Q(\mu; X) = \frac{1}{\varphi} \sum_{i=0}^{I-1} \sum_{j=1}^{I-i} \left[ X_{i,j} \log\left( \frac{\mu_{i,j}}{r + \mu_{i,j}} \right) + r \log\left( \frac{1}{r + \mu_{i,j}} \right) + \kappa \right].$$
Again, the constant $\kappa$ is independent of $\beta$ and, together with the dispersion parameter $\varphi$, does not influence the quasi-likelihood estimator $\hat\beta$. The estimator $\hat\varphi$ is obtained analogously to the maximum likelihood case for ONB distributed dependent variables. When the data are treated in the quasi-likelihood way, it no longer matters whether they are discrete or continuous, since no distributional assumption is made.

3.2 Modelling of claims amounts

In this section we consider development triangles consisting of claims amounts. The symbols $X_{i,j}$ are used for incremental data and $C_{i,j}$ for data in cumulative form, for all $i, j$. As stated before, the ODP and ONB models can both also be used for continuous data.
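Since the ONB model of Section 3.1 applies to cumulative counts and amounts alike, a minimal sketch of its estimation may be useful here. The Python code below assumes NumPy is available; the cumulative triangle, the fixed value of $r$ and the function name `onb_development_factors` are hypothetical, and treating $r$ as known is an assumption of the sketch. It fits the recursive model (3.6) to (3.9) by IWLS with the offset $\log(C_{i,j-1})$ and the weights $\mu/(1 + \mu/r)$ implied by the log link and $V(\mu) = \mu + \mu^2/r$, and returns $\hat\lambda_j = \exp(\hat c + \hat\gamma_{j-1})$.

```python
import numpy as np

# Hypothetical cumulative claims-count triangle C[i][j] for i + j <= I, I = 3
cum = [[100, 160, 190, 200],
       [110, 180, 215],
       [120, 185],
       [130]]
I = len(cum) - 1


def onb_development_factors(cum, r, tol=1e-10, max_iter=100):
    """Fit the recursive ONB model with log link and known r; return lambda_1..lambda_I."""
    rows, y, offset = [], [], []
    for j in range(1, I + 1):
        for i in range(I + 1 - j):
            # Design vector (3.8): Gamma_j = (1, delta_{1,j-1}, ..., delta_{I-1,j-1})
            rows.append([1.0] + [float(k == j - 1) for k in range(1, I)])
            y.append(cum[i][j])
            offset.append(np.log(cum[i][j - 1]))   # log C_{i,j-1} is an offset
    X, y, offset = np.array(rows), np.array(y, dtype=float), np.array(offset)

    mu, beta = y.copy(), np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = np.log(mu) - offset + (y - mu) / mu    # adjusted response minus offset
        w = mu / (1.0 + mu / r)                    # (dmu/deta)^2 / V(mu), V = mu + mu^2/r
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        mu = np.exp(X @ beta_new + offset)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    gammas = np.concatenate([[0.0], beta[1:]])     # corner restriction gamma_0 = 0
    return np.exp(beta[0] + gammas)                # lambda_j = exp(c + gamma_{j-1})


# Volume-weighted chain-ladder factors for comparison
cl = [sum(cum[i][j] for i in range(I + 1 - j)) / sum(cum[i][j - 1] for i in range(I + 1 - j))
      for j in range(1, I + 1)]
print(np.round(onb_development_factors(cum, r=1e12), 6), np.round(cl, 6))
print(np.round(onb_development_factors(cum, r=50.0), 6))
```

For very large $r$ the variance function approaches the Poisson one and the estimates approach the volume-weighted CL factors $\sum_i C_{i,j} / \sum_i C_{i,j-1}$. For moderate $r$, the factors $r/(r + \mu_{i,j})$ in the score equations give relatively less weight to cells with large $\mu_{i,j}$.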
However, when modelling claims amounts, negative values may be observed in practice, and in this respect the ODP and ONB models are limited. In such a case one can use the quasi-likelihood method described before. The distributions typically used for modelling continuous data in non-life insurance include the gamma and inverse Gaussian distributions. The normal
