Experience Rating in General Insurance by Credibility Estimation

Xian Zhou
Department of Applied Finance and Actuarial Studies
Macquarie University, Sydney, Australia

Abstract

This work presents a new credibility estimation of the probability distributions of risks under Bayes settings in a nonparametric framework, without any assumption on the prior distribution. The method is then applied to general insurance premium pricing, a procedure commonly known as experience rating, which uses the insured's past claim experience to calculate the premium under a given premium principle (also referred to as a risk measure). By estimating the entire probability distributions of losses, our method provides a unified nonparametric framework for experience rating under arbitrary premium principles. It retains the advantages of the well-known approaches introduced by Bühlmann and Ferguson while overcoming their drawbacks.

This is joint work with Prof. Xiaoqiang Cai at the Chinese University of Hong Kong, Dr. Limin Wen at Jiangxi Normal University, and Prof. Xianyi Wu at East China Normal University.

1 Introduction

Pricing or measuring risks is one of the central tasks of financial enterprises and regulators in risk management. Numerous risk measures have been proposed for this purpose, such as value-at-risk, conditional value-at-risk, coherent risk measures, and distortion risk measures. In compensating insured losses in property, business, employment, health, etc., risks are priced by a premium principle, which can be regarded as the translation of a risk measure into the general insurance context. In this talk, the terms risk measure and premium principle are used interchangeably to indicate a rule $H(X)$ that assigns a premium to a risk $X$ as a functional of its distribution function.

In practice, however, determining an appropriate premium principle is only the first step in risk pricing: the distributions involved are not known, so $H(X)$ can only be estimated from the insured's claim experience with effective statistical methods. These procedures form the so-called experience rating.

We are concerned with solutions to this problem under the Bayesian setting: the distribution of a risk $X$ is identified by an unknown and unobservable parameter (vector) $\theta$, via $S(x, \theta) = \Pr(X > x \mid \theta)$.

The unobservable parameter $\theta$ is modeled by a random vector. Its distribution $\pi(\theta)$ is referred to as the prior distribution in statistics or the structural function in actuarial science.

Most of the literature has focused on a fully specified $S(x, \theta)$ given $\theta$ and a known (or assumed) $\pi(\theta)$. The problem can then be solved with the standard parametric Bayesian methodology through posterior updating. Although applications of the parametric Bayesian methodology have been extensively investigated for general insurance, in reality $\pi(\theta)$ is rarely known, which makes these applications impractical.

To deal with such situations, one approach is to allow some unknown parameters in $\pi(\theta)$ while maintaining assumed mathematical forms of $S(x, \theta)$ and $\pi(\theta)$. Risk pricing can then be carried out by empirical Bayes analysis.

In many insurance practices, however, the mathematical forms of $S(x, \theta)$ and $\pi(\theta)$ are difficult to specify or unavailable. Under such circumstances, the analysis can only be done on a distribution-free or nonparametric basis. The best-known solution in the actuarial community is probably Bühlmann's approach to the net premium principle (known as credibility theory).

Under the net premium principle, the credibility premium of a future claim $X_{n+1}$ can be expressed as a weighted average of the empirical individual mean (the information delivered by the historical data of the risk itself) and the collective mean (the aggregated information obtained from all possible insureds):

$$\hat H_{n+1} = z \bar X + (1 - z)\mu_0, \tag{1.1}$$

where $\bar X = n^{-1}\sum_{i=1}^{n} X_i$ is the empirical mean of the claims data $\{X_1, X_2, \ldots, X_n\}$, $\mu_0 = E[X]$ is known as the collective mean of the risk, and $z$ is the credibility factor.

This (linear) weighted average has been so well received that it is now almost regarded as a synonym of credibility theory. However, it is not easy to transplant Bühlmann's method directly to other premium principles. Consequently, most contributions in this area have been devoted to the net premium principle.

Another important stream of solutions to this problem, originating from Ferguson (1973) [1], is Bayesian nonparametrics. Consider $S(x, \theta)$ as a Dirichlet process with time $x$. Its distribution is determined by a finite Borel measure $\alpha(\cdot)$ on the real line $\mathbb{R}$. After losses $X_1, \ldots, X_n$ are observed, the posterior distribution of $\theta$ is also a Dirichlet process, but with $\alpha$ updated to $\alpha + \sum_{i=1}^{n} \delta_{X_i}$, where $\delta_{X_i}$ is the probability measure degenerate at $X_i$.

[1] Ferguson, T. A Bayesian analysis of some non-parametric problems. Annals of Statistics, 1, 209-230, 1973.
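The weighted average in (1.1) can be sketched in a few lines of Python. This is only an illustration: the slide does not spell out the credibility factor at this point, so the sketch assumes the classical Bühlmann form $z = n\tau^2/(\sigma^2 + n\tau^2)$ with $\sigma^2 = E_\theta[\mathrm{Var}(X \mid \theta)]$ and $\tau^2 = \mathrm{Var}_\theta(E[X \mid \theta])$. The function name and the numbers are hypothetical.

```python
def buhlmann_premium(claims, mu0, sigma2, tau2):
    """Return (premium, z) for the weighted average in (1.1).

    Assumes the classical factor z = n*tau2 / (sigma2 + n*tau2);
    mu0, sigma2 and tau2 are treated as known structure parameters.
    """
    n = len(claims)
    xbar = sum(claims) / n                      # empirical individual mean
    z = n * tau2 / (sigma2 + n * tau2)          # credibility factor
    return z * xbar + (1 - z) * mu0, z


# Hypothetical claim history and structure parameters.
premium, z = buhlmann_premium([120.0, 80.0, 100.0, 140.0],
                              mu0=90.0, sigma2=400.0, tau2=100.0)
print(premium, z)
```

With four claims and $\sigma^2/\tau^2 = 4$, the individual and collective means receive equal weight.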

Consequently, $S(x, \theta)$ can be estimated by the posterior mean

$$\hat S(x, \theta) = \frac{\alpha(\mathbb{R})}{\alpha(\mathbb{R}) + n} \cdot \frac{\alpha([x, \infty))}{\alpha(\mathbb{R})} + \frac{n}{\alpha(\mathbb{R}) + n}\, S_n(x), \tag{1.2}$$

where $S_n(x) = n^{-1}\sum_{i=1}^{n} I(X_i > x)$. Experience rating can then be carried out by inserting the estimated distribution of the loss $X$ into any premium principle.

While the approach of Ferguson (1973) provides a unified solution for all premium principles, it requires the assumption of a Dirichlet prior. This assumption is made mainly for mathematical convenience and is hardly justifiable in practice. Without the Dirichlet assumption, the estimator in (1.2) loses its justification, and the corresponding risk premium $\hat H(X_{n+1})$ is not credible when the true prior deviates far from the Dirichlet.

In this talk, we introduce a new distribution-free approach to experience rating under arbitrary premium principles without any assumption on the prior distribution. It has two direct advantages: (i) truly distribution-free settings, as in Bühlmann's credibility theory, and (ii) experience rating for arbitrary premium principles, as achieved by Ferguson's Bayesian nonparametric method.
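As a rough illustration of the posterior-mean estimate (1.2), the sketch below mixes a prior survival function with the empirical one. The names `total_mass` (standing for $\alpha(\mathbb{R})$) and `prior_survival` (standing for the normalized prior measure of the upper tail) are hypothetical, and the exponential prior shape is chosen purely for the example.

```python
import math


def ferguson_survival(x, claims, total_mass, prior_survival):
    """Posterior-mean survival estimate under a Dirichlet process prior.

    total_mass     -- alpha(R), the total mass of the base measure (assumed known)
    prior_survival -- normalized prior survival function, alpha of the upper
                      tail divided by alpha(R) (assumed known)
    """
    n = len(claims)
    s_n = sum(1 for c in claims if c > x) / n        # empirical survival S_n(x)
    w_prior = total_mass / (total_mass + n)          # weight on the prior guess
    return w_prior * prior_survival(x) + (1 - w_prior) * s_n


# Hypothetical data; the prior guess is a unit-rate exponential survival function.
claims = [0.5, 1.2, 3.0, 0.8]
estimate = ferguson_survival(1.0, claims, total_mass=4.0,
                             prior_survival=lambda t: math.exp(-t))
print(estimate)
```

The total mass $\alpha(\mathbb{R})$ acts like a prior sample size: here it equals the number of observed claims, so prior and data are weighted equally.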

The main idea is to first derive an estimate of $S(x, \theta)$ by minimizing the $L^2$-distance, and then to embed this estimate, say $\hat S(x, \theta)$, into the premium principle functional to obtain an estimate of the corresponding premium, i.e., $\hat H(X, \theta) = H(\hat S(x, \theta))$. Specifically, we consider the following two models.

Model I: The first two moments of $S(x, \theta)$ with respect to the prior can be specified. In this case, the data are $X_1, X_2, \ldots$, which are (conditionally) i.i.d. copies of $X$ given $\theta$. This model gives insight into, and motivates, the estimation methods for the more practically meaningful Model II.

Model II: This is a general model with totally unknown $S(x, \theta)$. Suppose that we have a portfolio of risks $X_i$, $i = 1, 2, \ldots, K$, where each $X_i$ is identified with a parameter (vector) $\theta_i$ and contributes a series of data $X_{i1}, X_{i2}, \ldots, X_{in_i}, \ldots$, which are i.i.d. copies of $X_i$ given $\theta_i$. The purpose is to estimate the distributions and then assign proper premiums to the future claims $X_{i,n_i+1}$, $i = 1, 2, \ldots, K$, under a given premium principle.

To overcome the restriction to Dirichlet priors, we develop a new approach that differs substantially from that of Ferguson (1973). It turns out, however, that when the prior distribution is Dirichlet, our estimator coincides with that of Ferguson (1973). This shows that our results generalize those of Ferguson (1973).

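2 The Linear Bayes Estimation of Distributions

We consider Model I in this section. Write

$$S_0(x) = E_\theta[S(x, \theta)], \qquad \tau_0^2(x) = \mathrm{Var}_\theta[S(x, \theta)] \tag{2.1}$$

and

$$\sigma_0^2(x) = E_\theta\big[\mathrm{Var}\big(I(X > x) \mid \theta\big)\big], \tag{2.2}$$

where $E_\theta$ and $\mathrm{Var}_\theta$ indicate that the expectations are computed with respect to the distribution of $\theta$. We also denote $\mu(\theta) = E[X \mid \theta]$, $\sigma^2 = E_\theta[\mathrm{Var}(X \mid \theta)]$ and $\tau^2 = \mathrm{Var}_\theta(E(X \mid \theta))$.

The performance measure to be optimized

As $S(x, \theta) = E[I(X > x) \mid \theta]$, a pointwise credibility estimator of $S(x, \theta)$ for a fixed $x$, following Bühlmann's classical credibility theory, is given by

$$\tilde S(x, \theta) = Z(x) S_n(x) + (1 - Z(x)) S_0(x), \tag{2.3}$$

where $S_n(x) = n^{-1}\sum_{i=1}^{n} I(X_i > x)$ is the empirical estimate of $S(x, \theta)$ based on the data $X_1, X_2, \ldots, X_n$, and

$$Z(x) = \frac{n\tau_0^2(x)}{\sigma_0^2(x) + n\tau_0^2(x)}$$

is the credibility factor. Since $I(X > x)$ given $\theta$ is a Bernoulli variable with success probability $S(x, \theta)$, its conditional variance is $S(x, \theta)(1 - S(x, \theta))$, and (2.1)-(2.2) give

$$\sigma_0^2(x) = S_0(x) - E_\theta[S^2(x, \theta)], \qquad \tau_0^2(x) = E_\theta[S^2(x, \theta)] - S_0^2(x). \tag{2.4}$$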

The naive estimate in (2.3) can be expressed more precisely as

$$\tilde S(x, \theta) = \frac{n\big(E_\theta[S^2(x, \theta)] - S_0^2(x)\big) S_n(x) + \big(S_0(x) - E_\theta[S^2(x, \theta)]\big) S_0(x)}{n\big(E_\theta[S^2(x, \theta)] - S_0^2(x)\big) + \big(S_0(x) - E_\theta[S^2(x, \theta)]\big)}$$

$$= \sum_{i=1}^{n+1} \frac{(n - i + 1) E_\theta[S^2(x, \theta)] - (n - i) S_0^2(x) - S_0(x) E_\theta[S^2(x, \theta)]}{(n - 1) E_\theta[S^2(x, \theta)] - n S_0^2(x) + S_0(x)}\, I_{[X_{(i-1)}, X_{(i)})}(x), \tag{2.5}$$

where $-\infty = X_{(0)} < X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)} < X_{(n+1)} = \infty$ are the order statistics of the sample.

To use the naive credibility estimate (2.3) (or (2.5)), one would need to check the monotonicity of $\tilde S(x, \theta)$ in $x$, give $\tilde S(x, \theta)$ an intuitive interpretation, and compute a premium under $\tilde S(x, \theta)$. These tasks are not feasible, however, because the credibility factor $Z(x)$ depends on $x$.

A natural remedy is to take $Z(x) = Z$ independent of $x$. More specifically, we seek a function of the form

$$\beta_0(x) + \sum_{i=1}^{n} \lambda_i I(X_i > x), \tag{2.6}$$

which minimizes the square of the $L^2$-distance:

$$\min_{\beta_0(x), \lambda_1, \ldots, \lambda_n} E \int_{-\infty}^{+\infty} \Big(S(x, \theta) - \beta_0(x) - \sum_{i=1}^{n} \lambda_i I(X_i > x)\Big)^2 dx, \tag{2.7}$$

where $\beta_0(x)$ is a nonincreasing function independent of the claims history and $\lambda_1, \ldots, \lambda_n$ are real-valued decision variables.

Estimation of distribution functions

The following theorem gives an estimator of $S(x, \theta)$ by solving the optimization problem (2.7).

Theorem 2.1 The credibility estimator of $S(x, \theta)$, which minimizes (2.7), is given by

$$\hat S(x, \theta) = Z S_n(x) + (1 - Z) S_0(x), \tag{2.8}$$

where

$$Z = \frac{n\tau_0^2}{n\tau_0^2 + \sigma_0^2} \quad \text{with} \quad \tau_0^2 = \int_{-\infty}^{+\infty} \tau_0^2(x)\,dx \quad \text{and} \quad \sigma_0^2 = \int_{-\infty}^{+\infty} \sigma_0^2(x)\,dx. \tag{2.9}$$

Moreover, the integrated mean-squared loss of the optimal estimator given by (2.8) is

$$E \int_{-\infty}^{+\infty} \big(\hat S(x, \theta) - S(x, \theta)\big)^2 dx = \frac{\tau_0^2 \sigma_0^2}{\sigma_0^2 + n\tau_0^2}. \tag{2.10}$$

Note that Theorem 2.1 does not require the prior distribution to be specified. If the prior is not Dirichlet, then Ferguson's method cannot be applied, but the estimator defined in (2.8) and (2.9) remains valid. In this sense, our new estimator provides a more general solution to the estimation of distributions than that of Ferguson (1973).
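To make Theorem 2.1 concrete, here is a minimal sketch for one fully specified case: the exponential-Gamma model that the talk uses later in its performance evaluation, $S(x, \theta) = e^{-\theta x}$ with $\theta \sim \mathrm{Gamma}(\alpha, \beta)$. Two things are assumptions of this sketch, not statements from the talk: that $\beta$ is a rate parameter, and the closed forms $\sigma_0^2 = \beta / (2(\alpha - 1))$ and $\tau_0^2 = \beta / (2(\alpha - 1)(2\alpha - 1))$, which come from our own integration of (2.4) over $x > 0$ for $\alpha > 1$. Function names are hypothetical.

```python
def structure_params(alpha, beta):
    """Integrated tau_0^2 and sigma_0^2 of (2.9) for S(x, theta) = exp(-theta*x),
    theta ~ Gamma(shape alpha, rate beta), alpha > 1 (our own closed forms)."""
    sigma2 = beta / (2 * (alpha - 1))
    tau2 = beta / (2 * (alpha - 1) * (2 * alpha - 1))
    return tau2, sigma2


def linear_bayes_survival(x, claims, alpha, beta):
    """Estimator (2.8): Z * S_n(x) + (1 - Z) * S_0(x), with Z from (2.9)."""
    n = len(claims)
    tau2, sigma2 = structure_params(alpha, beta)
    z = n * tau2 / (n * tau2 + sigma2)
    s_n = sum(1 for c in claims if c > x) / n                 # empirical survival
    s_0 = (beta / (beta + x)) ** alpha if x > 0 else 1.0      # prior mean of S(x, theta)
    return z * s_n + (1 - z) * s_0, z


# Hypothetical claims history.
value, z = linear_bayes_survival(2.0, [0.5, 1.0, 2.5, 3.0, 0.2], alpha=3.0, beta=2.0)
print(value, z)
```

In this model the ratio $\sigma_0^2 / \tau_0^2$ equals $2\alpha - 1$, so the single factor $Z = n / (n + 2\alpha - 1)$ applies at every $x$, which is what keeps the estimate a proper survival function.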

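Properties of estimator

If $S(x, \theta)$ is a Dirichlet process with a finite Borel measure $\alpha(\cdot)$ on $\mathbb{R}$, then, for each $x$, $S(x, \theta) \sim \mathrm{Beta}(\alpha(x), \alpha(\mathbb{R}) - \alpha(x))$ with density

$$\frac{\Gamma(\alpha(\mathbb{R}))}{\Gamma(\alpha(x))\,\Gamma(\alpha(\mathbb{R}) - \alpha(x))}\, t^{\alpha(x) - 1} (1 - t)^{\alpha(\mathbb{R}) - \alpha(x) - 1},$$

where $\alpha(x) = \alpha((x, \infty))$. It is easy to show that in such a case $\sigma_0^2(x) = \alpha(\mathbb{R})\,\tau_0^2(x)$ for every $x$, so the credibility factor $Z$ in (2.9) becomes

$$Z = \frac{n\tau_0^2}{n\tau_0^2 + \sigma_0^2} = \frac{n}{\alpha(\mathbb{R}) + n}.$$

Thus the estimate in (2.8) reduces to Ferguson's estimate (1.2).

The estimator in (2.8) is obviously a survival distribution function. By (2.9),

$$0 \le Z \le 1, \qquad Z \to 1 \text{ as } n \to \infty \quad \text{and} \quad Z \to 0 \text{ as } n \to 0, \tag{2.11}$$

which allows the traditional credibility interpretation: the more data, the more credible the empirical distribution $S_n(x)$.

The estimator (2.8) is uniformly strongly consistent in the sense that

$$\lim_{n \to \infty} \sup_x \big|S(x, \theta) - \hat S(x, \theta)\big| = 0$$

almost surely (a.s.).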
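The Dirichlet special case can be checked numerically with the standard Beta moments. The sketch below integrates $\tau_0^2(x)$ and $\sigma_0^2(x)$ on a grid and evaluates $Z$ from (2.9); the weight it places on the empirical distribution should come out as $n / (\alpha(\mathbb{R}) + n)$. The base measure (total mass 3 with an exponential shape), the grid and the function name are illustrative assumptions.

```python
import math


def dirichlet_credibility_factor(n, total_mass, prior_survival, grid):
    """Evaluate Z of (2.9) when S(x, theta) ~ Beta(a, total_mass - a),
    with a = total_mass * prior_survival(x), using a midpoint rule on grid."""
    tau2 = sigma2 = 0.0
    for left, right in zip(grid, grid[1:]):
        mid = 0.5 * (left + right)
        a = total_mass * prior_survival(mid)      # alpha((x, inf))
        b = total_mass - a
        # Var of Beta(a, b): pointwise tau_0^2(x)
        tau2 += a * b / (total_mass ** 2 * (total_mass + 1)) * (right - left)
        # E[S(1 - S)] for Beta(a, b): pointwise sigma_0^2(x)
        sigma2 += a * b / (total_mass * (total_mass + 1)) * (right - left)
    return n * tau2 / (n * tau2 + sigma2)


grid = [0.01 * k for k in range(2001)]            # x from 0 to 20
z = dirichlet_credibility_factor(10, 3.0, lambda t: math.exp(-t), grid)
print(z, 10 / (3.0 + 10))
```

Because the two integrands differ only by the constant factor $\alpha(\mathbb{R})$, the result does not depend on the quadrature accuracy or on the shape of the base measure.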

3 The Empirical Bayes Estimation

We now consider Model II, in which both the conditional distribution $S(x, \theta)$ and the structure function $\pi(\theta)$ are completely unknown, including the structure parameters $S_0(x)$, $\tau_0^2$ and $\sigma_0^2$. We propose an empirical Bayes method to estimate $S_0(x)$, $\tau_0^2$ and $\sigma_0^2$ based on the claim experiences of a number of risks in the same portfolio.

Let $X_1, \ldots, X_K$ denote $K$ risks under observation. The distribution of each $X_i$ is characterized by a risk parameter $\theta_i$, and each risk contributes a sequence of claim experiences $\mathbf{X}_i = (X_{i1}, \ldots, X_{in_i})$ over $n_i$ time periods, subject to the following usual assumptions.

Assumption 3.1
(i) Conditional on $\theta_i$, the random variables $X_{ij}$ ($j = 1, 2, \ldots, n_i$) are i.i.d. with common unknown distribution and moments:

$$S(x, \theta_i) = \Pr(X_{ij} > x \mid \theta_i), \qquad \sigma^2(x, \theta_i) = \mathrm{Var}\big(I(X_{ij} > x) \mid \theta_i\big), \qquad i = 1, 2, \ldots, K,\ j = 1, 2, \ldots, n_i.$$

(ii) The random variables $\theta_1, \ldots, \theta_K$ are i.i.d. with a common prior distribution $\pi(\theta)$.

Write $S_0(x) = E_\theta[S(x, \theta_i)]$, $\tau_0^2(x) = \mathrm{Var}_\theta[S(x, \theta_i)]$ and $\sigma_0^2(x) = E_\theta[\sigma^2(x, \theta_i)]$.

Homogeneous estimation of the distribution

Write

$$S_i(x) = \frac{1}{n_i} \sum_{j=1}^{n_i} I(X_{ij} > x), \qquad i = 1, 2, \ldots, K. \tag{3.1}$$

To obtain the credibility estimator of $S(x, \theta_i)$, consider the class of distribution functions

$$\mathcal{L} = \Big\{ \sum_{s=1}^{K} \sum_{t=1}^{n_s} a_{st} I(X_{st} > x),\ a_{st} \in \mathbb{R},\ \sum_{s=1}^{K} \sum_{t=1}^{n_s} a_{st} = 1 \Big\}, \tag{3.2}$$

and solve the minimization problem

$$\min_{g \in \mathcal{L}} E\Big\{ \int_{-\infty}^{+\infty} [g - S(x, \theta_i)]^2\,dx \Big\}. \tag{3.3}$$

The solution is stated in the next theorem.

Theorem 3.1 The homogeneous credibility estimator of $S(x, \theta_i)$ as the solution to (3.3) is

$$\hat S^{\mathrm{hom}}(x, \theta_i) = Z_i S_i(x) + (1 - Z_i)\,\bar S_Z(x), \qquad i = 1, \ldots, K, \tag{3.4}$$

where

$$Z_i = \frac{n_i \tau_0^2}{\sigma_0^2 + n_i \tau_0^2} \quad \text{and} \quad \bar S_Z(x) = \frac{1}{\sum_{i=1}^{K} Z_i} \sum_{i=1}^{K} Z_i S_i(x). \tag{3.5}$$

We can also derive an inhomogeneous estimator of the form

$$\beta_0(x) + \sum_{s=1}^{K} \sum_{t=1}^{n_s} a_{st} I(X_{st} > x).$$

The result, however, is the same as that given by (2.8)-(2.9). This is a typical phenomenon in credibility theory.

Estimation of structure parameters

We now proceed to the estimation of the structure parameters $\tau_0^2$ and $\sigma_0^2$, which requires an estimate of $S_0(x)$. At this point, however, the credibility-weighted mean $\bar S_Z(x)$ of (3.5) is not a suitable estimate of $S_0(x)$, because the $Z_i$'s involve $\tau_0^2$ and $\sigma_0^2$ (cf. (3.5)). Moreover, if we use an estimator of the form $S_w(x) = \sum_{i=1}^{K} w_i S_i(x)$ obtained by minimizing $\int (S_w(x) - S_0(x))^2\,dx$, it leads to $S_w(x) = \bar S_Z(x)$. Therefore, we simply select

$$\bar S(x) = \frac{1}{N} \sum_{i=1}^{K} n_i S_i(x), \quad \text{where } N = \sum_{i=1}^{K} n_i, \tag{3.6}$$

which is unbiased and strongly consistent in the sense that $\bar S(x)$ converges to $S_0(x)$ almost surely as $K \to \infty$, under the condition

$$\sum_{K=1}^{\infty} \frac{n_K^2}{(N - K)^2} < \infty. \tag{3.7}$$

In view of the definitions of $\sigma_0^2(x)$ and $\tau_0^2(x)$, the parameters $\sigma_0^2$ and $\tau_0^2$ can be estimated using

$$SSE(x) = \sum_{i=1}^{K} \sum_{j=1}^{n_i} [I(X_{ij} > x) - S_i(x)]^2, \qquad SSA(x) = \sum_{i=1}^{K} n_i [S_i(x) - \bar S(x)]^2.$$

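The estimators can be formally defined as

$$\hat\sigma_0^2 = \frac{1}{N - K} \int SSE(x)\,dx \tag{3.8}$$

and

$$\hat\tau_0^2 = \frac{N}{N^2 - \sum_{i=1}^{K} n_i^2} \Big[ \int SSA(x)\,dx - \frac{K - 1}{N - K} \int SSE(x)\,dx \Big]. \tag{3.9}$$

By inserting the estimates of $\tau_0^2$ and $\sigma_0^2$ into (3.4) and (3.5), we get empirical Bayes estimators of the distribution $S(x, \theta_i)$ as

$$\hat S^{\mathrm{EB}}(x, \theta_i) = \hat Z_i S_i(x) + (1 - \hat Z_i)\,\hat S_0(x), \qquad i = 1, \ldots, K, \tag{3.10}$$

where

$$\hat Z_i = \frac{n_i \hat\tau_0^2}{\hat\sigma_0^2 + n_i \hat\tau_0^2}, \qquad i = 1, \ldots, K, \tag{3.11}$$

and

$$\hat S_0(x) = \frac{1}{\sum_{i=1}^{K} \hat Z_i} \sum_{i=1}^{K} \hat Z_i S_i(x). \tag{3.12}$$

The properties of the estimators in (3.8)-(3.9) are given in the following theorem.

Theorem 3.2 $\hat\sigma_0^2$ and $\hat\tau_0^2$ have the following properties.
(1) $\hat\sigma_0^2$ and $\hat\tau_0^2$ are unbiased estimators of $\sigma_0^2$ and $\tau_0^2$, respectively.
(2) Under condition (3.7), $\hat\sigma_0^2 \to \sigma_0^2$ and $\hat\tau_0^2 \to \tau_0^2$ almost surely as $K \to \infty$.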
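The estimation steps (3.6) and (3.8)-(3.12) can be sketched in plain Python as follows. Since every $S_i(x)$ is a step function that changes only at observed claims, the integrals of $SSE(x)$ and $SSA(x)$ reduce to finite sums over the intervals between consecutive pooled claim values. Two choices in the sketch are our own assumptions rather than part of the talk: a negative $\hat\tau_0^2$ is truncated at zero, and when all $\hat Z_i$ vanish the pooled mean (3.6) replaces (3.12). The function name and data are hypothetical.

```python
def empirical_bayes_survival(portfolio):
    """portfolio: list of claim lists, one per risk (K >= 2, N > K).

    Returns (sigma2_hat, tau2_hat, z_hat, estimate), where estimate(i, x)
    evaluates the empirical Bayes survival estimate (3.10) for risk i.
    """
    K = len(portfolio)
    sizes = [len(claims) for claims in portfolio]
    N = sum(sizes)

    def s_i(i, x):                                   # (3.1)
        return sum(1 for c in portfolio[i] if c > x) / sizes[i]

    points = sorted({c for claims in portfolio for c in claims})
    int_sse = int_ssa = 0.0
    for left, right in zip(points, points[1:]):      # all S_i are constant on [left, right)
        s = [s_i(i, left) for i in range(K)]
        s_bar = sum(n * v for n, v in zip(sizes, s)) / N                 # (3.6)
        # sum_j [I(X_ij > x) - S_i(x)]^2 equals n_i * S_i(x) * (1 - S_i(x))
        int_sse += sum(n * v * (1 - v) for n, v in zip(sizes, s)) * (right - left)
        int_ssa += sum(n * (v - s_bar) ** 2 for n, v in zip(sizes, s)) * (right - left)

    sigma2 = int_sse / (N - K)                                           # (3.8)
    tau2 = N / (N ** 2 - sum(n * n for n in sizes)) * (int_ssa - (K - 1) * sigma2)  # (3.9)
    tau2 = max(tau2, 0.0)                            # truncation: our assumption
    z = [n * tau2 / (sigma2 + n * tau2) if sigma2 + n * tau2 > 0 else 0.0
         for n in sizes]                                                 # (3.11)

    def estimate(i, x):
        if sum(z) > 0:
            s0 = sum(zk * s_i(k, x) for k, zk in enumerate(z)) / sum(z)  # (3.12)
        else:
            s0 = sum(n * s_i(k, x) for k, n in enumerate(sizes)) / N     # fallback: (3.6)
        return z[i] * s_i(i, x) + (1 - z[i]) * s0                        # (3.10)

    return sigma2, tau2, z, estimate


# Two hypothetical risks with clearly different claim levels.
sigma2, tau2, z, estimate = empirical_bayes_survival([[1.0, 2.0], [5.0, 6.0]])
print(sigma2, tau2, z, estimate(0, 3.0), estimate(1, 3.0))
```

In this toy portfolio the between-risk variation dominates, so each risk's own empirical distribution receives most of the weight.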

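Asymptotic optimality

For simplicity, we discuss the balanced case with $n_i \equiv n$, so that the $Z_i$ and the $\hat Z_i$ are all the same. Consequently, $\hat S_0(x) = \frac{1}{K} \sum_{i=1}^{K} S_i(x)$.

The asymptotic optimality needs the following additional condition: there is a constant $\delta > 0$ such that

$$\int_{-\infty}^{+\infty} \Big( \frac{\sigma_0^2(x)}{n} + \tau_0^2(x) \Big)^{\delta/(2+\delta)} dx < \infty. \tag{3.13}$$

Write

$$\hat Q^{\mathrm{EB}}(x, \theta_i) = \int_{-\infty}^{+\infty} \big(\hat S^{\mathrm{EB}}(x, \theta_i) - S(x, \theta_i)\big)^2 dx$$

and

$$\hat Q(x, \theta_i) = \int_{-\infty}^{+\infty} \big(\hat S(x, \theta_i) - S(x, \theta_i)\big)^2 dx,$$

where $\hat S^{\mathrm{EB}}(x, \theta_i)$ denotes the empirical Bayes estimator of (3.10)-(3.12). The asymptotic optimality is stated in the theorem below.

Theorem 3.3 Under conditions (3.7) and (3.13), the empirical Bayes estimators $\hat S^{\mathrm{EB}}(x, \theta_i)$ in (3.10)-(3.12) are asymptotically optimal in the sense that

$$\lim_{K \to \infty} \max_{1 \le i \le K} E\big[\hat Q^{\mathrm{EB}}(x, \theta_i) - \hat Q(x, \theta_i)\big] = 0, \tag{3.14}$$

where $\hat S(x, \theta_i)$ is the linear estimator defined by (2.8) and (2.9).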

4 Applications to Experience Ratemaking

Once the distributions have been estimated, the application to experience ratemaking is straightforward. For a given premium principle $H$, write $H(X \mid \theta)$ for the individual premium of a risk $X$ with distribution $S(x, \theta)$, also referred to as the risk premium of $X$. By replacing $S(x, \theta)$ with $\hat S(x, \theta)$ or with the empirical Bayes estimator $\hat S^{\mathrm{EB}}(x, \theta_i)$ of (3.10), an estimator $\hat H(X \mid \theta)$ of the individual premium $H(X \mid \theta)$ is obtained.

We consider the experience premium obtained by inserting $\hat S(x, \theta)$ for $S(x, \theta)$ and compare it with existing methods. The version using $\hat S^{\mathrm{EB}}(x, \theta_i)$ can be discussed in a similar way.

The consistency of $\hat H(X \mid \theta)$ is presented in the theorem below, which follows directly from Theorems 2.1 and 3.2.

Theorem 4.1 If the premium principle $H$ is continuous with respect to the $L^\infty$-norm of distribution functions, then $\hat H(X \mid \theta)$ is strongly consistent for $H(X \mid \theta)$.

According to Theorem 4.1, the strong consistency of $\hat H(X \mid \theta)$ for $H(X \mid \theta)$ is guaranteed as long as the premium principle $H$ satisfies the condition of the theorem.

This is in contrast to existing results in the literature, where either the credibility estimator is not generally (strongly) consistent, or the consistency needs to be proved separately in each case.

There are a number of well-known premium principles for which strong consistency of the experience ratemaking can be obtained directly or by checking the condition of Theorem 4.1, including

the net premium: $H(X \mid \theta) = E[X \mid \theta]$,

the variance premium: $H(X \mid \theta) = E[X \mid \theta] + \alpha\,\mathrm{Var}(X \mid \theta)$,

the modified variance premium: $H(X \mid \theta) = E[X \mid \theta] + \alpha\,\dfrac{\mathrm{Var}(X \mid \theta)}{E[X \mid \theta]}$,

the standard deviation premium: $H(X \mid \theta) = E[X \mid \theta] + \alpha\sqrt{\mathrm{Var}(X \mid \theta)}$,

the Esscher premium: $H(X \mid \theta) = \dfrac{E[X e^{hX} \mid \theta]}{E[e^{hX} \mid \theta]}$,

the conditional tail expectation premium: $H(X \mid \theta) = E[X \mid X > \alpha, \theta]$,

the exponential premium: $H(X \mid \theta) = \alpha^{-1} \log\big[E(e^{\alpha X} \mid \theta)\big]$,

and Kamps' premium: $H(X \mid \theta) = \dfrac{E[X(1 - e^{-\alpha X}) \mid \theta]}{E[(1 - e^{-\alpha X}) \mid \theta]}$.
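As a hedged illustration of how an estimated distribution feeds several of the premium principles listed above, the sketch below evaluates them for a discrete distribution given by atoms and probabilities. To keep the example exact, the collective distribution $S_0$ is assumed to be discrete as well, so that the credibility estimate $Z S_n + (1 - Z) S_0$ is again a finite set of weighted atoms. The function names, the loading parameters `alpha` and `h`, and the data are all hypothetical.

```python
import math


def credibility_mixture(claims, z, prior_atoms, prior_probs):
    """Atoms and probabilities of Z * (empirical dist.) + (1 - Z) * (discrete prior)."""
    n = len(claims)
    atoms = list(claims) + list(prior_atoms)
    probs = [z / n] * n + [(1 - z) * p for p in prior_probs]
    return atoms, probs


def premiums(atoms, probs, alpha=0.1, h=0.1):
    """Several premium principles evaluated on a discrete loss distribution."""
    mean = sum(p * x for x, p in zip(atoms, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(atoms, probs))
    tilt = sum(p * math.exp(h * x) for x, p in zip(atoms, probs))
    return {
        "net": mean,
        "variance": mean + alpha * var,
        "modified_variance": mean + alpha * var / mean,
        "standard_deviation": mean + alpha * math.sqrt(var),
        "esscher": sum(p * x * math.exp(h * x) for x, p in zip(atoms, probs)) / tilt,
        "exponential": math.log(sum(p * math.exp(alpha * x)
                                    for x, p in zip(atoms, probs))) / alpha,
    }


# Hypothetical inputs: two observed claims, Z = 0.5, prior concentrated at 2.
atoms, probs = credibility_mixture([1.0, 3.0], 0.5, [2.0], [1.0])
result = premiums(atoms, probs)
print(result)
```

Every principle is computed from the same estimated distribution, which is the sense in which the approach is unified across premium principles.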

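The following are two exceptions where the proof of consistency requires additional work and conditions.

1. Dutch premium principle:

$$H(X \mid \theta) = E[X \mid \theta] + \eta\,E\big[(X - \alpha E[X \mid \theta])_+ \mid \theta\big],$$

where $\alpha \ge 1$ and $0 < \eta \le 1$. Observe that

$$E_{S_n}\big[(X - \alpha E[X \mid \theta])_+\big] = \frac{1}{n} \sum_{i=1}^{n} (X_i - \alpha\hat\mu(\theta))_+.$$

The estimate of $H(X \mid \theta)$ is then given by

$$\hat H(X \mid \theta) = \hat\mu(\theta) + \eta \Big\{ \frac{Z}{n} \sum_{i=1}^{n} \big(X_i - \alpha\hat\mu(\theta)\big)_+ + (1 - Z) \int \big(x - \alpha\hat\mu(\theta)\big)_+\,dF_0(x) \Big\},$$

where $\hat\mu(\theta) = Z\bar X + (1 - Z)\mu_0$ and $F_0 = 1 - S_0$.

2. Distortion premium principle:

$$H(X \mid \theta) = \int_0^{\infty} g(S(x, \theta))\,dx - \int_{-\infty}^{0} g(1 - S(x, \theta))\,dx.$$

The premium estimate is

$$\hat H(X \mid \theta) = \int_0^{\infty} g\big(Z S_n(x) + (1 - Z) S_0(x)\big)\,dx - \int_{-\infty}^{0} g\big(Z F_n(x) + (1 - Z) F_0(x)\big)\,dx,$$

where $F_n = 1 - S_n$, $F_0 = 1 - S_0$, and the credibility factor $Z$ is given by (2.9). The consistency in this case can be proved under certain additional regularity conditions on $g$, such as the Lipschitz condition of g on [0, 1] if X has bounded support, or of g on [0, 1] if the support is unbounded.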
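For a nonnegative loss, the second integral in the distortion premium vanishes and the premium is the integral of $g(\hat S(x))$ over $[0, \infty)$. When the estimated distribution is discrete, the survival function is a step function and the integral is a finite sum. The sketch below assumes such a discrete estimate and uses $g(u) = \sqrt{u}$ purely as an example of a concave distortion; the function name and the data are hypothetical.

```python
import math


def distortion_premium(atoms, probs, g):
    """Integral of g(S(x)) over [0, inf) for a discrete nonnegative loss,
    where S(x) = P(X > x) is the step survival function of the atoms."""
    premium, prev, tail = 0.0, 0.0, 1.0
    for x, p in sorted(zip(atoms, probs)):
        premium += g(max(tail, 0.0)) * (x - prev)   # S(t) = tail for prev <= t < x
        tail -= p                                   # mass remaining strictly above x
        prev = x
    return premium


# Hypothetical credibility estimate: weights 0.25, 0.25 on two claims, 0.5 on a prior atom.
atoms, probs = [1.0, 3.0, 2.0], [0.25, 0.25, 0.5]
plain = distortion_premium(atoms, probs, lambda u: u)    # identity distortion
loaded = distortion_premium(atoms, probs, math.sqrt)     # concave distortion
print(plain, loaded)
```

With the identity distortion the result is the mean of the estimated distribution, and a concave $g$ with $g(u) \ge u$ adds a positive loading.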

5 Performance Evaluation

We evaluate the performance of the proposed approach by comparing it with some well-known representative methods in the literature.

Evaluation under the net premium principle

Under the net premium principle, the optimal premium is given by Bühlmann's credibility premium

$$\hat\mu(\theta)^c = Z^c \bar X_n + (1 - Z^c)\mu_0,$$

where the superscript $c$ stands for classical. The premium using our method is

$$\hat\mu(\theta) = \hat H(X \mid \theta) = Z\bar X + (1 - Z)\mu_0. \tag{5.1}$$

The expected quadratic loss of $\hat\mu(\theta)$ approaches that of $\hat\mu(\theta)^c$:

$$\lim_{n \to \infty} \frac{E[\hat\mu(\theta) - \mu(\theta)]^2}{E[\hat\mu(\theta)^c - \mu(\theta)]^2} = 1. \tag{5.2}$$

This shows that $\hat\mu(\theta)$ is asymptotically equivalent to the optimum.

Assume $S(x, \theta) = e^{-\theta x}$ and $\theta \sim \mathrm{Gamma}(\alpha, \beta)$. Then the expected squared losses of $\hat\mu(\theta)$, $\hat\mu(\theta)^c$ and $\mu_0$ (the collective premium) can be expressed as $\beta^2 V_{cu}(\alpha)$, $\beta^2 V_c(\alpha)$ and $\beta^2 V_{col}(\alpha)$, respectively. The curves of $V_{cu}(\alpha)$, $V_c(\alpha)$ and $V_{col}(\alpha)$ as functions of $\alpha$ for $n = 10$ are plotted in Figure 1 below, which shows that $\hat\mu(\theta)$ is close to the optimal $\hat\mu(\theta)^c$.
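The comparison behind (5.2) can be sketched for the exponential-Gamma example. The sketch rests on our own calculations, which are assumptions rather than formulas taken from the talk: with $\beta$ read as a rate parameter and $\alpha > 2$, $\sigma^2 = \beta^2 / ((\alpha - 1)(\alpha - 2))$ and $\tau^2 = \beta^2 / ((\alpha - 1)^2(\alpha - 2))$, the factor of (2.9) works out to $Z = n / (n + 2\alpha - 1)$, and a fixed weight $z$ has expected squared loss $z^2\sigma^2/n + (1 - z)^2\tau^2$. The function name is hypothetical, and the output is not claimed to reproduce Figure 1.

```python
def expected_losses(n, alpha, beta=1.0):
    """Expected squared losses of the proposed, classical and collective net
    premiums for S(x, theta) = exp(-theta*x), theta ~ Gamma(alpha, rate beta)."""
    sigma2 = beta ** 2 / ((alpha - 1) * (alpha - 2))       # E[Var(X | theta)]
    tau2 = beta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))    # Var(E[X | theta])
    z_classical = n * tau2 / (sigma2 + n * tau2)           # Buhlmann's factor
    z_proposed = n / (n + 2 * alpha - 1)                   # Z of (2.9), our derivation

    def loss(z):
        # E[(z * Xbar + (1 - z) * mu0 - mu(theta))^2] for a fixed weight z
        return z ** 2 * sigma2 / n + (1 - z) ** 2 * tau2

    return loss(z_proposed), loss(z_classical), loss(0.0)


v_cu, v_c, v_col = expected_losses(n=10, alpha=2.2)
print(v_cu, v_c, v_col)

large = expected_losses(n=10 ** 6, alpha=2.5)
print(large[0] / large[1])      # ratio in (5.2) for a large sample
```

For a small sample the proposed premium sits between the classical optimum and the collective premium, and the loss ratio tends to one as $n$ grows.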

Figure 1. The curves of $V_{cu}(\alpha)$, $V_c(\alpha)$ and $V_{col}(\alpha)$ as functions of $\alpha$ (horizontal axis: $\alpha$ from 2 to 2.3; legend: $V_{col}$, $V_{cu}$, $V_c$).

Evaluation under the variance premium principle

Consider the Poisson-exponential model, where $S(x, \theta)$ is Poisson with mean $\theta$ and $\pi(\theta)$ is exponential with rate $\lambda$. Bühlmann suggested a credibility premium formula that involves the fourth moments of $X_i$ given $\theta$. In comparison, we only need the first two moments of the risk distribution.

Let $V_{\&}$ denote the expected squared loss with $\& = B, cu, Bul, col$ corresponding to the Bayes, proposed, Bühlmann and collective premiums, respectively, under the variance premium principle. Figure 2 below shows the numerical results of $V_{\&}$ for $\lambda = 0.2$ to $1.0$ with sample size $n = 100$.

Figure 2. The curves of $V_B$, $V_{cu}$ and $V_{Bul}$ as functions of $\lambda$ with $n = 100$.

Evaluation under the Esscher premium principle

Consider the Poisson-Gamma model with $S(x, \theta) \sim \mathrm{Poisson}(\theta)$ and $\theta \sim \mathrm{Gamma}(\alpha, \beta)$, with $\alpha$ ranging from 2 to 8 and $\beta = 6$. Let $V_{\&}$ denote the expected squared loss with $\& = B, cu, col, P$ under the Esscher premium principle, where $P$ denotes the method introduced in Pan et al. (2008) [2]. Figure 3 below shows the numerical results of $V_{\&}$ for different values of $\alpha$ and sample size 100.

Figure 3. Curves of $V_{\&}$ for different values of $\alpha$ and sample size 100 (legend: $V_B$, $V_{cu}$, $V_P$, $V_{col}$).

[2] Pan, M., Wang, R. and Wu, X. On the consistency of credibility premiums regarding Esscher principle. Insurance: Mathematics and Economics, 42, 119-126, 2008.