LIQUN DIAO AND CHENGGUO WENG
Department of Statistics and Actuarial Science, University of Waterloo
Advances in Predictive Analytics Conference, Waterloo, Ontario
Dec 1, 2017
Overview

Regression Trees (statistical method) + Credibility Model (actuarial model)
Outline

1. Credibility Model
2. Benefit of Partitioning the Data Space
3. Regression Tree Credibility Model
4. Simulation Studies
5. An Application to US Medicare Data
6. Concluding Remarks
Credibility Model: Bühlmann-Straub Credibility Model

CREDIBILITY THEORY has become the paradigm for insurance experience rating and is widely used by actuaries: the Bühlmann model (1967, 1969) and the Bühlmann-Straub model (1970).

Consider a portfolio of $I$ risks, where each individual risk $i$ has $n_i$ years of claims experience, $i = 1, 2, \ldots, I$. Let $Y_{i,j}$ denote the claim ratio of individual risk $i$ in year $j$, and let $m_{i,j}$ be the associated volume measure, also known as the weight variable. Collect all the claims experience of individual risk $i$ into a vector $Y_i = (Y_{i,1}, \ldots, Y_{i,n_i})^T$. The profile of individual risk $i$ is characterized by $\theta_i$, the realization of a random element $\Theta_i$ (usually either a random variable or a random vector).
Assume that the following conditions are satisfied:

H01. Conditionally given $\Theta_i = \theta_i$, the claims $\{Y_{i,j} : j = 1, 2, \ldots, n_i\}$ are independent with
$$E[Y_{i,j} \mid \Theta_i = \theta_i] = \mu(\theta_i) \quad \text{and} \quad \mathrm{Var}[Y_{i,j} \mid \Theta_i = \theta_i] = \frac{\sigma^2(\theta_i)}{m_{i,j}}$$
for some unknown but deterministic functions $\mu(\cdot)$ and $\sigma^2(\cdot)$;

H02. The pairs $(\Theta_1, Y_1), \ldots, (\Theta_I, Y_I)$ are independent, and $\Theta_1, \ldots, \Theta_I$ are independent and identically distributed.

Define structural parameters $\sigma^2 = E[\sigma^2(\Theta_i)]$ and $\tau^2 = \mathrm{Var}[\mu(\Theta_i)]$ for risks within the collective $\mathcal{I} := \{1, 2, \ldots, I\}$, and denote
$$m_i = \sum_{j=1}^{n_i} m_{i,j}, \qquad m = \sum_{i=1}^{I} m_i, \qquad \bar{Y}_i = \sum_{j=1}^{n_i} \frac{m_{i,j}}{m_i} Y_{i,j}, \qquad \bar{Y} = \sum_{i=1}^{I} \frac{m_i}{m} \bar{Y}_i.$$

Let $\mu = E[Y_{i,j}]$ denote the collective net premium. We seek an estimator $\hat{\mu}(\Theta_i)$ of $\mu(\Theta_i)$, called the correct individual premium of the individual risk (the fair risk premium), that makes the quadratic loss $E\big[(\hat{\mu}(\Theta_i) - \mu(\Theta_i))^2\big]$ as small as possible.
Credibility Model: Inhomogeneous Credibility Premium

The (inhomogeneous) credibility premium for an individual risk is defined as the best premium predictor $P_i$ among the class
$$\Big\{ Q : Q = a_0 + \sum_{i=1}^{I} \sum_{j=1}^{n_i} a_{i,j} Y_{i,j}, \;\; a_0, a_{i,j} \in \mathbb{R} \Big\}$$
minimizing the quadratic loss $E\big[(Q_i - \mu(\Theta_i))^2\big]$, and its formula is given by
$$P_i = \alpha_i \bar{Y}_i + (1 - \alpha_i)\mu, \qquad (1)$$
where
$$\alpha_i = \frac{m_i}{m_i + \sigma^2/\tau^2} \qquad (2)$$
is called the credibility factor for individual risk $i$.
The corresponding minimum quadratic loss for $E\big[(Q_i - \mu(\Theta_i))^2\big]$ is given by
$$L_i = \frac{\sigma^2}{m_i + \sigma^2/\tau^2}. \qquad (3)$$
The minimum quadratic loss for $E\big[(Q_i - Y_{i,n+1})^2\big]$ is given by
$$L_i = \frac{\sigma^2}{m_i + \sigma^2/\tau^2} + \sigma^2. \qquad (4)$$
Credibility Model: Homogeneous Credibility Premium

The homogeneous credibility premium for an individual risk $i$ from the collective $\mathcal{I}$ is defined as the best premium predictor $P_i$ among the class
$$\Big\{ Q_i : Q_i = \sum_{i=1}^{I} \sum_{j=1}^{n_i} a_{i,j} Y_{i,j}, \;\; E[Q_i] = E[\mu(\Theta_i)], \;\; a_{i,j} \in \mathbb{R} \Big\}$$
minimizing the quadratic loss $E\big[(Q_i - \mu(\Theta_i))^2\big]$, and its formula is given by
$$P_i = \alpha_i \bar{Y}_i + (1 - \alpha_i)\bar{Y}, \qquad (5)$$
where
$$\alpha_i = \frac{m_i}{m_i + \sigma^2/\tau^2} \qquad (6)$$
is called the credibility factor for individual risk $i$.
The corresponding minimum quadratic loss for $E\big[(Q_i - \mu(\Theta_i))^2\big]$ is given by
$$L_i = \tau^2 (1 - \alpha_i)\Big(1 + \frac{1 - \alpha_i}{\alpha_\bullet}\Big), \qquad (7)$$
and the minimum quadratic loss for $E\big[(Q_i - Y_{i,n+1})^2\big]$ is given by
$$L_i = \tau^2 (1 - \alpha_i)\Big(1 + \frac{1 - \alpha_i}{\alpha_\bullet}\Big) + \sigma^2, \qquad (8)$$
where $\alpha_\bullet = \sum_{i=1}^{I} \alpha_i$.
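In practice the structural parameters $\sigma^2$ and $\tau^2$ are unknown and must be estimated before the credibility factor (6) can be computed. The following sketch computes homogeneous credibility premiums (5) for an unbalanced portfolio; the moment estimators used here are one common choice (not spelled out on these slides), and the function name and the non-negativity floor on $\hat{\tau}^2$ are ours:

```python
import numpy as np

def buhlmann_straub(Y, M):
    """Homogeneous Buhlmann-Straub credibility premiums.

    Y, M: lists of 1-D arrays; Y[i] holds the claim ratios Y_{i,j} of risk i
    and M[i] the matching volumes m_{i,j}.  Returns (premiums, alpha).
    """
    I = len(Y)
    m_i = np.array([Mi.sum() for Mi in M])              # m_i = sum_j m_{i,j}
    m = m_i.sum()
    Ybar_i = np.array([(Mi * Yi).sum() / Mi.sum() for Yi, Mi in zip(Y, M)])
    Ybar = (m_i * Ybar_i).sum() / m

    # within-risk estimate of sigma^2
    num = sum((Mi * (Yi - yb) ** 2).sum() for Yi, Mi, yb in zip(Y, M, Ybar_i))
    dof = sum(len(Yi) - 1 for Yi in Y)
    sigma2 = num / dof

    # between-risk estimate of tau^2 (one common moment estimator; floored at 0)
    tau2 = max(
        (m * ((m_i * (Ybar_i - Ybar) ** 2).sum() - (I - 1) * sigma2))
        / (m ** 2 - (m_i ** 2).sum()),
        0.0,
    )

    alpha = m_i / (m_i + sigma2 / tau2) if tau2 > 0 else np.zeros(I)
    premiums = alpha * Ybar_i + (1 - alpha) * Ybar      # formula (5)
    return premiums, alpha
```

Each premium shrinks the individual mean $\bar{Y}_i$ toward the collective mean $\bar{Y}$, with more weight on $\bar{Y}_i$ when the risk carries more volume.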
Benefit of Partitioning the Collective

Is it better if we partition the collective?
The answer is artificially YES, but not genuinely so.

Suppose $n_i = n$ and $m_{i,j} = 1$ for all $i = 1, \ldots, I$ and $j = 1, \ldots, n$, and consider the loss between the homogeneous premium and $\mu(\Theta_i)$.

Arbitrarily partition a given collective of individuals into two sub-collectives, and apply the credibility formula separately to each of the two resulting sub-collectives. Let $L_1$ and $L_2$ be the total credibility losses for the two sub-collectives, respectively, while $L$ denotes the credibility loss when premium prediction is applied to the whole collective without any partitioning. We formally proved
$$L_1 + L_2 \le L.$$

The above result relies on an artificial assumption: "we know the credibility factor $\alpha$ for each sub-collective." In practice the premium prediction is given by $\hat{P}_i = \hat{\alpha}_i \bar{Y}_i + (1 - \hat{\alpha}_i)\bar{Y}$, where $\hat{\alpha}_i$ is an estimator of $\alpha_i$, $i = 1, \ldots, I$.
Regression Tree Credibility Model: Overview of Regression Trees

CLASSIFICATION AND REGRESSION TREES (CART; Breiman, Friedman, Olshen and Stone, 1984) and RANDOM FORESTS (RF; Breiman, 2001) are the most popular single-tree and ensemble recursive partitioning methods, respectively.

Any tree-building process can be broadly described in three steps:
1. Choose a criterion for making splitting decisions;
2. Generate a corresponding sequence of candidate trees;
3. Select the best candidate tree.

It is common to allow each step to depend on a given loss function. The prevailing software implementation of CART is the R package rpart.
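The splitting step under the usual $L_2$ criterion can be illustrated with a minimal single-split search over one covariate; this numpy sketch is ours for illustration and is not the rpart implementation:

```python
import numpy as np

def best_split(x, y):
    """Best single split of y on a scalar covariate x under the L2 criterion:
    choose the threshold minimizing the within-node sum of squared errors."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_sse, best_thr = np.inf, None
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue                      # no valid threshold between ties
        left, right = ys[:k], ys[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_thr = sse, (xs[k - 1] + xs[k]) / 2
    return best_sse, best_thr
```

A full tree applies this search recursively over all covariates; the regression tree credibility model replaces the $L_2$ criterion with a credibility loss.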
Regression Tree Credibility Model: Covariate-Dependent Model Setup

Consider a portfolio of $I$ risks numbered $1, \ldots, I$. Let $Y_i = (Y_{i,1}, \ldots, Y_{i,n_i})^T$ be the vector of claim ratios, $m_i = (m_{i,1}, \ldots, m_{i,n_i})$ the corresponding weight vector, and $X_i = (X_{i,1}, \ldots, X_{i,p})^T$ the covariate vector associated with individual risk $i$, $i = 1, \ldots, I$. The risk profile of each individual risk $i$ is characterized by a scalar $\theta_i$, which is a realization of a random element $\Theta_i$. The following two conditions are further assumed:

H11. The triplets $(\Theta_1, Y_1, X_1), \ldots, (\Theta_I, Y_I, X_I)$ are independent;

H12. Conditionally given $\Theta_i = \theta_i$ and $X_i = x_i$, the entries $Y_{i,j}$, $j = 1, \ldots, n$, are independent with
$$E[Y_{i,j} \mid X_i = x_i, \Theta_i = \theta_i] = \mu(x_i, \theta_i) \quad \text{and} \quad \mathrm{Var}[Y_{i,j} \mid X_i = x_i, \Theta_i = \theta_i] = \frac{\sigma^2(x_i, \theta_i)}{m_{i,j}}$$
for some unknown but deterministic functions $\mu(\cdot, \cdot)$ and $\sigma^2(\cdot, \cdot)$.
We approximate $\mu(X_i, \Theta_i)$ and $\sigma^2(X_i, \Theta_i)/m_{i,j}$ by
$$\sum_{k=1}^{K} I\{X_i \in A_k\}\, \mu_{(k)}(\Theta_i) \quad \text{and} \quad \sum_{k=1}^{K} I\{X_i \in A_k\}\, \frac{\sigma^2_{(k)}(\Theta_i)}{m_{i,j}},$$
where $\{A_1, A_2, \ldots, A_K\}$ is a partition of the covariate space, and $\mu_{(k)}(\Theta_i)$ and $\sigma^2_{(k)}(\Theta_i)$ respectively represent the net premium and the variance of an individual risk $i$ from the $k$th sub-collective with risk profile $\Theta_i$, i.e.,
$$\mu_{(k)}(\theta_i) = E(Y_{i,j} \mid X_i \in A_k, \Theta_i = \theta_i) \quad \text{and} \quad \sigma^2_{(k)}(\theta_i) = \mathrm{Var}(Y_{i,j} \mid X_i \in A_k, \Theta_i = \theta_i).$$
The condition $X_i \in A_k$ means that individual risk $i$ is classified into the $k$th sub-collective based on its covariate information.
Regression Tree Credibility Premium

We aim to find a "good" partition $\{A_1, A_2, \ldots, A_K\}$ and apply the credibility formula to each sub-collective $A_k$ separately. By "good", we mean the true prediction error $E\big[(\mu(\Theta_i) - P_i)^2\big]$ should be minimized as much as possible.

Credibility Regression Trees:
- Adopt one of the four credibility loss functions with plugged-in estimates of the structural parameters;
- Invent a heuristic longitudinal cross-validation.

The credibility formula is then applied for premium prediction in each terminal node separately, which yields the Regression Tree Credibility Premium.
Simulation Studies

Consider a collective of $I = 300$ individual risks. For each $i = 1, \ldots, I$, independently simulate the covariate vector
$$X_i = (X_{i,1}, \ldots, X_{i,p}) \overset{\text{i.i.d.}}{\sim} U\{1, \ldots, 100\}^p,$$
with $p = 10$ and $p = 50$.

Balanced claims model: individual risks with $n = 5$, $10$, and $20$ years of claims experience, respectively. An unbalanced claims model is also explored. 1,000 independent samples.
Simulation Scheme 1: interaction effect

For each $i = 1, \ldots, I$, independently simulate $\varepsilon_{i,j}$ from a given distribution function $F(\cdot)$, for $j = 1, \ldots, n$, and define the $n$ claims of individual risk $i$ as
$$Y_{i,j} = e^{f(X_i)} + \varepsilon_{i,j}, \quad j = 1, \ldots, n, \qquad (9)$$
where
$$f(X_i) = 0.01\big(X_{i,1} + 2X_{i,2} - X_{i,3} + 2\sqrt{X_{i,1} X_{i,3}} - \sqrt{X_{i,2} X_{i,4}}\big). \qquad (10)$$
In our simulation, $F$ takes one of the following three distributions: EXP(1.6487), LN(0, 1), PAR(3, 3.2974).
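Scheme 1 can be sketched as follows, here for the exponential error choice and the balanced case $n = 5$; reading EXP(1.6487) as a rate parameter is our assumption, as is the seed:

```python
import numpy as np

rng = np.random.default_rng(2017)
I, p, n = 300, 10, 5

# covariates: i.i.d. uniform on the integers {1, ..., 100}
X = rng.integers(1, 101, size=(I, p)).astype(float)

# interaction-effect mean function f of equation (10)
f = 0.01 * (X[:, 0] + 2 * X[:, 1] - X[:, 2]
            + 2 * np.sqrt(X[:, 0] * X[:, 2]) - np.sqrt(X[:, 1] * X[:, 3]))

# claims (9): Y_ij = exp(f(X_i)) + eps_ij, eps ~ EXP with rate 1.6487
eps = rng.exponential(scale=1 / 1.6487, size=(I, n))
Y = np.exp(f)[:, None] + eps
```

Swapping the `eps` line for lognormal or Pareto draws gives the other two error distributions.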
Simulation Scheme 2: interaction effect + heterogeneous variance

For each $i = 1, \ldots, I$, independently simulate $\varepsilon_{i,j}$ from a distribution function $F(\cdot\,; X_i)$ which depends on the covariate vector $X_i$ of individual risk $i$, for $j = 1, \ldots, n$, and define the $n$ claims of individual risk $i$ as
$$Y_{i,j} = e^{f(X_i)} + \varepsilon_{i,j}, \quad j = 1, \ldots, n, \qquad (11)$$
where $f(X_i)$ is given by (10). We respectively consider three distinct distributions for $F(\cdot\,; X_i)$:
$$\text{(1) } \mathrm{EXP}\big(e^{\gamma(X_i)/2}\big), \quad \text{(2) } \mathrm{LN}(0, \gamma(X_i)), \quad \text{(3) } \mathrm{PAR}\big(3, 2e^{\gamma(X_i)/2}\big), \qquad (12)$$
where $\gamma(X_i) = \frac{1}{10^2}\big(2X_{i,1} - X_{i,2} + \sqrt{X_{i,1} X_{i,2}}\big)$.
Simulation Scheme 3: interaction effect + heterogeneous variance + multiplicative random effect

For each $i = 1, \ldots, I$: (1) independently simulate the random effect variable $\Theta_i$ from the uniform distribution $U(0.9, 1.1)$; (2) independently simulate $\varepsilon_{i,j}$ from a distribution function $F(\cdot\,; X_i)$ which depends on the covariates $X_i$ associated with risk $i$, for $j = 1, \ldots, n$; and (3) define the $n$ claims of individual risk $i$ as
$$Y_{i,j} = \Theta_i\big[e^{f(X_i)} + \varepsilon_{i,j}\big], \quad j = 1, \ldots, n, \qquad (13)$$
where $f(X_i)$ is given by (10). We consider each of the three distributions described in Scheme 2 for $F(\cdot\,; X_i)$.
Simulation Scheme 4: interaction effect + heterogeneous variance + complex random effect structure

For each $i = 1, \ldots, I$, let $\Theta_i = (\xi_{i,1}, \xi_{i,2})^T$: (1) independently simulate random effect variables $\xi_{i,1}$ and $\xi_{i,2}$ from the uniform distribution $U(0.9, 1.1)$; (2) independently simulate $\varepsilon_{i,j}$ from a distribution function $F(\cdot\,; X_i, \xi_{i,2})$ for $j = 1, \ldots, n$; and (3) define the $n$ claims of individual risk $i$ as
$$Y_{i,j} = e^{\xi_{i,1} f(X_i)} + \varepsilon_{i,j}, \quad j = 1, \ldots, n, \qquad (14)$$
where $f(X_i)$ is defined in (10). We respectively consider the three distributions defined in equation (12) of Scheme 2, with $\gamma(X_i)$ replaced by $\xi_{i,2}\,\gamma(X_i)$, so that the distribution of $\varepsilon_{i,j}$ depends on the random effect variable $\xi_{i,2}$ in addition to the covariate vector $X_i$.
Methods Compared

Data-driven covariate-dependent partitioning: for each simulated sample, we grow and prune regression trees built using the four credibility loss functions (R1-R4) and the $L_2$ loss function (RL2), and the best tree is selected using longitudinal cross-validation.

In addition, we consider ad hoc covariate-dependent partitioning, defined via the following notation:
$$R(X_j) = \big\{\{i \in \mathcal{I} : X_{i,j} \le 50\},\; \{i \in \mathcal{I} : X_{i,j} > 50\}\big\}, \quad j = 1, \ldots, 5,$$
and
$$R(X_{j_1}, X_{j_2}) = \big\{\{i \in \mathcal{I} : X_{i,j_1} \le 50, X_{i,j_2} \le 50\},\; \{i \in \mathcal{I} : X_{i,j_1} \le 50, X_{i,j_2} > 50\},\; \{i \in \mathcal{I} : X_{i,j_1} > 50, X_{i,j_2} \le 50\},\; \{i \in \mathcal{I} : X_{i,j_1} > 50, X_{i,j_2} > 50\}\big\}$$
for $j_1, j_2 = 1, \ldots, 5$ with $j_1 \ne j_2$, with the analogous definition for three and four covariates. We consider $R(X_2)$, $R(X_4)$, $R(X_1, X_2, X_3)$, $R(X_1, X_2, X_4)$, $R(X_2, X_3, X_4)$, $R(X_1, X_3, X_4)$, and $R(X_1, X_2, X_3, X_4)$.
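Each ad hoc partition can be formed by encoding the $\{\le 50, > 50\}$ indicator of every selected covariate into a cell label; a sketch (the helper name and the binary label encoding are ours):

```python
import numpy as np

def ad_hoc_partition(X, cols, cut=50):
    """Assign each risk to a cell of the ad hoc partition R(X_{j1}, ..., X_{jk}):
    all combinations of {<= cut, > cut} over the selected covariate columns."""
    labels = np.zeros(X.shape[0], dtype=int)
    for j in cols:
        # append this covariate's indicator as the next binary digit
        labels = 2 * labels + (X[:, j] > cut).astype(int)
    return labels
```

With $k$ selected covariates this yields up to $2^k$ sub-collectives, to each of which the credibility formula is applied separately.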
Evaluation Metric

Prediction error for a given partitioning $\{\mathcal{I}_1, \ldots, \mathcal{I}_K\}$:
$$\mathrm{PE} = \frac{1}{I} \sum_{i=1}^{I} \sum_{k=1}^{K} I\{X_i \in \mathcal{I}_k\}\big(\pi_i^{(H)(k)} - \mu(X_i, \Theta_i)\big)^2, \qquad (15)$$
where $\pi_i^{(H)(k)}$ is the resulting premium prediction and $\mu(X_i, \Theta_i)$ is the true net premium.

The collective prediction error:
$$\mathrm{PE}_0 = \frac{1}{I} \sum_{i=1}^{I} \big(P_i^{(H)} - \mu(X_i, \Theta_i)\big)^2,$$
which does not use any covariate information and is anticipated to underperform compared to the various kinds of covariate-dependent partitioning.

The relative prediction error (RPE): $R = \mathrm{PE}/\mathrm{PE}_0$.
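Given the partition-based premiums, the collective premiums, and the true net premiums of a simulated sample, the RPE reduces to a ratio of mean squared errors, since the indicator sum in (15) picks out exactly one premium per risk; a sketch (function name ours):

```python
import numpy as np

def relative_prediction_error(pred_partition, pred_collective, mu_true):
    """RPE = PE / PE_0: mean squared error of the partition-based premiums
    relative to that of the collective (no-covariate) premiums."""
    pe = np.mean((pred_partition - mu_true) ** 2)    # equation (15)
    pe0 = np.mean((pred_collective - mu_true) ** 2)  # collective PE_0
    return pe / pe0
```

Values below 1 indicate that the covariate-dependent partitioning improves on the collective premium.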
Simulation Results

[results tables/figures omitted]
Concluding Remarks
We propose a novel regression tree credibility (RTC) model, bringing machine learning techniques into the framework of credibility theory to enhance the prediction accuracy of credibility premiums.

In our proposed model, no ex ante analysis of the relationship between individual net premiums and covariates is necessary: the designed regression tree algorithm automatically selects influential covariates and informative cut points to form a partition of the data space, upon which a well-performing premium prediction rule can be established.

Our simulation studies and data analysis show that the proposed RTC model performs very well compared to no partitioning, ad hoc partitioning, and the $L_2$-loss-based binary partitioning procedure.
Although only Classification and Regression Trees are considered in this paper, it will be fruitful to pursue further research with other recursive partitioning methods, e.g., partDSA (partitioning deletion/substitution/addition algorithm) and MARS (multivariate adaptive regression splines).

It will be even more promising to consider applications of ensemble algorithms such as bagging, boosting, and random forests.

It will be useful to develop an algorithm that can accommodate time-dependent covariates.

It will be even more fruitful to consider applications to various other insurance problems beyond premium rating, since many practical insurance problems amount to quantifying the relationship between insureds' claims and their demographic information.

It is the authors' hope that the present paper will stimulate more actuarial applications of these machine learning techniques and eventually contribute to the development of insurance predictive analytics in general.