REGRESSION TREE CREDIBILITY MODEL


Liqun Diao and Chengguo Weng
Department of Statistics and Actuarial Science, University of Waterloo
Advances in Predictive Analytics Conference, Waterloo, Ontario, Dec 1, 2017

Overview: Regression Trees (statistical method) + Credibility Model (actuarial model)

Outline
1. Credibility Model
2. Benefit of Partitioning the Data Space
3. Regression Tree Credibility Model
4. Simulation Studies
5. An Application to US Medicare Data
6. Concluding Remarks

Credibility Model: Bühlmann-Straub Credibility Model

Credibility theory has become the paradigm for insurance experience rating and is widely used by actuaries; see the Bühlmann model (1967, 1969) and the Bühlmann-Straub model (1970).

Consider a portfolio of $I$ risks, where each individual risk $i$ has $n_i$ years of claim experience, $i = 1, 2, \ldots, I$. Let $Y_{i,j}$ denote the claim ratio of individual risk $i$ in year $j$, and let $m_{i,j}$ be the associated volume measure, also known as the weight variable. Collect all the claims experience of individual risk $i$ into a vector $Y_i = (Y_{i,1}, \ldots, Y_{i,n_i})^T$. The profile of individual risk $i$ is characterized by $\theta_i$, the realization of a random element $\Theta_i$ (usually either a random variable or a random vector).

Credibility Model: Bühlmann-Straub Credibility Model

Assume that the following conditions are satisfied:

H01. Conditionally given $\Theta_i = \theta_i$, the claims $\{Y_{i,j} : j = 1, 2, \ldots, n_i\}$ are independent with
$$E[Y_{i,j} \mid \Theta_i = \theta_i] = \mu(\theta_i) \quad \text{and} \quad \mathrm{Var}[Y_{i,j} \mid \Theta_i = \theta_i] = \frac{\sigma^2(\theta_i)}{m_{i,j}}$$
for some unknown but deterministic functions $\mu(\cdot)$ and $\sigma^2(\cdot)$;

H02. The pairs $(\Theta_1, Y_1), \ldots, (\Theta_I, Y_I)$ are independent, and $\Theta_1, \ldots, \Theta_I$ are independent and identically distributed.

Define the structural parameters $\sigma^2 = E[\sigma^2(\Theta_i)]$ and $\tau^2 = \mathrm{Var}[\mu(\Theta_i)]$ for risks within the collective $\mathcal{I} := \{1, 2, \ldots, I\}$, and denote
$$m_i = \sum_{j=1}^{n_i} m_{i,j}, \quad m = \sum_{i=1}^{I} m_i, \quad \bar{Y}_i = \sum_{j=1}^{n_i} \frac{m_{i,j}}{m_i} Y_{i,j}, \quad \bar{Y} = \sum_{i=1}^{I} \frac{m_i}{m} \bar{Y}_i.$$

Let $\mu = E[Y_{i,j}]$ denote the collective net premium. We are interested in an estimator $\hat{\mu}(\Theta_i)$ of $\mu(\Theta_i)$, called the correct individual premium (the fair risk premium) of individual risk $i$, that makes the quadratic loss $E[(\hat{\mu}(\Theta_i) - \mu(\Theta_i))^2]$ as small as possible.

Credibility Model: Inhomogeneous Credibility Premium

The (inhomogeneous) credibility premium for an individual risk $i$ is defined as the best premium predictor $P_i$ among the class
$$\Bigl\{ Q : Q = a_0 + \sum_{i=1}^{I} \sum_{j=1}^{n_i} a_{i,j} Y_{i,j}, \; a_0, a_{i,j} \in \mathbb{R} \Bigr\}$$
minimizing the quadratic loss $E[(Q - \mu(\Theta_i))^2]$. Its formula is given by
$$P_i = \alpha_i \bar{Y}_i + (1 - \alpha_i)\mu, \qquad (1)$$
where
$$\alpha_i = \frac{m_i}{m_i + \sigma^2/\tau^2} \qquad (2)$$
is called the credibility factor of individual risk $i$.

Credibility Model: Inhomogeneous Credibility Premium

The corresponding minimum quadratic loss for $E[(Q - \mu(\Theta_i))^2]$ is given by
$$L_i = \frac{\sigma^2}{m_i + \sigma^2/\tau^2}. \qquad (3)$$
The minimum quadratic loss for $E[(Q - Y_{i,n+1})^2]$ is given by
$$L_i = \frac{\sigma^2}{m_i + \sigma^2/\tau^2} + \sigma^2. \qquad (4)$$

Credibility Model: Homogeneous Credibility Premium

The homogeneous credibility premium for an individual risk $i$ from the collective $\mathcal{I}$ is defined as the best premium predictor $P_i$ among the class
$$\Bigl\{ Q_i : Q_i = \sum_{i=1}^{I} \sum_{j=1}^{n_i} a_{i,j} Y_{i,j}, \; E[Q_i] = E[\mu(\Theta_i)], \; a_{i,j} \in \mathbb{R} \Bigr\}$$
minimizing the quadratic loss $E[(Q_i - \mu(\Theta_i))^2]$. Its formula is given by
$$P_i = \alpha_i \bar{Y}_i + (1 - \alpha_i)\bar{Y}, \qquad (5)$$
where
$$\alpha_i = \frac{m_i}{m_i + \sigma^2/\tau^2} \qquad (6)$$
is called the credibility factor of individual risk $i$.

Credibility Model: Homogeneous Credibility Premium

The corresponding minimum quadratic loss for $E[(Q_i - \mu(\Theta_i))^2]$ is given by
$$L_i = \tau^2 (1 - \alpha_i)\left(1 + \frac{1 - \alpha_i}{\alpha_\bullet}\right), \qquad (7)$$
where $\alpha_\bullet = \sum_{i=1}^{I} \alpha_i$. The minimum quadratic loss for $E[(Q_i - Y_{i,n+1})^2]$ is given by
$$L_i = \tau^2 (1 - \alpha_i)\left(1 + \frac{1 - \alpha_i}{\alpha_\bullet}\right) + \sigma^2. \qquad (8)$$
A small numerical sketch of the homogeneous credibility premium follows.
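To make formulas (5)-(6) concrete, here is a minimal sketch (not from the slides) that computes the homogeneous credibility premiums with NumPy, assuming the structural parameters $\sigma^2$ and $\tau^2$ are supplied (e.g., as plug-in estimates); the function name, argument layout, and toy numbers are purely illustrative.

```python
import numpy as np

def homogeneous_credibility_premiums(Y, M, sigma2, tau2):
    """Bühlmann-Straub homogeneous credibility premiums (sketch).

    Y, M : lists of 1-D arrays; Y[i][j] is the claim ratio Y_{i,j} and
           M[i][j] the volume measure m_{i,j} of risk i in year j.
    sigma2, tau2 : structural parameters (assumed given / pre-estimated).
    Returns P_i = alpha_i * Ybar_i + (1 - alpha_i) * Ybar for each risk i.
    """
    m_i = np.array([m.sum() for m in M])                                # m_i = sum_j m_{i,j}
    ybar_i = np.array([(m * y).sum() / m.sum() for y, m in zip(Y, M)])  # weighted individual means
    ybar = (m_i * ybar_i).sum() / m_i.sum()                             # collective weighted mean
    alpha = m_i / (m_i + sigma2 / tau2)                                 # credibility factors, eq. (6)
    return alpha * ybar_i + (1.0 - alpha) * ybar                        # eq. (5)

# toy usage with two risks and made-up numbers
Y = [np.array([0.8, 1.2, 1.0]), np.array([2.0, 1.5])]
M = [np.ones(3), np.ones(2)]
print(homogeneous_credibility_premiums(Y, M, sigma2=0.5, tau2=0.2))
```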

Benefit of Partitioning the Collective

Is it better if we partition the collective?

Benefit of Partitioning the Collective

The answer is artificially YES, but genuinely not necessarily.

Suppose $n_i = n$ and $m_{i,j} = 1$ for all $i = 1, \ldots, I$ and $j = 1, \ldots, n$, and consider the quadratic loss between the homogeneous premium and $\mu(\Theta_i)$.

Arbitrarily partition a given collective of individuals into two sub-collectives, and then apply the credibility formula separately to each of the two resulting sub-collectives. Let $L_1$ and $L_2$ be the total credibility losses for the two sub-collectives, respectively, while $L$ denotes the credibility loss when premium prediction is applied to the whole collective without any partitioning. We formally proved $L_1 + L_2 \le L$.

The above result relies on an artificial assumption: "we know the credibility factor $\alpha$ for each sub-collective." In practice, the premium prediction is given by $\hat{P}_i = \hat{\alpha}_i \bar{Y}_i + (1 - \hat{\alpha}_i)\bar{Y}$, where $\hat{\alpha}_i$ is an estimator of $\alpha_i$, $i = 1, \ldots, I$.

Regression Tree Credibility Model: Overview of Regression Trees

Classification and Regression Trees (CART; Breiman, Friedman, Olshen and Stone, 1984) and Random Forests (RF; Breiman, 2001) are the most popular single-tree and ensemble recursive partitioning methods, respectively.

Any tree-building process can be broadly described in three steps:
1. Choosing a criterion for making splitting decisions;
2. Generating a corresponding sequence of candidate trees;
3. Selecting the best candidate tree.

It is common to allow each step to depend on a given loss function; a loss-pluggable split search is sketched below. The prevailing software implementation of CART is the rpart package in R.
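As an illustration of the splitting step (and of how it can depend on a chosen loss), the following sketch, which is not the rpart implementation, performs a single axis-parallel split search with a pluggable node-loss function; `node_loss` defaults to the $L_2$ loss, and a credibility loss could be substituted in its place.

```python
import numpy as np

def l2_node_loss(y):
    """Default L2 node loss: sum of squared deviations from the node mean."""
    return float(((y - y.mean()) ** 2).sum()) if y.size else 0.0

def best_split(X, y, node_loss=l2_node_loss, min_leaf=5):
    """Scan all axis-parallel cuts and return (feature, threshold, loss) that
    minimizes the sum of the two child losses, mirroring CART's greedy split."""
    n, p = X.shape
    best = (None, None, node_loss(y))          # start from the unsplit node's loss
    for j in range(p):
        for t in np.unique(X[:, j])[:-1]:      # candidate cut points on feature j
            left = X[:, j] <= t
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            loss = node_loss(y[left]) + node_loss(y[~left])
            if loss < best[2]:
                best = (j, t, loss)
    return best

# toy usage: one informative feature
rng = np.random.default_rng(0)
X = rng.integers(1, 101, size=(200, 3)).astype(float)
y = (X[:, 0] > 50).astype(float) + rng.normal(0, 0.1, 200)
print(best_split(X, y))
```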

Regression Tree Credibility Model: Covariate-Dependent Model Setup

Consider a portfolio of $I$ risks numbered $1, \ldots, I$. Let $Y_i = (Y_{i,1}, \ldots, Y_{i,n_i})^T$ be the vector of claim ratios, $m_i = (m_{i,1}, \ldots, m_{i,n_i})$ the corresponding weight vector, and $X_i = (X_{i,1}, \ldots, X_{i,p})^T$ the covariate vector associated with individual risk $i$, $i = 1, \ldots, I$. The risk profile of each individual risk $i$ is characterized by a scalar $\theta_i$, which is a realization of a random element $\Theta_i$.

The following two conditions are further assumed:

H11. The triplets $(\Theta_1, Y_1, X_1), \ldots, (\Theta_I, Y_I, X_I)$ are independent;

H12. Conditionally given $\Theta_i = \theta_i$ and $X_i = x_i$, the entries $Y_{i,j}$, $j = 1, \ldots, n_i$, are independent with
$$E[Y_{i,j} \mid X_i = x_i, \Theta_i = \theta_i] = \mu(x_i, \theta_i) \quad \text{and} \quad \mathrm{Var}[Y_{i,j} \mid X_i = x_i, \Theta_i = \theta_i] = \frac{\sigma^2(x_i, \theta_i)}{m_{i,j}}$$
for some unknown but deterministic functions $\mu(\cdot,\cdot)$ and $\sigma^2(\cdot,\cdot)$.

Regression Tree Credibility Model

We approximate $\mu(X_i, \Theta_i)$ and $\sigma^2(X_i, \Theta_i)/m_{i,j}$ by
$$\sum_{k=1}^{K} I\{X_i \in A_k\}\, \mu_{(k)}(\Theta_i) \quad \text{and} \quad \sum_{k=1}^{K} I\{X_i \in A_k\}\, \frac{\sigma^2_{(k)}(\Theta_i)}{m_{i,j}},$$
where $\{A_1, A_2, \ldots, A_K\}$ is a partition of the covariate space, and $\mu_{(k)}(\Theta_i)$ and $\sigma^2_{(k)}(\Theta_i)$ respectively denote the net premium and the variance of an individual risk $i$ from the $k$th sub-collective with risk profile $\Theta_i$, i.e.,
$$\mu_{(k)}(\theta_i) = E(Y_{i,j} \mid X_i \in A_k, \Theta_i = \theta_i) \quad \text{and} \quad \sigma^2_{(k)}(\theta_i) = \mathrm{Var}(Y_{i,j} \mid X_i \in A_k, \Theta_i = \theta_i).$$
The condition $X_i \in A_k$ means that individual risk $i$ is classified into the $k$th sub-collective based on its covariate information.

Regression Tree Credibility Model: Regression Tree Credibility Premium

We aim to find a "good" partition $\{A_1, A_2, \ldots, A_K\}$ and apply the credibility formula to each sub-collective $A_k$ separately. By "good", we mean a partition that makes the true prediction error $E[(\mu(\Theta_i) - P_i)^2]$ as small as possible.

Credibility regression trees:
- adopt one of the four credibility loss functions with plugged-in estimates of the structural parameters;
- use an invented heuristic longitudinal cross-validation for tree selection.

The credibility formula is then applied for premium prediction within each terminal node separately; the result is the Regression Tree Credibility Premium. A sketch of this per-node prediction step follows.
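A minimal, purely illustrative sketch of the per-node prediction step: the terminal-node label of each risk is assumed to come from a fitted credibility regression tree, and the node-level structural parameters are assumed to be plug-in estimates.

```python
import numpy as np

def rtc_premiums(Y, M, node_of, structural):
    """Regression tree credibility premiums (sketch).

    Y, M       : lists of 1-D arrays of claim ratios Y_{i,j} and weights m_{i,j}.
    node_of    : array; node_of[i] is the terminal-node label of risk i.
    structural : dict mapping node label -> (sigma2_k, tau2_k) plug-in estimates.
    The homogeneous credibility formula is applied within each node separately.
    """
    premiums = np.empty(len(Y))
    for k in np.unique(node_of):
        idx = np.where(node_of == k)[0]
        sigma2_k, tau2_k = structural[k]
        m_i = np.array([M[i].sum() for i in idx])
        ybar_i = np.array([(M[i] * Y[i]).sum() / M[i].sum() for i in idx])
        ybar = (m_i * ybar_i).sum() / m_i.sum()            # node-level collective mean
        alpha = m_i / (m_i + sigma2_k / tau2_k)            # node-specific credibility factors
        premiums[idx] = alpha * ybar_i + (1.0 - alpha) * ybar
    return premiums
```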

Simulation Studies

- Consider a collective of $I = 300$ individual risks.
- For each $i = 1, \ldots, I$, independently simulate the covariate vector $X_i = (X_{i,1}, \ldots, X_{i,p})$ with i.i.d. entries uniform on $\{1, \ldots, 100\}$, for $p = 10$ and $p = 50$.
- Balanced claims model: individual risks with $n = 5$, $10$, and $20$ years of claims experience, respectively. An unbalanced claims model is also explored.
- 1,000 independent samples.

Simulation Studies: Simulation Scheme 1 (interaction effect)

For each $i = 1, \ldots, I$, independently simulate $\varepsilon_{i,j}$ from a given distribution function $F(\cdot)$, for $j = 1, \ldots, n$, and define the $n$ claims of individual risk $i$ as
$$Y_{i,j} = e^{f(X_i)} + \varepsilon_{i,j}, \quad j = 1, \ldots, n, \qquad (9)$$
where
$$f(X_i) = 0.01\left( X_{i,1} + 2X_{i,2} - X_{i,3} + 2\,|X_{i,1} - X_{i,3}| - |X_{i,2} - X_{i,4}| \right). \qquad (10)$$
In our simulation, $F$ takes one of the following three distributions: EXP(1.6487), LN(0, 1), PAR(3, 3.2974). A minimal simulation sketch for this scheme appears below.
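A minimal sketch of Scheme 1 using the LN(0, 1) error case (the EXP and PAR cases are omitted to avoid guessing their parameterizations); the helper `f` mirrors equation (10) as rendered above, so treat both the names and that reading as illustrative.

```python
import numpy as np

rng = np.random.default_rng(2017)
I, n, p = 300, 10, 10                                   # risks, years, covariate dimension

X = rng.integers(1, 101, size=(I, p)).astype(float)     # X_{i,k} ~ U{1,...,100}

def f(x):
    # equation (10), as rendered above (illustrative reconstruction)
    return 0.01 * (x[0] + 2 * x[1] - x[2]
                   + 2 * abs(x[0] - x[2]) - abs(x[1] - x[3]))

fX = np.apply_along_axis(f, 1, X)
eps = rng.lognormal(mean=0.0, sigma=1.0, size=(I, n))   # epsilon_{i,j} ~ LN(0, 1)
Y = np.exp(fX)[:, None] + eps                           # Y_{i,j} = e^{f(X_i)} + eps_{i,j}, eq. (9)
```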

Simulation Studies: Simulation Scheme 2 (interaction effect + heterogeneous variance)

For each $i = 1, \ldots, I$, independently simulate $\varepsilon_{i,j}$ from a distribution function $F(\cdot\,; X_i)$ which depends on the covariates $X_i$ of individual risk $i$, for $j = 1, \ldots, n$, and define the $n$ claims of individual risk $i$ as
$$Y_{i,j} = e^{f(X_i)} + \varepsilon_{i,j}, \quad j = 1, \ldots, n, \qquad (11)$$
where $f(X_i)$ is given by (10). We respectively consider three distinct distributions for $F(\cdot\,; X_i)$:
$$\text{(1) } \mathrm{EXP}\bigl(e^{\gamma(X_i)/2}\bigr), \quad \text{(2) } \mathrm{LN}\bigl(0, \gamma(X_i)\bigr), \quad \text{(3) } \mathrm{PAR}\bigl(3, 2e^{\gamma(X_i)/2}\bigr), \qquad (12)$$
where $\gamma(X_i) = \frac{1}{10^2}\bigl( 2X_{i,1} - X_{i,2} + |X_{i,1} - X_{i,2}| \bigr)$.

Simulation Studies: Simulation Scheme 3 (interaction effect + heterogeneous variance + multiplicative random effect)

For each $i = 1, \ldots, I$:
(1) independently simulate the random effect variable $\Theta_i$ from the uniform distribution $U(0.9, 1.1)$;
(2) independently simulate $\varepsilon_{i,j}$ from a distribution function $F(\cdot\,; X_i)$ which depends on the covariates $X_i$ associated with risk $i$, for $j = 1, \ldots, n$; and
(3) define the $n$ claims of individual risk $i$ as
$$Y_{i,j} = \Theta_i \left[ e^{f(X_i)} + \varepsilon_{i,j} \right], \quad j = 1, \ldots, n, \qquad (13)$$
where $f(X_i)$ is given by (10). We consider each of the three distributions described in Scheme 2 for $F(\cdot\,; X_i)$.

Simulation Studies: Simulation Scheme 4 (interaction effect + heterogeneous variance + complex random effect structure)

For each $i = 1, \ldots, I$, let $\Theta_i = (\xi_{i,1}, \xi_{i,2})^T$:
(1) independently simulate the random effect variables $\xi_{i,1}$ and $\xi_{i,2}$ from the uniform distribution $U(0.9, 1.1)$;
(2) independently simulate $\varepsilon_{i,j}$ from a distribution function $F(\cdot\,; X_i, \xi_{i,2})$ for $j = 1, \ldots, n$; and
(3) define the $n$ claims of individual risk $i$ as
$$Y_{i,j} = e^{\xi_{i,1} f(X_i)} + \varepsilon_{i,j}, \quad j = 1, \ldots, n, \qquad (14)$$
where $f(X_i)$ is defined in (10). We respectively consider the three distributions defined in equation (12) of Scheme 2, with $\gamma(X_i)$ replaced by $\xi_{i,2}\,\gamma(X_i)$, so that the distribution of $\varepsilon_{i,j}$ depends on the random effect variable $\xi_{i,2}$ in addition to the covariates $X_i$.

Simulation Studies: Methods Compared

Data-driven covariate-dependent partitioning: for each simulated sample, we grow and prune regression trees built using the four credibility loss functions (R1-R4) and the $L_2$ loss function (R$L_2$), and the best tree is selected using longitudinal cross-validation.

In addition, we consider ad hoc covariate-dependent partitioning, defined via the following notation:
$$R(X_j) = \Bigl\{ \{i \in \mathcal{I} : X_{i,j} \le 50\},\; \{i \in \mathcal{I} : X_{i,j} > 50\} \Bigr\}, \quad j = 1, \ldots, 5,$$
and
$$R(X_{j_1}, X_{j_2}) = \Bigl\{ \{i \in \mathcal{I} : X_{i,j_1} \le 50,\, X_{i,j_2} \le 50\},\; \{i \in \mathcal{I} : X_{i,j_1} \le 50,\, X_{i,j_2} > 50\},\; \{i \in \mathcal{I} : X_{i,j_1} > 50,\, X_{i,j_2} \le 50\},\; \{i \in \mathcal{I} : X_{i,j_1} > 50,\, X_{i,j_2} > 50\} \Bigr\}$$
for $j_1, j_2 = 1, \ldots, 5$ with $j_1 \ne j_2$, and analogously for partitions based on three or four covariates. We consider $R(X_2)$, $R(X_4)$, $R(X_1, X_2, X_3)$, $R(X_1, X_2, X_4)$, $R(X_2, X_3, X_4)$, $R(X_1, X_3, X_4)$, and $R(X_1, X_2, X_3, X_4)$. A sketch of constructing such ad hoc partitions appears below.
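A minimal, illustrative sketch of building the ad hoc partitions $R(\cdot)$ by thresholding the selected covariates at 50 (covariates are passed as 0-based column indices; the function name is an assumption, not from the slides).

```python
import numpy as np

def ad_hoc_partition(X, cols, cut=50):
    """Assign each risk to one of 2^len(cols) cells formed by thresholding
    the covariates in `cols` at `cut`, as in R(X_{j1}, ..., X_{jm})."""
    cell_id = np.zeros(X.shape[0], dtype=int)
    for j in cols:
        cell_id = 2 * cell_id + (X[:, j] > cut).astype(int)   # 0: <= cut, 1: > cut
    return cell_id

# e.g. R(X_1, X_2, X_3) with 0-based column indices 0, 1, 2:
# node_of = ad_hoc_partition(X, cols=[0, 1, 2])
```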

Simulation Studies: Evaluation Metric

Prediction error for a given partitioning $\{\mathcal{I}_1, \ldots, \mathcal{I}_K\}$:
$$PE = \frac{1}{I} \sum_{i=1}^{I} \sum_{k=1}^{K} I\{X_i \in \mathcal{I}_k\} \left( \pi_i^{(H)(k)} - \mu(X_i, \Theta_i) \right)^2, \qquad (15)$$
where $\pi_i^{(H)(k)}$ is the resulting premium prediction and $\mu(X_i, \Theta_i)$ is the true net premium.

The collective prediction error
$$PE_0 = \frac{1}{I} \sum_{i=1}^{I} \left( P_i^{(H)} - \mu(X_i, \Theta_i) \right)^2$$
does not use any covariate information and is anticipated to underperform compared to the various covariate-dependent partitionings.

The relative prediction error (RPE): $R = PE / PE_0$. The sketch below computes both quantities.
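A minimal sketch computing the prediction error (15), the collective prediction error, and the relative prediction error; the array names `pi_hat`, `p_hat`, and `mu_true` are illustrative placeholders for the partition-based premiums, the collective (no-partition) premiums, and the true net premiums.

```python
import numpy as np

def prediction_errors(pi_hat, p_hat, mu_true):
    """Return (PE, PE_0, R = PE / PE_0).

    pi_hat  : partition-based premium predictions pi_i^{(H)(k)}, one per risk i
    p_hat   : collective (no-partition) premium predictions P_i^{(H)}
    mu_true : true net premiums mu(X_i, Theta_i)
    """
    pe = np.mean((pi_hat - mu_true) ** 2)    # eq. (15), one prediction per risk
    pe0 = np.mean((p_hat - mu_true) ** 2)    # collective prediction error
    return pe, pe0, pe / pe0
```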

Simulation Studies: Simulation Results

Concluding Remarks

We propose a novel regression tree credibility (RTC) model, bringing machine learning techniques into the framework of credibility theory to enhance the prediction accuracy of the credibility premium.

In the proposed model, no ex ante analysis of the relationship between the individual net premium and the covariates is necessary: the designed regression tree algorithm automatically selects influential covariates and informative cut points to form a partition of the data space, upon which a well-performing premium prediction rule can be established.

Our simulation studies and data analysis show that the proposed RTC model performs very well compared to no partitioning, ad hoc partitioning, and the $L_2$-loss-based binary partitioning procedure.

Although only Classification and Regression Trees are considered in this paper, it will be fruitful to pursue further research with other recursive partitioning methods, e.g., partDSA (partitioning deletion/substitution/addition algorithm) and MARS (multivariate adaptive regression splines).

It will be even more promising to consider applications of ensemble algorithms such as bagging, boosting, and random forests.

It will be useful to develop an algorithm that can accommodate time-dependent covariates.

It will be even more fruitful to consider applications in various other insurance problems beyond premium rating, since many practical insurance problems amount to quantifying the relationship between insureds' claims and their demographic information.

It is the authors' hope that the present paper will stimulate more actuarial applications of these machine learning techniques and eventually contribute to the development of insurance predictive analytics in general.