REGRESSION TREE CREDIBILITY MODEL

1 Liqun Diao and Chengguo Weng
Department of Statistics and Actuarial Science, University of Waterloo
Advances in Predictive Analytics Conference, Waterloo, Ontario, Dec 1, 2017

2 Overview: Regression Trees (statistical method) + Credibility Model (actuarial model)

3 Outline
1. Credibility Model
2. Benefit of Partitioning the Data Space
3. Regression Tree Credibility Model
4. Simulation Studies
5. An Application to US Medicare Data
6. Concluding Remarks

4 Credibility Model: Bühlmann-Straub Credibility Model
Credibility theory has become the paradigm for insurance experience rating and is widely used by actuaries; the key references are the Bühlmann model (1967, 1969) and the Bühlmann-Straub model (1970).
Consider a portfolio of $I$ risks, where each individual risk $i$ has $n_i$ years of claims experience, $i = 1, 2, \ldots, I$.
Let $Y_{i,j}$ denote the claim ratio of individual risk $i$ in year $j$, and let $m_{i,j}$ be the associated volume measure, also known as the weight variable.
Collect all the claims experience of individual risk $i$ into a vector $Y_i = (Y_{i,1}, \ldots, Y_{i,n_i})^T$.
The profile of individual risk $i$ is characterized by $\theta_i$, a realization of a random element $\Theta_i$ (usually either a random variable or a random vector).

5 Credibility Model: Bühlmann-Straub Credibility Model
Assume that the following conditions are satisfied:
H01. Conditionally on $\Theta_i = \theta_i$, the claims $\{Y_{i,j} : j = 1, 2, \ldots, n_i\}$ are independent with $E[Y_{i,j} \mid \Theta_i = \theta_i] = \mu(\theta_i)$ and $\mathrm{Var}[Y_{i,j} \mid \Theta_i = \theta_i] = \sigma^2(\theta_i)/m_{i,j}$ for some unknown but deterministic functions $\mu(\cdot)$ and $\sigma^2(\cdot)$;
H02. The pairs $(\Theta_1, Y_1), \ldots, (\Theta_I, Y_I)$ are independent, and $\Theta_1, \ldots, \Theta_I$ are independent and identically distributed.
Define the structural parameters $\sigma^2 = E[\sigma^2(\Theta_i)]$ and $\tau^2 = \mathrm{Var}[\mu(\Theta_i)]$ for risks within the collective $\mathcal{I} := \{1, 2, \ldots, I\}$, and denote
$m_i = \sum_{j=1}^{n_i} m_{i,j}$, $m = \sum_{i=1}^{I} m_i$, $\bar{Y}_i = \sum_{j=1}^{n_i} \frac{m_{i,j}}{m_i} Y_{i,j}$, $\bar{Y} = \sum_{i=1}^{I} \frac{m_i}{m} \bar{Y}_i$.
Let $\mu = E[Y_{i,j}]$ denote the collective net premium. We seek an estimator $\hat{\mu}(\Theta_i)$ of $\mu(\Theta_i)$, the correct individual premium of the individual risk (the fair risk premium), that makes the quadratic loss $E\big[(\hat{\mu}(\Theta_i) - \mu(\Theta_i))^2\big]$ as small as possible.

6 Credibility Model: Inhomogeneous Credibility Premium
The (inhomogeneous) credibility premium for an individual risk $i$ is defined as the best premium predictor $P_i$ in the class
$\Big\{Q : Q = a_0 + \sum_{i=1}^{I}\sum_{j=1}^{n_i} a_{i,j} Y_{i,j},\ a_0, a_{i,j} \in \mathbb{R}\Big\}$
minimizing the quadratic loss $E\big[(Q - \mu(\Theta_i))^2\big]$; its formula is
$P_i = \alpha_i \bar{Y}_i + (1 - \alpha_i)\mu,$  (1)
where
$\alpha_i = \frac{m_i}{m_i + \sigma^2/\tau^2}$  (2)
is called the credibility factor for individual risk $i$.
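As a concrete illustration of formulas (1)-(2), here is a minimal numerical sketch in Python. It assumes the structural parameters $\sigma^2$ and $\tau^2$ and the collective premium $\mu$ are already known (in practice they are estimated from the portfolio); the function name and the toy numbers below are illustrative only.

import numpy as np

def credibility_premium(y, m, sigma2, tau2, mu):
    """Inhomogeneous Buhlmann-Straub premium for one risk, eqs. (1)-(2).

    y      : claim ratios Y_{i,1}, ..., Y_{i,n_i}
    m      : volume measures m_{i,1}, ..., m_{i,n_i}
    sigma2 : E[sigma^2(Theta)]   (assumed known here)
    tau2   : Var[mu(Theta)]      (assumed known here)
    mu     : collective net premium
    """
    m_i = m.sum()
    y_bar_i = np.sum(m * y) / m_i            # volume-weighted individual mean
    alpha_i = m_i / (m_i + sigma2 / tau2)    # credibility factor, eq. (2)
    return alpha_i * y_bar_i + (1.0 - alpha_i) * mu   # eq. (1)

# Toy usage with made-up numbers: four years of experience for one risk.
y = np.array([0.80, 1.20, 1.00, 0.90])
m = np.array([10.0, 12.0, 11.0, 13.0])
print(credibility_premium(y, m, sigma2=0.5, tau2=0.1, mu=1.0))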

7 Credibility Model: Inhomogeneous Credibility Premium
The corresponding minimum quadratic loss for $E\big[(Q - \mu(\Theta_i))^2\big]$ is given by
$L_i = \frac{\sigma^2}{m_i + \sigma^2/\tau^2}.$  (3)
The minimum quadratic loss for $E\big[(Q - Y_{i,n+1})^2\big]$ is given by
$L_i = \frac{\sigma^2}{m_i + \sigma^2/\tau^2} + \sigma^2.$  (4)

8 Credibility Model: Homogeneous Credibility Premium
The homogeneous credibility premium for an individual risk $i$ from the collective $\mathcal{I}$ is defined as the best premium predictor $P_i$ in the class
$\Big\{Q_i : Q_i = \sum_{i=1}^{I}\sum_{j=1}^{n_i} a_{i,j} Y_{i,j},\ E[Q_i] = E[\mu(\Theta_i)],\ a_{i,j} \in \mathbb{R}\Big\}$
minimizing the quadratic loss $E\big[(Q_i - \mu(\Theta_i))^2\big]$; its formula is
$P_i = \alpha_i \bar{Y}_i + (1 - \alpha_i)\bar{Y},$  (5)
where
$\alpha_i = \frac{m_i}{m_i + \sigma^2/\tau^2}$  (6)
is called the credibility factor for individual risk $i$.

9 Credibility Model: Homogeneous Credibility Premium
The corresponding minimum quadratic loss for $E\big[(Q_i - \mu(\Theta_i))^2\big]$ is given by
$L_i = \tau^2(1 - \alpha_i)\Big(1 + \frac{1 - \alpha_i}{\alpha}\Big),$  (7)
where $\alpha = \sum_{i=1}^{I} \alpha_i$.
The minimum quadratic loss for $E\big[(Q_i - Y_{i,n+1})^2\big]$ is given by
$L_i = \tau^2(1 - \alpha_i)\Big(1 + \frac{1 - \alpha_i}{\alpha}\Big) + \sigma^2.$  (8)

10 Benefit of Partitioning the Collective
Is it better if we partition the collective?

11 Benefit of Partitioning the Collective
The answer is YES under an artificial assumption, but not necessarily so in practice.
Take $n_i = n$ and $m_{i,j} = 1$ for all $i = 1, \ldots, I$ and $j = 1, \ldots, n$, and consider the loss between the homogeneous premium and $\mu(\Theta_i)$.
Arbitrarily partition a given collective of individuals into two sub-collectives, and then apply the credibility formula separately to each of the two resulting sub-collectives.
Let $L_1$ and $L_2$ be the total credibility losses for the two sub-collectives, and let $L$ denote the credibility loss when premium prediction is applied to the whole collective without any partitioning. We formally proved $L_1 + L_2 \le L$.
The above result relies on an artificial assumption: "we know the credibility factor $\alpha$ for each sub-collective."
In practice the premium prediction is $\hat{P}_i = \hat{\alpha}_i \bar{Y}_i + (1 - \hat{\alpha}_i)\bar{Y}$, where $\hat{\alpha}_i$ is an estimator of $\alpha_i$, $i = 1, \ldots, I$.

12 Regression Tree Credibility Model: Overview of Regression Trees
Classification and Regression Trees (CART; Breiman, Friedman, Olshen and Stone, 1984) and Random Forests (RF; Breiman, 2001) are the most popular single-tree and ensemble recursive partitioning methods, respectively.
Any tree-building process can be broadly described in three steps:
1. choosing a criterion for making splitting decisions;
2. generating a corresponding sequence of candidate trees;
3. selecting the best candidate tree.
It is common to allow each step to depend on a given loss function.
The prevailing software implementation of CART is the rpart package in R.
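To make the three steps concrete, the sketch below grows a large regression tree, generates the cost-complexity pruning sequence, and selects the best candidate by cross-validated squared-error loss, using scikit-learn's DecisionTreeRegressor. The slides point to rpart in R; this Python analogue, its toy data, and its parameter choices are illustrative assumptions rather than the authors' implementation.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 10))                         # toy covariates
y = np.exp(X[:, 0] + X[:, 1]) + rng.normal(size=300)    # toy response

# Steps 1-2: grow a large tree and generate the nested pruning sequence.
full_tree = DecisionTreeRegressor(min_samples_leaf=5, random_state=0)
ccp_alphas = full_tree.cost_complexity_pruning_path(X, y).ccp_alphas

# Step 3: select the best candidate tree by 5-fold cross-validation.
best_alpha, best_score = 0.0, -np.inf
for alpha in ccp_alphas:
    tree = DecisionTreeRegressor(min_samples_leaf=5, ccp_alpha=alpha, random_state=0)
    score = cross_val_score(tree, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    if score > best_score:
        best_alpha, best_score = alpha, score

final_tree = DecisionTreeRegressor(min_samples_leaf=5, ccp_alpha=best_alpha,
                                   random_state=0).fit(X, y)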

13 Regression Tree Credibility Model: Covariate-Dependent Model Setup
Consider a portfolio of $I$ risks numbered $1, \ldots, I$. Let $Y_i = (Y_{i,1}, \ldots, Y_{i,n_i})^T$ be the vector of claim ratios, $m_i = (m_{i,1}, \ldots, m_{i,n_i})$ the corresponding weight vector, and $X_i = (X_{i,1}, \ldots, X_{i,p})^T$ the covariate vector associated with individual risk $i$, $i = 1, \ldots, I$.
The risk profile of each individual risk $i$ is characterized by a scalar $\theta_i$, which is a realization of a random element $\Theta_i$. The following two conditions are further assumed:
H11. The triplets $(\Theta_1, Y_1, X_1), \ldots, (\Theta_I, Y_I, X_I)$ are independent;
H12. Conditionally on $\Theta_i = \theta_i$ and $X_i = x_i$, the entries $Y_{i,j}$, $j = 1, \ldots, n_i$, are independent with $E[Y_{i,j} \mid X_i = x_i, \Theta_i = \theta_i] = \mu(x_i, \theta_i)$ and $\mathrm{Var}[Y_{i,j} \mid X_i = x_i, \Theta_i = \theta_i] = \sigma^2(x_i, \theta_i)/m_{i,j}$ for some unknown but deterministic functions $\mu(\cdot,\cdot)$ and $\sigma^2(\cdot,\cdot)$.

14 Regression Tree Credibility Model
We approximate $\mu(X_i, \Theta_i)$ and $\sigma^2(X_i, \Theta_i)/m_{i,j}$ by
$\sum_{k=1}^{K} I\{X_i \in A_k\}\,\mu_{(k)}(\Theta_i)$ and $\sum_{k=1}^{K} I\{X_i \in A_k\}\,\frac{\sigma^2_{(k)}(\Theta_i)}{m_{i,j}}$,
where $\{A_1, A_2, \ldots, A_K\}$ is a partition of the covariate space.
Here $\mu_{(k)}(\Theta_i)$ and $\sigma^2_{(k)}(\Theta_i)$ respectively represent the net premium and the variance of an individual risk $i$ from the $k$th sub-collective with risk profile $\Theta_i$, i.e.,
$\mu_{(k)}(\theta_i) = E(Y_{i,j} \mid X_i \in A_k, \Theta_i = \theta_i)$ and $\sigma^2_{(k)}(\theta_i) = \mathrm{Var}(Y_{i,j} \mid X_i \in A_k, \Theta_i = \theta_i)$.
The condition $X_i \in A_k$ means that individual risk $i$ is classified into the $k$th sub-collective based on its covariate information.

15 Regression Tree Credibility Model: Regression Tree Credibility Premium
We aim to find a "good" partition $\{A_1, A_2, \ldots, A_K\}$ and apply the credibility formula to each sub-collective $A_k$ separately.
By "good" we mean that the true prediction error $E\big[(\mu(\Theta_i) - P_i)^2\big]$ should be made as small as possible.
Credibility regression trees adopt one of the four credibility loss functions, with plugged-in estimates of the structural parameters, as the splitting criterion, and use a heuristic longitudinal cross-validation devised for this setting to select the tree.
The credibility formula is then applied to each terminal node separately for premium prediction, which yields the Regression Tree Credibility premium.
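A sketch of the leaf-wise premium step just described: given a fitted tree, each terminal node is treated as its own collective and the homogeneous formula (5) is applied within it. The helper names, the simple moment-type estimators of $\sigma^2$ and $\tau^2$, and the assumption of a scikit-learn-style tree exposing .apply() are illustrative assumptions, not the authors' algorithm (the slides also plug credibility losses into the tree construction itself).

import numpy as np

def buhlmann_straub_premiums(Y, M):
    """Homogeneous credibility premiums, eq. (5), for one sub-collective.

    Y, M : (I_k x n) arrays of claim ratios and volumes for the risks in a node.
    The structural parameters are replaced by simple moment-type estimates
    (an illustrative choice; the slides only say that estimates are plugged in).
    """
    m_i = M.sum(axis=1)
    ybar_i = (M * Y).sum(axis=1) / m_i
    n = Y.shape[1]
    sigma2 = np.mean((M * (Y - ybar_i[:, None]) ** 2).sum(axis=1) / (n - 1))
    ybar = (m_i * ybar_i).sum() / m_i.sum()
    tau2 = max(np.mean((ybar_i - ybar) ** 2) - sigma2 * np.mean(1.0 / m_i), 1e-12)
    alpha = m_i / (m_i + sigma2 / tau2)            # credibility factors
    return alpha * ybar_i + (1.0 - alpha) * ybar   # eq. (5)

def rtc_premiums(fitted_tree, X, Y, M):
    """Apply the credibility formula separately within each terminal node."""
    leaves = fitted_tree.apply(X)                  # terminal-node id per risk
    premiums = np.empty(X.shape[0])
    for leaf in np.unique(leaves):
        idx = leaves == leaf
        premiums[idx] = buhlmann_straub_premiums(Y[idx], M[idx])
    return premiums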

16 Simulation Studies
Consider a collective of $I = 300$ individual risks.
For each $i = 1, \ldots, I$, independently simulate a covariate vector $X_i = (X_{i,1}, \ldots, X_{i,p})$ with entries i.i.d. from the discrete uniform distribution $U\{1, \ldots, 100\}$; both $p = 10$ and $p = 50$ are considered.
Balanced claims model: individual risks with $n = 5$, $10$, and $20$ years of claims experience, respectively.
An unbalanced claims model is also explored.
Each setting is replicated with 1,000 independent samples.

17 Simulation Studies: Simulation Schemes
Simulation Scheme 1: interaction effect.
For each $i = 1, \ldots, I$, independently simulate $\varepsilon_{i,j}$ from a given distribution function $F(\cdot)$, for $j = 1, \ldots, n$, and define the $n$ claims of individual risk $i$ as
$Y_{i,j} = e^{f(X_i)} + \varepsilon_{i,j}, \quad j = 1, \ldots, n,$  (9)
where
$f(X_i) = 0.01\big(X_{i,1} + 2X_{i,2} - X_{i,3} + 2\sqrt{X_{i,1}X_{i,3}} - \sqrt{X_{i,2}X_{i,4}}\big).$  (10)
In our simulation, $F$ takes one of the following three distributions: $\mathrm{EXP}(1.6487)$, $\mathrm{LN}(0, 1)$, $\mathrm{PAR}(3, \cdot)$.
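A minimal sketch of Scheme 1 under the mean function (10) as reconstructed above (that reconstruction is itself an assumption). The exponential case is read as an exponential distribution with mean 1.6487, and the Pareto case is omitted because its second parameter is not recoverable from this transcription; all names are illustrative.

import numpy as np

rng = np.random.default_rng(2017)
I, n, p = 300, 10, 10

# Covariates: entries i.i.d. uniform on {1, ..., 100}.
X = rng.integers(1, 101, size=(I, p)).astype(float)

def f(x):
    # Mean function of eq. (10) as reconstructed above (an assumption).
    return 0.01 * (x[:, 0] + 2.0 * x[:, 1] - x[:, 2]
                   + 2.0 * np.sqrt(x[:, 0] * x[:, 2])
                   - np.sqrt(x[:, 1] * x[:, 3]))

def simulate_scheme1(X, n, error="lognormal"):
    """Claims Y_{i,j} = exp(f(X_i)) + eps_{i,j}, eq. (9), with i.i.d. errors."""
    I = X.shape[0]
    if error == "exp":
        eps = rng.exponential(scale=1.6487, size=(I, n))       # EXP, mean 1.6487
    elif error == "lognormal":
        eps = rng.lognormal(mean=0.0, sigma=1.0, size=(I, n))   # LN(0, 1)
    else:
        raise ValueError("PAR(3, .) omitted: its scale is not recoverable here")
    return np.exp(f(X))[:, None] + eps

Y = simulate_scheme1(X, n)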

18 Simulation Studies: Simulation Schemes
Simulation Scheme 2: interaction effect + heterogeneous variance.
For each $i = 1, \ldots, I$, independently simulate $\varepsilon_{i,j}$ from a distribution function $F(\cdot\,; X_i)$ which depends on the covariate vector $X_i$ of individual risk $i$, for $j = 1, \ldots, n$, and define the $n$ claims of individual risk $i$ as
$Y_{i,j} = e^{f(X_i)} + \varepsilon_{i,j}, \quad j = 1, \ldots, n,$  (11)
where $f(X_i)$ is given by (10). We respectively consider three distinct distributions for $F(\cdot\,; X_i)$:
(1) $\mathrm{EXP}\big(e^{\gamma(X_i)/2}\big)$, (2) $\mathrm{LN}\big(0, \gamma(X_i)\big)$, (3) $\mathrm{PAR}\big(3, 2e^{\gamma(X_i)/2}\big)$,  (12)
where $\gamma(X_i)$ is a given function of $X_{i,1}$ and $X_{i,2}$.

19 Simulation Studies: Simulation Schemes
Simulation Scheme 3: interaction effect + heterogeneous variance + multiplicative random effect.
For each $i = 1, \ldots, I$:
(1) independently simulate the random effect variable $\Theta_i$ from the uniform distribution $U(0.9, 1.1)$;
(2) independently simulate $\varepsilon_{i,j}$ from a distribution function $F(\cdot\,; X_i)$ which depends on the covariates $X_i$ associated with risk $i$, for $j = 1, \ldots, n$; and
(3) define the $n$ claims of individual risk $i$ as
$Y_{i,j} = \Theta_i\big[e^{f(X_i)} + \varepsilon_{i,j}\big], \quad j = 1, \ldots, n,$  (13)
where $f(X_i)$ is given by (10).
We consider each of the three distributions described in Scheme 2 for $F(\cdot\,; X_i)$.

20 Simulation Studies: Simulation Schemes
Simulation Scheme 4: interaction effect + heterogeneous variance + a more complex random effect structure.
For each $i = 1, \ldots, I$, with $\Theta_i = (\xi_{i,1}, \xi_{i,2})^T$:
(1) independently simulate the random effect variables $\xi_{i,1}$ and $\xi_{i,2}$ from the uniform distribution $U(0.9, 1.1)$;
(2) independently simulate $\varepsilon_{i,j}$ from a distribution function $F(\cdot\,; X_i, \xi_{i,2})$ for $j = 1, \ldots, n$; and
(3) define the $n$ claims of individual risk $i$ as
$Y_{i,j} = e^{\xi_{i,1} f(X_i)} + \varepsilon_{i,j}, \quad j = 1, \ldots, n,$  (14)
where $f(X_i)$ is defined in (10).
We consider the three distributions defined in (12) in Scheme 2, with $\gamma(X_i)$ replaced by $\xi_{i,2}\,\gamma(X_i)$, so that the distribution of $\varepsilon_{i,j}$ depends on the random effect variable $\xi_{i,2}$ in addition to the covariate vector $X_i$.

21 Simulation Studies: Methods Compared
Data-driven covariate-dependent partitioning: for each simulated sample, we grow and prune regression trees built with the four credibility loss functions (R1-R4) and the $L_2$ loss function (RL2), and the best tree is selected by longitudinal cross-validation.
In addition, we consider ad hoc covariate-dependent partitioning, defined via the following notation:
$R(X_j) = \big\{\{i \in \mathcal{I} : X_{i,j} \le 50\},\ \{i \in \mathcal{I} : X_{i,j} > 50\}\big\}, \quad j = 1, \ldots, 5,$
and
$R(X_{j_1}, X_{j_2}) = \big\{\{i \in \mathcal{I} : X_{i,j_1} \le 50,\, X_{i,j_2} \le 50\},\ \{i \in \mathcal{I} : X_{i,j_1} \le 50,\, X_{i,j_2} > 50\},\ \{i \in \mathcal{I} : X_{i,j_1} > 50,\, X_{i,j_2} \le 50\},\ \{i \in \mathcal{I} : X_{i,j_1} > 50,\, X_{i,j_2} > 50\}\big\}$
for $j_1, j_2 = 1, \ldots, 5$ with $j_1 \ne j_2$; partitions based on three or four covariates are defined analogously.
We consider $R(X_2)$, $R(X_4)$, $R(X_1, X_2, X_3)$, $R(X_1, X_2, X_4)$, $R(X_2, X_3, X_4)$, $R(X_1, X_3, X_4)$, and $R(X_1, X_2, X_3, X_4)$.
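A small sketch of how the ad hoc partitions can be encoded as sub-collective labels. The function name is illustrative; the threshold 50 is the midpoint of the covariate range used in the simulations.

import numpy as np

def ad_hoc_partition(X, cols, threshold=50):
    """Label each risk by the cell of R(X_{j_1}, ..., X_{j_k}) it falls in.

    X    : (I x p) covariate matrix
    cols : zero-based indices of the splitting covariates
    Returns an integer label in {0, ..., 2**len(cols) - 1} for each risk.
    """
    bits = (X[:, cols] > threshold).astype(int)   # one bit per splitting covariate
    return bits @ (2 ** np.arange(len(cols)))

# Example: the four cells of R(X_1, X_2).
X = np.random.default_rng(1).integers(1, 101, size=(300, 10))
labels = ad_hoc_partition(X, cols=[0, 1])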

22 Simulation Studies: Evaluation Metric
Prediction error for a given partitioning $\{\mathcal{I}_1, \ldots, \mathcal{I}_K\}$:
$\mathrm{PE} = \frac{1}{I}\sum_{i=1}^{I}\sum_{k=1}^{K} I\{i \in \mathcal{I}_k\}\big(\pi_i^{(H)(k)} - \mu(X_i, \Theta_i)\big)^2,$  (15)
where $\pi_i^{(H)(k)}$ is the resulting premium prediction and $\mu(X_i, \Theta_i)$ is the true net premium.
The collective prediction error
$\mathrm{PE}_0 = \frac{1}{I}\sum_{i=1}^{I}\big(P_i^{(H)} - \mu(X_i, \Theta_i)\big)^2$
does not use any covariate information and is anticipated to underperform the various covariate-dependent partitionings.
The relative prediction error (RPE) is $R = \mathrm{PE}/\mathrm{PE}_0$.
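Because each risk belongs to exactly one sub-collective, the double sum in (15) reduces to an average of squared errors over risks. A minimal sketch, assuming the predicted and true premiums are held in arrays (names illustrative):

import numpy as np

def prediction_error(premium_pred, true_premium):
    """PE of eq. (15): mean squared error of the premium predictions."""
    return np.mean((premium_pred - true_premium) ** 2)

def relative_prediction_error(premium_pred, collective_pred, true_premium):
    """RPE = PE / PE_0, where PE_0 uses the no-partitioning collective premiums."""
    pe = prediction_error(premium_pred, true_premium)
    pe0 = prediction_error(collective_pred, true_premium)
    return pe / pe0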

23 Simulation Studies: Simulation Results

24 Concluding Remarks

25 Concluding Remarks
We propose a novel regression tree credibility (RTC) model, bringing machine learning techniques into the framework of credibility theory to enhance the prediction accuracy of the credibility premium.
In the proposed model, no ex ante analysis of the relationship between the individual net premium and the covariates is necessary: the designed regression tree algorithm automatically selects influential covariates and informative cut points to form a partition of the data space, upon which a well-performing premium prediction rule can then be established.
Our simulation studies and data analysis show that the proposed RTC model performs very well compared with no partitioning, ad hoc partitioning, and the $L_2$-loss-based binary partitioning procedure.

26 Concluding Remarks
Although only Classification and Regression Trees are considered in this paper, it will be fruitful to pursue further research with other recursive partitioning methods, e.g., partDSA (partitioning deletion/substitution/addition algorithm) and MARS (multivariate adaptive regression splines).
It will be even more promising to consider applications of ensemble algorithms, such as bagging, boosting, and random forests.
It will be useful to develop an algorithm that can accommodate time-dependent covariates.
It will be even more fruitful to consider applications to various other insurance problems beyond premium rating, since many practical insurance problems amount to quantifying the relationship between insureds' claims and their demographic information.
It is the authors' hope that the present paper will stimulate more actuarial applications of these machine learning techniques and eventually contribute to the development of insurance predictive analytics in general.
