REGRESSION TREE CREDIBILITY MODEL
Liqun Diao and Chengguo Weng, Department of Statistics and Actuarial Science, University of Waterloo. Advances in Predictive Analytics Conference, Waterloo, Ontario, Dec 1, 2017.

Overview

Statistical Method (Regression Trees) + Actuarial Model (Credibility Model)
Outline

1. CREDIBILITY MODEL
2. BENEFIT OF PARTITIONING THE DATA SPACE
3. REGRESSION TREE CREDIBILITY MODEL
4. SIMULATION STUDIES
5. AN APPLICATION TO US MEDICARE DATA
6. CONCLUDING REMARKS
Credibility Model: Bühlmann-Straub Credibility Model

Credibility theory has become the paradigm for insurance experience rating and is widely used by actuaries. Key references: the Bühlmann model (1967, 1969) and the Bühlmann-Straub model (1970).

Consider a portfolio of I risks, where each individual risk i has n_i years of claims experience, i = 1, 2, ..., I. Let Y_{i,j} denote the claim ratio of individual risk i in year j, and m_{i,j} the associated volume measure, also known as the weight variable. Collect all the claims experience of individual risk i into a vector Y_i = (Y_{i,1}, ..., Y_{i,n_i})^T. The profile of individual risk i is characterized by θ_i, a realization of a random element Θ_i (usually either a random variable or a random vector).
Assume that the following conditions are satisfied:

H01. Conditionally on Θ_i = θ_i, {Y_{i,j} : j = 1, 2, ..., n_i} are independent with

E[Y_{i,j} | Θ_i = θ_i] = μ(θ_i)  and  Var[Y_{i,j} | Θ_i = θ_i] = σ²(θ_i)/m_{i,j}

for some unknown but deterministic functions μ(·) and σ²(·);

H02. The pairs (Θ_1, Y_1), ..., (Θ_I, Y_I) are independent, and {Θ_1, ..., Θ_I} are independent and identically distributed.

Define the structural parameters σ² = E[σ²(Θ_i)] and τ² = Var[μ(Θ_i)] for risks within the collective I := {1, 2, ..., I}, and denote

m_i = Σ_{j=1}^{n_i} m_{i,j},   m = Σ_{i=1}^{I} m_i,   Ȳ_i = Σ_{j=1}^{n_i} (m_{i,j}/m_i) Y_{i,j},   Ȳ = Σ_{i=1}^{I} (m_i/m) Ȳ_i.

Let μ = E[Y_{i,j}] denote the collective net premium. We are interested in an estimator μ̂(Θ_i) of μ(Θ_i), called the correct individual premium of the individual risk (the fair risk premium), that makes the quadratic loss E[(μ̂(Θ_i) − μ(Θ_i))²] as small as possible.
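The weighted averages above can be sketched in code. This is a minimal illustration with made-up numbers; the function name and the example data are ours, not the slides'.

```python
# Weighted claim-ratio averages from the Bühlmann-Straub setup.

def weighted_averages(Y, m):
    """Y[i][j]: claim ratio of risk i in year j; m[i][j]: volume weight.
    Returns (volume totals m_i, per-risk means Ȳ_i, collective mean Ȳ)."""
    m_i = [sum(mi) for mi in m]
    Ybar_i = [sum(w * y for w, y in zip(mi, yi)) / tot
              for mi, yi, tot in zip(m, Y, m_i)]
    m_tot = sum(m_i)
    Ybar = sum(mi * yb for mi, yb in zip(m_i, Ybar_i)) / m_tot
    return m_i, Ybar_i, Ybar

# Two risks with two years of experience each (illustrative data)
Y = [[0.8, 1.2], [1.0, 1.4]]
m = [[1.0, 1.0], [2.0, 2.0]]
m_i, Ybar_i, Ybar = weighted_averages(Y, m)
```

Note that Ȳ weights each risk's mean by its total volume m_i, so a risk with more exposure pulls the collective mean harder.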
Credibility Model: Inhomogeneous Credibility Premium

The (inhomogeneous) credibility premium for an individual risk i is defined as the best premium predictor P_i among the class

{ Q : Q = a_0 + Σ_{i=1}^{I} Σ_{j=1}^{n_i} a_{i,j} Y_{i,j},  a_0, a_{i,j} ∈ ℝ }

minimizing the quadratic loss E[(Q − μ(Θ_i))²], and its formula is given by

P_i = α_i Ȳ_i + (1 − α_i) μ,   (1)

where

α_i = m_i / (m_i + σ²/τ²)   (2)

is called the credibility factor of individual risk i.
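Formulas (1) and (2) translate directly into code. A minimal sketch (function name and example values are ours):

```python
def credibility_premium(Ybar_i, m_i, mu, sigma2, tau2):
    """Inhomogeneous Bühlmann-Straub premium:
    P_i = α_i·Ȳ_i + (1 − α_i)·μ with α_i = m_i / (m_i + σ²/τ²)."""
    alpha_i = m_i / (m_i + sigma2 / tau2)
    return alpha_i * Ybar_i + (1 - alpha_i) * mu

# With m_i = 4, σ²/τ² = 4, the credibility factor is 0.5: the premium sits
# halfway between the individual mean 1.2 and the collective premium 1.0.
P = credibility_premium(Ybar_i=1.2, m_i=4.0, mu=1.0, sigma2=2.0, tau2=0.5)
```

As m_i grows (more years or volume of experience), α_i → 1 and the premium leans on the individual's own experience; as σ²/τ² grows (noisy individual data relative to heterogeneity), it leans on the collective premium μ.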
The corresponding minimum quadratic loss for E[(Q − μ(Θ_i))²] is given by

L_i = σ² / (m_i + σ²/τ²).   (3)

The minimum quadratic loss for E[(Q − Y_{i,n+1})²] is given by

L_i = σ² / (m_i + σ²/τ²) + σ².   (4)
Credibility Model: Homogeneous Credibility Premium

The homogeneous credibility premium for an individual risk i from the collective I is defined as the best premium predictor P_i among the class

{ Q_i : Q_i = Σ_{i=1}^{I} Σ_{j=1}^{n_i} a_{i,j} Y_{i,j},  E[Q_i] = E[μ(Θ_i)],  a_{i,j} ∈ ℝ }

minimizing the quadratic loss E[(Q_i − μ(Θ_i))²], and its formula is given by

P_i = α_i Ȳ_i + (1 − α_i) Ȳ,   (5)

where

α_i = m_i / (m_i + σ²/τ²)   (6)

is called the credibility factor of individual risk i.
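The only change from the inhomogeneous formula is that the data-driven collective mean Ȳ replaces the (generally unknown) collective net premium μ. A minimal sketch of (5)-(6), with our own naming:

```python
def homogeneous_premium(Ybar_i, m_i, Ybar, sigma2, tau2):
    """Homogeneous premium: the collective weighted mean Ȳ stands in for
    the unknown collective net premium μ of the inhomogeneous formula."""
    alpha_i = m_i / (m_i + sigma2 / tau2)
    return alpha_i * Ybar_i + (1 - alpha_i) * Ybar

P = homogeneous_premium(Ybar_i=1.2, m_i=4.0, Ybar=1.1, sigma2=2.0, tau2=0.5)
```

This is the practically usable version: every ingredient on the right-hand side is estimable from the portfolio's own claims experience.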
The corresponding minimum quadratic loss for E[(Q_i − μ(Θ_i))²] is given by

L_i = τ² (1 − α_i) (1 + (1 − α_i)/α),   (7)

where α = Σ_{i=1}^{I} α_i. The minimum quadratic loss for E[(Q_i − Y_{i,n+1})²] is given by

L_i = τ² (1 − α_i) (1 + (1 − α_i)/α) + σ².   (8)
Benefit of Partitioning the Collective

Is it better if we partition the collective?
The answer is YES under an artificial assumption, but not necessarily in general.

Take n_i = n and m_{i,j} = 1 for all i = 1, ..., I and j = 1, ..., n, and consider the loss between the homogeneous premium and μ(Θ_i). Arbitrarily partition a given collective of individuals into two sub-collectives, and then apply the credibility formula separately to each of the two resulting sub-collectives. Let L1 and L2 be the total credibility losses for the two sub-collectives, and let L denote the credibility loss when premium prediction is applied to the whole collective without any partitioning. We formally proved

L1 + L2 ≤ L.

The above result relies on an artificial assumption: "we know the credibility factor α for each sub-collective." In practice the premium prediction is given by P̂_i = α̂_i Ȳ_i + (1 − α̂_i) Ȳ, where α̂_i is an estimator of α_i, i = 1, ..., I.
Regression Tree Credibility Model: Overview of Regression Trees

CLASSIFICATION AND REGRESSION TREES (CART; Breiman, Friedman, Olshen and Stone, 1984) and RANDOM FORESTS (RF; Breiman, 2001) are the most popular single-tree and ensemble recursive partitioning methods, respectively.

Any tree-building process can be broadly described in three steps:
1. Choosing a criterion for making splitting decisions;
2. Generating a corresponding sequence of candidate trees;
3. Selecting the best candidate tree.

It is common to allow each step to depend on a given loss function. The prevailing software implementation of CART is the rpart package in R.
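Step 1 can be illustrated with the classical L2 splitting criterion. The sketch below (not the authors' credibility-loss criterion; names and data are ours) exhaustively searches one numeric covariate for the cutpoint that minimizes the summed squared error of the two child nodes:

```python
def sse(values):
    """Sum of squared errors around the node mean (the L2 node impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (cutpoint, loss) minimizing SSE(left) + SSE(right),
    splitting on x <= cut versus x > cut."""
    best = (None, float("inf"))
    for cut in sorted(set(x))[:-1]:  # every observed value is a candidate
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        loss = sse(left) + sse(right)
        if loss < best[1]:
            best = (cut, loss)
    return best

# Two clearly separated clusters: the split should land between 3 and 10.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
cut, loss = best_split(x, y)
```

Replacing `sse` with a credibility-based node loss is precisely the kind of substitution the regression tree credibility model makes in this first step.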
Regression Tree Credibility Model: Covariate-Dependent Model Setup

Consider a portfolio of I risks numbered 1, ..., I. Let Y_i = (Y_{i,1}, ..., Y_{i,n_i})^T be the vector of claim ratios, m_i = (m_{i,1}, ..., m_{i,n_i}) the corresponding weight vector, and X_i = (X_{i,1}, ..., X_{i,p})^T the covariate vector associated with individual risk i, i = 1, ..., I. The risk profile of each individual risk i is characterized by a scalar θ_i, which is a realization of a random element Θ_i. The following two conditions are further assumed:

H11. The triplets (Θ_1, Y_1, X_1), ..., (Θ_I, Y_I, X_I) are independent;

H12. Conditionally on Θ_i = θ_i and X_i = x_i, the entries Y_{i,j}, j = 1, ..., n_i, are independent with

E[Y_{i,j} | X_i = x_i, Θ_i = θ_i] = μ(x_i, θ_i)  and  Var[Y_{i,j} | X_i = x_i, Θ_i = θ_i] = σ²(x_i, θ_i)/m_{i,j}

for some unknown but deterministic functions μ(·, ·) and σ²(·, ·).
We approximate μ(X_i, Θ_i) and σ²(X_i, Θ_i)/m_{i,j} by

Σ_{k=1}^{K} 1{X_i ∈ A_k} μ_(k)(Θ_i)   and   Σ_{k=1}^{K} 1{X_i ∈ A_k} σ²_(k)(Θ_i)/m_{i,j},

where {A_1, A_2, ..., A_K} is a partition of the covariate space, and μ_(k)(Θ_i) and σ²_(k)(Θ_i) respectively represent the net premium and the variance of an individual risk i from the kth sub-collective with risk profile Θ_i, i.e.,

μ_(k)(θ_i) = E(Y_{i,j} | X_i ∈ A_k, Θ_i = θ_i)  and  σ²_(k)(θ_i) = Var(Y_{i,j} | X_i ∈ A_k, Θ_i = θ_i).

The condition X_i ∈ A_k means that individual risk i is classified into the kth sub-collective based on its covariate information.
Regression Tree Credibility Premium

We aim to find a "good" partition {A_1, A_2, ..., A_K} and apply the credibility formula to each sub-collective A_k separately. By "good", we mean minimizing the true prediction error E[(μ(Θ_i) − P_i)²] as much as possible.

Credibility regression trees:
- Adopt one of the four credibility loss functions with plugged-in estimates of the structural parameters;
- Use a heuristic longitudinal cross-validation.

The credibility formula is then applied for premium prediction within each terminal node separately; the result is the Regression Tree Credibility Premium.
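The per-terminal-node step can be sketched as follows. This is a simplified illustration (unit weights, known structural parameters; not the authors' full algorithm): within one terminal node, apply the homogeneous credibility formula using that node's own collective mean.

```python
def node_premiums(node_means, sigma2, tau2, n_years):
    """node_means: per-risk sample means Ȳ_i of the risks falling in one
    terminal node. With m_{i,j} = 1 for all years, m_i = n_years.
    Returns the within-node homogeneous credibility premiums."""
    alpha = n_years / (n_years + sigma2 / tau2)
    collective = sum(node_means) / len(node_means)  # node-level mean
    return [alpha * yb + (1 - alpha) * collective for yb in node_means]

# Two risks in one leaf, 5 years each; σ²/τ² = 5 gives α = 0.5, so each
# premium is shrunk halfway toward the leaf's collective mean of 1.0.
premiums = node_premiums([0.9, 1.1], sigma2=5.0, tau2=1.0, n_years=5)
```

The point of the partition is that shrinkage now happens toward a node-specific collective mean rather than the portfolio-wide one.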
Simulation Studies

Consider a collective of I = 300 individual risks. For each i = 1, ..., I, independently simulate the covariate vector

X_i = (X_{i,1}, ..., X_{i,p})  with entries i.i.d. U{1, ..., 100},

for p = 10 and p = 50.

Balanced claims model: individual risks with n = 5, 10 and 20 years of claims experience, respectively. An unbalanced claims model is also explored. 1,000 independent samples are drawn.
Simulation Scheme 1: interaction effect

For each i = 1, ..., I, independently simulate ε_{i,j} from a given distribution function F(·), for j = 1, ..., n, and define the n claims of individual risk i as

Y_{i,j} = e^{f(X_i)} + ε_{i,j},  j = 1, ..., n,   (9)

where

f(X_i) = 0.01 ( X_{i,1} + 2X_{i,2}/X_{i,3} + 2X_{i,1}X_{i,3}/(X_{i,2}X_{i,4}) ).   (10)

In our simulation, F takes one of the following three distributions: EXP(1.6487), LN(0, 1), PAR(3, ·).
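The data-generating step for one risk can be sketched in code. Two caveats: the fraction structure of f(X) in (10) is reconstructed from a garbled transcription and should be treated as an assumption, and we assume EXP(1.6487) is parameterized by its mean.

```python
import math
import random

def f(x):
    # f(X) per eq. (10); reconstructed fractions are an assumption
    return 0.01 * (x[0] + 2 * x[1] / x[2]
                   + 2 * x[0] * x[2] / (x[1] * x[3]))

def simulate_risk(n, p=10, rng=random):
    """One risk under Scheme 1 with exponential errors: covariates drawn
    i.i.d. from U{1,...,100}, claims Y_j = exp(f(X)) + ε_j."""
    X = [rng.randint(1, 100) for _ in range(p)]
    base = math.exp(f(X))
    # expovariate takes a rate, so mean 1.6487 means rate 1/1.6487
    return X, [base + rng.expovariate(1 / 1.6487) for _ in range(n)]

random.seed(1)
X, Y = simulate_risk(n=5)
```

Repeating this for i = 1, ..., 300 and over 1,000 independent samples reproduces the balanced-claims design described above.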
Simulation Scheme 2: interaction effect + heterogeneous variance

For each i = 1, ..., I, independently simulate ε_{i,j} from a distribution function F(·; X_i) which depends on the covariate X_i of individual risk i, for j = 1, ..., n, and define the n claims of individual risk i as

Y_{i,j} = e^{f(X_i)} + ε_{i,j},  j = 1, ..., n,   (11)

where f(X_i) is given by (10). We respectively consider three distinct distributions for F(·; X_i):

(1) EXP(e^{γ(X_i)/2}),  (2) LN(0, γ(X_i)),  (3) PAR(3, 2e^{γ(X_i)/2}),   (12)

where γ(X_i) = X_{i,1} / (X_{i,2} + X_{i,1}X_{i,2}).
Simulation Scheme 3: interaction effect + heterogeneous variance + multiplicative random effect

For each i = 1, ..., I,
(1) independently simulate the random effect variable Θ_i from the uniform distribution U(0.9, 1.1);
(2) independently simulate ε_{i,j} from a distribution function F(·; X_i) which depends on the covariates X_i associated with risk i, for j = 1, ..., n; and
(3) define the n claims of individual risk i as

Y_{i,j} = Θ_i [ e^{f(X_i)} + ε_{i,j} ],  j = 1, ..., n,   (13)

where f(X_i) is given by (10). We consider each of the three distributions described in Scheme 2 for F(·; X_i).
Simulation Scheme 4: interaction effect + heterogeneous variance + complex random effect structure

For each i = 1, ..., I, let Θ_i = (ξ_{i,1}, ξ_{i,2})^T, and
(1) independently simulate the random effect variables ξ_{i,1} and ξ_{i,2} from the uniform distribution U(0.9, 1.1);
(2) independently simulate ε_{i,j} from a distribution function F(·; X_i, ξ_{i,2}) for j = 1, ..., n; and
(3) define the n claims of individual risk i as

Y_{i,j} = e^{ξ_{i,1} f(X_i)} + ε_{i,j},  j = 1, ..., n,   (14)

where f(X_i) is defined in (10). We respectively consider the three distributions defined in equation (12) of Scheme 2, with γ(X_i) replaced by ξ_{i,2} γ(X_i), so that the distribution of ε_{i,j} depends on the random effect variable ξ_{i,2} in addition to the covariates X_i.
Methods Compared

Data-driven covariate-dependent partitioning: for each simulated sample, we grow and prune regression trees built using the four credibility loss functions (R1-R4) and the L2 loss function (RL2), and the best tree is selected using longitudinal cross-validation.

In addition, we consider ad hoc covariate-dependent partitioning, defined via the following notation:

R(X_j) = { {i ∈ I : X_{i,j} ≤ 50}, {i ∈ I : X_{i,j} > 50} },  j = 1, ..., 5,

and

R(X_{j1}, X_{j2}) = { {i ∈ I : X_{i,j1} ≤ 50, X_{i,j2} ≤ 50}, {i ∈ I : X_{i,j1} ≤ 50, X_{i,j2} > 50}, {i ∈ I : X_{i,j1} > 50, X_{i,j2} ≤ 50}, {i ∈ I : X_{i,j1} > 50, X_{i,j2} > 50} }

for j1, j2 = 1, ..., 5 with j1 ≠ j2. We consider R(X_2), R(X_4), R(X_1, X_2, X_3), R(X_1, X_2, X_4), R(X_2, X_3, X_4), R(X_1, X_3, X_4), and R(X_1, X_2, X_3, X_4).
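The ad hoc partitions extend to any number of covariates by grouping risks on which side of the cutpoint 50 each selected covariate falls. A minimal sketch (function name and example data are ours):

```python
from collections import defaultdict

def ad_hoc_partition(X, cols, cut=50):
    """X: list of covariate vectors (0-indexed columns); cols: covariates to
    split on. Returns {tuple of (X_j > cut) flags: list of risk indices},
    i.e. up to 2**len(cols) sub-collectives."""
    groups = defaultdict(list)
    for i, x in enumerate(X):
        key = tuple(x[j] > cut for j in cols)
        groups[key].append(i)
    return dict(groups)

# Four risks, split on the first two covariates: one risk per quadrant.
X = [[10, 80], [60, 20], [70, 90], [40, 40]]
parts = ad_hoc_partition(X, cols=[0, 1])
```

With cols of length 2 this reproduces the four cells of R(X_{j1}, X_{j2}) above.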
Evaluation Metric

Prediction error for a given partitioning {I_1, ..., I_K}:

PE = (1/I) Σ_{i=1}^{I} Σ_{k=1}^{K} 1{X_i ∈ I_k} ( π_i^{(H)(k)} − μ(X_i, Θ_i) )²,   (15)

where π_i^{(H)(k)} is the resulting premium prediction and μ(X_i, Θ_i) is the true net premium.

The collective prediction error:

PE_0 = (1/I) Σ_{i=1}^{I} ( P_i^{(H)} − μ(X_i, Θ_i) )²,

which does not use any covariate information and is anticipated to underperform compared to the various covariate-dependent partitionings.

The relative prediction error (RPE): R = PE/PE_0.
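The metric is a mean squared deviation of predicted premiums from the true net premiums, normalized by the same quantity for the no-partition predictions. A minimal sketch with illustrative numbers (names are ours):

```python
def prediction_error(pred, true_premium):
    """Mean squared error of premium predictions, as in (15)."""
    return sum((p - t) ** 2 for p, t in zip(pred, true_premium)) / len(pred)

def relative_prediction_error(pred, collective_pred, true_premium):
    """RPE = PE / PE_0; values below 1 mean the covariate-dependent
    partitioning beats the collective (no-partition) prediction."""
    return (prediction_error(pred, true_premium)
            / prediction_error(collective_pred, true_premium))

true_mu = [1.0, 2.0, 3.0]
rpe = relative_prediction_error([1.1, 2.1, 3.1], [2.0, 2.0, 2.0], true_mu)
```

Here the partition-based predictions miss each true premium by 0.1, while the collective prediction (a single constant) misses badly, so the RPE is far below 1.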
Simulation Results

(Results figures and tables are not reproduced in this transcription.)
Concluding Remarks

We propose a novel regression tree credibility (RTC) model, bringing machine learning techniques into the framework of credibility theory to enhance the prediction accuracy of the credibility premium.

In the proposed model, no ex ante analysis of the relationship between the individual net premium and the covariates is necessary: the designed regression tree algorithm automatically selects influential covariates and informative cutpoints to form a partition of the data space, upon which a well-performing premium prediction rule can be established.

Our simulation studies and data analysis show that the proposed RTC model performs very well compared to no partitioning, ad hoc partitioning, and the L2-loss-based binary partition procedure.
Although only Classification and Regression Trees are considered in this paper, it would be fruitful to pursue further research with other recursive partitioning methods, e.g., partDSA (partitioning deletion/substitution/addition algorithm) and MARS (multivariate adaptive regression splines).

It would be even more promising to consider applications of ensemble algorithms, such as bagging, boosting, and random forests.

It would be useful to develop an algorithm that can accommodate time-dependent covariates.

It would be even more fruitful to consider applications in various other insurance problems beyond premium rating, since many practical insurance problems amount to quantifying the relationship between insureds' claims and their demographic information.

It is the authors' hope that the present paper will stimulate more actuarial applications of these machine learning techniques and eventually contribute to the development of insurance predictive analytics in general.
More informationAdaptive Crowdsourcing via EM with Prior
Adaptive Crowdsourcing via EM with Prior Peter Maginnis and Tanmay Gupta May, 205 In this work, we make two primary contributions: derivation of the EM update for the shifted and rescaled beta prior and
More informationRegression tree for functional response : application in oceanology
Regression tree for functional response : application in oceanology David Nerini (a) Badih Ghattas (b) (a) Centre d Océanologie de Marseille UMR LMGEM 6117 CNRS Campus de Luminy, Case 901 13288 MARSEILLE
More informationStat 587: Key points and formulae Week 15
Odds ratios to compare two proportions: Difference, p 1 p 2, has issues when applied to many populations Vit. C: P[cold Placebo] = 0.82, P[cold Vit. C] = 0.74, Estimated diff. is 8% What if a year or place
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationCSE 151 Machine Learning. Instructor: Kamalika Chaudhuri
CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Ensemble Learning How to combine multiple classifiers into a single one Works well if the classifiers are complementary This class: two types of
More informationCS6375: Machine Learning Gautam Kunapuli. Decision Trees
Gautam Kunapuli Example: Restaurant Recommendation Example: Develop a model to recommend restaurants to users depending on their past dining experiences. Here, the features are cost (x ) and the user s
More information15-388/688 - Practical Data Science: Decision trees and interpretable models. J. Zico Kolter Carnegie Mellon University Spring 2018
15-388/688 - Practical Data Science: Decision trees and interpretable models J. Zico Kolter Carnegie Mellon University Spring 2018 1 Outline Decision trees Training (classification) decision trees Interpreting
More informationDimension Reduction Using Rule Ensemble Machine Learning Methods: A Numerical Study of Three Ensemble Methods
Dimension Reduction Using Rule Ensemble Machine Learning Methods: A Numerical Study of Three Ensemble Methods Orianna DeMasi, Juan Meza, David H. Bailey Lawrence Berkeley National Laboratory 1 Cyclotron
More informationMonitoring of Mineral Processing Operations based on Multivariate Similarity Indices
Monitoring of Mineral Processing Operations based on Multivariate Similarity Indices L. Auret, C. Aldrich* Department of Process Engineering, University of Stellenbosch, Private Bag X1, Matieland 7602,
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationBINARY TREE-STRUCTURED PARTITION AND CLASSIFICATION SCHEMES
BINARY TREE-STRUCTURED PARTITION AND CLASSIFICATION SCHEMES DAVID MCDIARMID Abstract Binary tree-structured partition and classification schemes are a class of nonparametric tree-based approaches to classification
More informationChapter 6: Classification
Chapter 6: Classification 1) Introduction Classification problem, evaluation of classifiers, prediction 2) Bayesian Classifiers Bayes classifier, naive Bayes classifier, applications 3) Linear discriminant
More informationJEROME H. FRIEDMAN Department of Statistics and Stanford Linear Accelerator Center, Stanford University, Stanford, CA
1 SEPARATING SIGNAL FROM BACKGROUND USING ENSEMBLES OF RULES JEROME H. FRIEDMAN Department of Statistics and Stanford Linear Accelerator Center, Stanford University, Stanford, CA 94305 E-mail: jhf@stanford.edu
More informationarxiv: v5 [stat.me] 18 Apr 2016
Correlation and variable importance in random forests Baptiste Gregorutti 12, Bertrand Michel 2, Philippe Saint-Pierre 2 1 Safety Line 15 rue Jean-Baptiste Berlier, 75013 Paris, France arxiv:1310.5726v5
More informationBayesian causal forests: dealing with regularization induced confounding and shrinking towards homogeneous effects
Bayesian causal forests: dealing with regularization induced confounding and shrinking towards homogeneous effects P. Richard Hahn, Jared Murray, and Carlos Carvalho July 29, 2018 Regularization induced
More informationPerformance of Cross Validation in Tree-Based Models
Performance of Cross Validation in Tree-Based Models Seoung Bum Kim, Xiaoming Huo, Kwok-Leung Tsui School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, Georgia 30332 {sbkim,xiaoming,ktsui}@isye.gatech.edu
More informationUnderstanding Generalization Error: Bounds and Decompositions
CIS 520: Machine Learning Spring 2018: Lecture 11 Understanding Generalization Error: Bounds and Decompositions Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the
More informationBoosting Methods: Why They Can Be Useful for High-Dimensional Data
New URL: http://www.r-project.org/conferences/dsc-2003/ Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) March 20 22, Vienna, Austria ISSN 1609-395X Kurt Hornik,
More informationLossless Online Bayesian Bagging
Lossless Online Bayesian Bagging Herbert K. H. Lee ISDS Duke University Box 90251 Durham, NC 27708 herbie@isds.duke.edu Merlise A. Clyde ISDS Duke University Box 90251 Durham, NC 27708 clyde@isds.duke.edu
More informationAdvanced Statistical Methods: Beyond Linear Regression
Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 3. Statistical Methods II Mathematics Educators Worshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi
More informationLearning with multiple models. Boosting.
CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models
More informationBayesian decision theory Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory
Bayesian decision theory 8001652 Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory Jussi Tohka jussi.tohka@tut.fi Institute of Signal Processing Tampere University of Technology
More informationStructured Problems and Algorithms
Integer and quadratic optimization problems Dept. of Engg. and Comp. Sci., Univ. of Cal., Davis Aug. 13, 2010 Table of contents Outline 1 2 3 Benefits of Structured Problems Optimization problems may become
More informationPattern Recognition Approaches to Solving Combinatorial Problems in Free Groups
Contemporary Mathematics Pattern Recognition Approaches to Solving Combinatorial Problems in Free Groups Robert M. Haralick, Alex D. Miasnikov, and Alexei G. Myasnikov Abstract. We review some basic methodologies
More informationAsset Pricing. Chapter IX. The Consumption Capital Asset Pricing Model. June 20, 2006
Chapter IX. The Consumption Capital Model June 20, 2006 The Representative Agent Hypothesis and its Notion of Equilibrium 9.2.1 An infinitely lived Representative Agent Avoid terminal period problem Equivalence
More informationHarrison B. Prosper. Bari Lectures
Harrison B. Prosper Florida State University Bari Lectures 30, 31 May, 1 June 2016 Lectures on Multivariate Methods Harrison B. Prosper Bari, 2016 1 h Lecture 1 h Introduction h Classification h Grid Searches
More informationGradient Boosting (Continued)
Gradient Boosting (Continued) David Rosenberg New York University April 4, 2016 David Rosenberg (New York University) DS-GA 1003 April 4, 2016 1 / 31 Boosting Fits an Additive Model Boosting Fits an Additive
More information