Unbiased Recursive Partitioning I: A Non-parametric Conditional Inference Framework
1 Unbiased Recursive Partitioning I: A Non-parametric Conditional Inference Framework
Torsten Hothorn (1), Kurt Hornik (2) and Achim Zeileis (2)
(1) Friedrich-Alexander-Universität Erlangen-Nürnberg
(2) Wirtschaftsuniversität Wien
2 Recursive Partitioning
Morgan and Sonquist (1963): Automated Interaction Detection. Many variants have been (and still are) published, the majority of which are special cases of a simple two-stage algorithm:
Step 1: partition the observations by univariate splits in a recursive way.
Step 2: fit a constant model in each cell of the resulting partition.
Most prominent representatives: CART (Breiman et al., 1984) and C4.5 (Quinlan, 1993), both implementing an exhaustive search.
3 Two Fundamental Problems
Overfitting: Mingers (1987) notes that the algorithm "[...] has no concept of statistical significance, and so cannot distinguish between a significant and an insignificant improvement in the information measure."
Selection Bias: Exhaustive search methods suffer from a severe selection bias towards covariates with many possible splits and/or missing values.
4 A Statistical Approach
We enter at the point where White and Liu (1994) demand "[...] a statistical approach [to recursive partitioning] which takes into account the distributional properties of the measures" and present a unified framework embedding recursive binary partitioning into the well-defined theory of
Part I: permutation tests developed by Strasser and Weber (1999),
Part II: tests for parameter instability in (parametric) regression models.
5 The Regression Problem
The distribution of a (possibly multivariate) response Y ∈ 𝒴 is to be regressed on an m-dimensional covariate vector X = (X_1, ..., X_m) ∈ 𝒳 = 𝒳_1 × ... × 𝒳_m:
D(Y | X) = D(Y | X_1, ..., X_m) = D(Y | f(X_1, ..., X_m)),
based on a learning sample of n observations
L_n = {(Y_i, X_1i, ..., X_mi); i = 1, ..., n},
possibly with case weights w = (w_1, ..., w_n). We are interested in estimating f.
6 A Generic Algorithm
1. For case weights w, test the global null hypothesis of independence between any of the m covariates and the response. Stop if this hypothesis cannot be rejected. Otherwise select the covariate X_j* with strongest association to Y.
2. Choose a set A* ⊂ 𝒳_j* in order to split 𝒳_j* into two disjoint sets A* and 𝒳_j* \ A*. The case weights w_left and w_right determine the two subgroups with w_left,i = w_i I(X_j*i ∈ A*) and w_right,i = w_i I(X_j*i ∉ A*) for all i = 1, ..., n (I(·) denotes the indicator function).
3. Recursively repeat steps 1 and 2 with modified case weights w_left and w_right, respectively.
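The three steps above can be sketched in much simplified, hypothetical Python. The permutation test and the median split used here are illustrative stand-ins for the conditional inference machinery developed on the following slides; all names and the Bonferroni adjustment are assumptions, not the talk's exact construction:

```python
import random

ALPHA = 0.05  # pre-specified significance level (an assumed choice)

def p_value(x, y, n_perm=999):
    """Monte Carlo permutation test of independence between x and y,
    using the simple correlation-type statistic T = sum_i x_i * y_i.
    An illustrative stand-in for the conditional tests defined later."""
    obs = abs(sum(a * b for a, b in zip(x, y)))
    hits = 0
    for _ in range(n_perm):
        yp = random.sample(y, len(y))  # a random permutation of y
        if abs(sum(a * b for a, b in zip(x, yp))) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def partition(X_cols, y, w):
    """One recursion of the generic algorithm on numeric data.
    X_cols is a list of covariate columns, w the case weights."""
    idx = [i for i in range(len(y)) if w[i] > 0]
    if len(idx) < 2:
        return None
    # Step 1: global independence test via the Bonferroni-adjusted
    # minimum p-value; stop if it cannot be rejected.
    pvals = [p_value([col[i] for i in idx], [y[i] for i in idx])
             for col in X_cols]
    if min(pvals) * len(X_cols) > ALPHA:
        return None  # terminal node
    j = pvals.index(min(pvals))  # covariate with strongest association
    # Step 2: choose a split set A (here simply the lower half of the
    # observed values, a placeholder for the two-sample search).
    cut = sorted(X_cols[j][i] for i in idx)[len(idx) // 2]
    w_left = [wi if X_cols[j][i] <= cut else 0 for i, wi in enumerate(w)]
    w_right = [wi if X_cols[j][i] > cut else 0 for i, wi in enumerate(w)]
    if not any(w_right):
        return None  # degenerate split: all observations on one side
    # Step 3: recurse with the modified case weights.
    return (j, cut,
            partition(X_cols, y, w_left),
            partition(X_cols, y, w_right))
```

Note how subgroups are encoded purely by zeroing case weights, exactly as in steps 2 and 3 above, so the covariate columns are never physically subset.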
7 Recursive Partitioning by Conditional Inference
In each node identified by case weights w, the global hypothesis of independence is formulated in terms of the m partial hypotheses
H_0^j: D(Y | X_j) = D(Y),
with global null hypothesis H_0 = ∩_{j=1}^m H_0^j. Stop the recursion when H_0 cannot be rejected at a pre-specified level α. Otherwise: measure the association between Y and each of the covariates X_j, j = 1, ..., m, by test statistics or P-values indicating the deviation from the partial hypotheses H_0^j.
8 Linear Statistics
Use a (multivariate) linear statistic
T_j(L_n, w) = vec( Σ_{i=1}^n w_i g_j(X_ji) h(Y_i, (Y_1, ..., Y_n))^⊤ ) ∈ ℝ^{p_j q},
with g_j: 𝒳_j → ℝ^{p_j} a transformation of X_j and influence function h: 𝒴 × 𝒴^n → ℝ^q. This type of statistic was suggested by Strasser and Weber (1999).
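For the simplest case — a numeric covariate and a numeric response, with g_j and h both the identity so that p_j = q = 1 — the linear statistic collapses to a single weighted sum of products. A minimal sketch:

```python
def linear_statistic(x, y, w):
    """T_j(L_n, w) with g_j = h = identity (p_j = q = 1):
    a weighted sum of covariate-response products."""
    return sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
```

For a categorical covariate, g_j would instead map each level to a dummy vector, so T_j collects the group-wise sums of h(Y_i).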
9 Conditional Expectation and Covariance under H_0^j
μ_j = E(T_j(L_n, w) | S(L_n, w)) = vec( ( Σ_{i=1}^n w_i g_j(X_ji) ) E(h | S(L_n, w))^⊤ ),
Σ_j = V(T_j(L_n, w) | S(L_n, w))
    = (w· / (w· − 1)) V(h | S(L_n, w)) ⊗ ( Σ_i w_i g_j(X_ji) g_j(X_ji)^⊤ )
    − (1 / (w· − 1)) V(h | S(L_n, w)) ⊗ ( Σ_i w_i g_j(X_ji) ) ( Σ_i w_i g_j(X_ji) )^⊤,
with w· = Σ_{i=1}^n w_i.
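Specialised to the scalar case (p_j = q = 1, g_j = h = identity), the expectation and covariance reduce to ordinary weighted moments; a sketch:

```python
def mu_sigma(x, y, w):
    """Conditional mean and variance of the scalar linear statistic
    T = sum_i w_i x_i y_i under the permutation (independence)
    hypothesis, specialising the general formulas to p_j = q = 1."""
    wdot = sum(w)
    Eh = sum(wi * yi for wi, yi in zip(w, y)) / wdot              # E(h | S)
    Vh = sum(wi * (yi - Eh) ** 2 for wi, yi in zip(w, y)) / wdot  # V(h | S)
    sx = sum(wi * xi for wi, xi in zip(w, x))        # sum_i w_i g(x_i)
    sxx = sum(wi * xi ** 2 for wi, xi in zip(w, x))  # sum_i w_i g(x_i)^2
    mu = sx * Eh
    sigma = wdot / (wdot - 1) * Vh * sxx - 1 / (wdot - 1) * Vh * sx ** 2
    return mu, sigma
```

As a sanity check, for unit weights this simplifies to the classical permutation variance Σ(x_i − x̄)² · Σ(y_i − ȳ)² / (n − 1).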
10 Test Statistics
A (multivariate) linear statistic T_j can now be used to construct a test statistic for testing H_0^j, for example via
c_max(t, μ, Σ) = max_{k=1,...,pq} | (t − μ)_k | / sqrt( (Σ)_kk )
or
c_quad(t, μ, Σ) = (t − μ) Σ⁺ (t − μ)^⊤,
where Σ⁺ denotes the Moore-Penrose inverse of Σ.
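In the scalar case (pq = 1) both statistics reduce to functions of a single standardized deviate; the general c_quad would additionally require the Moore-Penrose inverse Σ⁺. A sketch of the scalar versions:

```python
import math

def c_max(t, mu, sigma):
    """Scalar version (pq = 1) of c_max: the absolute standardized
    statistic. In general the maximum is taken over all pq components
    (t - mu)_k / sqrt(Sigma_kk)."""
    return abs(t - mu) / math.sqrt(sigma)

def c_quad(t, mu, sigma):
    """Scalar version of c_quad: (t - mu) Sigma^+ (t - mu)^T collapses
    to the squared standardized statistic, i.e. c_max(...) ** 2."""
    return (t - mu) ** 2 / sigma
```

Both are monotone in each other here, so in the scalar case they induce the same permutation test.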
11 Variable Selection and Stopping Criteria
Test H_0 based on the P-values P_1, ..., P_m, where
P_j = P_{H_0^j}( c(T_j(L_n, w), μ_j, Σ_j) ≥ c(t_j, μ_j, Σ_j) | S(L_n, w) )
is computed conditional on all permutations S(L_n, w) of the data. This solves the overfitting problem.
When we can reject H_0 in step 1 of the generic algorithm, we select the covariate X_j* with minimum P-value, j* = argmin_{j=1,...,m} P_j, of the conditional tests for the H_0^j. This prevents a variable selection bias.
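A Monte Carlo approximation of such a conditional P-value, for the scalar statistic with unit weights, can be sketched as follows. The talk uses the exact (asymptotic) conditional distribution over all permutations; sampling permutations is a common surrogate, and the statistic choice here is an assumption for illustration:

```python
import math
import random

def perm_pvalue(x, y, n_perm=9999, seed=0):
    """Monte Carlo approximation of
    P( c(T, mu, Sigma) >= observed | S(L_n, w) )
    for T = sum_i x_i y_i with c = c_max and unit case weights."""
    rng = random.Random(seed)
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    # mu and sigma under the permutation hypothesis (unit weights)
    mu = sum(x) * ybar
    sigma = (sum((xi - xbar) ** 2 for xi in x)
             * sum((yi - ybar) ** 2 for yi in y)) / (n - 1)

    def c(stat):
        return abs(stat - mu) / math.sqrt(sigma)

    obs = c(sum(xi * yi for xi, yi in zip(x, y)))
    yp = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(yp)  # draw one permutation of the responses
        if c(sum(xi * yi for xi, yi in zip(x, yp))) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

For strongly dependent x and y the approximated P-value is small; for unrelated inputs it is close to uniform on (0, 1).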
12 Splitting Criteria
The goodness of a split is evaluated by two-sample linear statistics which are special cases of the linear statistic T_j. For all possible subsets A of the sample space 𝒳_j, the linear statistic
T_j^A(L_n, w) = vec( Σ_{i=1}^n w_i I(X_ji ∈ A) h(Y_i, (Y_1, ..., Y_n))^⊤ )
induces a two-sample statistic, and we implement the best split
A* = argmax_A c(t_j^A, μ_j^A, Σ_j^A).
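For a numeric covariate the subsets A can be restricted to half-intervals {x ≤ cut}. A sketch of the resulting search, again for the scalar influence function h = identity (g is the indicator I(x_i ∈ A), so the moment formulas simplify because g² = g):

```python
import math

def best_split(x, y, w):
    """Select A* = argmax_A c(T_j^A, mu_j^A, Sigma_j^A) for a numeric
    covariate, with A ranging over the splits {x <= cut} (sketch)."""
    wdot = sum(w)
    Eh = sum(wi * yi for wi, yi in zip(w, y)) / wdot
    Vh = sum(wi * (yi - Eh) ** 2 for wi, yi in zip(w, y)) / wdot
    best_cut, best_c = None, -1.0
    for cut in sorted(set(x))[:-1]:  # the largest value cannot split
        sg = sum(wi for wi, xi in zip(w, x) if xi <= cut)  # sum w_i I(x_i in A)
        t = sum(wi * yi for wi, xi, yi in zip(w, x, y) if xi <= cut)
        mu = sg * Eh
        # indicator g: sum w g^2 = sum w g = sg
        sigma = wdot / (wdot - 1) * Vh * sg - 1 / (wdot - 1) * Vh * sg ** 2
        c = abs(t - mu) / math.sqrt(sigma)
        if c > best_c:
            best_cut, best_c = cut, c
    return best_cut, best_c
```

Because every candidate split is scored on the same standardized scale, splits at unbalanced cut points are not systematically favoured, unlike impurity-based exhaustive search.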
13 Examples: Tree Pipit Abundance
The occurrence of tree pipits was recorded several times at n = 86 stands and the impact of nine environmental factors on the abundance is of special interest here.
R> library("party")
R> data("treepipit", package = "coin")
R> tptree <- ctree(counts ~ ., data = treepipit)
R> plot(tptree, terminal_panel = node_hist(tptree,
+      breaks = 0:6 - 0.5, ymax = 65, horizontal = FALSE,
+      freq = TRUE))
14 [Figure: fitted tree for the tree pipit data — a single split in coverstorey into two terminal nodes, Node 2 and Node 3, displayed as histograms of the counts]
15 Examples: Glaucoma & Laser Scanning Images
Laser scanning images taken from the eye background are expected to serve as the basis of an automated system for glaucoma diagnosis. For 98 patients and 98 controls, matched by age and gender, 62 covariates describing the eye morphology are available.
R> data("GlaucomaM", package = "ipred")
R> gtree <- ctree(Class ~ ., data = GlaucomaM)
R> plot(gtree)
16 [Figure: fitted tree for the glaucoma data — splits in vari, vasg and vart leading to four terminal nodes (Nodes 4-7), each displayed as a bar plot of the glaucoma/normal class distribution]
17 Examples: Glaucoma & Laser Scanning Images Interested in the class distribution in each inner node? Want to explore the process of the split statistics in each inner node? R> plot(gtree, inner_panel = node_barplot)
18 [Figure: the same tree drawn with bar-plot inner panels — Nodes 1-3 show the class distribution and split P-values for vari, vasg and vart]
19 Examples: Glaucoma & Laser Scanning Images
R> cex <- 1.6
R> inner <- nodes(gtree, 1:3)
R> layout(matrix(1:length(inner), ncol = length(inner)))
R> out <- sapply(inner, function(i) {
+      splitstat <- i$psplit$splitstatistic
+      x <- GlaucomaM[[i$psplit$variableName]][splitstat > 0]
+      plot(x, splitstat[splitstat > 0], main = paste("Node",
+          i$nodeID), xlab = i$psplit$variableName,
+          ylab = "Statistic", ylim = c(0, 10),
+          cex.axis = cex, cex.lab = cex, cex.main = cex)
+      abline(v = i$psplit$splitpoint, lty = 3)
+  })
20 [Figure: split statistic curves for vari (Node 1), vasg (Node 2) and vart (Node 3); the selected split points are marked by dotted vertical lines]
21 Examples: Node Positive Breast Cancer
Evaluation of prognostic factors for the German Breast Cancer Study Group (GBSG2) data, a prospective controlled clinical trial on the treatment of node positive breast cancer patients. Complete data on seven prognostic factors of 686 women are used for prognostic modeling.
R> data("GBSG2", package = "ipred")
R> stree <- ctree(Surv(time, cens) ~ ., data = GBSG2)
R> plot(stree)
22 [Figure: fitted survival tree for the GBSG2 data — splits in pnodes, horTh and progrec leading to four terminal nodes (Nodes 3, 4, 6 and 7), each displayed as a survival curve]
23 Examples: Mammography Experience
Ordinal response variables are common in investigations where the response is a subjective human interpretation. We use an example given by Hosmer and Lemeshow (2000), p. 264, studying the relationship between the mammography experience (never, within a year, over one year) and opinions about mammography expressed in questionnaires answered by n = 412 women.
R> data("mammoexp", package = "party")
R> mtree <- ctree(ME ~ ., data = mammoexp, scores = list(ME = 1:3,
+      SYMPT = 1:4, DECT = 1:3))
R> plot(mtree)
24 [Figure: fitted tree for the mammography data — splits in SYMPT and PB leading to three terminal nodes (Nodes 2, 4 and 5), each showing the distribution over Never / Within a Year / Over a Year]
25 Benchmark Experiments
Hypothesis 1: Conditional inference trees with statistical stop criterion perform as well as an exhaustive search algorithm with pruning.
Hypothesis 2: Conditional inference trees with statistical stop criterion perform as well as parametric unbiased recursive partitioning (QUEST, GUIDE; Loh, 2002, is a starting point).
Equivalence is measured by the ratio of misclassification or mean squared error, with an equivalence margin of ±10% and Fieller confidence intervals.
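The equivalence assessment can be sketched as follows: given paired performance samples for two algorithms (e.g. error estimates over benchmark replications), a Fieller interval for the ratio of their mean performances is compared against the ±10% margin. A rough stdlib sketch under a normal approximation; the z value 1.96 and the paired-sample layout are assumptions, not the talk's exact setup:

```python
import math
import statistics

def fieller_ci(a, b, z=1.96):
    """Fieller confidence interval for the ratio mean(a) / mean(b) of
    paired performance samples, assuming approximate normality of the
    two sample means (sketch)."""
    n = len(a)
    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    v1 = statistics.variance(a) / n           # variance of mean(a)
    v2 = statistics.variance(b) / n           # variance of mean(b)
    v12 = sum((ai - m1) * (bi - m2)
              for ai, bi in zip(a, b)) / (n - 1) / n  # covariance of means
    # Interval limits are the roots of
    # (m2^2 - z^2 v2) r^2 - 2 (m1 m2 - z^2 v12) r + (m1^2 - z^2 v1) = 0.
    A = m2 ** 2 - z ** 2 * v2
    B = -2 * (m1 * m2 - z ** 2 * v12)
    C = m1 ** 2 - z ** 2 * v1
    disc = B ** 2 - 4 * A * C
    if A <= 0 or disc < 0:
        return None  # denominator mean not clearly away from 0: unbounded CI
    lo = (-B - math.sqrt(disc)) / (2 * A)
    hi = (-B + math.sqrt(disc)) / (2 * A)
    return lo, hi
```

If the whole interval lies inside (0.9, 1.1), equivalence at the ±10% margin would be concluded for that data set.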
26 [Figure: performance ratios of ctree vs. rpart, with Fieller confidence intervals, for the Boston Housing, Breast Cancer, Diabetes, Glass, Glaucoma, Ionosphere, Ozone, Servo, Sonar, Soybean, Vehicle and Vowel data sets]
27 [Figure: performance ratios of ctree vs. QUEST / GUIDE, with Fieller confidence intervals, for the same twelve data sets]
28 Summary
The separation of variable selection and split point estimation, first implemented in CHAID (Kass, 1980), is the basis for unbiased recursive partitioning for responses and covariates measured at arbitrary scales.
The statistical internal stop criterion ensures that interpretations drawn from such trees are valid in a statistical sense, i.e., with appropriate control of type I errors.
Even though the algorithm has no concept of prediction error, its performance is at least equivalent to established procedures.
We are committed to reproducible research, see
R> vignette(package = "party")
29 References
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, California, 1984.
David W. Hosmer and Stanley Lemeshow. Applied Logistic Regression. John Wiley & Sons, New York, 2nd edition, 2000.
G. V. Kass. An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29(2):119-127, 1980.
Wei-Yin Loh. Regression trees with unbiased variable selection and interaction detection. Statistica Sinica, 12:361-386, 2002.
John Mingers. Expert systems - rule induction with statistical data. Journal of the Operational Research Society, 38(1):39-47, 1987.
James N. Morgan and John A. Sonquist. Problems in the analysis of survey data, and a proposal. Journal of the American Statistical Association, 58:415-434, 1963.
John R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, California, 1993.
Helmut Strasser and Christian Weber. On the asymptotic theory of permutation statistics. Mathematical Methods of Statistics, 8:220-250, 1999.
Allan P. White and Wei Zhong Liu. Bias in information-based measures in decision tree induction. Machine Learning, 15:321-329, 1994.
More informationday month year documentname/initials 1
ECE471-571 Pattern Recognition Lecture 13 Decision Tree Hairong Qi, Gonzalez Family Professor Electrical Engineering and Computer Science University of Tennessee, Knoxville http://www.eecs.utk.edu/faculty/qi
More informationCSCI 5622 Machine Learning
CSCI 5622 Machine Learning DATE READ DUE Mon, Aug 31 1, 2 & 3 Wed, Sept 2 3 & 5 Wed, Sept 9 TBA Prelim Proposal www.rodneynielsen.com/teaching/csci5622f09/ Instructor: Rodney Nielsen Assistant Professor
More informationPackage CoxRidge. February 27, 2015
Type Package Title Cox Models with Dynamic Ridge Penalties Version 0.9.2 Date 2015-02-12 Package CoxRidge February 27, 2015 Author Aris Perperoglou Maintainer Aris Perperoglou
More informationDecision Trees (Cont.)
Decision Trees (Cont.) R&N Chapter 18.2,18.3 Side example with discrete (categorical) attributes: Predicting age (3 values: less than 30, 30-45, more than 45 yrs old) from census data. Attributes (split
More informationBias Correction in Classification Tree Construction ICML 2001
Bias Correction in Classification Tree Construction ICML 21 Alin Dobra Johannes Gehrke Department of Computer Science Cornell University December 15, 21 Classification Tree Construction Outlook Temp. Humidity
More informationThe Lady Tasting Tea. How to deal with multiple testing. Need to explore many models. More Predictive Modeling
The Lady Tasting Tea More Predictive Modeling R. A. Fisher & the Lady B. Muriel Bristol claimed she prefers tea added to milk rather than milk added to tea Fisher was skeptical that she could distinguish
More informationKernel Logistic Regression and the Import Vector Machine
Kernel Logistic Regression and the Import Vector Machine Ji Zhu and Trevor Hastie Journal of Computational and Graphical Statistics, 2005 Presented by Mingtao Ding Duke University December 8, 2011 Mingtao
More informationData Mining. 3.6 Regression Analysis. Fall Instructor: Dr. Masoud Yaghini. Numeric Prediction
Data Mining 3.6 Regression Analysis Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Straight-Line Linear Regression Multiple Linear Regression Other Regression Models References Introduction
More informationFundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur
Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new
More informationDetection of Uniform and Non-Uniform Differential Item Functioning by Item Focussed Trees
arxiv:1511.07178v1 [stat.me] 23 Nov 2015 Detection of Uniform and Non-Uniform Differential Functioning by Focussed Trees Moritz Berger & Gerhard Tutz Ludwig-Maximilians-Universität München Akademiestraße
More informationEstimating the accuracy of a hypothesis Setting. Assume a binary classification setting
Estimating the accuracy of a hypothesis Setting Assume a binary classification setting Assume input/output pairs (x, y) are sampled from an unknown probability distribution D = p(x, y) Train a binary classifier
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Learning = improving with experience Improve over task T (e.g, Classification, control tasks) with respect
More informationAdaptive Trial Designs
Adaptive Trial Designs Wenjing Zheng, Ph.D. Methods Core Seminar Center for AIDS Prevention Studies University of California, San Francisco Nov. 17 th, 2015 Trial Design! Ethical:!eg.! Safety!! Efficacy!
More informationPerformance Evaluation and Comparison
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Cross Validation and Resampling 3 Interval Estimation
More informationthe tree till a class assignment is reached
Decision Trees Decision Tree for Playing Tennis Prediction is done by sending the example down Prediction is done by sending the example down the tree till a class assignment is reached Definitions Internal
More informationMachine Learning 3. week
Machine Learning 3. week Entropy Decision Trees ID3 C4.5 Classification and Regression Trees (CART) 1 What is Decision Tree As a short description, decision tree is a data classification procedure which
More informationA Bayesian approach to estimate probabilities in classification trees
A Bayesian approach to estimate probabilities in classification trees Andrés Cano and Andrés R. Masegosa and Serafín Moral Department of Computer Science and Artificial Intelligence University of Granada
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More information