Introduction to Machine Learning 2008/B Assignment 5


Michael Orlov
Department of Computer Science
orlovm@cs.bgu.ac.il

September 6, 2008

Abstract

Submission of Assignment 5 in Introduction to Machine Learning.

Question 1

You are given the following data set: 0.04, 0.08, 0.12, 0.23, 0.44, 0.78, 0.79, 0.88, 0.91.

Question 1(a)

Build a histogram with h = 0.5 and with h = 0.1.

The naive estimator is given by

    \hat{p}(x) = \frac{|\{x^t : x - h < x^t \le x + h\}|}{2Nh}

for bin width h and samples \{x^t\}_{t=1}^{N}. It can also be written as

    \hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right),
    \qquad
    w(u) = \begin{cases} 1/2 & \text{if } |u| < 1, \\ 0 & \text{otherwise.} \end{cases}

This reformulation changes the x^t \le x + h condition to x^t < x + h; however, it changes the value of \hat{p}(x) at only a finite number of points, and it is still a probability density function. For implementation purposes, we note that w(u) = (1 - \lfloor \min(|u|, 1) \rfloor) / 2.

The resulting histograms are shown in Fig. 1 and Fig. 2. (Perhaps the question asked for a histogram with fixed bin points; since the naive estimator is more general than that, I am leaving the answer as-is.)
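For illustration, the naive estimate at a single point can be evaluated directly. The following is a minimal Octave sketch in the style of the listing in Appendix A (the helper name naive_at is illustrative, not part of that listing):

xt = [ 0.04 0.08 0.12 0.23 0.44 0.78 0.79 0.88 0.91 ];

function retval = naive_at(h, xt, x)
  % naive estimate at a single point x for bin width h,
  % with w(u) = 1/2 for |u| < 1 and 0 otherwise
  w = (1 - floor(min(abs((x - xt) / h), 1))) / 2;
  retval = sum(w) / (length(xt) * h);
endfunction

naive_at(0.5, xt, 0.2)   % 5/(2*9*0.5) = 0.5556  (5 samples within 0.5 of x)
naive_at(0.1, xt, 0.2)   % 2/(2*9*0.1) = 1.1111  (2 samples within 0.1 of x)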

[Figure 1: naive estimate \hat{p}(x) against x on [-0.5, 1.5], with the samples marked.]
Figure 1: Histogram with h = 0.5.

[Figure 2: naive estimate \hat{p}(x) against x on [-0.5, 1.5], with the samples marked.]
Figure 2: Histogram with h = 0.1.

[Figure 3: kernel estimate \hat{p}(x) against x on [-0.5, 1.5], with the samples marked.]
Figure 3: Kernel estimate with h = 0.2.

Question 1(b)

Using a Gaussian kernel estimator with h = 0.2, what is the probability density at 0.2?

In the Gaussian kernel estimator, we use K(u) instead of the w(u) of the previous question,

    K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}.

Substituting K(u) for w(u), we have \hat{p}(0.2) \approx 0.88495 for h = 0.2. Fig. 3 shows the complete kernel estimate.

Question 1(c)

Using k-nearest neighbor with k = 2, what is the probability density at 0.2?

The k-nearest neighbor density estimate is given by

    \hat{p}(x) = \frac{k}{2N d_k(x)}.

For k = 2, we have

    \hat{p}(x) = \frac{2}{2N d_2(x)} = \frac{1}{9 d_2(x)},

where d_2(x) is the distance from x to the second-closest sample. For x = 0.2, the closest sample is 0.23 and the second-closest sample is 0.12, therefore

    \hat{p}(0.2) = \frac{1}{9 \cdot |0.2 - 0.12|} = \frac{1}{9 \cdot 0.08} \approx 1.38889.
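The k-NN estimate can be checked numerically in the same spirit (a minimal Octave sketch; knn_density is an illustrative helper, not part of the Appendix A listing):

xt = [ 0.04 0.08 0.12 0.23 0.44 0.78 0.79 0.88 0.91 ];

function retval = knn_density(k, xt, x)
  % k-NN density estimate p(x) = k / (2 N d_k(x)),
  % where d_k(x) is the distance from x to its k-th closest sample
  d = sort(abs(x - xt));
  retval = k / (2 * length(xt) * d(k));
endfunction

knn_density(2, xt, 0.2)   % 2/(2*9*0.08) = 1.3889, since d_2(0.2) = |0.2 - 0.12| = 0.08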

Question 1(d)

Using k-nearest neighbor with k = 2 and a Gaussian kernel, what is the probability density at 0.2?

With a Gaussian kernel, the probability density is given by

    \hat{p}(x) = \frac{1}{N d_k(x)} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{d_k(x)}\right).

For k = 2 and x = 0.2, we have

    \hat{p}(0.2) = \frac{1}{9 d_2(0.2)} \sum_{t=1}^{9} K\!\left(\frac{0.2 - x^t}{d_2(0.2)}\right)
                 = \frac{1}{9 \cdot 0.08} \sum_{t=1}^{9} K\!\left(\frac{0.2 - x^t}{0.08}\right)
                 \approx 1.11356.

Question 2

Use the data in Table 1 to build a decision tree. Stop only with pure leaf nodes.

    Table 1: Classification data.

    size    color  shape   result
    ------  -----  ------  ------
    medium  blue   brick   yes
    small   red    sphere  yes
    large   green  pillar  yes
    large   green  sphere  yes
    small   red    wedge   no
    large   red    wedge   no
    large   red    pillar  no

Attributes on which to split tree nodes are picked according to the entropy-based impurity measure. If at node m, N_{mj} of the N_m samples take branch j, and N^i_{mj} of those belong to class C_i, the post-split impurity of node m is given by

    I_m = -\sum_{j=1}^{n} \frac{N_{mj}}{N_m} \sum_{i=1}^{K} \frac{N^i_{mj}}{N_{mj}} \log \frac{N^i_{mj}}{N_{mj}}
        = -\frac{1}{N_m} \sum_{j=1}^{n} \sum_{i=1}^{K} N^i_{mj} \log \frac{N^i_{mj}}{N_{mj}}
        = \frac{1}{N_m} \sum_{j=1}^{n} \left( N_{mj} \log N_{mj} - \sum_{i=1}^{K} N^i_{mj} \log N^i_{mj} \right).
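As a quick check of the split choices discussed next, the post-split impurity can be computed directly from the class counts in Table 1 (a minimal Octave sketch; the helper name split_impurity is illustrative, and base-2 logarithms are assumed, matching the appendix tables):

% per-branch class counts [yes, no] at the root node, read off Table 1
size_counts  = [1 1; 1 0; 2 2];         % small, medium, large
color_counts = [1 3; 2 0; 1 0];         % red, green, blue
shape_counts = [1 0; 2 0; 1 1; 0 2];    % brick, sphere, pillar, wedge

function retval = split_impurity(counts)
  % entropy-based post-split impurity, one row of counts per branch
  Nm = sum(counts(:));
  retval = 0;
  for j = 1:rows(counts)
    Nmj = sum(counts(j, :));
    p = counts(j, :) / Nmj;
    p = p(p > 0);                       % treat 0 log 0 as 0
    retval = retval - (Nmj / Nm) * sum(p .* log2(p));
  endfor
endfunction

split_impurity(size_counts)    % 0.86
split_impurity(color_counts)   % 0.46
split_impurity(shape_counts)   % 0.29 (smallest)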

For a two-class classification problem, this can be written as

    I_m = \frac{1}{N_m} \sum_{j=1}^{n} \left( N_{mj} \log N_{mj} - N^1_{mj} \log N^1_{mj} - N^2_{mj} \log N^2_{mj} \right).

Minimizing the impurity after each split [1, p. 179], the root split is on the shape attribute, after which pillar is the only remaining decision node; it is then split on the color attribute. The resulting decision tree is shown in Fig. 4.

[Figure 4: The decision tree. Root test on shape: brick -> yes, sphere -> yes, wedge -> no; pillar -> test on color: red -> no, green -> yes. The blue branch of decision node color is not shown, since it is unreachable.]

Question 3

Use the same data to learn rules. Learn two rules without pruning.

Here, the rule induction process follows the Ripper algorithm [1, p. 187], but without pruning. Rules are learned one at a time, and explain positive samples. Conditions are likewise added to a rule one at a time, maximizing the information gain

    Gain(R', R) = s \left( \log_2 \frac{N'_+}{N'} - \log_2 \frac{N_+}{N} \right),

where R' is rule R after adding one condition, N and N' are the numbers of samples covered by the two rules, N_+ and N'_+ are the numbers of true positives among them, and s is the number of true positives in R that are still true positives in R', i.e., N'_+. After the rule is grown (covers no negative samples), all the samples that it covers are removed from the training set.
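The gain of a candidate condition can be evaluated directly from these counts (a minimal Octave sketch; the function name and argument order are illustrative, and base-2 logarithms are used as in the appendix tables):

function retval = rule_gain(N, Npos, Nnew, Nnewpos, s)
  % R covers N samples, Npos of them positive; R' covers Nnew samples,
  % Nnewpos of them positive; s positives of R remain covered by R'
  retval = s * (log2(Nnewpos / Nnew) - log2(Npos / N));
endfunction

% candidate first conditions on the full data set (N = 7, N_+ = 4)
rule_gain(7, 4, 2, 2, 2)   % color = green (same counts for shape = sphere):  1.61
rule_gain(7, 4, 1, 1, 1)   % size = medium:   0.81
rule_gain(7, 4, 4, 1, 1)   % color = red:    -1.19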

As shown in the appendix, when growing the first rule, the first condition maximizing the information gain can be either color = green or shape = sphere. We arbitrarily pick color = green, and the rule is now complete, since no negative samples are covered. The two relevant samples can now be removed from Table 1. The first rule is thus

    IF color = green THEN yes.

The second rule is similarly

    IF size = medium THEN yes,

but it can also discriminate on several other attribute values.

A Listings

A.1 Question 1

xt = [ 0.04 0.08 0.12 0.23 0.44 0.78 0.79 0.88 0.91 ];

function retval = naive(x)
  retval = (1 - floor(min(abs(x), 1))) / 2;
endfunction

function retval = kernel(x)
  retval = 1 / sqrt(2 * pi) * exp(- (x .* x) / 2);
endfunction

function retval = ne(h, xt, x)
  retval = 1 / (length(xt) * h) * sum(naive((x - xt) / h));
endfunction

function retval = ke(h, xt, x)
  retval = 1 / (length(xt) * h) * sum(kernel((x - xt) / h));
endfunction

x = [-0.5:0.01:1.5];
for i = 1:length(x)
  y05(i) = ne(0.5, xt, x(i));
  y01(i) = ne(0.1, xt, x(i));
  k02(i) = ke(0.2, xt, x(i));
endfor

resh05 = [x' y05'];
resh01 = [x' y01'];
save res-h05 resh05
save res-h01 resh01
printf("p K(0.2): %.10f\n", ke(0.2, xt, 0.2));

resk02 = [x' k02'];
save res-k02 resk02
printf("p(0.2): %.10f\n", 1 / (9 * 0.08) * sum(kernel((0.2 - xt) / 0.08)));

A.2 Questions 2 and 3

Post-split impurities for Question 2 (base-2 logarithms):

Split node  Attribute  Values  N_mj^yes  N_mj^no  N_mj  N_m  Partial impurity  Impurity
root        size       small   1         1        2     7    0.29              0.86
                       medium  1         0        1     7    0.00
                       large   2         2        4     7    0.57
            color      red     1         3        4     7    0.46              0.46
                       green   2         0        2     7    0.00
                       blue    1         0        1     7    0.00
            shape      brick   1         0        1     7    0.00              0.29
                       sphere  2         0        2     7    0.00
                       pillar  1         1        2     7    0.29
                       wedge   0         2        2     7    0.00
pillar      size       small   0         0        0     2    0.00              1.00
                       medium  0         0        0     2    0.00
                       large   1         1        2     2    1.00
            color      red     0         1        1     2    0.00              0.00
                       green   1         0        1     2    0.00
                       blue    0         0        0     2    0.00

Information gain for rule growing in Question 3 (base-2 logarithms). The first block corresponds to growing the first rule on all 7 samples; the second block to growing the second rule after the green samples are removed (5 samples, 2 positive).

Condition  Attribute  Values  N  N+  N'  N'+  s  Gain
root       size       small   7  4   2   1    1  -0.19
                      medium  7  4   1   1    1   0.81
                      large   7  4   4   2    2  -0.39
           color      red     7  4   4   1    1  -1.19
                      green   7  4   2   2    2   1.61
                      blue    7  4   1   1    1   0.81
           shape      brick   7  4   1   1    1   0.81
                      sphere  7  4   2   2    2   1.61
                      pillar  7  4   2   1    1  -0.19
                      wedge   7  4   2   0    0   0.00
root       size       small   5  2   2   1    1   0.32
                      medium  5  2   1   1    1   1.32
                      large   5  2   2   0    0   0.00
           color      red     5  2   4   1    1  -0.68
                      blue    5  2   1   1    1   1.32
           shape      brick   5  2   1   1    1   1.32
                      sphere  5  2   1   1    1   1.32
                      pillar  5  2   1   0    0   0.00
                      wedge   5  2   2   0    0   0.00

References

[1] Ethem Alpaydin. Introduction to Machine Learning. The MIT Press, October 2004. ISBN 0-262-01211-1.