Data Mining, CS 341, Spring 2007. Lecture 8: Decision Tree Algorithms


Jackknife Estimator: Example 1

Estimate of the mean for X = {x_1, x_2, x_3}, n = 3, g = 3, m = 1, θ = µ.
θ = (x_1 + x_2 + x_3)/3
θ_1 = (x_2 + x_3)/2, θ_2 = (x_1 + x_3)/2, θ_3 = (x_1 + x_2)/2
θ̄ = (θ_1 + θ_2 + θ_3)/3
θ_Q = gθ - (g-1)θ̄ = 3θ - (3-1)θ̄ = (x_1 + x_2 + x_3)/3
In this case, the jackknife estimator is the same as the usual estimator.

Jackknife Estimator: Example 2

Estimate of the variance for X = {1, 4, 4}, n = 3, g = 3, m = 1, θ = σ².
σ² = ((1-3)² + (4-3)² + (4-3)²)/3 = 2
θ_1 = ((4-4)² + (4-4)²)/2 = 0, θ_2 = 2.25, θ_3 = 2.25
θ̄ = (θ_1 + θ_2 + θ_3)/3 = 1.5
θ_Q = gθ - (g-1)θ̄ = 3(2) - 2(1.5) = 3
In this case, the jackknife estimator is different from the usual estimator.

Jackknife Estimator: Example 2 (cont'd)

In general, applying the jackknife technique to the biased estimator
σ² = Σ (x_i - x̄)² / n
yields the jackknife estimator
s² = Σ (x_i - x̄)² / (n - 1),
which is known to be unbiased for σ².
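The following is a minimal Python sketch (not from the original slides) that reproduces Example 2: it applies the leave-one-out jackknife (g = n, m = 1) to the biased variance estimator for X = {1, 4, 4}. The function names jackknife and biased_variance are illustrative only.

def jackknife(x, estimator):
    """Jackknife estimate theta_Q = g*theta - (g - 1)*theta_bar, with g = n groups of size m = 1."""
    n = len(x)
    theta = estimator(x)                                         # usual estimate from all n observations
    partials = [estimator(x[:i] + x[i + 1:]) for i in range(n)]  # theta_j: estimate with observation j left out
    theta_bar = sum(partials) / n                                # average of the n partial estimates
    return n * theta - (n - 1) * theta_bar

def biased_variance(x):
    """sigma^2 = sum_i (x_i - x_bar)^2 / n, the biased estimator from the slide."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

X = [1, 4, 4]
print(biased_variance(X))             # 2.0, the usual (biased) estimate
print(jackknife(X, biased_variance))  # 3.0, the jackknife estimate; equals the unbiased s^2 = 6/2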

Review: Distance-based Algorithms

Place items in the class to which they are closest.
Similarity measures or distance measures
Simple approach
K Nearest Neighbors
Decision tree issues, pros and cons

Classification Using Decision Trees

Partitioning based: divide the search space into rectangular regions.
A tuple is placed into a class based on the region within which it falls.
DT approaches differ in how the tree is built: DT induction.
Internal nodes are associated with attributes, and arcs with values for that attribute.
Algorithms: ID3, C4.5, CART

Decision Tree

Given: D = {t_1, ..., t_n} where t_i = <t_i1, ..., t_ih>
Database schema contains {A_1, A_2, ..., A_h}
Classes C = {C_1, ..., C_m}
A Decision (or Classification) Tree is a tree associated with D such that:
Each internal node is labeled with an attribute, A_i
Each arc is labeled with a predicate which can be applied to the attribute at its parent
Each leaf node is labeled with a class, C_j

DT Induction

Information

Decision tree induction is often based on information theory.

DT Induction

When all the marbles in the bowl are mixed up, little information is given.
When the marbles in the bowl are all from one class and those in the other two classes are on either side, more information is given.
Use this approach with DT induction!

Information/Entropy

Given probabilities p_1, p_2, ..., p_s whose sum is 1, entropy is defined as:
H(p_1, p_2, ..., p_s) = Σ_i p_i log(1/p_i)
Entropy measures the amount of randomness, surprise, or uncertainty.
Its value is between 0 and 1; it reaches the maximum when all the probabilities are the same.
Goal in classification: no surprise, entropy = 0.
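A minimal Python sketch (not from the slides) of this entropy function. The ID3 numbers that follow (0.4384, 0.301, and so on) come out with base-10 logarithms, so that base is used here; the function name entropy is illustrative only.

import math

def entropy(probs, base=10):
    """H(p_1, ..., p_s) = sum_i p_i * log(1/p_i); terms with p_i = 0 contribute nothing."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

print(entropy([4/15, 8/15, 3/15]))  # ~0.438, the starting entropy used in the ID3 example below
print(entropy([0.5, 0.5]))          # ~0.301, the maximum for two equally likely classes (base 10)
print(entropy([1.0]))               # 0.0, a pure node: no surprise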

Entropy

(Figure: plot of log(1/p) and the two-class entropy H(p, 1-p) as functions of p.)

ID3

Creates the tree using information theory concepts and tries to reduce the expected number of comparisons.
ID3 chooses the split attribute with the highest information gain.
Information gain: the difference between how much information is needed to make a correct classification before the split versus how much information is needed after the split.

Height Example Data

Name       Gender  Height   Output1  Output2
Kristina   F       1.6 m    Short    Medium
Jim        M       2 m      Tall     Medium
Maggie     F       1.9 m    Medium   Tall
Martha     F       1.88 m   Medium   Tall
Stephanie  F       1.7 m    Short    Medium
Bob        M       1.85 m   Medium   Medium
Kathy      F       1.6 m    Short    Medium
Dave       M       1.7 m    Short    Medium
Worth      M       2.2 m    Tall     Tall
Steven     M       2.1 m    Tall     Tall
Debbie     F       1.8 m    Medium   Medium
Todd       M       1.95 m   Medium   Medium
Kim        F       1.9 m    Medium   Tall
Amy        F       1.8 m    Medium   Medium
Wynette    F       1.75 m   Medium   Medium

Information Gain

Choose gender as the split attribute:
H(D): entropy before the split
E(H(D)): expected entropy after the split
Information gain = H(D) - E(H(D))
Choose height as the split attribute:
H(D): entropy before the split
E(H(D)): expected entropy after the split
Information gain = H(D) - E(H(D))

ID3 Example (Output1)

Starting state entropy: 4/15 log(15/4) + 8/15 log(15/8) + 3/15 log(15/3) = 0.4384
Gain using gender:
Female: 3/9 log(9/3) + 6/9 log(9/6) = 0.2764
Male: 1/6 log(6/1) + 2/6 log(6/2) + 3/6 log(6/3) = 0.4392
Weighted sum: (9/15)(0.2764) + (6/15)(0.4392) = 0.34152
Gain: 0.4384 - 0.34152 = 0.09688
Gain using height: 0.4384 - (2/15)(0.301) = 0.3983
Choose height as the first splitting attribute.
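Below is a small Python sketch (mine, not from the slides) that recomputes the information gain of the gender split on the Output1 labels from the table above, reusing the base-10 entropy from the earlier sketch. The names class_entropy and data are illustrative only.

from collections import Counter
import math

def entropy(probs, base=10):
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

def class_entropy(labels):
    """Entropy of the class distribution in a list of labels."""
    counts = Counter(labels)
    return entropy([c / len(labels) for c in counts.values()])

# (gender, Output1) pairs from the Height Example Data table
data = [("F", "Short"), ("M", "Tall"), ("F", "Medium"), ("F", "Medium"),
        ("F", "Short"), ("M", "Medium"), ("F", "Short"), ("M", "Short"),
        ("M", "Tall"), ("M", "Tall"), ("F", "Medium"), ("M", "Medium"),
        ("F", "Medium"), ("F", "Medium"), ("F", "Medium")]

before = class_entropy([y for _, y in data])        # ~0.438, entropy before the split
after = 0.0
for g in ("F", "M"):                                # expected entropy after splitting on gender
    subset = [y for x, y in data if x == g]
    after += len(subset) / len(data) * class_entropy(subset)

print(before - after)   # ~0.0969; the slide, using rounded intermediate values, reports 0.09688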

C4.5

ID3 favors attributes with a large number of divisions.
C4.5 is an improved version of ID3 that adds:
Missing data handling
Continuous data
Pruning
Rules
GainRatio: GainRatio(D, S) = Gain(D, S) / H(|D_1|/|D|, ..., |D_s|/|D|)

C4.5: Example

Calculate the GainRatio for the gender split.
Entropy associated with the split, ignoring classes: H(9/15, 6/15) = 0.292
The GainRatio value for the gender attribute: 0.09688/0.292 = 0.332
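A short Python sketch (not from the slides) of the GainRatio computation in the example above, reusing the base-10 entropy function; gain_gender is the 0.09688 information gain computed in the ID3 example.

import math

def entropy(probs, base=10):
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

gain_gender = 0.09688               # Gain(D, gender) from the ID3 example
split_info = entropy([9/15, 6/15])  # entropy of the split itself, ignoring classes (~0.292)
print(gain_gender / split_info)     # ~0.331; the slide, rounding the split entropy to 0.292, gets 0.332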

C5.0

A commercial version of C4.5 widely used in many data mining packages.
Targeted toward use with large datasets.
Produces more accurate rules.
Improves on memory usage by 90%.
Runs much faster than C4.5.

CART

Creates a binary tree.
Uses entropy to choose the best splitting attribute (as with ID3).
Formula to choose the split point, s, for node t:
ϕ(s|t) = 2 P_L P_R Σ_j |P(C_j, t_L) - P(C_j, t_R)|
P_L, P_R: probability that a tuple in the training set will be on the left or right side of the tree.
P(C_j, t_L), P(C_j, t_R): probability that a tuple is in class C_j and in the left (or right) subtree.

CART Example

At the start, there are six choices for the split point (right branch on equality):
ϕ(Gender) = 2(6/15)(9/15)(2/15 + 4/15 + 3/15) = 0.224
ϕ(1.6) = 0
ϕ(1.7) = 2(2/15)(13/15)(0 + 8/15 + 3/15) = 0.169
ϕ(1.8) = 2(5/15)(10/15)(4/15 + 6/15 + 3/15) = 0.385
ϕ(1.9) = 2(9/15)(6/15)(4/15 + 2/15 + 3/15) = 0.256
ϕ(2.0) = 2(12/15)(3/15)(4/15 + 8/15 + 3/15) = 0.32
Best split is at 1.8.
What is next?

Scalable DT Techniques

SPRINT
Creation of DTs for large datasets.
Based on CART techniques.

Next Lecture

Rule-based algorithms
Combining techniques