Artificial Intelligence: Learning of decision trees

Peter Antal (antal@mit.bme.hu), A.I., November 21, 2016

Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)

Examples are described by attribute values (Boolean, discrete, continuous), e.g., situations where I will/won't wait for a table. The classification of each example is positive (T) or negative (F).
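As a minimal sketch of how such examples could be encoded for a learner, each example can be a dict from attribute name to value plus the WillWait label. The two rows below are made up for illustration; the lecture's 12 examples are not reproduced in the transcript.

```python
# One dict per example: attribute name -> value, plus the target WillWait.
# These two rows are illustrative, not the actual lecture examples.
examples = [
    {"Alternate": True, "Bar": False, "Fri/Sat": False, "Hungry": True,
     "Patrons": "Some", "Price": "$$$", "Raining": False, "Reservation": True,
     "Type": "French", "WaitEstimate": "0-10", "WillWait": True},
    {"Alternate": True, "Bar": False, "Fri/Sat": False, "Hungry": True,
     "Patrons": "Full", "Price": "$", "Raining": False, "Reservation": False,
     "Type": "Thai", "WaitEstimate": "30-60", "WillWait": False},
]
```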

One possible representation for hypotheses: e.g., here is the "true" tree for deciding whether to wait:

Decision trees can express any function of the input attributes. E.g., for Boolean functions, each truth table row maps to a path to a leaf. Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples. Prefer to find more compact decision trees.
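XOR is the classic illustration: no single attribute separates it, yet its four truth-table rows map directly onto four root-to-leaf paths. A minimal sketch using a nested-dict tree (the representation and the classify helper are mine):

```python
# XOR(a, b) as a decision tree: one leaf per truth-table row.
xor_tree = {"A": {
    False: {"B": {False: False, True: True}},
    True:  {"B": {False: True,  True: False}},
}}

def classify(tree, example):
    """Walk attribute tests until a (non-dict) leaf value is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))          # the single test at this node
        tree = tree[attribute][example[attribute]]
    return tree

assert classify(xor_tree, {"A": True, "B": False}) is True
```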

How many distinct decision trees are there with $n$ Boolean attributes? = number of Boolean functions = number of distinct truth tables with $2^n$ rows = $2^{2^n}$. E.g., with 6 Boolean attributes, there are $2^{64}$ = 18,446,744,073,709,551,616 trees.

How many purely conjunctive hypotheses (e.g., Hungry ∧ ¬Rain)? Each attribute can be in (positive), in (negative), or out, so there are $3^n$ distinct conjunctive hypotheses. A more expressive hypothesis space increases the chance that the target function can be expressed, but it also increases the number of hypotheses consistent with the training set, so it may yield worse predictions.
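These counts are easy to sanity-check numerically (a quick sketch; n is the number of Boolean attributes):

```python
# 2**(2**n) Boolean functions (= distinct decision-tree hypotheses),
# versus 3**n purely conjunctive hypotheses.
for n in (2, 6):
    print(n, 2 ** (2 ** n), 3 ** n)
# n=6 -> 18446744073709551616 trees (matching the number above) vs. 729 conjunctions.
```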

Total error. In practice the target typically is not inside the hypothesis space, so the total real error can be decomposed into bias + variance:
bias: expected error / modelling error
variance: estimation error / empirical (model) selection error
For a given sample size the error decomposes as in the figure: [Figure: modelling error, statistical (model selection) error, and total error plotted against model complexity]
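For squared-error loss (my choice; the lecture states the decomposition informally), the standard form of this split, with $f$ the target, $\hat{h}_D$ the hypothesis fit on sample $D$, and $\sigma^2$ the irreducible noise, is:

$\mathbb{E}_{D,\varepsilon}\big[(y - \hat{h}_D(x))^2\big] = \underbrace{\big(\mathbb{E}_D[\hat{h}_D(x)] - f(x)\big)^2}_{\text{bias}^2\ (\text{modelling error})} + \underbrace{\mathbb{E}_D\big[(\hat{h}_D(x) - \mathbb{E}_D[\hat{h}_D(x)])^2\big]}_{\text{variance (estimation error)}} + \sigma^2$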

Aim: find a small tree consistent with the training examples.
Idea: (recursively) choose the "most significant" attribute as the root of the (sub)tree.

Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative". Patrons? is a better choice.
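A sketch of the recursive learning loop described here (function names are mine; choose_attribute, the "most significant attribute" test, is filled in with information gain after the entropy slides below):

```python
from collections import Counter

def plurality_value(examples, target="WillWait"):
    """Most common label among the examples (ties broken arbitrarily)."""
    return Counter(e[target] for e in examples).most_common(1)[0][0]

def dtl(examples, attributes, parent_examples=(), target="WillWait"):
    """Recursive decision-tree learning over the dict-encoded examples above."""
    if not examples:                              # no examples: inherit majority
        return plurality_value(parent_examples, target)
    labels = {e[target] for e in examples}
    if len(labels) == 1:                          # all positive or all negative
        return labels.pop()
    if not attributes:                            # attributes exhausted
        return plurality_value(examples, target)
    a = choose_attribute(attributes, examples)    # "most significant" attribute
    subtree = {a: {}}
    for v in {e[a] for e in examples}:
        rest = [x for x in attributes if x != a]
        subtree[a][v] = dtl([e for e in examples if e[a] == v], rest,
                            examples, target)
    return subtree
```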

To implement Choose-Attribute in the DTL algorithm:
Information Content (Entropy): $I(P(v_1), \dots, P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i)$
For a training set containing $p$ positive examples and $n$ negative examples:
$I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}$
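These definitions translate directly into code (a minimal sketch; the helper names are mine):

```python
from math import log2

def entropy(*probs):
    """I(P(v1), ..., P(vn)) = sum_i -P(vi) * log2 P(vi); 0*log 0 taken as 0."""
    return sum(-p * log2(p) for p in probs if p > 0)

def boolean_entropy(p, n):
    """I(p/(p+n), n/(p+n)) for p positive and n negative examples."""
    return entropy(p / (p + n), n / (p + n))

print(boolean_entropy(6, 6))  # 1.0 bit for an evenly split training set
```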

A chosen attribute $A$ with $v$ distinct values divides the training set $E$ into subsets $E_1, \dots, E_v$ according to their values for $A$. The Information Gain (IG) is the reduction in entropy from the attribute test:
$remainder(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} \, I\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right)$
$IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - remainder(A)$
Choose the attribute with the largest IG.
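Building on the helpers above, information gain and the attribute choice might look like this (again a sketch, with my function names):

```python
def information_gain(attribute, examples, target="WillWait"):
    """IG(A) = I(p/(p+n), n/(p+n)) - remainder(A)."""
    def counts(exs):
        p = sum(1 for e in exs if e[target])
        return p, len(exs) - p
    p, n = counts(examples)
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        pi, ni = counts([e for e in examples if e[attribute] == v])
        remainder += (pi + ni) / (p + n) * boolean_entropy(pi, ni)
    return boolean_entropy(p, n) - remainder

def choose_attribute(attributes, examples):
    """The DTL attribute test: pick the attribute with the largest IG."""
    return max(attributes, key=lambda a: information_gain(a, examples))
```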

For the training set, $p = n = 6$, so $I(6/12, 6/12) = 1$ bit. Consider the attributes Patrons and Type (and others too):
$IG(Patrons) = 1 - \left[\frac{2}{12} I(0,1) + \frac{4}{12} I(1,0) + \frac{6}{12} I\left(\frac{2}{6}, \frac{4}{6}\right)\right] \approx 0.541$ bits
$IG(Type) = 1 - \left[\frac{2}{12} I\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{2}{12} I\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{4}{12} I\left(\frac{2}{4}, \frac{2}{4}\right) + \frac{4}{12} I\left(\frac{2}{4}, \frac{2}{4}\right)\right] = 0$ bits
Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root.
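The two gains can be checked from the per-value positive/negative counts implied by the fractions above, without the full dataset:

```python
ig_patrons = 1 - (2/12 * boolean_entropy(0, 2)    # Patrons=None: 0+, 2-
                  + 4/12 * boolean_entropy(4, 0)  # Patrons=Some: 4+, 0-
                  + 6/12 * boolean_entropy(2, 4)) # Patrons=Full: 2+, 4-
ig_type = 1 - (2/12 * boolean_entropy(1, 1)       # French: 1+, 1-
               + 2/12 * boolean_entropy(1, 1)     # Italian: 1+, 1-
               + 4/12 * boolean_entropy(2, 2)     # Thai: 2+, 2-
               + 4/12 * boolean_entropy(2, 2))    # Burger: 2+, 2-
print(round(ig_patrons, 3), ig_type)              # 0.541 0.0
```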

Decision tree learned from the 12 examples: substantially simpler than the "true" tree, since a more complex hypothesis isn't justified by the small amount of data.

[Figure: a decision tree testing Bleeding (absent/weak/strong), then Onset (early/late) or Regularity (regular/irregular), then Mutation (h. wild/mutated); each leaf holds a conditional probability of the disease given the path, e.g. P(D|Bleeding=strong), P(D|a,e), P(D|a,l,m), P(D|w,i,h.w.)]
Decision tree: each internal node represents a (univariate) test; the leaves contain the conditional probabilities given the values along the path.
Decision graph: if conditions are equivalent, the subtrees can be merged, e.g. if (Bleeding=absent, Onset=late) ~ (Bleeding=weak, Regularity=irregular).
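In the nested-dict representation used earlier, merging equivalent subtrees just means two branches holding a reference to the same object, turning the tree into a rooted DAG. A sketch with placeholder probabilities (not values from the lecture):

```python
# Shared Mutation subtree with P(D | ...) leaves; numbers are placeholders.
shared = {"Mutation": {"h.wild": 0.10, "mutated": 0.40}}
graph = {"Bleeding": {
    "absent": {"Onset": {"early": 0.05, "late": shared}},
    "weak":   {"Regularity": {"regular": 0.20, "irregular": shared}},
    "strong": 0.70,  # P(D | Bleeding=strong)
}}
# The merge condition from the slide: both paths reach the same subtree object.
assert (graph["Bleeding"]["absent"]["Onset"]["late"]
        is graph["Bleeding"]["weak"]["Regularity"]["irregular"])
```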