Uncertainty. Yeni Herdiyeni, Departemen Ilmu Komputer, IPB. The World Is a Very Uncertain Place.


Uncertainty. Some examples of uncertain statements: "The President of Indonesia in 2014 will be a woman"; "Next year I will graduate"; "The population of Indonesia is around 200 million"; "Next year I will be promoted"; "If the symptom is a high fever, then the disease is typhoid"; and so on. Uncertainty makes problem solving difficult.

Naïve Bayes Classifier (Thomas Bayes, 1702-1761). We will start off with a visual intuition, before looking at the math. [Scatter plot: Antenna Length versus Abdomen Length for Grasshoppers and Katydids.] Remember this example? Let's get lots more data.

With a lot of data, we can build a histogram. Let us just build one for Antenna Length for now. [Histogram of Antenna Length for Katydids and Grasshoppers.] We can leave the histograms as they are, or we can summarize them with two normal distributions. Let us use two normal distributions for ease of visualization in the following slides.

We want to classify an insect we have found. Its antennae are 3 units long. How can we classify it? We can just ask ourselves: given the distributions of antennae lengths we have seen, is it more probable that our insect is a Grasshopper or a Katydid? There is a formal way to discuss the most probable classification: p(c_j | d) = probability of class c_j, given that we have observed d. For an antennae length of 3, the histograms show 10 Grasshoppers and 2 Katydids:

P(Grasshopper | 3) = 10 / (10 + 2) = 0.833
P(Katydid | 3) = 2 / (10 + 2) = 0.167

For an antennae length of 7, the counts are 3 Grasshoppers and 9 Katydids:

P(Grasshopper | 7) = 3 / (3 + 9) = 0.250
P(Katydid | 7) = 9 / (3 + 9) = 0.750

For an antennae length of 5, the counts are 6 and 6:

P(Grasshopper | 5) = 6 / (6 + 6) = 0.500
P(Katydid | 5) = 6 / (6 + 6) = 0.500
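To make the counting concrete, here is a minimal Python sketch of the same computation. The counts at each antenna length are read off the (hypothetical) histograms above.

```python
# Counts read off the histograms: antenna_length -> (grasshoppers, katydids)
hist = {3: (10, 2), 5: (6, 6), 7: (3, 9)}

def posterior(antenna_length):
    g, k = hist[antenna_length]
    total = g + k
    return {"Grasshopper": g / total, "Katydid": k / total}

print(posterior(3))  # {'Grasshopper': 0.833..., 'Katydid': 0.166...}
print(posterior(7))  # {'Grasshopper': 0.25, 'Katydid': 0.75}
```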

Bayes Classifiers. That was a visual intuition for a simple case of the Bayes classifier, also called: Idiot Bayes, Naïve Bayes, Simple Bayes. We are about to see some of the mathematical formalisms, and more examples, but keep in mind the basic idea: find the probability of the previously unseen instance belonging to each class, then simply pick the most probable class.

Bayesian classifiers use Bayes theorem, which says

p(c_j | d) = p(d | c_j) p(c_j) / p(d)

where p(c_j | d) is the probability of instance d being in class c_j (this is what we are trying to compute); p(d | c_j) is the probability of generating instance d given class c_j (we can imagine that being in class c_j causes you to have feature d with some probability); p(c_j) is the probability of occurrence of class c_j (this is just how frequent the class c_j is in our database); and p(d) is the probability of instance d occurring (this can actually be ignored, since it is the same for all classes).
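As a small illustration (not from the slides), the "p(d) can be ignored" point can be made executable: score each class by p(d | c) p(c) and normalise, since the evidence term is the same constant for every class.

```python
# A minimal sketch: posteriors from likelihoods and priors via Bayes theorem.
# p(d) never needs to be known separately; it is just the normalising constant.
def posteriors(likelihoods, priors):
    scores = {c: likelihoods[c] * priors[c] for c in priors}  # p(d|c) * p(c)
    evidence = sum(scores.values())                           # = p(d)
    return {c: s / evidence for c, s in scores.items()}

# Illustrative numbers only:
print(posteriors({"c1": 0.3, "c2": 0.1}, {"c1": 0.5, "c2": 0.5}))
# {'c1': 0.75, 'c2': 0.25}
```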

Assume that we have two classes: c_1 = male and c_2 = female. We have a person whose sex we do not know, say "drew" or d. Classifying drew as male or female is equivalent to asking: is it more probable that drew is male or female, i.e. which is greater, p(male | drew) or p(female | drew)? (Note: Drew can be a male or a female name, e.g., Drew Carey or Drew Barrymore.)

p(male | drew) = p(drew | male) p(male) / p(drew)

where p(drew | male) is the probability of being called drew given that you are a male, p(male) is the probability of being a male, and p(drew) is the probability of being named drew (actually irrelevant, since it is the same for all classes).

This is Officer Drew (who arrested me in 1997). Is Officer Drew a Male or a Female? Luckily, we have a small database with names and sex, and we can use it to apply Bayes rule, p(c_j | d) = p(d | c_j) p(c_j) / p(d):

Name     Sex
Drew     Male
Claudia  Female
Drew     Female
Drew     Female
Alberto  Male
Karin    Female
Nina     Female
Sergio   Male

For Officer Drew, with p(drew) = 3/8 (three of the eight people are named Drew):

p(male | drew) = (1/3 × 3/8) / (3/8) = 0.125 / (3/8)
p(female | drew) = (2/5 × 5/8) / (3/8) = 0.250 / (3/8)

Officer Drew is more likely to be a Female. Officer Drew IS a female!
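The same arithmetic, written as a short sketch over the name/sex table above; the class frequencies serve as priors and the name frequencies within each class as likelihoods.

```python
# The name/sex database from the slide.
data = [("Drew", "Male"), ("Claudia", "Female"), ("Drew", "Female"),
        ("Drew", "Female"), ("Alberto", "Male"), ("Karin", "Female"),
        ("Nina", "Female"), ("Sergio", "Male")]

def score(name, sex):
    in_class = [n for n, s in data if s == sex]
    likelihood = sum(n == name for n in in_class) / len(in_class)  # p(name | sex)
    prior = len(in_class) / len(data)                              # p(sex)
    return likelihood * prior    # numerator of Bayes rule; p(name) cancels

print(score("Drew", "Male"))    # 1/3 * 3/8 = 0.125
print(score("Drew", "Female"))  # 2/5 * 5/8 = 0.250 -> Female is more probable
```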

So far we have only considered Bayes classification when we have one attribute (the antennae length, or the name). But we may have many features. How do we use all the features?

Name     Over 170 cm  Eye    Hair length  Sex
Drew     No           Blue   Short        Male
Claudia  Yes          Brown  Long         Female
Drew     No           Blue   Long         Female
Drew     No           Blue   Long         Female
Alberto  Yes          Brown  Short        Male
Karin    No           Blue   Long         Female
Nina     Yes          Brown  Short        Female
Sergio   Yes          Blue   Long         Male

To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate

p(d | c_j) = p(d_1 | c_j) × p(d_2 | c_j) × ... × p(d_n | c_j)

That is, the probability of class c_j generating instance d equals the probability of class c_j generating the observed value for feature 1, multiplied by the probability of class c_j generating the observed value for feature 2, and so on.

Officer Drew is blue-eyed, over 170 cm tall, and has long hair, so under the independence assumption:

p(officer drew | c_j) = p(over_170cm = yes | c_j) × p(eye = blue | c_j) × ...
p(officer drew | Female) = 2/5 × 3/5 × ...
p(officer drew | Male) = 2/3 × 2/3 × ...

The naive Bayes classifier is often represented as a graph: a class node c_j at the top, with an arrow down to each feature node p(d_1 | c_j), p(d_2 | c_j), ..., p(d_n | c_j). Note the direction of the arrows, which state that each class causes certain features, with a certain probability.
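A short sketch of that product, with the two per-feature probabilities read off the table above; further features (e.g. hair length) would simply multiply in the same way.

```python
from math import prod

# Per-feature likelihoods for Officer Drew (over 170 cm = yes, eyes = blue),
# read off the table above, plus the class priors 5/8 and 3/8.
lik = {"Female": [2/5, 3/5], "Male": [2/3, 2/3]}
prior = {"Female": 5/8, "Male": 3/8}

# Naive assumption: p(d | c) is the product of the per-feature probabilities.
scores = {c: prod(lik[c]) * prior[c] for c in lik}
print(scores)  # unnormalised p(c | officer drew); more features would multiply in
```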

Naïve Bayes is fast and space efficient. We can look up all the probabilities with a single scan of the database and store them in a (small) table:

Sex     Over 190 cm
Male    Yes 0.15, No 0.85
Female  Yes 0.01, No 0.99

Sex     Long hair
Male    Yes 0.05, No 0.95
Female  Yes 0.70, No 0.30

Naïve Bayes is NOT sensitive to irrelevant features. Suppose we are trying to classify a person's sex based on several features, including eye color. (Of course, eye color is completely irrelevant to a person's sex.)

p(jessica | c_j) = p(eye = brown | c_j) × p(wears_dress = yes | c_j) × ...
p(jessica | Female) = 9,000/10,000 × 9,975/10,000 × ...
p(jessica | Male) = 9,001/10,000 × 2/10,000 × ...

The eye-color factors are almost the same for both classes, so they have little effect. However, this assumes that we have good enough estimates of the probabilities, so the more data the better.
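The "single scan" claim can be sketched directly: one pass over the rows fills the class counts and the per-class feature-value counts, which is all the table stores. The row format here is an assumption for illustration.

```python
from collections import defaultdict

def train(rows):
    """rows: list of (features_dict, class_label). One scan, small tables."""
    class_counts = defaultdict(int)
    value_counts = defaultdict(int)   # (class, feature, value) -> count
    for features, label in rows:
        class_counts[label] += 1
        for f, v in features.items():
            value_counts[(label, f, v)] += 1
    priors = {c: n / len(rows) for c, n in class_counts.items()}   # p(c)
    cond = {k: n / class_counts[k[0]]                              # p(f=v | c)
            for k, n in value_counts.items()}
    return priors, cond

# Tiny illustrative usage:
priors, cond = train([({"long_hair": "yes"}, "Female"),
                      ({"long_hair": "no"}, "Male")])
print(priors, cond)
```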

An obvious point: I have used a simple two-class problem, with two possible values for each feature, in my previous examples. However, we can have an arbitrary number of classes, or feature values:

Animal  Mass > 10 kg
Cat     Yes 0.15, No 0.85
Dog     Yes 0.91, No 0.09
Pig     Yes 0.99, No 0.01

Animal  Color
Cat     Black 0.33, White 0.23, Brown 0.44
Dog     Black 0.97, White 0.03, Brown 0.90
Pig     Black 0.04, White 0.01, Brown 0.95

Problem! Naïve Bayes assumes independence of features, but some features are strongly correlated:

Sex     Over 6 foot
Male    Yes 0.15, No 0.85
Female  Yes 0.01, No 0.99

Sex     Over 200 pounds
Male    Yes 0.11, No 0.80
Female  Yes 0.05, No 0.95

Solution: consider the relationships between attributes. Instead of p(over 200 pounds | sex) alone, estimate it jointly with height:

Sex     Over 200 pounds, given height
Male    Yes and Over 6 foot 0.11; No and Over 6 foot 0.59; Yes and NOT Over 6 foot 0.05; No and NOT Over 6 foot 0.35
Female  Yes and Over 6 foot 0.01; ...

But how do we find the set of connecting arcs?
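One simple version of "considering the relationship", sketched under the assumption that we treat the correlated pair as a single joint feature and count its four outcomes per class (learning the full set of connecting arcs is the harder structure-learning problem the slide alludes to; the column names are illustrative):

```python
from collections import Counter

def joint_table(rows, sex):
    """Estimate p(over_200lb, over_6ft | sex) directly from the data, instead
    of multiplying two marginals that are wrongly assumed independent.
    rows: list of dicts with keys 'sex', 'over_200lb', 'over_6ft'."""
    pairs = Counter((r["over_200lb"], r["over_6ft"])
                    for r in rows if r["sex"] == sex)
    n = sum(pairs.values())
    return {pair: c / n for pair, c in pairs.items()}
```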

The Naïve Bayesian Classifier has a quadratic decision boundary. [Plot of the two-feature insect data with a curved boundary between the classes.]

Advantages/Disadvantages of Naïve Bayes. Advantages: fast to train (single scan); fast to classify; not sensitive to irrelevant features; handles real and discrete data; handles streaming data well. Disadvantages: assumes independence of features.

Naive Bayesian Classifier (II). Given a training set, we can compute the probabilities. Play-tennis example, estimating P(x_i | C):

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

Class priors: P(p) = 9/14, P(n) = 5/14.

Outlook: P(sunny | p) = 2/9, P(sunny | n) = 3/5; P(overcast | p) = 4/9, P(overcast | n) = 0; P(rain | p) = 3/9, P(rain | n) = 2/5.
Temperature: P(hot | p) = 2/9, P(hot | n) = 2/5; P(mild | p) = 4/9, P(mild | n) = 2/5; P(cool | p) = 3/9, P(cool | n) = 1/5.
Humidity: P(high | p) = 3/9, P(high | n) = 4/5; P(normal | p) = 6/9, P(normal | n) = 2/5.
Windy: P(true | p) = 3/9, P(true | n) = 3/5; P(false | p) = 6/9, P(false | n) = 2/5.
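To see the tables in action, here is a sketch that classifies one hypothetical new day, outlook = sunny, temperature = cool, humidity = high, windy = true (this query is an assumption for illustration, not from the slides), using the estimates above:

```python
from math import prod

# Probabilities read off the tables above for the query
# (sunny, cool, high, true); the class prior comes first in each list.
score_p = prod([9/14, 2/9, 3/9, 3/9, 3/9])  # ~ 0.0053
score_n = prod([5/14, 3/5, 1/5, 4/5, 3/5])  # ~ 0.0206
print("Play" if score_p > score_n else "Don't play")  # -> Don't play (class N)
```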

Exercise.