Bayesian Classification


Bayesian Classification: Why?
http://css.engineering.uiowa.edu/~comp/

- Probabilistic learning: computes explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
- Probabilistic prediction: predicts multiple hypotheses, weighted by their probabilities.
- Benchmark: even when Bayesian methods are computationally intractable, they can provide a benchmark against which other algorithms are measured.

Bayesian Theorem

Given training data D, the posterior probability of a hypothesis h follows Bayes' theorem:

    P(h|D) = P(D|h) P(h) / P(D)

The MAP (maximum a posteriori) hypothesis:

    h_MAP = argmax_{h in H} P(h|D) = argmax_{h in H} P(D|h) P(h)

Practical difficulty: initial knowledge of many probabilities is required, and the computational cost is significant.

Interpreting the theorem: P(h|D) is the probability that h holds given D. For example, suppose the training set describes computers by their speed and price, and h is the hypothesis that a given tuple is a computer (i.e., belongs to a specific category C). Then P(h|D), the probability that the tuple is a computer given its speed and price, is difficult to compute directly.
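To make the mechanics concrete, here is a minimal Python sketch of Bayes' theorem and MAP hypothesis selection. The two-hypothesis space and its prior/likelihood numbers are made up for illustration; they do not come from the slides.

```python
# A minimal sketch of Bayes' theorem and MAP hypothesis selection.
# The hypotheses and their prior/likelihood values are illustrative only.

def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(h|D) = P(D|h) * P(h) / P(D)."""
    return likelihood * prior / evidence

# Hypothesis -> (P(h), P(D|h))
hypotheses = {"h1": (0.3, 0.9), "h2": (0.7, 0.2)}

# P(D) = sum over h of P(D|h) * P(h)   (law of total probability)
p_d = sum(lik * pri for pri, lik in hypotheses.values())

for h, (pri, lik) in hypotheses.items():
    print(h, "P(h|D) =", round(posterior(pri, lik, p_d), 4))

# h_MAP = argmax_h P(D|h) * P(h); P(D) can be dropped, it is constant in h.
h_map = max(hypotheses, key=lambda h: hypotheses[h][0] * hypotheses[h][1])
print("MAP hypothesis:", h_map)
```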

Naïve Bayes Classifier (1)

Simplifying assumption: the features are conditionally independent given the class, so the classifier picks

    v_NB = argmax_{v_j in V} P(v_j) · Π_{i=1..n} P(x_i | v_j)

Naïve Bayes Classifier: training data (features and decision)

Object  Outlook   Temperature  Humidity  Wind    Play Ball
1       Sunny     Hot          High      Weak    No
2       Sunny     Hot          High      Strong  No
3       Overcast  Hot          High      Weak    Yes
4       Rain      Mild         High      Weak    Yes
5       Rain      Cool         Normal    Weak    Yes
6       Rain      Cool         Normal    Strong  No
7       Overcast  Cool         Normal    Strong  Yes
8       Sunny     Mild         High      Weak    No
9       Sunny     Cool         Normal    Weak    Yes
10      Rain      Mild         Normal    Weak    Yes
11      Sunny     Mild         Normal    Strong  Yes
12      Overcast  Mild         High      Strong  Yes
13      Overcast  Hot          Normal    Weak    Yes
14      Rain      Mild         High      Strong  No

Naive Bayesian Classifier (2)

Given a training set, we can compute the probabilities:

Outlook      P    N        Humidity  P    N
sunny        2/9  3/5      high      3/9  4/5
overcast     4/9  0        normal    6/9  1/5
rain         3/9  2/5

Temperature  P    N        Windy     P    N
hot          2/9  2/5      true      3/9  3/5
mild         4/9  2/5      false     6/9  2/5
cool         3/9  1/5

Bayesian classification

The classification problem may be formalized using a-posteriori probabilities:

    P(C|X) = probability that the sample tuple X = <x_1, ..., x_k> is of class C

E.g., P(class = N | outlook = sunny, windy = true, ...)

Idea: assign to sample X the class label C such that P(C|X) is maximal.
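As a check, the frequency tables above can be reproduced mechanically. The sketch below counts relative frequencies from the 14 training examples; the lowercase value encoding and the P/N class labels follow the play-tennis table on the next slide.

```python
# Reproduce the conditional frequency tables from the 14 training examples.
from collections import Counter, defaultdict

data = [
    ("sunny", "hot", "high", "false", "N"), ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"), ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"), ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"), ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"), ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"), ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"), ("rain", "mild", "high", "true", "N"),
]
attrs = ["outlook", "temperature", "humidity", "windy"]

class_count = Counter(row[-1] for row in data)   # {'P': 9, 'N': 5}
cond_count = defaultdict(Counter)                # (attr, class) -> value counts
for row in data:
    for attr, value in zip(attrs, row[:-1]):
        cond_count[(attr, row[-1])][value] += 1

# P(outlook=sunny | P) = 2/9 and P(outlook=sunny | N) = 3/5, as in the table:
print(cond_count[("outlook", "P")]["sunny"], "/", class_count["P"])
print(cond_count[("outlook", "N")]["sunny"], "/", class_count["N"])
```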

Estimating a-posteriori probabilities

Bayes theorem:

    P(C|X) = P(X|C) · P(C) / P(X)

- P(X) is constant for all classes.
- P(C) = relative frequency of class C samples.
- The C such that P(C|X) is maximum is the C such that P(X|C) · P(C) is maximum.
- Problem: computing P(X|C) directly is infeasible!

Naïve Bayesian Classification

Naïve assumption: attribute independence

    P(x_1, ..., x_k | C) = P(x_1|C) · ... · P(x_k|C)

- If the i-th attribute is categorical: P(x_i|C) is estimated as the relative frequency of samples having value x_i as the i-th attribute in class C.
- If the i-th attribute is continuous: P(x_i|C) is estimated through a Gaussian density function.
- Computationally easy in both cases.
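For the continuous case, here is a hedged sketch: a Gaussian density, with mean and standard deviation estimated from the class-C training values of the attribute, serves as the estimate of P(x_i|C). The temperature values below are hypothetical, not from the slides.

```python
# Estimate P(x_i | C) for a continuous attribute via a Gaussian density.
import math

def gaussian(x, mu, sigma):
    """Gaussian density g(x; mu, sigma) used as the estimate of P(x_i | C)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Hypothetical temperature readings for the class-C training samples:
values = [66.0, 70.0, 68.0, 72.0, 75.0]
mu = sum(values) / len(values)
sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / (len(values) - 1))

print(gaussian(71.0, mu, sigma))   # estimated P(temperature = 71 | C)
```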

Play-tennis example: estimating P(x_i | C)

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

P(p) = 9/14, P(n) = 5/14

outlook:     P(sunny|p) = 2/9   P(overcast|p) = 4/9   P(rain|p) = 3/9
             P(sunny|n) = 3/5   P(overcast|n) = 0     P(rain|n) = 2/5
temperature: P(hot|p) = 2/9     P(mild|p) = 4/9       P(cool|p) = 3/9
             P(hot|n) = 2/5     P(mild|n) = 2/5       P(cool|n) = 1/5
humidity:    P(high|p) = 3/9    P(normal|p) = 6/9
             P(high|n) = 4/5    P(normal|n) = 2/5
windy:       P(true|p) = 3/9    P(false|p) = 6/9
             P(true|n) = 3/5    P(false|n) = 2/5

Play-tennis example: classifying an unseen sample

X = <rain, hot, high, false>

P(X|p) · P(p) = P(rain|p) · P(hot|p) · P(high|p) · P(false|p) · P(p)
              = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
P(X|n) · P(n) = P(rain|n) · P(hot|n) · P(high|n) · P(false|n) · P(n)
              = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286

Sample X is classified in class n (don't play).
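The same arithmetic in a few lines of Python, with the conditional probabilities read off the tables above:

```python
# Classify the unseen sample X = <rain, hot, high, false>.
from functools import reduce

p_cond = {  # P(value | class), from the play-tennis tables
    "p": {"rain": 3/9, "hot": 2/9, "high": 3/9, "false": 6/9},
    "n": {"rain": 2/5, "hot": 2/5, "high": 4/5, "false": 2/5},
}
p_class = {"p": 9/14, "n": 5/14}

x = ["rain", "hot", "high", "false"]
for c in ("p", "n"):
    score = reduce(lambda acc, v: acc * p_cond[c][v], x, p_class[c])
    print(c, round(score, 6))   # p: 0.010582, n: 0.018286 -> classify as n
```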

The independence hypothesis

- makes the computation possible,
- yields optimal classifiers when it is satisfied,
- but is seldom satisfied in practice, as attributes (variables) are often correlated.

Attempts to overcome this limitation:
- Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes.
- Decision trees, which reason on one attribute at a time, considering the most important attributes first.

Bayesian Belief Networks (1)

[Figure: a belief network over the variables FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, and Dyspnea.]

The conditional probability table for the variable LungCancer (LC), given its parents FamilyHistory (FH) and Smoker (S):

        (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC      0.8      0.5       0.7       0.1
~LC     0.2      0.5       0.3       0.9
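A conditional probability table like the one above can be represented directly as a lookup keyed by the states of the node's parents. A small sketch follows; the boolean encoding of FH and S is my choice, not the slides'.

```python
# The LungCancer CPT from the slide, keyed by (FamilyHistory, Smoker).
cpt_lung_cancer = {
    # (FH, S): P(LC = yes | FH, S); P(LC = no | FH, S) is the complement.
    (True,  True):  0.8,
    (True,  False): 0.5,
    (False, True):  0.7,
    (False, False): 0.1,
}

def p_lc(lc, fh, s):
    """P(LungCancer = lc | FamilyHistory = fh, Smoker = s)."""
    p_yes = cpt_lung_cancer[(fh, s)]
    return p_yes if lc else 1.0 - p_yes

print(p_lc(True, fh=True, s=False))            # 0.5, matching the table
print(round(p_lc(False, fh=False, s=True), 2)) # 0.3
```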

Bayesian Belief Networks (2)

- A Bayesian belief network allows a subset of the variables to be conditionally independent.
- It is a graphical model of causal relationships.
- Several cases of learning Bayesian belief networks:
  - given both the network structure and all the variables: easy;
  - given the network structure but only some of the variables;
  - when the network structure is not known in advance.

Reference: J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, 2001.