CLASSIFICATION NAIVE BAYES. NIKOLA MILIKIĆ UROŠ KRČADINAC


CLASSIFICATION NAIVE BAYES NIKOLA MILIKIĆ nikola.milikic@fon.bg.ac.rs UROŠ KRČADINAC uros@krcadinac.com

WHAT IS CLASSIFICATION? A supervised learning task of determining the class of an instance, under two assumptions: the feature values for the given instance are known, and the set of possible classes is known and given. Classes are given as nominal values; for instance: classification of email messages (spam, not-spam) or classification of news articles (politics, sport, culture, etc.).

Example 1 The ToPlayOrNotToPlay.arff dataset

Sunny weather Suppose you know that it is sunny outside. Then there is a 60% chance that Play = no.

How well does outlook predict play?

How well does outlook predict play? The same counts are made for each attribute.

2 occurrences of Play = no where Outlook = rainy; 5 occurrences of Play = no. Convert values to ratios.

Likelihood of playing under these weather conditions Calculate the likelihood that: Outlook = sunny (0.22) Temperature = cool (0.33) Humidity = high (0.33) Windy = true (0.33) Play = yes (0.64) Likelihood of playing under these weather conditions 0.22 x 0.33 x 0.33 x 0.33 x 0.64 = 0.0053

Likelihood of NOT playing under these weather conditions Calculate the likelihood that: Outlook = sunny (0.60) Temperature = cool (0.20) Humidity = high (0.80) Windy = true (0.60) Play = no (0.36) Likelihood of NOT playing under these weather conditions 0.60 x 0.20 x 0.80 x 0.60 x 0.36 = 0.0206

The Bayes Theorem Given these weather conditions: Outlook = sunny, Temperature = cool, Humidity = high, Windy = true. Probability of Play = yes: 0.0053 / (0.0053 + 0.0206) = 20.5%. Probability of Play = no: 0.0206 / (0.0053 + 0.0206) = 79.5%.
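The normalization step above can be checked with a short Python sketch. This is a minimal illustration: it uses the exact fractions from the frequency table that lie behind the slide's rounded ratios (e.g. 2/9 ≈ 0.22).

```python
# Conditional probabilities for Outlook=sunny, Temperature=cool,
# Humidity=high, Windy=true, read off the frequency table as exact fractions.
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # likelihood of Play = yes
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # likelihood of Play = no

# Bayes' theorem: normalize the two likelihoods so the posteriors sum to 1.
posterior_yes = p_yes / (p_yes + p_no)
posterior_no  = p_no  / (p_yes + p_no)

print(round(p_yes, 4), round(p_no, 4))                    # 0.0053 0.0206
print(round(posterior_yes, 3), round(posterior_no, 3))    # 0.205 0.795
```

Note that the denominator is the same in both posteriors, so only the numerator (the class likelihood) decides which class wins.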

Likelihood of NOT playing under these weather conditions Calculate the likelihood that: Outlook = overcast (0.00) Temperature = cool (0.20) Humidity = high (0.80) Windy = true (0.60) Play = no (0.36) 0.00 x 0.20 x 0.80 x 0.60 x 0.36 = 0.0000. A single zero count drives the whole product to zero.

Laplace estimator The original dataset Laplace estimator: Add 1 to each count After the Laplace estimator

Laplace estimator Convert incremented counts to ratios after implementing the Laplace estimator

Laplace estimator Outlook = overcast, Temperature = cool, Humidity = high, Windy = true Play = no: 0.13 x 0.25 x 0.71 x 0.57 x 0.36 = 0.0046 Play = yes: 0.42 x 0.33 x 0.36 x 0.36 x 0.64 = 0.0118 Probability of Play = no: 0.0046 / (0.0046 + 0.0118) = 28% Probability of Play = yes: 0.0118 / (0.0046 + 0.0118) = 72%

Laplace estimator Under these weather conditions: Outlook = sunny, Temperature = cool, Humidity = high, Windy = true. NOT using the Laplace estimator: Play = no: 79.5%, Play = yes: 20.5%. Using the Laplace estimator: Play = no: 72.0%, Play = yes: 28.0%. The Laplace estimator has little effect as the sample size grows.
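A minimal sketch of add-one smoothing, using the Outlook counts among the five Play = no instances from the frequency table. It shows how the zero count for overcast becomes a small non-zero probability:

```python
# Counts of Outlook values among the 5 "Play = no" instances (from the table).
outlook_counts_no = {"sunny": 3, "overcast": 0, "rainy": 2}

def smoothed_prob(counts, value, k=1):
    """Laplace (add-k) estimate: add k to every count before normalizing."""
    total = sum(counts.values()) + k * len(counts)
    return (counts[value] + k) / total

# Without smoothing, P(overcast | no) is 0 and zeroes out the whole product.
raw = outlook_counts_no["overcast"] / sum(outlook_counts_no.values())
smoothed = smoothed_prob(outlook_counts_no, "overcast")
print(raw, smoothed)  # 0.0 0.125  (0.125 appears as 0.13 on the slide)
```

As the counts grow, the added +1 per value matters less and less, which is why the estimator's effect shrinks with sample size.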

Prediction rules Repeat the previous calculation for all other combinations of weather conditions, deriving a rule for each combination. Then discard the rules with p < 0.5.

Prediction rules Calculate probabilities for all 36 combinations
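The 36 combinations (3 x 3 x 2 x 2) can be enumerated directly; a quick sketch using the attribute values from the dataset:

```python
from itertools import product

# Nominal attribute values from the ToPlayOrNotToPlay dataset.
values = {
    "Outlook": ["sunny", "overcast", "rainy"],
    "Temperature": ["hot", "mild", "cool"],
    "Humidity": ["high", "normal"],
    "Windy": ["true", "false"],
}

# Cartesian product of all value sets: one tuple per weather condition.
combinations = list(product(*values.values()))
print(len(combinations))      # 36
print(combinations[0])        # ('sunny', 'hot', 'high', 'true')
```

For each of these tuples the Naive Bayes calculation above yields a class probability, giving one prediction rule per combination.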

Prediction rules Rules predicting the class for all combinations of attributes. Instance 6 is missing.

Comparing the prediction with the original data

Weka Waikato Environment for Knowledge Analysis. Java software for data mining: a set of algorithms for machine learning and data mining. Developed at the University of Waikato, New Zealand. Open-source. Website: http://www.cs.waikato.ac.nz/ml/weka

Datasets we use We use datasets from the Technology Forge: http://www.technologyforge.net/datasets

ARFF file Attribute-Relation File Format (ARFF). A text file. Attributes can be numerical or nominal:

@relation TPONTPNom

@attribute Outlook {sunny, overcast, rainy}
@attribute Temp. {hot, mild, cool}
@attribute Humidity {high, normal}
@attribute Windy {'false', 'true'}
@attribute Play {no, yes}

@data
sunny, hot, high, 'false', no
sunny, hot, high, 'true', no
overcast, hot, high, 'false', yes
...
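For illustration, here is a minimal and deliberately simplified Python sketch that reads such a nominal-only ARFF listing; real ARFF files also allow numeric attributes, comments, and quoting rules that this sketch ignores:

```python
# A tiny ARFF fragment matching the slide's listing.
arff_text = """@relation TPONTPNom
@attribute Outlook {sunny, overcast, rainy}
@attribute Temp. {hot, mild, cool}
@attribute Humidity {high, normal}
@attribute Windy {'false', 'true'}
@attribute Play {no, yes}
@data
sunny, hot, high, 'false', no
sunny, hot, high, 'true', no
overcast, hot, high, 'false', yes
"""

attributes, rows, in_data = [], [], False
for line in arff_text.splitlines():
    line = line.strip()
    if not line or line.startswith("%"):          # skip blanks and comments
        continue
    if line.lower().startswith("@attribute"):
        attributes.append(line.split()[1])         # attribute name
    elif line.lower().startswith("@data"):
        in_data = True                             # instances follow
    elif in_data:
        rows.append([v.strip().strip("'") for v in line.split(",")])

print(attributes)  # ['Outlook', 'Temp.', 'Humidity', 'Windy', 'Play']
print(rows[0])     # ['sunny', 'hot', 'high', 'false', 'no']
```

In practice Weka itself (or a library such as SciPy's ARFF reader) handles the full format; the sketch only shows the header/data split.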

Classification in Weka The ToPlayOrNotToPlay.arff dataset

Classification results The Laplace estimator is automatically applied

Classification results Instance 6 is marked as a wrongly classified instance. The probability of each instance in the dataset is shown.

Precision, Recall, and F-Measure True Positive Rate, False Positive Rate. Precision = TP / (TP + FP) Recall = TP / (TP + FN) F-measure = 2 * Precision * Recall / (Precision + Recall)

Confusion Matrix TP = True Positive FP = False Positive TN = True Negative FN = False Negative
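The three formulas can be packaged in a small helper; the counts below are hypothetical, purely for illustration:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical confusion-matrix counts (not from the slides):
# 8 true positives, 2 false positives, 2 false negatives.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
print(p, r, f)  # 0.8 0.8 0.8
```

The F-measure is the harmonic mean of precision and recall, so it equals both when they are equal and is pulled toward the smaller of the two otherwise.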

Example 2 Eatable Mushrooms dataset, based on the National Audubon Society Field Guide to North American Mushrooms. Hypothetical samples with descriptions corresponding to 23 species of mushrooms. There are 8124 instances with 22 nominal attributes describing mushroom characteristics, one of which is whether a mushroom is edible or not. Our goal is to predict whether a mushroom is edible or not.

Thank you! Weka Tutorials and Assignments @ The Technology Forge Link: http://www.technologyforge.net/wekatutorials/

(Anonymous) survey for your comments and suggestions http://goo.gl/cqdp3i

ANY QUESTIONS? NIKOLA MILIKIĆ nikola.milikic@fon.bg.ac.rs UROŠ KRČADINAC uros@krcadinac.com