NAIVE BAYES CLASSIFICATION. NIKOLA MILIKIĆ


NAIVE BAYES CLASSIFICATION NIKOLA MILIKIĆ EMAIL: nikola.milikic@fon.bg.ac.rs URL: http://nikola.milikic.info

WHAT IS CLASSIFICATION? The task of determining the class to which an instance belongs; an instance is described by its attribute values, and the set of possible classes is known and given.

Example: predicting whether the play will be performed. ToPlayOtNotToPlay.arff dataset.

How does sunny weather (Outlook = sunny) affect the outcome? Suppose we know it is sunny outside. Then there is a 60% chance that Play = no.

How does the value of the Outlook attribute affect the outcome?

How do the values of all attributes affect the outcome? Repeat for each attribute.

2 occurrences of Play = no where Outlook = rainy; 5 occurrences of Play = no in total. Convert the occurrence counts into probability estimates.
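The count-to-estimate conversion can be sketched in Python (the lecture itself works by hand and in Weka; the counts below are the slide's Outlook counts within the Play = no class):

```python
# Occurrence counts of Outlook values among the 5 rows with Play = no
# (taken from the slide's weather dataset).
counts_no = {"sunny": 3, "overcast": 0, "rainy": 2}
total_no = sum(counts_no.values())  # 5

# Relative-frequency estimates P(Outlook = v | Play = no)
probs_no = {v: c / total_no for v, c in counts_no.items()}
print(probs_no)  # {'sunny': 0.6, 'overcast': 0.0, 'rainy': 0.4}
```

The 0.40 for rainy is the 2-out-of-5 count from this slide, and the 0.60 for sunny is the value used on the earlier slide.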

Probability that the play WILL be performed under conditions U1. Under conditions U1: Outlook = sunny (0.22), Temperature = cool (0.33), Humidity = high (0.33), Windy = true (0.33). Compute the probability that Play = yes (prior 0.64). Probability that the play will be performed under the given weather conditions: 0.22 x 0.33 x 0.33 x 0.33 x 0.64 = 0.0053.

Probability that the play will NOT be performed under conditions U1. Under conditions U1: Outlook = sunny (0.60), Temperature = cool (0.20), Humidity = high (0.80), Windy = true (0.60). Compute the probability that Play = no (prior 0.36). Probability that the play will NOT be performed under the given weather conditions: 0.60 x 0.20 x 0.80 x 0.60 x 0.36 = 0.0206.

Computing the probabilities of whether the play will be performed under U1. Under conditions U1: Outlook = sunny, Temperature = cool, Humidity = high, Windy = true. Probability that Play = yes: 0.0053 / (0.0053 + 0.0206) = 20.5%. Probability that Play = no: 0.0206 / (0.0053 + 0.0206) = 79.5%.
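The two-step computation (multiply the conditional estimates by the class prior, then normalize the two scores) can be sketched in Python, reusing the slide's rounded products:

```python
# Naive Bayes scores for conditions U1, using the rounded products
# from the slides: P(U1, yes) ≈ 0.0053 and P(U1, no) ≈ 0.0206.
score_yes = 0.0053
score_no = 0.0206

# Normalizing the two scores turns them into class probabilities.
p_yes = score_yes / (score_yes + score_no)
p_no = score_no / (score_yes + score_no)
print(round(p_yes * 100, 1), round(p_no * 100, 1))  # 20.5 79.5
```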

Probability that the play will NOT be performed under conditions U2. Under conditions U2: Outlook = overcast (0.00), Temperature = cool (0.20), Humidity = high (0.80), Windy = true (0.60). Compute the probability that Play = no (prior 0.36). It will not be performed with probability 0%, i.e. it will be performed with probability 100%? 0.00 x 0.20 x 0.80 x 0.60 x 0.36 = 0.0000.

Applying the Laplace estimator. Take the occurrence counts from the original dataset; the Laplace estimator adds 1 to each count.

Computing the probability estimates after applying the Laplace estimator: convert the incremented counts into probability estimates.
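A minimal Python sketch of the Laplace estimator, using the slide's Outlook counts within the Play = no class (where overcast never occurs):

```python
def laplace_probs(counts):
    """Add 1 to every count before normalizing (Laplace estimator),
    so that no conditional probability is ever exactly zero."""
    smoothed = {v: c + 1 for v, c in counts.items()}
    total = sum(smoothed.values())
    return {v: c / total for v, c in smoothed.items()}

# Outlook counts among the Play = no rows; overcast has count 0.
probs = laplace_probs({"sunny": 3, "overcast": 0, "rainy": 2})
print(probs)  # {'sunny': 0.5, 'overcast': 0.125, 'rainy': 0.375}
```

The 0.125 for overcast is the value the slides round to 0.13 for P(Outlook = overcast | Play = no).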

Probabilities of whether the play will be performed under U2. Under conditions U2: Outlook = overcast, Temperature = cool, Humidity = high, Windy = true. Play = no: 0.13 x 0.25 x 0.71 x 0.57 x 0.36 = 0.0046. Play = yes: 0.42 x 0.33 x 0.36 x 0.36 x 0.64 = 0.0118. Probability that Play = no: 0.0046 / (0.0046 + 0.0118) = 28%. Probability that Play = yes: 0.0118 / (0.0046 + 0.0118) = 72%.

Probabilities under conditions U1 before and after applying the Laplace estimator. Under conditions U1: Outlook = sunny, Temperature = cool, Humidity = high, Windy = true. Without the Laplace estimator: Play = no: 79.5%, Play = yes: 20.5%. With the Laplace estimator: Play = no: 72.0%, Play = yes: 28.0%. The effect of the Laplace estimator diminishes as the number of samples grows.

Prediction rules. Repeat the previous calculations for all possible combinations of weather conditions. Discard the combinations whose probability is < 0.5.

Prediction rules: probabilities of all combinations. Compute the probabilities for all 36 combinations.
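Enumerating the combinations is mechanical; a Python sketch using the attribute values from the dataset (3 x 3 x 2 x 2 = 36):

```python
from itertools import product

# All value combinations of the four nominal attributes.
outlook = ["sunny", "overcast", "rainy"]
temperature = ["hot", "mild", "cool"]
humidity = ["high", "normal"]
windy = ["false", "true"]

combos = list(product(outlook, temperature, humidity, windy))
print(len(combos))  # 36
```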

Prediction rules: probabilities of all combinations. Rules that predict the class for every combination of attribute values. Instance 6 is missing.

Comparison of the original and the predicted decisions.

Weka. Data mining software written in Java. A collection of machine learning and data mining algorithms. Developed at the University of Waikato, New Zealand. Open source. Website: http://www.cs.waikato.ac.nz/ml/weka

Datasets used in the exercises. The datasets come from the Technology Forge site: http://www.technologyforge.net/datasets

ARFF file. Attribute-Relation File Format (ARFF). A plain-text file. Attributes can be: numeric or nominal.

@relation TPONTPNom
@attribute Outlook {sunny, overcast, rainy}
@attribute Temp. {hot, mild, cool}
@attribute Humidity {high, normal}
@attribute Windy {'false', 'true'}
@attribute Play {no, yes}

@data
sunny, hot, high, 'false', no
sunny, hot, high, 'true', no
overcast, hot, high, 'false', yes
...
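As a rough illustration of the format, the nominal attribute declarations can be pulled out of an ARFF header with a few lines of Python (real projects would load the file with Weka or a dedicated ARFF library; this hand-rolled parsing is a sketch only):

```python
# A fragment of the ARFF header shown above, inlined as a string.
arff = """\
@relation TPONTPNom
@attribute Outlook {sunny, overcast, rainy}
@attribute Humidity {high, normal}
@data
sunny, hot, high, 'false', no
"""

# Collect each nominal attribute's name and its allowed values.
attributes = {}
for line in arff.splitlines():
    if line.lower().startswith("@attribute"):
        name, values = line.split(None, 1)[1].split("{")
        attributes[name.strip()] = [v.strip() for v in values.rstrip("}").split(",")]
print(attributes)
# {'Outlook': ['sunny', 'overcast', 'rainy'], 'Humidity': ['high', 'normal']}
```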

Running classification in Weka. ToPlayOtNotToPlay.arff dataset.

Displaying the classification results. The classifier automatically applies the Laplace estimator.

Displaying the classification results. Instance 6 is marked as misclassified. The probability is shown for each instance in the dataset.

Precision, Recall, and F-measure. True positive rate, false positive rate. Precision = TP / (TP + FP). Recall = TP / (TP + FN). F-measure = 2 x Precision x Recall / (Precision + Recall).
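The three measures follow directly from the confusion-matrix counts; a Python sketch with made-up counts (the numbers below are illustrative, not taken from the slides):

```python
# Hypothetical confusion-matrix counts for the positive class.
tp, fp, fn = 8, 2, 1

precision = tp / (tp + fp)   # fraction of positive predictions that are correct
recall = tp / (tp + fn)      # fraction of actual positives that are found
f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(precision, 2), round(recall, 2), round(f_measure, 2))  # 0.8 0.89 0.84
```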

Confusion matrix. TP = True Positive, FP = False Positive, TN = True Negative, FN = False Negative.

Example 2: the Edible Mushrooms dataset. The Edible Mushrooms dataset is derived from the book National Audubon Society Field Guide to North American Mushrooms. It contains descriptions of hypothetical mushroom samples belonging to one of 23 mushroom species. There are 8124 instances in total, with 22 nominal attributes describing the characteristics of the mushrooms, together with a label indicating whether each mushroom is edible. Our goal is to predict whether an unknown mushroom is edible or not.

Recommendations and acknowledgements. Weka Tutorials and Assignments @ The Technology Forge. Link: http://www.technologyforge.net/wekatutorials/

(Anonymous) questionnaire for your critiques, comments, and suggestions: http://goo.gl/cqdp3i

QUESTIONS? NIKOLA MILIKIĆ EMAIL: nikola.milikic@fon.bg.ac.rs URL: http://nikola.milikic.info