Data Mining: Practical Machine Learning Tools and Techniques. Slides for Chapter 4 of Data Mining by I. H. Witten, E. Frank and M. A. Hall


Statistical modeling

Opposite of 1R: use all the attributes. Two assumptions: attributes are equally important and statistically independent (given the class value), i.e., knowing the value of one attribute says nothing about the value of another (if the class is known). The independence assumption is never correct! But this scheme works well in practice.

Probabilities for weather data

(In each pair below, the first fraction is the count for class yes out of the 9 yes days, the second for class no out of the 5 no days.)

Outlook: sunny 2/9, 3/5; overcast 4/9, 0/5; rainy 3/9, 2/5
Temperature: hot 2/9, 2/5; mild 4/9, 2/5; cool 3/9, 1/5
Humidity: high 3/9, 4/5; normal 6/9, 1/5
Windy: false 6/9, 2/5; true 3/9, 3/5
Play: yes 9/14, no 5/14
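Since these fractions are plain frequency counts, they can be tallied with a few lines of code. A minimal sketch, assuming the standard 14-instance weather dataset hard-coded as a list (the attribute names and layout are illustrative):

# Tallying the table above from the 14-instance weather data.
from collections import Counter

data = [  # (outlook, temperature, humidity, windy, play)
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]

attrs = ["outlook", "temperature", "humidity", "windy"]
class_counts = Counter(row[-1] for row in data)      # yes: 9, no: 5
pair_counts = Counter()
for row in data:
    for attr, value in zip(attrs, row):              # skips the play column
        pair_counts[(attr, value, row[-1])] += 1

# e.g. Pr[outlook = sunny | yes] = 2/9
print(pair_counts[("outlook", "sunny", "yes")], "/", class_counts["yes"])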

Probabilities for weather data: a new day

A new day: outlook sunny, temperature cool, humidity high, windy true, play ?

Likelihood of the two classes:
For yes = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For no = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206
Conversion into a probability by normalization:
P(yes) = 0.0053 / (0.0053 + 0.0206) = 0.205
P(no) = 0.0206 / (0.0053 + 0.0206) = 0.795
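A quick sketch of this arithmetic (the fractions are the table entries matching the new day's attribute values; Pr[E] cancels in the normalization):

# Likelihoods for the new day: outlook=sunny, temperature=cool,
# humidity=high, windy=true (fractions from the weather data table).
like_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # ~0.0053
like_no = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)    # ~0.0206

total = like_yes + like_no
print(f"P(yes) = {like_yes / total:.3f}")           # 0.205
print(f"P(no)  = {like_no / total:.3f}")            # 0.795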

Bayes's rule

Probability of event H given evidence E:

Pr[H | E] = Pr[E | H] Pr[H] / Pr[E]

A priori probability of H, Pr[H]: the probability of the event before the evidence is seen.
A posteriori probability of H, Pr[H | E]: the probability of the event after the evidence is seen.

Thomas Bayes. Born: 1702 in London, England. Died: 1761 in Tunbridge Wells, Kent, England.

Naïve Bayes for classification

Classification learning: what's the probability of the class given an instance? Evidence E = instance. Event H = class value for instance. Naïve assumption: evidence splits into parts (i.e. attributes) that are independent:

Pr[H | E] = Pr[E1 | H] Pr[E2 | H] … Pr[En | H] Pr[H] / Pr[E]
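A minimal sketch of this factored rule for nominal attributes. The function and dictionary layout are illustrative, not Weka's API; the usage example reuses the weather-data fractions from above:

def naive_bayes_posteriors(instance, cond_probs, priors):
    """instance: dict attribute -> value;
    cond_probs: dict class -> dict (attribute, value) -> probability;
    priors: dict class -> prior probability."""
    scores = {}
    for cls, prior in priors.items():
        score = prior                       # Pr[H]
        for attr, value in instance.items():
            score *= cond_probs[cls][(attr, value)]   # Pr[Ei | H]
        scores[cls] = score
    total = sum(scores.values())            # Pr[E] cancels out here
    return {cls: s / total for cls, s in scores.items()}

priors = {"yes": 9/14, "no": 5/14}
cond = {
    "yes": {("outlook", "sunny"): 2/9, ("temperature", "cool"): 3/9,
            ("humidity", "high"): 3/9, ("windy", "true"): 3/9},
    "no": {("outlook", "sunny"): 3/5, ("temperature", "cool"): 1/5,
           ("humidity", "high"): 4/5, ("windy", "true"): 3/5},
}
new_day = {"outlook": "sunny", "temperature": "cool",
           "humidity": "high", "windy": "true"}
print(naive_bayes_posteriors(new_day, cond, priors))  # yes ~0.205, no ~0.795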

Weather data example

Evidence E (a new day): outlook sunny, temperature cool, humidity high, windy true, play ?

Pr[yes | E] = Pr[Outlook = sunny | yes] × Pr[Temperature = cool | yes] × Pr[Humidity = high | yes] × Pr[Windy = true | yes] × Pr[yes] / Pr[E]
            = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / Pr[E]

The zero-frequency problem

What if an attribute value doesn't occur with every class value? (e.g. Humidity = high for class yes.) The probability will be zero: Pr[Humidity = high | yes] = 0. The a posteriori probability will also be zero: Pr[yes | E] = 0 (no matter how likely the other values are!). Remedy: add 1 to the count for every attribute value-class combination (the Laplace estimator). Result: probabilities will never be zero! (This also stabilizes probability estimates.)
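A sketch of the Laplace estimator for a single attribute within a single class; the counts below are the Outlook counts for class yes from the weather data:

def laplace_probs(counts):
    """counts: dict value -> raw frequency of that value within one class."""
    k = len(counts)                    # number of distinct attribute values
    total = sum(counts.values())
    return {v: (c + 1) / (total + k) for v, c in counts.items()}

outlook_yes = {"sunny": 2, "overcast": 4, "rainy": 3}
print(laplace_probs(outlook_yes))      # sunny 3/12, overcast 5/12, rainy 4/12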

Missing values

Training: the instance is not included in the frequency count for that attribute value-class combination. Classification: the attribute is omitted from the calculation. Example (Outlook missing): outlook ?, temperature cool, humidity high, windy true, play ?

Likelihood of yes = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238
Likelihood of no = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343
P(yes) = 0.0238 / (0.0238 + 0.0343) = 41%
P(no) = 0.0343 / (0.0238 + 0.0343) = 59%
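A sketch of the classification-time rule, where a missing value simply drops its factor from the product (names and layout are illustrative):

def class_likelihood(instance, cond_probs, prior):
    score = prior
    for key, value in instance.items():
        if value is None:              # missing: omit this attribute's factor
            continue
        score *= cond_probs[(key, value)]
    return score

day = {"outlook": None, "temperature": "cool",
       "humidity": "high", "windy": "true"}
yes = class_likelihood(day, {("temperature", "cool"): 3/9,
                             ("humidity", "high"): 3/9,
                             ("windy", "true"): 3/9}, 9/14)    # ~0.0238
no = class_likelihood(day, {("temperature", "cool"): 1/5,
                            ("humidity", "high"): 4/5,
                            ("windy", "true"): 3/5}, 5/14)     # ~0.0343
print(round(yes / (yes + no), 2), round(no / (yes + no), 2))   # 0.41 0.59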

Numeric attributes

Usual assumption: attributes have a normal or Gaussian probability distribution (given the class). The probability density function for the normal distribution is defined by two parameters:

Sample mean: μ = (1/n) Σᵢ xᵢ
Standard deviation: σ = sqrt( (1/(n−1)) Σᵢ (xᵢ − μ)² )

Then the density function f(x) is

f(x) = 1/(√(2π) σ) · e^(−(x − μ)² / (2σ²))
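These formulas translate directly into code. A minimal sketch using only the standard library (function names are illustrative):

import math

def mean(values):
    return sum(values) / len(values)

def stdev(values):                     # n - 1 in the denominator, as above
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

def density(x, mu, sigma):             # f(x) for the normal distribution
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)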

Statistics for weather data

Temperature (yes): 83, 70, 68, 64, 69, 75, 75, 72, 81; μ = 73, σ = 6.2
Temperature (no): 85, 80, 65, 72, 71; μ = 75, σ = 7.9
Humidity (yes): 86, 96, 80, 65, 70, 80, 70, 90, 75; μ = 79.1, σ = 10.2
Humidity (no): 85, 90, 70, 95, 91; μ = 86.2, σ = 9.7

(The nominal attributes keep their fractions from the earlier table, e.g. Windy: false 6/9, 2/5; true 3/9, 3/5; Play: 9/14, 5/14.)

Example density value:

f(temperature = 66 | yes) = 1/(√(2π) · 6.2) · e^(−(66 − 73)² / (2 · 6.2²)) = 0.0340
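Plugging the nine yes-day temperature values into the formulas above reproduces the tabulated statistics and the example density value:

import math

temps_yes = [83, 70, 68, 64, 69, 75, 75, 72, 81]   # temperature on the 9 yes days
mu = sum(temps_yes) / len(temps_yes)                                 # 73.0
var = sum((t - mu) ** 2 for t in temps_yes) / (len(temps_yes) - 1)   # 38.0
sigma = math.sqrt(var)                                               # ~6.16, i.e. 6.2
f66 = math.exp(-(66 - mu) ** 2 / (2 * var)) / (math.sqrt(2 * math.pi) * sigma)
print(round(f66, 4))                                                 # 0.034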

Classifying a new day

A new day: outlook sunny, temperature 66, humidity 90, windy true, play ?

Likelihood of yes = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
Likelihood of no = 3/5 × 0.0221 × 0.0381 × 3/5 × 5/14 = 0.000108
P(yes) = 0.000036 / (0.000036 + 0.000108) = 25%
P(no) = 0.000108 / (0.000036 + 0.000108) = 75%

Missing values during training are not included in the calculation of the mean and standard deviation.
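A sketch of this final calculation, reusing the slide's tabulated fractions and density values as constants:

# Nominal attributes contribute table fractions; numeric attributes
# contribute the Gaussian density values quoted above.
like_yes = (2/9) * 0.0340 * 0.0221 * (3/9) * (9/14)   # ~0.000036
like_no = (3/5) * 0.0221 * 0.0381 * (3/5) * (5/14)    # ~0.000108

total = like_yes + like_no
print(f"P(yes) = {like_yes / total:.0%}")             # 25%
print(f"P(no)  = {like_no / total:.0%}")              # 75%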