Data Mining: Concepts and Techniques (3rd ed.), Chapter 8: Classification: Basic Concepts


Chapter 8 covers the following topics:

- Classification: Basic Concepts
- Decision Tree Induction
- Bayes Classification Methods
- Rule-Based Classification
- Model Evaluation and Selection
- Techniques to Improve Classification Accuracy: Ensemble Methods
- Summary

Bayesian Classification: Why?

A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities.

Foundation: based on Bayes' theorem.

Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers.

Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data.

Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.

Bayes' Theorem: Basics

Total probability theorem: P(B) = Σ_{i=1}^{M} P(B|A_i) P(A_i)

Bayes' theorem: P(H|X) = P(X|H) P(H) / P(X)

Let X be a data sample ("evidence"); its class label is unknown. Let H be the hypothesis that X belongs to class C. Classification is to determine P(H|X), the posterior probability: the probability that the hypothesis holds given the observed data sample X.

P(H) is the prior probability: the initial probability, e.g., that X will buy a computer, regardless of age, income, etc. P(X) is the probability that the sample data is observed. P(X|H) is the likelihood: the probability of observing the sample X given that the hypothesis holds, e.g., given that X will buy a computer, the probability that X is aged 31..40 with medium income.
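A minimal Python sketch (not from the textbook) checks these two formulas numerically, using the P(C_i) and P(X|C_i) values worked out in the buys_computer example later in this chapter:

```python
# Numeric check of the total probability theorem and Bayes' theorem,
# with the buys_computer figures computed later in this chapter.
p_h = 0.643              # prior P(H): P(buys_computer = yes)
p_x_given_h = 0.044      # likelihood P(X|H)
p_x_given_not_h = 0.019  # likelihood P(X|not H)

# Total probability theorem: P(X) = P(X|H)P(H) + P(X|not H)P(not H)
p_x = p_x_given_h * p_h + p_x_given_not_h * (1 - p_h)

# Bayes' theorem: P(H|X) = P(X|H)P(H) / P(X)
print(round(p_x_given_h * p_h / p_x, 3))  # 0.807
```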

Prediction Based on Bayes' Theorem

Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

P(H|X) = P(X|H) P(H) / P(X)

Informally, this can be viewed as: posterior = likelihood × prior / evidence.

Predict that X belongs to class C_i iff the probability P(C_i|X) is the highest among all the P(C_k|X) for the k classes.

Practical difficulty: it requires initial knowledge of many probabilities, involving significant computational cost.

Classification Is to Derive the Maximum Posteriori

Let D be a training set of tuples and their associated class labels, with each tuple represented by an n-dimensional attribute vector X = (x_1, x_2, ..., x_n). Suppose there are m classes C_1, C_2, ..., C_m. Classification is to derive the maximum posteriori, i.e., the maximal P(C_i|X). This can be derived from Bayes' theorem:

P(C_i|X) = P(X|C_i) P(C_i) / P(X)

Since P(X) is constant for all classes, only P(X|C_i) P(C_i) needs to be maximized.
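As a sketch of this decision rule (illustrative names, assuming per-class priors and likelihoods have already been estimated):

```python
# MAP decision rule: P(X) is constant across classes, so it suffices
# to maximize P(X|Ci) * P(Ci).
def map_classify(priors, likelihoods):
    """priors: {class: P(Ci)}; likelihoods: {class: P(X|Ci)} for a fixed X."""
    return max(priors, key=lambda c: likelihoods[c] * priors[c])

# With the buys_computer numbers computed later in this chapter:
print(map_classify({"yes": 0.643, "no": 0.357},
                   {"yes": 0.044, "no": 0.019}))  # -> yes
```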

Naïve Bayes Classifier

A simplifying assumption: attributes are conditionally independent given the class (i.e., there is no dependence relation between attributes):

P(X|C_i) = ∏_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) × P(x_2|C_i) × ... × P(x_n|C_i)

This greatly reduces the computation cost: only the class distribution needs to be counted.

If A_k is categorical, P(x_k|C_i) is the number of tuples in C_i having value x_k for A_k, divided by |C_{i,D}| (the number of tuples of C_i in D).

If A_k is continuous-valued, P(x_k|C_i) is usually computed from a Gaussian distribution with mean μ and standard deviation σ:

g(x, μ, σ) = (1 / (√(2π) σ)) e^(−(x−μ)² / (2σ²))

so that P(x_k|C_i) = g(x_k, μ_{C_i}, σ_{C_i}).

Naïve Bayes Classifier: Training Dataset

Classes: C_1: buys_computer = yes; C_2: buys_computer = no.

Data to be classified: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age     income   student  credit_rating  buys_computer
<=30    high     no       fair           no
<=30    high     no       excellent      no
31..40  high     no       fair           yes
>40     medium   no       fair           yes
>40     low      yes      fair           yes
>40     low      yes      excellent      no
31..40  low      yes      excellent      yes
<=30    medium   no       fair           no
<=30    low      yes      fair           yes
>40     medium   yes      fair           yes
<=30    medium   yes      excellent      yes
31..40  medium   no       excellent      yes
31..40  high     yes      fair           yes
>40     medium   no       excellent      no
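A minimal training sketch for the categorical case, which just counts as the formulas above prescribe (function and variable names are illustrative, and no zero-probability smoothing is applied; see below):

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate P(Ci) by class frequency and P(xk|Ci) by per-class counts."""
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    class_size = Counter(labels)
    cond = defaultdict(int)  # (attribute index, value, class) -> count
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            cond[(k, v, c)] += 1

    def likelihood(x, c):  # P(X|Ci) = product over k of P(xk|Ci)
        p = 1.0
        for k, v in enumerate(x):
            p *= cond[(k, v, c)] / class_size[c]
        return p

    return priors, likelihood

def classify(x, priors, likelihood):
    return max(priors, key=lambda c: likelihood(x, c) * priors[c])
```

Fed the 14 training tuples above, classify on X = ('<=30', 'medium', 'yes', 'fair') reproduces the yes decision worked out on the next slide.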

Naïve Bayes Classifier: An Example

Using the training dataset above:

P(C_i):
P(buys_computer = yes) = 9/14 = 0.643
P(buys_computer = no) = 5/14 = 0.357

Compute P(x_k|C_i) for each class:
P(age <= 30 | buys_computer = yes) = 2/9 = 0.222
P(age <= 30 | buys_computer = no) = 3/5 = 0.6
P(income = medium | buys_computer = yes) = 4/9 = 0.444
P(income = medium | buys_computer = no) = 2/5 = 0.4
P(student = yes | buys_computer = yes) = 6/9 = 0.667
P(student = yes | buys_computer = no) = 1/5 = 0.2
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.4

For X = (age <= 30, income = medium, student = yes, credit_rating = fair):

P(X|C_i):
P(X | buys_computer = yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer = no) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

P(X|C_i) × P(C_i):
P(X | buys_computer = yes) × P(buys_computer = yes) = 0.028
P(X | buys_computer = no) × P(buys_computer = no) = 0.007

Therefore, X belongs to the class buys_computer = yes.

Avoiding the Zero-Probability Problem

Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise, the predicted probability P(X|C_i) = ∏_{k=1}^{n} P(x_k|C_i) will be zero.

Example: suppose a dataset with 1000 tuples, where income = low occurs in 0 tuples, income = medium in 990, and income = high in 10.

Use the Laplacian correction (or Laplacian estimator): add 1 to each case:
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003

The "corrected" probability estimates are close to their "uncorrected" counterparts.
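A short sketch of that correction: adding 1 to each of the three value counts adds 3 to the denominator, giving the 1003 figures above:

```python
# Laplacian correction for the income example: add 1 to each value count,
# so the denominator grows by the number of distinct values (3 here).
counts = {"low": 0, "medium": 990, "high": 10}  # 1000 tuples in total
total = sum(counts.values())
corrected = {v: (n + 1) / (total + len(counts)) for v, n in counts.items()}
print(corrected)  # low: 1/1003, medium: 991/1003, high: 11/1003
```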

Naïve Bayes Classifier: Comments

Advantages:
- Easy to implement.
- Good results obtained in most of the cases.

Disadvantages:
- The class conditional independence assumption causes a loss of accuracy, because in practice dependencies exist among variables. E.g., in hospital data, a patient's profile (age, family history, etc.), symptoms (fever, cough, etc.), and diseases (lung cancer, diabetes, etc.) are interdependent, and these dependencies cannot be modeled by the naïve Bayes classifier.

How to deal with these dependencies? Bayesian belief networks (Chapter 9).