Generative Model (Naïve Bayes, LDA)

1 Generative Model (Naïve Bayes, LDA) IST557 Data Mining: Techniques and Applications Jessie Li, Penn State University Materials from Prof. Jia Li, the statistical learning book (Hastie et al.), and the machine learning book (Murphy)

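2 Review: Bayesian Statistics
posterior ∝ likelihood × prior: $p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$
Prior: head and tail probabilities are close to 0.5 and 0.5, $\mathrm{Beta}(\alpha_H, \alpha_T)$
Likelihood: observe 3 heads, 2 tails, guess 3/5, $\theta^{H} (1 - \theta)^{T}$
Posterior (final guess): $\mathrm{Beta}(\alpha_H + H, \alpha_T + T) \propto \theta^{H} (1 - \theta)^{T}\, \mathrm{Beta}(\alpha_H, \alpha_T)$
Maximum a posteriori (MAP) maximizes the posterior; maximum likelihood (MLE) maximizes the likelihood.
With a uniform prior, MLE = MAP.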
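The coin example can be sketched in a few lines. This is an illustration only, assuming a Beta prior whose pseudo-counts are called `alpha_h` and `alpha_t` (names chosen here) and using the mode of the Beta posterior as the MAP estimate. The strong prior `Beta(50, 50)` is a hypothetical choice that encodes "close to 0.5".

```python
def mle(heads, tails):
    """Maximum-likelihood estimate of P(head) from coin-flip counts."""
    return heads / (heads + tails)


def map_estimate(heads, tails, alpha_h, alpha_t):
    """Mode of the Beta(alpha_h + heads, alpha_t + tails) posterior.

    The mode formula is valid when both posterior parameters exceed 1.
    """
    a = alpha_h + heads
    b = alpha_t + tails
    return (a - 1) / (a + b - 2)


heads, tails = 3, 2
theta_mle = mle(heads, tails)
theta_map_uniform = map_estimate(heads, tails, 1, 1)    # Beta(1, 1) is the uniform prior
theta_map_strong = map_estimate(heads, tails, 50, 50)   # hypothetical prior centered on 0.5
print(theta_mle, theta_map_uniform, theta_map_strong)
```

With the uniform prior the two estimates coincide at 3/5, while the strong prior pulls the MAP estimate back toward 0.5.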

3 Review: Bayesian Statistics
posterior ∝ likelihood × prior: $p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$
Maximum a posteriori (MAP), maximum likelihood (MLE)
Discriminative models: logistic regression, SVM, decision tree, KNN
Generative models: naïve Bayes classifier, linear discriminant analysis, mixture model, hidden Markov model

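4 Review: Bayes' Rule
$p(\theta \mid D) = \dfrac{p(D \mid \theta)\, p(\theta)}{p(D)}$, where $D$ is the data and $\theta$ is the model.
Different notations:
In most materials: $p(y \mid x) = \dfrac{p(x \mid y)\, p(y)}{p(x)}$
In today's lecture: $p(G \mid X) = \dfrac{p(X \mid G)\, p(G)}{p(X)}$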

5 Naïve Bayes Classifier

6 Naïve Bayes Classifier
A naïve assumption: attributes are conditionally independent given the class (i.e., no dependence relation between attributes).
$p(x \mid G) = p(x_1 \mid G)\, p(x_2 \mid G) \cdots p(x_k \mid G)$

7 Naïve Bayes Classifier: Example
Class: buys_computer = yes, buys_computer = no
Data to be classified: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age      income  student  credit_rating  buys_computer
<=30     high    no       fair           no
<=30     high    no       excellent      no
31...40  high    no       fair           yes
>40      medium  no       fair           yes
>40      low     yes      fair           yes
>40      low     yes      excellent      no
31...40  low     yes      excellent      yes
<=30     medium  no       fair           no
<=30     low     yes      fair           yes
>40      medium  yes      fair           yes
<=30     medium  yes      excellent      yes
31...40  medium  no       excellent      yes
31...40  high    yes      fair           yes
>40      medium  no       excellent      no

8 Naïve Bayes Classifier: Example
Class: yes or no
Data to be classified: X = (age <= 30, income = medium, student = yes, credit_rating = fair)
p(yes | X) ∝ p(X | yes) p(yes)
p(X | yes) = p(age <= 30 | yes) p(income = medium | yes) p(student = yes | yes) p(credit = fair | yes)
p(no | X) ∝ p(X | no) p(no)
Prediction: is p(yes | X) > p(no | X)?

9 Naïve Bayes Classifier: Example
Data: X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Prior: P(yes) = 9/14 = 0.643
Likelihood:
P(age <= 30 | yes) = 2/9 = 0.222
P(income = medium | yes) = 4/9 = 0.444
P(student = yes | yes) = 6/9 = 0.667
P(credit_rating = fair | yes) = 6/9 = 0.667
P(X | yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
Posterior (unnormalized): P(X | yes) * P(yes) = 0.044 x 0.643 = 0.028

10 Naïve Bayes Classifier: Example
Data: X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Prior: P(no) = 5/14 = 0.357
Likelihood:
P(age <= 30 | no) = 3/5 = 0.6
P(income = medium | no) = 2/5 = 0.4
P(student = yes | no) = 1/5 = 0.2
P(credit_rating = fair | no) = 2/5 = 0.4
P(X | no) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
Posterior (unnormalized): P(X | no) * P(no) = 0.019 x 0.357 = 0.007

11 Naïve Bayes Classifier: Example
Data: X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Posterior: P(yes | X) ∝ 0.028, P(no | X) ∝ 0.007
P(yes | X) > P(no | X) → Yes
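The worked example can be reproduced by counting. The sketch below is one possible implementation, not the lecture's own code. The function names `train` and `score` are chosen here, and the middle age bracket is written `31...40` as in the textbook version of this table.

```python
from collections import Counter, defaultdict

# Training data: (age, income, student, credit_rating, buys_computer)
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31...40", "medium", "no", "excellent", "yes"),
    ("31...40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def train(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    feature_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    for *features, label in rows:
        for i, value in enumerate(features):
            feature_counts[(label, i)][value] += 1
    return class_counts, feature_counts


def score(x, label, class_counts, feature_counts):
    """Unnormalized posterior p(x | label) * p(label), without smoothing."""
    result = class_counts[label] / sum(class_counts.values())
    for i, value in enumerate(x):
        result *= feature_counts[(label, i)][value] / class_counts[label]
    return result


class_counts, feature_counts = train(ROWS)
x = ("<=30", "medium", "yes", "fair")
scores = {c: score(x, c, class_counts, feature_counts) for c in class_counts}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

Dividing each score by the sum of both scores would give normalized posteriors, but the comparison and the prediction do not change.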

12 Avoiding the Zero-Probability Problem
Naïve Bayesian prediction requires each conditional probability to be nonzero; otherwise the predicted probability is zero:
$p(x \mid G) = p(x_1 \mid G)\, p(x_2 \mid G) \cdots p(x_k \mid G)$
Example: suppose a dataset has 1000 tuples, with income = low (0), income = medium (990), and income = high (10).
Use the Laplacian correction (or Laplacian estimator): add 1 to each case.
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
The corrected probability estimates are close to their uncorrected counterparts.
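A minimal sketch of the correction for the income example. The helper name `laplace_estimates` and the parameter `k` (the pseudo-count added to each case) are chosen here for illustration.

```python
def laplace_estimates(counts, k=1):
    """Add k to every count, then renormalize so the estimates sum to 1."""
    total = sum(counts.values()) + k * len(counts)
    return {value: (count + k) / total for value, count in counts.items()}


income_counts = {"low": 0, "medium": 990, "high": 10}
smoothed = laplace_estimates(income_counts)
print(smoothed)
```

The denominator grows by the number of distinct values (3 here), which is why the estimates are taken over 1003 rather than 1000.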

13 Naïve Bayes Classifier: Continuous Example
Class: male or female
Data: X = (height = 6, weight = 130, foot = 8)

sex     height (ft)    weight (lbs)  foot size (in)
male    6              180           12
male    5.92 (5'11")   190           11
male    5.58 (5'7")    170           12
male    5.92 (5'11")   165           10
female  5              100           6
female  5.5 (5'6")     150           8
female  5.42 (5'5")    130           7
female  5.75 (5'9")    150           9

P(male | X) ∝ P(X | male) P(male) = P(height = 6 | male) P(weight = 130 | male) P(foot = 8 | male) P(male)
P(male) = 0.5
P(height = 6 | male) = ???
Source: Wikipedia

14 Naïve Bayes Classifier: Continuous Example
Class: male or female
Data: X = (height = 6, weight = 130, foot = 8)
P(height = 6 | male) = ???
Use a Gaussian distribution to estimate the likelihood of continuous data.

sex     mean (height)  variance (height)  mean (weight)  variance (weight)  mean (foot size)  variance (foot size)
male    5.855          3.5033E-02         176.25         1.2292E+02         11.25             9.1667E-01
female  5.4175         9.7225E-02         132.5          5.5833E+02         7.5               1.6667E+00

$P(\text{height} = 6 \mid \text{male}) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\dfrac{(6 - \mu)^2}{2\sigma^2}\right) \approx 1.5789$, with $\mu = 5.855$ and $\sigma^2 = 3.5033 \times 10^{-2}$
Source: Wikipedia

15 Naïve Bayes Classifier: Continuous Example
Class: male or female
Data: X = (height = 6, weight = 130, foot = 8)
Source: Wikipedia
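The continuous example can be completed with a short script. This sketch assumes the training values of the Wikipedia example cited on the slides, fits one Gaussian per feature and class using the sample variance, and compares unnormalized posteriors. The names `gaussian_pdf` and `unnormalized_posterior` are chosen here.

```python
import math
from statistics import mean, variance  # variance() is the sample variance (divides by n - 1)

# (height in ft, weight in lbs, foot size in inches), values as in the Wikipedia example
DATA = {
    "male": [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}
PRIOR = {"male": 0.5, "female": 0.5}


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def unnormalized_posterior(x, sex):
    """P(sex) times the product of per-feature Gaussian likelihoods."""
    columns = list(zip(*DATA[sex]))  # one tuple of training values per feature
    result = PRIOR[sex]
    for value, column in zip(x, columns):
        result *= gaussian_pdf(value, mean(column), variance(column))
    return result


x = (6, 130, 8)
scores = {sex: unnormalized_posterior(x, sex) for sex in DATA}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

A density value such as P(height = 6 | male) can exceed 1; it is a density, not a probability, and only the comparison between classes matters.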

16 Linear/Quadratic Discriminant Analysis (LDA/QDA)

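17 Linear Discriminant Analysis (LDA)
$\Pr(G = k \mid X = x) \propto \Pr(X = x \mid G = k)\, \Pr(G = k)$; denote the class density as $f_k(x)$ and the class prior as $\pi_k$.
Multivariate Gaussian: $f_k(x) = \dfrac{1}{(2\pi)^{p/2} |\Sigma_k|^{1/2}} \exp\!\left(-\tfrac{1}{2} (x - \mu_k)^{T} \Sigma_k^{-1} (x - \mu_k)\right)$
The formula above, with a separate $\Sigma_k$ per class, is quadratic discriminant analysis (QDA).
Linear discriminant analysis (LDA): $\Sigma_k = \Sigma$ for all $k$.
The mean defines the location and the covariance defines the shape; LDA uses the same shape for all classes (shifted versions).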

18 LDA: Class Density
Three classes with the same shape and different means.

19 LDA: Class Density

20 Solve Optimization

21 Solve Optimization
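One way to write out the optimization is sketched below. It follows the standard derivation in Hastie et al. It assumes Gaussian class densities $f_k$ with means $\mu_k$, priors $\pi_k$, and a shared covariance $\Sigma$. The symbol $\delta_k$ for the discriminant score is introduced here.

```latex
\begin{align*}
\hat{G}(x) &= \arg\max_k \; \Pr(G = k \mid X = x) \\
           &= \arg\max_k \; f_k(x)\, \pi_k \\
           &= \arg\max_k \; \left[ \log f_k(x) + \log \pi_k \right] \\
           &= \arg\max_k \; \left[ -\tfrac{1}{2} (x - \mu_k)^{T} \Sigma^{-1} (x - \mu_k) + \log \pi_k \right] \\
           &= \arg\max_k \; \left[ x^{T} \Sigma^{-1} \mu_k - \tfrac{1}{2} \mu_k^{T} \Sigma^{-1} \mu_k + \log \pi_k \right] \\
           &= \arg\max_k \; \delta_k(x)
\end{align*}
```

The fourth line drops the normalizing constant of the Gaussian, which is identical for every class when $\Sigma$ is shared. The fifth line drops $-\tfrac{1}{2} x^{T} \Sigma^{-1} x$ for the same reason. What remains is linear in $x$, which is why the decision boundaries are linear.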

22 Example: Binary Classification

23 Example: Binary Classification

24 Murphy, Figure 4.5, discrimanalysisdboundariesdemo

25 Estimate Gaussian Distribution
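Estimating the Gaussian parameters and classifying with them can be sketched as follows. This assumes the usual plug-in estimates (class proportions, class means, and a pooled covariance) and uses synthetic two-class data generated only for illustration. The function names `fit_lda` and `predict_lda` are chosen here.

```python
import numpy as np


def fit_lda(X, y):
    """Estimate class priors, class means, and the pooled covariance matrix."""
    classes = np.unique(y)
    n, p = X.shape
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    pooled = np.zeros((p, p))
    for k, mu in zip(classes, means):
        centered = X[y == k] - mu
        pooled += centered.T @ centered
    pooled /= n - len(classes)
    return classes, priors, means, pooled


def predict_lda(X, classes, priors, means, pooled):
    """Assign each row to the class with the largest linear discriminant score."""
    inv_mu = np.linalg.solve(pooled, means.T)  # column k holds Sigma^{-1} mu_k
    scores = X @ inv_mu - 0.5 * np.sum(means.T * inv_mu, axis=0) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]


# Synthetic data (hypothetical): two classes sharing one covariance matrix
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], cov, size=200),
    rng.multivariate_normal([3.0, 3.0], cov, size=200),
])
y = np.repeat([0, 1], 200)

params = fit_lda(X, y)
accuracy = np.mean(predict_lda(X, *params) == y)
print(accuracy)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of the covariance matrix explicitly.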

26 Diabetes Example
R script for this example: https://onlinecourses.science.psu.edu/stat857/node/135

27 Diabetes Example: PCA

28 PCA: Principal Component Analysis
PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. It is frequently used for dimension reduction.
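As an illustration of the definition, the sketch below computes principal components from the singular value decomposition of the centered data. The function name `pca` and the synthetic two-variable data are assumptions made here; the lecture's diabetes example uses an R script instead.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its leading principal components via SVD of the centered data."""
    centered = X - X.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]  # rows are orthonormal directions
    explained_variance = singular_values[:n_components] ** 2 / (len(X) - 1)
    return centered @ components.T, components, explained_variance


# Synthetic data (hypothetical): two strongly correlated variables
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X = np.hstack([latent, 2 * latent]) + 0.1 * rng.normal(size=(500, 2))

scores, components, explained_variance = pca(X, n_components=2)
score_cov = np.cov(scores, rowvar=False)
print(explained_variance)
print(score_cov)
```

The off-diagonal entries of `score_cov` are zero up to rounding, which is the "linearly uncorrelated" property. Keeping only the first column of `scores` would be the dimension-reduction step.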

29 Diabetes Example

30 Example: Diabetes Data - LDA

31 Diabetes Example: LDA Decision Boundary

32 Diabetes Example: Evaluation
Error rate = (# of samples with a wrong prediction) / (# of samples)
Sensitivity = (# of samples predicted positive whose truth is positive) / (# of samples whose truth is positive)
Specificity = (# of samples predicted negative whose truth is negative) / (# of samples whose truth is negative)
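The three measures can be computed from a confusion count. The sketch below uses the standard definitions: sensitivity is the fraction of truly positive samples that are predicted positive, and specificity is the fraction of truly negative samples that are predicted negative. The label vectors are a toy example made up here, not the diabetes data.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return error rate, sensitivity, and specificity for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return {
        "error_rate": (fp + fn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels (hypothetical): 4 positives and 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```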

33 Quadratic Discriminant Analysis (QDA)
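A minimal QDA sketch, assuming one Gaussian per class with its own covariance matrix and scoring each class by its log density plus log prior. The names `fit_qda` and `predict_qda` are chosen here. The synthetic data are hypothetical and deliberately share a mean, so that only the covariance separates the classes.

```python
import numpy as np


def fit_qda(X, y):
    """Estimate priors, means, and a separate covariance matrix for each class."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = [X[y == k].mean(axis=0) for k in classes]
    covs = [np.cov(X[y == k], rowvar=False) for k in classes]
    return classes, priors, means, covs


def predict_qda(X, classes, priors, means, covs):
    """Assign each row to the class with the largest quadratic discriminant score."""
    scores = []
    for prior, mu, cov in zip(priors, means, covs):
        diff = X - mu
        mahalanobis = np.sum((diff @ np.linalg.inv(cov)) * diff, axis=1)
        _, logdet = np.linalg.slogdet(cov)
        scores.append(-0.5 * logdet - 0.5 * mahalanobis + np.log(prior))
    return classes[np.argmax(np.column_stack(scores), axis=1)]


# Synthetic data (hypothetical): same mean, different spread
rng = np.random.default_rng(0)
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], np.eye(2), size=300),
    rng.multivariate_normal([0.0, 0.0], 9.0 * np.eye(2), size=300),
])
y = np.repeat([0, 1], 300)

params = fit_qda(X, y)
accuracy = np.mean(predict_qda(X, *params) == y)
print(accuracy)
```

Because the two classes share a mean, no linear boundary can separate them; the quadratic score yields a roughly circular boundary around the origin.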

34 Diabetes Example: QDA

35 Diabetes Example: QDA Decision Boundary

36 Diabetes Example: QDA Evaluation

      Error rate  Sensitivity  Specificity
LDA   28.26%      45.90%       85.60%
QDA   29.04%      45.90%       84.40%

The sensitivity of QDA is the same as that of LDA, but its specificity is lower.

37 Summary
Generative model: assume the data are generated by some model, $P(D \mid \theta)$
Naïve Bayes classifier: features are conditionally independent given the class; handles categorical features and continuous features
Linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA): multivariate Gaussian class densities
Difference between LDA and QDA: LDA assumes the covariance matrix is the same across classes
