
1 Visit The Chemical Statistician! http://chemicalstatistician.wordpress.com/

2 Follow Me On

3 I am a new guest blogger for the JMP Blog!

4 Discriminant Analysis in JMP and SAS. Eric Cai, M.Sc., Statistician, Predictum Inc.

5 A Marketing Survey: Will it last a long time? (Durability) Does it work well? (Performance) Will I buy this new toaster? [toaster images by RRZEicons and by Peng and Rainer Zenz, Wikimedia]

6 Survey Results

    Durability   Performance   Buy Toaster?
    5            6             Yes
    7            4             No
    8            9             Yes
    4            5             No
    6            7             Yes

7 Scatter Plot of Survey Results [figure: Performance vs. Durability scatter plot of the five survey responses]

8 Is Durability a Good Discriminant? [figure: Performance vs. Durability with a split on Durability alone; two points fall on the wrong side and are labeled Misclassified]

9 Is Performance a Good Discriminant? [figure: Performance vs. Durability with a split on Performance alone; two points fall on the wrong side and are labeled Misclassified]

10 Who will buy the toaster? Durability alone is not a perfect predictor. Performance alone is not a perfect predictor. Can we combine Durability and Performance into a really good predictor?

11 A Perfect Linear Discriminant: D = 0.79*Durability + …*Performance [figure: Performance vs. Durability with the separating line drawn]

12 Discriminant Analysis: a predictive modelling technique used for classification. Target variable: categorical. Predictor variables: continuous.

13 Machine Learning
    Supervised Learning: use inputs to predict targets
        Classification: the target variable is categorical or discrete (Discriminant Analysis lives here)
        Regression: the target variable is continuous
    Unsupervised Learning: finding patterns among unlabeled data
        Clustering: group data into categories based on the data's own patterns
        Density Estimation: estimate an underlying probability distribution function
        Dimensional Reduction: reduce the number of random variables being considered while preserving information in the variables

14 How Does Discriminant Analysis Work? Toaster example: a binary target with 2 classes, Yes or No. For each observation, given its predictor values, find the conditional probability of each class: P(Yes | Durability_i, Performance_i) and P(No | Durability_i, Performance_i). Pick the class with the highest conditional probability.

15 Will the 3rd customer buy the toaster? P(Yes | Durability_3 = 7) = 0.65; P(No | Durability_3 = 7) = 0.35 [figure: probability vs. Durability at Durability_3 = 7]

16 Prediction: the 3rd customer will buy the toaster. P(Yes | Durability_3 = 7) = 0.65; P(No | Durability_3 = 7) = 0.35 [figure: probability vs. Durability at Durability_3 = 7]

17 Will the 5th customer buy the toaster? P(No | Durability_5 = 2) = 0.80; P(Yes | Durability_5 = 2) = 0.20 [figure: probability vs. Durability at Durability_5 = 2]

18 Prediction: the 5th customer will not buy the toaster. P(No | Durability_5 = 2) = 0.80; P(Yes | Durability_5 = 2) = 0.20 [figure: probability vs. Durability at Durability_5 = 2]
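To make the decision rule from slides 14 through 18 concrete, here is a minimal Python sketch that applies it to the two customers above, using the posterior probabilities quoted on the slides. The dictionary is just scaffolding for illustration; how those probabilities are estimated is the subject of the next slides.

```python
# Decision rule: pick the class with the highest conditional probability.
# The posterior values are the ones quoted on slides 15-18.
posteriors = {
    "3rd customer": {"Yes": 0.65, "No": 0.35},
    "5th customer": {"Yes": 0.20, "No": 0.80},
}

for customer, probs in posteriors.items():
    prediction = max(probs, key=probs.get)  # class with the highest posterior
    print(f"{customer}: predict {prediction} (P = {probs[prediction]:.2f})")
```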

19 Discriminant

    Durability   Performance   Buy Toaster?   P(No|X)   P(Yes|X)   Prediction (Buy?)
    5            6             Yes            …         …          ?
    7            4             No             …         …          ?
    8            9             Yes            …         …          ?
    4            5             No             …         …          ?
    6            7             Yes            …         …          ?

20 Discriminant

    Durability   Performance   Buy Toaster?   P(No|X)   P(Yes|X)   Prediction (Buy?)
    5            6             Yes            …         …          Yes
    7            4             No             …         …          No
    8            9             Yes            …         …          No
    4            5             No             …         …          Yes
    6            7             Yes            …         …          Yes

The 3rd and 4th customers were misclassified by my discriminant.

21 How Does Discriminant Analysis Work? How do we get these probabilities? Goal: estimate the following probabilities: P(Yes | Durability_i, Performance_i) and P(No | Durability_i, Performance_i).

22 How do we get these probabilities? Bayes' Rule for a binary Y and continuous X:

    P(Y=1 | X=x) = P(X=x, Y=1) / P(X=x)
                 = P(X=x | Y=1) P(Y=1) / [ P(X=x, Y=1) + P(X=x, Y=0) ]
                 = P(X=x | Y=1) P(Y=1) / [ P(X=x | Y=1) P(Y=1) + P(X=x | Y=0) P(Y=0) ]

23 Bayes' Rule

    P(Y=1 | X=x) = P(X=x | Y=1) P(Y=1) / [ P(X=x | Y=1) P(Y=1) + P(X=x | Y=0) P(Y=0) ]

We need a way to model the distributions P(X=x | Y=1) and P(X=x | Y=0), and a way to estimate the prior probabilities P(Y=1) and P(Y=0).
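As a worked illustration of this formula (not taken from the slides): assume Gaussian class-conditional densities with made-up means and standard deviations, equal priors, and compute the posterior for one Durability rating.

```python
from scipy.stats import norm

# Hypothetical class-conditional parameters and priors, for illustration
# only; the slides estimate these quantities from the data.
mu_yes, sd_yes = 6.3, 1.2   # X | Y = Yes ~ Normal(mu_yes, sd_yes)
mu_no,  sd_no  = 5.0, 1.2   # X | Y = No  ~ Normal(mu_no,  sd_no)
p_yes, p_no = 0.5, 0.5      # equal priors, as assumed so far

x = 7.0  # a Durability rating

# Numerators of Bayes' rule: P(X = x | Y) * P(Y)
num_yes = norm.pdf(x, mu_yes, sd_yes) * p_yes
num_no  = norm.pdf(x, mu_no,  sd_no)  * p_no

# Divide by the total to get the posterior probability of Yes
posterior_yes = num_yes / (num_yes + num_no)
print(f"P(Yes | Durability = {x}) = {posterior_yes:.2f}")
```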

24 The Assumptions of Discriminant Analysis (specifically, Gaussian Discriminant Analysis): assume that X | Y=1 and X | Y=0 have normal (Gaussian) distributions. *Yes, there are other ways to do discriminant analysis: Fisher's Discriminant Analysis and Non-Parametric Discriminant Analysis.

25 Assume that X | Y=Yes and X | Y=No have normal (Gaussian) distributions. [figure: two normal curves, No and Yes, along the Durability axis]

26 An observation is assigned to the class whose mean is closest to it. Will Ron buy the toaster? [figure: the No and Yes curves along Durability, with Ron's rating marked between the two means]

27 An observation is assigned to the class whose mean is closest to it. Prediction: Ron will buy the toaster. His rating on durability is closer to the Yes class. [figure: as on the previous slide]
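A small sketch of this nearest-class-mean rule, using the Durability column of the survey table from slide 6 (Yes buyers rated 5, 8, 6; No buyers rated 7, 4). Ron's rating of 7 is an assumption borrowed from the Durability_3 = 7 example.

```python
import numpy as np

# Class means of Durability from the survey table on slide 6:
# Yes buyers rated {5, 8, 6}; No buyers rated {7, 4}.
mean_yes = np.mean([5, 8, 6])   # about 6.33
mean_no  = np.mean([7, 4])      # 5.5

ron = 7.0  # Ron's Durability rating (assumed for illustration)

# With equal variances and equal priors, Gaussian discriminant analysis
# assigns the class whose mean is closest to the observation.
prediction = "Yes" if abs(ron - mean_yes) < abs(ron - mean_no) else "No"
print(f"Ron ({ron}) is closest to the {prediction} class mean.")
```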

28 Discriminant Analysis: 2 Equivalent Ways of Discrimination: (1) highest P(C_k | X=x); (2) shortest distance between the predictors and the class mean. Example: P(Yes | Durability_3 = 7) = 0.65, P(No | Durability_3 = 7) = 0.35. [figure: probability vs. Durability, with Ron marked at Durability_3 = 7 between the No and Yes curves]

29 What other assumption do you see in this picture? [figure: the No and Yes normal curves along Durability]

30 Assume equal variance between X | Y=No and X | Y=Yes. This results in a LINEAR discriminant function of Durability. [figure: No and Yes normal curves with equal spread along Durability]

31 Equal Covariance Matrices: Will Sue buy the toaster? [figure: Yes and No classes in the Performance vs. Durability plane, with Sue marked between them]

32 Equal Covariance Matrices: The model predicts that Sue won't buy the toaster. [figure: as above]

33 Equal Covariance Matrices: Which class is closer to Sue? Will Sue buy the toaster? [figure: as above]

34 Equal Covariance Matrices: Which class is closer to Sue? Answer: it depends on how you define distance. [figure: as above]

35 Euclidean Distance: d(x, m) = sqrt( (x_1 - m_1)^2 + ... + (x_p - m_p)^2 )

36 Which class is closer to Sue? By Euclidean distance, Sue is closer to the mean of No. [figure: as above]

37 Which class is closer to Sue? However, the variance of Yes is higher in the direction of Sue compared to the variance of No in the direction of Sue. Sue is fewer standard deviations away from Yes than from No. [figure: as above]

38 Which class is closer to Sue? By Mahalanobis distance, Sue is closer to Yes, so the model predicts that Sue will buy the toaster. [figure: as above]

39 Mahalanobis Distance: It accounts for the fact that the variances in each direction are different. It accounts for the covariance between variables. It reduces to the familiar Euclidean distance for uncorrelated variables with unit variance.
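The Sue example can be reproduced numerically. The class means, covariance matrix, and Sue's coordinates below are invented to recreate the effect described on slides 36 through 38: Sue is closer to the No mean in Euclidean distance, but closer to the Yes mean once the Mahalanobis distance accounts for the direction of greatest variance.

```python
import numpy as np
from scipy.spatial.distance import euclidean, mahalanobis

# Invented coordinates chosen so that the two distances disagree.
mean_yes = np.array([7.0, 8.0])   # Yes class mean (Durability, Performance)
mean_no  = np.array([6.0, 3.0])   # No class mean
sue      = np.array([4.0, 5.0])

# Shared covariance: large variance along the (1, 1) direction
# (which points from Sue toward the Yes mean), small across it.
cov = np.array([[2.0, 1.5],
                [1.5, 2.0]])
cov_inv = np.linalg.inv(cov)

print("Euclidean:   Yes %.2f  No %.2f" %
      (euclidean(sue, mean_yes), euclidean(sue, mean_no)))
print("Mahalanobis: Yes %.2f  No %.2f" %
      (mahalanobis(sue, mean_yes, cov_inv), mahalanobis(sue, mean_no, cov_inv)))
# Euclidean says No is closer; Mahalanobis says Yes is closer.
```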


42 The Simplest Discriminant: Assume that X | Y=1 and X | Y=0 have normal (Gaussian) distributions, and assume that the covariance matrices are equal. Result: a linear discriminant*. A function is created to separate the 2 classes, and this function is linear with respect to the predictors. *It takes some math to show this.
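A minimal scikit-learn sketch of this linear discriminant, fit to the five survey observations from slide 6. It is the same kind of model as on the slides, not their exact fit, and with only five rows it is purely illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Survey data from slide 6: columns are Durability, Performance.
X = np.array([[5, 6], [7, 4], [8, 9], [4, 5], [6, 7]])
y = np.array(["Yes", "No", "Yes", "No", "Yes"])

# LDA assumes Gaussian classes with equal covariance matrices,
# which yields a discriminant that is linear in the predictors.
lda = LinearDiscriminantAnalysis().fit(X, y)

print(lda.predict(X))                 # predicted classes
print(lda.predict_proba(X).round(2))  # posterior probabilities P(class | X)
print(lda.coef_, lda.intercept_)      # coefficients of the linear function
```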

43 A Perfect Linear Discriminant: D = 0.79*Durability + …*Performance [figure: Performance vs. Durability with the separating line drawn]

44 Some complications: What if the covariance matrices are different between the classes? What if the prior probabilities are different between the classes?

45 What if the covariance matrices are different? [figure: Yes and No classes with visibly different spreads in the Performance vs. Durability plane, with Sue marked]

46 What if the covariance matrices are different? This results in a QUADRATIC discriminant. *It takes some math to show this. [figure: as above]
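In scikit-learn, dropping the equal-covariance assumption is a one-class swap, sketched here as a continuation of the earlier LDA example. The regularization parameter is needed only because five observations barely support per-class covariance estimates; it is loosely related to the regularized discriminant analysis mentioned for JMP on slide 58.

```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# QDA estimates a separate covariance matrix per class, giving a
# quadratic decision boundary. reg_param shrinks each covariance
# toward the identity so the tiny sample stays invertible.
qda = QuadraticDiscriminantAnalysis(reg_param=0.5).fit(X, y)
print(qda.predict(X))  # X, y as in the LDA sketch above
```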

47 A Quadratic Discriminant [figure: Performance vs. Durability with a curved decision boundary]

48 What if the prior probabilities are unequal?

    P(Y=1 | X=x) = P(X=x | Y=1) P(Y=1) / [ P(X=x | Y=1) P(Y=1) + P(X=x | Y=0) P(Y=0) ]

We need a way to model the distributions P(X=x | Y=1) and P(X=x | Y=0): normal (Gaussian) distributions, with equal or unequal covariance matrices. We also need a way to estimate the prior probabilities P(Y=1) and P(Y=0).

49 Prior Probabilities: So far, we have assumed that the prior probabilities are equal. This is not always realistic!

50 Unequal Prior Probabilities - Example: Do Not Have Skin Cancer vs. Have Skin Cancer. For the general population, P(No Skin Cancer) >> P(Have Skin Cancer). These are the prior probabilities. [figure: two class distributions of very different sizes]

51 Unequal Prior Probabilities: Common ways to set prior probabilities: proportional to the sample proportions. Toaster example: 60 customers will buy the toaster and 40 will not, so P(Buy Toaster = Yes) = 0.60 and P(Buy Toaster = No) = 0.40.

52 Unequal Prior Probabilities: Common ways to set prior probabilities: based on background knowledge/belief. Toaster example: based on anecdotal experience (e.g. conversations with past customers), you believe that 70% of your customers will buy the new toaster, so P(Buy Toaster = Yes) = 0.7 and P(Buy Toaster = No) = 0.3.
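Prior probabilities plug directly into software. In scikit-learn, for example, LDA defaults to the sample proportions (slide 51) and accepts user-specified priors (slide 52); the order follows the sorted class labels, here No then Yes. A sketch:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Priors from slide 52: P(Yes) = 0.7, P(No) = 0.3.
# sklearn orders classes alphabetically, so priors = [P(No), P(Yes)].
lda_informed = LinearDiscriminantAnalysis(priors=[0.3, 0.7]).fit(X, y)
print(lda_informed.predict_proba(X).round(2))  # X, y as in the earlier sketch
```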

53 Generalized Mahalanobis Distance: It can be shown that the Mahalanobis distance can be generalized* to account for unequal variances and unequal prior probabilities. *This takes some math to show.
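For reference, the usual form of this generalization (the generalized squared distance that PROC DISCRIM's documentation uses) adds two correction terms to the squared Mahalanobis distance:

    D_k^2(x) = (x - mu_k)' * inv(Sigma_k) * (x - mu_k) + ln|Sigma_k| - 2 ln P(Y = k)

The ln|Sigma_k| term accounts for unequal covariance matrices and the -2 ln P(Y = k) term for unequal priors; with equal covariances and equal priors, both terms are the same for every class and the rule reduces to the ordinary Mahalanobis distance. Classifying by the smallest D_k^2 is equivalent to classifying by the highest posterior probability.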

54 Discriminant Analysis: 2 Equivalent Ways of Discrimination: (1) highest P(C_k | X=x); (2) shortest generalized Mahalanobis distance between the predictors and the class mean. Example: P(Yes | Durability_3 = 7) = 0.65, P(No | Durability_3 = 7) = 0.35. [figure: probability vs. Durability, with Ron marked at Durability_3 = 7]

55 Discriminant Analysis in SAS: PROC DISCRIM - predictive discriminant analysis: generates the discriminant and predicts classes in new data sets. PROC CANDISC - descriptive discriminant analysis: identifies the predictors that best separate the groups; used as a variable-reduction technique. PROC STEPDISC - stepwise discriminant analysis: looks for the "best" subset of predictors for separating the groups.

56 Tips for PROC DISCRIM: Use CROSSLIST/CROSSVALIDATE to enact cross-validation when building your model. LIST (the default) does not use cross-validation; CROSSLIST enacts CROSSVALIDATE and shows the results; CROSSLISTERR shows results only for misclassified data, which reduces unnecessary output. METHOD=NORMAL -> Gaussian discriminant analysis. METHOD=NPAR -> non-parametric discriminant analysis: more robust, but cannot predict a new data set.
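PROC DISCRIM's CROSSVALIDATE performs leave-one-out cross-validation: each observation is classified by a discriminant built from the remaining observations. For readers without SAS, a rough scikit-learn analogue (not PROC DISCRIM itself) looks like this:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Each of the n folds trains on n-1 observations and classifies the
# held-out one; the mean score is the cross-validated accuracy.
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print("Leave-one-out accuracy:", scores.mean())  # X, y as in the earlier sketch
```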

57 Tips for PROC DISCRIM: POOL=YES -> linear discrimination. POOL=NO -> quadratic discrimination. POOL=TEST -> used with the SLPOOL option; selects linear or quadratic discrimination based on a hypothesis test using Bartlett's modification of the likelihood ratio test, which is not robust to non-normality.

58 Discriminant Analysis in JMP: No cross-validation. No non-parametric methods. Has an ROC curve. Has regularized discriminant analysis, a compromise between linear and quadratic discrimination. GO TO JMP DEMONSTRATION.

59 Follow Predictum on Twitter!

60 Follow Predictum on LinkedIn!

61 Stay tuned for our free webinars on statistics, analytics and predictive modelling!
