
1 Visit The Chemical Statistician! http://chemicalstatistician.wordpress.com/

2 Follow Me On

3 I am a new guest blogger for the JMP Blog!

4 Discriminant Analysis in JMP and SAS. Eric Cai, M.Sc., Statistician, Predictum Inc.

5 A Marketing Survey: Will it last a long time? (Durability) Does it work well? (Performance) Will I buy this new toaster? [toaster images by RRZEicons and by Peng and Rainer Zenz, Wikimedia]

6 Survey Results

    Durability   Performance   Buy Toaster?
    5            6             Yes
    7            4             No
    8            9             Yes
    4            5             No
    6            7             Yes

7 Scatter Plot of Survey Results [figure: Performance vs. Durability scatter plot of the five survey responses]

8 Is Durability a Good Discriminant? [figure: Performance vs. Durability with a split on Durability alone; two points fall on the wrong side and are labeled Misclassified]

9 Is Performance a Good Discriminant? [figure: Performance vs. Durability with a split on Performance alone; two points fall on the wrong side and are labeled Misclassified]

10 Who will buy the toaster? Durability alone is not a perfect predictor. Performance alone is not a perfect predictor. Can we combine Durability and Performance into a really good predictor?

11 A Perfect Linear Discriminant: D = 0.79*Durability + …*Performance [figure: Performance vs. Durability with the separating line drawn]

12 Discriminant Analysis: a predictive modelling technique used for classification. Target variable: categorical. Predictor variables: continuous.

13 Machine Learning
    Supervised Learning: use inputs to predict targets
        Classification: the target variable is categorical or discrete (Discriminant Analysis lives here)
        Regression: the target variable is continuous
    Unsupervised Learning: finding patterns among unlabeled data
        Clustering: group data into categories based on the data's own patterns
        Density Estimation: estimate an underlying probability distribution function
        Dimensional Reduction: reduce the number of random variables being considered while preserving information in the variables

14 How Does Discriminant Analysis Work? Toaster example: a binary target with 2 classes, Yes or No. For each observation, given its predictor values, find the conditional probability of each class: P(Yes | Durability_i, Performance_i) and P(No | Durability_i, Performance_i). Pick the class with the highest conditional probability.

15 Will the 3rd customer buy the toaster? P(Yes | Durability_3 = 7) = 0.65; P(No | Durability_3 = 7) = 0.35 [figure: probability vs. Durability at Durability_3 = 7]

16 Prediction: the 3rd customer will buy the toaster. P(Yes | Durability_3 = 7) = 0.65; P(No | Durability_3 = 7) = 0.35 [figure: probability vs. Durability at Durability_3 = 7]

17 Will the 5th customer buy the toaster? P(No | Durability_5 = 2) = 0.80; P(Yes | Durability_5 = 2) = 0.20 [figure: probability vs. Durability at Durability_5 = 2]

18 Prediction: the 5th customer will not buy the toaster. P(No | Durability_5 = 2) = 0.80; P(Yes | Durability_5 = 2) = 0.20 [figure: probability vs. Durability at Durability_5 = 2]
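To make the decision rule from slides 14 through 18 concrete, here is a minimal Python sketch that applies it to the two customers above, using the posterior probabilities quoted on the slides. The dictionary is just scaffolding for illustration; how those probabilities are estimated is the subject of the next slides.

```python
# Decision rule: pick the class with the highest conditional probability.
# The posterior values are the ones quoted on slides 15-18.
posteriors = {
    "3rd customer": {"Yes": 0.65, "No": 0.35},
    "5th customer": {"Yes": 0.20, "No": 0.80},
}

for customer, probs in posteriors.items():
    prediction = max(probs, key=probs.get)  # class with the highest posterior
    print(f"{customer}: predict {prediction} (P = {probs[prediction]:.2f})")
```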

19 Discriminant

    Durability   Performance   Buy Toaster?   P(No|X)   P(Yes|X)   Prediction (Buy?)
    5            6             Yes            …         …          ?
    7            4             No             …         …          ?
    8            9             Yes            …         …          ?
    4            5             No             …         …          ?
    6            7             Yes            …         …          ?

20 Discriminant

    Durability   Performance   Buy Toaster?   P(No|X)   P(Yes|X)   Prediction (Buy?)
    5            6             Yes            …         …          Yes
    7            4             No             …         …          No
    8            9             Yes            …         …          No
    4            5             No             …         …          Yes
    6            7             Yes            …         …          Yes

The 3rd and 4th customers were misclassified by my discriminant.

21 How Does Discriminant Analysis Work? How do we get these probabilities? Goal: estimate the following probabilities: P(Yes | Durability_i, Performance_i) and P(No | Durability_i, Performance_i).

22 How do we get these probabilities? Bayes' Rule for a binary Y and continuous X:

    P(Y=1 | X=x) = P(X=x, Y=1) / P(X=x)
                 = P(X=x | Y=1) P(Y=1) / [ P(X=x, Y=1) + P(X=x, Y=0) ]
                 = P(X=x | Y=1) P(Y=1) / [ P(X=x | Y=1) P(Y=1) + P(X=x | Y=0) P(Y=0) ]

23 Bayes' Rule

    P(Y=1 | X=x) = P(X=x | Y=1) P(Y=1) / [ P(X=x | Y=1) P(Y=1) + P(X=x | Y=0) P(Y=0) ]

We need a way to model the distributions P(X=x | Y=1) and P(X=x | Y=0), and a way to estimate the prior probabilities P(Y=1) and P(Y=0).
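As a worked illustration of this formula (not taken from the slides): assume Gaussian class-conditional densities with made-up means and standard deviations, equal priors, and compute the posterior for one Durability rating.

```python
from scipy.stats import norm

# Hypothetical class-conditional parameters and priors, for illustration
# only; the slides estimate these quantities from the data.
mu_yes, sd_yes = 6.3, 1.2   # X | Y = Yes ~ Normal(mu_yes, sd_yes)
mu_no,  sd_no  = 5.0, 1.2   # X | Y = No  ~ Normal(mu_no,  sd_no)
p_yes, p_no = 0.5, 0.5      # equal priors, as assumed so far

x = 7.0  # a Durability rating

# Numerators of Bayes' rule: P(X = x | Y) * P(Y)
num_yes = norm.pdf(x, mu_yes, sd_yes) * p_yes
num_no  = norm.pdf(x, mu_no,  sd_no)  * p_no

# Divide by the total to get the posterior probability of Yes
posterior_yes = num_yes / (num_yes + num_no)
print(f"P(Yes | Durability = {x}) = {posterior_yes:.2f}")
```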

24 The Assumptions of Discriminant Analysis (specifically, Gaussian Discriminant Analysis): assume that X | Y=1 and X | Y=0 have normal (Gaussian) distributions. *Yes, there are other ways to do discriminant analysis: Fisher's Discriminant Analysis and Non-Parametric Discriminant Analysis.

25 Assume that X | Y=Yes and X | Y=No have normal (Gaussian) distributions. [figure: two normal curves, No and Yes, along the Durability axis]

26 An observation is assigned to the class whose mean is closest to it. Will Ron buy the toaster? [figure: the No and Yes curves along Durability, with Ron's rating marked between the two means]

27 An observation is assigned to the class whose mean is closest to it. Prediction: Ron will buy the toaster. His rating on durability is closer to the Yes class. [figure: as on the previous slide]
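A small sketch of this nearest-class-mean rule, using the Durability column of the survey table from slide 6 (Yes buyers rated 5, 8, 6; No buyers rated 7, 4). Ron's rating of 7 is an assumption borrowed from the Durability_3 = 7 example.

```python
import numpy as np

# Class means of Durability from the survey table on slide 6:
# Yes buyers rated {5, 8, 6}; No buyers rated {7, 4}.
mean_yes = np.mean([5, 8, 6])   # about 6.33
mean_no  = np.mean([7, 4])      # 5.5

ron = 7.0  # Ron's Durability rating (assumed for illustration)

# With equal variances and equal priors, Gaussian discriminant analysis
# assigns the class whose mean is closest to the observation.
prediction = "Yes" if abs(ron - mean_yes) < abs(ron - mean_no) else "No"
print(f"Ron ({ron}) is closest to the {prediction} class mean.")
```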

28 Discriminant Analysis: 2 Equivalent Ways of Discrimination: (1) highest P(C_k | X=x); (2) shortest distance between the predictors and the class mean. Example: P(Yes | Durability_3 = 7) = 0.65, P(No | Durability_3 = 7) = 0.35. [figure: probability vs. Durability, with Ron marked at Durability_3 = 7 between the No and Yes curves]

29 What other assumption do you see in this picture? [figure: the No and Yes normal curves along Durability]

30 Assume equal variance between X | Y=No and X | Y=Yes. This results in a LINEAR discriminant function of Durability. [figure: No and Yes normal curves with equal spread along Durability]

31 Equal Covariance Matrices: Will Sue buy the toaster? [figure: Yes and No classes in the Performance vs. Durability plane, with Sue marked between them]

32 Equal Covariance Matrices: The model predicts that Sue won't buy the toaster. [figure: as above]

33 Equal Covariance Matrices: Which class is closer to Sue? Will Sue buy the toaster? [figure: as above]

34 Equal Covariance Matrices: Which class is closer to Sue? Answer: it depends on how you define distance. [figure: as above]

35 Euclidean Distance: d(x, m) = sqrt( (x_1 - m_1)^2 + ... + (x_p - m_p)^2 )

36 Which class is closer to Sue? By Euclidean distance, Sue is closer to the mean of No. [figure: as above]

37 Which class is closer to Sue? However, the variance of Yes is higher in the direction of Sue compared to the variance of No in the direction of Sue. Sue is fewer standard deviations away from Yes than from No. [figure: as above]

38 Which class is closer to Sue? By Mahalanobis distance, Sue is closer to Yes, so the model predicts that Sue will buy the toaster. [figure: as above]

39 Mahalanobis Distance: It accounts for the fact that the variances in each direction are different. It accounts for the covariance between variables. It reduces to the familiar Euclidean distance for uncorrelated variables with unit variance.
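The Sue example can be reproduced numerically. The class means, covariance matrix, and Sue's coordinates below are invented to recreate the effect described on slides 36 through 38: Sue is closer to the No mean in Euclidean distance, but closer to the Yes mean once the Mahalanobis distance accounts for the direction of greatest variance.

```python
import numpy as np
from scipy.spatial.distance import euclidean, mahalanobis

# Invented coordinates chosen so that the two distances disagree.
mean_yes = np.array([7.0, 8.0])   # Yes class mean (Durability, Performance)
mean_no  = np.array([6.0, 3.0])   # No class mean
sue      = np.array([4.0, 5.0])

# Shared covariance: large variance along the (1, 1) direction
# (which points from Sue toward the Yes mean), small across it.
cov = np.array([[2.0, 1.5],
                [1.5, 2.0]])
cov_inv = np.linalg.inv(cov)

print("Euclidean:   Yes %.2f  No %.2f" %
      (euclidean(sue, mean_yes), euclidean(sue, mean_no)))
print("Mahalanobis: Yes %.2f  No %.2f" %
      (mahalanobis(sue, mean_yes, cov_inv), mahalanobis(sue, mean_no, cov_inv)))
# Euclidean says No is closer; Mahalanobis says Yes is closer.
```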


42 The Simplest Discriminant: Assume that X | Y=1 and X | Y=0 have normal (Gaussian) distributions, and assume that the covariance matrices are equal. Result: a linear discriminant*. A function is created to separate the 2 classes, and this function is linear with respect to the predictors. *It takes some math to show this.
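A minimal scikit-learn sketch of this linear discriminant, fit to the five survey observations from slide 6. It is the same kind of model as on the slides, not their exact fit, and with only five rows it is purely illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Survey data from slide 6: columns are Durability, Performance.
X = np.array([[5, 6], [7, 4], [8, 9], [4, 5], [6, 7]])
y = np.array(["Yes", "No", "Yes", "No", "Yes"])

# LDA assumes Gaussian classes with equal covariance matrices,
# which yields a discriminant that is linear in the predictors.
lda = LinearDiscriminantAnalysis().fit(X, y)

print(lda.predict(X))                 # predicted classes
print(lda.predict_proba(X).round(2))  # posterior probabilities P(class | X)
print(lda.coef_, lda.intercept_)      # coefficients of the linear function
```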

43 A Perfect Linear Discriminant: D = 0.79*Durability + …*Performance [figure: Performance vs. Durability with the separating line drawn]

44 Some complications: What if the covariance matrices are different between the classes? What if the prior probabilities are different between the classes?

45 What if the covariance matrices are different? [figure: Yes and No classes with visibly different spreads in the Performance vs. Durability plane, with Sue marked]

46 What if the covariance matrices are different? This results in a QUADRATIC discriminant. *It takes some math to show this. [figure: as above]
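In scikit-learn, dropping the equal-covariance assumption is a one-class swap, sketched here as a continuation of the earlier LDA example. The regularization parameter is needed only because five observations barely support per-class covariance estimates; it is loosely related to the regularized discriminant analysis mentioned for JMP on slide 58.

```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# QDA estimates a separate covariance matrix per class, giving a
# quadratic decision boundary. reg_param shrinks each covariance
# toward the identity so the tiny sample stays invertible.
qda = QuadraticDiscriminantAnalysis(reg_param=0.5).fit(X, y)
print(qda.predict(X))  # X, y as in the LDA sketch above
```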

47 A Quadratic Discriminant [figure: Performance vs. Durability with a curved decision boundary]

48 What if the prior probabilities are unequal?

    P(Y=1 | X=x) = P(X=x | Y=1) P(Y=1) / [ P(X=x | Y=1) P(Y=1) + P(X=x | Y=0) P(Y=0) ]

We need a way to model the distributions P(X=x | Y=1) and P(X=x | Y=0): normal (Gaussian) distributions, with equal or unequal covariance matrices. We also need a way to estimate the prior probabilities P(Y=1) and P(Y=0).

49 Prior Probabilities: So far, we have assumed that the prior probabilities are equal. This is not always realistic!

50 Unequal Prior Probabilities - Example: Do Not Have Skin Cancer vs. Have Skin Cancer. For the general population, P(No Skin Cancer) >> P(Have Skin Cancer). These are the prior probabilities. [figure: two class distributions of very different sizes]

51 Unequal Prior Probabilities: Common ways to set prior probabilities: proportional to the sample proportions. Toaster example: 60 customers will buy the toaster and 40 will not, so P(Buy Toaster = Yes) = 0.60 and P(Buy Toaster = No) = 0.40.

52 Unequal Prior Probabilities: Common ways to set prior probabilities: based on background knowledge/belief. Toaster example: based on anecdotal experience (e.g. conversations with past customers), you believe that 70% of your customers will buy the new toaster, so P(Buy Toaster = Yes) = 0.7 and P(Buy Toaster = No) = 0.3.
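Prior probabilities plug directly into software. In scikit-learn, for example, LDA defaults to the sample proportions (slide 51) and accepts user-specified priors (slide 52); the order follows the sorted class labels, here No then Yes. A sketch:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Priors from slide 52: P(Yes) = 0.7, P(No) = 0.3.
# sklearn orders classes alphabetically, so priors = [P(No), P(Yes)].
lda_informed = LinearDiscriminantAnalysis(priors=[0.3, 0.7]).fit(X, y)
print(lda_informed.predict_proba(X).round(2))  # X, y as in the earlier sketch
```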

53 Generalized Mahalanobis Distance: It can be shown that the Mahalanobis distance can be generalized* to account for unequal variances and unequal prior probabilities. *This takes some math to show.
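For reference, the usual form of this generalization (the generalized squared distance that PROC DISCRIM's documentation uses) adds two correction terms to the squared Mahalanobis distance:

    D_k^2(x) = (x - mu_k)' * inv(Sigma_k) * (x - mu_k) + ln|Sigma_k| - 2 ln P(Y = k)

The ln|Sigma_k| term accounts for unequal covariance matrices and the -2 ln P(Y = k) term for unequal priors; with equal covariances and equal priors, both terms are the same for every class and the rule reduces to the ordinary Mahalanobis distance. Classifying by the smallest D_k^2 is equivalent to classifying by the highest posterior probability.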

54 Discriminant Analysis: 2 Equivalent Ways of Discrimination: (1) highest P(C_k | X=x); (2) shortest generalized Mahalanobis distance between the predictors and the class mean. Example: P(Yes | Durability_3 = 7) = 0.65, P(No | Durability_3 = 7) = 0.35. [figure: probability vs. Durability, with Ron marked at Durability_3 = 7]

55 Discriminant Analysis in SAS: PROC DISCRIM - predictive discriminant analysis: generates the discriminant and predicts classes in new data sets. PROC CANDISC - descriptive discriminant analysis: identifies the predictors that best separate the groups; used as a variable-reduction technique. PROC STEPDISC - stepwise discriminant analysis: looks for the "best" subset of predictors for separating the groups.

56 Tips for PROC DISCRIM: Use CROSSLIST/CROSSVALIDATE to enact cross-validation when building your model. LIST (the default) does not use cross-validation; CROSSLIST enacts CROSSVALIDATE and shows the results; CROSSLISTERR shows results only for misclassified data, which reduces unnecessary output. METHOD=NORMAL -> Gaussian discriminant analysis. METHOD=NPAR -> non-parametric discriminant analysis: more robust, but cannot predict a new data set.
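PROC DISCRIM's CROSSVALIDATE performs leave-one-out cross-validation: each observation is classified by a discriminant built from the remaining observations. For readers without SAS, a rough scikit-learn analogue (not PROC DISCRIM itself) looks like this:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Each of the n folds trains on n-1 observations and classifies the
# held-out one; the mean score is the cross-validated accuracy.
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print("Leave-one-out accuracy:", scores.mean())  # X, y as in the earlier sketch
```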

57 Tips for PROC DISCRIM: POOL=YES -> linear discrimination. POOL=NO -> quadratic discrimination. POOL=TEST -> used with the SLPOOL option; selects linear or quadratic discrimination based on a hypothesis test using Bartlett's modification of the likelihood ratio test, which is not robust to non-normality.

58 Discriminant Analysis in JMP: No cross-validation. No non-parametric methods. Has an ROC curve. Has regularized discriminant analysis, a compromise between linear and quadratic discrimination. GO TO JMP DEMONSTRATION.

59 Follow Predictum on Twitter!

60 Follow Predictum on LinkedIn!

61 Stay tuned for our free webinars on statistics, analytics and predictive modelling!
