1 Visit The Chemical Statistician! http://chemicalstatistician.wordpress.com/
2 Follow Me On
3 I am a new guest blogger for the JMP Blog!
4 Discriminant Analysis in JMP and SAS. Eric Cai, M.Sc., Statistician, Predictum Inc.
5 A Marketing Survey: Will I buy this new toaster? Will it last a long time? (Durability) Does it work well? (Performance) (Images by RRZEicons and by Peng and Rainer Zenz, Wikimedia)
6 Survey Results

   Durability   Performance   Buy Toaster?
   5            6             Yes
   7            4             No
   8            9             Yes
   4            5             No
   6            7             Yes
7 Scatter Plot of Survey Results [scatter plot: Performance vs. Durability]
8 Is Durability a Good Discriminant? [scatter plot: splitting on Durability alone leaves points misclassified on both sides]
9 Is Performance a Good Discriminant? [scatter plot: splitting on Performance alone also leaves misclassified points]
10 Who will buy the toaster? Durability alone is not a perfect predictor. Performance alone is not a perfect predictor. Can we combine Durability and Performance into a really good predictor?
11 A Perfect Linear Discriminant: D = 0.79*Durability + …*Performance [scatter plot: a line separating the two classes in the Performance vs. Durability plane]
12 Discriminant Analysis: a predictive modelling technique used for classification. Target variable: categorical. Predictor variables: continuous.
13 Machine Learning

   Supervised Learning: use inputs to predict targets
      Classification: the target variable is categorical or discrete (Discriminant Analysis lives here)
      Regression: the target variable is continuous
   Unsupervised Learning: finding patterns among unlabeled data
      Clustering: group data into categories based on the data's own patterns
      Density Estimation: estimate an underlying probability distribution function
      Dimension Reduction: reduce the number of random variables being considered while preserving information in the variables
14 How Does Discriminant Analysis Work? Toaster example: a binary target with 2 classes, Yes or No. For each observation, given its predictors, find the conditional probability of each class: P(Yes | Durability_i, Performance_i) and P(No | Durability_i, Performance_i). Pick the class with the highest conditional probability.
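In symbols: the rule described on this slide is an arg-max over posterior probabilities. A minimal statement of it, using the slide's own variable names:

```latex
\[
\hat{y}_i \;=\; \operatorname*{arg\,max}_{k \in \{\mathrm{Yes},\,\mathrm{No}\}}
  P\!\left(Y = k \,\middle|\, \mathrm{Durability}_i,\ \mathrm{Performance}_i\right)
\]
```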
15 Will the 3rd customer buy the toaster? P(Yes | Durability_3 = 7) = 0.65 and P(No | Durability_3 = 7) = 0.35. [probability plot at Durability_3 = 7]
16 Prediction: the 3rd customer will buy the toaster, since P(Yes | Durability_3 = 7) = 0.65 > P(No | Durability_3 = 7) = 0.35.
17 Will the 5th customer buy the toaster? P(No | Durability_5 = 2) = 0.80 and P(Yes | Durability_5 = 2) = 0.20.
18 Prediction: the 5th customer will not buy the toaster, since P(No | Durability_5 = 2) = 0.80 > P(Yes | Durability_5 = 2) = 0.20.
19 Discriminant

   Durability   Performance   Buy Toaster?   P(No|X)   P(Yes|X)   Prediction (Buy?)
   5            6             Yes                                 ?
   7            4             No                                  ?
   8            9             Yes                                 ?
   4            5             No                                  ?
   6            7             Yes                                 ?
20 Discriminant

   Durability   Performance   Buy Toaster?   Prediction (Buy?)
   5            6             Yes            Yes
   7            4             No             No
   8            9             Yes            No
   4            5             No             Yes
   6            7             Yes            Yes

The 3rd and 4th customers were misclassified by my discriminant.
21 How Does Discriminant Analysis Work? How do we get these probabilities? Goal: estimate P(Yes | Durability_i, Performance_i) and P(No | Durability_i, Performance_i).
22 How do we get these probabilities? Bayes' rule for a binary Y and continuous X:

   P(Y=1 | X=x) = P(X=x, Y=1) / P(X=x)
                = P(X=x | Y=1) P(Y=1) / [P(X=x, Y=1) + P(X=x, Y=0)]
                = P(X=x | Y=1) P(Y=1) / [P(X=x | Y=1) P(Y=1) + P(X=x | Y=0) P(Y=0)]
23 Bayes' Rule: P(Y=1 | X=x) = P(X=x | Y=1) P(Y=1) / [P(X=x | Y=1) P(Y=1) + P(X=x | Y=0) P(Y=0)]. We need a way to model the class-conditional distributions P(X=x | Y=1) and P(X=x | Y=0), and a way to estimate the prior probabilities P(Y=1) and P(Y=0).
24 The Assumptions of Discriminant Analysis (specifically, Gaussian discriminant analysis): assume that X | Y=1 and X | Y=0 have normal (Gaussian) distributions. *Yes, there are other ways to do discriminant analysis: Fisher's discriminant analysis and non-parametric discriminant analysis.
25 Assume that X | Y=Yes and X | Y=No have normal (Gaussian) distributions. [plot: 'No' and 'Yes' Gaussian curves along the Durability axis]
26 An observation is assigned to the class whose mean is closest to it. Will Ron buy the toaster? [plot: Ron's Durability rating between the 'No' and 'Yes' class means]
27 An observation is assigned to the class whose mean is closest to it. Prediction: Ron will buy the toaster. His rating on Durability is closer to the Yes class.
28 Discriminant Analysis: 2 equivalent ways of discrimination. (1) Highest P(C_k | X=x), e.g. P(Yes | Durability_3 = 7) = 0.65 vs. P(No | Durability_3 = 7) = 0.35. (2) Shortest distance between the predictors and a class mean.
29 What other assumption do you see in this picture? [plot: the 'No' and 'Yes' Gaussian curves again]
30 Assume equal variance between X | Y=No and X | Y=Yes. This results in a LINEAR discriminant function of Durability.
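To make the "linear" claim concrete in this one-predictor picture: with a common variance and equal priors, comparing the two Gaussian densities reduces to comparing squared distances to the class means, so the boundary falls at their midpoint. A sketch of this standard result (not shown on the slide itself):

```latex
% Classify as Yes when the Yes density exceeds the No density.
% With common variance sigma^2 and equal priors, this is
% (x - mu_Yes)^2 < (x - mu_No)^2, i.e. a single linear cutoff
% at the midpoint of the two class means:
\[
x^{*} \;=\; \frac{\mu_{\mathrm{Yes}} + \mu_{\mathrm{No}}}{2}
\]
```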
31 Equal Covariance Matrices: Will Sue buy the toaster? [scatter plot: Sue's point between the 'Yes' and 'No' clusters in the Performance vs. Durability plane]
32 Equal Covariance Matrices: The model predicts that Sue won't buy the toaster.
33 Equal Covariance Matrices: Which class is closer to Sue? Will Sue buy the toaster?
34 Equal Covariance Matrices: Which class is closer to Sue? Answer: it depends on how you define distance.
35 Euclidean Distance
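For reference, the Euclidean distance between an observation vector x and a class mean vector mu, which this slide presumably displayed:

```latex
\[
d_E(\mathbf{x}, \boldsymbol{\mu})
  \;=\; \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{\top}(\mathbf{x} - \boldsymbol{\mu})}
  \;=\; \sqrt{\sum_{j}(x_j - \mu_j)^2}
\]
```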
36 Which class is closer to Sue? By Euclidean distance, Sue is closer to the mean of No.
37 Which class is closer to Sue? However, the variance of Yes is higher in the direction of Sue than the variance of No is in the direction of Sue. Sue is fewer standard deviations away from Yes than from No.
38 Which class is closer to Sue? By Mahalanobis distance, Sue is closer to the Yes mean, so the model predicts that she will buy the toaster.
39 Mahalanobis Distance: It accounts for the fact that the variances in each direction are different. It accounts for the covariance between variables. It reduces to the familiar Euclidean distance for uncorrelated variables with unit variance.
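The standard definition, with Sigma the covariance matrix of the class in question; setting Sigma to the identity recovers the Euclidean distance above:

```latex
\[
d_M(\mathbf{x}, \boldsymbol{\mu})
  \;=\; \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})}
\]
```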
42 The Simplest Discriminant: Assume that X | Y=1 and X | Y=0 have normal (Gaussian) distributions, and assume that the covariance matrices are equal. Result: a linear discriminant*. A function is created to separate the 2 classes, and this function is linear with respect to the predictors. *It takes some math to show this.
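The math the slide alludes to produces the standard linear discriminant score: because the classes share one covariance matrix Sigma, the term quadratic in x cancels when the posteriors are compared, leaving a score linear in x (mu_k and pi_k denote the class means and priors):

```latex
% Assign x to the class k with the largest score delta_k(x).
\[
\delta_k(\mathbf{x})
  \;=\; \mathbf{x}^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_k
  \;-\; \tfrac{1}{2}\,\boldsymbol{\mu}_k^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_k
  \;+\; \log \pi_k
\]
```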
43 A Perfect Linear Discriminant: D = 0.79*Durability + …*Performance [the scatter plot from slide 11 again]
44 Some complications: What if the covariance matrices are different between the classes? What if the prior probabilities are different between the classes?
45 What if the covariance matrices are different? [scatter plot: Sue between the 'Yes' and 'No' clusters, which now have visibly different spreads]
46 What if the covariance matrices are different? This results in a QUADRATIC discriminant. *It takes some math to show this.
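Likewise, the standard quadratic discriminant score: with a separate covariance matrix Sigma_k per class, the term quadratic in x no longer cancels, which is where the curved boundary comes from:

```latex
\[
\delta_k(\mathbf{x})
  \;=\; -\tfrac{1}{2}\log\lvert\boldsymbol{\Sigma}_k\rvert
  \;-\; \tfrac{1}{2}\,(\mathbf{x} - \boldsymbol{\mu}_k)^{\top}\boldsymbol{\Sigma}_k^{-1}(\mathbf{x} - \boldsymbol{\mu}_k)
  \;+\; \log \pi_k
\]
```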
47 A Quadratic Discriminant [plot: a curved boundary in the Performance vs. Durability plane]
48 What if the prior probabilities are unequal? Recall Bayes' rule: P(Y=1 | X=x) = P(X=x | Y=1) P(Y=1) / [P(X=x | Y=1) P(Y=1) + P(X=x | Y=0) P(Y=0)]. We model the distributions P(X=x | Y=1) and P(X=x | Y=0) as normal (Gaussian), with equal or unequal covariance matrices. We still need a way to estimate the prior probabilities P(Y=1) and P(Y=0).
49 Prior Probabilities: So far, we have assumed that the prior probabilities are equal. This is not always realistic!
50 Unequal Prior Probabilities, an example: For the general population, P(Do Not Have Skin Cancer) >> P(Have Skin Cancer). [bar chart of the two prior probabilities]
51 Unequal Prior Probabilities: Common ways to set prior probabilities: proportional to sample proportions. Toaster example: 60 will buy the toaster and 40 will not, so P(Buy Toaster = Yes) = 0.60 and P(Buy Toaster = No) = 0.40.
52 Unequal Prior Probabilities: Common ways to set prior probabilities: based on background knowledge/belief. Toaster example: based on anecdotal experience (e.g., conversations with past customers), you believe that 70% of your customers will buy the new toaster, so P(Buy Toaster = Yes) = 0.7 and P(Buy Toaster = No) = 0.3.
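In SAS, both conventions map onto the PRIORS statement of PROC DISCRIM. A minimal sketch, assuming a hypothetical data set named survey with response Buy and predictors Durability and Performance:

```sas
/* Priors proportional to the observed class frequencies */
proc discrim data=survey method=normal;
   class Buy;
   var Durability Performance;
   priors proportional;
run;

/* Priors set from background belief: 70% Yes, 30% No */
proc discrim data=survey method=normal;
   class Buy;
   var Durability Performance;
   priors 'Yes'=0.7 'No'=0.3;
run;
```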
53 Generalized Mahalanobis Distance: It can be shown* that the Mahalanobis distance can be generalized to account for unequal variances and unequal prior probabilities. *This takes some math to show.
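One common form of this generalization, which matches the "generalized squared distance" that SAS reports, adds penalty terms for the class covariance and the prior; minimizing it over k is equivalent to maximizing the Gaussian posterior P(C_k | x):

```latex
\[
D_k^2(\mathbf{x})
  \;=\; (\mathbf{x} - \boldsymbol{\mu}_k)^{\top}\boldsymbol{\Sigma}_k^{-1}(\mathbf{x} - \boldsymbol{\mu}_k)
  \;+\; \log\lvert\boldsymbol{\Sigma}_k\rvert
  \;-\; 2\log \pi_k
\]
```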
54 Discriminant Analysis: 2 equivalent ways of discrimination. (1) Highest P(C_k | X=x), e.g. P(Yes | Durability_3 = 7) = 0.65 vs. P(No | Durability_3 = 7) = 0.35. (2) Shortest generalized Mahalanobis distance between the predictors and a class mean.
55 Discriminant Analysis in SAS:
   PROC DISCRIM: predictive discriminant analysis. Generate the discriminant; predict classes in new data sets.
   PROC CANDISC: descriptive discriminant analysis. Identify the predictors that best separate the groups; used as a variable-reduction technique.
   PROC STEPDISC: stepwise discriminant analysis. Looks for the "best" subset of predictors for separating the groups.
56 Tips for PROC DISCRIM: Use CROSSLIST/CROSSVALIDATE to enact cross-validation when building your model. LIST (the default) does not use cross-validation; CROSSLIST enacts CROSSVALIDATE and shows the results; CROSSLISTERR shows results only for misclassified data, which reduces unnecessary output. METHOD=NORMAL gives Gaussian discriminant analysis; METHOD=NPAR gives non-parametric discriminant analysis, which is more robust but cannot predict a new data set.
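Putting these tips together; a minimal sketch, again assuming the hypothetical survey data set plus a new_customers data set to score:

```sas
/* Gaussian discriminant analysis with leave-one-out cross-validation;
   CROSSLISTERR prints only the misclassified observations. */
proc discrim data=survey method=normal
             crossvalidate crosslisterr
             testdata=new_customers testout=scored;
   class Buy;
   var Durability Performance;
run;
```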
57 Tips for PROC DISCRIM: POOL=YES gives linear discrimination; POOL=NO gives quadratic discrimination; POOL=TEST, used with the SLPOOL option, selects linear or quadratic based on a hypothesis test using Bartlett's modification of the likelihood ratio test, which is not robust to non-normality.
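A sketch of the test-based option; SLPOOL sets the significance level for the homogeneity-of-covariance test, and 0.05 here is only an illustrative choice:

```sas
/* POOL=TEST lets Bartlett's modified likelihood ratio test decide
   between the pooled (linear) and within-class (quadratic) forms. */
proc discrim data=survey method=normal pool=test slpool=0.05;
   class Buy;
   var Durability Performance;
run;
```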
58 Discriminant Analysis in JMP: No cross-validation. No non-parametric methods. Has an ROC curve. Has regularized discriminant analysis, a compromise between linear and quadratic discrimination. GO TO JMP DEMONSTRATION.
59 Follow Predictum on Twitter!
60 Follow Predictum on LinkedIn!
61 Stay tuned for our free webinars on statistics, analytics and predictive modelling!