Data Mining and Analysis: Fundamental Concepts and Algorithms


1 Data Mining and Analysis: Fundamental Concepts and Algorithms

dataminingbook.info

Mohammed J. Zaki (Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA)
Wagner Meira Jr. (Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil)

Chapter 18: Probabilistic Classification

2 Bayes Classifier

Let the training dataset D consist of n points $x_i$ in a d-dimensional space, and let $y_i$ denote the class for each point, with $y_i \in \{c_1, c_2, \ldots, c_k\}$. The Bayes classifier estimates the posterior probability $P(c_i \mid x)$ for each class $c_i$ and chooses the class with the largest probability. The predicted class for x is given as
$$\hat{y} = \arg\max_{c_i}\{P(c_i \mid x)\}$$
According to Bayes' theorem, we have
$$P(c_i \mid x) = \frac{P(x \mid c_i)\, P(c_i)}{P(x)}$$
Because $P(x)$ is fixed for a given point, the Bayes rule can be rewritten as
$$\hat{y} = \arg\max_{c_i}\{P(c_i \mid x)\} = \arg\max_{c_i}\left\{\frac{P(x \mid c_i)\, P(c_i)}{P(x)}\right\} = \arg\max_{c_i}\{P(x \mid c_i)\, P(c_i)\}$$

3 Estimating the Prior Probability: $P(c_i)$

Let $D_i$ denote the subset of points in D that are labeled with class $c_i$: $D_i = \{x_j \in D \mid y_j = c_i\}$. Let the size of the dataset D be $|D| = n$, and let the size of each class-specific subset $D_i$ be $|D_i| = n_i$. The prior probability for class $c_i$ can be estimated as
$$\hat{P}(c_i) = \frac{n_i}{n}$$

4 Estimating the Likelihood: Numeric Attributes, Parametric Approach

To estimate the likelihood $P(x \mid c_i)$, we have to estimate the joint probability of x across all d dimensions, i.e., we have to estimate $P(x = (x_1, x_2, \ldots, x_d) \mid c_i)$. In the parametric approach we assume that each class $c_i$ is normally distributed, and we use the estimated mean $\hat{\mu}_i$ and covariance matrix $\hat{\Sigma}_i$ to compute the probability density at x:
$$\hat{f}_i(x) = \hat{f}(x \mid \hat{\mu}_i, \hat{\Sigma}_i) = \frac{1}{(\sqrt{2\pi})^d \sqrt{|\hat{\Sigma}_i|}} \exp\left\{-\frac{(x - \hat{\mu}_i)^T \hat{\Sigma}_i^{-1} (x - \hat{\mu}_i)}{2}\right\}$$
The posterior probability is then given as
$$P(c_i \mid x) = \frac{\hat{f}_i(x)\, \hat{P}(c_i)}{\sum_{j=1}^{k} \hat{f}_j(x)\, \hat{P}(c_j)}$$
The predicted class for x is
$$\hat{y} = \arg\max_{c_i}\{\hat{f}_i(x)\, \hat{P}(c_i)\}$$
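As an illustration (not part of the original slides), the class-conditional Gaussian density and the resulting posterior can be evaluated with NumPy roughly as follows; the helper names gaussian_density and posterior are assumptions made for this sketch.

```python
import numpy as np

def gaussian_density(x, mu, cov):
    """Multivariate normal density f(x | mu, cov) at a d-dimensional point x."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def posterior(x, means, covs, priors):
    """P(c_i | x) for all classes, normalizing over the shared evidence P(x)."""
    scores = np.array([gaussian_density(x, m, S) * p
                       for m, S, p in zip(means, covs, priors)])
    return scores / scores.sum()
```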

5 Bayes Classifier Algorithm

BAYESCLASSIFIER (D = {(x_j, y_j)}, j = 1, ..., n):
  for i = 1, ..., k do
    D_i ← { x_j | y_j = c_i, j = 1, ..., n }   // class-specific subsets
    n_i ← |D_i|                                 // cardinality
    P̂(c_i) ← n_i / n                            // prior probability
    μ̂_i ← (1/n_i) Σ_{x_j ∈ D_i} x_j             // mean
    Z_i ← D_i − 1·μ̂_i^T                         // centered data
    Σ̂_i ← (1/n_i) Z_i^T Z_i                     // covariance matrix
  return P̂(c_i), μ̂_i, Σ̂_i for all i = 1, ..., k

TESTING (x and P̂(c_i), μ̂_i, Σ̂_i, for all i ∈ [1, k]):
  ŷ ← arg max_{c_i} { f(x | μ̂_i, Σ̂_i) · P̂(c_i) }
  return ŷ
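A minimal NumPy sketch of the training and testing phases above, assuming integer labels 0..k−1 and invertible class covariances (in practice a small ridge term may be added to each covariance); it reuses gaussian_density from the previous sketch.

```python
import numpy as np

def train_bayes(X, y, k):
    """Estimate prior, mean, and covariance for each class, as in BAYESCLASSIFIER."""
    params = []
    for i in range(k):
        Di = X[y == i]                 # class-specific subset
        prior = len(Di) / len(X)       # P(c_i) = n_i / n
        mu = Di.mean(axis=0)           # class mean
        Z = Di - mu                    # centered data
        cov = Z.T @ Z / len(Di)        # covariance matrix (1/n_i estimate)
        params.append((prior, mu, cov))
    return params

def predict_bayes(x, params):
    """Pick the class maximizing f(x | mu_i, cov_i) * P(c_i)."""
    scores = [prior * gaussian_density(x, mu, cov) for prior, mu, cov in params]
    return int(np.argmax(scores))
```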

6 Bayes Classifier: Iris Data

[Figure: $X_1$ (sepal length) versus $X_2$ (sepal width), with the test point $x = (6.75, 4.25)^T$ marked.]

7 Bayes Classifier: Categorical Attributes

Let $X_j$ be a categorical attribute over the domain $dom(X_j) = \{a_{j1}, a_{j2}, \ldots, a_{jm_j}\}$. Each categorical attribute $X_j$ is modeled as an $m_j$-dimensional multivariate Bernoulli random variable $\mathbf{X}_j$ that takes on $m_j$ distinct vector values $e_{j1}, e_{j2}, \ldots, e_{jm_j}$, where $e_{jr}$ is the rth standard basis vector in $\mathbb{R}^{m_j}$ and corresponds to the rth value or symbol $a_{jr} \in dom(X_j)$. The entire d-dimensional dataset is modeled as the vector random variable $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_d)^T$. Let $d' = \sum_{j=1}^{d} m_j$; a categorical point $x = (x_1, x_2, \ldots, x_d)^T$ is therefore represented as the $d'$-dimensional binary vector
$$v = \begin{pmatrix} v_1 \\ \vdots \\ v_d \end{pmatrix} = \begin{pmatrix} e_{1r_1} \\ \vdots \\ e_{dr_d} \end{pmatrix}$$
where $v_j = e_{jr_j}$ provided $x_j = a_{jr_j}$ is the $r_j$th value in the domain of $X_j$.
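A small sketch (under the encoding just described) of mapping a categorical point to its $d'$-dimensional binary vector v; the function name one_hot and the domains argument are assumptions for illustration.

```python
import numpy as np

def one_hot(x, domains):
    """Map x = (x_1, ..., x_d) to the concatenation of standard basis vectors e_jr."""
    parts = []
    for xj, dom in zip(x, domains):
        e = np.zeros(len(dom))
        e[dom.index(xj)] = 1.0     # e_jr for the r-th symbol of attribute X_j
        parts.append(e)
    return np.concatenate(parts)

# Example, using the discretized iris domains from the later slides:
# one_hot(("Short", "Medium"),
#         [["Very Short", "Short", "Long", "Very Long"],
#          ["Short", "Medium", "Long"]])
# -> array([0., 1., 0., 0., 0., 1., 0.])
```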

8 Bayes Classifier: Categorical Attributes

The probability of the categorical point x is obtained from the joint probability mass function (PMF) for the vector random variable $\mathbf{X}$:
$$P(x \mid c_i) = f(v \mid c_i) = f(\mathbf{X}_1 = e_{1r_1}, \ldots, \mathbf{X}_d = e_{dr_d} \mid c_i)$$
The joint PMF can be estimated directly from the data $D_i$ for each class $c_i$ as follows:
$$\hat{f}(v \mid c_i) = \frac{n_i(v)}{n_i}$$
where $n_i(v)$ is the number of times the value v occurs in class $c_i$. However, to avoid zero probabilities we add a pseudo-count of 1 for each possible value:
$$\hat{f}(v \mid c_i) = \frac{n_i(v) + 1}{n_i + \prod_{j=1}^{d} m_j}$$
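A sketch of the smoothed joint PMF estimate above, counting occurrences of each full combination of attribute values within one class; the names are illustrative, not from the slides.

```python
from collections import Counter
from math import prod

def estimate_joint_pmf(points, domains):
    """Return f_hat(v | c_i) with a pseudo-count of 1 for every possible value v.

    points:  list of categorical tuples belonging to class c_i (so len(points) = n_i)
    domains: list of per-attribute value lists, giving prod(|dom(X_j)|) possible v's
    """
    counts = Counter(points)
    n_i = len(points)
    n_values = prod(len(dom) for dom in domains)
    return lambda v: (counts[v] + 1) / (n_i + n_values)
```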

9 Discretized Iris Data: sepal length and sepal width

(a) Discretized sepal length
  Bins        | Domain
  [4.3, 5.2]  | Very Short (a_11)
  (5.2, 6.1]  | Short (a_12)
  (6.1, 7.0]  | Long (a_13)
  (7.0, 7.9]  | Very Long (a_14)

(b) Discretized sepal width
  Bins        | Domain
  [2.0, 2.8]  | Short (a_21)
  (2.8, 3.6]  | Medium (a_22)
  (3.6, 4.4]  | Long (a_23)

10 Class-specific Empirical Joint Probability Mass Function

Class: c_1
  X_1 \ X_2          | Short (e_21) | Medium (e_22) | Long (e_23) | f̂_X1
  Very Short (e_11)  | 1/50         | 33/50         | 5/50        | 39/50
  Short (e_12)       | 0            | 3/50          | 8/50        | 11/50
  Long (e_13)        | 0            | 0             | 0           | 0
  Very Long (e_14)   | 0            | 0             | 0           | 0
  f̂_X2               | 1/50         | 36/50         | 13/50       |

Class: c_2
  X_1 \ X_2          | Short (e_21) | Medium (e_22) | Long (e_23) | f̂_X1
  Very Short (e_11)  | 6/100        | 0             | 0           | 6/100
  Short (e_12)       | 24/100       | 15/100        | 0           | 39/100
  Long (e_13)        | 13/100       | 30/100        | 0           | 43/100
  Very Long (e_14)   | 3/100        | 7/100         | 2/100       | 12/100
  f̂_X2               | 46/100       | 52/100        | 2/100       |

11 Iris Data: Test Case

Consider a test point $x = (5.3, 3.0)^T$ corresponding to the categorical point (Short, Medium), which is represented as $v = (e_{12}^T\ e_{22}^T)^T$. The prior probabilities of the classes are $\hat{P}(c_1) = 0.33$ and $\hat{P}(c_2) = 0.67$. The likelihood and posterior probability for each class are given as
$$\hat{P}(x \mid c_1) = \hat{f}(v \mid c_1) = 3/50 = 0.06$$
$$\hat{P}(x \mid c_2) = \hat{f}(v \mid c_2) = 15/100 = 0.15$$
$$\hat{P}(c_1 \mid x) \propto 0.06 \times 0.33 = 0.0198 \quad\Rightarrow\quad \hat{P}(c_1 \mid x) \approx 0.165$$
$$\hat{P}(c_2 \mid x) \propto 0.15 \times 0.67 = 0.1005 \quad\Rightarrow\quad \hat{P}(c_2 \mid x) \approx 0.835$$
In this case the predicted class is $\hat{y} = c_2$.
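The posterior values follow directly from Bayes' rule; a quick check of the arithmetic, using the likelihoods and priors from the slide:

```python
likelihood = {"c1": 0.06, "c2": 0.15}
prior = {"c1": 0.33, "c2": 0.67}
joint = {c: likelihood[c] * prior[c] for c in likelihood}   # f(v|c) * P(c)
total = sum(joint.values())                                  # estimate of P(x)
posteriors = {c: joint[c] / total for c in joint}            # {'c1': ~0.165, 'c2': ~0.835}
```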

12 Iris Data: Test Case with Pseudo-counts

The test point $x = (6.75, 4.25)^T$ corresponds to the categorical point (Long, Long), and it is represented as $v = (e_{13}^T\ e_{23}^T)^T$. Unfortunately the probability mass at v is zero for both classes. We adjust the PMF via pseudo-counts, noting that the number of possible values is $m_1 \times m_2 = 4 \times 3 = 12$. The likelihood and posterior probability can then be computed as
$$\hat{P}(x \mid c_1) = \hat{f}(v \mid c_1) = \frac{0 + 1}{50 + 12} = 1/62 \approx 0.016$$
$$\hat{P}(x \mid c_2) = \hat{f}(v \mid c_2) = \frac{0 + 1}{100 + 12} = 1/112 \approx 0.0089$$
$$\hat{P}(c_1 \mid x) \propto 0.016 \times 0.33 \approx 0.0053 \quad\Rightarrow\quad \hat{P}(c_1 \mid x) \approx 0.47$$
$$\hat{P}(c_2 \mid x) \propto 0.0089 \times 0.67 \approx 0.0060 \quad\Rightarrow\quad \hat{P}(c_2 \mid x) \approx 0.53$$
Thus, the predicted class is $\hat{y} = c_2$.

13 Bayes Classifier: Challenges

The main problem with the Bayes classifier is the lack of enough data to reliably estimate the joint probability density or mass function, especially for high-dimensional data. For numeric attributes we have to estimate $O(d^2)$ covariances, and as the dimensionality increases, this requires us to estimate too many parameters. For categorical attributes we have to estimate the joint probability for all possible values of v, given as $\prod_j |dom(X_j)|$. Even if each categorical attribute has only two values, we would need to estimate probabilities for $2^d$ values. However, because there can be at most n distinct values for v, most of the counts will be zero. The naive Bayes classifier addresses these concerns.

14 Naive Bayes Classifier: Numeric Attributes

The naive Bayes approach makes the simple assumption that all attributes are independent, which implies that the likelihood can be decomposed into a product of dimension-wise probabilities:
$$P(x \mid c_i) = P(x_1, x_2, \ldots, x_d \mid c_i) = \prod_{j=1}^{d} P(x_j \mid c_i)$$
The likelihood for class $c_i$, for dimension $X_j$, is given as
$$P(x_j \mid c_i) \propto f(x_j \mid \hat{\mu}_{ij}, \hat{\sigma}_{ij}^2) = \frac{1}{\sqrt{2\pi}\, \hat{\sigma}_{ij}} \exp\left\{-\frac{(x_j - \hat{\mu}_{ij})^2}{2\hat{\sigma}_{ij}^2}\right\}$$
where $\hat{\mu}_{ij}$ and $\hat{\sigma}_{ij}^2$ denote the estimated mean and variance for attribute $X_j$, for class $c_i$.

15 Naive Bayes Classifier: Numeric Attributes

The naive assumption corresponds to setting all the covariances to zero in $\hat{\Sigma}_i$, that is,
$$\hat{\Sigma}_i = \begin{pmatrix} \hat{\sigma}_{i1}^2 & 0 & \cdots & 0 \\ 0 & \hat{\sigma}_{i2}^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{\sigma}_{id}^2 \end{pmatrix}$$
The naive Bayes classifier thus uses the sample mean $\hat{\mu}_i = (\hat{\mu}_{i1}, \ldots, \hat{\mu}_{id})^T$ and a diagonal sample covariance matrix $\hat{\Sigma}_i = diag(\hat{\sigma}_{i1}^2, \ldots, \hat{\sigma}_{id}^2)$ for each class $c_i$. In total, 2d parameters have to be estimated per class, corresponding to the sample mean and sample variance for each dimension $X_j$.

16 Naive Bayes Algorithm

NAIVEBAYES (D = {(x_j, y_j)}, j = 1, ..., n):
  for i = 1, ..., k do
    D_i ← { x_j | y_j = c_i, j = 1, ..., n }   // class-specific subsets
    n_i ← |D_i|                                 // cardinality
    P̂(c_i) ← n_i / n                            // prior probability
    μ̂_i ← (1/n_i) Σ_{x_j ∈ D_i} x_j             // mean
    Z_i ← D_i − 1·μ̂_i^T                         // centered data for class c_i
    for j = 1, ..., d do                        // class-specific variance for X_j
      σ̂²_ij ← (1/n_i) Z_ij^T Z_ij               // variance
    σ̂_i ← (σ̂²_i1, ..., σ̂²_id)^T                 // class-specific attribute variances
  return P̂(c_i), μ̂_i, σ̂_i for all i = 1, ..., k

TESTING (x and P̂(c_i), μ̂_i, σ̂_i, for all i ∈ [1, k]):
  ŷ ← arg max_{c_i} { P̂(c_i) ∏_{j=1}^{d} f(x_j | μ̂_ij, σ̂²_ij) }
  return ŷ
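A compact NumPy sketch of the naive Bayes training and testing steps above, assuming integer labels 0..k−1 and strictly positive per-dimension variances; the function names are illustrative.

```python
import numpy as np

def train_naive_bayes(X, y, k):
    """Estimate prior, per-dimension means, and per-dimension variances per class."""
    params = []
    for i in range(k):
        Di = X[y == i]
        prior = len(Di) / len(X)       # P(c_i)
        mu = Di.mean(axis=0)           # mu_ij for j = 1..d
        var = Di.var(axis=0)           # sigma^2_ij (1/n_i estimate, as in the slide)
        params.append((prior, mu, var))
    return params

def predict_naive_bayes(x, params):
    """Pick the class maximizing P(c_i) * prod_j f(x_j | mu_ij, sigma^2_ij)."""
    scores = []
    for prior, mu, var in params:
        dens = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        scores.append(prior * np.prod(dens))
    return int(np.argmax(scores))
```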

17 Naive Bayes versus Full Bayes Classifier: Iris 2D Data

[Figure: $X_1$ (sepal length) versus $X_2$ (sepal width), with the test point $x = (6.75, 4.25)^T$ marked. (a) Naive Bayes. (b) Full Bayes.]

18 Naive Bayes: Categorical Attributes

The independence assumption leads to a simplification of the joint probability mass function:
$$P(x \mid c_i) = \prod_{j=1}^{d} P(x_j \mid c_i) = \prod_{j=1}^{d} f(\mathbf{X}_j = e_{jr_j} \mid c_i)$$
where $f(\mathbf{X}_j = e_{jr_j} \mid c_i)$ is the probability mass function for $\mathbf{X}_j$, which can be estimated from $D_i$ as follows:
$$\hat{f}(v_j \mid c_i) = \frac{n_i(v_j)}{n_i}$$
where $n_i(v_j)$ is the observed frequency of the value $v_j = e_{jr_j}$ corresponding to the $r_j$th categorical value $a_{jr_j}$ of attribute $X_j$ for class $c_i$. If the count is zero, we can use the pseudo-count method to obtain a prior probability. The adjusted estimates with pseudo-counts are given as
$$\hat{f}(v_j \mid c_i) = \frac{n_i(v_j) + 1}{n_i + m_j}$$
where $m_j = |dom(X_j)|$.
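A sketch of the per-attribute smoothed estimates above for one class $c_i$; the helper name and arguments are assumptions for illustration.

```python
from collections import Counter

def estimate_attribute_pmfs(points, domains):
    """Return, for each attribute X_j, the smoothed PMF f_hat(x_j | c_i)."""
    n_i = len(points)
    pmfs = []
    for j, dom in enumerate(domains):
        counts = Counter(p[j] for p in points)   # n_i(v_j) for each observed symbol
        m_j = len(dom)
        pmfs.append({a: (counts[a] + 1) / (n_i + m_j) for a in dom})
    return pmfs
```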

19 Nonparametric Approach: K Nearest Neighbors Classifier

We consider a nonparametric approach for likelihood estimation using nearest-neighbors density estimation. Let D be a training dataset comprising n points $x_i \in \mathbb{R}^d$, and let $D_i$ denote the subset of points in D that are labeled with class $c_i$, with $n_i = |D_i|$. Given a test point $x \in \mathbb{R}^d$ and K, the number of neighbors to consider, let r denote the distance from x to its Kth nearest neighbor in D. Consider the d-dimensional hyperball of radius r around the test point x, defined as
$$B_d(x, r) = \{x_i \in D \mid \delta(x, x_i) \le r\}$$
Here $\delta(x, x_i)$ is the distance between x and $x_i$, which is usually assumed to be the Euclidean distance, i.e., $\delta(x, x_i) = \|x - x_i\|_2$. We assume that $|B_d(x, r)| = K$.

20 Nonparametric Approach: K Nearest Neighbors Classifier

Let $K_i$ denote the number of points among the K nearest neighbors of x that are labeled with class $c_i$, that is,
$$K_i = |\{x_j \in B_d(x, r) \mid y_j = c_i\}|$$
The class-conditional probability density at x can be estimated as the fraction of points from class $c_i$ that lie within the hyperball, divided by its volume, that is,
$$\hat{f}(x \mid c_i) = \frac{K_i / n_i}{V} = \frac{K_i}{n_i V}$$
where $V = vol(B_d(x, r))$ is the volume of the d-dimensional hyperball. The posterior probability $P(c_i \mid x)$ can be estimated as
$$P(c_i \mid x) = \frac{\hat{f}(x \mid c_i)\, \hat{P}(c_i)}{\sum_{j=1}^{k} \hat{f}(x \mid c_j)\, \hat{P}(c_j)}$$
However, because $\hat{P}(c_i) = \frac{n_i}{n}$, we have
$$\hat{f}(x \mid c_i)\, \hat{P}(c_i) = \frac{K_i}{n_i V} \cdot \frac{n_i}{n} = \frac{K_i}{nV}$$

21 Nonparametric Approach: K Nearest Neighbors Classifier

The posterior probability is given as
$$P(c_i \mid x) = \frac{K_i / (nV)}{\sum_{j=1}^{k} K_j / (nV)} = \frac{K_i}{K}$$
Finally, the predicted class for x is
$$\hat{y} = \arg\max_{c_i}\{P(c_i \mid x)\} = \arg\max_{c_i}\left\{\frac{K_i}{K}\right\} = \arg\max_{c_i}\{K_i\}$$
Because K is fixed, the KNN classifier predicts the class of x as the majority class among its K nearest neighbors.
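A minimal sketch of the resulting KNN decision rule with Euclidean distance; ties between classes are broken arbitrarily here.

```python
import numpy as np
from collections import Counter

def knn_predict(x, X, y, K):
    """Return the majority class among the K nearest neighbors of x in (X, y)."""
    dists = np.linalg.norm(X - x, axis=1)    # delta(x, x_i) for all training points
    nearest = np.argsort(dists)[:K]          # indices of the K nearest neighbors
    votes = Counter(y[nearest])              # K_i for each class in the hyperball
    return votes.most_common(1)[0][0]        # arg max_i K_i
```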

22 Iris Data: K Nearest Neighbors Classifier

[Figure: $X_1$ versus $X_2$, showing the test point $x = (6.75, 4.25)^T$ and the hyperball of radius r around it.]
