COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification

1 COMP 551 Applied Machine Learning
Lecture 5: Generative models for linear classification
Instructor:
Class web page:
Unless otherwise noted, all material posted for this course is copyright of the instructor, and cannot be reused or reposted without the instructor's written permission.

2 Today's quiz
Q1. What is a linear classifier? (In contrast to a non-linear classifier.)
Q2. Describe the difference between discriminative and generative classifiers.
Q3. Consider the following data set. If you use logistic regression to compute a decision boundary, what is the prediction for x_6?
[Data table: examples x_1 through x_6, with three feature columns and an output column; the output for x_6 is marked "?".]

3 Quick recap
Two approaches for linear classification:
Discriminative learning: Directly estimate P(y|x).
Logistic regression: P(y|x) = σ(w^T x) = 1 / (1 + e^(-w^T x))
Generative learning: Separately model P(x|y) and P(y). Use these, through Bayes rule, to estimate P(y|x).

4 Linear discriminant analysis (LDA)
Return to Bayes rule: P(y|x) = P(x|y)P(y) / P(x)
LDA makes explicit assumptions about P(x|y):
P(x|y) = 1 / ((2π)^(m/2) |Σ|^(1/2)) exp( -(1/2) (x - μ)^T Σ^{-1} (x - μ) )
Multivariate Gaussian, with mean μ and covariance matrix Σ.
Notation: here x is a single instance, represented as an m×1 vector.
Key assumption of LDA: Both classes have the same covariance matrix, Σ.

5 Linear discriminant analysis (LDA)
Return to Bayes rule: P(y|x) = P(x|y)P(y) / P(x)
LDA makes explicit assumptions about P(x|y):
P(x|y) = 1 / ((2π)^(m/2) |Σ|^(1/2)) exp( -(1/2) (x - μ)^T Σ^{-1} (x - μ) )
Multivariate Gaussian, with mean μ and covariance matrix Σ.
Notation: here x is a single instance, represented as an m×1 vector.
Key assumption of LDA: Both classes have the same covariance matrix, Σ.
Consider the log-odds ratio (again, P(x) doesn't matter for the decision):
ln [ P(x|y=1)P(y=1) / (P(x|y=0)P(y=0)) ] = ln [ P(y=1)/P(y=0) ] - (1/2) μ_1^T Σ^{-1} μ_1 + (1/2) μ_0^T Σ^{-1} μ_0 + x^T Σ^{-1} (μ_1 - μ_0)
This is a linear decision boundary: w_0 + x^T w
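
As a concrete illustration of the linear form above, here is a minimal NumPy sketch (not from the slides; the function name and toy numbers are made up) that turns given class means, a shared covariance, and a class prior into the weights w_0, w of the log-odds ratio:

```python
import numpy as np

def lda_boundary(mu0, mu1, Sigma, p1):
    """Return (w0, w) such that the LDA log-odds ratio is w0 + x @ w."""
    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu1 - mu0)                      # x^T Sigma^{-1} (mu1 - mu0)
    w0 = (np.log(p1 / (1 - p1))
          - 0.5 * mu1 @ Sigma_inv @ mu1
          + 0.5 * mu0 @ Sigma_inv @ mu0)
    return w0, w

# toy parameters, for illustration only
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
w0, w = lda_boundary(mu0, mu1, Sigma, p1=0.5)
x = np.array([1.5, 0.5])
print("predict class 1" if w0 + x @ w > 0 else "predict class 0")
```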

6 Learning in LDA: 2-class case
Estimate P(y), μ, Σ from the training data, then apply the log-odds ratio.

7 Learning in LDA: 2-class case
Estimate P(y), μ, Σ from the training data, then apply the log-odds ratio.
P(y=0) = N_0 / (N_0 + N_1)
P(y=1) = N_1 / (N_0 + N_1)
where N_1, N_0 are the number of training samples from classes 1 and 0, respectively.

8 Learning in LDA: 2-class case
Estimate P(y), μ, Σ from the training data, then apply the log-odds ratio.
P(y=0) = N_0 / (N_0 + N_1)
P(y=1) = N_1 / (N_0 + N_1)
where N_1, N_0 are the number of training samples from classes 1 and 0, respectively.
μ_0 = Σ_{i=1:n} I(y_i=0) x_i / N_0
μ_1 = Σ_{i=1:n} I(y_i=1) x_i / N_1
where I(x) is the indicator function: I(x)=0 if x=0, I(x)=1 if x=1.

9 Learning in LDA: 2-class case
Estimate P(y), μ, Σ from the training data, then apply the log-odds ratio.
P(y=0) = N_0 / (N_0 + N_1)
P(y=1) = N_1 / (N_0 + N_1)
where N_1, N_0 are the number of training samples from classes 1 and 0, respectively.
μ_0 = Σ_{i=1:n} I(y_i=0) x_i / N_0
μ_1 = Σ_{i=1:n} I(y_i=1) x_i / N_1
where I(x) is the indicator function: I(x)=0 if x=0, I(x)=1 if x=1.
Σ = Σ_{k=0:1} Σ_{i=1:n} I(y_i=k) (x_i - μ_k)(x_i - μ_k)^T / (N_0 + N_1 - 2)

10 Learning in LDA: 2-class case
Estimate P(y), μ, Σ from the training data, then apply the log-odds ratio.
P(y=0) = N_0 / (N_0 + N_1)
P(y=1) = N_1 / (N_0 + N_1)
where N_1, N_0 are the number of training samples from classes 1 and 0, respectively.
μ_0 = Σ_{i=1:n} I(y_i=0) x_i / N_0
μ_1 = Σ_{i=1:n} I(y_i=1) x_i / N_1
where I(x) is the indicator function: I(x)=0 if x=0, I(x)=1 if x=1.
Σ = Σ_{k=0:1} Σ_{i=1:n} I(y_i=k) (x_i - μ_k)(x_i - μ_k)^T / (N_0 + N_1 - 2)
Decision boundary: Given an input x, classify it as class 1 if the log-odds ratio is > 0, and as class 0 otherwise.
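
The estimation recipe on this slide can be written out as a short sketch. This is an illustrative implementation, not the course's reference code; the pooled covariance is divided by N_0 + N_1 - 2, matching the estimator written above, and the data are synthetic:

```python
import numpy as np

def fit_lda(X, y):
    """Estimate P(y=1), mu_0, mu_1 and the shared covariance from (X, y)."""
    X0, X1 = X[y == 0], X[y == 1]
    N0, N1 = len(X0), len(X1)
    p1 = N1 / (N0 + N1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled covariance: within-class scatter divided by N0 + N1 - 2
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (N0 + N1 - 2)
    return p1, mu0, mu1, S

def predict_lda(x, p1, mu0, mu1, S):
    S_inv = np.linalg.inv(S)
    log_odds = (np.log(p1 / (1 - p1))
                - 0.5 * mu1 @ S_inv @ mu1 + 0.5 * mu0 @ S_inv @ mu0
                + x @ S_inv @ (mu1 - mu0))
    return int(log_odds > 0)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (50, 2)), rng.normal([2, 2], 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(predict_lda(np.array([1.8, 1.9]), *fit_lda(X, y)))  # likely 1
```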

11 More complex decision boundaries
Want to accommodate more than 2 classes? Consider the per-class linear discriminant function:
δ_y(x) = x^T Σ^{-1} μ_y - (1/2) μ_y^T Σ^{-1} μ_y + log P(y)
Use the following decision rule:
Output = argmax_y δ_y(x) = argmax_y [ x^T Σ^{-1} μ_y - (1/2) μ_y^T Σ^{-1} μ_y + log P(y) ]
Want more flexible (non-linear) decision boundaries? Recall the trick from linear regression of re-coding the variables.
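
A small sketch of the multi-class rule, assuming the per-class parameters are already estimated (illustrative names and toy numbers):

```python
import numpy as np

def lda_discriminants(x, mus, Sigma, priors):
    """delta_y(x) = x^T Sigma^{-1} mu_y - 0.5 mu_y^T Sigma^{-1} mu_y + log P(y)."""
    S_inv = np.linalg.inv(Sigma)
    return [x @ S_inv @ mu - 0.5 * mu @ S_inv @ mu + np.log(p)
            for mu, p in zip(mus, priors)]

# three classes sharing one covariance (toy numbers)
mus = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
Sigma = np.eye(2)
priors = [1 / 3, 1 / 3, 1 / 3]
x = np.array([2.5, 0.4])
print(int(np.argmax(lda_discriminants(x, mus, Sigma, priors))))  # -> 1
```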

12 Using higher-order features
FIGURE 4.1. The left plot shows some data from three classes, with linear decision boundaries found by linear discriminant analysis. The right plot shows quadratic decision boundaries. These were obtained by finding linear boundaries in the five-dimensional space X_1, X_2, X_1 X_2, X_1^2, X_2^2. Linear inequalities in this space are quadratic inequalities in the original space.
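
The re-coding trick in the caption can be reproduced with a tiny helper that expands two input features into the five-dimensional quadratic space; this is an illustrative sketch, not code from the course:

```python
import numpy as np

def quadratic_features(X):
    """Map (x1, x2) to (x1, x2, x1*x2, x1^2, x2^2), as in the figure."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

X = np.array([[1.0, 2.0], [0.5, -1.0]])
print(quadratic_features(X))
# LDA on these five features gives boundaries that are quadratic in (x1, x2).
```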

13 Quadratic discriminant analysis
LDA assumes all classes have the same covariance matrix, Σ.
QDA allows a different covariance matrix, Σ_y, for each class y.
Linear discriminant function: δ_y(x) = x^T Σ^{-1} μ_y - (1/2) μ_y^T Σ^{-1} μ_y + log P(y)
Quadratic discriminant function: δ_y(x) = -(1/2) log |Σ_y| - (1/2) (x - μ_y)^T Σ_y^{-1} (x - μ_y) + log P(y)
QDA has more parameters to estimate, but greater flexibility to estimate the target function. Bias-variance trade-off.
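
A minimal sketch of the quadratic discriminant function, assuming per-class parameters are given (illustrative names and toy numbers):

```python
import numpy as np

def qda_discriminant(x, mu, Sigma, prior):
    """delta_y(x) = -0.5 log|Sigma_y| - 0.5 (x-mu_y)^T Sigma_y^{-1} (x-mu_y) + log P(y)."""
    diff = x - mu
    return (-0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ np.linalg.inv(Sigma) @ diff
            + np.log(prior))

# per-class covariances are now allowed to differ (toy numbers)
x = np.array([1.0, 1.0])
d0 = qda_discriminant(x, np.array([0.0, 0.0]), np.eye(2), 0.5)
d1 = qda_discriminant(x, np.array([2.0, 2.0]), 0.5 * np.eye(2), 0.5)
print(0 if d0 > d1 else 1)  # -> 0
```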

14 LDA vs QDA
FIGURE 4.6. Two methods for fitting quadratic boundaries. The left plot shows the quadratic decision boundaries for the data in Figure 4.1 (obtained using LDA in the five-dimensional space X_1, X_2, X_1 X_2, X_1^2, X_2^2). The right plot shows the quadratic decision boundaries found by QDA. The differences are small, as is usually the case.

15 Generative learning: Scaling up
Consider applying generative learning (e.g. LDA) to the Project data.

16 Generative learning: Scaling up
Consider applying generative learning (e.g. LDA) to the Project data.
E.g. task: Given the first few words, determine the language (without a dictionary).
What are the features? How to encode the data?
How to handle a high-dimensional feature space?
Assume the same covariance matrix for both classes? (Maybe that is ok.)
What kinds of assumptions can we make to simplify the problem?

17 Naïve Bayes assumption
Generative learning: Estimate P(x|y), P(y). Then calculate P(y|x).
Naïve Bayes: Assume the x_j are conditionally independent given y. In other words:
P(x_j | y) = P(x_j | y, x_k), for all j, k

18 Naïve Bayes assumption
Generative learning: Estimate P(x|y), P(y). Then calculate P(y|x).
Naïve Bayes: Assume the x_j are conditionally independent given y. In other words:
P(x_j | y) = P(x_j | y, x_k), for all j, k
Generative model structure:
P(x|y) = P(x_1, x_2, ..., x_m | y)
= P(x_1|y) P(x_2|y, x_1) P(x_3|y, x_1, x_2) ... P(x_m|y, x_1, x_2, ..., x_{m-1})   (from the general rules of probability)
= P(x_1|y) P(x_2|y) P(x_3|y) ... P(x_m|y)   (from the Naïve Bayes assumption above)

19 Naïve Bayes graphical model
y → x_1, x_2, x_3, ..., x_m
How many parameters to estimate? Assume m binary features.

20 Naïve Bayes graphical model
y → x_1, x_2, x_3, ..., x_m
How many parameters to estimate? Assume m binary features.
Without the Naïve Bayes assumption: O(2^m) numbers to describe the model.
With the Naïve Bayes assumption: O(m) numbers to describe the model.
Useful when the number of features is high.
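
A back-of-the-envelope check of these counts (the exact constants depend on bookkeeping; the point is exponential versus linear growth in m):

```python
m = 30
full_joint = 2 ** m - 1   # one probability per configuration of (x_1, ..., x_m), per class
naive_bayes = 2 * m + 1   # P(y=1) plus P(x_j=1 | y) for each of the 2 classes and m features
print(full_joint, naive_bayes)   # 1073741823 versus 61
```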

21 Training a Naïve Bayes classifier
Assume x, y are binary variables. Estimate the parameters P(x|y) and P(y) from data.
Define:
Θ_1 = Pr(y=1)
Θ_{j,1} = Pr(x_j=1 | y=1)
Θ_{j,0} = Pr(x_j=1 | y=0)
Graphical model: y → x

22 Training a Naïve Bayes classifier
Assume x, y are binary variables. Estimate the parameters P(x|y) and P(y) from data.
Define:
Θ_1 = Pr(y=1)
Θ_{j,1} = Pr(x_j=1 | y=1)
Θ_{j,0} = Pr(x_j=1 | y=0)
Graphical model: y → x
Evaluation criterion: Find the parameters that maximize the log-likelihood function.
Likelihood: Pr(y|x) ∝ Pr(y) Pr(x|y), so the likelihood of the data is Π_{i=1:n} ( P(y_i) Π_{j=1:m} P(x_{i,j} | y_i) )
Samples i are independent, so we take the product over n.
Input features are independent (conditional on y), so we take the product over m.

23 Training a Naïve Bayes classifier
Likelihood for a binary output variable: L(Θ_1 | y) = Θ_1^y (1 - Θ_1)^(1-y)
Log-likelihood for all parameters:
log L(Θ_1, Θ_{j,1}, Θ_{j,0} | D) = Σ_{i=1:n} [ log P(y_i) + Σ_{j=1:m} log P(x_{i,j} | y_i) ]

24 Training a Naïve Bayes classifier
Likelihood for a binary output variable: L(Θ_1 | y) = Θ_1^y (1 - Θ_1)^(1-y)
Log-likelihood for all parameters:
log L(Θ_1, Θ_{j,1}, Θ_{j,0} | D) = Σ_{i=1:n} [ log P(y_i) + Σ_{j=1:m} log P(x_{i,j} | y_i) ]
= Σ_{i=1:n} [ y_i log Θ_1 + (1-y_i) log(1-Θ_1)
+ Σ_{j=1:m} y_i ( x_{i,j} log Θ_{j,1} + (1-x_{i,j}) log(1-Θ_{j,1}) )
+ Σ_{j=1:m} (1-y_i) ( x_{i,j} log Θ_{j,0} + (1-x_{i,j}) log(1-Θ_{j,0}) ) ]

25 Training a Naïve Bayes classifier
Likelihood for a binary output variable: L(Θ_1 | y) = Θ_1^y (1 - Θ_1)^(1-y)
Log-likelihood for all parameters:
log L(Θ_1, Θ_{j,1}, Θ_{j,0} | D) = Σ_{i=1:n} [ log P(y_i) + Σ_{j=1:m} log P(x_{i,j} | y_i) ]
= Σ_{i=1:n} [ y_i log Θ_1 + (1-y_i) log(1-Θ_1)
+ Σ_{j=1:m} y_i ( x_{i,j} log Θ_{j,1} + (1-x_{i,j}) log(1-Θ_{j,1}) )
+ Σ_{j=1:m} (1-y_i) ( x_{i,j} log Θ_{j,0} + (1-x_{i,j}) log(1-Θ_{j,0}) ) ]
(This will have another form if the parameters P(x|y) have another form, e.g. Gaussian.)

26 Training a Naïve Bayes classifier
Likelihood for a binary output variable: L(Θ_1 | y) = Θ_1^y (1 - Θ_1)^(1-y)
Log-likelihood for all parameters:
log L(Θ_1, Θ_{j,1}, Θ_{j,0} | D) = Σ_{i=1:n} [ log P(y_i) + Σ_{j=1:m} log P(x_{i,j} | y_i) ]
= Σ_{i=1:n} [ y_i log Θ_1 + (1-y_i) log(1-Θ_1)
+ Σ_{j=1:m} y_i ( x_{i,j} log Θ_{j,1} + (1-x_{i,j}) log(1-Θ_{j,1}) )
+ Σ_{j=1:m} (1-y_i) ( x_{i,j} log Θ_{j,0} + (1-x_{i,j}) log(1-Θ_{j,0}) ) ]
(This will have another form if the parameters P(x|y) have another form, e.g. Gaussian.)
To estimate Θ_1, take the derivative of log L with respect to Θ_1 and set it to 0:
∂L/∂Θ_1 = Σ_{i=1:n} ( y_i/Θ_1 - (1-y_i)/(1-Θ_1) ) = 0

27 Training a Naïve Bayes classifier
Solving for Θ_1 we get:
Θ_1 = (1/n) Σ_{i=1:n} y_i = (number of examples where y=1) / (number of examples)

28 Training a Naïve Bayes classifier
Solving for Θ_1 we get:
Θ_1 = (1/n) Σ_{i=1:n} y_i = (number of examples where y=1) / (number of examples)
Similarly, we get:
Θ_{j,1} = (number of examples where x_j=1 and y=1) / (number of examples where y=1)
Θ_{j,0} = (number of examples where x_j=1 and y=0) / (number of examples where y=0)
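
These count-based estimates are one line each in NumPy; a minimal sketch assuming binary features stored in a 0/1 array (made-up toy data):

```python
import numpy as np

def fit_naive_bayes(X, y):
    """Maximum likelihood estimates for a Bernoulli Naive Bayes: simple count ratios."""
    theta1 = y.mean()                      # fraction of examples with y = 1
    theta_j1 = X[y == 1].mean(axis=0)      # P(x_j = 1 | y = 1)
    theta_j0 = X[y == 0].mean(axis=0)      # P(x_j = 1 | y = 0)
    return theta1, theta_j1, theta_j0

X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 0]])
y = np.array([1, 1, 0, 0])
print(fit_naive_bayes(X, y))
# theta1 = 0.5, theta_j1 = [1.0, 0.5, 0.5], theta_j0 = [0.0, 0.5, 0.5]
```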

29 Naïve Bayes decision boundary
Consider again the log-odds ratio:
log [ Pr(y=1|x) / Pr(y=0|x) ] = log [ Pr(x|y=1)P(y=1) / (Pr(x|y=0)P(y=0)) ]
= log [ P(y=1)/P(y=0) ] + log Π_{j=1:m} [ P(x_j|y=1) / P(x_j|y=0) ]
= log [ P(y=1)/P(y=0) ] + Σ_{j=1:m} log [ P(x_j|y=1) / P(x_j|y=0) ]

30 Naïve Bayes decision boundary
Consider the case where the features are binary: x_j ∈ {0, 1}. Define:
w_{j,0} = log [ P(x_j=0|y=1) / P(x_j=0|y=0) ] ;  w_{j,1} = log [ P(x_j=1|y=1) / P(x_j=1|y=0) ]
Now we have:
log [ Pr(y=1|x) / Pr(y=0|x) ] = log [ P(y=1)/P(y=0) ] + Σ_{j=1:m} log [ P(x_j|y=1) / P(x_j|y=0) ]
= log [ P(y=1)/P(y=0) ] + Σ_{j=1:m} ( w_{j,0} (1 - x_j) + w_{j,1} x_j )
= log [ P(y=1)/P(y=0) ] + Σ_{j=1:m} w_{j,0} + Σ_{j=1:m} (w_{j,1} - w_{j,0}) x_j
This is a linear decision boundary: w_0 + x^T w
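
A short sketch that converts estimated Naïve Bayes parameters into the linear weights defined above (illustrative values; the thetas must lie strictly between 0 and 1, which the Laplace smoothing discussed later guarantees):

```python
import numpy as np

def nb_linear_weights(theta1, theta_j1, theta_j0):
    """Turn Bernoulli Naive Bayes parameters into a linear rule w0 + x @ w."""
    w_j1 = np.log(theta_j1 / theta_j0)                  # log-ratio when x_j = 1
    w_j0 = np.log((1 - theta_j1) / (1 - theta_j0))      # log-ratio when x_j = 0
    w0 = np.log(theta1 / (1 - theta1)) + w_j0.sum()
    w = w_j1 - w_j0
    return w0, w

theta1 = 0.5
theta_j1 = np.array([0.8, 0.4, 0.6])   # P(x_j=1 | y=1), made-up values
theta_j0 = np.array([0.2, 0.5, 0.3])   # P(x_j=1 | y=0), made-up values
w0, w = nb_linear_weights(theta1, theta_j1, theta_j0)
x = np.array([1, 0, 1])
print("class 1" if w0 + x @ w > 0 else "class 0")
```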

31 Text classification example
Using Naïve Bayes, we can compute probabilities for all the words which appear in the document collection.
P(y=c) is the probability of class c.
P(x_j | y=c) is the probability of seeing word j in documents of class c.
Graphical model: Class c → word 1, word 2, word 3, ..., word m

32 Text classification example
Using Naïve Bayes, we can compute probabilities for all the words which appear in the document collection.
P(y=c) is the probability of class c.
P(x_j | y=c) is the probability of seeing word j in documents of class c.
Graphical model: Class c → word 1, word 2, word 3, ..., word m
The set of classes depends on the application, e.g. topic modeling: each class corresponds to documents on a given topic, e.g. {Politics, Finance, Sports, Arts}.
What happens when a word is not observed in the training data?

33 Laplace smoothing
Replace the maximum likelihood estimator:
Pr(x_j=1 | y=1) = (number of instances with x_j=1 and y=1) / (number of examples with y=1)

34 Laplace smoothing
Replace the maximum likelihood estimator:
Pr(x_j=1 | y=1) = (number of instances with x_j=1 and y=1) / (number of examples with y=1)
With the following:
Pr(x_j=1 | y=1) = ((number of instances with x_j=1 and y=1) + 1) / ((number of examples with y=1) + 2)

35 Laplace smoothing
Replace the maximum likelihood estimator:
Pr(x_j=1 | y=1) = (number of instances with x_j=1 and y=1) / (number of examples with y=1)
With the following:
Pr(x_j=1 | y=1) = ((number of instances with x_j=1 and y=1) + 1) / ((number of examples with y=1) + 2)
If there are no examples from that class, it reduces to a prior probability of Pr = 1/2.
If all examples have x_j=1, then Pr(x_j=0 | y) has Pr = 1 / (#examples + 2).
If a word appears frequently, the new estimate is only slightly biased.
This is an example of a Bayesian prior for Naïve Bayes.
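
A minimal sketch of the smoothed estimator, assuming binary features (toy data chosen so one feature is always 1 in the positive class):

```python
import numpy as np

def smoothed_estimates(X, y):
    """Laplace-smoothed P(x_j = 1 | y = 1): (count + 1) / (class count + 2)."""
    X1 = X[y == 1]
    return (X1.sum(axis=0) + 1) / (len(X1) + 2)

X = np.array([[1, 0, 1],
              [1, 1, 1]])
y = np.array([1, 1])
# feature 3 is 1 in every y=1 example, yet its smoothed probability stays below 1,
# so an unseen x_3 = 0 at test time no longer forces the class probability to 0.
print(smoothed_estimates(X, y))   # [0.75 0.5  0.75]
```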

36 Example: 20 newsgroups
Given 1000 training documents from each group, learn to classify new documents according to which newsgroup they came from:
comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x, alt.atheism, soc.religion.christian, talk.religion.misc, talk.politics.mideast, talk.politics.misc, misc.forsale, rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey, sci.space, sci.crypt, sci.electronics, sci.med, talk.politics.guns
Naïve Bayes: 89% classification accuracy (comparable to other state-of-the-art methods).

37 Multi-class classification
Generally two options:
1. Learn a single classifier that can produce 20 distinct output values.
2. Learn 20 different 1-vs-all binary classifiers.

38 Multi-class classification
Generally two options:
1. Learn a single classifier that can produce 20 distinct output values.
2. Learn 20 different 1-vs-all binary classifiers.
Option 1 assumes you have a multi-class version of the classifier. For Naïve Bayes, compute P(y|x) for each class, and select the class with the highest probability.

39 Multi-class classification
Generally two options:
1. Learn a single classifier that can produce 20 distinct output values.
2. Learn 20 different 1-vs-all binary classifiers.
Option 1 assumes you have a multi-class version of the classifier. For Naïve Bayes, compute P(y|x) for each class, and select the class with the highest probability.
Option 2 applies to all binary classifiers, so it is more flexible. But it is often slower (need to learn many classifiers), and it creates a class imbalance problem (the target class has relatively fewer data points, compared to the aggregation of the other classes).
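
A compact sketch of option 2, reusing the Bernoulli Naïve Bayes estimates with Laplace smoothing from the earlier slides as the binary classifier (illustrative code and toy data, not from the course):

```python
import numpy as np

def fit_binary_nb(X, y_bin):
    """Bernoulli Naive Bayes with Laplace smoothing for one 1-vs-all problem."""
    p1 = (y_bin.sum() + 1) / (len(y_bin) + 2)
    t1 = (X[y_bin == 1].sum(axis=0) + 1) / ((y_bin == 1).sum() + 2)
    t0 = (X[y_bin == 0].sum(axis=0) + 1) / ((y_bin == 0).sum() + 2)
    return p1, t1, t0

def log_odds(x, model):
    p1, t1, t0 = model
    return (np.log(p1 / (1 - p1))
            + np.sum(x * np.log(t1 / t0) + (1 - x) * np.log((1 - t1) / (1 - t0))))

def one_vs_all(X, y, n_classes):
    """Fit one binary classifier per class; predict with the highest log-odds."""
    models = [fit_binary_nb(X, (y == k).astype(int)) for k in range(n_classes)]
    return lambda x: int(np.argmax([log_odds(x, m) for m in models]))

X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([0, 0, 1, 1, 2, 2])
predict = one_vs_all(X, y, n_classes=3)
print(predict(np.array([0, 1, 1])))   # -> 1
```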

40 Best features extracted
[Figure from the MSc thesis of Jason Rennie.]

41 Gaussian Naïve Bayes
Extending Naïve Bayes to continuous inputs:
P(y) is still assumed to be a binomial distribution.
P(x|y) is assumed to be a multivariate Gaussian (normal) distribution with mean μ ∈ R^n and covariance matrix Σ ∈ R^{n×n}.

42 Gaussian Naïve Bayes
Extending Naïve Bayes to continuous inputs:
P(y) is still assumed to be a binomial distribution.
P(x|y) is assumed to be a multivariate Gaussian (normal) distribution with mean μ ∈ R^n and covariance matrix Σ ∈ R^{n×n}.
If we assume the same Σ for all classes: Linear discriminant analysis.
If Σ is distinct between classes: Quadratic discriminant analysis.
If Σ is diagonal (i.e. features are independent): Gaussian Naïve Bayes.

43 Gaussian Naïve Bayes
Extending Naïve Bayes to continuous inputs:
P(y) is still assumed to be a binomial distribution.
P(x|y) is assumed to be a multivariate Gaussian (normal) distribution with mean μ ∈ R^n and covariance matrix Σ ∈ R^{n×n}.
If we assume the same Σ for all classes: Linear discriminant analysis.
If Σ is distinct between classes: Quadratic discriminant analysis.
If Σ is diagonal (i.e. features are independent): Gaussian Naïve Bayes.
How do we estimate the parameters? Derive the maximum likelihood estimators for μ and Σ.
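
A minimal Gaussian Naïve Bayes sketch under the diagonal-covariance assumption: per-class priors, per-feature means and variances, then an argmax over log-posteriors (illustrative code with synthetic data):

```python
import numpy as np

def fit_gnb(X, y):
    """Per-class prior, per-feature mean and variance (diagonal covariance)."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances

def predict_gnb(x, classes, priors, means, variances):
    # log P(y) + sum_j log N(x_j; mu_{y,j}, sigma^2_{y,j})
    log_post = (np.log(priors)
                - 0.5 * np.sum(np.log(2 * np.pi * variances)
                               + (x - means) ** 2 / variances, axis=1))
    return classes[np.argmax(log_post)]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
print(predict_gnb(np.array([2.5, 3.2]), *fit_gnb(X, y)))  # likely 1
```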

44 More generally: Bayesian networks
If not all conditional independence relations are true, we can introduce new arcs to show what dependencies are there.
At each node, we have the conditional probability distribution of the corresponding variable, given its parents.
This more general type of graph, annotated with conditional distributions, is called a Bayesian network. More on this in COMP-65.

45 What you should know
Naïve Bayes assumption
Log-odds ratio decision boundary
How to estimate parameters for Naïve Bayes
Laplace smoothing
Relation between Naïve Bayes, LDA, QDA, and Gaussian Naïve Bayes
