COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification
1 COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification
Instructor: Class web page: Unless otherwise noted, all material posted for this course is copyright of the instructor, and cannot be reused or reposted without the instructor's written permission.
2 Today's quiz
Q1. What is a linear classifier? (In contrast to a non-linear classifier.)
Q2. Describe the difference between discriminative and generative classifiers.
Q3. Consider the following data set. If you use logistic regression to compute a decision boundary, what is the prediction for x6?
[Data table: instances x1 through x5 with three features and an output label, plus instance x6 whose output "?" is to be predicted; the individual values are not legible in this transcription.]
3 Quick recap
Two approaches for linear classification:
Discriminative learning: Directly estimate P(y|x). Logistic regression: P(y|x) = σ(w^T x) = 1 / (1 + e^{-w^T x}).
Generative learning: Separately model P(x|y) and P(y). Use these, through Bayes rule, to estimate P(y|x).
4 Linear discriminant analysis (LDA)
Return to Bayes rule: P(y|x) = P(x|y) P(y) / P(x)
LDA makes explicit assumptions about P(x|y):
P(x|y) = 1 / ( (2π)^{m/2} |Σ|^{1/2} ) exp( -(1/2) (x - μ)^T Σ^{-1} (x - μ) )
Multivariate Gaussian, with mean μ and covariance matrix Σ.
Notation: here x is a single instance, represented as an m×1 vector.
Key assumption of LDA: Both classes have the same covariance matrix, Σ.
5 Linear discriminant analysis (LDA)
Return to Bayes rule: P(y|x) = P(x|y) P(y) / P(x)
LDA makes explicit assumptions about P(x|y):
P(x|y) = 1 / ( (2π)^{m/2} |Σ|^{1/2} ) exp( -(1/2) (x - μ)^T Σ^{-1} (x - μ) )
Multivariate Gaussian, with mean μ and covariance matrix Σ.
Notation: here x is a single instance, represented as an m×1 vector.
Key assumption of LDA: Both classes have the same covariance matrix, Σ.
Consider the log-odds ratio (again, P(x) doesn't matter for the decision):
ln [ P(x|y=1) P(y=1) / ( P(x|y=0) P(y=0) ) ] = ln [ P(y=1) / P(y=0) ] - (1/2) μ_1^T Σ^{-1} μ_1 + (1/2) μ_0^T Σ^{-1} μ_0 + x^T Σ^{-1} (μ_1 - μ_0)
This is a linear decision boundary! w_0 + x^T w
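Why the quadratic terms disappear: a short derivation sketch filling in the step between the Gaussian densities and the linear form above (standard algebra, not shown on the slide):

```latex
\ln\frac{P(x\mid y=1)\,P(y=1)}{P(x\mid y=0)\,P(y=0)}
 = \ln\frac{P(y=1)}{P(y=0)}
   - \tfrac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)
   + \tfrac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)
 = \underbrace{\ln\frac{P(y=1)}{P(y=0)} - \tfrac{1}{2}\mu_1^T\Sigma^{-1}\mu_1 + \tfrac{1}{2}\mu_0^T\Sigma^{-1}\mu_0}_{w_0}
   + x^T\underbrace{\Sigma^{-1}(\mu_1-\mu_0)}_{w}
```

The normalization constants cancel because both classes share the same Σ, and when the two quadratic forms are expanded the x^T Σ^{-1} x terms cancel as well; this is exactly where the shared-covariance assumption is used. With class-specific covariances (QDA, later in the lecture) the quadratic term survives.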
6 Learning in LDA: 2-class case
Estimate P(y), μ_0, μ_1, Σ from the training data, then apply the log-odds ratio.
7 Learning in LDA: 2-class case
Estimate P(y), μ_0, μ_1, Σ from the training data, then apply the log-odds ratio.
P(y=0) = N_0 / (N_0 + N_1)
P(y=1) = N_1 / (N_0 + N_1)
where N_1, N_0 are the number of training samples from classes 1 and 0, respectively.
8 Learning in LDA: 2-class case
Estimate P(y), μ_0, μ_1, Σ from the training data, then apply the log-odds ratio.
P(y=0) = N_0 / (N_0 + N_1)
P(y=1) = N_1 / (N_0 + N_1)
where N_1, N_0 are the number of training samples from classes 1 and 0, respectively.
μ_0 = Σ_{i=1:n} I(y_i=0) x_i / N_0
μ_1 = Σ_{i=1:n} I(y_i=1) x_i / N_1
where I(x) is the indicator function: I(x)=0 if x=0, I(x)=1 if x=1.
9 Learning in LDA: 2-class case
Estimate P(y), μ_0, μ_1, Σ from the training data, then apply the log-odds ratio.
P(y=0) = N_0 / (N_0 + N_1)
P(y=1) = N_1 / (N_0 + N_1)
where N_1, N_0 are the number of training samples from classes 1 and 0, respectively.
μ_0 = Σ_{i=1:n} I(y_i=0) x_i / N_0
μ_1 = Σ_{i=1:n} I(y_i=1) x_i / N_1
where I(x) is the indicator function: I(x)=0 if x=0, I(x)=1 if x=1.
Σ = Σ_{k=0:1} Σ_{i=1:n} I(y_i=k) (x_i - μ_k)(x_i - μ_k)^T / (N_0 + N_1 - 2)
10 Learning in LDA: 2-class case
Estimate P(y), μ_0, μ_1, Σ from the training data, then apply the log-odds ratio.
P(y=0) = N_0 / (N_0 + N_1)
P(y=1) = N_1 / (N_0 + N_1)
where N_1, N_0 are the number of training samples from classes 1 and 0, respectively.
μ_0 = Σ_{i=1:n} I(y_i=0) x_i / N_0
μ_1 = Σ_{i=1:n} I(y_i=1) x_i / N_1
where I(x) is the indicator function: I(x)=0 if x=0, I(x)=1 if x=1.
Σ = Σ_{k=0:1} Σ_{i=1:n} I(y_i=k) (x_i - μ_k)(x_i - μ_k)^T / (N_0 + N_1 - 2)
Decision boundary: Given an input x, classify it as class 1 if the log-odds ratio is > 0, and classify it as class 0 otherwise.
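A minimal NumPy sketch of these estimators and the resulting classifier (illustrative only; the array names, shapes, and toy data are my own assumptions, not from the slides):

```python
import numpy as np

def fit_lda(X, y):
    """Estimate the 2-class LDA parameters described above.
    X: (n, m) array of features, y: (n,) array of 0/1 labels."""
    n0, n1 = np.sum(y == 0), np.sum(y == 1)
    p1 = n1 / (n0 + n1)                       # P(y=1)
    mu0 = X[y == 0].mean(axis=0)              # class-conditional means
    mu1 = X[y == 1].mean(axis=0)
    # Shared covariance: pool the within-class scatter of both classes.
    S0 = (X[y == 0] - mu0).T @ (X[y == 0] - mu0)
    S1 = (X[y == 1] - mu1).T @ (X[y == 1] - mu1)
    Sigma = (S0 + S1) / (n0 + n1 - 2)
    return p1, mu0, mu1, Sigma

def predict_lda(x, p1, mu0, mu1, Sigma):
    """Classify x as 1 if the log-odds ratio w_0 + x^T w is positive, else 0."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu0)
    w0 = (np.log(p1 / (1 - p1))
          - 0.5 * mu1 @ Sinv @ mu1
          + 0.5 * mu0 @ Sinv @ mu0)
    return int(w0 + x @ w > 0)

# Toy usage with two well-separated classes.
X = np.array([[0.0, 0.2], [0.3, 0.1], [2.0, 2.2], [2.1, 1.9]])
y = np.array([0, 0, 1, 1])
params = fit_lda(X, y)
print(predict_lda(np.array([1.8, 2.0]), *params))   # prints 1
```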
11 More complex decision boundaries
Want to accommodate more than 2 classes? Consider the per-class linear discriminant function:
δ_y(x) = x^T Σ^{-1} μ_y - (1/2) μ_y^T Σ^{-1} μ_y + log P(y)
Use the following decision rule:
Output = argmax_y δ_y(x) = argmax_y [ x^T Σ^{-1} μ_y - (1/2) μ_y^T Σ^{-1} μ_y + log P(y) ]
Want more flexible (non-linear) decision boundaries? Recall the trick from linear regression of re-coding variables.
12 Using higher-order features
[FIGURE 4.1. The left plot shows some data from three classes, with linear decision boundaries found by linear discriminant analysis. The right plot shows quadratic decision boundaries. These were obtained by finding linear boundaries in the five-dimensional space X_1, X_2, X_1 X_2, X_1^2, X_2^2. Linear inequalities in this space are quadratic inequalities in the original space.]
13 Quadratic discriminant analysis
LDA assumes all classes have the same covariance matrix, Σ. QDA allows a different covariance matrix, Σ_y, for each class y.
Linear discriminant function:
δ_y(x) = x^T Σ^{-1} μ_y - (1/2) μ_y^T Σ^{-1} μ_y + log P(y)
Quadratic discriminant function:
δ_y(x) = -(1/2) log |Σ_y| - (1/2) (x - μ_y)^T Σ_y^{-1} (x - μ_y) + log P(y)
QDA has more parameters to estimate, but greater flexibility to estimate the target function. Bias-variance trade-off.
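A small sketch of evaluating the quadratic discriminant for a single point and picking the best class (illustrative; the parameter values below are made up, not estimated from any data):

```python
import numpy as np

def qda_discriminant(x, mu, Sigma, prior):
    """delta_y(x) = -1/2 log|Sigma_y| - 1/2 (x-mu_y)^T Sigma_y^{-1} (x-mu_y) + log P(y)."""
    diff = x - mu
    return (-0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ np.linalg.inv(Sigma) @ diff
            + np.log(prior))

# Made-up parameters for two classes with different covariances.
x = np.array([0.5, 1.0])
d0 = qda_discriminant(x, np.array([0.0, 0.0]), np.eye(2), prior=0.6)
d1 = qda_discriminant(x, np.array([1.0, 1.0]), 0.5 * np.eye(2), prior=0.4)
print(0 if d0 > d1 else 1)   # predict the class with the larger discriminant (prints 1 here)
```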
14 LDA vs QDA
[FIGURE 4.6. Two methods for fitting quadratic boundaries. The left plot shows the quadratic decision boundaries for the data in Figure 4.1 (obtained using LDA in the five-dimensional space X_1, X_2, X_1 X_2, X_1^2, X_2^2). The right plot shows the quadratic decision boundaries found by QDA. The differences are small, as is usually the case.]
15 Generative learning: Scaling up
Consider applying generative learning (e.g. LDA) to the Project 1 data.
16 Generative learning: Scaling up
Consider applying generative learning (e.g. LDA) to the Project 1 data.
E.g. task: Given the first few words, determine the language (without a dictionary).
What are the features? How to encode the data? How to handle a high-dimensional feature space? Assume the same covariance matrix for both classes? (Maybe that is ok.)
What kinds of assumptions can we make to simplify the problem?
17 Naïve Bayes assumption
Generative learning: Estimate P(x|y), P(y). Then calculate P(y|x).
Naïve Bayes: Assume the x_j are conditionally independent given y. In other words: P(x_j | y) = P(x_j | y, x_k), for all j, k.
18 Naïve Bayes assumption
Generative learning: Estimate P(x|y), P(y). Then calculate P(y|x).
Naïve Bayes: Assume the x_j are conditionally independent given y. In other words: P(x_j | y) = P(x_j | y, x_k), for all j, k.
Generative model structure:
P(x|y) = P(x_1, x_2, ..., x_m | y)
= P(x_1|y) P(x_2|y, x_1) P(x_3|y, x_1, x_2) ... P(x_m|y, x_1, x_2, ..., x_{m-1})   (from the general rules of probability)
= P(x_1|y) P(x_2|y) P(x_3|y) ... P(x_m|y)   (from the Naïve Bayes assumption above)
19 Naïve Bayes graphical model
[Graphical model: y with children x_1, x_2, x_3, ..., x_m]
How many parameters to estimate? Assume m binary features.
20 Naïve Bayes graphical model
[Graphical model: y with children x_1, x_2, x_3, ..., x_m]
How many parameters to estimate? Assume m binary features.
Without the Naïve Bayes assumption: O(2^m) numbers to describe the model.
With the Naïve Bayes assumption: O(m) numbers to describe the model.
Useful when the number of features is high.
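To make the counts concrete, a quick back-of-the-envelope calculation for m binary features and a binary y (this worked example is mine, not from the slides):

```latex
\underbrace{2\,(2^m - 1)}_{\text{full table } P(x_1,\dots,x_m \mid y)\ \text{(one table of } 2^m \text{ entries per class)}}
\quad\text{vs.}\quad
\underbrace{2m + 1}_{\text{Na\"ive Bayes: } \theta_{j,1},\ \theta_{j,0},\ \text{and } P(y=1)}
\qquad\text{e.g. } m = 30:\ \approx 2.1\times 10^{9} \ \text{vs.}\ 61.
```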
21 Training a Naïve Bayes classifier
Assume the x_j and y are binary variables.
Estimate the parameters P(x|y) and P(y) from data.
Define:
Θ_1 = Pr(y=1)
Θ_{j,1} = Pr(x_j=1 | y=1)
Θ_{j,0} = Pr(x_j=1 | y=0)
22 Training a Naïve Bayes classifier
Assume the x_j and y are binary variables.
Estimate the parameters P(x|y) and P(y) from data.
Define:
Θ_1 = Pr(y=1)
Θ_{j,1} = Pr(x_j=1 | y=1)
Θ_{j,0} = Pr(x_j=1 | y=0)
Evaluation criterion: Find the parameters that maximize the log-likelihood function.
Likelihood: Pr(y|x) ∝ Pr(y) Pr(x|y) = Π_{i=1:n} ( P(y_i) Π_{j=1:m} P(x_{i,j} | y_i) )
Samples i are independent, so we take the product over n. Input features are independent (conditioned on y), so we take the product over m.
23 Training a Naïve Bayes classifier
Likelihood for a binary output variable: L(Θ_1 | y) = Θ_1^y (1 - Θ_1)^{1-y}
Log-likelihood for all parameters:
log L(Θ_1, Θ_{j,1}, Θ_{j,0} | D) = Σ_{i=1:n} [ log P(y_i) + Σ_{j=1:m} log P(x_{i,j} | y_i) ]
24 Training a Naïve Bayes classifier
Likelihood for a binary output variable: L(Θ_1 | y) = Θ_1^y (1 - Θ_1)^{1-y}
Log-likelihood for all parameters:
log L(Θ_1, Θ_{j,1}, Θ_{j,0} | D) = Σ_{i=1:n} [ log P(y_i) + Σ_{j=1:m} log P(x_{i,j} | y_i) ]
= Σ_{i=1:n} [ y_i log Θ_1 + (1 - y_i) log(1 - Θ_1)
+ Σ_{j=1:m} y_i ( x_{i,j} log Θ_{j,1} + (1 - x_{i,j}) log(1 - Θ_{j,1}) )
+ Σ_{j=1:m} (1 - y_i) ( x_{i,j} log Θ_{j,0} + (1 - x_{i,j}) log(1 - Θ_{j,0}) ) ]
25 Training a Naïve Bayes classifier
Likelihood for a binary output variable: L(Θ_1 | y) = Θ_1^y (1 - Θ_1)^{1-y}
Log-likelihood for all parameters:
log L(Θ_1, Θ_{j,1}, Θ_{j,0} | D) = Σ_{i=1:n} [ log P(y_i) + Σ_{j=1:m} log P(x_{i,j} | y_i) ]
= Σ_{i=1:n} [ y_i log Θ_1 + (1 - y_i) log(1 - Θ_1)
+ Σ_{j=1:m} y_i ( x_{i,j} log Θ_{j,1} + (1 - x_{i,j}) log(1 - Θ_{j,1}) )
+ Σ_{j=1:m} (1 - y_i) ( x_{i,j} log Θ_{j,0} + (1 - x_{i,j}) log(1 - Θ_{j,0}) ) ]
(This will have a different form if the parameters P(x|y) have a different form, e.g. Gaussian.)
26 Training a Naïve Bayes classifier
Likelihood for a binary output variable: L(Θ_1 | y) = Θ_1^y (1 - Θ_1)^{1-y}
Log-likelihood for all parameters:
log L(Θ_1, Θ_{j,1}, Θ_{j,0} | D) = Σ_{i=1:n} [ log P(y_i) + Σ_{j=1:m} log P(x_{i,j} | y_i) ]
= Σ_{i=1:n} [ y_i log Θ_1 + (1 - y_i) log(1 - Θ_1)
+ Σ_{j=1:m} y_i ( x_{i,j} log Θ_{j,1} + (1 - x_{i,j}) log(1 - Θ_{j,1}) )
+ Σ_{j=1:m} (1 - y_i) ( x_{i,j} log Θ_{j,0} + (1 - x_{i,j}) log(1 - Θ_{j,0}) ) ]
(This will have a different form if the parameters P(x|y) have a different form, e.g. Gaussian.)
To estimate Θ_1, take the derivative of log L with respect to Θ_1 and set it to 0:
∂L/∂Θ_1 = Σ_{i=1:n} ( y_i/Θ_1 - (1 - y_i)/(1 - Θ_1) ) = 0
27 Training a Naïve Bayes classifier
Solving for Θ_1 we get:
Θ_1 = (1/n) Σ_{i=1:n} y_i = (number of examples where y=1) / (number of examples)
28 Training a Naïve Bayes classifier
Solving for Θ_1 we get:
Θ_1 = (1/n) Σ_{i=1:n} y_i = (number of examples where y=1) / (number of examples)
Similarly, we get:
Θ_{j,1} = (number of examples where x_j=1 and y=1) / (number of examples where y=1)
Θ_{j,0} = (number of examples where x_j=1 and y=0) / (number of examples where y=0)
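A minimal NumPy sketch of these counting estimators for binary features (illustrative; the function name and toy data are my own, not the course's code):

```python
import numpy as np

def fit_naive_bayes(X, y):
    """Maximum-likelihood estimates for binary features and binary labels.
    X: (n, m) array of 0/1 features, y: (n,) array of 0/1 labels."""
    theta1 = y.mean()                          # Theta_1   = Pr(y=1)
    theta_j1 = X[y == 1].mean(axis=0)          # Theta_j,1 = Pr(x_j=1 | y=1)
    theta_j0 = X[y == 0].mean(axis=0)          # Theta_j,0 = Pr(x_j=1 | y=0)
    return theta1, theta_j1, theta_j0

# Toy data: 4 examples, 3 binary features.
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 0]])
y = np.array([1, 1, 0, 0])
print(fit_naive_bayes(X, y))
# theta1 = 0.5, theta_j1 = [1.0, 0.5, 1.0], theta_j0 = [0.0, 0.5, 0.5]
```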
29 Naïve Bayes decision boundary
Consider again the log-odds ratio:
log [ Pr(y=1|x) / Pr(y=0|x) ] = log [ Pr(x|y=1) P(y=1) / ( Pr(x|y=0) P(y=0) ) ]
= log [ P(y=1) / P(y=0) ] + log Π_{j=1:m} [ P(x_j|y=1) / P(x_j|y=0) ]
= log [ P(y=1) / P(y=0) ] + Σ_{j=1:m} log [ P(x_j|y=1) / P(x_j|y=0) ]
30 Naïve Bayes decision boundary
Consider the case where the features are binary: x_j ∈ {0, 1}. Define:
w_{j,0} = log [ P(x_j=0|y=1) / P(x_j=0|y=0) ] ;  w_{j,1} = log [ P(x_j=1|y=1) / P(x_j=1|y=0) ]
Now we have:
log [ Pr(y=1|x) / Pr(y=0|x) ] = log [ P(y=1) / P(y=0) ] + Σ_{j=1:m} log [ P(x_j|y=1) / P(x_j|y=0) ]
= log [ P(y=1) / P(y=0) ] + Σ_{j=1:m} ( w_{j,0} (1 - x_j) + w_{j,1} x_j )
= log [ P(y=1) / P(y=0) ] + Σ_{j=1:m} w_{j,0} + Σ_{j=1:m} ( w_{j,1} - w_{j,0} ) x_j
This is a linear decision boundary! w_0 + x^T w
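The same reduction in code: turn the per-feature log ratios into a weight vector and classify with a linear rule (a sketch; the parameter values are illustrative, not estimated from real data):

```python
import numpy as np

def nb_linear_weights(p1, p_x1_given_y1, p_x1_given_y0):
    """Build (w0, w) so that sign(w0 + x.w) matches the Naive Bayes
    log-odds rule for binary features x_j in {0, 1}."""
    w_j1 = np.log(p_x1_given_y1) - np.log(p_x1_given_y0)          # x_j = 1 terms
    w_j0 = np.log(1 - p_x1_given_y1) - np.log(1 - p_x1_given_y0)  # x_j = 0 terms
    w0 = np.log(p1 / (1 - p1)) + w_j0.sum()
    w = w_j1 - w_j0
    return w0, w

# Illustrative parameters for 3 binary features.
p1 = 0.5
p_x1_given_y1 = np.array([0.9, 0.5, 0.8])
p_x1_given_y0 = np.array([0.1, 0.5, 0.3])
w0, w = nb_linear_weights(p1, p_x1_given_y1, p_x1_given_y0)
x = np.array([1, 0, 1])
print(int(w0 + x @ w > 0))   # 1 if the log-odds ratio is positive
```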
31 Text classification example
Using Naïve Bayes, we can compute probabilities for all the words which appear in the document collection.
P(y=c) is the probability of class c.
P(x_j | y=c) is the probability of seeing word j in documents of class c.
[Graphical model: class c with children word_1, word_2, word_3, ..., word_m]
32 Text classification example
Using Naïve Bayes, we can compute probabilities for all the words which appear in the document collection.
P(y=c) is the probability of class c.
P(x_j | y=c) is the probability of seeing word j in documents of class c.
The set of classes depends on the application, e.g. topic modeling: each class corresponds to documents on a given topic, e.g. {Politics, Finance, Sports, Arts}.
What happens when a word is not observed in the training data?
[Graphical model: class c with children word_1, word_2, word_3, ..., word_m]
33 Laplace smoothing
Replace the maximum likelihood estimator:
Pr(x_j=1 | y=1) = (number of instances with x_j=1 and y=1) / (number of examples with y=1)
34 Laplace smoothing
Replace the maximum likelihood estimator:
Pr(x_j=1 | y=1) = (number of instances with x_j=1 and y=1) / (number of examples with y=1)
With the following:
Pr(x_j=1 | y=1) = ( (number of instances with x_j=1 and y=1) + 1 ) / ( (number of examples with y=1) + 2 )
35 Laplace smoothing
Replace the maximum likelihood estimator:
Pr(x_j=1 | y=1) = (number of instances with x_j=1 and y=1) / (number of examples with y=1)
With the following:
Pr(x_j=1 | y=1) = ( (number of instances with x_j=1 and y=1) + 1 ) / ( (number of examples with y=1) + 2 )
If there is no example from that class, the estimate reduces to a prior probability of Pr = 1/2.
If all examples have x_j=1, then Pr(x_j=0 | y) gets Pr = 1 / (#examples + 2).
If a word appears frequently, the new estimate is only slightly biased.
This is an example of a Bayesian prior for Naïve Bayes.
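The smoothed estimator is a one-line change to the counting code (a sketch; the +1/+2 constants are the binary-feature case described above):

```python
import numpy as np

def smoothed_theta(X, y):
    """Laplace-smoothed Pr(x_j = 1 | y = 1) for binary features."""
    X1 = X[y == 1]
    n1 = X1.shape[0]
    return (X1.sum(axis=0) + 1) / (n1 + 2)   # never exactly 0 or 1

X = np.array([[1, 0, 1], [1, 1, 1]])
y = np.array([1, 1])
print(smoothed_theta(X, y))   # [0.75, 0.5, 0.75] instead of the MLE [1.0, 0.5, 1.0]
```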
36 Example: 20 newsgroups
Given 1000 training documents from each group, learn to classify new documents according to which newsgroup they came from:
comp.graphics comp.os.ms-windows.misc comp.sys.ibm.pc.hardware comp.sys.mac.hardware comp.windows.x alt.atheism soc.religion.christian talk.religion.misc talk.politics.mideast talk.politics.misc misc.forsale rec.autos rec.motorcycles rec.sport.baseball rec.sport.hockey sci.space sci.crypt sci.electronics sci.med talk.politics.guns
Naïve Bayes: 89% classification accuracy (comparable to other state-of-the-art methods).
37 Multi-class classification
Generally two options:
1. Learn a single classifier that can produce 20 distinct output values.
2. Learn 20 different 1-vs-all binary classifiers.
38 Multi-class classification
Generally two options:
1. Learn a single classifier that can produce 20 distinct output values.
2. Learn 20 different 1-vs-all binary classifiers.
Option 1 assumes you have a multi-class version of the classifier. For Naïve Bayes, compute P(y|x) for each class, and select the class with the highest probability.
39 Multi-class classification
Generally two options:
1. Learn a single classifier that can produce 20 distinct output values.
2. Learn 20 different 1-vs-all binary classifiers.
Option 1 assumes you have a multi-class version of the classifier. For Naïve Bayes, compute P(y|x) for each class, and select the class with the highest probability.
Option 2 applies to all binary classifiers, so it is more flexible. But it is often slower (need to learn many classifiers), and it creates a class imbalance problem (the target class has relatively fewer data points, compared to the aggregation of the other classes).
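A sketch of option 2 as a generic wrapper. Everything here is hypothetical: it assumes labels in a NumPy array, a binary training routine `fit_binary`, and trained models exposing a `score(x)` confidence for the positive class; none of these names come from the course.

```python
def fit_one_vs_all(X, y, classes, fit_binary):
    """Train one binary classifier per class: class c vs. everything else.
    y is assumed to be a NumPy array of class labels."""
    return {c: fit_binary(X, (y == c).astype(int)) for c in classes}

def predict_one_vs_all(x, models):
    """Pick the class whose 'c vs. rest' classifier is most confident."""
    return max(models, key=lambda c: models[c].score(x))
```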
40 Best features extracted
[Table of the most informative features, from the MSc thesis by Jason Rennie; not reproduced in this transcription.]
41 Gaussian Naïve Bayes
Extending Naïve Bayes to continuous inputs:
P(y) is still assumed to be a binomial distribution.
P(x|y) is assumed to be a multivariate Gaussian (normal) distribution with mean μ ∈ R^n and covariance matrix Σ ∈ R^{n×n}.
42 Gaussian Naïve Bayes
Extending Naïve Bayes to continuous inputs:
P(y) is still assumed to be a binomial distribution.
P(x|y) is assumed to be a multivariate Gaussian (normal) distribution with mean μ ∈ R^n and covariance matrix Σ ∈ R^{n×n}.
If we assume the same Σ for all classes: linear discriminant analysis.
If Σ is distinct between classes: quadratic discriminant analysis.
If Σ is diagonal (i.e. features are independent): Gaussian Naïve Bayes.
43 Gaussian Naïve Bayes
Extending Naïve Bayes to continuous inputs:
P(y) is still assumed to be a binomial distribution.
P(x|y) is assumed to be a multivariate Gaussian (normal) distribution with mean μ ∈ R^n and covariance matrix Σ ∈ R^{n×n}.
If we assume the same Σ for all classes: linear discriminant analysis.
If Σ is distinct between classes: quadratic discriminant analysis.
If Σ is diagonal (i.e. features are independent): Gaussian Naïve Bayes.
How do we estimate the parameters? Derive the maximum likelihood estimators for μ and Σ.
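A minimal sketch of the diagonal-covariance (Gaussian Naïve Bayes) case, with per-class means and per-feature variances estimated by maximum likelihood (illustrative code and toy data of my own, not from the course):

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Per-class prior, mean, and per-feature variance (diagonal covariance)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),   # P(y=c)
                     Xc.mean(axis=0),    # mu_c
                     Xc.var(axis=0))     # diagonal of Sigma_c
    return params

def predict_gaussian_nb(x, params):
    """Pick the class maximizing log P(y=c) + sum_j log N(x_j; mu_cj, var_cj)."""
    def log_post(c):
        prior, mu, var = params[c]
        return np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var)
                                            + (x - mu) ** 2 / var)
    return max(params, key=log_post)

X = np.array([[0.1, 0.2], [0.2, 0.1], [1.9, 2.0], [2.1, 1.8]])
y = np.array([0, 0, 1, 1])
print(predict_gaussian_nb(np.array([2.0, 1.9]), fit_gaussian_nb(X, y)))  # prints 1
```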
44 More generally: Bayesian networks
If not all conditional independence relations are true, we can introduce new arcs to show what dependencies are there.
At each node, we have the conditional probability distribution of the corresponding variable, given its parents.
This more general type of graph, annotated with conditional distributions, is called a Bayesian network.
More on this in COMP-652.
45 What you should know
The Naïve Bayes assumption
The log-odds ratio decision boundary
How to estimate parameters for Naïve Bayes
Laplace smoothing
The relation between Naïve Bayes, LDA, QDA, and Gaussian Naïve Bayes