Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)
1 Outline. Hong Chang, Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)
2 Outline I

1. Linear Models for Regression
   - Linear Regression
   - Probabilistic Interpretation
   - Generalized Linear Regression
2. Logistic Regression
3.
3 A Regression Example

Consider the house price problem as follows (most numeric entries were lost in transcription):

  living area (m^2) | location     | price (RMB 10,000)
  65                | Zhongguancun | ...
  ...               | Shangdi      | ...
  ...               | Shangdi      | ...
  ...               | Changping    | 168

Input samples: X = {x^(1), ..., x^(N)}. Features include x_1^(i): living area; x_2^(i): location.
Output targets: Y = {y^(1), ..., y^(N)}.
Linear regression model:

  f_w(x) = w_0 + w_1 x_1 + ... + w_D x_D = sum_{j=0}^{D} w_j x_j = w^T x,

where w = (w_0, w_1, ..., w_D)^T and x = (1, x_1, ..., x_D)^T.
4 Least Mean Square Algorithm

Cost function:
  J(w) = (1/2) sum_{i=1}^{N} (f_w(x^(i)) - y^(i))^2
Gradient descent:
  w_j := w_j - alpha * dJ(w)/dw_j
For a single training example:
  w_j := w_j + alpha (y^(i) - f_w(x^(i))) x_j^(i)
5 Least Mean Square Algorithm (2)

For a training set of more than one example:

Batch gradient descent:
  Repeat until convergence {
    w_j := w_j + alpha sum_{i=1}^{N} (y^(i) - f_w(x^(i))) x_j^(i)
  }

Stochastic gradient descent:
  Loop {
    For i = 1 to N {
      w_j := w_j + alpha (y^(i) - f_w(x^(i))) x_j^(i)
    }
  }
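The two update schemes above can be sketched in NumPy. This is a minimal illustration: the function names, step size, and iteration counts are choices made here, not part of the lecture.

```python
import numpy as np

def batch_gd(X, y, alpha=0.05, iters=5000):
    """Batch gradient descent for least-squares linear regression.
    X: (N, D+1) design matrix with a leading column of ones; y: (N,)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        # Full-batch update: w_j += alpha * sum_i (y_i - f_w(x_i)) x_ij
        w += alpha * X.T @ (y - X @ w)
    return w

def sgd(X, y, alpha=0.05, epochs=500):
    """Stochastic gradient descent: one update per training example."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in range(len(y)):
            w += alpha * (y[i] - X[i] @ w) * X[i]
    return w
```

On a noiseless linear dataset both routines recover the true weights; note that alpha must be small enough relative to the scale of X for either loop to converge.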
6 Probabilistic Assumptions

Relate the inputs and target via:
  y^(i) = w^T x^(i) + eps^(i)
with eps^(i) IID Gaussian distributed, i.e.,
  p(eps^(i)) = (1 / (sqrt(2 pi) sigma)) exp(-(eps^(i))^2 / (2 sigma^2))
Thus,
  p(y^(i) | x^(i); w) = (1 / (sqrt(2 pi) sigma)) exp(-(y^(i) - w^T x^(i))^2 / (2 sigma^2))
Likelihood function:
  L(w) = L(w; X, y) = p(y | X; w) = prod_{i=1}^{N} p(y^(i) | x^(i); w)
7 Maximum Likelihood

Log likelihood:
  l(w) = log L(w)
       = sum_{i=1}^{N} log [ (1 / (sqrt(2 pi) sigma)) exp(-(y^(i) - w^T x^(i))^2 / (2 sigma^2)) ]
       = N log(1 / (sqrt(2 pi) sigma)) - (1 / (2 sigma^2)) sum_{i=1}^{N} (y^(i) - w^T x^(i))^2

Maximizing l(w) gives the same answer as minimizing
  (1/2) sum_{i=1}^{N} (y^(i) - w^T x^(i))^2

Under the previous probabilistic assumptions on the data, least-squares regression corresponds to finding the maximum likelihood estimate of w.
8 Locally Weighted Linear Regression (LWR)

The LWR algorithm:
1. Fit w to minimize sum_i theta^(i) (y^(i) - w^T x^(i))^2
2. Output w^T x

A standard choice for the weights is:
  theta^(i) = exp(-||x^(i) - x||^2 / (2 tau^2))
More general forms are possible.
The weights depend on the particular point x at which we are trying to evaluate.
LWR is an example of a non-parametric algorithm.
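Since the LWR objective in step 1 is a weighted least-squares problem, it can be solved in closed form via the weighted normal equations. A minimal sketch, assuming x carries a leading 1 for the intercept (the function name and default tau are illustrative):

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.3):
    """Locally weighted linear regression prediction at x_query.
    Solves sum_i theta_i (y_i - w^T x_i)^2 in closed form, with
    Gaussian weights centered at the query point."""
    theta = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(theta)
    # Weighted normal equations: (X^T W X) w = X^T W y
    w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ w
```

Because w is refit for every query point, the whole training set must be kept around at prediction time, which is what makes the method non-parametric.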
9 Linear Regression with Nonlinear Basis

Consider linear combinations of nonlinear basis functions of the input variables:
  f_w(x) = w_0 + sum_{j=1}^{M-1} w_j phi_j(x) = w^T phi(x),
  phi = (phi_0, phi_1, ..., phi_{M-1}), phi_0(x) = 1

Examples of basis functions:
  phi(x) = (1, x, x^2, ..., x^{M-1})
  phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))

We can still use the least squares method to estimate w.
A linear method can model nonlinear functions through the use of nonlinear basis functions.
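For the polynomial basis, the basis expansion and the least-squares fit might look as follows; this is a sketch, and the helper names and the test polynomial are illustrative:

```python
import numpy as np

def poly_features(x, M):
    """Map scalar inputs to the polynomial basis (1, x, x^2, ..., x^{M-1})."""
    return np.vander(x, M, increasing=True)

def fit_least_squares(Phi, y):
    """Ordinary least squares on the basis-expanded design matrix Phi."""
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w
```

The model stays linear in w even though it is nonlinear in x, which is why the least-squares machinery carries over unchanged.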
10 Two-Class Example

Let P(y = 1 | x) = t, P(y = 0 | x) = 1 - t.
Classification rule:
  choose y = 1 if t > 0.5, y = 0 otherwise.
Equivalent test: choose y = 1 if
  t / (1 - t) > 1, or log(t / (1 - t)) > 0,
where t / (1 - t) is called the odds of t and log(t / (1 - t)) is called the log odds or logit function of t.
11 Logistic Regression

The posterior probability (or the hypothesis) is written as:
  P(y = 1 | x) = f_w(x) = g(w^T x) = 1 / (1 + exp(-w^T x)),
where g(z) = 1 / (1 + exp(-z)) is called the logistic or sigmoid function.
The derivative of the sigmoid function:
  g'(z) = g(z)(1 - g(z))
Note that this is a model for classification rather than regression.
12 Parameter Estimation

Since
  P(y = 1 | x; w) = f_w(x),
  P(y = 0 | x; w) = 1 - f_w(x),
thus
  P(y | x; w) = (f_w(x))^y (1 - f_w(x))^{1-y}.
The likelihood of the parameters (given N training examples):
  L(w) = prod_{i=1}^{N} P(y^(i) | x^(i); w) = prod_{i=1}^{N} (f_w(x^(i)))^{y^(i)} (1 - f_w(x^(i)))^{1 - y^(i)}
The log likelihood function:
  l(w) = log L(w) = sum_{i=1}^{N} [ y^(i) log f_w(x^(i)) + (1 - y^(i)) log(1 - f_w(x^(i))) ]
13 Parameter Estimation (2)

For one training example (x, y), differentiating the log likelihood gives
  dl(w)/dw_j = (y - f_w(x)) x_j,
using the facts that f_w(x) = g(w^T x) and g'(z) = g(z)(1 - g(z)).
This gives the stochastic gradient ascent rule:
  w_j := w_j + alpha (y^(i) - f_w(x^(i))) x_j^(i)
The update is identical in form to the least mean squares rule for linear regression!
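The stochastic gradient ascent rule above can be sketched directly; the hyperparameters here are illustrative choices, not values from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_sga(X, y, alpha=0.1, epochs=300):
    """Stochastic gradient ascent on the logistic log likelihood.
    The per-example update w_j += alpha (y_i - g(w^T x_i)) x_ij
    mirrors the LMS rule for linear regression."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in range(len(y)):
            w += alpha * (y[i] - sigmoid(X[i] @ w)) * X[i]
    return w
```

Only the hypothesis f_w differs from the linear-regression loop: here it is squashed through the sigmoid, so predictions stay in (0, 1).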
14 Logistic Discrimination

Logistic discrimination does not model the class-conditional densities p(x | C_k) but their ratio. The parametric classification approach (studied later) learns the classifier by estimating the parameters of p(x | C_k); logistic discrimination estimates the parameters of the discriminant function directly.
15 Discriminative vs. Generative Classification (Revisited)

Discriminative classification: model the posterior probabilities or learn a discriminant function directly.
- Compute P(C_k | x): e.g., logistic regression.
- Compute a discriminant function: e.g., perceptron, SVM.
- Makes an assumption on the form of the discriminants, but not the densities.

Generative classification: explicitly or implicitly model the distribution of inputs as well as outputs.
- Assume a model for the class-conditional probability densities p(x | C_k).
- Estimate p(x | C_k) and P(C_k) from data, and apply Bayes rule to compute the posterior probabilities P(C_k | x).
- Perform optimal classification based on P(C_k | x).
16 Multivariate Normal Distribution

x ~ N_D(mu, Sigma):
  p(x) = (1 / ((2 pi)^{D/2} |Sigma|^{1/2})) exp(-(1/2) (x - mu)^T Sigma^{-1} (x - mu))
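The density can be evaluated directly from this formula. A small sketch, using a linear solve rather than an explicit matrix inverse for the quadratic form (the function name is illustrative):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N_D(mu, Sigma) evaluated at x."""
    D = len(mu)
    diff = x - mu
    # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)
    maha = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm
```

At x = mu in D = 2 with Sigma = I, the density equals 1 / (2 pi), which is a quick sanity check on the normalizing constant.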
17 Multivariate Normal Distribution (2)

The Mahalanobis distance measures the distance from x to mu in terms of Sigma:
  (x - mu)^T Sigma^{-1} (x - mu)
(x - mu)^T Sigma^{-1} (x - mu) = c^2 is a D-dimensional hyperellipsoid centered at mu; its shape and orientation are defined by Sigma.
Euclidean distance is a special case of Mahalanobis distance when Sigma = s^2 I; the hyperellipsoid degenerates into a hypersphere.
18 Gaussian Discriminant Analysis (GDA)

GDA models p(x | y) using a multivariate normal distribution:
  y ~ Bernoulli(beta)
  x | y = 0 ~ N(mu_0, Sigma)
  x | y = 1 ~ N(mu_1, Sigma)
The parameters of the model are {beta, mu_0, mu_1, Sigma}. Note that a common covariance matrix is usually used.
19 Gaussian Discriminant Analysis (2)

The log likelihood of the data is:
  L(beta, mu_0, mu_1, Sigma) = log prod_{i=1}^{N} p(x^(i), y^(i); beta, mu_0, mu_1, Sigma)
                             = log prod_{i=1}^{N} p(x^(i) | y^(i); mu_0, mu_1, Sigma) p(y^(i); beta)
The maximum likelihood estimates are:
  beta = (1/N) sum_{i=1}^{N} 1{y^(i) = 1}
  mu_k = sum_{i=1}^{N} 1{y^(i) = k} x^(i) / sum_{i=1}^{N} 1{y^(i) = k},  k in {0, 1}
  Sigma = (1/N) sum_{i=1}^{N} (x^(i) - mu_{y^(i)}) (x^(i) - mu_{y^(i)})^T
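These closed-form estimates translate directly into code. A sketch (the function name and data layout are choices made here):

```python
import numpy as np

def gda_fit(X, y):
    """Closed-form maximum likelihood estimates for GDA with a shared
    covariance matrix. X: (N, D) real features; y: (N,) labels in {0, 1}."""
    N = len(y)
    beta = np.mean(y == 1)                              # class prior
    mu = {k: X[y == k].mean(axis=0) for k in (0, 1)}    # per-class means
    # Center each example at its own class mean, then pool the covariance
    diffs = X - np.where(y[:, None] == 1, mu[1], mu[0])
    Sigma = diffs.T @ diffs / N
    return beta, mu[0], mu[1], Sigma
```

Note the single pooled Sigma: both classes contribute to one covariance estimate, which is exactly what makes the resulting decision boundary linear.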
20 Illustration of GDA (figure not reproduced in the transcription)
21 GDA and Logistic Regression

If we view P(y = 1 | x; beta, mu_0, mu_1, Sigma) as a function of x, it can be expressed in the form:
  P(y = 1 | x; beta, mu_0, mu_1, Sigma) = 1 / (1 + exp(-w^T x)),
where w is some appropriate function of beta, mu_0, mu_1, Sigma.
This is exactly the form that logistic regression (a discriminative algorithm) uses to model P(y = 1 | x).
GDA makes stronger modeling assumptions (i.e., that p(x | y) is multivariate Gaussian) about the data, and is more data efficient.
Logistic regression makes weaker assumptions, and is more robust to deviations from the modeling assumptions.
22 Bernoulli and Multinomial

Bernoulli: the distribution for a single binary variable x in {0, 1}, governed by a single continuous parameter beta in [0, 1] which represents the probability of x = 1.
  Bern(x | beta) = beta^x (1 - beta)^{1-x}
Binomial: gives the probability of observing m occurrences of x = 1 in a set of N samples from a Bernoulli distribution.
  Bin(m | N, beta) = C(N, m) beta^m (1 - beta)^{N-m}
Multinomial: the multivariate generalization of the binomial; gives the distribution over counts m_k for a K-state discrete variable to be in state k:
  Mult(m_1, ..., m_K | N, beta) = (N! / (m_1! m_2! ... m_K!)) prod_{k=1}^{K} beta_k^{m_k}
23 Spam Filter

Classifying emails as spam (y = 1) or non-spam (y = 0): one example of a broader set of problems called text classification.
Feature vector: x in {0, 1}^D, indexed by the vocabulary words (a, aardwolf, ..., buy, ..., zygmurgy), where D is the size of the set of words (the vocabulary); x_j = 1 if word j appears in the email.
Word selection: include words that appear in the emails but not in a dictionary; exclude very high frequency words; etc.
The x_j's can take more general values in {1, 2, ..., k_j}; then the model p(x_j | y) is multinomial rather than Bernoulli.
24 Spam Filter (2)

To model p(x | y), we assume that the x_j's are conditionally independent given y. This is the Naive Bayes assumption. In the Naive Bayes classifier:
  p(x_1, ..., x_D | y) = p(x_1 | y) p(x_2 | y) ... p(x_D | y) = prod_{j=1}^{D} p(x_j | y)
Given a training set {x^(i), y^(i)}, i = 1, ..., N, the joint log likelihood of the data is:
  L(p(x_j | y), P(y)) = sum_{i=1}^{N} log p(x^(i), y^(i))
25 Spam Filter (3)

Maximizing L w.r.t. p(x_j | y) and P(y) gives the following estimates:
  p(x_j = 1 | y = 1) = sum_{i=1}^{N} 1{x_j^(i) = 1 and y^(i) = 1} / sum_{i=1}^{N} 1{y^(i) = 1}
  p(x_j = 1 | y = 0) = sum_{i=1}^{N} 1{x_j^(i) = 1 and y^(i) = 0} / sum_{i=1}^{N} 1{y^(i) = 0}
  p(y = 1) = (1/N) sum_{i=1}^{N} 1{y^(i) = 1}
26 Spam Filter (4)

To make a prediction on a new example x, we simply calculate:
  p(y = 1 | x) = p(x | y = 1) p(y = 1) / p(x)
               = (prod_{j=1}^{D} p(x_j | y = 1)) p(y = 1) / [ (prod_{j=1}^{D} p(x_j | y = 1)) p(y = 1) + (prod_{j=1}^{D} p(x_j | y = 0)) p(y = 0) ]
When the original, continuous-valued attributes are not well modeled by a multivariate normal distribution, discretizing the features and using Naive Bayes (instead of GDA) will often result in a better classifier.
27 Laplace Smoothing

If you haven't seen some word before in your finite training set, the classifier does not know how to make a decision.
To estimate the mean of a multinomial random variable x taking values in {1, ..., k}, given N observations {x^(1), ..., x^(N)}, the maximum likelihood estimates are:
  p(x = j) = (1/N) sum_{i=1}^{N} 1{x^(i) = j},  j = 1, ..., k
Laplace smoothing:
  p(x = j) = (sum_{i=1}^{N} 1{x^(i) = j} + 1) / (N + k),  j = 1, ..., k
- sum_{j=1}^{k} p(x = j) = 1 still holds
- p(x = j) != 0 for all j
28 Laplace Smoothing (2)

Returning to the Naive Bayes classifier, with Laplace smoothing we obtain the following estimates:
  p(x_j = 1 | y = 1) = (sum_{i=1}^{N} 1{x_j^(i) = 1 and y^(i) = 1} + 1) / (sum_{i=1}^{N} 1{y^(i) = 1} + 2)
  p(x_j = 1 | y = 0) = (sum_{i=1}^{N} 1{x_j^(i) = 1 and y^(i) = 0} + 1) / (sum_{i=1}^{N} 1{y^(i) = 0} + 2)
In practice, it usually doesn't matter whether we apply Laplace smoothing to p(y = 1) or not, since we typically have a fair fraction each of spam and non-spam emails.
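Putting the smoothed estimates together with the Bayes-rule prediction from slide 26 gives a compact Bernoulli Naive Bayes. This is a sketch; the function names are illustrative, and the log-space accumulation in the prediction step is an addition for numerical stability that the slides do not discuss:

```python
import numpy as np

def nb_fit(X, y):
    """Bernoulli Naive Bayes with Laplace smoothing.
    X: binary (N, D) matrix; y: (N,) labels in {0, 1}."""
    phi_y = y.mean()
    # Add-one smoothing on each per-class Bernoulli parameter: (count + 1) / (N_k + 2)
    phi = {k: (X[y == k].sum(axis=0) + 1) / ((y == k).sum() + 2) for k in (0, 1)}
    return phi_y, phi

def nb_predict(x, phi_y, phi):
    """Posterior p(y = 1 | x) via Bayes rule, accumulating in log space."""
    logp = {}
    for k, prior in ((0, 1 - phi_y), (1, phi_y)):
        p = phi[k]
        logp[k] = np.log(prior) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    m = max(logp.values())
    z0, z1 = np.exp(logp[0] - m), np.exp(logp[1] - m)
    return z1 / (z0 + z1)
```

Thanks to the smoothing, a word seen in only one class still gets a nonzero probability under the other class, so a single unseen word can no longer force the posterior to 0 or 1.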
29 Event Models for Text Classification

Naive Bayes as presented uses the multivariate Bernoulli event model: the probability of an email is given by P(y) prod_{j=1}^{D} p(x_j | y).
A different way to represent emails: x = (x_1, ..., x_M), where x_j denotes the j-th word in the email, taking values in {1, ..., V}; V is the vocabulary size and M is the length of the email.
Multinomial event model: p(x_j | y) is now multinomial. The overall probability of an email is P(y) prod_{j=1}^{M} p(x_j | y).
30 Event Models for Text Classification (2)

Given a training set {x^(i), y^(i)}, i = 1, ..., N, where x^(i) = (x_1^(i), ..., x_{M_i}^(i))^T (M_i is the number of words in the i-th training example), the maximum likelihood estimates of the model are:
  p(x_j = k | y = 1) = sum_{i=1}^{N} sum_{j=1}^{M_i} 1{x_j^(i) = k and y^(i) = 1} / sum_{i=1}^{N} 1{y^(i) = 1} M_i,  for any j
  p(x_j = k | y = 0) = sum_{i=1}^{N} sum_{j=1}^{M_i} 1{x_j^(i) = k and y^(i) = 0} / sum_{i=1}^{N} 1{y^(i) = 0} M_i,  for any j
  P(y = 1) = (1/N) sum_{i=1}^{N} 1{y^(i) = 1}
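A sketch of these estimates, with the Laplace smoothing of slide 27 folded in (the add-one start values implement it, so unseen words get nonzero probability); the function name and data layout are choices made here:

```python
import numpy as np

def multinomial_nb_fit(docs, y, V):
    """MLE for the multinomial event model with Laplace smoothing.
    Each doc is a list of word indices in {0, ..., V-1}; y holds labels in {0, 1}."""
    phi = np.ones((2, V))           # Laplace smoothing: start every word count at 1
    totals = np.full(2, V)          # denominator starts at V per class
    for doc, label in zip(docs, y):
        for w in doc:
            phi[label, w] += 1      # count occurrences of word w in class `label`
        totals[label] += len(doc)   # total word positions seen in that class
    return phi / totals[:, None], np.mean(np.asarray(y) == 1)
```

Note the denominator counts word positions, not documents: every word token in a class-k email contributes one draw from that class's multinomial.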
31 Naive Bayes vs. Logistic Regression

Asymptotic comparison (number of training examples -> infinity):
- When the model assumption is correct: Naive Bayes (NB) and logistic regression (LR) produce identical classifiers.
- When the model assumption is incorrect: LR is less biased (it does not assume conditional independence) and is therefore expected to outperform NB.
32 Naive Bayes vs. Logistic Regression (2)

Non-asymptotic comparison (see [Ng and Jordan, 2002]):
Convergence rate of parameter estimation: how many training examples are needed to obtain good estimates?
- NB: order log D (where D = number of attributes in x)
- LR: order D
NB converges more quickly to its asymptotic estimates.
Logistic Regression and Generalized Linear Models Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/2 Topics Generative vs. Discriminative models In
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationCS 340 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis
CS 3 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis AD March 11 AD ( March 11 1 / 17 Multivariate Gaussian Consider data { x i } N i=1 where xi R D and we assume they are
More informationLogistic Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Logistic Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationClassification. Sandro Cumani. Politecnico di Torino
Politecnico di Torino Outline Generative model: Gaussian classifier (Linear) discriminative model: logistic regression (Non linear) discriminative model: neural networks Gaussian Classifier We want to
More informationClassification Based on Probability
Logistic Regression These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton s slides and grateful acknowledgement to the many others who made their course materials freely
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationStatistical Machine Learning Hilary Term 2018
Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationIntroduction to Machine Learning
Introduction to Machine Learning Bayesian Classification Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationMLE/MAP + Naïve Bayes
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes MLE / MAP Readings: Estimating Probabilities (Mitchell, 2016)
More informationMLPR: Logistic Regression and Neural Networks
MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition Amos Storkey Amos Storkey MLPR: Logistic Regression and Neural Networks 1/28 Outline 1 Logistic Regression 2 Multi-layer
More informationOutline. MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition. Which is the correct model? Recap.
Outline MLPR: and Neural Networks Machine Learning and Pattern Recognition 2 Amos Storkey Amos Storkey MLPR: and Neural Networks /28 Recap Amos Storkey MLPR: and Neural Networks 2/28 Which is the correct
More informationLecture 15: Logistic Regression
Lecture 15: Logistic Regression William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 15 What we ll learn in this lecture Model-based regression and classification Logistic regression
More informationCS325 Artificial Intelligence Chs. 18 & 4 Supervised Machine Learning (cont)
CS325 Artificial Intelligence Cengiz Spring 2013 Model Complexity in Learning f(x) x Model Complexity in Learning f(x) x Let s start with the linear case... Linear Regression Linear Regression price =
More informationProbabilistic Machine Learning. Industrial AI Lab.
Probabilistic Machine Learning Industrial AI Lab. Probabilistic Linear Regression Outline Probabilistic Classification Probabilistic Clustering Probabilistic Dimension Reduction 2 Probabilistic Linear
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationLinear Models for Classification
Catherine Lee Anderson figures courtesy of Christopher M. Bishop Department of Computer Science University of Nebraska at Lincoln CSCE 970: Pattern Recognition and Machine Learning Congradulations!!!!
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression (Continued) Generative v. Discriminative Decision rees Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 31 st, 2007 2005-2007 Carlos Guestrin 1 Generative
More informationMachine Learning. Classification. Bayes Classifier. Representing data: Choosing hypothesis class. Learning: h:x a Y. Eric Xing
Machine Learning 10-701/15 701/15-781, 781, Spring 2008 Naïve Bayes Classifier Eric Xing Lecture 3, January 23, 2006 Reading: Chap. 4 CB and handouts Classification Representing data: Choosing hypothesis
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More information