Ch 4. Linear Models for Classification


1 Ch 4. Linear Models for Classification
Pattern Recognition and Machine Learning, C. M. Bishop
Department of Computer Science and Engineering, Pohang University of Science and Technology, 77 Cheongam-ro, Nam-gu, Pohang, Korea

2 Contents
4.1 Discriminant Functions
4.2 Probabilistic Generative Models
4.3 Probabilistic Discriminative Models
4.4 The Laplace Approximation
4.5 Bayesian Logistic Regression

3 Classification Models
Linear classification model: a (D-1)-dimensional hyperplane in a D-dimensional input space; 1-of-K coding scheme for K > 2 classes, e.g. t = (0, 1, 0, 0, 0).
Discriminant function: directly assigns each vector x to a specific class, e.g. Fisher's linear discriminant.
Approaches using the conditional probability $p(C_k|\mathbf{x})$ separate the inference and decision stages. Two approaches:
- Discriminative: direct modeling of the posterior probability.
- Generative: modeling the likelihood and the prior probability to calculate the posterior; capable of generating samples.

4 Discriminant Functions: Two Classes
Classification by hyperplanes:
$y(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0$; if $y(\mathbf{x}) \ge 0$, assign $\mathbf{x}$ to $C_1$, otherwise to $C_2$.
Equivalently, $y(\mathbf{x}) = \tilde{\mathbf{w}}^T\tilde{\mathbf{x}}$, where $\tilde{\mathbf{w}} = (w_0, \mathbf{w})$ and $\tilde{\mathbf{x}} = (x_0, \mathbf{x})$ with $x_0 = 1$.
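A minimal NumPy sketch of this decision rule; the weight values are illustrative assumptions, not fitted parameters:

```python
import numpy as np

# Two-class linear discriminant y(x) = w^T x + w0.
w = np.array([1.0, -2.0])   # illustrative weight vector (assumed, not fitted)
w0 = 0.5                    # bias parameter

def classify(x):
    """Assign x to C1 if y(x) >= 0, else C2."""
    y = w @ x + w0
    return "C1" if y >= 0 else "C2"

# Augmented form: w_tilde = (w0, w), x_tilde = (1, x) gives y = w_tilde^T x_tilde.
x = np.array([2.0, 0.3])
x_tilde = np.concatenate(([1.0], x))
w_tilde = np.concatenate(([w0], w))
assert np.isclose(w @ x + w0, w_tilde @ x_tilde)
print(classify(x))
```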

5 Discriminant Functions: Multiple Classes
One-versus-the-rest classifier: K-1 classifiers for a K-class discriminant; ambiguous in regions where more than one classifier says yes.
One-versus-one classifier: K(K-1)/2 binary discriminant functions combined by majority voting; ambiguous when votes are tied.
[figures: ambiguous regions for one-versus-the-rest and one-versus-one]

6 Discriminant Functions: Multiple Classes (Cont'd)
K-class discriminant comprising K linear functions: $y_k(\mathbf{x}) = \mathbf{w}_k^T\mathbf{x} + w_{k0}$, k = 1,…,K.
Assign x to the class having the maximum output: $\mathbf{x} \in C_k$ if $y_k(\mathbf{x}) > y_j(\mathbf{x})$ for all $j \ne k$.
The decision regions are always singly connected and convex: for $\mathbf{x}_A, \mathbf{x}_B \in R_k$, let $\hat{\mathbf{x}} = \lambda\mathbf{x}_A + (1-\lambda)\mathbf{x}_B$ with $0 \le \lambda \le 1$. By linearity, $y_k(\hat{\mathbf{x}}) = \lambda y_k(\mathbf{x}_A) + (1-\lambda)y_k(\mathbf{x}_B)$. Since $y_k(\mathbf{x}_A) > y_j(\mathbf{x}_A)$ and $y_k(\mathbf{x}_B) > y_j(\mathbf{x}_B)$ for all $j \ne k$, it follows that $y_k(\hat{\mathbf{x}}) > y_j(\hat{\mathbf{x}})$, hence $\hat{\mathbf{x}} \in R_k$.
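A small sketch of the argmax decision rule, with random illustrative weights; the final assertion exercises the convexity property just derived:

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 4, 3
W = rng.normal(size=(K, D))   # one weight vector per class (illustrative values)
w0 = rng.normal(size=K)       # one bias per class

def predict(x):
    """Assign x to the class with the maximum linear output y_k(x)."""
    return int(np.argmax(W @ x + w0))

# Convexity in action: a convex combination of two points from the same
# decision region stays in that region.
xa, xb = rng.normal(size=D), rng.normal(size=D)
if predict(xa) == predict(xb):
    lam = 0.3
    assert predict(lam * xa + (1 - lam) * xb) == predict(xa)
```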

7 Approaches for Learning the Parameters of Linear Discriminant Functions
- Least squares method
- Fisher's linear discriminant (relation to least squares; multiple classes)
- Perceptron algorithm

8 Least Squares Method
Minimization of the sum-of-squares error (SSE), with a 1-of-K binary coding scheme for the target vector t.
Model: $\mathbf{y}(\mathbf{x}) = \tilde{W}^T\tilde{\mathbf{x}}$, where $\tilde{W} = (\tilde{\mathbf{w}}_1, \ldots, \tilde{\mathbf{w}}_K)$ and $\tilde{\mathbf{w}}_k = (w_{k0}, \mathbf{w}_k^T)^T$.
For a training data set $\{\mathbf{x}_n, \mathbf{t}_n\}$, n = 1,…,N, the sum-of-squares error function is
$E_D(\tilde{W}) = \frac{1}{2}\mathrm{Tr}\{(\tilde{X}\tilde{W} - T)^T(\tilde{X}\tilde{W} - T)\}$,
where $\tilde{X} = (\tilde{\mathbf{x}}_1, \ldots, \tilde{\mathbf{x}}_N)^T$ and $T = (\mathbf{t}_1, \ldots, \mathbf{t}_N)^T$.
Minimizing the SSE gives $\tilde{W} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T T = \tilde{X}^\dagger T$, with $\tilde{X}^\dagger$ the pseudo-inverse.
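A minimal sketch of the pseudo-inverse solution on synthetic two-class data; the blob parameters are arbitrary assumptions for illustration:

```python
import numpy as np

def least_squares_classifier(X, T):
    """Fit W_tilde = pinv(X_tilde) @ T for 1-of-K targets T (N x K)."""
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend bias column
    return np.linalg.pinv(X_tilde) @ T                  # (D+1) x K

def predict(W_tilde, X):
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(X_tilde @ W_tilde, axis=1)

# Toy data: two Gaussian blobs, 1-of-2 coded targets.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
labels = np.repeat([0, 1], 50)
T = np.eye(2)[labels]                  # 1-of-K coding
W = least_squares_classifier(X, T)
print((predict(W, X) == labels).mean())  # training accuracy
```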

9 Least Squares Method (Cont'd): Limitations
The least-squares solution yields y(x) whose elements sum to 1, but the outputs are not constrained to the range [0, 1], so they cannot be interpreted as probabilities.
Vulnerable to outliers: the SSE function also penalizes examples that are "too correct", i.e. far from the decision boundary on the correct side.
Least squares corresponds to ML under a Gaussian conditional distribution, which is unimodal, whereas binary targets call for a multimodal distribution.

10 Least Squares Method (Cont'd): Limitations
The lack of robustness arises because the least squares method corresponds to maximum likelihood under the assumption of a Gaussian conditional distribution, and binary target vectors are far from this assumption.
[figures: decision boundaries from the least-squares solution vs. logistic regression]

11 Fisher's Linear Discriminant
A linear classification model viewed as dimensionality reduction from the D-dimensional input space to one dimension.
Two classes: $y = \mathbf{w}^T\mathbf{x}$; if $y \ge -w_0$, then $\mathbf{x} \in C_1$, otherwise $\mathbf{x} \in C_2$.
Goal: find w such that the projected data are well separated by class.

12 Fisher's Linear Discriminant (Cont'd)
Maximizing the projected mean distance? The distance between the cluster means $\mathbf{m}_1$ and $\mathbf{m}_2$ projected onto w:
$m_2 - m_1 = \mathbf{w}^T(\mathbf{m}_2 - \mathbf{m}_1)$, where $\mathbf{m}_1 = \frac{1}{N_1}\sum_{n\in C_1}\mathbf{x}_n$ and $\mathbf{m}_2 = \frac{1}{N_2}\sum_{n\in C_2}\mathbf{x}_n$.
Not appropriate when the class covariances are nondiagonal.

13 Fisher's Linear Discriminant (Cont'd)
Integrate the within-class variance of the projected data: find w that maximizes the Fisher criterion
$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2} = \frac{\mathbf{w}^T S_B\mathbf{w}}{\mathbf{w}^T S_W\mathbf{w}}$, where $s_k^2 = \sum_{n\in C_k}(y_n - m_k)^2$.
$S_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T$: between-class covariance matrix.
$S_W = \sum_{n\in C_1}(\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T + \sum_{n\in C_2}(\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T$: within-class covariance matrix.
J(w) is maximized when $(\mathbf{w}^T S_B\mathbf{w})S_W\mathbf{w} = (\mathbf{w}^T S_W\mathbf{w})S_B\mathbf{w}$. Since $S_B\mathbf{w}$ is always in the direction of $(\mathbf{m}_2 - \mathbf{m}_1)$, Fisher's linear discriminant is $\mathbf{w} \propto S_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$.
If the within-class covariance is isotropic, w is proportional to the difference of the class means, as in the previous case.
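A short NumPy sketch of Fisher's direction $\mathbf{w} \propto S_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$ on synthetic data with a shared, nondiagonal covariance; the data parameters are illustrative assumptions:

```python
import numpy as np

def fisher_direction(X1, X2):
    """w proportional to S_W^{-1} (m2 - m1) for two classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(1)
cov = [[2.0, 0.9], [0.9, 0.5]]   # nondiagonal shared covariance
X1 = rng.multivariate_normal([0, 0], cov, 100)
X2 = rng.multivariate_normal([2, 1], cov, 100)
print(fisher_direction(X1, X2))  # projection direction maximizing J(w)
```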

14 Fisher's Linear Discriminant: Relation to Least Squares
The Fisher criterion is a special case of least squares when the target values are set to $N/N_1$ for class $C_1$ and $-N/N_2$ for class $C_2$:
$E = \frac{1}{2}\sum_{n=1}^N(\mathbf{w}^T\mathbf{x}_n + w_0 - t_n)^2$.
Setting the derivatives with respect to $w_0$ and w to zero:
$\sum_{n=1}^N(\mathbf{w}^T\mathbf{x}_n + w_0 - t_n) = 0$  (1)
$\sum_{n=1}^N(\mathbf{w}^T\mathbf{x}_n + w_0 - t_n)\mathbf{x}_n = 0$  (2)
Solving (1) gives $w_0 = -\mathbf{w}^T\mathbf{m}$, where $\mathbf{m} = \frac{1}{N}\sum_n\mathbf{x}_n = \frac{1}{N}(N_1\mathbf{m}_1 + N_2\mathbf{m}_2)$.
Substituting into (2) gives $\left(S_W + \frac{N_1 N_2}{N}S_B\right)\mathbf{w} = N(\mathbf{m}_1 - \mathbf{m}_2)$, hence $\mathbf{w} \propto S_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$, since $S_B\mathbf{w}$ is always in the direction of $(\mathbf{m}_2 - \mathbf{m}_1)$.

15 Fisher's Discriminant for Multiple Classes
K > 2 classes: dimensionality reduction from D to D' > 1 linear features $y_k = \mathbf{w}_k^T\mathbf{x}$, k = 1,…,D'.
Generalization of $S_W$ and $S_B$:
$S_W = \sum_{k=1}^K S_k$, where $S_k = \sum_{n\in C_k}(\mathbf{x}_n - \mathbf{m}_k)(\mathbf{x}_n - \mathbf{m}_k)^T$ and $\mathbf{m}_k = \frac{1}{N_k}\sum_{n\in C_k}\mathbf{x}_n$.
Total covariance: $S_T = \sum_{n=1}^N(\mathbf{x}_n - \mathbf{m})(\mathbf{x}_n - \mathbf{m})^T$, where $\mathbf{m} = \frac{1}{N}\sum_{n=1}^N\mathbf{x}_n = \frac{1}{N}\sum_{k=1}^K N_k\mathbf{m}_k$.
Then $S_T = S_W + S_B$ with $S_B = \sum_{k=1}^K N_k(\mathbf{m}_k - \mathbf{m})(\mathbf{m}_k - \mathbf{m})^T$; $S_B$ arises from this decomposition of the total covariance matrix (Duda and Hart, 1997).

16 Fisher's Discriminant for Multiple Classes (Cont'd)
Covariance matrices in the projected D'-dimensional y-space, with $\mathbf{y} = W^T\mathbf{x}$:
$\mathbf{s}_W = \sum_{k=1}^K\sum_{n\in C_k}(\mathbf{y}_n - \boldsymbol{\mu}_k)(\mathbf{y}_n - \boldsymbol{\mu}_k)^T$ and $\mathbf{s}_B = \sum_{k=1}^K N_k(\boldsymbol{\mu}_k - \boldsymbol{\mu})(\boldsymbol{\mu}_k - \boldsymbol{\mu})^T$,
where $\boldsymbol{\mu}_k = \frac{1}{N_k}\sum_{n\in C_k}\mathbf{y}_n$ and $\boldsymbol{\mu} = \frac{1}{N}\sum_{k=1}^K N_k\boldsymbol{\mu}_k$.
Fukunaga's criterion: $J(W) = \mathrm{Tr}\{\mathbf{s}_W^{-1}\mathbf{s}_B\} = \mathrm{Tr}\{(W^T S_W W)^{-1}(W^T S_B W)\}$.
Another criterion (Duda et al., Pattern Classification): $J(W) = \frac{|\mathbf{s}_B|}{|\mathbf{s}_W|} = \frac{|W^T S_B W|}{|W^T S_W W|}$. The determinant is the product of the eigenvalues, i.e. the variances in the principal directions.

17 Fisher's Discriminant for Multiple Classes (Cont'd)
[figure slide]

18 Perceptron Algorithm
Classification of x by a perceptron: $y(\mathbf{x}) = f(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}))$, where $f(a) = +1$ for $a \ge 0$ and $f(a) = -1$ for $a < 0$.
Error functions: the total number of misclassified patterns is piecewise constant and discontinuous in w; its gradient is zero almost everywhere, so it cannot be minimized by gradient methods.
Perceptron criterion: $E_P(\mathbf{w}) = -\sum_{n\in\mathcal{M}}\mathbf{w}^T\boldsymbol{\phi}_n t_n$, where $t_n \in \{+1, -1\}$ is the target output and $\mathcal{M}$ is the set of misclassified patterns.

19 Perceptron Algorithm (Cont'd)
Stochastic gradient descent: for each misclassified pattern, $\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta\nabla E_P(\mathbf{w}) = \mathbf{w}^{(\tau)} + \eta\boldsymbol{\phi}_n t_n$.
The error from a misclassified pattern is reduced after each iteration:
$-\mathbf{w}^{(\tau+1)T}\boldsymbol{\phi}_n t_n = -\mathbf{w}^{(\tau)T}\boldsymbol{\phi}_n t_n - (\boldsymbol{\phi}_n t_n)^T\boldsymbol{\phi}_n t_n < -\mathbf{w}^{(\tau)T}\boldsymbol{\phi}_n t_n$.
This does not imply that the overall error is reduced.
Perceptron convergence theorem: if an exact solution exists (i.e. the data are linearly separable), the perceptron learning algorithm is guaranteed to find it in a finite number of steps. Remaining issues: learning speed, linearly nonseparable data, and multiple classes.
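A minimal sketch of the perceptron learning rule on a linearly separable toy set; the data parameters are assumptions for illustration:

```python
import numpy as np

def perceptron_train(Phi, t, eta=1.0, max_epochs=100):
    """Stochastic updates w <- w + eta * phi_n * t_n on misclassified patterns.
    Targets t must be coded +1/-1; converges iff the data are separable."""
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for phi_n, t_n in zip(Phi, t):
            if t_n * (w @ phi_n) <= 0:   # misclassified pattern
                w += eta * phi_n * t_n   # perceptron update
                mistakes += 1
        if mistakes == 0:                # exact solution found
            break
    return w

# Separable toy problem, with a constant bias feature phi_0 = 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
Phi = np.hstack([np.ones((60, 1)), X])
t = np.repeat([-1, 1], 30)
w = perceptron_train(Phi, t)
print((np.sign(Phi @ w) == t).all())
```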

20 Perceptron Algorithm (Cont'd)
[figure slide: panels (a)-(d)]

21 Probabilistic Generative Models
Computation of posterior probabilities using class-conditional densities and class priors.
Two classes:
$p(C_1|\mathbf{x}) = \frac{p(\mathbf{x}|C_1)p(C_1)}{p(\mathbf{x}|C_1)p(C_1) + p(\mathbf{x}|C_2)p(C_2)} = \frac{1}{1 + \exp(-a)} = \sigma(a)$, where $a = \ln\frac{p(\mathbf{x}|C_1)p(C_1)}{p(\mathbf{x}|C_2)p(C_2)}$.
Generalization to K > 2 classes:
$p(C_k|\mathbf{x}) = \frac{p(\mathbf{x}|C_k)p(C_k)}{\sum_j p(\mathbf{x}|C_j)p(C_j)} = \frac{\exp(a_k)}{\sum_j\exp(a_j)}$, where $a_k = \ln p(\mathbf{x}|C_k)p(C_k)$.
The normalized exponential is also known as the softmax function, i.e. a smoothed version of the max function.
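A small numeric sketch of the two activation functions; the final lines illustrate why the softmax is a "smoothed max":

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    """Normalized exponential; shift by max(a) for numerical stability."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

# Two-class check: the sigmoid of the log odds a recovers the posterior.
a = np.log(0.7 / 0.3)
print(sigmoid(a))            # 0.7

# Sharpening the scores drives the softmax toward a one-hot argmax.
scores = np.array([1.0, 2.0, 3.0])
print(softmax(scores))       # smooth weighting
print(softmax(10 * scores))  # nearly one-hot at the maximum
```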

22 Probabilistic Generative Models: Continuous Inputs
Posterior probabilities when the class-conditional densities are Gaussian, sharing the same covariance matrix:
$p(\mathbf{x}|C_k) = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_k)^T\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}_k)\right\}$.
Two classes: $p(C_1|\mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x} + w_0)$, where
$\mathbf{w} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$ and $w_0 = -\frac{1}{2}\boldsymbol{\mu}_1^T\Sigma^{-1}\boldsymbol{\mu}_1 + \frac{1}{2}\boldsymbol{\mu}_2^T\Sigma^{-1}\boldsymbol{\mu}_2 + \ln\frac{p(C_1)}{p(C_2)}$.
The quadratic terms in x from the exponents cancel, so the resulting decision boundary is linear in input space. The priors enter only through $w_0$, shifting the decision boundary in parallel.

23 Probabilistic Generative Models: Continuous Inputs (Cont'd)
Generalization to K classes: $a_k(\mathbf{x}) = \mathbf{w}_k^T\mathbf{x} + w_{k0}$, where $\mathbf{w}_k = \Sigma^{-1}\boldsymbol{\mu}_k$ and $w_{k0} = -\frac{1}{2}\boldsymbol{\mu}_k^T\Sigma^{-1}\boldsymbol{\mu}_k + \ln p(C_k)$.
When the classes share the same covariance matrix, the decision boundaries are again linear. If each class-conditional density has its own covariance matrix, the quadratic terms no longer cancel, and we obtain quadratic functions of x, giving rise to a quadratic discriminant.

24 Probabilistic Generative Models: Maximum Likelihood Solution
Determining the parameters of $p(\mathbf{x}|C_k)$ and $p(C_k)$ by maximum likelihood from a training data set.
Two classes: data set $\{\mathbf{x}_n, t_n\}$, n = 1,…,N, with $t_n = 1$ denoting class $C_1$ and $t_n = 0$ denoting class $C_2$. Priors: $p(C_1) = \pi$ and $p(C_2) = 1 - \pi$.
$p(\mathbf{x}_n, C_1) = p(C_1)p(\mathbf{x}_n|C_1) = \pi\,\mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_1, \Sigma)$,
$p(\mathbf{x}_n, C_2) = p(C_2)p(\mathbf{x}_n|C_2) = (1 - \pi)\,\mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_2, \Sigma)$.
The likelihood function: $p(\mathbf{t}|\pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \Sigma) = \prod_{n=1}^N\left[\pi\,\mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_1, \Sigma)\right]^{t_n}\left[(1 - \pi)\,\mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_2, \Sigma)\right]^{1 - t_n}$, where $\mathbf{t} = (t_1, \ldots, t_N)^T$.

25 Probabilistic Generative Models: Maximum Likelihood Solution (Cont'd)
Maximization with respect to π: the terms of the log likelihood that depend on π are $\sum_{n=1}^N\{t_n\ln\pi + (1 - t_n)\ln(1 - \pi)\}$. Setting the derivative with respect to π equal to zero gives
$\pi = \frac{1}{N}\sum_{n=1}^N t_n = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}$.
Maximization with respect to $\boldsymbol{\mu}_1$: the relevant terms are $\sum_{n=1}^N t_n\ln\mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_1, \Sigma) = -\frac{1}{2}\sum_{n=1}^N t_n(\mathbf{x}_n - \boldsymbol{\mu}_1)^T\Sigma^{-1}(\mathbf{x}_n - \boldsymbol{\mu}_1) + \text{const}$, giving
$\boldsymbol{\mu}_1 = \frac{1}{N_1}\sum_{n=1}^N t_n\mathbf{x}_n$ and analogously $\boldsymbol{\mu}_2 = \frac{1}{N_2}\sum_{n=1}^N(1 - t_n)\mathbf{x}_n$.

26 Probabilistic Generative Models: Maximum Likelihood Solution (Cont'd)
Maximization with respect to the shared covariance matrix Σ: the relevant terms of the log likelihood are
$-\frac{N}{2}\ln|\Sigma| - \frac{N}{2}\mathrm{Tr}\{\Sigma^{-1}\mathbf{S}\}$,
where $\mathbf{S} = \frac{N_1}{N}\mathbf{S}_1 + \frac{N_2}{N}\mathbf{S}_2$ and $\mathbf{S}_k = \frac{1}{N_k}\sum_{n\in C_k}(\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^T$.
The ML solution is $\Sigma = \mathbf{S}$: a weighted average of the covariance matrices associated with each class. Being a Gaussian fit, it is not robust to outliers.
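Putting slides 24-26 together, a minimal sketch that fits the two-class shared-covariance model with the ML formulas above and evaluates the induced linear posterior; the toy data are assumed for illustration:

```python
import numpy as np

def fit_gaussian_generative(X, t):
    """ML estimates for the shared-covariance two-class Gaussian model;
    returns the induced linear posterior parameters (w, w0)."""
    N = len(t)
    pi = t.mean()                                  # prior p(C1) = N1/N
    X1, X2 = X[t == 1], X[t == 0]
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1) / len(X1)
    S2 = (X2 - mu2).T @ (X2 - mu2) / len(X2)
    Sigma = (len(X1) * S1 + len(X2) * S2) / N      # weighted average
    P = np.linalg.inv(Sigma)
    w = P @ (mu1 - mu2)
    w0 = -0.5 * mu1 @ P @ mu1 + 0.5 * mu2 @ P @ mu2 + np.log(pi / (1 - pi))
    return w, w0

def posterior_c1(X, w, w0):
    return 1.0 / (1.0 + np.exp(-(X @ w + w0)))     # sigma(w^T x + w0)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(2, 1, (60, 2))])
t = np.array([0] * 40 + [1] * 60)
w, w0 = fit_gaussian_generative(X, t)
print(posterior_c1(X[:3], w, w0))
```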

27 Probabilistic Generative Models: Discrete Features
Discrete feature values $x_i \in \{0, 1\}$. A general distribution would correspond to a table of $2^D$ entries for D inputs: the table size grows exponentially with the number of features.
Naive Bayes assumption: the features are treated as independent, conditioned on the class $C_k$:
$p(\mathbf{x}|C_k) = \prod_{i=1}^D\mu_{ki}^{x_i}(1 - \mu_{ki})^{1 - x_i}$,
$a_k(\mathbf{x}) = \ln p(\mathbf{x}|C_k)p(C_k) = \sum_{i=1}^D\{x_i\ln\mu_{ki} + (1 - x_i)\ln(1 - \mu_{ki})\} + \ln p(C_k)$.
Linear with respect to the features, as in the continuous case.
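A minimal Bernoulli naive Bayes sketch; the Laplace smoothing constant alpha is an added practical assumption, not part of the ML derivation above:

```python
import numpy as np

def fit_bernoulli_nb(X, y, K, alpha=1.0):
    """Estimate priors and mu_ki = p(x_i = 1 | C_k), with smoothing alpha."""
    priors = np.array([(y == k).mean() for k in range(K)])
    mu = np.array([(X[y == k].sum(axis=0) + alpha) / ((y == k).sum() + 2 * alpha)
                   for k in range(K)])
    return priors, mu

def log_posterior_scores(X, priors, mu):
    """a_k(x) = sum_i [x_i ln mu_ki + (1-x_i) ln(1-mu_ki)] + ln p(C_k):
    linear in the binary features x_i."""
    return X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(priors)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 5)).astype(float)
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # toy labels driven by two features
priors, mu = fit_bernoulli_nb(X, y, K=2)
print(np.argmax(log_posterior_scores(X[:3], priors, mu), axis=1))
```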

28 Bayes Decision Boundaries: 2D
[figure: Pattern Classification, Duda et al., p. 42]

29 Bayes Decision Boundaries: 3D
[figure: Pattern Classification, Duda et al., p. 43]

30 Probabilistic Generative Models: Exponential Family
For both Gaussian and discrete inputs, the posterior class probabilities are given by generalized linear models with logistic sigmoid or softmax activation functions. This generalizes to class-conditional densities from the exponential family:
$p(\mathbf{x}|\boldsymbol{\lambda}_k) = h(\mathbf{x})g(\boldsymbol{\lambda}_k)\exp\{\boldsymbol{\lambda}_k^T\mathbf{u}(\mathbf{x})\}$.
Consider the subclass for which $\mathbf{u}(\mathbf{x}) = \mathbf{x}$; introducing a scaling parameter s:
$p(\mathbf{x}|\boldsymbol{\lambda}_k, s) = \frac{1}{s}h\left(\frac{1}{s}\mathbf{x}\right)g(\boldsymbol{\lambda}_k)\exp\left\{\frac{1}{s}\boldsymbol{\lambda}_k^T\mathbf{x}\right\}$.
Two classes: $p(C_1|\mathbf{x}) = \sigma(a(\mathbf{x}))$ with $a(\mathbf{x}) = \frac{1}{s}(\boldsymbol{\lambda}_1 - \boldsymbol{\lambda}_2)^T\mathbf{x} + \ln g(\boldsymbol{\lambda}_1) - \ln g(\boldsymbol{\lambda}_2) + \ln p(C_1) - \ln p(C_2)$.
K classes: $a_k(\mathbf{x}) = \frac{1}{s}\boldsymbol{\lambda}_k^T\mathbf{x} + \ln g(\boldsymbol{\lambda}_k) + \ln p(C_k)$, with $p(C_k|\mathbf{x}) = \frac{\exp(a_k)}{\sum_j\exp(a_j)}$. Again linear with respect to x.

31 Three Approaches for Classification
1. Discriminant functions.
2. Probabilistic generative models: fit class-conditional densities and class priors separately, then apply Bayes' theorem to find the posterior class probabilities. The posterior probability of a class can be written as a logistic sigmoid acting on a linear function of x (two classes) or a softmax transformation of a linear function of x (multiclass). The parameters of the densities and the class priors can be determined using maximum likelihood.
3. Probabilistic discriminative models: use the functional form of the generalized linear model explicitly and determine its parameters directly using maximum likelihood.

32 Fixed Basis Functions
Assume a fixed nonlinear transformation: transform the inputs using a vector of basis functions $\boldsymbol{\phi}(\mathbf{x})$. The resulting decision boundaries will be linear in the feature space, even though they are generally nonlinear in the original input space.

33 Logistic Regression
Logistic regression model: the posterior probability of a class for the two-class problem is $p(C_1|\boldsymbol{\phi}) = y(\boldsymbol{\phi}) = \sigma(\mathbf{w}^T\boldsymbol{\phi})$.
The number of adjustable parameters (M-dimensional feature space, two classes):
- Gaussian class-conditional densities (generative model): 2M parameters for the means and M(M+1)/2 parameters for the (shared) covariance matrix; grows quadratically with M.
- Logistic regression (discriminative model): M parameters for w; grows linearly with M.

34 Logistic Regression (Cont'd)
Determining the parameters using ML.
Likelihood function: $p(\mathbf{t}|\mathbf{w}) = \prod_{n=1}^N y_n^{t_n}(1 - y_n)^{1 - t_n}$, where $y_n = \sigma(\mathbf{w}^T\boldsymbol{\phi}_n)$.
Cross-entropy error function (negative log likelihood): $E(\mathbf{w}) = -\sum_{n=1}^N\{t_n\ln y_n + (1 - t_n)\ln(1 - y_n)\}$.
The gradient of the error function w.r.t. w: $\nabla E(\mathbf{w}) = \sum_{n=1}^N(y_n - t_n)\boldsymbol{\phi}_n$ (the same form as for the linear regression model).
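A minimal sketch using the gradient above; plain batch gradient descent with a fixed step size eta is an assumed optimizer choice here (the chapter's preferred method is IRLS, next slide):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def logistic_gd(Phi, t, eta=0.1, iters=2000):
    """Minimize the cross-entropy E(w) by gradient descent.
    Gradient: grad E = Phi^T (y - t), same form as linear regression."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        y = sigmoid(Phi @ w)
        w -= eta * Phi.T @ (y - t) / len(t)
    return w

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
Phi = np.hstack([np.ones((100, 1)), X])   # bias basis function phi_0 = 1
t = np.repeat([0.0, 1.0], 50)
w = logistic_gd(Phi, t)
print(((sigmoid(Phi @ w) > 0.5) == t).mean())
```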

35 Iterative Reweighted Least Squares
Linear regression models in Ch. 3: the ML solution under a Gaussian noise assumption has a closed form, as a consequence of the quadratic dependence of the log likelihood on the parameters w.
Logistic regression model: there is no longer a closed-form solution, but the error function is convex with a unique minimum, so an efficient iterative technique can be used.
The Newton-Raphson update to minimize a function E(w): $\mathbf{w}^{\text{new}} = \mathbf{w}^{\text{old}} - H^{-1}\nabla E(\mathbf{w})$, where H is the Hessian matrix of second derivatives of E(w).

36 Iterative Reweighted Least Squares (Cont'd)
Sum-of-squares error function: $\nabla E = \Phi^T\Phi\mathbf{w} - \Phi^T\mathbf{t}$ and $H = \Phi^T\Phi$; the Newton-Raphson update reaches the exact least-squares solution $\mathbf{w} = (\Phi^T\Phi)^{-1}\Phi^T\mathbf{t}$ in a single step.
Cross-entropy error function: $\nabla E = \Phi^T(\mathbf{y} - \mathbf{t})$ and $H = \Phi^T R\Phi$, with R diagonal, $R_{nn} = y_n(1 - y_n)$.
Newton-Raphson update: $\mathbf{w}^{\text{new}} = \mathbf{w}^{\text{old}} - (\Phi^T R\Phi)^{-1}\Phi^T(\mathbf{y} - \mathbf{t}) = (\Phi^T R\Phi)^{-1}\Phi^T R\mathbf{z}$, where $\mathbf{z} = \Phi\mathbf{w}^{\text{old}} - R^{-1}(\mathbf{y} - \mathbf{t})$ (iterative reweighted least squares: R depends on w, so the weighted least-squares problem is re-solved at each step).
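A compact IRLS sketch. One caveat: on perfectly separable data the ML weights diverge, so in practice one would stop early or add regularization; the toy data here are assumed overlapping:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls(Phi, t, iters=10):
    """Newton-Raphson for logistic regression:
    w_new = w - (Phi^T R Phi)^{-1} Phi^T (y - t), with R_nn = y_n (1 - y_n)."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        y = sigmoid(Phi @ w)
        R = y * (1 - y)                     # diagonal of the weight matrix R
        H = Phi.T @ (Phi * R[:, None])      # Hessian Phi^T R Phi
        w = w - np.linalg.solve(H, Phi.T @ (y - t))
    return w

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
Phi = np.hstack([np.ones((100, 1)), X])
t = np.repeat([0.0, 1.0], 50)
print(irls(Phi, t))   # typically converges in a handful of iterations
```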

37 Multiclass Logistic Regression
Posterior probability for multiclass classification: $p(C_k|\boldsymbol{\phi}) = y_k(\boldsymbol{\phi}) = \frac{\exp(\mathbf{w}_k^T\boldsymbol{\phi})}{\sum_j\exp(\mathbf{w}_j^T\boldsymbol{\phi})}$.
We can use ML to determine the parameters directly. Likelihood function using the 1-of-K coding scheme: $p(T|\mathbf{w}_1, \ldots, \mathbf{w}_K) = \prod_{n=1}^N\prod_{k=1}^K y_{nk}^{t_{nk}}$.
Cross-entropy error function for multiclass classification: $E(\mathbf{w}_1, \ldots, \mathbf{w}_K) = -\sum_{n=1}^N\sum_{k=1}^K t_{nk}\ln y_{nk}$.

38 Multiclass Logistic Regression (Cont'd)
The derivative of the error function: $\nabla_{\mathbf{w}_j}E = \sum_{n=1}^N(y_{nj} - t_{nj})\boldsymbol{\phi}_n$ — the same form, the product of the error times the basis function.
The Hessian blocks: $\nabla_{\mathbf{w}_k}\nabla_{\mathbf{w}_j}E = \sum_{n=1}^N y_{nk}(I_{kj} - y_{nj})\boldsymbol{\phi}_n\boldsymbol{\phi}_n^T$.
The IRLS algorithm can also be used for batch processing.

39 Probit Regression
For a broad range of class-conditional distributions described by the exponential family, the resulting posterior class probabilities are given by a logistic (or softmax) transformation acting on a linear function of the feature variables. However, this is not the case for all choices of class-conditional density, so it is worth exploring other types of discriminative probabilistic model.

40 Probit Regression (Cont'd)
Noisy threshold model: set $t_n = 1$ if the activation $a_n = \mathbf{w}^T\boldsymbol{\phi}_n$ exceeds a threshold θ drawn from a density p(θ). The corresponding activation function is the cumulative distribution $f(a) = \int_{-\infty}^a p(\theta)\,d\theta$.
For a zero-mean, unit-variance Gaussian p(θ), this is the probit function $\Phi(a) = \int_{-\infty}^a\mathcal{N}(\theta|0, 1)\,d\theta$, which has a sigmoidal shape.
The generalized linear model based on a probit activation function is known as probit regression.
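A short sketch comparing the logistic sigmoid with the rescaled probit $\Phi(\lambda a)$, $\lambda^2 = \pi/8$; this rescaling is reused for the predictive distribution of slide 48:

```python
import numpy as np
from math import erf, sqrt

def probit(a):
    """Phi(a): CDF of the standard Gaussian, via the error function."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# The rescaled probit closely tracks the logistic sigmoid.
lam = sqrt(np.pi / 8.0)
for a in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(a, sigmoid(a), probit(lam * a))
```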

41 Canonical Link Functions
We have seen that for some models, the derivative of the error function w.r.t. the parameters w takes the form of the error times the feature vector:
- the logistic regression model with sigmoid activation function;
- the multiclass logistic regression model with softmax activation function.
This is a general result of assuming a conditional distribution for the target variable from the exponential family, along with the corresponding choice of activation function known as the canonical link function.

42 Canonical Link Functions (Cont'd)
Conditional distribution of the target variable (exponential family): $p(t|\eta, s) = \frac{1}{s}h\left(\frac{t}{s}\right)g(\eta)\exp\left\{\frac{\eta t}{s}\right\}$, with conditional mean $y \equiv \mathbb{E}[t|\eta]$ related to η by $\eta = \psi(y)$.
Log likelihood: $\ln p(\mathbf{t}|\eta, s) = \sum_{n=1}^N\left\{\ln g(\eta_n) + \frac{\eta_n t_n}{s}\right\} + \text{const}$.
The derivative of the log likelihood: $\nabla_{\mathbf{w}}\ln p(\mathbf{t}|\eta, s) = \frac{1}{s}\sum_{n=1}^N\{t_n - y_n\}\psi'(y_n)f'(a_n)\boldsymbol{\phi}_n$, where $a_n = \mathbf{w}^T\boldsymbol{\phi}_n$ and $y_n = f(a_n)$.
The canonical link function is $f^{-1}(y) = \psi(y)$; then $f'(a)\psi'(y) = 1$, and $\nabla E(\mathbf{w}) = \frac{1}{s}\sum_{n=1}^N(y_n - t_n)\boldsymbol{\phi}_n$.

43 The Laplace Approximation
We cannot integrate exactly over the parameter vector, since the posterior is no longer Gaussian.
The Laplace approximation: find a Gaussian approximation centered on a mode $z_0$ of the distribution.
Taylor expansion of the logarithm of the target function around the mode (where the gradient vanishes): $\ln f(z) \simeq \ln f(z_0) - \frac{1}{2}A(z - z_0)^2$, where $A = -\frac{d^2}{dz^2}\ln f(z)\Big|_{z=z_0}$.
Resulting approximate Gaussian distribution: $q(z) = \left(\frac{A}{2\pi}\right)^{1/2}\exp\left\{-\frac{A}{2}(z - z_0)^2\right\}$.

44 The Laplace Approximation (Cont'd)
M-dimensional case: $\ln f(\mathbf{z}) \simeq \ln f(\mathbf{z}_0) - \frac{1}{2}(\mathbf{z} - \mathbf{z}_0)^T A(\mathbf{z} - \mathbf{z}_0)$, where $A = -\nabla\nabla\ln f(\mathbf{z})\big|_{\mathbf{z}=\mathbf{z}_0}$, giving
$q(\mathbf{z}) = \frac{|A|^{1/2}}{(2\pi)^{M/2}}\exp\left\{-\frac{1}{2}(\mathbf{z} - \mathbf{z}_0)^T A(\mathbf{z} - \mathbf{z}_0)\right\} = \mathcal{N}(\mathbf{z}|\mathbf{z}_0, A^{-1})$.
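A numeric 1-D sketch of the Laplace approximation: Newton's method locates the mode of ln f, and the negative second derivative there gives the precision. The target density is an arbitrary illustrative choice:

```python
import numpy as np

# Unnormalized target: ln f(z) = 3 ln z - z (a gamma-like density, z > 0).
def dlog_f(z):
    return 3.0 / z - 1.0        # first derivative of ln f

def d2log_f(z):
    return -3.0 / z**2          # second derivative of ln f

z0 = 1.0
for _ in range(50):             # Newton's method on d ln f / dz = 0
    z0 -= dlog_f(z0) / d2log_f(z0)

A = -d2log_f(z0)                # precision = negative curvature at the mode
print("mode:", z0, "variance:", 1.0 / A)   # mode 3, variance 3
```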

45 Model Comparison and BIC
Laplace approximation to the normalization constant: $Z = \int f(\mathbf{z})\,d\mathbf{z} \simeq f(\mathbf{z}_0)\frac{(2\pi)^{M/2}}{|A|^{1/2}}$.
This result can be used to obtain an approximation to the model evidence, which plays a central role in Bayesian model comparison. For a set of models having parameters $\{\boldsymbol{\theta}_i\}$, the log of the model evidence can be approximated as
$\ln p(\mathcal{D}) \simeq \ln p(\mathcal{D}|\boldsymbol{\theta}_{\text{MAP}}) + \ln p(\boldsymbol{\theta}_{\text{MAP}}) + \frac{M}{2}\ln 2\pi - \frac{1}{2}\ln|A|$,
where the last three terms (the Occam factor) penalize model complexity. Under further assumptions (e.g. a broad Gaussian prior and a full-rank Hessian), this yields the Bayesian Information Criterion (BIC):
$\ln p(\mathcal{D}) \simeq \ln p(\mathcal{D}|\boldsymbol{\theta}_{\text{MAP}}) - \frac{1}{2}M\ln N$.

46 Bayesian Logistic Regression
Exact Bayesian inference is intractable.
Gaussian prior: $p(\mathbf{w}) = \mathcal{N}(\mathbf{w}|\mathbf{m}_0, S_0)$.
Posterior: $p(\mathbf{w}|\mathbf{t}) \propto p(\mathbf{w})p(\mathbf{t}|\mathbf{w})$.
Log of the posterior: $\ln p(\mathbf{w}|\mathbf{t}) = -\frac{1}{2}(\mathbf{w} - \mathbf{m}_0)^T S_0^{-1}(\mathbf{w} - \mathbf{m}_0) + \sum_{n=1}^N\{t_n\ln y_n + (1 - t_n)\ln(1 - y_n)\} + \text{const}$.
Laplace approximation of the posterior distribution: $q(\mathbf{w}) = \mathcal{N}(\mathbf{w}|\mathbf{w}_{\text{MAP}}, S_N)$, where $S_N^{-1} = S_0^{-1} + \sum_{n=1}^N y_n(1 - y_n)\boldsymbol{\phi}_n\boldsymbol{\phi}_n^T$.

47 Predictive Distribution
The predictive distribution can be obtained by marginalizing w.r.t. the posterior distribution $p(\mathbf{w}|\mathbf{t})$, which is approximated by the Gaussian q(w):
$p(C_1|\boldsymbol{\phi}, \mathbf{t}) = \int\sigma(\mathbf{w}^T\boldsymbol{\phi})p(\mathbf{w}|\mathbf{t})\,d\mathbf{w} \simeq \int\sigma(a)p(a)\,da$, where $a = \mathbf{w}^T\boldsymbol{\phi}$.
Here p(a) is a marginal distribution of a Gaussian, which is itself Gaussian: $p(a) = \mathcal{N}(a|\mu_a, \sigma_a^2)$, with $\mu_a = \mathbf{w}_{\text{MAP}}^T\boldsymbol{\phi}$ and $\sigma_a^2 = \boldsymbol{\phi}^T S_N\boldsymbol{\phi}$.

48 Predictive Distribution (Cont'd)
Resulting approximation to the predictive distribution: $p(C_1|\mathbf{t}) = \int\sigma(a)\mathcal{N}(a|\mu_a, \sigma_a^2)\,da$.
To integrate over a, we make use of the close similarity between the logistic sigmoid and the probit function: with $\lambda^2 = \pi/8$, $\sigma(a) \simeq \Phi(\lambda a)$, and
$\int\Phi(\lambda a)\mathcal{N}(a|\mu, \sigma^2)\,da = \Phi\left(\frac{\mu}{(\lambda^{-2} + \sigma^2)^{1/2}}\right)$.
Then $\int\sigma(a)\mathcal{N}(a|\mu, \sigma^2)\,da \simeq \sigma(\kappa(\sigma^2)\mu)$, where $\kappa(\sigma^2) = (1 + \pi\sigma^2/8)^{-1/2}$.
Finally, $p(C_1|\boldsymbol{\phi}, \mathbf{t}) = \sigma(\kappa(\sigma_a^2)\mu_a)$.
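A minimal sketch of the approximate predictive distribution; w_MAP and S_N are assumed given (e.g. from the Laplace fit of slide 46), with illustrative values:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predictive(phi, w_map, S_N):
    """p(C1 | phi, t) ~= sigma(kappa(sigma_a^2) * mu_a), with
    mu_a = w_MAP^T phi, sigma_a^2 = phi^T S_N phi, kappa = (1 + pi s^2/8)^{-1/2}."""
    mu_a = w_map @ phi
    var_a = phi @ S_N @ phi
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var_a / 8.0)
    return sigmoid(kappa * mu_a)

# Illustrative posterior parameters (assumed, not fitted here).
w_map = np.array([0.5, 1.2])
S_N = np.array([[0.30, 0.05], [0.05, 0.20]])
phi = np.array([1.0, 2.0])               # bias + one feature
print(predictive(phi, w_map, S_N))
# Posterior variance shrinks the activation toward 0, pulling the predictive
# probability toward 0.5 relative to the plug-in MAP estimate below.
print(sigmoid(w_map @ phi))
```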
