Ch 4. Linear Models for Classification
- Blaise Kelley
- 5 years ago
Transcription
1 Ch 4. Linear Models for Classification. Pattern Recognition and Machine Learning, C. M. Bishop. Department of Computer Science and Engineering, Pohang University of Science and Technology, 77 Cheongam-ro, Nam-gu, Pohang, Korea
2 Contents
- 4.1 Discriminant Functions
- 4.2 Probabilistic Generative Models
- 4.3 Probabilistic Discriminative Models
- 4.4 The Laplace Approximation
- 4.5 Bayesian Logistic Regression
3 Classification Models
- Linear classification model: the decision surface is a (D-1)-dimensional hyperplane in the D-dimensional input space.
- 1-of-K coding scheme for K > 2 classes, e.g. $t = (0, 1, 0, 0, 0)^T$.
- Discriminant function: directly assigns each vector x to a specific class, e.g. Fisher's linear discriminant.
- Approaches using the conditional probability $p(C_k \mid x)$ separate the inference and decision stages. Two approaches:
  - Direct modeling of the posterior probability.
  - Generative approach: model the likelihood and the prior probability to calculate the posterior probability; capable of generating samples.
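4 Discriminant Functions: Two Classes
- Classification by hyperplanes: $y(x) = w^T x + w_0$. If $y(x) \ge 0$, then $x \in C_1$; otherwise $x \in C_2$.
- Or, in compact form, $y(x) = \tilde{w}^T \tilde{x}$, where $\tilde{w} = (w_0, w)$ and $\tilde{x} = (1, x)$.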
5 Discriminant Functions: Multiple Classes
- One-versus-the-rest classifier: K-1 classifiers for a K-class discriminant. Ambiguous when more than one classifier says yes.
- One-versus-one classifier: K(K-1)/2 binary discriminant functions combined by majority voting. Ambiguous when the votes are tied.
- [Figure panels: One-versus-the-rest, One-versus-one]
6 Discriminant Functions: Multiple Classes (Cont'd)
- K-class discriminant comprising K linear functions: $y_k(x) = w_k^T x + w_{k0}$, $k = 1, \dots, K$.
- Assigns x to the class with the maximum output: $x \in C_k$ if $y_k(x) > y_j(x)$ for all $j \ne k$.
- The decision regions are always singly connected and convex. For $x_A, x_B \in C_k$, let $\hat{x} = \lambda x_A + (1 - \lambda) x_B$ with $0 \le \lambda \le 1$. Then $y_k(\hat{x}) = \lambda y_k(x_A) + (1 - \lambda) y_k(x_B)$. Since $y_k(x_A) > y_j(x_A)$ and $y_k(x_B) > y_j(x_B)$ for $j \ne k$, it follows that $y_k(\hat{x}) > y_j(\hat{x})$ for $j \ne k$.
7 Approaches for Learning Parameters for Linear Discriminant Functions
- Least-squares method
- Fisher's linear discriminant
  - Relation to least squares
  - Multiple classes
- Perceptron algorithm
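8 Least Squares Method
- Minimization of the sum-of-squares error (SSE), with a 1-of-K binary coding scheme for the target vector t.
- Model: $y(x) = \tilde{W}^T \tilde{x}$, where $\tilde{W} = (\tilde{w}_1 \; \tilde{w}_2 \; \dots \; \tilde{w}_K)$ and $\tilde{w}_k = (w_{k0}, w_k^T)^T$.
- For a training data set $\{x_n, t_n\}$, $n = 1, \dots, N$, the sum-of-squares error function is $E_D(\tilde{W}) = \frac{1}{2} \mathrm{Tr}\{ (\tilde{X}\tilde{W} - T)^T (\tilde{X}\tilde{W} - T) \}$, where $\tilde{X} = (\tilde{x}_1 \; \tilde{x}_2 \; \dots \; \tilde{x}_N)^T$ and $T = (t_1 \; t_2 \; \dots \; t_N)^T$.
- Minimizing the SSE gives $\tilde{W} = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T T = \tilde{X}^{\dagger} T$, where $\tilde{X}^{\dagger}$ is the pseudo-inverse.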
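The pseudo-inverse solution can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original slides: the function names and the two-cluster toy data are assumptions chosen for the example.

```python
import numpy as np

def fit_least_squares(X, T):
    """Return W_tilde minimizing the SSE. X is (N, D); T is (N, K) in 1-of-K coding."""
    # Prepend the dummy input x_0 = 1 so the bias is absorbed into W_tilde.
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])
    # Pseudo-inverse solution W_tilde = pinv(X_tilde) @ T.
    return np.linalg.pinv(X_tilde) @ T

def predict_least_squares(W_tilde, X):
    """Return the raw outputs y(x) and the index of the largest output per row."""
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])
    Y = X_tilde @ W_tilde
    return Y, Y.argmax(axis=1)

# Hypothetical toy data: two well-separated clusters in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
labels = np.repeat([0, 1], 20)
T = np.eye(2)[labels]

W_tilde = fit_least_squares(X, T)
Y, pred = predict_least_squares(W_tilde, X)
```

Each row of `Y` sums to 1 because the targets do, but individual entries may fall outside [0, 1], which is the limitation discussed on the next slide.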
9 Least Squares Method (Cont'd): Limits and Disadvantages
- The least-squares solution yields y(x) whose elements sum to 1, but it does not constrain the outputs to the range [0, 1].
- Vulnerable to outliers, because the SSE function also penalizes examples that are "too correct", i.e. far from the decision boundary on the correct side.
- Least squares corresponds to ML under a Gaussian conditional distribution.
- Unimodal vs. multimodal
10 Least Squares Method (Cont'd): Limits and Disadvantages
- The lack of robustness arises because the least-squares method corresponds to maximum likelihood under the assumption of a Gaussian conditional distribution.
- Binary target vectors are far from satisfying this assumption.
- [Figure panels: Least-squares solution, Logistic regression]
11 Fisher's Linear Discriminant
- Views the linear classification model as dimensionality reduction from the D-dimensional space to one dimension.
- In the case of two classes: $y = w^T x$. If $y \ge -w_0$, then $x \in C_1$; otherwise $x \in C_2$.
- Goal: find w such that the projected data are well clustered.
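12 Fisher's Linear Discriminant (Cont'd)
- Maximizing the projected mean distance? The distance between the cluster means $\mathbf{m}_1$ and $\mathbf{m}_2$ projected onto w is $m_2 - m_1 = w^T (\mathbf{m}_2 - \mathbf{m}_1)$, where $\mathbf{m}_1 = \frac{1}{N_1} \sum_{n \in C_1} x_n$ and $\mathbf{m}_2 = \frac{1}{N_2} \sum_{n \in C_2} x_n$.
- Not appropriate when the covariances are nondiagonal.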
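13 Fisher's Linear Discriminant (Cont'd)
- Take the within-class variance of the projected data into account: find the w that maximizes $J(w) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2}$, where $s_k^2 = \sum_{n \in C_k} (y_n - m_k)^2$.
- Equivalently, $J(w) = \frac{w^T S_B w}{w^T S_W w}$, with
  - $S_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T$: between-class covariance matrix
  - $S_W = \sum_{n \in C_1} (x_n - \mathbf{m}_1)(x_n - \mathbf{m}_1)^T + \sum_{n \in C_2} (x_n - \mathbf{m}_2)(x_n - \mathbf{m}_2)^T$: within-class covariance matrix
- J(w) is maximized when $(w^T S_B w) S_W w = (w^T S_W w) S_B w$. Since $S_B w$ is always in the direction of $(\mathbf{m}_2 - \mathbf{m}_1)$, this gives Fisher's linear discriminant: $w \propto S_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$.
- If the within-class covariance is isotropic, w is proportional to the difference of the class means, as in the previous case.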
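As a rough numerical check of the Fisher direction, the sketch below computes $w \propto S_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$ and compares the criterion J(w) against the plain mean-difference direction. The helper names and the correlated toy data are assumptions made for illustration only.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Unit vector proportional to S_W^{-1} (m2 - m1) for two classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """J(w) = (m2 - m1)^2 / (s1^2 + s2^2), computed on the projected data."""
    y1, y2 = X1 @ w, X2 @ w
    s1 = np.sum((y1 - y1.mean()) ** 2)
    s2 = np.sum((y2 - y2.mean()) ** 2)
    return (y2.mean() - y1.mean()) ** 2 / (s1 + s2)

# Hypothetical data with a strongly nondiagonal shared covariance.
rng = np.random.default_rng(0)
cov = np.array([[3.0, 2.5], [2.5, 3.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=100)
X2 = rng.multivariate_normal([2.0, 0.0], cov, size=100)

w_fisher = fisher_direction(X1, X2)
w_means = X2.mean(axis=0) - X1.mean(axis=0)
w_means = w_means / np.linalg.norm(w_means)

J_fisher = fisher_criterion(w_fisher, X1, X2)
J_means = fisher_criterion(w_means, X1, X2)
```

On data like this, `J_fisher` should not be smaller than `J_means`, since the Fisher direction maximizes J over all directions for the same sample statistics.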
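14 Fisher's Linear Discriminant: Relation to Least Squares
- The Fisher criterion is a special case of least squares when the target values are set to $N/N_1$ for class $C_1$ and $-N/N_2$ for class $C_2$.
- Error function: $E = \frac{1}{2} \sum_{n=1}^{N} (w^T x_n + w_0 - t_n)^2$.
- Setting the derivatives to zero:
  - (1) $dE/dw_0 = \sum_{n=1}^{N} (w^T x_n + w_0 - t_n) = 0$
  - (2) $dE/dw = \sum_{n=1}^{N} (w^T x_n + w_0 - t_n) x_n = 0$
- Solving (1), using $\sum_n t_n = 0$, gives $w_0 = -w^T \mathbf{m}$, where $\mathbf{m} = \frac{1}{N} \sum_{n=1}^{N} x_n = \frac{1}{N} (N_1 \mathbf{m}_1 + N_2 \mathbf{m}_2)$.
- Solving (2) with the $w_0$ above gives $\left( S_W + \frac{N_1 N_2}{N} S_B \right) w = N (\mathbf{m}_1 - \mathbf{m}_2)$.
- Since $S_B w$ is always in the direction of $(\mathbf{m}_2 - \mathbf{m}_1)$, it follows that $w \propto S_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$.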
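15 Fisher's Discriminant for Multiple Classes
- K > 2 classes; dimension reduction from D to D' with D' > 1 linear features $y_k = w_k^T x$ ($k = 1, \dots, D'$), i.e. $y = W^T x$.
- Generalization of $S_W$ and $S_B$:
  - $S_W = \sum_{k=1}^{K} S_k$, where $S_k = \sum_{n \in C_k} (x_n - \mathbf{m}_k)(x_n - \mathbf{m}_k)^T$ and $\mathbf{m}_k = \frac{1}{N_k} \sum_{n \in C_k} x_n$.
  - $S_T = \sum_{n=1}^{N} (x_n - \mathbf{m})(x_n - \mathbf{m})^T$, where $\mathbf{m} = \frac{1}{N} \sum_{n=1}^{N} x_n = \frac{1}{N} \sum_{k=1}^{K} N_k \mathbf{m}_k$.
  - $S_T = S_W + S_B$, with $S_B = \sum_{k=1}^{K} N_k (\mathbf{m}_k - \mathbf{m})(\mathbf{m}_k - \mathbf{m})^T$.
- $S_B$ comes from the decomposition of the total covariance matrix (Duda and Hart, 1997).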
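16 Fisher's Discriminant for Multiple Classes (Cont'd)
- Covariance matrices in the projected y-space: $s_W = \sum_{k=1}^{K} \sum_{n \in C_k} (y_n - \mu_k)(y_n - \mu_k)^T$ and $s_B = \sum_{k=1}^{K} N_k (\mu_k - \mu)(\mu_k - \mu)^T$, where $\mu_k = \frac{1}{N_k} \sum_{n \in C_k} y_n$ and $\mu = \frac{1}{N} \sum_{k=1}^{K} N_k \mu_k$.
- Fukunaga's criterion: $J(W) = \mathrm{Tr}\{ s_W^{-1} s_B \} = \mathrm{Tr}\{ (W^T S_W W)^{-1} (W^T S_B W) \}$.
- Another criterion (Duda et al., Pattern Classification): $J(W) = \frac{|s_B|}{|s_W|} = \frac{|W^T S_B W|}{|W^T S_W W|}$.
- The determinant is the product of the eigenvalues, i.e. of the variances in the principal directions.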
17 Fisher's Discriminant for Multiple Classes (Cont'd)
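18 Perceptron Algorithm
- Classification of x by a perceptron: $y(x) = f(w^T \phi(x))$, where $f(a) = +1$ if $a \ge 0$ and $f(a) = -1$ if $a < 0$.
- Error functions:
  - The total number of misclassified patterns: piecewise constant and discontinuous, so the gradient is zero almost everywhere.
  - Perceptron criterion: $E_P(w) = -\sum_{n \in M} w^T \phi_n t_n$, where $t_n \in \{-1, +1\}$ is the target output and M is the set of misclassified patterns.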
19 Perceptron Algorithm (Cont'd)
- Stochastic gradient descent algorithm: $w^{(\tau+1)} = w^{(\tau)} - \eta \nabla E_P(w) = w^{(\tau)} + \eta \phi_n t_n$.
- The error from a misclassified pattern is reduced after each iteration (with $\eta = 1$): $-w^{(\tau+1)T} \phi_n t_n = -w^{(\tau)T} \phi_n t_n - (\phi_n t_n)^T \phi_n t_n < -w^{(\tau)T} \phi_n t_n$.
- This does not imply that the overall error is reduced.
- Perceptron convergence theorem: if there exists an exact solution (i.e. the data are linearly separable), the perceptron learning algorithm is guaranteed to find it.
- However, there are open issues: learning speed, linearly nonseparable data, and multiple classes.
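The update rule can be sketched as a short training loop. This is an illustrative implementation under assumptions: targets are coded as -1/+1, the feature vectors already include a bias component, and the function name, `max_epochs` cap, and toy data are hypothetical.

```python
import numpy as np

def train_perceptron(Phi, t, eta=1.0, max_epochs=100):
    """Cycle through the patterns and apply w <- w + eta * phi_n * t_n on each mistake."""
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for phi_n, t_n in zip(Phi, t):
            # A pattern is misclassified when w^T phi_n t_n <= 0.
            if t_n * (w @ phi_n) <= 0:
                w = w + eta * phi_n * t_n
                mistakes += 1
        if mistakes == 0:  # converged: every pattern is on the correct side
            break
    return w

# Hypothetical linearly separable data; first column is the bias feature.
Phi = np.array([[1.0, 2.0, 2.0], [1.0, 3.0, 1.0], [1.0, 2.0, 3.0],
                [1.0, -2.0, -1.0], [1.0, -1.0, -3.0], [1.0, -3.0, -2.0]])
t = np.array([1, 1, 1, -1, -1, -1])

w = train_perceptron(Phi, t)
```

The `max_epochs` cap matters only for nonseparable data, where the loop would otherwise never terminate.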
20 Perceptron Algorithm (Cont'd) [Figure panels: (a), (b), (c), (d)]
21 Probabilistic Generative Models
- Computation of posterior probabilities $p(C_k \mid x)$ using the class-conditional densities $p(x \mid C_k)$ and the class priors $p(C_k)$.
- Two classes: $p(C_1 \mid x) = \frac{p(x \mid C_1) p(C_1)}{p(x \mid C_1) p(C_1) + p(x \mid C_2) p(C_2)} = \frac{1}{1 + \exp(-a)} = \sigma(a)$, where $a = \ln \frac{p(x \mid C_1) p(C_1)}{p(x \mid C_2) p(C_2)}$.
- Generalization to K > 2 classes: $p(C_k \mid x) = \frac{p(x \mid C_k) p(C_k)}{\sum_j p(x \mid C_j) p(C_j)} = \frac{\exp(a_k)}{\sum_j \exp(a_j)}$, where $a_k = \ln \left( p(x \mid C_k) p(C_k) \right)$.
- The normalized exponential is also known as the softmax function, i.e. a smoothed version of the max function.
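A small numerical sketch shows that the softmax over $a_k = \ln(p(x \mid C_k) p(C_k))$ and the logistic sigmoid of $a = a_1 - a_2$ give the same two-class posterior. The joint values below are made-up numbers used only for illustration.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    """Normalized exponential; subtracting the max avoids overflow."""
    a = np.asarray(a, dtype=float)
    e = np.exp(a - a.max())
    return e / e.sum()

# Hypothetical values of p(x | C_k) p(C_k) for two classes at some point x.
joint = np.array([0.03, 0.01])
a_k = np.log(joint)

post_softmax = softmax(a_k)               # posterior over both classes
post_sigmoid = sigmoid(a_k[0] - a_k[1])   # p(C_1 | x) via the log-odds a
```

Both routes give $p(C_1 \mid x) = 0.03 / 0.04 = 0.75$ here.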
22 Probabilistic Generative Models: Continuous Inputs
- Posterior probabilities when the class-conditional densities are Gaussian and share the same covariance matrix: $p(x \mid C_k) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^T \Sigma^{-1} (x - \mu_k) \right\}$.
- Two classes: $p(C_1 \mid x) = \sigma(w^T x + w_0)$, where $w = \Sigma^{-1} (\mu_1 - \mu_2)$ and $w_0 = -\frac{1}{2} \mu_1^T \Sigma^{-1} \mu_1 + \frac{1}{2} \mu_2^T \Sigma^{-1} \mu_2 + \ln \frac{p(C_1)}{p(C_2)}$.
- The quadratic terms in x from the exponents cancel, so the resulting decision boundary is linear in input space.
- The prior only shifts the decision boundary, i.e. it produces parallel contours.
23 Probabilistic Generative Models: Continuous Inputs (Cont'd)
- Generalization to K classes: $a_k(x) = w_k^T x + w_{k0}$, where $w_k = \Sigma^{-1} \mu_k$ and $w_{k0} = -\frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \ln p(C_k)$.
- When the classes share the same covariance matrix, the decision boundaries are again linear.
- If each class-conditional density has its own covariance matrix, we obtain quadratic functions of x, giving rise to a quadratic discriminant.
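24 Probabilistic Generative Models: Maximum Likelihood Solution
- Determine the parameters of $p(x \mid C_k)$ and $p(C_k)$ using maximum likelihood from a training data set.
- Two classes:
  - Data set: $\{x_n, t_n\}$, $n = 1, \dots, N$, with $t_n = 1$ or $0$ (denoting $C_1$ and $C_2$, respectively).
  - Priors: $p(C_1) = \pi$ and $p(C_2) = 1 - \pi$.
  - $p(x_n, C_1) = p(C_1) p(x_n \mid C_1) = \pi \, \mathcal{N}(x_n \mid \mu_1, \Sigma)$
  - $p(x_n, C_2) = p(C_2) p(x_n \mid C_2) = (1 - \pi) \, \mathcal{N}(x_n \mid \mu_2, \Sigma)$
- The likelihood function: $p(\mathbf{t} \mid \pi, \mu_1, \mu_2, \Sigma) = \prod_{n=1}^{N} \left[ \pi \, \mathcal{N}(x_n \mid \mu_1, \Sigma) \right]^{t_n} \left[ (1 - \pi) \, \mathcal{N}(x_n \mid \mu_2, \Sigma) \right]^{1 - t_n}$, where $\mathbf{t} = (t_1, \dots, t_N)^T$.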
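25 Probabilistic Generative Models: Maximum Likelihood Solution (Cont'd)
- Two classes (cont'd): maximization of the likelihood with respect to $\pi$.
  - Terms of the log likelihood that depend on $\pi$: $\sum_{n=1}^{N} \{ t_n \ln \pi + (1 - t_n) \ln (1 - \pi) \}$.
  - Setting the derivative with respect to $\pi$ equal to zero: $\pi = \frac{1}{N} \sum_{n=1}^{N} t_n = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}$.
- Maximization with respect to $\mu_1$:
  - $\sum_{n=1}^{N} t_n \ln \mathcal{N}(x_n \mid \mu_1, \Sigma) = -\frac{1}{2} \sum_{n=1}^{N} t_n (x_n - \mu_1)^T \Sigma^{-1} (x_n - \mu_1) + \text{const}$.
  - $\mu_1 = \frac{1}{N_1} \sum_{n=1}^{N} t_n x_n$, and analogously $\mu_2 = \frac{1}{N_2} \sum_{n=1}^{N} (1 - t_n) x_n$.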
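26 Probabilistic Generative Models: Maximum Likelihood Solution (Cont'd)
- Two classes (cont'd): maximization of the likelihood with respect to the shared covariance matrix.
  - Terms that depend on $\Sigma$: $-\frac{1}{2} \sum_{n=1}^{N} t_n \ln |\Sigma| - \frac{1}{2} \sum_{n=1}^{N} t_n (x_n - \mu_1)^T \Sigma^{-1} (x_n - \mu_1) - \frac{1}{2} \sum_{n=1}^{N} (1 - t_n) \ln |\Sigma| - \frac{1}{2} \sum_{n=1}^{N} (1 - t_n) (x_n - \mu_2)^T \Sigma^{-1} (x_n - \mu_2) = -\frac{N}{2} \ln |\Sigma| - \frac{N}{2} \mathrm{Tr}\{ \Sigma^{-1} S \}$.
  - Here $S = \frac{N_1}{N} S_1 + \frac{N_2}{N} S_2$ and $S_k = \frac{1}{N_k} \sum_{n \in C_k} (x_n - \mu_k)(x_n - \mu_k)^T$.
  - Solution: $\Sigma = S$, a weighted average of the covariance matrices associated with each class.
- But this fit is not robust to outliers.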
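The maximum likelihood estimates for the two-class shared-covariance Gaussian model can be sketched as follows. The function names and the small hand-made data set are assumptions for illustration; the posterior uses the logistic-sigmoid form with $w = \Sigma^{-1}(\mu_1 - \mu_2)$ given earlier for Gaussian class-conditional densities.

```python
import numpy as np

def fit_shared_gaussian(X, t):
    """ML estimates (pi, mu1, mu2, Sigma); t_n = 1 denotes C_1 and t_n = 0 denotes C_2."""
    N = len(t)
    N1 = t.sum()
    N2 = N - N1
    pi = N1 / N                                    # fraction of points in C_1
    mu1 = (t[:, None] * X).sum(axis=0) / N1        # mean of the C_1 points
    mu2 = ((1 - t)[:, None] * X).sum(axis=0) / N2  # mean of the C_2 points
    D1 = X[t == 1] - mu1
    D2 = X[t == 0] - mu2
    # (N1/N) S_1 + (N2/N) S_2 simplifies to the pooled scatter divided by N.
    Sigma = (D1.T @ D1 + D2.T @ D2) / N
    return pi, mu1, mu2, Sigma

def posterior_c1(x, pi, mu1, mu2, Sigma):
    """p(C_1 | x) = sigmoid(w^T x + w_0) for the shared-covariance model."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu2)
    w0 = -0.5 * mu1 @ Sinv @ mu1 + 0.5 * mu2 @ Sinv @ mu2 + np.log(pi / (1 - pi))
    return 1.0 / (1.0 + np.exp(-(x @ w + w0)))

# Hypothetical data: four points around (1, 1) in C_1, four around (5, 5) in C_2.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0],
              [4.0, 4.0], [6.0, 4.0], [4.0, 6.0], [6.0, 6.0]])
t = np.array([1, 1, 1, 1, 0, 0, 0, 0])

pi, mu1, mu2, Sigma = fit_shared_gaussian(X, t)
p_mid = posterior_c1(np.array([3.0, 3.0]), pi, mu1, mu2, Sigma)
p_c1 = posterior_c1(np.array([1.0, 1.0]), pi, mu1, mu2, Sigma)
```

With equal priors, the point halfway between the two means gets posterior 0.5, while a point at the $C_1$ mean is assigned to $C_1$ with probability close to 1.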
27 Probabilistic Generative Models: Discrete Features
- Discrete feature values $x_i \in \{0, 1\}$.
- A general distribution would correspond to a table of $2^D$ entries: with D inputs, the table size grows exponentially with the number of features.
- Naive Bayes assumption (features independent, conditioned on the class $C_k$): $p(x \mid C_k) = \prod_{i=1}^{D} \mu_{ki}^{x_i} (1 - \mu_{ki})^{1 - x_i}$.
- Then $a_k(x) = \ln \left( p(x \mid C_k) p(C_k) \right) = \sum_{i=1}^{D} \{ x_i \ln \mu_{ki} + (1 - x_i) \ln (1 - \mu_{ki}) \} + \ln p(C_k)$.
- Linear with respect to the features, as in the case of continuous features.
28 Bayes Decision Boundaries: 2D [Figure from Pattern Classification, Duda et al., p. 42]
29 Bayes Decision Boundaries: 3D [Figure from Pattern Classification, Duda et al., p. 43]
30 Probabilistic Generative Models: Exponential Family
- For both Gaussian-distributed and discrete inputs, the posterior class probabilities are given by generalized linear models with logistic sigmoid or softmax activation functions.
- Generalization to class-conditional densities of the exponential family: $p(x \mid \lambda_k) = h(x) g(\lambda_k) \exp\{ \lambda_k^T u(x) \}$.
- Consider the subclass for which $u(x) = x$. For some scaling parameter s, $p(x \mid \lambda_k, s) = \frac{1}{s} h\left( \frac{1}{s} x \right) g(\lambda_k) \exp\left\{ \frac{1}{s} \lambda_k^T x \right\}$.
- Two classes: $p(C_1 \mid x) = \sigma(a(x))$ with $a(x) = (\lambda_1 - \lambda_2)^T x + \ln g(\lambda_1) - \ln g(\lambda_2) + \ln p(C_1) - \ln p(C_2)$.
- K classes: $a_k(x) = \lambda_k^T x + \ln g(\lambda_k) + \ln p(C_k)$, where $p(C_k \mid x) = \frac{\exp(a_k)}{\sum_j \exp(a_j)}$.
- Linear with respect to x again.
31 Three Approaches for Classification
- Discriminant functions
- Probabilistic generative models
  - Fit the class-conditional densities and class priors separately.
  - Apply Bayes' theorem to find the posterior class probabilities.
  - The posterior probability of a class can be written as a logistic sigmoid acting on a linear function of x (2 classes) or as a softmax transformation of a linear function of x (multiclass).
  - The parameters of the densities as well as the class priors can be determined using maximum likelihood.
- Probabilistic discriminative models
  - Use the functional form of the generalized linear model explicitly.
  - Determine the parameters directly using maximum likelihood.
32 Fixed Basis Functions
- Assume a fixed nonlinear transformation.
- Transform the inputs using a vector of basis functions $\phi(x)$.
- The resulting decision boundaries will be linear in the feature space.
33 Logistic Regression
- Logistic regression model: the posterior probability of a class in the two-class problem is $p(C_1 \mid \phi) = y(\phi) = \sigma(w^T \phi)$, with $p(C_2 \mid \phi) = 1 - p(C_1 \mid \phi)$.
- Number of adjustable parameters (M-dimensional feature space, 2 classes):
  - Two Gaussian class-conditional densities (generative model): 2M parameters for the means and M(M+1)/2 parameters for the shared covariance matrix. Grows quadratically with M.
  - Logistic regression (discriminative model): M parameters for w. Grows linearly with M.
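34 Logistic Regression (Cont'd)
- Determining the parameters using ML.
- Likelihood function: $p(\mathbf{t} \mid w) = \prod_{n=1}^{N} y_n^{t_n} (1 - y_n)^{1 - t_n}$, where $y_n = \sigma(w^T \phi_n)$.
- Cross-entropy error function (negative log likelihood): $E(w) = -\ln p(\mathbf{t} \mid w) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \}$.
- The gradient of the error function w.r.t. w has the same form as for the linear regression model: $\nabla E(w) = \sum_{n=1}^{N} (y_n - t_n) \phi_n$.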
35 Iterative Reweighted Least Squares
- Linear regression models (Ch. 3): the ML solution under the assumption of Gaussian noise leads to a closed-form solution, as a consequence of the quadratic dependence of the log likelihood on the parameter w.
- Logistic regression model: there is no longer a closed-form solution, but the error function is convex and has a unique minimum, so an efficient iterative technique can be used.
- The Newton-Raphson update to minimize a function E(w): $w^{(\text{new})} = w^{(\text{old})} - H^{-1} \nabla E(w)$, where H is the Hessian matrix whose elements are the second derivatives of E(w).
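36 Iterative Reweighted Least Squares (Cont'd)
- Sum-of-squares error function: $\nabla E(w) = \Phi^T \Phi w - \Phi^T \mathbf{t}$ and $H = \Phi^T \Phi$.
  - Newton-Raphson update: $w^{(\text{new})} = w^{(\text{old})} - (\Phi^T \Phi)^{-1} \{ \Phi^T \Phi w^{(\text{old})} - \Phi^T \mathbf{t} \} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t}$, the standard least-squares solution in one step.
- Cross-entropy error function: $\nabla E(w) = \Phi^T (\mathbf{y} - \mathbf{t})$ and $H = \Phi^T R \Phi$, where R is diagonal with $R_{nn} = y_n (1 - y_n)$.
  - Newton-Raphson update (iterative reweighted least squares): $w^{(\text{new})} = w^{(\text{old})} - (\Phi^T R \Phi)^{-1} \Phi^T (\mathbf{y} - \mathbf{t}) = (\Phi^T R \Phi)^{-1} \Phi^T R \mathbf{z}$, where $\mathbf{z} = \Phi w^{(\text{old})} - R^{-1} (\mathbf{y} - \mathbf{t})$.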
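A compact sketch of IRLS for two-class logistic regression is given below. It is an illustration under assumptions rather than a reference implementation: the function name, the fixed iteration count, the tiny `ridge` term added to the Hessian for numerical safety, and the overlapping 1-D toy data are all choices made for this example.

```python
import numpy as np

def irls_logistic(Phi, t, n_iter=20, ridge=1e-8):
    """Newton-Raphson updates w <- w - H^{-1} grad for the cross-entropy error."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        y = 1.0 / (1.0 + np.exp(-(Phi @ w)))   # y_n = sigmoid(w^T phi_n)
        grad = Phi.T @ (y - t)                  # gradient Phi^T (y - t)
        R = y * (1.0 - y)                       # diagonal of the weighting matrix R
        # Hessian Phi^T R Phi; the small ridge is an assumption to keep it invertible.
        H = Phi.T @ (Phi * R[:, None]) + ridge * np.eye(Phi.shape[1])
        w = w - np.linalg.solve(H, grad)
    return w

# Hypothetical overlapping (non-separable) 1-D data with a bias feature.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, -1.0, 0.0, 1.0, 2.0, 3.0])
t = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0])
Phi = np.column_stack([np.ones_like(x), x])

w = irls_logistic(Phi, t)
y_fit = 1.0 / (1.0 + np.exp(-(Phi @ w)))
grad_norm = np.linalg.norm(Phi.T @ (y_fit - t))
```

The classes must overlap for the maximum likelihood solution to be finite; on linearly separable data the weights grow without bound and a prior or regularizer is needed.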
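37 Multiclass Logistic Regression
- Posterior probability for multiclass classification: $p(C_k \mid \phi) = y_k(\phi) = \frac{\exp(a_k)}{\sum_j \exp(a_j)}$, where $a_k = w_k^T \phi$.
- We can use ML to determine the parameters directly.
- Likelihood function using the 1-of-K coding scheme: $p(T \mid w_1, \dots, w_K) = \prod_{n=1}^{N} \prod_{k=1}^{K} y_{nk}^{t_{nk}}$.
- Cross-entropy error function for multiclass classification: $E(w_1, \dots, w_K) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk} \ln y_{nk}$.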
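38 Multiclass Logistic Regression (Cont'd)
- The derivative of the error function: $\nabla_{w_j} E(w_1, \dots, w_K) = \sum_{n=1}^{N} (y_{nj} - t_{nj}) \phi_n$. It has the same form as before: the product of the error and the basis function.
- The Hessian matrix: $\nabla_{w_k} \nabla_{w_j} E(w_1, \dots, w_K) = \sum_{n=1}^{N} y_{nk} (I_{kj} - y_{nj}) \phi_n \phi_n^T$.
- The IRLS algorithm can also be used for batch processing.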
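The "error times basis function" form of the multiclass gradient can be checked numerically. The sketch below compares the analytic gradient with central finite differences; the function names, the random test data, and the step size are assumptions for illustration.

```python
import numpy as np

def softmax_rows(A):
    """Row-wise softmax with max-subtraction for numerical stability."""
    A = A - A.max(axis=1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(W, Phi, T):
    """E(W) = -sum_n sum_k t_nk ln y_nk, with column k of W holding w_k."""
    Y = softmax_rows(Phi @ W)
    return -np.sum(T * np.log(Y))

def cross_entropy_grad(W, Phi, T):
    """Column j of the result is sum_n (y_nj - t_nj) phi_n."""
    Y = softmax_rows(Phi @ W)
    return Phi.T @ (Y - T)

def numerical_grad(W, Phi, T, eps=1e-6):
    """Central finite-difference approximation of the gradient, entry by entry."""
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            G[i, j] = (cross_entropy(Wp, Phi, T) - cross_entropy(Wm, Phi, T)) / (2 * eps)
    return G

# Hypothetical random problem: N = 6 points, M = 3 features, K = 3 classes.
rng = np.random.default_rng(1)
Phi = rng.normal(size=(6, 3))
T = np.eye(3)[[0, 1, 2, 0, 1, 2]]
W = rng.normal(size=(3, 3))

grad = cross_entropy_grad(W, Phi, T)
grad_fd = numerical_grad(W, Phi, T)
```

Because each row of both `Y` and `T` sums to 1, the gradient columns sum to zero across classes, which reflects the redundancy in the softmax parameterization.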
39 Probit Regression
- For a broad range of class-conditional distributions described by the exponential family, the resulting posterior class probabilities are given by a logistic (or softmax) transformation acting on a linear function of the feature variables.
- However, this is not the case for all choices of class-conditional density.
- It might be worth exploring other types of discriminative probabilistic model.
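40 Probit Regression (Cont'd)
- Noisy threshold model: $t_n = 1$ if $a_n \ge \theta$, and $t_n = 0$ otherwise, where $a_n = w^T \phi_n$.
- Corresponding activation function when $\theta$ is drawn from $p(\theta)$: $f(a) = \int_{-\infty}^{a} p(\theta) \, d\theta$.
- The probit function (for a zero-mean, unit-variance Gaussian $p(\theta)$): $\Phi(a) = \int_{-\infty}^{a} \mathcal{N}(\theta \mid 0, 1) \, d\theta$. It has a sigmoidal shape.
- The generalized linear model based on a probit activation function is known as probit regression.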
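The probit function can be evaluated with the error function from the standard library. The sketch below also compares it with the logistic sigmoid after rescaling the argument by $\lambda$ with $\lambda^2 = \pi/8$; that particular scaling is an assumption brought in for the comparison (it makes the two curves share the same slope at the origin), and the grid of test points is arbitrary.

```python
import math

def probit(a):
    """Phi(a): CDF of the standard Gaussian, written in terms of erf."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

# Assumed scaling: lambda^2 = pi / 8 matches the slopes of the two curves at a = 0.
lam = math.sqrt(math.pi / 8.0)

# Largest gap between sigmoid(a) and probit(lam * a) on a grid over [-8, 8].
grid = [i / 100.0 for i in range(-800, 801)]
max_gap = max(abs(sigmoid(a) - probit(lam * a)) for a in grid)
```

The two curves stay within about 0.02 of each other on this grid, although the probit tails decay faster, which makes probit regression more sensitive to outliers.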
41 Canonical Link Functions
- We have seen that, for some models, the derivative of the error function w.r.t. the parameter w takes the form of the error times the feature vector:
  - Logistic regression model with sigmoid activation function
  - Logistic regression model with softmax activation function
- This is a general result of assuming a conditional distribution for the target variable from the exponential family, along with a corresponding choice for the activation function known as the canonical link function.
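42 Canonical Link Functions (Cont'd)
- Conditional distribution of the target variable: $p(t \mid \eta, s) = \frac{1}{s} h\left( \frac{t}{s} \right) g(\eta) \exp\left\{ \frac{\eta t}{s} \right\}$, with $y \equiv E[t \mid \eta] = -s \frac{d}{d\eta} \ln g(\eta)$ and $\eta = \psi(y)$.
- Log likelihood: $\ln p(\mathbf{t} \mid \eta, s) = \sum_{n=1}^{N} \left\{ \ln g(\eta_n) + \frac{\eta_n t_n}{s} \right\} + \text{const}$.
- The derivative of the log likelihood: $\nabla_w \ln p(\mathbf{t} \mid \eta, s) = \sum_{n=1}^{N} \frac{1}{s} \{ t_n - y_n \} \psi'(y_n) f'(a_n) \phi_n$, where $a_n = w^T \phi_n$ and $y_n = f(a_n)$.
- The canonical link function: $f^{-1}(y) = \psi(y)$; then $\nabla E(w) = \frac{1}{s} \sum_{n=1}^{N} \{ y_n - t_n \} \phi_n$.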
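43 The Laplace Approximation
- We cannot integrate exactly over the parameter vector, since the posterior is no longer Gaussian.
- The Laplace approximation: find a Gaussian approximation centered on the mode $z_0$ of the distribution $p(z) = f(z)/Z$.
- Taylor expansion of the logarithm of the target function: $\ln f(z) \simeq \ln f(z_0) - \frac{1}{2} A (z - z_0)^2$, where $A = -\frac{d^2}{dz^2} \ln f(z) \big|_{z = z_0}$.
- Resulting approximated Gaussian distribution: $q(z) = \left( \frac{A}{2\pi} \right)^{1/2} \exp\left\{ -\frac{A}{2} (z - z_0)^2 \right\}$.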
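A one-dimensional Laplace approximation can be sketched numerically: locate the mode with Newton steps on $\ln f(z)$ and read off the precision A from the second derivative. Everything specific here is an assumption for illustration: the target $f(z) = \exp(-z^2/2)\,\sigma(20z + 4)$, the finite-difference derivatives, the step size `h`, and the function names.

```python
import numpy as np

def log_f(z):
    """ln f(z) for the assumed target f(z) = exp(-z^2 / 2) * sigmoid(20 z + 4)."""
    return -0.5 * z ** 2 - np.log1p(np.exp(-(20.0 * z + 4.0)))

def laplace_1d(log_f, z_init=0.0, h=1e-4, n_iter=50):
    """Return (z0, A): the mode of f and the precision of the Gaussian q(z)."""
    z = z_init
    for _ in range(n_iter):
        # Central finite differences for the first and second derivatives.
        g = (log_f(z + h) - log_f(z - h)) / (2.0 * h)
        c = (log_f(z + h) - 2.0 * log_f(z) + log_f(z - h)) / h ** 2
        z = z - g / c  # Newton step towards the stationary point of ln f
    A = -(log_f(z + h) - 2.0 * log_f(z) + log_f(z - h)) / h ** 2
    return z, A

z0, A = laplace_1d(log_f)

# Sanity check on an exactly Gaussian target with mean 1 and precision 4.
z_gauss, A_gauss = laplace_1d(lambda z: -2.0 * (z - 1.0) ** 2)
```

The approximation is then $q(z) = \mathcal{N}(z \mid z_0, A^{-1})$. For the Gaussian sanity check the method recovers the mean and precision, as expected, since the Laplace approximation is exact for Gaussian targets.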
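44 The Laplace Approximation (Cont'd)
- M-dimensional case: $\ln f(z) \simeq \ln f(z_0) - \frac{1}{2} (z - z_0)^T A (z - z_0)$, where $A = -\nabla \nabla \ln f(z) \big|_{z = z_0}$.
- Resulting approximation: $q(z) = \frac{|A|^{1/2}}{(2\pi)^{M/2}} \exp\left\{ -\frac{1}{2} (z - z_0)^T A (z - z_0) \right\} = \mathcal{N}(z \mid z_0, A^{-1})$.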
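45 Model Comparison and BIC
- Laplace approximation to the normalization constant Z: $Z = \int f(z) \, dz \simeq f(z_0) \frac{(2\pi)^{M/2}}{|A|^{1/2}}$.
- This result can be used to obtain an approximation to the model evidence, which plays a central role in Bayesian model comparison.
- Consider a set of models $\{M_i\}$ having parameters $\{\theta_i\}$. The log of the model evidence can be approximated as $\ln p(D) \simeq \ln p(D \mid \theta_{MAP}) + \ln p(\theta_{MAP}) + \frac{M}{2} \ln (2\pi) - \frac{1}{2} \ln |A|$.
- Further approximation with some additional assumptions gives the Bayesian Information Criterion (BIC): $\ln p(D) \simeq \ln p(D \mid \theta_{MAP}) - \frac{1}{2} M \ln N$.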
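46 Bayesian Logistic Regression
- Exact Bayesian inference is intractable.
- Gaussian prior: $p(w) = \mathcal{N}(w \mid m_0, S_0)$.
- Posterior: $p(w \mid \mathbf{t}) \propto p(w) \, p(\mathbf{t} \mid w)$.
- Log of posterior: $\ln p(w \mid \mathbf{t}) = -\frac{1}{2} (w - m_0)^T S_0^{-1} (w - m_0) + \sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \} + \text{const}$.
- Laplace approximation of the posterior distribution: $q(w) = \mathcal{N}(w \mid w_{MAP}, S_N)$, where $S_N^{-1} = -\nabla \nabla \ln p(w \mid \mathbf{t}) = S_0^{-1} + \sum_{n=1}^{N} y_n (1 - y_n) \phi_n \phi_n^T$.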
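47 Predictive Distribution
- Obtained by marginalizing w.r.t. the posterior distribution $p(w \mid \mathbf{t})$, which is approximated by a Gaussian q(w): $p(C_1 \mid \phi, \mathbf{t}) = \int p(C_1 \mid \phi, w) \, p(w \mid \mathbf{t}) \, dw \simeq \int \sigma(w^T \phi) \, q(w) \, dw = \int \sigma(a) \, p(a) \, da$, where $a = w^T \phi$.
- p(a) is a marginal distribution of a Gaussian, which is also Gaussian: $p(a) = \mathcal{N}(a \mid \mu_a, \sigma_a^2)$ with $\mu_a = w_{MAP}^T \phi$ and $\sigma_a^2 = \phi^T S_N \phi$.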
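48 Predictive Distribution (Cont'd)
- Resulting variational approximation to the predictive distribution: $p(C_1 \mid \mathbf{t}) = \int \sigma(a) \, \mathcal{N}(a \mid \mu_a, \sigma_a^2) \, da$.
- To integrate over a, we make use of the close similarity between the logistic sigmoid function and the probit function: $\sigma(a) \simeq \Phi(\lambda a)$ with $\lambda^2 = \pi / 8$.
- Then $\int \Phi(\lambda a) \, \mathcal{N}(a \mid \mu, \sigma^2) \, da = \Phi\left( \frac{\mu}{(\lambda^{-2} + \sigma^2)^{1/2}} \right)$, so $\int \sigma(a) \, \mathcal{N}(a \mid \mu, \sigma^2) \, da \simeq \sigma\left( \kappa(\sigma^2) \mu \right)$, where $\kappa(\sigma^2) = (1 + \pi \sigma^2 / 8)^{-1/2}$.
- Finally we get $p(C_1 \mid \phi, \mathbf{t}) = \sigma\left( \kappa(\sigma_a^2) \mu_a \right)$.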
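The quality of the probit-based approximation can be checked against a brute-force numerical integral of the sigmoid under a Gaussian. This is a sketch under assumptions: it uses the factor $\kappa(\sigma^2) = (1 + \pi \sigma^2 / 8)^{-1/2}$, and the function names, integration grid, and example values $\mu_a = 1$, $\sigma_a^2 = 4$ are chosen only for illustration.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def predictive_approx(mu_a, var_a):
    """Approximate integral of sigmoid(a) N(a | mu_a, var_a) as sigmoid(kappa * mu_a)."""
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var_a / 8.0)
    return sigmoid(kappa * mu_a)

def predictive_numeric(mu_a, var_a, n=20001):
    """Riemann-sum estimate of the same integral over mu_a +/- 10 standard deviations."""
    sd = np.sqrt(var_a)
    a = np.linspace(mu_a - 10.0 * sd, mu_a + 10.0 * sd, n)
    density = np.exp(-0.5 * ((a - mu_a) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    return float(np.sum(sigmoid(a) * density) * (a[1] - a[0]))

approx = predictive_approx(1.0, 4.0)
numeric = predictive_numeric(1.0, 4.0)
```

Marginalizing over a pulls the prediction towards 0.5 compared with the plug-in value $\sigma(\mu_a)$, and the effect grows with the variance $\sigma_a^2$; the decision boundary at probability 0.5 is unchanged.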
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationMark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.
University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationLecture 12. Neural Networks Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 12 Neural Networks 10.12.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationComputer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression
Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationMachine Learning. Lecture 3: Logistic Regression. Feng Li.
Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification
More information5. Discriminant analysis
5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationSGN (4 cr) Chapter 5
SGN-41006 (4 cr) Chapter 5 Linear Discriminant Analysis Jussi Tohka & Jari Niemi Department of Signal Processing Tampere University of Technology January 21, 2014 J. Tohka & J. Niemi (TUT-SGN) SGN-41006
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationMachine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber
Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to
More informationMidterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.
CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationMidterm. Introduction to Machine Learning. CS 189 Spring You have 1 hour 20 minutes for the exam.
CS 189 Spring 2013 Introduction to Machine Learning Midterm You have 1 hour 20 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. Please use non-programmable calculators
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationLogistic Regression & Neural Networks
Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability
More informationMachine Learning: Logistic Regression. Lecture 04
Machine Learning: Logistic Regression Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Supervised Learning Task = learn an (unkon function t : X T that maps input
More informationMachine Learning Lecture 10
Machine Learning Lecture 10 Neural Networks 26.11.2018 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Today s Topic Deep Learning 2 Course Outline Fundamentals Bayes
More informationComments. x > w = w > x. Clarification: this course is about getting you to be able to think as a machine learning expert
Logistic regression Comments Mini-review and feedback These are equivalent: x > w = w > x Clarification: this course is about getting you to be able to think as a machine learning expert There has to be
More informationMachine Learning - Waseda University Logistic Regression
Machine Learning - Waseda University Logistic Regression AD June AD ) June / 9 Introduction Assume you are given some training data { x i, y i } i= where xi R d and y i can take C different values. Given
More informationIntroduction to Machine Learning
Introduction to Machine Learning Thomas G. Dietterich tgd@eecs.oregonstate.edu 1 Outline What is Machine Learning? Introduction to Supervised Learning: Linear Methods Overfitting, Regularization, and the
More informationBayes Decision Theory
Bayes Decision Theory Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 16
More informationChapter 14 Combining Models
Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients
More informationClassification. Sandro Cumani. Politecnico di Torino
Politecnico di Torino Outline Generative model: Gaussian classifier (Linear) discriminative model: logistic regression (Non linear) discriminative model: neural networks Gaussian Classifier We want to
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationNeural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications
Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of
More informationLearning from Data Logistic Regression
Learning from Data Logistic Regression Copyright David Barber 2-24. Course lecturer: Amos Storkey a.storkey@ed.ac.uk Course page : http://www.anc.ed.ac.uk/ amos/lfd/ 2.9.8.7.6.5.4.3.2...2.3.4.5.6.7.8.9
More informationCSC 411: Lecture 04: Logistic Regression
CSC 411: Lecture 04: Logistic Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 23, 2015 Urtasun & Zemel (UofT) CSC 411: 04-Prob Classif Sep 23, 2015 1 / 16 Today Key Concepts: Logistic
More informationFeed-forward Networks Network Training Error Backpropagation Applications. Neural Networks. Oliver Schulte - CMPT 726. Bishop PRML Ch.
Neural Networks Oliver Schulte - CMPT 726 Bishop PRML Ch. 5 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of biological plausibility We will
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationMachine Learning Support Vector Machines. Prof. Matteo Matteucci
Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way
More information