Logistic Regression. Seungjin Choi
|
|
- Morgan Grant McDonald
- 6 years ago
- Views:
Transcription
1 Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin 1 / 30
2 Binary Classification 2 / 30
3 Binary Classification Consider a binary classification problem, i.e., C = {C 1, C 2 }. Given a particular vector x, we wish to assign it to one of the two classes. To this end, we use Bayes rule and calculate the relevant posterior probability: P(C 1 x) = P(x C 1)P(C 1 ) P(x) P(x C 1 )P(C 1 ) = P(x C 1 )P(C 1 ) + P(x C 2 )P(C 2 ) 1 = 1 + P(x C2) P(C 2) = P(x C 1) P(C 1) { 1 + exp log 1 [ P(x C1) P(x C 2) ] log [ ]} P(C1) P(C 2) 3 / 30
4 It can be written in the form of the logistic function: where P(C 1 x) = e ξ, [ ] [ ] P(x C1 ) P(C1 ) ξ = log + log. P(x C 2 ) P(C 2 ) }{{}}{{} likelihood ratio prior ratio Choose a particular form for the class-conditional density function P(x C i ). 4 / 30
5 Multivariate Gaussian Model Assume that the class-conditional densities are multivariate Gaussian with identical covariance matrix Σ: { 1 P(x C i ) = exp 1 } (2π) D 2 Σ (x µ i) Σ 1 (x µ i ). After a simple calculation, we have where P(C 1 x) = e (w x+b), w = Σ 1 (µ 1 µ 2 ), b = 1 2 (µ 2 + µ 1 ) Σ 1 (µ 2 µ 1 ) + log [ ] P(C1 ). P(C 2 ) 5 / 30
6 Properties of Logistic Function σ(ξ) 0 as ξ. σ(ξ) 1 as ξ. σ( ξ) = 1 σ(ξ). d dξ [σ(ξ)] = σ(ξ)σ( ξ) / 30
7 Maximum Likelihood Formulation 7 / 30
8 Logistic Regression Predict a binary output y n {0, 1} from an input x n. The logistic regression models the input-output by a conditional Bernoulli distribution where E [y n x n ] = P(y n = 1 x n ) = σ(w x n ), σ(ξ) = 1 eξ = 1 + e ξ 1 + e ξ. Discriminative model: Directly model p(y x). 8 / 30
9 Logistic Regression: MLE Given {(x n, y n ) n = 1,..., N}, the likelihood is given by p(y X, w) = = N p(y n = 1 x n ) yn (1 p(y n = 1 x n )) 1 yn N σ(w x n ) ( yn 1 σ(w x n ) ) 1 y n. Then log-likelihood function is given by L = N log p(y n x n ) = where σ n = σ(w x n ). N { } y n log σ n + (1 y n ) log(1 σ n ), This is a nonlinear function of w whose maximum cannot be computed in a closed form. Iterative re-weighted least squares (IRLS) is a popular algorithm, derived from Newton s method. 9 / 30
10 Newton s Method 10 / 30
11 Mathematical Preliminaries Gradient Hessian matrix Gradient descent/ascent Newton s method 11 / 30
12 Gradient Consider a real-valued function f (x), that takes a real-valued vector x R D as an input, f (x) : R D R. The gradient of f (x) is defined by f (x) = = f (x) x [ f,, x 1 ] f. x D 12 / 30
13 Hessian Matrix If f (x) belongs to the class C 2, the Hessian matrix H is defined as the symmetric matrix with the (i, j)-element 2 f (x) x i x j, H = 2 f (x) [ 2 ] f (x) = x i x j = 2 f x f x D x 1 [ f (x) 2 f x 1 x 2 2 f x 1 x D. 2 f x D x 2 2 f xd 2 ]. = x x = [ f (x)] x 13 / 30
14 Gradient Descent/Ascent The gradient descent/ascent learning is a simple (first-order) iterative method for minimization/maximization. Gradient descent: iterative minimization ( ) J w (k+1) = w (k) η. w Gradient ascent: iterative maximization ( ) J w (k+1) = w (k) + η. w Learning rate: η > 0 14 / 30
15 Newton s Method The basic idea of Newton s method is to optimize the quadratic approximation of the objective function J (w) around the current point w (k). The second-order Taylor series expansion of J (w) at the current point w (k) gives [ ] J 2(w) = J (w (k) ) + J (w (k) ) (w w (k)) + 1 ( w w (k)) ( 2 J (w (k) ) w w (k)), 2 where 2 J (w (k) ) is the Hessian of J (w) evaluated at w = w (k). Differentiate this w.r.t. w and set it equal 0, which leads to J (w (k) ) + 2 J (w (k) )w 2 J (w (k) )w (k) = 0, 15 / 30
16 Thus we have [ 1 w = w (k) 2 J (w )] (k) J (w (k) ). Hence the Newton s method is of the form [ 1 w (k+1) = w (k) 2 J (w )] (k) J (w (k) ). Remark: The Hessian 2 J (w (k) ) should be positive definite for all t. 16 / 30
17 Illustration of Newton s Method f(x) f quad (x) f(x) f quad (x) x k x k +d k x k x k +d k (a) convex (b) non-convex [Figure source: Murphy s book] 17 / 30
18 Logistic Regression: Algorithms 18 / 30
19 Logistic Regression: Gradient Ascent The gradient ascent learning has the form w new = w old ( ) L + η. w Calculate the gradient L w = t { yn σ(w x n ) } x n. Hence, the gradient ascent update rule for w is w new = w old + η { ( ( y n σ w old) )} xn x n. t 19 / 30
20 Detailed Calculation of Gradient Recall the log-likelihood L = N [ ] y n log σ n + (1 y n) log(1 σ n). Calculate N L w = [ ] σ n y n x n + (1 y n) σ n x n σ n 1 σ n = = = N [ ] σ n(1 σ n) σn(1 σn) y n (1 y n) x n σ n 1 σ n N [y n(1 σ n) (1 y n)σ n] x n N (y n σ n) x n. 20 / 30
21 Detailed Calculation of Hessian Calculate the Hessian: 2 L = = = w [ L] [ N ] (y n σ n) x n w N σ n(1 σ n)x nx n. 21 / 30
22 We set the objective function J (w) as the negative log-likelihood: J (w) = L(w) = N Thus, the gradient and the Hessian are: J (w) = 2 J (w) = [ ] y n log σ n + (1 y n ) log(1 σ n ). N (y n σ n ) x n, N σ n (1 σ n )x n x n. 22 / 30
23 Logistic Regression: IRLS Newton s update has the form [ N w = σ n (1 σ n ) x n x n ] 1 }{{} inverse of Hessian, [ 2 J (w)] 1 [ ] N (y n σ n ) x n, } {{ } gradient, J (w) Newton s update reduces to iterative re-weighted least squares (IRLS): where S = b = w = ( XSX ) 1 XSb, σ 1 (1 σ 1 ) σ N (1 σ N ) y 1 σ 1 σ 1(1 σ 1). y N σ N σ N (1 σ N )., 23 / 30
24 Alternatively we write w k+1 = w k + ( XS k X ) 1 XSk b k, where S k and b k are computed using w k. That is, w k+1 = w k + ( XS k X ) 1 XSk b k = ( XS k X ) 1 [( XSk X ) ] w k + XS k b k = ( XS k X ) 1 XSk [ X w k + b k ]. 24 / 30
25 Algorithm Outline Algorithm 1 IRLS Input: {(x n, y n ) n = 1,..., N} 1: Initialize w = 0 and w 0 = log (y/(1 y)) 2: repeat 3: for n = 1,..., N do 4: Compute η n = w x n + w 0 5: Compute σ n = σ(η n ) 6: Compute s n = σ n (1 σ n ) 7: Compute z n = η n + yn σn s n 8: end for 9: Construct S = diag(s 1:N ) 10: Update w = ( XSX ) 1 XSz 11: until convergence 12: return w 25 / 30
26 Generalized Linear Models 26 / 30
27 Generalized Linear Models The GLM is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM consists of three elements: 1. A probability distribution from the exponential family. 2. A linear predictor η = X θ 3. A link function g( ) such that E [y X] = µ = g 1 (η). 27 / 30
28 Logistic Regression: Generalized Linear Model Bernoulli distribution for a univariate binary random variable y {0, 1} with mean µ, p(y µ) = µ y (1 µ) 1 y. ( ) Logit transform (log-odds), θ = w x = log µ 1 µ, leads to [0, 1] R. The logistic function, which is the inverse-logit, gives µ = 1 1+e θ = σ(θ). Thus, we have p(y θ) = σ(θ) y (1 σ(θ)) 1 y = σ(θ) y σ( θ) 1 y 28 / 30
29 Multiclass Extension 29 / 30
30 Multiclass Logistic Regression (Multinomial Logistic Regression) Model input-output by a softmax transformation of logits θ k = w k x n: p(y n = k x n) = softmax(θ k ) = exp{w k x n} K j=1 exp{w j x n} Given Y R K N (each column y n R K follows the 1-of-K coding) and X R D N, the likelihood is given by p(y X, w 1,..., w K ) = The log-likelihood is given by L = N k=1 N k=1 K p(y n = k x n) Y k,n. K Y k,n log [p(y n = k x n)]. One can apply Newton s update to derive IRLS, as in logistic regression. 30 / 30
Ch 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationBayes Decision Theory
Bayes Decision Theory Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 16
More informationMachine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber
Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to
More informationLinear and logistic regression
Linear and logistic regression Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Linear and logistic regression 1/22 Outline 1 Linear regression 2 Logistic regression 3 Fisher discriminant analysis
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationIterative Reweighted Least Squares
Iterative Reweighted Least Squares Sargur. University at Buffalo, State University of ew York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationLecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016
Lecture 7 Logistic Regression Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 11, 2016 Luigi Freda ( La Sapienza University) Lecture 7 December 11, 2016 1 / 39 Outline 1 Intro Logistic
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationOutline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation
More informationMachine Learning. Lecture 3: Logistic Regression. Feng Li.
Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification
More informationLogistic Regression. Sargur N. Srihari. University at Buffalo, State University of New York USA
Logistic Regression Sargur N. University at Buffalo, State University of New York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative 1. Fixed basis
More informationMulticlass Logistic Regression
Multiclass Logistic Regression Sargur. Srihari University at Buffalo, State University of ew York USA Machine Learning Srihari Topics in Linear Classification using Probabilistic Discriminative Models
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationExpectation Maximization
Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationNonparameteric Regression:
Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationLinear Classification: Probabilistic Generative Models
Linear Classification: Probabilistic Generative Models Sargur N. University at Buffalo, State University of New York USA 1 Linear Classification using Probabilistic Generative Models Topics 1. Overview
More informationLogistic Regression. Mohammad Emtiyaz Khan EPFL Oct 8, 2015
Logistic Regression Mohammad Emtiyaz Khan EPFL Oct 8, 2015 Mohammad Emtiyaz Khan 2015 Classification with linear regression We can use y = 0 for C 1 and y = 1 for C 2 (or vice-versa), and simply use least-squares
More informationStatistical Machine Learning Hilary Term 2018
Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationIntroduction to Machine Learning
How o you estimate p(y x)? Outline Contents Introuction to Machine Learning Logistic Regression Varun Chanola April 9, 207 Generative vs. Discriminative Classifiers 2 Logistic Regression 2 3 Logistic Regression
More informationRegression with Numerical Optimization. Logistic
CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationComments. x > w = w > x. Clarification: this course is about getting you to be able to think as a machine learning expert
Logistic regression Comments Mini-review and feedback These are equivalent: x > w = w > x Clarification: this course is about getting you to be able to think as a machine learning expert There has to be
More informationLecture 5: LDA and Logistic Regression
Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant
More informationDATA MINING AND MACHINE LEARNING. Lecture 4: Linear models for regression and classification Lecturer: Simone Scardapane
DATA MINING AND MACHINE LEARNING Lecture 4: Linear models for regression and classification Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Linear models for regression Regularized
More informationIndependent Component Analysis (ICA)
Independent Component Analysis (ICA) Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationMachine Learning. 7. Logistic and Linear Regression
Sapienza University of Rome, Italy - Machine Learning (27/28) University of Rome La Sapienza Master in Artificial Intelligence and Robotics Machine Learning 7. Logistic and Linear Regression Luca Iocchi,
More informationCS489/698: Intro to ML
CS489/698: Intro to ML Lecture 04: Logistic Regression 1 Outline Announcements Baseline Learning Machine Learning Pyramid Regression or Classification (that s it!) History of Classification History of
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationRegression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh.
Regression Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh September 24 (All of the slides in this course have been adapted from previous versions
More informationNonnegative Matrix Factorization
Nonnegative Matrix Factorization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 305 Part VII
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationMachine Learning. Lecture 04: Logistic and Softmax Regression. Nevin L. Zhang
Machine Learning Lecture 04: Logistic and Softmax Regression Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering The Hong Kong University of Science and Technology This set
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction
More informationLinear Classification
Linear Classification Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 23 April 2015 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationLogistic Regression. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, / 48
Logistic Regression Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, 2017 1 / 48 Outline 1 Administration 2 Review of last lecture 3 Logistic regression
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationCS 340 Lec. 16: Logistic Regression
CS 34 Lec. 6: Logistic Regression AD March AD ) March / 6 Introduction Assume you are given some training data { x i, y i } i= where xi R d and y i can take C different values. Given an input test data
More informationMLPR: Logistic Regression and Neural Networks
MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition Amos Storkey Amos Storkey MLPR: Logistic Regression and Neural Networks 1/28 Outline 1 Logistic Regression 2 Multi-layer
More informationOutline. MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition. Which is the correct model? Recap.
Outline MLPR: and Neural Networks Machine Learning and Pattern Recognition 2 Amos Storkey Amos Storkey MLPR: and Neural Networks /28 Recap Amos Storkey MLPR: and Neural Networks 2/28 Which is the correct
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationGradient-Based Learning. Sargur N. Srihari
Gradient-Based Learning Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation
More informationCS340 Machine learning Gaussian classifiers
CS340 Machine learning Gaussian classifiers 1 Correlated features Height and weight are not independent 2 Multivariate Gaussian Multivariate Normal (MVN) N(x µ,σ) def 1 (2π) p/2 Σ 1/2exp[ 1 2 (x µ)t Σ
More informationInformation Theory Primer:
Information Theory Primer: Entropy, KL Divergence, Mutual Information, Jensen s inequality Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationGaussian discriminant analysis Naive Bayes
DM825 Introduction to Machine Learning Lecture 7 Gaussian discriminant analysis Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. is 2. Multi-variate
More informationFisher s Linear Discriminant Analysis
Fisher s Linear Discriminant Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationLecture 16 Solving GLMs via IRWLS
Lecture 16 Solving GLMs via IRWLS 09 November 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Notes problem set 5 posted; due next class problem set 6, November 18th Goals for today fixed PCA example
More informationMachine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationThe classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know
The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is
More informationThe classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA
The Bayes classifier Linear discriminant analysis (LDA) Theorem The classifier satisfies In linear discriminant analysis (LDA), we make the (strong) assumption that where the min is over all possible classifiers.
More informationIntroduction to Machine Learning
Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationClassification. Sandro Cumani. Politecnico di Torino
Politecnico di Torino Outline Generative model: Gaussian classifier (Linear) discriminative model: logistic regression (Non linear) discriminative model: neural networks Gaussian Classifier We want to
More informationRapid Introduction to Machine Learning/ Deep Learning
Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/62 Lecture 1b Logistic regression & neural network October 2, 2015 2/62 Table of contents 1 1 Bird s-eye
More informationLogistic Regression. Will Monroe CS 109. Lecture Notes #22 August 14, 2017
1 Will Monroe CS 109 Logistic Regression Lecture Notes #22 August 14, 2017 Based on a chapter by Chris Piech Logistic regression is a classification algorithm1 that works by trying to learn a function
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationMachine Learning - Waseda University Logistic Regression
Machine Learning - Waseda University Logistic Regression AD June AD ) June / 9 Introduction Assume you are given some training data { x i, y i } i= where xi R d and y i can take C different values. Given
More informationIntroduction to Machine Learning
Introduction to Machine Learning Bayesian Classification Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationProbabilistic generative models
Linear models for classification Francesco Corona Probabilistic discriminative models Models with linear decision boundaries arise from assumptions about the data In a generative approach to classification,
More informationWeighted Least Squares I
Weighted Least Squares I for i = 1, 2,..., n we have, see [1, Bradley], data: Y i x i i.n.i.d f(y i θ i ), where θ i = E(Y i x i ) co-variates: x i = (x i1, x i2,..., x ip ) T let X n p be the matrix of
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationLogistic Regression. Stochastic Gradient Descent
Tutorial 8 CPSC 340 Logistic Regression Stochastic Gradient Descent Logistic Regression Model A discriminative probabilistic model for classification e.g. spam filtering Let x R d be input and y { 1, 1}
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationLecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2)
Lectures on Machine Learning (Fall 2017) Hyeong In Choi Seoul National University Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2) Topics to be covered:
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationClassification Logistic Regression
Classification Logistic Regression Machine Learning CSE546 Kevin Jamieson University of Washington October 16, 2016 1 THUS FAR, REGRESSION: PREDICT A CONTINUOUS VALUE GIVEN SOME INPUTS 2 Weather prediction
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationSequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015
Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative
More informationBayesian Logistic Regression
Bayesian Logistic Regression Sargur N. University at Buffalo, State University of New York USA Topics in Linear Models for Classification Overview 1. Discriminant Functions 2. Probabilistic Generative
More informationClassification Based on Probability
Logistic Regression These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton s slides and grateful acknowledgement to the many others who made their course materials freely
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationWeek 5: Logistic Regression & Neural Networks
Week 5: Logistic Regression & Neural Networks Instructor: Sergey Levine 1 Summary: Logistic Regression In the previous lecture, we covered logistic regression. To recap, logistic regression models and
More information