Machine Learning. Lecture 3: Logistic Regression. Feng Li.

Size: px
Start display at page:

Download "Machine Learning. Lecture 3: Logistic Regression. Feng Li."

Transcription

1 Machine Learning Lecture 3: Logistic Regression Feng Li School of Computer Science and Technology Shandong University Fall 2016

2 Logistic Regression Classification problem Similar to regression problem, but we would like to predict only a small number of discrete values (instead of continuous values) Binary classification problem: y takes only two values, 0 (negative class) and 1 (positive class) y (i) {0, 1} is also called the label for the training example Logistic regression Use a logistic function (or sigmoid function) g(z) = 1/(1 + e z ) to continuously approximate discrete classification 2 / 31

3 Logistic Regression (Contd.) Properties of the sigmoid function Bound g(z) (0, 1) Symmetric Gradient 1 g(z) = g( z) g (z) = g(z)(1 g(z)) 3 / 31

4 Logistic Regression (Contd.) Logistic regression defines h θ (x) using the sigmoid function h θ (x) = g(θ T x) = 1/(1 + e θt x ) Sigmoid computes a real-valued score (θ T x) for input x and squashes it between (0, 1) to turn this score into a probability (of x s label being 1) Thus, we have p(y = 1 x; θ) = h θ (x) = 1/(1 + exp( θ T x) p(y = 0 x; θ) = 1 h θ (x) = 1/(1 + exp(θ T x) 4 / 31

5 Logistic Regression: A Closer Look... What s the underlying decision rule in logistic regression? At the decision boundary, both classes are equiprobable; thus, we have p(y = 1 x; θ) = p(y = 0 x; θ) exp( θ T x) = exp(θ T x) exp(θ T x) = 1 θ T x = 0 Therefore, the decision boundary of logistic regression is nothing but a linear hyperplane Hence, y = 1 if θ T x 0; otherwise, y = 0 5 / 31

6 Interpreting The Probabilities... Recall that p(y = 1 x; θ) = exp( θ T x) The score θ T x is also a measure of distance of x from the hyperplane (the score is positive for pos. examples, and negative for neg. examples) High positive score: High probability of label 1 High negative score: Low probability of label 1 (high prob. of label 0) 6 / 31

7 Logistic Regression Formulation Logistic regression model h θ (x) = g(θ T x) = 1/(1 + e θt x ) Assume p(y = 1 x; θ) = h θ (x) and p(y = 0 x; θ) = 1 h θ (x), then we have p(y x; θ) = (h θ (x)) y (1 h θ (x)) 1 y If we assume y { 1, 1} instead of y {0, 1}, then p(y x; θ) = exp( yθ T x) Assuming the training examples were generated independently, we define the likelihood of the parameters as L(θ) = p(y X; θ) = i p(y (i) x (i) ; θ) = i (h θ (x (i) )) y(i) (1 h θ (x (i) )) 1 y(i) 7 / 31

8 Logistic Regression Formulation (Contd.) Maximize the log likelihood l(θ) = log L(θ) = m Gradient ascent algorithm θ j θ j + α θj l(θ) for j For one training example (x, y), ( ) y (i) log h(x (i) ) + (1 y (i) ) log(1 h(x (i) ) θ j l(θ) = (y h θ (x))x j If stochastic gradient ascent rule is employed θ j = θ j + α(y i h θ (x i))x i,j 8 / 31

9 Newton s Method Given a differentiable real-valued f : R R, how can we find θ such that f(θ) = 0? The following iteration is performed until a sufficiently accurate value is achieved x x f(x) f (x) 9 / 31

10 Newton s Method (Contd.) To maximize f(x), we have to find the stationary point of f(x) such that f (x) = 0. According to Newton s method, we have the following update x x f (x) f (x) Newton-Raphson method: For l : R m R, we generalization Newton s method to the multidimensional setting θ θ H 1 θ l(θ) where H is the Hessian matrix H i,j = 2 l(θ) θ i θ j 10 / 31

11 Newton s Method (Contd.) Higher convergence speed than (batch) gradient descent Fewer iterations to approach the minimum However, each iteration is more expensive than the one of gradient descent Finding and inverting an m m Hessian More details about Newton s method can be found at 11 / 31

12 Generalized Linear Models (GLMs) So far, Regression problem y x; θ N (µ, σ 2 ) Classification problem y x; θ Bernoulli(φ) In fact, both of these methods are special cases of a broader family of models, so-called generalized linear models (GLMs) 12 / 31

13 The Exponential Family An exponential family is a set of probability distributions of the following form p(y; η) = b(y) exp(η T Φ(y) a(η)) η is called the natural parameter (also called the canonical parameter) Φ(y) is the sufficient statistic (we usually take Φ(y) = y) b(y) is the underlying measure (e.g., counting measure or Lebesgue measure) a(η) is the log partition (or cumulant) function exp (a(η)) is a constant for normalization purpose, such that a(η) is not a degree of freedom in the specification of an exponential family density Legesgue integral : exp (a(η)) = b(y) exp ( η T Φ(y) ) dx Φ( ), a( ) and b( ) define a family of distributions parameterized by η 13 / 31

14 The Exponential Family (Contd.) For exponential family, we have p(y; η)dy = b(y) exp(η T Φ(y) a(η))dy = 1 y Calculate derivative w.r.t. η a (η) = b(y) exp(η T Φ(y) a(η))φ(y)dy Maximizing the log-likelihood l(η) = y y m log b(y (i) ) ma(η) + η T m Φ(y (i) ) Thus a (η) = 1 m m Φ(y(i) ) Sufficiency: The likelihood for η only depends on y through Φ(y) When m, a (η) E(Φ(y)) 14 / 31

15 The Exponential Family (Contd.) Bernoulli distribution p(y; ψ) = ψ y (1 ψ) 1 y = exp(y log ψ + (1 y) log(1 ψ)) = exp (y log ψ + log(1 ψ) y log(1 ψ)) ( ) ψ = exp y log + log(1 ψ) 1 ψ η = log(ψ/(1 ψ)) ( and thus ψ = 1/(1 + e η )) Φ(y) = y a(η) = log(1 ψ) = log(1 + e η ) b(y) = 1 15 / 31

16 The Exponential Family (Contd.) Gaussian distribution N (µ, σ 2 ) ( p(y; µ, σ 2 1 ) = exp 1 ) (y µ)2 2πσ 2σ2 ( 1 µ = exp 2π σ 2 y 1 2σ 2 y2 1 ) 2σ 2 µ2 log σ [ ] µ/σ 2 η = 1/2σ [ ] 2 y Φ(y) = y 2 a(η) = µ2 + log σ = η2 2σ 2 1 4η 2 1 log( 2η2) 2 b(y) = 1/ 2π 16 / 31

17 The Exponential Family (Contd.) Piosson distribution p(y; λ) = λy e λ y! = 1 exp (y log λ λ) y! η = log λ Φ(y) = y a(η) = λ = e η b(y) = 1/y! 17 / 31

18 The Exponential Family (Contd.) Multinomial distribution p(y; ψ) = = = = M! x 1!x 2! x k! ψx1 1 ψx2 2 ψx k k ( k ) M! x 1!x 2! x k! exp x i log ψ i M! x 1!x 2! x k! exp M! x 1!x 2! x k! exp ( k 1 ( k 1 k 1 x i log ψ i + (1 log( ψ i 1 k 1 ψ i k 1 x i ) log(1 k 1 )x i + log(1 ψ i ) ψ i ) ) ) 18 / 31

19 Constructing GLMs Assumptions y x; θ ExponentialFamily(η) Φ(y) = y which means h(x) = E[y x] η = θ T x Poisson distribution is an exponential family distribution, so the question is, how to apply GLM for the above problem. 19 / 31

20 Constructing GLMs (Contd.) Ordinary least square y x N (µ, σ 2 ) Since η = µ, we have h θ (x) = E[y x; θ] = µ = η = θ T x 20 / 31

21 Constructing GLMs (Contd.) Logistic regression For Bernoulli distribution, ψ = 1/(1 + e η ) Therefore, we have h θ (x) = E[y x; θ] = ψ = 1/(1 + e η ) = 1/(1 + e θt x ) 21 / 31

22 Multiclass Classification Multiclass (or multinomial) classification is the problem of classifying instances into one of the more than two classes The existing multiclass classification techniques can be categorized into Transformation to binary Extension from binary Hierarchical classification 22 / 31

23 Transformation to Binary One-vs.-rest (one-vs.-all, OvA or OvR, one-against-all, OAA) strategy is to train a single classifier per class, with the samples of that class as positive samples and all other samples as negative ones Inputs: A learning algorithm L, samples x, labels y where y i {1,..., K} is the label for the sample x i Output: A list of classifier f k for k {1,, K} Procedure: For each k {1,.K}, construct a new label vector z where z i = 1 if y i = k and z i = 0 otherwise, and then apply L to x and z to obtain f k Making decision: y = arg min k f k (x) 23 / 31

24 Transformation to Binary One-vs.-one reduction K(K 1)/2 binary classifiers is trained for a K-way multiclass problem Each receives the samples of a pair of classes from the original training set At prediction time, a voting scheme is applied: all classifiers are applied to the incoming sample, and the class that got the largest number of positive predictions is chosen 24 / 31

25 Softmax Regression (Contd.) A classification problem where y {1, 2,, k} Multinomial distribution A generalization of the binomial distribution A trial has k outcomes and we perform the trial n times independently, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories Assuming ψ i = p(y = i; ψ) (i = 1,, k) denote the probability of i-th outcomes, since i ψi = 1, the multinomial distribution can be parameterized by only k 1 parameters {ψ 1, ψ 2,, ψ k 1 } Question: Can we express the multinomial as an exponential family distribution? 25 / 31

26 Defining Φ(y) R k 1 Softmax Regression (Contd.) Φ(1) =. Φ(2) = 1 0 Φ(k 1) =.. Φ(k) = Assuming 1{T rue} = 1 and 1{F alse} = 0, we have p(y; ψ) = ψ 1{y=1} 1 ψ 1{y=2} 2 ψ 1{y=k} k = ψ 1{y=1} 1 ψ 1{y=2} 2 ψ 1 k 1 k 1{y=k} = ψ (Φ(y))1 1 ψ (Φ(y))2 2 ψ 1 k 1 (Φ(y))i k ( ( = exp log ψ (Φ(y))1 1 ψ (Φ(y))2 2 ψ 1 k 1 k ( k 1 ) = exp (Φ(y)) i log(ψ i /ψ k ) + log ψ k (Φ(y))i )) 26 / 31

27 Softmax Regression (Contd.) We have p(y; ψ) = exp ( k 1 ) (Φ(y)) i log(ψ i /ψ k ) + log ψ k = b(y) exp(η T Φ(y) a(η)) η = log(ψ 1/φ k ) log(ψ 2/φ k ) log(ψ k 1 /φ k ) a(η) = log(ψ k ) b(y) = 1 27 / 31

28 Softmax Regression (Contd.) The link function is Then, we have η i = log ψ i ψ k, for i = 1, 2,, k k e ηi = ψ i /ψ k, ψ i = ψ k e ηi, ψ k e ηi = Therefore, the response function is derived as follows ψ k = 1/ k ψ i = 1 k e ηi e ηi, ψ i = k j=1 eηj The function mapping from η s to the ψ s is so-called the softmax function 28 / 31

29 Softmax Regression (Contd.) Since η i = θi T x (for i = 1, 2,, k 1), we have the following softmax model p(y = i x, w) = ψ i = e ηi e θt i x k = k j=1 eηj j=1 eθt j x Our hypothesis outputs h θ (x) = E[Φ(y) x; θ] = ψ 1 ψ 2. ψ k 1 = exp(θ T 1 x) k j=1 exp(θt j x) exp(θ T 2 x) k j=1 exp(θt j x). exp(θ T k 1 x) k j=1 exp(θt j x) 29 / 31

30 Softmax Regression (Contd.) Training data set {(x (1), y (1) ), (x (2), y (2) ),..., (x (m), y (m) )} Log-likelihood function m m l(θ) = log p(y i x i ; θ) = log ( ) 1{y k e θt r xi k j=1 eθt j x(i) r=1 Maximizing l(θ) through gradient ascent or Newton s method (i) =r} 30 / 31

31 Thanks! Q & A 31 / 31

CS229 Lecture notes. Andrew Ng

CS229 Lecture notes. Andrew Ng CS229 Lecture notes Andrew Ng Supervised learning Lets start by talking about a few examples of supervised learning problems Suppose we have a dataset giving the living areas and prices of 47 houses from

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.

Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods

More information

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012) Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation

More information

Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2)

Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2) Lectures on Machine Learning (Fall 2017) Hyeong In Choi Seoul National University Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2) Topics to be covered:

More information

Lecture 3 - Linear and Logistic Regression

Lecture 3 - Linear and Logistic Regression 3 - Linear and Logistic Regression-1 Machine Learning Course Lecture 3 - Linear and Logistic Regression Lecturer: Haim Permuter Scribe: Ziv Aharoni Throughout this lecture we talk about how to use regression

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information

Machine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber

Machine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to

More information

Comments. x > w = w > x. Clarification: this course is about getting you to be able to think as a machine learning expert

Comments. x > w = w > x. Clarification: this course is about getting you to be able to think as a machine learning expert Logistic regression Comments Mini-review and feedback These are equivalent: x > w = w > x Clarification: this course is about getting you to be able to think as a machine learning expert There has to be

More information

Logistic Regression. Seungjin Choi

Logistic Regression. Seungjin Choi Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Regression with Numerical Optimization. Logistic

Regression with Numerical Optimization. Logistic CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204

More information

CS229 Supplemental Lecture notes

CS229 Supplemental Lecture notes CS229 Supplemental Lecture notes John Duchi Binary classification In binary classification problems, the target y can take on at only two values. In this set of notes, we show how to model this problem

More information

Generalized Linear Models and Logistic Regression

Generalized Linear Models and Logistic Regression Department of Electrical and Computer Engineering University of Texas at Austin asharma@ece.utexas.edu Machine Learning - What do we want the machines to learn? We want humanoid agents capable of making

More information

Machine Learning. Lecture 2: Linear regression. Feng Li. https://funglee.github.io

Machine Learning. Lecture 2: Linear regression. Feng Li. https://funglee.github.io Machine Learning Lecture 2: Linear regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2017 Supervised Learning Regression: Predict

More information

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

STA216: Generalized Linear Models. Lecture 1. Review and Introduction STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models David Rosenberg New York University April 12, 2015 David Rosenberg (New York University) DS-GA 1003 April 12, 2015 1 / 20 Conditional Gaussian Regression Gaussian Regression Input

More information

Gaussian and Linear Discriminant Analysis; Multiclass Classification

Gaussian and Linear Discriminant Analysis; Multiclass Classification Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015

More information

Logistic Regression and Generalized Linear Models

Logistic Regression and Generalized Linear Models Logistic Regression and Generalized Linear Models Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/2 Topics Generative vs. Discriminative models In

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Generalized Linear Models and Exponential Families

Generalized Linear Models and Exponential Families Generalized Linear Models and Exponential Families David M. Blei COS424 Princeton University April 12, 2012 Generalized Linear Models x n y n β Linear regression and logistic regression are both linear

More information

CS 340 Lec. 16: Logistic Regression

CS 340 Lec. 16: Logistic Regression CS 34 Lec. 6: Logistic Regression AD March AD ) March / 6 Introduction Assume you are given some training data { x i, y i } i= where xi R d and y i can take C different values. Given an input test data

More information

Lecture 10. Neural networks and optimization. Machine Learning and Data Mining November Nando de Freitas UBC. Nonlinear Supervised Learning

Lecture 10. Neural networks and optimization. Machine Learning and Data Mining November Nando de Freitas UBC. Nonlinear Supervised Learning Lecture 0 Neural networks and optimization Machine Learning and Data Mining November 2009 UBC Gradient Searching for a good solution can be interpreted as looking for a minimum of some error (loss) function

More information

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two

More information

Ch 4. Linear Models for Classification

Ch 4. Linear Models for Classification Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,

More information

LECTURE 11: EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS

LECTURE 11: EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS LECTURE : EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS HANI GOODARZI AND SINA JAFARPOUR. EXPONENTIAL FAMILY. Exponential family comprises a set of flexible distribution ranging both continuous and

More information

Lecture 4 Logistic Regression

Lecture 4 Logistic Regression Lecture 4 Logistic Regression Dr.Ammar Mohammed Normal Equation Hypothesis hθ(x)=θ0 x0+ θ x+ θ2 x2 +... + θd xd Normal Equation is a method to find the values of θ operations x0 x x2.. xd y x x2... xd

More information

Machine Learning - Waseda University Logistic Regression

Machine Learning - Waseda University Logistic Regression Machine Learning - Waseda University Logistic Regression AD June AD ) June / 9 Introduction Assume you are given some training data { x i, y i } i= where xi R d and y i can take C different values. Given

More information

Multiclass Logistic Regression

Multiclass Logistic Regression Multiclass Logistic Regression Sargur. Srihari University at Buffalo, State University of ew York USA Machine Learning Srihari Topics in Linear Classification using Probabilistic Discriminative Models

More information

Rapid Introduction to Machine Learning/ Deep Learning

Rapid Introduction to Machine Learning/ Deep Learning Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/62 Lecture 1b Logistic regression & neural network October 2, 2015 2/62 Table of contents 1 1 Bird s-eye

More information

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative

More information

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

More information

Stochastic gradient descent; Classification

Stochastic gradient descent; Classification Stochastic gradient descent; Classification Steve Renals Machine Learning Practical MLP Lecture 2 28 September 2016 MLP Lecture 2 Stochastic gradient descent; Classification 1 Single Layer Networks MLP

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

Machine Learning 2017

Machine Learning 2017 Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging

More information

CPSC 340 Assignment 4 (due November 17 ATE)

CPSC 340 Assignment 4 (due November 17 ATE) CPSC 340 Assignment 4 due November 7 ATE) Multi-Class Logistic The function example multiclass loads a multi-class classification datasetwith y i {,, 3, 4, 5} and fits a one-vs-all classification model

More information

Machine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang Machine Learning Basics Lecture 7: Multiclass Classification Princeton University COS 495 Instructor: Yingyu Liang Example: image classification indoor Indoor outdoor Example: image classification (multiclass)

More information

Logistic Regression. Will Monroe CS 109. Lecture Notes #22 August 14, 2017

Logistic Regression. Will Monroe CS 109. Lecture Notes #22 August 14, 2017 1 Will Monroe CS 109 Logistic Regression Lecture Notes #22 August 14, 2017 Based on a chapter by Chris Piech Logistic regression is a classification algorithm1 that works by trying to learn a function

More information

Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks

Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Jan Drchal Czech Technical University in Prague Faculty of Electrical Engineering Department of Computer Science Topics covered

More information

MLPR: Logistic Regression and Neural Networks

MLPR: Logistic Regression and Neural Networks MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition Amos Storkey Amos Storkey MLPR: Logistic Regression and Neural Networks 1/28 Outline 1 Logistic Regression 2 Multi-layer

More information

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

More information

Outline. MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition. Which is the correct model? Recap.

Outline. MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition. Which is the correct model? Recap. Outline MLPR: and Neural Networks Machine Learning and Pattern Recognition 2 Amos Storkey Amos Storkey MLPR: and Neural Networks /28 Recap Amos Storkey MLPR: and Neural Networks 2/28 Which is the correct

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Stochastic Gradient Descent

Stochastic Gradient Descent Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular

More information

Iterative Reweighted Least Squares

Iterative Reweighted Least Squares Iterative Reweighted Least Squares Sargur. University at Buffalo, State University of ew York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative

More information

Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

More information

Machine Learning. Theory of Classification and Nonparametric Classifier. Lecture 2, January 16, What is theoretically the best classifier

Machine Learning. Theory of Classification and Nonparametric Classifier. Lecture 2, January 16, What is theoretically the best classifier Machine Learning 10-701/15 701/15-781, 781, Spring 2008 Theory of Classification and Nonparametric Classifier Eric Xing Lecture 2, January 16, 2006 Reading: Chap. 2,5 CB and handouts Outline What is theoretically

More information

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

More information

Logistic Regression Trained with Different Loss Functions. Discussion

Logistic Regression Trained with Different Loss Functions. Discussion Logistic Regression Trained with Different Loss Functions Discussion CS640 Notations We restrict our discussions to the binary case. g(z) = g (z) = g(z) z h w (x) = g(wx) = + e z = g(z)( g(z)) + e wx =

More information

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc.

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010

More information

Linear and Logistic Regression

Linear and Logistic Regression Linear and Logistic Regression Marta Arias marias@lsi.upc.edu Dept. LSI, UPC Fall 2012 Linear regression Simple case: R 2 Here is the idea: 1. Got a bunch of points in R 2, {(x i, y i )}. 2. Want to fit

More information

ESS2222. Lecture 4 Linear model

ESS2222. Lecture 4 Linear model ESS2222 Lecture 4 Linear model Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Logistic Regression Predicting Continuous Target Variables Support Vector Machine (Some Details)

More information

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.

More information

Applied Machine Learning Lecture 5: Linear classifiers, continued. Richard Johansson

Applied Machine Learning Lecture 5: Linear classifiers, continued. Richard Johansson Applied Machine Learning Lecture 5: Linear classifiers, continued Richard Johansson overview preliminaries logistic regression training a logistic regression classifier side note: multiclass linear classifiers

More information

Linear Classification

Linear Classification Linear Classification Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 23 April 2015 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative

More information

Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018

Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 9 Sep. 26, 2018

Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 9 Sep. 26, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 9 Sep. 26, 2018 1 Reminders Homework 3:

More information

Gaussian discriminant analysis Naive Bayes

Gaussian discriminant analysis Naive Bayes DM825 Introduction to Machine Learning Lecture 7 Gaussian discriminant analysis Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. is 2. Multi-variate

More information

Machine Learning. Lecture 04: Logistic and Softmax Regression. Nevin L. Zhang

Machine Learning. Lecture 04: Logistic and Softmax Regression. Nevin L. Zhang Machine Learning Lecture 04: Logistic and Softmax Regression Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering The Hong Kong University of Science and Technology This set

More information

Generative v. Discriminative classifiers Intuition

Generative v. Discriminative classifiers Intuition Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

More information

ECS171: Machine Learning

ECS171: Machine Learning ECS171: Machine Learning Lecture 3: Linear Models I (LFD 3.2, 3.3) Cho-Jui Hsieh UC Davis Jan 17, 2018 Linear Regression (LFD 3.2) Regression Classification: Customer record Yes/No Regression: predicting

More information

Machine Learning, Fall 2012 Homework 2

Machine Learning, Fall 2012 Homework 2 0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0

More information

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

Logistic Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Logistic Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com Logistic Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

Generative v. Discriminative classifiers Intuition

Generative v. Discriminative classifiers Intuition Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features

More information

Logistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824

Logistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824 Logistic Regression Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative Please start HW 1 early! Questions are welcome! Two principles for estimating parameters Maximum Likelihood

More information

Logistic Regression. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, / 48

Logistic Regression. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, / 48 Logistic Regression Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, 2017 1 / 48 Outline 1 Administration 2 Review of last lecture 3 Logistic regression

More information

Classification Based on Probability

Classification Based on Probability Logistic Regression These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton s slides and grateful acknowledgement to the many others who made their course materials freely

More information

Lecture 3: Multiclass Classification

Lecture 3: Multiclass Classification Lecture 3: Multiclass Classification Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Some slides are adapted from Vivek Skirmar and Dan Roth CS6501 Lecture 3 1 Announcement v Please enroll in

More information

Notes on Andrew Ng s CS 229 Machine Learning Course

Notes on Andrew Ng s CS 229 Machine Learning Course Notes on Andrew Ng s CS 9 Machine Learning Course Tyler Neylon 331.016 These are notes I m taking as I review material from Andrew Ng s CS 9 course on machine learning. Specifically, I m watching these

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

Machine Learning for NLP

Machine Learning for NLP Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline

More information

ECE 5984: Introduction to Machine Learning

ECE 5984: Introduction to Machine Learning ECE 5984: Introduction to Machine Learning Topics: Classification: Logistic Regression NB & LR connections Readings: Barber 17.4 Dhruv Batra Virginia Tech Administrativia HW2 Due: Friday 3/6, 3/15, 11:55pm

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

Exponential Families

Exponential Families Exponential Families David M. Blei 1 Introduction We discuss the exponential family, a very flexible family of distributions. Most distributions that you have heard of are in the exponential family. Bernoulli,

More information

Machine Learning. Linear Models. Fabio Vandin October 10, 2017

Machine Learning. Linear Models. Fabio Vandin October 10, 2017 Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Machine Learning and Data Mining. Linear classification. Kalev Kask

Machine Learning and Data Mining. Linear classification. Kalev Kask Machine Learning and Data Mining Linear classification Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ = f(x ; q) Parameters q Program ( Learner ) Learning algorithm Change q

More information

Midterm exam CS 189/289, Fall 2015

Midterm exam CS 189/289, Fall 2015 Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points

More information

Lecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016

Lecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016 Lecture 7 Logistic Regression Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 11, 2016 Luigi Freda ( La Sapienza University) Lecture 7 December 11, 2016 1 / 39 Outline 1 Intro Logistic

More information

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

More information

Regression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011!

Regression and Classification with Linear Models CMPSCI 383 Nov 15, 2011! Regression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011! 1 Todayʼs topics" Learning from Examples: brief review! Univariate Linear Regression! Batch gradient descent! Stochastic gradient

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should

More information

Lecture 16 Solving GLMs via IRWLS

Lecture 16 Solving GLMs via IRWLS Lecture 16 Solving GLMs via IRWLS 09 November 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Notes problem set 5 posted; due next class problem set 6, November 18th Goals for today fixed PCA example

More information

CS489/698: Intro to ML

CS489/698: Intro to ML CS489/698: Intro to ML Lecture 04: Logistic Regression 1 Outline Announcements Baseline Learning Machine Learning Pyramid Regression or Classification (that s it!) History of Classification History of

More information