Machine Learning. Lecture 3: Logistic Regression. Feng Li.
|
|
- Blaze Carter
- 5 years ago
- Views:
Transcription
1 Machine Learning Lecture 3: Logistic Regression Feng Li School of Computer Science and Technology Shandong University Fall 2016
2 Logistic Regression Classification problem Similar to regression problem, but we would like to predict only a small number of discrete values (instead of continuous values) Binary classification problem: y takes only two values, 0 (negative class) and 1 (positive class) y (i) {0, 1} is also called the label for the training example Logistic regression Use a logistic function (or sigmoid function) g(z) = 1/(1 + e z ) to continuously approximate discrete classification 2 / 31
3 Logistic Regression (Contd.) Properties of the sigmoid function Bound g(z) (0, 1) Symmetric Gradient 1 g(z) = g( z) g (z) = g(z)(1 g(z)) 3 / 31
4 Logistic Regression (Contd.) Logistic regression defines h θ (x) using the sigmoid function h θ (x) = g(θ T x) = 1/(1 + e θt x ) Sigmoid computes a real-valued score (θ T x) for input x and squashes it between (0, 1) to turn this score into a probability (of x s label being 1) Thus, we have p(y = 1 x; θ) = h θ (x) = 1/(1 + exp( θ T x) p(y = 0 x; θ) = 1 h θ (x) = 1/(1 + exp(θ T x) 4 / 31
5 Logistic Regression: A Closer Look... What s the underlying decision rule in logistic regression? At the decision boundary, both classes are equiprobable; thus, we have p(y = 1 x; θ) = p(y = 0 x; θ) exp( θ T x) = exp(θ T x) exp(θ T x) = 1 θ T x = 0 Therefore, the decision boundary of logistic regression is nothing but a linear hyperplane Hence, y = 1 if θ T x 0; otherwise, y = 0 5 / 31
6 Interpreting The Probabilities... Recall that p(y = 1 x; θ) = exp( θ T x) The score θ T x is also a measure of distance of x from the hyperplane (the score is positive for pos. examples, and negative for neg. examples) High positive score: High probability of label 1 High negative score: Low probability of label 1 (high prob. of label 0) 6 / 31
7 Logistic Regression Formulation Logistic regression model h θ (x) = g(θ T x) = 1/(1 + e θt x ) Assume p(y = 1 x; θ) = h θ (x) and p(y = 0 x; θ) = 1 h θ (x), then we have p(y x; θ) = (h θ (x)) y (1 h θ (x)) 1 y If we assume y { 1, 1} instead of y {0, 1}, then p(y x; θ) = exp( yθ T x) Assuming the training examples were generated independently, we define the likelihood of the parameters as L(θ) = p(y X; θ) = i p(y (i) x (i) ; θ) = i (h θ (x (i) )) y(i) (1 h θ (x (i) )) 1 y(i) 7 / 31
8 Logistic Regression Formulation (Contd.) Maximize the log likelihood l(θ) = log L(θ) = m Gradient ascent algorithm θ j θ j + α θj l(θ) for j For one training example (x, y), ( ) y (i) log h(x (i) ) + (1 y (i) ) log(1 h(x (i) ) θ j l(θ) = (y h θ (x))x j If stochastic gradient ascent rule is employed θ j = θ j + α(y i h θ (x i))x i,j 8 / 31
9 Newton s Method Given a differentiable real-valued f : R R, how can we find θ such that f(θ) = 0? The following iteration is performed until a sufficiently accurate value is achieved x x f(x) f (x) 9 / 31
10 Newton s Method (Contd.) To maximize f(x), we have to find the stationary point of f(x) such that f (x) = 0. According to Newton s method, we have the following update x x f (x) f (x) Newton-Raphson method: For l : R m R, we generalization Newton s method to the multidimensional setting θ θ H 1 θ l(θ) where H is the Hessian matrix H i,j = 2 l(θ) θ i θ j 10 / 31
11 Newton s Method (Contd.) Higher convergence speed than (batch) gradient descent Fewer iterations to approach the minimum However, each iteration is more expensive than the one of gradient descent Finding and inverting an m m Hessian More details about Newton s method can be found at 11 / 31
12 Generalized Linear Models (GLMs) So far, Regression problem y x; θ N (µ, σ 2 ) Classification problem y x; θ Bernoulli(φ) In fact, both of these methods are special cases of a broader family of models, so-called generalized linear models (GLMs) 12 / 31
13 The Exponential Family An exponential family is a set of probability distributions of the following form p(y; η) = b(y) exp(η T Φ(y) a(η)) η is called the natural parameter (also called the canonical parameter) Φ(y) is the sufficient statistic (we usually take Φ(y) = y) b(y) is the underlying measure (e.g., counting measure or Lebesgue measure) a(η) is the log partition (or cumulant) function exp (a(η)) is a constant for normalization purpose, such that a(η) is not a degree of freedom in the specification of an exponential family density Legesgue integral : exp (a(η)) = b(y) exp ( η T Φ(y) ) dx Φ( ), a( ) and b( ) define a family of distributions parameterized by η 13 / 31
14 The Exponential Family (Contd.) For exponential family, we have p(y; η)dy = b(y) exp(η T Φ(y) a(η))dy = 1 y Calculate derivative w.r.t. η a (η) = b(y) exp(η T Φ(y) a(η))φ(y)dy Maximizing the log-likelihood l(η) = y y m log b(y (i) ) ma(η) + η T m Φ(y (i) ) Thus a (η) = 1 m m Φ(y(i) ) Sufficiency: The likelihood for η only depends on y through Φ(y) When m, a (η) E(Φ(y)) 14 / 31
15 The Exponential Family (Contd.) Bernoulli distribution p(y; ψ) = ψ y (1 ψ) 1 y = exp(y log ψ + (1 y) log(1 ψ)) = exp (y log ψ + log(1 ψ) y log(1 ψ)) ( ) ψ = exp y log + log(1 ψ) 1 ψ η = log(ψ/(1 ψ)) ( and thus ψ = 1/(1 + e η )) Φ(y) = y a(η) = log(1 ψ) = log(1 + e η ) b(y) = 1 15 / 31
16 The Exponential Family (Contd.) Gaussian distribution N (µ, σ 2 ) ( p(y; µ, σ 2 1 ) = exp 1 ) (y µ)2 2πσ 2σ2 ( 1 µ = exp 2π σ 2 y 1 2σ 2 y2 1 ) 2σ 2 µ2 log σ [ ] µ/σ 2 η = 1/2σ [ ] 2 y Φ(y) = y 2 a(η) = µ2 + log σ = η2 2σ 2 1 4η 2 1 log( 2η2) 2 b(y) = 1/ 2π 16 / 31
17 The Exponential Family (Contd.) Piosson distribution p(y; λ) = λy e λ y! = 1 exp (y log λ λ) y! η = log λ Φ(y) = y a(η) = λ = e η b(y) = 1/y! 17 / 31
18 The Exponential Family (Contd.) Multinomial distribution p(y; ψ) = = = = M! x 1!x 2! x k! ψx1 1 ψx2 2 ψx k k ( k ) M! x 1!x 2! x k! exp x i log ψ i M! x 1!x 2! x k! exp M! x 1!x 2! x k! exp ( k 1 ( k 1 k 1 x i log ψ i + (1 log( ψ i 1 k 1 ψ i k 1 x i ) log(1 k 1 )x i + log(1 ψ i ) ψ i ) ) ) 18 / 31
19 Constructing GLMs Assumptions y x; θ ExponentialFamily(η) Φ(y) = y which means h(x) = E[y x] η = θ T x Poisson distribution is an exponential family distribution, so the question is, how to apply GLM for the above problem. 19 / 31
20 Constructing GLMs (Contd.) Ordinary least square y x N (µ, σ 2 ) Since η = µ, we have h θ (x) = E[y x; θ] = µ = η = θ T x 20 / 31
21 Constructing GLMs (Contd.) Logistic regression For Bernoulli distribution, ψ = 1/(1 + e η ) Therefore, we have h θ (x) = E[y x; θ] = ψ = 1/(1 + e η ) = 1/(1 + e θt x ) 21 / 31
22 Multiclass Classification Multiclass (or multinomial) classification is the problem of classifying instances into one of the more than two classes The existing multiclass classification techniques can be categorized into Transformation to binary Extension from binary Hierarchical classification 22 / 31
23 Transformation to Binary One-vs.-rest (one-vs.-all, OvA or OvR, one-against-all, OAA) strategy is to train a single classifier per class, with the samples of that class as positive samples and all other samples as negative ones Inputs: A learning algorithm L, samples x, labels y where y i {1,..., K} is the label for the sample x i Output: A list of classifier f k for k {1,, K} Procedure: For each k {1,.K}, construct a new label vector z where z i = 1 if y i = k and z i = 0 otherwise, and then apply L to x and z to obtain f k Making decision: y = arg min k f k (x) 23 / 31
24 Transformation to Binary One-vs.-one reduction K(K 1)/2 binary classifiers is trained for a K-way multiclass problem Each receives the samples of a pair of classes from the original training set At prediction time, a voting scheme is applied: all classifiers are applied to the incoming sample, and the class that got the largest number of positive predictions is chosen 24 / 31
25 Softmax Regression (Contd.) A classification problem where y {1, 2,, k} Multinomial distribution A generalization of the binomial distribution A trial has k outcomes and we perform the trial n times independently, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories Assuming ψ i = p(y = i; ψ) (i = 1,, k) denote the probability of i-th outcomes, since i ψi = 1, the multinomial distribution can be parameterized by only k 1 parameters {ψ 1, ψ 2,, ψ k 1 } Question: Can we express the multinomial as an exponential family distribution? 25 / 31
26 Defining Φ(y) R k 1 Softmax Regression (Contd.) Φ(1) =. Φ(2) = 1 0 Φ(k 1) =.. Φ(k) = Assuming 1{T rue} = 1 and 1{F alse} = 0, we have p(y; ψ) = ψ 1{y=1} 1 ψ 1{y=2} 2 ψ 1{y=k} k = ψ 1{y=1} 1 ψ 1{y=2} 2 ψ 1 k 1 k 1{y=k} = ψ (Φ(y))1 1 ψ (Φ(y))2 2 ψ 1 k 1 (Φ(y))i k ( ( = exp log ψ (Φ(y))1 1 ψ (Φ(y))2 2 ψ 1 k 1 k ( k 1 ) = exp (Φ(y)) i log(ψ i /ψ k ) + log ψ k (Φ(y))i )) 26 / 31
27 Softmax Regression (Contd.) We have p(y; ψ) = exp ( k 1 ) (Φ(y)) i log(ψ i /ψ k ) + log ψ k = b(y) exp(η T Φ(y) a(η)) η = log(ψ 1/φ k ) log(ψ 2/φ k ) log(ψ k 1 /φ k ) a(η) = log(ψ k ) b(y) = 1 27 / 31
28 Softmax Regression (Contd.) The link function is Then, we have η i = log ψ i ψ k, for i = 1, 2,, k k e ηi = ψ i /ψ k, ψ i = ψ k e ηi, ψ k e ηi = Therefore, the response function is derived as follows ψ k = 1/ k ψ i = 1 k e ηi e ηi, ψ i = k j=1 eηj The function mapping from η s to the ψ s is so-called the softmax function 28 / 31
29 Softmax Regression (Contd.) Since η i = θi T x (for i = 1, 2,, k 1), we have the following softmax model p(y = i x, w) = ψ i = e ηi e θt i x k = k j=1 eηj j=1 eθt j x Our hypothesis outputs h θ (x) = E[Φ(y) x; θ] = ψ 1 ψ 2. ψ k 1 = exp(θ T 1 x) k j=1 exp(θt j x) exp(θ T 2 x) k j=1 exp(θt j x). exp(θ T k 1 x) k j=1 exp(θt j x) 29 / 31
30 Softmax Regression (Contd.) Training data set {(x (1), y (1) ), (x (2), y (2) ),..., (x (m), y (m) )} Log-likelihood function m m l(θ) = log p(y i x i ; θ) = log ( ) 1{y k e θt r xi k j=1 eθt j x(i) r=1 Maximizing l(θ) through gradient ascent or Newton s method (i) =r} 30 / 31
31 Thanks! Q & A 31 / 31
CS229 Lecture notes. Andrew Ng
CS229 Lecture notes Andrew Ng Supervised learning Lets start by talking about a few examples of supervised learning problems Suppose we have a dataset giving the living areas and prices of 47 houses from
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationOutline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation
More informationLecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2)
Lectures on Machine Learning (Fall 2017) Hyeong In Choi Seoul National University Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2) Topics to be covered:
More informationLecture 3 - Linear and Logistic Regression
3 - Linear and Logistic Regression-1 Machine Learning Course Lecture 3 - Linear and Logistic Regression Lecturer: Haim Permuter Scribe: Ziv Aharoni Throughout this lecture we talk about how to use regression
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationGeneralized Linear Models (1/29/13)
STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability
More informationMachine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber
Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to
More informationComments. x > w = w > x. Clarification: this course is about getting you to be able to think as a machine learning expert
Logistic regression Comments Mini-review and feedback These are equivalent: x > w = w > x Clarification: this course is about getting you to be able to think as a machine learning expert There has to be
More informationLogistic Regression. Seungjin Choi
Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationRegression with Numerical Optimization. Logistic
CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204
More informationCS229 Supplemental Lecture notes
CS229 Supplemental Lecture notes John Duchi Binary classification In binary classification problems, the target y can take on at only two values. In this set of notes, we show how to model this problem
More informationGeneralized Linear Models and Logistic Regression
Department of Electrical and Computer Engineering University of Texas at Austin asharma@ece.utexas.edu Machine Learning - What do we want the machines to learn? We want humanoid agents capable of making
More informationMachine Learning. Lecture 2: Linear regression. Feng Li. https://funglee.github.io
Machine Learning Lecture 2: Linear regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2017 Supervised Learning Regression: Predict
More informationSTA216: Generalized Linear Models. Lecture 1. Review and Introduction
STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general
More informationGeneralized Linear Models
Generalized Linear Models David Rosenberg New York University April 12, 2015 David Rosenberg (New York University) DS-GA 1003 April 12, 2015 1 / 20 Conditional Gaussian Regression Gaussian Regression Input
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationLogistic Regression and Generalized Linear Models
Logistic Regression and Generalized Linear Models Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/2 Topics Generative vs. Discriminative models In
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationGeneralized Linear Models and Exponential Families
Generalized Linear Models and Exponential Families David M. Blei COS424 Princeton University April 12, 2012 Generalized Linear Models x n y n β Linear regression and logistic regression are both linear
More informationCS 340 Lec. 16: Logistic Regression
CS 34 Lec. 6: Logistic Regression AD March AD ) March / 6 Introduction Assume you are given some training data { x i, y i } i= where xi R d and y i can take C different values. Given an input test data
More informationLecture 10. Neural networks and optimization. Machine Learning and Data Mining November Nando de Freitas UBC. Nonlinear Supervised Learning
Lecture 0 Neural networks and optimization Machine Learning and Data Mining November 2009 UBC Gradient Searching for a good solution can be interpreted as looking for a minimum of some error (loss) function
More informationLinear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52
Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationLECTURE 11: EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS
LECTURE : EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS HANI GOODARZI AND SINA JAFARPOUR. EXPONENTIAL FAMILY. Exponential family comprises a set of flexible distribution ranging both continuous and
More informationLecture 4 Logistic Regression
Lecture 4 Logistic Regression Dr.Ammar Mohammed Normal Equation Hypothesis hθ(x)=θ0 x0+ θ x+ θ2 x2 +... + θd xd Normal Equation is a method to find the values of θ operations x0 x x2.. xd y x x2... xd
More informationMachine Learning - Waseda University Logistic Regression
Machine Learning - Waseda University Logistic Regression AD June AD ) June / 9 Introduction Assume you are given some training data { x i, y i } i= where xi R d and y i can take C different values. Given
More informationMulticlass Logistic Regression
Multiclass Logistic Regression Sargur. Srihari University at Buffalo, State University of ew York USA Machine Learning Srihari Topics in Linear Classification using Probabilistic Discriminative Models
More informationRapid Introduction to Machine Learning/ Deep Learning
Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/62 Lecture 1b Logistic regression & neural network October 2, 2015 2/62 Table of contents 1 1 Bird s-eye
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationStochastic gradient descent; Classification
Stochastic gradient descent; Classification Steve Renals Machine Learning Practical MLP Lecture 2 28 September 2016 MLP Lecture 2 Stochastic gradient descent; Classification 1 Single Layer Networks MLP
More informationGeneralized Linear Models. Last time: Background & motivation for moving beyond linear
Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationCPSC 340 Assignment 4 (due November 17 ATE)
CPSC 340 Assignment 4 due November 7 ATE) Multi-Class Logistic The function example multiclass loads a multi-class classification datasetwith y i {,, 3, 4, 5} and fits a one-vs-all classification model
More informationMachine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 7: Multiclass Classification Princeton University COS 495 Instructor: Yingyu Liang Example: image classification indoor Indoor outdoor Example: image classification (multiclass)
More informationLogistic Regression. Will Monroe CS 109. Lecture Notes #22 August 14, 2017
1 Will Monroe CS 109 Logistic Regression Lecture Notes #22 August 14, 2017 Based on a chapter by Chris Piech Logistic regression is a classification algorithm1 that works by trying to learn a function
More informationStatistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks
Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Jan Drchal Czech Technical University in Prague Faculty of Electrical Engineering Department of Computer Science Topics covered
More informationMLPR: Logistic Regression and Neural Networks
MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition Amos Storkey Amos Storkey MLPR: Logistic Regression and Neural Networks 1/28 Outline 1 Logistic Regression 2 Multi-layer
More informationSTA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random
STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression
More informationOutline. MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition. Which is the correct model? Recap.
Outline MLPR: and Neural Networks Machine Learning and Pattern Recognition 2 Amos Storkey Amos Storkey MLPR: and Neural Networks /28 Recap Amos Storkey MLPR: and Neural Networks 2/28 Which is the correct
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationIterative Reweighted Least Squares
Iterative Reweighted Least Squares Sargur. University at Buffalo, State University of ew York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationMachine Learning. Theory of Classification and Nonparametric Classifier. Lecture 2, January 16, What is theoretically the best classifier
Machine Learning 10-701/15 701/15-781, 781, Spring 2008 Theory of Classification and Nonparametric Classifier Eric Xing Lecture 2, January 16, 2006 Reading: Chap. 2,5 CB and handouts Outline What is theoretically
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationLogistic Regression Trained with Different Loss Functions. Discussion
Logistic Regression Trained with Different Loss Functions Discussion CS640 Notations We restrict our discussions to the binary case. g(z) = g (z) = g(z) z h w (x) = g(wx) = + e z = g(z)( g(z)) + e wx =
More informationLogistic Regression: Online, Lazy, Kernelized, Sequential, etc.
Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010
More informationLinear and Logistic Regression
Linear and Logistic Regression Marta Arias marias@lsi.upc.edu Dept. LSI, UPC Fall 2012 Linear regression Simple case: R 2 Here is the idea: 1. Got a bunch of points in R 2, {(x i, y i )}. 2. Want to fit
More informationESS2222. Lecture 4 Linear model
ESS2222 Lecture 4 Linear model Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Logistic Regression Predicting Continuous Target Variables Support Vector Machine (Some Details)
More informationMachine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.
More informationApplied Machine Learning Lecture 5: Linear classifiers, continued. Richard Johansson
Applied Machine Learning Lecture 5: Linear classifiers, continued Richard Johansson overview preliminaries logistic regression training a logistic regression classifier side note: multiclass linear classifiers
More informationLinear Classification
Linear Classification Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 23 April 2015 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 9 Sep. 26, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 9 Sep. 26, 2018 1 Reminders Homework 3:
More informationGaussian discriminant analysis Naive Bayes
DM825 Introduction to Machine Learning Lecture 7 Gaussian discriminant analysis Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. is 2. Multi-variate
More informationMachine Learning. Lecture 04: Logistic and Softmax Regression. Nevin L. Zhang
Machine Learning Lecture 04: Logistic and Softmax Regression Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering The Hong Kong University of Science and Technology This set
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 3: Linear Models I (LFD 3.2, 3.3) Cho-Jui Hsieh UC Davis Jan 17, 2018 Linear Regression (LFD 3.2) Regression Classification: Customer record Yes/No Regression: predicting
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationGeneralized Linear Models 1
Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter
More informationLogistic Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Logistic Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features
More informationLogistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824
Logistic Regression Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative Please start HW 1 early! Questions are welcome! Two principles for estimating parameters Maximum Likelihood
More informationLogistic Regression. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, / 48
Logistic Regression Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, 2017 1 / 48 Outline 1 Administration 2 Review of last lecture 3 Logistic regression
More informationClassification Based on Probability
Logistic Regression These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton s slides and grateful acknowledgement to the many others who made their course materials freely
More informationLecture 3: Multiclass Classification
Lecture 3: Multiclass Classification Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Some slides are adapted from Vivek Skirmar and Dan Roth CS6501 Lecture 3 1 Announcement v Please enroll in
More informationNotes on Andrew Ng s CS 229 Machine Learning Course
Notes on Andrew Ng s CS 9 Machine Learning Course Tyler Neylon 331.016 These are notes I m taking as I review material from Andrew Ng s CS 9 course on machine learning. Specifically, I m watching these
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Classification: Logistic Regression NB & LR connections Readings: Barber 17.4 Dhruv Batra Virginia Tech Administrativia HW2 Due: Friday 3/6, 3/15, 11:55pm
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationExponential Families
Exponential Families David M. Blei 1 Introduction We discuss the exponential family, a very flexible family of distributions. Most distributions that you have heard of are in the exponential family. Bernoulli,
More informationMachine Learning. Linear Models. Fabio Vandin October 10, 2017
Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationMachine Learning and Data Mining. Linear classification. Kalev Kask
Machine Learning and Data Mining Linear classification Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ = f(x ; q) Parameters q Program ( Learner ) Learning algorithm Change q
More informationMidterm exam CS 189/289, Fall 2015
Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points
More informationLecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016
Lecture 7 Logistic Regression Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 11, 2016 Luigi Freda ( La Sapienza University) Lecture 7 December 11, 2016 1 / 39 Outline 1 Intro Logistic
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationRegression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011!
Regression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011! 1 Todayʼs topics" Learning from Examples: brief review! Univariate Linear Regression! Batch gradient descent! Stochastic gradient
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationLecture 16 Solving GLMs via IRWLS
Lecture 16 Solving GLMs via IRWLS 09 November 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Notes problem set 5 posted; due next class problem set 6, November 18th Goals for today fixed PCA example
More informationCS489/698: Intro to ML
CS489/698: Intro to ML Lecture 04: Logistic Regression 1 Outline Announcements Baseline Learning Machine Learning Pyramid Regression or Classification (that s it!) History of Classification History of
More information