Generalized Linear Models and Logistic Regression


1 Department of Electrical and Computer Engineering, University of Texas at Austin

2 Machine Learning - What do we want the machines to learn? We want humanoid agents capable of making intelligent decisions. Decisions often need to be made based on a certain target outcome associated with a specified input. So... if a machine can predict the target output after seeing an input, we can easily program it (using a few simple if-else statements) to make the decision associated with that output. For example, if a robot can predict whether it is about to fall or not based on inputs from various motor sensors, it can be programmed to initiate balancing motor functions to prevent the fall.

3 Predictive Modeling The process of learning to predict an outcome from a given input is known as Predictive Modeling. The output can be A continuous value - This is called Regression. For example, predicting the force acting on the right limb of a robot given inputs from motor sensors. A discrete label - This is called Classification. For example, predicting whether the robot will fall or not based on the sensor inputs. Supervised Learning is an important class of methods for predictive modeling.

4 Supervised Learning - Introduction Given a set of input-output pairs, we want to learn a function f which relates an input to the corresponding output. This is Supervised Learning. The learnt function can then be used to predict the output for unseen input values; this process is known as Prediction. The set of given input-output pairs is known as the Training Set. Learning of the function f is supervised by the training set. The set of unseen input values is known as the Test Set. Inputs are also called features/attributes or independent variables. Training and test sets can be assumed to be sampled i.i.d. from an unknown joint distribution P(x, y) = P(y | x) P(x).

5 Supervised Learning - Preliminaries Let X be the domain of the attributes. For a real valued input (input from a single sensor), X = R. For a vector valued input (from multiple sensors), X = R^D. For a categorical attribute, X = S, a set of possible values of the attribute, for example the set of allowed rotations of a limb. Let Y be the domain of the target output. For regression, Y = R, or Y = R^D if we have a vector valued output. For classification, Y = S, a set of possible labels. For the robot example, S = {0, 1}, 0 for fall and 1 for no fall. Given a training set {(x_i, y_i)}_{i=1}^N of N input-output pairs, we need to learn a function f : X → Y such that y = f(x). f is the model which relates X to Y, and we want to learn this model.

6 Supervised Learning - Preliminaries The space of all possible functions f : X → Y is infinite. Learning thus requires searching this infinite space for a function f that satisfies the input-output pairs of the training set. This is very expensive. So... we restrict the search to a sub-space of functions. The sub-space is represented by a parameterized class of functions having a particular form, y ≈ f(x; θ). The goal then is to learn the parameters θ which give the function that best approximates the output within this restricted class of functions. For example, we can look only for linear functions of the inputs, f(x; θ) = θ^T x. This is the case for linear models. Or we can search for a more complicated non-linear function for better approximation, for example f(x; θ) = g(θ^T x), where g(·) is some non-linear function.

7 Supervised Learning - Noise Model Restricting the search to a sub-space results in an approximate estimate of the output: y_i = f(x_i; θ) + ɛ_i. ɛ is the residue/noise that the learnt function fails to account for due to the restricted search. Since each input-output pair (x_i, y_i) is sampled from P(x, y), we learn a model for the expected value of the output, E[y_i] = f(x_i; θ) + E[ɛ_i]. The finite size of the training set adds another approximation, as the expected value is now based on a finite sample. The associated noise ɛ will also be random, having the same form of distribution as P(y | x_i) (note we just need to shift y_i by the deterministic value f(x_i; θ) to get ɛ_i). Different distribution assumptions for the output P(y | x) give rise to different models. We will first look at simple models - least squares regression and logistic regression - and then generalize them to a class of models known as Generalized Linear Models (GLMs).

8 Least Squares Regression - Gaussian Noise Model The domain of target output values is Y = R. It is reasonable to assume that the predicted output f(x_i; θ) is corrupted by zero mean Gaussian noise:
ɛ_i ∼ N(0, σ^2)
The target output y_i then will also be normally distributed:
y_i ∼ N(f(x_i; θ), σ^2)
The training samples are assumed to be drawn i.i.d. from the distribution P(x, y):
P({y_i} | {x_i}) P({x_i}) = [ ∏_{i=1}^N N(y_i; f(x_i; θ), σ^2) ] P({x_i})
We can learn the parameters θ by maximizing the log-likelihood:
θ* = argmax_θ log( P({y_i} | {x_i}) P({x_i}) )
θ* = argmin_θ ∑_{i=1}^N (y_i − f(x_i; θ))^2 + constant
The constant includes the terms independent of θ, which do not affect the optimization for θ and can be ignored.
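To make the last step explicit, here is the intermediate expansion of the Gaussian log-likelihood (in LaTeX notation), showing where the sum of squares comes from:
\log P(\{y_i\} \mid \{x_i\}) = \sum_{i=1}^{N} \log \mathcal{N}\big(y_i;\, f(x_i;\theta),\, \sigma^2\big)
= -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \big(y_i - f(x_i;\theta)\big)^2 - \frac{N}{2}\log\big(2\pi\sigma^2\big)
Maximizing this over θ is therefore the same as minimizing the squared-error term, since σ does not depend on θ.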

9 Least Squares Regression - Gaussian Noise Model Least squares regression arises when we assume the output is normally distributed. For the special case of linear models, we can assume f(x_i; θ) = θ^T x_i:
θ* = argmin_θ ∑_{i=1}^N (y_i − θ^T x_i)^2
This is the commonly used Linear Regression or Linear Least Squares discovered by Gauss! A closed form solution can be obtained for θ by taking the derivative of the objective function above and setting it to zero. The solution is the famous pseudo-inverse solution:
θ* = (X^T X)^{-1} X^T y
What happens when y is not normally distributed? We will now look at models that arise when y has a non-normal distribution. We will start with the case when y has a Bernoulli distribution. This gives rise to the well known Logistic Regression.
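A minimal MATLAB sketch of this closed-form fit (variable names are illustrative, not from the slides; a bias column is prepended by assumption). The backslash operator solves the normal equations without forming an explicit inverse:
>> X1 = [ones(size(X,1),1) X];      % prepend a column of ones for the bias term
>> theta = (X1' * X1) \ (X1' * y);  % pseudo-inverse solution (X'X)^{-1} X'y
>> y_hat = X1 * theta;              % fitted values on the training inputs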

10 Logistic Regression - Introduction The output is now a Bernoulli random variable with domain Y = {0, 1}. The problem is really a classification problem since we have just two labels. The name "Regression" stuck to keep it consistent with a general class of models called GLMs, of which logistic regression is a special case. Remember we model the expected value of the target, and the noise is generally assumed to be a zero mean random variable:
E[y_i] = f(x_i; θ) = P(y_i = 1 | x_i; θ)
The function that we want to learn is thus constrained to have range [0, 1]. This is generally done by applying a non-linear squashing function to θ^T x_i so that the output of the linear model lies in [0, 1]. For logistic regression it is the sigmoid function:
f(x_i; θ) = σ(θ^T x_i) = 1 / (1 + exp(−θ^T x_i))

11 Log Odds Modeling E[y] by a sigmoid function over a linear model is equivalent to modeling the log-odds by the linear model. This is easy to prove:
E[y] = P(y = 1 | x) = 1 / (1 + exp(−θ^T x))
exp(−θ^T x) = (1 − P(y = 1 | x)) / P(y = 1 | x)
θ^T x = log( P(y = 1 | x) / P(y = 0 | x) )
log( P(y = 1 | x) / P(y = 0 | x) ) are the log-odds for the Bernoulli random variable.

12 Logistic Sigmoid Function Figure: Logistic Sigmoid Function σ(x) A smooth, monotonic, well-behaved function that squashes the real line to the range [0, 1]. Well-behaved implies continuity, double differentiability etc., which are important during optimization. For logistic regression, it is the conditional probability of label 1 given the attributes. In Bayes decision theory, it is the posterior distribution of the positive class. We will not cover Bayes decision theory in these slides. Useful property: σ'(x) = σ(x)(1 − σ(x)). We will use this when learning the parameters.
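A one-line check of this derivative property (in LaTeX notation), writing σ(x) = (1 + e^{-x})^{-1}:
\frac{d\sigma}{dx} = \frac{e^{-x}}{(1+e^{-x})^{2}} = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} = \sigma(x)\,\big(1-\sigma(x)\big)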

13 Linear Decision Boundary Modeling the expected value via a sigmoid function applied to a linear model results in a linear separating hyperplane. This can easily be proved. In classification, the decision boundary is given by P(y = 1 | x) = P(y = 0 | x):
P(y = 1 | x) = 1 − P(y = 1 | x)
1 / (1 + exp(−θ^T x)) = exp(−θ^T x) / (1 + exp(−θ^T x))
⇒ θ^T x = 0
A linear decision boundary.

14 Parameter Learning We modeled P(y_i = 1 | x_i, θ) = σ(θ^T x_i). Likewise, P(y_i = 0 | x_i, θ) = 1 − σ(θ^T x_i). The distribution for the Bernoulli random variable is then given by
P(y_i | x_i, θ) = σ(θ^T x_i)^{y_i} (1 − σ(θ^T x_i))^{1 − y_i}
With the i.i.d. assumption on the training set, the likelihood is given by
P(y | x, θ) = ∏_{i=1}^N σ(θ^T x_i)^{y_i} (1 − σ(θ^T x_i))^{1 − y_i}
The log-likelihood is given by
L(θ) = ∑_{i=1}^N [ y_i log σ(θ^T x_i) + (1 − y_i) log(1 − σ(θ^T x_i)) ]
θ can be learnt by maximizing L(θ):
θ* = argmax_θ L(θ)
Due to the sigmoid function, a closed form solution does not exist. So we turn to gradient methods for optimization.
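As a quick MATLAB sketch (assuming X already contains a column of ones for the bias and Y is the 0/1 label vector; names are illustrative), the log-likelihood for a given θ can be evaluated as:
>> sigm = @(a) 1 ./ (1 + exp(-a));                % logistic sigmoid
>> p = sigm(X * theta);                           % P(y_i = 1 | x_i, theta) for all i
>> L = sum(Y .* log(p) + (1 - Y) .* log(1 - p));  % log-likelihood L(theta)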

15 Gradient Ascent for Optimization L(θ) is a concave function (its negation is the cross-entropy loss). The global maximum can be obtained using gradient ascent. Starting from an initial point, move in the direction of steepest ascent, i.e. in the direction of the gradient. We get the following iterative update procedure:
θ^(t+1) = θ^(t) + α ∇L(θ^(t))
The gradient is given by
∇L(θ^(t)) = ∑_{i=1}^N (y_i − σ(θ^(t)T x_i)) x_i
α is the learning rate and needs to be selected using line search to ensure the likelihood increases at every iteration, which can be time consuming. In practice, α is gradually decreased as the iterations progress to avoid drastic jumps in the search space. To avoid expensive line search, other methods like Newton's method or stochastic gradient ascent can be used.
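For completeness, a short derivation of this gradient (in LaTeX notation), using the property σ'(a) = σ(a)(1 − σ(a)) from slide 12:
\frac{\partial L}{\partial \theta}
= \sum_{i=1}^{N} \left[ \frac{y_i}{\sigma(\theta^T x_i)} - \frac{1-y_i}{1-\sigma(\theta^T x_i)} \right] \sigma(\theta^T x_i)\big(1-\sigma(\theta^T x_i)\big)\, x_i
= \sum_{i=1}^{N} \big(y_i - \sigma(\theta^T x_i)\big)\, x_i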

16 IRLS - Iterated Re-weighted Least Squares Another method to maximize L(θ) (equivalently, to minimize the negative log-likelihood) is Newton's method, which involves the inverse of the Hessian matrix:
θ^(t+1) = θ^(t) − (∇∇L(θ^(t)))^{-1} ∇L(θ^(t))
The Hessian is given by
∇∇L(θ^(t)) = −∑_{i=1}^N σ(θ^(t)T x_i)(1 − σ(θ^(t)T x_i)) x_i x_i^T = −X^T W^(t) X
W^(t) = diag( σ(θ^(t)T x_i)(1 − σ(θ^(t)T x_i)) )
where X stacks the inputs x_i^T as rows and y stacks the targets. The update for the parameters is then given by
θ^(t+1) = θ^(t) − (X^T W^(t) X)^{-1} X^T (σ(Xθ^(t)) − y)
θ^(t+1) = (X^T W^(t) X)^{-1} X^T W^(t) z, where z = Xθ^(t) − (W^(t))^{-1} (σ(Xθ^(t)) − y)
Since X^T W^(t) X is positive-definite here (so the Hessian of L is negative-definite), a unique optimum is found. The update is a weighted least squares problem with weights W^(t). The weight matrix depends on the current estimate θ^(t) and needs to be re-estimated every iteration, hence the name IRLS. For large problems, the Hessian could be a large matrix with an expensive inverse computation.
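The update above can be written as a short MATLAB sketch (this is only an illustration of the IRLS iteration, not the glmfit internals; X is assumed to already contain a bias column, Y is the 0/1 label vector, and the fixed iteration count is illustrative):
% IRLS sketch with a fixed number of Newton steps
sigm = @(a) 1 ./ (1 + exp(-a));
theta = zeros(size(X,2), 1);              % initial parameters
for t = 1:25
    p = sigm(X * theta);                  % current probabilities sigma(theta' * x_i)
    W = diag(p .* (1 - p));               % IRLS weight matrix W^(t)
    z = X * theta - W \ (p - Y);          % working response z
    theta = (X' * W * X) \ (X' * W * z);  % weighted least squares update
end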

17 Stochastic Gradient Ascent The incremental counterpart of the gradient ascent method. We update the parameters as we receive more training points. The gradient is now computed only for the newly obtained point, which gives the following update:
θ^(t+1) = θ^(t) + α (y_i − σ(θ^(t)T x_i)) x_i
Can be slow to reach the optimum as opposed to batch updates where we sum the gradients over all the training points. For large scale problems with streaming data, this is the ideal choice, and is the reason for the popularity of logistic regression (GLMs in general) for large scale predictive modeling problems.
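A minimal MATLAB sketch of one stochastic pass over the data (the fixed step size and random visiting order are illustrative assumptions):
% Stochastic gradient ascent sketch (X has a bias column, Y is the 0/1 label vector)
sigm = @(a) 1 ./ (1 + exp(-a));
theta = zeros(size(X,2), 1);
alpha = 0.01;                                % illustrative fixed step size
for i = randperm(size(X,1))                  % visit the points in random order
    xi = X(i,:)';                            % current input as a column vector
    theta = theta + alpha * (Y(i) - sigm(theta' * xi)) * xi;
end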

18 Multi-Class Classification - SoftMax Regression Logistic regression can be generalized to multiple classes by SoftMax Regression. For the K class case, the target is a K dimensional vector y such that y_k = 1 if the training example belongs to class k. Following the standard noise model, we model the expected value of each y_k as
E[y_ik] = P(y_ik = 1 | x_i, θ_k) = exp(θ_k^T x_i) / ∑_{k'=1}^K exp(θ_{k'}^T x_i)
The likelihood for the training set can then be written as
P(Y | X, {θ_k}) = ∏_{i=1}^N ∏_{k=1}^K [ exp(θ_k^T x_i) / ∑_{k'=1}^K exp(θ_{k'}^T x_i) ]^{y_ik}
The likelihood can be maximized by any of the methods discussed to learn the parameters. In SoftMax Regression, there is an implicit Multinomial distribution assumption for the target output. During prediction on the test set, the learnt parameters estimate the probability of each class. Normally, the final classification is decided based on the maximum estimated probability of a particular class.
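A small MATLAB sketch of the softmax probabilities and the resulting class decision (Theta is assumed to be a D-by-K matrix holding one parameter vector per class; this layout is illustrative, not from the slides):
A = X * Theta;                                 % N-by-K matrix of class scores theta_k' * x_i
A = bsxfun(@minus, A, max(A, [], 2));          % subtract the row maximum for numerical stability
P = bsxfun(@rdivide, exp(A), sum(exp(A), 2));  % softmax probabilities; each row sums to 1
[~, y_pred] = max(P, [], 2);                   % predicted class = index of the largest probability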

19 Fisher Iris Dataset Figure: Fisher Iris Data One of the most widely used datasets in machine learning. Comes with MATLAB! Just use the command load fisheriris to load the data into the workspace. Three classes with 50 instances each. Setosa is linearly separable from Versicolor and Virginica. Independent variables - Sepal length, Sepal width, Petal length, Petal width (all in cm). Let's try to classify Setosa from the other two classes.

20 >> load fisheriris
>> X = meas(:,1:2);
>> Y(1:50) = 0; Y(51:150) = 1; Y = Y';
>> beta = glmfit(X, Y, 'binomial', 'link', 'logit');
>> scatter(X(1:50,1), X(1:50,2), '+g');
>> hold on
>> scatter(X(51:150,1), X(51:150,2), '+r');
>> x = 4:0.01:8;
>> y = (-beta(2) .* x - beta(1)) ./ beta(3);  % decision boundary beta(1) + beta(2)*x1 + beta(3)*x2 = 0
>> plot(x, y);
Figure: Classifying the Fisher iris dataset using features (1,2) and (3,4)
glmfit uses IRLS for iteratively learning the model parameters.

21 The two classes are linearly separable in the four dimensional space of the given independent variables. Here is an illustration using the first three attributes. Figure: Classifying the Fisher iris dataset using features (1,2,3)

22 Train-Test Split
>> ind_test = randsample(150, 50);
>> X_test = X(ind_test, :); Y_test = Y(ind_test);
>> ind_train = setdiff(1:150, ind_test);
>> X_train = X(ind_train, :); Y_train = Y(ind_train);
>> betas = glmfit(X_train, Y_train, 'binomial', 'link', 'logit');
>> Y_hat = glmval(betas, X_test, 'logit');
Figure: Classifying the Fisher iris dataset - train and test sets
Only one point was misclassified. The classes, however, seem to be strongly linearly separable. Let us now look at a more challenging dataset which might not be linearly separable.
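glmval returns P(y = 1 | x) for the test inputs; one way to turn this into labels and count errors (a sketch, thresholding at 0.5 by assumption):
>> Y_pred = double(Y_hat >= 0.5);          % threshold the predicted probabilities
>> n_errors = sum(Y_pred ~= Y_test);       % number of misclassified test points
>> test_error = n_errors / numel(Y_test);  % misclassification rate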

23 E. Coli Dataset Protein localization sites for E. coli. The available classes are: 1 cp (cytoplasm), 2 im (inner membrane without signal sequence), 3 pp (periplasm), 4 imU (inner membrane, uncleavable signal sequence), 5 om (outer membrane), 6 omL (outer membrane lipoprotein), 7 imL (inner membrane lipoprotein), 8 imS (inner membrane, cleavable signal sequence). We focus only on the two classes cp and im, as they form the bulk of the available data (220 instances). There are 7 predictive attributes: 1 mcg: McGeoch's method for signal sequence recognition. 2 gvh: von Heijne's method for signal sequence recognition. 3 lip: von Heijne's Signal Peptidase II consensus sequence score (binary attribute). 4 chg: presence of charge on N-terminus of predicted lipoproteins (binary attribute). 5 aac: score of discriminant analysis of the amino acid content of outer membrane and periplasmic proteins. 6 alm1: score of the ALOM membrane spanning region prediction program. 7 alm2: score of the ALOM program after excluding putative cleavable signal regions from the sequence.

24 Logistic Regression on E. coli Dataset
>> data = dlmread('ecoli.txt');
>> X = data(:,1:7); Y = data(:,8);
>> betas = glmfit(X, Y, 'binomial', 'link', 'logit');
>> betas
[ ]
Some of the coefficients are very small relative to the others. This shows that the corresponding dimensions are not informative in predicting the target class. Let us do a PCA on the data matrix and then do logistic regression.

25 >> data = dlmread('ecoli.txt');
>> X = data(:,1:7); Y = data(:,8);
>> [coeff, X1, latent] = princomp(X);
>> beta = glmfit(X1(:,1:3), Y, 'binomial', 'link', 'logit');
>> scatter3(X1(1:143,1), X1(1:143,2), X1(1:143,3), '+g');
>> hold on
>> scatter3(X1(144:220,1), X1(144:220,2), X1(144:220,3), '+r');
>> y = -0.6:0.01:0.6; x = -0.6:0.01:0.6;
>> z = (-beta(2) .* x - beta(3) .* y - beta(1)) ./ beta(4);
>> plot3(x, y, z);
>> beta
[ ]
Figure: Classifying E. coli using the first three principal components
It gives a reasonable separating hyperplane even though the classes are not linearly separable.

26 Gradient Ascent Implementation
function [theta] = learn_theta(X, Y, prec, niters)
log_sigmoid = @(x) 1 ./ (1 + exp(-x));       % logistic sigmoid
linear_model = @(X, theta) X * theta;        % linear predictor
% Add a column of ones to X to account for the bias term
X = [ones(length(Y), 1) X];
theta_current = rand(size(X,2), 1);
theta_new = 2 * theta_current;
t = 1;
alpha_current = 10;
eta = 0.5;                                   % the update rule for the step size
theta_stack(t,:) = theta_current';
alpha_update_iters = 1000;
counter = 1;
% Gradient ascent iterations
while (norm(theta_new - theta_current) > prec && t < niters)
    theta_current = theta_new;
    if (t == counter * alpha_update_iters)   % update only every alpha_update_iters
        alpha_current = update_alpha(alpha_current, eta);
        counter = counter + 1;
    end
    theta_new = theta_current + alpha_current .* X' * ...
        (Y - log_sigmoid(linear_model(X, theta_current)));
    t = t + 1;
    theta_stack(t,:) = theta_new';
end
theta = theta_current;
% Plot the evolution of each parameter over the iterations
x = 1:size(theta_stack, 1);
for ii = 1:length(theta)
    plot(x, theta_stack(:,ii)'); hold on;
end
return;

% Step size selection
function [alpha_ret] = update_alpha(alpha_current, eta)
alpha_ret = eta * alpha_current;
if (alpha_ret < 0.001)
    alpha_ret = alpha_current;
end
return;
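A hypothetical call on the iris data prepared in slide 20 would look like the following; the tolerance and iteration budget are illustrative values, not taken from the slides:
>> theta = learn_theta(X, Y, 1e-6, 50000);   % X: N-by-D inputs, Y: 0/1 labels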

27 Gradient Ascent on Fisher Iris Dataset Figure: Parameter convergence and performance using gradient ascent

28 Gradient ascent can easily be implemented in MATLAB using matrix operations. It is considerably faster than IRLS since it avoids the expensive inverse computation. We, however, need to play around with the step size α to get reasonable convergence of the parameters. The plots in the last slide were generated by starting with a high α = 0.1 and then reducing it by a factor of 0.5 every 1000 iterations until it reached the floor enforced in update_alpha. We want longer steps in the beginning to quickly reach the basin of convergence, but need to reduce the step size progressively to avoid oscillations around the optimum. Note that the separation is not as good as glmfit. This is because we used a simple step-size selection rule and stopped early. Both IRLS and gradient ascent are guaranteed to reach the same correct optimum. Other, more sophisticated (but expensive!) line-search methods can be used for step-size selection. Now, let us try to see how gradient ascent behaves when the classes are not linearly separable. Let us try to separate Virginica from the other two classes.

29 Figure: Parameter convergence and performance using gradient ascent in the non-linearly separable case Seems to be a reasonable separation. Note that the code for gradient ascent accepts X and Y as inputs. To generate these plots I just changed the class labels in Y to separate Virginica from the other two classes.

30 Train-Test Split for Fisher Iris Dataset Figure: Classifying the Fisher iris dataset - train and test sets As in the case of IRLS, only one point was misclassified using the gradient ascent method.

31 Gradient Ascent for E. coli Dataset Figure: Parameter convergence for the E. coli dataset
>> Y_hat = glmval(theta, X, 'logit');
>> idx = find(Y_hat >= 0.8);
>> Y_hat(idx) = 1;
>> idx = find(Y_hat < 0.8);
>> Y_hat(idx) = 0;
glmval computes the expected value of the target by applying the inverse link function to the linear model. For logistic regression, this is the probability of the target being 1. We threshold at 0.8 to get the target class. The model correctly classifies 217 (98.64%) training instances.
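The training accuracy quoted above can be computed with a short sketch, reusing the thresholded Y_hat from the snippet above:
>> n_correct = sum(Y_hat == Y);              % correctly classified training instances
>> accuracy = 100 * n_correct / numel(Y);    % training accuracy in percent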

32 Generalizing to Other Distributions We now generalize the formulation to other possible distributions that the target output can assume. We will consider a rich class of distributions called the Exponential Family of Distributions. Many known distributions belong to this family, such as the Gaussian, Bernoulli, Binomial, Gamma, Beta and many more. Following this family, the class of models we obtain is known as Generalized Linear Models (GLMs). Let's begin with the basics of the Exponential Family of Distributions and see how many commonly known distributions arise as special cases.

33 Exponential Family of Distributions A parameterized family of distributions having the form
p(x | θ) = exp( ⟨t(x), θ⟩ − Ψ(θ) ) p_0(x)
A few properties: In canonical form, t(x) = x. θ is the natural parameter of the distribution. Ψ(·) is the cumulant function associated with the distribution. p_0(x) is the base measure, defining the Lebesgue-Stieltjes integral with respect to the underlying measure. We are only interested in the property that relates E[x] to the natural parameter θ:
E[x] = ∇Ψ(θ)
This property forms the basis of GLMs. (∇Ψ)^{-1}(·) has a special name, the canonical link function - the simplest function that links the expected value to the natural parameter.
Table: Important special members of the Exponential family
Distribution | Ψ(θ) | ∇Ψ(θ) | (∇Ψ)^{-1}(t)
Bernoulli | log(1 + exp(θ)) | exp(θ) / (1 + exp(θ)) | log( t / (1 − t) )
Binomial | N log(1 + exp(θ)) | N exp(θ) / (1 + exp(θ)) | log( t / (N − t) )
Poisson | exp(θ) | exp(θ) | log t
Gaussian | θ^2 / 2 | θ | t
Gamma | − log(−θ) | −1/θ | −1/t
Note that the inverse link function for the Bernoulli distribution is the logistic sigmoid function.
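As a worked example (in LaTeX notation), the Bernoulli distribution with mean μ can be put into this form; it shows that the natural parameter is the log-odds and that the cumulant function matches the table entry:
P(y) = \mu^{y}(1-\mu)^{1-y}
     = \exp\!\left( y \log\frac{\mu}{1-\mu} + \log(1-\mu) \right)
     = \exp\!\left( y\,\theta - \log\big(1 + e^{\theta}\big) \right), \qquad \theta = \log\frac{\mu}{1-\mu}
So t(y) = y, Ψ(θ) = log(1 + e^θ), and E[y] = ∇Ψ(θ) = σ(θ), the logistic sigmoid.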

34 Generalized Linear Models (GLMs) The target output is now distributed according to a particular member of the exponential family of distributions. We assume a linear model for the natural parameter of the distribution:
θ = β^T x
(We changed notation slightly!) We thus model the expected value of the target via the inverse canonical link function computed at the linear model:
(∇Ψ)^{-1}(E[y]) = β^T x
We obtain the following expression for the likelihood:
P(y | x; β) = exp( ⟨t(y), β^T x⟩ − Ψ(β^T x) ) p_0(y)
The likelihood can then be maximized using the methods described in the previous slides to learn the model parameters β.
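For instance, with a Poisson-distributed target (count data) the same construction gives Poisson regression. A minimal MATLAB sketch using glmfit with the canonical log link (variable names are illustrative):
>> b = glmfit(X, Y, 'poisson', 'link', 'log');   % GLM with a Poisson target and log link
>> mu_hat = glmval(b, X, 'log');                 % predicted means E[y | x]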

35 Generalized Linear Models - GLMs GLMs are an important class of prediction models covering a large class of probability distributions. They can model a large class of problems by having more complicated link functions that need not be the canonical (inverse cumulant) link. For example, in probit regression, the inverse link function is the CDF of a standard normal random variable. Parameters can be easily learnt by maximizing the likelihood using the gradient methods.
Table: Important special members of Generalized Linear Models
Distribution | Support | Prediction Model
Bernoulli | {0, 1} | Logistic Regression
Poisson | Z+ | Poisson Regression
Gaussian | R | Least Squares
Gamma | R+ | Gamma Regression

36 Summary GLMs in general, and Logistic Regression in particular, are supervised learning methods catering to a large class of distributions called the exponential family of distributions. Simple, scalable parameter learning via gradient methods. Linear separating hyperplane for logistic regression. More complicated models can be obtained using more complicated, non-canonical link functions.

37 References
Pattern Recognition and Machine Learning. Christopher M. Bishop, Springer.
Generalized Linear Models. P. McCullagh and J.A. Nelder, Chapman & Hall/CRC.
Machine Learning. Thomas M. Mitchell, McGraw-Hill Higher Education.
Generalized Linear Models, with Application in Engineering and Sciences. R.H. Myers, D.C. Montgomery and C.G. Vining, John Wiley & Sons.
A probabilistic classification system for predicting the cellular localization sites of proteins. Paul Horton and Kenta Nakai, Intelligent Systems in Molecular Biology, St. Louis, USA.
E. coli Dataset. UCI Machine Learning Repository.
