# Logistic Regression Trained with Different Loss Functions. Discussion

Size: px
Start display at page:

## Transcription

1 Logistic Regression Trained with Different Loss Functions Discussion CS640 Notations We restrict our discussions to the binary case. g(z) = g (z) = g(z) z h w (x) = g(wx) = + e z = g(z)( g(z)) + e wx = + e d wd x d P (y = x; w) = h w (x) P (y = 0 x; w) = h w (x) 2 Maximum Likelihood Estimation 2. Goal Maximize likelihood: L(w) = p(y X; w) m = p(y i x i ; w) = m (h w (x i )) yi ( h w (x i )) yi

2 Or equivalently, maximize the log likelihood: l(w) = log L(w) m = y i log h(x i ) + ( y i ) log( h(x i )) 2.2 Stochastic Gradient Descent Update Rule w j l(w) = (y g(wx i ) ( y) g(wx i ) ) w j g(wx i) = (y g(wx i ) ( y) g(wx i ) )g(wx i)( g(wx i )) w j wx i = (y( g(wx i )) ( y)g(wx i ))x j i = (y h w (x i ))x j i w j := w j + λ(y i h w (x i )))x j i 3 Least Squared Error Estimation 3. Goal Minimize sum of squared error: L(w) = 2 m (y i h w (x i )) Stochastic Gradient Descent Update Rule w j L(w) = (y i h w (x i )) h w(x i ) w j = (y i h w (x i ))h w (x i )( h w (x i ))x j i 4 Comparison w j := w j + λ(y i h w (x i ))h w (x i )( h w (x i ))x j i 4. Update Rule For maximum likelihood logistic regression: w j := w j + λ(y i h w (x i )))x j i 2

3 For least squared error logistic regression: Let w j := w j + λ(y i h w (x i ))h w (x i )( h w (x i ))x j i f (h) = (y h), y {0, }, h (0, ) f 2 (h) = (y h)h( h), y {0, }, h (0, ) When y =, the plots of f (h) and f 2 (h) are shown in figure. f (h) is a monotonously decreasing function, the closer the predicted probability is to, the smaller the update is. The farther away the predicted probability is from, the bigger the update is. f 2 (h) is not monotone. When the predicted probability is close to 0 (far away from the true label), there is almost no update. The biggest update is achieved when h = 3. This behavior seems unintuitive. One tentative explanation for f 2 : f 2 is robust to outliers. In the range of ( 3, ), the farther away the predicted probability is from, the bigger the update is. In this range, f 2 is willing to better incorporate the data point into the model. In the range of (0, 3 ), the algorithm thinks the predicted value is too far away from the true value, and it s very likely that the data point is mislabeled. In this case, we should give up this data point. The farther away the predicted probability is from, the more likely the data point is mislabeled, the less effort we should put on it, thus the smaller the update should be. Therefor 3 can be regarded as a giving up threshold. When y = 0, the plots of f (h) and f 2 (h) are shown in figure 2. f (h) is again a monotone function. f 2 (h) is not monotone. When the predicted probability is close to (far away from the true label), there is almost no update. The biggest(by absolute value) update is achieved when h = 2 3. This behavior seems unintuitive. A similar argument based on robustness can be made. 4.2 Consistency Based on the analysis in [], both loss functions can ensure consistency. The classification errors converge to optimal Bayes error as sample size goes to infinity(if we don t restrict the function class). 4.3 Probability Estimation 4.4 Robustness If we think logistic regression outputs a probability distribution vector, then both methods try to minimize the distance between the true probability distribution vector and the predicted probability distribution vector. The only difference is in how the distance is defined. For maximum likelihood method, the distance measure is KL-divergence. For least squared error method, the distance is Euclidean distance. If for a data point x, the true probability distribution is [0, ], i.e., the true label is, but the predicted probability distribution is [0.99, 0.0], then 3

4 Figure : y= the KL-divergence between the two distribution is log 2 00 = 6.64; the squared Euclidean distance between the two distribution is =.96. Since KL-divergence is unbounded, but (squared) Euclidean distance has an upper bound 2, it seems logistic regression trained by minimizing squared error is more robust to outliers. 4.5 Convexity and Convergence The loss function for maximum likelihood estimation is m C (w) = l(w) = y i log h(x i ) + ( y i ) log( h(x i )) C (w) is a convex function. A proof for the convexity can be found at html For one data point x = (, ), y =, the plot of C (w) is shown in figure 3. The loss function for the least square estimation is C 2 (w) = L(w) = m (y i h w (x i )) 2 2 4

5 Figure 2: y=0 Figure 3: C (w) C 2 (w) is a non-convex function. For one data point x = (, ), y =, the plot of C 2 (w) is shown in figure 4. 5

6 Figure 4: C 2 (w) So we can see, training logistic regression by maximizing the likelihood is a convex optimization problem, which is easier; training logistic regression by minimizing squared error is a non-convex optimization problem, which is harder. If we think logistic regression trained with least square error as single node neural network, then it s easy to imagine that the same non-convexity problem would also arise in multi-layer neural networks. Regarding convexity vs nonconvexity in machine learning, there is an interesting discussion: Newton Method 5 Empirical Study 6 Conclusion 6

7 References [] Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, pages 56 85,

### Regression with Numerical Optimization. Logistic

CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204

### Machine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber

Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to

### CS229 Supplemental Lecture notes

CS229 Supplemental Lecture notes John Duchi Binary classification In binary classification problems, the target y can take on at only two values. In this set of notes, we show how to model this problem

### ECS171: Machine Learning

ECS171: Machine Learning Lecture 3: Linear Models I (LFD 3.2, 3.3) Cho-Jui Hsieh UC Davis Jan 17, 2018 Linear Regression (LFD 3.2) Regression Classification: Customer record Yes/No Regression: predicting

### ECE521 Lectures 9 Fully Connected Neural Networks

ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance

Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder September 12, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the function

### ECS171: Machine Learning

ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f

### Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu

Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data

### Linear Models in Machine Learning

CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

### Generative v. Discriminative classifiers Intuition

Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target

### Gaussian and Linear Discriminant Analysis; Multiclass Classification

Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015

### The classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know

The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is

### The classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA

The Bayes classifier Linear discriminant analysis (LDA) Theorem The classifier satisfies In linear discriminant analysis (LDA), we make the (strong) assumption that where the min is over all possible classifiers.

### Classification: Logistic Regression from Data

Classification: Logistic Regression from Data Machine Learning: Alvin Grissom II University of Colorado Boulder Slides adapted from Emily Fox Machine Learning: Alvin Grissom II Boulder Classification:

### Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation

### Logistic Regression. Machine Learning Fall 2018

Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes

### Linear Models for Classification

Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

### Logistic Regression. Will Monroe CS 109. Lecture Notes #22 August 14, 2017

1 Will Monroe CS 109 Logistic Regression Lecture Notes #22 August 14, 2017 Based on a chapter by Chris Piech Logistic regression is a classification algorithm1 that works by trying to learn a function

### Midterm exam CS 189/289, Fall 2015

Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points

### STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied

### Support Vector Machines

Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

### Introduction to Machine Learning

Introduction to Machine Learning Machine Learning: Jordan Boyd-Graber University of Maryland LOGISTIC REGRESSION FROM TEXT Slides adapted from Emily Fox Machine Learning: Jordan Boyd-Graber UMD Introduction

### Midterm, Fall 2003

5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are

### Classification: Logistic Regression from Data

Classification: Logistic Regression from Data Machine Learning: Jordan Boyd-Graber University of Colorado Boulder LECTURE 3 Slides adapted from Emily Fox Machine Learning: Jordan Boyd-Graber Boulder Classification:

### Variations of Logistic Regression with Stochastic Gradient Descent

Variations of Logistic Regression with Stochastic Gradient Descent Panqu Wang(pawang@ucsd.edu) Phuc Xuan Nguyen(pxn002@ucsd.edu) January 26, 2012 Abstract In this paper, we extend the traditional logistic

### Machine Learning. Lecture 3: Logistic Regression. Feng Li.

Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification

### Midterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.

CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic

### STATISTICAL BEHAVIOR AND CONSISTENCY OF CLASSIFICATION METHODS BASED ON CONVEX RISK MINIMIZATION

STATISTICAL BEHAVIOR AND CONSISTENCY OF CLASSIFICATION METHODS BASED ON CONVEX RISK MINIMIZATION Tong Zhang The Annals of Statistics, 2004 Outline Motivation Approximation error under convex risk minimization

### Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

### Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Slides adapted from Jordan Boyd-Graber, Tom Mitchell, Ziv Bar-Joseph Machine Learning: Chenhao Tan Boulder 1 of 27 Quiz question For

### Machine Learning (CSE 446): Multi-Class Classification; Kernel Methods

Machine Learning (CSE 446): Multi-Class Classification; Kernel Methods Sham M Kakade c 2018 University of Washington cse446-staff@cs.washington.edu 1 / 12 Announcements HW3 due date as posted. make sure

### Machine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 7: Multiclass Classification Princeton University COS 495 Instructor: Yingyu Liang Example: image classification indoor Indoor outdoor Example: image classification (multiclass)

### Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.

### Classification Logistic Regression

Announcements: Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington HW due on Friday. Today: Review: sub-gradients,lasso Logistic Regression October 3, 26 Sham

### Machine Learning. Linear Models. Fabio Vandin October 10, 2017

Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w

### Logistic Regression & Neural Networks

Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability

### UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

### Classification objectives COMS 4771

Classification objectives COMS 4771 1. Recap: binary classification Scoring functions Consider binary classification problems with Y = { 1, +1}. 1 / 22 Scoring functions Consider binary classification

### Least Mean Squares Regression. Machine Learning Fall 2018

Least Mean Squares Regression Machine Learning Fall 2018 1 Where are we? Least Squares Method for regression Examples The LMS objective Gradient descent Incremental/stochastic gradient descent Exercises

### > DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel

Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation

### Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron

CS446: Machine Learning, Fall 2017 Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron Lecturer: Sanmi Koyejo Scribe: Ke Wang, Oct. 24th, 2017 Agenda Recap: SVM and Hinge loss, Representer

### Least Mean Squares Regression

Least Mean Squares Regression Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Lecture Overview Linear classifiers What functions do linear classifiers express? Least Squares Method

### Ordinary Least Squares Linear Regression

Ordinary Least Squares Linear Regression Ryan P. Adams COS 324 Elements of Machine Learning Princeton University Linear regression is one of the simplest and most fundamental modeling ideas in statistics

### Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

### MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run

### CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017

CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class

### COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS16

COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS6 Lecture 3: Classification with Logistic Regression Advanced optimization techniques Underfitting & Overfitting Model selection (Training-

### Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018

10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction

### Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative

### Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

### CSC 411: Lecture 04: Logistic Regression

CSC 411: Lecture 04: Logistic Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 23, 2015 Urtasun & Zemel (UofT) CSC 411: 04-Prob Classif Sep 23, 2015 1 / 16 Today Key Concepts: Logistic

### Introduction to Logistic Regression and Support Vector Machine

Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel

### Notes on Discriminant Functions and Optimal Classification

Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem

### 18.6 Regression and Classification with Linear Models

18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight

### Support Vector Machines and Kernel Methods

2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University

### Overfitting, Bias / Variance Analysis

Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic

### Support Vector Machines and Bayes Regression

Statistical Techniques in Robotics (16-831, F11) Lecture #14 (Monday ctober 31th) Support Vector Machines and Bayes Regression Lecturer: Drew Bagnell Scribe: Carl Doersch 1 1 Linear SVMs We begin by considering

### Warm up: risk prediction with logistic regression

Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T

### Comments. x > w = w > x. Clarification: this course is about getting you to be able to think as a machine learning expert

Logistic regression Comments Mini-review and feedback These are equivalent: x > w = w > x Clarification: this course is about getting you to be able to think as a machine learning expert There has to be

### Series 6, May 14th, 2018 (EM Algorithm and Semi-Supervised Learning)

Exercises Introduction to Machine Learning SS 2018 Series 6, May 14th, 2018 (EM Algorithm and Semi-Supervised Learning) LAS Group, Institute for Machine Learning Dept of Computer Science, ETH Zürich Prof

### A Study of Relative Efficiency and Robustness of Classification Methods

A Study of Relative Efficiency and Robustness of Classification Methods Yoonkyung Lee* Department of Statistics The Ohio State University *joint work with Rui Wang April 28, 2011 Department of Statistics

### DATA MINING AND MACHINE LEARNING

DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems

### Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 9 Sep. 26, 2018

10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 9 Sep. 26, 2018 1 Reminders Homework 3:

### The exam is closed book, closed notes except your one-page cheat sheet.

CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right

### Machine Learning Basics: Maximum Likelihood Estimation

Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning

### Linear classifiers: Logistic regression

Linear classifiers: Logistic regression STAT/CSE 416: Machine Learning Emily Fox University of Washington April 19, 2018 How confident is your prediction? The sushi & everything else were awesome! The

### Linear Models for Regression CS534

Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

### Introduction to Bayesian Learning. Machine Learning Fall 2018

Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability

### SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are

### Neural Networks. Xiaojin Zhu Computer Sciences Department University of Wisconsin, Madison. slide 1

Neural Networks Xiaoin Zhu erryzhu@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison slide 1 Terminator 2 (1991) JOHN: Can you learn? So you can be... you know. More human. Not

### Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

### Logistic Regression. COMP 527 Danushka Bollegala

Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will

### Machine Learning (CS 567) Lecture 5

Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol

### Reward-modulated inference

Buck Shlegeris Matthew Alger COMP3740, 2014 Outline Supervised, unsupervised, and reinforcement learning Neural nets RMI Results with RMI Types of machine learning supervised unsupervised reinforcement

### Machine Learning for NLP

Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline

### Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

### Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

### Machine Learning Practice Page 2 of 2 10/28/13

Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes

### Generative v. Discriminative classifiers Intuition

Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features

Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular

### Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.

Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods

### Bayesian Machine Learning

Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning

### Machine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012

Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)

### Linear and logistic regression

Linear and logistic regression Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Linear and logistic regression 1/22 Outline 1 Linear regression 2 Logistic regression 3 Fisher discriminant analysis

### Natural Language Processing

Natural Language Processing Info 59/259 Lecture 4: Text classification 3 (Sept 5, 207) David Bamman, UC Berkeley . https://www.forbes.com/sites/kevinmurnane/206/04/0/what-is-deep-learning-and-how-is-it-useful

What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff

### Linear Models for Regression CS534

Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the

### Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

### FINAL: CS 6375 (Machine Learning) Fall 2014

FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for

### CSC321 Lecture 4: Learning a Classifier

CSC321 Lecture 4: Learning a Classifier Roger Grosse Roger Grosse CSC321 Lecture 4: Learning a Classifier 1 / 28 Overview Last time: binary classification, perceptron algorithm Limitations of the perceptron

### Introduction to Machine Learning Lecture 11. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 11 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Boosting Mehryar Mohri - Introduction to Machine Learning page 2 Boosting Ideas Main idea:

### Neural Networks: Backpropagation

Neural Networks: Backpropagation Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others

### CPSC 340: Machine Learning and Data Mining

CPSC 340: Machine Learning and Data Mining MLE and MAP Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due tonight. Assignment 5: Will be released

### Tufts COMP 135: Introduction to Machine Learning

Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Logistic Regression Many slides attributable to: Prof. Mike Hughes Erik Sudderth (UCI) Finale Doshi-Velez (Harvard)

### Neural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann

Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable

### ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization

### Homework 6. Due: 10am Thursday 11/30/17

Homework 6 Due: 10am Thursday 11/30/17 1. Hinge loss vs. logistic loss. In class we defined hinge loss l hinge (x, y; w) = (1 yw T x) + and logistic loss l logistic (x, y; w) = log(1 + exp ( yw T x ) ).

### Least Squares Regression

CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the