Logistic Regression Review. 10-601, Fall 2012 Recitation. September 25, 2012. TA: Selen Uguroglu

Outline
- Decision theory
- Logistic regression: goal, loss function, inference
- Gradient descent

Training data: a table of examples with features F1-F5 and a target variable; a model learned from it produces predictions on test data. If the target variables are discrete, we have a classification problem; if the target variables are continuous, a regression problem.

Approach 1: First solve the inference problem of P(X | Y_k) and P(Y_k) separately for each class Y_k, then use Bayes' theorem to obtain P(Y_k | X). Approach 2: Infer P(Y | X) directly from the data.
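The Bayes' theorem step itself did not survive transcription; in standard form it reads

P(Y_k | X) = P(X | Y_k) P(Y_k) / Σ_j P(X | Y_j) P(Y_j).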

Generative models
- Computationally demanding: require estimating both P(X | Y) and P(Y), i.e. the full joint distribution P(X, Y)
- Require a large training set for high accuracy
- Useful for detecting data points that can't be explained by the current model: anomaly/novelty detection

Discriminative models
- Useful if all we want to do is classification

How to perform classification with a discriminative model. We are given training data X = {<X_1, Y_1>, <X_2, Y_2>, ..., <X_L, Y_L>} of L examples.
1. Pick a model
2. Estimate the parameters
3. Perform prediction


Binary logistic regression model, assuming Y can take Boolean values, as written out below.
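The model's formula was an image on the slide; the standard form it refers to is

P(Y = 1 | X) = 1 / (1 + exp(-(w_0 + Σ_i w_i X_i))),   P(Y = 0 | X) = 1 - P(Y = 1 | X).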

Multiclass logistic regression model: one-versus-all classification. How many sets of weights W are we learning?
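For K classes, one standard parameterization (not shown in the transcription) is the softmax

P(Y = k | X) = exp(w_k · X) / Σ_j exp(w_j · X),

which uses K weight vectors; since the probabilities must sum to 1, K - 1 of them suffice. One-versus-all likewise trains one weight vector per class.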

Side note: to shorten the representation, we can add a column of 1's as the 0th feature of X, so the bias w_0 is absorbed into the weight vector.
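With X_0 = 1, the slide's formulas (lost in transcription) compress to

w_0 + Σ_{i=1..n} w_i X_i = Σ_{i=0..n} w_i X_i = w · X,   so   P(Y = 1 | X) = 1 / (1 + exp(-w · X)).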

Sigmoid function.
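The slide shows its plot; the function itself is

σ(a) = 1 / (1 + e^(-a)),

an S-shaped curve that increases monotonically from 0 to 1, with σ(0) = 1/2.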

Let's plot the logit function against a. Does it monotonically decrease or increase?

Logit function. The odds of an event y given x are p / (1 - p). What is the range of the logit? What is its relationship with x? What is the range of the odds?
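Spelled out (a standard reconstruction of the slide's math): the odds p / (1 - p) range over (0, ∞), and

logit(p) = ln(p / (1 - p))

ranges over (-∞, ∞). Under the logistic model, with p = P(Y = 1 | x), the logit equals w · x, i.e. it is linear in x.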

How to estimate the parameters W = <w_0, ..., w_n>? Two settings: under GNB assumptions, and the general case.


Let's consider the GNB setting: X is a vector of real-valued features, X = <X_1, X_2, ..., X_n>; the X_i are conditionally independent given Y; and P(Y) is Bernoulli with parameter π.

Using the conditional independence assumption and the priors, we can expand P(Y = 1 | X), as reconstructed below.
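The expansion the slide shows is, in standard form (the original math was an image):

P(Y = 1 | X) = P(Y = 1) Π_i P(X_i | Y = 1) / [ P(Y = 1) Π_i P(X_i | Y = 1) + P(Y = 0) Π_i P(X_i | Y = 0) ]
             = 1 / (1 + exp( ln(P(Y = 0) / P(Y = 1)) + Σ_i ln(P(X_i | Y = 0) / P(X_i | Y = 1)) )).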

Since the variables X_i have Gaussian class-conditional distributions, the sum above becomes linear in X.
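Plugging in Gaussians N(μ_ik, σ_i) with class-independent variances (the standard GNB-to-LR derivation from Mitchell's notes, reconstructed here) recovers the logistic regression form:

P(Y = 1 | X) = 1 / (1 + exp(-(w_0 + Σ_i w_i X_i))),

with w_i = (μ_i1 - μ_i0) / σ_i²  and  w_0 = ln(π / (1 - π)) + Σ_i (μ_i0² - μ_i1²) / (2 σ_i²).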

How to estimate the parameters W = <w_0, ..., w_n>?
1. Under GNB assumptions (variables X_i conditionally independent given Y)
2. General case: A. MLE, B. MAP

Estimating parameters with MLE: we maximize the conditional likelihood. What is the data likelihood?

Let's write the conditional likelihood first, then take the log. Is there a conditional independence assumption here?
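In Mitchell's notation (reconstructed, since the slide's math was an image), the conditional likelihood and its log are

L(W) = Π_l P(Y^l | X^l, W)
l(W) = Σ_l [ Y^l ln P(Y^l = 1 | X^l, W) + (1 - Y^l) ln P(Y^l = 0 | X^l, W) ].

The product over examples assumes the training examples are independent given W, which answers the slide's question.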

This is the objective function we want to maximize. One problem: there is no closed-form solution!

Take partial derivatives with respect to each w_i and move along the gradient with a step size (the learning rate η), as below.
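The gradient and the resulting gradient-ascent update are (standard result, reconstructed):

∂l(W)/∂w_i = Σ_l X_i^l (Y^l - P(Y^l = 1 | X^l, W))
w_i ← w_i + η Σ_l X_i^l (Y^l - P(Y^l = 1 | X^l, W)),

where η is the step size.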

Gradient descent: a first-order optimization method that takes steps in the direction of the negative gradient. Suppose we want to minimize F(x), which is defined and differentiable at a point z. F decreases fastest in the direction of the negative gradient, so starting from z we step in the direction of -∇F(z).

Gradient ascent: a first-order optimization method that takes steps in the direction of the positive gradient. Suppose we want to maximize F(x), which is defined and differentiable at a point z. F increases fastest in the direction of the positive gradient, so starting from z we step in the direction of +∇F(z).

Gradient descent vs. a closed-form solution: the function we want to minimize is the parabola shown in light blue on the slide, for which a closed-form solution is available. Starting from a random point on the parabola, gradient descent takes steps until it reaches a local minimum.
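A minimal Python sketch of the slide's parabola example (the specific function and starting point are illustrative; the slide only shows the picture):

def grad_F(x):
    # Derivative of F(x) = (x - 3)**2, whose closed-form minimum is x = 3.
    return 2.0 * (x - 3.0)

x = 10.0    # random starting point
eta = 0.1   # step size (learning rate)
for _ in range(100):
    x -= eta * grad_F(x)   # step in the direction of the negative gradient

print(x)   # converges to roughly 3.0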


The gradient is a sum over all training examples. What if the training data is large? A full summation after each iteration is slow!

The function we want to minimize can be optimized two ways: batch gradient descent (the full sum each step) or stochastic gradient descent (one example at a time).

What is the stochastic gradient descent update rule? How do we pick the training instance?
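A minimal sketch of SGD for logistic regression (a standard reconstruction; the function and variable names are illustrative):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sgd_epoch(X, y, w, eta=0.01):
    # One pass over the data: update w using a single example at a time,
    # visiting examples in random order (one common way to pick the instance).
    for l in np.random.permutation(len(y)):
        p = sigmoid(X[l] @ w)            # P(Y = 1 | x_l, w)
        w = w + eta * (y[l] - p) * X[l]  # single-example gradient step
    return w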

How do we pick the learning rate? Suppose the features have varying ranges; would that be a problem?

Stochastic gradient descent
- Faster convergence when the training data is large
- The learning rate should be low, otherwise there is a risk of bouncing back and forth
- High accuracy is hard to reach

Batch gradient descent
- Slow convergence when the training data is large
- Guaranteed to reach a local minimum under certain conditions

How to estimate the parameters W = <w_0, ..., w_n>? Under GNB assumptions, or in the general case: A. MLE, B. MAP.

Estimating parameters with MAP: assume P(W) is Gaussian with zero mean and identity covariance. The extra term in the objective below comes from the prior.
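Reconstructed in standard form, the MAP objective adds the log prior to the conditional log likelihood:

l_MAP(W) = Σ_l ln P(Y^l | X^l, W) + ln P(W) = l(W) - (λ/2) Σ_i w_i² + const,

where λ reflects the prior covariance (λ = 1 for the identity covariance on the slide). The update rule gains a shrinkage term:

w_i ← w_i + η ( -λ w_i + Σ_l X_i^l (Y^l - P(Y^l = 1 | X^l, W)) ).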

Defining a prior on W corresponds to regularization: it pushes the parameters toward 0, avoiding large weights and overfitting.

Generative versus discriminative.
Generative: assumes a functional form for P(X | Y) and P(Y); estimates P(X | Y) and P(Y) from data and uses them to calculate P(Y | X).
Discriminative: assumes a functional form for P(Y | X); estimates P(Y | X) directly from the data.

Comparing Naive Bayes, Gaussian Naive Bayes, and logistic regression: performance as the training data reaches infinity, and their decision surfaces.

Things to think about
- Overfitting in LR
- Does LR make any assumptions about P(X | Y)?
- GNB with class-independent variances is the generative equivalent of LR under GNB assumptions
- What is the objective function of LR? Can we reach the global optimum?

Questions?

Acknowledgements: some slides are taken from Tom Mitchell's 10-601 lecture notes.