Logistic Regression
Machine Learning 10701/15781
Carlos Guestrin
Carnegie Mellon University
September 24th, 2007

Generative v. Discriminative classifiers: Intuition

Want to learn: h: X → Y
- X: features
- Y: target classes
Bayes optimal classifier: P(Y|X)

Generative classifier, e.g., Naïve Bayes:
- Assume some functional form for P(X|Y), P(Y)
- Estimate parameters of P(X|Y), P(Y) directly from training data
- Use Bayes rule to calculate P(Y|X = x)
- This is a generative model: computation of P(Y|X) is indirect, through Bayes rule, but the model can generate a sample of the data, since P(X) = \sum_y P(y) P(X|y)

Discriminative classifiers, e.g., Logistic Regression:
- Assume some functional form for P(Y|X)
- Estimate parameters of P(Y|X) directly from training data
- This is a discriminative model: it learns P(Y|X) directly, but cannot generate a sample of the data, because P(X) is not available
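
As a concrete illustration of the generative route (a minimal sketch, not from the slides; all parameter values are made up), the posterior P(Y|X = x) is computed indirectly through Bayes rule, and the denominator is exactly the P(X) that lets a generative model sample data:

    import numpy as np
    from scipy.stats import norm

    p_y = {0: 0.6, 1: 0.4}                     # assumed prior P(Y)
    p_x_given_y = {0: norm(0.0, 1.0),          # assumed P(X|Y=0)
                   1: norm(2.0, 1.0)}          # assumed P(X|Y=1)

    def posterior(x):
        # Bayes rule: P(Y=1|X=x) = P(Y=1) P(x|Y=1) / sum_y P(y) P(x|y)
        joint = {y: p_y[y] * p_x_given_y[y].pdf(x) for y in (0, 1)}
        p_x = sum(joint.values())              # P(X=x): only a generative model has this
        return joint[1] / p_x

    print(posterior(1.0))                      # P(Y=1 | X=1.0)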

Logistic Regression

Logistic function (or sigmoid): \sigma(z) = 1 / (1 + e^{-z})

Learn P(Y|X) directly!
- Assume a particular functional form
- Sigmoid applied to a linear function of the data: P(Y=1|X, w) = \sigma(w_0 + \sum_i w_i X_i)
- Features can be discrete or continuous!

Understanding the sigmoid
[Three plots of \sigma(w_0 + w_1 x) over x ∈ [-6, 6]: (w_0 = -2, w_1 = -1), (w_0 = 0, w_1 = -1), (w_0 = 0, w_1 = -0.5)]
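
A short sketch (not from the slides) that evaluates the three plotted weight settings numerically over the same range the plots use:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # The three (w0, w1) settings shown in the plots
    for w0, w1 in [(-2.0, -1.0), (0.0, -1.0), (0.0, -0.5)]:
        x = np.linspace(-6, 6, 7)
        print(w0, w1, np.round(sigmoid(w0 + w1 * x), 3))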

Logistic Regression: a linear classifier

[Plot of the sigmoid \sigma(z) = 1 / (1 + e^{-z}) over z ∈ [-6, 6]]

Very convenient!
- P(Y=1|X, w) ≥ 0.5 implies w_0 + \sum_i w_i X_i ≥ 0
- so the decision boundary P(Y=1|X, w) = 0.5 is the hyperplane w_0 + \sum_i w_i X_i = 0
- implies a linear classification rule!
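
The equivalence between thresholding the sigmoid at 0.5 and thresholding the linear score at 0 is easy to check numerically (a sketch; the weights are hypothetical):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w0, w = -1.0, np.array([2.0, -3.0])   # hypothetical weights
    for x in (np.array([0.5, -0.2]), np.array([-1.0, 0.4])):
        score = w0 + w @ x
        # sigmoid(score) >= 0.5 exactly when score >= 0
        assert (sigmoid(score) >= 0.5) == (score >= 0)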

What if we have continuous X_i?

E.g., character recognition: X_i is the i-th pixel.

Gaussian Naïve Bayes (GNB):
P(X_i = x | Y = y_k) = \frac{1}{\sigma_{ik}\sqrt{2\pi}} \exp\left(-\frac{(x - \mu_{ik})^2}{2\sigma_{ik}^2}\right)
Sometimes assume the variance is
- independent of Y (i.e., \sigma_i),
- or independent of X_i (i.e., \sigma_k),
- or both (i.e., \sigma).

Example: GNB for classifying mental states [Mitchell et al.]
fMRI: ~1 mm resolution, ~2 images per sec., 15,000 voxels/image; non-invasive, safe; measures the Blood Oxygen Level Dependent (BOLD) response; typical impulse response: 10 sec.
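
A minimal sketch (made-up 1-D data) of estimating GNB parameters by maximum likelihood, using the variant in which the variance is independent of Y (a single \sigma_i pooled across classes):

    import numpy as np

    x = np.array([1.0, 1.2, 0.8, 3.1, 2.9, 3.0])   # one feature X_i
    y = np.array([0, 0, 0, 1, 1, 1])

    mu = {k: x[y == k].mean() for k in (0, 1)}      # per-class means mu_ik
    # variance independent of Y: pool residuals from both classes
    resid = np.concatenate([x[y == k] - mu[k] for k in (0, 1)])
    sigma2 = (resid ** 2).mean()

    def p_x_given_y(v, k):
        # Gaussian class-conditional P(X_i = v | Y = y_k)
        return np.exp(-(v - mu[k]) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

    print(p_x_given_y(1.0, 0), p_x_given_y(1.0, 1))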

Learned Bayes models

[Figure: learned voxel means for P(BrainActivity | WordCategory), "People words" vs. "Animal words"; pairwise classification accuracy: 85%] [Mitchell et al.]

Logistic regression v. Naïve Bayes

Consider learning f: X → Y, where
- X is a vector of real-valued features <X_1 ... X_n>
- Y is boolean
Could use a Gaussian Naïve Bayes classifier:
- assume all X_i are conditionally independent given Y
- model P(X_i | Y = y_k) as Gaussian N(\mu_{ik}, \sigma_i)
- model P(Y) as Bernoulli(\theta, 1-\theta)
What does that imply about the form of P(Y|X)? Cool!!!!

Derive form for P(Y|X) for continuous X_i

P(Y=1|X) = \frac{P(Y=1) P(X|Y=1)}{P(Y=1) P(X|Y=1) + P(Y=0) P(X|Y=0)}
         = \frac{1}{1 + \exp\left(\ln \frac{P(Y=0) P(X|Y=0)}{P(Y=1) P(X|Y=1)}\right)}

Ratio of class-conditional probabilities

With Gaussian P(X_i | Y = y_k) = N(\mu_{ik}, \sigma_i) (variance independent of the class label),
\ln \frac{P(X_i|Y=1)}{P(X_i|Y=0)} = \frac{\mu_{i1} - \mu_{i0}}{\sigma_i^2} X_i + \frac{\mu_{i0}^2 - \mu_{i1}^2}{2\sigma_i^2}
i.e., the log ratio is linear in X_i.

Derive form for P(Y|X) for continuous X_i (cont.)

Plugging the linear log ratios into the expression above,
P(Y=1|X) = \frac{1}{1 + \exp(-(w_0 + \sum_i w_i X_i))}
with w_i = \frac{\mu_{i1} - \mu_{i0}}{\sigma_i^2} and w_0 = \ln\frac{\theta}{1-\theta} + \sum_i \frac{\mu_{i0}^2 - \mu_{i1}^2}{2\sigma_i^2}:
the GNB posterior has exactly the logistic regression form.

Gaussian Naïve Bayes v. Logistic Regression

- Set of Gaussian Naïve Bayes parameters (feature variance independent of class label)
- Set of Logistic Regression parameters
Representation equivalence, but only in a special case!!! (GNB with class-independent variances)
But what's the difference???
- LR makes no assumptions about P(X|Y) in learning!!!
- Loss function!!! They optimize different functions and obtain different solutions.
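
The representation equivalence can be verified numerically; a sketch with arbitrary GNB parameters, checking that the GNB posterior coincides with a sigmoid of a linear function whose weights follow the formulas above:

    import numpy as np

    theta = 0.3                     # P(Y=1)
    mu0, mu1, s2 = 0.0, 2.0, 1.5    # class means and shared variance

    def gauss(x, m):
        return np.exp(-(x - m) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

    def gnb_posterior(x):           # P(Y=1|x) via Bayes rule
        num = theta * gauss(x, mu1)
        return num / (num + (1 - theta) * gauss(x, mu0))

    # Weights predicted by the derivation
    w1 = (mu1 - mu0) / s2
    w0 = np.log(theta / (1 - theta)) + (mu0**2 - mu1**2) / (2 * s2)

    xs = np.linspace(-3.0, 5.0, 9)
    assert np.allclose(gnb_posterior(xs), 1 / (1 + np.exp(-(w0 + w1 * xs))))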

Logistic regression for more than 2 classes

Logistic regression in the more general case, where Y ∈ {Y_1 ... Y_R}: learn R-1 sets of weights.

Logistic regression more generally

For k < R:
P(Y = y_k | X) = \frac{\exp(w_{k0} + \sum_i w_{ki} X_i)}{1 + \sum_{j<R} \exp(w_{j0} + \sum_i w_{ji} X_i)}
For k = R (the normalization, so no weights for this class):
P(Y = y_R | X) = \frac{1}{1 + \sum_{j<R} \exp(w_{j0} + \sum_i w_{ji} X_i)}
Features can be discrete or continuous!
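
A sketch (random weights, purely illustrative) of the R-1-weight-set parameterization; class R gets an implicit score of zero and serves as the normalizer:

    import numpy as np

    R, n = 3, 4
    rng = np.random.default_rng(0)
    W = rng.normal(size=(R - 1, n))            # weights for classes 1..R-1
    w0 = rng.normal(size=R - 1)
    x = rng.normal(size=n)

    scores = np.append(w0 + W @ x, 0.0)        # implicit 0 score for class R
    p = np.exp(scores) / np.exp(scores).sum()  # P(Y = y_k | X), k = 1..R
    print(p, p.sum())                          # probabilities sum to 1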

Loss functions: Likelihood v. Conditional Likelihood

Generative (Naïve Bayes) loss function: data likelihood
\ln P(D | w) = \sum_j \ln P(x^j, y^j | w)

Discriminative models cannot compute P(x^j | w)! But the discriminative (logistic regression) loss function is the conditional data likelihood:
\ln P(D_Y | D_X, w) = \sum_j \ln P(y^j | x^j, w)
It doesn't waste effort learning P(X); it focuses on P(Y|X), which is all that matters for classification.

Expressing Conditional Log Likelihood

Abbreviating P(Y=1|x^j, w) as p^j = \sigma(w_0 + \sum_i w_i x_i^j),
l(w) = \sum_j y^j \ln p^j + (1 - y^j) \ln(1 - p^j)
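
A sketch of the conditional log likelihood l(w) on made-up data, implementing exactly the expression above:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cond_log_lik(w0, w, X, y):
        # sum_j y^j ln p^j + (1 - y^j) ln(1 - p^j)
        p = sigmoid(w0 + X @ w)
        return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    X = np.array([[0.5, 1.2], [1.1, 0.3], [-0.7, -1.0]])
    y = np.array([1, 1, 0])
    print(cond_log_lik(0.0, np.array([1.0, -0.5]), X, y))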

Maximizing Conditional Log Likelihood

- Good news: l(w) is a concave function of w, so there are no locally optimal solutions.
- Bad news: there is no closed-form solution that maximizes l(w).
- Good news: concave functions are easy to optimize.

Optimizing a concave function: gradient ascent

The conditional likelihood for logistic regression is concave, so we can find the optimum with gradient ascent.
Gradient: \nabla_w l(w) = \left[\frac{\partial l(w)}{\partial w_0}, \ldots, \frac{\partial l(w)}{\partial w_n}\right]
Update rule: w^{(t+1)} = w^{(t)} + \eta \nabla_w l(w^{(t)}), with learning rate \eta > 0.
Gradient ascent is the simplest of optimization approaches; e.g., conjugate gradient ascent is much better (see reading).
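
To see why concavity makes life easy, here is gradient ascent on a toy concave function f(w) = -(w - 3)^2 (a sketch, unrelated to the LR objective): from any starting point, the update w ← w + \eta f'(w) climbs to the unique maximum:

    eta, w = 0.1, 0.0
    for _ in range(200):
        w = w + eta * (-2.0 * (w - 3.0))   # f'(w) for f(w) = -(w - 3)^2
    print(w)                               # converges to 3.0, the unique maximum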

Maximize Conditional Log Likelihood: gradient ascent

\frac{\partial l(w)}{\partial w_i} = \sum_j x_i^j \left(y^j - P(Y=1 | x^j, w)\right)

Gradient ascent for LR

Gradient ascent algorithm: iterate until change < \epsilon
- w_0^{(t+1)} = w_0^{(t)} + \eta \sum_j (y^j - P(Y=1 | x^j, w^{(t)}))
- for i = 1, ..., n, repeat:
  w_i^{(t+1)} = w_i^{(t)} + \eta \sum_j x_i^j (y^j - P(Y=1 | x^j, w^{(t)}))
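
Putting the pieces together, a minimal sketch of the full gradient ascent loop for LR on made-up data (the learning rate, iteration cap, and stopping test on the parameter change are illustrative choices, not from the slides):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Hypothetical training data: 2 features, binary labels
    X = np.array([[0.5, 1.2], [1.1, 0.3], [-0.7, -1.0], [-1.2, 0.1]])
    y = np.array([1, 1, 0, 0])

    w0, w, eta, eps = 0.0, np.zeros(2), 0.1, 1e-6
    for _ in range(100000):
        err = y - sigmoid(w0 + X @ w)     # y^j - P(Y=1 | x^j, w)
        dw0, dw = err.sum(), X.T @ err    # gradient of the conditional log lik.
        w0 += eta * dw0
        w += eta * dw
        if eta * max(abs(dw0), np.abs(dw).max()) < eps:   # change < eps
            break
    print(w0, w)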