Logistic Regression. Machine Learning Fall 2018
1 Logistic Regression Machine Learning Fall 2018
2 Where are we? We have seen the following ideas: linear models, learning as loss minimization, Bayesian learning criteria (MAP and MLE estimation), and the naïve Bayes classifier.
3 This lecture: Logistic regression, Connection to Naïve Bayes, Training a logistic regression classifier, Back to loss minimization
5 Logistic Regression: Setup The setting: binary classification. Inputs: feature vectors x ∈ ℝ^d. Labels: y ∈ {-1, +1}. Training data: S = {(x_i, y_i)}, m examples.
6 Classification, but... The output y is discrete valued (-1 or 1). Instead of predicting the output, let us try to predict P(y = 1 | x). Expand the hypothesis space to functions whose output is in [0, 1]. Original problem: ℝ^d → {-1, 1}. Modified problem: ℝ^d → [0, 1]. This effectively makes the problem a regression problem. Many hypothesis spaces are possible.
8 The Sigmoid function The hypothesis space for logistic regression: all functions of the form
h(x) = σ(w^T x) = 1 / (1 + exp(-w^T x))
That is, a linear function composed with a sigmoid function σ (the logistic function). What is the domain and the range of the sigmoid function? (Its domain is all of ℝ; its range is (0, 1).) This is a reasonable choice; we will see why later.
11 The Sigmoid function [Figure: plot of σ(z) as a function of z, an S-shaped curve rising from 0 to 1]
12 The Sigmoid function What is its derivative with respect to z? dσ/dz = σ(z)(1 - σ(z)).
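A minimal Python sketch of the sigmoid and this derivative identity (my own illustration; nothing below comes from the original slides beyond the two formulas):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative from the slide: sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-6, 6, 5)
print(sigmoid(z))       # approaches 0 for very negative z, 1 for very positive z
print(sigmoid_grad(z))  # largest at z = 0, where sigma'(0) = 0.25
```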
14 Predicting probabilities According to the logistic regression model, we have
P(y = 1 | x, w) = σ(w^T x) = 1 / (1 + exp(-w^T x))
Or equivalently, for y ∈ {-1, +1}:
P(y | x, w) = σ(y w^T x) = 1 / (1 + exp(-y w^T x))
Note that we are directly modeling P(y | x) rather than P(x | y) and P(y).
19 Predicting a label with logistic regression Compute P(y = 1 | x, w). If this is greater than half, predict 1; else predict -1. What does this correspond to in terms of w^T x? Prediction = sgn(w^T x), since σ(w^T x) > 1/2 exactly when w^T x > 0.
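A small sketch of this prediction rule (my own, not from the slides); the weight vector below is a hypothetical trained model:

```python
import numpy as np

def predict_proba(w, X):
    """P(y = 1 | x, w) = sigma(w^T x) for each row x of X."""
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def predict_label(w, X):
    """Predict +1 if P(y = 1 | x, w) > 1/2, which is exactly sgn(w^T x)."""
    return np.where(X @ w > 0, 1, -1)

w = np.array([2.0, -1.0])                # hypothetical trained weights
X = np.array([[1.0, 0.5], [0.1, 1.0]])   # two example inputs
print(predict_proba(w, X))  # e.g. [0.817..., 0.310...]
print(predict_label(w, X))  # [ 1 -1 ]
```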
21 This lecture: Logistic regression, Connection to Naïve Bayes, Training a logistic regression classifier, Back to loss minimization
22 Naïve Bayes and Logistic regression Remember that the naïve Bayes decision is a linear function:
log [ P(y = +1 | x, w) / P(y = -1 | x, w) ] = w^T x
Here, the P's represent the naïve Bayes posterior distribution, which is calculated from the priors and the likelihoods. That is, P(y = 1 | x, w) is computed using P(x | y = 1, w) and P(y = 1 | w). But we also know that P(y = +1 | x, w) = 1 - P(y = -1 | x, w). Substituting in the above expression, we get
P(y = +1 | x, w) = σ(w^T x) = 1 / (1 + exp(-w^T x))
That is, both naïve Bayes and logistic regression try to compute the same posterior distribution over the outputs. Naïve Bayes is a generative model; logistic regression is the discriminative version.
26 This lecture: Logistic regression, Connection to Naïve Bayes, Training a logistic regression classifier (first: maximum likelihood estimation; then: adding priors → maximum a posteriori estimation), Back to loss minimization
27 Maximum likelihood estimation Let's get back to the problem of learning. Training data: S = {(x_i, y_i)}, m examples. What we want: find a w such that P(S | w) is maximized. We know that our examples are drawn independently and are identically distributed (i.i.d.). How do we proceed?
28 Maximum likelihood estimation
argmax_w P(S | w) = argmax_w ∏_{i=1}^m P(y_i | x_i, w)
The usual trick: convert products to sums by taking the log. Recall that this works only because log is an increasing function, so the maximizer will not change. Equivalent to solving
max_w Σ_i log P(y_i | x_i, w)
But (by definition) we know that
P(y_i | x_i, w) = σ(y_i w^T x_i) = 1 / (1 + exp(-y_i w^T x_i))
Substituting, the problem is equivalent to solving
min_w Σ_i log(1 + exp(-y_i w^T x_i))
The goal: maximum likelihood training of a discriminative probabilistic classifier under the logistic model for the posterior distribution. Equivalent to: training a linear classifier by minimizing the logistic loss.
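As an illustration (my own sketch, not from the slides), plain gradient descent on this maximum likelihood objective; I average the loss over the m examples (same minimizer as the sum above, but it makes a fixed step size behave well), and the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

def logistic_loss(w, X, y):
    """Average logistic loss: (1/m) sum_i log(1 + exp(-y_i w^T x_i))."""
    margins = y * (X @ w)
    return np.mean(np.log1p(np.exp(-margins)))

def logistic_loss_grad(w, X, y):
    """Gradient: (1/m) sum_i -y_i x_i sigma(-y_i w^T x_i)."""
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))  # -y_i * sigma(-margin_i)
    return X.T @ coeffs / len(y)

def train_mle(X, y, lr=0.1, steps=500):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * logistic_loss_grad(w, X, y)
    return w

# Tiny synthetic example: labels follow the sign of the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] > 0, 1, -1)
w = train_mle(X, y)
print(w, logistic_loss(w, X, y))
```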
34 Maximum a posteriori estimation We could also add a prior on the weights. Suppose each weight in the weight vector is drawn independently from a normal distribution with zero mean and standard deviation σ:
p(w) = ∏_{j=1}^d p(w_j) = ∏_{j=1}^d (1 / (σ√(2π))) exp(-w_j² / (2σ²))
35 MAP estimation for logistic regression Let us work through this procedure again to see what changes. What is the goal of MAP estimation? (In maximum likelihood, we maximized the likelihood of the data.) It is to maximize the posterior probability of the model given the data, i.e. to find the most probable model given the data:
P(w | S) ∝ P(S | w) P(w)
Learning by solving
argmax_w P(w | S) = argmax_w P(S | w) P(w)
Take log to simplify:
max_w [ log P(S | w) + log P(w) ]
We have already expanded out the first term:
log P(S | w) = -Σ_i log(1 + exp(-y_i w^T x_i))
Expanding the log prior:
log P(w) = -Σ_j w_j² / (2σ²) + constants
So MAP learning is equivalent to solving
max_w [ -Σ_i log(1 + exp(-y_i w^T x_i)) - (1/(2σ²)) w^T w ]
Maximizing a negative function is the same as minimizing the function.
45 Learning a logistic regression classifier Learning a logistic regression classifier is equivalent to solving
min_w Σ_i log(1 + exp(-y_i w^T x_i)) + (1/(2σ²)) w^T w
Where have we seen this before? The first question in the homework: write down the stochastic gradient descent algorithm for this. Historically, other training algorithms exist; in particular, you might run into L-BFGS.
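A minimal SGD sketch for this regularized objective (my own illustration, not the course's reference solution; the decaying step-size schedule and the choice σ² = 1 are assumptions):

```python
import numpy as np

def sgd_logistic(X, y, sigma2=1.0, lr0=0.1, epochs=20, seed=0):
    """SGD on: sum_i log(1 + exp(-y_i w^T x_i)) + (1/(2 sigma^2)) w^T w.

    Per-example gradient: -y_i x_i sigma(-y_i w^T x_i) + w / (m sigma^2),
    spreading the regularizer's gradient evenly over the m examples.
    """
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(m):
            t += 1
            lr = lr0 / (1.0 + lr0 * t / m)  # simple decaying step size
            margin = y[i] * (X[i] @ w)
            grad = -y[i] * X[i] / (1.0 + np.exp(margin)) + w / (m * sigma2)
            w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.where(X @ np.array([1.0, -2.0, 0.5]) > 0, 1, -1)
print(sgd_logistic(X, y))
```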
48 Logistic regression is... A classifier that predicts the probability that the label is +1 for a particular input. The discriminative counterpart of the naïve Bayes classifier. A discriminative classifier that can be trained via MAP or MLE estimation. A discriminative classifier that minimizes the logistic loss over the training set.
49 This lecture: Logistic regression, Connection to Naïve Bayes, Training a logistic regression classifier, Back to loss minimization
50 Learning as loss minimization The setup: examples x are drawn from a fixed, unknown distribution D, and a hidden oracle classifier f labels the examples. We wish to find a hypothesis h that mimics f. The ideal situation: define a loss function L that penalizes bad hypotheses; learning then means picking a function h ∈ H to minimize expected loss. But the distribution D is unknown. Instead, minimize empirical loss on the training set.
51 Empirical loss minimization Learning = minimize empirical loss on the training set. Is there a problem here? Overfitting! We need something that biases the learner towards simpler hypotheses. This is achieved using a regularizer, which penalizes complex hypotheses.
53 Regularized loss minimization Learning: minimize the empirical loss plus a regularizer. With linear classifiers, using l2 regularization:
min_w Σ_i L(y_i, w^T x_i) + λ w^T w
What is a loss function? Loss functions should penalize mistakes. We are minimizing average loss over the training data. What is the ideal loss function for classification?
54 The 0-1 loss Penalizes classification mistakes between the true label y and the prediction y'. For linear classifiers, the prediction is y' = sgn(w^T x), so there is a mistake iff y w^T x ≤ 0. Minimizing the 0-1 loss is intractable; we need surrogates.
55 The loss function zoo Many loss functions exist: perceptron loss, hinge loss (SVM), exponential loss (AdaBoost), logistic loss (logistic regression).
56 The loss function zoo [Figure, built up across slides 56-63: the zero-one, hinge (SVM), perceptron, exponential (AdaBoost), and logistic losses plotted as functions of the margin y w^T x, then shown zoomed out and zoomed out even more]
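To get a feel for the zoo numerically, here is a small sketch (mine, not the lecture's) of each surrogate as a function of the margin z = y w^T x; the exact scaling of each curve on the original slides is an assumption:

```python
import numpy as np

# Each loss is a function of the margin z = y * w^T x.
zero_one    = lambda z: (z <= 0).astype(float)     # 1 on a mistake, else 0
perceptron  = lambda z: np.maximum(0.0, -z)        # max(0, -y w^T x)
hinge       = lambda z: np.maximum(0.0, 1.0 - z)   # SVM: max(0, 1 - y w^T x)
exponential = lambda z: np.exp(-z)                 # AdaBoost: exp(-y w^T x)
logistic    = lambda z: np.log2(1.0 + np.exp(-z))  # scaled so it equals 1 at z = 0

z = np.linspace(-2, 2, 9)
for name, loss in [("zero-one", zero_one), ("perceptron", perceptron),
                   ("hinge", hinge), ("exponential", exponential),
                   ("logistic", logistic)]:
    print(f"{name:12s}", np.round(loss(z), 3))
```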