Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
2 Topics: discriminant functions; logistic regression; perceptron; generative models; generative vs. discriminative; Naïve Bayes.
3 Classification problem. Given: a training set, i.e., a labeled set of N input-output pairs $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$ with $y \in \{1, \dots, K\}$. Goal: given an input x, assign it to one of K classes. Examples: spam filter, handwritten digit recognition.
4 Types of classifiers. Discriminant function: $f(x)$ directly gives the class of the input x. Probabilistic classification approaches fall into two main categories: generative and discriminative.
5 Target coding scheme. Target values for binary classification: a target variable $y \in \{0, 1\}$. Multiple classes (K > 2): y is a vector of length K (1-of-K coding). For target class $C_j$: $y_j = 1$ and $y_i = 0$ for all $i \neq j$, e.g., $y = [0, 0, 1, 0]^T$.
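The 1-of-K coding above can be sketched in a few lines of Python. This is only an illustration: the function name `one_hot` and the 0-based class index are assumptions, not part of the slides.

```python
def one_hot(class_index, num_classes):
    """Return the 1-of-K target vector for a 0-based class index (assumed convention)."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index must be in [0, num_classes)")
    # Exactly one entry is 1 (the target class); all others are 0.
    return [1 if j == class_index else 0 for j in range(num_classes)]


# Third class out of four, matching the slide's example y = [0, 0, 1, 0]^T.
print(one_hot(2, 4))  # [0, 0, 1, 0]
```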
6 Linear classifiers. Decision boundaries are linear functions of x: a $(d-1)$-dimensional hyperplane within the d-dimensional input space. Linearly separable: the data points can be classified exactly by linear decision surfaces. We start with binary linear classification.
7 Discriminant functions. $f(x; w) = w^T x$, with $x = [1, x_1, x_2, \dots, x_d]^T$ and $w = [w_0, w_1, w_2, \dots, w_d]^T$; $w_0$ is the bias. If $f(x; w) = w^T x \geq 0$ then $C_1$, else $C_2$. Decision boundary: $f(x; w) = 0$. Thresholded this way, $f(x; w)$ predicts discrete class labels.
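A minimal sketch of the linear discriminant rule, assuming inputs are plain Python lists and that the constant 1 is prepended inside the function. The weight values below are hypothetical and chosen only for illustration.

```python
def discriminant(w, x):
    """Compute f(x; w) = w^T [1, x_1, ..., x_d]; w[0] plays the role of the bias w_0."""
    augmented = [1.0] + list(x)
    if len(w) != len(augmented):
        raise ValueError("w must have d + 1 entries")
    return sum(wi * xi for wi, xi in zip(w, augmented))


def classify(w, x):
    """Decide C1 when f(x; w) >= 0, otherwise C2."""
    return "C1" if discriminant(w, x) >= 0 else "C2"


w = [-1.0, 2.0, 0.5]            # hypothetical weights [w_0, w_1, w_2]
print(classify(w, [1.0, 0.0]))  # f = -1 + 2 = 1 >= 0, so C1
print(classify(w, [0.0, 0.0]))  # f = -1 < 0, so C2
```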
8 Decision boundary (linear). Figure: a linear decision boundary in the $(x_1, x_2)$ plane. If $w^T x \geq 0$ then $y = 1$, else $y = 0$; the transcribed weight vector is $w = [3, 0.75, 1]$.
9 Non-linear decision boundary. Choose non-linear features; the classifier is still linear in the parameters w. With $x = [x_1, x_2]$, $\phi(x) = [1, x_1, x_2, x_1^2, x_2^2, x_1 x_2]^T$ and $w = [-1, 0, 0, 1, 1, 0]^T$, the decision boundary is $-1 + x_1^2 + x_2^2 = 0$. If $w^T \phi(x) \geq 0$ then $y = 1$, else $y = 0$.
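The feature-map idea can be checked numerically. The sketch below assumes the quadratic feature map $\phi(x) = [1, x_1, x_2, x_1^2, x_2^2, x_1 x_2]$ and the weight vector $w = [-1, 0, 0, 1, 1, 0]$, which makes the boundary the unit circle; the function names are illustrative.

```python
def phi(x1, x2):
    """Quadratic feature map: [1, x1, x2, x1^2, x2^2, x1*x2]."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def predict(w, x1, x2):
    """Linear rule in feature space: y = 1 if w^T phi(x) >= 0, else 0."""
    score = sum(wi * fi for wi, fi in zip(w, phi(x1, x2)))
    return 1 if score >= 0 else 0


# Assumed weights: the score is x1^2 + x2^2 - 1, so the boundary is the unit circle.
w = [-1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
print(predict(w, 0.5, 0.5))  # inside the circle: 0
print(predict(w, 1.0, 1.0))  # outside the circle: 1
```

The classifier is non-linear in x but the decision is still a single dot product, which is why the same training procedures as for linear classifiers apply.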
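10 SSE cost function. Two classes: is it suitable for classification? $J(w) = \|Xw - y\|^2$, where $X = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_d^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_d^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \cdots & x_d^{(n)} \end{bmatrix}$ and $w = [w_0, w_1, \dots, w_d]^T$.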
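11 Least squares vs. logistic regression (logistic regression is covered in the next slides). Figure: decision boundaries found by least squares and by logistic regression. Least squares penalizes predictions that are "too correct", i.e., points far on the correct side of the boundary. Least squares also lacks robustness to noise.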
12 SSE cost function. Two classes: $J(w) = \|Xw - y\|^2$, with X and w as on slide 10.
13 SSE cost function. Two classes, with g a sigmoid function: can the sigmoid solve the problem? $J(w) = \|g(Xw) - y\|^2$, with X and w as on slide 10.
14 Multi-class classification. Figure: two-class classification vs. multi-class classification (axes $x_1$, $x_2$).
15 Multi-class classification: one-vs-all (one-vs-rest). Figure: one binary problem per class (Class 1, Class 2, Class 3) over axes $x_1$, $x_2$.
16 Multi-class approaches: one-vs-all; all-vs-all.
17 Multiple class. $y \in \{1, 2, \dots, K\}$; if $f_i(x) > f_j(x)$ for all $j \neq i$, then decide $C_i$.
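The multi-class decision rule is an argmax over K discriminant values. The sketch below assumes one linear discriminant per class, stored as rows of a hypothetical weight list `W`; class indices are 0-based and ties go to the lowest index, which the slide's strict inequality does not cover.

```python
def linear_scores(W, x):
    """Return [f_1(x), ..., f_K(x)], where f_k(x) = w_k^T [1, x_1, ..., x_d]."""
    augmented = [1.0] + list(x)
    return [sum(wi * xi for wi, xi in zip(w_k, augmented)) for w_k in W]


def decide(scores):
    """Return the 0-based index i with the largest f_i(x) (first index wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical weights for K = 3 classes in a 2-dimensional input space.
W = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, -1.0, -1.0],
]
print(decide(linear_scores(W, [2.0, 0.5])))  # scores [2.0, 0.5, -1.5] -> class 0
print(decide(linear_scores(W, [0.0, 0.0])))  # scores [0.0, 0.0, 1.0] -> class 2
```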
18 SSE cost function. Multiple classes: $J(W) = \mathrm{Tr}\{(XW - Y)^T (XW - Y)\}$, with X as on slide 10, $W = [w_1 \cdots w_K]$ and $Y = [y_1 \cdots y_K]$.
19 Least squares vs. logistic regression. Figure: least squares vs. logistic regression.
20 Logistic regression. More general than discriminant functions: $f(x; w)$ predicts posterior probabilities $P(y = 1 \mid x)$.
21 Logistic regression (cont'd). $f(x; w) = g(w^T x)$, where $g(\cdot)$ is an activation function. Sigmoid (logistic) activation function: $g(z) = \frac{1}{1 + e^{-z}}$.
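As a small illustration of the sigmoid $g(z) = 1/(1 + e^{-z})$, the sketch below evaluates it in a way that avoids overflow for large negative z. The branch on the sign of z is an implementation choice made here, not something the slides prescribe.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(-z) could overflow, so use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


print(sigmoid(0.0))                  # 0.5: a point exactly on the decision boundary
print(sigmoid(4.0) + sigmoid(-4.0))  # close to 1, since g(-z) = 1 - g(z)
```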
22 Logistic regression (cont'd). $0 \leq f(x; w) \leq 1$: the estimated probability of $y = 1$ on input x (where $y = 0$ or $y = 1$). $f(x; w)$ is the probability that $y = 1$ given x, parameterized by w: $f(x; w) = P(y = 1 \mid x, w)$, and $P(y = 1 \mid x, w) = 1 - P(y = 0 \mid x, w)$.
23 Logistic regression (cont'd). Example: cancer (malignant, benign). $f(x; w)$ is read as the chance (in %) of the tumor being malignant.
24 Logistic regression: decision surface. Decision surface: $f(x; w) = \text{constant}$, i.e., $g(w^T x) = \frac{1}{1 + e^{-w^T x}} = \text{constant}$. Decision surfaces are linear functions of x. Generalized linear models have more complex analytical and computational properties than linear regression.
25 Logistic regression: decision surface (cont'd). If $f(x; w) \geq 0.5$ then $y = 1$, else $y = 0$. This is equivalent to: if $w^T x \geq 0$ then $y = 1$, else $y = 0$.
26 Logistic regression: cost function. Is SSE a proper cost function for classification? $J(w) = \frac{1}{n} \sum_{i=1}^{n} \left(y^{(i)} - f(x^{(i)}; w)\right)^2$. Is it convex? Is it appropriate for a classification problem? Are the conditional distributions $p(y \mid x, w)$ in a classification problem Gaussian?
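27 Logistic regression: loss function. $\mathrm{Loss}(y, f(x; w)) = -\log(f(x; w))$ if $y = 1$, and $-\log(1 - f(x; w))$ if $y = 0$. Is it related to the 0-1 loss, $\mathrm{Loss}(y, \hat{y}) = 1$ if $y \neq \hat{y}$ and $0$ if $y = \hat{y}$?
28 Logistic regression: loss function. With $y = 1$ or $y = 0$: $\mathrm{Loss}(y, f(x; w)) = -y \log f(x; w) - (1 - y) \log(1 - f(x; w))$, where $f(x; w) = \frac{1}{1 + \exp(-w^T x)}$.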
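A short numerical sketch of this loss may help show how it treats confident mistakes. The clipping constant `eps` is an assumption added here to avoid taking log(0); it is not part of the slide's definition.

```python
import math


def log_loss(y, f, eps=1e-12):
    """Loss(y, f) = -y*log(f) - (1 - y)*log(1 - f) for y in {0, 1} and f in [0, 1]."""
    # Clip f away from 0 and 1 so the logarithms stay finite (assumed safeguard).
    f = min(max(f, eps), 1.0 - eps)
    return -y * math.log(f) - (1 - y) * math.log(1.0 - f)


# A confident correct prediction costs little; a confident wrong one costs a lot.
print(log_loss(1, 0.9))  # about 0.105
print(log_loss(1, 0.1))  # about 2.303
```

Unlike the 0-1 loss, this loss keeps growing as a wrong prediction becomes more confident, and it is differentiable in f.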
29 Logistic regression: cost function. $\mathrm{Loss}(y, f(x; w)) = -\log p(y \mid f(x; w))$ when assuming a Bernoulli distribution for $p(y \mid w, x)$: $p(y \mid w, x) = f(x; w)^{y} (1 - f(x; w))^{1 - y}$. Maximum (conditional) log-likelihood: $\log p(D_y \mid D_x, w) = \sum_{i=1}^{n} \log p(y^{(i)} \mid w, x^{(i)})$.
30 Logistic regression: cost function. $J(w) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Loss}(y^{(i)}, f(x^{(i)}; w)) = \frac{1}{n} \sum_{i=1}^{n} \left[ -y^{(i)} \log f(x^{(i)}; w) - (1 - y^{(i)}) \log(1 - f(x^{(i)}; w)) \right]$. There is no closed-form solution for $\min_w J(w)$; however, $J(w)$ is convex.
31 Gradient descent. $w^{t+1} = w^{t} - \eta \nabla_w J(w^{t})$, with $\nabla_w J(w) = \frac{1}{n} \sum_{i=1}^{n} \left(f(x^{(i)}; w) - y^{(i)}\right) x^{(i)}$. Similar to the gradient of SSE for linear regression?
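The update rule can be sketched in plain Python as follows. This is a minimal batch gradient descent under stated assumptions: the toy data set, the step size `eta`, the number of steps, and all function names are illustrative choices, not values from the slides.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, x):
    """f(x; w) = g(w^T [1, x_1, ..., x_d])."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, [1.0] + list(x))))


def gradient(w, X, y):
    """(1/n) * sum_i (f(x_i; w) - y_i) * x_i, with the constant 1 prepended to x_i."""
    n = len(X)
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = predict_proba(w, xi) - yi
        for j, xj in enumerate([1.0] + list(xi)):
            grad[j] += err * xj / n
    return grad


def fit(X, y, eta=0.5, steps=2000):
    """Batch gradient descent: w <- w - eta * grad J(w), starting from w = 0."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        g = gradient(w, X, y)
        w = [wj - eta * gj for wj, gj in zip(w, g)]
    return w


# Hypothetical 1-D toy data: small inputs are class 0, large inputs are class 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w = fit(X, y)
predictions = [1 if predict_proba(w, xi) >= 0.5 else 0 for xi in X]
print(predictions)
```

Because this toy set is linearly separable, the weights keep growing with more steps; a fixed step budget (or regularization) is what stops the procedure here.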
32 Logistic regression example. Figure adapted from Jaakkola's slides.
33 Generalization to multi-class. Train a logistic regression classifier $f_i(x; w)$ for each class i; $f_i(x; w)$ predicts the probability of $y = i$, i.e., $P(y = i \mid x, w)$. On a new input x, to make a prediction, pick the class that maximizes $f_i(x; w)$: $\hat{y}(x) = \arg\max_i f_i(x; w)$.
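34 Logistic regression: multi-class (K > 2). $p(y = k \mid x) = \frac{\exp(w_k^T x)}{\sum_{j=1}^{K} \exp(w_j^T x)}$, the normalized exponential (aka softmax). If $w_k^T x \gg w_j^T x$ for all $j \neq k$, then $p(C_k \mid x) \approx 1$ and $p(C_j \mid x) \approx 0$. Compare with Bayes' theorem: $p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{\sum_{j=1}^{K} p(x \mid C_j)\, p(C_j)}$.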
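A minimal softmax sketch, taking the scores $w_k^T x$ as a plain list. Subtracting the maximum score before exponentiating is a standard numerical safeguard added here; it does not change the result because the common factor cancels in the ratio.

```python
import math


def softmax(scores):
    """Map scores [w_1^T x, ..., w_K^T x] to probabilities p(y = k | x)."""
    m = max(scores)  # shift by the max so exp() never overflows
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([0.0, 0.0]))         # equal scores give [0.5, 0.5]
print(softmax([100.0, 0.0, 0.0]))  # one dominant score: first entry is close to 1
```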
35 Logistic regression (LR): summary. LR is a linear classifier. The LR optimization problem is obtained by maximum likelihood when assuming a Bernoulli distribution for the conditional probabilities. There is no closed-form solution, but the cost function is convex, so the global optimum can be found with gradient descent on the cost (equivalently, gradient ascent on the log-likelihood).
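36 Perceptron algorithm. Linear discriminant model. Two-class: $y \in \{-1, 1\}$, with $y = -1$ for $C_2$ and $y = 1$ for $C_1$. Goal: $\forall i,\ x^{(i)} \in C_1 \Rightarrow w^T x^{(i)} > 0$ and $\forall i,\ x^{(i)} \in C_2 \Rightarrow w^T x^{(i)} < 0$. $f(x; w) = g(w^T x)$, where $g(z) = -1$ if $z < 0$ and $g(z) = 1$ if $z \geq 0$.
37 Perceptron architecture. Figure: perceptron unit computing $f(x)$.
38 Perceptron criterion. $J_P(w) = -\sum_{i \in M} w^T x^{(i)} y^{(i)}$, where M is the subset of training data that are misclassified. Assumption: the classes are linearly separable.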
39 Some classification criteria. Figure (source: Duda's book): the misclassification criterion $J(w)$ and the perceptron criterion $J_P(w)$, each plotted over $(w_0, w_1)$.
40 Some classification criteria. Figure (source: Duda's book): the perceptron criterion $J_P(w)$ and the least squares criterion $J_q(w)$, each plotted over $(w_0, w_1)$.
41 Stochastic gradient descent for Perceptron. $w^{t+1} = w^{t} - \eta \nabla_w J_P(w^{t})$, with $\nabla_w J_P(w) = -\sum_{i \in M} x^{(i)} y^{(i)}$. Online perceptron: if $x^{(i)}$ is misclassified, $w^{t+1} = w^{t} + \eta x^{(i)} y^{(i)}$. Perceptron convergence theorem: for linearly separable data, the algorithm converges after a finite number of updates. Many solutions? Which solution among them?
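The online update can be sketched as below. The toy data, the epoch cap `max_epochs`, and treating a score of exactly zero as a mistake (so that training can start from w = 0) are assumptions made for this illustration.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_perceptron(X, y, eta=1.0, max_epochs=100):
    """Online perceptron with labels in {-1, +1}; returns weights [w_0, w_1, ..., w_d]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            a = [1.0] + list(xi)  # prepend the constant 1 for the bias
            if yi * dot(w, a) <= 0:  # misclassified, or exactly on the boundary
                w = [wj + eta * yi * aj for wj, aj in zip(w, a)]
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: every point is separated
            break
    return w


# Hypothetical linearly separable toy data.
X = [[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w = train_perceptron(X, y)
print(w)  # [1.0, 2.0, 1.0]
```

On non-separable data the loop would never reach a mistake-free pass, which is why the sketch caps the number of epochs.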
42 Convergence of Perceptron. Change w in a direction that corrects the error.
43 Bayes decision theory. Bayes' theorem: $p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{p(x)}$. Posterior probability: $p(C_k \mid x)$. Likelihood: $p(x \mid C_k)$. Prior probability: $p(C_k)$.
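Bayes' theorem can be applied numerically once the likelihoods and priors are known. In the sketch below the evidence $p(x)$ is computed as $\sum_k p(x \mid C_k)\, p(C_k)$; the numbers are hypothetical and the function name is illustrative.

```python
def posterior(likelihoods, priors):
    """Return [p(C_k | x)] from [p(x | C_k)] and [p(C_k)] via Bayes' theorem."""
    joint = [l * p for l, p in zip(likelihoods, priors)]  # p(x, C_k)
    evidence = sum(joint)                                 # p(x)
    return [j / evidence for j in joint]


# Hypothetical values: x is three times as likely under C_2, with equal priors.
post = posterior([0.2, 0.6], [0.5, 0.5])
print(post)  # approximately [0.25, 0.75]
```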
44 Bayes decision theory. Bayes decision: choose the class with the highest $p(C_k \mid x)$. Cost function: the probability of misclassification, i.e., minimizing the chance of assigning x to the wrong class. Can be extended to minimizing a loss instead of the misclassification error.
45 Minimizing misclassification rate. $R_k$: decision regions; all points in $R_k$ are assigned to class $C_k$. $p(\text{mistake}) = p(x \in R_1, C_2) + p(x \in R_2, C_1) = \int_{R_1} p(x, C_2)\, dx + \int_{R_2} p(x, C_1)\, dx$. To minimize it, choose the class with the highest $p(C_k \mid x)$.
46 Probability of correct classification. Multi-class: $p(\text{correct}) = \sum_{k=1}^{K} p(x \in R_k, C_k) = \sum_{k=1}^{K} \int_{R_k} p(x, C_k)\, dx$, and $p(\text{correct}) = 1 - p(\text{mistake})$.
47 Discriminative vs. generative approach
48 Generative approach. Inference stage: determine $P(x \mid C_k)$ for each class individually; determine $P(C_k)$; use Bayes' theorem to find $P(C_k \mid x)$. Decision stage: after learning the model (inference stage), make the optimal class assignment for a new input: if $P(C_i \mid x) > P(C_j \mid x)$ for all $j \neq i$, then decide $C_i$. This approach models the distribution of inputs as well as outputs, so it can generate synthetic data points.
49 Discriminative approach. Inference stage: determine the posterior class probabilities $P(C_k \mid x)$ directly. Decision stage: after learning the model (inference stage), make the optimal class assignment for a new input: if $P(C_i \mid x) > P(C_j \mid x)$ for all $j \neq i$, then decide $C_i$. Example: logistic regression.
50 Generative approach: example. $p(x \mid C_k) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^T \Sigma^{-1} (x - \mu_k) \right\}$ for $k = 1, 2$, with $p(C_1) = p$ and $p(C_2) = 1 - p$.
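51 Generative approach: example. Same Gaussian class-conditional model and priors as on slide 50. Maximum likelihood estimation on $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$, where $n_1 = \sum_{i=1}^{n} y^{(i)}$ and $n_2 = n - n_1$ are the class counts: $p = \frac{n_1}{n}$; $\mu_1 = \frac{\sum_{i=1}^{n} y^{(i)} x^{(i)}}{n_1}$, $\mu_2 = \frac{\sum_{i=1}^{n} (1 - y^{(i)}) x^{(i)}}{n_2}$; the shared covariance is estimated by $S = \frac{n_1}{n} S_1 + \frac{n_2}{n} S_2$, with $S_k = \frac{1}{n_k} \sum_{i:\, x^{(i)} \in C_k} (x^{(i)} - \mu_k)(x^{(i)} - \mu_k)^T$ for $k = 1, 2$.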
52 Class conditional densities vs. posterior
53 Maximum likelihood on what? Generative models: data likelihood $P(X, Y \mid w)$, which also involves computing $P(X \mid w)$. Discriminative models: conditional data likelihood; learn only $P(Y \mid X, w)$, which is what classification requires, without wasting effort on learning $P(X \mid w)$.
54 Discriminative vs. generative: number of parameters, for a d-dimensional feature space. Logistic regression: $d + 1$ parameters, $w = (w_0, w_1, \dots, w_d)$. Generative approach with Gaussian class-conditional densities and a shared covariance matrix: $2d$ parameters for the means, $d(d + 1)/2$ parameters for the shared covariance matrix, and one parameter for the class prior $p(C_1)$.
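The gap between the two parameter counts grows quadratically with d, which a quick calculation makes concrete. The function names are illustrative and the choice d = 100 is arbitrary.

```python
def logistic_regression_params(d):
    """d weights plus one bias."""
    return d + 1


def shared_cov_gaussian_params(d):
    """Two class means (2d), one shared symmetric covariance (d(d+1)/2), one prior."""
    return 2 * d + d * (d + 1) // 2 + 1


d = 100
print(logistic_regression_params(d))   # 101
print(shared_cov_gaussian_params(d))   # 5251
```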
55 Naïve Bayes classifier. Generative methods have a high number of parameters. Conditional independence assumption: $p(x \mid C_k) = p(x_1 \mid C_k)\, p(x_2 \mid C_k) \cdots p(x_d \mid C_k)$.
56 Naïve Bayes classifier. $p(x \mid C_k) = p(x_1 \mid C_k)\, p(x_2 \mid C_k) \cdots p(x_d \mid C_k)$, so $p(C_k \mid x) \propto p(C_k) \prod_{i=1}^{d} p(x_i \mid C_k)$. Two-class Gaussian Naïve Bayes: 2d + 1 parameters.
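57 Naïve Bayes: discrete example. $p(\text{PlayTennis} = \text{Yes}) = 9/14 = 0.64$; $p(\text{PlayTennis} = \text{No}) = 5/14 = 0.36$; $p(\text{Outlook} = \text{Sunny} \mid \text{PlayTennis} = \text{Yes}) = 3/9$; $p(\text{Outlook} = \text{Sunny} \mid \text{PlayTennis} = \text{No}) = 3/5$.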
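58 Naïve Bayes: discrete example. For $x = (\text{Sunny}, \text{Cool}, \text{High}, \text{Strong})$: $p(\text{Yes} \mid x) \propto p(\text{Yes})\, p(\text{Sunny} \mid \text{Yes})\, p(\text{Cool} \mid \text{Yes})\, p(\text{High} \mid \text{Yes})\, p(\text{Strong} \mid \text{Yes})$ and $p(\text{No} \mid x) \propto p(\text{No})\, p(\text{Sunny} \mid \text{No})\, p(\text{Cool} \mid \text{No})\, p(\text{High} \mid \text{No})\, p(\text{Strong} \mid \text{No})$.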
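59 Bayes optimal classifier. Training data: $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$; H: hypothesis space. $p(C_k \mid x, D) = \sum_{h \in H} p(C_k \mid x, h)\, p(h \mid D)$, where $p(D \mid h) = \prod_{i=1}^{n} p(y^{(i)} \mid x^{(i)}, h)$ and $p(h \mid D) \propto p(D \mid h)\, p(h)$.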
60 Summary of alternatives. Generative: the most demanding, because it finds the joint distribution $p(x, C_k)$; usually needs a large training set to find $p(x \mid C_k)$; can find $p(x)$, which enables outlier or novelty detection. Discriminative: specifies what is really needed (i.e., $p(C_k \mid x)$); more computationally efficient. Discriminant function: does not have many of the capabilities of probabilistic methods.
More informationIntroduction to Machine Learning
Introduction to Machine Learning Bayesian Classification Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationDEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures
More informationLogistic Regression. Seungjin Choi
Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationComments. x > w = w > x. Clarification: this course is about getting you to be able to think as a machine learning expert
Logistic regression Comments Mini-review and feedback These are equivalent: x > w = w > x Clarification: this course is about getting you to be able to think as a machine learning expert There has to be
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I
Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 2012 Engineering Part IIB: Module 4F10 Introduction In
More informationEE 511 Online Learning Perceptron
Slides adapted from Ali Farhadi, Mari Ostendorf, Pedro Domingos, Carlos Guestrin, and Luke Zettelmoyer, Kevin Jamison EE 511 Online Learning Perceptron Instructor: Hanna Hajishirzi hannaneh@washington.edu
More informationAd Placement Strategies
Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad
More informationMidterm, Fall 2003
5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Linear Classifiers. Blaine Nelson, Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers Blaine Nelson, Tobias Scheffer Contents Classification Problem Bayesian Classifier Decision Linear Classifiers, MAP Models Logistic
More informationIntroduction to Bayesian Learning. Machine Learning Fall 2018
Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 2, 2015 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationIntroduction to Machine Learning
Introduction to Machine Learning Thomas G. Dietterich tgd@eecs.oregonstate.edu 1 Outline What is Machine Learning? Introduction to Supervised Learning: Linear Methods Overfitting, Regularization, and the
More informationLogistic Regression: Online, Lazy, Kernelized, Sequential, etc.
Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010
More information