Linear Models for Classification
1 Catherine Lee Anderson (figures courtesy of Christopher M. Bishop). Department of Computer Science, University of Nebraska at Lincoln. CSCE 970: Pattern Recognition and Machine Learning
2 Congratulations! You have just inherited an old house from your great-grand-aunt on your mother's side, twice removed by marriage (and once by divorce). There is an amazing collection of books in the old library (and in almost every other room in the house), containing everything from old leather-bound tomes and crinkly old parchments to newer dust-jacketed best sellers and academic textbooks, along with a sizable collection of paperback pulp fiction and DC comic books. Yep, old Aunt Lacee was a collector, and you have to clean out the old house before you can sell it.
3 Your Mission
Being the overworked (and underpaid) student that you are, you have limited time to spend on this task, but because you spend your leisure time listening to NPR ("The Book Guys") you know that there is money to be made in old books. In other words, you need to quickly determine which books to throw out (or, better still, recycle), which to sell, and which to hang onto as an investment.
4 The Task
From listening to "The Book Guys" you know that many aspects of a book determine its present value, and these will help you decide whether to toss, sell or keep it. These aspects include:
- date published
- author
- topic
- condition
- genre
- presence of dust jacket
- number of volumes known to be published
- etc.
And to your advantage, you have just completed a course in machine learning, so you recognize that what you have is a straightforward classification problem.
5 Outline
1 Defining the problem; Approaches in modeling
2
3 Logistic Regression
4 Modeling conditional class probabilities; Bayes Theorem; Discrete Features
7 Classification
Problem components:
- A group, X, of items x with common characteristics, each item having specific values assigned to these characteristics; values can be nominal, numeric, discrete or continuous.
- A set of disjoint classes into which we wish to place each of the above items.
- A function that assigns each item to one and only one of these disjoint classes.
Classification: assigning each item to one discrete class using a function devised specifically for this purpose.
8 Structure of items
Each item can be represented as a D-dimensional vector, $\mathbf{x} = (x_1, x_2, \ldots, x_D)$, where D is the number of aspects, attributes or value fields used to describe the item.
Aunt Lacee's collection: the items to be classified are books, comics and parchments, each of which has a set of values attached to it (type, title, publish date, genre, condition, ...).
Sample items from Aunt Lacee's collection:
x = {book, Origin of Species, 1872, biology, mint, ...}
x = {parchment, Magna Carta, 1210, history, brittle, ...}
9 Structure of Classes
- A set of K classes, $C = \{c_1, c_2, \ldots, c_K\}$, where each x can belong to only one class.
- The input space is divided into K decision areas, each area corresponding to a class.
- The boundaries of the decision areas are called decision boundaries or decision surfaces.
- In linear classification models these surfaces are linear functions of x.
- In other words, these surfaces are defined by $(D-1)$-dimensional hyperplanes within the D-dimensional input space.
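10 Example: two dimensions, two classes
[Figure: the decision boundary y = 0 in the $(x_1, x_2)$ plane separates region $R_1$ (y > 0) from region $R_2$ (y < 0). The weight vector w, a point x, and the distances $y(\mathbf{x})/\|\mathbf{w}\|$ and $-w_0/\|\mathbf{w}\|$ are marked.]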
11 Structure of Classes
For Aunt Lacee's book collection, K = 3:
- $c_1$ = no value: books with no value, which will be recycled.
- $c_2$ = sell immediately: books with immediate cash value, such as current textbooks and best sellers, which will be sold quickly.
- $c_3$ = keep: these books (or parchments or comics) have museum-quality price tags and require time to place properly (for maximum profit).
Each item of the collection will be assigned to one and only one class. By their very nature, the classes are mutually exclusive.
12 Representation of a K Class Label
Let t be a vector of length K, used to represent a class label. Each element $t_k$ of t is 0 except element i, which is 1 when $\mathbf{x} \in c_i$.
For Aunt Lacee's collection, the values of t are as follows:
- $\mathbf{t}_i = (1, 0, 0)$ indicates $\mathbf{x}_i \in c_1$ and should be recycled.
- $\mathbf{t}_i = (0, 1, 0)$ indicates $\mathbf{x}_i \in c_2$ and should be sold.
- $\mathbf{t}_i = (0, 0, 1)$ indicates $\mathbf{x}_i \in c_3$ and should be kept.
A binary class is a special case, needing only a one-dimensional vector:
- $t = 0$ indicates $\mathbf{x}_i \in c_0$
- $t = 1$ indicates $\mathbf{x}_i \in c_1$
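The 1-of-K label coding can be sketched in a few lines of Python. The class names below are hypothetical stand-ins for $c_1$, $c_2$, $c_3$; they are not part of the slides.

```python
# Hypothetical class names for Aunt Lacee's collection (K = 3).
CLASSES = ["recycle", "sell", "keep"]

def one_hot(label, classes=CLASSES):
    """Return the 1-of-K target vector t for a class label."""
    return [1 if c == label else 0 for c in classes]

def decode(t, classes=CLASSES):
    """Invert the encoding: the position of the single 1 names the class."""
    return classes[t.index(1)]

print(one_hot("sell"))     # [0, 1, 0]
print(decode([0, 0, 1]))   # keep
```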
14 Approaches to the problem
Three approaches to finding the function for our classification problem:
Discriminant functions
- The simplest approach is a function which directly assigns each x to one $c_i \in C$.
Probabilistic
- Separates the inference stage from the decision stage.
- In the inference stage, the conditional probability distribution, $p(c_k \mid \mathbf{x})$, is modeled directly.
- In the decision stage, the class is assigned based on these distributions.
15 Approaches to the problem
Probabilistic Generative Functions
- Both the class conditional probability distribution, $p(\mathbf{x} \mid c_k)$, and the prior probabilities, $p(c_k)$, are modeled and used to compute posterior probabilities using Bayes' theorem:
$$p(c_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid c_k)\, p(c_k)}{p(\mathbf{x})}$$
- This model develops the probability densities of the input space such that new examples can accurately be generated.
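17 Two class problem
$$y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$$
where w is a weight vector of the same dimension D as x, and $w_0$ is the bias; its negative, $-w_0$, is the threshold.
An input vector x is assigned to one of the two classes as follows:
$$\mathbf{x} \in \begin{cases} c_0 & \text{if } y(\mathbf{x}) < 0 \\ c_1 & \text{if } y(\mathbf{x}) \ge 0 \end{cases}$$
The decision boundary will be a hyperplane of $D-1$ dimensions.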
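18 Matrix notation
As a reminder of the convention: vectors are column matrices, where
$$\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_D \end{bmatrix} \quad\text{so}\quad \mathbf{w}^T = \begin{bmatrix} w_1 & w_2 & \cdots & w_D \end{bmatrix}$$
and
$$\mathbf{w}^T \mathbf{x} = \begin{bmatrix} w_1 & w_2 & \cdots & w_D \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_D \end{bmatrix} = w_1 x_1 + w_2 x_2 + \cdots + w_D x_D$$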
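19 Example: two dimensions, two classes
[Figure: the decision boundary y = 0 in the $(x_1, x_2)$ plane separates region $R_1$ (y > 0) from region $R_2$ (y < 0). The weight vector w, a point x, and the distances $y(\mathbf{x})/\|\mathbf{w}\|$ and $-w_0/\|\mathbf{w}\|$ are marked.]
$$y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$$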
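A minimal sketch of the two-class rule, assuming the weights are already known. The values of `w` and `w0` below are made up for illustration; they place the boundary on the line $x_1 + x_2 = 1$.

```python
def y(x, w, w0):
    """Linear discriminant y(x) = w^T x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def classify(x, w, w0):
    """Assign c1 when y(x) >= 0, otherwise c0."""
    return "c1" if y(x, w, w0) >= 0 else "c0"

# Illustrative (assumed) parameters, not taken from the slides.
w, w0 = [1.0, 1.0], -1.0
print(classify([2.0, 2.0], w, w0))  # c1
print(classify([0.0, 0.0], w, w0))  # c0
```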
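20 Multi-class (K > 2)
A K-class discriminant is comprised of K functions of the form
$$y_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}$$
Assign an input vector as follows:
$$\mathbf{x} \in c_k \quad\text{where}\quad k = \operatorname*{argmax}_{j \in \{1, 2, \ldots, K\}} y_j(\mathbf{x})$$
[Figure: decision regions $R_i$, $R_j$ and $R_k$, with points $\mathbf{x}_A$, $\mathbf{x}_B$ and $\hat{\mathbf{x}}$ marked.]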
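The argmax rule can be sketched as follows. The weight matrix `W` (one row per class) and the biases `w0` are assumed values chosen only to make the example concrete.

```python
def discriminants(x, W, w0):
    """Evaluate y_k(x) = w_k^T x + w_k0 for every class k."""
    return [sum(wi * xi for wi, xi in zip(wk, x)) + bk for wk, bk in zip(W, w0)]

def classify(x, W, w0):
    """Return the (0-based) index k with the largest y_k(x)."""
    ys = discriminants(x, W, w0)
    return max(range(len(ys)), key=ys.__getitem__)

# Assumed weights for K = 3 classes in D = 2 dimensions.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
w0 = [0.0, 0.0, 0.0]
print(classify([2.0, 0.5], W, w0))    # 0
print(classify([-1.0, -2.0], W, w0))  # 2
```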
21 Learning parameter w
Three techniques for learning the parameter of the discriminant function, w:
- Least Squares
- Fisher's linear discriminant
- Perceptron
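22 Least Squares
Once again, each class $C_k$ has its own linear model:
$$y_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}$$
As a reminder of the convention: vectors are column matrices,
$$\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_D \end{bmatrix} \quad\text{so}\quad \mathbf{w}^T = \begin{bmatrix} w_1 & w_2 & \cdots & w_D \end{bmatrix}$$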
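23 Least Squares, compact notation
Let $\tilde{W}$ be a $(D+1) \times K$ matrix whose k-th column is the augmented weight vector of class k:
$$\tilde{\mathbf{w}}_k = \begin{bmatrix} w_{k0} \\ w_{k1} \\ \vdots \\ w_{kD} \end{bmatrix}$$
Let $\tilde{\mathbf{x}}$ be the $(D+1)$-dimensional column matrix $(1, \mathbf{x}^T)^T$:
$$\tilde{\mathbf{x}} = \begin{bmatrix} 1 \\ x_1 \\ \vdots \\ x_D \end{bmatrix}$$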
24 Least Squares, compact notation
The individual class discriminant functions $y_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}$ can be written together as
$$\mathbf{y}(\mathbf{x}) = \tilde{W}^T \tilde{\mathbf{x}}$$
25 Least Squares, determining W
$\tilde{W}$ is determined by minimizing a sum-of-squares error function whose form is given as
$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(\mathbf{x}_n, \mathbf{w}) - t_n \}^2$$
Let $\tilde{X}$ be an $N \times (D+1)$ matrix representing a training set of N examples.
Let $T$ be an $N \times K$ matrix representing the targets for the N training examples.
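26 Least Squares, determining W
This yields the expression
$$E_D(\tilde{W}) = \frac{1}{2} \operatorname{Tr}\left\{ (\tilde{X}\tilde{W} - T)^T (\tilde{X}\tilde{W} - T) \right\}$$
To minimize, take the derivative with respect to $\tilde{W}$ and set it to zero to obtain
$$\tilde{W} = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T T = \tilde{X}^{\dagger} T$$
where $\tilde{X}^{\dagger}$ is the pseudo-inverse of $\tilde{X}$, and finally the discriminant function
$$\mathbf{y}(\mathbf{x}) = \tilde{W}^T \tilde{\mathbf{x}} = T^T (\tilde{X}^{\dagger})^T \tilde{\mathbf{x}}$$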
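The closed-form solution can be sketched in plain Python by solving the normal equations $(\tilde{X}^T \tilde{X})\tilde{W} = \tilde{X}^T T$ directly. The helper names (`solve`, `fit_least_squares`, `outputs`, `predict`) and the toy data are assumptions made for illustration, and the sketch assumes $\tilde{X}^T \tilde{X}$ is invertible.

```python
def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fit_least_squares(X, T):
    """Return W~ = (X~^T X~)^{-1} X~^T T, one column per class."""
    Xa = [[1.0] + list(x) for x in X]          # augmented inputs (1, x^T)^T
    d, k = len(Xa[0]), len(T[0])
    XtX = [[sum(r[i] * r[j] for r in Xa) for j in range(d)] for i in range(d)]
    XtT = [[sum(r[i] * t[c] for r, t in zip(Xa, T)) for c in range(k)]
           for i in range(d)]
    return solve(XtX, XtT)

def outputs(W, x):
    """Evaluate y(x) = W~^T x~."""
    xa = [1.0] + list(x)
    return [sum(W[i][c] * xa[i] for i in range(len(xa))) for c in range(len(W[0]))]

def predict(W, x):
    """Index of the largest element of y(x)."""
    ys = outputs(W, x)
    return max(range(len(ys)), key=ys.__getitem__)

# Made-up 1-D training set: class 0 near 0, class 1 near 3, 1-of-K targets.
X = [[0.0], [0.5], [2.5], [3.0]]
T = [[1, 0], [1, 0], [0, 1], [0, 1]]
W = fit_least_squares(X, T)
print(predict(W, [0.2]), predict(W, [2.8]))
print(sum(outputs(W, [1.7])))   # close to 1 with 1-of-K targets
```

For larger problems a linear-algebra library would normally replace the hand-written solver; it is spelled out here only to keep the sketch self-contained.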
27 Least Squares, considerations
Under certain conditions (for example, with 1-of-K target coding), this model has the property that the elements of $\mathbf{y}(\mathbf{x})$ sum to 1 for any value of x.
However, the elements are not constrained to lie in the interval (0, 1): negative numbers and numbers larger than 1 can occur, so the elements cannot be treated as probabilities.
Among other disadvantages, this approach is sensitive to outliers: the squared error also penalizes points that lie far from the boundary on the correct side, so a few such points can pull the boundary away from a good position.
28 Least Squares - Response to outliers
[Figure: a) well-separated classes; b) in the presence of outliers, several examples are misclassified.]
30 Fisher's Linear Discriminant in concept
An approach that reduces the dimensionality of the model by projecting the input vector onto a space of reduced dimension.
Simple example: two-dimensional input vectors projected down to one dimension.
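31 Fisher's Linear Discriminant
Start with a two class problem, $y = \mathbf{w}^T \mathbf{x}$, whose class mean vectors are given as
$$\mathbf{m}_1 = \frac{1}{N_1} \sum_{n \in C_1} \mathbf{x}_n \quad\text{and}\quad \mathbf{m}_2 = \frac{1}{N_2} \sum_{n \in C_2} \mathbf{x}_n$$
Choose w to maximize
$$m_2 - m_1 = \mathbf{w}^T (\mathbf{m}_2 - \mathbf{m}_1)$$
where $m_k = \mathbf{w}^T \mathbf{m}_k$ is the projected mean of class k.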
32 Fisher's Linear Discriminant
[Figure: projection chosen by maximizing the separation of the class means.]
However, the classes still overlap.
33 Fisher's Linear Discriminant
Add the condition of minimizing the within-class variance, which is given as
$$s_k^2 = \sum_{n \in C_k} (y_n - m_k)^2$$
Fisher's criterion is based on maximizing the separation of the class means while minimizing the within-class variance. These two conditions are captured in the ratio of the between-class variance to the within-class variance, given by
$$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2}$$
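34 Fisher's Linear Discriminant
Casting this ratio back into terms of the original frame of reference,
$$J(\mathbf{w}) = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_W \mathbf{w}}$$
where
$$S_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T$$
and
$$S_W = \sum_{n \in C_1} (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T + \sum_{n \in C_2} (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T$$
Take the derivative with respect to w and set it to zero to find the maximum.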
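35 Fisher's Linear Discriminant
Setting the derivative to zero gives
$$(\mathbf{w}^T S_B \mathbf{w})\, S_W \mathbf{w} = (\mathbf{w}^T S_W \mathbf{w})\, S_B \mathbf{w}$$
Only the direction of w is important:
$$\mathbf{w} \propto S_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$$
To make this a discriminant function, a threshold $y_0$ is chosen so that
$$\mathbf{x} \in \begin{cases} C_1 & \text{if } y(\mathbf{x}) \ge y_0 \\ C_2 & \text{if } y(\mathbf{x}) < y_0 \end{cases}$$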
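A small sketch of the Fisher direction for two-dimensional inputs, using the closed-form inverse of the 2 x 2 matrix $S_W$. The data points and function names are made up for illustration, and the sketch assumes $S_W$ is non-singular.

```python
def mean(points):
    """Class mean vector of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, m):
    """Sum of outer products (x - m)(x - m)^T for 2-D points."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                S[i][j] += d[i] * d[j]
    return S

def fisher_direction(c1, c2):
    """Return w proportional to S_W^{-1} (m2 - m1) for 2-D data."""
    m1, m2 = mean(c1), mean(c2)
    S1, S2 = scatter(c1, m1), scatter(c2, m2)
    a, b = S1[0][0] + S2[0][0], S1[0][1] + S2[0][1]
    c, d = S1[1][0] + S2[1][0], S1[1][1] + S2[1][1]
    det = a * d - b * c
    dm = [m2[0] - m1[0], m2[1] - m1[1]]
    # Closed-form inverse of [[a, b], [c, d]] applied to (m2 - m1).
    return [(d * dm[0] - b * dm[1]) / det, (-c * dm[0] + a * dm[1]) / det]

def project(w, x):
    """One-dimensional projection y = w^T x."""
    return w[0] * x[0] + w[1] * x[1]

# Made-up data: two elongated, parallel clusters.
c1 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0]]
c2 = [[4.0, 1.0], [5.0, 2.0], [6.0, 2.0], [7.0, 4.0]]
w = fisher_direction(c1, c2)
print(sorted(project(w, x) for x in c1))
print(sorted(project(w, x) for x in c2))
```

On this toy data the two sets of projected values do not overlap, so any threshold $y_0$ between them separates the classes in the reduced space.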
36 Fisher's Linear Discriminant
Second consideration: minimize the within-class variance.
[Figure: the two classes are nicely separated in the dimensionally reduced space.]
38 Perceptron
This model takes the form
$$y(\mathbf{x}) = f(\mathbf{w}^T \Phi(\mathbf{x}))$$
- where $\Phi$ is a transformation function that creates the feature vector from the input vector. We will use the identity transformation function to begin our discussion.
- where $f(\cdot)$ is given by
$$f(a) = \begin{cases} +1, & a \ge 0 \\ -1, & a < 0 \end{cases}$$
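39 Perceptron - The binary problem
There is a change in target coding: t is now a scalar, taking the value of either 1 or -1. This value is interpreted as the input vector belonging to $C_1$ if $t = 1$, else to $C_2$ when $t = -1$.
In considering w, we want
$$\mathbf{x}_n \in C_1 \Rightarrow \mathbf{w}^T \Phi(\mathbf{x}_n) > 0 \quad\text{and}\quad \mathbf{x}_n \in C_2 \Rightarrow \mathbf{w}^T \Phi(\mathbf{x}_n) < 0$$
which means we want
$$\forall\, \mathbf{x}_n \in X: \quad \mathbf{w}^T \Phi(\mathbf{x}_n)\, t_n > 0$$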
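40 Perceptron - weight update
The perceptron error, where $\mathcal{M}$ is the set of misclassified examples:
$$E_P(\mathbf{w}) = -\sum_{n \in \mathcal{M}} \mathbf{w}^T \Phi_n t_n$$
The perceptron update rule, applied when $\mathbf{x}_n$ is misclassified:
$$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_P(\mathbf{w}) = \mathbf{w}^{(\tau)} + \eta\, \Phi_n t_n$$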
41 Perceptron - example
[Figure: a) a misclassified example; b) w after the update.]
42 Perceptron - example
[Figure: a) the next misclassified example; b) w after the update.]
43 Perceptron - considerations
- The update rule is guaranteed to reduce the error from the specific example it is applied to.
- It is not guaranteed to reduce the error contribution from the other misclassified examples.
- It could change a previously correctly classified example into a misclassified one.
- However, the perceptron convergence theorem guarantees that an exact solution will be found if one exists, that is, if the training data are linearly separable.
- It will find this exact solution in a finite number of steps.
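The update rule can be sketched as a short training loop, assuming the identity feature map with a constant 1 prepended so the bias is part of w. The data, the learning rate `eta` and the epoch cap are made-up choices for illustration.

```python
def train_perceptron(data, eta=1.0, max_epochs=100):
    """Perceptron learning with targets t in {-1, +1}.

    data is a list of (phi, t) pairs; phi already includes a leading 1
    so that the bias is folded into w.
    """
    w = [0.0] * len(data[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for phi, t in data:
            a = sum(wi * pi for wi, pi in zip(w, phi))
            if a * t <= 0:                      # misclassified or on the boundary
                w = [wi + eta * pi * t for wi, pi in zip(w, phi)]
                mistakes += 1
        if mistakes == 0:                       # all examples satisfy w^T phi t > 0
            break
    return w

def predict(w, phi):
    """Step activation f(a): +1 for a >= 0, otherwise -1."""
    a = sum(wi * pi for wi, pi in zip(w, phi))
    return 1 if a >= 0 else -1

# Made-up linearly separable data; the first component is the constant 1.
data = [([1.0, 2.0, 2.0], 1), ([1.0, 3.0, 1.0], 1),
        ([1.0, -1.0, -1.0], -1), ([1.0, -2.0, 0.0], -1)]
w = train_perceptron(data)
print(w)                                             # [1.0, 2.0, 2.0]
print(all(predict(w, phi) == t for phi, t in data))  # True
```

The epoch cap matters in practice: on data that are not linearly separable the loop would otherwise never terminate.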
44 review We have seen
46 A Logit - What is it when it's at home
- A logit is simply the natural log of the odds.
- Odds are simply the ratio of two probabilities.
- In a binary classification problem, the two posterior probabilities sum to 1.
- If $p(c_1 \mid \mathbf{x})$ is the probability that x belongs to $c_1$, then $p(c_2 \mid \mathbf{x}) = 1 - p(c_1 \mid \mathbf{x})$.
- So the odds are
$$\text{odds} = \frac{p(c_1 \mid \mathbf{x})}{1 - p(c_1 \mid \mathbf{x})}$$
47 A Logit - What benefits
Example: if an individual is 6 foot tall, then according to census data the probability that the individual is male is 0.9. This makes the probability of being female 1 - 0.9 = 0.1.
The odds on being male are 0.9/0.1 = 9. However, the odds on being female are 0.1/0.9 ≈ 0.11.
The lack of symmetry is unappealing. Intuition would prefer the odds on being female to be the opposite of the odds on being male.
48 A Logit - linear model
The natural log supplies this symmetry:
$$\ln(9.0) \approx 2.197 \qquad \ln(0.1/0.9) \approx -2.197$$
Now, if we assume that the logit is linear with respect to x, we have
$$\operatorname{logit}(P) = \ln\left(\frac{P}{1 - P}\right) = a + Bx$$
where a and B are parameters.
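The asymmetry of the odds and the symmetry of the logit can be checked numerically. This is only a sketch of the arithmetic in the height example; the function names are illustrative.

```python
import math

def odds(p):
    """Ratio of the probability of an event to that of its complement."""
    return p / (1.0 - p)

def logit(p):
    """Natural log of the odds."""
    return math.log(odds(p))

p_male = 0.9
print(odds(p_male), odds(1 - p_male))     # roughly 9 and 0.11
print(logit(p_male), logit(1 - p_male))   # roughly +2.197 and -2.197
```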
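49 From logit to sigmoid
Starting from
$$\operatorname{logit}(P) = \ln\left(\frac{P}{1 - P}\right) = a + Bx$$
exponentiate both sides:
$$\frac{P}{1 - P} = e^{(a + Bx)}$$
$$P = (1 - P)\, e^{(a + Bx)} = e^{(a + Bx)} - P e^{(a + Bx)}$$
$$P + P e^{(a + Bx)} = e^{(a + Bx)}$$
$$P \left(1 + e^{(a + Bx)}\right) = e^{(a + Bx)}$$
$$P = \frac{e^{(a + Bx)}}{1 + e^{(a + Bx)}} = \frac{1}{1 + e^{-(a + Bx)}}$$
where a sets the log-odds (and hence the probability) when x is zero, and B adjusts the rate at which the probability changes with x.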
50 The sigmoid
- Sigmoid means S-shaped.
- It is also called a squashing function because it maps a very large domain into the relatively small interval (0, 1).
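A minimal sketch of the sigmoid as the inverse of the logit. The two-branch form is an implementation choice (not from the slides) that avoids overflow in `math.exp` for large negative inputs; in floating point the result can saturate to exactly 0.0 or 1.0 even though the mathematical range is the open interval (0, 1).

```python
import math

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a)), written to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

for a in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(a, sigmoid(a))
print(sigmoid(logit(0.9)))   # recovers approximately 0.9
```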
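51 The model
The posterior probability of $C_1$ can be written
$$p(c_1 \mid \Phi) = y(\Phi) = \sigma(\mathbf{w}^T \Phi) = \frac{1}{1 + e^{-\mathbf{w}^T \Phi}}$$
w must be learned by adjusting its M components (the feature vector has length M).
Weight update:
$$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n$$
where
$$\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\, \Phi_n \quad\text{and}\quad \nabla E_n = (y_n - t_n)\, \Phi_n$$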
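52 Maximum Likelihood
Maximum likelihood works with the probability $p(\mathbf{t} \mid \mathbf{w})$, which is read as the probability of the observed data set given the parameter vector w.
This can be calculated by taking the product, over every $\mathbf{x}_n$ in the data set D, of the individual probabilities that the class assigned to $\mathbf{x}_n$ agrees with $t_n$:
$$p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} p(c_n = t_n \mid \mathbf{x}_n)$$
where $t_n \in \{0, 1\}$ and
$$p(c_n = t_n \mid \mathbf{x}_n) = \begin{cases} p(c_1 \mid \Phi_n) & \text{if } c_n = 1 \\ 1 - p(c_1 \mid \Phi_n) & \text{if } c_n = 0 \end{cases}$$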
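53 Maximum Likelihood
Since the target is either 1 or 0, this allows for a mathematically convenient expression for this product:
$$p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} \left(p(c_1 \mid \Phi_n)\right)^{t_n} \left(1 - p(c_1 \mid \Phi_n)\right)^{(1 - t_n)}$$
From $p(c_1 \mid \Phi) = y(\Phi) = \sigma(\mathbf{w}^T \Phi)$, with $y_n = y(\Phi_n)$:
$$p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} (1 - y_n)^{(1 - t_n)}$$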
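54 Maximum Likelihood and error
The negative log of the likelihood function is
$$E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^{N} \left( t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \right)$$
The gradient of this is
$$\nabla E(\mathbf{w}) = \frac{d}{d\mathbf{w}} \left( -\ln p(\mathbf{t} \mid \mathbf{w}) \right)$$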
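55 Maximum Likelihood and error
Using $\dfrac{dy_n}{d\mathbf{w}} = y_n (1 - y_n)\, \Phi_n$:
$$\nabla E(\mathbf{w}) = -\sum_{n=1}^{N} \frac{d}{d\mathbf{w}} \left( t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \right)$$
$$= -\sum_{n=1}^{N} \left( \frac{t_n}{y_n} \frac{dy_n}{d\mathbf{w}} + \frac{(1 - t_n)}{(1 - y_n)} \frac{d(1 - y_n)}{d\mathbf{w}} \right)$$
$$= -\sum_{n=1}^{N} \left( \frac{t_n}{y_n}\, \Phi_n y_n (1 - y_n) + \frac{(1 - t_n)}{(1 - y_n)} \left( -\Phi_n y_n (1 - y_n) \right) \right)$$
$$= -\sum_{n=1}^{N} (t_n - t_n y_n - y_n + t_n y_n)\, \Phi_n$$
$$= \sum_{n=1}^{N} (y_n - t_n)\, \Phi_n$$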
56 Logistic regression model
The model based on maximum likelihood:
$$p(c_1 \mid \Phi) = y(\Phi) = \sigma(\mathbf{w}^T \Phi) = \frac{1}{1 + e^{-\mathbf{w}^T \Phi}}$$
Weight update based on the gradient of the negative log likelihood:
$$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n = \mathbf{w}^{(\tau)} - \eta\, (y_n - t_n)\, \Phi_n$$
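The per-example update $\mathbf{w} \leftarrow \mathbf{w} - \eta (y_n - t_n) \Phi_n$ can be sketched as a stochastic gradient descent loop. The data, the learning rate `eta` and the number of epochs are assumptions chosen for illustration, and the feature vector is taken to be the input with a constant 1 prepended.

```python
import math

def sigmoid(a):
    """Overflow-safe logistic sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def train_logistic(data, eta=0.1, epochs=200):
    """Stochastic gradient descent on the negative log likelihood.

    data is a list of (phi, t) pairs with t in {0, 1}; phi includes a
    leading 1 for the bias.
    """
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for phi, t in data:
            y = sigmoid(sum(wi * pi for wi, pi in zip(w, phi)))
            # w <- w - eta * (y_n - t_n) * phi_n
            w = [wi - eta * (y - t) * pi for wi, pi in zip(w, phi)]
    return w

def prob_c1(w, x):
    """Posterior p(c1 | x) for a scalar input x."""
    return sigmoid(w[0] + w[1] * x)

# Made-up 1-D data: class 1 for larger x.
data = [([1.0, 0.0], 0), ([1.0, 1.0], 0), ([1.0, 3.0], 1), ([1.0, 4.0], 1)]
w = train_logistic(data)
print(round(prob_c1(w, 0.5), 3), round(prob_c1(w, 3.5), 3))
```

Because this toy data set is linearly separable, the weights keep growing with more epochs; the fixed epoch count acts as a crude stopping rule.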
57 A new model
The model based on iterative reweighted least squares:
$$p(c_1 \mid \Phi) = y(\Phi) = \sigma(\mathbf{w}^T \Phi) = \frac{1}{1 + e^{-\mathbf{w}^T \Phi}}$$
Weight update based on a Newton-Raphson iterative optimization scheme:
$$\mathbf{w}^{\text{new}} = \mathbf{w}^{\text{old}} - H^{-1} \nabla E(\mathbf{w})$$
The Hessian, H, is a matrix whose elements are the second derivatives of $E(\mathbf{w})$ with respect to w.
This is a numerical analysis technique that is an alternative to the gradient-based update covered first. The trade-off is faster convergence at the cost of more computationally expensive steps.
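A sketch of the Newton-Raphson update for a two-parameter model (bias plus one input), so that the 2 x 2 Hessian can be inverted in closed form. It uses the standard result $H = \sum_n y_n (1 - y_n) \Phi_n \Phi_n^T$, which the slides do not state explicitly. The data are made up and deliberately not linearly separable, since on separable data the weights would grow without bound.

```python
import math

def sigmoid(a):
    """Overflow-safe logistic sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def newton_step(w, data):
    """One update w - H^{-1} grad E(w); also returns the gradient at the old w."""
    g = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for phi, t in data:
        y = sigmoid(w[0] * phi[0] + w[1] * phi[1])
        r = y * (1.0 - y)                      # weighting term y_n (1 - y_n)
        for i in range(2):
            g[i] += (y - t) * phi[i]
            for j in range(2):
                H[i][j] += r * phi[i] * phi[j]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    step = [(H[1][1] * g[0] - H[0][1] * g[1]) / det,
            (-H[1][0] * g[0] + H[0][0] * g[1]) / det]
    return [w[0] - step[0], w[1] - step[1]], g

# Made-up overlapping 1-D data: inputs 0..5 with a constant 1 prepended.
data = [([1.0, float(x)], t) for x, t in zip(range(6), [0, 0, 1, 0, 1, 1])]
w = [0.0, 0.0]
for _ in range(10):
    w, g = newton_step(w, data)
print([round(v, 3) for v in w])
print(-w[0] / w[1])   # decision boundary, where p(c1 | x) = 0.5
```

Each step here costs a Hessian build and a linear solve, which is the extra expense the slide refers to; in exchange, the gradient shrinks very quickly once the iterate is near the optimum.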
59 Probabilistic Generative Models: The approach
- This approach tends to be more computationally expensive.
- The training data, and any information on the distribution of the training data within the input space, are used to model the class conditional probabilities.
- Then, using Bayes' theorem, the posterior probability is calculated.
- The label is decided by choosing the class with the maximum posterior probability.
60 Modeling class conditional probabilities with prior probabilities
- The class conditional probability is given by $p(\mathbf{x} \mid c_k)$ and is read as the probability of x given the class $c_k$.
- The prior probability $p(c_k)$ is the probability of $c_k$ independent of any other variable.
- The joint probability is $p(\mathbf{x}_n, c_1) = p(c_1)\, p(\mathbf{x}_n \mid c_1)$.
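61 Specific case of Binary label
Let $t_n = 1$ denote $c_1$ and $t_n = 0$ denote $c_2$.
Let $p(c_1) = \pi$, so $p(c_2) = 1 - \pi$.
Let each class have a Gaussian class-conditional density with a shared covariance matrix:
$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$$
where $\boldsymbol{\mu}$ is a D-dimensional mean vector, $\Sigma$ is a $D \times D$ covariance matrix, and $|\Sigma|$ is the determinant of $\Sigma$. (In the normalizing constant, $\pi$ is the usual circle constant, not the prior.)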
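62 Specific case of Binary label
The joint probabilities for each class are
$$p(c_1)\, p(\mathbf{x}_n \mid c_1) = \pi\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_1, \Sigma)$$
$$p(c_2)\, p(\mathbf{x}_n \mid c_2) = (1 - \pi)\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_2, \Sigma)$$
The likelihood function is given by
$$p(\mathbf{t} \mid \pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \Sigma) = \prod_{n=1}^{N} \left[ \pi\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_1, \Sigma) \right]^{t_n} \left[ (1 - \pi)\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_2, \Sigma) \right]^{1 - t_n}$$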
63 Specific case of Binary label
The error is the negative log of the likelihood. The terms that depend on $\pi$ are
$$-\sum_{n=1}^{N} \left( t_n \ln \pi + (1 - t_n) \ln(1 - \pi) \right)$$
We minimize this by setting the derivative with respect to $\pi$ to zero and solving for $\pi$:
$$\pi = \frac{1}{N} \sum_{n=1}^{N} t_n = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}$$
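The estimate of $\pi$ is just the fraction of training examples in class $c_1$. The sketch below, on made-up targets, also checks numerically that this value gives a lower error than nearby values.

```python
import math

def estimate_prior(targets):
    """Maximum-likelihood estimate pi = N1 / (N1 + N2) for t_n in {0, 1}."""
    return sum(targets) / len(targets)

def prior_error(pi, targets):
    """Terms of the negative log likelihood that depend on pi."""
    return -sum(t * math.log(pi) + (1 - t) * math.log(1 - pi) for t in targets)

# Made-up targets: 3 examples of c1 (t = 1) and 7 of c2 (t = 0).
targets = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pi = estimate_prior(targets)
print(pi)   # 0.3
print(prior_error(0.25, targets), prior_error(pi, targets), prior_error(0.35, targets))
```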
65 Review of Bayes Theorem
$$P(c_k \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid c_k)\, P(c_k)}{P(\mathbf{x})}$$
$P(\mathbf{x})$ is the prior probability that x will be observed, meaning the probability of x given no knowledge about which $c_k$ is observed.
It can be seen that, with the numerator held fixed, as $P(\mathbf{x})$ increases $P(c_k \mid \mathbf{x})$ decreases: the higher the probability of an observation independent of any other factor, the less evidence that observation provides for any particular class.
66 Review of Bayes Theorem
- $P(\mathbf{x} \mid c_k)$ is the class conditional probability that x will be observed once class $c_k$ is observed.
- Both $P(\mathbf{x} \mid c_k)$ and $P(c_k)$ have been modeled.
- Now $P(c_k \mid \mathbf{x})$, a posterior probability, can be calculated.
- The label is assigned as the class that generates the Maximum A Posteriori (MAP) probability for the input vector:
$$c_{MAP} = \operatorname*{argmax}_{c_k \in C} P(c_k \mid \mathbf{x}) = \operatorname*{argmax}_{c_k \in C} \frac{P(\mathbf{x} \mid c_k)\, P(c_k)}{P(\mathbf{x})}$$
Since $P(\mathbf{x})$ does not depend on $c_k$,
$$c_{MAP} = \operatorname*{argmax}_{c_k \in C} P(\mathbf{x} \mid c_k)\, P(c_k)$$
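A short sketch of the MAP decision for a single item. The priors and likelihoods are made-up numbers, and the class names are the hypothetical labels for Aunt Lacee's collection; the point is that dividing by $P(\mathbf{x})$ rescales the scores but does not change the winner.

```python
def map_class(likelihoods, priors):
    """Return the class maximizing P(x | c_k) P(c_k), plus all posteriors."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())                  # P(x)
    posterior = {c: j / evidence for c, j in joint.items()}
    return max(joint, key=joint.get), posterior

# Made-up numbers for one item x.
priors = {"recycle": 0.7, "sell": 0.2, "keep": 0.1}
likelihoods = {"recycle": 0.01, "sell": 0.05, "keep": 0.02}
label, posterior = map_class(likelihoods, priors)
print(label)       # sell
print(posterior)
```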
69 Discrete feature values
Each x is made up of an ordered set of feature values, $\mathbf{x} = \{a_1, a_2, \ldots, a_i\}$, where i is the number of attributes.
Sample problem: Aunt Lacee's Library
x = {book, Origin of Species, …, biology, mint, ...}
Each attribute has a set of allowed values:
- $a_1 \in$ {book, paperback, parchment, comic}
- $a_3 \in$ {<1200, …, 1960-current}
70 Naïve Bayes assumption
Assume that the attributes are conditionally independent given the class:
$$P(\mathbf{x} \mid c_k) = P(a_1, a_2, \ldots, a_i \mid c_k) = \prod_{j=1}^{i} P(a_j \mid c_k)$$
where any given $P(a_j \mid c_k)$ is the number of instances in the training set with the same $a_j$ value and target value $c_k$, divided by the number of instances with target $c_k$.
$P(c_k)$ is the number of instances with target $c_k$ divided by the total number of instances.
The final label is determined by naïve Bayes:
$$c_{NB} = \operatorname*{argmax}_{c_k \in \{c_1, c_2\}} P(c_k) \prod_{j=1}^{i} P(a_j \mid c_k)$$
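The counting estimates and the argmax rule can be sketched as below. The tiny training set and the class names are made up, the sketch uses three classes rather than two, and no smoothing is applied, so any attribute value never seen with a class drives that class's score to zero.

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (attributes, label). Returns the two count tables."""
    class_counts = Counter(label for _, label in examples)
    attr_counts = defaultdict(Counter)   # (position, label) -> Counter of values
    for attrs, label in examples:
        for j, a in enumerate(attrs):
            attr_counts[(j, label)][a] += 1
    return class_counts, attr_counts

def classify(attrs, class_counts, attr_counts):
    """Return argmax over classes of P(c_k) * prod_j P(a_j | c_k)."""
    n = sum(class_counts.values())
    best, best_score = None, -1.0
    for label, nc in class_counts.items():
        score = nc / n                                   # P(c_k)
        for j, a in enumerate(attrs):
            score *= attr_counts[(j, label)][a] / nc     # P(a_j | c_k)
        if score > best_score:
            best, best_score = label, score
    return best

# Made-up training set: (type, condition) -> decision.
examples = [
    (("book", "mint"), "keep"),
    (("book", "worn"), "sell"),
    (("paperback", "worn"), "recycle"),
    (("paperback", "mint"), "sell"),
    (("comic", "mint"), "keep"),
    (("paperback", "worn"), "recycle"),
]
class_counts, attr_counts = train_naive_bayes(examples)
print(classify(("paperback", "worn"), class_counts, attr_counts))  # recycle
print(classify(("comic", "mint"), class_counts, attr_counts))      # keep
```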
71 Review
Discriminant functions
- Least Squares
Probabilistic
- Logistic Regression
  - With maximum likelihood error approximation
  - With Newton-Raphson approach to error approximation
Probabilistic Generative Functions
- Gaussian class conditional probabilities
- Discrete attribute values with the Naïve Bayes classifier
More informationLinear Models for Classification
Linear Models for Classification Henrik I Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I Christensen (RIM@GT) Linear
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationMultivariate statistical methods and data mining in particle physics
Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationDEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationIntelligent Systems Discriminative Learning, Neural Networks
Intelligent Systems Discriminative Learning, Neural Networks Carsten Rother, Dmitrij Schlesinger WS2014/2015, Outline 1. Discriminative learning 2. Neurons and linear classifiers: 1) Perceptron-Algorithm
More informationLogistic Regression. Seungjin Choi
Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationData Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction