Linear Models for Classification


1 Catherine Lee Anderson (figures courtesy of Christopher M. Bishop), Department of Computer Science, University of Nebraska at Lincoln. CSCE 970: Pattern Recognition and Machine Learning

2 Congratulations!!!! You have just inherited an old house from your great-grand-aunt on your mother's side, twice removed by marriage (and once by divorce). There is an amazing collection of books in the old library (and in almost every other room in the house), containing everything from old leather-bound tomes and crinkly old parchments to newer dust-jacketed best sellers and academic textbooks, along with a sizable collection of paperback pulp fiction and DC comic books. Yep, old Aunt Lacee was a collector, and you have to clean out the old house before you can sell it.

3 Your Mission Being the overworked (and underpaid) student that you are, you have limited time to spend on this task... but because you spend your leisure time listening to NPR ("The Book Guys") you know that there is money to be made in old books. In other words, you need to quickly determine which books to throw out (or better still, recycle), which to sell, and which to hang onto as an investment.

4 The Task From listening to The Book Guys you know that there are many aspects of a book that determine its present value, which will help you decide whether to toss, sell, or keep it. These aspects include: date published, author, topic, condition, genre, presence of dust jacket, number of volumes known to be published, etc. And to your advantage, you have just completed a course in machine learning, so you recognize that what you have is a straightforward classification problem.

5 Outline 1. Defining the problem; Approaches in modeling 2. Discriminant Functions 3. Probabilistic Discriminative Models: Logistic Regression 4. Probabilistic Generative Models: Modeling conditional class probabilities; Bayes' Theorem; Discrete Features

6 Outline 1. Defining the problem; Approaches in modeling 2. Discriminant Functions 3. Probabilistic Discriminative Models: Logistic Regression 4. Probabilistic Generative Models: Modeling conditional class probabilities; Bayes' Theorem; Discrete Features

7 Classification: Problem Components - A group, X, of items x with common characteristics, with specific values assigned to these characteristics; values can be nominal, numeric, discrete, or continuous. - A set of disjoint classes into which we wish to place each of the above items. - A function that assigns each item to one and only one of these disjoint classes. Classification: assigning each item to one discrete class using a function devised specifically for this purpose.

8 Structure of items Each item can be represented as a D-dimensional vector, x = (x_1, x_2, ..., x_D), where D is the number of aspects, attributes, or value fields used to describe the item. Aunt Lacee's Collection - items to be classified are books, comics, and parchments, each of which has a set of values attached to it (type, title, publish date, genre, condition, ...). Sample items from Aunt Lacee's collection: x = (book, Origin of Species, 1872, biology, mint, ...) and x = (parchment, Magna Carta, 1210, history, brittle, ...).

9 Structure of Classes A set of K classes, C = {c_1, c_2, ..., c_K}, where each x can belong to only one class. The input space is divided into K decision regions, each region corresponding to a class. The boundaries of these regions are called decision boundaries or decision surfaces. In linear classification models these surfaces are linear functions of x; in other words, they are defined by (D − 1)-dimensional hyperplanes within the D-dimensional input space.

10 Example: two dimensions, two classes. [Figure: a linear decision boundary y = 0 in the (x_1, x_2) plane separating region R_1 (y > 0) from region R_2 (y < 0); w is normal to the boundary and w_0 sets its offset from the origin.]

11 Structure of Classes For Aunt Lacee's book collection, K = 3: c_1 = no value - books with no value, which will be recycled. c_2 = sell immediately - books with immediate cash value, such as current textbooks and best sellers, which will be sold quickly. c_3 = keep - these books (or parchments or comics) have museum-quality price tags and require time to place properly (for maximum profit). Each item of the collection will be assigned one and only one class; by their very nature, the classes are mutually exclusive.

12 Representation of a K-Class Label Let t be a vector of length K used to represent a class label. Every element t_k of t is 0 except for element i, which is 1 when x ∈ c_i. For Aunt Lacee's collection, the values of t are as follows: t = (1, 0, 0) indicates x ∈ c_1 and should be recycled; t = (0, 1, 0) indicates x ∈ c_2 and should be sold; t = (0, 0, 1) indicates x ∈ c_3 and should be kept. A binary problem is a special case needing only a single-element vector: t = (0) indicates x ∈ c_0, and t = (1) indicates x ∈ c_1.

13 Outline 1. Defining the problem; Approaches in modeling 2. Discriminant Functions 3. Probabilistic Discriminative Models: Logistic Regression 4. Probabilistic Generative Models: Modeling conditional class probabilities; Bayes' Theorem; Discrete Features

14 Approaches to the problem Three approaches to finding the function for our classification problem: Discriminant Functions - the simplest approach, a function which directly assigns each x to one c_i ∈ C. Probabilistic Discriminative Functions - separate the inference stage from the decision stage: in the inference stage the conditional probability distribution p(c_k | x) is modeled directly; in the decision stage a class is assigned based on these distributions.

15 Approaches to the problem Probabilistic Generative Functions - both the class-conditional probability distribution p(x | c_k) and the prior probabilities p(c_k) are modeled and used to compute posterior probabilities via Bayes' theorem: p(c_k | x) = p(x | c_k) p(c_k) / p(x). This model captures the probability densities of the input space well enough that new examples can accurately be generated from it.

16 Outline 1 Defining the problem Approaches in modeling 2 3 Logistic Regression 4 Modeling conditional class probabilities Bayes Theorem Discrete Features

17 Two class problem y(x) = w^T x + w_0, where w is a weight vector of the same dimension D as x, and w_0 is the bias (its negative, −w_0, is called the threshold). An input vector x is assigned to one of the two classes as follows: x ∈ c_1 if y(x) ≥ 0, and x ∈ c_0 if y(x) < 0. The decision boundary is a hyperplane of dimension D − 1.
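As a concrete illustration of this decision rule, here is a minimal Python/NumPy sketch; the weight vector, bias, and test points are made-up values, not from the slides:

import numpy as np

# Hypothetical parameters for a 2-D input space.
w = np.array([1.0, -2.0])   # weight vector, same dimension D as x
w0 = 0.5                    # bias

def classify(x):
    """Assign x to class c1 if y(x) >= 0, else c0."""
    y = w @ x + w0
    return "c1" if y >= 0 else "c0"

print(classify(np.array([3.0, 1.0])))   # y = 1.5  -> c1
print(classify(np.array([0.0, 1.0])))   # y = -1.5 -> c0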

18 Matrix notation As a reminder of the convention, vectors are column matrices: w = (w_1, w_2, ..., w_D)^T, so w^T = (w_1, w_2, ..., w_D) and w^T x = w_1 x_1 + w_2 x_2 + ... + w_D x_D.

19 Example: two dimensions, two classes. [Figure: the same decision-boundary geometry as slide 10, annotated with y(x) = w^T x + w_0.]

20 Multi-class (K > 2) A K-class discriminant comprises K functions of the form y_k(x) = w_k^T x + w_k0. Assign an input vector as follows: x ∈ c_k where k = argmax_{j ∈ {1, 2, ..., K}} y_j(x). [Figure: decision regions R_i, R_j, R_k; any point x̂ on the line between two points x_A and x_B in region R_k also lies in R_k, so the regions are convex.]

21 Learning parameter w Three techniques for learning the parameter w of the discriminant function: Least Squares; Fisher's Linear Discriminant; the Perceptron Algorithm.

22 Least Squares Once again, each class C_k has its own linear model: y_k(x) = w_k^T x + w_k0. As a reminder of the convention, vectors are column matrices: w = (w_1, w_2, ..., w_D)^T, so w^T = (w_1, w_2, ..., w_D).

23 Least Squares, compact notation Let W̃ be a (D + 1) × K matrix whose k-th column is the augmented weight vector w̃_k = (w_k0, w_k1, ..., w_kD)^T. Let x̃ be the (D + 1)-element augmented input column vector x̃ = (1, x^T)^T = (1, x_1, ..., x_D)^T.

24 Least Squares, compact notation The individual class discriminant functions y_k(x) = w_k^T x + w_k0 can then be written jointly as y(x) = W̃^T x̃.

25 Least Squares, determining W̃ W̃ is determined by minimizing a sum-of-squares error function of the form E(w) = (1/2) Σ_{n=1}^N {y(x_n, w) − t_n}². Let X̃ be an N × (D + 1) matrix whose rows are the augmented training examples x̃_n^T, and let T be an N × K matrix whose rows are the target vectors t_n^T for the N training examples.

26 Least Squares, determining W̃ This yields the expression E_D(W̃) = (1/2) Tr{(X̃W̃ − T)^T (X̃W̃ − T)}. To minimize, take the derivative with respect to W̃ and set it to zero, obtaining W̃ = (X̃^T X̃)^{−1} X̃^T T = X̃† T, where X̃† is the pseudo-inverse of X̃. Finally, the discriminant function is y(x) = W̃^T x̃ = T^T (X̃†)^T x̃.
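A sketch of this closed-form solution in NumPy, on synthetic data; the array shapes follow the slide's conventions, but the data and names are hypothetical:

import numpy as np

rng = np.random.default_rng(0)
N, D, K = 100, 2, 3

# Synthetic training set: N examples, D features, K classes (1-of-K targets).
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, D))])  # N x (D+1), bias column of ones
labels = rng.integers(0, K, size=N)
T = np.eye(K)[labels]                                      # N x K target matrix

# W = (X^T X)^{-1} X^T T, computed via the pseudo-inverse.
W = np.linalg.pinv(X) @ T                                  # (D+1) x K

# Discriminant: y(x) = W^T x_tilde; predict the argmax component.
x_tilde = np.array([1.0, 0.3, -1.2])
predicted_class = np.argmax(W.T @ x_tilde)
print(predicted_class)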

27 Least Squares, considerations Under certain conditions this model has the property that the elements of y(x) sum to 1 for any value of x. However, the elements are not constrained to lie in the interval (0, 1) (negative values and values larger than 1 can occur), so they cannot be treated as probabilities. Among other disadvantages, this approach responds badly to outliers.

28 Least Squares - Response to outliers [Figure: (a) two well-separated classes; (b) in the presence of outliers the least-squares boundary shifts, leaving several misclassified examples.]

29 Outline 1. Defining the problem; Approaches in modeling 2. Discriminant Functions 3. Probabilistic Discriminative Models: Logistic Regression 4. Probabilistic Generative Models: Modeling conditional class probabilities; Bayes' Theorem; Discrete Features

30 Fisher s Linear Discriminant in concept 4 An approach that reduces the dimensionality of the model by projecting the input vector to a reduced dimension space. Simple example, two dimensional input vectors, projected down to one

31 Fisher s Linear Discriminant Start with a two class problem: y = w T x whose class mean vectors are given as. m 1 = 1 N 1 Choose w to maximize n C 1 x n and m 2 = 1 N 2 m 2 m 1 = w T (m 2 m 1 ) n C 2 x n

32 Fisher s Linear Discriminant Maximizing the separation of the mean for each class However, classes still overlap

33 Fisher s Linear Discriminant Add the condition of minimizing the within-class variance, which is given as s 2 k = n C k (y n m k ) 2 Fishers criterion is based on the maximization of separation of class mean with minimized within-class variance. These two conditioned are captured in the ratio between the variance of the class means and the within-class variance, given by J(w) = (m 2 m 1 ) 2 s S2 2

34 Fisher s Linear Discriminant Casting this ratio back into terms of the original frame of reference, J(w) = wt S B w w T S B w where S B = (m 2 m 1 )(m 2 m 1 ) T and S w = n C 1 (x n m 1 )(x n m 1 ) T + n C 2 (x n m 2 )(x n m 2 ) T Take derivative with respect to w and set to zero to find minimum.

35 Fisher s Linear Discriminant derivative : (w T S B w)s w w = (w T S w w)s B w Only direction of w is important w S 1 w (m 2 m 1 ) To make this a discriminant function, y 0 is chosen so that { C1 if y(x) y x 0 C 2 if y(x) < y 0

36 Fisher s Linear Discriminant Second consideration: minimize the variance within-class Two classes nicely separated in the dimensionally reduced space

37 Outline 1. Defining the problem; Approaches in modeling 2. Discriminant Functions 3. Probabilistic Discriminative Models: Logistic Regression 4. Probabilistic Generative Models: Modeling conditional class probabilities; Bayes' Theorem; Discrete Features

38 The Perceptron This model takes the form y(x) = f(w^T φ(x)), where φ is a transformation function that creates the feature vector from the input vector (we will use the identity transformation to begin our discussion), and where f(·) is given by f(a) = +1 if a ≥ 0, and −1 if a < 0.

39 The Perceptron - the binary problem There is a change in target coding: t is now a scalar, taking the value +1 or −1. This value is interpreted as the input vector belonging to C_1 if t = +1, and to C_2 if t = −1. In choosing w, we want x_n ∈ C_1 ⇒ w^T φ(x_n) > 0 and x_n ∈ C_2 ⇒ w^T φ(x_n) < 0, which together mean we want w^T φ(x_n) t_n > 0 for all x_n ∈ X.

40 The Perceptron - weight update The perceptron error is E_p(w) = −Σ_{n ∈ M} w^T φ_n t_n, where M is the set of misclassified examples. The perceptron update rule, applied when x_n is misclassified, is w^{(τ+1)} = w^{(τ)} − η ∇E_p(w) = w^{(τ)} + η φ_n t_n.
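A sketch of the resulting training loop in Python, assuming the identity feature map and targets coded as +1/−1; the function and parameter names are illustrative:

import numpy as np

def perceptron_train(Phi, t, eta=1.0, max_epochs=100):
    """Phi: (N, M) feature matrix; t: (N,) targets in {+1, -1}."""
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for phi_n, t_n in zip(Phi, t):
            if w @ phi_n * t_n <= 0:          # misclassified example
                w += eta * phi_n * t_n        # w <- w + eta * phi_n * t_n
                mistakes += 1
        if mistakes == 0:                     # all examples correct: converged
            break
    return w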

41 The Perceptron - example [Figure: (a) a misclassified example; (b) w after the update.]

42 The Perceptron - example [Figure: (a) the next misclassified example; (b) w after the update.]

43 The Perceptron - considerations The update rule is guaranteed to reduce the error contribution from that specific example, but it is not guaranteed to reduce the error contributions of the other misclassified examples; it may even turn a previously correctly classified example into a misclassified one. However, the perceptron convergence theorem guarantees that if an exact solution exists, the algorithm will find one in a finite number of steps.

44 Review We have seen three techniques for learning the parameters of a discriminant function: least squares, Fisher's linear discriminant, and the perceptron algorithm.

45 Outline 1. Defining the problem; Approaches in modeling 2. Discriminant Functions 3. Probabilistic Discriminative Models: Logistic Regression 4. Probabilistic Generative Models: Modeling conditional class probabilities; Bayes' Theorem; Discrete Features

46 Logistic Regression A Logit - what is it when it's at home? A logit is simply the natural log of the odds, and odds are simply the ratio of two probabilities. In a binary classification problem, the two posterior probabilities sum to 1: if p(c_1 | x) is the probability that x belongs to c_1, then p(c_2 | x) = 1 − p(c_1 | x). So the odds are odds = p(c_1 | x) / (1 − p(c_1 | x)).

47 A Logit - what benefits? Example: if an individual is 6 feet tall, then according to census data the probability that the individual is male is 0.9, which makes the probability of being female 0.1. The odds on being male are 0.9/0.1 = 9; however, the odds on being female are 0.1/0.9 ≈ 0.11. The lack of symmetry is unappealing: intuitively we would like the odds on being female to be the opposite of the odds on being male.

48 A Logit - linear model The natural log supplies this symmetry: ln(9) ≈ 2.20 and ln(1/9) ≈ −2.20, equal and opposite. Now, if we assume that the logit is linear with respect to x, we have logit(P) = ln(P / (1 − P)) = a + Bx, where a and B are parameters.

49 From logit to sigmoid From logit(P) = ln(P / (1 − P)) = a + Bx, exponentiate both sides: P / (1 − P) = e^{a+Bx}, so P = (1 − P) e^{a+Bx} = e^{a+Bx} − P e^{a+Bx}. Then P + P e^{a+Bx} = e^{a+Bx}, so P(1 + e^{a+Bx}) = e^{a+Bx}, and finally P = e^{a+Bx} / (1 + e^{a+Bx}) = 1 / (1 + e^{−(a+Bx)}), where a sets the odds when x is zero and B adjusts the rate at which the probability changes with x.
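A quick numeric check of this algebra in Python, with made-up values for a, B, and x:

import math

a, B, x = -1.0, 2.0, 0.8
logit = a + B * x                                    # log-odds
P = math.exp(logit) / (1 + math.exp(logit))          # first form
assert abs(P - 1 / (1 + math.exp(-logit))) < 1e-12   # both forms agree
print(P)   # probability recovered from the log-odds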

50 The sigmoid Sigmoid means S-shaped. It is also called a squashing function because it maps a very large domain into the relatively small interval (0, 1).

51 The model The posterior probability of C_1 can be written p(C_1 | φ) = y(φ) = σ(w^T φ) = 1 / (1 + e^{−w^T φ}). w must be learned by adjusting its M components (the feature vector has length M). Weight update: w^{(τ+1)} = w^{(τ)} − η ∇E_n, where ∇E(w) = Σ_{n=1}^N (y_n − t_n) φ_n and ∇E_n = (y_n − t_n) φ_n.
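A sketch of fitting this model with batch gradient descent in NumPy, using the gradient above; the function and parameter names are illustrative:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def logistic_fit(Phi, t, eta=0.1, iters=1000):
    """Phi: (N, M) feature matrix; t: (N,) targets in {0, 1}."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        y = sigmoid(Phi @ w)          # y_n = sigma(w^T phi_n)
        grad = Phi.T @ (y - t)        # sum_n (y_n - t_n) phi_n
        w -= eta * grad / len(t)      # gradient descent step
    return w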

52 Maximum Likelihood The likelihood is the probability p(t | w), read as the probability of the observed data set given the parameter vector w. It can be calculated as the product over examples of the probability that the class assigned to each x_n ∈ D agrees with t_n: p(t | w) = Π_{n=1}^N p(c_n = t_n | x_n), where t_n ∈ {0, 1} and p(c_n = t_n | x_n) = p(C_1 | φ_n) if t_n = 1, and 1 − p(C_1 | φ_n) if t_n = 0.

53 Maximum Likelihood Since the target is either 1 or 0, this allows a mathematically convenient expression for the product: p(t | w) = Π_{n=1}^N (p(C_1 | φ_n))^{t_n} (1 − p(C_1 | φ_n))^{1−t_n}. From p(C_1 | φ) = y(φ) = σ(w^T φ), this becomes p(t | w) = Π_{n=1}^N y_n^{t_n} (1 − y_n)^{1−t_n}, where y_n = y(φ_n).

54 Maximum Likelihood and error The negative log of the likelihood function is E(w) = −ln p(t | w) = −Σ_{n=1}^N (t_n ln y_n + (1 − t_n) ln(1 − y_n)). The gradient of this is ∇E(w) = d(−ln p(t | w))/dw.

55 Maximum Likelihood and error ∇E(w) = −Σ_{n=1}^N d/dw (t_n ln y_n + (1 − t_n) ln(1 − y_n)) = −Σ_{n=1}^N [(t_n / y_n) dy_n/dw + ((1 − t_n) / (1 − y_n)) d(1 − y_n)/dw]. Using dy_n/dw = y_n(1 − y_n) φ_n, this becomes −Σ_{n=1}^N (t_n(1 − y_n) − (1 − t_n) y_n) φ_n = −Σ_{n=1}^N (t_n − t_n y_n − y_n + t_n y_n) φ_n = Σ_{n=1}^N (y_n − t_n) φ_n.

56 Logistic regression model The model based on maximum likelihood: p(C_1 | φ) = y(φ) = σ(w^T φ) = 1 / (1 + e^{−w^T φ}). Weight update based on the gradient of the negative log likelihood: w^{(τ+1)} = w^{(τ)} − η ∇E_n = w^{(τ)} − η (y_n − t_n) φ_n.

57 A new model The model based on iterative reweighted least squares: p(C_1 | φ) = y(φ) = σ(w^T φ) = 1 / (1 + e^{−w^T φ}). Weight update based on the Newton-Raphson iterative optimization scheme: w_new = w_old − H^{−1} ∇E(w). The Hessian H is a matrix whose elements are the second derivatives of E(w) with respect to w. This numerical-analysis technique is an alternative to the first one covered; the trade-off is faster convergence at the cost of more computationally expensive steps.
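For logistic regression the gradient is Φ^T(y − t) and the Hessian works out to H = Φ^T R Φ with R = diag(y_n(1 − y_n)), which yields the iteratively reweighted least squares update sketched below; names are illustrative and no safeguard against a singular Hessian is included:

import numpy as np

def irls_fit(Phi, t, iters=10):
    """Newton-Raphson / IRLS for logistic regression (targets in {0, 1})."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        y = 1.0 / (1.0 + np.exp(-(Phi @ w)))
        R = np.diag(y * (1 - y))               # weighting matrix
        grad = Phi.T @ (y - t)                 # gradient of E(w)
        H = Phi.T @ R @ Phi                    # Hessian
        w -= np.linalg.solve(H, grad)          # w_new = w_old - H^{-1} grad
    return w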

58 Outline 1. Defining the problem; Approaches in modeling 2. Discriminant Functions 3. Probabilistic Discriminative Models: Logistic Regression 4. Probabilistic Generative Models: Modeling conditional class probabilities; Bayes' Theorem; Discrete Features

59 Probabilistic Generative Models: The approach This approach tends to be more computationally expensive. The training data, and any information on the distribution of the training data within the input space, are used to model the class-conditional probabilities. Then, using Bayes' theorem, the posterior probability is calculated, and the label decision is made by choosing the maximum posterior probability.

60 Modeling class-conditional probabilities with prior probabilities The class-conditional probability is given by p(x | c_k), read as the probability of x given the class c_k. The prior probability p(c_k) is the probability of c_k independent of any other variable. The joint probability is then p(x_n, c_1) = p(c_1) p(x_n | c_1).

61 Specific case of a binary label Let t_n = 1 denote c_1 and t_n = 0 denote c_2. Let p(c_1) = π, so p(c_2) = 1 − π. Let each class have a Gaussian class-conditional density with a shared covariance matrix: N(x | μ, Σ) = (1 / ((2π)^{D/2} |Σ|^{1/2})) exp{−(1/2)(x − μ)^T Σ^{−1} (x − μ)}, where μ is a D-dimensional mean vector, Σ is a D × D covariance matrix, and |Σ| is the determinant of Σ.

62 Specific case of a binary label The joint probabilities for each class are p(c_1) p(x_n | c_1) = π N(x_n | μ_1, Σ) and p(c_2) p(x_n | c_2) = (1 − π) N(x_n | μ_2, Σ). The likelihood function is given by p(t | π, μ_1, μ_2, Σ) = Π_{n=1}^N [π N(x_n | μ_1, Σ)]^{t_n} [(1 − π) N(x_n | μ_2, Σ)]^{1−t_n}.

63 Specific case of a binary label The error is the negative log of the likelihood; the terms that depend on π are −Σ_{n=1}^N (t_n ln π + (1 − t_n) ln(1 − π)). We minimize this by setting the derivative with respect to π to zero and solving for π: π = (1/N) Σ_{n=1}^N t_n = N_1 / N = N_1 / (N_1 + N_2).
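The other maximum-likelihood estimates for this model (the class means and the shared covariance) also have closed forms: each mean is the sample mean of its class, and Σ is the sample-size-weighted average of the per-class covariances. A NumPy sketch, assuming binary targets with t = 1 for c_1:

import numpy as np

def fit_gaussian_generative(X, t):
    """X: (N, D) inputs; t: (N,) binary targets (1 for class c1)."""
    N = len(t)
    X1, X2 = X[t == 1], X[t == 0]
    pi = X1.shape[0] / N                     # pi = N1 / N
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Shared covariance: pooled scatter of both classes divided by N.
    S1 = (X1 - mu1).T @ (X1 - mu1)
    S2 = (X2 - mu2).T @ (X2 - mu2)
    Sigma = (S1 + S2) / N
    return pi, mu1, mu2, Sigma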

64 Outline 1. Defining the problem; Approaches in modeling 2. Discriminant Functions 3. Probabilistic Discriminative Models: Logistic Regression 4. Probabilistic Generative Models: Modeling conditional class probabilities; Bayes' Theorem; Discrete Features

65 Review of Bayes' Theorem P(c_k | x) = P(x | c_k) P(c_k) / P(x). P(x) is the prior probability that x will be observed, meaning the probability of x given no knowledge about which c_k holds. Since P(x) appears in the denominator, a larger P(x) yields a smaller P(c_k | x): the more probable an observation is on its own, the less evidence it provides for any particular class.

66 Review of Bayes' Theorem P(x | c_k) is the class-conditional probability that x will be observed given class c_k. Both P(x | c_k) and P(c_k) have been modeled, so the posterior probability P(c_k | x) can now be calculated. The label assigned is the class that generates the maximum a posteriori (MAP) probability for the input vector: c_MAP = argmax_{c_k ∈ C} P(c_k | x) = argmax_{c_k ∈ C} P(x | c_k) P(c_k) / P(x) = argmax_{c_k ∈ C} P(x | c_k) P(c_k), since P(x) does not depend on c_k.

68 Outline 1. Defining the problem; Approaches in modeling 2. Discriminant Functions 3. Probabilistic Discriminative Models: Logistic Regression 4. Probabilistic Generative Models: Modeling conditional class probabilities; Bayes' Theorem; Discrete Features

69 Discrete feature values Each x is made up of an ordered set of feature values: x = (a_1, a_2, ..., a_i), where i is the number of attributes. Sample problem, Aunt Lacee's library: x = (book, Origin of Species, ..., biology, mint, ...). Each attribute has a set of allowed values, e.g. a_1 ∈ {book, paperback, parchment, comic} and a_3 ∈ {<1200, ..., 1960-current}.

70 Naïve Bayes assumption Assume that the attributes are conditionally independent given the class: P(x | c_k) = P(a_1, a_2, ..., a_i | c_k) = Π_i P(a_i | c_k), where any given P(a_i | c_k) is the number of training instances with the same a_i value and target value c_k, divided by the number of instances with target c_k. P(c_k) is the number of instances with target c_k divided by the total number of instances. The final label is determined by naïve Bayes as c_NB = argmax_{c_k ∈ {c_1, c_2}} P(c_k) Π_i P(a_i | c_k).
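A sketch of this counting scheme in plain Python; the toy attribute tuples and labels are invented, and Laplace smoothing is omitted for clarity:

from collections import Counter, defaultdict

def naive_bayes_train(examples, labels):
    """examples: list of attribute tuples; labels: list of class labels."""
    class_counts = Counter(labels)
    attr_counts = defaultdict(Counter)   # (attr index, class) -> value counts
    for x, c in zip(examples, labels):
        for i, a in enumerate(x):
            attr_counts[(i, c)][a] += 1
    total = len(labels)

    def predict(x):
        def score(c):
            p = class_counts[c] / total                        # P(c_k)
            for i, a in enumerate(x):
                p *= attr_counts[(i, c)][a] / class_counts[c]  # P(a_i | c_k)
            return p
        return max(class_counts, key=score)                    # c_NB = argmax

    return predict

predict = naive_bayes_train(
    [("book", "mint"), ("comic", "brittle"), ("book", "brittle")],
    ["sell", "keep", "recycle"])
print(predict(("book", "mint")))   # -> "sell"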

71 Review Discriminant functions: Least Squares; Fisher's Linear Discriminant; the Perceptron. Probabilistic Discriminative Functions: Logistic Regression, with maximum-likelihood error minimization or with the Newton-Raphson approach. Probabilistic Generative Functions: Gaussian class-conditional probabilities; discrete attribute values with the naïve Bayes classifier.
