Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
|
|
- Nicholas Baker
- 6 years ago
- Views:
Transcription
1 Last updated: Oct 22, 2012 LINEAR CLASSIFIERS
2 Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class.
3 Classification: Problem Statement 3 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification, the input variable x may still be continuous, but the target variable is discrete. In the simplest case, t can have only 2 values. e.g., Let t = +1 x assigned to C 1 t = 1 x assigned to C 2
4 Example Problem 4 Animal or Vegetable?
5 Discriminative Classifiers 5 If the conditional distributions are normal, the best thing to do is to estimate the parameters of these distributions and use Bayesian decision theory to classify input vectors. Decision boundaries are generally quadratic. However if the conditional distributions are not exactly normal, this generative approach will yield sub-optimal results. Another alternative is to build a discriminative classifier, that focuses on modeling the decision boundary directly, rather than the conditional distributions themselves.
6 Linear Models for Classification 6 Linear models for classification separate input vectors into classes using linear (hyperplane) decision boundaries. Example: 2D Input vector x Two discrete classes C 1 and C x x 1
7 Two Class Discriminant Function 7 y>0 y =0 y<0 x 2 R 2 R 1 y(x) = w t x +w 0 y(x) 0 x assigned to C 1 w x y(x) w y(x) < 0 x assigned to C 2 Thus y(x) = 0 defines the decision boundary x w 0 w x 1
8 Two-Class Discriminant Function 8 y(x) = w t x +w 0 y(x) 0 x assigned to C 1 y(x) < 0 x assigned to C 2 y>0 y =0 y<0 x 2 R 2 R 1 For convenience, let w = w 1 w M t w 0 w 1 w M and t w x x y(x) w x = x 1 x M t 1 x1 x M t w 0 w x 1 So we can express y(x) = w t x
9 Generalized Linear Models 9 For classification problems, we want y to be a predictor of t. In other words, we wish to map the input vector into one of a number of discrete classes, or to posterior probabilities that lie between 0 and 1. For this purpose, it is useful to elaborate the linear model by introducing a nonlinear activation function f, which typically will constrain y to lie between -1 and 1 or between 0 and 1. y(x) = f ( w t x + w ) 0
10 The Perceptron 10 y(x) = f ( w t x + w ) 0 y(x) 0 x assigned to C 1 y(x) < 0 x assigned to C 2 A classifier based upon this simple generalized linear model is called a (single layer) perceptron. It can also be identified with an abstracted model of a neuron called the McCulloch Pitts model.
11 End of Lecture Oct 15, 2012
12 Parameter Learning 12 How do we learn the parameters of a perceptron?
13 Outline 13 The Perceptron Algorithm Least-Squares Classifiers Fisher s Linear Discriminant Logistic Classifiers
14 Case 1. Linearly Separable Inputs 14 For starters, let s assume that the training data is in fact perfectly linearly separable. In other words, there exists at least one hyperplane (one set of weights) that yields 0 classification error. We seek an algorithm that can automatically find such a hyperplane.
15 The Perceptron Algorithm 15 The perceptron algorithm was invented by Frank Rosenblatt (1962). The algorithm is iterative. The strategy is to start with a random guess at the weights w, and to then iteratively change the weights to move the hyperplane in a direction that lowers the classification error. Frank Rosenblatt ( )
16 The Perceptron Algorithm 16 Note that as we change the weights continuously, the classification error changes in discontinuous, piecewise constant fashion. Thus we cannot use the classification error per se as our objective function to minimize. What would be a better objective function?
17 The Perceptron Criterion 17 Note that we seek w such that w t x 0 when t = +1 w t x < 0 when t = 1 In other words, we would like w t x n t n 0 n Thus we seek to minimize E P ( w) = w t x n t n n M where M is the set of misclassified inputs.
18 The Perceptron Criterion 18 E P ( w) = w t x n t n n M where M is the set of misclassified inputs. Observations: E P (w) is always non-negative. E P (w) is continuous and piecewise linear, and thus easier to minimize. E P ( w) w i
19 The Perceptron Algorithm 19 E P ( w) = w t x n t n n M where M is the set of misclassified inputs. ( ) de P w dw = x t n n n M where the derivative exists. Gradient descent: E P ( w) w τ +1 = w τ η E P ( w) = w τ + η x n t n n M w i
20 The Perceptron Algorithm 20 w τ +1 = w τ η E P ( w) = w t + η x n t n n M Why does this make sense? If an input from C 1 (t = +1) is misclassified, we need to make its projection on w more positive. If an input from C 2 (t = -1) is misclassified, we need to make its projection on w more negative.
21 The Perceptron Algorithm 21 The algorithm can be implemented sequentially: Repeat until convergence: n For each input (x n, t n ): n If it is correctly classified, do nothing n If it is misclassified, update the weight vector to be w τ +1 = w τ + ηx n t n n Note that this will lower the contribution of input n to the objective function: ( ) t (τ +1) x n t n ( w ) t x n t n = ( w ) (τ ) t x n t n η x n t n w (τ ) ( ) t x n t n < w (τ ) ( ) t x n t n.
22 Not Monotonic 22 While updating with respect to a misclassified input n will lower the error for that input, the error for other misclassified inputs may increase. Also, new inputs that had been classified correctly may now be misclassified. The result is that the perceptron algorithm is not guaranteed to reduce the total error monotonically at each stage.
23 The Perceptron Convergence Theorem 23 Despite this non-monotonicity, if in fact the data are linearly separable, then the algorithm is guaranteed to find an exact solution in a finite number of steps (Rosenblatt, 1962).
24 Example
25 The First Learning Machine 25 Mark 1 Perceptron Hardware (c. 1960) Visual Inputs Patch board allowing configuration of inputsφ Rack of adaptive weights w (motor-driven potentiometers)
26 Practical Limitations 26 The Perceptron Convergence Theorem is an important result. However, there are practical limitations: Convergence may be slow If the data are not separable, the algorithm will not converge. We will only know that the data are separable once the algorithm converges. The solution is in general not unique, and will depend upon initialization, scheduling of input vectors, and the learning rate η.
27 Generalization to inputs that are not linearly separable. 27 The single-layer perceptron can be generalized to yield good linear solutions to problems that are not linearly separable. Example: The Pocket Algorithm (Gal 1990) Idea: n Run the perceptron algorithm n Keep track of the weight vector w* that has produced the best classification error achieved so far. n It can be shown that w* will converge to an optimal solution with probability 1.
28 Generalization to Multiclass Problems 28 How can we use perceptrons, or linear classifiers in general, to classify inputs when there are K > 2 classes?
29 K>2 Classes 29 Idea #1: Just use K-1 discriminant functions, each of which separates one class C k from the rest. (Oneversus-the-rest classifier.) Problem: Ambiguous regions? R 1 R 2 C 1 R 3 not C 1 C 2 not C 2
30 K>2 Classes 30 Idea #2: Use K(K-1)/2 discriminant functions, each of which separates two classes C j, C k from each other. (One-versus-one classifier.) Each point classified by majority vote. Problem: Ambiguous regions C 1 C 3 R 1 C 1? R 3 C 2 R 2 C 3 C 2
31 K>2 Classes 31 Idea #3: Use K discriminant functions y k (x) Use the magnitude of y k (x), not just the sign. y k (x) = w k t x x assigned to C k if y k (x) > y j (x) j k Decision boundary y k (x) = y j (x) ( w k w j ) t x + ( w k 0 w j 0 ) = 0 Results in decision regions that are simply-connected and convex. R i R j R k x B x A x
32 Example: Kesler s Construction 32 The perceptron algorithm can be generalized to K- class classification problems. Example: Kesler s Construction: n Allows use of the perceptron algorithm to simultaneously learn K separate weight vectors w i. n Inputs are then classified in Class i if and only if w t i x > w t j x j i n The algorithm will converge to an optimal solution if a solution exists, i.e., if all training vectors can be correctly classified according to this rule.
33 1-of-K Coding Scheme 33 When there are K>2 classes, target variables can be coded using the 1-of-K coding scheme: Input from Class C i t = [ ] t Element i
34 Computational Limitations of Perceptrons 34 Initially, the perceptron was thought to be a potentially powerful learning machine that could model human neural processing. However, Minsky & Papert (1969) showed that the singlelayer perceptron could not learn a simple XOR function. This is just one example of a non-linearly separable pattern that cannot be learned by a single-layer perceptron. x 2 x 1 Marvin Minsky ( )
35 Multi-Layer Perceptrons 35 Minsky & Papert s book was widely misinterpreted as showing that artificial neural networks were inherently limited. This contributed to a decline in the reputation of neural network research through the 70s and 80s. However, their findings apply only to single-layer perceptrons. Multilayer perceptrons are capable of learning highly nonlinear functions, and are used in many practical applications.
36 Outline 36 The Perceptron Algorithm Least-Squares Classifiers Fisher s Linear Discriminant Logistic Classifiers
37 Dealing with Non-Linearly Separable Inputs 37 The perceptron algorithm fails when the training data are not perfectly linearly separable. Let s now turn to methods for learning the parameter vector w of a perceptron (linear classifier) even when the training data are not linearly separable.
38 The Least Squares Method 38 In the least squares method, we simply fit the (x, t) observations with a hyperplane y(x). Note that this is kind of a weird idea, since the t values are binary (when K=2), e.g., 0 or 1. However it can work pretty well.
39 Least Squares: Learning the Parameters 39 Assume D dimensional input vectors x. For each class k 1 K : y k (x) = w k t x + w k 0 y(x) = W t x where x = (1,x t ) t ( ) t W is a (D + 1) K matrix whose kth column is w k = w 0,w k t
40 Learning the Parameters 40 Method #2: Least Squares y(x) = W t x Training dataset ( x n,t n ), n = 1,,N where we use the 1-of-K coding scheme for t n Let T be the N K matrix whose n th row is t n t Let X be the N (D + 1) matrix whose n th row is x n t Let R D ( W ) = X W T Then we define the error as E D ( W ) = 1 2 i,j R ij 2 { ( )} = 1 2 Tr R W D ( ) t R D W Setting derivative wrt W to 0 yields: W = ( X t X ) 1 X t T = X T Recall: Tr ( AB) = B t. A
41 Outline 41 The Perceptron Algorithm Least-Squares Classifiers Fisher s Linear Discriminant Logistic Classifiers
42 Fisher s Linear Discriminant 42 Another way to view linear discriminants: find the 1D subspace that maximizes the separation between the two classes. Let m 1 = 1 x N n, m 2 = 1 x 1 N n 2 n C 1 n C 2 For example, might choose w to maximize w t ( m 2 m 1 ), subject to w = 1 This leads to w m 2 m However, if conditional distributions are not isotropic, this is typically not optimal
43 Fisher s Linear Discriminant 43 Let m 1 = w t m 1, m 2 = w t m 2 be the conditional means on the 1D subspace. Let s k 2 = ( y n m k ) 2 be the within-class variance on the subspace for class C k n C k ( The Fisher criterion is then J(w) = m m 2 1) 2 s s 2 This can be rewritten as J(w) = w t S B w w t S W w where S B = m 2 m 1 and ( )( m 2 m 1 ) t is the between-class variance S W = x n m 1 + x n m 2 is the within-class variance n C 1 ( )( x n m 1 ) t n C 2 J(w) is maximized for w S 1 W ( m 2 m 1 ) ( )( x n m 2 ) t
44 Connection to MVN Maximum Likelihood 44 J(w) is maximized for w S 1 W ( m 2 m 1 ) Recall that if the two distributions are normal with the same covarianceσ, the maximum likelihood classifier is linear, with w Σ 1 ( m 2 m ) 1 Further, note that S W is proportional to the maximum likelihood estimator forσ. Thus FLD is equivalent to assuming MVN distributions with common covariance.
45 45 Connection to Least-Squares Change coding scheme used in least-squares method to t n = N N 1 for C 1 t n = N N 2 for C 2 Then one can show that the ML w satisfies ( ) w S W 1 m 2 m 1
46 End of Lecture October 17, 2012
47 Problems with Least Squares 47 Problem #1: Sensitivity to outliers
48 Problems with Least Squares 48 Problem #2: Linear activation function is not a good fit to binary data. This can lead to problems
49 Outline 49 The Perceptron Algorithm Least-Squares Classifiers Fisher s Linear Discriminant Logistic Classifiers
50 Logistic Regression (K = 2) 50 ( ) p( C 1 φ) = y ( φ) = σ w t φ p( C 2 φ) = 1 p( C 1 φ) p ( C 1 φ) = y ( φ) = σ ( w t φ) where σ (a) = 1 1+ exp( a) w 1 w 2 φ 2 w t φ x φ 1
51 Logistic Regression 51 ( ) = y ( φ) = σ w t φ ( ) = 1 p ( C 1 φ) p C 1 φ p C 2 φ where σ (a) = 1 1+ exp( a) ( ) Number of parameters Logistic regression: M Gaussian model: 2M + 2M(M+1)/2 + 1 = M 2 +3M+1
52 ML for Logistic Regression 52 ( ) t = y n { n 1 y } 1 t n n p t w N where t = t 1,,t N n=1 ( ) t and y n = p ( C 1 φ ) n We define the error function to be E(w) = log p ( t w) Given y n = σ ( a ) n and a n = w t φ n, one can show that E(w) = N ( y n t n )φ n n=1 Unfortunately, there is no closed form solution for w.
53 ML for Logistic Regression: 53 Iterative Reweighted Least Squares Although there is no closed form solution for the ML estimate of w, fortunately, the error function is convex. Thus an appropriate iterative method is guaranteed to find the exact solution. A good method is to use a local quadratic approximation to the log likelihood function (Newton- Raphson update): w (new) = w (old ) H 1 E(w) where H is the Hessian matrix of E(w)
54 The Hessian Matrix H 54 H = w w E(w), i.e., H ij = E(w) w i w j. H ij describes how the i th component of the gradient varies as we move in the w j direction. Let u be any unit vector. Then Hu describes the variation in the gradient as we move in the direction u. u t Hu describes the projection of this variation onto u. Thus u t Hu measures how much the gradient is changing in the u direction as we move in the u direction.
55 ML for Logistic Regression 55 w (new) = w (old ) H 1 E(w) where H is the Hessian matrix of E(w) : H = Φ t RΦ where R is the N N diagonal weight matrix with R nn = y n ( 1 y n ) and Φ is the N M design matrix whose n th row is given by φ n t. (Note that, since R nn 0, R is positive semi-definite, and hence H is positive semi-definite Thus E(w) is convex.) Thus ( ) w new = w (old ) ( Φ t RΦ) 1 Φ t y t See Problem 8.3 in the text!
56 ML for Logistic Regression 56 Iterative Reweighted Least Squares ( ) = y ( φ) = σ w t φ p C 1 φ ( ) w 1 w 2 w t φ
57 Logistic Regression 57 For K>2, we can generalize the activation function by modeling the posterior probabilities as p( C k φ) = y k ( φ) = exp a k ( ) ( ) exp a j where the activations a k are given by a k = w k t φ j
58 Example Least-Squares Logistic
59 Outline 59 The Perceptron Algorithm Least-Squares Classifiers Fisher s Linear Discriminant Logistic Classifiers
LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
LINEAR CLASSIFIERS Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification, the input
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationApril 9, Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá. Linear Classification Models. Fabio A. González Ph.D.
Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá April 9, 2018 Content 1 2 3 4 Outline 1 2 3 4 problems { C 1, y(x) threshold predict(x) = C 2, y(x) < threshold, with threshold
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationMachine Learning. 7. Logistic and Linear Regression
Sapienza University of Rome, Italy - Machine Learning (27/28) University of Rome La Sapienza Master in Artificial Intelligence and Robotics Machine Learning 7. Logistic and Linear Regression Luca Iocchi,
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationIterative Reweighted Least Squares
Iterative Reweighted Least Squares Sargur. University at Buffalo, State University of ew York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative
More informationAdvanced statistical methods for data analysis Lecture 2
Advanced statistical methods for data analysis Lecture 2 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationLinear Models for Classification
Catherine Lee Anderson figures courtesy of Christopher M. Bishop Department of Computer Science University of Nebraska at Lincoln CSCE 970: Pattern Recognition and Machine Learning Congradulations!!!!
More information6.867 Machine Learning
6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of
More informationIntelligent Systems Discriminative Learning, Neural Networks
Intelligent Systems Discriminative Learning, Neural Networks Carsten Rother, Dmitrij Schlesinger WS2014/2015, Outline 1. Discriminative learning 2. Neurons and linear classifiers: 1) Perceptron-Algorithm
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationMulticlass Logistic Regression
Multiclass Logistic Regression Sargur. Srihari University at Buffalo, State University of ew York USA Machine Learning Srihari Topics in Linear Classification using Probabilistic Discriminative Models
More informationThe Perceptron Algorithm
The Perceptron Algorithm Greg Grudic Greg Grudic Machine Learning Questions? Greg Grudic Machine Learning 2 Binary Classification A binary classifier is a mapping from a set of d inputs to a single output
More informationLecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016
Lecture 7 Logistic Regression Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 11, 2016 Luigi Freda ( La Sapienza University) Lecture 7 December 11, 2016 1 / 39 Outline 1 Intro Logistic
More informationBayesian Logistic Regression
Bayesian Logistic Regression Sargur N. University at Buffalo, State University of New York USA Topics in Linear Models for Classification Overview 1. Discriminant Functions 2. Probabilistic Generative
More informationPattern Recognition and Machine Learning. Perceptrons and Support Vector machines
Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationLecture 16: Modern Classification (I) - Separating Hyperplanes
Lecture 16: Modern Classification (I) - Separating Hyperplanes Outline 1 2 Separating Hyperplane Binary SVM for Separable Case Bayes Rule for Binary Problems Consider the simplest case: two classes are
More informationLinear Discrimination Functions
Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach
More informationMark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.
University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x
More informationLinear Classification
Linear Classification Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 23 April 2015 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative
More information3.4 Linear Least-Squares Filter
X(n) = [x(1), x(2),..., x(n)] T 1 3.4 Linear Least-Squares Filter Two characteristics of linear least-squares filter: 1. The filter is built around a single linear neuron. 2. The cost function is the sum
More informationMidterm: CS 6375 Spring 2015 Solutions
Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an
More informationLecture 12. Neural Networks Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 12 Neural Networks 10.12.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationLecture 6. Notes on Linear Algebra. Perceptron
Lecture 6. Notes on Linear Algebra. Perceptron COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture Notes on linear algebra Vectors
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationLecture 9: Large Margin Classifiers. Linear Support Vector Machines
Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation
More informationLinear Models for Classification
Linear Models for Classification Henrik I Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I Christensen (RIM@GT) Linear
More informationLogistic Regression. Seungjin Choi
Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationMachine Learning and Data Mining. Linear classification. Kalev Kask
Machine Learning and Data Mining Linear classification Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ = f(x ; q) Parameters q Program ( Learner ) Learning algorithm Change q
More informationIn the Name of God. Lecture 11: Single Layer Perceptrons
1 In the Name of God Lecture 11: Single Layer Perceptrons Perceptron: architecture We consider the architecture: feed-forward NN with one layer It is sufficient to study single layer perceptrons with just
More informationMachine Learning: The Perceptron. Lecture 06
Machine Learning: he Perceptron Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu 1 McCulloch-Pitts Neuron Function 0 1 w 0 activation / output function 1 w 1 w w
More information18.6 Regression and Classification with Linear Models
18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight
More informationThe classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know
The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is
More informationThe classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA
The Bayes classifier Linear discriminant analysis (LDA) Theorem The classifier satisfies In linear discriminant analysis (LDA), we make the (strong) assumption that where the min is over all possible classifiers.
More informationMachine Learning Support Vector Machines. Prof. Matteo Matteucci
Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationCourse 395: Machine Learning - Lectures
Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture
More informationLogistic Regression. Sargur N. Srihari. University at Buffalo, State University of New York USA
Logistic Regression Sargur N. University at Buffalo, State University of New York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative 1. Fixed basis
More informationLinear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7.
Preliminaries Linear models: the perceptron and closest centroid algorithms Chapter 1, 7 Definition: The Euclidean dot product beteen to vectors is the expression d T x = i x i The dot product is also
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationCOMP 652: Machine Learning. Lecture 12. COMP Lecture 12 1 / 37
COMP 652: Machine Learning Lecture 12 COMP 652 Lecture 12 1 / 37 Today Perceptrons Definition Perceptron learning rule Convergence (Linear) support vector machines Margin & max margin classifier Formulation
More informationSingle layer NN. Neuron Model
Single layer NN We consider the simple architecture consisting of just one neuron. Generalization to a single layer with more neurons as illustrated below is easy because: M M The output units are independent
More informationMulti-layer Neural Networks
Multi-layer Neural Networks Steve Renals Informatics 2B Learning and Data Lecture 13 8 March 2011 Informatics 2B: Learning and Data Lecture 13 Multi-layer Neural Networks 1 Overview Multi-layer neural
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationMax Margin-Classifier
Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationThe perceptron learning algorithm is one of the first procedures proposed for learning in neural network models and is mostly credited to Rosenblatt.
1 The perceptron learning algorithm is one of the first procedures proposed for learning in neural network models and is mostly credited to Rosenblatt. The algorithm applies only to single layer models
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationNonlinear Classification
Nonlinear Classification INFO-4604, Applied Machine Learning University of Colorado Boulder October 5-10, 2017 Prof. Michael Paul Linear Classification Most classifiers we ve seen use linear functions
More informationMachine Learning Lecture 10
Machine Learning Lecture 10 Neural Networks 26.11.2018 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Today s Topic Deep Learning 2 Course Outline Fundamentals Bayes
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationMidterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.
CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic
More informationClassification with Perceptrons. Reading:
Classification with Perceptrons Reading: Chapters 1-3 of Michael Nielsen's online book on neural networks covers the basics of perceptrons and multilayer neural networks We will cover material in Chapters
More informationLinear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x))
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard and Mitch Marcus (and lots original slides by
More informationIntroduction to Neural Networks
Introduction to Neural Networks What are (Artificial) Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning
More informationData Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction
More informationSGN (4 cr) Chapter 5
SGN-41006 (4 cr) Chapter 5 Linear Discriminant Analysis Jussi Tohka & Jari Niemi Department of Signal Processing Tampere University of Technology January 21, 2014 J. Tohka & J. Niemi (TUT-SGN) SGN-41006
More informationMultilayer Perceptron
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4
More information17 Neural Networks NEURAL NETWORKS. x XOR 1. x Jonathan Richard Shewchuk
94 Jonathan Richard Shewchuk 7 Neural Networks NEURAL NETWORKS Can do both classification & regression. [They tie together several ideas from the course: perceptrons, logistic regression, ensembles of
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationVote. Vote on timing for night section: Option 1 (what we have now) Option 2. Lecture, 6:10-7:50 25 minute dinner break Tutorial, 8:15-9
Vote Vote on timing for night section: Option 1 (what we have now) Lecture, 6:10-7:50 25 minute dinner break Tutorial, 8:15-9 Option 2 Lecture, 6:10-7 10 minute break Lecture, 7:10-8 10 minute break Tutorial,
More informationModeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop
Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector
More informationIntroduction To Artificial Neural Networks
Introduction To Artificial Neural Networks Machine Learning Supervised circle square circle square Unsupervised group these into two categories Supervised Machine Learning Supervised Machine Learning Supervised
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationMachine Learning. Linear Models. Fabio Vandin October 10, 2017
Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w
More informationRegression with Numerical Optimization. Logistic
CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204
More informationCOMP9444 Neural Networks and Deep Learning 2. Perceptrons. COMP9444 c Alan Blair, 2017
COMP9444 Neural Networks and Deep Learning 2. Perceptrons COMP9444 17s2 Perceptrons 1 Outline Neurons Biological and Artificial Perceptron Learning Linear Separability Multi-Layer Networks COMP9444 17s2
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationThe Perceptron. Volker Tresp Summer 2016
The Perceptron Volker Tresp Summer 2016 1 Elements in Learning Tasks Collection, cleaning and preprocessing of training data Definition of a class of learning models. Often defined by the free model parameters
More informationVasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks
C.M. Bishop s PRML: Chapter 5; Neural Networks Introduction The aim is, as before, to find useful decompositions of the target variable; t(x) = y(x, w) + ɛ(x) (3.7) t(x n ) and x n are the observations,
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationLearning from Data Logistic Regression
Learning from Data Logistic Regression Copyright David Barber 2-24. Course lecturer: Amos Storkey a.storkey@ed.ac.uk Course page : http://www.anc.ed.ac.uk/ amos/lfd/ 2.9.8.7.6.5.4.3.2...2.3.4.5.6.7.8.9
More informationLECTURE NOTE #NEW 6 PROF. ALAN YUILLE
LECTURE NOTE #NEW 6 PROF. ALAN YUILLE 1. Introduction to Regression Now consider learning the conditional distribution p(y x). This is often easier than learning the likelihood function p(x y) and the
More information5. Discriminant analysis
5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density
More informationStatistical Machine Learning Hilary Term 2018
Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html
More informationNeural Networks and the Back-propagation Algorithm
Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationCOMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
More informationNeural Networks Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav
Neural Networks 30.11.2015 Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav 1 Talk Outline Perceptron Combining neurons to a network Neural network, processing input to an output Learning Cost
More informationArtificial Neural Networks
Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples
More informationLinear Decision Boundaries
Linear Decision Boundaries A basic approach to classification is to find a decision boundary in the space of the predictor variables. The decision boundary is often a curve formed by a regression model:
More informationInf2b Learning and Data
Inf2b Learning and Data Lecture : Single layer Neural Networks () (Credit: Hiroshi Shimodaira Iain Murray and Steve Renals) Centre for Speech Technology Research (CSTR) School of Informatics University
More informationArtificial Neural Networks
Artificial Neural Networks Stephan Dreiseitl University of Applied Sciences Upper Austria at Hagenberg Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Knowledge
More informationPlan. Perceptron Linear discriminant. Associative memories Hopfield networks Chaotic networks. Multilayer perceptron Backpropagation
Neural Networks Plan Perceptron Linear discriminant Associative memories Hopfield networks Chaotic networks Multilayer perceptron Backpropagation Perceptron Historically, the first neural net Inspired
More informationMath for Machine Learning Open Doors to Data Science and Artificial Intelligence. Richard Han
Math for Machine Learning Open Doors to Data Science and Artificial Intelligence Richard Han Copyright 05 Richard Han All rights reserved. CONTENTS PREFACE... - INTRODUCTION... LINEAR REGRESSION... 4 LINEAR
More information