3. Linear discrimination and single-layer neural networks

History: Sept 6, 2014: created. Oct 1, 2014: corrected typos, made Fig. 3.2 visible.

In this section we will treat a special case of two-class classification, namely, linear discrimination. Together with the maths we will introduce a particular conceptual / graphical notation, namely, we cast the classification algorithm as a neural network. Linear discrimination is the basis for more advanced techniques that we will treat later, but already by itself it can be applied in clever ways that make it quite powerful. I follow closely chapter 3 of Bishop's book.

3.1 Linear discrimination: two classes

Recall: toward the end of Section 2 we introduced discriminant functions as monotonically increasing functions f,

(3.1)   $y_i(e) = f(p(e \mid C_i)\, P(C_i))$,

of the class-conditional probability times the prior. For the case of binary discrimination we mentioned that one can introduce a single discriminant function

(3.2)   $y(e) = y_1(e) - y_2(e)$

and decide that e falls into class 1 whenever y(e) > 0. We remarked that it is sometimes easier to learn discriminant functions directly from the training data than to first estimate the class-conditional distributions and then construct the discriminant function from those distributions in a second step. This is the approach we will take in this section: we ignore the connection of discriminant functions with distributions and start directly from a given functional form of the two-class discriminant function (3.2), namely, linear discriminants of the form

(3.3)   $y(x) = w^T x + w_0$,

where w is a weight (column) vector and $w_0$ is a bias. The vector x here is either the raw input data example (if it comes in the form of a real-valued vector, as for instance the 240-dimensional pixel value vectors for our digit images), or some suitably transformed version of the raw data example (for instance, a one-dimensional feature value or a feature vector). For the case of two-dimensional inputs $x = (x_1, x_2)$, linear discriminants can be visualized as in Fig. 3.1. See the figure caption for a geometrical interpretation.

In a neural network interpretation, a linear discriminant corresponds to a network with M + 1 input neurons, where M is the dimension of the inputs x, and a single output neuron, where the output y(x) of the discriminant is read off. The first input neuron $x_0$ receives constant input 1, the remaining input neurons receive the input $x = (x_1, \ldots, x_M)$. The total input is $\tilde{x} = (1, x_1, \ldots, x_M) =: (x_0, x_1, \ldots, x_M)$. The network weights are $(w_0, w_1, \ldots, w_M) = (w_0, w^T) = \tilde{w}^T$. The output of this network is computed in the output neuron by summing up the inputs, weighted by the weights, which gives

(3.4)   $y(x) = \tilde{w}^T \tilde{x} = w_0 + w^T x$.

Fig. 3.1: Geometrical interpretation of a two-class linear discriminant $y(x) = w^T x + w_0$ for two-dimensional inputs x. A hyperplane H defined by y(x) = 0 separates the feature space into two decision regions $R_1$ and $R_2$. The hyperplane has orientation perpendicular to w and distance $w_0 / \|w\|$ to the origin. (Figure after the book by Bishop.)

The notation $y(x) = \tilde{w}^T \tilde{x}$ is often more convenient than (3.3). The network representation of the discriminant is shown in Fig. 3.2.

Figure 3.2: A network representation of a two-class linear discriminant.
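To make the bias-absorption trick concrete, here is a minimal sketch (Python with NumPy; the weight values are made up for illustration) of evaluating a two-class linear discriminant both in the form (3.3) and in the augmented form (3.4):

```python
import numpy as np

# Hypothetical weights for a two-class linear discriminant on 2D inputs
w = np.array([1.5, -0.5])   # weight vector w
w0 = 0.25                   # bias w_0

x = np.array([0.8, 1.2])    # an input pattern

# Form (3.3): y(x) = w^T x + w_0
y_plain = w @ x + w0

# Form (3.4): absorb the bias into augmented weight / input vectors
w_tilde = np.concatenate(([w0], w))     # (w_0, w_1, ..., w_M)
x_tilde = np.concatenate(([1.0], x))    # (1, x_1, ..., x_M)
y_aug = w_tilde @ x_tilde

print(y_plain, y_aug)                   # identical values
print("class 1" if y_aug > 0 else "class 2")
```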

3.2 Linear discrimination: several classes

The two-class case can be extended to K classes by introducing linear discriminant functions $y_k$ for each class k:

(3.5)   $y_k(x) = w_k^T x + w_{k0}$,

assigning an input pattern x to class k if $y_k(x) > y_j(x)$ for all $j \neq k$. Because $y_k(x) > y_j(x)$ iff $y_k(x) - y_j(x) > 0$, the decision boundary between classes k and j is given by

(3.6)   $y_k(x) - y_j(x) = (w_k - w_j)^T x + (w_{k0} - w_{j0}) = 0$.

The network representation of (3.6) is sketched in Fig. 3.3.

Figure 3.3: Representation of multiple linear discriminant functions (outputs $y_1(x), \ldots, y_K(x)$ computed from inputs $x_0 = 1, x_1, \ldots, x_M$).

As before in (3.4), we cover the bias by an additional constant input $x_0$ of unit size and thus may re-write (3.5) as

(3.7)   $y_k(x) = \tilde{w}_k^T \tilde{x}$,

where $\tilde{x} = (1, x_1, \ldots, x_M) =: (x_0, x_1, \ldots, x_M)$ and $(w_{k0}, w_{k1}, \ldots, w_{kM}) = \tilde{w}_k^T$. The decision regions are now regions in $\mathbb{R}^{M+1}$. They have linear hyperplanes as boundaries, as can be seen from (3.6). Furthermore, the decision regions are connected and convex. To see this, consider two points $x_A$ and $x_B$ which both lie in region $R_k$. Any point $\hat{x}$ that lies on a line between $x_A$ and $x_B$ can be written as $\hat{x} = \alpha x_A + (1 - \alpha) x_B$ for some $0 \leq \alpha \leq 1$. From the linearity of the discriminant functions it follows that $y_k(\hat{x}) > y_j(\hat{x})$ for all $j \neq k$. Therefore all $\hat{x}$ between $x_A$ and $x_B$ are in class k, too. This is schematically shown in Fig. 3.4.

Figure 3.4: Convexity and connectedness of linear decision regions.
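As a small illustration of the decision rule behind (3.5)-(3.7), the following sketch (Python/NumPy, with made-up weights) classifies a point by taking the argmax over the K linear discriminant outputs:

```python
import numpy as np

# Hypothetical augmented weight matrix for K = 3 classes and M = 2 inputs:
# row k holds (w_k0, w_k1, w_k2), i.e. bias plus weights of discriminant y_k.
W = np.array([[ 0.1,  1.0,  0.0],
              [-0.2,  0.0,  1.0],
              [ 0.0, -1.0, -1.0]])

def classify(x, W):
    """Assign x to the class k whose discriminant y_k(x) is largest."""
    x_tilde = np.concatenate(([1.0], x))   # augmented input (1, x_1, ..., x_M)
    y = W @ x_tilde                        # all K discriminant values at once
    return int(np.argmax(y)), y

k, y = classify(np.array([0.5, -0.3]), W)
print("winning class:", k, "discriminant values:", y)
```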

3.3 Learning the weights of a linear discriminant

Next we embark on the question of how to learn the weights $\tilde{w}_k$. A very popular (I dare say by far the most popular) method to learn these weights is to re-interpret the classification task as a regression task and then use linear regression to obtain the $\tilde{w}_k$. In more detail, this works as follows.

1. A perfectly working linear discriminant would yield $y_k(x) = \tilde{w}_k^T \tilde{x} = 1$ and $y_j(x) = \tilde{w}_j^T \tilde{x} = 0$ for $j \neq k$ if the input x belongs to class k. Regarded as training data for a regression, the training sample would be a collection

(3.8)   $(x_i, y_i)_{i = 1, \ldots, N}$,

where $y_i = (y_i^1, \ldots, y_i^K)^T$ is a K-dimensional indicator vector that is zero everywhere except at position k, when the input belongs to class k. Seen in this way, the linear discriminant is just a function $f \colon \mathbb{R}^M \to \mathbb{R}^K$ whose k-th component is $y_k(x) = \tilde{w}_k^T \tilde{x}$.

2. As we described in Section 2.6, in order to obtain "optimal" weights we need to fix a loss function which defines, in the first place, what "optimal" means. We opt for the most convenient loss function that exists, the square error:

(3.9)   $SE^{\mathrm{train}} = \sum_{i=1}^{N} \big( \tilde{w}_k^T \tilde{x}_i - y_i(k) \big)^2$,

where $y_i(k)$ is the k-th entry of the K-dimensional indicator vector $y_i$. The sought optimal weights are then given by

(3.10)   $\hat{w}_k = \operatorname{argmin}_{\tilde{w}_k} \sum_{i=1}^{N} \big( \tilde{w}_k^T \tilde{x}_i - y_i(k) \big)^2$.

3. In order to compute the solution of the minimization task (3.10), we first write out the inner product in (3.10) as a sum,

(3.11)   $\sum_{i=1}^{N} \big( \tilde{w}_k^T \tilde{x}_i - y_i(k) \big)^2 = \sum_{i=1}^{N} \Big( \sum_{j=0}^{M} w_{kj}\, x_i^j - y_i(k) \Big)^2$,

and then observe that when (3.11) attains its minimum in the (M+1)-dimensional space of the weights $w_{k0}, w_{k1}, \ldots, w_{kM}$, all the following partial derivatives must be zero:

(3.12)   $\frac{\partial}{\partial w_{k\alpha}} \sum_{i=1}^{N} \Big( \sum_{j=0}^{M} w_{kj}\, x_i^j - y_i(k) \Big)^2 = 0$,   where $\alpha = 0, \ldots, M$.

This gives M+1 linear equations

(3.13)   $0 = \frac{\partial}{\partial w_{k\alpha}} \sum_{i=1}^{N} \Big( \sum_{j=0}^{M} w_{kj}\, x_i^j - y_i(k) \Big)^2 = 2 \sum_{i=1}^{N} \Big( \sum_{j=0}^{M} w_{kj}\, x_i^j - y_i(k) \Big) x_i^{\alpha} = 2 \sum_{i=1}^{N} \big( (\tilde{w}_k^T \tilde{x}_i)\, x_i^{\alpha} - y_i(k)\, x_i^{\alpha} \big)$

for the M+1 unknowns $w_{k0}, w_{k1}, \ldots, w_{kM}$. Writing all training inputs $\tilde{x}_i^T$ as rows into an $N \times (M+1)$ sized data collection matrix X, and all training target outputs $y_i(k)$ into an $N \times 1$ sized data collection vector $y^k$, we can join the M+1 equations (3.13) into a single matrix equation and obtain $0^T = \tilde{w}_k^T X^T X - (y^k)^T X$, or equivalently

(3.14)   $\tilde{w}_k^T R = (y^k)^T X$,

where $R = X^T X$ is the $(M+1) \times (M+1)$ sized correlation matrix of the training inputs $\tilde{x}_i$. By a final assembly step, we join the K equations (3.14) into a single matrix equation

(3.15)   $W R = Y^T X$,

where W is the $K \times (M+1)$ sized network weight matrix which contains $\tilde{w}_k^T$ in its k-th row, and Y is the $N \times K$ sized data collection matrix made from joining all target vectors $y_i^T$ in its rows. Solving (3.15) in a naively straightforward attempt would lead to

(3.16)   $W = Y^T X\, R^{-1}$,

which however would often not work in practice because the correlation matrix R may be singular (then the inverse does not exist), or it might be close to singular ("ill-conditioned"; then computing the inverse is numerically unstable). Especially the latter situation is the rule, not the exception, for most real-life training data X. One remedies this situation by adding a scaled version of the (M+1)-dimensional identity matrix I to R before inverting it:

(3.17)   $W = Y^T X\, (R + c\,I)^{-1}$,

which renders the calculation well-defined and numerically stable.
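As a numerical illustration of (3.15)-(3.17), the following sketch (Python/NumPy, with a small made-up training set) builds the data collection matrices X and Y and computes the regularized weight matrix; the class labels and the value of c are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up training set: N = 6 two-dimensional inputs with class labels in {0, 1, 2}
inputs = rng.normal(size=(6, 2))        # raw inputs x_i, shape N x M
labels = np.array([0, 0, 1, 1, 2, 2])   # class of each x_i
N, M = inputs.shape
K = 3

# Data collection matrix X: row i is the augmented input (1, x_i1, ..., x_iM)
X = np.hstack([np.ones((N, 1)), inputs])          # shape N x (M+1)

# Data collection matrix Y: row i is the K-dimensional indicator vector y_i
Y = np.zeros((N, K))
Y[np.arange(N), labels] = 1.0

# Regularized solution (3.17): W = Y^T X (R + c I)^{-1}, with R = X^T X
c = 0.1
R = X.T @ X
W = Y.T @ X @ np.linalg.inv(R + c * np.eye(M + 1))   # shape K x (M+1)

# Classify a new point with the learned discriminants
x_new = np.array([1.0, 0.2, -0.5])        # already augmented: (1, x_1, x_2)
print("predicted class:", np.argmax(W @ x_new))
```

In practice one would solve the linear system (for instance via np.linalg.solve) rather than forming the inverse explicitly, but the direct formula mirrors (3.17).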

There is a second, illuminating interpretation of the helper term c I. In Section 2.8 I mentioned that a common way of regularizing a loss minimization problem is to add a weighted penalty term to the loss function, that is, to estimate model parameters by $\hat{\theta} = \operatorname{argmin}_{\theta}\, L(\mathcal{D}, \theta) + c\, \Omega(\theta)$. I furthermore said that one of the most popular regularizers is $\Omega(\theta) = \sum_{\alpha} \theta_{\alpha}^2$, that is, the sum of squared model parameters. In our regression case, the model parameters are the network weights W. Thus the regularized version of our optimization problem (3.10) would read

(3.18)   $\hat{w}_k = \operatorname{argmin}_{\tilde{w}_k}\, \sum_{i=1}^{N} \big( \tilde{w}_k^T \tilde{x}_i - y_i(k) \big)^2 + c\, \|\tilde{w}_k\|^2$.

It can be shown (homework exercise) that (3.17) is the solution to the regularized optimization problem (3.18). Because adding a multiple c I to R looks geometrically like putting a diagonal "ridge" on R, (3.17) is also widely known as ridge regression. The more educated terminology is to call it Tychonov regularization, after the mathematician who explored this method (in more depth than was indicated here).

I conclude this part with a general, abstract formulation of linear regression tasks. To state it conveniently, I use the notion of the Frobenius norm of a matrix M:

(3.19)   $\|M\|_{\mathrm{FRO}} := \Big( \sum_{i,j} m_{ij}^2 \Big)^{1/2}$.

Hence, $\|M\|_{\mathrm{FRO}}^2$ is the sum of all squared entries of M. The linear regression task in its general format is to compute

(3.20)   $\hat{W} = \operatorname{argmin}_{W}\, \|W X^T - Y^T\|_{\mathrm{FRO}}^2$,

where X is a data collection matrix containing data input vectors in its rows and Y contains target output vectors $y_i$ in its rows. This problem is usually solved in its Tychonov-regularized version, which gives us the following take-home message:

The Tychonov-regularized solution to a linear regression problem
$\hat{W} = \operatorname{argmin}_{W}\, \big( \|W X^T - Y^T\|_{\mathrm{FRO}}^2 + c\, \|W\|_{\mathrm{FRO}}^2 \big)$
is $\hat{W} = Y^T X\, (R + c\,I)^{-1}$.
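The take-home formula can be checked empirically. The sketch below (Python/NumPy, on made-up regression data) compares the regularized Frobenius loss at the closed-form solution against randomly perturbed weight matrices; the closed-form solution should never lose:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up regression data: N = 20 augmented inputs (M+1 = 4), K = 3 outputs
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])
Y = rng.normal(size=(20, 3))
c = 0.5

def reg_loss(W):
    """Regularized squared Frobenius loss ||W X^T - Y^T||^2 + c ||W||^2."""
    return np.sum((W @ X.T - Y.T) ** 2) + c * np.sum(W ** 2)

# Closed-form Tychonov solution: W = Y^T X (X^T X + c I)^{-1}
R = X.T @ X
W_hat = Y.T @ X @ np.linalg.inv(R + c * np.eye(R.shape[0]))

# Random perturbations of W_hat never decrease the loss
base = reg_loss(W_hat)
worse = min(reg_loss(W_hat + 0.01 * rng.normal(size=W_hat.shape)) for _ in range(100))
print(base <= worse)   # True: W_hat minimizes the regularized loss
```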

The practical usefulness and wide applicability range of Tychonov-regularized linear regression can hardly be over-estimated.

3.4 Generalized linear discriminants

At the end of Section 3.2 we have seen that the decision boundaries obtained through linear discriminants are linear hyperplanes, and the resulting decision regions are convex and connected. This may appear to severely limit the usefulness of linear discriminants, because it seems obvious that in many if not almost all real-life problems the class boundaries should have a "nonlinear" geometry. Consider for example the case of two classes $C_1$ and $C_2$, where inputs x are two-dimensional vectors and x falls into class $C_1$ iff x lies within the unit circle.

Figure 3.5: Two linearly inseparable classes $C_1$ (red dots) and $C_2$ (blue crosses).

There is no way to separate $C_1$ from $C_2$ by a linear discriminant $y(x) = \tilde{w}^T \tilde{x}$ which would classify x as belonging to class $C_1$ whenever y(x) < 0. However, if instead of the original 2-dimensional input vectors $x = (x_1, x_2)$ we use the one-dimensional feature $\phi(x) = x^T x$ (the squared norm of x) as input, the linear discriminant $y(\phi(x)) = (-1, 1)\,(1, \phi(x))^T = x^T x - 1$ would clearly do the job.

This motivates the introduction of generalized linear discriminants. A generalized linear discriminant is a two-stage process where the raw M-dimensional inputs x are first passed through a bank of L feature "filters" $\phi_1, \ldots, \phi_L$, transforming x to an L-dimensional feature vector $(\phi_1(x), \ldots, \phi_L(x))^T$, which is then used instead of the raw input x to feed a linear discriminant. The general form of such networks is the following variant of (3.7):

(3.21)   $y_k(x) = \tilde{w}_k^T\, (1, \phi_1(x), \ldots, \phi_L(x))^T$,

where $k = 1, \ldots, K$ is again the index of the classes to be discriminated (K = number of output neurons).
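To see the circle example from above in action, here is a minimal sketch (Python/NumPy; the sampled data and the weight vector (-1, 1) are illustrative choices) of classifying points with the generalized linear discriminant built on the single feature $\phi(x) = x^T x$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample some 2D points; class C1 = inside the unit circle, C2 = outside
x = rng.uniform(-2.0, 2.0, size=(200, 2))
true_c1 = np.sum(x ** 2, axis=1) < 1.0

# Generalized linear discriminant on the single feature phi(x) = x^T x:
# y(phi(x)) = (-1, 1) . (1, phi(x))^T = phi(x) - 1, class C1 whenever y < 0
phi = np.sum(x ** 2, axis=1)                 # one feature value per point
w_tilde = np.array([-1.0, 1.0])              # (bias weight, feature weight)
y = w_tilde[0] * 1.0 + w_tilde[1] * phi      # discriminant values
pred_c1 = y < 0.0

print("agreement with true labels:", np.mean(pred_c1 == true_c1))   # 1.0
```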

A popular type of such networks is radial basis function networks (RBF networks). If dim(x) = d, each filter $\phi_j$ is a symmetric ("radial"), typically unimodal function on $\mathbb{R}^d$ with center $\mu_j$. Gaussian density functions are a typical choice. For Gaussians, the output $\phi_j(x)$ of filter $\phi_j$ (j > 0) is

(3.22)   $\phi_j(x) = \exp\Big( -\frac{\|x - \mu_j\|^2}{2 \sigma_j^2} \Big)$.

Note that (i) we do not normalize the Gaussian function here to integral 1, and (ii) although we have a d-dimensional Gaussian, we do not have to care about a covariance matrix $\Sigma$ because we restrict ourselves to radially symmetric Gaussians. The filter $\phi_j(x)$ returns large values (close to 1) if the input vector x lies close (in metric distance) to the Gaussian's center $\mu_j$, and decreases (with a rate given by $\sigma_j$) when the distance grows larger.

RBF networks offer the possibility to place many fine-grained filters $\phi_j$ into regions of the input space X where we need a fine-tuned discrimination, and to be more generous in "less interesting" regions where we plant only a few broad filters. Figure 3.6 shows an example where the raw data points x are one-dimensional and where we want a high discrimination precision around the origin and around 1.

Figure 3.6: Radial basis functions example, with filters $\phi_1, \ldots, \phi_L$ over a one-dimensional input space X.

Two background notes:

Remark 1: The performance of RBF networks obviously depends on the proper sizing and placement of the basis functions $\phi_j$. These are often optimized by unsupervised training schemes in a data-driven way. In Section 4 we will introduce such an algorithm that is often used with RBF networks.

Remark 2: Any desired input-output mapping f from the original input data space X to the output units $y_k$ can be achieved with perfect precision with networks of the kind specified by (3.21). This is trivially clear because you may just use L = K and $\phi_k = f_k$ and $w_{kj} = \delta_{kj}$; all the work is done by the filters $\phi_k$. However, more interesting results state that any desired input-output mapping f can be approximated arbitrarily well with radial basis functions of a given simple class, for instance Gaussians. The art of designing RBF networks is to achieve good performance with as few basis filters as possible, because the fewer filters you have, the fewer training data points you need for estimating the network weights (another instance of the bias-variance dilemma!).
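A minimal sketch of an RBF feature map in the spirit of (3.22) (Python/NumPy; the centers, widths, and data are made-up choices):

```python
import numpy as np

def rbf_features(x, centers, sigmas):
    """Map each d-dimensional input to L Gaussian RBF responses phi_1..phi_L."""
    # x: (n, d) inputs; centers: (L, d); sigmas: (L,) widths of the Gaussians
    sq_dists = np.sum((x[:, None, :] - centers[None, :, :]) ** 2, axis=2)  # (n, L)
    return np.exp(-sq_dists / (2.0 * sigmas[None, :] ** 2))

# Made-up example: 1D inputs, three narrow filters near 0 and one broad filter near 1
centers = np.array([[-0.2], [0.0], [0.2], [1.0]])
sigmas = np.array([0.1, 0.1, 0.1, 0.5])

x = np.array([[0.05], [0.9], [2.0]])
Phi = rbf_features(x, centers, sigmas)       # (3, 4) feature matrix
print(np.round(Phi, 3))
```

These feature vectors could then replace the raw inputs when building the data collection matrix X for the regression-based weight estimation of Section 3.3.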

A cautionary remark. The least mean square solution for learning network weights from data is easy to compute and does not require much thinking about the class-conditional distributions of the input features. That's good. However, it is not trivial to find good preprocessing filters $\phi_j$ if you want to tackle nonlinear classification problems (you will need unsupervised learning techniques to optimize them). Nor, even if you have found good $\phi_j$, is the least mean square approach necessarily the best you can do for training classifiers (because it tends to over-represent extreme or even outlier inputs; you may land far from the optimal weights that would be yielded by a probabilistic approach where you first estimate the posterior class distributions). So there is ample room for improvement. This all said, in practice a linear discriminant trained by minimizing square error often is a quite accurate and certainly a simple way to learn a classifier.

3.5 Perceptrons

Historically, the first "neural networks" (not called like that then) for classification tasks were introduced by the psychologist and neurobiologist Frank Rosenblatt in the late 1950s and early 1960s and named Perceptrons. Today we would call them linear discrimination networks. Perceptrons were biologically inspired in a context of visual pattern classification from pixel images. Another characteristic of perceptrons is that they come with a particular type of feature extraction, that is, their input neurons correspond to a particular kind of (linear) features extracted from pixel images, and the values of the output neurons (which we called y) were passed through a binary threshold function f to yield binary classification outputs. Figure 3.7 (redrawn from Bishop's book) shows the setup of a perceptron. There exists a learning rule for perceptrons that incrementally adapts network weights for maximal discrimination rates; this rule can be proven to converge (a sketch follows after the figure caption below).

Figure 3.7: The perceptron's input neurons $\phi_j$ are patched to the input pattern by random linear links (which makes the $\phi_j$ linear and the total behavior of the perceptron linear too). They typically compute their outputs by a threshold function from the sum of the signals received through these links. Input neuron outputs are weighted, summed, and passed through another threshold function f whose output indicates whether the pattern belongs to class 1 or class 2 (binary classification).
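The lecture text does not spell the learning rule out, so the following is a sketch of the classical Rosenblatt perceptron rule (Python/NumPy, on made-up feature vectors with targets +1 / -1): whenever a pattern is misclassified, the weight vector is nudged toward (or away from) that pattern.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two made-up, linearly separable clusters with targets +1 / -1
pos = rng.normal(size=(25, 2)) * 0.5 + np.array([2.0, 2.0])
neg = rng.normal(size=(25, 2)) * 0.5 + np.array([-2.0, -2.0])
features = np.vstack([pos, neg])
targets = np.hstack([np.ones(25), -np.ones(25)])

# Augment with a constant 1 so the bias is part of the weight vector
Phi = np.hstack([np.ones((50, 1)), features])

w = np.zeros(Phi.shape[1])
eta = 1.0                                   # learning rate
for epoch in range(100):
    errors = 0
    for phi, t in zip(Phi, targets):
        if np.sign(w @ phi) != t:           # pattern misclassified
            w = w + eta * t * phi           # Rosenblatt update
            errors += 1
    if errors == 0:                         # all patterns classified correctly
        break

print("epochs used:", epoch + 1, "final weights:", np.round(w, 2))
```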

Rosenblatt first implemented his perceptrons on one of the early digital computers, the IBM Mark 1. Not satisfied with the performance, he proceeded to implement perceptrons in analog hardware, with the adaptive network weights realized by potentiometers driven by electrical motors. Figure 3.8 shows some impressions of this physical neural network.

Fig. 3.8: Rosenblatt's analog-electronic realization of the perceptron. (a): Pattern input: brightly lit B/W patterns recorded by an array of 20 x 20 photocells. (b): "Neuronal wiring". (c): Adaptive weights realized by motor-driven potentiometers. (All images from Bishop 2006 [1], Chapter 4.1.)

Perceptrons led to an early hype (several others were to follow) in how artificial intelligence was perceived by the general public. Here is a snippet from the New York Times ("New Navy Device Learns by Doing", NYT July 8, 1958), after a press conference held by Rosenblatt at the US Office of Naval Research on July 7, 1958 (cited after Olazaran 1996 [2]). [Newspaper clipping shown as an image in the original.]

Perceptron research suffered a sudden and dramatic death when Marvin Minsky and Seymour Papert, pioneers of (symbolic) AI, published a book, Perceptrons, in 1969 where they put their finger on what you have learnt earlier in this lecture: the perceptron, as it were, could only distinguish between pattern classes that are linearly separable. In particular, the XOR function, which maps binary input pairs to their XOR, cannot be learnt by a perceptron. This (obvious, by today's standards) insight shattered neural network research and sent it into a sleep from which it woke up only a little less than 20 years later, when multilayer perceptrons (MLPs), which can learn the XOR task, became trainable by the backpropagation algorithm. We will learn about MLPs later in this course. Today certain versions of MLPs are the most powerful ML methods that exist for a number of highly relevant applications: speech recognition, automated text translation, image classification, handwriting recognition. Except for the reproduction capability and consciousness predicted in the NYT article, the other capabilities have come close to being mastered by artificial neural networks.

[1] Bishop, C. M. (2006), Pattern Recognition and Machine Learning. Springer Verlag.
[2] Olazaran, M. (1996), A sociological study of the official history of the Perceptrons Controversy. Social Studies of Science 26(3).

Perceptrons with their original learning rule are still sometimes used, due to their simplicity and their ancient fame, though hardly by ML professionals.
