Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Phil Woodland: Michaelmas 2012
Introduction

Previously, generative models with Gaussian and GMM class-conditional PDFs were discussed. For Gaussian PDFs with common covariance matrices, the decision boundary is linear. The linear decision boundary from the generative model was briefly compared with a discriminative model, logistic regression.

An alternative to modelling the class-conditional PDFs is to select a functional form for a discriminant function and construct a mapping from the observation to the class directly. Here we will concentrate on the construction of linear classifiers; however, it is also possible to construct quadratic or other non-linear decision boundaries (this is how some types of neural networks operate).

There are several methods of classifier construction for a linear discriminant function. The following schemes will be examined:
- iterative solution for the weights via the perceptron algorithm, which directly minimises the number of misclassifications;
- the Fisher linear discriminant, which directly maximises a measure of class separability.
Simple Vowel Classifier

Select two vowels to classify with a linear decision boundary.

[Figure: scatter plot of Formant One against Formant Two, with a linear classifier separating the two vowel classes.]

Most of the data is correctly classified, but classification errors occur:
- realisations of vowels vary from speaker to speaker;
- the same vowel varies for the same speaker as well (vowel context etc.).
Single Layer Perceptron

[Figure: inputs x_1, ..., x_d with weights w_1, ..., w_d and bias w_0 feeding a summation unit Σ; its output z passes through the activation function to give y(x).]

Typically a threshold activation function is used. The d-dimensional input vector x and the scalar value z are related by

    z = w^T x + w_0

z is then fed to the activation function to yield y(x). The parameters of this system are:
- weights: w = [w_1, ..., w_d]^T, which selects the direction of the decision boundary;
- bias: w_0, which sets the position of the decision boundary.

The parameters are often combined into a single composite vector, w̃, and an extended input vector, x̃:

    w̃ = [w_1, ..., w_d, w_0]^T ;  x̃ = [x_1, ..., x_d, 1]^T
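As an illustrative sketch, the perceptron computation z = w^T x + w_0 followed by a threshold activation can be written in a few lines of Python; the weight and input values below are made up for the example:

```python
import numpy as np

def perceptron_output(w, w0, x):
    """Compute z = w.x + w0 and apply the threshold activation."""
    z = np.dot(w, x) + w0
    return 1 if z >= 0 else -1

# Equivalent form with the composite vector w_tilde = [w, w0]
# and the extended input x_tilde = [x, 1]:
w = np.array([2.0, -1.0])
w0 = 0.5
x = np.array([1.0, 1.0])
w_tilde = np.append(w, w0)
x_tilde = np.append(x, 1.0)
assert np.isclose(np.dot(w_tilde, x_tilde), np.dot(w, x) + w0)
print(perceptron_output(w, w0, x))  # z = 2 - 1 + 0.5 = 1.5, so output 1
```

The composite form is convenient because the bias is then trained exactly like any other weight.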
Single Layer Perceptron (cont)

We can then write

    z = w̃^T x̃

The task is to train the set of model parameters w̃. For this example a decision boundary is placed at z = 0. The decision rule is

    y(x) = 1 if z >= 0 ;  y(x) = -1 if z < 0

If the training data is linearly separable in the d-dimensional space, then using an appropriate training algorithm perfect classification (on the training data at least!) can be achieved.

Is the solution unique? The precise solution depends on the training criterion/algorithm used.
Parameter Optimisation

First a cost or loss function of the weights is defined, E(w̃). A learning process which minimises the cost function is used. One basic procedure is gradient descent:

1. Start with some initial estimate w̃[0], and set τ = 0.
2. Compute the gradient ∇E(w̃) at w̃[τ].
3. Update the weights by moving a small distance in the steepest downhill direction, to give the estimate of the weights at iteration τ+1:

       w̃[τ+1] = w̃[τ] − η ∇E(w̃)|_{w̃[τ]}

   Set τ = τ+1.
4. Repeat steps (2) and (3) until convergence, or until the optimisation criterion is satisfied.

If using gradient descent, the cost function E(w̃) must be differentiable (and hence continuous). This means the mis-classification count cannot be used as the cost function for gradient descent schemes. Gradient descent is not, in general, guaranteed to decrease the cost function.
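The gradient-descent procedure can be sketched as follows; the quadratic cost used here is a stand-in with a known minimiser, chosen for illustration, not anything from the lecture:

```python
import numpy as np

# Toy differentiable cost E(w) = (w_1 - 3)^2 + (w_2 + 1)^2,
# whose minimiser is [3, -1].
def grad_E(w):
    return 2.0 * (w - np.array([3.0, -1.0]))

eta = 0.1                     # learning rate
w = np.zeros(2)               # initial estimate w[0]
for tau in range(100):        # repeat until convergence (fixed budget here)
    w = w - eta * grad_E(w)   # move a small distance downhill

print(np.round(w, 4))         # approaches the minimiser [3, -1]
```

With this convex cost any positive η below the stability limit converges to the single global minimum; the next two slides discuss what goes wrong when η is badly chosen or the cost is non-convex.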
Choice of η

In the previous slide the learning-rate term η was used in the optimisation scheme. η is positive. When setting η we need to consider:
- if η is too small, convergence is slow;
- if η is too large, we may overshoot the solution and diverge.

[Figure: cost E against weight; a small step size gives slow descent, while a step size that is too large gives divergence away from the desired minimum.]

Later in the course we will examine improvements for gradient descent schemes. Some of these give automated techniques for setting η.
Local Minima

Another problem with gradient descent is local minima. At all maxima and minima ∇E(w̃) = 0, so:
- gradient descent stops at all maxima/minima;
- the estimated parameters will depend on the starting point.

[Figure: criterion plotted against parameter θ, showing a global minimum and local minima; if the function is convex there is a single solution.]

Expectation-Maximisation, for example, is also only guaranteed to find a local maximum of the likelihood function.
Perceptron Criterion

The decision boundary should be constructed to minimise the number of misclassified training examples. For each training example x_i:

    w̃^T x̃_i > 0  ⇒  x_i ∈ ω_1
    w̃^T x̃_i < 0  ⇒  x_i ∈ ω_2

The distance from the decision boundary is given by w̃^T x̃, but we cannot simply sum the values of w̃^T x̃ as a cost function, since the sign depends on the class. For training, we therefore replace all the observations of class ω_2 by their negative values. Thus

    x̃ → x̃ if it belongs to ω_1 ;  x̃ → −x̃ if it belongs to ω_2

This means that for a correctly classified sample w̃^T x̃ > 0, and for a mis-classified training example w̃^T x̃ < 0. We will refer to this as normalising the data.
Perceptron Solution Region

Each sample x̃_i places a constraint on the possible location of a solution vector that classifies all samples correctly. w̃^T x̃_i = 0 defines a hyperplane through the origin of the weight space of w̃ vectors, with x̃_i as a normal vector. For normalised data, the solution vector must lie on the positive side of every such hyperplane. The solution vector, if it exists, is not unique: it lies anywhere within the solution region.

[Figure (from DHS): four training points and the solution region in weight space, shown for both normalised and un-normalised data, with the separating plane indicated.]
Perceptron Criterion (cont)

The perceptron criterion may be expressed as

    E(w̃) = Σ_{x̃ ∈ Y} (−w̃^T x̃)

where Y is the set of mis-classified points. We now want to minimise the perceptron criterion, and can use gradient descent. It is simple to show that

    ∇E(w̃) = Σ_{x̃ ∈ Y} (−x̃)

Hence the gradient descent update rule is

    w̃[τ+1] = w̃[τ] + η Σ_{x̃ ∈ Y[τ]} x̃

where Y[τ] is the set of mis-classified points using w̃[τ]. For the case when the samples are linearly separable, using a value of η = 1 is guaranteed to converge.
The basic algorithm is:

1. Take the extended observations x̃ for class ω_2 and invert the sign to give −x̃.
2. Initialise the weight vector w̃[0], and set τ = 0.
3. Using w̃[τ], produce the set of mis-classified samples Y[τ].
4. Use the update rule

       w̃[τ+1] = w̃[τ] + Σ_{x̃ ∈ Y[τ]} x̃

   then set τ = τ+1.
5. Repeat steps (3) and (4) until the convergence criterion is satisfied.
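The steps above can be sketched in Python; the toy data points below are invented for illustration and are not the lecture's example:

```python
import numpy as np

def perceptron_train(X1, X2, max_iter=100):
    """Batch perceptron with eta = 1. X1, X2: observations for classes w1, w2."""
    # Extend each observation with a 1 for the bias, then negate the
    # class-2 samples ("normalising" the data).
    Z1 = np.hstack([X1, np.ones((len(X1), 1))])
    Z2 = -np.hstack([X2, np.ones((len(X2), 1))])
    Z = np.vstack([Z1, Z2])
    w = np.zeros(Z.shape[1])                # initial estimate w[0]
    for _ in range(max_iter):
        mis = Z[Z @ w <= 0]                 # mis-classified set Y[tau]
        if len(mis) == 0:
            break                           # converged: all w'x > 0
        w = w + mis.sum(axis=0)             # update rule with eta = 1
    return w

# A linearly separable toy set:
X1 = np.array([[2.0, 2.0], [3.0, 1.0]])
X2 = np.array([[0.0, 0.0], [-1.0, 1.0]])
w = perceptron_train(X1, X2)
# All class-1 points should give w'x~ > 0, all class-2 points w'x~ < 0:
print(all(np.hstack([x, 1]) @ w > 0 for x in X1),
      all(np.hstack([x, 1]) @ w < 0 for x in X2))  # True True
```

Note the convention here treats points exactly on the boundary (w̃^T x̃ = 0) as mis-classified, which ensures the algorithm keeps updating until strict separation is achieved.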
A Simple Example

Class ω_1 has a set of training points and class ω_2 a second set, and an initial estimate w̃[0] of the weight vector is chosen.

[Figure: the training points of each class, the initial weight values and the resulting initial estimate of the decision boundary.]

Given this initial estimate we need to train the decision boundary.
Simple Example (cont)

First use the current decision boundary to obtain the set of mis-classified points, by evaluating z = w̃[0]^T x̃ for each sample of class ω_1 and of class ω_2.

[Tables: the z values for the class ω_1 and class ω_2 samples, identifying the mis-classified set Y[0].]

From the perceptron update rule this yields the updated vector

    w̃[1] = w̃[0] + Σ_{x̃ ∈ Y[0]} x̃
Simple Example (cont)

Apply the decision rule again, to the class ω_1 data and then the class ω_2 data. All points are now correctly classified: the algorithm has converged.

[Figure: the resulting decision boundary.]

Is this a good decision boundary?
Fisher's Discriminant Analysis

An alternative approach to training the perceptron uses Fisher's discriminant analysis. The aim is to find the projection that maximises the distance between the class means, whilst minimising the within-class variance. Note that only the projection w is determined. The following criterion is used:

    E(w) = (µ_1 − µ_2)^2 / (s_1 + s_2)

where s_j and µ_j are the projected scatter and projected mean for class ω_j. The projected scatter is defined as

    s_j = Σ_{x_i ∈ ω_j} (w^T x_i − µ_j)^2

The criterion may be expressed as

    E(w) = (w^T S_b w) / (w^T S_w w)

where

    S_b = (µ_1 − µ_2)(µ_1 − µ_2)^T
    S_w = S_1 + S_2 ;  S_j = Σ_{x_i ∈ ω_j} (x_i − µ_j)(x_i − µ_j)^T

and the mean of class ω_j, µ_j, is defined as usual.
Fisher's Discriminant Analysis (cont)

Differentiating E(w) w.r.t. the weights, the criterion is maximised when

    (ŵ^T S_b ŵ) S_w ŵ = (ŵ^T S_w ŵ) S_b ŵ

From the definition of S_b,

    S_b ŵ = (µ_1 − µ_2)(µ_1 − µ_2)^T ŵ = ((µ_1 − µ_2)^T ŵ)(µ_1 − µ_2)

We therefore know that

    S_w ŵ ∝ (µ_1 − µ_2)

Multiplying both sides by S_w^{-1} yields

    ŵ ∝ S_w^{-1} (µ_1 − µ_2)

This has given the direction of the decision boundary. However we still need the bias value b. If the data is separable using Fisher's discriminant, it makes sense to select the value of b that maximises the margin. Simply put this means that, given no additional information, the boundary should be equidistant from the two points either side of the decision boundary.
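The resulting rule ŵ ∝ S_w^{-1}(µ_1 − µ_2) can be sketched directly; the data below is randomly generated purely for illustration:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher direction w proportional to Sw^{-1}(mu1 - mu2)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1)      # scatter matrix of class 1
    S2 = (X2 - mu2).T @ (X2 - mu2)      # scatter matrix of class 2
    Sw = S1 + S2                        # within-class scatter
    return np.linalg.solve(Sw, mu1 - mu2)

# Illustrative data: two well-separated Gaussian clouds.
rng = np.random.default_rng(0)
X1 = rng.normal([2.0, 2.0], 0.5, size=(50, 2))
X2 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
w = fisher_direction(X1, X2)
# The projected class means are separated along the Fisher direction:
m1, m2 = X1 @ w, X2 @ w
print(m1.mean() > m2.mean())  # True
```

Using `np.linalg.solve` rather than explicitly inverting S_w is the numerically preferable way to apply S_w^{-1}.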
Example

Using the previously described data, the class means µ_1 and µ_2 and the within-class scatter S_w are computed, and solving ŵ ∝ S_w^{-1}(µ_1 − µ_2) yields the direction. Taking the midpoint between the observations closest to the decision boundary then fixes the bias.

[The numerical values of µ_1, µ_2, S_w and ŵ, and the resulting boundary, are shown in the original slide.]
Logistic Regression/Classification

It is interesting to contrast the perceptron algorithm with logistic regression/classification. This is a linear classifier, but a discriminative model, not a discriminant function. The training criterion is

    L(w̃) = Σ_{i=1}^{n} [ y_i log( 1 / (1 + exp(−w̃^T x̃_i)) ) + (1 − y_i) log( 1 / (1 + exp(w̃^T x̃_i)) ) ]

where (note the change of label definition)

    y_i = 1 if x_i was generated by class ω_1 ;  y_i = 0 if x_i was generated by class ω_2

It is simple to show that (this has a nice intuitive feel)

    ∇L(w̃) = Σ_{i=1}^{n} ( y_i − 1 / (1 + exp(−w̃^T x̃_i)) ) x̃_i
Maximising the log-likelihood is equivalent to minimising the negative log-likelihood, E(w̃) = −L(w̃). The update rule then becomes

    w̃[τ+1] = w̃[τ] + η ∇L(w̃)|_{w̃[τ]}

Compared to the perceptron algorithm:
- an appropriate method to select η is needed;
- it does not necessarily correctly classify the training data, even if it is linearly separable;
- it is not necessary for the data to be linearly separable;
- it yields a class posterior (use Bayes' decision rule to get the hypothesis).
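A minimal sketch of this training scheme (gradient ascent on L with a fixed η; the data and settings are illustrative, not the lecture's):

```python
import numpy as np

def sigmoid(a):
    # Clipping guards against overflow in exp for large |a|.
    return 1.0 / (1.0 + np.exp(-np.clip(a, -30.0, 30.0)))

def logistic_train(X, y, eta=0.1, iters=1000):
    """Gradient ascent on L(w): grad = sum_i (y_i - sigma(w'x_i)) x_i."""
    Xe = np.hstack([X, np.ones((len(X), 1))])   # extended inputs [x, 1]
    w = np.zeros(Xe.shape[1])
    for _ in range(iters):
        grad = Xe.T @ (y - sigmoid(Xe @ w))     # gradient of log-likelihood
        w = w + eta * grad                      # w[tau+1] = w[tau] + eta*grad
    return w

# Toy data: class omega_1 (label 1) near (2,2), class omega_2 (label 0) near (0,0).
X = np.array([[2.0, 2.0], [2.5, 1.5], [0.0, 0.0], [-0.5, 0.5]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w = logistic_train(X, y)
post = sigmoid(np.hstack([X, np.ones((4, 1))]) @ w)  # class posteriors P(w1|x)
print((post > 0.5) == (y == 1))
```

Unlike the perceptron, the output `post` is a class posterior; thresholding it at 0.5 is just Bayes' decision rule with equal costs.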
Kesler's Construct

So far only binary classifiers have been examined; direct use of multiple binary classifiers can result in no-decision regions (see the examples paper). The multi-class problem can instead be converted to a 2-class problem. Consider an extended observation x̃ which belongs to class ω_1. For it to be correctly classified,

    w̃_1^T x̃ − w̃_j^T x̃ > 0,  j = 2, ..., K

There are therefore K−1 inequalities, requiring that the K(d+1)-dimensional vector

    α = [w̃_1^T, ..., w̃_K^T]^T
correctly classifies all K−1 of the K(d+1)-dimensional samples

    γ_2 = [x̃^T, −x̃^T, 0^T, ..., 0^T]^T ,  γ_3 = [x̃^T, 0^T, −x̃^T, ..., 0^T]^T ,  ...,  γ_K = [x̃^T, 0^T, ..., 0^T, −x̃^T]^T

i.e. α^T γ_j > 0 for all j. The multi-class problem has now been transformed into a two-class problem, at the expense of increasing the effective dimensionality of the data and increasing the number of training samples. We now simply optimise for α.
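The construction of the γ samples can be sketched as follows; class indices are 0-based here and the example numbers are hypothetical:

```python
import numpy as np

def kesler_expand(x_ext, k, K):
    """Expand extended sample x~ of class k (0-based) into K-1 vectors
    gamma_j of dimension K(d+1): x~ in block k, -x~ in block j, zeros
    elsewhere, so that alpha' gamma_j > 0 iff w_k' x~ > w_j' x~."""
    d1 = len(x_ext)
    gammas = []
    for j in range(K):
        if j == k:
            continue
        g = np.zeros(K * d1)
        g[k * d1:(k + 1) * d1] = x_ext    # block for the true class
        g[j * d1:(j + 1) * d1] = -x_ext   # block for the competing class
        gammas.append(g)
    return gammas

# Example: d = 2, K = 3, one sample of class 0.
x_ext = np.array([1.0, 2.0, 1.0])         # [x, 1]
gammas = kesler_expand(x_ext, k=0, K=3)
print(len(gammas), len(gammas[0]))        # 2 vectors of dimension 9
```

Each original sample thus contributes K−1 high-dimensional "positive" samples, and any two-class algorithm (such as the perceptron above) can then be run on the γ vectors to find α.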
Limitations of Linear Decision Boundaries

Perceptrons were very popular until, in 1969, it was realised that they could not solve the XOR problem. We can, however, use perceptrons to implement the binary logic operators AND, OR, NAND and NOR.

[Figure: four single-layer perceptrons implementing (a) the AND operator, (b) the OR operator, (c) the NAND operator and (d) the NOR operator, each a weighted sum Σ followed by a threshold.]
XOR (cont)

But XOR may be written in terms of AND, NAND and OR gates:

    XOR(x_1, x_2) = AND( OR(x_1, x_2), NAND(x_1, x_2) )

[Figure: an OR gate and a NAND gate operating on x_1, x_2, feeding an AND gate; the resulting pair of decision boundaries carves out the XOR region.]

So XOR can be solved using a two-layer network. The problem is how to train multi-layer perceptrons. In the 1980s an algorithm for training such networks was proposed: error back-propagation.
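The two-layer solution can be verified directly with hand-set weights; the thresholds below follow the standard gate implementations for inputs in {0, 1} (an assumption, since the figure values are not fully legible):

```python
import numpy as np

def step(z):
    """Threshold activation: 1 if z >= 0, else 0."""
    return int(z >= 0)

def xor_net(x1, x2):
    """Two-layer network: XOR = AND(OR(x1, x2), NAND(x1, x2))."""
    x = np.array([x1, x2])
    h_or = step(np.dot([1, 1], x) - 0.5)      # OR:   fires if x1 + x2 >= 0.5
    h_nand = step(np.dot([-1, -1], x) + 1.5)  # NAND: fires if x1 + x2 <= 1.5
    return step(h_or + h_nand - 1.5)          # AND of the two hidden units

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 1, 1, 0]
```

No single linear boundary separates these four points, but the two hidden units carve the plane into strips whose intersection is exactly the XOR region; training such hidden weights automatically is what back-propagation provides.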
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationCS 590D Lecture Notes
CS 590D Lecture Notes David Wemhoener December 017 1 Introduction Figure 1: Various Separators We previously looked at the perceptron learning algorithm, which gives us a separator for linearly separable
More informationLecture 3: Pattern Classification. Pattern classification
EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and
More informationLecture 16: Modern Classification (I) - Separating Hyperplanes
Lecture 16: Modern Classification (I) - Separating Hyperplanes Outline 1 2 Separating Hyperplane Binary SVM for Separable Case Bayes Rule for Binary Problems Consider the simplest case: two classes are
More information3.4 Linear Least-Squares Filter
X(n) = [x(1), x(2),..., x(n)] T 1 3.4 Linear Least-Squares Filter Two characteristics of linear least-squares filter: 1. The filter is built around a single linear neuron. 2. The cost function is the sum
More informationLEARNING & LINEAR CLASSIFIERS
LEARNING & LINEAR CLASSIFIERS 1/26 J. Matas Czech Technical University, Faculty of Electrical Engineering Department of Cybernetics, Center for Machine Perception 121 35 Praha 2, Karlovo nám. 13, Czech
More information6.867 Machine Learning
6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationMachine Learning and Data Mining. Linear classification. Kalev Kask
Machine Learning and Data Mining Linear classification Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ = f(x ; q) Parameters q Program ( Learner ) Learning algorithm Change q
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationThe exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.
CS 189 Spring 013 Introduction to Machine Learning Final You have 3 hours for the exam. The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. Please
More informationVasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks
C.M. Bishop s PRML: Chapter 5; Neural Networks Introduction The aim is, as before, to find useful decompositions of the target variable; t(x) = y(x, w) + ɛ(x) (3.7) t(x n ) and x n are the observations,
More informationSimple Neural Nets For Pattern Classification
CHAPTER 2 Simple Neural Nets For Pattern Classification Neural Networks General Discussion One of the simplest tasks that neural nets can be trained to perform is pattern classification. In pattern classification
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationCourse 395: Machine Learning - Lectures
Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationPerceptron (Theory) + Linear Regression
10601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Perceptron (Theory) Linear Regression Matt Gormley Lecture 6 Feb. 5, 2018 1 Q&A
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationMax Margin-Classifier
Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationNeural Networks (Part 1) Goals for the lecture
Neural Networks (Part ) Mark Craven and David Page Computer Sciences 760 Spring 208 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed
More informationLINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning
LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More informationIncorporating detractors into SVM classification
Incorporating detractors into SVM classification AGH University of Science and Technology 1 2 3 4 5 (SVM) SVM - are a set of supervised learning methods used for classification and regression SVM maximal
More information