Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I
|
|
- Sheila Rodgers
- 5 years ago
- Views:
Transcription
1 Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 2012 Engineering Part IIB: Module 4F10
2 Introduction In the next 2 lectures we ll look at Multi-Layer Perceptrons (MLPs) which are more powerful than Single-Layer models. It will be seen that MLPs can be trained as a discriminative model to yield class posteriors. MLPs are a type of Artificial Neural Network: the computation is performed using a set of (many) simple units with weighted connections between them. Also there are learning algorithms to set the values of the weights and the same basic structure (with different weight values) is able to perform many tasks. In this and the following lecture we will consider Overall structure of multi-layer perceptrons Decision boundaries that they can form Training Criteria Networks as posterior probability estimators Basic Error back-propagation training algorithm Improved training methods Engineering Part IIB: Module 4F10 1
3 Application of MLPs Application of multi-layer perceptron neural networks to vision problems Application of Multilayer Perceptron Network for Tagging Parts-of-Speech An application of the multilayer perceptron: Solar radiation maps in Spain Application of multilayer perceptron networks to laser diode nonlinearity determination for radio-over-fibre mobile communications A Neural Network Based System for Intrusion Detection and Classification of Attacks Water quality sensing using multi-layer percepton artificial neural networks Evaluation of a multilayer perceptron based model to forecast O3 and NO2 hourly levels Many applications in many areas! Recently much interest in various areas of speech recognition from Deep Neural Networks: MLPs with many hidden layers. Engineering Part IIB: Module 4F10 2
4 Multi-Layer Perceptron A multi-layer perceptron is needed to handle the XOR problem. More generally multi-layer perceptrons allow a neural network to perform arbitrary mappings. A 2-hidden layer neural network is shown. The aim is to map an input vector x into an output y(x). The layers may be described as: Inputs x 1 x 2 y (x) 1 y (x) 2 Outputs Input layer: accepts the data vector or pattern; x d First layer Second Output layer layer y (x) K Hidden layers: one or more layers. They accept the output from the previous layer, weight them, and pass through a, normally, non-linear activation function. Output layer: takes the output from the final hidden layer weights them, and possibly pass through an output non-linearity, to produce the target values. Engineering Part IIB: Module 4F10 3
5 Possible Decision Boundaries The decision boundaries that may be produced varies with network topology. Here only threshold (see the single layer perceptron) activation functions are used. There are three situations to consider Single layer: able to position a hyperplane in input space (the SLP). Two layers (one hidden layer): this is able to describe a decision boundary which surrounds a single convex region of the input space. (1) (2) (3) Three layers (two hidden layers): able to generate arbitrary decision boundaries Note: any decision boundary can be approximated arbitrarily closely by a two layer network with sigmoidal activation functions if enough hidden units. Engineering Part IIB: Module 4F10 4
6 Number of Hidden Units The number of hidden layers determines the decision boundaries that can be generated. Choosing the number of layers the following considerations are made. Multi-layer networks are harder to train than single layer networks. Two layer networks (one hidden) with sigmoidal activation functions can model any decision boundary. Two layer networks are commonly used in pattern recognition ( hidden layer with sigmoidal activation functions), although increasing interest in deep networks. How many units to have in each layer? Number of output units is often determined by number of output classes. The number of inputs is determined by the number of input dimensions The number of hidden units is a design issue. The problems are: too few, the network will not model complex decision boundaries; too many, the network will have poor generalisation. Engineering Part IIB: Module 4F10 5
7 Hidden Layer Perceptron The hidden and output layer perceptron is a generalisation of the SLP. The weighted input is passed to a general activation function, rather than a threshold. Consider a single perceptron. Assume that there are n units at the previous level (k 1). x x w i1 w i2 w i0 Σ z i Activation function i y (x ) x j = y j (x (k 1) ) The output from the perceptron, y i (x) may be written as y i (x ) = φ(z i ) = φ(w ix +w i0 ) = φ( where φ() is the activation function. x n w in n j=1 w ij x j +w i0 ) Note: activation function is not necessarily non-linear. However, if linear activation functions are used throughout much of the power of the network is lost. Engineering Part IIB: Module 4F10 6
8 Activation Functions A variety of non-linear activation functions that may be used. General form: y i (x) = φ(w ix+w i ) = φ(z i ) and there are n units, perceptrons, for the current level. Heaviside (or step) function: y i (x) = { 0, zi < 0 1, z i 0 Used in threshold units, output is binary (gives a discriminant function). Sigmoid (or logistic regression) function: y i (x) = 1 1+exp( z i ) The output is continuous, 0 y i (x) 1. Engineering Part IIB: Module 4F10 7
9 Softmax (or normalised exponential or generalised logistic) function: y i (x) = exp(z i ) n j=1 exp(z j) The output is positive and the sum of all the outputs at the current level is 1, 0 y i (x) 1. Hyperbolic tan (or tanh) function: y i (x) = exp(z i) exp( z i ) exp(z i )+exp( z i ) The output is continuous, 1 y i (x) 1. Engineering Part IIB: Module 4F10 8
10 Activation Functions (cont) Graph shows: step activation function (green) sigmoid activation function (red) tanh activation function (blue) Softmax often used at output layer as sum-to-one constraint is enforced (for e.g. probability over output classes). Sigmoid can be used as an approximation to a 0/1 threshold Linear output units can be used for regression problems. Engineering Part IIB: Module 4F10 9
11 Notation Used Consider a multi-layer perceptron with: d-dimensional input data; L hidden layers (L+1 layer including the output layer); N units in the k th level; K-dimensional output. x is the input to the k th layer x is the extended input to the k th layer x = [ 1 x ] x x x 1 2 N (k 1) 1 w i1 w i2 w (k 1) in w i0 Σ z i Activation function y i W is the weight matrix of the k th layer. By definition this is a N N (k 1) matrix. Engineering Part IIB: Module 4F10 10
12 Notation (cont) W is the weight matrix including the bias weight of[ the k th layer. ] By definition this is a N (N (k 1) +1) matrix. W = w 0 W z is the N -dimensional vector defined as z = y is the output from the k th layer, so y j = φ(z j ) W x All hidden layer activation functions assumed same φ(). Initially also assume that output activation function is also φ(). These matrix notation feed forward equations may then used for an MLP with input x and output y(x). x (1) = x x = y (k 1) z = W x y = φ(z ) y(x) = y (L+1) where 1 k L+1. The target values for the training of the networks will be denoted as t(x) for training example x. Engineering Part IIB: Module 4F10 11
13 Training Criteria A variety of training criteria may be used. Assuming we have supervised training examples {{x 1,t 1 }...,{x n,t n }} Some standard examples are: Least squares error: one of the most common training criteria. E = 1 2 n y(x p ) t p ) 2 = 1 2 p=1 n p=1 K (y i (x p ) t pi ) 2 i=1 This may be derived from considering the targets as being corrupted by zero-mean Gaussian distributed noise. Engineering Part IIB: Module 4F10 12
14 Cross-Entropy for two classes: consider the case when t is binary (and softmax output). The expression is E = n (t p log(y(x p ))+(1 t p )log(1 y(x p ))) p=1 This expression goes to zero with the perfect mapping - compare logistic regression training. Cross-Entropy for multiple classes: the above expression becomes (again softmax output) E = n p=1 K t pi log(y i (x p )) i=1 The minimum value is now non-zero, it represents the entropy of the target values. Engineering Part IIB: Module 4F10 13
15 Posterior Probabilities Consider the multi-class classification training problem with d-dimensional feature vectors: x; K-dimensional output from network: y(x) & targets t. Want output of the network, y(x), to approximate the posterior distribution of the set of K classes. So y i (x) P(ω i x) Consider training a network with: { 1 if x ωi 1-out-ofK coding, i.e. t i = 0 if x ω i The network will act as a d-dimensional to K-dimensional mapping. Can we interpret the output of the network? A couple of assumptions will be made errors from each output independent (not true of soft-max) error is a function of the magnitude difference of output and target E = K i=1 f( y i(x) t i ) Engineering Part IIB: Module 4F10 14
16 Posterior Probabilities (cont) In the limit of infinite training data, E{E} = K f( y i (x) t i )P(t x)p(x)dx i=1 t As we are using the 1-out-of-K coding P(t x) = K i=1 where δ ij = This yields K δ(t i δ ij )P(ω j x) j=1 { 1, (i = j) 0, otherwise E{E} = K i=1 (f(1 y i (x))p(ω i x)+f(y i (x))(1 P(ω i x)))p(x)dx Engineering Part IIB: Module 4F10 15
17 Differentiating and equating to zero, for least squared, cross-entropy solutions are y i (x) = P(ω i x) - this is a discriminative model. Some limitations exist for this to be valid: an infinite amount of training data, or knowledge of correct joint distribution for p(x,t); the topology of the network is complex enough that final error is small; good training algorithm for the network - it finds global optimum Engineering Part IIB: Module 4F10 16
18 Compensating for Different Priors Bayes law is used for the posterior probability for generative models. P(ω j x) = p(x ω whereclasspriors, P(ω j ), andclassconditional j)p(ω j ) densities, p(x ω j ), are estimated separately. p(x) For some tasks two use different training data sets (for example for speech recognition, the language model and the acoustic model). How can this difference in priors from the training and the test conditions be built into the MLP framework where the posterior probability is directly calculated? (This is applicable to all discriminative models) Again using Bayes law p(x ω j ) P(ω j x) P(ω j ) Thus if posterior is divided by the training data prior a value proportional to the class-conditional probability can be obtained. The standard form of Bayes rule may now be used with the appropriate, different, prior. Engineering Part IIB: Module 4F10 17
19 Error Back Propagation So far the training of MLPs has not been discussed. Originally interest in MLPs was limited due to issues in training. These problems were partly addressed with the development of the error back propagation algorithm. Error back propagation algorithm is based on gradient descent. Hence x 1 y 1(x) the activation function must be differentiable: need to be able to compute the derivative of the error function with respect to the weights of all layers. Inputs x 2 x d Hidden layer y (x) 2 y K(x) Output layer Outputs All gradients are evaluated at the current model parameters. Engineering Part IIB: Module 4F10 18
20 Output Layer or SLP Want to minimise e.g. the square error between the target of the output, t p, and the current output value y(x p ). Assume training criterion is least squares and the activation function is a sigmoid function. The cost function may be written as E = 1 2 n (y(x p ) t p ) (y(x p ) t p )) = p=1 n p=1 E (p) To simplify notation, we will only consider a single observation x with associated target values t and current output from the network y(x). The error with this single observation is denoted E. How does the error change as y(x) changes? But we are not interested in y(x) E y(x) = y(x) t How do we find the effect of varying the weights? Engineering Part IIB: Module 4F10 19
Mark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.
University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationMultilayer Perceptron
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4
More informationCSC321 Lecture 5: Multilayer Perceptrons
CSC321 Lecture 5: Multilayer Perceptrons Roger Grosse Roger Grosse CSC321 Lecture 5: Multilayer Perceptrons 1 / 21 Overview Recall the simple neuron-like unit: y output output bias i'th weight w 1 w2 w3
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationMulti-layer Neural Networks
Multi-layer Neural Networks Steve Renals Informatics 2B Learning and Data Lecture 13 8 March 2011 Informatics 2B: Learning and Data Lecture 13 Multi-layer Neural Networks 1 Overview Multi-layer neural
More informationIntroduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis
Introduction to Natural Computation Lecture 9 Multilayer Perceptrons and Backpropagation Peter Lewis 1 / 25 Overview of the Lecture Why multilayer perceptrons? Some applications of multilayer perceptrons.
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationIntroduction to Neural Networks
Introduction to Neural Networks Steve Renals Automatic Speech Recognition ASR Lecture 10 24 February 2014 ASR Lecture 10 Introduction to Neural Networks 1 Neural networks for speech recognition Introduction
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationMLPR: Logistic Regression and Neural Networks
MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition Amos Storkey Amos Storkey MLPR: Logistic Regression and Neural Networks 1/28 Outline 1 Logistic Regression 2 Multi-layer
More informationOutline. MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition. Which is the correct model? Recap.
Outline MLPR: and Neural Networks Machine Learning and Pattern Recognition 2 Amos Storkey Amos Storkey MLPR: and Neural Networks /28 Recap Amos Storkey MLPR: and Neural Networks 2/28 Which is the correct
More informationWhat Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1
What Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1 Multi-layer networks Steve Renals Machine Learning Practical MLP Lecture 3 7 October 2015 MLP Lecture 3 Multi-layer networks 2 What Do Single
More informationFeed-forward Network Functions
Feed-forward Network Functions Sargur Srihari Topics 1. Extension of linear models 2. Feed-forward Network Functions 3. Weight-space symmetries 2 Recap of Linear Models Linear Models for Regression, Classification
More informationArtificial Neural Networks. MGS Lecture 2
Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation
More informationNeural Networks and the Back-propagation Algorithm
Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 4 October 2017 / 9 October 2017 MLP Lecture 3 Deep Neural Networs (1) 1 Recap: Softmax single
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More informationArtificial Neural Networks (ANN)
Artificial Neural Networks (ANN) Edmondo Trentin April 17, 2013 ANN: Definition The definition of ANN is given in 3.1 points. Indeed, an ANN is a machine that is completely specified once we define its:
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 2 October 2018 http://www.inf.ed.ac.u/teaching/courses/mlp/ MLP Lecture 3 / 2 October 2018 Deep
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationArtificial Neural Networks
Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks
More informationNeural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications
Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of
More informationARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92
ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000
More informationFeed-forward Networks Network Training Error Backpropagation Applications. Neural Networks. Oliver Schulte - CMPT 726. Bishop PRML Ch.
Neural Networks Oliver Schulte - CMPT 726 Bishop PRML Ch. 5 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of biological plausibility We will
More informationArtificial Neural Networks
Artificial Neural Networks Stephan Dreiseitl University of Applied Sciences Upper Austria at Hagenberg Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Knowledge
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationDeep Neural Networks
Deep Neural Networks DT2118 Speech and Speaker Recognition Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 45 Outline State-to-Output Probability Model Artificial Neural Networks Perceptron Multi
More informationUniversity of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians
University of Cambridge Engineering Part IIB Module 4F: Statistical Pattern Processing Handout 2: Multivariate Gaussians.2.5..5 8 6 4 2 2 4 6 8 Mark Gales mjfg@eng.cam.ac.uk Michaelmas 2 2 Engineering
More informationLecture 17: Neural Networks and Deep Learning
UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions
More informationCSE446: Neural Networks Spring Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer
CSE446: Neural Networks Spring 2017 Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer Human Neurons Switching time ~ 0.001 second Number of neurons 10 10 Connections per neuron 10 4-5 Scene
More informationNeural Networks Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav
Neural Networks 30.11.2015 Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav 1 Talk Outline Perceptron Combining neurons to a network Neural network, processing input to an output Learning Cost
More informationNeural Networks, Computation Graphs. CMSC 470 Marine Carpuat
Neural Networks, Computation Graphs CMSC 470 Marine Carpuat Binary Classification with a Multi-layer Perceptron φ A = 1 φ site = 1 φ located = 1 φ Maizuru = 1 φ, = 2 φ in = 1 φ Kyoto = 1 φ priest = 0 φ
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 305 Part VII
More informationUniversity of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians
Engineering Part IIB: Module F Statistical Pattern Processing University of Cambridge Engineering Part IIB Module F: Statistical Pattern Processing Handout : Multivariate Gaussians. Generative Model Decision
More informationNonlinear Classification
Nonlinear Classification INFO-4604, Applied Machine Learning University of Colorado Boulder October 5-10, 2017 Prof. Michael Paul Linear Classification Most classifiers we ve seen use linear functions
More informationIntelligent Systems Discriminative Learning, Neural Networks
Intelligent Systems Discriminative Learning, Neural Networks Carsten Rother, Dmitrij Schlesinger WS2014/2015, Outline 1. Discriminative learning 2. Neurons and linear classifiers: 1) Perceptron-Algorithm
More informationNeural Networks Lecture 4: Radial Bases Function Networks
Neural Networks Lecture 4: Radial Bases Function Networks H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011. A. Talebi, Farzaneh Abdollahi
More informationCSC 411 Lecture 10: Neural Networks
CSC 411 Lecture 10: Neural Networks Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 10-Neural Networks 1 / 35 Inspiration: The Brain Our brain has 10 11
More informationWeek 5: Logistic Regression & Neural Networks
Week 5: Logistic Regression & Neural Networks Instructor: Sergey Levine 1 Summary: Logistic Regression In the previous lecture, we covered logistic regression. To recap, logistic regression models and
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationLearning from Data: Multi-layer Perceptrons
Learning from Data: Multi-layer Perceptrons Amos Storkey, School of Informatics University of Edinburgh Semester, 24 LfD 24 Layered Neural Networks Background Single Neurons Relationship to logistic regression.
More informationIntroduction to Neural Networks
CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationDeep Feedforward Networks. Han Shao, Hou Pong Chan, and Hongyi Zhang
Deep Feedforward Networks Han Shao, Hou Pong Chan, and Hongyi Zhang Deep Feedforward Networks Goal: approximate some function f e.g., a classifier, maps input to a class y = f (x) x y Defines a mapping
More informationNeural networks. Chapter 20, Section 5 1
Neural networks Chapter 20, Section 5 Chapter 20, Section 5 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 20, Section 5 2 Brains 0 neurons of
More informationNeural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA 1/ 21
Neural Networks Chapter 8, Section 7 TB Artificial Intelligence Slides from AIMA http://aima.cs.berkeley.edu / 2 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationMachine Learning (CSE 446): Neural Networks
Machine Learning (CSE 446): Neural Networks Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 6, 2017 1 / 22 Admin No Wednesday office hours for Noah; no lecture Friday. 2 /
More informationLogistic Regression & Neural Networks
Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability
More informationA summary of Deep Learning without Poor Local Minima
A summary of Deep Learning without Poor Local Minima by Kenji Kawaguchi MIT oral presentation at NIPS 2016 Learning Supervised (or Predictive) learning Learn a mapping from inputs x to outputs y, given
More information4. Multilayer Perceptrons
4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output
More informationLecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures
More informationArtificial Neural Networks
Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples
More informationLecture 4: Perceptrons and Multilayer Perceptrons
Lecture 4: Perceptrons and Multilayer Perceptrons Cognitive Systems II - Machine Learning SS 2005 Part I: Basic Approaches of Concept Learning Perceptrons, Artificial Neuronal Networks Lecture 4: Perceptrons
More informationMultilayer Perceptrons and Backpropagation
Multilayer Perceptrons and Backpropagation Informatics 1 CG: Lecture 7 Chris Lucas School of Informatics University of Edinburgh January 31, 2017 (Slides adapted from Mirella Lapata s.) 1 / 33 Reading:
More informationARTIFICIAL INTELLIGENCE. Artificial Neural Networks
INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
More informationStochastic gradient descent; Classification
Stochastic gradient descent; Classification Steve Renals Machine Learning Practical MLP Lecture 2 28 September 2016 MLP Lecture 2 Stochastic gradient descent; Classification 1 Single Layer Networks MLP
More informationAN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009
AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009 SUPERVISED LEARNING We are given some training data: We must learn a function If y is discrete, we call it classification If it is
More informationArtificial Neural Networks
Artificial Neural Networks Oliver Schulte - CMPT 310 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of biological plausibility We will focus on
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationMachine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler
+ Machine Learning and Data Mining Multi-layer Perceptrons & Neural Networks: Basics Prof. Alexander Ihler Linear Classifiers (Perceptrons) Linear Classifiers a linear classifier is a mapping which partitions
More informationArtificial Intelligence
Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory Announcements Be making progress on your projects! Three Types of Learning Unsupervised Supervised Reinforcement
More informationLecture 3 Feedforward Networks and Backpropagation
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationDEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More informationMultilayer Neural Networks
Multilayer Neural Networks Multilayer Neural Networks Discriminant function flexibility NON-Linear But with sets of linear parameters at each layer Provably general function approximators for sufficient
More informationMultilayer Neural Networks
Multilayer Neural Networks Introduction Goal: Classify objects by learning nonlinearity There are many problems for which linear discriminants are insufficient for minimum error In previous methods, the
More informationNeural networks. Chapter 19, Sections 1 5 1
Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10
More informationLecture 3: Pattern Classification. Pattern classification
EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationNeural Networks. Nicholas Ruozzi University of Texas at Dallas
Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationSections 18.6 and 18.7 Analysis of Artificial Neural Networks
Sections 18.6 and 18.7 Analysis of Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline Univariate regression
More informationLecture 4: Feed Forward Neural Networks
Lecture 4: Feed Forward Neural Networks Dr. Roman V Belavkin Middlesex University BIS4435 Biological neurons and the brain A Model of A Single Neuron Neurons as data-driven models Neural Networks Training
More informationPattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore
Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationAdvanced statistical methods for data analysis Lecture 2
Advanced statistical methods for data analysis Lecture 2 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 6
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 6 Slides adapted from Jordan Boyd-Graber, Chris Ketelsen Machine Learning: Chenhao Tan Boulder 1 of 39 HW1 turned in HW2 released Office
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationRadial Basis Function Networks. Ravi Kaushik Project 1 CSC Neural Networks and Pattern Recognition
Radial Basis Function Networks Ravi Kaushik Project 1 CSC 84010 Neural Networks and Pattern Recognition History Radial Basis Function (RBF) emerged in late 1980 s as a variant of artificial neural network.
More informationComputational statistics
Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial
More informationPart 8: Neural Networks
METU Informatics Institute Min720 Pattern Classification ith Bio-Medical Applications Part 8: Neural Netors - INTRODUCTION: BIOLOGICAL VS. ARTIFICIAL Biological Neural Netors A Neuron: - A nerve cell as
More informationUnit III. A Survey of Neural Network Model
Unit III A Survey of Neural Network Model 1 Single Layer Perceptron Perceptron the first adaptive network architecture was invented by Frank Rosenblatt in 1957. It can be used for the classification of
More informationLECTURE NOTE #NEW 6 PROF. ALAN YUILLE
LECTURE NOTE #NEW 6 PROF. ALAN YUILLE 1. Introduction to Regression Now consider learning the conditional distribution p(y x). This is often easier than learning the likelihood function p(x y) and the
More informationCS 4700: Foundations of Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu Machine Learning: Neural Networks R&N 18.7 Intro & perceptron learning 1 2 Neuron: How the brain works # neurons
More informationNeural Networks: Introduction
Neural Networks: Introduction Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others 1
More informationLecture 6. Notes on Linear Algebra. Perceptron
Lecture 6. Notes on Linear Algebra. Perceptron COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture Notes on linear algebra Vectors
More informationSpeaker Representation and Verification Part II. by Vasileios Vasilakakis
Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation
More information