Neuro-Fuzzy Comp. Ch. 4 June 1, 2005


4 Feedforward Multilayer Neural Networks, part I

Feedforward multilayer neural networks (introduced in sec. 1.7) with supervised error-correcting learning are used to approximate (synthesise) a non-linear input-output mapping from a set of training patterns.

Consider the following mapping f(x) from a p-dimensional domain X into an m-dimensional output space D:

Figure 4-1: Mapping from a p-dimensional domain into an m-dimensional output space (the unknown function f(x): X ⊂ R^p → D ⊂ R^m and its neural approximation F(W, X) producing Y)

A function

  f : X → D,  or  d = f(x);  x ∈ X ⊂ R^p,  d ∈ D ⊂ R^m

is assumed to be unknown, but it is specified by a set of training examples, {X; D}.

This function is approximated by a fixed, parameterised function (a neural network)

  F : R^p × R^M → R^m,  or  y = F(W, x);  x ∈ R^p,  y, d ∈ R^m,  W ∈ R^M

Approximation is performed in such a way that some performance index, J, typically a function of the errors between D and Y,

  J = J(W, D − Y)

is minimised.

Basic types of NETWORKS for APPROXIMATION:

- Linear Networks (Adaline):

  F(W, x) = W x

- Classical approximation schemes: consider a set of suitable basis functions {φ_i}_{i=1}^{n}; then

  F(W, x) = Σ_{i=1}^{n} w_i φ_i(x)

  Popular examples: power series, trigonometric series, splines, Radial Basis Functions. The Radial Basis Functions guarantee, under certain conditions, an optimal solution of the approximation problem. A special case: Gaussian Radial Basis Functions,

  φ_i(x) = exp( −(1/2) (x − t_i)^T Σ_i^{−1} (x − t_i) )

  where t_i and Σ_i are the centre and covariance matrix of the i-th RBF, representing adjustable parameters (weights) of the network (a minimal numerical sketch is given after this list).

- Multilayer Perceptrons (feedforward neural networks): each layer of the network is characterised by its matrix of parameters, and the network performs a composition of nonlinear operations as follows:

  F(W, x) = σ( W_1 σ( W_2 ⋯ σ( W_l x ) ) )

  A feedforward neural network with two layers (one hidden and one output) is very commonly used to approximate unknown mappings. If the output layer is linear, such a network may have a structure similar to an RBF network.
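As an illustration of the Gaussian basis function above, the following minimal MATLAB sketch evaluates φ_i(x) for a single input vector; the function handle gaussRBF and the numerical values of x, t and Sigma are illustrative assumptions, not part of the notes:

% Minimal sketch: value of one Gaussian RBF phi(x) = exp(-0.5*(x-t)'*inv(Sigma)*(x-t)).
% The name gaussRBF and all numerical values are illustrative only.
gaussRBF = @(x, t, Sigma) exp(-0.5*(x - t)'*(Sigma\(x - t)));

x     = [0.5; -1.0];           % a sample input point (p = 2)
t     = [0; 0];                % centre of the basis function
Sigma = diag([1 0.5]);         % covariance matrix (diagonal here)
phi   = gaussRBF(x, t, Sigma)  % value of the basis function at x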

4.1 Multilayer perceptrons (MLPs)

Multilayer perceptrons are commonly used to approximate complex nonlinear mappings. In general, it is possible to show that two layers are sufficient to approximate any nonlinear function. Therefore, we restrict our considerations to such two-layer networks.

The structure of the decoding part of the two-layer back-propagation network is presented in Figure 4-2.

Figure 4-2: A block-diagram of a single-hidden-layer feedforward neural network: the p-dimensional input x drives the hidden layer (weight matrix W^h, activation ψ, hidden signals h, L-dimensional), which drives the output layer (weight matrix W^y, activation σ, output signals y, m-dimensional)

The structure of each layer has been discussed in sec. 1.6. The nonlinear functions used in the hidden layer and in the output layer can be different; the output function can be linear. There are two weight matrices: an L × p matrix W^h in the hidden layer, and an m × L matrix W^y in the output layer.

The working of the network can be described in the following way:

  u(n) = W^h x(n) ;  h(n) = ψ(u(n))   (hidden signals)
  v(n) = W^y h(n) ;  y(n) = σ(v(n))   (output signals)

or simply as

  y(n) = σ( W^y ψ( W^h x(n) ) )        (4.1)

Typically, sigmoidal functions (hyperbolic tangents) are used, but other choices are also possible. The important condition from the point of view of the learning law is for the function to be differentiable.

Typical non-linear functions and their derivatives used in multi-layer perceptrons:

Sigmoidal unipolar:

  y = σ(v) = 1 / (1 + e^{−βv}) = (1/2) ( tanh(βv/2) + 1 )

The derivative of the unipolar sigmoidal function:

  y' = dy/dv = β e^{−βv} / (1 + e^{−βv})^2 = β y (1 − y)
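The unipolar sigmoid and its derivative can be checked numerically; the following minimal MATLAB sketch (illustrative values, not part of the notes) computes the derivative directly from the output signal y:

% Minimal sketch: unipolar sigmoid and its derivative computed from y only.
% beta and the grid of activation potentials are illustrative values.
beta = 1;
v  = linspace(-5, 5, 101);     % activation potentials
y  = 1./(1 + exp(-beta*v));    % unipolar sigmoid
yp = beta*y.*(1 - y);          % derivative beta*y*(1-y), computed from y
plot(v, y, v, yp), grid on, legend('y', 'dy/dv')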

Sigmoidal bipolar:

  y = σ(v) = tanh(βv)

The derivative of the bipolar sigmoidal function:

  y' = dy/dv = 4β e^{2βv} / (e^{2βv} + 1)^2 = β (1 − y^2)

Note that:

- Derivatives of the sigmoidal functions are always non-negative.
- Derivatives can be calculated directly from the output signals using simple arithmetic operations.
- In saturation, for big values of the activation potential v, the derivatives are close to zero.
- Derivatives of the activation functions are used in the error-correction learning law.

Comments on multi-layer linear networks

Multi-layer feedforward linear neural networks can always be replaced by an equivalent single-layer network. Consider a linear network consisting of two layers, with an L × p hidden weight matrix W^h and an m × L output weight matrix W^y. The hidden and output signals in the network can be calculated as follows:

  h = W^h x ,   y = W^y h

After substitution we have:

  y = W^y W^h x = W x ,  where  W = W^y W^h

which is equivalent to a single-layer network described by the weight matrix W (a p-input, m-output linear network); a quick numerical check follows.
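The following minimal MATLAB sketch (random illustrative matrices, not part of the notes) verifies the equivalence numerically:

% Minimal sketch: a two-layer linear network collapses to a single layer.
% Wh, Wy and x are arbitrary illustrative values.
p = 4; L = 3; m = 2;
Wh = randn(L, p);                 % hidden-layer weight matrix, L x p
Wy = randn(m, L);                 % output-layer weight matrix, m x L
x  = randn(p, 1);                 % an input vector
y_two    = Wy*(Wh*x);             % output of the two-layer linear network
y_single = (Wy*Wh)*x;             % output of the equivalent single layer
max(abs(y_two - y_single))        % numerically zero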

4.2 Detailed structure of a Two-Layer Perceptron, the most commonly used feedforward neural network

Figure 4-3: Various representations of a Two-Layer Perceptron (signal-flow diagram and dendritic diagram of the input layer x, the hidden layer producing h, and the output layer producing y)

The signals of the network are calculated as:

  u_j = W^h_{j:} x ;  h_j = ψ(u_j) ;  v_k = W^y_{k:} h ;  y_k = σ(v_k)

We can have x_p = 1 and possibly h_L = 1 (constant bias inputs). W^h is L × p and W^y is m × L.

4.3 Example of function approximation with a two-layer perceptron

Consider a single-variable function approximated by a two-layer perceptron with three tanh hidden units and a linear output unit. Each hidden unit receives the input x and a constant bias input 1. (Figure: Function approximation with a two-layer perceptron, showing the weighted hidden signals w^y_1 h_1, w^y_2 h_2, w^y_3 h_3 and the resulting output y.)

The neural net generates the following function:

  y = w^y h = w^y tanh( W^h x )
    = w^y_1 h_1 + w^y_2 h_2 + w^y_3 h_3
    = w^y_1 tanh( w^h_{11} x + w^h_{12} ) + w^y_2 tanh( w^h_{21} x + w^h_{22} ) + w^y_3 tanh( w^h_{31} x + w^h_{32} )
    = 5 tanh( 2x − 1 ) + tanh( 3x − 4 ) − 3 tanh( 0.75x − 2 )

A short evaluation script follows.
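A minimal MATLAB sketch of evaluating such a network over a grid of inputs is given below; the weight values merely mirror the example above and should be treated as illustrative:

% Minimal sketch: y = Wy * tanh(Wh * [x; 1]) for a scalar input x over a grid.
% The weight values are illustrative.
Wh = [2 -1; 3 -4; 0.75 -2];          % hidden weights and biases (3 hidden units)
Wy = [5 1 -3];                       % output weights (linear output unit)
x  = linspace(-3, 3, 201);           % grid of input values
H  = tanh(Wh*[x; ones(size(x))]);    % hidden signals, 3 x 201
y  = Wy*H;                           % network output, 1 x 201
plot(x, y), grid on, xlabel('x'), ylabel('y')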

4.4 Structure of a Gaussian Radial Basis Function (RBF) Neural Network

An RBF neural network is similar to a two-layer perceptron with a linear output layer: each hidden unit computes a Gaussian basis function

  φ_l(x) = exp( −(1/2) (x − t_l)^T Σ_l^{−1} (x − t_l) ) ,  l = 1, ..., L

and the output is their weighted sum with weights w_1, ..., w_L.

Let us consider a single-output case, that is, approximation of a single function of p variables:

  y = F(x; t_1, ..., t_L, Σ_1, ..., Σ_L, w) = Σ_{l=1}^{L} w_l exp( −(1/2) (x − t_l)^T Σ_l^{−1} (x − t_l) )

where t_l, p-element vectors, are the centres of the Gaussian functions, and Σ_l, p × p covariance matrices, specify the shape of the Gaussian functions. All parameters (t_1, ..., t_L, Σ_1, ..., Σ_L, w) are adjusted during the learning procedure.

When the covariance matrix is diagonal, the axes of the Gaussian shape are aligned with the coordinate system axes. If, in addition, all diagonal elements are identical, the Gaussian function is symmetrical in all directions.

4.5 Example of function approximation with a Gaussian RBF network

Consider a single-variable function approximated by a Gaussian RBF network with three basis functions and a linear output unit. (Figure: Function approximation with a Gaussian RBF network, showing the weighted hidden signals w_1 h_1, w_2 h_2, w_3 h_3 and the resulting output y.)

The neural net generates the following function:

  y = w h = w_1 h_1 + w_2 h_2 + w_3 h_3
    = w_1 exp( −(1/2) ((x − t_1)/σ_1)^2 ) + w_2 exp( −(1/2) ((x − t_2)/σ_2)^2 ) + w_3 exp( −(1/2) ((x − t_3)/σ_3)^2 )

where t_l and σ_l are the centre and the spread of the l-th basis function. A short evaluation script follows.
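A minimal MATLAB sketch of the forward pass of such a network is given below; the centres, spreads and weights are illustrative values only:

% Minimal sketch: forward pass of a Gaussian RBF network with scalar input
% and L = 3 basis functions; all parameter values are illustrative.
t     = [-1  0  1];                    % centres of the Gaussian functions
sigma = [0.5 0.5 0.5];                 % spreads of the Gaussian functions
w     = [1 -2  1];                     % output weights
x = linspace(-3, 3, 201);              % grid of input values
H = exp(-0.5*((x - t')./sigma').^2);   % hidden signals, 3 x 201
y = w*H;                               % network output, 1 x 201
plot(x, y), grid on, xlabel('x'), ylabel('y')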

4.6 Error-Correcting Learning Algorithms for Feedforward Neural Networks

Error-correcting learning algorithms are supervised training algorithms that modify the parameters of the network in such a way as to minimise the error between the desired and actual outputs. The vector w represents all the adjustable parameters of the network, which generates y = F(x; w) and is updated as w := w + Δw on the basis of the error ε = d − y.

Training data consist of N p-dimensional input vectors x(n) and N m-dimensional desired output vectors d(n), organized in two matrices:

  X = [ x(1) ... x(n) ... x(N) ]  is a p × N matrix,
  D = [ d(1) ... d(n) ... d(N) ]  is an m × N matrix.

For each input vector x(n), the network calculates the actual output vector

  y(n) = F( x(n); w(n) )

The output vector is compared with the desired output d(n) and the error is calculated:

  ε(n) = [ ε_1(n) ... ε_m(n) ]^T = d(n) − y(n)   is an m × 1 vector        (4.2)

In a pattern training algorithm, at each step the weight vector is updated so that the total error is minimised:

  w(n+1) = w(n) + Δw(n)

More specifically, we try to minimise a performance index, typically the mean-squared error (mse), J(w), specified as an averaged sum of instantaneous squared errors at the network output:

  J(w) = (1/(mN)) Σ_{n=1}^{N} E(w(n))        (4.3)

where the total instantaneous squared error (tise) at step n, E(w(n)), is defined as

  E(w(n)) = (1/2) Σ_{k=1}^{m} ε_k^2(n) = (1/2) ε^T(n) ε(n)        (4.4)

and ε is the m × 1 vector of instantaneous errors as in eqn (4.2).

To consider possible minimization algorithms we can expand J(w) into the Taylor power series:

  J(w(n+1)) = J(w + Δw) = J(w) + Δw · ∇J(w) + (1/2) Δw ∇^2 J(w) Δw^T + ...
            = J(w) + Δw · g(w) + (1/2) Δw H(w) Δw^T + ...        (4.5)

(the step index (n) has been omitted for brevity), where ∇J(w) = g is the gradient vector of the performance index, and ∇^2 J(w) = H is the Hessian matrix (the matrix of second derivatives).
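The error and performance-index definitions above translate directly into MATLAB; in the following minimal sketch the data matrices are random placeholders, used only to show the computation of eqns (4.2) to (4.4):

% Minimal sketch: instantaneous squared errors E(n) and the mse index J.
% D and Y are random placeholder matrices.
m = 2; N = 100;
D = randn(m, N);               % desired outputs
Y = randn(m, N);               % actual network outputs
Err = D - Y;                   % errors eps(n), collected column-wise (m x N)
E = 0.5*sum(Err.^2, 1);        % total instantaneous squared errors, 1 x N
J = sum(E)/(m*N)               % performance index, eqn (4.3)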

As for the Adaline, we infer that in order for the performance index to be reduced, that is,

  J(w + Δw) < J(w)

the following condition must be satisfied:

  Δw · ∇J(w) < 0

where the higher-order terms in the expansion (4.5) have been ignored.

This condition describes the steepest descent method, in which the weight vector is modified in the direction opposite to the gradient vector:

  Δw = −η ∇J(w)        (4.6)

However, the gradient of the total error is equal to the sum of its components:

  ∇J(w) = (1/(mN)) Σ_{n=1}^{N} ∇E(w(n))        (4.7)

Therefore, in the pattern training, at each step the weight vector can be modified in the direction opposite to the gradient of the total instantaneous squared error:

  Δw(n) = −η ∇E(w(n))        (4.8)

The gradient of the total instantaneous squared error (tise) is a K-component vector, where K is the total number of weights, of all partial derivatives of E(w) with respect to w (the step index (n) has been omitted for brevity):

  ∇E(w) = ∂E/∂w = [ ∂E/∂w_1 ... ∂E/∂w_K ]

Using eqns (4.4) and (4.2) we have

  ∇E(w) = ∂E/∂w = ε^T ∂ε/∂w = −ε^T ∂y/∂w        (4.9)

where ∂y/∂w is an m × K matrix of all derivatives ∂y_j/∂w_k (the Jacobian matrix).

Details of the calculation of the gradient of the total instantaneous squared error (gradient of tise), in particular of the Jacobian matrix, will be different for a specific type of neural net, that is, for a specific mapping function y = F(x; w).
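To make the steepest-descent rule (4.6) concrete, the following minimal MATLAB sketch applies it to a simple quadratic performance index J(w) = 0.5 w A w^T, with w a row vector; the matrix A, the starting point and the gain are illustrative values, not from the notes:

% Minimal sketch: steepest descent, eqn (4.6), on J(w) = 0.5*w*A*w'.
% A, w and eta are illustrative values.
A = [2 0.5; 0.5 1];            % a positive-definite matrix
w = [2 -3];                    % initial weight (row) vector
eta = 0.1;                     % learning gain
for n = 1:100
    gradJ = w*A;               % gradient of J with respect to w (A symmetric)
    w = w - eta*gradJ;         % steepest-descent update
end
w                              % approaches the minimum at [0 0]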

4.7 Steepest Descent Backpropagation Learning Algorithm for a Multi-Layer Perceptron

The steepest descent backpropagation learning algorithm is the simplest error-correcting algorithm for a multi-layer perceptron. For simplicity we will derive it for a two-layer perceptron as in Figure 4-3, where the vector of output signals is calculated as:

  y = σ(v) ;  v = W^y h ;  h = ψ( W^h x )        (4.10)

We start with re-shaping the two weight matrices into one weight vector of the following structure:

  w = scan(W^h, W^y) = [ w^h_{11} ... w^h_{1p} ... w^h_{Lp}   w^y_{11} ... w^y_{1L} ... w^y_{mL} ] = [ w^h   w^y ]

where w^h collects the hidden weights and w^y the output weights. The size of W^h is L × p and of W^y is m × L, the total number of weights being equal to K = L(p + m).

Now, the gradient of tise has components associated with the hidden layer weights, W^h, and the output layer weights, W^y, which are arranged in the following way:

  ∇E(w) = [ ∂E/∂w^h   ∂E/∂w^y ] = [ ∂E/∂w^h_{11} ... ∂E/∂w^h_{ji} ... ∂E/∂w^h_{Lp}   ∂E/∂w^y_{11} ... ∂E/∂w^y_{kj} ... ∂E/∂w^y_{mL} ]

Using eqns (4.9) and (4.10) the gradient of tise can be further expressed as:

  ∇E(w) = −ε^T ∂y/∂w = −ε^T (∂y/∂v)(∂v/∂w)   (the chain rule)        (4.11)

4.7.1 Output layer

The vector of output signals is calculated as

  y = σ(v) ;  v = W^y h ,  that is, v_k = W^y_{k:} h

The first Jacobian matrix of eqn (4.11) can be evaluated as follows:

  ∂y/∂v = diag(σ') = diag( σ'_1, ..., σ'_m ) ,  σ'_k = ∂y_k/∂v_k        (4.12)

which is a diagonal matrix of derivatives of the activation function.

The products of the errors ε_k and the derivatives of the activation function, σ'_k, are known as the delta errors:

  δ^T = ε^T diag(σ') = [ ε_1 σ'_1 ... ε_m σ'_m ]        (4.13)

is the vector of delta errors.

Substituting eqns (4.13) and (4.12) into eqn (4.11), the gradient of tise can be expressed as:

  ∇E(w) = ∂E/∂w = −δ^T ∂v/∂w        (4.14)
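The delta errors of eqn (4.13) can equivalently be formed as an element-wise product, which is how they are computed in practice; the following minimal MATLAB sketch (random placeholder signals, a tanh output layer assumed) checks the two forms against each other:

% Minimal sketch: delta' = err'*diag(sigp) equals the element-wise product
% err .* sigp; a tanh output layer and random placeholder signals are assumed.
m = 3;
v    = randn(m, 1);            % activation potentials of the output layer
y    = tanh(v);                % output signals
sigp = 1 - y.^2;               % derivatives sigma'_k of the tanh activation
err  = randn(m, 1);            % output errors (placeholders)
delta1 = (err'*diag(sigp))';   % delta errors via the diagonal matrix, eqn (4.13)
delta2 = err.*sigp;            % delta errors via element-wise multiplication
max(abs(delta1 - delta2))      % numerically zero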

Now we will separately calculate the output and hidden sections of the gradient of tise, namely ∂E/∂w^y and ∂E/∂w^h.

From eqn (4.14) the output section of the gradient can be written as:

  ∂E/∂w^y = −δ^T ∂v/∂w^y        (4.15)

In order to evaluate the Jacobian ∂v/∂w^y, note first that, since v = W^y h, each output potential depends only on the corresponding row of W^y, with ∂v_k/∂W^y_{k:} = h^T. Therefore, after scanning W^y row-wise into w^y, we have

  ∂v/∂w^y = I_m ⊗ h^T        (4.16)

where ⊗ denotes the Kronecker product and I_m is an m × m identity matrix.

Combining eqns (4.15) and (4.16), we finally have:

  ∂E/∂w^y = −δ^T ( I_m ⊗ h^T ) ,  or  ∂E/∂W^y = −δ h^T        (4.17)

Hence, if we reshape the weight vector w^y back into the weight matrix W^y, the gradient of the total instantaneous squared error can be expressed as an outer product of the δ and h vectors.

Using eqn (4.8), the steepest descent pattern training algorithm for minimisation of the performance index with respect to the output weight matrix becomes:

  ΔW^y(n) = −η_y ∂E(n)/∂W^y(n) = η_y δ(n) h^T(n) ;   W^y(n+1) = W^y(n) + ΔW^y(n)        (4.18)

In the batch training mode the gradient of the total performance index, J(W^h, W^y), related to the output weight matrix, W^y, can be obtained by summing the gradients of the total instantaneous squared errors:

  ∂J/∂W^y = (1/(mN)) Σ_{n=1}^{N} ∂E(n)/∂W^y(n) = −(1/(mN)) Σ_{n=1}^{N} δ(n) h^T(n)        (4.19)

If we take into account that a sum of outer products can be replaced by a product of matrices collecting the contributing vectors, then the gradient can be written as:

  ∂J/∂W^y = −(1/(mN)) S H^T        (4.20)

where S is the m × N matrix of output delta errors:

  S = [ δ(1) ... δ(N) ]        (4.21)

and H is the L × N matrix of the hidden signals: H = ψ( W^h X ).

Therefore, in the batch training LMS algorithm, the weight update after the k-th epoch can be written as:

  ΔW^y(k) = −η_y ∂J/∂W^y = η_y S(k) H^T(k) ;   W^y(k+1) = W^y(k) + ΔW^y(k)        (4.22)
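In MATLAB the batch update (4.22) is a single matrix product; in the following minimal sketch, S and H are random placeholders standing for the delta errors and hidden signals collected over one epoch:

% Minimal sketch: batch update of the output weight matrix, eqn (4.22).
% S and H are random placeholders for one epoch of delta errors and hidden signals.
m = 2; L = 5; N = 50; eta_y = 0.1;
Wy = randn(m, L);              % current output weight matrix
S  = randn(m, N);              % output delta errors for the whole epoch
H  = randn(L, N);              % hidden signals for the whole epoch
dWy = eta_y*S*H';              % sum of outer products delta(n)*h(n)' as S*H'
Wy  = Wy + dWy;                % weight update after the epoch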

4.7.2 Hidden layer

As for the output layer, we start with eqn (4.14), which for the hidden weights can be re-written in a form analogous to eqn (4.15); then, using the chain rule of differentiation, we can write:

  ∂E/∂w^h = −δ^T ∂v/∂w^h = −δ^T (∂v/∂h)(∂h/∂w^h)        (4.23)

Since v = W^y h, we have

  ∂v/∂h = W^y

and we can define the backpropagated error as:

  ε^h = (W^y)^T δ        (4.24)

The gradient of tise with respect to the hidden weights (with u = W^h x and h = ψ(u)) can now be written as:

  ∂E/∂w^h = −(ε^h)^T ∂h/∂w^h = −(ε^h)^T (∂h/∂u)(∂u/∂w^h)        (4.25)

which is structurally identical to eqn (4.9), with w^y, ε and y replaced by w^h, ε^h and h, respectively.

Following eqn (4.13) we can define the backpropagated delta errors:

  (δ^h)^T = (ε^h)^T ∂h/∂u = (ε^h)^T diag(ψ') = [ ε^h_1 ψ'_1 ... ε^h_L ψ'_L ] ;   ψ'_j = ∂h_j/∂u_j        (4.26)

Now the gradient of tise can be expressed in a form analogous to eqn (4.15):

  ∂E/∂w^h = −(δ^h)^T ∂u/∂w^h = −(δ^h)^T ( I_L ⊗ x^T )        (4.27)

Finally we have:

  ∂E/∂W^h = −δ^h x^T        (4.28)

Hence, if we reshape the weight vector w^h back into the weight matrix W^h, the gradient of the total instantaneous squared error can be expressed as an outer product of the δ^h and x vectors.

Therefore, for the pattern training algorithm, the update of the hidden weight matrix for the n-th training pattern becomes:

  ΔW^h(n) = −η_h ∂E(n)/∂W^h(n) = η_h δ^h(n) x^T(n) ;   W^h(n+1) = W^h(n) + ΔW^h(n)        (4.29)

It is interesting to note that the weight update rule is identical in its form for both the output and hidden layers.
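Putting eqns (4.18), (4.24), (4.26) and (4.29) together, one pattern-mode backpropagation step can be coded in a few lines; the following minimal MATLAB sketch assumes a unipolar sigmoid hidden layer, a tanh output layer and random placeholder data:

% Minimal sketch: one pattern-mode backpropagation step through both layers.
% Unipolar sigmoid hidden layer, tanh output layer; all values are placeholders.
p = 3; L = 4; m = 2; eta_y = 0.1; eta_h = 0.1;
Wh = randn(L, p); Wy = randn(m, L);
x = randn(p, 1); d = randn(m, 1);     % one training pattern
u = Wh*x;  h = 1./(1 + exp(-u));      % hidden signals, h = psi(u)
v = Wy*h;  y = tanh(v);               % output signals, y = sigma(v)
delta   = (d - y).*(1 - y.^2);        % output delta errors, eqn (4.13)
err_h   = Wy'*delta;                  % back-propagated errors, eqn (4.24)
delta_h = err_h.*h.*(1 - h);          % hidden delta errors, eqn (4.26)
Wy = Wy + eta_y*delta*h';             % output weight update, eqn (4.18)
Wh = Wh + eta_h*delta_h*x';           % hidden weight update, eqn (4.29)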

For the batch training, the gradient of the total performance index, J(W^h, W^y), related to the hidden weight matrix, W^h, can be obtained by summing the instantaneous gradients:

  ∂J/∂W^h = (1/(mN)) Σ_{n=1}^{N} ∂E(n)/∂W^h(n) = −(1/(mN)) Σ_{n=1}^{N} δ^h(n) x^T(n)        (4.30)

If we take into account that a sum of outer products can be replaced by a product of matrices collecting the contributing vectors, then we finally have

  ∂J/∂W^h = −(1/(mN)) S^h X^T        (4.31)

where S^h is the L × N matrix of hidden delta errors:

  S^h = [ δ^h(1) ... δ^h(N) ]        (4.32)

and X is the p × N matrix of the input signals.

In the batch training steepest descent algorithm, the weight update after the k-th epoch can be written as:

  ΔW^h(k) = −η_h ∂J/∂W^h = η_h S^h(k) X^T ;   W^h(k+1) = W^h(k) + ΔW^h(k)        (4.33)

4.7.3 Alternative derivation

The basic signal flow referred to in the above calculations is as follows (Figure: signal flow of the two-layer perceptron: input signals x_i, hidden activation potentials u_j and signals h_j, output activation potentials v_k and signals y_k, desired outputs d_k and errors ε_k). The relations that are good to remember are:

  E = (1/2) Σ_{k=1}^{m} ε_k^2 ,   ε_k = d_k − y_k ,
  y_k = σ( W^y_{k:} h ) = σ( Σ_{j=1}^{L} w^y_{kj} h_j ) ,
  h_j = ψ( W^h_{j:} x ) = ψ( Σ_{i=1}^{p} w^h_{ji} x_i )

The gradient component related to a synaptic weight w^y_{kj} for the n-th training vector can be calculated as follows:

  ∂E(n)/∂w^y_{kj} = −ε_k ∂y_k/∂w^y_{kj} = −ε_k (∂y_k/∂v_k)(∂v_k/∂w^y_{kj}) = −ε_k σ'_k h_j = −δ_k h_j

where

  E = (1/2)( ... + ε_k^2 + ... ) ,  y_k = σ(v_k) ,  v_k = W^y_{k:} h = ... + w^y_{kj} h_j + ... ,  σ'_k = ∂y_k/∂v_k ,  δ_k = ε_k σ'_k

and the delta error, δ_k, is the output error, ε_k, modified by the derivative of the activation function, σ'_k.

Alternatively, the gradient components related to the complete weight vector W^y_{k:} of the k-th output neuron can be calculated as:

  ∂E(n)/∂W^y_{k:} = −ε_k ∂y_k/∂W^y_{k:} = −ε_k (∂y_k/∂v_k)(∂v_k/∂W^y_{k:}) = −ε_k σ'_k h^T = −δ_k h^T ,   y_k = σ(v_k) ,  v_k = W^y_{k:} h

In the above expression, each component of the gradient is a function of the delta error for the k-th output, δ_k, and the respective output signal from the hidden layer, h^T = [ h_1 ... h_L ].

Finally, the gradient components related to the complete weight matrix of the output layer, W^y, can be collected in an m × L matrix as follows:

  ∂E(n)/∂W^y = −δ(n) h^T(n)        (4.34)

where

  δ = [ ε_1 σ'_1 ... ε_m σ'_m ]^T = ε ⊙ σ'        (4.35)

⊙ denotes an element-by-element multiplication, and the hidden signals are calculated as follows:

  h(n) = ψ( W^h(n) x(n) )

Gradient components related to the weight matrix of the hidden layer:

  ∂E(n)/∂w^h_{ji} = −Σ_{k=1}^{m} ε_k ∂y_k/∂w^h_{ji} = −Σ_{k=1}^{m} ε_k (∂y_k/∂v_k)(∂v_k/∂w^h_{ji}) = −( Σ_{k=1}^{m} δ_k w^y_{kj} ) ∂h_j/∂w^h_{ji} = −(W^y_{:j})^T δ ∂h_j/∂w^h_{ji}

where

  y_k = σ(v_k) ,  v_k = ... + w^y_{kj} h_j + ... ,  δ_k = ε_k σ'_k ,  (W^y_{:j})^T δ = δ^T W^y_{:j} = Σ_{k=1}^{m} δ_k w^y_{kj} = ε^h_j

Note that the j-th column of the output weight matrix, W^y_{:j}, is used to modify the delta errors to create the equivalent hidden layer errors, known as back-propagated errors, which are specified as follows:

  ε^h_j = (W^y_{:j})^T δ ,  for j = 1, ..., L

Using the back-propagated error, we can now repeat the steps performed for the output layer, with ε^h_j and h_j replacing ε_k and y_k, respectively:

  ∂E(n)/∂w^h_{ji} = −ε^h_j ∂h_j/∂w^h_{ji} = −ε^h_j ψ'_j x_i = −δ^h_j x_i ,   ψ'_j = ∂ψ_j/∂u_j ,  δ^h_j = ε^h_j ψ'_j ,  h_j = ψ(u_j) ,  u_j = W^h_{j:} x

where the back-propagated error has been used to generate the delta error for the hidden layer, δ^h_j.

All gradient components related to the hidden weight matrix, W^h, can now be calculated in a way similar to that for the output layer, as in eqn (4.34):

  ∂E(n)/∂W^h = −δ^h x^T ,  where  δ^h = [ ε^h_1 ψ'_1 ... ε^h_L ψ'_L ]^T = ε^h ⊙ ψ'  and  ε^h = (W^y)^T δ        (4.36)

4.7.4 The structure of the two-layer back-propagation network with learning

Figure 4-4: The structure of the decoding and encoding parts of the two-layer back-propagation network (decoding part: x, u = W^h x, h = ψ(u), v = W^y h, y = σ(v); encoding part: ε = d − y, δ = ε ⊙ σ', ΔW^y = η_y δ h^T, ε^h = (W^y)^T δ, δ^h = ε^h ⊙ ψ', ΔW^h = η_h δ^h x^T, together with the blocks computing the derivatives ψ' = dψ/du and σ' = dσ/dv)

Note the decoding and encoding parts, and the blocks which calculate the derivatives, the delta signals and the weight update signals.

The process of computing the signals (pattern mode) during each time step consists of the:

- forward pass, in which the signals of the decoding part are determined starting from x, through u, h, ψ', v, to y, and
- backward pass, in which the signals of the learning part are determined starting from d, through ε, δ, ΔW^y, ε^h, δ^h and ΔW^h.

From Figure 4-4 and the relevant equations note that, in general, the weight update is proportional to the synaptic input signals (x, or h) and to the delta signals (δ^h, or δ). The delta signals, in turn, are proportional to the derivatives of the activation functions, ψ' or σ'.

4.8 Comments on Learning Algorithms for Multi-Layer Perceptrons

The process of training a neural network is monitored by observing the value of the performance index, J(W(n)), typically the mean-squared error as defined in eqns (4.3) and (4.4). In order to reduce the value of this error function, it is typically necessary to go through the set of training patterns (epochs) a number of times, as discussed in page 3-2.

There are two basic modes of updating the weights:

- the pattern mode, in which the weights are updated after the presentation of a single training pattern,
- the batch mode, in which the weights are updated after each epoch.

For the basic steepest descent backpropagation algorithm the relevant equations are:

pattern mode (n is the pattern index):

  W^y(n+1) = W^y(n) + η_y δ(n) h^T(n)
  W^h(n+1) = W^h(n) + η_h δ^h(n) x^T(n)

batch mode (k is the epoch counter):

  W^y(k+1) = W^y(k) + η_y S(k) H^T(k)
  W^h(k+1) = W^h(k) + η_h S^h(k) X^T

Definitions of the other variables have already been given.

Weight Initialisation

The weights are initialised in one of the following ways:

- using prior information, if available; the Nguyen-Widrow algorithm presented in sec. 5.4 is a good example of such an initialisation,
- to small uniformly distributed random numbers.

Incorrectly initialised weights cause the activation potentials to become large, which saturates the neurons. In saturation the derivatives σ' ≈ 0 and no learning takes place. A good initialisation can significantly speed up the learning process.

Randomisation

For the pattern training it might be a good practice to randomise the order of presentation of the training examples between epochs.

Validation

In order to validate the process of learning, the available data is randomly partitioned into a training set, which is used for training, and a test set, which is used for validation of the obtained data model. A short sketch of these practices follows.
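The following minimal MATLAB sketch (all sizes and the data set itself are illustrative placeholders) shows these practices in code: small random weight initialisation, a random train/test split, and a re-randomised pattern order in every epoch:

% Minimal sketch: weight initialisation, train/test split and pattern shuffling.
% All sizes and the data set itself are illustrative placeholders.
p = 4; L = 6; m = 2; N = 200;
X = randn(p, N); D = randn(m, N);      % placeholder data set
Wh = 0.2*(rand(L, p) - 0.5);           % small uniform random hidden weights
Wy = 0.2*(rand(m, L) - 0.5);           % small uniform random output weights
idx = randperm(N);                     % random permutation of the patterns
trn = idx(1:150); tst = idx(151:end);  % training and test (validation) sets
Xtrn = X(:, trn); Dtrn = D(:, trn);
for epoch = 1:10
    order = randperm(numel(trn));      % new presentation order for this epoch
    % ... pattern-mode weight updates over Xtrn(:, order), Dtrn(:, order) ...
end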

4.9 Example of function approximation (fap2dm)

In this MATLAB example we approximate two functions of two variables using a two-layer perceptron:

  y = f(x) ,  or  y_1 = f_1(x_1, x_2) ,  y_2 = f_2(x_1, x_2) ;   y = σ( W^y ψ( W^h x ) )

The weights of the perceptron, W^h and W^y, are trained using the basic back-propagation algorithm in the batch mode, as discussed in the previous section.

Specification of the neural network (fap2dm):

p = 3;   % Number of inputs plus the bias input
L = 12;  % Number of hidden signals (with bias)
m = 2;   % Number of outputs

(Figure: dendritic diagram of the network: inputs x_1, x_2, x_3 = 1; hidden layer W^h producing u and h = ψ(u) with the bias signal appended to the h-bus; output layer W^y producing v and the outputs y_1, y_2.)

The two functions to be approximated by the two-layer perceptron are as follows:

  y_1 = x_1 e^{−ρ^2} ,   y_2 = sin(2ρ^2) / (4ρ^2) ,  where  ρ^2 = x_1^2 + x_2^2

The domain of the functions is a square x_1, x_2 ∈ [−2, 2). In order to form the training set, the functions are sampled on a regular 16 × 16 grid. The relevant MATLAB code to form the matrices X and D follows:

na = 16; N = na^2;       % Number of training cases
nn = 0:na-1;
% Specification of the domain of functions:
X1 = nn*4/na-2;          % na points from -2 in steps of (4/na)=0.25 to (2-4/na)=1.75
[X1 X2] = meshgrid(X1);  % coordinates of the grid vertices; X1 and X2 are na by na
R = (X1.^2 + X2.^2 + 1e-5);   % R (rho^2) is a matrix of squares of distances of the grid vertices from the origin
D1 = X1.*exp(-R); D = (D1(:))';   % D1 is na by na, D is 1 by N
D2 = 0.25*sin(2*R)./R;            % D2 is na by na
D = [D; (D2(:))'];                % D is the 2 by N matrix of 2-D target vectors

Scanning X1 and X2 column-wise and appending the bias inputs, we obtain the input matrix X, which is p by N:

X = [X1(:)'; X2(:)'; ones(1,N)];

The domain sampling points (the matrices X1, X2) and the training exemplars (the matrices X, D) can be inspected by displaying the respective matrices; their numerical listings are omitted here.

(Figure: 'Two 2-D target functions', surface plots of y_1 and y_2 over the domain.) The functions to be approximated are plotted side by side, which distorts the domain, which in reality is the same for both functions, namely x_1, x_2 ∈ [−2, 2):

surfc([X1-2 X1+2], [X2 X2], [D1 D2])

Random initialisation of the weight matrices:

Wh = randn(L-1,p)/p;   % the hidden-layer weight matrix W^h is (L-1) by p
W  = randn(m,L)/L;     % the output-layer weight matrix W^y is m by L
C  = 200;              % maximum number of training epochs
J  = zeros(m,C);       % memory allocation for the error function
eta = 0.003; etah = 0.1;   % training gains for the output and hidden layers
for c = 1:C            % the main loop (fap2dm)

The forward pass:

H  = ones(L-1,N)./(1+exp(-Wh*X));  % Hidden signals (L-1 by N)
Hp = H.*(1-H);                     % Derivatives of hidden signals
H  = [H; ones(1,N)];               % bias signal appended
Y  = tanh(W*H);                    % Output signals (m by N)
Yp = 1 - Y.^2;                     % Derivatives of output signals

The backward pass:

E  = D - Y;                % The output errors (m by N)
JJ = (sum((E.*E)'))';      % The total error after one epoch
                           % (the performance function, m by 1)
delY = E.*Yp;              % Output delta signals (m by N)
dW   = delY*H';            % Update of the output matrix; dW is m by L
Eh   = W(:,1:L-1)'*delY;   % The back-propagated hidden error; Eh is (L-1) by N
delH = Eh.*Hp;             % Hidden delta signals ((L-1) by N)
dWh  = delH*X';            % Update of the hidden matrix; dWh is (L-1) by p
W = W+eta*dW; Wh = Wh+etah*dWh;   % The batch update of the weights

The two approximated 2-D functions are plotted after each epoch:

D1(:) = Y(1,:)'; D2(:) = Y(2,:)';
surfc([X1-2 X1+2], [X2 X2], [D1 D2])
J(:,c) = JJ;
end   % of the epoch loop

(Figure: 'Approximation' after training, with the epoch number, the error and the training gains shown in the plot title: surface plots of the two approximated functions.)

The sum of squared errors at the end of each training epoch is collected in the 2 by C matrix J. (Figure: 'The approximation error' for each function plotted against the number of training epochs.)

From this example you will note that the backpropagation algorithm is:

- painfully slow,
- sensitive to the weight initialization,
- sensitive to the training gains.

We will address these problems in the subsequent sections. It is good to know that the best training algorithms can be two orders of magnitude faster than a basic backpropagation algorithm.
