Introduction to Neural Networks: Structure and Training

Introduction to Neural Networks: Structure and Training
Professor Q.J. Zhang
Department of Electronics, Carleton University, Ottawa, Canada
www.doe.carleton.ca/~qjz, qjz@doe.carleton.ca

A Quick Illustration Example: Neural Network Model for Delay Estimation in a High-Speed Interconnect Network

High-Speed VLSI Interconnect Network
[Figure: an interconnect network connecting Drivers 1-3 and Receivers 1-4]

Circuit Representation of the Interconnect Network
[Figure: a source (Vp, Tr) with source resistance Rs driving four interconnect branches of lengths L1-L4 with delays τ1-τ4, each terminated by an RC load (R1/C1 through R4/C4)]

Massive Analysis of Signal Delay

Need for a Neural Network Model
A PCB contains a large number of interconnect networks, each with different interconnect lengths, terminations, and topology, leading to the need for massive analysis of interconnect networks.
During PCB design/optimization, the interconnect networks need to be adjusted in terms of interconnect lengths, receiver-pin load characteristics, etc., leading to the need for repetitive analysis of interconnect networks.
This necessitates fast and accurate interconnect network models, and a neural network model is a good candidate.

Neural Network Model for Delay Analysis
[Figure: a neural network with inputs L1-L4, R1-R4, C1-C4, Rs, Vp, Tr and outputs τ1-τ4]

3-Layer MLP: Feedforward Computation
[Figure: a 3-layer MLP with inputs x_1, x_2, x_3, hidden neuron values z_1, ..., z_4, and outputs y_1, y_2]
Outputs:  y_j = Σ_k W_jk z_k
Hidden neuron values:  z_k = tanh( Σ_i W_ki x_i )

Neural Net Training
Training data (by simulation/measurement):  d = d(x)
Neural network:  y = y(x)
Objective: adjust the weights W, V such that Σ_x ( y(x) - d(x) )^2 is minimized.
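As a minimal sketch of this training loop (not the course's own code), the following NumPy example builds the 3-layer tanh MLP from the previous slide and fits hypothetical training pairs (x, d) by gradient descent on the sum-of-squared-error objective; the array shapes, the learning rate, and the data itself are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data d = d(x): 200 samples, 3 inputs, 2 outputs.
X = rng.uniform(-1.0, 1.0, size=(200, 3))
D = np.stack([np.sin(X[:, 0]) * X[:, 1], X[:, 2] ** 2], axis=1)

n_in, n_hid, n_out = 3, 4, 2
V = rng.normal(scale=0.5, size=(n_in, n_hid))   # input-to-hidden weights
W = rng.normal(scale=0.5, size=(n_hid, n_out))  # hidden-to-output weights

lr = 0.01
for epoch in range(2000):
    # Feedforward: z_k = tanh(sum_i v_ki x_i), y_j = sum_k w_jk z_k
    Z = np.tanh(X @ V)
    Y = Z @ W
    # Sum-of-squared-error objective E = sum_x (y - d)^2
    err = Y - D
    # Gradient descent on W and V (standard back-propagation of the error)
    grad_W = Z.T @ (2 * err)
    grad_V = X.T @ ((2 * err) @ W.T * (1 - Z ** 2))
    W -= lr * grad_W / len(X)
    V -= lr * grad_V / len(X)

print("final training error:", float(np.sum((np.tanh(X @ V) @ W - D) ** 2)))
```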

Simulation Time for 20,000 Interconnect Configurations

Method                      CPU
Circuit Simulator (NILT)    34.43 hours
AWE                         9.56 hours
Neural Network Approach     6.67 minutes

Important Features of Neural Networks
Neural networks have the ability to model multi-dimensional nonlinear relationships.
Neural models are simple, and model computation is fast.
Neural networks can learn and generalize from available data, making model development possible even when component formulae are unavailable.
The neural network approach is generic, i.e., the same modeling technique can be re-used for passive/active devices/circuits.
It is easier to update neural models whenever device or component technology changes.

Inspiration
[Figure: examples of patterns recognized effortlessly by the human brain: "Mary", "Lisa", "John", "Stop", "Start", "Help"]

A Biological Neuron Figure from Reference [L.H. Tsoukalas and R.E. Uhrig, Fuzzy and Neural Approaches in Engineering, Wiley, 1997.]

Neural Network Structures

Neural Network Structures
A neural network contains:
  neurons (processing elements)
  connections (links between neurons)
A neural network structure defines:
  how information is processed inside a neuron
  how the neurons are connected
Examples of neural network structures:
  multilayer perceptrons (MLP)
  radial basis function (RBF) networks
  wavelet networks
  recurrent neural networks
  knowledge-based neural networks
MLP is the basic and most frequently used structure.

MLP Structure
[Figure: layer 1 (input) with neurons 1..N_1 receiving x_1, x_2, x_3, ..., x_n; hidden layers 2 through L-1 with N_2, ..., N_{L-1} neurons; layer L (output) with N_L neurons producing y_1, y_2, ..., y_m]

Information Processing in a Neuron
[Figure: neuron i in layer l forms the weighted sum γ_i^l = Σ_{j=0}^{N_{l-1}} w_ij^l z_j^{l-1} over the previous-layer outputs z_0^{l-1}, z_1^{l-1}, ..., z_{N_{l-1}}^{l-1}, then applies the activation function to give z_i^l = σ(γ_i^l)]

Neuron Activation Functions
Input layer neurons simply relay the external inputs to the neural network.
Hidden layer neurons have smooth switch-type activation functions.
Output layer neurons can have simple linear activation functions.

Forms of Activation Functions: z = σ(γ)
[Figure: plots of the sigmoid, arctangent, and hyperbolic tangent functions]

Forms of Activation Functions: z = σ(γ)
Sigmoid function:  z = σ(γ) = 1 / (1 + e^(-γ))
Arctangent function:  z = σ(γ) = (2/π) arctan(γ)
Hyperbolic tangent function:  z = σ(γ) = (e^γ - e^(-γ)) / (e^γ + e^(-γ))
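A small sketch of these three activation functions exactly as written above, using NumPy (the function names are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(gamma):
    # z = 1 / (1 + e^(-gamma))
    return 1.0 / (1.0 + np.exp(-gamma))

def arctangent(gamma):
    # z = (2 / pi) * arctan(gamma)
    return (2.0 / np.pi) * np.arctan(gamma)

def hyperbolic_tangent(gamma):
    # z = (e^gamma - e^(-gamma)) / (e^gamma + e^(-gamma))
    return np.tanh(gamma)

gamma = np.linspace(-10, 10, 5)
print(sigmoid(gamma), arctangent(gamma), hyperbolic_tangent(gamma), sep="\n")
```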

Structure: Multilayer Perceptrons (MLP)
[Figure: the L-layer MLP diagram: input layer 1 receiving x_1, x_2, x_3, ..., x_n; hidden layers 2 through L-1; output layer L]

where:
L = total number of layers
N_l = number of neurons in layer l, l = 1, 2, 3, ..., L
w_ij^l = link (weight) between neuron i in layer l and neuron j in layer l-1
NN inputs = x_1, x_2, ..., x_n (where n = N_1)
NN outputs = y_1, y_2, ..., y_m (where m = N_L)
Let the neuron output value be represented by z:
z_i^l = output of neuron i in layer l
Each neuron has an activation function: z = σ(γ)

Neural Network Feedforward:
Problem statement: Given x = [x_1, x_2, ..., x_n]^T, get y = [y_1, y_2, ..., y_m]^T from the NN.
Solution: feed x to layer 1, and feed the outputs of layer l-1 to layer l.
For l = 1:  z_i^1 = x_i,  i = 1, 2, ..., n,  n = N_1
For l = 2, 3, ..., L:  γ_i^l = Σ_{j=0}^{N_{l-1}} w_ij^l z_j^{l-1},  z_i^l = σ(γ_i^l)
and the solution is  y_i = z_i^L,  i = 1, 2, ..., m,  m = N_L
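The feedforward pass above translates almost line-for-line into code. The sketch below assumes randomly chosen weights, sigmoid hidden and linear output activations, and treats index j = 0 as the usual bias term z_0^{l-1} = 1; all of these specifics are illustrative.

```python
import numpy as np

def feedforward(x, weights):
    """Each entry of weights has shape (N_l, N_{l-1} + 1); column 0 holds the bias w_i0^l."""
    L = len(weights) + 1                      # total number of layers
    z = np.asarray(x, dtype=float)            # layer 1 simply relays the inputs
    for l, W in enumerate(weights, start=2):  # layers 2 .. L
        gamma = W @ np.concatenate(([1.0], z))   # gamma_i^l = sum_j w_ij^l z_j^{l-1}
        if l < L:                                # hidden layers: smooth switch
            z = 1.0 / (1.0 + np.exp(-gamma))     # sigmoid
        else:                                    # output layer: linear
            z = gamma
    return z

# Example: a 3-layer network with n = 3 inputs, 4 hidden neurons, m = 2 outputs.
rng = np.random.default_rng(1)
weights = [rng.normal(size=(4, 3 + 1)), rng.normal(size=(2, 4 + 1))]
print(feedforward([0.1, -0.2, 0.3], weights))
```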

Question: How can a NN represent an arbitrary nonlinear input-output relationship?
Summary in plain words: Given enough hidden neurons, a 3-layer perceptron can approximate an arbitrary continuous multidimensional function to any required accuracy.

Theorem (Cybenko, 1989): Let σ(.) be any continuous sigmoid function. Then finite sums of the form
  y_k = f_k(x) = Σ_{j=1}^{N_2} w_kj^{(3)} σ( Σ_{i=0}^{n} w_ji^{(2)} x_i ),  k = 1, 2, ..., m
are dense in C(I_n). In other words, given any f ∈ C(I_n) and ε > 0, there is a sum f̂(x) of the above form for which
  | f̂(x) - f(x) | < ε  for all x ∈ I_n
where:
  I_n -- the n-dimensional unit cube, i.e., the space of x with x_i ∈ [0, 1], i = 1, 2, ..., n
  C(I_n) -- the space of continuous functions on I_n
E.g., if the original problem is y = f(x) with f ∈ C(I_n), then f̂(x) has the form of a 3-layer perceptron NN.

Illustration of the Effect of Neural Network Weights
[Figure: plot of the standard sigmoid function 1/(1 + e^(-x)); plot of a hidden neuron response 1/(1 + e^(-(0.5x - 2.5))), annotated with the weight/bias values 0.5, -2.5, and 2.0]
Suppose this neuron is z_i^l. The weights w_ij^l (or w_ki^{l+1}) affect the figure horizontally (or vertically). The values of the w's are not unique, e.g., the signs of the 3 values in the example above can be changed.

Question: How many neurons are needed?
Essence: the degree of nonlinearity of the original problem. Highly nonlinear problems need more neurons; smoother problems need fewer neurons.
Too many neurons may lead to over-learning.
Too few neurons will not represent the problem well enough.
Solution: experience, trial and error, adaptive schemes.

Question: How many layers are needed?
3 or more layers are necessary and sufficient for arbitrary nonlinear approximation.
3 or 4 layers are used most frequently.
More layers allow more effective representation of hierarchical information in the original problem.

Radial Basis Function Network (RBF) Structure:
[Figure: inputs x_1, ..., x_n connected to hidden RBF neurons z_1, z_2, ..., z_N through centers c_ij and widths λ_ij; hidden neurons connected to outputs y_1, ..., y_m through weights w_ij]

RBF Function (Multi-Quadratic)
[Figure: plots of σ(γ) versus γ for a = 0.4 and a = 0.9]
  σ(γ) = 1 / (c^2 + γ^2)^α,  α > 0

RBF Function (Gaussian)
[Figure: plots of σ(γ) versus γ for λ = 1 and λ = 0.2]
  σ(γ) = exp( -(γ/λ)^2 )

RBF Feedforward:
  y_k = Σ_{i=0}^{N} w_ki z_i,  k = 1, 2, ..., m
  z_i = σ(γ_i) = e^(-γ_i^2)
  γ_i = sqrt( Σ_{j=1}^{n} ((x_j - c_ij) / λ_ij)^2 ),  i = 1, 2, ..., N,  γ_0 = 0
where c_ij is the center and λ_ij the standard-deviation factor of the radial basis function. So the parameters c_ij (i = 1, 2, ..., N; j = 1, ..., n) represent the centers of the RBFs, and the parameters λ_ij (i = 1, 2, ..., N; j = 1, ..., n) represent the standard deviations of the RBFs.

Universal Approximation Theorem (RBF) (Krzyzak, Linder & Lugosi, 1996): An RBF network exists such that it will approximate an arbitrary continuous y = f(x) to any accuracy required.
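A minimal sketch of this RBF feedforward pass, with randomly chosen centers c_ij, widths λ_ij, and output weights w_ki purely for illustration; z_0 is taken as a constant bias term, matching γ_0 = 0:

```python
import numpy as np

def rbf_feedforward(x, centers, widths, w):
    """centers, widths: (N, n); w: (m, N + 1), column 0 multiplies the bias z_0."""
    x = np.asarray(x, dtype=float)
    # gamma_i = sqrt( sum_j ((x_j - c_ij) / lambda_ij)^2 ), i = 1..N
    gamma = np.sqrt(np.sum(((x - centers) / widths) ** 2, axis=1))
    # z_i = exp(-gamma_i^2); gamma_0 = 0 gives the bias value z_0 = 1
    z = np.concatenate(([1.0], np.exp(-gamma ** 2)))
    # y_k = sum_{i=0}^{N} w_ki z_i
    return w @ z

rng = np.random.default_rng(2)
N, n, m = 5, 3, 2                      # hidden neurons, inputs, outputs
centers = rng.uniform(-1, 1, (N, n))
widths = rng.uniform(0.3, 1.0, (N, n))
w = rng.normal(size=(m, N + 1))
print(rbf_feedforward([0.2, -0.4, 0.7], centers, widths, w))
```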

Wavelet Neural Network Structure:
[Figure: inputs x_1, ..., x_n connected to hidden wavelet neurons z_1, z_2, ..., z_N with translations t_ij and dilations a_1, a_2, ..., a_N; hidden neurons connected to outputs y_1, ..., y_m through weights w_ij]
For hidden neuron z_j:
  z_j = σ(γ_j) = ψ( (x - t_j) / a_j )
where t_j = [t_j1, t_j2, ..., t_jn]^T is the translation vector, a_j is the dilation factor, and ψ(.) is a wavelet function:
  σ(γ_j) = ψ(γ_j) = (γ_j^2 - n) e^(-γ_j^2 / 2),  where  γ_j = || (x - t_j) / a_j || = sqrt( Σ_{i=1}^{n} ((x_i - t_ji) / a_j)^2 )

Wavelet Function
[Figure: plot of the wavelet function z versus x for a_1 = 1 and a_1 = 5]
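A small sketch of a single wavelet hidden neuron; the radial wavelet form ψ(γ) = (γ^2 - n) e^(-γ^2/2) follows the reconstruction above, and the parameter values are illustrative assumptions:

```python
import numpy as np

def wavelet_neuron(x, t_j, a_j):
    """z_j = psi(gamma_j), with gamma_j = ||(x - t_j) / a_j||."""
    x = np.asarray(x, dtype=float)
    n = x.size
    gamma = np.linalg.norm((x - t_j) / a_j)
    # Radial wavelet (assumed form): psi(gamma) = (gamma^2 - n) * exp(-gamma^2 / 2)
    return (gamma ** 2 - n) * np.exp(-gamma ** 2 / 2.0)

# Example: one hidden neuron with translation t_j and dilation a_j.
t_j = np.array([0.5, -0.5, 0.0])
print(wavelet_neuron([0.2, 0.1, -0.3], t_j, a_j=1.5))
```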

Wavelet Transform:
R -- space of a real variable
R^n -- n-dimensional space, i.e., the space of vectors of n real variables
A function f: R^n → R is radial if a function g: R → R exists such that f(x) = g(||x||) for all x ∈ R^n.
If ψ(x) is radial, its Fourier transform ψ̂(w) is also radial. Let ψ̂(w) = η(||w||). Then ψ(x) is a wavelet function if
  C_ψ = (2π)^n ∫_0^∞ |η(ξ)|^2 / ξ dξ < ∞
The wavelet transform of f(x) is:
  w(a, t) = a^(-n/2) ∫_{R^n} f(x) ψ( (x - t)/a ) dx
The inverse transform is:
  f(x) = (1/C_ψ) ∫_0^∞ a^(-(n+1)) [ ∫_{R^n} w(a, t) a^(-n/2) ψ( (x - t)/a ) dt ] da
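As a hedged numerical illustration of the forward transform for n = 1 (the wavelet form, the signal f(x), and the grid are all assumptions), the integral can be approximated by a simple Riemann sum:

```python
import numpy as np

def psi(u):
    # 1-D radial wavelet, following the form reconstructed above with n = 1:
    # psi(u) = (u^2 - 1) * exp(-u^2 / 2)
    return (u ** 2 - 1.0) * np.exp(-u ** 2 / 2.0)

x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
f = np.exp(-(x - 2.0) ** 2)          # assumed signal f(x)

def wavelet_transform(a, t):
    # w(a, t) = a^(-1/2) * integral of f(x) * psi((x - t)/a) dx   (n = 1)
    return a ** -0.5 * np.sum(f * psi((x - t) / a)) * dx

print(wavelet_transform(1.0, 2.0), wavelet_transform(4.0, 2.0))
```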

Feedforward Neural Networks
The neural network accepts the input information sent to the input neurons and proceeds to produce the response at the output neurons. There is no feedback from neurons at layer l back to neurons at layer k, k ≤ l.
Examples of feedforward neural networks:
  Multilayer Perceptrons (MLP)
  Radial Basis Function Networks (RBF)
  Wavelet Networks

Recurrent Neural Network (RNN): Discrete Time Domain
The neural network output is a function of its present input and a history of its input and output.
[Figure: a feedforward neural network whose inputs are x(t), x(t-τ), x(t-2τ) and the delayed outputs y(t-τ), y(t-2τ), y(t-3τ), and whose output is y(t); τ denotes a unit time delay]
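One way to realize this arrangement in code (a sketch only; the small two-hidden-neuron feedforward net, its random weights, and the input signal are all assumptions) is to feed the delayed inputs and outputs back into an ordinary feedforward network at every time step:

```python
import numpy as np

rng = np.random.default_rng(3)
W1 = rng.normal(scale=0.3, size=(2, 6 + 1))  # hidden layer weights (bias in col 0)
W2 = rng.normal(scale=0.3, size=(1, 2 + 1))  # output layer weights

def ffnn(u):
    """Small feedforward net: 6 inputs -> 2 tanh hidden neurons -> 1 linear output."""
    z = np.tanh(W1 @ np.concatenate(([1.0], u)))
    return (W2 @ np.concatenate(([1.0], z))).item()

# y(t) = FFNN( x(t), x(t-tau), x(t-2*tau), y(t-tau), y(t-2*tau), y(t-3*tau) )
x = np.sin(0.1 * np.arange(100))        # assumed input sequence, sampled every tau
y = np.zeros_like(x)
for t in range(3, len(x)):
    u = [x[t], x[t - 1], x[t - 2], y[t - 1], y[t - 2], y[t - 3]]
    y[t] = ffnn(u)

print(y[:10])
```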

Dynamic Neural Networks (DNN) (Continuous Time Domain)
The neural network directly represents the dynamic input-output relationship of the problem. The input-output signals and their time derivatives are related through a feedforward network.
[Figure: a feedforward neural network relating y(t) and its time derivatives to x(t) and its time derivatives]
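As one hedged illustration of this idea (not the course's formulation), the sketch below lets a small feedforward network define the right-hand side of an assumed second-order dynamic relationship, y''(t) = f_NN(y'(t), y(t), x(t)), and integrates it with SciPy; the network, its random weights, the order of the dynamics, and the excitation x(t) are all assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(4)
W1 = rng.normal(scale=0.2, size=(3, 3 + 1))   # hidden weights (bias in column 0)
W2 = rng.normal(scale=0.2, size=(1, 3 + 1))   # output weights

def f_nn(u):
    """Feedforward net: 3 inputs -> 3 tanh hidden neurons -> 1 linear output."""
    z = np.tanh(W1 @ np.concatenate(([1.0], u)))
    return (W2 @ np.concatenate(([1.0], z))).item()

def x(t):
    return np.sin(t)                           # assumed excitation

def rhs(t, state):
    y, ydot = state
    # Assumed dynamic form: y'' = f_NN(y', y, x(t))
    return [ydot, f_nn(np.array([ydot, y, x(t)]))]

sol = solve_ivp(rhs, (0.0, 10.0), [0.0, 0.0], max_step=0.05)
print(sol.y[0, -5:])                           # last few samples of y(t)
```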

Self-Organizing Maps (SOM) (Kohonen, 1984)
Clustering Problem: Given training data x_p, p = 1, 2, ..., P, find cluster centers c_k, k = 1, 2, ..., N.
Basic Clustering Algorithm:
  For each cluster c_k, initialize an index set R_k = { } (empty).
  For p = 1, 2, ..., P, find the cluster center c_i that is closest to x_p, i.e., find c_i such that ||c_i - x_p|| < ||c_j - x_p|| for all j ≠ i, and let R_i = R_i ∪ {p}.
  Then each center is adjusted:  c_i = (1 / size(R_i)) Σ_{p ∈ R_i} x_p
  The process continues until the centers c_i do not move any more.
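This basic clustering procedure is essentially the classical k-means iteration; a minimal sketch on synthetic data follows (the data, the number of clusters N, and the initialization are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic training data x_p drawn around three true centers.
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(50, 2))
               for c in ([0, 0], [1, 1], [0, 1])])

N = 3
centers = X[rng.choice(len(X), size=N, replace=False)]   # initialize c_k
for _ in range(100):
    # Assign each x_p to the closest center c_i.
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = np.argmin(dist, axis=1)
    # Adjust each center to the mean of the samples assigned to it.
    new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in range(N)])
    if np.allclose(new_centers, centers):    # stop when centers no longer move
        break
    centers = new_centers

print(centers)
```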

Self-Organizing Maps (SOM)
A SOM is a one- or two-dimensional array of neurons where neighboring neurons in the map correspond to neighboring cluster centers in the input data space.
Principle of Topographic Map Formation (Kohonen, 1990): the spatial location of an output neuron in the topographic map corresponds to a particular domain or feature of the input data.
[Figure: a two-dimensional array of neurons connected to the input]

Training of SOM:
For each training sample x_p, p = 1, 2, ..., P, find the nearest cluster center c_ij such that
  ||c_ij - x_p|| < ||c_kl - x_p||  for all k, l
Then update c_ij and the neighboring centers:
  c_kl = c_kl + α(t) (x_p - c_kl),  for |k - i| < N_c and |l - j| < N_c
where
  N_c -- size of the neighborhood
  α(t) -- a positive value that decays as training proceeds
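A compact sketch of this update rule on a small 2-D map (the grid size, neighborhood size N_c, learning-rate schedule α(t), and data are all assumptions; a practical SOM would usually also shrink N_c as training proceeds):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(0.0, 1.0, size=(500, 2))        # training samples x_p
rows, cols, Nc = 6, 6, 2                        # map size and neighborhood size
C = rng.uniform(0.0, 1.0, size=(rows, cols, 2)) # cluster centers c_ij

T = 20                                          # training epochs
for t in range(T):
    alpha = 0.5 * (1.0 - t / T)                 # decaying positive learning rate
    for x_p in X:
        # Find the nearest center c_ij.
        d = np.linalg.norm(C - x_p, axis=2)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        # Update c_ij and its neighbors: c_kl += alpha * (x_p - c_kl)
        for k in range(max(0, i - Nc + 1), min(rows, i + Nc)):
            for l in range(max(0, j - Nc + 1), min(cols, j + Nc)):
                C[k, l] += alpha * (x_p - C[k, l])

print(C[0, 0], C[rows - 1, cols - 1])
```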

Filter Clustering Example (Burrascano et al.)
[Figure: the input x is routed by a SOM-based control block either to a general MLP or to one of four specific MLPs (one per cluster group), each producing the output y]

Other Advanced Structures:
Knowledge-Based Neural Networks -- embedding application-specific knowledge into the network:
  Empirical Formula or Equivalent Circuit + Pure Neural Networks