
In the name of God
Lecture 4: Perceptron and ADALINE
Dr. Majid Ghoshuni

Introduction: Rosenblatt's LMS algorithm for the Perceptron (1958) is built around a linear neuron (a neuron with a linear activation function). However, the Perceptron is built around a nonlinear neuron, namely the McCulloch-Pitts model of a neuron. This neuron has a hard-limiting activation function performing the signum function. Recently the term multilayer Perceptron has often been used as a synonym for the term multilayer feedforward neural network. In this section we will be referring to the former meaning.

Perceptron
Goal: classify the applied input into one of two classes.
Procedure: if the output of the hard limiter is +1, assign the input to class C1; if it is -1, assign it to class C2. The input of the hard limiter is the weighted sum of the inputs. The effect of the bias b is merely to shift the decision boundary away from the origin. The synaptic weights are adapted on an iteration-by-iteration basis.

Perceptron: the decision regions are separated by a hyperplane. A point (x1, x2) above the boundary line is assigned to class C1; a point below the boundary line is assigned to class C2.

Perceptron Learning Theorem
Linearly separable: if two classes are linearly separable, there exists a decision surface consisting of a hyperplane. If so, there exists a weight vector w such that
w^T x > 0 for every input vector x belonging to class C1,
w^T x <= 0 for every input vector x belonging to class C2.
Only for linearly separable classes does the perceptron work well.

Perceptron Learning Theorem (continued): using the modified signal-flow graph, the bias b(n) is treated as a synaptic weight driven by a fixed input +1, so that w0(n) = b(n), and v(n) is the linear combiner output.

Perceptron Learning Theorem (continued)
Weight adjustment: w(n+1) = w(n) if x(n) is correctly classified; otherwise the weight vector is updated, and the learning-rate parameter η(n) controls the adjustment applied to the weight vector.

Summary of Learning
1. Initialization: set w(0) = 0.
2. Activation: at time step n, activate the perceptron by applying the continuous-valued input vector x(n) and the desired response d(n).
3. Computation of actual response: y(n) = sgn[w^T(n) x(n)].
4. Adaptation of the weight vector: w(n+1) = w(n) + η [d(n) - y(n)] x(n).
5. Continuation: increment time step n and go back to step 2.
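The five steps above can be collected into a short program. Below is a minimal MATLAB sketch of the perceptron training loop; the data matrix X, the targets d, the learning rate, and the number of epochs are assumed example values, and sign() plays the role of the signum hard limiter.

% Minimal perceptron training loop (sketch; X, d, eta are assumed example values)
X   = [0 0 1 1; 0 1 0 1];         % p-by-N matrix of input vectors (columns)
d   = [-1 -1 -1 1];               % desired responses (+1/-1); here the AND problem
eta = 0.1;                        % learning-rate parameter
w   = zeros(size(X,1)+1, 1);      % step 1: initialization, bias stored as w(1)
for epoch = 1:50
    for n = 1:size(X,2)
        x = [1; X(:,n)];          % step 2: activation (fixed input +1 drives the bias)
        y = sign(w'*x);           % step 3: actual response y(n) = sgn(w'x)
        if y == 0, y = 1; end     % treat sgn(0) as +1 so the output is always +/-1
        w = w + eta*(d(n) - y)*x; % step 4: adaptation of the weight vector
    end
end                               % step 5: continuation until the patterns are classified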

The network is capable of solving linearly separable problems.

Learning rule: an algorithm to update the weights so that finally the input patterns lie on both sides of the line decided by the perceptron. Let t be the time index; at t = 0 we have the initial weight vector w(0).

Learning rule (continued): applying the same update, at t = 1 we have w(1), and at t = 2 we have w(2).

Learning rule (continued): at t = 3 we have w(3).

Implementation of the logical NOT, AND, and OR gates.

Implementation of logical gates.

Finding Weights by the MSE Method
Write an equation for each training datum. The output for the first class is +1 and for the second class is -1 (or 0). Apply the MSE method to solve the resulting system of equations.
Example: implementation of the AND gate. Each input pattern (x1, x2) together with the bias gives one row of the data matrix X, and the system X W = d is solved in the least-squares sense.
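As a concrete sketch of this MSE procedure for the AND gate (the 0/1 target coding below is an assumption about the original example), the four pattern equations are stacked into X W = d and solved with the pseudo-inverse; the least-squares weights come out as w1 = w2 = 0.5 and b = -0.25.

% MSE / least-squares weights for the AND gate (0/1 target coding assumed)
X = [0 0 1;                      % one row per training pattern: [x1 x2 bias]
     0 1 1;
     1 0 1;
     1 1 1];
d = [0; 0; 0; 1];                % desired outputs of the AND gate
W = pinv(X)*d;                   % W = (X'*X)\(X'*d) = [0.5; 0.5; -0.25]
y = X*W;                         % fitted linear outputs for the four patterns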

Summary: Perceptron vs. MSE procedures
Perceptron rule: the perceptron rule always finds a solution if the classes are linearly separable, but it does not converge if the classes are non-separable.
MSE criterion: the MSE solution has guaranteed convergence, but it may not find a separating hyperplane even if the classes are linearly separable. Notice that MSE tries to minimize the sum of the squares of the distances of the training data to the separating hyperplane.

Convergence of the Perceptron learning law
Rosenblatt proved that if the input patterns are linearly separable, then the Perceptron learning law converges, and the hyperplane separating the two classes of input patterns can be determined.
Fixed-increment convergence theorem: for linearly separable sets of vectors X1 and X2, the perceptron converges after some n0 iterations.

Limitation of the Perceptron: the XOR problem (Minsky: nonlinear separability).

Perceptron with a sigmoid activation function
For a single neuron with the step activation function: y = sgn(w^T x + b).
For a single neuron with the sigmoid activation function: y = 1 / (1 + exp(-(w^T x + b))).

Representation of the Perceptron in MATLAB
MATLAB Toolbox: net = newp(P, T, TF, LF)
Description of the function: perceptrons are used to solve simple (i.e. linearly separable) classification problems.
NET = NEWP(P, T, TF, LF) takes these inputs:
P: R-by-Q matrix of Q input vectors of R elements each.
T: S-by-Q matrix of Q target vectors of S elements each.
TF: transfer function, default = 'hardlim'.
LF: learning function, default = 'learnp'.
Returns a new perceptron.
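A short usage sketch of the toolbox call described above (older Neural Network Toolbox syntax; the exact calling conventions may differ between MATLAB releases, so treat this as illustrative):

% Perceptron for a linearly separable problem via the toolbox
P   = [0 0 1 1; 0 1 0 1];              % R-by-Q input vectors (R = 2, Q = 4)
T   = [0 0 0 1];                       % S-by-Q targets: the AND problem
net = newp(P, T, 'hardlim', 'learnp'); % hard-limit transfer, perceptron learning rule
net = train(net, P, T);                % adapt the weights iteration by iteration
Y   = sim(net, P);                     % classify the training patterns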

Classification example: linear separability. See the M-file.

Classification of data: nonlinear separability.

Classification of data: nonlinear separability (continued).

ADALINE: the Adaptive Linear Element
ADALINE is a Perceptron with a linear activation function. It was proposed by Widrow.

Applications of the Adaline
In general, the Adaline is used to perform:
1. Linear approximation of a small segment of a nonlinear hypersurface generated by a p-variable function y = f(x); in this case the bias is usually needed.
2. Linear filtering and prediction of data signals.
3. Pattern association, that is, generation of m-element output vectors associated with respective p-element input vectors.

Error concept
For a single neuron: ε = d - y. For multiple neurons: ε_i = d_i - y_i, i = 1, ..., m, where m is the number of output neurons. The total measure of the goodness of approximation, or the performance index, can be specified by the mean-squared error over the m neurons and the N training vectors:
J(W) = (1/(2N)) * sum_{j=1}^{N} sum_{i=1}^{m} ε_i(j)^2.

W: the weight matrix, of size m-by-(p+1); X: the N-by-(p+1) matrix of training input vectors (bias input included); D: the N-by-m matrix of desired outputs.
The MSE solution is the pseudo-inverse solution:
W^T = (X^T X)^(-1) X^T D.
The error equation is:
J(W) = (1/(2N)) E^T E, where E = D - X W^T.

For a single neuron (m = 1): J(W) = (1/(2N)) E^T E. Replacing the error E = d - XW in the equation:
J(W) = (1/(2N)) [d - XW]^T [d - XW] = (1/(2N)) [d^T d - 2 W^T X^T d + W^T X^T X W].
Setting the gradient to zero, grad J = (1/N) [X^T X W - X^T d] = 0, gives the MSE solution W = (X^T X)^(-1) X^T d.
Example 1: evaluation of J(W) for a small one-dimensional data set (numerical example).
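The sketch below reproduces the idea of the worked example with assumed data: the performance index J is evaluated over a range of weights and compared with the closed-form MSE minimizer.

% Performance index J(w) for a single linear neuron (assumed example data)
x    = (1:10)';                        % input samples (assumption)
d    = 0.9*x + 0.5 + 0.3*randn(10,1);  % noisy desired responses (assumed model)
X    = [x ones(10,1)];                 % data matrix with a bias column
Wopt = pinv(X)*d;                      % closed-form MSE solution (X'X)\(X'd)
w1   = linspace(0, 2, 50);             % grid of slope values
J    = zeros(size(w1));
for k = 1:numel(w1)
    e    = d - X*[w1(k); Wopt(2)];     % vary the slope, keep the optimal bias
    J(k) = (e'*e) / (2*numel(d));      % mean-squared-error performance index
end
plot(w1, J), xlabel('w_1'), ylabel('J')   % bowl-shaped curve, minimum near Wopt(1)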

The plot of the performance index J for example 1.
Example 2: the performance index in the general case.

Method of steepest descent
If the data matrix X is large, the order of calculation of the pseudo-inverse solution will be high. In order to avoid this problem, we can find the optimal weight vector, for which the mean-squared error J attains its minimum, by iterative modification of the weight vector for each training exemplar in the direction opposite to the gradient of the performance index J. An example is illustrated in Figure 4-5 for a single-weight situation.

Illustration of the steepest descent method.

Method of steepest descent (continued)
When the weight vector attains the optimal value, for which the gradient is zero (the minimum in Figure 4-5), the iterations are stopped. More precisely, the iterations are specified as
w(n+1) = w(n) + Δw(n),
where the weight adjustment Δw(n) is proportional to the gradient of the mean-squared error,
Δw(n) = -η grad J(w(n)),
and η is a learning gain.

The LMS (Widrow-Hoff) Learning Law
The Least-Mean-Square learning law replaces the gradient of the mean-squared error with its instantaneous, per-pattern estimate, so that the update can be written in the following form:
W(n+1) = W(n) + η ε(n) x^T(n), with ε_i(n) = d_i(n) - y_i(n), i = 1, ..., m.

The LMS (Widrow-Hoff) Learning Law for a single neuron
For a linear neuron: y = sum_i w_i x_i, J = (1/2)(d - y)^2, and dJ/dw_i = -(d - y) x_i, so the update is Δw_i = η (d - y) x_i.
For a nonlinear neuron: y = φ(v) with v = sum_i w_i x_i, dJ/dw_i = -(d - y) φ'(v) x_i, so the update is Δw_i = η (d - y) φ'(v) x_i.

Network training
Two types of network training:
Sequential mode (incremental, on-line, stochastic, or per-pattern): weights are updated after each pattern is presented.
Batch mode (off-line or per-epoch): weights are updated after all patterns have been presented.
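A per-pattern (sequential-mode) LMS sketch for a single linear neuron, following the update above; the training data, learning gain, and number of epochs are assumed values.

% LMS (Widrow-Hoff) training of a single linear neuron, sequential mode
X   = [1 2 3 4 5; 5 4 3 2 1];    % p-by-N training inputs (assumed values)
d   = [1 2 3 4 5];               % desired responses (assumed values)
eta = 0.01;                      % learning gain
w   = zeros(size(X,1)+1, 1);     % weight vector, bias stored as w(1)
for epoch = 1:100                % sequential mode: update after every pattern
    for n = 1:size(X,2)
        x = [1; X(:,n)];
        y = w'*x;                % linear neuron output
        e = d(n) - y;            % instantaneous error
        w = w + eta*e*x;         % w(n+1) = w(n) + eta*eps(n)*x(n)
    end
end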

Some general comments on the learning process
Computationally, the learning process goes through all training examples (an epoch) a number of times, until a stopping criterion is reached. The convergence process can be monitored with the plot of the mean-squared error function J(W(n)). The popular stopping criteria are:
the mean-squared error is sufficiently small;
the rate of change of the mean-squared error is sufficiently small.

The effect of the learning rate η.

Applications (1): MA (moving-average) modeling / filtering
y(n) = sum_{i=0}^{M} b_i x(n - i).
For M = 3 (as in the simulation below): y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2) + b_3 x(n-3), so the Adaline input vector is [x(n), x(n-1), x(n-2), x(n-3)] and the weights to be learned are [b_0, b_1, b_2, b_3].

Applications (2): AR (auto-regressive) modeling
y(n) = sum_{i=1}^{M} a_i y(n - i) + b x(n), with model order M.
For a second-order model (M = 2): y(n) = a_1 y(n-1) + a_2 y(n-2) + b x(n), so the Adaline input vector is [y(n-1), y(n-2), x(n)] and the weights to be learned are [a_1, a_2, b].

Applications (3): PID controller.

Simulation of MA modeling
Suppose the MA model has order M = 3 with coefficient vector b. The input x is Gaussian noise with mean = 0 and variance = 1, and y is calculated by the recursive equation above. Please see the M-file.
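A sketch of this MA-identification experiment; the true coefficients b, the learning gain, and the data length are assumptions, since only the structure of the experiment is given above. The Adaline input at time n is [x(n), x(n-1), x(n-2), x(n-3)], and LMS should drive the weights toward b.

% Simulation of MA modeling with an Adaline trained by LMS (assumed b, N, eta)
b   = [1 0.5 -0.3 0.2];           % true MA(3) coefficients (assumption)
N   = 200;                        % number of training samples (assumption)
x   = randn(N, 1);                % Gaussian input, mean 0, variance 1
y   = filter(b, 1, x);            % y(n) = b0*x(n)+b1*x(n-1)+b2*x(n-2)+b3*x(n-3)
w   = zeros(4, 1);                % Adaline weights (estimate of b)
eta = 0.01;                       % learning gain
for n = 4:N
    u = [x(n); x(n-1); x(n-2); x(n-3)];  % regressor of delayed input samples
    e = y(n) - w'*u;                     % prediction error
    w = w + eta*e*u;                     % LMS update; w should approach b'
end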

M-file of MA modeling.

MA modeling results: zero initial weights and a small learning rate η, with a fixed number of training data.

MA modeling results: zero initial weights and a larger learning rate η.
MA modeling results: random initial weights and a small learning rate η.

MA modeling results: random initial weights and a larger learning rate η (two runs with the same number of training data).

MATLAB Toolbox: net = newlin(PR, S, ID, LR)
Description of the function: linear layers are often used as adaptive filters for signal processing and prediction.
NEWLIN(PR, S, ID, LR) takes these arguments:
PR: R-by-Q matrix of Q representative input vectors.
S: number of elements in the output vector.
ID: input delay vector, default = [0].
LR: learning rate, default = 0.01.
Returns a new linear layer.
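A short usage sketch of the linear-layer call described above (older Neural Network Toolbox syntax; argument conventions may vary between MATLAB releases, so treat the call as illustrative):

% Adaptive linear filter built with the toolbox call above
p   = randn(1, 100);                    % 1-by-Q sequence of Gaussian input samples
t   = filter([1 0.5 -0.3], 1, p);       % target sequence from an assumed MA model
net = newlin(p, 1, [0 1 2], 0.01);      % one output, input delays 0..2, learning rate 0.01
[net, y, e] = adapt(net, num2cell(p), num2cell(t));  % incremental (LMS) adaptation over the sequence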