Linear Associator Linear Layer

Hebbian Learning (Topic 6)

Note: lecture notes by Michael Negnevitsky (University of Tasmania), Bob Keller (Harvey Mudd College, CA) and Martin Hagan (University of Colorado) are used.

Main idea: learning based on association between neurons.

The main property of a neural network is its ability to learn from its environment and to improve its performance through learning. So far we have considered supervised, or active, learning: learning with an external teacher, or a supervisor who presents a training set to the network. But another type of learning also exists: unsupervised learning. In contrast to supervised learning, unsupervised or self-organised learning does not require an external teacher. During the training session the neural network receives a number of different input patterns, discovers significant features in these patterns and learns how to classify input data into appropriate categories. Unsupervised learning tends to follow the neuro-biological organisation of the brain. Unsupervised learning algorithms aim to learn rapidly and can be used in real time.

Hebbian learning

In 1949 Donald Hebb proposed one of the key ideas in biological learning, commonly known as Hebb's Law. Hebb's Law states that if neuron i is near enough to excite neuron j and repeatedly participates in its activation, the synaptic connection between these two neurons is strengthened and neuron j becomes more sensitive to stimuli from neuron i.

Hebb's Postulate: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." (D. O. Hebb, 1949)

[Figure: two cells A and B, showing dendrites, cell body, axon, and the synapse from A to B.]

Linear Associator

A linear layer with input p (R x 1), weight matrix W (S x R) and output a (S x 1):

a = purelin(Wp) = Wp,    a_i = Σ_j w_ij p_j

Training set: {p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q}

Hebb Rule

w_ij^new = w_ij^old + α f_i(a_iq) g_j(p_jq)    (postsynaptic signal x presynaptic signal)

Simplified form:  w_ij^new = w_ij^old + α a_iq p_jq
Supervised form:  w_ij^new = w_ij^old + t_iq p_jq
Matrix form:      W^new = W^old + t_q p_q^T
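To make the supervised Hebb rule concrete, here is a minimal NumPy sketch (my own illustration, not from the lecture) of building the linear associator weights from two made-up orthonormal prototype patterns p1, p2 and their targets t1, t2:

```python
import numpy as np

# Hypothetical training pairs: two orthonormal prototype patterns and
# their desired outputs for a linear associator (not the lecture's data).
p1 = np.array([1.0, 0.0]); t1 = np.array([ 1.0, -1.0])
p2 = np.array([0.0, 1.0]); t2 = np.array([-1.0,  1.0])

# Supervised Hebb rule: W_new = W_old + t_q p_q^T, starting from zero weights.
W = np.zeros((2, 2))
for p, t in [(p1, t1), (p2, t2)]:
    W += np.outer(t, p)

# Linear associator output: a = purelin(W p) = W p.
print(W @ p1)   # recalls t1 exactly, because p1 and p2 are orthonormal
print(W @ p2)   # recalls t2
```

Because the prototypes here are orthonormal, each input is recalled perfectly; for merely linearly independent prototypes the recall would only be approximate.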

Matrix Form: Batch Operation

W = t_1 p_1^T + t_2 p_2^T + ... + t_Q p_Q^T = Σ_q t_q p_q^T

With P = [p_1 p_2 ... p_Q] and T = [t_1 t_2 ... t_Q], this becomes W = T P^T (zero initial weights).

Hebb's Law can be represented in the form of two rules:
1. If two neurons on either side of a connection are activated synchronously, then the weight of that connection is increased.
2. If two neurons on either side of a connection are activated asynchronously, then the weight of that connection is decreased. (This second rule was added later.)

Hebb's Law provides the basis for learning without a teacher. Learning here is a local phenomenon occurring without feedback from the environment.

[Figure: Hebbian learning in a neural network; input signals feed neurons i, which connect to output neurons j.]

Hebbian learning implies that weights can only increase. To resolve this problem, we might impose a limit on the growth of synaptic weights. It can be done by introducing a non-linear forgetting factor into Hebb's Law:

Δw_ij = α y_j x_i − ϕ y_j w_ij

where ϕ is the forgetting factor. The forgetting factor usually falls in the interval between 0 and 1, typically between 0.01 and 0.1, to allow only a little forgetting while limiting the weight growth.

Simple Associative Network

A single hard-limit neuron with input p, weight w and bias b = -0.5:

a = hardlim(wp + b) = hardlim(wp - 0.5)

p = 1 (stimulus) or 0 (no stimulus); a = 1 (response) or 0 (no response).

Banana Associator

Inputs: sight of banana p0 (unconditioned stimulus) and smell of banana p (conditioned stimulus), feeding a hard-limit neuron with fixed weight w0 = 1 and bias b = -0.5:

a = hardlim(w0 p0 + w p + b)

p0 = 1 if the banana's shape is detected, 0 if not; p = 1 if its smell is detected, 0 if not. The neuron's output answers the question: banana?
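Returning to the batch matrix form at the top of this section: the sketch below (using the same made-up pairs as before, not the lecture's data) checks that forming W = T P^T gives the same weight matrix as summing the outer products one pair at a time.

```python
import numpy as np

# Hypothetical prototypes and targets, stacked as the columns of P and T.
P = np.array([[1.0, 0.0],
              [0.0, 1.0]])      # P = [p_1 p_2]
T = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])    # T = [t_1 t_2]

# Batch Hebb rule with zero initial weights: W = T P^T ...
W_batch = T @ P.T

# ... equals the sum of outer products  Σ_q t_q p_q^T.
W_sum = sum(np.outer(T[:, q], P[:, q]) for q in range(P.shape[1]))

print(np.allclose(W_batch, W_sum))  # True
```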

Unsupervised Hebb Rule

w_ij(q) = w_ij(q-1) + α a_i(q) p_j(q)

Vector form: W(q) = W(q-1) + α a(q) p^T(q)

Training sequence: p(1), p(2), ..., p(Q)

Banana Recognition Example

Initial weights: w0 = 1, w(0) = 0. Training sequence: {p0(1) = 0, p(1) = 1}, {p0(2) = 1, p(2) = 1}, ..., with α = 1, so

w(q) = w(q-1) + a(q) p(q)

First iteration (sight fails):
a(1) = hardlim(w0 p0(1) + w(0) p(1) - 0.5) = hardlim(0 + 0 - 0.5) = 0  (no response)
w(1) = w(0) + a(1) p(1) = 0 + 0 = 0

Second iteration (sight works):
a(2) = hardlim(w0 p0(2) + w(1) p(2) - 0.5) = hardlim(1 + 0 - 0.5) = 1  (banana)
w(2) = w(1) + a(2) p(2) = 0 + 1 = 1

Third iteration (sight fails):
a(3) = hardlim(w0 p0(3) + w(2) p(3) - 0.5) = hardlim(0 + 1 - 0.5) = 1  (banana)
w(3) = w(2) + a(3) p(3) = 1 + 1 = 2

The banana will now be detected if either sensor works.

Problems with the Hebb Rule

Weights can become arbitrarily large; there is no mechanism for weights to decrease.

Hebb Rule with Decay

W(q) = W(q-1) + α a(q) p^T(q) - γ W(q-1)
W(q) = (1 - γ) W(q-1) + α a(q) p^T(q)

This keeps the weight matrix from growing without bound, which can be demonstrated by setting both a_i and p_j to 1:

w_max = (1 - γ) w_max + α a_i p_j
w_max = (1 - γ) w_max + α
w_max = α / γ

Example: Banana Associator with decay, α = 1, γ = 0.1.

First iteration (sight fails):
a(1) = hardlim(w0 p0(1) + w(0) p(1) - 0.5) = hardlim(0 + 0 - 0.5) = 0  (no response)
w(1) = w(0) + a(1) p(1) - 0.1 w(0) = 0 + 0 - 0.1(0) = 0

Second iteration (sight works):
a(2) = hardlim(w0 p0(2) + w(1) p(2) - 0.5) = hardlim(1 + 0 - 0.5) = 1  (banana)
w(2) = w(1) + a(2) p(2) - 0.1 w(1) = 0 + 1 - 0.1(0) = 1
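The banana associator iterations above are easy to reproduce in code. Here is a minimal sketch (variable names and structure are mine) that runs the same three steps with the plain unsupervised Hebb rule and with the decay variant:

```python
import numpy as np

def hardlim(x):
    # hardlim(x) = 1 if x >= 0, else 0
    return 1.0 if x >= 0 else 0.0

def run(gamma=0.0, alpha=1.0, w0=1.0, b=-0.5):
    """Banana associator: p0 = sight (fixed weight w0), p = smell (trained weight w)."""
    w = 0.0
    # Training sequence from the example: sight fails, works, fails; smell always present.
    for p0, p in [(0.0, 1.0), (1.0, 1.0), (0.0, 1.0)]:
        a = hardlim(w0 * p0 + w * p + b)
        w = w + alpha * a * p - gamma * w   # Hebb rule, with optional decay term
        print(f"a = {a:.0f}, w = {w}")
    return w

run(gamma=0.0)   # plain Hebb:      w -> 0, 1, 2
run(gamma=0.1)   # Hebb with decay: w -> 0, 1, 1.9
```

The printed weights match the hand computations: without decay the smell weight grows without bound as bananas keep appearing, while the decay term caps it at α/γ.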

Example (continued)

Third iteration (sight fails):
a(3) = hardlim(w0 p0(3) + w(2) p(3) - 0.5) = hardlim(0 + 1 - 0.5) = 1  (banana)
w(3) = w(2) + a(3) p(3) - 0.1 w(2) = 1 + 1 - 0.1(1) = 1.9

[Figure: weight w(q) over successive iterations for the plain Hebb rule (unbounded growth) and for Hebb with decay, which saturates at w_max = α / γ = 1 / 0.1 = 10.]

Problem of Hebb with Decay

Associations will decay away if stimuli are not occasionally presented. If a_i = 0, then

w_ij(q) = (1 - γ) w_ij(q-1)

and with γ = 0.1 this becomes w_ij(q) = 0.9 w_ij(q-1). Therefore the weight decays by 10% at each iteration where there is no stimulus.

Using Hebb's Law we can express the adjustment applied to the weight w_ij at iteration p in the following form:

Δw_ij(p) = F[y_j(p), x_i(p)]

As a special case, we can represent Hebb's Law as follows:

Δw_ij(p) = α y_j(p) x_i(p)

where α is the learning rate parameter. This equation is referred to as the activity product rule.

Hebbian learning algorithm

Step 1: Initialisation. Set initial synaptic weights and thresholds to small random values, say in an interval [0, 1].

Step 2: Activation. Compute the neuron output at iteration p:

y_j(p) = Σ_{i=1..n} x_i(p) w_ij(p) − θ_j

where n is the number of neuron inputs and θ_j is the threshold value of neuron j.

Step 3: Learning. Update the weights in the network:

w_ij(p+1) = w_ij(p) + Δw_ij(p)

where Δw_ij(p) is the weight correction at iteration p. The weight correction is determined by the generalised activity product rule:

Δw_ij(p) = ϕ y_j(p) [λ x_i(p) − w_ij(p)]

Step 4: Iteration. Increase iteration p by one and go back to Step 2.

Hebbian learning example

To illustrate Hebbian learning, consider a fully connected feedforward network with a single layer of five computation neurons. Each neuron is represented by a McCulloch and Pitts model with the sign activation function. The network is trained on a set of five binary input vectors X1, X2, X3, X4, X5 (their individual entries are given in the source only as a figure).
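A compact sketch of the four-step algorithm above for a single layer of sign neurons follows. The layer sizes, the values of ϕ and λ, the random initialisation and the test pattern are illustrative assumptions, not the lecture's values:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 5, 5
phi, lam = 0.1, 1.0          # forgetting factor and scaling factor (assumed values)

# Step 1: initialisation - small random weights and thresholds in [0, 1].
# W[j, i] is the weight from input i to output neuron j.
W = rng.uniform(0.0, 1.0, size=(n_out, n_in))
theta = rng.uniform(0.0, 1.0, size=n_out)

def hebb_step(W, theta, x):
    """One pass of Steps 2-3 for a single input pattern x."""
    # Step 2: activation, y_j = sign(sum_i x_i w_ij - theta_j).
    y = np.sign(W @ x - theta)
    # Step 3: learning, generalised activity product rule
    #   dw_ij = phi * y_j * (lam * x_i - w_ij)
    W = W + phi * y[:, None] * (lam * x[None, :] - W)
    return W, y

# Step 4: iteration - present the pattern repeatedly.
x = np.array([0.0, 1.0, 0.0, 0.0, 1.0])   # a made-up binary input pattern
for _ in range(10):
    W, y = hebb_step(W, theta, x)
print(y)
```

Because the correction pulls w_ij toward λ x_i whenever y_j fires, repeated presentation of a pattern strengthens exactly the connections between co-active input and output neurons, which is the behaviour the five-neuron example below illustrates.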

Initial and final states of the network

[Figure: the five-neuron network before and after training; inputs x1, ..., x5 in the input layer and outputs y1, ..., y5 in the output layer.]

Initial and final weight matrices

[Figure: the 5 x 5 initial and final weight matrices; after training, the weights between co-active input and output neurons approach 1 while the remaining weights stay near 0.]

A test input vector, or probe, X is then presented to the network, and the output is computed as Y = sign(W X − θ). [The numerical probe and the resulting output vector appear in the source only as a figure.]

Variations of Hebbian Learning

Basic rule:     W^new = W^old + t_q p_q^T
Learning rate:  W^new = W^old + α t_q p_q^T
Smoothing:      W^new = W^old + α t_q p_q^T − γ W^old = (1 − γ) W^old + α t_q p_q^T
Delta rule:     W^new = W^old + α (t_q − a_q) p_q^T
Unsupervised:   W^new = W^old + α a_q p_q^T
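To make the differences between these variants concrete, here is a small sketch (my own illustration, not part of the lecture) expressing each update rule as a function of the old weights and one training pair; the values of α and γ and the example data are assumptions:

```python
import numpy as np

alpha, gamma = 0.5, 0.1     # assumed learning rate and smoothing/decay factor

def basic(W, t, p):
    return W + np.outer(t, p)

def learning_rate(W, t, p):
    return W + alpha * np.outer(t, p)

def smoothing(W, t, p):
    return (1 - gamma) * W + alpha * np.outer(t, p)

def delta_rule(W, t, p):
    a = W @ p                               # linear associator output
    return W + alpha * np.outer(t - a, p)   # moves W toward reproducing t

def unsupervised(W, a, p):
    return W + alpha * np.outer(a, p)       # uses the actual output a, no target

# Example: one update with made-up data.
W = np.zeros((2, 3))
p = np.array([1.0, 0.0, 1.0])
t = np.array([1.0, -1.0])
print(delta_rule(W, t, p))
```

Note the structural difference: the basic, learning-rate and smoothing forms use the target t directly, the delta rule uses the error (t − a), and the unsupervised form uses only the network's own output a.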