From Neural Networks To Reservoir Computing...


From Neural Networks To Reservoir Computing... An Introduction Naveen Kuppuswamy, PhD Candidate, A.I Lab, University of Zurich 1

Disclaimer : I am only an egg 2

Part I From Classical Computation to ANNs 3

What's happening here? 4

What is happening here? Computation? 5

This might be feasible then... But let's back up a bit... 6

What is Computation? 7

What is Computation? A mapping of each element of an input set X to an output which is an element of the output set Y. 8

How are these things defined? Input is: sequences of 0 and 1: x ∈ {0,1}* Output is: y ∈ {0,1} Implementation? 9
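For concreteness, a minimal sketch of one such mapping, written in Python (the choice of function, parity of a bit string, is an illustrative assumption, not from the slides):

# A tiny illustrative mapping f: {0,1}* -> {0,1}: the parity of a bit string.
def parity(bits: str) -> int:
    return bits.count("1") % 2

print(parity("01101"))   # -> 1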

How do you implement it? 10

Practical Implementation Distinction between the following elements: Program (in memory), Data (input device), Output (output device). Computation in: processor (sequential) 11

Where is the tape here? 12

Where is the tape here? What makes it unique then? 13

The Brain Massively parallel computation Analog information processing Self-adaptation Implementation in wetware 14

The Brain Massively parallel computation Analog information processing Self-adaptation Implementation in wetware What's in it then? 15

Computational Units : Neurons many different types of neurons 16

Computational Units : Neurons many different types of neurons How to explain the big picture? 17

Abstractions of Neural Function McCulloch and Pitts artificial neuron, 1943 Hebb's learning rule, 1949 Rosenblatt's perceptron and the perceptron learning rule, 1958 18

From biological neurons to abstract models 19

From biological neurons to abstract models some important abstractions: abstract artificial neurons: simple but very powerful Learning Rule 20

Feedforward Neural Networks (Figure: input layer I1, I2 -> middle (hidden) layer -> output layer O1, O2) A more general case can be obtained using multiple layers; nonlinearities can be introduced via thresholding. E.g. Multi-Layer Perceptrons + Backpropagation, Radial Basis Function Networks 21
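As a minimal sketch of the feedforward architecture on this slide (layer sizes, weights and the tanh nonlinearity are illustrative assumptions, not from the slides):

import numpy as np

# A single-hidden-layer feedforward network: inputs I1, I2 -> hidden layer -> outputs O1, O2.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 2, 5, 2
W_hidden = rng.normal(size=(n_hidden, n_in))   # input -> hidden weights
W_out = rng.normal(size=(n_out, n_hidden))     # hidden -> output weights

def forward(x):
    """Propagate one input vector through the network (information moves in one direction)."""
    h = np.tanh(W_hidden @ x)                  # nonlinearity ("thresholding") in the hidden layer
    return W_out @ h                           # linear output layer

print(forward(np.array([0.5, -1.0])))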

Some Applications (Figure: input layer I1, I2 -> middle (hidden) layer -> output layer O1, O2) Action Movie Storylines 22

Some Applications (Figure: input layer I1, I2 -> middle (hidden) layer -> output layer O1, O2) Classification, Pattern Recognition, Forecasting and Series Prediction, Control - Industrial/Robots 23

Classification Examples Handwriting (I: Pixels, O: Digit), Target Identification (I: Pixels, O: Frame Coords.), Face Recognition (I: Pixels, O: Face Y/N), Lipreading (I: Pixels, O: Letters) Source: http://tralvex.com/pub/nap/ 24

Other Examples Obstacle Climbing, Function Approximation, Soccer (Strategy), Trajectory Prediction Source: http://tralvex.com/pub/nap/ 25

Feedforward Neural Networks : Features (Figure: input layer I1, I2 -> middle (hidden) layer -> output layer O1, O2) Information moves in one direction. Nonlinear, static networks. 26

Feedforward Neural Networks : Features (Figure: input layer I1, I2 -> middle (hidden) layer -> output layer O1, O2) Information moves in one direction. Nonlinear, static networks. Sweet! But what about biological plausibility? 27

How do biological Neurons differ? Spiking Neurons Neuro-modulators 28

Spike Timing Dependent Plasticity The input and output of biological neurons are voltage spikes. Traditionally, the input and output numbers of artificial neurons are interpreted as spike rates (number of spikes per time interval). Learning rule: a connection becomes stronger if the input spike occurs before the output spike, and weaker if the input spike occurs after the output spike. 29
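A minimal sketch of the STDP rule just described; the exponential time window and the constants A_plus, A_minus and tau are illustrative assumptions (the slide only states the sign of the change):

import numpy as np

A_plus, A_minus, tau = 0.01, 0.012, 20.0   # illustrative constants, times in ms

def stdp_delta_w(t_pre, t_post):
    """Weight change for one pre/post spike pair."""
    dt = t_post - t_pre
    if dt > 0:    # input (pre) spike before output (post) spike -> connection strengthens
        return A_plus * np.exp(-dt / tau)
    return -A_minus * np.exp(dt / tau)     # input spike after output spike -> connection weakens

print(stdp_delta_w(10.0, 15.0), stdp_delta_w(15.0, 10.0))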

Why is 'Timing' important in the game? 30

Temporal aspects in many problems 31

Dynamic problem example Song / Speech Recognition : Works for songs and short (fixed) duration speech! 32

Dynamic problem example Song / Speech Recognition : Works for songs and short (fixed) duration speech! How can this be implemented on a Neural Network? 33

FF Networks with time window inputs Practically possible, but is it general? 34

FF Networks with time window inputs 35
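A minimal sketch of the time-window trick: slide a fixed window over the signal so that a static feedforward network can be trained on temporal data (the window length and the toy signal are illustrative assumptions):

import numpy as np

def make_windows(signal, window=10):
    """Stack overlapping windows; each row becomes one static input vector."""
    return np.array([signal[i:i + window] for i in range(len(signal) - window)])

u = np.sin(np.linspace(0, 20, 200))   # toy temporal signal
X = make_windows(u, window=10)        # shape (190, 10): fixed-size inputs for an FF network
y = u[10:]                            # e.g. one-step-ahead prediction targets
print(X.shape, y.shape)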


Part II From RNNs to Reservoir Computing 37

Offline vs. Online Computation Static Function Dynamical System 38

From Feedforward to Recurrent (Figure: input layer -> middle (hidden) layer -> output layer) 39

Recurrent Neural Networks RNN: O = f_t(I), i.e. O(t+1) = g(I(t), O(t)) (Figure: input layer -> middle (hidden) layer -> output layer, with a feedback loop) 40

Recurrent Neural Network (RNN) : a comparison FFN: O = f(I) (Figure: feedforward layers) RNN: O = f_t(I), i.e. O(t+1) = g(I(t), O(t)) (Figure: the same layers with a feedback loop) An RNN is a dynamical system! 41
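A minimal sketch of the recurrence written above, O(t+1) = g(I(t), O(t)): the next state depends on the current input and the previous state, which is what makes the RNN a dynamical system (sizes and weights are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden = 1, 8
W_in = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))   # the feedback loop

def step(x_prev, u_t):
    """One update of the hidden state: g = tanh of a weighted sum of input and previous state."""
    return np.tanh(W_in @ u_t + W_rec @ x_prev)

x = np.zeros(n_hidden)
for u_t in ([0.1], [0.5], [-0.2]):
    x = step(x, np.array(u_t))
print(x)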

Recurrent Neural Networks : Topology? RNN: O = f_t(I), i.e. O(t+1) = g(I(t), O(t)) 42

RNN : Features Universal approximator of dynamical systems. Nearly all biological neural modules exhibit recurrent pathways. Different topologies: feedback, symmetric or fully connected. But what about learning in RNNs? 43

RNN : Learning Methods Gradient Descent Methods: Real-Time Recurrent Learning (RTRL), Back-Propagation Through Time (BPTT), Atiya-Parlos Recurrent Learning (APRL). Global Optimization Methods: Genetic Algorithms, Simulated Annealing 44

Back-Propagation Through Time Prepare ordered pairs of training data: <a0,y0>, <a1,y1>...<an-1,yn-1> Unfold the neural network during the training phase... 45

Back-Propagation Through Time? Prepare ordered pairs of training data: <a0,y0>, <a1,y1>...<an-1,yn-1> Unfold the neural network during the training phase... Apply regular backpropagation! 46
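A minimal sketch of backpropagation through time for a one-unit RNN: the network is unfolded over the training sequence <a0,y0>...<an-1,yn-1> and ordinary backpropagation is applied to the unfolded graph (the weights, data and squared-error loss are illustrative assumptions):

import numpy as np

w_in, w_rec, w_out = 0.5, 0.9, 1.0
inputs  = [0.1, 0.3, -0.2, 0.4]           # a0 ... a3
targets = [0.2, 0.1,  0.0, 0.3]           # y0 ... y3

# forward pass: store the unfolded hidden states
xs = [0.0]
for a in inputs:
    xs.append(np.tanh(w_in * a + w_rec * xs[-1]))

# backward pass: regular backpropagation through the unfolded copies
g_in = g_rec = g_out = 0.0
g_x = 0.0                                 # gradient flowing back through the state
for t in reversed(range(len(inputs))):
    dy = w_out * xs[t + 1] - targets[t]   # d(0.5 * squared error)/dy at step t
    g_out += dy * xs[t + 1]
    g_x += dy * w_out
    dpre = g_x * (1.0 - xs[t + 1] ** 2)   # back through the tanh
    g_in  += dpre * inputs[t]
    g_rec += dpre * xs[t]
    g_x = dpre * w_rec                    # hand the gradient to the previous time step

print(g_in, g_rec, g_out)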

RNN : Learning Problems The gradual change of network parameters might reach points where the gradient information degenerates and becomes ill-defined; convergence cannot be guaranteed. A single parameter update is expensive and many update cycles may be necessary, so training times are long; RNN training is feasible only for relatively small networks (on the order of tens of units). Intrinsically hard to learn dependencies requiring long-range memory: gradient information dissolves exponentially over time. Most methods require a lot of experience: almost an art! 47

A curious Insight into RNN Learning If a random recurrent neural network (RNN) possesses certain algebraic properties, training only a linear readout from it is often sufficient to achieve excellent performance in practical applications Jaeger, 2001 48

The Reservoir Approach Use a large, random and fixed RNN, called a reservoir in this context, inducing in each unit of the RNN a nonlinear transform of the input; output signals are read out from the excited RNN by some readout mechanism, typically a simple linear combination of the reservoir signals; outputs can be trained in a supervised way, typically by linear regression of the teacher output on the tapped reservoir signals. Jaeger, 2007 (Figure: Input -> Reservoir -> Linear readout -> Output) 49

The Reservoir Approach Use a large, random and fixed RNN, called a reservoir in this context, inducing in each unit of the RNN a nonlinear transform of the input; output signals are read out from the excited RNN by some readout mechanism, typically a simple linear combination of the reservoir signals; outputs can be trained in a supervised way, typically by linear regression of the teacher output on the tapped reservoir signals. Jaeger, 2007 Key idea: separation between a reservoir and a readout function 50

Contrast with Traditional RNN training (Figure: Traditional RNN vs. Reservoir Computing) Lukoševičius and Jaeger, 2010 51

Main Flavours Echo State Networks BackPropagation-DeCorrelation (BPDC) Liquid State Machines Temporal Recurrent Networks 52

Main Flavours Engineering (Application) Oriented Echo State Networks BackPropagation-DeCorrelation Temporal Recurrent Networks Liquid State Machines Biological Modeling 53

Echo State Networks (ESN) Observation : if a random RNN possesses certain algebraic properties, training only a linear readout from it is often sufficient. Jaeger, 2001 The untrained RNN part of an ESN is called a dynamical reservoir, and the resulting states x(n) are termed echoes of its input history. Proposed for machine learning and nonlinear signal processing 54

ESN : Characteristics Uses a large number of internal neurons. Weight matrices are randomly initialised. Neurons typically use a nonlinearity of a standard form, and a weighted linear readout is used (the usual equations are noted below). Outputs are trained as a linear regression. 55
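The formula images on this slide did not survive transcription. For reference, the standard ESN equations (following Jaeger's formulation; the exact variant on the original slide is not recoverable) are usually written as

x(n+1) = \tanh( W x(n) + W^{in} u(n+1) )        % reservoir state update
y(n)   = W^{out} [ x(n) ; u(n) ]                % weighted linear readout

with W and W^{in} random and fixed, and only W^{out} trained by linear regression.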

ESN : Parameters Connection weights can be chosen randomly, *but* the reservoir has to have a fading memory; the connection weights should ensure that the network functions at the edge of chaos. 56

ESN : Parameters Time scale: sampling and leak rate. Size of the reservoir. 57
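Putting the previous slides together, a minimal echo state network sketch in Python (the reservoir size, leak rate, spectral radius and toy task are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(42)
n_res, leak, rho = 100, 0.3, 0.9                 # reservoir size, leak rate, spectral radius

W_in = rng.uniform(-0.5, 0.5, size=n_res)        # random, fixed input weights
W = rng.normal(size=(n_res, n_res))
W *= rho / max(abs(np.linalg.eigvals(W)))        # scale spectral radius below 1: fading memory

def run_reservoir(u_seq):
    """Collect reservoir states for a 1-D input sequence (leaky tanh units)."""
    x, states = np.zeros(n_res), []
    for u in u_seq:
        x = (1 - leak) * x + leak * np.tanh(W_in * u + W @ x)
        states.append(x.copy())
    return np.array(states)

# toy task: predict u(t+1) from u(t); only the linear readout is trained (ridge regression)
u = np.sin(np.arange(500) * 0.2)
X, y = run_reservoir(u[:-1]), u[1:]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
print("train MSE:", np.mean((X @ W_out - y) ** 2))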

ESN : Best Learning Results 3-fold cross-validation 58

Backpropagation-Decorrelation (BPDC) An interesting insight into the Atiya-Parlos Recurrent Learning (APRL) technique: a functional decomposition of the trained networks into a fast adapting readout layer and a slowly changing dynamic reservoir. Steil, 2004 APRL basically differentiates the error function with respect to the states instead of the weights as in BP ("virtual teacher forcing"): the error is computed with respect to the state, and the weight update drives the network towards... The name refers to the fact that x-δx is never fed back... hence a virtual force 59

DeCorrelation of States and input? Wait. What? 60

The BPDC principle explained.. Steil, 2004 To train the network, treat the inner neurons as a dynamic reservoir providing dynamical memory. The information processing capacity is maximal if the states are maximally decorrelated with the input. A compromise between decorrelation and conventional error backpropagation allows the learning rule to be derived. 61

The BPDC principle explained.. Steil, 2004 To train the network, treat the inner neurons as a dynamic reservoir providing dynamical memory. The information processing capacity is maximal if the states are maximally decorrelated with the input. A compromise between decorrelation and conventional error backpropagation allows the learning rule to be derived. It's only O(N)! Nearly warp speed! 62

Biological Models : Liquid State Machines Developed from a computational neuroscience perspective aiming at explaining principal computational properties of neural microcircuits. The reservoir is often referred to as the liquid, following an intuitive metaphor of the excited states as ripples on the surface of a pool of water. 63

Liquid State Machines Sophisticated, biologically realistic models of spiking integrate-and-fire neurons and dynamic synaptic connection models in the reservoir. Neuron connectivity often follows topological and metric constraints. Maass et al, 2002 Bio-motivated inputs: spike trains. Readouts: originally used MLFFNN (of either spiking or sigmoid neurons) 64

Liquid State Machines Sophisticated, biologically realistic models of spiking integrate-and-fire neurons and dynamic synaptic connection models in the reservoir. Neuron connectivity often follows topological and metric constraints. Maass et al, 2002 Bio-motivated inputs: spike trains. Readouts: originally used MLFFNN (of either spiking or sigmoid neurons) Aha! Eureka!? 65

Liquid State Machines Sophisticated, biologically realistic models of spiking integrate-and-fire neurons and dynamic synaptic connection models in the reservoir. Neuron connectivity often follows topological and metric constraints. Maass et al, 2002 Bio-motivated inputs: spike trains. Readouts: originally used MLFFNN (of either spiking or sigmoid neurons) Aha! Eureka!? Hmmm... realistic, but hard to tune and train. Useful nonetheless 66

Temporal Recurrent Networks Based on research into corticostriatal circuits in the human brain by Dominey. Focus on empirical cognitive neuroscience and functional neuroanatomy. Dominey et al, 1995, 2003, 2006 "...there is no learning in the recurrent connections, only between the State units and the Output units. Adaptation is based on a simple associative learning mechanism..." "...It is worth noting that the simulated recurrent prefrontal network relies on fixed randomized recurrent connections,..." 67

Temporal Recurrent Networks Based on research into corticostriatal circuits in the human brain by Dominey. Focus on empirical cognitive neuroscience and functional neuroanatomy. Dominey et al, 1995, 2003, 2006 "...there is no learning in the recurrent connections, only between the State units and the Output units. Adaptation is based on a simple associative learning mechanism..." We finally might see the tape ;) "...It is worth noting that the simulated recurrent prefrontal network relies on fixed randomized recurrent connections,..." 68


Part III Applications, Case Studies 70

Some Applications Nonlinear Time series prediction System modeling Financial data modelling Signal Generation and prediction Classification Speech / audio Epileptic Seizure detection Robotics and Control 71

Nonlinear System Modeling System Modeling, AMARSi Workshop 2011 http://reslab.elis.ugent.be/seminars/amarsiworkshop-reservoir-computing 72

Nonlinear time series prediction Nonlinear Auto-Regressive Moving Average (NARMA) Impressed? 73

Nonlinear time series prediction Nonlinear Auto-Regressive Moving Average (NARMA) Spock is impressed. 74
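The benchmark behind slides like this is typically a NARMA system; a minimal sketch of the commonly cited 10th-order NARMA task follows (the exact system used on the original slide is not recoverable from the transcript):

import numpy as np

def narma10(T, seed=0):
    """Generate input u and 10th-order NARMA target y of length T."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, size=T)
    y = np.zeros(T)
    for t in range(9, T - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * np.sum(y[t - 9:t + 1])
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return u, y

u, y = narma10(1000)
print(u.shape, y.shape)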

Financial Series Prediction 75

Financial Series Prediction Spock is not impressed. Wyffels and Schrauwen, 2010 76

Classification Speech Classification, Genre Classification of Songs : AMARSi Workshop 2011, http://reslab.elis.ugent.be/seminars/amarsiworkshop-reservoir-computing (Figure panels: Speech Classification; Music Genre Classification) 77

Epileptic Seizure Detection Buteneers et al, 2010 78

Robotics and Control ReservoirDog : Testing pattern-generation-based locomotion on a quadruped Wyffels et al, 2010 79

Physical body as a reservoir ESN, LSM: artificial neurons form a high-dimensional dynamical system. A physical body is also a high-dimensional dynamical system, so a physical body might be able to work as a reservoir? (Figure: an input stream u(t) drives a recurrently connected nonlinear mass-spring system with mass points and a fixed point; a linear, static readout combines the states l_1(t)...l_N(t) with weights w_1...w_N to produce y(t).) Each mass-spring element follows ẋ = v and a nonlinear law of the form F = k1 x + k3 x^3 + d1 v + d3 v^3 + F_x, where x is the difference between the actual length and the resting one. 80
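A minimal sketch of the idea on this slide: simulate a chain of nonlinear mass-spring elements driven by the input and train only a linear, static readout on the resulting states (all constants, the coupling scheme and the toy task are illustrative assumptions):

import numpy as np

N, dt = 10, 0.01
k1, k3, d1 = 5.0, 1.0, 0.5                      # linear/cubic stiffness and damping

def simulate(u_seq):
    """Drive the first mass with the input and record all displacements over time."""
    pos, vel, states = np.zeros(N), np.zeros(N), []
    for u in u_seq:
        force = -k1 * pos - k3 * pos**3 - d1 * vel   # nonlinear spring/damper law
        force[0] += u                                # input stream u(t) acts on the body
        force[1:] += 2.0 * (pos[:-1] - pos[1:])      # simple coupling along the chain
        vel += dt * force                            # unit masses, explicit Euler step
        pos += dt * vel
        states.append(pos.copy())
    return np.array(states)

u = np.sin(np.arange(2000) * 0.05)
X, y = simulate(u), np.roll(u, -1)              # body states and a one-step-ahead target
w = np.linalg.lstsq(X, y, rcond=None)[0]        # linear, static readout
print("readout MSE:", np.mean((X @ w - y) ** 2))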

Theoretical Foundation for Morphological Computation? (Figure: the input u(t) drives the physical body, and readouts 1-3 produce y1(t), y2(t), y3(t); for comparison, linear regression on the raw input data with no physical body gives readouts of the form w_i u(t) + w_{b,i}.) The physical body contributes much to the computation! 81

Influence of biologically-inspired constraints (Figure: the input u(t) drives a physical body with constraints - a rigid part (rod), a ball joint, and an arrangement of springs - forming a musculoskeletal model; readouts 1 and 2 produce y1(t) and y2(t).) The biologically-inspired model can have computational capability. 82

Future application Quadrupedal robot with a multi-joint spine: the spine movement itself works as a controller. (Figure: the input u(t) and readouts y1(t), y2(t) map spine movements A, B, C to walking, bounding and trotting gaits.) For further questions, please contact sumioka@ifi.uzh.ch 83

Literal Liquid State Machines Pattern Recognition in a Bucket Fernando and Sojakka, 2003 Objective: robust spatiotemporal pattern recognition in a noisy environment. 20+20 samples ("zero" and "one"), 1.5-2 seconds in length. Short-Time Fourier transform on the active frequency range (1-3000 Hz) to create an 8x8 matrix of inputs from each sample (8 motors, 8 time slices). Each sample drives the motors for 4 seconds, one after the other. Vision processing: edge detection to produce 700 outputs. 50 perceptrons in parallel trained using the p-delta rule. 84
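A minimal sketch of the preprocessing step described above: reduce one short audio sample to an 8x8 matrix (8 time slices x 8 frequency bands) using a short-time Fourier transform; the sample rate and band splitting are illustrative assumptions:

import numpy as np

def sample_to_8x8(audio, sr=8000, f_lo=1.0, f_hi=3000.0):
    """8 time slices x 8 frequency bands from the active range f_lo..f_hi Hz."""
    rows = []
    for s in np.array_split(audio, 8):                # 8 time slices
        spec = np.abs(np.fft.rfft(s))
        freqs = np.fft.rfftfreq(len(s), d=1.0 / sr)
        mask = (freqs >= f_lo) & (freqs <= f_hi)      # keep the active frequency range
        bands = np.array_split(spec[mask], 8)         # 8 frequency bands
        rows.append([b.mean() for b in bands])
    return np.array(rows)                             # shape (8, 8), one row per time slice

audio = np.random.default_rng(0).normal(size=16000)   # a fake ~2-second sample
print(sample_to_8x8(audio).shape)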

States of the Liquid Brain (Figure panels: Zero; One) 85

Key References Lukoševičius, M., and H. Jaeger, "Reservoir computing approaches to recurrent neural network training", Computer Science Review, vol. 3, no. 3, pp. 127-149, August 2009. Jaeger, H., "Echo state network", Scholarpedia, vol. 2, no. 9, p. 2330, 2007. Jaeger, H., and H. Haas, "Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless telecommunication", Science, vol. 304, no. 5667, pp. 78-80, April 2, 2004. Maass, W., T. Natschlaeger, and H. Markram, "Real-time computing without stable states: a new framework for neural computation based on perturbations", Neural Computation, vol. 14, no. 11, pp. 2531-2560, 2002. Schiller, U. D., and J. J. Steil, "Analyzing the weight dynamics of recurrent learning algorithms", Neurocomputing, vol. 63C, pp. 5-23, 2005. Steil, J. J., "Backpropagation-Decorrelation: online recurrent learning with O(N) complexity", IJCNN, 2004. 86

Additional Resources Website with information and resources: http://reservoir-computing.org/ EU FP7 funded collaborative projects: AMARSi (Adaptive Modular Architecture for Rich Motor Skills) - Mission: exploit principles of neurodynamics and neurocontrol to endow complex and compliant robots with rich sets of motor skills. PHOCUS (toward a PHOtonic liquid state machine based on delay-Coupled Systems) - Mission: design and implement a photonics realization of a liquid state machine (LSM), with the potential for versatile and fast signal handling. ORGANIC (Self-Organized Recurrent Neural Learning for Language Processing) - Mission: establish neurodynamical architectures as a viable alternative to statistical methods for speech and handwriting recognition. The OrGanic Environment for Reservoir computing (Oger) toolbox: Python toolbox for rapidly building, training and evaluating modular learning architectures on large datasets. http://reservoir-computing.org/organic/engine 87

Perhaps Someday? Questions, comments, feedback? Thanks for all the FISH! A.I. Lab (Zurich): Matej Hoffmann, Dr. Hidenobu Sumioka, Dr. Kohei Nakajima, Dr. Helmut Hauser, Qian Zhao, Matthias Weyland Reservoir Lab (Ghent) : Francis wyffels, Tim waegeman, Ken Caluwaerts 88