Several ways to solve the MSO problem

J. J. Steil - Bielefeld University - Neuroinformatics Group
P.O.-Box 10 01 31, D-33501 Bielefeld - Germany

Abstract. The so-called MSO-problem, a simple superposition of two or more sinusoidal waves, has recently been discussed as a benchmark problem for reservoir computing and was shown not to be learnable by standard echo state regression. However, we show that there are at least three simple ways to learn the MSO signal: by introducing a time window on the input, by changing the network time step to match the sampling rate of the signal, and by reservoir adaptation. The latter approach is based on a universal principle that implements a sparsity constraint on the network activity patterns, which improves the spatio-temporal encoding in the network.

1 Introduction

Reservoir computing is a recent approach to signal processing with recurrent networks that implements a kernel-like idea for time-dependent signals. In its original form, introduced by Jaeger [1] for standard sigmoid neurons, the idea is to drive a fixed recurrent network with the input, regard the high-dimensional network state as a time-dependent feature vector coding for the driving input, and then compute a linear output function analytically by linear regression, see Fig. 1. Though it has been shown that difficult benchmark tasks in time series prediction can be solved with this approach, the power of the method obviously depends on the properties of the fixed recurrent network, the reservoir. As of today, there are no systematic methods to quantify the information processing capacities of nonlinear recurrent networks. Therefore approaches and networks are often compared by applying them to certain benchmark tasks.

In this context, it has been argued that the so-called MSO problem is difficult to solve with the standard ESN approach. The MSO problem is the task to learn a simple superposition of sine waves, y(k) = sin(0.2k) + sin(0.311k). Recently it was shown in [2] that an evolutionarily optimized reservoir network can recursively predict this combination of sinusoidal signals. However, evolutionary optimization is in itself a complex technique, which dispenses with one of the main ingredients of the reservoir approach: its simplicity. A further approach has successfully solved the problem by utilizing decoupled multiple reservoirs connected by weak inhibition [3]. This approach externalizes the problem of recognizing the different base frequencies and leaves unresolved the question which kinds of signals can be learned within a single reservoir. Both approaches measure the performance only for the period immediately following the start of the recursive mode. However, for standard ESN regression, experience shows that many networks diverge very quickly, which is possible because after the recursive connection is made an unbounded linear subloop is present in the network.
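For concreteness, the benchmark signal is straightforward to generate. The following minimal NumPy sketch (the function name mso and the choice of 2000 steps are ours) produces the superposition defined above:

```python
import numpy as np

def mso(n_steps):
    """MSO benchmark signal y(k) = sin(0.2 k) + sin(0.311 k)."""
    k = np.arange(n_steps)
    return np.sin(0.2 * k) + np.sin(0.311 * k)

# e.g. 1000 washout steps + 1000 training steps, as used in Section 2
d = mso(2000)
```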

Figure 1: a) The echo state network with nonlinear reservoir and readout function < w, y >. b) NRMSE (log scale) for recursive prediction of 1000 time steps vs. network time step Δk.

Even worse (H. Jaeger, personal communication), the networks that perform well initially in most cases diverge in the long term. We show a solution to this problem in Section 6.

2 Recurrent reservoir dynamics

In the following, we consider the recurrent reservoir dynamics

    x(k + Δk) = x(k) + Δk [ -x(k) + W^res f(x(k)) + W^u u(k) ],    (1)

where x_i, i = 1, ..., N are the neural activations, W^res ∈ R^{N x N} is the reservoir weight matrix, W^u ∈ R^{N x R} is the input weight matrix, and k is the discretized time with time step Δk. Let u(k) = (u_1(k), ..., u_R(k))^T be the R-dimensional vector of inputs. Throughout the paper we assume that y = f(x) is the vector of neuron activities obtained by applying parameterized Fermi functions component-wise to the vector x as

    y_i = f_i(x_i, a_i, b_i) = 1 / (1 + exp(-a_i x_i - b_i)).    (2)

In the following sections, we follow the standard procedure to train an echo state network. All networks are generated with 10 internal recurrent connections per node on average, i.e. 10% connectivity for a 100-neuron network, and 50% input connectivity in W^u. All weights are initialized from a uniform distribution on the interval [-0.02, +0.02]. Networks subject to standard ESN regression are rescaled to a spectral radius of 0.8; networks using reservoir adaptation are not rescaled, i.e. they initially have very small weight matrix eigenvalues. We first iterate the network driven by the input for 1000 steps, then record the network states y(k) for the next 1000 steps together with the target output d(k), which in this case is a one-step-ahead prediction of the MSO input signal. Then we perform a linear regression for the output weights w^out such that the network output is d̂(k) = < w^out, y(k) >.
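To make this setup concrete, the following NumPy sketch mirrors eqs. (1) and (2) and the training procedure just described. All function and variable names are ours; the random masking is one way to realize the stated connectivity, and the tiny ridge term in the readout regression is an extra safeguard for numerical stability, whereas the paper uses plain linear regression.

```python
import numpy as np

rng = np.random.default_rng(0)

def fermi(x, a, b):
    """Parameterized Fermi function, eq. (2)."""
    return 1.0 / (1.0 + np.exp(-a * x - b))

def init_weights(N=100, R=1, conn=0.10, in_conn=0.5, w=0.02, rho=0.8):
    """Sparse reservoir and input matrices; rho=None skips the spectral
    rescaling (as for reservoirs later adapted by intrinsic plasticity)."""
    W_res = rng.uniform(-w, w, (N, N)) * (rng.random((N, N)) < conn)
    W_u = rng.uniform(-w, w, (N, R)) * (rng.random((N, R)) < in_conn)
    if rho is not None:
        W_res *= rho / np.max(np.abs(np.linalg.eigvals(W_res)))
    return W_res, W_u

def step(x, u, W_res, W_u, a, b, dk=1.0):
    """One update of the reservoir dynamics, eq. (1)."""
    return x + dk * (-x + W_res @ fermi(x, a, b) + W_u @ u)

def train_readout(Y, D, reg=1e-8):
    """Least-squares readout weights from recorded states Y (T x N) and
    targets D (T,); the small ridge term is only for numerical stability."""
    return np.linalg.solve(Y.T @ Y + reg * np.eye(Y.shape[1]), Y.T @ D)
```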

Figure 2: Mean NRMSE for 50 runs of recursive prediction. Errors are computed every 20 steps of recursive prediction after ESN regression for a 100-neuron reservoir network. Left: direct regression; right: regression after 10 epochs (= 10000 steps) of intrinsic plasticity training. The number of converged runs is given in brackets.

Finally, we reconnect the output to the input, u_1(k+1) := d̂(k), to let the network run recursively on the predicted target. Performance is measured as the normalized root mean square error (NRMSE) over T test time steps,

    NRMSE = sqrt( (1/T) sum_{k=1}^{T} e(k)^2 / var(d) ),

where e(k) = d̂(k) - d(k) is the error with respect to the target signal d(k) and var(d) is the variance of the target.

3 The MSO-problem and the time step

For continuous signals with discrete sampling, it is often crucial to adjust the network time step Δk to the sampling rate of the problem. Fig. 1 shows that for the MSO problem time steps between 0.05 and 0.5 allow the network to succeed in recursive prediction very easily. Thus in principle the MSO problem can be solved straightforwardly and in a very simple manner within the standard ESN framework, and there is no need for advanced techniques. However, setting the time step equal to one makes the problem harder, and it remains interesting to investigate whether the network can internally develop a dynamics that provides the necessary coding even under these circumstances.

4 The MSO-problem and the time history window

In the next experiment, we supply an increasingly long time history window as input to the network, i.e. u_1(k) = d(k), u_2(k) = d(k-1), ..., u_R(k) = d(k-R+1). When the output is recursively connected after training, the predicted d̂(k+1) is fed back to the first input u_1 and the rest of the history is shifted by one time step. Thereby the network runs fully autonomously in recursive mode after R time steps. In confirmation of the earlier results of Jaeger, for small history windows R < 5 the networks almost always (> 95%) diverge in recursive mode, and if they do not diverge, they in most cases oscillate without implementing the MSO dynamics. For history lengths R larger than five the network sometimes diverges, but if it does not, it is able to learn the MSO task. Fig. 2 shows the mean NRMSE for up to 1000 steps of recursive prediction for different lengths of the history window. We conducted 50 runs for every history length and report the number of runs that do not diverge within the prediction period together with the mean prediction error.
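The recursive mode with a history window and the NRMSE measure can be sketched as follows (a minimal sketch reusing step and fermi from the Section 2 sketch; the function names are ours):

```python
import numpy as np

def nrmse(pred, target):
    """Normalized root mean square error, the performance measure of Section 2."""
    return np.sqrt(np.mean((pred - target) ** 2) / np.var(target))

def run_recursive(x, u_hist, n_steps, W_res, W_u, w_out, a, b, dk=1.0):
    """Autonomous (recursive) prediction with an R-step history window:
    each new prediction becomes u_1 and the window is shifted by one step."""
    preds = np.empty(n_steps)
    for t in range(n_steps):
        x = step(x, u_hist, W_res, W_u, a, b, dk)   # step() from the Section 2 sketch
        y = fermi(x, a, b)
        preds[t] = w_out @ y                        # d_hat(k) = <w_out, y(k)>
        u_hist = np.concatenate(([preds[t]], u_hist[:-1]))
    return x, preds
```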

Figure 3: Output distribution of randomly selected single neurons after IP learning, and the spatio-temporal average over all neurons in the network, histogrammed over one epoch of 4000 steps.

5 The MSO-problem and intrinsic plasticity

Our third approach to solving the MSO problem is based on adapting the reservoir to the input signal, i.e. on optimizing the temporal coding in the network before computing the readout function. For adaptation we use a learning rule developed in [4], which was introduced to reservoir computing in [5]. It is motivated by the ability of biological neurons to change their excitability according to the stimulus distribution they are exposed to. It has been argued that the underlying principle may be that exponential output distributions maximize information transmission under the constraint of maintaining a constant mean firing rate. In [4], an online adaptation rule was developed to achieve this goal of maximizing information transmission. It adjusts the parameters a, b of the Fermi function so as to minimize the Kullback-Leibler divergence between the neuron's actual output distribution and an exponential distribution with desired mean activity level μ. In time step k, it adapts the parameters a, b of a neuron with learning rate ζ as ([4]):

    Δb(k) = ζ ( 1 - (2 + 1/μ) y(k) + (1/μ) y(k)^2 ),    (3)

    Δa(k) = ζ ( 1/a + x(k) - (2 + 1/μ) x(k) y(k) + (1/μ) x(k) y(k)^2 ).    (4)
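A vectorized per-step implementation of this rule could look like the following sketch (it relies on the reconstructed form of eqs. (3) and (4) above and reuses fermi from the Section 2 sketch):

```python
import numpy as np

def ip_update(x, a, b, mu=0.2, zeta=0.001):
    """One intrinsic-plasticity step for all reservoir neurons, eqs. (3)-(4);
    mu is the desired mean activity, zeta the learning rate."""
    y = fermi(x, a, b)                     # fermi() from the Section 2 sketch
    db = zeta * (1.0 - (2.0 + 1.0 / mu) * y + (y ** 2) / mu)
    da = zeta / a + db * x                 # expanded, this equals eq. (4)
    return a + da, b + db
```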

Figure 4: Left: Recursive prediction errors computed every 20 steps after pre-adapting 300-neuron reservoir networks with intrinsic plasticity for 150, 200, 250, and 300 epochs. Means and variances are given over 20 network initializations; no short-term divergence occurs. Right: Long-term stability and convergence to the desired MSO attractor, where #res is the number of nodes in the reservoir, #stable the number of networks converging to an attractor, and #MSO the number of networks converging to the MSO attractor according to the measure given in the text:

    #res   #stable   #MSO
    50     6/50      0/50
    100    4/50      0/50
    150    50/50     44/50
    200    50/50     43/50
    250    50/50     42/50
    300    50/50     46/50

Fig. 3 illustrates the effect of IP learning by displaying a randomly chosen neuron's output distribution after application of the IP rule to a network of 200 neurons driven by the MSO signal. The network is iterated in epochs of 4300 steps while the IP learning rule is applied in each step with target average activity μ = 0.2 and learning rate ζ = 0.001. We use identical inputs from the MSO time series and zero initial states for each epoch. In epoch 500 we record the network outputs y(k) after a relaxation phase of 300 steps and plot the histogram of activities of a randomly chosen neuron in Fig. 3, using 100 bins of width 0.01. It is clearly visible that the neuron qualitatively approximates an exponential output characteristic, as does the overall network firing pattern shown in Fig. 3, bottom. This activation pattern implies a sparse coding, because most of the neurons have small outputs most of the time.

We now pursue the idea of pre-optimizing the reservoir with IP adaptation and then applying standard echo state regression to the adapted reservoir. Fig. 2 shows that IP pre-training can improve the performance even on a short time scale of only 10000 training points (10 epochs of 1000 points). However, a considerable number of networks still do not approximate the target well enough and diverge in recursive mode. This is not surprising, because the statistical IP adaptation can exhibit its full power only on a longer time scale. Fig. 4 shows the NRMSE for up to 1000 steps of recursive prediction for different amounts of long-term IP training. The effect of the sparse coding here is that the network can reliably learn the MSO task even with a single input, and the longer we pre-train with the IP rule, the better. Further, no instabilities occur in recursive mode.
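The epoch-wise pre-training protocol described above might be sketched as follows (reusing step and ip_update from the earlier sketches; the unit initial gains and zero initial biases are our assumption, not stated in the text):

```python
import numpy as np

def ip_pretrain(W_res, W_u, d, n_epochs, epoch_len=4300, mu=0.2, zeta=0.001, dk=1.0):
    """Epoch-wise IP pre-training: every epoch restarts from zero states and
    replays the same stretch of the MSO signal d (a 1-D array of at least
    epoch_len samples), applying the IP rule in every time step."""
    N = W_res.shape[0]
    a, b = np.ones(N), np.zeros(N)
    for _ in range(n_epochs):
        x = np.zeros(N)
        for k in range(epoch_len):
            x = step(x, d[k:k + 1], W_res, W_u, a, b, dk)   # step() from Section 2
            a, b = ip_update(x, a, b, mu, zeta)
    return a, b
```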

6 Long term stability

All our experiments confirm that the recursively connected networks tend to diverge when running for long times (we tested up to 10^7 steps). The reason seems to be that initially the network trajectory stays close to the MSO oscillator, while later slight drift inside the system tends to be amplified. However, a combination of intrinsic plasticity, a time step of 0.3, and the addition of noise in the frequency domain solves this problem, i.e. we train with sin(r_1 · 0.2k) + sin(r_2 · 0.311k), where r_1, r_2 are uniformly drawn from 1 ± 10^-7. We pre-train with IP for 1.2 · 10^6 steps and then apply the regression. Every 8192 time steps we compute the power spectrum of the discrete Fourier transform and compute the angle to the spectrum of the MSO signal as a similarity measure. We regard the network as converged if the variation in this measure stays below a small threshold for 10 consecutive evaluations (about 82000 steps), and we consider the network as implementing the MSO attractor if the angle is below 0.01 rad in the high-dimensional space of the frequency spectrum vectors. The table in Fig. 4 shows that with this training the networks converge, and most of the time even to the desired attractor.

7 Conclusion

While measuring the encoding quality for a given input signal remains an open and important problem, we believe that the MSO problem is not the best benchmark to show what is difficult for echo state regression. The three approaches we have shown to solve the MSO problem complement the more complex methods introduced earlier to tackle this problem. We conclude that the MSO problem is too simple to serve as a hard benchmark. On the other hand, the present investigation shows that there exists a rich variety of modifications to the standard ESN approach which can significantly change the performance. The most instructive result of the present work is that intrinsic plasticity learning is capable of changing the reservoir dynamics based on a very general and unspecific statistical unsupervised learning principle, which optimizes the coding in the reservoir in a signal-specific way. It can also be used to stabilize the long-term dynamics so that they converge to the desired attractor.

References

[1] H. Jaeger. Adaptive nonlinear system identification with echo state networks. In NIPS, pages 593-600, 2002.

[2] D. Wierstra, F. Gomez, and J. Schmidhuber. Modeling systems with internal state using Evolino. In Proc. GECCO, pages 795-802, 2005.

[3] Y. Xue and S. Haykin. Decoupled ESNs with lateral inhibition. NIPS Workshop on ESNs, 2006. To appear in Neural Networks, special issue on ESNs and liquid states.

[4] J. Triesch. A gradient rule for the plasticity of a neuron's intrinsic excitability. In Proc. ICANN, number 3695 in LNCS, pages 65-79, 2005.

[5] J. J. Steil. Online reservoir adaptation by intrinsic plasticity for backpropagation-decorrelation and echo state learning. Neural Networks, 2007.