Using reservoir computing in a decomposition approach for time series prediction.

Francis wyffels, Benjamin Schrauwen and Dirk Stroobandt
Ghent University - Electronics and Information Systems Department
Sint-Pietersnieuwstraat 41, 9000 Gent - Belgium

Abstract. In this paper we combine wavelet decomposition and recurrent neural networks to provide fast and accurate time series predictions. The original time series is decomposed by means of wavelet decomposition into a hierarchy of time series which are easier to predict. The prediction core of our solution is reservoir computing, a recently developed technique for the very fast training of recurrent neural networks. The three time series of the ESTSP 2008 competition are used as an illustration of our method.

1 Introduction

Forecasting is a domain with a broad range of useful applications. Researchers working on time series prediction therefore come from a wide variety of fields and use many methods, such as the theta method [1], support vector machines, neural networks, local modeling [2], wavelet decomposition [3] and many more. This year, for the second time, the European Symposium on Time Series Prediction is held. This symposium always presents a challenging competition in the domain of time series prediction. This year the competition concerns the prediction of three different time series, which can be found on the competition website. Each time series has different properties, which is interesting because this will reveal the strengths and weaknesses of the many methods applied to the time series during the contest. Although we have no profound experience in the domain of time series prediction, we wanted to join this competition in order to compare our method with many others in the forecasting domain. In this paper we describe how a combined approach of wavelet decomposition and reservoir computing can be used for forecasting. Before explaining our method in full, we give a short overview of reservoir computing, which will be the baseline of our method.

This research is partially funded by FWO Flanders project G and the Photonics@be Interuniversity Attraction Poles program (IAP 6/1), initiated by the Belgian State, Prime Minister's Services, Science Policy Office.

2 Reservoir computing: a short overview

Reservoir computing is a novel technique for the fast training of large recurrent neural networks, which has been successfully applied in a broad range of temporal tasks such as robotics [4], speech recognition [5, 6] and time series generation [7]. Last year, reservoir computing outperformed all other methods in the NN3 competition for financial time series prediction [8]. The reservoir computing technique is based on the use of a large untrained dynamical system, the reservoir, where the desired function is implemented by a linear memory-less mapping from the full instantaneous state of the dynamical system to the desired output. Only this linear mapping is learned. When the dynamical system is a recurrent neural network of analog neurons, the method is referred to as echo state networks [7]. When spiking neurons are used, one often speaks of liquid state machines [9]. Both are now commonly referred to as reservoir computing [10].

Training is done in a supervised way: first, the reservoir is driven with teacher-forced inputs and/or teacher-forced output feedback; second, the output layer is trained using linear regression. This is summarized by the following equations:

x[k+1] = f( W^{res}_{res} x[k] + W^{res}_{inp} u[k] + W^{res}_{out} y[k] + W^{res}_{bias} )
\hat{y}[k+1] = W^{out}_{res} x[k+1] + W^{out}_{inp} u[k] + W^{out}_{bias},    (1)

where x[k] is the reservoir's state, u[k] is the input, y[k] is the desired output and ŷ[k] is the actual output. When analog neurons are used, the nonlinearity f is typically the sigmoid function. All the weight matrices denoted by W^{out} are trained, while those denoted by W^{res} are fixed and randomly created. During testing, the teacher-forced output feedback y[k] is replaced by the actual output ŷ[k], which we call free-run mode. Because only the output weights are changed, training can be realized very quickly, which can be an additional benefit in comparison with other methods. Additionally, reservoir computing does not suffer from local optima like other methods based on neural networks do. We conclude that reservoir computing gives us a powerful tool that can be easily used in a broad range of applications.

3 Time series prediction

Now that we have introduced the baseline of our prediction mechanism, we can formalize our overall prediction scheme, which is illustrated in Fig. 1. We now present each of the modules in greater detail.
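To make Eq. (1) concrete, the following numpy sketch implements a minimal echo state network with output feedback only (no external input, the configuration used later for the competition series) and trains the readout with ridge regression. This is an illustration under assumptions, not the authors' code: the function names, default sizes and scalings are chosen here for brevity, and the input and bias terms of the readout are omitted.

```python
import numpy as np

def train_esn_generator(y, n_res=100, spectral_radius=0.99, fb_scale=0.1,
                        ridge=1e-6, seed=0):
    """Teacher-forced ESN with output feedback only (no external input u[k]);
    the linear readout of Eq. (1) is trained with ridge regression."""
    rng = np.random.default_rng(seed)
    W_res = rng.uniform(-1.0, 1.0, (n_res, n_res))
    W_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W_res)))  # largest eigenvalue close to 1
    W_fb = fb_scale * rng.uniform(-1.0, 1.0, n_res)                      # output feedback weights
    b = rng.uniform(-1.0, 1.0, n_res)                                    # bias term

    x = np.zeros(n_res)
    states = []
    for k in range(len(y) - 1):                 # drive the reservoir with the desired output y[k]
        x = np.tanh(W_res @ x + W_fb * y[k] + b)
        states.append(x)
    X = np.array(states)                        # states paired with targets y[1:]
    W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ np.asarray(y)[1:])
    return (W_res, W_fb, b, W_out), x

def free_run(params, x, y_last, horizon):
    """Free-run mode: the teacher signal is replaced by the network's own output."""
    W_res, W_fb, b, W_out = params
    preds, y_prev = [], y_last
    for _ in range(horizon):
        x = np.tanh(W_res @ x + W_fb * y_prev + b)
        y_prev = float(W_out @ x)
        preds.append(y_prev)
    return np.array(preds)
```

During training the reservoir is driven by the desired signal (teacher forcing); `free_run` then generates future samples by feeding the trained readout back into the reservoir.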

Fig. 1: Schematic overview of our prediction methodology. We start by normalizing the given time series. Next, the normalized time series is decomposed by means of wavelet decomposition, which results in a trend and a set of detail coefficients. The level 1 detail coefficient is then discarded. The obtained components are predicted separately using reservoir computing (RC). Finally, the predicted components are recombined and rescaled in order to get a prediction of the given time series.

3.1 Normalizing the time series

Because we want to work in both the linear and nonlinear part of our recurrent neural network, we need to normalize the time series to the interval [-1, 1]. Otherwise the neurons would saturate and information would be lost. Normalization is done by removing the mean and dividing the outcome by the maximal absolute value:

\bar{x} = x - \langle x \rangle, \qquad x_{norm} = \bar{x} / \max(|\bar{x}|),    (2)

3.2 Decomposition

Time series can very often be decomposed into components with different dynamics: a trend, periodical effects (sometimes denoted as seasonal effects) and irregular residual components. In [3] wavelet decomposition was motivated by the easy analysis of the obtained components. A second motivation to use decomposition of the original time series is inherent to the use of reservoir computing, which tends to be sensitive to a limited temporal range [11]. This can be a problem with time series which contain information on different timescales. When there is no additional information available about the time series, decomposition can be done by using a set of successive filters. This is also known as multiscale decomposition and is described in more detail in [12]. The filters are obtained by rescaling the so-called mother wavelet. When the filters are applied iteratively, one obtains a slowly varying trend series and a hierarchy of detail components which contain the system's dynamics at different timescales [3].
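As a hedged illustration of Eq. (2) and of the multiscale decomposition just described, the sketch below normalizes a series and splits it into a trend plus detail components that sum back to the original. The authors used the MATLAB Wavelet toolbox; here PyWavelets with the 'dmey' (discrete Meyer) wavelet is used as a stand-in, and the function names are ours.

```python
import numpy as np
import pywt

def normalize(x):
    """Eq. (2): remove the mean, then divide by the maximal absolute value -> [-1, 1].
    The mean and scale are returned so a prediction can be rescaled later (Sec. 3.4)."""
    mean = np.mean(x)
    centred = np.asarray(x, float) - mean
    scale = np.max(np.abs(centred))
    return centred / scale, mean, scale

def decompose(x, wavelet="dmey", level=8):
    """Multiscale decomposition: reconstruct each wavelet band separately (all other
    bands zeroed) so that trend + details sums back to the input series."""
    coeffs = pywt.wavedec(x, wavelet, level=level)            # [c_L, d_L, ..., d_1]
    bands = []
    for i in range(len(coeffs)):
        keep = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        bands.append(pywt.waverec(keep, wavelet)[:len(x)])    # trim boundary padding
    trend, details = bands[0], bands[1:]                      # details[-1] is the noisy d_1
    return trend, details
```

Each returned component can then be predicted by its own reservoir, with the noisy level 1 detail (`details[-1]`) simply discarded, as in the scheme of Fig. 1.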

Because the system's dynamics are now split up into different timescales, processing with reservoir computing becomes a lot easier. The number of iterations L depends on the length N of the time series and is limited by L_{max} = \log_2 N. But fewer iterations can be considered by inspecting the derived detail components. After L iterations, a time series y[k], k = 1,...,N, of length N can be written as the sum of the trend c_L[k] and L detail coefficients d_m[k], m = 1,...,L:

y[k] = c_L[k] + \sum_{m=1}^{L} d_m[k],    (3)

For our experiments we used the MATLAB Wavelet toolbox for decomposition of the time series. The filters we used were obtained from the discrete Meyer filter because this gave components with few discontinuities. But we suspect that a Daubechies filter of a sufficiently high order is also feasible. In Fig. 3 the first time series of the ESTSP 2008 competition is shown with its trend and detail coefficients using a level eight decomposition. The noisiest coefficient d_1 is not illustrated. By inspecting the derived trend and detail components we decided that a level eight decomposition produced sufficiently smooth components for the first and second time series of the ESTSP 2008 competition. For the third time series we used a level 12 decomposition because this gave smoother and more predictable coefficients.

3.3 Prediction

The trend and detail components we obtain from decomposition are used to predict the original time series. We neglect the level 1 component d_1 because it is too noisy to predict. The remaining components are predicted separately by means of reservoir computing. An important implication is that correlations between different components are neglected. We plan to investigate the use of many timescales within a reservoir so that all components can be predicted at once instead of training them separately. This would reduce calculation time and has the benefit of exploiting possible correlations between the components.

For each component a fully connected reservoir with 5 sigmoid neurons, one output and only output feedback as an input is constructed. No other external inputs are used. An illustration of this topology can be seen in Fig. 2. The connection weights are scaled so that the largest eigenvalue is nearly 1, which makes the reservoir nearly unstable. The output feedback weights were scaled to 0.1. Depending on the desired output, classical neurons, leaky neurons [7] or band-pass neurons [11] are used. A rule of thumb here is that leaky neurons are used for the slowest components, band-pass neurons for the faster components and classical neurons for the fastest varying components. We determined the leak rates and band-pass settings manually based on the frequency spectrum of the components. Because we want to generate the future of a time series, the output is fed back to each of the neurons in the reservoir. During training the desired signal is used as feedback, using teacher forcing as explained previously.
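The text does not spell out the neuron equations; as a sketch, one common leaky-integrator update in the spirit of [7] looks as follows, with the leak rate playing the role of the timescale knob tuned from each component's frequency spectrum. Band-pass neurons [11], which add a high-pass stage, are omitted here.

```python
import numpy as np

def leaky_update(x, y_fb, W, W_fb, b, leak_rate=1.0):
    """One reservoir state update with output feedback y_fb.
    leak_rate = 1.0 recovers the classical sigmoid neuron used for the fastest
    components; leak_rate < 1.0 makes the state integrate slower timescales."""
    return (1.0 - leak_rate) * x + leak_rate * np.tanh(W @ x + W_fb * y_fb + b)
```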

Fig. 2: Schematic overview of the described reservoir topology. The reservoir has only output feedback as an input. Only the output weights, represented by dashed lines, are trained. All other connection weights (random but fixed interconnections) are initialized randomly and scaled so that the largest eigenvalue is nearly 1.

In order to avoid overfitting we use ridge regression to train the output weights, which proved to have good regularization properties [13], even for generation tasks. For training and testing we divide each component into three parts: one for training (the largest part), and the two last parts (which have lengths equal to the desired prediction horizon) for validation and testing. The final results for both testing and the competition were obtained by first training the reservoir using teacher forcing on the largest part. The optimal regularization parameter is determined using the performance of the reservoir in predicting the validation part. Next, the reservoir is retrained by teacher forcing it with the first and the second part, using the obtained optimal regularization parameter, in order to predict the third (known) testing part. This part is used for evaluation of our approach and these results are presented in the next section. Finally, we train the reservoir again using the complete component in order to predict the unknown samples which are needed for the competition. We repeat this process ten times for each of the components, each time using another reservoir. The unknown samples were generated by the reservoir which had the best performance on the testing part.
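The train/validate/retrain loop described above can be sketched as follows. Here `train_fn(series, lam)` stands in for the reservoir training of Section 2 (teacher forcing plus a ridge readout with regularization parameter `lam`) and returns a free-run predictor; the candidate grid of regularization parameters and the use of mean squared error as the validation score are assumptions, since the text does not specify them.

```python
import numpy as np

def select_and_evaluate(component, horizon, train_fn, lambdas=(1e-8, 1e-6, 1e-4, 1e-2)):
    """Sketch of the procedure in the text: the last two parts of each component
    (each as long as the prediction horizon) serve as validation and test sets.
    train_fn(series, lam) is an assumed helper that teacher-forces a reservoir on
    `series` with ridge parameter `lam` and returns a predictor that, given a
    horizon, generates that many samples in free-run mode."""
    val = component[-2 * horizon:-horizon]
    test = component[-horizon:]

    def mse(y, y_hat):                      # validation metric (assumed, not stated in the text)
        return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

    # 1. Train on the first (largest) part, pick the regularization on the validation part.
    train = component[:-2 * horizon]
    best_lam = min(lambdas, key=lambda lam: mse(val, train_fn(train, lam)(horizon)))

    # 2. Retrain on train + validation with the chosen parameter, score on the known test part.
    predictor = train_fn(component[:-horizon], best_lam)
    test_error = mse(test, predictor(horizon))

    # 3. For the competition the reservoir is finally retrained on the full component.
    final_predictor = train_fn(component, best_lam)
    return best_lam, test_error, final_predictor
```

In the paper this whole procedure is repeated ten times with different random reservoirs, and the reservoir with the best performance on the test part generates the competition samples.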

3.4 Composition and rescaling

Recombination of the components can be done by using Eq. (3). Afterwards, the composed time series is rescaled to undo the normalization.

4 Experimental results

The goal of the ESTSP 2008 competition is to predict three different time series, each for a different prediction horizon. The evaluation of a time series y with length N and its prediction ŷ is done by calculating the NMSE:

NMSE = \frac{\sum_{t=1}^{N} (y_t - \hat{y}_t)^2}{N \sigma_y^2},    (4)

For the first time series we have to predict the next 18 samples based on a history of 354 samples. Two additional time series were given which could be helpful for the prediction of the first time series. After first trying a few prediction setups where these additional time series were used as inputs to our reservoir, we decided to neglect these external variables because they gave no significant improvement. The final result with decomposition of the time series is presented in Fig. 3. An NMSE of 0.25 was obtained on the last known 18 samples. The complete training and prediction procedure takes nearly 2 minutes on an average desktop computer with two gigabytes of memory and a 2.4 GHz Intel-based CPU.

The second time series of the ESTSP 2008 competition consists of 1300 samples, of which we have to predict the next 100 samples. This time series has a period of seven samples, which makes it plausible that it was sampled from a daily updated variable. This impression becomes more pronounced when we cut the time series into sets of 365 samples and look at the correlations between the different sets. Although we initially wanted to use this analysis as additional information in a different approach, we chose to reject it because its results were comparable to the results we have now, and we wanted one consistent methodology for the three time series. The predictions are shown in Fig. 4. An NMSE of 0.14 was obtained, which is the best result of the three given time series. A total time of nearly 5 minutes was needed to complete the training and testing procedure.

For the third time series a much longer history of samples was given, of which we had to predict the next 200 samples. Completion of the prediction procedure took three hours, which is due to the long sample history and the use of more decomposition levels (and thus more reservoirs for predicting the components). The results are shown in Fig. 5. An NMSE of 0.42 was obtained, which is our worst performance, possibly due to the discrete jumps in the trend and the many noisy detail coefficients derived from the decomposition.
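For completeness, a one-function numpy version of the NMSE of Eq. (4), assuming N is the length of the evaluated segment and \sigma_y^2 its variance:

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean square error, Eq. (4)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sum((y_true - y_pred) ** 2) / (len(y_true) * np.var(y_true)))
```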

Fig. 3: In solid black lines the original time series 1 of the ESTSP 2008 competition and its decomposition (using level eight wavelet decomposition) into its trend and detail coefficients are shown. The level 1 detail coefficient is not shown because it was too noisy to predict. The last 18 samples of the original time series and of its components were predicted in order to evaluate our prediction methodology; these predictions are shown as dashed gray lines. This resulted in an NMSE of 0.25. The 18 unknown samples predicted for the competition are shown as solid gray lines.

Fig. 4: At the top, the complete time series 2 of the ESTSP 2008 competition is shown as a solid black line. Our method was evaluated on the last 100 samples, which gave an NMSE of 0.14. At the bottom, these predictions are marked with a dashed gray line. The next 100 unknown samples for the competition are marked with a solid gray line.

Fig. 5: At the top, an impression of the complete time series 3 of the ESTSP 2008 competition is given. The last 200 samples were used for evaluating our technique, which resulted in an NMSE of 0.42. These predicted samples are shown with a dashed gray line. Our prediction for the unknown future of 200 samples is illustrated with a solid gray line.

5 Conclusions

In this work a prediction scheme for fast and accurate time series prediction based on wavelet decomposition and reservoir computing was presented. By using time series decomposition, components on different timescales were obtained which are easier to predict. The obtained trend series and detail coefficients were predicted using reservoir computing. We evaluated our method on a known part of the three time series of the ESTSP 2008 competition. In the end, the unknown samples of the three time series were generated. For future work we plan a setup that uses a single reservoir for both decomposition and prediction. This will give us a prediction mechanism which is able to use the interdependence between the obtained components. To this end, we will first need to investigate how many timescales can be used within one reservoir.

References

[1] V. Assimakopoulos and K. Nikolopoulos. The theta method: a decomposition approach for forecasting. International Journal of Forecasting, 16:521-530, 2000.
[2] J. McNames. Innovations in local modeling for time series prediction. PhD thesis, Stanford University, 1999.
[3] S. Soltani. On the use of the wavelet decomposition for time series prediction. Neurocomputing, 48, 2002.
[4] Eric A. Antonelo, Benjamin Schrauwen, and Jan Van Campenhout. Generative modeling of autonomous robots and their environments using reservoir computing. Neural Processing Letters, 26(3), 2007.
[5] Mark D. Skowronski and John G. Harris. Automatic speech recognition using a predictive echo state network classifier. Neural Networks, 20(3), 2007.
[6] D. Verstraeten, B. Schrauwen, D. Stroobandt, and J. Van Campenhout. Isolated word recognition with the liquid state machine: a case study. Information Processing Letters, 95(6):521-528, 2005.
[7] Herbert Jaeger. The echo state approach to analysing and training recurrent neural networks. Technical Report GMD Report 148, German National Research Center for Information Technology, 2001.
[8] H. Jaeger. Background information: Jacobs University smart systems seminar wins international financial time series competition, 2007.
[9] W. Maass, T. Natschläger, and H. Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11):2531-2560, 2002.
[10] D. Verstraeten, B. Schrauwen, M. D'Haene, and D. Stroobandt. An experimental unification of reservoir computing methods. Neural Networks, 20:391-403, 2007.
[11] F. wyffels, B. Schrauwen, D. Verstraeten, and D. Stroobandt. Band-pass reservoir computing. In Proceedings of the International Joint Conference on Neural Networks, 2008.
[12] Ingrid Daubechies. Ten Lectures on Wavelets (CBMS-NSF Regional Conference Series in Applied Mathematics). Society for Industrial and Applied Mathematics, 1992.
[13] F. wyffels, B. Schrauwen, and D. Stroobandt. Regularization methods for reservoir computing. In Proceedings of the International Conference on Artificial Neural Networks (ICANN), 2008. (accepted).
