On the use of Long-Short Term Memory neural networks for time series prediction

On the use of Long-Short Term Memory neural networks for time series prediction Pilar Gómez-Gil National Institute of Astrophysics, Optics and Electronics ccc.inaoep.mx/~pgomez In collaboration with: J. Carlos Galán Hernandez, Rigoberto Fonseca Delgado, Manuel Ramírez-Cortés, Vicente Alarcón-Aquino PSIC group This document is available at ccc.inaoep.mx/~pgomez/conferences/pggisci14.pdf 1

Outline: Prediction; Recurrent neural networks; Temporal Classification; The LSTM network; Applications of LSTM; Results modeling sine function so far; Conclusions 2

About INAOE www.inaoep.mx ccc.inaoep.mx INAOE is a public research center whose aim is to create and spread knowledge in the areas of astrophysics, optics, electronics, computer science and related fields. Its mission is to contribute to the development of Mexico and of humanity as a whole, to solve real problems and to prepare advanced professionals in these areas. 3

Prediction 4

Why is it important to forecast a time series? Because most businesses and projects require some planning, which most of the time is performed with uncertain knowledge of future conditions; because it is necessary to measure the possible risks around future events; and because, most of the time, it is required to calculate metric indices, which may be related to economy, politics, technology, etc. 5

Time series A time series is a signal measured at regular time steps. The estimation of future values of a time series is commonly done using past values of the same series. Notice that the time step of a series may be of any length, for example seconds, hours, days or years; this produces very different-looking time series. 6

Example: a time series measured each hour 7

A few days of the same time series 8

A few months of the same time series. 9

Four hundred years of measuring sunspots 10

Examples of Prediction Tools Linear models: ARMA, ARIMA, Kalman filters. Non-linear models: neural networks, support vector machines, fuzzy systems, Bayesian estimators. 11

Prediction and Chaos A chaotic system presents some special characteristics (Kaplan & Cohen, 90): a trajectory created by a chaotic system is non-periodic and deterministic; it is highly dependent on initial conditions; and it is bounded by strange attractors (an attractor is a point or set of points toward which a trajectory is driven once the transient of the system ends). 12

Chaotic time series prediction Predicting the behavior of long trajectories created by a chaotic system is mathematically impossible. Even so, many applications require a reasonable estimate of the possible behavior of a chaotic time series. 13

What is forecasting? Given a time series, forecasting refers to the process of calculating one or several values ahead, using just the information given by the past values of the time series. If no external values are used to calculate the forecast, it is assumed that all required information is contained in the time series itself. 14
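As a minimal formalization of this idea (the notation below is ours, not the presentation's), a predictor f estimates the next value of the series from its last p observed values:

\hat{x}_{t+1} = f(x_t, x_{t-1}, \dots, x_{t-p+1})

For a forecast several steps ahead, the same f can be applied repeatedly, feeding each prediction back as an input; this is the recursive prediction described two slides below.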

Types of time series forecasting One-step prediction; several-steps prediction, also called long-term prediction. 15

Recursive prediction One-step prediction is calculated using one or several measured past values. To calculate several steps ahead, a predictor may use measured past values; however, if several future values are required, then recursive prediction is used. Recursive prediction eventually uses values already predicted instead of measured past values. This produces an accumulation of errors, which may grow very fast; in highly non-linear systems this accumulation of errors may be an important problem. A short sketch of the procedure follows. 16
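A minimal Python sketch of recursive prediction, assuming a generic one-step predictor predict_one (a hypothetical callable, not something defined in the presentation):

import numpy as np

def recursive_forecast(history, predict_one, horizon, order):
    """Forecast `horizon` steps ahead by feeding predictions back as inputs.

    history: 1-D array of measured past values.
    predict_one: callable mapping the last `order` values to the next value.
    """
    window = list(history[-order:])          # most recent measured values
    forecasts = []
    for _ in range(horizon):
        next_value = predict_one(np.array(window))
        forecasts.append(next_value)
        window = window[1:] + [next_value]   # a predicted value replaces the oldest input
    return np.array(forecasts)

# Toy usage: a naive "persistence" predictor that simply repeats the last value
history = np.sin(0.1 * np.arange(50))
print(recursive_forecast(history, predict_one=lambda w: w[-1], horizon=5, order=3))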

Recurrent Neural Networks 17

Types of Recurrent Neural Networks According to Kremer (2001), recurrent neural networks can be categorized into two classes: networks with a one-time input signal, designed to settle into a stable state; and networks with time-varying inputs, designed to provide outputs at different points in time, known as dynamic neural networks. The latter can be applied to the problem of identifying a subset of a language (a sequence) in a string of discrete values. 18

Sequence labeling It encompasses all tasks where sequences of data are transcribed with sequences of discrete labels (Graves, 2012). Examples: speech recognition, handwriting recognition, protein secondary structure prediction. In this kind of problem, individual data points cannot be assumed to be independent; both the inputs and the labels form strongly correlated sequences. 19

Example Identifying the sequence 1010 in a string 1 0 0 0 1 0 1 0 0 0 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 20
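A minimal sketch of this toy labeling task (our own illustration, not code from the presentation): the target label is 1 at every position where the last four symbols read 1010, and 0 elsewhere.

# Label every position of the binary string where the pattern "1010" has just been completed.
bits = [1,0,0,0,1,0,1,0,0,0,1,1,1,0,1,0,1,1,0,0,
        0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0]

pattern = [1, 0, 1, 0]
labels = [1 if bits[max(0, i - 3):i + 1] == pattern else 0
          for i in range(len(bits))]

print(labels)  # 1 marks each position where "1010" ends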

RNN and Sequence Labeling RNNs are an attractive choice for sequence labeling because (Graves, 2012): they are flexible in their use of context information, due to the fact that they can learn what to store and what to forget; they accept many types and representations of data; and they can recognize sequential patterns in the presence of sequential distortions. The main drawback of RNNs is that it is very difficult to get them to store information for long periods of time. 21

The LSTM network 22

Long short-term memory (LSTM) network LSTM networks were proposed by Hochreiter and Schmidhuber (1997). The Long Short-Term Memory architecture consists of a set of recurrently connected subnets. The objective of the LSTM architecture is to overcome the problem known as the vanishing error problem, which refers to how the influence of past inputs decays quickly over time. LSTM networks aim to solve this problem using memory cells. 23

Recurrent Neural Network with a hidden layer based on LSTM. 24

Basic structure of a LSTM cell 25

Calculation of LSTM output Because of the several recurrent connections, the update of the state of the network must be done in a particular order: 1. input gate activation; 2. forget gate activation; 3. cell input and cell state; 4. output gate activation; 5. cell output. 26
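The slides give the corresponding equations graphically; as a stand-in, here is a minimal numpy sketch of one forward step of a single LSTM layer in the order listed above. The names and shapes (W, U, b, lstm_step) are our own illustrative assumptions, and peephole connections, present in some LSTM variants, are omitted.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM layer.

    x: input vector; h_prev, c_prev: previous cell output and cell state.
    W, U, b: input weights, recurrent weights and biases for the gates
    'i', 'f', 'o' and the cell input 'g'.
    """
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])   # 1. input gate activation
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])   # 2. forget gate activation
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])   # 3. cell input...
    c = f * c_prev + i * g                               #    ...and cell state
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])   # 4. output gate activation
    h = o * np.tanh(c)                                   # 5. cell output
    return h, c

# Toy usage with 3 inputs and 4 hidden cells
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(4, 3)) for k in 'ifgo'}
U = {k: rng.normal(size=(4, 4)) for k in 'ifgo'}
b = {k: np.zeros(4) for k in 'ifgo'}
h, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, U, b)
print(h.shape, c.shape)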

RNN with one hidden layer 27

Training algorithm The recommended training algorithm for LSTM is un-truncated Back-Propagation Through Time (BPTT) [6]. BPTT requires calculating the gradients of each cell and gate and the error associated with each cell output; this is known as the backward pass. The equation for calculating the error of the cell output is given in [2]. 28
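For reference, that cell-output error can be written as follows; this is our reconstruction of the formula from Graves (2012), not the slide's own notation, where \epsilon_c^t is the error at the output of cell c, the first sum runs over the K output units and the second over the G units that receive the cell output at the next time step:

\epsilon_c^t = \sum_{k=1}^{K} w_{ck}\, \delta_k^t + \sum_{g=1}^{G} w_{cg}\, \delta_g^{t+1}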

Applications of LSTM 29

Applications of LSTM (1/2) Prediction of Reber's grammar: the aim of this task was to predict the character that follows a given sequence. Prediction of the next symbol of noise-free and noisy sequences: LSTM is able to learn each sequence successfully. Discrimination of a sequence from an input in which the sequence is mixed with a noisy sequence. Adding a given sequence of real values between -1 and 1 (sketched below): the input size of the network was 1, as was its output. 30
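A minimal data generator for this adding task, under our own assumptions about the setup (the slide only states that input and output both have size 1, so we take the target to be the sum of the whole sequence):

import numpy as np

def make_adding_example(length, rng):
    """One training sequence for the adding task: inputs in [-1, 1], target = their sum."""
    inputs = rng.uniform(-1.0, 1.0, size=(length, 1))   # one real value per time step
    target = inputs.sum()                                # single output expected at the end
    return inputs, target

rng = np.random.default_rng(0)
x, y = make_adding_example(length=20, rng=rng)
print(x.shape, y)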

Applications of LSTM (2/2) Multiplying all inputs of a given sequence and outputting the result. Learning temporal relations between particular inputs that are far apart in a given sequence: LSTM networks can learn to predict which symbol comes next after a given sequence of inputs that are related in time. Handwriting recognition: LSTM showed good performance on Arabic handwriting recognition. Protein localization: predicting the sub-cellular location of eukaryotic proteins from a given sequence. 31

Results modeling sine function so far 32

PyBrain PyBrain is a Python library that provides implementations of LSTM and other neural networks. It requires a CPython implementation with a few extra modules. To have a functional PyBrain environment on any platform, it is recommended to install the Anaconda distribution of CPython (http://docs.continuum.io/anaconda/index.html). More information: http://pybrain.org 33
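A hedged sketch of how PyBrain can be used to train an LSTM on a sine wave; the imports and calls below follow PyBrain's documented API, but exact module paths and defaults may differ between versions, and the network size and training settings are our own arbitrary choices, not those used in the experiments reported next.

from math import sin
from pybrain.datasets import SequentialDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure.modules import LSTMLayer
from pybrain.supervised import RPropMinusTrainer

# One-dimensional sequence: learn to predict sin(t+1) from sin(t)
data = [sin(0.1 * t) for t in range(200)]
ds = SequentialDataSet(1, 1)
ds.newSequence()
for current, target in zip(data[:-1], data[1:]):
    ds.addSample([current], [target])

# Single hidden layer of 5 LSTM cells with recurrent connections
net = buildNetwork(1, 5, 1, hiddenclass=LSTMLayer,
                   outputbias=False, recurrent=True)

trainer = RPropMinusTrainer(net, dataset=ds)
for epoch in range(100):
    trainer.train()

net.reset()
print([net.activate([x])[0] for x in data[:10]])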

Results modeling sine function so far (1/3) 34

Results modeling sine function so far (2/3) 35

Results modeling sine function so far (3/3) 36

Conclusions LSTM is a powerful tool that has been shown to be useful for sequence labeling and other time-related identification tasks. LSTM is a complex RNN to program and to train for a specific task. The use of LSTM for time series prediction may be too complicated to work in real problems. The use of PyBrain for LSTM is not straightforward. More experimentation is required; however, results so far show that other recurrent neural networks, such as the HCNN, the HWRN, or even a plain RNN, are more efficient than LSTM at learning a sine function. 37

References Hochreiter, S., Schmidhuber, J. Long Short-Term Memory. Neural Computation (1997), 9(8), 1735-1780. Kremer, Stefan C. "Lessons from language learning." Recurrent Neural Networks: Design and Applications (2001): 179-204. Galán-Hernández, C. LSTM - Technical report, project CONACYT-CB-2010-155250. Oct. 5, 2014. INAOE, México. Graves, Alex. Supervised Sequence Labelling with Recurrent Neural Networks. Springer-Verlag, Berlin Heidelberg, 2012. 38

Thank you! Pilar Gómez-Gil pgomez@acm.org ccc.inaoep.mx/~pgomez 39